I came across the Spring story some time ago but did not have time to write about it then. I would probably spend time writing about it now but recently I had to refresh my knowledge of JBoss and CXF.
Besides Spring usage, JBoss CXF integration has more nice things to offer. They are more subtle but they definitely deserve to be mentioned.
Remember that little jbossws-cxf.xml? It has its own secrets.
First of all there are several locations where JBoss WS code looks for it. Do you know where exactly the file is packaged in jar/war/whatever? If you think the correct location is META-INF/ for jars and WEB-INF/ for wars, you are close but it is only part of the story. This is the location that the WS deployer checks for jbossws-cxf.xml at application startup. Note that JBoss does not use classloading to locate this file, it goes straight to the jar/war and checks if the file is present. If it is there it is processed together with some other JBoss CXF specific deployment descriptors that the deployer can find at this moment. If no META-INF/jbossws-cxf.xml or WEB-INF/jbossws-cxf.xml is found the deployer does not look at other deployment descriptors even if they are present. This is important because whatever JBoss manages to find at this moment is used for configuration of WS services and injection of WS client references. Well, may be.
Of other places where JBoss WS deployer looks the most important is META-INF/cxf/jbossws-cxf.xml. Pay attention to that cxf in the path. JBoss: consistent as ever; flexible as nothing ever before. It gives you possibility to place config files in at least two different places. There are of course notable differences. First one is already explained above: the deployer is loading config files only if it detects META-INF/jbossws-cxf.xml or WEB-INF/jbossws-cxf.xml. Then it loads that file and also META-INF/cxf/jbossws-cxf.xml if present. So if you have only META-INF/cxf/jbossws-cxf.xml, the deployer will not see at application startup.
Another difference is that uses classloading to load a single META-INF/cxf/jbossws-cxf.xml. This, of course, is a source of some amusement if you happen to have multiple jars with META-INF/cxf/jbossws-cxf.xml. JBoss has some jars with that file already, like server/<server_name>/deployers/jbossws.deployer/jbossws-cxf-client.jar. If you also have META-INF/cxf/jbossws-cxf.xml in your application then which file is loaded depends on the classloading configuration. I do not know how important things from jbossws-cxf-client.jar are but you definitely do not want to miss your configuration.
Location is covered; time to look at the moment when JBoss looks for the file. It does it not only during application startup, as described above. If you use WS API like javax.xml.ws.Service.create() JBoss will do it again. Basically it is the same story except that it never loads META-INF/jbossws-cxf.xml or WEB-INF/jbossws-cxf.xml. It just goes straight to all those other JBoss CXF specific deployment descriptors that it can find, including META-INF/cxf/jbossws-cxf.xml. This process might use a different classloader than the one used during deployment so the loaded configuration might differ significantly from the one loaded during application startup.
A nice finishing touch comes from Spring XML parsing code. Most likely it is somehow configured from JBoss but I did not care to find out where and how. Deployer triggers (META-INF/jbossws-cxf.xml or WEB-INF/jbossws-cxf.xml) are parsed with XSD validation on, all other files are parsed with validation off. It seems to be a minor difference unless you want the configuration from jbossws-cxf.xml applied to the WS objects you get back from Service.create() and Service.getPort().
Basically you can use <jaxws:client> to provide custom configuration on a per-client basis, the same way as you can use <jaxws:endpoint> to configure your endpoints. But there is great confusion around <jaxws:client>: it looks like it is ignored by JBoss or CXF. For example interceptors configured on a CXF bus are used properly but interceptors configured on <jaxws:client> are not.
The reason is that you need quite crazy identification string for your <jaxws:client>. Things that are perfectly OK for <jaxws:endpoint> won't work for <jaxws:client>. Why? Because it is JBoss of course!
Some JBoss documentation suggests you have to use <jaxws:client id="{your.service.namespace}YourPortName" >. This kind of ID is not valid according to Spring XML schema. If you put it in your META-INF/jbossws-cxf.xml or WEB-INF/jbossws-cxf.xml, the application startup will fail with XSD validation error. To be fair to JBoss it looks like this particular naming convention is coming from CXF and not from JBoss.
But it is still not all! <jaxws:client id="{your.service.namespace}YourPortName" > does not work. It does not work if you rely on @Resource or @WebServiceRef, or, at least, it did not work for me when I tried, but I did not try hard. It does not work for javax.xml.ws.Service.create(), and I needed it to work so I wasted most of my time on this use case. 
Still other sources say that <jaxws:client id="{your.service.namespace}YourPortName" createdFromAPI="true"> is a way to go. It still does not work! It took me quite some time poking in CXF and Spring code under debugger to find out the version that worked for me: <jaxws:client id="{your.service.namespace}YourPortName.jaxws-client.proxyFactory">.
Basically it is CXF that ends up looking up a Spring bean named "{your.service.namespace}YourPortName.jaxws-client.proxyFactory" to get its configuration. The funniest thing is that both versions of configuration register a bean with this name. But for whatever reason when a bean is created with createdFromAPI="true", it is registered all right but it is not found later by CXF. And if it is created with ".jaxws-client.proxyFactory" suffix it is registered and successfully found by CXF.
I stopped looking further. I found enough to make my application work. But I still wonder: how could I miss it in the beginning? Nice, intuitive, easily discoverable configuration. Unforgivable.
Wednesday, September 26, 2012
Friday, September 7, 2012
Assorted facts about JBoss. Fact 6: JBoss and CXF: match made in heaven.
Bored? Want to learn something new? Or waste some time? Need quality headache?
My advice: look at web services and try to make something just a bit more complicated than "hello world" under JBoss. You will never forget it.
JBoss 6 uses CXF under the hood to support WS-related sections of EJB specification. Both products have their strong points but JBoss is definitely the leader in the "most creative WTF" contest.
 
JBoss supports declarative configuration of both web service implementation and web service client. All you need is to create a JBoss specific deployment descriptor, a file called jbossws-cxf.xml, and package it with your application. The nice touch here is that this deployment descriptor is Spring-based. This alone makes JBoss leading contester.
It is not that I have anything against Spring. Well, I actually have a lot to say about it, but it deserves a separate post.
No, really, just think about it: first you come up with a way to configure the server using XML, supporting hundreds of different schemas, all based on the same XML configuration parser library. And then you say "screw it, I fancy Spring here".
They did not go as far as packaging Spring with JBoss. Why? Who knows. Having said A they might just well have said B. But no, they use classloading to detect at runtime if Spring is present. Only if Spring is there JBoss goes ahead and reads jbossws-cxf.xml. Wow, say good bye to all hard work you put into creating that jbossws-cxf.xml - it is not used anyway. But you do not know that yet.
After some googling and digging you realize that you need Spring. Sigh. But you do need that configuration from jbossws-cxf.xml! Being "standard obeying" you bite the bullet package Spring with your application. (Being lazy you just drop Spring into server/<server_name>/lib. You save yourself a lot of time but miss a lot of fun.) Redeploy and ... your precious jbossws-cxf.xml is not loaded.
Things are looking up: you are not bored any more. You spend some time double checking your deployment and redeploying it. Nope, still not loaded. If you study jboss logs carefully you might spot this small message Spring not available, skipping check for user provided jbossws-cxf.xml / cxf.xml configuration files. If you know your way about JBoss jmx console you go and check your application's classloader and try to load some Spring classes via it. They are loaded just fine. WTF?
Remember that runtime classloading check I mentioned above? Turns out it runs very early during application deployment, before the classloaders for the application are set up. As a result the check is using the classloader of the WS deployer coming from server/<server_name>/deployers/jbossws.deployer. Surprise!
It is getting more and more interesting. You have to have Spring and you can't package it with your application. Right, but you need to make this application work. Forget about standards and keeping things where they belong. Spring to server/<server_name>/lib, JBoss restart, wow! Congratulations, you made it.
I leave the question of the proper place for Spring jars as an exercise. Instead let's look at more fun that JBoss-CXF integration might bring.
This is how I first got involved in this matter: I had an EJB application that used JMS. For reasons not important here it was decided that the application should use Apache ActiveMQ. No problem, I tested it locally under JBoss 6.1, it looked OK, and so the application was deployed in an integration environment, together with ActiveMQ resource adapter. So far so good. Later more applications were added to that environment. One of them failed to start. There were some classloading errors involving Spring classes. The application was a war that internally used Spring so Spring was packaged into WEB-INF/lib and the classloading was properly configured (WEB-INF/lib jars first). I was asked to look into the problem because there was another Spring in classpath: from ActiveMQ which is using Spring internally.
Of course removing ActiveMQ RA solved the problem for that application. Changing classloading rules of the application (to parent first) "solved" the problem as well.
Digging further I realized what has happened. JBoss classloader architecture is notoriously broken so the Spring classes from ActiveMQ leak into every application. When this failing application (with the original configuration WEB-INF/lib jars first) was starting JBoss WS deployer noticed some WS annotations and went ahead with whatever it supposed to do, including looking for Spring and jbossws-cxf.xml. Spring was there leaked from ActiveMQ. (Remember, the classloader at this moment is the one from jbossws.deployer.) The application did not have jbossws-cxf.xml but the harm was done: some instances of Spring classes were created along the way and remained referenced from CXF objects.
The application startup continued and JBoss finally created the application specific classloader and the rest of the startup code was executed using this classloader. Along the way CXF was involved again and it noticed those Spring instances created before so it went ahead with more Spring dance. But this time all the Spring classes were coming from WEB-INF/lib with predictable result: ClassCastException.
Of course changing classloading configuration would fix the problem in this case. Spring classes would always come from ActiveMQ keeping CXF happy. Well, until some other application, RA, whatever, that has Spring packaged is deployed in the same server. And do not forget that it can be a different version of Spring jars. Care to resolve this mess?
Priceless! Never attribute to malice that which ... I am not sure. Sometimes I think somebody at RedHat came up with this brilliant business idea of how to make more money out of JBoss support ...
My advice: look at web services and try to make something just a bit more complicated than "hello world" under JBoss. You will never forget it.
JBoss 6 uses CXF under the hood to support WS-related sections of EJB specification. Both products have their strong points but JBoss is definitely the leader in the "most creative WTF" contest.
JBoss supports declarative configuration of both web service implementation and web service client. All you need is to create a JBoss specific deployment descriptor, a file called jbossws-cxf.xml, and package it with your application. The nice touch here is that this deployment descriptor is Spring-based. This alone makes JBoss leading contester.
It is not that I have anything against Spring. Well, I actually have a lot to say about it, but it deserves a separate post.
No, really, just think about it: first you come up with a way to configure the server using XML, supporting hundreds of different schemas, all based on the same XML configuration parser library. And then you say "screw it, I fancy Spring here".
They did not go as far as packaging Spring with JBoss. Why? Who knows. Having said A they might just well have said B. But no, they use classloading to detect at runtime if Spring is present. Only if Spring is there JBoss goes ahead and reads jbossws-cxf.xml. Wow, say good bye to all hard work you put into creating that jbossws-cxf.xml - it is not used anyway. But you do not know that yet.
After some googling and digging you realize that you need Spring. Sigh. But you do need that configuration from jbossws-cxf.xml! Being "standard obeying" you bite the bullet package Spring with your application. (Being lazy you just drop Spring into server/<server_name>/lib. You save yourself a lot of time but miss a lot of fun.) Redeploy and ... your precious jbossws-cxf.xml is not loaded.
Things are looking up: you are not bored any more. You spend some time double checking your deployment and redeploying it. Nope, still not loaded. If you study jboss logs carefully you might spot this small message Spring not available, skipping check for user provided jbossws-cxf.xml / cxf.xml configuration files. If you know your way about JBoss jmx console you go and check your application's classloader and try to load some Spring classes via it. They are loaded just fine. WTF?
Remember that runtime classloading check I mentioned above? Turns out it runs very early during application deployment, before the classloaders for the application are set up. As a result the check is using the classloader of the WS deployer coming from server/<server_name>/deployers/jbossws.deployer. Surprise!
It is getting more and more interesting. You have to have Spring and you can't package it with your application. Right, but you need to make this application work. Forget about standards and keeping things where they belong. Spring to server/<server_name>/lib, JBoss restart, wow! Congratulations, you made it.
I leave the question of the proper place for Spring jars as an exercise. Instead let's look at more fun that JBoss-CXF integration might bring.
This is how I first got involved in this matter: I had an EJB application that used JMS. For reasons not important here it was decided that the application should use Apache ActiveMQ. No problem, I tested it locally under JBoss 6.1, it looked OK, and so the application was deployed in an integration environment, together with ActiveMQ resource adapter. So far so good. Later more applications were added to that environment. One of them failed to start. There were some classloading errors involving Spring classes. The application was a war that internally used Spring so Spring was packaged into WEB-INF/lib and the classloading was properly configured (WEB-INF/lib jars first). I was asked to look into the problem because there was another Spring in classpath: from ActiveMQ which is using Spring internally.
Of course removing ActiveMQ RA solved the problem for that application. Changing classloading rules of the application (to parent first) "solved" the problem as well.
Digging further I realized what has happened. JBoss classloader architecture is notoriously broken so the Spring classes from ActiveMQ leak into every application. When this failing application (with the original configuration WEB-INF/lib jars first) was starting JBoss WS deployer noticed some WS annotations and went ahead with whatever it supposed to do, including looking for Spring and jbossws-cxf.xml. Spring was there leaked from ActiveMQ. (Remember, the classloader at this moment is the one from jbossws.deployer.) The application did not have jbossws-cxf.xml but the harm was done: some instances of Spring classes were created along the way and remained referenced from CXF objects.
The application startup continued and JBoss finally created the application specific classloader and the rest of the startup code was executed using this classloader. Along the way CXF was involved again and it noticed those Spring instances created before so it went ahead with more Spring dance. But this time all the Spring classes were coming from WEB-INF/lib with predictable result: ClassCastException.
Of course changing classloading configuration would fix the problem in this case. Spring classes would always come from ActiveMQ keeping CXF happy. Well, until some other application, RA, whatever, that has Spring packaged is deployed in the same server. And do not forget that it can be a different version of Spring jars. Care to resolve this mess?
Priceless! Never attribute to malice that which ... I am not sure. Sometimes I think somebody at RedHat came up with this brilliant business idea of how to make more money out of JBoss support ...
Friday, August 24, 2012
Assorted facts about JBoss. Fact 5: you want control over transaction in @Singleton lifecycle methods? Tough luck.
First, some theory, namely from EJB 3.1 specification (emphasis mine): 
Of course it all started with something much more complex. After a seemingly minor change our application started to give problems during JBoss shutdown. JBoss hanged for some time and then started to spew "transaction rolled back" exceptions.
Finding the change that had caused the problem was easy. Why was it a problem? Too much software is too clever for its own good nowadays. This time it was a piece of code (also from JBoss - ha-ha) that managed to hook itself into the ongoing container transaction. The "minor" change caused this software to try ending the ongoing transaction, which it had no rights to do.
Fixing the problem seemed easy: we did not really need the @PreDestroy method to be transactional so adding appropriate @TransactionAttribute to it looked like a solution. Of course it did not work otherwise I would not write this post.
Investigating the problem led to this simple test class:
Deploying it under JBoss we use (6.1.0.Final) and then undeploying it results in the following message in the server console:
Looks perfect, JBoss dutifully runs our @PreDestroy in a transaction. Let's add @TransactionAttribute(TransactionAttributeType.NOT_SUPPORTED) to the method and repeat the test. Result:
Barring ID difference, the output is exactly the same! Next step: move the annotation to the class itself. Finally I have got what I was expecting:
Apparently JBoss ignores @TransactionAttribute on a @PreDestroy method (and on a @PostConstruct - I verified that).
This behavior is also very clear to see from the stack trace. In case of annotation on the method or no annotation at all:
Annotation on the class:
Can happen. Who cares that this is clearly specified in the EJB 3.1 spec. And it does not matter that JBoss developers knew about this particular nuance of the EJB 3.1 spec.
There is a solution after all. Except for the case that we actually have: we need a transaction in @PostConstruct and at the same time we need "no transaction" in @PreDestroy. Oops. Of course nothing that can't be fixed but it still makes me think how many of such "small things" are around.
For some time already I try all my test things under JBoss 7 as well, using the latest and greatest JBoss 7.1.1.Final. After all, JBoss 6 is the thing from the past, or so RedHat says.
Can JBoss 7 handle this simple test case? Nope. Even the code that worked under JBoss 6 with @TransactionAttribute on the bean class does not work as expected under JBoss 7. It looks like there is no way to control transactions on @Singleton lifecycle methods in JBoss 7. There is no way: I was quite surprised by this so I checked the sources. There was some code that looked at annotations but it was replaced by hardcoded requiresNew somewhere during 7.0 beta. Way to go, guys. Who needs that EJB spec compliance anyway?
 
Section "4.3.4 Session Bean Life cycle Callback Interceptor Methods"The same is applicable for PostConstruct as well. And then more:
The following lifecycle event callbacks are supported for session beans. Lifecycle callback interceptor methods may be defined directly on the bean class
...
The PreDestroy lifecycle callback interceptor methods for singleton beans execute in a transaction context determined by the bean's transaction management type and any applicable transaction attribute.
Section "4.8.3 Transaction Semantics of Initialization and Destruction"Simple enough, is it not? But such "simple" things not working correctly can cost a lot of effort to find.
PostConstruct and PreDestroy methods of Singletons with container-managed transactions are transactional. From the bean developer's view there is no client of a PostConstruct or PreDestroy method.
A PostConstruct or PreDestroy method of a Singleton with container-managed transactions has transaction attribute REQUIRED, REQUIRES_NEW, or NOT_SUPPORTED (Required , RequiresNew, or NotSupported if the deployment descriptor is used to specify the transaction attribute).
Of course it all started with something much more complex. After a seemingly minor change our application started to give problems during JBoss shutdown. JBoss hanged for some time and then started to spew "transaction rolled back" exceptions.
Finding the change that had caused the problem was easy. Why was it a problem? Too much software is too clever for its own good nowadays. This time it was a piece of code (also from JBoss - ha-ha) that managed to hook itself into the ongoing container transaction. The "minor" change caused this software to try ending the ongoing transaction, which it had no rights to do.
Fixing the problem seemed easy: we did not really need the @PreDestroy method to be transactional so adding appropriate @TransactionAttribute to it looked like a solution. Of course it did not work otherwise I would not write this post.
Investigating the problem led to this simple test class:
package p1;
import javax.annotation.PostConstruct;
import javax.annotation.PreDestroy;
import javax.ejb.*;
import javax.naming.InitialContext;
import javax.transaction.TransactionManager;
@Singleton
@Startup
public class TeBean {
    @PostConstruct
    public void startIt() throws Exception {
        System.err.println("TeBean.startIt");
    }
    @PreDestroy
    public void stopIt() throws Exception {
        TransactionManager  tm = (TransactionManager)new InitialContext().lookup("java:/TransactionManager");
        if (tm != null) {
            System.err.println("TeBean.stopIt, transaction: [" + tm.getTransaction() + "] ");
        } else {
            System.err.println("TeBean.stopIt, no transaction manager");
        }
    }
}Deploying it under JBoss we use (6.1.0.Final) and then undeploying it results in the following message in the server console:
TeBean.stopIt, transaction: [TransactionImple < ac, BasicAction: 0:ffffc0a8016f:... status: ActionStatus.RUNNING >]
Looks perfect, JBoss dutifully runs our @PreDestroy in a transaction. Let's add @TransactionAttribute(TransactionAttributeType.NOT_SUPPORTED) to the method and repeat the test. Result:
TeBean.stopIt, transaction: [TransactionImple < ac, BasicAction: 0:ffffc0a8016f:... status: ActionStatus.RUNNING >]
Barring ID difference, the output is exactly the same! Next step: move the annotation to the class itself. Finally I have got what I was expecting:
TeBean.stopIt, transaction: [null]
Apparently JBoss ignores @TransactionAttribute on a @PreDestroy method (and on a @PostConstruct - I verified that).
This behavior is also very clear to see from the stack trace. In case of annotation on the method or no annotation at all:
TeBean.stopIt() line: 24 ... CMTTxInterceptor.invokeInOurTx(TransactionalInvocationContext, TransactionManager) line: 247 CMTTxInterceptor.requiresNew(TransactionalInvocationContext) line: 392 CMTTxInterceptor.invoke(TransactionalInvocationContext) line: 211
Annotation on the class:
TeBean.stopIt() line: 24 ... CMTTxInterceptor.invokeInNoTx(TransactionalInvocationContext) line: 234 CMTTxInterceptor.notSupported(TransactionalInvocationContext) line: 329 CMTTxInterceptor.invoke(TransactionalInvocationContext) line: 207
Can happen. Who cares that this is clearly specified in the EJB 3.1 spec. And it does not matter that JBoss developers knew about this particular nuance of the EJB 3.1 spec.
There is a solution after all. Except for the case that we actually have: we need a transaction in @PostConstruct and at the same time we need "no transaction" in @PreDestroy. Oops. Of course nothing that can't be fixed but it still makes me think how many of such "small things" are around.
For some time already I try all my test things under JBoss 7 as well, using the latest and greatest JBoss 7.1.1.Final. After all, JBoss 6 is the thing from the past, or so RedHat says.
Can JBoss 7 handle this simple test case? Nope. Even the code that worked under JBoss 6 with @TransactionAttribute on the bean class does not work as expected under JBoss 7. It looks like there is no way to control transactions on @Singleton lifecycle methods in JBoss 7. There is no way: I was quite surprised by this so I checked the sources. There was some code that looked at annotations but it was replaced by hardcoded requiresNew somewhere during 7.0 beta. Way to go, guys. Who needs that EJB spec compliance anyway?
Tuesday, July 17, 2012
How good intentions can grind SSL to a halt
We were running a load test of our application installed at the customer. The set up was quite complicated: several nodes running the application at the customer site and a single test server outside of the customer network struggling to send a lot of messages to the customer. The servers running there were in turn sending separate ack messages to our test server.
It also involved a firewall at the customer site, HTTPS with client authentication, custom key- and trust stores, XML digital signatures, Oracle database and Oracle AQ queues and more. And JBoss as the application server.
The setup was working all right but the customer has very strict performance requirements which we had to meet. Hence the tests.
After solving couple of obvious and not so obvious problems we reached quite a good throughput. Not what we wanted but reasonable. The problem was that when we tried to increase the load the hell broke loose.
The log files contained a large number of "send failed" messages with HTTP status code 403 and some HTML page clearly coming from the customer's firewall. The HTML was a generic "you are not allowed in" error message. The customer also reported seeing a log of "read timeout" messages in the firewall's logs.
The firewall was not really a dummy thing in our setup. It also handled SSL at the customer side. Our test server sent HTTPS messages to the firewall which did authentication magic and then passed the request to one of the nodes using HTTP. On the way back the application sent the ack messages as HTTP requests to the firewall which then initiated HTTPS to our test server.
403 errors happened both on our test server and on the servers at the customer, but the pattern was different. Under a moderate load the errors were happening at the customer side, not many, comparing to the number of messages. And first errors came quite quickly. There were no errors on the test server.
Increasing the load did not change the situation until we reached some threshold. The situation at the customer side did not change, but we immediately started to see 403 messages on our test server. Much, much more than at the customer side.
How such a problem can be analyzed? There are probably several ways but I describe what I did.
The best way is of course to patch wss4j. I do not have a problem with patching but unfortunately it is not a solution in our case. In many cases we can't control the environment. Quite often our application is installed into a "standard" jboss environment of a client. Normally our application is the only application in this environment but we still allowed doing only some configuration changes like JAVA_OPTS and such.
Another possibility is to make sure bouncycastle is loaded explicitly for example via jre/lib/security/java.security. But again, it is not always allowed. In addition it just does not feel right: that jdk is not used only by our application and bouncycastle is definitely not present by default in classpath.
One can repackage bouncycastle to remove class org.bouncycastle.jce.provider.BouncyCastleProvider. Or try and mess with classloading preventing this class to be loaded. But these solutions look even less attractive.
SecurityManager can help as well because adding a provider is hooked into it, but using a SecurityManager brings its own share of nuances. For starters it requires understanding SecurityManager. Good luck with it :-)
WSSConfig class has a possibility to disable this stupid functionality so going this way can also solve the problem. The only challenge here is to do it soon enough before WSSConfig manages to screw up things. The best way is building a separate jar with some JBoss deployment descriptor so that the code is triggered during JBoss startup.
To continue testing I have added some code to our application that does two things. First it disables WSSConfig's automatic provider loading. And then it checks if bouncycastle is already loaded and which DH keypair generator is used by default. If I get bouncycastle's keypair generator I just move bouncycastle to the end of the list. Yes I know I am changing the environment I have no right to touch :-) but we need to continue testing. Later I probably just move this code into a separate JBoss specific jar.
And for the authors of this wonderful library, wss4j ... Guys, there is still room for improvement. Maybe my hard drive is not partitioned optimally? Or how about patching KDE2 under FreeBSD? There are so many unsolved problems around. Do not let common sense stop you.
It also involved a firewall at the customer site, HTTPS with client authentication, custom key- and trust stores, XML digital signatures, Oracle database and Oracle AQ queues and more. And JBoss as the application server.
The setup was working all right but the customer has very strict performance requirements which we had to meet. Hence the tests.
After solving couple of obvious and not so obvious problems we reached quite a good throughput. Not what we wanted but reasonable. The problem was that when we tried to increase the load the hell broke loose.
The log files contained a large number of "send failed" messages with HTTP status code 403 and some HTML page clearly coming from the customer's firewall. The HTML was a generic "you are not allowed in" error message. The customer also reported seeing a log of "read timeout" messages in the firewall's logs.
The firewall was not really a dummy thing in our setup. It also handled SSL at the customer side. Our test server sent HTTPS messages to the firewall which did authentication magic and then passed the request to one of the nodes using HTTP. On the way back the application sent the ack messages as HTTP requests to the firewall which then initiated HTTPS to our test server.
403 errors happened both on our test server and on the servers at the customer, but the pattern was different. Under a moderate load the errors were happening at the customer side, not many, comparing to the number of messages. And first errors came quite quickly. There were no errors on the test server.
Increasing the load did not change the situation until we reached some threshold. The situation at the customer side did not change, but we immediately started to see 403 messages on our test server. Much, much more than at the customer side.
How such a problem can be analyzed? There are probably several ways but I describe what I did.
- Switch on SSL logging on our test server: -Djavax.net.debug=ssl and rerun the tests.
 
 
- Analyze the logs. The logs were already quite large and SSL debug setting added a lot more info. And the test server was both sending and receiving HTTPS messages. So analyzing the logs took some time. But eventually I started to see a pattern.
 <timestamp> INFO [STDOUT] (<thread name>) *** ClientHello, Unknown-3.3 <timestamp> INFO [STDOUT] (<thread name>) *** ... <timestamp + [10-20-30sec]> INFO [STDOUT] (<thread name>) INFO %% Created: [Session-101, TLS_DHE_RSA_WITH_AES_128_CBC_SHA] <timestamp + [20-30sec]> INFO [STDOUT] (<thread name>) *** ServerHello, TLSv1 ... <timestamp + [20-30sec]> INFO [STDOUT] (<thread name>) *** ServerHelloDone <timestamp + [20-30sec]> INFO [STDOUT] (<thread name>) WRITE: TLSv1 Alert, length = 2 <timestamp + [20-30sec]> INFO [STDOUT] (<thread name>) Exception sending alert: java.net.SocketException: Broken pipe So the server thread receives a ClientHello SSL message and then does something for dozens of seconds before trying to send ServerHello message. And then is it too late: the sending side has closed the connection.
 
 
- When the same thread was again involved, the log was different:
 <timestamp> INFO [STDOUT] (<thread name>) *** ClientHello, Unknown-3.3 <timestamp> INFO [STDOUT] (<thread name>) *** <timestamp> INFO [STDOUT] (<thread name>) %% Resuming [Session-78, TLS_DHE_RSA_WITH_AES_128_CBC_SHA] <timestamp> INFO [STDOUT] (<thread name>) *** ServerHello, TLSv1 … <timestamp + [200-300ms]> INFO [STDOUT] (<thread name>) *** <timestamp + [200-300ms]> INFO [STDOUT] (<thread name>) *** READ: TLSv1 Application Data, length = 208 ... The delay was happening only if a new SSL session was created. It did not happen for all the created sessions though. In some cases there was some delay between ClientHello and ServerHello but not that large: well under 10 seconds.
 
 
- Googling did not help. The most messages describe "reverse DNS lookup" problem which I believed had nothing to do with our problem. Just in case I tried to verify this with a simple test application which did not experienced any delays doing reverse DNS lookup, both for known and unknown addresses.
 
 
- Everything I have seen so far could match 403s in the log at the customer side. The problem happened only if a new SSL session was created which did not happen that often. This explained why there were not that many 403s. And the reason for the errors starting to appear almost immediately was also clear: more SSL sessions had to be created to cope with more connection attempts from the customer to our test server.
 
 
- The question remained: what was causing the delays. Since I am lazy I did not want waste my time on running some network traffic analysers. Good move in hindsight.
 
 Simple HTTPS tests in my development environment were OK. Reproducing the complete setup locally was out of the question. And debugging on the server ... well, I tried it, but it was extremely slow most of the time. SSH tunnel, VM server, whatever.
 
 Instead I have started the tests again and fired jstack -l <jboss_pid> in the loop redirecting the output to individual files.
 
 As soon I noticed 'broken pipe' messages in the log I stopped the test, found the thread name which logged 'broken pipe' and looked at the output of jstack to see what the thread was doing between outputting ClientHello and ServerHello. Not much, apparently:
 "<thread-name>" daemon prio=10 tid=0x00002aab10d17000 nid=0x354b waiting for monitor entry [0x0000000053617000] java.lang.Thread.State: BLOCKED (on object monitor) at java.security.SecureRandom.nextBytes(SecureRandom.java:433) - waiting to lock <0x00002aaacfc89bf0> (a java.security.SecureRandom) at java.math.BigInteger.randomBits(BigInteger.java:479) ... at org.bouncycastle.crypto.generators.DHParametersHelper.generateSafePrimes(Unknown Source) at org.bouncycastle.crypto.generators.DHParametersGenerator.generateParameters(Unknown Source) at org.bouncycastle.jce.provider.JDKKeyPairGenerator$DH.generateKeyPair(Unknown Source) at com.sun.net.ssl.internal.ssl.DHCrypt.<init>(DHCrypt.java:76) ... at com.sun.net.ssl.internal.ssl.ServerHandshaker.clientHello(ServerHandshaker.java:425) This "waiting to lock <0x00002aaacfc89bf0>" was in almost every jstack output, and then multiple times. There was of course one thread holding the lock, and the thread was also processing ClientHello and generating DH (Diffie-Hellman) key pair.
 
 
- OK, I found the bottleneck but what's next? One thing that bothered me was that bouncycastle in the stack trace, but first I concentrated on the BigInteger generation. Unfortunately, because I lost some time on that, on reading about /dev/random, /dev/urandom, and various post about the default java's SecureRandom being not that fast. But trying to switch to something else involved quite some work. 
 
 
- I decided to look at bouncycastle. As it happened the application had bouncycastle packaged because it used some of the classes but we never needed it to be a security provider. But the fact that is was there in the stack trace clearly proved that it was installed as a security provider. Fortunately the tests we were doing did not use the functionality that relied on bouncycastle so I just deleted the jars and executed the tests again.
 
 I actually ran the tests several times, with and without bouncycastle. Every run with bouncycastle had some 'broken pipe' messages in the log. Every run without it had none. And there were no delays between ClientHello and ServerHello messages.
 
 
- Well, is bouncycastle that slow? I was not sure but I doubted. So more googling, looking at the sources of bouncycastle and the cryptoproviders that come with JDK provided the explanation.
 
 When the SSL server code needs to generate a "Diffie-Hellman" key pair for a new session, it knows only the required key strength. It asks java for a key pair generator and uses it to generate a key pair of the required strength. To generate the key pair some DH parameters (this includes large random prime numbers) are needed. A newly created bouncycastle key pair generator generates parameters first time it is used. The SSL code never reuses key pair generators.
 
 Result: bouncycastle has to generate large random prime numbers for each new key pair. The default JDK DH code generates DH parameters only once and then caches them so every newly created key pair generator just reuses what is there!
 
 Interestingly the latest version of bouncycastle (1.47) has added some caching but is it per thread and if I am not mistaken DH parameters have to be cached explicitly.
 
 
- I verified that my findings were correct with a small test application that generated a DH key pair using a default crypto provider and bouncycastle. 4 threads doing it in parallel with the default provider took less than 100 ms and with bouncycastle more than a minute.
 
 
- OK, the delays are explained. But why bouncycastle ended up being registered as a security provider? For my test application I had to explicitly add it with Security.addProvider(). And even then it was not used if I just asked for a DH keypair generator, it was only used when I explicitly asked for a DH keypair generator from bouncycastle.
 
 And it is not mentioned anywhere in JDK/JRE or JBoss startup or deploy files. It means that there is somewhere some piece of code that not only loads bouncycastle provider but also registers it before at least some of the JDK's default providers.
 
 Another round of tests, this time with -Djava.security.debug=provider. This produces enough information to understand when a provider is loaded. The log output clearly shows that bouncycastle is not there during startup of JBoss or after the startup completed. But almost immediately after I started to send test data some output about bouncycastle was added to the log.
 
 Unfortunately java.security.debug does not help with the most important question: what component/jar/whatever is loading a provider.
 
 
-  Last step: I modified BouncyCastleProvider so that it now prints a stack trace to the log. Another test run and we have the winner:
 java.lang.Exception: XXYYZZ at org.bouncycastle.jce.provider.BouncyCastleProvider.<init>(BouncyCastleProvider.java:78) at sun.reflect.xxx ... at java.lang.Class.newInstance(Class.java:308) at org.apache.ws.security.WSSConfig.loadProvider(WSSConfig.java:569) ... at org.apache.ws.security.WSSConfig.<init>(WSSConfig.java:304) ... at org.apache.ws.security.message.WSSecTimestamp.<init>This class comes from wss4j. And what a beauty it is! You can find the full source code on the internet, for example here.
 
 The important parts are:
 /** * a static boolean flag that determines whether default JCE providers * should be added at the time of construction. * * These providers, and the order in which they are added, can interfere * with some JVMs (such as IBMs). */ private static boolean addJceProviders = true; /** * Set the value of the internal addJceProviders flag. This flag * turns on (or off) automatic registration of known JCE providers * that provide necessary cryptographic algorithms for use with WSS4J. * By default, this flag is true, for backwards compatibility. You may * wish (or need) to initialize the JCE manually, e.g., in some JVMs. */ public static void setAddJceProviders(boolean value) { addJceProviders = value; } private synchronized void staticInit() { ... if (addJceProviders) { addJceProvider("BC", "org.bouncycastle.jce.provider.BouncyCastleProvider"); } ... } ... private boolean loadProvider(String id, String className) { ... // Install the provider after the SUN provider (see WSS-99) // Otherwise fall back to the old behaviour of inserting // the provider in position 2. For AIX, install it after // the IBMJCE provider. // int ret = 0; for (int i = 0; i < provs.length; i++) { if ("SUN".equals(provs[i].getName()) || "IBMJCE".equals(provs[i].getName())) { ret = java.security.Security.insertProviderAt( (java.security.Provider) c.newInstance(), i + 2); break; } } if (ret == 0) { ret = java.security.Security.insertProviderAt( (java.security.Provider) c.newInstance(), 2); } ... }
 Wow! Not only these ... alternatively thinking ... individuals mess the environment they have no business to touch, they do it by default so if I do not want it I have to opt-out. They also knew that whatever they were doing can cause problems. They were at least once bitten by this code (WSS-99). But the code is still there. Some people just never learn. Unfortunately they lay rakes and let other people step on them.
 
 And the reason? It is there, in WSS-99: so that they can get to the strong encryption algorithms on some JDKs. Road to hell is paved with good intentions.
 
 Newsflash, guys: whatever you are doing is wrong. You have no business altering the runtime environment like that. Your jar is just a small piece of big pile of code running on a server. And the problem you just so creatively "solved" could have been solved in a much less obtrusive way, and only for those environments where the problem really existed.
 
 But no, "we know better" kicks in again.
 
 
The best way is of course to patch wss4j. I do not have a problem with patching but unfortunately it is not a solution in our case. In many cases we can't control the environment. Quite often our application is installed into a "standard" jboss environment of a client. Normally our application is the only application in this environment but we still allowed doing only some configuration changes like JAVA_OPTS and such.
Another possibility is to make sure bouncycastle is loaded explicitly for example via jre/lib/security/java.security. But again, it is not always allowed. In addition it just does not feel right: that jdk is not used only by our application and bouncycastle is definitely not present by default in classpath.
One can repackage bouncycastle to remove class org.bouncycastle.jce.provider.BouncyCastleProvider. Or try and mess with classloading preventing this class to be loaded. But these solutions look even less attractive.
SecurityManager can help as well because adding a provider is hooked into it, but using a SecurityManager brings its own share of nuances. For starters it requires understanding SecurityManager. Good luck with it :-)
WSSConfig class has a possibility to disable this stupid functionality so going this way can also solve the problem. The only challenge here is to do it soon enough before WSSConfig manages to screw up things. The best way is building a separate jar with some JBoss deployment descriptor so that the code is triggered during JBoss startup.
To continue testing I have added some code to our application that does two things. First it disables WSSConfig's automatic provider loading. And then it checks if bouncycastle is already loaded and which DH keypair generator is used by default. If I get bouncycastle's keypair generator I just move bouncycastle to the end of the list. Yes I know I am changing the environment I have no right to touch :-) but we need to continue testing. Later I probably just move this code into a separate JBoss specific jar.
And for the authors of this wonderful library, wss4j ... Guys, there is still room for improvement. Maybe my hard drive is not partitioned optimally? Or how about patching KDE2 under FreeBSD? There are so many unsolved problems around. Do not let common sense stop you.
Tuesday, February 28, 2012
Do Repeat Yourself, powered by OSB
I am up to my ears in OSB now. Unfortunately I can't at the moment follow my own advice. So I am going to enjoy this fine product as much as can.
Let's start with the DRY principle and how OSB supports it. DRY is important, no doubt. Working with a complex product I expect that I can easily implement something in one place and then just reuse it in several places.
Being an enterprise product OSB is no exception here. But looking at the "do not repeat yourself" possibilities provided by OSB I wonder what has happened to people who had added these possibilities to the product.
My guess so far is that the functionality available now was released by mistake. After a release it was too late to remove or disable it. People who have added this functionality were probably fired on the spot. And since then adding anything that actually increases reuse within OSB projects was a big no-no. I am almost certain that somebody working on the next version of the product is contemplating the possibility to disable copy-paste.
In OSB there are like two or three places where you as a developer (designer? mouse-driven integration architect?) can reuse some other component or piece of code. And even these possibilities are crippled. Like the notorious OSB XQuery support (emphasis mine).
But OK, let's talk about a specific example, like validating an XML message against an XML schema definition. Unlike other, no so enterpricey, products, OSB is "XML- or nothing- inside" technology. All messages are converted to XML when they received, they stay XML as they pass through OSB and only when a message leaves OSB it might be converted to something else. Quite a logical requirement that it is sometimes necessary to validate these XML messages against some XML schema.
For example, I am building an OSB proxy service that is accessible as a SOAP webservice, and I want to "XML schema" validate incoming messages. OSB provides such a possibility with a nice "Validate" message processing stage action. (Too many words? Do not worry, it is just some diagram element in the design palette that I can insert somewhere in the message flow of the service I am building.) I place it on the message flow and go about configuring it (more mousing around): specify what part of the message I want to validate and the schema element or type to validate against.
This was easy. Except... What is this story about selecting a schema's type or element? I guess somebody had to think hard to come up with such a way to validate XML against a schema. I have to select not only a schema but also an element or a type within that schema! (And do not get me started on the fact that the schema has to be present somewhere within the project. I can't specify URI, URL, reference to classpath or whatever.)
Remember DRY? Now imagine a webservice with 10 or 20 operations defined in its WSDL. As a matter of fact WSDL adds some complexity of its own here, like multipart messages. But let's keep things simple: these WSDLs all use messages with a single part. And now imagine you need to deal with 5 to 10 such webservices.
This can't be true I thought! Let's google.... Hmmm... Uguh... Aha, this way... I see.... And here?... So, that's it?... No, here is another one... Guys, can I have a different Google?
The search was not really useful except for one post. But it confirmed my suspicion. Here is the list of possibilities to deal with XML schema validation in OSB, both googled and the ones I came up with:
Let's start with the DRY principle and how OSB supports it. DRY is important, no doubt. Working with a complex product I expect that I can easily implement something in one place and then just reuse it in several places.
Being an enterprise product OSB is no exception here. But looking at the "do not repeat yourself" possibilities provided by OSB I wonder what has happened to people who had added these possibilities to the product.
My guess so far is that the functionality available now was released by mistake. After a release it was too late to remove or disable it. People who have added this functionality were probably fired on the spot. And since then adding anything that actually increases reuse within OSB projects was a big no-no. I am almost certain that somebody working on the next version of the product is contemplating the possibility to disable copy-paste.
In OSB there are like two or three places where you as a developer (designer? mouse-driven integration architect?) can reuse some other component or piece of code. And even these possibilities are crippled. Like the notorious OSB XQuery support (emphasis mine).
The Oracle XQuery engine fully supports all of the language features that are described in the World Wide Web (W3C) specification for XQuery with one exception: modules. For more information about ...
But OK, let's talk about a specific example, like validating an XML message against an XML schema definition. Unlike other, no so enterpricey, products, OSB is "XML- or nothing- inside" technology. All messages are converted to XML when they received, they stay XML as they pass through OSB and only when a message leaves OSB it might be converted to something else. Quite a logical requirement that it is sometimes necessary to validate these XML messages against some XML schema.
For example, I am building an OSB proxy service that is accessible as a SOAP webservice, and I want to "XML schema" validate incoming messages. OSB provides such a possibility with a nice "Validate" message processing stage action. (Too many words? Do not worry, it is just some diagram element in the design palette that I can insert somewhere in the message flow of the service I am building.) I place it on the message flow and go about configuring it (more mousing around): specify what part of the message I want to validate and the schema element or type to validate against.
This was easy. Except... What is this story about selecting a schema's type or element? I guess somebody had to think hard to come up with such a way to validate XML against a schema. I have to select not only a schema but also an element or a type within that schema! (And do not get me started on the fact that the schema has to be present somewhere within the project. I can't specify URI, URL, reference to classpath or whatever.)
Remember DRY? Now imagine a webservice with 10 or 20 operations defined in its WSDL. As a matter of fact WSDL adds some complexity of its own here, like multipart messages. But let's keep things simple: these WSDLs all use messages with a single part. And now imagine you need to deal with 5 to 10 such webservices.
This can't be true I thought! Let's google.... Hmmm... Uguh... Aha, this way... I see.... And here?... So, that's it?... No, here is another one... Guys, can I have a different Google?
The search was not really useful except for one post. But it confirmed my suspicion. Here is the list of possibilities to deal with XML schema validation in OSB, both googled and the ones I came up with:
- Pretty obvious: ditch OSB in favor of something usable. Unfortunately won't work in most situations.
- Delegate: find some junior developer and ask him to do it. But it is cruel. This poor soul does not deserve such a torture.
-  Much better delegation: find the person or persons responsible for choosing this product in the first place and let them do it. Ah dreams....
 If you lucky this might bring discussion back to point 1 above. If you are not... Either way you don't have to work with OSB anymore.
 Unfortunately this is not a realistic scenario either.
- Fight against the requirement of XML schema validation. Explain. Demonstrate. Say Oracle is aware of the problem so when you migrate once more (like to version 50g) this functionality will be there for you to use.
- Follow a Scrum master. Redefine the definition of XML validation. Be creative. If you are lucky the requirements do not say explicitly "validate XML messages against XML schema". Maybe it's just "validate". Most likely you will have to perform some business validation anyway. Sheer by mistake OSB allows some degree of freedom here. Instead of having to write N XQuery files, drag-and-drop one "case" box and then another N expression boxes you can get away with at most (N + 1) XQuery files and 1 expression box. What a relief.
- Be creative. Ever heard about void*? java.lang.Object? xsd:any is your friend! Take your schema, copy it, change the copy to replace bulk of the existing definitions with xsd:any here and there and use this schema for validation. Nobody will notice anyway.
- Fight against the requirement of XML schema validation. Explain. Demonstrate. I know I have this one already above. But still if none of the above solutions worked try this one again.
- OK, you do not have a choice. XML schema validation is here and you have to implement it. Bite the bullet. If it is a single webservice with 10 or even 20 operations it is not a big deal to do it once. I mean, other options will cost more.
- You are lazy. This is good, really. What options are left? Again, delegate. But this time delegate to some non-OSB code. There are several possibilities: OSB java callout functionality, xpath external functions, even external web services. All these solutions have their own set of challenges which you would need to solve. And this will cost you. Time to build and test it, runtime performance penalty, whatever. I did not go that far yet. But it looks like I have got myself couple of volunteers who might just go all the way. We'll see what happens.
-  Working with OSB makes you think. OSB actually encourages you to think. Or drink a lot of coffee. All that time it takes to start IDE or the OSB server, to stop the OSB server, and especially to do something in IDE.
 I am sorry I can't drink that much. So I ended up thinking. I'll probably switch jobs. But while I am still here I am going to try something. One never knows...
Friday, February 24, 2012
Oracle Service Bus, first impressions
Professionally I consider myself a lucky person. Quite often when I start working on a new (to me) project I get a chance to learn something new: people, techniques, a product, or some technology.
Since beginning of December 2011 I joined a team that is using quite a lot of different Oracle products. One of the products is Oracle Service Bus formerly known as AquaLogic Service Bus. I have not got a chance to work with it before so I took the opportunity to look at it closely.
Oracle Service Bus is really a shiny product that perfectly fits Oracle's SOA strategy and blah-blah-blah standards-based blah-blah-blah mission critical blah-blah-blah and blah-blah-blah. Blah-blah.
Hmmm, does anybody really believe this?!
When I was younger I was naive. Working with something I often thought: "This is baaad. I can't wait to finish this and move on. Next project will be better. Nothing can be worse than this".
Those times are long gone. I know that no matter how strange or convoluted the current project is the next most likely will be no better. The only thing left is a bit of curiosity: one never knows what kind of SNAFU du jour one comes into. But looking at OSB old memories come back. Can anything actually be worse than this?
I started writing down my impressions but researching something on OSB I came across this post. If you ever get a chance to use OSB go read it and be prepared.
The post reflects my experience with OSB and it has more information than I collected so far. Moreover the author expresses itself much more politely than I wanted to.
I am not sure if everything in that post is up to date. After all the post is like a year and a half old, so things might have improved. But do not hold your breath.
There are also some things I have experienced but they are not mentioned in the referenced post. The most amusing one is lack of undo in IDE (Eclipse with a lot of Oracle's plugins). Come on Oracle, it is 2012. No undo? Seriously, how much worse it can get? You can only do mouse-driven development in OSB. You do not want even see their XML, trust me, let alone edit it by hand. And it is really easy to mess something up with a wrong mouse move. Even changing some property or expression has no possibility of undo.
My advice? Maximize your return on investment, stay away from Oracle Service Bus.
And if you are not convinced, read on.
Since beginning of December 2011 I joined a team that is using quite a lot of different Oracle products. One of the products is Oracle Service Bus formerly known as AquaLogic Service Bus. I have not got a chance to work with it before so I took the opportunity to look at it closely.
Oracle Service Bus is really a shiny product that perfectly fits Oracle's SOA strategy and blah-blah-blah standards-based blah-blah-blah mission critical blah-blah-blah and blah-blah-blah. Blah-blah.
Hmmm, does anybody really believe this?!
When I was younger I was naive. Working with something I often thought: "This is baaad. I can't wait to finish this and move on. Next project will be better. Nothing can be worse than this".
Those times are long gone. I know that no matter how strange or convoluted the current project is the next most likely will be no better. The only thing left is a bit of curiosity: one never knows what kind of SNAFU du jour one comes into. But looking at OSB old memories come back. Can anything actually be worse than this?
I started writing down my impressions but researching something on OSB I came across this post. If you ever get a chance to use OSB go read it and be prepared.
The post reflects my experience with OSB and it has more information than I collected so far. Moreover the author expresses itself much more politely than I wanted to.
I am not sure if everything in that post is up to date. After all the post is like a year and a half old, so things might have improved. But do not hold your breath.
There are also some things I have experienced but they are not mentioned in the referenced post. The most amusing one is lack of undo in IDE (Eclipse with a lot of Oracle's plugins). Come on Oracle, it is 2012. No undo? Seriously, how much worse it can get? You can only do mouse-driven development in OSB. You do not want even see their XML, trust me, let alone edit it by hand. And it is really easy to mess something up with a wrong mouse move. Even changing some property or expression has no possibility of undo.
My advice? Maximize your return on investment, stay away from Oracle Service Bus.
And if you are not convinced, read on.
Subscribe to:
Comments (Atom)
 
