Tuesday, July 27, 2010

Deploying JBPM Enterprise in OC4J, part 2: OC4J all over the place

To continue where I left off:

1. Before deploying in OC4J I decided to get rid of the semi-manual repackaging: I suspected I would need to do it a lot because of the OC4J specific deployment descriptor. I succeeded. It is not an interesting subject, it is not the topic of this post, and I probably would not make many friends in the maven or m2eclipse communities if I said aloud what I think, so I will say little: it is definitely not the primary task for maven.

2. Honestly, I do not know how many more times I had to change ejb-jar.xml before OC4J stopped complaining about JMS and started complaining about other things. Even more: some of those changes broke unit tests so I had to make even more changes to compensate for that. Eventually OC4J gave up: no more JMS errors during deployment.

3. Now I have TopLink errors (TopLink classes are not on the classpath). Remember I said the project was using Hibernate native API? The team had some problems making it work in OC4J with TopLink enabled, so they switched TopLink off for their application. I do not know how real the problems were, but this was quite a reasonable thing to do: your application is not using JPA, so you do not need support libraries for that, so you instruct OC4J to remove them from classpath.

But it turns out that OC4J is using TopLink for its entity bean support. And jbpm-enterprise has an entity bean. WTF? You are using Hibernate for all your persistence, "the best ORM in the world" (YMMV), and all of a sudden you need an entity bean?! OK, I had seen that bean, the code, and the deployment descriptor entries while I was busy with JMS configuration, so I knew it was there, and it did not give any trouble in openejb, so I did not worry about it...

Payback time. And more things to learn: I have never used the EJB timer service before, and the sole purpose of this entity bean is to work with the EJB timer service. It took me some reading to understand why jbpm-enterprise has two beans (an SLSB and this entity bean) to work with the EJB timer service, why the SLSB is deprecated, and what the reason for that deprecation is. I should say it is a clever trick, completely spec compliant and that I would probably never think of something like this. To introduce an entity bean, the only entity bean in the whole project, as a solution to a performance problem... Of course, if it is the only solution I would not say a word.

But within 5 minutes after I understood the reason why the entity bean is there I came up with another solution to the same problem that did not require any entity beans. Theoretically that should work; I did not have time to try it, but my guess is I will get a chance in the near future. Anyway, at that moment the application did not use jbpm timers, so I went back to the deployment descriptor and now I had one bean less to deploy. Full build cycle, no unit test failures, deploy, no errors, success.

4. Almost all the time while working on this jbpm-enterprise I was thinking: drop it, it will not work. Do you ever get a feeling "one last step, and you are there"? Probably because of this feeling I carried on. And the feeling paid off, except I did not believe it. Successful deployment does not mean no errors at run time.

So I fired up a manual test scenario and got a big hello from OC4J JMS Database Persistence again. Very big. Stacktraces dozens of lines long. The exception was coming from JMS session.close() in org.jbpm.msg.jms.JmsMessageService. It said basically "rollback is not allowed in this context". Huh? Sequel "WTF comes back and strikes again". It is the same WTF I have mentioned before: javax.jms.Connection.createSession() parameters, nicely commented in class org.jbpm.msg.jms.JmsMessageService:

/*
 * If the connection supports XA, the session will always take part in the global transaction.
 * Otherwise the first parameter specifies whether message productions and consumptions
 * are part of a single transaction (TRUE) or performed immediately (FALSE).
 * Messages are never meant to be received before the database transaction commits,
 * hence the transacted is preferable.
 */
session = connection.createSession(true, Session.SESSION_TRANSACTED);


When I was running my test application with the JMS Database Persistence implementation, I ignored all the exceptions from xyz.close() calls. Well, it was a testing application, so why bother. Big mistake. I revisited my test application and put some logging in the catch blocks, and sure enough, closing a session created with createSession(true, ...) would throw the same exception while closing a session created with createSession(false, ...) would not.

The strange thing: even if the exception was thrown, everything would work without problems if the exception was just ignored. Messages were sent and successfully received, or not, depending on what was going on in the sending method (nothing/EJBContext.setRollbackOnly()/throw (un-)checked exception).

5. I went as far as looking at the Oracle classes in a class disassembler. And sure enough, closing a transacted session would trigger a rollback. This was also the explanation why having a transacted session with In-Memory JMS resulted in messages being lost: a rollback attempt. Except that with In-Memory JMS session.close() would actually roll back the JMS transaction and succeed, while with JMS Database Persistence the rollback would fail, and if that exception is caught, the transaction would be committed by the container. But the exception is reported and rethrown in jbpm-enterprise.

6. This was the last straw. I have actually built a modified jbpm-enterprise with changed parameters to createSession(), and satisfied my feeling of 'being there'. Everything worked. And then I sent an email to the team members saying "forget it, it is just not worth the trouble".
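
Looking back at step 4, the mistake of silently ignoring exceptions from close() calls is cheap to avoid. Below is a minimal sketch, not code from jbpm or from my test application: CloseLogging, Closer and closeAndReport are hypothetical names, and the list-based log stands in for whatever logging the application uses. With JMS 1.1 one would pass session::close as the Closer.

```java
import java.util.ArrayList;
import java.util.List;

final class CloseLogging {

    /** Hypothetical stand-in for anything with a close() that may throw, e.g. session::close. */
    interface Closer {
        void close() throws Exception;
    }

    /**
     * Calls close() and records a failure instead of swallowing it.
     * Returns the exception (or null on success) so the caller decides whether to rethrow.
     */
    static Exception closeAndReport(String label, Closer closer, List<String> log) {
        try {
            closer.close();
            return null;
        } catch (Exception e) {
            log.add(label + ".close() failed: " + e.getMessage());
            return e;
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        // Simulated failing close(), using the message OC4J reported in step 4.
        closeAndReport("session", () -> {
            throw new IllegalStateException("rollback is not allowed in this context");
        }, log);
        // Simulated clean close(): nothing is logged.
        closeAndReport("connection", () -> { }, log);
        log.forEach(System.out::println);
    }
}
```

Returning the exception rather than rethrowing it keeps the choice between "log and continue" and "log and fail" with the caller, which is exactly where my test application and jbpm-enterprise differed.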

Deploying JBPM Enterprise in OC4J, part 1: not yet OC4J specific

I just can't get away from this OC4J and JBPM. After I had solved the problem with JMS transactions the team asked me to do another "small" thing: include jbpm-enterprise.jar in the project instead of copy-pasted code. The idea is nice and sound of course: after all, the only reason for that copy-pasted code was the desire to get the project going in a short amount of time and to avoid that JMS transaction problem... Well, let's say I believed them.

After spending quite some time on it, first with another colleague and then on my own, I came to the conclusion: forget it, it is just not worth the trouble. (This is applicable to JBPM 3.3.1 and OC4J 10.1.3; the situation might be better with newer versions.)

This post is a bit more practical than the previous one because there might be somebody else out there trying to get it working in some other app server. Other than JBoss that is.

Here is the list of things I came across trying to make that work. Some of them are generic and some are OC4J specific, and the advice "forget it" is applicable to OC4J. Other app servers might be a bit easier as targets.

1. Of course, the first thing was to get rid of copy-paste code, add jbpm-enterprise.jar as a maven dependency, change jbpm.cfg.xml to point to relevant classes from jbpm-enterprise, and fire 'mvn clean install'. Well, that was easy. It compiles? Wow, we are almost there... Or not, because unit tests fail. Openejb does not like jbpm-enterprise, namely, its MDBs (there are two of them, JobListenerBean that extends CommandListenerBean). The error message is pretty clear:

When annotating a bean class as @MessageDriven without declaring messageListenerInterface, the bean must implement exactly one interface, no more and no less. beanClass=org.jbpm.ejb.impl.JobListenerBean

Yeah, sure, 'CommandListenerBean implements MessageListener' and 'JobListenerBean extends CommandListenerBean'. Except that the beans do not use any annotations. jbpm-enterprise uses a deployment descriptor instead. Annoying... the error message could have been better. Apparently, openejb does not distinguish (at least here) between annotations and deployment descriptor because changing the descriptor was enough to fix the problem (adding javax.jms.MessageListener to both beans). But this means that jbpm-enterprise can't be used as is and has to be repackaged. Not a big problem: we would need to repackage it anyway because it needs an OC4J specific deployment descriptor. The "right" solution is ... would be ... found later, we are just not in the mood for all this maven stuff now, let alone licensing. For now: repackage it manually, and place it back into the local maven repository. At least we can continue.

2. Tests still fail, except this time with javax.naming.NamingException. All the joys of server independent code, configuration in deployment descriptors, etc, etc, and so on were destroyed with one line:

String connectionFactoryJndiName = "java:JmsXA";

Nice, is it not? JBoss specific JNDI names, hardcoded in java code. Those guys know a thing or two about portable applications, don't they? OK, changing a deployment descriptor is one thing, but changing the code is going too far. Fortunately those factories in JBPM are pluggable, so we just have to implement our own. Done in no time: subclass JmsMessageServiceFactory, override public ConnectionFactory getConnectionFactory(), except that... what JNDI name should we use? Not so difficult to answer, thought we: the deployment descriptor uses jms/JbpmConnectionFactory, so we use it as well.

3. I spare you... res-ref-names, openejb specific names, attempts to define openejb connection factories, ... nothing worked. I think that we would have been able to debug the problem eventually, but we found it purely by chance. All that fuss just because of one letter! Meet the hero: ejb-jar.xml, line 79, javax.jms.ConnnectionFactory

4. Repackage, compile, test. Well, I should have guessed. java:JmsXA was not the only one. Queue names in JmsMessageServiceFactory.java are also JBoss specific. Let's call it a day for today.
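
A typo like the one in step 3 can be caught mechanically. The following is a minimal sketch, assuming the descriptor is available as a string and that the referenced classes are on the classpath of whatever runs the check. DescriptorTypeCheck and unresolvableTypes are hypothetical names; neither jbpm nor openejb provides this. The demo uses javax.sql.DataSource only because it ships with the JDK; for the real ejb-jar.xml the JMS API jar would have to be on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NodeList;

final class DescriptorTypeCheck {

    /** Returns the text of every <elementName> element that does not name a loadable class. */
    static List<String> unresolvableTypes(String descriptorXml, String elementName) {
        NodeList nodes;
        try {
            nodes = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(descriptorXml.getBytes(StandardCharsets.UTF_8)))
                    .getElementsByTagName(elementName);
        } catch (Exception e) {
            throw new IllegalArgumentException("descriptor is not parseable XML", e);
        }
        List<String> unresolved = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            String className = nodes.item(i).getTextContent().trim();
            try {
                // Load without initializing: we only care whether the name resolves.
                Class.forName(className, false, DescriptorTypeCheck.class.getClassLoader());
            } catch (ClassNotFoundException | LinkageError e) {
                unresolved.add(className);
            }
        }
        return unresolved;
    }

    public static void main(String[] args) {
        // Illustrative descriptor fragment with one correct and one misspelled type.
        String xml = "<ejb-jar>"
                + "<resource-ref><res-type>javax.sql.DataSource</res-type></resource-ref>"
                + "<resource-ref><res-type>javax.sql.DataSourrce</res-type></resource-ref>"
                + "</ejb-jar>";
        System.out.println(unresolvableTypes(xml, "res-type"));
    }
}
```

Run as a unit test against the packaged descriptor, a check like this would probably have pointed at line 79 long before we stumbled on it.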

Anyway, as of that moment I continued to work on the problem alone.

Some time later:

5. Using res-ref-names and message-destination-ref-name from the deployment descriptor in my subclass of JmsMessageServiceFactory did not work. I had to use openejb specific JNDI names, and I did not like it at all. Well, at least most of the unit tests are green. Only one is still failing.

6. I decided to get rid of the server specific JNDI names first. There were several possibilities, but most of them would require specifying some additional things in the deployment descriptor for each enterprise bean that ends up invoking JmsMessageServiceFactory, including any bean that would be added later. I did not like that, so I just created a single SLSB with the correct @Resource annotations and then used it from my JmsMessageServiceFactory implementation. At least I have to configure one bean correctly and never worry about any other beans. Test again, and the same test as before is failing, with the same message. It is a good sign. Really.

7. So what is interesting about that test? It is the only test so far that not only uses JBPM functionality to send JMS messages but also depends on it. There are some tests that have the necessary jbpm voodoo to send JMS messages, but those tests never check the results of JMS operations. There are other tests relying on JMS, but they are in another, dependent, maven project, and I am not yet that far. Some more logging-debugging, and the result: openejb uses different queues in SLSB and in MDBs (I should have realized it sooner). OK, some more changes to the deployment descriptor of jbpm-enterprise, package (isn't that tiring?), copy, compile, test... Success, everything is green.

Conclusion: if you need to deploy jbpm-enterprise in an openejb(-derived) server, it can be done if you are prepared to edit the deployment descriptor and add a couple of Java classes.
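
Back to the hardcoded java:JmsXA from step 2: one way to keep server specific names out of the code is to resolve the factory from an ordered list of candidate names supplied by configuration. This is only a sketch of that idea; it is not what jbpm does and it is not the SLSB approach from step 6. JndiCandidates, Lookup and firstBound are hypothetical names, and the fake lookup in main merely pretends that only the portable java:comp/env name is bound. In a container one would pass ctx::lookup for a javax.naming.Context ctx.

```java
import java.util.List;
import javax.naming.NameNotFoundException;
import javax.naming.NamingException;

final class JndiCandidates {

    /** Hypothetical seam over javax.naming.Context#lookup(String), so the logic is testable. */
    interface Lookup {
        Object lookup(String name) throws NamingException;
    }

    /** Returns the object bound under the first candidate name that resolves. */
    static Object firstBound(Lookup jndi, List<String> candidateNames) {
        NamingException last = null;
        for (String name : candidateNames) {
            try {
                return jndi.lookup(name);
            } catch (NamingException e) {
                last = e; // remember why this name failed and try the next one
            }
        }
        throw new IllegalStateException("none of " + candidateNames + " is bound", last);
    }

    public static void main(String[] args) {
        // Fake naming context: only the portable resource-ref name resolves.
        Lookup fake = name -> {
            if (name.equals("java:comp/env/jms/JbpmConnectionFactory")) {
                return "connection factory placeholder";
            }
            throw new NameNotFoundException(name);
        };
        Object factory = firstBound(fake,
                List.of("java:JmsXA", "java:comp/env/jms/JbpmConnectionFactory"));
        System.out.println(factory);
    }
}
```

The drawback compared to the SLSB with @Resource annotations is that the list of names still has to be maintained per server, only now outside the code.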

But I need to deploy it in OC4J. To be continued...

Monday, July 19, 2010

Interesting JMS behavior in OC4J

Recently I was asked to help a project developed by colleagues with a couple of problems they were having with JMS and transactions.

The project is basically a Java EE 5 (as much as possible) application that has to run in Oracle Application Server 10g Release 3 (OC4J 10.1.3). The app server version was a hard requirement so the team could not just ditch it and use something better or more modern. The application uses JBoss jBPM 3.3.1.GA and because of this has to use the Hibernate native API instead of JPA. They also wanted to use jbpm-enterprise.jar but did not manage to do it in time for some demos. Instead they reimplemented part of the functionality from jbpm-enterprise.jar.

One of the things they needed from jbpm-enterprise was usage of JMS for asynchronous messaging and transaction demarcation. The setup is quite simple: the implementation of org.jbpm.msg.MessageService uses a stateless session bean to save data (org.jbpm.job.Job) in the database and to send a JMS message with jobId in it. There is also an MDB that receives the message, reads the Job data and executes it.

The problem was: sometimes, when running the application under OC4J, the MDB would receive a message and try to find a job based on the jobId from the message but it looked like the job was not in the database. But of course the job was there when you look in the database in Toad or SQL*Plus. The problem manifested itself randomly and only under OC4J, and never in their unit tests that were successfully running in openejb (the tests covered the failing code path as well).

The team realized they had a transaction problem and even went as far as creating unit tests to test transactional behavior with JMS. This I found funny: testing something not in the environment where it fails seemed useless to me.

They also came up with a solution: they added a call to HibernateSession.flush() to the SLSB right after HibernateSession.saveOrUpdate(job). The team members claimed it fixed their problem but felt it was a dirty hack, and because of that they needed to find a better solution. So they asked me to look at the problem.

Well, first of all, I would not call HibernateSession.flush() a dirty hack: if it really fixes the problem so be it. The thing is: I was surprised to hear it helped. It should not have worked unless they had done something very special with Hibernate sessions and transaction management. Well, they had not, so the very first thing I did was a small demonstration: adding Thread.sleep(100) right after the JMS send resulted in the same error in the MDB. No (reasonable) delay with Thread.sleep() right after flush() solved the problem. OK, I was right, their solution was not a solution at all. The only possible explanation of why this worked I could come up with was a race condition: flush() does all the heavy lifting with SQL so it could just happen that the database transaction is committed just in time before the MDB's onMessage() is invoked. Without flush() Hibernate must perform all the hard work on session close, and it takes much more time than just a commit.

Back to the problem. I did not feel like investigating the problem in the context of their project: it is a multi-module (2 ejb jars, 2 wars, packaged as an ear) maven project with some "features" on the maven side, which resulted in a lot of wasted time to build, package and deploy the application. I created a couple of simple projects to replicate the problem. Actually, I did not include any persistence at all; it was a pure 'JMS send' in an SLSB and an MDB. I was able to reproduce the problem immediately. Debugging and logging proved that everything worked without a hitch under openejb, but under OC4J the MDB's onMessage() is invoked before the SLSB's sending method completes. I also got quite a WTF feeling along the way when I experimented with javax.jms.Connection.createSession() parameters, but the real meaning of the WTF hit me much later. Another interesting thing: EJBContext.setRollbackOnly() in the SLSB did not work either.
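
The ordering problem can be illustrated without any app server. What follows is a toy, single-threaded model, not OC4J or jbpm code. It assumes that a message sent through a session enlisted in the container transaction is held back until commit, while a non-enlisted send is delivered immediately, before the sender's database changes become visible to anyone else. All names (EnlistmentModel, run, the job id 42) are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.LongConsumer;

final class EnlistmentModel {

    /** Plays one "save job, send message, commit" cycle and returns what the receiver saw. */
    static List<String> run(boolean jmsEnlistedInTransaction) {
        List<String> events = new ArrayList<>();
        Set<Long> committedJobs = new HashSet<>();       // rows visible to other transactions
        List<Long> uncommittedJobs = new ArrayList<>();  // rows written by the sender, not yet committed
        List<Long> heldMessages = new ArrayList<>();     // messages waiting for the commit

        // The "MDB": runs in its own transaction, so it sees committed rows only.
        LongConsumer onMessage = id ->
                events.add("job " + id + (committedJobs.contains(id) ? " found" : " NOT found"));

        // The "SLSB" business method: save the job, then send its id.
        long jobId = 42L;
        uncommittedJobs.add(jobId);
        if (jmsEnlistedInTransaction) {
            heldMessages.add(jobId);      // delivery postponed until commit
        } else {
            onMessage.accept(jobId);      // delivered while the sender is still running
        }

        // The container commits after the business method returns.
        committedJobs.addAll(uncommittedJobs);
        heldMessages.forEach(onMessage::accept);
        return events;
    }

    public static void main(String[] args) {
        System.out.println("not enlisted: " + run(false));
        System.out.println("enlisted:     " + run(true));
    }
}
```

In the non-enlisted run the receiver looks for the job before the commit, which is the same symptom the team saw; a flush() or a sleep in the sender only shifts the timing, it does not change the order.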

Googling did not really help: there were posts describing what could be similar or related problems, but no proposed solutions.

On top of that OC4J has 2 JMS implementations, and the fact that Oracle renamed them along the way did not help either. I should probably have mentioned that the project used the In-Memory JMS implementation as it was the easiest to configure. Switching to the File-Based JMS implementation did not change a thing. There is also the JMS Database Persistence implementation, but my initial attempts to configure it failed, so I stopped trying.

It took me quite some debugging and googling to find the solution. Actually, Oracle Containers for J2EE Services Guide, chapter 4, Using Oracle Enterprise Messaging Service (http://download.oracle.com/docs/cd/B32110_01/web.1013/b28958/jms.htm#g1088175), mentions it: setting oc4j.jms.pseudoTransactionEnlistment property in j2ee/home/config/jms.xml "solves" the problem. But who reads the documentation?! (I did, I just skipped most of it.)

Pseudo ... transaction ... enlistment ... sounds scary, but the solution worked for both In-Memory and File-Based JMS implementations and I did not care about the warnings about future releases: OC4J is being replaced with WebLogic anyway.

There are several things I still do not understand about all this, and this makes me feel I am joining the ranks of cargo cult programmers:

1. This should "just work". All the info I was able to find, including specs, tutorials and examples, has similar code. And this works in openejb.

2. Apparently this was supported in earlier versions of OC4J but somebody decided that what appears to be a standard behavior was not correct?

3. During my debugging, before I found the solution, I tried to use XA variants of JMS objects, but got errors from OC4J either during application deployment (MDB configuration) or during runtime (SLSB JMS send code) saying that I can't use XA objects there. OK, that might just have been errors in JMS configurations I set up.

4. I also tried to use JMS local transactions in the SLSB, with and without explicit rollback and commit. The local transactions were actually the reason for the WTF I mentioned above: if a JMS session is transacted, no messages are delivered to the MDB without an explicit commit, but an explicit commit/rollback caused OC4J to complain that commit/rollback are not allowed in that context. This was also my understanding of using JMS inside an enterprise bean. The parameters to createSession did not matter in openejb. The Java EE 5 Tutorial (and 1.4 as well), chapter "Using the JMS API in a Java EE Application" (http://download.oracle.com/docs/cd/E17477_01/javaee/5/tutorial/doc/bncgl.html) says (emphasis mine):

When you create a session in an enterprise bean, the container ignores the arguments you specify, because it manages all transactional properties for enterprise beans. It is still a good idea to specify arguments of true and 0 to the createSession method to make this situation clear:

session = connection.createSession(true, 0);


Apparently it matters to OC4J. Well, I will not cry when this thing is finally dead.

After I had found a solution for the problem, I decided to look again at configuring the application to use JMS Database Persistence. This would be a different, even better, solution: the project relied on an Oracle database anyway, so why not?

I should say this thing is not that easy to configure the first time you try it. Every problem can be solved with another level of indirection? Ha, just imagine how many problems you can get with another level of indirection. Oracle's JMS support is a good example of this: there are "administered objects" and real objects and they all (or not, it depends) are visible in JNDI. And you are not allowed to mix them; you are supposed to use only "administered objects", which is probably the right idea. Except that in the case of In-Memory and File-Based JMS you are using real objects most of the time; you can use administered objects as well but real objects are much easier to configure.

And then making sure your application runs successfully under OC4J with JMS Database Persistence and under openejb with ActiveMQ requires some work on deployment descriptors; you can't get away with annotations only. But in the end it works, and most of the time it is the only thing (well, the main one) that matters, right? A nice side effect: WTF is gone, it does not matter what parameters I pass into javax.jms.Connection.createSession().