Boring stuff: Interesting JMS behavior in OC4J

Recently I was asked to help a project developed by colleagues with a couple of problems they were having with JMS and transactions.

The project is basically a Java EE 5 (as much as possible) application that has to run in Oracle Application Server 10g Release 3 (OC4J 10.1.3). The app server version was a hard requirement, so the team could not just ditch it and use something better or more modern. The application uses JBoss jBPM 3.3.1.GA and because of this has to use the Hibernate native API instead of JPA. They also wanted to use jbpm-enterprise.jar but did not manage to do it in time for some demos. Instead they reimplemented part of the functionality from jbpm-enterprise.jar.

One of the things they needed from jbpm-enterprise was the use of JMS for asynchronous messaging and transaction demarcation. The setup is quite simple: the implementation of org.jbpm.msg.MessageService uses a stateless session bean (SLSB) to save data (org.jbpm.job.Job) in the database and to send a JMS message with the jobId in it. There is also an MDB that receives the message, reads the Job data and executes it.

The problem was: sometimes, when running the application under OC4J, the MDB would receive a message and try to find a job based on the jobId from the message, but it looked like the job was not in the database. Of course the job was there when you looked in the database with Toad or SQL*Plus. The problem manifested itself randomly and only under OC4J, never in their unit tests that were successfully running in openejb (the tests covered the failing code path as well).

The team realized they had a transaction problem and even went as far as creating unit tests to test transactional behavior with JMS. This I found funny: testing something outside the environment where it fails seemed useless to me.

They also came up with a solution: they added a call to HibernateSession.flush() to the SLSB right after HibernateSession.saveOrUpdate(job). The team members claimed it fixed their problem but felt it was a dirty hack, and because of that they needed to find a better solution. So they asked me to look at the problem.

Well, first of all, I would not call HibernateSession.flush() a dirty hack: if it really fixes the problem, so be it. The thing is, I was surprised to hear it helped. It should not have worked unless they had done something very special with Hibernate sessions and transaction management. Well, they had not, so the very first thing I did was a small demonstration: adding Thread.sleep(100) right after the JMS send resulted in the same error in the MDB. No (reasonable) delay added with Thread.sleep() right after flush() solved the problem either. OK, I was right, their solution was not a solution at all. The only explanation I could come up with for why flush() appeared to work was a race condition: flush() does all the heavy lifting with SQL, so it could just happen that the database transaction is committed just in time, before the MDB's onMessage() is invoked. Without flush() Hibernate must perform all the hard work on session close, and that takes much more time than just a commit.

Back to the problem. I did not feel like investigating it in the context of their project: it is a multi-module (2 EJB jars, 2 wars, packaged as an ear) Maven project with some "features" on the Maven side, which resulted in a lot of wasted time to build, package and deploy the application. Instead I created a couple of simple projects to replicate the problem. Actually, I did not include any persistence at all; it was a pure 'JMS send' in an SLSB plus an MDB. I was able to reproduce the problem immediately. Debugging and logging proved that everything worked without a hitch under openejb, but under OC4J the MDB's onMessage() was invoked before the SLSB's sending method completed. I also got quite a WTF feeling along the way when I experimented with javax.jms.Connection.createSession() parameters, but the real meaning of the WTF hit me much later. Another interesting thing: EJBContext.setRollbackOnly() in the SLSB did not work either.
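The ordering I saw under OC4J can be sketched in plain Java, without any JMS or EJB APIs. This is only a deterministic toy model, not the test projects I actually used: ToyDb, ImmediateQueue and the job id 42 are made-up names, and the "MDB" here runs synchronously inside send() instead of on its own thread. It just shows why a listener that fires before the sender's commit cannot see the saved job.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class EarlyDeliveryDemo {

    /** Toy "database" (hypothetical): writes stay invisible to other readers until commit(). */
    static class ToyDb {
        private final Map<Long, String> committed = new HashMap<>();
        private final Map<Long, String> pending = new HashMap<>();

        void save(long id, String job) { pending.put(id, job); }

        void commit() {
            committed.putAll(pending);
            pending.clear();
        }

        /** What a separate transaction (the MDB side) would see. */
        String findCommitted(long id) { return committed.get(id); }
    }

    /** Toy queue (hypothetical) that is NOT part of the sender's transaction: send() delivers at once. */
    static class ImmediateQueue {
        private final Consumer<Long> listener;

        ImmediateQueue(Consumer<Long> listener) { this.listener = listener; }

        void send(long jobId) { listener.accept(jobId); }
    }

    static List<String> run() {
        List<String> log = new ArrayList<>();
        ToyDb db = new ToyDb();

        // Stand-in for the MDB's onMessage(): look the job up by the id from the message.
        ImmediateQueue queue = new ImmediateQueue(jobId -> {
            String job = db.findCommitted(jobId);
            log.add(job == null ? "job " + jobId + " not found" : "executing " + job);
        });

        // Stand-in for the SLSB business method.
        db.save(42L, "job-42");
        queue.send(42L); // the listener runs here, before the commit below
        db.commit();     // the container commits only when the method returns

        log.add("after commit: " + db.findCommitted(42L));
        return log;
    }

    public static void main(String[] args) {
        // Prints "job 42 not found" and then "after commit: job-42".
        run().forEach(System.out::println);
    }
}
```

The listener misses the job although it is in the "database" a moment later, which is exactly what the team kept seeing in Toad.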

Googling did not really help: there were posts describing what could be similar or related problems, but no proposed solutions.

On top of that, OC4J has 2 JMS implementations, and the fact that Oracle renamed them along the way did not help either. I should probably have mentioned that the project used the In-Memory JMS implementation as it was the easiest to configure. Switching to the File-Based JMS implementation did not change a thing. There is also the JMS Database Persistence implementation, but my initial attempts to configure it failed, so I stopped trying.

It took me quite some debugging and googling to find the solution. Actually, Oracle Containers for J2EE Services Guide, chapter 4, Using Oracle Enterprise Messaging Service (http://download.oracle.com/docs/cd/B32110_01/web.1013/b28958/jms.htm#g1088175), mentions it: setting oc4j.jms.pseudoTransactionEnlistment property in j2ee/home/config/jms.xml "solves" the problem. But who reads the documentation?! (I did, I just skipped most of it.)

Pseudo ... transaction ... enlistment ... sounds scary, but the solution worked for both In-Memory and File-Based JMS implementations and I did not care about the warnings about future releases: OC4J is being replaced with WebLogic anyway.
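For comparison, here is what I expect "the send takes part in the transaction" to mean, again as a plain-Java toy and not a description of what OC4J does internally with that property. EnlistedQueue is a made-up class: sends are buffered, handed to the listener only on commit, and thrown away on rollback (which is what I had hoped EJBContext.setRollbackOnly() would achieve).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class EnlistedSendDemo {

    /** Toy queue (hypothetical) whose sends take part in the caller's transaction. */
    static class EnlistedQueue {
        private final Consumer<Long> listener;
        private final List<Long> buffered = new ArrayList<>();

        EnlistedQueue(Consumer<Long> listener) { this.listener = listener; }

        /** Nothing is delivered yet; the message only joins the pending work. */
        void send(long jobId) { buffered.add(jobId); }

        /** Deliver everything that was sent inside the transaction. */
        void commit() {
            List<Long> toDeliver = new ArrayList<>(buffered);
            buffered.clear();
            toDeliver.forEach(listener);
        }

        /** Drop everything that was sent inside the transaction. */
        void rollback() { buffered.clear(); }
    }

    /** Sends two job ids, then commits or rolls back; returns what the listener received. */
    static List<Long> deliveredAfter(boolean commit) {
        List<Long> delivered = new ArrayList<>();
        EnlistedQueue queue = new EnlistedQueue(delivered::add);

        queue.send(1L);
        queue.send(2L);
        if (!delivered.isEmpty()) {
            throw new IllegalStateException("nothing may be delivered before the outcome is known");
        }

        if (commit) {
            queue.commit();
        } else {
            queue.rollback();
        }
        return delivered;
    }

    public static void main(String[] args) {
        System.out.println("commit:   " + deliveredAfter(true));  // commit:   [1, 2]
        System.out.println("rollback: " + deliveredAfter(false)); // rollback: []
    }
}
```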

There are several things I still do not understand about all this, and this makes me feel I am joining the ranks of cargo cult programmers:

1. This should "just work". All the info I was able to find, including specs, tutorials and examples, has similar code. And this works in openejb.

2. Apparently this was supported in earlier versions of OC4J but somebody decided that what appears to be a standard behavior was not correct?

3. During my debugging, before I found the solution, I tried to use XA variants of JMS objects, but got errors from OC4J either during application deployment (MDB configuration) or during runtime (SLSB JMS send code) saying that I can't use XA objects there. OK, that might just have been errors in JMS configurations I set up.

4. I also tried to use JMS local transactions in the SLSB, with and without explicit rollback and commit. The local transactions were actually the reason for the WTF I mentioned above: if a JMS session is transacted, no messages are delivered to the MDB without an explicit commit, but an explicit commit/rollback caused OC4J to complain that commit/rollback are not allowed in that context. That they are not allowed was also my understanding of using JMS inside an enterprise bean. The parameters to createSession did not matter in openejb. The Java EE 5 Tutorial (and 1.4 as well), chapter "Using the JMS API in a Java EE Application" (http://download.oracle.com/docs/cd/E17477_01/javaee/5/tutorial/doc/bncgl.html) says (emphasis mine):

When you create a session in an enterprise bean, the container ignores the arguments you specify, because it manages all transactional properties for enterprise beans. It is still a good idea to specify arguments of true and 0 to the createSession method to make this situation clear:

session = connection.createSession(true, 0);

Apparently it matters to OC4J. Well, I will not cry when this thing is finally dead.

After I had found a solution for the problem, I decided to look again at configuring the application to use JMS Database Persistence. This would be a different, even better, solution: the project relied on an Oracle database anyway, so why not?

I should say this thing is not that easy to configure the first time you try it. Every problem can be solved with another level of indirection? Ha, just imagine how many problems you can get with another level of indirection. Oracle's JMS support is a good example of this: there are "administered objects" and real objects and they all (or not - it depends) are visible in JNDI. And you are not allowed to mix them; you are supposed to use only "administered objects", which is probably the right idea. Except that in the case of In-Memory and File-Based JMS you are using real objects most of the time; you can use administered objects as well, but real objects are much easier to configure.

And then making sure your application runs successfully under OC4J with JMS Database Persistence and under openejb with ActiveMQ requires some work on deployment descriptors; you can't get away with annotations only. But in the end it works, and most of the time it is the only thing (well, the main one) that matters, right? A nice side effect: WTF is gone, it does not matter what parameters I pass into javax.jms.Connection.createSession().

Monday, July 19, 2010
