Saturday, December 26, 2020

The most f*cked up piece of software ...

... that I have seen so far. Surprisingly it is not JBoss.

I know I did not blog for a long time. It is not like I did not have any fun with Oracle, JBoss, JDK, or some other woefully broken code. It is just that most problems I come across look pretty mundane. Or they require quite an in-depth write up and I feel like it is stupid to spend a lot of time on describing something after I already have wasted hours if not days to understand and fix.

Case in point: I first ran into a problem with a particular functionality almost 3 years ago. The problem was reproducible only under some specific circumstances so it took some time to find the root cause. When I found it I did not think it would be really interesting to blog about it. After all, I had already encountered much more interesting problems in software coming from the same group of developers. So I came with a workaround which helped in a lot of cases, and I decided that we need to get rid of this dependency altogether as soon as possible.

Unfortunately other tasks with higher priority prevented me from completing this task, and we still have this dependency. On the other hand had I completed that task I would not have recently a "pleasure" wasting literally days on tracking some other problems that were caused by the same software.

I decided that the sheer variance of problems in one not particularly complicated project is really worth blogging about.

Meet the "the most fucked up piece of software du jour": SAAJ reference implementation. I did not really trace its pedigree as I did not care about what particular shithole "incubator of promising projects" it crept out, but it is present under prefix "metro" in a lot of code repositories. Names of some authors in the source code of saaj-ri match names of authors in other metro-related projects like WSIT. So I am not amused.

What made all the problems worse is that Sun started to include metro subprojects in JDK as core functionality. (As if the JDK code was perfect and it needed some low quality code for balance.) As a result a lot of broken code ended up being used by a lot of developers in a lot of projects. This all made fixing bugs more difficult: even if they were fixed as part of the metro projects the fixes were seldom included in the JDK.

Oracle continued this tradition, at least up to JDK 11, when they finally removed a bunch of stuff, including SAAJ, from the JDK. But before that they made some extremely bad choices which resulted in saaj-ri being broken beyond anything imaginable.

saaj-ri is now part of Eclipse EE4J and looking at its git repository I can only conclude that the project decay is evident. Not that I care.

But saaj-ri is still included in a lot of places and a lot of developers are still using it, directly or indirectly. I can only wish good luck to them.

Well, enough ranting. Time to show the code. Let's start with the problem I encountered 3 years ago.

Part of our application was processing SOAP messages using saaj packaged with the JDK. We noticed that under some circumstances processing of several SOAP messages in parallel was almost grinding to a halt. After taking some thread dumps and analyzing them we found this wonderful piece of code in saaj classes.

This is the code from the saaj-ri. The code in the JDK was under a different package name but for the rest it the same or very close. See line 91:

private static ParserPool parserPool = new ParserPool(5);

You can enjoy the source code of ParserPool here, although it is not interesting.

In the current JDK8 EnvelopeFactory is a bit different: it has a classloader-bound pool. What a relief!

What does it mean basically? You can parse at most 5 SOAP messages in parallel [from the code loaded by the same classloader]. If for some reason the parsing takes a lot of time you have a serious problem.

This code raises many questions like why the hell they need the pool at all? OK, this might have been important back when saaj-ri was initially developed but why this code is still present now? Why only 5 instances?

To add insult to injury this code has also a resource leak. In some circumstances you might end up with an empty parser pool: some exceptions can be thrown after a parser was retrieved from the pool. In that case the parser is not returned in the pool.

But do not despair! Good developers of saaj-ri are ready to combat the problems. They have actually fixed the resource leak somewhere in 1.4.X version.

And they added a way to increase the pool size! Wow - now you have it configurable. (As usual, with a JDK system property, once and for everybody, but who am I to point to a better way?)

No comments:

Post a Comment