Sunday, December 27, 2020

The most f*cked up piece of software: "no worries, we know better how you want your SOAP message parsed"

Well, as I wrote at the beginning of the previous post: at the time I had encountered the problem I did not think it would be really interesting to blog about it. So I did not.

But about one month ago we got a new business partner. We use SOAP (and in our case: still use ssaj-ri) to communicate with this partner. Messages received from that partner were failing some complex preprocessing step. Needless to say we did not have any problems with messages sent by other partners.

First we ignored the problem because this was a new partner and their messages were not compliant with some other part of the specification we relied on. We even had to make some changes to adapt our code to detect and accept their peculiar view on the specification. And at first the business partner did not send a lot of messages so we even did not notice the problem.

Eventually we noticed that all messages from that partner are failing a particular preprocessing step. When I looked at the problem I could not understand the reason for the failure. Basically, the message had several parts, each part went through the same preprocessing step, and all but one part were preprocessed successfully. One always failed.

I dumped the message before the preprocessing step and then preprocessed it separately, and it failed exactly the same way as in our product.

I was prepared to blame the business partner. After all, we did have a problem with them not following the specification. So I thought it is just another place where they do something incorrectly.

But I knew that the preprocessing library we use also has its issues. I had found and reported one some time ago, and it was confirmed and fixed.

So I started playing directly with this library trying to make it accept the message, but nothing helped.

Then I decided to look at the message as we receive it. I dumped it before it went to the SOAP parser and tried to preprocess it separately. Again, it failed the same way.

OK, case is closed: the business partner is to blame?

But then I looked at the dumped messages. I mean, really looked, with my eyes. And I could not believe what I have seen. I added a unit test that just parsed the "bytes as dumped before the saaj-ri SOAP parser" and immediately serialized the parsed message. And the parsed-then-serialized message differed from the original one.

This is the original message:

<Envelope xmlns="http://www.w3.org/2003/05/soap-envelope">
<Header ...>
...
<Body ...>
...

And this is the parsed-then-serialized message:

<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">
<env:Header ...>
...
<env:Body ...>
...

See those "env:" namespace prefixes? When I saw this ... let's put it this way ... I expressed my deep concerns about the state of the software development in general and in this project in particular. I wished all the best to the developer who is responsible for this particular behavior, and to all his relatives from 10 generations before. I was amazed by the state of cognitive problems the said developer must have suffered throughout her or his life.

Needless to say parsing the same raw data with a DOM parser and then preprocessing it suddenly fixed the problem we were having.

I found the code responsible for this fubar almost immediately. This is it: NameImpl.java

Just look at it! Does it not call for even more best wishes to the developer who is responsible for this ... pile? It certainly forced me to do it.

The code looks like it was created by some junior developer who had just came across this wonderful feature of java: static nested classes. And the developer had gone full throttle into using this feature.

Do not get me wrong: I do not have anything against junior developers. Of course I went through this myself. In many ways I am probably still a junior developer. But this is why we have things like code reviews and people like mentors or team leaders or whatever.

This code should have been cleaned up right in the next commit after it was added.

Actually I digress. Normally I would not even blink at this kind of code in a 3rd party dependency: life is too short for that. But having wasted quite some time on this problem I thought I have all the right to rant about it.

The gem of this problem lies in all those wonderful classes like SOAP1_1Name which explicitly check for an empty namespace prefix and "helpfully" replace it with some hardcoded predefined value.

What on earth whoever had added these checks was thinking?! "Oh, look, empty prefix. It is probably some kind of mistake, so let's fix it."? Had this person expected others would thank him? How about a medal? I think something like a slap on the hands with an industrial-grade keyboard would be more appropriate.

It looks like we have to give more priority to the task of moving from saaj-ri.

It is jut bizarre how much low quality code ends up in the core tools ...

For the moment we have just patched this class. Since we still use JDK 8 in most cases, it means we had to patch the version that is packaged with the JDK and use -Xbootclasspath/p JVM option.

And we have added some startup warning to detect that we are running in an environment without the patch being active.

No comments:

Post a Comment