Monday, September 28, 2015

WTF collection: how not to add an API

Some time ago we decided to move one of our projects to JDK 1.8. After all, JDK 1.7 reached EOL. While we did not really want to start with all those JDK 1.8 features, the security updates were important for us. In addition, the newest versions of popular browsers refuse to work over HTTPS with web servers running on JDK 1.7, and our project has a web UI.

The move was not an easy exercise. Yes, most of the problems were traced to our code, although for some of them I would have expected better diagnostics from the tools or frameworks.

But one problem stood out.

The project communicates over HTTPS with some remote web services. And we started getting errors for one of them. The web service is a bit different than others: the web service provider insisted on using the SSL/TLS Server Name Indication (SNI).

We were having some problems when we started communicating with this web service initially while we were still running with JDK 1.7. And now the errors with JDK 1.8 were remarkably similar to the errors we had initially with the web service. It was immediately clear who is the primary suspect.

After all I knew that the JDK 1.8 includes some API to explicitly control the SNI behavior. But I hoped that the JDK does the right thing if SNI settings are not explicitly controlled. Our code did not do it.

Let's look closer at it. This is what -Djavax.net.debug=all for.

First surprise: after setting the property on, I could not log in in our web based UI! We got some errors in browsers saying that the HTTPS connection could not be established. Removing the property helped with UI. How it is even possible to release a product where enabling some debugging output would break the functionality?! Yes, a lot of developers, me including, did make similar mistakes, but leaving such a bug in a released version is too far from my point of view.

And how the hell I suppose to debug the suspected SNI problem? Let's try to use another JDK. OpenJDK 1.8.0.51 out, Oracle JDK 1.8.0_60 in. Result: the problem with the debugging property and HTTPS in the web UI is gone.

Hope dies last ... the problem with the web service is still there. Of course it would have been too easy if this problem were also solved.

But at least I could now look at the SSL debugging output. And indeed, exactly what I thought: SNI is not sent.

I also knew about jsse.enableSNIExtension system property. We started the JVM without this property so the default behavior must have been used. Needless to say that explicitly setting the property to true did not change a thing.

The rest was just a tedious work of creating a reproduction scenario and some googling. A simple program with some basic HttpURLConnection manipulations did not reproduce the problem: the JDK was sending the SNI info. Time to look at the JDK source code and do more debugging.

From my point of view the authors of that part of Java have had reserved themselves a long time ago a permanent place in the developer's hell. This code is a mess and a bloody nightmare. Yes, I have seen some code that was much worse. But somehow I expected the JDK code be of a better quality...

After many cycles of modifying the test program, debugging, and studying the mess they called "source code" I came to this beauty:

HttpsClient.java, starting from line 430, method public void afterConnect():
[430]    public void afterConnect()  throws IOException, UnknownHostException {
...
[439]                    s = (SSLSocket)serverSocket;
[440]                    if (s instanceof SSLSocketImpl) {
[441]                       ((SSLSocketImpl)s).setHost(host);
[442]                    }
...
[470]            // We have two hostname verification approaches. One is in
[471]            // SSL/TLS socket layer, where the algorithm is configured with
            ...
            The rest of the very long and insightful comment is stripped
[518]             boolean needToCheckSpoofing = true;
[519]             String identification =
[520]                s.getSSLParameters().getEndpointIdentificationAlgorithm();
[521]             if (identification != null && identification.length() != 0) {
[522]                 if (identification.equalsIgnoreCase("HTTPS")) {
[523]                    // Do not check server identity again out of SSLSocket,
[524]                    // the endpoint will be identified during TLS handshaking
[525]                    // in SSLSocket.
[526]                    needToCheckSpoofing = falsee;
[527]                }   // else, we don't understand the identification algorithm,
[528]                    // need to check URL spoofing here.
[529]            } else {
[530]                 boolean isDefaultHostnameVerifier = false;
...
[535]                 if (hv != null) {
[536]                    String canonicalName = hv. getClass().getCanonicalName();
[537]                     if (canonicalName != null &&
[538]                    canonicalName.equalsIgnoreCase(defaultHVCanonicalName)) {
[539]                        isDefaultHostnameVerifier = true;
[540]                    }
[541]                }  else {
...
[545]                    isDefaultHostnameVerifier = true;
[546]                }

[548]                 if (isDefaultHostnameVerifier) {
[549]                    // If the HNV is the default from HttpsURLConnection, we
[550]                    // will do the spoof checks in SSLSocket.
[551]                    SSLParameters paramaters = s.getSSLParameters();
[552]                    paramaters.setEndpointIdentificationAlgorithm ("HTTPS");
[553]                    s.setSSLParameters(paramaters);

[555]                    needToCheckSpoofing = false;
[556]                }
[557]            }

[559]            s.startHandshake();
...
[581]    }
There are so many things that are wrong here. But let's start:
  1. Lines 440 - 442: the hostname is passed to the SSL socket via a non-public API. This basically prevents you from providing your own SSL socket factory with your own SSL sockets delegates. Your sockets will not get hostname info. And hostname is used by the default trust verification mechanism invoked from the default SSL socket implementation.

  2. The biggest "wrong" in this code starts on line 470. The authors have probably wasted all their energy on those 50 lines of comments, and got nothing left to properly implement the logic. Basically the SNI information is sent only if method SSLSocketImpl.setSSLParameters() is invoked. And if it is not invoked, no SNI is sent. And the code above shows that setSSLParameters() is invoked in one case only: if no endpoint identification algorithm was specified and no "default" hostname verifier was set. Our code had a custom hostname verifier, and oooops SNI was not send.

    The funny thing about it: if one bothers to explicitly specify an endpoint identification algorithm, even the default one, SNI is not sent either.

    There is actually a bug JDK-8072464 about the "non-default hostname verifier", but it does not mention an explicitly specified endpoint identification algorithm. And it looks like they do not plan to fix it in 1.8.

  3. There is another bug lurking in the API and the implementation: there is no easy way to disable the SNI or actually just customize it for a particular connection. Yes, one can disable sending SNI by setting jsse.enableSNIExtension system property to false, but it is a JVM-wide setting. Don't you hate when the only way to get some functionality is use some system property? I do hate that kind of "all or nothing" approach. And JDK is full of it. One of the worst offenders is javamail: it gives you a way to specify per-connection settings and still relies on JVM system properties is some cases. Really clever technic!

    Back to SNI: you see, to explicitly specify SNI you have to implement an SSL socket factory, which is already quite a step. Then you can use setSSLParameters() to customize the SNI or provide an empty list if you do not want to have SNI sent. So far so good, but this is the only place where you are in control of a socket. And it is too early. Because HttpsCient.afterConnect() is invoked much later. Say there is no endpoint identification algorithm specified and no "default" hostname verifier set. Or just imagine bug JDK-8072464 is actually fixed. In this case the default SNI behavior kicks in and overwrites whatever you have specified in the socket factory. Remember that little setHost() on line 441? This is where the host name gets into the SSL parameters. And then the code on line 551 - 553 overrides your precious SNI with the one that was set on line 441.

    So in reality you have to implement an SSL socket factory and an SSLSocket so that you can do some additional SNI manipulation in method startHandshake(). But then you will not get hostname set because lines 440 - 442 are not executed for your custom SSLSocket.

A small detour: go read what they call a "JDK Enhancement Proposal" about SNI, especially the section about testing.
Need to verify that the implementation doesn't break backward
compatibility in unexpected ways.
A noble goal, is it not?

Just imagine how much time they have spent on that JEP, on the API, on the implementation. Net result? Puff, zilch, and JDK-8072464.

Of course all this applicable only if your code relies on the JDK's HTTP support. This is probably another very good reason to move to libraries like Apache HttpComponents. I do not know if it properly supports SNI and gives you enough rope but it can at least be patched much easier if needed.

Since we still have to rely on the JDK's HTTP support I had to resort to a custom SSL socket factory and a custom SSL socket and on things like "instanceof SSLSocketImpl" and typecasts. Too much code to my liking to work around some silly bug. But at least we now can send messages to that web service.

And, by the way, there is another problem with the JDK's SNI. From my point of view it is also a bug but this time it goes over somewhat grey area of definition ambiguity. The code in question is Utilities.rawToSNIHostName().

The SNI RFC section 3.1 prohibits use of IP addresses in SNI, and the JDK behaves correctly if hostname in the above code is an IP address.

But they also ignore hostnames without a '.' character. This is wrong. I guess they try to follow the RFC where it says:
"HostName" contains the fully qualified DNS hostname of the server,
as understood by the client.

There two problems with the JDK's behavior. First of all, the spec says "as understood by the client". If a hostname with or without a '.' character is correctly resolved to an IP, it is as good as "fully qualified as understood by the client". So the JDK incorrectly excludes hostnames without a '.' character.

On the other hand, if the JDK follows some other specification of a "fully qualified DNS hostname", then a mere presence of a '.' character in a hostname does not make it fully qualified. It is at most "a qualified name". Unless of course the JDK authors have somewhere a specification that says "a hostname is fully qualified if it has at least one '.' character". But I bet they just got tired after writing all those specs, JEPs, APIs, and comments in the code.

Sunday, August 2, 2015

WTF collection: JDK and broken expectations

Imagine: you have a piece of code that have been working fine for a long time. The code connects to some web server, sends some data, gets the result, etc. Nothing fancy here.

One day the very same piece of software was used to send data to yet another web server. This time it was different: we got long delays and a lot of errors like "Unexpected end of stream". Time to investigate.

The packet capture revealed one particular HTTP request header that just screamed "You want some problems? Just use me! " The header was Expect: 100-continue

The code in question uses java.net.HttpURLConnection class. (Yes, I know about Apache commons HTTP, but that's beyond the point.) I just hoped that this header is not set by the JDK code automatically. It is always a great PITA to change HTTP headers that the JDK babysits. Just try to set HTTP request header "Host" for example! Fortunately "Expect" header is not one of those restricted by the JDK. It was set by the application itself based on some conditions. Couple of changes and this header is not sent any more. No more delays, no more "Unexpected end of stream" errors, everything works.

By now I was curious enough to find out what was going on. Even the fact that I had to go through some JDK code did not stop me. Almost every time I had to look in the JDK code I got the feeling I could easily have guess where the authors of that code grew up...

Yep, no surprises here, the same style, see: sun.net.www.protocol.http. HttpURLConnection class. Here is the relevant piece of "logic":

private void  expect100Continue()throws IOException {
            // Expect: 100-Continue was set, so check the return code for
            // Acceptance
            int oldTimeout = http.getReadTimeout();
            boolean enforceTimeOut = false;
            boolean timedOut = false;
            if (oldTimeout <= 0) {
                // 5s read timeout in case the server doesn't understand
                // Expect: 100-Continue
                http.setReadTimeout(5000);
                enforceTimeOut = true;
            }

            try {
                http.parseHTTP(responses, pi, this);
            } catch (SocketTimeoutException se) {
                if (!enforceTimeOut) {
                    throw se;
                }
                timedOut = true;
                http.setIgnoreContinue(true);
            }
            if (!timedOut) {
                // Can't use getResponseCode() yet
                String resp = responses.getValue(0);
                // Parse the response which is of the form:
                // HTTP/1.1 417 Expectation Failed
                // HTTP/1.1 100 Continue
                if (resp != null && resp.startsWith("HTTP/")) {
                    String [] sa = resp.split("\\s+");
                    responseCode = -1;
                    try {
                        // Response code is 2nd token on the line
                        if (sa.length > 1)
                            responseCode = Integer.parseInt(sa[1]);
                    } catch (NumberFormatException numberFormatException) {
                    }
                }
                if (responseCode != 100) {
                    throw new ProtocolException("Server rejected operation");
                }
            }

            http.setReadTimeout(oldTimeout);

            responseCode = -1;
            responses.reset();
            // Proceed
    }

Nice thing: they decided to work around some broken HTTP servers that may send a 100 HTTP response even if a request does not have the "Expect: 100-continue" header. See those http.setIgnoreContinue() here and there?

This is actually the only nice thing I can say about this code.

The authors also decided to work around another possible misbehavior with respect to the "Expect" header. See that comment 5s read timeout in case the server doesn't understand Expect: 100-Continue on line 1185? Except that all this happens only if the connection does not have its own read timeout set, see line 1184.

But if you decided to protect your application against some servers that take too long to respond by setting read timeout on an HttpURLConnection to some reasonable value like 30 or 60 seconds, you are screwed. Because the JDK starts waiting for a 100 HTTP response and if the server does not send it, the JDK times out after waiting as long as your read timeout setting (30 or 60 or whatever seconds!). In that case the JDK does not bother sending the request body to the server, and your application gets a SocketTimeoutException. Nice work, guys!

And it is not all. Another very interesting logic starts on line 1200. If some response was read from the server, the code verifies the response code. Completely useless and extremely harmful piece of code: if the server responded with anything other than 100 the JDK reports "Server rejected operation" error.

Now go back to RFC 2616 and read the very first requirement for HTTP/1.1 origin servers (emphasis mine):

- Upon receiving a request which includes an Expect request-header
field with the "100-continue" expectation, an origin server MUST
either respond with 100 (Continue) status and continue to read
from the input stream, or respond with a final status code. The
origin server MUST NOT wait for the request body before sending
the 100 (Continue) response. If it responds with a final status
code, it MAY close the transport connection or it MAY continue
to read and discard the rest of the request.  It MUST NOT
perform the requested method if it returns a final status code.

See that or respond with a final status code part? Say the server implements that part of the specification correctly. It receives a request with "Expect: 100-continue" header, decides that it cannot handle the request because for example it does not recognize the specified context path. The server immediately sends a 404 HTTP response. But the JDK knows better and instead of the real error your application gets a stupid and meaningless "Server rejected operation" error. Good luck finding out what went wrong.

Be warned and do not use the "Expect: 100-continue" HTTP request header if you have to rely on java.net.HttpURLConnection class. And do not think the JDK code is somehow of a better quality than rest of code out there. In fact it is normally on par or even worse.