18 August, 2014 by Denise << java, play-framework, api >>

Screen Scraping Secure Pages

Recently I needed an API for some data I wanted to use in an app. The data was behind a secure login with no API exposed, so I decided to screen scrape. Unfortunately, screen scraping behind a secure login is a little trickier. To get around it, I wrote some code to hit the login page, parse the cookies and authenticity token, and then post to the form URL with those values and the user-provided login credentials.

The code looks like this:
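What follows is a minimal sketch of the parsing steps described above, not the original listing. It assumes a Rails-style hidden field named authenticity_token (with the name attribute before the value attribute) and form fields called username and password; those names, and the LoginScrapeSketch class itself, are hypothetical and need to be matched to the real login form. Sending the GET and POST requests is left to whichever HTTP client you use.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LoginScrapeSketch {

    // Assumed markup: <input type="hidden" name="authenticity_token" value="...">
    private static final Pattern TOKEN = Pattern.compile(
            "<input[^>]*name=\"authenticity_token\"[^>]*value=\"([^\"]*)\"");

    /** Pulls the authenticity token out of the login page HTML. */
    static String extractToken(String loginPageHtml) {
        Matcher m = TOKEN.matcher(loginPageHtml);
        if (!m.find()) {
            throw new IllegalStateException("authenticity token not found");
        }
        return m.group(1);
    }

    /** Turns Set-Cookie header values into a single Cookie request header value. */
    static String toCookieHeader(List<String> setCookieValues) {
        List<String> pairs = new ArrayList<>();
        for (String value : setCookieValues) {
            // Keep only "name=value"; drop attributes such as Path or HttpOnly.
            pairs.add(value.split(";", 2)[0].trim());
        }
        return String.join("; ", pairs);
    }

    /** Builds the URL-encoded body to post to the login form URL. */
    static String buildFormBody(String token, String username, String password) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("authenticity_token", token); // assumed field name
        fields.put("username", username);        // assumed field name
        fields.put("password", password);        // assumed field name
        List<String> encoded = new ArrayList<>();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            encoded.add(encode(e.getKey()) + "=" + encode(e.getValue()));
        }
        return String.join("&", encoded);
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        String html = "<form><input type=\"hidden\" name=\"authenticity_token\" value=\"abc123\"></form>";
        System.out.println(extractToken(html));
        System.out.println(toCookieHeader(Arrays.asList("session=xyz; Path=/; HttpOnly")));
        System.out.println(buildFormBody("abc123", "denise", "secret"));
    }
}
```

The regular expression is the brittle part: it breaks as soon as the attribute order or field name changes, so a real HTML parser is the safer choice if one is available.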

Note that you should only do this kind of scraping if you're the owner of the site and either can't implement a proper API or don't have time to. Additionally, this code is really brittle: any structural change to the DOM will probably cause your screen scraping to fail.

One (reasonable) use case for implementing something like this is an old website that, for some reason, you can't update with a real API, even though you need one to access the information it provides.

Another thing to consider is that given we're passing the username and password straight through to our code, if you decide to expose the code as a public API it should be over HTTPS only.

17 February, 2014 by Denise << programming, java >>

Boolean comparisons

Ever wondered why people code boolean checks in if statements differently?

Example 1

This is the kind of code that fails silently: the condition assigns true to hungry instead of comparing against it, so the if statement always evaluates to true and the conditional code always executes.

boolean hungry = Utils.isHungry();
if (hungry = true) {
    /* this code will always be executed */
}

Example 2

There is no possibility of an accidental assignment here, and this is probably the most readable version.

boolean hungry = Utils.isHungry();
if (hungry) {
    /* make some cookies */
}

Example 3

I have seen this kind of code before and wondered why the developer had reversed the comparison. It is a defensive programming style: if another developer later changes == to = by accident, the code will not compile, because true is not a variable and the value of hungry cannot be assigned to it.

boolean hungry = Utils.isHungry();
if (true == hungry) {
    /* make some cookies */
}
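To see the three styles side by side, here is a small runnable sketch. The BooleanCheckDemo class and its method names are made up for illustration, and each method takes the flag as a parameter instead of calling Utils.isHungry().

```java
final class BooleanCheckDemo {

    // Example 1 style: '=' assigns true to hungry, so the branch always runs.
    static boolean withAssignment(boolean hungry) {
        if (hungry = true) {
            return true;
        }
        return false;
    }

    // Example 2 style: the flag is tested directly.
    static boolean direct(boolean hungry) {
        if (hungry) {
            return true;
        }
        return false;
    }

    // Example 3 style: constant on the left-hand side.
    static boolean constantFirst(boolean hungry) {
        // if (true = hungry) { }  // does not compile: 'true' is not a variable
        if (true == hungry) {
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(withAssignment(false)); // prints "true" (the bug)
        System.out.println(direct(false));         // prints "false"
        System.out.println(constantFirst(false));  // prints "false"
    }
}
```

In Java this trap only exists for boolean (or Boolean) variables. Something like if (count = 5) with an int is rejected by the compiler, because the condition of an if statement must be a boolean expression.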

In my opinion Example 2 is the cleanest, but now I at least realise why other developers might choose to use the style of Example 3.

16 February, 2014 by Denise << java, general >>

Why 0xCAFEBABE?

I'm guessing I'll get asked this question a fair number of times and that there'll be confusion around why I called my blog 'cafebabe' - do I consider myself some sort of 'babe' who hangs out in cafes?!

Well, the answer is a lot more geeky than anything like that. In the Java programming language, classes are compiled into class files containing bytecode that's then executed by the JVM. The first four bytes of every Java class file are the magic hexadecimal number 0xCAFEBABE. This marker simply identifies the class file format, so the JVM can refuse to load files that are definitely not valid class files.
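If you want to check this for yourself, the sketch below tests whether a byte array starts with the magic number. The ClassFileMagic class and hasClassFileMagic method are names invented for this example; the main method assumes you pass the path to a compiled .class file as the only argument.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;

final class ClassFileMagic {

    static final int MAGIC = 0xCAFEBABE;

    /** Returns true if the first four bytes, read big-endian, equal 0xCAFEBABE. */
    static boolean hasClassFileMagic(byte[] bytes) {
        if (bytes.length < 4) {
            return false;
        }
        // ByteBuffer reads in big-endian order by default, matching the class file format.
        return ByteBuffer.wrap(bytes, 0, 4).getInt() == MAGIC;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ClassFileMagic <path-to-file>");
            return;
        }
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(hasClassFileMagic(bytes)
                ? "starts with 0xCAFEBABE"
                : "does not start with 0xCAFEBABE");
    }
}
```

Running it against its own ClassFileMagic.class after compiling is an easy way to try it out.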

As for why 0xCAFEBABE was chosen as the magic number for Java, that is best explained by James Gosling, creator of the Java programming language:

"We used to go to lunch at a place called St Michael's Alley. According to local legend, in the deep dark past, the Grateful Dead used to perform there before they made it big. It was a pretty funky place that was definitely a Grateful Dead Kinda Place. When Jerry died, they even put up a little Buddhist-esque shrine. When we used to go there, we referred to the place as Cafe Dead. Somewhere along the line it was noticed that this was a HEX number. I was re-vamping some file format code and needed a couple of magic numbers: one for the persistent object file, and one for classes. I used CAFEDEAD for the object file format, and in grepping for 4 character hex words that fit after "CAFE" (it seemed to be a good theme) I hit on BABE and decided to use it. At that time, it didn't seem terribly important or destined to go anywhere but the trash-can of history. So CAFEBABE became the class file format, and CAFEDEAD was the persistent object format. But the persistent object facility went away, and along with it went the use of CAFEDEAD - it was eventually replaced by RMI."