18 August, 2014 by Denise << java, play-framework, api >>

Screen Scraping Secure Pages

Recently I needed an API for some data I wanted to use in an app. The data was behind a secure login with no API exposed, so I decided to screen scrape it. Scraping pages that sit behind a login is a little trickier, though, so I wrote some code that fetches the login page, parses out the cookies and the authenticity token, and then POSTs those, along with the user-provided login credentials, to the form's URL.

The code looks like this:

Note that you should only do this kind of scraping if you're the owner of the site and can't or don't have time to implement a proper API. Additionally, this code is really brittle and any structural changes to the DOM will probably cause your screen scraping to fail.

One (reasonable) use case for something like this might be an old website that, for whatever reason, you can't update to add a real API, but whose information you still need programmatic access to.

Another thing to consider: because the username and password are passed straight through our code, if you decide to expose the code as a public API, it should be served over HTTPS only.
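The login flow described at the start of this post comes down to three small pieces of string handling. The sketch below is a hypothetical illustration, not the code from this post: the object and method names are made up, and it assumes a Rails-style hidden form field called `authenticity_token` (other sites will differ). It only covers the parsing and form-building; the two HTTP calls around it (a GET of the login page, then a POST of the form) are left to whichever HTTP client you use. Like any scraper, the regex is brittle and will break if the markup changes.

```scala
import java.net.URLEncoder

// Hypothetical helpers for the login flow: read the token from the login
// page, carry its cookies forward, and build the POST body with credentials.
object LoginFormSketch {
  // Assumes a Rails-style hidden input named "authenticity_token"; any other
  // attributes may appear between name and value.
  private val TokenPattern =
    """name="authenticity_token"[^>]*?value="([^"]*)"""".r

  /** Extracts the authenticity token from the login page HTML, if present. */
  def extractToken(html: String): Option[String] =
    TokenPattern.findFirstMatchIn(html).map(_.group(1))

  /** Collapses Set-Cookie response headers into a single Cookie request header,
    * keeping each name=value pair and dropping attributes such as Path or HttpOnly. */
  def cookieHeader(setCookies: Seq[String]): String =
    setCookies.map(_.split(";", 2)(0).trim).filter(_.nonEmpty).mkString("; ")

  /** Builds an application/x-www-form-urlencoded body for the login POST. */
  def formBody(fields: Seq[(String, String)]): String =
    fields.map { case (k, v) =>
      URLEncoder.encode(k, "UTF-8") + "=" + URLEncoder.encode(v, "UTF-8")
    }.mkString("&")
}
```

In use, you would call `extractToken` on the login page body, `cookieHeader` on its `Set-Cookie` headers, and then send `formBody(Seq("authenticity_token" -> token, "username" -> user, "password" -> pass))` with that `Cookie` header set (the form field names here are also assumptions).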

01 July, 2014 by Denise << programming, scala, play-framework >>

CORS and (Scala) Play

What is CORS?

CORS stands for 'Cross-Origin Resource Sharing'. It's a way for a website to access resources (e.g. via AJAX calls) that live on a different origin, meaning a different scheme, domain or port. Even a port difference counts: http://localhost:1111 is not considered the same origin as http://localhost:2222. Browsers restrict these sorts of requests because of the same-origin security policy, and CORS is the mechanism that lets a server relax that restriction by sending extra HTTP headers.
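To make the "same origin" rule concrete, here is a small sketch using `java.net.URI`: two URLs share an origin only when scheme, host and port all match, with default ports filled in for http and https. The `Origins` object is a hypothetical name, and the sketch deliberately ignores edge cases such as internationalised domain names.

```scala
import java.net.URI

object Origins {
  // Fill in default ports so that http://example.com and
  // http://example.com:80 are treated as the same origin.
  private def effectivePort(uri: URI): Int =
    if (uri.getPort != -1) uri.getPort
    else uri.getScheme.toLowerCase match {
      case "http"  => 80
      case "https" => 443
      case _       => -1
    }

  /** True only if scheme, host and (effective) port all match; the path is ignored. */
  def sameOrigin(a: String, b: String): Boolean = {
    val (ua, ub) = (new URI(a), new URI(b))
    ua.getScheme.equalsIgnoreCase(ub.getScheme) &&
      ua.getHost.equalsIgnoreCase(ub.getHost) &&
      effectivePort(ua) == effectivePort(ub)
  }
}
```

With this, `Origins.sameOrigin("http://localhost:1111", "http://localhost:2222")` is false even though only the port differs, which is exactly the case that trips up a local front end talking to a local API.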

When might you need to deal with CORS?

You'll often need to deal with CORS when you have an API serving JSON that you want to call from your fancy JavaScript MV* framework application. You might deploy your back end to Heroku or AWS and your front end code to CloudFront, serving it from a custom domain. If your back end doesn't implement CORS, your front end's requests to it will fail with a CORS error in the browser.

Writing a CORS filter in Play

To implement CORS in the Play Framework, you'll need to create a filter. Mine looks like this, and you can see that all it really does is set a bunch of headers:

package filters

import play.api.mvc._
import scala.concurrent.ExecutionContext.Implicits.global

// Adds CORS headers to every response that passes through the filter chain.
class CorsFilter extends EssentialFilter {
  def apply(next: EssentialAction) = new EssentialAction {
    def apply(requestHeader: RequestHeader) = {
      // Run the rest of the chain, then decorate the result it produces.
      next(requestHeader).map { result =>
        result.withHeaders(
          // "*" allows any origin; list specific origins to lock this down.
          "Access-Control-Allow-Origin" -> "*",
          "Access-Control-Expose-Headers" -> "WWW-Authenticate, Server-Authorization",
          "Access-Control-Allow-Methods" -> "POST, GET, OPTIONS, PUT, DELETE",
          "Access-Control-Allow-Headers" -> "x-requested-with,content-type,Cache-Control,Pragma,Date")
      }
    }
  }
}

You will also need to register this filter in your Global object (Global.scala), which you might need to create if it doesn't already exist in your application. Mine looks like this:

import filters.CorsFilter
import play.api.GlobalSettings
import play.api.mvc.WithFilters

object Global extends WithFilters(new CorsFilter) with GlobalSettings


At one point my filter wasn't being executed no matter what I did. I hooked up the debugger and stepped through the code, but the filter code never ran. Eventually, I found out that the Global object must be in the default (root) package of your Play application, unless you point Play at it with the application.global setting in application.conf. I had put mine in its own utils package, which was why it was never being called.

Browsers also do something extra for more complex requests (like a POST with a Content-Type of application/json, which is pretty common in these RESTful API style applications): they send what's called a 'preflight request'. This is where the browser checks whether it is allowed to make the request before performing the full request. It does this by sending an HTTP OPTIONS request, which your application also needs to handle. The way I did this was to define an additional route in my routes file like this:

OPTIONS        /*all                                            controllers.Application.preflight(all: String)

which is implemented like so in my controller:

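  // Answers the browser's OPTIONS preflight for any path; `all` captures the
  // wildcard path from the route but isn't needed to build the response.
  def preflight(all: String) = Action {
    Ok("").withHeaders("Access-Control-Allow-Origin" -> "*",
      "Allow" -> "*",
      "Access-Control-Allow-Methods" -> "POST, GET, PUT, DELETE, OPTIONS",
      "Access-Control-Allow-Headers" -> "Origin, X-Requested-With, Content-Type, Accept, Referer, User-Agent")
  }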
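The preflight rules can be modelled roughly as two checks: the browser decides whether a request is "simple" enough to skip the OPTIONS call, and, if it isn't, it checks that every header it wants to send is listed in the response's Access-Control-Allow-Headers. The sketch below is a simplified, hypothetical model based on the Fetch standard's CORS-safelisted methods, headers and Content-Type values; it ignores the extra value restrictions on safelisted headers and the `*` wildcard, and the `PreflightSketch` name is made up.

```scala
object PreflightSketch {
  // Methods, Content-Type values and header names treated as "simple".
  private val SafelistedMethods = Set("GET", "HEAD", "POST")
  private val SafelistedContentTypes =
    Set("application/x-www-form-urlencoded", "multipart/form-data", "text/plain")
  private val SafelistedHeaders =
    Set("accept", "accept-language", "content-language", "content-type")

  /** Simplified: true if a browser would send an OPTIONS preflight first.
    * `headers` are the header names the page sets explicitly. */
  def needsPreflight(method: String, contentType: Option[String], headers: Set[String]): Boolean = {
    // Compare only the MIME type, ignoring parameters such as "; charset=UTF-8".
    val mime = contentType.map(_.split(";")(0).trim.toLowerCase)
    val unsafeMethod = !SafelistedMethods.contains(method.toUpperCase)
    val unsafeType = mime.exists(t => !SafelistedContentTypes.contains(t))
    val unsafeHeader = headers.exists(h => !SafelistedHeaders.contains(h.toLowerCase))
    unsafeMethod || unsafeType || unsafeHeader
  }

  /** Requested header names not covered by an Access-Control-Allow-Headers value
    * (compared case-insensitively). */
  def missingHeaders(requested: Seq[String], allowHeaders: String): Seq[String] = {
    val allowed = allowHeaders.split(",").map(_.trim.toLowerCase).toSet
    requested.filterNot(h => allowed.contains(h.toLowerCase))
  }
}
```

For example, `needsPreflight("POST", Some("application/json"), Set("Content-Type"))` is true, which is why a JSON API needs the OPTIONS route at all, and `missingHeaders(Seq("Content-Type", "Authorization"), "Origin, X-Requested-With, Content-Type, Accept")` returns `Seq("Authorization")`: if your front end starts sending an extra header, it has to be added to the allowed list too.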


Implementing CORS is often painful and more time-consuming than it really should be, so I hope this can help someone else out. Perhaps I should write a Play framework module so this code doesn't have to be rewritten every time someone needs to add CORS to their app!


I've created a Play module to implement the above CORS functionality. It's been published to Maven Central. If you have any ideas for improvements or find a bug, please feel free to raise a GitHub issue or send a pull request through!

GitHub repo is here: https://github.com/rowdyrabbit/play-cors

Artifact in Maven Central: