Introduction to OpenCPU for R on EC2 with Python

OpenCPU is (simply put) a server implementing a RESTful web API for remotely executing R functions and retrieving their results. In this tutorial I am going to show how OpenCPU can be installed on an EC2 instance running Ubuntu 14.04. Python and its requests package come into play for conveniently handling the HTTP communication. First and foremost: thanks to the effort Jeroen Ooms put into developing OpenCPU and writing its documentation, the whole process is comparatively easy and painless.

In case you are merely interested in the API interactions, feel free to skip the first three sections. You can also install OpenCPU locally or simply use an existing OpenCPU server. An IPython Notebook listing the successive API calls can be found here.

Setting Up OpenCPU in the Cloud

  1. Sign up for an AWS account
  2. Go to the home page of your AWS console and choose EC2
  3. Click “Launch Instance” and choose “Ubuntu Server 14.04 LTS”
  4. Try a t2.micro instance for free, or go pro for about $0.13 per hour and choose c3.large (recommended)
  5. Keep the default settings, except in step 6, “Configure Security Group”, where you add rules for “HTTP” and “HTTPS”, both with “Source” set to “My IP”
  6. After clicking “Launch”, you are asked whether to use an existing or a newly created key pair. Create a new one and store the .pem file, e.g. in ~/.ssh
  7. In the EC2 dashboard under “Instances” you can now monitor the state of your instance, which should show “running” after a few seconds

Connect to your Instance

  1. Restrict permissions of key file:  chmod 400 ~/.ssh/amazon-aws.pem
  2. Connect via SSH:  ssh -i ~/.ssh/amazon-aws.pem ubuntu@[OCPU]

(replace [OCPU] with the domain or IP of your OpenCPU server)

Install R and OpenCPU

  1. sudo add-apt-repository ppa:opencpu/opencpu-1.4
  2. sudo apt-get update
  3. sudo apt-get install r-base r-base-dev
  4. sudo apt-get install opencpu  (go with suggested defaults)
  5. http://[OCPU]/ocpu  should now bring you to your OpenCPU API Explorer :)
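The browser check in step 5 can also be done from Python. Below is a minimal sketch. The helper names `ocpu_base_url` and `server_is_up` are hypothetical, and `host` stands in for [OCPU]. `requests` is imported inside the function that performs the HTTP call, so the pure URL helper works without it.

```python
def ocpu_base_url(host: str, scheme: str = "http") -> str:
    """Return the OpenCPU root URL, e.g. http://203.0.113.10/ocpu."""
    return f"{scheme}://{host}/ocpu"


def server_is_up(host: str, timeout: float = 5.0) -> bool:
    """Return True if the OpenCPU root URL answers with a successful status."""
    # requests is third-party; importing it here keeps ocpu_base_url usable without it
    import requests

    try:
        # requests follows redirects for GET by default
        response = requests.get(ocpu_base_url(host), timeout=timeout)
    except requests.RequestException:
        # connection refused, timeout, DNS failure, ...
        return False
    return response.ok
```

If `server_is_up(host)` returns False, the security group rules from the setup section are a likely first suspect.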

Remotely Calling Procedures over HTTP

Open your favorite Python console and import json and requests. Let’s start with something very simple: calculating the mean of a vector using base::mean().

In short, an RPC here is a POST request to a URL with a path of the following structure:

/ocpu/library/[library name]/R/[function name]

The function’s arguments are passed as the request’s payload (2). For that purpose we provide the list/array/vector as a serialized JSON array (1). The response is a list of session-relative paths which lead to data regarding our RPC call (3). The first one (4) represents the result of the calculation: 3. By appending /json to the request URL, the response already contains the result as JSON (5). In total, the response lists six such paths.
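The call described above can be sketched as follows. This is not the original notebook's code. The names `rpc_url`, `session_paths` and `call_mean` are hypothetical, and `host` stands in for [OCPU]. The comments (1) to (5) mirror the numbered steps in the paragraph above.

```python
import json


def rpc_url(host: str, package: str, function: str, as_json: bool = False) -> str:
    """Build http://<host>/ocpu/library/<package>/R/<function>, optionally with /json."""
    url = f"http://{host}/ocpu/library/{package}/R/{function}"
    return url + "/json" if as_json else url


def session_paths(response_text: str) -> list[str]:
    """Split OpenCPU's plain-text reply (one path per line) into a list of paths."""
    return [line.strip() for line in response_text.splitlines() if line.strip()]


def call_mean(host: str, values: list, as_json: bool = False):
    """Call base::mean() remotely; return the session paths or, with as_json, the result."""
    import requests  # third-party; imported here so the helpers above need only the stdlib

    payload = {"x": json.dumps(values)}  # (1) the vector as a serialized JSON array
    response = requests.post(rpc_url(host, "base", "mean", as_json), data=payload)  # (2)
    response.raise_for_status()
    if as_json:
        return response.json()  # (5) the body already holds the result
    return session_paths(response.text)  # (3) session-relative paths; (4) is the first one
```

With `host` set to your server's address, `call_mean(host, [1, 2, 3, 4, 5], as_json=True)` should come back with the mean, 3, wrapped in a JSON array.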


Chained RPC with Graphical Output

Now we are going to fit a simple linear model to the cars data set and plot it.
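A minimal sketch of the model fit follows. It assumes stats::lm() is called with the unquoted arguments "speed~dist" and "cars", as discussed further below. The helpers `session_key` and `fit_cars_model` are hypothetical. The session ID is read from the first returned path, which has the form /ocpu/tmp/<session ID>/..., as in the URLs used later in this post.

```python
def session_key(path: str) -> str:
    """Extract the session ID from a path such as /ocpu/tmp/x0f51cfc661/R/.val."""
    parts = path.strip().strip("/").split("/")
    # Expected layout: ["ocpu", "tmp", "<session ID>", ...]
    if len(parts) < 3 or parts[:2] != ["ocpu", "tmp"]:
        raise ValueError(f"not an OpenCPU session path: {path!r}")
    return parts[2]


def fit_cars_model(host: str) -> str:
    """Fit lm(speed ~ dist, data = cars) remotely and return the session ID."""
    import requests  # third-party; imported here so session_key needs only the stdlib

    # Unquoted values are evaluated as R code: a formula and the built-in data frame
    payload = {"formula": "speed~dist", "data": "cars"}
    response = requests.post(f"http://{host}/ocpu/library/stats/R/lm", data=payload)
    response.raise_for_status()
    # Every returned path lives under /ocpu/tmp/<session ID>/, so the first one suffices
    return session_key(response.text.splitlines()[0])
```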

Well, that is all nice and dandy, but of course this output is hardly usable programmatically: it is just arbitrarily structured text. To receive something digestible, we would have to provide and invoke a custom function which returns, for example, a well-structured serialized JSON representing the object’s relevant features. Nonetheless, it is possible to access fields of an object. In this case the object representing the linear model is an R list, so we can apply base::get() to it. The resulting object of a previous session is referenced by the ID of that session.

Two things are important about the last call:

  1. The argument value for "x" is passed as "'coefficients'", so OpenCPU handles it as a string and not as the name of an object.
  2. The argument value for "pos" is passed as "x0f51cfc661", so OpenCPU does handle it as a reference to an object, and this object happens to be the result of the session with that ID. The same logic applies to "speed~dist" and "cars" above, which do not represent strings but actual objects (a formula and a data frame).
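The two points above boil down to how the payload is written. The sketch below uses the hypothetical helpers `coefficients_payload` and `get_coefficients`. It posts to base::get() with /json appended, so the coefficients come back parsed rather than as printed text. The exact JSON shape of a named numeric vector depends on OpenCPU's serializer and is not asserted here.

```python
def coefficients_payload(session_id: str) -> dict:
    """Arguments for base::get() that pull 'coefficients' out of a stored lm object."""
    return {
        # Quoted: OpenCPU passes the character string "coefficients"
        "x": "'coefficients'",
        # Unquoted session ID: OpenCPU passes that session's result (the lm list)
        "pos": session_id,
    }


def get_coefficients(host: str, session_id: str):
    """Call base::get() remotely and return its result parsed from JSON."""
    import requests  # third-party; imported here so coefficients_payload needs only the stdlib

    url = f"http://{host}/ocpu/library/base/R/get/json"
    response = requests.post(url, data=coefficients_payload(session_id))
    response.raise_for_status()
    return response.json()
```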

For a scatter plot of the data with an overlaid regression line we would have to write a custom function again, because this requires two successive function calls (first to plot() and then to abline()), and those we cannot chain. Of course this is no big deal and just a matter of doing it. But let’s see what happens if we plot the linear model.
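Plotting the stored model can be sketched as a call to graphics::plot() with the session ID as its argument. For an lm object, R dispatches to plot.lm(), which draws several diagnostic plots, so presumably several graphics paths are returned. The helpers `graphics_paths` and `plot_model` are hypothetical.

```python
def graphics_paths(response_text: str) -> list[str]:
    """Keep only the paths that point at recorded plots (.../graphics/<n>)."""
    return [line.strip() for line in response_text.splitlines() if "/graphics/" in line]


def plot_model(host: str, session_id: str) -> list[str]:
    """Call plot() on a stored lm object and return the paths of the resulting graphics."""
    import requests  # third-party; imported here so graphics_paths needs only the stdlib

    url = f"http://{host}/ocpu/library/graphics/R/plot"
    response = requests.post(url, data={"x": session_id})  # session ID = the lm object
    response.raise_for_status()
    return graphics_paths(response.text)
```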

You can now, for example, access the first graphic at http://[OCPU]/ocpu/tmp/x0dd5b7a086/graphics/1. This responds with an image of type PNG, and by appending e.g. /png?width=300&height=300 you can even specify its size.
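Building such a sized PNG URL and saving the image can be sketched as follows. The helpers `graphic_url` and `save_graphic` are hypothetical. The URL layout, /ocpu/tmp/<session>/graphics/<n>/png?width=...&height=..., is taken from the example above.

```python
from urllib.parse import urlencode


def graphic_url(host, session_id, index=1, fmt="png", width=None, height=None):
    """Build .../ocpu/tmp/<session>/graphics/<index>/<fmt>, optionally with a size."""
    url = f"http://{host}/ocpu/tmp/{session_id}/graphics/{index}/{fmt}"
    size = {"width": width, "height": height}
    params = {key: value for key, value in size.items() if value is not None}
    return f"{url}?{urlencode(params)}" if params else url


def save_graphic(url: str, filename: str) -> None:
    """Download a graphic and write the raw bytes to filename."""
    import requests  # third-party; imported here so graphic_url needs only the stdlib

    response = requests.get(url)
    response.raise_for_status()
    with open(filename, "wb") as fh:
        fh.write(response.content)
```

For example, `save_graphic(graphic_url(host, "x0dd5b7a086", width=300, height=300), "plot1.png")` would store a 300×300 version of the first plot, assuming that session still exists on the server.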

Custom Functions

If you want to use your own functions, you have to organize them in an R package and install it as usual from a root R session (which you may start with sudo -i R). That’s because OpenCPU only finds packages installed on a global level.

(original article published on