Illustrated Guide to ROC and AUC

(In a past job interview I failed at explaining how to calculate and interpret ROC curves, so here goes my attempt to fill this knowledge gap.) Think of a regression model that maps a number of features onto a real number (potentially a probability). That number can then be mapped onto one of two classes, depending on whether it is greater or lower than some freely chosen threshold. Let’s take a logistic regression on survival data from the Titanic disaster as an example to introduce the relevant concepts, which lead naturally to the ROC (Receiver Operating Characteristic) curve and its AUC or AUROC (Area Under the ROC Curve).
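The threshold idea can be sketched in a few lines of plain Python. This is only a minimal illustration under assumptions of my own: the scores and labels are made-up toy values (not the Titanic data), and the helper names roc_points and auc are hypothetical. Each distinct score is used as a threshold once, yielding one (false positive rate, true positive rate) point; the area under the resulting curve is then approximated with the trapezoidal rule.

```python
def roc_points(scores, labels):
    """Sweep the threshold over all scores and collect (FPR, TPR) points."""
    positives = sum(labels)
    negatives = len(labels) - positives
    # Highest scores first: lowering the threshold admits them one by one.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Handle tied scores together so they produce a single point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy data (assumption): predicted scores and true classes (1 = positive).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]

curve = roc_points(scores, labels)
area = auc(curve)
print(curve)
print(round(area, 3))  # 0.889, i.e. 8 of the 9 positive/negative pairs are ranked correctly
```

The last comment hints at a handy reading of the AUC: it equals the share of (positive, negative) pairs in which the positive case receives the higher score.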


Neural Nets with Caffe Utilizing the GPU

Caffe is an open-source deep learning framework originally created by Yangqing Jia which allows you to leverage your GPU for training neural networks. As opposed to other deep learning frameworks like Theano or Torch, you don’t have to program the algorithms yourself; instead you specify your network by means of configuration files. This approach is less time consuming than programming everything on your own, but it also forces you to stay within the boundaries of the framework. In practice this won’t matter most of the time, as the framework Caffe provides is quite powerful and continuously improved.


GPU Powered DeepLearning with NVIDIA DIGITS on EC2

In this tutorial I am going to show you how to set up CUDA 7, cuDNN, Caffe and DIGITS on a g2.2xlarge EC2 instance (running Ubuntu 14.04 64 bit) and how to get started with DIGITS. To illustrate how DIGITS is applied, I use a current Kaggle competition about detecting diabetic retinopathy and its stage from fluorescein angiography.

Convolutional Deep Neural Networks for Image Classification

For classification or regression on images you have two choices:

  • Engineering features by hand and, based on them, translating each image into a vector
  • Relying on a convolutional DNN to figure out the features


Guide to EC2 from the Command Line

This tutorial aims at guiding your first steps in controlling your EC2 instances from the command line. It is by no means even remotely complete, but it will give you an impression of the basic structure and concepts, so you can quickly fill in the gaps for your personal use case. The tutorial starts with setting up your account and then covers the whole path from requesting a Spot instance, through exchanging files with it and hooking up additional storage, to finally terminating it. I do not explain interaction with the AWS web console, though; we’ll only resort to it for some initial configuration. As usual the target audience is Linux users, but the AWS CLI tools are pretty much identical on Windows.


A Guide on OCR with tesseract 3.03

Tesseract is tough … so tough indeed, even Chuck Norris would have to check the manual twice. Not kidding you. Okay, so this article aims at structuring what I needed to learn about tesseract to OCR-convert PDFs to text and to train tesseract for new fonts. Let me dampen your expectations: you *will* have to read further texts (esp. the official documentation) to actually perform successful training! This text describes the usage of tesseract 3.03 RC on Ubuntu 14.04. Tesseract is also available for other Linuxes and for Windows, and the workflow will be mostly the same across OSes, though some of the commands I use are of course specific to Ubuntu. Also mind that tesseract 3.03 is considerably different from 3.02, which in turn differs from 3.01; some of the changes are more fundamental than you might expect from the version numbers.


Introduction to OpenCPU for R on EC2 with Python

OpenCPU is (simply put) a server implementing a RESTful web API for remotely executing R functions and retrieving the results. In this tutorial I am going to showcase how OpenCPU can be installed on an EC2 instance running Ubuntu 14.04. Python and its requests package come into play for conveniently handling the HTTP communication. Thanks first and foremost to the effort Jeroen Ooms put into developing OpenCPU and composing its documentation, the whole process is comparatively easy and painless.

In case you are merely interested in the API interactions, feel free to skip the first three sections. You can also install OpenCPU locally or simply use public.opencpu.org/ocpu. An IPython Notebook listing the successive API calls for public.opencpu.org/ocpu is available as well.
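To give an impression of what such an API call looks like, here is a minimal sketch. It rests on a few assumptions: it uses Python’s standard library (urllib) instead of the requests package so that it runs without extra installs, the helper names build_call and call_r are hypothetical, and it relies on my understanding that OpenCPU exposes R functions under /library/{package}/R/{function} and returns the result as JSON when /json is appended. The server address is the one mentioned above and may not be reachable any more, so the example only prepares the request and does not send it.

```python
import json
import urllib.parse
import urllib.request

# Server named in the text; treat its availability as an assumption.
OCPU_ROOT = "https://public.opencpu.org/ocpu"


def build_call(package, function, args, root=OCPU_ROOT):
    """Prepare (but do not send) a POST request that calls an R function."""
    url = f"{root}/library/{package}/R/{function}/json"
    # Arguments are sent form-encoded in the request body.
    body = urllib.parse.urlencode(args).encode("ascii")
    return urllib.request.Request(url, data=body, method="POST")


def call_r(package, function, args, root=OCPU_ROOT):
    """Send the request and decode the JSON result (needs network access)."""
    with urllib.request.urlopen(build_call(package, function, args, root)) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Roughly equivalent to calling stats::rnorm(n = 3, mean = 10) in R.
request = build_call("stats", "rnorm", {"n": 3, "mean": 10})
print(request.get_method(), request.full_url)
print(request.data)
```

Calling call_r("stats", "rnorm", {"n": 3, "mean": 10}) against a reachable OpenCPU server should then return a list of three random numbers.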


OAuth 2.0 for Google (Analytics) API with Python Explained

In this tutorial I am going to explain how OAuth 2.0 works and how to apply it for interacting with the Google Analytics API using Python. Google provides a Python package for that purpose, which so far only supports Python 2 though … well.

OAuth2 seems to be quite a mess at first and Google’s documentation on this subject is not that well organized in my opinion. So with this article I do my best to save you the sweat I had to invest. After all it’s not that complicated anyway, as you will probably agree.
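The core of the authorization-code flow fits into a short sketch. Note the assumptions: it uses only Python’s standard library rather than Google’s package, the helper names and all credential values are hypothetical placeholders, and the endpoint and scope URLs are the ones I believe Google currently documents, so please verify them before relying on them. Step one sends the user to Google with a URL like the one built below; step two exchanges the code that Google hands back for an access token.

```python
import urllib.parse

# Assumed Google endpoints and read-only Analytics scope; verify against the docs.
AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"
TOKEN_ENDPOINT = "https://oauth2.googleapis.com/token"
SCOPE = "https://www.googleapis.com/auth/analytics.readonly"


def authorization_url(client_id, redirect_uri, state):
    """Step 1: URL the user opens to grant access to their data."""
    params = {
        "response_type": "code",  # ask for an authorization code
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": SCOPE,
        "state": state,  # random value echoed back, guards against CSRF
    }
    return AUTH_ENDPOINT + "?" + urllib.parse.urlencode(params)


def token_request_body(client_id, client_secret, redirect_uri, code):
    """Step 2: form body to POST to TOKEN_ENDPOINT in exchange for a token."""
    return urllib.parse.urlencode({
        "grant_type": "authorization_code",
        "code": code,
        "client_id": client_id,
        "client_secret": client_secret,
        "redirect_uri": redirect_uri,
    })


# Placeholder credentials (hypothetical); real ones come from Google's developer console.
url = authorization_url("my-client-id", "http://localhost:8080/callback", "xyz123")
body = token_request_body("my-client-id", "my-secret",
                          "http://localhost:8080/callback", "code-from-google")
print(url)
print(body)
```

The response to the second step contains the access token, which is then sent along with each API request.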


As a Data Scientist it is my Obligation to support #nobagida, #nopegida and any other #no[a-z]{2}gida today :)

Political Opinion on a Scale from 0 to 2π


Just came back with my girlfriend from the demonstration at Sendlinger Tor. I noticed quite a few Palestinian flags being waved around, which is fair enough, but I thought to myself that I would actually like to see one or two Israeli flags as well. Later we crossed the street to have a look at the pegida guys, where I noticed no fewer than two Israeli flags. That was kind of weird … but of course a lot of pegida’s presentation revolves around emphasizing how not-Nazi they are, which is slightly odd given the occasional neo-Nazi hanging around with them. Also, given their focus on how bad Muslims are, it might seem plausible to those poorly educated people to show off how pro-Semitic they are, b/c Jews supposedly share some of their views.


Germans used to have more Sex in Summer!

Wow, what a headline … okay, I admit it’s phrased quite sensationally, given that it anticipates just one possible interpretation of the growing share of births around summer / autumn compared to spring … but I guess I just get more proactive at marketing with every post I publish!

Okay, enough of that. Here’s the deal: looking at the seasonal component of live births by month reveals strong trends. Relative to around 2010, the 60s saw more births in the first half of the year, and the relation reverses in the second half. Now, assuming a standard pregnancy duration of 9 months, we can calculate the maximum-likelihood month of intercourse, and that’s where it gets juicy! BTW, fewer children being born in spring means less anxious generations ahead, but more on that later.
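The month arithmetic behind that calculation is simple enough to sketch. This is merely an illustration under the fixed 9-month assumption stated above, not an actual maximum-likelihood estimate, and the function name conception_month is a hypothetical helper: the birth month is shifted back by nine months, wrapping around the turn of the year.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def conception_month(birth_month, gestation_months=9):
    """Shift a birth month (1-12) back by the assumed pregnancy duration."""
    # Work with 0-11 internally so the modulo wraps around the year correctly.
    return (birth_month - 1 - gestation_months) % 12 + 1


for birth in (3, 9):
    conceived = conception_month(birth)
    print(MONTHS[birth - 1], "birth ->", MONTHS[conceived - 1], "conception")
# Mar birth -> Jun conception
# Sep birth -> Dec conception
```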
