LOCF and Linear Imputation with PostgreSQL

This tutorial introduces various tools offered by PostgreSQL, and by SQL in general (custom functions, window functions, aggregate functions and the WITH clause, or CTE for Common Table Expression), for the purpose of implementing a program that imputes numeric observations within a column, applying linear interpolation where possible and forward and backward padding where necessary. I’m going to add and explain those constructs progressively, step by step, so no problem if you are new to the scene. I am very much interested in input regarding potential downsides of the implementation and possible improvements.
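The imputation rule itself does not depend on SQL. As a minimal sketch of the intended behaviour, here it is in Python rather than PostgreSQL. This is not the implementation the tutorial builds. It assumes that missing values are represented as None and that observations are equally spaced, so the list index serves as the time axis. The function name impute is hypothetical.

```python
from typing import List, Optional


def impute(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill gaps in an equally spaced series (assumption: index = time).

    - interior gaps: linear interpolation between the neighbouring observations
    - leading gaps: backward padding with the first observation
    - trailing gaps: forward padding (LOCF) with the last observation
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        return list(values)  # nothing to impute from
    result = list(values)
    first, last = known[0], known[-1]
    # Backward padding: everything before the first observation takes its value.
    for i in range(first):
        result[i] = values[first]
    # Forward padding (LOCF): everything after the last observation carries it forward.
    for i in range(last + 1, len(values)):
        result[i] = values[last]
    # Linear interpolation between each pair of consecutive observations.
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            result[i] = values[left] + step * (i - left)
    return result
```

For example, impute([None, 1.0, None, 3.0, None]) returns [1.0, 1.0, 2.0, 3.0, 3.0]. In SQL the same logic has to be expressed over ordered rows, which is where window functions come in.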

Continue reading

Guide to EC2 from the Command Line

This tutorial aims to guide your first steps in controlling your EC2 instances from the command line. It is by no means complete, but it will give you an impression of the basic structure and concepts, so you can quickly fill in the gaps for your personal use case. The tutorial starts with setting up your account and then walks you from requesting a Spot instance, through exchanging files with it and hooking up additional storage, to finally terminating it. I am not, though, explaining interaction with the AWS web console; we’ll only resort to it for some initial configuration. As usual, the target audience is Linux users, but the AWS CLI tools are pretty much identical on Windows.

Continue reading

As a Data Scientist it is my Obligation to support #nobagida, #nopegida and any other #no[a-z]{2}gida today :)

Political Opinion on a Scale from 0 to 2?


Just came back with my girlfriend from the demonstration at Sendlinger Tor. Noticed quite a few Palestinian flags being waved around – fair enough – but I thought to myself that I would actually like to see one or two Israeli flags as well. Later we went across the street to have a look at the pegida guys, and I noticed no fewer than two Israeli flags there. That was kind of weird … but of course, for pegida a lot of their presentation revolves around emphasizing how not-Nazi they are, which is slightly odd given the occasional neo-Nazi hanging around with them. Also, given their focus on how bad Muslims are, to those poorly educated people it might seem plausible to show off how philosemitic they are because Jews supposedly share some of their views.

Continue reading

Humor is a powerful, alternative Method for processing Data and reporting Results.


“Je n’ai pas peur des représailles. Je n’ai pas de gosses, pas de femme, pas de voiture, pas de crédit. Ça fait sûrement un peu pompeux, mais je préfère mourir debout que vivre à genoux.”

(“I am not afraid of reprisals. I have no kids, no wife, no car, no debt. It might sound a bit pompous, but I’d rather die on my feet than live on my knees.”)

Charb – Interview 2012

Transforming an XML Document into a CSV using XMLStarlet

In this little tutorial I am going to describe a handy tool for transforming an XML document into the more easily processable CSV format. There are many ways of getting this job done, but most are more tedious than necessary (like writing a custom-made RegEx parser – yuck!). Using XMLStarlet and XPath expressions, this is going to be a cinch. Let’s evaluate a number of typical XML data configurations and turn them into a flat CSV structure.
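XMLStarlet itself is a command-line tool. The underlying idea is to select one node per CSV row with an XPath expression and then pick the columns relative to that node. As a rough stand-in for that idea, here is a minimal Python sketch using the standard library’s xml.etree.ElementTree, which supports only a limited XPath subset. This is not XMLStarlet. The sample document, the function name xml_to_csv and the “@attr” column convention are all hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical input: one <book> element per row, with an attribute and child elements.
SAMPLE = """<catalog>
  <book id="1"><title>Foo</title><year>2001</year></book>
  <book id="2"><title>Bar</title><year>2005</year></book>
</catalog>"""


def xml_to_csv(xml_text: str, row_path: str, fields: dict) -> str:
    """Flatten XML into CSV text.

    row_path selects one element per row (ElementTree's limited XPath);
    fields maps a column name to a child path, or to '@name' for an attribute.
    Missing children or attributes become empty cells.
    """
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(fields.keys())  # header row
    for row in root.iterfind(row_path):
        cells = []
        for path in fields.values():
            if path.startswith("@"):
                cells.append(row.get(path[1:], ""))  # attribute value
            else:
                cells.append(row.findtext(path, default=""))  # child element text
        writer.writerow(cells)
    return out.getvalue()


print(xml_to_csv(SAMPLE, "./book", {"id": "@id", "title": "title", "year": "year"}))
```

Run as is, this prints a header line id,title,year followed by one line per book. Deeper nesting or repeated children need more thought, which is exactly where full XPath support, as in XMLStarlet, pays off.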

Continue reading

A StackOverflow for Business Intelligence – or what BI Can Learn from PHP!

Update 2015-08-25:

The proposal was not successful and has been deleted :(


A gamified, high-speed, high-quality Q&A site for topics revolving around professionally making sense of a company’s data – a.k.a. “Business Intelligence” – wouldn’t that be awesome? And let’s face it – asking a question on how to configure a step in Pentaho Kettle does not yet fit into any StackExchange site’s scope. Usually this type of question is asked on StackOverflow, but the feedback latency is quite high, to say the least. Or let’s take a question on how to design a KPI – this one usually ends up on CrossValidated but will often be greeted with disdain given its statistical triviality; plus, most people in statistics do not work with BI and won’t be open to the subject’s specific intricacies. And finally, you are wondering how to configure a MySQL RDBMS for a data warehouse – where do you ask that? On dba.SE … I guess. And suddenly you get weird issues with Tomcat, which you need for the Pentaho BI Server – hmmm, SuperUser? Or ServerFault?

It’s just too distributed!

Continue reading

Social Network Analysis by Lada Adamic on coursera

Did you know that top researchers and universities from all over the world offer their knowledge in structured and partly certified online courses? Well, now you know! Those courses are referred to as MOOCs, which stands for “Massive Open Online Courses”, and for me they are one of THE digital discoveries of 2013. The three biggest platforms are currently edX, coursera and Udacity. I am following courses on all three of them, and sometimes I really can’t believe how awesome this opportunity is.

Continue reading