Free and Certified MongoDB Online Courses

If you are interested in learning MongoDB, or are generally curious about non-relational approaches to data storage, I recommend checking out the online courses offered by MongoDB Inc. I promise you won’t be disappointed. MongoDB Inc.’s educational department – MongoDB University – offers five courses for developers and dev ops:

Continue reading

Transforming an XML Document into a CSV using XMLStarlet

In this little tutorial I am going to describe a handy tool for transforming an XML document into a more easily processable CSV format. There are many ways of getting this job done, but most are more tedious than necessary (like writing a custom-made regex parser – yuck!). Using XMLStarlet and XPath expressions this is going to be a cinch. Let’s evaluate a number of typical XML data configurations and turn them into a flat CSV structure.

Continue reading

How to Import a CSV into MongoDB using AWK

If the desired JSON object structure is just a flat set of simple attributes, the import can be done with mongoimport directly. But if some of the fields are supposed to be combined into an array or a sub-document, mongoimport won’t help you. In this tutorial I will show you how to transform a CSV into a collection of GeoJSON objects and, in the course of that, teach you the basics of AWK.
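The transformation described above, combining two flat CSV fields into a coordinates array inside a sub-document, can be sketched as follows. This is Python rather than AWK, and it is only an illustration under stated assumptions: the column names (`name`, `lon`, `lat`), the sample rows and the choice of GeoJSON `Feature` objects with `Point` geometries are hypothetical and not taken from the tutorial.

```python
import csv
import io
import json

# Hypothetical input; the column names are assumptions for illustration.
CSV_DATA = """name,lon,lat
Mainz,8.2473,49.9929
Berlin,13.405,52.52
"""


def csv_to_geojson_points(csv_text):
    """Turn each CSV row into a GeoJSON Feature with a Point geometry."""
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # GeoJSON orders coordinates as [longitude, latitude].
                "coordinates": [float(row["lon"]), float(row["lat"])],
            },
            "properties": {"name": row["name"]},
        })
    return features


features = csv_to_geojson_points(CSV_DATA)
# Emit one JSON document per line, a layout mongoimport can read.
for feature in features:
    print(json.dumps(feature))
```

Converting the coordinates to floats matters here: left as strings, they would not form a valid GeoJSON position.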

Continue reading

Talking to Twitter’s REST API v1.1 with R

In this post I am going to describe a very straightforward way to make use of Twitter’s REST API v1.1. I put some code together for that purpose, so that requesting data just needs the API URL, the API parameters and a vector containing the OAuth parameters.

Before you can get started you have to log in to your Twitter account on dev.twitter.com, create an application and generate an “Access Token” for it. So let’s jump right in and fetch the IDs of 10 followers of @hrw (Human Rights Watch). The necessary code is located on GitHub – download all three files and then you can just edit the example below as suggested.

Continue reading

Mondrian Schema for OLAP Cube Definition ft. Google Analytics and Saiku

What I am going to showcase in this tutorial is how to load web stats from Google Analytics into a fact table with Pentaho Kettle/PDI, and then how to represent that fact table with a Mondrian 3.6 schema so we can visualize the data with Saiku on Pentaho BI Server. In the end I’ll give my two cents on Saiku Analytics and possible options involving d3.js and Roland Bouman’s xmla4js.

In case you are new to this I recommend reading my articles on the following topics involved here:

Continue reading

Using the Dimension Lookup/Update Step in Pentaho Kettle

In a traditional star schema the dimensions are located in specialized tables which are referenced by numeric keys from the fact table. A dimension can represent anything from gender (“male”, “female”, “intersex”) through a hierarchy representing a location (“Germany”, “RLP”, “Mainz”) to an individual user’s profile (name, address, date of birth, …). Now thanks to Mr. Kimball we know there are different types of what he refers to as Slowly Changing Dimensions (SCD – “slowly” because they are expected to change only infrequently):

Continue reading

A StackOverflow for Business Intelligence – or what BI Can Learn from PHP!

A gamified, high-speed, high-quality Q&A site for topics revolving around making professional sense of a company’s data – a.k.a. “Business Intelligence” – wouldn’t that be awesome? And let’s face it – a question on how to configure a step in Pentaho Kettle does not fit any StackExchange site’s realm yet. Usually this type of question is asked on StackOverflow, but the feedback latency is quite high, to say the least. Or let’s take a question on how to design a KPI – this one usually ends up on CrossValidated but will often be greeted with disdain given its statistical triviality – plus most people in statistics are not working with BI and won’t be open to the subject’s specific intricacies. And finally you are wondering how to configure a MySQL RDBMS for a data warehouse – where to ask that? On dba.SE … I guess. And suddenly you get weird issues with Tomcat, which you need for Pentaho BI Server – hmmm, SuperUser? Or ServerFault?

It’s just too distributed!

Continue reading

FIR Filter Design and Digital Signal Processing in R

This article serves to illustrate that signal processing with R is possible – thanks to the signal package – and to keep a reference of some of the stuff that I learned in my last edX course. Anyway, I am by no means an expert on signal processing, so I’d prefer to let the pictures and the code speak for themselves. But to give you the idea – I showcase the creation and application of an FIR band-pass filter (Chebyshev Type 1 in this case) and of an FIR filter created using the Parks-McClellan method with the Remez exchange algorithm. The code snippets are taken from a larger R script which you can find on GitHub. I aim to focus on the essential parts. You’re welcome to share your knowledge and corrections by leaving a comment.
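For readers who want a feel for what creating and applying an FIR band-pass filter involves, here is a minimal sketch in plain Python. It deliberately uses the windowed-sinc method with a Hamming window, a simpler technique than the Chebyshev and Parks-McClellan designs covered in the article, and it is not the article's R code. The function names, the tap count and the band edges are assumptions chosen for illustration.

```python
import cmath
import math


def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def bandpass_fir(num_taps, f_low, f_high):
    """Windowed-sinc band-pass design.

    Frequencies are given in cycles per sample (0 to 0.5).
    """
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        # Ideal band-pass = difference of two ideal low-pass responses.
        ideal = 2 * f_high * sinc(2 * f_high * k) - 2 * f_low * sinc(2 * f_low * k)
        # The Hamming window tapers the truncated ideal response.
        hamming = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * hamming)
    return taps


def gain(taps, freq):
    """Magnitude of the frequency response at freq (cycles per sample)."""
    return abs(sum(h * cmath.exp(-2j * math.pi * freq * n)
                   for n, h in enumerate(taps)))


def apply_fir(taps, signal):
    """Direct-form convolution, returning as many samples as the input."""
    return [sum(h * signal[i - j] for j, h in enumerate(taps) if i - j >= 0)
            for i in range(len(signal))]


taps = bandpass_fir(101, 0.10, 0.20)
print("gain in passband:", gain(taps, 0.15))
print("gain at DC:", gain(taps, 0.0))
```

The coefficients come out symmetric around the middle tap, which is what gives this kind of FIR filter its linear phase.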

Continue reading

“Discrete Time Signals and Systems” at edX by Richard Baraniuk

Attending “Discrete Time Signals and Systems” by Richard Baraniuk from Rice University was an awesome experience on many levels. Right after “Learning from Data”, it is my second favorite MOOC so far. First of all, the subject of extracting a signal from a discrete time series in terms of its frequency composition is interesting in itself and provided a smooth opportunity for me to revise some of the math I studied many years ago. But this alone wouldn’t make a learning experience that superb – what it takes for that is a teacher who knows how to get the knowledge across to the student. And in that regard – apart from the science itself – Richard is a master! It was obvious how much effort Baraniuk and his team put into designing the course. Every detail of the lectures and the exercises seemed superbly well crafted. And this is apparently not by chance – after googling the professor’s name I found that he is actually something like a MOOC evangelist and very passionate about offering such an opportunity to learners around the world.

Continue reading