Mondrian Schema for OLAP Cube Definition ft. Google Analytics and Saiku

data-insightsWhat I am going to showcase in this tutorial is how to load web stats from Google Analytics into a fact table with Penthao Kettle/PDI. And then how to represent that fact table with Mondrian 3.6 schema so we can visualize the data with Saiku on Pentaho BI Server. In the end I’ll give my two cents on Saiku Analytics and possible options involving d3.js and Roland Bouman‘s xmla4js.

In case you are new to this I recommend reading my articles on the following topics involved here:

OLAP Setup

The RDBMS I’ll use is MySQL and the OLAP cube consists of four tables:

  • facts_web_stats
  • dim_time
  • dim_location
  • dim_source

The schema and creation of dim_time is described in “Setting Up a Time Dimension Table in MySQL“. The schemas for the other three tables are as follows:

 In case the purpose of columns version, date_from and date_to is obscure to you I recommend to check out “Using the Dimension Lookup/Update Step in Pentaho Kettle“. The quick answer is that we are going to represent the dimensions as SCD of Type 2 and hence need those columns for versioning. The brutal truth is that I treat them as SCD Type 2 instead of Type 1 simply so I can apply the “Dimension Lookup/Update” step.

The column “category” in dim_source I added and populated manually. It’s summarizing similar sites into categories. F.x. all Google search sites end up in “google”, the other non-Google search engines end up in “non_google_se” – and so on.

Loading the Data from GA into the Data Warehouse

Thanks to Kettle taking care of all the intricate details regarding querying the GA API and managing the dimensions we end up with a rather simple transformation:

web-stats-trafoYou can download the transformation file here. It’s pretty straightforward – just notice that I use the output type “String” with format “yyyyMMdd” for the date within the Google Analytics step. The field is implicitely cast to integer when it comes to inserting the facts to facts_web_stats – and the ID for date May 3 2014 is 20140503.

Describing the Cube with Mondrian 3.6 Schema

To be able to access those tables as a cube we need to represent the structure with an XML schema that Mondrian 3.6 understands. Because what happens under the hood – while merrily clicking around in Saiku crafting pretty looking data insight to save our imaginative multi-national enterprise billions of dollars every year – is that Saiku translates what you do into an MDX query. Finally the MDX is then translated by Mondrian into plain old SQL to actually query the data from MySQL. Okay there you go:

I guess the example is simple enough to make quickly sense of the meaning of the various details. The official documentation you may find here. A couple of words are in order though.

“Schema” is the root element which contains one or more cubes. At the center of every “Cube” we have a fact table. A cube is made up of dimensions (“Dimension”) and facts (“Measure”). The dimensions contain one or more hierarchies. That might seem confusing first but makes sense when you think of what f.x. a time dimension might be practically. Time might be week- or month-based. But a week or a month number on itself does not make munch sense as you need the year for the context. And that is where the “Hierarchy” comes into play. The hierarchy is representing an abstract “contains”-relationship. The year contains the month contains the day or for the location dimension – continent contains subcontinent contains country contains city. And given that sometimes a city changes its country – see Crimea – it was maybe not such a bad idea to take that into account by modelling the location dimension as SCD Type 2.

The link between the fact table and a dimension table is defined by “foreignKey” (for the fact table) and “primaryKey” (for the dimension table).

Data Insights with Saiku

Now we load the schema file with Pentaho BI Server and open a new Saiku Analytics page and we are ready to play around with the cube. The following screen shot shows you the setup and the result of a “treemap” plot for subcontinent by source category in May 2014 for some idea on how the world ends up on my site:

data-insightsOr let’s do an area chart of the page views by week for 2014 and the continents:

data-insights-2Well, you get the idea.

Data Exploration like a Pro

Is Saiku Analytics the real deal? Well, I guess that depends on what you need. But judging from what I can tell from using the Community Edition I am missing a lot of features. First and foremost none of the charts seem to be customizable beyond choosing dimensions and metrics. And I am not talking about coloring – which would be nice too – but for example about influencing how a metric is mapped onto the visualization.

As far as I can tell there is no way to configure the scaling for the multiple bar chart – instead automatically for all bar charts the same scaling is applied.

multiple-bar-chartline-chartThen there are some design choices I personally would consider debatable like rounded lines for the line chart – no option for grid lines and missing data point markers. The heat grid is not revealing much due to sub-optimal usage of space and coloring.

But don’t listen to me – form your own opinion. I also would like to add that the Saiku Analytics plugin might not cater to my ideals yet but seems to be a very well developed and visually appealing product meeting all the requirements for making great progress in near future! I also have to emphasize that I used the community edition of Saiku Analytics and can’t say for sure whether some enterprise edition might make all my dreams come true. Though I couldn’t find anything indicating such an extended edition.

xmla4js and d3.js

Few years ago Roland Bouman started xmla4js with the intention of making XMLA server accessible to JavaScript applications. I would love to see a true grassroots developed d3.js driven data explorer connecting to Mondrian. Apparently some russian guy already gave it a try but didn’t took it further than creating a terrible youtube clip showcasing a prototype. I think this would provide a great option to commercial solutions.

(original article published on

6 thoughts on “Mondrian Schema for OLAP Cube Definition ft. Google Analytics and Saiku

  1. “I would love to see a true grassroots developed d3.js driven data explorer connecting to Mondrian.”

    Actually, that is now the case. There is Xavier:

    And there is xmondrian to get started real quick with both mondrian and xavier:

    Xavier uses d3 and dimple and is built on top of xmla4js. So, you can connect to any XML/A provider. So far, I got pretty good results with Mondrian, SSAS, SAP/HANA, icCube and Jedox.

  2. The Saiku Chart Plus plugin for Saiku might interest you. It has additional charting capacities beyond what Saiku CE offers, and the plugin is open source. Unfortunately it uses Highcharts instead of D3. Besides the Highcharts support, it exposes (simple) geographic charting (Google) which works at the state/province and country levels. It was able to interpret my geographic dimension without me having to set up anything. You could probably integrate it with your “location” dimension defined in your cube.

    I would like to see an open source product (perhaps a fork of Saiku) that uses D3 and Xmla4js under the hood.

    Great article!

    • When you say “Saiku Chart Plus exposes (simple) geographic charting (Google) which works at the state/province and country levels” – Does this mean it will only go lower than an aggregate of a country in the US? For example could the add-on be used for counties in the UK. As they’re using Google to power it I’m just wondering what it allows you to pass over the API.

  3. Hi!

    thanks for mentioning Xmla4js. It is in fact pretty much finished for as far as it goes. The design goal for Xmla4js was to have a javascript library for communicating with XML/A servers, and I think most of the XML/A spec is nailed.

    The project lives on github nowadays:

    It supports all major browsers and there is also support for node.js.

    Using xmla4js as a datasource for d3.js sounds really exciting! I was thinking about doing something like that a while a go, but darn those days are short.

    Anyway  thanks again for the shoutout, and kind regards,


    Roland Bouman

    • Hi Roland,

      thank you very much for your reply!

      Having worked with large customers in my function as an Analytics Consultant I can say that there is a huge need for flexible web-based (fancy-looking) reporting solutions (especially featuring cool heatmaps). This use case is currently addressed mostly with QlikView and Tableau. But I believe that there is a biiiig market for open source alternatives which so far do not seem to exist or are not sufficiently mature.
      xml4js could function as the essential tool bridging the gap between data and report. The big challenge here is clearly that the data needs to be accessible as a cube to facilitate automatic query generation. I know Pentaho BI Server with Saiku and Mondrian is residing in that realm but – as far as I can tell – it is not (yet) really an attractive option for multiple reasons.
      I will take some time to play a bit with xml4js as soon as I find time – probably 2015. If it is possible to make HTML/JS-based reporting and visualizations talk to data residing in some SQL-DB via Mondrian/xml4js – then this has huge potential! A lot of people would love to get stuff done this way.

      Kind regards


Comments are closed.