First – let me whet your appetite by showing you what a pretty pretty report you will be able to compose after you finished this tutorial – and there you go (click on the picture to bask in the report’s full glory!):
Oh yeah, if that report was a party, I’d vote for it! (or go there, depending on what kind of party) – And you know what’s the best? Integrated into the Pentaho BI System its dissemination is fully automatable. No more manually updating Excel dashboards – no more clicking refresh and then sending it to your data hungry office friends – no, no, no – it can be scheduled – deployed via a web interface of Pentaho BA Server or kicked through the wire as a shiny PDF! Allowing you to waste less time on technicalities and have real fun with data! Okay, now I got you hooked – So, let’s get down to business!
No Data – No Joy of Reporting
Before we can rock the bar charts we need data – the schema of the data I am going to load into a MySQL table from Google Analytics using Pentaho Data Integration is as follows:
1 2 3 4 5 6 7 8 9 |
CREATE TABLE `web_stats_daily` ( `date` date NOT NULL, `country` varchar(45) NOT NULL, `city` varchar(45) NOT NULL, `source` varchar(45) NOT NULL, `page_path` varchar(100) NOT NULL, `pages` int(11) NOT NULL, PRIMARY KEY (`date`,`country`,`city`,`source`,`page_path`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1; |
The PDI transformation I use for the ETL from Analytics to MySQL may be downloaded here and looks like this:
I am going to import the web stats for the time span April 1 2013 to April 30 2014. In case you have no idea how an import with PDI from GA works, then check out this article. And if you would like to simply start playing with PRD shoot me a mail and I will send you the full SQL dump – it’s a deal!
Part 1: Composing the Report of Sub-Reports
Installing PRD is usally as simple as downloading the archive and extracting it somewhere. Then you launch it by executing either report-designer.sh or report-designer.bat – depending on your system. Ah, right, and don’t forget to download the MySQL-JDBC-adapter and place it in [PRD-HOME]/lib/jdbc – or Pentaho won’t know how to flirt with MySQL.
When you want to use data from different data queries in the same report then it is best to partition the report into sub-reports. I am going to use a main report and place four sub-reports in there – one for each chart and the table.
Mind how I place all the charts in the “Report Header” section and the text fields representing the various URLs and the corresponding page views number in the “Details” section. That’s because the “Report Header” section is rendered once, while the “Details” section is evaluated once per data record coming from the sub-reports data query.
Part 2: Setting up the Data Queries
The charts and the table need the data to be specified by MySQL queries in our case. The SQL queries below feature the finally used parameters “SelectYear” and “SelectMonth”. You can substitute those with 2014 and 4 for example – that’s how I use them in this part of the tutorial.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 |
(select source as source, sum(pages) as pageviews from web_stats_daily where year(date) = ${SelectYear} and month(date) = ${SelectMonth} group by source order by pageviews desc limit 7) union all (select "..." as source, sum(t.pageviews) as pageviews from ( select source as source, sum(pages) as pageviews from web_stats_daily where year(date) = ${SelectYear} and month(date) = ${SelectMonth} group by source order by pageviews desc limit 7,1000 ) t) ; |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 |
(select country as source, sum(pages) as pageviews from web_stats_daily where year(date) = ${SelectYear} and month(date) = ${SelectMonth} group by country order by pageviews desc limit 7) union all (select "..." as source, sum(t.pageviews) as pageviews from ( select country as source, sum(pages) as pageviews from web_stats_daily where year(date) = ${SelectYear} and month(date) = ${SelectMonth} group by country order by pageviews desc limit 7,1000 ) t) ; |
1 2 3 4 5 6 7 8 9 |
select concat(year(date),"-",month(date)) as ym, sum(pages) as pageViews from web_stats_daily where date between concat(${SelectYear}-1, "-", ${SelectMonth}, "-", 1) and concat(${SelectYear}, "-", ${SelectMonth}, "-", 31) group by ym order by date ; |
1 2 3 4 5 6 7 8 9 |
select page_path as pageUrl, sum(pages) as pageViews from web_stats_daily where year(date) = ${SelectYear} and month(date) = ${SelectMonth} group by pageUrl order by pageViews desc limit 10 ; |
Part 3: Specifying the Main Parameters
So that we can choose what year and month the report should refer to, we need to specify two parameters named “SelectYear” and “SelectMonth”.
Part 4: Communicating the Parameters to the Sub-Reports
At the end of the day we want to apply the parameters within the sub-reports – to have the pie chart f.x. show the top countries for that respective month. For that purpose we need to communicate those down to the sub-reports by introducing a mapping from “external” to “internal” parameter for each sub-report. Then we can also integrate the parameters in the data queries using the notation “${ParameterName}”.
Part 5: Introducing a Dynamic Title
This part deals with blessing the report with a dynamic title. For illustrative purposes I show two options. One is via a “message field” which allow mixing text with parameter values using the “$(ParameterName)” notation. The second option allows a more complex composition by using a formula to turn a parameter value into a string:
1 2 3 4 |
=CHOOSE([SelectMonth]; "Jan";"Feb";"Mar";"Apr";"May";"Jun"; "Jul";"Aug";"Sep";"Oct";"Nov";"Dec") &" "&[SelectYear] |
The value of SelectMonth specifies the index of the string we want to use to concatenate it with the value of SelectYear. This formula is an OpenFormula expression.
Part 6: Prettifying the Report
Finally we are going to adjust a few details and the arrangements. I invite the reader to play around and adjust the aesthetics further – as well as adding titles to the charts.
Delving Deeper
Now that I successfully whetted your appetite I highly recommend you check out the book “Pentaho 5.0 Reporting by Example: Beginner’s Guide” (to which I am in no way affiliated with btw!) – it is very well written and offers a very sound introduction into the whole affair of report design and dissemination with the Pentaho Report Designer 5.
(original article published on www.joyofdata.de)
Thanks! i learned a lot!
Very good but not have sound in video?
I dont’t like my voice on tape … ;)
nice one!