(This article is referring to an initial proof-of-concept version of r-big-pivot)
I have to admit that I very much enjoy pivoting through data using Excel. Its pivoting tool is great for getting a quick insight into a data set’s structure and for discovering interesting anomalies (the sudden rise of deaths due to viral hepatitis serves as a nice example). Unfortunately Excel itself is handicapped by several restrictions:
- a maximum number of one point something million rows per data set (which is crucial because the data needs to be formatted long)
- Excel is slow and often instable when you go beyond 100’000 rows
- cumbersome integration into available data infrastructures
- restricted charting and analysis capabilities
(This article expects from the reader basic experience with PowerPivot. You should know how to access a PowerPivot table from Excel and how to turn it into a pivot table or a chart.)
KPIs for a grocery store
I think this topic is introduced best using an example. Let’s assume an offline store and we are responsible for the data analysis. The boss wants us to count the people visiting the shop considering their age group and also how many of those visits lead to a purchase.
PowerPivot itself cannot be automated with VBA. But with a few tricks we can control how the data drawn from PowerPivot is accessed within Excel. This article assumes that you know how to import data with PowerPivot and how to integrate it into an Excel file as a pivot table. Also you should have activated the developer tools in Excel. There you find all the VBA relevant stuff like the macro recorder and the dialogue controls you want to add.
Let’s get it started with an example
We start with a PowerPivot table (named “data”) keeping the values for two variables A and B for every day in 2012 imported from a MySQL-DB in this case. So in the PowerPivot window this would look something like that:
So let’s assume some manager is asking for a dashboard displaying those two values in a chart but only one at a time and s/he is not inclined to deal with pivot table field lists but wants to switch between A and B using buttons.
Pivot tables are in contrast to popular belief not just an Excel feature but present in a lot of applications dealing with tabularly structured numerical data – pivot tables are the visual and interactive result of a general concept of data aggregation depending on categories. LibreOffice (check out the article on it), OpenOffice and R by libraries like reshape offer this technology targeting at a different audience. If you understand pivoting in one of these tools, you pretty much got it for all of them.
At my work I am offering a workshop on pivoting with Excel 2010 and so I came up with a chart boiling down the concept of using pivot tables so it fits on a sheet of PDF.
When you are responsible for setting up or extending an automized reporting of data stored in a warehouse with Excel, I would always vote for using PowerPivot for fetching the data and displaying it in pivot charts and tables. But there are use cases and situations where PowerPivot is not an option. Maybe you just cannot use PowerPivot for whatever reason. But first and foremost I met / suffered from the case where you already have an Excel reporting using cell references to other files and macros to fill the cells. If this sort of reporting reaches a certain extent and importance you cannot just set it up new again. Now if you need to extend such a report f.x. with a new column and don’t want to use cell references to other files – or worse VBA – because this is very fragile and error prone then I have a solution for you.
And this is how it works