Pivot tables are in contrast to popular belief not just an Excel feature but present in a lot of applications dealing with tabularly structured numerical data – pivot tables are the visual and interactive result of a general concept of data aggregation depending on categories. LibreOffice (check out the article on it), OpenOffice and R by libraries like reshape offer this technology targeting at a different audience. If you understand pivoting in one of these tools, you pretty much got it for all of them.
At my work I am offering a workshop on pivoting with Excel 2010 and so I came up with a chart boiling down the concept of using pivot tables so it fits on a sheet of PDF.
Why is pivoting useful?
Being able to set up a pivot table is of immense practical importance for everybody working with spreadsheet data. Simply because it can be very annoying and time consuming to restructure or expand a static table. Just look at the initial table and imagine removing the country categorization (using addition for aggregating or average) or changing the categorization of gender from column- to row-wise. Sucks. And now imagine this table is 100 times as large. My head starts to spin!
The thing with the long format
The biggest drawback to pivot tables for the average Joe (everybody not working as data analyst or programmer – sorry) is to get or have your data in long format. Most data I deal with comes from an SQL database and then is structured long in the first place. Another option is converting a wide table into a long table using the reshape package in R. But with some capability in organized thinking this can be done manually to quite some extent.
In the end the hard parts of pivot tables is getting the concept of wide versus long structured data and understanding how the grouping and aggregation works – this I address in the last step. If you understand why I chose the colors for the “refers to”-boxes, you got it.
(original article published on www.joyofdata.de)