The German Federal Office of Statistics (Statistisches Bundesamt Deutschland) maintains a web site called GENESIS which offers a huge amount of statistics available with a few mouse clicks. Unfortunately the web site is only offered in German language and a tad clumsy to use. But still, a wide variety of topics regarding German economics, society and politics is covered – which makes this source indispensable.
The organization of the statistics
Let’s assume you are interested in statistics on causes of death (Todesursachenstatistik). You can either search for it using the search bar or you start from the list of topics (Themen) taking the path
- Gesundheit / Health (2)
- Gesundheitswesen / Healthcare (23)
- Todesursachen / Causes of death (232)
- Todesursachenstatistik / Statistics on c.o.d. (23211)
The entity statistics (Statistik) bundles a number of tables offering the information for different characteristics like cause, year, gender, age and so on. In this case we are offered four different tables numbered 23211-0001 to 23211-0004. Usually I will choose the one offering the largest amount of characteristics. You never know when you are going to need a characteristic and the next step is anyway to transform the downloaded cross tables kept in Excel files into something useful – and you don’t want to do that more often than necessary. Let’s have a look.
(One word of caution – try to avoid the back in history functionality of your browser. The web-site doesn’t like this and offers back-buttons for that purpose.)
Preparing a table for download
Looks somewhat intimidating at first – but after playing around with it a bit the layout gets familiar. So I recommend to just click around to get an idea how those features are working.
But to speed things up a little, let me tell you few things. The first column shows you the position of the characteristics. This can not be adjusted (except for basically transposing the whole table with this button having the black arrow in it) – so it’s actually irrelevant and the table is just going to be structured as it is. What you can change though is the values for the characteristics you want in your final table – using the buttons in the last column. This is actually important for two reasons. One is that maybe you just want the data for one gender and not both. The other is that if you select too many values for a characteristic the table grows too large and GENESIS will refuse to generate it for you – unless you get a fee based account. The trick then is to split the values for one prominent characteristic up and download the whole table split in several smaller ones.
When you chose your values you click on “Werteabruf” (fetching of data). There you see an exerpt of the table – but more important – you can click on the Excel-icon to download the spreadsheet.
Due to GENESIS restricting the table size to a quite low volume you will probably end up with several spreadsheets. Some copy and pasting later and you have them composed into a single huge table. Actually this data structure is still quite useless to really investigate the figures. First you will have to turn this cross table from its (so called) wide format into a long format – then you can use the data as a source for a pivot table in the spreadsheet program of your choice.
I am going to use the statistics on death causes in several future post, as it offers an interesting pool of insights to dying – that is always exciting of course. Also there is going to be posts about different table structures and how to turn a wide table into a long table using R’s reshape library.