Generating virtual tables for JOIN operations in MySQL

Let me introduce this post with an example. You offer a web service and every request to it is logged in a database table – keeping the type of request, the user, a lot of other stuff and first and foremost the (date)time it happened. Now you want to know the number of requests within an hour, chronologically ordered, for the last four weeks – to get an idea about when your customers are using the service. If you encounter this situation for the first time your query might look like this:

Now of course you want to chart it (f.x. in Excel) and you will see that the distribution will be quite distorted because of no requests for some hours in general and some unusal requests at random time or missing usual requests due to downtime of your service. Leading to different numbers of hours for a day. That doesn’t look nice because we want a chart having all the 24 hours in a day showing 0 in case no request happened.

How do we do it?

Probably there are many ways of solving this but the easiest is to just fetch the data in such a gapless structure. Now, given that SQL is a declarative language and procedural solutions with SQL should be restricted to very specific situations (so specific that I didn’t encouter one so far in my life) this needs a little trick. With a procedural language we could loop, in SQL we have to describe instead what we want. (Mind that this web site is not called “pain of data” – in case you are wondering why I am not even considering the usage of VBA)

A simple example

Only one prerequiste is needed and that is a table holding more records than the records we want to produce (672 in our example – 24 hours for 28 days). Let’s assume we have a table called users keeping several million happy customers and start with a simple example where we just count to 10.

What we do here is to virtually select all rows of users and attach to it a variable d which is incremented for every record. The initialization of the variable is done in a dummy select aliased x. We could execute the initialization with a SET statement before firing the query but this would not be executable anymore with PowerPivot for example. Now this query would increment as often as there are records in users which is why we restrict to 10 with LIMIT.

The backbone for our usage time line

This is how the “backbone” query for our hourly number of requests chart would look like:

Analog to the counting in the query before, just this time we increment the hours and the initial date is determined by subtracting 28 days and 1 hour. Another important aspect about this query is that I already prepare the different date segments for later usage. Then you can immediately start pivoting around instead of having to use Excel for date surgery.

Now we have the virtual table – the backbone – at hand and the final step is to LEFT JOIN the data with it.

The result looks like this in MySQL Workbench:

Clipboard02

There are further scenarios conceivable where the backbone is a combination of different keys, even strings. So you can create a long formatted table and use that as a source of a pivot table.