When you are dealing with large amounts of big data sets it is much more efficient to organizes those in database tables instead of CVSs or other files. Just yesterday I set up R for fetching data from a MySQL DBMS loading a table of stock quotes consisting of more than 300’000 rows into a data frame within seconds. That is pretty cool – and if necessary you can join huge tables in no time benefting of the indexing infrastructure of the DBMS of your choice.
Environmental questions
First of all I am using Windows 7 64bit, MySQL 5 32bit and R 32bit for this example – but even if your environment differs from mine the approach will be basically the same. What we need is an ODBC Connector for MySQL. Now here comes a pitfall – the bits have to match those of your R edition not your windows version’s. So let’s download and install the 32bit ODBC connector for MySQL.
I am not sure if it is also of importance whether your MySQL DBMS is 32 or 64bit – anyway, you find out this way:
1 |
show variables like 'version_compile_machine'; |
If it shows you x86 like in my case then you are using 32bit.
The 32bit data source manager in Windows 7 64bit
The next step is to register your MySQL database as a data source in windows. In Windows 7 64bit you have to explicitely start the data source manager for 32bit connectors because the one you will find in your system preferences will only manage 64bit connectors. The 32bit data source manager you are going to find here:
1 |
C:\Windows\SysWOW64\odbcad32.exe |
The rest is a cinch – you click “Add..” on the User-DSN sheet, choose the “MySQL ODBC 5.x driver”, fill in the credentials, choose a database and a data source name and you’re done.
The R package RODBC
Now you can go back to R install the RODBC package, load it and connect to your new data source:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
# install the RODBC package install.packages("RODBC") # load the package library(RODBC) # connect to you new data source db <- odbcConnect("data_source_name", uid="root", pwd="password") # whoe the names of the available tables sqlTables(db) # load a_table into a data fram df <- sqlQuery(db, "select * from a_table") # close the connection odbcClose(db) |
Information about the package’s functionality you will find here.
Note though that some advanced (My)SQL features won’t be available when connecting through ODBC. In PowerPivot I encountered a situation where I wasn’t able to fire a query including some tricks with variables. But joining, grouping, ordering and all that stuff is of course possible.
Like George said, lots of unclear/broken guides out there. This one not only was clear and concise, but actually works! Thanks :)
Hey Archyp – if I could choose a word conveying positive feedback, it would clearly be “concise” – thanks!
Thank you very much for the information.
You’re welcome, Sudheera!
Thank you for this. Very much.
Without your blog, I think I would never have gotten R to talk to MySQL; there’s a lot of bad info out there, even on StackOverflow.
But now it works perfectly, exactly as you describe.
May 2014
R version 3.0.2 (2013-09-25) — “Frisbee Sailing”
RStudio version Version 0.98.501
Windows 7 Home Premium Service Pack 1
Let me mention that
This is the site where I found the connector: http://dev.mysql.com/downloads/connector/odbc/
And http://dev.mysql.com/doc/connector-odbc/en/connector-odbc-configuration-dsn-windows-5-2.html was also very helpful.
You’re most welcome! And thanks for the feedback!