This R protocol shows the steps I took to create the plots for my article on stock quote correlations. You might also be interested in what the difference is between a long and a wide table and how to set up a data source in Windows 7 for R and MySQL.
I use four R packages not loaded by default:
- RODBC for fetching the data from MySQL
- reshape for casting from long to wide
- corrgram for the correlogram plot
- hexbin for the density plot
# a vector keeping the short names of the stocks I chose for presentation
stocks <- c('allia','bayer','bilfi','bmw','celes','comme','conti',
'daiml','dt ba','dt te','dürr','e.on','fmc','gea g','hanno','linde',
# connect to the database registered as 'dw 64bit'
db <- odbcConnect("dw 64bit",uid="root",pwd="")
# fetch all daily final quotes from 'exchange'
quotes_d <- sqlQuery(db,
"select `name` as s,
reference_date as d,
endPrice as p
# cast the quotes table from long into a wide format. the long format
# keeps three columns s,d and p as fetched above. But we need the
# a column for each stock keeping a quote per row chronologically
quotes_d <- cast(quotes_d, d~s)
# the verbal translation of the column selection done here would
# be 'return all columns in quotes_d whose column name is 'd' (the
# date column) or present in the stocks vector
names(quotes_d) %in% c('d',stocks)
# every graphical output between png() and dev.off() is channelled
# into a PNG of the defined name and size
# draw a correlogram for all time series in columns with stock
# quotes - I do this filtering to exclude the date column. The section
# below the diagonal keeps pie symbols the section above shade
corrgram(quotes_d[,which(names(quotes_d) %in% stocks)],
# here we create a function colPal which itself returns a vector of
# colors. The colors it will use for the returned vector is a color
# transition for the specified seven colors.
colPal <- colorRampPalette(c('black','blue','aquamarine','green',
# because I want the coloring to depend on the year of a quote I use
# the date column d of the quotes_d data frame and extract the year
# using the format function. Because format() returns a string I
# cast its value to a number and subtract 1997 becaue the earliest
# quote is from 1998. So the difference is 1 which is the position
# of the color to be used. colPal(15) returns a color transition for
# the above described colors with 15 steps - one color for each year
# from 1998 to 2012.
# The colors are placed in a new column of the data frame.
quotes_d$color <- colPal(15)[as.numeric(format(quotes_d$d,"%Y"))-1997]
# now the scatterplot for all stock combinations (again implicitely
# excluding the date column) is drawn.
plot(quotes_d[,which(names(quotes_d) %in% stocks)],
pch=19, # a filled circle for each dot
cex=.5, # half of default size
col=quotes_d$color # the dot-wise coloring
# because dots are lying on top of each other it is a good idea
# to check the density. hexbin itself doesn't draw a matrix of
# density plots so we use splom() from the lattice package which
# draws a matrix of scatterplots. By choosing the given value of
# the panel parameter we tell splom() to insted use hexbin for
splom(quotes_d[,which(names(quotes_d) %in% stocks)],
panel="panel.hexbinplot", # hexbin for each plot
c('burlywood1','brown') # a brownish coloring