This R protocol shows the steps I took to create the plots for my article on stock quote correlations. You might also be interested in what the difference is between a long and a wide table and how to set up a data source in Windows 7 for R and MySQL.
I use four R packages not loaded by default:
- RODBC for fetching the data from MySQL
- reshape for casting from long to wide
- corrgram for the correlogram plot
- hexbin for the density plot
Useful information on colors in R you find here and here.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 |
# a vector keeping the short names of the stocks I chose for presentation stocks <- c('allia','bayer','bilfi','bmw','celes','comme','conti', 'daiml','dt ba','dt te','dürr','e.on','fmc','gea g','hanno','linde', 'lufth','man','merck','münch','rwe','sgl c','sieme','thyss','volks', 'heide','vossl','hocht','kuka','rhein' ) # connect to the database registered as 'dw 64bit' library(RODBC) db <- odbcConnect("dw 64bit",uid="root",pwd="") # fetch all daily final quotes from 'exchange' quotes_d <- sqlQuery(db, "select `name` as s, reference_date as d, endPrice as p from exchange ") # cast the quotes table from long into a wide format. the long format # keeps three columns s,d and p as fetched above. But we need the # a column for each stock keeping a quote per row chronologically # ordered library(reshape) quotes_d <- cast(quotes_d, d~s) # the verbal translation of the column selection done here would # be 'return all columns in quotes_d whose column name is 'd' (the # date column) or present in the stocks vector quotes_d<-quotes_d[,which( names(quotes_d) %in% c('d',stocks) )] library(hexbin) library(corrgram) # every graphical output between png() and dev.off() is channelled # into a PNG of the defined name and size # draw a correlogram for all time series in columns with stock # quotes - I do this filtering to exclude the date column. The section # below the diagonal keeps pie symbols the section above shade # symbols png("corrgram_quotes_d_abc.png", width=1000,height=1000) corrgram(quotes_d[,which(names(quotes_d) %in% stocks)], lower.panel="panel.pie", upper.panel="panel.shade") dev.off() # here we create a function colPal which itself returns a vector of # colors. The colors it will use for the returned vector is a color # transition for the specified seven colors. colPal <- colorRampPalette(c('black','blue','aquamarine','green', 'orange','red','chocolate4')) # because I want the coloring to depend on the year of a quote I use # the date column d of the quotes_d data frame and extract the year # using the format function. Because format() returns a string I # cast its value to a number and subtract 1997 becaue the earliest # quote is from 1998. So the difference is 1 which is the position # of the color to be used. colPal(15) returns a color transition for # the above described colors with 15 steps - one color for each year # from 1998 to 2012. # The colors are placed in a new column of the data frame. quotes_d$color <- colPal(15)[as.numeric(format(quotes_d$d,"%Y"))-1997] # now the scatterplot for all stock combinations (again implicitely # excluding the date column) is drawn. png("scatterplot_quotes_d.png", width=5000,height=5000) plot(quotes_d[,which(names(quotes_d) %in% stocks)], pch=19, # a filled circle for each dot cex=.5, # half of default size col=quotes_d$color # the dot-wise coloring ) dev.off() # because dots are lying on top of each other it is a good idea # to check the density. hexbin itself doesn't draw a matrix of # density plots so we use splom() from the lattice package which # draws a matrix of scatterplots. By choosing the given value of # the panel parameter we tell splom() to insted use hexbin for # drawing. png("hexbin_quotes_d.png",width=5000,height=5000) splom(quotes_d[,which(names(quotes_d) %in% stocks)], panel="panel.hexbinplot", # hexbin for each plot colramp=colorRampPalette( c('burlywood1','brown') # a brownish coloring )) dev.off() |