01 Jun

Bland-Altman diagram with ggplot2

Hello world!

Today, I’d like to share with you a code for plotting Bland-Altman diagrams.
The Bland-Altman diagram (Bland & Altman, 1986 and 1999), or difference plot, is a graphical method to compare two measurements techniques. In this graphical method the differences (or alternatively the ratios) between the two techniques are plotted against the averages of the two techniques. Alternatively (Krouwer, 2008) the differences can be plotted against one of the two methods, if this method is a reference or “gold standard” method.
Horizontal lines are drawn at the mean difference, and at the limits of agreement, which are defined as the mean difference plus and minus 1.96 times the standard deviation of the differences.

Bland.Altman.diagram <-
  function(x,y,id=FALSE, alpha = .05,rep.meas = FALSE,subject,idname = FALSE, xname = 'x',yname = 'y',...) {
 
    library(ggplot2)
	  library(RColorBrewer)
 
    #*** 1. Set a few constants
    z <- qnorm(1 - alpha / 2)  ## value of z corresponding to alpha
    d <- x - y               ## pair-wise differences
    m <- (x + y) / 2           ## pair-wise means
 
    #*** 2. Calculate mean difference
    d.mn <- mean(d,na.rm = TRUE)
 
    #*** 3. Calculate difference standard deviation
    if (rep.meas == FALSE) {
      d.sd = sqrt(var(d,na.rm = TRUE))
    } else {
      #*** 3a. Ensure subject is a factor variable
      if (!is.factor(subject))
        subject <- as.factor(subject)
 
      #*** 3b. Extract model information
      n <- length(levels(subject))      # Number of subjects
      model <- aov(d ~ subject)           # One way analysis of variance
      MSB <- anova(model)[[3]][1]       # Degrees of Freedom
      MSW <- anova(model)[[3]][2]       # Sums of Squares
 
      #*** 3c. Calculate number of complete pairs for each subject
      pairs <- NULL
      for (i in 1:length(levels(as.factor(subject)))) {
        pairs[i] <- sum(is.na(d[subject == levels(subject)[i]]) == FALSE)
      }
      Sig.dl <-
        (MSB - MSW) / ((sum(pairs) ^ 2 - sum(pairs ^ 2)) / ((n - 1) * sum(pairs)))
      d.sd <- sqrt(Sig.dl + MSW)
    }
 
    #*** 4. Calculate lower and upper confidence limits
    ucl <- d.mn + z * d.sd
    lcl <- d.mn - z * d.sd
    print(d.mn)
    print(ucl)
    print(lcl)
 
    #*** 5. Make Plot
    xlabstr = paste(c('Mean(',xname,', ',yname,')'), sep = "",collapse = "")
    ylabstr = paste(c(xname, ' - ', yname), sep = "",collapse = "")
    id = ifelse(id==FALSE, as.factor(rep(1,length(m))), as.factor(id))
    xy <- data.frame(m,d,id)
    xy$id <- as.factor(xy$id)
    maxm <- max(m)
 
 
    #scatterplot of x and y variables
    scatter <- ggplot(xy,aes(m, d)) 
 
 
    if(idname == FALSE){
      scatter <- scatter + geom_point(color='darkgrey') + scale_color_manual( guide=FALSE)
    } else {
      scatter <- scatter + geom_point(aes(color = id))+ scale_color_brewer(idname, palette="Set1") # Set1
    }
 
    scatter <- scatter + geom_abline(intercept = d.mn,slope = 0, colour = "black",size = 1,linetype = "dashed" ) +
      geom_abline(
        intercept = ucl,slope = 0,colour = "black",size = 1,linetype = "dotted"
      ) +
      geom_abline(
        intercept = lcl,slope = 0,colour = "black",size = 1,linetype = "dotted"
      ) +
      ylim(min(lcl, -max(xy$d), min(xy$d)), max(ucl, max(xy$d), -min(xy$d)))+
      xlab(xlabstr) +
      ylab(ylabstr) +
      #geom_text(x=maxm-2.4,y=d.mn,label='Mean',colour='black',size=4)+
      #geom_text(x=maxm-2.4,y=ucl,label='+1.96 SD',colour='black',size=4)+
      #geom_text(x=maxm-2.4,y=lcl,label='-1.96 SD',colour='black',size=4)+
      theme(
        plot.background = element_blank(),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        axis.line = element_line(color="black", size = 1)
    )
 
    plot(scatter)
 
    values <- round(cbind(lcl,d.mn,ucl),4)
    colnames(values) <- c("LCL","Mean","UCL")
    if (rep.meas == FALSE){
      Output <- list(limits = values,Var = d.sd ^ 2)
    } else {
      Output <- list(limits = values,Var = Sig.dl,MSB = MSB,MSW = MSW)
    }
    return(Output)
  }
 
A <- c(-0.358, 0.788, 1.23, -0.338, -0.789, -0.255, 0.645, 0.506, 
       0.774, -0.511, -0.517, -0.391, 0.681, -2.037, 2.019, -0.447, 
       0.122, -0.412, 1.273, -2.165)
B <- c(0.121, 1.322, 1.929, -0.339, -0.515, -0.029, 1.322, 0.951, 
       0.799, -0.306, -0.158, 0.144, 1.132, -0.675, 2.534, -0.398, 0.537, 
       0.173, 1.508, -1.955)
 
Bland.Altman.diagram(A, B,alpha = .05,rep.meas = F,xname = 'A',yname = 'B')

The result is the following:
Rplot01

The ‘id’ field allows you to define different categories that will be colored differently on the graph:Rplot02

13 Jan

Plot points with a google map background

Good Morning Followers!

Yesterday, I found a nice post here by Joseph Rickert. I produces nice plots with points and a google map background. Try it and have fun!

library(ggmap)
library(plyr)
library(gridExtra)
temp <- tempfile()
download.file("http://www.plantsciences.ucdavis.edu/plant/data.zip", temp)
connection <- unz(temp, "Data/Set3/Set3data.csv")
rice <- read.csv(connection)
names(rice) <- tolower(names(rice))
# Create a custom soil attribute plot
# @param df Data frame containing data for a field
# @param attribute Soil attribute
# @return Custom soil attribute plot
create_plot <- function(df, attribute) {
  map <- get_map(location = c(median(df$longitude), 
                              median(df$latitude)),
                 maptype = "satellite",
                 source = "google",
                 crop = FALSE,
                 zoom = 15)
  plot <- ggmap(map) + geom_point(aes_string(x = "longitude", 
                                             y = "latitude",
                                             color = attribute),
                                  size = 5,
                                  data = df)
  plot <- plot + ggtitle(paste("Farmer", df$farmer, "/ Field", df$field))
  plot <- plot + scale_color_gradient(low = "darkorange", high = "darkorchid4")
  return(plot)
}
ph_plot <- dlply(rice, "field", create_plot, attribute = "ph")
ph_plots <- do.call(arrangeGrob, ph_plot)

6a010534b1db25970b01bb07d2c8c1970d-800wi

07 Jan

Dreams come true: a function for RGB plots with GGPLOT2

In love with the ggplot2 look? Sick of using plotRGB to display your RGB raster?
Here comes the rggplot function!

Have a look:

 
rggbplot <- function(inRGBRst,npix=NA,scale = 'lin'){
 
  rgblinstretch <- function(rgbDf){
    maxList <- apply(rgbDf,2,max)
    minList <- apply(rgbDf,2,min)
    temp<-rgbDf
    for(i in c(1:3)){
      temp[,i] <- (temp[,i]-minList[i])/(maxList[i]-minList[i])
    }
    return(temp)
  }
 
  rgbeqstretch<-function(rgbDf){
 
    temp<-rgbDf
    for(i in c(1:3)){
      unique <- na.omit(temp[,i])
      if (length(unique>0)){
        ecdf<-ecdf(unique)
        temp[,i] <- apply(temp[,i,drop=FALSE],2,FUN=function(x) ecdf(x))
      }
    }
    return(temp)
  }
 
  if(is.na(npix)){
    if(ncell(inRGBRst)>5000){
      npix <- 5000
    }
    else{
      npix <- ncell(inRGBRst)
    }
  }
  x <- sampleRegular(inRGBRst, size=npix, asRaster = TRUE)
  dat <- as.data.frame(x, xy=TRUE)
  colnames(dat)[3:5]<-c('r','g','b')
 
  if(scale=='lin'){
    dat[,3:5]<- rgblinstretch(dat[,3:5])
  } else if(scale=='stretch'){
    dat[,3:5]<- rgbeqstretch(dat[,3:5])
  }
 
  p <- ggplot()+ geom_tile(data=dat, aes(x=x, y=y, fill=rgb(r,g,b))) + scale_fill_identity()
 
}
 
b <- (brick(system.file("external/rlogo.grd", package="raster")))*6-600
print(rggbplot(b))

rgb_ggplot2

25 Dec

Charting time-series as calendar heat maps

Merry GISmas dearest followers!
A couple of weeks ago, I made a post on heatmaps. Today I came across a nice answer on stackoverflow, it goes like this:

# LOAD THE DATA
stock <- "MSFT"
start.date <- "2006-01-12"
end.date <- Sys.Date()
quote <- paste("http://ichart.finance.yahoo.com/table.csv?s=",
               stock, "&a=", substr(start.date,6,7),
               "&b=", substr(start.date, 9, 10),
               "&c=", substr(start.date, 1,4), 
               "&d=", substr(end.date,6,7),
               "&e=", substr(end.date, 9, 10),
               "&f=", substr(end.date, 1,4),
               "&g=d&ignore=.csv", sep="")    
stock.data <- read.csv(quote, as.is=TRUE)
stock.data <- transform(stock.data,
                        week = as.POSIXlt(Date)$yday %/% 7 + 1,
                        wday = as.POSIXlt(Date)$wday,
                        year = as.POSIXlt(Date)$year + 1900)

The data has the following structure:

> head(stock.data)
        Date  Open  High   Low Close   Volume Adj.Close week wday year
1 2014-12-24 48.64 48.64 48.08 48.14 11437800     48.14   52    3 2014
2 2014-12-23 48.37 48.80 48.13 48.45 23648100     48.45   51    2 2014
3 2014-12-22 47.78 48.12 47.71 47.98 26566000     47.98   51    1 2014
4 2014-12-19 47.63 48.10 47.17 47.66 64551200     47.66   51    5 2014
5 2014-12-18 46.58 47.52 46.34 47.52 40105600     47.52   51    4 2014
6 2014-12-17 45.05 45.95 44.90 45.74 34970900     45.74   51    3 2014

And now this is how we chart it:

library(ggplot2)
ggplot(stock.data, aes(week, wday, fill = Adj.Close)) + 
  geom_tile(colour = "white") + 
  scale_fill_gradientn('MSFT \n Adjusted Close',colours = c("#D61818","#FFAE63","#FFFFBD","#B5E384")) + 
  facet_wrap(~ year, ncol = 1)

Here is how it looks like :-)


Calendar heatmap


05 Dec

Plot heat maps for correlations, confusion matrices etc.

Hi folks!

Wanna present your table in a nice way? Have you ever thought of heat maps?
Here is an example on how to use them:

library(ggplot2)
library(plyr) # might be not needed here anyway it is a must-have package I think in R 
library(reshape2) # to "melt" your dataset
library (scales) # it has a "rescale" function which is needed in heatmaps 
library(RColorBrewer) # for convenience of heatmap colors, it reflects your mood sometimes
nba <- read.csv("http://datasets.flowingdata.com/ppg2008.csv")
head(nba)
nba$Name <- with(nba, reorder(Name, PTS))
nba.m <- melt(nba)
nba.m <- ddply(nba.m, .(variable), transform,rescale = rescale(value))
p <- ggplot(nba.m, aes(variable, Name)) + geom_tile(aes(fill = rescale),colour = "white") +  scale_fill_gradient2(low="#006400", mid="#f2f6c3",high="#cd0000",midpoint=0.5)

The nba data.frame has the folowing structure:

> head(nba)
            Name  G  MIN  PTS  FGM  FGA   FGP FTM FTA   FTP X3PM X3PA  X3PP ORB DRB TRB AST STL BLK  TO  PF
1   Dwyane Wade  79 38.6 30.2 10.8 22.0 0.491 7.5 9.8 0.765  1.1  3.5 0.317 1.1 3.9 5.0 7.5 2.2 1.3 3.4 2.3
2  LeBron James  81 37.7 28.4  9.7 19.9 0.489 7.3 9.4 0.780  1.6  4.7 0.344 1.3 6.3 7.6 7.2 1.7 1.1 3.0 1.7
3   Kobe Bryant  82 36.2 26.8  9.8 20.9 0.467 5.9 6.9 0.856  1.4  4.1 0.351 1.1 4.1 5.2 4.9 1.5 0.5 2.6 2.3
4 Dirk Nowitzki  81 37.7 25.9  9.6 20.0 0.479 6.0 6.7 0.890  0.8  2.1 0.359 1.1 7.3 8.4 2.4 0.8 0.8 1.9 2.2
5 Danny Granger  67 36.2 25.8  8.5 19.1 0.447 6.0 6.9 0.878  2.7  6.7 0.404 0.7 4.4 5.1 2.7 1.0 1.4 2.5 3.1
6  Kevin Durant  74 39.0 25.3  8.9 18.8 0.476 6.1 7.1 0.863  1.3  3.1 0.422 1.0 5.5 6.5 2.8 1.3 0.7 3.0 1.8

And here is the output. Nice isn’t it?


heat_map