13 Jul

Italic label in a plot using facet_wrap

Hi!
I just got asked by a journal to italicize some labels in a plot… So here is a small code snippet to assist you when you get asked the same:

factor1=rep(letters[1:3], each=3)
factor2=rep(1:3,times=3)
x=rep(1,9)
y=1:9
df=cbind.data.frame(factor1,factor2,x,y)
 
levels(df$factor1)= c("a"=expression(paste("factor_", italic("a"))),
                      "b"=expression(paste("factor_", italic("b"))),
                      "c"=expression(paste("factor_", italic("c"))))
 
ggplot(df, aes(x=x, y=x))+facet_grid(factor2~factor1, labeller=label_parsed)+geom_point()

And the result is:
italic

Thanks to Berengere Husson for the tip!

01 Jun

Bland-Altman diagram with ggplot2

Hello world!

Today, I’d like to share with you a code for plotting Bland-Altman diagrams.
The Bland-Altman diagram (Bland & Altman, 1986 and 1999), or difference plot, is a graphical method to compare two measurements techniques. In this graphical method the differences (or alternatively the ratios) between the two techniques are plotted against the averages of the two techniques. Alternatively (Krouwer, 2008) the differences can be plotted against one of the two methods, if this method is a reference or “gold standard” method.
Horizontal lines are drawn at the mean difference, and at the limits of agreement, which are defined as the mean difference plus and minus 1.96 times the standard deviation of the differences.

Bland.Altman.diagram <-
  function(x,y,id=FALSE, alpha = .05,rep.meas = FALSE,subject,idname = FALSE, xname = 'x',yname = 'y',...) {
 
    library(ggplot2)
	  library(RColorBrewer)
 
    #*** 1. Set a few constants
    z <- qnorm(1 - alpha / 2)  ## value of z corresponding to alpha
    d <- x - y               ## pair-wise differences
    m <- (x + y) / 2           ## pair-wise means
 
    #*** 2. Calculate mean difference
    d.mn <- mean(d,na.rm = TRUE)
 
    #*** 3. Calculate difference standard deviation
    if (rep.meas == FALSE) {
      d.sd = sqrt(var(d,na.rm = TRUE))
    } else {
      #*** 3a. Ensure subject is a factor variable
      if (!is.factor(subject))
        subject <- as.factor(subject)
 
      #*** 3b. Extract model information
      n <- length(levels(subject))      # Number of subjects
      model <- aov(d ~ subject)           # One way analysis of variance
      MSB <- anova(model)[[3]][1]       # Degrees of Freedom
      MSW <- anova(model)[[3]][2]       # Sums of Squares
 
      #*** 3c. Calculate number of complete pairs for each subject
      pairs <- NULL
      for (i in 1:length(levels(as.factor(subject)))) {
        pairs[i] <- sum(is.na(d[subject == levels(subject)[i]]) == FALSE)
      }
      Sig.dl <-
        (MSB - MSW) / ((sum(pairs) ^ 2 - sum(pairs ^ 2)) / ((n - 1) * sum(pairs)))
      d.sd <- sqrt(Sig.dl + MSW)
    }
 
    #*** 4. Calculate lower and upper confidence limits
    ucl <- d.mn + z * d.sd
    lcl <- d.mn - z * d.sd
    print(d.mn)
    print(ucl)
    print(lcl)
 
    #*** 5. Make Plot
    xlabstr = paste(c('Mean(',xname,', ',yname,')'), sep = "",collapse = "")
    ylabstr = paste(c(xname, ' - ', yname), sep = "",collapse = "")
    id = ifelse(id==FALSE, as.factor(rep(1,length(m))), as.factor(id))
    xy <- data.frame(m,d,id)
    xy$id <- as.factor(xy$id)
    maxm <- max(m)
 
 
    #scatterplot of x and y variables
    scatter <- ggplot(xy,aes(m, d)) 
 
 
    if(idname == FALSE){
      scatter <- scatter + geom_point(color='darkgrey') + scale_color_manual( guide=FALSE)
    } else {
      scatter <- scatter + geom_point(aes(color = id))+ scale_color_brewer(idname, palette="Set1") # Set1
    }
 
    scatter <- scatter + geom_abline(intercept = d.mn,slope = 0, colour = "black",size = 1,linetype = "dashed" ) +
      geom_abline(
        intercept = ucl,slope = 0,colour = "black",size = 1,linetype = "dotted"
      ) +
      geom_abline(
        intercept = lcl,slope = 0,colour = "black",size = 1,linetype = "dotted"
      ) +
      ylim(min(lcl, -max(xy$d), min(xy$d)), max(ucl, max(xy$d), -min(xy$d)))+
      xlab(xlabstr) +
      ylab(ylabstr) +
      #geom_text(x=maxm-2.4,y=d.mn,label='Mean',colour='black',size=4)+
      #geom_text(x=maxm-2.4,y=ucl,label='+1.96 SD',colour='black',size=4)+
      #geom_text(x=maxm-2.4,y=lcl,label='-1.96 SD',colour='black',size=4)+
      theme(
        plot.background = element_blank(),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        axis.line = element_line(color="black", size = 1)
    )
 
    plot(scatter)
 
    values <- round(cbind(lcl,d.mn,ucl),4)
    colnames(values) <- c("LCL","Mean","UCL")
    if (rep.meas == FALSE){
      Output <- list(limits = values,Var = d.sd ^ 2)
    } else {
      Output <- list(limits = values,Var = Sig.dl,MSB = MSB,MSW = MSW)
    }
    return(Output)
  }
 
A <- c(-0.358, 0.788, 1.23, -0.338, -0.789, -0.255, 0.645, 0.506, 
       0.774, -0.511, -0.517, -0.391, 0.681, -2.037, 2.019, -0.447, 
       0.122, -0.412, 1.273, -2.165)
B <- c(0.121, 1.322, 1.929, -0.339, -0.515, -0.029, 1.322, 0.951, 
       0.799, -0.306, -0.158, 0.144, 1.132, -0.675, 2.534, -0.398, 0.537, 
       0.173, 1.508, -1.955)
 
Bland.Altman.diagram(A, B,alpha = .05,rep.meas = F,xname = 'A',yname = 'B')

The result is the following:
Rplot01

The ‘id’ field allows you to define different categories that will be colored differently on the graph:Rplot02

22 Mar

Download files from Dropbox via shareable link

Hello there!

I have a nice piece of code for today on how to download a file from a dropbox shareable link (I reckon it adapted slightly a code found here). Here is how it works. Argument x is the document name, d the document key, and outfile is the desired filename and location.

dl_from_dropbox <- function(x, key, outfile) {
  require(RCurl)
  bin <- getBinaryURL(paste0("https://dl.dropboxusercontent.com/s/", key, "/", x),
                      ssl.verifypeer = FALSE)
  con <- file(outfile, open = "wb")
  writeBin(bin, con)
  close(con)        
}
 
# Example:
dl_from_dropbox("GViewer_Embeds.txt", "06fqlz6gswj80nj", '/home/GViewer_Embeds.txt')
22 Nov

3D plot with a cut-off line

Did you ever notice that things IRL are in 3D? Well today, let’s do some 3D plots!

First, let’s define our three variables

library(rgl)
 
x <- 1:5/10
y <- 1:5
z <- x %o% y
z <- z + .2*z*runif(25) - .1*z

Now let’s suppose that we want to plot an additional line at a specific cut-off value of 0.6

cutoff <- 0.6

Let’s plot all that with the RGL package:

persp(x, y, z, theta=-35, phi=10,col="lightgrey",xlab="X factor", ylab="Why", zlab="Z-bra")
cLines <- contourLines(x,y,z,levels=c(cutoff))
lines(trans3d(x=cLines[[1]]$x, y=cLines[[1]]$y, z=cutoff,pmat=p ),col = 'red',lw=2,lt=2)

Here is the figure:
screenshot-from-2016-11-22-175501

10 Nov

Count points in polygons

Hi guys,

Here is a quick r-snippet to count points inside polygons:

library("raster")
 
x <- getData('GADM', country='ITA', level=1)
class(x)
# [1] "SpatialPolygonsDataFrame"
# attr(,"package")
# [1] "sp"
 
set.seed(1)
# sample random points
p <- spsample(x, n=300, type="random")
p <- SpatialPointsDataFrame(p, data.frame(id=1:300))
 
proj4string(x)
# [1] " +proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs +towgs84=0,0,0"
proj4string(p)
# [1] " +proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs +towgs84=0,0,0"
 
plot(x)
plot(p, col="red" , add=TRUE)

Here is the figure:
screenshot-from-2016-11-10-165353

res <- over(p, x)
table(res$NAME_1) # count points
#               Abruzzo                Apulia            Basilicata
#                    11                    20                     9
#              Calabria              Campania        Emilia-Romagna
#                    16                     8                    25
# Friuli-Venezia Giulia                 Lazio               Liguria
#                     7                    14                     7
#             Lombardia                Marche                Molise
#                    22                     4                     3
#              Piemonte              Sardegna                Sicily
#                    35                    18                    21
#               Toscana   Trentino-Alto Adige                Umbria
#                    33                    15                     6
#         Valle d'Aosta                Veneto
#                     4                    22
15 May

Load googlesheet data in R

Dear R-addict,

At the era of cloud-based computing, functions such as read.csv, read.xls or even fread are really old-fashioned.
Here is a chunk of code that will allow you to load data from a googlesheet:

 
install.packages("gsheet")
library(gsheet)
# Download a sheet
 
# Download a sheet
url <- 'https://docs.google.com/spreadsheets/d/1XBs7p44-djCPmN4TnPEgboUVAdB2mChbAlCjqnVOyQ0'
a <- gsheet2tbl(url)
10 Mar

Efficient manuscript revision with latexdiff

Hi there,

Here is a nice tip to highlight the revisions in your manuscripts before resubmitting to your favorite journals: use latexdiff!

latexdiff --type=CTRADITIONAL old_shitty_manuscript.tex new_shiny_manuscript.tex > diff.tex

You can then easily customize how differences appear in your pdf by changing the commands in the preamble.
Here are my settings:

\providecommand{\DIFadd}[1]{{\protect\color{blue} #1}} %DIF PREAMBLE
\providecommand{\DIFdel}[1]{} %DIF PREAMBLE

New text appears in blue and old text is simply removed.

22 Jan

Styling shapefiles for QGIS: creating symbology in R

Hello world!

Today, I’d like to share with y’all a function that I wrote to save symbology associated with a shapefile of points in R.
All you need is you shapefile in which you should have three columns:
1. col: indicating the color you want to associate with to your class in hex values (see the super http://www.color-hex.com/ website.).
2. label: the label of your class
3. value: the numeric value of your label

Enjoy!

Here is the function (you can also find it on my shiny GITHUB). You can also simply type in source_url(“https://rawgit.com/waldnerf/R/master/writeQML.R”) (! requires the devtools package).

writeQML <- function(inShp, col, label, value, filename){
  require(RColorBrewer)
  require(XML)
  inNames <- names(inShp)
  names(inShp)[which(names(inShp)==value)] <- "Value"
  names(inShp)[which(names(inShp)==col)] <- "v"
  names(inShp)[which(names(inShp)==label)] <- "Label"
 
  inShp$v <- unlist(lapply(as.character(inShp$v),function(x) paste(c(col2rgb(x),'255'),collapse = ',')))
  inShp$Label <- as.character(inShp$Label)
 
  inShp <- inShp[-which(duplicated(inShp@data[,c('Value','v','Label')])),]
 
  # set alpha
  alpha = 1
 
  base = newXMLNode("qgis")
  addAttributes(base,version="2.8.1-Wien", minimumScale="0", maximumScale="1e+08", simplifyDrawingHints="0", minLabelScale="0", maxLabelScale="1e+08",simplifyDrawingTol="1" ,simplifyMaxScale="1", hasScaleBasedVisibilityFlag="0", simplifyLocal="1", scaleBasedLabelVisibilityFlag="0")
  trans <- newXMLNode("transparencyLevelInt", 255)
  rend <- newXMLNode("renderer-v2", attrs = c(attr=value,symbollevels="0",type="categorizedSymbol"))
 
  # sort the categories
  categories <- newXMLNode("categories")
  category <- lapply(seq_along(inShp$Value),function(x){newXMLNode("category", attrs = c(symbol = as.character(x-1), value = inShp$Value[x], label = inShp$Label[x]))
  })
  addChildren(categories,category)
 
  # sort the symbols
  symbols <- newXMLNode("symbols")
  symbol <- lapply(seq_along(inShp$Value),function(x){dum.sym <- newXMLNode("symbol", attrs = c(outputUnit="MM",alpha=alpha,type="marker",name=as.character(x-1)))
  layer <- newXMLNode("layer", attrs =c(pass="0",class="SimpleMarker",locked="0"))
  prop <- newXMLNode("prop", attrs =c(k="color",v= inShp$v[x]))
  addChildren(layer, prop)
  addChildren(dum.sym, layer)
  }) 
  addChildren(symbols, symbol)
 
  # add categories and symbols to rend
  addChildren(rend, list(categories, symbols))
  addChildren(base, list(trans, rend))
 
  # save to qml-file
  writeLines(saveXML(base, prefix = "<!DOCTYPE qgis >"), filename)
}

And a self-sufficient example.

# Prepare the shapefile
inDf <- structure(list(x = c(25.582125, 25.880375, 24.955375, 26.066375, 25.409375, 25.902125),
                       y = c(-26.147625, -26.202625, -26.166125, -26.117625, -26.035875, -26.046375), 
                       Class = c(1L, 2L, 3L, 10L, 10L, 10L),
                       colT = c("#ffa500", "#aaeeff", "#a8c093", "#4584dc","#4584dc", "#4584dc"), 
                       attT = c("Stuff", "Other stuff", "Thing","Nice thing", "Nice thing", "Nice thing")), 
                       .Names = c("x", "y", "Class", "colT", "attT"),
                       row.names = c(NA, 6L), class = "data.frame")
 
spat.df <- SpatialPointsDataFrame(inDf[,c('x','y')], inDf)
projection(spat.df) <- "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
shapefile(spat.df, filename='/to/your/path/example.shp',overwrite=T)
filename <- "/to/your/path/example.qml"
 
writeQML(spat.df, "colT", 'attT', 'Class', filename)
12 Aug

Share a legend between multiple plots using grid.arrange

Greetings, followeRs!

I wanted to share a really cool function to create a multiplot with a shared legend!
I was created by Hadley Wickham, don’t hesitate to go and check his other awesome functions.

library(ggplot2)
library(gridExtra)
 
grid_arrange_shared_legend <- function(...) {
    plots <- list(...)
    g <- ggplotGrob(plots[[1]] + theme(legend.position="bottom"))$grobs
    legend <- g[[which(sapply(g, function(x) x$name) == "guide-box")]]
    lheight <- sum(legend$height)
    grid.arrange(
        do.call(arrangeGrob, lapply(plots, function(x)
            x + theme(legend.position="none"))),
        legend,
        ncol = 1,
        heights = unit.c(unit(1, "npc") - lheight, lheight))
}
 
dsamp <- diamonds[sample(nrow(diamonds), 1000), ]
p1 <- qplot(carat, price, data=dsamp, colour=clarity)
p2 <- qplot(cut, price, data=dsamp, colour=clarity)
p3 <- qplot(color, price, data=dsamp, colour=clarity)
p4 <- qplot(depth, price, data=dsamp, colour=clarity)
grid_arrange_shared_legend(p1, p2, p3, p4)
21 Apr

Sampling a raster in a class-equalized fashion

Dear followers,

Here is a function that would allow to sample randomly each class of a raster equally given the minor class in the landscape.

rSampleEqualClass <- function(r,prop=1){
 
  if((prop>1) | (prop<0) ){
    print('prop value outside bouds')
  } else {
    # Identify n, the number of pixels to extract by class
    count <- as.matrix(table(r[]))
    n <- ceiling(prop*min(count))
 
    # extract the random pixels
    l <- as.data.frame(sampleStratified(r,size=n))
    colnames(l)<-c("cell",'values')
 
    # create new raster with the selected pixels
    r.out <- r
    r.out[]<-NA
    r.out[l$cell] <- l$values
 
 
  }
  return(r.out)
}

And here a small example to demonstrate its use:

library(raster)
 
r <- raster(system.file("external/rlogo.grd", package="raster"))
r[r<=105]<-0
r[r>105]<-1
plot(r)
 
for(i in 1:10){
  print(i)
  plot(rSampleEqualClass(r,prop=0.8),main=i)
}