17 Jul

Fast Euclidean Distance Function

Holà GISpo,

Today, a simple function to (greatly) speed up the standard Euclidean distance computation in R when you work with large matrices (n > 100,000,000).
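Why is it so much faster? The function below relies on the expansion $\|x_i - x_j\|^2 = \|x_i\|^2 + \|x_j\|^2 - 2\, x_i \cdot x_j$ for two rows $x_i$ and $x_j$, so all pairwise distances come from a single matrix cross-product plus the row-wise squared norms, instead of looping over pairs of rows.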

euc_dist <- function(m) {
  mtm <- Matrix::tcrossprod(m)        # cross-products between all pairs of rows
  sq  <- rowSums(m * m)               # squared norm of each row
  sqrt(outer(sq, sq, "+") - 2 * mtm)  # pairwise distances: sqrt(||xi||^2 + ||xj||^2 - 2*xi.xj)
}
 
# Example
M <- matrix(rexp(1000000, rate=.1), ncol=1000)
 
ptm <- proc.time()
d <- dist(M, method = "euclidean")
proc.time() - ptm
 
#   user  system elapsed 
#   1.08    0.00    1.08 
 
ptm <- proc.time()
d2 <- euc_dist(M)
d2 <- as.dist(d2)
proc.time() - ptm
 
#   user  system elapsed 
#  0.424   0.008   0.429 
 
isTRUE(all.equal(as.matrix(d), as.matrix(d2)))
#TRUE
13 Jul

Italic label in a plot using facet_wrap

Hi!
I just got asked by a journal to italicize some labels in a plot… So here is a small code snippet to assist you when you get asked the same:

library(ggplot2)
 
factor1 <- factor(rep(letters[1:3], each=3))
factor2 <- rep(1:3, times=3)
x <- rep(1, 9)
y <- 1:9
df <- cbind.data.frame(factor1, factor2, x, y)
 
levels(df$factor1) <- c("a"=expression(paste("factor_", italic("a"))),
                        "b"=expression(paste("factor_", italic("b"))),
                        "c"=expression(paste("factor_", italic("c"))))
 
ggplot(df, aes(x=x, y=y)) + facet_grid(factor2~factor1, labeller=label_parsed) + geom_point()

And the result is:
(figure: facet plot with italicized strip labels)
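The post title mentions facet_wrap; the parsed labels work there just as well. A minimal sketch reusing the df built above:

ggplot(df, aes(x=x, y=y)) + facet_wrap(~factor1, labeller=label_parsed) + geom_point()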

Thanks to Berengere Husson for the tip!

01 Jun

Bland-Altman diagram with ggplot2

Hello world!

Today, I’d like to share with you some code for plotting Bland-Altman diagrams.
The Bland-Altman diagram (Bland & Altman, 1986 and 1999), or difference plot, is a graphical method to compare two measurement techniques. The differences (or alternatively the ratios) between the two techniques are plotted against the averages of the two techniques. Alternatively (Krouwer, 2008), the differences can be plotted against one of the two methods if that method is a reference or “gold standard” method.
Horizontal lines are drawn at the mean difference, and at the limits of agreement, which are defined as the mean difference plus and minus 1.96 times the standard deviation of the differences.
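Stripped of the plotting, the quantities described above boil down to a few lines; a minimal sketch, assuming x and y are the two vectors of paired measurements:

d <- x - y                                          # pair-wise differences
d.mn <- mean(d, na.rm = TRUE)                       # mean difference (bias)
loa <- d.mn + c(-1.96, 1.96) * sd(d, na.rm = TRUE)  # lower and upper limits of agreement

The full function below computes the same quantities, optionally handles repeated measures, and draws the diagram with ggplot2.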

Bland.Altman.diagram <-
  function(x,y,id=FALSE, alpha = .05,rep.meas = FALSE,subject,idname = FALSE, xname = 'x',yname = 'y',...) {
 
    library(ggplot2)
	  library(RColorBrewer)
 
    #*** 1. Set a few constants
    z <- qnorm(1 - alpha / 2)  ## value of z corresponding to alpha
    d <- x - y               ## pair-wise differences
    m <- (x + y) / 2           ## pair-wise means
 
    #*** 2. Calculate mean difference
    d.mn <- mean(d,na.rm = TRUE)
 
    #*** 3. Calculate difference standard deviation
    if (rep.meas == FALSE) {
      d.sd = sqrt(var(d,na.rm = TRUE))
    } else {
      #*** 3a. Ensure subject is a factor variable
      if (!is.factor(subject))
        subject <- as.factor(subject)
 
      #*** 3b. Extract model information
      n <- length(levels(subject))      # number of subjects
      model <- aov(d ~ subject)         # one-way analysis of variance
      MSB <- anova(model)[[3]][1]       # between-subject mean square
      MSW <- anova(model)[[3]][2]       # within-subject mean square
 
      #*** 3c. Calculate number of complete pairs for each subject
      pairs <- NULL
      for (i in 1:length(levels(as.factor(subject)))) {
        pairs[i] <- sum(is.na(d[subject == levels(subject)[i]]) == FALSE)
      }
      Sig.dl <-
        (MSB - MSW) / ((sum(pairs) ^ 2 - sum(pairs ^ 2)) / ((n - 1) * sum(pairs)))
      d.sd <- sqrt(Sig.dl + MSW)
    }
 
    #*** 4. Calculate lower and upper confidence limits
    ucl <- d.mn + z * d.sd
    lcl <- d.mn - z * d.sd
    print(d.mn)
    print(ucl)
    print(lcl)
 
    #*** 5. Make Plot
    xlabstr = paste(c('Mean(',xname,', ',yname,')'), sep = "",collapse = "")
    ylabstr = paste(c(xname, ' - ', yname), sep = "",collapse = "")
    if (identical(id, FALSE)) {
      id <- as.factor(rep(1, length(m)))  # single group when no id is supplied
    } else {
      id <- as.factor(id)
    }
    xy <- data.frame(m,d,id)
    xy$id <- as.factor(xy$id)
    maxm <- max(m)
 
 
    #scatterplot of x and y variables
    scatter <- ggplot(xy,aes(m, d)) 
 
 
    if(idname == FALSE){
      scatter <- scatter + geom_point(color='darkgrey')  # no colour mapping needed when id is not used
    } else {
      scatter <- scatter + geom_point(aes(color = id))+ scale_color_brewer(idname, palette="Set1") # Set1
    }
 
    scatter <- scatter + geom_abline(intercept = d.mn,slope = 0, colour = "black",size = 1,linetype = "dashed" ) +
      geom_abline(
        intercept = ucl,slope = 0,colour = "black",size = 1,linetype = "dotted"
      ) +
      geom_abline(
        intercept = lcl,slope = 0,colour = "black",size = 1,linetype = "dotted"
      ) +
      ylim(min(lcl, -max(xy$d), min(xy$d)), max(ucl, max(xy$d), -min(xy$d)))+
      xlab(xlabstr) +
      ylab(ylabstr) +
      #geom_text(x=maxm-2.4,y=d.mn,label='Mean',colour='black',size=4)+
      #geom_text(x=maxm-2.4,y=ucl,label='+1.96 SD',colour='black',size=4)+
      #geom_text(x=maxm-2.4,y=lcl,label='-1.96 SD',colour='black',size=4)+
      theme(
        plot.background = element_blank(),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        axis.line = element_line(color="black", size = 1)
    )
 
    plot(scatter)
 
    values <- round(cbind(lcl,d.mn,ucl),4)
    colnames(values) <- c("LCL","Mean","UCL")
    if (rep.meas == FALSE){
      Output <- list(limits = values,Var = d.sd ^ 2)
    } else {
      Output <- list(limits = values,Var = Sig.dl,MSB = MSB,MSW = MSW)
    }
    return(Output)
  }
 
A <- c(-0.358, 0.788, 1.23, -0.338, -0.789, -0.255, 0.645, 0.506, 
       0.774, -0.511, -0.517, -0.391, 0.681, -2.037, 2.019, -0.447, 
       0.122, -0.412, 1.273, -2.165)
B <- c(0.121, 1.322, 1.929, -0.339, -0.515, -0.029, 1.322, 0.951, 
       0.799, -0.306, -0.158, 0.144, 1.132, -0.675, 2.534, -0.398, 0.537, 
       0.173, 1.508, -1.955)
 
Bland.Altman.diagram(A, B,alpha = .05,rep.meas = F,xname = 'A',yname = 'B')

The result is the following:
(figure: Bland-Altman diagram of A vs B, showing the mean difference and the limits of agreement)

The ‘id’ argument allows you to define different categories that will be coloured differently on the graph:
(figure: the same diagram with points coloured by id)

18 Apr

Online LaTeX diff tool

Hi PubliGISer,

You want to publish the revision of your awesome paper, but you need a tracked-changes version to meet the reviewers’ requirements and you do not know how to produce it? You are desperate and think you will never realize your dream of a successful research career?

Online LaTeX diff tool is the TOOL you are looking for.

Just copy-paste the old and the new LaTeX files and this tool will automatically compute the difference between the two documents.

Problem solved.

22 Mar

Download files from Dropbox via shareable link

Hello there!

I have a nice piece of code for today on how to download a file from a Dropbox shareable link (I reckon it is slightly adapted from code found here). Here is how it works: argument x is the document name, key is the document key, and outfile is the desired filename and location.

dl_from_dropbox <- function(x, key, outfile) {
  require(RCurl)
  # build the direct-download URL and fetch the file as raw bytes
  bin <- getBinaryURL(paste0("https://dl.dropboxusercontent.com/s/", key, "/", x),
                      ssl.verifypeer = FALSE)
  # write the bytes to disk in binary mode
  con <- file(outfile, open = "wb")
  writeBin(bin, con)
  close(con)
}
 
# Example:
dl_from_dropbox("GViewer_Embeds.txt", "06fqlz6gswj80nj", '/home/GViewer_Embeds.txt')
22 Nov

3D plot with a cut-off line

Did you ever notice that things IRL are in 3D? Well today, let’s do some 3D plots!

First, let’s define our three variables

 
x <- 1:5/10
y <- 1:5
z <- x %o% y
z <- z + .2*z*runif(25) - .1*z

Now let’s suppose that we want to plot an additional line at a specific cut-off value of 0.6

cutoff <- 0.6

Let’s plot all that with base R’s persp() function (an interactive rgl version is sketched further down):

p <- persp(x, y, z, theta=-35, phi=10, col="lightgrey",
           xlab="X factor", ylab="Why", zlab="Z-bra")  # keep the perspective matrix for trans3d()
cLines <- contourLines(x, y, z, levels=cutoff)
lines(trans3d(x=cLines[[1]]$x, y=cLines[[1]]$y, z=cutoff, pmat=p), col='red', lwd=2, lty=2)

Here is the figure:
(figure: persp surface with the red dashed cut-off line at z = 0.6)
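If you would rather rotate the surface interactively, the rgl package can do the same job. A minimal sketch reusing x, y, z and cLines from above (persp3d and lines3d are rgl’s 3D plotting functions):

library(rgl)
persp3d(x, y, z, col="lightgrey", xlab="X factor", ylab="Why", zlab="Z-bra")
lines3d(cLines[[1]]$x, cLines[[1]]$y, rep(cutoff, length(cLines[[1]]$x)), col="red", lwd=2)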

10 Nov

Count points in polygons

Hi guys,

Here is a quick R snippet to count points inside polygons:

library("raster")
 
x <- getData('GADM', country='ITA', level=1)
class(x)
# [1] "SpatialPolygonsDataFrame"
# attr(,"package")
# [1] "sp"
 
set.seed(1)
# sample random points
p <- spsample(x, n=300, type="random")
p <- SpatialPointsDataFrame(p, data.frame(id=1:300))
 
proj4string(x)
# [1] " +proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs +towgs84=0,0,0"
proj4string(p)
# [1] " +proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs +towgs84=0,0,0"
 
plot(x)
plot(p, col="red" , add=TRUE)

Here is the figure:
(figure: map of Italian regions with the 300 random points plotted in red)

res <- over(p, x)
table(res$NAME_1) # count points
#               Abruzzo                Apulia            Basilicata
#                    11                    20                     9
#              Calabria              Campania        Emilia-Romagna
#                    16                     8                    25
# Friuli-Venezia Giulia                 Lazio               Liguria
#                     7                    14                     7
#             Lombardia                Marche                Molise
#                    22                     4                     3
#              Piemonte              Sardegna                Sicily
#                    35                    18                    21
#               Toscana   Trentino-Alto Adige                Umbria
#                    33                    15                     6
#         Valle d'Aosta                Veneto
#                     4                    22
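For reference, the same counts can be obtained with the sf package; a minimal sketch reusing the x and p objects created above (st_intersects returns, for each polygon, the indices of the points that fall inside it):

library(sf)
x_sf <- st_as_sf(x)                           # regions as an sf object
p_sf <- st_as_sf(p)                           # points as an sf object
counts <- lengths(st_intersects(x_sf, p_sf))  # number of points per region
setNames(counts, x_sf$NAME_1)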
15 May

Load googlesheet data in R

Dear R-addict,

In the era of cloud-based computing, functions such as read.csv, read.xls or even fread are really old-fashioned.
Here is a chunk of code that will allow you to load data from a googlesheet:

 
install.packages("gsheet")
library(gsheet)
# Download a sheet
url <- 'https://docs.google.com/spreadsheets/d/1XBs7p44-djCPmN4TnPEgboUVAdB2mChbAlCjqnVOyQ0'
a <- gsheet2tbl(url)
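If you prefer to stay closer to base R, gsheet2tbl essentially reads the sheet’s CSV export endpoint, so you can do the same by hand; a minimal sketch, assuming the sheet is shared publicly and your R build supports https connections:

csv_url <- paste0(url, '/export?format=csv')
a2 <- read.csv(csv_url)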
12 May

Download daymet data with R

Hey folks! It’s been a while!

Interested in meteorological data over the US? Here is a way to easily download Daymet data, inspired by the DaymetR package.

Daymet is a collection of algorithms and computer software designed to interpolate and extrapolate from daily meteorological observations to produce gridded estimates of daily weather parameters. Weather parameters generated include daily surfaces of minimum and maximum temperature, precipitation, humidity, and radiation produced on a 1 km x 1 km gridded surface over the conterminous United States, Mexico, and Southern Canada.

Download the “tiles” shapefile here.

setwd("/bite/guru-blard/DAYMET")
 
param = c("vp", "tmin", "tmax", "swe", "srad", "prcp", "dayl") %see here https://daymet.ornl.gov/overview.html for the available variable
tiles=c(11922:11925, 11742:11745) %id of the tiles of interest (cfr shapefile)
year_range=1985:2015
 
for (i in year_range) {
  for (j in tiles) {
    for (k in param) {
      download_string = sprintf("https://daymet.ornl.gov/thredds/fileServer/ornldaac/1219/tiles/%s/%s_%s/%s.nc", 
                                i, j, i, k)
      daymet_file = paste(k, "_", i, "_", j, ".nc", 
                          sep = "")
      cat(paste("Downloading DAYMET data for tile: ", 
                j, "; year: ", i, "; product: ", k, "\n", 
                sep = ""))
      try(download(download_string, daymet_file, quiet = TRUE, 
                   mode = "wb"), silent = FALSE)
    }
  }
}
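To check one of the downloaded tiles, the raster package can open the NetCDF files directly; a minimal sketch (the filename is one produced by the loop above, and the ncdf4 package must be installed):

library(raster)
r <- brick("tmin_1985_11922.nc", varname="tmin")  # one layer per day of the year
plot(r[[1]])                                      # first day of 1985 for this tile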