20 Nov

Get angle/orientation/direction of GPS track using R


Today a bit of trSIGonometry with this function to compute the angle/orientation/direction between 2 points of a GPX file.

angle <- function(pt1, pt2) {
  conv = 180 / pi # conversion radians / degrees
  diff <- pt2 - pt1
  diff.abs <- abs(diff)
  if (diff[1]>=0 & diff[2]>=0){angle <- pi/2 - atan(diff.abs[2] / diff.abs[1])}
  if (diff[1]>=0 & diff[2]<=0){angle <- pi/2 + atan(diff.abs[2] / diff.abs[1])}
  if (diff[1]<=0 & diff[2]>=0){angle <- 3*pi/2 + atan(diff.abs[2] / diff.abs[1])}
  if (diff[1]<=0 & diff[2]<=0){angle <- pi + (pi/2- atan(diff.abs[2] / diff.abs[1]))}
  return(angle * conv)
#Set folder
# Load file names
filenames <- list.files(pattern=".gpx$")
# Iteration through all gpx files
for (filename in filenames){
  # Load the GPX
  gpx.trackpoints <- readOGR(dsn = filename,  layer= "track_points", stringsAsFactors = F)
  # Extract the coordinates as a matrix:
  gpx.tp <- coordinates(gpx.trackpoints)
  # #Loop through and add angle - the difference between the of the current point and the following one:
  for (tp in 1:(nrow(gpx.tp)-1)) {
    gpx.trackpoints$angle[tp] <- angle(gpx.tp[tp,], gpx.tp[tp+1,])
  gpx.trackpoints@data$Date <- substr(as.character(gpx.trackpoints@data$time), 1,10)
  gpx.trackpoints@data$Hour <- substr(as.character(gpx.trackpoints@data$time),12,19)
  gpx.trackpoints@data$filename <- filename
  writeOGR(gpx.trackpoints, paste0(tools::file_path_sans_ext(filename), ".shp"), layer=tools::file_path_sans_ext(filename), driver="ESRI Shapefile")

Screenshot from 2017-11-20 14-53-09

17 Jul

Fast Euclidean Distance Function

Holà GISpo,

Today, a simple function to (highly) speed up the standard Euclidean distance function in R when you use large matrix (n> 100.000.000).

euc_dist <- function(m) {mtm <- Matrix::tcrossprod(m); sq <- rowSums(m*m);  sqrt(outer(sq,sq,"+") - 2*mtm)} 
# Example
M <- matrix(rexp(1000000, rate=.1), ncol=1000)
ptm <- proc.time()
d <- dist(M, method = "euclidean")
proc.time() - ptm
#   user  system elapsed 
#   1.08    0.00    1.08 
ptm <- proc.time()
d2 <- euc_dist(M)
d2 <- as.dist(d2)
proc.time() - ptm
#   user  system elapsed 
#  0.424   0.008   0.429 
isTRUE(all.equal(as.matrix(d), as.matrix(d2)))
15 May

Load googlesheet data in R

Dear R-addict,

At the era of cloud-based computing, functions such as read.csv, read.xls or even fread are really old-fashioned.
Here is a chunk of code that will allow you to load data from a googlesheet:

# Download a sheet
# Download a sheet
url <- 'https://docs.google.com/spreadsheets/d/1XBs7p44-djCPmN4TnPEgboUVAdB2mChbAlCjqnVOyQ0'
a <- gsheet2tbl(url)
28 Jan

Lazy Raster Processing With Gdal Vrts

Hi there, today an introduction to VRT, a magic tool, by Matthew T. Perry. Post from here.

Lazy Raster Processing With Gdal Vrts.
No, not lazy as in REST :-) … Lazy as in “Lazy evaluation”:

    In computer programming, lazy evaluation is the technique of delaying a computation until the result is required.

Take an example raster processing workflow to go from a bunch of tiled, latlong, GeoTiff digital elevation models to a single shaded relief GeoTiff in projected space:

    1) Merge the tiles together
    2) Reproject the merged DEM (using bilinear or cubic interpolation)
    3) Generate the hillshade from the merged DEM

Simple enough to do with GDAL tools on the command line. Here’s the typical, process-as-you-go implementation:

gdal_merge.py -of GTiff -o srtm_merged.tif srtm_12_*.tif 
gdalwarp -t_srs epsg:3310 -r bilinear -of GTiff srtm_merged.tif srtm_merged_3310.tif 
gdaldem hillshade srtm_merged_3310.tif srtm_merged_3310_shade.tif -of GTiff

Alternately, we can simulate lazy evaluation by using GDAL Virtual Rasters (VRT) to perform the intermediate steps, only outputting the GeoTiff as the final step.

gdalbuildvrt srtm_merged.vrt srtm_12_0*.tif
gdalwarp -t_srs epsg:3310 -r bilinear -of VRT srtm_merged.vrt srtm_merged_3310.vrt 
gdaldem hillshade srtm_merged_3310.vrt srtm_merged_3310_shade2.tif -of GTiff

So what’s the advantage to doing it the VRT way? They both produce exactly the same output raster. Lets compare:


The Lazy VRT method delays all the computationally-intensive processing until it is actually required. The intermediate files, instead of containing the raw raster output of the actual computation, are XML files which contain the instructions to get the desired output. This allows GDAL to do all the processing in one step (the final step #3). The total processing time is not significantly different between the two methods but in terms of the productivity of the GIS analyst, the VRT method is superior. Imagine working with datasets 1000x this size with many more steps – having to type the command, wait 2 hours, type the next, etc. would be a waste of human resources versus assembling the instructions into vrts then hitting the final processing step when you leave the office for a long weekend.

Additionaly, the VRT method produces only small intermediate xml files instead of having a potentially huge data management nightmare of shuffling around GB (or TB) of intermediate outputs! Plus those xml files serve as an excellent piece of metadata which describe the exact processing steps which you can refer to later or adapt to different datasets.

So next time you have a multi-step raster workflow, use the GDAL VRTs to your full advantage – you’ll save yourself time and disk space by being lazy.

13 Jan

Sample raster easily

Hey GISette! What’s up?

Here is a simple way to sample a raster as Raster object or SpatialPointsDataFrame using the library raster in r.


r <- raster(system.file("external/test.grd", package="raster"))
x<-sampleRandom(r, size=1000, asRaster=TRUE)
y<-sampleRandom(r, size=100, sp=TRUE)
plot(y, add=TRUE, pch=20, col="black")
09 Dec

Stratified random sampling in R

Hello GI(uy)s!

What’s up ? Here is a really efficient function (developped by mrdwab) to perform stratified random sampling on data.table object in R. Enjoy !

#' @title Stratified random sampling
#' @description Applies a linear stretch to any multiband raster \code{link{stack}} or \code{link{brick}}
#' @param group The grouping column(s). Can be a character vector or the numeric positions of the columns.
#' @param size  The desired sample size. Can be a decimal (proportionate by group) or an integer (same number of samples per group).
#' @param select A named list with optional subsetting statements.
#' @param replace Logical. Should sampling be done with or without replacement.
#' @param bothSets Logical. Should a list be returned. Useful when setting up a “testing” and “training” sampling setup.
#' @return a data.table object 
#' @author mrdwab source:https://gist.github.com/933ffeaa7a1d718bd10a.git
#' @import data.table
stratifiedDT <- function(indt, group, size, select = NULL, 
                         replace = FALSE, keep.rownames = FALSE,
                         bothSets = FALSE) {
  if (is.numeric(group)) group <- names(indt)[group]
  if (!is.data.table(indt)) indt <- as.data.table(
    indt, keep.rownames = keep.rownames)
  if (is.null(select)) {
    indt <- indt
  } else {
    if (is.null(names(select))) stop("'select' must be a named list")
    if (!all(names(select) %in% names(indt)))
      stop("Please verify your 'select' argument")
    temp <- vapply(names(select), function(x)
      indt[[x]] %in% select[[x]], logical(nrow(indt)))
    indt <- indt[rowSums(temp) == length(select), ]
  df.table <- indt[, .N, by = group]
  if (length(size) > 1) {
    if (length(size) != nrow(df.table))
      stop("Number of groups is ", nrow(df.table),
           " but number of sizes supplied is ", length(size))
    if (is.null(names(size))) {
      stop("size should be entered as a named vector")
    } else {
      ifelse(all(names(size) %in% do.call(
        paste, df.table[, group, with = FALSE])),
        n <- merge(
          setnames(data.table(names(size), ss = size), 
                   c(group, "ss")), by = group),
        stop("Named vector supplied with names ",
             paste(names(size), collapse = ", "),
             "\n but the names for the group levels are ",
             do.call(paste, c(unique(
               df.table[, group, with = FALSE]), collapse = ", "))))
  } else if (size < 1) {
    n <- df.table[, ss := round(N * size, digits = 0)]
  } else if (size >= 1) {
    if (all(df.table$N >= size) || isTRUE(replace)) {
      n <- cbind(df.table, ss = size)
    } else {
        "Some groups\n---",
        do.call(paste, c(df.table[df.table$N < size][, group, with = FALSE], 
                         sep = ".", collapse = ", ")),
        "---\ncontain fewer observations",
        " than desired number of samples.\n",
        "All observations have been returned from those groups.")
      n <- cbind(df.table, ss = pmin(df.table$N, size))
  setkeyv(indt, group)
  setkeyv(n, group)
  indt[, .RNID := sequence(nrow(indt))]
  out1 <- indt[indt[n, list(
    .RNID = sample(.RNID, ss, replace)), by = .EACHI]$`.RNID`]
  if (isTRUE(bothSets)) {
    out2 <- indt[!.RNID %in% out1$`.RNID`]
    indt[, .RNID := NULL]
    out1[, .RNID := NULL]
    out2[, .RNID := NULL]
    list(SAMP1 = out1, SAMP2 = out2)
  } else {
    indt[, .RNID := NULL]
    out1[, .RNID := NULL][]
04 Dec

Random sampling

Good Evening, fellow GIS analysts!

Here is a quick function to randomly draw a set of points over your area of interest, say for an accuracy assessment.

First the function:

randSamp <- function(n,ext,filename=''){
    x <- runif(n, ext@xmin, ext@xmax)
    y <- runif(n, ext@ymin, ext@ymax)
    s <- SpatialPoints(list(x,y))
  } else if (class(ext) %in% c('Raster','RasterStack','RasterBrick')){
    x <- runif(2*n, extent(ext)@xmin, extent(ext)@xmax)
    y <- runif(2*n, extent(ext)@ymin, extent(ext)@ymax)
    s <- SpatialPoints(list(x,y))
    proj4string(s) <- crs(ext)
  } else if (class(ext) %in% c('SpatialPolygonsDataFrame','SpatialPolygons')){
    x <- c()
    y <- c()
    id <- c()
      xl <- runif(2*n, extent(ext)@xmin, extent(ext)@xmax)
      yl <- runif(2*n, extent(ext)@ymin, extent(ext)@ymax)
      sl <- SpatialPoints(list(xl,yl))
      proj4string(sl) <- proj4string(ext)
      point_in_pol <- over( sl , ext , fn = NULL,returnList = FALSE)
      idl <- which(!apply(point_in_pol, 1, function(x) all(is.na(x))))
      x <- cbind(x,xl[idl])
      y <- cbind(y,yl[idl])
    s <- SpatialPoints(list(x[id],y[id]))
    proj4string(s) <- proj4string(ext)
  } else {
    print('ext not of class "Extent" nor "SpatialPolygonsDataFrame", "SpatialPolygons", "Raster", "RasterStack","RasterBrick". I cannot work with this')
    s_points_df <- SpatialPointsDataFrame(s, data=data.frame(ID=1:n))

And a example on how to use it:

dat <- matrix(stats::rnorm(2000), ncol = 2)
ch <- chull(dat)
coords <- dat[c(ch, ch[1]), ]  # closed polygon
ext <- SpatialPolygons(list(Polygons(list(Polygon(coords)), ID=1)))
proj4string(ext) = CRS("+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0")
spsamp <- randSamp(10,ext)
# Now let's visualize this piece of art


03 Dec

On stretching histograms

Hello GIS addicts!

Here are two functions on how to stretch the histograms of your raster for better visualization.

#' @title Linear stretch of a multi-band raster.
#' @description Applies a linear stretch to any multiband raster \code{link{stack}} or \code{link{brick}}
#' @param img Raster* object
#' @param minmax  user defined minimum and maximum for the stretch (optional)
#' @return a Raster* object 
#' @author White-Guru
#' @seealso \code{\link{calc}}
#' @import raster
#' @export
  if(is.na(minmax)) minmax<-c(min(getValues(img),na.rm=T),max(getValues(img),na.rm=T))
  temp<-calc(img,fun=function(x) (255*(x-minmax[1]))/(minmax[2]-minmax[1]))
  #set all values above or below minmax to 0 or 255
#' @title Histogram equalization stretch of a multi-band raster.
#' @description Applies a histogram equalization stretch to any multiband raster \code{link{stack}} or \code{link{brick}}
#' @param img Raster* object
#' @return a Raster* object 
#' @author White-Guru
#' @seealso \code{\link{calc}}
#' @import raster
#' @export
  unique <- na.omit(getValues(img))
  if (length(unique>0)){
    out <- calc(img,fun=function(x) ecdf(x)*255)

And here is a dummy example of their effect:

#import a random raster
b <- (brick(system.file("external/rlogo.grd", package="raster")))*6-600
b4 <- subset(b,1)
nf<-layout(matrix(seq(1,8), byrow=T, nrow=2))
myramp<-colorRampPalette(brewer.pal(5,"PiYG")) #custom ramp from RColorBrewer
# no stretch
# BAD hack in order to get plot(*Raster) to look like it didn't apply a Min/Max stretch
temp<-b4; temp[1]<-0; temp[2]<-255;
plot(temp, col=myramp(255)) #plot with diff color map
hist(getValues(b4), freq=F, main="Original raster")
#min/max stretch and histogram
plot(b4.minmax, col=myramp(255))
hist(getValues(b4.minmax), breaks=seq(0,255), xlim=c(0,255), freq=F, main="Linear Min/Max")
#histogram equalization
plot(eqstretch(b4), col=myramp(255))
hist(getValues(calc(b4, fun=function(x) b4.ecdf(x)*255)), breaks=seq(0,255), xlim=c(0,255), freq=F, main="Histogram Equalization")
#2% linear stretch
b4quant<-quantile(getValues(b4), c(0.02,0.98)) # 2% cutoffs
b4.linstretch<-linstretch(b4, minmax=c(b4quant[1],b4quant[2]))
plot(b4.linstretch, col=myramp(255))
hist(getValues(b4.linstretch), breaks=seq(0,255), xlim=c(0,255), freq=F, main="Linear 2%")

And here is the result:

03 Dec

Use gdal_calc to calculate index (NDVI, NDWI, …) , add stats with gdalinfo and overview with gdaladdo

His GISguys!

Here is a simple example of how to use the very fast multi-thred gdal_calc to get NDWI with stats, histogram and overviews.

          paste(' --outfile=', outpufile,sep=''),
          paste(' --calc=','\"','(A.astype(float)-B)/(A.astype(float)+B)','\"',sep=''),
  system(paste('gdalinfo -hist -stats',outpufile))
          '2 4 8 16')

Here is another version with byte file as output (values ranging from 0 to 255, faster and smaller files):

          paste(' --outfile=', outpufile,sep=''),
          paste(' --calc=','\"','((((A.astype(float)-B)/(A.astype(float)+B))+1)*128)','\"',sep=''),
  system(paste('gdalinfo -hist -stats',outpufile))
          '2 4 8 16')
03 Dec

Define class specific functions

Tired of specifying all the parameters for a graph you do very often?
Here is a dummy example explaining how to define a function specifically for a class:

y18 <- c(1:3, 5, 4, 7:3, 2*(2:5), rep(10, 4))
x18 <- c(1:18)
(ys18  <- smooth.spline(x18,y18))
x <- list(y=y18,ys=ys18,outl=outl)
plot(x,main='Temporal smoothing and outlier deteciton',xlab='Time',ylab='Vegetation Index')

And here is the result: