04 Mar

List filenames in a directory that do not match a specific pattern

Dear followers,

Have you ever needed to list the filenames in a directory that do not match a specific pattern?

Just let the MaGIS happen with the list.diff function!

list.diff <- function(dir, pattern){
  filename.all <- list.files(dir)                   # all filenames in dir
  filename.diff <- list.files(dir, pattern)         # filenames matching the pattern
  filename <- setdiff(filename.all, filename.diff)  # keep only the non-matching ones
  return(filename)
}
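
For example, to list everything in a folder except the TIFFs (the path and pattern below are just placeholders):

# hypothetical usage: all files in ~/data that are not GeoTIFFs
list.diff('~/data', '\\.tif$')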
20 Feb

Merge a list of shapefiles using OGR (ogr2ogr)

Tired of using ArcGIS to merge a huge number of shapefiles?

Here is a simple way to use ogr2ogr to merge them, so easy!

 
list_files<-Sys.glob('/path_to_shapefiles/*.shp')
 
#create destination file (copy of the first file to merge)
merged_file<-'/path_to_my_results/merged_shapefile.shp'
system(paste(
  'ogr2ogr',
  merged_file,
  list_files[1]))
 
# update merged_file by appending all the remaining shapefiles of list_files
for (i in seq_along(list_files)[-1]){  # skips the first file, already copied above
  system(paste('ogr2ogr',
               '-update',
               '-append',
               merged_file,
               list_files[i]))
}
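
To quickly check the result, ogrinfo with -so (summary only) and -al (all layers) reports the feature count of the merged file:

# optional sanity check on the merged shapefile
system(paste('ogrinfo -so -al', merged_file))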
19 Feb

Largest and smallest extent shared by different rasters

Hello GISton,

Here is a way to get the largest extent covering all your rasters (red, i.e. the union) and the smallest extent shared by all of them (green, i.e. the intersection).


[Figure: the larger (red) and smaller (green) extents shared by the rasters]

library(raster)
 
# dummy rasters standing in for yours; in practice, build the extent list with lapply(raster_list, extent)
a <- raster(nrows=884, ncols=804, xmn=-45.85728, xmx=-43.76855, ymn=-2.388705, ymx=-0.5181549)
b <- raster(nrows=884, ncols=804, xmn=-45.87077, xmx=-43.78204, ymn=-2.388727, ymx=-0.5208711)
c <- raster(nrows=884, ncols=804, xmn=-45.81952, xmx=-43.7173,  ymn=-2.405129, ymx=-0.5154312)
 
a[] <- 1
b[] <- 2
c[] <- 3
 
plot(a, xlim=c(-45.8,-43.7), ylim=c(-2.41, -0.5))
par(new=TRUE)
plot(b, xlim=c(-45.8,-43.7), ylim=c(-2.41, -0.5))
 
a <- extent(-45, -30, -20, -10)
b <- extent(-55, -35, -25, -5)
c <- extent(-40, -25, -15, 0)
extent_list <- list(a, b, c)
 
# make a matrix out of it, each column represents a raster, rows the values
extent_list<-lapply(extent_list, as.matrix)
matrix_extent<-matrix(unlist(extent_list), ncol=length(extent_list))
rownames(matrix_extent)<-c("xmin", "ymin", "xmax", "ymax")
 
# create an extent that covers all the individual extents (the union)
larger_extent<-extent(min(matrix_extent[1,]), max(matrix_extent[3,]),
                      min(matrix_extent[2,]), max(matrix_extent[4,]))
 
# create the smallest extent shared by all the individual extents (the intersection)
smaller_extent<-extent(max(matrix_extent[1,]), min(matrix_extent[3,]),
                       max(matrix_extent[2,]), min(matrix_extent[4,]))
 
# Plot results
 
a.shp<-as(a, 'SpatialPolygons')
b.shp<-as(b, 'SpatialPolygons')
c.shp<-as(c, 'SpatialPolygons')
larger_extent.shp<-as(larger_extent, 'SpatialPolygons')
smaller_extent.shp<-as(smaller_extent, 'SpatialPolygons')
 
plot(a.shp,  xlim=c(larger_extent@xmin, larger_extent@xmax), ylim=c(larger_extent@ymin, larger_extent@ymax))
plot(b.shp,  xlim=c(larger_extent@xmin, larger_extent@xmax), ylim=c(larger_extent@ymin, larger_extent@ymax), add=TRUE)
plot(c.shp,  xlim=c(larger_extent@xmin, larger_extent@xmax), ylim=c(larger_extent@ymin, larger_extent@ymax), add=TRUE)
plot(larger_extent.shp,  xlim=c(larger_extent@xmin, larger_extent@xmax), ylim=c(larger_extent@ymin, larger_extent@ymax), 
     add=TRUE, lwd=3, border="red")
plot(smaller_extent.shp,  xlim=c(larger_extent@xmin, larger_extent@xmax), ylim=c(larger_extent@ymin, larger_extent@ymax), 
     add=TRUE, lwd=3, border="green")
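
Note that the raster package also provides union() and intersect() methods for Extent objects (if my memory of the API serves), so the same two extents can be obtained with Reduce() on the original Extent list:

# alternative sketch, assuming raster's Extent methods for union/intersect
ext_list <- list(a, b, c)  # the Extent objects defined above
larger_extent2  <- Reduce(raster::union, ext_list)
smaller_extent2 <- Reduce(raster::intersect, ext_list)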
28 Jan

Lazy Raster Processing With GDAL VRTs

Hi there, today an introduction to VRTs, a magic tool, by Matthew T. Perry. Post from here.

Lazy Raster Processing With GDAL VRTs.
No, not lazy as in REST :-) … Lazy as in “Lazy evaluation”:

    In computer programming, lazy evaluation is the technique of delaying a computation until the result is required.

Take an example raster processing workflow to go from a bunch of tiled, latlong, GeoTiff digital elevation models to a single shaded relief GeoTiff in projected space:

    1) Merge the tiles together
    2) Reproject the merged DEM (using bilinear or cubic interpolation)
    3) Generate the hillshade from the merged DEM

Simple enough to do with GDAL tools on the command line. Here’s the typical, process-as-you-go implementation:

gdal_merge.py -of GTiff -o srtm_merged.tif srtm_12_*.tif 
gdalwarp -t_srs epsg:3310 -r bilinear -of GTiff srtm_merged.tif srtm_merged_3310.tif 
gdaldem hillshade srtm_merged_3310.tif srtm_merged_3310_shade.tif -of GTiff

Alternately, we can simulate lazy evaluation by using GDAL Virtual Rasters (VRT) to perform the intermediate steps, only outputting the GeoTiff as the final step.

gdalbuildvrt srtm_merged.vrt srtm_12_0*.tif
gdalwarp -t_srs epsg:3310 -r bilinear -of VRT srtm_merged.vrt srtm_merged_3310.vrt 
gdaldem hillshade srtm_merged_3310.vrt srtm_merged_3310_shade2.tif -of GTiff

So what's the advantage of doing it the VRT way? Both methods produce exactly the same output raster. Let's compare:


[Figure: benchmark comparing the process-as-you-go and lazy VRT workflows]

The Lazy VRT method delays all the computationally-intensive processing until it is actually required. The intermediate files, instead of containing the raw raster output of the actual computation, are XML files which contain the instructions to get the desired output. This allows GDAL to do all the processing in one step (the final step #3). The total processing time is not significantly different between the two methods, but in terms of the productivity of the GIS analyst, the VRT method is superior. Imagine working with datasets 1000x this size with many more steps: having to type a command, wait 2 hours, type the next, and so on, would be a waste of human resources, versus assembling the instructions into VRTs and then hitting the final processing step as you leave the office for a long weekend.

Additionally, the VRT method produces only small intermediate XML files instead of a potentially huge data management nightmare of shuffling around GB (or TB) of intermediate outputs! Plus, those XML files serve as an excellent piece of metadata describing the exact processing steps, which you can refer to later or adapt to different datasets.

So next time you have a multi-step raster workflow, use the GDAL VRTs to your full advantage – you’ll save yourself time and disk space by being lazy.
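
And since everything else on this blog goes through R, here is a minimal sketch of the same lazy pipeline wrapped in system() calls (file names as in the example above):

# the lazy VRT pipeline driven from R
system('gdalbuildvrt srtm_merged.vrt srtm_12_*.tif')
system('gdalwarp -t_srs epsg:3310 -r bilinear -of VRT srtm_merged.vrt srtm_merged_3310.vrt')
system('gdaldem hillshade srtm_merged_3310.vrt srtm_merged_3310_shade.tif -of GTiff')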

19 Jan

Generate a random landscape

Hey GuruFans…

Have you ever wanted to test all your cool algorithms on a simulated landscape instead of a real one? No? Well, too bad… I will still show you how to quickly create a nice landscape from scratch using the R secr and raster packages…

# script to generate a random landscape

require(raster)
require(secr)
require(igraph)
tempmask <- make.mask(nx = 200, ny = 200, spacing = 10)

# Create 3 masks with different properties: the degree of fragmentation
# or aggregation of the patches (p), the proportion occupied by the
# patches (A), and the minimum patch size (minpatch)
DBFmask <- raster(randomHabitat(tempmask, p = 0.5, A = 0.2, minpatch = 20))
WCRmask <- raster(randomHabitat(tempmask, p = 0.4, A = 0.4, minpatch = 8))
SCRmask <- raster(randomHabitat(tempmask, p = 0.3, A = 0.5, minpatch = 5))

r <- raster(tempmask)

r[] <- 4
r[which(as.vector(SCRmask)==1)]<-3
r[which(as.vector(WCRmask)==1)]<-2
r[which(as.vector(DBFmask)==1)]<-1

plot(r)

And it should look something like this:

[Figure: example of a simulated random landscape]
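
If you want to keep the landscape for later experiments, writeRaster() from the raster package saves it to disk (the filename is just a placeholder):

# save the simulated landscape (hypothetical filename)
writeRaster(r, filename='random_landscape.tif', overwrite=TRUE)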

06 Jan

Download MODIS products

Tired of browsing through the MODIS catalogue? Here is the solution for you!
Given a list of tiles, a year, a product and the server URL, this code browses all the available dates and downloads the files you're interested in. Have fun!

library(rgdal)
library(RCurl)
setInternet2(use = TRUE) # Windows only; skip this line on other platforms
options(download.file.method="auto")
#set dir
setwd("/dir/on/my/raptor/machine")
# location of the MODIS data:
MOD09GQ <- "http://e4ftl01.cr.usgs.gov/MOLT/MOD09GQ.005/"
product = "MOD09GQ"
yearList = c('2008')
tileList <- c('h18v04')
 
for(tile in tileList){
  for(year in yearList){
    # get the list of directories (thanks to Barry Rowlingson):
    items1 <- strsplit(getURL(MOD09GQ), "\n")[[1]]
    # get the directory names and create a new data frame
    # (the substr() offsets are specific to this server's HTML listing):
    dates=data.frame(dirname=substr(items1[20:length(items1)],52,62))
 
    # get the list of *.hdf files:
    dates <- data.frame(dirname=grep(year,dates$dirname,value = TRUE))
    for (i in 1:NROW(dates)){
      # for each date per year
      getlist <- strsplit(getURL(paste(MOD09GQ, dates$dirname[i], sep="")), ">")[[1]]
      getlist=getlist[grep(product,getlist)]
      getlist=getlist[grep(".hdf<",getlist)]
      filenames=substr(getlist,1,nchar(getlist[1])-3)
 
      BLOCK <- grep(pattern=paste0("MOD09GQ.*.",tile,".*.hdf"),filenames,value = TRUE)
 
      # Download all blocks from the list to a local drive:
      download.file(paste(MOD09GQ, dates$dirname[i], BLOCK, sep=""),
                    destfile=file.path(getwd(), BLOCK), cacheOK=FALSE)
    }
  }
}
16 Dec

Copy to clipboard from R on Ubuntu

Hi young padaGIS,

Install xclip:

sudo apt-get install xclip

Then use this function:

#' @title Copy an object to the clipboard
#' 
#' @description Copy an object to the clipboard
#' 
#' @param x The object to copy.
#' @param sep A character to be used as separator for each column of the object.
#' @param row.names Copy row names (default is FALSE).
#' @param col.names Copy column names (default is TRUE).
#' @return Copies the object, as character, to the clipboard.
#' @author freecube, source: http://stackoverflow.com/questions/10959521/how-to-write-to-clipboard-on-ubuntu-linux-in-r
 
clipboard <- function(x, sep="\t", row.names=FALSE, col.names=TRUE){
     con <- pipe("xclip -selection clipboard -i", open="w")
     write.table(x, con, sep=sep, row.names=row.names, col.names=col.names)
     close(con)
}

Example:

vec <- c(1,2,3,4)
 
clipboard(vec)
clipboard(vec, ",", col.names=FALSE)
clipboard(vec, " ", row.names=TRUE)
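
Not in the original post, but handy: the reverse direction, pasting the clipboard content back into R with xclip -o. A minimal sketch:

# hypothetical counterpart: read the clipboard back into R
clipboard.read <- function(sep='\t', ...){
  read.table(pipe('xclip -selection clipboard -o'), sep=sep, ...)
}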
09 Dec

Stratified random sampling in R

Hello GI(uy)s!

What's up? Here is a really efficient function (developed by mrdwab) to perform stratified random sampling on data.table objects in R. Enjoy!

#' @title Stratified random sampling
#' 
#' @description Takes a stratified random sample, by group, from a data.table
#' 
#' @param indt The input data.table (a data.frame is converted on the fly).
#' @param group The grouping column(s). Can be a character vector or the numeric positions of the columns.
#' @param size  The desired sample size. Can be a decimal (proportionate by group) or an integer (same number of samples per group).
#' @param select A named list with optional subsetting statements.
#' @param replace Logical. Should sampling be done with or without replacement.
#' @param keep.rownames Logical. Passed to as.data.table when converting a data.frame.
#' @param bothSets Logical. Should a list be returned. Useful when setting up a "testing" and "training" sampling setup.
#' @return a data.table object 
#' @author mrdwab source:https://gist.github.com/933ffeaa7a1d718bd10a.git
#' @import data.table
 
stratifiedDT <- function(indt, group, size, select = NULL, 
                         replace = FALSE, keep.rownames = FALSE,
                         bothSets = FALSE) {
  if (is.numeric(group)) group <- names(indt)[group]
  if (!is.data.table(indt)) indt <- as.data.table(
    indt, keep.rownames = keep.rownames)
  if (is.null(select)) {
    indt <- indt
  } else {
    if (is.null(names(select))) stop("'select' must be a named list")
    if (!all(names(select) %in% names(indt)))
      stop("Please verify your 'select' argument")
    temp <- vapply(names(select), function(x)
      indt[[x]] %in% select[[x]], logical(nrow(indt)))
    indt <- indt[rowSums(temp) == length(select), ]
  }
  df.table <- indt[, .N, by = group]
  if (length(size) > 1) {
    if (length(size) != nrow(df.table))
      stop("Number of groups is ", nrow(df.table),
           " but number of sizes supplied is ", length(size))
    if (is.null(names(size))) {
      stop("size should be entered as a named vector")
    } else {
      ifelse(all(names(size) %in% do.call(
        paste, df.table[, group, with = FALSE])),
        n <- merge(
          df.table, 
          setnames(data.table(names(size), ss = size), 
                   c(group, "ss")), by = group),
        stop("Named vector supplied with names ",
             paste(names(size), collapse = ", "),
             "\n but the names for the group levels are ",
             do.call(paste, c(unique(
               df.table[, group, with = FALSE]), collapse = ", "))))
    }
  } else if (size < 1) {
    n <- df.table[, ss := round(N * size, digits = 0)]
  } else if (size >= 1) {
    if (all(df.table$N >= size) || isTRUE(replace)) {
      n <- cbind(df.table, ss = size)
    } else {
      message(
        "Some groups\n---",
        do.call(paste, c(df.table[df.table$N < size][, group, with = FALSE], 
                         sep = ".", collapse = ", ")),
        "---\ncontain fewer observations",
        " than desired number of samples.\n",
        "All observations have been returned from those groups.")
      n <- cbind(df.table, ss = pmin(df.table$N, size))
    }
  }
  setkeyv(indt, group)
  setkeyv(n, group)
  indt[, .RNID := sequence(nrow(indt))]
  out1 <- indt[indt[n, list(
    .RNID = sample(.RNID, ss, replace)), by = .EACHI]$`.RNID`]
 
  if (isTRUE(bothSets)) {
    out2 <- indt[!.RNID %in% out1$`.RNID`]
    indt[, .RNID := NULL]
    out1[, .RNID := NULL]
    out2[, .RNID := NULL]
    list(SAMP1 = out1, SAMP2 = out2)
  } else {
    indt[, .RNID := NULL]
    out1[, .RNID := NULL][]
  }
}
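
A quick usage example (mtcars is used here only because it ships with R):

library(data.table)
 
# take 30% of the rows within each level of cyl
stratifiedDT(mtcars, group = 'cyl', size = 0.3)
 
# take 5 rows per gear and also return the complement, e.g. for training/testing
stratifiedDT(mtcars, group = 'gear', size = 5, bothSets = TRUE)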
03 Dec

Use gdal_calc to calculate an index (NDVI, NDWI, …), add stats with gdalinfo and overviews with gdaladdo

Hi GISguys!

Here is a simple example of how to use the very fast gdal_calc.py to compute an NDWI, with stats, histogram and overviews.

file <- 'my_input_file.tif'
outfile <- 'my_output_file.tif'
 
# index = (A - B) / (A + B), computed here from bands 2 and 4 of the input
system(
  paste('gdal_calc.py',
        '-A', file,
        '--A_band=2',
        '-B', file,
        '--B_band=4',
        paste('--outfile=', outfile, sep=''),
        '--calc="(A.astype(float)-B)/(A.astype(float)+B)"',
        "--type='Float32'",
        '--co="COMPRESS=LZW"'))
 
# compute stats and histogram
system(paste('gdalinfo -hist -stats', outfile))
 
# build overviews
system(
  paste('gdaladdo',
        outfile,
        '2 4 8 16'))

Here is another version with a Byte file as output: the index is shifted and rescaled from [-1, 1] to roughly [0, 255], which means faster processing and smaller files:

file <- 'my_input_file.tif'
outfile <- 'my_output_file.tif'
 
# same index, rescaled to the Byte range: (index + 1) * 128
system(
  paste('gdal_calc.py',
        '-A', file,
        '--A_band=2',
        '-B', file,
        '--B_band=4',
        paste('--outfile=', outfile, sep=''),
        '--calc="((((A.astype(float)-B)/(A.astype(float)+B))+1)*128)"',
        "--type='Byte'",
        '--co="COMPRESS=LZW"'))
 
# compute stats and histogram
system(paste('gdalinfo -hist -stats', outfile))
 
# build overviews
system(
  paste('gdaladdo',
        outfile,
        '2 4 8 16'))
27 Nov

How to randomly split a data frame or a vector into a training and a test dataset?

GIS là,

Here is the function to do just that. Fix the seed if you want to generate the exact same sample several times.
prop defines the proportion of the total data that will be sampled for the training set.

#' Splitdf splits a dataframe into a training sample and a test sample with a given proportion
#' 
#' This function takes a data frame and, according to the predefined proportion "prop", returns a training and a test sample
#' 
#' @param input an n x p dataframe of n observations and p variables, or a vector
#' @param seed the seed to be set in order to ensure reproducibility of the split
#' @param prop the proportion of the training sample [0-1]
#' @return a list with two slots: trainset and testset
#' @author BlackGuru
#' @details
#' This function takes a data frame or a vector and, according to the predefined proportion "prop", returns a training and a test sample. "prop" corresponds to the proportion of the training sample.
#' @export
 
splitdf <- function(input, prop=0.5, seed=NULL) {
  if (!is.null(seed)) set.seed(seed)
  if (is.data.frame(input)){
    index <- 1:nrow(input)
    trainindex <- sample(index, trunc(length(index)*prop))
    trainset <- input[trainindex, ]
    testset <- input[-trainindex, ]
  }else if (is.vector(input)){
    index<-1:length(input)  
    trainindex <- sample(index, trunc(length(index)*prop))
    trainset <- input[trainindex]
    testset <- input[-trainindex]
  }else{
    stop("Input must be a dataframe or a vector")
  }
  list(trainset=trainset,testset=testset)
}
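
For example (iris is used just because it ships with R):

# reproducible 70/30 split
splits <- splitdf(iris, prop = 0.7, seed = 42)
nrow(splits$trainset)  # 105 observations
nrow(splits$testset)   # 45 observations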