02 Nov

Efficient Zonal Statistics using R and gdal

Holà crazy GIS lovers,

This function allows to compute zonal statistics of a raster using a ‘zonal shapefile’ and to add an attribute to this shapefile with the result of the statistics for each zone.

UPDATE: The function has been implemented in the Rnightlight R package.

library(raster)
 
# source: https://stat.ethz.ch/pipermail/r-sig-geo/2013-February/017475.html
myZonal <- function (x, z, stat, digits = 0, na.rm = TRUE, 
                      ...) { 
  library(data.table)
  fun <- match.fun(stat) 
  vals <- getValues(x) 
  zones <- round(getValues(z), digits = digits) 
  rDT <- data.table(vals, z=zones) 
  setkey(rDT, z) 
  rDT[, lapply(.SD, fun, na.rm = TRUE), by=z] 
} 
 
ZonalPipe<- function (path.in.shp, path.in.r, path.out.r, path.out.shp, zone.attribute, stat){
 
  # 1/ Rasterize using GDAL
 
  #Initiate parameter
  r<-stack(path.in.r)
 
  ext<-extent(r)
  ext<-paste(ext[1], ext[3], ext[2], ext[4])
 
  res<-paste(res(r)[1], res(r)[2])
 
  #Gdal_rasterize
  command<-'gdal_rasterize'
  command<-paste(command, "--config GDAL_CACHEMAX 2000") #Speed-up with more cache (avice: max 1/3 of your total RAM)
  command<-paste(command, "-a", zone.attribute) #Identifies an attribute field on the features to be used for a burn in value. The value will be burned into all output bands.
  command<-paste(command, "-te", as.character(ext)) #(GDAL >= 1.8.0) set georeferenced extents. The values must be expressed in georeferenced units. If not specified, the extent of the output file will be the extent of the vector layers.
  command<-paste(command, "-tr", res) #(GDAL >= 1.8.0) set target resolution. The values must be expressed in georeferenced units. Both must be positive values.
  command<-paste(command, path.in.shp)
  command<-paste(command, path.out.r)
 
  system(command)
 
  # 2/ Zonal Stat using myZonal function
  zone<-raster(path.out.r)
 
  Zstat<-data.frame(myZonal(r, zone, stat))
  colnames(Zstat)[2:length(Zstat)]<-paste0("B", c(1:(length(Zstat)-1)), "_",stat)
 
  # 3/ Merge data in the shapefile and write it
  shp<-readOGR(path.in.shp, layer= sub("^([^.]*).*", "\\1", basename(path.in.shp)))
 
  shp@data <- data.frame(shp@data, Zstat[match(shp@data[,zone.attribute], Zstat[, "z"]),])
 
  writeOGR(shp, path.out.shp, layer= sub("^([^.]*).*", "\\1", basename(path.in.shp)), driver="ESRI Shapefile")
}
 
path.in.shp<-"/home/zone.shp"
path.in.r<-"/home/NDVI.tif" #or path.in.r<-list.files("/home/, pattern=".tif$")
path.out.r<-"/home/zone.tif"
path.out.shp<-"/home/zone_withZstat.shp"
zone.attribute<-"ID"
 
ZonalPipe(path.in.shp, path.in.r, path.out.r, path.out.shp, zone.attribute, stat="mean")
 
#With
#path.in.shp: Shapefile with zone (INPUT)
#path.in.r: Raster from which the stats have to be computed (INPUT)
#path.out.r: Path of path.in.shp converted in raster (intermediate OUTPUT)
#path.out.shp: Path of path.in.shp with stat value (OUTPUT)
#zone.attribute: Attribute name of path.in.shp corresponding to the zones (ID, Country...)
#stat: function to summary path.in.r values ("mean", "sum"...)

33 thoughts on “Efficient Zonal Statistics using R and gdal

  1. This looks to be a really nice function, just struggling to get it to work!

    Could you give a little more info on the path.outs? Am I right in thinking path.out.r needs to exist and gets overwritten and that path.out.shp will be created?

    I seem to be getting two warning messages:
    1. running command ‘gdal_rasterize …’ had status 127
    2. package ‘data.table’ was built under R version 3.1.3
    (I’m running R version 3.1.2).

    My path.out.shp result is a mix of nulls and odd values. I can’t figure out if this is to do with inputs or R version…

    Cheers

    • Hi,

      path.out.r is the file path of your rasterized input shapefile, it does not need to pre-exist. It’s because we have to convert the initial shapefile containing the zones into a raster to perform the zonal stat. Actually, you could re-write this function without saving the rasterized shapefile on the hard disk and just store it in memory.

      Just upgrade R to use the package ‘data.table’.

      Hope it helps.

  2. Hi,
    I understand ‘stat’ here is the statistical measure that we want to calculate for each zone. Can stat be ‘count’? I need to find the area (i.e., the count) of the input raster in each zone represented by the polygon shapefile. Is this possible?

    • Hello,
      I think you are looking for a “Tabulate Area” function which is a much more complex operation than a simple zonal stat. This tuto give an explanation on how to do it with R. Alternatively, you can reclass your raster (1 for the class of interest and 0 for others) and compute the sum (stat="sum") of raster cells inside each zone. Repeat the operation for each class of interest.
      Hope it helps.

    • Actually you can also simply edit the myZonal function like this: myZonal < - function (x, z,digits = 0, na.rm = TRUE) { vals <- getValues(x) zones <- round(getValues(z), digits = digits) rDT <- data.table(vals, z=zones) count(rDT) }

      • This will only work for small rasters. the binarize approach is better although it is not very handy if you have a lot of different categories.

  3. Could you clarify on the variables you are using? I find this function very useful, however, I am having troubles to understand your input variables
    path.in.shp: Where my unrastarized shp is located?
    path.in.r: ?
    path.out.r: Where my rasterized shp is located?
    path.out.shp: Where the shp with the added attributes is to be saved
    zone.attribute: ?
    stat=”mean”: What statistic to perform

  4. Hi Guru-Blard, thanks for sharing this script, I’ve been trying to workaround getting the Stat column to be named after each file but I cant seem to work it out. Can you pls help with this? as I have thousands of rasters that I am calculating zonal statistics for and the easiest way for me to identify each would be for each column to bear the parent file name.

    Thanks
    Lanre

    • Hi Lanre,

      You could simply replace the line:
      colnames(Zstat)[2:length(Zstat)]< -paste0("B", c(1:(length(Zstat)-1)), "_",stat)

      by

      colnames(Zstat)[2:length(Zstat)]< -paste0(names(r), "_",stat)

      with path.in.r=list.files(DirectoryWithRasters, pattern=".tif$")

  5. Thanks for your quick response It is very much appreciated.
    I tried what you mentioned but I keep getting the error below.
    Error in rep.int(names(x), lengths(x)) : invalid ‘times’ value

    Also am i supposed to manually rasterize the input shapefile or will this code automate it and if so how do I do that. I couldnt workout how to get it to automate the rasterization of my input shapefile, so I did it in ArcGIS but I dont trust it. You’ll notice I pointed the “path.out.r” to a raster I had pre-created. Thanks heaps for your help.

    I have included my whole script below as I tried to run it. Sorry for the very elementary questions. I just started trying my hands on R yesterday when ArcGIS couldnt handle the workload.All help is greatly appreciated.

    myZonal <- function (x, z, stat, digits = 0, na.rm = TRUE,
    …) {
    library(data.table)
    fun <- match.fun(stat)
    vals <- getValues(x)
    zones <- round(getValues(z), digits = digits)
    rDT <- data.table(vals, z=zones)
    setkey(rDT, z)
    rDT[, lapply(.SD, fun), by=z]
    }

    ZonalPipe<- function (path.in.shp, path.in.r, path.out.r, path.out.shp, zone.attribute, stat){

    # 1/ Rasterize using GDAL

    #Initiate parameter
    r<-stack(path.in.r)

    ext<-extent(r)
    ext<-paste(ext[1], ext[3], ext[2], ext[4])

    res<-paste(res(r)[1], res(r)[2])

    #Gdal_rasterize
    command<-'gdal_rasterize'
    command<-paste(command, "–config GDAL_CACHEMAX 2000") #Speed-up with more cache (avice: max 1/3 of your total RAM)
    command<-paste(command, "-a", zone.attribute) #Identifies an attribute field on the features to be used for a burn in value. The value will be burned into all output bands.
    command= 1.8.0) set georeferenced extents. The values must be expressed in georeferenced units. If not specified, the extent of the output file will be the extent of the vector layers.
    command= 1.8.0) set target resolution. The values must be expressed in georeferenced units. Both must be positive values.
    command<-paste(command, path.in.shp)
    command<-paste(command, path.out.r)

    system(command)

    # 2/ Zonal Stat using myZonal function
    zone<-raster(path.out.r)

    Zstat<-data.frame(myZonal(r, zone, stat))
    colnames(Zstat)[2:length(Zstat)]< -paste0(names(r), "_",stat)

    # 3/ Merge data in the shapefile and write it
    shp<-readOGR(path.in.shp, layer= sub("^([^.]*).*", "\\1", basename(path.in.shp)))

    shp@data <- data.frame(shp@data, Zstat[match(shp@data[,zone.attribute], Zstat[, "z"]),])

    writeOGR(shp, path.out.shp, layer= sub("^([^.]*).*", "\\1", basename(path.in.shp)), driver="ESRI Shapefile")
    }
    path.in.r <- (list.files("D:/sixthtest/New folder", pattern=".tif$"))
    path.in.shp <- "D:/sixth shape SWAT/raster to zonal tools/Sixth_subsGDA94.shp"
    path.out.r <- "D:/sixthtest/New fold/sixthcreek.tif"
    zone.attribute <- "GRIDCODE"
    path.out.shp <- "D:/sixthtest/New fold/fold"

    ZonalPipe(path.in.shp, path.in.r, path.out.r, path.out.shp, zone.attribute, stat="mean")

    #With
    #path.in.shp: Shapefile with zone (INPUT)
    #path.in.r: Raster from which the stats have to be computed (INPUT)
    #path.out.r: Path of path.in.shp converted in raster (intermediate OUTPUT)
    #path.out.shp: Path of path.in.shp with stat value (OUTPUT)
    #zone.attribute: Attribute name of path.in.shp corresponding to the zones (ID, Country…)
    #stat: function to summary path.in.r values ("mean", "sum"…

    • You have to load the library raster in preambule -> library(raster)
      Furthermore, your path.out.shp should be “D:/sixthtest/New fold/fold.shp”
      The function will rasterize your shapefile and store it in path.out.r for future use.
      Hope it helps.

  6. Hi Guru Blard,
    I am still struggling with getting this code to work and from the error message, it is related to GDAL rasterize command, I guess its not creating the raster file referred to in path.out.r. I have included all your corrections in your previous posts and loaded the raster library as shown below. I have included the error comments below, followed by the script I used. Thanks for all your help.

    Error in .local(.Object, …) :
    `D:\sixthtest\New folder\Sixth_subsGDA94.tif’ does not exist in the file system,
    and is not recognised as a supported dataset name.

    In addition: Warning message:
    running command ‘gdal_rasterize –config GDAL_CACHEMAX 2000 -a GRIDCODE -te 138.475 -35.175 139.175 -34.225 -tr 0.00200000000000005 0.00199999999999999 D:/sixthtest/New folder/Sixth_subsGDA94.shp D:/sixthtest/New folder/Sixth_subsGDA94.tif’ had status 127
    Error in .rasterObjectFromFile(x, band = band, objecttype = “RasterLayer”, :
    Cannot create a RasterLayer object from this file. (file does not exist)

    library(raster)
    library(data.table)

    myZonal <- function (x, z, stat, digits = 0, na.rm = TRUE,
    …) {
    library(data.table)
    fun <- match.fun(stat)
    vals <- getValues(x)
    zones <- round(getValues(z), digits = digits)
    rDT <- data.table(vals, z=zones)
    setkey(rDT, z)
    rDT[, lapply(.SD, fun), by=z]
    }

    ZonalPipe<- function (path.in.shp, path.in.r, path.out.r, path.out.shp, zone.attribute, stat){

    # 1/ Rasterize using GDAL

    #Initiate parameter
    r<-stack(path.in.r)

    ext<-extent(r)
    ext<-paste(ext[1], ext[3], ext[2], ext[4])

    res<-paste(res(r)[1], res(r)[2])

    #Gdal_rasterize
    command<-'gdal_rasterize'
    command<-paste(command, "–config GDAL_CACHEMAX 2000") #Speed-up with more cache (avice: max 1/3 of your total RAM)
    command<-paste(command, "-a", zone.attribute) #Identifies an attribute field on the features to be used for a burn in value. The value will be burned into all output bands.
    command= 1.8.0) set georeferenced extents. The values must be expressed in georeferenced units. If not specified, the extent of the output file will be the extent of the vector layers.
    command= 1.8.0) set target resolution. The values must be expressed in georeferenced units. Both must be positive values.
    command<-paste(command, path.in.shp)
    command<-paste(command, path.out.r)

    system(command)

    # 2/ Zonal Stat using myZonal function
    zone <- raster(path.out.r)

    Zstat<-data.frame(myZonal(r, zone, stat))
    colnames(Zstat)[2:length(Zstat)]< -paste0(names(r), "_",stat)

    # 3/ Merge data in the shapefile and write it
    shp<-readOGR(path.in.shp, layer= sub("^([^.]*).*", "\\1", basename(path.in.shp)))

    shp@data <- data.frame(shp@data, Zstat[match(shp@data[,zone.attribute], Zstat[, "z"]),])

    writeOGR(shp, path.out.shp, layer= sub("^([^.]*).*", "\\1", basename(path.in.shp)), driver="ESRI Shapefile")
    }

    path.in.shp <- "D:/sixthtest/New folder/Sixth_subsGDA94.shp"
    path.in.r <- (list.files("D:/sixthtest/New folder", pattern=".tif$"))
    path.out.r <- "D:/sixthtest/New folder/Sixth_subsGDA94.tif"
    path.out.shp <- "D:/sixthtest/New folder/fold.shp"
    zone.attribute <- "GRIDCODE"

    ZonalPipe(path.in.shp, path.in.r, path.out.r, path.out.shp, zone.attribute, stat="mean")

    #With
    #path.in.shp: Shapefile with zone (INPUT)
    #path.in.r: Raster from which the stats have to be computed (INPUT)
    #path.out.r: Path of path.in.shp converted in raster (intermediate OUTPUT)
    #path.out.shp: Path of path.in.shp with stat value (OUTPUT)
    #zone.attribute: Attribute name of path.in.shp corresponding to the zones (ID, Country…)
    #stat: function to summary path.in.r values ("mean", "sum"…)

      • I am having the same problem here.
        The function does not write the rasterfile to the folder.
        #path.in.shp: Shapefile with zone (INPUT)
        path.in.shp <- "/Desktop/test/shapes/002.shp"

        #path.in.r: Raster from which the stats have to be computed (INPUT)
        path.in.r <- "/Desktop/test/001.asc"

        #path.out.r: Path of path.in.shp converted in raster (intermediate OUTPUT)
        path.out.r <- "/Desktop/test/temp.tif"

        #path.out.shp: Path of path.in.shp with stat value (OUTPUT)
        path.out.shp <- "/Desktop/test/shapes/new.shp"

        I get

        sh: gdal_rasterize: command not found
        Error in .local(.Object, …) :
        `/Desktop/test/temp.tif' does not exist in the file system,
        and is not recognised as a supported dataset name.

        Error in .rasterObjectFromFile(x, band = band, objecttype = "RasterLayer", :
        Cannot create a RasterLayer object from this file. (file does not exist).

        Why is that?

  7. I get the same error “… does not exist in the file system,
    and is not recognised as a supported dataset name.”

    I believe it is because gdal_rasterize wants you to create the raster to hold the burned in data *before* you run the gdal_rasterize command.

  8. The Problem with “[…] Error in .rasterObjectFromFile(x, band = band, objecttype = “RasterLayer”, :
    Cannot create a RasterLayer object from this file. (file does not exist). […]” I got resolved.
    You need to point your system where to find the gdal_rasterize plug in.

    However, my output is empty. In the column the function adds (mean) there are NULL values for every row.
    Both data have the same CRS…

        • Thanks for your reply and your your responsiveness! Just finishing of re-installing GDAL, that now re-works perfectly, but I am still getting the error! I am wondering how I can point to my system where to find the gdal_rasterize plug. I have already add to system variable “path” the path of GDAL plugins. Any idea about the reason why I am still getting the error?

  9. Hi.

    Thanks a lot for this. It has GREATLY improved my processing times on a project that deals with large rasters. Previously rasterizing the polygon zones individually, then extracting, even with parallelization was taking forever. A small variation that I made in myZonal breaks up large rasters so that getValues(x) doesn’t try to load the whole raster into memory. It uses raster::blockSize which suggests chunks of the raster to be loaded. A caveat the function requires tweaking for functions where the cumulative result is different from the overall result. I am using it for sums so no tweak required. The code:

    myZonal <- function (x, z, stat, digits = 0, na.rm = TRUE, …)
    {
    fun <- match.fun(stat)

    vals <- NULL

    zones <- NULL

    blocks <- blockSize(x)

    result <- NULL

    for (i in 1:blocks$n)
    {
    vals <- getValues(x, blocks$row[i], blocks$nrows[i])

    vals[vals < 0] <- NA

    zones <- round(getValues(z, blocks$row[i], blocks$nrows[i]), digits = digits)

    rDT <- data.table(vals, z=zones)

    setkey(rDT, z)

    result <- rbind(result, rDT[, lapply(.SD, fun, na.rm = TRUE), by=z])
    }

    result <- result[, lapply(.SD, fun, na.rm = TRUE), by=z]

    gc()

    return(result)
    }

      • I have edited the myZonal function with rDT[, lapply(.SD, fun, na.rm = TRUE), by=z]
        Then:
        path.in.shp<-"maps/terr-ecoregions/tnc_terr_ecoregions.shp"
        #path.in.shp<-"maps/gez2010/gez_2010_wgs84.shp"
        path.in.r<-"maps/liuDif.tif"
        path.out.r<-"maps/temp.tif"
        path.out.shp<-"maps/wwf/newStats.shp"
        zone.attribute<-"WWF_MHTNAM"

        ZonalPipe(path.in.shp, path.in.r, path.out.r, path.out.shp, zone.attribute, stat="sum")
        p1 <- readOGR(dsn="maps/wwf", layer="newStats")
        percMean <- data.frame(p1)
        percMean % filter(!is.na(B1_sum))

        But unfortunately “percMean” has 0 observations. What else can I try? Thanks

        • It is hard to say w/o the data but first check if ”’temp.tif”’ gives the zones that you are looking for and check if it correctly overlays ”’liuDif.tif”’ (with the same crs).

          • Hi, it appears that ”’temp.tif” is just a 8MB file full of 0’s, there is nothing interesting in it. Something you can think of that can cause this issue?

  10. This is great, works great, and I now just need to get it to work with rpy2 and python. I keep getting: p = rinterface.parse(string)
    rpy2.rinterface.RParsingError

    string = ”’ my R functions code here”’

    I do not even get to the part of the code that calls STAP. Python fails while reading R functions. Still working through it.

Leave a Reply

Your email address will not be published. Required fields are marked *