02 Nov

Efficient Zonal Statistics using R and gdal

Holà crazy GIS lovers,

This function allows to compute zonal statistics of a raster using a ‘zonal shapefile’ and to add an attribute to this shapefile with the result of the statistics for each zone.

library(raster)
 
myZonal <- function (x, z, stat, digits = 0, na.rm = TRUE, 
                      ...) { 
  library(data.table)
  fun <- match.fun(stat) 
  vals <- getValues(x) 
  zones <- round(getValues(z), digits = digits) 
  rDT <- data.table(vals, z=zones) 
  setkey(rDT, z) 
  rDT[, lapply(.SD, fun, na.rm = TRUE), by=z] 
} 
 
ZonalPipe<- function (path.in.shp, path.in.r, path.out.r, path.out.shp, zone.attribute, stat){
 
  # 1/ Rasterize using GDAL
 
  #Initiate parameter
  r<-stack(path.in.r)
 
  ext<-extent(r)
  ext<-paste(ext[1], ext[3], ext[2], ext[4])
 
  res<-paste(res(r)[1], res(r)[2])
 
  #Gdal_rasterize
  command<-'gdal_rasterize'
  command<-paste(command, "--config GDAL_CACHEMAX 2000") #Speed-up with more cache (avice: max 1/3 of your total RAM)
  command<-paste(command, "-a", zone.attribute) #Identifies an attribute field on the features to be used for a burn in value. The value will be burned into all output bands.
  command<-paste(command, "-te", as.character(ext)) #(GDAL >= 1.8.0) set georeferenced extents. The values must be expressed in georeferenced units. If not specified, the extent of the output file will be the extent of the vector layers.
  command<-paste(command, "-tr", res) #(GDAL >= 1.8.0) set target resolution. The values must be expressed in georeferenced units. Both must be positive values.
  command<-paste(command, path.in.shp)
  command<-paste(command, path.out.r)
 
  system(command)
 
  # 2/ Zonal Stat using myZonal function
  zone<-raster(path.out.r)
 
  Zstat<-data.frame(myZonal(r, zone, stat))
  colnames(Zstat)[2:length(Zstat)]<-paste0("B", c(1:(length(Zstat)-1)), "_",stat)
 
  # 3/ Merge data in the shapefile and write it
  shp<-readOGR(path.in.shp, layer= sub("^([^.]*).*", "\\1", basename(path.in.shp)))
 
  shp@data <- data.frame(shp@data, Zstat[match(shp@data[,zone.attribute], Zstat[, "z"]),])
 
  writeOGR(shp, path.out.shp, layer= sub("^([^.]*).*", "\\1", basename(path.in.shp)), driver="ESRI Shapefile")
}
 
path.in.shp<-"/home/zone.shp"
path.in.r<-"/home/NDVI.tif" #or path.in.r<-list.files("/home/, pattern=".tif$")
path.out.r<-"/home/zone.tif"
path.out.shp<-"/home/zone_withZstat.shp"
zone.attribute<-"ID"
 
ZonalPipe(path.in.shp, path.in.r, path.out.r, path.out.shp, zone.attribute, stat="mean")
 
#With
#path.in.shp: Shapefile with zone (INPUT)
#path.in.r: Raster from which the stats have to be computed (INPUT)
#path.out.r: Path of path.in.shp converted in raster (intermediate OUTPUT)
#path.out.shp: Path of path.in.shp with stat value (OUTPUT)
#zone.attribute: Attribute name of path.in.shp corresponding to the zones (ID, Country...)
#stat: function to summary path.in.r values ("mean", "sum"...)

29 thoughts on “Efficient Zonal Statistics using R and gdal

  1. This looks to be a really nice function, just struggling to get it to work!

    Could you give a little more info on the path.outs? Am I right in thinking path.out.r needs to exist and gets overwritten and that path.out.shp will be created?

    I seem to be getting two warning messages:
    1. running command ‘gdal_rasterize …’ had status 127
    2. package ‘data.table’ was built under R version 3.1.3
    (I’m running R version 3.1.2).

    My path.out.shp result is a mix of nulls and odd values. I can’t figure out if this is to do with inputs or R version…

    Cheers

    • Hi,

      path.out.r is the file path of your rasterized input shapefile, it does not need to pre-exist. It’s because we have to convert the initial shapefile containing the zones into a raster to perform the zonal stat. Actually, you could re-write this function without saving the rasterized shapefile on the hard disk and just store it in memory.

      Just upgrade R to use the package ‘data.table’.

      Hope it helps.

  2. Hi,
    I understand ‘stat’ here is the statistical measure that we want to calculate for each zone. Can stat be ‘count’? I need to find the area (i.e., the count) of the input raster in each zone represented by the polygon shapefile. Is this possible?

    • Hello,
      I think you are looking for a “Tabulate Area” function which is a much more complex operation than a simple zonal stat. This tuto give an explanation on how to do it with R. Alternatively, you can reclass your raster (1 for the class of interest and 0 for others) and compute the sum (stat="sum") of raster cells inside each zone. Repeat the operation for each class of interest.
      Hope it helps.

    • Actually you can also simply edit the myZonal function like this: myZonal < - function (x, z,digits = 0, na.rm = TRUE) { vals <- getValues(x) zones <- round(getValues(z), digits = digits) rDT <- data.table(vals, z=zones) count(rDT) }

      • This will only work for small rasters. the binarize approach is better although it is not very handy if you have a lot of different categories.

  3. Could you clarify on the variables you are using? I find this function very useful, however, I am having troubles to understand your input variables
    path.in.shp: Where my unrastarized shp is located?
    path.in.r: ?
    path.out.r: Where my rasterized shp is located?
    path.out.shp: Where the shp with the added attributes is to be saved
    zone.attribute: ?
    stat=”mean”: What statistic to perform

  4. Hi Guru-Blard, thanks for sharing this script, I’ve been trying to workaround getting the Stat column to be named after each file but I cant seem to work it out. Can you pls help with this? as I have thousands of rasters that I am calculating zonal statistics for and the easiest way for me to identify each would be for each column to bear the parent file name.

    Thanks
    Lanre

    • Hi Lanre,

      You could simply replace the line:
      colnames(Zstat)[2:length(Zstat)]< -paste0("B", c(1:(length(Zstat)-1)), "_",stat)

      by

      colnames(Zstat)[2:length(Zstat)]< -paste0(names(r), "_",stat)

      with path.in.r=list.files(DirectoryWithRasters, pattern=".tif$")

  5. Thanks for your quick response It is very much appreciated.
    I tried what you mentioned but I keep getting the error below.
    Error in rep.int(names(x), lengths(x)) : invalid ‘times’ value

    Also am i supposed to manually rasterize the input shapefile or will this code automate it and if so how do I do that. I couldnt workout how to get it to automate the rasterization of my input shapefile, so I did it in ArcGIS but I dont trust it. You’ll notice I pointed the “path.out.r” to a raster I had pre-created. Thanks heaps for your help.

    I have included my whole script below as I tried to run it. Sorry for the very elementary questions. I just started trying my hands on R yesterday when ArcGIS couldnt handle the workload.All help is greatly appreciated.

    myZonal <- function (x, z, stat, digits = 0, na.rm = TRUE,
    …) {
    library(data.table)
    fun <- match.fun(stat)
    vals <- getValues(x)
    zones <- round(getValues(z), digits = digits)
    rDT <- data.table(vals, z=zones)
    setkey(rDT, z)
    rDT[, lapply(.SD, fun), by=z]
    }

    ZonalPipe<- function (path.in.shp, path.in.r, path.out.r, path.out.shp, zone.attribute, stat){

    # 1/ Rasterize using GDAL

    #Initiate parameter
    r<-stack(path.in.r)

    ext<-extent(r)
    ext<-paste(ext[1], ext[3], ext[2], ext[4])

    res<-paste(res(r)[1], res(r)[2])

    #Gdal_rasterize
    command<-'gdal_rasterize'
    command<-paste(command, "–config GDAL_CACHEMAX 2000") #Speed-up with more cache (avice: max 1/3 of your total RAM)
    command<-paste(command, "-a", zone.attribute) #Identifies an attribute field on the features to be used for a burn in value. The value will be burned into all output bands.
    command= 1.8.0) set georeferenced extents. The values must be expressed in georeferenced units. If not specified, the extent of the output file will be the extent of the vector layers.
    command= 1.8.0) set target resolution. The values must be expressed in georeferenced units. Both must be positive values.
    command<-paste(command, path.in.shp)
    command<-paste(command, path.out.r)

    system(command)

    # 2/ Zonal Stat using myZonal function
    zone<-raster(path.out.r)

    Zstat<-data.frame(myZonal(r, zone, stat))
    colnames(Zstat)[2:length(Zstat)]< -paste0(names(r), "_",stat)

    # 3/ Merge data in the shapefile and write it
    shp<-readOGR(path.in.shp, layer= sub("^([^.]*).*", "\\1", basename(path.in.shp)))

    shp@data <- data.frame(shp@data, Zstat[match(shp@data[,zone.attribute], Zstat[, "z"]),])

    writeOGR(shp, path.out.shp, layer= sub("^([^.]*).*", "\\1", basename(path.in.shp)), driver="ESRI Shapefile")
    }
    path.in.r <- (list.files("D:/sixthtest/New folder", pattern=".tif$"))
    path.in.shp <- "D:/sixth shape SWAT/raster to zonal tools/Sixth_subsGDA94.shp"
    path.out.r <- "D:/sixthtest/New fold/sixthcreek.tif"
    zone.attribute <- "GRIDCODE"
    path.out.shp <- "D:/sixthtest/New fold/fold"

    ZonalPipe(path.in.shp, path.in.r, path.out.r, path.out.shp, zone.attribute, stat="mean")

    #With
    #path.in.shp: Shapefile with zone (INPUT)
    #path.in.r: Raster from which the stats have to be computed (INPUT)
    #path.out.r: Path of path.in.shp converted in raster (intermediate OUTPUT)
    #path.out.shp: Path of path.in.shp with stat value (OUTPUT)
    #zone.attribute: Attribute name of path.in.shp corresponding to the zones (ID, Country…)
    #stat: function to summary path.in.r values ("mean", "sum"…

    • You have to load the library raster in preambule -> library(raster)
      Furthermore, your path.out.shp should be “D:/sixthtest/New fold/fold.shp”
      The function will rasterize your shapefile and store it in path.out.r for future use.
      Hope it helps.

  6. Hi Guru Blard,
    I am still struggling with getting this code to work and from the error message, it is related to GDAL rasterize command, I guess its not creating the raster file referred to in path.out.r. I have included all your corrections in your previous posts and loaded the raster library as shown below. I have included the error comments below, followed by the script I used. Thanks for all your help.

    Error in .local(.Object, …) :
    `D:\sixthtest\New folder\Sixth_subsGDA94.tif’ does not exist in the file system,
    and is not recognised as a supported dataset name.

    In addition: Warning message:
    running command ‘gdal_rasterize –config GDAL_CACHEMAX 2000 -a GRIDCODE -te 138.475 -35.175 139.175 -34.225 -tr 0.00200000000000005 0.00199999999999999 D:/sixthtest/New folder/Sixth_subsGDA94.shp D:/sixthtest/New folder/Sixth_subsGDA94.tif’ had status 127
    Error in .rasterObjectFromFile(x, band = band, objecttype = “RasterLayer”, :
    Cannot create a RasterLayer object from this file. (file does not exist)

    library(raster)
    library(data.table)

    myZonal <- function (x, z, stat, digits = 0, na.rm = TRUE,
    …) {
    library(data.table)
    fun <- match.fun(stat)
    vals <- getValues(x)
    zones <- round(getValues(z), digits = digits)
    rDT <- data.table(vals, z=zones)
    setkey(rDT, z)
    rDT[, lapply(.SD, fun), by=z]
    }

    ZonalPipe<- function (path.in.shp, path.in.r, path.out.r, path.out.shp, zone.attribute, stat){

    # 1/ Rasterize using GDAL

    #Initiate parameter
    r<-stack(path.in.r)

    ext<-extent(r)
    ext<-paste(ext[1], ext[3], ext[2], ext[4])

    res<-paste(res(r)[1], res(r)[2])

    #Gdal_rasterize
    command<-'gdal_rasterize'
    command<-paste(command, "–config GDAL_CACHEMAX 2000") #Speed-up with more cache (avice: max 1/3 of your total RAM)
    command<-paste(command, "-a", zone.attribute) #Identifies an attribute field on the features to be used for a burn in value. The value will be burned into all output bands.
    command= 1.8.0) set georeferenced extents. The values must be expressed in georeferenced units. If not specified, the extent of the output file will be the extent of the vector layers.
    command= 1.8.0) set target resolution. The values must be expressed in georeferenced units. Both must be positive values.
    command<-paste(command, path.in.shp)
    command<-paste(command, path.out.r)

    system(command)

    # 2/ Zonal Stat using myZonal function
    zone <- raster(path.out.r)

    Zstat<-data.frame(myZonal(r, zone, stat))
    colnames(Zstat)[2:length(Zstat)]< -paste0(names(r), "_",stat)

    # 3/ Merge data in the shapefile and write it
    shp<-readOGR(path.in.shp, layer= sub("^([^.]*).*", "\\1", basename(path.in.shp)))

    shp@data <- data.frame(shp@data, Zstat[match(shp@data[,zone.attribute], Zstat[, "z"]),])

    writeOGR(shp, path.out.shp, layer= sub("^([^.]*).*", "\\1", basename(path.in.shp)), driver="ESRI Shapefile")
    }

    path.in.shp <- "D:/sixthtest/New folder/Sixth_subsGDA94.shp"
    path.in.r <- (list.files("D:/sixthtest/New folder", pattern=".tif$"))
    path.out.r <- "D:/sixthtest/New folder/Sixth_subsGDA94.tif"
    path.out.shp <- "D:/sixthtest/New folder/fold.shp"
    zone.attribute <- "GRIDCODE"

    ZonalPipe(path.in.shp, path.in.r, path.out.r, path.out.shp, zone.attribute, stat="mean")

    #With
    #path.in.shp: Shapefile with zone (INPUT)
    #path.in.r: Raster from which the stats have to be computed (INPUT)
    #path.out.r: Path of path.in.shp converted in raster (intermediate OUTPUT)
    #path.out.shp: Path of path.in.shp with stat value (OUTPUT)
    #zone.attribute: Attribute name of path.in.shp corresponding to the zones (ID, Country…)
    #stat: function to summary path.in.r values ("mean", "sum"…)

      • I am having the same problem here.
        The function does not write the rasterfile to the folder.
        #path.in.shp: Shapefile with zone (INPUT)
        path.in.shp <- "/Desktop/test/shapes/002.shp"

        #path.in.r: Raster from which the stats have to be computed (INPUT)
        path.in.r <- "/Desktop/test/001.asc"

        #path.out.r: Path of path.in.shp converted in raster (intermediate OUTPUT)
        path.out.r <- "/Desktop/test/temp.tif"

        #path.out.shp: Path of path.in.shp with stat value (OUTPUT)
        path.out.shp <- "/Desktop/test/shapes/new.shp"

        I get

        sh: gdal_rasterize: command not found
        Error in .local(.Object, …) :
        `/Desktop/test/temp.tif' does not exist in the file system,
        and is not recognised as a supported dataset name.

        Error in .rasterObjectFromFile(x, band = band, objecttype = "RasterLayer", :
        Cannot create a RasterLayer object from this file. (file does not exist).

        Why is that?

  7. I get the same error “… does not exist in the file system,
    and is not recognised as a supported dataset name.”

    I believe it is because gdal_rasterize wants you to create the raster to hold the burned in data *before* you run the gdal_rasterize command.

  8. The Problem with “[…] Error in .rasterObjectFromFile(x, band = band, objecttype = “RasterLayer”, :
    Cannot create a RasterLayer object from this file. (file does not exist). […]” I got resolved.
    You need to point your system where to find the gdal_rasterize plug in.

    However, my output is empty. In the column the function adds (mean) there are NULL values for every row.
    Both data have the same CRS…

  9. Hi.

    Thanks a lot for this. It has GREATLY improved my processing times on a project that deals with large rasters. Previously rasterizing the polygon zones individually, then extracting, even with parallelization was taking forever. A small variation that I made in myZonal breaks up large rasters so that getValues(x) doesn’t try to load the whole raster into memory. It uses raster::blockSize which suggests chunks of the raster to be loaded. A caveat the function requires tweaking for functions where the cumulative result is different from the overall result. I am using it for sums so no tweak required. The code:

    myZonal <- function (x, z, stat, digits = 0, na.rm = TRUE, …)
    {
    fun <- match.fun(stat)

    vals <- NULL

    zones <- NULL

    blocks <- blockSize(x)

    result <- NULL

    for (i in 1:blocks$n)
    {
    vals <- getValues(x, blocks$row[i], blocks$nrows[i])

    vals[vals < 0] <- NA

    zones <- round(getValues(z, blocks$row[i], blocks$nrows[i]), digits = digits)

    rDT <- data.table(vals, z=zones)

    setkey(rDT, z)

    result <- rbind(result, rDT[, lapply(.SD, fun, na.rm = TRUE), by=z])
    }

    result <- result[, lapply(.SD, fun, na.rm = TRUE), by=z]

    gc()

    return(result)
    }

      • I have edited the myZonal function with rDT[, lapply(.SD, fun, na.rm = TRUE), by=z]
        Then:
        path.in.shp<-"maps/terr-ecoregions/tnc_terr_ecoregions.shp"
        #path.in.shp<-"maps/gez2010/gez_2010_wgs84.shp"
        path.in.r<-"maps/liuDif.tif"
        path.out.r<-"maps/temp.tif"
        path.out.shp<-"maps/wwf/newStats.shp"
        zone.attribute<-"WWF_MHTNAM"

        ZonalPipe(path.in.shp, path.in.r, path.out.r, path.out.shp, zone.attribute, stat="sum")
        p1 <- readOGR(dsn="maps/wwf", layer="newStats")
        percMean <- data.frame(p1)
        percMean % filter(!is.na(B1_sum))

        But unfortunately “percMean” has 0 observations. What else can I try? Thanks

        • It is hard to say w/o the data but first check if ”’temp.tif”’ gives the zones that you are looking for and check if it correctly overlays ”’liuDif.tif”’ (with the same crs).

          • Hi, it appears that ”’temp.tif” is just a 8MB file full of 0’s, there is nothing interesting in it. Something you can think of that can cause this issue?

Leave a Reply

Your email address will not be published. Required fields are marked *