Stitching shapefiles efficiently using OGR

Thu 29 January 2015

Filed under Blog

Tags python ogr gdal

Today I had the unenviable task of stitching together and reprojecting a large number of Shapefiles (yes, shapefiles are still used. Nothing has changed since 1992, I guess). The process is greatly simplified by using OGR's virtual datasets for vectors in a way that is reminiscent of how you about doing this with GDAL and raster data (see here).

The process is extemely simple, and it is done by creating a VRT XML file. The definition of the format given in this page, but in essence if you want to have a union of different layers, you just need to come up with the following XML:

    <OGRVRTUnionLayer name="TheLayerUnited">
        <OGRVRTLayer name="layer1">
        <OGRVRTLayer name="layer2">

This is really easy: you can sandwich in as many layers as you want with extra <OGRVRTUnionLayer> elements. You can also transform them by wrapping this in a <OGRVRTWarpedLayer> object, but it turns out that in my particular case, ogr2ogr seems to be slightly easier. At any rate, this would be the code to convert the two datasets to a new projection:

        <OGRVRTUnionLayer name="TheLayerUnited">
            <OGRVRTLayer name="layer1">
            <OGRVRTLayer name="layer2">

You can do all sorts of other things easily, like spatial filtering and things like that. To me, it is a useful way to be able to extract areas of complex and large multi-shapefile datasets.

Eventually, this is the code I ended up using from Python.

#!/usr/bin/env python
import sys
import subprocess
import os
import numpy as np # Not really needed, but I'm happy using numpy...
# These MODIS tiles (H/V) cover the US (CONUS)
tiles="""8   6
9   6
10  6
8   5
9   5
10  5
11  5
12  5
8   4
9   4
10  4
11  4
12  4
13  4"""

# Put the tiles in an array
tile_list = np.fromstring(tiles, sep=" " ).reshape((14,2)).astype(
# Main loop over the years
for year in xrange(2000, 2013):
    # Now months...
    for month in ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]:
        # Some user feedback
        print "%04d/%s..." % (year, month),
        # ``vrt_text`` holds the context of the VRT file
        vrt_text='<OGRVRTDataSource>\n\t<OGRVRTUnionLayer name="JoinedUp">\n'
        for htile, vtile in tile_list:
            # We add the different shapefiles, in this case, indexed by h-tile, v-tile, year and month
            vrt_text += '''        <OGRVRTLayer name="MCD64A1_fires_h%02dv%02d_%04d_%s">
            </OGRVRTLayer>\n''' % ( htile, vtile, year, month,htile, vtile, year, month)
        vrt_text +='''\t</OGRVRTUnionLayer>\n</OGRVRTDataSource>'''
        # This is the ouptut file with VRT extension...
        input_file="/tmp/CONUS_MCD64A1_%04d_%s.vrt" % ( year, month)
        # This is the output file, but in Shapefile format
        output_file = input_file.replace (".vrt", ".shp" )
        # We save the VRT text to the VRT file
        fp = open (input_file, 'w' )
        fp.write ( vrt_text )
        # We use ``ogr2ogr`` to convert from VRT to shapefile, and in this case, reproject from
        # MODIS sinuosoidal projection to Lon/Lat  ( 'ogr2ogr -f "ESRI Shapefile" -s_srs "+proj=sinu +R=6371007.181 +nadgrids=@null +wktext" -t_srs "+proj=longlat +ellps=GRS80 +no_defs" %s %s' % (output_file, input_file ), shell=True )
        # Get rid of the VRT file
        os.remove ( input_file )
        print "\tDone!"


EO & DA Ramblings © J Gomez-Dans Powered by Pelican and Twitter Bootstrap. Icons by Font Awesome and Font Awesome More