Stitching shapefiles efficiently using OGR
Today I had the unenviable task of stitching together and reprojecting a large number of Shapefiles (yes, shapefiles are still used. Nothing has changed since 1992, I guess). The process is greatly simplified by using OGR's virtual datasets for vectors in a way that is reminiscent of how you about doing this with GDAL and raster data (see here).
The process is extemely simple, and it is done by creating a VRT XML file. The definition of the format given in this page, but in essence if you want to have a union of different layers, you just need to come up with the following XML:
<OGRVRTDataSource> <OGRVRTUnionLayer name="TheLayerUnited"> <OGRVRTLayer name="layer1"> <SrcDataSource>layer1.shp</SrcDataSource> </OGRVRTLayer> <OGRVRTLayer name="layer2"> <SrcDataSource>layer2.shp</SrcDataSource> </OGRVRTLayer> </OGRVRTUnionLayer> </OGRVRTDataSource>
This is really easy: you can sandwich in as many layers as you want with extra <OGRVRTUnionLayer> elements. You can also transform them by wrapping this in a <OGRVRTWarpedLayer> object, but it turns out that in my particular case, ogr2ogr seems to be slightly easier. At any rate, this would be the code to convert the two datasets to a new projection:
<OGRVRTDataSource> <OGRVRTWarpedLayer> <OGRVRTUnionLayer name="TheLayerUnited"> <OGRVRTLayer name="layer1"> <SrcDataSource>layer1.shp</SrcDataSource> </OGRVRTLayer> <OGRVRTLayer name="layer2"> <SrcDataSource>layer2.shp</SrcDataSource> </OGRVRTLayer> </OGRVRTUnionLayer> <TargetSRS>EPSG:29030</TargetSRS> </OGRVRTWarpedLayer> </OGRVRTDataSource>
You can do all sorts of other things easily, like spatial filtering and things like that. To me, it is a useful way to be able to extract areas of complex and large multi-shapefile datasets.
Eventually, this is the code I ended up using from Python.
#!/usr/bin/env python import sys import subprocess import os import numpy as np # Not really needed, but I'm happy using numpy... # These MODIS tiles (H/V) cover the US (CONUS) tiles="""8 6 9 6 10 6 8 5 9 5 10 5 11 5 12 5 8 4 9 4 10 4 11 4 12 4 13 4""" # Put the tiles in an array tile_list = np.fromstring(tiles, sep=" " ).reshape((14,2)).astype(np.int) # Main loop over the years for year in xrange(2000, 2013): # Now months... for month in ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]: # Some user feedback print "%04d/%s..." % (year, month), sys.stdout.flush() # ``vrt_text`` holds the context of the VRT file vrt_text='<OGRVRTDataSource>\n\t<OGRVRTUnionLayer name="JoinedUp">\n' for htile, vtile in tile_list: # We add the different shapefiles, in this case, indexed by h-tile, v-tile, year and month vrt_text += ''' <OGRVRTLayer name="MCD64A1_fires_h%02dv%02d_%04d_%s"> <SrcDataSource>shapes/MCD64A1_fires_h%02dv%02d_%04d_%s.shp</SrcDataSource> </OGRVRTLayer>\n''' % ( htile, vtile, year, month,htile, vtile, year, month) vrt_text +='''\t</OGRVRTUnionLayer>\n</OGRVRTDataSource>''' # This is the ouptut file with VRT extension... input_file="/tmp/CONUS_MCD64A1_%04d_%s.vrt" % ( year, month) # This is the output file, but in Shapefile format output_file = input_file.replace (".vrt", ".shp" ) # We save the VRT text to the VRT file fp = open (input_file, 'w' ) fp.write ( vrt_text ) fp.close() # We use ``ogr2ogr`` to convert from VRT to shapefile, and in this case, reproject from # MODIS sinuosoidal projection to Lon/Lat subprocess.call ( 'ogr2ogr -f "ESRI Shapefile" -s_srs "+proj=sinu +R=6371007.181 +nadgrids=@null +wktext" -t_srs "+proj=longlat +ellps=GRS80 +no_defs" %s %s' % (output_file, input_file ), shell=True ) # Get rid of the VRT file os.remove ( input_file ) print "\tDone!" sys.stdout.flush()