Linearly scaling raster data with GDAL
So, it's quite tricky to play around with these geospatial datasets: they have a myriad formats, these keep changing, and you can never know what metadata will be available in them. A clever way around this situation is to use tools that allow you to read most of the data, most of the associated metadata, and that let you do the usual minor tweaks that you'll require to get things working. And this means GDAL (for rasters) and OGR (for vectors).
Typical problem: you get some data you've never seen before, and it is linearly scaled (so they put it in as a 16bit integer and you need to apply a linear mapping to get the units right as floating point numbers). In the GDAL mailing list there's a thread on this using the MODIS SST product. Obviously, you need a reasonable file format that is read easily (this mean GeoTIFF), and you need it in the right units, as you don't want to remember these things. The easiest way to accomplish this is to use gdal_translate, with its rather handy -scale option, which performs a linear scaling. We'll also want to change the data type to a floating point number, as otherwise results will be quantised. From the post above, we know what the linear scaling is, but we need to convert it to GDAL, which requires a mapping from source minimum and maximum values to destimation min/max. One way to do this is to assume that the data span the complete range between 0 and 65534 (maximum possible number) and calculate the two extrema in the destination space if you know the slope and intercept. Another way is to fish those two destination numbers from the metadata. Using
gdalinfo A20030012003008.L3m_8D_NSST_4 [...] Scaling=linear Scaling Equation=(Slope*l3m_data) + Intercept = Parameter value Slope=0.000717185 Intercept=-2 Scaled Data Minimum=-2 Scaled Data Maximum=45 Data Minimum=-1.999999 Data Maximum=35.785 [...]
- we see that the total scaling is between -2 and 45. A bash script that
- loops over a set of files could be
for file in A*L3m_8D_NSST_4 ;
do
echo "map: $file"
gdal_translate -of GTiff -ot Float32 \
-scale 0 65534 -2. 45. \
-a_srs "EPSG:4326" \
-a_nodata 65535 \
-a_ullr -180 90 180 -90 \
-co "COMPRESS=PACKBITS" \
'HDF4_SDS:UNKNOWN:"A20030012003008.L3m_8D_NSST_4":0'\
A20030012003008.L3m_8D_NSST_4.tif
done
A further refinement to save space is to convert the data to a GDAL Virtual Raster. This is basically a pointer to the datafile in the form of an XML file. If you only access the files rarely, or if you plan to carry out more processing on them, then this saves you storing more copies of the same data
gdal_translate -of VRT -ot Float32 -scale 0 65534 -2. 45. \ -a_srs "EPSG:4326" -a_nodata 65535 \ -a_ullr -180 90 180 -90 -co "COMPRESS=PACKBITS" \ 'HDF4_SDS:UNKNOWN:"A20030012003008.L3m_8D_NSST_4":0' \ A20030012003008.L3m_8D_NSST_4.vrt
results in the following VRT file:
<VRTDataset rasterXSize="8640" rasterYSize="4320">
<SRS>GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]]</SRS>
<GeoTransform> -1.8000000000000000e+02, 4.1666666666666664e-02, 0.0000000000000000e+00, 9.0000000000000000e+01, 0.0000000000000000e+00, -4.1666666666666664e-02</GeoTransform>
<Metadata>
<MDI key="Product Name">A20030012003008.L3m_8D_NSST_4</MDI>
<MDI key="Sensor Name">MODISA</MDI>
<MDI key="Sensor"></MDI>
<MDI key="Title">MODISA Level-3 Standard Mapped Image</MDI>
<MDI key="Data Center"></MDI>
<MDI key="Station Name"></MDI>
<MDI key="Station Latitude">0</MDI>
<MDI key="Station Longitude">0</MDI>
<MDI key="Mission"></MDI>
<MDI key="Mission Characteristics"></MDI>
<MDI key="Sensor Characteristics"></MDI>
<MDI key="Product Type">8-day</MDI>
<MDI key="Replacement Flag">ORIGINAL</MDI>
<MDI key="Software Name">smigen</MDI>
<MDI key="Software Version">3.30</MDI>
<MDI key="Processing Time">2006060201827000</MDI>
<MDI key="Input Files">A20030012003008.L3b_8D_NSST.main</MDI>
<MDI key="Processing Control">smigen par=A20030012003008.L3m_8D_NSST_4.param</MDI>
<MDI key="Input Parameters">IFILE = /data1/vdc/sdpsoper/vpu0/workbuf/A20030012003008.L3b_8D_NSST.main|OFILE = A20030012003008.L3m_8D_NSST_4|PFILE = |PROD = sst|PALFILE = DEFAULT|RFLAG = ORIGINAL|MEAS = 1|STYPE = 0|DATAMIN = 0.000000|DATAMAX = 0.000000|LONWEST = -180.000000|LONEAST = 180.000000|LATSOUTH = -90.000000|LATNORTH = 90.000000|RESOLUTION = 4km|PROJECTION = RECT|GAP_FILL = 0|SEAM_LON = -180.000000|PRECISION = |</MDI>
<MDI key="L2 Flag Names">LAND,~HISOLZ</MDI>
<MDI key="Period Start Year">2003</MDI>
<MDI key="Period Start Day">1</MDI>
<MDI key="Period End Year">2003</MDI>
<MDI key="Period End Day">8</MDI>
<MDI key="Start Time">2002365123005000</MDI>
<MDI key="End Time">2003009023005000</MDI>
<MDI key="Start Year">2002</MDI>
<MDI key="Start Day">365</MDI>
<MDI key="Start Millisec">45005000</MDI>
<MDI key="End Year">2003</MDI>
<MDI key="End Day">9</MDI>
<MDI key="End Millisec">9005000</MDI>
<MDI key="Start Orbit">0</MDI>
<MDI key="End Orbit">0</MDI>
<MDI key="Orbit">0</MDI>
<MDI key="Map Projection">Equidistant Cylindrical</MDI>
<MDI key="Latitude Units">degrees North</MDI>
<MDI key="Longitude Units">degrees East</MDI>
<MDI key="Northernmost Latitude">90</MDI>
<MDI key="Southernmost Latitude">-90</MDI>
<MDI key="Westernmost Longitude">-180</MDI>
<MDI key="Easternmost Longitude">180</MDI>
<MDI key="Latitude Step">0.04166667</MDI>
<MDI key="Longitude Step">0.04166667</MDI
<MDI key="SW Point Latitude">-89.97916</MDI>
<MDI key="SW Point Longitude">-179.9792</MDI>
<MDI key="Data Bins">13045202</MDI>
<MDI key="Number of Lines">4320</MDI>
<MDI key="Number of Columns">8640</MDI>
<MDI key="Parameter">Sea Surface Temperature</MDI>
<MDI key="Measure">Mean</MDI>
<MDI key="Units">deg-C</MDI>
<MDI key="Scaling">linear</MDI>
<MDI key="Scaling Equation">(Slope*l3m_data) + Intercept = Parameter value</MDI>
<MDI key="Slope">0.000717185</MDI>
<MDI key="Intercept">-2</MDI>
<MDI key="Scaled Data Minimum">-2</MDI>
<MDI key="Scaled Data Maximum">45</MDI>
<MDI key="Data Minimum">-1.999999</MDI>
<MDI key="Data Maximum">35.785</MDI>
<MDI key="Scaling">linear</MDI>
<MDI key="Scaling Equation">(Slope*l3m_data) + Intercept = Parameter value</MDI>
<MDI key="Slope">0.000717185</MDI>
<MDI key="Intercept">-2</MDI>
</Metadata>
<VRTRasterBand dataType="Float32" band="1">
<Metadata />
<NoDataValue>6.55350000000000E+04</NoDataValue>
<ColorInterp>Gray</ColorInterp>
<ComplexSource>
<SourceFilename relativeToVRT="0">HDF4_SDS:UNKNOWN:"A20030012003008.L3m_8D_NSST_4":0</SourceFilename>
<SourceBand>1</SourceBand>
<SourceProperties RasterXSize="8640" RasterYSize="4320" DataType="UInt16" BlockXSize="8640" BlockYSize="115" />
<SrcRect xOff="0" yOff="0" xSize="8640" ySize="4320" />
<DstRect xOff="0" yOff="0" xSize="8640" ySize="4320" />
<ScaleOffset>-2</ScaleOffset>
<ScaleRatio>0.000717185</ScaleRatio>
</ComplexSource>
</VRTRasterBand>
</VRTDataset>
It basically points to the original data, and has the scaling information. If you wanted to, you could also use the -projwin or -srcwin options to crop the data, or even reproject it with gdalwarp.