Data Validation Results: Jakarta 2007 Flood
AIFDR (Australia-Indonesia Facility for Disaster Reduction) maintains a plugin called InaSAFE for the open-source platform QGIS (Quantum GIS). This plugin offers a number of datasets and risk evaluation processes, including a map of the extent of Jakarta's 2007 flood. Jakarta is located below sea level, on the coast, and between several mountains, so it tends to flood every 3-5 years, during the height of its rainy season in January/February. The rapid development of its built environment has also contributed to flood extents. Its 2007 flood was one of the most extensive in recent decades.
After OpenIR team member Barry Beagen found this plugin, undergraduate researcher Stephanie New and I conducted a data validation process with this map, comparing it with OpenIR’s Jakarta Risk Index map. We documented this process so that it can be replicated with other OpenIR Risk Indices and other observed data, including that for Hurricane Sandy in New York and 2013 flooding in Jakarta.
The general validation steps, mostly conducted in QGIS, consist of:
- Converting, if needed, the observed flood extent image from a raster to a vector image.
- Converting each Risk Index level (vegetation low-lying, vegetation high-lying, urban low-lying, urban high-lying) from a raster to a vector image.
- Checking that the observed data and the Risk Index map are correctly geolocated and in the same coordinate system.
- Using the flood extent image as a clipping mask for the risk index image.
- Calculating the area of vegetation low-lying, vegetation high-lying, urban low-lying, and urban high-lying in the resulting image.
- Exporting these areas to spreadsheets.
- Calculating percentages of these areas (from the total area) in the resulting image.
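The last step above is simple arithmetic, and a minimal Python sketch may make it concrete. The function name and the example areas are hypothetical (they are not values from this study); only the layer names follow OpenIR's naming convention.

```python
def area_percentages(areas_by_level):
    """Return each level's share of the total clipped area, in percent."""
    total = sum(areas_by_level.values())
    if total <= 0:
        raise ValueError("total area must be positive")
    return {level: 100.0 * area / total for level, area in areas_by_level.items()}


# Hypothetical areas in arbitrary units, NOT values from the study.
example_areas = {
    "veg_LL": 30.0,
    "veg_HL": 10.0,
    "urb_LL": 50.0,
    "urb_HL": 10.0,
}

shares = area_percentages(example_areas)
for level, pct in shares.items():
    print(f"{level}: {pct:.2f}%")
```

Because each percentage is a ratio of two areas, the unit of area does not matter as long as all four levels are measured in the same one.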
Screenshots of some intermediate imagery: a) the InaSAFE raster image, which contains depth information; b) InaSAFE imagery converted to vector format; c) the raw image for the OpenIR risk index level, in this case urban low-lying; d) the urban low-lying layer, clipped by the InaSAFE layer
In this particular validation process, the InaSAFE data contains information on flood depth (see figure below). However, for the purposes of this initial validation, we decided to discard this information, so as to more easily convert the data to vector form, and so as to restrict our process to a one-to-one comparison between flood extent and risk index levels.
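For readers who want to reproduce this simplification outside QGIS, here is a rough Python sketch of collapsing a depth grid into a flooded/not-flooded extent. It assumes an Esri ASCII grid (.asc) with one data row per line, and it assumes that "flooded" means a depth greater than zero; both are assumptions made for illustration, and the function names and sample values are hypothetical rather than part of InaSAFE.

```python
import io


def read_ascii_grid(stream):
    """Parse an Esri ASCII grid into (header dict, list of rows of floats)."""
    header = {}
    rows = []
    for line in stream:
        parts = line.split()
        if not parts:
            continue
        if parts[0][0].isalpha():
            # Header lines look like "ncols 3" or "NODATA_value -9999".
            header[parts[0].lower()] = float(parts[1])
        else:
            rows.append([float(v) for v in parts])
    return header, rows


def depth_to_extent(header, rows, min_depth=0.0):
    """Return a grid of 1 (flooded) / 0 (not flooded), discarding depth."""
    nodata = header.get("nodata_value")
    return [
        [1 if (v != nodata and v > min_depth) else 0 for v in row]
        for row in rows
    ]


# Tiny made-up grid, NOT data from the InaSAFE file.
sample = io.StringIO(
    "ncols 3\n"
    "nrows 2\n"
    "xllcorner 106.7\n"
    "yllcorner -6.3\n"
    "cellsize 0.01\n"
    "NODATA_value -9999\n"
    "0.0 0.5 1.2\n"
    "-9999 0.0 2.0\n"
)
header, depths = read_ascii_grid(sample)
extent = depth_to_extent(header, depths)
print(extent)
```

A different threshold (for example, counting only depths above some minimum) would change the extent, so the choice of `min_depth` should be stated alongside any results.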
The steps for this specific validation process consist of:
- Downloading InaSAFE, QGIS, and any required dependencies.
- Adding the InaSAFE layer in QGIS: Click Add Raster Layer on the tool bar, and select inasafe_data-master>hazard>Flood_Current_Depth_Jakarta_geographic.asc.
- Polygonizing the InaSAFE layer: On the top toolbar, select Raster>Conversions>Polygonize (Raster to Vector), and input the InaSAFE flood raster layer.
- Loading one of OpenIR’s Flood Risk Index levels (e.g., urban low-lying): Click Add Raster Layer and select a layer of OpenIR’s Flood Risk Index (the naming convention is “urb_LL”). Temporarily deselect the InaSAFE Flood Vector Layer.
- Removing the background fill of the raster layer: Right click on the layer and select Properties. In the Transparency tab, click the Add Values from display button and click on the black background of the image. Delete the preexisting transparency row in the table (Indexed Value: -32768 Percent Transparency 100.00).
- Clipping the raster layer using a mask: Navigate to the top toolbar, and select Raster>Extraction>Clipper. Under Clipping mode, choose mask layer and select the InaSAFE Flood Vector Layer.
- Repeating the loading, background-removal, and clipping steps for the other index levels (urb_HL, veg_LL, veg_HL).
- Removing the borders from the clipped images, so that the images can be correctly filled:
  - Open the clipped shapefile.
  - Transform it from Polygons to Lines: Vector > Geometry Tools > Polygons to Lines.
  - Toggle Editing and minimize the outer border to a tiny speck. Select the Move Features button and move the entire border away from the actual data. Choose Select features by freehand to select the border. Click Cut Features.
  - Convert this layer back to polygons: Vector > Geometry Tools > Lines to Polygons. The tiny speck should be removed.
- Calculating area percentages: Navigate to the top toolbar, and select Vector>Geometry Tools>Export/Add Geometry Columns. Use Select Features by Freehand to select the entire layer. Then, click Layer>Open Attribute Table. Copy the table (Ctrl+C) and paste it into a spreadsheet editor. The second column of the table contains the areas of the polygons that make up the layer, so auto-sum the second column to calculate the total area of the intersection between OpenIR’s Risk Index layer and InaSAFE’s flood data. Dividing each level's total by the sum of all four totals gives its percentage.
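The spreadsheet step can also be scripted. The sketch below sums the area column of a copied attribute table using only Python's standard library. It assumes the pasted text is tab-separated with a header row and that the area sits in the second column, as described above; the column names and numbers in the sample are hypothetical.

```python
import csv
import io


def total_area(table_text, area_column=1, delimiter="\t"):
    """Sum one column of a delimited attribute table, skipping the header."""
    reader = csv.reader(io.StringIO(table_text), delimiter=delimiter)
    next(reader, None)  # skip the header row
    total = 0.0
    for row in reader:
        # Ignore short or blank rows that sometimes appear after pasting.
        if len(row) <= area_column or not row[area_column].strip():
            continue
        total += float(row[area_column])
    return total


# Made-up table for one clipped layer, NOT data from the study.
sample_table = (
    "DN\tAREA\tPERIMETER\n"
    "1\t0.25\t2.0\n"
    "1\t0.75\t4.0\n"
    "1\t1.50\t6.0\n"
)
urb_ll_total = total_area(sample_table)
print(urb_ll_total)
```

Running this once per clipped layer yields the four totals needed for the percentage calculation. If the layers are in a geographic coordinate system, the areas would be in squared degrees rather than metres; the percentages are ratios, so this should matter little over an extent the size of a city.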
The final image, in which the observed 2007 data is a clipping mask for the risk index map, displays mostly dark red areas, indicating that the areas evaluated by OpenIR to be of highest vulnerability (urban low-lying) have the most overlap with 2007 flood extents. The resulting image displays very few light pink areas, indicating that areas evaluated by OpenIR to be of lowest vulnerability (vegetation high-lying) were largely not flooded in 2007.
Percent area calculations quantitatively confirm these visual observations: the highest index level (#4, urban low-lying) overlaps with 66.64% of the 2007 flood area, while the lowest index level (#1, vegetation high-lying) overlaps with only 0.47% of the 2007 flooded area.
However, the middle indices, #2 and #3, are more problematic. At 4.52%, index level #3 (urban high-lying) is much lower than index level #2 (vegetation low-lying), which is 28.38%. This indicates that areas of low-lying vegetation were more extensively flooded than high-lying urban areas.
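A few lines of Python can check these figures against one another. The percentages are the ones reported above; the assignment of 28.38% to vegetation low-lying and 4.52% to urban high-lying is inferred from the sentence comparing those two categories, and the variable names are illustrative.

```python
# Reported overlap with the 2007 flood area, listed in index order (#1 to #4).
reported = {
    "vegetation high-lying": 0.47,
    "vegetation low-lying": 28.38,
    "urban high-lying": 4.52,
    "urban low-lying": 66.64,
}

total = sum(reported.values())
low_lying = sum(pct for name, pct in reported.items() if "low-lying" in name)

# Order the categories by how much of the flood area they actually covered.
observed_order = sorted(reported, key=reported.get)

print(f"total: {total:.2f}%")
print(f"low-lying share: {low_lying:.2f}%")
print("least to most flooded:", observed_order)
```

The four values sum to 100.01%, which is consistent with rounding, and the two low-lying categories together account for 95.02%. Sorting by observed overlap puts urban high-lying below vegetation low-lying, the reverse of their order in the index.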
In some ways, this result is quite intuitive: low-lying areas were more extensively flooded than high-lying areas; about 95% of the flood affected a low-lying area. But this process does highlight a problem in ranking environmental features in terms of risk. Perhaps it would make more sense for OpenIR’s data viewers just to show floodplains, that is, low-lying areas. On the other hand, it is useful to know how low-lying areas intersect with vegetation and urban surfaces. Perhaps, instead of numerically ranking feature combinations, it would make more sense to label what they are: urban low-lying, urban high-lying, vegetation low-lying, and vegetation high-lying. Professor Cesar Hidalgo also suggested that the four index colors are too similar; it would be easier on the eye if they were four different hues, instead of four shades of red.
These considerations return the discussion to issues of design and user interface, which we continued to explore in implementing our Summative Data Viewers.