Best practices for GeoJSON validation
Context & Problem Statement
Invalid GeoJSON frequently breaks web mapping renderers and spatial joins. Many libraries auto-parse malformed files, but implicit tolerance causes silent topology errors downstream. Implementing strict validation early in your Geospatial Data Ingestion & Processing Workflows prevents cascading failures. The GeoJSON specification (RFC 7946) mandates WGS84 (EPSG:4326) coordinates. Real-world exports often omit CRS metadata or contain self-intersecting geometries that crash rendering engines.
Minimal Reproducible Validation Script
import json
import geojson
from shapely.geometry import shape
from shapely.validation import explain_validity
import logging
logging.basicConfig(level=logging.INFO, format='%(levelname)s: %(message)s')
def validate_geojson(filepath: str) -> dict:
with open(filepath, 'r') as f:
raw_data = json.load(f)
# 1. Schema & Structure Validation
try:
geojson_obj = geojson.loads(json.dumps(raw_data))
if not geojson_obj.is_valid:
raise ValueError(f"RFC 7946 structure violation: {geojson_obj.errors()}")
except Exception as e:
logging.error(f"Structure validation failed: {e}")
return {"status": "fail", "error": str(e)}
# 2. Geometry & Topology Validation
features = raw_data.get('features', [])
for i, feat in enumerate(features):
geom = feat.get('geometry')
if not geom:
continue
try:
shp = shape(geom)
if not shp.is_valid:
reason = explain_validity(shp)
logging.warning(f"Feature {i} invalid: {reason}")
except Exception as e:
logging.error(f"Feature {i} geometry parse error: {e}")
# 3. Explicit CRS Handling
if 'crs' in raw_data:
logging.warning("Legacy 'crs' key detected. RFC 7946 requires WGS84. Stripping and enforcing EPSG:4326.")
del raw_data['crs']
logging.info("Validation complete. Ready for downstream processing.")
return {"status": "pass", "features_validated": len(features)}
# Usage: validate_geojson('input.geojson')
Step-by-Step Explanation
The script enforces a three-tier validation pipeline. First, the geojson library checks RFC 7946 structural compliance. It catches missing type keys or malformed feature collections before heavy parsing begins. Second, shapely validates topological integrity by converting each geometry into a native Python object. Running explain_validity() catches self-intersections, unclosed rings, and degenerate points that standard JSON parsers ignore. Third, explicit CRS handling strips legacy crs objects and enforces WGS84 compliance. This is critical when migrating data into Shapefile & GeoJSON Parsing routines. Always validate before projecting or merging datasets to avoid coordinate drift.
Edge Cases & Debugging Clarity
Common failure modes include deeply nested MultiPolygons with invalid ring ordering. Outer rings must be clockwise while inner rings require counter-clockwise orientation. Coordinates frequently exceed [-180, 180] longitude bounds or mix geometry types in a single FeatureCollection. Enable shapely.validation.explain_validity to get precise error strings like Self-intersection[12.345 67.890]. If your pipeline requires automatic repair, apply shp.buffer(0) cautiously. It resolves topology but slightly alters geometry area. Always log feature indices alongside validation errors to enable rapid source-data correction. When handling batch files, wrap the validator in a try/except block. Output a JSON error manifest instead of halting execution.
For memory efficiency, load GeoJSON files in streaming mode when datasets exceed 500MB. Parse geometries individually to prevent peak RAM spikes during validation. Cache shapely results when running iterative spatial joins. Pre-validate coordinate bounds before instantiating heavy geometry objects.
Debugging Checklist & Dependencies
- Use
shapely.validation.explain_validity()for precise topology error locations. - Log feature array indices alongside validation failures for rapid source tracing.
- Apply
shp.buffer(0)only as a last-resort repair; it modifies geometry area slightly. - Validate JSON structure before geometry parsing to fail fast on malformed payloads.
- Check for mixed geometry types in FeatureCollections that violate schema expectations.
- Install required stack:
geojson>=3.0.0,shapely>=2.0.0,pyproj>=3.0.0.