Best Practices for GeoJSON Validation
Context & Problem Statement
Invalid GeoJSON frequently breaks web mapping renderers and spatial joins. Many libraries auto-parse malformed files, but implicit tolerance causes silent topology errors downstream. Implementing strict validation early in your Geospatial Data Ingestion & Processing Workflows prevents cascading failures.
The GeoJSON specification (RFC 7946) mandates WGS84 (EPSG:4326) coordinates with longitude before latitude. Real-world exports often omit CRS metadata, use the legacy "crs" member (removed in RFC 7946), or contain self-intersecting geometries that crash rendering engines.
Minimal Reproducible Validation Script
import json
import geojson
from shapely.geometry import shape
from shapely.validation import explain_validity
import logging
logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")
def validate_geojson(filepath: str) -> dict:
"""
Three-tier GeoJSON validator:
1. RFC 7946 structural compliance (geojson library)
2. Geometry topology (Shapely/GEOS)
3. Legacy CRS member detection
Returns a summary dict with status and feature count.
"""
with open(filepath, "r") as f:
raw_data = json.load(f)
# 1. Schema & Structure Validation
try:
geojson_obj = geojson.loads(json.dumps(raw_data))
if not geojson_obj.is_valid:
raise ValueError(f"RFC 7946 structure violation: {geojson_obj.errors()}")
except Exception as e:
logging.error(f"Structure validation failed: {e}")
return {"status": "fail", "error": str(e)}
# 2. Geometry & Topology Validation
features = raw_data.get("features", [])
for i, feat in enumerate(features):
geom = feat.get("geometry")
if not geom:
continue
try:
shp = shape(geom)
if not shp.is_valid:
reason = explain_validity(shp)
logging.warning(f"Feature {i} invalid: {reason}")
except Exception as e:
logging.error(f"Feature {i} geometry parse error: {e}")
# 3. Explicit CRS Handling
# RFC 7946 removed the "crs" member; its presence signals a pre-standard file.
if "crs" in raw_data:
logging.warning(
"Legacy 'crs' key detected. RFC 7946 requires WGS84 implicitly. "
"Remove this key and ensure coordinates are in lon/lat order."
)
logging.info("Validation complete. Ready for downstream processing.")
return {"status": "pass", "features_validated": len(features)}
# Usage: validate_geojson("input.geojson")
Step-by-Step Explanation
The script enforces a three-tier validation pipeline:
- Structural compliance: The
geojsonlibrary checks RFC 7946 schema rules — missingtypekeys, malformed FeatureCollections, and incorrect coordinate array depths — before heavy parsing begins. - Topology validation:
shapely.shape()converts each geometry dict into a native GEOS object, thenexplain_validity()catches self-intersections, unclosed rings, and degenerate points that standard JSON parsers ignore. - CRS metadata audit: RFC 7946 removed the
"crs"member; any file that still contains it pre-dates the standard and may have non-WGS84 coordinates mislabelled as lon/lat. Log and investigate rather than silently strip.
This validation pattern integrates directly into Shapefile & GeoJSON Parsing routines. Always validate before projecting or merging datasets to avoid coordinate drift.
Edge Cases & Debugging
Ring winding order: RFC 7946 requires exterior rings to be counter-clockwise and holes to be clockwise — the opposite of the older GeoJSON convention. Some renderers (Mapbox, Deck.gl) enforce this; validate with shapely.is_ccw() on exterior ring coordinates if rendering artifacts appear.
Coordinate bounds overflow: Coordinates outside [-180, 180] for longitude or [-90, 90] for latitude are illegal in RFC 7946. They often appear when source data uses a projected CRS and was serialised without reprojection.
Buffer-zero repair: shp.buffer(0) resolves many topology errors but slightly alters geometry area and can split polygons. Use it only as a last resort and always log before/after geometry types to detect unexpected splits.
Streaming large files: For files exceeding a few hundred MB, avoid json.load() (loads the whole file). Use ijson or fiona.open() with iteration to parse features one at a time.
Debugging Checklist & Dependencies
explain_validity()returns precise GEOS error strings including coordinates, e.g.,Self-intersection [12.345 67.890].- Log feature array indices alongside failures for rapid source tracing.
- Check for mixed geometry types in a single FeatureCollection — some downstream tools reject heterogeneous collections.
- Required stack:
geojson>=3.0.0,shapely>=2.0.0,pyproj>=3.0.0.