Best Practices for GeoJSON Validation

Context & Problem Statement

Invalid GeoJSON frequently breaks web mapping renderers and spatial joins. Many libraries auto-parse malformed files, but implicit tolerance causes silent topology errors downstream. Implementing strict validation early in your Geospatial Data Ingestion & Processing Workflows prevents cascading failures.

The GeoJSON specification (RFC 7946) mandates WGS84 (EPSG:4326) coordinates with longitude before latitude. Real-world exports often omit CRS metadata, use the legacy "crs" member (removed in RFC 7946), or contain self-intersecting geometries that crash rendering engines.

Minimal Reproducible Validation Script

import json
import geojson
from shapely.geometry import shape
from shapely.validation import explain_validity
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")


def validate_geojson(filepath: str) -> dict:
    """
    Three-tier GeoJSON validator:
      1. RFC 7946 structural compliance (geojson library)
      2. Geometry topology (Shapely/GEOS)
      3. Legacy CRS member detection

    Returns a summary dict with status and feature count.
    """
    with open(filepath, "r") as f:
        raw_data = json.load(f)

    # 1. Schema & Structure Validation
    try:
        geojson_obj = geojson.loads(json.dumps(raw_data))
        if not geojson_obj.is_valid:
            raise ValueError(f"RFC 7946 structure violation: {geojson_obj.errors()}")
    except Exception as e:
        logging.error(f"Structure validation failed: {e}")
        return {"status": "fail", "error": str(e)}

    # 2. Geometry & Topology Validation
    features = raw_data.get("features", [])
    for i, feat in enumerate(features):
        geom = feat.get("geometry")
        if not geom:
            continue
        try:
            shp = shape(geom)
            if not shp.is_valid:
                reason = explain_validity(shp)
                logging.warning(f"Feature {i} invalid: {reason}")
        except Exception as e:
            logging.error(f"Feature {i} geometry parse error: {e}")

    # 3. Explicit CRS Handling
    # RFC 7946 removed the "crs" member; its presence signals a pre-standard file.
    if "crs" in raw_data:
        logging.warning(
            "Legacy 'crs' key detected. RFC 7946 requires WGS84 implicitly. "
            "Remove this key and ensure coordinates are in lon/lat order."
        )

    logging.info("Validation complete. Ready for downstream processing.")
    return {"status": "pass", "features_validated": len(features)}


# Usage: validate_geojson("input.geojson")

Step-by-Step Explanation

The script enforces a three-tier validation pipeline:

  1. Structural compliance: The geojson library checks RFC 7946 schema rules — missing type keys, malformed FeatureCollections, and incorrect coordinate array depths — before heavy parsing begins.
  2. Topology validation: shapely.shape() converts each geometry dict into a native GEOS object, then explain_validity() catches self-intersections, unclosed rings, and degenerate points that standard JSON parsers ignore.
  3. CRS metadata audit: RFC 7946 removed the "crs" member; any file that still contains it pre-dates the standard and may have non-WGS84 coordinates mislabelled as lon/lat. Log and investigate rather than silently strip.

This validation pattern integrates directly into Shapefile & GeoJSON Parsing routines. Always validate before projecting or merging datasets to avoid coordinate drift.

Edge Cases & Debugging

Ring winding order: RFC 7946 requires exterior rings to be counter-clockwise and holes to be clockwise — the opposite of the older GeoJSON convention. Some renderers (Mapbox, Deck.gl) enforce this; validate with shapely.is_ccw() on exterior ring coordinates if rendering artifacts appear.

Coordinate bounds overflow: Coordinates outside [-180, 180] for longitude or [-90, 90] for latitude are illegal in RFC 7946. They often appear when source data uses a projected CRS and was serialised without reprojection.

Buffer-zero repair: shp.buffer(0) resolves many topology errors but slightly alters geometry area and can split polygons. Use it only as a last resort and always log before/after geometry types to detect unexpected splits.

Streaming large files: For files exceeding a few hundred MB, avoid json.load() (loads the whole file). Use ijson or fiona.open() with iteration to parse features one at a time.

Debugging Checklist & Dependencies