Fixing Self-Intersecting Polygons Programmatically in Python

Understanding Self-Intersecting Polygons

Self-intersecting polygons, commonly called "bowties", occur when polygon edges cross within the same ring. These geometries violate the OGC Simple Features specification and cause spatial operations like intersects or union to fail or return incorrect results. They also break rendering in standard web mapping libraries. Resolving these anomalies is a foundational step in Geospatial Data Ingestion & Processing Workflows. This guide delivers a deterministic approach to detection and repair that preserves coordinate reference system integrity throughout.

Minimal Reproducible Code

The following script demonstrates a production-ready workflow using geopandas and shapely. It validates topologies, applies the repair algorithm, and verifies output.

import geopandas as gpd
from shapely.validation import explain_validity, make_valid


def fix_self_intersecting_polygons(
    gdf: gpd.GeoDataFrame, target_crs: str = "EPSG:3857"
) -> gpd.GeoDataFrame:
    """
    Detect and repair self-intersecting polygons in a GeoDataFrame.
    Projects to a planar CRS before repair to ensure metric accuracy,
    then reprojects back to the original CRS.
    """
    if gdf.crs is None:
        raise ValueError("GeoDataFrame must have a defined CRS before processing.")

    original_crs = gdf.crs

    # Project to a planar CRS for accurate geometric operations
    gdf_planar = gdf.to_crs(target_crs)

    # Identify invalid geometries
    invalid_mask = ~gdf_planar.is_valid
    if not invalid_mask.any():
        print("All geometries are valid. No repair needed.")
        return gdf

    print(f"Found {invalid_mask.sum()} invalid geometries. Applying repair...")

    # Programmatic repair using Shapely's make_valid
    gdf_planar.loc[invalid_mask, "geometry"] = (
        gdf_planar.loc[invalid_mask, "geometry"].apply(make_valid)
    )

    # Verify repair success
    post_repair_invalid = ~gdf_planar.is_valid
    if post_repair_invalid.any():
        print(f"Warning: {post_repair_invalid.sum()} geometries remain invalid after repair.")
        for idx in gdf_planar[post_repair_invalid].index:
            reason = explain_validity(gdf_planar.loc[idx, "geometry"])
            print(f"  Row {idx}: {reason}")

    # Return to original CRS
    return gdf_planar.to_crs(original_crs)


# Usage Example:
# gdf = gpd.read_file("input.shp")
# fixed_gdf = fix_self_intersecting_polygons(gdf)
# fixed_gdf.to_file("output_fixed.geojson", driver="GeoJSON")

Dependencies: geopandas>=1.0, shapely>=2.0, pyproj

Step-by-Step Explanation

The repair process relies on projection, validation, and deterministic reconstruction.

Projection to a metric CRS: Geographic coordinates distort distance and intersection math. The script forces projection to a metric CRS like EPSG:3857 or a local UTM zone. make_valid() operates in Cartesian space; running it in geographic degrees (EPSG:4326) can produce unpredictable results near poles or the antimeridian.

Validation with GEOS: gdf.is_valid leverages the GEOS topology engine. It flags bowties and ring self-intersections efficiently.

shapely.make_valid(): Decomposes invalid shapes into valid MultiPolygon or Polygon objects by splitting intersecting edges precisely at crossing coordinates. This is the preferred repair strategy in Shapely 2.0+; the legacy buffer(0) fallback is still useful for certain degenerate cases but alters geometry area. This methodology aligns with standard practices in Topology Validation & Repair pipelines.

Reprojection back: After repair, the geometries are transformed back to the original CRS so callers receive data in the same coordinate space they provided.

CRS Handling & Debugging

Use explain_validity() to output precise GEOS error strings. It returns coordinates like Self-intersection [12.345 67.890] for targeted debugging. Wrap the repair function in a try/except block for automated pipelines and log original versus repaired geometry counts to track data degradation.

Edge Cases & Production Considerations

Self-intersections typically stem from digitizing errors, coordinate truncation, or merging overlapping boundaries. Batch processing may resolve some geometries into empty sets after repair. Filter these immediately using gdf[~gdf.geometry.is_empty] before export.

For web mapping deployment, reproject the final output to EPSG:4326. Apply shapely.set_precision(geom, grid_size=1e-6) before validation to snap vertices to a strict grid and prevent micro-intersections introduced by coordinate truncation.

Performance Optimization: For pure make_valid() runs without conditional fallback, use the vectorized Shapely 2.0 function directly on the geometry array:

import shapely
import numpy as np

# Vectorized repair — significantly faster than .apply() for large datasets
geom_array = gdf_planar.geometry.values
repaired = shapely.make_valid(geom_array)
gdf_planar = gdf_planar.set_geometry(
    gpd.GeoSeries(repaired, crs=gdf_planar.crs)
)

This executes in compiled C and reduces runtime substantially compared to Python-level .apply() iteration.