Computing Overlay Union & Difference with GeoPandas

geopandas.overlay implements the set-theoretic operations — intersection, union, difference, symmetric difference, identity — that combine two polygon layers into a new one. This guide shows each mode on realistic layers and the topology hygiene that keeps the output clean. It is for anyone overlaying zoning against hazard areas, or land use against parcels. It sits under Geometric Intersections & Overlays in Spatial Analysis & Advanced Query Techniques.

Why This Approach / What Goes Wrong

overlay is not the same as sjoin. A spatial join attaches attributes where geometries relate but keeps the original geometries; an overlay cuts geometries against each other and produces new polygons at their intersections. Two things break overlays: invalid input geometry (self-intersections turn unions into TopologyExceptions) and unprojected coordinates (areas computed in degrees are meaningless). Validate first with the tooling from Topology Validation & Repair, and work in a projected CRS. Done right, overlay is the precise tool; done on dirty data, it produces slivers and errors.

Prerequisites

conda install -c conda-forge "geopandas=0.14.*" "shapely=2.0.*"

Step-by-Step Implementation

1. Load two polygon layers and align CRS to a projected system.

import geopandas as gpd

zoning = gpd.read_file("zoning.gpkg")
floodplain = gpd.read_file("floodplain.gpkg")

# Project both to the same metric CRS so areas are in m²
zoning = zoning.to_crs(epsg=25832)
floodplain = floodplain.to_crs(epsg=25832)

2. Validate before overlaying.

from shapely.validation import make_valid

for layer in (zoning, floodplain):
    invalid = ~layer.geometry.is_valid
    if invalid.any():
        layer.loc[invalid, "geometry"] = layer.loc[invalid, "geometry"].apply(make_valid)

3. Intersection — the area in both layers (zoning and flood risk).

zoning_at_risk = gpd.overlay(zoning, floodplain, how="intersection")
zoning_at_risk["risk_area_m2"] = zoning_at_risk.geometry.area

4. Difference — zoning outside the floodplain.

zoning_safe = gpd.overlay(zoning, floodplain, how="difference")

5. Union — every distinct piece from both, attributes preserved.

combined = gpd.overlay(zoning, floodplain, how="union")
# 'union' yields all intersection pieces plus the non-overlapping parts of each input

6. Symmetric difference — area in exactly one layer.

either_not_both = gpd.overlay(zoning, floodplain, how="symmetric_difference")

Verification

Check that areas reconcile — intersection plus difference should equal the original zoning area.

total_zoning = zoning.geometry.area.sum()
recombined = zoning_at_risk.geometry.area.sum() + zoning_safe.geometry.area.sum()

print(f"Original zoning area:   {total_zoning:,.0f} m²")     # 184,302,551 m²
print(f"Intersection + diff:    {recombined:,.0f} m²")        # 184,302,551 m²
assert abs(total_zoning - recombined) / total_zoning < 1e-6, "Area leak — check validity"
assert zoning_at_risk.geometry.is_valid.all()

A reconciliation gap signals invalid input or a CRS mismatch that distorted the cut.

Edge Cases & Debugging