Spatial Indexing in PostGIS with GiST
A spatial query over an unindexed PostGIS table scans every row; with a GiST index it jumps straight to candidates. This guide shows how to build, verify, and reason about GiST indexes so spatial predicates actually use them. It is for anyone whose PostGIS queries are slower than expected. It sits under PostGIS Integration with Python in Spatial Analysis & Advanced Query Techniques.
Why This Approach / What Goes Wrong
PostGIS spatial predicates work in two phases: a fast bounding-box filter (the && operator) that a GiST index accelerates, followed by an exact geometry test. Without the index, the planner box-tests every row sequentially. The common mistakes are: never creating the index; creating it but never running ANALYZE, so the planner has no statistics and ignores it; or writing a predicate that can't use the index (wrapping the indexed column in a function, or comparing across SRIDs). The goal is an EXPLAIN plan that shows an Index Scan, not a Seq Scan.
Prerequisites
- A PostGIS table with a populated
geometrycolumn sqlalchemy>=2.0andpsycopg>=3.1to drive it from Python
conda install -c conda-forge "sqlalchemy=2.0.*" "psycopg=3.1.*"
Step-by-Step Implementation
1. Create the GiST index on the geometry column.
from sqlalchemy import create_engine, text
engine = create_engine("postgresql+psycopg://gis:gis@localhost:5432/gisdb")
with engine.begin() as conn:
conn.execute(text(
"CREATE INDEX IF NOT EXISTS roads_gix ON roads USING GIST (geometry)"
))
2. Run ANALYZE so the planner gets row/selectivity statistics.
with engine.begin() as conn:
conn.execute(text("ANALYZE roads"))
3. Write predicates the index can use — keep the indexed column bare and SRIDs matched.
import geopandas as gpd
# Good: bbox operator (&&) and ST_Intersects both hit the GiST index
good_sql = """
SELECT r.road_id, r.geometry
FROM roads r
WHERE ST_Intersects(
r.geometry,
ST_SetSRID(ST_MakeEnvelope(7.6, 45.0, 7.8, 45.1), 4326)
)
"""
window = gpd.read_postgis(good_sql, engine, geom_col="geometry")
4. Avoid index-defeating patterns.
# Bad: ST_Buffer wraps the indexed column, so the index can't be used
# WHERE ST_Intersects(ST_Buffer(r.geometry, 0.01), :pt)
# Better: buffer the constant, or use ST_DWithin which is index-aware
# WHERE ST_DWithin(r.geometry, :pt, 0.01)
Verification
Use EXPLAIN ANALYZE to confirm the planner chose an index scan.
from sqlalchemy import create_engine, text
engine = create_engine("postgresql+psycopg://gis:gis@localhost:5432/gisdb")
with engine.connect() as conn:
plan = conn.execute(text(
"EXPLAIN ANALYZE SELECT road_id FROM roads "
"WHERE geometry && ST_MakeEnvelope(7.6, 45.0, 7.8, 45.1, 4326)"
)).fetchall()
plan_text = "\n".join(row[0] for row in plan)
print(plan_text)
# Index Scan using roads_gix on roads (cost=...) <-- index is used
assert "Index Scan" in plan_text or "Bitmap Index Scan" in plan_text
Seeing Seq Scan instead means the index is missing, ANALYZE wasn't run, or the predicate isn't index-eligible.
Edge Cases & Debugging
- Planner ignores the index on a small table. A sequential scan is genuinely cheaper for a few thousand rows; test with realistic data volumes.
Seq Scandespite an index. RunANALYZE; check the column isn't wrapped in a function in theWHEREclause.- Index not used after a big load. Statistics are stale —
ANALYZEafter bulk inserts. geographycolumn won't index well. GiST works ongeographytoo, but cast consistency matters; index the type you actually query.- Index bloat after many updates.
REINDEX INDEX roads_gixor useCREATE INDEX CONCURRENTLYto rebuild without locking. - Cross-SRID predicate. The box test can't match across SRIDs;
ST_Transformto a common SRID first.