PostGIS Integration with Python
When a spatial dataset outgrows what fits comfortably in memory, the analysis moves into the database. PostGIS — the spatial extension to PostgreSQL — runs predicates, joins, and aggregations server-side over indexed geometry, returning only the rows you need to Python. This guide covers the round trip between GeoPandas DataFrames Explained and PostGIS, and complements the in-process engines covered in DuckDB Spatial Analytics and Scaling with Dask-GeoPandas within Spatial Analysis & Advanced Query Techniques.
Architecture & Data Structures
PostGIS stores geometry in a geometry (planar) or geography (spheroidal) column, each tagged with an SRID — the database's name for an EPSG code. Spatial predicates (ST_Intersects, ST_DWithin, ST_Contains) run in SQL and are accelerated by a GiST index over the geometry column. The Python bridge is GeoPandas' to_postgis/read_postgis, which lean on SQLAlchemy for the connection and GeoAlchemy2 for geometry type handling.
import geopandas as gpd
from sqlalchemy import create_engine
engine = create_engine("postgresql+psycopg://gis:gis@localhost:5432/gisdb")
# Read with a spatial predicate executed server-side
sql = """
SELECT parcel_id, land_use, geom
FROM parcels
WHERE ST_DWithin(geom, ST_SetSRID(ST_MakePoint(%(lon)s, %(lat)s), 4326)::geography, 500)
"""
nearby = gpd.read_postgis(sql, engine, geom_col="geom", params={"lon": 7.69, "lat": 45.07})
The win is that parcels may hold tens of millions of rows; only the few hundred within 500 m cross the wire.
Environment Configuration & Dependency Resolution
conda install -c conda-forge "geopandas=0.14.*" "sqlalchemy=2.0.*" "geoalchemy2=0.15.*" "psycopg=3.1.*"
# PostGIS server (Docker is the simplest dev setup):
# docker run -e POSTGRES_PASSWORD=gis -p 5432:5432 postgis/postgis:16-3.4
Use psycopg (version 3) with the postgresql+psycopg:// URL prefix; the older psycopg2 works but is on a separate maintenance track. GeoAlchemy2 must be importable for to_postgis to register the geometry type, even though you rarely call it directly.
Vectorized Operations & Core Workflow
The standard loop: write a GeoDataFrame to a table, index it, then query with spatial SQL. Connection and write details are in Connecting GeoPandas to PostGIS with SQLAlchemy.
import geopandas as gpd
from sqlalchemy import create_engine, text
engine = create_engine("postgresql+psycopg://gis:gis@localhost:5432/gisdb")
census_blocks = gpd.read_file("census_blocks.gpkg").to_crs(epsg=4326)
census_blocks.to_postgis("census_blocks", engine, if_exists="replace", index=False)
with engine.begin() as conn:
conn.execute(text("CREATE INDEX ON census_blocks USING GIST (geometry)"))
conn.execute(text("ANALYZE census_blocks"))
Geometry / Data Processing Details
Server-side joins are where PostGIS earns its place. A spatial join that would materialize a huge intermediate in memory runs as an indexed nested loop in the database:
import geopandas as gpd
from sqlalchemy import create_engine
engine = create_engine("postgresql+psycopg://gis:gis@localhost:5432/gisdb")
# Count sensors per census block entirely in SQL
join_sql = """
SELECT b.block_id, b.geometry, COUNT(s.sensor_id) AS sensor_count
FROM census_blocks b
LEFT JOIN sensors s ON ST_Contains(b.geometry, s.geom)
GROUP BY b.block_id, b.geometry
"""
result = gpd.read_postgis(join_sql, engine, geom_col="geometry")
The equivalent of this is covered for in-memory data in Spatial Joins & Merging; PostGIS is the answer when "in memory" stops being an option.
CRS Alignment & Projection Pipeline
Every PostGIS geometry carries an SRID, and predicates between mismatched SRIDs raise an error rather than silently misalign — a feature, not a bug. Decide the storage CRS deliberately: EPSG:4326 for general storage and geography-based distance, or a projected SRID when most work is metric. Reproject in Python via Coordinate Systems with PyProj before writing, or ST_Transform in SQL.
from sqlalchemy import create_engine, text
engine = create_engine("postgresql+psycopg://gis:gis@localhost:5432/gisdb")
# Distance in metres: cast 4326 geometry to geography, which measures on the spheroid
metric_sql = """
SELECT a.id
FROM hydrants a, fires f
WHERE ST_DWithin(a.geom::geography, f.geom::geography, 300)
"""
# Avoid ST_Distance on raw 4326 geometry — it returns degrees, not metres.
Production Export & Integration
- Index before you query. A GiST index turns a sequential scan into a fast lookup; the speedup is order-of-magnitude. Details in Spatial Indexing in PostGIS with GiST.
- Push work down. Filter, join, and aggregate in SQL; pull results, not raw tables.
- Stream large reads. Use a server-side cursor (
stream_results) or chunk by bounding box to bound client memory. - Feed the web layer. Export query results as GeoParquet or GeoJSON for Web Mapping & Interactive Visualization.
Windows / Platform Edge Cases & Debugging
could not find function ST_*. The PostGIS extension isn't enabled:CREATE EXTENSION postgis;.Operation on mixed SRID geometries. Two columns have different SRIDs;ST_Transformone to match.psycopginstall fails on Windows. Use the conda-forge build or thepsycopg[binary]wheel rather than compiling.- Distances look wrong. You measured on
geometryin degrees; cast togeographyor use a projected SRID. to_postgiserrors on geometry type. GeoAlchemy2 isn't installed/importable in the environment.- Slow first query after load. No index yet, or
ANALYZEwasn't run so the planner lacks statistics.