PostGIS Integration with Python

When a spatial dataset outgrows what fits comfortably in memory, the analysis moves into the database. PostGIS — the spatial extension to PostgreSQL — runs predicates, joins, and aggregations server-side over indexed geometry, returning only the rows you need to Python. This guide covers the round trip between GeoPandas DataFrames Explained and PostGIS, and complements the in-process engines covered in DuckDB Spatial Analytics and Scaling with Dask-GeoPandas within Spatial Analysis & Advanced Query Techniques.

GeoPandas to PostGIS round trip GeoPandas writes a GeoDataFrame to a PostGIS table through GeoAlchemy2; PostGIS applies a GiST index and runs server-side spatial SQL; only the result rows return to GeoPandas. GeoPandas GeoDataFrame in-memory PostGIS GiST index ST_Intersects ST_DWithin server-side SQL Result set filtered rows back to Python to_postgis read_postgis
PostGIS pushes the spatial work to an indexed database; Python sends geometry in and pulls only matching rows back.

Architecture & Data Structures

PostGIS stores geometry in a geometry (planar) or geography (spheroidal) column, each tagged with an SRID — the database's name for an EPSG code. Spatial predicates (ST_Intersects, ST_DWithin, ST_Contains) run in SQL and are accelerated by a GiST index over the geometry column. The Python bridge is GeoPandas' to_postgis/read_postgis, which lean on SQLAlchemy for the connection and GeoAlchemy2 for geometry type handling.

import geopandas as gpd
from sqlalchemy import create_engine

engine = create_engine("postgresql+psycopg://gis:gis@localhost:5432/gisdb")

# Read with a spatial predicate executed server-side
sql = """
SELECT parcel_id, land_use, geom
FROM parcels
WHERE ST_DWithin(geom, ST_SetSRID(ST_MakePoint(%(lon)s, %(lat)s), 4326)::geography, 500)
"""
nearby = gpd.read_postgis(sql, engine, geom_col="geom", params={"lon": 7.69, "lat": 45.07})

The win is that parcels may hold tens of millions of rows; only the few hundred within 500 m cross the wire.

Environment Configuration & Dependency Resolution

conda install -c conda-forge "geopandas=0.14.*" "sqlalchemy=2.0.*" "geoalchemy2=0.15.*" "psycopg=3.1.*"
# PostGIS server (Docker is the simplest dev setup):
#   docker run -e POSTGRES_PASSWORD=gis -p 5432:5432 postgis/postgis:16-3.4

Use psycopg (version 3) with the postgresql+psycopg:// URL prefix; the older psycopg2 works but is on a separate maintenance track. GeoAlchemy2 must be importable for to_postgis to register the geometry type, even though you rarely call it directly.

Vectorized Operations & Core Workflow

The standard loop: write a GeoDataFrame to a table, index it, then query with spatial SQL. Connection and write details are in Connecting GeoPandas to PostGIS with SQLAlchemy.

import geopandas as gpd
from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg://gis:gis@localhost:5432/gisdb")

census_blocks = gpd.read_file("census_blocks.gpkg").to_crs(epsg=4326)
census_blocks.to_postgis("census_blocks", engine, if_exists="replace", index=False)

with engine.begin() as conn:
    conn.execute(text("CREATE INDEX ON census_blocks USING GIST (geometry)"))
    conn.execute(text("ANALYZE census_blocks"))

Geometry / Data Processing Details

Server-side joins are where PostGIS earns its place. A spatial join that would materialize a huge intermediate in memory runs as an indexed nested loop in the database:

import geopandas as gpd
from sqlalchemy import create_engine

engine = create_engine("postgresql+psycopg://gis:gis@localhost:5432/gisdb")

# Count sensors per census block entirely in SQL
join_sql = """
SELECT b.block_id, b.geometry, COUNT(s.sensor_id) AS sensor_count
FROM census_blocks b
LEFT JOIN sensors s ON ST_Contains(b.geometry, s.geom)
GROUP BY b.block_id, b.geometry
"""
result = gpd.read_postgis(join_sql, engine, geom_col="geometry")

The equivalent of this is covered for in-memory data in Spatial Joins & Merging; PostGIS is the answer when "in memory" stops being an option.

CRS Alignment & Projection Pipeline

Every PostGIS geometry carries an SRID, and predicates between mismatched SRIDs raise an error rather than silently misalign — a feature, not a bug. Decide the storage CRS deliberately: EPSG:4326 for general storage and geography-based distance, or a projected SRID when most work is metric. Reproject in Python via Coordinate Systems with PyProj before writing, or ST_Transform in SQL.

from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg://gis:gis@localhost:5432/gisdb")

# Distance in metres: cast 4326 geometry to geography, which measures on the spheroid
metric_sql = """
SELECT a.id
FROM hydrants a, fires f
WHERE ST_DWithin(a.geom::geography, f.geom::geography, 300)
"""
# Avoid ST_Distance on raw 4326 geometry — it returns degrees, not metres.

Production Export & Integration

Windows / Platform Edge Cases & Debugging