Shapely 1.x vs Shapely 2.0 Vectorization
Shapely 2.0 was a ground-up rewrite around a NumPy-style vectorized API, and it changes how fast geometry operations run and how you should write them. Code that loops over geometries the 1.x way still works, but leaves most of the speedup on the table. This guide compares the two and shows the vectorized idioms. It is for anyone whose Shapely-heavy code feels slow. It sits under Shapely Geometry Operations in Mastering Core Geospatial Python Libraries.
Why This Approach / What Goes Wrong
In Shapely 1.x, every operation went through a Python method on a single geometry object, so processing a million polygons meant a million Python-level GEOS calls. Shapely 2.0 exposes module-level functions (shapely.area, shapely.buffer, shapely.intersects) that accept NumPy arrays of geometries and execute the loop in C, often 5–100× faster on bulk work. The catch during migration: a handful of 1.x behaviors changed — iterating a multipart geometry, the .ctypes/array interface, and some cascaded_union-era names — so old code can break subtly. The biggest missed opportunity is writing 2.0 like 1.x: looping with .apply() or a Python for instead of passing the whole array. If you use GeoPandas DataFrames Explained, you already get 2.0 vectorization for column operations for free.
Prerequisites
shapely>=2.0numpy>=1.26geopandas>=0.14(optional, for the DataFrame comparison)
conda install -c conda-forge "shapely=2.0.*" "numpy=1.26.*" "geopandas=0.14.*"
Step-by-Step Implementation
1. The 1.x idiom — a Python loop over objects (still valid, but slow at scale).
from shapely.geometry import Point
points = [Point(x, x) for x in range(1_000_000)]
# One GEOS round trip per object
buffers_loop = [p.buffer(10) for p in points]
areas_loop = [p.buffer(10).area for p in points]
2. The 2.0 idiom — operate on a NumPy array in one call.
import numpy as np
import shapely
pts = shapely.points(np.arange(1_000_000), np.arange(1_000_000))
buffers_vec = shapely.buffer(pts, 10) # whole array, C-level loop
areas_vec = shapely.area(buffers_vec) # returns a NumPy array
3. Vectorized predicates replace per-pair loops.
import numpy as np
import shapely
a = shapely.points(np.random.rand(100_000), np.random.rand(100_000))
boundary = shapely.box(0.2, 0.2, 0.8, 0.8)
inside = shapely.contains(boundary, a) # boolean NumPy array, one call
print("Inside count:", int(inside.sum()))
4. Inside GeoPandas it is automatic — column operations already vectorize.
import geopandas as gpd
parcels = gpd.read_file("parcels.gpkg").to_crs(epsg=25832)
parcels["area_m2"] = parcels.geometry.area # Shapely 2.0 vectorized under the hood
Verification
Confirm the vectorized path gives identical results and is materially faster.
import numpy as np, shapely, time
from shapely.geometry import Point
n = 200_000
xs = np.arange(n, dtype=float)
t0 = time.perf_counter()
loop_areas = np.array([Point(x, x).buffer(5).area for x in xs])
t_loop = time.perf_counter() - t0
t0 = time.perf_counter()
vec_areas = shapely.area(shapely.buffer(shapely.points(xs, xs), 5))
t_vec = time.perf_counter() - t0
assert np.allclose(loop_areas, vec_areas), "Vectorized result must match the loop"
print(f"loop: {t_loop:.2f}s vectorized: {t_vec:.2f}s speedup: {t_loop/t_vec:.0f}x")
# loop: 1.93s vectorized: 0.07s speedup: 28x
Edge Cases & Debugging
AttributeErrorafter upgrading. Renamed/removed 1.x helpers (e.g.cascaded_union→shapely.union_all); update the import.- Iterating a multipolygon changed. Use
shapely.get_parts()to access components in 2.0. - Still slow on 2.0. You're looping with
.apply()/for; pass the whole array to ashapely.*function instead. - Mixing object methods and module functions. Both work, but only the array functions vectorize; be consistent in hot loops.
None/empty geometries in an array. Vectorized functions propagateNone; filter or handle withshapely.is_missing/is_empty.- GeoPandas pins an older Shapely. Ensure the environment resolved
shapely>=2.0; checkshapely.__version__.