Shapely 1.x vs Shapely 2.0 Vectorization

Shapely 2.0 was a ground-up rewrite around a NumPy-style vectorized API, and it changes how fast geometry operations run and how you should write them. Code that loops over geometries the 1.x way still works, but leaves most of the speedup on the table. This guide compares the two and shows the vectorized idioms. It is for anyone whose Shapely-heavy code feels slow. It sits under Shapely Geometry Operations in Mastering Core Geospatial Python Libraries.

Shapely 1.x versus 2.0 vectorization Shapely 1.x processes geometries one at a time through Python loops; Shapely 2.0 applies operations across whole NumPy arrays of geometries in C, with stable GEOS-backed performance. Shapely 1.x object-at-a-time Shapely 2.0 vectorized arrays Python for-loop per geometry shapely.area(arr) over all GEOS call overhead each item One C call, whole array .area / .buffer on objects shapely.* module functions Slower on large sets 5-100x faster bulk ops Immutable, but per-object Immutable + array-native GeoPandas 0.12+ already uses Shapely 2.0 vectorization under the hood
Shapely 2.0 moves the loop into C: operations apply across arrays of geometries in a single call.

Why This Approach / What Goes Wrong

In Shapely 1.x, every operation went through a Python method on a single geometry object, so processing a million polygons meant a million Python-level GEOS calls. Shapely 2.0 exposes module-level functions (shapely.area, shapely.buffer, shapely.intersects) that accept NumPy arrays of geometries and execute the loop in C, often 5–100× faster on bulk work. The catch during migration: a handful of 1.x behaviors changed — iterating a multipart geometry, the .ctypes/array interface, and some cascaded_union-era names — so old code can break subtly. The biggest missed opportunity is writing 2.0 like 1.x: looping with .apply() or a Python for instead of passing the whole array. If you use GeoPandas DataFrames Explained, you already get 2.0 vectorization for column operations for free.

Prerequisites

conda install -c conda-forge "shapely=2.0.*" "numpy=1.26.*" "geopandas=0.14.*"

Step-by-Step Implementation

1. The 1.x idiom — a Python loop over objects (still valid, but slow at scale).

from shapely.geometry import Point

points = [Point(x, x) for x in range(1_000_000)]
# One GEOS round trip per object
buffers_loop = [p.buffer(10) for p in points]
areas_loop = [p.buffer(10).area for p in points]

2. The 2.0 idiom — operate on a NumPy array in one call.

import numpy as np
import shapely

pts = shapely.points(np.arange(1_000_000), np.arange(1_000_000))
buffers_vec = shapely.buffer(pts, 10)        # whole array, C-level loop
areas_vec = shapely.area(buffers_vec)        # returns a NumPy array

3. Vectorized predicates replace per-pair loops.

import numpy as np
import shapely

a = shapely.points(np.random.rand(100_000), np.random.rand(100_000))
boundary = shapely.box(0.2, 0.2, 0.8, 0.8)
inside = shapely.contains(boundary, a)       # boolean NumPy array, one call
print("Inside count:", int(inside.sum()))

4. Inside GeoPandas it is automatic — column operations already vectorize.

import geopandas as gpd

parcels = gpd.read_file("parcels.gpkg").to_crs(epsg=25832)
parcels["area_m2"] = parcels.geometry.area   # Shapely 2.0 vectorized under the hood

Verification

Confirm the vectorized path gives identical results and is materially faster.

import numpy as np, shapely, time
from shapely.geometry import Point

n = 200_000
xs = np.arange(n, dtype=float)

t0 = time.perf_counter()
loop_areas = np.array([Point(x, x).buffer(5).area for x in xs])
t_loop = time.perf_counter() - t0

t0 = time.perf_counter()
vec_areas = shapely.area(shapely.buffer(shapely.points(xs, xs), 5))
t_vec = time.perf_counter() - t0

assert np.allclose(loop_areas, vec_areas), "Vectorized result must match the loop"
print(f"loop: {t_loop:.2f}s   vectorized: {t_vec:.2f}s   speedup: {t_loop/t_vec:.0f}x")
# loop: 1.93s   vectorized: 0.07s   speedup: 28x

Edge Cases & Debugging