GeoPandas vs Fiona for Large Files

GeoPandas loads an entire layer into a DataFrame; Fiona streams it feature by feature. For files that fit in memory, GeoPandas is far more convenient; past that point, Fiona's iterator is what keeps the process from running out of memory. This guide compares the two for large-file I/O and shows how to combine them. It is for anyone hitting MemoryError on a big read. It sits under GeoPandas DataFrames Explained in Mastering Core Geospatial Python Libraries.

GeoPandas trades memory for vectorized convenience; Fiona trades convenience for a flat memory profile at any size.

Why This Approach / What Goes Wrong

gpd.read_file() materializes every feature, geometry, and attribute as a DataFrame: wonderful for vectorized analysis, fatal when the file is larger than RAM. Fiona (built on the same GDAL/OGR library that GeoPandas reads through) exposes features as a lazy iterator: you process one at a time, so memory stays flat regardless of file size, provided you do not accumulate the features yourself. The right pattern for large files is rarely "all Fiona" or "all GeoPandas". It is to stream-filter with Fiona down to the subset you care about, then hand that subset to GeoPandas for the vectorized work. The mistake is loading the whole file just to keep 2% of it. Modern GeoPandas can also push a bounding-box or attribute filter into the read itself (through pyogrio, the default engine since GeoPandas 1.0, or through Fiona on older versions), which closes much of the gap.

Prerequisites

geopandas>=0.14
fiona>=1.9
shapely>=2.0

conda install -c conda-forge "geopandas=0.14.*" "fiona=1.9.*" "shapely=2.0.*"

Step-by-Step Implementation

1. The convenient path (fits in memory): GeoPandas with a pushed-down filter.

import geopandas as gpd

# Read only features intersecting a bbox; GDAL applies the filter during the read
aoi = (7.6, 45.0, 7.8, 45.1)   # xmin, ymin, xmax, ymax in the file's CRS
city_parcels = gpd.read_file("national_parcels.fgb", bbox=aoi)
print(len(city_parcels), "features loaded")

2. The streaming path (exceeds memory): Fiona iterator, flat memory.

import fiona

# Keep only commercial parcels from a file too large to load
kept = []
with fiona.open("national_parcels.fgb") as src:
    src_crs = src.crs
    total = len(src)                        # feature count, read while the file is open
    for feature in src:                     # one feature at a time
        if feature["properties"].get("use") == "commercial":
            kept.append(feature)
print(f"Filtered {len(kept)} of {total} features")

3. Hand the filtered survivors to GeoPandas for vectorized analysis.

import geopandas as gpd

commercial = gpd.GeoDataFrame.from_features(kept, crs=src_crs)
commercial = commercial.to_crs(commercial.estimate_utm_crs())
commercial["area_m2"] = commercial.geometry.area    # vectorized, fast
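
If the filtered subset is itself uncomfortably large, one option is to hand it to GeoPandas in fixed-size batches instead of one list. The helper below is a hypothetical sketch (the name batches and the batch size are illustrative, not from any library); it works on any iterable, including an open Fiona collection or a generator of filtered features.

```python
from itertools import islice


def batches(iterable, size):
    """Yield lists of at most `size` items, consuming `iterable` lazily."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Stand-in data: seven integers instead of features
demo = list(batches(range(7), 3))
print(demo)

# With real data, each chunk could become its own small GeoDataFrame, e.g.
#   for chunk in batches(filtered_features, 50_000):
#       part = gpd.GeoDataFrame.from_features(chunk, crs=src_crs)
#       ...aggregate or write `part`, then let it go out of scope
```

Only one batch is held in memory at a time, so peak usage is bounded by the batch size rather than by the number of survivors.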

Verification

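Confirm the streamed filter is consistent: re-count the matches in a second streaming pass (no full load), compare that count with the GeoDataFrame built from the kept features, and check that the CRS survived.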

import fiona

# Count matches by streaming (no full load) and compare to the kept set
with fiona.open("national_parcels.fgb") as src:
    streamed = sum(1 for f in src if f["properties"].get("use") == "commercial")

print("Streamed match count:", streamed)        # Streamed match count: 18254
assert streamed == len(commercial), "Filter mismatch between Fiona and GeoDataFrame"
assert commercial.crs is not None
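
The count check above says nothing about memory. A rough way to see the difference between streaming and materializing is the standard library's tracemalloc, sketched below on synthetic features (a generator standing in for a Fiona collection; the helper names are illustrative). Note the limitation: tracemalloc tracks Python-level allocations only, so memory held inside GDAL's C code is not counted.

```python
import tracemalloc


def synthetic_features(n):
    """Yield n GeoJSON-like features lazily, standing in for a Fiona collection."""
    for i in range(n):
        yield {
            "type": "Feature",
            "properties": {"use": "commercial" if i % 50 == 0 else "residential"},
            "geometry": {"type": "Point", "coordinates": (float(i), float(i))},
        }


def peak_bytes(func):
    """Return the peak traced Python allocation (bytes) while func() runs."""
    tracemalloc.start()
    try:
        func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def count_streaming():
    # One feature alive at a time
    return sum(1 for f in synthetic_features(20_000)
               if f["properties"]["use"] == "commercial")


def count_after_full_load():
    # Materialize everything first, then filter
    everything = list(synthetic_features(20_000))
    return sum(1 for f in everything if f["properties"]["use"] == "commercial")


stream_peak = peak_bytes(count_streaming)
full_peak = peak_bytes(count_after_full_load)
print(f"streaming peak: {stream_peak} B, full-load peak: {full_peak} B")
```

Both functions return the same count; only the peak differs. For a whole-process view that includes GDAL's allocations, an OS-level measure such as resident set size would be needed instead.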

Edge Cases & Debugging

MemoryError on read_file. The file exceeds RAM; switch to the Fiona stream or a bbox/where filtered read.
Streaming is slow. Per-feature Python iteration is inherently slower than vectorized C; filter aggressively, then vectorize the remainder.
CRS lost via from_features. Pass crs=src.crs explicitly when building the GeoDataFrame.
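bbox= not filtering. First check that the bbox is expressed in the file's CRS. Then check the I/O engine: pyogrio is the default only from GeoPandas 1.0, so on 0.14 pass engine="pyogrio" (with pyogrio installed) to use it. The Fiona engine honors bbox=, but where= requires Fiona 1.9 or newer.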
Attribute filter at read time. Prefer gpd.read_file(path, where="use='commercial'") (SQL-style) to filter in GDAL rather than in Python.
Memory still climbs while streaming. You appended full feature dicts; keep only the fields you need, or write survivors straight to disk.
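
The last point can be sketched as follows: trim each surviving feature to the fields you need and write it out immediately, so nothing accumulates in a list. This is a minimal illustration under stated assumptions: it works on plain GeoJSON-like dicts (Fiona 1.9 Feature objects would first need converting to plain dicts, for example through their __geo_interface__ mapping), the helper names slim and stream_to_geojsonl are hypothetical, and the output is newline-delimited GeoJSON, a format GDAL reads through its GeoJSONSeq driver.

```python
import json
import os
import tempfile


def slim(feature, keep_fields):
    """Return a plain-dict copy of a feature with only the requested properties."""
    props = feature["properties"]
    return {
        "type": "Feature",
        "geometry": feature["geometry"],
        "properties": {name: props.get(name) for name in keep_fields},
    }


def stream_to_geojsonl(features, out_path, predicate, keep_fields):
    """Write matching, slimmed features one per line; return the number written."""
    written = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for feature in features:
            if predicate(feature):
                out.write(json.dumps(slim(feature, keep_fields)) + "\n")
                written += 1
    return written


# Hypothetical sample standing in for features streamed from a large file
sample = [
    {"type": "Feature",
     "properties": {"use": "commercial", "owner": "A", "notes": "long text"},
     "geometry": {"type": "Point", "coordinates": [7.70, 45.05]}},
    {"type": "Feature",
     "properties": {"use": "residential", "owner": "B", "notes": "long text"},
     "geometry": {"type": "Point", "coordinates": [7.65, 45.02]}},
    {"type": "Feature",
     "properties": {"use": "commercial", "owner": "C", "notes": "long text"},
     "geometry": {"type": "Point", "coordinates": [9.00, 46.00]}},
]
out_path = os.path.join(tempfile.mkdtemp(), "commercial.geojsonl")
n_written = stream_to_geojsonl(
    sample,
    out_path,
    predicate=lambda f: f["properties"].get("use") == "commercial",
    keep_fields=("use",),
)
print(n_written, "features written to", out_path)
```

Because each feature is written and then dropped, memory use depends on the largest single feature, not on how many features pass the filter.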