Windowed Reads from Cloud Optimized GeoTIFF
A Cloud Optimized GeoTIFF lets you read a 512×512 patch from a 40 GB raster on object storage without downloading the file. This guide shows the windowed-read pattern with Rasterio, including reading by geographic coordinates rather than pixel offsets. It is for anyone working with satellite imagery, orthophotos, or DEMs too large to localize. It sits under Cloud-Native Geospatial Formats in Geospatial Data Ingestion & Processing Workflows.
Why This Approach / What Goes Wrong
A COG is internally tiled with overviews, and its header maps each tile to a byte range. Rasterio (via GDAL's /vsicurl/) reads a Window by issuing HTTP range requests for just the overlapping tiles — seconds and megabytes instead of minutes and gigabytes. Two things defeat this. First, the file must actually be a COG; a plain striped GeoTIFF forces GDAL to read sequentially, so a "windowed" read still transfers most of the file. Second, you must convert geographic bounds to a pixel Window using the dataset's affine transform — guessing pixel offsets reads the wrong area. The CRS of your bounds must match the raster, or the window lands somewhere else entirely.
Prerequisites
rasterio>=1.3(bundles GDAL with/vsicurl/support)pyproj>=3.4for reprojecting query bounds when needed
conda install -c conda-forge "rasterio=1.3.*" "pyproj=3.4.*"
Step-by-Step Implementation
1. Configure GDAL for efficient remote reads, then open the COG.
import rasterio
from rasterio.env import Env
gdal_opts = {
"GDAL_DISABLE_READDIR_ON_OPEN": "EMPTY_DIR", # don't list the bucket
"CPL_VSIL_CURL_ALLOWED_EXTENSIONS": ".tif",
}
cog_url = "https://example-bucket.s3.amazonaws.com/sentinel_ortho_cog.tif"
with Env(**gdal_opts):
with rasterio.open(cog_url) as src:
print(src.profile["driver"], src.shape, src.crs) # GTiff (20000, 20000) EPSG:32633
2. Convert a geographic bounding box to a pixel window. Reproject the bounds into the raster's CRS first.
from rasterio.windows import from_bounds
from pyproj import Transformer
# Area of interest in EPSG:4326 (lon/lat)
aoi_4326 = (13.35, 52.50, 13.42, 52.54)
with Env(**gdal_opts), rasterio.open(cog_url) as src:
# Reproject AOI bounds to the raster CRS (always_xy keeps lon/lat order)
to_raster = Transformer.from_crs("EPSG:4326", src.crs, always_xy=True)
xmin, ymin = to_raster.transform(aoi_4326[0], aoi_4326[1])
xmax, ymax = to_raster.transform(aoi_4326[2], aoi_4326[3])
window = from_bounds(xmin, ymin, xmax, ymax, transform=src.transform)
patch = src.read(1, window=window)
patch_transform = src.window_transform(window)
3. Persist the patch as its own small GeoTIFF, carrying the correct transform.
import rasterio
with Env(**gdal_opts), rasterio.open(cog_url) as src:
profile = src.profile.copy()
profile.update(
width=patch.shape[1], height=patch.shape[0], transform=patch_transform, count=1
)
with rasterio.open("aoi_patch.tif", "w", **profile) as dst:
dst.write(patch, 1)
Verification
Confirm the window is the expected size and the patch is georeferenced where you asked.
import rasterio
print("Patch shape:", patch.shape) # Patch shape: (148, 312)
assert patch.size > 0, "Empty window — bounds may miss the raster extent"
with rasterio.open("aoi_patch.tif") as check:
left, bottom, right, top = check.bounds
print("Patch bounds (raster CRS):", round(left), round(bottom), round(right), round(top))
assert check.crs == rasterio.crs.CRS.from_epsg(32633)
# The bounds should bracket the reprojected AOI corners from step 2
Edge Cases & Debugging
- "Windowed" read still slow. The file isn't a real COG; validate with
rio cogeo validate file.tifand re-encode withrio cogeo create. - Empty or tiny patch. Bounds are in the wrong CRS or outside the raster; reproject the AOI to
src.crs(step 2). - Patch georeferenced wrongly. You wrote it with the source transform instead of
window_transform(window). - 403 on the URL. Credentials/headers not set for the bucket; configure
AWS_*orGDAL_HTTP_HEADERS. - Reading at full resolution when you need an overview. Pass
out_shapetoread()to pull a decimated overview level cheaply. - Edge windows clipped.
from_boundscan produce out-of-range offsets; intersect with the dataset window or passboundless=True.