API Reference

oex

oex: country-scale OSM and Overture vector exports.

__version__ module-attribute

__version__ = '0.3.0'

__all__ module-attribute

__all__ = [
    "BoundaryConfig",
    "CategoryConfig",
    "DuckdbConfig",
    "ExportResult",
    "Exporter",
    "HdxConfig",
    "LoggingConfig",
    "OsmSourceConfig",
    "OutputConfig",
    "OvertureSourceConfig",
    "ParallelConfig",
    "PcodesSourceConfig",
    "RootConfig",
    "__version__",
]

BoundaryConfig dataclass

CategoryConfig dataclass

DuckdbConfig dataclass

HdxConfig dataclass

LoggingConfig dataclass

OsmSourceConfig dataclass

OutputConfig dataclass

OvertureSourceConfig dataclass

ParallelConfig dataclass

PcodesSourceConfig dataclass

RootConfig dataclass

Exporter

ExportResult dataclass

boundary

Country boundary resolution: user-supplied geom or geoBoundaries ADM0.

cli

Typer CLI for oex.

cmd_overture

cmd_overture(
    iso3_or_yaml: str | None = typer.Argument(
        None,
        help="ISO3 like NPL, or name of a YAML in ./configs/ (prefer --iso3)",
    ),
    theme: str | None = typer.Argument(
        None,
        help="Optional theme override (e.g. buildings)",
    ),
    configs_dir: Path | None = typer.Option(
        None,
        "--configs-dir",
        help="Run every YAML in this directory",
    ),
    config: Path | None = typer.Option(
        None,
        "--config",
        "-c",
        help="Explicit config YAML path",
    ),
    iso3: str | None = typer.Option(
        None,
        "--iso3",
        help="ISO3 country code (e.g. NPL, COD). Overrides the positional argument and YAML.",
    ),
    dataset_name: str | None = typer.Option(
        None,
        "--dataset-name",
        help="Free-form area label used as the {country} substitution in hdx.title_template. Set this to fix pycountry inversions (e.g. DRC) or for sub-national exports.",
    ),
    output_dir: Path | None = typer.Option(
        None, "--output-dir", "-o"
    ),
    hdx_push: bool | None = typer.Option(
        None, "--hdx-push/--no-hdx-push"
    ),
    hdx_purge: bool | None = typer.Option(
        None,
        "--hdx-purge/--no-hdx-purge",
        help="Destructive: delete every existing resource on the dataset before upload.",
    ),
) -> None

Export Overture data.

cmd_osm

cmd_osm(
    iso3_or_yaml: str | None = typer.Argument(
        None,
        help="ISO3 like NPL, or name of a YAML in ./configs/ (prefer --iso3)",
    ),
    theme: str | None = typer.Argument(
        None,
        help="Optional theme override (e.g. buildings)",
    ),
    configs_dir: Path | None = typer.Option(
        None, "--configs-dir"
    ),
    config: Path | None = typer.Option(
        None, "--config", "-c"
    ),
    iso3: str | None = typer.Option(
        None,
        "--iso3",
        help="ISO3 country code (e.g. NPL, COD). Overrides the positional argument and YAML.",
    ),
    dataset_name: str | None = typer.Option(
        None,
        "--dataset-name",
        help="Free-form area label used as the {country} substitution in hdx.title_template. Set this to fix pycountry inversions (e.g. DRC) or for sub-national exports.",
    ),
    output_dir: Path | None = typer.Option(
        None, "--output-dir", "-o"
    ),
    hdx_push: bool | None = typer.Option(
        None, "--hdx-push/--no-hdx-push"
    ),
    hdx_purge: bool | None = typer.Option(
        None,
        "--hdx-purge/--no-hdx-purge",
        help="Destructive: delete every existing resource on the dataset before upload.",
    ),
    engine: str | None = typer.Option(
        None,
        "--engine",
        help="OSM engine: 'geofabrik' (default) or 'planet'",
    ),
    download_if_missing: bool | None = typer.Option(
        None,
        "--download-if-missing/--no-download-if-missing",
        help="When the planet path is missing, download the ~87 GB planet PBF before running. Overrides source.osm.auto_download_planet.",
    ),
    resume: bool | None = typer.Option(
        None,
        "--resume/--no-resume",
        help="Skip categories already built and uploaded according to the state file. Default: enabled (configurable via output.resume).",
    ),
) -> None

Export OSM data via the configured engine.

cmd_osm_build_cache

cmd_osm_build_cache(
    config: Path | None = typer.Option(
        None,
        "--config",
        "-c",
        help="Config providing source.osm settings",
    ),
) -> None

Download the planet OSM PBF to <cache_dir>/_pbf/planet-latest.osm.pbf.

Per-country extraction from that PBF happens lazily in oex-cli osm <ISO3> when source.osm.engine is planet (or geofabrik with planet_fallback: true).

config

Typed configuration loading.

ConfigError

Bases: ValueError

Raised when a configuration is malformed.

apply_overrides

apply_overrides(
    cfg: RootConfig, overrides: dict[str, Any]
) -> RootConfig

Apply a dict of dotted overrides to an already-loaded config.

load_config

load_config(
    user_config: str | PathLike[str] | None = None,
    overrides: list[str] | None = None,
) -> RootConfig

Build a RootConfig. Category precedence, lowest to highest: defaults < categories_file < inline categories: block.

select_categories

select_categories(
    cfg: RootConfig, theme: str | None
) -> RootConfig

Restrict the config to a single category whose slugified name matches theme.

loader

Layered YAML config: bundled defaults < user YAML < dotlist overrides.
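The layering can be sketched with plain dicts standing in for parsed YAML: a recursive merge for defaults < user YAML, then `key.path=value` dotlist items folded on top. This is a sketch under assumptions: the helper names are hypothetical, and JSON-style value parsing (so `false` becomes a bool and `planet` stays a string) is one plausible choice, not necessarily what oex's loader does.

```python
import copy
import json
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict: nested dicts merge recursively, other values are replaced."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def parse_dotlist(items: list[str]) -> dict[str, Any]:
    """Turn ["output.resume=false"] into {"output": {"resume": False}}."""
    tree: dict[str, Any] = {}
    for item in items:
        dotted, sep, raw = item.partition("=")
        if not sep:
            raise ValueError(f"override must look like key.path=value, got {item!r}")
        try:
            value = json.loads(raw)  # numbers, booleans, null, JSON lists
        except json.JSONDecodeError:
            value = raw  # bare strings such as planet or NPL
        node = tree
        *parents, leaf = dotted.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return tree


def layer(defaults: dict[str, Any], user: dict[str, Any], overrides: list[str]) -> dict[str, Any]:
    """defaults < user YAML < dotlist overrides."""
    return deep_merge(deep_merge(defaults, user), parse_dotlist(overrides))
```

Because every layer is deep-copied, the bundled defaults are never mutated, which matters when several configs are loaded in one process (for example with --configs-dir).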

ConfigError

Bases: ValueError

Raised when a configuration is malformed.

load_config
load_config(
    user_config: str | PathLike[str] | None = None,
    overrides: list[str] | None = None,
) -> RootConfig

Build a RootConfig. Category precedence, lowest to highest: defaults < categories_file < inline categories: block.

apply_overrides
apply_overrides(
    cfg: RootConfig, overrides: dict[str, Any]
) -> RootConfig

Apply a dict of dotted overrides to an already-loaded config.

select_categories
select_categories(
    cfg: RootConfig, theme: str | None
) -> RootConfig

Restrict the config to a single category whose slugified name matches theme.

schema

Typed run configuration.

defaults

Bundled default YAML configuration.

duckdb_session

exporter

Per-category export loop, shared by Overture and OSM sources.

hdx_publisher

HDX dataset and resource publication. Imports hdx-python-api lazily.

locale

Resolve a country's OSM name:<lang> tags from its ISO3 code via babel.

primary_osm_language

primary_osm_language(iso3: str) -> str | None

First non-English official language for the country, or None.

local_osm_languages

local_osm_languages(iso3: str) -> list[str]

Up to three non-English official languages for the country.

Babel sometimes lists English first for multilingual countries (Sudan, Philippines), so English is dropped: name_en is already covered by the schema's static select.

logging_setup

Idempotent root logger setup.

metadata

Per-dataset metadata report (feature counts, geom types, bbox, column stats).

osm

OSM source: planet PBF download + quackosm conversion + per-country query.

OsmRunner

Bases: SourceRunner

build_cache

PBF -> per-theme GeoParquet cache via quackosm.

build_cache
build_cache(
    cfg: RootConfig,
    pbf_path: str | Path,
    *,
    cache_root: Path,
    snapshot: str | None = None,
    themes_filter: list[str] | None = None,
    geometry_filter: BaseGeometry | None = None,
) -> CacheManifest

Materialise //.parquet for each enabled category.

category_filter

Translate oex CategoryConfig.osm.filter into quackosm filters and SQL.

Two consumers:

  • planet engine prep: build a single union OsmTagsFilter from N categories for the one-pass quackosm call.

  • planet engine query_for: build the per-category SQL WHERE predicate that picks just one category's features from the unified country.parquet.

union_tag_filter
union_tag_filter(
    categories: Iterable[CategoryConfig],
) -> OsmTagsFilter

Merge N category osm.filter blocks into one quackosm OsmTagsFilter.

Rules:

  • Any True wins for a key (any-value match).

  • list + list -> sorted union.

  • list + str -> the list with the str added.

  • str + str -> a list of both.

category_where_predicate
category_where_predicate(category: CategoryConfig) -> str

SQL WHERE clause matching this category's osm.filter against the tags MAP column.

Returns a parenthesised expression suitable for AND'ing into a larger WHERE. Empty filter -> "TRUE" (matches all).

extract

osmium-tool subprocess wrappers for polygon-based PBF extraction.

The planet engine uses osmium-tool's extract command to clip a country PBF out of a planet PBF using a 5km-buffered admin polygon. We shell out because pyosmium does not expose extract --strategy=complete_ways and reimplementing the multi-pass strategy in Python is out of scope.

OsmiumNotInstalledError

Bases: RuntimeError

osmium binary not found on PATH.

OsmiumExtractError

Bases: RuntimeError

osmium extract exited non-zero.

osmium_polygon_extract
osmium_polygon_extract(
    pbf_path: Path,
    polygon_geojson: dict[str, Any],
    out_pbf: Path,
    *,
    strategy: ExtractStrategy = "complete_ways",
) -> None

Clip pbf_path to polygon_geojson, write to out_pbf.

Osmium's banded algorithm keeps the cost of a high polygon vertex count low, so the full-precision boundary is passed as-is (no simplification needed).
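The shell-out can be sketched with `subprocess` and `shutil.which`. The flags used (`--polygon`, `--strategy`, `--overwrite`, `--output`) are osmium-tool `extract` options, and osmium accepts a GeoJSON polygon file for `--polygon`. The exception classes here are local stand-ins mirroring the names above, and `build_extract_command` / `polygon_extract` are hypothetical helpers, not oex's implementation.

```python
import json
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import Any


class OsmiumNotInstalledError(RuntimeError):
    """Stand-in: osmium binary not found on PATH."""


class OsmiumExtractError(RuntimeError):
    """Stand-in: osmium extract exited non-zero."""


def build_extract_command(pbf_path: Path, polygon_file: Path, out_pbf: Path,
                          strategy: str = "complete_ways") -> list[str]:
    """Argument list for a polygon clip; kept separate so it is easy to test."""
    return [
        "osmium", "extract",
        "--polygon", str(polygon_file),
        "--strategy", strategy,
        "--overwrite",
        "--output", str(out_pbf),
        str(pbf_path),
    ]


def polygon_extract(pbf_path: Path, polygon_geojson: dict[str, Any], out_pbf: Path,
                    strategy: str = "complete_ways") -> None:
    if shutil.which("osmium") is None:
        raise OsmiumNotInstalledError("osmium binary not found on PATH")
    with tempfile.TemporaryDirectory() as tmp:
        # osmium reads the polygon from a file, so write the GeoJSON out first.
        polygon_file = Path(tmp) / "boundary.geojson"
        polygon_file.write_text(json.dumps(polygon_geojson))
        cmd = build_extract_command(pbf_path, polygon_file, out_pbf, strategy)
        proc = subprocess.run(cmd, capture_output=True, text=True)
        if proc.returncode != 0:
            raise OsmiumExtractError(f"osmium extract exited {proc.returncode}: {proc.stderr.strip()}")
```

Passing an argument list (not a shell string) avoids quoting problems with paths that contain spaces.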

fetch_planet

OSM PBF download with HTTP Range resume and optional md5 verification.

geofabrik

Geofabrik country-PBF URL lookup via the public index-v1.json.

GeofabrikLookupError

Bases: LookupError

Raised when the index does not contain a country-level extract.

GeofabrikUnavailableError

Bases: GeofabrikLookupError

Geofabrik does not publish a country-level PBF for this ISO3.

Distinct subclass so callers can catch precisely (e.g. for planet_fallback) without swallowing other Geofabrik errors like network failures.

runner

OSM source runner.

Two engines, one unified pipeline. Both produce a single country.parquet per (iso3, snapshot) by running quackosm once with the union of all category tag filters and keep_all_tags=True. Per-category extraction is then a tag-predicate WHERE clause applied at query time, so the PBF is never re-parsed per category.

  • geofabrik: download per-country PBF from Geofabrik, then build the country parquet. Cache: <cache>/geofabrik/<iso3>/<snapshot>/country.parquet.

  • planet: clip a country PBF out of a local planet PBF via osmium-tool, then build the country parquet. Cache: <cache>/planet/<iso3>/<snapshot>/country.parquet.
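Since both engines share the same cache layout, the path can be derived by one function. A minimal sketch of the two layouts listed above; `country_parquet_path` is a hypothetical helper and the snapshot label in the usage example is made up.

```python
from pathlib import Path
from typing import Literal

Engine = Literal["geofabrik", "planet"]


def country_parquet_path(cache: Path, engine: Engine, iso3: str, snapshot: str) -> Path:
    """<cache>/<engine>/<iso3>/<snapshot>/country.parquet"""
    if engine not in ("geofabrik", "planet"):
        raise ValueError(f"unknown OSM engine {engine!r}")
    return cache / engine / iso3 / snapshot / "country.parquet"
```

For example, `country_parquet_path(Path("/cache"), "planet", "NPL", "2024-01-01")` gives `/cache/planet/NPL/2024-01-01/country.parquet`. Keying on the engine keeps a Geofabrik build and a planet clip of the same country from overwriting each other.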

OsmRunner

Bases: SourceRunner

overture

Overture Maps source: query the public S3 release bucket via DuckDB.

resolve_release

resolve_release(
    release: str, *, bucket: str = "overturemaps-us-west-2"
) -> str

Return a concrete release version, resolving "latest" via S3 listing.
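Resolving "latest" amounts to listing the release prefixes and taking the newest. This sketch takes the prefixes as input so it runs offline. It assumes Overture's `release/<YYYY-MM-DD>.<n>/` naming; the listing itself could come from, for example, boto3's `list_objects_v2(Bucket=..., Prefix="release/", Delimiter="/")` CommonPrefixes with unsigned requests. The helper names are hypothetical.

```python
import re

_RELEASE = re.compile(r"^release/(\d{4}-\d{2}-\d{2}\.\d+)/$")


def pick_latest_release(prefixes: list[str]) -> str:
    """Newest version among release/<date>.<n>/ prefixes."""
    versions = [m.group(1) for p in prefixes if (m := _RELEASE.match(p))]
    if not versions:
        raise LookupError("no release/<version>/ prefixes found")
    # Compare the date as text (ISO dates sort correctly) and the suffix as an int.
    return max(versions, key=lambda v: (v.split(".")[0], int(v.split(".")[1])))


def resolve_release_from(release: str, prefixes: list[str]) -> str:
    """Pass concrete versions through; resolve "latest" from the listing."""
    return pick_latest_release(prefixes) if release == "latest" else release
```

Parsing the numeric suffix avoids a plain string sort ranking a hypothetical `.10` below `.9`.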

runner

Overture source runner: DuckDB httpfs read from s3://overturemaps-us-west-2.

resolve_release
resolve_release(
    release: str, *, bucket: str = "overturemaps-us-west-2"
) -> str

Return a concrete release version, resolving "latest" via S3 listing.

pcodes

P-code tagging via fieldmaps.io edge-matched humanitarian admin polygons.

cache

Fetch and cache fieldmaps.io edge-matched admin parquets.

tagger

Pcode tagging via H3 integer hash join at resolution 7. Boundary residuals (~1-5% of features whose centroid H3 cell isn't owned by any admin) are resolved by either a 1-ring H3 neighbour hash lookup (default, memory-bounded) or a GEOS ST_Contains spatial join (precise but can OOM on large countries).

parse_boundary_resolution
parse_boundary_resolution(value: str) -> BoundaryResolution

Validate and narrow a config string to BoundaryResolution. Fails loud on typos.
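Narrowing a string to a `Literal` type is a common pattern. In this sketch, the option names `h3_ring` and `spatial_join` are hypothetical placeholders for the two strategies described above, and `narrow_boundary_resolution` is not oex's function.

```python
from typing import Literal, cast, get_args

BoundaryResolution = Literal["h3_ring", "spatial_join"]  # hypothetical option names


def narrow_boundary_resolution(value: str) -> BoundaryResolution:
    """Exact match only, so a typo fails at config load instead of mid-run."""
    allowed = get_args(BoundaryResolution)
    if value not in allowed:
        raise ValueError(f"boundary resolution must be one of {allowed}, got {value!r}")
    return cast(BoundaryResolution, value)
```

Deriving `allowed` from the `Literal` itself keeps the runtime check and the static type from drifting apart.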

preflight

Pre-run sanity checks. Fail loud before doing any expensive work.

PreflightError

Bases: RuntimeError

A required precondition is not satisfied.

check_writable_paths

check_writable_paths(cfg: RootConfig) -> None

Verify every directory the run needs to write to is writable.

Catches read-only filesystems and permission errors before downloading PBFs or running quackosm. Tests by creating, writing, then deleting a tiny temp file in each candidate path.
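A minimal sketch of the create/write/delete probe using `tempfile.NamedTemporaryFile`, which removes the probe file when the `with` block exits. `PreflightError` here is a local stand-in. Collecting every failing path before raising, and creating missing directories first, are assumptions rather than documented oex behaviour.

```python
import tempfile
from pathlib import Path


class PreflightError(RuntimeError):
    """Stand-in for the preflight error type."""


def check_writable(paths: list[Path]) -> None:
    problems = []
    for path in paths:
        try:
            path.mkdir(parents=True, exist_ok=True)
            # The probe file is deleted automatically when the with-block exits.
            with tempfile.NamedTemporaryFile(dir=path, prefix=".oex-probe-") as probe:
                probe.write(b"ok")
                probe.flush()
        except OSError as exc:
            problems.append(f"{path}: {exc}")
    if problems:
        raise PreflightError("not writable:\n" + "\n".join(problems))
```

Reporting all failing paths at once saves a fix-and-rerun cycle when, for example, both the cache and output mounts are read-only.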

report

Multi-source HTML report rendered from per-source metadata.json payloads.

html

Multi-source HTML report renderer.

s3

Upload artifacts to S3 and return a public URL for HDX linking.

sources

Per-source query builders.

A source knows how to expose a parquet read expression and a metadata block for a given category. The shared exporter then performs the bbox/clip/select/write/zip steps uniformly.

CategorySkippedError

Bases: RuntimeError

Raised by a source when a category is not applicable to it.

SourceRunner

Bases: ABC

peek_snapshot_label
peek_snapshot_label(cfg: RootConfig) -> str | None

Best-effort snapshot label without doing network work or running prepare.

Returns the label the runner would adopt in prepare(), or None if it can't be determined cheaply. Used by the exporter to short-circuit the run when every category is already uploaded for that label.

base

Abstract source interface.

CategorySkippedError

Bases: RuntimeError

Raised by a source when a category is not applicable to it.

SourceRunner

Bases: ABC

peek_snapshot_label
peek_snapshot_label(cfg: RootConfig) -> str | None

Best-effort snapshot label without doing network work or running prepare.

Returns the label the runner would adopt in prepare(), or None if it can't be determined cheaply. Used by the exporter to short-circuit the run when every category is already uploaded for that label.

sql

SELECT/WHERE clause builders + materialise(); shared by both sources.

build_where_clause

build_where_clause(
    boundary: Boundary,
    where_conditions: list[str],
    bbox_cols: str,
) -> str

Combine bbox prune + boundary intersect + caller-supplied conditions.

bbox_cols="bbox" uses an upstream bbox struct (Overture); "geom" derives the bbox from the geometry column (OSM cache).

state

Per-(country, source) resume state, atomic-write JSON.

A run keeps a single .state.json per (output_dir, iso3, source) recording, for each category, when the local build finished and when the HDX upload completed. With output.resume enabled the exporter consults this to skip already-finished work after a partial run, and HDX rate-limit storms become recoverable without rebuilding zips.

State is keyed by category slug. A snapshot label mismatch (different PBF) is treated as a miss so a fresh snapshot always rebuilds.

StateStore

Read/write the per-(iso3, source) resume state JSON, atomically.
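A sketch of atomic JSON state with snapshot-mismatch-as-miss semantics. The atomic write is the usual pattern: write a temp file in the same directory, fsync, then `os.replace` over the target. The JSON layout (one snapshot label per file, per-category step timestamps) and the step names `built_at` / `uploaded_at` are hypothetical, not the real .state.json schema.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any


class ResumeState:
    def __init__(self, path: Path) -> None:
        self.path = path

    def load(self) -> dict[str, Any]:
        if not self.path.exists():
            return {"snapshot": None, "categories": {}}
        return json.loads(self.path.read_text())

    def save(self, state: dict[str, Any]) -> None:
        self.path.parent.mkdir(parents=True, exist_ok=True)
        fd, tmp = tempfile.mkstemp(dir=self.path.parent, prefix=self.path.name, suffix=".tmp")
        try:
            with os.fdopen(fd, "w") as fh:
                json.dump(state, fh, indent=2, sort_keys=True)
                fh.flush()
                os.fsync(fh.fileno())
            # Atomic rename on POSIX when tmp and target share a filesystem.
            os.replace(tmp, self.path)
        except BaseException:
            Path(tmp).unlink(missing_ok=True)
            raise

    def is_done(self, slug: str, snapshot: str, step: str) -> bool:
        state = self.load()
        if state.get("snapshot") != snapshot:
            return False  # different PBF/release: treat as a miss and rebuild
        return bool(state["categories"].get(slug, {}).get(step))

    def mark(self, slug: str, snapshot: str, step: str, when: str) -> None:
        state = self.load()
        if state.get("snapshot") != snapshot:
            state = {"snapshot": snapshot, "categories": {}}  # fresh snapshot resets all
        state["categories"].setdefault(slug, {})[step] = when
        self.save(state)
```

A crash mid-write leaves either the old file or the new one, never a truncated JSON document, which is what lets a rerun after an HDX rate-limit failure trust the state it reads.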

system

Defaults for thread count and DuckDB memory limit, derived from psutil.

total_memory_gb

total_memory_gb() -> float

Return effective memory in GB.

OEX_MEMORY_GB env var overrides psutil (use this inside Docker where --memory sets the container limit but psutil reads the host RAM).
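The override order can be sketched as follows. psutil is imported lazily, so it is only required when OEX_MEMORY_GB is unset. Treating "GB" as 1024³ bytes and rejecting non-positive values are assumptions; `effective_memory_gb` is a hypothetical name.

```python
import os


def effective_memory_gb() -> float:
    override = os.environ.get("OEX_MEMORY_GB")
    if override:
        value = float(override)
        if value <= 0:
            raise ValueError("OEX_MEMORY_GB must be a positive number")
        return value
    import psutil  # only needed when no override is set

    return psutil.virtual_memory().total / 1024**3
```

In a container started with `--memory 16g`, setting `OEX_MEMORY_GB=16` makes downstream sizing use the container limit even if psutil sees the host's RAM.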

adaptive_parallel_resources

adaptive_parallel_resources() -> tuple[int, int]

Compute (parallel_workers, memory_gb_per_worker) scaled to total system RAM.

Always returns 1 worker. DuckDB's intra-query pipeline engine already parallelises scans, joins, and aggregations across all CPU cores within one session. Concurrent sessions would each claim a share of the RAM budget with no cross-session coordination, and on large countries (BRA, IND, CHN) they OOM-kill each other.

Memory: 60% of total RAM, DuckDB's recommended safe fraction for a single session. Leaves headroom for GDAL write allocations, string heaps, and spatial index structures that bypass the buffer manager.

Uses total memory (cgroup-aware on Linux >= 5.0, so a Docker container with --memory set reports the container limit here).
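The sizing arithmetic is short: one worker, 60% of total memory. This sketch takes the total as a parameter. Rounding down to whole GB with a floor of 1 is an assumption about how the int is produced, and the function name is hypothetical.

```python
import math


def adaptive_parallel_resources_for(total_gb: float, fraction: float = 0.6) -> tuple[int, int]:
    """Return (parallel_workers, memory_gb_per_worker)."""
    workers = 1  # one DuckDB session already uses every core internally
    memory_gb = max(1, math.floor(total_gb * fraction))
    return workers, memory_gb
```

On a 64 GB machine this yields `(1, 38)`: 64 × 0.6 = 38.4, rounded down. The resulting figure could then be handed to DuckDB, for instance as `duckdb.connect(config={"memory_limit": "38GB"})`.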

translit

DuckDB-side transliteration to Latin via unidecode.

engine

Add Latin display columns to a materialised table via unidecode.

writers

GIS format writers (gpkg, shp, geojson) over materialised DuckDB tables.

zip_bundle

Per-format zip bundles with README, config snapshot, and optional metadata.