API Reference¶
oex ¶
oex: country-scale OSM and Overture vector exports.
__all__
module-attribute
¶
__all__ = [
"BoundaryConfig",
"CategoryConfig",
"DuckdbConfig",
"ExportResult",
"Exporter",
"HdxConfig",
"LoggingConfig",
"OsmSourceConfig",
"OutputConfig",
"OvertureSourceConfig",
"ParallelConfig",
"PcodesSourceConfig",
"RootConfig",
"__version__",
]
BoundaryConfig
dataclass
¶
CategoryConfig
dataclass
¶
DuckdbConfig
dataclass
¶
HdxConfig
dataclass
¶
LoggingConfig
dataclass
¶
OsmSourceConfig
dataclass
¶
OutputConfig
dataclass
¶
OvertureSourceConfig
dataclass
¶
ParallelConfig
dataclass
¶
PcodesSourceConfig
dataclass
¶
RootConfig
dataclass
¶
Exporter ¶
ExportResult
dataclass
¶
boundary ¶
Country boundary resolution: user-supplied geom or geoBoundaries ADM0.
cli ¶
Typer CLI for oex.
cmd_overture ¶
cmd_overture(
iso3_or_yaml: str | None = typer.Argument(
None,
help="ISO3 like NPL, or name of a YAML in ./configs/ (prefer --iso3)",
),
theme: str | None = typer.Argument(
None,
help="Optional theme override (e.g. buildings)",
),
configs_dir: Path | None = typer.Option(
None,
"--configs-dir",
help="Run every YAML in this directory",
),
config: Path | None = typer.Option(
None,
"--config",
"-c",
help="Explicit config YAML path",
),
iso3: str | None = typer.Option(
None,
"--iso3",
help="ISO3 country code (e.g. NPL, COD). Overrides the positional argument and YAML.",
),
dataset_name: str | None = typer.Option(
None,
"--dataset-name",
help="Free-form area label used as the {country} substitution in hdx.title_template. Set this to fix pycountry inversions (e.g. DRC) or for sub-national exports.",
),
output_dir: Path | None = typer.Option(
None, "--output-dir", "-o"
),
hdx_push: bool | None = typer.Option(
None, "--hdx-push/--no-hdx-push"
),
hdx_purge: bool | None = typer.Option(
None,
"--hdx-purge/--no-hdx-purge",
help="Destructive: delete every existing resource on the dataset before upload.",
),
) -> None
Export Overture data.
cmd_osm ¶
cmd_osm(
iso3_or_yaml: str | None = typer.Argument(
None,
help="ISO3 like NPL, or name of a YAML in ./configs/ (prefer --iso3)",
),
theme: str | None = typer.Argument(
None,
help="Optional theme override (e.g. buildings)",
),
configs_dir: Path | None = typer.Option(
None, "--configs-dir"
),
config: Path | None = typer.Option(
None, "--config", "-c"
),
iso3: str | None = typer.Option(
None,
"--iso3",
help="ISO3 country code (e.g. NPL, COD). Overrides the positional argument and YAML.",
),
dataset_name: str | None = typer.Option(
None,
"--dataset-name",
help="Free-form area label used as the {country} substitution in hdx.title_template. Set this to fix pycountry inversions (e.g. DRC) or for sub-national exports.",
),
output_dir: Path | None = typer.Option(
None, "--output-dir", "-o"
),
hdx_push: bool | None = typer.Option(
None, "--hdx-push/--no-hdx-push"
),
hdx_purge: bool | None = typer.Option(
None,
"--hdx-purge/--no-hdx-purge",
help="Destructive: delete every existing resource on the dataset before upload.",
),
engine: str | None = typer.Option(
None,
"--engine",
help="OSM engine: 'geofabrik' (default) or 'planet'",
),
download_if_missing: bool | None = typer.Option(
None,
"--download-if-missing/--no-download-if-missing",
help="When the planet path is missing, download the ~87 GB planet PBF before running. Overrides source.osm.auto_download_planet.",
),
resume: bool | None = typer.Option(
None,
"--resume/--no-resume",
help="Skip categories already built and uploaded according to the state file. Default: enabled (configurable via output.resume).",
),
) -> None
Export OSM data via the configured engine.
cmd_osm_build_cache ¶
cmd_osm_build_cache(
config: Path | None = typer.Option(
None,
"--config",
"-c",
help="Config providing source.osm settings",
),
) -> None
Download the planet OSM PBF to <cache_dir>/_pbf/planet-latest.osm.pbf.
Per-country extraction from that PBF happens lazily in oex-cli osm <ISO3>
when source.osm.engine is planet (or geofabrik with
planet_fallback: true).
config ¶
Typed configuration loading.
ConfigError ¶
Bases: ValueError
Raised when a configuration is malformed.
apply_overrides ¶
Apply a dict of dotted overrides to an already-loaded config.
load_config ¶
load_config(
user_config: str | PathLike[str] | None = None,
overrides: list[str] | None = None,
) -> RootConfig
Build a RootConfig. categories precedence: defaults < categories_file < inline categories:.
select_categories ¶
Restrict the config to a single category whose slugified name matches theme.
loader ¶
Layered YAML config: bundled defaults < user YAML < dotlist overrides.
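The layering order above can be illustrated with a minimal sketch. The helper names `deep_merge` and `apply_dotted` and the sample keys `output.formats` and `hdx.push` are hypothetical and chosen only for illustration; the real loader may merge differently.

```python
from copy import deepcopy
from typing import Any


def deep_merge(base: dict[str, Any], layer: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict where `layer` wins over `base`, recursing into nested dicts."""
    merged = deepcopy(base)
    for key, value in layer.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars and lists are replaced wholesale, not concatenated.
            merged[key] = deepcopy(value)
    return merged


def apply_dotted(config: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Apply {"a.b.c": value} style overrides, creating intermediate dicts as needed."""
    result = deepcopy(config)
    for dotted_key, value in overrides.items():
        node = result
        *parents, leaf = dotted_key.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return result


# Hypothetical layers: bundled defaults < user YAML < dotted overrides.
defaults = {"output": {"resume": True, "formats": ["gpkg"]}, "hdx": {"push": False}}
user_yaml = {"output": {"formats": ["gpkg", "geojson"]}}
final = apply_dotted(deep_merge(defaults, user_yaml), {"hdx.push": True})
```

Each layer only needs to state what it changes; anything it omits falls through from the layer below.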
ConfigError ¶
Bases: ValueError
Raised when a configuration is malformed.
load_config ¶
load_config(
user_config: str | PathLike[str] | None = None,
overrides: list[str] | None = None,
) -> RootConfig
Build a RootConfig. categories precedence: defaults < categories_file < inline categories:.
apply_overrides ¶
Apply a dict of dotted overrides to an already-loaded config.
select_categories ¶
Restrict the config to a single category whose slugified name matches theme.
schema ¶
Typed run configuration.
defaults ¶
Bundled default YAML configuration.
duckdb_session ¶
exporter ¶
Per-category export loop, shared by Overture and OSM sources.
hdx_publisher ¶
HDX dataset and resource publication. Imports hdx-python-api lazily.
locale ¶
Resolve a country's OSM name:<lang> tags from its ISO3 code via babel.
primary_osm_language ¶
First non-English official language for the country, or None.
local_osm_languages ¶
Up to three non-English official languages for the country.
Babel sometimes lists English first for multilingual countries (Sudan,
Philippines), so English is dropped: name_en is already covered by
the schema's static select.
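The filtering rule described above can be sketched without babel by taking the ordered list of official language codes as input. The function names and the choice to collapse regional variants such as `ar_SD` to `ar` are assumptions made for this illustration.

```python
from typing import Optional


def local_languages(official: list[str], limit: int = 3) -> list[str]:
    """Keep up to `limit` non-English languages, preserving the input order."""
    kept: list[str] = []
    for code in official:
        base = code.split("_")[0].lower()  # assumption: "ar_SD" is treated as "ar"
        if base == "en" or base in kept:
            continue
        kept.append(base)
    return kept[:limit]


def primary_language(official: list[str]) -> Optional[str]:
    """First non-English language, or None when only English is listed."""
    languages = local_languages(official, limit=1)
    return languages[0] if languages else None
```

With an input of `["en", "ar"]`, English is skipped and `ar` becomes the primary language.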
logging_setup ¶
Idempotent root logger setup.
metadata ¶
Per-dataset metadata report (feature counts, geom types, bbox, column stats).
osm ¶
OSM source: planet PBF download + quackosm conversion + per-country query.
OsmRunner ¶
Bases: SourceRunner
build_cache ¶
PBF -> per-theme GeoParquet cache via quackosm.
build_cache ¶
build_cache(
cfg: RootConfig,
pbf_path: str | Path,
*,
cache_root: Path,
snapshot: str | None = None,
themes_filter: list[str] | None = None,
geometry_filter: BaseGeometry | None = None,
) -> CacheManifest
Materialise
category_filter ¶
Translate oex CategoryConfig.osm.filter into quackosm filters and SQL.
Two consumers:
- planet engine prep: build a single union OsmTagsFilter from N categories for the one-pass quackosm call.
- planet engine query_for: build the per-category SQL WHERE predicate that picks just one category's features from the unified country.parquet.
union_tag_filter ¶
Merge N category osm.filter blocks into one quackosm OsmTagsFilter.
Rules:
- Any True wins for a key (any-value match).
- list+list -> sorted union; list+str -> list with str added; str+str -> list of both.
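A minimal sketch of these merge rules follows. The names `merge_tag_values` and `union_filters` are hypothetical. Sorting and de-duplicating in every case (not only list+list) is an assumption made here to keep the output deterministic.

```python
from typing import Union

TagValue = Union[bool, str, list]


def merge_tag_values(a: TagValue, b: TagValue) -> TagValue:
    """Merge two filter values for the same OSM key."""
    if a is True or b is True:
        return True  # any-value match wins over specific values
    values: set[str] = set()
    for value in (a, b):
        if isinstance(value, str):
            values.add(value)
        elif isinstance(value, list):
            values.update(value)
        # Other values (e.g. False) are ignored in this sketch.
    return sorted(values)


def union_filters(filters: list[dict[str, TagValue]]) -> dict[str, TagValue]:
    """Fold N per-category filters into a single key -> value mapping."""
    merged: dict[str, TagValue] = {}
    for tag_filter in filters:
        for key, value in tag_filter.items():
            merged[key] = merge_tag_values(merged[key], value) if key in merged else value
    return merged


combined = union_filters([
    {"building": True},
    {"building": ["yes"], "highway": ["primary", "secondary"]},
    {"highway": "tertiary", "amenity": "school"},
    {"amenity": "hospital"},
])
```

Here `building` stays `True` because one category matches any value, while `highway` and `amenity` become sorted lists.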
category_where_predicate ¶
SQL WHERE clause matching this category's osm.filter on tags MAP.
Returns a parenthesised expression suitable for AND'ing into a larger WHERE. Empty filter -> "TRUE" (matches all).
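The shape of such a predicate can be sketched as below. This is an illustration, not the library's implementation: `where_predicate` is a hypothetical name, keys are OR'ed together as an assumption, and the `tags['key']` subscript is assumed to yield the value or NULL, which depends on the DuckDB version in use.

```python
from typing import Union


def sql_quote(value: str) -> str:
    """Quote a string literal for SQL, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def where_predicate(tag_filter: dict[str, Union[bool, str, list]], column: str = "tags") -> str:
    """Build a parenthesised predicate over a MAP column; empty filter matches all."""
    if not tag_filter:
        return "TRUE"
    parts = []
    for key, value in tag_filter.items():
        ref = f"{column}[{sql_quote(key)}]"
        if value is True:
            parts.append(f"{ref} IS NOT NULL")  # key present with any value
        elif isinstance(value, str):
            parts.append(f"{ref} = {sql_quote(value)}")
        else:
            options = ", ".join(sql_quote(v) for v in value)
            parts.append(f"{ref} IN ({options})")
    return "(" + " OR ".join(parts) + ")"
```

Because the result is parenthesised, it can be joined with other conditions using AND without changing precedence.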
extract ¶
osmium-tool subprocess wrappers for polygon-based PBF extraction.
The planet engine uses osmium-tool's extract command to clip a country
PBF out of a planet PBF using a 5km-buffered admin polygon. We shell out
because pyosmium does not expose extract --strategy=complete_ways and
reimplementing the multi-pass strategy in Python is out of scope.
OsmiumNotInstalledError ¶
Bases: RuntimeError
osmium binary not found on PATH.
OsmiumExtractError ¶
Bases: RuntimeError
osmium extract exited non-zero.
osmium_polygon_extract ¶
osmium_polygon_extract(
pbf_path: Path,
polygon_geojson: dict[str, Any],
out_pbf: Path,
*,
strategy: ExtractStrategy = "complete_ways",
) -> None
Clip pbf_path to polygon_geojson, write to out_pbf.
osmium's banded algorithm keeps extraction cost largely independent of polygon vertex count, so the full-precision boundary is passed as-is (no simplification needed).
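A subprocess wrapper of this kind might look like the sketch below. The helper names and the temporary-file handling are assumptions; the command uses osmium-tool's documented `extract` options (`--polygon`, `--strategy`, `--output`, `--overwrite`), and plain `RuntimeError` stands in for the module's own exception classes.

```python
import json
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import Any


def build_osmium_command(
    pbf_path: Path, polygon_path: Path, out_pbf: Path, strategy: str = "complete_ways"
) -> list[str]:
    """Assemble the argv list for `osmium extract`."""
    return [
        "osmium", "extract",
        "--polygon", str(polygon_path),
        "--strategy", strategy,
        "--output", str(out_pbf),
        "--overwrite",
        str(pbf_path),
    ]


def polygon_extract(pbf_path: Path, polygon_geojson: dict[str, Any], out_pbf: Path) -> None:
    """Write the polygon to a temp GeoJSON file and run osmium on it."""
    if shutil.which("osmium") is None:
        raise RuntimeError("osmium binary not found on PATH")
    with tempfile.TemporaryDirectory() as tmp:
        polygon_path = Path(tmp) / "boundary.geojson"
        polygon_path.write_text(json.dumps(polygon_geojson), encoding="utf-8")
        cmd = build_osmium_command(pbf_path, polygon_path, out_pbf)
        proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        raise RuntimeError(f"osmium extract failed: {proc.stderr.strip()}")
```

Keeping command construction separate from execution makes the argument list easy to check without osmium installed.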
fetch_planet ¶
OSM PBF download with HTTP Range resume and optional md5 verification.
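The two building blocks named above, a Range header derived from a partial file and a chunked md5 check, can be sketched as follows. The function names are hypothetical, and the sketch assumes the server honours Range requests (responds with 206 Partial Content).

```python
import hashlib
from pathlib import Path


def resume_headers(partial: Path) -> dict[str, str]:
    """Request only the bytes that are not yet on disk."""
    offset = partial.stat().st_size if partial.exists() else 0
    return {"Range": f"bytes={offset}-"} if offset else {}


def md5_matches(path: Path, expected_hex: str, chunk_size: int = 1 << 20) -> bool:
    """Hash the file in chunks so a multi-GB PBF never has to fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.strip().lower()
```

A caller would append the response body to the partial file when headers are non-empty, and start from scratch otherwise.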
geofabrik ¶
Geofabrik country-PBF URL lookup via the public index-v1.json.
GeofabrikLookupError ¶
Bases: LookupError
Raised when the index does not contain a country-level extract.
GeofabrikUnavailableError ¶
Bases: GeofabrikLookupError
Geofabrik does not publish a country-level PBF for this ISO3.
Distinct subclass so callers can catch precisely (e.g. for planet_fallback) without swallowing other Geofabrik errors like network failures.
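A lookup against the index could be sketched as below. The property names (`iso3166-1:alpha2`, `urls.pbf`) reflect the layout of the public index as understood at the time of writing and should be treated as an assumption, as should the function name. The index keys countries by ISO 3166-1 alpha-2, so an ISO3 code would need converting first. Plain `LookupError` stands in for the module's exception classes.

```python
from typing import Any


def find_country_pbf(index: dict[str, Any], alpha2: str) -> str:
    """Return the PBF URL of the first index entry listing this alpha-2 code."""
    for feature in index.get("features", []):
        props = feature.get("properties", {})
        if alpha2.upper() in props.get("iso3166-1:alpha2", []):
            url = props.get("urls", {}).get("pbf")
            if url:
                return url
    raise LookupError(f"no country-level extract for {alpha2!r}")


# Tiny in-memory stand-in for the downloaded index (illustrative only).
sample_index = {
    "features": [
        {
            "properties": {
                "id": "nepal",
                "iso3166-1:alpha2": ["NP"],
                "urls": {"pbf": "https://download.geofabrik.de/asia/nepal-latest.osm.pbf"},
            }
        }
    ]
}
nepal_url = find_country_pbf(sample_index, "np")
```

Raising a dedicated lookup error lets a caller fall back to another engine without catching unrelated failures.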
runner ¶
OSM source runner.
Two engines, one unified pipeline. Both produce a single
country.parquet per (iso3, snapshot) by running quackosm once with the
union of all category tag filters and keep_all_tags=True. Per-category
extraction is a tag-predicate WHERE at query time, no per-category PBF
reparse.
- geofabrik: download per-country PBF from Geofabrik, then build the country parquet. Cache: <cache>/geofabrik/<iso3>/<snapshot>/country.parquet.
- planet: clip a country PBF out of a local planet PBF via osmium-tool, then build the country parquet. Cache: <cache>/planet/<iso3>/<snapshot>/country.parquet.
OsmRunner ¶
Bases: SourceRunner
overture ¶
Overture Maps source: query the public S3 release bucket via DuckDB.
resolve_release ¶
Return a concrete release version, resolving "latest" via S3 listing.
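The resolution step can be sketched on an already-fetched list of release labels. This assumes labels begin with an ISO date so that lexicographic order matches chronological order; the S3 listing itself is omitted, and the function name and sample labels are illustrative.

```python
def pick_release(requested: str, available: list[str]) -> str:
    """Pass concrete versions through; map "latest" to the newest label."""
    if requested != "latest":
        return requested
    if not available:
        raise LookupError("no releases found in listing")
    return max(available)  # date-prefixed labels sort chronologically


newest = pick_release("latest", ["2025-01-22.0", "2025-02-19.0", "2024-12-18.0"])
```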
pcodes ¶
P-code tagging via fieldmaps.io edge-matched humanitarian admin polygons.
cache ¶
Fetch and cache fieldmaps.io edge-matched admin parquets.
tagger ¶
Pcode tagging via H3 integer hash join at resolution 7. Boundary residuals (~1-5% of features whose centroid H3 cell isn't owned by any admin) are resolved by either a 1-ring H3 neighbour hash lookup (default, memory-bounded) or a GEOS ST_Contains spatial join (precise but can OOM on large countries).
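The hash join with a 1-ring fallback can be illustrated on a toy square grid. This deliberately swaps H3 for integer (x, y) cells so the sketch needs no third-party library; the names, the 8-neighbour ring, and the tie-break (smallest code among neighbours) are all assumptions for illustration.

```python
from __future__ import annotations

import math

Cell = tuple[int, int]


def cell_of(x: float, y: float, size: float = 1.0) -> Cell:
    """Map a centroid to its grid cell (stand-in for an H3 cell index)."""
    return (math.floor(x / size), math.floor(y / size))


def ring1(cell: Cell) -> list[Cell]:
    """The eight cells surrounding `cell` (stand-in for an H3 1-ring)."""
    cx, cy = cell
    return [(cx + dx, cy + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]


def tag_features(
    centroids: dict[str, tuple[float, float]], owner: dict[Cell, str]
) -> dict[str, str | None]:
    """Exact cell lookup first; unresolved features try neighbouring cells."""
    tagged: dict[str, str | None] = {}
    for feature_id, (x, y) in centroids.items():
        cell = cell_of(x, y)
        code = owner.get(cell)
        if code is None:
            candidates = sorted({owner[n] for n in ring1(cell) if n in owner})
            code = candidates[0] if candidates else None
        tagged[feature_id] = code
    return tagged


owner = {(0, 0): "NP01", (1, 0): "NP02"}
result = tag_features({"a": (0.5, 0.5), "b": (2.2, 0.5), "c": (10.0, 10.0)}, owner)
```

Both lookups are dictionary probes, which is why this path stays memory-bounded compared with a full spatial join.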
parse_boundary_resolution ¶
Validate and narrow a config string to BoundaryResolution. Fails loud on typos.
preflight ¶
Pre-run sanity checks. Fail loud before doing any expensive work.
PreflightError ¶
Bases: RuntimeError
A required precondition is not satisfied.
check_writable_paths ¶
Verify every directory the run needs to write to is writable.
Catches read-only filesystems and permission errors before downloading PBFs or running quackosm. Tests by creating, writing, then deleting a tiny temp file in each candidate path.
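The create, write, delete probe could be sketched like this. The local `PreflightError` class is a stand-in for the one documented above, the function name is hypothetical, and creating a missing directory before probing is an assumption.

```python
import tempfile
from pathlib import Path


class PreflightError(RuntimeError):
    """Stand-in for the module's exception of the same name."""


def check_writable(paths: list[Path]) -> None:
    """Fail early if any target directory cannot accept a new file."""
    for path in paths:
        try:
            path.mkdir(parents=True, exist_ok=True)
            # The temp file is removed automatically when the context exits.
            with tempfile.NamedTemporaryFile(dir=path, prefix=".write-test-") as probe:
                probe.write(b"ok")
                probe.flush()
        except OSError as exc:
            raise PreflightError(f"{path} is not writable: {exc}") from exc
```

Running this before any download means a permission problem costs seconds rather than a wasted multi-gigabyte fetch.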
report ¶
Multi-source HTML report rendered from per-source metadata.json payloads.
html ¶
Multi-source HTML report renderer.
s3 ¶
Upload artifacts to S3 and return a public URL for HDX linking.
sources ¶
Per-source query builders.
A source knows how to expose a parquet read expression and a metadata block for a given category. The shared exporter then does the bbox/clip/select/write/zip steps in a uniform way.
CategorySkippedError ¶
Bases: RuntimeError
Raised by a source when a category is not applicable to it.
SourceRunner ¶
Bases: ABC
peek_snapshot_label ¶
Best-effort snapshot label without doing network work or running prepare.
Returns the label the runner would adopt in prepare(), or None if it can't be determined cheaply. Used by the exporter to short-circuit the run when every category is already uploaded for that label.
base ¶
Abstract source interface.
CategorySkippedError ¶
Bases: RuntimeError
Raised by a source when a category is not applicable to it.
SourceRunner ¶
Bases: ABC
peek_snapshot_label ¶
Best-effort snapshot label without doing network work or running prepare.
Returns the label the runner would adopt in prepare(), or None if it can't be determined cheaply. Used by the exporter to short-circuit the run when every category is already uploaded for that label.
sql ¶
SELECT/WHERE clause builders + materialise(); shared by both sources.
build_where_clause ¶
Combine bbox prune + boundary intersect + caller-supplied conditions.
bbox_cols="bbox" uses an upstream bbox struct (Overture); "geom"
derives the bbox from the geometry column (OSM cache).
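A sketch of how the three parts might be combined is shown below. The function name, parameter names, and the `geom` column name are hypothetical; the struct fields `bbox.xmin` etc. and the `ST_*` functions are assumed to be available in the queried table and the DuckDB spatial extension respectively.

```python
from collections.abc import Sequence


def build_where(
    bounds: tuple[float, float, float, float],
    boundary_wkt: str,
    bbox_cols: str = "bbox",
    extra: Sequence[str] = (),
    geom_col: str = "geom",
) -> str:
    """Cheap bbox prune first, then exact intersect, then caller conditions."""
    xmin, ymin, xmax, ymax = bounds
    if bbox_cols == "bbox":
        lo_x, hi_x, lo_y, hi_y = "bbox.xmin", "bbox.xmax", "bbox.ymin", "bbox.ymax"
    elif bbox_cols == "geom":
        lo_x, hi_x = f"ST_XMin({geom_col})", f"ST_XMax({geom_col})"
        lo_y, hi_y = f"ST_YMin({geom_col})", f"ST_YMax({geom_col})"
    else:
        raise ValueError(f"unknown bbox_cols: {bbox_cols!r}")
    prune = f"{lo_x} <= {xmax} AND {hi_x} >= {xmin} AND {lo_y} <= {ymax} AND {hi_y} >= {ymin}"
    wkt = boundary_wkt.replace("'", "''")
    clauses = [
        f"({prune})",
        f"ST_Intersects({geom_col}, ST_GeomFromText('{wkt}'))",
        *[f"({condition})" for condition in extra],
    ]
    return "WHERE " + " AND ".join(clauses)
```

Placing the bbox comparison first lets the engine discard most rows with simple numeric comparisons before evaluating the exact geometry test.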
state ¶
Per-(country, source) resume state, atomic-write JSON.
A run keeps a single .state.json per (output_dir, iso3, source) recording,
for each category, when the local build finished and when the HDX upload
completed. With output.resume enabled the exporter consults this to
skip already-finished work after a partial run, and HDX rate-limit storms
become recoverable without rebuilding zips.
State is keyed by category slug. A snapshot label mismatch (different PBF) is treated as a miss so a fresh snapshot always rebuilds.
StateStore ¶
Read/write the per-(iso3, source) resume state JSON, atomically.
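Atomic JSON writes are commonly done by writing to a temporary file in the same directory and renaming it over the target. The sketch below shows that pattern together with a snapshot-mismatch check; the JSON layout (`snapshot`, `categories`, `built_at`, `uploaded_at`) and the function names are hypothetical, not the store's actual schema.

```python
import contextlib
import json
import os
import tempfile
from pathlib import Path
from typing import Any


def write_state_atomic(path: Path, state: dict[str, Any]) -> None:
    """Write JSON so readers see either the old file or the new one, never a partial."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(state, handle, indent=2, sort_keys=True)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_name, path)  # atomic rename within one filesystem
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise


def is_done(state: dict[str, Any], slug: str, snapshot: str) -> bool:
    """A different snapshot label counts as a miss, forcing a rebuild."""
    if state.get("snapshot") != snapshot:
        return False
    entry = state.get("categories", {}).get(slug, {})
    return bool(entry.get("built_at") and entry.get("uploaded_at"))
```

The temporary file lives next to the target so the final rename never crosses a filesystem boundary.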
system ¶
Defaults for thread count and DuckDB memory limit, derived from psutil.
total_memory_gb ¶
Return effective memory in GB.
OEX_MEMORY_GB env var overrides psutil (use this inside Docker where --memory sets the container limit but psutil reads the host RAM).
adaptive_parallel_resources ¶
Compute (parallel_workers, memory_gb_per_worker) scaled to total system RAM.
Always returns 1 worker. DuckDB's intra-query pipeline engine parallelises every operation (joins, scans, aggregations) across all CPU cores within one session. Concurrent sessions split the RAM budget with zero cross-session coordination and OOM-kill each other on large countries (BRA, IND, CHN).
Memory: 60% of total RAM, DuckDB's recommended safe fraction for a single session. Leaves headroom for GDAL write allocations, string heaps, and spatial index structures that bypass the buffer manager.
Uses total memory (cgroup-aware on Linux >= 5.0, so a Docker container with --memory set reports the container limit here).
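The override and the sizing rule can be sketched together. To stay dependency-free, the system probe is passed in as a callable instead of calling psutil directly; that indirection and the function signatures are assumptions for illustration.

```python
import os
from typing import Callable


def total_memory_gb(probe: Callable[[], float]) -> float:
    """OEX_MEMORY_GB, when set, takes precedence over the system probe."""
    override = os.environ.get("OEX_MEMORY_GB")
    if override:
        return float(override)
    return probe()


def adaptive_parallel_resources(total_gb: float, fraction: float = 0.6) -> tuple[int, float]:
    """Always one worker; that worker gets a fixed fraction of total memory."""
    return 1, total_gb * fraction
```

For example, with `OEX_MEMORY_GB=16` the single worker would be given roughly 9.6 GB regardless of what the probe reports.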
translit ¶
DuckDB-side transliteration to Latin via unidecode.
engine ¶
Add Latin display columns to a materialised table via unidecode.
writers ¶
GIS format writers (gpkg, shp, geojson) over materialised DuckDB tables.
zip_bundle ¶
Per-format zip bundles with README, config snapshot, and optional metadata.