Custom categories¶

A category is one logical dataset. Each one is written out as a zip per output format (gpkg, shp, geojson, kml). You write its SELECT and WHERE in plain SQL.

Anatomy¶

- name: Buildings
  formats: [gpkg, shp, kml]         # optional override of output.formats
  hdx:
    title: Buildings of Nepal
    notes: ...
    tags: [buildings, geodata]
    license: ODbL 1.0
  overture:
    enabled: true
    theme: buildings
    feature_type: building
    select:
      - id
      - names.primary AS name
      - height
      - num_floors
    where: []
  osm:
    enabled: true
    filter:
      building: true
    select:
      - feature_id AS id
      - tags['name'] AS name
      - tags['height'] AS height
    where: []

When both overture.enabled and osm.enabled are true, both sources contribute resources to the same HDX dataset.

Mental model¶

Field	Where it runs	Purpose
`osm.filter`	quackosm (PBF -> parquet)	which OSM features enter cache
`osm.select`	DuckDB SQL on cached parquet	columns to keep
`osm.where`	DuckDB SQL on cached parquet	extra row filter (rarely needed)
`overture.select`	DuckDB SQL on Overture S3 read	columns to keep
`overture.where`	DuckDB SQL on Overture S3 read	row filter

OSM source schema¶

quackosm produces parquet with three columns:

feature_id (varchar): OSM type + id, e.g. node/12345
tags (MAP<VARCHAR, VARCHAR>): all retained tags
geometry

tags['key'] returns the scalar value (or NULL when the tag is absent). See the OSM Map Features wiki for available keys.

osm.filter accepts the quackosm tag-filter shape:

osm:
  filter:
    building: true                          # any value
    highway: ["primary", "secondary"]       # only these values
    amenity: ["hospital", "clinic"]

Overture source schema¶

Overture data lives at s3://overturemaps-us-west-2/release/<release>/theme=<theme>/type=<feature_type>/.

Theme	Feature type	Notable columns
`addresses`	`address`	`id`, `country`, `postcode`, `street`, `number`, `unit`
`base`	`bathymetry`	`id`, `depth`
`base`	`infrastructure`	`id`, `names`, `subtype`, `class`
`base`	`land`	`id`, `names`, `subtype`, `class`
`base`	`land_cover`	`id`, `subtype`, `cartography.{min,max}_zoom`
`base`	`land_use`	`id`, `names`, `subtype`, `class`, `surface`
`base`	`water`	`id`, `names`, `subtype`, `class`, `is_salt`, `wikidata`
`buildings`	`building`	`id`, `names`, `class`, `subtype`, `height`, `num_floors`, `roof_*`
`buildings`	`building_part`	`id`, `height`, `num_floors`
`divisions`	`division`	`id`, `names`, `subtype`, `country`, `region`, `population`
`divisions`	`division_area`	`id`, `names`, `subtype`, `country`, `region`
`divisions`	`division_boundary`	`id`, `subtype`, `class`
`places`	`place`	`id`, `names`, `categories`, `addresses`, `phones`, `websites`
`transportation`	`connector`	`id`
`transportation`	`segment`	`id`, `names`, `class`, `subclass`, `subtype`, `road_surface`

Authoritative reference: overturemaps.org/schema.

Overture renames types across releases (e.g. boundary -> division_boundary in 2026-04-15.0). Pin source.overture.release if you depend on a specific schema.

Three ways to plug in a category set¶

Inline in the country YAML. Drop a categories: block; it replaces the bundled defaults wholesale.
Reference a separate file with categories_file: path/to/schema.yaml. That YAML must contain a top-level categories: list.
Both. categories_file loads the base set; an inline categories: block then overrides for that country.

Bundled schemas¶

File	What it produces
`configs/examples/hot-schema.yaml`	12 layers matching HOT's HDX exports
`configs/examples/overture-package-schema.yaml`	15 datasets, one per Overture theme/type
`configs/examples/custom-schema.yaml`	Minimal 2-category example

The bundled defaults at src/oex/defaults/base.yaml ship the eight-theme combo (Buildings, Roads, Hospitals, Schools, Rivers, Land Use, Transportation Hubs, Settlements) drawn from both sources.

Optional per-category features¶

- name: Buildings
  transliterate:
    - target: name_latin
      source: name
      prefer: name_en

transliterate adds a Latin-script display column. When prefer is non-null it is used verbatim; otherwise source is transliterated via unidecode.

Optional run-level features¶

output:
  report:
    enabled: true            # writes report.html per category and (if hdx.push)
                             # attaches it as the dataset's interactive view

source:
  pcodes:
    enabled: true            # adds adm{N}_pcode and adm{N}_name columns
                             # from fieldmaps.io edge-matched humanitarian

hdx:
  purge_existing_resources: true   # destructive: clears the dataset before
                                   # uploading fresh resources