Census Import - Quick Start
Import census data from a ZIP archive containing an XML file and source images.
Basic Use Case
The most common use case is importing units with linked source images and address data. Each unit represents an address/location in the census, and images are the scanned pages of the original census documents.
Minimal Example
Download the minimal example: minimal-import.zip
ZIP Structure
minimal-import.zip
├── census.xml          # Required: must be at root, exactly this name
└── images/
    ├── page_001.jpg
    ├── page_002.jpg
    └── page_003.jpg
XML Breakdown
The minimal XML consists of several sections. Each section is explained below with links to the full specification.
Root Element
xml
<?xml version="1.0" encoding="UTF-8"?>
<census_export>
  <census id="...">
    <!-- Census content -->
  </census>
  <units>
    <!-- Units listed here -->
  </units>
</census_export>
The root element must be <census_export>. Inside it:
- <census> contains metadata, source, specifications, and codebooks
- <units>, <subunits>, <entities> are at the root level (siblings of <census>)
How IDs Work
ID Mapping
All id attributes in the XML are temporary reference IDs used only during import. The system generates new database IDs for all created objects.
IDs are used to link elements together:
- <page id="page-001"> defines a page with ID "page-001"
- <page_ref id="page-001" /> references that page
- <field id="field-001" /> in a data value references a field definition
- <select id="entry-001"> references a codebook entry
You can use any string as an ID (UUIDs, numbers, descriptive names). They just need to be consistent within the XML file.
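Because every link depends on matching ID strings, it can help to check the file for dangling references before uploading. The following is a minimal sketch, not part of the importer: the function name `find_dangling_refs` is hypothetical, and the split between defining and referencing tags is inferred from the examples in this guide. A dangling `<select>` is not necessarily fatal, since select values can also be resolved by key.

```python
import xml.etree.ElementTree as ET


def find_dangling_refs(xml_text):
    """Return (tag, id) pairs for references that no element in the file defines."""
    root = ET.fromstring(xml_text)
    defined = set()
    referenced = []
    for parent in root.iter():
        for child in parent:
            ref_id = child.get("id")
            if ref_id is None:
                continue
            # <page_ref>, <select>, and a <field> inside <data_value> point at other elements;
            # every other element carrying an id is treated as a definition.
            is_ref = (
                child.tag in ("page_ref", "select")
                or (child.tag == "field" and parent.tag == "data_value")
            )
            if is_ref:
                referenced.append((child.tag, ref_id))
            else:
                defined.add(ref_id)
    return [(tag, rid) for tag, rid in referenced if rid not in defined]


SAMPLE = """<census_export>
  <census id="c1">
    <source id="s1"><pages><page id="page-001"/></pages></source>
  </census>
  <units>
    <unit id="unit-001">
      <pages><page_ref id="page-001"/><page_ref id="page-999"/></pages>
    </unit>
  </units>
</census_export>"""

print(find_dangling_refs(SAMPLE))  # [('page_ref', 'page-999')]
```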
Source & Pages
xml
<source id="00000000-0000-0000-0000-000000000002">
  <title>Minimal Test Source</title>
  <pages>
    <page id="page-001">
      <page_number>0</page_number>
      <image_file_name>page_001.jpg</image_file_name>
    </page>
    <!-- more pages... -->
  </pages>
</source>
| Element | Description |
|---|---|
| id | Reference ID for linking (any unique string) |
| page_number | Sequential number (0-based), determines display order |
| image_file_name | Path relative to images/ directory in ZIP (can include subdirectories) |
Image resolution: The image_file_name is joined with the images/ directory:
| image_file_name | Resolved path in ZIP |
|---|---|
| page_001.jpg | images/page_001.jpg |
| book1/scan_001.jpg | images/book1/scan_001.jpg |
| 2024/jan/page.jpg | images/2024/jan/page.jpg |
This allows organizing images into subdirectories within the images/ folder.
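The join described above can be reproduced locally to check that every referenced image is present in the archive. This is a sketch under assumptions: `resolve_image_path` and `missing_images` are hypothetical helper names, and the importer's own path handling may differ in edge cases.

```python
import posixpath


def resolve_image_path(image_file_name, images_dir="images"):
    # ZIP member names always use forward slashes, so posixpath is used on every platform.
    return posixpath.join(images_dir, image_file_name)


def missing_images(image_file_names, zip_member_names):
    """Return the image_file_name values whose resolved path is not in the archive."""
    members = set(zip_member_names)
    return [name for name in image_file_names if resolve_image_path(name) not in members]


print(resolve_image_path("book1/scan_001.jpg"))  # images/book1/scan_001.jpg
```

The member list for `missing_images` can come from `zipfile.ZipFile(path).namelist()`.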
Data Specifications
xml
<data_specifications>
  <data_specification id="spec-unit" level="unit">
    <name>minimal_unit</name>
    <title_expression>{{ address_street }} {{ address_number }}</title_expression>
    <fields>
      <!-- field definitions -->
    </fields>
  </data_specification>
</data_specifications>
| Attribute/Element | Description |
|---|---|
| level | Which level this spec applies to: unit, subunit, or subsubunit |
| title_expression | Jinja template for computing display titles from field values |
| fields | Field definitions for this level |
Field mapping behavior:
- Fields are matched to existing census fields by key
- If a field with the same key exists → maps to it
- If no matching field exists → creates a new field
Field Definitions
xml
<field id="field-address-street">
  <key>address_street</key>
  <key_name>Address street (select)</key_name>
  <order>0</order>
  <input_template>{"id": "census_codebook_entries_select_input", "label": "Address street", "select": {"freeform": false, "required": true, "parameters": ["codebook-streets", "Streets"]}}</input_template>
</field>
| Element | Description |
|---|---|
| id | Reference ID for data values to link to |
| key | Unique identifier within the level (used for matching existing fields) |
| key_name | Human-readable name |
| order | Display order (lower = first) |
| input_template | JSON defining the field type and configuration |
Input template types:
- Text: {"text": {"type": "tit_text"}, "label": "..."}
- Number: {"number": {"precision": 0}, "label": "..."}
- Codebook select: {"id": "census_codebook_entries_select_input", "select": {"parameters": ["codebook-id", "Codebook Name"]}}
Codebook Reference
In input_template, the parameters array contains ["codebook-id", "Codebook Name"]. The codebook ID here must match the id attribute of a <codebook> element. After import, this ID is automatically updated to the actual database ID.
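The ID rewrite can be pictured as a small JSON transformation. The sketch below is illustrative only: `remap_codebook_reference` and the shape of `id_map` (temporary ID to database ID) are assumptions, not the importer's actual code. IDs missing from the map are left as-is, which mirrors how references to already existing global codebooks are preserved.

```python
import json


def remap_codebook_reference(input_template_json, id_map):
    """Replace the temporary codebook ID in parameters[0] with its database ID."""
    template = json.loads(input_template_json)
    select = template.get("select")
    if isinstance(select, dict) and select.get("parameters"):
        temp_id = select["parameters"][0]
        # Unknown IDs stay untouched (for example a global codebook that already exists).
        select["parameters"][0] = id_map.get(temp_id, temp_id)
    return json.dumps(template)


template = '{"id": "census_codebook_entries_select_input", "select": {"parameters": ["codebook-streets", "Streets"]}}'
# 42 is a made-up database ID used only for illustration.
print(remap_codebook_reference(template, {"codebook-streets": 42}))
```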
Codebooks
xml
<codebooks>
  <codebook id="codebook-streets">
    <name>Streets</name>
    <description>Street codebook for minimal census</description>
    <entries>
      <entry id="entry-main-st">
        <key>1</key>
        <label>Main Street</label>
        <label_en>Main Street</label_en>
      </entry>
      <!-- more entries... -->
    </entries>
  </codebook>
</codebooks>
| Element | Description |
|---|---|
| id | Reference ID (used in field input_template and data value select) |
| name | Codebook name (used for matching existing codebooks) |
| entry.id | Reference ID for data values to link to |
| entry.key | Unique key within codebook (used for matching existing entries) |
| entry.label | Display label (primary language) |
| entry.label_en | Display label in English (optional) |
Codebook mapping behavior:
- Codebooks are matched by name within the target census
- If a codebook with the same name exists → maps to it, maps entries by key
- If no matching codebook exists → creates new codebook and entries
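The match-or-create rules above can be sketched with plain dictionaries. This is a simplified model, not the importer's implementation: `map_codebook` is a hypothetical name, and `existing` stands in for the target census's codebooks as a mapping from codebook name to `{entry key: label}`.

```python
def map_codebook(existing, incoming):
    """Apply match-or-create to one incoming codebook and report what was created."""
    created = []
    entries = existing.get(incoming["name"])
    if entries is None:
        # No codebook with this name yet: create it.
        entries = existing[incoming["name"]] = {}
        created.append(("codebook", incoming["name"]))
    for key, label in incoming["entries"].items():
        # Entries are matched by key; only unknown keys are created.
        if key not in entries:
            entries[key] = label
            created.append(("entry", key))
    return created


existing = {"Streets": {"1": "Main Street"}}
incoming = {"name": "Streets", "entries": {"1": "Main Street", "2": "Oak Avenue"}}
print(map_codebook(existing, incoming))  # [('entry', '2')]
```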
Global Codebooks
Global codebooks are shared across all censuses (they have no census assigned). Examples: Gender, Marital Status.
- Global codebooks don't need to be included in the XML
- If a field references a global codebook ID that already exists in the database, that reference is preserved
- Only census-specific codebooks need to be exported/imported
Units
xml
<units>
  <unit id="unit-001">
    <title>Main Street, 10</title>
    <pages>
      <page_ref id="page-001" />
    </pages>
    <data_values>
      <!-- data values -->
    </data_values>
  </unit>
</units>
| Element | Description |
|---|---|
| id | Reference ID (for subunits/entities to reference) |
| title | Display title (can be auto-computed from title_expression) |
| pages | Links to source pages via <page_ref id="..." /> |
| data_values | Field values for this unit |
Title Computation
After import, extract_data() and update_title() are called on each unit. If a title_expression is defined, the title is recomputed from the data values.
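To preview roughly what a recomputed title might look like, the placeholder substitution can be approximated without Jinja. This is a deliberately reduced stand-in: it handles only bare `{{ name }}` placeholders (no filters or logic), `render_title` is a hypothetical name, and it assumes that a select field contributes its label to the title, which the source does not state.

```python
import re

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_title(title_expression, values):
    # Stand-in for Jinja: substitute each {{ key }} with its value, or "" if the key is missing.
    rendered = PLACEHOLDER.sub(lambda m: str(values.get(m.group(1), "")), title_expression)
    return rendered.strip()


print(render_title(
    "{{ address_street }} {{ address_number }}",
    {"address_street": "Main Street", "address_number": "10"},
))  # Main Street 10
```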
Data Values
xml
<data_values>
  <!-- Text value -->
  <data_value id="dv-001-number">
    <field id="field-address-number" />
    <text>10</text>
  </data_value>
  <!-- Codebook select value -->
  <data_value id="dv-001-street">
    <field id="field-address-street" />
    <select id="entry-main-st">
      <key>1</key>
      <label>Main Street</label>
    </select>
  </data_value>
</data_values>
| Element | Description |
|---|---|
| field id="..." | References the field definition this value belongs to |
| text | Text value |
| select id="..." | References a codebook entry (with fallback key/label) |
Select value resolution:
- First tries to find entry by id in the ID map
- If not found, falls back to looking up by key in the field's codebook
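The two-step lookup can be sketched as follows. The names `resolve_select`, `id_map`, and `entries_by_key` are illustrative assumptions, and the returned values stand in for whatever database objects the importer actually uses.

```python
def resolve_select(select_id, fallback_key, id_map, entries_by_key):
    """Resolve a <select> value: ID map first, then the key in the field's codebook."""
    if select_id in id_map:
        return id_map[select_id]
    # Fallback: look the entry up by its <key> in the field's codebook.
    return entries_by_key.get(fallback_key)


id_map = {"entry-main-st": "db-entry-101"}   # temporary ID -> imported entry (made-up values)
entries_by_key = {"1": "db-entry-101", "2": "db-entry-102"}

print(resolve_select("entry-main-st", "1", id_map, entries_by_key))  # db-entry-101
print(resolve_select("entry-unknown", "2", id_map, entries_by_key))  # db-entry-102
```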
Complete Minimal XML
xml
<?xml version="1.0" encoding="UTF-8"?>
<census_export>
  <census id="00000000-0000-0000-0000-000000000001">
    <title>Minimal Test Census</title>
    <source id="00000000-0000-0000-0000-000000000002">
      <title>Minimal Test Source</title>
      <pages>
        <page id="page-001">
          <page_number>0</page_number>
          <image_file_name>page_001.jpg</image_file_name>
        </page>
        <page id="page-002">
          <page_number>1</page_number>
          <image_file_name>page_002.jpg</image_file_name>
        </page>
        <page id="page-003">
          <page_number>2</page_number>
          <image_file_name>page_003.jpg</image_file_name>
        </page>
      </pages>
    </source>
    <data_specifications>
      <data_specification id="spec-unit" level="unit">
        <name>minimal_unit</name>
        <title_expression>{{ address_street }} {{ address_number }}</title_expression>
        <fields>
          <field id="field-address-street">
            <key>address_street</key>
            <key_name>Address street (select)</key_name>
            <order>0</order>
            <input_template>{"id": "census_codebook_entries_select_input", "label": "Address street", "select": {"freeform": false, "required": true, "parameters": ["codebook-streets", "Streets"]}}</input_template>
          </field>
          <field id="field-address-number">
            <key>address_number</key>
            <key_name>Address number (text)</key_name>
            <order>1</order>
            <input_template>{"id": "address_number", "text": {"type": "tit_text"}, "label": "Address number"}</input_template>
          </field>
        </fields>
      </data_specification>
    </data_specifications>
    <codebooks>
      <codebook id="codebook-streets">
        <name>Streets</name>
        <description>Street codebook for minimal census</description>
        <entries>
          <entry id="entry-main-st">
            <key>1</key>
            <label>Main Street</label>
          </entry>
          <entry id="entry-oak-ave">
            <key>2</key>
            <label>Oak Avenue</label>
          </entry>
          <entry id="entry-park-rd">
            <key>3</key>
            <label>Park Road</label>
          </entry>
        </entries>
      </codebook>
    </codebooks>
  </census>
  <units>
    <unit id="unit-001">
      <title>Main Street, 10</title>
      <pages>
        <page_ref id="page-001" />
      </pages>
      <data_values>
        <data_value id="dv-001-street">
          <field id="field-address-street" />
          <select id="entry-main-st">
            <key>1</key>
            <label>Main Street</label>
          </select>
        </data_value>
        <data_value id="dv-001-number">
          <field id="field-address-number" />
          <text>10</text>
        </data_value>
      </data_values>
    </unit>
    <unit id="unit-002">
      <title>Oak Avenue, 25</title>
      <pages>
        <page_ref id="page-002" />
      </pages>
      <data_values>
        <data_value id="dv-002-street">
          <field id="field-address-street" />
          <select id="entry-oak-ave">
            <key>2</key>
            <label>Oak Avenue</label>
          </select>
        </data_value>
        <data_value id="dv-002-number">
          <field id="field-address-number" />
          <text>25</text>
        </data_value>
      </data_values>
    </unit>
    <unit id="unit-003">
      <title>Park Road, 5</title>
      <pages>
        <page_ref id="page-003" />
      </pages>
      <data_values>
        <data_value id="dv-003-street">
          <field id="field-address-street" />
          <select id="entry-park-rd">
            <key>3</key>
            <label>Park Road</label>
          </select>
        </data_value>
        <data_value id="dv-003-number">
          <field id="field-address-number" />
          <text>5</text>
        </data_value>
      </data_values>
    </unit>
  </units>
</census_export>
How to Import
- Create a census in the admin interface (Censuses → New Census)
- Upload the ZIP via the source file uploader
- The system detects census.xml and runs the XML import instead of normal page extraction
- Verify the imported data in the census viewer
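Before the upload step, the archive has to be assembled with census.xml at the root and the scans under images/. One possible way to do that with Python's standard `zipfile` module is sketched below; `build_import_zip` is a hypothetical helper, and `images` is assumed to map each image_file_name to its raw bytes.

```python
import io
import zipfile


def build_import_zip(census_xml, images):
    """Return the bytes of an import ZIP with census.xml at the root and images/ below it."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # The XML must sit at the archive root under exactly this name.
        archive.writestr("census.xml", census_xml)
        for name, data in images.items():
            # image_file_name values are relative to the images/ directory.
            archive.writestr(f"images/{name}", data)
    return buffer.getvalue()


zip_bytes = build_import_zip("<census_export/>", {"page_001.jpg": b"placeholder"})
print(zipfile.ZipFile(io.BytesIO(zip_bytes)).namelist())
```

Writing `zip_bytes` to a file such as minimal-import.zip gives an archive with the layout shown under ZIP Structure.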
What Gets Imported
| Component | Behavior |
|---|---|
| Data Specifications | Maps to existing fields by key, creates new if missing |
| Codebooks | Maps existing by name, creates new if missing |
| Codebook Entries | Maps existing by key, creates new if missing |
| Source Pages | Creates pages with images from images/ directory |
| Units | Creates with linked pages and data values |
| Subunits | Creates with parent unit reference |
| Entities | Creates with parent relationships |
Optional Sections
All parts of the import are optional. You can import:
- Just pages (source images only)
- Pages + units (with manual data entry later)
- Full structure with specifications, codebooks, and data values
Next Steps
For the complete XML specification including subunits, entities, and all data value types, see the XML Reference.