Census Import - Quick Start

Import census data from a ZIP archive containing an XML file and source images.

Basic Use Case

The most common use case is importing units with linked source images and address data. Each unit represents an address/location in the census, and images are the scanned pages of the original census documents.

Minimal Example

Download the minimal example: minimal-import.zip

ZIP Structure

minimal-import.zip
├── census.xml          # Required: must be at root, exactly this name
└── images/
    ├── page_001.jpg
    ├── page_002.jpg
    └── page_003.jpg
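
The layout above can be produced with Python's standard zipfile module. This is only a sketch: the helper name build_import_zip and the placeholder contents are illustrative assumptions, not part of the import tooling.

```python
import io
import zipfile


def build_import_zip(census_xml: bytes, images: dict[str, bytes]) -> bytes:
    """Pack census.xml at the archive root and every image under images/."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # The XML must sit at the root under exactly this name.
        archive.writestr("census.xml", census_xml)
        for relative_name, data in images.items():
            # ZIP member names use forward slashes on every platform.
            archive.writestr(f"images/{relative_name}", data)
    return buffer.getvalue()


# Placeholder bytes stand in for a real XML document and real scans.
payload = build_import_zip(
    b"<census_export/>",
    {"page_001.jpg": b"placeholder", "page_002.jpg": b"placeholder"},
)
names = zipfile.ZipFile(io.BytesIO(payload)).namelist()
print(names)
```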

XML Breakdown

The minimal XML consists of several sections. Each section is explained below with links to the full specification.

Root Element

xml
<?xml version="1.0" encoding="UTF-8"?>
<census_export>
  <census id="...">
    <!-- Census content -->
  </census>
  <units>
    <!-- Units listed here -->
  </units>
</census_export>

The root element must be <census_export>. Inside it:

  • <census> contains metadata, source, specifications, and codebooks
  • <units>, <subunits>, <entities> are at the root level (siblings of <census>)
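
A quick way to catch layout mistakes before uploading is to parse the file and check the two rules above. The following is a minimal sketch using the standard library; check_layout is a hypothetical helper, and the importer's own validation may differ.

```python
import xml.etree.ElementTree as ET

# Elements that belong next to <census>, not inside it.
ROOT_LEVEL = ("units", "subunits", "entities")


def check_layout(xml_text: str) -> list[str]:
    """Return human-readable problems found in the top-level structure."""
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag != "census_export":
        problems.append(f"root element is <{root.tag}>, expected <census_export>")
    census = root.find("census")
    if census is not None:
        for tag in ROOT_LEVEL:
            if census.find(tag) is not None:
                problems.append(f"<{tag}> is nested inside <census>; it should be a sibling")
    return problems


good = "<census_export><census id='c'/><units/></census_export>"
bad = "<census_export><census id='c'><units/></census></census_export>"
print(check_layout(good))
print(check_layout(bad))
```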

How IDs Work

ID Mapping

All id attributes in the XML are temporary reference IDs used only during import. The system generates new database IDs for all created objects.

IDs are used to link elements together:

  • <page id="page-001"> defines a page with ID "page-001"
  • <page_ref id="page-001" /> references that page
  • <field id="field-001" /> in a data value references a field definition
  • <select id="entry-001"> references a codebook entry

You can use any string as an ID (UUIDs, numbers, descriptive names). IDs only need to be unique and used consistently within the XML file.
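
Because references are resolved purely by matching strings, a typo in an ID silently breaks a link. The sketch below collects the defined IDs and reports references that point nowhere. The function name find_dangling_refs is an illustrative assumption, and the element paths are taken from the examples in this guide.

```python
import xml.etree.ElementTree as ET


def find_dangling_refs(root: ET.Element) -> list[str]:
    """List references whose id has no matching definition in the file."""
    defined = {
        "page": {el.get("id") for el in root.iterfind(".//pages/page")},
        "field": {el.get("id") for el in root.iterfind(".//fields/field")},
        "entry": {el.get("id") for el in root.iterfind(".//entries/entry")},
    }
    references = [
        ("page", ".//page_ref"),
        ("field", ".//data_value/field"),
        # A dangling select is not necessarily fatal: the importer is
        # described as falling back to the entry key.
        ("entry", ".//data_value/select"),
    ]
    dangling = []
    for kind, path in references:
        for el in root.iterfind(path):
            if el.get("id") not in defined[kind]:
                dangling.append(f"{kind}:{el.get('id')}")
    return dangling


sample = ET.fromstring(
    "<census_export>"
    "<census id='c'><source id='s'><pages><page id='page-001'/></pages></source></census>"
    "<units>"
    "<unit id='u1'><pages><page_ref id='page-001'/></pages></unit>"
    "<unit id='u2'><pages><page_ref id='page-009'/></pages></unit>"
    "</units>"
    "</census_export>"
)
print(find_dangling_refs(sample))
```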


Source & Pages

Full specification →

xml
<source id="00000000-0000-0000-0000-000000000002">
  <title>Minimal Test Source</title>
  <pages>
    <page id="page-001">
      <page_number>0</page_number>
      <image_file_name>page_001.jpg</image_file_name>
    </page>
    <!-- more pages... -->
  </pages>
</source>

| Element | Description |
|---|---|
| id | Reference ID for linking (any unique string) |
| page_number | Sequential number (0-based), determines display order |
| image_file_name | Path relative to images/ directory in ZIP (can include subdirectories) |

Image path resolution: each image_file_name is joined with the images/ directory to locate the file inside the ZIP:

| image_file_name | Resolved path in ZIP |
|---|---|
| page_001.jpg | images/page_001.jpg |
| book1/scan_001.jpg | images/book1/scan_001.jpg |
| 2024/jan/page.jpg | images/2024/jan/page.jpg |

This allows organizing images into subdirectories within the images/ folder.
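
The join described above can be reproduced with posixpath, which always uses forward slashes, as ZIP member names do. This is a sketch of the rule as documented; resolve_image_path is a hypothetical name, not the importer's actual function.

```python
import posixpath


def resolve_image_path(image_file_name: str) -> str:
    """Join a page's image_file_name onto the images/ directory."""
    return posixpath.join("images", image_file_name)


for name in ("page_001.jpg", "book1/scan_001.jpg", "2024/jan/page.jpg"):
    print(name, "->", resolve_image_path(name))
```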


Data Specifications

Full specification →

xml
<data_specifications>
  <data_specification id="spec-unit" level="unit">
    <name>minimal_unit</name>
    <title_expression>{{ address_street }} {{ address_number }}</title_expression>
    <fields>
      <!-- field definitions -->
    </fields>
  </data_specification>
</data_specifications>

| Attribute/Element | Description |
|---|---|
| level | Which level this spec applies to: unit, subunit, or subsubunit |
| title_expression | Jinja template for computing display titles from field values |
| fields | Field definitions for this level |

Field mapping behavior:

  • Fields are matched to existing census fields by key
  • If a field with the same key exists → maps to it
  • If no matching field exists → creates a new field
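
The matching rule amounts to a get-or-create keyed on the field key. The sketch below models it with plain dictionaries; map_fields and the integer IDs are illustrative assumptions, since the real importer works against the database.

```python
def map_fields(
    existing_by_key: dict[str, int], incoming_keys: list[str]
) -> tuple[dict[str, int], list[str]]:
    """Map incoming field keys to IDs, creating an ID for each unknown key."""
    mapping: dict[str, int] = {}
    created: list[str] = []
    # Hypothetical ID allocation: continue after the highest existing ID.
    next_id = max(existing_by_key.values(), default=0) + 1
    for key in incoming_keys:
        if key in existing_by_key:
            mapping[key] = existing_by_key[key]  # same key: reuse the field
        else:
            mapping[key] = next_id  # no match: a new field is created
            created.append(key)
            next_id += 1
    return mapping, created


mapping, created = map_fields({"address_street": 7}, ["address_street", "address_number"])
print(mapping, created)
```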

Field Definitions

Full specification →

xml
<field id="field-address-street">
  <key>address_street</key>
  <key_name>Address street (select)</key_name>
  <order>0</order>
  <input_template>{"id": "census_codebook_entries_select_input", "label": "Address street", "select": {"freeform": false, "required": true, "parameters": ["codebook-streets", "Streets"]}}</input_template>
</field>

| Element | Description |
|---|---|
| id | Reference ID for data values to link to |
| key | Unique identifier within the level (used for matching existing fields) |
| key_name | Human-readable name |
| order | Display order (lower = first) |
| input_template | JSON defining the field type and configuration |

Input template types:

  • Text: {"text": {"type": "tit_text"}, "label": "..."}
  • Number: {"number": {"precision": 0}, "label": "..."}
  • Codebook select: {"id": "census_codebook_entries_select_input", "select": {"parameters": ["codebook-id", "Codebook Name"]}}

Codebook Reference

In input_template, the parameters array contains ["codebook-id", "Codebook Name"]. The codebook ID here must match the id attribute of a <codebook> element. After import, this ID is automatically updated to the actual database ID.
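
The ID rewrite can be pictured as a small JSON transformation. The sketch below is an assumption about how such a remap could look, not the importer's code; remap_codebook_id and the "db-42" value are hypothetical. IDs that are not in the map are left untouched.

```python
import json


def remap_codebook_id(input_template: str, id_map: dict[str, str]) -> str:
    """Replace the temporary codebook ID in parameters[0] with its mapped ID."""
    template = json.loads(input_template)
    parameters = template.get("select", {}).get("parameters")
    if parameters and parameters[0] in id_map:
        parameters[0] = id_map[parameters[0]]
    return json.dumps(template)


original = (
    '{"id": "census_codebook_entries_select_input", "label": "Address street", '
    '"select": {"freeform": false, "required": true, '
    '"parameters": ["codebook-streets", "Streets"]}}'
)
updated = json.loads(remap_codebook_id(original, {"codebook-streets": "db-42"}))
print(updated["select"]["parameters"])
```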


Codebooks

Full specification →

xml
<codebooks>
  <codebook id="codebook-streets">
    <name>Streets</name>
    <description>Street codebook for minimal census</description>
    <entries>
      <entry id="entry-main-st">
        <key>1</key>
        <label>Main Street</label>
        <label_en>Main Street</label_en>
      </entry>
      <!-- more entries... -->
    </entries>
  </codebook>
</codebooks>

| Element | Description |
|---|---|
| id | Reference ID (used in field input_template and data value select) |
| name | Codebook name (used for matching existing codebooks) |
| entry.id | Reference ID for data values to link to |
| entry.key | Unique key within codebook (used for matching existing entries) |
| entry.label | Display label (primary language) |
| entry.label_en | Display label in English (optional) |

Codebook mapping behavior:

  • Codebooks are matched by name within the target census
  • If a codebook with the same name exists → maps to it and maps its entries by key
  • If no matching codebook exists → creates new codebook and entries
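
Codebook matching works on two levels: the codebook by name, then each entry by key. The sketch below turns that into a list of planned actions so the outcome of an import can be previewed. plan_codebook_import is a hypothetical helper that models codebooks as plain dictionaries.

```python
def plan_codebook_import(
    existing: dict[str, set[str]], incoming: dict[str, list[str]]
) -> list[str]:
    """Describe what would be mapped or created.

    existing: codebook name -> entry keys already in the target census
    incoming: codebook name -> entry keys found in the XML
    """
    actions = []
    for name, entry_keys in incoming.items():
        if name in existing:
            actions.append(f"map codebook {name}")
            known_keys = existing[name]
        else:
            actions.append(f"create codebook {name}")
            known_keys = set()
        for key in entry_keys:
            verb = "map" if key in known_keys else "create"
            actions.append(f"{verb} entry {name}/{key}")
    return actions


plan = plan_codebook_import(
    existing={"Streets": {"1", "2"}},
    incoming={"Streets": ["1", "3"], "Districts": ["a"]},
)
print("\n".join(plan))
```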

Global Codebooks

Global codebooks are shared across all censuses (they have no census assigned). Examples: Gender, Marital Status.

  • Global codebooks don't need to be included in the XML
  • If a field references a global codebook ID that already exists in the database, that reference is preserved
  • Only census-specific codebooks need to be exported/imported

Units

Full specification →

xml
<units>
  <unit id="unit-001">
    <title>Main Street, 10</title>
    <pages>
      <page_ref id="page-001" />
    </pages>
    <data_values>
      <!-- data values -->
    </data_values>
  </unit>
</units>

| Element | Description |
|---|---|
| id | Reference ID (for subunits/entities to reference) |
| title | Display title (can be auto-computed from title_expression) |
| pages | Links to source pages via <page_ref id="..." /> |
| data_values | Field values for this unit |

Title Computation

After import, extract_data() and update_title() are called on each unit. If a title_expression is defined, the title is recomputed from the data values.
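
To preview what a recomputed title might look like, the placeholder substitution can be approximated with a regular expression. This is a deliberately simplified stand-in for Jinja, which supports far more than plain variables. render_title is a hypothetical helper, and it assumes that a select field contributes its label.

```python
import re


def render_title(expression: str, values: dict[str, str]) -> str:
    """Substitute {{ name }} placeholders; unknown names become empty strings."""

    def substitute(match: re.Match) -> str:
        return values.get(match.group(1), "")

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, expression).strip()


title = render_title(
    "{{ address_street }} {{ address_number }}",
    {"address_street": "Main Street", "address_number": "10"},
)
print(title)
```

Under these assumptions the result is "Main Street 10", without the comma used in the <title> of the example unit, so a title supplied in the XML may not survive recomputation unchanged.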


Data Values

Full specification →

xml
<data_values>
  <!-- Text value -->
  <data_value id="dv-001-number">
    <field id="field-address-number" />
    <text>10</text>
  </data_value>

  <!-- Codebook select value -->
  <data_value id="dv-001-street">
    <field id="field-address-street" />
    <select id="entry-main-st">
      <key>1</key>
      <label>Main Street</label>
    </select>
  </data_value>
</data_values>

| Element | Description |
|---|---|
| field id="..." | References the field definition this value belongs to |
| text | Text value |
| select id="..." | References a codebook entry (with fallback key/label) |

Select value resolution:

  1. First tries to find entry by id in the ID map
  2. If not found, falls back to looking up by key in the field's codebook
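
The two-step lookup can be sketched as follows. resolve_select and the integer database IDs are illustrative assumptions. Returning None when both steps fail is also an assumption, since this guide does not say what the importer does in that case.

```python
from typing import Optional


def resolve_select(
    select_id: str,
    fallback_key: str,
    id_map: dict[str, int],
    entries_by_key: dict[str, int],
) -> Optional[int]:
    """Resolve a <select> to an entry: by reference ID first, then by key."""
    if select_id in id_map:
        return id_map[select_id]
    # Fallback: look the key up in the codebook attached to the field.
    return entries_by_key.get(fallback_key)


id_map = {"entry-main-st": 101}
streets_by_key = {"1": 101, "2": 102}
print(resolve_select("entry-main-st", "1", id_map, streets_by_key))
print(resolve_select("entry-unknown", "2", id_map, streets_by_key))
```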

Complete Minimal XML

xml
<?xml version="1.0" encoding="UTF-8"?>
<census_export>
  <census id="00000000-0000-0000-0000-000000000001">
    <title>Minimal Test Census</title>
    <source id="00000000-0000-0000-0000-000000000002">
      <title>Minimal Test Source</title>
      <pages>
        <page id="page-001">
          <page_number>0</page_number>
          <image_file_name>page_001.jpg</image_file_name>
        </page>
        <page id="page-002">
          <page_number>1</page_number>
          <image_file_name>page_002.jpg</image_file_name>
        </page>
        <page id="page-003">
          <page_number>2</page_number>
          <image_file_name>page_003.jpg</image_file_name>
        </page>
      </pages>
    </source>
    <data_specifications>
      <data_specification id="spec-unit" level="unit">
        <name>minimal_unit</name>
        <title_expression>{{ address_street }} {{ address_number }}</title_expression>
        <fields>
          <field id="field-address-street">
            <key>address_street</key>
            <key_name>Address street (select)</key_name>
            <order>0</order>
            <input_template>{"id": "census_codebook_entries_select_input", "label": "Address street", "select": {"freeform": false, "required": true, "parameters": ["codebook-streets", "Streets"]}}</input_template>
          </field>
          <field id="field-address-number">
            <key>address_number</key>
            <key_name>Address number (text)</key_name>
            <order>1</order>
            <input_template>{"id": "address_number", "text": {"type": "tit_text"}, "label": "Address number"}</input_template>
          </field>
        </fields>
      </data_specification>
    </data_specifications>
    <codebooks>
      <codebook id="codebook-streets">
        <name>Streets</name>
        <description>Street codebook for minimal census</description>
        <entries>
          <entry id="entry-main-st">
            <key>1</key>
            <label>Main Street</label>
          </entry>
          <entry id="entry-oak-ave">
            <key>2</key>
            <label>Oak Avenue</label>
          </entry>
          <entry id="entry-park-rd">
            <key>3</key>
            <label>Park Road</label>
          </entry>
        </entries>
      </codebook>
    </codebooks>
  </census>
  <units>
    <unit id="unit-001">
      <title>Main Street, 10</title>
      <pages>
        <page_ref id="page-001" />
      </pages>
      <data_values>
        <data_value id="dv-001-street">
          <field id="field-address-street" />
          <select id="entry-main-st">
            <key>1</key>
            <label>Main Street</label>
          </select>
        </data_value>
        <data_value id="dv-001-number">
          <field id="field-address-number" />
          <text>10</text>
        </data_value>
      </data_values>
    </unit>
    <unit id="unit-002">
      <title>Oak Avenue, 25</title>
      <pages>
        <page_ref id="page-002" />
      </pages>
      <data_values>
        <data_value id="dv-002-street">
          <field id="field-address-street" />
          <select id="entry-oak-ave">
            <key>2</key>
            <label>Oak Avenue</label>
          </select>
        </data_value>
        <data_value id="dv-002-number">
          <field id="field-address-number" />
          <text>25</text>
        </data_value>
      </data_values>
    </unit>
    <unit id="unit-003">
      <title>Park Road, 5</title>
      <pages>
        <page_ref id="page-003" />
      </pages>
      <data_values>
        <data_value id="dv-003-street">
          <field id="field-address-street" />
          <select id="entry-park-rd">
            <key>3</key>
            <label>Park Road</label>
          </select>
        </data_value>
        <data_value id="dv-003-number">
          <field id="field-address-number" />
          <text>5</text>
        </data_value>
      </data_values>
    </unit>
  </units>
</census_export>
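
Writing many <unit> elements by hand is error-prone, so they are usually generated. The sketch below builds one unit in the shape shown above with xml.etree.ElementTree. build_unit and its parameters are hypothetical; the field IDs are the ones from the minimal example.

```python
import xml.etree.ElementTree as ET


def build_unit(
    unit_id: str, page_id: str, entry_id: str, street_key: str,
    street_label: str, number: str,
) -> ET.Element:
    """Build a <unit> with one page reference, a select value and a text value."""
    unit = ET.Element("unit", id=unit_id)
    ET.SubElement(unit, "title").text = f"{street_label}, {number}"
    pages = ET.SubElement(unit, "pages")
    ET.SubElement(pages, "page_ref", id=page_id)
    values = ET.SubElement(unit, "data_values")

    street = ET.SubElement(values, "data_value", id=f"dv-{unit_id}-street")
    ET.SubElement(street, "field", id="field-address-street")
    select = ET.SubElement(street, "select", id=entry_id)
    ET.SubElement(select, "key").text = street_key  # fallback key
    ET.SubElement(select, "label").text = street_label  # fallback label

    house = ET.SubElement(values, "data_value", id=f"dv-{unit_id}-number")
    ET.SubElement(house, "field", id="field-address-number")
    ET.SubElement(house, "text").text = number
    return unit


unit = build_unit("unit-004", "page-001", "entry-main-st", "1", "Main Street", "12")
print(ET.tostring(unit, encoding="unicode"))
```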

How to Import

  1. Create a census in the admin interface (Censuses → New Census)
  2. Upload the ZIP via the source file uploader
  3. The system detects census.xml and runs the XML import instead of normal page extraction
  4. Verify the imported data in the census viewer
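
Before step 2 it can save time to check the archive locally. The sketch below verifies that census.xml sits at the root and that every image_file_name resolves to a file under images/. preflight is a hypothetical helper built on the standard library and is not part of the import system.

```python
import io
import posixpath
import xml.etree.ElementTree as ET
import zipfile


def preflight(zip_bytes: bytes) -> list[str]:
    """Return problems that would likely make the import fail or lose images."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        names = set(archive.namelist())
        if "census.xml" not in names:
            return ["census.xml is missing from the archive root"]
        root = ET.fromstring(archive.read("census.xml"))
    problems = []
    for element in root.iter("image_file_name"):
        member = posixpath.join("images", (element.text or "").strip())
        if member not in names:
            problems.append(f"missing image: {member}")
    return problems


def make_zip(members: dict[str, bytes]) -> bytes:
    """Small helper for the demonstration below."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, data in members.items():
            archive.writestr(name, data)
    return buffer.getvalue()


demo_xml = (
    b"<census_export><census id='c'><source id='s'><pages>"
    b"<page id='p1'><image_file_name>page_001.jpg</image_file_name></page>"
    b"<page id='p2'><image_file_name>book1/scan_001.jpg</image_file_name></page>"
    b"</pages></source></census></census_export>"
)
report = preflight(make_zip({"census.xml": demo_xml, "images/page_001.jpg": b"x"}))
print(report)
```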

What Gets Imported

| Component | Behavior |
|---|---|
| Data Specifications | Maps to existing fields by key, creates new if missing |
| Codebooks | Maps existing by name, creates new if missing |
| Codebook Entries | Maps existing by key, creates new if missing |
| Source Pages | Creates pages with images from images/ directory |
| Units | Creates with linked pages and data values |
| Subunits | Creates with parent unit reference |
| Entities | Creates with parent relationships |

Optional Sections

Every section of the XML is optional; only the census.xml file itself is required. You can import:

  • Just pages (source images only)
  • Pages + units (with manual data entry later)
  • Full structure with specifications, codebooks, and data values

Next Steps

For the complete XML specification including subunits, entities, and all data value types, see the XML Reference.