XML Reference
Complete specification for the census import XML format.
Document Structure
```xml
<?xml version="1.0" encoding="UTF-8"?>
<census_export>
  <census id="...">
    <!-- Census metadata, source, data specs, codebooks -->
  </census>
  <units>
    <!-- Unit elements -->
  </units>
  <subunits>
    <!-- Subunit elements (optional) -->
  </subunits>
  <entities>
    <!-- Entity elements (optional) -->
  </entities>
</census_export>
```
Detection
The importer activates when a ZIP contains census.xml at its root. The filename must be exactly census.xml (case-sensitive).
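The detection rule comes down to a single membership test on the archive's entry names. This is a minimal sketch using Python's standard zipfile module. The function names is_census_archive and make_zip are hypothetical and are not part of the importer.

```python
import io
import zipfile


def is_census_archive(zip_bytes: bytes) -> bool:
    """Return True if the ZIP has an entry named exactly 'census.xml' at its root."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        # ZIP entry names use '/' separators, so a root-level file has no '/'
        # in its name. The string comparison is case-sensitive.
        return "census.xml" in zf.namelist()


def make_zip(names):
    """Build an in-memory ZIP with empty entries, for trying the check out."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, "")
    return buf.getvalue()


print(is_census_archive(make_zip(["census.xml", "images/page_001.jpg"])))  # True
print(is_census_archive(make_zip(["Census.xml"])))  # False: wrong case
print(is_census_archive(make_zip(["export/census.xml"])))  # False: not at root
```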
Census Element
The <census> element contains metadata and nested definitions.
```xml
<census id="unique-id">
  <sistory_id>22</sistory_id>
  <title>Ljubljana 1900</title>
  <title_en>Ljubljana 1900</title_en>
  <year>1900</year>
  <archived>False</archived>
  <source>...</source>
  <data_specifications>...</data_specifications>
  <codebooks>...</codebooks>
</census>
```
INFO
The id attribute is used only for internal reference mapping during import. New IDs are generated for all imported objects.
Source & Pages
Define source pages with linked images.
```xml
<source id="source-id">
  <title>Source Title</title>
  <pages>
    <page id="page-001">
      <page_number>0</page_number>
      <image_file_name>page_001.jpg</image_file_name>
    </page>
    <page id="page-002">
      <page_number>1</page_number>
      <image_file_name>subdir/page_002.jpg</image_file_name>
    </page>
  </pages>
</source>
```
Image Resolution
Images are resolved from the images/ directory in the ZIP. The image_file_name can include subdirectories:
| XML Value | Resolved Path |
|---|---|
| page_001.jpg | images/page_001.jpg |
| subdir/scan.jpg | images/subdir/scan.jpg |
| 2024/jan/page.jpg | images/2024/jan/page.jpg |
This allows organizing images into subdirectories within the images/ folder.
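The resolution rule in the table is a plain prefix join. The sketch below reproduces it. The function name is hypothetical, and the check that rejects values escaping images/ (for example ../census.xml) is an added assumption that this page does not document.

```python
import posixpath


def resolve_image_path(image_file_name: str) -> str:
    """Map an image_file_name value to its entry path inside the ZIP."""
    # ZIP paths always use '/', so posixpath is used instead of os.path.
    path = posixpath.normpath(posixpath.join("images", image_file_name))
    # Assumed safety check (not documented): refuse values that escape images/.
    if not path.startswith("images/"):
        raise ValueError(f"image path escapes images/: {image_file_name!r}")
    return path


for name in ["page_001.jpg", "subdir/scan.jpg", "2024/jan/page.jpg"]:
    print(name, "->", resolve_image_path(name))
```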
Page Fields
| Field | Required | Description |
|---|---|---|
| id | Yes | Reference ID for linking |
| page_number | Yes | Sequential page number (0-based) |
| image_file_name | No | Path relative to images/ directory |
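Because page_number is sequential and 0-based, a pre-flight check can confirm the numbers run 0, 1, 2, ... without gaps. This is a minimal sketch using xml.etree.ElementTree. Whether the importer itself rejects gaps is not stated here, so treat the ValueError as this sketch's own policy. The function name ordered_pages is hypothetical.

```python
import xml.etree.ElementTree as ET

SOURCE_XML = """<source id="source-id">
  <title>Source Title</title>
  <pages>
    <page id="page-002"><page_number>1</page_number></page>
    <page id="page-001">
      <page_number>0</page_number>
      <image_file_name>page_001.jpg</image_file_name>
    </page>
  </pages>
</source>"""


def ordered_pages(source_xml: str):
    """Return (page_number, id, image_file_name) tuples sorted by page_number.

    Raises ValueError unless the numbers run 0, 1, 2, ... without gaps.
    """
    source = ET.fromstring(source_xml)
    pages = sorted(
        (
            (int(p.findtext("page_number")), p.get("id"), p.findtext("image_file_name"))
            for p in source.findall("pages/page")
        ),
        key=lambda page: page[0],
    )
    numbers = [number for number, _, _ in pages]
    if numbers != list(range(len(numbers))):
        raise ValueError(f"page numbers are not sequential from 0: {numbers}")
    return pages


print(ordered_pages(SOURCE_XML))
# [(0, 'page-001', 'page_001.jpg'), (1, 'page-002', None)]
```

A missing image_file_name comes back as None, matching its optional status in the table.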
Data Specifications
Define field structures for each level (unit, subunit, entity).
```xml
<data_specifications>
  <data_specification id="spec-unit" level="unit">
    <name>census_unit_spec</name>
    <title_expression>{{ address_street }} {{ address_number }}</title_expression>
    <fields>
      <field id="field-001">
        <key>address_street</key>
        <key_name>Address Street</key_name>
        <order>0</order>
        <archived>False</archived>
        <input_template>{"id": "...", "select": {...}}</input_template>
      </field>
    </fields>
  </data_specification>
</data_specifications>
```
Specification Levels
| Level | Description |
|---|---|
| unit | Address/location level |
| subunit | Household level |
| subsubunit | Person/entity level |
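Reading a specification means grouping fields by the level attribute and ordering them by order. This sketch does that with xml.etree.ElementTree. It also reports each field's type by checking which type key its input_template JSON contains. The four type keys come from the input_template examples in this document, and the real importer may support more. The function name fields_by_level is hypothetical.

```python
import json
import xml.etree.ElementTree as ET

# Type keys taken from the input_template examples on this page; the real
# importer may support more field types than these four.
KNOWN_TYPES = ("text", "number", "check", "select")

SPEC_XML = """<data_specifications>
  <data_specification id="spec-unit" level="unit">
    <name>census_unit_spec</name>
    <fields>
      <field id="field-002">
        <key>address_number</key>
        <order>1</order>
        <input_template>{"id": "f2", "text": {"type": "tit_text"}, "label": "Number"}</input_template>
      </field>
      <field id="field-001">
        <key>address_street</key>
        <order>0</order>
        <input_template>{"id": "f1", "select": {"parameters": ["codebook-streets", "Streets"]}}</input_template>
      </field>
    </fields>
  </data_specification>
</data_specifications>"""


def fields_by_level(xml_text: str) -> dict:
    """Return {level: [(order, key, field_type), ...]} with fields sorted by order."""
    root = ET.fromstring(xml_text)
    result = {}
    for spec in root.findall("data_specification"):
        fields = []
        for field in spec.findall("fields/field"):
            template = json.loads(field.findtext("input_template", "{}"))
            # The field type is whichever known type key the template contains.
            field_type = next((t for t in KNOWN_TYPES if t in template), None)
            order = int(field.findtext("order", "0"))
            fields.append((order, field.findtext("key"), field_type))
        result[spec.get("level")] = sorted(fields, key=lambda f: f[0])
    return result


print(fields_by_level(SPEC_XML))
# {'unit': [(0, 'address_street', 'select'), (1, 'address_number', 'text')]}
```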
Title Expression
The title_expression uses Jinja-style templating to compute display titles:
```xml
<title_expression>{{ name }} {{ surname }}</title_expression>
```
Field keys are referenced with double-brace syntax, {{ key }}, where key is the value of the field's key element.
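For simple expressions like the ones above, the effect of the template can be approximated with a regular-expression substitution. This sketch is a stand-in and not Jinja itself. It handles only plain {{ key }} placeholders, with no filters or conditionals. Rendering missing keys as empty strings and collapsing the leftover whitespace are assumptions. For full Jinja semantics, jinja2's Template(expression).render(**values) is the closer match.

```python
import re

# Matches {{ key }} with optional spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_title(expression: str, values: dict) -> str:
    """Replace each {{ key }} with the matching field value (missing keys -> '')."""
    rendered = PLACEHOLDER.sub(lambda m: str(values.get(m.group(1), "")), expression)
    # Collapse runs of whitespace left behind by empty values.
    return " ".join(rendered.split())


print(render_title("{{ address_street }} {{ address_number }}",
                   {"address_street": "Main Street", "address_number": 10}))
# Main Street 10
print(render_title("{{ name }} {{ surname }}", {"surname": "Smith"}))
# Smith
```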
Field Input Templates
The input_template is a JSON object defining the field type:
Text field:
```json
{"id": "field_id", "text": {"type": "tit_text"}, "label": "Field Label"}
```
Number field:
```json
{"id": "field_id", "label": "Age", "number": {"precision": 0, "unlimited": true}}
```
Checkbox field:
```json
{"id": "field_id", "check": {"value": false}, "label": "Is Active"}
```
Codebook select field:
```json
{
  "id": "census_codebook_entries_select_input",
  "label": "Street",
  "select": {
    "freeform": false,
    "required": true,
    "parameters": ["codebook-id", "Codebook Name"]
  }
}
```
Codebooks
Define controlled vocabularies for select fields.
```xml
<codebooks>
  <codebook id="codebook-streets">
    <name>Streets</name>
    <description>Street names for this census</description>
    <archived>False</archived>
    <entries>
      <entry id="entry-001">
        <key>1</key>
        <label>Main Street</label>
        <label_en>Main Street</label_en>
        <archived>False</archived>
      </entry>
      <entry id="entry-002">
        <key>2</key>
        <label>Oak Avenue</label>
        <archived>False</archived>
      </entry>
    </entries>
  </codebook>
</codebooks>
```
Codebook Fields
| Field | Required | Description |
|---|---|---|
| id | Yes | Reference ID (used in field input_template and data value select) |
| name | Yes | Codebook name (used for matching existing codebooks) |
| description | No | Description text |
| archived | No | Whether codebook is archived |
Entry Fields
| Field | Required | Description |
|---|---|---|
| id | Yes | Reference ID for data values to link to |
| key | Yes | Unique key within codebook (used for matching existing entries) |
| label | Yes | Display label (primary language) |
| label_en | No | Display label in English |
| archived | No | Whether entry is archived |
Codebook Mapping
During import:
- Codebooks are matched by name to existing codebooks in the target census
- If no match, a new codebook is created
- Entries are matched by key within the codebook
- Codebook IDs in each field's input_template are updated automatically to point to the matched or newly created codebook
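The mapping steps above can be sketched in memory. Codebooks are matched by name, entries by key, and anything unmatched gets a fresh ID. The select field's first parameter is then rewritten to the mapped codebook ID, since the select example shows the codebook ID in that position. The data shapes here (the existing dict and the new_id callback) are hypothetical stand-ins for the database, and so are the function names. This is not the importer's actual code.

```python
import itertools
import json


def map_codebooks(xml_codebooks, existing, new_id):
    """Map XML codebook and entry ids to target ids.

    Codebooks are matched by name and entries by key; anything unmatched
    gets a fresh id from new_id(). `existing` is updated in place.
    """
    codebook_map, entry_map = {}, {}
    for cb in xml_codebooks:
        target = existing.get(cb["name"])
        if target is None:  # no codebook with this name yet: create one
            target = existing[cb["name"]] = {"id": new_id(), "entries": {}}
        codebook_map[cb["id"]] = target["id"]
        for entry in cb["entries"]:
            if entry["key"] not in target["entries"]:  # unknown key: create entry
                target["entries"][entry["key"]] = new_id()
            entry_map[entry["id"]] = target["entries"][entry["key"]]
    return codebook_map, entry_map


def rewrite_input_template(template_text, codebook_map):
    """Point a select field's first parameter (the codebook id) at the mapped id."""
    template = json.loads(template_text)
    params = template.get("select", {}).get("parameters")
    if params and params[0] in codebook_map:
        params[0] = codebook_map[params[0]]
    return json.dumps(template)


# Hypothetical target census: "Streets" already exists with entry key "1".
existing = {"Streets": {"id": 7, "entries": {"1": 70}}}
xml_codebooks = [{"id": "codebook-streets", "name": "Streets",
                  "entries": [{"id": "entry-001", "key": "1"},
                              {"id": "entry-002", "key": "2"}]}]
ids = itertools.count(100)
codebook_map, entry_map = map_codebooks(xml_codebooks, existing, lambda: next(ids))
print(codebook_map)  # {'codebook-streets': 7}
print(entry_map)     # {'entry-001': 70, 'entry-002': 100}
template = '{"label": "Street", "select": {"parameters": ["codebook-streets", "Streets"]}}'
print(rewrite_input_template(template, codebook_map))
# {"label": "Street", "select": {"parameters": [7, "Streets"]}}
```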
Global Codebooks
INFO
Global codebooks are shared across all censuses (they have census=NULL in the database). Examples include Gender, Marital Status, etc.
- Global codebooks do not need to be included in the XML
- If a field's input_template references a codebook ID that exists in the database (including global codebooks), that reference is preserved
- Only census-specific codebooks need to be exported and imported
- The import process checks if a referenced codebook ID exists before reporting it as missing
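The "missing codebook" check described above treats a reference as missing only if it appears neither in the XML nor in the database. This sketch assumes the database IDs (including global codebooks) are available as a set. The ID 3 for a global Gender codebook is invented for illustration, and so is the function name.

```python
import json


def missing_codebook_refs(templates, xml_codebook_ids, db_codebook_ids):
    """Codebook ids referenced by select templates but defined neither in the
    XML nor in the database (global codebooks count as present)."""
    missing = set()
    for text in templates:
        params = json.loads(text).get("select", {}).get("parameters") or []
        if params and params[0] not in xml_codebook_ids and params[0] not in db_codebook_ids:
            missing.add(params[0])
    return missing


templates = [
    '{"select": {"parameters": ["codebook-streets", "Streets"]}}',  # in the XML
    '{"select": {"parameters": [3, "Gender"]}}',  # hypothetical global codebook id
    '{"select": {"parameters": ["codebook-trades", "Trades"]}}',  # defined nowhere
    '{"text": {"type": "tit_text"}}',  # not a select field
]
print(missing_codebook_refs(templates, {"codebook-streets"}, {3}))
# {'codebook-trades'}
```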
Units
Top-level census records, typically representing addresses.
```xml
<units>
  <unit id="unit-001">
    <sistory_id>1</sistory_id>
    <title>Main Street, 10</title>
    <archived>False</archived>
    <pages>
      <page_ref id="page-001" />
      <page_ref id="page-002" />
    </pages>
    <data_values>
      <!-- Data values for this unit -->
    </data_values>
  </unit>
</units>
```
Unit Fields
| Field | Required | Description |
|---|---|---|
| id | Yes | Reference ID |
| sistory_id | No | Legacy ID (a new one is generated on import) |
| title | No | Display title (computed from title_expression if not set) |
| archived | No | Whether the unit is archived |
| pages | No | Linked source pages |
| data_values | No | Field values |
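Each page_ref id points at a page id declared under the census source. This sketch collects those links per unit and lists any references to undeclared pages. The function name is hypothetical. Note that it uses the explicit path units/unit, because subunits and entities also carry unit back-references.

```python
import xml.etree.ElementTree as ET

EXPORT_XML = """<census_export>
  <census id="c1">
    <source id="s1">
      <pages>
        <page id="page-001"><page_number>0</page_number></page>
        <page id="page-002"><page_number>1</page_number></page>
      </pages>
    </source>
  </census>
  <units>
    <unit id="unit-001">
      <pages><page_ref id="page-001" /><page_ref id="page-002" /></pages>
    </unit>
    <unit id="unit-002">
      <pages><page_ref id="page-009" /></pages>
    </unit>
  </units>
</census_export>"""


def unit_page_links(xml_text: str):
    """Return ({unit_id: [page ids]}, [(unit_id, unknown page id), ...])."""
    root = ET.fromstring(xml_text)
    known = {p.get("id") for p in root.findall("census/source/pages/page")}
    links, unknown = {}, []
    # Use the explicit path units/unit: subunits and entities also contain
    # <unit id="..."/> back-references, which root.iter("unit") would pick up.
    for unit in root.findall("units/unit"):
        page_ids = [ref.get("id") for ref in unit.findall("pages/page_ref")]
        links[unit.get("id")] = page_ids
        unknown += [(unit.get("id"), pid) for pid in page_ids if pid not in known]
    return links, unknown


links, unknown = unit_page_links(EXPORT_XML)
print(links)    # {'unit-001': ['page-001', 'page-002'], 'unit-002': ['page-009']}
print(unknown)  # [('unit-002', 'page-009')]
```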
Subunits
Second-level records, typically representing households within an address.
```xml
<subunits>
  <subunit id="subunit-001">
    <sistory_id>1</sistory_id>
    <title>Household 1</title>
    <archived>False</archived>
    <order>0</order>
    <unit id="unit-001" />
    <source_entity id="entity-001" />
    <pages>
      <page_ref id="page-001" />
    </pages>
    <data_values>...</data_values>
  </subunit>
</subunits>
```
Subunit-Specific Fields
| Field | Description |
|---|---|
| unit | Reference to parent unit (required) |
| source_entity | Reference to the "head of household" entity |
| order | Sort order within unit |
Entities
Third-level records, typically representing individual persons.
```xml
<entities>
  <entity id="entity-001">
    <sistory_id>1</sistory_id>
    <title>John Smith</title>
    <archived>False</archived>
    <order>0</order>
    <unit id="unit-001" />
    <subunit id="subunit-001" />
    <parent id="entity-000" />
    <parent_relationship>1</parent_relationship>
    <parent_other_relationships>nephew</parent_other_relationships>
    <pages>
      <page_ref id="page-001" />
    </pages>
    <data_values>...</data_values>
  </entity>
</entities>
```
Entity-Specific Fields
| Field | Description |
|---|---|
| unit | Reference to parent unit |
| subunit | Reference to parent subunit |
| parent | Reference to parent entity (for family trees) |
| parent_relationship | Enum value for relationship type |
| parent_other_relationships | Free text for other relationships |
| order | Sort order within subunit |
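Taken together, subunit, order, and parent let a consumer list a household in order and walk a person's family line. This sketch works on entities already parsed into plain dicts. That shape is hypothetical, as are the function names. The cycle guard is a defensive assumption, not documented importer behaviour.

```python
from collections import defaultdict


def household_members(entities):
    """Group entity ids by subunit, each group sorted by order."""
    groups = defaultdict(list)
    for entity in entities:
        groups[entity["subunit"]].append(entity)
    return {
        subunit: [e["id"] for e in sorted(members, key=lambda e: e["order"])]
        for subunit, members in groups.items()
    }


def ancestors(entity_id, parent_of):
    """Follow parent references upward until a root; stop early on a cycle."""
    chain, seen = [], {entity_id}
    current = parent_of.get(entity_id)
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        current = parent_of.get(current)
    return chain


# Hypothetical entities already parsed from <entity> elements.
entities = [
    {"id": "entity-002", "subunit": "subunit-001", "order": 1, "parent": "entity-001"},
    {"id": "entity-001", "subunit": "subunit-001", "order": 0, "parent": None},
    {"id": "entity-003", "subunit": "subunit-001", "order": 2, "parent": "entity-002"},
]
parent_of = {e["id"]: e["parent"] for e in entities}
print(household_members(entities))  # {'subunit-001': ['entity-001', 'entity-002', 'entity-003']}
print(ancestors("entity-003", parent_of))  # ['entity-002', 'entity-001']
```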
Data Values
Data values store field values for units, subunits, and entities.
```xml
<data_values>
  <data_value id="dv-001">
    <field id="field-001" />
    <field_key>address_street</field_key>
    <!-- Value element based on type -->
  </data_value>
</data_values>
```
Text Value
```xml
<data_value id="...">
  <field id="field-id" />
  <text>John</text>
</data_value>
```
HTML Value
```xml
<data_value id="...">
  <field id="field-id" />
  <html><![CDATA[<p>Rich text content</p>]]></html>
</data_value>
```
Number Value
```xml
<data_value id="...">
  <field id="field-id" />
  <number>42</number>
</data_value>
```
Boolean Value
```xml
<data_value id="...">
  <field id="field-id" />
  <boolean>True</boolean>
</data_value>
```
Date Value
```xml
<data_value id="...">
  <field id="field-id" />
  <date>1900-01-15</date>
  <date_granularity>0</date_granularity>
</data_value>
```
Date granularity: 0 = day, 1 = month, 2 = year.
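The granularity code tells a consumer how much of the stored date is meaningful. This sketch assumes the date element always holds a full YYYY-MM-DD value, as in the example above, and trims it for display. The same function would apply to each end of a date range with its own granularity. The function name is hypothetical.

```python
from datetime import date


def format_census_date(value: str, granularity: int) -> str:
    """Render a date value at the precision given by its granularity code
    (0 = day, 1 = month, 2 = year)."""
    d = date.fromisoformat(value)
    if granularity == 0:
        return d.isoformat()
    if granularity == 1:
        return f"{d.year:04d}-{d.month:02d}"
    if granularity == 2:
        return f"{d.year:04d}"
    raise ValueError(f"unknown date granularity: {granularity}")


for g in (0, 1, 2):
    print(g, format_census_date("1900-01-15", g))
# 0 1900-01-15
# 1 1900-01
# 2 1900
```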
Date Range Value
```xml
<data_value id="...">
  <field id="field-id" />
  <date_start>1900-01-01</date_start>
  <date_start_granularity>0</date_start_granularity>
  <date_end>1900-12-31</date_end>
  <date_end_granularity>0</date_end_granularity>
</data_value>
```
Select (Codebook) Value
```xml
<data_value id="...">
  <field id="field-id" />
  <select id="codebook-entry-id">
    <key>1</key>
    <label>Main Street</label>
  </select>
</data_value>
```
The id references a codebook entry. If no entry with that ID is found, the key and label are used as a fallback.
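The fallback can be sketched as three lookups in turn: by entry ID, then by key, then by label. The order of the two fallbacks is an assumption, since the text only says key and label are used. So are the data shapes (an ID map from codebook mapping and a list of target entries) and the function name.

```python
import xml.etree.ElementTree as ET


def resolve_select(select_xml, entry_id_map, entries):
    """Return the target entry id for a <select> element, or None.

    entry_id_map: XML entry id -> target entry id (e.g. from codebook mapping)
    entries: (target_id, key, label) tuples for the field's codebook
    """
    select = ET.fromstring(select_xml)
    if select.get("id") in entry_id_map:  # normal case: the entry id resolves
        return entry_id_map[select.get("id")]
    key, label = select.findtext("key"), select.findtext("label")
    for target_id, entry_key, _ in entries:  # fallback 1: match on key
        if key is not None and entry_key == key:
            return target_id
    for target_id, _, entry_label in entries:  # fallback 2: match on label
        if label is not None and entry_label == label:
            return target_id
    return None


entries = [(70, "1", "Main Street"), (71, "2", "Oak Avenue")]
print(resolve_select('<select id="entry-001"><key>1</key><label>Main Street</label></select>',
                     {"entry-001": 70}, entries))  # 70
print(resolve_select('<select id="gone"><key>2</key><label>Oak Avenue</label></select>',
                     {}, entries))  # 71
print(resolve_select('<select id="gone"><key>9</key><label>Elm Street</label></select>',
                     {}, entries))  # None
```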
Entity Reference Value
```xml
<data_value id="...">
  <field id="field-id" />
  <entity_ref id="entity-001" />
</data_value>
```
Address Value
```xml
<data_value id="...">
  <field id="field-id" />
  <address id="addr-001">
    <recipient_name>John Smith</recipient_name>
    <organization>Company Inc.</organization>
    <address_line_1>123 Main St</address_line_1>
    <address_line_2>Apt 4</address_line_2>
    <locality>Ljubljana</locality>
    <postal_code>1000</postal_code>
    <country_code>SI</country_code>
    <latitude>46.0569</latitude>
    <longitude>14.5058</longitude>
  </address>
</data_value>
```
Orderable Index
For fields that support multiple values (orderable inputs):
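```xml
<data_value id="...">
  <field id="field-id" />
  <text>Value 1</text>
  <orderable_i>0</orderable_i>
</data_value>
<data_value id="...">
  <field id="field-id" />
  <text>Value 2</text>
  <orderable_i>1</orderable_i>
</data_value>
```
Each value is stored as a separate data_value referencing the same field, and orderable_i gives its position, starting at 0.
Complete Example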
See the minimal example ZIP for a working import file with:
- 3 source pages with images
- 1 codebook (Streets) with 3 entries
- 1 data specification with 2 fields (street select, number text)
- 3 units with codebook-linked addresses
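One quick way to confirm that a census.xml matches expectations like these is to count its building blocks. This sketch uses xml.etree.ElementTree and zipfile. The function names are hypothetical. For the minimal example described above, the counts should come out as 3 pages, 1 codebook, 3 entries, 1 specification, 2 fields and 3 units.

```python
import xml.etree.ElementTree as ET
import zipfile


def summarize_census(xml_text) -> dict:
    """Count the main building blocks of a census.xml document (str or bytes)."""
    root = ET.fromstring(xml_text)
    census = root.find("census")
    return {
        "pages": len(census.findall("source/pages/page")),
        "codebooks": len(census.findall("codebooks/codebook")),
        "entries": len(census.findall("codebooks/codebook/entries/entry")),
        "specifications": len(census.findall("data_specifications/data_specification")),
        "fields": len(census.findall("data_specifications/data_specification/fields/field")),
        "units": len(root.findall("units/unit")),
    }


def summarize_zip(path) -> dict:
    """Read census.xml from the root of an import ZIP and summarize it."""
    with zipfile.ZipFile(path) as zf:
        return summarize_census(zf.read("census.xml"))


SAMPLE = """<census_export>
  <census id="c1">
    <source id="s1"><pages>
      <page id="p1"><page_number>0</page_number></page>
    </pages></source>
    <data_specifications>
      <data_specification id="spec-unit" level="unit">
        <fields><field id="f1"><key>address_street</key></field></fields>
      </data_specification>
    </data_specifications>
    <codebooks>
      <codebook id="cb1"><name>Streets</name>
        <entries><entry id="e1"><key>1</key></entry><entry id="e2"><key>2</key></entry></entries>
      </codebook>
    </codebooks>
  </census>
  <units><unit id="u1" /></units>
</census_export>"""

print(summarize_census(SAMPLE))
# {'pages': 1, 'codebooks': 1, 'entries': 2, 'specifications': 1, 'fields': 1, 'units': 1}
```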