XML Reference

Complete specification for the census import XML format.

Document Structure

xml
<?xml version="1.0" encoding="UTF-8"?>
<census_export>
  <census id="...">
    <!-- Census metadata, source, data specs, codebooks -->
  </census>
  <units>
    <!-- Unit elements -->
  </units>
  <subunits>
    <!-- Subunit elements (optional) -->
  </subunits>
  <entities>
    <!-- Entity elements (optional) -->
  </entities>
</census_export>
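
As a rough illustration of this layout, the sketch below parses an export with Python's standard `xml.etree.ElementTree` and counts the records in each section. The function name `summarize_export` is hypothetical and not part of the importer; optional sections that are absent simply count as zero.

```python
import xml.etree.ElementTree as ET


def summarize_export(xml_text: str) -> dict:
    """Count the records in each top-level section of a census export."""
    root = ET.fromstring(xml_text)
    if root.tag != "census_export":
        raise ValueError(f"unexpected root element: {root.tag}")
    census = root.find("census")
    return {
        "census_id": census.get("id") if census is not None else None,
        # findall() returns an empty list when an optional section is missing.
        "units": len(root.findall("units/unit")),
        "subunits": len(root.findall("subunits/subunit")),
        "entities": len(root.findall("entities/entity")),
    }
```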

Detection

The importer activates when a ZIP contains census.xml at its root. The filename must be exactly census.xml (case-sensitive).
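
The detection rule can be checked before uploading. The following is a minimal sketch using Python's standard `zipfile` module; `has_census_xml` and `make_zip` are hypothetical helper names, not importer functions.

```python
import io
import zipfile


def has_census_xml(zip_file) -> bool:
    """Return True if the archive has a root-level member named exactly census.xml."""
    with zipfile.ZipFile(zip_file) as archive:
        # Member names are full paths, so a root-level file contains no "/".
        # The string comparison is case-sensitive.
        return "census.xml" in archive.namelist()


def make_zip(names) -> io.BytesIO:
    """Build an in-memory ZIP with the given member names (demo helper)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in names:
            archive.writestr(name, "<census_export/>")
    buffer.seek(0)
    return buffer
```

Under this check, `Census.xml` or `export/census.xml` would not trigger the importer.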


Census Element

The <census> element contains metadata and nested definitions.

xml
<census id="unique-id">
  <sistory_id>22</sistory_id>
  <title>Ljubljana 1900</title>
  <title_en>Ljubljana 1900</title_en>
  <year>1900</year>
  <archived>False</archived>

  <source>...</source>
  <data_specifications>...</data_specifications>
  <codebooks>...</codebooks>
</census>

INFO

The id attribute is used only for internal reference mapping during import. New IDs are generated for all imported objects.


Source & Pages

Define source pages with linked images.

xml
<source id="source-id">
  <title>Source Title</title>
  <pages>
    <page id="page-001">
      <page_number>0</page_number>
      <image_file_name>page_001.jpg</image_file_name>
    </page>
    <page id="page-002">
      <page_number>1</page_number>
      <image_file_name>subdir/page_002.jpg</image_file_name>
    </page>
  </pages>
</source>

Image Resolution

Images are resolved from the images/ directory in the ZIP. The image_file_name can include subdirectories:

| XML Value | Resolved Path |
| --- | --- |
| page_001.jpg | images/page_001.jpg |
| subdir/scan.jpg | images/subdir/scan.jpg |
| 2024/jan/page.jpg | images/2024/jan/page.jpg |

This allows organizing images into subdirectories within the images/ folder.
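
The mapping from XML value to archive path amounts to a path join. A minimal sketch, assuming forward-slash separators as used inside ZIP archives (`resolve_image_path` is a hypothetical name):

```python
import posixpath

IMAGES_DIR = "images"


def resolve_image_path(image_file_name: str) -> str:
    """Return the path inside the ZIP for a given image_file_name value."""
    # ZIP member names use "/" regardless of platform, hence posixpath.
    return posixpath.join(IMAGES_DIR, image_file_name)
```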

Page Fields

| Field | Required | Description |
| --- | --- | --- |
| id | Yes | Reference ID for linking |
| page_number | Yes | Sequential page number (0-based) |
| image_file_name | No | Path relative to images/ directory |

Data Specifications

Define field structures for each level (unit, subunit, entity).

xml
<data_specifications>
  <data_specification id="spec-unit" level="unit">
    <name>census_unit_spec</name>
    <title_expression>{{ address_street }} {{ address_number }}</title_expression>
    <fields>
      <field id="field-001">
        <key>address_street</key>
        <key_name>Address Street</key_name>
        <order>0</order>
        <archived>False</archived>
        <input_template>{"id": "...", "select": {...}}</input_template>
      </field>
    </fields>
  </data_specification>
</data_specifications>

Specification Levels

| Level | Description |
| --- | --- |
| unit | Address/location level |
| subunit | Household level |
| subsubunit | Person/entity level |

Title Expression

The title_expression uses Jinja-style templating to compute display titles:

xml
<title_expression>{{ name }} {{ surname }}</title_expression>

Field keys are referenced using {{ field_key }} syntax, where field_key is the key of a field defined in the same data specification.
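
To show how such an expression turns into a title, here is a simplified sketch that substitutes field keys with a regular expression. It is not the importer's template engine: it handles only plain placeholders, and the treatment of missing keys (replaced with an empty string) is an assumption.

```python
import re

# Matches a placeholder such as "{{ address_street }}".
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_title(expression: str, values: dict) -> str:
    """Substitute field keys in a title expression with their values."""
    rendered = PLACEHOLDER.sub(
        lambda match: str(values.get(match.group(1), "")), expression
    )
    # Collapse the extra whitespace left behind by missing values.
    return " ".join(rendered.split())
```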

Field Input Templates

The input_template is a JSON object defining the field type:

Text field:

json
{"id": "field_id", "text": {"type": "tit_text"}, "label": "Field Label"}

Number field:

json
{"id": "field_id", "label": "Age", "number": {"precision": 0, "unlimited": true}}

Checkbox field:

json
{"id": "field_id", "check": {"value": false}, "label": "Is Active"}

Codebook select field:

json
{
  "id": "census_codebook_entries_select_input",
  "label": "Street",
  "select": {
    "freeform": false,
    "required": true,
    "parameters": ["codebook-id", "Codebook Name"]
  }
}
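
In the examples above, the field type is indicated by which of the keys text, number, check, or select is present. The sketch below reads an input_template on that basis; the function names are hypothetical, and treating the first parameters item as the codebook ID is an assumption drawn from the select example.

```python
import json

# Keys that identify the field type in the examples above.
TYPE_KEYS = ("text", "number", "check", "select")


def field_type(input_template: str) -> str:
    """Return the type key found in an input_template JSON string."""
    template = json.loads(input_template)
    for key in TYPE_KEYS:
        if key in template:
            return key
    raise ValueError("input_template has no recognized type key")


def codebook_reference(input_template: str):
    """Return the referenced codebook ID of a select field, or None."""
    template = json.loads(input_template)
    parameters = template.get("select", {}).get("parameters") or []
    return parameters[0] if parameters else None
```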

Codebooks

Define controlled vocabularies for select fields.

xml
<codebooks>
  <codebook id="codebook-streets">
    <name>Streets</name>
    <description>Street names for this census</description>
    <archived>False</archived>
    <entries>
      <entry id="entry-001">
        <key>1</key>
        <label>Main Street</label>
        <label_en>Main Street</label_en>
        <archived>False</archived>
      </entry>
      <entry id="entry-002">
        <key>2</key>
        <label>Oak Avenue</label>
        <archived>False</archived>
      </entry>
    </entries>
  </codebook>
</codebooks>

Codebook Fields

| Field | Required | Description |
| --- | --- | --- |
| id | Yes | Reference ID (used in field input_template and data value select) |
| name | Yes | Codebook name (used for matching existing codebooks) |
| description | No | Description text |
| archived | No | Whether codebook is archived |

Entry Fields

| Field | Required | Description |
| --- | --- | --- |
| id | Yes | Reference ID for data values to link to |
| key | Yes | Unique key within codebook (used for matching existing entries) |
| label | Yes | Display label (primary language) |
| label_en | No | Display label in English |
| archived | No | Whether entry is archived |

Codebook Mapping

During import:

  • Codebooks are matched by name to existing codebooks in the target census
  • If no match, a new codebook is created
  • Entries are matched by key within the codebook
  • Codebook IDs in field input_template are updated automatically
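
The matching steps above can be sketched with plain dictionaries. This is an illustration, not the importer's code: the data shapes, the generated ID formats, and the choice to create entries whose key has no match are all assumptions.

```python
def map_codebooks(imported, existing):
    """Map XML codebook and entry IDs to target IDs.

    Both arguments are lists of dicts shaped like
    {"id": ..., "name": ..., "entries": [{"id": ..., "key": ..., "label": ...}]}.
    Returns (codebook_id_map, entry_id_map, created_codebooks).
    """
    existing_by_name = {cb["name"]: cb for cb in existing}
    codebook_ids, entry_ids, created = {}, {}, []
    for cb in imported:
        target = existing_by_name.get(cb["name"])  # match by name
        if target is None:
            # No match: create a codebook (placeholder ID format, assumed).
            target = {"id": f"new-{len(created) + 1}", "name": cb["name"], "entries": []}
            created.append(target)
        codebook_ids[cb["id"]] = target["id"]
        target_entries = {entry["key"]: entry for entry in target["entries"]}
        for entry in cb["entries"]:
            match = target_entries.get(entry["key"])  # match by key
            if match is None:
                # Assumption: unmatched entries are added to the codebook.
                match = {"id": f'{target["id"]}:{entry["key"]}',
                         "key": entry["key"], "label": entry["label"]}
                target["entries"].append(match)
            entry_ids[entry["id"]] = match["id"]
    return codebook_ids, entry_ids, created


def remap_template(template: dict, codebook_ids: dict) -> dict:
    """Rewrite the codebook ID in a select field's parameters, if mapped."""
    parameters = template.get("select", {}).get("parameters")
    if parameters and parameters[0] in codebook_ids:
        parameters[0] = codebook_ids[parameters[0]]
    return template
```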

Global Codebooks

INFO

Global codebooks are shared across all censuses (they have census=NULL in the database). Examples include Gender, Marital Status, etc.

  • Global codebooks do not need to be included in the XML
  • If a field's input_template references a codebook ID that exists in the database (including global codebooks), that reference is preserved
  • Only census-specific codebooks need to be exported and imported
  • The import process checks if a referenced codebook ID exists before reporting it as missing

Units

Top-level census records, typically representing addresses.

xml
<units>
  <unit id="unit-001">
    <sistory_id>1</sistory_id>
    <title>Main Street, 10</title>
    <archived>False</archived>
    <pages>
      <page_ref id="page-001" />
      <page_ref id="page-002" />
    </pages>
    <data_values>
      <!-- Data values for this unit -->
    </data_values>
  </unit>
</units>

Unit Fields

| Field | Required | Description |
| --- | --- | --- |
| id | Yes | Reference ID |
| sistory_id | No | Legacy ID (new one generated) |
| title | No | Display title (computed from title_expression if not set) |
| archived | No | Whether unit is archived |
| pages | No | Linked source pages |
| data_values | No | Field values |

Subunits

Second-level records, typically representing households within an address.

xml
<subunits>
  <subunit id="subunit-001">
    <sistory_id>1</sistory_id>
    <title>Household 1</title>
    <archived>False</archived>
    <order>0</order>
    <unit id="unit-001" />
    <source_entity id="entity-001" />
    <pages>
      <page_ref id="page-001" />
    </pages>
    <data_values>...</data_values>
  </subunit>
</subunits>

Subunit-Specific Fields

| Field | Description |
| --- | --- |
| unit | Reference to parent unit (required) |
| source_entity | Reference to the "head of household" entity |
| order | Sort order within unit |

Entities

Third-level records, typically representing individual persons.

xml
<entities>
  <entity id="entity-001">
    <sistory_id>1</sistory_id>
    <title>John Smith</title>
    <archived>False</archived>
    <order>0</order>
    <unit id="unit-001" />
    <subunit id="subunit-001" />
    <parent id="entity-000" />
    <parent_relationship>1</parent_relationship>
    <parent_other_relationships>nephew</parent_other_relationships>
    <pages>
      <page_ref id="page-001" />
    </pages>
    <data_values>...</data_values>
  </entity>
</entities>

Entity-Specific Fields

| Field | Description |
| --- | --- |
| unit | Reference to parent unit |
| subunit | Reference to parent subunit |
| parent | Reference to parent entity (for family trees) |
| parent_relationship | Enum value for relationship type |
| parent_other_relationships | Free text for other relationships |
| order | Sort order within subunit |

Data Values

Data values store field values for units, subunits, and entities.

xml
<data_values>
  <data_value id="dv-001">
    <field id="field-001" />
    <field_key>address_street</field_key>
    <!-- Value element based on type -->
  </data_value>
</data_values>

Text Value

xml
<data_value id="...">
  <field id="field-id" />
  <text>John</text>
</data_value>

HTML Value

xml
<data_value id="...">
  <field id="field-id" />
  <html><![CDATA[<p>Rich text content</p>]]></html>
</data_value>

Number Value

xml
<data_value id="...">
  <field id="field-id" />
  <number>42</number>
</data_value>

Boolean Value

xml
<data_value id="...">
  <field id="field-id" />
  <boolean>True</boolean>
</data_value>

Date Value

xml
<data_value id="...">
  <field id="field-id" />
  <date>1900-01-15</date>
  <date_granularity>0</date_granularity>
</data_value>

Date granularity: 0 = day, 1 = month, 2 = year
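
One way to read the granularity flag is as the precision at which the stored date is meaningful. The sketch below truncates an ISO date accordingly; `format_date` is a hypothetical helper, and how the application actually displays such dates is not specified here.

```python
from datetime import date

# Number of characters of the ISO date to keep for each granularity value.
PRECISION_LENGTH = {0: 10, 1: 7, 2: 4}  # 0 = day, 1 = month, 2 = year


def format_date(value: str, granularity: int) -> str:
    """Return the date truncated to the precision given by date_granularity."""
    parsed = date.fromisoformat(value)  # raises ValueError on malformed input
    return parsed.isoformat()[:PRECISION_LENGTH[granularity]]
```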

Date Range Value

xml
<data_value id="...">
  <field id="field-id" />
  <date_start>1900-01-01</date_start>
  <date_start_granularity>0</date_start_granularity>
  <date_end>1900-12-31</date_end>
  <date_end_granularity>0</date_end_granularity>
</data_value>

Select (Codebook) Value

xml
<data_value id="...">
  <field id="field-id" />
  <select id="codebook-entry-id">
    <key>1</key>
    <label>Main Street</label>
  </select>
</data_value>

The id references a codebook entry. The key and label are used as fallback if the entry ID isn't found.
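
The fallback could look roughly like the sketch below. The lookup order (by ID, then by key, then a bare key/label record) is an assumption; only the existence of a key/label fallback is documented. `resolve_select` and both lookup dictionaries are hypothetical.

```python
import xml.etree.ElementTree as ET


def resolve_select(select_el, entries_by_id, entries_by_key):
    """Resolve a <select> element to a codebook entry dict."""
    entry = entries_by_id.get(select_el.get("id"))
    if entry is not None:
        return entry
    # Entry ID not found: fall back to the embedded key and label.
    key = select_el.findtext("key")
    entry = entries_by_key.get(key)
    if entry is not None:
        return entry
    return {"key": key, "label": select_el.findtext("label")}
```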

Entity Reference Value

xml
<data_value id="...">
  <field id="field-id" />
  <entity_ref id="entity-001" />
</data_value>

Address Value

xml
<data_value id="...">
  <field id="field-id" />
  <address id="addr-001">
    <recipient_name>John Smith</recipient_name>
    <organization>Company Inc.</organization>
    <address_line_1>123 Main St</address_line_1>
    <address_line_2>Apt 4</address_line_2>
    <locality>Ljubljana</locality>
    <postal_code>1000</postal_code>
    <country_code>SI</country_code>
    <latitude>46.0569</latitude>
    <longitude>14.5058</longitude>
  </address>
</data_value>

Orderable Index

For fields that support multiple values (orderable inputs):

xml
<data_value id="...">
  <field id="field-id" />
  <text>Value 1</text>
  <orderable_i>0</orderable_i>
</data_value>
<data_value id="...">
  <field id="field-id" />
  <text>Value 2</text>
  <orderable_i>1</orderable_i>
</data_value>
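
When reading such values back, they can be grouped by field and sorted by orderable_i. A minimal sketch for text values, assuming a missing orderable_i means index 0 (`ordered_text_values` is a hypothetical name):

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def ordered_text_values(data_values_el):
    """Group text values by field ID, sorted by their orderable_i index."""
    grouped = defaultdict(list)
    for data_value in data_values_el.findall("data_value"):
        field_id = data_value.find("field").get("id")
        # Assumption: a value without <orderable_i> is treated as index 0.
        index = int(data_value.findtext("orderable_i", default="0"))
        grouped[field_id].append((index, data_value.findtext("text")))
    return {
        field_id: [text for _, text in sorted(values, key=lambda pair: pair[0])]
        for field_id, values in grouped.items()
    }
```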

Complete Example

See the minimal example ZIP for a working import file with:

  • 3 source pages with images
  • 1 codebook (Streets) with 3 entries
  • 1 data specification with 2 fields (street select, number text)
  • 3 units with codebook-linked addresses