Basic Parsing

Opening a file

CrystalXMLSource accepts a file path (string or pathlib.Path):

from crxml import CrystalXMLSource

# path string
src = CrystalXMLSource("report.xml")

# pathlib.Path
from pathlib import Path
src = CrystalXMLSource(Path("report.xml"))
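A common way for a constructor to accept both a string and a pathlib.Path is to normalize the argument with os.fspath. The sketch below shows that pattern only; normalize_source is a hypothetical helper, not part of crxml, and it is not a claim about how CrystalXMLSource handles its argument internally.

```python
import os
from pathlib import Path
from typing import Union


def normalize_source(source: Union[str, os.PathLike]) -> str:
    """Hypothetical helper: return a plain string path for a str or pathlib.Path."""
    path = os.fspath(source)  # str passes through; Path calls __fspath__()
    if not isinstance(path, str):
        # os.fspath also accepts bytes paths; this sketch only allows text paths
        raise TypeError("expected a str or pathlib.Path, got bytes")
    return path


print(normalize_source("report.xml"))
print(normalize_source(Path("report.xml")))
```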

Parameters

Param	Type	Default	Description
`source`	`str | Path`	(required)	Path to the CR XML file
`row_tag`	`str`	`"Row"`	XML tag for each record row

The row_tag parameter lets you target a different repeating element if your CR XML uses a non-standard tag name.
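To illustrate what choosing a row tag means, here is a pure-Python sketch using the standard library's ElementTree.iterparse. It is not crxml's implementation (the docs describe a Rust backend); count_rows and the <Details> tag are hypothetical, chosen only to show that the tag name decides which repeating element counts as a record.

```python
import io
import xml.etree.ElementTree as ET


def count_rows(xml_bytes: bytes, row_tag: str = "Row") -> int:
    """Hypothetical helper: count elements named row_tag while streaming."""
    count = 0
    # "end" fires once an element and all of its children have been parsed.
    # Tags are compared literally, so a namespaced document would need the
    # "{namespace-uri}Row" form in ElementTree.
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        if elem.tag == row_tag:
            count += 1
            elem.clear()  # drop the finished row's children
    return count


doc = b"<Report><Details><Field/></Details><Details><Field/></Details></Report>"
print(count_rows(doc))                     # no <Row> elements in this document
print(count_rows(doc, row_tag="Details"))  # two <Details> records
```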

Iteration

CrystalXMLSource is iterable. Each row is a dict[str, str]:

for row in CrystalXMLSource("report.xml"):
    print(row["{Report.InvoiceNo}"], row["{Report.Amount}"])

Keys are the field names from the CR XML (e.g. {Report.InvoiceNo}), taken from the FieldName attribute or FieldName child element. Values are the raw text of the first <FormattedValue> or <Value> child element.
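The key/value rule can be sketched with ElementTree for attribute-style fields. This is an illustration under assumptions, not crxml's code: row_to_dict is hypothetical, and it reads "first" as whichever of <FormattedValue> or <Value> appears first among the field's children.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def row_to_dict(row: ET.Element) -> Dict[str, str]:
    """Hypothetical helper: FieldName -> text of the first value-like child."""
    out: Dict[str, str] = {}
    for field in row.iter("Field"):
        name = field.get("FieldName")
        if name is None:
            continue  # element-style names are not handled in this sketch
        for child in field:
            # assumption: take whichever of the two value tags comes first
            if child.tag in ("FormattedValue", "Value"):
                out[name] = child.text or ""
                break
    return out


row = ET.fromstring(
    '<Row><Field FieldName="{Report.InvoiceNo}">'
    "<FormattedValue>INV-001</FormattedValue><Value>1</Value></Field>"
    '<Field FieldName="{Report.Amount}"><Value>123.45</Value></Field></Row>'
)
print(row_to_dict(row))
```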

Schema inspection

Call .schema() to discover fields without consuming the stream:

src = CrystalXMLSource("report.xml")
fields = src.schema()  # list of (key, sample_value) tuples

Internally, .schema() reads the first batch of rows and caches it, so those rows are not lost: iterating the source afterwards still yields them. This makes .schema() safe to call before building a pipeline.
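The peek-and-replay idea can be sketched in pure Python with itertools. PeekableRows and peek_size are hypothetical names, and the batch size crxml actually uses is not documented here; the sketch only shows how cached rows can be replayed ahead of the unread remainder.

```python
from itertools import chain, islice
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, str]


class PeekableRows:
    """Hypothetical wrapper: inspect the first rows, then replay them on iteration.

    Single pass over the underlying source: call schema() before iterating.
    """

    def __init__(self, rows: Iterable[Row], peek_size: int = 100) -> None:
        self._rows = iter(rows)
        self._peek_size = peek_size
        self._cache: List[Row] = []  # rows already pulled by schema()

    def schema(self) -> List[Tuple[str, str]]:
        if not self._cache:
            self._cache = list(islice(self._rows, self._peek_size))
        fields: Dict[str, str] = {}
        for row in self._cache:
            for key, value in row.items():
                fields.setdefault(key, value)  # keep the first sample per key
        return list(fields.items())

    def __iter__(self) -> Iterator[Row]:
        # replay cached rows first, then continue with the unread remainder
        cached, self._cache = self._cache, []
        return chain(cached, self._rows)


src = PeekableRows(
    [{"{Report.InvoiceNo}": "1", "{Report.Amount}": "10"},
     {"{Report.InvoiceNo}": "2", "{Report.Amount}": "20"}],
    peek_size=1,
)
fields = src.schema()  # inspects only the first row
rows = list(src)       # cached row is replayed, then the rest is read
print(fields)
print(len(rows))
```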

Memory model

The parser streams the file instead of loading it whole. The Rust backend reuses internal buffers across rows and never materializes the full document. Resident memory (RSS) still grows with file size, but much more slowly: about 22 MB for a 10 MB file and 75 MB for a 100 MB file, so for large files it stays well below the file size. pandas is imported lazily, so its memory cost appears only when to_dataframe is called.
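As a Python analogue of "never materializes the full document" (not crxml's Rust implementation), the standard library's iterparse can stream rows and discard each one once the caller has used it. stream_rows is a hypothetical helper; the root.clear() call detaches finished rows so the in-memory tree does not grow with the file.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator


def stream_rows(source: BinaryIO, row_tag: str = "Row") -> Iterator[ET.Element]:
    """Hypothetical helper: yield each completed row, then free it."""
    context = ET.iterparse(source, events=("start", "end"))
    _event, root = next(context)  # the first "start" event is the document root
    for event, elem in context:
        if event == "end" and elem.tag == row_tag:
            yield elem       # the caller must read the row before asking for the next
            elem.clear()     # release the row's children and text
            root.clear()     # detach finished rows from the root


doc = (
    b"<Report>"
    + b"".join(b"<Row><Value>%d</Value></Row>" % i for i in range(1000))
    + b"</Report>"
)
values = [row.findtext("Value") for row in stream_rows(io.BytesIO(doc))]
print(len(values), values[:3])
```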

CR XML layout detection

Crystal Reports XML gives each field's name in one of two ways, and a single document may mix them:

Attribute style: <Field FieldName="{Report.Amount}"><Value>123.45</Value></Field>
Element style: <Field><FieldName>{Report.Amount}</FieldName><Value>123.45</Value></Field>
Mixed: some fields use attributes, others use child elements

The parser detects both styles automatically, including mixed documents; no configuration is needed.
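Per-field detection is enough to handle mixed documents: check the attribute first, then fall back to a child element. The sketch below shows this with ElementTree; field_name is a hypothetical helper and the attribute-first order is an assumption, not a description of crxml's internals.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def field_name(field: ET.Element) -> Optional[str]:
    """Hypothetical helper: field name from either layout, or None if absent."""
    name = field.get("FieldName")  # attribute style
    if name is None:
        name = field.findtext("FieldName")  # element style (None if no child)
    return name


row = ET.fromstring(
    "<Row>"
    '<Field FieldName="{Report.InvoiceNo}"><Value>INV-001</Value></Field>'
    "<Field><FieldName>{Report.Amount}</FieldName><Value>123.45</Value></Field>"
    "</Row>"
)
names = [field_name(f) for f in row.iter("Field")]
print(names)
```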