Getting Started

This guide walks through a complete first session: installation, a first parse, schema inspection, a simple pipeline, and DataFrame conversion.

Install

pip install crxml

See Installation for details on building from source and platform support.

Your first source

Create a small Crystal Reports XML file and point CrystalXMLSource at it:

from crxml import CrystalXMLSource

src = CrystalXMLSource("report.xml")

for row in src:
    print(row)

Each row is a dict[str, str]. The keys are field names from the CR XML (e.g. {Report.FieldName}) and the values are the raw text content.
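Because every value arrives as raw text, numeric fields need an explicit conversion before any arithmetic. The sketch below uses a hand-written dictionary in place of a real parsed row; the field names and values are made up for illustration.

```python
# Hypothetical row, shaped like the dict[str, str] rows described above.
row = {
    "{Report.InvoiceNo}": "1001",
    "{Report.Amount}": "249.50",
}

# Values are strings, so convert before doing arithmetic.
amount = float(row["{Report.Amount}"])
amount_with_fee = amount + 10.0
print(amount_with_fee)
```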

Inspect the schema

Use .schema() to see the fields without consuming the stream:

src = CrystalXMLSource("report.xml")
fields = src.schema()  # list of (key, sample_value) tuples
for key, sample in fields:
    print(f"{key}: {sample!r}")

This is useful for building pipelines dynamically, for example when the field names are not known ahead of time.
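One way to use the schema dynamically is to derive a rename mapping from the field keys instead of writing it by hand. The sketch below works on a hard-coded list shaped like the (key, sample_value) tuples that .schema() returns; the sample data and the short_name helper are hypothetical and not part of crxml.

```python
def short_name(key: str) -> str:
    """Turn '{Report.InvoiceNo}' into 'invoiceno' (hypothetical helper)."""
    inner = key.strip("{}")               # drop the surrounding braces
    _, _, field = inner.rpartition(".")   # keep the part after the last dot
    return field.lower()


# Stand-in for src.schema(); real values would come from your report.
fields = [
    ("{Report.InvoiceNo}", "1001"),
    ("{Report.Customer}", "Acme Ltd"),
    ("{Report.Amount}", "249.50"),
]

# Map every original key to its shortened name.
rename_map = {key: short_name(key) for key, _sample in fields}
print(rename_map)
```

A mapping built this way has the same shape as the dictionary passed to RenameFields in the pipeline example.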

Simple pipeline

The | operator chains transformation stages. Nothing executes until you iterate or sink the result:

from crxml import CrystalXMLSource, RenameFields, CastTypes, DropFields

pipe = (
    CrystalXMLSource("report.xml")
    | RenameFields({
        "{Report.InvoiceNo}": "invoice",
        "{Report.Customer}": "customer",
        "{Report.Amount}": "amount",
    })
    | CastTypes({"amount": float})
    | DropFields("{Report.TaxRate}")
)

for row in pipe:
    print(row["invoice"], row["amount"])
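To make the lazy behaviour concrete, here is a toy pipeline written from scratch. It is only an illustration of the general pattern (an overloaded | that records stages, with the work deferred to iteration) and is not crxml's actual implementation; the Source class, the rename stage, and the sample row are all hypothetical.

```python
from typing import Callable, Iterable, Iterator

Row = dict[str, str]


class Source:
    """Toy lazy pipeline: holds an iterable plus a tuple of pending stages."""

    def __init__(self, rows: Iterable[Row], stages: tuple = ()):
        self._rows = rows
        self._stages = stages

    def __or__(self, stage: Callable[[Iterator[Row]], Iterator[Row]]) -> "Source":
        # Chaining only records the stage; no rows are read here.
        return Source(self._rows, self._stages + (stage,))

    def __iter__(self) -> Iterator[Row]:
        # Work happens only when someone iterates.
        stream: Iterator[Row] = iter(self._rows)
        for stage in self._stages:
            stream = stage(stream)
        return stream


def rename(mapping: dict[str, str]):
    """Return a stage that renames keys according to `mapping`."""
    def stage(rows):
        for row in rows:
            yield {mapping.get(k, k): v for k, v in row.items()}
    return stage


calls = []


def tracked_rows():
    calls.append("read")  # records when the source is actually consumed
    yield {"{Report.Amount}": "249.50"}


pipe = Source(tracked_rows()) | rename({"{Report.Amount}": "amount"})
assert calls == []       # building the pipeline has not read anything
result = list(pipe)      # iteration triggers the work
print(result)
```

Because each stage wraps the previous iterator in a generator, rows flow through one at a time rather than being materialised between stages.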

Convert to DataFrame

from crxml import to_dataframe

df = to_dataframe(pipe)  # runs the pipeline and collects every row

This collects all rows into a pandas DataFrame. For large files, use chunksize= to build the DataFrame incrementally (see Sinks).
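To give a feel for what incremental collection involves, the sketch below batches any row iterator into fixed-size chunks using only the standard library. It is not how crxml implements chunksize=, and the chunked helper is hypothetical; with pandas installed, each chunk could be passed to pandas.DataFrame and the pieces combined with pandas.concat.

```python
from itertools import islice
from typing import Iterable, Iterator


def chunked(rows: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Yield lists of at most `size` rows without loading everything at once."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))  # pull the next `size` rows, if any
        if not chunk:
            return
        yield chunk


# Hypothetical rows standing in for a pipeline's output.
rows = ({"invoice": str(n)} for n in range(5))
chunks = list(chunked(rows, size=2))
print([len(c) for c in chunks])
```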

Next steps

Usage guide: deeper topics such as custom stages, parallel mode, and branching
Pipeline API: how | and lazy evaluation work
Built-in stages: reference for all four stage types
Performance: benchmarks, memory model, and bottlenecks