Performance¶

Benchmarks¶

Test	Size	Rows	Time	Rows/s	MB/s	RSS
Stream	10 MB	9,010	0.043s	211 K	234	22 MB
Stream	50 MB	45,328	0.223s	203 K	224	45 MB
Stream	100 MB	90,384	0.418s	216 K	239	75 MB
To list	10 MB	9,010	0.052s	174 K	192	32 MB
To list	50 MB	45,328	0.249s	182 K	201	98 MB
To list	100 MB	90,384	0.478s	189 K	209	181 MB
Pipeline	10 MB	9,010	0.060s	150 K	166	32 MB
Pipeline	50 MB	45,328	0.295s	154 K	169	96 MB
Pipeline	100 MB	90,384	0.579s	156 K	173	176 MB
DataFrame	10 MB	9,010	0.320s	28 K	31	86 MB
DataFrame	50 MB	45,328	0.538s	84 K	93	152 MB
DataFrame	100 MB	90,384	0.829s	109 K	121	234 MB

pandas is imported lazily — RSS and time for DataFrame mode includes pandas overhead.

Where time goes¶

I/O: ~15% of parse time (kernel read syscalls)
XML parsing: ~40% (quick-xml)
Dict construction: ~30% (PyDict creation, key/value allocation)
Python overhead: ~15% (yield, iteration boundary)

The dominant cost is allocating Python str objects for each field value and constructing dicts. This overhead is inherent to the Python/Rust boundary.

Memory breakdown¶

┌──────────────────┐
│  I/O buffer      │  2 MB  (reused Vec<u8>)
│  Row dict        │  ~2 KB per row (temporary)
│  RSS floor       │  15 MB (Python interpreter, no pandas)
│  Pipeline stages │  ~20 MB (stage objects, fusion)
└──────────────────┘

RSS scales with file content (22 MB for 10 MB, 75 MB for 100 MB).

Parallel mode¶

For files >200 MB, parallel mode distributes batches across N workers:

Workers	500 MB file	Speedup
1	3.2 s	1.0×
2	1.9 s	1.7×
4	1.1 s	2.9×
8	0.9 s	3.6×

Diminishing returns past 4 workers due to IPC overhead.

Bottleneck analysis¶

Disk I/O: Not the bottleneck for files <1 GB on modern NVMe SSDs
Single-threaded Python: Dict allocation is the bottleneck
XML parsing: quick-xml is fast; the bottleneck is Python dict construction
Rust allocator: Uses system allocator with stable RSS

Recommendations¶

File size	Strategy
< 50 MB	Sequential (no overhead)
50–200 MB	Sequential or parallel(2)
> 200 MB	parallel(4)
> 1 GB	parallel(N_CPU), chunksize=20000