How it works
xmldiffreport treats every XML document as a tree of nodes and compares N
of them at once, aligning nodes by a natural key rather than by position.
The model
- Each element is a node with attributes, optional text, and children.
- A recipe declares, per tag: the `key` (natural identity), whether the tag is `inline` (its children become pseudo-attributes instead of opening a new level), and which attributes to ignore.
- The engine compares N sources simultaneously, matching nodes by identity (order-independent). Only differences end up in the result.
flowchart LR
A[parse each file] --> B[index units by tag+key]
B --> C{unit in ≥2 sources?}
C -- no --> X[skip]
C -- yes --> D[recursive diff per node]
D --> F[render report]
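The first steps of the flow above (parse, index by tag+key, skip units seen in fewer than two sources) can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: `index_units`, `comparable_units`, the `SOURCES` dictionary, and the choice of `NAME` as the key attribute are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical inputs: label -> XML text. Real use would parse files instead.
SOURCES = {
    "bench": '<ROOT><SMART_FOLDER NAME="A"/><SMART_FOLDER NAME="B"/></ROOT>',
    "uat": '<ROOT><SMART_FOLDER NAME="A"/></ROOT>',
    "prod": '<ROOT><SMART_FOLDER NAME="A"/><SMART_FOLDER NAME="C"/></ROOT>',
}


def index_units(sources, unit_tag, key_attr):
    """Map (tag, key) -> {label: element} for every unit in every source."""
    index = defaultdict(dict)
    for label, xml_text in sources.items():
        root = ET.fromstring(xml_text)
        for elem in root.iter(unit_tag):
            index[(elem.tag, elem.get(key_attr))][label] = elem
    return index


def comparable_units(index):
    """Keep only units present in at least two sources; the rest are skipped."""
    return {ident: by_src for ident, by_src in index.items() if len(by_src) >= 2}


units = comparable_units(index_units(SOURCES, "SMART_FOLDER", "NAME"))
print(sorted(units))  # [('SMART_FOLDER', 'A')]
```

Units `B` and `C` each appear in a single source, so they are dropped before any diffing happens.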
Units and recursion
The recipe's unit (e.g. SMART_FOLDER) is the top comparison entity. For each
unit present in 2 or more sources, the engine walks the tree recursively:
- Scalar differences — attributes (and element text) that differ become rows.
- Leaf / inline children (e.g. `INCOND`, `OUTCOND`, `ON`) are compared by their key; a row appears when one is added/removed or when one of its attributes changes (e.g. an `OUTCOND` keeps its `NAME` but flips `SIGN`).
- Container children (e.g. `JOB`) open a new level and are rendered as sub-sections; identical ones are collapsed into a count.
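The recursive walk could look something like the sketch below. It is an assumption-laden illustration rather than the engine itself: the `KEYS` mapping (including `JOBNAME` as the key for `JOB`), the `ident` and `diff` helpers, and the row shape `(path, {label: value})` are all hypothetical, and element text and the leaf/container rendering distinction are left out for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical recipe fragment: which attribute identifies each tag.
KEYS = {"JOB": "JOBNAME", "OUTCOND": "NAME"}


def ident(elem):
    """Natural identity of an element: (tag, value of its key attribute)."""
    return (elem.tag, elem.get(KEYS.get(elem.tag, "NAME")))


def diff(nodes, path=""):
    """nodes: {label: Element}, all sharing one identity. Returns difference rows."""
    rows = []
    # Scalar differences: any attribute whose value is not the same everywhere.
    for name in sorted({a for n in nodes.values() for a in n.attrib}):
        values = {label: n.get(name) for label, n in nodes.items()}
        if len(set(values.values())) > 1:
            rows.append((f"{path}@{name}", values))
    # Children are matched by identity, so their order does not matter.
    child_ids = {ident(c) for n in nodes.values() for c in n}
    for cid in sorted(child_ids, key=lambda t: (t[0], t[1] or "")):
        kids = {
            label: next((c for c in n if ident(c) == cid), None)
            for label, n in nodes.items()
        }
        child_path = f"{path}{cid[0]}[{cid[1]}]/"
        if any(k is None for k in kids.values()):
            # Added/removed child: a single presence row, no recursion.
            rows.append((child_path, {l: k is not None for l, k in kids.items()}))
        else:
            rows.extend(diff(kids, child_path))
    return rows


bench = ET.fromstring(
    '<SMART_FOLDER NAME="F"><JOB JOBNAME="LOAD">'
    '<OUTCOND NAME="LOAD_OK" SIGN="-"/></JOB></SMART_FOLDER>'
)
uat = ET.fromstring(
    '<SMART_FOLDER NAME="F"><JOB JOBNAME="LOAD">'
    '<OUTCOND NAME="LOAD_OK" SIGN="+"/></JOB></SMART_FOLDER>'
)
rows = diff({"bench": bench, "uat": uat})
print(rows)
# [('JOB[LOAD]/OUTCOND[LOAD_OK]/@SIGN', {'bench': '-', 'uat': '+'})]
```

Because the `OUTCOND` is matched by its `NAME`, the flipped `SIGN` surfaces as one attribute row rather than as a removed element plus an added one.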
Attribute-level, not just present/absent
Because elements are matched by identity, a change inside an element is shown as an attribute change, not as a delete + add:
| | Element / attribute | bench | uat | prod |
|---|---|---|---|---|
| ≠ | INCOND …STAGE-…LOAD_OK · AND_OR | A | O | A |
| ≠ | OUTCOND …LOAD-…POST_OK · SIGN | - | + | + |
Reading the report
The Markdown and HTML reports share one structure:
- A top Sources block maps each short environment label (the parent directory, e.g. `bench`) to its full file path, listed once. The tables then use the short labels as columns so they stay narrow.
- The Summary has one column per change type: Own (the unit's own attribute/text diffs), Presence (children in some sources but not others) and Changed (changed sub-units), plus a Total row once there are more than five units. Each row links to its detail section.
- Every detail row opens with a status sign:
| Sign | Meaning |
|---|---|
| ≠ | present in every source, values differ |
| ⊘ | present in some sources, absent in at least one |
| ± | present in only one source |
The lone diverging value in a ≠ row is highlighted (bold in Markdown, red in
HTML); a missing value shows as absent. Presence-only children are listed as a
✓ / — matrix rather than free text.
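One way to read these rules is as a small function of the per-source values. The sketch below is only an interpretation: `status_sign` and `markdown_cells` are hypothetical names, `None` stands for "absent in that source", and "lone diverging value" is assumed to mean the single value that differs while all other sources agree.

```python
from collections import Counter


def status_sign(values):
    """values: {label: value or None}; None means absent in that source."""
    present = [v for v in values.values() if v is not None]
    if len(present) == 1:
        return "±"  # present in only one source
    if len(present) < len(values):
        return "⊘"  # present in some sources, absent in at least one
    if len(set(present)) > 1:
        return "≠"  # present everywhere, values differ
    return None  # identical everywhere: no row is produced


def markdown_cells(values):
    """Render one row's cells, bolding the lone diverging value if there is one."""
    counts = Counter(v for v in values.values() if v is not None)
    lone = None
    if len(counts) == 2:
        (_, n_major), (minor, n_minor) = counts.most_common()
        if n_minor == 1 and n_major > 1:
            lone = minor
    cells = []
    for v in values.values():
        if v is None:
            cells.append("absent")
        elif v == lone:
            cells.append(f"**{v}**")
        else:
            cells.append(v)
    return cells


row = {"bench": "A", "uat": "O", "prod": "A"}
print(status_sign(row), markdown_cells(row))  # ≠ ['A', '**O**', 'A']
```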
Volatile attributes are ignored
Attributes that change on every export without functional meaning — VERSION,
CREATION_TIME, JOBISN, LAST_UPLOAD, … — are listed in the recipe's
ignore_attrs and never produce a row. This is what makes the diff semantic
instead of noisy.
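The effect can be pictured as a filter applied before attributes are compared. This is a sketch under assumptions: the set below holds only the four attributes named above, and `comparable_attrs` is a hypothetical helper, not the tool's API.

```python
# Only the attributes named in the text; a real recipe may list more.
IGNORE_ATTRS = {"VERSION", "CREATION_TIME", "JOBISN", "LAST_UPLOAD"}


def comparable_attrs(attrib, ignore=IGNORE_ATTRS):
    """Drop volatile attributes so they can never produce a diff row."""
    return {name: value for name, value in attrib.items() if name not in ignore}


bench = {"JOBNAME": "LOAD", "VERSION": "12", "JOBISN": "3"}
prod = {"JOBNAME": "LOAD", "VERSION": "19", "JOBISN": "7"}

# The raw attributes differ, but nothing functional does.
print(bench == prod)                                      # False
print(comparable_attrs(bench) == comparable_attrs(prod))  # True
```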
What gets reported
The engine reports differences — every unit present in 2+ sources that isn't identical. It deliberately stays out of your domain: it does not classify those differences (e.g. "conflict" vs "informational"). If that distinction matters to your workflow, derive it yourself from the result — you know which source is which (each column is labelled by its environment, and the full paths are listed once in the top Sources block).
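As an illustration of deriving such a distinction on your side, the sketch below labels a row a "conflict" when two chosen environments disagree and "informational" otherwise. The row shape (`{label: value}`), the `classify` function, and the rule itself are assumptions made for this example; the tool does not prescribe any of them.

```python
def classify(values, left="uat", right="prod"):
    """Hypothetical policy: only a disagreement between `left` and `right` matters."""
    return "conflict" if values.get(left) != values.get(right) else "informational"


# Values taken from the example rows shown earlier on this page.
and_or = {"bench": "A", "uat": "O", "prod": "A"}
sign = {"bench": "-", "uat": "+", "prod": "+"}

print(classify(and_or))  # conflict
print(classify(sign))    # informational
```

Keeping this policy outside the engine means it can change per workflow without touching the diff itself.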
Namespaces & text
XML namespaces are stripped on parse ({uri}tag → tag) so tags and keys stay
readable and recipes stay simple. Element text is comparable too — e.g. a
sitemap <url> is identified by its <loc> text and its <lastmod> text is
compared as a value.
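Both ideas can be sketched with the standard library's `xml.etree.ElementTree`, which exposes namespaced tags in the `{uri}tag` form. The `local_name` helper and the sample sitemap are illustrative assumptions, not part of xmldiffreport.

```python
import xml.etree.ElementTree as ET

SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc><lastmod>2024-01-01</lastmod></url>
  <url><loc>https://example.com/about</loc><lastmod>2024-02-15</lastmod></url>
</urlset>"""


def local_name(tag):
    """Turn '{uri}tag' into 'tag'; tags without a namespace pass through."""
    return tag.rsplit("}", 1)[-1]


root = ET.fromstring(SITEMAP)
for elem in root.iter():
    elem.tag = local_name(elem.tag)  # strip the namespace in place

# Each <url> is identified by its <loc> text; <lastmod> text is the compared value.
urls = {u.findtext("loc"): u.findtext("lastmod") for u in root.iter("url")}
print(root.tag)                      # urlset
print(urls["https://example.com/"])  # 2024-01-01
```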