Overview
The XML Field Extractor extracts specific data from XML files. It offers two extraction modes: Simple mode for direct field extraction with hierarchical parent segments, and Advanced mode for conditional filtering with 6 operators. You can process multiple XML files at once (batch processing), detect XML structure automatically with the scanner, handle repeating elements with row extraction, and export results to CSV, JSON, or Excel. The tool supports 6 encodings, including those common in SAP exports, and all processing happens locally in your browser.
Interface overview
The XML Field Extractor has a clean, organized layout with all extraction controls in one place.
Header section
At the top you'll see:
• XML FIELD EXTRACTOR: the tool title in large orange text
• Subtitle: Extract specific fields from XML files & export to CSV, JSON & Excel
• Heart icon (top right) - click to add this tool to your favorites
Extraction mode selector
Below the header, a collapsible section with two mode buttons:
• Simple (list icon) - "Direct field extraction" - default mode
• Advanced (filter icon) - "Conditional filtering" - for complex rules
Click the chevron button (top right of section) to collapse/expand the mode selector.
Two-column layout
The main area is split into two columns:
• Left column: File upload zone (drag & drop or click to browse)
• Right column: XML Structure Scanner (optional helper)
In Advanced mode, the scanner is hidden and file upload takes full width.
Fields to extract section
Below the two columns (Simple mode):
• Section label: "Fields to extract" with action buttons
• Action buttons: Add parent segment, Add field, Load demo, Clear all
• Case Sensitive checkbox: toggle case-sensitive field matching
• Parent segments: hierarchical containers with nested fields
Extract button
The orange "EXTRACT DATA" button at the bottom processes all uploaded files with your defined fields/rules. Click this after setting up your extraction configuration.
Results section
After extraction, a results table appears showing:
• File name column
• Row # column (if repeating elements)
• Extracted field columns
• Filter inputs for each column
• Download and fullscreen buttons
Analytics section
At the bottom, Extraction analytics shows:
• Files processed: total XML files processed
• Fields extracted: total fields extracted
• Time saved: estimated manual time saved
• Reset button: clear all statistics
Simple vs Advanced mode Mode Selection
The tool offers two extraction modes for different use cases. Choose based on your needs.
Simple mode (default)
Use Simple mode when:
• You know the exact field names to extract
• You want to extract from specific parent elements
• You need to handle repeating elements
• Your XML has a hierarchical structure
Features:
• Hierarchical parent segments (up to 3 levels deep)
• "Extract all repetitions" checkbox for repeating elements
• XML Structure Scanner to auto-detect fields
• Case-sensitive toggle
Advanced mode
Use Advanced mode when:
• You need to filter data by conditions
• You want to extract only rows matching specific criteria
• You need operators like equals, contains, range
• You're doing complex data extraction
Features:
• Parent element lookup (required)
• Multiple conditions with 6 operators
• Target field specification
• Multiple extraction rules
Switching modes
Click the mode button (Simple or Advanced) to switch. The active mode is visually highlighted. Each mode maintains its own:
• Field/rule configuration
• Extraction results
Switching modes preserves your data in each mode - you can switch back and forth.
Mode selector collapse
Click the chevron icon (top right of "Extraction mode" section) to collapse the mode selector. This saves screen space once you've chosen your mode. Click again to expand.
Uploading XML files File Input
Upload XML files using drag & drop or file browser. The tool supports batch processing: upload multiple files at once.
Drop zone (initial state)
When no files are uploaded, you'll see a large drop zone:
• Cloud upload icon (orange)
• Drop XML files here: main text
• or click to browse: subtitle
• SELECT FILES button (orange)
• XML badge - shows accepted format
Drag and drop
To upload via drag & drop:
1. Drag XML files from your file explorer
2. A global overlay appears: "Drop files to add them"
3. Release to upload
4. Files are validated (only .xml accepted)
You can drag files from anywhere on the page: a full-screen overlay appears.
Click to browse
To upload via file browser:
1. Click anywhere on the drop zone, OR
2. Click the "SELECT FILES" button
3. File dialog opens
4. Select one or more .xml files
5. Click "Open"
Files list (after upload)
After uploading, the drop zone transforms into a files list:
• Header row: "X XML file(s) selected" + "Clear All" button
• Compact add zone: "Drop more files or click to browse"
• File items: each showing:
- File icon (code icon)
- File name
- File size (kB or MB)
- Remove button (trash icon)
Adding more files
After initial upload, you can add more files:
• Click the compact add zone at the top
• Or drag & drop additional files anywhere
New files are added to the existing list.
Removing files
To remove a single file:
• Click the trash icon on the file item
To remove all files:
• Click "Clear All" button in the header
• Drop zone returns to initial state
XML structure scanner Auto-Detect
The XML Structure Scanner analyzes sample XML and automatically creates the field extraction structure. This is the fastest way to set up extraction for complex XML.
Scanner location
In Simple mode, the scanner is in the right column next to the file upload area:
• Label: "XML Structure scanner" with "(Optional - helps identify fields)"
• Textarea: paste your sample XML here
• "SCAN STRUCTURE" button (orange)
Paste sample XML
Paste a sample of your XML into the textarea. It doesn't need to be the complete file - just enough to show the structure. For example:
<Order>
  <Header>
    <OrderID>123</OrderID>
    <Date>2026-01-15</Date>
  </Header>
  <Items>
    <Item>
      <ProductID>ABC</ProductID>
      <Quantity>5</Quantity>
    </Item>
  </Items>
</Order>
Click SCAN STRUCTURE
Click the orange "SCAN STRUCTURE" button. The scanner:
1. Parses the XML
2. Identifies all parent elements (containers)
3. Identifies all leaf elements (fields with values)
4. Detects repeating elements (multiple children with same name)
5. Builds hierarchical structure
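The scanning steps above can be sketched roughly as follows. This is a minimal Python illustration using only the standard library, not the tool's actual browser code; the function name `scan_structure` and the dictionary layout are assumptions made for this example. The sample adds a second `<Item>` so that repetition detection has something to find.

```python
import xml.etree.ElementTree as ET
from collections import Counter

SAMPLE = """<Order>
  <Header><OrderID>123</OrderID><Date>2026-01-15</Date></Header>
  <Items>
    <Item><ProductID>ABC</ProductID><Quantity>5</Quantity></Item>
    <Item><ProductID>DEF</ProductID><Quantity>2</Quantity></Item>
  </Items>
</Order>"""


def scan_structure(element):
    """Describe one element as a nested dict (hypothetical layout)."""
    children = list(element)
    counts = Counter(child.tag for child in children)
    node = {"name": element.tag, "repeating": False, "fields": [], "parents": []}
    seen = set()
    for child in children:
        if child.tag in seen:
            continue  # describe each distinct child tag only once
        seen.add(child.tag)
        if len(child) == 0:
            # Leaf element: it holds a value, so it becomes a field.
            node["fields"].append(child.tag)
        else:
            # Container element: it becomes a nested parent segment.
            sub = scan_structure(child)
            sub["repeating"] = counts[child.tag] > 1
            node["parents"].append(sub)
    return node


structure = scan_structure(ET.fromstring(SAMPLE))
```

Here `structure` describes `Order` with two nested parents, `Header` and `Items`, and the `Item` segment inside `Items` is flagged as repeating.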
Auto-generated structure
After scanning, the "Fields to extract" section is populated with:
• Parent segments for each container element
• Nested segments preserving the hierarchy
• Fields for each leaf element (value-containing tags)
• Repeating checkbox auto-checked for detected repetitions
Progress indicator
For large XML samples, a progress notification shows:
• "Scanning structure..."
• Progress bar (percentage)
• "Detecting XML elements..."
This prevents UI freeze on complex XML.
Confirmation dialog
If you already have fields defined, scanning will ask:
• "Clear all fields?": because scanning replaces existing structure
• Click OK to proceed and replace
• Click Cancel to keep existing fields
Simple mode in detail Direct Extraction
Simple mode extracts fields by name, organized in hierarchical parent segments. Here's every element explained.
Section header
The "Fields to extract" section header contains:
• List icon + "Fields to extract" label
• Case Sensitive checkbox: toggle case matching
• Folder+ button: add new parent segment
• Plus button: add new field (to root level)
• Flask button: load demo data structure
• Trash button: clear all fields
Default structure
When you first load the tool, there's a default structure:
• One parent segment named "UNNAMED"
• One empty field input inside it
This is the minimal structure - you always need at least one parent segment.
Adding fields
To add a field:
1. Click the plus (+) button in a parent segment's header, OR
2. Click the global plus button in the section header
A new empty field input appears. Type the XML element name to extract (e.g., "OrderID", "CustomerName").
Field input behavior
Each field input:
• Placeholder: "Enter XML element name (e.g., ID, NAME, ITEM, QTY, DATE)"
• Auto-focus: cursor automatically placed in new fields
• Enter key: pressing Enter triggers extraction
• Remove button (X) - delete this field
Case Sensitive toggle
The Case Sensitive checkbox controls field name matching:
• Unchecked (default): "orderid" matches "OrderID", "ORDERID", etc.
• Checked: exact case match required
Case-insensitive mode is useful when XML case varies between files.
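To illustrate the difference between the two settings, here is a small Python sketch. The helper `find_field` is hypothetical and only mirrors the matching behavior described above; it is not taken from the tool.

```python
import xml.etree.ElementTree as ET


def find_field(parent, name, case_sensitive=False):
    """Return the text of the first descendant whose tag matches name, else None."""
    for element in parent.iter():
        if element is parent:
            continue  # skip the parent itself, look only at descendants
        if element.tag == name:
            return element.text
        if not case_sensitive and element.tag.lower() == name.lower():
            return element.text
    return None


doc = ET.fromstring("<Order><OrderID>123</OrderID></Order>")
loose = find_field(doc, "orderid")                        # checkbox unchecked
strict = find_field(doc, "orderid", case_sensitive=True)  # checkbox checked
```

With the checkbox unchecked, "orderid" finds the `OrderID` element; with it checked, the same name finds nothing.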
Load demo data
Click the flask button to load a sample structure:
• Parent: "Parent_Example"
- Field (empty)
- Nested: "Nested_Example" with 1 field
- Nested: "Nested_Example2"
- Nested: "Nested_Example3" with 1 field
This demonstrates the 3-level nesting capability.
Clear all fields
Click the trash button to clear everything:
• Confirmation dialog appears (if fields have data)
• All parent segments and fields are removed
• Default structure (one UNNAMED parent) is recreated
Parent segments & nesting Hierarchy
Parent segments represent container elements in your XML. They can be nested up to 3 levels deep to match complex XML hierarchies.
Parent segment anatomy
Each parent segment shows:
• Collapse button (chevron) - expand/collapse contents
• Question mark icon: hover for help tooltip
• Pencil icon: click to edit segment name
• Segment name (editable input) - the XML element name
• "Extract all repetitions" checkbox: for repeating elements
• Folder+ button: add nested parent segment
• Plus button: add field to this segment
• Trash button: remove this segment
Adding parent segment
To add a top-level parent:
1. Click the folder+ button in the section header
2. New segment named "UNNAMED" appears
3. Name input is auto-focused and selected
4. Type the parent element name (e.g., "Order", "Customer")
5. Press Enter or click outside to confirm
Adding nested parent
To add a nested (child) parent:
1. Find the parent you want to nest inside
2. Click its folder+ button
3. New nested segment appears inside
4. Type the nested element name
Nested segments are visually indented and have a darker background.
Three nesting levels
The tool supports 3 visual nesting levels:
• Level 0: top-level parents (lightest background)
• Level 1: first nested level (slightly darker)
• Level 2: second nested level (darkest)
You can nest deeper, but visual distinction maxes at 3 levels.
Editing segment name
To rename a parent segment:
1. Click the pencil icon, OR
2. Click directly on the name text
3. Input becomes editable (selected)
4. Type new name
5. Press Enter to confirm, OR
6. Press Escape to cancel (reverts to original)
7. Or just click outside
Collapse/expand
To collapse a parent segment:
1. Click the chevron button (left of name)
2. Contents hide, chevron rotates right
3. Click again to expand
Useful for managing complex structures with many levels.
Removing parent segment
To remove a parent segment:
1. Click the trash button on the segment
2. Confirmation dialog: "Remove this parent segment and all its fields?"
3. Click OK to confirm
Note: You cannot remove the last top-level parent, because the tool always needs at least one.
Parent help tooltip
Hover over the question mark icon to see:
"Parent Segment: A parent element (e.g., <ITEM>) is a container in your XML that groups related fields together. When you define a parent segment, all fields within it will be extracted as a single row. Enable 'Extract all repetitions' to create a row for each occurrence of this parent element in your XML file."
For XML like <Order><Items><Item>, create parents: Order → Items → Item (nested inside each other).
Extract all repetitions Repeating Data
The "Extract all repetitions" checkbox is the key to extracting data from XML with multiple similar elements (like multiple line items in an order).
What are repeating elements?
Many XML files have repeating structures:
<Order>
  <Item><ProductID>A1</ProductID><Qty>5</Qty></Item>
  <Item><ProductID>B2</ProductID><Qty>3</Qty></Item>
  <Item><ProductID>C3</ProductID><Qty>7</Qty></Item>
</Order>
Here, <Item> repeats 3 times. Each needs to become a separate row.
Checkbox location
The "Extract all repetitions" checkbox is in each parent segment's header, between the name and the action buttons. It consists of a checkbox followed by the label "Extract all repetitions".
Checkbox OFF (default)
When unchecked (default):
• Only the first occurrence of the parent element is used
• Result: one row per file
• Use when the element appears only once, or you only want the first
Checkbox ON
When checked:
• All occurrences of the parent element are processed
• Result: multiple rows per file (one per occurrence)
• A "Row #" column appears in results
• Use for line items, addresses, contacts, etc.
Row numbering
With repetitions enabled, results include:
• File name column - same value for all rows from one file
• Row # column - 1, 2, 3... for each occurrence
• Field columns: values from that specific occurrence
This lets you identify which repetition each row came from.
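The effect of the checkbox on the resulting rows can be sketched in Python as follows. This is an illustration under assumptions, not the tool's implementation: the function `extract_rows` and the file name `order.xml` are made up for the example.

```python
import xml.etree.ElementTree as ET

XML = """<Order>
  <Item><ProductID>A1</ProductID><Qty>5</Qty></Item>
  <Item><ProductID>B2</ProductID><Qty>3</Qty></Item>
  <Item><ProductID>C3</ProductID><Qty>7</Qty></Item>
</Order>"""


def extract_rows(root, parent, fields, all_repetitions=False, file_name="order.xml"):
    """Build result rows for one parent segment."""
    occurrences = list(root.iter(parent))
    if not all_repetitions:
        occurrences = occurrences[:1]  # checkbox OFF: first occurrence only
    rows = []
    for number, occurrence in enumerate(occurrences, start=1):
        row = {"File name": file_name}
        if all_repetitions:
            row["Row #"] = number  # checkbox ON: number each occurrence
        for field in fields:
            row[field] = occurrence.findtext(field, default="")
        rows.append(row)
    return rows


root = ET.fromstring(XML)
single = extract_rows(root, "Item", ["ProductID", "Qty"])
repeated = extract_rows(root, "Item", ["ProductID", "Qty"], all_repetitions=True)
```

`single` holds one row for the first `<Item>`, while `repeated` holds three rows numbered 1 to 3, one per `<Item>`.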
Multiple repeating parents
You can enable repetitions on multiple parent segments. The tool finds the maximum count across all repeating parents and creates that many rows. Fields from parents with fewer occurrences show empty values for extra rows.
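A rough Python sketch of this row-alignment rule is shown below. The element names (`Partner`, `Address`, `Contact`) and the function `merge_repeating` are invented for the example; only the "maximum count, pad with empty values" behavior comes from the description above.

```python
import xml.etree.ElementTree as ET

XML = """<Partner>
  <Address><City>Prague</City></Address>
  <Address><City>Brno</City></Address>
  <Address><City>Ostrava</City></Address>
  <Contact><Email>a@example.com</Email></Contact>
</Partner>"""


def merge_repeating(root, parents):
    """parents maps each repeating parent tag to the field names inside it."""
    found = {tag: list(root.iter(tag)) for tag in parents}
    # The number of rows is the largest occurrence count of any parent.
    row_count = max((len(items) for items in found.values()), default=0)
    rows = []
    for index in range(row_count):
        row = {"Row #": index + 1}
        for tag, fields in parents.items():
            items = found[tag]
            occurrence = items[index] if index < len(items) else None
            for field in fields:
                if occurrence is None:
                    row[field] = ""  # this parent has fewer occurrences
                else:
                    row[field] = occurrence.findtext(field, default="")
        rows.append(row)
    return rows


rows = merge_repeating(ET.fromstring(XML), {"Address": ["City"], "Contact": ["Email"]})
```

Three `Address` elements and one `Contact` element give three rows; the `Email` column is filled in the first row and empty in the other two.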
Nested repeating elements
For deeply nested repeating elements:
1. Create the parent hierarchy matching your XML
2. Enable "Extract all repetitions" on the innermost parent that repeats
3. Fields outside that parent use the first occurrence
4. Fields inside get their value from each repetition
Advanced mode in detail Conditional Rules
Advanced mode lets you create extraction rules with conditions. Extract data only when specific criteria are met.
Switching to Advanced mode
Click the "Advanced" button in the mode selector. The button shows:
• Filter icon
• "Advanced" title
• "Conditional filtering" description
The interface changes - XML Scanner hides, file upload expands to full width.
Rules section
The "Conditional extraction rules" section appears with:
• Filter icon + section label
• Plus button: add new extraction rule
• Rule containers: each rule in its own card
Rule container structure
Each rule container has:
• Header: "Extraction rule" + red trash button
• Parent element input: where to look in XML
• Conditions section: optional filters
• Target field input: what to extract
Parent element (required)
The "Look within parent element:" field (marked with red asterisk):
• Placeholder: "e.g., E1ADRM1 (leave empty to search entire document)"
• Required: extraction won't run without it
• Specifies which XML element to search within
Conditions section
The "With conditions (optional):" section:
• Condition cards: each with field, operator, value
• "+ ADD CONDITION" button: add more conditions
• Multiple conditions = ALL must match (AND logic)
Target field
The "Field to extract:" field (marked with "TARGET" badge):
• Placeholder: "Enter the XML field name to extract (e.g., NAME1)"
• This is the actual value you want in your results
Adding rules
Click the plus button in the section header to add another rule. Each rule extracts a different field (becomes a column). All rules are processed together for each file.
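As a rough mental model, one rule combines a parent element, optional conditions, and a target field. The Python sketch below illustrates that combination using only "equals" conditions. The function `apply_rule`, the sample IDoc fragment, and the choice to return every matching occurrence are assumptions made for this example, not documented behavior of the tool.

```python
import xml.etree.ElementTree as ET

IDOC = """<IDOC>
  <E1ADRM1><PARTNER_Q>AG</PARTNER_Q><NAME1>Alpha s.r.o.</NAME1></E1ADRM1>
  <E1ADRM1><PARTNER_Q>WE</PARTNER_Q><NAME1>Beta GmbH</NAME1></E1ADRM1>
</IDOC>"""


def apply_rule(root, parent, target, conditions=()):
    """conditions is a sequence of (field, expected) pairs compared with equals."""
    values = []
    for occurrence in root.iter(parent):
        # AND logic: every condition must hold inside this parent occurrence.
        if all(occurrence.findtext(field) == expected for field, expected in conditions):
            values.append(occurrence.findtext(target, default=""))
    return values


root = ET.fromstring(IDOC)
ship_to = apply_rule(root, "E1ADRM1", "NAME1", [("PARTNER_Q", "WE")])
everyone = apply_rule(root, "E1ADRM1", "NAME1")
```

With the condition in place only the second segment matches; without conditions both names are returned.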
Removing/clearing rules
Single rule: Click the red trash button on the rule
• If it's the last rule, it clears fields (not removed)
• Confirmation: "Clear all fields in this rule?"
Multiple rules: Each can be independently removed
Condition operators 6 Operators
Advanced mode conditions support 6 different operators for flexible filtering.
Condition card anatomy
Each condition card contains:
• Trash button (top right) - remove condition
• Field name input: the XML field to check
• Operator dropdown: how to compare
• Value input: what to compare against (or range inputs)
Operator: equals
equals (default)
• Exact match comparison
• Case-sensitive
• Example: Field "TYPE" equals "CUSTOMER"
• Matches only when field value is exactly "CUSTOMER"
Operator: not equals
not equals
• Inverse of equals
• Matches when field value is anything EXCEPT the specified value
• Example: Field "STATUS" not equals "DELETED"
• Extracts all non-deleted records
Operator: contains
contains
• Substring match
• Case-sensitive
• Example: Field "DESCRIPTION" contains "urgent"
• Matches "This is urgent!" and "urgent order", but not "URGENT order"
Operator: starts with
starts with
• Prefix match
• Case-sensitive
• Example: Field "CODE" starts with "PRD-"
• Matches "PRD-001", "PRD-ABC", but not "OLD-PRD-001"
Operator: ends with
ends with
• Suffix match
• Case-sensitive
• Example: Field "EMAIL" ends with "@company.com"
• Matches "user@company.com", not "user@other.com"
Operator: range
range
• Numeric range comparison
• When selected, the value input changes to two inputs: From and To
• Example: Field "QUANTITY" range 10 to 100
• Matches values >= 10 AND <= 100
• Leave From or To empty for open-ended range
Multiple conditions
Click "+ ADD CONDITION" to add more conditions to a rule. Multiple conditions use AND logic: ALL conditions must be true for extraction to happen. Example:
• Field "TYPE" equals "ORDER"
• AND Field "AMOUNT" range 1000 to 9999
• AND Field "STATUS" not equals "CANCELLED"
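The six operators and the AND combination can be sketched in Python like this. The sketch is illustrative only: the function names, the dictionary keys (`field`, `operator`, `value`, `from`, `to`), and the decision that a non-numeric value never matches a range are assumptions, not details confirmed by the tool.

```python
def matches(value, operator, expected=None, low=None, high=None):
    """Evaluate one condition against a field value (all text checks are case-sensitive)."""
    if operator == "equals":
        return value == expected
    if operator == "not equals":
        return value != expected
    if operator == "contains":
        return expected in value
    if operator == "starts with":
        return value.startswith(expected)
    if operator == "ends with":
        return value.endswith(expected)
    if operator == "range":
        try:
            number = float(value)
        except ValueError:
            return False  # assumption: non-numeric values never match a range
        if low is not None and number < low:
            return False
        if high is not None and number > high:
            return False
        return True  # an empty From or To leaves that side open-ended
    raise ValueError(f"unknown operator: {operator}")


def record_matches(record, conditions):
    """AND logic: every condition must be true."""
    return all(
        matches(record.get(c["field"], ""), c["operator"],
                c.get("value"), c.get("from"), c.get("to"))
        for c in conditions
    )


CONDITIONS = [
    {"field": "TYPE", "operator": "equals", "value": "ORDER"},
    {"field": "AMOUNT", "operator": "range", "from": 1000, "to": 9999},
    {"field": "STATUS", "operator": "not equals", "value": "CANCELLED"},
]
```

A record such as `{"TYPE": "ORDER", "AMOUNT": "2500", "STATUS": "OPEN"}` passes all three conditions, while changing the amount to "500" or the status to "CANCELLED" makes it fail.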
Results table features Interactive Table
After extraction, results appear in an interactive table with sorting, filtering, and resizing capabilities.
Table header
The "Extraction Results" section header shows:
• Table icon + "Extraction Results" label
• Download button: opens export modal
• Expand button: enters fullscreen focus mode
Column headers
The first row contains column headers:
• File name: always first
• Row #: only if "Extract all repetitions" was used
• Field columns: one for each extracted field
Each header shows the field name and sort icon (arrows).
Filter row
Below headers is a filter row with:
• One text input per column
• Placeholder: "Filter..."
• Type to filter that column in real-time
• Filters work with partial matches (contains)
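A minimal Python sketch of per-column "contains" filtering follows. The function `filter_rows` is hypothetical, and ignoring letter case is an assumption made here; the description above does not say whether the filter inputs are case-sensitive.

```python
def filter_rows(rows, filters):
    """Keep rows where every non-empty filter text occurs in that column's value."""
    def keep(row):
        return all(
            text.lower() in str(row.get(column, "")).lower()
            for column, text in filters.items()
            if text  # an empty filter input does not restrict anything
        )
    return [row for row in rows if keep(row)]


ROWS = [
    {"File name": "a.xml", "NAME1": "Alpha s.r.o."},
    {"File name": "b.xml", "NAME1": "Beta GmbH"},
]
filtered = filter_rows(ROWS, {"NAME1": "gmb", "File name": ""})
```

Typing "gmb" into the NAME1 filter keeps only the "Beta GmbH" row.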
Sorting columns
Click any column header to sort:
• First click: ascending (A-Z, 1-9) - up arrow icon
• Second click: descending (Z-A, 9-1) - down arrow icon
• Third click: original order - neutral arrow icon
Numeric columns sort numerically, text columns sort alphabetically.
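The difference between numeric and alphabetical ordering matters for values such as "9", "10", and "100". The Python sketch below shows one way to get that behavior; it decides per value rather than per column, which is a simplification, and the names `sort_key` and `sort_rows` are assumptions for this example.

```python
def sort_key(value):
    """Numbers sort by numeric value and come before text; text sorts alphabetically."""
    try:
        return (0, float(value), "")
    except ValueError:
        return (1, 0.0, value.lower())


def sort_rows(rows, column, descending=False):
    return sorted(rows, key=lambda row: sort_key(row[column]), reverse=descending)


ROWS = [{"Qty": "10"}, {"Qty": "9"}, {"Qty": "100"}]
ascending = [row["Qty"] for row in sort_rows(ROWS, "Qty")]
descending = [row["Qty"] for row in sort_rows(ROWS, "Qty", descending=True)]
```

A plain text sort would place "10" and "100" before "9"; the numeric key orders them 9, 10, 100.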
Resizing columns
Drag the right edge of any column header to resize:
• Cursor changes to resize cursor
• Drag left to shrink, right to expand
• Minimum width: 80px
• Works in both normal and fullscreen views
Data rows
Each row shows:
• File name: source XML file
• Row #: occurrence number (if applicable)
• Field values: extracted data
• "-": shown when field has no value (empty)
Scrolling
The table container is scrollable:
• Horizontal scroll for many columns
• Vertical scroll for many rows
• Headers stay fixed while scrolling rows
Focus mode (fullscreen) Fullscreen View
Focus mode expands the results table to fullscreen for easier data review and manipulation.
Entering focus mode
Click the expand button (arrows icon) in the results section header. The table expands to cover the entire screen with:
• Dark overlay background
• Centered table view
• Header with "Focus mode" title
Focus mode header
The fullscreen header shows:
• Expand icon + "Focus mode" title (left)
• Download button: same as normal view (right)
• Vertical separator line
• Compress button: exit fullscreen (right)
Table in focus mode
The table behaves identically to normal view:
• Same columns: synced from main table
• Same filters: values sync both ways
• Same sorting: click headers to sort
• Same resizing: drag edges to resize
Synced state
Focus mode and normal view stay in sync:
• Filter text typed in fullscreen appears in normal view
• Sort order syncs both ways
• Column widths sync both ways
• Changes persist when exiting
Exiting focus mode
Exit focus mode by:
• Click the compress button (top right)
• Press Escape key
Scroll position is restored to where you were before entering.
Export options 3 Formats
Export your extraction results in 3 formats: CSV, JSON, or Excel (XLSX).
Opening export modal
Click the download button in the results header (or focus mode header). A modal appears with:
• "Export results" title
• CSV separator options
• "DOWNLOAD CSV" button
• "Other formats" section with JSON and Excel
CSV export
CSV (Comma-Separated Values):
• Separator options: comma (,), semicolon (;), Tab, pipe (|)
• Select your preferred separator
• Click "DOWNLOAD CSV"
• File downloads as xml_extraction_YYYYMMDD_HHMMSS.csv
• Includes UTF-8 BOM for Excel compatibility
CSV separator choice
Which separator to use?
• Comma (,): universal standard, may conflict with data containing commas
• Semicolon (;): European standard, safer if data has commas
• Tab: cleanest separation, good for copy-paste
• Pipe (|): rarely appears in data, safest choice
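To show why the separator choice and the UTF-8 BOM matter, here is a small Python sketch of the export. It is not the tool's code: `build_csv`, `csv_file_name`, and the separator keys are names invented for the example, while the file name pattern follows the one given above.

```python
import csv
import io
from datetime import datetime

SEPARATORS = {"comma": ",", "semicolon": ";", "tab": "\t", "pipe": "|"}


def build_csv(rows, separator="semicolon"):
    """Return CSV text with a leading UTF-8 BOM so Excel detects the encoding."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(rows[0]),
        delimiter=SEPARATORS[separator],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return "\ufeff" + buffer.getvalue()


def csv_file_name(now):
    """Format the download name as xml_extraction_YYYYMMDD_HHMMSS.csv."""
    return now.strftime("xml_extraction_%Y%m%d_%H%M%S.csv")


ROWS = [{"File name": "a.xml", "NAME1": "Novák, Jan"}]
with_semicolon = build_csv(ROWS, "semicolon")
with_comma = build_csv(ROWS, "comma")
```

With a semicolon separator the value "Novák, Jan" is written as-is; with a comma separator the writer has to wrap it in quotes, which is the conflict mentioned above.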
JSON export
JSON (JavaScript Object Notation):
Click the "JSON" button to export as JSON. File includes:
• metadata object: generated timestamp, tool name, record count, field list
• results array: all extraction results as objects
Downloads as xml-extraction-results-TIMESTAMP.json
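The overall shape of such a file might look like the output of the Python sketch below. Only the two top-level parts (a metadata object and a results array) come from the description above; the exact key names inside `metadata` and the function `build_export` are assumptions.

```python
import json
from datetime import datetime, timezone


def build_export(rows, tool_name="XML Field Extractor"):
    """Wrap extraction rows in a metadata + results structure (key names assumed)."""
    fields = list(rows[0]) if rows else []
    return {
        "metadata": {
            "generated": datetime.now(timezone.utc).isoformat(),
            "tool": tool_name,
            "recordCount": len(rows),
            "fields": fields,
        },
        "results": rows,
    }


ROWS = [{"File name": "a.xml", "NAME1": "Alpha s.r.o."}]
export = build_export(ROWS)
text = json.dumps(export, ensure_ascii=False, indent=2)
```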
Excel export
Excel (XLSX):
Click the "EXCEL" button to export as Excel. Creates:
• Single worksheet named "Extraction Results"
• Headers in first row
• Auto-sized columns based on content
• Maximum column width: 50 characters
Downloads as xml-extraction-results-TIMESTAMP.xlsx
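Auto-sizing with a cap can be pictured with the following Python sketch, which computes a width per column from the longest header or value and limits it to 50 characters. The function `column_widths` is hypothetical, and writing the actual .xlsx file is left out.

```python
def column_widths(rows, max_width=50):
    """Width per column = longest header or value, capped at max_width characters."""
    widths = {}
    for row in rows:
        for column, value in row.items():
            longest = max(len(str(column)), len(str(value)))
            widths[column] = min(max_width, max(widths.get(column, 0), longest))
    return widths


widths = column_widths([{"NAME1": "x" * 80, "Qty": "5"}])
```

An 80-character value is capped at 50, while the short `Qty` column is only as wide as its header.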
Closing the modal
Close the export modal by:
• Click the X button (top right)
• Click outside the modal
• Press Escape key
• Export also closes the modal automatically
Encoding & SAP support 6 Encodings
The tool automatically detects and handles 6 different encodings, with special support for SAP IDoc XML files.
Automatic encoding detection
When processing XML files, the tool:
1. First reads the file as UTF-8
2. Looks for the encoding declaration in XML header
3. Re-reads with the correct encoding if different
Example: <?xml version="1.0" encoding="windows-1250"?>
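The three detection steps can be sketched in Python as shown below. This is an illustration under assumptions rather than the tool's code: `decode_xml` is a made-up name, and the sketch does not cover UTF-16 input, which would need a byte-order-mark check before the first decode.

```python
import re

DECLARATION = re.compile(r"<\?xml[^>]*encoding=[\"']([\w.-]+)[\"']")


def decode_xml(raw: bytes) -> str:
    """Decode as UTF-8 first, then re-decode if the XML header names another encoding."""
    # Step 1: a tolerant UTF-8 read, enough to see the ASCII-only XML declaration.
    text = raw.decode("utf-8", errors="replace")
    # Step 2: look for encoding="..." near the start of the file.
    match = DECLARATION.search(text[:200])
    declared = match.group(1).lower() if match else "utf-8"
    if declared in ("utf-8", "utf8"):
        return text
    # Step 3: re-read the original bytes with the declared encoding.
    return raw.decode(declared)


raw = '<?xml version="1.0" encoding="windows-1250"?><NAME1>Dvořák</NAME1>'.encode("cp1250")
decoded = decode_xml(raw)
```

Read as UTF-8 alone, the Czech characters would be replaced by placeholder symbols; after the second pass `decoded` contains "Dvořák" intact.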
Supported encodings
6 encodings supported:
• UTF-8: universal default
• Windows-1250: Central European (Czech, Polish, etc.)
• Windows-1252: Western European
• ISO-8859-2: Latin-2 (Central European)
• ISO-8859-1: Latin-1 (Western European)
• UTF-16: Unicode
SAP IDoc support
The tool is optimized for SAP IDoc XML structures:
• Common elements: E1ADRM1, E1EDK01, E1EDP01, E1MARAM
• Typical fields: PARTNER_Q, NAME1, MATNR, MENGE
• Windows-1250 encoding (common in Czech SAP systems)
• Nested segment structures
SAP extraction example
To extract SAP IDoc addresses:
1. Upload IDoc XML files
2. Use XML Scanner to detect structure
3. Parent segment: E1ADRM1
4. Enable "Extract all repetitions"
5. Fields: NAME1, STREET, CITY, POSTL_COD1
6. Click EXTRACT DATA
Character handling
Special characters are properly handled:
• Czech: ě, š, č, ř, ž, ý, á, í, é, ů, ú
• Polish: ą, ć, ę, ł, ń, ó, ś, ź, ż
• German: ä, ö, ü, ß
• French: é, è, ê, à, ç
Exported CSV includes UTF-8 BOM for Excel compatibility.
Extraction analytics Statistics
Track your extraction activity with the analytics section at the bottom of the tool.
Analytics location
Below the main tool area, the "Extraction analytics" section shows:
• Chart icon + "Extraction analytics" header
• Reset button: clear all statistics
• Three metric cards
Files processed
FILES PROCESSED card:
• File icon (blue)
• Count: total XML files processed
• Increments with each successful file extraction
• Tooltip: "Total number of XML files processed"
Fields extracted
FIELDS EXTRACTED card:
• List icon (blue)
• Count: total fields extracted across all files
• Counts each field value extracted
• Tooltip: "Total fields extracted from XML"
Time saved
TIME SAVED card:
• Clock icon (blue)
• Duration: estimated manual time saved
• Calculation: ~2 minutes per file (manual extraction estimate)
• Shows as "Xmin" or "Xh Xmin"
• Tooltip: "Time saved using this tool"
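As a worked example of the estimate, the Python sketch below multiplies the file count by 2 minutes and formats the result in the two display styles. The function `format_time_saved` is hypothetical, and how the tool displays an exact number of hours is not stated above, so the "1h 0min" form is an assumption.

```python
def format_time_saved(files_processed, minutes_per_file=2):
    """Estimate saved time at ~2 minutes per file and format it for display."""
    total_minutes = files_processed * minutes_per_file
    hours, minutes = divmod(total_minutes, 60)
    if hours:
        return f"{hours}h {minutes}min"
    return f"{minutes}min"


short = format_time_saved(5)    # 5 files * 2 min = 10 min
long = format_time_saved(45)    # 45 files * 2 min = 90 min
```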
Persistent storage
Analytics are stored in IndexedDB (browser storage):
• Persists across browser sessions
• Specific to this device/browser
• Not synced across devices
Reset statistics
Click the "Reset" button to clear all analytics:
• Confirmation dialog: "Reset all statistics? This cannot be undone."
• Click OK to confirm
• All counters reset to 0
Mobile action buttons Mobile UI
On mobile devices (768px and below), action buttons appear in a fixed bottom bar for easy access.
Bottom bar layout
The mobile bottom bar shows:
• Action group 1: Parent, Field, Demo buttons
• Divider
• Action group 2: Extract, Download, Clear buttons
• Divider
• Favorite button (heart)
Action buttons
6 action buttons:
• Folder+: add parent segment
• Plus: add new field
• Flask: load demo data
• Play: extract data
• Download: export results
• Trash: clear all fields
Touch targets
Each button is 40x40px, sized as a touch target for mobile use. Buttons have:
• Rounded corners
• Border for visibility
• Active state feedback when tapped
Favorite button
The heart button (rightmost) toggles favorite status:
• Outline heart: not favorited
• Filled heart: favorited
• Tap to toggle
Tips & best practices
For XML like <Root><Level1><Level2>, create nested parents: Root → Level1 → Level2.
Frequently asked questions
With "Extract all repetitions" checked on a parent that matches 5 <Item> elements, you get 5 rows. Without it, only the first occurrence is used.