Whether IATA Cargo-IMP, AHM, SSIM, SWIFT, EDIFACT, X12, EDIFICE, EANCOM, TRADACOMS, GENCOD, IEF, SPEC-2000... just name them: you can transform all these Electronic Data Interchange standards with this tool, both ways: EDI to XML (with the Parser and optionally XSLT) and XML to EDI (with XSLT).
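
For the XML-to-EDI direction, any standard XSLT processor will do. The sketch below uses the JAXP API bundled with the Java runtime; the stylesheet and file names are invented for illustration, and the stylesheet itself is assumed to serialize the EDI segments as plain text.

```java
import javax.xml.transform.*;
import javax.xml.transform.stream.*;

public class XmlToEdi {
    public static void main(String[] args) throws Exception {
        // Hypothetical file names; any XSLT processor reachable through JAXP will do.
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource("order-to-edifact.xsl"));
        // EDI output is plain text, not XML, so switch the output method accordingly.
        t.setOutputProperty(OutputKeys.METHOD, "text");
        t.transform(new StreamSource("order.xml"), new StreamResult("order.edi"));
    }
}
```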

Regular expressions (regex or regexp for short) do look screwy, that's a fact. The reason, for me, is that you cannot expect to guess their meaning the way you can intuitively read program code in a language you do not master too well. Any guessing attempt will be defeated until you are explicitly told about the very special meaning of a bunch of commonplace characters like . * + ? | ( [ { \ ^ $ and obviously the closing ) ] } .

Count them! We have only 11 special characters, no more. If you know them, you will understand and develop 95% of the regular expressions that you will need. You will spend an extra 5 minutes on the others, usually looking up the meaning of a seldom-used flag in the reference documentation.
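
As a quick illustration (not taken from the tutorial itself), here is how a couple of those special characters behave in Java's java.util.regex; note that the backslash must be doubled inside Java string literals.

```java
import java.util.regex.*;

public class SpecialChars {
    public static void main(String[] args) {
        // '.' is a metacharacter: it matches any single character, not just a dot.
        System.out.println("3X14".matches("3.14"));     // true  - '.' matched the 'X'
        // Escaping with a backslash makes it literal (the backslash is doubled in Java strings).
        System.out.println("3X14".matches("3\\.14"));   // false
        System.out.println("3.14".matches("3\\.14"));   // true
        // '+' means "one or more of the preceding item", '[0-9]' is a character class.
        Matcher m = Pattern.compile("[0-9]+").matcher("ORDER 4711");
        if (m.find()) System.out.println(m.group());     // prints 4711
    }
}
```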

How much Un-structure can you afford to handle?

Whereas most parsers require well-defined delimiters, tags, or fixed-size fields to identify data fields and carry out transformations, the parser that we are about to present is capable of extracting data from loosely structured, look-alike material. Its mapping power is directly related to that of regular expressions, which are the state of the art in pattern matching. Data that resists pattern matching by being even less structured would call for concepts like ontologies and semantic nets, which are not in scope here.

There is a significant gap between a regular expression software library that provides just the raw capability to match a pattern, as sophisticated as it may be, and the final production of an XML document. That gap is precisely the added value of the reverseXSL software: regular expressions are organized to conduct four tasks (identify, cut, extract, and validate) in turn, and recursively, until reaching the atoms of data that must be output into your XML document.
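
To make the idea concrete, here is a minimal sketch of that identify-cut-extract-validate cycle applied to a single EDIFACT-like segment. It is not the reverseXSL API: the segment layout, the patterns, and the XML element names are invented for illustration only.

```java
import java.util.regex.*;

// A minimal conceptual sketch of the identify / cut / extract / validate cycle.
// The message layout, element names and patterns below are invented for illustration.
public class FourTasks {
    public static void main(String[] args) {
        String line = "NAD+BY+5412345000013::9";          // an EDIFACT-like segment

        // 1. IDENTIFY: does this piece of data match the pattern of a NAD segment?
        if (!line.matches("NAD\\+.*"))
            throw new IllegalArgumentException("not a NAD segment");

        // 2. CUT: split the segment into its component data elements.
        String[] elements = line.split("\\+");            // ["NAD", "BY", "5412345000013::9"]

        // 3. EXTRACT: pull the atoms of data out of the composite element.
        Matcher m = Pattern.compile("([0-9]+)::([0-9]+)").matcher(elements[2]);
        if (!m.matches())
            throw new IllegalArgumentException("unexpected composite layout");
        String partyId = m.group(1);
        String codeListAgency = m.group(2);

        // 4. VALIDATE: check the extracted atoms against their expected format.
        if (!partyId.matches("[0-9]{13}"))
            throw new IllegalArgumentException("party id must be 13 digits");

        // Finally, output the atoms as XML (plain concatenation keeps the sketch short).
        System.out.println("<Party qualifier=\"" + elements[1] + "\">");
        System.out.println("  <Id agency=\"" + codeListAgency + "\">" + partyId + "</Id>");
        System.out.println("</Party>");
    }
}
```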

Please ensure that you have read the Accelerated regex tutorial - Part 1 above before continuing.

The tutorial has been divided into two parts for ease of learning. This part is quite a bit smaller than Part 1 but could take as much time: it is dedicated to the science of combining patterns.

CSV exports from MS-Excel®, as well as CSV-like files from other sources, seldom look like neatly repeated sequences of comma-separated lines featuring the same positional elements from the first to the last line in the file. For instance, MS-Excel® ensures that every row yields one line with a constant number of comma-separated fields, but the meaning of the fields in each row often varies with its relative position in the sheet. Typically, a sheet containing a variable-length table features three zones: a set of heading lines that you will also want to map, a body of repeated table rows in variable quantity, and a trailer with totals and other calculated values. In addition, double quotes pop up around numbers and strings depending on the value being framed; their presence is not fixed. Similar considerations apply to other CSV brands.
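
As an illustration of what such a mapping must cope with, the sketch below uses a single regular expression to pick fields with or without surrounding double quotes, and a simple pattern test to tell the heading, body, and trailer zones apart. The line layouts and zone markers are invented; a real description would be driven by the actual sheet.

```java
import java.util.regex.*;

// A sketch of telling apart the three zones of an exported sheet and of coping
// with double quotes that appear only around some values. Layouts are invented.
public class CsvZones {
    // A field is either a double-quoted string (embedded quotes doubled) or bare text up to the next comma.
    static final Pattern FIELD = Pattern.compile("\"((?:[^\"]|\"\")*)\"|([^,]*)");

    public static void main(String[] args) {
        String[] lines = {
            "Invoice,\"ACME, Inc.\",2024-05-01",      // heading zone
            "Widget,10,\"1,500.00\"",                 // body row (quoted amount)
            "Widget,5,750.00",                        // body row (no quotes this time)
            "TOTAL,,\"2,250.00\""                     // trailer zone
        };
        for (String line : lines) {
            String zone = line.startsWith("TOTAL")   ? "trailer"
                        : line.startsWith("Invoice") ? "heading" : "body";
            System.out.print(zone + ":");
            Matcher m = FIELD.matcher(line);
            int pos = 0;
            while (pos <= line.length() && m.find(pos)) {
                String value = (m.group(1) != null)
                        ? m.group(1).replace("\"\"", "\"")   // unquote and unescape
                        : m.group(2);                        // bare field as-is
                System.out.print(" [" + value + "]");
                pos = m.end() + 1;                           // skip the comma after the field
            }
            System.out.println();
        }
    }
}
```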

How would you like to handle this?