Stop developing ad-hoc non-XML format parsers

Think about the time it takes you to understand the subtleties of a legacy data exchange format. If, while you digest the syntax, you also formalize your understanding into a ReverseXSL DEF file, you are done! The extra effort is marginal compared with the in-depth reading of the message specs alone.

Next, all that remains is testing against message samples. If the target XML format is also imposed and does require rearranging some pieces of information, you will add an XML-to-XML mapping developed with your favorite XSLT editor.

No, the ReverseXSL parser does not just replace delimiters with '<..>' and '</..>' to lure you into XML

Many light XML'izing approaches do just that! Tags become XML nodes, and delimited data field sequences yield positional tags like <1>, <2>, <3>, and so forth. Consequently:

You must undertake the development of an XML schema to validate the structure and to specify facets for data validation
The verification of interdependencies and conditions will require extra XSL code
You'll need a second XML schema to validate the final result
If you ever get an error, it will be very hard to trace it back to an offset in the original message, and the error messages themselves will not relate directly to the original message specifications
Some legacy syntaxes mix several data element delimiting conventions, making a dumb approach simply impractical. To illustrate, consider these message bits. In /K600/CMT60-70-100/22, there are implicit delimiters at the alpha-to-numeric transitions, i.e. between K and 600 and between CMT and 60, while the '-' separates the length, width and height fields. Further on, in /JFKAA1234/23NOV there is an obvious cut to make between 23 and NOV, but not between JFKAA and 1234; the proper cut is JFK (airport code, fixed 3 chars) and AA1234 (flight number). Then, in the same message, SCI/DFD45-24-263/A contains a '-' that is just part of the atom value DFD45-24-263 and not a delimiter. In FA111Y/25 1/FB/FA 4/FDA 6/LX545A/23MAR09, the space char is a delimiter, and 1/, 4/ and 6/ are optional data element tags that must be recognized as such, not data. In AMSQQQ 114/ABC AIRWAYS INC., by contrast, the first space is a delimiter but obviously the other spaces on the line are not. Moreover, AMSQQQ is not a segment tag but the combination of two 3-letter city codes. 114 is also a code, so if you want to match this segment, you must identify a pattern of 6 letters followed by a single space and 3 digits. ReverseXSL handles all these cases cleanly. I borrowed these examples from IATA messages, but SWIFT and other formats are similar. With such flexibility, you can effectively transform printable outputs into XML, which is out of reach of dumb XML'izers.
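To make these cutting rules concrete, here is a minimal sketch of them as plain Python regular expressions. It is neither ReverseXSL DEF syntax nor how ReverseXSL works internally. The patterns are assumptions derived only from the sample bits above; for instance, they assume a flight number is a 2-character airline code plus 1 to 4 digits. They show the kind of hand-written cutting logic that a declarative definition is meant to replace.

```python
import re

# Illustrative patterns only, assumed from the sample bits (not ReverseXSL DEF syntax).

# /K600/CMT60-70-100/22: implicit cuts at letter-to-digit transitions,
# '-' separating length, width and height.
DIMENSIONS = re.compile(r"/([A-Z]+)(\d+)/([A-Z]+)(\d+)-(\d+)-(\d+)/(\d+)")

# /JFKAA1234/23NOV: a fixed 3-letter airport code, then the flight number
# (assumed: 2-character airline code + 1 to 4 digits); 23 and NOV are split
# at the digit-to-letter transition.
FLIGHT = re.compile(r"/([A-Z]{3})([A-Z0-9]{2}\d{1,4})/(\d{2})([A-Z]{3})")

# SCI/DFD45-24-263/A: here '-' belongs to the value, only '/' delimits.
SCI = re.compile(r"SCI/([^/]+)/([A-Z])")

# AMSQQQ 114/ABC AIRWAYS INC.: two 3-letter city codes, exactly one space,
# a 3-digit code, then free text in which spaces are data.
HEADER = re.compile(r"([A-Z]{3})([A-Z]{3}) (\d{3})/(.+)")


def split_tagged(line: str) -> tuple[str, dict[str, str]]:
    """FA111Y/25 1/FB/FA 4/FDA 6/LX545A/23MAR09: spaces delimit, and a
    leading 'n/' marks an optional data element tag, not data."""
    head, *rest = line.split(" ")
    tagged = {}
    for token in rest:
        m = re.fullmatch(r"(\d)/(.+)", token)
        if m is None:
            raise ValueError(f"unexpected token {token!r}")
        tagged[m.group(1)] = m.group(2)
    return head, tagged


if __name__ == "__main__":
    print(DIMENSIONS.fullmatch("/K600/CMT60-70-100/22").groups())
    print(FLIGHT.fullmatch("/JFKAA1234/23NOV").groups())
    print(SCI.fullmatch("SCI/DFD45-24-263/A").groups())
    print(HEADER.fullmatch("AMSQQQ 114/ABC AIRWAYS INC.").groups())
    print(split_tagged("FA111Y/25 1/FB/FA 4/FDA 6/LX545A/23MAR09"))
```

Each rule needs its own pattern, and none of them reports errors in terms of the message specification.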

Of course, you can add loads of string-handling code in XSL to sort this out, which brings you back to roughly the same effort as developing custom Java or C# code. Ooouups!

Produce a much nicer XML document with less code in less time

Instead of getting something like:

<LIN> <1>C</1> <2>541234500007</2> <3>LENS</3>
    <4>12</4> <5>BX</5> <6>2350</6> </LIN>

you'll immediately produce:

<LineItem status="Confirmed">
  <SKU>54-12345-00007</SKU> 
  <Name>LENS</Name> 
  <Qty Unit="Boxes">12</Qty> 
  <UnitPrice>23.50</UnitPrice> 
</LineItem>
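The mapping from the positional output to the LineItem element can be sketched in Python with the standard xml.etree.ElementTree module. This only illustrates what the transformation involves; it is not ReverseXSL code. Several details are assumptions inferred from the two samples above:

- the '+' field delimiter and the input segment LIN+C+541234500007+LENS+12+BX+2350;
- the code lists (C for Confirmed, BX for Boxes);
- the 2-5-5 grouping of the SKU;
- the two implied decimals in the price.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical code lists, assumed for this example only (unknown codes raise KeyError).
STATUS_CODES = {"C": "Confirmed"}
UNIT_CODES = {"BX": "Boxes"}


def line_item(segment: str) -> ET.Element:
    """Map an assumed '+'-delimited LIN segment to a LineItem element."""
    tag, status, sku, name, qty, unit, price = segment.split("+")
    if tag != "LIN":
        raise ValueError(f"expected a LIN segment, got {tag!r}")
    if not (len(sku) == 12 and sku.isdigit()):
        raise ValueError(f"SKU must be 12 digits, got {sku!r}")
    # Interpret the coded status value and make it an attribute.
    item = ET.Element("LineItem", status=STATUS_CODES[status])
    # Sub-cut the 12-digit article number into 2-5-5 groups.
    ET.SubElement(item, "SKU").text = f"{sku[:2]}-{sku[2:7]}-{sku[7:]}"
    ET.SubElement(item, "Name").text = name
    # Combine quantity and unit: the interpreted unit becomes an attribute.
    ET.SubElement(item, "Qty", Unit=UNIT_CODES[unit]).text = str(int(qty))
    # Assumed two implied decimal places: 2350 -> 23.50.
    ET.SubElement(item, "UnitPrice").text = str(Decimal(price).scaleb(-2))
    return item


if __name__ == "__main__":
    item = line_item("LIN+C+541234500007+LENS+12+BX+2350")
    print(ET.tostring(item, encoding="unicode"))
```

Using Decimal rather than float keeps the price exact when shifting the implied decimal point.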

Not visible, but just as valuable: you will already have validated every element, checked mandatory and optional constraints, enforced repetition limits, and verified inter-dependencies.
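As a rough picture of what those checks amount to, here is a simplified Python sketch of occurrence and inter-dependency checking. The message layout (UNH, LIN and FTX segments with their limits) and the rule that a LIN quantity requires a unit code are hypothetical. The sketch only counts segments and ignores their order, which a real grammar would also enforce.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurs:
    min_occurs: int  # 0 = optional, 1 or more = mandatory
    max_occurs: int  # repetition limit


# Hypothetical message layout, assumed for illustration only.
LAYOUT = {
    "UNH": Occurs(1, 1),
    "LIN": Occurs(1, 3),
    "FTX": Occurs(0, 1),
}


def validate(segments: list[str]) -> list[str]:
    """Return every violation found instead of stopping at the first one.
    Simplification: segments are counted, their order is not checked."""
    errors = []
    counts = Counter(seg.split("+", 1)[0] for seg in segments)
    for tag in sorted(counts.keys() - LAYOUT.keys()):
        errors.append(f"{tag}: segment not allowed in this message")
    for tag, occ in LAYOUT.items():
        n = counts[tag]
        if n < occ.min_occurs:
            errors.append(f"{tag}: {n} occurrence(s), at least {occ.min_occurs} required")
        elif n > occ.max_occurs:
            errors.append(f"{tag}: {n} occurrences exceed the limit of {occ.max_occurs}")
    # Hypothetical inter-dependency: a LIN quantity (field 4) needs a unit (field 5).
    for seg in segments:
        fields = seg.split("+")
        if fields[0] == "LIN" and len(fields) > 5 and fields[4] and not fields[5]:
            errors.append(f"LIN: quantity {fields[4]} given without a unit code")
    return errors
```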

There is, however, one thing that the ReverseXSL transformer does not do: re-order elements! In other words, you can:

sub-cut or combine elements
hide elements
interpret coded values
insert new elements
decide which one to make an attribute instead of a text node
insert new groups around arbitrary element sequences
flatten structures or the other way round, increase nesting levels

but no, you will not re-order... This is by design, because the remedy is astoundingly simple: put all element tags, namespaces, attributes and the hierarchical structure in place, and then spend 10 minutes in the drag-and-drop mode of your favorite XSLT editor to generate the final order (and the final syntax, if the output is not XML). You are done.
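The recommended tool for the re-ordering step is an XSLT editor. To show how little that step involves once tags and structure are in place, here is the same idea in Python with xml.etree.ElementTree. This is a swapped-in technique rather than the recommended XSLT, and the target order is a hypothetical example.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical target order imposed by a partner's schema.
TARGET_ORDER = ["Name", "SKU", "UnitPrice", "Qty"]


def reorder(parent: ET.Element, order: list[str]) -> ET.Element:
    """Return a copy of `parent` whose children follow `order`.
    Unlisted tags go last; sorted() is stable, so repeated or unlisted
    elements keep their original relative order."""
    rank = {tag: i for i, tag in enumerate(order)}
    result = ET.Element(parent.tag, dict(parent.attrib))
    result.text = parent.text
    for child in sorted(parent, key=lambda c: rank.get(c.tag, len(order))):
        result.append(copy.deepcopy(child))
    return result


if __name__ == "__main__":
    src = ET.fromstring(
        '<LineItem status="Confirmed"><SKU>54-12345-00007</SKU><Name>LENS</Name>'
        '<Qty Unit="Boxes">12</Qty><UnitPrice>23.50</UnitPrice></LineItem>'
    )
    print(ET.tostring(reorder(src, TARGET_ORDER), encoding="unicode"))
```

The source tree is left untouched, so the same parsed result can feed several differently ordered outputs.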

How can I convince you to be efficient?

For zero extra effort, benefit from comprehensive error handling

The project manager for whom the ReverseXSL Transformer was initially developed has helped his company process nearly a billion messages since he started on the job. He was uncompromising on two points:

the flexibility to fully handle the screwiest syntaxes (explained elsewhere), putting an end to custom coding
comprehensive error handling, with no less than:

details about the error context (message chunks), the exact offset, and a good error description that is immediately understandable
the exact and complete identification of faulty elements, quoted by reference to the original message specifications
the option not to stop on the first error but to continue parsing; in particular, distinguishing minor from major errors, with the option to accept messages that are not formally compliant along with a report of all minor deviations
the ability to accept messages with extra lines, indentation, trailing spaces, variant line terminators, and other such common pollution from transmission channels, without raising syntax errors

The ReverseXSL software met all his requirements.
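Two of these requirements can be sketched together in Python:

- tolerating channel pollution;
- reporting exact offsets while classifying errors as minor or major.

This illustration is neither the ReverseXSL implementation nor its API. The tag rule (3 uppercase letters followed by '+' or end of line), the 70-character limit and the severity assignment are all hypothetical.

```python
import re
from dataclasses import dataclass

MAX_LINE = 70  # hypothetical line-length limit, treated as a minor deviation


@dataclass
class Issue:
    offset: int     # character offset in the ORIGINAL message
    severity: str   # "minor" or "major"
    text: str


def clean_lines(raw: str):
    """Yield (offset, line) for each significant line, tolerating blank lines,
    indentation, trailing spaces and CR, LF or CRLF terminators
    (str.splitlines also breaks on a few other separators such as form feed)."""
    offset = 0
    for chunk in raw.splitlines(keepends=True):
        line = chunk.strip()
        if line:
            indent = len(chunk) - len(chunk.lstrip())
            yield offset + indent, line
        offset += len(chunk)


def check(raw: str, strict: bool = False) -> tuple[bool, list[Issue]]:
    """Keep going after errors; accept the message unless a major issue was
    found (or, in strict mode, any issue at all)."""
    issues = []
    for offset, line in clean_lines(raw):
        if not re.match(r"[A-Z]{3}(\+|$)", line):
            issues.append(Issue(offset, "major", f"invalid segment tag in {line[:20]!r}"))
        elif len(line) > MAX_LINE:
            issues.append(Issue(offset, "minor", f"line longer than {MAX_LINE} characters"))
    if strict:
        accepted = not issues
    else:
        accepted = all(i.severity == "minor" for i in issues)
    return accepted, issues


if __name__ == "__main__":
    raw = "UNH+1\r\n   LIN+C+541234500007+LENS+12+BX+2350   \n\nlin+x\r"
    ok, found = check(raw)
    for issue in found:
        print(ok, issue, repr(raw[issue.offset:issue.offset + 5]))
```

Offsets are computed on the original text, before any cleaning, so an error can always be located in the message as received.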