This article is the first in a series that takes a detailed look at generating strong types, in languages such as C# and Java, from W3C XSD schemas.
W3C Schema definitions are in active use in business areas such as finance, air transportation, government messaging, hospitality, and international goods and services.
TABLE OF CONTENTS
Introduction
Why Generate?
The Challenges
The Type Landscape
    Simple Type
    W3C Primitives
    Restrictions
    Union
    Choice
    List
Remarks
INTRODUCTION
W3C schemas can be troublesome.
In domains like travel (Open Travel, IATA NDC), hospitality (HTNG), finance
(ISO20022, SEPA), government (NIEM) and health (Health Level 7), the schemas
governing business messaging can be both extensive and complex. The
example providers mentioned here are by no means a complete list.
Within these domains, the development of these schema sets is often associated with
a business development theme, IATA NDC being one example. However, at the technical
heart of these global standards is the need to deal with the schema sets as they are
defined by international working groups.
For a project team tasked with interpreting the schema sets and producing message
handling code, the challenge is to perform this task in such a way that the result
preserves the meaning expressed in the meta-data, maximising fidelity. The idioms
available in the W3C Schema standard are used to the full across the set of domains
noted above, and this elegance and expressiveness can be challenging to translate
into high-fidelity code in, say, C# or Java.
Some years ago, the author worked on a project, within a global IT provider, that
aimed to provide an online booking and shopping service for an American airline.
The messages to be used were those defined by Open Travel. At the time, this
combined ticketing and shopping approach was novel, pre-dating the arrival of the
NDC initiative of IATA. One of the central arguments of global schema providers is
that using their message definitions ensures maximum interoperability between
communicating partners, since they all speak the same “language”. However, in this
project the complexity of the schemas caused real challenges in interpreting them in
a high-level programming language like Java, so it was decided to work with a subset
of the definitions, immediately nullifying the interoperability of the messaging
solution.
Once development was underway, it became necessary to prepare for testing a
product increment. For this, the test team researched the appropriate tooling to
make a range of test messages available for realistic case coverage. It
turned out that, even in this global IT shop, no such tooling existed, and the
development of the test message structures became a matter of manual work.
This approach was completely unsatisfactory in the context of an agile project
process.
WHY GENERATE?
Given that schemas represent a data definition, we might seek to use them as
the source for a transformation process that produces corresponding (strong) types
for use in general development. However, as the story of the global IT project in
the previous section hinted, there are considerable challenges to overcome if we
want to take any W3C schema and generate strong types that match its definition.
Such types must be usable for application development and must also be able to
serialize and deserialize the messages (sending/receiving) specified in the Schema
set.
If we could have such a generator, two very powerful benefits would emerge:
1) Interoperability would be retained, since all partners using the generated
strong types would speak the same “language”.
2) Embracing changes in the schema definitions would become
straightforward. Of course, appropriate changes in the general
business application would still need to be engineered, but the objects
reflecting the Schema definitions would be easily produced.
The basis of such a generator is the subject of this sequence of articles.
THE CHALLENGES
So, what are the challenges that must be overcome on our way to high-fidelity
generated types?
Firstly, the structural definition of W3C schemas is complex. The idioms that can be
employed by a schema writer are challenging to interpret in a target programming
space. For example, Union and Choice schema elements are difficult to express in
languages such as C# or Java. In addition, local elements in a Schema definition
that have so-called complex content present challenges to the generator developer.
We also need to embrace the fundamental types of W3C, the primitive types, e.g.
xs:dateTime, and these too require us to devise appropriate programmatic solutions.
In addition, sometimes schema authors interpret a “standard” programmatic entity
in ways that aren’t helpful to those who want to generate code. For example, in a
certain schema one can find an enumeration with the inner element of:
…
<xsd:enumeration value="DOCUMENTS/DATA/PHOTO">
<xsd:annotation>
<xsd:documentation>Not have in my possession any written materials, documents, computer data,
photographs which give evidence of gang involvement or activity such as: (1) membership or enemy lists, (2) articles
which contain or have upon them gang-associated graffiti, drawings or lettering, (3) photographs or newspaper
clippings of gang members, gang crimes or activities including obituaries, (4) photographs of myself in gang
clothing, demonstrating hand signs or holding weapons.
</xsd:documentation>
</xsd:annotation>
</xsd:enumeration>
...
or a restriction on a value expressed as an enumeration, such as:
<xsd:restriction base="xsd:token">
<xsd:enumeration value="3-HIGH RISK"/>
<xsd:enumeration value="COMPACT OUT"/>
<xsd:enumeration value="FUGITIVE"/>
<xsd:enumeration value="2-MODERATE RISK"/>
<xsd:enumeration value="4-EXTREME RISK"/>
<xsd:enumeration value="ISP II"/>
<xsd:enumeration value="1-LOW RISK"/>
<xsd:enumeration value="RESID/IN-STATE CUSTD"/>
<xsd:enumeration value="ISP I"/>
</xsd:restriction>
In both cases, our mythical generator needs to deal with enumeration values whose
form doesn’t match what the target programming space, C# or Java for example, will
accept as an identifier. To generate from these definitions, we would first need a
general algorithm for mapping such values to acceptable forms. In addition, we would
need to persist the original value, since it will be required when
serialising/deserialising a message that contains such a feature.
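One possible mapping can be sketched as follows. This is a minimal illustration under assumed rules, not the generator described in this series: the class name EnumNameMapper, the upper-casing, the underscore replacement and the numeric collision suffix are all hypothetical choices.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Sketch: derive legal Java identifiers from XSD enumeration values, keeping the originals. */
final class EnumNameMapper {

    /** Maps a raw XSD enumeration value to an upper-case identifier (assumed rule set). */
    static String toIdentifier(String xsdValue) {
        // Replace every run of characters outside A-Z and 0-9 with a single underscore.
        String name = xsdValue.trim().toUpperCase(Locale.ROOT).replaceAll("[^A-Z0-9]+", "_");
        // Drop underscores left at either end by the replacement.
        name = name.replaceAll("^_+|_+$", "");
        // An identifier must not be empty and must not start with a digit.
        if (name.isEmpty() || Character.isDigit(name.charAt(0))) {
            name = "_" + name;
        }
        return name;
    }

    /** Builds identifier -> original value, so a serializer can write the original back out. */
    static Map<String, String> mapAll(Iterable<String> xsdValues) {
        Map<String, String> byIdentifier = new LinkedHashMap<>();
        for (String value : xsdValues) {
            String id = toIdentifier(value);
            String unique = id;
            // Two values may sanitise to the same name; disambiguate with a numeric suffix.
            for (int n = 2; byIdentifier.containsKey(unique); n++) {
                unique = id + "_" + n;
            }
            byIdentifier.put(unique, value);
        }
        return byIdentifier;
    }
}
```

Under these assumed rules, “3-HIGH RISK” becomes _3_HIGH_RISK and “RESID/IN-STATE CUSTD” becomes RESID_IN_STATE_CUSTD, while the returned map retains the original strings for serialisation.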
In this second example, we also see how a primitive W3C type, in this case
“xs:token”, can be used as the base that constrains the scope of a value.
Open Travel
In this article, we will focus on the Open Travel (OTA) schema set as specified in the
first version for the year 2021 (V2021A). Open Travel generally releases two
versions per year, “A” and “B”, which reflect adaptations requested by the business
community. These adaptations need to be carefully reviewed by software
development projects to assess their impact on specific products.
In earlier years, OTA supplied the schema set organised into separate groups (air,
hotel, loyalty, rail, veh, profile, purchase, reviews, general) to reflect the
different business domains to which the contained schemas applied (Air
Transportation, Hospitality, Rail Transportation, Car Hire, Customer Profile,
Customer Purchasing, Customer Reviews and general definitions). We will refer to
these groups as Silos. However, in 2021, OTA supplied their schemas as one set, with
no Silos as far as the basic definition is concerned.
Checking Type Completeness
In order to use a given type produced by our generator, we need some way to
establish the dependent types so that they too can be generated. These types will
be, in general, specified in other schemas within, in our example case, OTA (for
example, the so-called SimpleTypes) as well as types specified by the W3C
organisation (what we will refer to as XSD primitives, as we touched on above).
How can we, in general, establish this Type landscape completeness? The options
are:
1) Use a generated type in a programming project, build it and react to the
unsatisfied references.
2) Scan the schema from which our specific types are to be generated and find
all dependency references (for example, in such idioms as “include”, “import”,
“xs:…” and “base=…”).
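The scanning option can be sketched with the standard Java DOM parser. This is a minimal illustration, not the tooling used for this article: the class name DependencyScanner and the output format are hypothetical, and a fuller scan would also need to look at attributes such as “itemType”, “memberTypes” and “ref”.

```java
import java.io.StringReader;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Sketch: collect the include/import/type/base references found in one schema. */
final class DependencyScanner {
    static final String XSD_NS = "http://www.w3.org/2001/XMLSchema";

    /** Returns entries such as "include OTA_X.xsd", "type= Foo" and "base= Bar". */
    static Set<String> scan(String xsdText) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed to match elements by namespace
            doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xsdText)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Schema text could not be parsed", e);
        }
        Set<String> found = new TreeSet<>();
        // "*" matches every element in the XSD namespace, whatever prefix the schema uses.
        NodeList all = doc.getElementsByTagNameNS(XSD_NS, "*");
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            String local = e.getLocalName();
            boolean isLink = local.equals("include") || local.equals("import");
            if (isLink && e.hasAttribute("schemaLocation")) {
                found.add(local + " " + e.getAttribute("schemaLocation"));
            }
            if (e.hasAttribute("type")) {
                found.add("type= " + e.getAttribute("type"));
            }
            if (e.hasAttribute("base")) {
                found.add("base= " + e.getAttribute("base"));
            }
        }
        return found;
    }
}
```

Each distinct reference is reported once, in sorted order. Deciding whether a reported name is a W3C primitive or a type from a dependent schema is left to the caller in this sketch.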
If we look at the OTA air travel booking request message definition, specified in
OTA_AirBookRQ.xsd, then applying the second approach above gives the following
result:
Idiom     Referenced Schema/Type    Comment
include   OTA_AirCommonTypes.xsd    This is the schema that needs to be scanned
                                    for needed types. This schema will probably
                                    include/import others (see below).
import    -                         None
type=     POS_Type                  The types prefixed by “xs:” are W3C
          AirItineryType            Primitives; otherwise the type needs to be
          OTA_CodeType              found in a dependent schema.
          xs:boolean
          TravelerInfoType
          FulfillmentType
          DateOrDateTimeType
          StringLength1to64
          UniqueID_Type
          EMD_Type
          DonationType
          AirOfferChoiceType
          TransactionActionType
base=     BookingPriceInfoType      These types form the “base types” for other
          TicketingInfoType         types and need to be found in a dependent
          BookingPriceInfoType      schema.
With the “include” case we will need to ensure that all dependencies are tracked. In
the case of the one noted above, OTA_AirCommonTypes.xsd, we see the following
dependency structure:
Schema                   Includes                 Includes
OTA_AirCommonTypes.xsd   OTA_CommonTypes.xsd      OTA_SimpleTypes.xsd (final)
                                                  OTA_Lists.xsd (final)
                         OTA_AirPreferences.xsd   OTA_AirCommonTypes.xsd (*)
                                                  OTA_CommonPrefs.xsd
                         OTA_CommonPrefs.xsd      OTA_CommonTypes.xsd (*)
The entries marked (*) represent circular references. So, to have a “type-complete”
solution for our single example schema, OTA_AirBookRQ.xsd, we need to locate and
generate all the dependent types. In this context, a “type-complete solution” means
a collection of definitions whose generated strong types compile within the target
development environment.
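Because of the circular references, any tool that follows the includes needs to remember which schemas it has already visited. A minimal sketch of such a traversal is given here. The class name IncludeClosure is hypothetical, and the include map is assumed to have been built beforehand, for instance by scanning each schema for its “include” elements.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch: find every schema reachable through includes, tolerating circular references. */
final class IncludeClosure {

    /**
     * @param root     the schema to start from
     * @param includes schema name -> names of the schemas it includes (assumed input)
     * @return every reachable schema, each listed once
     */
    static Set<String> reachable(String root, Map<String, List<String>> includes) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            String schema = pending.pop();
            // add() returns false when the schema was seen before: a circular or
            // repeated include, which must not be followed a second time.
            if (!visited.add(schema)) {
                continue;
            }
            for (String child : includes.getOrDefault(schema, Collections.<String>emptyList())) {
                pending.push(child);
            }
        }
        return visited;
    }
}
```

Applied to the include structure shown above, starting from OTA_AirCommonTypes.xsd, the traversal terminates and yields six distinct schemas.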
THE TYPE LANDSCAPE
The structures expressed in a W3C schema describing a single message can be
complex, and dealing with this complexity while generating a usable strong type is
challenging. In this section, we outline some of these structures as they are
reflected in a Type landscape.
The following sections highlight some of the basic value-type XSD structures that
need to be dealt with by a generator [2].
Simple Type
The euphemistically named Simple Types represent a definition of a single value.
Simple Types can appear in an overall schema at the top level or can be specified
within other schema elements.
The top-level, global, Simple Type has the structure [1]:
<simpleType
final=("#all" |
("list" | "union" | "restriction"))
id=xs:ID
name=xs:NCName
any attributes with non-schema namespace>
(annotation?,(restriction|list|union))
</simpleType>
And this definition is what governs such types, in OTA V2021A, as:
<xs:simpleType name="DateOrMonthDay">
<xs:annotation>
<xs:documentation xml:lang="en">A construct to validate either a date or a month and day
value.</xs:documentation>
</xs:annotation>
<xs:union memberTypes="xs:date xs:gMonthDay"/>
</xs:simpleType>
and:
<xs:simpleType name="ListOfISO3166">
<xs:annotation>
<xs:documentation xml:lang="en">List of country codes in ISO 3166 format.</xs:documentation>
</xs:annotation>
<xs:list itemType="ISO3166"/>
</xs:simpleType>
as well as:
<xs:simpleType name="AlphaNumericStringLength1">
<xs:annotation>
<xs:documentation xml:lang="en">Used for Alpha-Numeric Strings, length 1.</xs:documentation>
</xs:annotation>
<xs:restriction base="xs:string">
<xs:pattern value="[0-9a-zA-Z]{1}"/>
</xs:restriction>
</xs:simpleType>
W3C Primitives
In the Schema types we saw in the previous section, there were a number of places
where references in the form “xs:type-name” occurred, for example, “xs:date”.
These are references to primitive types defined by W3C [2].
[1] XML Schema, Eric van der Vlist, O’Reilly 2002
[2] XML Schema String Datatypes (w3schools.com)
There are more than forty such built-in types, and along with “xs:string” we have
such types as “xs:anyURI”, “xs:double”, “xs:duration”, “xs:token” and
“xs:nonNegativeInteger”.
These definitions need to be available as strong types so that our mythical
generation process can have a closed landscape of types to deal with. These W3C
Primitive Types can be expressed in schema form, for example, in the case of
nonNegativeInteger:
<xs:simpleType name="nonNegativeInteger" id="nonNegativeInteger">
<xs:restriction base="xs:integer">
<xs:minInclusive value="0"/>
</xs:restriction>
</xs:simpleType>
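As an illustration of what “available as a strong type” might mean, the restriction above could be carried by a small value class. This is only a sketch: the class name and the choice of BigInteger as the carrier for “xs:integer” are assumptions, not part of any library.

```java
import java.math.BigInteger;

/** Sketch of a strong type for xs:nonNegativeInteger (xs:integer with minInclusive 0). */
final class NonNegativeInteger {
    private final BigInteger value;

    private NonNegativeInteger(BigInteger value) {
        this.value = value;
    }

    /** Parses the lexical form; rejects anything that is not an integer >= 0. */
    static NonNegativeInteger parse(String lexical) {
        // BigInteger is used because xs:integer has no upper or lower bound.
        BigInteger parsed = new BigInteger(lexical.trim()); // NumberFormatException if malformed
        if (parsed.signum() < 0) {
            throw new IllegalArgumentException("minInclusive 0 violated: " + lexical);
        }
        return new NonNegativeInteger(parsed);
    }

    BigInteger value() {
        return value;
    }

    @Override
    public String toString() {
        return value.toString();
    }
}
```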
Restrictions
A fundamental feature of XSD type definitions is the specification of so-called
restrictions. We have seen one form in the definition above of
AlphaNumericStringLength1 in the statement block:
…
<xs:restriction base="xs:string">
<xs:pattern value="[0-9a-zA-Z]{1}"/>
</xs:restriction>
…
The same idiom appears in the definition of nonNegativeInteger in the previous
section. In the AlphaNumericStringLength1 case, the restriction specifies that the
underlying form of the value is that of the W3C string type, with the additional
constraint given by a regular expression pattern. So, in this case, valid values
would be:
“0”, “h”, “q”
There is a range of restriction facets, e.g. “xs:enumeration”, which we saw earlier,
“xs:minLength”, “xs:maxLength”, “xs:fractionDigits” etc., each of which is applied to
the type named in the “base=” attribute of the enclosing restriction.
Our generator needs to be able to interpret such definitional structures
appropriately.
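A generated class for a pattern restriction might look something like the following sketch. The shape of the class is an assumption. Note also that XSD regular expressions are not identical to Java regular expressions, so reusing the pattern text unchanged, as done here, only works for simple patterns like this one.

```java
import java.util.regex.Pattern;

/** Sketch of a strong type for the AlphaNumericStringLength1 pattern restriction. */
final class AlphaNumericStringLength1 {
    // Pattern text copied from the xs:pattern facet.
    private static final Pattern PATTERN = Pattern.compile("[0-9a-zA-Z]{1}");

    private final String value;

    private AlphaNumericStringLength1(String value) {
        this.value = value;
    }

    /** Creates an instance, rejecting values that violate the facet. */
    static AlphaNumericStringLength1 of(String value) {
        // matches() tests the whole string, in line with XSD patterns being implicitly anchored.
        if (value == null || !PATTERN.matcher(value).matches()) {
            throw new IllegalArgumentException("Value does not match " + PATTERN.pattern());
        }
        return new AlphaNumericStringLength1(value);
    }

    String value() {
        return value;
    }
}
```

Validating in the factory method means an invalid value can never be held by the type, so serialisation code does not need to re-check it.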
Union
One of the idioms in W3C schema that does not have a direct form in C# or Java is
the Union. We encountered it earlier in the definition of DateOrMonthDay. Another
example is:
<xs:simpleType name="DateOrDateTimeType">
<xs:annotation>
<xs:documentation xml:lang="en">A construct to validate either a date or a dateTime value.
</xs:documentation>
</xs:annotation>
<xs:union memberTypes="xs:date xs:dateTime"/>
</xs:simpleType>
This form of definition, in a sense, defines a single object that can hold either of the
two defined values. At any given time only one of the specified values can exist in
the element, either a W3C “date” or a W3C “dateTime”. The C programming
language has a built-in Union type, but C#, Java and Kotlin, for example, do not.
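One way to approximate such a union in Java is a class that holds exactly one of the two member values. The following is a sketch under simplifying assumptions: the class name is hypothetical, the member is selected by looking for the “T” separator, and the optional time zone suffix that both W3C types allow is not handled.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Sketch of a union of xs:date and xs:dateTime; exactly one field is non-null. */
final class DateOrDateTime {
    private final LocalDate date;         // set when the value is an xs:date
    private final LocalDateTime dateTime; // set when the value is an xs:dateTime

    private DateOrDateTime(LocalDate date, LocalDateTime dateTime) {
        this.date = date;
        this.dateTime = dateTime;
    }

    /** Parses a lexical value; throws DateTimeParseException if neither member fits. */
    static DateOrDateTime parse(String lexical) {
        String text = lexical.trim();
        // An xs:dateTime separates date and time with 'T'; an xs:date has no 'T'.
        if (text.indexOf('T') >= 0) {
            return new DateOrDateTime(null, LocalDateTime.parse(text));
        }
        return new DateOrDateTime(LocalDate.parse(text), null);
    }

    boolean isDate() {
        return date != null;
    }

    /** Writes the value back out in the lexical form of whichever member is held. */
    String toLexical() {
        return isDate()
                ? DateTimeFormatter.ISO_LOCAL_DATE.format(date)
                : DateTimeFormatter.ISO_LOCAL_DATE_TIME.format(dateTime);
    }
}
```

The formatter is used instead of LocalDateTime.toString() because the latter omits the seconds when they are zero, which would not be a valid “dateTime” lexical form.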
Choice
Another anachronistic idiom of W3C schema is the Choice element (compositor).
A choice allows only one of its children to appear in an instance, as exemplified
by the definition of the OTA type AppliedRuleType:
<xs:complexType name="AppliedRuleType">
<xs:annotation>
<xs:documentation source="Use" xml:lang="en">
Applied rule information, including category, rule identifier and rule descriptions.</xs:documentation>
</xs:annotation>
<xs:sequence>
<xs:choice>
<xs:annotation>
<xs:documentation source="Use" xml:lang="en">
A choice between a default rule indicator OR a rule name and version number.
</xs:documentation>
</xs:annotation>
<xs:element name="DefaultUsedInd" type="xs:boolean">
<xs:annotation>
<xs:documentation source="Use" xml:lang="en">
If true, a system default rule was used.
</xs:documentation>
</xs:annotation>
</xs:element>
<xs:element name="RuleInfo">
<xs:annotation>
<xs:documentation source="Use" xml:lang="en">
Information for individual airline applied rules.
</xs:documentation>
</xs:annotation>
<xs:complexType>
<xs:attribute name="Name" type="xs:string" use="required">
<xs:annotation>
<xs:documentation source="Use" xml:lang="en">
The name of the applied rule.
</xs:documentation>
</xs:annotation>
</xs:attribute>
<xs:attribute name="Version" type="xs:integer" use="optional">
<xs:annotation>
<xs:documentation source="Use" xml:lang="en">
The version of the rule.</xs:documentation>
</xs:annotation>
</xs:attribute>
</xs:complexType>
</xs:element>
</xs:choice>
…
</xs:sequence>
. . .
</xs:complexType>
This is quite a complex definition, even with parts elided, but at its core there is
a structure reflecting a choice between two schema elements, DefaultUsedInd and
RuleInfo.
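The “only one child may appear” rule can be enforced in Java by making the constructor private and offering one factory method per alternative. The sketch below covers only the choice itself: the class names are hypothetical and the elided parts of AppliedRuleType are not modelled.

```java
import java.math.BigInteger;
import java.util.Objects;

/** Sketch of the xs:choice in AppliedRuleType: exactly one alternative is populated. */
final class AppliedRuleChoice {

    /** Mirrors the anonymous complex type of the RuleInfo element. */
    static final class RuleInfo {
        final String name;        // attribute Name, use="required"
        final BigInteger version; // attribute Version, use="optional"; null when absent

        RuleInfo(String name, BigInteger version) {
            this.name = Objects.requireNonNull(name, "Name is required");
            this.version = version;
        }
    }

    private final Boolean defaultUsedInd; // non-null only when DefaultUsedInd was chosen
    private final RuleInfo ruleInfo;      // non-null only when RuleInfo was chosen

    // Private constructor: instances can only be made through the two factories below,
    // so both alternatives can never be set at once.
    private AppliedRuleChoice(Boolean defaultUsedInd, RuleInfo ruleInfo) {
        this.defaultUsedInd = defaultUsedInd;
        this.ruleInfo = ruleInfo;
    }

    static AppliedRuleChoice ofDefaultUsedInd(boolean value) {
        return new AppliedRuleChoice(value, null);
    }

    static AppliedRuleChoice ofRuleInfo(RuleInfo info) {
        return new AppliedRuleChoice(null, Objects.requireNonNull(info, "RuleInfo is required"));
    }

    boolean isDefaultUsedInd() {
        return defaultUsedInd != null;
    }

    boolean defaultUsedInd() {
        if (defaultUsedInd == null) {
            throw new IllegalStateException("RuleInfo was chosen, not DefaultUsedInd");
        }
        return defaultUsedInd;
    }

    RuleInfo ruleInfo() {
        if (ruleInfo == null) {
            throw new IllegalStateException("DefaultUsedInd was chosen, not RuleInfo");
        }
        return ruleInfo;
    }
}
```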
List
As the name suggests, the List element is a “value” that is a sequence of strings,
e.g. “First Second Third”. It can also be a list of items of the same type, as specified
in ListOfISO3166:
<xs:simpleType name="ListOfISO3166">
<xs:annotation>
<xs:documentation xml:lang="en">List of country codes in ISO 3166 format.</xs:documentation>
</xs:annotation>
<xs:list itemType="ISO3166"/>
</xs:simpleType>
In this case the list is a white-space separated set of ISO3166 country codes, e.g.
“GB DE IT”.
For modern programming languages, handling this sort of “value” would present no
problem. However, the generated class will need to have helper methods to parse
the list.
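Such helper methods could be as simple as the following sketch. The class shape is an assumption, and the items are kept as plain strings because the definition of the ISO3166 item type is not shown here; a real generator would validate each item against it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Sketch of a strong type for an xs:list whose items are held as plain strings. */
final class ListOfISO3166 {
    private final List<String> items;

    private ListOfISO3166(List<String> items) {
        this.items = Collections.unmodifiableList(items);
    }

    /** Splits the lexical value on XML white space (space, tab, CR, LF). */
    static ListOfISO3166 parse(String lexical) {
        String trimmed = lexical.trim();
        List<String> parsed = new ArrayList<>();
        if (!trimmed.isEmpty()) { // an empty value is an empty list, not one empty item
            for (String item : trimmed.split("[ \\t\\r\\n]+")) {
                parsed.add(item);
            }
        }
        return new ListOfISO3166(parsed);
    }

    List<String> items() {
        return items;
    }

    /** Joins the items with single spaces for serialisation. */
    String toLexical() {
        return String.join(" ", items);
    }
}
```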
REMARKS
In this article, we have presented some of the challenges related to transforming
W3C Schema definitions into strong types in target languages like C#, Java and
Kotlin.
In subsequent articles, we will look at how these challenges can be met on our way
to describing a concrete generation pattern.