Object Role Modelling and XML-Schema

Linda Bird¹ and Andrew Goodchild
Distributed System Technology Center (DSTC)²
Level 7, GP South, The University of Queensland, QLD, 4072, AUSTRALIA
Email: [bird, andrewg]@dstc.edu.au
Web: http://www.dstc.edu.au

Terry Halpin
Microsoft Corporation
Seattle WA, USA
Email: TerryHa@microsoft.com
Web: http://www.orm.net

¹ Née Campbell.
² The work reported in this paper has been funded in part by the Cooperative Research Centres Program through the Department of the Prime Minister and Cabinet of the Commonwealth Government of Australia.

Abstract: XML is increasingly becoming the preferred method of encoding structured data for exchange over the Internet. XML-Schema, an emerging text-based schema definition language, promises to become the most popular method for describing these XML documents. While text-based languages such as XML-Schema offer great advantages for data interchange on the Internet, graphical modelling languages are widely accepted as a more visually effective means of specifying and communicating data requirements for a human audience. With this in mind, this paper investigates the use of Object Role Modelling (ORM), a graphical, conceptual modelling technique, as a means for designing XML-Schemas. The primary benefit of using ORM is that it is much easier to get the model 'correct' by designing it in ORM first, rather than in XML. To facilitate this process, we describe an algorithm that enables an XML-Schema file to be automatically generated from an ORM conceptual data model. Our approach aims to reduce data redundancy and increase the connectivity of the resulting XML instances.

1. Introduction

XML (eXtensible Markup Language) [W3C3] is rapidly emerging as the premier encoding method for exchanging data in a portable fashion over the Internet. To date, the primary method for defining valid XML documents has been the 'Document Type Definition' (DTD). However, because DTDs were originally designed to describe semi-structured, text-based documents, they have a number of limitations when it comes to describing the highly structured data commonly found in data-oriented applications. As a result, the World Wide Web Consortium (W3C) will soon be releasing a schema definition language, namely XML-Schema, designed specifically for the purpose of describing structured data. XML-Schema provides a richer set of data types, data constraints and data concepts than DTDs.

An important feature of XML-Schema is that it uses XML as the syntax for describing schemas. XML is text based and, while it is recognized as being "human-readable", a moderately sized XML-Schema can become difficult to understand. Graphical modelling languages are widely accepted as being a more visually effective means of specifying and communicating data requirements for a human audience. For this reason, a number of companies have developed XML-Schema editors that graphically present the main constructs of a schema to the user, including:
• XmlAuthority (from Extensibility);
• BizTalk Editor (from Microsoft); and
• Near and Far Designer (from OpenText).

All three of these editors rely on the tree-based nature of XML-Schema and present a graphical, tree-like interface for editing XML schemas. Currently, the graphical languages used by these tools tend to lack an underlying methodology for constructing schemas, and make it difficult to represent some constraints that are often enforced on databases. Furthermore, the tree-like user interface forces schema designers to make decisions about the hierarchical structure of the schema too early in the modelling process.

With this in mind, this paper presents a new, conceptual approach to designing schemas for XML. In particular, we investigate the use of Object Role Modeling (ORM), a conceptual modeling method, as a means for designing XML-Schemas. By using ORM, we are able to model a rich variety of data constraints, and delay decisions about the tree structure of the XML-Schema until after the conceptual analysis phase. Encoding an ORM schema in XML-Schema has benefits beyond facilitating the exchange of schemas between different CASE tools and repositories (in the way XMI³ enables UML schemas to be exchanged). More importantly, the generated XML-Schema definition can be used to automatically validate the associated XML instance documents against the schema definition.

Section 2 of this paper gives a brief overview of the new XML-Schema language that is currently being developed by the W3C. Sections 3 and 4 describe ORM with an ongoing example and indicate how 'major object types' can be identified in an ORM model. These 'major object types' are then used in Section 5 to describe an algorithm for generating XML-Schema files from an ORM diagram. In Section 6 we enumerate some of the limitations of the mapping process, before concluding in Section 7.

³ http://www.oasis-open.org/cover/xmi.html

2. XML-Schema

XML-Schema [W3C0, W3C1, W3C2] is a new language being designed by the W3C to describe the content and structure of document types in XML. It serves the same purpose as the DTD language, but provides a more powerful method of describing and constraining the content of XML documents. Although DTDs will continue to exist, XML-Schema should better meet the requirements of the wide range of data-oriented applications that will use XML. In particular, XML-Schema provides the following features:
• XML Syntax: XML-Schema uses an XML syntax, which means that existing XML parsers can be used to build XML-Schema parsers. DTDs have a specialized syntax, which requires XML parser developers to support additional non-XML syntax.
• Richer Data Typing: DTDs provide only a primitive type system based on textual elements. XML-Schema extends this typing mechanism with an extensive range of primitive types from SQL and Java, such as numeric, date/time, binary, boolean and URI types. Furthermore, complex types can be built from the composition of other types (either primitive or complex). In particular, XML-Schema uses a single inheritance model that allows the restriction or extension of type definitions.
• Support for Name Spaces: XML-Schema is namespace-aware, enabling elements with the same name to be used in different contexts. Additionally, schema types and elements can be included (or imported) from a separate XML-Schema using the same (or a different) namespace.
• Constraints: XML-Schema provides an assortment of constraint types not supported by DTDs, including format-based 'pattern' constraints (e.g. "\d{3}[A-Z]{2}" represents three digits followed by two uppercase letters), key and uniqueness constraints, key references (foreign keys), enumerated types (value constraints), cardinality (or frequency) constraints and 'nullability'.
• Other Features: A number of other features, including anonymous type definitions, element content types, 'Any' elements and attributes, annotations, groupings and the use of derived types in instance documents, are also provided by XML-Schema.

An example XML-Schema for a conference application, as generated by the algorithm in this paper, is presented in Appendix A. An example instance of that schema is shown in Appendix B.

3. ORM and XML-Schema

Object-Role Modelling (ORM)⁴ [H98, H99a] is a conceptual modelling approach that views the world in terms of objects and the roles they play. Every elementary type of fact that occurs between object types in the Universe of Discourse (UoD) is verbalized and displayed on a conceptual schema diagram. ORM allows a wide variety of data constraints to be specified, including mandatory role, uniqueness, subset, exclusion, frequency and ring constraints.

⁴ http://www.orm.net

Figure 1 shows an example ORM diagram that models a 'Conference Paper' UoD. In this diagram, object types are represented as named ellipses and relationship types as named sequences of adjacent role boxes. Individual role names can also be used, but are omitted from this diagram for clarity. An arrowed bar over a role or role sequence indicates an internal uniqueness constraint, and a circled 'U' or 'P' denotes an external uniqueness (or primary uniqueness) constraint. Value constraints are represented as a braced list of values, and frequency constraints as a numeric range attached to one or more roles. Subset constraints are shown as a dotted arrow, exclusion constraints as a circled 'x' between the relevant role sequences, and subtype links as solid arrows between object types.

[Figure 1: 'Conference Paper' ORM Schema. Subtype definition: An 'Accepted Paper' is a 'Paper' that has a 'Status' of "accept".]

ORM was chosen for designing XML schemas for three main reasons. Firstly, its linguistic basis and role-based notation allow models to be easily validated with domain experts by natural verbalization and sample populations [H99b]. Secondly, its data modeling support is richer than that of other popular notations (Entity-Relationship (ER) or Unified Modeling Language (UML)), allowing more business rules to be captured [HB99]. Thirdly, its attribute-free models and queries are more stable than those of attribute-based approaches [BH97].

As XML schemas are hierarchical, generating an XML-Schema definition from an ORM schema requires one or more object types to start the tree hierarchy. One approach to this mapping problem could be to select a single object type as the XML root node, and progressively define each ORM fact type as sub-elements, producing an XML instance such as:

<ConferencePaper>
  <Person name="Winnie the Pooh"> …
    <EmailAddress>pooh@hundredacrewood.edu</EmailAddress> …
    <AuthoredPaper nr="27">
      <PaperTitle> A Macro-Economic Theory for Honey Distribution </PaperTitle>
      <Status> undec </Status>
    </AuthoredPaper> …
  </Person>
  <Person name="Eeyore"> …
    <EmailAddress>eeyore@hundredacrewood.edu</EmailAddress> …
    <AuthoredPaper nr="27">
      <PaperTitle> A Macro-Economic Theory for Honey Distribution </PaperTitle>
      <Status> undec </Status>
    </AuthoredPaper> …
  </Person>
</ConferencePaper>
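The repetition in this first mapping can be checked mechanically. The following Python sketch is illustrative only: it assumes the instance above with its "…" elisions removed and surrounding whitespace trimmed (so that it parses), and the helper name stored_paper_facts is our own. It uses the standard xml.etree.ElementTree module to count how many times each paper's number, title and status are stored.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# The single-root instance from the text, with the "…" elisions removed and
# whitespace inside text content trimmed so that it is well-formed XML.
INSTANCE = """
<ConferencePaper>
  <Person name="Winnie the Pooh">
    <EmailAddress>pooh@hundredacrewood.edu</EmailAddress>
    <AuthoredPaper nr="27">
      <PaperTitle>A Macro-Economic Theory for Honey Distribution</PaperTitle>
      <Status>undec</Status>
    </AuthoredPaper>
  </Person>
  <Person name="Eeyore">
    <EmailAddress>eeyore@hundredacrewood.edu</EmailAddress>
    <AuthoredPaper nr="27">
      <PaperTitle>A Macro-Economic Theory for Honey Distribution</PaperTitle>
      <Status>undec</Status>
    </AuthoredPaper>
  </Person>
</ConferencePaper>
"""


def stored_paper_facts(xml_text: str) -> Counter:
    """Count how often each (paper nr, title, status) fact tuple is stored."""
    root = ET.fromstring(xml_text.strip())
    return Counter(
        (paper.get("nr"), paper.findtext("PaperTitle"), paper.findtext("Status"))
        for paper in root.iter("AuthoredPaper")
    )


# One line per distinct paper, showing how many copies of its facts exist.
for (nr, title, status), copies in stored_paper_facts(INSTANCE).items():
    print(f"paper {nr}: title/status stored {copies} times")
```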
However, as this example illustrates, this method leads to redundant data at the instance level. Here, the title and status of a paper are repeated for each author of the paper.

Another approach would be to map every object type in the schema to a tag beneath the root node, and include its associated fact types as sub-elements, producing an XML instance such as:

<ConferencePaper>
  <EmailAddress> eeyore@shadygove.edu
    <Person name="Eeyore"> … </Person>
  </EmailAddress>
  <EmailAddress> pooh@shadygove.edu
    <Person name="Winnie the Pooh"> … </Person>
  </EmailAddress>
  <Phone nr="+1-555-12348">
    <Person name="Eeyore"> … </Person>
  </Phone>
  <Phone nr="+1-555-12345">
    <Person name="Winnie the Pooh"> … </Person>
  </Phone>
  . . .
</ConferencePaper>

However, as this example demonstrates, this approach leads to a difficult-to-read and disconnected XML instance (connected only by an extensive list of 'key references').

In contrast to these two approaches, the approach presented in this paper minimizes redundancy in the XML instance document, while retaining the connectivity of the XML data structures as much as possible. To achieve this, we use each of the 'most important' (or 'major') object types in the ORM model as a starting point for an XML hierarchy, and associate each fact type with exactly one of these hierarchies. To this end, we must first define the concept of a 'major object type'.

4. Major Object Types

The notion of a major object type is based on our previous work [B97, CHP96], in which 'major object types' (or 'key concepts') were identified for abstraction purposes. It is also similar in concept to the process of mapping an ORM model into an object-oriented framework.

Intuitively, the 'major object types' are the 'most important' object types in a conceptual model. They are identified by selecting those object types considered to be the 'most important participant' in some fact type⁵. The 'importance' of a participant in a fact type is determined by 'weighting' roles, based on the 'strength' with which they are 'anchored' to their player. The role with the highest weighting in a fact type is referred to as the anchor for that fact type. This algorithm for weighting and anchoring fact types is summarized in the following twelve rules:
1. Any fact type role involved in a non-implied mandatory role constraint is weighted in inverse proportion to the number of roles participating in the constraint (so non-disjunctive mandatory constraints receive the greatest weighting).
2. The player of the role in a unary predicate is 'the most important participant' in that predicate, and is weighted accordingly.
3. If only one role in a fact type is played by a 'non-leaf' object type, then this role is 'conceptually important' enough to be given a strong weighting.
4. If exactly one role within a fact type has the smallest maximum frequency⁶ of that fact type, this role should be anchored.
5. If exactly one role in a fact type is played by a non-value type, then the fact type should be anchored on this role.
6. If exactly one role in a given fact type is played by an object type that became an anchor point via rules 1 to 5, the fact type is anchored on this role.
7. If a fact type is involved in exactly one single-role set constraint (i.e. subset, equality or exclusion), and the role at the other end of the set constraint is anchored, then the constrained role in the given fact type should also be anchored.
8. If a fact type is involved in exactly one (possibly multi-role) set constraint, and exactly one of the roles in the fact type is in the corresponding position within the set constraint as an anchored role, then this role is itself anchored.
9. If there exists a non-implied set constraint in which one of the roles involved is the only involved role in its fact type to be played by an anchor point, and the corresponding role's fact type in the other role sequence is not anchored, then this role becomes an anchor.
10. Those unanchored fact types in which only one role is the 'join role' for some set constraint role sequence should be anchored on this 'join role'.
11. The first role of each multi-role, non-implied set constraint becomes an anchor, if its fact type is not already anchored.
12. Any fact type not already anchored should be anchored on the first role involved in an internal uniqueness constraint.

⁵ A 'fact type' is a relationship type that is not part of the primary identification scheme of any unnested object type.
⁶ 'Smallest maximum frequency' is calculated based on both uniqueness and frequency constraints.

[Figure 2: Anchored ORM Schema]

After these 'anchoring' rules are applied, the major object types are identified as those object types to which some fact type is anchored. Figure 2 shows the result of applying this anchoring algorithm to the ORM model in figure 1. The major object types are shaded, and the anchors are marked with thick arrow-tips.

Once this automatic anchoring procedure has been applied, we suggest that the user be given the option to adjust the anchors as required. This allows additional human understanding of the UoD to influence the final choice of 'major object types'. For more information on, and formal algorithms for, automatically determining anchors and major object types, please refer to [B97, CHP96].

5. ORM to XML-Schema Mapping

With the major object types of the conceptual schema identified, we can now describe our algorithm for generating an XML schema from an ORM diagram. The algorithm has three major steps.

5.1 Step 1: Generate a type definition for each ORM object type

ORM value types (including those implied through reference modes) are represented in XML-Schema as simple types, and may include value or range constraints. For example, the value type "Email Address"⁷ is mapped to:

<simpleType name="EmailAddress" base="string"/>

while the reference scheme "Rating(nr)" is mapped to:

<simpleType name="RatingNr" base="integer">
  <minInclusive value="1"/>
  <maxInclusive value="10"/>
</simpleType>

and "StatusCode" is mapped to:

<simpleType name="StatusCode" base="string">
  <enumeration value="undec"/>
  <enumeration value="accept"/>
  <enumeration value="reject"/>
</simpleType>

⁷ ORM allows spaces in entity type names, but XML does not. To address this, we replace the spaces in entity type names with capitalisation.

Entity types are mapped to complex types in XML-Schema, with the value types that form part of their primary identification scheme being represented as attributes⁸, and the entity types that form part of their primary identification scheme being represented as sub-elements⁹. For example, the entity type "Status" is mapped to:

<complexType name="Status">
  <attribute name="code" type="conf:StatusCode" minOccurs="1"/>
</complexType>

while the entity type "Institution" is mapped to:

<complexType name="Institution">
  <attribute name="name" type="conf:InstitutionName" minOccurs="1"/>
  <element name="Country" type="conf:Country"/>
</complexType>

⁸ In XML-Schema, attributes are, by default, assumed to be optional (minOccurs = 0) and functional (maxOccurs = 1).
⁹ In XML-Schema, elements are, by default, assumed to be mandatory (minOccurs = 1) and functional (maxOccurs = 1). Only non-default occurrence constraints need be specified.

5.2 Step 2: Build a complex type definition for each major fact type grouping

As a general rule, each ORM fact type is mapped to a sub-element of the major object type to which it is anchored. For example, the fact types anchored to "Person" map to the definition:

<complexType name="PersonFacts" base="conf:Person" derivedBy="extension">
  <element name="EmailAddress" type="conf:EmailAddress" minOccurs="0"/>
  <element name="Phone" type="conf:Phone" minOccurs="0"/>
  <attribute name="attends" type="boolean" minOccurs="1"/>
</complexType>

In this example, the XML element names are based on the names of the associated entity types. The unary predicate "attends" is mapped to a boolean attribute, rather than to an element. Optional roles played by major object types have the constraint minOccurs="0", and multi-role uniqueness keys have maxOccurs="*".

While this approach (of using each major object type as the root of further sub-elements) produces a reasonable XML-Schema, in some cases the connectivity of the resulting XML-Schema can be improved by combining the fact types anchored around two (or more) major object types; in particular, by nesting one major object type's fact types inside another's.

The method used to determine when (and in which direction) major object type groups may be nested is based on the existence of a functional, mandatory role in the fact type connecting the major object types. While fact type anchors are an effective method of identifying major object types, they should not be used to determine the direction of major object type nestings.

[Figure 3: 'Lecturer-Subject' ORM schema. Employee (nr) has Employee Name; Subject (code) has Subject Title; Employee is head lecturer of Subject.]

To illustrate this, consider the 'Lecturer-Subject' schema shown in figure 3. If we were to combine the 'Employee' and 'Subject' fact types by nesting 'EmployeeFacts' inside 'SubjectFacts' (as indicated by the direction of the anchor), we would end up generating the following XML-Schema definition:

<complexType name="SubjectFacts" base="s:Subject" derivedBy="extension">
  <element name="Title" type="s:SubjectTitle"/>
  <element name="HeadLecturer">
    <complexType base="s:Employee" derivedBy="extension">
      <element name="Name" type="s:EmployeeName"/>
    </complexType>
  </element>
</complexType>

An XML element ("Subject") based on this definition could have the following instances:

<Subject code="CS100">
  <Title> Intro to Programming </Title>
  <HeadLecturer empNr="5687">
    <Name> Helen March </Name>
  </HeadLecturer>
</Subject>
<Subject code="CS210">
  <Title> Database Design </Title>
  <HeadLecturer empNr="5687">
    <Name> Helen March </Name>
  </HeadLecturer>
</Subject>

There are two main problems with this mapping approach. Firstly, because each 'Employee' may be the head lecturer of more than one 'Subject' (as per the constraints in figure 3), the same set of 'Employee' facts may be associated with several different 'Subjects' (if they have the same head lecturer). This introduces redundancy, as is evident in the example instance, in which both 'Subjects' include the fact that Employee "5687" has the Name "Helen March".

The second main problem is that, based on the constraints in figure 3, not all 'Employees' are necessarily the head lecturer of a 'Subject', and those Employees who are not a head lecturer cannot be represented using the above XML-Schema. Instead, there would need to be a separate list of 'Employees' who are not the head of any 'Subject', thus reducing the connectivity of the schema.

Instead of nesting major object types towards the anchors, as just shown, our approach is to nest the major object types away from mandatory, functional roles. Using our algorithm, the example in figure 3 maps to the XML-Schema definition:

<complexType name="EmployeeFacts" base="s:Employee" derivedBy="extension">
  <element name="Name" type="s:EmployeeName"/>
  <element name="SubjectHeaded" minOccurs="0" maxOccurs="*">
    <complexType base="s:Subject" derivedBy="extension">
      <element name="Title" type="s:SubjectTitle"/>
    </complexType>
  </element>
</complexType>

It is possible to nest the major object type elements in this way because (a) the mandatory constraint requires each 'Subject' to be headed by at least one 'Employee', and (b) the uniqueness key requires each 'Subject' to be headed by at most one 'Employee'. The combination of these two constraints means that each 'Subject' must be headed by exactly one 'Employee'. Hence the 'Subject' fact types may be grouped with the 'Employee' fact types. To understand why these two constraints are so important in making this grouping possible, we consider each one in turn:
• Uniqueness key: If a 'Subject' could be headed by more than one 'Employee' (i.e. there was no functional uniqueness key), then nesting the 'Subject' facts inside 'Employee' would introduce redundancy into the schema. This is because the 'Subject Title' of a 'Subject' would be repeated every time that 'Subject' was headed by a different 'Employee'.
• Mandatory constraint: If a 'Subject' did not need to be headed by an 'Employee' (i.e. there was no mandatory constraint), then nesting the 'Subject' facts inside 'Employee' would make it impossible to represent any 'Subject' not headed by an 'Employee'.

Therefore, when a single, mandatory, functional relationship type exists between two major object types, the fact types anchored to the object type on the functional, mandatory side can be nested inside the other.

A special case of this fact type grouping approach is the nested fact type. In the ORM diagram in figure 2, each 'Review' has exactly one¹⁰ 'Paper' being refereed and exactly one 'Person' refereeing it. This mandatory, functional relationship between 'Review' and both of its primary identifiers makes it a candidate for combining major fact type groups. Since there are two mandatory, functional roles involved (one on each 'implied' reference type¹¹), we choose to combine the fact types towards the anchor of the nested fact type (i.e. towards 'Paper'). Figure 4 shows the final fact type groupings.

¹⁰ 'Exactly one' means 'at least one' and 'at most one'.
¹¹ A reference type is an association that is part of the primary identification scheme of an object type.
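The nesting-direction test described above can be sketched in a few lines of Python. This is a minimal sketch under our own assumptions, not the formalisation used in [B97, CHP96]: each binary fact type between two major object types is reduced to two hypothetical Role records carrying only a mandatory flag and a functional (single-role uniqueness) flag, and the 'single relationship type' condition and the nested-fact-type tie-break towards the anchor are left to the caller.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Role:
    player: str       # object type that plays this role
    mandatory: bool   # every instance of the player must play the role
    functional: bool  # each instance of the player plays the role at most once


@dataclass(frozen=True)
class BinaryFactType:
    reading: str
    first: Role
    second: Role


def nesting(fact: BinaryFactType) -> Optional[Tuple[str, str]]:
    """Return (inner, outer) when exactly one role is mandatory and functional.

    The player of that role has its fact types nested inside the other
    player's grouping, i.e. nesting goes away from the mandatory, functional
    role. Returns None when no role qualifies, or when both do (a tie the
    caller must break, as in the nested fact type case).
    """
    qualifying = [
        (role.player, other.player)
        for role, other in ((fact.first, fact.second), (fact.second, fact.first))
        if role.mandatory and role.functional
    ]
    return qualifying[0] if len(qualifying) == 1 else None


def group(facts: List[BinaryFactType]) -> Dict[str, List[str]]:
    """Map each outer major object type to the object types nested inside it."""
    nested: Dict[str, List[str]] = {}
    for fact in facts:
        pair = nesting(fact)
        if pair is not None:
            inner, outer = pair
            nested.setdefault(outer, []).append(inner)
    return nested


# Figure 3: each Subject is headed by exactly one Employee, while an
# Employee may head zero or more Subjects.
head_lecturer = BinaryFactType(
    "Employee is head lecturer of Subject",
    Role("Employee", mandatory=False, functional=False),
    Role("Subject", mandatory=True, functional=True),
)

print(nesting(head_lecturer))   # ('Subject', 'Employee')
print(group([head_lecturer]))   # {'Employee': ['Subject']}
```

For the 'Lecturer-Subject' schema only the Subject role is both mandatory and functional, so the sketch nests Subject's fact types inside Employee's, in line with the EmployeeFacts definition above.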
[Figure 4: XML-Schema fact type groupings. Subtype definition: An 'Accepted Paper' is a 'Paper' that has a 'Status' of "accept".]

The result of this combined fact type grouping on 'Paper' is the following XML-Schema definition:

<complexType name="PaperFacts" base="conf:Paper" derivedBy="extension">
  <element name="Title" type="conf:PaperTitle"/>
  <element name="Status" type="conf:Status"/>
  <element name="Author" type="conf:Person" maxOccurs="*"/>
  <element name="Review" minOccurs="0" maxOccurs="2">
    <complexType>
      <element name="Referee" type="conf:Person"/>
      <element name="Rating" type="conf:Rating"/>
    </complexType>
  </element>
  <element name="Presenter" type="conf:Person" minOccurs="0" maxOccurs="*"/>
</complexType>

As shown in the above example, 'Person' plays more than one role in the same fact type grouping, namely the roles of 'referee', 'author' and 'presenter'. Therefore, the associated XML elements are named using the role names, rather than the entity type names, to disambiguate the elements. Even when ambiguity does not arise, however, it is often preferable to name an element after the role rather than the entity type. For example, where appropriate, "BirthDate" would usually be a better element name than "Date".

Finally, subtypes are mapped to complex types that extend their supertype. For example:

<complexType name="AcceptedPaperFacts" base="conf:PaperFacts" derivedBy="extension">
  <element name="NrPages" type="conf:NrPages"/>
</complexType>

5.3 Step 3: Create a root element for the whole schema and add keys and key references

Since each XML document must have a root node, we create a root element to represent the whole conceptual model (in this case, called "Conference"). A sub-element is then created beneath the root node for each major fact type grouping that was created in Step 2. Based on the example from figure 4, the resulting element definition is:

<element name="Conference">
  <element name="Person" type="PersonFacts" minOccurs="0" maxOccurs="*"/>
  <element name="Paper" type="PaperFacts" minOccurs="0" maxOccurs="*"/>
</element>

The primary identification scheme of each major object type is then mapped to an XML-Schema "key". For example:

<key name="PaperKey">
  <selector>Conference/Paper</selector>
  <field>@paperNr</field>
</key>

Multi-role uniqueness constraints, and uniqueness constraints on non-anchored roles, are mapped to XML-Schema "unique" constraints. For example:

<unique name="EmailUnique">
  <selector>Conference/Person</selector>
  <field>EmailAddress</field>
</unique>

The fact types connecting major object type groupings are mapped to key references. For example:

<keyref name="AuthorPersonRef" refer="PersonKey">
  <selector>Conference/Paper/Author</selector>
  <field>@name</field>
  <field>Institution/@name</field>
  <field>Institution/Country/@name</field>
</keyref>

For the complete XML-Schema definition generated from the ORM schema in figure 1, please refer to Appendix A.

6. Options and Limitations

6.1 Options

When mapping ORM schemas to XML-Schemas, there are several options available that have not been discussed so far. For example, all fact types anchored by a functional role could be modelled as attributes rather than as sub-elements. For instance, 'Paper Title' and 'Status' could have been modelled as attributes of the 'Paper' element:

<complexType name="PaperFacts">
  <attribute name="PaperTitle" type="conf:PaperTitle" minOccurs="1"/>
  <attribute name="Status" type="conf:StatusCode" minOccurs="1"/>
  . . .
</complexType>

Similarly, all direct primary identification schemes (simple or complex) could be represented as attributes. For example:

<complexType name="Person">
  <attribute name="PersonName" type="conf:PersonName" minOccurs="1"/>
  <attribute name="InstitutionName" type="conf:InstitutionName" minOccurs="1"/>
  <attribute name="InstitutionCountry" type="conf:CountryName" minOccurs="1"/>
</complexType>

Alternatively, it may be decided that introducing controlled redundancy into the schema is appropriate, or that decreasing the connectivity of the schema has some advantages. In the future, we would like to develop an approach to generating XML-Schemas from ORM that is configurable, so that modellers have greater control over the schemas they develop.

6.2 Limitations

While ORM and XML-Schema have many similar features, some features available in ORM are not currently available in XML-Schema. For example:
• XML-Schema does not support exclusion constraints, or subset constraints that target non-key elements;
• XML-Schema supports only a single inheritance model, while ORM supports multiple inheritance;
• XML-Schema does not support disjunctive mandatory constraints;
• ORM subtype definitions cannot be fully represented in XML-Schema; and
• XML-Schema cannot represent some other ORM constraints (e.g. frequency or ring constraints). For example, an optional role with a frequency constraint of exactly 2 (as in our Conference Paper example) cannot be fully represented in XML-Schema; the closest match is minOccurs="0" maxOccurs="2".

These issues could be addressed in a number of ways. One option would be to map constraint verbalisations to comments within the XML-Schema. While these comments would preserve the information in the original schema, they cannot be processed and used by an XML parser. Another approach would be to develop non-standard extensions to XML-Schema to support the additional constraints. However, this would require non-standard modifications to be made to an XML-Schema validator to support these extensions.

XML-Schema also has a few features not supported by standard ORM. For example, XML-Schema supports format models (using 'patterns' such as "a field consists of two letters followed by three digits"). XML-Schema also allows mixed content models, which allow natural language text to be marked up with XML. For example:

<paragraph> The <ship> Titanic </ship> sank in <year> 1912 </year> en route to <location> New York City </location>. More than 1,500 people perished at sea. </paragraph>
<paragraph> Only the arrival of the <ship> Carpathia </ship> 1 hour and 20 minutes after the <ship> Titanic </ship> went down saved further loss of life in the icy waters. </paragraph>
In this case we could model ORM object types for each concept, such as paragraph, ship and year, but because the text is unstructured it is difficult to identify fact types without changing the format of the XML representation.

7. Conclusions and Future Work

This paper presented a method of mapping Object Role Models to XML-Schema. We believe that an ORM-based approach to designing XML-Schemas has advantages over current tree-based XML-Schema editors for a number of reasons. Firstly, tree-based editors force designers to make decisions about the tree structure of a schema very early in the modelling process. Secondly, tree-based editors cannot graphically model many of the rich constraints available in ORM. Thirdly, ORM makes it easier to visualise, verbalise, populate and validate the model with the domain expert, thus making it easier to design a correct schema.

In developing the mapping algorithm, we discovered many ways to map an ORM schema to an XML-Schema. With this in mind, we distinguished our approach by aiming to minimize data redundancy in the resulting XML-Schema while maximizing the connectivity of its elements.

Future research plans include exploring alternative options for modelling n-ary and nested fact types, and additional configuration alternatives for the mapping process. We also plan to investigate the reverse procedure of generating an ORM schema from an XML-Schema. In particular, we wish to explore the notion of preserving the structure of an XML-Schema, thereby developing an ORM-to-XML-Schema editor that can map in both directions and reproduce the original schema. This will enable existing XML-Schemas to be edited with an ORM tool while preserving the style and structure of the original schema.

Finally, there are many published XML-Schemas that perform similar functions. If we could develop a method of mapping between these schemas, it should be possible to facilitate a higher level of interoperability. We would therefore like to investigate how to graphically map between the concepts in two different ORM models. Once this has been done, it should be possible to automatically generate XSLT scripts to translate between the two corresponding XML-Schemas. We anticipate that such work could rely on existing work in ORM schema integration [E95].

8. Acknowledgements

The authors thank Zar Zar Tun (DSTC), Hoylen Sue (DSTC) and Anthony Bloesch (Microsoft) for their comments on earlier versions of this paper.

9. References

[B97] Bird, L. 1997, Data Reverse Engineering: from a Relational Database System to a 3-Dimensional Conceptual Schema, Ph.D. Thesis, Department of Computer Science and Electrical Engineering, The University of Queensland.
[BH97] Bloesch, A. & Halpin, T. 1997, 'Conceptual queries using ConQuer-II', Proc. ER'97, Springer LNCS, no. 1331, pp. 113-26.
[CHP96] Campbell, L., Halpin, T. & Proper, H. 1996, 'Conceptual Schemas with Abstractions: Making flat conceptual schemas more comprehensible', Data & Knowledge Engineering, vol. 20, pp. 39-85.
[E95] Ewald, C. 1997, Foundations of Conceptual Schema Evolution, Ph.D. Thesis, Department of Computer Science and Electrical Engineering, The University of Queensland.
[H98] Halpin, T. 1998, 'Object-Role Modeling (ORM/NIAM)', Handbook on Architectures of Information Systems, Springer, Heidelberg, Ch. 4.
[H99a] Halpin, T. 1999, Conceptual Schema & Relational Database Design, 2nd edn (revised), WytLytPub.
[H99b] Halpin, T. 1999, 'Fact-orientation before object-orientation: the case for data use cases', DataToKnowledge Newsletter, vol. 27, no. 6.
[HB99] Halpin, T. & Bloesch, A. 1999, 'Data modeling in UML and ORM: a comparison', Journal of Database Management, Idea Group, Hershey.
[W3C0] W3C. XML-Schema Part 0: Primer. Available at: http://www.w3.org/TR/xmlschema-0/
[W3C1] W3C. XML-Schema Part 1: Structures. Available at: http://www.w3.org/TR/xmlschema-1/
[W3C2] W3C. XML-Schema Part 2:
[W3C3] W3C. Extensible Markup Language (XML) 1.0. Available at: http://www.w3.org/TR/1998/REC-xml-19980210
Appendix A: 'Conference Paper' XML-Schema

<?xml version="1.0"?>
<schema targetNamespace="http://www.dstc.edu.au/CONF"
        xmlns="http://www.w3.org/1999/XMLSchema"
        xmlns:conf="http://www.dstc.edu.au/CONF">

  <!-- Type definition for each ORM object type (except AcceptedPaper) and reference type -->
  <simpleType name="EmailAddress" base="string"/>
  <simpleType name="PhoneNr" base="string"/>
  <simpleType name="RatingNr" base="positive-integer">
    <minInclusive value="1"/>
    <maxInclusive value="10"/>
  </simpleType>
  <complexType name="Rating">
    <attribute name="nr" type="conf:RatingNr" minOccurs="1"/>
  </complexType>
  <simpleType name="PaperTitle" base="string"/>
  <simpleType name="StatusCode" base="string">
    <enumeration value="undec"/>
    <enumeration value="accept"/>
    <enumeration value="reject"/>
  </simpleType>
  <complexType name="Status">
    <attribute name="code" type="conf:StatusCode" minOccurs="1"/>
  </complexType>
  <simpleType name="NrPages" base="positive-integer"/>
  <simpleType name="InstitutionName" base="string"/>
  <simpleType name="CountryName" base="string"/>
  <complexType name="Country">
    <attribute name="name" type="conf:CountryName" minOccurs="1"/>
  </complexType>
  <simpleType name="PersonName" base="string"/>
  <simpleType name="PaperNr" base="positive-integer"/>
  <complexType name="Paper">
    <attribute name="paperNr" type="conf:PaperNr" minOccurs="1"/>
  </complexType>
  <complexType name="Institution">
    <attribute name="name" type="conf:InstitutionName" minOccurs="1"/>
    <element name="Country" type="conf:Country"/>
  </complexType>
  <complexType name="Person">
    <attribute name="name" type="conf:PersonName" minOccurs="1"/>
    <element name="Institution" type="conf:Institution"/>
  </complexType>

  <!-- Complex type definitions for each major ORM grouping -->
  <complexType name="PersonFacts" base="conf:Person" derivedBy="extension">
    <element name="EmailAddress" type="conf:EmailAddress" minOccurs="0"/>
    <element name="PhoneNr" type="conf:PhoneNr" minOccurs="0"/>
    <attribute name="attends" type="boolean" minOccurs="1"/>
  </complexType>
  <complexType name="PaperFacts" base="conf:Paper" derivedBy="extension">
    <element name="PaperTitle" type="conf:PaperTitle"/>
    <element name="Status" type="conf:Status"/>
    <element name="Author" type="conf:Person" maxOccurs="*"/>
    <element name="Review" minOccurs="0" maxOccurs="2">
      <complexType>
        <element name="Referee" type="conf:Person"/>
        <element name="Rating" type="conf:Rating" minOccurs="0"/>
      </complexType>
    </element>
    <element name="Presenter" type="conf:Person" minOccurs="0" maxOccurs="*"/>
  </complexType>
  <complexType name="AcceptedPaperFacts" base="conf:PaperFacts" derivedBy="extension">
    <element name="NrPages" type="conf:NrPages"/>
  </complexType>

  <!-- Element definitions of each major object type in main conference schema -->
  <element name="Conference">
    <element name="Person" type="conf:PersonFacts" minOccurs="0" maxOccurs="*"/>
    <element name="Paper" type="conf:PaperFacts" minOccurs="0" maxOccurs="*"/>

    <!-- Keys and uniqueness constraints -->
    <key name="PersonKey">
      <selector>Conference/Person</selector>
      <field>@name</field>
      <field>Institution/@name</field>
      <field>Institution/Country/@name</field>
    </key>
    <key name="PaperKey">
      <selector>Conference/Paper</selector>
      <field>@paperNr</field>
    </key>
    <unique name="EmailUnique">
      <selector>Conference/Person</selector>
      <field>EmailAddress</field>
    </unique>
    <unique name="RefereeKey">
      <selector>Conference/Paper</selector>
      <field>@paperNr</field>
      <field>Referee/@name</field>
      <field>Referee/Institution/@name</field>
      <field>Referee/Institution/Country/@name</field>
    </unique>
    <unique name="AuthorKey">
      <selector>Conference/Paper</selector>
      <field>@paperNr</field>
      <field>Author/@name</field>
      <field>Author/Institution/@name</field>
      <field>Author/Institution/Country/@name</field>
    </unique>
    <unique name="PresenterKey">
      <selector>Conference/Paper</selector>
      <field>@paperNr</field>
      <field>Presenter/@name</field>
      <field>Presenter/Institution/@name</field>
      <field>Presenter/Institution/Country/@name</field>
    </unique>

    <!-- Key references between major object type groupings -->
    <keyref name="RefereePersonRef" refer="PersonKey">
      <selector>./Conference/Paper/Review/Referee</selector>
      <field>@name</field>
      <field>Institution/@name</field>
      <field>Institution/Country/@name</field>
    </keyref>
    <keyref name="AuthorPersonRef" refer="PersonKey">
      <selector>./Conference/Paper/Author</selector>
      <field>@name</field>
      <field>Institution/@name</field>
      <field>Institution/Country/@name</field>
    </keyref>
    <keyref name="PresenterPersonRef" refer="PersonKey">
      <selector>./Conference/Paper/Presenter</selector>
      <field>@name</field>
      <field>Institution/@name</field>
      <field>Institution/Country/@name</field>
    </keyref>
  </element>
</schema>

Appendix B: 'Conference Paper' XML-Instance

<?xml version="1.0"?>
<Conference>
  <Person name="Christopher Robin" attends="true">
    <Institution name="Stanford University">
      <Country name="USA"/>
    </Institution>
    <EmailAddress>crobin@stanford.edu</EmailAddress>
    <PhoneNr>+1-555-54321</PhoneNr>
  </Person>
  <Person name="Winnie the Pooh" attends="true">
    <Institution name="Hundred Acre Wood University">
      <Country name="USA"/>
    </Institution>
    <EmailAddress>pooh@hundredacrewood.edu</EmailAddress>
    <PhoneNr>+1-555-12345</PhoneNr>
  </Person>
  <Person name="Tigger" attends="true">
    <Institution name="Hundred Acre Wood University">
      <Country name="USA"/>
    </Institution>
    <EmailAddress>tigger@hundredacrewood.edu</EmailAddress>
    <PhoneNr>+1-555-12347</PhoneNr>
  </Person>
  <Person name="Eeyore" attends="false">
    <Institution name="Hundred Acre Wood University">
      <Country name="USA"/>
    </Institution>
    <EmailAddress>eeyore@hundredacrewood.edu</EmailAddress>
    <PhoneNr>+1-555-12348</PhoneNr>
  </Person>
  <Paper paperNr="27">
    <PaperTitle>A Macro-Economic Theory for Honey Distribution</PaperTitle>
    <Status code="undec"/>
    <Author name="Winnie the Pooh">
      <Institution name="Hundred Acre Wood University">
        <Country name="USA"/>
      </Institution>
    </Author>
    <Author name="Eeyore">
      <Institution name="Hundred Acre Wood University">
        <Country name="USA"/>
      </Institution>
    </Author>
    <Review>
      <Referee name="Tigger">
        <Institution name="Hundred Acre Wood University">
          <Country name="USA"/>
        </Institution>
      </Referee>
      <Rating nr="4"/>
    </Review>
    <Review>
      <Referee name="Christopher Robin">
        <Institution name="Stanford University">
          <Country name="USA"/>
        </Institution>
      </Referee>
      <Rating nr="10"/>
    </Review>
  </Paper>
</Conference>
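The key and keyref declarations in Appendix A state that a person is identified by the combination of person name, institution name and country name, and that every Author, Referee and Presenter must match such a Person. As a rough illustration of what these constraints mean (a hand-rolled check, not a schema validator and not part of the mapping algorithm), the sketch below tests them over an abridged copy of the Appendix B instance using Python's standard xml.etree.ElementTree. The helper names `person_key` and `check_identity_constraints` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the Appendix B instance: two persons and one paper
# (e-mail, phone and attendance details omitted; paperNr as declared in Appendix A).
INSTANCE = """<Conference>
  <Person name="Tigger">
    <Institution name="Hundred Acre Wood University"><Country name="USA"/></Institution>
  </Person>
  <Person name="Winnie the Pooh">
    <Institution name="Hundred Acre Wood University"><Country name="USA"/></Institution>
  </Person>
  <Paper paperNr="27">
    <Author name="Winnie the Pooh">
      <Institution name="Hundred Acre Wood University"><Country name="USA"/></Institution>
    </Author>
    <Review>
      <Referee name="Tigger">
        <Institution name="Hundred Acre Wood University"><Country name="USA"/></Institution>
      </Referee>
      <Rating nr="4"/>
    </Review>
  </Paper>
</Conference>"""


def person_key(elem):
    """Hypothetical helper: (person name, institution name, country name)."""
    inst = elem.find("Institution")
    country = inst.find("Country") if inst is not None else None
    return (
        elem.get("name"),
        inst.get("name") if inst is not None else None,
        country.get("name") if country is not None else None,
    )


def check_identity_constraints(root):
    """Hypothetical helper: list PersonKey/keyref violations (empty if none found)."""
    problems = []
    keys = [person_key(p) for p in root.findall("Person")]
    # PersonKey: no two Person elements may share the same composite key.
    if len(keys) != len(set(keys)):
        problems.append("duplicate PersonKey")
    known = set(keys)
    # Keyrefs: every Author, Referee and Presenter must match some Person.
    for path in ("Paper/Author", "Paper/Review/Referee", "Paper/Presenter"):
        for ref in root.findall(path):
            if person_key(ref) not in known:
                problems.append(f"{path}: unknown person {person_key(ref)}")
    return problems


if __name__ == "__main__":
    print(check_identity_constraints(ET.fromstring(INSTANCE)))  # prints []
```

Because the instance repeats each referenced person's full identifying path rather than using a single surrogate identifier, the check has to compare the whole (name, institution, country) tuple, which mirrors the three field elements in each keyref.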