Transcript of "Lecture 3 - Slicing and Dicing Data Categories - The Art of Taxonomy"
XML FOR DUMMIES
http://it-slideshares.blogspot.com
Book authors: Lucinda Dykes and Ed Tittel
Part 1: XML Basics
Lecture 3: Slicing and Dicing Data Categories: The Art of Taxonomy
Contents
1. Taking Stock of Your Data
2. Breaking Down Data in Different Ways
3. Developing Your Taxonomy
4. Testing Your Taxonomy
5. Looking Ahead to Validation
1. Taking Stock of Your Data
Looking at business practices and partners.
♦ Taking a close look at the flow of information in your business will help you identify the components of your content.
♦ Each different process is a specialized use of information.
♦ Take some time to talk to the people who create or frequently process the data. Find out:
  ○ What users do with individual pieces of information.
  ○ What data users think is impossible to live without.
  ○ What data is unnecessary or optional.
Gathering some content.
♦ The more complete your collection of samples is, the better chance you have of creating markup that fits all your content. Here are some ideas:
  ○ Get data from multiple sources.
  ○ Get a lot of data.
  ○ Get a lot of data from multiple sources.
Taking Stock of Your Data (cont.)
Checking whether a DTD or schema already exists.
♦ When you create a document according to a DTD or schema, you use a predefined structure that specifies how the components of markup should be used to describe a particular kind of content.
♦ Predefined DTDs and schemas usually come from a couple of different sources:
  ○ Industry groups or organizations that want a common format for standard data, such as OFX and CML.
  ○ Application builders who created their systems to run with content described by a particular set of markup, for example CFML and ASP.
Searching for a schema repository.
♦ You could search for a schema or DTD, or add one of your own to the repository, online at sites such as www.Biztalk.org or www.schema.net.
♦ There is still one schema repository, hosted by OASIS at www.xml.org/xml/registry.jsp. OASIS provides a very comprehensive list of proposed XML applications and industry initiatives at www.oasis-open.org.xml.html#applications.
♦ The whole point of using XML is to make your content as accessible to a system as possible.
♦ Content analysis with XML in mind is much easier when you have a handle on the ins and outs of XML Schemas and DTDs and how to put them together.
2. Breaking Down Data in Different Ways
When we developed our hypothetical book-selling business, we went through the same data-analysis process we’re sharing with you. After we gathered our documents and familiarized ourselves with them, we took a good hard look at what we learned about our content. Here’s what we came up with:
- Books can be categorized in a number of different ways, including: author, title, publication date, publisher, edition, language, number of pages, size, type (fiction, nonfiction), special features, format, price, ISBN.
- The customer information we collect includes: first name, last name, address, city, state, zip code, email address, phone number.
- The sales information we gather in addition to customer information includes: date, item number, price, total cost.
Winnowing out the wheat from the chaff.
♦ When we analyzed our content, we made some judgments about what information we needed to collect.
♦ We chose to exclude information that was not useful from our taxonomy strategy.
♦ During your content-analysis process, knowing the purpose of your markup can help you keep your goals in sight.
Breaking Down Data in Different Ways (cont.)
Types of data that can be stored in XML.
♦ XML content can be divided into two main groups: data-intensive and document-intensive (or text-intensive).
♦ On the data end of the spectrum, you find collections of data like those that reside in a database. Each collection consists of a more or less arbitrary number of record structures, in which each record contains:
  ○ A unique identifier or key.
  ○ A common collection of named, organized values.
♦ XML can capture and represent data that describes other collections of data.
♦ XML can handle many kinds of data and can accommodate binary information. It can also supply data to other computer applications outside XML’s control.
♦ An XML document can reference anything that a computer can represent.
3. Developing Your Taxonomy
After you look at your content, you can start breaking it down into categories and subcategories. Here’s how we broke it down for our hypothetical book business:
● Book
  ○ Item Number
  ○ Title
  ○ Author
  ○ Publisher
  ○ Price
  ○ Content Type
  ○ Format
  ○ ISBN
● Sales
  ○ Item Number
  ○ Price
  ○ Shipping
  ○ Total Cost
  ○ Date
  ○ Source
● Customer
  ○ Customer Number
  ○ First Name
  ○ Last Name
  ○ Address
  ○ City
  ○ State
  ○ Zip Code
  ○ E-mail Address
  ○ Phone Number
♦ As you can see, Item Number appears as a subcategory in both the Book and the Sales categories.
♦ The Item Number is unique to each copy of a book, which makes it easy to keep track of sales and inventory.
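One way to sanity-check a taxonomy like this one is to hold it in a small data structure and look for subcategories that appear under more than one category. The sketch below uses Python, which is not part of the lecture. The dictionary layout and the function name `shared_subcategories` are illustrative assumptions, and the grouping of subcategories under Book and Sales follows our reading of the slide.

```python
# Hypothetical in-memory form of the bookstore taxonomy (grouping assumed from the slide).
TAXONOMY = {
    "Book": ["Item Number", "Title", "Author", "Publisher", "Price",
             "Content Type", "Format", "ISBN"],
    "Sales": ["Item Number", "Price", "Shipping", "Total Cost", "Date", "Source"],
    "Customer": ["Customer Number", "First Name", "Last Name", "Address",
                 "City", "State", "Zip Code", "E-mail Address", "Phone Number"],
}


def shared_subcategories(taxonomy):
    """Map each subcategory found in more than one category to those categories."""
    seen = {}
    for category, subcategories in taxonomy.items():
        for name in subcategories:
            seen.setdefault(name, []).append(category)
    return {name: cats for name, cats in seen.items() if len(cats) > 1}


for name, cats in shared_subcategories(TAXONOMY).items():
    print(f"{name}: {', '.join(cats)}")
```

A shared subcategory such as Item Number is a candidate key for linking records across categories, so it is worth confirming that the overlap is intentional.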
4. Testing Your Taxonomy
Using trial and error for the best fit.
♦ You work with your markup, experimenting with combinations of elements and attributes until you get the best results. For example, we first used two nested elements to specify the content type for a book:
    <book>
      <contentType>Fiction</contentType>
    </book>
♦ The markup could use as many contentType elements within the book element as needed, but we decided to go with contentType as an attribute of the book element instead, as shown here:
    <book contentType="Fiction"/>
♦ We decided on this route because we thought that we’d want to predefine the category names and require that valid documents choose one of the names from the list in our DTD or schema.
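The idea of restricting contentType to a predefined list can be tried out before any DTD or schema exists. The following Python sketch is only a stand-in for real validation: the allowed names in `ALLOWED_CONTENT_TYPES` are an assumption based on the fiction/nonfiction types mentioned earlier, and the function name and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed list of category names; a DTD or schema would normally declare these.
ALLOWED_CONTENT_TYPES = {"Fiction", "Nonfiction"}


def invalid_content_types(xml_text):
    """Return contentType values on book elements that are not in the allowed list.

    A book element with no contentType attribute is reported as None.
    """
    root = ET.fromstring(xml_text)
    bad = []
    for book in root.iter("book"):
        value = book.get("contentType")
        if value not in ALLOWED_CONTENT_TYPES:
            bad.append(value)
    return bad


sample = """
<books>
  <book contentType="Fiction"/>
  <book contentType="Mystery"/>
</books>
"""
print(invalid_content_types(sample))  # ['Mystery']
```

Because an attribute can appear only once per element and can be limited to a fixed set of values, it suits a single-choice category better than a repeatable child element does.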
Testing Your Taxonomy (cont.)
Testing your content analysis.
♦ The best way to test your final markup is to apply it to as many content samples as you can lay your hands on. The following listing shows the final draft of our bookstore markup:
    <?xml version="1.0" standalone="yes"?>
    <books>
      <book contentType="Fiction" format="Hardback">
        <bookInfo>
          <title>The Da Vinci Code</title>
          <author>Brown, Dan</author>
          <publisher>Doubleday</publisher>
          <isbn>0385504209</isbn>
        </bookInfo>
        <salesInfo>
          <price priceType="Retail">$24.95</price>
          <itemNumber>0385504209-1</itemNumber>
          <date>January 12, 2005</date>
          <source sourceType="Retail" />
          <shipping>$5.00</shipping>
          <cost>$29.95</cost>
        </salesInfo>
      </book>
      <totalCost>$29.95</totalCost>
      <customer custType="newRetail">
        <custNumber>5594</custNumber>
        <lastName>Blow</lastName>
        <firstName>Joe</firstName>
        <address>52 Joetta Lane</address>
        <city>Cottage Grove</city>
        <state>OR</state>
        <zip>97424</zip>
        <phone>767-3333</phone>
        <email>firstname.lastname@example.org</email>
      </customer>
    </books>
♦ The first line in our code, <?xml version="1.0" standalone="yes"?>, is an XML declaration. You’ll learn all about XML declarations and all the other details of XML syntax in Chapter 5.
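A quick way to apply markup to content samples is to run each sample through a parser and spot-check a few values. The Python sketch below is one such check, not part of the lecture. `SAMPLE` is a shortened copy of the bookstore markup and assumes straight quotation marks and a closing </books> tag; the helper names `is_well_formed` and `dollars` are hypothetical.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Shortened copy of the bookstore markup (straight quotes, closing </books> assumed).
SAMPLE = """<?xml version="1.0" standalone="yes"?>
<books>
  <book contentType="Fiction" format="Hardback">
    <bookInfo>
      <title>The Da Vinci Code</title>
      <author>Brown, Dan</author>
      <isbn>0385504209</isbn>
    </bookInfo>
    <salesInfo>
      <price priceType="Retail">$24.95</price>
      <shipping>$5.00</shipping>
      <cost>$29.95</cost>
    </salesInfo>
  </book>
  <totalCost>$29.95</totalCost>
</books>
"""


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


def dollars(text):
    """Convert a string such as '$24.95' to an exact Decimal amount."""
    return Decimal(text.lstrip("$"))


root = ET.fromstring(SAMPLE)
sales = root.find("book/salesInfo")
price = dollars(sales.findtext("price"))
shipping = dollars(sales.findtext("shipping"))
cost = dollars(sales.findtext("cost"))

print(is_well_formed(SAMPLE))                   # True
print(price + shipping == cost)                 # True
print(is_well_formed("<books><book></books>"))  # False
```

Parsing confirms only that the sample is well formed. Checks such as price plus shipping equaling cost are business rules that neither well-formedness nor a DTD covers, so they need their own test.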
5. Looking Ahead to Validation
You get to make up as many rules as you want or need to make the markup do what you want it to do. The rules that you create with XML can dictate which elements make up an XML document. Creating XML document descriptions enables you to state the rules that a whole class of documents must follow. The two main forms of XML document descriptions in use today are DTDs and XML schemas. DTDs work well for validating XML with text-intensive content, while XML schemas work well for validating XML with data-intensive content.
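To get a feel for what such rules do before writing a DTD or schema, a rule about which elements a document must contain can be checked by hand. The Python sketch below is a hand-rolled stand-in, not DTD or schema validation: Python's standard xml.etree.ElementTree module checks well-formedness but does not validate. The `REQUIRED_CHILDREN` rule set and the function name `missing_children` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule set: each element name maps to the child elements it must contain.
REQUIRED_CHILDREN = {
    "customer": ["custNumber", "lastName", "firstName"],
}


def missing_children(xml_text, rules=REQUIRED_CHILDREN):
    """List (parent, child) pairs where a required child element is absent."""
    root = ET.fromstring(xml_text)
    problems = []
    for element in root.iter():
        for child_name in rules.get(element.tag, []):
            if element.find(child_name) is None:
                problems.append((element.tag, child_name))
    return problems


doc = (
    "<customer custType='newRetail'>"
    "<custNumber>5594</custNumber>"
    "<lastName>Blow</lastName>"
    "</customer>"
)
print(missing_children(doc))  # [('customer', 'firstName')]
```

A DTD or XML schema expresses the same kind of rule declaratively, so any validating parser can enforce it for a whole class of documents.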
Thank you
The end of chapter 3