Introduction to XML
Sasha Schwarzman, 202-777-7518
sschwarzman@agu.org
AGU
14 February 2003
1. XML by example
1.1. Credit card statement (paper)
Cardmember Statement
ACCOUNT NUMBER: 4444888822221111
AVAILABLE CREDIT: 5,000
CLOSING DATE: 11/25/02
PAYMENT DUE DATE: 12/15/02
CARDMEMBER STATEMENT SUMMARY
TRANS  POST   REFERENCE  DESCRIPTION OF             CREDITS    CHARGES
DATE   DATE   NUMBER     TRANSACTION
1023   1025   2416QZP    Townhouse Store #3306 DC              10.65
1027   1027   2422KQ12   Wazuri DC                             55.00
1103   1103   7422120F   Payment – Thank you        1,000.00
1.2. Credit card statement (XML)
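The XML version of the statement was an image in the original slides and did not survive extraction. As a stand-in, here is a minimal Python sketch of how the paper statement above might be marked up and read back. The element names (`Statement`, `AccountNumber`, `AvailableCredit`, `ClosingDate`, `PaymentDueDate`, `Transaction`, `TransDate`, `PostDate`, `RefNumber`, `Amount`) and the `Type` attribute are assumptions for illustration; only `Description` comes from the slides. Parsing uses the standard-library `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical XML markup for the paper statement in 1.1.
# Element and attribute names are illustrative assumptions.
STATEMENT_XML = """<Statement>
  <AccountNumber>4444888822221111</AccountNumber>
  <AvailableCredit>5000</AvailableCredit>
  <ClosingDate>11/25/02</ClosingDate>
  <PaymentDueDate>12/15/02</PaymentDueDate>
  <Transaction>
    <TransDate>1023</TransDate><PostDate>1025</PostDate><RefNumber>2416QZP</RefNumber>
    <Description>Townhouse Store #3306 DC</Description>
    <Amount Type="charge">10.65</Amount>
  </Transaction>
  <Transaction>
    <TransDate>1027</TransDate><PostDate>1027</PostDate><RefNumber>2422KQ12</RefNumber>
    <Description>Wazuri DC</Description>
    <Amount Type="charge">55.00</Amount>
  </Transaction>
  <Transaction>
    <TransDate>1103</TransDate><PostDate>1103</PostDate><RefNumber>7422120F</RefNumber>
    <Description>Payment -- Thank you</Description>
    <Amount Type="credit">1000.00</Amount>
  </Transaction>
</Statement>"""

root = ET.fromstring(STATEMENT_XML)

# Every value is labeled, so software finds it by name rather than by its
# position on the printed page.
account = root.findtext("AccountNumber")
transactions = root.findall("Transaction")
total_charges = sum(Decimal(a.text) for a in root.iter("Amount") if a.get("Type") == "charge")
total_credits = sum(Decimal(a.text) for a in root.iter("Amount") if a.get("Type") == "credit")

print(account, len(transactions), total_charges, total_credits)
```

The amounts are read with `Decimal` rather than `float` so that currency values add up exactly.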
1.3. XML building blocks
XML deals with documents.
• A document is the basic unit of XML information, composed of elements and other markup in an orderly package.
<Description> Payment -- Thank you </Description>
Here <Description> is the start tag, “Payment -- Thank you” is the character data, and </Description> is the end tag; the tags are markup, and the three parts together form an element.
• An element is an identifiable, named component of a document. An element:
  • can have content (but doesn’t have to): data, other elements
  • can be a pointer to information (cross-reference, link)
  • must have one start tag and one matching end tag (an element with no content may instead use a single empty-element tag, such as <Description/>)
  • can nest inside other elements but cannot overlap them
• An attribute provides additional information about an element:
<Transaction Category="Groceries">
  • is found inside the start tag
  • may be required or implied
  • an element may have multiple attributes
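A minimal sketch of these rules using Python's standard `xml.etree.ElementTree`: it reads an element's name, attribute, and character data, and it rejects markup whose elements overlap or lack an end tag, because such text is not well-formed XML. Pairing the `Groceries` category with this particular description, and the `b`/`i` element names, are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# An element with one attribute, containing a nested element with character data
# (hypothetical pairing of category and description).
txn = ET.fromstring(
    '<Transaction Category="Groceries">'
    "<Description>Townhouse Store #3306 DC</Description>"
    "</Transaction>"
)
desc = txn.find("Description")
print(txn.tag, txn.get("Category"), desc.text)

def is_well_formed(text: str) -> bool:
    """Return True if the text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

nested = "<b><i>bold italic</i></b>"       # elements nest: allowed
overlapping = "<b><i>bold italic</b></i>"  # elements overlap: rejected
unclosed = "<b>bold"                       # no end tag: rejected
empty = "<Description/>"                   # empty-element tag: allowed

for sample in (nested, overlapping, unclosed, empty):
    print(f"{sample!r}: {is_well_formed(sample)}")
```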
1.4. Credit card statement DTD
• What a DTD can provide for (structure, sequence, in-document linking through ID/IDREF attributes, and the occurrence indicators ?, *, and +) and what it cannot (datatyping, flexible occurrence indicators such as “from two to five”)
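The statement DTD was an image in the original slides. The Python sketch below shows what such a DTD might declare (the declarations and element names are assumptions, not the original DTD) and illustrates both sides of the point above. Python's standard `xml.etree.ElementTree` reads a DTD but does not validate against it, so a toy check stands in for the validator: a DTD content model behaves like a regular expression over the sequence of child element names. In practice a real validating parser (for example, the third-party lxml library) would do this job. Even a validating parser would accept `fifty-five` as an amount, because `#PCDATA` carries no datatype.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical DTD for the statement, carried in the document's internal subset.
DTD = """<!DOCTYPE Statement [
  <!ELEMENT Statement (AccountNumber, ClosingDate, Transaction+)>
  <!ELEMENT AccountNumber (#PCDATA)>
  <!ELEMENT ClosingDate (#PCDATA)>
  <!ELEMENT Transaction (Description, Amount)>
  <!ATTLIST Transaction Category CDATA #IMPLIED>
  <!ELEMENT Description (#PCDATA)>
  <!ELEMENT Amount (#PCDATA)>
  <!ATTLIST Amount Type (credit | charge) #REQUIRED>
]>
"""

TXN = ('<Transaction><Description>Wazuri DC</Description>'
       '<Amount Type="charge">fifty-five</Amount></Transaction>')

# Follows the content model; "fifty-five" is acceptable #PCDATA.
conforming = ET.fromstring(
    DTD + "<Statement><AccountNumber>4444888822221111</AccountNumber>"
    "<ClosingDate>11/25/02</ClosingDate>" + TXN + "</Statement>")

# Wrong order and no Transaction: the DTD forbids this, yet it still parses,
# because ElementTree checks well-formedness only.
nonconforming = ET.fromstring(
    DTD + "<Statement><ClosingDate>11/25/02</ClosingDate>"
    "<AccountNumber>4444888822221111</AccountNumber></Statement>")

# Toy stand-in for a validator, covering only the Statement element's model
# (AccountNumber, ClosingDate, Transaction+) as a regex over child names.
STATEMENT_MODEL = re.compile(r"AccountNumber ClosingDate( Transaction)+")

def matches_statement_model(statement: ET.Element) -> bool:
    names = " ".join(child.tag for child in statement)
    return STATEMENT_MODEL.fullmatch(names) is not None

print(matches_statement_model(conforming), matches_statement_model(nonconforming))
```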
1.5. Document types and their instances
• Invoice
• Sales catalog
• Dictionary
• Journal article
1.6. Validating parser
1.6.1. What a parser does
• Is the document well-formed? (for stand-alone documents)
• Does the DTD conform to the XML specification?
• Does the document instance conform to the DTD?
1.6.2. What a parser does not do
• Check semantics (“gobbledygook” might be meaningless but still valid as far as a validating parser is concerned)
• Check what a DTD cannot enforce (datatyping, flexible occurrence indicators)
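Because neither the parser nor the DTD checks datatypes or meaning, those checks fall to the processing software. A minimal Python sketch, assuming hypothetical `Transaction`, `Description`, and `Amount` elements: both amounts below would satisfy an `Amount` declared as `(#PCDATA)`, so the application tests them itself with the standard `decimal` module. Nothing here can tell that the description “Gobbledygook” is meaningless.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal, InvalidOperation

# Hypothetical markup: both transactions are well-formed,
# and both amounts are valid #PCDATA.
DOC = """<Statement>
  <Transaction><Description>Wazuri DC</Description><Amount>55.00</Amount></Transaction>
  <Transaction><Description>Gobbledygook</Description><Amount>fifty-five</Amount></Transaction>
</Statement>"""

def amount_errors(root: ET.Element) -> list[str]:
    """Report every Amount that is not a finite decimal number:
    a datatype rule that a DTD cannot express."""
    errors = []
    for amount in root.iter("Amount"):
        text = (amount.text or "").strip()
        try:
            ok = Decimal(text).is_finite()  # rejects "NaN" and "Infinity" too
        except InvalidOperation:
            ok = False
        if not ok:
            errors.append(f"Amount is not a number: {text!r}")
    return errors

print(amount_errors(ET.fromstring(DOC)))
```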
1.7. Credit card statement in an XML environment
2. Components of an XML system
• Document instance
• DTD/Schema
• Validating parser
• Processing system
2.1.1. Document
• Two kinds
  • Well-formed
  • Valid (well-formed and conforming to a model, such as a DTD)
• Usually created
  • manually, using an XML editor (Epic, XMetaL)
  • programmatically from a database or another XML document, or by conversion from another format (LaTeX, MS Word)
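A minimal Python sketch of the programmatic route: rows from a database are turned into an XML document. It uses an in-memory SQLite table from the standard `sqlite3` module; the table name `txn`, its columns, and the XML element names are assumptions for illustration. `xml.etree.ElementTree` escapes characters such as `&` and `<` when it serializes, so the output stays well-formed whatever the database holds.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical table holding the statement's transactions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE txn (trans_date TEXT, ref TEXT, descr TEXT, amount TEXT, kind TEXT)")
conn.executemany("INSERT INTO txn VALUES (?, ?, ?, ?, ?)", [
    ("1023", "2416QZP", "Townhouse Store #3306 DC", "10.65", "charge"),
    ("1027", "2422KQ12", "Wazuri DC", "55.00", "charge"),
    ("1103", "7422120F", "Payment -- Thank you", "1000.00", "credit"),
])

def statement_from_db(conn: sqlite3.Connection, account: str) -> ET.Element:
    """Build a Statement element with one Transaction per database row."""
    root = ET.Element("Statement")
    ET.SubElement(root, "AccountNumber").text = account
    rows = conn.execute(
        "SELECT trans_date, ref, descr, amount, kind FROM txn ORDER BY trans_date")
    for trans_date, ref, descr, amount, kind in rows:
        txn = ET.SubElement(root, "Transaction")
        ET.SubElement(txn, "TransDate").text = trans_date
        ET.SubElement(txn, "RefNumber").text = ref
        ET.SubElement(txn, "Description").text = descr
        ET.SubElement(txn, "Amount", Type=kind).text = amount
    return root

xml_text = ET.tostring(statement_from_db(conn, "4444888822221111"), encoding="unicode")
print(xml_text)
```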
2.1.2. DTD
• The modeling mechanism specified by the XML standard
  • models one type of information
  • is a set of rules describing how documents of that type can be marked up
2.2. Processing system
XML DOES NOT DO ANYTHING!
Your software CAN:
• Start/stop behavior
  • run a script, load a database, create a “form letter” and fill in its contents
• Link
• Format (start bold, end bold)
• Process
  • extract selected elements (e.g., metadata)
  • rearrange/resequence content
  • rename, add content
  • count how many
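Because the markup itself does nothing, each action in the list is a few lines of processing code. A minimal Python sketch using the standard `xml.etree.ElementTree` and `decimal` modules; the statement markup and the element names (`Transaction`, `TransDate`, `Description`, `Amount`, and the added `Payee` and `TotalCharges`) are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical statement markup.
DOC = """<Statement>
  <Transaction><TransDate>1023</TransDate><Description>Townhouse Store #3306 DC</Description><Amount Type="charge">10.65</Amount></Transaction>
  <Transaction><TransDate>1027</TransDate><Description>Wazuri DC</Description><Amount Type="charge">55.00</Amount></Transaction>
  <Transaction><TransDate>1103</TransDate><Description>Payment -- Thank you</Description><Amount Type="credit">1000.00</Amount></Transaction>
</Statement>"""

root = ET.fromstring(DOC)

# Extract selected elements.
descriptions = [d.text for d in root.iter("Description")]

# Count how many.
charge_count = sum(1 for a in root.iter("Amount") if a.get("Type") == "charge")

# Rearrange/resequence: largest amount first.
root[:] = sorted(root.findall("Transaction"),
                 key=lambda t: Decimal(t.findtext("Amount")), reverse=True)

# Rename: call each description a payee.
for d in list(root.iter("Description")):
    d.tag = "Payee"

# Add content: a computed total of the charges.
total = sum(Decimal(a.text) for a in root.iter("Amount") if a.get("Type") == "charge")
ET.SubElement(root, "TotalCharges").text = str(total)

print(ET.tostring(root, encoding="unicode"))
```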
3. XML origins
3.1. What is markup?
• Information added to a document that enhances its meaning by identifying its parts and how they relate to each other.
3.2. Pre-electronic (traditional) markup
Set this header in 12-point Helvetica Medium italic on a 14-point text body, justified on a 22-pica slug with indents of 1 en on the left and none on the right.
3.3. Markup language
• A set of symbols that can be placed in the text of a document to demarcate and label the parts of that document
3.4. Specific markup languages
• Tell the formatter what action to take: "carriage return", "center the following lines", "go to the next page", etc.
3.4.1. RTF, Script, etc.
Script example
.sp (skip one line)
.bf roman 12 (change font size)
.bd .ce Chapter 1. Introduction
(center "Chapter 1. Introduction" and print it in bold)
3.4.2. WYSIWYG word processors, DTP, and professional typesetting systems
• WordPerfect, MS Word, WordStar, MacWrite
• Quark, Ventura
• XYVision, Penta, Miles 33
Proprietary and not interchangeable; structure and presentation are inextricably intertwined. Retrieval and cross-referencing are difficult.
3.5. Generic markup languages
• Use descriptive tags rather than formatting codes
• Indicate the logical structure of documents
• Separate formatting from structure/content
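The separation can be sketched in a few lines of Python: one descriptively tagged source is rendered two ways by swapping the rules that map tags to presentation. The `chapter`/`title`/`para` tags and the two dictionaries are hypothetical stand-ins for real stylesheets (in practice, XSLT or CSS would play this role); the source text itself never changes.

```python
import html
import xml.etree.ElementTree as ET

# Descriptive markup: the tags say what each part is, not how it looks.
SOURCE = ("<chapter><title>Chapter 1. Introduction</title>"
          "<para>Generic markup identifies the parts of a document.</para></chapter>")

# Two hypothetical "stylesheets" mapping each tag to a presentation.
HTML_STYLE = {
    "title": lambda text: f"<h1>{html.escape(text)}</h1>",
    "para": lambda text: f"<p>{html.escape(text)}</p>",
}
PLAIN_STYLE = {
    "title": lambda text: text.upper(),
    "para": lambda text: text,
}

def render(source: str, style: dict) -> str:
    """Render each child of the root element with the style's rule for its tag."""
    root = ET.fromstring(source)
    return "\n".join(style[child.tag](child.text or "") for child in root)

print(render(SOURCE, HTML_STYLE))
print(render(SOURCE, PLAIN_STYLE))
```

Contrast this with the Script example in 3.4.1, where the centering and bolding instructions sit in the text itself and would have to be edited to change the look.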
3.5.1. Macro-based languages
• LaTeX for TeX
• Syspub for Waterloo Script
• ms for nroff
LaTeX example
\to{Mr. Smith}
stands for three commands:
\noindent
\settabs 6 \columns
\+TO:&Mr. Smith\cr
3.6. SGML
• 1960s. GCA’s “GenCode” (Graphics Communications Association)
• 1969. IBM’s GML. Generalized Markup Language (Charles Goldfarb, Edward
Mosher, and Raymond Lorie)
• 1978. An ANSI working group, headed by Charles Goldfarb, is formed to develop a standard text-description language based on GML, providing a format for text interchange
• 1983. SGML developed. DoD and IRS adopt SGML. DoD develops CALS (Computer-Aided Acquisition and Logistic Support) as an SGML application (CALS tables are still in use). AAP (Association of American Publishers) develops DTDs for books and journals. SGML spreads in Europe and North America
• 1986. ISO ratifies SGML as a standard (ISO 8879:1986)
3.7. HTML
• Early 1990s. Tim Berners-Lee and Anders Berglund of CERN, the European particle physics laboratory, develop HyperText Markup Language (Berglund had designed a publishing system to test SGML in the 1980s)
• HTML is an application of SGML for hypertext documents
• Both a step forward (Web, wide adoption, public interest in markup) and a step
back (generic coding principles compromised: one (!) doc type used for all
purposes, many tags purely presentational)
HTML example
• HTML tags specify formatting
4. XML
1998. A W3C working group led by Jon Bosak produces XML, a simplified version of SGML: 80% of SGML’s power with 20% of its complexity
4.1. What XML can do
XML can be used to tag…
• Content (what type of information is this?)
  • City, state, zip
  • Part number
  • Debit, credit, payment
  • Question, answer
  • Genus, species
  • Indications, contraindications
• Structure (what part of the document is this?)
  • Paragraph, subsection, section, chapter, list
  • Table, figure, formula, video
  • Author block, signature block, address block
• Pointers (location, navigation, linking, and other relationships)
  • Hypertext links
  • Cross-references
  • Indexing terms
• Metadata (information about data)
  • Bibliographic/cataloging information (author, title, publication date)
  • Index terms and keywords (search terms)
  • Revision, version, edition
  • Status, tracking information
  • Data sources
  • Editor’s and reviewer’s comments
  • Abstracts, highlights, “teasers”, “blurbs”
• Rendering/processing (if you MUST): how text should behave, display, or print. This is normally handled through a stylesheet, but sometimes it goes into the markup:
  • position of a graphic on the page (floating, centered)
  • line break in titles
  • tables
  • author’s whimsy (“I want this word bold just because”)
4.2. XML is…
• A subset of SGML: a meta-language that describes the concepts and rules for building domain-specific markup languages
• A family of related technologies and standards, mostly W3C specifications: XSLT, XSL, XPointer, XPath, XQuery, XLink, DOM; SAX is a de facto standard developed outside the W3C
• XML can be used:
  • for document modeling
  • for data interchange
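Three of these interfaces ship with Python's standard library, which makes for a quick comparison: `xml.dom.minidom` builds a DOM tree, `xml.sax` streams parse events to a handler, and `xml.etree.ElementTree` accepts a limited subset of XPath in its `find`/`findall` methods. XSLT, XQuery, XLink, and XPointer are not in the standard library. The statement markup is a hypothetical example.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET
import xml.sax

# Hypothetical statement markup.
DOC = """<Statement>
  <Transaction Category="Groceries"><Amount Type="charge">10.65</Amount></Transaction>
  <Transaction><Amount Type="charge">55.00</Amount></Transaction>
  <Transaction><Amount Type="credit">1000.00</Amount></Transaction>
</Statement>"""

# DOM: the whole document is loaded into memory as a tree of nodes.
dom = xml.dom.minidom.parseString(DOC)
dom_count = len(dom.getElementsByTagName("Transaction"))

# SAX: the parser streams events to a handler, which keeps only what it needs.
class ChargeCounter(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.charges = 0

    def startElement(self, name, attrs):
        if name == "Amount" and attrs.get("Type") == "charge":
            self.charges += 1

handler = ChargeCounter()
xml.sax.parseString(DOC.encode("utf-8"), handler)

# XPath subset: ElementTree understands attribute predicates like [@Category='...'].
grocery = ET.fromstring(DOC).findall(".//Transaction[@Category='Groceries']")

print(dom_count, handler.charges, len(grocery))
```

The DOM approach suits documents that fit in memory and need random access; SAX suits very large inputs where only a running tally or a few fields matter.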
4.3. XML applications (domain-specific markup languages)
Device/media-oriented:
• XHTML - Web
• WML – wireless markup language
• VoxML – spoken word markup language
Discipline-oriented:
• MathML – mathematical markup language
• CML – chemical markup language
Industry-oriented:
• Airlines/aircraft
• Semiconductors
Process-oriented:
• SVG - Scalable Vector Graphics
4.4. XML is not…
• a programming language. Does not replace C++, Java, Perl, etc.
• a user interface
• a presentation format
• a text formatting or processing system
• a standard set of document types
• a standard or recommended set of tags
• Unicode
• a database
• user-unfriendly
5. XML in a publishing environment
5.1. Uncontrolled inputs, controlled outputs
[Figure: uncontrolled inputs (WordPerfect, MS Word, LaTeX, HTML, PostScript) go through an XML converter to an XML document (XML article); controlled outputs include a composition engine (low-res and high-res PDF), an XSLT stylesheet (HTML), an XML DB, CrossRef, MDDB, A&I services, TOCs, indices, search interfaces, and delivery to handheld computers, cell phones, and telephones.]
5.2. Integrated environment with controlled inputs and outputs
Example: technical manual (aircraft, automobile, etc.)
Conceptual configuration of a database-centered XML-aware system (adapted from The SGML Implementation
Guide by B. Travis and D. Waldt)
[Figure: a master database (text objects, graphics, works in progress) linked to the processes of authoring, editing, reviewing, copy-editing, converting, imaging, composing, publishing, abstracting and indexing, searching, archiving, revising, tracking, referencing and linking, translating, and assigning.]
6. XML advantages
• Encode (mark up) data only once, creating a single information repository
• Separates content/structure from presentation/formatting
• Software/hardware independent
• Interoperability: common language for a community to agree on data content;
machine-to-machine communication.
• Portability
• Preservation
• Non-proprietary/open industry standard
• Reuse/re-purposing (many outputs)
• Enables semantically complex searching and retrieval
• Cuts down on the number of required converters (saves software development
costs)
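One way to see the last point, assuming every format must be convertible to every other: direct one-way converters between n formats number n(n − 1), while routing everything through XML needs only one converter into XML and one out of it per format, or 2n. A minimal Python sketch of that count; the five formats in the usage line are the inputs named in the publishing diagram in 5.1.

```python
def direct_converters(n_formats: int) -> int:
    """One-way converters needed when every format converts directly to every other."""
    return n_formats * (n_formats - 1)

def hub_converters(n_formats: int) -> int:
    """Converters needed when each format converts into and out of a single XML hub."""
    return 2 * n_formats

formats = ["WordPerfect", "MS Word", "LaTeX", "HTML", "PostScript"]
print(direct_converters(len(formats)), hub_converters(len(formats)))
```

Under these assumptions the hub breaks even at three formats (6 converters either way) and saves more with every format added beyond that.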

More Related Content

What's hot

Web data management
Web data managementWeb data management
Web data management
Abdul Hannan
 
Xml 215-presentation
Xml 215-presentationXml 215-presentation
Xml 215-presentationphilipsinter
 
XML - EXtensible Markup Language
XML - EXtensible Markup LanguageXML - EXtensible Markup Language
XML - EXtensible Markup Language
Reem Alattas
 
Introduction to XML and Databases
Introduction to XML and DatabasesIntroduction to XML and Databases
Introduction to XML and Databases
torp42
 
Xml theory 2005_[ngohaianh.info]_1_introduction-to-xml
Xml theory 2005_[ngohaianh.info]_1_introduction-to-xmlXml theory 2005_[ngohaianh.info]_1_introduction-to-xml
Xml theory 2005_[ngohaianh.info]_1_introduction-to-xml
Ông Thông
 
Web services Overview in depth
Web services Overview in depthWeb services Overview in depth
Web services Overview in depth
AbdulImrankhan7
 
Introduction to XML
Introduction to XMLIntroduction to XML
Introduction to XML
Prabu U
 
Xml schema
Xml schemaXml schema
Xml schema
Akshaya Akshaya
 
XML-Extensible Markup Language
XML-Extensible Markup Language XML-Extensible Markup Language
XML-Extensible Markup Language
Ann Joseph
 
CTDA Workshop on XML and MODS
CTDA Workshop on XML and MODSCTDA Workshop on XML and MODS
CTDA Workshop on XML and MODS
University of Connecticut Libraries
 
Unit ii java script and xhtml documents and dynamic documents with javascript
Unit ii java script and xhtml documents and dynamic documents with javascriptUnit ii java script and xhtml documents and dynamic documents with javascript
Unit ii java script and xhtml documents and dynamic documents with javascript
zahid7578
 
Xml
XmlXml
Introduction to XML
Introduction to XMLIntroduction to XML
Introduction to XML
Abhra Basak
 
CTDA Workshop on XSL
CTDA Workshop on XSLCTDA Workshop on XSL
Extensible Markup Language (XML)
Extensible Markup Language (XML)Extensible Markup Language (XML)
Extensible Markup Language (XML)
AakankshaR
 
01 xml document structure
01 xml document structure01 xml document structure
01 xml document structure
Baskarkncet
 
XML
XMLXML

What's hot (20)

Web data management
Web data managementWeb data management
Web data management
 
Introduction to XML
Introduction to XMLIntroduction to XML
Introduction to XML
 
Xml 215-presentation
Xml 215-presentationXml 215-presentation
Xml 215-presentation
 
XML - EXtensible Markup Language
XML - EXtensible Markup LanguageXML - EXtensible Markup Language
XML - EXtensible Markup Language
 
Introduction to XML and Databases
Introduction to XML and DatabasesIntroduction to XML and Databases
Introduction to XML and Databases
 
Xml theory 2005_[ngohaianh.info]_1_introduction-to-xml
Xml theory 2005_[ngohaianh.info]_1_introduction-to-xmlXml theory 2005_[ngohaianh.info]_1_introduction-to-xml
Xml theory 2005_[ngohaianh.info]_1_introduction-to-xml
 
Web services Overview in depth
Web services Overview in depthWeb services Overview in depth
Web services Overview in depth
 
XML Technologies
XML TechnologiesXML Technologies
XML Technologies
 
Introduction to XML
Introduction to XMLIntroduction to XML
Introduction to XML
 
Xml schema
Xml schemaXml schema
Xml schema
 
XML-Extensible Markup Language
XML-Extensible Markup Language XML-Extensible Markup Language
XML-Extensible Markup Language
 
CTDA Workshop on XML and MODS
CTDA Workshop on XML and MODSCTDA Workshop on XML and MODS
CTDA Workshop on XML and MODS
 
light_xml
light_xmllight_xml
light_xml
 
Unit ii java script and xhtml documents and dynamic documents with javascript
Unit ii java script and xhtml documents and dynamic documents with javascriptUnit ii java script and xhtml documents and dynamic documents with javascript
Unit ii java script and xhtml documents and dynamic documents with javascript
 
Xml
XmlXml
Xml
 
Introduction to XML
Introduction to XMLIntroduction to XML
Introduction to XML
 
CTDA Workshop on XSL
CTDA Workshop on XSLCTDA Workshop on XSL
CTDA Workshop on XSL
 
Extensible Markup Language (XML)
Extensible Markup Language (XML)Extensible Markup Language (XML)
Extensible Markup Language (XML)
 
01 xml document structure
01 xml document structure01 xml document structure
01 xml document structure
 
XML
XMLXML
XML
 

Viewers also liked

Major Parts of a Research Paper
Major Parts of a Research PaperMajor Parts of a Research Paper
Major Parts of a Research Paper
EssayAcademy
 
Structure of research article for journal publication- Dr. THRIJIL KRISHNAN E M
Structure of research article for journal publication- Dr. THRIJIL KRISHNAN E MStructure of research article for journal publication- Dr. THRIJIL KRISHNAN E M
Structure of research article for journal publication- Dr. THRIJIL KRISHNAN E M
DR THRIJIL KRISHNAN E M
 
Prescribed Parts of the Thesis
Prescribed Parts of the ThesisPrescribed Parts of the Thesis
Prescribed Parts of the ThesisJo Bartolata
 
How To Write A Three Part Thesis Statement by Mrs. Scruggs
How To Write A Three Part Thesis Statement by Mrs. ScruggsHow To Write A Three Part Thesis Statement by Mrs. Scruggs
How To Write A Three Part Thesis Statement by Mrs. Scruggs
Wendy Scruggs
 
Applied vs basic research - Research Methodology - Manu Melwin Joy
Applied vs basic research - Research Methodology - Manu Melwin Joy Applied vs basic research - Research Methodology - Manu Melwin Joy
Applied vs basic research - Research Methodology - Manu Melwin Joy
manumelwin
 
Difference between report and article
Difference between report and articleDifference between report and article
Difference between report and article
Junaid-sanwal
 
Types of Articles
Types of ArticlesTypes of Articles
Types of Articlesrobinbowles
 
basic research versus applied research
basic research versus applied researchbasic research versus applied research
basic research versus applied research
Christian Orsolino
 
Basic vs Applied Research
Basic vs Applied ResearchBasic vs Applied Research
Basic vs Applied Research
Anupama Saini
 
Research methodology
Research methodologyResearch methodology
Research methodology
Rolling Plans Pvt. Ltd.
 
5 parts of research paper
5 parts of research paper5 parts of research paper
5 parts of research paperQueene Balaoro
 
Parts of a Research Paper
Parts of a Research PaperParts of a Research Paper
Parts of a Research PaperDraizelle Sexon
 
The thesis and its parts
The thesis and its partsThe thesis and its parts
The thesis and its partsDraizelle Sexon
 
Research Methodology
Research MethodologyResearch Methodology
Research Methodology
sh_neha252
 
Research methodology ppt babasab
Research methodology ppt babasab Research methodology ppt babasab
Research methodology ppt babasab
Babasab Patil
 
LinkedIn SlideShare: Knowledge, Well-Presented
LinkedIn SlideShare: Knowledge, Well-PresentedLinkedIn SlideShare: Knowledge, Well-Presented
LinkedIn SlideShare: Knowledge, Well-Presented
SlideShare
 

Viewers also liked (17)

Major Parts of a Research Paper
Major Parts of a Research PaperMajor Parts of a Research Paper
Major Parts of a Research Paper
 
Structure of research article for journal publication- Dr. THRIJIL KRISHNAN E M
Structure of research article for journal publication- Dr. THRIJIL KRISHNAN E MStructure of research article for journal publication- Dr. THRIJIL KRISHNAN E M
Structure of research article for journal publication- Dr. THRIJIL KRISHNAN E M
 
Prescribed Parts of the Thesis
Prescribed Parts of the ThesisPrescribed Parts of the Thesis
Prescribed Parts of the Thesis
 
How To Write A Three Part Thesis Statement by Mrs. Scruggs
How To Write A Three Part Thesis Statement by Mrs. ScruggsHow To Write A Three Part Thesis Statement by Mrs. Scruggs
How To Write A Three Part Thesis Statement by Mrs. Scruggs
 
Applied vs basic research - Research Methodology - Manu Melwin Joy
Applied vs basic research - Research Methodology - Manu Melwin Joy Applied vs basic research - Research Methodology - Manu Melwin Joy
Applied vs basic research - Research Methodology - Manu Melwin Joy
 
Difference between report and article
Difference between report and articleDifference between report and article
Difference between report and article
 
Types of Articles
Types of ArticlesTypes of Articles
Types of Articles
 
basic research versus applied research
basic research versus applied researchbasic research versus applied research
basic research versus applied research
 
Basic vs Applied Research
Basic vs Applied ResearchBasic vs Applied Research
Basic vs Applied Research
 
Chapters 1 5
Chapters 1 5Chapters 1 5
Chapters 1 5
 
Research methodology
Research methodologyResearch methodology
Research methodology
 
5 parts of research paper
5 parts of research paper5 parts of research paper
5 parts of research paper
 
Parts of a Research Paper
Parts of a Research PaperParts of a Research Paper
Parts of a Research Paper
 
The thesis and its parts
The thesis and its partsThe thesis and its parts
The thesis and its parts
 
Research Methodology
Research MethodologyResearch Methodology
Research Methodology
 
Research methodology ppt babasab
Research methodology ppt babasab Research methodology ppt babasab
Research methodology ppt babasab
 
LinkedIn SlideShare: Knowledge, Well-Presented
LinkedIn SlideShare: Knowledge, Well-PresentedLinkedIn SlideShare: Knowledge, Well-Presented
LinkedIn SlideShare: Knowledge, Well-Presented
 

Similar to XML-talk

Markup For Dummies (Russ Ward)
Markup For Dummies (Russ Ward)Markup For Dummies (Russ Ward)
Markup For Dummies (Russ Ward)
STC-Philadelphia Metro Chapter
 
8023.ppt
8023.ppt8023.ppt
8023.ppt
PoojaTripathi92
 
Why XML is important for everyone, especially technical communicators
Why XML is important for everyone, especially technical communicatorsWhy XML is important for everyone, especially technical communicators
Why XML is important for everyone, especially technical communicators
ECM-Search Consultant - EContent Magazine
 
[DSBW Spring 2010] Unit 10: XML and Web And beyond
[DSBW Spring 2010] Unit 10: XML and Web And beyond[DSBW Spring 2010] Unit 10: XML and Web And beyond
[DSBW Spring 2010] Unit 10: XML and Web And beyond
Carles Farré
 
uptu web technology unit 2 Xml2
uptu web technology unit 2 Xml2uptu web technology unit 2 Xml2
uptu web technology unit 2 Xml2
Abhishek Kesharwani
 
IT6801-Service Oriented Architecture
IT6801-Service Oriented ArchitectureIT6801-Service Oriented Architecture
IT6801-Service Oriented Architecture
Madhu Amarnath
 
XML Introduction
XML IntroductionXML Introduction
XML Introduction
Bikash chhetri
 
XML-Unit 1.ppt
XML-Unit 1.pptXML-Unit 1.ppt
XML-Unit 1.ppt
ssuseree7dcd
 
Internet and Web Technology (CLASS-5) [HTML DOM]
Internet and Web Technology (CLASS-5) [HTML DOM] Internet and Web Technology (CLASS-5) [HTML DOM]
Internet and Web Technology (CLASS-5) [HTML DOM]
Ayes Chinmay
 
UNIT-1 Web services
UNIT-1 Web servicesUNIT-1 Web services
UNIT-1 Web services
madhusrinivasan9
 
IT6801-Service Oriented Architecture- UNIT-I notes
IT6801-Service Oriented Architecture- UNIT-I notesIT6801-Service Oriented Architecture- UNIT-I notes
IT6801-Service Oriented Architecture- UNIT-I notes
Ramco Institute of Technology, Rajapalayam, Tamilnadu, India
 
Xml
XmlXml
advDBMS_XML.pptx
advDBMS_XML.pptxadvDBMS_XML.pptx
advDBMS_XML.pptx
IreneGetzi
 
DATA INTEGRATION (Gaining Access to Diverse Data).ppt
DATA INTEGRATION (Gaining Access to Diverse Data).pptDATA INTEGRATION (Gaining Access to Diverse Data).ppt
DATA INTEGRATION (Gaining Access to Diverse Data).ppt
careerPointBasti
 
Xml 215-presentation
Xml 215-presentationXml 215-presentation
Xml 215-presentation
Manish Chaurasia
 

Similar to XML-talk (20)

Markup For Dummies (Russ Ward)
Markup For Dummies (Russ Ward)Markup For Dummies (Russ Ward)
Markup For Dummies (Russ Ward)
 
8023.ppt
8023.ppt8023.ppt
8023.ppt
 
Why XML is important for everyone, especially technical communicators
Why XML is important for everyone, especially technical communicatorsWhy XML is important for everyone, especially technical communicators
Why XML is important for everyone, especially technical communicators
 
[DSBW Spring 2010] Unit 10: XML and Web And beyond
[DSBW Spring 2010] Unit 10: XML and Web And beyond[DSBW Spring 2010] Unit 10: XML and Web And beyond
[DSBW Spring 2010] Unit 10: XML and Web And beyond
 
Xml
XmlXml
Xml
 
uptu web technology unit 2 Xml2
uptu web technology unit 2 Xml2uptu web technology unit 2 Xml2
uptu web technology unit 2 Xml2
 
IT6801-Service Oriented Architecture
IT6801-Service Oriented ArchitectureIT6801-Service Oriented Architecture
IT6801-Service Oriented Architecture
 
XML Introduction
XML IntroductionXML Introduction
XML Introduction
 
XML-Unit 1.ppt
XML-Unit 1.pptXML-Unit 1.ppt
XML-Unit 1.ppt
 
Internet and Web Technology (CLASS-5) [HTML DOM]
Internet and Web Technology (CLASS-5) [HTML DOM] Internet and Web Technology (CLASS-5) [HTML DOM]
Internet and Web Technology (CLASS-5) [HTML DOM]
 
UNIT-1 Web services
UNIT-1 Web servicesUNIT-1 Web services
UNIT-1 Web services
 
IT6801-Service Oriented Architecture- UNIT-I notes
IT6801-Service Oriented Architecture- UNIT-I notesIT6801-Service Oriented Architecture- UNIT-I notes
IT6801-Service Oriented Architecture- UNIT-I notes
 
Xml3
Xml3Xml3
Xml3
 
Xml
XmlXml
Xml
 
Unit 2.3
Unit 2.3Unit 2.3
Unit 2.3
 
advDBMS_XML.pptx
advDBMS_XML.pptxadvDBMS_XML.pptx
advDBMS_XML.pptx
 
DATA INTEGRATION (Gaining Access to Diverse Data).ppt
DATA INTEGRATION (Gaining Access to Diverse Data).pptDATA INTEGRATION (Gaining Access to Diverse Data).ppt
DATA INTEGRATION (Gaining Access to Diverse Data).ppt
 
Xml
XmlXml
Xml
 
Unit 2.3
Unit 2.3Unit 2.3
Unit 2.3
 
Xml 215-presentation
Xml 215-presentationXml 215-presentation
Xml 215-presentation
 

More from aschwarzman

2012-08-14-OSA-Pubs-IT_Presentation
2012-08-14-OSA-Pubs-IT_Presentation2012-08-14-OSA-Pubs-IT_Presentation
2012-08-14-OSA-Pubs-IT_Presentationaschwarzman
 
2012-05-20-CSE-2012_Schwarzman
2012-05-20-CSE-2012_Schwarzman2012-05-20-CSE-2012_Schwarzman
2012-05-20-CSE-2012_Schwarzmanaschwarzman
 
2012-03-20-AGU-Librarians_Presentation
2012-03-20-AGU-Librarians_Presentation2012-03-20-AGU-Librarians_Presentation
2012-03-20-AGU-Librarians_Presentationaschwarzman
 
2011-11-14-CrossRef-Workshops_Schwarzman
2011-11-14-CrossRef-Workshops_Schwarzman2011-11-14-CrossRef-Workshops_Schwarzman
2011-11-14-CrossRef-Workshops_Schwarzmanaschwarzman
 
2011-09-27-JATS-Con-Presentation_Schwarzman
2011-09-27-JATS-Con-Presentation_Schwarzman2011-09-27-JATS-Con-Presentation_Schwarzman
2011-09-27-JATS-Con-Presentation_Schwarzmanaschwarzman
 
2011-Balisage-Poster-Schwarzman
2011-Balisage-Poster-Schwarzman2011-Balisage-Poster-Schwarzman
2011-Balisage-Poster-Schwarzmanaschwarzman
 
Schwarzman-CSE2011
Schwarzman-CSE2011Schwarzman-CSE2011
Schwarzman-CSE2011aschwarzman
 
Schwarzman-JATS-Con-slides
Schwarzman-JATS-Con-slidesSchwarzman-JATS-Con-slides
Schwarzman-JATS-Con-slidesaschwarzman
 
Extreme-ML-2006-Poster-A-Schwarzman
Extreme-ML-2006-Poster-A-SchwarzmanExtreme-ML-2006-Poster-A-Schwarzman
Extreme-ML-2006-Poster-A-Schwarzmanaschwarzman
 
XML2004-schwarzman
XML2004-schwarzmanXML2004-schwarzman
XML2004-schwarzmanaschwarzman
 
JATS-Con-Schwarzman-slides_corr-2016-04-29
JATS-Con-Schwarzman-slides_corr-2016-04-29JATS-Con-Schwarzman-slides_corr-2016-04-29
JATS-Con-Schwarzman-slides_corr-2016-04-29aschwarzman
 
Balisage_2011-08-03_Schwarzman
Balisage_2011-08-03_SchwarzmanBalisage_2011-08-03_Schwarzman
Balisage_2011-08-03_Schwarzmanaschwarzman
 
Balisage-2015-funding-poster
Balisage-2015-funding-posterBalisage-2015-funding-poster
Balisage-2015-funding-posteraschwarzman
 
Balisage-2015-sup-mat-poster
Balisage-2015-sup-mat-posterBalisage-2015-sup-mat-poster
Balisage-2015-sup-mat-posteraschwarzman
 
Using Schematron for appropriate layer validation: A case study
Using Schematron for appropriate layer validation: A case studyUsing Schematron for appropriate layer validation: A case study
Using Schematron for appropriate layer validation: A case study
aschwarzman
 
NISO-NFAIS Supplemental Journal Article Materials Working Group: An Update o...
NISO-NFAIS Supplemental Journal Article Materials Working Group: An Update o...NISO-NFAIS Supplemental Journal Article Materials Working Group: An Update o...
NISO-NFAIS Supplemental Journal Article Materials Working Group: An Update o...
aschwarzman
 
NISO-NFAIS Supplemental Journal Article Materials Working Group
NISO-NFAIS Supplemental Journal Article Materials Working GroupNISO-NFAIS Supplemental Journal Article Materials Working Group
NISO-NFAIS Supplemental Journal Article Materials Working Group
aschwarzman
 

More from aschwarzman (19)

dineen2013
dineen2013dineen2013
dineen2013
 
2012-08-14-OSA-Pubs-IT_Presentation
2012-08-14-OSA-Pubs-IT_Presentation2012-08-14-OSA-Pubs-IT_Presentation
2012-08-14-OSA-Pubs-IT_Presentation
 
2012-05-20-CSE-2012_Schwarzman
2012-05-20-CSE-2012_Schwarzman2012-05-20-CSE-2012_Schwarzman
2012-05-20-CSE-2012_Schwarzman
 
2012-03-20-AGU-Librarians_Presentation
2012-03-20-AGU-Librarians_Presentation2012-03-20-AGU-Librarians_Presentation
2012-03-20-AGU-Librarians_Presentation
 
2011-11-14-CrossRef-Workshops_Schwarzman
2011-11-14-CrossRef-Workshops_Schwarzman2011-11-14-CrossRef-Workshops_Schwarzman
2011-11-14-CrossRef-Workshops_Schwarzman
 
2011-09-27-JATS-Con-Presentation_Schwarzman
2011-09-27-JATS-Con-Presentation_Schwarzman2011-09-27-JATS-Con-Presentation_Schwarzman
2011-09-27-JATS-Con-Presentation_Schwarzman
 
2011-Balisage-Poster-Schwarzman
2011-Balisage-Poster-Schwarzman2011-Balisage-Poster-Schwarzman
2011-Balisage-Poster-Schwarzman
 
Schwarzman-CSE2011
Schwarzman-CSE2011Schwarzman-CSE2011
Schwarzman-CSE2011
 
Schwarzman-JATS-Con-slides
Schwarzman-JATS-Con-slidesSchwarzman-JATS-Con-slides
Schwarzman-JATS-Con-slides
 
Extreme-ML-2006-Poster-A-Schwarzman
Extreme-ML-2006-Poster-A-SchwarzmanExtreme-ML-2006-Poster-A-Schwarzman
Extreme-ML-2006-Poster-A-Schwarzman
 
XML2004
XML2004XML2004
XML2004
 
XML2004-schwarzman
XML2004-schwarzmanXML2004-schwarzman
XML2004-schwarzman
 
JATS-Con-Schwarzman-slides_corr-2016-04-29
JATS-Con-Schwarzman-slides_corr-2016-04-29JATS-Con-Schwarzman-slides_corr-2016-04-29
JATS-Con-Schwarzman-slides_corr-2016-04-29
 
Balisage_2011-08-03_Schwarzman
Balisage_2011-08-03_SchwarzmanBalisage_2011-08-03_Schwarzman
Balisage_2011-08-03_Schwarzman
 
Balisage-2015-funding-poster
Balisage-2015-funding-posterBalisage-2015-funding-poster
Balisage-2015-funding-poster
 
Balisage-2015-sup-mat-poster
Balisage-2015-sup-mat-posterBalisage-2015-sup-mat-poster
Balisage-2015-sup-mat-poster
 
Using Schematron for appropriate layer validation: A case study
Using Schematron for appropriate layer validation: A case studyUsing Schematron for appropriate layer validation: A case study
Using Schematron for appropriate layer validation: A case study
 
NISO-NFAIS Supplemental Journal Article Materials Working Group: An Update o...
NISO-NFAIS Supplemental Journal Article Materials Working Group: An Update o...NISO-NFAIS Supplemental Journal Article Materials Working Group: An Update o...
NISO-NFAIS Supplemental Journal Article Materials Working Group: An Update o...
 
NISO-NFAIS Supplemental Journal Article Materials Working Group
NISO-NFAIS Supplemental Journal Article Materials Working GroupNISO-NFAIS Supplemental Journal Article Materials Working Group
NISO-NFAIS Supplemental Journal Article Materials Working Group
 

XML-talk

  • 1. Introduction to XML Sasha Schwarzman, 202-777-7518 sschwarzman@agu.org AGU 14 February 2003
  • 2. Page 2 of 16 1. XML by example 1.1. Credit card statement (paper) Cardmember Statement ACCOUNT NUMBER: 4444888822221111 AVAILABLE CREDIT: 5,000 CLOSING DATE: 11/25/02 PAYMENT DUE DATE: 12/15/02 CARDMEMBER STATEMENT SUMMARY TRANS DATE POST DATE REFERENCE NUMBER DESCRIPTION OF TRANSACTION CREDITS CHARGES 1023 1025 2416QZP Townhouse Store #3306 DC 10.65 1027 1027 2422KQ12 Wazuri DC 55.00 1103 1103 7422120F Payment – Thank you 1,000.00
  • 3. Page 3 of 16 1.2. Credit card statement (XML)
  • 4. Page 4 of 16 1.3. XML building blocks XML deal with documents  A document is a basic unit of XML information, composed of elements and other markup in an orderly package <Description> Payment -- Thank you </Description> Start tag Character data End tag Markup Element Markup  An element is an identifiable, named component of a document  can have content (but doesn’t have to): data, other elements  can be a pointer to information (cross-reference, link)  must have one start and one end tag  elements can nest but cannot overlap  An attribute provides additional information about an element <Transaction Category=”Groceries”>  found inside start tag
  • 5. Page 5 of 16  may be required or implied  an element may have multiple attributes 1.4. Credit card statement DTD • What DTD can (structure, sequence, in-document linking, selected occurrence indicators) and cannot provide for (datatyping, flexible occurrence indicators) 1.5. Document types and their instances • Invoice • Sales catalog • Dictionary • Journal article
  • 6. Page 6 of 16 1.6. Validating parser 1.6.1. What parser does • Is document well-formed? (for stand-alone docs) • Does a DTD conform to XML specs? • Does a document instance conform to the DTD? 1.6.2. What parser does not do • Check semantics (“gobbledygook” might be meaningless but valid as far as a validating parser is concerned) • Check what a DTD cannot enforce (datatyping, flexible occurrence indicators) 1.7. Credit card statement in XML environment
  • 7. Page 7 of 16 2. Components of an XML system • Document instance • DTD/Schema • Validating parser • Processing system 2.1.1. Document  Two kinds  Well-formed  Valid (has a model)  Usually created  Manually – using XML Editor (Epic, XMetaL)  Programmatically from a database, another XML document, or by conversion from another format (LaTeX, MSWord) 2.1.2. DTD  The modeling mechanism specified by the XML standard  models one type of information  is a set of rules describing how documents of that type can be marked up 2.2. Processing system XML DOES NOT DO ANYTHING! Your software CAN!  Start/stop behavior  Run a script, load a database, create a “form letter” and fill-in contents  Link  Format (start bold, end bold)  Process  Extract selected elements (e.g., metadata)  Rearrange/resequence content  Rename, add content
  • 8. Page 8 of 16  Count how many 3. XML origins 3.1. What is markup?  Information added to a document that enhances its meaning in certain ways, in that it identifies the parts and how they relate to each other. 3.2. Pre-electronic (traditional) markup Set this header in 12-point Helvetica Medium italic on a 14-point text body, justified on a 22-Pica slug with indents of 1 en on left and none on the right. 3.3. Markup language  A set of symbols that can be placed in the text of a document to demarcate and label the parts of that document 3.4. Specific markup languages  Tells formatter what action to take: "carriage return", "center the following lines", "go to the next page", etc. 3.4.1. RTF, Script, etc. Script example .sp (skip one line) .bf roman 12 (change font size) .bd .ce Chapter 1. Introduction (center "Chapter 1. Introduction" and print it in bold) 3.4.2. WYSIWYG Word Processors, DTP, and professional typesetting systems • WordPerfect, MSWord, WordStar, MacWrite • Quark, Ventura • XYVision, Penta, Miles 33 Proprietary, not interchangeable, structure and presentation inextricably intertwined. Retrieval, cross-referencing difficult.
3.5. Generic markup languages
• Use descriptive tags rather than formatting codes. Indicate the logical structure of documents. Separate formatting from structure/content.
3.5.1. Macro-based languages
• LaTeX for TeX
• Syspub for Waterloo Script
• ms for nroff
LaTeX example: \to{Mr. Smith} stands for 3 commands
\noindent
\settabs 6 \columns
\+TO:&Mr. Smith\cr
3.6. SGML
• 1960s. GCA's "GenCode" (Graphic Communications Association)
• 1969. IBM's GML, Generalized Markup Language (Charles Goldfarb, Edward Mosher, and Raymond Lorie)
• 1978. An ANSI working group, headed by Charles Goldfarb, is formed to develop a standard text-description language for text interchange, based on GML
• 1983. SGML developed. DoD and IRS adopt SGML. DoD develops CALS (Computer-Aided Acquisition and Logistic Support) as an SGML application (CALS tables are still in use). AAP develops DTDs for books and journals. SGML spreads in Europe and North America
• 1986. ISO ratifies SGML as a standard (ISO 8879:1986)
3.7. HTML
• Early 1990s. Tim Berners-Lee and Anders Berglund of the European particle physics lab CERN develop the HyperText Markup Language (Berglund had designed a publishing system to test SGML in the 1980s)
• HTML is an application of SGML for hypertext documents
• Both a step forward (the Web, wide adoption, public interest in markup) and a step back (generic coding principles compromised: one (!) document type used for all purposes, many tags purely presentational)
HTML example
• HTML tags specify formatting
4. XML
1998. A W3C working group under Jon Bosak produces a simplified version of SGML: 80% of SGML's power with 20% of its complexity
4.1. What XML can do
XML can be used to tag…
• Content (what type of information is this?)
  - City, state, zip
  - Part number
  - Debit, credit, payment
  - Question, answer
  - Genus, species
  - Indications, contraindications
• Structure (what part of the document is this?)
  - Paragraph, subsection, section, chapter, list
  - Table, figure, formula, video
  - Author block, signature block, address block
• Pointers (location, navigation, linking, and other relationships)
  - Hypertext links
  - Cross-references
  - Indexing terms
• Metadata (information about data)
  - Bibliographic/cataloging information (author, title, publication date)
  - Index terms and keywords (search terms)
  - Revision, version, edition
  - Status, tracking information
  - Data sources
  - Editor's and reviewer's comments
  - Abstracts, highlights, “teasers”, “blurbs”
• Rendering/processing (if you MUST): how text should behave, display, or print. This is normally handled through a stylesheet, but…
  - position of a graphic on the page (floating, centered)
  - line breaks in titles
  - tables
  - author's whimsy (“I want this word bold just because”)
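To make the tagging categories concrete, here is a minimal Python sketch (Python is an arbitrary choice; any language with an XML parser would do) of what "your software CAN" do once content, structure, and metadata are tagged: extract selected elements and count how many. The element and attribute names (`Statement`, `Metadata`, `ClosingDate`, `Transaction`, `Type`, `Amount`) are hypothetical, loosely modeled on the credit card statement in section 1, not taken from its actual DTD.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical statement markup (names are illustrative only).
STATEMENT = """\
<Statement>
  <Metadata>
    <AccountNumber>4444888822221111</AccountNumber>
    <ClosingDate>11/25/02</ClosingDate>
  </Metadata>
  <Transaction Type="Charge" Category="Groceries">
    <Description>Townhouse Store #3306 DC</Description>
    <Amount>10.65</Amount>
  </Transaction>
  <Transaction Type="Charge">
    <Description>Wazuri DC</Description>
    <Amount>55.00</Amount>
  </Transaction>
  <Transaction Type="Payment">
    <Description>Payment -- Thank you</Description>
    <Amount>1000.00</Amount>
  </Transaction>
</Statement>
"""

# Parsing fails with ET.ParseError if the document is not well-formed.
root = ET.fromstring(STATEMENT)

# Extract selected elements: read one field from the metadata block.
closing_date = root.findtext("Metadata/ClosingDate")

# Content tags let software select by meaning, not by position or font.
charges = root.findall("Transaction[@Type='Charge']")
total_charges = sum(Decimal(t.findtext("Amount")) for t in charges)

# Count how many transactions of each type the document contains.
counts = {}
for t in root.iter("Transaction"):
    kind = t.get("Type")
    counts[kind] = counts.get(kind, 0) + 1

print(closing_date)               # 11/25/02
print(len(charges), total_charges)  # 2 65.65
print(counts)                     # {'Charge': 2, 'Payment': 1}
```

Note that nothing in the markup says "add these up"; the meaning comes from the tag names, and the behavior comes entirely from the program.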
4.2. XML is…
• A subset of SGML. A meta-language that describes the concepts and rules for building domain-specific markup languages
• A family of technologies/standards, most of them W3C Recommendations (SAX is a de facto standard developed outside the W3C): XSLT, XSL, XPointer, XPath, XQuery, XLink, DOM, SAX, etc.
• XML can be used:
  - for document modeling
  - for data interchange
4.3. XML applications (domain-specific markup languages)
Device/media-oriented:
• XHTML – Web
• WML – Wireless Markup Language
• VoxML – spoken-word markup language
Discipline-oriented:
• MathML – Mathematical Markup Language
• CML – Chemical Markup Language
Industry-oriented:
• Airlines/aircraft
• Semiconductors
Process-oriented:
• SVG – Scalable Vector Graphics
4.4. XML is not…
• a programming language. It does not replace C++, Java, Perl, etc.
• a user interface
• a presentation format
• a text formatting or processing system
• a standard set of document types
• a standard or recommended set of tags
• Unicode
• a database
• user-unfriendly
5. XML in a publishing environment
5.1. Uncontrolled inputs, controlled outputs
[Diagram. Inputs: WordPerfect, MS Word, LaTeX, HTML, PostScript. Processing: XML converter, XML document, composition engine, XSLT stylesheet, XML DB. Outputs: low-res PDF, high-res PDF, HTML, XML article, TOCs, indices, search interfaces, A&I services, CrossRef, MDDB, handheld computer, cell phone, telephone.]
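Section 5.1 describes one controlled XML source feeding many outputs (PDF, HTML via an XSLT stylesheet, TOCs, indices). In practice that transformation would usually be an XSLT stylesheet; Python's standard library has no XSLT processor, so the sketch below is only a stand-in that walks one hypothetical article with ElementTree and renders two outputs from the same source. The element names `Article`, `Title`, `Section`, `Heading`, and `Para` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical article markup: structure and content only, no formatting.
ARTICLE = """\
<Article>
  <Title>Introduction to XML</Title>
  <Section>
    <Heading>XML by example</Heading>
    <Para>A document is a basic unit of XML information.</Para>
  </Section>
  <Section>
    <Heading>XML origins</Heading>
    <Para>Markup identifies the parts of a document.</Para>
  </Section>
</Article>
"""

def to_html(article):
    """Output 1: an HTML fragment (presentation decided here, not in the XML)."""
    parts = [f"<h1>{escape(article.findtext('Title'))}</h1>"]
    for sec in article.findall("Section"):
        parts.append(f"<h2>{escape(sec.findtext('Heading'))}</h2>")
        for para in sec.findall("Para"):
            parts.append(f"<p>{escape(para.text or '')}</p>")
    return "\n".join(parts)

def to_toc(article):
    """Output 2: a numbered plain-text table of contents from the same source."""
    headings = [sec.findtext("Heading") for sec in article.findall("Section")]
    return "\n".join(f"{n}. {h}" for n, h in enumerate(headings, start=1))

article = ET.fromstring(ARTICLE)
print(to_html(article))
print(to_toc(article))
```

Adding a third output means writing one more renderer; the article itself is never re-keyed.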
5.2. Integrated environment with controlled inputs and outputs
Example: technical manual (aircraft, automobile, etc.)
[Diagram: conceptual configuration of a database-centered XML-aware system (adapted from The SGML Implementation Guide by B. Travis and D. Waldt). Master database: text objects, graphics, works in progress. Functions: authoring, editing, reviewing, copy-editing, converting, imaging, composing, publishing, abstracting and indexing, searching, archiving, revising, tracking, referencing and linking, translating, assigning.]
6. XML advantages
• Encode (mark up) data only once; create a single information repository
• Separates content/structure from presentation/formatting
• Software/hardware independent
• Interoperability: a common language in which a community can agree on data content; machine-to-machine communication
• Portability
• Preservation
• Non-proprietary/open industry standard
• Reuse/re-purposing (many outputs)
• Enables semantically complex searching and retrieval
• Cuts down on the number of required converters (saves software development costs)
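The last advantage can be put in numbers under a simplifying assumption (ours, not the slides'): without a common format, every input/output pair needs its own converter, so m inputs and n outputs need m × n converters; with XML in the middle, each input needs one converter into XML and each output one converter out of it, m + n in total. The function name `converters_needed` is hypothetical.

```python
def converters_needed(n_inputs: int, n_outputs: int) -> tuple[int, int]:
    """Compare converter counts without and with an XML hub.

    Assumption for illustration: one dedicated converter per
    input/output pair without XML; one converter into XML per input
    format plus one out of XML per output format with it.
    """
    direct = n_inputs * n_outputs
    via_xml = n_inputs + n_outputs
    return direct, via_xml

# Five input formats listed in 5.1 (WordPerfect, MS Word, LaTeX, HTML,
# PostScript) and four of its outputs (low-res PDF, high-res PDF, HTML,
# XML article), chosen here only as an example.
print(converters_needed(5, 4))  # (20, 9)
```

The saving grows with the number of formats; with a single input and a single output, the hub adds a step instead of saving one.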