Introduction to XPath

Kristian Torp
Department of Computer Science
Aalborg University
people.cs.aau.dk/~torp
torp@cs.aau.dk
November 3, 2015
daisy.aau.dk
Kristian Torp (Aalborg University) Introduction to XPath November 3, 2015

Outline
1 Introduction
2 Tree Terminology
3 Location Path and Steps
4 XPath Path Expressions
5 Axes
6 Summary

Learning Goals and Focus
Learning Goals
Understand the XPath data model
Know the basic tree terminology
Become good at querying XML documents using XPath
Know the abbreviations used in XPath
Very handy to know in practice
Compact and quite readable!
Database Focus
All XML technologies are presented from a database perspective, also
called a data focus (i.e., not a document focus)!

Introduction
Example
Find all courses: /coursecatalog/course
Find the semesters: //semester/text()
Overview
A language for
finding/addressing information in XML documents
navigating through elements and attributes in an XML document
Used in many XML technologies, e.g., XQuery and XPointer
A part of the XSLT recommendation
Microsoft/Visual Studio makes heavy use of XSLT
The data model is an abstract and logical structure of an XML
document
Called a node tree

The Node Tree
Terminology
Document node: The entire XML document
Also called the document root or the root node
Element node: An XML element
A special one is the document element or root element
Text node: The text strings in an element node
Attribute node: An attribute
Example (A Node Tree)
/
  coursecatalog
    course
      id=4
      name
        OOP
      semester
        3
      desc
        snip
    course
      id=2
      name
        DB
      semester
        7
      desc
        snip

Example: Find the Courses
Example (Document)
/
  coursecatalog
    course
      id=4
      name
        OOP
      semester
        3
      desc
        snip
    course
      id=2
      name
        DB
      semester
        7
      desc
        snip
Query
/coursecatalog/course
Result
course
  id=4
  name
    OOP
  semester
    3
  desc
    snip
course
  id=2
  name
    DB
  semester
    7
  desc
    snip

Example: Find the Semesters
Example (Document)
/
  coursecatalog
    course
      id=4
      name
        OOP
      semester
        3
      desc
        snip
    course
      id=2
      name
        DB
      semester
        7
      desc
        snip
Query
//semester/text()
Result
3 7
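The two introductory queries can be tried from Python. This is a minimal sketch, assuming the XML serialization below (the slides only show the tree). Python's standard xml.etree.ElementTree supports only a subset of XPath: paths are relative to an element, and there is no text() step, so the text is read from the .text attribute instead.

```python
import xml.etree.ElementTree as ET

# Assumed serialization of the course catalog (not given in the slides).
XML = (
    "<coursecatalog>"
    "<course id='4'><name>OOP</name><semester>3</semester><desc>snip</desc></course>"
    "<course id='2'><name>DB</name><semester>7</semester><desc>snip</desc></course>"
    "</coursecatalog>"
)

root = ET.fromstring(XML)  # root is the coursecatalog document element

# /coursecatalog/course, written relative to the document element
courses = root.findall("course")

# //semester/text(), emulated by reading the text of each semester element
semesters = [semester.text for semester in root.findall(".//semester")]

print([course.get("id") for course in courses])
print(semesters)
```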

Major Components
Components
Nodes
XML document treated as a tree of nodes
Examples: Elements, attributes, and comments
Path expressions
Select a set of nodes in an XML document
Examples: /, /coursecatalog/course
Standard functions
Approximately 100 built-in functions
Examples: concat('a', 'b'), round(1.5)

Quiz
Example (Document)
/
  coursecatalog
    course
      id=4
      name
        OOP
      semester
        3
      desc
        snip
    course
      id=2
      name
        DB
      semester
        7
      desc
        snip
Questions
Who is the parent of the document element?
How many document elements are there in an XML document?
How many elements can there be in an XML document?
Are elements and attributes the same node type?

Tree Terminology
[Figure: example tree with nodes 1 to 9, A, and B]
Tree
Like any other tree in CS

Tree Terminology
[Figure: example tree with nodes 1 to 9, A, and B]
Root
Quiz
Are there other roots?

Tree Terminology
[Figure: example tree with nodes 1 to 9, A, and B]
Leaves
Quiz
Are there more leaves?

Tree Terminology
[Figure: example tree with nodes 1 to 9, A, and B]
Children of 1
Quiz
Who are the children of 3?

Tree Terminology
[Figure: example tree with nodes 1 to 9, A, and B]
Siblings of 9
Quiz
Who are the siblings of 3?

Tree Terminology
[Figure: example tree with nodes 1 to 9, A, and B]
Ancestors of 6
Quiz
Who are the ancestors of 9?

Tree Terminology
[Figure: example tree with nodes 1 to 9, A, and B]
Parent of 8
Quiz
Who are the parents of 4?

Tree Terminology
[Figure: example tree with nodes 1 to 9, A, and B]
Descendants of 1
Quiz
Who are the descendants of 5?
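The tree terms used in these slides (root, leaves, siblings, ancestors, descendants) can be written down as small functions. The tree below is hypothetical and is not the tree from the slides, whose edges are not reproduced here. All names in the sketch are illustrative.

```python
# Hypothetical tree, stored as parent -> list of children (left to right).
CHILDREN = {
    "a": ["b", "c"],
    "b": ["d", "e"],
    "c": [],
    "d": [],
    "e": ["f"],
    "f": [],
}
# Inverse mapping: child -> parent. The root has no entry.
PARENT = {child: parent for parent, kids in CHILDREN.items() for child in kids}

def root():
    """The single node without a parent."""
    return next(node for node in CHILDREN if node not in PARENT)

def leaves():
    """Nodes without children."""
    return [node for node, kids in CHILDREN.items() if not kids]

def siblings(node):
    """Other children of the same parent."""
    parent = PARENT.get(node)
    return [] if parent is None else [s for s in CHILDREN[parent] if s != node]

def ancestors(node):
    """Parent, grandparent, and so on up to the root (nearest first)."""
    result = []
    while node in PARENT:
        node = PARENT[node]
        result.append(node)
    return result

def descendants(node):
    """Children, grandchildren, and so on (depth-first order)."""
    result = []
    for child in CHILDREN[node]:
        result.append(child)
        result.extend(descendants(child))
    return result

print(root(), leaves(), siblings("d"), ancestors("f"), descendants("b"))
```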

Quiz
Example (Another Node Tree)
[Figure: node tree with nodes 1 to 9 and A to J]
Questions
Parent of E?
Children of 2?
Descendants of 2?

Location Path and Location Step I
Definition (Location Path)
A location path evaluates to a sequence of nodes
Example (Location Path)
/child::coursecatalog/child::course[name='OOP' or name='DB'][@id<10]
Definition (Location Step)
A location path consists of a number of location steps separated by /.
Example (Location Steps)
child::coursecatalog
child::course[name='OOP' or name='DB'][@id<10]

Location Path and Location Step II
Definition
A location step consists of an axis, a node test, and zero or more predicates
Example (One)
child::coursecatalog
Axis: child
Node test: coursecatalog
Predicates: none
Example (Two)
child::course[name='OOP' or name='DB'][@id<10]
Axis: child
Node test: course
Predicates: [name='OOP' or name='DB'][@id<10]
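The decomposition of a location step into axis, node test, and predicates can be sketched with a regular expression. This is an illustration only, not a full XPath parser: parse_step is a hypothetical helper, and it assumes predicates do not contain nested brackets.

```python
import re

# axis:: (optional), then the node test, then zero or more [predicate] blocks.
STEP = re.compile(
    r"^(?:(?P<axis>[a-z-]+)::)?"
    r"(?P<test>[^\[\]]+)"
    r"(?P<preds>(?:\[[^\[\]]*\])*)$"
)

def parse_step(step):
    """Split one location step into (axis, node test, list of predicates)."""
    match = STEP.match(step)
    if match is None:
        raise ValueError("not a simple location step: " + step)
    axis = match.group("axis") or "child"  # child is the default axis
    predicates = re.findall(r"\[([^\[\]]*)\]", match.group("preds"))
    return axis, match.group("test"), predicates

print(parse_step("child::coursecatalog"))
print(parse_step("child::course[name='OOP' or name='DB'][@id<10]"))
```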

Abbreviations
Most Used
Abbreviation      Meaning
.                 self::node()
..                parent::node()
//coursecatalog   /descendant-or-self::node()/child::coursecatalog
course            child::course
Example (Abbreviations in Action)
Abbreviation           Meaning
//name                 /descendant-or-self::node()/child::name
//name/..              /descendant-or-self::node()/child::name/parent::node()
/coursecatalog/course  /child::coursecatalog/child::course
Note
Abbreviations make expressions more readable
Sometimes abbreviations can make it hard to guess the result
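To see what an abbreviated path stands for, the abbreviations can be expanded mechanically. The sketch below follows the XPath 1.0 specification, where // is short for /descendant-or-self::node()/. The function names are hypothetical, and the sketch assumes simple paths whose predicates contain no slash.

```python
def expand_step(step):
    """Rewrite one abbreviated step in unabbreviated form."""
    if step == ".":
        return "self::node()"
    if step == "..":
        return "parent::node()"
    if "::" in step:
        return step  # already has an explicit axis
    if step.startswith("@"):
        return "attribute::" + step[1:]
    return "child::" + step  # child is the default axis

def expand(path):
    """Expand //, ., .., @name, and the implicit child axis in a simple path."""
    path = path.replace("//", "/descendant-or-self::node()/")
    # Empty parts come from a leading slash and are kept as they are.
    return "/".join(expand_step(part) if part else part for part in path.split("/"))

for abbreviated in ["//name/..", "/coursecatalog/course", "course/@id"]:
    print(abbreviated, "=", expand(abbreviated))
```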

Evaluation of Location Path I
Example XML Document
/
  coursecatalog
    course
      id=4
      name
        OOP
      semester
        3
      desc
        snip
    course
      id=2
      name
        DB
      semester
        7
      desc
        snip
Evaluate the Location Path
/child::coursecatalog/child::course[name='OOP' or name='DB'][@id<10]/name

Evaluation of Location Path II
The Steps in the Evaluation
1 The path starts with /, so the context node is set to the root node
2 Evaluate the location step child::coursecatalog
3 The result is the coursecatalog root element node
4 Set the context to the root element node
5 Evaluate the location step child::course[name='OOP' or name='DB'][@id<10]
6 The result is the two course element nodes
7 Set the context to the OOP course element node
8 Evaluate the location step child::name
9 The result is the name element node, which is the first part of the result
10 Set the context to the DB course element node
11 Evaluate the location step child::name
12 The result is the name element node, which is the last part of the result
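The evaluation steps above can be imitated by hand in Python. This is a sketch under assumptions, not how an XPath engine is implemented: the XML serialization is assumed, the document node is modelled by a made-up wrapper element, and the two predicates are written as an ordinary Python function.

```python
import xml.etree.ElementTree as ET

# Assumed serialization of the course catalog (not given in the slides).
XML = (
    "<coursecatalog>"
    "<course id='4'><name>OOP</name><semester>3</semester><desc>snip</desc></course>"
    "<course id='2'><name>DB</name><semester>7</semester><desc>snip</desc></course>"
    "</coursecatalog>"
)

# Stand-in for the document node: a wrapper whose only child is the root element.
document = ET.Element("#document")
document.append(ET.fromstring(XML))

def child_step(context_nodes, name, predicate=lambda node: True):
    """Apply child::name[predicate] to each context node in turn."""
    result = []
    for context in context_nodes:
        for child in context:
            if child.tag == name and predicate(child):
                result.append(child)
    return result

def course_predicates(course):
    """[name='OOP' or name='DB'][@id<10], written as plain Python."""
    return course.findtext("name") in ("OOP", "DB") and float(course.get("id")) < 10

context = [document]                                        # step 1
context = child_step(context, "coursecatalog")              # steps 2 to 4
context = child_step(context, "course", course_predicates)  # steps 5 and 6
names = child_step(context, "name")                         # steps 7 to 12
print([name.text for name in names])
```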

Context
Definition (Context)
A context node (a node in the node tree)
A context size and context position
A set of variable bindings
A function library
A set of namespace declarations
Definition (Context Size)
The context size is the length of the sequence of nodes returned by the
previous location step
Definition (Context Position)
The context position is the position of the current node in the sequence
being evaluated
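Context position and context size can be illustrated with plain Python. In this sketch (assumed XML serialization, illustrative variable names) the sequence of course elements plays the role of the result of the previous location step. Position counts from 1, and size is what last() would return.

```python
import xml.etree.ElementTree as ET

# Assumed serialization of the course catalog (not given in the slides).
XML = (
    "<coursecatalog>"
    "<course id='4'><name>OOP</name><semester>3</semester><desc>snip</desc></course>"
    "<course id='2'><name>DB</name><semester>7</semester><desc>snip</desc></course>"
    "</coursecatalog>"
)

root = ET.fromstring(XML)
courses = root.findall("course")  # sequence produced by the previous step
size = len(courses)               # context size

# One row per context node: (context position, context size, course name)
rows = [
    (position, size, course.findtext("name"))
    for position, course in enumerate(courses, start=1)
]

# A predicate such as [position() = last()] keeps the node whose position equals the size.
last_names = [name for position, total, name in rows if position == total]
print(rows, last_names)
```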

Compact Notation for Node Tree
Example (The Node Tree)
/
  coursecatalog
    course
      id=4
      name
        OOP
      semester
        3
      desc
        snip
    course
      id=2
      name
        DB
      semester
        7
      desc
        snip
Example (The Equivalent Compact Node Tree)
/coursecatalog
  course
    id=4 name:OOP sem:3 dsc
  course
    id=2 name:DB sem:7 dsc

Example: Find the Courses
Example (Node Tree)
/coursecatalog
course
course
Query
/coursecatalog/course
Result
course:OOP
course:DB

Example: Find Elements That Do Not Exist
Example (Node Tree)
/coursecatalog
course
course
Query
/coursecatalog/name
Result
Empty: there is no name element directly below coursecatalog!
Note that it is not an error!

Example: Find the Course Names
Example (Node Tree)
/coursecatalog
course
course
Query
/coursecatalog//name
Result
name:OOP name:DB

Example: Find the OOP Course
Example (Node Tree)
/coursecatalog
course
course
Query
/coursecatalog/course[name="OOP"]
Result
course:OOP

Example: Find a Course Based on ID
Example (Node Tree)
/coursecatalog
course
course
Query
/coursecatalog/course[@id="2"]
Result
course:DB

Example: Filter on an Attribute
Example (Node Tree)
/coursecatalog
course
course
Query
/coursecatalog/course[@id="2"]/name
Result
name:DB

Example: Get the Name of a Course as a String
Example (Node Tree)
/coursecatalog
course
course
Query
/coursecatalog/course[@id="2"]/name/text()
Result
The string DB
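The predicate queries from the preceding slides can be tried with Python's standard xml.etree.ElementTree. This is a sketch assuming the XML serialization below. ElementTree paths are relative to the element they are called on, so the leading /coursecatalog is dropped, and findtext stands in for the text() step.

```python
import xml.etree.ElementTree as ET

# Assumed serialization of the course catalog (not given in the slides).
XML = (
    "<coursecatalog>"
    "<course id='4'><name>OOP</name><semester>3</semester><desc>snip</desc></course>"
    "<course id='2'><name>DB</name><semester>7</semester><desc>snip</desc></course>"
    "</coursecatalog>"
)

root = ET.fromstring(XML)  # the coursecatalog element

missing = root.findall("name")                       # /coursecatalog/name
oop = root.findall("course[name='OOP']")             # /coursecatalog/course[name="OOP"]
db = root.findall("course[@id='2']")                 # /coursecatalog/course[@id="2"]
db_name = root.findall("course[@id='2']/name")       # .../course[@id="2"]/name
db_text = root.findtext("course[@id='2']/name")      # .../name/text()

print(missing, oop[0].get("id"), db[0].findtext("name"), db_name[0].tag, db_text)
```

A query that matches nothing returns an empty list rather than raising an error, in line with the note on elements that do not exist.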

Example: Use Parent Axis
Example (Node Tree)
/coursecatalog
course
course
Query
//course[@id="2"]/parent::node()
Result
The coursecatalog element node (the root element), i.e., the parent of the course

Example: Use Child Axis
Example (Node Tree)
/coursecatalog
course
course
Query
/coursecatalog/child::node()
Result
course:OOP
course:DB

Example: Use Descendant Axis
Example (Node Tree)
/coursecatalog
course
course
Query
/coursecatalog/descendant::node()
Result
8 element nodes
6 text nodes
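The parent, child, and descendant queries can be checked against a DOM tree, which keeps text nodes as separate nodes. This is a minimal sketch with xml.dom.minidom. The XML serialization is an assumption and contains no whitespace between tags, matching the node tree in the slides.

```python
from xml.dom import minidom

# Assumed serialization of the course catalog (not given in the slides).
XML = (
    "<coursecatalog>"
    "<course id='4'><name>OOP</name><semester>3</semester><desc>snip</desc></course>"
    "<course id='2'><name>DB</name><semester>7</semester><desc>snip</desc></course>"
    "</coursecatalog>"
)

doc = minidom.parseString(XML)
catalog = doc.documentElement

def descendants(node):
    """All nodes below node, in document order (attributes are not included)."""
    for child in node.childNodes:
        yield child
        yield from descendants(child)

# //course[@id="2"]/parent::node()
db_course = next(c for c in doc.getElementsByTagName("course")
                 if c.getAttribute("id") == "2")
parent = db_course.parentNode

# /coursecatalog/child::node()
children = list(catalog.childNodes)

# /coursecatalog/descendant::node()
nodes = list(descendants(catalog))
elements = [n for n in nodes if n.nodeType == n.ELEMENT_NODE]
texts = [n for n in nodes if n.nodeType == n.TEXT_NODE]

print(parent.tagName, len(children), len(elements), len(texts))
```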

Example: Use Functions
Example (Node Tree)
/coursecatalog
course
course
Query
concat("hello, ", "world!")
Result
The string 'hello, world!'

Example: Functions and XPath Expressions
Example (Node Tree)
/coursecatalog
course
course
Query
concat("hello ", /coursecatalog/course[@id="2"]/name/text())
Result
The string 'hello DB'

Most used Path Expressions
Often Used Expressions
Path Expression        Description
/                      select from the root node
//NodeName             select all NodeName element nodes in the document
.                      select the current node
..                     select the parent of the current node
/NodeName[@id>7]       select based on an attribute node
/NodeName[Node2='H']   select based on a child element node
/NodeName/text()       select the text node children
/NodeName/attribute()  select the attribute nodes (XPath 2.0; @* in XPath 1.0)
/NodeName[1]           select the first NodeName element node
/NodeName[last()]      select the last NodeName element node
Note
Almost like Linux/Unix directory navigation
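A few of these expressions work unchanged in Python's xml.etree.ElementTree, which implements a subset of XPath. The sketch below assumes the XML serialization shown and writes the paths relative to the coursecatalog element.

```python
import xml.etree.ElementTree as ET

# Assumed serialization of the course catalog (not given in the slides).
XML = (
    "<coursecatalog>"
    "<course id='4'><name>OOP</name><semester>3</semester><desc>snip</desc></course>"
    "<course id='2'><name>DB</name><semester>7</semester><desc>snip</desc></course>"
    "</coursecatalog>"
)

root = ET.fromstring(XML)

current = root.findall(".")                   # the current node
first = root.findall("course[1]")             # first course element
last = root.findall("course[last()]")         # last course element
parents = root.findall("course/name/..")      # parents of the name elements

print(first[0].findtext("name"), last[0].findtext("name"),
      [course.get("id") for course in parents])
```

As with directory navigation, . stays where it is and .. moves one level up.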

Quiz
Example (Node Tree)
/coursecatalog
course
course
Questions
/coursecatalog/course/name returns?
/coursecatalog/teacher returns?
/coursecatalog is the same as /?
/coursecatalog/course/../course/../course returns?
/coursecatalog/course[@id<11]/name/text() returns?

Node Numbering
Example (Node Tree)
[Figure: node tree with nodes numbered 1 to 14]
Note
Depth-first numbering of nodes
Used for relative access to other nodes
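Depth-first numbering corresponds to document order. As a small sketch (assumed XML serialization; only element nodes are numbered here, because ElementTree does not model text nodes as separate nodes), Element.iter() visits elements in exactly this order.

```python
import xml.etree.ElementTree as ET

# Assumed serialization of the course catalog (not given in the slides).
XML = (
    "<coursecatalog>"
    "<course id='4'><name>OOP</name><semester>3</semester><desc>snip</desc></course>"
    "<course id='2'><name>DB</name><semester>7</semester><desc>snip</desc></course>"
    "</coursecatalog>"
)

root = ET.fromstring(XML)

# iter() walks the tree depth-first: a node first, then its subtrees left to right.
numbering = {number: element.tag for number, element in enumerate(root.iter(), start=1)}

for number, tag in numbering.items():
    print(number, tag)
```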

Forward and Backward Axes
Definition (Axis)
An axis is a sequence of nodes located relative to the context node.
Definition (Forward Axis)
A forward axis can only return the context node or nodes after it in
document order.
Definition (Backward Axis)
A backward axis can only return the context node or nodes before it in
document order.

The Axes
Axis Name           Direction  Description
attribute           forward    All my attributes
self                forward    Myself
child               forward    All my children
descendant          forward    All my children, grandchildren, etc.
parent              backward   My unique parent
ancestor            backward   My parent, grandparent, etc.
following           forward    All after me that are not descendants
preceding           backward   All before me that are not ancestors
following-sibling   forward    My “younger” siblings
preceding-sibling   backward   My “elder” siblings
descendant-or-self  forward    Myself and all my descendants
ancestor-or-self    backward   Myself and all my ancestors
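The direction of an axis decides how its nodes are numbered: on a backward axis, position 1 is the node nearest to the context node. The sketch below implements three axes over xml.dom.minidom to show this. The function names and the XML serialization are assumptions made for illustration.

```python
from xml.dom import minidom

# Assumed serialization of the course catalog (not given in the slides).
XML = (
    "<coursecatalog>"
    "<course id='4'><name>OOP</name><semester>3</semester><desc>snip</desc></course>"
    "<course id='2'><name>DB</name><semester>7</semester><desc>snip</desc></course>"
    "</coursecatalog>"
)

def ancestor(node):
    """Backward axis: parent first, then grandparent, up to the document node."""
    result = []
    while node.parentNode is not None:
        node = node.parentNode
        result.append(node)
    return result

def following_sibling(node):
    """Forward axis: later siblings, nearest first."""
    result = []
    node = node.nextSibling
    while node is not None:
        result.append(node)
        node = node.nextSibling
    return result

def preceding_sibling(node):
    """Backward axis: earlier siblings, nearest first."""
    result = []
    node = node.previousSibling
    while node is not None:
        result.append(node)
        node = node.previousSibling
    return result

doc = minidom.parseString(XML)
db_desc = doc.getElementsByTagName("desc")[1]  # desc of the DB course

print([n.nodeName for n in ancestor(db_desc)])
print([n.nodeName for n in preceding_sibling(db_desc)])
print([n.nodeName for n in following_sibling(db_desc)])
```

With this numbering, the second ancestor of the DB course's desc element is coursecatalog, and the document node (named #document by minidom) comes last.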

Child
Finds
Immediate descendants of the current node.
Numbering
[Figure: the node tree with nodes numbered 1 to 14]
Example
cur
1 2 3
Quiz
Which direction of the child axis (and why)?

Child Examples
Example (Document Tree)
/coursecatalog
course
course
Queries
/coursecatalog/child::node()
Result: the two course nodes
/coursecatalog/course/child::node()
Result: six element nodes
/coursecatalog/course/attribute()
Result: two attribute nodes
/coursecatalog/course/semester/child::node()
Result: two text nodes

Parent
Finds
The one node immediately above
Numbering
[Figure: the node tree with nodes numbered 1 to 14]
Example
1
cur
Quiz
Which direction of the parent axis (and why)?

Parent Examples
/coursecatalog
course
course
Queries
/coursecatalog/course[@id='2']/name/parent::node()
Result: the course element node with id = 2
/coursecatalog/course/name/parent::node()
Result: the two course element nodes
/coursecatalog/parent::node()
Result: the document root
/parent::node()
Result: empty

Descendant
Finds
Children all the way down the tree
Numbering
[Figure: the node tree with nodes numbered 1 to 14]
Example
cur
1
2 3
4 5

Descendant Examples
/coursecatalog
course
course
Queries
/coursecatalog/descendant::node()
Result: 8 element nodes + 6 text nodes
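/coursecatalog/course[name="OOP"]/descendant::node()
Result: 3 element nodes + 3 text nodes
/coursecatalog/course[name="OOP"]/descendant::node()/attribute()
Result: empty (the id attribute belongs to the course element itself, not to a descendant)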
/coursecatalog/course/name/descendant::node()
Result: two text nodes

Ancestor
Finds
Parents all the way up the tree
Numbering
[Figure: the node tree with nodes numbered 1 to 14]
Example
4
3
2
1
cur

Ancestor Examples
/coursecatalog
course
course
Queries
/coursecatalog/course[name="DB"]/desc/ancestor::node()[2]
Result: coursecatalog element node
/coursecatalog/course/name/ancestor::node()
Result: document node + coursecatalog node + 2 course nodes
/coursecatalog/ancestor::node()
Result: document root
/ancestor::node()
Result: empty

Following
Finds
All nodes that follow, excluding descendants
Numbering
[Figure: the node tree with nodes numbered 1 to 14]
Example
cur 1 2
3 4 5

Following Examples
/coursecatalog
course
course
Queries
/coursecatalog/course[@id="4"]/following::node()
Result: the DB course and its descendants, 4 element nodes + 3 text nodes
/coursecatalog/course[@id="2"]/following::node()
Result: empty
/coursecatalog/course[@id="4"]/name/text()/following::node()
Result: 6 element nodes and 5 text nodes

Preceding
Finds
All preceding nodes excluding ancestors
Numbering
[Figure: the node tree with nodes numbered 1 to 14]
Example
3
2 1
cur

Preceding Examples
/coursecatalog
course
course
Queries
/coursecatalog/course[@id="4"]/semester/text()/preceding::node()
Result: 1 element node + 1 text node; the root element is an ancestor and therefore excluded
/coursecatalog/course/preceding::node()
Result: the OOP course and its descendants, 4 element nodes + 3 text nodes
/coursecatalog/course[name="OOP"]/preceding::node()
Result: empty
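The following and preceding axes can be derived from document order: take everything after (or before) the context node and drop its descendants (or ancestors). The sketch below does this over xml.dom.minidom. The helper names and the XML serialization are assumptions. For simplicity both functions return nodes in document order, whereas XPath numbers the preceding axis in reverse.

```python
from xml.dom import minidom

# Assumed serialization of the course catalog (not given in the slides).
XML = (
    "<coursecatalog>"
    "<course id='4'><name>OOP</name><semester>3</semester><desc>snip</desc></course>"
    "<course id='2'><name>DB</name><semester>7</semester><desc>snip</desc></course>"
    "</coursecatalog>"
)

def document_order(node):
    """The node and everything below it, depth-first (attributes excluded)."""
    yield node
    for child in node.childNodes:
        yield from document_order(child)

def is_ancestor(candidate, node):
    """True if candidate lies on the path from node up to the document node."""
    node = node.parentNode
    while node is not None:
        if node is candidate:
            return True
        node = node.parentNode
    return False

def following(doc, context):
    """Nodes after the context node, excluding its descendants."""
    nodes = list(document_order(doc))
    start = nodes.index(context)
    return [n for n in nodes[start + 1:] if not is_ancestor(context, n)]

def preceding(doc, context):
    """Nodes before the context node, excluding its ancestors."""
    nodes = list(document_order(doc))
    start = nodes.index(context)
    return [n for n in nodes[:start] if not is_ancestor(n, context)]

def kinds(nodes):
    """(number of element nodes, number of text nodes)"""
    return (sum(n.nodeType == n.ELEMENT_NODE for n in nodes),
            sum(n.nodeType == n.TEXT_NODE for n in nodes))

doc = minidom.parseString(XML)
oop_text = doc.getElementsByTagName("name")[0].firstChild        # text node OOP
three_text = doc.getElementsByTagName("semester")[0].firstChild  # text node 3
db_course = doc.getElementsByTagName("course")[1]

print(kinds(following(doc, oop_text)), kinds(preceding(doc, three_text)))
```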

Following Sibling
Finds
All following sibling nodes
Numbering
[Figure: the node tree with nodes numbered 1 to 14]
Example
cur 1 2 3

Following Sibling Examples
/coursecatalog
course
course
Queries
/coursecatalog/course/following-sibling::node()
Result: 1 element node (the DB course)
/coursecatalog/course[@id="2"]/following-sibling::node()
Result: empty
/coursecatalog/course/semester/following-sibling::node()
Result: 2 element nodes (descriptions)
/coursecatalog/course/@id/following-sibling::node()
Result: empty

Preceding Sibling
Finds
All preceding sibling nodes
Numbering
[Figure: the node tree with nodes numbered 1 to 14]
Example
2 1 cur

Preceding Sibling Examples
/coursecatalog
course
course
Queries
/coursecatalog/course/preceding-sibling::node()
Result: 1 element node (the OOP course)
/coursecatalog/course[@id="2"]/preceding-sibling::node()
Result: 1 element node (the OOP course)
/coursecatalog/course/semester/preceding-sibling::node()
Result: 2 element nodes (names)
/coursecatalog/course/desc/preceding-sibling::node()
Result: 4 element nodes (0 attribute nodes)

Summary: XPath
Main Points
XPath is widely used
Not an XML syntax!
XPath is used for many purposes in related XML technologies
XQuery
XSLT
SQL/XML
W3C Recommendation November 1999 www.w3.org/TR/xpath
Note
Very good idea to get familiar with XPath
XPath is the foundation for understanding other XML technologies

Additional Information
Web Sites
www.w3schools.com/XPath/xpath_intro.asp: W3Schools is always a
good place to start
www.stylusstudio.com/w3c/xpath/: A very good and quite
elaborate tutorial
www.devarticles.com/c/a/XML/Introduction-to-XPath/: Good
4-page tutorial
pierre.senellart.com/wdmd/chap-xpath.pdf: A description of
the XPath data model
Tools
pgfearo.googlepages.com/: A very good tool for playing around
with XPath
There is an introduction screencast
http://www.bit-101.com/xpath/: A good online tool
