Perl Xpath Lightning Talk
 

  • What is XPath?
    • XPath is a syntax for addressing parts of an XML document.
    • XPath uses path expressions to navigate XML documents.
    • XPath includes a library of standard functions.
    • XPath is a major element in several XML-based technologies.
  • Nodes
    • XPath has seven kinds of nodes: element, attribute, text, namespace, processing-instruction, comment, and document (root) nodes.
    • An XML document is treated as a tree of nodes. The root of the tree is called the document node (or root node).
    • Atomic values: nodes with no children or parent.
    • Items: atomic values or nodes.
    • Relationship of Nodes
      • Parent: each element and attribute has one parent.
      • Children: an element node may have zero, one, or more children.
      • Siblings: nodes that have the same parent.
      • Ancestors: a node's parent, parent's parent, and so on.
      • Descendants: a node's children, children's children, and so on.
  • Selecting Nodes
    • XPath uses path expressions to select nodes in an XML document. A node is selected by following a path, or steps.
    • The most useful path expressions:
      • nodename : selects the child elements of the current node that are named nodename
      • / : selects from the root node
      • // : selects matching descendants at any depth; a leading // searches the whole document
      • . : selects the current node
      • .. : selects the parent of the current node
      • @ : selects attributes
    • Wildcards:
      • * : matches any element node
    • Selecting several paths: the | operator combines several paths in one XPath expression.
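The node relationships and path expressions above can be tried out with a short script. This is a minimal sketch, not the talk's own code: the two-book document is a hypothetical cut-down of the sample data, and it assumes an XML::LibXML version that provides `load_xml`.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical two-book document, trimmed from the talk's sample data.
my $xml = <<'XML';
<root>
  <node subject="Perl">
    <title lang="en">Mastering Perl</title>
    <author>brian d foy</author>
    <price>39.99</price>
  </node>
  <node subject="OO">
    <title lang="en">Design Patterns</title>
    <author>Erich Gamma</author>
    <author>Richard Helm</author>
    <price>59.99</price>
  </node>
</root>
XML

my $doc  = XML::LibXML->load_xml( string => $xml );
my $root = $doc->documentElement;

# A leading "//" searches the whole document; "|" unions two paths.
my @hits = $root->findnodes('//title | //price');
print $_->nodeName, ": ", $_->textContent, "\n" for @hits;

# ".." steps up to the parent element; "@" selects an attribute of it.
my ($gamma) = $root->findnodes('//author[. = "Erich Gamma"]');
print "parent subject: ", $gamma->findvalue('../@subject'), "\n";

# Siblings share a parent; ancestors run up to the document element.
my @siblings  = $gamma->findnodes('following-sibling::*');
my @ancestors = $gamma->findnodes('ancestor::*');
print "following siblings: ", join( ", ", map { $_->nodeName } @siblings ),  "\n";
print "ancestors: ",          join( ", ", map { $_->nodeName } @ancestors ), "\n";
```

The axis names `following-sibling::` and `ancestor::` are the long forms that the abbreviated syntax (`..`, `//`, `@`) does not cover.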

Presentation Transcript

  • Introduction to Perl & XPATH
  • What is XPATH
  • XPATH Terminology
    • Items
      • Nodes
      • Atomic values
    • Relationship of Nodes
      • Parent
      • Children
      • Siblings
      • Ancestors
      • Descendants
  • XPATH Syntax
    • Path expressions: nodename  /  //  .  ..  @
    • Predicates: []
    • Operators: |  and  or  <  <=  =  >  >=  *  +  -  div  !=
  • XML Sample
    • <?xml version="1.0" encoding="UTF-8"?>
    • <root>
    • <node subject="Perl">
    • <title lang="en">Mastering Perl</title>
    • <author>brian d foy</author>
    • <year>2007</year>
    • <price>39.99</price>
    • </node>
    • <node subject="Perl">
    • <title lang="en">Perl Best Practices</title>
    • <author>Damian Conway</author>
    • <year>2005</year>
    • <price>39.95</price>
    • </node>
    • <node subject="OO">
    • <title lang="en">Design Patterns: Elements of Reusable Object-Oriented Software</title>
    • <author>Erich Gamma</author>
    • <author>Richard Helm</author>
    • <author>Ralph Johnson</author>
    • <author>John Vlissides</author>
    • <year>1994</year>
    • <price>59.99</price>
    • </node>
    • <node subject="RegEx">
    • <title lang="en">Mastering Regular Expressions , Third Edition</title>
    • <author>Jeffrey E. F. Friedl</author>
    • <year>2006</year>
    • <price>44.99</price>
    • </node>
    • </root>
  • Sample Script
    • use strict;
    • use warnings;
    • use XML::LibXML;
    • # Note for Win32 Perl users: install this package from the ppm command line;
    • # the GUI still does not supply the right answers
    • # to the setup procedure.
    • #### Fetch libxml2.dll? [yes] yes
    • #### Where should libxml2.dll be placed? [C:\Perl\bin] <ENTER>
    • my $RawXml = '<?xml version="1.0" encoding="UTF-8"?>
    • <root>
    • ..
    • </root>';
    • my $parser = XML::LibXML->new();
    • my $tree = $parser->parse_string($RawXml);
    • my $root = $tree->getDocumentElement();
    • ..
    • foreach my $N ( $bookshelf->findnodes(' XPATH_EXPRESSION ') ) {
    • # print $N . "\n"; # XML::LibXML::Element=SCALAR(0x1a7a7d4)
    • print $N->getName() . "=" . "\"" . $N->textContent() . "\"" . "\n";
    • }
  • Searching the whole document with // (called on any XML::LibXML node)
    • use strict;
    • use warnings;
    • use XML::LibXML;
    • ..
    • foreach my $N ( $bookshelf->findnodes('//title') ) {
  • Searching along a specified XML tree path
    • use strict;
    • use warnings;
    • use XML::LibXML;
    • ..
    • foreach my $N ( $bookshelf->findnodes('book') ) {
    • foreach my $N ( $bookshelf->findnodes('title') ) { # won't work since no 'title' child objects at this xml tree level
    • foreach my $N ( $bookshelf->findnodes('book/title') ) {
    • foreach my $N ( $bookshelf->findnodes('/bookshelf/book/price') ) {
  • Playing with XPath positions
    • use strict;
    • use warnings;
    • use XML::LibXML;
    • ..
    • foreach my $N ( $bookshelf->findnodes('book[1]') ) {
    • foreach my $N ( $bookshelf->findnodes('book[1]/title') ) {
    • foreach my $N ( $bookshelf->findnodes('book[position()<3]') ) {
    • foreach my $N ( $bookshelf->findnodes('book[last()-1]') ) {
    • foreach my $N ( $bookshelf->findnodes('book[last()]') ) {
  • Querying element values
    • use strict;
    • use warnings;
    • use XML::LibXML;
    • ..
    • foreach my $N ( $bookshelf->findnodes('/bookshelf/book[price>40 and price<50]/title') ) {
    • foreach my $N ( $bookshelf->findnodes('/bookshelf/book[price>50]/title') ) {
  • Querying attributes
    • use strict;
    • use warnings;
    • use XML::LibXML;
    • ..
    • foreach my $N ( $bookshelf->findnodes('//@lang') ) {
    • foreach my $N ( $bookshelf->findnodes('//title[@id]') ) {
    • foreach my $N ( $bookshelf->findnodes('//title[@lang]') ) {
    • foreach my $N ( $bookshelf->findnodes('//title[@id | @ooo]') ) {
    • foreach my $N ( $bookshelf->findnodes('//title[@id="en"]') ) {
  • Examples
  • Sample Script header
    • use strict;
    • use warnings;
    • use XML::LibXML;
    • my $RawXml = '<?xml version="1.0" encoding="UTF-8"?>
    • <root>
    • ..
    • </root>';
    • my $parser = XML::LibXML->new();
    • my $tree = $parser->parse_string($RawXml);
    • my $root = $tree->getDocumentElement();
  • Find all nodes
    • ## Find all nodes
    • foreach my $N ( $root->findnodes('node') ) {
    • print $N . "\n";
    • print $N->getName() . "=" . "\"" . $N->textContent() . "\"" . "\n";
    • }
    • ## Output Sample:
    • ## XML::LibXML::Element=SCALAR(0x1a7a7d4)
    • ## node="
    • ## Mastering Perl
    • ## brian d foy
    • ## 2007
    • ## 39.99
    • ## "
    • ## XML::LibXML::Element=SCALAR(0x1a7a7f4)
    • ## node="
    • ## Perl Best Practices
    • ## Damian Conway
    • ## 2005
    • ## 39.95
    • ## "
    • ## XML::LibXML::Element=SCALAR(0x1a7a814)
    • ## node="
    • ## Design Patterns: Elements of Reusable Object-Oriented Software
    • ## Erich Gamma
    • ## Richard Helm
    • ## Ralph Johnson
    • ## John Vlissides
    • ## 1994
    • ## 59.99
    • ## "
    • ## XML::LibXML::Element=SCALAR(0x1a7a834)
    • ## node="
    • ## Mastering Regular Expressions , Third Edition
    • ## Jeffrey E. F. Friedl
    • ## 2006
    • ## 44.99
    • ## "
  • Find first node
    • ## Find first node
    • foreach my $N ( $root->findnodes('node[1]') ) {
    • print $N->getAttribute("subject") . "\n";
    • print $N->getName() . "=" . "\"" . $N->textContent() . "\"" . "\n";
    • }
    • ## Perl
    • ## node="
    • ## Mastering Perl
    • ## brian d foy
    • ## 2007
    • ## 39.99
    • ## "
  • Find last node
    • ## Find last node
    • foreach my $N ( $root->findnodes('node[last()]') ) {
    • print $N->getAttribute("subject") . "\n";
    • print $N->getName() . "=" . "\"" . $N->textContent() . "\"" . "\n";
    • }
    • ## RegEx
    • ## node="
    • ## Mastering Regular Expressions , Third Edition
    • ## Jeffrey E. F. Friedl
    • ## 2006
    • ## 44.99
    • ## "
  • Find last-1 node
    • ## Find last-1 node
    • foreach my $N ( $root->findnodes('node[last()-1]') ) {
    • print $N->getAttribute("subject") . "\n";
    • print $N->getName() . "=" . "\"" . $N->textContent() . "\"" . "\n";
    • }
    • ## OO
    • ## node="
    • ## Design Patterns: Elements of Reusable Object-Oriented Software
    • ## Erich Gamma
    • ## Richard Helm
    • ## Ralph Johnson
    • ## John Vlissides
    • ## 1994
    • ## 59.99
    • ## "
  • Find all titles
    • ## Find all titles
    • foreach my $N ( $root->findnodes('//title') ) {
    • print $N->getName() . "=" . "\"" . $N->textContent() . "\"" . "\n";
    • }
    • ## title="Mastering Perl"
    • ## title="Perl Best Practices"
    • ## title="Design Patterns: Elements of Reusable Object-Oriented Software"
    • ## title="Mastering Regular Expressions , Third Edition"
    • ## title="Perl Best Practices duplicate"
  • Find all node/titles
    • ## Find all node/titles
    • foreach my $N ( $root->findnodes('node/title') ) {
    • print $N->getName() . "=" . "\"" . $N->textContent() . "\"" . "\n";
    • }
    • ## title="Mastering Perl"
    • ## title="Perl Best Practices"
    • ## title="Design Patterns: Elements of Reusable Object-Oriented Software"
    • ## title="Mastering Regular Expressions , Third Edition"
  • Find all nodes/prices/text
    • ## Find all nodes/prices/text
    • foreach my $N ( $root->findnodes('/root/node/price/text()') ) {
    • print $N->getName() . "=" . "\"" . $N->textContent() . "\"" . "\n";
    • }
    • ## #text="39.99"
    • ## #text="39.95"
    • ## #text="59.99"
    • ## #text="44.99"
  • Find all nodes/prices
    • ## Find all nodes/prices
    • foreach my $N ( $root->findnodes('/root/node/price') ) {
    • print $N->getName() . "=" . "\"" . $N->textContent() . "\"" . "\n";
    • }
    • ## price="39.99"
    • ## price="39.95"
    • ## price="59.99"
    • ## price="44.99"
  • Find first node/title
    • ## Find first node/title
    • foreach my $N ( $root->findnodes('node[1]/title') ) {
    • print $N->getName() . "=" . "\"" . $N->textContent() . "\"" . "\n";
    • }
    • ## title="Mastering Perl"
  • Find all nodes/titles where price>50
    • ## Find all nodes/titles where price>50
    • foreach my $N ( $root->findnodes('/root/node[price>50]/title') ) {
    • print $N->getName() . "=" . "\"" . $N->textContent() . "\"" . "\n";
    • }
    • ## title="Design Patterns: Elements of Reusable Object-Oriented Software"
  • Find all nodes/titles where price<50 and price>40
    • ## Find all nodes/titles where price<50 and price>40
    • foreach my $N ( $root->findnodes('/root/node[price>40 and price<50]/title') ) {
    • print $N->getName() . "=" . "\"" . $N->textContent() . "\"" . "\n";
    • }
    • ## title="Mastering Regular Expressions , Third Edition"
  • Find all values for the attribute 'lang'
    • ## Find all values for the attribute 'lang'
    • foreach my $N ( $root->findnodes('//@lang') ) {
    • print $N->getName() . "=" . "\"" . $N->textContent() . "\"" . "\n";
    • }
    • ## lang="en"
    • ## lang="en"
    • ## lang="en"
    • ## lang="en"
  • Find all titles with the attribute 'lang'
    • ## Find all titles with the attribute 'lang'
    • foreach my $N ( $root->findnodes('//title[@lang]') ) {
    • print $N->getName() . "=" . "\"" . $N->textContent() . "\"" . "\n";
    • }
    • ## title="Mastering Perl"
    • ## title="Perl Best Practices"
    • ## title="Design Patterns: Elements of Reusable Object-Oriented Software"
    • ## title="Mastering Regular Expressions , Third Edition"
  • Find all titles with the attribute 'id'
    • ## Find all titles with the attribute 'id'
    • foreach my $N ( $root->findnodes('//title[@id]') ) {
    • print $N->getName() . "=" . "\"" . $N->textContent() . "\"" . "\n";
    • }
    • ## title="Perl Best Practices duplicate"
  • Find all titles with the attribute 'id="en"'
    • foreach my $N ( $root->findnodes('//title[@id="en"]') ) {
    • print $N->getName() . "=" . "\"" . $N->textContent() . "\"" . "\n";
    • }
    • ## title="Perl Best Practices duplicate"
  • Find all titles with the attribute 'id' or with the attribute 'ooo'
    • ## Find all titles with the attribute 'id' or with the attribute 'ooo'
    • foreach my $N ( $root->findnodes('//title[@id | @ooo]') ) {
    • print $N->getName() . "=" . "\"" . $N->textContent() . "\"" . "\n";
    • }
    • ## title="Perl Best Practices duplicate"
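The XPATH Syntax slide lists operators such as `!=`, `or`, `*`, and `div` that the examples never exercise. The sketch below shows one way they might be used with `findnodes` and `findvalue`. The data is hypothetical (the same shape as the talk's sample, with round prices so the arithmetic is easy to check), and `texts` is a helper made up for this illustration.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical data: same shape as the talk's sample, with round prices.
my $xml = <<'XML';
<root>
  <node subject="Perl"><title>Mastering Perl</title><year>2007</year><price>40</price></node>
  <node subject="OO"><title>Design Patterns</title><year>1994</year><price>60</price></node>
  <node subject="RegEx"><title>Mastering Regular Expressions</title><year>2006</year><price>50</price></node>
</root>
XML

my $doc  = XML::LibXML->load_xml( string => $xml );
my $root = $doc->documentElement;

# Hypothetical helper: text content of every node an expression selects.
sub texts {
    my ($xpath) = @_;
    return map { $_->textContent } $root->findnodes($xpath);
}

# "!=" compares an attribute value against a string.
my @not_perl = texts('node[@subject != "Perl"]/title');

# "or" combines two conditions inside one predicate.
my @old_or_cheap = texts('node[year < 2000 or price <= 40]/title');

# "*" is multiplication when it follows an operand.
my @doubled = texts('node[price * 2 >= 100]/title');

# "div" is XPath's division operator; findvalue returns the result as a string.
my $average = $root->findvalue('sum(node/price) div count(node)');

print "not Perl:     @not_perl\n";
print "old or cheap: @old_or_cheap\n";
print "doubled:      @doubled\n";
print "average:      $average\n";
```

`findvalue` is used for the last expression because it evaluates to a number rather than a node set, which `findnodes` is not meant to return.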