#PBCAT




     The Lumberjack - XPath 101
            Thomas Weinert
About Me
●   Application Developer
    ●   PHP
    ●   JavaScript
    ●   XSL
●   papaya Software GmbH
    ●   papaya CMS
    ●   Technical Director
●   FluentDOM
Questions!




         Please ask any time!
XPath 1
●   XML Path Language
●   W3C Recommendation 16 November 1999
●   Used by
    ●   XSLT 1
    ●   XPointer
XPath 2
●   W3C Recommendation 23 January 2007
●   Superset of XPath 1
●   More data types
DOM
●   Document Object Model
●   Standard extension: ext/dom
●   libxml2
    ●   XPath 1
DOMXPath
●   Create after loading the document!
●   evaluate()/query()
<?php
   $str = '<sample><element/></sample>';
   $dom = new DOMDocument();
   $dom->loadXML($str);
   $xpath = new DOMXPath($dom);
   var_dump($xpath->evaluate('//element'));
   var_dump($xpath->evaluate('//noelement'));
   var_dump($xpath->evaluate('//noelement/@attr'));
?>


                   object(DOMNodeList)[5]
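
A minimal sketch of how evaluate() and query() differ, reusing the sample document from the slide. The scalar expressions count() and boolean() are added here for illustration and are not part of the original example.

```php
<?php
$dom = new DOMDocument();
$dom->loadXML('<sample><element/></sample>');
$xpath = new DOMXPath($dom);

// Location paths yield a DOMNodeList, even when nothing matches.
$found   = $xpath->evaluate('//element');
$missing = $xpath->evaluate('//noelement');
echo $found->length, "\n";   // 1
echo $missing->length, "\n"; // 0

// Expressions with a scalar result need evaluate();
// query() is limited to node lists.
$count  = $xpath->evaluate('count(//element)');     // float
$exists = $xpath->evaluate('boolean(//noelement)'); // bool
var_dump($count, $exists);
```

An empty DOMNodeList is still an object, so check its length rather than testing the return value for truthiness.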
SimpleXML
●    Always returns SimpleXMLElement objects
<?php
  $str = '<sample><element/></sample>';
  $xml = simplexml_load_string($str);

    var_dump($xml->xpath('//element'));
    var_dump($xml->xpath('//noelement'));

    var_dump($xml->xpath('//noelement/@attr'));
?>


 //element            array, 0 => object(SimpleXMLElement)[2]
 //noelement          array (empty)
 //noelement/@attr    boolean false
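
Because xpath() may hand back something other than an array, a defensive loop could look like the sketch below. The id attribute is an assumption added to the sample so that there is something to print.

```php
<?php
// Sample from the slide, extended with a hypothetical id attribute.
$xml = simplexml_load_string('<sample><element id="e1"/></sample>');

$result = $xml->xpath('//element');

// Guard against a non-array return before iterating.
if (is_array($result)) {
    foreach ($result as $node) {
        // Every match is a SimpleXMLElement.
        echo $node->getName(), ' ', $node['id'], "\n";
    }
}
```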
XSL
●   libxslt
    ●   based on libxml2
●   ext/xsl
●   ext/xslcache
Syntax

                    /element/child[@attr]

    /          Absolute Path
    element    Step 1
    /          Separator
    child      Step 2
    [@attr]    Predicate
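
To see the expression from this slide at work, here is a small sketch. The document is made up for the example: two child elements, only one of which carries the attribute, plus an unrelated element.

```php
<?php
$dom = new DOMDocument();
// Hypothetical document, chosen so that the predicate filters something out.
$dom->loadXML(
    '<element>'.
    '<child attr="1">first</child>'.
    '<child>second</child>'.
    '<other attr="2"/>'.
    '</element>'
);
$xpath = new DOMXPath($dom);

// Step 1 selects the root element, step 2 its child elements named
// "child", and the predicate keeps only those with an attr attribute.
$nodes = $xpath->evaluate('/element/child[@attr]');
foreach ($nodes as $node) {
    echo $node->textContent, "\n"; // first
}
```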
Nodes
●   node()
    ●   * or qualified-name
    ●   text()
    ●   comment()
    ●   processing-instruction()
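
The node tests can be compared by counting what each one matches. This is a sketch against an invented document that contains one comment, one processing instruction, and one element with a text node.

```php
<?php
$dom = new DOMDocument();
// Hypothetical document with one node of each kind.
$dom->loadXML('<root><!-- note --><?target data?><item>text</item></root>');
$xpath = new DOMXPath($dom);

$counts = [
    // node() matches any node: comment, processing instruction, element
    'node()'    => $xpath->evaluate('count(/root/node())'),
    // * matches element nodes only
    '*'         => $xpath->evaluate('count(/root/*)'),
    'text()'    => $xpath->evaluate('count(/root/item/text())'),
    'comment()' => $xpath->evaluate('count(/root/comment())'),
    'processing-instruction()' =>
        $xpath->evaluate('count(/root/processing-instruction())'),
];
var_dump($counts);
```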
Axes
●   axis::...
●   Full syntax
●   Short Syntax
●   Default Axis
child
<barcamps>
  <barcamp title="PHP Unconference Hamburg" id="phpuchh">
    <link href="http://www.php-unconference.de/" />
  </barcamp>
  <barcamp title="PHP Barcamp Salzburg" id="phpbcat">
    <link href="http://www.phpbarcamp.at/cms/" />
    <speakers-featured>
      <speaker>Bastian Feder</speaker>
    </speakers-featured>
    <speakers>
      <speaker>Thomas Weinert</speaker>
    </speakers>
  </barcamp>
  <barcamp title="PHP Unconference Europe" id="phpuceu">
    <link href="http://www.phpuceu.org/" />
  </barcamp>
</barcamps>
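
A sketch of the child axis on the sample above. The XML is the slide's sample in condensed form, with every link element self-closed so that it parses. The choice of the Salzburg barcamp as context node is an assumption for illustration.

```php
<?php
$xml = <<<'XML'
<barcamps>
  <barcamp title="PHP Unconference Hamburg" id="phpuchh">
    <link href="http://www.php-unconference.de/" />
  </barcamp>
  <barcamp title="PHP Barcamp Salzburg" id="phpbcat">
    <link href="http://www.phpbarcamp.at/cms/" />
    <speakers-featured><speaker>Bastian Feder</speaker></speakers-featured>
    <speakers><speaker>Thomas Weinert</speaker></speakers>
  </barcamp>
  <barcamp title="PHP Unconference Europe" id="phpuceu">
    <link href="http://www.phpuceu.org/" />
  </barcamp>
</barcamps>
XML;
$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

// Context node: the Salzburg barcamp.
$context = $xpath->evaluate('/barcamps/barcamp[@id = "phpbcat"]')->item(0);

// child::* selects only the direct element children of the context node.
$names = [];
foreach ($xpath->evaluate('child::*', $context) as $node) {
    $names[] = $node->nodeName;
}
echo implode(', ', $names), "\n"; // link, speakers-featured, speakers
```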
descendant
<barcamps>
  <barcamp title="PHP Unconference Hamburg" id="phpuchh">
    <link href="http://www.php-unconference.de/" />
  </barcamp>
  <barcamp title="PHP Barcamp Salzburg" id="phpbcat">
    <link href="http://www.phpbarcamp.at/cms/" />
    <speakers-featured>
      <speaker>Bastian Feder</speaker>
    </speakers-featured>
    <speakers>
      <speaker>Thomas Weinert</speaker>
    </speakers>
  </barcamp>
  <barcamp title="PHP Unconference Europe" id="phpuceu">
    <link href="http://www.phpuceu.org/" />
  </barcamp>
</barcamps>
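
A sketch of the descendant axis on the same sample, condensed and with the link elements self-closed so that it parses. The context node is again an assumption. Unlike child, descendant reaches nodes at any depth below the context.

```php
<?php
$xml = <<<'XML'
<barcamps>
  <barcamp title="PHP Unconference Hamburg" id="phpuchh">
    <link href="http://www.php-unconference.de/" />
  </barcamp>
  <barcamp title="PHP Barcamp Salzburg" id="phpbcat">
    <link href="http://www.phpbarcamp.at/cms/" />
    <speakers-featured><speaker>Bastian Feder</speaker></speakers-featured>
    <speakers><speaker>Thomas Weinert</speaker></speakers>
  </barcamp>
  <barcamp title="PHP Unconference Europe" id="phpuceu">
    <link href="http://www.phpuceu.org/" />
  </barcamp>
</barcamps>
XML;
$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

$context = $xpath->evaluate('/barcamps/barcamp[@id = "phpbcat"]')->item(0);

// child::speaker finds nothing here: the speakers are grandchildren.
$direct = $xpath->evaluate('child::speaker', $context)->length;

// descendant::speaker finds them regardless of depth.
$speakers = [];
foreach ($xpath->evaluate('descendant::speaker', $context) as $node) {
    $speakers[] = $node->textContent;
}
echo $direct, "\n";                  // 0
echo implode(', ', $speakers), "\n"; // Bastian Feder, Thomas Weinert
```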
parent
<barcamps>
  <barcamp title="PHP Unconference Hamburg" id="phpuchh">
    <link href="http://www.php-unconference.de/" />
  </barcamp>
  <barcamp title="PHP Barcamp Salzburg" id="phpbcat">
    <link href="http://www.phpbarcamp.at/cms/" />
    <speakers-featured>
      <speaker>Bastian Feder</speaker>
    </speakers-featured>
    <speakers>
      <speaker>Thomas Weinert</speaker>
    </speakers>
  </barcamp>
  <barcamp title="PHP Unconference Europe" id="phpuceu">
    <link href="http://www.phpuceu.org/" />
  </barcamp>
</barcamps>
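
A sketch of the parent axis on the same sample, condensed and with the link elements self-closed. Starting from one speaker element (an arbitrary choice), parent::* and its short form .. walk upwards.

```php
<?php
$xml = <<<'XML'
<barcamps>
  <barcamp title="PHP Unconference Hamburg" id="phpuchh">
    <link href="http://www.php-unconference.de/" />
  </barcamp>
  <barcamp title="PHP Barcamp Salzburg" id="phpbcat">
    <link href="http://www.phpbarcamp.at/cms/" />
    <speakers-featured><speaker>Bastian Feder</speaker></speakers-featured>
    <speakers><speaker>Thomas Weinert</speaker></speakers>
  </barcamp>
  <barcamp title="PHP Unconference Europe" id="phpuceu">
    <link href="http://www.phpuceu.org/" />
  </barcamp>
</barcamps>
XML;
$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

// Context node: one speaker element.
$speaker = $xpath->evaluate('//speaker[. = "Thomas Weinert"]')->item(0);

// Full syntax: the parent element of the context node.
$parent = $xpath->evaluate('parent::*', $speaker)->item(0)->nodeName;

// Short syntax: two steps up, then read the id attribute.
$barcampId = $xpath->evaluate('string(../../@id)', $speaker);

echo $parent, "\n";    // speakers
echo $barcampId, "\n"; // phpbcat
```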
following-sibling
<barcamps>
  <barcamp title="PHP Unconference Hamburg" id="phpuchh">
    <link href="http://www.php-unconference.de/" />
  </barcamp>
  <barcamp title="PHP Barcamp Salzburg" id="phpbcat">
    <link href="http://www.phpbarcamp.at/cms/" />
    <speakers-featured>
      <speaker>Bastian Feder</speaker>
    </speakers-featured>
    <speakers>
      <speaker>Thomas Weinert</speaker>
    </speakers>
  </barcamp>
  <barcamp title="PHP Unconference Europe" id="phpuceu">
    <link href="http://www.phpuceu.org/" />
  </barcamp>
</barcamps>
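
A sketch of the following-sibling axis on the same sample, condensed and with the link elements self-closed. The Hamburg barcamp is picked as context node for illustration, so the axis returns the two barcamps that come after it under the same parent.

```php
<?php
$xml = <<<'XML'
<barcamps>
  <barcamp title="PHP Unconference Hamburg" id="phpuchh">
    <link href="http://www.php-unconference.de/" />
  </barcamp>
  <barcamp title="PHP Barcamp Salzburg" id="phpbcat">
    <link href="http://www.phpbarcamp.at/cms/" />
    <speakers-featured><speaker>Bastian Feder</speaker></speakers-featured>
    <speakers><speaker>Thomas Weinert</speaker></speakers>
  </barcamp>
  <barcamp title="PHP Unconference Europe" id="phpuceu">
    <link href="http://www.phpuceu.org/" />
  </barcamp>
</barcamps>
XML;
$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

// Context node: the first barcamp.
$context = $xpath->evaluate('/barcamps/barcamp[@id = "phpuchh"]')->item(0);

// Siblings that follow the context node in document order.
$ids = [];
foreach ($xpath->evaluate('following-sibling::barcamp', $context) as $node) {
    $ids[] = $node->getAttribute('id');
}
echo implode(', ', $ids), "\n"; // phpbcat, phpuceu
```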
More Axes
●   ancestor
●   ancestor-or-self
●   descendant-or-self
●   following
●   preceding
●   preceding-sibling
●   self
●   attribute
●   namespace
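
A few of these axes in a sketch, using a tiny made-up document. Note that a positional predicate on a reverse axis such as ancestor counts from the context node outwards, so ancestor::*[1] is the nearest ancestor.

```php
<?php
$dom = new DOMDocument();
// Hypothetical document: a contains b and d, b contains c.
$dom->loadXML('<a><b id="x"><c/></b><d/></a>');
$xpath = new DOMXPath($dom);

$c = $xpath->evaluate('//c')->item(0);
$d = $xpath->evaluate('//d')->item(0);

$ancestors      = $xpath->evaluate('count(ancestor::*)', $c);         // a and b
$withSelf       = $xpath->evaluate('count(ancestor-or-self::*)', $c); // a, b, c
$nearest        = $xpath->evaluate('name(ancestor::*[1])', $c);       // b
$previous       = $xpath->evaluate('name(preceding-sibling::*)', $d); // b
$attributeValue = $xpath->evaluate('string(ancestor::b/attribute::id)', $c);

var_dump($ancestors, $withSelf, $nearest, $previous, $attributeValue);
```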
Short Syntax
●   self::node()/
    descendant-or-self::node()/
    child::para
●   .//para

Axis                       Short
child                      (default, no prefix)
self                       .
parent                     ..
attribute                  @
descendant-or-self         //
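
The full and the short form should select the same nodes. A sketch to check that, with an invented document that has para elements at two depths:

```php
<?php
$dom = new DOMDocument();
// Hypothetical document with para elements at different levels.
$dom->loadXML('<doc><para>1</para><section><para>2</para></section></doc>');
$xpath = new DOMXPath($dom);
$context = $dom->documentElement;

$full  = $xpath->evaluate(
    'self::node()/descendant-or-self::node()/child::para',
    $context
);
$short = $xpath->evaluate('.//para', $context);

// Compare the two result lists node by node.
$same = $full->length === $short->length;
for ($i = 0; $same && $i < $full->length; $i++) {
    $same = $full->item($i)->isSameNode($short->item($i));
}
echo $short->length, "\n"; // 2
var_dump($same);           // bool(true)
```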
Cast Functions
●   string()
●   number()
●   boolean()

    echo $xpath->evaluate('string(/html/head/title)');
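
A sketch of all three cast functions against a made-up HTML fragment. With evaluate(), the XPath result types arrive as PHP string, float, and bool.

```php
<?php
$dom = new DOMDocument();
// Hypothetical document for the example.
$dom->loadXML(
    '<html><head><title>XPath 101</title></head>'.
    '<body><p>42</p></body></html>'
);
$xpath = new DOMXPath($dom);

$title   = $xpath->evaluate('string(/html/head/title)'); // string
$answer  = $xpath->evaluate('number(/html/body/p)');     // float
$hasBody = $xpath->evaluate('boolean(/html/body)');      // bool

// Text that is not a number casts to NaN, not to 0.
$notANumber = $xpath->evaluate('number(/html/head/title)');

var_dump($title, $answer, $hasBody, is_nan($notANumber));
```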
Node Functions
●   count()
●   last()
●   position()
●   name()
●   local-name()
●   namespace-uri()



$list = $xpath->evaluate(
  '//*[local-name() = "li" and position() = last()]'
);
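
A sketch of the node functions on an invented list. position() and last() are relative to the node set of the current step, here the li children of the single ul.

```php
<?php
$dom = new DOMDocument();
// Hypothetical list for the example.
$dom->loadXML('<ul><li>one</li><li>two</li><li>three</li></ul>');
$xpath = new DOMXPath($dom);

$total  = $xpath->evaluate('count(//li)');  // float
$root   = $xpath->evaluate('name(/*)');     // name of the document element
$second = $xpath->evaluate('string(//li[position() = 2])');
$last   = $xpath->evaluate('string(//li[position() = last()])');

var_dump($total, $root, $second, $last);
```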
String Functions
●   concat()
●   starts-with()
●   contains()
●   substring-before()
●   substring-after()
●   substring()
●   string-length()
●   normalize-space()
●   translate()
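
A sketch that runs the string functions through evaluate(). The link element is borrowed from the barcamp sample; the literal strings are made up. XPath 1 has no lower-case function, so translate() with both alphabets is the usual workaround.

```php
<?php
$dom = new DOMDocument();
$dom->loadXML('<link href="http://www.phpbarcamp.at/cms/" />');
$xpath = new DOMXPath($dom);

$isHttp   = $xpath->evaluate('starts-with(/link/@href, "http://")');
$hasCamp  = $xpath->evaluate('contains(/link/@href, "barcamp")');
$rest     = $xpath->evaluate('substring-after(/link/@href, "://")');
$host     = $xpath->evaluate(
    'substring-before(substring-after(/link/@href, "://"), "/")'
);
$joined   = $xpath->evaluate('concat("X", "Path")');
$part     = $xpath->evaluate('substring("Lumberjack", 1, 6)'); // 1-based
$length   = $xpath->evaluate('string-length("PHP")');
$trimmed  = $xpath->evaluate('normalize-space("  a   b ")');
$lower    = $xpath->evaluate(
    'translate("XPath", "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz")'
);

var_dump($host, $part, $trimmed, $lower);
```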
Match A Class
●   normalize-space()
●   concat()
●   contains()
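
One common way to combine these three functions into a class match is sketched below; the slide does not spell out the expression, so treat this as one possible reading. The list and class names are invented. Padding both sides with spaces keeps "inactive" from matching "active".

```php
<?php
$dom = new DOMDocument();
// Hypothetical markup: note the double space and the "inactive" class.
$dom->loadXML(
    '<ul>'.
    '<li class="item  active">a</li>'.
    '<li class="inactive">b</li>'.
    '<li class="active">c</li>'.
    '</ul>'
);
$xpath = new DOMXPath($dom);

// normalize-space() collapses whitespace, concat() pads the value,
// contains() then looks for the padded class name.
$expression =
    '//li[contains(concat(" ", normalize-space(@class), " "), " active ")]';

$matches = [];
foreach ($xpath->evaluate($expression) as $node) {
    $matches[] = $node->textContent;
}
echo implode(', ', $matches), "\n"; // a, c
```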
Namespaces
●   URN
●   Prefix
●   Default Namespace
●   Own Prefixes
●   Attributes
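
A sketch of the default namespace and own prefix points, assuming an Atom-like document invented for the example. XPath 1 has no default namespace for element names, so the expression needs a prefix of its own, registered with registerNamespace(). Attributes without a prefix are in no namespace.

```php
<?php
$dom = new DOMDocument();
// Hypothetical document with a default namespace.
$dom->loadXML(
    '<feed xmlns="http://www.w3.org/2005/Atom" lang="en">'.
    '<title>News</title>'.
    '</feed>'
);
$xpath = new DOMXPath($dom);

// Without a prefix the names refer to elements in no namespace.
$withoutPrefix = $xpath->evaluate('count(/feed/title)');

// The prefix "atom" is our own choice; only the namespace name has to match.
$xpath->registerNamespace('atom', 'http://www.w3.org/2005/Atom');
$title = $xpath->evaluate('string(/atom:feed/atom:title)');

// The unprefixed attribute is matched without any prefix.
$lang = $xpath->evaluate('string(/atom:feed/@lang)');

var_dump($withoutPrefix, $title, $lang);
```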
Bug #49490
●   Namespace prefix conflict
$dom = new DOMDocument();
$dom->loadXML(
   '<foobar><a:foo xmlns:a="urn:a">'.
   '<b:bar xmlns:b="urn:b"/></a:foo>'.
   '</foobar>'
);
$xpath = new DOMXPath($dom);
$context = $dom->documentElement->firstChild;
$xpath->registerNamespace('a', 'urn:b');
var_dump(
   $xpath->evaluate('descendant-or-self::a:*', $context)
     ->item(0)->tagName
);
Tools
●   Firebug
●   Firefox AddOns
CSS Selectors
●   JavaScript libraries
●   element nodes
    ●   *
●   no axes
    ●   descendant-or-self::*
●   can ignore namespaces
    ●   descendant-or-self::*[local-name() = '...']
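
A sketch of the local-name() pattern above, which approximates what a CSS selector such as li does when it ignores namespaces. The XHTML fragment is invented for the example.

```php
<?php
$dom = new DOMDocument();
// Hypothetical XHTML fragment: all elements are in a default namespace.
$dom->loadXML(
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'.
    '<ul><li>a</li><li>b</li></ul>'.
    '</body></html>'
);
$xpath = new DOMXPath($dom);

// A plain name test looks for li in no namespace and finds nothing.
$byName = $xpath->evaluate('count(descendant-or-self::li)');

// Comparing the local name ignores the namespace, like a CSS selector.
$byLocalName = $xpath->evaluate(
    'count(descendant-or-self::*[local-name() = "li"])'
);

var_dump($byName, $byLocalName);
```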
Thanks
●   Web:
    ●   http://www.papaya-cms.com/
    ●   http://www.a-basketful-of-papayas.net/
●   Twitter
    ●   @ThomasWeinert
●   Joind.in
    ●   http://joind.in/1621
