QueryPath
For Web Services and Mash-ups
• Who created it?
• What does it do?
• Where is it being used?
• Why use it?
• How is it used?
Matt Butcher
 Maintainer of QueryPath and
the Drupal QueryPath module

    http://technosophos.com
http://twitter.com/technosophos
2009
2004
What is QueryPath?


QueryPath is a library for working with
      HTML and XML in PHP.
    It is like jQuery for the Server.
Why Another XML
    Library?
The Task
    Find all <a></a> tags
    that have a <div></div>
    ancestor
The DOM API
•   At least seventeen
    classes, each with dozens
    of methods.

•   Complex data structure.

•   Checkered history in
    PHP.

•   VERY powerful.
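
As a rough illustration of the task using only PHP's built-in DOM classes, here is a minimal sketch. The sample markup and the $found variable are assumptions made for this example, not part of the original slides.

```php
<?php
// Sample markup (an assumption for illustration):
// two links inside a <div>, one outside.
$html = '<html><body>'
      . '<div><p><a href="/in-1">One</a></p><a href="/in-2">Two</a></div>'
      . '<a href="/out">Three</a>'
      . '</body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);

$found = array();
foreach ($doc->getElementsByTagName('a') as $a) {
  // Walk up the tree looking for a <div> ancestor.
  for ($node = $a->parentNode; $node !== null; $node = $node->parentNode) {
    if ($node instanceof DOMElement && $node->tagName === 'div') {
      $found[] = $a->getAttribute('href');
      break;
    }
  }
}

print implode("\n", $found) . "\n";
```

With a CSS selector engine such as the one QueryPath provides, the same query would presumably reduce to the selector 'div a'.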
SimpleXML

•   One class, about a dozen
    functions and methods.

•   Turns XML into basic
    objects.

•   Makes easy stuff easy,
    and makes everything
    else insanely hard.
    Same size as the DOM example
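
One possible SimpleXML version of the same task is sketched below. It leans on SimpleXML's xpath() method, and the sample markup is again an assumption for illustration. Note that SimpleXML needs well-formed XML, so it may not cope with the kind of non-standard HTML that DOM's loadHTML() tolerates.

```php
<?php
// Sample markup (an assumption for illustration). Must be well-formed XML.
$xml = '<html><body>'
     . '<div><p><a href="/in-1">One</a></p><a href="/in-2">Two</a></div>'
     . '<a href="/out">Three</a>'
     . '</body></html>';

$root = simplexml_load_string($xml);

$found = array();
// The XPath expression //div//a matches every <a> that has a <div> ancestor.
foreach ($root->xpath('//div//a') as $a) {
  // Attributes are read with array syntax and cast to plain strings.
  $found[] = (string) $a['href'];
}

print implode("\n", $found) . "\n";
```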
SAX / Expat
               Same size,
             fewer features


•   Event based.

•   Write your own parser
    handler.

•   One per XML format.

•   Only reliable “legacy”
    library.
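
To show what "write your own parser handler" can look like, here is a hedged sketch of the same task with PHP's Expat-based XML parser functions. The sample markup, the handler closures, and the $divDepth counter are assumptions for this example. The handlers track how many <div> elements are open and record each <a> seen while that count is above zero.

```php
<?php
// Sample markup (an assumption for illustration).
$xml = '<html><body>'
     . '<div><p><a href="/in-1">One</a></p><a href="/in-2">Two</a></div>'
     . '<a href="/out">Three</a>'
     . '</body></html>';

$divDepth = 0;     // How many <div> elements are currently open.
$found = array();  // href values of <a> tags seen inside a <div>.

// Called at every opening tag.
$start = function ($parser, $name, $attrs) use (&$divDepth, &$found) {
  if ($name === 'div') {
    $divDepth++;
  } elseif ($name === 'a' && $divDepth > 0) {
    $found[] = isset($attrs['href']) ? $attrs['href'] : '';
  }
};

// Called at every closing tag.
$end = function ($parser, $name) use (&$divDepth) {
  if ($name === 'div') {
    $divDepth--;
  }
};

$parser = xml_parser_create();
// Names are upper-cased by default; turn that off so 'div' and 'a' match.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
xml_set_element_handler($parser, $start, $end);
xml_parse($parser, $xml, true);

print implode("\n", $found) . "\n";
```

The state tracking is specific to this one question about this one document shape, which is the sense in which an event-based parser needs one handler per XML format.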
QueryPath
•   Compact library.

•   Functions are short and
    mnemonic.
                                    [This space intentionally left blank ]
•   The power of DOM, but
    simpler than SimpleXML

•   (Shhh... don’t tell, but it’s
    built on the DOM.)
Use it to...
•   Import HTML files
•   Read XML files
•   Work with remote web services
•   Manipulate SVG images
•   Retrieve RSS and Atom feed data
•   Create HTML on the fly
•   Retrieve database content and insert it into XML or HTML
•   Examine RDFa data inside HTML or XHTML
•   Run SPARQL queries and retrieve SemWeb content
Where is it Used?
Importing Content
•   6,000+ existing
    documents

•   Fragments of HTML 2,
    3, and 4

•   Much of it non-standard

•   Imported into Drupal
    using BatchAPI, Tidy, and
    QueryPath
Querying Semantic
        Information
 •   Use the OpenAmplify
     web service

 •   Submit node content

 •   Use results to enrich
     page

 •   Build a supermashup
     (seven web services)

http://www.youtube.com/watch?v=GBBKPIva1tM
Gateway to Web
           Services
•   Hundreds of gigabytes of
    data

•   Stored in an external
    Digital Asset
    Management tool

•   XML gateway

•   QueryPath integrates
    Drupal with the DAM
Semantic Network
•   DBpedia is the semantic
    equivalent of Wikipedia

•   Query with SPARQL

•   Return semantically
    oriented XML content

•   QueryPath can query
    and make use of the
    results
Twitter Mash-up
•   Retrieve latest posts
    from Twitter

•   Submit them to
    OpenAmplify for analysis

•   Provide “ratings” and
    sentiment information

•   Not in Drupal


                    http://tweetypants.com
Frameweld Framework
•   Frameweld uses
    QueryPath as part of
    their proprietary
    framework

•   Clean separation of
    presentation and other
    logic

•   QueryPath translates
    data objects into HTML

Frameweld contributes back to QueryPath
Why Use It?
Shorten Difficult Tasks
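 <?php
 require 'QueryPath/QueryPath.php';

 // Read content.xml straight out of the ODT (ZIP) archive.
 $odt = 'zip://o.odt#content.xml';

 // 'text|h' selects the h (heading) elements in the "text" namespace.
 foreach (qp($odt, 'text|h') as $i) {
   print $i->text() . "\n";
 }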

(Prints an outline from an ODT document)
Twitter Search



           Ten Lines of Code
How is it Used?
A Closer Look



‣Object-oriented
‣Operates on files, strings, or streams
‣Uses CSS 3 Selectors
‣Has dozens of methods
The Features
•   Query a document
    •   With XPath
    •   CSS selectors
•   Move around inside the document
•   Modify the document
•   Access local files
•   Access remote data
•   Extensions for...
    •   XSLT, XSD, PI
    •   SQL database access
    •   Templates
Traversing an
HTML or XML
  Document
   There are over a dozen
  functions for traversing a
         document.

These are similar to jQuery’s
    traversal functions.
Manipulating a
 Document
          •   Get and set text,
              elements,
              attributes, etc.

          •   Move, clone,
              delete.

          •   Build arbitrary
              XML or HTML.
In Drupal...

          •   Install the
              QueryPath
              module

          •   Begin using
              QueryPath in
              your modules
Outside of Drupal

            •   Go to
                QueryPath.org

            •   Download
                QueryPath

            •   Use it in your
                applications
Learn More
• IBM DeveloperWorks published an
  introduction to QueryPath:
  http://is.gd/2wHPA
• The full API docs are available at
  http://api.querypath.org
• Learn more about Drupal modules at
  http://drupal.org/project/querypath
