HT2010 Paper Presentation


Providing resilient XPaths for external adaptation engines

Session 3: Adaptation, June 14 3pm
Northrop Frye Hall

Published in: Technology

  1. Providing Resilient XPaths for External Adaptation Engines
     Iñaki Paz
     LKS, S. Coop.
     ONEKIN Research Group – UPV/EHU
     Donostia - San Sebastián, Spain
     June 14th, 2010
  2. Index (current section: Introduction)
     • Introduction
     • XPath expressions to select contents
     • Web pages get changed!!!!
         • In Space
         • In Time
     • Evaluation
     • Conclusions
  3. Adaptation-aware Web Applications
     Architecture (diagram labels): Browser, Server, HTTP URL + Params, Content, Adaptation Rules
     • Depending on user profile and context, the Web application reacts by executing adaptation rules that provide personalized content.
     • Rules "kind of" CONFIGURE the ADAPTATION: they address what is adapted and how, based on user profile and context.
  4. Adaptation Aware Applications
     • Adaptation cases / rules are foreseen during application development.
     • New, unforeseen adaptation needs may appear over time.
     • Possible new adaptation needs:
         • New interaction protocol (FTP) to handle application docs.
         • New communication language (RSS) to present data.
         • Provide a RESTful interface to application concepts.
         • New data filters on searches for a given user.
         • Add external mashups related to certain content.
  5. Adaptation as an Application Layer
     • The Adaptation Layer can be inside the application
         • May access the application's business logic and APIs
         • Complex adaptations
     • The Adaptation Layer can be EXTERNAL to the application
         • The Adaptation Layer works like any other browser (HTTP + HTML)
         • More flexible: adaptation is FULLY independent of the application
     Architecture (diagram labels): Browser, Adaptation Layer, Application Layer, Adaptation Rules, Content, Adapted Content, Protocol, HTTP / HTML?
  6. External Adaptation
     Architecture (diagram labels): Browser, Adaptation Layer, Application Layer, HTTP / HTML, Content (HTML Pages), Adapted Communication Protocol, Adapted Content, HTTP / HTML?
     • Web Page => RSS, Google Gadget
     • GreaseMonkey Scripts
         • JS scripts for the browser to personalize the app.
  7. External Adaptation
     • Adaptation rules need to specify WHICH elements on the page the adaptation affects.
     • Distinct technologies are available to select elements on pages:
         • Text Patterns
         • Regular Expressions
         • Complex Expression Languages
     • This work focuses on XPath
         • Most browsers support the DOM Level 3 XPath specification
         • It is easy to transform HTML to XHTML (e.g. JTidy)
     Diagram labels: Application Layer, Content (HTML Pages), Adaptation Layer
  8. Index (current section: XPath to Select Contents)
  9. External Adaptation
     • XPath is a language to select nodes in XML documents.
     • XPath is based on the TREE structure of documents.
     Example:
         /html/body[1]/table[2]/tr[1]/td[3]/table[1]/tr[1]/td[2]/table[3]/tr[4]
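
A minimal sketch of how such a positional path walks the document tree, using Python's standard `xml.etree.ElementTree`. The miniature page is hypothetical and not from the talk. ElementTree implements only a subset of XPath, and its paths are relative to the root element, so the leading `/html` is written as `.`.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature XHTML page (an assumption, for illustration only).
PAGE = """
<html><body>
  <table><tr><td>menu</td></tr></table>
  <table>
    <tr><td>row 1</td></tr>
    <tr><td>row 2</td></tr>
  </table>
</body></html>
""".strip()

root = ET.fromstring(PAGE)  # root is the <html> element

# Equivalent of /html/body[1]/table[2]/tr[2]: every step names a tag and a
# 1-based position among the siblings that share that tag.
row = root.find("./body[1]/table[2]/tr[2]")
cell_text = row.find("td").text
print(cell_text)
```

Each step narrows the selection to exactly one child, which is why a single inserted sibling anywhere along the path can redirect the whole expression.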
  10. Web App Pages Change!!!
      /html/body[1]/table[2]/tr[1]/td[3]/table[1]/tr[1]/td[2]/table[3]/tr[4]
      /html/body[1]/table[2]/tr[1]/td[3]/table[1]/tr[1]/td[2]/table[3]/tr[6]
      If the page changes, the wanted element may no longer be selected correctly.
  11. Web App Pages Change!!!
      /html/body[1]/table[2]/tr[1]/td[3]/table[1]/tr[1]/td[2]/table[3]/tr[4]
      /html/body[1]/table[2]/tr[1]/td[3]/table[1]/tr[1]/td[2]/table[3]/tr[6]
      If the page changes, the wanted element may no longer be selected correctly.
      OUR OBJECTIVE IS TO OBTAIN CHANGE-RESILIENT XPATH EXPRESSIONS
  12. Web App Pages Change!!!
      • Given the XPaths:
          /html/body[1]/table[2]/tr[1]/td[3]/table[1]/tr[1]/td[2]/table[3]/tr[4]
          /html/body[1]/table[2]/tr[1]/td[3]/table[1]/tr[1]/td[2]/table[3]/tr[6]
      • the XPath:
          //table[@cellpadding='2']/tr[count(*)=1]
      • would select the same elements.
      • Notice that this XPath characterizes the banner as those ROWS with only ONE column in a table whose cellpadding is '2'.
      • Obtaining these XPath expressions by hand is cumbersome and error-prone. A tool has been developed to obtain a node's absolute XPath expression and then generate an optimized XPath.
      • Firefox plugins like XPather or XPath Checker (among others) make it possible to obtain a node's absolute XPath.
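
A hedged sketch of why the attribute-based expression survives a change that breaks the positional one. The page builder below is hypothetical. Because the standard-library ElementTree does not support `count()`, the predicate `tr[count(*)=1]` is emulated in plain Python with `len(tr) == 1`.

```python
import xml.etree.ElementTree as ET

def page(data_rows):
    """Build a hypothetical page: some two-column rows, then a one-column banner row."""
    rows = "".join("<tr><td>a</td><td>b</td></tr>" for _ in range(data_rows))
    return ET.fromstring(
        "<html><body><table cellpadding='2'>"
        + rows
        + "<tr><td>BANNER</td></tr></table></body></html>"
    )

def banner_rows(root):
    """Emulate //table[@cellpadding='2']/tr[count(*)=1]."""
    rows = root.findall(".//table[@cellpadding='2']/tr")
    return [tr for tr in rows if len(tr) == 1]  # len(tr) counts child elements

old, new = page(3), page(5)          # the banner moves from tr[4] to tr[6]
absolute = "./body[1]/table[1]/tr[4]"

print(old.find(absolute)[0].text)    # positional path on the old page
print(new.find(absolute)[0].text)    # same path after two rows were added
print(banner_rows(new)[0][0].text)   # attribute/shape-based selection
```

The positional path still returns a node on the changed page, just not the intended one, which makes this kind of failure easy to miss.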
  13. Web App Pages are different!!!
      • Distinct pages => distinct structure, distinct contents => distinct XPaths
      • XPaths are patterns to be applied over a pageClass set.
      • Page Class = the SET of pages that describe the same type of information and have a similar page structure.
  14. Index (current section: Web Pages get changed!!!!)
  15. Variability in Space
      • Variability in Space denotes the distinct running versions of a given page that are accessible at a given time.
      • Web application pages change their contents!!!
          • Different searches provide different results
          • Information expiry
          • Introduction of adverts
          • User and context adaptations the application is aware of
      • An XPath that works on one page of a given class may not work on another page of the same class.
      • An XPath robust to those changes needs to be induced from a pageClass set containing most of the page variants.
  16. XPath Induction
      /html/body[1]/table[2]/tr[1]/td[3]/table[1]/tr[1]/td[2]/table[3]/tr[3]
      /html/body[1]/table[2]/tr[1]/td[3]/table[1]/tr[1]/td[2]/table[3]/tr[6]
      Each STEP in an absolute XPath selects one and only one ELEMENT.
  17. Induction: Differences on Paths
      • 3 main difference types may be found:
          Position:
              /a[n]/b[m]/c[o]
              /a[n]/b[m]/c[p]
              ----------------------
              /a[n]/b[m]/c[conds]
          Node (e.g. div vs. span):
              /a[n]/b/c[m]
              /a[n]/d/c[m]
              ----------------------
              /a[n]/*[conds]/c[m]
          Depth:
              /a[n]/b[m]/c[o]
              /a[n]/d/b[m]/c[o]
              ----------------------
              /a[n]//b[conds]/c[o]
  18. Induction: Differences on Paths
      • These types may appear combined:
          Position & node combination:
              /a[n]/b[m]/c[o]
              /a[n]/d[m]/c[p]
              ----------------------
              /a[n]/*[conds]/c[conds]
      • Sample with 2 position differences:
              /html/body[1]/table[2]/tr[1]/…/table[3]/tr[7]
              /html/body[1]/table[2]/tr[1]/…/table[2]/tr[10]
              ----------------------
              /html/body[1]/table[2]/tr[1]/…/table[@width='100%'][@border='0'][@cellpadding='2'][@cellspacing='0'][tr]/tr[count(*)=1][count(td)=1]
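
The position and node difference types, and their combination, can be sketched as a step-by-step comparison of two absolute paths. This is an illustrative simplification, not the tool described in the talk: the function names are hypothetical, `[conds]` is left as a placeholder exactly as on the slides (the real conditions would be derived from the sampled nodes), and depth differences are deliberately not handled.

```python
import re

# One step of a simple absolute path: a tag (or *) with an optional position.
STEP = re.compile(r"([\w*]+)(?:\[(\d+)\])?")

def parse(xpath):
    """Split an absolute XPath into (tag, position) pairs; position may be None."""
    steps = []
    for raw in xpath.strip("/").split("/"):
        match = STEP.fullmatch(raw)
        if match is None:
            raise ValueError(f"unsupported step: {raw!r}")
        tag, pos = match.groups()
        steps.append((tag, int(pos) if pos else None))
    return steps

def generalize(xpath_a, xpath_b):
    """Merge two same-depth paths, marking differing steps with [conds]."""
    a, b = parse(xpath_a), parse(xpath_b)
    if len(a) != len(b):
        raise ValueError("depth differences are not handled in this sketch")
    out = []
    for (tag_a, pos_a), (tag_b, pos_b) in zip(a, b):
        if tag_a == tag_b and pos_a == pos_b:
            # Identical step: keep it unchanged.
            out.append(tag_a + (f"[{pos_a}]" if pos_a is not None else ""))
        else:
            # Node difference -> wildcard; any difference -> conditions needed.
            tag = tag_a if tag_a == tag_b else "*"
            out.append(f"{tag}[conds]")
    return "/" + "/".join(out)

print(generalize("/a[1]/b[2]/c[3]", "/a[1]/b[2]/c[5]"))
print(generalize("/a[1]/b/c[2]", "/a[1]/d/c[2]"))
print(generalize("/a[1]/b[2]/c[3]", "/a[1]/d[2]/c[4]"))
```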
  19. Induction Algorithm
      • LOOP on XPaths, resolving differences not yet considered
      • Problems:
          • /…/table[@class]
          • /…/table[@style]
      • Induction provides an XPath that works on all the samples; it does not optimize it.
      • It ends with expressions like:
          /html/body[1]/table[2]/tr[1]/…/table[@width='100%'][@border='0'][@cellpadding='2'][@cellspacing='0'][tr]/tr[count(*)=1][count(td)=1]
  20. Web Pages Evolve in time!!!
      • What is the problem?
          • XPath is based on structure.
          • Small changes may affect structure.
      • Solution:
          • Remove as much structural information as possible while keeping equivalence with the original XPath.
  21. Web Pages Evolve in time!!!
      • Definition:
          • Two XPaths are equivalent if they recover the same nodes. [Miklau 2004] have demonstrated that this problem is NP-Complete for a subset of XPath.
      • Definition:
          • An XPath is resilient to change C if the set of recovered nodes is independent of making change C or not.
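
Both definitions can be approximated empirically on sample pages, which avoids the general equivalence problem. The sketch below is an assumption-laden illustration: the page, the `class='banner'` attribute, and the `add_row` change are hypothetical, and ElementTree's XPath subset stands in for a full XPath engine.

```python
import xml.etree.ElementTree as ET

def make_page():
    """Hypothetical sample page: one content row followed by a banner row."""
    return ET.fromstring(
        "<html><body><table>"
        "<tr><td>x</td></tr>"
        "<tr class='banner'><td>AD</td></tr>"
        "</table></body></html>"
    )

def equivalent_on(pages, path_a, path_b):
    """True if both paths recover the same elements on every sample page."""
    return all(root.findall(path_a) == root.findall(path_b) for root in pages)

def resilient_to(root, path, change):
    """True if the recovered elements are the same before and after the change."""
    before = root.findall(path)
    change(root)                      # mutates the tree in place
    return before == root.findall(path)

def add_row(root):
    """Hypothetical change C: insert a new first row into the table."""
    root.find("./body/table").insert(0, ET.Element("tr"))

positional = "./body/table/tr[2]"
descriptive = ".//tr[@class='banner']"

print(equivalent_on([make_page()], positional, descriptive))
print(resilient_to(make_page(), positional, add_row))
print(resilient_to(make_page(), descriptive, add_row))
```

Elements are compared by identity here, so "same nodes" means the very same tree nodes, not merely nodes with equal content. A check on samples can only refute equivalence; it never proves it for unseen pages.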
  22. Web Pages Evolve in time!!!
      • An example:
          • Which XPath seems more robust?
          • /html/body/table/tr/td/span
          • /html//span
      • The optimum for one change may not be the optimum for another. But the probability of being affected by a change IS different.
  23. Simulated Annealing
      • A generic probabilistic heuristic approach for global optimization problems.
      • Iteration starting from a solution:
          • Get a new valid neighbor solution (RANDOM)
          • Test whether the new solution improves on the old one, based on an energy calculation function
          • Otherwise, check whether the solution is accepted probabilistically (RANDOM)
          • Iterate until the solution is good enough or the computation budget has been exhausted
      • Simulated annealing has been used with this function:
          F(XPath) = a * steps + b * wildcards + c * conditions
  24. Simulated Annealing
      • Selecting a neighbor solution:
          • Solutions are obtained by modifying one XPath step.
          • The solution resulting from the modification must be equivalent (select the same nodes). This is checked during SA execution.
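
The loop and the neighbor selection can be sketched as follows. This is a toy version under stated assumptions, not the implementation used in the talk: paths are lists of `(axis, name)` pairs, the two neighbor moves (drop a step, or turn a step into `*`), the weights, the cooling schedule, and the sample pages are all hypothetical, and equivalence is only checked on the sample pages using ElementTree's XPath subset.

```python
import math
import random
import xml.etree.ElementTree as ET

def to_string(steps):
    """Render [(axis, name), ...] as an ElementTree path relative to the root."""
    return "." + "".join(axis + name for axis, name in steps)

def energy(steps, a=1.0, b=2.0):
    """F = a * steps + b * wildcards (weights are assumptions)."""
    wildcards = sum(1 for _, name in steps if name == "*")
    return a * (len(steps) - wildcards) + b * wildcards

def neighbor(steps, rng):
    """Modify one random step: drop it, or replace its tag with a wildcard."""
    i = rng.randrange(len(steps))
    out = list(steps)
    if rng.random() < 0.5 and i < len(steps) - 1:
        del out[i]
        out[i] = ("//", out[i][1])    # the next step becomes a descendant search
    else:
        out[i] = (out[i][0], "*")
    return out

def anneal(pages, start, energy, iterations=200, temp=1.0, cooling=0.95, seed=0):
    rng = random.Random(seed)
    targets = [root.findall(to_string(start)) for root in pages]
    current = best = start
    for _ in range(iterations):
        cand = neighbor(current, rng)
        # A neighbor is valid only if it selects the same nodes on all samples.
        if [root.findall(to_string(cand)) for root in pages] != targets:
            continue
        delta = energy(cand) - energy(current)
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current = cand
            if energy(current) < energy(best):
                best = current
        temp *= cooling
    return best

# Hypothetical sample pages sharing one structure.
pages = [
    ET.fromstring(f"<html><body><table><tr><td><span>{t}</span></td></tr>"
                  "</table></body></html>")
    for t in ("x", "y")
]
start = [("/", "body"), ("/", "table"), ("/", "tr"), ("/", "td"), ("/", "span")]
best = anneal(pages, start, energy)
print(to_string(start), "->", to_string(best))
```

Because a wildcard costs more than a named step with these weights, the wildcard move is an uphill move that is only accepted probabilistically, which is what distinguishes the loop from plain greedy descent.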
  25. Simulated Annealing
      • How to characterize an XPath?
      • Parts of an XPath:
          • Steps (/table): FIX a structure element on the path
          • Wildcards (/*): FIX an undetermined structure element on the path
          • Conditions: FIX a condition over an element's attribute
      • Conditions:
          • Style (@width) vs. description (@class, @id, @alt)
          • Change likelihood vs. condition singularity
      • Energy function characterization:
          F(XPath) = a * steps + b * wildcards + c * styleConds + d * descrConds
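
A minimal sketch of how this energy function could be computed from an XPath string. Several points are assumptions rather than facts from the talk: the weights default to 1.0 because no values are given, wildcards are counted separately from named steps, only `@width` (style) and `@class`, `@id`, `@alt` (description) are named on the slide while the other style attributes are illustrative, and nested brackets are not handled.

```python
import re

# Attribute groups: only @width, @class, @id and @alt come from the slide;
# the remaining style attributes are an assumption for illustration.
STYLE_ATTRS = {"width", "border", "cellpadding", "cellspacing"}
DESCR_ATTRS = {"class", "id", "alt"}

def energy(xpath, a=1.0, b=1.0, c=1.0, d=1.0):
    """F(XPath) = a*steps + b*wildcards + c*styleConds + d*descrConds."""
    predicates = re.findall(r"\[([^\]]*)\]", xpath)
    bare = re.sub(r"\[[^\]]*\]", "", xpath)        # path without predicates
    names = [step for step in bare.split("/") if step]
    wildcards = names.count("*")
    steps = len(names) - wildcards
    attrs = [name for p in predicates for name in re.findall(r"@([\w-]+)", p)]
    style = sum(1 for name in attrs if name in STYLE_ATTRS)
    descr = sum(1 for name in attrs if name in DESCR_ATTRS)
    return a * steps + b * wildcards + c * style + d * descr

print(energy("/html/body/table/tr/td/span"))
print(energy("/html//span"))
print(energy("//table[@cellpadding='2']/*[@class='x']", a=1, b=10, c=100, d=1000))
```

With distinct weights, as in the last call, each digit of the result shows one component of the sum, which is convenient when tuning a, b, c and d.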
  26. Simulated Annealing
      • Sample on CarSearch
          • Area to be adapted: BANNERS
  27. Simulated Annealing
      • Sample on CarSearch
          • Area to be adapted: BANNERS
      Note that optimized XPaths somehow determine WHAT characterizes the selection on the document.
  28. Index (current section: Evaluation)
  29. Evaluation
      • How to obtain the page evolution of a Web app?
          • Select apps and watch whether and how they change
          • Consult web site home pages.
      • Tests:
          • One page every 10 days.
          • All pages analyzed for changes.
          • Changes => milestones
          • 2 or 3 different pages between milestones are used to generate the XPath
          • The XPath is tested with pages AFTER the milestone.
  30. Evaluation
      • Changes evaluated as:
          • Minor: small changes in esthetics and basic structure (e.g. add rows to a table)
          • Major: app redesign, new layout, etc.
      • Results:
          • 90% of XPaths were resilient to Minor Changes
          • 10% of XPaths were resilient to Major Changes
      • Conclusion:
          • The approach works for evolutionary changes, not revolutionary ones
  31. Index (current section: Conclusions)
  32. Conclusions
      • External adaptation tools have appeared.
      • They require selection patterns, such as XPath.
      • Pattern resilience to Web app changes is important.
      • Induction and SA techniques have been applied.
      • Further language-specific knowledge (e.g. a table always contains rows and columns) should be taken into account in the energy function.
  33. Contact: Iñaki Paz [email_address]