HT2010 Paper Presentation

Providing resilient XPaths for external adaptation engines

Session 3: Adaptation, June 14 3pm
Northrop Frye Hall

Published in: Technology
Transcript

  • 1. Providing Resilient XPaths for External Adaptation Engines
    • Iñaki Paz
    • LKS, S. Coop.
    • ONEKIN Research Group – UPV/EHU
    • Donostia - San Sebastián, Spain
    • June 14th, 2010
  • 2. Index
    • Introduction
    • XPath Expressions to select contents
    • Web pages get changed!!!!
      • In Space
      • In Time
    • Evaluation
    • Conclusions
    Introduction
  • 3. Adaptation-Aware Web Applications
    • Architecture (diagram labels): Browser, HTTP URL + Params, Server, Content, Adaptation Rules
    • Depending on the user profile and context, the Web application reacts by executing adaptation rules that provide personalized content.
    • Rules "kind of" CONFIGURE the ADAPTATION.
    • Rules address what is adapted and how, based on the user profile and context.
  • 4. Adaptation Aware Applications
    • Adaptation cases / rules are foreseen at application development time
    • New, unforeseen adaptation needs may appear over time
    • Possible new adaptation needs:
      • New interaction protocol (FTP) to handle application documents.
      • New communication language (RSS) to present data.
      • Provide a RESTful interface to application concepts.
      • New data filters on searches for a given user.
      • Add external mashups related to certain content.
  • 5. Adaptation as an Application Layer
    • The adaptation layer can be inside the application
      • May access the application’s business logic and APIs
      • Complex adaptations
    • The adaptation layer can be EXTERNAL to the application
      • The adaptation layer works like any other browser (HTTP + HTML)
      • More flexible: adaptation is FULLY independent of the application
    • Architecture (diagram labels): Application Layer, Content, Adaptation Layer, Adaptation Rules, Protocol, HTTP / HTML?, Adapted Content, Browser
  • 6. External Adaptation
    • Architecture (diagram labels): Application Layer, Content (HTML Pages), HTTP / HTML, Adaptation Layer, Adapted Communication Protocol, Adapted Content, HTTP / HTML?, Browser
      • http://www.dapper.net/open/
        • Web Page => RSS, Google Gadget
      • GreaseMonkey Scripts
        • JS scripts for the browser to personalize the application.
  • 7. External Adaptation
    • Adaptation rules need to specify WHICH elements on the page the adaptation affects.
    • Distinct technologies are available to select elements on pages:
      • Text Patterns
      • Regular Expressions
      • Complex Expression Languages
    • This work focuses on XPath
      • Most browsers support the DOM Level 3 XPath specification
      • It is easy to transform HTML to XHTML (e.g. with JTidy)
    • Diagram labels: Application Layer, Content (HTML Pages), Adaptation Layer
  • 8. Index
    • Introduction
    • XPath expressions to select contents
    • Web pages get changed!!!!
      • In Space
      • In Time
    • Evaluation
    • Conclusions
    XPath to Select Contents
  • 9. External Adaptation
    • XPath is a language to select nodes in XML documents
    • XPath is based on the TREE structure of documents
      • /html/body[1]/table[2]/tr[1]/td[3]/table[1]/tr[1]/td[2]/table[3]/tr[4]
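
A minimal sketch of how such an absolute, position-indexed XPath can be derived from a node's place in the document tree. It uses Python's standard `xml.etree.ElementTree`, which implements only a subset of XPath and evaluates paths relative to the root element. The sample page and the helper names `absolute_xpath` and `select` are assumptions made for illustration, not part of the talk.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified page (assumption for illustration).
PAGE = """<html><body>
<table><tr><td>header</td></tr></table>
<table>
  <tr><td>menu</td><td>news</td><td>banner</td></tr>
</table>
</body></html>"""


def absolute_xpath(root, target):
    """Return the absolute, position-indexed XPath of `target` under `root`."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    steps = []
    node = target
    while node is not root:
        parent = parent_of[node]
        # The position counts only siblings with the same tag, as in XPath.
        same_tag = [c for c in parent if c.tag == node.tag]
        steps.append(f"{node.tag}[{same_tag.index(node) + 1}]")
        node = parent
    return "/" + "/".join([root.tag] + steps[::-1])


def select(root, xpath):
    """Evaluate an absolute XPath by rewriting it relative to the root element."""
    return root.findall("." + xpath[len("/" + root.tag):])


root = ET.fromstring(PAGE)
banner = root.findall(".//td")[3]
print(absolute_xpath(root, banner))
```

Each step of the generated path pins down exactly one element, which is why the expression is so sensitive to structural changes.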
  • 10. Web App Pages Change!!!
      • /html/body[1]/table[2]/tr[1]/td[3]/table[1]/tr[1]/td[2]/table[3]/tr[4]
      • /html/body[1]/table[2]/tr[1]/td[3]/table[1]/tr[1]/td[2]/table[3]/tr[6]
    • If the page changes, the wanted element may not be correctly selected
  • 11. Web App Pages Change!!!
      • /html/body[1]/table[2]/tr[1]/td[3]/table[1]/tr[1]/td[2]/table[3]/tr[4]
      • /html/body[1]/table[2]/tr[1]/td[3]/table[1]/tr[1]/td[2]/table[3]/tr[6]
    • If the page changes, the wanted element may not be correctly selected
    • OUR OBJECTIVE IS TO OBTAIN CHANGE-RESILIENT XPATH EXPRESSIONS
  • 12. Web App Pages Change!!!
    • Given the XPaths:
      • /html/body[1]/table[2]/tr[1]/td[3]/table[1]/tr[1]/td[2]/table[3]/tr[4]
      • /html/body[1]/table[2]/tr[1]/td[3]/table[1]/tr[1]/td[2]/table[3]/tr[6]
    • The XPath:
      • //table[@cellpadding='2']/tr[count(*)=1]
    • would select the same elements.
    • Notice that this XPath characterizes the banner as those ROWS with only ONE column in a table whose cellpadding is '2'
    • Obtaining these XPath expressions by hand is cumbersome and error-prone. A tool has been developed to obtain a node’s absolute XPath expression and then generate an optimized XPath.
    • Firefox plugins like XPather or XPath Checker (among others) make it possible to obtain a node’s absolute XPath.
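
The contrast between an absolute path and the condition-based path can be sketched as follows. This is an illustration under assumptions: the page is a hypothetical, much simplified results table, and because Python's standard `xml.etree.ElementTree` does not support `count()`, the predicate `tr[count(*)=1]` is emulated by filtering rows on their number of children.

```python
import xml.etree.ElementTree as ET

CAR = "<tr><td>car</td><td>price</td></tr>"   # ordinary result row, two columns
BANNER = "<tr><td>banner</td></tr>"           # banner row, a single column


def page(rows):
    """Build a hypothetical page whose only table has cellpadding='2'."""
    body = "".join(rows)
    return ET.fromstring(
        f"<html><body><table cellpadding='2'>{body}</table></body></html>"
    )


def banner_rows(root):
    """Emulate //table[@cellpadding='2']/tr[count(*)=1] with ElementTree."""
    rows = root.findall(".//table[@cellpadding='2']/tr")
    return [tr for tr in rows if len(tr) == 1]


old = page([CAR, BANNER, CAR, CAR])
new = page([CAR, CAR, BANNER, CAR, CAR])   # one more result row on top

ABSOLUTE = "./body/table[1]/tr[2]"          # absolute path captured on `old`

for name, root in (("old", old), ("new", new)):
    by_position = [tr[0].text for tr in root.findall(ABSOLUTE)]
    by_condition = [tr[0].text for tr in banner_rows(root)]
    print(name, by_position, by_condition)
```

On the changed page the positional path lands on an ordinary result row, while the condition-based selection still returns the banner.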
  • 13. Web App Pages are different!!!
      • Distinct Pages => Distinct Structure, Distinct Contents => Distinct XPaths
      • XPaths are patterns to be applied over a pageClass set.
      • Page Class = The SET of pages that describe the same type of information and have a similar page structure.
  • 14. Index
    • Introduction
    • XPath expressions to select contents
    • Web pages get changed!!!!
      • In Space
      • In Time
    • Evaluation
    • Conclusions
    Web Pages get changed!!!!
  • 15. Variability in Space
    • Variability in Space denotes the distinct running versions of a given page that are accessible at a given time.
    • Web application pages change their contents!!!
      • Different searches provide different results
      • Information expiry
      • Introduction of adverts
      • User and context adaptations that the application is already aware of
    • An XPath that works on one page of a given class may not work on another page of the same class
    • We need to induce an XPath that is robust to those changes from a pageClass set containing most of the page variants
  • 16. XPath Induction
      • /html/body[1]/table[2]/tr[1]/td[3]/table[1]/tr[1]/td[2]/table[3]/tr[3]
      • /html/body[1]/table[2]/tr[1]/td[3]/table[1]/tr[1]/td[2]/table[3]/tr[6]
    • Each STEP in an absolute XPath selects one and only one ELEMENT
  • 17. Induction: Differences on Paths
    • Three main difference types may be found
    • Position
      • /a[n]/b[m]/c[o]
      • /a[n]/b[m]/c[p]
      • Result: /a[n]/b[m]/c[conds]
    • Node (e.g. div vs. span)
      • /a[n]/b/c[m]
      • /a[n]/d/c[m]
      • Result: /a[n]/*[conds]/c[m]
    • Depth
      • /a[n]/b[m]/c[o]
      • /a[n]/d/b[m]/c[o]
      • Result: /a[n]//b[conds]/c[o]
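
The three difference types can be sketched as a small merge function over two absolute XPaths. This is a simplified illustration, not the talk's induction algorithm: the function name `merge` is hypothetical, `[conds]` is left as a literal placeholder for conditions to be induced later, and a depth difference is only handled when the two paths share a common tail.

```python
import re

STEP = re.compile(r"^(\w+|\*)(?:\[(\d+)\])?$")


def parse(xpath):
    """Split an absolute XPath into (tag, position) steps; position may be None."""
    steps = []
    for raw in xpath.strip("/").split("/"):
        match = STEP.match(raw)
        if match is None:
            raise ValueError(f"unsupported step: {raw!r}")
        tag, pos = match.groups()
        steps.append((tag, int(pos) if pos else None))
    return steps


def fmt(step):
    tag, pos = step
    return f"{tag}[{pos}]" if pos else tag


def merge(xpath_a, xpath_b):
    """Generalize two absolute XPaths, marking differing steps with [conds]."""
    a, b = parse(xpath_a), parse(xpath_b)
    if len(a) == len(b):
        out = []
        for s, t in zip(a, b):
            if s == t:
                out.append(fmt(s))             # identical step: keep it
            elif s[0] == t[0]:
                out.append(f"{s[0]}[conds]")   # Position difference
            else:
                out.append("*[conds]")         # Node difference
        return "/" + "/".join(out)
    # Depth difference: keep the common head, jump to the common tail with //.
    i = 0
    while i < min(len(a), len(b)) and a[i] == b[i]:
        i += 1
    rest_a, rest_b = a[i:], b[i:]
    j = 0
    while j < min(len(rest_a), len(rest_b)) and rest_a[-1 - j] == rest_b[-1 - j]:
        j += 1
    if j == 0:
        raise ValueError("paths share no common tail")
    tail = rest_a[len(rest_a) - j:]
    head = "".join("/" + fmt(s) for s in a[:i])
    rest = "".join("/" + fmt(s) for s in tail[1:])
    return f"{head}//{tail[0][0]}[conds]{rest}"


print(merge("/a[1]/b[2]/c[3]", "/a[1]/b[2]/c[5]"))
print(merge("/a[1]/b/c[2]", "/a[1]/d/c[2]"))
print(merge("/a[1]/b[2]/c[3]", "/a[1]/d/b[2]/c[3]"))
```

Because equal-length paths are compared step by step, combined position and node differences fall out of the same loop.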
  • 18. Induction: Differences on Paths
    • These types may appear combined
    • Position & node combination
      • /a[n]/b[m]/c[o]
      • /a[n]/d[m]/c[p]
      • Result: /a[n]/*[conds]/c[conds]
    • Sample on http://www.carsearch.com: two Position differences
      • /html/body[1]/table[2]/tr[1]/…/table[3]/tr[7]
      • /html/body[1]/table[2]/tr[1]/…/table[2]/tr[10]
      • Result: /html/body[1]/table[2]/tr[1]/…/table[@width='100%'][@border='0'][@cellpadding='2'][@cellspacing='0'][tr]/tr[count(*)=1][count(td)=1]
  • 19. Induction Algorithm
    • LOOP over the XPaths, resolving the differences not yet considered
    • Problems:
      • /…/table[@class]
      • /…/table[@style]
    • Induction provides an XPath that works on all the samples, but does not optimize it
    • It ends with expressions like:
      • /html/body[1]/table[2]/tr[1]/…/table[@width='100%'][@border='0'][@cellpadding='2'][@cellspacing='0'][tr]/tr[count(*)=1][count(td)=1]
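
One plausible way to fill in a `[conds]` placeholder is to keep every attribute/value pair that all sample nodes share, plus their child count when it is identical. The sketch below assumes exactly that; the slides do not spell out how conditions are derived, so the function `shared_conditions` and the sample tables are hypothetical.

```python
import xml.etree.ElementTree as ET


def shared_conditions(nodes):
    """Build XPath predicates from what all sample nodes have in common."""
    common = set(nodes[0].attrib.items())
    for node in nodes[1:]:
        common &= set(node.attrib.items())
    conds = "".join(f"[@{name}='{value}']" for name, value in sorted(common))
    # Add a child-count predicate only when every sample agrees on it.
    child_counts = {len(node) for node in nodes}
    if len(child_counts) == 1:
        conds += f"[count(*)={child_counts.pop()}]"
    return conds


# Two hypothetical sample nodes that should be matched by the same step.
samples = [
    ET.fromstring("<table width='100%' border='0' class='x'><tr/></table>"),
    ET.fromstring("<table width='100%' border='0' class='y'><tr/></table>"),
]
print("table" + shared_conditions(samples))
```

A complete induction step would also have to verify that the resulting predicates exclude the nodes that must not be selected; that check is omitted here.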
  • 20. Web Pages Evolve in Time!!!
    • What is the problem?
      • XPath is based on structure.
      • Small changes may affect the structure.
    • Solution:
      • Remove as much structural information as possible while keeping equivalence with the original XPath.
  • 21. Web Pages Evolve in Time!!!
    • Definition:
      • Two XPaths are equivalent if they retrieve the same nodes. [Miklau 2004] demonstrated that this problem is NP-Complete for a subset of XPath.
    • Definition:
      • An XPath is resilient to a change C if the set of retrieved nodes is the same whether or not change C is made.
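
Both definitions can be checked empirically on sample pages, which sidesteps the general equivalence problem. The sketch below is an illustration under assumptions: paths are written in the XPath subset that Python's `xml.etree.ElementTree` understands (relative to the root element), a change is modelled as a function that edits the tree in place, and the names `equivalent_on`, `is_resilient` and `add_row_on_top` are hypothetical.

```python
import xml.etree.ElementTree as ET


def same_nodes(a, b):
    """True if both selections contain the very same element objects, in order."""
    return len(a) == len(b) and all(x is y for x, y in zip(a, b))


def equivalent_on(pages, path_a, path_b):
    """Empirical equivalence: both paths select the same nodes on every sample."""
    for source in pages:
        root = ET.fromstring(source)
        if not same_nodes(root.findall(path_a), root.findall(path_b)):
            return False
    return True


def is_resilient(source, path, change):
    """True if `path` selects the same nodes before and after applying `change`."""
    root = ET.fromstring(source)
    before = root.findall(path)
    change(root)                      # the change edits the tree in place
    return same_nodes(before, root.findall(path))


PAGE = ("<html><body><table><tr><td>news</td></tr>"
        "<tr><td><span class='price'>10</span></td></tr></table></body></html>")


def add_row_on_top(root):
    """Change C: a new first row is inserted into the table."""
    root.find("./body/table").insert(0, ET.fromstring("<tr><td>ad</td></tr>"))


STRUCTURAL = "./body/table/tr[2]/td/span"
DESCRIPTIVE = ".//span[@class='price']"
print(equivalent_on([PAGE], STRUCTURAL, DESCRIPTIVE))
print(is_resilient(PAGE, STRUCTURAL, add_row_on_top))
print(is_resilient(PAGE, DESCRIPTIVE, add_row_on_top))
```

The two paths are equivalent on the sample, yet only the one carrying less structural information survives the inserted row.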
  • 22. Web Pages Evolve in Time!!!
    • An Example:
      • Which XPath seems more robust?
      • /html/body/table/tr/td/span
      • /html//span
    • The optimum for one change may not be the optimum for another. But the probability of being affected by a change IS different.
  • 23. Simulated Annealing
    • A generic probabilistic heuristic approach for global optimization problems.
    • Iteration starting from a solution:
      • Get a new valid neighbor solution (RANDOM)
      • Test whether the new solution improves on the old one, based on an energy function
      • Otherwise, accept the new solution with a certain probability (RANDOM)
      • Iterate until the solution is good enough or the computation budget has been exhausted
    • Simulated annealing has been used with this function:
    • F(XPath) = a * (no. of steps) + b * (no. of wildcards) + c * (no. of conditions)
  • 24. Simulated Annealing
    • Selecting a neighbor solution:
      • Solutions are obtained by modifying one XPath step
      • The solution resulting from the modification must be equivalent (select the same nodes). This is checked during SA execution.
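
The loop described above can be sketched as follows. Everything specific in this sketch is an assumption: the step encoding `(axis, tag, position)`, the three neighbor moves, the weights `a`, `b`, `c`, the linear cooling schedule, and the fact that positional predicates are the only conditions counted. Equivalence is checked empirically against sample pages with Python's `xml.etree.ElementTree`.

```python
import math
import random
import xml.etree.ElementTree as ET


def to_path(steps):
    """Render (axis, tag, position) steps as an ElementTree path."""
    return "." + "".join(
        f"{axis}{tag}" + (f"[{pos}]" if pos else "") for axis, tag, pos in steps
    )


def energy(steps, a=1.0, b=1.5, c=0.5):
    """F = a * steps + b * wildcards + c * conditions (weights are assumptions)."""
    n_steps = sum(1 for _, tag, _ in steps if tag != "*")
    n_wild = sum(1 for _, tag, _ in steps if tag == "*")
    n_conds = sum(1 for _, _, pos in steps if pos)
    return a * n_steps + b * n_wild + c * n_conds


def neighbor(steps, rng):
    """Modify one random step: drop its position, drop it, or make it a wildcard."""
    new = [list(s) for s in steps]
    i = rng.randrange(len(new))
    move = rng.choice(("drop_pos", "drop_step", "wildcard"))
    if move == "drop_pos":
        new[i][2] = None
    elif move == "drop_step" and i < len(new) - 1:
        del new[i]
        new[i][0] = "//"              # the following step becomes a descendant step
    elif move == "wildcard" and new[i][2] is None:
        new[i][1] = "*"
    return [tuple(s) for s in new]


def anneal(roots, start, iterations=500, t0=2.0, seed=0):
    rng = random.Random(seed)
    targets = [root.findall(to_path(start)) for root in roots]
    current = best = list(start)
    for k in range(iterations):
        temperature = t0 * (1 - k / iterations) + 1e-9
        candidate = neighbor(current, rng)
        path = to_path(candidate)
        # Only equivalent candidates (same nodes on every sample) are valid.
        if any(r.findall(path) != t for r, t in zip(roots, targets)):
            continue
        delta = energy(candidate) - energy(current)
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current = candidate
            if energy(current) < energy(best):
                best = current
    return best


page = ET.fromstring(
    "<html><body><table><tr><td>news</td></tr>"
    "<tr><td><span>ad</span></td></tr></table></body></html>"
)
start = [("/", "body", 1), ("/", "table", 1), ("/", "tr", 2),
         ("/", "td", 1), ("/", "span", 1)]
print(to_path(anneal([page], start)))
```

With a single sample page the search can only guarantee equivalence on that page; more samples of the page class make the equivalence check more meaningful.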
  • 25. Simulated Annealing
    • How to characterize an XPath?
    • Parts of an XPath:
      • Steps (/table): FIX a structural element on the path
      • Wildcards (/*): FIX an undetermined structural element on the path
      • Conditions: FIX a condition over an element’s attribute
    • Conditions:
      • Style (@width) vs. description (@class, @id, @alt)
      • Change likelihood vs. condition singularity
    • Energy function characterization:
    • F(xpath) = a*steps + b*wildcards + c*styleConds + d*descrConds
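
A minimal sketch of how the four terms of this energy function could be counted from an XPath string. The attribute sets and the weights are assumptions chosen for illustration (the slide names only `@width` as style and `@class`, `@id`, `@alt` as description), and the regular expressions do not handle nested predicates; predicates without an attribute, such as `[1]` or `[count(*)=1]`, are not counted.

```python
import re

# Assumed classification; only @width, @class, @id and @alt come from the slide.
STYLE_ATTRS = {"width", "height", "border", "cellpadding", "cellspacing"}
DESCR_ATTRS = {"class", "id", "alt"}


def characterize(xpath):
    """Count steps, wildcards, style conditions and descriptive conditions."""
    without_predicates = re.sub(r"\[[^\]]*\]", "", xpath)
    names = [name for name in without_predicates.split("/") if name]
    attrs = re.findall(r"@([\w-]+)", xpath)
    return {
        "steps": sum(1 for n in names if n != "*"),
        "wildcards": sum(1 for n in names if n == "*"),
        "styleConds": sum(1 for a in attrs if a in STYLE_ATTRS),
        "descrConds": sum(1 for a in attrs if a in DESCR_ATTRS),
    }


def energy(xpath, a=1.0, b=1.0, c=2.0, d=0.5):
    """F(xpath) = a*steps + b*wildcards + c*styleConds + d*descrConds."""
    parts = characterize(xpath)
    return (a * parts["steps"] + b * parts["wildcards"]
            + c * parts["styleConds"] + d * parts["descrConds"])


for xp in ("/html/body/table/tr/td/span",
           "/html//span",
           "//table[@cellpadding='2']/tr[count(*)=1]"):
    print(xp, characterize(xp), energy(xp))
```

Weighting style conditions more heavily than descriptive ones reflects the slide's contrast between change likelihood and condition singularity; the concrete numbers would need tuning.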
  • 26. Simulated Annealing
    • Sample on CarSearch
      • Area to be adapted: BANNERS
  • 27. Simulated Annealing
    • Sample on CarSearch
      • Area to be adapted: BANNERS
    • Note that optimized XPaths somehow determine WHAT characterizes the selection in the document
  • 28. Index
    • Introduction
    • XPath Expressions to select contents
    • Web pages get changed!!!!
      • In Space
      • In Time
    • Evaluation
    • Conclusions
    Evaluation
  • 29. Evaluation
    • How to obtain the page evolution of a Web app?
      • Select apps and watch whether and how they change
      • Consult the home pages stored on the archive.org web site.
    • www.yahoo.com || www.elmundo.es
    • Tests:
      • One page every 10 days.
      • All pages analyzed for changes.
      • Changes => milestones
      • 2 or 3 different pages between milestones are used to generate the XPath
      • Tested with pages AFTER the milestone.
  • 30. Evaluation
    • Changes evaluated as:
      • Minor: small changes in esthetics and basic structure (e.g. add rows to table)
      • Major: App redesign, new layout, etc.
    • Results:
      • 90% of XPaths were resilient to Minor Changes
      • 10% of XPaths were resilient to Major Changes
    • Conclusion:
      • The approach works for evolutionary changes, not revolutionary ones
  • 31. Index
    • Introduction
    • XPath Expressions to select contents
    • Web pages get changed!!!!
      • In Space
      • In Time
    • Evaluation
    • Conclusions
    Conclusions
  • 32. Conclusions
    • External adaptation tools have appeared
    • They require selection patterns, such as XPath
    • Pattern resilience to Web app changes is important
    • Application of induction and SA techniques
    • Further language-specific treatment (e.g. a table always contains rows and columns) should be taken into account in the energy function.
  • 33. Contact
    • Iñaki Paz
    • [email_address]
    • http://www.lks.es
    • http://www.onekin.org
