Rewire the Net


A presentation on wrappers and mashup tools, made for the PhD course "Mobile and Context-Aware Database Systems" at Politecnico di Milano, May 2007

Published in: Technology
Rewire the Net

  1. Rewire the Net. Davide Eynard, Dipartimento di Elettronica e Informazione, Politecnico di Milano, 2007/05/30. Mobile, Context Aware Databases and Database Systems
  2. Intro
     ▪ The problem
     ▪ Wrapping vs Mashup
     ▪ Mashup tools and technologies
     ▪ Open problems
     ▪ Conclusions
  3. The problem [figure: the words STRUCTURED and UNSTRUCTURED laid out as a diagram]
  4. What is a wrapper?
  5. What is a wrapper? [diagram labels: Content Provider, Desired Interface]
  6. Is a wrapper enough?
     ▪ A wrapper takes a (usually unstructured) data source and returns information in a desired format
        • All the uninteresting stuff is hidden within it
        • From outside we see only the desired interface
     ▪ What we want to do is work with this information
        • aggregate/filter it
        • use it as input for other services
        • mash it!
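The wrapper-then-mash idea from this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the raw text, the `Offer` record, the `OfferWrapper` class and the two helper functions are all hypothetical and do not come from the presentation.

```python
import re
from dataclasses import dataclass

# Hypothetical unstructured source: free text written for human readers.
RAW_PAGE = """
Today's picks: Espresso machine (EUR 120.50) from ShopA;
Moka pot (EUR 19.90) from ShopB; Grinder (EUR 45.00) from ShopA.
"""


@dataclass(frozen=True)
class Offer:
    """The 'desired format': one structured record per offer."""
    name: str
    price: float
    shop: str


class OfferWrapper:
    """Hides the parsing details; callers only see offers()."""

    # Assumed layout of the text: "<Name> (EUR <price>) from <Shop>".
    _PATTERN = re.compile(r"([A-Z][\w ]*?) \(EUR (\d+\.\d+)\) from (\w+)")

    def __init__(self, raw_text):
        self._raw = raw_text

    def offers(self):
        return [
            Offer(name.strip(), float(price), shop)
            for name, price, shop in self._PATTERN.findall(self._raw)
        ]


def cheaper_than(offers, limit):
    """Filter step: keep only offers below the price limit."""
    return [o for o in offers if o.price < limit]


def cheapest_per_shop(offers):
    """Aggregate step: map each shop to its cheapest offer."""
    best = {}
    for o in offers:
        if o.shop not in best or o.price < best[o.shop].price:
            best[o.shop] = o
    return best


if __name__ == "__main__":
    wrapped = OfferWrapper(RAW_PAGE).offers()
    print(cheaper_than(wrapped, 50.0))
    print(cheapest_per_shop(wrapped))
```

Once the wrapper has produced records, the filter and aggregate steps never touch the raw text, which is the point of hiding "the uninteresting stuff" behind the desired interface.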
  7. An example ... and now?
  8. An example: convert data structures to LaTeX and generate a Sudoku book in PDF
  9. An example: create a Web app which delivers data in a standard format. Create a Java app that runs Sudokus on your mobile. Create another app that solves Sudokus!
  10. What kind of mashup?
     ▪ Imagination is your only limit
        • and... uhm, well... ability
     ▪ Still, most of the mashups around belong to one of the following families:
        • mapping mashups
        • video and photo mashups
        • search and shopping mashups
        • news mashups
  11. Examples
  12. Examples
  13. Examples
  14. Examples
  15. Examples
  16. Examples
  17. Features. Source: “Five Ways to Mix, Rip, and Mash Your Data”, Nick Gonzalez, March 2, 2007
  18. The architecture [diagram elements: several API/Content Providers, an INTERFACE layer, the Client, the Mashup site/service]
  19. The architecture [diagram elements: several API/Content Providers, AJAX, the Client, the Mashup site/service]
  20. AJAX
     ▪ Asynchronous JavaScript and XML
     ▪ It's a Web application model, rather than a specific technology, and comprises several different technologies:
        • XHTML and CSS for style presentation
        • The DOM API exposed by the browser for dynamic display and interaction
        • Asynchronous data exchange (typically XML)
        • Browser-side scripting (typically JavaScript)
  21. Protocols and standards
     ▪ Web protocols
        • SOAP (originally Simple Object Access Protocol)
           − XML message format
           − Message structure: header and body parts
        • REST (Representational State Transfer)
           − Web-based communication using HTTP+XML
           − Few operations: GET, POST, PUT, DELETE, applicable to all pieces of information
     ▪ Syndication formats
        • RSS (v1.0 is RDF based, while 2.0 is not)
        • Atom (more attention to metadata)
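As a small illustration of why syndication formats are convenient for mashups, here is a minimal sketch that reads the items of an RSS 2.0 feed with Python's standard `xml.etree.ElementTree`. The feed content and the `read_items` helper are hypothetical; a real mashup would fetch the feed over HTTP instead of using an inline string.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 document, inlined so the sketch needs no network access.
RSS_SAMPLE = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <link>http://example.org/</link>
    <description>Hypothetical feed used for illustration</description>
    <item><title>First post</title><link>http://example.org/1</link></item>
    <item><title>Second post</title><link>http://example.org/2</link></item>
  </channel>
</rss>"""


def read_items(rss_text):
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    return [
        (item.findtext("title"), item.findtext("link"))
        for item in root.iter("item")
    ]


if __name__ == "__main__":
    for title, link in read_items(RSS_SAMPLE):
        print(title, link)
```

Because the format fixes where titles and links live, no page-specific parsing is needed, unlike with a scraper.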
  22. Wrappers, spiders, scrapers
     ▪ Wrapper is quite a general term used to describe a particular architecture (remember the Content Provider / Desired Interface diagram?)
     ▪ A wrapper needs at least two other components to accomplish its task
        • A spider (or crawler), to follow links and download web pages
        • A scraper, to extract useful content from pages full of uninteresting data
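The division of labour between spider and scraper can be sketched with Python's standard `html.parser`. Everything here is an assumption made for illustration: the three-page in-memory `SITE`, the `PageParser` class and the `crawl` function are hypothetical, and the "useful content" is arbitrarily defined as the text of `<h1>` elements.

```python
from html.parser import HTMLParser

# Hypothetical site kept in memory, so the sketch runs without network access.
SITE = {
    "/index.html": '<html><body><a href="/p1.html">one</a> '
                   '<a href="/p2.html">two</a></body></html>',
    "/p1.html": '<html><body><div class="ad">Buy now!</div><h1>Puzzle 1</h1>'
                '<a href="/index.html">home</a></body></html>',
    "/p2.html": '<html><body><h1>Puzzle 2</h1><p>footer</p></body></html>',
}


class PageParser(HTMLParser):
    """Collects links (for the spider) and <h1> text (for the scraper)."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.titles = []
        self._in_h1 = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "h1":
            self._in_h1 = True

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        # Scraper part: keep only the text found inside <h1>, drop the rest.
        if self._in_h1:
            self.titles.append(data.strip())


def crawl(start, fetch):
    """Spider part: follow links breadth-first, visiting each page once."""
    seen, queue, titles = set(), [start], []
    while queue:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        html = fetch(url)
        if html is None:  # unknown page: nothing to parse
            continue
        parser = PageParser()
        parser.feed(html)
        titles.extend(parser.titles)
        queue.extend(parser.links)
    return titles


if __name__ == "__main__":
    print(crawl("/index.html", SITE.get))
```

The `fetch` argument is passed in so that the same spider could, in principle, be pointed at a real HTTP client instead of the in-memory dictionary.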
  23. Scrapers
  24. Scrapers
  25. Scrapers
     ▪ However powerful, screen scraping is usually considered an inelegant solution
        • Lack of sophisticated, reusable screen-scraping toolkits (most scrapers are created ad hoc): difficult to program
        • Unlike API interfaces, scraping has no explicit contract between content provider and content consumer: difficult to update/maintain
  26. Semantic Web and RDF (Hey, that's my job!)
     ▪ Content created for human consumption does not make good content for automated machine consumption
        • Data becomes information when it conveys meaning
     ▪ XML in itself is not sufficient (too arbitrary).
     ▪ RDF is quickly being adopted in a variety of domains.
        • possibility to query over it (RDQL, SPARQL)
        • possibility to reason over it (Jena, RACER)
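To give a feel for what "querying over RDF" means, here is a toy sketch in plain Python. It is not SPARQL and does not use any RDF library: statements are modelled as (subject, predicate, object) tuples and `match` is a hypothetical helper that treats `None` as a wildcard. The `ex:` names are made up; `dc:creator` and `dc:date` are borrowed from Dublin Core for readability only.

```python
# Statements as (subject, predicate, object) triples; the facts come from
# the title slide, the identifiers themselves are hypothetical.
TRIPLES = {
    ("ex:RewireTheNet", "dc:creator", "ex:DavideEynard"),
    ("ex:RewireTheNet", "dc:date", "2007-05-30"),
    ("ex:DavideEynard", "ex:affiliation", "ex:PolitecnicoDiMilano"),
}


def match(triples, s=None, p=None, o=None):
    """Return, sorted, every triple matching the pattern (None = any value)."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


if __name__ == "__main__":
    # "Who created what?"
    print(match(TRIPLES, p="dc:creator"))
    # "Everything stated about the presentation"
    print(match(TRIPLES, s="ex:RewireTheNet"))
```

Real query languages such as SPARQL add variables shared across several patterns, which this single-pattern matcher does not attempt.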
  27. Challenges
     ▪ Technical:
        • data integration (what if the mapping is incomplete?)
        • data that needs to be fixed/cleaned/converted
        • robust standards, protocols, models and toolkits (... and try to avoid scrapers)
     ▪ Social:
        • encouraging user contributions
        • data pollution (lack of precision, gaming)
        • tradeoff between the protection of intellectual property and consumer privacy versus fair use and free flow of information
  28. Conclusions
     ▪ Considering information as freely flowing on the Internet, and creating “pipes” to redirect, aggregate and reuse it, is a great and powerful idea
     ▪ We're still at the very beginning
     ▪ User participation might offer new chances for improvement ... and create new problems, of course!
  29. Webography
     ▪ Duane Merrill: “Mashups: The new breed of Web app”
     ▪ Tim O'Reilly: “Pipes and filters for the Internet”
     ▪ Nick Gonzalez: “Five Ways to Mix, Rip, and Mash Your Data”
     ▪ Davide Eynard: “PowerBrowsing Projects”, “SukaSudoku”
  30. That's All, Folks. Thank you! Questions are welcome