
Rewire the Net


A presentation on wrappers and mashup tools, made for the PhD course "Mobile and Context-Aware Database Systems" at Politecnico di Milano, May 2007

Published in: Technology

  1. Rewire the Net
     Davide Eynard, Dipartimento di Elettronica e Informazione, Politecnico di Milano
     2007/05/30
     Mobile, Context Aware Databases and Database Systems
  2. Intro
     ▪ The problem
     ▪ Wrapping vs Mashup
     ▪ Mashup tools and technologies
     ▪ Open problems
     ▪ Conclusions
  3. The problem
     [Figure: the words STRUCTURED and UNSTRUCTURED]
  4. What is a wrapper?
  5. What is a wrapper?
     [Diagram labels: Content Provider, Desired Interface]
  6. Is a wrapper enough?
     ▪ A wrapper takes a (usually unstructured) data source and returns information in a desired format
       • All the uninteresting stuff is hidden within it
       • From outside we see only the desired interface
     ▪ What we want to do is work with this information
       • aggregate/filter it
       • use it as input for other services
       • mash it!
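
The wrapper idea on slide 6 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the page content, the `Headline` record, the `HeadlineWrapper` class and the link pattern are all made up for the example and do not come from the presentation.

```python
import re
from dataclasses import dataclass


@dataclass
class Headline:
    """The 'desired interface': a small structured record (hypothetical)."""
    title: str
    url: str


class HeadlineWrapper:
    """Hides the messy source format behind one clean method."""

    # Assumed pattern for links in the raw page; real pages need more care.
    _LINK = re.compile(r'<a\s+href="([^"]+)"[^>]*>([^<]+)</a>')

    def __init__(self, raw_html: str):
        self._raw_html = raw_html

    def headlines(self) -> list[Headline]:
        # Everything uninteresting (ads, stray paragraphs) is simply not matched.
        return [
            Headline(title=title.strip(), url=url)
            for url, title in self._LINK.findall(self._raw_html)
        ]


# Made-up unstructured source.
RAW_PAGE = """
<div class="ad">Buy now!</div>
<a href="/news/1" class="hl"> Wrappers explained </a>
<p>unrelated text</p>
<a href="/news/2">Mashups in practice</a>
"""

wrapper = HeadlineWrapper(RAW_PAGE)
# Working with the information: here we filter it before passing it on.
mashup_items = [h for h in wrapper.headlines() if "Mashup" in h.title]
```

From the outside only `headlines()` is visible, so the filtering step never has to know how the page is laid out.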
  7. An example
     ... and now?
  8. An example
     Convert data structures to LaTeX and generate a Sudoku book in PDF
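
As a rough idea of what "convert data structures to LaTeX" on slide 8 could look like, here is a minimal sketch. The grid representation (a list of nine rows, with 0 for an empty cell) and the function name `grid_to_latex` are assumptions made for this example, not the method used in the original project.

```python
def grid_to_latex(grid):
    """Render a 9x9 grid (0 = empty cell) as a LaTeX tabular environment."""
    lines = [r"\begin{tabular}{|" + "c|" * 9 + "}", r"\hline"]
    for row in grid:
        # Empty cells become a blank so the column structure is preserved.
        cells = [str(value) if value else " " for value in row]
        lines.append(" & ".join(cells) + r" \\ \hline")
    lines.append(r"\end{tabular}")
    return "\n".join(lines)


# Made-up puzzle: an empty grid with a single given digit.
grid = [[0] * 9 for _ in range(9)]
grid[0][0] = 5
latex = grid_to_latex(grid)
```

The resulting string can be placed in a LaTeX document and compiled to PDF with any standard LaTeX toolchain.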
  9. An example
     Create a Web app which delivers data in a standard format
     Create a Java app that runs Sudokus on your mobile
     Create another app that solves Sudokus!
  10. What kind of mashup?
      ▪ Imagination is your only limit
        • and... uhm, well... ability
      ▪ So, most of the mashups around belong to one of the following families:
        • mapping mashups
        • video and photo mashups
        • search and shopping mashups
        • news mashups
  11. Examples
  12. Examples
  13. Examples
  14. Examples
  15. Examples
  16. Examples
  17. Features
      Source: “Five Ways to Mix, Rip, and Mash Your Data”, Nick Gonzalez, March 2, 2007
  18. The architecture
      [Diagram: a Client, a MASHUP SITE/SERVICE, and several API/Content Providers, with a vertical bar labelled INTERFACE]
  19. The architecture
      [Diagram: a Client, a MASHUP SITE/SERVICE, and several API/Content Providers, with a vertical bar labelled AJAX]
  20. AJAX
      ▪ Asynchronous JavaScript and XML
      ▪ It's a Web application model, rather than a specific technology, and comprises several different technologies:
        • XHTML and CSS for style presentation
        • The DOM API exposed by the browser for dynamic display and interaction
        • Asynchronous data exchange (typically XML)
        • Browser-side scripting (typically JavaScript)
  21. Protocols and standards
      ▪ Web protocols
        • SOAP (originally Simple Object Access Protocol)
          − XML message format
          − Message structure: header and body parts
        • REST (Representational State Transfer)
          − Web-based communication using HTTP+XML
          − Few operations: GET, POST, PUT, DELETE applicable to all pieces of information
      ▪ Syndication formats
        • RSS (v1.0 is RDF based, while 2.0 is not)
        • Atom (more attention on metadata)
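
To make the syndication formats on slide 21 a little more concrete, the sketch below reads a tiny RSS 2.0 document with Python's standard library. The feed content and the `parse_rss` helper are invented for illustration; a real mashup would fetch the feed over HTTP (a REST-style GET) instead of using an inline string.

```python
import xml.etree.ElementTree as ET

# Made-up RSS 2.0 feed; channel and item values are placeholders.
RSS_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <link>http://example.org/</link>
    <description>Made-up feed for illustration</description>
    <item>
      <title>First post</title>
      <link>http://example.org/1</link>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.org/2</link>
    </item>
  </channel>
</rss>"""


def parse_rss(xml_text):
    """Return the feed items as plain dictionaries (title and link only)."""
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    return [
        {"title": item.findtext("title"), "link": item.findtext("link")}
        for item in channel.findall("item")
    ]


items = parse_rss(RSS_FEED)
```

Because RSS 2.0 elements carry no XML namespace, plain tag names are enough here; RSS 1.0 (RDF based) and Atom would need namespace-aware lookups.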
  22. Wrappers, spiders, scrapers
      ▪ "Wrapper" is quite a general term used to describe a particular architecture
        [Callout: Remember this one?]
      ▪ A wrapper needs at least two other components to accomplish its task
        • A spider (or crawler), to follow links and download web pages
        • A scraper, to extract useful content from pages full of uninteresting data
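
The split between spider and scraper on slide 22 might look like the following sketch, which uses only Python's standard `html.parser`. The page, the URLs and the class names `LinkSpider` and `TitleScraper` are assumptions for the example; the spider here only collects the links it would follow and does not download anything.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkSpider(HTMLParser):
    """Spider part: collects the links a crawler would follow next."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))


class TitleScraper(HTMLParser):
    """Scraper part: keeps only the text inside <h1> and ignores the rest."""

    def __init__(self):
        super().__init__()
        self._in_h1 = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_h1 = True

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_h1 and data.strip():
            self.titles.append(data.strip())


# Made-up page with some uninteresting content around the useful part.
PAGE = """
<html><body>
  <div class="banner">Uninteresting banner</div>
  <h1>Today's puzzle</h1>
  <a href="/puzzles/2">Next puzzle</a>
  <a href="http://example.org/about">About</a>
</body></html>
"""

spider = LinkSpider("http://example.org/puzzles/1")
spider.feed(PAGE)
scraper = TitleScraper()
scraper.feed(PAGE)
```

A wrapper would combine the two: the spider decides which pages to visit, the scraper turns each page into the structured data exposed through the desired interface.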
  23. Scrapers
  24. Scrapers
  25. Scrapers
      ▪ However powerful, screen scraping is usually considered an inelegant solution
        • Lack of sophisticated, reusable screen scraping toolkits (most scrapers are created ad hoc): difficult to program
        • Unlike API interfaces, scraping has no explicit contract between content provider and content consumer: difficult to update/maintain
  26. Semantic Web and RDF
      [Callout: Hey, that's my job!]
      ▪ Content created for human consumption does not make good content for automated machine consumption
        • Data becomes information when it conveys meaning
      ▪ XML in itself is not sufficient (too arbitrary)
      ▪ RDF is quickly being adopted in a variety of domains
        • possibility to query over it (RDQL, SPARQL)
        • possibility to reason over it (Jena, RACER)
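
The idea of querying RDF on slide 26 can be hinted at without any RDF library. The sketch below is not SPARQL and not a real triple store: it is a plain-Python illustration of triple pattern matching, with invented `ex:` identifiers and a hypothetical `match` helper. It corresponds roughly to a SPARQL query such as `SELECT ?page WHERE { ?page ex:topic ex:Sudoku }`.

```python
# Triples are (subject, predicate, object); all identifiers are made up.
TRIPLES = [
    ("ex:page1", "ex:topic", "ex:Sudoku"),
    ("ex:page2", "ex:topic", "ex:Maps"),
    ("ex:page1", "ex:format", "ex:RSS"),
    ("ex:page3", "ex:topic", "ex:Sudoku"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Which subjects have topic Sudoku?
sudoku_pages = [subject for subject, _, _ in match(TRIPLES, p="ex:topic", o="ex:Sudoku")]
```

Since every statement has the same subject-predicate-object shape, one generic matcher works across sources, which is the property that makes RDF data easier to combine than arbitrary XML.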
  27. Challenges
      ▪ Technical:
        • data integration (what if mapping is not a complete one?)
        • data that need to be fixed/cleaned/converted
        • robust standards, protocols, models and toolkits (... and try to avoid scrapers)
      ▪ Social:
        • encouraging user contributions
        • data pollution (lack of precision, gaming)
        • tradeoff between the protection of intellectual property and consumer privacy versus fair use and free flow of information
  28. Conclusions
      ▪ Considering information as freely flowing on the Internet, and creating “pipes” to redirect, aggregate, and reuse it, is a great and powerful idea
      ▪ We're still at the very beginning
      ▪ User participation might offer new chances for improvement
        ... and create new problems, of course!
  29. Webography
      ▪ Duane Merrill: “Mashups: The new breed of Web app”
      ▪ Tim O'Reilly: “Pipes and filters for the Internet”
      ▪ Nick Gonzalez: “Five Ways to Mix, Rip, and Mash Your Data”
      ▪ Davide Eynard: “PowerBrowsing Projects”, “SukaSudoku”
  30. That's All, Folks
      Thank you! Questions are welcome