Rewire the Net

                                                                   Davide Eynard
A presentation on wrappers and mashup tools, made for the PhD course "Mobile and Context-Aware Database Systems" at Politecnico di Milano, May 2007

  1. Rewire the Net
     Davide Eynard, eynard@elet.polimi.it
     Dipartimento di Elettronica e Informazione, Politecnico di Milano
     2007/05/30
     Mobile, Context Aware Databases and Database Systems
  2. Intro
     The problem
     Wrapping vs Mashup
     Mashup tools and technologies
     Open problems
     Conclusions
  3. The problem
     [Figure: the words STRUCTURED and UNSTRUCTURED arranged as word art]
  4. What is a wrapper?
     [Image-only slide]
  5. What is a wrapper?
     [Diagram labels: Content Provider, Desired Interface]
  6. Is a wrapper enough?
     A wrapper takes a (usually unstructured) data source and returns information in a desired format
       • All the uninteresting stuff is hidden within it
       • From outside we see only the desired interface
     What we want to do is work with this information
       • aggregate/filter it
       • use it as input for other services
       • mash it!
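The wrapper idea on slide 6 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the raw text, the `Item` record and the `ShopWrapper` class are all hypothetical and do not come from the talk. The point is that the messy source stays hidden inside the wrapper, while callers see only the desired interface and can then filter or aggregate the result.

```python
from dataclasses import dataclass

# Hypothetical raw text standing in for an unstructured source.
RAW_PAGE = """
Welcome to our shop!  *** SALE ***
Item: Sudoku book | Price: 7.50 EUR
Ads, banners, and other uninteresting stuff...
Item: Chess set | Price: 19.90 EUR
"""


@dataclass
class Item:
    name: str
    price: float


class ShopWrapper:
    """Hides the messy source; exposes only the desired interface."""

    def __init__(self, raw_text: str) -> None:
        self._raw = raw_text

    def items(self) -> list[Item]:
        found = []
        for line in self._raw.splitlines():
            if not line.startswith("Item:"):
                continue  # skip everything the consumer does not care about
            name_part, price_part = line.split("|")
            name = name_part.split(":", 1)[1].strip()
            price = float(price_part.split()[1])  # "Price: 7.50 EUR" -> 7.50
            found.append(Item(name, price))
        return found


# Working with the wrapped information: filter it, then aggregate it.
wrapper = ShopWrapper(RAW_PAGE)
cheap = [item for item in wrapper.items() if item.price < 10]
total = sum(item.price for item in wrapper.items())
```

Once the data is available as plain `Item` records, feeding it into another service is an ordinary function call, which is the step from wrapping to mashing.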
  7. An example
     ... and now?
  8. An example
     Convert data structures to LaTeX and generate a Sudoku book in PDF
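As a rough sketch of the "data structures to LaTeX" step on slide 8, the function below renders a 9x9 grid as a LaTeX `tabular`. The grid representation (a list of lists, with 0 for an empty cell) and the name `grid_to_latex` are assumptions made for illustration; the slides do not say how SukaSudoku actually does it.

```python
def grid_to_latex(grid: list[list[int]]) -> str:
    """Render a 9x9 Sudoku grid (0 = empty cell) as a LaTeX tabular."""
    lines = [r"\begin{tabular}{|" + "c|" * 9 + "}", r"\hline"]
    for row in grid:
        cells = [str(value) if value else " " for value in row]
        lines.append(" & ".join(cells) + r" \\ \hline")
    lines.append(r"\end{tabular}")
    return "\n".join(lines)


# Hypothetical puzzle: one partly filled row, the rest empty.
PUZZLE = [[5, 3, 0, 0, 7, 0, 0, 0, 0]] + [[0] * 9 for _ in range(8)]
latex = grid_to_latex(PUZZLE)
```

The returned string can be pasted into a LaTeX document body and compiled to PDF with any standard LaTeX toolchain.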
  9. An example
     Create a Web app which delivers data in a standard format
     Create a Java app that runs Sudokus on your mobile
     Create another app that solves Sudokus!
  10. What kind of mashup?
     Imagination is your only limit
       • and... uhm, well... ability
     So, most of the mashups around belong to one of the following families:
       • mapping mashups
       • video and photo mashups
       • search and shopping mashups
       • news mashups
  11-16. Examples
     [Six image-only slides]
  17. Features
     [Image-only slide]
     Source: “Five Ways to Mix, Rip, and Mash Your Data”, Nick Gonzalez, March 2, 2007
  18. The architecture
     [Diagram labels: Client, INTERFACE, MASHUP SITE/SERVICE, and several API/Content Providers]
  19. The architecture
     [Diagram labels: Client, AJAX, MASHUP SITE/SERVICE, and several API/Content Providers]
  20. AJAX
     Asynchronous JavaScript and XML
     It's a Web application model, rather than a specific technology, and comprises several different technologies:
       • XHTML and CSS for style presentation
       • The DOM API exposed by the browser for dynamic display and interaction
       • Asynchronous data exchange (typically XML)
       • Browser-side scripting (typically JavaScript)
  21. Protocols and standards
     Web protocols
       • SOAP (Simple Object Access Protocol)
           − XML message format
           − Message structure: header and body parts
       • REST (Representational State Transfer)
           − Web-based communication using HTTP+XML
           − Few operations: GET, POST, PUT, DELETE applicable to all pieces of information
     Syndication formats
       • RSS (v1.0 is RDF based, while 2.0 is not)
       • Atom (more attention on metadata)
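To make the syndication formats on slide 21 more concrete, here is a minimal sketch that reads the items of an RSS 2.0 feed with Python's standard `xml.etree.ElementTree`. The feed text and the helper name `read_items` are made up for illustration; a real mashup would first fetch the feed over HTTP.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 document, inlined so the example needs no network.
RSS_SAMPLE = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>First post</title><link>http://example.org/1</link></item>
    <item><title>Second post</title><link>http://example.org/2</link></item>
  </channel>
</rss>"""


def read_items(rss_text: str) -> list[dict[str, str]]:
    """Return the title and link of every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(rss_text)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        }
        for item in root.iter("item")
    ]


items = read_items(RSS_SAMPLE)
```

RSS 2.0 elements carry no XML namespace, so plain tag names are enough here; RSS 1.0 and Atom use namespaces and would need namespace-qualified lookups.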
  22. Wrappers, spiders, scrapers
     Wrapper is quite a general term used to describe a particular architecture (remember the diagram on slide 5?)
     A wrapper needs at least two other components to accomplish its task
       • A spider (or crawler), to follow links and download web pages
       • A scraper, to extract useful content from pages full of uninteresting data
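The split between spider and scraper on slide 22 might look like the sketch below, built on Python's standard `html.parser`. The page content, the `headline` class name and both parser classes are hypothetical. The spider half only collects links and does not download anything, and the scraper half is deliberately simple and does not handle nested tags.

```python
from html.parser import HTMLParser

# Hypothetical page: one interesting headline among uninteresting markup.
PAGE = """<html><body>
<div class="ad">Buy now!</div>
<h1 class="headline">Mashups are here</h1>
<a href="/page2.html">next</a>
<a href="http://example.org/about">about</a>
</body></html>"""


class LinkSpider(HTMLParser):
    """Spider half: collects the links a crawler would follow next."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


class HeadlineScraper(HTMLParser):
    """Scraper half: keeps only text inside elements with class="headline"."""

    def __init__(self) -> None:
        super().__init__()
        self._inside = False
        self.headlines: list[str] = []

    def handle_starttag(self, tag, attrs):
        if dict(attrs).get("class") == "headline":
            self._inside = True

    def handle_endtag(self, tag):
        self._inside = False  # simplification: no support for nested tags

    def handle_data(self, data):
        if self._inside and data.strip():
            self.headlines.append(data.strip())


spider = LinkSpider()
spider.feed(PAGE)
scraper = HeadlineScraper()
scraper.feed(PAGE)
```

The scraper depends on the page keeping its `headline` class, so it breaks silently as soon as the provider changes the markup.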
  23-24. Scrapers
     [Two image-only slides]
  25. Scrapers
     However powerful, screen scraping is usually considered an inelegant solution
       • Difficult to program: there is a lack of sophisticated, reusable screen-scraping toolkits (most scrapers are created ad hoc)
       • Difficult to update/maintain: unlike an API, scraping has no explicit contract between content provider and content consumer
  26. Semantic Web and RDF (Hey, that's my job!)
     Content created for human consumption does not make good content for automated machine consumption
       • Data becomes information when it conveys meaning
     XML in itself is not sufficient (too arbitrary).
     RDF is quickly finding adoption in a variety of domains.
       • possibility to query over it (RDQL, SPARQL)
       • possibility to reason over it (Jena, RACER)
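To give a feel for what "querying over RDF" on slide 26 means, here is a toy sketch of matching a single triple pattern against a handful of subject-predicate-object triples. It is not SPARQL and uses no RDF library: the `match` function, the `ex:` identifiers and the `?variable` convention are assumptions chosen to mimic the flavour of such queries.

```python
Triple = tuple[str, str, str]

# Illustrative triples; the ex: names are invented for this example.
TRIPLES: list[Triple] = [
    ("ex:RewireTheNet", "dc:creator", "ex:DavideEynard"),
    ("ex:RewireTheNet", "dc:date", "2007-05-30"),
    ("ex:DavideEynard", "ex:affiliation", "ex:PolitecnicoDiMilano"),
]


def match(pattern: Triple, triples: list[Triple]) -> list[dict[str, str]]:
    """Match one triple pattern; terms starting with '?' are variables."""
    results = []
    for triple in triples:
        binding: dict[str, str] = {}
        for want, got in zip(pattern, triple):
            if want.startswith("?"):
                # A repeated variable must bind to the same value each time.
                if binding.setdefault(want, got) != got:
                    break
            elif want != got:
                break
        else:
            results.append(binding)
    return results


about_talk = match(("ex:RewireTheNet", "?p", "?o"), TRIPLES)
```

Because every statement has the same three-part shape, the same `match` function works for any vocabulary, which is the property that arbitrary XML structures lack.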
  27. Challenges
     Technical:
       • data integration (what if mapping is not a complete one?)
       • data that need to be fixed/cleaned/converted
       • robust standards, protocols, models and toolkits (... and try to avoid scrapers)
     Social:
       • encouraging user contributions
       • data pollution (lack of precision, gaming)
       • tradeoff between the protection of intellectual property and consumer privacy versus fair use and free flow of information
  28. Conclusions
     Considering information as freely flowing on the Internet, and creating “pipes” to redirect, aggregate, and reuse it, is a great and powerful idea
     We're still at the very beginning
     User participation might offer new chances for improvement
     ... and create new problems, of course!
  29. Webography
     Duane Merrill: “Mashups: The new breed of Web app”
     Tim O'Reilly: “Pipes and filters for the Internet”
     Nick Gonzalez: “Five Ways to Mix, Rip, and Mash Your Data”
     Davide Eynard: “PowerBrowsing Projects”, “SukaSudoku”
     www.webmashup.com
  30. That's All, Folks
     Thank you!
     Questions are welcome