Rewire the Net

                                                                   Davide Eynard
                                                             eynard@elet.polimi.it

                                Dipartimento di Elettronica e Informazione
                                                      Politecnico di Milano

                                                2007/05/30

Mobile, Context Aware Databases and Database Systems
Intro

           The problem
           Wrapping vs Mashup
           Mashup tools and technologies
           Open problems
           Conclusions




p. 2       2007/05/30     Rewire the Net
The problem

                                      [Figure: labels "STRUCTURED", "STRUCTURED" and "UNSTRUCTURED"]

What is a wrapper?




What is a wrapper?




                                     [Figure: wrapper diagram with the labels "Content Provider" and "Desired Interface"]




Is a wrapper enough?

        A wrapper takes a (usually unstructured) data
         source and returns information in a desired
         format
          • All the uninteresting stuff is hidden within it
          • From outside we see only the desired interface
        What we want to do is work with this information
          • aggregate/filter it
          • use it as input for other services
          • mash it!
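A minimal Python sketch of the idea, assuming a hypothetical source (`RAW_SOURCE`) that mixes a few useful lines with noise. The `wrap` function plays the wrapper: the parsing stays inside it and callers only see a list of records. The hypothetical `mash` function then filters and aggregates that output, as the slide suggests.

```python
import re
from typing import Any, Dict, List

# Hypothetical unstructured source: the interesting lines are mixed with noise.
RAW_SOURCE = """
Welcome to our site! Ads here.
Milan: 21 C, sunny
Rome: 25 C, cloudy
Copyright 2007
Turin: 18 C, rain
"""

# Only lines shaped like "City: <temp> C, <sky>" are considered useful.
LINE_RE = re.compile(r"^(\w+): (-?\d+) C, (\w+)$")


def wrap(raw: str) -> List[Dict[str, Any]]:
    """Wrapper: parse the raw source and expose only structured records."""
    records = []
    for line in raw.splitlines():
        match = LINE_RE.match(line.strip())
        if match:  # everything else stays hidden inside the wrapper
            city, temp, sky = match.groups()
            records.append({"city": city, "temp_c": int(temp), "sky": sky})
    return records


def mash(records: List[Dict[str, Any]], min_temp: int) -> Dict[str, Any]:
    """Consumer: filter and aggregate what the wrapper returns."""
    warm = [r["city"] for r in records if r["temp_c"] >= min_temp]
    average = sum(r["temp_c"] for r in records) / len(records) if records else 0.0
    return {"warm_cities": warm, "average_temp_c": average}


if __name__ == "__main__":
    data = wrap(RAW_SOURCE)
    print(data)
    print(mash(data, min_temp=20))
```

Because `mash` depends only on the wrapper's interface, the source format can change without touching it, as long as `wrap` is updated.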




An example




                                     ... and now?
An example




                    Convert data structures
                    to LaTeX and generate
                    a Sudoku book in PDF
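The slide does not show how the conversion is done. As a sketch, assuming each puzzle is stored as a 9×9 list of integers with 0 for empty cells (a hypothetical representation, not necessarily the one used by SukaSudoku), each grid can be turned into a LaTeX `tabular` and the puzzles collected into one document; compiling that output with `pdflatex` would then give the PDF book.

```python
from typing import List

Grid = List[List[int]]  # 9 rows of 9 digits; 0 marks an empty cell


def grid_to_latex(grid: Grid) -> str:
    """Render one grid as a LaTeX tabular; rules separate the 3x3 boxes."""
    lines = [r"\begin{tabular}{|ccc|ccc|ccc|}", r"\hline"]
    for i, row in enumerate(grid):
        cells = [str(v) if v else "" for v in row]  # empty cells stay blank
        lines.append(" & ".join(cells) + r" \\")
        if i % 3 == 2:  # horizontal rule after every third row
            lines.append(r"\hline")
    lines.append(r"\end{tabular}")
    return "\n".join(lines)


def book_to_latex(grids: List[Grid]) -> str:
    """Assemble a complete LaTeX document with one centred puzzle per page."""
    pages = [
        "\\begin{center}\n" + grid_to_latex(g) + "\n\\end{center}" for g in grids
    ]
    return (
        "\\documentclass{article}\n\\begin{document}\n"
        + "\n\\newpage\n".join(pages)
        + "\n\\end{document}\n"
    )


if __name__ == "__main__":
    puzzle = [[0] * 9 for _ in range(9)]
    puzzle[0][0] = 5
    print(book_to_latex([puzzle]))  # save as .tex, then compile with pdflatex
```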




An example




           • Create a Web app which delivers data
             in a standard format
           • Create a Java app that runs Sudokus
             on your mobile
           • Create another app that solves Sudokus!
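What makes these apps composable is the shared data format. A minimal sketch, assuming an invented XML vocabulary (a `<sudoku>` element with nine `<row>` children, each a string of digits with 0 for empty cells) and Python's standard `xml.etree.ElementTree`. The producer side is what the Web app would serve; the consumer side is all a viewer or a solver needs (the mobile app on the slide is in Java; Python is used here only for illustration).

```python
import xml.etree.ElementTree as ET
from typing import List

Grid = List[List[int]]  # 9 rows of 9 digits; 0 marks an empty cell


def grid_to_xml(grid: Grid, puzzle_id: str) -> str:
    """Producer: serialize a grid into a (hypothetical) standard XML format."""
    root = ET.Element("sudoku", id=puzzle_id)
    for row in grid:
        # Each row becomes a string of 9 digits.
        ET.SubElement(root, "row").text = "".join(str(v) for v in row)
    return ET.tostring(root, encoding="unicode")


def xml_to_grid(document: str) -> Grid:
    """Consumer: any app (viewer, solver, ...) can rebuild the grid."""
    root = ET.fromstring(document)
    return [[int(ch) for ch in row.text] for row in root.findall("row")]


if __name__ == "__main__":
    grid = [[0] * 9 for _ in range(9)]
    grid[4][4] = 7
    doc = grid_to_xml(grid, "demo")
    print(doc)
    print(xml_to_grid(doc) == grid)
```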



What kind of mashup?

         Imagination is your only limit
           • and... uhm, well... ability

         So, most of the mashups around belong to one of
          the following families:
           • mapping mashups
           • video and photo mashups
           • search and shopping mashups
           • news mashups




Examples




Examples




Examples




Examples




Examples




Examples




Features




                                       Source:
                     “Five Ways to Mix, Rip, and Mash Your Data”
                            Nick Gonzalez, March 2 2007




The architecture


          [Figure: a Client, connected through an INTERFACE to a MASHUP SITE/SERVICE, which in turn connects to several API/Content Providers (...)]
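A sketch of the MASHUP SITE/SERVICE box, assuming each API/content provider can be modelled as a Python function that returns a list of records; real providers would be remote APIs, and the provider names and record fields here are hypothetical. The service queries every provider, tolerates failures, and offers the client one merged result list through a single interface.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Provider = Callable[[str], List[Record]]


# Hypothetical providers standing in for remote APIs.
def photo_provider(query: str) -> List[Record]:
    return [{"type": "photo", "title": f"{query} at night", "source": "photos"}]


def map_provider(query: str) -> List[Record]:
    return [{"type": "place", "title": f"{query} city centre", "source": "maps"}]


class MashupService:
    """The MASHUP SITE/SERVICE box: one interface over many providers."""

    def __init__(self, providers: List[Provider]) -> None:
        self.providers = providers

    def search(self, query: str) -> List[Record]:
        results: List[Record] = []
        for provider in self.providers:
            try:
                results.extend(provider(query))
            except Exception:
                # A failing provider should not break the whole mashup.
                continue
        return sorted(results, key=lambda r: r["title"])


if __name__ == "__main__":
    service = MashupService([photo_provider, map_provider])
    for item in service.search("Milan"):
        print(item)
```

Skipping a failing provider is one possible policy; a real service might instead report which sources were unavailable.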




The architecture


          [Figure: the same architecture, with AJAX as the interface between the Client and the MASHUP SITE/SERVICE, which connects to several API/Content Providers (...)]




AJAX

         Asynchronous JavaScript and XML
         It's a Web application model, rather than a
          specific technology, and comprises several
          different technologies:
           • XHTML and CSS for presentation
           • The DOM API exposed by the browser for
              dynamic display and interaction
           • Asynchronous data exchange (typically XML)
           • Browser-side scripting (typically JavaScript)
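AJAX code runs in the browser and is written in JavaScript, so the following Python sketch is only an analogy for its central idea, asynchronous data exchange. The `fetch_xml` coroutine is a hypothetical stand-in for a network request (a short sleep plus a canned XML payload); `asyncio.gather` starts all requests at once, so the total wait is roughly that of the slowest one rather than the sum, and each XML response is then parsed.

```python
import asyncio
import xml.etree.ElementTree as ET
from typing import List


async def fetch_xml(name: str, delay: float) -> str:
    """Hypothetical stand-in for an asynchronous HTTP request returning XML."""
    await asyncio.sleep(delay)  # simulated network latency
    return f"<item><source>{name}</source></item>"


async def gather_sources() -> List[str]:
    # Start all requests concurrently instead of waiting for each in turn;
    # gather returns the results in the order the requests were listed.
    responses = await asyncio.gather(
        fetch_xml("news", 0.02), fetch_xml("photos", 0.01), fetch_xml("maps", 0.03)
    )
    # Parse each XML response and keep the <source> text.
    return [ET.fromstring(doc).findtext("source") for doc in responses]


if __name__ == "__main__":
    print(asyncio.run(gather_sources()))
```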




Protocols and standards

         Web protocols
          • SOAP (Simple Object Access Protocol)
                 − XML message format
                 − Message structure: an envelope with header
                   and body parts
           • REST (Representational State Transfer)
                 − Web-based communication using HTTP+XML
                 − Few, uniform operations (GET, POST, PUT,
                   DELETE) applicable to every resource
         Syndication formats
           • RSS (v1.0 is RDF-based, while 2.0 is not)
           • Atom (more attention to metadata)
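Since a news mashup may need to read both syndication formats, here is a minimal sketch that normalizes RSS 2.0 and Atom feeds into the same list of `(title, link)` pairs with Python's `xml.etree.ElementTree`. The sample feeds are invented and heavily trimmed; real feeds carry many more elements, and RSS 1.0, being RDF-based, would need a separate code path.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

ATOM_NS = "{http://www.w3.org/2005/Atom}"


def parse_feed(document: str) -> List[Tuple[str, str]]:
    """Return (title, link) pairs from an RSS 2.0 or Atom feed."""
    root = ET.fromstring(document)
    if root.tag == "rss":  # RSS 2.0: <rss><channel><item>...
        return [
            (item.findtext("title", ""), item.findtext("link", ""))
            for item in root.iter("item")
        ]
    if root.tag == ATOM_NS + "feed":  # Atom: namespaced <feed><entry>...
        entries = []
        for entry in root.iter(ATOM_NS + "entry"):
            link = entry.find(ATOM_NS + "link")  # first <link> only
            href = link.get("href", "") if link is not None else ""
            entries.append((entry.findtext(ATOM_NS + "title", ""), href))
        return entries
    raise ValueError(f"unsupported feed type: {root.tag}")


RSS_SAMPLE = """<rss version="2.0"><channel><title>Demo</title>
<item><title>First post</title><link>http://example.org/1</link></item>
</channel></rss>"""

ATOM_SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom"><title>Demo</title>
<entry><title>Second post</title><link href="http://example.org/2"/></entry>
</feed>"""

if __name__ == "__main__":
    print(parse_feed(RSS_SAMPLE) + parse_feed(ATOM_SAMPLE))
```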


Wrappers, spiders, scrapers

         “Wrapper” is a fairly general term describing
          a particular architecture
                                           Remember
                                            this one?




         A wrapper needs at least two other components
          to accomplish its task
           • A spider (or crawler), to follow links and
             download web pages
           • A scraper, to extract useful content from pages
             full of uninteresting data
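Both components can be sketched with Python's standard library. To keep the example self-contained, the Web is replaced by a hypothetical in-memory `PAGES` dictionary (a real spider would download pages over HTTP, e.g. with `urllib.request`), and the useful content is assumed to be marked with `class="price"`, an invented convention. The spider follows links breadth-first and visits each page once; the scraper keeps only the price text and discards everything else.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Dict, List, Set

# Hypothetical pages standing in for a real web site.
PAGES: Dict[str, str] = {
    "/": '<a href="/a">A</a> <a href="/b">B</a> <p>Welcome!</p>',
    "/a": '<div class="ad">Buy now</div><span class="price">10 EUR</span>',
    "/b": '<a href="/">Home</a><span class="price">25 EUR</span>',
}


class LinkAndPriceParser(HTMLParser):
    """Collects outgoing links (for the spider) and prices (for the scraper)."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []
        self.prices: List[str] = []
        self._in_price = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            self.links.append(attributes["href"])
        self._in_price = attributes.get("class") == "price"

    def handle_endtag(self, tag):
        self._in_price = False

    def handle_data(self, data):
        if self._in_price:  # ignore ads, navigation, and other noise
            self.prices.append(data.strip())


def crawl_and_scrape(start: str) -> List[str]:
    """Spider: visit each reachable page once; scraper: keep only prices."""
    seen: Set[str] = set()
    queue = deque([start])
    prices: List[str] = []
    while queue:
        url = queue.popleft()
        if url in seen or url not in PAGES:
            continue
        seen.add(url)
        parser = LinkAndPriceParser()
        parser.feed(PAGES[url])
        parser.close()  # flush any buffered text
        prices.extend(parser.prices)
        queue.extend(parser.links)
    return prices


if __name__ == "__main__":
    print(crawl_and_scrape("/"))
```

Note how tightly the scraper is bound to the page markup: renaming the `price` class would silently break it.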

Scrapers




Scrapers




Scrapers

         However powerful, screen scraping is usually
          considered an inelegant solution
           • There is a lack of sophisticated, reusable
             screen-scraping toolkits (most scrapers are
             created ad hoc), so scrapers are difficult to program
           • Unlike APIs, scraping has no explicit contract
             between content provider and content consumer,
             so scrapers are difficult to update/maintain




Semantic Web and RDF (Hey, that's my job!)

         Content created for human consumption does not
          make good content for automated machine
          consumption
           • Data becomes information when it conveys
             meaning
         XML by itself is not sufficient (too arbitrary)
         RDF is quickly being adopted in a variety of
          domains:
           • it can be queried (RDQL, SPARQL)
           • it can be reasoned over (Jena, RACER)
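Real RDF applications would use a framework such as Jena and a query language such as SPARQL. The toy sketch below, in plain Python, only illustrates the underlying idea: once data is expressed as (subject, predicate, object) triples, it can be queried by matching patterns that contain variables and joining the results, much like a SPARQL basic graph pattern. The `ex:` names and the `match`/`query` helpers are invented for illustration and are not an RDF library API.

```python
from typing import Dict, Iterator, List, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hypothetical triples; a real store would use full URIs.
TRIPLES: List[Triple] = [
    ("ex:alice", "ex:worksAt", "ex:university"),
    ("ex:alice", "ex:wrote", "ex:paper"),
    ("ex:paper", "ex:topic", "ex:mashups"),
    ("ex:university", "ex:locatedIn", "ex:milan"),
]


def match(pattern: Triple, binding: Binding) -> Iterator[Binding]:
    """Yield extended bindings for one pattern; terms starting with '?' are variables."""
    for triple in TRIPLES:
        new = dict(binding)
        ok = True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # Bind the variable, or check it agrees with an earlier binding.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield new


def query(patterns: List[Triple]) -> List[Binding]:
    """Join several patterns, like a SPARQL basic graph pattern."""
    results: List[Binding] = [{}]
    for pattern in patterns:
        results = [b for r in results for b in match(pattern, r)]
    return results


if __name__ == "__main__":
    print(query([("?who", "ex:wrote", "?doc"), ("?doc", "ex:topic", "ex:mashups")]))
```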




Challenges

         Technical:
           • data integration (what if the mapping is
             incomplete?)
           • data that need to be fixed/cleaned/converted
           • robust standards, protocols, models and
             toolkits (... and try to avoid scrapers)
         Social:
           • encouraging user contributions
           • data pollution (lack of precision, gaming)
           • tradeoff between the protection of intellectual
             property and consumer privacy versus fair use
             and free flow of information
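The first two technical challenges can be made concrete with a small sketch: two hypothetical sources name the same fields differently, a hand-written mapping translates them into one shared schema, a cleaning step converts prices, and any field the mapping does not cover is reported rather than silently dropped. All field names, the mapping, and the price format are assumptions made for illustration.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical mappings from each source's field names to a shared schema.
MAPPING: Dict[str, Dict[str, str]] = {
    "source_a": {"name": "title", "cost": "price_eur"},
    "source_b": {"label": "title", "price": "price_eur"},
}


def clean_price(raw: str) -> float:
    """Cleaning/conversion step, e.g. "12,50 EUR" becomes 12.5."""
    return float(raw.replace("EUR", "").strip().replace(",", "."))


def integrate(source: str, record: Dict[str, str]) -> Tuple[Dict[str, Any], List[str]]:
    """Map one record into the shared schema and report unmapped fields."""
    fields = MAPPING[source]
    unified: Dict[str, Any] = {}
    unmapped: List[str] = []
    for key, value in record.items():
        target = fields.get(key)
        if target is None:
            unmapped.append(key)  # incomplete mapping: flag it rather than guess
        elif target == "price_eur":
            unified[target] = clean_price(value)
        else:
            unified[target] = value
    return unified, unmapped


if __name__ == "__main__":
    print(integrate("source_a", {"name": "Sudoku book", "cost": "12,50 EUR"}))
    print(integrate("source_b", {"label": "City map", "price": "3.00", "colour": "red"}))
```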

Conclusions

         Treating information as something that flows freely
          on the Internet, and creating “pipes” to redirect,
          aggregate, and reuse it, is a great and powerful idea
         We're still at the very beginning
         User participation might offer new chances for
          improvement
                       ... and create new problems, of course!




Webography

         Duane Merrill:
          “Mashups: The new breed of Web app”
         Tim O'Reilly: “Pipes and filters for the Internet”
         Nick Gonzalez:
          “Five Ways to Mix, Rip, and Mash Your Data”
         Davide Eynard: “PowerBrowsing Projects”,
          “SukaSudoku”
         www.webmashup.com




That's All, Folks



                            Thank you!
                     Questions are welcome



