Beautifying Data in the real world

  1. Beautifying Data in the Real World
     Instructor: Professor Lothar Piepmeyer
     Group 5: Toan Do, An Du, Vinh Nguyen, Tan Tran
  2. How big is the data on the Internet?
     • 2004: data on the Internet exceeds 1 EB for the first time.
     • 2005: Eric Schmidt estimates it at 5 million terabytes (about 5 EB).
     • Cisco forecasts that the Internet will reach nearly 1,000 EB in 2015.
     How big is it?
     Sources: http://www.wisegeek.com/how-big-is-the-internet.htm, http://techland.time.com/
  3. If 1 byte = 0.5 mm
     Source: http://blog.fliptop.com/how-much-data-is-on-the-internet/
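
The scale on slide 3 can be turned into rough lengths for the sizes quoted on slide 2. This is only a back-of-the-envelope sketch: it assumes decimal exabytes (10^18 bytes), and the helper name `length_km` is made up for illustration.

```python
MM_PER_BYTE = 0.5       # scale taken from the slide
BYTES_PER_EB = 10**18   # assumption: decimal exabyte
MM_PER_KM = 10**6


def length_km(exabytes: float) -> float:
    """Length in kilometres if every byte were laid out as 0.5 mm."""
    return exabytes * BYTES_PER_EB * MM_PER_BYTE / MM_PER_KM


# Sizes quoted on slide 2: 2004, 2005, and the Cisco forecast for 2015.
for label, size_eb in [("2004", 1), ("2005", 5), ("2015 forecast", 1000)]:
    print(f"{label}: {size_eb} EB -> {length_km(size_eb):.2e} km")
```

Under these assumptions a single exabyte already stretches to about 5 × 10^11 km.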
  4. Content
     • Introduction
     • The Open Notebook Science approach
     • Curating and presenting the data
     • Beautifying the data
     • Data visualization and building a portal from open data and free services
     • Demonstration
  5. Data on the Internet
     Source: http://news.bbc.co.uk/2/hi/technology/8562801.stm
  6. Problems of (scientific) data in the real world
     • Noisy data sources
     • Barriers in data presentation: OCR version, text version, human-readable, machine-readable, …
     • How to verify the data?
  7. Open Notebook Science
     Purpose: record the full raw data of scientific research and make it available online.
     Benefits:
     • obtain detailed descriptions of procedures
     • improve the communication of science
     • speed up progress
     • reduce time lost to the repetition of failed experiments
     • …
  8. Applying ONS on free services
  9. Crowdsourcing: a distributed problem-solving and production model
  10. Crowdsourcing
  11. Crowdsourcing
  12. Crowdsourcing
      Source: http://r18ultrachair.com/
  13. Validating crowdsourced data
      • According to ONS, all detailed data are recorded.
      • Doubtful data are also kept and marked for …
  14. Unique identifiers for chemical entities
      • Standardize data
      • Facilitate integration with other data sets
      • Three possibilities considered: CAS Registry Number, InChI, SMILES
  15. CAS Registry Number
      • Proprietary
      • Cannot be converted to a chemical structure
      • Depends on an external organization to issue it
      Example: the CAS number of water is 7732-18-5. The check digit 5 is calculated as (8×1 + 1×2 + 2×3 + 3×4 + 7×5 + 7×6) = 105; 105 mod 10 = 5.
      Source: http://en.wikipedia.org/wiki/CAS_registry_number
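
The check-digit arithmetic on slide 15 can be written out as a short sketch. The function names `cas_check_digit` and `is_valid_cas` are made up for illustration; the weighting (1, 2, 3, … from the rightmost digit of the first two groups) follows the worked example for water.

```python
def cas_check_digit(cas_number: str) -> int:
    """Compute the check digit from the first two groups of a CAS number."""
    body, _, _check = cas_number.rpartition("-")
    digits = [int(ch) for ch in body if ch.isdigit()]
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(weight * digit
                for weight, digit in enumerate(reversed(digits), start=1))
    return total % 10


def is_valid_cas(cas_number: str) -> bool:
    """True when the final digit matches the computed check digit."""
    last = cas_number[-1]
    return last.isdigit() and cas_check_digit(cas_number) == int(last)


print(cas_check_digit("7732-18-5"))  # water: 105 mod 10 -> 5
```

A check like this catches typing errors in crowdsourced records, but it says nothing about whether the number belongs to the intended compound.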
  16. InChI
      • IUPAC International Chemical Identifier
      • Freely usable and non-proprietary
      • Does not have to be assigned by an organization
      • Can be computed from structural information
      • Human-readable (with practice)
      Source: http://en.wikipedia.org/wiki/Inchi
  17. SMILES
      • Simplified molecular-input line-entry system
      • More human-readable than InChI
      • Can be converted to InChI
      Source: http://en.wikipedia.org/wiki/Simplified_molecular-input_line-entry_system
  18. Source: http://en.wikipedia.org/wiki/Simplified_molecular-input_line-entry_system
  19. Analysis options
      • Access to live data
      • Get summaries
      • Complex statistical representations of models
      • Mark questionable data for later consideration
  20. [No extractable text on this slide]
  21. Google Docs API
      • Allows developers to create, retrieve, update, and delete Google Docs files and collections
      • Also provides some advanced features such as resource archives, Optical Character Recognition, translation, and revision history
      • Useful for storing data in the cloud, performing resource management, and converting document formats
      Source: https://developers.google.com/google-apps/documents-list/
  22. Google Visualization API
      • Chart Library: JavaScript classes
      • Data Table: JavaScript DataTable class
      • Data Source: Chart Tools Datasource protocol
      Source: https://developers.google.com/chart/interactive/docs/index
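
To make the Data Table bullet on slide 22 concrete, the sketch below builds the `cols`/`rows` literal structure that a Google Visualization DataTable can be constructed from. The helper `to_datatable`, the column names, and the sample values are hypothetical placeholders, not measurements from the presentation.

```python
import json


def to_datatable(columns, rows):
    """Build a DataTable-style literal: typed columns plus rows of cells."""
    return {
        "cols": [{"id": name, "label": name, "type": col_type}
                 for name, col_type in columns],
        "rows": [{"c": [{"v": value} for value in row]} for row in rows],
    }


# Placeholder data for illustration only.
table = to_datatable(
    columns=[("sample", "string"), ("reading", "number")],
    rows=[("sample A", 1.0), ("sample B", 2.5)],
)
print(json.dumps(table, indent=2))
```

The JSON text produced this way is what a page script would pass on to the chart classes; the chart code itself stays in JavaScript.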
  23. [No extractable text on this slide]
  24. Source: https://google-developers.appspot.com/chart/interactive/docs/gallery
  25. RESTful Web Service
      Representational State Transfer: a simpler alternative to SOAP and Web Services Description Language (WSDL) based Web services.
      Principles:
      • Use HTTP methods explicitly.
      • Be stateless.
      • Expose directory structure-like URIs.
      • Transfer XML, JavaScript Object Notation (JSON), or both.
      Source: http://www.ibm.com/developerworks/webservices/library/ws-restful/
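
The four principles on slide 25 can be illustrated without a web server. This is a minimal sketch, not a real service: `handle` and the `/compounds` URIs are hypothetical, and the in-memory dictionary stands in for whatever storage a real service would use.

```python
import json

# Hypothetical in-memory resource collection, for illustration only.
COMPOUNDS = {"water": {"cas": "7732-18-5"}}


def handle(method, path, body=None):
    """Answer one request; everything needed arrives in the arguments (stateless)."""
    parts = [part for part in path.split("/") if part]
    if not parts or parts[0] != "compounds" or len(parts) > 2:
        return 404, json.dumps({"error": "not found"})

    if len(parts) == 1:  # the collection URI: /compounds
        if method == "GET":
            return 200, json.dumps(sorted(COMPOUNDS))
        return 405, json.dumps({"error": "method not allowed"})

    name = parts[1]  # a single resource URI: /compounds/<name>
    if method == "GET":
        if name not in COMPOUNDS:
            return 404, json.dumps({"error": "not found"})
        return 200, json.dumps(COMPOUNDS[name])
    if method == "PUT":
        COMPOUNDS[name] = json.loads(body)
        return 200, json.dumps(COMPOUNDS[name])
    if method == "DELETE":
        if COMPOUNDS.pop(name, None) is None:
            return 404, json.dumps({"error": "not found"})
        return 204, ""
    return 405, json.dumps({"error": "method not allowed"})


print(handle("GET", "/compounds/water"))
```

Each HTTP method has one explicit meaning, the URI reads like a directory path, the payload is JSON, and no per-client session is kept between calls.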
  26. Compare REST and SOAP
      Who's using REST? All of Yahoo's web services use REST, including Flickr; the del.icio.us API uses it, as do pubsub, bloglines, and technorati; eBay and Amazon both have web services for both REST and SOAP.
      Who's using SOAP? Google seems to be consistent in implementing its web services with SOAP, with the exception of Blogger, which uses XML-RPC. You will find SOAP web services in lots of enterprise software as well.
      Source: http://www.petefreitag.com/item/431.cfm
  27. Compare REST and SOAP
      REST:
      • Lightweight: not a lot of extra XML markup
      • Human-readable results
      • Easy to build: no toolkits required
      SOAP:
      • Easy to consume (sometimes)
      • Rigid: type checking, adheres to a contract
      • Development tools
  28. [No extractable text on this slide]
  29. An effort to aggregate data from multiple sources
      Introducing ChemSpider:
      • An online lookup engine for chemists: http://www.chemspider.com
      • 40 million substances
      • Multiple data sources
      • A "link farm" to other sources
  30. What is "wrong" with wikipedia.com?
  31. Wikipedia.com
      Not "wrong": very informative for human beings
  32. Wikipedia.com
      This little guy is left behind: not machine-readable
  33. Semantic Web
      • Describing things in a way that computer applications can understand, for example "The Beatles was a band from Liverpool"
      • Describes the relationships between things (like A is a part of B and Y is a member of Z) and the properties of things (like size, weight, age, and price)
      • "…will make all the data in the world look like one huge database" (Tim Berners-Lee)
      Source: http://www.w3schools.com/web/web_semantic.asp
  34. Resource Description Framework
      • Is a language to describe resources on the web
      • Component of the Semantic Web
      • Data is self-describing
      • Triples: "subject", "predicate" and "value"
      • URIs are used to denote resources
  35. RDF
      • Graph database: nodes, edges
      • Well suited for knowledge representation
      • Beautified data => knowledge
  36. RDF Example
      <?xml version="1.0"?>
      <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:cd="http://www.recshop.fake/cd#">
        <rdf:Description
          rdf:about="http://www.recshop.fake/cd/Empire Burlesque">
          <cd:artist>Bob Dylan</cd:artist>
          <cd:country>USA</cd:country>
          <cd:company>Columbia</cd:company>
          <cd:price>10.90</cd:price>
          <cd:year>1985</cd:year>
        </rdf:Description>
      </rdf:RDF>
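
To show how the RDF/XML on slide 36 maps onto subject, predicate, and value triples, the sketch below reads it with Python's standard XML parser. It is a minimal reader for this one flat shape of document, not a general RDF/XML parser, and `triples_from_rdf_xml` is a name made up for illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = """<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:cd="http://www.recshop.fake/cd#">
  <rdf:Description rdf:about="http://www.recshop.fake/cd/Empire Burlesque">
    <cd:artist>Bob Dylan</cd:artist>
    <cd:country>USA</cd:country>
    <cd:company>Columbia</cd:company>
    <cd:price>10.90</cd:price>
    <cd:year>1985</cd:year>
  </rdf:Description>
</rdf:RDF>"""


def triples_from_rdf_xml(text):
    """Return (subject, predicate, value) tuples for flat rdf:Description blocks."""
    root = ET.fromstring(text)
    triples = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree writes tags as "{namespace}local"; join them into one URI.
            namespace, _, local_name = prop.tag[1:].partition("}")
            triples.append((subject, namespace + local_name, prop.text))
    return triples


for triple in triples_from_rdf_xml(RDF_XML):
    print(triple)
```

Each child element becomes one edge of the graph: the described resource is the subject, the element name is the predicate, and the element text is the value.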
  37. Semantic Web Example: DBpedia
      "Old school" Wikipedia:
      • http://en.wikipedia.org/wiki/Porsche_Panamera
      DBpedia entries:
      • http://dbpedia.org/page/Porsche_Panamera
      • http://dbpedia.org/page/Chromium_carbide
  38. Query Language: SPARQL ("sparkle")
      Query language for RDF:
      • Graph traversal
      • Matching the triples
      Example:
      Data:
        <http://example.org/book/book1> <http://purl.org/dc/elements/1.1/title> "SPARQL Tutorial" .
      Query:
        SELECT ?title
        WHERE {
          <http://example.org/book/book1> <http://purl.org/dc/elements/1.1/title> ?title .
        }
      Query result:
        title
        "SPARQL Tutorial"
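
"Matching the triples" on slide 38 can be sketched in a few lines. This is not a SPARQL engine, only an illustration of how one triple pattern with a `?variable` is matched against stored triples; the function name `match` is made up.

```python
def match(pattern, triples):
    """Yield one {variable: value} binding per triple that fits the pattern."""
    for triple in triples:
        bindings = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable binds to the value; a repeated variable must agree.
                if bindings.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield bindings


DATA = [
    ("http://example.org/book/book1",
     "http://purl.org/dc/elements/1.1/title",
     "SPARQL Tutorial"),
]

QUERY = ("http://example.org/book/book1",
         "http://purl.org/dc/elements/1.1/title",
         "?title")

print(list(match(QUERY, DATA)))
```

A full query engine joins the bindings of several such patterns; the single-pattern case is enough to reproduce the query on the slide.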
  39. To Infinity and Beyond
      • DB2 and Oracle are ready for this train
      • Object databases: Versant OODBMS, anybody?
      • Machine-readable data: will it become self-aware?
  40. "Data Finds Data" and Semantic Data Model: A Hypothesis
  41. Non-Obvious Relationship Awareness
      Diagram nodes: LÂM, BẢO
  42. Non-Obvious Relationship Awareness
      Diagram nodes: LÂM, LÂM's iPhone, BẢO
  43. Non-Obvious Relationship Awareness
      Diagram nodes: LÂM, LÂM's iPhone, BẢO, BẢO's SS Galaxy
  44. Diagram nodes: TheGioiDiDong.com, LÂM, LÂM's iPhone, BẢO, BẢO's SS Galaxy
  45. Diagram nodes: TheGioiDiDong.com, LÂM, LÂM's iPhone, BẢO, BẢO's SS Galaxy
  46. Diagram nodes: TheGioiDiDong.com, LÂM, LÂM's iPhone, BẢO, BẢO's SS Galaxy
      Connection detected!
      • Bao could have met Lam at Thegioididong?
      • They could have discussed their world domination scheme during the meeting there?
      • ???
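
The "connection detected" step on slides 41 to 46 can be sketched as a small graph search over triples. The predicate names `owns` and `linked_to` are assumptions (the slides show only nodes and edges), and the helper names are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Facts read off the diagram; the predicate names are assumed, not from the slides.
FACTS = [
    ("LÂM", "owns", "LÂM's iPhone"),
    ("BẢO", "owns", "BẢO's SS Galaxy"),
    ("LÂM's iPhone", "linked_to", "TheGioiDiDong.com"),
    ("BẢO's SS Galaxy", "linked_to", "TheGioiDiDong.com"),
]


def reachable(start, facts, max_hops=2):
    """Entities reachable from `start` by following at most `max_hops` edges."""
    neighbours = defaultdict(set)
    for subject, _predicate, obj in facts:
        neighbours[subject].add(obj)
    seen, frontier = set(), {start}
    for _ in range(max_hops):
        frontier = {n for node in frontier for n in neighbours[node]} - seen
        seen |= frontier
    return seen


def shared_entities(people, facts):
    """Map each pair of people to the entities both of them can reach."""
    reach = {person: reachable(person, facts) for person in people}
    return {(a, b): reach[a] & reach[b]
            for a, b in combinations(people, 2)
            if reach[a] & reach[b]}


print(shared_entities(["LÂM", "BẢO"], FACTS))
```

Neither person is linked to the other directly; the relationship only appears once both devices point at the same entity, which is the non-obvious part.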
  47. Diagram nodes: TheGioiDiDong.com, LÂM, LÂM's iPhone, BẢO, BẢO's SS Galaxy
  48. Data Visualization and Building a Portal from Open Data and Free Services
  49. Visualization of Data
      Top million web sites (per Alexa traffic data); the work was performed in early 2010.
      Source: http://nmap.org/favicon/
  50. Visualization of Data
  51. Second Life
      Second Life is a 3D world where everyone you see is a real person and every place you visit is built by people just like you.
  52. 3D Visualization in SL
  53. SL: The Opportunity for "Edutainment"
      • iSchool
      • Teaching: quizzes and lectures
      • Classrooms with PowerPoint
      • Research center
      • Drexel Island on Second Life
  54. 3-D Environments
      • http://3rdrockgrid.com/
      • http://www.secondlife.com/
      • http://www.craft-world.org
      • http://www.osgrid.org/
      • http://youralternativelife.com//
  55. Visualization to Suggest New Experiments
  56. Building a Portal from Open Data and Free Services
      • Freely hosted wiki service
      • Google Spreadsheet
      • Google Docs API / JavaScript
      • Visualization and analysis services (2D, 3D)
      • RDF / Semantic Web / web services
      • Cost: free or fit to the purpose
  57. Key to Success
      Diagram labels: Model + Transparency, Information, Data, Records
  58. Demonstration
      • Google Docs
      • Second Life
  59. References
      • O'Reilly, Beautiful Data, Chapter 16: Beautifying Data in the Real World
      • http://techland.time.com/2011/06/01/how-big-is-the-internet-spoiler-not-as-big-as-itll-be-in-2015/
      • http://drexelisland.wikispaces.com/
      • SMILES to 3D, Second Life: http://www.youtube.com/watch?v=tOfhuoRbnCg&feature=player_embedded