Instructor: Professor Lothar Piepmeyer




Beautifying Data
in the Real World
         Group 5:
     Toan Do - An Du
  Vinh Nguyen - Tan Tran

How big is the data on the Internet?


2004: Internet data exceeds 1 EB for the first time
2005: Eric Schmidt estimates it at 5 million
 terabytes (~5 EB)
Cisco forecasts that in 2015 the size of the
 Internet will reach nearly 1,000 EB

           How big is it?
                    Source: http://www.wisegeek.com/how-big-is-the-internet.htm
                                                      http://techland.time.com/
If 1 byte = 0.5mm




                    Source: http://blog.fliptop.com/how-much-data-is-on-the-internet/
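To make the "1 byte = 0.5 mm" comparison concrete, the arithmetic can be sketched as below. This is only an illustration: it assumes decimal units (1 EB = 10^18 bytes) and bytes laid end to end, and the function name `length_km` is made up for this sketch.

```python
# Assumptions (not from the slides): decimal exabytes, bytes laid end to end.
BYTE_LENGTH_MM = 0.5      # from the slide: 1 byte = 0.5 mm
EXABYTE = 10**18          # bytes per EB, decimal definition (assumed)
MM_PER_KM = 1_000_000     # 1 km = 1,000,000 mm


def length_km(exabytes: float) -> float:
    """Length in kilometres of `exabytes` of data at 0.5 mm per byte."""
    millimetres = exabytes * EXABYTE * BYTE_LENGTH_MM
    return millimetres / MM_PER_KM


if __name__ == "__main__":
    # The three sizes quoted on the previous slide.
    for label, size_eb in [("2004", 1), ("2005", 5), ("2015 forecast", 1000)]:
        print(f"{label}: {size_eb} EB -> {length_km(size_eb):.2e} km")
```

Under these assumptions 1 EB corresponds to roughly 5 × 10^11 km of bytes.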
Content



Introduction
The Open Notebook Science approach
Curating and presenting the data
Beautifying the data
Data Visualization & Building a portal from
 open data and free services
Demonstration
Data on the internet




                Source: http://news.bbc.co.uk/2/hi/technology/8562801.stm
Problems with data in the real world
(Scientific)


Noisy source of data
The barrier of data presentation
  OCR version
  Text version
  Human-readable
  Machine readable
  …
How to verify the data?
Open Notebook Science


Purpose: record the full raw data of scientific research
 and make it available online
Benefits:
   obtain detailed descriptions of procedures
   improve the communication of science
   speed up progress
   reduce time lost due to the repetition of failed
    experiments
   …
Apply ONS on free services
Crowdsourcing


a distributed problem-solving and
 production model
Crowdsourcing




                Source: http://r18ultrachair.com/
Validating crowdsourced data



According to ONS, all detailed data are
 recorded
Doubtful data are also kept and marked
 for
Unique Identifiers for Chemical
Entity



Standardize data

Facilitate the integration with other data sets

Consider 3 possibilities
   CAS Registry Number
   InChI
   SMILES
CAS Registry Number



 Proprietary

 Cannot be converted to a chemical structure

 Depends on an external organization to issue numbers

For example, the CAS number of water is 7732-18-5: the
   checksum 5 is calculated as (8×1 + 1×2 + 2×3 + 3×4 + 7×5 +
   7×6) = 105; 105 mod 10 = 5
http://en.wikipedia.org/wiki/CAS_registry_number
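The checksum rule from the water example can be sketched in a few lines. The function names `cas_check_digit` and `is_valid_cas` are made up for this illustration; the rule itself (rightmost digit weighted 1, the next 2, and so on, sum taken mod 10) is the one worked through above.

```python
def cas_check_digit(body: str) -> int:
    """Check digit for a CAS number body such as '7732-18' (water)."""
    digits = [int(ch) for ch in body if ch.isdigit()]
    # The rightmost digit gets weight 1, the next one weight 2, and so on.
    total = sum(weight * digit
                for weight, digit in enumerate(reversed(digits), start=1))
    return total % 10


def is_valid_cas(cas: str) -> bool:
    """True if the last group of `cas` matches the computed check digit."""
    body, _, check = cas.rpartition("-")
    if not body or len(check) != 1 or not check.isdigit():
        return False
    return cas_check_digit(body) == int(check)


if __name__ == "__main__":
    print(cas_check_digit("7732-18"))   # 8*1 + 1*2 + 2*3 + 3*4 + 7*5 + 7*6 = 105 -> 5
    print(is_valid_cas("7732-18-5"))    # True
```

Note that this only checks internal consistency; it cannot tell whether a number was actually issued, which is one reason the identifier depends on the issuing organization.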
InChI
 IUPAC International Chemical Identifier
 Freely usable and non-proprietary
 Does not have to be assigned by an organization
 Can be computed from structural information
 Human readable (with practice)




            http://en.wikipedia.org/wiki/Inchi
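"Human readable (with practice)" is easier to see once an InChI string is split into its layers. The sketch below is a simplification, not a full InChI parser: it assumes a string with a version and a formula followed by one-letter-prefixed layers, and the function name `split_inchi` is made up. The example string is the standard InChI of ethanol.

```python
def split_inchi(inchi: str) -> dict:
    """Split an InChI string into version, formula and prefixed layers.

    Simplifying assumption: the string has the shape
    'InChI=<version>/<formula>/<layer>/<layer>...'.
    """
    prefix, _, rest = inchi.partition("=")
    if prefix != "InChI" or not rest:
        raise ValueError("not an InChI string")
    version, formula, *layers = rest.split("/")
    result = {"version": version, "formula": formula}
    for layer in layers:
        # Each further layer starts with a one-letter tag,
        # e.g. 'c' (connectivity) or 'h' (hydrogen atoms).
        result[layer[0]] = layer[1:]
    return result


if __name__ == "__main__":
    ethanol = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
    for name, value in split_inchi(ethanol).items():
        print(name, value)
```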
SMILES

   Simplified molecular-input
    line-entry system

   More human-readable than
    InChI

   Can be converted to InChI




http://en.wikipedia.org/wiki/Simplified_molecular-input_line-entry_system
Analysis Options



Access to live data
Get Summary
Complex Statistical representations of
 models
Mark doubtful data for later
 consideration
Google Docs API


Allows developers to create, retrieve, update, and
 delete Google Docs files and collections
Also provides some advanced features like resource
 archives, Optical Character Recognition,
 translation, and revision history.
Useful to store data in the cloud, perform resource
 management, convert document formats


https://developers.google.com/google-apps/documents-list/
Google Visualization API


Chart Library
  JavaScript classes
Data Table
  JavaScript DataTable class
Data Source
  Chart Tools Datasource
   protocol

                        https://developers.google.com/chart/interactive/docs/index
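The DataTable is the piece that connects data to charts, so it helps to see its shape. The sketch below builds the JSON literal form of a DataTable (a `cols` list plus a `rows` list of `c`/`v` cells) on the server side in Python. The helper name `to_datatable` and the sample rows are made up for illustration, and the values are not real measurements.

```python
import json


def to_datatable(columns, rows):
    """Build a DataTable-style JSON literal.

    columns: list of (id, label, type) tuples; type is a DataTable column
             type such as "string" or "number".
    rows:    list of tuples, one value per column.
    """
    return {
        "cols": [{"id": cid, "label": label, "type": ctype}
                 for cid, label, ctype in columns],
        "rows": [{"c": [{"v": value} for value in row]} for row in rows],
    }


if __name__ == "__main__":
    # Hypothetical sample data, purely for illustration.
    table = to_datatable(
        columns=[("solvent", "Solvent", "string"),
                 ("runs", "Number of runs", "number")],
        rows=[("methanol", 12), ("ethanol", 7)],
    )
    print(json.dumps(table, indent=2))
```

A structure like this could then be handed to the JavaScript DataTable class in the page that draws the chart.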
https://google-developers.appspot.com/chart/interactive/docs/gallery
RESTful Web Service


 Representational State Transfer: a simpler alternative to
  Web services based on SOAP and the Web Services
  Description Language (WSDL)
 Principles:
      Use HTTP methods explicitly.
      Be stateless.
      Expose directory structure-like URIs.
      Transfer XML, JavaScript Object Notation (JSON),
       or both.

http://www.ibm.com/developerworks/webservices/library/ws-restful/
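The four principles can be illustrated without a web framework. The sketch below is a hypothetical request handler, not a real service: the `/compounds/<name>` URI scheme, the `handle` function, and the in-memory `store` are assumptions made for this example. Each call carries everything needed to process it (stateless), the HTTP method decides the action, the URI looks like a directory path, and the payload is JSON.

```python
import json
from typing import Tuple


def handle(method: str, path: str, body: str, store: dict) -> Tuple[int, str]:
    """Map one HTTP-style request to (status code, JSON response body)."""
    parts = [p for p in path.split("/") if p]
    # Directory-like URI: /compounds/<name>
    if len(parts) != 2 or parts[0] != "compounds":
        return 404, json.dumps({"error": "not found"})
    name = parts[1]

    if method == "GET":            # read a resource
        if name not in store:
            return 404, json.dumps({"error": "not found"})
        return 200, json.dumps(store[name])
    if method == "PUT":            # create or replace a resource
        store[name] = json.loads(body)
        return 200, json.dumps(store[name])
    if method == "DELETE":         # remove a resource
        if store.pop(name, None) is None:
            return 404, json.dumps({"error": "not found"})
        return 204, ""
    return 405, json.dumps({"error": "method not allowed"})


if __name__ == "__main__":
    resources = {}
    print(handle("PUT", "/compounds/water", '{"cas": "7732-18-5"}', resources))
    print(handle("GET", "/compounds/water", "", resources))
    print(handle("DELETE", "/compounds/water", "", resources))
    print(handle("GET", "/compounds/water", "", resources))
```

No session is kept between calls; the only state is the resources themselves, which is what "be stateless" asks for.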
Compare REST and SOAP


Who's using REST?
     All of Yahoo's web services use REST, including Flickr;
      the del.icio.us API uses it, as do pubsub, bloglines, and
      technorati; eBay and Amazon both offer web services
      for both REST and SOAP.
Who's using SOAP?
     Google seems to be consistent in implementing their
      web services to use SOAP, with the exception of
      Blogger, which uses XML-RPC. You will find SOAP web
      services in lots of enterprise software as well.
http://www.petefreitag.com/item/431.cfm
Compare REST and SOAP



REST
 Lightweight - not a lot of extra xml markup
 Human Readable Results
 Easy to build - no toolkits required

SOAP
 Easy to consume - sometimes
 Rigid - type checking, adheres to a contract
 Development tools
An Effort to Aggregate Data from
Multiple Sources



Introducing ChemSpider
  An online lookup engine for Chemists
     http://www.chemspider.com
     40 million substances
     Multiple data sources
     A "link farm" to other sources
What is "wrong" with
  wikipedia.com?


Wikipedia.com


Not “wrong”:

   Very informative for human readers
Wikipedia.com


This little guy is left behind

  Not machine-readable
Semantic Web

Describing things in a way that computer
 applications can understand.
   “The Beatles was a band from Liverpool”
Describes the relationships between things (like A
 is a part of B and Y is a member of Z) and
 the properties of things (like size, weight, age, and
 price)
“... will make all the data in the world look like
 one huge database” - Tim Berners-Lee
                             http://www.w3schools.com/web/web_semantic.asp
Resource Description Framework

Is a language to describe resources on
 the web
Component of the Semantic Web
Data is self-describing
  Triples: "subject", "predicate" and "value"
  URIs are used to denote resources
RDF

Graph Database
  Nodes
  Edges




Well-suited for Knowledge Representation
  Beautified Data => Knowledge
RDF Example

<?xml version="1.0"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:cd="http://www.recshop.fake/cd#">
  <rdf:Description
      rdf:about="http://www.recshop.fake/cd/Empire Burlesque">
    <cd:artist>Bob Dylan</cd:artist>
    <cd:country>USA</cd:country>
    <cd:company>Columbia</cd:company>
    <cd:price>10.90</cd:price>
    <cd:year>1985</cd:year>
  </rdf:Description>
</rdf:RDF>
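To see the triples hiding in this document, the example can be unpacked with Python's standard XML parser. This is a sketch only: it handles just the simple `rdf:Description` with literal-valued properties used here, not RDF/XML in general (a dedicated RDF library would be the usual choice), and the function name `triples` is made up.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = """<?xml version="1.0"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:cd="http://www.recshop.fake/cd#">
<rdf:Description
rdf:about="http://www.recshop.fake/cd/Empire Burlesque">
  <cd:artist>Bob Dylan</cd:artist>
  <cd:country>USA</cd:country>
  <cd:company>Columbia</cd:company>
  <cd:price>10.90</cd:price>
  <cd:year>1985</cd:year>
</rdf:Description>
</rdf:RDF>"""


def triples(rdf_xml: str):
    """Return (subject, predicate, value) tuples for simple descriptions."""
    root = ET.fromstring(rdf_xml)
    result = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree spells tags as "{namespace}local"; the predicate
            # URI is the namespace followed directly by the local name.
            predicate = prop.tag[1:].replace("}", "", 1)
            result.append((subject, predicate, prop.text))
    return result


if __name__ == "__main__":
    for triple in triples(RDF_XML):
        print(triple)
```

Each property element becomes one triple about the same subject, which is the self-describing structure the previous slides refer to.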
Semantic Web Example: DBpedia

“Old School” wikipedia:
     http://en.wikipedia.org/wiki/Porsche_Panamera


DBpedia entries

   http://dbpedia.org/page/Porsche_Panamera
   http://dbpedia.org/page/Chromium_carbide
Query Language: SPARQL (sparkle)

Query Language for RDF
    Graph Traversal
    Matching the triples
Example:
    Data:
<http://example.org/book/book1> <http://purl.org/dc/elements/1.1/title> "SPARQL Tutorial"

    Query:
  SELECT ?title
  WHERE { <http://example.org/book/book1> <http://purl.org/dc/elements/1.1/title> ?title . }

    Query Result:           title "SPARQL Tutorial"
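"Matching the triples" can be sketched with a toy matcher. This is not a SPARQL engine: it matches a single triple pattern against a list of triples, treats any term starting with `?` as a variable, and the function name `match` is made up. Real SPARQL joins several such patterns and adds much more.

```python
def match(pattern, triples):
    """Yield one dict of variable bindings per triple matching `pattern`.

    Terms starting with '?' are variables; all other terms must be equal.
    """
    for triple in triples:
        bindings = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice must bind to the same value.
                if bindings.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield bindings


DATA = [
    ("http://example.org/book/book1",
     "http://purl.org/dc/elements/1.1/title",
     "SPARQL Tutorial"),
]

PATTERN = (
    "http://example.org/book/book1",
    "http://purl.org/dc/elements/1.1/title",
    "?title",
)

if __name__ == "__main__":
    for row in match(PATTERN, DATA):
        print(row["?title"])
```

Run against the data above, the pattern binds `?title` to "SPARQL Tutorial", mirroring the query result on the slide.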
To Infinity and Beyond

• DB2 and Oracle are ready for this train

• Object Database
    Versant OODBMS, anybody?

• Machine-Readable Data
    Will they become self-aware?

“Data Finds Data” and Semantic Data Model: A Hypothesis
Non-Obvious Relationship Awareness

[Diagram, built up over successive slides: LÂM and BẢO; then LÂM’s iPhone and BẢO’s SS Galaxy; then TheGioiDiDong.com]
Connection Detected!
 - Bao could have met Lam at Thegioididong?
 - They could have discussed their world domination
   scheme during the meeting there?
 - ???
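The idea of detecting a non-obvious connection can be sketched as a path search over a small graph. The edges below are assumptions made for this illustration (each person linked to their device, each device linked to TheGioiDiDong.com); the slides show the entities but not the exact links. The function name `shortest_path` is also made up.

```python
from collections import deque

# Hypothetical links between the entities shown on the slides.
EDGES = [
    ("LÂM", "LÂM's iPhone"),
    ("BẢO", "BẢO's SS Galaxy"),
    ("LÂM's iPhone", "TheGioiDiDong.com"),
    ("BẢO's SS Galaxy", "TheGioiDiDong.com"),
]


def shortest_path(edges, start, goal):
    """Breadth-first search over undirected edges; returns a path or None."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(neighbours.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


if __name__ == "__main__":
    path = shortest_path(EDGES, "LÂM", "BẢO")
    print(" -> ".join(path) if path else "no connection")
```

A path found this way only suggests a possible relationship, which is why the slide phrases its conclusions as questions.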
Data Visualization

Building a portal from open data and free services
Visualization of Data




                        Top million web sites (per Alexa traffic data);
                        scan performed in early 2010


                        Source http://nmap.org/favicon/
Visualization of Data
Second Life
Second Life is a 3D world where everyone you see is a real person and
every place you visit is built by people just like you.
3D Visualization in SL
SL - The Opportunity for "Edutainment"

Drexel Island on Second Life:
  iSchool
  Teaching: Quizzes and Lectures
  Classrooms with PowerPoint
  Research Center
3-D Environments

  http://www.secondlife.com/
  http://3rdrockgrid.com/
  http://www.craft-world.org
  http://www.osgrid.org/
  http://youralternativelife.com//
Visualization To Suggest New
Experiments
Building A Portal From Open Data And
 Free Services


 Freely hosted Wiki service
 Google Spreadsheet
 Google Docs API / JavaScript
 Visualization services / analysis services (2D, 3D)
 RDF / Semantic Web / Web services
 Cost: free or fit to the purpose
Key To Success

[Diagram, top to bottom: Model, Information, Data, Records; plus Transparency]
Demonstration
 Google Docs
 Second Life
References


O'Reilly, Beautiful Data, Chapter 16:
 Beautifying Data in the Real World
http://techland.time.com/2011/06/01/how-big-is-the-internet-spoiler-not-as-big-as-itll-be-in-2015/
http://drexelisland.wikispaces.com/
SMILES to 3D, Second Life:
 http://www.youtube.com/watch?v=tOfhuoRbnCg&feature=player_embedded

CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
 
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXXPhrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
 
Language Across the Curriculm LAC B.Ed.
Language Across the  Curriculm LAC B.Ed.Language Across the  Curriculm LAC B.Ed.
Language Across the Curriculm LAC B.Ed.
 
How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...
 
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
 
Basic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumersBasic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumers
 
The approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxThe approach at University of Liverpool.pptx
The approach at University of Liverpool.pptx
 
Palestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptxPalestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptx
 

Beautifying Data in the real world

  • 1. Instructor: Professor Lothar Piepmeyer. Beautifying Data in the Real World. Group 5: Toan Do, An Du, Vinh Nguyen, Tan Tran
  • 2. How big is the data on the Internet? 2004: the Internet exceeds 1 EB for the first time. 2005: Eric Schmidt estimated it at 5 million terabytes (~5 EB). Cisco forecasts that in 2015 the size of the Internet will reach nearly 1,000 EB. How big is that? Source: http://www.wisegeek.com/how-big-is-the-internet.htm, http://techland.time.com/
  • 3. If 1 byte = 0.5 mm. Source: http://blog.fliptop.com/how-much-data-is-on-the-internet/
  • 4. Content: Introduction; the Open Notebook Science approach; curating and presenting the data; beautifying the data; data visualization and building a portal from open data and free services; demonstration
  • 5. Data on the internet Source: http://news.bbc.co.uk/2/hi/technology/8562801.stm
  • 6. Problems of data in the real world (scientific): noisy sources of data; barriers of data presentation (OCR version, text version, human-readable, machine-readable, …); how to verify the data?
  • 7. Open Notebook Science. Purpose: record the full raw data of scientific research and make it available online. Benefits: obtain detailed descriptions of procedures; improve the communication of science; speed up progress; reduce time lost due to the repetition of failed experiments; …
  • 8. Apply ONS on free services
  • 12. Crowdsourcing Source: http://r18ultrachair.com/
  • 13. Validating crowdsourced data: according to ONS, all detailed data are recorded. Doubtful data are also kept and marked for
  • 14. Unique Identifiers for Chemical Entity: standardize data; facilitate the integration with other data sets. Consider 3 possibilities: CAS Registry Number, InChI, SMILES
  • 15. CAS Registry Number: proprietary; cannot be converted to a chemical structure; depends on an external organization to issue it. For example, the CAS number of water is 7732-18-5: the checksum 5 is calculated as (8×1 + 1×2 + 2×3 + 3×4 + 7×5 + 7×6) = 105; 105 mod 10 = 5. http://en.wikipedia.org/wiki/CAS_registry_number
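The check-digit arithmetic on the CAS slide can be sketched in a few lines of Python. This is a minimal illustration, not an official CAS tool: the function names `cas_check_digit` and `is_valid_cas` are made up for this example, and the only rule assumed is the one stated on the slide (weight the digits 1, 2, 3, … from the right, excluding the final digit, and take the sum modulo 10).

```python
def cas_check_digit(cas_number: str) -> int:
    """Compute the expected check digit of a CAS number such as '7732-18-5'."""
    # Drop the hyphens and set aside the final digit, which is the check digit itself.
    digits = cas_number.replace("-", "")
    body = digits[:-1]
    # Weight the remaining digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(int(d) * weight for weight, d in enumerate(reversed(body), start=1))
    return total % 10


def is_valid_cas(cas_number: str) -> bool:
    """Return True when the final digit matches the computed check digit."""
    digits = cas_number.replace("-", "")
    if len(digits) < 2 or not digits.isdigit():
        return False
    return cas_check_digit(cas_number) == int(digits[-1])


if __name__ == "__main__":
    print(cas_check_digit("7732-18-5"))  # water, the example from the slide
    print(is_valid_cas("7732-18-4"))     # same number with a wrong check digit
```

A check like this only catches transcription errors; it says nothing about whether the number has actually been assigned to a substance.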
  • 16. InChI: IUPAC International Chemical Identifier; freely usable and non-proprietary; does not have to be assigned by an organization; can be computed from structural information; human-readable (with practice). http://en.wikipedia.org/wiki/Inchi
  • 17. SMILES: Simplified molecular-input line-entry system; more human-readable than InChI; can be converted to InChI. http://en.wikipedia.org/wiki/Simplified_molecular-input_line-entry_system
  • 19. Analysis Options: access to live data; get a summary; complex statistical representations of models; mark doubtful data for later consideration
  • 21. Google Docs API Allows developers to create, retrieve, update, and delete Google Docs files and collections Also provides some advanced features like resource archives, Optical Character Recognition, translation, and revision history. Useful to store data in the cloud, perform resource management, convert document formats https://developers.google.com/google-apps/documents-list/
  • 22. Google Visualization API. Chart Library: JavaScript classes. Data Table: JavaScript DataTable class. Data Source: Chart Tools Datasource protocol. https://developers.google.com/chart/interactive/docs/index
  • 25. RESTful Web Service: Representational State Transfer, a simpler alternative to SOAP and Web Services Description Language (WSDL) based Web services. Principles: use HTTP methods explicitly; be stateless; expose directory structure-like URIs; transfer XML, JavaScript Object Notation (JSON), or both. http://www.ibm.com/developerworks/webservices/library/ws-restful/
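The four principles on the REST slide can be illustrated with a small in-process sketch. Everything here is an assumption made for illustration: the `handle` function, the `/compounds/<name>` URI layout, and the dictionary used as a resource store are hypothetical and not part of any service mentioned in the slides. No network code is involved, so the dispatch logic can be read and run on its own.

```python
import json


def handle(method, path, body, store):
    """Dispatch one request against `store`, a dict of resources.

    Each call carries everything it needs (method, path, body); the handler
    keeps no per-client session between calls, which is the "stateless" rule.
    """
    # Directory-like URI: /compounds/<name>
    parts = [p for p in path.strip("/").split("/") if p]
    if len(parts) != 2 or parts[0] != "compounds":
        return 404, json.dumps({"error": "not found"})
    name = parts[1]

    # HTTP methods are used explicitly: GET reads, PUT writes, DELETE removes.
    if method == "GET":
        if name not in store:
            return 404, json.dumps({"error": "not found"})
        return 200, json.dumps(store[name])
    if method == "PUT":
        created = name not in store
        store[name] = json.loads(body)
        return (201 if created else 200), json.dumps(store[name])
    if method == "DELETE":
        if store.pop(name, None) is None:
            return 404, json.dumps({"error": "not found"})
        return 204, ""
    return 405, json.dumps({"error": "method not allowed"})


if __name__ == "__main__":
    resources = {}
    print(handle("PUT", "/compounds/water", '{"cas": "7732-18-5"}', resources))
    print(handle("GET", "/compounds/water", None, resources))
    print(handle("DELETE", "/compounds/water", None, resources))
```

Representations are exchanged as JSON here; an XML variant would differ only in how the body is serialized.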
  • 26. Compare REST and SOAP. Who's using REST? All of Yahoo's web services use REST, including Flickr; the del.icio.us API uses it, as do pubsub, bloglines, and technorati; eBay and Amazon have web services for both REST and SOAP. Who's using SOAP? Google seems to be consistent in implementing its web services with SOAP, with the exception of Blogger, which uses XML-RPC. You will find SOAP web services in lots of enterprise software as well. http://www.petefreitag.com/item/431.cfm
  • 27. Compare REST and SOAP. REST: lightweight (not a lot of extra XML markup); human-readable results; easy to build (no toolkits required). SOAP: easy to consume (sometimes); rigid (type checking, adheres to a contract); development tools.
  • 29. An Effort to Aggregate Data from Multiple Sources. Introducing ChemSpider, an online lookup engine for chemists: http://www.chemspider.com. 40 million substances; multiple data sources; a "link farm" to other sources
  • 30. What is "wrong" with wikipedia.com?
  • 31. Wikipedia.com: not "wrong"; very informative for human beings
  • 32. Wikipedia.com This little guy is left behind Not machine-readable
  • 33. Semantic Web: describing things in a way that computer applications can understand. "The Beatles was a band from Liverpool". Describes the relationships between things (like A is a part of B and Y is a member of Z) and the properties of things (like size, weight, age, and price). "...will make all the data in the world look like one huge database" (Tim Berners-Lee). http://www.w3schools.com/web/web_semantic.asp
  • 34. Resource Description Framework: a language to describe resources on the web; a component of the Semantic Web; data is self-describing; triples: "subject", "predicate" and "value"; URIs are used to denote resources
  • 35. RDF Graph Database Nodes Edges Well-suited for Knowledge Representation Beautified Data => Knowledge
  • 36. RDF Example <?xml version="1.0"?> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:cd="http://www.recshop.fake/cd#"> <rdf:Description rdf:about="http://www.recshop.fake/cd/Empire Burlesque"> <cd:artist>Bob Dylan</cd:artist> <cd:country>USA</cd:country> <cd:company>Columbia</cd:company> <cd:price>10.90</cd:price> <cd:year>1985</cd:year> </rdf:Description> </rdf:RDF>
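To make the link between the RDF/XML example and the "subject, predicate, value" triples concrete, here is a minimal Python sketch that reads the slide's document with the standard library and lists its triples. It is an illustration under narrow assumptions: the helper name `extract_triples` is made up, and it only handles the flat `rdf:Description` pattern shown on the slide, not RDF/XML in general (a dedicated RDF library would be the usual choice for real data).

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The document from the slide, reproduced as-is.
RDF_XML = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:cd="http://www.recshop.fake/cd#">
  <rdf:Description rdf:about="http://www.recshop.fake/cd/Empire Burlesque">
    <cd:artist>Bob Dylan</cd:artist>
    <cd:country>USA</cd:country>
    <cd:company>Columbia</cd:company>
    <cd:price>10.90</cd:price>
    <cd:year>1985</cd:year>
  </rdf:Description>
</rdf:RDF>"""


def extract_triples(rdf_xml):
    """Return (subject, predicate, value) tuples for flat rdf:Description blocks."""
    root = ET.fromstring(rdf_xml)
    triples = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree stores qualified names as "{namespace}local";
            # concatenating the two parts gives the predicate URI.
            namespace, local = prop.tag[1:].split("}")
            triples.append((subject, namespace + local, prop.text))
    return triples


if __name__ == "__main__":
    for triple in extract_triples(RDF_XML):
        print(triple)
```

Each child element of the description becomes one triple, so the five properties of the CD yield five triples sharing the same subject URI.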
  • 37. Semantic Web Example: DBpedia. "Old school" Wikipedia: http://en.wikipedia.org/wiki/Porsche_Panamera. DBpedia entries: http://dbpedia.org/page/Porsche_Panamera, http://dbpedia.org/page/Chromium_carbide
  • 38. Query Language: SPARQL (pronounced "sparkle"). Query language for RDF; graph traversal; matching the triples. Example: Data: <http://example.org/book/book1> <http://purl.org/dc/elements/1.1/title> "SPARQL Tutorial" . Query: SELECT ?title WHERE { <http://example.org/book/book1> <http://purl.org/dc/elements/1.1/title> ?title . } Query Result: title = "SPARQL Tutorial"
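The "matching the triples" idea behind the SPARQL example can be sketched without a SPARQL engine. The code below is a toy matcher for a single triple pattern, written only to show how a variable such as `?title` gets bound; the function name `match_pattern` is hypothetical, and real SPARQL supports far more (joins across patterns, filters, optional parts).

```python
def match_pattern(triples, pattern):
    """Match one (subject, predicate, value) pattern against a list of triples.

    Terms starting with "?" are variables; every other term must match exactly.
    Returns one dict of variable bindings per matching triple.
    """
    results = []
    for triple in triples:
        binding = {}
        for want, have in zip(pattern, triple):
            if want.startswith("?"):
                # Bind the variable, or reject if it was already bound differently.
                if binding.setdefault(want, have) != have:
                    break
            elif want != have:
                break
        else:
            results.append(binding)
    return results


if __name__ == "__main__":
    data = [
        ("http://example.org/book/book1",
         "http://purl.org/dc/elements/1.1/title",
         "SPARQL Tutorial"),
    ]
    query = ("http://example.org/book/book1",
             "http://purl.org/dc/elements/1.1/title",
             "?title")
    print(match_pattern(data, query))
```

The pattern plays the role of the WHERE clause on the slide, and the returned bindings correspond to the rows of the query result.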
  • 39. To Infinity and Beyond: DB2 and Oracle are ready for this train. Object database Versant OODBMS, anybody? Machine-readable data: will it become self-aware?
  • 40. "Data Finds Data" and Semantic Data Model: A Hypothesis
  • 42. Non-Obvious Relationship Awareness: LÂM's iPhone, LÂM, BẢO
  • 43. Non-Obvious Relationship Awareness: LÂM's iPhone, BẢO's SS Galaxy, LÂM, BẢO
  • 44. TheGioiDiDong.com: LÂM's iPhone, BẢO's SS Galaxy, LÂM, BẢO
  • 45. TheGioiDiDong.com: LÂM's iPhone, BẢO's SS Galaxy, LÂM, BẢO
  • 46. TheGioiDiDong.com: LÂM's iPhone, BẢO's SS Galaxy, LÂM, BẢO. Connection detected! Could Bảo have met Lâm at TheGioiDiDong? Could they have discussed their world domination scheme during the meeting there? ???
  • 47. TheGioiDiDong.com: LÂM's iPhone, BẢO's SS Galaxy, LÂM, BẢO
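The "connection detected" step in the slides above can be sketched as a grouping problem: two people become related once their records point at the same third entity. The snippet is only an illustration of that idea. The record layout (owner, device, place) and the function name `find_shared_links` are assumptions, and the slides do not say how each device relates to TheGioiDiDong.com, so the link is treated here as an unspecified association.

```python
from collections import defaultdict
from itertools import combinations


def find_shared_links(observations):
    """Return (owner_a, owner_b, place) for every pair of owners sharing a place.

    `observations` is a list of (owner, device, place) records.
    """
    owners_by_place = defaultdict(set)
    for owner, _device, place in observations:
        owners_by_place[place].add(owner)

    connections = []
    for place, owners in sorted(owners_by_place.items()):
        # Every pair of distinct owners linked to the same place is a candidate relationship.
        for owner_a, owner_b in combinations(sorted(owners), 2):
            connections.append((owner_a, owner_b, place))
    return connections


if __name__ == "__main__":
    records = [
        ("LÂM", "iPhone", "TheGioiDiDong.com"),
        ("BẢO", "SS Galaxy", "TheGioiDiDong.com"),
    ]
    print(find_shared_links(records))
```

A shared link is only a hint, as the question marks on the slide suggest; whether the two people actually met is something the data alone cannot confirm.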
  • 48. Data Visualization; Building a portal from open data and free services
  • 49. Visualization of Data: a scan of the top million web sites (per Alexa traffic data) was performed in early 2010. Source: http://nmap.org/favicon/
  • 51. Second Life Second Life is a 3D world where everyone you see is a real person and every place you visit is built by people just like you.
  • 53. SL- The Opportunity for "Edutainment" iSchool Teaching: Quizzes and Lectures Classrooms with Powerpoint Research Center Drexel Island on Second Life
  • 54. 3-D Environments http://3rdrockgrid.com/ http://www.secondlife.com/ http://www.craft-world.org http://www.osgrid.org/ http://youralternativelife.com//
  • 55. Visualization To Suggest New Experiments
  • 56. Building A Portal From Open Data And Free Services: freely hosted wiki service; Google Spreadsheet; Google Docs API / JavaScript; visualization and analysis services (2D, 3D); RDF / Semantic Web / web services. Cost: free or fit to the purpose
  • 57. Key To Success Model + Transparency Information Data Records
  • 59. References: O'Reilly, Beautiful Data, Chapter 16: Beautifying Data in the Real World; http://techland.time.com/2011/06/01/how-big-is-the-internet-spoiler-not-as-big-as-itll-be-in-2015/; http://drexelisland.wikispaces.com/; SMILE to 3D, Second Life: http://www.youtube.com/watch?v=tOfhuoRbnCg&feature=player_embedded