The document discusses beautifying data in the real world. It describes how much data exists on the internet, which is estimated to reach nearly 1,000 exabytes by 2015. It also discusses open notebook science, crowdsourcing data, and challenges with real world data like noise and barriers to presentation. Unique identifiers for chemicals and options for analyzing data are examined. The document proposes using semantic web technologies like RDF and SPARQL to build knowledge from beautified data and create non-obvious relationships. It demonstrates visualizing data through services like Google Docs and Second Life.
1. Instructor: Professor Lothar Piepmeyer
Beautifying Data
in the Real World
Group 5:
Toan Do - An Du
Vinh Nguyen - Tan Tran
2. How big is the data on the Internet?
2004: The Internet exceeded 1 EB for the first time
2005: Eric Schmidt estimated its size at 5 million
terabytes (~5 EB)
Cisco forecasts that in 2015, the size of the
Internet will reach nearly 1,000 EB
How big is it?
Source: http://www.wisegeek.com/how-big-is-the-internet.htm
http://techland.time.com/
3. If 1 byte = 0.5mm
Source: http://blog.fliptop.com/how-much-data-is-on-the-internet/
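The scale on this slide can be worked through as simple arithmetic. The sketch below assumes decimal units (1 EB = 10^18 bytes, which the slides do not specify) and uses an approximate value for the length of a light-year. It lays the bytes side by side at 0.5 mm each and reports the resulting length for the three sizes quoted on the previous slide.

```python
# Assumption: decimal units, 1 EB = 10**18 bytes (the slides do not say which).
BYTES_PER_EB = 10**18
MM_PER_BYTE = 0.5                  # scale taken from the slide
METERS_PER_LIGHT_YEAR = 9.4607e15  # approximate


def length_in_meters(exabytes: float) -> float:
    """Length in meters of a row of bytes, each 0.5 mm wide."""
    return exabytes * BYTES_PER_EB * MM_PER_BYTE / 1000.0


# Sizes quoted on the "How big is the data on the Internet?" slide.
for label, size_eb in [("2004", 1), ("2005", 5), ("2015 forecast", 1000)]:
    meters = length_in_meters(size_eb)
    light_years = meters / METERS_PER_LIGHT_YEAR
    print(f"{label}: {size_eb} EB -> {meters:.2e} m (~{light_years:.2f} light-years)")
```

Under these assumptions the 2015 forecast corresponds to a row of bytes tens of light-years long.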
4. Content
Introduction
The Open Notebook Science approach
Curating and presenting the data
Beautifying the data
Data Visualization & Building a portal from
open data and free services
Demonstration
5. Data on the internet
Source: http://news.bbc.co.uk/2/hi/technology/8562801.stm
6. Problems with data in the real world
(Scientific)
Noisy sources of data
Barriers to data presentation
OCR version
Text version
Human-readable
Machine readable
…
How to verify the data?
7. Open Notebook Science
Purpose: record the full raw data of scientific research
and make it available online
Benefits:
obtain detailed descriptions of procedures
improve the communication of science
speed up progress
reduce time lost due to the repetition of failed
experiments
…
14. Unique Identifiers for Chemical Entities
Standardize data
Facilitate the integration with other data sets
Consider 3 possibilities
CAS Registry Number
InChI
SMILES
15. CAS Registry Number
Proprietary
Cannot be converted to a chemical structure
Depends on an external organization to issue it
For example, the CAS number of water is 7732-18-5: the
checksum 5 is calculated as (8×1 + 1×2 + 2×3 + 3×4 + 7×5 +
7×6) = 105; 105 mod 10 = 5
http://en.wikipedia.org/wiki/CAS_registry_number
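The checksum rule from the water example can be sketched in a few lines. The function names below are illustrative and not part of any CAS tooling. The digits before the check digit are weighted 1, 2, 3, ... from right to left, and the sum is taken modulo 10.

```python
def cas_check_digit(body_digits: str) -> int:
    """Check digit for a CAS number body (all digits except the last one,
    hyphens removed). Weights 1, 2, 3, ... are applied from right to left."""
    total = sum(
        int(digit) * weight
        for weight, digit in enumerate(reversed(body_digits), start=1)
    )
    return total % 10


def is_valid_cas(cas_number: str) -> bool:
    """Validate the format 'digits-digits-digit' and the check digit."""
    parts = cas_number.split("-")
    if len(parts) != 3 or not all(part.isdigit() for part in parts):
        return False
    if len(parts[2]) != 1:
        return False
    return cas_check_digit(parts[0] + parts[1]) == int(parts[2])


print(cas_check_digit("773218"))  # water, 7732-18-5 -> prints 5
print(is_valid_cas("7732-18-5"))  # prints True
```

A check like this catches most single-digit typos, but it cannot tell whether the number was ever issued. That still depends on the external registry.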
16. InChI
IUPAC International Chemical Identifier
Freely usable and non-proprietary
Does not have to be assigned by an organization
Can be computed from structural information
Human readable (with practice)
http://en.wikipedia.org/wiki/Inchi
17. SMILES
Simplified molecular-input line-entry system
More human-readable than InChI
Can be converted to InChI
http://en.wikipedia.org/wiki/Simplified_molecular-input_line-entry_system
21. Google Docs API
Allows developers to create, retrieve, update, and
delete Google Docs files and collections
Also provides some advanced features like resource
archives, Optical Character
Recognition, translation, and revision history.
Useful to store data in the cloud, perform resource
management, convert document formats
https://developers.google.com/google-apps/documents-list/
22. Google Visualization API
Chart Library
JavaScript classes
Data Table
JavaScript DataTable class
Data Source
Chart Tools Datasource
protocol
https://developers.google.com/chart/interactive/docs/index
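As a rough illustration of the Data Table idea, the sketch below builds the JSON literal layout that the JavaScript DataTable class accepts: a `cols` list describing each column and a `rows` list of cells. The helper name `to_datatable` and the sample values are placeholders invented for this example, not data from the slides.

```python
import json


def to_datatable(columns, rows):
    """Build a DataTable-style JSON literal.

    columns: list of (id, label, type) tuples, e.g. ("name", "Name", "string")
    rows:    list of value tuples, one value per column
    """
    return {
        "cols": [
            {"id": col_id, "label": label, "type": col_type}
            for col_id, label, col_type in columns
        ],
        "rows": [{"c": [{"v": value} for value in row]} for row in rows],
    }


# Placeholder values for illustration only.
table = to_datatable(
    [("solvent", "Solvent", "string"), ("value", "Measured value", "number")],
    [("methanol", 0.5), ("ethanol", 0.3)],
)
print(json.dumps(table, indent=2))
```

A server that emits this structure could feed a chart on the JavaScript side. The exact request and response envelope of the Chart Tools Datasource protocol is not covered here.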
25. RESTful Web Service
Representational State Transfer (REST): a simpler alternative to
SOAP- and Web Services Description Language (WSDL)-based
Web services
Principles:
Use HTTP methods explicitly.
Be stateless.
Expose directory structure-like URIs.
Transfer XML, JavaScript Object
Notation (JSON), or both.
http://www.ibm.com/developerworks/webservices/library/ws-restful/
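The principles above can be sketched from the client side with Python's standard library. The service URL and the `compounds/...` resource path are hypothetical; the requests are only constructed, not sent. Each request names its HTTP method explicitly, addresses a directory-like URI, carries everything it needs (no session state), and exchanges JSON.

```python
import json
import urllib.request

BASE_URL = "http://example.org/api"  # hypothetical service, for illustration


def build_request(method, path, payload=None):
    """Create (but do not send) a self-contained request for one resource."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Accept": "application/json"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        f"{BASE_URL}/{path}", data=data, headers=headers, method=method
    )


# One resource, addressed by a directory-like URI, handled by explicit methods.
read = build_request("GET", "compounds/7732-18-5")
update = build_request("PUT", "compounds/7732-18-5", {"name": "water"})
remove = build_request("DELETE", "compounds/7732-18-5")

for request in (read, update, remove):
    print(request.get_method(), request.full_url)
```

Passing a request to `urllib.request.urlopen` would send it. That step is left out because the endpoint does not exist.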
26. Compare REST and SOAP
Who's using REST?
All of Yahoo's web services use REST, including Flickr. The
del.icio.us API, pubsub, Bloglines, and Technorati use it too, and
both eBay and Amazon offer web services for both
REST and SOAP.
Who's using SOAP?
Google seems to be consistent in implementing its
web services with SOAP, with the exception of
Blogger, which uses XML-RPC. You will find SOAP web
services in lots of enterprise software as well.
http://www.petefreitag.com/item/431.cfm
27. Compare REST and SOAP
REST:
Lightweight - not a lot of extra XML markup
Human-readable results
Easy to build - no toolkits required
SOAP:
Easy to consume - sometimes
Rigid - type checking, adheres to a contract
Development tools
29. An Effort to Aggregate Data from
Multiple Sources
Introducing ChemSpider
An online lookup engine for Chemists
http://www.chemspider.com
40 million substances
Multiple data sources
A "link farm" to other sources
33. Semantic Web
Describing things in a way that computer
applications can understand.
“The Beatles was a band from Liverpool”
Describes the relationships between things (like A
is a part of B and Y is a member of Z) and
the properties of things (like size, weight, age, and
price)
“..will make all the data in the world look like
one huge database“ – Tim Berners-Lee
http://www.w3schools.com/web/web_semantic.asp
34. Resource Description Framework
Is a language to describe resources on
the web
Component of the Semantic Web
Data is self-describing
Triples: "subject", "predicate" and "value" (the object)
URIs are used to denote resources
35. RDF
Graph Database
Nodes
Edges
Well-suited for Knowledge Representation
Beautified Data => Knowledge
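A minimal sketch of the graph view, using plain Python rather than an RDF library: each triple becomes a labeled edge from its subject node to its object node. The `ex:` names are made-up identifiers based on the Beatles sentence from the Semantic Web slide. A real data set would use full URIs.

```python
from collections import defaultdict

# Illustrative triples (subject, predicate, object); the ex: names are made up.
triples = [
    ("ex:TheBeatles", "rdf:type", "ex:Band"),
    ("ex:TheBeatles", "ex:origin", "ex:Liverpool"),
    ("ex:JohnLennon", "ex:memberOf", "ex:TheBeatles"),
]


def build_graph(triples):
    """Map each subject node to its outgoing (predicate, object) edges."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph


graph = build_graph(triples)
# Every subject and every object is a node; every triple is one edge.
nodes = {s for s, _, _ in triples} | {o for _, _, o in triples}

print(sorted(nodes))
for predicate, obj in graph["ex:TheBeatles"]:
    print("ex:TheBeatles", predicate, obj)
```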
38. Query Language: SPARQL (sparkle)
Query Language for RDF
Graph Traversal
Matching the triples
Example:
Data:
<http://example.org/book/book1> <http://purl.org/dc/elements/1.1/title> "SPARQL Tutorial" .
Query:
SELECT ?title
WHERE { <http://example.org/book/book1> <http://purl.org/dc/elements/1.1/title> ?title . }
Query Result: title "SPARQL Tutorial"
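To show what "matching the triples" means, here is a toy pattern matcher in plain Python. It is not a SPARQL engine, and `match_pattern` is a name chosen for this sketch. It only illustrates the core step: terms starting with `?` are variables, and every triple that agrees with the fixed terms yields one set of bindings.

```python
def match_pattern(triples, pattern):
    """Return one bindings dict per triple that matches the pattern.

    Pattern terms starting with '?' are variables; other terms must match
    the triple exactly. A variable used twice must bind to the same value.
    """
    results = []
    for triple in triples:
        bindings = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # setdefault keeps an earlier binding, so a mismatch is detected
                if bindings.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(bindings)
    return results


# The data and query from the slide, written as Python tuples.
data = [
    (
        "http://example.org/book/book1",
        "http://purl.org/dc/elements/1.1/title",
        "SPARQL Tutorial",
    )
]
pattern = (
    "http://example.org/book/book1",
    "http://purl.org/dc/elements/1.1/title",
    "?title",
)

print(match_pattern(data, pattern))  # prints [{'?title': 'SPARQL Tutorial'}]
```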
39. To Infinity and Beyond
• DB2 and Oracle are ready for this train
• Object Database
Versant OODBMS, anybody?
• Machine-Readable Data
Will they become self-aware?
46. TheGioiDiDong.com
[Diagram nodes: TheGioiDiDong.com, LÂM's iPhone, BẢO's SS Galaxy, LÂM, BẢO]
Connection Detected!
- Could Bao have met Lam at Thegioididong?
- Could they have discussed their world domination
scheme during the meeting there?
- ???
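A non-obvious relationship like this one can be found by looking for nodes that two people both reach in the graph. The sketch below is an assumption-laden illustration: the edge labels `owns` and `boughtAt` and the direction of the links are guesses at what the diagram showed, and the function names are invented for this example.

```python
from collections import defaultdict
from itertools import combinations

# Assumed edges; the slide only shows the node names, not the link labels.
triples = [
    ("LÂM", "owns", "LÂM's iPhone"),
    ("BẢO", "owns", "BẢO's SS Galaxy"),
    ("LÂM's iPhone", "boughtAt", "TheGioiDiDong.com"),
    ("BẢO's SS Galaxy", "boughtAt", "TheGioiDiDong.com"),
]


def reachable(start, triples):
    """All nodes reachable from start by following subject -> object edges."""
    edges = defaultdict(set)
    for subject, _, obj in triples:
        edges[subject].add(obj)
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for neighbor in edges[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen


def shared_connections(people, triples):
    """For each pair of people, the nodes that both of them can reach."""
    found = {}
    for first, second in combinations(people, 2):
        common = reachable(first, triples) & reachable(second, triples)
        if common:
            found[(first, second)] = common
    return found


print(shared_connections(["LÂM", "BẢO"], triples))
```

The shared node only suggests a possible connection. Whether the two people ever met is exactly the kind of question the slide leaves open.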
53. SL - The Opportunity for "Edutainment"
iSchool
Teaching: Quizzes and Lectures
Classrooms with PowerPoint
Research Center
Drexel Island on Second Life
56. Building A Portal From Open Data And
Free Services
Freely hosted Wiki service
Google Spreadsheet
Google Docs API / JavaScript
Visualization services / analysis services (2D, 3D)
RDF / Semantic Web / Web services
Cost: free or fit to the purpose
57. Key To Success
Model
+ Transparency
Information
Data
Records
59. References
O'Reilly, Beautiful Data, Chapter 16: Beautifying Data in the Real World
http://techland.time.com/2011/06/01/how-big-is-the-internet-spoiler-not-as-big-as-itll-be-in-2015/
http://drexelisland.wikispaces.com/
SMILES to 3D, Second Life: http://www.youtube.com/watch?v=tOfhuoRbnCg&feature=player_embedded