From Big Data to
Valuable Knowledge
Gerard de Melo, Tsinghua University
http://gerard.demelo.org
25 Years of the World Wide Web:
1989−2014
http://geekcom.wordpress.com/2009/03/19/
Tim Berners-Lee
Big Data on the Web
Theological Hall, Strahov Monastery Library, Prague
Main Challenge So Far: Scale
Matej Kren: Idiom. Prague Municipal Library https://www.flickr.com/photos/ill-padrino/6437837857/
Developing for Scalability
Official Hadoop WordCount v1.0, excluding imports and improvements in WordCount v2.0
Developing for Scalability
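import com.twitter.scalding._

// Word count as a Scalding job: read lines, split into words, count each word.
class WordCountJob(args : Args) extends Job(args) {
  TextLine(args("input"))
    // The regex needs the backslash: \s+ matches runs of whitespace.
    .flatMap('line -> 'word) { line : String => line.split("""\s+""") }
    .groupBy('word) { _.size }
    .write(Tsv(args("output")))
}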
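The same flatMap/groupBy shape can be tried without a cluster. The following is a minimal sketch using only the Scala standard library; the object name `LocalWordCount` and the sample lines are assumptions made for illustration and are not part of the talk.

```scala
object LocalWordCount {
  // Split each line on runs of whitespace and count how often each word occurs.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("""\s+"""))  // whitespace tokenization, as in the job above
      .filter(_.nonEmpty)           // a leading space would yield an empty token
      .groupBy(identity)            // word -> all of its occurrences
      .map { case (word, occurrences) => word -> occurrences.size }

  def main(args: Array[String]): Unit = {
    // Sample input, made up for this sketch.
    val counts = wordCount(Seq("big data on the web", "the web of data"))
    counts.toSeq.sortBy(_._1).foreach { case (w, n) => println(s"$w\t$n") }
  }
}
```

The local version holds everything in memory, which is exactly the limitation that Scalding and Spark remove while keeping the code nearly as short.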
Developing for Scalability
Apache Spark Twitter's Scalding
Knowledge Organization
Image: http://commons.wikimedia.org/wiki/File:Mundaneum_Tir%C3%A4ng_Karteikaarten.jpg
Universal Bibliographic Repertory
(Répertoire Bibliographique Universel, RBU)
by Paul Otlet and Henri La Fontaine in 1895
index cards with answers to queries
Knowledge Organization
Image: Mundaneum
Alex Wright: This was a sort of
“analog search engine”
Current Challenge: Knowledge Organization
Alexandre Duret-Lutz
https://www.flickr.com/photos/gadl/110845690/
25 Years of the World Wide Web:
1989−2014
HyperText
(the “HT” in
“HTML”)
Basic Idea:
Connecting Data
http://geekcom.wordpress.com/2009/03/19/
Tim Berners-Lee
25 Years of the World Wide Web:
1989−2014
Source: Ivan Herman. Introduction to Semantic Web Technologies
Data really
needs to be more
connected!
The Web of Data:
Linked Data
Semantic Web Journal 2014
Interdisciplinary work, e.g. in Digital Humanities
The Web of Data:
Lexvo.org
Source: Peter Mika
Entity Integration:
Challenges
ACL 2010
AAAI 2013
Entity Integration:
Challenges
One bad link is enough to make a connected component inconsistent
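To make the point concrete, here is a minimal sketch that groups entities by their equivalence links with a union-find structure and then checks the resulting components against pairs known to be distinct. The identifiers, the names `SameAsComponents`, `components` and `violations`, and the distinctness rule (two articles from the same language edition describe different things) are assumptions chosen for illustration, not the method of the cited papers.

```scala
object SameAsComponents {
  // Group entity identifiers into connected components of equivalence links.
  def components(links: Seq[(String, String)]): Map[String, Set[String]] = {
    val parent = scala.collection.mutable.Map.empty[String, String]

    // Find the representative of x, registering x as its own root on first sight.
    def find(x: String): String = {
      val p = parent.getOrElseUpdate(x, x)
      if (p == x) x
      else {
        val root = find(p)
        parent(x) = root // path compression
        root
      }
    }

    links.foreach { case (a, b) =>
      val (ra, rb) = (find(a), find(b))
      if (ra != rb) parent(ra) = rb
    }
    parent.keys.toSeq
      .groupBy(x => find(x))
      .map { case (root, members) => root -> members.toSet }
  }

  // Return every "known distinct" pair that ended up inside one component.
  def violations(links: Seq[(String, String)],
                 knownDistinct: Seq[(String, String)]): Seq[(String, String)] = {
    val comps = components(links).values.toSeq
    knownDistinct.filter { case (a, b) => comps.exists(c => c(a) && c(b)) }
  }
}
```

With two correct links between the articles on the country Georgia, no constraint is violated. Adding a single wrong link to the article on the U.S. state pulls it into the same component, and the whole component becomes inconsistent even though every other link in it is fine.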
ACL 2010
AAAI 2013
Entity Integration:
Challenges
Min. cost solution:
NP-hard
APX-hard
Entity Integration
ACL 2010
AAAI 2013
Our Solution:
Use a linear program, then apply region growing techniques
→ Logarithmic Approximation Guarantee
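For orientation, the textbook linear-programming relaxation of minimum multicut has the following shape. It is shown here only as an assumed stand-in: the exact program in the cited ACL 2010 and AAAI 2013 papers may differ. The symbols are introduced for this sketch: $G = (V, E)$ is the link graph, $c_e$ is the cost of removing link $e$, $d_e$ is the fractional "length" assigned to $e$, and $\mathcal{P}_i$ is the set of paths between the $i$-th pair of entities that must end up separated.

```latex
\begin{aligned}
\min \quad & \sum_{e \in E} c_e \, d_e \\
\text{s.t.} \quad & \sum_{e \in p} d_e \ge 1 && \forall i,\ \forall p \in \mathcal{P}_i \\
& d_e \ge 0 && \forall e \in E
\end{aligned}
```

Region growing turns a fractional solution into an actual set of removed links by growing balls around the entities to be separated and cutting at a cheap radius. For $k$ pairs, the classical analysis bounds the cost of the result by $O(\log k)$ times the LP optimum, which is the kind of logarithmic guarantee the slide refers to.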
Taxonomic Links
A user wants a list of "Art Schools in Europe"
Taxonomic Integration:
MENTA Approach
De Melo & Weikum (2010).
CIKM Best Interdisciplinary Paper Award
UWN/MENTA
Multilingual extension of WordNet covering word senses and taxonomic information for over 200 languages
Relation Extraction
Images: Denilson Barbosa, Haixun Wang, Cong Yu. Shallow Information Extraction for the Knowledge Web
Scaling Up:
Tandon, de Melo & Weikum.
AAAI 2011, COLING 2012
Relation Integration
Equivalent:
MetaWeb was acquired by Google.
MetaWeb was just recently acquired by Google.
MetaWeb, surprisingly, was acquired by Google.
MetaWeb was bought out by Google.
Google bought MetaWeb.
Google acquired MetaWeb.
MetaWeb was sold to Google.
Google's acquisition of MetaWeb.
Google's MetaWeb acquisition.
and so on...
Underlying frame:
Commercial transfer
● Capture the “who-did-what-to-whom”
● Microsoft bought the patent from Nokia.
Nokia sold the patent to Microsoft.
The patent was acquired by Microsoft [from Nokia].
The patent was sold [by Nokia] to Microsoft.
Relation Integration
Buyer: Microsoft
Seller: Nokia
Product: The patent
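The mapping from the four surface forms to one frame instance can be sketched as follows. The hand-written regular expressions, the case class `CommercialTransfer`, and the lower-casing of the product phrase are assumptions made for this illustration; real systems rely on parsing and semantic role labelling rather than fixed patterns.

```scala
object CommerceFrames {
  // One frame instance: who bought what from whom (the seller may be unstated).
  final case class CommercialTransfer(buyer: String, seller: Option[String], product: String)

  // Illustrative surface patterns for the four example sentences only.
  private val BuyActive   = """(.+) bought (.+) from (.+)\.""".r
  private val SellActive  = """(.+) sold (.+) to (.+)\.""".r
  private val BuyPassive  = """(.+) was acquired by (\S+)(?: from (.+))?\.""".r
  private val SellPassive = """(.+) was sold (?:by (.+) )?to (.+)\.""".r

  // Passive patterns are tried first, because "X was sold by Y to Z."
  // would otherwise also fit the active "sold ... to" pattern.
  def parse(sentence: String): Option[CommercialTransfer] = sentence match {
    case BuyPassive(product, buyer, seller)  =>
      Some(CommercialTransfer(buyer, Option(seller), norm(product)))
    case SellPassive(product, seller, buyer) =>
      Some(CommercialTransfer(buyer, Option(seller), norm(product)))
    case BuyActive(buyer, product, seller)   =>
      Some(CommercialTransfer(buyer, Some(seller), norm(product)))
    case SellActive(seller, product, buyer)  =>
      Some(CommercialTransfer(buyer, Some(seller), norm(product)))
    case _ => None
  }

  // Normalize the product phrase so "The patent" and "the patent" compare equal.
  private def norm(phrase: String): String = phrase.toLowerCase
}
```

All four example sentences yield the same value, with Microsoft as buyer, Nokia as seller and the patent as product. When the optional seller phrase is left out of a passive sentence, the unmatched group is null and `Option(seller)` records it as `None`.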
Relation Integration:
FrameBase.org
Bringing knowledge into a standard form
based on natural language (FrameNet)
Relation Integration
X isAuthorOf Y
Y writtenBy X
X wrote Y
Y writtenInYear Z
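One simple way to reconcile such predicates is an alignment table that maps each source predicate to a canonical one and records whether subject and object must be swapped. The sketch below assumes a hypothetical canonical name `authorOf` and a hand-made table; it is not the FrameBase mapping itself.

```scala
object PredicateAlignment {
  final case class Triple(subject: String, predicate: String, obj: String)

  // Hypothetical alignment table: source predicate -> (canonical predicate, swap arguments?).
  private val alignment: Map[String, (String, Boolean)] = Map(
    "isAuthorOf" -> ("authorOf", false),
    "wrote"      -> ("authorOf", false),
    "writtenBy"  -> ("authorOf", true)
  )

  // Rewrite a triple into the canonical vocabulary; unknown predicates pass through unchanged.
  def canonicalize(t: Triple): Triple = alignment.get(t.predicate) match {
    case Some((canonical, true))  => Triple(t.obj, canonical, t.subject)
    case Some((canonical, false)) => Triple(t.subject, canonical, t.obj)
    case None                     => t
  }
}
```

After canonicalization the three authorship statements collapse into a single triple, while a statement such as `writtenInYear` stays separate because it expresses a different fact about the same work.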
Relation Integration
YAGO: isMarriedTo predicate
Freebase: Marriage Entity
Challenge: Modelling Differences
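The two modelling styles can be contrasted in a few lines. In this sketch the types `MarriageNode` and `IsMarriedTo` and the field `since` are hypothetical stand-ins for the reified and the binary representation; they do not reproduce the actual Freebase or YAGO schemas.

```scala
object MarriageModels {
  // Reified style: the marriage is itself a node that points to both spouses
  // and can carry further attributes such as a start year.
  final case class MarriageNode(id: String, spouses: Set[String], since: Option[Int])

  // Binary style: a direct edge between two people, with no room for attributes.
  final case class IsMarriedTo(a: String, b: String)

  // Flatten a reified marriage into binary edges, one per ordered pair of spouses.
  // The `since` attribute has no counterpart in the binary model and is dropped.
  def toBinary(m: MarriageNode): Set[IsMarriedTo] =
    for {
      a <- m.spouses
      b <- m.spouses
      if a != b
    } yield IsMarriedTo(a, b)
}
```

Going from the reified form to the binary form loses information, and going the other way requires inventing a node for each edge, which is why the two sources cannot be merged by simply renaming predicates.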
Search Interfaces
“Which companies were created during the
last century in Silicon Valley ?”
YAGO2:
WWW 2011
Best Demo Award
Real Understanding?
Knowledge Bases keep growing, but
much of the Web is still not truly understood
Real Understanding?
Source: CMU NELL Browser 2015-03-17
Over 4000 countries with >90% confidence
Noisy Patterns
Future Challenge: Real Understanding
Voynich Manuscript, early 15th century
From Big Data to Knowledge
Image: Brett Ryder
Machine Learning
[Diagram labels: Examples; Learning; Model; Classifier; Prediction; Correct; Incorrect; callout "Probably Incorrect!"]
Better Machine Learning
[Diagram labels: Examples; Learning; Model; Classifier; Prediction; Correct; Incorrect; callouts "Probably Incorrect!" and "Better Model! + Better Labels for Test Data"]
Conversation
Always there to answer questions
Learning Common-Sense
I'm cold.
Warm coffee and tea are available at
Costa Coffee just around the corner.
But don't forget your meeting with
Linda in half an hour!
Learning Common-Sense:
From Big Data?
WebChild
AAAI 2014
WSDM 2014
AAAI 2011
WebChild: Learning
Common-Sense From Big Data
Why do you think Mary put on the
ring at the end of the movie?
Yes, that was a powerful scene. The fact
that she put it on after reading the
letter from her mother indicates
that she may have changed
her mind about the value of ...
Future: Learning Advanced
Common-Sense Knowledge?
Summary
Big Data is radically changing the world
Main Challenge in the Past: Scale
Main Current Challenge: Organization
1. Entity Integration
2. Taxonomic Integration
3. Relation Extraction and Integration
Main Future Challenge: Real Understanding
by learning from weak signals

More Related Content

Viewers also liked

IoT Knowledge Forum Slides by Deepak Gupta
IoT Knowledge Forum Slides by Deepak GuptaIoT Knowledge Forum Slides by Deepak Gupta
IoT Knowledge Forum Slides by Deepak GuptaTechXpla
 
Knowledge Engineering Management System on Cloud Technology for Externship St...
Knowledge Engineering Management System on Cloud Technology for Externship St...Knowledge Engineering Management System on Cloud Technology for Externship St...
Knowledge Engineering Management System on Cloud Technology for Externship St...Prachyanun Nilsook
 
Knowledge Management System(KMS)
Knowledge Management System(KMS)Knowledge Management System(KMS)
Knowledge Management System(KMS)ayush goyal
 
Webinar: Evolution of Data Management for the IoT
Webinar: Evolution of Data Management for the IoTWebinar: Evolution of Data Management for the IoT
Webinar: Evolution of Data Management for the IoTSnapLogic
 
Knowledge management system
Knowledge management system Knowledge management system
Knowledge management system Setyagus Sucipto
 
Types of knowledge management systems
Types of knowledge management systemsTypes of knowledge management systems
Types of knowledge management systemsNitin Reddy Katkam
 
Knowledge Management System & Technology
Knowledge Management System & TechnologyKnowledge Management System & Technology
Knowledge Management System & TechnologyElijah Ezendu
 
Knowledge Management Presentation
Knowledge Management PresentationKnowledge Management Presentation
Knowledge Management Presentationkreaume
 
Introduction to Knowledge Management
Introduction to Knowledge ManagementIntroduction to Knowledge Management
Introduction to Knowledge ManagementMiera Idayu
 

Viewers also liked (10)

IoT Knowledge Forum Slides by Deepak Gupta
IoT Knowledge Forum Slides by Deepak GuptaIoT Knowledge Forum Slides by Deepak Gupta
IoT Knowledge Forum Slides by Deepak Gupta
 
Knowledge Engineering Management System on Cloud Technology for Externship St...
Knowledge Engineering Management System on Cloud Technology for Externship St...Knowledge Engineering Management System on Cloud Technology for Externship St...
Knowledge Engineering Management System on Cloud Technology for Externship St...
 
Knowledge Management System(KMS)
Knowledge Management System(KMS)Knowledge Management System(KMS)
Knowledge Management System(KMS)
 
Webinar: Evolution of Data Management for the IoT
Webinar: Evolution of Data Management for the IoTWebinar: Evolution of Data Management for the IoT
Webinar: Evolution of Data Management for the IoT
 
Knowledge Management System
Knowledge Management SystemKnowledge Management System
Knowledge Management System
 
Knowledge management system
Knowledge management system Knowledge management system
Knowledge management system
 
Types of knowledge management systems
Types of knowledge management systemsTypes of knowledge management systems
Types of knowledge management systems
 
Knowledge Management System & Technology
Knowledge Management System & TechnologyKnowledge Management System & Technology
Knowledge Management System & Technology
 
Knowledge Management Presentation
Knowledge Management PresentationKnowledge Management Presentation
Knowledge Management Presentation
 
Introduction to Knowledge Management
Introduction to Knowledge ManagementIntroduction to Knowledge Management
Introduction to Knowledge Management
 

Similar to From Big Data to Valuable Knowledge

Semantic Web: In Quest for the Next Generation Killer Apps
Semantic Web: In Quest for the Next Generation Killer AppsSemantic Web: In Quest for the Next Generation Killer Apps
Semantic Web: In Quest for the Next Generation Killer AppsJie Bao
 
Mike_Nelson_Amplify 11
Mike_Nelson_Amplify 11Mike_Nelson_Amplify 11
Mike_Nelson_Amplify 11AmplifyFest
 
Bigdata 2014... the year it rained tacos
Bigdata 2014... the year it rained tacosBigdata 2014... the year it rained tacos
Bigdata 2014... the year it rained tacosUpstarts.tv
 
Data science training institute in hyderabad
Data science training institute in hyderabadData science training institute in hyderabad
Data science training institute in hyderabadKelly Technologies
 
Linked In Data & Web Content Management Systems by TERMINALFOUR
Linked In Data & Web Content Management Systems by TERMINALFOURLinked In Data & Web Content Management Systems by TERMINALFOUR
Linked In Data & Web Content Management Systems by TERMINALFOURptintori
 
Introduction to question answering for linked data & big data
Introduction to question answering for linked data & big dataIntroduction to question answering for linked data & big data
Introduction to question answering for linked data & big dataAndre Freitas
 
Big data-and-creativity v.1
Big data-and-creativity v.1Big data-and-creativity v.1
Big data-and-creativity v.1Kim Flintoff
 
government in the 2.0 era [2008 IACA Conference]
government in the 2.0 era [2008 IACA Conference]government in the 2.0 era [2008 IACA Conference]
government in the 2.0 era [2008 IACA Conference]Hillary Hartley
 
The Googlization of Business Intelligence
The Googlization of Business IntelligenceThe Googlization of Business Intelligence
The Googlization of Business IntelligenceSander Duivestein ✔
 
090827 Information Society Future Of And Digital Media Trends
090827   Information Society   Future Of And Digital Media Trends090827   Information Society   Future Of And Digital Media Trends
090827 Information Society Future Of And Digital Media Trendspetter
 
Linked Data and the Future Internet Architecture: A motivation: Stefan Decker...
Linked Data and the Future Internet Architecture: A motivation: Stefan Decker...Linked Data and the Future Internet Architecture: A motivation: Stefan Decker...
Linked Data and the Future Internet Architecture: A motivation: Stefan Decker...FIA2010
 
Social Networking & Microblogging For Museums
Social Networking & Microblogging For MuseumsSocial Networking & Microblogging For Museums
Social Networking & Microblogging For MuseumsTristan Denmark
 
Social media course 2010 2011: what's going on online?
Social media course 2010   2011: what's going on online?Social media course 2010   2011: what's going on online?
Social media course 2010 2011: what's going on online?guillaume ereteo
 
THE IMPACT OF DIGITAL COMMUNICATION ON SOCIAL NETWORK
THE IMPACT OF DIGITAL COMMUNICATION ON SOCIAL NETWORKTHE IMPACT OF DIGITAL COMMUNICATION ON SOCIAL NETWORK
THE IMPACT OF DIGITAL COMMUNICATION ON SOCIAL NETWORKAbdul Razaq
 
The web bang project michele zadra
The web bang project michele zadraThe web bang project michele zadra
The web bang project michele zadraMichele Zadra
 
Preparing for the Impact of Web 3.0
Preparing for the Impact of Web 3.0 Preparing for the Impact of Web 3.0
Preparing for the Impact of Web 3.0 Judy O'Connell
 
21st century learning environments costa
21st century learning environments costa21st century learning environments costa
21st century learning environments costaAdam Garry
 
Hadoop, Iot and Analytics- The Three Musketeers
Hadoop, Iot and Analytics- The Three MusketeersHadoop, Iot and Analytics- The Three Musketeers
Hadoop, Iot and Analytics- The Three MusketeersEdureka!
 

Similar to From Big Data to Valuable Knowledge (20)

Semantic Web: In Quest for the Next Generation Killer Apps
Semantic Web: In Quest for the Next Generation Killer AppsSemantic Web: In Quest for the Next Generation Killer Apps
Semantic Web: In Quest for the Next Generation Killer Apps
 
Mike_Nelson_Amplify 11
Mike_Nelson_Amplify 11Mike_Nelson_Amplify 11
Mike_Nelson_Amplify 11
 
Virtual World Basics
Virtual World BasicsVirtual World Basics
Virtual World Basics
 
Bigdata 2014... the year it rained tacos
Bigdata 2014... the year it rained tacosBigdata 2014... the year it rained tacos
Bigdata 2014... the year it rained tacos
 
Data science training institute in hyderabad
Data science training institute in hyderabadData science training institute in hyderabad
Data science training institute in hyderabad
 
Linked In Data & Web Content Management Systems by TERMINALFOUR
Linked In Data & Web Content Management Systems by TERMINALFOURLinked In Data & Web Content Management Systems by TERMINALFOUR
Linked In Data & Web Content Management Systems by TERMINALFOUR
 
Introduction to question answering for linked data & big data
Introduction to question answering for linked data & big dataIntroduction to question answering for linked data & big data
Introduction to question answering for linked data & big data
 
Big data-and-creativity v.1
Big data-and-creativity v.1Big data-and-creativity v.1
Big data-and-creativity v.1
 
government in the 2.0 era [2008 IACA Conference]
government in the 2.0 era [2008 IACA Conference]government in the 2.0 era [2008 IACA Conference]
government in the 2.0 era [2008 IACA Conference]
 
The Googlization of Business Intelligence
The Googlization of Business IntelligenceThe Googlization of Business Intelligence
The Googlization of Business Intelligence
 
090827 Information Society Future Of And Digital Media Trends
090827   Information Society   Future Of And Digital Media Trends090827   Information Society   Future Of And Digital Media Trends
090827 Information Society Future Of And Digital Media Trends
 
Linked Data and the Future Internet Architecture: A motivation: Stefan Decker...
Linked Data and the Future Internet Architecture: A motivation: Stefan Decker...Linked Data and the Future Internet Architecture: A motivation: Stefan Decker...
Linked Data and the Future Internet Architecture: A motivation: Stefan Decker...
 
Social Networking & Microblogging For Museums
Social Networking & Microblogging For MuseumsSocial Networking & Microblogging For Museums
Social Networking & Microblogging For Museums
 
Social media course 2010 2011: what's going on online?
Social media course 2010   2011: what's going on online?Social media course 2010   2011: what's going on online?
Social media course 2010 2011: what's going on online?
 
THE IMPACT OF DIGITAL COMMUNICATION ON SOCIAL NETWORK
THE IMPACT OF DIGITAL COMMUNICATION ON SOCIAL NETWORKTHE IMPACT OF DIGITAL COMMUNICATION ON SOCIAL NETWORK
THE IMPACT OF DIGITAL COMMUNICATION ON SOCIAL NETWORK
 
The web bang project michele zadra
The web bang project michele zadraThe web bang project michele zadra
The web bang project michele zadra
 
Semantic Web, an introduction
Semantic Web, an introductionSemantic Web, an introduction
Semantic Web, an introduction
 
Preparing for the Impact of Web 3.0
Preparing for the Impact of Web 3.0 Preparing for the Impact of Web 3.0
Preparing for the Impact of Web 3.0
 
21st century learning environments costa
21st century learning environments costa21st century learning environments costa
21st century learning environments costa
 
Hadoop, Iot and Analytics- The Three Musketeers
Hadoop, Iot and Analytics- The Three MusketeersHadoop, Iot and Analytics- The Three Musketeers
Hadoop, Iot and Analytics- The Three Musketeers
 

More from Gerard de Melo

SEMAC Graph Node Embeddings for Link Prediction
SEMAC Graph Node Embeddings for Link PredictionSEMAC Graph Node Embeddings for Link Prediction
SEMAC Graph Node Embeddings for Link PredictionGerard de Melo
 
How to Manage your Research
How to Manage your ResearchHow to Manage your Research
How to Manage your ResearchGerard de Melo
 
Knowlywood: Mining Activity Knowledge from Hollywood Narratives
Knowlywood: Mining Activity Knowledge from Hollywood NarrativesKnowlywood: Mining Activity Knowledge from Hollywood Narratives
Knowlywood: Mining Activity Knowledge from Hollywood NarrativesGerard de Melo
 
Learning Multilingual Semantics from Big Data on the Web
Learning Multilingual Semantics from Big Data on the WebLearning Multilingual Semantics from Big Data on the Web
Learning Multilingual Semantics from Big Data on the WebGerard de Melo
 
Scalable Learning Technologies for Big Data Mining
Scalable Learning Technologies for Big Data MiningScalable Learning Technologies for Big Data Mining
Scalable Learning Technologies for Big Data MiningGerard de Melo
 
Searching the Web of Data (Tutorial)
Searching the Web of Data (Tutorial)Searching the Web of Data (Tutorial)
Searching the Web of Data (Tutorial)Gerard de Melo
 
From Linked Data to Tightly Integrated Data
From Linked Data to Tightly Integrated DataFrom Linked Data to Tightly Integrated Data
From Linked Data to Tightly Integrated DataGerard de Melo
 
Information Extraction from Web-Scale N-Gram Data
Information Extraction from Web-Scale N-Gram DataInformation Extraction from Web-Scale N-Gram Data
Information Extraction from Web-Scale N-Gram DataGerard de Melo
 
UWN: A Large Multilingual Lexical Knowledge Base
UWN: A Large Multilingual Lexical Knowledge BaseUWN: A Large Multilingual Lexical Knowledge Base
UWN: A Large Multilingual Lexical Knowledge BaseGerard de Melo
 
Multilingual Text Classification using Ontologies
Multilingual Text Classification using OntologiesMultilingual Text Classification using Ontologies
Multilingual Text Classification using OntologiesGerard de Melo
 
Extracting Sense-Disambiguated Example Sentences From Parallel Corpora
Extracting Sense-Disambiguated Example Sentences From Parallel CorporaExtracting Sense-Disambiguated Example Sentences From Parallel Corpora
Extracting Sense-Disambiguated Example Sentences From Parallel CorporaGerard de Melo
 
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined EvidenceTowards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined EvidenceGerard de Melo
 
Not Quite the Same: Identity Constraints for the Web of Linked Data
Not Quite the Same: Identity Constraints for the Web of Linked DataNot Quite the Same: Identity Constraints for the Web of Linked Data
Not Quite the Same: Identity Constraints for the Web of Linked DataGerard de Melo
 
Good, Great, Excellent: Global Inference of Semantic Intensities
Good, Great, Excellent: Global Inference of Semantic IntensitiesGood, Great, Excellent: Global Inference of Semantic Intensities
Good, Great, Excellent: Global Inference of Semantic IntensitiesGerard de Melo
 
YAGO-SUMO: Integrating YAGO into the Suggested Upper Merged Ontology
YAGO-SUMO: Integrating YAGO into the Suggested Upper Merged OntologyYAGO-SUMO: Integrating YAGO into the Suggested Upper Merged Ontology
YAGO-SUMO: Integrating YAGO into the Suggested Upper Merged OntologyGerard de Melo
 

More from Gerard de Melo (15)

SEMAC Graph Node Embeddings for Link Prediction
SEMAC Graph Node Embeddings for Link PredictionSEMAC Graph Node Embeddings for Link Prediction
SEMAC Graph Node Embeddings for Link Prediction
 
How to Manage your Research
How to Manage your ResearchHow to Manage your Research
How to Manage your Research
 
Knowlywood: Mining Activity Knowledge from Hollywood Narratives
Knowlywood: Mining Activity Knowledge from Hollywood NarrativesKnowlywood: Mining Activity Knowledge from Hollywood Narratives
Knowlywood: Mining Activity Knowledge from Hollywood Narratives
 
Learning Multilingual Semantics from Big Data on the Web
Learning Multilingual Semantics from Big Data on the WebLearning Multilingual Semantics from Big Data on the Web
Learning Multilingual Semantics from Big Data on the Web
 
Scalable Learning Technologies for Big Data Mining
Scalable Learning Technologies for Big Data MiningScalable Learning Technologies for Big Data Mining
Scalable Learning Technologies for Big Data Mining
 
Searching the Web of Data (Tutorial)
Searching the Web of Data (Tutorial)Searching the Web of Data (Tutorial)
Searching the Web of Data (Tutorial)
 
From Linked Data to Tightly Integrated Data
From Linked Data to Tightly Integrated DataFrom Linked Data to Tightly Integrated Data
From Linked Data to Tightly Integrated Data
 
Information Extraction from Web-Scale N-Gram Data
Information Extraction from Web-Scale N-Gram DataInformation Extraction from Web-Scale N-Gram Data
Information Extraction from Web-Scale N-Gram Data
 
UWN: A Large Multilingual Lexical Knowledge Base
UWN: A Large Multilingual Lexical Knowledge BaseUWN: A Large Multilingual Lexical Knowledge Base
UWN: A Large Multilingual Lexical Knowledge Base
 
Multilingual Text Classification using Ontologies
Multilingual Text Classification using OntologiesMultilingual Text Classification using Ontologies
Multilingual Text Classification using Ontologies
 
Extracting Sense-Disambiguated Example Sentences From Parallel Corpora
Extracting Sense-Disambiguated Example Sentences From Parallel CorporaExtracting Sense-Disambiguated Example Sentences From Parallel Corpora
Extracting Sense-Disambiguated Example Sentences From Parallel Corpora
 
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined EvidenceTowards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
 
Not Quite the Same: Identity Constraints for the Web of Linked Data
Not Quite the Same: Identity Constraints for the Web of Linked DataNot Quite the Same: Identity Constraints for the Web of Linked Data
Not Quite the Same: Identity Constraints for the Web of Linked Data
 
Good, Great, Excellent: Global Inference of Semantic Intensities
Good, Great, Excellent: Global Inference of Semantic IntensitiesGood, Great, Excellent: Global Inference of Semantic Intensities
Good, Great, Excellent: Global Inference of Semantic Intensities
 
YAGO-SUMO: Integrating YAGO into the Suggested Upper Merged Ontology
YAGO-SUMO: Integrating YAGO into the Suggested Upper Merged OntologyYAGO-SUMO: Integrating YAGO into the Suggested Upper Merged Ontology
YAGO-SUMO: Integrating YAGO into the Suggested Upper Merged Ontology
 

Recently uploaded

Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 

From Big Data to Valuable Knowledge

  • 1. From Big Data to Valuable Knowledge. Gerard de Melo, Tsinghua University. http://gerard.demelo.org
  • 2. 25 Years of the World Wide Web: 1989−2014. http://geekcom.wordpress.com/2009/03/19/ Tim Berners-Lee
  • 3. Big Data on the Web. Theological Hall, Strahov Monastery Library, Prague
  • 4. Main Challenge So Far: Scale. Matej Kren: Idiom. Prague Municipal Library https://www.flickr.com/photos/ill-padrino/6437837857/
  • 7. Developing for Scalability: Apache Spark, Twitter's Scalding. Word count in Scalding:
      import com.twitter.scalding._
      class WordCountJob(args : Args) extends Job(args) {
        TextLine(args("input"))
          // split each input line on runs of whitespace, emitting one 'word per token
          .flatMap('line -> 'word) { line : String => line.split("""\s+""") }
          // count the occurrences of each distinct word
          .groupBy('word) { _.size }
          .write(Tsv(args("output")))
      }
  • 8. Knowledge Organization. Universal Bibliographic Repertory (Repertoire Bibliographique Universel, RBU) by Paul Otlet and Henri La Fontaine in 1895: index cards with answers to queries. Image: http://commons.wikimedia.org/wiki/File:Mundaneum_Tir%C3%A4ng_Karteikaarten.jpg
  • 9. Knowledge Organization. Universal Bibliographic Repertory (Repertoire Bibliographique Universel, RBU) by Paul Otlet and Henri La Fontaine in 1895: index cards with answers to queries. Alex Wright: This was a sort of “analog search engine”. Image: Mundaneum
  • 10. Current Challenge: Knowledge Organization. Alexandre Duret-Lutz https://www.flickr.com/photos/gadl/110845690/
  • 11. 25 Years of the World Wide Web: 1989−2014. HyperText (the “HT” in “HTML”). Basic Idea: Connecting Data. http://geekcom.wordpress.com/2009/03/19/ Tim Berners-Lee
  • 12. 25 Years of the World Wide Web: 1989−2014. Data really needs to be more connected! Source: Ivan Herman. Introduction to Semantic Web Technologies
  • 13. The Web of Data: Linked Data
  • 14. The Web of Data: Lexvo.org. Semantic Web Journal 2014. Interdisciplinary Work, e.g. in Digital Humanities
  • 15. Entity Integration: Challenges. Source: Peter Mika
  • 17. Entity Integration: Challenges. ACL 2010, AAAI 2013
  • 18. Entity Integration: Challenges. One bad link is enough to make a connected component inconsistent. ACL 2010, AAAI 2013
  • 19. Entity Integration. Min. cost solution: NP-hard, APX-hard. Our Solution: Use Linear Program and then apply region growing techniques → Logarithmic Approximation Guarantee. ACL 2010, AAAI 2013
  • 20. Taxonomic Links: a user wants a list of “Art Schools in Europe”
  • 21. Taxonomic Integration: MENTA Approach. De Melo & Weikum (2010). CIKM Best Interdisciplinary Paper Award
  • 22. Taxonomic Integration: MENTA Approach. De Melo & Weikum (2010). CIKM Best Interdisciplinary Paper Award
  • 23. Taxonomic Integration: MENTA Approach. De Melo & Weikum (2010). CIKM Best Interdisciplinary Paper Award
  • 24. Taxonomic Integration: MENTA Approach. De Melo & Weikum (2010). CIKM Best Interdisciplinary Paper Award
  • 25. UWN/MENTA: multilingual extension of WordNet with word senses and taxonomic information for over 200 languages
  • 26. Relation Extraction. Scaling Up: Tandon, de Melo & Weikum. AAAI 2011, COLING 2012. Images: Denilson Barbosa, Haixun Wang, Cong Yu. Shallow Information Extraction for the Knowledge Web
  • 27. Relation Integration. Equivalent: MetaWeb was acquired by Google. MetaWeb was just recently acquired by Google. MetaWeb, surprisingly, was acquired by Google. MetaWeb was bought out by Google. Google bought MetaWeb. Google acquired MetaWeb. MetaWeb was sold to Google. Google's acquisition of MetaWeb. Google's MetaWeb acquisition. And so on...
  • 28. Relation Integration. Underlying frame: Commercial transfer. Capture the “who-did-what-to-whom”. Microsoft bought the patent from Nokia. Nokia sold the patent to Microsoft. The patent was acquired by Microsoft [from Nokia]. The patent was sold [by Nokia] to Microsoft. Buyer: Microsoft; Seller: Nokia; Product: The patent
  • 29. Relation Integration: FrameBase.org. Bringing knowledge into a standard form based on natural language (FrameNet)
  • 30. Relation Integration. X isAuthorOf Y; Y writtenBy X; X wrote Y; Y writtenInYear Z
  • 31. Relation Integration. Challenge: Modelling Differences. YAGO: isMarriedTo predicate. Freebase: Marriage Entity
  • 32. Search Interfaces. “Which companies were created during the last century in Silicon Valley?” YAGO2: WWW 2011 Best Demo Award
  • 33. Real Understanding? Knowledge Bases keep growing, but much of the Web is still not truly understood
  • 34. Real Understanding? Noisy Patterns. Over 4000 countries with >90% confidence. Source: CMU NELL Browser 2015-03-17
  • 35. Future Challenge: Real Understanding. Voynich Manuscript, early 15th century
  • 36. From Big Data to Knowledge. Image: Brett Ryder
  • 37. Machine Learning (diagram labels: Examples, Learning, Classifier, Model, Prediction, Correct, Incorrect, “Probably Incorrect!”)
  • 38. Better Machine Learning (diagram labels: Examples, Learning, Classifier, Model, Prediction, Correct, Incorrect, “Probably Incorrect!”, “Better Model! + Better Labels for Test Data”)
  • 39. Conversation. Always there to answer questions
  • 40. Learning Common-Sense. “I'm cold.” “Warm coffee and tea are available at Costa Coffee just around the corner. But don't forget your meeting with Linda in half an hour!”
  • 41. Learning Common-Sense: From Big Data?
  • 42. WebChild: Learning Common-Sense From Big Data. AAAI 2014, WSDM 2014, AAAI 2011
  • 43. Future: Learning Advanced Common-Sense Knowledge? “Why do you think Mary put on the ring at the end of the movie?” “Yes, that was a powerful scene. The fact that she put it on after reading the letter from her mother indicates that she may have changed her mind about the value of ...”
  • 44. Summary. Big Data is radically changing the world. Main Challenge in the Past: Scale. Main Current Challenge: Organization (1. Entity Integration, 2. Taxonomic Integration, 3. Relation Extraction and Integration). Main Future Challenge: Real Understanding by learning from weak signals
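
The remark on slide 18, that one bad link is enough to make a connected component inconsistent, can be illustrated with a small sketch. This is not the method from the cited papers (slide 19 names a linear program with region growing for the repair step); it only shows how the problem arises. It assumes, purely for illustration, that two different entries from the same source must never end up in the same component, and the node names, the `(source, title)` encoding, and both function names are hypothetical.

```python
from collections import defaultdict


def connected_components(nodes, links):
    """Group nodes into components using union-find over undirected links."""
    parent = {n: n for n in nodes}

    def find(x):
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        parent[find(a)] = find(b)

    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    return list(groups.values())


def inconsistent_components(nodes, links):
    """Return components that contain two different nodes from the same source."""
    bad = []
    for comp in connected_components(nodes, links):
        sources = [source for source, _ in comp]
        # Assumed constraint: at most one node per source in any component.
        if len(sources) != len(set(sources)):
            bad.append(comp)
    return bad


# Hypothetical toy data: (source, title) pairs standing in for entries.
nodes = [
    ("en", "Television"), ("de", "Fernsehen"),
    ("en", "TV set"), ("de", "Fernsehgerät"),
]
good_links = [
    (("en", "Television"), ("de", "Fernsehen")),
    (("en", "TV set"), ("de", "Fernsehgerät")),
]
bad_link = (("de", "Fernsehen"), ("en", "TV set"))

print(inconsistent_components(nodes, good_links))
print(len(inconsistent_components(nodes, good_links + [bad_link])))
```

With only the two correct links, each component holds one entry per source and nothing is flagged. Adding the single bad link merges all four entries into one component that now contains two distinct entries from each source, which is the inconsistency the slide describes.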