Coping with Data Variety
in the Big Data Era:
The Semantic Computing Approach
André Freitas
Insight Centre for Data Analytics
Rio Big Data Meetup (June 2014)
Outline
 Shift in the Information Systems Landscape
 Semantic Computing
 Semantics Technologies that Work Today: Data Creation
 Semantics Technologies that Work Today: Data Consumption
 Case Study: Treo QA System
 Conclusions
Shift in the Information
Systems Landscape
Big Data
 Vision: More complete data-based picture of the world for
systems and users.
Big Data Dimensions
 Volume
 Velocity
 Variety
 Veracity
 Value
Big Data Definitions
Data Variety
What is Big Data?
Cost of Making Sense of It
“A lot of Big Data is a lot of small data put together.”
“Most of Big Data is not a
uniform big block.”
“Each data piece is very small and very
messy, and a lot of what we are doing
there is dealing with that variety.”
Cost of Making Sense of It
“It is more about the rate of change, the amount and the
resources that you need to deal with it.”
“If the programming effort per amount of
high quality data is really high, the data is
big in the sense of high cost to produce
new information.”
“Big Data seems to be about addressing challenges
of scale, in terms of how fast things are coming
out at you versus how much it costs to get value
out of what you already have.”
Cost of Making Sense of It
“You can have Big Data challenges not only
because you have PBs of data but because data
is incredibly varied and therefore consumes a
lot of resources to make sense of it.”
Cost of Making Sense of It
“The speed in which data is generated and the
speed in which it needs to be processed in
order to use it effectively.”
“Schema” Growth
 Heterogeneous, complex and large-scale databases.
 Very-large and dynamic “schemas”.
circa 2000: 10s-100s attributes
circa 2014: 1,000s-1,000,000s attributes
Semantic Heterogeneity
 Decentralized content generation.
 Multiple perspectives (conceptualizations) of the reality.
 Ambiguity, vagueness, inconsistency.
Data variety +
Data quality -
Data
Programs
Full data coverage
Full automation
Structure level
Unstructured Data: easy to generate
Structured Data: consistent, comparable, processable; easy to analyze
Semantic Computing
The Futurist Perspective
 AI vision
 Full automation
 Perfect natural language
interaction
The Realist Perspective
What can be achieved with semantic computing today?
Google Knowledge Graph
FB Graph Search
Apple Siri
IBM Watson
QA: Vision
Semantic Computing
(Some) Challenges in Semantics
Knowledge Representation Model
Reasoning
Large, inconsistent,
heterogeneous
Data
Expected Result: intelligent behavior
Semantic flexibility, predictive power, automation ...
Acquisition, Learning
There is an economic model behind each element!
Meaning
 Word meaning is usually represented in terms of some formal,
symbolic structure, either external or internal to the word
 External structure
- Associations between different concepts
 Internal structure
- Feature (property, attribute) lists
 The semantic properties of a word are derived from the formal
structure of its representation
- e.g. Inference algorithm, etc.
Semantics = Meaning representation model (data) +
inference model
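The equation above can be illustrated with a minimal sketch, assuming a toy is-a hierarchy as the meaning representation and transitive link-following as the inference model. The concept names and the `is_a` helper are hypothetical, chosen only for illustration.

```python
# Toy meaning representation: external structure as is-a associations
# between concepts (illustrative names, not taken from a real ontology).
ASSOCIATIONS = {
    "daughter": "child",
    "child": "person",
    "person": "agent",
}


def is_a(concept, target):
    """Inference model: follow is-a links transitively."""
    seen = set()
    while concept in ASSOCIATIONS and concept not in seen:
        seen.add(concept)
        concept = ASSOCIATIONS[concept]
        if concept == target:
            return True
    return False


# "daughter is a person" is never stated; it is derived by inference.
print(is_a("daughter", "person"))
print(is_a("person", "daughter"))
```

The data alone only states three direct links; the semantic behaviour comes from combining it with the inference step.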
Formal Representation of Meaning
(Problems)
 Different meanings
- bank (financial institution)
bank (river side)
 Meaning variation in context
- clever politician, clever tycoon
 Meaning evolution
 Ambiguity, vagueness, inconsistency
Word meaning acquisition &
representation
Lack of flexibility
Scalability
 Most semantic models have dealt with particular types of
constructions, and have been carried out under very simplifying
assumptions, in true lab conditions.
 If these idealizations are removed it is not clear at all that modern
semantics can give a full account of all but the simplest
models/statements.
Sahlgren, 2013
Formal World Real World
Baroni et al. 2013
Semantics for a Complex World
Semantics Technologies that
Work Today
Data Creation
Data Creation
 Human interaction element (Data Curation)
 Semantic representation
 Information extraction
Data Curation
Entity-Centric Content Generation
Defining Core Categories
Disambiguation/Synonym
Defining Attributes & Relationships
Data curation elements
 Data curation platforms
- Spreadsheets
- Open Refine
- Karma
 Algorithmic curation
- Validation & Annotation robots
 Curation at source
- Minimal Information Models (MIRIAM)
 Data curation roles
 Crowdsourcing
Standardized Data Models
 Provides a minimum level of data interoperability
 Examples:
- Resource Description Framework (RDF)
- Linked Comma Separated Value (CSV)
- JavaScript Object Notation (JSON)
Resource Description Framework (RDF)
 Graph data model
 Entity-centric data integration
 Facilitates decentralized content generation
 URIs for concept identifiers
 Associated structured query language (SPARQL)
Resource Description Framework (RDF)
dbpedia:General_Electric rdf:type dbo:Organization
dbpedia:General_Electric dbp:revenue "US$ 147.3 billion"@en
dbpedia:General_Electric dbp:locationCity dbpedia:Fairfield, Connecticut
dbpedia:General_Electric owl:sameAs sec:General_Electric
sec:General_Electric ifrs:CashFlowsFromUsedInOperationsTotal …
dbpedia:Fairfield, Connecticut owl:sameAs geo:Fairfield
geo:Fairfield geo:latitude "N 41° 13' 29''"
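A minimal sketch of entity-centric integration, assuming the graph on the slide is read as subject-predicate-object triples (the edge directions are an interpretation of the figure). The helper names `same_as_closure` and `describe` are hypothetical, and the elided object "…" is kept as it appears on the slide.

```python
# Triples read off the slide, as (subject, predicate, object) tuples.
TRIPLES = [
    ("dbpedia:General_Electric", "rdf:type", "dbo:Organization"),
    ("dbpedia:General_Electric", "dbp:revenue", '"US$ 147.3 billion"@en'),
    ("dbpedia:General_Electric", "dbp:locationCity", "dbpedia:Fairfield, Connecticut"),
    ("dbpedia:General_Electric", "owl:sameAs", "sec:General_Electric"),
    ("sec:General_Electric", "ifrs:CashFlowsFromUsedInOperationsTotal", "…"),
    ("dbpedia:Fairfield, Connecticut", "owl:sameAs", "geo:Fairfield"),
    ("geo:Fairfield", "geo:latitude", "N 41° 13' 29''"),
]


def same_as_closure(entity, triples):
    """Return the entity plus everything reachable through owl:sameAs."""
    aliases = {entity}
    frontier = [entity]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p != "owl:sameAs":
                continue
            # owl:sameAs is symmetric, so follow it in both directions.
            for a, b in ((s, o), (o, s)):
                if a == current and b not in aliases:
                    aliases.add(b)
                    frontier.append(b)
    return aliases


def describe(entity, triples):
    """Merge the statements made about an entity and all of its aliases."""
    aliases = same_as_closure(entity, triples)
    return sorted((p, o) for s, p, o in triples
                  if s in aliases and p != "owl:sameAs")


for predicate, obj in describe("dbpedia:General_Electric", TRIPLES):
    print(predicate, obj)
```

Because both datasets use URIs as identifiers, a single owl:sameAs link is enough to combine the DBpedia description with the financial data from the second source.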
Representation
 Rules (SWRL, RIF)
 Ontology (OWL)
– Logical Constraints
 Taxonomy (RDFS)
– Classes in sub-/super-class hierarchy
 Relational (RDF)
– Attributes
– Associations
 Dictionary
– Terms and definitions
(Increasing semantic representation, from Dictionary up to Rules)
Linked Data
[Figure: HTTP request for http://dbpedia.org/resource/Jupiter; RDF and JSON output; SPARQL; R2RML over a relational database]
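The HTTP access pattern can be sketched as below, assuming the server selects the returned format through the standard Accept header (content negotiation). The function name and the default media type are illustrative choices; the request is only prepared, not sent, so the sketch runs without network access.

```python
from urllib.request import Request


def build_linked_data_request(uri, media_type="application/rdf+xml"):
    """Prepare (but do not send) a GET request asking for an RDF serialization."""
    return Request(uri, headers={"Accept": media_type})


request = build_linked_data_request("http://dbpedia.org/resource/Jupiter")
print(request.get_method(), request.full_url)
print(request.get_header("Accept"))
# Sending it would be urllib.request.urlopen(request), which needs network access.
```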
Open Data
 Common-sense Knowledge Base
 Domain-specific Knowledge Base
 Entity reference system
 DBpedia
- http://dbpedia.org/
 YAGO
- http://www.mpi-inf.mpg.de/yago-naga/yago/
 Freebase
- http://www.freebase.com/
 Wikipedia dumps
- http://dumps.wikimedia.org/
 ConceptNet
- http://conceptnet5.media.mit.edu/
 Geonames
- http://www.geonames.org/
 Common Crawl
- http://commoncrawl.org/
Open Data
Standardized Vocabularies
 Open conceptual models to be reused across different
datasets
 Provides conceptual model level interoperability
 Useful to be used for modelling recurrent domains of
discourse
Standardized Vocabularies
 FOAF
 SIOC
 COGS
 Data Cube Vocabulary
 PROV-O
 DCTERMS
 WGS84 Geo Positioning
 SDMX
 QUDT
 SSN
 Schema.org
 VoID
 Data Catalog
 ...
http://lov.okfn.org/dataset/lov/
Entity Recognition & Linking
 Align terms in unstructured text to entities in a structured KB
 Integrates structured to unstructured data
 Can be used to support semantic search
 Provides a first level of structure to unstructured data
 Exploratory browsing
Entity Recognition & Linking
 Example:
“GE has also been implicated in the creation of toxic waste.”
<http://dbpedia.org/resource/General_Electric>
yago:ConglomerateCompanies
yago:MedicalEquipmentManufacturers
yago:CompaniesListedOnTheNewYorkStockExchange
Entity Recognition & Linking
 Example:
“GE has also been implicated in the creation of toxic waste.”
<http://dbpedia.org/resource/Toxic_waste>
 DBpedia Spotlight
- http://spotlight.dbpedia.org
 NERD (Named Entity Recognition and Disambiguation)
- http://nerd.eurecom.fr/
 Stanford Named Entity Recognizer
- http://nlp.stanford.edu/software/CRF-NER.shtml
Entity Recognition/Linking
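A minimal sketch of the linking step for the example sentence, assuming a hand-written surface-form dictionary. The dictionary and the `link_entities` helper are hypothetical; tools such as DBpedia Spotlight generate candidates and disambiguate them from context instead of using a fixed lookup.

```python
import re

# Hypothetical surface-form dictionary mapping text spans to KB entities.
SURFACE_FORMS = {
    "GE": "http://dbpedia.org/resource/General_Electric",
    "General Electric": "http://dbpedia.org/resource/General_Electric",
    "toxic waste": "http://dbpedia.org/resource/Toxic_waste",
}


def link_entities(text):
    """Return (start, surface form, URI) for each match, preferring longer forms."""
    matches = []
    taken = set()
    for form in sorted(SURFACE_FORMS, key=len, reverse=True):
        for m in re.finditer(r"\b" + re.escape(form) + r"\b", text):
            span = set(range(m.start(), m.end()))
            if span & taken:
                continue  # skip matches overlapping an already linked span
            taken |= span
            matches.append((m.start(), m.group(), SURFACE_FORMS[form]))
    return sorted(matches)


sentence = "GE has also been implicated in the creation of toxic waste."
for start, form, uri in link_entities(sentence):
    print(start, form, uri)
```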
Syntactic Parsers
GE/NNP has/VBZ also/RB been/VBN implicated/VBN in/IN the/DT creation/NN of/IN
toxic/JJ waste/NN
 Stanford parser
- http://nlp.stanford.edu/software/lex-parser.shtml
- Languages: English, German, Chinese, and others
 MALT
- http://www.maltparser.org/
- Languages (pre-trained): English, French, Swedish
 C&C Parser
- http://svn.ask.it.usyd.edu.au/trac/candc
Parsers
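The tagged sentence above uses the common token/TAG notation. A small sketch of how such output might be consumed downstream, assuming Penn Treebank tags; the helper name `parse_tagged` and the two selection rules are illustrative only.

```python
def parse_tagged(text):
    """Split 'token/TAG' items into (token, tag) pairs."""
    pairs = []
    for item in text.split():
        # rpartition keeps tokens that themselves contain a slash intact.
        token, _, tag = item.rpartition("/")
        pairs.append((token, tag))
    return pairs


tagged = ("GE/NNP has/VBZ also/RB been/VBN implicated/VBN in/IN "
          "the/DT creation/NN of/IN toxic/JJ waste/NN")
pairs = parse_tagged(tagged)

# Proper nouns (NNP) are natural candidates for entity linking.
proper_nouns = [tok for tok, tag in pairs if tag == "NNP"]
# Adjective + noun sequences give simple noun-phrase candidates.
noun_phrases = [f"{a} {b}" for (a, ta), (b, tb) in zip(pairs, pairs[1:])
                if ta == "JJ" and tb == "NN"]
print(proper_nouns, noun_phrases)
```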
 GATE (General Architecture for Text Engineering)
- http://gate.ac.uk/
 NLTK (Natural Language Toolkit)
- http://nltk.org/
 Stanford NLP
- http://www-nlp.stanford.edu/software/index.shtml
 LingPipe
- http://alias-i.com/lingpipe/index.html
Text Processing Tools
Database Representation
 Easy evolution of schemas (schema-less)
 Graph Databases
- OpenLink Virtuoso
- Neo4J
- Transforming Lucene into a Graph Database
 NoSQL ...
 Apache Unstructured Information Management Architecture
(UIMA)
- Component software architecture for the analysis of unstructured data
- http://uima.apache.org/
 NLP Interchange Format (NIF)
- RDF & OWL-based
- http://persistence.uni-leipzig.org/nlp2rdf/
NLP Integration
Relation/Graph Extraction
 Reverb
- http://reverb.cs.washington.edu/
 Graphia
- http://graphia.dcc.ufrj.br/
Relation/Graph Extraction
In 2002, GE acquired the wind power assets of Enron.
Relation/Graph Extraction
General Electric Company, or GE, is an American multinational conglomerate
corporation incorporated in Schenectady, New York
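A minimal sketch of relation extraction for the acquisition sentence, assuming a single hand-written pattern. The pattern and the `extract` helper are hypothetical; open information extraction systems generalize over many verb-centred patterns rather than one regular expression.

```python
import re

# One illustrative pattern: optional "In <year>," prefix, a capitalized
# subject, the verb "acquired", and the rest of the sentence as object.
PATTERN = re.compile(
    r"(?:In (?P<year>\d{4}),? )?(?P<subj>[A-Z]\w*) (?P<rel>acquired) "
    r"(?P<obj>.+?)\.?$"
)


def extract(sentence):
    """Return ((subject, relation, object), year) or None if nothing matches."""
    m = PATTERN.match(sentence)
    if not m:
        return None
    triple = (m.group("subj"), m.group("rel"), m.group("obj"))
    return triple, m.group("year")


print(extract("In 2002, GE acquired the wind power assets of Enron."))
```

The year is kept next to the triple, since temporal context is part of what a graph extractor needs to preserve.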
Semantics Technologies that
Work Today
Data Consumption
Vector Space Models
 Representation useful for approximate search
 Search over structured and unstructured data
 Construction of approximate semantic models
Vector Space Models
θ
http://en.wikipedia.org/wiki/General_Electric
General
Electric
...
“General Electric company”
 Lucene & Solr
- http://lucene.apache.org/
 Terrier
- http://terrier.org/
Indexing & Search Engines
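The vector space model can be sketched with term-frequency vectors and cosine similarity (the angle θ between query and document). The two document texts are made up for illustration, and engines such as Lucene add weighting, tokenization and inverted indexes on top of this idea.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector (lowercased, whitespace tokens)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine of the angle between two sparse vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0


# Hypothetical mini collection.
documents = {
    "doc_ge": "general electric is an american multinational conglomerate",
    "doc_other": "toxic waste is a hazardous material",
}
query = term_vector("General Electric company")
ranking = sorted(documents,
                 key=lambda d: cosine(query, term_vector(documents[d])),
                 reverse=True)
print(ranking)
```

The query still matches the first document although "company" does not occur in it, which is the approximate-search behaviour the model is used for.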
Distributional Hypothesis
“Words occurring in similar (linguistic) contexts tend
to be semantically similar”
 He filled the wampimuk with the substance, passed it
around and we all drunk some
 We found a little, hairy wampimuk sleeping behind the
tree
Distributional Semantic Models (DSMs)
 Computational models that build contextual semantic representations
from corpus data
 Semantic context is represented by a vector
 Vectors are obtained through the statistical analysis of the linguistic
contexts of a word
 Salience of contexts (cf. context weighting scheme)
 Semantic similarity/relatedness as the core operation over the model
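A minimal sketch of a distributional model, assuming sentence-level co-occurrence counts as context vectors and cosine as the relatedness measure. The four-sentence corpus and the stop list are invented for illustration; context weighting schemes are left out.

```python
import math
from collections import Counter, defaultdict

# Tiny hypothetical corpus; real DSMs are built from large reference corpora.
CORPUS = [
    "the dog barked and pulled the leash",
    "the cat pulled the leash and ran",
    "the dog ran after the cat",
    "the car needs fuel and a driver",
]
STOP = {"the", "and", "a", "after"}


def build_vectors(sentences):
    """Context vector of a word: counts of words sharing a sentence with it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        words = [w for w in sentence.split() if w not in STOP]
        for w in words:
            for c in words:
                if c != w:
                    vectors[w][c] += 1
    return vectors


def relatedness(u, v):
    """Cosine between two context vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0


vectors = build_vectors(CORPUS)
print(relatedness(vectors["dog"], vectors["cat"]))
print(relatedness(vectors["dog"], vectors["car"]))
```

In this toy corpus "dog" and "cat" share contexts such as "leash" and "ran", while "dog" and "car" share none, so the first score is higher.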
DSMs as Commonsense Reasoning
Commonsense is here
θ
car
dog
cat
bark
run
leash
DSMs as Commonsense Reasoning
θ
car
dog
cat
bark
run
leash
...
vs.
Semantic best-effort
Distributional Semantic Models (DSMs)
 Amtera Esprit (distributional semantic relatedness)
- http://www.mashape.com/amtera/esa-semantic-relatedness
 WS4J (Java API for several semantic relatedness
algorithms)
- https://code.google.com/p/ws4j/
 SecondString (string matching)
- http://secondstring.sourceforge.net
 S-space (distributional semantics framework)
- https://github.com/fozziethebeat/S-Space
String similarity and semantic relatedness
 WordNet
- http://wordnet.princeton.edu/
 Wiktionary
- http://www.wiktionary.org/
 FrameNet
- https://framenet.icsi.berkeley.edu/fndrupal/
 VerbNet
- http://verbs.colorado.edu/~mpalmer/projects/verbnet.html
 BabelNet
- http://babelnet.org/
Lexical Resources
Entity Recognition & Linking
Distributional Semantics
Relation/Graph Extraction
Internal Datasets
Reference Corpora
Semantic Pipeline
Vocabulary Management
Semantic Search & QA
Crawling & Indexing
Open Data
Vocabularies, Taxonomies, Lexical Resources
Internal Documents
Knowledge Graph Management
Knowledge Graph
Data Curation Platform
Crowdsourcing Services
Applications
User feedback
Provenance Management
Case Study:
Treo QA System
Querying your Knowledge Graph
Gaelic: direction
Solution (Video)
More Complex Queries (Video)
Vocabulary Problem
Query: Who is the daughter of Bill Clinton married to?
Semantic approximation
Semantic Gap
Possible representations = Commonsense Knowledge
Dataset (DBpedia 3.7 + YAGO): 45,767 predicates, 5,556,492 classes and
9,434,677 instances
Core Principles
 Minimize the impact of Ambiguity, Vagueness, Synonymy.
 Address the simplest matchings first (heuristics).
 Semantic Relatedness as a primitive operation.
 Distributional semantics as commonsense knowledge.
Step 1: POS Tagging
Who/WP
is/VBZ
the/DT
daughter/NN
of/IN
Bill/NNP
Clinton/NNP
married/VBN
to/TO
?/.
Query Pre-Processing
(Question Analysis)
Step 2: Core Entity Recognition
Rules-based: POS Tag + TF/IDF
Who is the daughter of Bill Clinton married to?
(PROBABLY AN INSTANCE)
Step 3: Determine answer type
Rules-based.
Who is the daughter of Bill Clinton married to?
(PERSON)
Step 4: Dependency parsing
dep(married-8, Who-1)
auxpass(married-8, is-2)
det(daughter-4, the-3)
nsubjpass(married-8, daughter-4)
prep(daughter-4, of-5)
nn(Clinton-7, Bill-6)
pobj(of-5, Clinton-7)
root(ROOT-0, married-8)
xcomp(married-8, to-9)
Step 5: Determine Partial Ordered Dependency Structure
(PODS)
Rules-based.
Remove stop words.
Merge words into entities.
Reorder structure from core entity position.
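The three rules can be sketched as follows. This is a simplification that works on the POS tags from Step 1 rather than on the dependency parse from Step 4; the stop-tag set, the merge rules and the function names are assumptions made for illustration, not the rules used in Treo.

```python
# POS tags treated as stop words in this sketch (an assumption).
STOP_TAGS = {"WP", "VBZ", "DT", "IN", "."}


def merge_chunks(tagged):
    """Merge consecutive proper nouns, and a participle with a following 'to'."""
    chunks = []
    for token, tag in tagged:
        if chunks:
            prev_token, prev_tag = chunks[-1]
            if tag == "NNP" and prev_tag == "NNP":
                chunks[-1] = (prev_token + " " + token, "NNP")
                continue
            if tag == "TO" and prev_tag == "VBN":
                chunks[-1] = (prev_token + " " + token, "VBN")
                continue
        chunks.append((token, tag))
    return chunks


def to_pods(tagged):
    """Remove stop words, merge entities, and reorder from the core entity."""
    chunks = [c for c in merge_chunks(tagged) if c[1] not in STOP_TAGS]
    # Treat the first proper-noun chunk as the core entity.
    core = next((c for c in chunks if c[1] == "NNP"), None)
    rest = [c for c in chunks if c is not core]
    ordered = ([core] if core else []) + rest
    return [token for token, _ in ordered]


question = [("Who", "WP"), ("is", "VBZ"), ("the", "DT"), ("daughter", "NN"),
            ("of", "IN"), ("Bill", "NNP"), ("Clinton", "NNP"),
            ("married", "VBN"), ("to", "TO"), ("?", ".")]
print(to_pods(question))
```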
Question Analysis
PODS: Bill Clinton | daughter | married to
Query Features: (INSTANCE) (PREDICATE) (PREDICATE)
(Figure labels: ANSWER TYPE, QUESTION FOCUS)
Query Plan
Map query features into a query plan.
A query plan contains a sequence of core operations.
 (1) INSTANCE SEARCH (Bill Clinton)
 (2) p1 <- SEARCH PREDICATE (Bill Clinton, daughter)
 (3) e1 <- NAVIGATE (Bill Clinton, p1)
 (4) p2 <- SEARCH PREDICATE (e1, married to)
 (5) e2 <- NAVIGATE (e1, p2)
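The five operations of the plan can be sketched over a toy graph. The graph holds only the triples that appear in the walkthrough slides; the relatedness table stands in for a distributional model, with the "daughter" scores taken from the predicate-search slides (reading 0.004 as the religion score) and the "married to" score invented for illustration. All function names are hypothetical.

```python
# Toy graph: entity -> {predicate: object}.
GRAPH = {
    ":Bill_Clinton": {":child": ":Chelsea_Clinton",
                      ":religion": ":Baptists",
                      ":almaMater": ":Yale_Law_School"},
    ":Chelsea_Clinton": {":spouse": ":Mark_Mezvinsky"},
}

# Stand-in for a distributional semantic relatedness measure.
SEM_REL = {
    ("daughter", ":child"): 0.054,
    ("daughter", ":religion"): 0.004,
    ("daughter", ":almaMater"): 0.001,
    ("married to", ":spouse"): 0.05,  # assumed value, not from the slides
}


def instance_search(label):
    """Naive lookup: 'Bill Clinton' -> ':Bill_Clinton'."""
    uri = ":" + label.replace(" ", "_")
    return uri if uri in GRAPH else None


def search_predicate(entity, term):
    """Pick the predicate of the entity that is most related to the query term."""
    candidates = GRAPH.get(entity, {})
    return max(candidates, key=lambda p: SEM_REL.get((term, p), 0.0),
               default=None)


def navigate(entity, predicate):
    """Follow a predicate from an entity to its object."""
    return GRAPH[entity][predicate]


pivot = instance_search("Bill Clinton")    # (1)
p1 = search_predicate(pivot, "daughter")   # (2)
e1 = navigate(pivot, p1)                   # (3)
p2 = search_predicate(e1, "married to")    # (4)
e2 = navigate(e1, p2)                      # (5)
print(p1, e1, p2, e2)
```

Each navigation step turns the reached entity into the new pivot, so the search space at every step is limited to the predicates attached to one entity.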
Instance Search
Query: Bill Clinton daughter married to
Linked Data: :Bill_Clinton
Predicate Search
Query: Bill Clinton daughter married to
Linked Data: :Bill_Clinton (PIVOT ENTITY), with (ASSOCIATED TRIPLES):
:Bill_Clinton :child :Chelsea_Clinton
:Bill_Clinton :religion :Baptists
:Bill_Clinton :almaMater :Yale_Law_School
...
Which properties are semantically related to ‘daughter’?
sem_rel(daughter,child)=0.054
sem_rel(daughter,religion)=0.004
sem_rel(daughter,alma mater)=0.001
Navigate
Query: Bill Clinton daughter married to
Linked Data:
:Bill_Clinton :child :Chelsea_Clinton (PIVOT ENTITY)
Predicate Search
Query: Bill Clinton daughter married to
Linked Data:
:Bill_Clinton :child :Chelsea_Clinton (PIVOT ENTITY)
:Chelsea_Clinton :spouse :Mark_Mezvinsky
Results
Evaluation
 102 natural language queries (Test Collection: QALD 2011).
 Avg. query execution time: 1.52 s (simple queries) – 8.53 s
(all queries).
Treo Answers Jeopardy Queries (Video)
http://bit.ly/1hWcch9
Hybrid unstructured & structured
Sydney's dad, Jack, was a CIA double agent working against SD-6 on this
Jennifer Garner show.
Core Principles
 Semantic best-effort
 Dialog & user disambiguation
 Pay-as-you-go data integration
 Simplicity of use
 Franklin et al. (2005): From Databases to Dataspaces.
 Helland (2011): If You Have Too Much Data, then “Good
Enough” Is Good Enough.
Take-away message
 There are approaches that can be used today to cope with
data variety in the Big Data era
 Coping with data variety demands a multi-disciplinary
perspective and a new infrastructure
- Knowledge Representation, IR and Natural Language Processing
 Semantics at scale as a central concern
 You can build your own IBM Watson-like application!
 Great opportunity for new solutions and for being a pioneer
andre.freitas – at – deri.org
How Semantic Technologies can help to cure Hearing Loss?Andre Freitas
 

More from Andre Freitas (20)

AI & Scientific Discovery in Oncology: Opportunities, Challenges & Trends
AI & Scientific Discovery in Oncology: Opportunities, Challenges & TrendsAI & Scientific Discovery in Oncology: Opportunities, Challenges & Trends
AI & Scientific Discovery in Oncology: Opportunities, Challenges & Trends
 
AI Systems @ Manchester
AI Systems @ ManchesterAI Systems @ Manchester
AI Systems @ Manchester
 
AI Beyond Deep Learning
AI Beyond Deep LearningAI Beyond Deep Learning
AI Beyond Deep Learning
 
Building AI Applications using Knowledge Graphs
Building AI Applications using Knowledge GraphsBuilding AI Applications using Knowledge Graphs
Building AI Applications using Knowledge Graphs
 
Open IE tutorial 2018
Open IE tutorial 2018Open IE tutorial 2018
Open IE tutorial 2018
 
Effective Semantics for Engineering NLP Systems
Effective Semantics for Engineering NLP SystemsEffective Semantics for Engineering NLP Systems
Effective Semantics for Engineering NLP Systems
 
SemEval-2017 Task 5: Fine-Grained Sentiment Analysis on Financial Microblogs ...
SemEval-2017 Task 5: Fine-Grained Sentiment Analysis on Financial Microblogs ...SemEval-2017 Task 5: Fine-Grained Sentiment Analysis on Financial Microblogs ...
SemEval-2017 Task 5: Fine-Grained Sentiment Analysis on Financial Microblogs ...
 
Semantic Perspectives for Contemporary Question Answering Systems
Semantic Perspectives for Contemporary Question Answering SystemsSemantic Perspectives for Contemporary Question Answering Systems
Semantic Perspectives for Contemporary Question Answering Systems
 
Semantic Relation Classification: Task Formalisation and Refinement
Semantic Relation Classification: Task Formalisation and RefinementSemantic Relation Classification: Task Formalisation and Refinement
Semantic Relation Classification: Task Formalisation and Refinement
 
Categorization of Semantic Roles for Dictionary Definitions
Categorization of Semantic Roles for Dictionary DefinitionsCategorization of Semantic Roles for Dictionary Definitions
Categorization of Semantic Roles for Dictionary Definitions
 
Word Tagging with Foundational Ontology Classes
Word Tagging with Foundational Ontology ClassesWord Tagging with Foundational Ontology Classes
Word Tagging with Foundational Ontology Classes
 
Different Semantic Perspectives for Question Answering Systems
Different Semantic Perspectives for Question Answering SystemsDifferent Semantic Perspectives for Question Answering Systems
Different Semantic Perspectives for Question Answering Systems
 
WiSS Challenge - Day 2
WiSS Challenge - Day 2WiSS Challenge - Day 2
WiSS Challenge - Day 2
 
WISS QA Do it yourself Question answering over Linked Data
WISS QA Do it yourself Question answering over Linked DataWISS QA Do it yourself Question answering over Linked Data
WISS QA Do it yourself Question answering over Linked Data
 
Schema-Agnostic Queries (SAQ-2015): Semantic Web Challenge
Schema-Agnostic Queries (SAQ-2015): Semantic Web ChallengeSchema-Agnostic Queries (SAQ-2015): Semantic Web Challenge
Schema-Agnostic Queries (SAQ-2015): Semantic Web Challenge
 
How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic ...
How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic ...How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic ...
How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic ...
 
Semantics at Scale: A Distributional Approach
Semantics at Scale: A Distributional ApproachSemantics at Scale: A Distributional Approach
Semantics at Scale: A Distributional Approach
 
Schema-agnositc queries over large-schema databases: a distributional semanti...
Schema-agnositc queries over large-schema databases: a distributional semanti...Schema-agnositc queries over large-schema databases: a distributional semanti...
Schema-agnositc queries over large-schema databases: a distributional semanti...
 
A Semantic Web Platform for Automating the Interpretation of Finite Element ...
A Semantic Web Platform for Automating the Interpretation of Finite Element ...A Semantic Web Platform for Automating the Interpretation of Finite Element ...
A Semantic Web Platform for Automating the Interpretation of Finite Element ...
 
How Semantic Technologies can help to cure Hearing Loss?
How Semantic Technologies can help to cure Hearing Loss?How Semantic Technologies can help to cure Hearing Loss?
How Semantic Technologies can help to cure Hearing Loss?
 

Recently uploaded

Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackVICTOR MAESTRE RAMIREZ
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...Technogeeks
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Velvetech LLC
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...Christina Lin
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanyChristoph Pohl
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaHanief Utama
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfStefano Stabellini
 
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in NoidaBuds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in Noidabntitsolutionsrishis
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based projectAnoyGreter
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Matt Ray
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...OnePlan Solutions
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureDinusha Kumarasiri
 

Recently uploaded (20)

Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief Utama
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdf
 
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in NoidaBuds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
 
2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 

Coping with Data Variety in the Big Data Era: The Semantic Computing Approach

  • 1. Coping with Data Variety in the Big Data Era: The Semantic Computing Approach André Freitas Insight Centre for Data Analytics Rio Big Data Meetup (June 2014)
  • 2. Outline  Shift in the Information Systems Landscape  Semantic Computing  Semantics Technologies that Work Today: Data Creation  Semantics Technologies that Work Today: Data Consumption  Case Study: Treo QA System  Conclusions
  • 3. Shift in the Information Systems Landscape
  • 4. Big Data  Vision: More complete data-based picture of the world for systems and users.
  • 5. Big Data Dimensions  Volume  Velocity  Variety
  • 6. Big Data Dimensions  Volume  Velocity  Variety  Veracity  Value
  • 7. Big Data Definitions Data Variety What is Big Data?
  • 8. Cost of Making Sense of It “A lot of Big Data is a lot of small data put together.” “Most of Big Data is not a uniform big block.” “Each data piece is very small and very messy, and a lot of what we are doing there is dealing with that variety.”
  • 9. Cost of Making Sense of It “It is more about the rate of change, the amount and the resources that you need to deal with it.” “If the programming effort per amount of high quality data is really high, the data is big in the sense of high cost to produce new information.” “Big Data seems to be about addressing challenges of scale, in terms of how fast things are coming out at you versus how much it costs to get value out of what you already have.”
  • 10. Cost of Making Sense of It “You can have Big Data challenges not only because you have PBs of data but because data is incredibly varied and therefore consumes a lot of resources to make sense of it.”
  • 11. Cost of Making Sense of It “The speed in which data is generated and the speed in which it needs to be processed in order to use it effectively.”
  • 12. “Schema” Growth  Heterogeneous, complex and large-scale databases.  Very-large and dynamic “schemas”. 10s-100s attributes 1,000s-1,000,000s attributes circa 2000 circa 2014
  • 13. Semantic Heterogeneity  Decentralized content generation.  Multiple perspectives (conceptualizations) of the reality.  Ambiguity, vagueness, inconsistency.
  • 16. Data variety + Data quality - Data Programs Full data coverage Full automation
  • 17. Structure level Unstructured Data Structured Data Consistent Comparable Processable Easy to generate Easy to analyze Semantic Computing
  • 19. The Futurist Perspective  AI vision  Full automation  Perfect natural language interaction
  • 20. The Realist Perspective What can be achieved with semantic computing today?
  • 27. (Some) Challenges in Semantics Knowledge Representation Model Reasoning Large, inconsistent, heterogeneous Data Expected Result: intelligent behavior Semantic flexibility, predictive power, automation ... Acquisition, Learning There is an economic model behind each element!
  • 28. Meaning  Word meaning is usually represented in terms of some formal, symbolic structure, either external or internal to the word  External structure - Associations between different concepts  Internal structure - Feature (property, attribute) lists  The semantic properties of a word are derived from the formal structure of its representation - e.g. Inference algorithm, etc. Semantics = Meaning representation model (data) + inference model
  • 29. Formal Representation of Meaning (Problems)  Different meanings - bank (financial institution) bank (river side)  Meaning variation in context  Meaning evolution  Ambiguity, vagueness, inconsistency
  • 30. Formal Representation of Meaning (Problems)  Different meanings - bank (financial institution) bank (river side)  Meaning variation in context - clever politician, clever tycoon  Meaning evolution  Ambiguity, vagueness, inconsistency Word meaning acquisition & representation Lack of flexibility Scalability
  • 31. Semantics for a Complex World  Most semantic models have dealt with particular types of constructions, and have been carried out under very simplifying assumptions, in true lab conditions.  If these idealizations are removed it is not clear at all that modern semantics can give a full account of all but the simplest models/statements. Sahlgren, 2013 Formal World Real World Baroni et al. 2013
  • 32. Semantics Technologies that Work Today Data Creation
  • 33. Data Creation  Human interaction element (Data Curation)  Semantic representation  Information extraction
  • 38. Defining Attributes & Relationships
  • 39. Data curation elements  Data curation platforms - Spreadsheets - Open Refine - Karma  Algorithmic curation - Validation & Annotation robots  Curation at source - Minimal Information Models (MIRIAM)  Data curation roles  Crowdsourcing
  • 40. Standardized Data Models  Provide a minimum level of data interoperability  Examples: - Resource Description Framework (RDF) - Linked Comma Separated Value (CSV) - JavaScript Object Notation (JSON)
  • 41. Resource Description Framework (RDF)  Graph data model  Entity-centric data integration  Facilitates decentralized content generation  URIs for concept identifiers  Associated structured query language (SPARQL)
  • 42. Resource Description Framework (RDF) dbpedia:General_Electric rdf:type dbo:Organization ; dbp:revenue "US$ 147.3 billion"@en ; dbp:locationCity dbpedia:Fairfield, Connecticut
  • 43. Resource Description Framework (RDF) dbpedia:General_Electric rdf:type dbo:Organization ; dbp:revenue "US$ 147.3 billion"@en ; dbp:locationCity dbpedia:Fairfield, Connecticut | sec:General_Electric ifrs:CashFlowsFromUsedInOperationsTotal …
  • 44. Resource Description Framework (RDF) dbpedia:General_Electric rdf:type dbo:Organization ; dbp:revenue "US$ 147.3 billion"@en ; dbp:locationCity dbpedia:Fairfield, Connecticut ; owl:sameAs sec:General_Electric | sec:General_Electric ifrs:CashFlowsFromUsedInOperationsTotal …
  • 45. Resource Description Framework (RDF) dbpedia:General_Electric rdf:type dbo:Organization ; dbp:revenue "US$ 147.3 billion"@en ; dbp:locationCity dbpedia:Fairfield, Connecticut ; owl:sameAs sec:General_Electric | sec:General_Electric ifrs:CashFlowsFromUsedInOperationsTotal … | geo:Fairfield geo:latitude "N 41° 13' 29''"
  • 46. Resource Description Framework (RDF) dbpedia:General_Electric rdf:type dbo:Organization ; dbp:revenue "US$ 147.3 billion"@en ; dbp:locationCity dbpedia:Fairfield, Connecticut ; owl:sameAs sec:General_Electric | sec:General_Electric ifrs:CashFlowsFromUsedInOperationsTotal … | dbpedia:Fairfield, Connecticut owl:sameAs geo:Fairfield | geo:Fairfield geo:latitude "N 41° 13' 29''"
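The RDF example in slides 42 to 46 can be sketched in plain Python to show what entity-centric integration through owl:sameAs buys you. This is a minimal illustration, not a real triple store: the function names `same_as_closure` and `describe` are hypothetical, the identifiers are copied from the slides, and the `sec:General_Electric` statement is left out because its value is elided in the slide.

```python
# Each RDF statement is modelled as a (subject, predicate, object) tuple.
TRIPLES = [
    ("dbpedia:General_Electric", "rdf:type", "dbo:Organization"),
    ("dbpedia:General_Electric", "dbp:revenue", '"US$ 147.3 billion"@en'),
    ("dbpedia:General_Electric", "dbp:locationCity", "dbpedia:Fairfield, Connecticut"),
    ("dbpedia:General_Electric", "owl:sameAs", "sec:General_Electric"),
    ("dbpedia:Fairfield, Connecticut", "owl:sameAs", "geo:Fairfield"),
    ("geo:Fairfield", "geo:latitude", "N 41° 13' 29''"),
]


def same_as_closure(entity, triples):
    """Return all identifiers reachable from `entity` via owl:sameAs,
    treating the link as symmetric and transitive."""
    seen = {entity}
    frontier = [entity]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p != "owl:sameAs":
                continue
            if s == current and o not in seen:
                seen.add(o)
                frontier.append(o)
            elif o == current and s not in seen:
                seen.add(s)
                frontier.append(s)
    return seen


def describe(entity, triples):
    """Collect (predicate, object) pairs stated about any alias of `entity`."""
    aliases = same_as_closure(entity, triples)
    return sorted(
        (p, o) for s, p, o in triples if s in aliases and p != "owl:sameAs"
    )


if __name__ == "__main__":
    # Facts from the geo: dataset become visible under the dbpedia: identifier.
    for predicate, obj in describe("dbpedia:Fairfield, Connecticut", TRIPLES):
        print(predicate, obj)
```

The point of the sketch is that two datasets never have to agree on one identifier: a single owl:sameAs statement is enough for a consumer to merge their descriptions.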
  • 47. Representation  Rules (SWRL, RIF)  Ontology (OWL) – Logical Constraints  Taxonomy (RDFS) – Classes in sub-/super-class hierarchy  Relational (RDF) – Attributes – Associations  Dictionary – Terms and definitions Increasing Semantic Representation
  • 52. Open Data  Common-sense Knowledge Base  Domain-specific Knowledge Base  Entity reference system
  • 53. Open Data  DBpedia - http://dbpedia.org/  YAGO - http://www.mpi-inf.mpg.de/yago-naga/yago/  Freebase - http://www.freebase.com/  Wikipedia dumps - http://dumps.wikimedia.org/  ConceptNet - http://conceptnet5.media.mit.edu/  Geonames - http://www.geonames.org/  Common Crawl - http://commoncrawl.org/
  • 54. Standardized Vocabularies  Open conceptual models to be reused across different datasets  Provide conceptual model level interoperability  Useful for modelling recurrent domains of discourse
  • 55. Standardized Vocabularies  FOAF  SIOC  COGS  Data Cube Vocabulary  PROV-O  DCTERMS  WGS84 Geo Positioning  SDMX  QUDT  SSN  Schema.org  VoID  Data Catalog  ... http://lov.okfn.org/dataset/lov/
  • 56. Entity Recognition & Linking  Align terms in unstructured text to entities in a structured KB  Integrates structured to unstructured data
  • 58. Entity Recognition & Linking  Align terms in unstructured text to entities in a structured KB  Integrates structured to unstructured data  Can be used to support semantic search  Provides a first level of structure to unstructured data  Exploratory browsing
  • 59. Entity Recognition & Linking  Example: “GE has also been implicated in the creation of toxic waste.”
  • 61. Entity Recognition & Linking  Example: “GE has also been implicated in the creation of toxic waste.” <http://dbpedia.org/resource/General_Electric> yago:ConglomerateCompanies yago:MedicalEquipmentManufacturers yago:CompaniesListedOnTheNewYorkStockExchange
  • 62. Entity Recognition & Linking  Example: “GE has also been implicated in the creation of toxic waste.” <http://dbpedia.org/resource/Toxic_waste>
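The linking step shown for the GE sentence can be approximated with a longest-match dictionary lookup. The sketch below is only an illustration under stated assumptions: the `SURFACE_FORMS` table and the `link_entities` function are hypothetical, and the approach does no disambiguation, whereas tools such as DBpedia Spotlight use context to choose between candidate entities.

```python
import re

# Hypothetical surface-form dictionary mapping lowercased phrases to KB URIs.
SURFACE_FORMS = {
    "ge": "http://dbpedia.org/resource/General_Electric",
    "general electric": "http://dbpedia.org/resource/General_Electric",
    "toxic waste": "http://dbpedia.org/resource/Toxic_waste",
}


def link_entities(text, surface_forms=SURFACE_FORMS, max_len=3):
    """Return (mention, uri) pairs found in `text`, preferring longer mentions."""
    tokens = re.findall(r"\w+", text)
    links = []
    i = 0
    while i < len(tokens):
        # Try the longest span starting at token i first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = " ".join(tokens[i:i + n])
            uri = surface_forms.get(span.lower())
            if uri:
                links.append((span, uri))
                i += n
                break
        else:
            # No span starting here matched; move to the next token.
            i += 1
    return links


if __name__ == "__main__":
    sentence = "GE has also been implicated in the creation of toxic waste."
    for mention, uri in link_entities(sentence):
        print(mention, "->", uri)
```

Once a mention is tied to a URI, everything the knowledge base states about that entity (for example its YAGO classes) becomes available as structure over the text.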
  • 63. Entity Recognition/Linking  DBpedia Spotlight - http://spotlight.dbpedia.org  NERD (Named Entity Recognition and Disambiguation) - http://nerd.eurecom.fr/  Stanford Named Entity Recognizer - http://nlp.stanford.edu/software/CRF-NER.shtml
  • 64. Syntactic Parsers GE/NNP has/VBZ also/RB been/VBN implicated/VBN in/IN the/DT creation/NN of/IN toxic/JJ waste/NN
  • 65. Parsers  Stanford parser - http://nlp.stanford.edu/software/lex-parser.shtml - Languages: English, German, Chinese, and others  MALT - http://www.maltparser.org/ - Languages (pre-trained): English, French, Swedish  C&C Parser - http://svn.ask.it.usyd.edu.au/trac/candc
  • 66. Text Processing Tools  GATE (General Architecture for Text Engineering) - http://gate.ac.uk/  NLTK (Natural Language Toolkit) - http://nltk.org/  Stanford NLP - http://www-nlp.stanford.edu/software/index.shtml  LingPipe - http://alias-i.com/lingpipe/index.html
  • 67. Database Representation  Easy evolution of schemas (schema-less)  Graph Databases - OpenLink Virtuoso - Neo4J - Transforming Lucene into a Graph Database  NoSQL ...
  • 68. NLP Integration  Apache Unstructured Information Management Architecture (UIMA) - Component software architecture for the analysis of unstructured data - http://uima.apache.org/  NLP Interchange Format (NIF) - RDF & OWL-based - http://persistence.uni-leipzig.org/nlp2rdf/
  • 69. Relation/Graph Extraction  Reverb - http://reverb.cs.washington.edu/  Graphia - http://graphia.dcc.ufrj.br/
  • 70. Relation/Graph Extraction In 2002, GE acquired the wind power assets of Enron.
  • 71. Relation/Graph Extraction General Electric Company, or GE, is an American multinational conglomerate corporation incorporated in Schenectady, New York
  • 72. Semantics Technologies that Work Today Data Consumption
  • 73. Vector Space Models  Representation useful for approximate search  Search over structured and unstructured data  Construction of approximate semantic models
  • 75. Indexing & Search Engines  Lucene & Solr - http://lucene.apache.org/  Terrier - http://terrier.org/
  • 76. Distributional Hypothesis “Words occurring in similar (linguistic) contexts tend to be semantically similar”  He filled the wampimuk with the substance, passed it around and we all drunk some  We found a little, hairy wampimuk sleeping behind the tree
  • 77. Distributional Semantic Models (DSMs)  Computational models that build contextual semantic representations from corpus data  Semantic context is represented by a vector  Vectors are obtained through the statistical analysis of the linguistic contexts of a word  Salience of contexts (cf. context weighting scheme)  Semantic similarity/relatedness as the core operation over the model
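A distributional semantic model of the kind described above can be sketched in a few lines: count the words that appear in a window around each target word, then compare the resulting vectors with cosine similarity. The corpus, the window size, and the function names here are all illustrative assumptions, and raw counts are used where a real model would apply a context weighting scheme.

```python
import math
from collections import Counter, defaultdict


def build_vectors(sentences, window=2):
    """Map each word to a Counter of the words co-occurring within `window` tokens."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Tiny hypothetical corpus; a usable model needs a large reference corpus.
CORPUS = [
    "the dog chased the ball",
    "the cat chased the ball",
    "the car needs new tyres",
]

if __name__ == "__main__":
    vectors = build_vectors(CORPUS)
    print("dog~cat", cosine(vectors["dog"], vectors["cat"]))
    print("dog~car", cosine(vectors["dog"], vectors["car"]))
```

Because "dog" and "cat" occur in the same contexts in this corpus, their vectors point in the same direction, while "car" shares only the function word "the" with them. That ordering, not the absolute score, is what a semantic relatedness measure is used for.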
  • 78. DSMs as Commonsense Reasoning Commonsense is here θ car dog cat bark run leash
  • 82. DSMs as Commonsense Reasoning θ car dog cat bark run leash ... vs. Semantic best-effort
  • 84. String similarity and semantic relatedness  Amtera Esprit (distributional semantic relatedness) - http://www.mashape.com/amtera/esa-semantic-relatedness  WS4J (Java API for several semantic relatedness algorithms) - https://code.google.com/p/ws4j/  SecondString (string matching) - http://secondstring.sourceforge.net  S-space (distributional semantics framework) - https://github.com/fozziethebeat/S-Space
  • 85. Lexical Resources  WordNet - http://wordnet.princeton.edu/  Wiktionary - http://www.wiktionary.org/  FrameNet - https://framenet.icsi.berkeley.edu/fndrupal/  VerbNet - http://verbs.colorado.edu/~mpalmer/projects/verbnet.html  BabelNet - http://babelnet.org/
  • 86. Entity Recognition & Linking Distributional Semantics Relation/Graph Extraction Internal Datasets Reference Corpora Semantic Pipeline Vocabulary Management Semantic Search & QA Crawling & Indexing Open Data Vocabularies, Taxonomies, Lexical Resources Internal Documents Knowledge Graph Management Knowledge Graph Data Curation Platform Crowdsourcing Services Applications User feedback Provenance Management
  • 88. Querying your Knowledge Graph Gaelic: direction
  • 91. Vocabulary Problem Query: Who is the daughter of Bill Clinton married to? Possible representations = Commonsense Knowledge Dataset (DBpedia 3.7 + YAGO): 45,767 predicates, 5,556,492 classes and 9,434,677 instances
  • 92. Vocabulary Problem Query: Who is the daughter of Bill Clinton married to? Semantic approximation Semantic Gap Possible representations = Commonsense Knowledge Dataset (DBpedia 3.7 + YAGO): 45,767 predicates, 5,556,492 classes and 9,434,677 instances
  • 93. Core Principles  Minimize the impact of Ambiguity, Vagueness, Synonymy.  Address the simplest matchings first (heuristics).  Semantic Relatedness as a primitive operation.  Distributional semantics as commonsense knowledge.
  • 94. Step 1: POS Tagging Who/WP is/VBZ the/DT daughter/NN of/IN Bill/NNP Clinton/NNP married/VBN to/TO ?/. Query Pre-Processing (Question Analysis)
  • 95. Step 2: Core Entity Recognition Rules-based: POS Tag + TF/IDF Who is the daughter of Bill Clinton married to? (PROBABLY AN INSTANCE) Query Pre-Processing (Question Analysis)
  • 96. Step 3: Determine answer type Rules-based. Who is the daughter of Bill Clinton married to? (PERSON) Query Pre-Processing (Question Analysis)
  • 97. Step 4: Dependency parsing dep(married-8, Who-1) auxpass(married-8, is-2) det(daughter-4, the-3) nsubjpass(married-8, daughter-4) prep(daughter-4, of-5) nn(Clinton-7, Bill-6) pobj(of-5, Clinton-7) root(ROOT-0, married-8) xcomp(married-8, to-9) Query Pre-Processing (Question Analysis)
  • 98. Step 5: Determine Partial Ordered Dependency Structure (PODS) Rules based. Remove stop words. Merge words into entities. Reorder structure from core entity position. Query Pre-Processing (Question Analysis) (INSTANCE) ANSWER TYPE QUESTION FOCUS Bill Clinton daughter married to
  • 99. Question Analysis Query Features Bill Clinton daughter married to (INSTANCE) (PREDICATE) (PREDICATE) Query Features PODS
  • 100. Query Plan Map query features into a query plan. A query plan contains a sequence of core operations. (INSTANCE) (PREDICATE) (PREDICATE) Query Features Query Plan  (1) INSTANCE SEARCH (Bill Clinton)  (2) p1 <- SEARCH PREDICATE (Bill Clinton, daughter)  (3) e1 <- NAVIGATE (Bill Clinton, p1)  (4) p2 <- SEARCH PREDICATE (e1, married to)  (5) e2 <- NAVIGATE (e1, p2)
  • 101. Instance Search Bill Clinton daughter married to :Bill_Clinton Query: Linked Data: Instance Search
  • 102. Predicate Search Bill Clinton daughter married to :Bill_Clinton Query: Linked Data: :Chelsea_Clinton :child :Baptists :religion :Yale_Law_School :almaMater ... (PIVOT ENTITY) (ASSOCIATED TRIPLES)
  • 103. Predicate Search Bill Clinton daughter married to :Bill_Clinton Query: Linked Data: :Chelsea_Clinton :child :Baptists :religion :Yale_Law_School :almaMater ... sem_rel(daughter,child)=0.054 Which properties are semantically related to ‘daughter’?
  • 104. Predicate Search Bill Clinton daughter married to :Bill_Clinton Query: Linked Data: :Chelsea_Clinton :child :Baptists :religion :Yale_Law_School :almaMater ... sem_rel(daughter,child)=0.054 sem_rel(daughter,religion)=0.004 Which properties are semantically related to ‘daughter’?
  • 105. Predicate Search Bill Clinton daughter married to :Bill_Clinton Query: Linked Data: :Chelsea_Clinton :child :Baptists :religion :Yale_Law_School :almaMater ... sem_rel(daughter,child)=0.054 sem_rel(daughter,religion)=0.004 sem_rel(daughter,alma mater)=0.001 Which properties are semantically related to ‘daughter’?
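Predicate search ranks the properties attached to the pivot entity by their semantic relatedness to the query term. In the sketch below the distributional model is replaced by a lookup table holding the three scores from the slides, on the assumption that the 0.004 score belongs to the :religion property. The threshold value and the function names are also assumptions.

```python
# Stand-in for a distributional relatedness model: scores from the slides,
# assuming the 0.004 value refers to the 'religion' property.
SEM_REL_SCORES = {
    ("daughter", "child"): 0.054,
    ("daughter", "religion"): 0.004,
    ("daughter", "alma mater"): 0.001,
}


def sem_rel(term, label):
    """Look up the relatedness of two terms (0.0 when unknown)."""
    return SEM_REL_SCORES.get((term, label), 0.0)


def search_predicate(term, candidates, threshold=0.01):
    """Rank candidate properties by relatedness, keeping those above threshold.

    `candidates` maps a property URI to its natural-language label.
    The threshold of 0.01 is an assumed cut-off for this example.
    """
    scored = sorted(
        ((sem_rel(term, label), prop) for prop, label in candidates.items()),
        reverse=True,
    )
    return [(prop, score) for score, prop in scored if score >= threshold]


# Properties associated with the pivot entity :Bill_Clinton on the slide.
candidates = {":child": "child", ":religion": "religion", ":almaMater": "alma mater"}
best = search_predicate("daughter", candidates)
```

With these scores only :child survives the cut-off, which matches the property selected in the walkthrough.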
  • 106. Navigate Bill Clinton daughter married to :Bill_Clinton Query: Linked Data: :Chelsea_Clinton :child
  • 107. Navigate Bill Clinton daughter married to :Bill_Clinton Query: Linked Data: :Chelsea_Clinton :child (PIVOT ENTITY)
  • 108. Predicate Search Bill Clinton daughter married to :Bill_Clinton Query: Linked Data: :Chelsea_Clinton :child (PIVOT ENTITY) :Mark_Mezvinsky :spouse
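The navigation steps amount to following a selected predicate from the pivot entity to its objects. The sketch below runs both hops of the example over an in-memory list holding only the triples shown on the slides. The `navigate` helper is hypothetical, and a real deployment would query a Linked Data store instead.

```python
# Triples taken from the slides, stored as (subject, predicate, object).
TRIPLES = [
    (":Bill_Clinton", ":child", ":Chelsea_Clinton"),
    (":Bill_Clinton", ":religion", ":Baptists"),
    (":Bill_Clinton", ":almaMater", ":Yale_Law_School"),
    (":Chelsea_Clinton", ":spouse", ":Mark_Mezvinsky"),
]


def navigate(entity, predicate, triples=TRIPLES):
    """Return all objects reachable from `entity` through `predicate`."""
    return [o for s, p, o in triples if s == entity and p == predicate]


# e1 <- NAVIGATE(:Bill_Clinton, :child)
e1 = navigate(":Bill_Clinton", ":child")
# e2 <- NAVIGATE(e1, :spouse), applied to every entity found in the first hop
e2 = [obj for entity in e1 for obj in navigate(entity, ":spouse")]
```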
  • 110. Evaluation ▪ 102 natural language queries (Test Collection: QALD 2011). ▪ Avg. query execution time: 1.52 s (simple queries) – 8.53 s (all queries).
  • 111. Treo Answers Jeopardy Queries (Video) http://bit.ly/1hWcch9
  • 112. Hybrid unstructured & structured Sydney's dad, Jack, was a CIA double agent working against SD-6 on this Jennifer Garner show.
  • 113. Core Principles ▪ Semantic best-effort ▪ Dialog & user disambiguation ▪ Pay-as-you-go data integration ▪ Simplicity of use ▪ Franklin et al. (2005): From Databases to Dataspaces. ▪ Helland (2011): If You Have Too Much Data, then “Good Enough” Is Good Enough.
  • 114. Take-away message ▪ There are approaches that can be used today to cope with data variety in the Big Data era ▪ Coping with data variety demands a multi-disciplinary perspective and a new infrastructure - Knowledge Representation, IR and Natural Language Processing ▪ Semantics at scale as a central concern ▪ You can build your own IBM Watson-like application! ▪ Great opportunity for new solutions and for being a pioneer
  • 115. andre.freitas – at – deri.org
