Methods of Knowledge Extraction



Deepti Aggarwal
SIEL|SERL, IIIT-Hyderabad, India
Agenda
 Introduction to Web as a knowledge
  repository
 Automated extraction techniques (input
  sources, extracted structures, input
  pre-processing, extraction methods, output
  generation)
 Issues with automated extraction
What is knowledge?
 A familiarity with someone or something
  gained through experience
 Includes facts, information, descriptions,
  skills
Types of Knowledge
Explicit Knowledge
 Always present explicitly in records
 Objective facts having a definite answer
 E.g., Hyderabad is the capital of A.P.

Implicit Knowledge
 Not present explicitly for analysis
 Cultural beliefs with subjective judgments
 E.g., Hyderabad is the best city to live in India.
How has knowledge been
represented over a period
of time?
 From public library to global library
How is knowledge
represented over the web?
 Millions of documents, blogs, forums,
  social networks scattered across the web
 Diverse topics, different formats, from
  diverse people in diverse languages,
  different points of view
Benefits of knowledge
extraction over the Web
Explicit knowledge:
 Question Answering systems
 Search engines
 Validating knowledge
 Tracking a particular piece of information

Implicit knowledge:
 Predicting markets, polls etc.
 Community advertisements
Problems with knowledge
acquisition over web

 Abundance of data
 Relevance of information
 Personalized retrieval
Possible approaches
 Manual filtering

 Automated techniques

 Combination of both
Automated
 Extraction
Working of automated
   extraction systems

[Diagram: Input sources → Extraction system (Defining output structures → Input pre-processing → Extraction methods → Output processing) → Database of all facts, relations]
Input sources
           Types
Input sources
 web documents
 news articles
 blogs
 social network activities (user profiles,
  posts, comments)


Sentence level parsing required.
Defining the
structures of
      output
Named Entities and their relations
Output structures
 Named Entities
 Named entities relations
1. Named Entity: Definition
 It is an atomic element in a body of
  text.

 Types: person, organization, location etc.
 Different named entities, when linked together,
  form a relation.
1. Named Entity: An
example


  Sachin Tendulkar was born in Bombay.

    "Sachin Tendulkar": NE of type 'Person'
    "Bombay": NE of type 'Location'
2. Named Entity
Relationship: Structure


     Subject – Relation – Object

    Subject: NE of any type
    Relation: Verb, Adjective, Adverb
    Object: NE of any type
2. Named Entity
Relationship: An Example


Sachin Tendulkar was born in Bombay

    Subject: Sachin Tendulkar
    Relation: was born in
    Object: Bombay
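A minimal sketch of the Subject – Relation – Object structure as plain Python data classes. The names `NamedEntity`, `Relation` and `as_triple` are illustrative assumptions, not taken from any particular extraction library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NamedEntity:
    text: str      # surface form, e.g. "Sachin Tendulkar"
    ne_type: str   # e.g. "Person", "Location", "Organization"


@dataclass(frozen=True)
class Relation:
    subject: NamedEntity
    relation: str  # connecting verb, adjective or adverb phrase
    obj: NamedEntity

    def as_triple(self):
        """Return the relation as a (subject, relation, object) tuple of strings."""
        return (self.subject.text, self.relation, self.obj.text)


born_in = Relation(
    subject=NamedEntity("Sachin Tendulkar", "Person"),
    relation="was born in",
    obj=NamedEntity("Bombay", "Location"),
)
print(born_in.as_triple())
```

Keeping the entity type on each end of the relation makes it possible to filter relations later, for example keeping only Person-to-Location pairs.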
Co-referencing


Sachin was born in Bombay. He is a ...


 Sachin Tendulkar…. Mr. Tendulkar …
  Master Blaster...
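As a rough illustration, alias-style co-references such as the ones above can be normalised with a lookup table. The table `ALIASES` and the function `resolve_aliases` are hypothetical; resolving pronouns such as "He" needs sentence context and is much harder than this sketch suggests.

```python
# Hypothetical alias table: every known mention maps to one canonical name.
ALIASES = {
    "Mr. Tendulkar": "Sachin Tendulkar",
    "Master Blaster": "Sachin Tendulkar",
}


def resolve_aliases(text, aliases=ALIASES):
    """Replace each known alias in the text with its canonical entity name."""
    # Longer aliases go first so a short alias never cuts a longer one in half.
    for alias in sorted(aliases, key=len, reverse=True):
        text = text.replace(alias, aliases[alias])
    return text


print(resolve_aliases("Master Blaster was born in Bombay."))
```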
Input
pre-processing
           Libraries
NLP libraries:
   Splitting each sentence into tokens, words,
    digits using Sentence Tokenizer

   Recognizing language constructs, nouns,
    verbs, pronouns using Part-of-speech
    Tagger
 Example: Sachin/NNP Tendulkar/NNP
  was/VBD born/VBN in/IN
  Bombay/NNP
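The word/TAG notation in the example can be turned into (word, tag) pairs with a few lines of Python. This sketch only parses text that is already tagged; it does not perform tagging. The function name `parse_tagged` is an assumption made for illustration.

```python
def parse_tagged(tagged_sentence):
    """Split 'word/TAG word/TAG ...' into a list of (word, tag) pairs."""
    pairs = []
    for item in tagged_sentence.split():
        # rpartition splits on the last '/', so words containing '/' survive.
        word, _, tag = item.rpartition("/")
        pairs.append((word, tag))
    return pairs


tagged = "Sachin/NNP Tendulkar/NNP was/VBD born/VBN in/IN Bombay/NNP"
pairs = parse_tagged(tagged)
proper_nouns = [word for word, tag in pairs if tag == "NNP"]
print(pairs)
print(proper_nouns)
```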
NLP libraries (contd.):
 Linking individual constituents of a
  sentence with Parser to form parse
  tree
 Identify types of named entity using
   Named Entity Recognizer
 Example: Sachin
  Tendulkar/PERSON was born
  in Bombay/LOCATION
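To make the PERSON/LOCATION labelling concrete, here is a toy dictionary (gazetteer) lookup. This is a deliberately simple stand-in: practical named entity recognizers are usually statistical or rule-based taggers, and the names `GAZETTEER` and `tag_entities` are hypothetical.

```python
# Hypothetical gazetteer: known entity names and their types.
GAZETTEER = {
    "Sachin Tendulkar": "PERSON",
    "Bombay": "LOCATION",
}


def tag_entities(sentence, gazetteer=GAZETTEER):
    """Return (entity, type, start_offset) for every gazetteer entry found."""
    found = []
    for name, ne_type in gazetteer.items():
        start = sentence.find(name)
        while start != -1:
            found.append((name, ne_type, start))
            start = sentence.find(name, start + len(name))
    # Report entities in the order they appear in the sentence.
    return sorted(found, key=lambda item: item[2])


print(tag_entities("Sachin Tendulkar was born in Bombay"))
```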
NLP libraries (contd.):
 Identify all co-references and replace
  with actual entity using Co-reference
   Resolution tool
 Identify specific meaning of a word using
   Word Sense Disambiguation
      External vocabularies: MindNet,
       DBpedia, WordNet
      E.g., contextual meaning of 'crane':
       noun-bird, verb-lift/move
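The 'crane' example can be sketched as a lookup keyed by word and part of speech. The sense inventory `SENSES` and the function `disambiguate` are hypothetical; a real system would consult an external vocabulary such as WordNet and use the surrounding context, since part of speech alone cannot separate two noun senses (a crane can also be a lifting machine).

```python
# Hypothetical sense inventory keyed by (word, coarse part of speech).
SENSES = {
    ("crane", "noun"): "bird",
    ("crane", "verb"): "lift/move",
}


def disambiguate(word, pos, senses=SENSES):
    """Pick a sense from the word's part of speech; None if unknown."""
    return senses.get((word.lower(), pos))


print(disambiguate("crane", "noun"))
print(disambiguate("Crane", "verb"))
```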
Extraction
 methods
Extracting relationships
among NEs: Standard
process
1. Identify named entities within a
   sentence.
2. Find the verb or adjective that
   connects the identified named
   entities.
3. Connect them together to form a relation.
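The three steps above can be sketched as follows. As a simplifying assumption, the sketch takes all the text between the first two entities as the connecting phrase instead of picking out the verb or adjective, and it expects the entity names from an earlier recognition step. The function name `extract_relation` is hypothetical.

```python
def extract_relation(sentence, entities):
    """Apply the three steps to one sentence.

    `entities` is a list of known entity names, assumed to come from
    an earlier named entity recognition step.
    """
    # Step 1: identify the named entities present, in order of appearance.
    present = sorted(
        (sentence.find(name), name) for name in entities if name in sentence
    )
    if len(present) < 2:
        return None
    (start_a, first), (start_b, second) = present[0], present[1]
    # Step 2: take the text between them as the connecting phrase.
    connector = sentence[start_a + len(first):start_b].strip()
    if not connector:
        return None
    # Step 3: connect them together to form a relation.
    return (first, connector, second)


print(extract_relation("Sachin Tendulkar was born in Bombay.",
                       ["Bombay", "Sachin Tendulkar"]))
```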
Extracting relationships
among NEs: Required
process
1. Identify part-of-speech constructs:
   noun, verb, adjective etc.
2. Determine co-references,
   acronyms and abbreviations.
3. Connect them together to form a
   relationship.
Extraction Methods
 Natural Language Processing: rule based.
    Based on sentence structure

    E.g., for the English language, a rule can be “noun-verb-noun”

 Machine Learning: supervised and
  unsupervised learning.
    Features are detected from the training data

    E.g., to extract instances of some medical diseases, the system
     is trained over all the symptoms of each given disease.
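A small sketch of the “noun-verb-noun” rule applied to part-of-speech tagged tokens. Two assumptions are made here and are not from the slides: tags follow the Penn Treebank style used earlier (NN*, VB*, IN), and a preposition is folded into the verb group so that "was born in" counts as one connecting phrase. The function names are hypothetical.

```python
from itertools import groupby


def coarse(tag):
    """Map a Penn Treebank style tag to a coarse class used by the rule."""
    if tag.startswith("NN"):
        return "N"
    # Assumption: a preposition (IN) is treated as part of the verb group.
    if tag.startswith("VB") or tag == "IN":
        return "V"
    return "O"


def noun_verb_noun(tagged_tokens):
    """Return (noun, verb, noun) triples that match the rule."""
    # Merge runs of tokens with the same coarse class into chunks.
    chunks = [
        (cls, " ".join(word for word, _ in group))
        for cls, group in groupby(tagged_tokens, key=lambda wt: coarse(wt[1]))
    ]
    triples = []
    for (c1, t1), (c2, t2), (c3, t3) in zip(chunks, chunks[1:], chunks[2:]):
        if (c1, c2, c3) == ("N", "V", "N"):
            triples.append((t1, t2, t3))
    return triples


tokens = [
    ("Sachin", "NNP"), ("Tendulkar", "NNP"), ("was", "VBD"),
    ("born", "VBN"), ("in", "IN"), ("Bombay", "NNP"),
]
print(noun_verb_noun(tokens))
```

A rule like this is cheap to run but brittle: any sentence whose structure departs from the pattern yields no triple.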
Extraction Methods (contd.)
 Other methods: vocabulary-based systems,
  context-based clustering.
    Maintaining a mapping file of all countries and their
     nationalities helps to determine nationality of a
     person when his birth place is known.

 Hybrid:
    NLP based libraries to pre-process the input data,
     applying machine learning approach to extract the
     relations by using some external vocabulary as
     WordNet.
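The mapping-file idea above can be sketched with two small dictionaries. Both tables and the function `infer_nationality` are hypothetical, and the inference is only a heuristic: a birthplace suggests a nationality but does not guarantee it.

```python
# Hypothetical mapping file contents: country -> nationality.
COUNTRY_TO_NATIONALITY = {
    "India": "Indian",
    "Australia": "Australian",
}

# Hypothetical lookup from birthplace (city) to country.
CITY_TO_COUNTRY = {
    "Bombay": "India",
    "Melbourne": "Australia",
}


def infer_nationality(birthplace):
    """Infer a nationality from a birthplace; None when the place is unknown."""
    # A birthplace may already be a country, so fall back to the value itself.
    country = CITY_TO_COUNTRY.get(birthplace, birthplace)
    return COUNTRY_TO_NATIONALITY.get(country)


print(infer_nationality("Bombay"))
```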
Output
generation
Types of output systems
1. Identify all mentions of named entities
   and their relations.
 E.g., from a given corpus, extract all named entity
    relations.

2. Identify missing relations of a database.
 E.g., given a database, extract the missing attributes
    of given entities from the corpus.

3. Link various entities within a database.
 E.g., given a database, link two entities together with
    some relation extracted from the corpus.
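The second output type, filling missing attributes of a database, might look like the sketch below. The database layout (a dict of records with `None` for missing values) and the function `fill_missing` are assumptions chosen for illustration.

```python
def fill_missing(database, triples):
    """Fill attributes that are None using (subject, relation, object) triples.

    Values that are already present are never overwritten.
    """
    for subject, relation, obj in triples:
        record = database.get(subject)
        if record is not None and relation in record and record[relation] is None:
            record[relation] = obj
    return database


db = {"Sachin Tendulkar": {"born_in": None, "type": "Person"}}
extracted = [
    ("Sachin Tendulkar", "born_in", "Bombay"),
    ("Sachin Tendulkar", "type", "Location"),  # conflicts with a known value
]
fill_missing(db, extracted)
print(db)
```

Refusing to overwrite known values is one conservative design choice; a system that trusts its extractor more could instead flag conflicts for review.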
Working of automated
   extraction systems

[Diagram: Input sources → Extraction system (Defining output structures → Input pre-processing → Extraction methods → Output processing) → Database of all facts, relations]
Issues with
    automated
     extraction
Accuracy, running time, dependency
Issue 1: Challenges of
language structure
  Co-reference
  resolution
  Ambiguous, complex
  sentences
  Abbreviations
  Acronyms
See an example…

 “Tom called his father last night. They talked for
  an hour. He said he would be home the next
  day.”


          What is 'He' referring to?
            Tom or his father?
“You see sir, I can talk English, I can walk English, I
can laugh English, I can run English, because
English is such a funny language.”
Amitabh in Namak Halal
Issue 2: Accuracy
  Named entity detection: 90%,
   relationship extraction: 50-70%.

  Introduction of noise at each step.
    E.g., disambiguation of the word
     'crane' with WordNet introduces
     contextual errors, which then
     decrease the accuracy of rule-based
     relationship extraction
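To see how noise can accumulate across steps, suppose (purely as an assumption) that each step succeeds independently of the others. The chance that every step is right is then the product of the per-step accuracies. Only the 90% named entity figure comes from the slide; the independence assumption and the function `pipeline_accuracy` are illustrative.

```python
from math import prod


def pipeline_accuracy(step_accuracies):
    """Probability that all steps are correct, assuming independent errors."""
    return prod(step_accuracies)


# A relation needs both of its named entities to be detected correctly.
both_entities = pipeline_accuracy([0.90, 0.90])
print(round(both_entities, 2))  # 0.81
```

Under this assumption, detecting two entities at 90% each already caps relation accuracy at about 81% before any further step adds its own errors.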
Issue 3: Efficiency
  Feature detection steps are
   expensive.

  Require days for computation
Issue 4: Dependency
 on external vocabulary sources, like
  Wikipedia, WordNet, MindNet etc.
 Maintenance and updating of vocabulary
  sources is manual: costly and requires
  expertise.
 Limited size produces context-based noise

  Domain-dependent: medical domain
  Corpus-dependent: Wikipedia, news
   corpus
  Relation-specific: Date and Place-of-event
Issue 5: Problem with Implicit
knowledge extraction
 Community knowledge is learned and shared.

 No one can be an expert.

 Cultural competence and perception of
  workers are fed into a system as variables.

 Cultural Consensus Theory provides
  models to include such variables in the
  system.
Can we do better?
Can we seek human intelligence to improve
the accuracy of automated techniques?
References
[1] I. Tuomi. Data is more than knowledge:
  implications of the reversed knowledge hierarchy
  for knowledge management and organizational
  memory. J. Manage. Inf. Syst., 16(3):103–117, Dec.
  1999.

[2] S. Sekine. Named Entity: History and Future. 2004.

[3] S. Sarawagi. Information extraction. Found. Trends
  Databases, 1(3):261–377, Mar. 2008.

[4] S. C. Weller. Cultural consensus theory:
  Applications and frequently asked questions. Field
  Methods, 19(4):339–368, 2007.
References (contd.)
[5] Z. Syed, E. Viegas, and S. Parastatidis. Automatic
  discovery of semantic relations using MindNet.
  LREC, 2010.

[6] G. A. Miller, R. Beckwith, C. Fellbaum, D. Gross, and
  K. Miller. WordNet: An on-line lexical database.
  International Journal of Lexicography, 3:235–244,
  1990.

[7] T. S. Jayram, R. Krishnamurthy, S. Raghavan, S.
  Vaithyanathan, and H. Zhu. Avatar information
  extraction system. IEEE Data Eng. Bull., pages 40–48,
  2006.

[8] E. Greengrass. Information retrieval: A survey, 2000.
Thank you
    Questions?
Knowledge acquisition using automated techniques

Understanding Video based Parent Training Intervention for Children with AutismUnderstanding Video based Parent Training Intervention for Children with Autism
Understanding Video based Parent Training Intervention for Children with Autism
 
PhD Confirmation talk
PhD Confirmation talkPhD Confirmation talk
PhD Confirmation talk
 
Six months progress review (PhD work)
Six months progress review (PhD work)Six months progress review (PhD work)
Six months progress review (PhD work)
 
Masters thesis defense talk
Masters thesis defense talkMasters thesis defense talk
Masters thesis defense talk
 
5min presentation
5min presentation5min presentation
5min presentation
 
Demography based ATM design
Demography based ATM designDemography based ATM design
Demography based ATM design
 
Upick
UpickUpick
Upick
 

Recently uploaded

Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
Fwdays
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
Abida Shariff
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
CatarinaPereira64715
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 

Recently uploaded (20)

Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 

Knowledge acquisition using automated techniques

  • 1. Methods of Knowledge Extraction Deepti Aggarwal SIEL|SERL, IIIT-Hyderabad, India
  • 2. Agenda
      - Introduction to the Web as a knowledge repository
      - Automated extraction techniques (input sources, extracted structures, input pre-processing, extraction methods, output generation)
      - Issues with automated extraction
  • 3. What is knowledge?
      - A familiarity with someone or something, gained through experience
      - Includes facts, information, descriptions, and skills
  • 4. Types of Knowledge
      - Explicit knowledge: always present explicitly in records; objective facts having a definite answer. E.g., Hyderabad is the capital of A.P.
      - Implicit knowledge: not present explicitly for analysis; cultural beliefs with subjective judgments. E.g., Hyderabad is the best city to live in India.
  • 5. How is knowledge represented over a period of time?
      - From public library to global library
  • 6. How is knowledge represented over the web?
      - Millions of documents, blogs, forums, and social networks scattered across the web
      - Diverse topics, different formats, from diverse people in diverse languages, with different points of view
  • 7. Benefits of knowledge extraction over the Web
      - Explicit knowledge: question answering systems, search engines, validating knowledge, tracking a particular piece of information
      - Implicit knowledge: predicting markets, polls etc.; community advertisements
  • 8. Problems with knowledge acquisition over web
      - Abundance of data
      - Relevance of information
      - Personalized retrieval
  • 9. Possible approaches
      - Manual filtering
      - Automated techniques
      - Combination of both
  • 11. Working of automated extraction systems
      - Input sources -> Extraction system -> Database of all facts and relations
      - Stages of the extraction system: defining output structures, input pre-processing, extraction methods, output processing
  • 12. Input sources: Types
  • 13. Input sources
      - Web documents
      - News articles
      - Blogs
      - Social network activities (user profiles, posts, comments)
      - Sentence-level parsing is required.
  • 14. Defining the structures of output: Named entities and their relations
  • 15. Output structures
      - Named entities
      - Named entity relations
  • 16. 1. Named Entity: Definition
      - It is an atomic element in a body of text.
      - Types: person, organization, location etc.
      - Different named entities, when linked together, form a relation.
  • 17. 1. Named Entity: An example
      - Sachin Tendulkar was born in Bombay.
      - "Sachin Tendulkar": NE of type 'Person'; "Bombay": NE of type 'Location'
  • 18. 2. Named Entity Relationship: Structure
      - Subject - Relation - Object
      - Subject: NE of any type; Relation: verb, adjective, adverb; Object: NE of any type
  • 19. 2. Named Entity Relationship: An Example
      - Sachin Tendulkar (Subject) was born in (Relation) Bombay (Object)
  • 20. Co-referencing
      - Sachin was born in Bombay. He is a ...
      - Sachin Tendulkar … Mr. Tendulkar … Master Blaster ...
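The alias part of co-referencing can be illustrated with a small lookup. This is only a sketch under stated assumptions: the `ALIASES` table and the `normalize_mentions` helper are hypothetical names introduced here, the table is filled by hand from the mentions on the slide, and pronouns such as "He" are deliberately left untouched because resolving them needs sentence context that a lookup table cannot provide.

```python
import re

# Hypothetical alias table: surface forms from the slide mapped to one canonical name.
ALIASES = {
    "Mr. Tendulkar": "Sachin Tendulkar",
    "Master Blaster": "Sachin Tendulkar",
    "Sachin": "Sachin Tendulkar",
}


def normalize_mentions(text, aliases=ALIASES):
    """Replace each known alias in text with its canonical entity name."""
    canonical = set(aliases.values())
    # Canonical names are matched too (and left unchanged), so the "Sachin"
    # inside an existing "Sachin Tendulkar" is not expanded a second time.
    forms = sorted(set(aliases) | canonical, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(form) for form in forms))
    return pattern.sub(lambda m: aliases.get(m.group(0), m.group(0)), text)


if __name__ == "__main__":
    print(normalize_mentions("Sachin was born in Bombay. He is a cricketer."))
```

Longest forms are tried first so that a short alias never splits a longer one. A real co-reference resolution tool would also have to decide which entity each pronoun refers to.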
  • 21. Input pre-processing: Libraries
  • 22. NLP libraries:
      - Splitting each sentence into tokens (words, digits) using a Sentence Tokenizer
      - Recognizing language constructs (nouns, verbs, pronouns) using a Part-of-speech Tagger
      - Example: Sachin/NNP Tendulkar/NNP was/VBD born/VBN in/IN Bombay/NNP
  • 23. NLP libraries (contd.):
      - Linking individual constituents of a sentence with a Parser to form a parse tree
      - Identifying types of named entity using a Named Entity Recognizer
      - Example: Sachin Tendulkar/PERSON was born in Bombay/LOCATION
  • 24. NLP libraries (contd.):
      - Identifying all co-references and replacing them with the actual entity using a Co-reference Resolution tool
      - Identifying the specific meaning of a word using Word Sense Disambiguation
      - External vocabularies: MindNet, DBpedia, WordNet
      - E.g., contextual meaning of 'crane': noun (bird), verb (lift/move)
  • 26. Extracting relationships among NEs: Standard process
      1. Identify named entities within a sentence.
      2. Find the verb or adjective that connects the identified named entities.
      3. Connect them together to form a relation.
  • 27. Extracting relationships among NEs: Required process
      1. Identify part-of-speech constructs: noun, verb, adjective etc.
      2. Determine co-references, acronyms and abbreviations.
      3. Connect them together to form a relationship.
  • 28. Extraction Methods
      - Natural Language Processing: rule based.
          - Based on sentence structure
          - E.g., for the English language, a rule can be "noun-verb-noun"
      - Machine Learning: supervised and unsupervised learning.
          - Features are detected from the training data
          - E.g., to extract instances of some medical diseases, the system is trained over all the symptoms of each given disease.
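A "noun-verb-noun" rule of this kind can be sketched over the part-of-speech tagged sentence used earlier in the deck. This is a toy illustration, not the method of any particular system: `parse_tagged` and `extract_triple` are hypothetical helpers, the input is assumed to be already tagged in `word/TAG` form, and the rule only recognises proper nouns (`NNP`), verbs (`VB*`) and an optional preposition (`IN`).

```python
def parse_tagged(tagged):
    """Turn 'word/TAG word/TAG ...' into a list of (word, tag) pairs."""
    return [tuple(item.rsplit("/", 1)) for item in tagged.split()]


def extract_triple(tokens):
    """Apply a toy noun-verb-noun rule to (word, tag) pairs.

    Subject = first run of proper nouns, relation = the verbs that follow
    (plus an optional preposition), object = the next run of proper nouns.
    Returns (subject, relation, object), or None if the rule does not match.
    """
    i, n = 0, len(tokens)

    def take(matches):
        # Consume consecutive tokens whose tag satisfies the predicate.
        nonlocal i
        words = []
        while i < n and matches(tokens[i][1]):
            words.append(tokens[i][0])
            i += 1
        return words

    # Skip anything before the first proper noun.
    while i < n and tokens[i][1] != "NNP":
        i += 1

    subject = take(lambda tag: tag == "NNP")
    verbs = take(lambda tag: tag.startswith("VB"))
    preposition = take(lambda tag: tag == "IN")
    obj = take(lambda tag: tag == "NNP")

    if subject and verbs and obj:
        return " ".join(subject), " ".join(verbs + preposition), " ".join(obj)
    return None


if __name__ == "__main__":
    sentence = "Sachin/NNP Tendulkar/NNP was/VBD born/VBN in/IN Bombay/NNP"
    print(extract_triple(parse_tagged(sentence)))
```

Any sentence that deviates from the expected tag sequence makes the rule return `None`, which is the main weakness of purely structure-based rules on ambiguous or complex sentences.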
  • 29. Extraction Methods (contd.)
      - Other methods: vocabulary-based systems, context-based clustering.
          - Maintaining a mapping file of all countries and their nationalities helps to determine the nationality of a person when their birth place is known.
      - Hybrid:
          - NLP-based libraries pre-process the input data; a machine learning approach is then applied to extract the relations, using an external vocabulary such as WordNet.
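The mapping-file idea can be sketched as two dictionary lookups. Everything named here is an assumption made for illustration: the tiny `COUNTRY_TO_NATIONALITY` and `CITY_TO_COUNTRY` tables stand in for a real mapping file and gazetteer, and `infer_nationality` is a hypothetical helper. Inferring nationality from birth place is a heuristic, so the result is a guess rather than a fact.

```python
# Hypothetical stand-in for the mapping file of countries and nationalities.
COUNTRY_TO_NATIONALITY = {
    "India": "Indian",
    "Australia": "Australian",
    "Japan": "Japanese",
}

# Hypothetical gazetteer used to resolve a city to its country.
CITY_TO_COUNTRY = {
    "Bombay": "India",
    "Melbourne": "Australia",
}


def infer_nationality(birth_place):
    """Guess a nationality from a birth place, or return None if unknown."""
    # If the birth place is a known city, move up to its country first;
    # otherwise treat the birth place itself as a country name.
    country = CITY_TO_COUNTRY.get(birth_place, birth_place)
    return COUNTRY_TO_NATIONALITY.get(country)


if __name__ == "__main__":
    print(infer_nationality("Bombay"))
```

Returning `None` for unknown places keeps gaps in the vocabulary visible instead of silently producing a wrong relation.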
  • 31. Types of output systems
      1. Identify all mentions of named entities and their relations. E.g., from a given corpus, extract all named entity relations.
      2. Identify missing relations of a database. E.g., given a database, extract the missing attributes of given entities from the corpus.
      3. Link various entities within a database. E.g., given a database, link two entities together with some relation extracted from the corpus.
  • 32. Working of automated extraction systems
      - Input sources -> Extraction system -> Database of all facts and relations
      - Stages of the extraction system: defining output structures, input pre-processing, extraction methods, output processing
  • 33. Issues with automated extraction: Accuracy, running time, dependency
  • 34. Issue 1: Challenges of language structure
      - Co-reference resolution
      - Ambiguous, complex sentences
      - Abbreviations
      - Acronyms
  • 35. See an example…
      - "Tom called his father last night. They talked for an hour. He said he would be home the next day."
      - What is 'He' referring to? Tom or his father?
  • 36. "You see sir, I can talk English, I can walk English, I can laugh English, I can run English, because English is such a funny language." (Amitabh in Namak Halal)
  • 37. Issue 2: Accuracy
      - Named entity detection: 90%; relationship extraction: 50-70%.
      - Noise is introduced at each step.
      - E.g., disambiguating the word 'crane' with WordNet introduces contextual errors, which then decrease the accuracy of rule-based relationship extraction.
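How noise accumulates across steps can be made concrete with a back-of-the-envelope calculation. The sketch below assumes that the stages fail independently and that a relation is correct only if every stage involved is correct. That independence assumption is introduced here for illustration and is not claimed in the slides; `pipeline_accuracy` is a hypothetical helper.

```python
def pipeline_accuracy(stage_accuracies):
    """Multiply per-stage accuracies, assuming stages fail independently."""
    accuracy = 1.0
    for stage in stage_accuracies:
        accuracy *= stage
    return accuracy


if __name__ == "__main__":
    # Two named entities detected at 90% each, followed by a relation step
    # at 70% (the upper end of the range quoted on the slide).
    print(round(pipeline_accuracy([0.9, 0.9, 0.7]), 3))
```

Under these assumptions the end-to-end figure comes to about 0.567, lower than any single stage, which is one way to read the slide's point that each step introduces noise.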
  • 38. Issue 3: Efficiency
      - Feature detection steps are expensive.
      - Computation can require days.
  • 39. Issue 4: Dependency
      - On external vocabulary sources, like Wikipedia, WordNet, MindNet etc.
      - Maintaining and updating vocabulary sources is manual: costly and requires expertise.
      - Limited size produces context-based noise
      - Domain-dependent: medical domain
      - Corpus-dependent: Wikipedia, news corpus
      - Relation-specific: Date- and Place-of-event
  • 40. Issue 5: Problem with implicit knowledge extraction
      - Community knowledge is learned and shared
      - No one can be an expert.
      - Cultural competence and perception of workers are fed into a system as variables. Cultural Consensus Theory provides models to include such variables in the system.
  • 41. Can we do better? Can we seek human intelligence to improve the accuracy of automated techniques?
  • 42. References
      [1] I. Tuomi. Data is more than knowledge: implications of the reversed knowledge hierarchy for knowledge management and organizational memory. J. Manage. Inf. Syst., 16(3):103–117, Dec. 1999.
      [2] S. Sekine. Named Entity: History and Future. 2004.
      [3] S. Sarawagi. Information extraction. Found. Trends Databases, 1(3):261–377, Mar. 2008.
      [4] S. C. Weller. Cultural consensus theory: Applications and frequently asked questions. Field Methods, 19(4):339–368, 2007.
  • 43. References (contd.)
      [5] Z. Syed, E. Viegas, and S. Parastatidis. Automatic discovery of semantic relations using MindNet. LREC, 2010.
      [6] G. A. Miller, R. Beckwith, C. Fellbaum, D. Gross, and K. Miller. WordNet: An on-line lexical database. International Journal of Lexicography, 3:235–244, 1990.
      [7] T. S. Jayram, R. Krishnamurthy, S. Raghavan, S. Vaithyanathan, and H. Zhu. Avatar information extraction system. IEEE Data Eng. Bull., pages 40–48, 2006.
      [8] E. Greengrass. Information retrieval: A survey, 2000.
  • 44. Thank you Questions?

Editor's Notes

  1. The definition of knowledge is a matter of ongoing debate among philosophers, but for our talk I have taken this definition from Wikipedia.
  2. Predicting market: to predict whether people like Lux soap or not. Community advertisements, e.g., advertising a concert in Bengali to the Bengali community in Hyderabad.
  3. Scarcity is not the issue, but abundance is! It is easy for humans to understand the meaning lying in different documents, but it becomes difficult for a user to find a document of interest.
  4. Manual filtering: too much labour, time-consuming, and prone to bias. Automated techniques: for huge data, an intelligent way is to formulate an algorithm that performs the repetitive computation with systems instead of manual labour; it is less time-consuming, and it is what I will talk about in this presentation. Combination of both: I consider it to be more appropriate, as it combines the advantages of both systems and humans (scalability from systems; accuracy and intelligence from humans). In my thesis, I have particularly opted for this approach. Today I am not talking about it; I will cover this topic in a later presentation.
  5. Systems that are built over some algorithms: the use of methods for controlling industrial processes automatically, esp. by electronically controlled systems, often reducing manpower.
  6. Broad overview of how the system works. According to me, these are the five main components.
  7. Broad overview of how the system works. According to me, these are the five main components.
  8. The type of extraction method depends on the application. A highly sophisticated system can achieve a maximum of 70% accuracy. The accuracy of automated techniques cannot surpass human intelligence.