Methods of Knowledge Extraction



Deepti Aggarwal
SIEL|SERL, IIIT-Hyderabad, India
Agenda
• Introduction to the Web as a knowledge repository
• Automated extraction techniques (input sources, extracted structures, input pre-processing, extraction methods, output generation)
• Issues with automated extraction
What is knowledge?
• Familiarity with someone or something, gained through experience
• Includes facts, information, descriptions, and skills
Types of Knowledge
Explicit knowledge
• Always present explicitly in records
• Objective facts having a definite answer
• E.g., Hyderabad is the capital of A.P.
Implicit knowledge
• Not present explicitly for analysis
• Cultural beliefs with subjective judgments
• E.g., Hyderabad is the best city to live in in India.
How has knowledge been represented over time?
• From the public library to a global library
How is knowledge represented on the Web?
• Millions of documents, blogs, forums, and social networks scattered across the Web
• Diverse topics in different formats, written by diverse people in diverse languages, with different points of view
Benefits of knowledge extraction over the Web
Explicit knowledge
• Question answering systems
• Search engines
• Validating knowledge
• Tracking a particular piece of information
Implicit knowledge
• Predicting markets, polls, etc.
• Community advertisements
Problems with knowledge acquisition over the Web
• Abundance of data
• Relevance of information
• Personalized retrieval
Possible approaches
• Manual filtering
• Automated techniques
• Combination of both
Automated Extraction
Working of automated extraction systems
Input sources → Extraction system → Database of all facts and relations
Stages inside the extraction system:
1. Defining output structures
2. Input pre-processing
3. Extraction methods
4. Output processing
Input sources
Types
Input sources
• Web documents
• News articles
• Blogs
• Social network activities (user profiles, posts, comments)
Sentence-level parsing is required.
Defining the output structures
Named entities and their relations
Output structures
• Named entities
• Named entity relations
1. Named Entity: Definition
• It is an atomic element in a body of text.
• Types: person, organization, location, etc.
• Different named entities, when linked together, form a relation.
1. Named Entity: An example
Sachin Tendulkar was born in Bombay.
• Sachin Tendulkar: NE of type 'Person'
• Bombay: NE of type 'Location'
2. Named Entity Relationship: Structure
Subject - Relation - Object
• Subject: NE of any type
• Relation: verb, adjective, or adverb
• Object: NE of any type
2. Named Entity Relationship: An example
Sachin Tendulkar was born in Bombay.
• Subject: Sachin Tendulkar
• Relation: was born in
• Object: Bombay
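The Subject - Relation - Object structure can be held in a small record type. The following is a minimal sketch, not taken from the slides: the class names `Entity` and `Relation` and the type labels are illustrative assumptions.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    """A named entity: its surface text and its type (hypothetical labels)."""
    text: str
    type: str  # e.g. "PERSON", "LOCATION"


class Relation(NamedTuple):
    """A Subject - Relation - Object record linking two named entities."""
    subject: Entity
    relation: str
    obj: Entity

    def as_triple(self):
        # Flatten to plain strings, the form a facts database might store.
        return (self.subject.text, self.relation, self.obj.text)


sachin = Entity("Sachin Tendulkar", "PERSON")
bombay = Entity("Bombay", "LOCATION")
born_in = Relation(sachin, "was born in", bombay)
print(born_in.as_triple())
```

Keeping the entity type next to the text lets later steps filter relations, for example to those between a person and a location.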
Co-referencing
Sachin was born in Bombay. He is a ...
• Sachin Tendulkar … Mr. Tendulkar … Master Blaster ...
Input pre-processing
Libraries
NLP libraries:
• Splitting the input into sentences and tokens (words, digits) using a Sentence Tokenizer
• Recognizing language constructs (nouns, verbs, pronouns) using a Part-of-speech Tagger
• Example: Sachin/NNP Tendulkar/NNP was/VBD born/VBN in/IN Bombay/NNP
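To make the tokenizing and tagging steps concrete, here is a toy sketch using only the Python standard library. It is not how a real NLP library works: the regular expressions are naive, `TOY_LEXICON` is a hypothetical hand-written table covering only the example sentence, and `PUNCT` is a simplified tag rather than a Penn Treebank tag.

```python
import re

# Hypothetical mini lexicon for the example sentence only; real taggers
# learn word/tag statistics from annotated corpora.
TOY_LEXICON = {"was": "VBD", "born": "VBN", "in": "IN"}


def split_sentences(text):
    """Naively split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Return words, digit runs and punctuation marks as separate tokens."""
    return re.findall(r"[A-Za-z]+|\d+|[^\w\s]", sentence)


def pos_tag(tokens):
    """Tag tokens by lexicon lookup, falling back to crude surface rules."""
    tagged = []
    for tok in tokens:
        if tok.lower() in TOY_LEXICON:
            tag = TOY_LEXICON[tok.lower()]
        elif tok[0].isupper():
            tag = "NNP"    # capitalised word: assume proper noun
        elif tok.isdigit():
            tag = "CD"     # number
        elif tok.isalpha():
            tag = "NN"     # any other word: assume common noun
        else:
            tag = "PUNCT"  # simplified punctuation tag
        tagged.append((tok, tag))
    return tagged


tagged = pos_tag(tokenize("Sachin Tendulkar was born in Bombay."))
print(" ".join(f"{w}/{t}" for w, t in tagged if t != "PUNCT"))
# prints: Sachin/NNP Tendulkar/NNP was/VBD born/VBN in/IN Bombay/NNP
```

The capitalisation rule is the weakest part: it would tag any sentence-initial word missing from the lexicon as a proper noun, which is one reason trained taggers are used in practice.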
NLP libraries (contd.):
• Linking the individual constituents of a sentence with a Parser to form a parse tree
• Identifying the types of named entities using a Named Entity Recognizer
• Example: Sachin Tendulkar/PERSON was born in Bombay/LOCATION
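One simple way to picture named entity recognition is a dictionary (gazetteer) lookup. The sketch below is an assumption-laden illustration, not the method of any particular library: `GAZETTEER` is a hypothetical hand-made list, and matching is plain substring search.

```python
# Hypothetical toy gazetteer; real recognizers rely on trained models
# or on far larger curated lists.
GAZETTEER = {
    "Sachin Tendulkar": "PERSON",
    "Bombay": "LOCATION",
    "Hyderabad": "LOCATION",
}


def recognize_entities(sentence):
    """Return sorted (start, end, text, type) tuples for gazetteer matches."""
    found = []
    taken = [False] * len(sentence)
    # Try longer names first so they are not shadowed by shorter ones.
    for name in sorted(GAZETTEER, key=len, reverse=True):
        start = sentence.find(name)
        while start != -1:
            end = start + len(name)
            if not any(taken[start:end]):
                found.append((start, end, name, GAZETTEER[name]))
                taken[start:end] = [True] * len(name)
            start = sentence.find(name, end)
    return sorted(found)


def annotate(sentence):
    """Rewrite the sentence with a /TYPE suffix after each entity."""
    out, pos = [], 0
    for start, end, name, etype in recognize_entities(sentence):
        out.append(sentence[pos:start])
        out.append(f"{name}/{etype}")
        pos = end
    out.append(sentence[pos:])
    return "".join(out)


print(annotate("Sachin Tendulkar was born in Bombay."))
# prints: Sachin Tendulkar/PERSON was born in Bombay/LOCATION.
```

A lookup like this only finds names it already knows and ignores context, so it cannot tell a person from a place that share a name.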
NLP libraries (contd.):
• Identifying all co-references and replacing them with the actual entity using a Co-reference Resolution tool
• Identifying the specific meaning of a word: Word Sense Disambiguation
  - External vocabularies: MindNet, DBpedia, WordNet
  - E.g., contextual meaning of 'crane': noun-bird, verb-lift/move
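The 'crane' example can be sketched with a simplified Lesk-style overlap count: pick the sense whose gloss shares the most words with the surrounding context. This swaps in a well-known textbook technique for illustration; the slides do not say which disambiguation method is used. The glosses in `SENSES` are hypothetical stand-ins for entries that would come from an external vocabulary such as WordNet.

```python
import re

# Hypothetical glosses for the two senses named on the slide.
SENSES = {
    "crane": {
        "noun-bird": "a large bird with long legs and a long neck that lives near water",
        "verb-lift/move": "to lift or move a heavy load with a machine",
    }
}

STOPWORDS = {"a", "an", "the", "of", "to", "in", "and", "or", "with", "that", "for"}


def content_words(text):
    """Lower-cased words of the text, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def disambiguate(word, context):
    """Return the sense whose gloss overlaps most with the context words."""
    ctx = content_words(context) - {word}
    scores = {
        sense: len(ctx & content_words(gloss))
        for sense, gloss in SENSES[word].items()
    }
    # On a tie, max() keeps the first sense listed.
    return max(scores, key=scores.get)


print(disambiguate("crane", "The crane waded through the water looking for fish"))
print(disambiguate("crane", "Workers crane the heavy load onto the truck"))
```

With these glosses the first call selects `noun-bird` (shared word: "water") and the second selects `verb-lift/move` (shared words: "heavy", "load"). Short glosses often share no words with the context at all, which is one way a limited vocabulary introduces noise.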
Extraction methods
Extracting relationships among NEs: Standard process
1. Identify named entities within a sentence.
2. Find the verb or adjective that connects the identified named entities.
3. Connect them together to form a relation.
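The three steps above can be sketched as a single function. This is a deliberately naive illustration: the entity list is assumed to be supplied by an earlier recognition step, and the "connecting" words are simply whatever text lies between the first two entities, rather than a verb or adjective found by a parser.

```python
def extract_relation(sentence, entities):
    """Return (subject, relation, object) or None.

    `entities` is a list of (text, type) pairs assumed to come from an
    earlier named entity recognition step.
    """
    # Step 1: locate the named entities within the sentence.
    located = sorted(
        (sentence.find(text), text) for text, _ in entities if text in sentence
    )
    if len(located) < 2:
        return None
    (s_pos, subject), (o_pos, obj) = located[0], located[1]

    # Step 2: take the words between the two entities as the connector.
    connector = sentence[s_pos + len(subject):o_pos].strip()
    if not connector:
        return None

    # Step 3: connect them together to form a relation.
    return (subject, connector, obj)


entities = [("Sachin Tendulkar", "PERSON"), ("Bombay", "LOCATION")]
print(extract_relation("Sachin Tendulkar was born in Bombay.", entities))
# prints: ('Sachin Tendulkar', 'was born in', 'Bombay')
```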
Extracting relationships among NEs: Required process
1. Identify part-of-speech constructs: noun, verb, adjective, etc.
2. Determine co-references, acronyms, and abbreviations.
3. Connect them together to form a relationship.
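Step 2 can be illustrated with a small sketch that maps every alias of an entity to one canonical name and replaces a pronoun with the most recently mentioned entity. Both choices are assumptions made for illustration: the alias table is hard-coded from the co-referencing examples, and the "most recent mention" heuristic is far cruder than a real co-reference resolution tool.

```python
import re

CANONICAL = "Sachin Tendulkar"

# Hard-coded alias table; a real system would have to discover these.
ALIASES = {
    "Sachin Tendulkar": CANONICAL,
    "Mr. Tendulkar": CANONICAL,
    "Master Blaster": CANONICAL,
    "Sachin": CANONICAL,
}
PRONOUNS = ("He", "he")

# Longest alternatives first, so "Sachin Tendulkar" wins over "Sachin".
_alternatives = sorted(list(ALIASES) + list(PRONOUNS), key=len, reverse=True)
MENTION_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, _alternatives)) + r")\b"
)


def resolve_mentions(text):
    """Replace aliases and pronouns with the canonical entity name."""
    last_entity = None

    def replace(match):
        nonlocal last_entity
        mention = match.group(0)
        if mention in ALIASES:
            last_entity = ALIASES[mention]
            return last_entity
        # Pronoun: use the most recent entity, if any has been seen.
        return last_entity if last_entity else mention

    return MENTION_RE.sub(replace, text)


print(resolve_mentions("Sachin was born in Bombay. He is a ..."))
# prints: Sachin Tendulkar was born in Bombay. Sachin Tendulkar is a ...
```

When two candidate entities precede a pronoun, this heuristic always picks the later one, so it cannot settle genuinely ambiguous sentences.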
Extraction Methods
• Natural Language Processing: rule based.
  - Based on sentence structure
  - E.g., for the English language, a rule can be “noun-verb-noun”
• Machine Learning: supervised and unsupervised learning.
  - Features are detected from the training data
  - E.g., to extract instances of some medical diseases, the system is trained over all the symptoms of each given disease.
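A “noun-verb-noun” rule can be sketched over part-of-speech tagged tokens. The sketch below makes two assumptions that are not in the slides: consecutive tokens of the same coarse class are merged into one chunk, and a preposition (`IN`) directly attached to a verb is treated as part of the verb chunk so that "was born in" stays together.

```python
from itertools import groupby


def coarse(tag):
    """Map a Penn-style tag to a coarse class: N(oun), V(erb) or O(ther)."""
    if tag.startswith("NN"):
        return "N"
    if tag.startswith("VB") or tag == "IN":
        return "V"
    return "O"


def chunk(tagged):
    """Merge runs of same-class tokens into (class, text, tags) chunks."""
    chunks = []
    for cls, group in groupby(tagged, key=lambda wt: coarse(wt[1])):
        group = list(group)
        words = " ".join(w for w, _ in group)
        chunks.append((cls, words, [t for _, t in group]))
    return chunks


def noun_verb_noun(tagged):
    """Return (noun, verb phrase, noun) triples matching the rule."""
    chunks = chunk(tagged)
    triples = []
    for a, b, c in zip(chunks, chunks[1:], chunks[2:]):
        has_verb = any(t.startswith("VB") for t in b[2])
        if (a[0], b[0], c[0]) == ("N", "V", "N") and has_verb:
            triples.append((a[1], b[1], c[1]))
    return triples


tagged = [("Sachin", "NNP"), ("Tendulkar", "NNP"), ("was", "VBD"),
          ("born", "VBN"), ("in", "IN"), ("Bombay", "NNP")]
print(noun_verb_noun(tagged))
# prints: [('Sachin Tendulkar', 'was born in', 'Bombay')]
```

A rule like this depends entirely on the tags it is given, so any tagging mistake upstream turns directly into a wrong or missing relation.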
Extraction Methods (contd.)
• Other methods: vocabulary-based systems, context-based clustering.
  - Maintaining a mapping file of all countries and their nationalities helps to determine the nationality of a person when their birthplace is known.
• Hybrid:
  - NLP-based libraries pre-process the input data, then a machine learning approach extracts the relations using an external vocabulary such as WordNet.
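The mapping-file idea can be sketched with two lookup tables. Both tables below are hypothetical excerpts written for illustration; the city-to-country table is an added assumption, needed because a birthplace is usually given as a city rather than a country.

```python
# Hypothetical excerpts of the mapping files.
CITY_TO_COUNTRY = {
    "Bombay": "India",
    "Hyderabad": "India",
    "Melbourne": "Australia",
}
COUNTRY_TO_NATIONALITY = {
    "India": "Indian",
    "Australia": "Australian",
}


def infer_nationality(birth_place):
    """Guess a nationality from a birthplace, or return None if unknown."""
    # Accept either a known city or a country name directly.
    country = CITY_TO_COUNTRY.get(birth_place, birth_place)
    return COUNTRY_TO_NATIONALITY.get(country)


print(infer_nationality("Bombay"))    # prints: Indian
print(infer_nationality("Atlantis"))  # prints: None
```

The result is only a heuristic, since a birthplace does not always determine nationality, and the tables have to be maintained by hand.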
Output generation
Types of output systems
1. Identifying all mentions of named entities and their relations.
   • E.g., from a given corpus, extract all named entity relations.
2. Identifying missing relations of a database.
   • E.g., given a database, extract the missing attributes of given entities from the corpus.
3. Linking various entities within a database.
   • E.g., given a database, link two entities together with some relation extracted from the corpus.
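The second type of output system can be sketched as a merge between an existing database and the triples extracted from a corpus. Everything here is illustrative: the database is a plain dictionary, `None` marks a missing attribute, and `RELATION_TO_ATTRIBUTE` is a hypothetical table mapping relation phrases to database column names.

```python
# Hypothetical mapping from extracted relation phrases to attribute names.
RELATION_TO_ATTRIBUTE = {"was born in": "birth_place"}


def fill_missing(database, triples):
    """Return a copy of `database` with missing attributes filled in.

    Only attributes that already exist with value None are filled;
    known values are never overwritten.
    """
    filled = {entity: dict(attrs) for entity, attrs in database.items()}
    for subject, relation, obj in triples:
        attribute = RELATION_TO_ATTRIBUTE.get(relation)
        attrs = filled.get(subject)
        if attrs is None or attribute not in attrs:
            continue
        if attrs[attribute] is None:
            attrs[attribute] = obj
    return filled


database = {"Sachin Tendulkar": {"type": "PERSON", "birth_place": None}}
triples = [("Sachin Tendulkar", "was born in", "Bombay")]
print(fill_missing(database, triples))
```

Refusing to overwrite known values is a design choice for this sketch: it keeps noisy extractions from corrupting facts the database already holds.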
Issues with automated extraction
Accuracy, running time, dependency
Issue 1: Challenges of language structure
• Co-reference resolution
• Ambiguous, complex sentences
• Abbreviations
• Acronyms
See an example…
• “Tom called his father last night. They talked for an hour. He said he would be home the next day.”
What is 'He' referring to? Tom or his father?
“You see sir, I can talk English, I can walk English, I
can laugh English, I can run English, because
English is such a funny language.”
Amitabh in Namak Halal
Issue 2: Accuracy
• Named entity detection: 90%; relationship extraction: 50-70%.
• Introduction of noise at each step.
  - E.g., disambiguating the word 'crane' with WordNet introduces contextual errors, which then decrease the accuracy of rule-based relationship extraction.
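A rough way to see how noise accumulates is to multiply the accuracies of the steps a result depends on. This is only a back-of-the-envelope sketch: it assumes that errors in different steps are independent, which the slides do not claim, and it uses only the 90% entity detection figure quoted above.

```python
def chance_all_correct(step_accuracies):
    """Probability that every step succeeds, assuming independent errors."""
    probability = 1.0
    for accuracy in step_accuracies:
        probability *= accuracy
    return probability


# A relation needs both of its entities to be detected correctly.
both_entities = chance_all_correct([0.90, 0.90])
print(f"{both_entities:.2f}")  # prints: 0.81
```

Under that assumption, a relation already starts from about an 81% chance of having both entities right, before any error from finding the connecting words is added.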
Issue 3: Efficiency
• Feature detection steps are expensive.
• Computation can require days.
Issue 4: Dependency
• On external vocabulary sources, like Wikipedia, WordNet, MindNet, etc.
• Maintenance and updating of vocabulary sources is manual: costly and requires expertise.
• Limited size produces context-based noise.
• Domain-dependent: medical domain
• Corpus-dependent: Wikipedia, news corpus
• Relation-specific: Date and Place-of-event
Issue 5: Problem with implicit knowledge extraction
• Community knowledge is learned and shared.
• No one can be an expert.
• Cultural competence and perception of workers are fed into a system as variables.
Cultural Consensus Theory provides models to include such variables in the system.
Can we do better?
Can we seek human intelligence to improve
the accuracy of automated techniques?
References
[1] I. Tuomi. Data is more than knowledge: implications of the reversed knowledge hierarchy for knowledge management and organizational memory. J. Manage. Inf. Syst., 16(3):103–117, Dec. 1999.
[2] S. Sekine. Named Entity: History and Future. 2004.
[3] S. Sarawagi. Information extraction. Found. Trends Databases, 1(3):261–377, Mar. 2008.
[4] S. C. Weller. Cultural consensus theory: Applications and frequently asked questions. Field Methods, 19(4):339–368, 2007.
References (contd.)
[5] Z. Syed, E. Viegas, and S. Parastatidis. Automatic discovery of semantic relations using MindNet. LREC, 2010.
[6] G. A. Miller, R. Beckwith, C. Fellbaum, D. Gross, and K. Miller. WordNet: An on-line lexical database. International Journal of Lexicography, 3:235–244, 1990.
[7] T. S. Jayram, R. Krishnamurthy, S. Raghavan, S. Vaithyanathan, and H. Zhu. Avatar information extraction system. IEEE Data Eng. Bull., pages 40–48, 2006.
[8] E. Greengrass. Information retrieval: A survey, 2000.
Thank you
    Questions?
Knowledge acquisition using automated techniques

More Related Content

What's hot

Ontology and Ontology Libraries: a critical study
Ontology and Ontology Libraries: a critical studyOntology and Ontology Libraries: a critical study
Ontology and Ontology Libraries: a critical studyDebashisnaskar
 
Conceptual foundations of text mining and preprocessing steps nfaoui el_habib
Conceptual foundations of text mining and preprocessing steps nfaoui el_habibConceptual foundations of text mining and preprocessing steps nfaoui el_habib
Conceptual foundations of text mining and preprocessing steps nfaoui el_habibEl Habib NFAOUI
 
OpinionMiner: A Novel Machine Learning System for Web Opinion Mining and Extr...
OpinionMiner: A Novel Machine Learning System for Web Opinion Mining and Extr...OpinionMiner: A Novel Machine Learning System for Web Opinion Mining and Extr...
OpinionMiner: A Novel Machine Learning System for Web Opinion Mining and Extr...Content Savvy
 
The Process of Information extraction through Natural Language Processing
The Process of Information extraction through Natural Language ProcessingThe Process of Information extraction through Natural Language Processing
The Process of Information extraction through Natural Language ProcessingWaqas Tariq
 
Ontology and Ontology Libraries: a Critical Study
Ontology and Ontology Libraries: a Critical StudyOntology and Ontology Libraries: a Critical Study
Ontology and Ontology Libraries: a Critical StudyDebashisnaskar
 
Leveraging Semantic Parsing for Relation Linking over Knowledge Bases
Leveraging Semantic Parsing for Relation Linking over Knowledge BasesLeveraging Semantic Parsing for Relation Linking over Knowledge Bases
Leveraging Semantic Parsing for Relation Linking over Knowledge BasesNandana Mihindukulasooriya
 
Adaptive information extraction
Adaptive information extractionAdaptive information extraction
Adaptive information extractionunyil96
 
Csci 6530 2016 fall presentation
Csci 6530 2016 fall presentationCsci 6530 2016 fall presentation
Csci 6530 2016 fall presentationciakov
 
Usability Report - Discovery Tools
Usability Report - Discovery ToolsUsability Report - Discovery Tools
Usability Report - Discovery ToolsNikki Kerber
 
Chap 1 general introduction of information retrieval
Chap 1  general introduction of information retrievalChap 1  general introduction of information retrieval
Chap 1 general introduction of information retrievalMalobe Lottin Cyrille Marcel
 
Domain Specific Named Entity Recognition Using Supervised Approach
Domain Specific Named Entity Recognition Using Supervised ApproachDomain Specific Named Entity Recognition Using Supervised Approach
Domain Specific Named Entity Recognition Using Supervised ApproachWaqas Tariq
 
Word Format.doc
Word Format.docWord Format.doc
Word Format.docbutest
 
Indexing Automated Vs Automatic Galvan1
Indexing Automated Vs Automatic   Galvan1Indexing Automated Vs Automatic   Galvan1
Indexing Automated Vs Automatic Galvan1CorinaF
 
An analysis on Filter for Spam Mail
An analysis on Filter for Spam MailAn analysis on Filter for Spam Mail
An analysis on Filter for Spam MailAM Publications
 
Phrase Structure Identification and Classification of Sentences using Deep Le...
Phrase Structure Identification and Classification of Sentences using Deep Le...Phrase Structure Identification and Classification of Sentences using Deep Le...
Phrase Structure Identification and Classification of Sentences using Deep Le...ijtsrd
 

What's hot (20)

Ontologies
OntologiesOntologies
Ontologies
 
Ontology and Ontology Libraries: a critical study
Ontology and Ontology Libraries: a critical studyOntology and Ontology Libraries: a critical study
Ontology and Ontology Libraries: a critical study
 
call for papers, research paper publishing, where to publish research paper, ...
call for papers, research paper publishing, where to publish research paper, ...call for papers, research paper publishing, where to publish research paper, ...
call for papers, research paper publishing, where to publish research paper, ...
 
The basics of ontologies
The basics of ontologiesThe basics of ontologies
The basics of ontologies
 
Conceptual foundations of text mining and preprocessing steps nfaoui el_habib
Conceptual foundations of text mining and preprocessing steps nfaoui el_habibConceptual foundations of text mining and preprocessing steps nfaoui el_habib
Conceptual foundations of text mining and preprocessing steps nfaoui el_habib
 
OpinionMiner: A Novel Machine Learning System for Web Opinion Mining and Extr...
OpinionMiner: A Novel Machine Learning System for Web Opinion Mining and Extr...OpinionMiner: A Novel Machine Learning System for Web Opinion Mining and Extr...
OpinionMiner: A Novel Machine Learning System for Web Opinion Mining and Extr...
 
The Process of Information extraction through Natural Language Processing
The Process of Information extraction through Natural Language ProcessingThe Process of Information extraction through Natural Language Processing
The Process of Information extraction through Natural Language Processing
 
Ontology and Ontology Libraries: a Critical Study
Ontology and Ontology Libraries: a Critical StudyOntology and Ontology Libraries: a Critical Study
Ontology and Ontology Libraries: a Critical Study
 
Leveraging Semantic Parsing for Relation Linking over Knowledge Bases
Leveraging Semantic Parsing for Relation Linking over Knowledge BasesLeveraging Semantic Parsing for Relation Linking over Knowledge Bases
Leveraging Semantic Parsing for Relation Linking over Knowledge Bases
 
ISWC 2020 - Semantic Answer Type Prediction
ISWC 2020 - Semantic Answer Type PredictionISWC 2020 - Semantic Answer Type Prediction
ISWC 2020 - Semantic Answer Type Prediction
 
Adaptive information extraction
Adaptive information extractionAdaptive information extraction
Adaptive information extraction
 
Csci 6530 2016 fall presentation
Csci 6530 2016 fall presentationCsci 6530 2016 fall presentation
Csci 6530 2016 fall presentation
 
Usability Report - Discovery Tools
Usability Report - Discovery ToolsUsability Report - Discovery Tools
Usability Report - Discovery Tools
 
Chap 1 general introduction of information retrieval
Chap 1  general introduction of information retrievalChap 1  general introduction of information retrieval
Chap 1 general introduction of information retrieval
 
Domain Specific Named Entity Recognition Using Supervised Approach
Domain Specific Named Entity Recognition Using Supervised ApproachDomain Specific Named Entity Recognition Using Supervised Approach
Domain Specific Named Entity Recognition Using Supervised Approach
 
Word Format.doc
Word Format.docWord Format.doc
Word Format.doc
 
Extracting Semantic
Extracting Semantic Extracting Semantic
Extracting Semantic
 
Indexing Automated Vs Automatic Galvan1
Indexing Automated Vs Automatic   Galvan1Indexing Automated Vs Automatic   Galvan1
Indexing Automated Vs Automatic Galvan1
 
An analysis on Filter for Spam Mail
An analysis on Filter for Spam MailAn analysis on Filter for Spam Mail
An analysis on Filter for Spam Mail
 
Phrase Structure Identification and Classification of Sentences using Deep Le...
Phrase Structure Identification and Classification of Sentences using Deep Le...Phrase Structure Identification and Classification of Sentences using Deep Le...
Phrase Structure Identification and Classification of Sentences using Deep Le...
 

Viewers also liked

Knowledge acquisition group capabilities 2014 q1 (concise)
Knowledge acquisition group capabilities 2014 q1 (concise)Knowledge acquisition group capabilities 2014 q1 (concise)
Knowledge acquisition group capabilities 2014 q1 (concise)AnnettaColeman
 
在中日系企業の強い味方 微博(ウェイボ)型社内SNS ”CFB”
在中日系企業の強い味方 微博(ウェイボ)型社内SNS ”CFB”在中日系企業の強い味方 微博(ウェイボ)型社内SNS ”CFB”
在中日系企業の強い味方 微博(ウェイボ)型社内SNS ”CFB”Takamitsu Nakao
 
中国のスマートフォン市場とソーシャルネットワーク市場
中国のスマートフォン市場とソーシャルネットワーク市場中国のスマートフォン市場とソーシャルネットワーク市場
中国のスマートフォン市場とソーシャルネットワーク市場Takamitsu Nakao
 
中国モバイル市場&ソーシャルメディア概要(2012年3月23日版)
中国モバイル市場&ソーシャルメディア概要(2012年3月23日版)中国モバイル市場&ソーシャルメディア概要(2012年3月23日版)
中国モバイル市場&ソーシャルメディア概要(2012年3月23日版)Takamitsu Nakao
 
CFBご利用・ご活用ガイド
CFBご利用・ご活用ガイドCFBご利用・ご活用ガイド
CFBご利用・ご活用ガイドTakamitsu Nakao
 
numéricos, embarcados e componentes básicos de um computador - UFS
numéricos, embarcados e componentes básicos de um computador - UFSnuméricos, embarcados e componentes básicos de um computador - UFS
numéricos, embarcados e componentes básicos de um computador - UFSwilkinson santana
 
Marketing-Methods,Marketing Automation & Marketing‘s Accountability
Marketing-Methods,Marketing Automation & Marketing‘s AccountabilityMarketing-Methods,Marketing Automation & Marketing‘s Accountability
Marketing-Methods,Marketing Automation & Marketing‘s AccountabilityStephanWo
 
Nor-shipping stand build 2011
Nor-shipping stand build 2011Nor-shipping stand build 2011
Nor-shipping stand build 2011Lloyd's Register
 
中国モバイル市場とソーシャルメディア市場(2013年1月版)
中国モバイル市場とソーシャルメディア市場(2013年1月版)中国モバイル市場とソーシャルメディア市場(2013年1月版)
中国モバイル市場とソーシャルメディア市場(2013年1月版)Takamitsu Nakao
 
Laporan observasi Kecerdasan Buatan
Laporan observasi Kecerdasan BuatanLaporan observasi Kecerdasan Buatan
Laporan observasi Kecerdasan BuatanAgung Moses C Satria
 
Richter video screenshots may
Richter video screenshots mayRichter video screenshots may
Richter video screenshots mayrazleesecurity
 
Bcs project of telenor
Bcs project of telenorBcs project of telenor
Bcs project of telenoraaaswad
 
Wellness at hand: Exploring interactive technology to support smokers
Wellness at hand: Exploring interactive technology to support smokersWellness at hand: Exploring interactive technology to support smokers
Wellness at hand: Exploring interactive technology to support smokersUniversity of Melbourne, Australia
 
中国ソーシャルメディア その実態と動向(2012年8月版)
中国ソーシャルメディア   その実態と動向(2012年8月版)中国ソーシャルメディア   その実態と動向(2012年8月版)
中国ソーシャルメディア その実態と動向(2012年8月版)Takamitsu Nakao
 

Viewers also liked (20)

Knowledge acquisition group capabilities 2014 q1 (concise)
Knowledge acquisition group capabilities 2014 q1 (concise)Knowledge acquisition group capabilities 2014 q1 (concise)
Knowledge acquisition group capabilities 2014 q1 (concise)
 
在中日系企業の強い味方 微博(ウェイボ)型社内SNS ”CFB”
在中日系企業の強い味方 微博(ウェイボ)型社内SNS ”CFB”在中日系企業の強い味方 微博(ウェイボ)型社内SNS ”CFB”
在中日系企業の強い味方 微博(ウェイボ)型社内SNS ”CFB”
 
中国のスマートフォン市場とソーシャルネットワーク市場
中国のスマートフォン市場とソーシャルネットワーク市場中国のスマートフォン市場とソーシャルネットワーク市場
中国のスマートフォン市場とソーシャルネットワーク市場
 
Nautical numbers
Nautical numbersNautical numbers
Nautical numbers
 
中国モバイル市場&ソーシャルメディア概要(2012年3月23日版)
中国モバイル市場&ソーシャルメディア概要(2012年3月23日版)中国モバイル市場&ソーシャルメディア概要(2012年3月23日版)
中国モバイル市場&ソーシャルメディア概要(2012年3月23日版)
 
CFBご利用・ご活用ガイド
CFBご利用・ご活用ガイドCFBご利用・ご活用ガイド
CFBご利用・ご活用ガイド
 
numéricos, embarcados e componentes básicos de um computador - UFS
numéricos, embarcados e componentes básicos de um computador - UFSnuméricos, embarcados e componentes básicos de um computador - UFS
numéricos, embarcados e componentes básicos de um computador - UFS
 
Marketing-Methods,Marketing Automation & Marketing‘s Accountability
Marketing-Methods,Marketing Automation & Marketing‘s AccountabilityMarketing-Methods,Marketing Automation & Marketing‘s Accountability
Marketing-Methods,Marketing Automation & Marketing‘s Accountability
 
Nor-shipping stand build 2011
Nor-shipping stand build 2011Nor-shipping stand build 2011
Nor-shipping stand build 2011
 
中国モバイル市場とソーシャルメディア市場(2013年1月版)
中国モバイル市場とソーシャルメディア市場(2013年1月版)中国モバイル市場とソーシャルメディア市場(2013年1月版)
中国モバイル市場とソーシャルメディア市場(2013年1月版)
 
中国Android事情
中国Android事情中国Android事情
中国Android事情
 
Laporan observasi Kecerdasan Buatan
Laporan observasi Kecerdasan BuatanLaporan observasi Kecerdasan Buatan
Laporan observasi Kecerdasan Buatan
 
Jhonier montoya
Jhonier montoyaJhonier montoya
Jhonier montoya
 
Solcellekursus 7 maj 2011_på_fc_jane_kruse
Solcellekursus  7 maj 2011_på_fc_jane_kruseSolcellekursus  7 maj 2011_på_fc_jane_kruse
Solcellekursus 7 maj 2011_på_fc_jane_kruse
 
Solcellekursus 7 maj 2011_på_fc_csm
Solcellekursus  7 maj 2011_på_fc_csmSolcellekursus  7 maj 2011_på_fc_csm
Solcellekursus 7 maj 2011_på_fc_csm
 
A broad overview of Tele-consultation through SocialNUI lens
A broad overview of Tele-consultation through SocialNUI lensA broad overview of Tele-consultation through SocialNUI lens
A broad overview of Tele-consultation through SocialNUI lens
 
Richter video screenshots may
Richter video screenshots mayRichter video screenshots may
Richter video screenshots may
 
Bcs project of telenor
Bcs project of telenorBcs project of telenor
Bcs project of telenor
 
Wellness at hand: Exploring interactive technology to support smokers
Wellness at hand: Exploring interactive technology to support smokersWellness at hand: Exploring interactive technology to support smokers
Wellness at hand: Exploring interactive technology to support smokers
 
中国ソーシャルメディア その実態と動向(2012年8月版)
中国ソーシャルメディア   その実態と動向(2012年8月版)中国ソーシャルメディア   その実態と動向(2012年8月版)
中国ソーシャルメディア その実態と動向(2012年8月版)
 

Similar to Knowledge acquisition using automated techniques

The Triplex Approach for Recognizing Semantic Relations from Noun Phrases, Ap...
The Triplex Approach for Recognizing Semantic Relations from Noun Phrases, Ap...The Triplex Approach for Recognizing Semantic Relations from Noun Phrases, Ap...
The Triplex Approach for Recognizing Semantic Relations from Noun Phrases, Ap...Iman Mirrezaei
 
Text Analytics for Semantic Computing
Text Analytics for Semantic ComputingText Analytics for Semantic Computing
Text Analytics for Semantic ComputingMeena Nagarajan
 
Adaptive named entity recognition for social network analysis and domain onto...
Adaptive named entity recognition for social network analysis and domain onto...Adaptive named entity recognition for social network analysis and domain onto...
Adaptive named entity recognition for social network analysis and domain onto...Cuong Tran Van
 
Search, Signals & Sense: An Analytics Fueled Vision
Search, Signals & Sense: An Analytics Fueled VisionSearch, Signals & Sense: An Analytics Fueled Vision
Search, Signals & Sense: An Analytics Fueled VisionSeth Grimes
 
Frame-Script and Predicate logic.pptx
Frame-Script and Predicate logic.pptxFrame-Script and Predicate logic.pptx
Frame-Script and Predicate logic.pptxnilesh405711
 
Literature Based Framework for Semantic Descriptions of e-Science resources
Literature Based Framework for Semantic Descriptions of e-Science resourcesLiterature Based Framework for Semantic Descriptions of e-Science resources
Literature Based Framework for Semantic Descriptions of e-Science resourcesHammad Afzal
 
Knowledge discovery thru data mining
Knowledge discovery thru data miningKnowledge discovery thru data mining
Knowledge discovery thru data miningDevakumar Jain
 
Text mining introduction-1
Text mining   introduction-1Text mining   introduction-1
Text mining introduction-1Sumit Sony
 
Entity linking with a knowledge base issues techniques and solutions
Entity linking with a knowledge base issues techniques and solutionsEntity linking with a knowledge base issues techniques and solutions
Entity linking with a knowledge base issues techniques and solutionsCloudTechnologies
 
Enhancing Semantic Mining
Enhancing Semantic MiningEnhancing Semantic Mining
Enhancing Semantic MiningSanthosh Kumar
 
2015 07-tuto2-clus type
2015 07-tuto2-clus type2015 07-tuto2-clus type
2015 07-tuto2-clus typejins0618
 
Web 3 Expert System
Web 3 Expert SystemWeb 3 Expert System
Web 3 Expert Systemguest4513a7
 
Web 3 Expert System
Web 3 Expert SystemWeb 3 Expert System
Web 3 Expert SystemMediabistro
 
Content Analysis Overview for Persona Development
Content Analysis Overview for Persona DevelopmentContent Analysis Overview for Persona Development
Content Analysis Overview for Persona DevelopmentPamela Rutledge
 
Named entity recognition using web document corpus
Named entity recognition using web document corpusNamed entity recognition using web document corpus
Named entity recognition using web document corpusIJMIT JOURNAL
 
Knowledge base system appl. p 3,4
Knowledge base system appl.  p 3,4Knowledge base system appl.  p 3,4
Knowledge base system appl. p 3,4Taymoor Nazmy
 
Named Entity Recognition Using Web Document Corpus
Named Entity Recognition Using Web Document CorpusNamed Entity Recognition Using Web Document Corpus
Named Entity Recognition Using Web Document CorpusIJMIT JOURNAL
 

Similar to Knowledge acquisition using automated techniques (20)

The Triplex Approach for Recognizing Semantic Relations from Noun Phrases, Ap...
The Triplex Approach for Recognizing Semantic Relations from Noun Phrases, Ap...The Triplex Approach for Recognizing Semantic Relations from Noun Phrases, Ap...
The Triplex Approach for Recognizing Semantic Relations from Noun Phrases, Ap...
 
Text Analytics for Semantic Computing
Text Analytics for Semantic ComputingText Analytics for Semantic Computing
Text Analytics for Semantic Computing
 
Adaptive named entity recognition for social network analysis and domain onto...
Adaptive named entity recognition for social network analysis and domain onto...Adaptive named entity recognition for social network analysis and domain onto...
Adaptive named entity recognition for social network analysis and domain onto...
 
Search, Signals & Sense: An Analytics Fueled Vision
Search, Signals & Sense: An Analytics Fueled VisionSearch, Signals & Sense: An Analytics Fueled Vision
Search, Signals & Sense: An Analytics Fueled Vision
 
Information Extraction
Information ExtractionInformation Extraction
Information Extraction
 
Frame-Script and Predicate logic.pptx
Frame-Script and Predicate logic.pptxFrame-Script and Predicate logic.pptx
Frame-Script and Predicate logic.pptx
 
Literature Based Framework for Semantic Descriptions of e-Science resources
Literature Based Framework for Semantic Descriptions of e-Science resourcesLiterature Based Framework for Semantic Descriptions of e-Science resources
Literature Based Framework for Semantic Descriptions of e-Science resources
 
Ontology learning
Ontology learningOntology learning
Ontology learning
 
Knowledge discovery thru data mining
Knowledge discovery thru data miningKnowledge discovery thru data mining
Knowledge discovery thru data mining
 
Text mining introduction-1
Text mining   introduction-1Text mining   introduction-1
Text mining introduction-1
 
Entity linking with a knowledge base issues techniques and solutions
Entity linking with a knowledge base issues techniques and solutionsEntity linking with a knowledge base issues techniques and solutions
Entity linking with a knowledge base issues techniques and solutions
 
Enhancing Semantic Mining
Enhancing Semantic MiningEnhancing Semantic Mining
Enhancing Semantic Mining
 
2015 07-tuto2-clus type
2015 07-tuto2-clus type2015 07-tuto2-clus type
2015 07-tuto2-clus type
 
Web 3 Expert System
Web 3 Expert SystemWeb 3 Expert System
Web 3 Expert System
 
Web 3 Expert System
Web 3 Expert SystemWeb 3 Expert System
Web 3 Expert System
 
ppt
pptppt
ppt
 
Content Analysis Overview for Persona Development
Content Analysis Overview for Persona DevelopmentContent Analysis Overview for Persona Development
Content Analysis Overview for Persona Development
 
Named entity recognition using web document corpus
Named entity recognition using web document corpusNamed entity recognition using web document corpus
Named entity recognition using web document corpus
 
Knowledge base system appl. p 3,4
Knowledge base system appl.  p 3,4Knowledge base system appl.  p 3,4
Knowledge base system appl. p 3,4
 
Named Entity Recognition Using Web Document Corpus
Named Entity Recognition Using Web Document CorpusNamed Entity Recognition Using Web Document Corpus
Named Entity Recognition Using Web Document Corpus
 

More from University of Melbourne, Australia

OzCHI 2020: Lessons Learnt from Designing a Smart Clothing Telehealth System ...
OzCHI 2020: Lessons Learnt from Designing a Smart Clothing Telehealth System ...OzCHI 2020: Lessons Learnt from Designing a Smart Clothing Telehealth System ...
OzCHI 2020: Lessons Learnt from Designing a Smart Clothing Telehealth System ...University of Melbourne, Australia
 
Supporting Bodily Communication in Video Consultations of Physiotherapy
Supporting Bodily Communication in Video Consultations of PhysiotherapySupporting Bodily Communication in Video Consultations of Physiotherapy
Supporting Bodily Communication in Video Consultations of PhysiotherapyUniversity of Melbourne, Australia
 
SoPhy: A wearable Technology for Lower Limb Assessment in Video Consultations...
SoPhy: A wearable Technology for Lower Limb Assessment in Video Consultations...SoPhy: A wearable Technology for Lower Limb Assessment in Video Consultations...
SoPhy: A wearable Technology for Lower Limb Assessment in Video Consultations...University of Melbourne, Australia
 
Supporting Bodily Communication in Video Consultations of Physiotherapy
Supporting Bodily Communication in Video Consultations of PhysiotherapySupporting Bodily Communication in Video Consultations of Physiotherapy
Knowledge acquisition using automated techniques

  • 1. Methods of Knowledge Extraction Deepti Aggarwal SIEL|SERL, IIIT-Hyderabad, India
  • 2. Agenda: introduction to the Web as a knowledge repository; automated extraction techniques (input sources, extracted structures, input pre-processing, extraction methods, output generation); issues with automated extraction.
  • 3. What is knowledge? A familiarity with someone or something, gained through experience. It includes facts, information, descriptions, and skills.
  • 4. Types of Knowledge. Explicit knowledge: always present explicitly in records; objective facts with a definite answer; e.g., Hyderabad is the capital of A.P. Implicit knowledge: not explicitly present for analysis; cultural beliefs with subjective judgments; e.g., Hyderabad is the best city in India to live in.
  • 5. How has knowledge been represented over time? From the public library to a global library.
  • 6. How is knowledge represented on the web? Millions of documents, blogs, forums, and social networks are scattered across the web, covering diverse topics in different formats, written by diverse people in diverse languages and from different points of view.
  • 7. Benefits of knowledge extraction over the Web. Explicit knowledge: question answering systems; search engines; validating knowledge; tracking a particular piece of information. Implicit knowledge: predicting markets, polls, etc.; community advertisements.
  • 8. Problems with knowledge acquisition over the web: abundance of data; relevance of information; personalized retrieval.
  • 9. Possible approaches: manual filtering; automated techniques; a combination of both.
  • 11. Working of automated extraction systems. Input sources feed the extraction system, which runs four stages in order (defining output structures, input pre-processing, extraction methods, output processing) and produces a database of all facts and relations.
  • 12. Input sources: types
  • 13. Input sources: web documents; news articles; blogs; social network activity (user profiles, posts, comments). Sentence-level parsing is required.
  • 14. Defining the structures of output: named entities and their relations
  • 15. Output structures: named entities; named entity relations.
  • 16. 1. Named Entity: Definition. A named entity is an atomic element in a body of text. Types: person, organization, location, etc. Different named entities, when linked together, form a relation.
  • 17. 1. Named Entity: An example. "Sachin Tendulkar was born in Bombay." Here "Sachin Tendulkar" is an NE of type 'Person' and "Bombay" is an NE of type 'Location'.
  • 18. 2. Named Entity Relationship: Structure. Subject – Relation – Object. Subject: NE of any type. Relation: verb, adjective, or adverb. Object: NE of any type.
  • 19. 2. Named Entity Relationship: An Example. Subject: Sachin Tendulkar. Relation: was born in. Object: Bombay.
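
The Subject – Relation – Object structure above can be held in a small record type. The following is a minimal sketch, not the representation used by any particular system; the class names `Entity` and `Relation` and the field `kind` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A named entity: its surface text plus a type such as 'Person'."""
    text: str
    kind: str


@dataclass(frozen=True)
class Relation:
    """A subject-relation-object triple linking two named entities."""
    subject: Entity
    relation: str
    obj: Entity

    def as_triple(self):
        # Flatten to plain strings, e.g. for storing as a database row.
        return (self.subject.text, self.relation, self.obj.text)


born = Relation(
    subject=Entity("Sachin Tendulkar", "Person"),
    relation="was born in",
    obj=Entity("Bombay", "Location"),
)
print(born.as_triple())
```

Making the records immutable (`frozen=True`) lets triples be placed in sets, which is convenient when the same fact is extracted from several sentences.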
  • 20. Co-referencing: "Sachin was born in Bombay. He is a ..." Mentions such as Sachin Tendulkar, Mr. Tendulkar, and Master Blaster, as well as the pronoun "He", all refer to the same entity.
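
As a rough illustration of co-referencing, the sketch below maps each mention to one canonical entity. It assumes a hand-written alias table (`ALIASES`, hypothetical) and a deliberately naive rule that a pronoun refers to the most recently mentioned entity; real co-reference resolution tools use far richer evidence.

```python
# Hypothetical alias table: lower-cased mention -> canonical entity name.
ALIASES = {
    "sachin": "Sachin Tendulkar",
    "sachin tendulkar": "Sachin Tendulkar",
    "mr. tendulkar": "Sachin Tendulkar",
    "master blaster": "Sachin Tendulkar",
}
PRONOUNS = {"he", "she", "him", "her"}


def resolve_mentions(mentions):
    """Replace aliases and pronouns with the canonical entity they refer to."""
    resolved = []
    last_entity = None
    for mention in mentions:
        key = mention.lower()
        if key in ALIASES:
            last_entity = ALIASES[key]
            resolved.append(last_entity)
        elif key in PRONOUNS and last_entity is not None:
            # Naive assumption: a pronoun points at the latest known entity.
            resolved.append(last_entity)
        else:
            # Unknown mention: leave it unchanged.
            resolved.append(mention)
    return resolved


print(resolve_mentions(["Sachin", "He", "Mr. Tendulkar", "Master Blaster"]))
```

The "most recent entity" rule breaks as soon as two candidate entities appear close together, which is why pronoun resolution is usually treated as a hard problem in its own right.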
  • 21. Input pre-processing: libraries
  • 22. NLP libraries: split each sentence into tokens, words, and digits using a Sentence Tokenizer; recognize language constructs (nouns, verbs, pronouns) using a Part-of-speech Tagger. Example: Sachin/NNP Tendulkar/NNP was/VBD born/VBN in/IN Bombay/NNP
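
To make the word/TAG notation concrete, here is a small sketch that reads a tagged sentence back into (word, tag) pairs. It only parses the output format shown on the slide; it does not perform tagging itself, and the helper name `parse_tagged` is an assumption made for illustration.

```python
def parse_tagged(tagged):
    """Split 'word/TAG word/TAG ...' into a list of (word, tag) pairs."""
    pairs = []
    for item in tagged.split():
        # rpartition splits on the last slash, so words containing '/' survive.
        word, _, tag = item.rpartition("/")
        pairs.append((word, tag))
    return pairs


tagged = "Sachin/NNP Tendulkar/NNP was/VBD born/VBN in/IN Bombay/NNP"
tokens = parse_tagged(tagged)

# NNP marks a singular proper noun, a useful first hint for named entities.
proper_nouns = [word for word, tag in tokens if tag == "NNP"]
print(proper_nouns)
```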
  • 23. NLP libraries (contd.): link the individual constituents of a sentence into a parse tree with a Parser; identify the types of named entities using a Named Entity Recognizer. Example: Sachin Tendulkar/PERSON was born in Bombay/LOCATION
  • 24. NLP libraries (contd.): identify all co-references and replace them with the actual entity using a Co-reference Resolution tool; identify the specific meaning of a word using Word Sense Disambiguation. External vocabularies: MindNet, DBpedia, WordNet. E.g., contextual meaning of 'crane': as a noun, a bird; as a verb, to lift or move.
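
The 'crane' example can be sketched as a lookup keyed by part of speech. The tiny `SENSES` table below is a hypothetical stand-in for an external vocabulary such as WordNet and lists only the two senses named on the slide; a real sense inventory holds many more, and choosing among them needs more context than the part of speech alone.

```python
# Hypothetical sense inventory: word -> part of speech -> meaning.
SENSES = {
    "crane": {"noun": "bird", "verb": "lift/move"},
}


def disambiguate(word, pos):
    """Return the sense of `word` for the given part of speech, or None."""
    return SENSES.get(word.lower(), {}).get(pos)


print(disambiguate("crane", "noun"))
print(disambiguate("Crane", "verb"))
```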
  • 26. Extracting relationships among NEs: Standard process. 1. Identify named entities within a sentence. 2. Find the verb or adjective that connects the identified named entities. 3. Connect them together to form a relation.
  • 27. Extracting relationships among NEs: Required process. 1. Identify part-of-speech constructs: noun, verb, adjective, etc. 2. Determine co-references, acronyms, and abbreviations. 3. Connect them together to form a relationship.
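
The three steps of the standard process can be sketched as follows. This is a simplified illustration, not a description of any specific system: it assumes the input has already been tagged, with each token given as a hypothetical (word, part-of-speech tag, entity label) triple where the label "O" means "not part of an entity".

```python
def extract_relations(tokens):
    """tokens: list of (word, pos_tag, ne_label); ne_label is 'O' outside entities."""
    # Step 1: group consecutive tokens sharing a non-'O' label into entities.
    entities = []  # (start index, end index exclusive, text, label)
    i = 0
    while i < len(tokens):
        label = tokens[i][2]
        if label == "O":
            i += 1
            continue
        j = i
        while j < len(tokens) and tokens[j][2] == label:
            j += 1
        text = " ".join(word for word, _, _ in tokens[i:j])
        entities.append((i, j, text, label))
        i = j

    relations = []
    for left, right in zip(entities, entities[1:]):
        between = tokens[left[1]:right[0]]
        # Step 2: require a verb (VB*) or adjective (JJ*) between the entities.
        if any(pos.startswith(("VB", "JJ")) for _, pos, _ in between):
            phrase = " ".join(word for word, _, _ in between)
            # Step 3: connect the two entities through that phrase.
            relations.append((left[2], phrase, right[2]))
    return relations


sentence = [
    ("Sachin", "NNP", "PERSON"), ("Tendulkar", "NNP", "PERSON"),
    ("was", "VBD", "O"), ("born", "VBN", "O"), ("in", "IN", "O"),
    ("Bombay", "NNP", "LOCATION"),
]
print(extract_relations(sentence))
```

The sketch only pairs neighbouring entities and keeps the whole text between them as the relation phrase, so long or nested sentences would need a parse tree rather than this flat scan.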
  • 28. Extraction Methods. Natural Language Processing: rule-based; based on sentence structure; e.g., for the English language, a rule can be "noun-verb-noun". Machine Learning: supervised and unsupervised learning; features are detected from the training data; e.g., to extract instances of some medical diseases, the system is trained over all the symptoms of each given disease.
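
A rule such as "noun-verb-noun" can be checked against a part-of-speech tagged sentence. The sketch below is one possible reading of that rule, under the assumption that Penn Treebank style tags are used (NN* for nouns, VB* for verbs), that runs of the same class are merged, and that all other tags are ignored; the function names are hypothetical.

```python
def coarse_class(tag):
    """Map a fine-grained tag to 'noun', 'verb', or 'other'."""
    if tag.startswith("NN"):
        return "noun"
    if tag.startswith("VB"):
        return "verb"
    return "other"


def matches_noun_verb_noun(tagged):
    """True if the sentence reduces to the shape noun, verb, noun."""
    shape = []
    for _, tag in tagged:
        cls = coarse_class(tag)
        # Skip other tags and merge runs such as NNP NNP into a single noun.
        if cls != "other" and (not shape or shape[-1] != cls):
            shape.append(cls)
    return shape == ["noun", "verb", "noun"]


tagged_sentence = [
    ("Sachin", "NNP"), ("Tendulkar", "NNP"), ("was", "VBD"),
    ("born", "VBN"), ("in", "IN"), ("Bombay", "NNP"),
]
print(matches_noun_verb_noun(tagged_sentence))
```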
  • 29. Extraction Methods (contd.). Other methods: vocabulary-based systems, context-based clustering; e.g., maintaining a mapping file of all countries and their nationalities helps to determine the nationality of a person when their birth place is known. Hybrid: NLP-based libraries pre-process the input data, and a machine learning approach then extracts the relations, using an external vocabulary such as WordNet.
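
The mapping-file idea can be sketched with two small lookup tables. Both tables below are hypothetical, hand-filled samples; the city-to-country table is an extra assumption added here so that a birth place given as a city can be mapped to a country first. A birth place is only a hint, so the result should be treated as a guess rather than a fact.

```python
# Hypothetical sample of a country -> nationality mapping file.
NATIONALITY_BY_COUNTRY = {"India": "Indian", "Australia": "Australian"}
# Assumed helper table so that cities can be resolved to countries.
COUNTRY_BY_CITY = {"Bombay": "India", "Melbourne": "Australia"}


def infer_nationality(birth_place):
    """Guess a nationality from a birth place; None if it is not in the tables."""
    # Treat the input as a city first, falling back to reading it as a country.
    country = COUNTRY_BY_CITY.get(birth_place, birth_place)
    return NATIONALITY_BY_COUNTRY.get(country)


print(infer_nationality("Bombay"))
```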
  • 31. Types of output systems. 1. Identify all mentions of named entities and their relations. E.g., from a given corpus, extract all named entity relations. 2. Identify missing relations of a database. E.g., given a database, extract the missing attributes of given entities from the corpus. 3. Link various entities within a database. E.g., given a database, link two entities together with some relation extracted from the corpus.
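
The second type of output system, filling in missing attributes of a database, might look like the sketch below. The database layout (a dict of entity name to attributes), the `attribute_for` table mapping relation phrases to attribute names, and the sample data are all assumptions made for illustration.

```python
def fill_missing(database, triples, attribute_for):
    """Return a copy of `database` with missing attributes filled from triples."""
    updated = {name: dict(attrs) for name, attrs in database.items()}
    for subject, relation, obj in triples:
        attribute = attribute_for.get(relation)
        if attribute is None or subject not in updated:
            continue  # unknown relation phrase or unknown entity
        if updated[subject].get(attribute) is None:
            # Only fill gaps; values already in the database are kept.
            updated[subject][attribute] = obj
    return updated


database = {"Sachin Tendulkar": {"type": "Person", "birth_place": None}}
triples = [("Sachin Tendulkar", "was born in", "Bombay")]
attribute_for = {"was born in": "birth_place"}

print(fill_missing(database, triples, attribute_for))
```

Returning a copy instead of editing the database in place keeps the extracted values separate until they have been reviewed, which matters given how noisy automated extraction can be.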
  • 32. Working of automated extraction systems. Input sources feed the extraction system, which runs four stages in order (defining output structures, input pre-processing, extraction methods, output processing) and produces a database of all facts and relations.
  • 33. Issues with automated extraction: accuracy, running time, dependency
  • 34. Issue 1: Challenges of language structure: co-reference resolution; ambiguous, complex sentences; abbreviations; acronyms.
  • 35. See an example… "Tom called his father last night. They talked for an hour. He said he would be home the next day." What is 'He' referring to: Tom or his father?
  • 36. “You see sir, I can talk English, I can walk English, I can laugh English, I can run English, because English is such a funny language.” Amitabh in Namak Halal
  • 37. Issue 2: Accuracy. Named entity detection: 90%; relationship extraction: 50-70%. Noise is introduced at each step. E.g., disambiguating the word 'crane' with WordNet introduces contextual errors, which then decrease the accuracy of rule-based relationship extraction.
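
One way to see why noise at each step matters is to multiply per-stage accuracies. The sketch below is purely illustrative: it assumes stage errors are independent, and apart from the 90% figure for named entity detection the stage accuracies are hypothetical values, not measurements.

```python
import math


def pipeline_accuracy(stage_accuracies):
    """Overall accuracy if every stage must succeed and errors are independent."""
    return math.prod(stage_accuracies)


stages = {
    "named entity detection": 0.90,     # figure quoted on the slide
    "co-reference resolution": 0.90,    # hypothetical
    "word sense disambiguation": 0.90,  # hypothetical
}
overall = pipeline_accuracy(stages.values())
print(f"{overall:.3f}")  # prints 0.729
```

Under these assumptions, three stages that are each right nine times out of ten leave the pipeline right about 73% of the time before relation extraction has even started.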
  • 38. Issue 3: Efficiency. Feature detection steps are expensive and can require days of computation.
  • 39. Issue 4: Dependency on external vocabulary sources, like Wikipedia, WordNet, MindNet, etc. Maintaining and updating vocabulary sources is manual: it is costly and requires expertise. Their limited size produces context-based noise. Systems are domain-dependent (e.g., medical domain), corpus-dependent (e.g., Wikipedia, news corpus), and relation-specific (e.g., date and place of event).
  • 40. Issue 5: Problem with implicit knowledge extraction. Community knowledge is learned and shared. No one can be an expert. The cultural competence and perception of workers are fed into a system as variables. Cultural Consensus Theory provides models to include such variables in the system.
  • 41. Can we do better? Can we seek human intelligence to improve the accuracy of automated techniques?
  • 42. References [1] I. Tuomi. Data is more than knowledge: implications of the reversed knowledge hierarchy for knowledge management and organizational memory. J. Manage. Inf. Syst., 16(3):103–117, Dec. 1999. [2] S. Sekine. Named Entity: History and Future. 2004. [3] S. Sarawagi. Information extraction. Found. Trends Databases, 1(3):261–377, Mar. 2008. [4] S. C. Weller. Cultural consensus theory: Applications and frequently asked questions. Field Methods, 19(4):339–368, 2007.
  • 43. References (contd.) [5] Z. Syed, E. Viegas, and S. Parastatidis. Automatic discovery of semantic relations using MindNet. LREC, 2010. [6] G. A. Miller, R. Beckwith, C. Fellbaum, D. Gross, and K. Miller. WordNet: An on-line lexical database. International Journal of Lexicography, 3:235–244, 1990. [7] T. S. Jayram, R. Krishnamurthy, S. Raghavan, S. Vaithyanathan, and H. Zhu. Avatar information extraction system. IEEE Data Eng. Bull., pages 40–48, 2006. [8] E. Greengrass. Information retrieval: A survey, 2000.
  • 44. Thank you. Questions?

Editor's Notes

  1. The definition of knowledge is a matter of ongoing debate among philosophers, but for this talk I have taken this definition from Wikipedia.
  2. Predicting market: e.g., predicting whether people like Lux soap or not. Community advertisements: e.g., advertising a concert in Bengali to the Bengali community in Hyderabad.
  3. Scarcity is not the issue; abundance is! It is easy for humans to understand the meaning lying in different documents, but it becomes difficult for a user to find a document of interest.
  4. Manual filtering: too much labour, time-consuming, and prone to bias. Automated techniques: for huge data, an intelligent way is to formulate an algorithm that performs the repetitive computation with systems instead of manual labour; this is less time-consuming, and it is what I will talk about in this presentation. Combination of both: I consider it to be more appropriate. It combines the advantages of both systems and humans: scalability and accuracy with systems, and intelligence with humans. In my thesis, I have opted for this approach. I am not talking about this approach today; I will cover it in a later presentation.
  5. Systems that are built over some algorithms: the use of methods for controlling industrial processes automatically, especially by electronically controlled systems, often reducing manpower.
  6. Broad overview of how the system works. In my view, these are the five main components.
  7. Broad overview of how the system works. In my view, these are the five main components.
  8. The type of extraction method depends on the application. A highly sophisticated system can achieve a maximum of 70% accuracy. The accuracy of automated techniques cannot surpass human intelligence.