Set Expansion
Meera Parmar (201305529)
  meera.parmar@students.iiit.ac.in
Nishith Maheshwari (201002016)
  nishith.maheshwari@students.iiit.ac.in
Vandan Mujadia (201323602)
  vandan.mujadia@research.iiit.ac.in
Venessa Tauro (201101032)
  venessaroshni.tauro@students.iiit.ac.in
Set Expansion: What is it?
❖ Set expansion automatically expands a given set of seed entities
into a more complete set.
For example:
Input set:
• {Sachin Tendulkar, Dhoni, Rahul Dravid}
Expanded set:
• {Amit Bhandari, Syed Abid Ali, Parthiv Patel, Murali Kartik, …}
Tools used
❖ Stanford POS (part-of-speech) tagger
• used to eliminate non-nominal entities from the parsed
list.
❖ Stanford NER (named entity recognizer)
• used in ranking to recognize proper names, so that
entities can be put in order of relevance.
Approach
(parsing and index creation)
A corpus-based approach (Wikipedia dataset):
• parse ‘List of’ pages to get entity lists.
• parse entity lists based on the ‘category’ given on each wiki
page.
• parse entity lists from ‘Infobox’, ‘Taxobox’, ‘Geobox’,
etc.
• parse entity lists from wiki page contents.
parsing and indexing
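The category-based parsing step can be illustrated with a minimal Python sketch. It is not the project's actual code: the function name `build_category_index`, the `(title, wikitext)` input format, and the lowercasing of names are assumptions made for illustration. The only fact it relies on is that Wikipedia wikitext marks category membership as `[[Category:Name]]` or `[[Category:Name|sort key]]`.

```python
import re
from collections import defaultdict

# Wikitext marks category membership as [[Category:Name]] or
# [[Category:Name|sort key]]; capture only the category name.
CATEGORY_RE = re.compile(r"\[\[Category:([^\]|]+)")

def build_category_index(pages):
    """Map each category name to the set of page titles that declare it.

    `pages` is an iterable of (title, wikitext) pairs. Reading them from a
    Wikipedia dump is outside the scope of this sketch (hypothetical input).
    """
    index = defaultdict(set)
    for title, wikitext in pages:
        for category in CATEGORY_RE.findall(wikitext):
            # Lowercasing is an assumed normalisation, not taken from the slides.
            index[category.strip().lower()].add(title.strip().lower())
    return index

# Tiny made-up sample standing in for parsed dump pages.
sample_pages = [
    ("Sachin Tendulkar", "... [[Category:Indian cricketers]] [[Category:Living people]]"),
    ("Rahul Dravid", "... [[Category:Indian cricketers|Dravid, Rahul]]"),
]
category_index = build_category_index(sample_pages)
```

The same category-to-entities shape could hold the lists extracted from ‘List of’ pages or infoboxes, with only the extraction step changing.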
Approach
(ranking categories and search)
Ranking of categories
❖ entities are ranked by tf-idf score
❖ entities are ranked by word-vector distance score
Search
❖ First, search the ‘category list’ index.
❖ If no list is found, then search the ‘list of pages
list’ index.
Searching and Ranking
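The search-then-fallback flow can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the slides do not define the tf-idf or word-vector scores, so a simple overlap weight (the share of a list made up of seeds) stands in for them here. The names `matching_lists` and `expand`, and the two dictionary indexes, are hypothetical.

```python
from collections import Counter

def matching_lists(seeds, index):
    """Return the lists in `index` that contain at least one seed."""
    return {name: members for name, members in index.items() if seeds & members}

def expand(seeds, category_index, list_page_index, top_k=10):
    """Suggest up to `top_k` new entities related to `seeds`.

    Both indexes map a list name to a set of lowercase entity names.
    """
    seeds = {s.lower() for s in seeds}
    # Search the category index first; fall back to the 'List of' pages index.
    hits = matching_lists(seeds, category_index)
    if not hits:
        hits = matching_lists(seeds, list_page_index)
    scores = Counter()
    for members in hits.values():
        # Assumed stand-in score: lists that are small and seed-heavy count more.
        weight = len(seeds & members) / len(members)
        for entity in members - seeds:
            scores[entity] += weight
    return [entity for entity, _ in scores.most_common(top_k)]

# Made-up indexes for illustration only.
demo_categories = {
    "indian cricketers": {"sachin tendulkar", "rahul dravid",
                          "parthiv patel", "murali kartik"},
    "living people": {"sachin tendulkar", "parthiv patel", "barack obama",
                      "angela merkel", "lionel messi"},
}
demo_list_pages = {
    "list of bollywood films of 2010": {"raajneeti", "peepli live"},
}
cricketers = expand({"Sachin Tendulkar", "Rahul Dravid"},
                    demo_categories, demo_list_pages, top_k=2)
films = expand({"raajneeti"}, {}, demo_list_pages)
```

In the first call the broad ‘living people’ list contributes little because only one of its five members is a seed, so entities from the more specific cricketers list rank first. The second call finds nothing in the (empty) category index and falls back to the list-page index.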
Experiment
Input:
  raajneeti, anjaana anjaani, my name is khan
Output:
  jaane kahan se aayi hai
  antardwand
  pyaar impossible
  peepli live
  atithi tum kab jaoge
  mr singh mrs mehta
  khatta meetha
  anjaana anjaani
  thanks maa
  khelein hum jee jaan sey
Applications
❖ Named entity recognition
❖ Evaluation of question answering systems
❖ Text summarisation
❖ Search result suggestion
❖ etc.
Last words
❖ In this project we devised a method for set
expansion on Wikipedia data using a simple
yet effective approach.
❖ This unsupervised method can be used to extend entity lists
independently of the language.
❖ For validation, we tested the approach on multiple
domains and obtained acceptable results (shown in the
video).
References
❖ http://mlg.eng.cam.ac.uk/zoubin/papers/bsets-nips05.pdf
❖ https://www.cs.cmu.edu/afs/cs/Web/People/wcohen/postscript/icdm-2007.pdf
❖ http://www.dfki.de/~neumann/InformationExtractionLecture2011/sessions/7-SetExpansion.pdf
Project Links
❖ Project description: http://researchweb.iiit.ac.in/~vandan.mujadia/
❖ Project demo: https://www.youtube.com/watch?v=XZez5aMBNNc&feature=youtu.be
❖ Project presentation: http://www.slideshare.net/VandanMujadia/set-expansioniiit-hireteam-no14
❖ Project codebase: https://github.com/vmujadia/IIIT-H-IRE14
Thank you
