GUIDE : Prof. Amitabha Mukerjee

            Ankit Modi (10104)
          Chirag Gupta (10212)
SOURCE  ?  TARGET
S1, S2, S3, ..., Sn

SOURCE  Si  Sj  TARGET
S1, S2, S3, ..., Sn
 Problem?
 Tackling information overload

 Seeing the bigger picture

 Navigating between topics
 Domain?

 News browsing: one of the primary uses of the Internet

 Politics, sports, entertainment, etc.

 Searching for relevant news is difficult
Framework

 Corpus of news articles from The Hindu

 a delhi court on wednesday convicted sukhdev pehalwan, the third
 accused in the 2002 nitish katara murder case, saying that at the time
 of the incident he too was “present with convicts vikas yadav and
 vishal yadav,” currently serving life term in tihar jail.
Framework

 Corpus of news articles from The Hindu -> Split into words

  45
  ['a', 'delhi', 'court', 'on', 'wednesday', 'convicted', 'sukhdev', 'pehalwan,',
  'the', 'third', 'accused', 'in', 'the', '2002', 'nitish', 'katara', 'murder', 'case,',
  'saying', 'that', 'at', 'the', 'time', 'of', 'the', 'incident', 'he', 'too', 'was',
  'present', 'with', 'convicts', 'vikas', 'yadav', 'and', 'vishal', 'yadav',
  'currently', 'serving', 'life', 'term', 'in', 'tihar', 'jail', '']
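
A minimal sketch of the splitting step, assuming the text is lowercased and split on whitespace only. The slides do not show the tokenizer that was actually used, so the function name `split_into_words` is hypothetical; punctuation attached to a word (as in 'pehalwan,') is deliberately left in place.

```python
def split_into_words(text):
    """Lowercase the text and split it on runs of whitespace."""
    return text.lower().split()


sentence = "A Delhi court on Wednesday convicted Sukhdev Pehalwan,"
tokens = split_into_words(sentence)
print(len(tokens), tokens)
```

Splitting with no argument to `split()` avoids empty strings when the text has trailing or repeated spaces.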
Framework

 Corpus of news articles from The Hindu -> Split into words -> Stemming

  45
  ['a', 'delhi', 'court', 'on', 'wednesdai', 'convict', 'sukhdev', 'pehalwan,',
  'the', 'third', 'accus', 'in', 'the', '2002', 'nitish', 'katara', 'murder', 'case,', 'sai',
  'that', 'at', 'the', 'time', 'of', 'the', 'incid', 'he', 'too', 'wa', 'present', 'with',
  'convict', 'vika', 'yadav', 'and', 'vishal', 'yadav', 'current', 'serv', 'life',
  'term', 'in', 'tihar', 'jail']
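
The stems listed above (for example 'wednesdai', 'sai', 'wa') look like the output of a Porter-style stemmer, though the slides do not name one. The sketch below is a much cruder stand-in that only strips a few common suffixes. `toy_stem` and its suffix list are illustrative assumptions, not the stemmer used in the project, and it will not reproduce every stem shown on the slide.

```python
# Suffixes are tried in this order; the first match wins.
SUFFIXES = ("ing", "ed", "ly", "s")


def toy_stem(word):
    """Strip one common suffix, keeping at least three characters of stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


words = ["convicted", "convicts", "serving", "currently", "jail"]
print([toy_stem(w) for w in words])
```

The point of the step is that 'convicted' and 'convicts' collapse to the same stem, so they are counted as one word in the later frequency step.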
Framework

 Corpus of news articles from The Hindu -> Split into words -> Stemming -> Remove stop words

 29
 ['delhi', 'court', 'wednesdai', 'convict', 'sukhdev', 'pehalwan,', 'third',
 'accus', '2002', 'nitish', 'katara', 'murder', 'case,', 'sai', 'time', 'incid', 'wa',
 'present', 'convict', 'vika', 'yadav', 'vishal', 'yadav', 'current', 'serv', 'life',
 'term', 'tihar', 'jail']
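
A minimal sketch of stop-word removal. The stop list here contains only the words that disappear between the two token lists on the slides; the full list used in the project is not shown, so `STOP_WORDS` and `remove_stop_words` are assumptions for illustration.

```python
# Only the words dropped in the slide's example; a real list would be longer.
STOP_WORDS = {"a", "on", "the", "in", "that", "at", "of", "he", "too", "with", "and"}


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep tokens that are not in the stop list, preserving their order."""
    return [token for token in tokens if token not in stop_words]


stemmed = ["a", "delhi", "court", "on", "wednesdai", "convict", "the", "third"]
print(remove_stop_words(stemmed))
```

Because filtering happens after stemming in this pipeline, a stemmed form such as 'wa' no longer matches a list entry like 'was' and survives, which is consistent with the list above.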
Framework

 Corpus of news articles from The Hindu -> Split into words -> Stemming -> Remove stop words
 -> Frequency of 1-grams, stored in histograms

  [['delhi', 1], ['court', 1], ['wednesdai', 1], ['sukhdev', 1], ['pehalwan,', 1],
  ['third', 1], ['accus', 1], ['2002', 1], ['nitish', 1], ['katara', 1], ['murder', 1],
  ['case,', 1], ['sai', 1], ['time', 1], ['incid', 1], ['wa', 1], ['present', 1], ['vika',
  1], ['vishal', 1], ['current', 1], ['serv', 1], ['life', 1], ['term', 1], ['tihar', 1],
  ['jail', 1], ['yadav', 2], ['convict', 2]]
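
A minimal sketch of the 1-gram frequency step using the standard library's `collections.Counter`. The slide stores the histogram as a list of `[word, count]` pairs; using a `Counter` instead is an assumption made here for brevity, and `unigram_histogram` is a hypothetical name.

```python
from collections import Counter


def unigram_histogram(tokens):
    """Count how often each token (1-gram) occurs in one article."""
    return Counter(tokens)


tokens = ["delhi", "court", "convict", "yadav", "vishal", "yadav", "convict"]
hist = unigram_histogram(tokens)
print(sorted(hist.items(), key=lambda pair: pair[1]))
```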
Framework

 Corpus of news articles from The Hindu -> Split into words -> Stemming -> Remove stop words
 -> Frequency of 1-grams, stored in histograms -> Bhattacharyya’s distance

 Bhattacharyya’s Distance
 $D_B(p, q) = -\ln\big(BC(p, q)\big)$
 where
 $BC(p, q) = \sum_{x \in X} \sqrt{p(x)\, q(x)}$ is the Bhattacharyya coefficient

                                                    Reference: [7]
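
A minimal sketch of the distance between two articles, assuming each word histogram is first normalized so that its counts sum to 1. The slides do not say how the histograms are normalized or how articles with no shared words are handled; returning infinity in that case is a choice made here, and the function names are hypothetical.

```python
import math


def normalize(hist):
    """Turn raw word counts into a probability distribution."""
    total = sum(hist.values())
    if total == 0:
        return {}
    return {word: count / total for word, count in hist.items()}


def bhattacharyya_distance(hist_p, hist_q):
    """Return -ln(BC), where BC sums sqrt(p(x) * q(x)) over shared words."""
    p, q = normalize(hist_p), normalize(hist_q)
    # Words missing from either histogram contribute zero to the coefficient.
    bc = sum(math.sqrt(p[w] * q[w]) for w in p.keys() & q.keys())
    return math.inf if bc == 0 else -math.log(bc)


doc_a = {"katara": 1, "yadav": 2, "court": 1}
doc_b = {"katara": 1, "yadav": 1, "jail": 2}
print(bhattacharyya_distance(doc_a, doc_b))
```

Identical distributions give a distance of 0, and the distance grows as the two articles share less of their vocabulary.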
Framework

 Corpus of news articles from The Hindu -> Split into words -> Stemming -> Remove stop words
 -> Frequency of 1-grams, stored in histograms -> Bhattacharyya’s distance
 -> Dijkstra’s algorithm

                                         Reference: [6]
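
Reference [6] points to the networkx function `dijkstra_path`. The sketch below is a standard-library stand-in for the same idea, assuming articles are graph nodes and each edge weight is a non-negative distance between two articles. How the project decided which pairs of articles to connect is not stated on the slides; the small graph and its weights here are made up for illustration.

```python
import heapq


def dijkstra_path(graph, source, target):
    """Shortest path in a dict-of-dicts graph with non-negative weights."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale heap entry
        visited.add(node)
        if node == target:
            break
        for neighbor, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    if target not in dist:
        return None  # no chain connects the two articles
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


# Hypothetical article graph; weights stand in for pairwise distances.
graph = {
    "source": {"a": 0.4, "b": 1.5},
    "a": {"target": 0.7, "b": 0.3},
    "b": {"target": 0.2},
    "target": {},
}
print(dijkstra_path(graph, "source", "target"))
```

The returned list is the chain of intermediate articles S1, ..., Sn between SOURCE and TARGET.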
Warrants issued in Jessica case
Notice to Vikas Yadav
Charges framed in Katara case
Katara attackers declared absconding
Katara case: Sukhdev gets lifer

US forces kill Osama
Inconceivable that no support in Pak: US
Laden buried at sea
Osama’s Pakistan home is no more
Death will break Al-Qaeda
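   $\mathrm{Coherence}(d_1, \dots, d_n) = \sum_{i=1}^{n-1} \sum_{w} \mathbb{1}(w \in d_i \cap d_{i+1})$
    Every time a word appears in two consecutive articles,
    we score a point.
    Drawback: weak links. A chain can score well overall even if one transition shares almost no words.

   $\mathrm{Coherence}(d_1, \dots, d_n) = \min_{i = 1, \dots, n-1} \sum_{w} \mathbb{1}(w \in d_i \cap d_{i+1})$
    Minimal transition score: the chain is only as strong as its weakest transition.

                                               Reference: [1]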
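
A minimal sketch of the two coherence scores, assuming each article is given as a list of (stemmed) words and that a word counts once per transition no matter how often it occurs. The function names are hypothetical, and returning 0 for a chain with fewer than two articles is a choice made here.

```python
def shared_words(doc_a, doc_b):
    """Number of distinct words that appear in both articles."""
    return len(set(doc_a) & set(doc_b))


def coherence_sum(chain):
    """Sum of shared-word counts over consecutive pairs of articles."""
    return sum(shared_words(a, b) for a, b in zip(chain, chain[1:]))


def coherence_min(chain):
    """Shared-word count of the weakest transition in the chain."""
    return min((shared_words(a, b) for a, b in zip(chain, chain[1:])), default=0)


chain = [
    ["katara", "case", "yadav"],
    ["katara", "yadav", "court"],
    ["court", "osama"],
]
print(coherence_sum(chain), coherence_min(chain))
```

In this example the first transition shares two words and the second shares one, so the sum hides the weaker second link while the minimum exposes it.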
Code Snapshot
References
   [1] Dafna Shahaf and Carlos Guestrin. Connecting the dots between news articles. ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2010.
   [2] Dafna Shahaf, Carlos Guestrin and Eric Horvitz. Trains of thought: Generating information maps. International World Wide Web Conference (WWW), 2012.
   [3] Michael D. Lee, Brandon Pincombe and Matthew Welsh. An Empirical Evaluation of Models of Text Document Similarity. In Proceedings of the 27th Annual Conference of the Cognitive Science Society, 2005.
   [4] Deept Kumar, Naren Ramakrishnan, Richard F. Helm and Malcolm Potts. Algorithms for Storytelling. IEEE Transactions on Knowledge and Data Engineering, vol. 20, no. 6, June 2008.
   [5] M. Shahriar Hossain, Joseph Gresock, Yvette Edmonds, Richard Helm, Malcolm Potts and Naren Ramakrishnan. Connecting the Dots between PubMed Abstracts. 2012.
   [6] http://networkx.github.com/documentation/latest/reference/generated/networkx.algorithms.shortest_paths.weighted.dijkstra_path.html#networkx.algorithms.shortest_paths.weighted.dijkstra_path
   [7] http://en.wikipedia.org/wiki/Bhattacharyya_distance
Questions?
   [5] used the Soergel distance to measure the distance between documents and
    then the A* algorithm to find the chain

   [1] used a bipartite graph and the notion of influence to find the chain

   [2] used the notion of m-coherence to evaluate results

Connecting the dots between news articles
