Prepared by: Group No. 27
Ashrith Jalagam (201202126)
Shefali Soni (201405619)
Aditya Lunawat (201405559)
Mentored by: Litton J Kurisinkel
- Document Summarizer is a platform that generates summaries with
  pre-defined summarizers and finds the most relevant summary by
  passing each one to a model.
- The relevancy of a document with respect to Computer Science is
  determined with a Word2Vec model, which is then used to pick the
  most relevant summary.
- Pre-built systems such as Apache Tika and Word2Vec models were used
  to build the platform, which other developers can use further.
- Having several summarizers makes it difficult to judge which one
  suits a given scenario best.
- The platform's ability to test different summarizers against a
  domain helps developers make that choice.
- This is achieved by rating the documents by the relevancy they
  achieve.
- Crawl the data, create a corpus related to the Computer Science
  domain, and build a model from it with the Word2Vec tool.
- Given a URL or file, extract the textual content and create
  summaries with different summarizers.
- Pass the summaries one by one to the Word2Vec model to get the
  relevancy of each summary with respect to Computer Science.
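The steps above can be sketched as a small pipeline. This is only an illustrative outline under stated assumptions: `run_pipeline` and every stand-in function below (the extractor, the two toy summarizers, and the keyword-based scoring) are hypothetical names invented for the example and are not the project's code.

```python
from typing import Callable, Dict, List, Tuple


def run_pipeline(
    source: str,
    extract: Callable[[str], str],
    summarizers: Dict[str, Callable[[str], str]],
    relevancy: Callable[[str], float],
) -> List[Tuple[str, float, str]]:
    """Extract text, summarize it with every summarizer, rank by relevancy."""
    text = extract(source)
    results = []
    for name, summarize in summarizers.items():
        summary = summarize(text)
        results.append((name, relevancy(summary), summary))
    # Highest relevancy first.
    results.sort(key=lambda item: item[1], reverse=True)
    return results


# Toy stand-ins (assumptions for illustration only).
def fake_extract(source: str) -> str:
    return "Compilers translate code. Cats sleep a lot. Algorithms sort data."


def first_sentence(text: str) -> str:
    return text.split(". ")[0] + "."


def last_sentence(text: str) -> str:
    return text.split(". ")[-1]


CS_WORDS = {"compilers", "code", "algorithms", "data", "sort"}


def toy_relevancy(summary: str) -> float:
    # Fraction of words that belong to a tiny hand-made domain vocabulary.
    words = [w.strip(".").lower() for w in summary.split()]
    return sum(w in CS_WORDS for w in words) / len(words) if words else 0.0


ranked = run_pipeline(
    "http://example.org/doc.pdf",
    fake_extract,
    {"first": first_sentence, "last": last_sentence},
    toy_relevancy,
)
for name, score, summary in ranked:
    print(f"{name}: {score:.2f} {summary}")
```

Passing the extractor, summarizers, and scoring function in as arguments keeps each stage replaceable, which matches the platform idea of swapping summarizers in and out.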
Corpus Creation
Text Extraction
Summary Generation
Relevancy Calculation
- Define a crawler that crawls the DMOZ website and collects the
  desired keywords.
- Get the Wikipedia pages for all of the collected keywords and store
  them in a text file, which is the corpus of the system.
- The Wikipedia pages are accessed and retrieved with Apache Tika.
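Before a Word2Vec model can be trained, the corpus text file has to be turned into tokenized sentences. The following is a minimal sketch of that preparation step, assuming the corpus is available as one plain-text string; `corpus_sentences` is a hypothetical helper, and the regular-expression sentence splitting is a simplification rather than the project's method.

```python
import re
from typing import List


def corpus_sentences(corpus_text: str) -> List[List[str]]:
    """Split raw corpus text into sentences, each a list of lowercase tokens."""
    # Naive split: break after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", corpus_text.strip())
    tokenized = []
    for sentence in sentences:
        tokens = re.findall(r"[a-z0-9]+", sentence.lower())
        if tokens:  # skip empty fragments
            tokenized.append(tokens)
    return tokenized


sample = "A compiler translates source code. An algorithm sorts data!"
sentences = corpus_sentences(sample)
print(sentences)

# A list of token lists like this is the input shape that gensim's
# Word2Vec class expects, e.g. gensim.models.Word2Vec(sentences).
# That call is not executed here and its parameters are left at defaults.
```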
- The input to the system can be a URL or a file of any type, such as
  PDF, Excel, ODT, or ODP. Such files must be converted to plain text
  before the summarizers can process them. Apache Tika does this work:
  the input is read from the URL or the file and passed to the Apache
  Tika API, and the output stream is collected and written to a file.
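The read, extract, and write flow described above might look roughly like the sketch below. It assumes the actual text extraction is supplied as a callable: the `extractor` argument stands in for the Apache Tika call, which is not shown, and `is_url`, `read_bytes`, and `extract_to_file` are hypothetical helper names.

```python
import tempfile
from pathlib import Path
from typing import Callable
from urllib.parse import urlparse
from urllib.request import urlopen


def is_url(source: str) -> bool:
    """Treat only http(s) addresses with a host as URLs."""
    parsed = urlparse(source)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def read_bytes(source: str) -> bytes:
    """Read raw bytes from a URL or from a local file path."""
    if is_url(source):
        with urlopen(source) as response:
            return response.read()
    return Path(source).read_bytes()


def extract_to_file(
    source: str, extractor: Callable[[bytes], str], out_path: str
) -> str:
    """Run the extractor on the input and write the resulting text to a file."""
    text = extractor(read_bytes(source))
    Path(out_path).write_text(text, encoding="utf-8")
    return text


# Usage with a local file and a trivial extractor that just decodes UTF-8.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "input.txt"
    src.write_text("plain text input", encoding="utf-8")
    out = Path(tmp) / "output.txt"
    extracted = extract_to_file(str(src), lambda raw: raw.decode("utf-8"), str(out))
    written = out.read_text(encoding="utf-8")
```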
- Four different summarizers were used to generate a summary for each
  parsed text document or URL.
- Summarizer 1: This summarizer tokenizes the given document and
  splits it into sentences. It then calculates the rank of each
  sentence according to the TF-IDF model.
- Summarizer 2: This summarizer is similar to the previous one but has
  a "min" and a "max" threshold, so only sentences that lie within
  that range are considered.
- Summarizer 3/4: These summarizers have a built-in tokenizer and
  stemmer and use NLTK to rank the final sentences.
- Summarizer 5: This summarizer is the "Open Text Summarizer". It
  gives the most relevant results based on the summary ratio provided
  to it as input.
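The TF-IDF ranking of Summarizer 1 and the thresholds of Summarizer 2 can be sketched as follows. This is an illustration, not the project's implementation: each sentence is treated as its own document, a sentence's score is the sum of its terms' TF-IDF weights, and the "min" and "max" thresholds are read as bounds on that sentence score, which is only one possible interpretation. All function and parameter names are assumptions.

```python
import math
import re
from collections import Counter
from typing import List, Optional


def split_sentences(text: str) -> List[str]:
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", sentence.lower())


def tfidf_scores(sentences: List[str]) -> List[float]:
    """Score each sentence as the sum of TF-IDF weights of its terms."""
    docs = [tokenize(s) for s in sentences]
    n = len(docs)
    # Document frequency: in how many sentences does each term occur?
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        if not doc:
            scores.append(0.0)
            continue
        tf = Counter(doc)
        scores.append(
            sum((count / len(doc)) * math.log(n / df[term])
                for term, count in tf.items())
        )
    return scores


def summarize(
    text: str,
    top_k: int = 2,
    min_score: Optional[float] = None,
    max_score: Optional[float] = None,
) -> List[str]:
    """Return the top_k sentences by score, kept in document order."""
    sentences = split_sentences(text)
    scores = tfidf_scores(sentences)
    candidates = [
        i for i, score in enumerate(scores)
        if (min_score is None or score >= min_score)
        and (max_score is None or score <= max_score)
    ]
    best = sorted(candidates, key=lambda i: scores[i], reverse=True)[:top_k]
    return [sentences[i] for i in sorted(best)]


text = "The cat sat. The cat ran. Compilers optimize loops."
top_one = summarize(text, top_k=1)
bounded = summarize(text, top_k=2, max_score=1.0)
```

In the sample text the third sentence shares no terms with the others, so it scores highest without thresholds and is excluded once `max_score` is set below its score.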
- A set of summarizers is already available in the system, and more
  summarizers can be added to the framework.
- The user chooses among the available summarizers and generates the
  summaries.
- These summaries are forwarded to the model for relevancy calculation.
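One common way to let new summarizers be added to a framework is a registry keyed by name. The sketch below shows that pattern as an assumption about how such a plug-in mechanism could work; `SUMMARIZERS`, `register`, `generate`, and the two toy summarizers are hypothetical and do not come from the project.

```python
from typing import Callable, Dict, List

# Registry of available summarizers, keyed by a user-facing name.
SUMMARIZERS: Dict[str, Callable[[str], str]] = {}


def register(name: str):
    """Decorator that adds a summarizer function to the registry."""
    def decorator(func: Callable[[str], str]) -> Callable[[str], str]:
        if name in SUMMARIZERS:
            raise ValueError(f"summarizer {name!r} is already registered")
        SUMMARIZERS[name] = func
        return func
    return decorator


def _sentences(text: str) -> List[str]:
    return [s.strip() for s in text.split(".") if s.strip()]


@register("lead")
def lead_summarizer(text: str) -> str:
    # Toy summarizer: keep the first sentence.
    return _sentences(text)[0] + "."


@register("longest")
def longest_summarizer(text: str) -> str:
    # Toy summarizer: keep the longest sentence.
    return max(_sentences(text), key=len) + "."


def generate(text: str, chosen: List[str]) -> Dict[str, str]:
    """Run only the summarizers the user selected."""
    return {name: SUMMARIZERS[name](text) for name in chosen}


document = "Tika parses files. Word2Vec scores every generated summary."
summaries = generate(document, ["lead", "longest"])
```

With this layout, adding a summarizer means writing one function and decorating it; nothing else in the platform has to change.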
- The input to the model is the textual summary from each summarizer.
  The summaries are passed to the model one by one.
- Based on certain parameters, the model outputs a relevancy factor
  for each summary.
- Based on this factor, the user decides which summary suits the
  domain best.
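The slides do not say which parameters the model uses to compute the relevancy factor. One plausible approach with word vectors, shown here purely as an assumption, is to average the vectors of the words in a summary and take the cosine similarity with a vector representing the domain. The two-dimensional vectors below are made up for the example; a real Word2Vec model would supply learned vectors.

```python
import math
from typing import Dict, List, Sequence


def mean_vector(vectors: List[Sequence[float]]) -> List[float]:
    dims = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dims)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def relevancy(
    summary: str,
    word_vectors: Dict[str, Sequence[float]],
    domain_vector: Sequence[float],
) -> float:
    """Cosine similarity between the summary's mean word vector and the domain."""
    known = [word_vectors[w] for w in summary.lower().split() if w in word_vectors]
    if not known:
        return 0.0  # no known words, so no evidence of relevance
    return cosine(mean_vector(known), domain_vector)


# Hand-made toy vectors (not learned): first axis ~ computing, second ~ cooking.
TOY_VECTORS = {
    "compiler": [0.9, 0.1],
    "algorithm": [0.8, 0.2],
    "recipe": [0.1, 0.9],
    "oven": [0.2, 0.8],
}
DOMAIN = mean_vector([TOY_VECTORS["compiler"], TOY_VECTORS["algorithm"]])

candidates = {"a": "compiler algorithm", "b": "recipe oven"}
scores = {name: relevancy(text, TOY_VECTORS, DOMAIN) for name, text in candidates.items()}
best = max(scores, key=scores.get)
```

The user-facing decision then reduces to comparing the scores, as in the last line, or showing them side by side.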
- News feed (relevancy based on the searched category): analyse the
  news and display only a summary of each item rather than the whole
  content.
- A platform for researchers working on summarization, who can add new
  features to the project.
- The project has been developed as a platform into which new
  summarizers can easily be added.
- Developers can more easily decide which summarizer works best for
  their domain by testing the summarizers on their own data and
  calculating the relevance factor.
- Developers no longer need to think about the input file format: any
  type of file or URL can be given to the platform.
- Open URL directory for Computer Science:
  http://www.dmoz.org/Computers/Computer_Science
- Word2Vec model: http://radimrehurek.com/gensim/index.html
- Summarizers:
  - http://glowingpython.blogspot.in/2014/09/text-summarization-with-nltk.html
  - https://pypi.python.org/pypi/sumy/0.3.0
  - http://pythonwise.blogspot.in/2008/01/simple-text-summarizer.html
Document Summarizer Platform for Generating Relevant Summaries
