Major Project
Event Based News Clustering
Submitted By: Aniket Mishra
Problem Statement:
• To implement a clustering system that groups related news items into
one cluster, so that a reader can follow an event and see what happens
next. In short, the task is to implement an event-based news clustering
system using a clustering algorithm.
Implementation Steps Followed:
• Crawled election campaign news using the Bing API over different
time periods.
• Used the subcategories AAP, BJP and Congress.
• Applied k-means first, with 10 clusters.
• Then applied modified k-means to the data to improve its efficiency.
• Both algorithms use TF-IDF weighting, centroid calculation and cosine similarity.
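The three building blocks named in the last step can be sketched in a few lines. This is a minimal illustration, not the project's code: the whitespace tokenizer, the raw `tf * log(N / df)` weighting and the function names `tfidf_vectors`, `cosine` and `centroid` are all assumptions made for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document.

    Assumption: tokens are lowercase whitespace-separated words and the
    weight is tf * log(N / df); the slides do not state the exact variant.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append(
            {t: (c / len(tokens)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = Counter()
    for vec in vectors:
        for term, weight in vec.items():
            total[term] += weight
    return {term: weight / len(vectors) for term, weight in total.items()}
```

With vectors like these, k-means assigns each news item to the centroid with the highest cosine similarity and then recomputes every centroid from its members.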
Table 1 - Comparison of clustering results

                    RSS      Purity    Rand Index
K-means             73.52    65.9      0.66
Modified K-means    73.70    71.5      0.649

Table 1 shows the results obtained by our system for the k-means and
modified k-means algorithms.
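For reference, purity and Rand index can be computed from a list of gold event labels and a list of predicted cluster ids. The sketch below uses the standard textbook definitions; the function names and the toy labels in the usage note are illustrative assumptions, not the data behind Table 1.

```python
from collections import Counter
from itertools import combinations


def purity(true_labels, cluster_ids):
    """Fraction of items that belong to the majority class of their cluster."""
    members = {}
    for label, cid in zip(true_labels, cluster_ids):
        members.setdefault(cid, []).append(label)
    majority_total = sum(
        Counter(labels).most_common(1)[0][1] for labels in members.values()
    )
    return majority_total / len(true_labels)


def rand_index(true_labels, cluster_ids):
    """Fraction of item pairs on which the gold labels and the clustering agree.

    A pair agrees when both views put the two items together, or both
    views keep them apart.
    """
    pairs = list(combinations(range(len(true_labels)), 2))
    agreements = sum(
        (true_labels[i] == true_labels[j]) == (cluster_ids[i] == cluster_ids[j])
        for i, j in pairs
    )
    return agreements / len(pairs)
```

For example, with gold labels `["a", "a", "b", "b"]` and cluster ids `[0, 0, 0, 1]`, purity is 0.75 and the Rand index is 0.5.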
When calculating purity and Rand index for k-means and modified k-means,
we found that repeating the clustering 10 times and taking the initial
k points from each of the k different clusters, rather than using random
restarts, gives modified k-means better results and better purity.
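One possible reading of this seeding scheme is sketched below. It is an interpretation, not the project's code: only the first round starts from random points, and every later round takes one seed from each cluster found in the previous round. The dense-list points, the squared Euclidean distance, the fixed iteration count and all function names are assumptions made to keep the example short.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    return [sum(column) / len(points) for column in zip(*points)]


def assign(points, centroids):
    """Put every point into the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, seeds, iterations=20):
    """Plain Lloyd-style k-means started from the given seed centroids."""
    centroids = [list(s) for s in seeds]
    for _ in range(iterations):
        clusters = assign(points, centroids)
        # An empty cluster keeps its previous centroid.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, assign(points, centroids)


def rss(centroids, clusters):
    """Residual sum of squares: squared distance of each point to its centroid."""
    return sum(sq_dist(p, c) for c, members in zip(centroids, clusters) for p in members)


def reseeded_kmeans(points, k, rounds=10, seed=0):
    """Run k-means `rounds` times, reseeding from the previous clusters.

    Assumed interpretation of the slide: after each round, one point is
    drawn from each of the k clusters and used as a seed for the next
    round, instead of a fresh random restart. The lowest-RSS round wins.
    """
    rng = random.Random(seed)
    seeds = rng.sample(points, k)  # only the first round is random
    best = None
    for _ in range(rounds):
        centroids, clusters = kmeans(points, seeds)
        score = rss(centroids, clusters)
        if best is None or score < best[0]:
            best = (score, centroids, clusters)
        seeds = [rng.choice(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return best
```

Because each new seed set already covers every cluster from the previous round, later rounds are less likely to start with two seeds inside the same event than a purely random restart would be.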
Results Demonstration
These are the results in cluster 9. They come together as related news: all 4 news items are related to
Rahul Gandhi. I took the news from 29-05-14; the items were originally scattered, and k-means clustering
grouped them into this cluster.
In this second example, the news is mostly related to the Punjab unit of Congress, which suggests that the
news was clustered correctly. However, 2 news items are unrelated, so the cluster is not 100% pure.
Conclusion
• In this project I have designed and evaluated a clustering system. The
system crawls incoming news reports from the Bing API and clusters them according to
the event they describe. Clustering is performed by representing
incoming news reports as a bag of words with TF-IDF weighting and using a
variation of the k-means algorithm that works in a single pass without cluster
reorganization. The number of clusters to produce is fixed at 29 for every query, and
new events are detected automatically. The clustering process takes 1-2 minutes to
fetch the news from the website.
• The evaluation results show that the system is very effective when clustering
documents into highly specific clusters but performs rather poorly when
clustering documents into more general categories, and that it performs better with
modified k-means.
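A single-pass scheme of the kind described in the first bullet could look like the sketch below. It is an illustration under stated assumptions, not the project's implementation: the similarity threshold of 0.3 is hypothetical (the slides do not say how new events are detected), and only the cap of 29 clusters comes from the text. Each document is visited once and never moved afterwards.

```python
import math


def cosine(a, b):
    """Cosine similarity of two sparse vectors (dict: term -> weight)."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def single_pass(vectors, threshold=0.3, max_clusters=29):
    """Cluster sparse vectors in one pass, without re-organizing clusters.

    threshold is a hypothetical parameter: a document starts a new
    cluster (a new event) when no existing cluster is at least this
    similar and the cap has not been reached. Returns lists of indices.
    """
    clusters = []  # each entry: {"sum": term -> summed weight, "members": [...]}
    for index, vec in enumerate(vectors):
        best, best_sim = None, -1.0
        for cluster in clusters:
            # Cosine ignores scale, so the summed vector stands in for the centroid.
            sim = cosine(vec, cluster["sum"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and (best_sim >= threshold or len(clusters) >= max_clusters):
            best["members"].append(index)
            for term, weight in vec.items():
                best["sum"][term] = best["sum"].get(term, 0.0) + weight
        else:
            clusters.append({"sum": dict(vec), "members": [index]})
    return [cluster["members"] for cluster in clusters]
```

The trade-off of this design is speed against quality: there is no second pass that could repair an early bad assignment, which is consistent with the clusters in the demonstration not being 100% pure.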
Future Work:
• In my opinion, this clustering approach can be applied in domains
other than online news. For example, it could be applied to social
media feeds to produce clusters according to the item being discussed
by different people. In future, a user interface can be added to the
project to make it easier to use, and its scalability can also be
improved.
Thank you!

Jiit 2013 14 project presentation aniket mishra
