By: Shireen Fatima ()
Guide: Dr. Siddhartha Ghosh


Web Mining: Accomplishments & Future Directions
by Jaideep Srivastava

Mining the Web: Discovering Knowledge from
Hypertext Data by Soumen Chakrabarti

Web Mining today and tomorrow by Kavita Sharma
and Vikas Kumar








Introduction
Applications
Challenges
Web Mining taxonomy
Solution to Search Engine Problem
Web Mining through cloud computing
Conclusion


Data mining: turn data into knowledge.



Web mining is the application of data mining
techniques to find interesting and potentially
useful knowledge from web data.
Web data is of three kinds:

Web content: text, images, records, etc.

Web structure: hyperlinks, tags, etc.

Web usage: HTTP logs, app server logs, etc.
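
As an illustration of what web usage data looks like in practice, here is a minimal sketch that parses one HTTP access-log line. It assumes the server writes the widely used Common Log Format; the function name parse_log_line, the field names, and the sample line are hypothetical and not taken from the slides.

```python
import re

# Assumed layout (Common Log Format):
# host ident user [time] "method path protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A size of "-" means no response body was sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record

# Hypothetical sample line for illustration only.
sample = ('192.0.2.10 - - [10/Oct/2000:13:55:36 -0700] '
          '"GET /products/software/ HTTP/1.0" 200 2326')
record = parse_log_line(sample)
print(record["host"], record["path"], record["status"])
```

Records like these, grouped by client and ordered by time, are the raw material for usage mining.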


Personalized customer experience in e-commerce - Amazon.com

Web search - Google

Web-wide tracking - DoubleClick

Understanding Web communities - AOL

Understanding auction behavior - eBay

Personalized portal for the Web - My Yahoo


Information filtering techniques try to learn
about users’ interests based on their evaluation
and actions, and then to use this information to
analyze new documents.



It increases the value of each visitor and improves the
visitor’s experience at the website.



Web mining is attractive for companies because
of several advantages. In the most general sense,
it can contribute to an increase in profit.

Information is huge.
Information is diverse.
Information is redundant.




Discovery of useful information from web
contents / data / documents



The data mining techniques applied are:
Classification
Clustering
Associations






Given:
- A source of textual documents.
- A similarity measure, e.g., how many words
  are common to these documents?

Find:
Several clusters of documents
that are relevant to each other.
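
The clustering task above can be sketched as follows. This is only one simple way to do it: similarity is taken to be the Jaccard overlap of the word sets (shared words divided by all distinct words), and documents are grouped by a greedy single-pass rule. The function names, the 0.3 threshold, and the sample documents are assumptions made for illustration.

```python
def jaccard(a, b):
    """Share of distinct words that two documents have in common."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)

def cluster_documents(docs, threshold=0.3):
    """Greedy single-pass clustering.

    Each document joins the first cluster that already holds a
    sufficiently similar document; otherwise it starts a new cluster.
    Clusters are lists of document indices.
    """
    clusters = []
    for i, doc in enumerate(docs):
        for cluster in clusters:
            if any(jaccard(doc, docs[j]) >= threshold for j in cluster):
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters

# Hypothetical documents for illustration only.
docs = [
    "web mining finds patterns in web data",
    "mining web data finds usage patterns",
    "cheap flights and hotel deals",
    "hotel deals and cheap holiday flights",
]
clusters = cluster_documents(docs)
print(clusters)
```

A greedy pass depends on document order; more robust methods such as k-means or hierarchical clustering are common in practice.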


Association rules: discover similarity among
sets of items across transactions.

X =====> Y
where X, Y are sets of items,
confidence is P(Y | X),
support is P(X ^ Y)
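
A small worked example may help with the two measures. The sketch below uses the usual definitions: the support of a rule X => Y is the fraction of transactions containing every item of X and Y, and its confidence is that support divided by the support of X alone, i.e. P(Y given X). The transactions (sets of pages visited in one session) are made up for illustration.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)

def confidence(transactions, x, y):
    """Estimate of P(Y | X): support of X and Y together over support of X."""
    support_x = support(transactions, x)
    if support_x == 0:
        return 0.0
    return support(transactions, set(x) | set(y)) / support_x

# Hypothetical sessions, each a set of visited pages.
transactions = [
    {"/products/", "/products/software/", "/order/"},
    {"/products/", "/products/software/"},
    {"/products/", "/support/"},
    {"/products/software/", "/order/"},
]
x = {"/products/software/"}
y = {"/order/"}
rule_support = support(transactions, x | y)
rule_confidence = confidence(transactions, x, y)
print(rule_support, rule_confidence)
```

Here the software page appears in three of the four sessions and is accompanied by an order in two of them, so the rule has support 2/4 and confidence 2/3.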
Classification is the task of generalizing known
structure to apply to new data.
For example, an e-mail program might attempt to
classify an e-mail as "legitimate" or as "spam".
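
To make the e-mail example concrete, here is a minimal sketch of one possible classifier: a naive Bayes model over word counts with add-one smoothing. The slides do not name a specific algorithm, so this choice, the function names, and the tiny training set are all assumptions for illustration.

```python
import math
from collections import Counter

def train(examples):
    """Learn per-label word counts and label counts from (text, label) pairs."""
    word_counts = {}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Return the label with the highest naive Bayes log-score."""
    vocabulary = set()
    for counts in word_counts.values():
        vocabulary.update(counts)
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        score = math.log(label_counts[label] / total_docs)  # log prior
        total_words = sum(counts.values())
        for word in text.lower().split():
            # Add-one smoothing keeps unseen words from zeroing the score.
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical labelled e-mails for illustration only.
training = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("meeting agenda for monday", "legitimate"),
    ("please review the project report", "legitimate"),
]
word_counts, label_counts = train(training)
print(classify("claim your free prize", word_counts, label_counts))
print(classify("project meeting on monday", word_counts, label_counts))
```

The known structure here is the labelled training mail; the model generalizes it to messages it has not seen.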








The structure of a typical Web graph consists of Web
pages as nodes and hyperlinks as edges connecting
two related pages.
Web structure mining is the process of discovering
structure information from the Web.
Web graph: a directed graph that represents the Web.
Node: each Web page is a node of the Web graph.
Link: each hyperlink on the Web is a directed edge of
the Web graph.
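
The Web graph described above can be represented with a plain adjacency structure. The sketch below builds a directed graph from (source page, target page) hyperlink pairs and counts incoming links per page, one of the simplest pieces of structure information. The function names and the sample links are hypothetical.

```python
def build_web_graph(links):
    """Build a directed graph: each page maps to the set of pages it links to."""
    graph = {}
    for source, target in links:
        graph.setdefault(source, set()).add(target)
        # Pages without outgoing links must still appear as nodes.
        graph.setdefault(target, set())
    return graph

def in_degrees(graph):
    """Count how many distinct pages link to each page."""
    counts = {page: 0 for page in graph}
    for targets in graph.values():
        for target in targets:
            counts[target] += 1
    return counts

# Hypothetical hyperlinks for illustration only.
links = [
    ("/index.html", "/products/"),
    ("/index.html", "/about.html"),
    ("/products/", "/products/software/"),
    ("/about.html", "/products/"),
]
graph = build_web_graph(links)
degrees = in_degrees(graph)
print(degrees)
```

Link-analysis methods build on this representation, treating pages with many incoming links as candidates for important pages.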


Web usage mining deals with understanding user behavior in
interacting with the web or with a website.



Its goal is to obtain information that may help web
sites reorganize or adapt to
better suit the user.


Clustering and classification
Clients who often access /products/software/webminer.html
tend to be from educational institutions.
Clients who placed an online order for software tend to be
students in the 20-25 age group.
75% of clients who download software from
/products/software/demos/ visit between 7:00 and 11:00
pm on weekends.

Sequential patterns: a set of items is followed by
another item in time order.

Web usage examples
30% of clients who visited /products/software/ had done a
search in Yahoo using the keyword “software” before their
visit.
60% of clients who placed an online order for WEBMINER
placed another online order for software within 15 days.
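
Percentages like the ones in these examples can be derived from time-ordered sessions. The following minimal sketch computes, among the sessions that contain one page, the fraction in which another page is visited later. The function name followed_by_rate and the sample sessions are assumptions for illustration and do not reproduce the figures quoted above.

```python
def followed_by_rate(sessions, first, then):
    """Among sessions containing `first`, the fraction where `then` occurs later.

    Each session is a list of pages in the order they were visited.
    Only the earliest visit to `first` is considered.
    """
    with_first = 0
    with_both = 0
    for pages in sessions:
        if first in pages:
            with_first += 1
            if then in pages[pages.index(first) + 1:]:
                with_both += 1
    return with_both / with_first if with_first else 0.0

# Hypothetical sessions for illustration only.
sessions = [
    ["/", "/products/software/", "/order/"],
    ["/products/software/", "/support/"],
    ["/order/", "/products/software/"],
    ["/", "/about.html"],
    ["/products/software/", "/products/software/demos/", "/order/"],
]
rate = followed_by_rate(sessions, "/products/software/", "/order/")
print(rate)
```

Order matters: the third session contains both pages but does not count, because the order came before the visit to the software page.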


Because search engines draw on the enormous amount of
information in web sites and web pages, it is a
challenging task to engineer, implement, and
improve a search engine.



Web mining helps with the problem of how to deal effectively with
an uncontrolled hypertext collection where anyone can
publish anything they want.


Web mining applications are used by
web sites for web search (e.g., Google and
Yahoo), web recommendations (e.g., Amazon.com),
web advertising (e.g., Google and Yahoo), and
web site design (e.g., landing page optimization).


Cloud computing is one of today's most
seductive technology areas, due at least in part to its
cost efficiency and flexibility.



Cloud mining is a new approach to a faceted search
interface for your data. SaaS (Software-as-a-Service)
is used to reduce the cost of web mining and to try
to provide security along with the cloud mining
technique.


Web mining fills the information gap between web users
and web designers.



Many successful techniques have been developed for
mining the web.



Cloud mining is an improved method for web mining.



The need to discover new methods and techniques to
handle the amounts of data that exist will
always remain.
Web mining
