April 2017
Business Economics & Information Technology
OUTLINE
◼Web mining
◼Data mining/Data mining techniques/ Data mining Algorithms
◼Social media mining
◼Text mining
◼Categories of web mining
Web content mining
Web Usage Mining
Web Structure Mining
https://orange.biolab.si/
WHAT IS WEB MINING?
Web mining is the use of data mining techniques to automatically discover and
extract information from the web.
Web mining can find interesting and potentially useful knowledge in web data.
WHAT IS DATA MINING?
Data mining or knowledge discovery from data is the process of analyzing data from
different perspectives and summarizing it into useful information
Knowledge Discovery in Databases
Raw data → knowledge
DATA MINING TECHNIQUES
The most popular data mining techniques are:
◼ Clustering
◼ Classification
◼ Association Rules
◼ Correlation
◼ Naive Bayesian
◼ Neural Networks
◼ Outlier detection / Anomaly detection
◼ Regression
◼ Logistic Regression
DATA MINING
WHAT IS WEB DATA?
◼ Web content – text, images, records, etc.
◼ Web structure – hyperlinks, tags, etc.
◼ Web usage – HTTP logs, app server logs, etc.
◼ Intra-page structures – document level
◼ Inter-page structures – hyperlink level
◼ Supplemental data
◼ Profiles
◼ Registration information
◼ Cookies
DATA MINING VS. WEB MINING
Data Mining
Data is structured and relational
Well-defined tables, columns, rows, keys, and constraints.
Web Mining
Semi-structured (HTML) and unstructured
EXAMPLE: ASSESSING CREDIT RISK
Situation: Person applies for a loan
Task: Should a bank approve the loan?
Note: People who have the best credit don't need
loans, and people with the worst credit are not likely to repay.
The bank's best customers are in the middle.
EXAMPLE: INSURANCE FRAUD
Insurance Fraud is the filing of a false claim to life, health, automobile, property or
other types of insurance benefits.
Insurance companies lose millions of dollars each year through fraudulent claims,
largely because they do not have a way to easily determine which claims are legitimate
and which may be fraudulent.
EXAMPLE: INSURANCE FRAUD
Data mining enables insurance companies to predict which insurance claims are likely
to be fraudulent.
http://www.hugin.com/solutions/fraud-detection-management/online-demonstration
OPPORTUNITIES & CHALLENGES
◼ The amount of information on the Web is huge.
◼ The coverage of Web information is very wide and diverse. One can find information
about almost anything. Information/data of almost all types exist on the Web, for
example structured tables, texts, stream data, etc.
◼ Much of the Web information is semi-structured due to the nested structure of HTML
code.
◼ Much of the Web information is linked. There are hyperlinks among pages within a
site, and across different sites.
◼ Much of the Web information is redundant. The same piece of information or its
variants may appear in many pages.
OPPORTUNITIES & CHALLENGES
◼ The Web is noisy. A web page generally contains a mixture of many kinds of
information, for example main content, advertisements, navigation panels, copyright
notices, etc.
◼ The Web is dynamic. New pages are constantly being generated. Keeping up with the
changes and monitoring the changes are important issues.
◼ Above all, the Web is a virtual society. It is not only about data, information and
services, but also about interactions among people, organizations, automatic
systems, and communities.
APPLICATION OF WEB MINING IN E-COMMERCE
Customer analysis
Mined data help merchants acquire new customers, retain existing ones, and improve services and
profit by predicting customers' online purchase behavior.
◼ What do the customers do?
◼ What do the customers want?
◼ How can web data be used effectively to market products and to serve the customer?
◼ Are customers shopping purposefully or just browsing?
◼ Are they buying something they are familiar with or something they know little about?
◼ Are they shopping from home, from work or from a hotel?
Web personalization
Based on information about user behavior, a website can be designed and restructured to
make it more advanced and user-friendly. In addition, the image and product value of the
company, as conveyed by the quality of its website, are very important in satisfying customer needs.
Personalizing a website involves tailoring content based on the characteristics of each
individual user’s online behaviors.
Personalized content is often determined by user behaviors such as pages viewed, buttons
clicked and forms submitted.
APPLICATION OF WEB MINING IN E-COMMERCE
Product search & recommendation
When a user searches for a product, how do we find the best results?
Typically, a user query of a few keywords can match many products.
◼ Through large-scale data analysis of query logs, we can create graphs between queries and products, and
between different products.
◼ For example, a user who searches for “Verizon cell phones” might click on the Samsung SCH U940 Glyde
product and the LG VX10000 Voyager. We now know the query is related to those two products, and the two
products have a relationship to each other since a user viewed (and perhaps considered buying) both.
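The query-log idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the click log is invented (apart from the query and the two product names from the example), and "clicked for the same query" is used as a stand-in for "viewed by the same user".

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical click log: (query, clicked product) pairs.
click_log = [
    ("verizon cell phones", "Samsung SCH U940 Glyde"),
    ("verizon cell phones", "LG VX10000 Voyager"),
    ("touchscreen phone", "LG VX10000 Voyager"),
]

def build_graphs(log):
    """Return query -> products edges and product-product co-click counts."""
    query_to_products = defaultdict(set)
    for query, product in log:
        query_to_products[query].add(product)

    co_clicks = defaultdict(int)
    for products in query_to_products.values():
        # Every pair of products clicked for the same query gets an edge.
        for a, b in combinations(sorted(products), 2):
            co_clicks[(a, b)] += 1
    return query_to_products, co_clicks

query_to_products, co_clicks = build_graphs(click_log)
```

The keys of co_clicks are product pairs in alphabetical order, so each relationship is stored once regardless of click order.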
CATEGORIES OF WEB MINING
Web mining is divided into three categories:
1. Web Content Mining
2. Web Usage Mining
3. Web Structure Mining
WEB CONTENT MINING
Web content mining aims to gather, categorize, organize and provide the best possible information available on the web to the user
requesting it.
The data may be unstructured, structured (data from a database) or semi-structured (HTML).
Content mining scans and mines the text, pictures, video, audio and graphs of a web page to
determine the relevance of the content to the search query.
Content mining provides the result lists to search engines in order of highest relevance to the keywords in
the query.
Web content mining is related to data mining and text mining: it discovers useful information
from the contents of web pages.
TEXT MINING
Text mining is the analysis of data contained in natural language text
Text mining attempts to derive meaning from the words and sentences in order to
classify documents, route messages appropriately, as well as create summaries of
content
Unstructured Data Examples: Email, Insurance Claim,
Web Pages, Technical Documents, Contracts
 https://www.nytimes.com/2016/09/24/us/politics/presidential-debate-hillary-clinton-donald-trump.html?_r=0
 https://www.youtube.com/watch?v=Ozo2QuCKml0
https://voyant-tools.org/
DATA MINING TECHNIQUES USED IN WEB CONTENT MINING
The more basic and popular data mining techniques in web content mining are:
Classification: placing documents into a predefined set of groups, such as science articles, political
articles, etc.
Clustering: grouping similar documents without predefined groups. As a result, useful documents
are not omitted from the search results, and the user can easily select the topic of interest.
Summarization: reducing the length of a document while keeping its main points. An
example of text summarization is Microsoft Word's AutoSummarize.
Visualization: using feature extraction and key term indexing to build a graphical representation.
Through visualization, similar documents can be identified, which helps to find related topics in a
very large number of documents. Examples: word cloud, scatter plot, streamgraph, tree map, heat map,
Gantt chart, etc.
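A word cloud is typically drawn from term counts. As a rough sketch of that counting step (the stop-word list and the sample sentence are assumptions made for illustration, not part of any particular tool):

```python
import re
from collections import Counter

# A small illustrative stop-word list (assumption; real tools use longer lists).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "is", "in", "on", "from"}

def term_frequencies(text):
    """Lower-case the text, split it into words and count the non-stop-words."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)

sample = "Web mining is the use of data mining techniques to extract information from the web"
top_terms = term_frequencies(sample).most_common(2)
```

In a word cloud, the terms with the highest counts would be drawn in the largest type.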
WEB USAGE MINING
Web usage mining
Is used to understand customer behavior.
Focuses on discovering potential knowledge from the browsing patterns of users.
Can discover the knowledge hidden in browsing patterns and analyze the visiting characteristics of
users.
The primary data source used in web usage mining is the server log files (web logs).
Browsing web pages leaves a lot of information in the log file.
Analyzing log-file information helps us understand the behavior of the user.
Techniques used for discovering potential knowledge from browsing patterns are:
Clustering
Classification
Association rules
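Before any of these techniques can be applied, the raw server log has to be turned into per-visitor browsing paths. A minimal sketch, assuming the log uses the Common Log Format and treating each client address as one visitor (both are assumptions; the sample lines are made up):

```python
import re
from collections import defaultdict

# Pattern for the Common Log Format:
# host ident user [time] "method path protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

sample_log = [
    '10.0.0.1 - - [10/Apr/2017:10:00:00 +0000] "GET /home HTTP/1.1" 200 512',
    '10.0.0.1 - - [10/Apr/2017:10:00:30 +0000] "GET /cart HTTP/1.1" 200 734',
    '10.0.0.2 - - [10/Apr/2017:10:01:00 +0000] "GET /home HTTP/1.1" 200 512',
    'malformed line',
]

def pages_per_visitor(lines):
    """Group requested paths by client address, skipping unparsable lines."""
    paths = defaultdict(list)
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue
        paths[match.group("host")].append(match.group("path"))
    return dict(paths)

visits = pages_per_visitor(sample_log)
```

A visitor whose path contains /cart but no checkout page would be a candidate for the abandoned-purchase analysis.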
40% of online shoppers don't complete
their purchases
THE PHASES OF WEB USAGE MINING
CLASSIFICATION
Classification is the most familiar and most popular data mining technique for web usage
mining.
Data classification is the process of organizing data into categories for its most effective and
efficient use.
Classification techniques are used to segment and classify observations.
Example:
People younger than 40 with a salary of more than 40,000 trade
online (demographic segmentation).
BlackBerry was launched for business users, Samsung for
users who like Android and a variety of free applications, and Apple for
premium customers who want to be part of a unique and popular niche (behavioral
segmentation).
CLASSIFICATION
Classification consists of assigning a class label to a set of unclassified cases.
The goal of classification is to build a model that can be used to predict the class of records whose class
label is not known.
CLASSIFICATION ALGORITHMS
The most popular classification algorithms are:
Decision trees
Logistic regression
Neural networks
k-nearest neighbors
DECISION TREES
◼A decision tree is a graph that uses a branching method to illustrate every possible outcome of a decision.
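A decision tree can be written down as nested branches and applied to a record by walking from the root to a leaf. The sketch below is hand-built, not learned from data; the attributes, thresholds and class labels are hypothetical and chosen only to illustrate the branching.

```python
# Each internal node tests one attribute against a threshold;
# each leaf is a plain string holding a class label.
# All attributes, thresholds and labels are invented for illustration.
tree = {
    "attribute": "age",
    "threshold": 40,
    "below": {
        "attribute": "salary",
        "threshold": 40000,
        "below": "does not trade online",
        "at_or_above": "trades online",
    },
    "at_or_above": "does not trade online",
}

def classify(node, record):
    """Follow branches from the root until a leaf (a string) is reached."""
    while isinstance(node, dict):
        value = record[node["attribute"]]
        branch = "below" if value < node["threshold"] else "at_or_above"
        node = node[branch]
    return node

label = classify(tree, {"age": 35, "salary": 52000})
```

Tools such as Orange learn the attributes and thresholds from labelled data instead of having them written by hand.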
EXAMPLE
DECISION TREES
Customer online-shopping behavior
Decision Tree using Orange Data Mining
Analysing data in Orange using a decision tree.
Select file: Decision tree from the Dataset folder (on Fronter)
Exercise:
Explain the output of the Decision tree
CLUSTERING
◼Clustering is the process of dividing a dataset into groups such that the members of
each group are as similar as possible to one another and different groups are as
dissimilar as possible from one another
◼The most popular distance-based clustering algorithm is ‘k-means’.
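As a minimal sketch of how k-means works, the function below alternates between assigning points to the nearest centroid and moving each centroid to the mean of its points. The 2-D points and the starting centroids are made up, and the starting centroids are passed in explicitly (rather than chosen at random) so that the run is repeatable.

```python
def k_means(points, centroids, iterations=10):
    """Plain k-means on 2-D points with caller-supplied starting centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for x, y in points:
            distances = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centroids]
            clusters[distances.index(min(distances))].append((x, y))
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append((
                    sum(p[0] for p in cluster) / len(cluster),
                    sum(p[1] for p in cluster) / len(cluster),
                ))
            else:
                new_centroids.append(old)  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, clusters

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, centroids=[(0, 0), (10, 10)])
```

With these inputs the three points near (1, 1) end up in one cluster and the three near (8, 8) in the other.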
K MEANS FOR CLUSTERING
K-Means Algorithm for Clustering
The number of car accidents,
grouped by population
CLUSTERING USING ORANGE
Select file: Clustering from the Dataset folder (on Fronter)
Select k-Means from the Unsupervised widget set.
Select MDS (Multidimensional Scaling) from the
Unsupervised widget set.
Exercise:
Explain the output of the clustering.
Aim: to create a segmentation based only on buying behavior
https://archive.ics.uci.edu/ml/datasets/Wholesale+customers
ASSOCIATION RULE
Association rule mining finds interesting associations and correlation
relationships among large sets of data items.
Association rules show attribute value conditions that occur frequently
together in a given data set.
A typical example of association rule mining is Market Basket Analysis.
What items are frequently
bought together by customers?
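Two standard measures are used to judge a rule such as "IF bread THEN milk": support (how often the items occur together) and confidence (how often the consequent appears when the antecedent does). A small sketch, using invented baskets:

```python
# Hypothetical shopping baskets, one set of items per transaction.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)

rule_support = support({"bread", "milk"}, baskets)
rule_confidence = confidence({"bread"}, {"milk"}, baskets)
```

Here bread and milk appear together in 2 of 4 baskets (support 0.5), and 2 of the 3 baskets with bread also contain milk (confidence about 0.67).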
EXAMPLE OF MARKET BASKET
Items that are frequently
bought together by customers should be
placed together in the store to maximize
sales.
PRODUCT OFFER & RECOMMENDATIONS
IF {milk, flour, sugar, eggs, candles} THEN {party hats, paper plates, magician}
Association analysis in Orange
Select file: Association Rule from the Dataset folder (on Fronter)
Select Data Table from Data in the widget set.
Select Frequent Itemsets from Associate
Select Association Rules from Associate
Exercise:
Explain the output of the association analysis
https://www.lynda.com/Business-Intelligence-tutorials/Association-analysis-Orange/475936/529739-4.html
WEB STRUCTURE MINING
◼ The structure of the Web consists of web pages as nodes and hyperlinks as edges
connecting two related pages.
◼ Research at the hyperlink level is also called hyperlink
analysis.
◼ Web structure mining studies the relationships between pages that reference one another to find useful
patterns, and improves search quality by analyzing the links between pages.
◼ Web structure mining focuses on:
Reducing irrelevant search results
Helping to index information on the web
Web Structure Terminology
Web-Graph: A directed graph that represents the web.
Node: Each web page is a node of the Web-graph.
Link: Each hyperlink on the Web is a directed edge of the Web-graph.
In-degree: The in-degree of a node p is the number of distinct links that
point to p.
Out-degree: The out-degree of a node p is the number of distinct links
originating at p that point to other nodes.
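In-degree and out-degree can be computed directly from a list of links. A short sketch on a made-up web graph (page names A, B, C are placeholders):

```python
from collections import Counter

# Hypothetical web graph: each pair (source, target) is one hyperlink.
# The link A -> B appears twice on purpose.
links = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A"), ("A", "B")]

def degrees(edges):
    """Count distinct incoming and outgoing links per page."""
    distinct = set(edges)  # repeated links between the same pages count once
    out_degree = Counter(src for src, _ in distinct)
    in_degree = Counter(dst for _, dst in distinct)
    return in_degree, out_degree

in_degree, out_degree = degrees(links)
```

Page C has the highest in-degree (2) and page A the highest out-degree (2) in this example.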
Web Structure Terminology
Directed Path: A sequence of links starting from p that can be followed to reach q.
Shortest Path: Of all the paths between nodes p and q, the one with the shortest length, i.e. the smallest
number of links on it.
Diameter: The maximum of all the shortest paths between a pair of nodes p and q, for all pairs of
nodes p and q in the Web-graph (the length of the longest shortest path).
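Shortest paths in an unweighted graph can be found with breadth-first search, and the diameter follows by taking the maximum over all start nodes. The graph below is invented, and pairs with no directed path between them are simply ignored, which is an assumption of this sketch.

```python
from collections import deque

# Hypothetical web graph as an adjacency list: page -> pages it links to.
graph = {"A": ["B"], "B": ["C"], "C": ["A", "D"], "D": []}

def shortest_path_lengths(graph, start):
    """Breadth-first search: links on the shortest directed path to each reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def diameter(graph):
    """Longest shortest path over all pairs connected by a directed path."""
    return max(max(shortest_path_lengths(graph, n).values()) for n in graph)

web_diameter = diameter(graph)
```

In this graph the longest shortest path runs from A to D over three links.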
Hubs and Authorities
Hubs and authorities are ‘fans’ and ‘centers’ of a web graph
A good hub page is one that points to many good authority pages
A good authority page is one that is pointed to by many good hub pages
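The mutual definition of hubs and authorities can be computed by repeated updating, in the style of the HITS algorithm. This is a simplified sketch on an invented four-page graph; the page names and the fixed number of iterations are assumptions.

```python
def hits(graph, iterations=20):
    """Iteratively score pages as hubs and authorities."""
    nodes = list(graph)
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages linking to the page.
        auth = {n: sum(hub[m] for m in nodes if n in graph[m]) for n in nodes}
        # Hub score: sum of authority scores of the pages it links to.
        hub = {n: sum(auth[m] for m in graph[n]) for n in nodes}
        # Normalise so the scores stay bounded.
        a_norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        h_norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        auth = {n: v / a_norm for n, v in auth.items()}
        hub = {n: v / h_norm for n, v in hub.items()}
    return hub, auth

# Hypothetical graph: page -> pages it links to.
web = {"portal": ["news", "shop"], "blog": ["news"], "news": [], "shop": []}
hub, auth = hits(web)
```

The page that links to the most authorities ("portal") gets the top hub score, and the page linked from the most hubs ("news") gets the top authority score.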
INTERESTING WEB STRUCTURE
Google's PageRank
The rank of a web page depends on the rank of the web pages
pointing to it
PageRank is a hyperlink analysis algorithm that assigns a numerical weight to each
web page
PageRank increases the effectiveness of search engines
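The idea that a page's rank depends on the rank of the pages pointing to it can be sketched as a repeated redistribution of rank along links. This is a simplified textbook-style version, not Google's production system; the damping value 0.85 is the commonly quoted default, and the graph is invented.

```python
def page_rank(graph, damping=0.85, iterations=50):
    """Simplified PageRank by repeated redistribution of rank along links."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            # A page without outgoing links shares its rank with all pages.
            targets = graph[node] or nodes
            share = damping * rank[node] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical graph: page -> pages it links to.
web = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
rank = page_rank(web)
```

Page C, which three pages point to, ends up with the highest rank, and A benefits from being the only page C links to.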
To Climb to The Top of Google Search
SOCIAL MEDIA MINING
Social media mining is the process of representing, analyzing, and extracting actionable patterns and trends
from raw social media data.
Social media mining uses a range of basic concepts from computer science, data mining, machine learning,
and statistics.
Social media mining is based on theory from social network analysis (SNA)
Data mining techniques in social media mining are:
Graph Mining
Text Mining
SOCIAL NETWORK ANALYSIS
Social network analysis [SNA] is the mapping and measuring of relationships and flows between
people, groups, organizations, computers, and other connected information/knowledge entities.
The nodes in the network are the people and groups while the links show relationships or flows
between the nodes.
 SNA provides both a visual and a mathematical analysis of human relationships.
EXAMPLE:
Who knows whom and who shares what information
and knowledge with whom through what media.
GRAPH MINING
Extracting useful knowledge (patterns, outliers, etc.) from structured data that can be represented as a graph
https://neo4j.com/download/
A Graph is a set of nodes and the
relationships that connect those nodes
 Nodes and Relationships contain
properties to represent data.
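The node / relationship / property model can be illustrated with two small Python classes. This is a plain in-memory sketch and not the Neo4j API; the class names, the "FOLLOWS" relationship and the sample people are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    properties: dict = field(default_factory=dict)

@dataclass
class Relationship:
    start: Node
    kind: str
    end: Node
    properties: dict = field(default_factory=dict)

# Two people and one relationship, each carrying its own properties.
alice = Node("Person", {"name": "Alice"})
bob = Node("Person", {"name": "Bob"})
relationships = [Relationship(alice, "FOLLOWS", bob, {"since": 2016})]

def neighbours(node, rels, kind):
    """Nodes reached from `node` over relationships of the given kind."""
    return [r.end for r in rels if r.start is node and r.kind == kind]

followed = neighbours(alice, relationships, "FOLLOWS")
```

A graph database stores the same kind of structure and lets such traversals be expressed as queries.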
TEXT MINING
◼A social network contains a lot of data of various forms in its nodes. For example, a
social network may contain blogs, articles, messages, etc.
◼A common application of text mining is to aid in the automatic classification of texts.
For example, it is possible to automatically "filter" out most undesirable "junk email"
based on certain terms or words that are not likely to appear in legitimate messages
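A term-based junk filter of this kind can be sketched very simply. The term list and the 0.2 threshold are assumptions chosen for illustration; practical filters learn their terms and weights from labelled messages.

```python
import re

# Hypothetical list of terms that rarely appear in legitimate messages.
JUNK_TERMS = {"lottery", "winner", "prize", "free"}

def junk_score(message):
    """Fraction of words in the message that are on the junk-term list."""
    words = re.findall(r"[a-z]+", message.lower())
    if not words:
        return 0.0
    return sum(w in JUNK_TERMS for w in words) / len(words)

def is_junk(message, threshold=0.2):
    """Flag the message when its junk-term share reaches the threshold."""
    return junk_score(message) >= threshold

flagged = is_junk("You are a lottery winner claim your free prize")
```

A message such as "Meeting moved to Monday at ten" contains none of the listed terms and is not flagged.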
EXERCISE
Compare two competing products on social media based on the comments about the products (use the text mining tool Voyant)
SUMMARY
◼ Web mining
◼ Data mining
◼ Data mining techniques
◼ Web Data
◼ Applications of web mining in E-commerce
◼ Categories of web mining
  ◼ Web content mining
    ◼ Text mining
    ◼ Data mining
      o Classification
      o Clustering
      o Summarization
      o Visualization
  ◼ Web Usage Mining
    ◼ Clustering – K-means algorithm
    ◼ Classification – Decision Tree
    ◼ Association rule – Basket Analysis
  ◼ Web Structure Mining
◼ Social Media Mining
  ◼ Graph Mining
  ◼ Text Mining
DBA Basics: Getting Started with Performance Tuning.pdf
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 

Web mining and social media mining

  • 1. April 2017 Business Economics & Information Technology
  • 2. OUTLINE ◼Web mining ◼Data mining/Data mining techniques/ Data mining Algorithms ◼Social media mining ◼Text mining ◼Categories of web mining Web content mining Web Usage Mining Web Structure Mining https://orange.biolab.si/
  • 3. WHAT IS WEB MINING? Web mining is the use of data mining techniques to automatically discover and extract information from the web. It can find interesting and potentially useful knowledge in web data.
  • 4. WHAT IS DATA MINING? Data mining, or knowledge discovery from data, is the process of analyzing data from different perspectives and summarizing it into useful information. Knowledge Discovery in Databases: raw data → knowledge.
  • 5. DATA MINING TECHNIQUES The most popular data mining techniques are: Clustering; Classification; Association Rules; Correlation; Naive Bayesian; Neural Networks; Outlier detection / Anomaly detection; Regression; Logistic Regression.
  • 7. WHAT IS WEB DATA? Web content: text, images, records, etc. Web structure: hyperlinks, tags, etc. Web usage: HTTP logs, app server logs, etc. Intra-page structures (document level); inter-page structures (hyperlink level); supplemental data; profiles; registration information; cookies.
  • 8. DATA MINING VS. WEB MINING Data mining: data is structured and relational, with well-defined tables, columns, rows, keys, and constraints. Web mining: data is semi-structured (HTML) and unstructured.
  • 9. EXAMPLE: ASSESSING CREDIT RISK Situation: a person applies for a loan. Task: should the bank approve the loan? Note: people who have the best credit don’t need loans, and people with the worst credit are not likely to repay. The bank’s best customers are in the middle.
  • 10. EXAMPLE: INSURANCE FRAUD Insurance Fraud is the filing of a false claim to life, health, automobile, property or other types of insurance benefits. Insurance companies lose millions of dollars each year through fraudulent claims, largely because they do not have a way to easily determine which claims are legitimate and which may be fraudulent.
  • 11. EXAMPLE: INSURANCE FRAUD Data mining enables insurance companies to predict which insurance claims are likely to be fraudulent. http://www.hugin.com/solutions/fraud-detection-management/online-demonstration
  • 12. OPPORTUNITIES & CHALLENGES The amount of information on the Web is huge. The coverage of Web information is very wide and diverse: one can find information about almost anything. Information and data of almost all types exist on the Web, for example structured tables, texts, stream data, etc. Much of the Web information is semi-structured due to the nested structure of HTML code. Much of the Web information is linked: there are hyperlinks among pages within a site, and across different sites. Much of the Web information is redundant: the same piece of information or its variants may appear in many pages.
  • 13. OPPORTUNITIES & CHALLENGES The Web is noisy. A Web page generally contains a mixture of many kinds of information, for example main contents, advertisements, navigation panels, copyright notices, etc. The Web is dynamic. New pages are constantly being generated. Keeping up with the changes and monitoring them are important issues. Above all, the Web is a virtual society. It is not only about data, information and services, but also about interactions among people, organizations, automatic systems, and communities.
  • 14. APPLICATION OF WEB MINING IN E-COMMERCE Customer analysis: mined data helps acquire new customers, retain existing ones, and improve merchant services and profit by predicting customers' online purchase behavior. ◼What do the customers do? ◼What do the customers want? ◼How can web data be used effectively to market products and serve the customer? ◼Are customers shopping purposefully or just browsing? ◼Are they buying something they are familiar with or something they know little about? ◼Are they shopping from home, from work or from a hotel?
  • 15. APPLICATION OF WEB MINING IN E-COMMERCE Web personalization: based on information about user behavior, a website can be designed and restructured to make it more advanced and user-friendly. In addition, the image and product value of the company are very important in satisfying customer needs based on website quality. Personalizing a website involves tailoring content based on the characteristics of each individual user’s online behavior. Personalized content is often determined by user behaviors such as pages viewed, buttons clicked and forms submitted.
  • 16. APPLICATION OF WEB MINING IN E-COMMERCE Product search & recommendation: when a user searches for a product, how do we find the best results? Typically, a user query of a few keywords can match many products. Through large-scale data analysis of query logs, we can create graphs between queries and products, and between different products. For example, a user who searches for “Verizon cell phones” might click on the Samsung SCH U940 Glyde and the LG VX10000 Voyager. We now know the query is related to those two products, and the two products have a relationship to each other since a user viewed (and perhaps considered buying) both.
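The query-to-product and product-to-product graphs described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not a production recommender: the click log, its `(session, query, product)` layout, and the function name `build_graphs` are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical click log: (session id, query, clicked product).
click_log = [
    ("s1", "verizon cell phones", "Samsung SCH U940 Glyde"),
    ("s1", "verizon cell phones", "LG VX10000 Voyager"),
    ("s2", "verizon cell phones", "LG VX10000 Voyager"),
]

def build_graphs(log):
    """Return weighted query-product edges and product-product edges."""
    query_product = defaultdict(int)   # (query, product) -> click count
    per_search = defaultdict(set)      # (session, query) -> products clicked
    for session, query, product in log:
        query_product[(query, product)] += 1
        per_search[(session, query)].add(product)

    # Two products are related when the same search led to clicks on both.
    product_product = defaultdict(int)
    for products in per_search.values():
        for a, b in combinations(sorted(products), 2):
            product_product[(a, b)] += 1
    return dict(query_product), dict(product_product)

query_product, product_product = build_graphs(click_log)
```

Edge weights here are plain counts; a real system would likely normalize them and filter out rare pairs.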
  • 17. CATEGORIES OF WEB MINING Web mining is divided into three categories: 1. Web Content Mining 2. Web Usage Mining 3. Web Structure Mining
  • 18. WEB CONTENT MINING The goal is to gather, categorize, organize and provide the best possible information available on the web to the user requesting it. The data may be unstructured, structured (data from a database) or semi-structured (HTML). Content mining is the scanning and mining of the text, pictures, video, audio and graphs of a Web page to determine the relevance of the content to the search query. Content mining provides the results lists to search engines in order of highest relevance to the keywords in the query. Web content mining is related to data mining and text mining. In short: discovering useful information from the contents of Web pages.
  • 19. TEXT MINING Text mining is the analysis of data contained in natural language text. Text mining attempts to derive meaning from the words and sentences in order to classify documents, route messages appropriately, and create summaries of content. Unstructured data examples: email, insurance claims, web pages, technical documents, contracts. https://www.nytimes.com/2016/09/24/us/politics/presidential-debate-hillary-clinton-donald-trump.html?_r=0 https://www.youtube.com/watch?v=Ozo2QuCKml0 https://voyant-tools.org/
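A first step in most text mining is counting which terms occur in a text. The sketch below is a simplified illustration of that step in Python; the stop-word list is a tiny hypothetical one, and it is not meant to reproduce what a tool such as Voyant does internally.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for the example.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it"}

def term_frequencies(text):
    """Count lower-cased words, ignoring common stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(word for word in words if word not in STOP_WORDS)

frequencies = term_frequencies("The Web is big and the web is noisy.")
```

The resulting counts are the raw material for word clouds, document classification, and summaries.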
  • 20. DATA MINING TECHNIQUES USED IN WEB CONTENT MINING The more basic and popular data mining techniques in web content mining are: Classification: placing documents into a predefined set of groups such as science articles, political articles, etc. Clustering: grouping similar documents without predefined groups. As a result, useful documents will not be omitted from the search results, and the user can easily select the topic of interest. Summarization: reducing the length of a document while maintaining its main points. An example of text summarization is Microsoft Word’s AutoSummarize. Visualization: using feature extraction and key term indexing to build a graphical representation. Through visualization, similar documents can be found, which is useful for finding related topics in a very large collection of documents. Examples: word cloud, scatter plot, streamgraph, treemap, heat map, Gantt chart, etc.
  • 21. WEB USAGE MINING Web usage mining is used to understand customer behavior. It focuses on discovering potential knowledge from the browsing patterns of users: it can uncover the knowledge hidden in browsing patterns and analyze the visiting characteristics of users. The primary data source used in web usage mining is the server log files (web logs). Browsing web pages leaves a lot of information in the log file, and analyzing this information helps us understand the behavior of the user. Techniques used for discovering potential knowledge from browsing patterns are: clustering, classification, and association rules. 40% of online shoppers don't complete their purchases.
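As a rough sketch of how server log files become usage data, the Python below parses a few lines in the Common Log Format and groups the visited pages by visitor. The sample lines, the `/cart` and `/checkout` paths, and the idea of treating each host as one visitor are assumptions made for illustration only.

```python
import re
from collections import defaultdict

# Pattern for the Common Log Format: host, identity, user, time, request, status, size.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

# Hypothetical log lines.
sample_log = [
    '10.0.0.1 - - [10/Apr/2017:10:00:01 +0200] "GET /home HTTP/1.1" 200 512',
    '10.0.0.1 - - [10/Apr/2017:10:00:09 +0200] "GET /cart HTTP/1.1" 200 734',
    '10.0.0.2 - - [10/Apr/2017:10:01:30 +0200] "GET /home HTTP/1.1" 200 512',
    '10.0.0.2 - - [10/Apr/2017:10:02:02 +0200] "GET /cart HTTP/1.1" 200 734',
    '10.0.0.2 - - [10/Apr/2017:10:03:15 +0200] "GET /checkout HTTP/1.1" 200 901',
]

def pages_by_visitor(lines):
    """Map each host to the ordered list of pages it requested successfully."""
    visits = defaultdict(list)
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match and match.group("status") == "200":
            visits[match.group("host")].append(match.group("path"))
    return dict(visits)

def abandonment_rate(visits):
    """Share of visitors who reached the cart but never the checkout."""
    carts = [pages for pages in visits.values() if "/cart" in pages]
    if not carts:
        return 0.0
    abandoned = sum(1 for pages in carts if "/checkout" not in pages)
    return abandoned / len(carts)

visits = pages_by_visitor(sample_log)
rate = abandonment_rate(visits)
```

Real preprocessing would also split visits into sessions by time gaps and filter out crawlers, which this sketch leaves out.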
  • 22. THE PHASES OF WEB USAGE MINING
  • 23. CLASSIFICATION Classification is the most familiar and most popular data mining technique for web usage mining. Data classification is the process of organizing data into categories for its most effective and efficient use. Classification techniques are used to segment and classify observations. Example (demographic segmentation): people aged less than 40 with a salary of more than 40000 trade online. Example (behavioral segmentation): Blackberry was launched for business users, Samsung was launched for users who like Android and a variety of free applications, and Apple was launched for premium customers who want to be part of a unique and popular niche.
  • 24. CLASSIFICATION Classification consists of assigning a class label to a set of unclassified cases. The goal of classification is to build a model that can be used to predict the class of records whose class label is not known.
  • 25. CLASSIFICATION ALGORITHMS The most popular classification algorithms are: decision trees, logistic regression, neural networks, k-nearest neighbors.
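To make "predicting the class of records whose label is not known" concrete, here is a minimal k-nearest neighbors sketch in Python. The training records (age in years, salary in thousands) and the labels are invented for the example, and no feature scaling is applied, which a real analysis would usually need.

```python
import math
from collections import Counter

# Hypothetical labelled records: ((age, salary in thousands), label).
training_data = [
    ((25, 50), "online"),
    ((30, 60), "online"),
    ((35, 45), "online"),
    ((50, 30), "offline"),
    ((55, 35), "offline"),
    ((60, 20), "offline"),
]

def knn_predict(train, point, k=3):
    """Label a point by majority vote among its k closest training records."""
    by_distance = sorted(train, key=lambda row: math.dist(row[0], point))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

young_prediction = knn_predict(training_data, (28, 55))
older_prediction = knn_predict(training_data, (58, 25))
```

`math.dist` requires Python 3.8 or later.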
  • 26. DECISION TREES ◼A decision tree is a graph that uses a branching method to illustrate every possible outcome of a decision. EXAMPLE
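The branching idea can be sketched as a small hand-written tree in Python. The tree below simply encodes the earlier demographic rule (age under 40 and salary over 40000 means the person trades online); it is written by hand for illustration and is not learned from data the way Orange would build one.

```python
# Each internal node holds a yes/no test; each leaf holds a class label.
tree = {
    "test": lambda record: record["age"] < 40,
    "yes": {
        "test": lambda record: record["salary"] > 40000,
        "yes": {"label": "trades online"},
        "no": {"label": "does not trade online"},
    },
    "no": {"label": "does not trade online"},
}

def classify(node, record):
    """Follow the branches from the root until a leaf is reached."""
    while "label" not in node:
        node = node["yes"] if node["test"](record) else node["no"]
    return node["label"]

result = classify(tree, {"age": 30, "salary": 50000})
```

Every path from the root to a leaf corresponds to one possible outcome of the decision.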
  • 28. Decision Tree using Orange Data Mining Analysing data in Orange using a decision tree. Select file: Decision tree from Dataset Folder (on Fronter). Exercise: explain the output of the decision tree.
  • 29. CLUSTERING ◼Clustering is the process of dividing a dataset into groups such that the members of each group are as similar as possible to one another and different groups are as dissimilar as possible from one another. ◼The most popular distance-based clustering algorithm is ‘k-means’.
  • 30. K MEANS FOR CLUSTERING
  • 31. K MEANS FOR CLUSTERING K-means algorithm for clustering. Example: the number of car accidents grouped by population.
  • 32. CLUSTERING USING ORANGE Select file: Clustering from Dataset Folder (on Fronter). Select K-Means from the Unsupervised widget set. Select MDS (multidimensional scaling) from the Unsupervised widget set. Exercise: explain the output of the clustering to create a segmentation based only on buying behavior. https://archive.ics.uci.edu/ml/datasets/Wholesale+customers
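A minimal k-means sketch in Python may help show the loop behind the picture: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing changes. The city data (population in thousands, number of accidents) is invented, and starting from the first k points is a simplification chosen to keep the example deterministic.

```python
import math

# Hypothetical cities: (population in thousands, car accidents per year).
cities = [(10, 5), (200, 90), (12, 7), (210, 95), (11, 6), (190, 85)]

def k_means(points, k, iterations=100):
    """Cluster points into k groups; returns (centroids, clusters)."""
    centroids = [points[i] for i in range(k)]  # simplistic deterministic start
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Update step: each centroid moves to the mean of its cluster.
        new_centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

centroids, clusters = k_means(cities, k=2)
```

Practical implementations usually pick random or k-means++ starting centroids and run several restarts, because the result depends on the starting points.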
  • 33. ASSOCIATION RULE Association rule finds interesting associations and correlation relationships among large sets of data items. Association rules show attribute value conditions that occur frequently together in a given data set. A typical example of association rule mining is Market Basket Analysis. What items are frequently bought together by customers?
  • 34. EXAMPLE OF MARKET BASKET Items that are frequently bought together by customers should be placed together in the store to maximize sales.
  • 35. PRODUCT OFFER & RECOMMENDATIONS IF {milk, flour, sugar, eggs, candles} THEN {party hats, paper plates, magician}
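Rules of the IF/THEN form above are usually scored with two numbers: support (how often the items occur together) and confidence (how often the THEN part holds when the IF part does). The Python sketch below computes both on a made-up set of shopping baskets; the baskets and function names are assumptions for the example.

```python
# Hypothetical shopping baskets, one set of items per transaction.
baskets = [
    {"milk", "flour", "sugar", "eggs"},
    {"milk", "eggs", "bread"},
    {"flour", "sugar", "eggs"},
    {"milk", "flour", "sugar", "eggs", "candles"},
    {"bread", "butter"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for basket in transactions if itemset <= basket) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the share that also
    contain the consequent. Assumes the antecedent occurs at least once."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))

rule_support = support({"flour", "sugar", "eggs"}, baskets)
rule_confidence = confidence({"flour", "sugar"}, {"eggs"}, baskets)
```

In this toy data every basket with flour and sugar also contains eggs, so the rule IF {flour, sugar} THEN {eggs} has confidence 1.0.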
  • 36. Association analysis in Orange Select file: Association Rule from Dataset Folder (on Fronter). Select Data Table from Data in the widget set. Select Frequent Itemset from Associate. Select Association Rules from Associ Exercise: explain the output of the Association. https://www.lynda.com/Business-Intelligence-tutorials/Association-analysis-Orange/475936/529739-4.html
  • 37. WEB STRUCTURE MINING The structure of the Web consists of Web pages as nodes and hyperlinks as edges connecting two related pages. Research at the hyperlink level is also called hyperlink analysis. Web structure mining studies the relationships between referenced pages to find useful patterns and improves search quality by analyzing the links between pages. Web structure mining focuses on reducing irrelevant search results and helping to index information on the web.
  • 38. Web Structure Terminology Web-graph: a directed graph that represents the web. Node: each Web page is a node of the Web-graph. Link: each hyperlink on the Web is a directed edge of the Web-graph. In-degree: the in-degree of a node p is the number of distinct links that point to p. Out-degree: the out-degree of a node p is the number of distinct links originating at p that point to other nodes.
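These definitions translate directly into code. In the Python sketch below a tiny Web-graph is stored as a dictionary from each page to the pages it links to; the four pages A to D are hypothetical.

```python
# Hypothetical Web-graph: page -> list of pages it links to.
web_graph = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

def out_degree(graph, page):
    """Number of distinct links originating at the page."""
    return len(set(graph.get(page, [])))

def in_degree(graph, page):
    """Number of distinct pages whose links point to the page."""
    return sum(1 for links in graph.values() if page in links)

c_in = in_degree(web_graph, "C")
a_out = out_degree(web_graph, "A")
```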
  • 39. Web Structure Terminology Directed path: a sequence of links, starting from p, that can be followed to reach q. Shortest path: of all the paths between nodes p and q, the one with the shortest length, i.e. the fewest links on it. Diameter: the maximum of the shortest path lengths over all pairs of nodes p and q in the Web-graph (the length of the longest shortest path).
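Shortest paths in an unweighted Web-graph can be found with breadth-first search, and the diameter follows by taking the largest of those lengths. The Python sketch below uses a hypothetical four-page graph and, as a stated simplification, ignores pairs of pages with no directed path between them.

```python
from collections import deque

# Hypothetical Web-graph: page -> list of pages it links to.
web_graph = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

def shortest_path_lengths(graph, source):
    """Breadth-first search: number of links on the shortest path to each reachable page."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        page = queue.popleft()
        for target in graph.get(page, []):
            if target not in distance:
                distance[target] = distance[page] + 1
                queue.append(target)
    return distance

def diameter(graph):
    """Longest shortest path, taken over reachable ordered pairs only."""
    return max(max(shortest_path_lengths(graph, page).values()) for page in graph)

from_d = shortest_path_lengths(web_graph, "D")
graph_diameter = diameter(web_graph)
```

Here the longest shortest path runs from D to B through C and A, so the diameter is 3.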
  • 40. Hubs and Authorities Hubs and authorities are ‘fans’ and ‘centers’ of a web graph. A good hub page is one that points to many good authority pages. A good authority page is one that is pointed to by many good hub pages.
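This mutually reinforcing definition is the basis of the HITS algorithm, which can be sketched as a short iteration in Python: authority scores are recomputed from hub scores, hub scores from authority scores, and both are normalized each round. The graph, the page names, and the fixed iteration count are assumptions for illustration.

```python
import math

# Hypothetical Web-graph: three hub-like pages linking to two authority-like pages.
web_graph = {
    "H1": ["A1", "A2"],
    "H2": ["A1", "A2"],
    "H3": ["A1"],
}

def hits(graph, iterations=50):
    """Return (hub, authority) score dictionaries after a fixed number of rounds."""
    pages = sorted(set(graph) | {t for links in graph.values() for t in links})
    hub = {page: 1.0 for page in pages}
    auth = {page: 1.0 for page in pages}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of the pages linking in.
        auth = {p: sum(hub[q] for q in pages if p in graph.get(q, [])) for p in pages}
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        # Hub score: sum of the authority scores of the pages linked to.
        hub = {p: sum(auth[t] for t in graph.get(p, [])) for p in pages}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return hub, auth

hub, auth = hits(web_graph)
```

Page A1 ends up as the strongest authority because all three hubs point to it, and H1 and H2 score higher as hubs than H3 because they point to both authorities.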
  • 42. Google’s PageRank The rank of a web page depends on the rank of the web pages pointing to it. This hyperlink analysis algorithm assigns a numerical weight to each web page. PageRank increases the effectiveness of search engines. To Climb to The Top of Google Search
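The statement that a page's rank depends on the ranks of the pages pointing to it can be turned into a simple iterative computation. The Python sketch below is a simplified power-iteration version of PageRank on a hypothetical four-page graph; the damping factor of 0.85 is the commonly quoted default and is an assumption here, as is the handling of pages without outgoing links.

```python
# Hypothetical Web-graph: page -> list of pages it links to.
web_graph = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

def pagerank(graph, damping=0.85, iterations=100):
    """Simplified PageRank by repeated redistribution of rank along links."""
    pages = sorted(set(graph) | {t for links in graph.values() for t in links})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            links = graph.get(page, [])
            if links:
                # A page shares its rank equally among the pages it links to.
                share = damping * rank[page] / len(links)
                for target in links:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outgoing links spreads its rank evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

rank = pagerank(web_graph)
```

Page C receives links from A, B and D and ends up with the highest rank, while D, which nobody links to, keeps only the baseline share.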
  • 43. SOCIAL MEDIA MINING Social media mining is the process of representing, analyzing, and extracting actionable patterns and trends from raw social media data. It uses a range of basic concepts from computer science, data mining, machine learning, and statistics, and is based on theory from social network analysis (SNA). Data mining techniques in social media mining are: graph mining and text mining.
  • 44. SOCIAL NETWORK ANALYSIS Social network analysis (SNA) is the mapping and measuring of relationships and flows between people, groups, organizations, computers, and other connected information/knowledge entities. The nodes in the network are the people and groups, while the links show relationships or flows between the nodes. SNA provides both a visual and a mathematical analysis of human relationships. Example: who knows whom, and who shares what information and knowledge with whom through what media.
  • 45. GRAPH MINING Extracting useful knowledge (patterns, outliers, etc.) from structured data that can be represented as a graph. A graph is a set of nodes and the relationships that connect those nodes. Nodes and relationships contain properties to represent data. https://neo4j.com/download/
  • 46. TEXT MINING ◼A social network contains a lot of data of various forms in its nodes. For example, a social network may contain blogs, articles, messages, etc. ◼A common application of text mining is to aid in the automatic classification of texts. For example, it is possible to automatically "filter" out most undesirable "junk email" based on certain terms or words that are not likely to appear in legitimate messages.
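The junk email example can be sketched as a very simple term-based filter in Python. The list of suspicious terms and the threshold of two matches are hypothetical choices for the illustration; real filters typically learn such terms and weights from labelled messages.

```python
import re

# Hypothetical terms assumed to be rare in legitimate messages.
JUNK_TERMS = {"lottery", "winner", "prize", "jackpot"}

def is_junk(message, threshold=2):
    """Flag a message when it contains at least `threshold` distinct junk terms."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    return len(words & JUNK_TERMS) >= threshold

flagged = is_junk("Congratulations, lottery WINNER! Claim your prize now")
kept = is_junk("Minutes from the project meeting are attached")
```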
  • 47. EXERCISE Compare two competing products on social media based on the comments on the products (use the text mining tool Voyant).
  • 48. SUMMARY ◼ Web mining ◼ Data mining ◼ Data mining techniques ◼ Web data ◼ Applications of web mining in e-commerce ◼ Categories of web mining: web content mining (text mining; data mining: classification, clustering, summarization, visualization); web usage mining (clustering: k-means algorithm; classification: decision tree; association rule: basket analysis); web structure mining ◼ Social media mining: graph mining; text mining