Web Mining & Text Mining
Prepared by : Sharma Hemant
hemantbeast@gmail.com
Web Mining
- Web Mining is the application of data mining techniques to extract knowledge from web data such as Web content, Web structure, and Web usage data.
- It is the process of discovering useful and previously unknown information from web data.
- Web data includes:
  • Web content: text, images, records, etc.
  • Web structure: hyperlinks, tags, etc.
  • Web usage: HTTP logs, application server logs, etc.
Web Content Mining
- Web content mining is performed by extracting useful information from the content of a web page or site.
- It includes the extraction of structured data from web pages and the identification, matching, and integration of semantically similar data.
- Web content may consist of text, images, audio, video, etc. When the content is text, web content mining is closely related to text mining.
- It uses Natural Language Processing and Information Retrieval techniques to mine the data.
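As a minimal sketch of the extraction step, the Python code below uses only the standard library's `html.parser` to pull the title and the visible text out of a page. The names `ContentExtractor` and `extract_content`, and the choice to ignore text inside `<script>` and `<style>`, are illustrative assumptions rather than a prescribed method; real content mining would apply NLP and IR processing to this output.

```python
from html.parser import HTMLParser


class ContentExtractor(HTMLParser):
    """Collects the <title> and the visible text of an HTML page."""

    SKIP_TAGS = {"script", "style"}  # text inside these tags is not page content
    VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}  # tags with no end tag

    def __init__(self):
        super().__init__()
        self.title = ""
        self.text_chunks = []
        self._open_tags = []  # stack of currently open elements

    def handle_starttag(self, tag, attrs):
        if tag not in self.VOID_TAGS:
            self._open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching start tag; tolerates unclosed inner tags.
        if tag in self._open_tags:
            while self._open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        text = " ".join(data.split())  # normalize runs of whitespace
        if not text:
            return
        if self._open_tags and self._open_tags[-1] == "title":
            self.title += text
        elif not self.SKIP_TAGS.intersection(self._open_tags):
            self.text_chunks.append(text)


def extract_content(html):
    """Return (title, list of visible text chunks) for an HTML string."""
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    return parser.title, parser.text_chunks


page = ("<html><head><title>Sample Page</title></head>"
        "<body><h1>Web Mining</h1><p>Content mining extracts text.</p></body></html>")
print(extract_content(page))
# ('Sample Page', ['Web Mining', 'Content mining extracts text.'])
```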
Web Structure Mining
- The structure of a typical Web graph consists of Web pages as nodes and hyperlinks as edges connecting two related pages.
- Web structure mining is the process of discovering structural information from the Web.
  • This type of mining can be performed either at the document (intra-page) level or at the hyperlink (inter-page) level.
  • Research at the hyperlink level is also called Hyperlink Analysis.
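The slides do not name a specific hyperlink-analysis algorithm. One widely known example is PageRank, which scores a page by the rank that flows into it along its in-links. The Python sketch below is a plain power-iteration version; the damping factor 0.85, the fixed iteration count, and the dict-of-lists graph representation are assumptions chosen for illustration.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}  # start from a uniform distribution

    for _ in range(iterations):
        # Every page receives the "random jump" share of rank.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            out_links = set(graph.get(node, []))  # distinct links only
            if out_links:
                share = damping * rank[node] / len(out_links)
                for target in out_links:
                    new_rank[target] += share
            else:
                # Dangling page (no out-links): spread its rank over all pages.
                for target in nodes:
                    new_rank[target] += damping * rank[node] / n
        rank = new_rank
    return rank


web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(web)
print(max(ranks, key=ranks.get))  # C: it has the most incoming links
```

The ranks always sum to 1, so they can be read as the share of time a random surfer spends on each page.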
Web Structure Terminology
- Web-graph: A directed graph that represents the Web.
- Node: Each Web page is a node of the Web-graph.
- Link: Each hyperlink on the Web is a directed edge of the Web-graph.
- In-degree: The number of distinct links that point to a node.
- Out-degree: The number of distinct links originating at a node that point to other nodes.
- Directed Path: A sequence of links, starting from a node r, that can be followed to reach another node t.
- Shortest Path: The path with the shortest length (fewest links) among all paths between nodes p and q.
- Diameter: The maximum shortest-path length over all pairs of nodes p and q in the Web-graph.
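These terms map directly onto simple graph computations. In this Python sketch the Web-graph is assumed to be a dict mapping each page to the pages it links to, and path length is counted in links. As an assumption not stated in the slides, the diameter is taken only over pairs (p, q) where q is reachable from p; on the real Web many pairs are not connected, which would otherwise make the diameter infinite.

```python
from collections import deque


def all_nodes(graph):
    """Every page that appears as a source or as a link target."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    return nodes


def degrees(graph):
    """Return (in_degree, out_degree) dicts, counting distinct links only."""
    nodes = all_nodes(graph)
    in_deg = {node: 0 for node in nodes}
    out_deg = {node: 0 for node in nodes}
    for source, targets in graph.items():
        distinct = set(targets)
        out_deg[source] = len(distinct)
        for target in distinct:
            in_deg[target] += 1
    return in_deg, out_deg


def shortest_path_lengths(graph, start):
    """Breadth-first search: links on the shortest directed path to each reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for target in graph.get(node, []):
            if target not in dist:
                dist[target] = dist[node] + 1
                queue.append(target)
    return dist


def diameter(graph):
    """Longest shortest path over all ordered pairs (p, q) with q reachable from p."""
    return max(
        (d for p in all_nodes(graph) for d in shortest_path_lengths(graph, p).values()),
        default=0,
    )


web = {"A": ["B", "C", "C"], "B": ["C"], "C": ["D"], "D": []}
print(degrees(web)[0]["C"])  # 2: distinct links from A and B
print(diameter(web))  # 2: the shortest path A -> C -> D
```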
Web Usage Mining
- The Web is a collection of inter-related files on one or more Web servers.
- Web usage mining is the discovery of meaningful patterns from data generated by client-server transactions on one or more Web localities.
- Typical sources of data:
  • Automatically generated data stored in server access logs, referrer logs, agent logs, and client-side cookies.
  • User profiles.
  • Metadata: page attributes, content attributes, usage data.
- Web servers, Web proxies, and client applications can easily capture Web usage data.
- Web Server Log: A file created by the server to record all the activities it performs.
  For example, a request is logged when a user enters a URL into the browser's address bar or clicks on a link.
- For each page request sent to the web server, the log records information such as the URL, whether the request was successful, the user's IP address, and the date and time.
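As a hedged sketch of the first step of web usage mining, the Python code below parses access-log lines and counts successful requests per URL. It assumes the logs use the Common Log Format produced by servers such as Apache httpd; other log formats need a different pattern, and treating 2xx and 3xx status codes as "successful" is an illustrative choice.

```python
import re
from collections import Counter
from datetime import datetime

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) \S+" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Return the IP, time, URL, and status of one log line, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    status = int(match.group("status"))
    return {
        "ip": match.group("ip"),
        "time": datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z"),
        "url": match.group("url"),
        "status": status,
        "success": 200 <= status < 400,  # assumption: 2xx/3xx count as successful
    }


def page_hits(lines):
    """Count successful requests per URL, a simple usage pattern."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record and record["success"]:
            counts[record["url"]] += 1
    return counts


logs = [
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.2 - - [10/Oct/2000:13:57:12 -0700] "GET /index.html HTTP/1.1" 200 2326',
    '10.0.0.3 - - [10/Oct/2000:13:58:40 -0700] "GET /missing.html HTTP/1.1" 404 0',
]
print(page_hits(logs))  # Counter({'/index.html': 2})
```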
Text Mining
- The objective of Text Mining is to exploit information contained in textual documents in various ways, including the discovery of patterns and trends in data, associations among entities, predictive rules, etc.
- The results can be important both for:
  • the analysis of the document collection, and
  • providing intelligent navigation and browsing methods.
[Figure: Text Mining Workflow]
Data Mining vs Text Mining
- Both seek novel and useful patterns.
- Both are semi-automated processes.
- The difference is the nature of the data:
  • Structured versus unstructured data
  • Structured data: databases
  • Unstructured (or semi-structured) data: Word documents, PDF files, XML files, and so on
- Text mining first imposes structure on the data and then mines the structured data.
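Imposing structure on text is often done by converting each document into a vector of term counts (a bag-of-words, or term-document, representation). This minimal Python sketch assumes simple lower-case alphabetic tokenization with no stemming or stop-word removal; once documents are rows of numbers, ordinary data mining techniques can be applied to them.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_document_matrix(documents):
    """Turn raw documents into a vocabulary and one row of term counts per document."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))  # every distinct term, in a fixed order
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix


vocab, rows = term_document_matrix(["Web mining mines web data", "Text mining mines text"])
print(vocab)  # ['data', 'mines', 'mining', 'text', 'web']
print(rows)   # [[1, 1, 1, 0, 2], [0, 1, 1, 2, 0]]
```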
Technology premise of Text Mining
- Summarization: The process of producing a summary of a document that contains a large amount of information while preserving its theme or main idea.
- Information Extraction: Identifies entities and the relations between them within the text, typically using pattern matching.
- Categorization: A supervised learning technique that assigns documents to predefined categories according to their content. Document categorization is widely used in libraries.
- Visualization: Uses computer graphics to represent information and reveal relationships.
- Clustering: An unsupervised technique, based on the textual similarity of documents, that data analysts use to divide a collection of texts into mutually exclusive groups.
- Question Answering: Determines the most suitable answer to a question posed in natural language.
- Sentiment Analysis: Also known as opinion mining, it classifies the emotion expressed by users, usually into classes such as positive, negative, neutral, and mixed. It is mainly used to capture people's views of or attitudes toward products, services, and other topics.
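Two of these techniques are simple enough to sketch directly. The Python code below illustrates information extraction by pattern matching, using a hypothetical "X acquired Y" relation pattern, and a lexicon-based sentiment classifier that returns the four classes listed above. The word lists, the pattern, and the company names in the usage example are made-up assumptions; practical systems use much larger lexicons or trained models.

```python
import re

# Hypothetical relation pattern: "<Capitalized name> acquired <Capitalized name>".
ACQUISITION_PATTERN = re.compile(r"\b([A-Z]\w+) acquired ([A-Z]\w+)")

# Tiny, made-up sentiment lexicons for illustration only.
POSITIVE_WORDS = {"good", "great", "excellent", "fast", "love"}
NEGATIVE_WORDS = {"bad", "poor", "slow", "terrible", "hate"}


def extract_acquisitions(text):
    """Information extraction: (acquirer, acquired) pairs found by pattern matching."""
    return ACQUISITION_PATTERN.findall(text)


def classify_sentiment(text):
    """Lexicon-based sentiment: 'positive', 'negative', 'neutral', or 'mixed'."""
    words = re.findall(r"[a-z]+", text.lower())
    positive = sum(word in POSITIVE_WORDS for word in words)
    negative = sum(word in NEGATIVE_WORDS for word in words)
    if positive and negative:
        return "mixed"
    if positive:
        return "positive"
    if negative:
        return "negative"
    return "neutral"


print(extract_acquisitions("Acme acquired Globex last year. Initech acquired Hooli."))
# [('Acme', 'Globex'), ('Initech', 'Hooli')]
print(classify_sentiment("Great product but slow delivery"))  # mixed
```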