GUJARAT TECHNOLOGICAL UNIVERSITY
Introduction To Web Mining
and
Spatial Data Mining
Active Learning Assignment of
Data Ware Housing and Mining (3161610)
PREPARED BY
AARSH DHOKAI
DHARMAM SAVANI
GUIDED BY
PROF. RAVI PATEL
SIR
A. D. Patel Institute of Technology
• What is data mining?
• Data mining is the process of extracting and discovering patterns in large data sets using methods at the intersection of machine learning, statistics, and database systems.
• What is web mining?
• Web mining is the application of data mining techniques to automatically discover and extract information from Web documents and services.
• The main purpose of web mining is to discover useful information from the World Wide Web and its usage patterns.
DATA MINING V/S WEB MINING
• Definition
  • Data Mining: the process that attempts to discover patterns and hidden knowledge in large data sets in any system.
  • Web Mining: the application of data mining techniques to automatically discover and extract information from web documents.
• Application
  • Data Mining: useful for finding patterns in large batches of data.
  • Web Mining: useful for a particular website or e-service.
• Performed By
  • Data Mining: data scientists and data engineers.
  • Web Mining: data scientists along with data analysts.
• Access
  • Data Mining: accesses data privately.
  • Web Mining: accesses data publicly.
• Structure
  • Data Mining: gets information from explicitly structured data.
  • Web Mining: gets information from structured, unstructured and semi-structured web pages.
• Problem Type
  • Data Mining: clustering, classification, regression, prediction, optimization and control.
  • Web Mining: web content mining, web structure mining, web usage mining.
• Tools
  • Data Mining: tools such as machine learning algorithms.
  • Web Mining: specialized tools and techniques such as Scrapy, the PageRank algorithm and Apache log analysis.
• Skills
  • Data Mining: data cleansing, machine learning algorithms, statistics and probability.
  • Web Mining: application-level knowledge and data engineering, with mathematical topics such as statistics and probability.
WHY WEB MINING?
• Web mining is the application of data mining techniques to discover patterns, structures, and knowledge from the Web.
• The World Wide Web is a fertile source for data mining.
• The World Wide Web serves as a huge, widely distributed, global information center for news, advertisements, consumer information, financial management, education, government, and e-commerce.
TYPES OF WEB MINING
Web Mining is divided into:
• Content Mining
• Structure Mining
• Usage Mining
WEB CONTENT MINING
• Web content mining is the process of extracting useful information from the content of web documents.
• Web content consists of several types of data: text, images, audio, video, or structured records such as lists and tables.
• Web content mining has been studied extensively by researchers, search engines, and other web service companies.
• Web content mining can link information about individuals across multiple web pages; therefore, it has the potential to inappropriately disclose personal information.
WEB CONTENT MINING
Web content mining is done to:
• understand the content of web pages.
• provide scalable and informative keyword-based page indexing.
• perform entity/concept resolution.
• determine web page relevance and ranking.
• produce web page content summaries.
• extract other valuable information related to web search and analysis.
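To make keyword-based page indexing concrete, here is a minimal Python sketch of an inverted index, the data structure commonly used to map each keyword to the pages that contain it. The page names and texts are hypothetical, and the tokenizer (lowercase alphanumeric runs) is a simplifying assumption; real systems add stemming, stop-word removal and ranking.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric keywords."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(pages):
    """Map each keyword to the sorted list of pages that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return {word: sorted(urls) for word, urls in index.items()}


def search(index, query):
    """Return pages containing every keyword in the query (AND semantics)."""
    words = tokenize(query)
    if not words:
        return []
    result = set(index.get(words[0], []))
    for word in words[1:]:
        result &= set(index.get(word, []))
    return sorted(result)


# Hypothetical pages used only for illustration.
pages = {
    "a.html": "Web mining discovers patterns on the Web",
    "b.html": "Spatial data mining finds hotspots",
    "c.html": "Data mining and web usage logs",
}
index = build_inverted_index(pages)
print(search(index, "data mining"))  # ['b.html', 'c.html']
```

Looking up each query word in the index avoids rescanning every page at query time, which is what makes this kind of indexing scalable.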
WEB STRUCTURE MINING
• Web structure mining uses graph theory to analyze the nodes and connection structure of a web site.
• According to the type of web structural data, web structure mining can be divided into two kinds:
  • Extracting patterns from hyperlinks in the web: a hyperlink is a structural component that connects a web page to a different location.
  • Mining the document structure: analyzing the tree-like structure of pages to describe HTML or XML tag usage.
• Web structure mining terminology:
  • Web graph: a directed graph representing the web.
  • Node: a web page in the graph.
  • Edge: a hyperlink.
  • In-degree: the number of links pointing to a particular node.
  • Out-degree: the number of links generated from a particular node.
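The terminology above maps directly onto a directed graph. This minimal Python sketch treats each hyperlink in a small, hypothetical set of pages as an edge and counts the in-degree and out-degree of every node:

```python
from collections import Counter

# Hypothetical web graph: each edge (u, v) is a hyperlink from page u to page v.
edges = [
    ("home", "about"),
    ("home", "blog"),
    ("blog", "about"),
    ("blog", "post1"),
    ("post1", "home"),
]


def degrees(edges):
    """Return (in_degree, out_degree) Counters for a directed web graph."""
    in_deg, out_deg = Counter(), Counter()
    for src, dst in edges:
        out_deg[src] += 1  # link generated from src
        in_deg[dst] += 1   # link pointing to dst
    return in_deg, out_deg


in_deg, out_deg = degrees(edges)
nodes = sorted({node for edge in edges for node in edge})
for node in nodes:
    print(node, "in:", in_deg[node], "out:", out_deg[node])
```

In this sample, "about" has in-degree 2 and out-degree 0. A `Counter` returns 0 for nodes that never appear on one side of an edge, so no special case is needed.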
WEB STRUCTURE MINING
Web structure mining is done to:
• evaluate the quality of web pages or rank web pages.
• give the authority of a page on a topic.
• decide which pages to crawl.
• find related pages.
• detect duplicated pages.
Example: Google's PageRank algorithm.
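A minimal sketch of the PageRank idea using power iteration, assuming the commonly cited damping factor of 0.85 and a tiny hypothetical link graph. It illustrates the principle that a page is important if important pages link to it. It is not Google's production algorithm. Pages with no out-links (dangling pages) spread their rank evenly over all pages, which is one common convention.

```python
def pagerank(links, damping=0.85, iterations=100, tol=1e-10):
    """Compute PageRank scores by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {p for targets in links.values() for p in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Rank held by dangling pages is redistributed uniformly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        converged = sum(abs(new_rank[p] - rank[p]) for p in pages) < tol
        rank = new_rank
        if converged:
            break
    return rank


# Hypothetical link graph: A links to B and C, and so on.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = pagerank(links)
for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

In this graph C receives links from A, B and D and ends with the highest score. D, which no page links to, keeps only the baseline (1 - 0.85) / 4. The scores always sum to 1.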
WEB USAGE MINING
• Web usage mining is the process of extracting useful information from the server logs of users.
• It draws on three kinds of usage data:
  • Web server data: web servers collect user logs, including IP address, page reference and access time.
  • Application server data: application servers can track various kinds of business events and record them in application server logs.
  • Application-level data: new kinds of events can be defined and logged, generating histories of those events.
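Web server data is usually stored as plain-text access logs. The sketch below assumes the Apache Common Log Format (host, identity, user, timestamp, request line, status, size). It parses a few hypothetical lines to count page hits and requests per IP address. Other log formats would need a different pattern.

```python
import re
from collections import Counter

# Assumed Apache Common Log Format:
# host ident user [time] "METHOD path PROTOCOL" status size
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_log(lines):
    """Yield (ip, time, path) for each well-formed log line; skip the rest."""
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match:
            yield match.group("ip"), match.group("time"), match.group("path")


# Hypothetical log lines used only for illustration.
sample = [
    '10.0.0.1 - - [10/Oct/2023:13:55:36 +0530] "GET /index.html HTTP/1.1" 200 2326',
    '10.0.0.2 - - [10/Oct/2023:13:56:01 +0530] "GET /products.html HTTP/1.1" 200 5120',
    '10.0.0.1 - - [10/Oct/2023:13:57:12 +0530] "GET /products.html HTTP/1.1" 200 5120',
    "malformed line",
]
records = list(parse_log(sample))
page_hits = Counter(path for _, _, path in records)
visits_per_ip = Counter(ip for ip, _, _ in records)
print(page_hits.most_common(1))  # [('/products.html', 2)]
```

Grouping the parsed records by IP address and time is the usual first step toward sessions and navigation patterns.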
WEB USAGE MINING
Web usage mining is done to:
• find patterns related to general or particular groups of users.
• understand users' search patterns, trends, and associations.
• predict what users are looking for on the Internet.
• help improve search efficiency and effectiveness.
• promote products or related information to different groups of users at the right time.
Web search companies routinely conduct web usage mining to improve their quality of service.
TOOLS FOR WEB MINING
• Web Usage Mining: R, Oracle Data Mining, Tableau
• Web Content Mining: Scrapy (Python)
• Web Structure Mining: HITS algorithm, PageRank algorithm
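HITS, listed above for web structure mining, gives each page two scores: an authority score (the page is linked to by good hubs) and a hub score (the page links to good authorities). The following is a minimal sketch using iterative updates with L2 normalization on a hypothetical four-page graph. It assumes the whole graph fits in memory and runs a fixed number of iterations instead of testing for convergence.

```python
import math


def hits(links, iterations=50):
    """Return (hub, authority) score dicts for a directed graph via HITS.

    links maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {p for targets in links.values() for p in targets})
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: sum of hub scores of pages linking in.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        # Hub update: sum of authority scores of pages linked to.
        hub = {p: sum(auth[dst] for dst in links.get(p, [])) for p in pages}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return hub, auth


# Hypothetical link graph used only for illustration.
links = {"portal": ["news", "wiki"], "blog": ["wiki"], "news": ["wiki"]}
hub, auth = hits(links)
print(max(auth, key=auth.get), max(hub, key=hub.get))
```

Here "wiki", which every other page links to, gets the top authority score. "portal" links to the two pages with non-zero authority and becomes the best hub.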
APPLICATIONS OF WEB MINING
IN BUSINESS
• Web mining enables e-commerce to do personalized marketing, which eventually results in higher trade volumes.
• Companies can establish better customer relationships by understanding customer needs better and reacting to them faster.
• Companies can find, attract and retain customers, and they can save on production costs by using the insight they acquire into customer requirements.
SECURITY AND CRIME INVESTIGATION
• Government agencies use this technology to classify threats and fight terrorism. The predictive capability of mining applications can benefit society by identifying criminal activities.
• Terrorist groups use the Web as infrastructure for various purposes.
• Web usage mining aims to track down online access to abnormal content, which may include terrorist-generated sites, by analyzing the content of the information accessed by Web users.
SEARCH ENGINES
• Web mining improves the power of web search engines by classifying web documents and identifying relevant web pages.
• It is used in web search engines such as Google and Yahoo.
• Using data mining in a web search engine helps analyze content while delivering results that are relevant to users. As a result, digital marketers who focus on creating valuable content for users are likely to benefit from the impact of data mining on SEO.
ADVANTAGES OF WEB MINING
• The amount of information on the Web is huge and easily accessible.
• The coverage of Web information is very wide and diverse. One can find information about almost anything.
• Data of almost all types exist on the Web, e.g., structured tables, texts, multimedia data, etc.
• Much of the Web information is linked. There are hyperlinks among pages within a site, and across different sites.
CHALLENGES IN WEB MINING
• Much of the Web information is redundant. The same piece of information or its variants may appear in many pages.
• Much of the Web information is semi-structured due to the nested structure of HTML code.
• The Web is noisy. A Web page typically contains a mixture of many kinds of information, e.g., main contents, advertisements, navigation panels, copyright notices, etc.
• The Web is dynamic. Information on the Web changes constantly. Keeping up with the changes and monitoring them are important issues.
CHALLENGES IN WEB MINING
• URLs can be tracked to access the data.
• Since data can be updated at any time, it may not be trustworthy.
• There is a multiplicity of events and URLs.
• A large amount of data remains unused.
• Data may be inaccurate.
• Data may be incomplete or unavailable.
SPATIAL DATA MINING
WHAT IS SPATIAL DATA?
• Spatial data is any data with a direct or indirect reference
to a specific location or geographical area.
• Spatial data is often referred to as geospatial data or
geographic information.
INTRODUCTION TO SPATIAL DATA MINING
Spatial data mining is the process of discovering interesting, useful, non-trivial patterns from large spatial datasets.
E.g. determining hotspots and unusual locations.
Spatial data mining tasks are described on the following slides.
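One simple way to look for hotspots is to bin event locations into a grid and flag the dense cells. The sketch below is a deliberately basic density count, not a formal hotspot statistic. The event coordinates, cell size and threshold are all hypothetical.

```python
from collections import Counter


def grid_hotspots(points, cell_size, min_count):
    """Bin (x, y) event locations into square grid cells and return dense cells.

    Returns a dict mapping cell index (col, row) -> event count for every
    cell holding at least min_count events.
    """
    counts = Counter((int(x // cell_size), int(y // cell_size)) for x, y in points)
    return {cell: n for cell, n in counts.items() if n >= min_count}


# Hypothetical crime-event coordinates in kilometres.
events = [(0.2, 0.3), (0.4, 0.1), (0.7, 0.9), (0.5, 0.5), (3.2, 4.1), (7.8, 2.2)]
print(grid_hotspots(events, cell_size=1.0, min_count=3))  # {(0, 0): 4}
```

The result depends on the chosen cell size and threshold, so in practice both would be tuned to the data.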
SPATIAL DATA MINING TASKS
• Classification:
  • finds a set of rules that determine the class of a classified object according to its attributes.
  • e.g. "Classify remotely-sensed images based on spectrum and GIS data."
• Association rules:
  • find (spatially related) rules from the database. Association rules describe patterns that occur often in the database.
  • An association rule has the form A → B (s%, c%), where s is the support of the rule (the probability that A and B hold together across all possible cases) and c is the confidence (the conditional probability that B is true given A).
  • e.g. "Rain(x, pour) ⇒ landslide(x, happen), support is 76%, and confidence is 51%."
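Support and confidence can be computed directly from the definitions above. In this minimal sketch, each observation is a hypothetical set of conditions recorded at one location. Confidence is computed as support(A and B) divided by support(A), which is the conditional probability of B given A.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Conditional probability P(consequent | antecedent)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base


# Hypothetical observations: conditions recorded at each location.
observations = [
    {"heavy_rain", "landslide"},
    {"heavy_rain", "landslide"},
    {"heavy_rain"},
    {"landslide"},
    {"dry"},
]
s = support(observations, {"heavy_rain", "landslide"})
c = confidence(observations, {"heavy_rain"}, {"landslide"})
print(f"heavy_rain -> landslide (s={s:.0%}, c={c:.0%})")
```

Here 2 of the 5 observations contain both conditions (s = 40%), and 2 of the 3 observations with heavy rain also had a landslide (c ≈ 67%).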
SPATIAL DATA MINING TASKS
• Clustering:
  • groups the objects from a database into clusters such that objects in one cluster are similar and objects in different clusters are dissimilar.
  • e.g. we can find clusters of cities with similar levels of unemployment, or cluster pixels into similarity classes based on spectral characteristics.
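A common way to carry out the clustering task is k-means (Lloyd's algorithm). It is only one technique among many and is used here for brevity. The 2-D points are hypothetical: each could be, for example, a city's projected (x, y) location or a pair of attributes such as unemployment rate and income. The number of clusters k must be chosen in advance.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster 2-D points with Lloyd's k-means; return (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k distinct input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return centroids, labels


# Hypothetical points forming two well-separated groups.
points = [(1, 1), (1.5, 2), (2, 1.5), (8, 8), (8.5, 9), (9, 8.5)]
centroids, labels = kmeans(points, k=2)
print(sorted(centroids), labels)
```

k-means uses Euclidean distance and favours compact, roughly round clusters. Density-based methods are often preferred when spatial clusters have irregular shapes.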
• Trend detection:
  • finds trends in a database. A trend is a temporal pattern in some time-series data. A spatial trend is defined as a pattern of change of a non-spatial attribute in the neighborhood of a spatial object.
  • e.g. "Google Maps traffic detection"
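The definition of a spatial trend above can be approximated by regressing a non-spatial attribute on distance from a spatial object. This sketch uses an ordinary least-squares slope on hypothetical land-price observations around a city center. It is a simplifying assumption; real analyses would typically use dedicated spatial statistics.

```python
import math


def spatial_trend(center, observations):
    """Least-squares slope of a non-spatial attribute against distance from center.

    observations is a list of ((x, y), value) pairs. A negative slope means
    the attribute tends to decrease moving away from the center object.
    """
    dists = [math.dist(center, loc) for loc, _ in observations]
    values = [v for _, v in observations]
    n = len(observations)
    mean_d = sum(dists) / n
    mean_v = sum(values) / n
    cov = sum((d - mean_d) * (v - mean_v) for d, v in zip(dists, values))
    var = sum((d - mean_d) ** 2 for d in dists)
    if var == 0:
        raise ValueError("all observations are equidistant from the center")
    return cov / var


# Hypothetical land prices (per square metre) at increasing distance (km) from a city center.
city_center = (0.0, 0.0)
prices = [((1, 0), 950), ((0, 2), 900), ((3, 0), 850), ((0, 4), 800), ((5, 0), 750)]
print(spatial_trend(city_center, prices))  # -50.0
```

A slope of -50 here means that, in this toy data, the price falls by about 50 units per kilometre away from the center.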
SPATIAL DATA MINING TASKS
• Characteristic rules:
  • describe a common characteristic of one kind, or several kinds, of spatial entity: a kind of knowledge that summarizes similar features of objects in a target class.
  • e.g. "Characterize similar ground objects in a large set of remote sensing images."
• Discriminant rules:
  • describe differences between two parts of a database.
  • e.g. compare the land price at the urban boundary with the land price in the urban center.
SPATIAL DATABASE
• A spatial database is similar to a plain relational database, but in addition to storing data on qualitative and quantitative attributes, it stores data about physical location and feature geometry type.
• Every record in a spatial database is stored with numeric coordinates that represent where that record occurs on a map, and each feature is represented by exactly one of these three geometry types:
  • Point
  • Line
  • Polygon
• Stores a large amount of space-related data.
• Examples: maps, remote sensing, medical imaging, VLSI chip layout.
SPATIAL DATABASE
• Whether you want to calculate the distance between two places on a map or determine the area of a particular piece of land, you can use spatial database queries to make automated spatial calculations quickly and easily on entire sets of records at once.
• You can use spatial databases to perform almost all the same types of calculations on, and manipulations of, attribute data that you can in a plain relational database system.
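The two calculations mentioned above, the distance between two places and the area of a piece of land, can be sketched in plain Python. Spatial databases provide built-in functions for these. This sketch makes two assumptions: the Earth is a sphere with a mean radius of 6371 km for the haversine distance, and coordinates are planar and already projected (e.g. metres) for the shoelace area formula.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical model)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def polygon_area(vertices):
    """Planar area of a simple polygon via the shoelace formula.

    vertices are (x, y) pairs in projected units (e.g. metres), listed in order.
    """
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2


print(round(haversine_km(0, 0, 0, 1), 1))                  # 111.2 (one degree at the equator)
print(polygon_area([(0, 0), (40, 0), (40, 25), (0, 25)]))  # 1000.0 (a 40 m x 25 m plot)
```

For large areas or high-accuracy work, an ellipsoidal Earth model and a suitable map projection would be needed instead of these simplifications.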
SPATIAL CLASSIFICATION
• Analyze spatial objects to derive classification schemes, such as decision trees, in relation to certain spatial properties (district, highway, river).
• Classifying medium-size families according to income, region, and infant mortality rates.
• Mining data for volcanoes on Venus.
• Employ methods such as:
  • decision-tree classification, naïve Bayesian classifier with boosting, neural networks, etc.
SPATIAL TREND ANALYSIS
Function
• Detect changes and trends along a spatial dimension.
• Study the trend of non-spatial or spatial data changing with space.
Application examples
• Observe the trend of changes in the climate.
• Crime rate or unemployment rate change with regard to city geo-distribution.
• Traffic flows on highways and in cities.
APPLICATIONS OF SPATIAL DATA MINING
Domain: Spatial Data Mining Application
• Public Safety: discovery of hotspot patterns from crime event maps
• Epidemiology: detection of disease outbreaks
• Neuroscience: discovering patterns of human brain activity from neuroimages
• Climate Science: finding positive or negative correlations between temperatures of distant places
• Business: market allocation to maximize stores' profits
OTHER APPLICATIONS
• Spatial data mining is used in:
  • Space technology: ISRO GPS system
  • Security: the National Crime Records Bureau uses spatial data to track down criminals
  • GIS, geo-marketing, remote sensing, image database exploration, medical imaging, navigation
CHALLENGES IN SPATIAL DATA MINING
• Complexity of spatial data types and access methods.
• Large amounts of data require huge data storage facilities.
THANK YOU

More Related Content

What's hot

Data mining tasks
Data mining tasksData mining tasks
Data mining tasksKhwaja Aamer
 
Supervised learning and Unsupervised learning
Supervised learning and Unsupervised learning Supervised learning and Unsupervised learning
Supervised learning and Unsupervised learning Usama Fayyaz
 
Support Vector Machines ( SVM )
Support Vector Machines ( SVM ) Support Vector Machines ( SVM )
Support Vector Machines ( SVM ) Mohammad Junaid Khan
 
01 Data Mining: Concepts and Techniques, 2nd ed.
01 Data Mining: Concepts and Techniques, 2nd ed.01 Data Mining: Concepts and Techniques, 2nd ed.
01 Data Mining: Concepts and Techniques, 2nd ed.Institute of Technology Telkom
 
Introduction to Data Mining
Introduction to Data MiningIntroduction to Data Mining
Introduction to Data MiningDataminingTools Inc
 
web mining
web miningweb mining
web miningArpit Verma
 
Ppt on data science
Ppt on data science Ppt on data science
Ppt on data science Ansh Budania
 
Database Management System
Database Management SystemDatabase Management System
Database Management SystemNishant Munjal
 
Classification in data mining
Classification in data mining Classification in data mining
Classification in data mining Sulman Ahmed
 
Types of Machine Learning
Types of Machine LearningTypes of Machine Learning
Types of Machine LearningSamra Shahzadi
 
Data warehousing
Data warehousingData warehousing
Data warehousingShruti Dalela
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessingJason Rodrigues
 
Application of data mining
Application of data miningApplication of data mining
Application of data miningSHIVANI SONI
 
Association rule mining.pptx
Association rule mining.pptxAssociation rule mining.pptx
Association rule mining.pptxmaha797959
 
Data mining slides
Data mining slidesData mining slides
Data mining slidessmj
 

What's hot (20)

Data mining tasks
Data mining tasksData mining tasks
Data mining tasks
 
Supervised learning and Unsupervised learning
Supervised learning and Unsupervised learning Supervised learning and Unsupervised learning
Supervised learning and Unsupervised learning
 
Support Vector Machines ( SVM )
Support Vector Machines ( SVM ) Support Vector Machines ( SVM )
Support Vector Machines ( SVM )
 
01 Data Mining: Concepts and Techniques, 2nd ed.
01 Data Mining: Concepts and Techniques, 2nd ed.01 Data Mining: Concepts and Techniques, 2nd ed.
01 Data Mining: Concepts and Techniques, 2nd ed.
 
Introduction to Data Mining
Introduction to Data MiningIntroduction to Data Mining
Introduction to Data Mining
 
Problems, Problem spaces and Search
Problems, Problem spaces and SearchProblems, Problem spaces and Search
Problems, Problem spaces and Search
 
web mining
web miningweb mining
web mining
 
Kdd process
Kdd processKdd process
Kdd process
 
Ppt on data science
Ppt on data science Ppt on data science
Ppt on data science
 
Database Management System
Database Management SystemDatabase Management System
Database Management System
 
Classification in data mining
Classification in data mining Classification in data mining
Classification in data mining
 
Data mining
Data mining Data mining
Data mining
 
Types of Machine Learning
Types of Machine LearningTypes of Machine Learning
Types of Machine Learning
 
Big_data_ppt
Big_data_ppt Big_data_ppt
Big_data_ppt
 
Data warehousing
Data warehousingData warehousing
Data warehousing
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
Application of data mining
Application of data miningApplication of data mining
Application of data mining
 
Association rule mining.pptx
Association rule mining.pptxAssociation rule mining.pptx
Association rule mining.pptx
 
Data mining slides
Data mining slidesData mining slides
Data mining slides
 
OLAP
OLAPOLAP
OLAP
 

Similar to Introduction to Web Mining and Spatial Data Mining

The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)theijes
 
Business Intelligence: A Rapidly Growing Option through Web Mining
Business Intelligence: A Rapidly Growing Option through Web  MiningBusiness Intelligence: A Rapidly Growing Option through Web  Mining
Business Intelligence: A Rapidly Growing Option through Web MiningIOSR Journals
 
Web Mining
Web MiningWeb Mining
Web MiningShobha Rani
 
Web mining and social media mining
Web mining and social media miningWeb mining and social media mining
Web mining and social media miningRoxana Tadayon
 
WEB MINING.pptx
WEB MINING.pptxWEB MINING.pptx
WEB MINING.pptxHarshithRaj21
 
ANALYTICAL IMPLEMENTATION OF WEB STRUCTURE MINING USING DATA ANALYSIS IN ONLI...
ANALYTICAL IMPLEMENTATION OF WEB STRUCTURE MINING USING DATA ANALYSIS IN ONLI...ANALYTICAL IMPLEMENTATION OF WEB STRUCTURE MINING USING DATA ANALYSIS IN ONLI...
ANALYTICAL IMPLEMENTATION OF WEB STRUCTURE MINING USING DATA ANALYSIS IN ONLI...IAEME Publication
 
Literature Survey on Web Mining
Literature Survey on Web MiningLiterature Survey on Web Mining
Literature Survey on Web MiningIOSR Journals
 
Web content mining
Web content miningWeb content mining
Web content miningAkanksha Dombe
 
Odam an optimized distributed association rule mining algorithm (synopsis)
Odam an optimized distributed association rule mining algorithm (synopsis)Odam an optimized distributed association rule mining algorithm (synopsis)
Odam an optimized distributed association rule mining algorithm (synopsis)Mumbai Academisc
 
Web mining application &trends in data mining
Web mining application &trends in data miningWeb mining application &trends in data mining
Web mining application &trends in data miningPriyaKarnan3
 
Meet 1 - Introduction Data Mining - Dedi Darwis.pdf
Meet 1 - Introduction Data Mining - Dedi Darwis.pdfMeet 1 - Introduction Data Mining - Dedi Darwis.pdf
Meet 1 - Introduction Data Mining - Dedi Darwis.pdf09372002dedi
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
WEBMINING_SOWMYAJYOTHI.pdf
WEBMINING_SOWMYAJYOTHI.pdfWEBMINING_SOWMYAJYOTHI.pdf
WEBMINING_SOWMYAJYOTHI.pdfSowmyaJyothi3
 

Similar to Introduction to Web Mining and Spatial Data Mining (20)

Web mining
Web miningWeb mining
Web mining
 
Web mining
Web miningWeb mining
Web mining
 
Web mining
Web miningWeb mining
Web mining
 
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)
 
Minning www
Minning wwwMinning www
Minning www
 
Business Intelligence: A Rapidly Growing Option through Web Mining
Business Intelligence: A Rapidly Growing Option through Web  MiningBusiness Intelligence: A Rapidly Growing Option through Web  Mining
Business Intelligence: A Rapidly Growing Option through Web Mining
 
Web Mining
Web MiningWeb Mining
Web Mining
 
5463 26 web mining
5463 26 web mining5463 26 web mining
5463 26 web mining
 
Web mining and social media mining
Web mining and social media miningWeb mining and social media mining
Web mining and social media mining
 
Web
WebWeb
Web
 
WEB MINING.pptx
WEB MINING.pptxWEB MINING.pptx
WEB MINING.pptx
 
ANALYTICAL IMPLEMENTATION OF WEB STRUCTURE MINING USING DATA ANALYSIS IN ONLI...
ANALYTICAL IMPLEMENTATION OF WEB STRUCTURE MINING USING DATA ANALYSIS IN ONLI...ANALYTICAL IMPLEMENTATION OF WEB STRUCTURE MINING USING DATA ANALYSIS IN ONLI...
ANALYTICAL IMPLEMENTATION OF WEB STRUCTURE MINING USING DATA ANALYSIS IN ONLI...
 
Literature Survey on Web Mining
Literature Survey on Web MiningLiterature Survey on Web Mining
Literature Survey on Web Mining
 
Minning WWW
Minning WWWMinning WWW
Minning WWW
 
Web content mining
Web content miningWeb content mining
Web content mining
 
Odam an optimized distributed association rule mining algorithm (synopsis)
Odam an optimized distributed association rule mining algorithm (synopsis)Odam an optimized distributed association rule mining algorithm (synopsis)
Odam an optimized distributed association rule mining algorithm (synopsis)
 
Web mining application &trends in data mining
Web mining application &trends in data miningWeb mining application &trends in data mining
Web mining application &trends in data mining
 
Meet 1 - Introduction Data Mining - Dedi Darwis.pdf
Meet 1 - Introduction Data Mining - Dedi Darwis.pdfMeet 1 - Introduction Data Mining - Dedi Darwis.pdf
Meet 1 - Introduction Data Mining - Dedi Darwis.pdf
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
WEBMINING_SOWMYAJYOTHI.pdf
WEBMINING_SOWMYAJYOTHI.pdfWEBMINING_SOWMYAJYOTHI.pdf
WEBMINING_SOWMYAJYOTHI.pdf
 

Recently uploaded

main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidNikhilNagaraju
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfAsst.prof M.Gokilavani
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxk795866
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...srsj9000
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVRajaP95
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxwendy cai
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxJoĂŁo Esperancinha
 
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)dollysharma2066
 
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ
 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHC Sai Kiran
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...asadnawaz62
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024hassan khalil
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile servicerehmti665
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...VICTOR MAESTRE RAMIREZ
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxDeepakSakkari2
 

Recently uploaded (20)

Design and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdfDesign and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdf
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfid
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptx
 
young call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Serviceyoung call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Service
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptx
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
 
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...

Introduction to Web Mining and Spatial Data Mining

  • 1. GUJARAT TECHNOLOGICAL UNIVERSITY. Introduction to Web Mining and Spatial Data Mining. Active Learning Assignment of Data Warehousing and Mining (3161610). Prepared by Aarsh Dhokai and Dharmam Savani. Guided by Prof. Ravi Patel. A. D. Patel Institute of Technology.
  • 2. • What is data mining? • Data mining is the process of extracting and discovering patterns in large data sets using methods at the intersection of machine learning, statistics, and database systems. • What is web mining? • Web mining is the application of data mining techniques to automatically discover and extract information from Web documents and services. • The main purpose of web mining is to discover useful information from the World Wide Web and its usage patterns.
  • 3. DATA MINING VS. WEB MINING
    Definition. Data mining: the process that attempts to discover patterns and hidden knowledge in large data sets in any system. Web mining: the application of data mining techniques to automatically discover and extract information from web documents.
    Application. Data mining: very useful for finding patterns in large batches of data. Web mining: very useful for a particular website or e-service.
    Performed by. Data mining: data scientists and data engineers. Web mining: data scientists along with data analysts.
    Access. Data mining: accesses data privately. Web mining: accesses data publicly.
    Structure. Data mining: gets information from an explicit structure. Web mining: gets information from structured, unstructured, and semi-structured web pages.
    Problem type. Data mining: clustering, classification, regression, prediction, optimization, and control. Web mining: web content mining, web structure mining, and web usage mining.
    Tools. Data mining: tools such as machine learning algorithms. Web mining: specialized tools and techniques such as Scrapy, the PageRank algorithm, and Apache server logs.
    Skills. Data mining: data cleansing, machine learning algorithms, statistics, and probability. Web mining: application-level knowledge and data engineering, with mathematical modules such as statistics and probability.
  • 4. WHY WEB MINING? • Web mining is the application of data mining techniques to discover patterns, structures, and knowledge from the Web. • The World Wide Web is a fertile source for data mining. • The World Wide Web serves as a huge, widely distributed, global information center for news, advertisements, consumer information, financial management, education, government, and e-commerce.
  • 5. TYPES OF WEB MINING: web mining is divided into content mining, structure mining, and usage mining.
  • 6. WEB CONTENT MINING • Web content mining is the process of extracting useful information from the content of web documents. • Web content consists of several types of data: text, images, audio, video, or structured records such as lists and tables. • Web content mining has been studied extensively by researchers, search engines, and other web service companies. • Because web content mining can build links across multiple web pages for individuals, it has the potential to inappropriately disclose personal information.
  • 7. WEB CONTENT MINING: Web content mining is done to: • understand the content of web pages • provide scalable and informative keyword-based page indexing • perform entity/concept resolution • determine web page relevance and ranking • produce web page content summaries • find other valuable information related to web search and analysis.
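Keyword-based page indexing, mentioned above, is usually built on an inverted index that maps each term to the pages containing it. The sketch below is a minimal illustration under stated assumptions: the two pages are invented, tags are stripped with a crude regular expression rather than a real HTML parser, and ranking is plain summed term frequency rather than any particular search engine's scoring.

```python
import re
from collections import defaultdict

# Hypothetical sample pages (URL -> raw HTML); invented for illustration.
PAGES = {
    "page1.html": "<html><body><h1>Web mining</h1><p>Mining web content.</p></body></html>",
    "page2.html": "<html><body><p>Spatial data mining finds hotspots.</p></body></html>",
}

TAG_RE = re.compile(r"<[^>]+>")   # crude tag stripper; real pages need a proper HTML parser
WORD_RE = re.compile(r"[a-z]+")   # lower-case alphabetic tokens only


def extract_terms(html: str) -> list[str]:
    """Strip tags and return the page's lower-case word tokens."""
    return WORD_RE.findall(TAG_RE.sub(" ", html).lower())


def build_inverted_index(pages: dict[str, str]) -> dict[str, dict[str, int]]:
    """Map each term to {url: number of times the term occurs in that page}."""
    index = defaultdict(dict)
    for url, html in pages.items():
        for term in extract_terms(html):
            index[term][url] = index[term].get(url, 0) + 1
    return dict(index)


def search(index: dict[str, dict[str, int]], query: str) -> list[str]:
    """Return pages containing every query term, highest total term frequency first."""
    terms = WORD_RE.findall(query.lower())
    if not terms:
        return []
    postings = [index.get(term, {}) for term in terms]
    matches = set(postings[0]).intersection(*postings[1:])
    return sorted(matches, key=lambda url: (-sum(p[url] for p in postings), url))


index = build_inverted_index(PAGES)
print(search(index, "mining"))       # ['page1.html', 'page2.html']
print(search(index, "web mining"))   # ['page1.html']
```

Ties in frequency are broken by URL so the ranking is deterministic.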
  • 8. WEB STRUCTURE MINING • Web structure mining uses graph theory to analyze the node and connection structure of a website. • According to the type of web structural data, web structure mining can be divided into two kinds: • Extracting patterns from hyperlinks in the web: a hyperlink is a structural component that connects a web page to a different location. • Mining the document structure: analyzing the tree-like structure of pages to describe HTML or XML tag usage. • Web structure mining terminology: • Web graph: a directed graph representing the web. • Node: a web page in the graph. • Edge: a hyperlink. • In-degree: the number of links pointing to a particular node. • Out-degree: the number of links generated from a particular node.
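The terminology above maps directly onto a directed graph. As a minimal sketch, assuming a small invented site where each (source, target) pair is one hyperlink, in-degree and out-degree can be counted like this:

```python
from collections import Counter

# Hypothetical web graph: each (source, target) pair is one hyperlink (edge).
LINKS = [
    ("home", "about"),
    ("home", "products"),
    ("about", "home"),
    ("products", "home"),
    ("products", "about"),
]


def degrees(edges):
    """Return (in_degree, out_degree) counters for a directed web graph."""
    in_deg, out_deg = Counter(), Counter()
    for src, dst in edges:
        out_deg[src] += 1  # a link generated from src
        in_deg[dst] += 1   # a link pointing to dst
    return in_deg, out_deg


in_deg, out_deg = degrees(LINKS)
print(in_deg["home"], out_deg["home"])  # 2 2
```

Repeated links between the same pair of pages would each be counted; deduplicate the edge list first if that is not wanted.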
  • 9. WEB STRUCTURE MINING: Web structure mining is done to: • evaluate the quality or ranking of web pages • determine the authority of a page on a topic • decide which pages to crawl • find related pages • detect duplicated pages. Example: Google's PageRank algorithm.
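For ranking pages, a common textbook form of PageRank gives each page a base score (1 − d)/N and lets every page pass d times its own score, split evenly, to the pages it links to. The sketch below uses power iteration on an invented three-page graph; the damping factor 0.85 and the even spreading of rank from pages with no outgoing links are conventional assumptions, not details of Google's production system.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # dangling page (no out-links): spread its rank evenly over all pages
                for t in pages:
                    new[t] += damping * rank[p] / n
        rank = new
    return rank


# Hypothetical graph: A links to B and C, B links to C, C links back to A.
GRAPH = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(GRAPH)
print(sorted(ranks, key=ranks.get, reverse=True))  # ['C', 'A', 'B']
```

Page C ranks highest because it receives links from both other pages; the scores always sum to 1.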
  • 10. WEB USAGE MINING • Web usage mining is the process of extracting useful information from the server logs of users. • It uses three kinds of usage data: • Web server data: user logs collected by the web server, including IP address, page reference, and access time. • Application server data: the ability to track various kinds of business events and log them in application server logs. • Application level data: defining new kinds of events and logging them by generating histories of the events.
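Web server data is typically the starting point. As a minimal sketch, assuming the server writes Apache's Common Log Format (the log lines below are invented), each line can be parsed into IP address, access time, and page reference, and page hits can then be counted:

```python
import re
from collections import Counter
from datetime import datetime

# Common Log Format: host ident user [time] "method path protocol" status size
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)


def parse_line(line):
    """Return (ip, access_time, path, status), or None if the line does not match."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    when = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
    return m["ip"], when, m["path"], int(m["status"])


# Hypothetical log excerpt, including one malformed line that is skipped.
SAMPLE_LOG = [
    '10.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '10.0.0.2 - - [10/Oct/2023:13:56:01 +0000] "GET /products.html HTTP/1.1" 200 512',
    '10.0.0.1 - - [10/Oct/2023:13:57:12 +0000] "GET /products.html HTTP/1.1" 200 512',
    "malformed line",
]

records = [r for r in (parse_line(line) for line in SAMPLE_LOG) if r is not None]
page_hits = Counter(path for _, _, path, status in records if status == 200)
visitors = {ip for ip, _, _, _ in records}
print(page_hits.most_common(1))  # [('/products.html', 2)]
```

Grouping the parsed records by IP and time gap is a usual next step toward per-user sessions.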
  • 11. WEB USAGE MINING: Web usage mining is done to: • find patterns related to general or particular groups of users • understand users' search patterns, trends, and associations • predict what users are looking for on the Internet • help improve search efficiency and effectiveness • promote products or related information to different groups of users at the right time. Web search companies routinely conduct web usage mining to improve their quality of service.
  • 12. TOOLS FOR WEB MINING • Web usage mining: R, Oracle Data Mining, Tableau • Web content mining: Scrapy (Python) • Web structure mining: HITS algorithm, PageRank algorithm
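HITS, listed above as a web structure mining tool, gives every page two scores: an authority score (the sum of hub scores of pages linking to it) and a hub score (the sum of authority scores of pages it links to), normalized after each step. The following is a minimal iterative sketch on an invented link graph; the page names and the 50 iterations are assumptions for illustration.

```python
import math


def hits(links, iterations=50):
    """Iterative HITS: returns (hub, authority) scores for {page: [linked pages]}."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # authority(p) = sum of hub scores of pages linking to p
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for t in targets:
                auth[t] += hub[src]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        # hub(p) = sum of authority scores of pages p links to
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return hub, auth


# Hypothetical graph: two portal pages and a blog all point to "docs".
GRAPH = {"portal1": ["docs", "blog"], "portal2": ["docs"], "blog": ["docs"]}
hub, auth = hits(GRAPH)
print(max(auth, key=auth.get), max(hub, key=hub.get))  # docs portal1
```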
  • 13. APPLICATIONS OF WEB MINING
  • 14. IN BUSINESS: Web mining enables e-commerce sites to do personalized marketing, which eventually results in higher trade volumes. Companies can establish better customer relationships by understanding customer needs better and reacting to them faster. Companies can find, attract, and retain customers, and they can save on production costs by using the insight they acquire into customer requirements.
  • 15. SECURITY AND CRIME INVESTIGATION • Government agencies are using this technology to classify threats and fight against terrorism. The predictive capability of mining applications can benefit society by identifying criminal activities. • Terrorist groups use the Web as their infrastructure for various purposes. • Web usage mining aims to track down online access to abnormal content, which may include terrorist-generated sites, by analyzing the content of information accessed by Web users.
  • 16. SEARCH ENGINES • Web mining improves web search engines by classifying web documents and identifying relevant web pages. • It is used for web searching, e.g., Google, Yahoo, etc. • Using data mining in a web search engine helps analyze content while delivering results that are relevant to users. As a result, digital marketers who focus on creating valuable content for users are likely to benefit from the impact of data mining on SEO.
  • 17. ADVANTAGES OF WEB MINING • The amount of information on the Web is huge and easily accessible. • The coverage of Web information is very wide and diverse; one can find information about almost anything. • Data of almost all types exist on the Web, e.g., structured tables, texts, multimedia data, etc. • Much of the Web information is linked: there are hyperlinks among pages within a site and across different sites.
  • 18. CHALLENGES IN WEB MINING • Much of the Web information is redundant: the same piece of information or its variants may appear in many pages. • Much of the Web information is semi-structured due to the nested structure of HTML code. • The Web is noisy: a Web page typically contains a mixture of many kinds of information, e.g., main content, advertisements, navigation panels, copyright notices, etc. • The Web is dynamic: information on the Web changes constantly, so keeping up with and monitoring the changes are important issues.
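Redundant pages (the same information or its variants on many pages) are often detected by comparing sets of word shingles with Jaccard similarity. This is a minimal sketch of that general technique on invented page texts; the shingle length k = 3 is an assumption, and a practical system would also need a similarity threshold and a scalable method such as MinHash.

```python
import re


def shingles(text, k=3):
    """Set of k-word shingles (contiguous word sequences) from a page's text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two shingle sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical page texts: b is a near-copy of a, c is unrelated.
page_a = "Web mining discovers useful information from the World Wide Web"
page_b = "Web mining discovers useful information from the World Wide Web today"
page_c = "Spatial data mining finds hotspots in crime maps"

print(round(jaccard(shingles(page_a), shingles(page_b)), 3))  # 0.889
print(jaccard(shingles(page_a), shingles(page_c)))            # 0.0
```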
  • 19. CHALLENGES IN WEB MINING • URLs can be tracked to access the data. • Because data can be updated at any time, it is not always trustworthy. • Multiplicity of events and URLs. • A large amount of data remains unused. • Data may be inaccurate. • Data may be incomplete or unavailable.
  • 20. SPATIAL DATA MINING
  • 21. WHAT IS SPATIAL DATA? • Spatial data is any data with a direct or indirect reference to a specific location or geographical area. • Spatial data is often referred to as geospatial data or geographic information.
  • 22. INTRODUCTION TO SPATIAL DATA MINING: Spatial data mining is the process of discovering interesting, useful, non-trivial patterns from large spatial datasets, e.g., determining hotspots and unusual locations. The spatial data mining tasks are described in the following slides.
  • 23. SPATIAL DATA MINING TASKS • Classification: • finds a set of rules that determine the class of the classified object according to its attributes, • e.g., "Classify remotely sensed images based on spectrum and GIS data." • Association rules: • find (spatially related) rules in the database. Association rules describe patterns that occur often in the database. • An association rule has the form A → B (s%, c%), where s is the support of the rule (the probability that A and B hold together among all possible cases) and c is the confidence (the conditional probability that B is true given that A is true). • E.g., "Rain(x, pour) ⇒ landslide(x, happen), support is 76%, and confidence is 51%."
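The support and confidence definitions above can be computed directly by counting. This sketch assumes each observation is a set of predicates recorded at one location; the predicate names and the four observations are invented for illustration.

```python
def support_confidence(cases, antecedent, consequent):
    """Support P(A and B) and confidence P(B | A) of the rule A -> B, as fractions."""
    a, b = set(antecedent), set(consequent)
    n = len(cases)
    count_a = sum(1 for case in cases if a <= case)         # cases where A holds
    count_ab = sum(1 for case in cases if (a | b) <= case)  # cases where A and B both hold
    support = count_ab / n if n else 0.0
    confidence = count_ab / count_a if count_a else 0.0
    return support, confidence


# Hypothetical observations: one set of predicates per location.
OBSERVATIONS = [
    {"rain_pour", "landslide", "steep_slope"},
    {"rain_pour", "landslide"},
    {"rain_pour"},
    {"steep_slope"},
]
s, c = support_confidence(OBSERVATIONS, {"rain_pour"}, {"landslide"})
print(f"support={s:.0%}, confidence={c:.0%}")  # support=50%, confidence=67%
```

With these definitions, confidence can never be smaller than support: both divide the same joint count, and the confidence denominator (the number of cases where A holds) is at most the total number of cases.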
  • 24. SPATIAL DATA MINING TASKS • Clustering: • groups the objects in a database into clusters so that objects in one cluster are similar and objects in different clusters are dissimilar, • e.g., we can find clusters of cities with similar levels of unemployment, or we can cluster pixels into similarity classes based on spectral characteristics. • Trend detection: • finds trends in the database. A trend is a temporal pattern in some time-series data. A spatial trend is defined as a pattern of change of a non-spatial attribute in the neighborhood of a spatial object, • e.g., "Google Maps traffic detection."
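Clustering can be illustrated with plain k-means, one common clustering method (the slides do not name a specific algorithm). This minimal sketch groups invented 2-D points by location only, assumes planar (projected) coordinates so that Euclidean distance is meaningful, and uses the first k points as initial centroids for reproducibility.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def kmeans(points, k, iterations=20):
    """Plain k-means on 2-D (x, y) points; initial centroids are the first k points."""
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # assignment step: each point joins its nearest centroid
        labels = [nearest(p, centroids) for p in points]
        # update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return labels, centroids


# Hypothetical projected coordinates forming two obvious groups.
POINTS = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (0.9, 1.1), (8.1, 7.9), (7.8, 8.2)]
labels, centroids = kmeans(POINTS, k=2)
print(labels)  # [0, 1, 0, 0, 1, 1]
```

Clustering cities by unemployment level, as in the slide's example, would use the attribute values (optionally together with location) as the feature vectors instead.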
  • 25. SPATIAL DATA MINING TASKS • Characteristic rules: • describe the common characteristics of one kind of spatial entity or of several kinds of spatial entities; a kind of tested knowledge for summarizing similar features of objects in a target class, • e.g., "Characterize similar ground objects in a large set of remote sensing images." • Discriminant rules: • describe differences between two parts of a database, • e.g., compare land prices at the urban boundary with land prices in the urban center.
  • 26. SPATIAL DATABASE • A spatial database is similar to a plain relational database, but in addition to storing data on qualitative and quantitative attributes, it stores data about physical location and feature geometry type. • Every record in a spatial database is stored with numeric coordinates that represent where that record occurs on a map, and each feature is represented by exactly one of three geometry types: point, line, or polygon. • It stores a large amount of space-related data, e.g., maps, remote sensing, medical imaging, and VLSI chip layout.
  • 27. SPATIAL DATABASE • Whether you want to calculate the distance between two places on a map or determine the area of a particular piece of land, you can use spatial database querying to quickly and easily make automated spatial calculations on entire sets of records at one time. • You can use spatial databases to perform almost all the same types of calculations on, and manipulations of, attribute data that you can in a plain relational database system.
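The two calculations named above, distance between places and area of a piece of land, can be sketched outside a database as follows. This assumes a spherical Earth (mean radius 6371 km) for the haversine great-circle distance, and planar projected coordinates for the shoelace area formula; a spatial database would normally provide built-in functions for both.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical-Earth assumption


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def polygon_area(vertices):
    """Planar area of a simple polygon via the shoelace formula (projected x, y units)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


print(round(haversine_km(0, 0, 1, 0), 1))                # 111.2 (one degree of latitude)
print(polygon_area([(0, 0), (4, 0), (4, 3), (0, 3)]))  # 12.0
```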
  • 28. SPATIAL CLASSIFICATION • Analyze spatial objects to derive classification schemes, such as decision trees, in relevance to certain spatial properties (district, highway, river). • Examples: classifying medium-size families according to income, region, and infant mortality rates; mining data for volcanoes on Venus. • Methods employed include decision-tree classification, naïve Bayesian classifiers with boosting, neural networks, etc.
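As a minimal illustration of one listed method, the sketch below is a plain categorical naïve Bayesian classifier with add-one (Laplace) smoothing; boosting is not included. The spatial attributes (district, whether the site is near a river) and the flood-risk labels are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Collect class counts and per-attribute value counts for categorical naive Bayes."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> Counter of values
    seen_values = defaultdict(set)       # attribute index -> distinct values seen in training
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            seen_values[i].add(value)
    return class_counts, value_counts, seen_values


def predict(model, row):
    """Return the class with the highest log posterior, using add-one smoothing."""
    class_counts, value_counts, seen_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)
        for i, value in enumerate(row):
            score += math.log((value_counts[(i, label)][value] + 1) / (n + len(seen_values[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data: (district, near_river) -> flood risk.
ROWS = [("north", "yes"), ("north", "yes"), ("south", "no"),
        ("south", "no"), ("north", "no"), ("south", "yes")]
LABELS = ["high", "high", "low", "low", "low", "high"]
model = train_naive_bayes(ROWS, LABELS)
print(predict(model, ("north", "yes")), predict(model, ("south", "no")))  # high low
```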
  • 29. SPATIAL TREND ANALYSIS • Function: detect changes and trends along a spatial dimension; study the trend of non-spatial or spatial data changing with space. • Application examples: observing the trend of changes in the climate; how crime or unemployment rates change with regard to a city's geographic distribution; traffic flows on highways and in cities.
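One simple way to quantify a spatial trend (a non-spatial attribute changing with space) is to fit a least-squares line of the attribute against distance from a reference location. This sketch assumes the distances have already been computed and uses invented unemployment rates; a negative slope means the rate falls with distance from the centre.

```python
def linear_trend(distances, values):
    """Least-squares slope and intercept of values against distance from a centre."""
    n = len(distances)
    mean_d = sum(distances) / n
    mean_v = sum(values) / n
    cov = sum((d - mean_d) * (v - mean_v) for d, v in zip(distances, values))
    var = sum((d - mean_d) ** 2 for d in distances)
    slope = cov / var
    return slope, mean_v - slope * mean_d


# Hypothetical data: distance from city centre (km) vs unemployment rate (%).
DISTANCES_KM = [0, 2, 4, 6, 8]
RATES = [10, 9, 8, 7, 6]
slope, intercept = linear_trend(DISTANCES_KM, RATES)
print(slope, intercept)  # -0.5 10.0
```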
  • 30. APPLICATIONS OF SPATIAL DATA MINING (domain: application)
    Public safety: discovery of hotspot patterns from crime event maps.
    Epidemiology: detection of disease outbreaks.
    Neuroscience: discovering patterns of human brain activity from neuroimages.
    Climate science: finding positive or negative correlations between temperatures of distant places.
    Business: market allocation to maximize stores' profits.
  • 31. OTHER APPLICATIONS • Spatial data mining is used in: • Space technology: ISRO GPS system • Security: the National Crime Records Bureau uses spatial data to track down criminals • GIS, geo-marketing, remote sensing, image database exploration, medical imaging, and navigation
  • 32. CHALLENGES IN SPATIAL DATA MINING • Complexity of spatial data types and access methods • Large amounts of data, which require huge data storage facilities
  • 33. THANK YOU