WEB MINING
Submitted by:
Dheeraj Kashnyal
dheerajkashnyal55@gmail.com
ETL Design & Report Specifications
Introduction
• Web mining is the use of techniques to automatically discover and extract
information from Web documents and services.
• Various kinds of information extracted via Web Mining:
• Web activity, from server logs and Web browser activity tracking.
• Web graph, from links between pages, people and other data.
• Web content, from the data found on Web pages and inside documents.
• The project is based on extracting values from
web pages and other documents found on the
web.
• This presentation covers the ETL design and
Report Specification portion.
Web Mining
Challenges
• The Web is noisy. A Web page typically contains a mixture of many kinds of
information, e.g., main contents, advertisements, navigation panels,
copyright notices, etc.
• The Web is dynamic. Information on the Web changes constantly. Keeping
up with these changes and monitoring them are important issues.
• Much of the Web information is redundant. The same piece of information
or its variants may appear in many pages.
• Information/data of almost all types exist on the Web, e.g., structured tables,
texts, multimedia data, etc.
• Much of the Web information is semi-structured due to the nested structure
of HTML code.
Data Flow of the System
Developing Data Model
• WM_FACT: Datekey, TODkey, Visitorkey, Referrerkey, Statuskey, Objectkey, Browserkey, OSKey, Timestamp of Request, GMT_Diff, TimeViewed, BytesTransferred
• DATE_DIM_TB: Datekey, Date, DayOfWeek, DayOfWeekNumber, WeekNumber, Week, MonthDay, MonthNumber, Month, Quarter, Year
• BROWSER_DIM_TB: Browser key, Browser Type, Browser Name
• OS_DIM_TB: OS_key, OS Name, OS Type
• STATUS_DIM_TB: Status_key, Status Code, StatusDescription, StatusType
• REFERRER_DIM_TB: Referrer_key, ReferringURL, ReferringSite, Keyword
• OBJECT_DIM_TB: Object_key, URL, FileName, FileType, ObjectType, Object_size, Content Page, PageName, PageType
• VISITOR_DIM_TB: Visitor_key, VisitorFlag, IPAddress, DomainName, CountryCode, Country, User_Name
• TOD_DIM_TB: TODkey, TOD Lower, TOD Higher, Period of Day
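The data model above is a star schema: one fact table keyed to eight dimension tables. A minimal sketch of how it could be created is shown here. Table and column names come from the slide, with spaces replaced by underscores; the use of SQLite, the untyped attribute columns and the helper name `create_schema` are assumptions made for illustration only.

```python
import sqlite3

# Dimension tables: the first column of each list is treated as the surrogate key.
DIMENSIONS = {
    "DATE_DIM_TB": ["Datekey", "Date", "DayOfWeek", "DayOfWeekNumber", "WeekNumber",
                    "Week", "MonthDay", "MonthNumber", "Month", "Quarter", "Year"],
    "TOD_DIM_TB": ["TODkey", "TOD_Lower", "TOD_Higher", "Period_of_Day"],
    "VISITOR_DIM_TB": ["Visitor_key", "VisitorFlag", "IPAddress", "DomainName",
                       "CountryCode", "Country", "User_Name"],
    "REFERRER_DIM_TB": ["Referrer_key", "ReferringURL", "ReferringSite", "Keyword"],
    "STATUS_DIM_TB": ["Status_key", "Status_Code", "StatusDescription", "StatusType"],
    "OBJECT_DIM_TB": ["Object_key", "URL", "FileName", "FileType", "ObjectType",
                      "Object_size", "Content_Page", "PageName", "PageType"],
    "BROWSER_DIM_TB": ["Browser_key", "Browser_Type", "Browser_Name"],
    "OS_DIM_TB": ["OS_key", "OS_Name", "OS_Type"],
}

# Fact table: each foreign key column and the dimension it points to.
FACT_KEYS = {
    "Datekey": "DATE_DIM_TB", "TODkey": "TOD_DIM_TB",
    "Visitorkey": "VISITOR_DIM_TB", "Referrerkey": "REFERRER_DIM_TB",
    "Statuskey": "STATUS_DIM_TB", "Objectkey": "OBJECT_DIM_TB",
    "Browserkey": "BROWSER_DIM_TB", "OSKey": "OS_DIM_TB",
}
FACT_OTHER = ["Timestamp_of_Request", "GMT_Diff", "TimeViewed", "BytesTransferred"]


def create_schema(conn):
    """Create the eight dimension tables and the WM_FACT table."""
    for table, (key, *attrs) in DIMENSIONS.items():
        cols = [f'"{key}" INTEGER PRIMARY KEY'] + [f'"{a}"' for a in attrs]
        conn.execute(f'CREATE TABLE "{table}" ({", ".join(cols)})')
    fact_cols = [f'"{k}" INTEGER REFERENCES "{t}"' for k, t in FACT_KEYS.items()]
    fact_cols += [f'"{c}"' for c in FACT_OTHER]
    conn.execute(f'CREATE TABLE "WM_FACT" ({", ".join(fact_cols)})')


conn = sqlite3.connect(":memory:")
create_schema(conn)
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)
```

Driving the DDL from a dictionary keeps the sketch short; a production schema would normally declare explicit column types and constraints per column.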
ETL DESIGN
• Given Source to Target (Dimension) Data Mapping
• Given Data Sources are files
• Mapping of Dimension Table
• DIMENSION TABLES
• Date Dimension
• TOD Dimension
• Visitor Dimension
• Object Dimension
• Referrer Dimension
• Status Dimension
• Browser Dimension
• Operating System Dimension
• FACT TABLE
• Click Stream Fact
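The source-to-dimension mapping above could be sketched as follows. The slides only say that the data sources are files; this example assumes they are web server access logs in the Combined Log Format, which is an assumption and not something the deck confirms. The function name `parse_line` and the output field names are hypothetical. Parsing the month abbreviation with `%b` also assumes an English/C locale.

```python
import re
from datetime import datetime

# Assumed input: one request per line in Combined Log Format.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Split one log line into the attributes that feed each dimension."""
    m = LOG_RE.match(line)
    if m is None:
        return None  # malformed line: left for the caller to reject or log
    ts = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
    return {
        "date": ts.date().isoformat(),          # Date dimension
        "time_of_day": ts.time().isoformat(),   # TOD dimension
        "gmt_diff": m["ts"].split()[1],         # GMT_Diff on the fact row
        "ip": m["ip"],                          # Visitor dimension
        "user": None if m["user"] == "-" else m["user"],
        "url": m["url"],                        # Object dimension
        "status": int(m["status"]),             # Status dimension
        "bytes": 0 if m["bytes"] == "-" else int(m["bytes"]),  # BytesTransferred
        "referrer": None if m["referrer"] in ("", "-") else m["referrer"],
        "user_agent": m["agent"],               # Browser and OS dimensions
    }


sample = ('192.0.2.10 - - [10/Oct/2023:13:55:36 +0530] "GET /index.html HTTP/1.1" '
          '200 2326 "http://example.com/start" "Mozilla/5.0 (X11; Linux x86_64)"')
row = parse_line(sample)
print(row)
```

Each field would then be looked up in (or inserted into) its dimension table to obtain the surrogate key stored on the fact row.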
Cont.
• FACT TABLE
• Click Stream Fact
• The time in seconds for which the visitor viewed a page on a
particular date and time is stored in the click stream fact as a
measure.
• The bytes transferred from the web server to the user's machine
are stored as a measure.
• The referrer key points to the referrer dimension, which
provides information about the referrer of the page.
• The remaining keys are foreign keys to their respective dimensions.
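The slides do not say how the time-viewed measure is derived. One common approximation, assumed here, is the gap between a request and the same visitor's next request; the last request of each visitor has no successor, so its value is left as `None`. The function name `add_time_viewed` and the record layout are hypothetical.

```python
from collections import defaultdict


def add_time_viewed(hits):
    """Attach a 'time_viewed' value (seconds) to each hit.

    hits: list of dicts with 'visitor', 'ts' (request time in seconds)
    and 'bytes'. Returns new dicts grouped by visitor in time order.
    """
    by_visitor = defaultdict(list)
    for hit in hits:
        by_visitor[hit["visitor"]].append(hit)

    out = []
    for visitor_hits in by_visitor.values():
        visitor_hits.sort(key=lambda h: h["ts"])
        # Pair each hit with the next one from the same visitor.
        for cur, nxt in zip(visitor_hits, visitor_hits[1:] + [None]):
            viewed = None if nxt is None else nxt["ts"] - cur["ts"]
            out.append({**cur, "time_viewed": viewed})
    return out


hits = [
    {"visitor": "A", "ts": 0, "bytes": 500},
    {"visitor": "B", "ts": 10, "bytes": 200},
    {"visitor": "A", "ts": 30, "bytes": 1200},
    {"visitor": "A", "ts": 90, "bytes": 300},
]
facts = add_time_viewed(hits)
print([(f["visitor"], f["time_viewed"]) for f in facts])
```

A real pipeline would probably also cap long gaps with a session timeout so that a visitor who returns hours later does not inflate the measure.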
Report Specifications
• Statistics of Visits
• The measures reported are:
• No. of visitors during the day
• No. of content pages accessed by all visitors
• No. of objects accessed by all visitors
• Total size of the data that is being delivered
• Most popular (most accessed) pages
• The report should show the following measures
• No. of visitors during the day accessing this content page
• No. of hits during the day for this content page
• The report should show only the Top 5 pages accessed
based on the No. of hits
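The most popular pages report can be sketched as a single query over the fact table and the object dimension. The example below uses a cut-down version of the tables with invented sample data. It assumes that `Content_Page` is a 1/0 flag marking content pages and that `Datekey` is an integer such as 20240101; neither detail is stated in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE OBJECT_DIM_TB (Object_key INTEGER PRIMARY KEY, PageName TEXT,
                            Content_Page INTEGER);
CREATE TABLE WM_FACT (Datekey INTEGER, Visitorkey INTEGER, Objectkey INTEGER);
""")

# Six content pages plus one non-content object (an image).
pages = [(i, f"p{i}", 1) for i in range(1, 7)] + [(7, "logo.png", 0)]
conn.executemany("INSERT INTO OBJECT_DIM_TB VALUES (?, ?, ?)", pages)

# Page i receives i hits, alternating between two visitors; the image gets 10.
hits = [(20240101, j % 2, i) for i in range(1, 7) for j in range(i)]
hits += [(20240101, 0, 7)] * 10
conn.executemany("INSERT INTO WM_FACT VALUES (?, ?, ?)", hits)

TOP_PAGES = """
SELECT o.PageName,
       COUNT(DISTINCT f.Visitorkey) AS visitors,
       COUNT(*)                     AS hits
FROM WM_FACT f
JOIN OBJECT_DIM_TB o ON o.Object_key = f.Objectkey
WHERE f.Datekey = ? AND o.Content_Page = 1
GROUP BY o.Object_key, o.PageName
ORDER BY hits DESC, o.PageName
LIMIT 5
"""
top5 = conn.execute(TOP_PAGES, (20240101,)).fetchall()
for row in top5:
    print(row)
```

The image is excluded by the content-page filter even though it has the most hits, and the secondary sort on `PageName` makes the order deterministic when two pages tie.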
Report Specifications
• Least popular (least accessed) pages
• The report should show the following measures
• No. of visitors during the day accessing this content page
• No. of hits during the day for this content page
• The report should show only the 5 pages with the lowest
no. of hits
• Location of the visitors
• The report should show the Date of the visit
• The report should list the location of the Visitor
• Country Code
• Country Name
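For the least popular pages report, one design question is whether pages with no hits at all on the day should appear. The sketch below assumes they should, which the slides do not specify, and therefore starts from the object dimension and uses a LEFT JOIN to the fact table. The cut-down tables, sample data, 1/0 `Content_Page` flag and integer `Datekey` are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE OBJECT_DIM_TB (Object_key INTEGER PRIMARY KEY, PageName TEXT,
                            Content_Page INTEGER);
CREATE TABLE WM_FACT (Datekey INTEGER, Visitorkey INTEGER, Objectkey INTEGER);
""")
conn.executemany("INSERT INTO OBJECT_DIM_TB VALUES (?, ?, 1)",
                 [(i, f"p{i}") for i in range(1, 7)])

# Page i receives i - 1 hits, so p1 is never requested on this day.
hits = [(20240101, j % 2, i) for i in range(1, 7) for j in range(i - 1)]
conn.executemany("INSERT INTO WM_FACT VALUES (?, ?, ?)", hits)

LEAST_POPULAR = """
SELECT o.PageName,
       COUNT(DISTINCT f.Visitorkey) AS visitors,
       COUNT(f.Objectkey)           AS hits
FROM OBJECT_DIM_TB o
LEFT JOIN WM_FACT f
       ON f.Objectkey = o.Object_key AND f.Datekey = ?
WHERE o.Content_Page = 1
GROUP BY o.Object_key, o.PageName
ORDER BY hits ASC, o.PageName
LIMIT 5
"""
bottom5 = conn.execute(LEAST_POPULAR, (20240101,)).fetchall()
for row in bottom5:
    print(row)
```

The date filter sits in the join condition rather than the WHERE clause; moving it to WHERE would discard the unmatched rows and silently drop the zero-hit pages.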
• Most frequent visitors
• The report should list the Visitor’s details
• IP Address
• Domain name of the visitor’s IP Address
• The report should show the following measures
• No. of hits by this visitor during the date range
• Total size of content delivered
• The report should show only the Top 5 visitors
based on the No. of hits.
• Top Referrers and Keywords
• The report should show the Referrer Domain and the
keyword used
• The report should show the no. of hits during the date range
and the time period
Report Specifications
• Most used Browsers and Operating systems
• The report should retrieve data for the date range between
the From and To dates.
• The report should show the no. of hits during the date range
and the time period.
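A minimal sketch of the browser and operating system report, filtered by a date range and a time period, is given below. It assumes that `Datekey` is an integer in yyyymmdd form so that BETWEEN gives the right range, and that the time period is matched against `Period_of_Day` in the TOD dimension. The cut-down tables and sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE BROWSER_DIM_TB (Browser_key INTEGER PRIMARY KEY, Browser_Name TEXT);
CREATE TABLE OS_DIM_TB (OS_key INTEGER PRIMARY KEY, OS_Name TEXT);
CREATE TABLE TOD_DIM_TB (TODkey INTEGER PRIMARY KEY, Period_of_Day TEXT);
CREATE TABLE WM_FACT (Datekey INTEGER, TODkey INTEGER,
                      Browserkey INTEGER, OSKey INTEGER);
INSERT INTO BROWSER_DIM_TB VALUES (1, 'Firefox'), (2, 'Chrome');
INSERT INTO OS_DIM_TB VALUES (1, 'Linux'), (2, 'Windows');
INSERT INTO TOD_DIM_TB VALUES (1, 'Morning'), (2, 'Evening');
""")
conn.executemany("INSERT INTO WM_FACT VALUES (?, ?, ?, ?)", [
    (20240101, 1, 1, 1),  # Firefox on Linux, morning
    (20240102, 1, 2, 2),  # Chrome on Windows, morning
    (20240102, 1, 2, 2),  # Chrome on Windows, morning
    (20240103, 2, 2, 2),  # evening: outside the requested time period
    (20240110, 1, 1, 1),  # outside the requested date range
])

BROWSER_OS_HITS = """
SELECT b.Browser_Name, s.OS_Name, COUNT(*) AS hits
FROM WM_FACT f
JOIN BROWSER_DIM_TB b ON b.Browser_key = f.Browserkey
JOIN OS_DIM_TB s      ON s.OS_key = f.OSKey
JOIN TOD_DIM_TB t     ON t.TODkey = f.TODkey
WHERE f.Datekey BETWEEN ? AND ? AND t.Period_of_Day = ?
GROUP BY b.Browser_Name, s.OS_Name
ORDER BY hits DESC, b.Browser_Name, s.OS_Name
"""
report = conn.execute(BROWSER_OS_HITS, (20240101, 20240105, "Morning")).fetchall()
for row in report:
    print(row)
```

The From date, To date and time period are passed as query parameters, so the same statement serves any range the report user selects.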