PROJECT 1: Analyzing clickstream data
Clickstream analysis (sometimes called clickstream analytics) is the process of collecting, analyzing, and reporting
aggregate data about which pages of a website visitors view, and in what order. That path is the result of the succession
of mouse clicks each visitor makes (that is, the clickstream).
Download Link
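The idea of a clickstream can be sketched in a few lines of Python. This is only an illustration, not the Hive script used in the project: the event tuples, the field order (visitor ID, timestamp, page) and the function name build_clickstreams are all assumptions made for the example.

```python
from collections import defaultdict


def build_clickstreams(events):
    """Group (visitor_id, timestamp, page) events into ordered page paths per visitor."""
    streams = defaultdict(list)
    # Sort by visitor first, then by time, so each visitor's pages come out in click order.
    for visitor_id, _timestamp, page in sorted(events, key=lambda e: (e[0], e[1])):
        streams[visitor_id].append(page)
    return dict(streams)


# Hypothetical sample events (not taken from the project's data files).
events = [
    ("v1", 3, "/checkout"),
    ("v2", 1, "/home"),
    ("v1", 1, "/home"),
    ("v1", 2, "/products"),
]

clickstreams = build_clickstreams(events)
print(clickstreams)
```

Each visitor ends up with the list of pages in the order they were clicked, which is the raw material for the aggregate reports described above.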
1. Loading the data files into HDFS
2. Starting the new Beeline shell (hive-server 2)
3. Creating new database – alabs_db
4. Creating and loading HIVE table – users
5. All 3 HIVE base tables – omniturelogs, products and users created
6. Content of HIVE script – webanalytics.sql
7. Creating the omniture and webanalytics tables using webanalytics.sql
8. Creating omniture2 view
PROJECT 2: Sentiment Analysis/Opinion Mining
Sentiment analysis (also known as opinion mining) refers to the use of natural language processing, text analysis and
computational linguistics to identify and extract subjective information in source materials. Sentiment analysis is widely
applied to reviews and social media for a variety of applications, ranging from marketing to customer service.
Data Download Link
Tableau Link
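One simple way to extract sentiment from text is a dictionary lookup: count positive and negative words and compare. The sketch below assumes that approach purely for illustration; the content of tweets.sql is not reproduced in this document, and the word list POLARITY and the function tweet_sentiment are hypothetical.

```python
import re

# Hypothetical polarity dictionary; a real analysis would use a much larger lexicon.
POLARITY = {"good": 1, "great": 1, "love": 1, "bad": -1, "awful": -1, "hate": -1}


def tweet_sentiment(text):
    """Label a tweet positive, negative or neutral by summing word polarities."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(POLARITY.get(word, 0) for word in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for tweet in ["I love this great phone", "Awful service, I hate it", "The phone arrived today"]:
    print(tweet, "->", tweet_sentiment(tweet))
```

A word-count approach like this ignores negation and sarcasm, so it is best read as a baseline rather than a full natural language processing pipeline.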
1. Loading the data files into HDFS
2. Content of twitter_conf.conf file
3. Executing the TwitterAgent Flume agent using the twitter_conf.conf file
4. Twitter data moved to HDFS
5. Content of tweets.sql file
6. Executing tweets.sql to create tables and views for analysis
7. Tables and views for analysis are created
Sample output columns: tweet ID, sentiment
PROJECT 3: Lending Club Loan Analysis
Lending Club is a US peer-to-peer lending company. Lending Club operates an online lending platform that enables
borrowers to obtain a loan, and investors to purchase notes backed by payments made on loans. Lending Club is the
world's largest peer-to-peer lending platform.
Data Download Link
Tableau Link
1. Loading the data files into HDFS
2. Content of loan_analysis.sql file
3. Tables and view created using loan_analysis.sql
PROJECT 4: HVAC Temperature Analysis
HVAC (Heating, Ventilation and Air Conditioning) equipment needs a control system to regulate its operation.
Usually a sensing device compares the actual state (e.g. temperature) with a target state, and the control system
then decides what action has to be taken.
Data Download Link
Tableau Link
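The comparison of actual state against target state can be sketched as follows. The labels HOT, COLD and NORMAL, the 5-degree tolerance and the function name classify_reading are assumptions for this example; the rule actually used in sensor_analysis.sql is not shown in this document.

```python
def classify_reading(actual_temp, target_temp, tolerance=5):
    """Compare a sensor reading with its target and label the deviation.

    The tolerance of 5 degrees is an assumed threshold, not a value from the project.
    """
    diff = actual_temp - target_temp
    if diff > tolerance:
        return "HOT"
    if diff < -tolerance:
        return "COLD"
    return "NORMAL"


# Hypothetical readings as (actual, target) pairs.
for actual, target in [(78, 70), (62, 70), (72, 70)]:
    print(actual, target, classify_reading(actual, target))
```

A control system would then map each label to an action, for example cooling when the reading is HOT.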
1. Loading the data files into HDFS
2. Content of sensor_analysis.sql file
3. Tables and view created using sensor_analysis.sql
PROJECT 5: Upsell Analysis
Upselling is a sales technique whereby a seller induces the customer to purchase more expensive items, upgrades or other
add-ons in an attempt to make a more profitable sale.
Data Download Link
1. Sample data
2. Content of upsell_analysis.sql file
(The script is divided into three parts, labelled A, B and C.)
3. A
What is A doing?
• Concatenates first name and last name into a single field, name
• Assigns each customer a category
• Calculates the total amount spent by the customer in each category
• Orders customers by the total amount spent, in descending order
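The steps listed for A can be sketched in Python as a rough equivalent. This is not the content of upsell_analysis.sql: the input layout (first name, last name, category, amount), the sample purchases and the function name spend_per_category are assumptions for illustration.

```python
from collections import defaultdict


def spend_per_category(purchases):
    """Return (name, category, total) rows sorted by total spent, highest first."""
    totals = defaultdict(float)
    for first_name, last_name, category, amount in purchases:
        name = f"{first_name} {last_name}"  # concatenate first and last name
        totals[(name, category)] += amount  # total per customer and category
    rows = [(name, category, total) for (name, category), total in totals.items()]
    rows.sort(key=lambda row: row[2], reverse=True)  # descending by amount
    return rows


# Hypothetical sample purchases.
purchases = [
    ("Ann", "Lee", "books", 20.0),
    ("Ann", "Lee", "books", 15.0),
    ("Ann", "Lee", "toys", 5.0),
    ("Bob", "Ray", "toys", 50.0),
]

rows_a = spend_per_category(purchases)
print(rows_a)
```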
4. B
4.1 What is B doing?
• Extracts name from A
• Collects each customer's categories using the COLLECT_LIST() function, which aggregates
multiple rows into a single row of array datatype
• Collects each customer's amount spent on those categories
• Calculates the overall total amount spent by each customer across all categories
• Determines the recommended category for each customer from the amount spent per category
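The steps listed for B can be sketched in Python, with plain lists standing in for the arrays that COLLECT_LIST() produces. The input rows and the function name summarize_customers are hypothetical. The rule for the recommended category is an assumption too: this sketch picks the category with the highest spend, which may differ from the rule in the actual script.

```python
def summarize_customers(rows):
    """Collapse (name, category, total) rows into one summary record per customer."""
    customers = {}
    for name, category, total in rows:
        entry = customers.setdefault(name, {"categories": [], "amounts": []})
        entry["categories"].append(category)  # plays the role of COLLECT_LIST(category)
        entry["amounts"].append(total)        # plays the role of COLLECT_LIST(amount)
    for entry in customers.values():
        entry["total_spent"] = sum(entry["amounts"])
        # Assumed rule: recommend the category where the customer spent the most.
        top_index = entry["amounts"].index(max(entry["amounts"]))
        entry["recommended"] = entry["categories"][top_index]
    return customers


# Hypothetical rows in the shape produced by part A.
rows = [("Bob Ray", "toys", 50.0), ("Ann Lee", "books", 35.0), ("Ann Lee", "toys", 5.0)]

summary = summarize_customers(rows)
print(summary["Ann Lee"])
```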
4.2 Sample data of B
5. Sample data after C
PROJECT 6: Web Log Analysis
An access log is a list of all the requests that visitors have made for individual files on a website. These files
include the HTML files, their embedded graphic images, and any other associated files that get transmitted. The access
log (sometimes referred to as the "raw data") can be analysed and summarized by another program.
Data Download Link
Tableau Link
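As an illustration of how another program can summarize an access log, the sketch below parses one line into named fields. It assumes the Apache Common Log Format; the project's logs may use a different layout, and the project itself processes them with a PIG script rather than Python. The names LOG_PATTERN and parse_log_line are hypothetical, and the sample line is the well-known example from the Apache documentation, not project data.

```python
import re

# Pattern for the Apache Common Log Format (an assumed layout for this sketch).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)


def parse_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A size of "-" means no bytes were returned.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
record = parse_log_line(sample)
print(record)
```

Because the pattern only anchors at the start of the line, extra trailing fields (such as the referrer and user agent in the combined format) are ignored rather than rejected.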
1. Accessing Apache access logs using Flume
1.1 flume.conf
1.2 Extract the web log data using the following command:
/usr/lib/flume-ng/bin/flume-ng agent -n source_agent -c conf -f /usr/lib/flume-ng/conf/flume.conf
2. Sample log data
3. Moving log file to HDFS
4. PIG script – log_processing.pig
4.1 Content
4.2 Execution
5. Creating HIVE table on the processed log data
Sagnik_AnalytixLabs_Projects