REAL TIME PROJECT:
Click Stream Data Analytics Report Project
ClickStream Data
ClickStream data is generated by any activity a user performs in a web application. Consider what a user
does on a site such as Amazon after logging in: they navigate through pages, spend time on some of them,
and click on particular items. All of these activities, including arriving at a page or application, clicking,
moving from one page to another, and the time spent on each page, form a set of data that the web
application logs. This data is known as ClickStream data. It has high business value, especially for
e-commerce applications and for anyone who wants to understand their users' behavior.
More formally, ClickStream data can be defined as data about the links a user clicked, including the
point in time at which each one was clicked. E-commerce businesses mine and analyse ClickStream data
from their own websites, and most e-commerce applications have a built-in system that captures this
information.
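As a rough illustration of the definition above, the sketch below parses a few click records and estimates the time spent on each page from the gap between consecutive clicks of the same user. The tab-separated log layout (user id, ISO timestamp, page) and the names `SAMPLE_LOG`, `parse_events` and `time_on_page` are assumptions made for this example; the actual project data may be laid out differently.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical log format: user_id <TAB> ISO-8601 timestamp <TAB> page
SAMPLE_LOG = """\
u1\t2024-01-01T10:00:00\t/home
u1\t2024-01-01T10:00:30\t/product/42
u1\t2024-01-01T10:02:00\t/cart
u2\t2024-01-01T10:01:00\t/home
u2\t2024-01-01T10:01:45\t/search
"""


def parse_events(text):
    """Turn raw log lines into (user_id, timestamp, page) tuples."""
    events = []
    for line in text.splitlines():
        if not line.strip():
            continue
        user_id, ts, page = line.split("\t")
        events.append((user_id, datetime.fromisoformat(ts), page))
    return events


def time_on_page(events):
    """Sum, per page, the seconds between a click and the same user's next click."""
    by_user = defaultdict(list)
    for user_id, ts, page in events:
        by_user[user_id].append((ts, page))

    totals = defaultdict(float)
    for clicks in by_user.values():
        clicks.sort()  # order each user's clicks by time
        # The last click of a user has no successor, so it adds no duration.
        for (ts, page), (next_ts, _) in zip(clicks, clicks[1:]):
            totals[page] += (next_ts - ts).total_seconds()
    return dict(totals)


PAGE_SECONDS = time_on_page(parse_events(SAMPLE_LOG))
```

In this sample, `/cart` and `/search` are each the last page a user visited, so they have no measurable duration and do not appear in the result.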
ClickStream Analytics
ClickStream data adds a lot of value to a business and can help it attract more customers or visitors.
Based on the navigation paths people take, it helps the business understand whether the application works
as intended and whether the user experience is good or bad. The business can also predict which page a
user is most likely to visit next and use that for ad targeting. With this, it can understand the needs of
users and come up with better recommendations. Several other analyses are possible with ClickStream
data.
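One simple way to approach the next-page prediction mentioned above is to count how often one page follows another and pick the most frequent follower. The sketch below does this in plain Python. The sample sessions and the names `SESSIONS`, `build_transitions` and `predict_next` are made up for illustration; this is not necessarily the method a production system would use.

```python
from collections import Counter, defaultdict

# Hypothetical sessions: each list is the ordered pages one visitor viewed.
SESSIONS = [
    ["/home", "/search", "/product", "/cart"],
    ["/home", "/search", "/product", "/home"],
    ["/home", "/product", "/cart", "/checkout"],
]


def build_transitions(sessions):
    """Count how often each page is followed directly by each other page."""
    transitions = defaultdict(Counter)
    for pages in sessions:
        for current, nxt in zip(pages, pages[1:]):
            transitions[current][nxt] += 1
    return transitions


def predict_next(transitions, page):
    """Return the most frequent follower of `page`, or None if it was never left."""
    followers = transitions.get(page)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


TRANSITIONS = build_transitions(SESSIONS)
```

With these sessions, `predict_next(TRANSITIONS, "/home")` returns `"/search"`, because two of the three visitors went from the home page to search.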
Project Scope
In this project, candidates are given sample ClickStream data, taken from a web application, in a text
file, along with problem statements.
➢ User information in a MySQL database.
➢ ClickStream data in a text file generated from the web application.
Each candidate has to come up with a high-level system architecture design based on the Hadoop
ecosystem components covered during the course. Each candidate presents this architecture along with
the chosen components, and the pros and cons are discussed with all the other candidates. Finally, the
best system design approach is chosen for implementation.
Candidates are then given instructions to create an Oozie workflow with the Hadoop ecosystem
components finalized in the discussion. Each candidate has to submit the project for the given problem
statement, and it is validated individually by the trainer before course completion.
Ecosystem components involved in the ClickStream analytics project
➢ HDFS
➢ Sqoop
➢ Pig
➢ Hive
➢ Oozie
Big Data Hadoop Course Content
Chapter 1: Introduction to Big Data and Hadoop
➢ Overview of Hadoop Ecosystem
➢ Role of Hadoop in Big Data – Overview of other Big Data Systems
➢ Who is using Hadoop
➢ Hadoop integrations into Existing Software Products
➢ Current Scenario in Hadoop Ecosystem
➢ Installation
➢ Configuration
➢ Use Cases of Hadoop (Healthcare, Retail, Telecom)
Chapter 2: HDFS
➢ Concepts
➢ Architecture
➢ Data Flow (File Read, File Write)
➢ Fault Tolerance
➢ Shell Commands
➢ Data Flow Archives
➢ Coherency - Data Integrity
➢ Role of Secondary Name Node
Chapter 3: MapReduce
➢ Theory
➢ Data Flow (Map – Shuffle – Reduce)
➢ MapRed vs MapReduce APIs
➢ Programming [Mapper, Reducer, Combiner, Partitioner]
➢ Writables
➢ Input Format
➢ Output format
➢ Streaming API using python
➢ Inherent Failure Handling using Speculative Execution
➢ Magic of Shuffle Phase
➢ File Formats
➢ Sequence Files
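To make the Map, Shuffle, Reduce data flow listed in this chapter concrete, here is a minimal single-process sketch of a word count in plain Python. It only imitates the three phases in memory and does not use the Hadoop APIs. The function names `map_phase`, `shuffle_phase`, `reduce_phase` and `word_count` are chosen for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the list of values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


COUNTS = word_count(["big data hadoop", "hadoop hive pig", "big hadoop"])
```

In a real cluster the map and reduce functions run on different nodes and the shuffle moves data across the network; the grouping-by-key step is the part this sketch keeps.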
Chapter 4: HBase
➢ Introduction to NoSQL
➢ CAP Theorem
➢ Classification of NoSQL
➢ HBase and RDBMS
➢ HBase and HDFS
➢ Architecture (Read Path, Write Path, Compactions, Splits)
➢ Installation
➢ Configuration
➢ Role of ZooKeeper
➢ HBase Shell - Introduction to Filters
➢ Row Key Design - What's New in HBase - Hands-On
Chapter 5: Hive
➢ Architecture
➢ Installation
➢ Configuration
➢ Hive vs RDBMS
➢ Tables
➢ DDL
➢ DML
➢ UDF
➢ Partitioning
➢ Bucketing
➢ Hive functions
➢ Date functions
➢ String functions
➢ Cast function
➢ Meta Store
➢ Joins
➢ Real-time HQL will be shared along with the database migration project
Chapter 6: Pig
➢ Architecture
➢ Installation
➢ Hive vs Pig
➢ Pig Latin Syntax
➢ Data Types
➢ Functions (Eval, Load/Store, String, Date Time)
➢ Joins
➢ UDFs - Performance
➢ Troubleshooting
➢ Commonly Used Functions
Chapter 7: Sqoop
➢ Architecture, Installation, Commands (Import, Hive-Import, Eval, HBase Import, Import All
Tables, Export)
➢ Connectors to Existing DBs and DW
Practicals
➢ Sqoop to import real-time weblogs from the application to the DB and try to export the same to
MySQL
Chapter 8: Kafka
➢ Kafka introduction
➢ Data streaming Introduction
➢ Producer-consumer-topics
➢ Brokers
➢ Partitions
➢ Unix Streaming via Kafka
Practicals
Kafka
➢ Set up a producer and subscribers, and publish messages to a topic from the producer to the subscribers
Chapter 9: Oozie
➢ Architecture
➢ Installation
➢ Workflow
➢ Coordinator
➢ Action (MapReduce, Hive, Pig, Sqoop)
➢ Introduction to Bundle
➢ Mail Notifications
Chapter 10: Hadoop 2.0 and Spark
➢ Limitations in Hadoop
➢ HDFS Federation
➢ High Availability in HDFS
➢ HDFS Snapshots
➢ Other Improvements in HDFS 2
➢ Introduction to YARN aka MR2
➢ Limitations in MR1
➢ Architecture of YARN
➢ MapReduce Job Flow in YARN
➢ Introduction to Stinger Initiative and Tez
➢ Backward Compatibility for Hadoop 1.x
➢ Spark Fundamentals
➢ RDD - Sample Scala Program - Spark Streaming
Practicals
➢ Difference between Spark 1.x and Spark 2.x
➢ Word count program in PySpark
Chapter 11: Big Data Use cases
➢ Hadoop
➢ HDFS architecture and usage
➢ MapReduce Architecture and real time exercises
➢ Hadoop Eco systems
➢ Sqoop - MySQL DB Migration
➢ Hive - Deep dive
➢ Pig - weblog parsing and ETL
➢ Oozie - Workflow scheduling
➢ Flume - weblogs ingestion
➢ NoSQL
➢ HBase
➢ Apache Kafka
➢ Pentaho ETL tool integration & working with Hadoop eco system
➢ Apache SPARK
➢ Introduction and working with RDD.
➢ Multi node Setup Guidance
➢ Hadoop latest version Pros & cons discussion
➢ Ends with an introduction to Data Science.
Chapter 12: Real Time Project
➢ Getting applications web logs
➢ Getting user information from MySQL via Sqoop
➢ Getting extracted data from Pig script
➢ Creating Hive SQL Table for querying
➢ Creating Reports from Hive QL
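The project steps above boil down to combining user records with click records and aggregating the result into a report. The sketch below imitates that flow in plain Python on tiny in-memory samples, so the logic can be tried without a cluster. The column names, the sample rows, and the functions `load_users`, `load_clicks` and `clicks_per_city` are assumptions for illustration only; in the project itself the user data would come from MySQL via Sqoop, the click extraction from a Pig script, and the report from a Hive query.

```python
import csv
import io
from collections import Counter

# Hypothetical user table, standing in for the data imported from MySQL.
USERS_CSV = """user_id,name,city
u1,Asha,Chennai
u2,Ravi,Bangalore
"""

# Hypothetical extracted clicks: user_id <TAB> page, standing in for the Pig output.
CLICKS_TSV = "u1\t/home\nu1\t/cart\nu2\t/home\nu3\t/home\n"


def load_users(text):
    """Index user rows by user_id."""
    return {row["user_id"]: row for row in csv.DictReader(io.StringIO(text))}


def load_clicks(text):
    """Parse click lines into (user_id, page) tuples."""
    return [tuple(line.split("\t")) for line in text.splitlines() if line]


def clicks_per_city(users, clicks):
    """Join clicks to users and count clicks per city; unmatched users go to UNKNOWN."""
    report = Counter()
    for user_id, _page in clicks:
        user = users.get(user_id)
        city = user["city"] if user else "UNKNOWN"
        report[city] += 1
    return dict(report)


REPORT = clicks_per_city(load_users(USERS_CSV), load_clicks(CLICKS_TSV))
```

Keeping unmatched clicks under `UNKNOWN` instead of dropping them mirrors a left outer join, which makes missing user records visible in the report.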
