Big Data Basics
AUTHOR: MITHUN BANERJEE
DATE: 05-OCTOBER-2016
COPYRIGHT PROTECTED BY ECLIPSE TECHNO CONSULTING GLOBAL (P) LTD.
What is Big data?
Big data is the term for a collection of data sets so large and complex
that it becomes difficult to process using on-hand database
management tools or traditional data processing applications.
--Wikipedia
Is the above definition fully comprehensive?
Let's go deeper in the next slides.
Data units to measure exponential growth of data over the years
VOLUME of DATA
Type of data
• Relational Data (Tables/Transaction/Legacy Data)
• Text Data (Web)
• Semi-structured Data (XML)
• Graph Data: Social Network, Semantic Web (RDF), …
• Streaming Data: you can only scan the data once
• A single application can be generating/collecting many types of data
• Big Public Data (online, weather, finance, etc.)
Variety (complexities) of data
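The "scan the data once" constraint on streaming data can be illustrated with a one-pass statistic. The sketch below uses Welford's online update to track count, mean, and variance without storing the stream. The function name `running_stats` and the sample readings are assumptions made for illustration, not part of the original slides.

```python
from typing import Iterable, Tuple


def running_stats(stream: Iterable[float]) -> Tuple[int, float, float]:
    """Return (count, mean, population variance) after a single pass."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in stream:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)  # uses the already-updated mean
    variance = m2 / count if count else 0.0
    return count, mean, variance


# Hypothetical sensor readings, consumed as a generator so they cannot be re-read.
readings = (float(v) for v in [2, 4, 4, 4, 5, 5, 7, 9])
print(running_stats(readings))
```

Each value is seen exactly once and memory use stays constant, which is the property a streaming workload usually needs.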
Velocity of data
Late decisions → missed opportunities
Example: healthcare monitoring. Sensors monitor your activities and body → any abnormal measurement requires an immediate reaction.
REALTIME / FAST DATA comes from:
• Social media and networks (all of us are generating data)
• Scientific instruments (collecting all sorts of data)
• Sensor technology and networks (measuring all kinds of data)
3Vs
4Vs
Generation and Consumption of Data
In past
In present
OLTP: ONLINE TRANSACTION PROCESSING (DBMS)
OLAP: ONLINE ANALYTICAL PROCESSING (DATA WAREHOUSING)
RTAP: REAL-TIME ANALYTICS PROCESSING (BIG DATA ARCHITECTURE & TECHNOLOGY)
Driver of Data
- Optimizations and predictive analytics
- Complex statistical analysis
- All types of data, and many sources
- Very large datasets
- More real-time
- Ad-hoc querying and reporting
- Data mining techniques
- Structured data, typical sources
- Small to mid-size datasets
The Evolution of Business Intelligence
• BI Reporting, OLAP & Data Warehouse: Business Objects, SAS, Informatica, Cognos, other SQL reporting tools
• Interactive Business Intelligence & In-memory RDBMS: QlikView, Tableau, HANA
• Big Data: Real Time & Single View: Graph Databases
• Big Data: Batch Processing & Distributed Data Store: Hadoop/Spark; HBase/Cassandra
Timeline: 1990’s, 2000’s, 2010’s
Figure labels: Speed, Scale
Topic 1: Data Analytics & Data Mining
• EXPLORATORY DATA ANALYSIS
• LINEAR CLASSIFICATION (PERCEPTRON & LOGISTIC REGRESSION)
• LINEAR REGRESSION
• C4.5 DECISION TREE
• APRIORI
• K-MEANS CLUSTERING
• EM ALGORITHM
• PAGERANK & HITS
• COLLABORATIVE FILTERING
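As one concrete instance from the list above, k-means clustering alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its members. The following is a minimal pure-Python sketch. The sample points, the choice of the first k points as initial centroids, and the fixed iteration count are all simplifying assumptions, not something the slides specify.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def squared_distance(a: Point, b: Point) -> float:
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def k_means(points: Sequence[Point], k: int,
            iterations: int = 10) -> Tuple[List[Point], List[int]]:
    """Return (centroids, labels) after a fixed number of iterations."""
    centroids = list(points[:k])  # naive initialisation (assumption)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: label each point with its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, labels


# Hypothetical data: two well-separated groups.
sample = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
centroids, labels = k_means(sample, k=2)
print(centroids, labels)
```

Production implementations usually add smarter initialisation and a convergence check instead of a fixed loop.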
Topic 2: Hadoop/MapReduce Programming & Data Processing
ARCHITECTURE OF HADOOP, HDFS, AND YARN
PROGRAMMING ON HADOOP
BASIC DATA PROCESSING: SORT AND JOIN
INFORMATION RETRIEVAL USING HADOOP
DATA MINING USING HADOOP (KMEANS+HISTOGRAMS)
MACHINE LEARNING ON HADOOP (EM)
HIVE/PIG
HBASE AND CASSANDRA
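The "sort and join" item can be made concrete with a small in-process imitation of the map, shuffle/sort, and reduce phases. This sketch does not use the Hadoop API. The helper `run_map_reduce`, the record layout, and the sample users and orders are hypothetical and only meant to show the shape of a reduce-side join.

```python
from collections import defaultdict

def run_map_reduce(records, mapper, reducer):
    """Imitate map -> shuffle/sort by key -> reduce in a single process."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):  # map phase emits (key, value) pairs
            groups[key].append(value)
    output = []
    for key in sorted(groups):  # shuffle/sort: keys reach reducers in order
        output.extend(reducer(key, groups[key]))
    return output


def join_mapper(record):
    # Tag each value with its source so the reducer can tell the sides apart.
    source, (user_id, payload) = record
    yield user_id, (source, payload)


def join_reducer(user_id, tagged_values):
    names = [v for s, v in tagged_values if s == "user"]
    items = [v for s, v in tagged_values if s == "order"]
    for name in names:
        for item in items:
            yield (user_id, name, item)


# Hypothetical input tables.
users = [("u1", "Asha"), ("u2", "Ben")]
orders = [("u1", "book"), ("u2", "pen"), ("u1", "lamp")]
tagged = [("user", u) for u in users] + [("order", o) for o in orders]

joined = run_map_reduce(tagged, join_mapper, join_reducer)
print(joined)
```

On a real cluster the grouping and sorting by key happen across machines, but the mapper and reducer keep the same roles.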
Topic 3: Graph Database and Graph Analytics
GRAPH DATABASE (http://en.wikipedia.org/wiki/Graph_database)
Native Graph Database (Neo4j)
Pregel/Giraph (Distributed Graph Processing Engine)
NEO4J/TITAN/GRAPHLAB/GRAPHSQL
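Graph analytics typically starts from traversals over relationships. As a rough illustration, and not the query language or API of any product named above, the sketch below stores a directed graph as an adjacency list and uses breadth-first search to count hops from one node. The `follows` data and the function name `hops_from` are assumptions.

```python
from collections import deque
from typing import Dict, List


def hops_from(graph: Dict[str, List[str]], start: str) -> Dict[str, int]:
    """Breadth-first search: fewest edges from start to each reachable node."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, []):
            if neighbour not in distance:  # first visit gives the shortest hop count
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance


# Hypothetical "follows" relationships in a small social network.
follows = {
    "ann": ["bob", "cy"],
    "bob": ["dee"],
    "cy": ["dee"],
    "dee": ["eve"],
    "eve": [],
}
print(hops_from(follows, "ann"))
```

Distributed engines in the Pregel family express the same idea as per-vertex messages exchanged over rounds rather than a single queue.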
References for in-depth homework
• Hadoop: The Definitive Guide, Tom White, O’Reilly
• Data Mining: Concepts and Techniques, Third Edition, by Jiawei Han et al.
• https://www.mongodb.com/collateral/big-data-examples-and-guidelines-enterprise-decision-maker
• http://www.aptude.com/blog/entry/hadoop-vs-mongodb-which-platform-is-better-for-handling-big-data
• http://www.slideshare.net/wlaforest/an-introduction-to-big-data-nosql-and-mongodb
• http://www.infoworld.com/article/2608460/application-development/the-10-worst-big-data-practices.html