Data size analysis of the datasets worked on so far with Hadoop/Spark.
The table below lists the approximate dataset sizes handled in different projects.
Sr. | Project                                                    | Environment              | Data size (approx.)       | Type
----|------------------------------------------------------------|--------------------------|---------------------------|------------------------------------------------------
1   | Panera, LLC - Capacity Planning                            | Development              | 3.4 TB (6 months of data) | Unstructured
2   | Panera, LLC - Predictive Analytics                         | Production               | 6.8 TB (1 year of data)   | Unstructured
3   | AT&T Insights                                              | Production               | 22 TB                     | Structured and unstructured
4   | AT&T Insights                                              | Non-production (standby) | 22 TB                     | Complex structured and unstructured data (29 markets)
5   | CTL Project Development: REST API Ingestion for Data Lake  | Development              | 310 GB                    | Semi-structured (JSON)
6   | AT&T Telegence Mobility                                    | Production               | 7 TB                      | Semi-structured (XML text)
Chart: Data Size Statistics, Structured vs Unstructured
Unstructured: 36.2 TB (69%)
Structured: 9 TB (17%)
Semi-structured: 7.3 TB (14%)
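The percentages in the chart can be reproduced from the three category totals. The short Python sketch below is illustrative only: it assumes the chart values map to the legend in the order shown (unstructured, structured, semi-structured), and the names totals_tb and percentage_shares are hypothetical. It also checks that the semi-structured total matches rows 5 and 6 of the table, assuming 1 TB = 1000 GB.

```python
# Category totals in TB as read from the chart.
# Assumption: values follow the legend order (unstructured, structured, semi-structured).
totals_tb = {
    "Unstructured": 36.2,
    "Structured": 9.0,
    "Semi-structured": 7.3,
}


def percentage_shares(totals):
    """Return each category's share of the grand total, rounded to a whole percent."""
    grand_total = sum(totals.values())
    return {name: round(100 * size / grand_total) for name, size in totals.items()}


shares = percentage_shares(totals_tb)
for name, size in totals_tb.items():
    print(f"{name}: {size} TB ({shares[name]}%)")

# Cross-check: rows 5 and 6 of the table (310 GB JSON + 7 TB XML text),
# assuming decimal units (1 TB = 1000 GB).
semi_structured_tb = round(310 / 1000 + 7, 1)
print(f"Semi-structured from table rows: {semi_structured_tb} TB")
```

With these inputs the shares come out as 69%, 17%, and 14%, matching the chart, and the semi-structured rows sum to about 7.3 TB.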


Data_Size_statistics
