Nicholas Berg, Director Enterprise Analytics
Seagate Technology
Sensor Overload! Taming The Raging Manufacturing Big Data Torrent
(DMT-1633)
IBM World of Watson 2016 Conference
You May Know Seagate as a Hard Drive Manufacturer…
•  $11B annual revenue
•  45,000 employees, 26 countries
•  1st and only to ship over 2 billion drives
•  Stores more than 40% of the world’s data
•  Technology leader with 9000+ patents
The Intelligent Information Infrastructure
Seagate Is Evolving to Address Trends
[Figure: Seagate's portfolio in three layers, Devices, Arrays and Systems, annotated "Started with Our Core Products", "Expanded Offerings" and "Still Growing". Offerings shown: Converged Infrastructure, Hybrid Data Systems, Software-defined Storage, Flash Arrays, Cloud Backup & Disaster Recovery, NAND PCIe, Traditional Enterprise, NAND SAS, Hyperscale, Desktop, Notebook / Tablet, Branded, NAND SATA, Hybrid, Kinetic.]
Seagate’s Manufacturing & Design
[Map legend: Drive & Component Manufacturing; Design Centers]
Fremont, CA
Longmont, CO
Shakopee, MN
Bloomington, MN
Springtown, N. Ireland
Havant, UK
Gwanggyo, South Korea
Wuxi, China
Suzhou, China
Teparuk & Korat, Thailand
Johor, Penang & Seremban, Malaysia
Shugart, Singapore
Woodlands, Singapore
•  Seagate owns and operates 11 factories in 8 countries on 3 continents
•  More than three million square feet of manufacturing space
•  Factories are vertically integrated, from silicon fabrication to drive assembly
•  Integrate over 45 billion parts per year
Hard Disk Drive Components
•  Seagate ships ~200M drives a year, or ~500,000 drives a day
•  An assembled drive generates about 20MB of test data
•  Each drive has 150-300 main components, depending on drive complexity
•  Unverified estimate: ~100MB of machine data generated per drive, or ~50TB per day and ~20PB a year
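The volume figures on this slide chain together with simple arithmetic. The sketch below reproduces that chain from the slide's own rounded inputs, assuming decimal (SI) units; the 10TB/day test-data figure is derived here and is not stated on the slide.

```python
# Back-of-envelope check of the slide's data volumes.
# Assumes decimal (SI) units: 1 TB = 10**6 MB, 1 PB = 10**3 TB.
DRIVES_PER_YEAR = 200_000_000       # "~200M/year"
DRIVES_PER_DAY = 500_000            # the slide's rounded daily figure
MACHINE_MB_PER_DRIVE = 100          # the slide's unverified estimate
TEST_MB_PER_DRIVE = 20              # "about 20MB of test data"


def tb_per_day(drives_per_day: float, mb_per_drive: float) -> float:
    """Daily data volume in TB for a given per-drive volume in MB."""
    return drives_per_day * mb_per_drive / 1e6


exact_drives_per_day = DRIVES_PER_YEAR / 365                       # ~548,000, i.e. "~500,000"
machine_tb_day = tb_per_day(DRIVES_PER_DAY, MACHINE_MB_PER_DRIVE)  # 50 TB/day
machine_pb_year = machine_tb_day * 365 / 1000                      # 18.25 PB, i.e. "~20PB"
test_tb_day = tb_per_day(DRIVES_PER_DAY, TEST_MB_PER_DRIVE)        # 10 TB/day of test data

print(f"{exact_drives_per_day:,.0f} drives/day")
print(f"machine data: {machine_tb_day:.0f} TB/day, {machine_pb_year:.2f} PB/year")
print(f"test data: {test_tb_day:.0f} TB/day")
```

The yearly figure comes out at about 18PB, which the slide rounds to 20PB.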
Sensor Overload!
•  What to do with 50TB of data per day, or 20PB a year?
•  Streaming and real-time analytics for factory controls
•  Capture the data and perform deeper analytics for product design, quality, yields and many other uses
Solution: Big Data Analytics
All Elements Working Together As One System
Data sources:
•  Drive Quality Engineering and Assurance Data
•  Drive Assembly and Manufacturing Test Data
•  Incoming Components Data
•  Ongoing Quality and Reliability Test Data
•  Returned Drives Test and Diagnostics Data
•  Customer Integration and Field Data (including Field Telemetry)
Platform layers:
•  End-to-End Coherent, Scalable Data Collection and Retention
•  Big Data Analytics Infrastructure (H/W + S/W) and Algorithms
•  Big Data-Driven Quality Decision Layer
Applications:
•  Predictive Life Models
•  Test Auto-Diagnostics and Alerts
•  Predictive Financial Models
•  Robust Excursion Detection Algorithms
•  Ad-hoc Big Data Analytics Projects
•  In-situ Failure Prediction
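The "Robust Excursion Detection" application is named but its method is not described. As an illustration only (an assumption, not Seagate's algorithm), one common robust technique is the modified z-score of Iglewicz and Hoaglin, which uses the median and the median absolute deviation (MAD) so that the excursions being hunted do not inflate the baseline the way they would inflate a mean and standard deviation. The function name and the 3.5 threshold default are illustrative choices.

```python
from statistics import median


def robust_excursions(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds `threshold`.

    Modified z-score (Iglewicz & Hoaglin): 0.6745 * (x - median) / MAD,
    where MAD is the median absolute deviation from the median.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Degenerate case: at least half the values equal the median,
        # so treat any deviation from it as an excursion.
        return [i for i, v in enumerate(values) if v != med]
    return [i for i, v in enumerate(values)
            if abs(0.6745 * (v - med) / mad) > threshold]


# Hypothetical sensor readings from one process step.
readings = [10.1, 10.0, 9.9, 10.2, 10.0, 13.5, 10.1, 9.8]
print(robust_excursions(readings))  # [5]
```

A single-variable check like this is cheap enough to run on streaming factory data; correlated parameters need a multivariate method instead.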
Traditional Data Architecture Pressured
•  4.4 ZB in 2013
•  85% from new data types
•  15x machine data by 2020
•  44 ZB by 2020
•  ZB = 1B TB
Sources: Reinsel, David. “Where in the World Is Storage: A Look at Byte Density Across the Globe”, IDC, October 2013; IDC/EMC Digital Universe, April 2014.
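The IDC figures imply a growth rate that the slide does not state. Growing from 4.4 ZB in 2013 to 44 ZB in 2020 is a tenfold increase over seven years; the sketch below computes the implied compound annual growth rate, using nothing but the two quoted figures.

```python
def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate implied by growing from start to end."""
    return (end / start) ** (1 / years) - 1


# IDC figures quoted above: 4.4 ZB (2013) -> 44 ZB (2020).
growth = cagr(4.4, 44.0, 2020 - 2013)
print(f"{growth:.1%} per year")  # roughly 39% per year
```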
Hadoop Ecosystem Architecture
Evolving to a Logical Data Warehouse
•  A Logical Data Warehouse combines traditional data warehouses with big data systems to evolve your analytics capabilities beyond where they are today
•  Hadoop does not replace your EDW. The EDW is a good “general purpose” data management solution for integrating and conforming enterprise data to produce your everyday business analytics
•  A typical EDW may have hundreds of data feeds and dozens of integrated applications, and run thousands to hundreds of thousands of queries a day
•  Hadoop is more specialized and much less mature. For now it will have only a few application integration points and run fewer queries at lower concurrency, answering different questions
•  A Hadoop cluster of 60-100 nodes is a supercomputer. What would you use a supercomputer for? Probably to answer the really big questions
The Data Lake: Data Tiering
Hadoop cluster data loading and querying
[Figure: Hadoop cluster data loading and querying. Sources: EDW (loaded via Sqoop) and Factory Data Systems (UNIX; compacted and loaded). HDFS tiers: Tier 1 (Raw Data Files), Tier 2 (Hive Structured Data Tables) and Tier 3 (Derived Data Tables), loaded by the T1/T2 application. Cluster components shown: Big SQL, Hive, HCatalog, Tez, MapReduce, Spark, SparkSQL, SparkR, YARN, Ambari, Ganglia | Nagios. Data science applications (SAS, R, Python, ML; H2O, Jupyter, R Studio, R Shiny) read and write through JDBC | ODBC | other drivers. Annotations: "Drive Data 100%", "10% Drive Sampled", "Component".]
Tier 1 / Tier 2 custom data loading application
Data Transport
•  Sqoop: pull EDW data to HDFS Tier 1
•  Non-EDW files (factory push):
   •  Trickle-feed files to a staging area
   •  Unzip, merge and re-zip small files into large files
   •  Push compacted files to HDFS Tier 1
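The unzip/merge/re-zip step for non-EDW factory files exists because HDFS handles many small files poorly. A minimal sketch on a local staging directory follows, assuming gzip-compressed, line-oriented files whose records end at file boundaries; the function name and output naming are hypothetical, and the 128 MB default target is assumed to match the Hadoop 2.x default HDFS block size. The push to HDFS (for example with `hdfs dfs -put`) is not shown.

```python
import gzip
from pathlib import Path


def compact_gz_files(small_files, out_dir, target_bytes=128 * 1024 * 1024):
    """Unzip, merge and re-zip many small .gz files into fewer large ones.

    Files are packed in order into batches whose combined *uncompressed*
    size stays at or under `target_bytes` (a single oversized file gets a
    batch of its own). Returns the paths of the merged files written.
    Each batch is held in memory before it is written out.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs, batch, batch_size = [], [], 0

    def flush():
        nonlocal batch, batch_size
        if not batch:
            return
        out_path = out_dir / f"merged_{len(outputs):05d}.gz"
        with gzip.open(out_path, "wb") as out:  # re-zip
            for data in batch:
                out.write(data)
        outputs.append(out_path)
        batch, batch_size = [], 0

    for path in small_files:
        data = gzip.decompress(Path(path).read_bytes())  # unzip
        if batch and batch_size + len(data) > target_bytes:
            flush()
        batch.append(data)  # merge
        batch_size += len(data)
    flush()  # write the final partial batch
    return outputs
```

Packing by uncompressed size keeps the merged files predictable for downstream readers; packing by compressed size would instead track HDFS block usage more closely.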
Data Mapping & Loading
•  Match source/target columns
•  Detect and handle column changes
•  Transform data
•  Insert or update data in Tier 2
•  Dual feed to cluster 2 (Tier 1 and Tier 2)
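The mapping and loading steps (match source/target columns, detect column changes, insert or update in Tier 2) can be sketched in memory. This is an illustration under stated assumptions, not the T1/T2 application itself: rows are dicts that all share the first row's columns, the target table is a dict keyed by a hypothetical primary-key column, new source columns are appended to the target schema, and target columns missing from the source are written as None. The transform step is omitted.

```python
from dataclasses import dataclass


@dataclass
class SchemaChange:
    added: list    # columns in the source feed but not yet in the target
    removed: list  # target columns missing from the source feed


def diff_columns(source_cols, target_cols):
    """Match source/target columns by name and report column changes."""
    src, tgt = list(source_cols), list(target_cols)
    return SchemaChange(added=[c for c in src if c not in tgt],
                        removed=[c for c in tgt if c not in src])


def upsert(table, target_cols, rows, key):
    """Insert or update `rows` into `table` (a dict keyed by row[key]).

    Returns (updated column list, inserted count, updated count).
    """
    if not rows:
        return list(target_cols), 0, 0
    change = diff_columns(rows[0].keys(), target_cols)
    cols = list(target_cols) + change.added  # evolve the target schema
    for record in table.values():            # backfill existing rows
        for c in change.added:
            record.setdefault(c, None)
    inserted = updated = 0
    for row in rows:
        record = {c: row.get(c) for c in cols}  # conform row to target schema
        if row[key] in table:
            updated += 1
        else:
            inserted += 1
        table[row[key]] = record
    return cols, inserted, updated


# Hypothetical Tier 2 table with one existing drive record.
tier2 = {1: {"id": 1, "temp": 20.0}}
cols, ins, upd = upsert(tier2, ["id", "temp"],
                        [{"id": 1, "temp": 21.5, "line": "A"},
                         {"id": 2, "temp": 19.0, "line": "B"}], key="id")
print(cols, ins, upd)  # ['id', 'temp', 'line'] 1 1
```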
Scheduling
•  Oozie backend
•  Configurable frequency (currently daily)
•  Snapshots (wait for data loads to complete)
•  Metadata backups
Compaction
•  Major and minor compaction
   •  Minor: merges small files into large ones
   •  Major: removes old versions of data (updates)
•  Consolidates HDFS directories
T1/T2 App
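The two kinds of compaction above can be sketched separately. Both functions are hypothetical illustrations of the idea, not the T1/T2 application's code (Hive's own ACID compactor uses the same "minor" and "major" terms, but works on delta files). Minor compaction is shown as first-fit-decreasing bin packing of small files into merge groups; major compaction keeps only the newest version of each record, assuming every record carries a key and a comparable version column.

```python
def plan_minor_compaction(file_sizes, target_bytes):
    """Group small files (name -> size in bytes) into merge groups.

    First-fit-decreasing bin packing: each group's total stays within
    `target_bytes`. Files already at or above the target are left alone,
    and single-file groups are dropped because rewriting them gains nothing.
    """
    groups = []  # each entry: [total_size, [file names]]
    for name, size in sorted(file_sizes.items(), key=lambda kv: kv[1], reverse=True):
        if size >= target_bytes:
            continue  # already "large"
        for group in groups:
            if group[0] + size <= target_bytes:
                group[0] += size
                group[1].append(name)
                break
        else:
            groups.append([size, [name]])
    return [names for _, names in groups if len(names) > 1]


def major_compaction(records, key, version):
    """Keep only the newest version of each record, dropping superseded updates."""
    latest = {}
    for rec in records:
        k = rec[key]
        if k not in latest or rec[version] > latest[k][version]:
            latest[k] = rec
    return list(latest.values())


print(plan_minor_compaction({"a": 60, "b": 50, "c": 30, "d": 20, "e": 200}, 100))
# [['a', 'c'], ['b', 'd']]
```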
Enterprise Hadoop Architecture
DN (data node) spec:
•  CPUs: 12
•  HDDs: 12 x 3TB
•  RAM: 144GB
•  Network: dual 10Gbit
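To put the node spec (12 x 3TB drives per node) next to the 60-100 node cluster size mentioned earlier and the ~50TB/day estimate, the sketch below does the capacity arithmetic. It assumes the HDFS default replication factor of 3 and ignores OS, temporary and headroom space as well as compression, so the results are rough upper bounds, not figures from the deck.

```python
def cluster_capacity_tb(nodes, disks_per_node=12, disk_tb=3, replication=3):
    """Raw and replica-adjusted HDFS capacity in TB.

    Ignores OS/temp space, free-space headroom and compression.
    """
    raw = nodes * disks_per_node * disk_tb
    return raw, raw / replication


for n in (60, 100):
    raw, usable = cluster_capacity_tb(n)
    days = usable / 50  # days of uncompressed data at ~50 TB/day
    print(f"{n} nodes: {raw:,} TB raw, {usable:,.0f} TB after 3x replication, "
          f"~{days:.0f} days at 50 TB/day")
```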
Some early Hadoop practices and learnings
•  Deliver incrementally, in phases or use case by use case
•  Form a “data lake” or “data reservoir” for all enterprise data
•  Data availability must come first; model and transform the data in place within Hadoop
   •  Resist moving the data again
•  There is a lot of talk about schema on read, but for DW types of uses it is impractical
   •  Data modeling is still required but can be simplified
•  Have multiple clusters: development, test and then two or more production clusters, one for ad hoc data exploration & experimentation and one for more governed uses
•  Use an existing custom query/analytics solution to provide “transparent” access to Hadoop
Some early Hadoop practices and learnings (continued)
•  Use partitioned/tiered data sets: raw, modeled/standardized, analytics, history/archive
   •  Tier 0: extended history/archives (if needed)
   •  Tier 1: low-latency raw data for power users to access with low-level tooling (MR, Python)
   •  Tier 2: de-duplicated, modeled and transformed data used by the majority of Hadoop users
   •  Tier 3: specialized analytic data sets for specific needs (e.g. data pivots, aggregations)
•  Copy summarized data and derived analytics to the EDW for broader use/analysis with BI tools
•  Do lots of performance testing, run benchmarks, continually optimize
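One way to make the Tier 0-3 split concrete is a directory layout with one root per tier and Hive-style `key=value` partition directories underneath, which lets Hive or Big SQL skip partitions that a query's filter excludes. The root paths, dataset name and partition columns below are hypothetical.

```python
from datetime import date

# Hypothetical layout: one HDFS root directory per tier.
TIER_ROOTS = {
    0: "/data/tier0_archive",
    1: "/data/tier1_raw",
    2: "/data/tier2_modeled",
    3: "/data/tier3_analytics",
}


def partition_path(tier, dataset, **partitions):
    """Build a Hive-style partitioned directory path (key=value segments).

    Partition keys are emitted in the order given, which must match the
    table's declared partition column order.
    """
    if tier not in TIER_ROOTS:
        raise ValueError(f"unknown tier {tier!r}")
    parts = [f"{k}={v}" for k, v in partitions.items()]
    return "/".join([TIER_ROOTS[tier], dataset, *parts])


print(partition_path(2, "drive_test", test_date=date(2016, 10, 25), site="wuxi"))
# /data/tier2_modeled/drive_test/test_date=2016-10-25/site=wuxi
```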
Data Science: Themes and Examples
•  Image Analytics & Pattern Recognition: media flaw pattern recognition
•  Machine Learning: reduce drive testing through failure prediction
•  Anomaly Detection: multivariate SPC
•  Predictive Analytics: predict process interactions that are critical to quality and yields
•  Prognostic Health: field telemetry analytics
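Multivariate SPC is commonly done with Hotelling's T² statistic, T² = (x − μ)ᵀ S⁻¹ (x − μ), where μ and S are the mean vector and covariance matrix estimated from in-control reference data. The deck does not say which multivariate SPC method Seagate uses, so the pure-Python sketch below is an assumed, illustrative choice; the control limit (typically derived from an F or chi-squared distribution, or from in-control history) is left to the caller.

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance(rows, mu):
    """Sample covariance matrix (n - 1 denominator)."""
    n, p = len(rows), len(mu)
    return [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    p = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, p):
            f = m[r][col] / m[col][col]
            for c in range(col, p + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * p
    for i in reversed(range(p)):
        x[i] = (m[i][p] - sum(m[i][j] * x[j] for j in range(i + 1, p))) / m[i][i]
    return x


def hotelling_t2(x, mu, cov):
    """T^2 = d' S^-1 d with d = x - mu, computed without inverting S."""
    d = [xi - mi for xi, mi in zip(x, mu)]
    w = solve(cov, d)
    return sum(di * wi for di, wi in zip(d, w))


# Hypothetical in-control reference data for two correlated parameters.
ref = [[1, 1], [2, 2], [3, 3], [4, 4], [5, 6]]
mu = mean_vector(ref)
cov = covariance(ref, mu)
print(hotelling_t2([6, 6.2], mu, cov))  # follows the correlation: small T^2
print(hotelling_t2([6, 0.2], mu, cov))  # breaks the correlation: large T^2
```

Both test points are the same Euclidean distance from the mean, but only the one that breaks the correlation between the two parameters scores high; this is what univariate control charts miss.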
Data Science Research
•  Automated feature extraction
•  Deep learning & image analytics
•  Imbalanced data
•  Data science tooling
Data Science In Action
[Figure: predicted (red) vs. actual (blue)]
Hadoop challenges: an emerging and evolving platform
•  Knowing which Hadoop projects to “bet on”, and which data formats and compression types to use
•  Speed of change: possibly more code is being written for Hadoop than for any other IT platform
•  Need to upgrade cluster software frequently (once a quarter)
•  Gaps: some things are not ready yet, such as ACID transactions and real-time queries
•  Resource management for different types of workloads
•  Lack of BI tools that can really take advantage of huge data sets and visualize them
•  Still very batch-processing oriented, but interactive use is gaining traction with Spark etc.
•  Provisioning large numbers of machines; hardware failures
•  Integrating remote clusters, cross-cluster data movement and inter-cluster processing
Big Data Analytics Platform Evolution
[Figure: the analytics platform spanning Seagate data centers and cloud data centers. Layers shown: data sources; storage and cloud data storage; compute and elastic cloud compute; data virtualization; data engineering; data preparation; GUI analytics; data science programming tools; visualization, BI and reporting; cloud automation workflows. User roles: citizen data scientist & SME, data scientist, business intelligence/report designer.]
Nicholas Berg, nicholas.e.berg@seagate.com

一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
oaxefes
 
Template xxxxxxxx ssssssssssss Sertifikat.pptx
Template xxxxxxxx ssssssssssss Sertifikat.pptxTemplate xxxxxxxx ssssssssssss Sertifikat.pptx
Template xxxxxxxx ssssssssssss Sertifikat.pptx
TeukuEriSyahputra
 
Q4FY24 Investor-Presentation.pdf bank slide
Q4FY24 Investor-Presentation.pdf bank slideQ4FY24 Investor-Presentation.pdf bank slide
Q4FY24 Investor-Presentation.pdf bank slide
mukulupadhayay1
 
Salesforce AI + Data Community Tour Slides - Canarias
Salesforce AI + Data Community Tour Slides - CanariasSalesforce AI + Data Community Tour Slides - Canarias
Salesforce AI + Data Community Tour Slides - Canarias
davidpietrzykowski1
 
Digital Marketing Performance Marketing Sample .pdf
Digital Marketing Performance Marketing  Sample .pdfDigital Marketing Performance Marketing  Sample .pdf
Digital Marketing Performance Marketing Sample .pdf
Vineet
 
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Marlon Dumas
 
Namma-Kalvi-11th-Physics-Study-Material-Unit-1-EM-221086.pdf
Namma-Kalvi-11th-Physics-Study-Material-Unit-1-EM-221086.pdfNamma-Kalvi-11th-Physics-Study-Material-Unit-1-EM-221086.pdf
Namma-Kalvi-11th-Physics-Study-Material-Unit-1-EM-221086.pdf
22ad0301
 
Sid Sigma educational and problem solving power point- Six Sigma.ppt
Sid Sigma educational and problem solving power point- Six Sigma.pptSid Sigma educational and problem solving power point- Six Sigma.ppt
Sid Sigma educational and problem solving power point- Six Sigma.ppt
ArshadAyub49
 
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
Vietnam Cotton & Spinning Association
 
Econ3060_Screen Time and Success_ final_GroupProject.pdf
Econ3060_Screen Time and Success_ final_GroupProject.pdfEcon3060_Screen Time and Success_ final_GroupProject.pdf
Econ3060_Screen Time and Success_ final_GroupProject.pdf
blueshagoo1
 
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
osoyvvf
 
一比一原版(heriotwatt学位证书)英国赫瑞瓦特大学毕业证如何办理
一比一原版(heriotwatt学位证书)英国赫瑞瓦特大学毕业证如何办理一比一原版(heriotwatt学位证书)英国赫瑞瓦特大学毕业证如何办理
一比一原版(heriotwatt学位证书)英国赫瑞瓦特大学毕业证如何办理
zoykygu
 
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
eoxhsaa
 
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
actyx
 
Call Girls Lucknow 0000000000 Independent Call Girl Service Lucknow
Call Girls Lucknow 0000000000 Independent Call Girl Service LucknowCall Girls Lucknow 0000000000 Independent Call Girl Service Lucknow
Call Girls Lucknow 0000000000 Independent Call Girl Service Lucknow
hiju9823
 
SAP BW4HANA Implementagtion Content Document
SAP BW4HANA Implementagtion Content DocumentSAP BW4HANA Implementagtion Content Document
SAP BW4HANA Implementagtion Content Document
newdirectionconsulta
 

Recently uploaded (20)

Telemetry Solution for Gaming (AWS Summit'24)
Telemetry Solution for Gaming (AWS Summit'24)Telemetry Solution for Gaming (AWS Summit'24)
Telemetry Solution for Gaming (AWS Summit'24)
 
A gentle exploration of Retrieval Augmented Generation
A gentle exploration of Retrieval Augmented GenerationA gentle exploration of Retrieval Augmented Generation
A gentle exploration of Retrieval Augmented Generation
 
06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases
06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases
06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases
 
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
 
一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
一比一原版卡尔加里大学毕业证(uc毕业证)如何办理一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
 
Template xxxxxxxx ssssssssssss Sertifikat.pptx
Template xxxxxxxx ssssssssssss Sertifikat.pptxTemplate xxxxxxxx ssssssssssss Sertifikat.pptx
Template xxxxxxxx ssssssssssss Sertifikat.pptx
 
Q4FY24 Investor-Presentation.pdf bank slide
Q4FY24 Investor-Presentation.pdf bank slideQ4FY24 Investor-Presentation.pdf bank slide
Q4FY24 Investor-Presentation.pdf bank slide
 
Salesforce AI + Data Community Tour Slides - Canarias
Salesforce AI + Data Community Tour Slides - CanariasSalesforce AI + Data Community Tour Slides - Canarias
Salesforce AI + Data Community Tour Slides - Canarias
 
Digital Marketing Performance Marketing Sample .pdf
Digital Marketing Performance Marketing  Sample .pdfDigital Marketing Performance Marketing  Sample .pdf
Digital Marketing Performance Marketing Sample .pdf
 
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
 
Namma-Kalvi-11th-Physics-Study-Material-Unit-1-EM-221086.pdf
Namma-Kalvi-11th-Physics-Study-Material-Unit-1-EM-221086.pdfNamma-Kalvi-11th-Physics-Study-Material-Unit-1-EM-221086.pdf
Namma-Kalvi-11th-Physics-Study-Material-Unit-1-EM-221086.pdf
 
Sid Sigma educational and problem solving power point- Six Sigma.ppt
Sid Sigma educational and problem solving power point- Six Sigma.pptSid Sigma educational and problem solving power point- Six Sigma.ppt
Sid Sigma educational and problem solving power point- Six Sigma.ppt
 
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
 
Econ3060_Screen Time and Success_ final_GroupProject.pdf
Econ3060_Screen Time and Success_ final_GroupProject.pdfEcon3060_Screen Time and Success_ final_GroupProject.pdf
Econ3060_Screen Time and Success_ final_GroupProject.pdf
 
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
 
一比一原版(heriotwatt学位证书)英国赫瑞瓦特大学毕业证如何办理
一比一原版(heriotwatt学位证书)英国赫瑞瓦特大学毕业证如何办理一比一原版(heriotwatt学位证书)英国赫瑞瓦特大学毕业证如何办理
一比一原版(heriotwatt学位证书)英国赫瑞瓦特大学毕业证如何办理
 
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
 
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
 
Call Girls Lucknow 0000000000 Independent Call Girl Service Lucknow
Call Girls Lucknow 0000000000 Independent Call Girl Service LucknowCall Girls Lucknow 0000000000 Independent Call Girl Service Lucknow
Call Girls Lucknow 0000000000 Independent Call Girl Service Lucknow
 
SAP BW4HANA Implementagtion Content Document
SAP BW4HANA Implementagtion Content DocumentSAP BW4HANA Implementagtion Content Document
SAP BW4HANA Implementagtion Content Document
 

Seagate: Sensor Overload! Taming The Raging Manufacturing Big Data Torrent

  • 1. Sensor Overload! Taming The Raging Manufacturing Big Data Torrent (DMT-1633)
    Nicholas Berg, Director Enterprise Analytics, Seagate Technology
    IBM World of Watson 2016 Conference
  • 2. You May Know Seagate as a Hard Drive Manufacturer…
    §  $11B Annual Revenue
    §  45,000 employees, 26 countries
    §  1st and only to ship over 2 billion drives
    §  Stores more than 40% of the world’s data
    §  Technology leader with 9000+ patents
  • 3. The Intelligent Information Infrastructure: Seagate is Evolving to Address Trends
    Progression: Started with Our Core Products; Expanded Offerings; Still Growing
    Product tiers: Devices, Arrays, Systems
    Offerings shown: Converged Infrastructure; Hybrid Data Systems; Software-defined Storage; Flash Arrays; Cloud Backup & Disaster Recovery; NAND PCIe; Traditional Enterprise; NAND SAS; Hyperscale; Desktop; Notebook / Tablet; Branded; NAND SATA; Hybrid; Kinetic
  • 4. Seagate’s Manufacturing & Design
    Sites (legend: Drive & Component Manufacturing; Design Centers): Fremont, CA; Longmont, CO; Shakopee, MN; Bloomington, MN; Springtown, N. Ireland; Havant, UK; Gwanggyo, South Korea; Wuxi, China; Suzhou, China; Teparuk & Korat, Thailand; Johor, Penang & Seremban, Malaysia; Shugart, Singapore; Woodlands, Singapore
    •  Seagate owns and operates 11 factories in 8 countries on 3 continents
    •  More than three million square feet of manufacturing space
    •  Factories are vertically integrated, from silicon fabrication to drive assembly
    •  Integrate over 45 billion parts per year
  • 5. Hard Disk Drive Components
    •  Seagate ships ~200M drives/year, or ~500,000 drives/day
    •  An assembled drive generates about 20MB of test data
    •  Each drive has 150-300 main components, depending on drive complexity
    •  Unqualified estimate: 100MB of machine data generated per drive
    •  ~50TB per day
    •  ~20PB of data a year
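The volume figures on this slide follow from simple arithmetic. The sketch below reproduces it, assuming decimal units (1 TB = 10^6 MB, 1 PB = 10^9 MB) and taking the slide's unqualified 100MB/drive estimate as given. The function name is illustrative and does not come from the talk.

```python
from typing import Tuple

MB_PER_TB = 1e6  # decimal units assumed
MB_PER_PB = 1e9

def data_volume(drives_per_year: float, mb_per_drive: float) -> Tuple[float, float, float]:
    """Return (drives/day, TB/day, PB/year) for a shipment rate and per-drive data size."""
    drives_per_day = drives_per_year / 365
    tb_per_day = drives_per_day * mb_per_drive / MB_PER_TB
    pb_per_year = drives_per_year * mb_per_drive / MB_PER_PB
    return drives_per_day, tb_per_day, pb_per_year

# Slide figures: ~200M drives/year, ~100MB of machine data per drive
per_day, tb_day, pb_year = data_volume(200e6, 100)
print(f"{per_day:,.0f} drives/day, {tb_day:.1f} TB/day, {pb_year:.1f} PB/year")
# prints: 547,945 drives/day, 54.8 TB/day, 20.0 PB/year
```

Using exactly 200M drives/year gives about 548,000 drives and 55TB per day. The slide's ~500,000 drives and ~50TB per day are rounded down, and both routes land near 20PB a year.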
  • 6. Sensor Overload!
    •  What to do with 50TB of data per day, or 20PB a year?
    •  Streaming and real-time analytics for factory controls
    •  Capture and perform deeper analytics for product design, quality, yields and many other uses
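On the streaming side, one common factory-control technique is the EWMA (exponentially weighted moving average) control chart. It updates a single statistic per reading, which makes it a natural fit for a data stream. This is only an illustration: the talk does not say which control method Seagate uses, and the target, sigma, lam = 0.2 and L = 3 below are assumed parameters.

```python
import math
from typing import Iterable, Iterator, Tuple

def ewma_monitor(
    readings: Iterable[float],
    target: float,
    sigma: float,
    lam: float = 0.2,   # smoothing weight (assumed)
    L: float = 3.0,     # control-limit width in sigmas (assumed)
) -> Iterator[Tuple[int, float, bool]]:
    """Yield (t, ewma, out_of_control) for each reading in a stream.

    z_t = lam * x_t + (1 - lam) * z_{t-1}, with z_0 = target.
    Limits use the exact time-varying EWMA variance:
    target +/- L * sigma * sqrt(lam / (2 - lam) * (1 - (1 - lam)^(2t))).
    """
    z = target
    for t, x in enumerate(readings, start=1):
        z = lam * x + (1 - lam) * z
        half_width = L * sigma * math.sqrt(lam / (2 - lam) * (1 - (1 - lam) ** (2 * t)))
        yield t, z, abs(z - target) > half_width

# A sustained +2 sigma shift in a hypothetical sensor reading
alarms = [t for t, _, alarm in ewma_monitor([2.0] * 10, target=0.0, sigma=1.0) if alarm]
print(alarms[0])  # 3
```

Because each step keeps only `z`, the monitor uses constant memory per sensor, whatever the stream length.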
  • 7. All Elements Working Together As One System
    Solution: Big Data Analytics End-to-End
    Data sources:
    •  Incoming Components Data
    •  Drive Assembly and Manufacturing Test Data
    •  Drive Quality Engineering and Assurance Data
    •  Ongoing Quality and Reliability Test Data
    •  Returned Drives Test and Diagnostics Data
    •  Customer Integration and Field Data (including Field Telemetry)
    Coherent, Scalable Data Collection and Retention
    Big Data Analytics Infrastructure (H/W + S/W) and Algorithms
    Big Data-Driven Quality Decision Layer:
    •  Predictive Life Models
    •  Test Auto-Diagnostics and Alerts
    •  Predictive Financial Models
    •  Robust Excursion Detection Algorithms
    •  Ad-hoc Big Data Analytics Projects
    •  In-situ Failure Prediction
  • 8. Traditional Data Architecture Pressured
    •  4.4 ZB in 2013; 44 ZB by 2020 (ZB = 1B TB)
    •  85% from new data types
    •  15x machine data by 2020
    Sources: Reinsel, David. “Where in the World Is Storage: A Look at Byte Density Across the Globe,” IDC, October 2013; IDC/EMC Digital Universe, April 2014.
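The IDC figures imply a steep compound growth rate. Here is a quick check that uses only the numbers on the slide: 4.4 ZB in 2013 growing to 44 ZB by 2020, a span of seven years.

```python
import math

ZB_IN_TB = 10**21 // 10**12  # 1 ZB = 10^21 bytes, 1 TB = 10^12 bytes -> 1 billion TB

def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values."""
    return (end / start) ** (1 / years) - 1

def doubling_time(rate: float) -> float:
    """Years needed to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + rate)

rate = cagr(4.4, 44.0, 2020 - 2013)
print(f"{ZB_IN_TB:,} TB per ZB; growth ~{rate:.0%}/year; doubling every ~{doubling_time(rate):.1f} years")
# prints: 1,000,000,000 TB per ZB; growth ~39%/year; doubling every ~2.1 years
```

A tenfold increase over seven years means the global data volume roughly doubles every two years.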
  • 10. Evolving to a Logical Data Warehouse
    •  A Logical Data Warehouse combines traditional data warehouses with big data systems to evolve your analytics capabilities beyond where they are today
    •  Hadoop does not replace your EDW. The EDW is a good “general purpose” data management solution for integrating and conforming enterprise data to produce your everyday business analytics
    •  A typical EDW may have hundreds of data feeds and dozens of integrated applications, and run thousands to hundreds of thousands of queries a day
    •  Hadoop is more specialized and much less mature. For now it will have only a few application integration points and run fewer queries at lower concurrency, answering different questions
    •  A Hadoop cluster of 60-100 nodes is a supercomputer. What would you use a supercomputer for? Probably to answer the really big questions
  • 11. The Data Lake: Data Tiering
  • 12. Hadoop cluster data loading and querying
    Sources: EDW (pulled with SQOOP); Factory Data Systems (via UNIX staging; Compact & Load)
    HDFS tiers: Tier 1 (Raw Data Files); Tier 2 (Hive Structured Data Tables); Tier 3 (Derived Data Tables)
    Loading: two T1/T2 Application instances, with data labeled Drive Data 100%, 10% Drive Sampled, and Component
    Processing: YARN, MapReduce, Tez, Spark, SparkSQL, SparkR, Hive, HCatalog, Big SQL
    Access: READ and WRITE via JDBC | ODBC | Other Drivers
    Data Science Applications (SAS, R, Python, ML): R Shiny, H2O, Jupyter, R Studio
    Cluster management and monitoring: Ambari; Ganglia | Nagios
  • 13. Tier 1 / Tier 2 custom data loading application (T1/T2 App)
    Data Transport
    •  Sqoop: pull EDW data to HDFS Tier 1
    •  Non-EDW files (factory push):
       •  Trickle-feed files to a staging area
       •  Unzip, merge and re-zip small files into large files
       •  Push compacted files to HDFS Tier 1
    Data Mapping & Loading
    •  Match source/target columns
    •  Detect and handle column changes
    •  Transform data
    •  Insert or update data in Tier 2
    •  Dual feed to cluster 2
    Scheduling
    •  Oozie backend
    •  Configurable frequency (currently daily)
    •  Snapshots (wait for data loads to complete)
    •  Metadata backups
    Compaction
    •  Major and minor compaction
    •  Minor: merges small files into large ones
    •  Major: removes old versions of data (updates)
    •  Consolidates HDFS directories
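Minor compaction matters because HDFS handles a few large files far better than many small ones: every file and block is tracked in NameNode memory. Below is a minimal sketch of the "unzip, merge, re-zip" step. It assumes gzip-compressed, newline-terminated text files and a target output size chosen by the operator (for example, near a multiple of the HDFS block size). The function names and the greedy grouping are illustrative assumptions, since the talk does not describe how the T1/T2 App groups files.

```python
import gzip
import shutil
from typing import Dict, List

def plan_batches(sizes: Dict[str, int], target_bytes: int) -> List[List[str]]:
    """Greedily group files (in name order) into batches whose total size stays
    within target_bytes; a single oversized file gets a batch of its own."""
    batches: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name in sorted(sizes):
        size = sizes[name]
        if current and current_size + size > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches

def merge_gzip(paths: List[str], out_path: str) -> None:
    """Unzip each small file, append its content, and re-zip into one large file."""
    with gzip.open(out_path, "wb") as out:
        for path in paths:
            with gzip.open(path, "rb") as src:
                shutil.copyfileobj(src, out)

print(plan_batches({"a.gz": 40, "b.gz": 40, "c.gz": 40, "d.gz": 10}, target_bytes=100))
# [['a.gz', 'b.gz'], ['c.gz', 'd.gz']]
```

Each planned batch would be passed to `merge_gzip`, and the resulting large file would be pushed to Tier 1. If a source file lacks a trailing newline, its last record would run into the next file's first record, so a real loader would need to guard against that.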
  • 14. Enterprise Hadoop Architecture
    Data node spec (DN SPEC): CPUs: 12; HDDs: 12 x 3TB; RAM: 144GB; Network: dual 10Gbit
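The node spec supports a back-of-envelope capacity estimate. This sketch assumes HDFS's default replication factor of 3 and reserves 25% headroom for temporary/shuffle data and growth. The headroom figure is an assumption, not something from the slide. The sketch also ignores compression, which in practice changes the result substantially.

```python
import math

def usable_tb_per_node(disks: int = 12, disk_tb: float = 3.0,
                       replication: int = 3, headroom: float = 0.25) -> float:
    """Usable HDFS capacity of one data node after replication and reserved headroom."""
    raw_tb = disks * disk_tb  # 12 x 3TB = 36TB raw
    return raw_tb / replication * (1 - headroom)

def nodes_needed(data_tb: float, **node_kwargs) -> int:
    """Data nodes needed to hold data_tb of (uncompressed) data."""
    return math.ceil(data_tb / usable_tb_per_node(**node_kwargs))

print(usable_tb_per_node() * 100)  # 100-node cluster: 900.0 (TB usable)
print(nodes_needed(20_000))        # one year at ~20PB: 2223 (nodes)
```

Under these assumptions, a 100-node cluster holds about 18 days of the full ~50TB/day stream (900 / 50).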
  • 15. Some early Hadoop practices and learnings
    •  Incremental, phased delivery, or use case by use case
    •  Form a “data lake” or “data reservoir” for all enterprise data
    •  Data availability must come first; model and transform the data in place within Hadoop
    •  Resist moving the data again
    •  There is a lot of talk about schema-on-read, but for data-warehouse types of uses it is impractical
    •  Data modeling is still required but can be simplified
    •  Have multiple clusters: Development, Test and then two or more Production clusters, one for ad hoc data exploration & experimentation and one for more governed uses
    •  Use the existing custom query/analytics solution to provide “transparent” access to Hadoop
  • 16. Some early Hadoop practices and learnings
    •  Use partitioned/tiered data sets: raw, modeled/standardized, analytics, history/archive
       •  Tier 0: extended history/archives (if needed)
       •  Tier 1: low-latency raw data for power users to access using low-level tooling (MR, Python)
       •  Tier 2: de-duped, modeled and transformed data used by the majority of Hadoop users
       •  Tier 3: specialized analytic data sets for specific needs (e.g. data pivots, aggregations)
    •  Copy summarized data and derived analytics to the EDW for broader use/analysis with BI tools
    •  Do lots of performance testing, run benchmarks, continually optimize
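Here is a minimal sketch of how Tier 1 raw rows might become a de-duplicated Tier 2 table and then a Tier 3 aggregate. The record fields (`serial`, `station`, `passed`, `load_ts`) and the "latest load wins" rule are hypothetical illustrations, not Seagate's schema. On the cluster, this logic would typically run in Hive, Big SQL or Spark rather than in plain Python.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List, NamedTuple, Tuple

class TestRecord(NamedTuple):
    serial: str    # hypothetical drive serial number
    station: str   # hypothetical test-station id
    passed: bool
    load_ts: int   # when this version of the row was loaded

def to_tier2(raw: Iterable[TestRecord]) -> List[TestRecord]:
    """De-dup Tier 1 rows: keep only the latest-loaded version per (serial, station)."""
    latest: Dict[Tuple[str, str], TestRecord] = {}
    for rec in raw:
        key = (rec.serial, rec.station)
        if key not in latest or rec.load_ts > latest[key].load_ts:
            latest[key] = rec
    return sorted(latest.values())

def to_tier3(tier2: Iterable[TestRecord]) -> Dict[str, float]:
    """Derive a Tier 3 aggregate: pass rate per test station."""
    totals: DefaultDict[str, int] = defaultdict(int)
    passes: DefaultDict[str, int] = defaultdict(int)
    for rec in tier2:
        totals[rec.station] += 1
        passes[rec.station] += rec.passed
    return {s: passes[s] / totals[s] for s in sorted(totals)}

raw = [
    TestRecord("D1", "ST1", False, 1),
    TestRecord("D1", "ST1", True, 2),   # later version supersedes the first
    TestRecord("D2", "ST1", True, 1),
    TestRecord("D2", "ST1", True, 1),   # exact duplicate from a reload
    TestRecord("D3", "ST2", False, 1),
]
print(to_tier3(to_tier2(raw)))  # {'ST1': 1.0, 'ST2': 0.0}
```

Keeping only the latest version per key also mirrors the "major compaction" step on the loading-application slide, where old versions of updated data are removed.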
  • 17. Data Science Theme | Example
    •  Image Analytics & Pattern Recognition | Media flaw pattern recognition
    •  Machine Learning | Reduce drive testing through failure prediction
    •  Anomaly Detection | Multivariate SPC
    •  Predictive Analytics | Predict process interactions that are critical to quality and yields
    •  Prognostic Health | Field telemetry analytics
    •  Data Science Research | Automated feature extraction; deep learning & image analytics; imbalanced data; data science tooling
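Multivariate SPC monitors several correlated process variables jointly, rather than with one chart per variable. A common statistic for this is Hotelling's T^2 = (x - mean)^T S^-1 (x - mean), where mean and S are the mean vector and covariance matrix of an in-control baseline. The sketch below uses NumPy on synthetic data. In practice the control limit would come from an F or chi-square distribution or from historical data, so no threshold is fixed here. The talk does not say which anomaly-detection method Seagate uses.

```python
import numpy as np

def hotelling_t2(baseline: np.ndarray, obs: np.ndarray) -> np.ndarray:
    """Hotelling T^2 of each row of obs relative to an in-control baseline (rows = samples)."""
    mean = baseline.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(baseline, rowvar=False))
    diff = obs - mean
    # Row-wise quadratic form diff_i^T S^-1 diff_i
    return np.einsum("ij,jk,ik->i", diff, cov_inv, diff)

rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = x1 + 0.1 * rng.normal(size=500)  # two strongly correlated synthetic sensors
baseline = np.column_stack([x1, x2])

# Both points are within +/-3 sigma on each variable individually,
# but (1, -1) breaks the correlation between the two sensors.
t2 = hotelling_t2(baseline, np.array([[1.0, 1.0], [1.0, -1.0]]))
print(t2.round(1))
```

The first point scores a T^2 near 1, while the second scores several hundred. Two separate univariate charts would pass both points.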
  • 18. Data Science In Action
    [Chart: predicted (red) vs. actual (blue)]
  • 19. Hadoop challenges: an emerging and evolving platform
    •  Knowing which Hadoop projects to “bet on” and which data formats and compression types to use
    •  Speed of change: probably more code has been written for it than for any other IT platform
    •  Need to upgrade cluster software frequently (once a quarter)
    •  Gaps: some things, such as ACID transactions and real-time queries, are not ready yet
    •  Resource management for different types of workloads
    •  Lack of BI tools that can really take advantage of huge data sets and visualize them
    •  Still very batch-processing oriented, but interactive use is gaining traction with Spark etc.
    •  Provisioning large numbers of machines; hardware failures
    •  Integrating remote clusters, cross-cluster data movement and inter-cluster processing
  • 20. Big Data Analytics Platform Evolution
    Data Sources
    Storage: Seagate Data Centers; Cloud Data Storage (Cloud Data Centers)
    Compute; Compute (elastic); Cloud automation workflows
    Data Virtualization; Data Engineering
    Data Preparation GUI; Analytics; Data Science Programming Tools; Visualization, BI, Reporting
    Users: Citizen Data Scientist & SME; Data Scientist; Business Intelligence, Report Designer