Hype, Hopes, Hell & Hadoop
Big Data: Reality Check and Infrastructure Implications of “The Enterprise of Everything”
Jean-Luc Chatelain, EVP & CTO
StampedeCon 2014
© 2014 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others.
Any statements or representations around future events are subject to change. ddn.com
And now, a quick word from my sponsor ☺
DDN | Who We Are
• Main Office: Santa Clara, California, USA
• Employees: ~550 in 20 Countries
• Installed Base: End Customers in 50 Countries
• Go To Market: Partner & Reseller Assisted, Direct
• DDN: World’s Largest Private Storage Company
We Design, Deploy and Optimize Storage Systems that Solve HPC, Big Data and Cloud Business Challenges at Scale
World-Renowned & Award-Winning
Big Data & Cloud Infrastructure
DDN’s Award-Winning Product Portfolio
• Analytics Reference Architectures
• EXAScaler™: Petascale Lustre® Storage
– 10Ks of Clients
– 1TB/s+, HSM
– Linux HPC Clients
– NFS & CIFS [2014]
• GRIDScaler™: Enterprise Scale-Out File Storage
– ~10K Clients
– 1TB/s+, HSM
– Linux/Windows HPC Clients
– NFS & CIFS
• Storage Fusion Architecture™ Core Storage Platforms
– SFA12KX™: 48GB/s, 1.7M IOPS; 1,680 Drives in 2 Racks; Optional Embedded Computing
– SFA7700™ (7700X, 7700E): 13GB/s; 600K IOPS
– Flexible Drive Configuration: SATA, SSD, SAS
– SFX™ Automated Flash Caching: Adaptive Transparent Flash Cache; SFX API Gives Users Control [pre-staging, alignment, bypass]
• WOS® 3.0
– 32 Trillion Unique Objects
– Geo-Replicated Cloud Storage
– 256 Million Objects/Second
– Self-Healing Cloud
– Embedded metadata mgmt
• WOS7000: 60 Drives in 4U; Self-Contained Servers
• Cloud Foundation
• Big Data Platform Management: DirectMon®
• Cloud Tiering
• S3/Swift
• Infinite Memory Engine™: Distributed File System Buffer Cache
Hype & Hopes
Hype
[Figure: hype-cycle chart; labels: 2011, 2014, Today]
#bigdata in the trough of disillusionment is great news for the enterprise!
Back To The Future?
The term “Big Data” was coined circa 1999 (1)
• Pervasive in some existing markets since the late ’90s
– HPC sensu latissimo
– Life Sciences
– Intelligence
– ASP (remember that word?)
Is there anything new here? Why the hype?
(1) Diebold, “A Personal Perspective on the Origin(s) and Development of Big Data”, 2012
Is There a #bigdata Definition?
For some yes; for others no – or maybe there are multiple definitions
• It is “a basket of technologies”
• It creates “a mindset change in decision making”
“Data sets that exceed the boundaries and sizes of current infrastructure capabilities, forcing technologists to take a non-traditional approach”
[Figure: “Normal Processing Capabilities” chart; axes: File/Object Size, Content Volume and Activity: IOPS; labels: Lots of data, Large file sizes, Lots of transactions]
#bigdata: 2 Dimensions of the 3 V’s
• Volume: Petabytes of Data, but also Trillions of Information Objects
• Velocity: GB/s to TB/s, but also Millions of Information Objects per second
• Variety: Structured & Unstructured, but also Stream & Batch workloads
The “trillions” & “millions” are the primary drivers of complexity and challenge “Time to Results”
Remember . . .
1 ms lost per operation on a billion-operation workload = about 11.6 days lost!
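The arithmetic behind that last line can be checked with a short sketch. It assumes the operations run strictly one after another, so the per-operation overhead simply adds up; the function name `days_lost` is made up for illustration and is not from the talk.

```python
# Back-of-the-envelope check of the "1 ms per operation" claim.
# Assumption: operations are serial, so the overhead accumulates linearly.
SECONDS_PER_DAY = 24 * 60 * 60


def days_lost(operations: int, overhead_ms: float) -> float:
    """Total time lost, in days, when every operation pays overhead_ms extra."""
    total_seconds = operations * overhead_ms / 1000.0
    return total_seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    lost = days_lost(1_000_000_000, 1.0)
    print(f"{lost:.2f} days")  # 11.57 days
```

With any parallelism the wall-clock loss shrinks accordingly, but the aggregate compute time lost stays the same.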
So, is #bigdata the new thing?
Quiz!
The Dawn of a Telemetry Revolution
[Figure labels: Internet of Things, Social, Sensors, Telemetry Revolution]
The Birth of a Mindset Change in Business Decision Making
Hell
Governance, Regulation, Compliance
The Universe of Big Data is a massive black hole into which GRC has fallen
• Governance
• Regulation
• Compliance
• Security
• Privacy
Now, welcome to the era of shadow data and behold the plague of hyper-scalability
Tackling #bigdata Is Non-trivial
• Value extraction (insights driving business results) is only done on 1% of total enterprise data
• Time to value & time to result are business critical
– Inadequate infrastructure = failure & credibility loss
• The cardinality dimensions of the 3 V’s are the infrastructure killers
– Material: network, compute, storage
– Human: DBA, sysadmin & storage admin
• Today a #bigdata project cannot live in IT or it will fail
• Dare to be different
• #bigdata nullifies the feature race and favors the benefit race
Let’s Talk Real #bignumbers
HPC is a forward-looking time machine that eats #bigdata for lunch
• Enterprise’s #bigdata problems of today were HPC problems 3 to 5 years ago
• HPC & Web architectures are converging
The #bigdata Effect on Existing IT Infrastructures
Top 3 #bigdata Infrastructure Challenges
The Scalability Devil Effect on Typical Analytics
• Economics of large capacity EDW storage
• Scalability of NAS/SAN file systems
• Bandwidth demand of OLAP engine
• IOPS demand of modeling
• Memory requirements of visualization
• MPP drives I/O blending
[Diagram labels: Structured Data, Unstructured Data, ETL, EDW, NAS/SAN, OLAP Engine, Semantic Engine, Model, Visualize, Report]
Hadoop
Hadoop
• IS NOT a person or the solution to world famine or a BI platform or an analytics platform or an EDW or a CEP engine or …
• IS a growing basket of technologies facilitating BI and/or analytics, especially if there is a lot of unstructured data
• IS at the core of many “science projects”
• IS in the infancy of deployment in the traditional enterprise
• HDFS “data lake” concept is very important
BI & Analytics Today
[Diagram labels: Database, File System, ETL (primary), Enterprise Data Warehouse, Reporting & Visualization, ETL (secondary), Analytics, CEP, Business Auditing & Planning]
Hadoop Effect
[Diagram labels: Database, ETL, Enterprise Data Warehouse, Reporting & Visualization, Analytics, CEP, Business Auditing & Planning, Business Data Warehouse]
#bigdata “At Work” with DDN
Case Studies
Accelerating Fraud Awareness
Harnessing Hadoop and Big Data
DDN helps PayPal’s Financial Linking System achieve 200–250ms processing and customer transparency
“On the cost side, the same performance at 3-4 times less cost, that’s clearly important. The fact is, you’ve got scalability you didn’t have previously.”
Ryan Quick, Principal Architect, PayPal
Accelerating Financial Insights
“Other technologies paled in comparison to the performance levels achieved with DDN’s SFA12K.”
Brian Alexseychuk, Managing Director of Infrastructure
• Resolved scaling challenges and parallelized workflows
• Exceeded competitors on metrics such as scalability, speed, density, and TCO
• Improved revenues, reduced trade slippage by 70% & cut telecom expenses
Accelerating Time To Cure
“If you can serve some of the fastest computers on the planet, then you can help us.”
Phil Butcher, Head of IT
“If you need 10K cores to perform an extra layer of analysis in an hour … you need a real solution that can address everything from very small to extremely large data sets.”
Tim Cutts, Head of Scientific Computing
Accelerating Intelligence Insights
Naval Research Lab
Large Data Program
Application
• Deep storage & fast distributed search
• Super-HD, 2/3-D, and streaming data
DDN enables rapid threat detection by speeding up real-time data and imagery up to 500%.
In Conclusion
2 Faces of #bigdata = Opportunities for Innovation
Technology
– Hyper-scalability: DB & FS
– Privacy (masking, obfuscation)
– Keyless security
– Visualization and navigation of large datasets
– HDFS persistence
– Provenance
– In-memory computing
– In-storage processing
– GraphDB on MPP
– Brute force or machine learning?
– Predictive & prescriptive analytics
Business
– Agility
– Narrowcast solutions with higher stickiness
– Data-driven business decisions
– Retain existing customers and gain new ones
@informationcto

Hype, Hopes, Hell & Hadoop (#bigdata and the enterprise of everything)

  • 1. Hype, Hopes, Hell & Hadoop Big Data: Reality Check and Infrastructure Implications of “The Enterprise of Everything” Jean-Luc Chatelain, EVP & CTOStampedeCon 2014
  • 2. 2 © 2014 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others. Any statements or representations around future events are subject to change. ddn.com 2 And now, a quick word from my sponsor 
  • 3. 3 © 2014 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others. Any statements or representations around future events are subject to change. ddn.com DDN | Who We Are • Main Office: Santa Clara, California, USA • Employees: ~550 in 20 Countries • Installed Base: End Customers in 50 Countries • Go To Market: Partner & Reseller Assisted, Direct • DDN: World’s Largest Private Storage Company We Design, Deploy and Optimize Storage Systems that Solve HPC, Big Data and Cloud Business Challenges at Scale World-Renowned & Award-Winning
  • 4. 4 © 2014 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others. Any statements or representations around future events are subject to change. ddn.com Big Data & Cloud Infrastructure DDN’s Award-Winning Product Portfolio Analytics Reference Architectures EXAScaler™ 10Ks of Clients 1TB/s+, HSM Linux HPC Clients NFS & CIFS [2014] Petascale Lustre® Storage Enterprise Scale-Out File Storage GRIDScaler™ ~10K Clients 1TB/s+, HSM Linux/Windows HPC Clients NFS & CIFS SFA12KX™ 48GB/s, 1.7M IOPS 1,680 Drives in 2 Racks Optional Embedded Computing SFA7700™ 13GB/s; 600K IOPS • 7700X • 7700E Storage Fusion Architecture™ Core Storage Platforms SATA SSD Flexible Drive Configuration SAS SFX™ Automated Flash Caching WOS® 3.0 32 Trillion Unique Objects Geo-Replicated Cloud Storage 256 Million Objects/Second Self-Healing Cloud Embedded metadata mgmt Cloud Foundation Big Data Platform Management DirectMon® Cloud Tiering Infinite Memory Engine™ Distributed File System Buffer Cache WOS7000 60 Drives in 4U Self-Contained Servers Adaptive Transparent Flash Cache SFX API Gives Users Control [pre-staging, alignment, bypass] S3/Swift
  • 6. 6 © 2014 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others. Any statements or representations around future events are subject to change. ddn.com Hype 2011 2014 #bigdata in the trough of disillusion is great news for the enterprise! Today
  • 7. 7 © 2014 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others. Any statements or representations around future events are subject to change. ddn.com Back To The Future? The term “Big Data” coined circa 1999(1) • Pervasive in some existing markets since late 90’s – HPC sensu latissimo – Life Sciences – Intelligence – ASP (remember that word?) Is there anything new here? Why the hype? (1) A Personal Perspective on the Origin(s) and Development of Big Data" Diebold 2012
  • 8. 8 © 2014 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others. Any statements or representations around future events are subject to change. ddn.com Is There a #bigdata Definition? For some yes; for others no – or maybe there are multiple definitions • It is “a basket of technologies” • It creates “a mindset change in decision making” “Data sets that exceed the boundaries and sizes of current infrastructure capabilities, forcing technologists to take a non-traditional approach” Normal Processing Capabilities File/Object Size, Content Volume Activity:IOPS Lots of data Large file sizes Lots of transactions
  • 9. 9 © 2014 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others. Any statements or representations around future events are subject to change. ddn.com #bigdata: 2 Dimensions of the 3 V’s Petabytes of Data but also Trillions of Information Objects GB/s to TB/s but also Millions of Information Object per second Structured & Unstructured but also Streams & Batches workloads The “trillions” & “millions” are the primary drivers of complexity and challenge “Time to Results” VelocityVolume Variety Remember . . . 1ms lost per operation on a billion operations workload= 11.5 days lost!
  • 10. 10 © 2014 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others. Any statements or representations around future events are subject to change. ddn.com So, is #bigdata the new thing?
  • 11. 11 © 2014 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others. Any statements or representations around future events are subject to change. ddn.com Quiz!
  • 12. 12 © 2014 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others. Any statements or representations around future events are subject to change. ddn.com The Dawn of a Telemetry Revolution Internet of Things Social Sensors Telemetry Revolution The Birth of a Mindset Change in Business Decision Making
  • 13. Hell
  • 14. 14 © 2014 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others. Any statements or representations around future events are subject to change. ddn.com Governance, Regulation, Compliance The Universe of Big Data is a massive black hole into which GRC has fallen • Governance • Regulation • Compliance • Security • Privacy Now, welcome to the era of shadow data and behold the plague of hyper-scalability
  • 15. 15 © 2014 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others. Any statements or representations around future events are subject to change. ddn.com Tackling #bigdata Is Non-trivial Value extraction (insights driving business results) is only done on 1% of total enterprise data Time to value & time to result is business critical – Inadequate infrastructure = failure & credibility loss The cardinality dimensions of the 3V’s are the infrastructure killers Material: network, compute, storage – Human: DBA, sysadmin & storadmin Today #bigdata project cannot live in IT or it will fail Dare to be different #bigdata nullifies the feature race and favors the benefit race
  • 16. Let’s Talk Real #bignumbers. HPC is a forward-looking time machine that eats #bigdata for lunch • Enterprise’s #bigdata problems of today were HPC problems 3 to 5 years ago • HPC & Web architectures are converging
  • 17. The #bigdata Effect on Existing IT Infrastructures
  • 18. Top 3 #bigdata Infrastructure Challenges
  • 19. The Scalability Devil Effect on Typical Analytics • Economics of large capacity EDW storage • Scalability of NAS/SAN file systems • Bandwidth demand of OLAP engine • IOPS demand of modeling • Memory requirements of visualization • MPP drives I/O blending [Diagram labels: Structured Data, Unstructured Data, ETL, EDW, NAS/SAN, OLAP Engine, Semantic Engine, Model, Visualize, Report]
  • 21. Hadoop • IS NOT a person or the solution to world famine or a BI platform or an analytics platform or an EDW or a CEP engine or … • IS a growing basket of technologies facilitating BI and/or analytics, especially if there is a lot of unstructured data • IS at the core of many “science projects” • IS in the infancy of deployment in the traditional enterprise • HDFS “data lake” concept is very important
  • 22. BI & Analytics Today [Diagram labels: Database, File System, ETL (primary), Enterprise Data Warehouse, Reporting & Visualization, ETL (secondary), Analytics, CEP, Business Auditing & Planning]
  • 23. Hadoop Effect [Diagram labels: Database, ETL, Enterprise Data Warehouse, Reporting & Visualization, Analytics, CEP, Business Auditing & Planning, Business Data Warehouse]
  • 24. #bigdata “At Work” with DDN: Case Studies
  • 25. Accelerating Fraud Awareness: Harnessing Hadoop and Big Data. DDN helps PayPal’s Financial Linking System achieve 200–250 ms processing and customer transparency. “On the cost side, the same performance at 3-4 times less cost, that’s clearly important. The fact is, you’ve got scalability you didn’t have previously.” Ryan Quick, Principal Architect, PayPal
  • 26. Accelerating Financial Insights. “Other technologies paled in comparison to the performance levels achieved with DDN’s SFA12K.” Brian Alexseychuk, Managing Director of Infrastructure • Resolved scaling challenges and parallelized workflows • Exceeded competitors on metrics such as scalability, speed, density, and TCO • Improved revenues, reduced trade slippage by 70% & cut telecom expenses
  • 27. Accelerating Time To Cure. “If you can serve some of the fastest computers on the planet, then you can help us.” Phil Butcher, Head IT. “If you need 10K cores to perform an extra layer of analysis in an hour … you need a real solution that can address everything from very small to extremely large data sets.” Tim Cutts, Head of Scientific Computing
  • 28. Accelerating Intelligence Insights. Naval Research Lab Large Data Program Application • Deep storage & fast distributed search • Super-HD, 2/3-D, and streaming data. DDN enables rapid threat detection by speeding up real-time data and imagery up to 500%.
  • 30. 2 Faces of #bigdata = Opportunities for Innovation. Technology: Hyper-scalability (DB & FS); Privacy (masking, obfuscation); Keyless security; Visualization and navigation of large datasets; HDFS persistence; Provenance; In-memory computing; In-Storage Processing; GraphDB on MPP; Brute force or machine learning?; Predictive & prescriptive analytics. Business: Agility; Narrow-casted solutions with higher stickiness; Data-driven business decisions; Retain existing customers and gain new ones
  • 31. @informationcto