SlideShare a Scribd company logo
LEARN • NETWORK • COLLABORATE • INFLUENCE
LEARN • NETWORK • COLLABORATE • INFLUENCE
Deploying Big Data platforms
LEARN • NETWORK • COLLABORATE • INFLUENCE
Chris Kernaghan
Principal Consultant
LEARN • NETWORK • COLLABORATE • INFLUENCE
Cholera epidemic first use of big data
LEARN • NETWORK • COLLABORATE • INFLUENCE
Big Data Epidemiology by Google
LEARN • NETWORK • COLLABORATE • INFLUENCE
How I really got started in Big Data
John, we need
to give Chris
more grey hair
Let’s throw him
into a Big Data
demo
LEARN • NETWORK • COLLABORATE • INFLUENCE
My examples
LEARN • NETWORK • COLLABORATE • INFLUENCE
LEARN • NETWORK • COLLABORATE • INFLUENCE
Areas of focus
Data acquisition
and curation
Data storage Compute
infrastructure
Analysis and
Insight
Everything as Code*
* Well As much as possible
LEARN • NETWORK • COLLABORATE • INFLUENCE
Data Acquisition and curation
Areas of focus
LEARN • NETWORK • COLLABORATE • INFLUENCE
Data Lake
HANA
LEARN • NETWORK • COLLABORATE • INFLUENCE
How big was the Panama Papers data set
LEARN • NETWORK • COLLABORATE • INFLUENCE
How big was the Panama Papers data set
LEARN • NETWORK • COLLABORATE • INFLUENCE
Data Lake
Panama Papers Technology stack
SQL
LEARN • NETWORK • COLLABORATE • INFLUENCE
The tools used supported 370 journalists from
around the world
Infrastructure
was a pool of
up to 40
servers run in
AWS
LEARN • NETWORK • COLLABORATE • INFLUENCE
Data quality and curation are not one time activities
Remove the human element as much as possible
LEARN • NETWORK • COLLABORATE • INFLUENCE
Data security
• Data lake
– What data do you collect
– Do you have restrictions on what data can be combined
– How long does your data live
LEARN • NETWORK • COLLABORATE • INFLUENCE
Data security
• Geographical concerns
– Where does your data reside
LEARN • NETWORK • COLLABORATE • INFLUENCE
Data security
• Authentication
– Who is accessing your data
LEARN • NETWORK • COLLABORATE • INFLUENCE
Data Storage
Areas of focus
LEARN • NETWORK • COLLABORATE • INFLUENCE
How BIG is Big Data
LEARN • NETWORK • COLLABORATE • INFLUENCE
LEARN • NETWORK • COLLABORATE • INFLUENCE
Storage Considerations
• IOPS are still important
– Big data still uses a lot of spinning disk
• Replication and Redundancy
– Eats a lot of disk space
• Build for failure
• Sometimes you have to go in-memory
LEARN • NETWORK • COLLABORATE • INFLUENCE
Compute infrastructure
Areas of focus
LEARN • NETWORK • COLLABORATE • INFLUENCE
Structured Reporting Versus Big Data/Science
Compute requirements
2
• Structured reporting systems run business processes
– Sized and static
– Under change control
– Business centric
LEARN • NETWORK • COLLABORATE • INFLUENCE
Structured Reporting Versus Big Data/Science
Compute requirements
2
• Data science systems answer difficult questions irregularly
– Cloud or heavy use of virtualisation
– Developer centric
– Rapidly evolving
LEARN • NETWORK • COLLABORATE • INFLUENCE
What you still need to remember
2
• Compute is cheap
• Scalability is critical
LEARN • NETWORK • COLLABORATE • INFLUENCE
What you still need to remember
2
• Software definition for consistency
• Automate as much as possible
LEARN • NETWORK • COLLABORATE • INFLUENCE
2
100 Hadoop
Nodes
122GB RAM
Each = 12.2TB RAM
Build time of 3Hrs
LEARN • NETWORK • COLLABORATE • INFLUENCE
Use of scripted builds from VM to application
2
Disk definition
Network
defintion
Software
Install
LEARN • NETWORK • COLLABORATE • INFLUENCE
Use of scripted builds from VM to application
3
• Deployment was consistent for each and every node of the
cluster
– Hostnames defined the same way
– Configuration files created the same way
LEARN • NETWORK • COLLABORATE • INFLUENCE
Use of scripted builds from VM to application
3
• Faster deployment
– Automated build 3hrs to build and deploy 100 nodes
– Manual build 800hrs + to build and deploy 100 nodes
• Use of automated tools to detect failure and start new node
(ElasticBeanstalk)
LEARN • NETWORK • COLLABORATE • INFLUENCE
Use of scripted builds from VM to application
3
• Reusability of script
– Heavy use of parameters means it is adaptable
• Use of Git meant distributed development was handled easily
LEARN • NETWORK • COLLABORATE • INFLUENCE
LEARN • NETWORK • COLLABORATE • INFLUENCE
Analysis and Insight
3
Areas of focus
Presentation Tag Line
LEARN • NETWORK • COLLABORATE • INFLUENCE
Query the Data
• Programmatically
– Python
– R
• Application
– Lumira
– Business Objects
– Spark
– SQL
– Excel
– ElasticSearch
LEARN • NETWORK • COLLABORATE • INFLUENCE
Analysis and Visualisation
• Quick Analysis
– Lumira, Excel
• Graph
– Neo4J, Synerscope
• Charts
– Business Objects, Grafana, Kibana
• Dynamic
– D3
http://www.wikiviz.org/wiki/Tools
LEARN • NETWORK • COLLABORATE • INFLUENCE
Things to remember
• Remember the type of
platform you are using
• Storage is cheap but not
all storage is equal
• Scalability is critical
• Version control rocks
• Automate everything
you can
• Value is in the data but
not all data is valuable
• Data should not live
forever
LEARN • NETWORK • COLLABORATE • INFLUENCE
3
• Key Takeways
LEARN • NETWORK • COLLABORATE • INFLUENCE

More Related Content

What's hot

How to Use Innovative Data Handling and Processing Techniques to Drive Alpha ...
How to Use Innovative Data Handling and Processing Techniques to Drive Alpha ...How to Use Innovative Data Handling and Processing Techniques to Drive Alpha ...
How to Use Innovative Data Handling and Processing Techniques to Drive Alpha ...DataWorks Summit
 
Solr + Hadoop: Interactive Search for Hadoop
Solr + Hadoop: Interactive Search for HadoopSolr + Hadoop: Interactive Search for Hadoop
Solr + Hadoop: Interactive Search for Hadoopgregchanan
 
How To Tell if Your Business Needs NoSQL
How To Tell if Your Business Needs NoSQLHow To Tell if Your Business Needs NoSQL
How To Tell if Your Business Needs NoSQLDataStax
 
Enterprise Data Warehouse Optimization: 7 Keys to Success
Enterprise Data Warehouse Optimization: 7 Keys to SuccessEnterprise Data Warehouse Optimization: 7 Keys to Success
Enterprise Data Warehouse Optimization: 7 Keys to SuccessHortonworks
 
Glassbeam: Ad-hoc Analytics on Internet of Complex Things with Apache Cassand...
Glassbeam: Ad-hoc Analytics on Internet of Complex Things with Apache Cassand...Glassbeam: Ad-hoc Analytics on Internet of Complex Things with Apache Cassand...
Glassbeam: Ad-hoc Analytics on Internet of Complex Things with Apache Cassand...DataStax Academy
 
Getting Ready to Use Redis with Apache Spark with Tague Griffith
Getting Ready to Use Redis with Apache Spark with Tague GriffithGetting Ready to Use Redis with Apache Spark with Tague Griffith
Getting Ready to Use Redis with Apache Spark with Tague GriffithDatabricks
 
Lambda architecture for real time big data
Lambda architecture for real time big dataLambda architecture for real time big data
Lambda architecture for real time big dataTrieu Nguyen
 
The Hidden Value of Hadoop Migration
The Hidden Value of Hadoop MigrationThe Hidden Value of Hadoop Migration
The Hidden Value of Hadoop MigrationDatabricks
 
Analyzing the World's Largest Security Data Lake!
Analyzing the World's Largest Security Data Lake!Analyzing the World's Largest Security Data Lake!
Analyzing the World's Largest Security Data Lake!DataWorks Summit
 
Redash: Open Source SQL Analytics on Data Lakes
Redash: Open Source SQL Analytics on Data LakesRedash: Open Source SQL Analytics on Data Lakes
Redash: Open Source SQL Analytics on Data LakesDatabricks
 
Complex Data Transformations Made Easy
Complex Data Transformations Made EasyComplex Data Transformations Made Easy
Complex Data Transformations Made EasyData Con LA
 
The Future of Analytics, Data Integration and BI on Big Data Platforms
The Future of Analytics, Data Integration and BI on Big Data PlatformsThe Future of Analytics, Data Integration and BI on Big Data Platforms
The Future of Analytics, Data Integration and BI on Big Data PlatformsMark Rittman
 
Introduction to Big Data Technologies: Hadoop/EMR/Map Reduce & Redshift
Introduction to Big Data Technologies:  Hadoop/EMR/Map Reduce & RedshiftIntroduction to Big Data Technologies:  Hadoop/EMR/Map Reduce & Redshift
Introduction to Big Data Technologies: Hadoop/EMR/Map Reduce & RedshiftDataKitchen
 
IMCSummit 2015 - Day 2 Developer Track - The Internet of Analytics – Discover...
IMCSummit 2015 - Day 2 Developer Track - The Internet of Analytics – Discover...IMCSummit 2015 - Day 2 Developer Track - The Internet of Analytics – Discover...
IMCSummit 2015 - Day 2 Developer Track - The Internet of Analytics – Discover...In-Memory Computing Summit
 
ProtectWise Revolutionizes Enterprise Network Security in the Cloud with Data...
ProtectWise Revolutionizes Enterprise Network Security in the Cloud with Data...ProtectWise Revolutionizes Enterprise Network Security in the Cloud with Data...
ProtectWise Revolutionizes Enterprise Network Security in the Cloud with Data...DataStax Academy
 
Reliable Data Intestion in BigData / IoT
Reliable Data Intestion in BigData / IoTReliable Data Intestion in BigData / IoT
Reliable Data Intestion in BigData / IoTGuido Schmutz
 
Pivotal - Advanced Analytics for Telecommunications
Pivotal - Advanced Analytics for Telecommunications Pivotal - Advanced Analytics for Telecommunications
Pivotal - Advanced Analytics for Telecommunications Hortonworks
 
Data Science with Apache Spark - Crash Course - HS16SJ
Data Science with Apache Spark - Crash Course - HS16SJData Science with Apache Spark - Crash Course - HS16SJ
Data Science with Apache Spark - Crash Course - HS16SJDataWorks Summit/Hadoop Summit
 

What's hot (20)

How to Use Innovative Data Handling and Processing Techniques to Drive Alpha ...
How to Use Innovative Data Handling and Processing Techniques to Drive Alpha ...How to Use Innovative Data Handling and Processing Techniques to Drive Alpha ...
How to Use Innovative Data Handling and Processing Techniques to Drive Alpha ...
 
Solr + Hadoop: Interactive Search for Hadoop
Solr + Hadoop: Interactive Search for HadoopSolr + Hadoop: Interactive Search for Hadoop
Solr + Hadoop: Interactive Search for Hadoop
 
How To Tell if Your Business Needs NoSQL
How To Tell if Your Business Needs NoSQLHow To Tell if Your Business Needs NoSQL
How To Tell if Your Business Needs NoSQL
 
Enterprise Data Warehouse Optimization: 7 Keys to Success
Enterprise Data Warehouse Optimization: 7 Keys to SuccessEnterprise Data Warehouse Optimization: 7 Keys to Success
Enterprise Data Warehouse Optimization: 7 Keys to Success
 
Glassbeam: Ad-hoc Analytics on Internet of Complex Things with Apache Cassand...
Glassbeam: Ad-hoc Analytics on Internet of Complex Things with Apache Cassand...Glassbeam: Ad-hoc Analytics on Internet of Complex Things with Apache Cassand...
Glassbeam: Ad-hoc Analytics on Internet of Complex Things with Apache Cassand...
 
Instrumenting your Instruments
Instrumenting your Instruments Instrumenting your Instruments
Instrumenting your Instruments
 
Getting Ready to Use Redis with Apache Spark with Tague Griffith
Getting Ready to Use Redis with Apache Spark with Tague GriffithGetting Ready to Use Redis with Apache Spark with Tague Griffith
Getting Ready to Use Redis with Apache Spark with Tague Griffith
 
Lambda architecture for real time big data
Lambda architecture for real time big dataLambda architecture for real time big data
Lambda architecture for real time big data
 
The Hidden Value of Hadoop Migration
The Hidden Value of Hadoop MigrationThe Hidden Value of Hadoop Migration
The Hidden Value of Hadoop Migration
 
Analyzing the World's Largest Security Data Lake!
Analyzing the World's Largest Security Data Lake!Analyzing the World's Largest Security Data Lake!
Analyzing the World's Largest Security Data Lake!
 
Redash: Open Source SQL Analytics on Data Lakes
Redash: Open Source SQL Analytics on Data LakesRedash: Open Source SQL Analytics on Data Lakes
Redash: Open Source SQL Analytics on Data Lakes
 
Complex Data Transformations Made Easy
Complex Data Transformations Made EasyComplex Data Transformations Made Easy
Complex Data Transformations Made Easy
 
The Future of Analytics, Data Integration and BI on Big Data Platforms
The Future of Analytics, Data Integration and BI on Big Data PlatformsThe Future of Analytics, Data Integration and BI on Big Data Platforms
The Future of Analytics, Data Integration and BI on Big Data Platforms
 
Introduction to Big Data Technologies: Hadoop/EMR/Map Reduce & Redshift
Introduction to Big Data Technologies:  Hadoop/EMR/Map Reduce & RedshiftIntroduction to Big Data Technologies:  Hadoop/EMR/Map Reduce & Redshift
Introduction to Big Data Technologies: Hadoop/EMR/Map Reduce & Redshift
 
IMCSummit 2015 - Day 2 Developer Track - The Internet of Analytics – Discover...
IMCSummit 2015 - Day 2 Developer Track - The Internet of Analytics – Discover...IMCSummit 2015 - Day 2 Developer Track - The Internet of Analytics – Discover...
IMCSummit 2015 - Day 2 Developer Track - The Internet of Analytics – Discover...
 
ProtectWise Revolutionizes Enterprise Network Security in the Cloud with Data...
ProtectWise Revolutionizes Enterprise Network Security in the Cloud with Data...ProtectWise Revolutionizes Enterprise Network Security in the Cloud with Data...
ProtectWise Revolutionizes Enterprise Network Security in the Cloud with Data...
 
Reliable Data Intestion in BigData / IoT
Reliable Data Intestion in BigData / IoTReliable Data Intestion in BigData / IoT
Reliable Data Intestion in BigData / IoT
 
ASPgems - kappa architecture
ASPgems - kappa architectureASPgems - kappa architecture
ASPgems - kappa architecture
 
Pivotal - Advanced Analytics for Telecommunications
Pivotal - Advanced Analytics for Telecommunications Pivotal - Advanced Analytics for Telecommunications
Pivotal - Advanced Analytics for Telecommunications
 
Data Science with Apache Spark - Crash Course - HS16SJ
Data Science with Apache Spark - Crash Course - HS16SJData Science with Apache Spark - Crash Course - HS16SJ
Data Science with Apache Spark - Crash Course - HS16SJ
 

Similar to Deploying Big Data Platforms

Deliver Best-in-Class HPC Cloud Solutions Without Losing Your Mind
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your MindDeliver Best-in-Class HPC Cloud Solutions Without Losing Your Mind
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your MindAvere Systems
 
PXL Data Engineering Workshop By Selligent
PXL Data Engineering Workshop By Selligent PXL Data Engineering Workshop By Selligent
PXL Data Engineering Workshop By Selligent Jonny Daenen
 
Building data intensive applications
Building data intensive applicationsBuilding data intensive applications
Building data intensive applicationsAmit Kejriwal
 
Houd controle over uw data
Houd controle over uw dataHoud controle over uw data
Houd controle over uw dataICT-Partners
 
The Website Resiliency Imperative
The Website Resiliency ImperativeThe Website Resiliency Imperative
The Website Resiliency ImperativeDistil Networks
 
How To Build A Stable And Robust Base For a “Cloud”
How To Build A Stable And Robust Base For a “Cloud”How To Build A Stable And Robust Base For a “Cloud”
How To Build A Stable And Robust Base For a “Cloud”Hardway Hou
 
The Effectiveness, Efficiency and Legitimacy of Outsourcing Your Data
The Effectiveness, Efficiency and Legitimacy of Outsourcing Your Data The Effectiveness, Efficiency and Legitimacy of Outsourcing Your Data
The Effectiveness, Efficiency and Legitimacy of Outsourcing Your Data DataCentred
 
Spark and Deep Learning Frameworks at Scale 7.19.18
Spark and Deep Learning Frameworks at Scale 7.19.18Spark and Deep Learning Frameworks at Scale 7.19.18
Spark and Deep Learning Frameworks at Scale 7.19.18Cloudera, Inc.
 
November 2013 HUG: Cyber Security with Hadoop
November 2013 HUG: Cyber Security with HadoopNovember 2013 HUG: Cyber Security with Hadoop
November 2013 HUG: Cyber Security with HadoopYahoo Developer Network
 
Data lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiryData lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amirydatastack
 
Building a Hybrid Cloud Solution
Building a Hybrid Cloud Solution Building a Hybrid Cloud Solution
Building a Hybrid Cloud Solution Cloudian
 
20160331 sa introduction to big data pipelining berlin meetup 0.3
20160331 sa introduction to big data pipelining berlin meetup   0.320160331 sa introduction to big data pipelining berlin meetup   0.3
20160331 sa introduction to big data pipelining berlin meetup 0.3Simon Ambridge
 
Cloud - Security - Big Data
Cloud - Security - Big DataCloud - Security - Big Data
Cloud - Security - Big DataRaffael Marty
 
Ankus, bigdata deployment and orchestration framework
Ankus, bigdata deployment and orchestration frameworkAnkus, bigdata deployment and orchestration framework
Ankus, bigdata deployment and orchestration frameworkAshrith Mekala
 
How and why you need to build a big data lab
How and why you need to build a big data labHow and why you need to build a big data lab
How and why you need to build a big data labChris Kernaghan
 
WorDS of Data Science in the Presence of Heterogenous Computing Architectures
WorDS of Data Science in the Presence of Heterogenous Computing ArchitecturesWorDS of Data Science in the Presence of Heterogenous Computing Architectures
WorDS of Data Science in the Presence of Heterogenous Computing ArchitecturesIlkay Altintas, Ph.D.
 
Neo4j + Process Tempo present Plan Your Cloud Migration with Confidence
Neo4j + Process Tempo present Plan Your Cloud Migration with ConfidenceNeo4j + Process Tempo present Plan Your Cloud Migration with Confidence
Neo4j + Process Tempo present Plan Your Cloud Migration with ConfidenceNeo4j
 
Analyst Keynote: Delivering Faster Insights with a Logical Data Fabric in a H...
Analyst Keynote: Delivering Faster Insights with a Logical Data Fabric in a H...Analyst Keynote: Delivering Faster Insights with a Logical Data Fabric in a H...
Analyst Keynote: Delivering Faster Insights with a Logical Data Fabric in a H...Denodo
 
Big data analytics and machine intelligence v5.0
Big data analytics and machine intelligence   v5.0Big data analytics and machine intelligence   v5.0
Big data analytics and machine intelligence v5.0Amr Kamel Deklel
 

Similar to Deploying Big Data Platforms (20)

Deliver Best-in-Class HPC Cloud Solutions Without Losing Your Mind
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your MindDeliver Best-in-Class HPC Cloud Solutions Without Losing Your Mind
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your Mind
 
PXL Data Engineering Workshop By Selligent
PXL Data Engineering Workshop By Selligent PXL Data Engineering Workshop By Selligent
PXL Data Engineering Workshop By Selligent
 
Building data intensive applications
Building data intensive applicationsBuilding data intensive applications
Building data intensive applications
 
Houd controle over uw data
Houd controle over uw dataHoud controle over uw data
Houd controle over uw data
 
The Website Resiliency Imperative
The Website Resiliency ImperativeThe Website Resiliency Imperative
The Website Resiliency Imperative
 
How To Build A Stable And Robust Base For a “Cloud”
How To Build A Stable And Robust Base For a “Cloud”How To Build A Stable And Robust Base For a “Cloud”
How To Build A Stable And Robust Base For a “Cloud”
 
The Effectiveness, Efficiency and Legitimacy of Outsourcing Your Data
The Effectiveness, Efficiency and Legitimacy of Outsourcing Your Data The Effectiveness, Efficiency and Legitimacy of Outsourcing Your Data
The Effectiveness, Efficiency and Legitimacy of Outsourcing Your Data
 
Spark and Deep Learning Frameworks at Scale 7.19.18
Spark and Deep Learning Frameworks at Scale 7.19.18Spark and Deep Learning Frameworks at Scale 7.19.18
Spark and Deep Learning Frameworks at Scale 7.19.18
 
November 2013 HUG: Cyber Security with Hadoop
November 2013 HUG: Cyber Security with HadoopNovember 2013 HUG: Cyber Security with Hadoop
November 2013 HUG: Cyber Security with Hadoop
 
Data lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiryData lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiry
 
Building a Hybrid Cloud Solution
Building a Hybrid Cloud Solution Building a Hybrid Cloud Solution
Building a Hybrid Cloud Solution
 
20160331 sa introduction to big data pipelining berlin meetup 0.3
20160331 sa introduction to big data pipelining berlin meetup   0.320160331 sa introduction to big data pipelining berlin meetup   0.3
20160331 sa introduction to big data pipelining berlin meetup 0.3
 
Cloud - Security - Big Data
Cloud - Security - Big DataCloud - Security - Big Data
Cloud - Security - Big Data
 
Ankus, bigdata deployment and orchestration framework
Ankus, bigdata deployment and orchestration frameworkAnkus, bigdata deployment and orchestration framework
Ankus, bigdata deployment and orchestration framework
 
How and why you need to build a big data lab
How and why you need to build a big data labHow and why you need to build a big data lab
How and why you need to build a big data lab
 
WorDS of Data Science in the Presence of Heterogenous Computing Architectures
WorDS of Data Science in the Presence of Heterogenous Computing ArchitecturesWorDS of Data Science in the Presence of Heterogenous Computing Architectures
WorDS of Data Science in the Presence of Heterogenous Computing Architectures
 
Neo4j + Process Tempo present Plan Your Cloud Migration with Confidence
Neo4j + Process Tempo present Plan Your Cloud Migration with ConfidenceNeo4j + Process Tempo present Plan Your Cloud Migration with Confidence
Neo4j + Process Tempo present Plan Your Cloud Migration with Confidence
 
Analyst Keynote: Delivering Faster Insights with a Logical Data Fabric in a H...
Analyst Keynote: Delivering Faster Insights with a Logical Data Fabric in a H...Analyst Keynote: Delivering Faster Insights with a Logical Data Fabric in a H...
Analyst Keynote: Delivering Faster Insights with a Logical Data Fabric in a H...
 
Big data analytics and machine intelligence v5.0
Big data analytics and machine intelligence   v5.0Big data analytics and machine intelligence   v5.0
Big data analytics and machine intelligence v5.0
 
Big data.ppt
Big data.pptBig data.ppt
Big data.ppt
 

More from Chris Kernaghan

DevOps for SAP customers
DevOps for SAP customersDevOps for SAP customers
DevOps for SAP customersChris Kernaghan
 
Can you do DevOps in SAP (DevOps -> SAP)
Can you do DevOps in SAP (DevOps -> SAP)Can you do DevOps in SAP (DevOps -> SAP)
Can you do DevOps in SAP (DevOps -> SAP)Chris Kernaghan
 
Change Management in Hybrid landscapes 2017
Change Management in Hybrid landscapes 2017Change Management in Hybrid landscapes 2017
Change Management in Hybrid landscapes 2017Chris Kernaghan
 
Can you do DevOps in SAP (SAP -> DevOps)
Can you do DevOps in SAP (SAP -> DevOps)Can you do DevOps in SAP (SAP -> DevOps)
Can you do DevOps in SAP (SAP -> DevOps)Chris Kernaghan
 
Change management in hybrid landscapes
Change management in hybrid landscapesChange management in hybrid landscapes
Change management in hybrid landscapesChris Kernaghan
 
Quick and dirty performance analysis
Quick and dirty performance analysisQuick and dirty performance analysis
Quick and dirty performance analysisChris Kernaghan
 
HANA - the backbone for S/4 HANA
HANA - the backbone for S/4 HANAHANA - the backbone for S/4 HANA
HANA - the backbone for S/4 HANAChris Kernaghan
 
TEC118 – How Do You Manage the Configuration of Your Environments from Metal ...
TEC118 –How Do You Manage the Configuration of Your Environments from Metal ...TEC118 –How Do You Manage the Configuration of Your Environments from Metal ...
TEC118 – How Do You Manage the Configuration of Your Environments from Metal ...Chris Kernaghan
 
Automating Infrastructure as a Service Deployments and monitoring – TEC213
Automating Infrastructure as a Service Deployments and monitoring – TEC213Automating Infrastructure as a Service Deployments and monitoring – TEC213
Automating Infrastructure as a Service Deployments and monitoring – TEC213Chris Kernaghan
 
SAP Teched 2012 Session Tec3438 Automate IaaS SAP deployments
SAP Teched 2012 Session Tec3438 Automate IaaS SAP deploymentsSAP Teched 2012 Session Tec3438 Automate IaaS SAP deployments
SAP Teched 2012 Session Tec3438 Automate IaaS SAP deploymentsChris Kernaghan
 
SAP TechEd 2013 session Tec118 managing your-environment
SAP TechEd 2013 session Tec118 managing your-environmentSAP TechEd 2013 session Tec118 managing your-environment
SAP TechEd 2013 session Tec118 managing your-environmentChris Kernaghan
 
01 sap hana landscape and operations infrastructure v2 0
01  sap hana landscape and operations infrastructure v2 001  sap hana landscape and operations infrastructure v2 0
01 sap hana landscape and operations infrastructure v2 0Chris Kernaghan
 

More from Chris Kernaghan (15)

DevOps for SAP customers
DevOps for SAP customersDevOps for SAP customers
DevOps for SAP customers
 
Can you do DevOps in SAP (DevOps -> SAP)
Can you do DevOps in SAP (DevOps -> SAP)Can you do DevOps in SAP (DevOps -> SAP)
Can you do DevOps in SAP (DevOps -> SAP)
 
Change Management in Hybrid landscapes 2017
Change Management in Hybrid landscapes 2017Change Management in Hybrid landscapes 2017
Change Management in Hybrid landscapes 2017
 
Beginners HANA
Beginners HANABeginners HANA
Beginners HANA
 
Can you do DevOps in SAP (SAP -> DevOps)
Can you do DevOps in SAP (SAP -> DevOps)Can you do DevOps in SAP (SAP -> DevOps)
Can you do DevOps in SAP (SAP -> DevOps)
 
Change management in hybrid landscapes
Change management in hybrid landscapesChange management in hybrid landscapes
Change management in hybrid landscapes
 
Quick and dirty performance analysis
Quick and dirty performance analysisQuick and dirty performance analysis
Quick and dirty performance analysis
 
HANA - the backbone for S/4 HANA
HANA - the backbone for S/4 HANAHANA - the backbone for S/4 HANA
HANA - the backbone for S/4 HANA
 
Cloud or On Premise
Cloud or On PremiseCloud or On Premise
Cloud or On Premise
 
TEC118 – How Do You Manage the Configuration of Your Environments from Metal ...
TEC118 –How Do You Manage the Configuration of Your Environments from Metal ...TEC118 –How Do You Manage the Configuration of Your Environments from Metal ...
TEC118 – How Do You Manage the Configuration of Your Environments from Metal ...
 
Automating Infrastructure as a Service Deployments and monitoring – TEC213
Automating Infrastructure as a Service Deployments and monitoring – TEC213Automating Infrastructure as a Service Deployments and monitoring – TEC213
Automating Infrastructure as a Service Deployments and monitoring – TEC213
 
SAP Teched 2012 Session Tec3438 Automate IaaS SAP deployments
SAP Teched 2012 Session Tec3438 Automate IaaS SAP deploymentsSAP Teched 2012 Session Tec3438 Automate IaaS SAP deployments
SAP Teched 2012 Session Tec3438 Automate IaaS SAP deployments
 
SAP TechEd 2013 session Tec118 managing your-environment
SAP TechEd 2013 session Tec118 managing your-environmentSAP TechEd 2013 session Tec118 managing your-environment
SAP TechEd 2013 session Tec118 managing your-environment
 
01 sap hana landscape and operations infrastructure v2 0
01  sap hana landscape and operations infrastructure v2 001  sap hana landscape and operations infrastructure v2 0
01 sap hana landscape and operations infrastructure v2 0
 
Sapuki sig 2013
Sapuki sig 2013Sapuki sig 2013
Sapuki sig 2013
 

Recently uploaded

A Business-Centric Approach to Design System Strategy
A Business-Centric Approach to Design System StrategyA Business-Centric Approach to Design System Strategy
A Business-Centric Approach to Design System StrategyUXDXConf
 
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...FIDO Alliance
 
Buy Epson EcoTank L3210 Colour Printer Online.pdf
Buy Epson EcoTank L3210 Colour Printer Online.pdfBuy Epson EcoTank L3210 Colour Printer Online.pdf
Buy Epson EcoTank L3210 Colour Printer Online.pdfEasyPrinterHelp
 
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...FIDO Alliance
 
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...CzechDreamin
 
Agentic RAG What it is its types applications and implementation.pdf
Agentic RAG What it is its types applications and implementation.pdfAgentic RAG What it is its types applications and implementation.pdf
Agentic RAG What it is its types applications and implementation.pdfChristopherTHyatt
 
Buy Epson EcoTank L3210 Colour Printer Online.pptx
Buy Epson EcoTank L3210 Colour Printer Online.pptxBuy Epson EcoTank L3210 Colour Printer Online.pptx
Buy Epson EcoTank L3210 Colour Printer Online.pptxEasyPrinterHelp
 
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)Julian Hyde
 
Syngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdfSyngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdfSyngulon
 
The Metaverse: Are We There Yet?
The  Metaverse:    Are   We  There  Yet?The  Metaverse:    Are   We  There  Yet?
The Metaverse: Are We There Yet?Mark Billinghurst
 
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka DoktorováCzechDreamin
 
UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2DianaGray10
 
Top 10 Symfony Development Companies 2024
Top 10 Symfony Development Companies 2024Top 10 Symfony Development Companies 2024
Top 10 Symfony Development Companies 2024TopCSSGallery
 
Enterprise Knowledge Graphs - Data Summit 2024
Enterprise Knowledge Graphs - Data Summit 2024Enterprise Knowledge Graphs - Data Summit 2024
Enterprise Knowledge Graphs - Data Summit 2024Enterprise Knowledge
 
Demystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyDemystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyJohn Staveley
 
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomSalesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomCzechDreamin
 
Connecting the Dots in Product Design at KAYAK
Connecting the Dots in Product Design at KAYAKConnecting the Dots in Product Design at KAYAK
Connecting the Dots in Product Design at KAYAKUXDXConf
 
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...FIDO Alliance
 
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdfThe Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdfFIDO Alliance
 
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxUnpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxDavid Michel
 

Recently uploaded (20)

A Business-Centric Approach to Design System Strategy
A Business-Centric Approach to Design System StrategyA Business-Centric Approach to Design System Strategy
A Business-Centric Approach to Design System Strategy
 
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
 
Buy Epson EcoTank L3210 Colour Printer Online.pdf
Buy Epson EcoTank L3210 Colour Printer Online.pdfBuy Epson EcoTank L3210 Colour Printer Online.pdf
Buy Epson EcoTank L3210 Colour Printer Online.pdf
 
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
 
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
 
Agentic RAG What it is its types applications and implementation.pdf
Agentic RAG What it is its types applications and implementation.pdfAgentic RAG What it is its types applications and implementation.pdf
Agentic RAG What it is its types applications and implementation.pdf
 
Buy Epson EcoTank L3210 Colour Printer Online.pptx
Buy Epson EcoTank L3210 Colour Printer Online.pptxBuy Epson EcoTank L3210 Colour Printer Online.pptx
Buy Epson EcoTank L3210 Colour Printer Online.pptx
 
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
 
Syngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdfSyngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdf
 
The Metaverse: Are We There Yet?
The  Metaverse:    Are   We  There  Yet?The  Metaverse:    Are   We  There  Yet?
The Metaverse: Are We There Yet?
 
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
 
UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2
 
Top 10 Symfony Development Companies 2024
Top 10 Symfony Development Companies 2024Top 10 Symfony Development Companies 2024
Top 10 Symfony Development Companies 2024
 
Enterprise Knowledge Graphs - Data Summit 2024
Enterprise Knowledge Graphs - Data Summit 2024Enterprise Knowledge Graphs - Data Summit 2024
Enterprise Knowledge Graphs - Data Summit 2024
 
Demystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyDemystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John Staveley
 
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomSalesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
 
Connecting the Dots in Product Design at KAYAK
Connecting the Dots in Product Design at KAYAKConnecting the Dots in Product Design at KAYAK
Connecting the Dots in Product Design at KAYAK
 
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
 
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdfThe Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
 
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxUnpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
 

Deploying Big Data Platforms

  • 1. LEARN • NETWORK • COLLABORATE • INFLUENCE
  • 2. LEARN • NETWORK • COLLABORATE • INFLUENCE Deploying Big Data platforms LEARN • NETWORK • COLLABORATE • INFLUENCE Chris Kernaghan Principal Consultant
  • 3. LEARN • NETWORK • COLLABORATE • INFLUENCE Cholera epidemic first use of big data
  • 4. LEARN • NETWORK • COLLABORATE • INFLUENCE Big Data Epidemiology by Google
  • 5. LEARN • NETWORK • COLLABORATE • INFLUENCE How I really got started in Big Data John, we need to give Chris more grey hair Let’s throw him into a Big Data demo
  • 6. LEARN • NETWORK • COLLABORATE • INFLUENCE My examples
  • 7. LEARN • NETWORK • COLLABORATE • INFLUENCE
  • 8. LEARN • NETWORK • COLLABORATE • INFLUENCE Areas of focus Data acquisition and curation Data storage Compute infrastructure Analysis and Insight Everything as Code* * Well As much as possible
  • 9. LEARN • NETWORK • COLLABORATE • INFLUENCE Data Acquisition and curation Areas of focus
  • 10. LEARN • NETWORK • COLLABORATE • INFLUENCE Data Lake HANA
  • 11. LEARN • NETWORK • COLLABORATE • INFLUENCE How big was the Panama Papers data set
  • 12. LEARN • NETWORK • COLLABORATE • INFLUENCE How big was the Panama Papers data set
  • 13. LEARN • NETWORK • COLLABORATE • INFLUENCE Data Lake Panama Papers Technology stack SQL
  • 14. LEARN • NETWORK • COLLABORATE • INFLUENCE The tools used supported 370 journalists from around the world Infrastructure was a pool of up to 40 servers run in AWS
  • 15. LEARN • NETWORK • COLLABORATE • INFLUENCE Data quality and curation are not one time activities Remove the human element as much as possible
  • 16. LEARN • NETWORK • COLLABORATE • INFLUENCE Data security • Data lake – What data do you collect – Do you have restrictions on what data can be combined – How long does your data live
  • 17. LEARN • NETWORK • COLLABORATE • INFLUENCE Data security • Geographical concerns – Where does your data reside
  • 18. LEARN • NETWORK • COLLABORATE • INFLUENCE Data security • Authentication – Who is accessing your data
  • 19. LEARN • NETWORK • COLLABORATE • INFLUENCE Data Storage Areas of focus
  • 20. LEARN • NETWORK • COLLABORATE • INFLUENCE How BIG is Big Data
  • 21. LEARN • NETWORK • COLLABORATE • INFLUENCE
  • 22. LEARN • NETWORK • COLLABORATE • INFLUENCE Storage Considerations • IOPS are still important – Big data still uses a lot of spinning disk • Replication and Redundancy – Eats a lot of disk space • Build for failure • Sometimes you have to go in-memory
  • 23. LEARN • NETWORK • COLLABORATE • INFLUENCE Compute infrastructure Areas of focus
  • 24. LEARN • NETWORK • COLLABORATE • INFLUENCE Structured Reporting Versus Big Data/Science Compute requirements 2 • Structured reporting systems run business processes – Sized and static – Under change control – Business centric
  • 25. LEARN • NETWORK • COLLABORATE • INFLUENCE Structured Reporting Versus Big Data/Science Compute requirements 2 • Data science systems answer difficult questions irregularly – Cloud or heavy use of virtualisation – Developer centric – Rapidly evolving
  • 26. LEARN • NETWORK • COLLABORATE • INFLUENCE What you still need to remember 2 • Compute is cheap • Scalability is critical
  • 27. LEARN • NETWORK • COLLABORATE • INFLUENCE What you still need to remember 2 • Software definition for consistency • Automate as much as possible
  • 28. LEARN • NETWORK • COLLABORATE • INFLUENCE 2 100 Hadoop Nodes 122GB RAM Each = 12.2TB RAM Build time of 3Hrs
  • 29. LEARN • NETWORK • COLLABORATE • INFLUENCE Use of scripted builds from VM to application 2 Disk definition Network defintion Software Install
  • 30. LEARN • NETWORK • COLLABORATE • INFLUENCE Use of scripted builds from VM to application 3 • Deployment was consistent for each and every node of the cluster – Hostnames defined the same way – Configuration files created the same way
  • 31. LEARN • NETWORK • COLLABORATE • INFLUENCE Use of scripted builds from VM to application 3 • Faster deployment – Automated build 3hrs to build and deploy 100 nodes – Manual build 800hrs + to build and deploy 100 nodes • Use of automated tools to detect failure and start new node (ElasticBeanstalk)
  • 32. LEARN • NETWORK • COLLABORATE • INFLUENCE Use of scripted builds from VM to application 3 • Reusability of script – Heavy use of parameters means it is adaptable • Use of Git meant distributed development was handled easily
  • 33. LEARN • NETWORK • COLLABORATE • INFLUENCE
  • 34. LEARN • NETWORK • COLLABORATE • INFLUENCE Analysis and Insight 3 Areas of focus Presentation Tag Line
  • 35. LEARN • NETWORK • COLLABORATE • INFLUENCE Query the Data • Programmatically – Python – R • Application – Lumira – Business Objects – Spark – SQL – Excel – ElasticSearch
  • 36. LEARN • NETWORK • COLLABORATE • INFLUENCE Analysis and Visualisation • Quick Analysis – Lumira, Excel • Graph – Neo4J, Synerscope • Charts – Business Objects, Grafana, Kibana • Dynamic – D3 http://www.wikiviz.org/wiki/Tools
  • 37. LEARN • NETWORK • COLLABORATE • INFLUENCE Things to remember • Remember the type of platform you are using • Storage is cheap but not all storage is equal • Scalability is critical • Version control rocks • Automate everything you can • Value is in the data but not all data is valuable • Data should not live forever
  • 38. LEARN • NETWORK • COLLABORATE • INFLUENCE 3 • Key Takeways
  • 39. LEARN • NETWORK • COLLABORATE • INFLUENCE

Editor's Notes

  1. John Snow London
  2. 2008 H1N1 flu pandemic in US CDC had out of date data
  3. Panama papers – transient use case Under Armour – constant data use case answering lots of different questions Common Sense Finance institution – transient audit data use case Natures Hope – Pushing structured data into data lake to provide better temperate control as part of their data lifecycle Intel – using event streaming to drive manufacturing processes
  4. We are literally drowning in data – data lakes What data do we acquire – sensor data, web data, social media, transactional data What data is actually necessary, how long does it need to live for, what is its data life cycle What data do we need that we do not have access to How do we curate data for data lakes
  5. We are literally drowning in data – data lakes What data do we acquire – sensor data, web data, social media, transactional data What data is actually necessary, how long does it need to live for, what is its data life cycle What data do we need that we do not have access to How do we curate data for data lakes
  6. We have four developers and three journalists. 
  7. Time line Working on Platform for 3 years across the various links Processed Panama papers in around 12 months
  8. How do we store data – databases and files Big data data storage systems HDFS Cloud based S3 or Azure Storage Databases – SQL and NoSQL CSV Hardware – massively scalable software defined infrastructures which expect failure
  9. John broke my cluster 20 nodes – scaled to 100 nodes