Hadoop engineering bo_f_final
Ramya Sunil
Best Practices in Hadoop Engineering: operations, quality and releases
1.
© Hortonworks Inc. 2011
Hadoop Engineering Best Practices
Raja Aluri, Release Eng
Deepesh Khandelwal, Quality Eng
Ramya Sunil, Quality Eng
2.
Agenda
• Source Mechanics
• Why do System Testing?
• Test Matrix
• Automated Testing Flow
• Test Planning
• Planning your own System Testing
• Q & A
Architecting the Future of Big Data
3.
Source Mechanics
[Diagram labels: Apache, Hortonworks, Partner]
• Hortonworks Open Source Philosophy
• How we do Apache-first development
• How we incorporate fixes or features that have not made it into Apache yet
• How we integrate our partner contributions to the source code
• Bookkeeping of the delta between Apache and Hortonworks
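The bookkeeping bullet above can be illustrated with a small sketch. It assumes that commit subjects carry a JIRA key such as HDFS-1234, as Apache Hadoop commit messages conventionally do; the function names and the sample subjects are hypothetical and are not taken from the slides.

```python
import re

# JIRA keys for a few Apache Hadoop projects; extend the list as needed.
JIRA_KEY = re.compile(r"\b(?:HADOOP|HDFS|YARN|MAPREDUCE)-\d+\b")


def jira_keys(subjects):
    """Collect every JIRA key mentioned in a list of commit subject lines."""
    keys = set()
    for subject in subjects:
        keys.update(JIRA_KEY.findall(subject))
    return keys


def branch_delta(apache_subjects, vendor_subjects):
    """Return the keys present on only one side of the two branches."""
    apache = jira_keys(apache_subjects)
    vendor = jira_keys(vendor_subjects)
    return {
        "vendor_only": sorted(vendor - apache),  # not yet in Apache
        "apache_only": sorted(apache - vendor),  # not yet merged downstream
    }


# Hypothetical commit subjects, for illustration only.
apache_log = ["HDFS-100. Fix A.", "YARN-200. Add B."]
vendor_log = ["HDFS-100. Fix A.", "HADOOP-300. Urgent fix C."]
print(branch_delta(apache_log, vendor_log))
```

In practice the subject lines would come from the version-control log of each branch; the set difference is the part that answers "what do we carry that Apache does not".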
4.
Apache-Hortonworks-Partner Source flow

Issue Type      Course of Action
Normal Issue    Patch in Apache first
Urgent Issue    Patch in HWX Repo first

[Diagram labels: Partner, ApacheRef, HDPRef, HWX, HDP, Apache Git, Hadoop branch-2, Hadoop branch-2.4, Read-Write Repository, Read-Only Repository, Continuous Merges, HDP Build CI, HDP Package Repo, HDP Maven Repository, Publish Releases, QE Workflow for Testing]
5.
Unit Testing
• Test individual parts of the program in isolation, white-box testing
• Homogeneous cluster, usually in-memory
• One configuration, usually 1 operating system and unsecure
• Limited dataset, usually few kilobytes
[Diagram: unit testing of component A, component B and component C, each in isolation; DB interaction, concurrent user interaction and third party connectors are marked "??"]
6.
System Testing
• Mimics production environment
– Multiple nodes in the cluster
– Multiple concurrent users
– Different workloads
• Multiple configurations to test
• Large dataset, more complex and richer
• Encompasses different types of testing
– Functional
– Performance, Stress and Reliability
– High Availability
– Backwards Compatibility
– Integration testing
– Third party connectors
– Upgrade testing
7.
System Testing cont...
• Heterogeneous testing
– Cross version testing
– Cross operating system testing
– Hardware configs like Disk and CPU
– Security settings, level of encryption
8.
Test Matrix
• Total of ~15000+ configurations to test!

OS
• CentOS
• SuSE
• Debian
• Ubuntu
• Windows
JDK
• Oracle JDK
• OpenJDK
• Different versions: 1.6.x, 1.7.x, 1.8.x
Security
• Disabled
• Enabled: MIT-only, AD-only, MIT-AD
• Ranger: enabled/disabled
Encryption
• Wire encryption: enabled/disabled
• Transparent Data Encryption: enabled/disabled
DB
• MySQL
• Oracle
• Postgres
• MSSQL
File system
• HDFS
• WASB
• Other vendor-specific file systems
Others
• Tez: enabled/disabled
• Slider apps vs. standalone
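To see how quickly such a matrix grows, the axes above can be enumerated as a Cartesian product. This is only a sketch: the axis names, the decision to count "other vendor-specific file systems" as a single value, and the `is_valid` pruning rule are assumptions made for illustration. The naive product of the axes as listed comes to 46,080, which does not reproduce the slide's "~15000+" figure; the slides do not say how that number was derived or which combinations were excluded.

```python
from itertools import product
from math import prod

# Axes transcribed from the slide; grouping and names are assumptions.
AXES = {
    "os": ["CentOS", "SuSE", "Debian", "Ubuntu", "Windows"],
    "jdk_vendor": ["Oracle JDK", "OpenJDK"],
    "jdk_version": ["1.6.x", "1.7.x", "1.8.x"],
    "security": ["disabled", "MIT-only", "AD-only", "MIT-AD"],
    "ranger": [False, True],
    "wire_encryption": [False, True],
    "tde": [False, True],
    "db": ["MySQL", "Oracle", "Postgres", "MSSQL"],
    "filesystem": ["HDFS", "WASB", "other"],
    "tez": [False, True],
    "slider": [False, True],
}


def all_configurations(axes):
    """Yield every combination of axis values as a dict."""
    names = list(axes)
    for values in product(*(axes[name] for name in names)):
        yield dict(zip(names, values))


def is_valid(config):
    """Hypothetical pruning rule: skip Ranger when security is disabled."""
    return not (config["ranger"] and config["security"] == "disabled")


naive_total = prod(len(values) for values in AXES.values())
valid_total = sum(1 for c in all_configurations(AXES) if is_valid(c))
print(naive_total, valid_total)
```

Pruning rules like `is_valid` are where a team would encode which combinations it actually supports, so the enumerated list can drive job scheduling instead of being maintained by hand.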
9.
Automated Testing Flow
[Flow diagram labels: Apache Repos, Internal Commits, Build Job, Continuous Integration, Staging Repo, QE Deploy Trigger, Provision VMs, Deploy HDP Stack, Test Setup & Execution, Test analysis, Bug tracking system. Annotations: "Publishing Builds to staging repo"; "Installer deploying bits from staging repo to test cluster"]
10.
Test Planning
20+ components in the HDP stack and growing!
[Diagram: Test plan, surrounded by the labels Internal developers, Apache jiras and community forums, Product Management, Support tickets]
11.
Planning your own QATS
12.
Typical user scenarios
• Fresh install
• Upgrade stack, going from an earlier release to a newer one
• Migration, changing distributions
• Applying changes to an existing cluster
– Upgrading hardware: CPU, memory, disks
– Changing dependent software pieces like OS, JDK
– Changing security settings, such as turning on Kerberos or encryption
– Changing component configs in *-site.xml, enabling HA
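For the last scenario, it helps to know exactly which properties changed before deciding what to retest. The following is a minimal sketch that assumes the standard Hadoop `<configuration>/<property>/<name>/<value>` layout of `*-site.xml` files; the helper names and the sample property values are illustrative only.

```python
import xml.etree.ElementTree as ET


def parse_site_xml(text):
    """Parse the text of a Hadoop *-site.xml file into a {name: value} dict."""
    root = ET.fromstring(text.strip())
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def diff_configs(before, after):
    """Map each changed property to an (old, new) pair; None means absent."""
    changes = {}
    for key in sorted(set(before) | set(after)):
        old, new = before.get(key), after.get(key)
        if old != new:
            changes[key] = (old, new)
    return changes


# Illustrative file contents; the values are made up.
OLD = """
<configuration>
  <property><name>dfs.replication</name><value>3</value></property>
</configuration>
"""
NEW = """
<configuration>
  <property><name>dfs.replication</name><value>2</value></property>
  <property><name>dfs.nameservices</name><value>mycluster</value></property>
</configuration>
"""
print(diff_configs(parse_site_xml(OLD), parse_site_xml(NEW)))
```

The resulting change list can be attached to a test run so that a failure is traceable to the specific property that was modified.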
13.
Planning your own QATS
Preparation phase
• Collect requirements on the stack and workload
• Identify appropriate hardware
CI development phase
• Build in-house CI system for deployment and testing
Testing phase
• Build basic acceptance tests
• End to end automation for your application
[Diagram label: E2E automation]
14.
Preparation Phase
• Collect the stack requirements
– Identify all the stack components that will be installed, including third-party applications and connectors
– Identify the installer
– Identify configs
• Hardware selection
– Should be scaled appropriately to mimic the production environment
– Prefer multi-node over single-node, with component services distributed
• Collect workload information
– Use actual workload whenever possible
– If not, simulate the workload; some tools are available
– Use Rumen to obtain a job trace from existing clusters
– Use Gridmix to generate workload
– Data set size and complexity
– Number of concurrent users
15.
CI Development phase
• Implement a CI system
– Modularize CI system, e.g. individual Jenkins jobs for provision, deploy and test
• Determine the cadence of testing
• Establish reporting
[Diagram: Provision cluster, Deploy, Test]
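The modular provision, deploy and test structure can be sketched independently of any particular CI server. In the example below each stage is a plain Python callable standing in for a separate job; the stage functions are placeholders, and `run_pipeline` is a hypothetical helper rather than part of Jenkins or any other tool.

```python
from dataclasses import dataclass


@dataclass
class StageResult:
    name: str
    ok: bool
    detail: str = ""


def run_pipeline(stages):
    """Run (name, action) stages in order and stop at the first failure."""
    results = []
    for name, action in stages:
        try:
            results.append(StageResult(name, True, action()))
        except Exception as exc:  # a failed stage blocks later stages
            results.append(StageResult(name, False, str(exc)))
            break
    return results


# Placeholder stages; real ones would call a provisioner, installer, etc.
def provision():
    return "3 nodes provisioned"


def deploy():
    return "stack deployed"


def test():
    return "acceptance tests passed"


for result in run_pipeline([("provision", provision),
                            ("deploy", deploy),
                            ("test", test)]):
    print(result.name, "OK" if result.ok else "FAILED", result.detail)
```

Keeping the stages separate means a failed deployment is reported as such instead of showing up as a wall of test failures, and each stage can be rerun on its own.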
16.
Testing Phase
• Basic Acceptance Tests
– Basic service check for individual deployed components
– Basic acceptance tests to validate integrations
• Establish baseline
– to track performance of pipeline components in future
• Compatibility tests (including apps, third party connectors, dashboards etc.)
• E2E automation to simulate production workloads
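Once a baseline exists, later runs can be compared against it automatically. The sketch below flags any metric whose runtime grew by more than a chosen tolerance; the 10% threshold, the metric names and the timings are all assumptions for illustration and do not come from the slides.

```python
def regressions(baseline, current, tolerance=0.10):
    """Return metrics whose runtime grew by more than `tolerance`.

    Both arguments map a metric name to a runtime in seconds. The result
    maps each flagged metric to its relative slowdown (0.2 means 20%).
    """
    flagged = {}
    for name, base in baseline.items():
        now = current.get(name)
        if now is None or base <= 0:
            continue  # metric missing from this run, or unusable baseline
        change = (now - base) / base
        if change > tolerance:
            flagged[name] = round(change, 3)
    return flagged


# Made-up timings for two hypothetical benchmark jobs.
baseline = {"wordcount_s": 120.0, "terasort_s": 600.0}
current = {"wordcount_s": 126.0, "terasort_s": 720.0}
print(regressions(baseline, current))
```

A check like this fits naturally at the end of the test stage, so a performance regression fails the run in the same way a functional failure would.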
17.
Q & A
18.
Thank You!