© Hortonworks Inc. 2011
Hadoop Engineering Best Practices
Raja Aluri, Release Eng
Deepesh Khandelwal, Quality Eng
Ramya Sunil, Quality Eng
Page 1
Agenda
• Source Mechanics
• Why do System Testing?
• Test Matrix
• Automated Testing Flow
• Test Planning
• Planning your own System Testing
• Q & A
Page 2
Architecting the Future of Big Data
Apache-Hortonworks-Partner Source Mechanics
• Hortonworks Open Source Philosophy
• How we do Apache-first development
• How we incorporate fixes or features that have not made it into Apache yet
• How we integrate our partner contributions into the source code
• Bookkeeping of the delta between Apache and Hortonworks
Page 3
Apache-Hortonworks-Partner Source flow
Page 4
[Diagram: source flow between repositories. Labels: Partner, HWX, Apache Git (Hadoop branch-2, Hadoop branch-2.4), ApacheRef, HDPRef, HDP, continuous merges, HDP Build CI, publish releases, HDP package repo, HDP Maven repository, QE workflow for testing. Legend: read-write repository, read-only repository.]
• Issue type and course of action
– Normal issue: patch in Apache first
– Urgent issue: patch in HWX repo first
Unit Testing
• Tests individual parts of the program in isolation (white-box testing)
• Homogeneous cluster, usually in-memory
• One configuration, usually a single operating system with security disabled
• Limited dataset, usually a few kilobytes
Page 5
[Diagram: components A, B and C are each covered by unit testing; the areas between them are marked "??", including DB interaction, concurrent user interaction and third-party connectors.]
System Testing
• Mimics production environment
– Multiple nodes in the cluster
– Multiple concurrent users
– Different workloads
• Multiple configurations to test
• Large dataset, more complex and richer
• Encompasses different types of testing
– Functional
– Performance, Stress and Reliability
– High Availability
– Backwards Compatibility
– Integration testing
– Third party connectors
– Upgrade testing
Page 6
System Testing cont...
• Heterogeneous testing
– Cross version testing
– Cross operating system testing
– Hardware configs like Disk and CPU
– Security settings, level of encryption
Page 7
Test Matrix
• Roughly 15,000 or more configurations to test in total!
Page 8
• OS
– CentOS
– SuSE
– Debian
– Ubuntu
– Windows
• JDK
– Oracle JDK
– OpenJDK
– Different versions: 1.6.x, 1.7.x, 1.8.x
• Security
– Disabled
– Enabled: MIT-only, AD-only, MIT-AD
– Ranger: enabled/disabled
• Encryption
– Wire encryption: enabled/disabled
– Transparent Data Encryption: enabled/disabled
• DB
– MySQL
– Oracle
– Postgres
– MSSQL
• File system
– HDFS
– WASB
– Other vendor-specific file systems
• Others
– Tez: enabled/disabled
– Slider apps vs. standalone
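To give a feel for how quickly such a matrix grows, here is a minimal Python sketch that enumerates the dimensions listed on this slide. Treating every dimension as an independent axis is an assumption made for illustration, and the pruning rule `is_plausible` is hypothetical; the slide does not say how its own figure was derived, so the counts here are not expected to match it.

```python
from itertools import product

# Dimensions transcribed from the slide. Treating each one as an
# independent axis is an assumption made for illustration only.
MATRIX = {
    "os": ["CentOS", "SuSE", "Debian", "Ubuntu", "Windows"],
    "jdk_vendor": ["Oracle JDK", "OpenJDK"],
    "jdk_version": ["1.6.x", "1.7.x", "1.8.x"],
    "security": ["disabled", "MIT-only", "AD-only", "MIT-AD"],
    "ranger": [True, False],
    "wire_encryption": [True, False],
    "tde": [True, False],
    "db": ["MySQL", "Oracle", "Postgres", "MSSQL"],
    "filesystem": ["HDFS", "WASB", "other"],
    "tez": [True, False],
    "slider": [True, False],
}


def all_configurations(matrix):
    """Yield every combination as a dict with one value per dimension."""
    names = list(matrix)
    for values in product(*(matrix[name] for name in names)):
        yield dict(zip(names, values))


def is_plausible(config):
    """Hypothetical pruning rule: skip Ranger when security is disabled."""
    return not (config["ranger"] and config["security"] == "disabled")


total = sum(1 for _ in all_configurations(MATRIX))
pruned = sum(1 for c in all_configurations(MATRIX) if is_plausible(c))
print(total, pruned)
```

The naive cross product is far larger than any team can run exhaustively, which is why rules that prune invalid or low-value combinations matter as much as the matrix itself.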
Automated Testing Flow
Page 9
• Continuous integration
– Build job, fed by Apache repos and internal commits
– Publishing builds to the staging repo
• QE deploy trigger
– Provision VMs
– Deploy HDP stack (installer deploying bits from the staging repo to the test cluster)
– Test setup and execution
– Test analysis
• Bug tracking system
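The flow on this slide can be sketched as an ordered list of stages in which a failure stops everything downstream. This is a minimal Python illustration, not the tooling used by the presenters; the stage functions, the `context` keys and the build name are all hypothetical placeholders.

```python
from typing import Callable, Dict, List, Tuple

Stage = Tuple[str, Callable[[Dict], bool]]


def run_pipeline(stages: List[Stage], context: Dict) -> List[Tuple[str, str]]:
    """Run stages in order; mark everything after the first failure as skipped."""
    report = []
    failed = False
    for name, step in stages:
        if failed:
            report.append((name, "skipped"))
            continue
        ok = step(context)
        report.append((name, "passed" if ok else "failed"))
        failed = not ok
    return report


# Hypothetical placeholder stages; real ones would call a cloud API,
# an installer and a test harness.
def provision_vms(ctx: Dict) -> bool:
    ctx["nodes"] = ["node1", "node2", "node3"]
    return True


def deploy_stack(ctx: Dict) -> bool:
    # Deploy only if a build was published to staging and nodes exist.
    ctx["deployed"] = bool(ctx.get("build")) and bool(ctx.get("nodes"))
    return ctx["deployed"]


def run_tests(ctx: Dict) -> bool:
    ctx["results"] = {"service_check": True, "smoke_job": True}
    return True


def analyze(ctx: Dict) -> bool:
    return all(ctx["results"].values())


STAGES: List[Stage] = [
    ("provision", provision_vms),
    ("deploy", deploy_stack),
    ("test", run_tests),
    ("analysis", analyze),
]

report = run_pipeline(STAGES, {"build": "staging-build-1"})
print(report)
```

Keeping each stage as a separate callable mirrors the idea of separate jobs: a stage can be rerun or replaced without touching the others.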
Test Planning
20+ components in the HDP stack and growing!
Page 10
• Inputs to the test plan
– Internal developers
– Apache JIRAs and community forums
– Product Management
– Support tickets
Planning your own QATS
Page 11
Typical user scenarios
• Fresh install
• Upgrade stack, going from an earlier release to a newer one
• Migration, changing distributions
• Applying changes to an existing cluster
– Upgrading hardware: CPU, memory, disks
– Changing dependent software such as the OS or JDK
– Changing security settings, such as turning on Kerberos or encryption
– Changing component configs in *-site.xml, enabling HA
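When a scenario involves changing component configs, it helps to record exactly which properties differ before and after the change. The following Python sketch parses two versions of a Hadoop `*-site.xml` file (the standard `<configuration>`/`<property>` layout) and reports the differences. The host name and nameservice values are made up for illustration.

```python
import xml.etree.ElementTree as ET


def load_site_properties(xml_text):
    """Parse a Hadoop *-site.xml document into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name"): prop.findtext("value")
        for prop in root.iter("property")
    }


def diff_properties(before, after):
    """Return {name: (old, new)} for every property that changed."""
    changes = {}
    for key in sorted(set(before) | set(after)):
        old, new = before.get(key), after.get(key)
        if old != new:
            changes[key] = (old, new)
    return changes


# Illustrative core-site.xml contents; the values are hypothetical.
BEFORE = """<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://nn1.example.com:8020</value></property>
  <property><name>hadoop.security.authentication</name><value>simple</value></property>
</configuration>"""

AFTER = """<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://mycluster</value></property>
  <property><name>hadoop.security.authentication</name><value>kerberos</value></property>
</configuration>"""

changes = diff_properties(load_site_properties(BEFORE), load_site_properties(AFTER))
for name, (old, new) in changes.items():
    print(f"{name}: {old} -> {new}")
```

Each reported change is a candidate test scenario: here, one for enabling HA and one for turning on Kerberos.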
Page 12
Planning your own QATS
Page 13
E2E automation
• Preparation phase
– Collect requirements on the stack and workload
– Identify appropriate hardware
• CI development phase
– Build in-house CI system for deployment and testing
• Testing phase
– Build basic acceptance tests
– End-to-end automation for your application
Preparation Phase
• Collect the stack requirements
– Identify all the stack components that will be installed, including third-party applications and connectors
– Identify the installer
– Identify configs
• Hardware selection
– Should be scaled appropriately to mimic production environment
– Prefer multi-node over single-node, with component services distributed across nodes
• Collect workload information
– Use actual workload whenever possible
– If not, simulate the workload; some tools are available:
– Use Rumen to obtain a job trace from existing clusters
– Use Gridmix to generate the workload
– Data set size and complexity
– Number of concurrent users
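One of the workload figures above, the number of concurrent users, can be estimated from job history. The sketch below assumes a hypothetical record format of `(user, start, end)` tuples; it is not the schema of a Rumen job trace, and real traces would need to be converted into this shape first.

```python
from collections import Counter


def peak_concurrent_users(jobs):
    """Return the largest number of distinct users with a job running at once.

    jobs: iterable of (user, start, end) with end treated as exclusive.
    """
    events = []
    for user, start, end in jobs:
        events.append((start, 1, user))  # 1 = job starts
        events.append((end, 0, user))    # 0 = job ends; sorts before starts at equal time
    events.sort()

    active = Counter()  # user -> number of jobs currently running
    peak = 0
    for _time, kind, user in events:
        if kind == 1:
            active[user] += 1
            peak = max(peak, len(active))
        else:
            active[user] -= 1
            if active[user] == 0:
                del active[user]
    return peak


# Hypothetical job records: (user, start_second, end_second).
JOBS = [
    ("alice", 0, 50),
    ("bob", 10, 30),
    ("alice", 20, 40),
    ("carol", 25, 60),
]

print(peak_concurrent_users(JOBS))
```

A figure like this gives a starting point for how many simulated users the test workload should drive.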
Page 14
CI Development phase
• Implement a CI system
– Modularize the CI system, e.g. individual Jenkins jobs for provision, deploy and test
• Determine the cadence of testing
• Establish reporting
Page 15
[Diagram: Provision cluster, then Deploy, then Test]
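For the reporting step, even a small per-component summary goes a long way. This Python sketch assumes test results arrive as `(component, test_name, passed)` tuples; that format and the sample test names are hypothetical and would be adapted to whatever the test harness emits.

```python
from collections import defaultdict


def summarize(results):
    """Aggregate (component, test_name, passed) rows into {component: (passed, total)}."""
    totals = defaultdict(lambda: [0, 0])  # component -> [passed, total]
    for component, _test_name, passed in results:
        totals[component][1] += 1
        if passed:
            totals[component][0] += 1
    return {component: tuple(counts) for component, counts in sorted(totals.items())}


def format_report(summary):
    """Render one line per component with its pass count and pass rate."""
    lines = []
    for component, (passed, total) in summary.items():
        rate = 100 * passed / total
        lines.append(f"{component}: {passed}/{total} passed ({rate:.0f}%)")
    return "\n".join(lines)


# Hypothetical results from one test run.
RESULTS = [
    ("hive", "create_table", True),
    ("hdfs", "put_get", True),
    ("hive", "join_query", False),
    ("hdfs", "snapshot", True),
]

summary = summarize(RESULTS)
print(format_report(summary))
```

Publishing the same summary after every run, at whatever cadence is chosen, makes trends visible without anyone reading raw logs.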
Testing Phase
• Basic Acceptance Tests
– Basic service check for individual deployed components
– Basic acceptance tests to validate integrations
• Establish a baseline to track the performance of pipeline components in the future
• Compatibility tests (including apps, third-party connectors, dashboards, etc.)
• E2E automation to simulate production workloads
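A baseline is only useful if later runs are compared against it. The sketch below flags any workload whose current runtime exceeds its baseline by more than a tolerance. The workload names, timings and the 10% default tolerance are assumptions chosen for illustration.

```python
def find_regressions(baseline, current, tolerance=0.10):
    """Return {name: (baseline_s, current_s)} for runs slower than baseline * (1 + tolerance).

    Workloads missing from the current run are ignored rather than flagged.
    """
    regressions = {}
    for name, base_seconds in baseline.items():
        now_seconds = current.get(name)
        if now_seconds is None:
            continue
        if now_seconds > base_seconds * (1 + tolerance):
            regressions[name] = (base_seconds, now_seconds)
    return regressions


# Hypothetical runtimes in seconds.
BASELINE = {"terasort": 120.0, "hive_query": 45.0, "hbase_scan": 30.0}
CURRENT = {"terasort": 126.0, "hive_query": 60.0, "hbase_scan": 29.0}

regressions = find_regressions(BASELINE, CURRENT)
for name, (base, now) in regressions.items():
    print(f"{name}: {base:.1f}s -> {now:.1f}s")
```

The tolerance absorbs normal run-to-run noise; it should be tuned per workload once a few runs have shown how much the timings vary.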
Page 16
Q & A
Page 17
Thank You!
Page 18
