Admins: Smoke Test Your Hadoop Cluster!
by Michael Arnold
2019 Pune Data Conference
• Who is this guy?
• What is a smoke test?
• When should we smoke test?
• Why should we smoke test?
• How should we smoke test?
Agenda
3
Who is Michael Arnold?
• Principal Systems Engineer / Consultant
• Automation geek
• 20+ years in IT - 9 years with Hadoop
• I help people deal with:
• Servers (physical and virtual)
• Networks
• Server operating systems
• Hadoop distributions
• Making it all run smoothly
4
A smoke test is a quick, preliminary test to confirm that
basic functionality exists in the item being tested.
What is a Smoke Test?
5
Addresses basic questions like:
• does the program run?
• does the user interface open?
• does clicking the main button do anything?
https://en.wikipedia.org/wiki/Smoke_testing_(software)
What is a Smoke Test?
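For a web application, for instance, such a check can be a single shell command. A minimal sketch, with a hypothetical health-check URL:
# Hypothetical example: the app is considered "alive" if its health
# endpoint answers with an HTTP 2xx status.
curl -sf -o /dev/null http://app.example.com/health && echo "smoke test passed"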
6
"The process of smoke testing aims to determine whether
the application is so badly broken as to make further
immediate testing unnecessary."
https://en.wikipedia.org/wiki/Smoke_testing_(software)
What is a Smoke Test?
7
Software Testing Levels:
• Unit test
• Integration test
• System test
• Acceptance test
Are there other kinds of tests?
8
Software Testing Types:
• Smoke test
• Functional test
• Usability test
• Security test
• Performance test
• Regression test
• Compliance test
Are there other kinds of tests?
9
Integration Smoke Test
An integration smoke test is simply a smoke test of the parts
of the system.
In our case, a Hadoop subsystem like HDFS and
MapReduce.
10
Performing a smoke test immediately following an installation or upgrade is a quick way to ensure that the high-level functionality of the system is working. This applies even after minor configuration changes.
When should an integration smoke test be performed?
11
It can catch regressions or misconfigurations before release
to customers.
It can mitigate the risk of outage.
Why integration smoke testing?
12
It does not catch regressions within applications running on top of the cluster services (i.e. custom Spark or MapReduce code). That would be a System test.
What does Hadoop integration smoke testing not do?
13
Smoke Test Examples:
• ZooKeeper
• HDFS
• MapReduce
• Spark
• Pig
• Hive
• Impala
• Kudu
• HBase
• Accumulo
• Solr
• Kafka
14
# Replace $ZOOKEEPER 'localhost' with the correct hostname.
ZOOKEEPER=localhost:2181
cat <<EOF | zookeeper-client -server $ZOOKEEPER 2>/dev/null | grep -v INFO
create /zk_test my_data
ls /
get /zk_test
set /zk_test junk
get /zk_test
quit
EOF
ZooKeeper
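If the zookeeper-client CLI is not at hand, a lighter liveness probe is the four-letter-word interface. A sketch, assuming nc is installed and (on ZooKeeper 3.5+) that the command is whitelisted via 4lw.commands.whitelist:
# "ruok" should return "imok" when the ZooKeeper server is serving requests.
echo ruok | nc "${ZOOKEEPER%%:*}" "${ZOOKEEPER##*:}"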
15
hdfs dfs -ls /
hdfs dfs -put /etc/hosts /tmp/hosts
hdfs dfs -get /tmp/hosts /tmp/hosts123
cat /tmp/hosts123
HDFS
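Optionally, verify the round trip byte-for-byte and tidy up. A small follow-up sketch using the same file names as above:
# The retrieved copy should be identical to the original.
diff /etc/hosts /tmp/hosts123 && echo "HDFS round trip OK"
# Remove the test artifacts.
hdfs dfs -rm -skipTrash /tmp/hosts
rm -f /tmp/hosts123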
16
yarn jar /opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/hadoop-examples.jar pi 10 1000
MapReduce
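The command exits non-zero if the job fails, so these checks are easy to script. A sketch using a hypothetical run_check helper (not part of the original deck):
# Hypothetical wrapper: report PASS/FAIL based on the command's exit status.
run_check() {
  local name="$1"; shift
  if "$@"; then echo "PASS: ${name}"; else echo "FAIL: ${name}"; return 1; fi
}
run_check "mapreduce-pi" yarn jar \
  /opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/hadoop-examples.jar pi 10 1000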
17
echo "this is the end. the only end. my friend." >/tmp/mrin
hdfs dfs -put /tmp/mrin /tmp/
yarn jar /opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/hadoop-examples.jar wordcount /tmp/mrin /tmp/mrout
hdfs dfs -cat /tmp/mrout/part-*
MapReduce
(wordcount)
18
MASTER=yarn /opt/cloudera/parcels/CDH/lib/spark/bin/run-example SparkPi 100
Spark
19
echo "this is the end. the only end. my friend." >/tmp/sparkin
hdfs dfs -put /tmp/sparkin /tmp/
cat <<EOF | spark-shell --master yarn-client
val file = sc.textFile("hdfs:///tmp/sparkin")
val counts = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
counts.saveAsTextFile("hdfs:///tmp/sparkout")
exit
EOF
hdfs dfs -cat /tmp/sparkout/part-*
Spark
(wordcount)
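Note that the yarn-client master string is Spark 1.x syntax; on Spark 2.x and later the equivalent client-mode invocation is:
# Spark 2.x+ syntax for the same client-mode YARN session.
spark-shell --master yarn --deploy-mode client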
20
hdfs dfs -copyFromLocal /etc/passwd /tmp/test.pig
cat <<'EOF' >/tmp/pig
A = LOAD '/tmp/test.pig' USING PigStorage(':');
B = FOREACH A GENERATE $0 AS id;
STORE B INTO '/tmp/test.pig.out';
EOF
pig /tmp/pig
hdfs dfs -cat /tmp/test.pig.out/part-m-00000
Pig
21
# Replace $HIVESERVER2 with the correct hostname that is running HiveServer2.
HIVESERVER2=
# Create table
beeline -n `whoami` -u "jdbc:hive2://${HIVESERVER2}:10000/" -e 'CREATE TABLE test_hive(id INT, name STRING);'
# Insert data
beeline -n `whoami` -u "jdbc:hive2://${HIVESERVER2}:10000/" -e 'INSERT INTO TABLE test_hive VALUES (1, "justin"), (2, "michael"), (3, "scott");'
# Query table
beeline -n `whoami` -u "jdbc:hive2://${HIVESERVER2}:10000/" -e 'SELECT * FROM test_hive WHERE id=1;'
Hive
22
# Replace $IMPALAD with the correct hostname that's running the Impala Daemon.
IMPALAD=
impala-shell -i $IMPALAD -q 'INVALIDATE METADATA; SELECT * FROM test_hive;'
Impala (data from Hive)
23
# Replace $IMPALAD with the correct hostname that's running the Impala Daemon.
IMPALAD=
# Create table
impala-shell -i $IMPALAD -q 'CREATE TABLE test_impala(id INT, name STRING);'
# Insert data
impala-shell -i $IMPALAD -q 'INSERT INTO TABLE test_impala VALUES (1, "bohan"), (2, "shanlin"), (3, "xiaobing");'
# Query table
impala-shell -i $IMPALAD -q 'SELECT * FROM test_impala WHERE id=1;'
Impala
24
# Replace $IMPALAD with the correct hostname that's running the Impala Daemon.
IMPALAD=
# Create table
impala-shell -i $IMPALAD -q 'CREATE TABLE test_kudu(id BIGINT, name STRING, PRIMARY KEY(id)) PARTITION BY HASH (id) PARTITIONS 3 STORED AS KUDU;'
# Insert data
impala-shell -i $IMPALAD -q 'INSERT INTO TABLE test_kudu VALUES (1, "wasim"), (2, "ninad"), (3, "mohsin");'
# Query table
impala-shell -i $IMPALAD -q 'SELECT * FROM test_kudu WHERE id=1;'
Kudu (via Impala)
25
cat <<EOF >/tmp/hbase
create 'test', 'cf'
list 'test'
put 'test', 'row1', 'cf:a', 'value1'
scan 'test'
exit
EOF
hbase shell -n /tmp/hbase
HBase
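As a complementary check on HBase 1.x (CDH 5), hbck can report region consistency. A hedged sketch:
# hbck is read-only by default; look for "Status: OK" and
# "0 inconsistencies detected" in its output.
hbase hbck 2>/dev/null | grep -E 'Status|inconsistencies'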
26
cat <<EOF >/tmp/accumulo
createtable test
insert row1 cf a value1
flush -w
scan -t test
exit
EOF
accumulo shell -u root -p secret -f /tmp/accumulo
Accumulo
27
# Replace $SOLRSERVER with the correct hostname that's running a Solr Server.
SOLRSERVER=
solrctl instancedir --generate /tmp/solr
solrctl instancedir --create t_conf /tmp/solr
solrctl collection --create t_col -s 1 -c t_conf
Solr
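Before indexing documents, it is worth confirming the collection was actually created; with the Cloudera solrctl tool:
# The new t_col collection should appear in the list.
solrctl collection --list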
28
cd /opt/cloudera/parcels/CDH/share/doc/solr-doc*/example/exampledocs
java -Durl=http://${SOLRSERVER}:8983/solr/t_col/update -jar post.jar *.xml
curl "http://${SOLRSERVER}:8983/solr/t_col_shard1_replica1/select?q=*%3A*&wt=json&indent=true"
Solr (continued)
29
# Replace $KAFKA 'localhost' with the correct hostname.
KAFKA=localhost:9092
# Replace $ZK_ROOT with the correct ZooKeeper root for Kafka (if you configured one).
ZK_ROOT=
# $ZOOKEEPER is the host:port set in the ZooKeeper example earlier.
kafka-topics --zookeeper ${ZOOKEEPER}${ZK_ROOT} --create --topic test_kafka --partitions 1 --replication-factor 1 2>/dev/null
kafka-topics --zookeeper ${ZOOKEEPER}${ZK_ROOT} --list 2>/dev/null
Kafka
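An extra sanity check is to describe the topic and confirm the partition count and replication factor that were requested:
kafka-topics --zookeeper ${ZOOKEEPER}${ZK_ROOT} --describe --topic test_kafka 2>/dev/null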
30
# Run the consumer and producer in separate terminals.
# Send data to the producer and it appears in the consumer.
# ^C to quit.
kafka-console-consumer --zookeeper ${ZOOKEEPER}${ZK_ROOT} --topic test_kafka 2>/dev/null
cat /etc/passwd | kafka-console-producer --broker-list $KAFKA --topic test_kafka 2>/dev/null
Kafka (continued)
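For a scripted (non-interactive) run, the old consumer can read a bounded number of messages and then exit. A sketch, assuming the producer above has already sent data:
kafka-console-consumer --zookeeper ${ZOOKEEPER}${ZK_ROOT} --topic test_kafka \
  --from-beginning --max-messages 10 2>/dev/null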
31
Make sure to remove your test data as part of the
integration smoke test.
Removal details:
https://github.com/teamclairvoyant/hadoop-smoke-tests
Clean Up
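The linked repository has the authoritative removal steps. As a rough sketch, reusing the variables set earlier and adjusting object names to match what you created above:
# HDFS and local test files
hdfs dfs -rm -r -skipTrash /tmp/hosts /tmp/mrin /tmp/mrout /tmp/sparkin /tmp/sparkout /tmp/test.pig /tmp/test.pig.out
rm -f /tmp/hosts123 /tmp/mrin /tmp/sparkin /tmp/pig /tmp/hbase /tmp/accumulo
# Hive and Impala/Kudu tables
beeline -n `whoami` -u "jdbc:hive2://${HIVESERVER2}:10000/" -e 'DROP TABLE test_hive;'
impala-shell -i $IMPALAD -q 'DROP TABLE test_impala; DROP TABLE test_kudu;'
# HBase and Accumulo tables
echo -e "disable 'test'\ndrop 'test'" | hbase shell -n
accumulo shell -u root -p secret -e "deletetable -f test"
# Solr collection and configuration
solrctl collection --delete t_col
solrctl instancedir --delete t_conf
# Kafka topic (requires delete.topic.enable=true) and the ZooKeeper test znode
kafka-topics --zookeeper ${ZOOKEEPER}${ZK_ROOT} --delete --topic test_kafka 2>/dev/null
echo 'delete /zk_test' | zookeeper-client -server $ZOOKEEPER 2>/dev/null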
Questions?
https://github.com/teamclairvoyant/hadoop-smoke-tests
Michael Arnold
@hadoopgeek
https://www.linkedin.com/in/michaelarnold/
Thank you
http://clairvoyantsoft.com/insight/


Editor's Notes

  1. It came from the realm of hardware testing, where simply plugging in the electronic item might result in it catching fire. Quick. Shallow.
  2. In our case, can we write to and read from HDFS? Will a MapReduce job run?
  3. Don't go and turn the cluster over to the customer if it isn't running correctly.
  4. Unit: individual components are tested (i.e. a software function). Integration: units are tested as a group; does the network code work with the storage driver? System: the entire program is tested for functionality. Acceptance: the entire program is tested to meet end-user criteria.
  5. Smoke: quickly checks the most important aspects of the software; does it run? Functional: checks all software functions for correct inputs/outputs. Usability: end-user perspective testing; it may pass Functional but might be unusable. Security: uncovers security vulnerabilities. Performance: aka load testing; evaluates increased workloads and how the system performs. Regression: tests to make sure previous bugs have not been re-introduced. Compliance: tests of internal or external standards to which the program needs to be compliant.
  6. Hive, Solr, HBase.
  7. Install or upgrade. It is better to find problems while in a maintenance window and before users are given access to the system.
  8. These are examples for a non-secured, unencrypted environment. A lot of these are Cloudera-specific, but paths can be changed. Assumes an SSH session as a normal user on a cluster edge node.