2019 Pune Data Conference
Software smoke testing is a preliminary level of testing. It makes certain that all of the primary components of a system are functioning correctly. For example, when installing a new secured Hadoop cluster, running a series of quick tests to make sure that things like HDFS and MapReduce are operational can save a lot of headaches before enabling Kerberos. Smoke tests can also save you time and embarrassment by making sure that things work before you turn the cluster over to your customer.
In this talk, Michael Arnold will explain the utility of testing Hadoop components after cluster builds and software upgrades. Michael will present code examples that you can use to confirm the functionality of Spark, Kudu, HBase, Kafka, MapReduce, etc. on your cluster.
2. • Who is this guy?
• What is a smoke test?
• When should we smoke test?
• Why should we smoke test?
• How should we smoke test?
Agenda
3. 3
Who is Michael Arnold?
• Principal Systems Engineer / Consultant
• Automation geek
• 20+ years in IT - 9 years with Hadoop
• I help people deal with:
• Servers (physical and virtual)
• Networks
• Server operating systems
• Hadoop distributions
• Making it all run smoothly
4. 4
A smoke test is a quick, preliminary test to confirm that
basic functionality exists in the item being tested.
What is a Smoke Test?
5. 5
Addresses basic questions like:
• does the program run?
• does the user interface open?
• does clicking the main button do anything?
https://en.wikipedia.org/wiki/Smoke_testing_(software)
What is a Smoke Test?
6. 6
"The process of smoke testing aims to determine whether
the application is so badly broken as to make further
immediate testing unnecessary."
https://en.wikipedia.org/wiki/Smoke_testing_(software)
What is a Smoke Test?
7. 7
Software Testing Levels:
• Unit test
• Integration test
• System test
• Acceptance test
Are there other kinds of tests?
8. 8
Software Testing Types:
• Smoke test
• Functional test
• Usability test
• Security test
• Performance test
• Regression test
• Compliance test
Are there other kinds of tests?
9. 9
Integration Smoke Test
An integration smoke test is simply a smoke test of the parts of the system.
In our case, the parts are Hadoop subsystems such as HDFS and MapReduce.
10. 10
Performing a smoke test immediately after an installation or upgrade confirms that the high-level functionality of the system is working.
It is worth running even after minor configuration changes.
When should an integration smoke test be performed?
11. 11
It can catch regressions or misconfigurations before release
to customers.
It can mitigate the risk of outage.
Why integration smoke testing?
12. 12
It does not catch regressions within applications running on top of the cluster services (i.e., custom Spark or MapReduce code).
That would be a System test.
What does Hadoop integration smoke testing not do?
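One way to make a run of quick checks easy to read is to wrap each command in a small pass/fail helper. The following is only a sketch: the names run_check and FAILED are hypothetical and do not come from the talk, and the cluster commands are left commented out so the helper can be tried on any machine.

```shell
#!/usr/bin/env bash
# Hypothetical harness; none of these names come from the talk.
FAILED=0

# run_check NAME COMMAND [ARGS...]
# Runs the command, hides its output, and prints PASS or FAIL for NAME.
run_check() {
  local name
  name="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS ${name}"
  else
    echo "FAIL ${name}"
    FAILED=$((FAILED + 1))
  fi
}

# Example usage on an edge node (commented out so the sketch runs anywhere):
# run_check "hdfs-list" hdfs dfs -ls /tmp
# run_check "hdfs-put"  hdfs dfs -put /etc/hosts /tmp/smoke_hosts
# exit "$FAILED"
```

Exiting with the failure count lets a calling script or scheduler treat any non-zero status as "do not hand the cluster over yet".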
19. 19
echo "this is the end. the only end. my friend." >/tmp/sparkin
hdfs dfs -put /tmp/sparkin /tmp/
# On Spark 2.x and later, use "--master yarn --deploy-mode client" instead.
cat <<EOF | spark-shell --master yarn-client
val file = sc.textFile("hdfs:///tmp/sparkin")
val counts = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
counts.saveAsTextFile("hdfs:///tmp/sparkout")
exit
EOF
hdfs dfs -cat /tmp/sparkout/part-*
Spark
(wordcount)
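The final command on this slide prints the word counts, but someone still has to read them. A possible automated check is sketched below. It assumes the job wrote one "(word,count)" pair per line, and it looks for the two words that occur twice in the input sentence ("the" and "end."). The function name check_wordcount is hypothetical.

```shell
# check_wordcount: reads wordcount output on stdin and succeeds only if
# both expected pairs are present as whole lines.
check_wordcount() {
  local output
  output=$(cat)
  printf '%s\n' "$output" | grep -qxF '(the,2)' &&
    printf '%s\n' "$output" | grep -qxF '(end.,2)'
}

# Usage on the cluster (commented out so the sketch runs anywhere):
# hdfs dfs -cat /tmp/sparkout/part-* | check_wordcount && echo "Spark OK"
```

Matching whole lines with grep -x avoids a false pass on a pair such as "(the,20)".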
20. 20
hdfs dfs -copyFromLocal /etc/passwd /tmp/test.pig
# Write the script to a local file; the quoted 'EOF' keeps the shell from expanding $0.
cat <<'EOF' >/tmp/pig
A = LOAD '/tmp/test.pig' USING PigStorage(':');
B = FOREACH A GENERATE $0 AS id;
STORE B INTO '/tmp/test.pig.out';
EOF
pig /tmp/pig
hdfs dfs -cat /tmp/test.pig.out/part-m-00000
Pig
21. 21
# Replace $HIVESERVER2 with the hostname of the node running HiveServer2 (HS2).
HIVESERVER2=
# Create table
beeline -n "$(whoami)" -u "jdbc:hive2://${HIVESERVER2}:10000/" -e 'CREATE TABLE test_hive(id INT, name STRING);'
# Insert data
beeline -n "$(whoami)" -u "jdbc:hive2://${HIVESERVER2}:10000/" -e 'INSERT INTO TABLE test_hive VALUES (1, "justin"), (2, "michael"), (3, "scott");'
# Query table
beeline -n "$(whoami)" -u "jdbc:hive2://${HIVESERVER2}:10000/" -e 'SELECT * FROM test_hive WHERE id=1;'
Hive
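The slide leaves HIVESERVER2 blank for the reader to fill in, and running the commands without setting it produces a JDBC URL with no host. A small guard like the one sketched here could fail early instead. The function name hive_jdbc_url and the host hs2.example.com are hypothetical; port 10000 is the one used on the slide.

```shell
# hive_jdbc_url HOST [PORT]
# Prints a HiveServer2 JDBC URL, or fails with a message if HOST is empty.
hive_jdbc_url() {
  local host port
  host="$1"
  port="${2:-10000}"
  if [ -z "$host" ]; then
    echo "HiveServer2 hostname is not set" >&2
    return 1
  fi
  echo "jdbc:hive2://${host}:${port}/"
}

# Usage on the cluster (commented out; hs2.example.com is a placeholder):
# URL=$(hive_jdbc_url hs2.example.com) || exit 1
# beeline -n "$(whoami)" -u "$URL" -e 'SELECT * FROM test_hive WHERE id=1;'
```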
22. 22
# Replace $IMPALAD with the hostname of the node running the Impala Daemon.
IMPALAD=
impala-shell -i "$IMPALAD" -q 'INVALIDATE METADATA; SELECT * FROM test_hive;'
Impala (data from Hive)
23. 23
# Replace $IMPALAD with the hostname of the node running the Impala Daemon.
IMPALAD=
# Create table
impala-shell -i "$IMPALAD" -q 'CREATE TABLE test_impala(id INT, name STRING);'
# Insert data
impala-shell -i "$IMPALAD" -q 'INSERT INTO TABLE test_impala VALUES (1, "bohan"), (2, "shanlin"), (3, "xiaobing");'
# Query table
impala-shell -i "$IMPALAD" -q 'SELECT * FROM test_impala WHERE id=1;'
Impala
24. 24
# Replace $IMPALAD with the hostname of the node running the Impala Daemon.
IMPALAD=
# Create table
impala-shell -i "$IMPALAD" -q 'CREATE TABLE test_kudu(id BIGINT, name STRING, PRIMARY KEY(id)) PARTITION BY HASH (id) PARTITIONS 3 STORED AS KUDU;'
# Insert data
impala-shell -i "$IMPALAD" -q 'INSERT INTO TABLE test_kudu VALUES (1, "wasim"), (2, "ninad"), (3, "mohsin");'
# Query table
impala-shell -i "$IMPALAD" -q 'SELECT * FROM test_kudu WHERE id=1;'
Kudu (via Impala)
29. 29
# Replace 'localhost' in $KAFKA with the correct broker hostname.
KAFKA=localhost:9092
# Replace $ZK_ROOT with the correct ZooKeeper root for Kafka (if you configured one).
ZK_ROOT=
# $ZOOKEEPER is not set on this slide; it must already hold the ZooKeeper connection string.
kafka-topics --zookeeper "${ZOOKEEPER}${ZK_ROOT}" --create --topic test_kafka --partitions 1 --replication-factor 1 2>/dev/null
kafka-topics --zookeeper "${ZOOKEEPER}${ZK_ROOT}" --list 2>/dev/null
Kafka
30. 30
# Run the consumer and producer in separate terminals.
# Data sent to the producer should appear in the consumer.
# Press ^C to quit.
kafka-console-consumer --zookeeper "${ZOOKEEPER}${ZK_ROOT}" --topic test_kafka 2>/dev/null
cat /etc/passwd | kafka-console-producer --broker-list "$KAFKA" --topic test_kafka 2>/dev/null
Kafka (continued)
31. 31
Make sure to remove your test data as part of the
integration smoke test.
Removal details:
https://github.com/teamclairvoyant/hadoop-smoke-tests
Clean Up
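One way to make sure the removal happens even when a check fails partway through is a shell trap. The sketch below is not taken from the linked repository: the cleanup function and SCRATCH variable are hypothetical, a local scratch file stands in for cluster test data, and the commented cluster commands are assumptions based on the paths and table names used in the earlier slides.

```shell
# Hypothetical sketch: a local scratch file stands in for cluster test data.
SCRATCH=$(mktemp)

# cleanup: removes everything the smoke test created.
cleanup() {
  rm -f "$SCRATCH"
  # On a cluster, the matching removals would go here, for example:
  # hdfs dfs -rm -r -f /tmp/sparkin /tmp/sparkout
  # impala-shell -i "$IMPALAD" -q 'DROP TABLE IF EXISTS test_impala;'
}

# Run cleanup whenever the script exits, whether the checks passed or not.
trap cleanup EXIT
```

Because the removals use "-f" and "IF EXISTS", running the cleanup twice is harmless.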
The term comes from hardware testing, where simply plugging in an electronic item might result in it catching fire.
Quick
Shallow
In our case, can we write to and read from HDFS? Will a MapReduce job run?
Don't go and turn the cluster over to the customer if it isn't running correctly.
Unit: individual components are tested; i.e., a software function
Integration: units are tested as a group; does the network code work with the storage driver?
System: the entire program is tested for functionality
Acceptance: the entire program is tested to meet end-user criteria
Smoke: quickly checks the most important aspects of the software; does it run?
Functional: checking all software functions for correct inputs/outputs
Usability: end-user perspective testing; it may pass Functional, but might be unUsable.
Security: uncovers security vulnerabilities
Performance: aka Load testing; evaluates increased workloads and how the system performs
Regression: tests to make sure previous bugs have not been re-introduced
Compliance: tests of internal or external standards to which the program needs to be compliant
Hive Solr HBase
install or upgrade
It is better to find problems while in a maintenance window and before users are given access to the system.
-These are examples for a non-secured, unencrypted environment.
-A lot of these are Cloudera-specific, but paths can be changed.
-Assumes an SSH session as a normal user on the cluster Edge node.