A Glimpse of Test
Automation in
Hadoop Ecosystem
Deepika Achary
Test Engineer, OCLC
Who Am I?
def self.info()
  name = 'Deepika Achary'
  job = 'Test Automation Engineer'
  company = 'OCLC'
  email = 'acharyd@oclc.org'
  hobbies = ['Baking', 'Watching movies', 'Painting']
end
What are we going to talk about?
What is BigData?
• "BigData" refers to data sets so large, and so often unstructured or semi-structured, that they are difficult to process using traditional methods
• Volumes too great for a typical DBMS (terabytes, petabytes, exabytes of data)
• Sources of BigData: social media, IoT appliances (smart devices), e-commerce transactions, GPS location data, etc.
Let's look at how much data is generated per minute on the internet
[Figure: per-minute activity counts: 1+ Million, 2+ Million, 2+ Million, 4.5 Million]
Initial thought process and approach
We decided to move to Hadoop to handle the huge volume of data
Some of our applications are being developed in the Hadoop ecosystem
That's when the Hadoop ecosystem components came into the picture and we started thinking about test automation for these applications
Why JRuby?
What is JRuby and Why use it?
• JRuby is a 100% pure-Java implementation of the Ruby programming language
• JRuby allows Ruby programs to use Java classes, a powerful capability for Ruby users
• JRuby can integrate with Java code. If you have Java class libraries (.jar files), you can reference and use them from within Ruby code
• By combining the Java platform with the power of the Ruby programming language, programmers get the best of both worlds
Ruby vs JRuby
Ruby | JRuby
A dynamic, interpreted, open source programming language with a focus on simplicity and productivity | A high performance, stable, fully threaded Java implementation of the Ruby programming language
High performance | Comparatively higher performance than Ruby
Uses Ruby gems | Can use Ruby gems along with Java libraries
[Diagram: DATA and the components HDFS, KAFKA, SOLR, HBASE]
Let's explore individual components
HDFS
Hadoop Distributed File System
Storage layer of Hadoop
Data is stored in a distributed manner in HDFS
Files are broken down into smaller chunks and stored on various machines
Copies of those chunks are created and stored on different nodes
If one machine fails, the data can still be retrieved from other machines
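The chunking and copying described above can be sketched in plain Ruby. This is an illustrative model only, not how HDFS is implemented: the method names, the node names, and the round-robin placement are all assumptions made for the example (real placement is decided by the NameNode and is rack-aware). The sizes mirror the 300 MB file and 100 MB chunks used on the slide; actual clusters typically run with larger default block sizes and more replicas.

```ruby
# Illustrative model only: replicas are assigned round-robin here,
# which is a simplification of real HDFS block placement.

# Split a file of file_size_mb into chunks of at most block_size_mb.
def split_into_blocks(file_size_mb, block_size_mb)
  full, rest = file_size_mb.divmod(block_size_mb)
  blocks = Array.new(full, block_size_mb)
  blocks << rest if rest > 0
  blocks
end

# Assign each block index to `replication` distinct nodes.
def place_replicas(block_count, nodes, replication)
  raise ArgumentError, 'replication exceeds node count' if replication > nodes.size

  (0...block_count).each_with_object({}) do |i, placement|
    placement[i] = Array.new(replication) { |r| nodes[(i + r) % nodes.size] }
  end
end

# Block indexes that are still readable when failed_node is down.
def readable_blocks(placement, failed_node)
  placement.select { |_block, holders| (holders - [failed_node]).any? }.keys
end

blocks = split_into_blocks(300, 100)
placement = place_replicas(blocks.size, %w[node1 node2 node3], 2)

puts blocks.inspect
puts placement.inspect
puts readable_blocks(placement, 'node2').inspect
```

With two copies of every block on different nodes, losing any single node still leaves every block readable, which is the property the last bullet describes.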
[Figure: a 300 MB BigData file split by the Hadoop framework into three 100 MB blocks (A, B, C), each stored twice in HDFS]
Automation Setup for HDFS
• Jar used: org.apache.hadoop:hadoop-common
• Some classes used:
  ◦ org.apache.hadoop.conf.Configuration: provides access to configuration parameters
  ◦ org.apache.hadoop.fs.Path: used to construct a file path from a string
  ◦ org.apache.hadoop.fs.FileStatus: represents the client-side information (metadata) for a file
  ◦ org.apache.hadoop.fs.FileSystem: class that provides an object to interact with the Hadoop file system
• Some HDFS operations:
  ◦ copyFromLocal: copies files from the local file system into HDFS for storage or processing
  ◦ copyToLocal: copies stored files from HDFS to the local file system
HBase
• NoSQL database
• Built on top of HDFS
• HBase is a column-oriented, distributed database designed to run on top of the Hadoop Distributed File System (HDFS)
• It is the part of the Hadoop ecosystem that provides random, real-time read/write access to data in the Hadoop file system
• Data can be stored in HDFS either directly or through HBase. Data consumers read and access the data in HDFS randomly using HBase
HBase Table
Row Key | Personal: Name | Personal: City | Professional: Designation | Professional: Salary
1234 | Josh | Columbus | TAD | 80,000
5678 | Alex | Atlanta | Sr. Developer | 90,000
Automation Setup for HBASE
• Gem used: hbase-jruby
• hbase-jruby is a simple JRuby binding for HBase
• hbase-jruby provides the following:
  ◦ An easy Ruby interface for the fundamental HBase operations
• Operations done using hbase-jruby:
  ◦ PUT: puts data into the table
  ◦ GET: retrieves data from the table using one or more row keys
  ◦ SCAN: scans the table for a given range of row keys
  ◦ DELETE: deletes data from the table
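To make the four operations concrete without needing a cluster, here is a minimal in-memory stand-in written in plain Ruby. It is not the hbase-jruby API: the class name FakeTable and its method signatures are hypothetical, chosen only to mirror PUT, GET, SCAN and DELETE. Columns are keyed as 'family:qualifier', and the scan treats the start key as inclusive and the stop key as exclusive over sorted row keys, which matches HBase's usual scan semantics. The rows are the ones from the HBase table slide.

```ruby
# In-memory stand-in for an HBase table; NOT the hbase-jruby API.
class FakeTable
  def initialize
    @rows = {} # row key => { 'family:qualifier' => value }
  end

  # PUT: merge column values into the row, creating it if needed.
  def put(rowkey, columns)
    (@rows[rowkey] ||= {}).merge!(columns)
  end

  # GET: fetch one or more rows by key; unknown keys are skipped.
  def get(*rowkeys)
    rowkeys.each_with_object({}) do |key, found|
      found[key] = @rows[key] if @rows.key?(key)
    end
  end

  # SCAN: rows with start_key <= key < stop_key, in sorted key order.
  def scan(start_key, stop_key)
    keys = @rows.keys.sort.select { |k| k >= start_key && k < stop_key }
    keys.map { |k| [k, @rows[k]] }
  end

  # DELETE: remove a whole row.
  def delete(rowkey)
    @rows.delete(rowkey)
  end
end

table = FakeTable.new
table.put('1234', 'personal:name' => 'Josh', 'personal:city' => 'Columbus',
                  'professional:designation' => 'TAD',
                  'professional:salary' => '80,000')
table.put('5678', 'personal:name' => 'Alex', 'personal:city' => 'Atlanta',
                  'professional:designation' => 'Sr. Developer',
                  'professional:salary' => '90,000')

puts table.get('1234').inspect
puts table.scan('1000', '5000').map(&:first).inspect
table.delete('1234')
puts table.get('1234', '5678').keys.inspect
```

A test written against the real table would follow the same shape: put known rows, read them back by key or by range, compare against the expected values, then delete the test data.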
Kafka
• Kafka is a publish-subscribe (pub-sub) messaging system
• Publish-subscribe is a messaging pattern in which senders of messages are called publishers and receivers are called subscribers. Publishers send messages to a Kafka topic, and subscribers consume the messages from that topic
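The pattern can be sketched with a small in-memory topic in plain Ruby. This is a toy model under stated assumptions, not the Kafka client API: the Topic class, its publish and poll methods, and the subscriber names are hypothetical, and the model ignores partitions, brokers and persistence. What it does show is the core idea that each subscriber keeps its own position in the topic, so several subscribers can read the same messages independently.

```ruby
# In-memory sketch of a topic; NOT the Kafka client API.
class Topic
  def initialize
    @log = []              # append-only list of messages
    @offsets = Hash.new(0) # subscriber name => next offset to read
  end

  # A publisher appends a message and gets back its offset.
  def publish(message)
    @log << message
    @log.size - 1
  end

  # A subscriber reads every message it has not seen yet.
  def poll(subscriber)
    unread = @log.drop(@offsets[subscriber])
    @offsets[subscriber] = @log.size
    unread
  end
end

topic = Topic.new
topic.publish('record-1')
topic.publish('record-2')
puts topic.poll('team-a').inspect

topic.publish('record-3')
puts topic.poll('team-a').inspect
puts topic.poll('team-b').inspect
```

For test automation this suggests the basic check: publish a known message, consume from the topic, and assert that the message arrives intact.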
Kafka Tool
Automation Setup for KAFKA
• Jar used: org.apache.kafka:kafka-clients
• Classes used:
  ◦ org.apache.kafka.clients.consumer.ConsumerConfig: configuration for the Kafka consumer
  ◦ org.apache.kafka.clients.producer.ProducerConfig: configuration for the Kafka producer
Solr
• Solr is a search and storage engine: you index a set of documents and then query it to get back the documents that match the user's query
• Solr can be used along with Hadoop. Because Hadoop handles a large amount of data, Solr helps us find the required information in such a large source
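Since Solr is queried over HTTP, a query is ultimately a URL with parameters. The sketch below builds such a URL for Solr's select handler in plain Ruby. The helper name solr_select_url, the host, and the collection name 'books' are assumptions made for illustration; the parameters q, rows, fl and wt are standard Solr query parameters. It only builds the URL and does not send a request.

```ruby
require 'uri'

# Build a URL for Solr's /select request handler.
# base_url and collection are placeholders for a real deployment.
def solr_select_url(base_url, collection, query, rows: 10, fields: nil)
  params = { 'q' => query, 'rows' => rows, 'wt' => 'json' }
  params['fl'] = fields.join(',') if fields # limit the returned fields
  "#{base_url}/solr/#{collection}/select?#{URI.encode_www_form(params)}"
end

puts solr_select_url('http://localhost:8983', 'books', 'title:hadoop',
                     rows: 5, fields: %w[id title])
```

An automated check could issue this request after indexing a known document and assert that the document comes back in the response.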
[Diagram: a REST service sends a request to Solr and receives a response]
Solr Admin
Automation Setup for SOLR
• Jar used: org.apache.solr:solr-solrj
• Classes used:
  ◦ org.apache.solr.common.SolrDocument: a concrete representation of a document within a Solr index
  ◦ org.apache.solr.client.solrj.SolrQuery: an augmented SolrParams with get/set/add methods for common fields used in the Standard and DisMax request handlers
  ◦ org.apache.solr.client.solrj.impl.CloudSolrClient: instances of this class communicate with ZooKeeper to discover Solr endpoints for Solr collections
  ◦ org.apache.solr.client.solrj.impl.HttpSolrClient: a SolrClient implementation that talks directly to a Solr server via HTTP
Advantages of Automation
• Concrete framework: validates individual components and pinpoints where things went wrong
• Gray box for QA: ability to give developers additional information that helps them debug the issue and do root cause analysis
• Ease of use: automation scripts can be executed in near real time or in batch mode
• Data flexibility: we can create our own test data for each use case and automate it, which makes life simple
• Bugs: identifying an incorrect entry among a million records
• Quick turnaround time and faster feedback
• Always up and running: health check of all systems
• Reusability: we created a gem that is used across OCLC
Key Takeaways
• Overview of BigData
• Hadoop ecosystem and its components
• How to automate?
Questions?
Thank You!!
