"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
vodQA Bangalore 2019 - Orange
1. Squeeze your test suite using Orange
https://github.com/SudhaNadchal
sudhashettyn@gmail.com
Sudha Nadchal
2. The problem
• Same verification steps in several test cases
• Execution of redundant test cases may take several days
• Need to minimize time and cost of execution and maintenance
• Bulky test suites in legacy applications
4. What is Orange?
Open-source visual programming toolkit
Supports:
• Data visualization
• Machine learning
• Data mining
• Already in use in bioinformatics, space research, image analytics, geology, etc.
15. Limitations
Clusters can be hard to interpret if the corpus is too large
Spelling errors and other inconsistencies
Stemmers are not always accurate:
‘Caring’ -> Lemmatization -> ‘Care’
‘Caring’ -> Stemming -> ‘Car’
Redundancy is the repetition of the same content across different test cases.
As a solution to these problems, I propose a text-mining-based approach using an open-source tool called Orange that anyone can use.
This solution provides a visual representation of test case pairs in a test suite that are identical or functionally related. That output can be used to eliminate duplicate test cases and to functionally merge related ones, making for a leaner test suite.
A capability must be established to save the test cases from a test suite or test plan folder of interest as individual text files in a folder; this folder serves as input to the model. If a test case contains attachments, those need to be ignored. If the test cases are managed in HP ALM, this can be done using ALM's REST API.
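As a minimal sketch of that export step, the snippet below writes one text file per test case so the folder can be fed to Orange's "Import Documents" widget. It assumes the test cases have already been exported to a CSV with hypothetical id, name, and steps columns; with HP ALM, the same files could instead be produced by pulling test entities over its REST API.

```python
# Minimal sketch: write each test case as an individual .txt file so the
# folder can be fed to Orange's "Import Documents" widget.
# Assumes a prior export (e.g., from HP ALM) to a CSV with the
# hypothetical columns: id, name, steps.
import csv
from pathlib import Path

EXPORT_CSV = "test_cases.csv"   # hypothetical export file
OUT_DIR = Path("test_cases")    # folder Orange will import

OUT_DIR.mkdir(exist_ok=True)

with open(EXPORT_CSV, newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        # One document per test case; attachments are simply not exported,
        # which satisfies the "ignore attachments" requirement.
        out_file = OUT_DIR / f"{row['id']}_{row['name']}.txt"
        out_file.write_text(f"{row['name']}\n{row['steps']}", encoding="utf-8")
```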
Download Orange. By default, Orange doesn't come with text-mining features; we need to install the "Text" add-on. Once the add-on is installed, we are good to start. We get a canvas where one can drag and drop components, aka widgets. This is how the workflow looks. I'm using the "Import Documents" widget to import the test case folder of choice.
Once the documents are imported, the Word Cloud and Corpus Viewer widgets are used to visualize the text. A corpus is a collection of documents. The Word Cloud widget displays word frequencies: the larger a word appears in the cloud, the more frequently it occurs in the corpus.
In fact, when we view the corpus without any preprocessing, the word cloud displays noise such as punctuation and uninformative words.
We'll use the Preprocess Text widget to get rid of these.
This widget carries out a series of operations on the corpus.
Lemmatization is the process of converting a word to its base form. The difference between stemming and lemmatization is that lemmatization considers the context and converts the word to its meaningful base form, whereas stemming just removes the last few characters, often leading to incorrect meanings and spelling errors.
For example, lemmatization would correctly reduce ‘caring’ to its base form ‘care’, whereas stemming would cut off the ‘ing’ part and convert it to ‘car’.
‘Caring’ -> Lemmatization -> ‘Care’
‘Caring’ -> Stemming -> ‘Car’
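To see the difference outside the GUI, here is a small sketch using NLTK (which Orange's Text add-on builds on for these normalizers). Note that the choice of stemmer matters: the gentler Porter stemmer actually maps ‘caring’ to ‘care’, while the more aggressive Lancaster stemmer reproduces the ‘car’ clipping shown above.

```python
# Stemming vs. lemmatization on the slide's example word.
# Requires: pip install nltk, plus nltk.download('wordnet') for the lemmatizer.
from nltk.stem import PorterStemmer, LancasterStemmer, WordNetLemmatizer

word = "caring"

print(PorterStemmer().stem(word))     # 'care' -- Porter stems fairly gently
print(LancasterStemmer().stem(word))  # 'car'  -- aggressive clipping, as on the slide
# The lemmatizer needs the part of speech to find the meaningful base form.
print(WordNetLemmatizer().lemmatize(word, pos="v"))  # 'care'
```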
Normalization:
• Stemmer – Porter, Snowball
• Lemmatizer – WordNet
Filtering – Stop words
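Outside the GUI, the same preprocessing chain (lowercasing, tokenization, stop-word filtering, normalization) can be approximated in a few lines of NLTK; the test-case sentence below is hypothetical.

```python
# Approximation of Orange's "Preprocess Text" widget with NLTK:
# lowercase -> tokenize -> stop-word filter -> lemmatize.
# Requires: nltk.download('stopwords') and nltk.download('wordnet')
import re
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

STOP = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()

def preprocess(text: str) -> list[str]:
    tokens = re.findall(r"[a-z]+", text.lower())  # tokenize, drop punctuation
    return [lemmatizer.lemmatize(t, pos="v") for t in tokens if t not in STOP]

print(preprocess("Verify that the user is logged in and the cart is empty"))
# ['verify', 'user', 'log', 'cart', 'empty']
```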
BoW (bag of words) is a way of extracting features from test cases. BoW is only concerned with whether known words occur in a test case, not where in the test case they occur.
The intuition is that test cases are similar or related if they have similar content.
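As a minimal illustration of BoW (this sketch uses scikit-learn rather than Orange's widget; the two test-case texts are hypothetical):

```python
# Bag of words: each test case becomes a vector of word counts,
# discarding word order entirely.
from sklearn.feature_extraction.text import CountVectorizer

test_cases = [
    "verify user can log in with valid credentials",
    "verify user cannot log in with invalid credentials",
]

vectorizer = CountVectorizer()
bow = vectorizer.fit_transform(test_cases)

print(vectorizer.get_feature_names_out())
print(bow.toarray())  # one row per test case, one column per known word
```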
Term frequency (tf) is basically the output of the bag of words. For a specific test case, it determines how important a word is by looking at how frequently it appears in that test case: term frequency measures the local importance of the word. If a word appears many times, the word is presumably important. For example, if our document is “I am a cat lover. I have a cat named Steve. I feed a cat outside my room regularly,” the words with the highest frequency are “I”, “a”, and “cat”. This agrees with our intuition that high term frequency means higher importance, since the document is all about my fascination with cats.
The second component of tf-idf is inverse document frequency (idf). For a word to be considered a signature word of a document, it shouldn't appear that often in the other documents. Thus, a signature word's document frequency must be low, meaning its inverse document frequency must be high.
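A short scikit-learn sketch makes the two components concrete; the three test-case texts are hypothetical.

```python
# tf-idf: term frequency scaled by inverse document frequency, so words
# that appear in every test case (low idf) are down-weighted.
from sklearn.feature_extraction.text import TfidfVectorizer

test_cases = [
    "verify user can log in with valid credentials",
    "verify user cannot log in with invalid credentials",
    "verify cart total updates when an item is added",
]

tfidf = TfidfVectorizer()
matrix = tfidf.fit_transform(test_cases)

# With scikit-learn's default smoothing, idf(t) = ln((1 + n) / (1 + df(t))) + 1.
# 'verify' occurs in all three test cases, so its idf (and weight) is low;
# 'cart' is unique to the third test case, so its weight there is high.
for word in ("verify", "cart"):
    print(word, tfidf.idf_[tfidf.vocabulary_[word]])
```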
Euclidean distance is the ordinary straight-line distance between two points in space.
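Applied to tf-idf vectors, Euclidean distance gives a concrete similarity measure between test cases; a sketch with hypothetical test cases, again using scikit-learn:

```python
# Euclidean distance between tf-idf vectors: the closer two test cases are,
# the more similar their content -- which is what the clustering step exploits.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import euclidean_distances

test_cases = [
    "verify user can log in with valid credentials",
    "verify user cannot log in with invalid credentials",
    "verify cart total updates when an item is added",
]

matrix = TfidfVectorizer().fit_transform(test_cases)
dist = euclidean_distances(matrix)

print(dist.round(2))
# Rows/columns are test cases; the two log-in cases sit much closer to each
# other than to the cart case, flagging them as candidates for merging.
```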
Clusters can be hard to interpret if the corpus is too large. Spelling errors and other inconsistencies can result in inaccuracies. Stemmers are not always accurate:
‘Caring’ -> Lemmatization -> ‘Care’
‘Caring’ -> Stemming -> ‘Car’