Building a Robust Big Data QA Ecosystem to Mitigate Data Integrity Challenges
With big data evolving rapidly, organizations must seek solutions to ensure robust processes for quality assurance around big data implementations.
Executive Summary
Harvesting relevant information from big data is an imperative for enterprises seeking to optimize strategic business decision-making. Opportunities that were traditionally unavailable are now a reality, with new and more revealing insights extracted from sources such as social media and the devices that constitute the Internet of Things. Emerging technologies are enabling organizations to gain valuable business insights from data that is growing exponentially in volume, velocity, variety of formats and complexity.
Leading industry analysts forecast the big data market to reach U.S.$25 billion by 2015.¹ As a result, organizations will require new data integration platforms, fueling demand for QA processes that serve those platforms and making big data testing a necessity. For a big data testing strategy to be effective, the “4Vs” of big data must be continuously monitored and validated: volume (scale of data), variety (different forms of data), velocity (analysis of streaming data in microseconds) and veracity (certainty of data).
In addition to the large volumes, the heterogeneous and unstructured nature of big data increases the complexity of validation, rendering traditional sampling-based QA strategies infeasible. Setting up a QA infrastructure to manage these volumes is itself a challenge. The absence of robust test data management strategies and a lack of performance testing tools within many IT organizations make big data testing one of the most perplexing technical propositions that businesses encounter. Meeting the big data testing challenge requires utilities and automation solutions that improve test coverage, particularly where traditional sampling-based QA strategies are inadequate.
This white paper outlines our proposed big data testing framework, with a focus on identifying the key processes in data warehouse testing, performance testing and test data management.
Cognizant 20-20 Insights | October 2014
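Because sampling is ruled out, the 4Vs have to be measured over every record. The following Python sketch profiles one batch with simple full-scan metrics. The metric definitions are illustrative assumptions, not the framework proposed in this paper: variety is counted as the number of distinct field sets, velocity as records per second derived from a hypothetical `event_time` field, and veracity as the share of records whose required fields are populated.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Iterable, Optional


@dataclass
class FourVProfile:
    volume: int      # records scanned (full scan, no sampling)
    variety: int     # distinct record shapes (sets of field names)
    velocity: float  # records per second, from event timestamps
    veracity: float  # fraction of records with all required fields populated


def profile_batch(records: Iterable[dict[str, Any]],
                  required: set[str],
                  ts_field: str = "event_time") -> FourVProfile:
    """Compute illustrative 4V metrics over an entire batch of records."""
    shapes: set[frozenset] = set()
    count = valid = 0
    first_ts: Optional[datetime] = None
    last_ts: Optional[datetime] = None
    for rec in records:
        count += 1
        shapes.add(frozenset(rec))  # a record's "shape" is its set of keys
        if all(rec.get(f) not in (None, "") for f in required):
            valid += 1
        ts = rec.get(ts_field)
        if isinstance(ts, datetime):
            first_ts = ts if first_ts is None else min(first_ts, ts)
            last_ts = ts if last_ts is None else max(last_ts, ts)
    span = 0.0
    if first_ts is not None and last_ts is not None:
        span = (last_ts - first_ts).total_seconds()
    velocity = count / span if span > 0 else 0.0
    veracity = valid / count if count else 0.0
    return FourVProfile(count, len(shapes), velocity, veracity)


batch = [
    {"id": 1, "user": "a", "event_time": datetime(2014, 10, 1, 12, 0, 0)},
    {"id": 2, "user": None, "event_time": datetime(2014, 10, 1, 12, 0, 1)},
    {"id": 3, "user": "c", "tags": ["x"], "event_time": datetime(2014, 10, 1, 12, 0, 2)},
    {"id": 4, "user": "d", "event_time": datetime(2014, 10, 1, 12, 0, 4)},
]
profile = profile_batch(batch, required={"id", "user"})
# FourVProfile(volume=4, variety=2, velocity=1.0, veracity=0.75)
```

Tracking these figures per batch over time, rather than once, is what makes the monitoring continuous; thresholds for each metric would be project-specific.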
Challenges in Big Data 
Since the mid-1990s, organizations have become accustomed to handling data contained in relational databases and spreadsheets; this data is structured. However, with the advent of big data, information can reside in semi-structured or unstructured formats, which are cumbersome to interpret and manage because the data does not reside in database rows and columns.
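One common way to make semi-structured records testable with row-and-column tooling is to flatten them into a tabular form. The sketch below assumes JSON-lines input and uses a hypothetical dotted-name convention for nested fields, storing arrays as JSON text in a single cell; it illustrates the idea rather than the behavior of any specific product.

```python
import json
from typing import Any


def flatten(record: dict[str, Any], parent: str = "", sep: str = ".") -> dict[str, Any]:
    """Flatten nested dicts into dotted column names; lists are kept as JSON text."""
    row: dict[str, Any] = {}
    for key, value in record.items():
        col = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            row.update(flatten(value, col, sep))
        elif isinstance(value, list):
            row[col] = json.dumps(value)  # keep arrays as a single text cell
        else:
            row[col] = value
    return row


def to_rows(raw_lines: list[str]) -> tuple[list[str], list[list[Any]]]:
    """Parse JSON lines, flatten each, and align them on the union of all columns."""
    flat = [flatten(json.loads(line)) for line in raw_lines]
    columns = sorted({col for row in flat for col in row})
    rows = [[row.get(col) for col in columns] for row in flat]  # missing -> None
    return columns, rows


raw = [
    '{"id": 1, "user": {"name": "ann", "geo": {"country": "IN"}}}',
    '{"id": 2, "user": {"name": "bo"}, "tags": ["iot", "sensor"]}',
]
columns, rows = to_rows(raw)
# columns == ["id", "tags", "user.geo.country", "user.name"]
```

Aligning on the union of columns exposes schema drift as `None` cells, which a tester can then check against the expected schema.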
With the phenomenal explosion in the IT intensity of most businesses, data volumes and velocities have accelerated, creating a need for real-time big data testing. This has heightened concerns over how to assure quality across the big data ecosystem.
At present, testers mostly process clean, structured data. However, they also need to handle semi-structured and unstructured data. Key issues that require relatively more attention in big data testing include:
• Data security. 
• Performance issues and the workload on the 
system due to heightened data volumes. 
• Scalability of the data storage media. 
Data warehouse testing, performance testing and test data management are the fundamental components of big data testing. Addressing the challenges in each of these areas is essential to validating the entire big data testing continuum (see Figure 1).
Streamlining Processes to Overcome Challenges in Big Data Testing
Given the ever-evolving technology landscape, what is necessary today may be obsolete tomorrow. As a result, it is important to establish streamlined processes that will stay the course despite changing technologies and evolving platforms. Software testing follows the same evolutionary cycle.
Figure 1: At a Glance: Challenges in Big Data Testing
Data Warehouse Testing
• Decreased test coverage due to the complex organization of big data requirements.
• Test data supports only limited normalization.
• The 4Vs (variety, velocity, volume and veracity) are not monitored.
Performance Testing
• Generating a greater workload for performance testing of big data.
• Test results in the form of reports, charts and graphs are at least twice as large as traditional BI reports.
• Interpreting results and identifying bottlenecks.
• Performance tuning.
Test Data Management
• Managing test data during the automated testing process.
• Anticipating the acquisition and management of test data during different phases of the software testing lifecycle.
• Setting up test data in relation to test coverage, accuracy and types of big test data.
• Investment in servers used for performance testing may not be cost-effective for small-scale companies.
To address the dynamic changes in the big data ecosystem, organizations must streamline their processes for data warehouse testing, performance testing and test data management.
Strengthen Data Warehousing Processes 
While traditional data warehouse testing is performed in a controlled environment, the unpredictable nature of the big data testing environment presents a unique set of challenges. Data warehouse and business intelligence testing require highly complex testing strategies, processes and tools that specifically address the 4Vs of big data.
Recommendations to refine test strategies and 
processes include: 
• Make “big” things simple through a “divide-and-test” strategy. Organize your big data warehouse into smaller units that are easily testable, thus improving test coverage and optimizing the big data test set.
• Normalize design and tests. Generate normalized test data for big data testing more effectively by normalizing the dynamic schemas at the design level.
• Enhance testing by measuring the 4Vs. Data warehouse test environments that are specifically designed to handle the 4Vs of big data will improve test coverage.
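As an illustration of the divide-and-test idea, the sketch below splits source and target data into partitions (here by an assumed `day` field) and reconciles each partition separately by row count and checksum, so a failure points to a small, testable unit rather than the whole warehouse. The partition key, column list and checksum scheme are assumptions; a modular sum of row hashes is cheap and independent of row order, but it is not collision-proof.

```python
import hashlib
from collections import defaultdict
from typing import Any, Callable, Iterable

Record = dict[str, Any]
Digest = dict[str, tuple[int, int]]


def partition_digest(records: Iterable[Record],
                     part_key: Callable[[Record], str],
                     columns: list[str]) -> Digest:
    """Return {partition: (row_count, order-independent checksum)}."""
    stats: dict[str, list[int]] = defaultdict(lambda: [0, 0])
    for rec in records:
        canonical = "|".join(str(rec.get(c)) for c in columns)
        row_hash = int.from_bytes(hashlib.sha256(canonical.encode()).digest()[:8], "big")
        entry = stats[part_key(rec)]
        entry[0] += 1
        # Summing row hashes modulo 2**64 makes the checksum independent of row order.
        entry[1] = (entry[1] + row_hash) % (1 << 64)
    return {part: (count, checksum) for part, (count, checksum) in stats.items()}


def reconcile(source: Digest, target: Digest) -> dict[str, str]:
    """Compare source and target partition by partition; return only the failures."""
    issues: dict[str, str] = {}
    for part in sorted(source.keys() | target.keys()):
        src, tgt = source.get(part), target.get(part)
        if src is None:
            issues[part] = "unexpected partition in target"
        elif tgt is None:
            issues[part] = "partition missing in target"
        elif src[0] != tgt[0]:
            issues[part] = f"row count {src[0]} != {tgt[0]}"
        elif src[1] != tgt[1]:
            issues[part] = "checksum mismatch"
    return issues


def by_day(rec: Record) -> str:
    return rec["day"]  # hypothetical partition key


cols = ["id", "day", "amount"]
source_rows = [
    {"id": 1, "day": "2014-10-01", "amount": 10},
    {"id": 2, "day": "2014-10-01", "amount": 20},
    {"id": 3, "day": "2014-10-02", "amount": 30},
]
target_rows = [
    {"id": 2, "day": "2014-10-01", "amount": 20},  # same rows, different order
    {"id": 1, "day": "2014-10-01", "amount": 10},
    {"id": 3, "day": "2014-10-02", "amount": 31},  # corrupted value
]
issues = reconcile(partition_digest(source_rows, by_day, cols),
                   partition_digest(target_rows, by_day, cols))
# issues == {"2014-10-02": "checksum mismatch"}
```

Because each partition is digested independently, the digests can be computed in parallel on the nodes that hold the data, and only the small summaries need to be compared.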
Strengthen Performance 
Performance testing is an integral part of system testing that focuses on volumes, workloads, real-time scenarios and end users’ navigational habits. The performance of a system depends on variable factors such as the network, underlying hardware, Web servers, database servers, hosting servers, peak loads and prolonged workloads. Addressing these factors while maintaining the performance of big data test systems requires the organization’s full attention. Recommendations for applying the big data testing framework to performance testing include:
• Simulate a real-time environment through distributed, parallel workloads. Testing should be carried out in parallel in a distributed environment. The scripts generated by performance testing tools should be distributed among the controllers to simulate a real-time environment.
• Integration with distributed test data: Performance testing strategies depend predominantly on the scenarios configured on the controllers. The spreadsheets and back-end databases that typically hold test data often cannot hold unstructured big data. To overcome this obstacle, the controller should be provided with an interface that can integrate with the existing distributed test data.
• Parallel test execution: Enabling distributed virtual users to execute tests in parallel is an effective way to handle test execution.
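The recommendations above can be illustrated with parallel virtual users fed from sharded test data. This is a single-machine simplification under stated assumptions: threads stand in for distributed virtual users on separate controllers, round-robin sharding stands in for an interface to distributed test data, and `fake_query` is a hypothetical placeholder for a call to the system under test.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable

Record = dict[str, Any]


def shard(records: list[Record], users: int) -> list[list[Record]]:
    """Split test data round-robin so each virtual user gets its own slice."""
    return [records[i::users] for i in range(users)]


def virtual_user(transaction: Callable[[Record], None], data: list[Record]) -> list[float]:
    """Run one virtual user over its slice, timing every transaction."""
    latencies = []
    for rec in data:
        start = time.perf_counter()
        transaction(rec)
        latencies.append(time.perf_counter() - start)
    return latencies


def run_load(transaction: Callable[[Record], None],
             records: list[Record], users: int) -> dict[str, float]:
    """Execute all virtual users in parallel and summarize response times."""
    with ThreadPoolExecutor(max_workers=users) as pool:
        per_user = pool.map(lambda part: virtual_user(transaction, part),
                            shard(records, users))
        latencies = [t for part in per_user for t in part]
    cuts = statistics.quantiles(latencies, n=100)  # 99 percentile cut points
    return {"requests": len(latencies), "p50": cuts[49],
            "p95": cuts[94], "max": max(latencies)}


def fake_query(rec: Record) -> None:
    time.sleep(0.001)  # stand-in for a real call to the system under test


report = run_load(fake_query, [{"id": i} for i in range(200)], users=8)
```

Giving each virtual user a disjoint slice of the test data avoids contention on a single data source; in a real distributed setup, each controller would read its own partition instead of receiving an in-memory list.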
Strengthen Test Data Quality 
Recommendations for addressing the pain points 
in test data management of big data include: 
• Planning and designing: Automated scripts cannot simply be scaled up to test big data. Scaling up test data sets without adequate planning and design will lead to delayed response times and possibly timed-out test execution. Performing action-based testing (ABT) will help mitigate this issue. In ABT, tests are treated as actions in a test module. Each action is expressed as a keyword along with the parameters required to execute the test.
• Infrastructure setup: Test automation consumes enormous resources to generate workloads. However, investing in dedicated servers is not cost-efficient for small-scale operations that process big data. Renting infrastructure as a service (IaaS) delivered via the cloud can help reduce costs. Alternatively, the higher workloads required for performance testing of big data can be handled effectively through virtual parallelism across numerous virtual machines.
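Action-based testing, as described in the planning and designing recommendation above, can be sketched in keyword-driven form. The keywords (`load dataset`, `filter above`, `check row count`) and the in-memory data generator below are hypothetical; the point is that a test module is a list of actions with parameters, so the data volume can be scaled by changing a parameter rather than rewriting the script.

```python
from typing import Any, Callable

# Registry mapping hypothetical action keywords to the functions that implement them.
ACTIONS: dict[str, Callable[..., None]] = {}


def action(keyword: str) -> Callable[[Callable[..., None]], Callable[..., None]]:
    """Decorator that registers a function under an action keyword."""
    def register(fn: Callable[..., None]) -> Callable[..., None]:
        ACTIONS[keyword] = fn
        return fn
    return register


@action("load dataset")
def load_dataset(ctx: dict[str, Any], name: str, rows: str) -> None:
    # Generates synthetic rows; the row count is a parameter, so volume scales freely.
    ctx[name] = [{"id": i, "amount": i * 10} for i in range(int(rows))]


@action("filter above")
def filter_above(ctx: dict[str, Any], name: str, field: str, threshold: str, into: str) -> None:
    ctx[into] = [r for r in ctx[name] if r[field] > float(threshold)]


@action("check row count")
def check_row_count(ctx: dict[str, Any], name: str, expected: str) -> None:
    actual = len(ctx[name])
    assert actual == int(expected), f"{name}: expected {expected}, got {actual}"


def run_module(steps: list[list[str]]) -> dict[str, Any]:
    """Execute a test module: each step is [keyword, arg1, arg2, ...]."""
    ctx: dict[str, Any] = {}
    for keyword, *args in steps:
        ACTIONS[keyword](ctx, *args)
    return ctx


module = [
    ["load dataset", "orders", "1000"],
    ["filter above", "orders", "amount", "5000", "big_orders"],
    ["check row count", "big_orders", "499"],
]
ctx = run_module(module)
```

Because steps are plain data, a module like this could be kept in a spreadsheet or table and maintained by testers who do not edit the underlying action code.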
Big Data Testing Is No Longer a Distant Chimera
A big data testing strategy is pivotal to the success of big data initiatives. As a logical extension, testing and QA teams will not be exempt from handling big data. Yet big data testing remains in a nascent stage and lacks a defined manual testing framework from which to transition to automated testing. Moreover, the QA processes, customized frameworks and tools used in various specialized testing services will require a significant upgrade to handle big data effectively and efficiently.
The risks and challenges illustrated in this paper are just the tip of the iceberg. Big data is getting bigger by the day, bringing ever greater scalability challenges and increased use of cloud resources to ramp up testing of big data ecosystems. Unprecedented risks and issues may still emerge as the testing community starts working with big data.
Leveraging the expertise of big data testing consultants can ease the pain and accelerate the learning curve in three important ways:
• Designing an end-to-end QA strategy to 
contend with the 4Vs. 
• Gaining advice on the use of tried-and-true and 
appropriate tools. 
• Mitigating anticipated (and unanticipated) 
risks and related issues. 
What was once called garbage data is now known 
as big data. Nothing is wasted, deleted or removed. 
All data sets, structured, semi-structured and 
unstructured, are of paramount importance for 
businesses interested in making informed and 
timely decisions on strategies that drive business 
success today and tomorrow. 
Footnote 
1 “Global Big Data Industry at $25 Billion by 2015,” NASSCOM, September 5, 2012, http://www.crisil.com/Ratings/Brochureware/News/CRISIL-GRA_NASSCOM-big-data_050912.pdf.
About the Author 
Sushmitha Geddam is a Project Manager at Cognizant’s Data Testing Center of Excellence (CoE) 
R&D team and leads the big data initiatives. She has over 10 years of experience handling projects 
in specialized testing areas such as database, DW/ETL and data migration across numerous industry 
sectors, including investment banking, healthcare, insurance and telecom. Sushmitha can be reached 
at Sushmitha.Geddam@cognizant.com. 
About Cognizant QE&A Data Testing Center of Excellence 
Cognizant QE&A Data Testing CoE breeds expertise around industry-leading technologies. Our experts 
help you to accelerate and optimize the QA of your data testing needs by using state-of-the-art products, 
solutions and frameworks. For more details, reach us at QEAinBigData@cognizant.com. 
About Cognizant 
Cognizant (NASDAQ: CTSH) is a leading provider of information technology, consulting, and business process outsourcing services, dedicated to helping the world’s leading companies build stronger businesses. Headquartered in Teaneck, New Jersey (U.S.), Cognizant combines a passion for client satisfaction, technology innovation, deep industry and business process expertise, and a global, collaborative workforce that embodies the future of work. With over 75 development and delivery centers worldwide and approximately 187,400 employees as of June 30, 2014, Cognizant is a member of the NASDAQ-100, the S&P 500, the Forbes Global 2000, and the Fortune 500 and is ranked among the top performing and fastest growing companies in the world. Visit us online at www.cognizant.com or follow us on Twitter: Cognizant.
World Headquarters 
500 Frank W. Burr Blvd. 
Teaneck, NJ 07666 USA 
Phone: +1 201 801 0233 
Fax: +1 201 801 0243 
Toll Free: +1 888 937 3277 
Email: inquiry@cognizant.com 
European Headquarters 
1 Kingdom Street 
Paddington Central 
London W2 6BD 
Phone: +44 (0) 20 7297 7600 
Fax: +44 (0) 20 7121 0102 
Email: infouk@cognizant.com 
India Operations Headquarters 
#5/535, Old Mahabalipuram Road 
Okkiyam Pettai, Thoraipakkam 
Chennai, 600 096 India 
Phone: +91 (0) 44 4209 6000 
Fax: +91 (0) 44 4209 6060 
Email: inquiryindia@cognizant.com 
© Copyright 2014, Cognizant. All rights reserved. No part of this document may be reproduced, stored in a retrieval system, transmitted in any form or by any 
means, electronic, mechanical, photocopying, recording, or otherwise, without the express written permission from Cognizant. The information contained herein is 
subject to change without notice. All other trademarks mentioned herein are the property of their respective owners.

Mind Map Test Data Management Overview
 
December 2015 - TDWI Checklist Report - Seven Best Practices for Adapting DWA
December 2015 - TDWI Checklist Report - Seven Best Practices for Adapting DWADecember 2015 - TDWI Checklist Report - Seven Best Practices for Adapting DWA
December 2015 - TDWI Checklist Report - Seven Best Practices for Adapting DWA
 
MetaSuite and_hp_quality_center_enterprise
MetaSuite and_hp_quality_center_enterpriseMetaSuite and_hp_quality_center_enterprise
MetaSuite and_hp_quality_center_enterprise
 
ATAGTR2017 Performance Testing and Non-Functional Testing Strategy for Big Da...
ATAGTR2017 Performance Testing and Non-Functional Testing Strategy for Big Da...ATAGTR2017 Performance Testing and Non-Functional Testing Strategy for Big Da...
ATAGTR2017 Performance Testing and Non-Functional Testing Strategy for Big Da...
 
Understanding big data testing
Understanding big data testingUnderstanding big data testing
Understanding big data testing
 
Accelerating Time to Success for Your Big Data Initiatives
Accelerating Time to Success for Your Big Data InitiativesAccelerating Time to Success for Your Big Data Initiatives
Accelerating Time to Success for Your Big Data Initiatives
 
Query Wizards - data testing made easy - no programming
Query Wizards - data testing made easy - no programmingQuery Wizards - data testing made easy - no programming
Query Wizards - data testing made easy - no programming
 
Lecture 23
Lecture 23Lecture 23
Lecture 23
 
TMF 2014 Event Proceedings
TMF 2014 Event ProceedingsTMF 2014 Event Proceedings
TMF 2014 Event Proceedings
 
4 Test Data Management Techniques That Empower Software Testing
4 Test Data Management Techniques That Empower Software Testing4 Test Data Management Techniques That Empower Software Testing
4 Test Data Management Techniques That Empower Software Testing
 
Maven and google pharma r&d (1)
Maven and google pharma r&d  (1)Maven and google pharma r&d  (1)
Maven and google pharma r&d (1)
 
Infographic Things You Should Know About Big Data Testing
Infographic Things You Should Know About Big Data TestingInfographic Things You Should Know About Big Data Testing
Infographic Things You Should Know About Big Data Testing
 
Enterprise Test Data Generation.pptx
Enterprise Test Data Generation.pptxEnterprise Test Data Generation.pptx
Enterprise Test Data Generation.pptx
 
Making Data Quality a Way of Life
Making Data Quality a Way of LifeMaking Data Quality a Way of Life
Making Data Quality a Way of Life
 
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
 
Testing Data & Data-Centric Applications - Whitepaper
Testing Data & Data-Centric Applications - WhitepaperTesting Data & Data-Centric Applications - Whitepaper
Testing Data & Data-Centric Applications - Whitepaper
 
OberservePoint - The Digital Data Quality Playbook
OberservePoint - The Digital Data Quality  PlaybookOberservePoint - The Digital Data Quality  Playbook
OberservePoint - The Digital Data Quality Playbook
 

More from Cognizant

Using Adaptive Scrum to Tame Process Reverse Engineering in Data Analytics Pr...
Using Adaptive Scrum to Tame Process Reverse Engineering in Data Analytics Pr...Using Adaptive Scrum to Tame Process Reverse Engineering in Data Analytics Pr...
Using Adaptive Scrum to Tame Process Reverse Engineering in Data Analytics Pr...
Cognizant
 
Data Modernization: Breaking the AI Vicious Cycle for Superior Decision-making
Data Modernization: Breaking the AI Vicious Cycle for Superior Decision-makingData Modernization: Breaking the AI Vicious Cycle for Superior Decision-making
Data Modernization: Breaking the AI Vicious Cycle for Superior Decision-making
Cognizant
 
It Takes an Ecosystem: How Technology Companies Deliver Exceptional Experiences
It Takes an Ecosystem: How Technology Companies Deliver Exceptional ExperiencesIt Takes an Ecosystem: How Technology Companies Deliver Exceptional Experiences
It Takes an Ecosystem: How Technology Companies Deliver Exceptional Experiences
Cognizant
 
Intuition Engineered
Intuition EngineeredIntuition Engineered
Intuition Engineered
Cognizant
 
The Work Ahead: Transportation and Logistics Delivering on the Digital-Physic...
The Work Ahead: Transportation and Logistics Delivering on the Digital-Physic...The Work Ahead: Transportation and Logistics Delivering on the Digital-Physic...
The Work Ahead: Transportation and Logistics Delivering on the Digital-Physic...
Cognizant
 
Enhancing Desirability: Five Considerations for Winning Digital Initiatives
Enhancing Desirability: Five Considerations for Winning Digital InitiativesEnhancing Desirability: Five Considerations for Winning Digital Initiatives
Enhancing Desirability: Five Considerations for Winning Digital Initiatives
Cognizant
 
The Work Ahead in Manufacturing: Fulfilling the Agility Mandate
The Work Ahead in Manufacturing: Fulfilling the Agility MandateThe Work Ahead in Manufacturing: Fulfilling the Agility Mandate
The Work Ahead in Manufacturing: Fulfilling the Agility Mandate
Cognizant
 
The Work Ahead in Higher Education: Repaving the Road for the Employees of To...
The Work Ahead in Higher Education: Repaving the Road for the Employees of To...The Work Ahead in Higher Education: Repaving the Road for the Employees of To...
The Work Ahead in Higher Education: Repaving the Road for the Employees of To...
Cognizant
 
Engineering the Next-Gen Digital Claims Organisation for Australian General I...
Engineering the Next-Gen Digital Claims Organisation for Australian General I...Engineering the Next-Gen Digital Claims Organisation for Australian General I...
Engineering the Next-Gen Digital Claims Organisation for Australian General I...
Cognizant
 
Profitability in the Direct-to-Consumer Marketplace: A Playbook for Media and...
Profitability in the Direct-to-Consumer Marketplace: A Playbook for Media and...Profitability in the Direct-to-Consumer Marketplace: A Playbook for Media and...
Profitability in the Direct-to-Consumer Marketplace: A Playbook for Media and...
Cognizant
 
Green Rush: The Economic Imperative for Sustainability
Green Rush: The Economic Imperative for SustainabilityGreen Rush: The Economic Imperative for Sustainability
Green Rush: The Economic Imperative for Sustainability
Cognizant
 
Policy Administration Modernization: Four Paths for Insurers
Policy Administration Modernization: Four Paths for InsurersPolicy Administration Modernization: Four Paths for Insurers
Policy Administration Modernization: Four Paths for Insurers
Cognizant
 
The Work Ahead in Utilities: Powering a Sustainable Future with Digital
The Work Ahead in Utilities: Powering a Sustainable Future with DigitalThe Work Ahead in Utilities: Powering a Sustainable Future with Digital
The Work Ahead in Utilities: Powering a Sustainable Future with Digital
Cognizant
 
AI in Media & Entertainment: Starting the Journey to Value
AI in Media & Entertainment: Starting the Journey to ValueAI in Media & Entertainment: Starting the Journey to Value
AI in Media & Entertainment: Starting the Journey to Value
Cognizant
 
Operations Workforce Management: A Data-Informed, Digital-First Approach
Operations Workforce Management: A Data-Informed, Digital-First ApproachOperations Workforce Management: A Data-Informed, Digital-First Approach
Operations Workforce Management: A Data-Informed, Digital-First Approach
Cognizant
 
Five Priorities for Quality Engineering When Taking Banking to the Cloud
Five Priorities for Quality Engineering When Taking Banking to the CloudFive Priorities for Quality Engineering When Taking Banking to the Cloud
Five Priorities for Quality Engineering When Taking Banking to the Cloud
Cognizant
 
Getting Ahead With AI: How APAC Companies Replicate Success by Remaining Focused
Getting Ahead With AI: How APAC Companies Replicate Success by Remaining FocusedGetting Ahead With AI: How APAC Companies Replicate Success by Remaining Focused
Getting Ahead With AI: How APAC Companies Replicate Success by Remaining Focused
Cognizant
 
Crafting the Utility of the Future
Crafting the Utility of the FutureCrafting the Utility of the Future
Crafting the Utility of the Future
Cognizant
 
Utilities Can Ramp Up CX with a Customer Data Platform
Utilities Can Ramp Up CX with a Customer Data PlatformUtilities Can Ramp Up CX with a Customer Data Platform
Utilities Can Ramp Up CX with a Customer Data Platform
Cognizant
 
The Work Ahead in Intelligent Automation: Coping with Complexity in a Post-Pa...
The Work Ahead in Intelligent Automation: Coping with Complexity in a Post-Pa...The Work Ahead in Intelligent Automation: Coping with Complexity in a Post-Pa...
The Work Ahead in Intelligent Automation: Coping with Complexity in a Post-Pa...
Cognizant
 

More from Cognizant (20)

Using Adaptive Scrum to Tame Process Reverse Engineering in Data Analytics Pr...
Using Adaptive Scrum to Tame Process Reverse Engineering in Data Analytics Pr...Using Adaptive Scrum to Tame Process Reverse Engineering in Data Analytics Pr...
Using Adaptive Scrum to Tame Process Reverse Engineering in Data Analytics Pr...
 
Data Modernization: Breaking the AI Vicious Cycle for Superior Decision-making
Data Modernization: Breaking the AI Vicious Cycle for Superior Decision-makingData Modernization: Breaking the AI Vicious Cycle for Superior Decision-making
Data Modernization: Breaking the AI Vicious Cycle for Superior Decision-making
 
It Takes an Ecosystem: How Technology Companies Deliver Exceptional Experiences
It Takes an Ecosystem: How Technology Companies Deliver Exceptional ExperiencesIt Takes an Ecosystem: How Technology Companies Deliver Exceptional Experiences
It Takes an Ecosystem: How Technology Companies Deliver Exceptional Experiences
 
Intuition Engineered
Intuition EngineeredIntuition Engineered
Intuition Engineered
 
The Work Ahead: Transportation and Logistics Delivering on the Digital-Physic...
The Work Ahead: Transportation and Logistics Delivering on the Digital-Physic...The Work Ahead: Transportation and Logistics Delivering on the Digital-Physic...
The Work Ahead: Transportation and Logistics Delivering on the Digital-Physic...
 
Enhancing Desirability: Five Considerations for Winning Digital Initiatives
Enhancing Desirability: Five Considerations for Winning Digital InitiativesEnhancing Desirability: Five Considerations for Winning Digital Initiatives
Enhancing Desirability: Five Considerations for Winning Digital Initiatives
 
The Work Ahead in Manufacturing: Fulfilling the Agility Mandate
The Work Ahead in Manufacturing: Fulfilling the Agility MandateThe Work Ahead in Manufacturing: Fulfilling the Agility Mandate
The Work Ahead in Manufacturing: Fulfilling the Agility Mandate
 
The Work Ahead in Higher Education: Repaving the Road for the Employees of To...
The Work Ahead in Higher Education: Repaving the Road for the Employees of To...The Work Ahead in Higher Education: Repaving the Road for the Employees of To...
The Work Ahead in Higher Education: Repaving the Road for the Employees of To...
 
Engineering the Next-Gen Digital Claims Organisation for Australian General I...
Engineering the Next-Gen Digital Claims Organisation for Australian General I...Engineering the Next-Gen Digital Claims Organisation for Australian General I...
Engineering the Next-Gen Digital Claims Organisation for Australian General I...
 
Profitability in the Direct-to-Consumer Marketplace: A Playbook for Media and...
Profitability in the Direct-to-Consumer Marketplace: A Playbook for Media and...Profitability in the Direct-to-Consumer Marketplace: A Playbook for Media and...
Profitability in the Direct-to-Consumer Marketplace: A Playbook for Media and...
 
Green Rush: The Economic Imperative for Sustainability
Green Rush: The Economic Imperative for SustainabilityGreen Rush: The Economic Imperative for Sustainability
Green Rush: The Economic Imperative for Sustainability
 
Policy Administration Modernization: Four Paths for Insurers
Policy Administration Modernization: Four Paths for InsurersPolicy Administration Modernization: Four Paths for Insurers
Policy Administration Modernization: Four Paths for Insurers
 
The Work Ahead in Utilities: Powering a Sustainable Future with Digital
The Work Ahead in Utilities: Powering a Sustainable Future with DigitalThe Work Ahead in Utilities: Powering a Sustainable Future with Digital
The Work Ahead in Utilities: Powering a Sustainable Future with Digital
 
AI in Media & Entertainment: Starting the Journey to Value
AI in Media & Entertainment: Starting the Journey to ValueAI in Media & Entertainment: Starting the Journey to Value
AI in Media & Entertainment: Starting the Journey to Value
 
Operations Workforce Management: A Data-Informed, Digital-First Approach
Operations Workforce Management: A Data-Informed, Digital-First ApproachOperations Workforce Management: A Data-Informed, Digital-First Approach
Operations Workforce Management: A Data-Informed, Digital-First Approach
 
Five Priorities for Quality Engineering When Taking Banking to the Cloud
Five Priorities for Quality Engineering When Taking Banking to the CloudFive Priorities for Quality Engineering When Taking Banking to the Cloud
Five Priorities for Quality Engineering When Taking Banking to the Cloud
 
Getting Ahead With AI: How APAC Companies Replicate Success by Remaining Focused
Getting Ahead With AI: How APAC Companies Replicate Success by Remaining FocusedGetting Ahead With AI: How APAC Companies Replicate Success by Remaining Focused
Getting Ahead With AI: How APAC Companies Replicate Success by Remaining Focused
 
Crafting the Utility of the Future
Crafting the Utility of the FutureCrafting the Utility of the Future
Crafting the Utility of the Future
 
Utilities Can Ramp Up CX with a Customer Data Platform
Utilities Can Ramp Up CX with a Customer Data PlatformUtilities Can Ramp Up CX with a Customer Data Platform
Utilities Can Ramp Up CX with a Customer Data Platform
 
The Work Ahead in Intelligent Automation: Coping with Complexity in a Post-Pa...
The Work Ahead in Intelligent Automation: Coping with Complexity in a Post-Pa...The Work Ahead in Intelligent Automation: Coping with Complexity in a Post-Pa...
The Work Ahead in Intelligent Automation: Coping with Complexity in a Post-Pa...
 

Building a Robust Big Data QA Ecosystem to Mitigate Data Integrity Challenges

With big data evolving rapidly, organizations must seek solutions that ensure robust quality assurance processes around big data implementations.

Executive Summary

Harvesting relevant information from big data is an imperative for enterprises seeking to optimize strategic business decision-making. Opportunities that were traditionally unavailable are now a reality, with new and more revealing insights extracted from sources such as social media and the devices that constitute the Internet of Things. Emerging technologies are enabling organizations to gain valuable business insights from data that is growing exponentially in volume, velocity, variety of formats and complexity. Leading industry analysts forecast the big data market to reach U.S.$25 billion by 2015.¹ As a result, organizations will require new data integration platforms, which will fuel demand for QA processes that serve those platforms and make big data testing a necessity.

For a big data testing strategy to be effective, the “4Vs” of big data must be continuously monitored and validated: volume (scale of data), variety (different forms of data), velocity (analysis of streaming data in microseconds) and veracity (certainty of data). Beyond the sheer volume, the heterogeneous and unstructured nature of big data increases the complexity of validation, rendering traditional sampling-based QA strategies infeasible. Setting up a QA infrastructure to manage these volumes is itself a challenge. The absence of robust test data management strategies and the lack of performance testing tools within many IT organizations make big data testing one of the most perplexing technical propositions that businesses encounter.
Meeting the big data testing challenge requires utilities and automation solutions that improve test coverage, particularly where traditional sampling-based QA strategies are inadequate. This white paper outlines our proposed big data testing framework, focusing on the key processes in data warehouse testing, performance testing and test data management.

Cognizant 20-20 Insights | October 2014
Challenges in Big Data

Since the mid-1990s, organizations have become accustomed to handling structured data contained in relational databases and spreadsheets. With the advent of big data, however, information can reside in semi-structured or unstructured formats, which are cumbersome to interpret and manage because the data no longer fits neatly into database rows and columns. With the phenomenal explosion in the IT intensity of most businesses, data volumes and velocities have accelerated, creating a need for real-time big data testing. This has heightened concerns over how to assure quality across the big data ecosystem.

At present, testers process clean, structured data; however, they also need to handle semi-structured and unstructured data. Key issues that require relatively more attention in big data testing include:

• Data security.
• Performance issues and the system workload caused by heightened data volumes.
• Scalability of the data storage media.

Data warehouse testing, performance testing and test data management are the fundamental components of big data testing. Addressing the challenges in these three areas is tantamount to verifying the entire big data testing continuum (see Figure 1).

Streamlining Processes to Overcome Challenges in Big Data Testing

Given the ever-evolving technology landscape, today’s necessity can become obsolete tomorrow. It is therefore important to establish streamlined processes that will stay the course despite changing technologies and evolving platforms. Software testing follows the same evolutionary cycle.

Figure 1: At a Glance: Challenges in Big Data Testing

Data Warehouse Testing
• Decreased test coverage due to the complex organization of big data requirements.
• Test data supports only limited normalization.
• The 4Vs (variety, velocity, volume and veracity) are not monitored.

Performance Testing
• Generating a greater workload for performance testing of big data.
• Test results in the form of reports, charts and graphs are at least twice as large as traditional BI reports.
• Interpreting results and identifying bottlenecks.
• Performance tuning.

Test Data Management
• Managing test data during the automated testing process.
• Anticipating the acquisition and management of test data during different phases of the software testing lifecycle.
• Setting up test data in relation to test coverage, accuracy and the types of big test data.
• Investing in servers for performance testing may not be cost-effective for small-scale companies.
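Figure 1 notes that the 4Vs often go unmonitored in data warehouse testing. The following is a minimal sketch of how a test harness might compute one simple indicator per V for each incoming batch. It assumes records arrive as Python dicts. It also uses hypothetical proxies chosen purely for illustration and not defined in this paper: record count for volume, number of distinct field sets for variety, records per second for velocity, and the share of records with all required fields present for veracity. The names `batch_4v_metrics` and `threshold_violations`, the field names and the thresholds are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Iterable, List, Mapping


@dataclass(frozen=True)
class FourVMetrics:
    volume: int      # records in the batch
    variety: int     # distinct record shapes (sets of field names)
    velocity: float  # records per second over the batch window
    veracity: float  # share of records whose required fields are present and non-null


def batch_4v_metrics(records: List[Mapping[str, Any]], window_seconds: float,
                     required_fields: Iterable[str]) -> FourVMetrics:
    """Compute illustrative per-batch proxies for the 4Vs (hypothetical definitions)."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    required = tuple(required_fields)
    volume = len(records)
    variety = len({frozenset(r.keys()) for r in records})
    velocity = volume / window_seconds
    valid = sum(1 for r in records if all(r.get(f) is not None for f in required))
    veracity = valid / volume if volume else 1.0
    return FourVMetrics(volume, variety, velocity, veracity)


def threshold_violations(m: FourVMetrics, min_veracity: float, max_variety: int) -> List[str]:
    """Report which (hypothetical) quality thresholds a batch breaks."""
    problems = []
    if m.veracity < min_veracity:
        problems.append(f"veracity {m.veracity:.2f} below {min_veracity:.2f}")
    if m.variety > max_variety:
        problems.append(f"variety {m.variety} above {max_variety}")
    return problems


# Hypothetical sample: one record lacks a timestamp, one has a different shape.
sample_batch = [
    {"id": 1, "ts": "2014-10-01T00:00:00", "text": "ok"},
    {"id": 2, "ts": None, "text": "missing timestamp"},
    {"id": 3, "ts": "2014-10-01T00:00:02", "payload": {"temp": 21}},
]
metrics = batch_4v_metrics(sample_batch, window_seconds=2.0, required_fields=["id", "ts"])
issues = threshold_violations(metrics, min_veracity=0.95, max_variety=1)
```

Here `issues` flags both the low completeness and the unexpected second record shape. In practice, these metrics would be tracked per batch over time, so that drift in any one V becomes visible.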
To address the dynamic changes in the big data ecosystem, organizations must streamline their processes for data warehouse testing, performance testing and test data management.

Strengthen Data Warehousing Processes

While data warehouse testing is performed in a controlled environment, the unpredictable nature of the big data testing environment presents a unique set of challenges. Data warehouse and business intelligence testing require highly complex testing strategies, processes and tools that specifically address the 4Vs of big data. Recommendations to refine test strategies and processes include:

• Make “big” things simple through a “divide-and-test” strategy. Organize your big data warehouse into smaller units that are easily testable, thus improving test coverage and optimizing the big data test set.
• Normalize design and tests. Generate normalized test data for big data testing more effectively by normalizing the dynamic schemas at the design level.
• Enhance testing by measuring the 4Vs. Data warehouse test environments that are specifically designed to handle the 4Vs of big data will result in improved test coverage.

Strengthen Performance

Performance testing is an integral part of system testing that focuses on volumes, workloads, real-time scenarios and end users’ navigational habits. The performance of a system depends on variable factors such as the network, underlying hardware, Web servers, database servers, hosting servers, the number of peak loads and prolonged workloads. Addressing these requirements, and maintaining the performance of big data test systems, requires the organization’s full attention. Recommendations for implementing the big data framework in performance testing include:

• Simulate a real-time environment with distributed and parallel workload distribution. Testing should be carried out in parallel in a distributed environment. The scripts generated by performance testing tools should be distributed among the controllers to simulate a real-time environment.
• Integration with distributed test data: Performance testing strategies depend predominantly on the scenario set of the controllers. The spreadsheets and back-end databases that typically hold test data often cannot accommodate unstructured big data. To overcome this obstacle, the controller should be provided with an interface that integrates with the existing distributed test data.
• Parallel test execution: Enabling distributed virtual users to execute tests in parallel is an effective way to handle test execution.

Strengthen Test Data Quality

Recommendations for addressing the pain points in test data management of big data include:

• Planning and designing: Automated scripts cannot simply be scaled up to test big data. Scaling up test data sets without adequate planning and design will lead to delayed response times and possibly timed-out test execution. Performing action-based testing (ABT) will help mitigate this issue. In ABT, tests are treated as actions in a test module. These actions point to keywords, along with the parameters required for executing the tests.
• Infrastructure setup: Test automation consumes enormous resources to generate workloads. However, investing in dedicated servers is not cost-efficient for small-scale operations that process big data. Renting infrastructure as a service (IaaS) delivered via the cloud can help mitigate costs. Alternatively, the higher workloads needed for performance testing of big data can be generated effectively through virtual parallelism across numerous virtual machines.

Big Data Testing Is No Longer a Distant Chimera

A big data testing strategy is pivotal to the success of big data initiatives. As a logical extension, testing and QA teams will not be exempt from handling big data. Yet big data testing remains in a nascent stage and lacks a defined manual testing framework from which to transition to automated testing. Moreover, the QA processes, customized frameworks and tools used in various specialized testing services will require a significant upgrade to handle big data effectively and efficiently.
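The “divide-and-test” recommendation under Strengthen Data Warehousing Processes can be illustrated with a small source-to-target reconciliation sketch. Assume that rows on both sides are tuples and can be grouped by a partition column. Each partition is then reduced to a row count and an order-independent checksum, so a mismatch points to one small, testable unit instead of forcing a re-validation of the whole warehouse. The function names and the tuple row format are hypothetical. The checksum is a sum of truncated SHA-256 digests from Python’s standard `hashlib`. This is one simple choice among many, and it assumes both sides use identical value types (for example, `10.0` and `Decimal("10.0")` would hash differently).

```python
from __future__ import annotations

import hashlib
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Tuple

Row = Tuple[Any, ...]
_MASK = (1 << 64) - 1


def row_digest(row: Row) -> int:
    """64-bit digest of a row's repr(), used as a simple canonical form."""
    return int.from_bytes(hashlib.sha256(repr(row).encode("utf-8")).digest()[:8], "big")


def partition_summaries(rows: Iterable[Row],
                        partition_key: Callable[[Row], Any]) -> Dict[Any, Tuple[int, int]]:
    """Reduce each partition to (row_count, order-independent checksum)."""
    counts: Dict[Any, int] = defaultdict(int)
    checksums: Dict[Any, int] = defaultdict(int)
    for row in rows:
        key = partition_key(row)
        counts[key] += 1
        # Adding digests modulo 2**64 makes the checksum independent of row order.
        checksums[key] = (checksums[key] + row_digest(row)) & _MASK
    return {key: (counts[key], checksums[key]) for key in counts}


def reconcile_partitions(source: Iterable[Row], target: Iterable[Row],
                         partition_key: Callable[[Row], Any]) -> List[Any]:
    """Return partitions whose count or checksum differs (including missing ones)."""
    src = partition_summaries(source, partition_key)
    tgt = partition_summaries(target, partition_key)
    return sorted((k for k in src.keys() | tgt.keys() if src.get(k) != tgt.get(k)), key=str)


# Hypothetical data: the target has one corrupted value in the 2014-09 partition
# and the same rows in a different order.
source_rows = [("2014-09", 1, 10.0), ("2014-09", 2, 20.0), ("2014-10", 3, 30.0)]
target_rows = [("2014-10", 3, 30.0), ("2014-09", 2, 20.0), ("2014-09", 1, 99.0)]
mismatched = reconcile_partitions(source_rows, target_rows, partition_key=lambda r: r[0])
```

Only the `2014-09` partition is reported, so a tester can drill into that unit alone. An additive checksum can in principle collide, so a flagged-clean partition is strong evidence of a match, not proof.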
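The performance recommendations under Strengthen Performance call for distributed virtual users that execute tests in parallel. A minimal single-machine sketch of that idea uses Python’s `concurrent.futures.ThreadPoolExecutor`. Each thread plays one virtual user running the same script, and per-transaction latencies are pooled for analysis. `fake_query` is a hypothetical stand-in for a call to the system under test. A real setup would spread these scripts across multiple load-generator controllers or machines, as the paper describes, rather than threads in one process.

```python
from __future__ import annotations

import time
from concurrent.futures import ThreadPoolExecutor
from statistics import quantiles
from typing import Callable, List

Transaction = Callable[[int], None]


def virtual_user(user_id: int, iterations: int, transaction: Transaction) -> List[float]:
    """Run one virtual user's script; return per-transaction latencies in seconds."""
    latencies = []
    for i in range(iterations):
        start = time.perf_counter()
        transaction(user_id * iterations + i)  # unique request id per call
        latencies.append(time.perf_counter() - start)
    return latencies


def run_parallel_load(users: int, iterations: int, transaction: Transaction) -> List[float]:
    """Start all virtual users concurrently and pool their latencies."""
    with ThreadPoolExecutor(max_workers=users) as pool:
        futures = [pool.submit(virtual_user, u, iterations, transaction) for u in range(users)]
        return [lat for f in futures for lat in f.result()]


def fake_query(request_id: int) -> None:
    # Hypothetical stand-in for a request to the system under test.
    time.sleep(0.001)


latencies = run_parallel_load(users=8, iterations=5, transaction=fake_query)
p95 = quantiles(latencies, n=20)[-1]  # 95th-percentile cut point
```

Threads suit I/O-bound transactions such as queries against a remote cluster. CPU-heavy load generation would more likely use separate processes or machines.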
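The action-based testing (ABT) recommendation under Strengthen Test Data Quality describes tests as actions that point to keywords plus the parameters needed to execute them. A minimal keyword-driven sketch of that idea follows. The action names (`load_rows`, `check_row_count`, `check_no_nulls`) and the list-of-tuples test module format are hypothetical illustrations, not the syntax of any particular ABT tool.

```python
from __future__ import annotations

from typing import Any, Callable, Dict, List, Tuple

Context = Dict[str, Any]
Action = Tuple[str, Dict[str, Any]]  # (keyword, parameters)

KEYWORDS: Dict[str, Callable[..., None]] = {}


def keyword(name: str):
    """Register a function as the implementation of an action keyword."""
    def register(fn: Callable[..., None]) -> Callable[..., None]:
        KEYWORDS[name] = fn
        return fn
    return register


@keyword("load_rows")
def load_rows(ctx: Context, rows: List[Dict[str, Any]]) -> None:
    ctx["rows"] = rows


@keyword("check_row_count")
def check_row_count(ctx: Context, expected: int) -> None:
    actual = len(ctx["rows"])
    if actual != expected:
        raise AssertionError(f"expected {expected} rows, got {actual}")


@keyword("check_no_nulls")
def check_no_nulls(ctx: Context, column: str) -> None:
    bad = [i for i, r in enumerate(ctx["rows"]) if r.get(column) is None]
    if bad:
        raise AssertionError(f"null {column!r} in rows {bad}")


def run_module(actions: List[Action]) -> List[Tuple[str, str]]:
    """Execute actions in order; return (keyword, 'pass' or failure message) pairs."""
    ctx: Context = {}
    results = []
    for name, params in actions:
        try:
            KEYWORDS[name](ctx, **params)
            results.append((name, "pass"))
        except AssertionError as exc:
            results.append((name, f"fail: {exc}"))
    return results


# A hypothetical test module: data, not code, so it can be re-parameterized.
test_module: List[Action] = [
    ("load_rows", {"rows": [{"id": 1, "region": "EU"}, {"id": 2, "region": None}]}),
    ("check_row_count", {"expected": 2}),
    ("check_no_nulls", {"column": "region"}),
]
results = run_module(test_module)
```

Because each test is a list of actions, the same module can be pointed at larger or differently partitioned data sets by changing parameters instead of rewriting scripts. This is the planning benefit the recommendation attributes to ABT.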
The risks and challenges illustrated in this paper are just the tip of the iceberg. Big data is getting bigger by the day, and every passing moment brings bigger challenges of scalability and increased use of cloud resources to ramp up testing of big data ecosystems. Unprecedented risks and issues may still emerge as the testing community starts working with big data. Leveraging the expertise of big data testing consultants can ease the pain and accelerate the learning curve in three important ways:

• Designing an end-to-end QA strategy to contend with the 4Vs.
• Gaining advice on the use of tried-and-true, appropriate tools.
• Mitigating anticipated (and unanticipated) risks and related issues.

What was once called garbage data is now known as big data. Nothing is wasted, deleted or removed. All data sets, whether structured, semi-structured or unstructured, are of paramount importance to businesses interested in making informed and timely decisions on strategies that drive business success today and tomorrow.

Footnote

1 “Global Big Data Industry at $25 Billion by 2015,” NASSCOM, September 5, 2012, http://www.crisil.com/Ratings/Brochureware/News/CRISIL-GRA_NASSCOM-big-data_050912.pdf.

About the Author

Sushmitha Geddam is a Project Manager on Cognizant’s Data Testing Center of Excellence (CoE) R&D team and leads its big data initiatives. She has over 10 years of experience handling projects in specialized testing areas such as database, DW/ETL and data migration across numerous industry sectors, including investment banking, healthcare, insurance and telecom. Sushmitha can be reached at Sushmitha.Geddam@cognizant.com.

About Cognizant QE&A Data Testing Center of Excellence

Cognizant QE&A Data Testing CoE breeds expertise around industry-leading technologies. Our experts help you accelerate and optimize the QA of your data testing needs by using state-of-the-art products, solutions and frameworks.
For more details, reach us at QEAinBigData@cognizant.com.

About Cognizant

Cognizant (NASDAQ: CTSH) is a leading provider of information technology, consulting, and business process outsourcing services, dedicated to helping the world’s leading companies build stronger businesses. Headquartered in Teaneck, New Jersey (U.S.), Cognizant combines a passion for client satisfaction, technology innovation, deep industry and business process expertise, and a global, collaborative workforce that embodies the future of work. With over 75 development and delivery centers worldwide and approximately 187,400 employees as of June 30, 2014, Cognizant is a member of the NASDAQ-100, the S&P 500, the Forbes Global 2000, and the Fortune 500 and is ranked among the top performing and fastest growing companies in the world. Visit us online at www.cognizant.com or follow us on Twitter: Cognizant.

World Headquarters
500 Frank W. Burr Blvd.
Teaneck, NJ 07666 USA
Phone: +1 201 801 0233
Fax: +1 201 801 0243
Toll Free: +1 888 937 3277
Email: inquiry@cognizant.com

European Headquarters
1 Kingdom Street
Paddington Central
London W2 6BD
Phone: +44 (0) 20 7297 7600
Fax: +44 (0) 20 7121 0102
Email: infouk@cognizant.com

India Operations Headquarters
#5/535, Old Mahabalipuram Road
Okkiyam Pettai, Thoraipakkam
Chennai, 600 096 India
Phone: +91 (0) 44 4209 6000
Fax: +91 (0) 44 4209 6060
Email: inquiryindia@cognizant.com

© Copyright 2014, Cognizant. All rights reserved. No part of this document may be reproduced, stored in a retrieval system, transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the express written permission from Cognizant. The information contained herein is subject to change without notice. All other trademarks mentioned herein are the property of their respective owners.