T8
Concurrent Class
10/3/2013 11:15:00 AM

"Become a Big Data Quality
Hero"
Presented by:
Jason Rauen
LexisNexis

Brought to you by:

340 Corporate Way, Suite 300, Orange Park, FL 32073
888-268-8770 ∙ 904-278-0524 ∙ sqeinfo@sqe.com ∙ www.sqe.com
Jason Rauen
LexisNexis
Jason Rauen is a senior quality test analyst at Georgia-based LexisNexis Risk Solutions. With
more than fifteen years of experience, Jason has led the big data testing team since its
inception. He has presented big data scripting techniques at the HPCC Systems national Data
Summit. His background includes work at Microsoft, AT&T, and LexisNexis, and
instructing at Intel, Boeing, Executrain, and the Department of the Navy.
9/19/2013

Interesting Quotes……

“Quality isn’t measured by how many clients you
obtain; it’s measured by how many clients you
retain.”
“QA isn’t the bottom of the totem pole; it’s the dirt
holding it up.”

Become a Big Data Quality Hero
A look inside QA for Big Data
Presented by 01001010 01100001 01110011 01101111 01101110 00100000
01010010 01100001 01110101 01100101 01101110 (Jason Rauen)


Overview
• Why Test and How it’s Different
– Issues
– Benefits

• Architecture and why you need to know
– HPCC Systems/Hadoop
– Know Your Data/Environment

• Strategies and Concepts
– What to look for
– Sample Gathering (AUB)
– Stats
– Profiling

Why Test and How it’s Different
Why Test Data:
• Traditional methods are not adequate – traditional sampling is scenario
based and suffers from too few samples, human error, etc.
• Tied into current environment
• Government regulatory compliance
• Auditing requirements
• Company-wide initiatives


Why Test and How it’s Different

Want to keep your customers?

Why Test and How it’s Different
• When?
o Testing - SDLC
o Routine Testing
o Frequency - Yearly/Monthly/Weekly/Daily/Hourly/On
Demand

• What? Types of testing
o New Project – Source to Target (Transform)
o Standard – Production Validation
o Emergency releases

• How?
o Using what you have available
o Freebies – Profiling tools, etc…


Why Test and How it’s Different

Issues:
• Lack of control
o Timing of builds
o Samples and location of samples
• 3rd Party Apps
o Lack of licenses, costs, training, and existing knowledge
• Extra hardware
• Upgrades

Why Test and How it’s Different

Benefits:
• Cost savings
• Better Coverage
o No Samples
o Increased Sampling
o Focused Samples
• Faster (Time is $)
• Quicker diagnosis of issues
• Better Data Integrity
• Collaboration with other groups


Architecture and why you need to know

Typical Generic Architecture
[Diagram: input, DB]

Architecture and why you need to know
Data Fabrication Engines
• HDFS Hadoop and HPCC THOR
• Made of several nodes
• Where the ETL happens
• Where the Keys are made
Data Delivery Engines
• HPCC ROXIE, HBase, etc.
• Keys moved to and referenced here
• Where the queries reside


Architecture and why you need to know
[Diagram: HDFS, Hadoop MapReduce, HBase]


Architecture and why you need to know
[Diagram: HDFS, Map, Shuffle, Reduce]
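For readers new to the Map, Shuffle, and Reduce stages named on this slide, here is a minimal single-process sketch in Python. It only illustrates the data flow; a real Hadoop job spreads each stage across many nodes. The record layout and the "state" field are hypothetical.

```python
from collections import defaultdict


def map_phase(records):
    """Map: emit one (key, 1) pair per record; the key here is the state field."""
    for record in records:
        yield record["state"], 1


def shuffle_phase(pairs):
    """Shuffle: group all values by key so each key ends up in one place."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's values into one result, here a sum."""
    return {key: sum(values) for key, values in groups.items()}


# Hypothetical input records.
records = [{"state": "GA"}, {"state": "FL"}, {"state": "GA"}]
counts = reduce_phase(shuffle_phase(map_phase(records)))
print(counts)
```

Knowing which stage produced a bad count helps narrow down where a data problem was introduced.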


Architecture and why you need to know
HPCC Systems
[Diagram: DISTRIBUTE/PROJECT/TRANSFORM, Rollup]

Strategies and Concepts
• What to look for...
o Brand-new, incomplete, or missing builds (Data Cops)
o Data progression: Today/Yesterday, FatherKey/GrandfatherKey
o Count of deltas in release/deploy
o Keys updated
o Missing keys/New keys
o Field validations, indexed and non-indexed
o Key layout issues
o Corruption: unprintable or invalid characters
o Duplicates among new and existing records
o Data Fabrication Engine to Data Delivery Engine deploys/sync
o Queries with new data
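Two of the checks listed above, corruption (unprintable characters) and duplicate records, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the presenter's tooling: the record layout, field names, and helper names are hypothetical.

```python
from collections import Counter


def has_unprintable(value):
    """Return True if the string contains any non-printable character."""
    return any(not ch.isprintable() for ch in value)


def find_corrupt_fields(records):
    """Yield (record index, field name) for each string field with unprintable characters."""
    for index, record in enumerate(records):
        for field, value in record.items():
            if isinstance(value, str) and has_unprintable(value):
                yield index, field


def find_duplicates(records, key_fields):
    """Return the key tuples that occur more than once, in sorted order."""
    counts = Counter(tuple(r[f] for f in key_fields) for r in records)
    return sorted(key for key, n in counts.items() if n > 1)


# Hypothetical sample records.
records = [
    {"id": "1", "name": "Alice"},
    {"id": "2", "name": "Bob\x00"},  # embedded NUL character
    {"id": "1", "name": "Alice"},    # duplicate of the first record
]

corrupt = list(find_corrupt_fields(records))
duplicates = find_duplicates(records, ["id", "name"])
print(corrupt)
print(duplicates)
```

On a real cluster the same checks would run in the platform's own language (ECL or Pig) across the full data set rather than on a sample.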


Strategies and Concepts
JOIN
• Sample gathering
• New Key for testing
• Deployment Validation
- Data Fabrication
• Deployment Validation
- Data Delivery
And get a free cookie…
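Deployment validation between the Data Fabrication and Data Delivery engines comes down to confirming both sides hold the same records. One possible sketch, assuming records can be pulled from each side as tuples, compares a record count plus an order-independent checksum. The function names are hypothetical and this is not an HPCC Systems or Hadoop API.

```python
import hashlib


def fingerprint(records):
    """Return (record count, order-independent SHA-256 digest) for a collection of tuples."""
    digests = sorted(
        hashlib.sha256(repr(record).encode("utf-8")).hexdigest()
        for record in records
    )
    combined = hashlib.sha256("".join(digests).encode("utf-8")).hexdigest()
    return len(digests), combined


def deployment_in_sync(fabrication_records, delivery_records):
    """True when both sides hold the same records, regardless of order."""
    return fingerprint(fabrication_records) == fingerprint(delivery_records)


# Hypothetical key contents on each side.
built = [("1", "Ann"), ("2", "Bob")]
deployed_ok = [("2", "Bob"), ("1", "Ann")]   # same records, different order
deployed_stale = [("1", "Ann")]              # one record never arrived

print(deployment_in_sync(built, deployed_ok))
print(deployment_in_sync(built, deployed_stale))
```

When the fingerprints differ, a JOIN between the two sides shows which records are missing or changed.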

Strategies and Concepts
AUB for JOIN
A = Left key (New)
B = Right key (Old)
Types of JOINS
• Inner Join
• Left Outer Join
• Full Outer Join
• Right Outer Join
• Minus or Left Only
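To make the join types concrete, here is a small Python sketch that treats A (new key) and B (old key) as dictionaries of id to value. It assumes ids are unique on each side; the sample data and the join helper are hypothetical.

```python
def join(new_key, old_key, how):
    """Join two snapshots (id -> value); returns sorted (id, new, old) rows, None for a missing side."""
    a, b = set(new_key), set(old_key)
    ids = {
        "inner": a & b,        # ids present in both A and B
        "left_outer": a,       # every id in A
        "right_outer": b,      # every id in B
        "full_outer": a | b,   # every id in either
        "left_only": a - b,    # ids in A that are not in B (minus)
    }[how]
    return [(i, new_key.get(i), old_key.get(i)) for i in sorted(ids)]


# Hypothetical snapshots.
new_key = {1: "Ann", 2: "Bob", 4: "Dee"}   # A: new build
old_key = {1: "Ann", 2: "Rob", 3: "Cy"}    # B: old build

added = join(new_key, old_key, "left_only")
changed = [row for row in join(new_key, old_key, "inner") if row[1] != row[2]]
everything = join(new_key, old_key, "full_outer")
print(added)
print(changed)
print(everything)
```

Left-only rows are natural samples of brand-new records, and inner-join rows whose values differ are samples of updated records.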


Strategies and Concepts

AUB for JOIN
A = Left key (New)
B = Right key (Old)

VENN

Strategies and Concepts
Statistics: What you try to remember with this swimming
behind you.


Strategies and Concepts
Statistics:
• On data sets and keys
- Gives you a high level look at the release
- Ranges
- You’ll start to notice a trend line
• On Releases
- Done over time you’ll see the trend of new data sets and keys
- Done over time you’ll see the trend of changed or modified
data sets and keys
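A trend band over release statistics can be sketched as follows. The slides do not say how the floor and ceiling are derived, so this example assumes one sample standard deviation either side of the mean. The history values are made up for illustration.

```python
from statistics import mean, stdev


def release_band(counts):
    """Return (floor, average, ceiling), assuming a band of mean +/- one standard deviation."""
    avg = mean(counts)
    spread = stdev(counts)
    return avg - spread, avg, avg + spread


def out_of_band(counts, new_count):
    """True when a new release's count falls outside the historical band."""
    floor, _, ceiling = release_band(counts)
    return not (floor <= new_count <= ceiling)


# Hypothetical record counts from earlier releases.
history = [170, 180, 175, 165, 190, 172, 178]
floor, avg, ceiling = release_band(history)

print(round(floor, 1), round(avg, 1), round(ceiling, 1))
print(out_of_band(history, 176))
print(out_of_band(history, 320))
```

A release that lands outside the band is not necessarily wrong, but it is a prompt to look closer before deploying.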

Strategies and Concepts
RELEASE NUMBERS
[Chart: release numbers for releases 1 through 15, y-axis 0 to 400, with reference lines: AVERAGE 175.4, CEILING 210.6, FLOOR 135.1]


Strategies and Concepts

Data Profiling:
• Data Profiling Summary Report
• Data Profiling Field Detail Report
• Data Profiling Field Combination Report
http://www.hpccsystems.com/demos/dataprofiling-demo
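For a feel of the per-field numbers a profiling report might contain, here is a minimal Python sketch. It is not the HPCC Systems profiling tool or its output format; the measurements chosen (fill rate, cardinality, length range, top values) and the sample records are assumptions.

```python
from collections import Counter


def profile_field(records, field):
    """Compute simple profile measurements for one field across all records."""
    values = [r.get(field) for r in records]
    filled = [v for v in values if v not in (None, "")]
    lengths = [len(str(v)) for v in filled]
    return {
        "field": field,
        "fill_rate": len(filled) / len(values) if values else 0.0,
        "cardinality": len(set(filled)),          # number of distinct values
        "min_length": min(lengths, default=0),
        "max_length": max(lengths, default=0),
        "top_values": Counter(filled).most_common(3),
    }


# Hypothetical records; note the short zip and the blank fields.
records = [
    {"state": "GA", "zip": "30005"},
    {"state": "GA", "zip": "3000"},
    {"state": "FL", "zip": ""},
    {"state": "", "zip": "32073"},
]

state_profile = profile_field(records, "state")
zip_profile = profile_field(records, "zip")
print(state_profile)
print(zip_profile)
```

A zip field whose minimum length is 4 stands out immediately, which is the kind of lead a profile is meant to surface.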

Strategies and Concepts
Data Profiling Summary Report


Strategies and Concepts
Data Profiling Field Detail Report

Strategies and Concepts
Data Profiling Field Combination Report


Strategies and Concepts
SQL / Pig / ECL

Select all records
SQL: SELECT * FROM Products;
Pig: DUMP Products;
ECL: Products;

Filter on a field
SQL: SELECT * FROM Products
     WHERE productcode = 'R2D2C3PO';
Pig: Products = FILTER Products BY productcode == 'R2D2C3PO';
     DUMP Products;
ECL: Products(productcode = 'R2D2C3PO');

Count records
SQL: SELECT COUNT(*) FROM Products;
Pig: Products = GROUP Products ALL;
     Products = FOREACH Products GENERATE COUNT(Products);
     DUMP Products;
ECL: COUNT(Products);

Strategies and Concepts
SQL / Pig / ECL

Sort records
SQL: SELECT * FROM Products
     ORDER BY productcode;
Pig: Products = ORDER Products BY productcode;
     DUMP Products;
ECL: SORT(Products, productcode);

Full outer join
SQL: SELECT * FROM Products FULL OUTER JOIN OtherProducts
     ON Products.col1 = OtherProducts.col1;
Pig: Products = JOIN Products BY col1 FULL OUTER, OtherProducts BY col1;
     DUMP Products;
ECL: JOIN(Products, OtherProducts, LEFT.col1 = RIGHT.col1, FULL OUTER);
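To try the SQL column without a cluster, the select, filter, count, and sort statements can be run against an in-memory SQLite database from Python. This is only a sketch: the table contents and the name column are made up, and the full outer join is left out because support for it depends on the SQLite version.

```python
import sqlite3

# In-memory database with a hypothetical Products table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (productcode TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO Products VALUES (?, ?)",
    [("R2D2C3PO", "Droid set"), ("BB8", "Ball droid"), ("K2SO", "Security droid")],
)

# Select all records.
all_rows = conn.execute("SELECT * FROM Products").fetchall()

# Filter on a field (parameterized instead of an inline literal).
filtered = conn.execute(
    "SELECT * FROM Products WHERE productcode = ?", ("R2D2C3PO",)
).fetchall()

# Count records.
total = conn.execute("SELECT COUNT(*) FROM Products").fetchone()[0]

# Sort records.
ordered = conn.execute(
    "SELECT productcode FROM Products ORDER BY productcode"
).fetchall()

conn.close()
print(len(all_rows), filtered, total, ordered)
```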


Summary

Why Test and How it’s Different
Architecture and why you need to know
Strategies and Concepts

Questions?


Contact / Useful links
www.linkedin.com/in/jasonrauen

• HPCC Systems/ECL Links:
http://hpccsystems.com
http://hpccsystems.com/demos

• Hadoop/Pig Latin Links:
http://pig.apache.org
http://hadoop.apache.org

• SQL Links:
http://sql.org/
http://msdn.microsoft.com/en-US/sqlserver/default.aspx

SOFTTECHHUB
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 


Become a Big Data Quality Hero

  • 1. T8 Concurrent Class 10/3/2013 11:15:00 AM "Become a Big Data Quality Hero" Presented by: Jason Rauen LexisNexis Brought to you by: 340 Corporate Way, Suite 300, Orange Park, FL 32073 888-268-8770 ∙ 904-278-0524 ∙ sqeinfo@sqe.com ∙ www.sqe.com
  • 2. Jason Rauen LexisNexis Jason Rauen is a senior quality test analyst at Georgia-based LexisNexis Risk Solutions. With more than fifteen years of experience, Jason has led the data testing team in big data from its inception. He has presented big data scripting techniques at HPCC Systems national Data Summit. His background includes working at companies including Microsoft, AT&T, and LexisNexis, and instructing at Intel, Boeing, Executrain, and the Department of the Navy.
  • 3. 9/19/2013 Interesting Quotes:
    “Quality isn’t measured by how many clients you obtain; it’s measured by how many clients you retain.”
    “QA isn’t the bottom of the totem pole; it’s the dirt holding it up.”
    Become a Big Data Quality Hero
    A look inside QA for Big Data
    Presented by 01001010 01100001 01110011 01101111 01101110 00100000 01010010 01100001 01110101 01100101 01101110 (Jason Rauen)
  • 4. Overview
      • Why Test and How it’s Different
          – Issues
          – Benefits
      • Architecture and why you need to know
          – HPCC Systems/Hadoop
          – Know Your Data/Environment
      • Strategies and Concepts
          – What to look for
          – Sample Gathering (AUB)
          – Stats
          – Profiling
    Why Test and How it’s Different
    Why Test Data:
      • Traditional methods not adequate – Traditional sampling needs improvement and is scenario based, not enough samples, human error, etc.
      • Tied into current environment
      • Government regulatory compliances
      • Auditing requirements
      • Company wide initiatives
  • 5. Why Test and How it’s Different
    Want to keep your customers?
    Why Test and How it’s Different
      • When?
          o Testing - SDLC
          o Routine Testing
          o Frequency - Yearly/Monthly/Weekly/Daily/Hourly/On Demand
      • What? Types of Testing
          o New Project – Source to Target (Transform)
          o Standard - Production Validation
          o Emergency releases
      • How?
          o Using what you have available
          o Freebies – Profiling tools, etc.
  • 6. Why Test and How it’s Different
    Issues:
      • Lack of control
          o Timing of builds
          o Samples and location of samples
      • 3rd Party Apps
          o Lack of licenses, Costs, Training, and existing knowledge
      • Extra hardware
      • Upgrades
    Why Test and How it’s Different
    Benefits:
      • Cost savings
      • Better Coverage
          o No Samples
          o Increased Sampling
          o Focused Samples
      • Faster (Time is $)
      • Quicker diagnosis of issues
      • Better Data Integrity
      • Collaboration with other groups
  • 7. Architecture and why you need to know
    Typical Generic Architecture (diagram labels: input, DB)
    Architecture and why you need to know
    Data Fabrication Engines
      • HDFS Hadoop and HPCC THOR
      • Made of several nodes
      • Where the ETL happens
      • Where the Keys are made
    Data Delivery Engines
      • HPCC ROXIE, HBASE, etc.
      • Keys moved to and referenced here
      • Queries reside
  • 8. Architecture and why you need to know (diagram labels: HDFS, Hadoop, MapReduce, HBASE)
  • 9. Architecture and why you need to know (diagram labels: HDFS, Map, Shuffle, Reduce)
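The Map, Shuffle, and Reduce stages named on this slide can be illustrated with a small in-memory sketch. This is not Hadoop code: the record layout, the `productcode` field, and the three function names are assumptions made for illustration, and everything runs in a single process rather than across nodes.

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit one (key, 1) pair per record, keyed by product code."""
    for record in records:
        yield record["productcode"], 1

def shuffle_phase(pairs):
    """Shuffle: bring all values emitted for the same key together."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}

# Hypothetical sample records.
records = [
    {"productcode": "R2D2C3PO"},
    {"productcode": "BB8"},
    {"productcode": "R2D2C3PO"},
]
counts = reduce_phase(shuffle_phase(map_phase(records)))
print(counts)
```

On a real cluster the map and reduce steps run in parallel on separate nodes and the shuffle moves data between them; the sketch only shows the order of the stages.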
  • 10. Architecture and why you need to know
    HPCC Systems (diagram labels: DISTRIBUTE/PROJECT/TRANSFORM, Rollup)
    Strategies and Concepts
      • What to look for
          o Brand New, Incomplete, or Missing Builds (Data Cops)
          o Data progression: Today/Yesterday, FatherKey/Grandfatherkey
          o Count of Deltas in release/deploy
          o Keys updated
          o Missing keys/New keys
          o Field Validations: Indexed and Non Indexed
          o Key Layout issues
          o Corruption: unprintable or invalid characters
          o Duplicate records of new and existing records
          o Data Fabrication Engine to Data Delivery Engine deploys/sync
          o Queries with new data
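Two items from the "What to look for" list, unprintable characters and duplicate records, lend themselves to a short sketch. The function names, the field names, and the sample records below are hypothetical, and the checks run in memory on a handful of rows; on a real build the same logic would be expressed in the cluster's own language.

```python
from collections import Counter

def find_unprintable(records, fields):
    """Return (row_index, field) pairs whose value holds a non-printable character."""
    hits = []
    for index, record in enumerate(records):
        for field in fields:
            value = str(record.get(field, ""))
            if not value.isprintable():
                hits.append((index, field))
    return hits

def find_duplicates(records, key_fields):
    """Return the key tuples that occur more than once."""
    counts = Counter(tuple(record.get(f) for f in key_fields) for record in records)
    return sorted(key for key, n in counts.items() if n > 1)

# Hypothetical sample: row 1 carries a NUL byte, rows 0 and 2 share a key.
records = [
    {"productcode": "R2D2C3PO", "name": "Droid set"},
    {"productcode": "BB8", "name": "Ball\x00droid"},
    {"productcode": "R2D2C3PO", "name": "Droid set"},
]
bad_chars = find_unprintable(records, ["productcode", "name"])
duplicates = find_duplicates(records, ["productcode"])
print(bad_chars, duplicates)
```

Note that `str.isprintable()` also treats tabs and newlines as non-printable, which may or may not match what a given layout allows.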
  • 11. Strategies and Concepts
    JOIN
      • Sample gathering
      • New Key for testing
      • Deployment Validation - Data Fabrication
      • Deployment Validation - Data Delivery
    And get a free cookie…
    Strategies and Concepts
    AUB for JOIN
    A = Left key (New)
    B = Right key (Old)
    Types of JOINS
      • Inner Join
      • Left Outer Join
      • Full Outer Join
      • Right Outer Join
      • Minus or Left Only
  • 12. Strategies and Concepts
    AUB for JOIN
    A = Left key (New)
    B = Right key (Old)
    (Venn diagram)
    Strategies and Concepts
    Statistics: What you try to remember with this swimming behind you.
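The regions of the Venn diagram for A (new key) and B (old key) can be sketched with set operations. The sketch reads "AUB" as A union B, which is an assumption, and it identifies each record by a single key value; the function name and sample values are hypothetical. A real join would match on key fields and carry the rest of the record along.

```python
def compare_keys(new_key, old_key):
    """Split records into the join regions of A (new key) and B (old key)."""
    a, b = set(new_key), set(old_key)
    return {
        "left_only": a - b,    # in the new key only: candidates for new-record samples
        "inner": a & b,        # present in both builds
        "right_only": b - a,   # in the old key only: records that dropped out
        "full_outer": a | b,   # everything seen in either build
    }

# Hypothetical key values for two builds.
new_key = ["k1", "k2", "k4"]
old_key = ["k1", "k2", "k3"]
regions = compare_keys(new_key, old_key)
for name, members in regions.items():
    print(name, sorted(members))
```

The left-only and right-only regions are usually the most useful for sample gathering, since they isolate exactly what changed between builds.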
  • 13. Strategies and Concepts
    Statistics:
      • On data sets and keys
          - Gives you a high level look at the release
          - Ranges
          - You’ll start to notice a trend line
      • On Releases
          - Done over time you’ll see the trend of new data sets and keys
          - Done over time you’ll see the trend of changed or modified data sets and keys
    Strategies and Concepts
    RELEASE NUMBERS (chart: y-axis 0 to 400, x-axis 1 to 15; AVERAGE 175.4, CEILING 210.6, FLOOR 135.1)
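A trend band like the average, ceiling, and floor on the chart can be sketched as follows. The slide does not say how its ceiling and floor were derived (they are not symmetric around the average), so this sketch substitutes a simple rule, the mean plus or minus `k` population standard deviations, purely as an assumption. The function names and the release counts are hypothetical.

```python
from statistics import mean, pstdev

def release_band(counts, k=1.0):
    """Compute an average with a ceiling and floor k standard deviations away."""
    average = mean(counts)
    spread = pstdev(counts)
    return {
        "average": average,
        "ceiling": average + k * spread,
        "floor": average - k * spread,
    }

def out_of_band(counts, band):
    """Return (release_number, count) pairs that fall outside the band."""
    return [
        (number, count)
        for number, count in enumerate(counts, start=1)
        if count > band["ceiling"] or count < band["floor"]
    ]

# Hypothetical record counts for six releases; the last one spikes.
counts = [170, 180, 175, 165, 190, 350]
band = release_band(counts)
flagged = out_of_band(counts, band)
print(band, flagged)
```

A flagged release is not necessarily wrong; it is a prompt to find out why the numbers moved.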
  • 14. Strategies and Concepts
    Data Profiling:
      • Data Profiling Summary Report
      • Data Profiling Field Detail Report
      • Data Profiling Field Combination Report
    http://www.hpccsystems.com/demos/dataprofiling-demo
    Strategies and Concepts
    Data Profiling Summary Report
  • 15. Strategies and Concepts
    Data Profiling Field Detail Report
    Strategies and Concepts
    Data Profiling Field Combination Report
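The slides show the profiling reports only as screenshots, so the exact columns are not recoverable here. As a rough sketch of what a per-field profile might contain, the following computes fill rate, distinct count, and value lengths for one field. The metric choice, the function name, and the sample records are all assumptions and do not reproduce the HPCC Systems report.

```python
def profile_field(records, field):
    """Summarize one field: fill rate, distinct values, and value lengths."""
    values = [record.get(field) for record in records]
    filled = [str(v) for v in values if v not in (None, "")]
    total = len(values)
    return {
        "field": field,
        "fill_rate": len(filled) / total if total else 0.0,
        "distinct": len(set(filled)),
        "min_length": min((len(v) for v in filled), default=0),
        "max_length": max((len(v) for v in filled), default=0),
    }

# Hypothetical sample with one empty product code.
records = [
    {"productcode": "R2D2C3PO"},
    {"productcode": "BB8"},
    {"productcode": ""},
    {"productcode": "R2D2C3PO"},
]
profile = profile_field(records, "productcode")
print(profile)
```

Comparing such a profile between two builds is a cheap way to spot a field that suddenly empties out or changes length.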
  • 16. Strategies and Concepts
    SQL / Pig / ECL

    SQL: SELECT * FROM Products;
    Pig: DUMP Products;
    ECL: Products;

    SQL: SELECT * FROM Products WHERE productcode = 'R2D2C3PO';
    Pig: Products = FILTER Products BY productcode == 'R2D2C3PO'; DUMP Products;
    ECL: Products(productcode = 'R2D2C3PO');

    SQL: SELECT COUNT(*) FROM Products;
    Pig: Products = GROUP Products ALL; Products = FOREACH Products GENERATE COUNT(Products); DUMP Products;
    ECL: COUNT(Products);

    Strategies and Concepts
    SQL / Pig / ECL

    SQL: SELECT * FROM Products ORDER BY productcode;
    Pig: Products = ORDER Products BY productcode; DUMP Products;
    ECL: SORT(Products, productcode);

    SQL: SELECT * FROM Products FULL OUTER JOIN OtherProducts ON Products.col1 = OtherProducts.col1;
    Pig: Products = JOIN Products BY col1 FULL OUTER, OtherProducts BY col1; DUMP Products;
    ECL: JOIN(Products, OtherProducts, LEFT.col1 = RIGHT.col1, FULL OUTER);
  • 17. Summary
      • Why Test and How it’s Different
      • Architecture and why you need to know
      • Strategies and Concepts
    Questions?
  • 18. Contact / Useful links
    www.linkedin/in/jasonrauen
      • HPCC Systems/ECL Links:
          http://hpccsystems.com
          http://hpccsystems.com/demos
      • Hadoop/Pig Latin Links:
          http://pig.apache.org
          http://hadoop.apache.org
      • SQL Links:
          http://sql.org/
          http://msdn.microsoft.com/en-US/sqlserver/default.aspx