You've made the move to MongoDB for its flexible schema and querying capabilities in order to enhance agility and reduce costs for your business. Shouldn't your data quality process be just as organized and efficient?
Using QuerySurge for testing your MongoDB data as part of your quality effort will increase your testing speed, boost your testing coverage (up to 100%), and improve the level of quality within your Big Data store. QuerySurge will help you keep your team organized and on track too!
To learn more about QuerySurge, visit www.QuerySurge.com
Big Data Testing: Ensuring MongoDB Data Quality
1. Webinar: Ensuring MongoDB Data Quality
automating the data testing for MongoDB
Presenters
Bill Hayduk, CEO, President, RTTS
Jeff Bocarsly, PhD, VP & Chief Architect, RTTS
built by
QuerySurge™
2. About
RTTS is the leading provider of software & data quality for critical business systems
FACTS
Founded: 1996
Locations: New York (HQ), Atlanta, Philadelphia, Phoenix
Strategic Partners: IBM, Microsoft, HP, Oracle, Teradata, HortonWorks, Cloudera, Amazon
Software: QuerySurge
4. What is MongoDB?
Name: MongoDB (from "humongous")
• classified as a NoSQL¹ database
• does not implement the table-based relational db structure
• cross-platform document-oriented database
• makes the integration of data in certain types of apps easier & faster
• free and open source
• originally built by 10gen² and released in 2009
“MongoDB is in 5th place as the most popular type of database management system, and 1st place for NoSQL database management systems.” (April 2014)
¹ NoSQL now means “not only SQL”
² 10gen changed its name to MongoDB, Inc.
Source: Wikipedia
5. MongoDB versus Hadoop
When to use MongoDB?
• Online real-time processing
• Data set is smaller
• Measured in milliseconds
When to use Hadoop?
• Offline big data processing
• Offline analytics
• Measured in minutes & hours
Source: classpattern.com
6. MongoDB Use Cases
• Data Warehouse
• Batch Aggregation
• ETL from MongoDB
• ETL to MongoDB
Source: MongoDB, Inc.
7. Use Cases: Data Warehouse
[Diagram labels: Source Data; Ingestion; Relational DB & Data Warehousing; BI, Analytics & Reporting]
8. Data Quality Issues
• Data Quality Best Practices boost revenue by 66%.
• The average organization loses $8.2 million annually through poor Data Quality.
• 46% of companies cite Data Quality as a barrier for adopting Business Intelligence products.
• 80% of organizations… will underestimate the costs related to the data acquisition tasks by an average of 50 percent.
10. Validating Data: 3 Big Issues
- need to verify more data and to do it faster
- need to automate the testing effort
- need to be able to test across different platforms
Need a testing tool!
11. What is QuerySurge™?
a collaborative data testing tool that finds bad data & provides a holistic view of your data’s health
12. with QuerySurge™ you can:
• Reduce your costs & risks
• Improve your data quality
• Accelerate your testing cycles
• Share information with your team
Provides huge ROI (e.g. 1,300%)*
*based on client’s calculation of Return on Investment
13. Finding Bad Data
How?
• QS pulls data from data source(s)
• QS pulls data from data target(s)
• QS compares data in seconds
• QS generates reports, audit trails
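The pull, compare, and report steps on this slide can be illustrated with a small Python sketch. This is not how QuerySurge is implemented. It only shows the general idea of a keyed source-versus-target comparison. The two in-memory SQLite databases, the table names (customers, dim_customer), and the helper names fetch_rows and compare are all assumptions made for the example.

```python
import sqlite3

def fetch_rows(conn, sql):
    """Run a query and return {key: row_tuple}, keyed on the first column."""
    return {row[0]: row for row in conn.execute(sql)}

def compare(source_rows, target_rows):
    """Return (kind, key, source_row, target_row) for every difference."""
    diffs = []
    for key in sorted(source_rows.keys() | target_rows.keys()):
        src, tgt = source_rows.get(key), target_rows.get(key)
        if tgt is None:
            diffs.append(("missing_in_target", key, src, None))
        elif src is None:
            diffs.append(("missing_in_source", key, None, tgt))
        elif src != tgt:
            diffs.append(("mismatch", key, src, tgt))
    return diffs

# Hypothetical source and target stores, stood in for by in-memory SQLite.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
source.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, balance REAL);
    INSERT INTO customers VALUES (1, 'Ann', 10.0), (2, 'Bob', 20.0), (3, 'Cy', 30.0);
""")
target.executescript("""
    CREATE TABLE dim_customer (id INTEGER PRIMARY KEY, name TEXT, balance REAL);
    INSERT INTO dim_customer VALUES (1, 'Ann', 10.0), (2, 'Bob', 25.0);
""")

# Step 1 and 2: pull from source and target. Step 3: compare.
diffs = compare(
    fetch_rows(source, "SELECT id, name, balance FROM customers"),
    fetch_rows(target, "SELECT id, name, balance FROM dim_customer"),
)

# Step 4: a minimal "report" listing each difference.
for kind, key, src, tgt in diffs:
    print(f"{kind}: key={key} source={src} target={tgt}")
```

Keying rows on the first selected column lets the sketch tell a changed value apart from a row that never arrived in the target, which a plain row-count check would not.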
14. the QuerySurge advantage
Automate the entire testing cycle
Automate kickoff, tests, comparison, auto-emailed results
Create Tests easily with no SQL programming
ensures minimal time & effort to create tests / obtain results
Test across different platforms
data warehouse, Hadoop, NoSQL, database, flat file, XML
Collaborate with team
Data Health dashboard, shared tests & auto-emailed reports
Verify more data & do it quickly
verifies up to 100% of all data up to 1,000 x faster
Integrate for Continuous Delivery
Integrates with most Build, ETL & QA management software
15. Collaboration
Testers
- functional testing
- regression testing
- result analysis
Developers / DBAs
- unit testing
- result analysis
Data Analysts
- review, analyze data
- verify mapping failures
Operations teams
- monitoring
- result analysis
Managers
- oversight
- result analysis
Share information on the
18. Recent Survey: Data Experts
From a recent poll¹ of:
• Big Data Experts
• Data Warehouse Architects
• Solution Architects
• ETL Architects
Our Question: What % of columns in your Data Warehouse have no transformations at all?
Consensus Answer: 80% of data columns have no transformation at all
Why is this important?
¹ Poll conducted by RTTS on targeted LinkedIn groups
19. QuerySurge™ Modules
Compare by Table, Column & Row
Fast and Easy. No programming needed.
• Perform 80% of all data tests - no SQL coding needed
• Opens up testing to novices & non-technical team members
• Speeds up testing for skilled SQL coders
• Provides a huge Return-On-Investment
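To illustrate why untransformed columns can be tested without hand-written SQL, here is a rough Python sketch that generates a matching pair of SELECT statements from table and column names alone. It is an assumption about the general approach, not a description of QuerySurge internals. The function name build_compare_queries and the table and column names are hypothetical.

```python
import re

# Only plain identifiers are accepted, since names are spliced into SQL text.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def build_compare_queries(source_table, target_table, key, columns):
    """Build matching SELECTs for a straight (untransformed) table compare."""
    for name in (source_table, target_table, key, *columns):
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    select_list = ", ".join([key, *columns])
    template = "SELECT {cols} FROM {table} ORDER BY {key}"
    return (
        template.format(cols=select_list, table=source_table, key=key),
        template.format(cols=select_list, table=target_table, key=key),
    )

src_sql, tgt_sql = build_compare_queries(
    "customers", "dim_customer", "id", ["name", "balance"]
)
print(src_sql)
print(tgt_sql)
```

Because the columns carry no transformation, the same select list works on both sides, so the user only has to name the tables, the key, and the columns.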
21. QuerySurge™ Modules
Design Library
• Create Query Pairs (source & target SQLs)
• Great for team members skilled with SQL
Scheduling
• Build groups of Query Pairs
• Schedule Test Runs
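A Query Pair and a group of Query Pairs can be pictured with the following Python sketch. It is an illustration under stated assumptions and not the QuerySurge data model: the QueryPair class, the run_suite helper, the SQLite stand-in databases, and the orders and fact_orders tables are all made up for the example. The first pair applies UPPER() on the source side to mirror a transformation the load is expected to perform.

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class QueryPair:
    """A source query and a target query whose result sets should match."""
    name: str
    source_sql: str
    target_sql: str

    def run(self, source_conn, target_conn):
        """Return True when both queries yield identical rows."""
        src = source_conn.execute(self.source_sql).fetchall()
        tgt = target_conn.execute(self.target_sql).fetchall()
        return src == tgt

def run_suite(pairs, source_conn, target_conn):
    """Run a group of query pairs and collect pass/fail per pair name."""
    return {p.name: p.run(source_conn, target_conn) for p in pairs}

# Hypothetical source and target, stood in for by in-memory SQLite.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
source.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, country TEXT, amount_cents INTEGER);
    INSERT INTO orders VALUES (1, 'us', 1250), (2, 'de', 990);
""")
target.executescript("""
    CREATE TABLE fact_orders (id INTEGER PRIMARY KEY, country TEXT, amount_cents INTEGER);
    INSERT INTO fact_orders VALUES (1, 'US', 1250), (2, 'DE', 900);
""")

suite = [
    # The source query re-applies the expected transformation (upper-casing).
    QueryPair("country_uppercased",
              "SELECT id, UPPER(country) FROM orders ORDER BY id",
              "SELECT id, country FROM fact_orders ORDER BY id"),
    # No transformation expected, so both sides select the raw column.
    QueryPair("amount_unchanged",
              "SELECT id, amount_cents FROM orders ORDER BY id",
              "SELECT id, amount_cents FROM fact_orders ORDER BY id"),
]
results = run_suite(suite, source, target)
print(results)
```

Here the second pair fails because order 2 was loaded with a different amount, which is the kind of defect a scheduled run of the whole group would surface.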
22. QuerySurge™ Modules
Deep-Dive Reporting
• Examine and automatically email test results
Run Dashboard
• View real-time execution
• Analyze real-time results
23. • view data reliability & pass rate
• add, move, filter, zoom-in on any data widget & underlying data
• verify build success or failure
24. QuerySurge Test Management Connectors
Integration with leading Test Management Solutions
• HP ALM (Quality Center)
• Microsoft Team Foundation Server
• IBM Rational Quality Manager
Drive QuerySurge execution from your Test Management Solution
Outcome results (Pass/Fail/etc.) are returned from QuerySurge to your Test Management Solution
Results are linked in your Test Management Solution so that you can click directly into detailed QuerySurge results
25. Use Case: Big Data and
[Diagram labels: Source Data; Ingestion; Relational DB & Data Warehousing; BI, Analytics & Reporting]
26. Value-Add
QuerySurge provides value by either:
• an increase in testing data coverage from < 1% to up to 100%
• a decrease in testing time by as much as 1,000 x
• a combination of an increase in test coverage and a decrease in testing time
27. Return on Investment
QuerySurge provides an increase in better data due to a shorter, more thorough testing cycle - saving $$$.
Pharmaceutical Organization Saves $288,000 in Clinical Trials Data Migration Testing Project
Total Savings
These savings do not include savings from avoiding fines from regulatory bodies or lawsuits.¹
¹ Since 2010, the pharmaceutical industry has been assessed over $13 billion in fines.
Source: Wikipedia
http://en.wikipedia.org/wiki/List_of_largest_pharmaceutical_settlements
28. Contact us if your team would like:
(1) a Trial in the Cloud of QuerySurge, including a self-learning tutorial that works with sample data, for 3 days or
(2) a downloaded Trial of QuerySurge, including a self-learning tutorial with sample data or your data, for 15 days or
(3) a Proof of Concept of QuerySurge, including a kickoff & setup meeting and weekly meetings with our team of experts, for 30 days
For more information, go here: http://www.querysurge.com/compare-trial-options
Editor's Notes
A great deal of bad news about bad data is hitting the press. These mistakes are extremely costly, as the news of the Australian retail grocery industry losing $1 billion Australian ($940 million US) highlights.
QuerySurge provides insight into the health of your data throughout your organization through BI dashboards and reporting at your fingertips. It is a collaborative tool that allows for distributed use throughout your organization and provides a sharable, holistic view of your data’s health and of the maturity of your organization’s data management.
QuerySurge helps your team coordinate your data quality initiatives while speeding up your development and testing cycles and finding your bad data. Why risk having your team identify trends and develop strategic initiatives when the underlying data is incorrect? QuerySurge reduces this risk.
QuerySurge finds bad data by natively connecting to:
• any data source, whether it is a database of any type, a flat file or XML, and
• any data target, whether it is a database, file, XML, data warehouse or Hadoop implementation.
QuerySurge pulls data from the source and the target and compares them very quickly (typically in a few minutes) and then produces reports that show every data difference, even if there are millions of rows and hundreds of columns in the test. These reports can be automatically emailed to your team.
You can pick from a multitude of reports or export the results so that you can build your own reports.
QuerySurge can be utilized by active practitioners such as testers & developers to create and launch tests, or by managers, analysts and operations to view data test results and the overall health of the data. QuerySurge facilitates this by providing 2 types of licenses: (1) full user & (2) participant user.
(1) Full User – This type of user has unlimited access to create QueryPairs, Suites, and Scenarios. This user can also schedule and run tests, see results, run and export reports, and export data. Perfect for anyone creating and/or running data tests while performing analysis of results.
(2) Participant User – This user cannot create or run tests, but has access to all other information - including viewing all query pairs, results, and reports, receiving email notifications, and exporting test results and reports. Perfect for managers, analysts, architects, DBAs, developers, and operations users who need to know the health of their data.
Your distributed team from around the world can use any of these web browsers: Internet Explorer, Chrome, Firefox and Safari.
Installs on operating systems: Windows & Linux.
QS connects to any JDBC-compliant data source, even if it is not listed here.