the Data World Distilled

The Data World Distilled
Understanding how the data world works in the Big Data era

I created this slide deck as a learning tool for new employees and figured I would post it in case it helps others understand the data space.

This slide deck covers:
- Big Data
- Data Warehouses
- ETL/Data Integration
- Business Intelligence and Analytics
- Data Quality
- Data Testing
- Data Governance

It provides a brief description along with key vendors in the space.

Published in: Technology

  1. 1. The Data World Distilled: Understanding how the data world works in the Big Data era. Bill Hayduk, Founder, CEO, QuerySurge™ (a software division of RTTS)
  2. 2. About QuerySurge™. FACTS: RTTS Founded: 1996 • Location: New York, NY (Headquarters) • Customer profile: Fortune 1000 • Software Offering: QuerySurge (2012) • QuerySurge Partners: 11 industry-leading Technology Partners, 14 global System Integrators, 22 regional consulting firms. RTTS is the parent company of QuerySurge and began as a consulting firm centered on QA & testing. Partner categories shown: Technology Partners, System Integrators, Sales & Consulting Partners
  3. 3. DWH, BI, Big Data Marketplaces. Data Warehouse Marketplace: “The worldwide data warehouse management software market is forecast to generate nearly $17 billion in revenue by 2019” - Forrester. Top vendors: Oracle, Teradata, IBM, Microsoft, SAP, Micro Focus and Amazon. Business Intelligence Marketplace: “The business intelligence (BI) and analytics software market is forecast to grow to $22.8 billion by the end of 2020” - Gartner. Top vendors: SAP, IBM, SAS, Microsoft, Oracle, Tableau, Qlik, MicroStrategy, Information Builders. Big Data Marketplace: “By the end of 2020, companies will spend > USD $72 billion on Big Data hardware, software, & professional services” - IDC. Top vendors: Oracle, IBM, Microsoft, Amazon, Micro Focus, HortonWorks, Cloudera, Teradata, SAP, MongoDB, MapR, DataStax, Snowflake.
  4. 4. Fast Facts about Data • By the end of 2020, companies will spend > USD $72 billion on Big Data hardware, software, & professional services (the current market size is USD $46 billion) • > 75% of companies are investing or planning to invest in Big Data in the next 2 years • Professional services represent 43% of the Big Data market (services = USD $31 billion of $72 billion)
  5. 5. The Data World Distilled: Big Data • Data Warehouse • ETL/Data Integration • BI & Analytics • Data Quality • Data Testing • Data Governance
  6. 6. The Data World Distilled: Big Data • Data Warehouse • ETL/Data Integration • BI & Analytics • Data Quality • Data Testing • Data Governance
  7. 7. What is Big Data?
  8. 8. What is Big Data? Big Data: defined as too much volume, velocity and variability to work on normal database architectures. Size: defined as 5 petabytes or more • 1 petabyte = 1,000 terabytes = 1,000,000 gigabytes = 1,000,000,000 megabytes. “The market for big data is $70 billion and growing by 15% a year.” - EMC COO Pat Gelsinger
  9. 9. The Big Data Impact • Handles more than 1 million customer transactions every hour: data imported into databases that contain > 2.5 petabytes of data, the equivalent of 167 times the information contained in all the books in the US Library of Congress • Facebook handles 40 billion photos from its user base • Google processes 1 Terabyte per hour • Twitter processes 85 million tweets per day • eBay processes 80 Terabytes per day
  10. 10. What is Hadoop? Hadoop is an open source project that develops software for scalable, distributed computing. • Easily deals with the complexities of high volume, velocity and variety of data • Is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models • Scales up from single servers to 1,000’s of machines, each offering local computation and storage • Detects and handles failures at the application layer
  11. 11. Key Attributes of Hadoop • Redundant and reliable • Extremely powerful • Easy to program distributed apps • Runs on commodity hardware
  12. 12. Top Vendors. “By the end of 2020, companies will spend more than USD $72 billion on Big Data hardware, software, & professional services” - IDC
  13. 13. Basic Hadoop Architecture. Each machine runs two components: • MapReduce (a.k.a. Task Tracker): the processing part that manages the programming jobs • HDFS (Hadoop Distributed File System, a.k.a. Data Node): stores data on the machines
  14. 14. Basic Hadoop Architecture (continued). Cluster: add more machines for scaling, from 1 to 100 to 1,000. Job Tracker: accepts jobs, assigns tasks, identifies failed machines. Name Node: coordination for HDFS; inserts and extractions are communicated through the Name Node. Each worker machine in the cluster runs a Task Tracker and a Data Node.
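The MapReduce processing model named on the two architecture slides can be sketched in a few lines. This is a single-process illustration only, assuming a word-count job; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and are not part of Hadoop's API. On a real cluster the Job Tracker would hand the map and reduce tasks to Task Trackers on many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in one process (illustrative only)."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```

Calling `word_count(["big data", "Big Hadoop"])` counts each word across both lines. Because the map step works on one line at a time and the reduce step on one key at a time, the same logic can be spread across many machines.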
  15. 15. Apache Hive: a data warehouse infrastructure built on top of Hadoop for providing data summarization, query, and analysis. Hive provides a mechanism to query the data using a SQL-like language called HiveQL that interacts with the HDFS files. HiveQL statements include: • create • insert • update • delete • select
  16. 16. What is NoSQL? A term used to describe high-performance, non-relational databases that provide a mechanism for storage and retrieval of data that is modeled by means other than the tabular relations used in relational databases. NoSQL Database Types: • Document databases pair each key with a complex data structure known as a document. Documents can contain many different key-value pairs, or key-array pairs, or even nested documents. • Graph stores are used to store information about networks of data, such as social connections. Graph stores include Neo4J and Giraph. • Key-value stores are the simplest NoSQL databases. Every single item in the database is stored as an attribute name (or 'key'), together with its value. Examples of key-value stores are Riak and Berkeley DB. Some key-value stores, such as Redis, allow each value to have a type, such as 'integer', which adds functionality. • Wide-column stores such as Cassandra and HBase are optimized for queries over large datasets, and store columns of data together, instead of rows.
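The four NoSQL database types above differ mainly in the shape of the data they store. The sketch below models each shape with plain Python structures. All names and sample values are made up for illustration, and nothing here uses a real database driver.

```python
# Document database: each key is paired with a nested document.
documents = {
    "customer:1": {
        "name": "Ada",
        "emails": ["ada@example.com"],
        "address": {"city": "New York", "state": "NY"},
    }
}

# Key-value store: each item is just an attribute name (key) and its value.
key_value = {"session:42": "logged_in", "page_views": 1024}

# Wide-column store: the values of one column are kept together, not row by row.
wide_column = {
    "name": {"row1": "Ada", "row2": "Grace"},
    "city": {"row1": "New York", "row2": "Boston"},
}

# Graph store: connections between nodes are first-class data.
graph_edges = [("Ada", "follows", "Grace"), ("Grace", "follows", "Ada")]


def column_values(store, column):
    """Read one whole column without touching the other columns."""
    return list(store[column].values())


def neighbors(edges, node):
    """Return every node that `node` has an outgoing edge to."""
    return [dst for src, _, dst in edges if src == node]
```

Reading a single column with `column_values(wide_column, "city")` touches only that column's data, which is the reason wide-column stores suit queries over large datasets.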
  17. 17. Top Vendors
  18. 18. NoSQL versus Hadoop. When to use NoSQL? • Online real-time processing • Data set is smaller • Measured in milliseconds. When to use Hadoop? • Offline big data processing • Offline analytics • Measured in minutes & hours. Source: classpattern.com
  19. 19. NoSQL Example: Use Cases • Data Warehouse Batch Aggregation • ETL from MongoDB • ETL to MongoDB. Source: MongoDB, Inc.
  20. 20. The Data World Distilled: Big Data • Data Warehouse • ETL/Data Integration • BI & Analytics • Data Quality • Data Testing • Data Governance
  21. 21. What is a Data Warehouse?
  22. 22. What is a Data Warehouse? A Data Warehouse: • is typically a relational database that is designed for query and analysis rather than for transaction processing • is a place where historical data is stored for archival, analysis and security purposes • contains either raw data or formatted data • combines data from multiple sources (for example Legacy DB, CRM/ERP DB, Finance DB). Types of data: sales, salaries, operational data, human resource data, inventory data, web logs, social networks, Internet text and docs, other
  23. 23. Data Warehouse - the marketplace. “The worldwide data warehouse management software market is forecast to generate nearly $17 billion in revenue by 2019” - Forrester. Data Warehouse size: • Small data warehouses: < 5 TB • Midsize data warehouses: 5 TB - 20 TB • Large data warehouses: > 20 TB - Analyst firm Gartner. Leaders in on-premises Data Warehouse Data Management Systems - Analyst firm Gartner’s ‘Magic Quadrant for Data Warehouse Database Management Systems’
  24. 24. Data Warehouse - the marketplace: Alternate Delivery Models. Leading Cloud DWHs. Leading Appliance DWHs: an appliance is software and servers optimized together. (Photo: Oracle founder Larry Ellison with an Exadata appliance)
  25. 25. Data Warehouse - Business Case. Why build a Data Warehouse? • Data stored in operational systems (OLTP) is not easily accessible • OLTP systems are not designed for end-user analysis • The data in OLTP is constantly changing • OLTP systems may be deficient in historical data • Diverse forms of data are stored in different platforms and/or dissimilar formats
  26. 26. Data Warehouse - Business Case. The Data Warehouse Business Solution • Collects data from different sources (other databases, files, web services, etc.) • Integrates data into logical business areas • Provides direct access to data with powerful reporting tools (BI)
  27. 27. Data Warehouse - about the data. The Data Warehouse data is: • Subject-oriented • Integrated • Non-volatile • Time-variant
  28. 28. The Data World Distilled: Big Data • Data Warehouse • ETL/Data Integration • BI & Analytics • Data Quality • Data Testing • Data Governance
  29. 29. Data Integration & the ETL process. ETL = Extract, Transform, Load. Why ETL? The data warehouse needs to be loaded regularly (daily/weekly) so that it can serve its purpose of facilitating business analysis. • Extract: copy data from one or more OLTP systems into the warehouse • Transform: remove inconsistencies, assemble to a common format, add missing fields, summarize detailed data and derive new fields to store calculated data • Load: map the data, transform and/or load it into the DWH. The ETL function is performed either by home-grown software or by commercial software
  30. 30. The ETL process: Source Data (Legacy DB, CRM/ERP DB, Finance DB) → ETL Process (Extract, Transform, Load) → Target DWH
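The Extract, Transform, Load steps described on the two ETL slides can be sketched as a tiny home-grown job. This is a minimal illustration, assuming two in-memory SQLite databases stand in for the OLTP source and the target DWH; the table names (`orders`, `customer_sales`) and the functions `run_etl` and `demo` are hypothetical.

```python
import sqlite3


def run_etl(source, target):
    """Move order data from a source connection into a summarized target table."""
    # Extract: read the detail rows from the source (OLTP-style) database.
    rows = source.execute("SELECT customer, amount FROM orders").fetchall()

    # Transform: standardize names to a common format, skip rows with a
    # missing amount, and summarize detail rows into one total per customer.
    totals = {}
    for customer, amount in rows:
        if amount is None:
            continue
        name = customer.strip().upper()
        totals[name] = totals.get(name, 0.0) + float(amount)

    # Load: write the summarized rows into the target warehouse table.
    target.execute(
        "CREATE TABLE IF NOT EXISTS customer_sales "
        "(customer TEXT PRIMARY KEY, total REAL)"
    )
    target.executemany(
        "INSERT OR REPLACE INTO customer_sales VALUES (?, ?)",
        sorted(totals.items()),
    )
    target.commit()
    return len(totals)


def demo():
    """Build a small source table, run the ETL job, and return the target rows."""
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    source.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(" acme ", 10.0), ("Acme", 5.5), ("globex", 7.0), ("globex", None)],
    )
    target = sqlite3.connect(":memory:")
    run_etl(source, target)
    return target.execute(
        "SELECT customer, total FROM customer_sales ORDER BY customer"
    ).fetchall()
```

In `demo()` the two differently formatted "acme" rows end up as one customer total. Transformations like this are what a data tester later has to verify against the mapping document.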
  31. 31. Continuous Integration/ETL solutions - the Marketplace. Leaders in ETL Solutions (Ab Initio)
  32. 32. The Data World Distilled: Big Data • Data Warehouse • ETL/Data Integration • BI & Analytics • Data Quality • Data Testing • Data Governance
  33. 33. Business Intelligence (BI)
  34. 34. Business Intelligence (BI). Business Intelligence: What is it? • Software applications used in spotting, digging out, and analyzing business data • BI provides simple access to data that can be used in day-to-day operations and integrates data into logical business areas • BI provides historical, current and predictive views of business operations • BI is made up of several related activities, including data mining, online analytical processing, querying and reporting. Business Intelligence software is like reporting engines on steroids
  35. 35. BI & Analytics - the marketplace. “The business intelligence (BI) and analytics software market is forecast to grow to $22.8 billion by the end of 2020” “The four large "stack" vendors (SAP, Oracle, IBM and Microsoft) continue to consolidate the market, owning 59 percent of the market share.” - Analyst firm Gartner. Leaders in BI - Analyst firm Forrester Research’s ‘Forrester Wave’
  36. 36. Business Intelligence (BI) - Who uses it? • Wal-Mart uses vast amounts of data and category analysis to dominate the industry. • Amazon and Yahoo follow a "test and learn" approach to business changes. • Hardee’s, Wendy’s, and T.G.I. Friday’s use BI to make strategic decisions.
  37. 37. Business Intelligence (BI) & Data Marts. Data Mart: a database that has the same characteristics as a data warehouse, but is usually smaller and is focused on the data for one division or one workgroup within an enterprise. It typically holds aggregated data and some granular data. It is a subset of the DWH and makes Business Intelligence reporting more efficient. BI tools sit on top of the data marts. Flow: Source Data (Legacy DB, CRM/ERP DB, Finance DB) → ETL Process → Target DW → ETL Process → Data Mart
  38. 38. Business Intelligence (BI) & Analytics. Flow: Source Data (Legacy DB, CRM/ERP DB, Finance DB) → ETL Process → Target DWH → ETL Process → Data Mart
  39. 39. The Data World Distilled: Big Data • Data Warehouse • ETL/Data Integration • BI & Analytics • Data Quality • Data Testing • Data Governance
  40. 40. Data Quality Issues • Data Quality Best Practices boost revenue by 66%. • 46% of companies cite Data Quality as a barrier for adopting Business Intelligence products. • 80% of organizations… will underestimate the costs related to the data acquisition tasks by an average of 50 percent. • The average organization loses $14.2 million annually through poor Data Quality.
  41. 41. Data Quality. Primary Characteristics of Data Quality tools (courtesy of Gartner’s “Magic Quadrant for Data Quality Tools”): o Profiling o Parsing and standardization o Generalized Cleansing o Matching o Monitoring o Enrichment o Subject-area-specific support o Metadata management o Configuration environment
  42. 42. Data Quality - the marketplace. “The market for data quality software tools reached $1.61 billion in 2017 (the most recent year for which Gartner has data), an increase of 11.6% over 2016. Gartner’s interactions with clients also indicate that demand remains high.” - Analyst firm Gartner. Leaders in Data Quality - Analyst firm Gartner’s Magic Quadrant
  43. 43. The Data World Distilled: Big Data • Data Warehouse • ETL/Data Integration • BI & Analytics • Data Quality • Data Testing • Data Governance
  44. 44. Data Quality vs. Data Testing. Primary Characteristics of Data Quality tools (courtesy of Gartner’s “Magic Quadrant for Data Quality Tools”): o Profiling o Parsing and standardization o Generalized Cleansing o Matching o Monitoring o Enrichment o Subject-area-specific support o Metadata management o Configuration environment. Data Verification & Validation? Primary Characteristics of Data Testing tools (courtesy of the book "Testing the Data Warehouse Practicum"): ▪ Data Completeness ▪ Data Transformation ▪ Regression Testing ▪ Reporting. Data Verification & Validation?
  45. 45. Where Data Testing fits in your data strategy
  46. 46. The Executive Office and Critical Data. Business Intelligence & Analytics: CxOs are using Business Intelligence & Analytics to make critical business decisions, with the assumption that the underlying data is fine. “The average organization loses $14.2 million annually through poor Data Quality.” - Gartner. Diagram labels: Data Architecture; Typical data issue areas; ETL; Mainframe
  47. 47. Key Roles in Building & Testing a Data Store • Data Analyst: Creates data requirements (source-to-target map or mapping doc) • Data Architect: Models and builds data store (Big Data lake, Data Warehouse, etc.) • ETL Developer: Transforms and loads data from sources to target data stores • Data Tester: Validates the data, based on mappings, as it moves and transforms from sources to targets
  48. 48. Data Requirements = Mapping Document (a.k.a. Source-to-Target Map). It’s the critical element required to efficiently plan the target Data Stores. It also defines the Extract, Transform, Load (ETL) process. Intention: ✓ capture business rules ✓ data flow mapping and ✓ data movement requirements. Mapping Doc specifies: ▪ Source input definition ▪ Target/output details ▪ Business & data transformation rules ▪ Absolute data quality requirements ▪ Optional data quality requirements.
  49. 49. Most Common Data Validation Method: Sampling • Review Business Rules (i.e. mapping document, data flow mappings) • Write Tests in SQL editor • Execute 2 Tests: 1 at Source & 1 at Target • Export results to 2 Excel files • Compare a Sampling of results by eye (‘Stare & Compare’). Issue with Stare & Compare: Impossible to visually compare billions of data sets. Result: usually less than 1% of data is compared. Example - Current QuerySurge customer • one test = 100 million rows × 200 columns = 20 billion data sets • there is no practical way to manually verify (eyeball) this data set • the client has more than 15,000 total tests
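The alternative to "Stare & Compare" is to let a program compare every row returned by the source query with every row returned by the target query. The sketch below shows one way that could look. It is an illustration only, not QuerySurge's implementation; `data_points` and `compare_result_sets` are hypothetical names, and the first column of each row is assumed to be a unique key.

```python
def data_points(rows, columns):
    """Number of individual values one test has to check."""
    return rows * columns


def compare_result_sets(source_rows, target_rows, key_index=0):
    """Compare every source row with every target row, matched on a key column."""
    source = {row[key_index]: row for row in source_rows}
    target = {row[key_index]: row for row in target_rows}

    # Rows the ETL process dropped, rows it invented, and rows it changed.
    missing = sorted(k for k in source if k not in target)
    unexpected = sorted(k for k in target if k not in source)
    mismatched = sorted(
        k for k in source if k in target and source[k] != target[k]
    )
    return {
        "missing_in_target": missing,
        "unexpected_in_target": unexpected,
        "mismatched": mismatched,
        "passed": not (missing or unexpected or mismatched),
    }
```

`data_points(100_000_000, 200)` reproduces the 20 billion figure from the slide. A comparison like `compare_result_sets` covers 100% of the returned rows instead of the usual sample of less than 1%, and reports a pass/fail result along with the keys that differ.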
  50. 50. Data Store Roles, Tasks, & Timelines. • Data Analyst: Determine Requirements; Create & maintain Mapping Document • Data Architect: Review Mapping Document; Model and build target Data Stores; Maintain Target Data Stores • ETL Developer: Review Mapping Document; Build data movement logic; Extract & load data or extract, transform, & load data • Data Tester: Review Mapping Document; Create 2 SQL tests for each mapping with SQL editor; Execute tests; Dump results of tests to 2 Excel files; Compare Excel files by eye. Tasks iterate along the timeline. Callout: Huge Risk
  51. 51. About QuerySurge
  52. 52. What is QuerySurge? QuerySurge™ is the leading testing solution for automated validation & testing of Big Data. QuerySurge Use Cases
  53. 53. How QuerySurge Works. QuerySurge connects to any 2 points at one time. Source Data and Target Data can be any of: Big Data stores (Hadoop, NoSQL) • Data Warehouses • XML • Web Services • Data Stores (Databases, Data Warehouses, Data Marts) • Flat Files (Fixed Width, Delimited) • Excel • JSON • Business Intelligence Reports. Queries (SQL, HQL) run against the source and the target, followed by a comparison of every data set. Results: pass/fail. Output: Data Intelligence Reports, Data Health Dashboard, automated email reports
  54. 54. Big Data Process - Developer & Tester. ETL Developer: Codes data movement based on Mapping Requirements. Data Tester: Tests data movement based on Mapping Requirements. Flow: Source Data → Big Data lake → ETL → Data Warehouse → ETL → Data Mart → BI & Analytics (BI Analyst extracts data for reports). Testing Point #1, Testing Point #2, Testing Point #3, Testing Point #4 (Tester tests BI Reports)
  55. 55. QuerySurge Supports 50+ Data Stores. QuerySurge supports the following data stores… • Amazon Redshift, Elastic Map Reduce, DynamoDB • Apache Hadoop/Hive, Spark • Cassandra • Cloudera • Couchbase • Exasol • Flat Files (delimited, fixed-width) • Hortonworks • IBM (Db2, Netezza, Informix, Big Insights, Cloudant, MDM, Cognos) • JSON files • Mainframe • MAPR • Micro Focus Vertica • Microsoft (SQL Server DWH, HDInsight, PDW, SSAS, Excel, Access, SharePoint) • MongoDB • Oracle (Oracle DB, MySQL, Exadata, NoSQL, Hadoop) • Pivotal GreenPlum • PostgreSQL • Salesforce • SAP (HANA, IQ, ASE, SQL Anywhere, Altiscale Data Cloud) • Snowflake • Tableau • Teradata, Aster • Workday • XML …and any other data store
  56. 56. The Data World Distilled: Big Data • Data Warehouse • ETL/Data Integration • BI & Analytics • Data Quality • Data Testing • Data Governance
  57. 57. 4 main components of successful data governance. 1) Data stewardship: identifying and assigning roles and responsibilities: who is creating the data, who has overall responsibility for the data, who uses the data, who routes it, who oversees its use. 2) Data classification: identify and categorize data types into groups. 3) Data quality: the process of measuring the reliability of current data sets to provide information that can be used to make organizational decisions. 4) Data management: the process where all the organization's data governance efforts come together. The company actively manages its data governance efforts, which involves creating the architectures and business processes required to properly maintain the organization’s data through its full lifecycle.
  58. 58. Data Maturity Model - Process (source: IBM Data Governance Council Maturity Model) • Patterned after the Capability Maturity Model Integration (CMMI) from the Software Engineering Institute (SEI) at Carnegie Mellon University • Devised by IBM, along with 55 other companies. Characteristics shown across the maturity levels: • Few stable processes exist • “Just do it” mentality • Data-related policies become more clear & reflect the organization’s data principles • Data integration opportunities are better leveraged • Risk assessment for data integrity & quality becomes part of the organization’s project methodology • Further defined value of data for more data elements • Data Governance methodology is introduced during the planning stages of new projects • Enterprise data models are documented & published • Data Governance is second nature • ROI for data-related projects is tracked • Business value of data mgmt is recognized • Cost of data mgmt is easier to manage • Costs are reduced as processes become automated • More data-related controls are documented • Metadata becomes an important part of documenting critical data elements
  59. 59. Data Governance - the marketplace. “Rapidly increasing growth in data volumes, rising regulatory & compliance mandates, and enhancing strategic risk management & decision-making are expected to drive the growth of the data governance market. The data governance market size is expected to grow from $1.31 Billion in 2018 to $3.53 Billion by 2023, at a CAGR of 22.0%.” - MarketsAndMarkets.com. Leaders in Data Governance - The Forrester Wave
  60. 60. The Data World Distilled
  61. 61. The Data World by Top Vendors. Flow: Source Data → Big Data lake → ETL → Data Warehouse → ETL → Data Mart → BI & Analytics. Source types: • Flat files • Excel • JSON • XML • Web services • Databases. ETL Vendors: • Ab Initio • IBM • Informatica • Microsoft • Oracle • SAP • SAS • Talend. Hadoop Vendors: • Amazon • Cloudera • Hortonworks • IBM • MapR • Microsoft. NoSQL Vendors: • Amazon • Apache • Cassandra • Couchbase • MongoDB • Oracle. Data Warehouse Vendors: • Amazon • IBM • Microsoft • Micro Focus • Oracle • SAP • Snowflake • Teradata. BI Vendors: • IBM • Microsoft • MicroStrategy • Qlik • Tableau • SAP • Oracle. Data Quality: • Informatica • IBM • Oracle • SAP • SAS • Talend. Data Testing: • QuerySurge • Informatica • Tricentis • Data Gaps • IceDQ • Bitwise. Data Governance: • Collibra • DATUM • GDE • IBM • Informatica • SAP
  62. 62. The Data World Distilled: Understanding how the data world works in the Big Data era. Any questions? Bill Hayduk, Founder, CEO, QuerySurge™
