BIG DATA AND THE DATA QUALITY IMPERATIVE
ED WRAZEN
VP PRODUCT MANAGEMENT, BIG DATA
EMERGENCE OF THE “NEW” ENTERPRISE DATA HUB
[Diagram: The expanded Data Hub]
Data Sources: Applications, Data Warehouse, Data Marts, Databases, RDBMS, Files, Reference Data
Other diagram labels: Enterprise Applications, Business Intelligence, Custom Analytics, Enterprise Hub, New Sources, Monitor & Manage, Data Ingestion
+ Volume + Velocity + Variety
CHALLENGES WITH ENTERPRISE DATA
• Multiple silos of information
• Collating information is resource intensive
• Analysis of data is difficult and intensive
• Inconsistent, inaccurate, incomplete data
• Difficult to reconcile
• Manual overhead
• No single version of the truth!
BIG DATA USE CASES
Profiled database (RDBMS such as MySQL)
Single Customer View
• Cleanse, validate and match disparate customer data points to improve customer experience, customer insights, and targeted marketing
Analytics
• Ensure accuracy for downstream analytics initiatives for marketing, fraud detection, risk mitigation, etc.
Data Lake
• Data often isn’t cleansed as it enters the organization or data lake, resulting in larger-scale data quality issues
Lower-cost storage, processing
• Organizations seek low-cost, high-performance ways to store, process, analyze, and manage larger volumes of data at faster speeds
BIG DATA CHALLENGES
Common Big Data Roadblocks
• Limited in-house expertise
• Maturity of emerging technology
• Alignment to business objectives
• Complexity of unstructured data
• Lack of trust and assurance in data
• Inability to manage velocity of data expansion
• Number of internal and external sources of data
DATA QUALITY AND SINGLE CUSTOMER VIEWS
Integrating data from multiple data sources presents differences in completeness, consistency and quality
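To make "differences in completeness" concrete, the following is a minimal sketch that measures, per field, the share of records holding a non-empty value in two sources. The source names (`crm`, `billing`), the field names and the sample records are all hypothetical and exist only for illustration; they do not come from the presentation.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def completeness(records: List[Record], fields: List[str]) -> Dict[str, float]:
    """Return, for each field, the fraction of records with a non-empty value."""
    total = len(records)
    result: Dict[str, float] = {}
    for field in fields:
        # Treat None, "" and whitespace-only strings as missing.
        filled = sum(1 for r in records if (r.get(field) or "").strip())
        result[field] = filled / total if total else 0.0
    return result


# Hypothetical sample data from two systems describing the same customers.
crm = [
    {"name": "Ann Lee", "email": "ann@example.com", "phone": None},
    {"name": "Bob Roy", "email": "", "phone": "555-0101"},
]
billing = [
    {"name": "LEE, ANN", "email": "ann@example.com", "phone": "555-0100"},
    {"name": "ROY, BOB", "email": "bob@example.com", "phone": "555-0101"},
]

fields = ["name", "email", "phone"]
crm_score = completeness(crm, fields)
billing_score = completeness(billing, fields)
print("crm:", crm_score)
print("billing:", billing_score)
```

In this sample the two sources also format names differently ("Ann Lee" versus "LEE, ANN"), which is the kind of consistency difference that a completeness score alone would not reveal.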
IMPACT OF POOR DATA QUALITY ON ANALYTICS
Can I trust this data enough to make my critical decisions?
How accurate are these numbers?
Are these terms consistent with our business definitions?
How current is this data? When was it last updated?
COMPLEXITY OF UNSTRUCTURED DATA
Revd new transfer claim ondiary. inj party
still OOW and treating. Atty repped.called
atty for status. Been treating for over 4
months now, sft tissue neck and back sprain.
Clmnt complaining of numbness and tingling
in fingers. Clmnt is now being scheduled for
MRI and CT scan. RX has been written for
oxycotin for pain. Atty will send all updated
meds and records he has in his file.
Severity Indicator?
Medication?
Employment Status?
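As a rough illustration of why pulling these attributes out of the note is hard, here is a minimal rule-based sketch. The rule table, the attribute names and the reading of "OOW" as "out of work" are assumptions made for this example, not part of the presentation or of any product.

```python
import re
from typing import Dict, List

# The claim note from the slide, kept verbatim (including its misspellings).
NOTE = (
    "Revd new transfer claim ondiary. inj party still OOW and treating. "
    "Atty repped.called atty for status. Been treating for over 4 months now, "
    "sft tissue neck and back sprain. Clmnt complaining of numbness and tingling "
    "in fingers. Clmnt is now being scheduled for MRI and CT scan. "
    "RX has been written for oxycotin for pain."
)

# Hypothetical rules: regex pattern -> label, grouped by target attribute.
RULES: Dict[str, Dict[str, str]] = {
    "employment_status": {r"\bOOW\b": "out of work"},
    "severity_indicator": {
        r"\bMRI\b": "MRI mentioned",
        r"\bCT scan\b": "CT scan mentioned",
        r"\bnumbness\b": "numbness",
    },
    "medication": {r"\boxycontin\b": "Oxycontin"},
}


def extract(note: str, rules: Dict[str, Dict[str, str]]) -> Dict[str, List[str]]:
    """Return the labels of every rule whose pattern occurs in the note."""
    found: Dict[str, List[str]] = {}
    for attribute, patterns in rules.items():
        hits = [
            label
            for pattern, label in patterns.items()
            if re.search(pattern, note, flags=re.IGNORECASE)
        ]
        if hits:
            found[attribute] = hits
    return found


attributes = extract(NOTE, RULES)
print(attributes)
```

Exact rules like these pick up the employment status and the severity indicators, but they miss the medication because the note spells it "oxycotin".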
INSIGHT AND CONTEXT FROM UNSTRUCTURED DATA IS POSSIBLE, BUT DIFFICULT
Oxycotin = Oxycontin = Medication
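One way to picture the mapping above is approximate string matching against a reference vocabulary. The sketch below uses Python's standard `difflib.get_close_matches`; the `LEXICON` contents, the `classify` helper and the 0.8 cutoff are hypothetical choices for illustration, and a production system would likely rely on a curated reference data set rather than a three-entry dictionary.

```python
from difflib import get_close_matches
from typing import Dict, Optional, Tuple

# Hypothetical reference vocabulary: standardized term -> category.
LEXICON: Dict[str, str] = {
    "oxycontin": "Medication",
    "ibuprofen": "Medication",
    "physiotherapy": "Treatment",
}


def classify(token: str, cutoff: float = 0.8) -> Optional[Tuple[str, str]]:
    """Map a possibly misspelled token to (standard term, category), or None."""
    matches = get_close_matches(token.lower(), list(LEXICON), n=1, cutoff=cutoff)
    if not matches:
        return None
    term = matches[0]
    return term, LEXICON[term]


result = classify("oxycotin")
print(result)
```

Lowering the cutoff catches more misspellings but also raises the risk of false matches, so the threshold is a tuning decision rather than a fixed value.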
BIG DATA QUALITY CHALLENGES PERSIST
“I spend the vast majority of my time cleaning data systems … cleaning and preparing data sets makes everything I do better … it’s the highest value activity I do”
Josh Wills, Senior Director of Data Science, Cloudera
(From “Training a new generation of Data Scientists” – Cloudera video)
SHIFT IN FOCUS
Profiled database (RDBMS such as MySQL)
Big Data adopters are moving beyond the hype and focusing on traditional challenges and business goals
Top 3 Challenges
• Finding value
• Risk and governance (security, privacy, data quality)
• Integrating multiple data sources
Top 3 Priorities
• Enhanced customer experience
• Process efficiency
• More targeted marketing
Source: Gartner
ABOUT TRILLIUM
Trillium is a global provider and innovator of data quality solutions
• A business unit of Harte Hanks (NYSE: HHS)
• Over two decades in business with a specific focus on data quality
• Data quality solutions for Big Data, CRM, MDM, ERP, Single Customer Views, Data Integration, Data Governance, Risk & Compliance, Fraud, Marketing
Analyst Ratings
Gartner
• 2014 Magic Quadrant: Leader
Forrester
• Forrester Wave 2013: Leader
Bloor Research
• Market Leader
Client Examples
TRILLIUM BIG DATA
• Graphically build DQ workflows
• Reuse existing processes
• Deploy natively in Hadoop
• Leverage Hadoop processing architecture
[Diagram labels: Trillium Server, Interface, Hadoop, HDFS]
17 New England Executive Park, Suite 300 | Burlington, MA 01803 | 1-978-436-8900 | www.trilliumsoftware.com
[Diagram steps: Parse, Standardize, Match, Commonize]
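The four step names on this slide can be pictured as a small record-processing pipeline. The sketch below is not Trillium's implementation or API; the function bodies, the `name;city;phone` input layout, the exact-name matching key and the interpretation of "commonize" as filling each field from the first non-empty value in a matched group are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List

Record = Dict[str, str]
FIELDS = ("name", "city", "phone")


def parse(raw: str) -> Record:
    """Split a hypothetical 'name;city;phone' line into named fields."""
    name, city, phone = (part.strip() for part in raw.split(";"))
    return {"name": name, "city": city, "phone": phone}


def standardize(rec: Record) -> Record:
    """Normalize case and spacing; keep only the digits of the phone number."""
    return {
        "name": " ".join(rec["name"].lower().split()),
        "city": rec["city"].title(),
        "phone": "".join(ch for ch in rec["phone"] if ch.isdigit()),
    }


def match(records: List[Record]) -> List[List[Record]]:
    """Group records that share the same standardized name (exact-key match)."""
    groups: Dict[str, List[Record]] = defaultdict(list)
    for rec in records:
        groups[rec["name"]].append(rec)
    return list(groups.values())


def commonize(group: List[Record]) -> Record:
    """Build one record per group from the first non-empty value of each field."""
    return {f: next((r[f] for r in group if r[f]), "") for f in FIELDS}


# Hypothetical input: the first two lines describe the same customer.
raw_lines = [
    "ann lee; Boston; ",
    "Ann  LEE; boston; (617) 555-0100",
    "Bob Roy; burlington; 978.555.0101",
]
standardized = [standardize(parse(line)) for line in raw_lines]
customers = [commonize(group) for group in match(standardized)]
print(customers)
```

Real matching usually needs fuzzy comparison across several fields rather than an exact key, so the `match` step here is the most simplified part of the sketch.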
BENEFITS OF BIG DATA QUALITY
Understand the impact of data quality and reduce downstream risk
• Profile, analyze and measure the quality of multi-domain data
• Create a data quality blueprint and plan for data cleansing
Build the best view of your global customer data
• Cleanse and enrich customer data and create single customer views
• Improve business processes, detect fraud, create personalized customer experiences, and deploy targeted marketing campaigns
Maximize the value of your Big Data investments
• Power downstream machine learning initiatives and analytics platforms with reliable, fit-for-purpose data that supports timely, accurate business decisions
CONTACT INFORMATION
email: ed.wrazen@trilliumsoftware.com
Tel: +44 118 940 7634
web: www.trilliumsoftware.com
email: info@intodq.com
Tel: 0297 254 390
web: www.intodq.com

Big Data Expo 2015 - Trillium software Big Data and the Data Quality
