WHAT’S NEW - TRILLIUM DATA QUALITY
Harald Smith
February 2018
Speaker
Harald Smith
▪ Director of Product Management, Syncsort
▪ 20 years in Information Management focused on data quality,
integration, and governance
▪ Consulting, product management, software & solution development
▪ Co-author of Patterns of Information Management, as well as
two Redbooks on Information Governance and Data Integration
▪ Current Blog on InfoWorld: “Data Democratized”
Agenda
Syncsort Confidential and Proprietary - do not copy or distribute
▪ Big Data Quality
▪ Data Governance
▪ Operational Data Quality
▪ Production Processing & Application Integration
Data is Top of Mind
Volume and Complexity Is Growing
Compliance Demands are Broader and
Deeper
Trust and Confidence in Data Is Decreasing
“Only 3% of the DQ scores in our study can be rated ‘acceptable’ using the loosest-possible standard.”
-- Harvard Business Review, September 2017
Insights from Syncsort’s 2017 Big Data Trends survey
▪ Data Quality is
recognized as a mission-
critical success factor for
the Data Lake
▪ Data Quality tops the list of
challenges of data lake
implementation, followed
closely by Data Governance
▪ But… not everyone is
making the connection
between Data Quality
and Big Data success
▪ Participants who did not
include data quality as a top
3 priority for implementing
the data lake expressed the
most interest in analytically-
intensive data lake uses…
which are highly dependent
on proper data quality
▪ Financial services and
insurance industries are the
most focused on Data
Quality and Data
Governance
▪ Named Data Quality as top
priority 50% more often than
participants from other
industries
▪ Also identified Data Governance
as a top priority at more than
twice the rate of those from
other industries
Redefine the value of quality data
▪ Enable business leaders and users to seamlessly anticipate opportunities, uncover hidden risks
and make better decisions by rapidly providing complete, accurate and trusted data in
everything they do
▪ Enable governance of critical data elements through integration of data quality in the right place
at the right time
▪ Enable enrichment, validation, & verification of data central to the Customer 360 view,
including Big Data environments (e.g. Hadoop and Spark)
▪ Simplify User Experience by focusing on core use cases and patterns
▪ Provide consistent processing and results on premise and in the cloud
Trillium Software Product Portfolio
Trillium Software System
On Premise or via Trillium Cloud
Deploy any or all products to the cloud
Completely managed SaaS in AWS or Azure deployed in 30 days or less
Trillium Discovery 15.7
Automated data profiling and discovery tool that identifies data
quality issues, facilitates business rule management, and provides
data quality metrics
Trillium Quality 15.7
Data quality engine that provides data cleansing, matching, and
enrichment for multi-domain, global data (including global address
validation)
Trillium Precise 1.0
Integrates data enrichment for all Trillium
products for key data elements including
phone, email, IP address, and person
Trillium Solutions
CRM, ERP, MDM
Customized solutions for leading platforms:
• Trillium Quality for Dynamics CRM 2.3
• Trillium Quality for SAP
Discovery Center/Administration Center
Browser-based UIs for specific users
Trillium Quality for Big Data 15.7
Enables data quality processing including
cleansing, matching, and global data
enrichment on Big Data platforms
Trillium Director/TSI - Real-time integration
Enables real-time, secure data quality within any application via web
services or APIs
Trillium Software Functional Overview
Data Profiling
Trillium Quality
Trillium Discovery
Business Rules &
Data Quality
Assessment
Data Validation,
Standardization,
Linking & more
Data
Verification &
Enrichment
• CRM
• Customer
360
Operational Integrations
Data
Governance
Analytics &
Reporting
Trillium Discovery – Profiling and Monitoring
▪ Measure and mitigate risk and cost
associated with poor data quality
▪ Profile data sources to understand
current conditions and quality issues
▪ Report on data quality metrics for
accuracy, consistency and completeness
▪ Create and validate business rules
▪ Monitor data quality thresholds and
trends over time
▪ Quantify, annotate, and prioritize data
quality issues
▪ Generate recode and lookup tables, and
prepare remediated files
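As an illustration of the completeness metric and threshold monitoring listed above, here is a minimal sketch. It is not Trillium Discovery's implementation; the function names, the sample column, and the 98% threshold are assumptions chosen for the example.

```python
from typing import Optional, Sequence


def completeness(values: Sequence[Optional[str]]) -> float:
    """Percentage of values that are neither None nor blank."""
    if not values:
        return 0.0
    populated = sum(1 for v in values if v is not None and v.strip() != "")
    return 100.0 * populated / len(values)


def meets_threshold(score: float, threshold: float) -> bool:
    """A metric passes when its score is at or above the threshold."""
    return score >= threshold


# Hypothetical sample column: two of five postal codes are missing.
postal_codes = ["02139", "", None, "10001", "94105"]
score = completeness(postal_codes)
print(f"completeness = {score:.1f}%")
print("ok" if meets_threshold(score, 98.0) else "below threshold")
```

Tracking the same score on each run is one simple way to see the trend over time that the slide refers to.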
Trillium Quality – Cleansing, Verification, Enrichment, and Matching
▪ Develop workflows to transform, parse,
standardize, match, and survive the best record
▪ Consolidate data sources on input
▪ Match on party, household, business or any
custom identifiers
▪ De-duplicate and unify data sources to create a
single golden record
▪ Global address validation with individual
country postal rules to clean, correct and
complete name and address data
▪ Enrich missing postal information,
latitude/longitude and other reference data
▪ Deploy in batch, real-time, Hadoop or in
multiple applications
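The match, de-duplicate, and survive steps above can be pictured with a small sketch. This is a deliberately naive illustration, not Trillium Quality's matching engine: the exact-match key (normalized name plus postal code) and the "first non-empty value wins" survivorship rule are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, List

Record = Dict[str, str]


def match_key(rec: Record) -> str:
    # Hypothetical match key: normalized name plus postal code.
    name = rec.get("name", "").strip().lower()
    postal = rec.get("postal", "").strip()
    return f"{name}|{postal}"


def golden_record(group: List[Record]) -> Record:
    # Survivorship: visit the most complete records first and keep the
    # first non-empty value seen for each field.
    ranked = sorted(group, key=lambda r: sum(1 for v in r.values() if v), reverse=True)
    merged: Record = {}
    for rec in ranked:
        for field, value in rec.items():
            if value and not merged.get(field):
                merged[field] = value
    return merged


def deduplicate(records: List[Record]) -> List[Record]:
    # Group records that share a match key, then survive one record per group.
    groups: Dict[str, List[Record]] = defaultdict(list)
    for rec in records:
        groups[match_key(rec)].append(rec)
    return [golden_record(g) for g in groups.values()]


records = [
    {"name": "Ann Lee", "postal": "02139", "phone": "", "email": "ann@example.com"},
    {"name": "ann lee ", "postal": "02139", "phone": "555-0100", "email": ""},
    {"name": "Bob Roy", "postal": "10001", "phone": "", "email": ""},
]
result = deduplicate(records)
for rec in result:
    print(rec)
```

Production matching typically relies on parsing, standardization, and fuzzy comparison rather than an exact key; the sketch only shows the shape of the group-then-survive pattern.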
[Diagram: Trillium Director/TSI (real-time DQ application server). Client requests from customer and custom applications (C, Java, Web Service, XML over HTTPS) are served by a core engine that applies Cleanse and Match rules.]
Trillium Director – Real-time Data Quality
▪ Deploy batch or real-time
Trillium Quality services across
multiple platforms, servers,
and applications through one
interface
▪ Integrate into multiple
workflows to provide data
quality services to applications
throughout the enterprise
▪ REST, SOAP web services
▪ Monitor the availability of
each Director and facilitate
fail-over if a Director becomes
unavailable
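The availability monitoring and fail-over bullet above follows a common client-side pattern, sketched below. The endpoint names and the health lookup are hypothetical stand-ins; the actual Director fail-over mechanism is not described in this deck.

```python
from typing import Callable, Optional, Sequence


def first_available(endpoints: Sequence[str], is_up: Callable[[str], bool]) -> Optional[str]:
    """Return the first endpoint whose health check succeeds, else None."""
    for endpoint in endpoints:
        if is_up(endpoint):
            return endpoint
    return None


# Simulated health status; a real check would probe each server.
health = {"director-a:9000": False, "director-b:9000": True}
endpoints = ["director-a:9000", "director-b:9000"]

chosen = first_available(endpoints, lambda ep: health.get(ep, False))
print(chosen)
```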
TRILLIUM QUALITY FOR BIG DATA
New Release!
Trillium Quality for Big Data
Benefits
Data Lake is the source of
TRUSTED data for analytics
Robust data quality
processing at Big Data scale
to meet SLAs, support use
cases like Customer 360
No coding or tuning saves
time and resources – and
helps address Big Data skills
shortages
Save time and network
resources by keeping data in
place in the data lake
Key Challenges
Big Data projects require:
▪ Massive scalability
▪ Low latency
▪ Many data sources for a
complete view
Data quality processing using a
standalone server is no longer
adequate to keep up:
▪ Millions of transactions per
day are now very common
▪ Critical for data quality
processing to meet end
user SLAs and/or key
success factors
Solution
Trillium Quality for Big Data
executes data quality jobs
natively within Big Data
frameworks (Hadoop
MapReduce, Apache Spark)
Leverages the DMX-h
execution framework
(Intelligent Execution)
No need to move/copy huge
volumes of data for quality
processing; Big Data remains
in place
No coding or tuning; jobs are
automatically optimized
New: Trillium Quality for Big Data – Key Features
▪ 1st release of Trillium Quality for Big Data leveraging the Syncsort DMX-h execution
framework
▪ What’s Included:
▪ Support for Hortonworks Data Platform (HDP) and Cloudera
▪ Dynamically leverages MapReduce and Apache Spark (1.0, 2.0)
▪ Standard OOTB support for cleansing, address verification, and matching, including multi-match
implementations
▪ Project deployment to Big Data from the Trillium Control Center, including software and postal
directories, for global project support
▪ UUID functionality supported and automated from the Trillium Control Center
▪ Comprehensive documentation set - including install and developer guides - is available with the
release
Trillium Quality for Big Data – Functional Architecture
[Diagram: Development Environment (Trillium Control Center, Data Quality Processing), "Deploy to Hadoop", Runtime Environment (Trillium Quality on a Hadoop cluster, Data Quality Processing on nodes 1 to n). Inputs: flat files and reference (REF) data.]
Develop Once – Deploy Anywhere
▪ Reuse existing Trillium Data Quality projects
▪ Reuse existing skills and experience in Trillium Software
▪ Harness Trillium Software functions and preconfigured workflows and rules
▪ Maintain parsing, standardization, matching, postal enrichment rule sets
▪ Easy-to-deploy via automation
▪ Leverages DMX-h Intelligent Execution to determine optimal execution: Hadoop or Spark
(No recoding required!)
Trillium Quality for Big Data
Focus on Data Quality, not the Big Data platform
▪ Use existing Data Quality skills and expertise
▪ No need to worry about mappers, reducers, big side or small side of joins, etc.
▪ Automatic optimization for best performance, load balancing, etc.
▪ No changes or tuning required, even if you change execution frameworks
▪ Future-proof job designs for emerging compute frameworks, e.g. Spark 2.x
▪ Run multiple execution frameworks in a single job
Single GUI – Execute Anywhere!
Intelligent Execution - Insulate your organization from underlying complexities of Hadoop
TRILLIUM SOFTWARE SYSTEM
Expanded Governance Integrations!
Trillium Integrations for Data Governance
Benefits
End-to-End Data Governance:
From defining policies/rules for
data quality…
To technical implementation of
data testing and metrics to
ensure rules are complied with in
data management
Data quality metrics results not
meeting thresholds can alert data
steward(s) to take corrective
action or provide remediated
data
Automated integration saves
time and resources – and helps
ensure trust in data
Key Challenges
Organizations invest in data
governance solutions to support
compliance, ensure data is
actionable by the business, etc.
▪ Many of these solutions
define and manage data
quality rules, but don’t
provide the processing
through which the rules are
executed on the data, or the
quality is measured
Solution
Trillium Discovery provides REST
API integration with solutions
such as Collibra DGC and ASG
Enterprise Data Intelligence or BI
tools (e.g. Qlik, Tableau)
Automated delivery of
policy-based rules to Trillium
Discovery
Automated delivery of rule
or profiling results to data
governance solution
APIs to support custom
integrations
TSS v15.7 REST APIs and Governance Integrations
▪ Data Quality is a core component of a Data
Governance process, addressing business compliance,
risk, and data management requirements and
standards.
▪ What’s included:
▪ Published Trillium Discovery APIs with full
documentation
▪ All features in the Discovery Center UI are
available via the API: data source metadata, profiling
results, data quality rules & rule results, join and
dependency analysis
▪ Standard GET/POST functions using JSON
▪ Filter & sort rows, select columns of interest, and page
result sets
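To make the filter, sort, column selection, and paging capabilities concrete, here is a sketch of how a client might compose a GET request URL. The host, resource path, and every query parameter name (columns, sort, page, pageSize, filter.*) are hypothetical; consult the published Trillium Discovery API documentation for the real names.

```python
from typing import Dict, Sequence
from urllib.parse import parse_qs, urlencode, urlsplit


def build_results_url(
    base_url: str,
    resource: str,
    columns: Sequence[str],
    sort_by: str,
    filters: Dict[str, str],
    page: int,
    page_size: int,
) -> str:
    """Compose a GET URL; all parameter names here are illustrative only."""
    params = {
        "columns": ",".join(columns),  # select columns of interest
        "sort": sort_by,               # sort rows
        "page": str(page),             # page through the result set
        "pageSize": str(page_size),
    }
    for field, value in filters.items():
        params[f"filter.{field}"] = value  # filter rows
    return f"{base_url.rstrip('/')}/{resource}?{urlencode(params)}"


url = build_results_url(
    "https://discovery.example.com/api",  # hypothetical host
    "rule-results",                       # hypothetical resource
    ["ruleName", "passing", "failing"],
    "ruleName",
    {"entity": "customers"},
    page=2,
    page_size=50,
)
query = parse_qs(urlsplit(url).query)
print(url)
print(query["columns"])
```

The response to such a request would be JSON, which is what lets BI tools and governance platforms consume the results without a custom parser.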
Trillium Software System
Governance Integration – Functional Architecture
TS Discovery 15.7
Automated data profiling and discovery
that identifies data quality issues,
facilitates business rule management, and
provides data quality metrics
Trillium Clients
Control Center (rich)
Discovery Center (browser)
Administration Center (browser)
Trillium Repository
Export data grid rows to file
Data extracts (.csv)
ODBC Reporting Adapter (SQL)
REST APIs
Data interchange (.json, GET/POST)
Data extracts (.csv)
Import/export rule sets
Rule set packages (.xml)
Excel
Queries
Application integration
BI & reporting tools
Data Governance integration with Collibra
▪ Trillium Discovery and Collibra DGC:
▪ Bi-directional linkage of Collibra data
quality rules with Trillium Discovery
business rules
▪ Packaged workflow can run OOTB
▪ Develop and modify in Collibra and transfer
those rules to Discovery to apply
appropriate syntax and connect to data
sources
▪ Collibra data quality rules become available
in Trillium Discovery
▪ Automatically deliver results from
associated data sources to Collibra as Data
Quality Metrics
Use Case: Data Governance Policy Management
Trillium Discovery
Converts DGC rules into technically
executable data quality rules
Continuously runs data quality metrics on a
near-real-time basis
Closing the loop with DGC…
If Data Quality metrics fall below defined
thresholds, Collibra users are alerted via
their dashboards
Data stewards can review in Trillium
Discovery and take corrective actions
Bi-directional
connectivity to
constantly sync:
▪ Collibra
rulebooks and
Discovery Center
rules
▪ Results of
Discovery quality
tests
Collibra DGC
Lets non-technical users define
business policies and data rules in
plain language
Trillium Discovery-Collibra Integration – Functional Architecture
Components: Collibra DGC (DQ Rule, DQ Metric), Collibra Connect, Trillium Discovery
1. Create the ‘semantic’ Data Quality Rule in
the Rulebook
2. Optionally edit the Predicate (expression)
and Threshold
3. Edit the DQ Rule Expression and associate
to 1 or more data sources
4. Run the ‘technical’ Data Quality Rule and
generate Passing/Failing counts
5. Review results for all associated Data
Quality Metrics
• Ongoing bi-directional polling or scheduling
• Linked on Collibra domain & object
identifiers
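Steps 4 and 5 above (run the technical rule, generate passing/failing counts, report a metric) can be sketched as follows. The DQRule class, its field names, and the sample rule are hypothetical; the real integration exchanges these values through Collibra Connect rather than through Python objects.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable

Row = Dict[str, str]


@dataclass
class DQRule:
    rule_id: str                      # stand-in for the governance-side identifier
    description: str                  # the 'semantic' rule in plain language
    predicate: Callable[[Row], bool]  # the 'technical' rule expression
    threshold: float                  # minimum passing percentage


def run_rule(rule: DQRule, rows: Iterable[Row]) -> Dict[str, object]:
    """Apply the rule to every row and summarize passing/failing counts."""
    passing = failing = 0
    for row in rows:
        if rule.predicate(row):
            passing += 1
        else:
            failing += 1
    total = passing + failing
    pass_pct = 100.0 * passing / total if total else 0.0
    return {
        "rule_id": rule.rule_id,
        "passing": passing,
        "failing": failing,
        "pass_pct": pass_pct,
        "meets_threshold": pass_pct >= rule.threshold,
    }


rule = DQRule(
    rule_id="rule-001",
    description="Every customer must have an email address",
    predicate=lambda row: "@" in row.get("email", ""),
    threshold=98.0,
)
rows = [
    {"email": "a@example.com"},
    {"email": "b@example.com"},
    {"email": ""},
    {"email": "c@example.com"},
]
metric = run_rule(rule, rows)
print(metric)
```

Keying the result on the rule identifier mirrors the slide's point that rules and metrics are linked on shared identifiers across both systems.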
Data Governance integration with ASG
▪ Trillium Discovery and ASG Enterprise
Data Intelligence:
▪ Uni-directional delivery of Trillium
Discovery profiling results
▪ ASG scanner runs OOTB
▪ End-to-End data transparency using
data lineage/data relationship
graphs (ASG) with data quality
metrics (Trillium Discovery) through
each transformation
▪ Visual evidence that data processing
is in fact taking place where & when
intended, with no unexpected
results/impairment to data quality
levels through a process flow
Use Case: Data Governance Compliance Tracing
Validity
Sun 05/01/2016 12:00:00 PM MDT
Threshold: 100
Pass: 96
Dimensions: Accuracy
Completeness
Sun 05/01/2016 12:00:00 PM MDT
Threshold: 98
Pass: 96
Dimensions: Completeness
ASG Enterprise Data Intelligence
Lets users trace data quality issues
with critical data elements through the
data lineage graph to find where &
when the issue appeared
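The tracing idea above, following a metric through the lineage to find where it first drops below its threshold, can be sketched as a simple walk over ordered stages. The stage names and intermediate scores are invented for illustration; only the 98 threshold and the 96 pass value echo the Completeness example on this slide.

```python
from typing import List, Optional, Tuple


def first_failing_stage(stages: List[Tuple[str, float]], threshold: float) -> Optional[str]:
    """Return the first lineage stage whose pass score is below the threshold."""
    for name, pass_pct in stages:
        if pass_pct < threshold:
            return name
    return None


# Hypothetical lineage, ordered from source to target, with a
# completeness score measured after each transformation.
lineage = [
    ("source extract", 99.0),
    ("standardize", 98.5),
    ("join to reference", 96.0),
    ("load to warehouse", 96.0),
]
culprit = first_failing_stage(lineage, threshold=98.0)
print(culprit)
```

A real lineage is a graph rather than a list, so a fuller version would walk upstream from the failing element along each incoming edge.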
TRILLIUM QUALITY FOR DYNAMICS CRM
New Features!
Trillium Quality for Dynamics CRM
Benefits
Rapid time to value – can be
operational within a day
High-quality data is constantly
maintained exactly where it is
needed the most - directly
within MS Dynamics CRM
Key Challenges
Organizations invest in CRM to
drive insight for sales,
customer support in order to
provide better service and
increase revenue.
▪ Duplicates, old data,
incomplete data,
misspellings, data in wrong
fields, ...
▪ Poor data quality is a
primary reason for 40% of
all business initiatives
failing to achieve their
target*
Solution
Trillium Quality for Dynamics CRM
Real-time cleansing &
matching right in MS
Dynamics CRM
Comprehensive batch
cleansing of the entire CRM
database (including
resolution from prior data
migrations)
Leads Analysis tool for insight
into the value of prospect lists
before loading into CRM
*Source: Gartner Research
New v15.7 MS Dynamics CRM 2.3 – New Key Features
▪ Seamless Data Quality integration directly in the Dynamics CRM environment, enabling
evaluation, cleansing, de-duplication, and merging of global customer and lead
records both on premise and in the Cloud.
▪ What’s included:
▪ Enable Cross-Entity Matching for Leads
▪ Match Leads to Contacts for Newly Entered Leads
▪ Match Link Table for Leads shows Matches to both Leads and Contacts
▪ Enable Merge Between Leads and Contacts
▪ Enable Editing of Web Resource XML (Deployment Manager)
▪ New Leads Analysis tool – Examine leads and directly import only qualified, new records into CRM
▪ End User Master Batch Availability for cleansing full CRM database
▪ Microsoft Dynamics 365 support
▪ Deploy Global Data Quality solution, including email/phone verification via Trillium Precise
Trillium for MS Dynamics CRM v2.3 – Online Process
Integration directly in Dynamics CRM screens
▪ Enter Contacts or Leads as normal
▪ Popup validation of Address (optionally email & phone)
▪ Match to existing Contacts and Leads (cross-entity) with
cross-population of validated fields
Trillium for MS Dynamics CRM v2.3 – Batch Process
Integration into Batch Upload
▪ Use the Trillium web interface to analyze a lead list against the CRM instance to determine new or existing leads
▪ Establish valid patterns for matches
▪ Confirm the data prior to import
TRILLIUM SOFTWARE SYSTEM
Postal Maintenance & Real-time Integrations
Trillium Quality Postal Download Web Service
▪ A web-based application to download, activate, and update all postal and geocoding directories.
TSS administrators can manage licensing, view status, and perform directory maintenance
as scheduled or required (for example, monthly or quarterly), so that processing runs
against the freshest possible data with minimal downtime.
▪ What’s included:
▪ Support for ASCII directories for TSS Control Center and 32-bit processing
▪ Support for UTF-8 directories for 64-bit processing with EDQ
▪ Support for Trillium Cloud to perform updates in AWS
▪ Interface to view/examine current directory status on a per country or per directory file basis
▪ User defined 1-Step or 2-Step process for directory Update and Activation through the User Interface
▪ Implementation via a User Interface or an Automated Script run on a user-defined schedule
▪ User-defined transfer processing speed - via number of workers, delay, buffer-size
▪ Enabled via a secure postal access key
Trillium Quality Postal Update – Functional Process
Readily managed Postal Table update process
▪ Managed through the TSI Web Server Administration browser UI
▪ Set configuration for the Postal Download Service
▪ Can choose to use 1- or 2-step process
▪ Check on Postal Directory Status and see what is out-of-date
▪ Review Postal License Management and request new countries
▪ Track updates to the Postal Directories as they happen
TSS v15.7 Real-time REST Web Services
▪ New Web Service methods and samples are available for real-time data cleansing and
matching through the Trillium Server Interface (TSI) for application integration
▪ What’s included:
▪ Trillium Server Interface supports industry-standard REST web service requests (in addition to
SOAP) with full JSON support
▪ Adds REST real-time Cleanse and Match (both Reference and Window match) support via Apache
Tomcat
▪ Includes a Software Development Kit (SDK) for REST and SOAP web services in support of both Java
and .NET C# (including sample files to facilitate upgrading from the Director)
▪ Incorporates an SSL (Secure Sockets Layer) implementation to ensure secure data transfer with TSI Web
Services
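A real-time REST cleanse call with a JSON body over HTTPS might be prepared as in the sketch below. The host, the /cleanse path, and the payload field names are assumptions, not the TSI contract; the SDK samples shipped with the release are the authoritative reference. The request is only constructed here, not sent.

```python
import json
from urllib.request import Request


def build_cleanse_request(base_url: str, record: dict) -> Request:
    """Build (but do not send) a JSON POST; path and payload shape are hypothetical."""
    body = json.dumps({"record": record}).encode("utf-8")
    return Request(
        url=f"{base_url.rstrip('/')}/cleanse",  # hypothetical endpoint path
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


req = build_cleanse_request(
    "https://tsi.example.com/services",  # hypothetical HTTPS host
    {"name": "ann lee", "address": "1 main st", "postal": "02139"},
)
print(req.get_method(), req.full_url)
print(json.loads(req.data)["record"]["postal"])
```

Passing the prepared request to `urllib.request.urlopen` would send it; using an `https://` URL is what puts the exchange under TLS.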
TRILLIUM CLOUD
All features available on-premise or in Cloud
Trillium Cloud
▪ Entire Trillium product portfolio is
available via the Cloud
▪ Cloud based solutions licensed on a
‘subscription basis’
▪ Complete Infrastructure & Data
Center Facilities
▪ Program or Project Management at
a Technical Level
▪ Technical Operations and Monitoring
of Infrastructure / Solution
▪ Trillium Cloud Solution Benefits:
▪ No Long Term Capital Investment
▪ Faster ROI
▪ Removal of Technical Complexity
Questions and Next Steps
▪ For more information on Trillium Software and our data quality solutions, please visit:
www.trilliumsoftware.com/products
▪ For the latest Trillium Software release, please visit our Customer Portal or contact us at:
www.trilliumsoftware.com/contact-us
▪ Contact Info:
Harald Smith, Director of Product Management, Syncsort
Harald.Smith@trilliumsoftware.com
https://www.linkedin.com/in/harald-smith-71028b
twitter: @haraldsmith1
THANK YOU

What’s New in Syncsort’s Trillium Software System (TSS) 15.7

  • 1.
    WHAT’S NEW -TRILLIUM DATA QUALITY Harald Smith February 2018
  • 2.
    Speaker Harald Smith ▪ Directorof Product Management, Syncsort ▪ 20 years in Information Management focused on data quality, integration, and governance ▪ Consulting, product management, software & solution development ▪ Co-author of Patterns of Information Management, as well as two Redbooks on Information Governance and Data Integration ▪ Current Blog on InfoWorld: “Data Democratized”
  • 3.
    Agenda 3 Syncsort Confidential andProprietary - do not copy or distribute ▪ Big Data Quality ▪ Data Governance ▪ Operational Data Quality ▪ Production Processing & Application Integration
  • 4.
    Data is Topof Mind Volume and Complexity Is Growing Compliance Demands are Broader and Deeper Trust and Confidence in Data Is Decreasing “Only 3% of the DQ scores in our study can be rated ‘acceptable’ using the loosest-possible standard.” -- Harvard Business Review, September 2017 4
  • 5.
    Insights from Syncsort’s2017 Big Data Trends survey ▪ Data Quality is recognized as a mission- critical success factor for the Data Lake ▪ Data Quality tops the list of challenges of data lake implementation, followed closely by Data Governance ▪ But… not everyone is making the connection between Data Quality and Big Data success ▪ Participants who did not include data quality as a top 3 priority for implementing the data lake expressed the most interest in analytically- intensive data lake uses… which are highly dependent on proper data quality ▪ Financial services and insurance industries are the most focused on Data Quality and Data Governance ▪ Named Data Quality as top priority 50% more often than participants from other industries ▪ Also identified Data Governance as a top priority at more than twice the rate of those from other industries 5
  • 6.
    Redefine the valueof quality data ▪ Enable business leaders and users to seamlessly anticipate opportunities, uncover hidden risks and make better decisions by rapidly providing complete, accurate and trusted data in everything they do ▪ Enable governance of critical data elements through integration of data quality in the right place at the right time ▪ Enable enrichment, validation, & verification of data central to the Customer 360 view, including Big Data environments (e.g. Hadoop and Spark) ▪ Simplify User Experience by focusing on core use cases and patterns ▪ Provide consistent processing and results on premise and in the cloud Syncsort Confidential and Proprietary - do not copy or distribute 6
  • 7.
    Trillium Software ProductPortfolio Trillium Software System On Premise or via Trillium Cloud Deploy any or all products to the cloud Completely managed SaaS in AWS or Azure deployed in 30 days or less Trillium Discovery 15.7 Automated data profiling and discovery tool that identifies data quality issues, facilitates business rule management, and provides data quality metrics Trillium Quality 15.7 Data quality engine that provides data cleansing, matching, and enrichment for multi-domain, global data (including global address validation) Trillium Precise 1.0 Integrates data enrichment for all Trillium products for key data elements including phone, email, IP address, and person Trillium Solutions CRM, ERP, MDM Customized solutions for leading platforms: • Trillium Quality for Dynamics CRM 2.3 • Trillium Quality for SAP Discovery Center/Administration Center Web-browser based UI’s for specific users Trillium Quality for Big Data 15.7 Enables data quality processing including cleansing, matching, and global data enrichment on Big Data platforms Trillium Director/TSI - Real-time integration Enables real-time, secure data quality within any application via web services or API’s 7
  • 8.
    Trillium Software FunctionalOverview Data Profiling Trillium QualityTrillium Discovery Business Rules & Data Quality Assessment Data Validation, Standardization, Linking & more Data Verification & Enrichment • CRM • Customer 360 Operational Integrations Data Governance Analytics & Reporting 8
  • 9.
    Trillium Discovery –Profiling and Monitoring ▪ Measure and mitigate risk and cost associated with poor data quality ▪ Profile data sources to understand current conditions and quality issues ▪ Report on data quality metrics for accuracy, consistency and completeness ▪ Create and validate business rules ▪ Monitor data quality thresholds and trends over time ▪ Quantify, annotate, and prioritize data quality issues ▪ Generate recode and lookup tables, and prepare remediated files Syncsort Confidential and Proprietary - do not copy or distribute 9
  • 10.
    Trillium Quality –Cleansing, Verification, Enrichment, and Matching ▪ Develop workflows to transform, parse, standardize, match and survive best record ▪ Consolidate data sources on input ▪ Match on party, household, business or any custom identifiers ▪ De-duplicate and unify data sources to create a single golden record ▪ Global address validation with individual country postal rules to clean, correct and complete name and address data ▪ Enrich missing postal information, latitude/longitude and other reference data ▪ Deploy in batch, real-time, Hadoop or in multiple applications 10 Syncsort Confidential and Proprietary - do not copy or distribute
  • 11.
    Trillium Director/TSI (Real-time DQApplication Server) Core EngineCore Engine Rules CustomApplications CleanseCleanse ClientRequests C,Java,WebService,XMLoverHTTPS MatchMatch Customer Applications Trillium Director – Real-time Data Quality ▪ Deploy batch or real-time Trillium Quality services across multiple platforms, servers, and applications through one interface ▪ Integrate into multiple workflows to provide data quality services to applications throughout the enterprise ▪ REST, SOAP web services ▪ Monitor the availability of each Director and facilitate fail-over if a Director becomes unavailable 11 Syncsort Confidential and Proprietary - do not copy or distribute
  • 12.
    TRILLIUM QUALITY FORBIG DATA New Release! Syncsort Confidential and Proprietary - do not copy or distribute 12
  • 13.
    Trillium Quality forBig Data Benefits Data Lake is the source of TRUSTED data for analytics Robust data quality processing at Big Data scale to meet SLAs, support use cases like Customer 360 No coding or tuning saves time and resources – and helps address Big Data skills shortages Save time and network resources by keeping data in place in the data lake SolutionKey Challenges Big Data projects require: Massive scalability Low latency Many data sources for a complete view Data quality processing using a standalone server is no longer adequate to keep up: Millions of transactions per day now very common Critical for data quality processing to meet end user SLAs and/or key success factors Trillium Quality for Big Data executes data quality jobs natively within Big Data frameworks (Hadoop MapReduce, Apache Spark) Leverages the DMX-h execution framework (Intelligent Execution) No need to move/copy huge volumes of data for quality processing; Big Data remains in place No coding or tuning; jobs are automatically optimized 13
  • 14.
    New: Trillium Qualityfor Big Data – Key Features ▪ 1st release of Trillium Quality for Big Data leveraging the Syncsort DMX-h execution framework ▪ What’s Included: ▪ Support for Hortonworks Data Platform (HDP) and Cloudera ▪ Dynamically leverages MapReduce and Apache Spark (1.0, 2.0) ▪ Standard OOTB support for cleansing, address verification, and matching, including multi-match implementations ▪ Project Deployment to Big Data from the Trillium Control Center including Software and Postal Directories offering Global project support ▪ UUID functionality supported and automated from the Trillium Control Center ▪ Comprehensive documentation set - including install and developer guides - is available with the release Syncsort Confidential and Proprietary - do not copy or distribute 14
  • 15.
    Trillium Quality forBig Data – Functional Architecture Development Environment Data Quality Processing 1 n . . . . . . . . . . . . . . . . . . . . . . Runtime Environment Data Quality Processing INPU T Flat Files REF . Trillium Control Center Deploy to Hadoop Trillium Quality Hadoop Cluster Develop Once – Deploy Anywhere ▪ Reuse existing Trillium Data Quality projects ▪ Reuse existing skills and experience in Trillium Software ▪ Harness Trillium Software functions and preconfigured workflows and rules ▪ Maintain parsing, standardization, matching, postal enrichment rule sets ▪ Easy-to-deploy via automation ▪ Leverages DMX-h Intelligent Execution to determine optimal execution: Hadoop or Spark (No recoding required!) Syncsort Confidential and Proprietary - do not copy or distribute 15
  • 16.
    Trillium Quality forBig Data Focus on Data Quality, not the Big Data platform ▪ Use existing Data Quality skills and expertise ▪ No need to worry about mappers, reducers, big side or small side of joins, etc ▪ Automatic optimization for best performance, load balancing, etc. ▪ No changes or tuning required, even if you change execution frameworks ▪ Future-proof job designs for emerging compute frameworks, e.g. Spark 2.x ▪ Run multiple execution frameworks in a single job Single GUI Execute Anywhere! 16Syncsort Confidential and Proprietary - do not copy or distribute Intelligent Execution - Insulate your organization from underlying complexities of Hadoop
  • 17.
    TRILLIUM SOFTWARE SYSTEM ExpandedGovernance Integrations! Syncsort Confidential and Proprietary - do not copy or distribute 17
  • 18.
    Trillium Integrations forData Governance Benefits End-to-End Data Governance: From defining policies/rules for data quality… To technical implementation of data testing and metrics to ensure rules are complied with in data management Data quality metrics results not meeting thresholds can alert data steward(s) to take corrective action or provide remediated data Automated integration saves time and resources – and helps ensure trust in data SolutionKey Challenges Organizations invest in data governance solutions to support compliance, ensure data is actionable by the business, etc. Many of these solutions define and manage data quality rules, but don’t provide the processing through which the rules are executed on the data, or the quality is measured Trillium Discovery provides REST API integration with solutions such as Collibra DGC and ASG Enterprise Data Intelligence or BI tools (e.g. Qlik, Tableau) Automated delivery of policy-based rules to Trillium Discovery Automated delivery of rule or profiling results to data governance solution API’s for support of custom integrations 18
  • 19.
    TSS v15.7 RESTAPIs and Governance Integrations ▪ Data Quality is a key core component to a Data Governance process to address business compliance, risk and data management requirements and standards. ▪ What’s included: ▪ Published Trillium Discovery APIs with full documentation ▪ All features available in the Discovery Center UI available via API: data source metadata, profiling results, data quality rules & rule results, join and dependency analysis ▪ Standard GET/POST functions using JSON ▪ Filter & sort rows, select columns of interest, and page result sets Syncsort Confidential and Proprietary - do not copy or distribute 19
  • 20.
    Trillium Software System GovernanceIntegration – Functional Architecture TS Discovery 15.7 Automated data profiling and discovery that identifies data quality issues, facilitates business rule management, and provides data quality metrics Trillium Clients Control Center (rich) Discovery Center (browser) Administration Center (browser) Trillium Repository Export data grid rows to file Data extracts .csv ODBC Reporting Adapter (SQL) REST API’s REST API’s Data interchange .json GET/POST Data extracts .csv Import/export rule sets Rule set packages .xml Excel Queries Application integration BI & reporting tools 20
  • 21.
    Data Governance integration with Collibra ▪ Trillium Discovery and Collibra DGC: ▪ Bi-directional linkage of Collibra data quality rules with Trillium Discovery business rules ▪ Packaged workflow can run OOTB ▪ Develop and modify in Collibra and transfer those rules to Discovery to apply appropriate syntax and connect to data sources ▪ Collibra data quality rules become available in Trillium Discovery ▪ Automatically deliver results from associated data sources to Collibra as Data Quality Metrics
  • 22.
    Use Case: Data Governance Policy Management Collibra DGC: ▪ Lets non-technical users define business policies and data rules in plain language Trillium Discovery: ▪ Converts DGC rules into technically executable data quality rules ▪ Constantly runs data quality metrics on a near real-time basis Closing the loop with DGC: ▪ If data quality metrics fall below defined thresholds, Collibra users are alerted via their dashboards ▪ Data stewards can review in Trillium Discovery and take corrective actions Bi-directional connectivity to constantly sync: ▪ Collibra rulebooks and Discovery Center rules ▪ Results of Discovery quality tests
  • 23.
    Trillium Discovery-Collibra Integration – Functional Architecture [Diagram: Collibra DGC and Trillium Discovery linked through Collibra Connect, exchanging DQ Rules and DQ Metrics] 1. (Collibra DGC) Create the ‘semantic’ Data Quality Rule in the Rulebook 2. (Collibra DGC) Optionally edit the Predicate (expression) and Threshold 3. (Trillium Discovery) Edit the DQ Rule Expression and associate it with one or more data sources 4. (Trillium Discovery) Run the ‘technical’ Data Quality Rule and generate Passing/Failing counts 5. (Collibra DGC) Review results for all associated Data Quality Metrics • Ongoing bi-directional polling or scheduling • Linked on Collibra domain & object identifiers
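Steps 4 and 5 above boil down to turning passing/failing counts into a metric and comparing it with the rule's threshold. The following is a minimal sketch of that arithmetic only; the class and field names are hypothetical and do not reflect how either product stores its results, and treating the threshold as a minimum pass percentage is an assumption.

```python
from dataclasses import dataclass


@dataclass
class RuleResult:
    """Outcome of one data quality rule run (field names are illustrative)."""
    rule_name: str
    passing: int
    failing: int
    threshold: float  # assumed to be the minimum acceptable pass percentage

    @property
    def pass_rate(self) -> float:
        total = self.passing + self.failing
        # Assumption: a rule that evaluated no rows is treated as fully passing.
        return 100.0 * self.passing / total if total else 100.0

    @property
    def meets_threshold(self) -> bool:
        return self.pass_rate >= self.threshold


def metrics_needing_review(results):
    """Return the names of rules whose pass rate fell below their threshold."""
    return [r.rule_name for r in results if not r.meets_threshold]
```

For example, a rule with 96 passing and 4 failing rows against a threshold of 98 has a pass rate of 96 percent and would be flagged for a data steward to review.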
  • 24.
    Data Governance integration with ASG ▪ Trillium Discovery and ASG Enterprise Data Intelligence: ▪ Uni-directional delivery of Trillium Discovery profiling results ▪ ASG scanner runs OOTB ▪ End-to-end data transparency using data lineage/data relationship graphs (ASG) with data quality metrics (Trillium Discovery) through each transformation ▪ Visual evidence that data processing is in fact taking place where & when intended, with no unexpected results or impairment to data quality levels through a process flow
  • 25.
    Use Case: Data Governance Compliance Tracing ASG Enterprise Data Intelligence: ▪ Lets users trace data quality issues with critical data elements through the data lineage graph to find where & when the issue appeared [Screenshot: data quality metrics on the lineage graph] ▪ Validity, Sun 05/01/2016 12:00:00 PM MDT: Threshold 100, Pass 96, Dimensions: Accuracy ▪ Completeness, Sun 05/01/2016 12:00:00 PM MDT: Threshold 98, Pass 96, Dimensions: Completeness
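The tracing idea can be illustrated with a small sketch: walk the lineage in order and report the first stage whose metric drops below the threshold. This assumes a simple linear lineage and uses made-up stage names; a real lineage graph in ASG can branch, and this is not ASG's or Trillium's API.

```python
def first_failing_stage(stages, threshold):
    """Return the first stage whose pass percentage is below the threshold.

    stages: list of (stage_name, pass_percentage) tuples in lineage order.
    Returns None when every stage meets the threshold.
    """
    for name, pass_pct in stages:
        if pass_pct < threshold:
            return name
    return None


# Hypothetical lineage: stage names and the first two values are illustrative.
lineage = [("source", 100), ("staging", 99), ("warehouse", 96)]
issue_stage = first_failing_stage(lineage, threshold=98)
```

Here `issue_stage` is `"warehouse"`, which tells the steward that the drop was introduced between staging and the warehouse load.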
  • 26.
    TRILLIUM QUALITY FOR DYNAMICS CRM New Features!
  • 27.
    Trillium Quality for Dynamics CRM Key Challenges: ▪ Organizations invest in CRM to drive insight for sales and customer support in order to provide better service and increase revenue ▪ Duplicates, old data, incomplete data, misspellings, data in wrong fields, ... ▪ Poor data quality is a primary reason for 40% of all business initiatives failing to achieve their target* Solution: Trillium Quality for Dynamics CRM ▪ Real-time cleansing & matching right in MS Dynamics CRM ▪ Comprehensive batch cleansing of the entire CRM database (including resolution from prior data migrations) ▪ Leads Analysis tool for insight into the value of prospect lists before loading into CRM Benefits: ▪ Rapid time to value: can be operational within a day ▪ High-quality data is constantly maintained exactly where it is needed the most, directly within MS Dynamics CRM *Source: Gartner Research
  • 28.
    New v15.7 MS Dynamics CRM 2.3 – New Key Features ▪ Seamless data quality integration directly in the Dynamics CRM environment, enabling evaluation, cleansing, de-duplication, and merging of global customer and lead records both on premise and in the Cloud. ▪ What’s included: ▪ Enable Cross-Entity Matching for Leads ▪ Match Leads to Contacts for Newly Entered Leads ▪ Match Link Table for Leads shows Matches to both Leads and Contacts ▪ Enable Merge Between Leads and Contacts ▪ Enable Editing of Web Resource XML (Deployment Manager) ▪ New Leads Analysis tool: examine leads and directly import only qualified, new records into CRM ▪ End User Master Batch Availability for cleansing full CRM database ▪ Microsoft Dynamics 365 support ▪ Deploy Global Data Quality solution, including email/phone verification via Trillium Precise
  • 29.
    Trillium for MS Dynamics CRM v2.3 – Online Process Integration directly in Dynamics CRM screens ▪ Enter Contacts or Leads as normal ▪ Popup validation of Address (optionally email & phone) ▪ Match to existing Contacts and Leads (cross-entity) with cross-population of validated fields
  • 30.
    Trillium for MS Dynamics CRM v2.3 – Batch Process Integration into Batch Upload ▪ Use the Trillium web interface to analyze a lead list against the CRM instance to determine new or existing leads ▪ Establish valid patterns for matches ▪ Confirm the data prior to import
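As a rough illustration of the "new or existing leads" split, the sketch below partitions a lead list by comparing a normalized email address against addresses already in the CRM. This exact-match key is a deliberately simple stand-in chosen for the example; it is not how Trillium matching works, which relies on cleansing and configurable match patterns. The record layout (`email` key) is also an assumption.

```python
def normalize_email(email):
    """Normalize an email address for comparison (trim and lower-case)."""
    return email.strip().lower()


def split_leads(lead_list, crm_emails):
    """Partition leads into (new, existing) by normalized email address.

    lead_list: list of dicts that each carry an "email" key (illustrative).
    crm_emails: iterable of email addresses already present in the CRM.
    """
    known = {normalize_email(e) for e in crm_emails}
    new, existing = [], []
    for lead in lead_list:
        bucket = existing if normalize_email(lead["email"]) in known else new
        bucket.append(lead)
    return new, existing
```

Only the `new` bucket would then be confirmed and imported, while the `existing` bucket would go to review or merge.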
  • 31.
    TRILLIUM SOFTWARE SYSTEM Postal Maintenance & Real-time Integrations
  • 32.
    Trillium Quality Postal Download Web Service ▪ A web-based application to download, activate, and update all postal and geocoding directories. TSS administrators can manage licensing, view status, and perform directory maintenance as scheduled or required (for example, monthly or quarterly), so that processing runs against the freshest possible data with minimal downtime. ▪ What’s included: ▪ Support for ASCII directories for TSS Control Center and 32-bit processing ▪ Support for UTF-8 directories for 64-bit processing with EDQ ▪ Support for Trillium Cloud to perform updates in AWS ▪ Interface to view/examine current directory status on a per-country or per-directory-file basis ▪ User-defined 1-step or 2-step process for directory Update and Activation through the User Interface ▪ Implementation via a User Interface or an automated script run on a user-defined schedule ▪ User-defined transfer processing speed via number of workers, delay, and buffer size ▪ Enabled via a secure postal access key
  • 33.
    Trillium Quality Postal Update – Functional Process Readily managed Postal Table update process ▪ Managed through the TSI Web Server Administration browser UI ▪ Set configuration for the Postal Download Service ▪ Can choose to use 1- or 2-step process ▪ Check on Postal Directory Status and see what is out-of-date ▪ Review Postal License Management and request new countries ▪ Track updates to the Postal Directories as they happen
  • 34.
    TSS v15.7 Real-time REST Web Services ▪ New Web Service methods and samples are available for real-time data cleansing and matching through the Trillium Server Interface (TSI) for application integration ▪ What’s included: ▪ Trillium Server Interface supports industry-standard REST web service requests (in addition to SOAP) with full JSON support ▪ Adds REST real-time Cleanse and Match (both Reference and Window match) support via Apache Tomcat ▪ Includes Software Development Kit (SDK) for REST and SOAP web services in support of both Java and .NET C# (including sample files to facilitate upgrading from the Director) ▪ Incorporates SSL (Secure Sockets Layer) implementation to ensure secure data transfer with TSI Web Services
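To show what a JSON-over-REST cleanse call from an application might look like, here is a minimal sketch that only constructs the HTTPS POST request. The URL path and the JSON field names are hypothetical; the actual request format is defined by the TSI web services and the samples in the SDK. The slides describe Java and .NET C# SDKs; Python is used here purely for brevity.

```python
import json
import urllib.request

# Hypothetical endpoint on the Tomcat-hosted TSI web service.
CLEANSE_URL = "https://tsi.example.com/tsi/rest/cleanse"


def build_cleanse_request(record):
    """Build (but do not send) a JSON POST request for one input record.

    The payload shape {"record": {...}} is an illustrative assumption.
    """
    body = json.dumps({"record": record}).encode("utf-8")
    return urllib.request.Request(
        CLEANSE_URL,
        data=body,
        headers={"Content-Type": "application/json",
                 "Accept": "application/json"},
        method="POST",
    )


request = build_cleanse_request(
    {"name": "jon smith", "address": "1 main st", "city": "boston"}
)
```

Sending it would be a matter of passing `request` to `urllib.request.urlopen` and decoding the JSON response; using an `https://` URL is what brings the SSL-secured transfer mentioned above into play.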
  • 35.
    TRILLIUM CLOUD All features available on-premise or in Cloud
  • 36.
    Trillium Cloud ▪ Entire Trillium product portfolio is available via the Cloud ▪ Cloud-based solutions licensed on a ‘subscription basis’ ▪ Complete Infrastructure & Data Center Facilities ▪ Program or Project Management at a Technical Level ▪ Technical Operations and Monitoring of Infrastructure / Solution ▪ Trillium Cloud Solution Benefits: ▪ No Long Term Capital Investment ▪ Faster ROI ▪ Removal of Technical Complexity
  • 37.
    Questions and Next Steps ▪ For more information on Trillium Software and our data quality solutions, please visit: www.trilliumsoftware.com/products ▪ For the latest Trillium Software release, please visit our Customer Portal or contact us at: www.trilliumsoftware.com/contact-us ▪ Contact Info: Harald Smith, Director of Product Management, Syncsort Harald.Smith@trilliumsoftware.com https://www.linkedin.com/in/harald-smith-71028b twitter: @haraldsmith1
  • 38.