Do You Trust Your
Machine Learning
Outcomes?
How to improve trust in advanced
analytics, AI, and machine learning
Dr. Tendü Yoğurtçu | Chief Technology Officer, Precisely
Housekeeping
Webinar Audio
• Today’s webinar audio is streamed through your computer
speakers
• If you need technical assistance with the web interface
or audio, please reach out to us using the Q&A box
Questions Welcome
• Submit your questions at any time during the presentation
using the Q&A box
Recording and slides
• This webinar is being recorded. You will receive an email
following the webinar with a link to the recording and slides
Agenda
• Trends in data and the growth of AI
• Common industry use cases for
machine learning & data challenges
• Real-world stories of ML success
• Strategies for improving trust in ML
outcomes
Today’s Speaker
Tendü Yoğurtçu, PhD
Chief Technology Officer, Precisely
The rising tide of data
HYBRID CLOUD
ARTIFICIAL INTELLIGENCE
DATA GOVERNANCE
DATA STREAMING
• 45 ZB: data created in 2019
• 19%: new data created in 2019 in real time
• 143 ZB: data that will be created by 2024
• 61%: new data created on endpoints
• 20%: new data created in the cloud
• 88%: share of the 45 ZB generated by replication and distribution, creating data liabilities
Source: IDC, Worldwide Global DataSphere Forecast, 2020–2024
Why AI and ML?
AUTOMATE
Automate workflows,
common processes,
and decision making
SCALE
Scale processing
across massive
volumes of data
PREDICT
Predict outcomes and
recommend actions to
support business
planning
COMPETE
Obtain competitive
advantage through
greater insight and
operational efficiency
ML can also be applied to improve the accuracy and consistency
of data you use for business processes.
84% of CEOs are concerned about the integrity of the data they’re basing decisions on [1]
Hybrid Cloud
68% of organizations said disparate data negatively impacted their organization [2]
Streaming
92% of firms agree they need to increase use of outside data [5]
Location
47% of newly created data records have at least one critical error [3]
AI
54% of enterprises challenged by lack of data location intelligence [4]
Sources: 1. Forbes, 2. Data Trends Survey 2019, 3. Harvard Business Review, 4. IDC, 5. Forrester
Real-world machine
learning stories
Insurance and ML
Real-world example
Business challenge: could not capitalize on demographic and experience trends
Technical challenge: data scientists spent weeks getting clean, consolidated data to feed AI initiatives
Solution: ML-powered entity resolution delivered more accurate results in under 4 hours rather than 4+ weeks
Business challenges
• Making pricing policy decisions
• Analyzing risk
• Assessing business impact as a
catastrophe develops
• Optimally allocating resources
after an event occurs
• Growing business through
highly-targeted marketing
programs for new and existing
policyholders
Data challenges
• Entity resolution at scale
• Lack of access to siloed data
• Inconsistency of data across
multiple sources
• Freshness of third-party data
for understanding risks
associated with weather and
natural disasters
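Entity resolution, named above as a data challenge, means deciding which records refer to the same real-world person or business. The following is a minimal sketch only, assuming plain string similarity from Python's standard library as a stand-in for the ML-powered matching the slide describes. The field names, the sample policyholders, and the 0.85 threshold are all hypothetical.

```python
from difflib import SequenceMatcher


def normalize(record):
    """Lowercase and collapse whitespace so formatting noise does not block a match."""
    return {key: " ".join(str(value).lower().split()) for key, value in record.items()}


def similarity(a, b):
    """Average string similarity over the fields both records share."""
    fields = sorted(set(a) & set(b))
    if not fields:
        return 0.0
    scores = [SequenceMatcher(None, a[f], b[f]).ratio() for f in fields]
    return sum(scores) / len(scores)


def resolve(records, threshold=0.85):
    """Greedily group records that are similar enough to a cluster's first record."""
    clusters = []
    for record in map(normalize, records):
        for cluster in clusters:
            if similarity(cluster[0], record) >= threshold:
                cluster.append(record)
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append([record])
    return clusters


# Hypothetical sample data: the first two rows describe the same person.
policyholders = [
    {"name": "Jane A. Smith", "address": "12 Main Street, Springfield"},
    {"name": "jane a smith", "address": "12 Main St, Springfield"},
    {"name": "Robert Jones", "address": "98 Oak Avenue, Riverton"},
]
clusters = resolve(policyholders)
```

Comparing every record against every cluster does not scale to the volumes the slide has in mind. Production systems typically add blocking (comparing only records that share a coarse key) and learned match models.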
Banking & loans and ML
Real-world example
Business challenge: predict a property's market value more quickly and accurately
Technical challenge: joining thousands of variables from disparate sources and ensuring data accuracy & consistency for predictive ML models
Solution: cloud-native location intelligence with curated datasets reduced the time to build trusted data from 13+ hours to 3.2 hours
Business challenges
• Reducing risk by understanding
variables that most impact
home valuation
• Informing loan activity by
producing scores for mortgage
bankers
• Making intelligent, risk-based
decisions using standardized
location information
• Growing new business and
expanding current business
with highly-targeted marketing
programs
Data challenges
• Incomplete data
• Verifying accuracy &
standardizing the data
• Linking 3rd-party data to
customer reference sets
• Marrying location information
from multiple sources; e.g.,
satellite, drone map/plot info
Financial services and ML
Real-world example
Business challenge: analyze global business trends to help investors make sound decisions
Technical challenge: data scientists needed to accurately join datasets from various sources to feed trusted data into ML models
Solution: 30 data scientists in an AI lab geocoded and enriched their data with PreciselyID and Points of Interest to improve trust in the models they were building
Business challenges
• Processing millions of data
points for risk & AML analysis
• Improving the accuracy of
real-time approvals & reducing the
number of false rejections
• Increasing profitability by
mining customer data for better
insights
• Helping investors make sound
decisions by analyzing global
business trends
Data challenges
• Standardizing data coming
from different sources
• Verifying accuracy of the data
• Feeding data to ML models
with maximum accuracy and
consistency
• Enriching in-house data with
accurate third-party data to
feed models and provide lift
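Enriching in-house data with third-party data comes down to a join on a shared key, and the match rate of that join is a quick signal of how far the enriched data can be trusted. The sketch below is illustrative only. The `location_id` key, the `flood_zone` attribute, and the sample rows are hypothetical and do not represent any real identifier format or dataset.

```python
def enrich(records, reference, key):
    """Left-join reference attributes onto records by key and report the match rate."""
    lookup = {row[key]: row for row in reference}
    enriched, matched = [], 0
    for record in records:
        extra = lookup.get(record[key])
        if extra is not None:
            matched += 1
        # In-house values win over third-party values when both define a field.
        enriched.append({**(extra or {}), **record})
    match_rate = matched / len(records) if records else 0.0
    return enriched, match_rate


# Hypothetical in-house and third-party rows sharing a location key.
in_house = [
    {"location_id": "A1", "balance": 1200},
    {"location_id": "B2", "balance": 560},
    {"location_id": "C3", "balance": 75},
]
third_party = [
    {"location_id": "A1", "flood_zone": "X"},
    {"location_id": "B2", "flood_zone": "AE"},
]
enriched, match_rate = enrich(in_house, third_party, key="location_id")
```

A low match rate usually points to inconsistent keys or stale reference data, both of which are worth investigating before the enriched rows reach a model.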
Retail and ML
Real-world example
Business challenge: rising marketing costs and a poor customer experience due to duplicate customer records
Technical challenge: data was siloed, and duplicate records prevented a single view of the customer
Solution: deployed a context graph and ML-powered Customer 360 solution for a trusted, unified view of its customers; reduced deduplication time from 3 hours to under 5 minutes
Business challenges
• Understanding consumer patterns
• Predicting retail growth at scale
• Delivering a personalized
customer experience that
maximizes customer loyalty
• Performing site planning
Data challenges
• Siloed data
• Data standardization and
validation
• Duplicate customer information
across CRM and ERP systems –
and time required to de-dup
large quantities of data
• Obtaining a single view of a
customer’s data
Improving trust in your
ML outcomes
Improve trust in your data
to improve trust in your ML outcomes
INTEGRATE
Break down data silos
to bring all your
enterprise data to your
ML models
VERIFY
Ensure the data used
to build, train, & feed
ML models is
accurate & consistent
LOCATE
Apply the consistent
element of location to
organize, manage, &
enrich your data for
greater insights
ENRICH
Enrich your data with
expertly curated, up-to-date
consumer insights and
business and demographic
information
Trust your data. Build your possibilities.
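The VERIFY step above can be pictured as a set of checks run before data reaches a model. This is a minimal sketch, assuming three basic rules: required fields are present, values fall in an allowed set, and no row is an exact duplicate. The field names, allowed values, and sample rows are hypothetical, and real data quality tooling covers far more than this.

```python
def verify(rows, required, allowed=None):
    """Return (row_index, problem) tuples for rows that fail basic checks."""
    allowed = allowed or {}
    problems = []
    seen = set()
    for index, row in enumerate(rows):
        # Completeness: every required field must hold a non-empty value.
        for field in required:
            if row.get(field) in (None, ""):
                problems.append((index, f"missing {field}"))
        # Consistency: populated fields must use one of the agreed values.
        for field, values in allowed.items():
            if row.get(field) not in (None, "") and row[field] not in values:
                problems.append((index, f"unexpected {field}: {row[field]!r}"))
        # Uniqueness: flag rows identical to one already seen.
        fingerprint = tuple(sorted(row.items()))
        if fingerprint in seen:
            problems.append((index, "exact duplicate"))
        seen.add(fingerprint)
    return problems


# Hypothetical training rows with one incomplete, one inconsistent, one duplicated.
rows = [
    {"customer_id": "C-1", "state": "NY", "email": "a@example.com"},
    {"customer_id": "C-2", "state": "New York", "email": ""},
    {"customer_id": "C-1", "state": "NY", "email": "a@example.com"},
]
problems = verify(
    rows,
    required=["customer_id", "email"],
    allowed={"state": {"NY", "NJ", "CT"}},
)
```

Returning the problems instead of raising on the first one lets a pipeline decide whether to reject, repair, or quarantine the affected rows.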
The Precisely Data Integrity Suite
• Delivers the essential elements of data integrity –
accuracy, consistency, and context
• Built on data integration, data quality, location
intelligence, and data enrichment trusted by over
12,000 enterprise customers
• Modular architecture allows you to choose just the
capabilities you need – and implement them
alongside your current infrastructure at scale
• Empowers faster, confident decision-making
with trusted data
Data
Integration
Data
Enrichment
Location
Intelligence
Data
Quality
Questions
Learn more at
precisely.com/data-integrity

Editor's Notes

  • #5 Users of real-time and streaming data architectures increasingly realize that real-time data quality is an operational concern
  • #6 Need to automate decision making. Need to scale. Need to predict. Need to plan. Need a competitive advantage. Rise of the use of AI to improve existing data pipelines and processes, such as smart rules, automatic data classification, and intelligent automated rule application. Rise of data quality for AI and the emergence of MLOps: the need for “good data” instead of just “big data”.
  • #7 Call out the data challenges associated with each of these statistics: cannot build real-time data pipelines to feed business applications and analytics; time-consuming, manual effort to standardize, verify, and validate data across entities; difficult to make address data fit for purpose, which requires significant expertise, time, and resources; manually tracking and incorporating up-to-date location, business, and demographic information.
  • #9 Leverage hyper-accurate geocoding to inform pricing policy decisions and risk. This can speak to risk that exists due to a policy location and variables that could cause a claim, such as flooding, hurricanes, or wildfires. But it can also speak to adjacent risk: understanding whether a nearby business could cause a problem (what if the property is near a fireworks store?), or whether you have too many policies located in a single building, such as a high-rise, where an event such as a fire could cause a large loss across many policies.
  • #10 Business challenge: improve the speed and accuracy of valuation models by joining thousands of variables to effectively predict a property’s market value. Technical challenge: connecting volumes of data from disparate sources and ensuring a consistent and accurate approach to feed trusted data into ML models. Solution: deployed cloud-native location intelligence with expertly curated datasets to connect and build trusted data feeding ML to predict property market values. This resulted in building trusted data in 3.2 hours (and getting faster all the time!), reduced from 13+ hours. The right data is connected to the right property, so the data is trusted. Others say they do this but do not do it well and produce false positives; we do better matching with accurate location, and PreciselyID makes it easier to use and very fast. When I say “build trusted data,” I am referring to the processing that joins the data used to feed the property value ML model.
  • #11 Unlike traditional methods, ML can analyze significant volumes of personal information to reduce risk.
  • #12 AI workflows can analyze data sources such as consumer mobility and purchase patterns.
  • #14 Location: it is not just about enriching the data; we enrich it correctly so that people do not get false positives or false information. Getting the data right matters, and building trust is critical.
  • #15 And that is why Precisely has introduced the Precisely Data Integrity Suite. It delivers the essential elements of data integrity (accuracy, consistency, and context) to give your business the confidence to make better, faster decisions based on trusted data. Built on proven data integration, data quality, location intelligence, and data enrichment capabilities trusted by more than 12,000 global organizations, the Precisely Data Integrity Suite delivers unmatched value for any data integrity initiative. And with a modular architecture, you can pick just the capabilities you need, implement them alongside your current infrastructure, and add on new capabilities as your needs grow.