Smarter businesses apply AI to learn and continuously evolve the way they work. To extract full value from AI, companies need a data strategy that gives them access to all their data, no matter where it lives, in an environment that scales easily and applies the latest discovery technology, including advanced analytics, visualization and AI. Learn how IBM Watson and Data provides all the tools companies need to embed AI, machine learning and deep learning in their business, while enabling professionals to gain the most from their data to drive smarter business and lead industry-changing transformations.
1. Consume your data for AI
The role of modern Asset Catalogs
Jay Limburn
Distinguished Engineer and Director of Offering Management
IBM Watson Data & AI
@jaylimburn
2. “Two guys in a Starbucks can have access to the same computing power as a Fortune 500 company.”
Jim Deters, Co-Founder and CEO, Galvanize
Digital businesses are disrupting ALL industries and professions
72% are vulnerable to disruption within 3 years
OVER 90% of Business Execs see the need for this transformation
ONLY 14% believe they have the ability to ACT on data quickly
Source: FROM DATA TO DISRUPTION: INNOVATION THROUGH DIGITAL INTELLIGENCE, IBM-sponsored report by Harvard Business Review Analytic Services, 2016
5. Watson is AI for Smarter Business
Continuous Learning
Watson Applications: Compliance Assist, Customer Care, Expert Assist, Watson Assistant for Industry, Watson Cybersecurity, Compare & Comply, Voice of the Customer
Watson Business Solution
ISV & third party apps
Workflow: Search & Find Relevant Data, Connect & Access Data, Prepare Data (Ingest, Curate, & Enrich), Build & Train AI Models, Deploy AI Models, Monitor, Analyze, Manage
Watson Machine Learning and Deep Learning as a Service
Watson Knowledge Catalog: Active Policy Enforcement, AI Powered Catalog Search and Social Collaboration, Data/Knowledge Kits, Intelligent AI Asset Catalog, Model Governance, Traceability & Lineage
Watson Studio
Watson APIs
6. Watson Studio & Watson Knowledge Catalog
Supporting the end-to-end AI workflow
• Search and Find Relevant Data: Find data (structured, unstructured) and AI assets (e.g., ML/DL models, notebooks, Watson Data Kits) in the Knowledge Catalog with intelligent search, giving the right access to the right users.
• Connect & Access Data: Connect and discover content from multiple data sources in the cloud or on premises. Bring structured and unstructured data to one toolkit.
• Prepare Data for Analysis: Clean and prepare your data with Data Refinery, a tool to create data preparation pipelines visually. Use popular open source libraries to prepare unstructured data.
• Build and Train ML/DL Models: Democratize the creation of ML and DL models. Design your AI models programmatically or visually with the most popular open source and IBM ML/DL frameworks, or leverage transfer learning on pre-trained models using Watson tools to adapt to your business domain. Train at scale on GPUs and distributed compute.
• Deploy Models: Deploy your models easily and have them scale automatically for online, batch or streaming use cases.
• Monitor, Analyze and Manage: Monitor the performance of the models in production and trigger automatic retraining and redeployment of models. Build Enterprise Trust with Bias Detection, Mitigation, Model Robustness and Testing Service, Model Security.
7. Knowledge Workers have data issues
80% of their time is spent searching for data
80% are unable to collaborate on common data
84% say fragmented data gets in the way
More than 9 out of 10 require faster data and AI and analytics to compete
Source: FROM DATA TO DISRUPTION: INNOVATION THROUGH DIGITAL INTELLIGENCE, IBM-sponsored report by Harvard Business Review Analytic Services, 2016
8. The Data Lake fallacy
Enterprise Data Lakes are not
delivering on their promise
• Inability to easily find data
• Lack of trust in how the data will be
used
• Obstacles to finding and sharing
• Too difficult to ingest sources
• Ever increasing cost
% Time spent working with data (Finding Data / Using Data / Sharing Data)
LOB Knowledge Workers: 60 / 30 / 10
Business Analysts: 75 / 15 / 10
Data Scientists: 80 / 17 / 3
Users spend significantly more time finding the correct data than extracting value from it.
“Through 2018, 90% of deployed data lakes will be useless
as they are overwhelmed with information assets captured
for uncertain use cases.” - Gartner
A 10% increase in data accessibility will result in more
than $65 million additional net income for the typical
Fortune 1000 company. - Baseline
9. Organizations are racing to unleash the power of data and apply AI
Information Architecture
Machine Learning
Deep Learning & AI
Data Lakes
Metadata Management
Asset Catalogs
Data Catalogs
Collect, Understand and Govern | Share, Activate and Curate
10. An intelligent asset catalog for a 360 degree view of your data and AI
Discover | Catalog | Activate
Powered by Watson Data & AI
Intelligent discovery of data and AI assets with advanced classification and profiling to provide context
• Intelligent data classification and profiling that determines what the data is and how it should be used
• Quickly build a 360 view of all assets and provide them for AI and Analytics
• Crawlers to auto discover usage information of data to understand how data is used
A rich metadata index of all data and AI assets with social collaboration and enhanced findability
Powerful governance tools to control and protect access to data with visibility to data use
• Business Glossary to define business terms and map them to technical assets
• Active Policy Engine to author, activate and enforce business policies and rules
• Governance and Insights dashboards to understand how data is used and how the governance program is impacting it
• A business friendly shopping portal for your enterprise data
• Integrated with other platform solutions to facilitate self service analytics and AI
• Access controls and security
• Seamlessly integrated for productive use with Data prep, movement, dashboarding, Machine learning and Data science
Watson Knowledge Catalog
Unlock tribal knowledge and unleash your data science projects
12. A 360 view of all information to fuel data innovation and AI
Cloud data | On prem data | Data we own | Data we don’t | Structured | Unstructured
Deep Learning
Machine Learning
Business Analytics
& Real time Analytics
Make your data workers more effective, super charging your data science and AI
Machine Learning
Metadata Index
UIs Collaboration
Enforcement Monitoring
Knowledge Catalog
APIs
Classification & NLU
Data lake
Data warehouse
Cloud Sources
On-premise sources
Open data
Social data
Sensor data
Dark data
13. Watson Knowledge Catalog
Differentiating Capabilities
Your data, Your way
• IBM is the only partner that does not require data movement & storage in our cloud.
• Providing choice & flexibility for data location gives the opportunity for highly regulated industries to benefit from IBM Cloud and Watson.
Focus on Enabling AI
• Watson Knowledge Catalog supports all assets, not just data. Think dashboards, data science & machine learning models, connections, notebooks, etc. All available in one easy to find self service experience.
Seamless integration for Productive Use
• Watson Knowledge Catalog is fully integrated with Watson Studio, meaning that once assets have been discovered users can easily drive productive use through integrated tools for data shaping, data movement, data science, AI and machine learning.
Modern Policy Activation
• Watson Knowledge Catalog contains an advanced policy enforcement engine to establish categories, policies & rules on data assets.
• Activation of these policies means data is masked or protected on the fly at the point of access.
Structured & Unstructured
• To truly ensure your Data Science teams have a full view of data, it has to include structured and unstructured data.
• Watson Knowledge Catalog uses Watson Natural Language Processing to extract and understand the value locked away in unstructured and structured data and presents it uniformly for consumption.
AI powered recommendations
• ‘Watson Recommends’ is our AI powered recommendation engine that analyzes the digital exhaust of the system and relationships of the assets to provide recommendations on assets for use.
• Integrated with the social collaboration capabilities of the catalog to further fuel the recommendation engine.
14. Watson Studio
Tools for supporting the end-to-end AI workflow
Model Lifecycle Management
Machine Learning Runtimes | Deep Learning Runtimes
Authoring Tools
Cloud Infrastructure as a Service
• Most popular open source frameworks
• IBM best-in-class frameworks
• Create, collaborate, deploy, and monitor
• Best of breed open source & IBM tools
• Code (R, Python or Scala) and no-code/visual
modeling tools
• Fully managed service
• Container-based resource management
• Elastic pay as you go CPU/GPU power
15. Data Refinery
Making data fit for use
• Self-service data refinement and cleaning
• Comprehensive profiling
• Interactive visualization
• Scheduling and monitoring
Self service data prep
16. Watson Recommends – AI powered suggestions
Use AI to improve your AI
• Catalog extended to support all AI assets, not just data assets.
• AI based engine to suggest the best assets for each user based upon digital exhaust.
• Uses AI to self learn and improve over time to guide users to previously dark data that would have been lost
• Search results automatically push the most relevant items to the top of the list
• New AI based Recommended assets list
• New most popular assets list
Coming Soon
• Understand and track model lineage: know which data was used to train which models and when
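The model lineage idea can be pictured as a log of training runs that links each model to the data assets it was trained on. The following is a minimal sketch under that assumption only; the `TrainingRecord` and `LineageLog` names and fields are hypothetical and do not represent the Watson Knowledge Catalog API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class TrainingRecord:
    """One training run: which model, which data assets, and when (hypothetical schema)."""
    model_id: str
    data_asset_ids: tuple
    trained_at: datetime


@dataclass
class LineageLog:
    """In-memory lineage log for illustration; a real catalog would persist this."""
    records: list = field(default_factory=list)

    def record_training(self, model_id, data_asset_ids, trained_at=None):
        """Append one training run, stamping it with the current UTC time by default."""
        rec = TrainingRecord(
            model_id=model_id,
            data_asset_ids=tuple(data_asset_ids),
            trained_at=trained_at or datetime.now(timezone.utc),
        )
        self.records.append(rec)
        return rec

    def data_used_by(self, model_id):
        """All data assets ever used to train the given model, sorted by id."""
        return sorted({a for r in self.records if r.model_id == model_id
                       for a in r.data_asset_ids})

    def models_trained_on(self, data_asset_id):
        """All models that were trained on the given data asset, sorted by id."""
        return sorted({r.model_id for r in self.records
                       if data_asset_id in r.data_asset_ids})
```

With a log like this, "which data trained which model, and when" becomes a lookup in either direction, which is what makes impact analysis possible when a data asset changes.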
17. Social Collaboration Features
Curation for everyone! crowd sourced model training
• Empower all users to
determine the value of
all assets across the
business
• Social interactions with
data feed into Watson
Recommends
algorithms to improve
over time
• Users of the system
improve it over time
• Provide comments and
further enrich the
metadata that
describes your key
assets
• Most liked assets available to
drive people to the most popular
assets
• Comment and collaborate on datasets to train the Watson Recommends service and aid findability and understanding of assets
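As a rough illustration of how social interactions could feed a "most popular" or recommended list, the sketch below ranks assets by a weighted count of interactions. The interaction types, the weights and the `rank_assets` function are assumptions made for this example; the deck does not describe the actual Watson Recommends algorithm.

```python
from collections import Counter

# Hypothetical weights per interaction type; the real service's signals
# and weighting are not described in this deck.
SIGNAL_WEIGHTS = {"view": 1.0, "like": 3.0, "comment": 2.0}


def rank_assets(events, top_n=3):
    """Rank assets by weighted interaction count.

    `events` is an iterable of (asset_id, interaction_type) pairs.
    Unknown interaction types contribute nothing to the score.
    """
    scores = Counter()
    for asset_id, kind in events:
        scores[asset_id] += SIGNAL_WEIGHTS.get(kind, 0.0)
    # Sort by score descending, then by asset id so ties are ordered predictably.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [asset_id for asset_id, _ in ranked[:top_n]]


events = [
    ("sales_2017", "view"), ("sales_2017", "like"),
    ("churn_model", "view"), ("churn_model", "comment"), ("churn_model", "like"),
    ("raw_logs", "view"),
]
print(rank_assets(events))
```

A production recommender would also personalize per user and use asset relationships; this sketch only shows the feedback loop from usage to ranking.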
18. Support for Unstructured Data
Tearing down the barriers between structured and unstructured content to power AI
• Extract key entities
from unstructured
content and catalog
alongside structured
content opening up all
assets for self service
• Leverage Watson NLU to
identify the content and
context of unstructured
data assets to fuel your
Data Science and AI
practices
• Integrated document
previewers
• Categorize and understand where your key business entities exist across all assets
• Integrated unstructured document viewers
• Natural Language Understanding extracts the
key concepts from the document to be used
in analytics and ML.
19. Intelligent Data Masking
Automatically mask sensitive data elements through policy activation
• Extensions to the policy
activation engine to
mask or anonymize data
at the point of
consumption
• Provides a personalized
extract of data based
upon company and
regulatory policies
allowing previously
unavailable assets to be
available for AI, Data
Science and Analytics
• Embedded within
Catalog and Refinery to
ensure extracts contain
anonymized data
• Author data anonymization rules
natively within the Policy Manager
• Select from 2 options to
anonymize data:
• Substitute
• Mask
• Apply data anonymization at the
point of use within the system
ensuring that users get value from
the information they are
permitted to see
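The two anonymization options named above, Substitute and Mask, can be sketched as simple transformations applied per column when a row is read. This is an interpretation for illustration only: the `POLICY` mapping, the function names and the exact behavior of each method are assumptions, not the Policy Manager's implementation.

```python
import hashlib

# Hypothetical policy: column name -> anonymization method.
POLICY = {"email": "substitute", "ssn": "mask"}


def mask(value, keep_last=4, fill="X"):
    """Hide all but the last `keep_last` alphanumeric characters, keeping separators."""
    total = sum(ch.isalnum() for ch in value)
    hidden = max(total - keep_last, 0)
    out = []
    for ch in value:
        if ch.isalnum() and hidden > 0:
            out.append(fill)
            hidden -= 1
        else:
            out.append(ch)
    return "".join(out)


def substitute(value, salt="demo-salt"):
    """Replace the value with a stable token, so equal inputs map to equal outputs."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return "anon_" + digest[:10]


def apply_policy(row, policy=POLICY):
    """Return a copy of `row` with policy-covered columns anonymized at read time."""
    methods = {"mask": mask, "substitute": substitute}
    return {
        col: methods[policy[col]](val) if col in policy else val
        for col, val in row.items()
    }
```

For example, `apply_policy({"name": "Ada", "ssn": "123-45-6789"})` leaves `name` untouched and returns the `ssn` as `XXX-XX-6789`. Because the transformation is applied when the data is consumed, the stored source data is unchanged.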
20. Data Quality
Build a trusted currency for data
• ML based data quality to
inform users quickly if an
asset is relevant for their
purpose.
• Optimized for large data sets to calculate 8 different quality dimensions
• Aggregate score used to
provide trust in quality of
data in a standardized
form across the
enterprise
• Provides drift analysis to capture and track how quality of the assets evolves over time.
• Detailed breakdown to further
inform the user which aspects
contributed to overall score.
• Auto classification confidence
scores and frequency analysis
provides further detailed
information on asset relevance
• Trusted currency calculated to
inform users at a glance if this
asset is useful.
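A simple way to picture an aggregate quality score and drift analysis is to average per-dimension scores and compare the result across snapshots. The sketch below assumes each dimension is scored between 0 and 1 and weighted equally; the dimension names are hypothetical, since the deck does not list the eight dimensions or how they are combined.

```python
from statistics import mean


def aggregate_score(dimensions):
    """Average per-dimension scores (each in [0, 1]) into one score."""
    if not dimensions:
        raise ValueError("at least one quality dimension is required")
    for name, score in dimensions.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"{name} score {score} is outside [0, 1]")
    return mean(list(dimensions.values()))


def drift(history):
    """Change in aggregate score between consecutive snapshots, oldest first."""
    scores = [aggregate_score(snapshot) for snapshot in history]
    return [round(later - earlier, 4) for earlier, later in zip(scores, scores[1:])]


# Hypothetical dimension names and values, for illustration only.
january = {"completeness": 0.9, "uniqueness": 1.0, "validity": 0.8}
february = {"completeness": 0.6, "uniqueness": 1.0, "validity": 0.8}
print(drift([january, february]))
```

A negative drift value signals that an asset's quality has declined since the previous snapshot, and the per-dimension scores show which aspect caused it.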
22. IBM’s Hybrid capabilities
IBM Watson Studio: Jupyter Notebooks, RStudio, Watson ML, Visual Recognition
IBM Data Refinery: Data Preparation, Integration
IBM Watson Knowledge Catalog
IBM Unified Governance: Information Governance Catalog
IGC DataSets Compatibility: Business & technical metadata
IGC-Catalog Sync: Bi-directional
Enterprise Data, External Data & Feeds
Data Engineer, Compliance Officer, CDO, Data Scientist, Citizen Data Scientist/Analyst, App Developer
IBM Industry Models
3rd Party metadata systems
Collect, Understand and Govern | Share, Activate, Curate
IBM Cloud Private | IBM Cloud
IBM DSX
23. Summary
Have you found what you are looking for yet?
Digital disruption means AI is paramount to remaining competitive
AI requires understanding and context of ever growing volumes of data and models
Self Service is key to unleashing the data science and analytics teams to innovate and power the AI needed to compete
An intelligent catalog provides the first step towards the AI journey
And it's really easy to get started today.