FUTUREWEI TECHNOLOGIES, INC
2. The Enterprise Uptake of AI
The Promise
• AI augmentation will recover 6.2 billion hours of worker productivity in 2021 (Gartner)
• AI will contribute $15.7 trillion to the global economy in 2030 and increase GDP by 14% (PwC)
• More than 80% of organizations view AI as a strategic opportunity (MIT Sloan)
The Reality
• Only a quarter of survey respondents have revenue-bearing AI projects in production (O’Reilly)
• A majority of companies spend less than one-tenth of their digital budget on AI (McKinsey)
• By 2022, just 15% of use cases leveraging AI and involving edge or IoT will be successful (Gartner)
Surprise! ML is not a major challenge in AI adoption
3. Challenges in AI Adoption
▪ Silos
• Functional silos hinder e2e integration
• Project silos result in duplicate efforts
▪ Skills
• Data scientists for building models
• Engineering skills in data, SW, and IT
▪ Infrastructure
• Disparate environments and tools
• Hybrid multicloud is the norm
▪ Data
• 4 V’s: volume, velocity, variety, veracity
• Data integration and wrangling
▪ Trust
• Regulatory compliance
• AI fairness and explainability
Source: Enterprise Strategy Group
4. The Issue of Data Quality
“Garbage In, Garbage Out” [George Fuechsel]
■ Data quality is a key challenge in AI
› Only 3% of companies’ data meets basic quality standards (Harvard Business Review)
› On average, 47% of newly created data records have at least one critical error (HBR)
› The estimated financial impact of poor data quality is $15M a year on average (Gartner)
› 60% of decision makers cite data quality as their top challenge when trying to deliver AI capabilities (Forrester)
■ Data quality problems may arise throughout the data pipeline
› Source problems
» Missing values
» Invalid values
» Inaccurate or uncertain data
» Duplicate or inconsistent records
» Hard deletes
» …
› Ingestion problems
» Uncoordinated schema change
» Change in data meaning
» Time zone inconsistency
» Stale data
» Unavailable data sources
» …
› Integration problems
» Discrepancy in data models
» Contradicting data values
» Differences in semantics
» Mismatched records
» Biased data
» …
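A few of the problems listed above (missing values, invalid values, time zone inconsistency, duplicate records) can be caught with simple rule-based checks. The following is a minimal sketch, not a complete validator: the field names (`id`, `age`, `updated_at`), the age range, and the sample records are hypothetical and chosen only for illustration.

```python
from collections import Counter
from datetime import datetime

# Hypothetical rule set for illustration; real rules depend on the dataset.
REQUIRED_FIELDS = ("id", "age", "updated_at")


def check_record(record):
    """Return a list of quality issues found in a single record (a dict)."""
    issues = []
    # Missing values: required field absent, None, or empty string.
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            issues.append(f"missing:{field}")
    # Invalid values: age outside an assumed plausible range.
    age = record.get("age")
    if isinstance(age, (int, float)) and not 0 <= age <= 130:
        issues.append("invalid:age")
    # Time zone inconsistency: timestamps must parse and carry an offset.
    ts = record.get("updated_at")
    if isinstance(ts, str) and ts:
        try:
            if datetime.fromisoformat(ts).tzinfo is None:
                issues.append("no_timezone:updated_at")
        except ValueError:
            issues.append("invalid:updated_at")
    return issues


def find_duplicate_ids(records):
    """Duplicate records: ids that appear more than once."""
    counts = Counter(r.get("id") for r in records if r.get("id") is not None)
    return sorted(k for k, n in counts.items() if n > 1)


def quality_report(records):
    """Count each kind of issue across all records."""
    report = Counter()
    for r in records:
        report.update(check_record(r))
    report["duplicate_ids"] = len(find_duplicate_ids(records))
    return dict(report)


SAMPLE = [
    {"id": 1, "age": 34, "updated_at": "2021-03-01T10:00:00+00:00"},
    {"id": 2, "age": -5, "updated_at": "2021-03-01T10:00:00"},
    {"id": 2, "age": None, "updated_at": "not a date"},
]
```

Calling `quality_report(SAMPLE)` tallies one issue of each kind for the sample. Checks like these address only source and ingestion problems; semantic mismatches and biased data need domain-specific review.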
5. The Issue of Insight Generation
“It Is in the Pipeline” [Idiom]
[Figure: example insight pipelines. Prediction: KNN, Naïve Bayes, Decision Tree, SVM, Combined Prediction. Social Media Processing: Video Analysis, Language Identification, Image Classification, French Translation, Chinese Translation. Image Processing: Cropping & Resizing, AlexNet, ResNet-50, ResNet-152.]
■ The need for insight pipelines
› Data pre-processing
› Model ensemble
› Model cascade
› Control logic
› Data post-processing
■ Benefits of a pipeline approach
› Accuracy
› Throughput
› Robustness
› Fairness
› Simplicity
› Cost efficiency
■ Real-time insight against high-speed data streams
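The pipeline elements listed above (pre-processing, model ensemble, model cascade, control logic, post-processing) can be sketched in a few lines. This is an illustrative toy, not Wenju's implementation: the three cheap "models" and the `expensive_model` are hypothetical rule-based stand-ins for trained models, and the agreement threshold is an assumption.

```python
from collections import Counter


def preprocess(text):
    """Data pre-processing: normalize raw input before inference."""
    return text.strip().lower()


# Hypothetical stand-ins for trained models; each maps text to a label.
def keyword_model(text):
    return "positive" if "good" in text else "negative"


def length_model(text):
    return "positive" if len(text) > 20 else "negative"


def punctuation_model(text):
    return "positive" if "!" in text else "negative"


def expensive_model(text):
    """Hypothetical slower second-stage model used by the cascade."""
    return "positive" if text.count("good") >= text.count("bad") else "negative"


def ensemble(models, text):
    """Model ensemble: majority vote, returning (label, share of votes)."""
    votes = Counter(m(text) for m in models)
    label, count = votes.most_common(1)[0]
    return label, count / len(models)


def insight_pipeline(raw, threshold=0.99):
    text = preprocess(raw)
    cheap_models = [keyword_model, length_model, punctuation_model]
    label, agreement = ensemble(cheap_models, text)
    # Control logic + model cascade: escalate only when the cheap models disagree.
    if agreement < threshold:
        label = expensive_model(text)
        stage = "cascade"
    else:
        stage = "ensemble"
    # Data post-processing: shape the result for downstream consumers.
    return {"label": label.upper(), "stage": stage}
```

Inputs on which the cheap models agree never reach the second stage, which is where the throughput and cost benefits of a cascade come from.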
6. The Issue of Hybrid Multicloud
“Variety, Multiplicity Are the Two Most Powerful Vehicles of Lust” [M de Sade]
■ Enterprises are adopting a hybrid multicloud strategy
■ AI use cases in hybrid multicloud
› Train with proprietary data on-prem and deploy on public cloud
› Train on public cloud with specialized hardware (GPUs) and deploy on-prem
› Train on public or private cloud and deploy on IoT devices or the edge for privacy and scalability
› Input data distributed across public cloud and different organizations
› Collaborating models deployed across multiple clouds due to data residency or performance constraints
■ Implications: model portability, geofencing, end-to-end management, performance
# of Clouds | Average | Median
Using | 3.4 | 3.0
Experimenting | 1.5 | 1.0
Total | 4.9 | 4.0
Source: RightScale
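Geofencing and hardware constraints in the use cases on this slide amount to filtering candidate deployment sites. A minimal sketch follows; the `Site` type, the site names, and the region codes are hypothetical and do not describe any real platform API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    name: str
    region: str    # where the site physically resides (hypothetical codes)
    has_gpu: bool  # whether specialized hardware is available


def eligible_sites(sites, allowed_regions, needs_gpu=False):
    """Return sites that satisfy the geofence and the hardware requirement."""
    return [
        s for s in sites
        if s.region in allowed_regions and (s.has_gpu or not needs_gpu)
    ]


# Hypothetical inventory spanning on-prem, public cloud, and edge.
SITES = [
    Site("on-prem-dc", "eu", has_gpu=False),
    Site("public-cloud-a", "us", has_gpu=True),
    Site("public-cloud-b", "eu", has_gpu=True),
    Site("edge-gateway", "eu", has_gpu=False),
]
```

For example, `eligible_sites(SITES, {"eu"}, needs_gpu=True)` keeps only the EU site with GPUs, which corresponds to training under a data residency constraint.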
7. The Issue of AI Infrastructure
“Automation Is Good, If You Know Where to Put the Machine” [E Goldratt]
■ Provisioning
› Rapid spin-up of various environments
› Flexible deployment of data and ML tools
› Fast-paced algorithmic experimentation
■ Heterogeneity
› Variety in data and ML frameworks
› Disparate and distributed data sources
› Hybrid multicloud
■ Performance
› Application-specific SLOs
› Real-time live data streams
› Processing and transfer of big data
■ Cost
› Resource over-provisioning
› Sticky resource allocation
› Dependence on the IT team
8. The Issue of Fairness
“What Is Difficult Is Being Just” [Victor Hugo]
■ Fairness is an increasingly important concern for ML models
› High-stakes applications: hiring, lending, recidivism prediction, …
› Biased ML models hurt minority or historically disadvantaged groups
› Model bias arises from bias in training data
■ Fairness is a complex concept
› Many definitions, with very different outcomes
» 21 mathematical definitions by A. Narayanan
› Impossible to satisfy all definitions of fairness at the same time
■ Bias mitigation approaches
› Pre-processing: modify the input data
› In-processing: modify the machine learning algorithm
› Post-processing: modify the output of a model
The Impossibility Theorem: Predictive Rate Parity, False-Positive Rate Balance, and False-Negative Rate Balance cannot all hold at the same time except in degenerate cases [Chouldechova 2017; Kleinberg et al. 2017].
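The conflict can be seen from an identity that follows from the definitions of the three rates and appears in Chouldechova (2017). The symbols are assumptions introduced here, not taken from the slide: p is the prevalence (base rate) of the positive class within a group, PPV is the positive predictive value (the quantity equalized by predictive rate parity), and FPR and FNR are the false-positive and false-negative rates.

```latex
\mathrm{FPR} \;=\; \frac{p}{1-p}\cdot\frac{1-\mathrm{PPV}}{\mathrm{PPV}}\cdot\bigl(1-\mathrm{FNR}\bigr)
```

If two groups share the same PPV and FNR but have different base rates p, the right-hand side differs between them, so their FPRs cannot be equal. The exceptions are the degenerate cases, such as equal base rates or a perfect predictor.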
Some definitions of fairness
› Unawareness: protected attributes are excluded from features in training
› Individual fairness: similar individuals receive similar treatments or outcomes
› Group fairness: groups defined by protected attributes receive similar treatments or outcomes
› Predictive rate parity: the fraction of correct positive predictions should be the same across groups
› False-positive rate balance: false-positive rates should be the same across groups
› False-negative rate balance: false-negative rates should be the same across groups
› Counterfactual fairness: decision remains the same even if a protected attribute assumes a counterfactual value
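Three of the group-level definitions above can be checked directly from predictions. The sketch below computes, per group, the fraction of correct positive predictions (PPV), the false-positive rate, and the false-negative rate, then reports the largest gap between groups. The function names and the toy labels are hypothetical; binary 0/1 labels are assumed.

```python
def _ratio(num, den):
    """Safe division; None when the rate is undefined for a group."""
    return num / den if den else None


def group_rates(y_true, y_pred, groups):
    """Per-group PPV, FPR and FNR from binary (0/1) labels and predictions."""
    counts = {}
    for t, p, g in zip(y_true, y_pred, groups):
        c = counts.setdefault(g, {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
        # "t"/"f": prediction correct or not; "p"/"n": predicted class.
        key = ("t" if t == p else "f") + ("p" if p == 1 else "n")
        c[key] += 1
    return {
        g: {
            "ppv": _ratio(c["tp"], c["tp"] + c["fp"]),  # predictive rate parity
            "fpr": _ratio(c["fp"], c["fp"] + c["tn"]),  # false-positive rate balance
            "fnr": _ratio(c["fn"], c["fn"] + c["tp"]),  # false-negative rate balance
        }
        for g, c in counts.items()
    }


def max_gap(rates, metric):
    """Largest difference in one metric across groups (0 means parity)."""
    values = [r[metric] for r in rates.values() if r[metric] is not None]
    return max(values) - min(values)


# Toy data: four individuals in group A, four in group B.
Y_TRUE = [1, 1, 0, 0, 1, 1, 1, 0]
Y_PRED = [1, 0, 1, 0, 1, 1, 1, 1]
GROUPS = ["A", "A", "A", "A", "B", "B", "B", "B"]
```

On the toy data, `group_rates(Y_TRUE, Y_PRED, GROUPS)` shows the two groups differing on all three metrics. A gap of zero on one metric says nothing about the others.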
9. The Issue of Explainability
“Trust Takes Years to Build and Seconds to Break” [Proverb]
■ Explainability is key to establishing trust in AI
› Help ensure safety in complex and critical tasks
› Discover unintended bias and guard against discrimination
› Identify mismatched objectives to avoid suboptimal decisions
› Gain understanding to improve results
■ Different personas impose different requirements
› Government auditors
› Consumers
› Data subjects
› Domain experts
› Application developers
■ Most model explanations assume a black-box model
[Figure: taxonomy of explanations.]
Understanding Data
› Explanations as samples: samples in terms of prototypes and criticisms
› Explanations as features: disentangled meaningful features
Understanding a Model (Direct vs. Post hoc Explanations; Local vs. Global Explanations)
› Self-explaining model: persona-specific explanations
› Intrinsic model: easy-to-understand rules
› Explanations based on samples: counterfactual instances
› Explanations based on features: anchor explanations, contrastive explanations
› Surrogate model: learning a new interpretable model
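A counterfactual instance, one of the sample-based explanations named on this slide, answers "what is the smallest change that would flip the decision?" while treating the model as a black box. The sketch below is a brute-force illustration only: the `black_box` scoring rule, the feature names, and the candidate grids are hypothetical, and each feature is varied independently.

```python
def black_box(applicant):
    """Hypothetical opaque model: approves when a score passes a cutoff."""
    score = 0.5 * applicant["income"] / 1000 + 2.0 * applicant["years_employed"]
    return "approve" if score >= 40 else "deny"


def counterfactuals(model, instance, candidates):
    """For each feature, find the closest candidate value that flips the output.

    Only the model's predictions are used, never its internals.
    Returns {feature: nearest flipping value}; features that cannot flip
    the prediction within their candidate grid are omitted.
    """
    original = model(instance)
    result = {}
    for feature, values in candidates.items():
        flips = [v for v in values if model({**instance, feature: v}) != original]
        if flips:
            result[feature] = min(flips, key=lambda v: abs(v - instance[feature]))
    return result


APPLICANT = {"income": 40000, "years_employed": 5}
# Hypothetical search grids for each feature.
CANDIDATES = {
    "income": range(40000, 100001, 5000),
    "years_employed": range(5, 21),
}
```

`counterfactuals(black_box, APPLICANT, CANDIDATES)` yields statements of the form "the application would have been approved with an income of X" that a consumer or data subject can act on.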
10. The Issue of CICD
“Continuous Improvement Is Better Than Delayed Perfection” [Mark Twain]
Traditional Applications | AI Applications
Development artifacts are just code | Development artifacts include code, datasets, and models
Requires software engineering and IT skills | Requires data engineering, software engineering, ML and IT skills
Integration triggered by code change | Integration triggered by code/data/model change
Integration and delivery through a single CICD process | Data, model and insight pipelines are integrated and delivered separately
Artifacts change at relatively low frequency | Artifacts change continually due to data and model shifts
Small number of artifact versions that evolve linearly | Huge number of nonlinear artifact versions due to model tuning / retraining
Application configurations specified by humans | Model parameters obtained from training
Monitoring of application performance | Monitoring of pipeline performance, data and model quality
[Figure: CICD Process with stages Develop, Integrate, Deliver, Execute and a Dev Artifact Repository; one element is labeled “???”.]
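The row "Integration triggered by code/data/model change" implies that a CICD system for AI must detect changes in datasets and models as well as code. One common way to do this is content fingerprinting. The sketch below is a minimal illustration under assumptions: the artifact names and the convention that the path prefix (`code/`, `data/`, `model/`) identifies the artifact kind are hypothetical.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Content hash used to tell whether an artifact changed."""
    return hashlib.sha256(content).hexdigest()


def changed_artifacts(previous, current):
    """Compare {name: fingerprint} maps; return names that are new or modified."""
    return sorted(
        name for name, digest in current.items()
        if previous.get(name) != digest
    )


def changed_kinds(previous, current):
    """Artifact kinds (assumed to be the path prefix) that need re-integration."""
    return sorted({name.split("/", 1)[0] for name in changed_artifacts(previous, current)})


# Hypothetical snapshots of an artifact repository at two points in time.
PREVIOUS = {
    "code/serve.py": fingerprint(b"v1"),
    "data/train.csv": fingerprint(b"rows-1"),
    "model/weights.bin": fingerprint(b"w1"),
}
CURRENT = dict(PREVIOUS, **{"data/train.csv": fingerprint(b"rows-2")})
```

Here only the dataset changed, so `changed_kinds(PREVIOUS, CURRENT)` reports the data kind alone. This sketch does not detect deleted artifacts, and it does not decide which pipelines to rerun.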
11. Wenju: A First-Of-A-Kind Enterprise AI Platform
■ Business Processes
■ Data Pipelines: processes for producing datasets used by machine learning
■ Model Pipelines: processes for training and optimizing machine learning models
■ Insight Pipelines: processes for deriving business insight through analytics and inference
■ Infrastructure Management
■ Data & Model Governance
■ Continuous Integration & Delivery
■ Hybrid Multicloud Enablement
■ Integrated Platform for Entire Solution Team
12. Conceptual Architecture
■ Portal, Dashboard
■ Pipeline Lifecycle Management
› Define, Build, Test, Deploy, Operate
› Data Pipelines, Model Pipelines, Insight Pipelines
■ Asset Management
› Catalog, Traceability, Trust, Quality, Privacy
› Dataset Repository, Model Repository, Pipeline Repository
■ Infrastructure Management
› Provision, Scale, Install, Optimize, Update
› Data Integration Tools, Machine Learning Tools, Data Analytics Tools
› Bare Metal, Virtual Machines, Docker Containers, Serverless Functions
› On-Prem, Private Cloud, Public Cloud, Edge, IoT Devices
■ Metadata Repository
■ Service Management
› Tenants, Users, Projects, Security, Resilience, Metering, Monitoring
13. Wenju’s Responses to Challenges in Enterprise AI
Wenju Features
■ Integrated and consistent experience for the entire solution team
■ Simplified development and operations via policies and templates
■ Turnkey infrastructure provisioning and SLO-driven optimization
■ Enablement of pipeline distribution across hybrid multicloud
■ Real-time prediction pipeline against high-speed data streams
■ Unified management and governance of data and models
Challenges
▪ Silos
• Functional silos hinder e2e integration
• Project silos result in duplicate efforts
▪ Skills
• Data scientists for building models
• Engineering skills in data, SW, and IT
▪ Infrastructure
• Disparate environments and tools
• Hybrid multicloud is the norm
▪ Data
• 4 V’s: volume, velocity, variety, veracity
• Data integration and wrangling
▪ Trust
• Regulatory compliance
• AI fairness and explainability