Keynote acm10.14.2017

Data Science: Let’s Cut the Hype & Measure its Enterprise Value
Keynote Talk at the ACM SF Bay Area Chapter’s Data Science Camp by Alo Ghosh
14th October 2017 at the PayPal Town Hall, San Jose, CA 95131.
10/28/2017 1
SYNOPSIS
❖ While we, in Silicon Valley, chase the latest and most complex optimization algorithms to beat patterns out of nonlinear, non-
convex, multi-layered neural networks at our hallowed cathedral of Deep Learning (DL), enterprises that create the economic
value in our society are struggling with “elementary” issues mired in buzzwords like Data Lakes and Data Wrangling/Munging.
❖ Enterprises large and small are also facing a crippling shortage of well-trained Data Analysis and Machine Learning (ML) talent
amidst a storm of hype, FOMO and soothsayers that push them to invest heavily in this new incarnation of Data Science. Data
analysts, trained in wrangling tools and business requirements, need to be deployed as armies that replace old BI-DW hands.
❖ Enterprise (internal) data are predominantly unstructured, and these data islands remain invisible to outsiders. IBM Watson’s
strategy is to be the integrator of these datasets that others have far less access to. But Watson’s “cognitive computing” model
remains error-prone and faces competition from Microsoft & Google. Meanwhile, ML-DL’s impact on enterprise data is minimal.
❖ DL is edging out classical ML and becoming the ‘king of Data Science’. But DL today has severe limitations that will morph over
time. Till then, remember: (a) it is yet another pattern recognition tool - it just works very well with big data and GPUs; (b) its
learning is restricted to its data set only; (c) causality hinges on counterfactual tests; (d) its impact on enterprise is unproven.
❖ Many great minds over the past decades have created business-economics MODELS like game theory, option theory, capital asset
pricing, oligopolistic/monopolistic strategies, business valuation, business risk management, supply chain optimization, asset-
liability management, derivatives based hedging, et al. Enterprise ML-DL must blend these MODELS with the DATA they mine.
If “thought is a dance of vectors”, let’s first collect-prep-train-visualize the data to be vectorized, then predict the future by running
trained models in the context of rigorous business-economics rules, and finally test for causality using myriad counterfactuals.

TODAY’S PAUCITY OF WELL-TRAINED ML-DL TALENT IS DUE TO VC FUNDING, ACQUI-HIRING & ACADEMIC-POACHING
SOLUTION: LARGE SCALE EDUCATION & TRAINING, WELL BEYOND ACADEMIA
“To Find AI Engineers, Google and Facebook Hire Their Professors”. BI.
“Universities’ AI Talent Poached by Tech Giants”. WSJ.
“As Silicon Valley fights for talent, universities struggle to hold on to their stars”. Economist.
“The War For Talent Is Over, And Everyone Lost”. Fast Company.
“IBM Predicts Demand For Data Scientists Will Soar 28% By 2020”.

SOLUTION: TEACH COMPUTING STUDENTS MORE MATH, STATISTICS, PROGRAMMING & BUSINESS DOMAIN KNOWLEDGE
TRAINING ML-DL TALENT FOR ENTERPRISE IDEALLY REQUIRES RIGOROUS COMPUTER SCIENCE BS + STRONG MBA
?

WHILE THERE IS NOW A PLETHORA OF HIGH QUALITY ONLINE COURSES, THEY WORK FOR ONLY 10% OF LEARNERS GLOBALLY
SOLUTION: THE WORLD’S BEST AI BRAINS TEACH ONLINE IN HYBRID CLASSROOMS EQUIPPED WITH THE BEST S/W, H/W & YOUNG COACHES
The Flex model lets students move on fluid schedules among learning
activities according to their needs. Online learning is the backbone of
student learning in a Flex model. Teachers provide support and
instruction on a flexible, as-needed basis while students work through
course curriculum and content. This model can give students a high
degree of control over their learning.
Udacity Connect: Accelerated Blended Learning Comes Home to California. 17 Aug 2017.
Bootcamp-level intensity, without quitting your day job. New York City, Saturdays, 2pm-5pm.

DATA PREP CONSUMES 80% OF DATA SCIENCE EFFORT BUT RELATED SKILLS ARE HARDLY TAUGHT = LACK OF DATA ANALYSTS
SOLUTION: TEACH DATA PREP USING DATA WRANGLING TOOLS LIKE TRIFACTA & VISUALIZATION TOOLS LIKE TABLEAU, BOTH ON GOOGLE CLOUD PLATFORM

DATA WRANGLING DRIVEN BY ENTERPRISE POLICIES IS NEEDED TO BEST MANAGE ENTERPRISE ‘DATA LAKES’
SOLUTION: LEVERAGE DATA VISUALIZATION DASHBOARDS WITH KEY ENTERPRISE END USERS TO UNDERSTAND THEIR NEEDS

STRATEGY CONSULTANTS ARE HYPING ML-DL TO ENTERPRISE CLIENTS WHO THEN WANT QUICK SOLUTIONS
OBSERVATION: CLIENTS CLAMOR FOR AI SOLUTIONS IN LIEU OF POWERPOINTS. CONSULTANTS ARE TRYING TO HIRE AI TALENT EN MASSE
“RESHAPING BUSINESS WITH ARTIFICIAL INTELLIGENCE”
“THE AGE OF ANALYTICS: COMPETING IN A DATA-DRIVEN WORLD”

ENTERPRISE ML-DL MODELING MUST INCLUDE THEIR VAST UNSTRUCTURED DATA TROVES & BUY TRAINED ML-DL MODELS
SOLUTION: WHILE IBM WATSON IS THE KING WITH ITS SLEW OF NLP ALGORITHMS & THEIR VARIANTS, MICROSOFT & GOOGLE ARE NOW IN THE FRAY
“IBM Watson is a technology platform that uses natural language processing and machine learning to reveal insights from large amounts of unstructured data”.
HAS DEEP LEARNING IMPROVED NLP???

AS DEEP LEARNING CONTINUES TO DOMINATE DATA SCIENCE, WE MUST BE COGNIZANT OF ITS CURRENT LIMITATIONS - 1.
OBSERVATIONS: DEEP LEARNING IS SIMPLY VECTORIZATION OF INFORMATION; IT IS BRITTLE, NOT GENERALIZABLE & HAS NO HUMAN-LEVEL UNDERSTANDING
❖ “In deep learning, everything is a vector, i.e. everything is a point in a geometric space. Model inputs (it could be text, images,
etc) and targets are first "vectorized", i.e. turned into some initial input vector space and target vector space. Each layer in a
deep learning model operates one simple geometric transformation on the data that goes through it. Together, the chain of
layers of the model forms one very complex geometric transformation, broken down into a series of simple ones. This complex
transformation attempts to map the input space to the target space, one point at a time. This transformation is parametrized
by the weights of the layers, which are iteratively updated based on how well the model is currently performing. A key
characteristic of this geometric transformation is that it must be differentiable, which is required in order for us to be able to
learn its parameters via gradient descent……. That's the magic of deep learning: turning meaning into vectors, into geometric
spaces, then incrementally learning complex geometric transformations that map one space to another. All you need are
spaces of sufficiently high dimensionality in order to capture the full scope of the relationships found in the original data.”
❖ “The space of applications that can be implemented with this simple strategy is nearly infinite. And yet, many more applications
are completely out of reach for current deep learning techniques—even given vast amounts of human-annotated data……
Anything that requires reasoning—like programming, or applying the scientific method—long-term planning, and
algorithmic-like data manipulation, is out of reach for deep learning models, no matter how much data you throw at them.”
❖ “Deep learning models do not have any understanding of their input, at least not in any human sense. Our own
understanding of images, sounds, and language, is grounded in our sensorimotor experience as humans—as embodied
earthly creatures. Machine learning models have no access to such experiences and thus cannot "understand" their inputs in
any human-relatable way…….. They were trained on a different, far narrower, task than the one we wanted to teach them: that
of merely mapping training inputs to training targets, point by point. Show them anything that deviates from their training
data, and they will break in the most absurd ways.” Francois Chollet, Google, creator of Keras.
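
The "chain of simple geometric transformations" in the quote above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `train_two_layer_net`, the layer sizes, the learning rate and the XOR toy data are all made up for this sketch and do not come from the talk or from Keras.

```python
import numpy as np

def train_two_layer_net(X, y, hidden=8, lr=0.5, steps=2000, seed=0):
    """Train a tiny two-layer network with plain gradient descent.

    Returns (predict, losses). All names and settings are hypothetical.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # The weights parametrize the geometric transformation.
    W1 = rng.normal(0.0, 1.0, (d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 1.0, (hidden, 1))
    b2 = np.zeros(1)

    def forward(inputs):
        h = np.tanh(inputs @ W1 + b1)             # layer 1: affine map + squashing
        p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))  # layer 2: affine map + sigmoid
        return h, p

    losses = []
    for _ in range(steps):
        h, p = forward(X)
        p_safe = np.clip(p, 1e-12, 1.0 - 1e-12)
        loss = -np.mean(y * np.log(p_safe) + (1 - y) * np.log(1 - p_safe))
        losses.append(float(loss))
        # Backpropagation: every step is differentiable, so the chain rule applies.
        dz2 = (p - y) / n
        dW2 = h.T @ dz2
        db2 = dz2.sum(axis=0)
        dz1 = (dz2 @ W2.T) * (1.0 - h ** 2)
        dW1 = X.T @ dz1
        db1 = dz1.sum(axis=0)
        # Gradient descent update of the weights (in place).
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2

    return (lambda inputs: forward(inputs)[1]), losses

# Toy "vectorized" inputs and targets: the XOR pattern.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0], [1.0], [1.0], [0.0]])
predict, losses = train_two_layer_net(X, y)
print("first loss:", losses[0], "last loss:", losses[-1])
```

The network only learns the mapping from these four training points to their targets, which is the narrow task the quote describes.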

AS DEEP LEARNING CONTINUES TO DOMINATE DATA SCIENCE, WE MUST BE COGNIZANT OF ITS CURRENT LIMITATIONS – 2.
OBSERVATION: FOR BUSINESS & ECONOMIC APPLICATIONS, WE MUST MOVE FROM CORRELATION (PREDICTION) → CAUSATION (INFERENCE)
❖ Machine learning, data mining, predictive analytics, etc. all use data to predict some
variable as a function of other variables:
o They may or may not care about insight, importance, patterns
o They may or may not care about inference--how Y changes as some X changes
❖ Econometrics: Use statistical methods for prediction, inference, causal modeling of
economic relationships.
o Hope for some sort of insight based on hypotheses, inference is a goal
o In particular, causal inference is the goal for economic decision making
❖ Causality:
o “More police in precincts with higher crime; does that mean that police cause
crime?” Policy decision: should we add more police to a given district?
o “Lots of people die in hospitals; are hospitals bad for your health?” Policy
decision: should I go to hospital for treatment?
o “Advertise more in December, sell more in December.” But what is the causal
impact of ad spending on sales? Policy decision: how much should I spend on
advertising?
❖ Counterfactuals and causality:
o Crime: It is likely that data was generated by a decision rule that said “add
more police to areas with high crime.” This may have reduced crime over
what it would have been, but these areas may still have had high crime.
o Hospital. If I go to a hospital, will I be better off than I would have been if I
didn’t go?
o Advertising. What would my sales have been if I had advertised less?
With a few notable exceptions, ML abstracts away
from the data generating mechanism, and hence sees
the data as raw material from which predictions are to
be extracted. ML generally lacks the vocabulary to
capture the distinction between observational data
and randomized data that statistics finds crucial.
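
The advertising example can be made concrete with a small simulation. This is a sketch under invented assumptions: seasonal demand drives both ad spend and sales, the true causal effect of one unit of advertising is fixed at 2.0, and the function name `ad_effect_estimates` and all coefficients are hypothetical.

```python
import numpy as np

def ad_effect_estimates(n=20000, true_effect=2.0, seed=0):
    """Return (naive_slope, randomized_slope) for sales regressed on ad spend."""
    rng = np.random.default_rng(seed)
    demand = rng.normal(size=n)  # hidden confounder, e.g. the December season

    # Observational data: the decision rule "advertise more when demand is high".
    ads_observed = demand + rng.normal(size=n)
    sales_observed = true_effect * ads_observed + 5.0 * demand + rng.normal(size=n)

    # Randomized data: ad spend is set independently of demand (an experiment).
    ads_randomized = rng.normal(size=n)
    sales_randomized = true_effect * ads_randomized + 5.0 * demand + rng.normal(size=n)

    def slope(x, y):
        # Ordinary least-squares slope of y on x.
        return float(np.cov(x, y)[0, 1] / np.var(x, ddof=1))

    return slope(ads_observed, sales_observed), slope(ads_randomized, sales_randomized)

naive, randomized = ad_effect_estimates()
print("naive (correlational) estimate:", round(naive, 2))
print("randomized (causal) estimate:", round(randomized, 2))
```

Under these assumptions the naive slope overstates the effect of advertising because it also absorbs the seasonal demand, while the randomized slope stays close to the assumed true effect. The two datasets differ only in how ad spend was decided, which is the observational versus randomized distinction raised above.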

❖ While perception tasks such as visual object recognition and text understanding play an important role in human intelligence, the subsequent
tasks that involve inference, reasoning and planning require an even higher level of intelligence. The past few years have seen major advances
in many perception tasks using deep learning models. For higher-level inference, however, probabilistic graphical models with their Bayesian
nature are still more powerful and flexible.
❖ We know that DL models are black boxes: it is hard to explain why they predict well. These shortcomings can be addressed by invoking
Bayesian neural network models that:
o Capture stochastic processes underlying the observed data sets
o Can use the vast Bayesian statistics knowledge base
o Can be explained by mathematically rigorous theory
o Can be extended in a principled way
o Can be combined with Bayesian models / techniques in a practical way
o Have uncertainty estimates built in
❖ To achieve integrated intelligence that involves both perception and inference, it is naturally desirable to tightly integrate deep learning and
Bayesian models within a principled probabilistic framework, which we call Bayesian Deep Learning. In this unified framework, the perception of
text or images using deep learning can boost the performance of higher-level inference and in return, the feedback from the inference process is
able to enhance the perception of text or images.
❖ While Bayesian networks have known scalability issues, they can serve the very
important function of blending in business-economics models that are used widely in
enterprise decision making, e.g.:
o Game theory
o Option theory
o Asset pricing theory
o Risk management models
o Supply chain optimization
o Corporate strategy and valuation
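
To illustrate what "uncertainty estimates built in" means, here is a minimal sketch that uses Bayesian linear regression as a much simpler stand-in for a Bayesian neural network. It is not the method of the talk; the function names, the prior precision `alpha`, the noise precision `beta` and the toy data are assumptions chosen for the example.

```python
import numpy as np

def bayes_linreg_fit(x, y, alpha=1.0, beta=25.0):
    """Posterior mean and covariance of the weights [intercept, slope].

    alpha: precision of the zero-mean Gaussian prior on the weights (assumed).
    beta:  precision of the Gaussian observation noise (assumed known).
    """
    Phi = np.column_stack([np.ones_like(x), x])
    S = np.linalg.inv(alpha * np.eye(2) + beta * Phi.T @ Phi)
    m = beta * S @ Phi.T @ y
    return m, S

def bayes_linreg_predict(x_new, m, S, beta=25.0):
    """Predictive mean and standard deviation at new inputs."""
    Phi = np.column_stack([np.ones_like(x_new), x_new])
    mean = Phi @ m
    var = 1.0 / beta + np.sum((Phi @ S) * Phi, axis=1)
    return mean, np.sqrt(var)

# Toy data on [-1, 1]: y = 0.5 + 2x + noise with standard deviation 0.2.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 30)
y = 0.5 + 2.0 * x + rng.normal(0.0, 0.2, size=x.shape)

m, S = bayes_linreg_fit(x, y)
mean, std = bayes_linreg_predict(np.array([0.0, 10.0]), m, S)
print("predictive std inside the data range:", std[0])
print("predictive std far outside it:", std[1])
```

The predictive standard deviation grows as the query moves away from the training data, so the model reports how much to trust each prediction instead of returning a bare point estimate.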
TO ENABLE RELEVANT ENTERPRISE ML-DL MODELS WE MUST BLEND ENTERPRISE DATA WITH BUSINESS-ECONOMICS MODELS
OBSERVATION: FOR BUSINESS & ECONOMIC APPLICATIONS, WE MUST MOVE FROM PERCEPTION TO INFERENCE. ONE OPTION IS BAYESIAN DEEP LEARNING
Bayesian optimization is an effective methodology for the
global optimization of functions with expensive evaluations.
It relies on querying a distribution over functions defined by
a relatively cheap surrogate model. An accurate model for
this distribution over functions is critical to the effectiveness
of the Bayesian approach, and is typically fit using Gaussian
processes (GPs). However, since GPs scale cubically with the
number of observations, it has been challenging to handle
objectives whose optimization requires many evaluations,
and as such, massively parallelizing the optimization.
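
The cubic scaling mentioned above comes from factorizing the n-by-n kernel matrix. A minimal Gaussian process regression sketch, assuming a unit-variance RBF kernel and a small noise term (both chosen for illustration, with hypothetical function names), shows where that cost arises:

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Unit-variance RBF (squared exponential) kernel between 1-D inputs."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * d2 / length_scale ** 2)

def gp_posterior(x_train, y_train, x_test, noise_var=1e-4):
    """Posterior mean and variance of a zero-mean GP at x_test."""
    K = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    L = np.linalg.cholesky(K)  # the O(n^3) step in the number of observations
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    K_s = rbf_kernel(x_train, x_test)
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = 1.0 - np.sum(v ** 2, axis=0)  # prior variance is 1 for this kernel
    return mean, np.maximum(var, 0.0)

x_train = np.linspace(0.0, 5.0, 8)
y_train = np.sin(x_train)
x_test = np.array([x_train[3], 20.0])  # one observed point, one far-away point
mean, var = gp_posterior(x_train, y_train, x_test)
print("variance at an observed point:", var[0])
print("variance far from the data:", var[1])
```

With eight observations the factorization is trivial, but doubling the number of observations multiplies its cost by roughly eight, which is why objectives that need many evaluations are hard for GP-based Bayesian optimization.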

TO ENABLE ENTERPRISE DEEP LEARNING WE MUST BLEND BUSINESS DATA WITH BUSINESS-ECONOMICS MODELS
OBSERVATION: USING GAUSSIAN PROCESSES, WE CAN MODEL THE MANY
APPLICATIONS OF ‘OPTION THEORY’ IN BUSINESS & ECONOMICS
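
The slide gives no formula. As a point of reference only, the textbook instance of option theory is the Black-Scholes price of a European call, sketched below. Choosing this model, the function names and the example inputs are assumptions of this sketch, not something stated in the talk; a data-driven model such as a Gaussian process would typically be compared against a baseline of this kind.

```python
from math import erf, exp, log, sqrt

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def black_scholes_call(spot, strike, rate, vol, maturity):
    """Black-Scholes price of a European call on a non-dividend-paying asset."""
    d1 = (log(spot / strike) + (rate + 0.5 * vol ** 2) * maturity) / (vol * sqrt(maturity))
    d2 = d1 - vol * sqrt(maturity)
    return spot * norm_cdf(d1) - strike * exp(-rate * maturity) * norm_cdf(d2)

# Illustrative inputs: at-the-money, 5% rate, 20% volatility, one year.
price = black_scholes_call(spot=100.0, strike=100.0, rate=0.05, vol=0.2, maturity=1.0)
print(round(price, 4))
```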

OBSERVATION: ANOTHER FIRMLY ESTABLISHED PARADIGM IN BUSINESS-
ECONOMICS IS ‘GAME THEORY’ & ITS FAMOUS CONCEPT OF ‘NASH
EQUILIBRIUM’. THIS PARADIGM PROVIDES CRUCIAL STRATEGIC DECISION
SUPPORT TO OLIGOPOLISTIC & MONOPOLISTIC ENTERPRISES & TO
GOVERNMENTS THE WORLD OVER, PARTICULARLY IN PRICING DECISIONS
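
A Nash equilibrium in a pricing decision can be illustrated with a two-firm game in which each firm chooses a HIGH or LOW price. The payoff numbers and the helper name `pure_nash_equilibria` below are invented for this sketch; the function simply keeps the cells where neither firm gains by changing its own price unilaterally.

```python
import numpy as np

def pure_nash_equilibria(payoff_a, payoff_b):
    """Return all pure-strategy Nash equilibria (row, col) of a two-player game.

    payoff_a[i, j] is firm A's payoff and payoff_b[i, j] is firm B's payoff
    when A plays row i and B plays column j.
    """
    payoff_a = np.asarray(payoff_a)
    payoff_b = np.asarray(payoff_b)
    # A's best responses to each column, B's best responses to each row.
    best_a = payoff_a == payoff_a.max(axis=0, keepdims=True)
    best_b = payoff_b == payoff_b.max(axis=1, keepdims=True)
    return [(int(i), int(j)) for i, j in np.argwhere(best_a & best_b)]

# Strategy 0 = HIGH price, 1 = LOW price (hypothetical profits).
profit_a = [[10, 2],
            [12, 5]]
profit_b = [[10, 12],
            [2, 5]]
equilibria = pure_nash_equilibria(profit_a, profit_b)
print(equilibria)  # (1, 1): both firms price LOW
```

In this made-up duopoly both firms would earn more at (HIGH, HIGH), yet undercutting is each firm's best response, so the only equilibrium is the price war.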

TO ENABLE RELEVANT ENTERPRISE ML-DL MODELS WE MUST BLEND ENTERPRISE DATA WITH BUSINESS-ECONOMICS MODELS
OBSERVATION: ADVERSARIAL NETWORKS & GAME THEORY
❖ Game Theory reveals a future direction of Deep Learning: This makes intuitive sense for two reasons. The first intuition is that DL systems will
eventually need to tackle situations with imperfect knowledge. In fact, we’ve already seen this in DeepMind’s AlphaGo that uses partial
knowledge to tactically and strategically best the world’s best human player in the game of Go. The second intuition is that systems will not remain
monolithic as they are now, but rather would involve multiple coordinating (or competing) cliques of DL systems. We actually already do see this
now in the construction of Adversarial Networks.
❖ Adversarial networks consist of two competing neural networks, a generator and a discriminator: the former tries to generate fake images while the
latter tries to tell them apart from real images. The interesting feature of these systems is that a closed form loss function is not required. In fact, some
systems have the surprising capability of discovering their own loss function! A disadvantage of adversarial networks is that they are difficult to train.
Adversarial learning consists in finding a Nash equilibrium to a two-player non-cooperative game. Yann LeCun calls adversarial networks “the
coolest idea in machine learning in the last twenty years”.
❖ The classical view of machine learning is that the problem can be cast as an optimization problem where all that is needed are algorithms that
are able to search for an optimal solution. However, with machine learning, we want to build machines that don’t overfit the data but rather are
able to perform well on data that they have yet to encounter. We want these machines to make predictions about the unknown. This requirement,
which is called generalization, is very different from the classical optimization problem. It is very different from the classical dynamics problem
where all information is expected to be available. That is why a lot of the engineering in deep learning requires additional constraints on the
optimization problem. These are called ‘priors’ in some texts and also called regularizations in an optimization problem.
❖ Where do these regularizations come from and how can we select a good regularization? How do we handle partial information? This is
where a game theoretic viewpoint becomes important. Generalization is sometimes referred to as ‘structural risk minimization’. In other words,
we build mechanisms to handle generalization using strategies similar to how parties mitigate risk. So we have actually returned full circle. Game
theory is described as “the study of mathematical models of conflict and cooperation between intelligent rational decision-makers.” In our quest
of understanding learning machines, we end up with mathematics that was meant for the study of the interactions of intelligent beings.
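
The link between 'priors' and regularization can be seen in the simplest case, ridge regression: adding an L2 penalty to least squares gives the same solution as a maximum a posteriori estimate under a zero-mean Gaussian prior on the weights. The sketch below uses made-up data and a hypothetical helper `ridge_fit`; it is an illustration of that one constraint, not of deep learning practice in general.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Least squares with an L2 penalty lam * ||w||^2 (lam = 0 gives plain OLS)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Synthetic data: 40 noisy observations of a 10-dimensional linear model.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 10))
w_true = rng.normal(size=10)
y = X @ w_true + rng.normal(0.0, 1.0, size=40)

w_ols = ridge_fit(X, y, lam=0.0)
w_ridge = ridge_fit(X, y, lam=10.0)
print("norm of unconstrained weights:", np.linalg.norm(w_ols))
print("norm of regularized weights:", np.linalg.norm(w_ridge))
```

The penalty pulls the weights toward zero, trading a slightly worse fit on the training data for a solution that depends less on its noise, which is the generalization requirement described above.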

OBSERVATION: PER STRATEGY CONSULTANTS, THE ULTIMATE DREAM APPLICATION OF AI IN THE ENTERPRISE WORLD IS WHAT
BCG HAS CALLED “THE STRATEGY MACHINE”, WITH AMAZON AS THE SHINING EXAMPLE OF ITS IMPLEMENTATION IN REALITY.

CONCLUSION
❖ LET’S TRAIN MORE DATA ANALYSTS USING DATA WRANGLING & VISUALIZATION TOOLS IN REAL-
LIFE BUSINESS CONTEXTS. THE VALUE DERIVED BY ENTERPRISE CLIENTS WOULD BE IMMEDIATE
& LONGER-TERM BY HELPING POINT TO AREAS WHERE MACHINE LEARNING CAN BE REALLY
USEFUL. THIS IS ALL EXTREMELY CONTEXT-SPECIFIC & CANNOT BE GENERALIZED TO THEORY.
❖ LET’S ARM OURSELVES WITH THE BEST AI TOOLS & PLATFORMS THAT CAN ALSO HANDLE THE
PROFUSION OF UNSTRUCTURED DATA IN EVERY ENTERPRISE, ALONG WITH FULL KNOWLEDGE
OF THEIR POWER & LIMITATIONS TO SOLVE REAL-LIFE PROBLEMS IN BUSINESS CONTEXTS.
❖ LET’S BE PRINCIPALLY DATA-DRIVEN IN OUR ANALYTIC ENDEAVORS BUT ALSO PAY ATTENTION
TO TIME-TESTED & NOBEL PRIZE-WINNING MODELS IN ECONOMICS & FINANCE THAT
HAVE COME TO GUIDE GENERATIONS OF ENTERPRISES IN VALUE CREATION.
❖ THEN WE CAN LEVERAGE THE WIDELY ACCEPTED VALUATION TOOLS TO MEASURE
THE CONTINUING VALUE-ADD OF OUR DATA SCIENCE PROJECTS & VENTURES.
