Machine Learning in Banks
By Abhishek Upadhyay
Will machines replace you? The simple answer is “No”.
Machine Learning: the intersection of programming, statistics, and data
Machines that can think and work like humans, yet are free from human errors and biases, harness big data and high computational power, and run on highly sophisticated algorithms are the next frontier in the technological arms race.
Machine Learning: Approach
• “Machine learning is a method of data analysis that automates analytical model
building. Using algorithms that iteratively learn from data, machine learning
allows computers to find hidden insights without being explicitly programmed
where to look.” – SAS
• Let the machine learn how best to achieve the desired output. Data is processed in a way that lets the machine detect and learn from patterns, predict future activity, and make decisions. Data is the fuel: the more data, the better the learning.
The 19th century also saw one of the most influential data visualizations (a map) of all time, produced by John Snow. Snow disproved the then-dominant miasma theory, which held that diseases such as cholera were caused by noxious “bad air”.
In 1854, a cholera outbreak in London killed many people. Snow collected data on all the deaths, analysed it, and plotted every death on a map. He used a dot map to illustrate the cluster of cholera cases around the pump. The map shows the locations of the 13 public wells in the area and 578 cholera deaths, mapped by home address, around the Soho district. The map made it evident that the cases were clustered around the pump on Broad Street (now Broadwick Street). The map was part of Snow's detailed statistical analysis, from which he theorized that cholera was spread through contaminated water.
Quality data and powerful visualisation result in the right interpretation.
And finally, intervention: a new sewer system.
How to measure Data Quality
• 5 R’s of Data Quality
  • Relevancy: Is the data collected relevant to the problem we are solving?
  • Recency: How recently was the data generated?
  • Range: How narrow or wide is the scope of the data?
  • Robustness: What is the signal-to-noise ratio?
  • Reliability: How accurate is the data we are working with?
• Ask questions:
  • What is my target data quality?
  • What if only 50% of the data is available, and how sensitive is the analysis to missing data?
  • What signals will be used to understand the quality of the data?
  • Ephemeral vs. durable
  • Noise vs. bias
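Some of the 5 R's, and the question of how much data is missing, can be turned into numbers before any model is built. The sketch below is a minimal illustration, assuming records arrive as Python dictionaries with None for missing values; the helper name quality_report, the field names, and the 365-day recency window are hypothetical choices, not part of the original material.

```python
from datetime import date

def quality_report(records, fields, date_field, as_of, max_age_days):
    """Hypothetical helper: per-field completeness plus the share of recent records."""
    n = len(records)
    # Completeness: fraction of records where each field is present (not None).
    completeness = {
        field: sum(1 for r in records if r.get(field) is not None) / n
        for field in fields
    }
    # Recency: fraction of records updated within max_age_days of as_of.
    recent = sum(
        1
        for r in records
        if r.get(date_field) is not None
        and (as_of - r[date_field]).days <= max_age_days
    )
    return {"completeness": completeness, "recent_share": recent / n}

# Made-up customer records; None marks a missing value.
records = [
    {"income": 52000, "age": 41, "updated": date(2017, 5, 1)},
    {"income": None, "age": 35, "updated": date(2016, 1, 10)},
    {"income": 38000, "age": None, "updated": date(2017, 6, 20)},
    {"income": 61000, "age": 29, "updated": None},
]
report = quality_report(records, ["income", "age"], "updated",
                        as_of=date(2017, 7, 1), max_age_days=365)
print(report)
```

Tracking numbers like these over time gives a concrete signal for the question “What is my target data quality?”.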
Machine Learning: Introduction
Machine Learning
• Machine Learning is a branch of artificial intelligence that allows computer systems to learn directly from examples, data, and experience.
• By enabling computers to perform specific tasks intelligently, ML systems can carry out complex processes by learning from data rather than by following pre-programmed rules.
• Machine Learning relies on data and supports different types of analysis:
  • Descriptive (data mining): analyse the data to create an approach for the future, such as detecting anomalies in current data based on past data;
  • Predictive (forecasting): turn data into valuable information for prediction, such as when a problem will occur;
  • Prescriptive (optimization): synthesize big data and suggest decisions.
• One of the most important papers on ML, “A Few Useful Things to Know about Machine Learning”, was written by Pedro Domingos in 2012; in it he summarized the key lessons of Machine Learning.
Machine Learning: Predictive Model Building
• Data collection (predictors and response)
• Exploratory data analysis (EDA)
• Data preprocessing
  • Assigning numerical values to categorical data
  • Handling missing values
  • Normalizing the features (so that features on large scales do not dominate when fitting a model to the data)
• Algorithm selection
• Model selection
A model is a parametrized mapping (function or process) between the data domain and the response set that learns the characteristics of a system from the input (labelled) data. Several algorithms may be used to derive a model; here, however, the term algorithm refers to a learning algorithm. The learning algorithm is used to train, validate, and test the model on a given data set: training finds optimal values for the parameters, validation checks them, and testing evaluates the model's performance.
• Model evaluation
  • AUC (area under the ROC curve)
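The preprocessing and evaluation steps above can be sketched in plain Python. This is a minimal illustration under simple assumptions: one-hot encoding for categorical data, mean imputation for missing values, z-score standardization for normalization, and AUC computed as the probability that a randomly chosen positive example is scored above a randomly chosen negative one. Model training itself is left out, and all function names are illustrative.

```python
def one_hot(values):
    """Assign numerical values to categorical data: one 0/1 column per category."""
    categories = sorted(set(values))
    rows = [[1.0 if v == c else 0.0 for c in categories] for v in values]
    return rows, categories

def impute_mean(column):
    """Handle missing values (None) by replacing them with the mean of the observed values."""
    observed = [x for x in column if x is not None]
    mean = sum(observed) / len(observed)
    return [mean if x is None else x for x in column]

def standardize(column):
    """Rescale to zero mean and unit variance so large-scale features do not dominate."""
    n = len(column)
    mean = sum(column) / n
    std = (sum((x - mean) ** 2 for x in column) / n) ** 0.5
    return [(x - mean) / std if std > 0 else 0.0 for x in column]

def auc(labels, scores):
    """ROC AUC: chance a random positive is scored above a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

employment, categories = one_hot(["salaried", "self-employed", "salaried"])
income = standardize(impute_mean([30000.0, None, 50000.0]))
print(categories, employment)  # ['salaried', 'self-employed'] [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
print(auc([1, 0, 1, 0], [0.9, 0.3, 0.6, 0.7]))  # 0.75
```

The pairwise AUC is quadratic in the number of examples, which is fine for illustration; library implementations sort the scores instead.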
The Learners
Thousands of ML algorithms exist, and hundreds more are published every year (Domingos, 2012). A learner can be a mix of learning algorithms. Algorithms can be grouped in two ways: by learning style and by similarity in function. An algorithm can model a problem in different ways depending on its interaction with the input data, and it is common to first consider the learning styles an algorithm can adopt (Brownlee, 2013):
• Supervised learning: the system is trained with labelled data; for example, a machine is trained on previous credit card transactions labelled as normal or suspicious.
• Unsupervised learning: the objective is to infer a function that describes hidden structure or similar patterns in unlabelled data, such as finding clusters in a graph or which products a customer may buy next.
• Semi-supervised learning: the input data is a mixture of labelled and unlabelled data.
• Reinforcement learning: the focus is on learning from experience, in which the machine learns the consequences of its decisions, such as which moves lead to a win in a game. In IT operations, reinforcement learning enables a self-healing system that learns what actions it needs to take to recover from an incident, increase data flows, and optimize operations (Kaluza, 2015).
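The difference between the first two learning styles can be shown on the same toy data. In the sketch below, which assumes one-dimensional transaction amounts and hypothetical labels, a nearest-centroid classifier learns from labelled examples (supervised), while k-means finds the same two groups without any labels (unsupervised). Both are deliberately simple stand-ins, not production algorithms.

```python
def nearest_centroid_fit(xs, labels):
    """Supervised: learn one centroid (mean) per label from labelled examples."""
    groups = {}
    for x, y in zip(xs, labels):
        groups.setdefault(y, []).append(x)
    return {y: sum(v) / len(v) for y, v in groups.items()}

def nearest_centroid_predict(centroids, x):
    """Predict the label whose centroid is closest to x."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))

def kmeans_1d(xs, k, iters=20):
    """Unsupervised: find k cluster centres with no labels (Lloyd's algorithm)."""
    s = sorted(xs)
    # Start from evenly spaced quantiles of the data.
    centres = [s[(2 * i + 1) * len(s) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda j: abs(x - centres[j]))
            clusters[nearest].append(x)
        # Move each centre to the mean of its cluster (keep it if the cluster is empty).
        centres = [sum(c) / len(c) if c else centres[i] for i, c in enumerate(clusters)]
    return sorted(centres)

amounts = [12, 15, 14, 980, 1020, 13, 995]
labels = ["normal", "normal", "normal", "suspicious", "suspicious", "normal", "suspicious"]

centroids = nearest_centroid_fit(amounts, labels)    # uses the labels
print(nearest_centroid_predict(centroids, 900))      # suspicious
print(kmeans_1d(amounts, k=2))                       # finds two groups without labels
```

Both methods end up with centres near 13.5 and 998.3; the supervised one also knows which group is called “suspicious”, while the unsupervised one only knows that two groups exist.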
Machine Learning – The time is now
• The recent success of AI has attracted a lot of media attention because of the immersive experiences users are seeing, and many believe that existing software, trading algorithms, fund managers, and banking systems may soon be replaced by AI.
• A machine could be a customer service executive, a banker, or the next Warren Buffett.
• ML, a narrow form of Artificial Intelligence, is booming, and media attention is at an all-time high.
• “The future is already here – it’s just not evenly distributed” – William Gibson
• “Machine intelligence is the last invention that humanity will ever need to make.” (Bostrom, 2015)
• Whether they integrate slowly or adopt rapidly, banks have to do it, and they definitely need a roadmap.
Challenges in Adopting Machine Learning
Disruptive Technology
• Do the technological frames of managers and technologists differ? Is there complacency with existing systems? Is the organisation agile enough to adopt new technology?
• Established firms are more inclined to invest in sustaining technologies (Christensen, C. M., 1997).
Technology Competence
• Do managers understand the ML model, and will they assume ownership of it?
• Decision makers have different levels of understanding of data: many find data mysterious, confusing, and distant.
• Are ML models interpretable?
Trust
• Analytical processes are hidden between layers: will they be accepted by regulators?
• What is the risk?
• Will data sharing lead to privacy breaches?
• How is the model making decisions, and what is its accuracy?
• How will the models be audited?
• Costs vs. benefits?
Culture
Skills
Data & Information Silos
• A data silo is a repository of data that remains under the control of one department and is isolated from the rest of the organization.
• Distributed data is the biggest challenge in realising the potential of data.
• Traditionally, software applications are written at one point in time and optimised for their main business function, creating data silos.
• Political groups within an organisation are suspicious of others wanting their data.
• Banks have lived through multiple leaders, mergers and acquisitions, long-standing divisions, legacy IT systems, and philosophies, resulting in distributed and incompatible systems.
• Vendor lock-in: software vendors make it hard for customers to move to another product by making data extraction and migration from their own product difficult.
Overcoming the Challenges
Promote favourable patterns
• Articulate interactions between managers and technologists to reach a common interpretation of the technology.
• Organizational agility is the collective agility and creativity of the organisation's members.
• “Design strategies that enhance the ability of humans to understand AI systems and decisions (such as explicitly explaining those decisions), and to participate in their use, may help build trust and prevent drastic failures” – Stanford University research report
• Open up the discussion across the bank, set boundaries, stimulate attractors, encourage dissent and diversity, manage starting conditions and monitor for emergence (of patterns), and use brainswarming.
• Duplicate data, log data, and present it when needed.
• Embrace the opportunity with the right mindset and tools.
• Build a robust IT architecture to support change.
• Visualize the data
• Visualize the ML model and decision making
• Visualize the IT
After three months of gathering information, Rabobank has invested in a 3D model of its IT landscape that maps departments and IT systems. There are two models: one representing the current landscape and one representing the future (target) landscape. Rabobank has also built the 3D model in virtual reality to add an extra layer to the experience, and there are plans to build a virtual model using hologram techniques.
Current situation (image): departments on top, systems below, connections in between.
Data Governance
• Upgraded enterprise data governance should focus on unifying people, processes, data, and technologies. New-age data governance, aimed at ending bureaucracy within banks, will organize the free flow of information while maintaining data quality and dealing with security and regulatory challenges.
• For years, IT professionals convinced business leaders that data governance was a technical matter and that IT should control its decisions. But pragmatism and ease of use are coming to the fore as business leaders from different verticals realize that their teams can make numerous decisions using powerful tools while still staying within rules, laws, regulations, and ethics.
• Data Governance 2.0 is an agile approach to data governance, focused on just enough control to manage risks while enabling more insightful use of data.
Approaches
• Data governance is an iterative cycle whose needs centre on master data management and the business case.
• The elements of strategy are both defence (SSOT, a single source of truth) and offence (MVOTs, multiple versions of the truth) (source: HBR).
• Options:
  • Cold storage
  • Data lake
  • Cloud data warehouse
  • Logical data warehouse
  • Specialised tools on a use-case basis
Vendors
• COLLIBRA is recognised as a leader in data governance platforms by Gartner and Forrester.
• INFORMATICA and IBM have built data governance platforms for a long time.
• TRIFACTA tries to solve the data preparation problem, based on the belief that the people who know the data should be able to wrangle it themselves. Trifacta sits between data storage and ML models and provides facilities such as connectivity, metadata management, processing, intelligence, wrangling, transformation, and publishing. Trifacta is a data wrangling platform that helps business analysts wrangle data intuitively, without relying on IT engineers (trifacta, 2017).
• PAXATA is a reliable platform for data preparation, as it brings together multi-structured data from diverse sources (paxata, 2017).
• Many banks adopting Machine Learning will start small, use case by use case, but without a complete analysis of data governance they may face scalability problems at later stages or end up running ML models in silos. Banks can instead build a unified, scalable data governance architecture that is available to all. DATABRICKS is a unified analytics platform (databricks, 2017).
Machine in Context
• Machines are strategic assets.
• ML programmers have to guide the exploration of data in ways that support the goals of the organisation; they need to wear the hat of strategic analysis, not just data analysis.
• ML programmers have to be aware of other parts of the company and know its problems in depth to be more valuable.
• ML programmers should know the broad context in which to place machine learning.
• As a manager, ask yourself: “How would you describe the role of machines in your organisation?”
Where to Use Machine Learning
Finding new use cases for Machine Learning
Finding a use case for Machine Learning – Find a business problem
A business problem can be simple or complicated; whether Machine Learning can help solve it can be decided by asking questions:
• Q: Think of one or more business problems in your bank that are unsolved or could be solved better. Are they problems that are:
  • Complicated: not straightforward, cannot be solved by standard software or automation, or have no clear pre-defined sequence of steps?
  • In need of learning from data, such as prediction (more than causal inference), where we need to know how certain aspects of the data relate to each other?
  • Sufficiently self-contained, i.e. relatively insulated from outside influences?
A business problem should have all three characteristics to be a candidate for Machine Learning.
Once you know that the problem fits the ML domain, two further important questions need answering:
• Q: Does the right data exist for the problem? Where does it come from? Is the data fed to the machine sufficient to solve the problem?
• Q: Which ML method makes the most sense for the problem?
Does the business problem match an ML canonical problem?
• Classification: Choose which category a data point belongs to, i.e. predict its category. There may be only two categories (Yes or No, Fraud or Not Fraud) in binary classification, or more in multi-class classification. Question: To which category does this data point belong?
• Regression: Predict a value, such as a stock price. Question: Given the input from the dataset, what is the likely value for this particular problem?
• Clustering: When data is unlabelled and complex, organise it so that it looks simpler; this kind of unsupervised learning is called clustering, for example finding clusters in a network. Question: Which data points are similar enough to form a cluster?
• Dimensionality reduction: When a dataset has a huge number of features (variables, columns), select the most effective features so that the ML model is simpler, faster, and more reliable. Question: What are the most significant features of this data, and how can they be summarised?
• Semi-supervised learning: The dataset is a combination of labelled and unlabelled data.
• Anomaly detection: Identify data points that are simply unusual, such as a breach or fraud. The training dataset is small and the possible variations are numerous, so anomaly detection learns what normal activity is (on small training data and in real time, i.e. online) and identifies anything significantly different. Question: How can detection be developed with a small training data set?
• Reinforcement learning: The learning algorithm takes an action for each data point and receives a reward if the decision was good; it modifies its strategy to maximise rewards, for example in gaming and robotics. Question: Which actions will most effectively achieve the desired output?
The canonical-problem table, taken from a Royal Society report, classifies the problems that ML solves; ML is therefore classified by problem.
Canonical problems are the fundamental problems that ML seeks to solve.
If an identified business problem is one of the canonical problems, it can be solved by Machine Learning.
Multiple algorithms exist for each canonical problem, and banks will have to experiment to find which one suits them best. Experienced ML programmers and data scientists can more easily identify the right ML algorithm or combination of algorithms.
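The anomaly-detection entry in the canonical-problem table describes learning “what is normal” from little data, in real time. One minimal way to sketch this, assuming a single numeric stream such as transaction amounts, is a running mean and variance (Welford's online algorithm) with a z-score threshold. The class name, the threshold of 3 standard deviations, and the five-point warm-up are illustrative assumptions rather than a recommended configuration.

```python
class OnlineAnomalyDetector:
    """Learn 'normal' incrementally (Welford's running mean/variance) and flag outliers."""

    def __init__(self, threshold=3.0, warmup=5):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0
        self.threshold, self.warmup = threshold, warmup

    def update(self, x):
        """Return True if x looks anomalous; otherwise fold x into the model of 'normal'."""
        if self.n >= self.warmup:
            std = (self.m2 / (self.n - 1)) ** 0.5  # sample standard deviation so far
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                return True  # anomalies are not learned as normal behaviour
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return False

detector = OnlineAnomalyDetector()
stream = [100, 102, 98, 101, 99, 103, 97, 100, 5000, 101]
flags = [detector.update(x) for x in stream]
print(flags)
```

Each update costs constant time and memory, which is what makes the approach usable online; the price is that it only captures a single mean and spread, not richer patterns.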
ML Canonical Problems and Examples
• Classification
  • Face recognition: identify people by their faces for identity verification and surveillance
  • Image recognition: whether a picture shows a car or not, for use during insurance claims
  • Credit risk: based on customer features, predict whether a customer can repay a loan
  • Sentiment analysis: whether customer sentiment about a new product is positive or negative
• Regression
  • Financial forecasting: what a stock price will be in the future
  • Click-rate prediction: the probability of a customer clicking an ad
  • Pricing: house price prediction
  • Future performance: include financial data, news, sentiment, social media, and press releases to predict the future of companies
• Clustering
  • Document modelling: find patterns and structures in documents
  • Network analysis: find clusters in a network
  • E-commerce: find customers with similar interests
• Dimensionality reduction
  • Data mining: find the best-performing features in high-dimensional data
  • E-commerce: find which features summarize our target customers
• Semi-supervised learning
  • Anomaly detection: cybersecurity, financial terrorism
• Reinforcement learning
  • Trading
  • Gaming: AlphaGo
Use Cases
A few existing use cases of Machine Learning
Machine Learning Use Cases
• ZOPA
• Hastings Direct – Insurance company
• Fraud prevention at PayPal – Payments
• Numerai – Hedge fund that relies on an anonymous army of coders
• Neurensic – Policing the stock market
• Google Cloud ML Engine
• Machine Learning in insurance
• Data governance using Machine Learning
• Reducing the credit gap
ZOPA – Peer-to-peer lending platform
ZOPA uses ML in everything it does: credit risk, fraud, ID verification, document tampering, pricing, and customer segmentation.
For credit risk, the ML journey is: gathering the dataset, defining the target, feature optimisation, ML model building, and model stacking.
The credit risk dataset comes from credit agencies (mortgages, credit cards, accounts, payments, and so on): more than 3,000 features for hundreds of thousands of borrowers. First, the dataset is cleaned (handling null and rare values and converting categorical values into numbers) and then normalized. The 3,000 features are reduced to tens using feature selection (dimensionality reduction) for an easier, faster, and more reliable implementation; features are selected based on importance. Using the selected features, the ML model predicts (binary classification) whether a customer can repay the loan. ZOPA experimented with many ML algorithms for prediction and found that a combination of neural networks and gradient-boosted trees (GBT) worked best.
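The feature-selection step, reducing thousands of features to tens by importance, can be illustrated on a toy scale. ZOPA's actual importance measure is not stated, so the sketch below assumes a simple filter method: rank each feature by the absolute Pearson correlation with the binary default target and keep the top k. The feature names and values are made up for illustration.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient (0.0 if either input is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0

def select_top_features(columns, target, k):
    """Rank features by |correlation| with the target and keep the top k names."""
    scores = {name: abs(pearson(col, target)) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical borrower features; defaulted = 1 means the loan was not repaid.
columns = {
    "missed_payments": [0, 3, 1, 4, 0, 5],
    "card_utilisation": [0.2, 0.9, 0.4, 0.8, 0.1, 0.95],
    "postcode_digit": [3, 1, 4, 1, 5, 9],
}
defaulted = [0, 1, 0, 1, 0, 1]
selected = select_top_features(columns, defaulted, k=2)
print(selected)
```

A filter like this ignores interactions between features; model-based importances (for example from gradient-boosted trees) are a common alternative.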
Hastings Direct – Insurance Company
• The business problem is to predict:
  • How many accidents is a customer likely to have?
  • How much will the accidents cost?
• The datasets are imbalanced: many customers don’t have accidents, and many accidents are not expensive.
• Data comes from several sources (third parties and customers), at different quality levels, and from different stages (claims, location, vehicle, and government).
• Challenges in the insurance industry include managing quotes, legacy IT, long feedback loops (a customer can claim years later), and a conservative industry (insurers are not allowed to use some fairly predictive features, such as gender).
• A combination of supervised and unsupervised learning is used.
• ML can’t run on the current legacy IT, so the IT will be upgraded.
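The two prediction targets map onto the standard actuarial split of expected cost into claim frequency and claim severity, and the imbalance can be handled by weighting rare outcomes more heavily. The sketch below is a minimal illustration under those assumptions: it treats frequency and severity as independent and uses simple averages in place of trained models. The inverse-frequency weighting is the same heuristic scikit-learn applies for class_weight="balanced"; all data are made up.

```python
def expected_annual_cost(claim_counts, claim_costs):
    """Expected cost per policy-year = mean claim frequency x mean claim severity."""
    frequency = sum(claim_counts) / len(claim_counts)  # accidents per policy-year
    severity = sum(claim_costs) / len(claim_costs)     # average cost per accident
    return frequency * severity

def inverse_frequency_weights(labels):
    """Weight class c by n / (n_classes * count_c) so rare outcomes are not ignored in training."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    n, k = len(labels), len(counts)
    return {y: n / (k * c) for y, c in counts.items()}

claims_per_policy = [0, 0, 0, 1, 0, 0, 2, 0, 0, 1]  # made up: 4 accidents in 10 policy-years
claim_costs = [800.0, 1200.0, 400.0, 9600.0]        # one expensive claim dominates
print(expected_annual_cost(claims_per_policy, claim_costs))

had_accident = [1 if c > 0 else 0 for c in claims_per_policy]  # 3 of 10 policies
print(inverse_frequency_weights(had_accident))
```

Splitting the problem this way lets each part use a model suited to its data: counts for frequency, a skewed cost distribution for severity.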
Fraud Prevention at PayPal – Payments
• PayPal uses Oracle for online transaction processing, but for offline processing it uses Hadoop.
• PayPal has a repository of 15,000 features, engineered over the years, on which to build ML models.
• ML models are applied at the transaction level (when users make a payment); more sophisticated models are applied after the transaction completes.
• Fraud prevention is a supervised learning problem, and PayPal uses neural networks as the algorithm.
• The current ML model had an AUC of 0.96, but PayPal decided to improve it by improving labelling.
• Labelling at PayPal was human-driven, and PayPal tested whether it could be improved using ML.
• PayPal applied an ML technique, active learning (see Notes), on top of heuristic user labelling to improve labelling quality, which in turn improves the data fed into the supervised ML models.
• This improved the AUC from 0.96 to 0.979. Labelling used to take a week; with active learning, the labelling effort has come down to 30 minutes.
(See Notes)
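Active learning covers several query strategies, and the source does not say which one PayPal used. A common one is uncertainty sampling: send human labellers the transactions the current model is least sure about. The sketch below assumes fraud probabilities from an existing model and a fixed labelling budget; the function name and scores are hypothetical.

```python
def uncertainty_sample(probabilities, budget):
    """Return indices of the `budget` items whose fraud probability is closest to 0.5.

    These are the cases the current model is least sure about, so a human label
    on them is expected to be most informative (uncertainty sampling).
    """
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: abs(probabilities[i] - 0.5))
    return ranked[:budget]

# Hypothetical model scores for unlabelled transactions.
scores = [0.02, 0.48, 0.97, 0.55, 0.10, 0.35, 0.99]
to_label = uncertainty_sample(scores, budget=3)
print(to_label)  # [1, 3, 5]
```

In a full loop, the newly labelled items are added to the training set, the model is retrained, and the selection repeats; spending human effort only on ambiguous cases is what lets the labelling effort shrink.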
Numerai – Hedge Fund That Relies on an Anonymous Army of Coders
• 7,500 developers, entirely anonymous if they wish, receive trading data and make forecasts using ML. If their predictions are useful, they are paid in cryptocurrency.
• The hedge fund industry has been under fire for being overpriced and underperforming, and ML can process extra data that humans and regular algorithms cannot make sense of.
• Numerai wants to turn the negative competition between massive hedge funds into a highly valuable collaborative space, creating the first hedge fund with a network effect.
• Numerai brings in crowd intelligence: it harnesses people who are good at data handling, AI, and statistics but do not want to do it as a day job and lack the resources to start a hedge fund, capturing the intelligence of a crowd in which anyone can participate and bringing it to the stock market.
• It’s a math problem: one doesn't need to know any finance to share his or her data science skills with Numerai, and Numerai pays out of the money it makes in the stock market. Another benefit of ML that Numerai uses is that many ML models can run on encrypted data.
• Numerai has the notion of a meta model: one big benefit of ML comes from combining different ML models. For example, if one model predicts only utility stocks perfectly and another predicts only bank stocks perfectly, a meta model can be created by combining both.
• Every data scientist on Numerai is solving the same problem using the same underlying features, but each approaches it in their own unique way. With many different solutions to the same problem, Numerai can combine the models into a meta model, just as random forests combine decision trees into a forest.
• Chart: log-loss error of the best Numerai data scientists; Numerai’s meta model has lower error than any individual model.
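The meta-model idea can be checked on a toy example. Numerai's actual combination method is not described here, so the sketch below assumes the simplest one, averaging each model's predicted probabilities, and scores everything with log loss; the two models' predictions are invented.

```python
import math

def log_loss(labels, probs, eps=1e-15):
    """Mean binary log loss; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(labels)

def meta_model(*model_probs):
    """Assumed combination rule: average the models' probabilities item by item."""
    return [sum(ps) / len(ps) for ps in zip(*model_probs)]

outcomes = [1, 0, 1, 0]
model_a = [0.9, 0.4, 0.6, 0.2]   # hypothetical model: confident on items 1 and 4
model_b = [0.6, 0.1, 0.9, 0.45]  # hypothetical model: confident on items 2 and 3
meta = meta_model(model_a, model_b)

for name, probs in [("A", model_a), ("B", model_b), ("meta", meta)]:
    print(name, round(log_loss(outcomes, probs), 4))
```

Because log loss is convex in the predicted probability, an average of models can never score worse than the average of their individual losses; beating the best single model, as happens here, depends on the models making different errors.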
Neurensic – Policing the Stock Market
• Economies worldwide are priced on capital markets, and most trading is algorithmic, but surveillance has been manual for a very long time.
• Financial institutions spend billions of dollars on trade surveillance alone. Trade surveillance is a trillion-dollar problem in a billion-dollar industry that is based on 20-year-old technology.
• Neurensic examines financial data for illegal activity by applying ML to the market. Data comes from all kinds of exchanges (Chicago, NASDAQ, London, and so on), audit logs, clearing houses, and a range of other systems. The machine has to be intelligent because the data is big (terabyte scale, a trillion rows daily), the signal is very small, and algorithms have to be updated rapidly as the methods of illegal activity change.
• Neurensic is not a law enforcement organisation and cannot declare someone's intent, but it can monitor behaviour and, with the help of past data (activities that have been investigated or prosecuted before) and domain experts, give it a risk score and bring it to the attention of a compliance officer. Finding illegal or questionable activity is the first step; the next is explaining why, especially since ML models are notorious for being opaque.
• Pull the data (audit logs and market data), filter it down (from billions of records to hundreds), create compelling visualizations, and explain the findings step by step. Visualisations of raw data are key because raw data is the real evidence (not ML results): ML runs on heavily processed data, but to present findings one needs to go back to the raw data and possibly to past transactions, line by line.
• Visualizations can be graphs or movies (replaying trades in slow motion, tick by tick), with new visuals for new patterns, in browser- and mobile-based views that can be shown to CxOs and lawyers. All data science at Neurensic is done in Python running on H2O.ai (a premier open-source ML tool with cutting-edge ML algorithms), and a data science algorithm goes from research to production within minutes.
• Audit logs (in CSV format) are pulled into H2O, which is very good at compression. The data is then prepared:
  • ETL: handle missing values, normalize, map tokens and products from different sources uniformly, and clean up for ML;
  • Sort: each CPU gets an equal proportion of the data to process in parallel;
  • Cluster: depending on the ML model's requirements, data is segregated and grouped, for example transactions close in time or transactions for the same instrument;
  • Model: define features and run ML models on the clusters in sequence on each CPU;
  • Report: risk files are generated by type of risk. A risk file contains many details and a pointer to the raw data.
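The sort, cluster, split, and score steps can be outlined in plain Python. The sketch below is illustrative only: the row fields, the 10-second window, and the cancel-ratio “risk” feature are hypothetical and are not Neurensic's features. The workers run sequentially as a stand-in for parallel CPUs, and groups are formed before the work is split so that no cluster is divided between two workers.

```python
from collections import defaultdict

def group_by_instrument_window(rows, window_s):
    """Cluster rows for the same instrument that fall in the same time window."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["instrument"], row["ts"] // window_s)].append(row)
    return groups

def assign_to_workers(groups, n_workers):
    """Give each worker a near-equal number of rows, keeping every group on one worker."""
    loads = [[] for _ in range(n_workers)]
    sizes = [0] * n_workers
    for key in sorted(groups, key=lambda k: -len(groups[k])):  # largest groups first
        w = sizes.index(min(sizes))                             # least-loaded worker
        loads[w].append(key)
        sizes[w] += len(groups[key])
    return loads

def cancel_ratio_risks(groups, threshold):
    """Toy feature: share of cancelled orders; emit risk records pointing back to raw rows."""
    risks = []
    for key, rows in groups.items():
        ratio = sum(r["action"] == "cancel" for r in rows) / len(rows)
        if ratio >= threshold:
            risks.append({"group": key, "cancel_ratio": ratio,
                          "raw_rows": [r["row_id"] for r in rows]})
    return risks

def run_pipeline(rows, n_workers=2, window_s=10, threshold=0.5):
    rows = sorted(rows, key=lambda r: (r["instrument"], r["ts"]))  # sort
    groups = group_by_instrument_window(rows, window_s)             # cluster
    risk_file = []
    for worker_keys in assign_to_workers(groups, n_workers):       # one batch per CPU
        # Run sequentially here; a real system would process batches in parallel.
        batch = {key: groups[key] for key in worker_keys}
        risk_file.extend(cancel_ratio_risks(batch, threshold))
    return risk_file

audit_log = [
    {"row_id": 1, "instrument": "XYZ", "ts": 3, "action": "new"},
    {"row_id": 2, "instrument": "XYZ", "ts": 5, "action": "cancel"},
    {"row_id": 3, "instrument": "XYZ", "ts": 7, "action": "cancel"},
    {"row_id": 4, "instrument": "XYZ", "ts": 8, "action": "cancel"},
    {"row_id": 5, "instrument": "ABC", "ts": 4, "action": "new"},
    {"row_id": 6, "instrument": "ABC", "ts": 6, "action": "trade"},
]
print(run_pipeline(audit_log))
```

Keeping the raw row IDs in each risk record mirrors the “pointer to raw data” idea: an investigator can go straight back to the original lines as evidence.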
Google Cloud ML Engine
• Organisations moving to the cloud and using Google Cloud ML Engine will find that most of the work (an estimated 90%) relates only to cleansing data.
• Google currently uses ML in almost all of its products.
• Google provides ML as a service through application programming interfaces (APIs).
• The benefit of using Google's APIs is that they have already been tested on huge data and have been in production for a few years, so organisations using them can avoid training their own ML models.
• The APIs can be used to solve many business problems; for example, the image and video APIs can be used in the insurance industry (see Notes).
• APIs for speech, natural language processing (NLP), language translation, and so on can be used to create video assistants, face banking, and chatbots, or to enhance existing applications.
• Many banking use cases running on Google Cloud already use these APIs, such as risk analytics, regulation, customer segmentation, cross-selling and up-selling, credit risk, and fraud detection.
(See Notes)
Google has open-sourced its ML framework, TensorFlow, which is used extensively inside Google; for a more managed experience, Google Cloud customers can choose Cloud ML Engine, so that client organisations need not worry about infrastructure and can focus only on the ML model.
Companies can use Google's framework to build ML models, or they can use the APIs directly.
Machine Learning in Insurance
• “Anything You Can Do, AI Can Do Better” – George Argesanu, Global Head of Analytics, AIG, and Monika Schulze, Global Head of Marketing, Zurich Insurance Company
• Predictive analytics using ML (the canonical problem “Regression”) can be applied in all existing areas of insurance: pricing, fraud, claims, marketing, P&L analysis, behavioural analysis, and preventive insurance.
• As machines think faster and smarter, companies will get more accurate pricing, processes will become more efficient, fraud will be more likely to be caught, and losses will be more likely to be prevented.
• The insurance industry is going through disruption and will face new products: telematics, self-driving cars, the Internet of Things, and cybercrime.
• The insurance industry will have to think about the business problems that will arise in future and how machine learning will help solve them: how fitness bands and apps change user behaviour, how premiums are affected, how lives can be saved, and how possessions can be preserved.
(See Notes)
Data Governance using Machine Learning
• The problems solved by ML are similar to data governance problems, such as dimensionality reduction, prediction, and regression.
• The commercial platform from Pendo Systems is designed to make sense of unstructured data in documents such as Excel sheets.
• The Pendo data platform indexes documents, searches the data inside them, and provides insights and each document's full lineage; it iterates, refines, and classifies the data.
• The ML algorithm indexes the documents, classifies customer data, and makes it accessible and searchable, with full data lineage for unstructured data.
Reduce the Credit Gap - Individuals
• With the advent of mobile devices, the Internet of Things, social media, and mobile payments in the past few years, more features are available for ML models.
• Measuring financial well-being is a problem of classification and dimensionality reduction.
• Traditional credit scoring systems rely on individual traits (age, job, and gender) while being oblivious to the spatio-temporal mobility patterns and habits of individuals.
• Inspired by the emerging trend of mobile payments and geo-awareness, an ML model built over human consumption patterns across time and space was able to predict financial well-being 30% to 49% better than comparable demographic models (Singh, et al., 2015).
• Inspired by the biological phenomenon of foraging, a basic pattern of animal movement for food and resources, the study examined human purchase behaviour through three shopping traits: diversity (exploration), loyalty (exploitation), and regularity (plasticity). These behavioural indicators were combined with additional financial-trouble indicators, and an ML model was trained to predict whether a customer is likely to have financial trouble.
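The three behavioural traits can be turned into numeric features from a customer's transaction history. The definitions in Singh et al. (2015) are not reproduced here; the sketch below assumes one plausible operationalisation: diversity as normalised Shannon entropy of the merchants visited, loyalty as the share of transactions at the most-visited merchant, and regularity as 1 / (1 + standard deviation of the gaps between purchase days). Function names and data are illustrative.

```python
import math
from collections import Counter

def diversity(merchants):
    """Exploration: Shannon entropy of merchant visits, normalised to [0, 1]."""
    counts = Counter(merchants)
    if len(counts) < 2:
        return 0.0  # a single merchant means no exploration
    n = len(merchants)
    h = -sum(c / n * math.log(c / n) for c in counts.values())
    return h / math.log(len(counts))

def loyalty(merchants, top_k=1):
    """Exploitation: share of transactions at the top_k most-visited merchants."""
    counts = Counter(merchants)
    return sum(c for _, c in counts.most_common(top_k)) / len(merchants)

def regularity(days):
    """Assumed proxy: 1 / (1 + std of gaps between purchase days); needs >= 2 days."""
    gaps = [b - a for a, b in zip(days, days[1:])]
    mean = sum(gaps) / len(gaps)
    std = (sum((g - mean) ** 2 for g in gaps) / len(gaps)) ** 0.5
    return 1.0 / (1.0 + std)

merchants = ["grocer", "grocer", "cafe", "grocer"]
days = [1, 8, 15, 22]  # one purchase every 7 days
print(diversity(merchants), loyalty(merchants), regularity(days))
```

Features like these would sit alongside conventional financial indicators as inputs to the classifier that predicts financial trouble.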
Reduce the Credit Gap – SMEs
Traditional approaches for measuring financial well-being, such as earnings per asset or equity per asset, have limited predictive power.
One idea is to create a network of merchants in which vertices represent merchants and the weight of each edge is based on the number of common customers.
Analysing the network is the canonical problem of clustering (under unsupervised learning), which ML solves. Structural information, such as the results of community detection, will be used as an indicator of a merchant's financial well-being, and this information is used as a feature in constructing the next ML predictive model.
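The merchant network described above can be built directly from (customer, merchant) transactions. For the community-detection step the source names no algorithm, so the sketch below swaps in a basic weighted label propagation with a fixed node order and smallest-label tie-breaking, which keeps it deterministic; a production system would more likely use an established graph library. Merchant and customer names are invented.

```python
from collections import defaultdict
from itertools import combinations

def merchant_network(transactions):
    """Nodes are merchants; edge weight = number of customers who shop at both."""
    customers = defaultdict(set)
    for customer, merchant in transactions:
        customers[merchant].add(customer)
    edges = {}
    for a, b in combinations(sorted(customers), 2):
        shared = len(customers[a] & customers[b])
        if shared:
            edges[(a, b)] = shared
    return sorted(customers), edges

def label_propagation(nodes, edges, max_rounds=20):
    """Weighted label propagation: each merchant adopts its neighbours' heaviest label."""
    neighbours = defaultdict(dict)
    for (a, b), w in edges.items():
        neighbours[a][b] = w
        neighbours[b][a] = w
    labels = {n: n for n in nodes}  # every merchant starts in its own community
    for _ in range(max_rounds):
        changed = False
        for n in nodes:  # fixed order keeps the sketch deterministic
            if not neighbours[n]:
                continue
            score = defaultdict(int)
            for m, w in neighbours[n].items():
                score[labels[m]] += w
            best = max(sorted(score), key=score.get)  # ties -> alphabetically smallest label
            if best != labels[n]:
                labels[n], changed = best, True
        if not changed:
            break
    return labels

transactions = [
    ("c1", "bakery"), ("c1", "butcher"), ("c2", "bakery"), ("c2", "grocer"),
    ("c3", "butcher"), ("c3", "grocer"), ("c3", "bakery"),
    ("c4", "gym"), ("c4", "sportshop"), ("c5", "gym"), ("c5", "sportshop"),
]
nodes, edges = merchant_network(transactions)
communities = label_propagation(nodes, edges)
print(communities)
```

The resulting community label, or derived values such as community size, could then be fed as a feature into the next predictive model.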
Conclusion
Further Work & Next Steps
Data Governance
• Organizations preparing to adopt Machine Learning tomorrow should start data governance today, either for one department or for the entire organisation.
• Stream data from silos and departments, invest in the use case, and tie the integration to value creation: a progressive step towards an integrated enterprise data platform for advanced capabilities (Wilder-James, 2016).
• To address data quality, Great Western Bank created a data committee with members from different teams across the organisation. The committee created standard definitions that teams across the bank use (banktech, 2014).
Next Steps
• Find a business area where data is already robust and plentiful and that has a business problem machine learning can solve.
• Train managers, move to the cloud, make the organisation agile, find business problems and use cases with robust data, apply data governance, and prepare data today for applying Machine Learning tomorrow.
• Embrace the increased competition in the data value chain and recognise that the ecosystem is the new value proposition.
• Build platforms instead of products.
Cautions
• Although the guarantees of Machine Learning may be only theoretical, things will be done better with machine learning: we are relying on the promise and power of computers.
• It will also be interesting to see how Machine Learning works together with other key developments such as data protection laws, identity theft, secure systems, and distributed ledgers.
• Business leaders and managers are now expected to focus more on how they approach this new relationship between humans and machines, and they become more accountable and liable when data is shared and decisions are taken by machines.
Conclusion
• It's time to rethink strategy and make it iterative as machines become better and better.
• One can’t ignore the opportunity: computing power is increasing, we are collecting more data, storage costs are falling, and algorithms are improving.
• Don't only implement Machine Learning; also interpret the model.
• Upgrade the IT architecture and data governance practices.
Thank You
Abhishek Upadhyay


Machine learning in Banks

  • 2. Will Machine replace You? Simple Answer is “NO”
  • 4. Machine that can think and work like humans, but free from human errors and biases, can harness the big data and high computational power, runs on highly sophisticated algorithms is the next frontier in technological arms race
  • 5. Machine Learning: Approach • “Machine learning is a method of data analysis that automates analytical model building. Using algorithms that iteratively learn from data, machine learning allows computers to find hidden insights without being explicitly programmed where to look.” – SAS • Let machine learn how best to achieve the desired output. Processes data in a way that machine can detect and learn from patterns, predict future activity, and make decisions. Data is fuel: More the data, better the learning.
  • 6.
  • 7.
  • 8. 18th century also saw one of the most influential data visualization (a map) of all time, produced by John Snow. John Snow disapproved the then dominant miasma theory that stated that diseases such as cholera were caused by noxious bad air. In 1854, London, Cholera strike killed many, then, John Snow collected data of all deaths, analysed it, and plotted every death on map. Later, John Snow used dot map to illustrate the cluster of cholera cases around the pump. The map shows the location of the 13 public wells in the area and 578 cholera deaths mapped by home address, around Soho district. It was evident from map that cases were clustered around the pump in Broad (Broadwick) street. Map was part of the detailed statistical analysis done by Snow. Snow interpreted and theorized that cholera was spread through the contaminated water. Quality Data and Powerful visualisation results in right interpretation And finally intervention: A new sewer system.
  • 9. How to measure Data Quality • 5 R’s of Data Quality • Relevancy: Is the data collected is relevant to the problem we are solving? • Recency: How recent was the data generated? • Range: How narrow or wide is the scope of data? • Robustness: What is signal to noise ratio? • Reliability: How accurate is the data that we are working with? • Ask Questions: • What is my target data quality? • What if only 50% of data is available and How sensitive is analysis to missing data? • What signal will be used to understand the quality of data? • Ephemeral vs Durable • Noise vs Bias
  • 11.
  • 12.
  • 13.
  • 14. Machine Learning • Machine Learning is a branch of artificial intelligence that allows computer systems to learn directly from examples, data, and experience. • Through enabling computers to perform specific tasks intelligently, ML systems can carry out complex processes by learning from data, rather than following pre- programmed rules. • Machine Learning relies on data and does different type of analysis: • Descriptive (data mining) - Analyse the data for creating approach for future such as detecting anomalies in current data based on past data; • Predictive (forecasting) – Turn data into valuable information for prediction such as when problem will occur; • Prescriptive (optimization) – synthesize big data and suggest decisions. • One of the most important paper on ML is written by Pedro Domingos in 2012 in which he summarized the key lessons of Machine Learning
  • 15. Machine Learning: Predictive Model Building • Data collection (predictors and response) • Exploratory data analysis (EDA) • Data Preprocessing • Assigning numerical values to categorical data • Handling missing values • Normalizing the features (so that features on small scales do not dominate when fitting a model to the data). • Algorithm selection • Model selection Parametrized mapping (function or process) between the data domain and the response set, which learn the characteristics of a system from the input (labelled) data. Modelling may have several algorithms to derive a model; however, the term algorithm here refers to a learning algorithm. The learning algorithm is used to train, validate, and test the model using a given data set to find an optimal value for the parameters, validate it, and evaluate its performance. • Model evaluation • AUC
  • 16.
  • 17. The Learners Thousands of ML algorithms exist and hundreds get added with time (Domingos, 2012). A learner can be mix of learning algorithms. Algorithms can be grouped or understood of classified in two ways: One way is the learning style and other is similarity in function. There are different ways an algorithm can model a problem based on its interaction with the input data and it is popular to first consider the learning styles that an algorithm can adopt (Brownlee, 2013): • Supervised learning, system is trained with data that is labelled, for example, machine trained with normal or suspicious labelled previous credit card transactions. • Unsupervised learning, the objective is to identify or infer a function to describe a hidden structure or similarity of patterns in unlabelled data, such as finding clusters in graph or which products customer can buy next. • Semi-supervised learning, Input data is mixture of labelled and unlabelled data. • Reinforcement learning, focuses learning from experience in which machine learns the consequences of its decisions, such as which moves lead to win in game. In IT operations, reinforcement learning enables a self-healing system that learns what actions need to do to recover from an incident, increase data flows, and optimize operations (Kaluza, 2015)
  • 18. Machine Learning – The time is now • The recent success of AI has got lot of media attention because of the immersive experience that users are seeing and experiencing and many believe soon existing software’s, trading algorithms, fund managers, and banking systems may be replaced by AI. • Machine could be customer service executive, banker, and next Warren Buffet. • ML – a narrow Artificial Intelligence – is booming and media attention is all time high right now. • “The future is already here – It’s just not evenly distributed” – William Gibson • “Machine intelligence is the last invention that humanity will ever need to make.” (Bostrom, 2015) • Either integrate slowly or adopt rapidly, Banks have to do it, Banks definitely need a roadmap
  • 19. Challenges In adoption of Machine Learning
  • 20.
  • 21. Disruptive Technology • The difference in the technological frames of managers and technologists, Complacency with existing systems, and Organisational Agility to adopt new technology? • Established firms are more inclined to invest in sustaining technologies (Christensen, C. M., 1997)
  • 22. Technology Competence • Managers understanding of ML model and assuming ownership of it? • Decision makers have different level of understanding of data: many find data mysterious, confusing, and distant • Interpretability of ML models?
  • 23. Trust • Analytical processes hidden between Layers – Will they be accepted by Regulators? • What’s risk? • Will data sharing lead to privacy breach? • How model is making decision and what is accuracy? • Auditing? • Costs vs Benefits?
  • 26. Data & Information Silos • A data silo is a repository of data that remains under the control of one department and is isolated from the rest of the organization. • Distributed data is the biggest challenge in realising the potential of data. • Traditionally software applications are written at one point in time and optimised for their main business function creating data silos. • Political groups within organisation are suspicious of others wanting their data • Banks have lived through multiple leaders, mergers & acquisitions, long existing divisions, legacy IT systems, and philosophies, resulting in distributed and incompatible systems. • Vendor lock-in, software vendors are good in making customers transition to another product tough by making hard the data extraction & migration hard from their product.
  • 28. Promote favourable patterns • Articulate interactions between managers and technologists for common interpretation of technology • Organizational Agility is collective of members agility and creativeness • “Design strategies that enhance the ability of humans to understand AI systems and decisions (such as explicitly explaining those decisions), and to participate in their use, may help build trust and prevent drastic failures” – Stanford University research report • Open up the discussion across bank, create boundaries, simulate attractors, encourage dissent and diversity, manage starting conditions and monitor for emergence (of patterns), and brainswarming • Duplicate data, log data, and present it when needed • Embrace opportunity with right mindset and tools • Robust IT architecture to support changes
  • 30. Visualize the ML Model and decision making
  • 31. Visualize the IT Rabobank has invested in a 3D model of its IT landscape mapping departments and IT systems: two models, one representing current and one representing the future (target), after 3 months of gathering information. Rabobank has now also built the 3D model in virtual reality, in order to add an extra layer to the experience. There are also plans to build a virtual model using hologram techniques. Current situation in image, departments on top, systems below, connections in between.
  • 32. Data Governance • The upgraded enterprise data governance should focus on unifying the people, processes, data, and technologies. New age data governance aimed at ending the bureaucracy within Banks will organize the free flow of information but at the same time maintain data quality and deal with security and regulatory challenges. • For years IT professionals convinced business leaders that it is technical thing and they should control the data governance decisions. But, pragmatic and ease to use comes to fore as business leaders from different verticals realize their teams can make numerous decisions using powerful tools but still controlled within rules, laws, regulations, and ethics. • Data Governance 2.0 is about an agile approach to data governance focussed on just enough controls for managing risks, enabling more insightful use of data.
  • 33. Approaches • Data governance is an iterative cycle focussed on master data management and the business case • The strategy has both a defensive element, a single source of truth (SSOT), and an offensive element, multiple versions of the truth (MVOT) (source: HBR) • Options: • Cold storage • Data lake • Cloud data warehouse • Logical data warehouse • Specialised tools on a use-case basis
  • 34. Vendors • COLLIBRA is recognised by Gartner and Forrester as a leader in data governance platforms • INFORMATICA and IBM have been building data governance platforms for a long time • TRIFACTA tries to solve the data preparation problem, on the belief that people who know the data should be able to wrangle it themselves. Trifacta sits between data storage and ML models and provides facilities such as connectivity, metadata management, processing, intelligence, wrangling, transformation, and publishing. It is a data wrangling platform that helps business analysts wrangle data intuitively without relying on IT engineers (trifacta, 2017) • PAXATA is a reliable platform for data preparation, as it brings together multi-structured data from diverse sources (paxata, 2017) • When adopting Machine Learning, many banks will start small, use case by use case. Without a complete analysis of data governance, however, they may face scalability problems at later stages or end up running ML models in silos. Banks can instead opt to build a unified, scalable data governance architecture that is available to all. DATABRICKS is one such unified analytics platform (databricks, 2017)
  • 35. Machine in Context • Machines are strategic assets. • ML programmers have to guide the exploration of data in ways that support the goals of the organisation; they need to wear the hat of strategic analysts, not just data analysts • ML programmers have to be aware of other parts of the company and know its problems in depth, which makes them more valuable • ML programmers should know the broad context in which to put machine learning. • As a manager, ask yourself: “How would you describe the role of machines in your organisation?”
  • 36. Where to use Machine Learning: finding new use cases of Machine Learning
  • 37. Finding use cases for Machine Learning – Find the business problem • A business problem may be simple or complicated; whether Machine Learning can help solve it can be decided by asking questions: • Q: Which business problems in your bank are unsolved or could be solved better? Look for problems that are: • Complicated: not straightforward, cannot be solved by standard software or automation, or have no clear predefined sequence of steps • Dependent on learning from data, such as prediction rather than causal inference: you need to know how certain aspects of the data relate to each other • Sufficiently self-contained, i.e. relatively insulated from outside influences • A business problem should have all three characteristics to be a candidate for Machine Learning. Once you know the problem fits the ML domain, two further important questions remain: • Q: Does the right data exist for the problem? Where does it come from? Is the data fed to the machine sufficient to solve the problem? • Q: Which ML method makes most sense for the problem?
  • 39. Does the business problem match an ML canonical problem? (Canonical problem: description; question) • Classification: used to choose which category data belongs to, or to predict the category of data. There can be only two categories (Yes or No, Fraud or Not Fraud) in binary classification, or more in multi-class classification. Question: To which category does this data point belong? • Regression: used to predict a value such as a stock price. Question: Given the input from the dataset, what is the likely value of a particular quantity? • Clustering: used when data is unlabelled and complex; this form of unsupervised learning organises the data into groups so that it looks simpler, for example finding clusters in a network. Question: Which data points are similar enough to form a cluster? • Dimensionality reduction: used when a dataset has a huge number of features (variables or columns); the most effective features are selected so that the implemented ML model is simple, fast, and reliable. Question: What are the most significant features of this data, and how can they be summarised? • Semi-supervised learning: used when a dataset combines labelled and unlabelled data. Anomaly detection: used to identify data points that are simply unusual, such as a breach or fraud. The training dataset is small and the possible variations are numerous, so anomaly detection learns what normal activity looks like (on small training data and in real time, i.e. online) and flags anything significantly different. Question: How can detection be developed with a small training dataset? • Reinforcement learning: the learning algorithm takes an action for each data point and receives a reward if the decision was good; it modifies its strategy to maximise rewards, for example in gaming and robotics. Question: Which actions will most effectively achieve the desired output? • The table, taken from a Royal Society report, classifies the problems that ML solves; ML is therefore classified by the problems it addresses. Canonical problems are the fundamental problems that ML seeks to solve.
If an identified business problem matches one of the canonical problems, it can be solved by Machine Learning. Multiple algorithms exist for each canonical problem, and banks will have to experiment to find which one suits best. Experienced ML programmers and data scientists can more easily identify the right ML algorithm or combination of algorithms.
  • 40. ML Canonical Problems and examples (Canonical problem: applications) • Classification: Face recognition (identify people by their face for identity verification and surveillance); Image recognition (is the picture of a car or not, for use during insurance claims); Fraud detection (based on customer features, predict whether the customer can repay a loan); Sentiment classification (is customer sentiment about a new product positive or negative) • Regression: Financial forecasting (what the stock price will be in future); Click-rate prediction (probability of a customer clicking an ad); Pricing (house price prediction); Future performance (use financial data, news, sentiment, social media, and press releases to predict the future of companies) • Clustering: Document modelling (find patterns and structures in documents); Network analysis (find clusters in a network); E-commerce (customers with similar interests) • Dimensionality reduction: Data mining (find the best-performing features in high-dimensional data); E-commerce (which features summarise our target customer) • Semi-supervised learning: Anomaly detection (cybersecurity, financial terrorism) • Reinforcement learning: Trading; Gaming (AlphaGo)
  • 41. Use Cases: a few existing use cases of machine learning
  • 42. Machine Learning Use Cases • ZOPA • Hastings Direct – Insurance Company • Fraud prevention in PayPal – Payments • Numerai - Hedge Fund relies on an anonymous army of coders • Neurensic - Policing the Stock Market • Google Cloud ML Engine • Machine Learning in Insurance • Data Governance using Machine Learning • Reducing the Credit Gap
  • 43. ZOPA – peer-to-peer lending platform • ZOPA uses ML in everything it does: credit risk, fraud, ID verification, document tampering, pricing, and customer segmentation. • For credit risk, the ML journey is: gathering the dataset, defining the target, feature optimisation, ML model building, and model stacking. • The credit risk dataset comes from credit agencies (mortgages, credit cards, accounts, payments, and so on): more than 3000 features for hundreds of thousands of borrowers. • First the dataset is cleaned (handling null and rare values and converting categorical values into numbers) and then normalised. • The 3000 features are reduced to tens using feature selection (dimensionality reduction) for an easier, faster, and more reliable implementation; features are selected based on importance. • Using the selected features, the ML model predicts (binary classification) whether the customer can repay the loan or not. • ZOPA experimented with many ML algorithms and found that a combination of neural networks and gradient boosted trees (GBT) worked best.
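The cleaning, normalisation, and feature-selection steps on this slide can be sketched in plain Python. This is a minimal illustration, not ZOPA's pipeline: the function names, the `min_count` threshold for "rare" categories, median imputation, min-max scaling, and the use of absolute correlation with the target as the "importance" score are all assumptions chosen for brevity (in practice, importance is often taken from a fitted model such as a GBT).

```python
from collections import Counter
from statistics import median


def clean_categorical(values, min_count=2):
    """Fill missing values, group rare categories, and encode categories as integers."""
    filled = ["MISSING" if v is None else v for v in values]
    counts = Counter(filled)
    # Categories seen fewer than min_count times are merged into one "RARE" bucket.
    grouped = [v if counts[v] >= min_count else "RARE" for v in filled]
    codes = {cat: i for i, cat in enumerate(sorted(set(grouped)))}
    return [codes[v] for v in grouped]


def clean_numeric(values):
    """Impute missing values with the median, then min-max normalise to [0, 1]."""
    present = [v for v in values if v is not None]
    fill = median(present)
    filled = [fill if v is None else v for v in values]
    lo, hi = min(filled), max(filled)
    span = (hi - lo) or 1.0  # avoid dividing by zero for a constant column
    return [(v - lo) / span for v in filled]


def importance(xs, ys):
    """Absolute Pearson correlation with the target (a simple stand-in importance score)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / (vx * vy) ** 0.5


def select_features(features, target, k):
    """Return the names of the k features with the highest importance score."""
    ranked = sorted(features, key=lambda name: importance(features[name], target), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    repaid = [0, 1, 0, 1]  # hypothetical binary target: 1 = loan repaid
    features = {
        "income": clean_numeric([20.0, 60.0, None, 80.0]),
        "region": clean_categorical(["north", "south", "north", None]),
        "noise": [0.3, 0.3, 0.7, 0.7],
    }
    print(select_features(features, repaid, k=2))  # ['region', 'income']
```

The reduced feature set would then feed the binary classifier; the stacking of neural networks and GBT that ZOPA describes is not reproduced here.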
  • 44. Hastings Direct – Insurance Company • The business problem is to predict: • How many accidents is a customer likely to have? • How much will those accidents cost? • The datasets are imbalanced: many customers don’t have accidents, and many accidents are not expensive. • Data comes from several sources (third parties and customers), at different quality, and from different stages (claims, location, vehicle, and government) • Challenges in the insurance industry are: managing quotes, legacy IT, long feedback loops (a customer can claim years later), and a conservative industry (insurers are not allowed to use some fairly predictive features, such as gender). • A combination of supervised and unsupervised learning is used. • ML can’t run on the current legacy IT, so IT will be upgraded
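One common way to stop an imbalanced dataset (most customers have no accidents) from dominating training is to weight each class inversely to its frequency. The sketch below is an assumption for illustration, not Hastings Direct's stated method; it uses the formula n_samples / (n_classes × class_count), the same one scikit-learn applies for `class_weight="balanced"`. The function names and the toy labels are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def weighted_mean_loss(labels, losses, weights):
    """Average per-example losses, scaling each example by its class weight."""
    total = sum(weights[y] * loss for y, loss in zip(labels, losses))
    return total / sum(weights[y] for y in labels)


if __name__ == "__main__":
    # Hypothetical policy-year labels: 1 = customer had an accident.
    had_accident = [0] * 9 + [1]
    print(balanced_class_weights(had_accident))  # the rare class gets the larger weight
```

With these weights every class contributes the same total weight (n_samples / n_classes), so errors on the rare "accident" class are no longer drowned out by the majority class.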
  • 45. Fraud prevention in PayPal – Payments • PayPal uses Oracle for online transactions and Hadoop for offline processing. • PayPal has a repository of 15000 features, engineered over the years, on which it builds ML models. • ML models are applied at transaction level (when users make a payment); more sophisticated models are applied after the transaction completes. • Fraud prevention is a supervised learning problem, and PayPal uses neural networks as the algorithm. • The current ML model had an AUC of 0.96, but PayPal decided to improve it by improving labelling. • Labelling at PayPal was human-driven, and PayPal tested whether it could be improved using ML. • PayPal applied an ML technique, Active Learning (see Notes), on top of heuristic user labelling to improve labelling quality, which in turn improves the data fed into the supervised ML models. • This finally improved the AUC from 0.96 to 0.979. Labelling used to take a week, but with Active Learning the effort has come down to 30 minutes. (See Notes)
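The AUC figures on this slide (0.96, then 0.979) can be read as the probability that a randomly chosen fraudulent transaction is scored higher than a randomly chosen legitimate one. A minimal pairwise sketch of that definition, with ties counted as half, is shown below; the function name and toy scores are hypothetical, and production code would use a sort-based O(n log n) version rather than comparing every pair.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative (ties = 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one fraudulent and one legitimate example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    is_fraud = [0, 0, 1, 1]  # hypothetical labels
    model_score = [0.1, 0.4, 0.35, 0.8]
    print(roc_auc(is_fraud, model_score))  # 0.75
```

Under this reading, moving from 0.96 to 0.979 means the model misorders fraud/legitimate pairs about 2.1% of the time instead of 4%.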
  • 46. Fraud prevention in PayPal – Payments
  • 47. Numerai - Hedge Fund relies on an anonymous army of coders • 7500 developers, entirely anonymous if they wish, receive trading data and make forecasts using ML. If their predictions are useful, they are paid in cryptocurrency. • The hedge fund industry has been under fire for being overpriced and underperforming; ML can digest extra data that humans and regular algorithms cannot make sense of. • Numer.ai wants to turn the negative competition between massive hedge funds into a highly valuable collaborative space, creating the first hedge fund with a network effect. • Numerai harnesses crowd intelligence: people who are good at data handling, AI, and statistics but don’t want a day job in finance and don’t have the resources to start a hedge fund. Anyone can participate, and the intelligence of the crowd is brought to the stock market. • It is a maths problem: one doesn’t need to know any finance, but can share his or her data science skills with Numerai, and Numerai pays from the money it makes in the stock market. Another benefit of ML that Numerai uses is that many ML models can run on encrypted data.
  • 48. Numerai - Hedge Fund relies on an anonymous army of coders • Numerai has the notion of a meta model: a big benefit of ML comes from combining different ML models. For example, if one model perfectly predicts only utility stocks and another perfectly predicts only bank stocks, a meta model can combine both. • Every data scientist on Numerai solves the same problem using the same underlying features, but each approaches it in their own unique way. With many different solutions to the same problem, Numerai can combine the models into a meta model, just as Random Forests combine decision trees into a forest. • Chart: logloss error of the best Numerai data scientists; Numerai’s meta model has lower error than any individual model.
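The meta-model idea can be sketched with log loss, the error measure on this slide. Numerai's actual combination method is not described here, so this sketch assumes the simplest possible meta model, a plain average of each model's predicted probabilities; the two "utility" and "bank" models and their scores are hypothetical.

```python
import math


def log_loss(labels, probs, eps=1e-15):
    """Mean binary log loss; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(labels)


def meta_model(*model_probs):
    """Combine models by averaging their predicted probabilities per example."""
    return [sum(ps) / len(ps) for ps in zip(*model_probs)]


if __name__ == "__main__":
    outcome = [1, 0, 1, 0]  # hypothetical: did the stock go up?
    utilities = [0.9, 0.1, 0.4, 0.6]  # strong on the first two (utility) stocks
    banks = [0.4, 0.6, 0.9, 0.1]  # strong on the last two (bank) stocks
    combined = meta_model(utilities, banks)
    for name, probs in [("utilities", utilities), ("banks", banks), ("meta", combined)]:
        print(name, round(log_loss(outcome, probs), 3))
    # utilities 0.511, banks 0.511, meta 0.431
```

Because log loss is convex in the predicted probability, an averaged model can never do worse than the average of its members' losses (Jensen's inequality); it beats the best single model, as here, when the members make their mistakes on different examples.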
  • 49. Neurensic - Policing the Stock Market • The world economy is priced in capital markets, and most trading is algorithmic, but surveillance has been manual for a very long time. • Financial institutions spend billions of dollars on trade surveillance alone. Trade surveillance is a trillion-dollar problem in a billion-dollar industry that is based on 20-year-old technology. • Neurensic looks for illegal activity in financial data by applying ML to the market. Data comes from all kinds of exchanges (Chicago, NASDAQ, London, and so on), audit logs, clearing houses, and a range of other systems. The machine has to be intelligent because the data is big (terabyte scale, a trillion rows daily), the signal is very small, and algorithms have to be updated rapidly as the ways of doing illegal things change. • Neurensic is not a law enforcement organisation and cannot declare someone’s intent, but it can monitor a person’s behaviour and, with the help of past data (activities that have been investigated or prosecuted before) and domain experts, give it a risk score and bring it to the attention of a compliance officer. Finding illegal or questionable activity is the first step; the next is to explain why, especially as ML models are notorious for being opaque.
  • 50. Neurensic - Policing the Stock Market • Pull the data (audit logs and market data), filter it down (from billions of records to hundreds), add compelling visualizations, and explain the findings step by step. Visualisations of raw data are key because the raw data is the real evidence (not the ML results): ML runs on heavily processed data, but to present findings one needs to go back to the raw data, and possibly past transactions, line by line. • Visualizations can be graphs or movies (replaying trades in slow motion, tick by tick), with new visuals for new patterns, displayed in browsers and on mobile devices for CxOs and lawyers. • All data science at Neurensic is in Python running on H2O.ai (a premier open-source ML tool with cutting-edge ML algorithms), and within minutes an algorithm goes from research to production.
  • 51. Neurensic - Policing the Stock Market • Audit logs (in CSV format) are pulled into H2O, which compresses data very well • The data is made ready using ETL: handling missing values, normalising, mapping tokens and products from different sources uniformly, and cleaning up for ML • The data is sorted, and each CPU gets an equal share to process in parallel • The data is clustered: depending on the ML model’s requirements, it is segregated and grouped, for example transactions close in time or transactions for the same instrument • Features are defined, and ML models run on the clusters in sequence on each CPU • Risk files are generated by type of risk; each risk file has a lot of detail and a pointer to the raw data.
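The sort, cluster, and split-across-CPUs steps on this slide can be sketched in plain Python. This is an assumption-laden illustration rather than Neurensic's H2O pipeline: the `Trade` fields, the `max_gap` time threshold, and the greedy load-balancing rule are hypothetical choices made for the example.

```python
from collections import defaultdict
from typing import NamedTuple


class Trade(NamedTuple):
    instrument: str
    timestamp: float  # seconds since start of the trading day
    trader: str
    quantity: int


def cluster_trades(trades, max_gap):
    """Group trades by instrument, then start a new cluster whenever two consecutive
    trades on that instrument are more than max_gap seconds apart."""
    by_instrument = defaultdict(list)
    for trade in trades:
        by_instrument[trade.instrument].append(trade)
    clusters = []
    for instrument in sorted(by_instrument):
        ordered = sorted(by_instrument[instrument], key=lambda t: t.timestamp)
        current = [ordered[0]]
        for prev, trade in zip(ordered, ordered[1:]):
            if trade.timestamp - prev.timestamp > max_gap:
                clusters.append(current)
                current = []
            current.append(trade)
        clusters.append(current)
    return clusters


def partition(clusters, n_workers):
    """Greedily assign whole clusters to workers so each gets a similar number of trades."""
    buckets = [[] for _ in range(n_workers)]
    loads = [0] * n_workers
    for cluster in sorted(clusters, key=len, reverse=True):
        i = loads.index(min(loads))  # least-loaded worker so far
        buckets[i].append(cluster)
        loads[i] += len(cluster)
    return buckets


if __name__ == "__main__":
    trades = [
        Trade("ES", 0.0, "a", 5), Trade("ES", 1.0, "a", -5),
        Trade("ES", 100.0, "b", 2), Trade("CL", 5.0, "c", 1),
    ]
    clusters = cluster_trades(trades, max_gap=10.0)
    for worker, bucket in enumerate(partition(clusters, n_workers=2)):
        print(worker, [[t.timestamp for t in c] for c in bucket])
```

Keeping each cluster whole on one worker means a model that looks at, say, a burst of orders on one instrument sees the entire burst; the per-cluster risk models themselves are not shown.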
  • 52. Google Cloud ML Engine • For organisations that move to the cloud and use Google Cloud ML Engine, most of the remaining work (an estimated 90%) is cleansing data. • Google currently uses ML in almost all of its products • Google provides ML as a service through Application Programming Interfaces (APIs) • The benefit of Google’s APIs is that they have already been tested on huge data and have been in production for a few years, so organisations using them can avoid training their own ML models • The APIs can be used to solve many business problems; for example, the image and video APIs can be used in the insurance industry (see Notes) • APIs for speech, natural language processing (NLP), language translation, and so on can be used to create video assistants, face banking, and chat bots, or to enhance existing applications • Many banking use cases running on Google Cloud already use these APIs, such as risk analytics, regulations, customer segmentation, cross-selling and up-selling, credit risk, and fraud detection (see Notes)
  • 53. Google Cloud ML Engine • Google has open-sourced its ML framework, TensorFlow, which is used extensively inside Google. For a more managed experience, Google Cloud customers can choose Cloud ML Engine, so that client organisations need not worry about infrastructure at all and can focus only on the ML model. Companies can use Google’s framework to build ML models, or they can use the APIs directly.
  • 55. Machine Learning in Insurance • “Anything You Can Do, AI Can Do Better” – George Argesanu, Global Head of Analytics at AIG, and Monika Schulze, Global Head of Marketing, Zurich Insurance Company • Predictive analytics (canonical problem “Regression”) using ML can be applied in all existing areas of insurance: pricing, fraud, claims, marketing, P&L analysis, behavioural analysis, and preventive insurance. • As machines think faster and smarter, companies will get more accurate pricing and more efficient processes, frauds are more likely to be caught, and losses are more likely to be prevented. • The insurance industry is going through disruption and will face new products: telematics, self-driving cars, the Internet of Things, and cybercrime. • The insurance industry will have to think about business problems that will arise in future and how machine learning will help solve them: how fitness bands and apps change user behaviour, how premiums are affected, how lives can be saved, and how possessions can be preserved. (see Notes)
  • 57. Data Governance using Machine Learning • The problems solved by ML, such as dimensionality reduction, prediction, and regression, are similar to data governance problems. • The commercial platform from Pendo Systems is designed to make sense of unstructured data in documents such as Excel spreadsheets. • The Pendo data platform indexes documents, searches data inside them, gives insights and each document’s full lineage, and iterates, refines, and classifies the data. • The ML algorithm indexes the documents, classifies the customer data, makes it accessible and searchable, and provides full data lineage for unstructured data
  • 58. Reduce the Credit Gap - Individuals • With the advent of mobile devices, the Internet of Things, social media, and mobile payments in the past few years, more features are available for ML models. • Measuring financial well-being is a problem of classification and dimensionality reduction. • Traditional credit scoring systems rely on individual traits (age, job, and gender) while being oblivious to individuals’ spatio-temporal mobility patterns and habits. • Inspired by the emerging trend of mobile and geo-aware payments, an ML model built on human consumption patterns across time and space was able to predict financial well-being 30% to 49% better than comparable demographic models (Singh, et al., 2015). • Inspired by foraging, the basic pattern of animal movement in search of food and resources, the study examined human purchase behaviour through three shopping traits: diversity (exploration), loyalty (exploitation), and regularity (plasticity). These behavioural indicators were combined with additional indicators of financial trouble, and an ML model was trained to predict whether a customer is likely to get into financial trouble.
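The three shopping traits can be turned into numeric features in many ways. The sketch below uses simple illustrative proxies, which are assumptions and not the exact definitions from Singh, et al. (2015): diversity as the normalised Shannon entropy of purchase categories, loyalty as the share of purchases at the most-visited merchant, and regularity as 1 / (1 + coefficient of variation of the gaps between purchases). All function names are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def diversity(categories):
    """Normalised Shannon entropy of purchase categories (0 = one category, 1 = evenly spread)."""
    counts = Counter(categories)
    if len(counts) < 2:
        return 0.0
    n = len(categories)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return entropy / math.log(len(counts))


def loyalty(merchants):
    """Share of purchases made at the customer's most-visited merchant."""
    counts = Counter(merchants)
    return counts.most_common(1)[0][1] / len(merchants)


def regularity(timestamps):
    """1 / (1 + coefficient of variation of the gaps between purchases); 1 = perfectly regular."""
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    if not gaps or mean(gaps) == 0:
        raise ValueError("need at least two purchases at different times")
    return 1.0 / (1.0 + pstdev(gaps) / mean(gaps))


if __name__ == "__main__":
    # Hypothetical customer: weekly grocery runs plus one fuel stop.
    print(diversity(["grocery", "grocery", "grocery", "fuel"]))
    print(loyalty(["shop_a", "shop_a", "shop_a", "garage_b"]))
    print(regularity([0, 7, 14, 21]))  # days; perfectly weekly -> 1.0
```

Each customer's three scores would then sit alongside the financial-trouble indicators as inputs to the classifier described on the slide.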
  • 59. Reduce the Credit Gap - SME • Traditional measures of financial well-being, such as earnings per asset or equity per asset, have limited predictive power. • One idea is to create a network of merchants in which vertices represent merchants and the weight of an edge is based on the number of customers two merchants have in common. • Analysing this network is the canonical problem of clustering (unsupervised learning), which ML solves. Structural information, such as detected communities, is used as an indicator of a merchant’s financial well-being, and this information becomes a feature in the next ML predictive model.
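The merchant network on this slide can be sketched in plain Python. Assumptions: purchases arrive as hypothetical (customer, merchant) pairs, an edge's weight is the number of distinct shared customers, and, instead of a proper community-detection algorithm (such as Louvain), connected components over edges at or above a weight threshold serve as crude "communities". Merchants with no shared customers do not appear in the graph.

```python
from collections import defaultdict
from itertools import combinations


def merchant_network(purchases):
    """purchases: iterable of (customer, merchant). Returns {(m1, m2): shared_customer_count}."""
    merchants_by_customer = defaultdict(set)
    for customer, merchant in purchases:
        merchants_by_customer[customer].add(merchant)
    weights = defaultdict(int)
    for merchants in merchants_by_customer.values():
        for a, b in combinations(sorted(merchants), 2):
            weights[(a, b)] += 1
    return dict(weights)


def communities(weights, min_weight=1):
    """Connected components using edges with weight >= min_weight (union-find)."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for (a, b), w in weights.items():
        root_a, root_b = find(a), find(b)  # registers both merchants as nodes
        if w >= min_weight and root_a != root_b:
            parent[root_a] = root_b
    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return sorted(groups.values(), key=sorted)


def community_size_feature(groups):
    """Map each merchant to the size of its community, usable as a model feature."""
    return {merchant: len(group) for group in groups for merchant in group}


if __name__ == "__main__":
    purchases = [("c1", "bakery"), ("c1", "cafe"), ("c2", "bakery"), ("c2", "cafe"),
                 ("c3", "garage"), ("c3", "tyres"), ("c4", "cafe"), ("c4", "garage")]
    w = merchant_network(purchases)
    print(community_size_feature(communities(w, min_weight=2)))
```

Raising `min_weight` keeps only strong customer overlaps, which splits the network into tighter groups; a real system would replace `communities` with a modularity-based method.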
  • 61. Data Governance • Organisations preparing to adopt Machine Learning tomorrow should start data governance today, either for one department or for the entire organisation • Stream data from silos/departments, invest in the use case, and tie the integration to value creation: a progressive step towards an integrated platform of enterprise data for advanced capabilities (Wilder-James, 2016) • To address data quality, Great Western Bank created a data committee with members from different teams across the organisation. The committee created standard definitions that teams across the bank use. (banktech, 2014)
  • 62. Next Steps • Find a business area where data is already robust and plentiful and that has a business problem machine learning can solve • Train the managers, move to the cloud, make the organisation agile, find business problems and use cases with robust data, apply data governance, and prepare data today for applying Machine Learning tomorrow • Embrace the increased competition in the data value chain and realise that the ecosystem is the new value proposition • Build platforms instead of products
  • 63. Cautions • Though the guarantees of Machine Learning may be only theoretical, things will be done better with machine learning: we are relying on the promise and power of computers • It will also be interesting to see how Machine Learning works together with other key developments such as data protection laws, identity theft, secure systems, and distributed ledgers. • More is now expected of business leaders and managers in how they approach this new relationship between humans and machines, and they become more accountable and liable when data is shared and decisions are taken by machines.
  • 64. Conclusion • It’s time to rethink strategy and make it iterative as machines become better and better • One can’t ignore the opportunity: computing power is increasing, we are collecting more data, storage costs are falling, and algorithms are improving • Not only implement Machine Learning but also interpret the model • Upgrade the IT architecture and data governance practices

Editor's Notes

  1. Source: https://www.outbrain.com/blog/wp-content/uploads/2017/01/OB-Blog-Artwork-Artificial-Intelligence-1-v1.jpg Image Source: http://industrial.omron.com.br/media/2016/04/robot-human-1.png
  2. Image Source: https://qwg2b3m869-flywheel.netdna-ssl.com/wp-content/uploads/2015/09/artificial-intelligence-companies.png
  3. Image Source: https://i1.wp.com/i.huffpost.com/gen/4864344/images/o-AI-HUMAN-facebook.jpg
  4. Sources: https://en.wikipedia.org/wiki/John_Snow https://en.wikipedia.org/wiki/Miasma_theory https://www.theguardian.com/news/datablog/2013/mar/15/john-snow-cholera-map#data http://www.sciencemuseum.org.uk/broughttolife/themes/publichealth/cholera
  5. Ref - https://www.youtube.com/watch?v=NRYLPmy8V1k AI is a broad term for the field in which human intelligence is simulated in machines. ML is the term applied to systems that learn from experience (data).
  6. AI has been an area of interest since the term was coined about 60 years ago, but what is different in this decade?
  7. Sources: Domingos, P., 2012. A Few Useful Things to Know about Machine Learning. Communications of the ACM, 55(10), pp. 78-87. Brownlee, J., 2013. A Tour of Machine Learning Algorithms. [Online] Available at: http://machinelearningmastery.com/a-tour-of-machine-learning-algorithms/ Kaluza, B., 2015. What Kind Of Problems Can Machine Learning Solve?. [Online] Available at: https://www.evolven.com/blog/what-kind-of-problems-can-machine-learning-solve.html
  8. Image Source: http://i.dailymail.co.uk/i/pix/2016/06/22/16/2E19888500000578-0-image-a-34_1466610048489.jpg
  9. Image source: http://skeptikai.com/2016/03/10/disruptive-technology-and-corporate-karate/
  10. Image Source: http://www.nationaljurist.com/lawyer-statesman/what-lawyers-duty-technology-competence
  11. Image Source: https://www.dentons.com/en/find-your-dentons-team/practices/trusts-estates-and-wealth-preservation
  12. For a long time banking has been a standardized industry, and banks developed legitimate best practices because the relationship between cause and effect was linear, repeatability allowed predictive models to be created, and the focus was on efficiency. The decision model was: sense incoming information, categorize it, and respond in accordance with predetermined practice. The culture of banks prevents them from being agile and innovative, and the slow adoption of the latest technology can be related to this culture. Are experienced managers ready to be influenced by machines and delegate some of their tasks to machines without fear of being replaced?
  13. Even though banks have recognized the opportunity of Big Data, Data Science, and ML, they are still struggling with today’s big data ecosystem because of a shortage of trained data scientists who can make sense of the troves of data banks have collected over the years. Besides data scientists, banks also need other skilled resources, data engineers and architects, who can transform legacy IT systems and enterprise architectures into an ML ecosystem. Banks have used data warehouses to store large data and run queries over it, which worked well with structured data, but in the age of massive unstructured and big data the ecosystem needs upgrading. Too many technologies and frequent changes are confusing.
  14. Source: Fedyk, A., 2016. How to Tell If Machine Learning Can Solve Your Business Problem. [Online] Available at: https://hbr.org/2016/11/how-to-tell-if-machine-learning-can-solve-your-business-problem
  15. Source: Fedyk, A., 2016. How to Tell If Machine Learning Can Solve Your Business Problem. [Online] Available at: https://hbr.org/2016/11/how-to-tell-if-machine-learning-can-solve-your-business-problem
  16. Source: The Royal Society Staff, 2017. Machine Learning Report, London: The Royal Society.
  17. Source: Galli, S., 2017. Machine Learning in Financial Credit Risk Assessment. [Online] Available at: http://www.datasciencefestival.com/soledad-galli-machine-learning-financial-credit-risk-assessment/?mc_cid=bc35d812cc&mc_eid=23f320c300
  18. Source: Wenzel, A., 2017. Challenges & Opportunities with Data Science In Insurance. [Online] Available at: http://www.datasciencefestival.com/ansgar-wenzel-challenges-opportunities-data-science-insurance/
  19. PayPal receives many transactions and cannot label each one, so the idea behind Active Learning is to select the minimum number of samples, or the best samples (top 10 samples), needed to build ML models. A robust sampling technique is key to the success of Active Learning. The idea is to select effectively which samples are given to humans for labelling. For example, suppose a model predicts that a transaction is fraudulent with a probability of about 0.5, i.e. it cannot decide; that sample can be given to a human to label correctly using other tools. So, rather than giving random samples from a huge pool of data, give samples that are closer to the decision boundary (probability near 0.5), and once the data is labelled by a human, add it back to the model. Research suggests that randomly selecting samples and applying ML results in 70% accuracy, while an ML model using Active Learning results in 90% accuracy. PayPal started with a small set of labelled data and analysed the ML model currently in use. PayPal was using a single-layer neural network, but found that deep learning and GBT successfully challenged the existing model, and GBT outperformed deep learning because the features were human-engineered. One benefit of deep learning is that it learns features, so another line of research for PayPal is to re-engineer some of the human-engineered features using deep learning. PayPal uses both models, and if they disagree, human experts intervene (Query by Committee). Humans are given data and supporting infrastructure, such as a simulation environment, to re-compute the feature logic and apply it back to the ML model. Finding and applying the benefits of deep learning is the next logical step in ML for PayPal. PayPal uses the H2O tool rather than TensorFlow or the R language. One year of data (11 million transactions) with 500-600 features was used for training, and the model was tested on 4 million transactions.
Prediction can be improved by analysing more data (transactions from more than 5 years), improving the algorithm, using better features, or better labelling.
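The two selection strategies in this note, sampling near the decision boundary and sending disagreements between two models to humans, can be sketched as below. This is an illustrative assumption, not PayPal's code: the function names, the 0.5 threshold, and the toy probabilities are hypothetical, and the "top 10 samples" in the note would correspond to `k=10`.

```python
def uncertainty_sample(probs, k):
    """Indices of the k unlabelled examples whose fraud probability is closest to 0.5."""
    return sorted(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))[:k]


def committee_disagreements(probs_a, probs_b, threshold=0.5):
    """Query by committee with two models: indices where their predicted classes differ."""
    return [
        i
        for i, (pa, pb) in enumerate(zip(probs_a, probs_b))
        if (pa >= threshold) != (pb >= threshold)
    ]


if __name__ == "__main__":
    # Hypothetical fraud probabilities for five unlabelled transactions.
    gbt = [0.95, 0.52, 0.10, 0.45, 0.70]
    deep = [0.90, 0.30, 0.05, 0.60, 0.80]
    to_label = sorted(set(uncertainty_sample(gbt, k=2)) | set(committee_disagreements(gbt, deep)))
    print("send to human reviewers:", to_label)
```

After humans label the selected transactions, they are added to the training set and the models are retrained, which is the loop the note describes.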
  20. Source: Ramanathan, V., 2017. Large Scale Machine Learning for Payment Fraud Prevention. [Online] Available at: https://www.infoq.com/presentations/paypal-ml-fraud-prevention?utm_source=presentations_about_MachineLearning&utm_medium=link&utm_campaign=MachineLearning
  21. Sources: https://www.technologyreview.com/s/603120/new-hedge-fund-relies-on-an-anonymous-army-of-coders-to-turn-a-profit/ https://numer.ai/
  22. Source: https://medium.com/numerai/invisible-super-intelligence-for-the-stock-market-3c64b57b244c
  23. Source: Click, C., 2017. Policing the Stock Market with Machine Learning. [Online] Available at: https://www.infoq.com/presentations/score-ml-stock-market
  24. Organisations moving to the cloud and using Google Cloud ML Engine will find that most of their work relates to cleansing data (an estimated 90%). ML is the rocket and data is the fuel. The growth in ML adoption is exponential: NVIDIA reports that CPU sales have been flat for the past few years, while GPUs are where the growth is, driven by ML. Google has recognised this explosion and is investing in special-purpose hardware, the Tensor Processing Unit (TPU). Google currently uses ML across a broad range of products: Android, Apps, Gmail, Maps, Speech, Search, Translation, YouTube, the self-driving car, and so on. ML in Google’s products is transparent to the end user but runs in the background; for example, when a user searches for photos of a beach, ML classifies each image as a beach photo or not. Google provides ML to its clients as APIs (Application Programming Interfaces) such as the Cloud Vision API, Speech API, Natural Language API, Translation API, Video Intelligence API, and so on. The benefit of Google’s APIs is that they have already been tested on huge data and have been in production for a few years; for example, the same technology Google uses for image search is available for organisations to use as an API. Cloud providers such as Google solve many ML problems, except the data: organisations have to think about data governance, cleaning, lineage, and quality. Image and video APIs are applicable to the banking industry, for example to images uploaded by end users for insurance claims: stabilising video taken on a phone; protecting content from malicious uploads, such as preventing end users from uploading adult content; classifying the make/model of a car from photos; classifying videos uploaded by end users; pulling text from an image, or masking sensitive text in an image such as an ID number before storing it in a database; and finding the location of an image.
Another benefit of the image APIs is that they have already learned image recognition, so clients don’t need to train the model. By combining natural language processing APIs for speech recognition and language translation with image and video APIs, banks can create truly immersive apps such as face banking and video advisors, and revamp existing online and mobile banking platforms. ML can also build a relationship graph of information and data: rather than hard-coding rules for information retrieval and search (hard-coded search queries with rules), use ML to learn the patterns, similar to RankBrain, a key signal used in Google search. Google is experimenting with using AlphaGo technology, based on reinforcement learning, to reduce the energy spent on air conditioning, and has used ML to reduce the energy used for cooling its data centres by 40%. Since Google has many high-end solutions for both structured and unstructured data, analytics and big data tools, and ML APIs and frameworks, it makes a promising case for organisations to move their data to Google Cloud or a similar platform, if this is not already done or in progress. Many use cases from financial services companies already run on Google Cloud, such as risk analytics and regulations, customer segmentation, cross-selling and up-selling, sales and marketing campaign management, and creditworthiness. The insurance company AXA uses ML to calculate premiums, shifting from a rules-based system to an ML-based system. Similarly, the bank SMFG moved its credit card fraud detection from a rules-based system to an ML-based system on Google Cloud. End users provide data, and ML learns from that data to understand users better, creating a new level of personalised service. Business processes can be streamlined as machines learn patterns and anomalies, providing more relevant information to the right user at the right time.
  25. Source: Ramanathan, R., 2017. Infuse your business with Machine Learning. [Online] Available at: https://cloudonair.withgoogle.com/events/next-live-emea-2017/watch?talk=session-3-day-1
  26. Source: Argesanu, G. & Schulze, M., 2016. Anything You Can Do, AI Can Do Better, London: Insurance Nexus. In the report "Anything You Can Do, AI Can Do Better", prepared by George Argesanu, Global Head of Analytics at AIG, and Monika Schulze, Global Head of Marketing at Zurich Insurance Company, the authors say that Machine Learning is a natural progression of more than a century of predictive analytics. They describe it as a cutting-edge technology that blows apart old thinking and catapults companies into generating business many times faster than before. One of the biggest differences between existing statistical models, such as the Generalised Linear Model, and ML models, such as Neural Networks and Decision Trees, is that ML models are algorithmic and self-learning. As a result, they blur the line between past and present when predicting the future, whereas statistical models use historical data and a parametric approach. ML models can be applied on a customer-by-customer basis and can incorporate images and videos. Predictive analytics using ML (the canonical problem "Regression") can be applied in all existing areas of insurance: Pricing, Fraud, Claims, Marketing, P&L Analysis, Behavioural Analysis, and Preventive Insurance. As machines think faster and smarter, companies will get more accurate pricing, processes will become more efficient, frauds will be more likely to be caught, and losses will be more likely to be prevented. Insurance companies face the same challenges discussed earlier in the Literature Review chapter, and they can modernise case by case. However, the insurance industry is going through disruption and will face new products and risks: Telematics, Self-driving cars, Internet of Things, and Cybercrime. The industry will have to anticipate the business problems that will arise in the future and consider how machine learning will help to solve them: how fitness bands and apps change user behaviour, how premiums are affected, how lives can be saved, and how possessions can be preserved.
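To make the "parametric approach" concrete, here is a minimal sketch of the simplest parametric regression: ordinary least squares with one feature, which is equivalent to a Gaussian GLM with an identity link. The feature (years of driving experience), the claim-cost figures, and the name `fit_line` are hypothetical and were chosen so the fit is exact. The point is that the whole model is summarised by two parameters estimated from historical data, whereas a decision tree or neural network learns its shape from the data.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical history: years of driving experience vs annual claim cost.
experience = [1.0, 2.0, 3.0, 4.0, 5.0]
claim_cost = [900.0, 800.0, 700.0, 600.0, 500.0]

intercept, slope = fit_line(experience, claim_cost)
print(intercept, slope)         # 1000.0 -100.0
print(intercept + slope * 6.0)  # 400.0, predicted cost for 6 years' experience
```

Once fitted, the model's shape is fixed: every prediction lies on the same straight line. Capturing a non-linear pattern would require the modeller to add terms by hand.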
For each line of business, companies need to articulate what problem they are trying to solve and where the data is. They should create a data foundation for analytics, move data and computation to the cloud, and engage machine learning. Insurance companies lack frequent interactions with their customers, so they need to encourage customers to engage in the machine learning process through physical technologies such as Telematics and through applications that give greater access to data. Many insurance companies start with a risk-mitigation approach. Partnerships within the company and with external providers, and taking it slowly, are the defining ways of approaching Machine Learning.
  27. The “Recognition pipeline in an autonomous car” is an example of two canonical problems of ML: Feature Extraction and Classification. It is a 3-step process that uses ML methods: the image is captured in real time, its features are extracted, and it is classified as a “Stop Sign”. Insurance companies that insure autonomous cars will face many classification use cases when processing claims for an autonomous car involved in an accident, as they analyse the images the car captured and the decisions it made.
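The three steps can be illustrated with a deliberately tiny sketch. Everything in it is hypothetical: the image is a synthetic 3x3 grid of RGB tuples, the two hand-crafted colour features and the class prototypes in `CENTROIDS` are made up, and a nearest-centroid classifier stands in for the convolutional neural networks that real autonomous-driving systems use. It shows only how feature extraction feeds classification.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0-255
Image = List[List[Pixel]]
Features = Tuple[float, float]


def extract_features(image: Image) -> Features:
    """Step 2: turn raw pixels into a small feature vector.

    Hypothetical features: the fraction of strongly red pixels and the
    fraction of bright (near-white) pixels.
    """
    pixels = [p for row in image for p in row]
    red = sum(1 for r, g, b in pixels if r > 150 and g < 100 and b < 100)
    white = sum(1 for r, g, b in pixels if r > 200 and g > 200 and b > 200)
    return red / len(pixels), white / len(pixels)


def classify(features: Features, centroids: Dict[str, Features]) -> str:
    """Step 3: nearest-centroid classifier over labelled feature prototypes."""

    def dist2(a: Features, b: Features) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(centroids, key=lambda label: dist2(features, centroids[label]))


# Hypothetical prototypes, e.g. averaged from labelled training images.
CENTROIDS: Dict[str, Features] = {
    "Stop Sign": (0.7, 0.2),
    "Speed Limit Sign": (0.1, 0.8),
    "Background": (0.0, 0.05),
}

RED, WHITE, GREY = (200, 30, 30), (240, 240, 240), (90, 90, 90)
# Step 1: a captured frame (a synthetic image standing in for a camera frame).
frame = [[RED, RED, RED], [RED, WHITE, RED], [RED, RED, RED]]
features = extract_features(frame)    # step 2
print(classify(features, CENTROIDS))  # step 3, prints: Stop Sign
```

In a real car, both the features and the classifier are learned from large labelled image sets rather than written by hand. The structure is the same, though: raw input, then features, then a label.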
  28. Source: pendo, 2017. pendo systems. [Online] Available at: http://www.pendosystems.com/
  29. Source: Singh, V. K., Bozkaya, B. & Pentland, A., 2015. Money Walks: Implicit Mobility Behavior and Financial Well-Being. PLOS ONE 10(8): e0136628.
  30. Source: Eagle, N., Macy, M. & Claxton, R., 2010. Network Diversity and Economic Development. Science, 328(5981), pp. 1029-1031. Image Source: MIT Online course: Big Data & Social Analytics