Data Science: Transforming the finance industry
Titos Matsakos, PhD
Data Scientist at JDX FinTech
9 December 2016
The Data Science revolution
Data Science, Predictive Analytics, Big Data, Supervised/Unsupervised Machine Learning. Passing
buzzwords or an opportunity for competitive advantage? As we’ve seen over the last couple of
decades, they can be much more than either: innovative Data Science applications have not only brought
major technology disruptions across industries, but have also established today’s market leaders. Take
the examples of online retailers and media streaming companies such as Amazon, YouTube, and
Netflix, or social media networks and peer-to-peer services such as Facebook, Uber, and Airbnb.
Notably, many of the companies that have totally transformed traditional business models hold few
assets other than data and algorithms. Yet the value of such intangible resources is still not
broadly recognised, and many companies remain far behind in investing in the human, software, and
hardware resources needed to collect and mine data.
Data Science innovations are ubiquitous in the technology sector; but what about the finance
industry? Enter Signet Bank, a small regional bank in Virginia. By applying predictive analytics to the
issuance of credit cards in the 1990s, the business grew so much that it led to a
well-known spin-off: Capital One. The credit card industry adopted a new norm, moving from uniform
pricing across customers to tailored products based on the specific client profile; a true example of a
data-driven revolution. Other notable Data Science applications that had a major impact on the
assessment of credit risk include the calculation of credit scores and the estimation of probabilities of
default.
Mining data: Machine Learning
But what is Machine Learning and why is it so powerful? Despite its somewhat obscure name, the core
principles are simple to understand. Loosely speaking, machine learning consists of a collection of
methodologies and algorithms that, given an input dataset, can be applied to perform a wide range of
tasks such as:
- predict the value of a variable (e.g. the price of an asset) – Regression
- predict the category of an observation (e.g. whether an individual will default) – Classification
- group similar items, individuals, or entities (e.g. categorise clients in groups) – Clustering
- identify similar items, individuals, or entities (e.g. find contracts with similar clauses) – Similarity Matching
Regression and Classification are typical cases of supervised learning, where a set of input variables
(also called features or predictors) is used to predict the value or class of a target variable (also called
output or response). The word “supervised” implies that the predictive model is built by discovering
common patterns and correlations between the input and output variables of a training dataset in which
the outputs are already known.
Regression examples include the calculation of credit scores – based on a set of features, such as
income, age and credit history – and the pricing of a property – based on real estate data such as the
floor area, location and year of construction. Classification examples include the detection of fraud
from transaction data, and the prediction of whether a company will default over the next year given
financial statement, macroeconomic, and historic default rate data. It follows that the more accurate the
model's predictions, the greater the potential efficiency and profitability of the business. Methods that
perform Regression and Classification include, among others, multivariate regression, logistic
regression, support vector machines, and decision trees.
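
To make the two supervised tasks concrete, here is a minimal sketch using only the Python standard library. All numbers are made up for illustration, and the function names (fit_line, fit_logistic, predict_proba) are hypothetical helpers rather than part of any library. In practice an established machine learning package would normally be used instead.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares with one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

def predict_proba(w, b, x):
    """Logistic model: estimated probability that the label is 1."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

def fit_logistic(xs, labels, lr=0.5, epochs=5000):
    """Logistic regression with one feature, fitted by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, labels):
            error = predict_proba(w, b, x) - y
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

# Regression: illustrative floor areas (square metres) and prices (thousands).
areas = [50, 70, 90, 110, 130]
prices = [150, 200, 260, 310, 360]
slope, intercept = fit_line(areas, prices)
predicted_price = slope * 100 + intercept  # estimate for a 100 square metre property

# Classification: illustrative debt-to-income ratios, label 1 = defaulted.
ratios = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
defaulted = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = fit_logistic(ratios, defaulted)
low_risk = predict_proba(w, b, 0.2)   # should fall below 0.5
high_risk = predict_proba(w, b, 0.8)  # should fall above 0.5
```

The regression model outputs a number (a price), whereas the classification model outputs a probability that is turned into a class by comparing it with a threshold such as 0.5.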
Clustering, on the other hand, is often referred to as an “unsupervised” machine learning task
because there is no target variable to predict. Instead, the task is to discover structure
and/or patterns in the data by grouping together and characterising the properties of entities such as
clients, companies, or products. For instance, by categorising companies based on their size, revenue,
probability of default, etc., a bank can optimise resource allocation by incentivising business with
specific client groups. Examples of Clustering methods include K-Means, density-based clustering
algorithms such as DBSCAN, and self-organizing maps. Similarity Matching, as the name suggests, finds
a close match to a desired item. For example, this is used to recommend financial products to a
customer based on other customers with a similar profile, or to identify and target potential clients
that are likely to respond positively to an offer.
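
The sketch below illustrates K-Means and a simple form of Similarity Matching (nearest neighbour by Euclidean distance) with the Python standard library. The company profiles are invented, each feature is assumed to be already scaled to the range 0 to 1, and the function names are hypothetical. Seeding the centroids with the first k points is a simplification made here for reproducibility; real implementations typically use randomised initialisation.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain K-Means (Lloyd's algorithm); the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

def most_similar(target, candidates):
    """Similarity Matching: index of the candidate closest to the target."""
    return min(range(len(candidates)),
               key=lambda i: math.dist(target, candidates[i]))

# Illustrative company profiles: (size score, default-risk score).
companies = [
    (0.10, 0.80), (0.90, 0.10), (0.15, 0.75),
    (0.85, 0.20), (0.20, 0.90), (0.95, 0.15),
]
groups, centres = kmeans(companies, k=2)

# Find the existing company that most resembles a new prospect.
closest = most_similar((0.88, 0.12), companies)
```

Here the small, riskier companies and the large, safer ones end up in separate groups without any labels being provided, which is what distinguishes this from the supervised case.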
The above methods need not be associated with datasets that only consist of numeric values. Natural
Language Processing (NLP) also underpins many powerful Machine Learning applications.
Classification, Clustering, and Similarity Matching algorithms can organise, group, and compare
articles, documents, or any kind of text based on their content. Typical examples are the grouping of
news articles by topic, and the recommendation of similar articles to read. In finance, documents such
as contracts, deals, or agreements can be grouped together and categorised based on their terms and
clauses. This can not only facilitate the management of a large number of complex contracts, but also
help ensure compliance within the ever-expanding regulatory framework. Yet another application is
the processing of emails and transcribed phone calls in order to detect or even prevent fraudulent
activities.
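
One common way to compare texts, shown here as a rough sketch rather than as the method behind any particular product, is to turn each document into word counts (a "bag of words") and measure the cosine similarity between the resulting vectors. The clauses below are invented and the function names are hypothetical. Real systems usually add steps such as stop-word removal and term weighting.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lower-case the text and count each alphabetic word."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0 = no shared words)."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Illustrative contract clauses.
clauses = [
    "The borrower shall repay the loan in monthly instalments",
    "The borrower shall repay the loan in quarterly instalments",
    "Either party may terminate this agreement with written notice",
]
vectors = [bag_of_words(c) for c in clauses]

repayment_pair = cosine_similarity(vectors[0], vectors[1])  # near-identical clauses
unrelated_pair = cosine_similarity(vectors[0], vectors[2])  # no words in common
```

The two repayment clauses score close to 1, while the termination clause shares no words with them and scores 0, so the same grouping and matching techniques used for numeric data can then be applied to these similarity scores.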
Conclusion
Data Science, Predictive Analytics, Big Data, and Machine Learning are far from ephemeral hype.
Much like the Internet revolution two decades ago, Data Science is gradually becoming part of every
organisation, transforming business models and defining a new norm for how we conduct business.
Implementing data-driven solutions is not a straightforward process, though: projects can be highly
technical, and making them feasible requires investment in talent, software and hardware resources.
However, the value added by collecting and mining data can often offer unparalleled opportunities
for growth, and getting on board early can increase the return on investment in the long run.
http://jdxconsulting.com/index.php/2016/12/09/data-science-transforming-the-finance-industry/