Credit risk is at the core of the banking business, and its adequate measurement is crucial for financial institutions. Due to the lack of historical default data and the heterogeneity of customers, qualitative expert-based information is an important factor in measuring the creditworthiness of large companies. However, such information is often extracted manually, causing inefficiencies and possible subjectivity. In this paper we develop the RatingBot: a text-mining-based rating approach which efficiently and objectively models relevant qualitative information from annual reports. It combines the literature on text mining in finance and on machine learning in credit rating to derive the credit rating of a company. We evaluate our approach on two datasets: a publicly available one that facilitates replicability, and a dataset provided by a major European bank representing a real-world scenario. The results show that RatingBot delivers additional predictive power and should be considered in future research on credit rating models.
RatingBot: A Text Mining Based Rating Approach
1. RatingBot: A Text Mining Based Rating Approach
Presented by Wong Ming Jie & Zhu Cungen
2. CONTENT
• Motivation
• Credit Risk
• Measurement of Credit Risk
• Related Work
• Text Mining in Finance
• Machine Learning in Credit Rating
• Research Question
• Data Context
• Methodology
• Discussion and Conclusion
3. MOTIVATION
• Financial institutions such as banks serve as intermediaries between
o lenders (seeking short-term investment opportunities) and
o borrowers (seeking long-term financing)
• They facilitate the development of products and services for both groups of clients
• Their services and products need to balance risk-taking and profit
o Maximise profits
o Without exceeding their risk appetite
4. MOTIVATION
• Risk is at the core of the financial industry
o Credit;
o Market; and
o Operational risks
• Credit risk is defined as the risk that a counterparty (borrower) may become less likely to fulfil its obligation, in part or in full, on the agreed-upon date
• To manage credit risk, banks need to develop rating models for capital calculation to
o Quantify the possible expected and unexpected loss from the counterparty
o Ascertain the counterparty's creditworthiness
5. MOTIVATION
• Measurement of credit risk comprises:
o Probability of Default (PD);
o Loss Given Default (LGD); and
o Exposure at Default (EAD)
• PD represents the creditworthiness of the counterparty and is used for loan approval
• PD estimation applies statistical inference to quantitative/categorical input variables, using financial information (historical numbers of defaults) and demographics (country) to determine whether a counterparty will default
• This form of inference is limited to large homogeneous populations with a substantial incidence of defaults
6. MOTIVATION
• For small, heterogeneous populations with low default rates, the data used for statistical inference may not generalize, and PD is difficult to predict
• In most cases, banks therefore move towards using
o Quantitative sources (including profitability)
o Qualitative sources (corporate disclosures, news, analyst calls)
• Characteristics of qualitative sources may contribute to estimating creditworthiness (improving PD estimation)
o The sentiment of corporate disclosures correlates with future earnings, return on assets, stock returns, and return volatility
o Strong financial performance may relate to creditworthiness in ways that would otherwise be overlooked
7. MOTIVATION
• Qualitative sources such as annual reports provide structured, objective, and forward-looking information and are publicly available
• They add predictive power, since they cover factors that are difficult to quantify (future strategy, expected performance) yet may be relevant to creditworthiness
• Unlike quantitative sources, qualitative sources require experts to extract and interpret meaningful information
• Manual interpretation and coding are exposed to subjectivity, errors, and inefficiencies, which affects PD estimation
8. RELATED WORK – TEXT MINING
• Text mining has been studied extensively in finance
• Content analysis methods are required to extract relevant information from the unstructured form of annual reports for PD estimation
o Bag-of-words
o Document narrative extraction
• Unlike document narrative extraction, bag-of-words is more flexible and assumes word (or sentence) order is irrelevant for document representation
• The bag-of-words method can be implemented with either
o A term-weighting approach
o A machine-learning approach
10. RELATED WORK – TEXT MINING
• Term-weighting structures a document into different terms and assigns each a weight based on its importance, in order to derive a sentiment score
• Predefined sentiment dictionaries can be used to determine the sentiment score of a document
o Harvard GI word list (designed for general use)
o Loughran and McDonald (designed for financial use and credit-rating applications)
• The Loughran and McDonald dictionary is extracted from a set of 10-K SEC filings
• A 10-K filing is an annual report required by the U.S. Securities and Exchange Commission (SEC) from a company, providing a comprehensive summary of its financial performance (due within 60 days after its fiscal year end)
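To make the term-weighting idea concrete, here is a minimal Python sketch of dictionary-based sentiment scoring. The two word lists are tiny illustrative stand-ins for the Loughran and McDonald lists (which contain thousands of entries), and the scoring rule (net count over document length) is one common choice, not necessarily the paper's.

# Dictionary-based sentiment scoring sketch; word lists are
# illustrative stand-ins for the Loughran-McDonald lists.
POSITIVE = {"profit", "growth", "improve", "gain"}
NEGATIVE = {"loss", "decline", "impair", "default"}

def sentiment_score(tokens):
    """Net sentiment: (#positive - #negative) / #tokens."""
    pos = sum(tok in POSITIVE for tok in tokens)
    neg = sum(tok in NEGATIVE for tok in tokens)
    return (pos - neg) / max(len(tokens), 1)

report = "revenue growth offset a small loss".split()
print(sentiment_score(report))  # (1 - 1) / 6 = 0.0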
11. RELATED WORK – TEXT MINING
• Machine-learning approaches require a set of manually labelled words or sentences as input, which is used to train classification models
• These models are then used to label new words or sentences in a document
• This method is costlier to implement and requires finance experts who are native speakers of the documents' language
• A newer approach identifies themes within a corpus by discovering the latent (hidden) topics that represent each document, using probabilistic Latent Dirichlet Allocation (LDA) with Gibbs sampling (MCMC) to derive the topics and approximate the distribution parameters
12. RELATED WORK – CREDIT RATING
• Most machine learning in credit rating uses binary classification (default/non-default) to predict the default of a counterparty
• The most common methods in the literature are:
o Neural Networks (NN)
o Support Vector Machines (SVM)
o Linear and Logistic Regression (LR)
o Decision Trees (DT)
o Fuzzy Logic (FL)
o Genetic Programming (GP)
o Discriminant Analysis (DA)
• Most proposed machine-learning techniques use only quantitative information as model variables or inputs
13. RESEARCH QUESTION
• Research question: can we combine the two approaches?
o Term-weighting, topic models, and sentiment analysis to identify and represent the qualitative information in annual reports in a structured way → inputs
o Apply machine-learning classification approaches to these inputs to predict the credit rating of a company
• Proposition: use the annual report of a company as input, then apply text mining and classification approaches to automatically and objectively derive its credit rating
• The credit rating, as an indication of perceived risk, can be used by banks and financial institutions to investigate PD
14. DATA CONTEXT
• Dataset 1:
o Downloaded the available 10-K filings from the SEC EDGAR database from 2009 to 2016
o Avoids the influence of the 2007–2008 financial crisis
o 34,972 10-K filings were joined with 17,622 Standard & Poor's (S&P) ratings in 2016 (9,197 unique companies)
o The joined dataset was then constructed by matching the SEC statements with the latest S&P ratings after 2009, so that the reports were issued at most 9 months before the credit-rating dates
o This resulted in 1,716 data points for 1,351 companies
o After removing 228 data points with a defaulted rating (the intention is to study credit-rating classes), the final sample has 1,488 data points
15. DATA CONTEXT
• Dataset 2:
o Provided by a major European bank
o Consists of annual reports and internal credit ratings of companies between 2009 and 2016
o Contains non-standardized general reports, some partly in 10-K filing format and some not written in English
o 10,435 total annual statements
o After removing read-protected and non-English reports, and reports from defaulted and non-rated companies, only 5,508 annual statements remained for consideration
• Dataset 1 was used because it allows replication of the results, and 10-K filings are commonly used sources in the literature
• Dataset 2 was used because it represents a real-world scenario with internal ratings
16. DATA CONTEXT
• Document representation:
o The Loughran and McDonald dictionary was used for the sentiment-weighted analysis in the term-weighting approach
o The dictionary was derived from 10-K SEC filings (suitable for the credit-rating context)
o Terms appearing in less than 5% of the documents (datasets 1 and 2) were removed, and chi-squared statistics were used to keep only the most important terms (see the sketch below)
• A robustness check was also made to ensure that the annual reports in datasets 1 and 2 are relevant to a company's credit rating
• Datasets 1 and 2 were compared on the absolute frequency of terms and the sentiment-weighted frequency of terms after applying the dictionary
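A minimal sketch of this two-stage term selection, assuming scikit-learn; the toy documents, labels, and cutoffs (min_df=0.05, k) are illustrative, not the paper's settings.

# Stage 1: frequency filter; stage 2: chi-squared term selection.
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.feature_extraction.text import CountVectorizer

docs = ["revenue grew strongly", "heavy losses and defaults",
        "stable revenue, minor losses", "strong growth in revenue"]
labels = [1, 0, 1, 1]  # toy rating classes

# Drop terms appearing in fewer than 5% of documents.
vec = CountVectorizer(min_df=0.05)
X = vec.fit_transform(docs)

# Keep the terms with the highest chi-squared statistic
# with respect to the rating label.
selector = SelectKBest(chi2, k=min(100, X.shape[1]))
X_sel = selector.fit_transform(X, labels)
kept = selector.get_support()
print([t for t, k in zip(vec.get_feature_names_out(), kept) if k])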
18. Methodology
• Data Context
o Feature: the bank rates counterparty $i$ based on its annual report $d_i$, $i \in \{1, 2, \dots, m\}$: textual (qualitative) data
o Label: counterparty $i$ receives a credit rating $r_i$, $r_i \in \{1, 2, \dots, R\}$
o What can we do?
✓ Derive the relationship between the textual report and the rating
✓ Predict the rating for new counterparties or annual reports
• Textual data cannot be computed on directly
o Information transformation:
✓ Preprocess the $m$ textual documents $\{d_i\}_{i=1}^{m}$
✓ Represent each report in a new, computable (quantitative) form
[Diagram: transformation pipeline, (1) text preprocessing, (2) document representation; qualitative form → quantitative form]
19. Methodology
• Text Preprocessing
o Step 1: strip pictures, HTML tags, formatting, etc. from the raw text
✓ Only raw text is left
o Step 2: transform every letter into lowercase
✓ For instance, "Capital" becomes "capital"
o Step 3: remove numbers, special characters, and punctuation
o Step 4: tokenize sentences into words by splitting at the spaces
o Step 5: remove meaningless stop words such as "or", "and", and "the"
o Step 6: stem terms (words) back to their root form
20. Methodology
• Text Preprocessing
o Step 6: stem terms (words) back to their root form
✓ Stemming: remove "s" from "risks" or "ed" from "declined"
– Groups terms into the same semantic root
– However, not applicable to terms like "goes" and "went"
✓ Alternative: lemmatization
– Can overcome the limitations of stemming with higher precision
– Requires complex computation; unwieldy in practice
– Stemming is therefore still the chosen algorithm
o Result of text preprocessing
✓ A list of stemmed terms (see the sketch below)
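A minimal Python sketch of steps 2–6 (step 1, stripping pictures, HTML tags, and formatting, is assumed already done); the stop-word list is a tiny illustrative subset, and NLTK's PorterStemmer stands in for the paper's unnamed stemming algorithm.

# Steps 2-6 of the preprocessing pipeline on already-cleaned text.
import re
from nltk.stem import PorterStemmer

STOP_WORDS = {"or", "and", "the", "a", "of", "in", "to"}  # tiny subset
stemmer = PorterStemmer()

def preprocess(raw_text):
    text = raw_text.lower()                              # step 2: lowercase
    text = re.sub(r"[^a-z\s]", " ", text)                # step 3: drop numbers/punctuation
    tokens = text.split()                                # step 4: tokenize at spaces
    tokens = [t for t in tokens if t not in STOP_WORDS]  # step 5: stop words
    return [stemmer.stem(t) for t in tokens]             # step 6: stem to root form

print(preprocess("Risks declined in 2016, and profits improved."))
# -> ['risk', 'declin', 'profit', 'improv']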
21. Methodology
• Document representation
o In the list of stemmed terms
✓ Many terms occur more than once
✓ Compress them and make them interpretable
– Term-weighting: weight every unique term
22. Methodology
• Term-weighting
o Binary frequency: whether a term occurs (dummy variable)
✓ Too naive; ignores the true frequency
o Absolute or relative frequency: how many times a term occurs in $d_i$
✓ Ignores the distribution of a term over the different $d_i$
✓ E.g., if a term has the same distribution across all $d_i$, it is probably useless for predicting the rating score
o Term frequency-inverse document frequency (tf-idf)
✓ Down-weights terms that occur in many documents and up-weights rare, distinctive terms (smoothing); a sketch follows below
o All of the above ignore the sentiment of the words
✓ In finance, the sentiment of a report has a significant relationship with company performance
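A minimal sketch of tf-idf over preprocessed documents: the raw frequency $v_{l,i}$ is damped by how many of the $m$ documents contain term $l$, so ubiquitous terms lose weight and distinctive ones gain it. Several tf-idf variants exist; this is one common form, not necessarily the paper's.

# Manual tf-idf: weight = term frequency * log(m / document frequency).
import math
from collections import Counter

docs = [["risk", "declin", "risk"], ["profit", "improv"],
        ["risk", "profit"]]
m = len(docs)
df = Counter(t for d in docs for t in set(d))  # document frequency

def tf_idf(doc):
    tf = Counter(doc)
    return {t: f * math.log(m / df[t]) for t, f in tf.items()}

print(tf_idf(docs[0]))
# 'risk' occurs twice but appears in 2 of 3 docs, so it is damped:
# {'risk': 2*log(3/2) ~ 0.81, 'declin': 1*log(3/1) ~ 1.10}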
23. Methodology
• Term-weighting
o The weightings above ignore the sentiment of the words
o Sentiment-weighted frequency
✓ What if $w_{l,i} = v_{l,i} \times \text{sentiment}_l$, i.e., the raw frequency $v_{l,i}$ is scaled by the sentiment of term $l$?
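A minimal sketch of this sentiment-weighted frequency, assuming a dictionary that maps each term to $+1$ (positive), $-1$ (negative), or $0$ (neutral); the entries are illustrative.

# Sentiment-weighted frequency: w_{l,i} = v_{l,i} * sentiment_l.
from collections import Counter

SENTIMENT = {"profit": +1, "improv": +1, "loss": -1, "declin": -1}

def sentiment_weighted(doc_tokens):
    freq = Counter(doc_tokens)  # v_{l,i}
    return {t: v * SENTIMENT.get(t, 0) for t, v in freq.items()}

print(sentiment_weighted(["profit", "profit", "declin", "revenu"]))
# -> {'profit': 2, 'declin': -1, 'revenu': 0}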
24. Methodology
• Document representation
o In the list of stemmed terms
✓ Many terms occur more than once
✓ Compress them and make them interpretable
– Term-weighting: weight every unique term
✓ Reduce the number of terms to avoid overfitting and computational complexity
– Term selection: remove terms with low frequency
– Term selection: remove terms with low explanatory power via a chi-squared test
– Term extraction: topic model (LDA)
25. Methodology
• LDA (generative process)
o Sample a document $d_i$ with probability $p(d_i)$
o Sample from a Dirichlet distribution (hyperparameter $\alpha$) to generate the topic distribution $\theta_i$ for $d_i$
o Sample from $\theta_i$ to get topic $z_{i,j}$
o Topics are latent and unknown
o Sample from a Dirichlet distribution (hyperparameter $\beta$) to generate the word distribution $\phi_{z_{i,j}}$ for topic $z_{i,j}$
o Sample from $\phi_{z_{i,j}}$ to generate the final word $w_{i,j}$
o Hyperparameter: a parameter of a parameter
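A sketch of deriving the per-document topic weights $p(topic_h \mid d_i)$, assuming scikit-learn; note that sklearn fits LDA by variational inference rather than the Gibbs sampling mentioned above, and the number of topics ($H = 2$) is illustrative.

# Fit LDA and read off each document's topic distribution.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["revenue growth and strong profit",
        "credit losses and loan defaults",
        "profit growth despite minor losses"]

X = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topic_weights = lda.fit_transform(X)  # row i ~ p(topic_h | d_i)
print(topic_weights.round(2))         # one H-vector per document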
26. Methodology
• Document representation
o In the list of stemmed terms
✓ Many terms occur more than once
✓ Compress them and make them interpretable
– Term-weighting: weight every unique term
✓ Reduce the number of terms to avoid overfitting and computational complexity
– Term selection: remove terms with low frequency
– Term selection: remove terms with low explanatory power via a chi-squared test
– Term extraction: topic model (LDA)
o Results: topics $\{topic_h\}_{h=1}^{H}$ with topic weights $topic_{i,h} = p(topic_h \mid d_i)$
27. Methodology
• Document representation
o Results: topics $\{topic_h\}_{h=1}^{H}$ with topic weights $topic_{i,h} = p(topic_h \mid d_i)$
✓ Interpretation: $topic_{i,h}$ measures the significance of topic term $h$, just as the weight $w_{l,i}$ does for word term $l$
o Final data set: each document becomes a feature vector $x_i$ of the reduced word weights $\{w_{l,i}\}_{l=1}^{L}$ and the topic weights $\{topic_{i,h}\}_{h=1}^{H}$
✓ Together with the labels: $\{(x_i, r_i)\}_{i=1}^{m}$
[Diagram: qualitative form $d_i$ → quantitative form $w_{l,i}$]
29. Methodology
• Classification
o Support Vector Machine (SVM): what about a benchmark?
✓ Aim: the maximum-margin hyperplane
✓ The distance between the hyperplane and the nearest point $x$ from either group is maximized
✓ Reduces overfitting via L2-norm regularization
✓ Stable
✓ Lacks interpretability
o A combined sketch of SVM, NN, and DT follows the decision-tree slide
30. Methodology
• Classification
o Neural Networks (NN)
✓ Three layers: input, hidden (one or multiple), and output layer
✓ Before the hidden layer, an aggregation function: $\sum_{l=1}^{L} \beta_l w_{l,i}$
✓ In the hidden layer (and each subsequent layer), an activation function: $g(x) = 1$ if $x > 0$, $0$ if $x \le 0$
✓ Label rule: $r(x_{new}) = 1$ if $g(\Phi(x_{new})) > 0$, and $2$ otherwise
✓ Training: backpropagation propagates the prediction errors back to update the weights
31. Methodology
• Classification
o Decision Tree (DT)
✓ Aim: $p(r_i = k \mid d_i) \propto p(r_i = k)\, p(d_i \mid r_i = k)$
✓ Splitting criterion: chi-squared, Gini coefficient, or entropy-based
✓ Prone to overfitting
✓ Good interpretability
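A combined sketch of the three classifiers on toy tf-idf features, assuming scikit-learn; the hyperparameters are library defaults, not the paper's tuned settings.

# Train SVM, NN, and DT on the same toy document features.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

docs = ["strong profit growth", "severe losses and defaults",
        "stable revenue base", "impairments and declining sales"]
y = [1, 2, 1, 2]  # toy binary ratings

X = TfidfVectorizer().fit_transform(docs)
for clf in (LinearSVC(), MLPClassifier(max_iter=500, random_state=0),
            DecisionTreeClassifier(random_state=0)):
    clf.fit(X, y)
    print(type(clf).__name__, clf.predict(X))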
35. Model Development and Evaluation
• Dependent Variables
o 19 ratings or classes: $r_i$
o Rating bands
✓ $band_i = 1$ if $r_i \le 7$; $2$ if $7 < r_i \le 10$; $3$ if $10 < r_i \le 13$; $4$ if $r_i > 13$
✓ Less granular
o Binary dependent variable: $bin_i$
✓ $bin_i$ = investment-grade company if $r_i \le 10$; speculative-grade company if $r_i > 10$
✓ Comparable
o All three labelings are unbalanced (a mapping sketch follows below)
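A minimal sketch of the three dependent-variable definitions, mapping the 19-grade rating $r_i$ onto bands and a binary label exactly as defined above.

# Map a 19-grade rating onto the band and binary labelings.
def band(r):           # 4 rating bands, less granular
    if r <= 7:  return 1
    if r <= 10: return 2
    if r <= 13: return 3
    return 4

def binary(r):         # investment vs. speculative grade
    return "investment" if r <= 10 else "speculative"

for r in (3, 9, 12, 17):
    print(r, band(r), binary(r))
# -> 3 1 investment | 9 2 investment | 12 3 speculative | 17 4 speculative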
36. Model Development and Evaluation
• Dependent Variables
o In addition, intuitively: fewer classes, higher accuracy
✓ $P(\text{landing in the correct type}) > P(\text{landing in the correct sub-type})$
✓ Accuracy: binary ratings > rating bands > 19 ratings
37. Model Development and Evaluation
• Test Set Performance
o Train : test = 3 : 1
o Each classifier shown performs best in its corresponding column
o All are higher than random accuracy: $1/19 \approx 5.2\%$, $1/4 = 25\%$, $1/2 = 50\%$
o SVM is best for dataset 2, while DT and NN are best for dataset 1
38. Model Development and Evaluation
• Test Set Performance
o $Accuracy = \frac{TP+TN}{TP+TN+FP+FN}$: overall accuracy
o $Precision = \frac{TP}{TP+FP}$: the proportion of correctly classified points within a predicted (generated) class
o $Recall = \frac{TP}{TP+FN}$: the proportion of correctly classified points within a true class
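A worked example of the three measures from a binary confusion matrix; the counts are illustrative.

# Accuracy, precision, and recall from illustrative confusion counts.
TP, TN, FP, FN = 40, 30, 20, 10

accuracy  = (TP + TN) / (TP + TN + FP + FN)  # 70 / 100 = 0.70
precision = TP / (TP + FP)                   # 40 / 60  ~ 0.67
recall    = TP / (TP + FN)                   # 40 / 50  = 0.80
print(accuracy, precision, recall)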
39. Model Development and Evaluation
• Test Set Performance
o $Acc = \frac{TP+TN}{TP+TN+FP+FN}$; $Pre = \frac{TP}{TP+FP}$; $Recall = \frac{TP}{TP+FN}$
o Thought: a small class tends to have low precision
o Thought: some classifiers sacrifice the precision and recall of a small class in order to pursue high accuracy or high precision on a big class (illustrated in the sketch below)
o Thought: understand the class distribution before choosing and developing a classifier
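A toy illustration of the imbalance point: a degenerate classifier that always predicts the big class scores high accuracy while its recall on the small class collapses to zero.

# Majority-class prediction on a 95/5 imbalanced label set.
y_true = [1]*95 + [2]*5        # 95% big class, 5% small class
y_pred = [1]*100               # degenerate "majority" classifier

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall_small = sum(t == p == 2 for t, p in zip(y_true, y_pred)) / 5
print(accuracy, recall_small)  # 0.95 accuracy, 0.0 recall on class 2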
42. Model Development and Evaluation
• Model Results
o How to interpret the results?
✓ Compare accuracy with and without sentiment weighting
43. Model Development and Evaluation
• Model Results
o The effect of considering the sentiment of the terms depends on the classifier, the dataset, and the type of dependent variable
o NN, DT, and SVM work best
o STM performs poorly, which is surprising
✓ Good interpretability
✓ Potential reason given by the paper:
– Should use linear regression (LR) rather than a generalized linear model (GLM)
✓ My thoughts:
– GLM may work better, since it makes fewer assumptions than LR
– So why does it perform poorly? In the generative process of STM, the response variable (class) is assumed to be produced by, or sampled from, the topics; the true ratings may not be produced this way
– We only know that sentiment is related to the rating, but that does not mean sentiment produces the rating
44. Conclusion
• RatingBot predicts the rating score
• Limitations
o The credibility of the rating
✓ Consider a time component to capture companies' manipulation of annual reports to obtain a higher rating
o Try other classifiers, such as deep learning
o Include other text sources, such as news and social-media content