[2018 Taiwan AI Academy Alumni Annual Meeting] Textual Data Analytics in Finance / Chuan-Ju Wang (王釧茹)

Chuan-Ju Wang (王釧茹) / Assistant Research Fellow, Research Center for Information Technology Innovation, Academia Sinica

Published in: Data & Analytics

  1. Textual Data Analytics in Finance
     Talk @ Taiwan AI Academy, November 17, 2018
     Dr. Chuan-Ju Wang (王釧茹), Research Center for Information Technology Innovation, Academia Sinica
     Computational Finance and Data Analytics Laboratory (CFDA Lab), http://cfda.csie.org
  2. Quant / Data Scientist
     Chuan-Ju Wang (CITI, AS), Talk @ Taiwan AI Academy, November 17, 2018
     Source: http://www.indeed.com/jobtrends
     Source: http://www.computerweekly.com/blogs/Data-Matters/2014/06/data-scientist-the-new-quant.html
  3. Data Science in Finance
  4. Text Analytics
     ❖ Big Data
       ❖ Structured data: user logs, sensor logs, click-through logs, …
       ❖ Unstructured data: web texts, user conversations, public opinions, reports, …
     ❖ Big Data for Unstructured Text: Text Analytics
       ❖ Goal: turn text into data for analysis by applying natural language processing (NLP) and analytical methods.
     Source: https://insidebigdata.com/2015/06/05/text-analytics-the-next-generation-of-big-data/
  5. Textual Sentiment Analysis for Financial Risk Prediction
     On the Risk Prediction and Analysis of Soft Information in Finance Reports. European Journal of Operational Research (EJOR), 257(1), 243-250, 2017.
  6. Soft and Hard Information in Finance
     ❖ As the volume of financial data grows, it becomes increasingly important to learn how to extract valuable information for financial applications.
     ❖ Finance typically distinguishes two kinds of information:
       ❖ Soft information: text, including opinions, ideas, and market commentary.
       ❖ Hard information: numerical values, such as financial measures and historical prices.
     ❖ Our work aims to exploit soft information for financial risk prediction.
  7. Risk Proxy: Stock Return Volatility
     ❖ Stock return: $R_t = \frac{S_t - S_{t-1}}{S_{t-1}}$, where $S_t$ is the stock price at time $t$.
     ❖ Stock return volatility: a common risk metric, measured by the standard deviation of returns over a period of time:
       $v_{[t-n,t]} = \sqrt{\frac{\sum_{i=t-n}^{t} (R_i - \bar{R})^2}{n}}$, where $\bar{R} = \frac{\sum_{i=t-n}^{t} R_i}{n+1}$.
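
The return and volatility definitions on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes the window holds the n + 1 returns R_{t-n}, ..., R_t, that the mean divides by n + 1, and that the squared deviations divide by n under a square root (the sample standard deviation). The function names and the price series are hypothetical.

```python
import math


def simple_returns(prices):
    """Return R_t = (S_t - S_{t-1}) / S_{t-1} for consecutive prices."""
    return [
        (prices[i] - prices[i - 1]) / prices[i - 1]
        for i in range(1, len(prices))
    ]


def volatility(returns):
    """Sample standard deviation of the n + 1 returns in the window.

    The mean divides by n + 1 (the number of returns); the sum of
    squared deviations divides by n, matching the slide's formula.
    """
    n = len(returns) - 1
    if n < 1:
        raise ValueError("need at least two returns")
    mean = sum(returns) / (n + 1)
    return math.sqrt(sum((r - mean) ** 2 for r in returns) / n)


# Hypothetical daily closing prices, for illustration only.
prices = [100.0, 110.0, 99.0, 108.9]
returns = simple_returns(prices)
print(returns)
print(volatility(returns))
```
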
  8. Financial Sentiment Analysis
     ❖ In this work, we apply sentiment analysis to the risk prediction task.
     ❖ A finance-specific sentiment lexicon is adopted for the analysis.
     ❖ Two machine learning techniques are adopted for the task:
       ❖ Regression approach: predict the stock return volatilities.
       ❖ Ranking approach: rank the companies in line with their relative risk levels.
  9. Financial Sentiment Lexicon
     ❖ Words often carry different meanings in the finance domain than in general usage, for example:
       ❖ vice: immoral or wicked behavior (general usage)
       ❖ vice: secondary (finance context)
     ❖ Almost three-fourths of the words in 10-K financial reports from 1994 to 2008 that the widely used Harvard Psychosociological Dictionary identifies as negative are typically not considered negative in financial contexts.
  10. Six Finance-Specific Lexicons
      ❖ Loughran and McDonald (2011). When Is a Liability Not a Liability? Textual Analysis, Dictionaries, and 10-Ks. Journal of Finance.
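
Lexicon-based features of the kind described in these slides can be sketched as a per-category word count. This is only an illustration under stated assumptions: `TOY_LEXICON` is a hypothetical three-category stand-in with a handful of words, not the actual Loughran and McDonald word lists, and the regex tokenizer is a simplification (the slides show stemmed terms, which this sketch does not reproduce).

```python
import re

# Hypothetical toy lexicon for illustration only; the real
# finance-specific lexicons are far larger.
TOY_LEXICON = {
    "Fin-Neg": {"deficit", "default", "delist"},
    "Fin-Pos": {"profit"},
    "Fin-Unc": {"doubt"},
}


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def sentiment_counts(text, lexicon=TOY_LEXICON):
    """Count how many tokens of `text` fall in each lexicon category."""
    tokens = tokenize(text)
    return {
        category: sum(1 for tok in tokens if tok in words)
        for category, words in lexicon.items()
    }


sample = "The deficit widened; a default would raise doubt. Default risk."
print(sentiment_counts(sample))
```

Counts like these could serve as the "financial sentiment words" features, in contrast to using all words of a report.
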
  11. Problem Formulation
      ❖ Prediction target: future stock return volatility (regression) and future relative risk level (ranking)
      ❖ Features
        ❖ Soft textual information: all words or financial sentiment words
        ❖ Hard numerical information: each company's volatility over the twelve months before the report, v(-12)
      [Timeline: v(-12) spans 2005/3/22 to 2006/3/22 (report filing date); v(+12) spans 2006/3/22 to 2007/3/22.]
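
The v(-12) feature and v(+12) target can be sketched by splitting dated returns around the filing date. This is a hedged illustration, not the authors' procedure: the helper names are hypothetical, and the boundary handling (the filing date itself belongs to neither window; each window covers one calendar year) is an assumption the slides do not specify.

```python
import math
from datetime import date


def shift_years(d, years):
    """Shift a date by whole years, mapping Feb 29 to Feb 28 if needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def split_windows(dated_returns, filing_date):
    """Split (date, return) pairs into the 12 months before and after filing.

    Assumption: the filing date itself is excluded from both windows.
    """
    start = shift_years(filing_date, -1)
    end = shift_years(filing_date, 1)
    before = [r for d, r in dated_returns if start <= d < filing_date]
    after = [r for d, r in dated_returns if filing_date < d <= end]
    return before, after


def volatility(returns):
    """Sample standard deviation of the returns in a window."""
    n = len(returns) - 1
    mean = sum(returns) / (n + 1)
    return math.sqrt(sum((r - mean) ** 2 for r in returns) / n)


# Hypothetical returns around the filing date shown on the slide.
data = [
    (date(2005, 6, 1), 0.01),
    (date(2005, 12, 1), -0.02),
    (date(2006, 3, 21), 0.03),
    (date(2006, 9, 1), 0.02),
    (date(2007, 3, 22), -0.01),
]
before, after = split_windows(data, date(2006, 3, 22))
print("v(-12) =", volatility(before))
print("v(+12) =", volatility(after))
```
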
  12. Corpora: The 10-K Corpus
      ❖ A Form 10-K is an annual report required by the U.S. Securities and Exchange Commission (SEC).
      ❖ Only Section 7, “Management’s Discussion and Analysis of Financial Condition and Results of Operations” (MD&A), is used.
      ❖ The Sarbanes-Oxley Act of 2002 explains the drastic increase in report length during the 2002-2003 period.
  13. Experimental Results
  14. Financial Sentiment Terms Analysis
      [Figure: stemmed terms such as amend, deficit, forbear, delist, and default, with category labels Fin-Neg, Fin-Pos, Fin-Lit, Fin-Unc, and Non, legend entries SEN and ORG, and axis ticks 1 to 5.]
      Stems and the word forms they cover:
      ❖ deficit: deficit, deficits
      ❖ default: default, defaulted, defaulting, defaults
      ❖ delist: delist, delisted, delisting, delists
      ❖ amend: amend, amendable, amendatory, amended, amending, amendment, amendments, amends
      ❖ forbear: forbear, forbearance, forbearances, forbearing, forbears
  15. FIN10K Prototype Demo
      https://cfda.csie.org/10K/
      FIN10K: A Web-based Information System for Financial Report Analysis and Visualization. ACM CIKM (Demo paper), 2016.
  16. Financial Keyword Expansion via Continuous Word Vector Representations
      Discovering Finance Keywords via Continuous Space Language Models. ACM Transactions on Management Information Systems, 7(3), 7:1-7:17, 2016.
  17. Sentiment Analysis: the Lexicon
      ❖ For sentiment analysis, the lexicon is one of the most important and common resources.
        ❖ It usually has a great impact on the results and the corresponding analyses.
      ❖ In finance, the lexicon is usually generated semi-manually.
        ❖ This results in inadequate word coverage.
      ❖ In this work, we attempt to use advanced continuous space language models to expand finance keywords automatically.
  18. Continuous Space Language Models
      ❖ “You shall know a word by the company it keeps” (J. R. Firth, 1957)
      ❖ One of the most successful ideas of modern statistical NLP!
  19. Continuous Space Language Models
      ❖ Continuous space language models
        ❖ Also known as continuous word embeddings
        ❖ Words are represented as low-dimensional dense vectors.
        ❖ Recent studies show their superiority in capturing syntactic and contextual regularities in language.
  20. Keyword Expansion
      ❖ Our proposed keyword expansion method
        ❖ Adapt this technique to incorporate syntactic information, capturing more keywords with similar meanings.
        ❖ Learn vector representations of words from a large, domain-specific collection of financial reports.
        ❖ Use the words in the financial sentiment lexicon as seed words and retrieve, for each seed, the top N words ranked by cosine similarity.
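
The seed-word expansion step can be sketched as a nearest-neighbor lookup by cosine similarity. This is a minimal sketch under stated assumptions: the two-dimensional `toy_vectors` are made up for illustration (real embeddings would be learned from the financial reports), the function names are hypothetical, and the syntactic-information extension from the slides is not modeled.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def expand_keywords(seeds, vectors, top_n):
    """For each seed word, return its top_n most similar other words."""
    expanded = {}
    for seed in seeds:
        if seed not in vectors:
            continue  # seed has no learned vector; skip it
        scored = [
            (cosine_similarity(vectors[seed], vec), word)
            for word, vec in vectors.items()
            if word != seed
        ]
        scored.sort(reverse=True)
        expanded[seed] = [word for _, word in scored[:top_n]]
    return expanded


# Made-up 2-d vectors, for illustration only.
toy_vectors = {
    "default": [1.0, 0.0],
    "breach": [0.9, 0.1],
    "violation": [0.8, 0.3],
    "profit": [0.0, 1.0],
}
print(expand_keywords(["default"], toy_vectors, top_n=2))
```

Ranking by similarity rather than distance keeps the closest words at the top of the list; with cosine distance defined as one minus similarity, the two orderings select the same N words.
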
  21. Keyword Expansion
      ❖ Keyword Expansion with Syntactic Information
  22. The New 10-K Corpus
  23. Four Prediction Tasks
      ❖ Four prediction tasks are conducted to demonstrate that our approach is effective at discovering keywords with predictive power:
        1) Post-event volatility
        2) Stock volatility
        3) Abnormal trading volume
        4) Excess returns
  24. Post-event Volatility Prediction
  25. FIN10K Prototype Demo
      https://cfda.csie.org/10K/
      FIN10K: A Web-based Information System for Financial Report Analysis and Visualization. ACM CIKM (Demo paper), 2016.
  26. Beyond Word-Level Analysis
      ❖ Multi-word expression detection and analysis
      ❖ Beyond word-level to sentence-level sentiment analysis for financial reports
        ❖ RiskFinder: A Sentence-level Risk Detector for Financial Reports, NAACL’18, https://cfda.csie.org/RiskFinder/
        ❖ FRIDAYS: A Financial Risk Information Detecting and Analyzing System, AAAI’18, https://cfda.csie.org/FRIDAYS/
  27. Summary
      ❖ If structured data is big, then unstructured data is huge.
        ❖ 20% (structured) vs. 80% (unstructured)
      ❖ There is massive potential waiting to be leveraged in the analysis of unstructured data in the field of finance.
  28. Thank you for listening!
