Natural language processing for
annual report in Australia to predict
the stock trend
Yumu Yang & Haoxuan Zhang & Yunfei Zhang
Overview
Introduction & Background
Data Preparation
Text Processing
Model Training
All in One System Design
Conclusion & Future research
Project Background
Stock Prediction Methods:
• Human experience
• Analyze the stock trend
• Analyze stock market news
Shortcomings of Traditional Stock Prediction:
• Affected by many factors
• Difficult to predict
• Relies heavily on experience
Project Flow Chart
Part 2: Data preparation
Annual Report Crawling
• Industry sector: Energy
• Crawl annual reports from more than 200 companies
• Most recent 5 years
Annual report
• Crawl annual reports from the
http://datanalysis.morningstar.com.au website
Sample of annual report
Annual Report Pre-processing
• Convert .pdf files to .txt files using
OCR (optical character recognition)
technology
Australian Securities Exchange
(ASX)
• One of the world’s leading financial market exchanges
• Biggest exchange company in Australia
Stock Price Crawling
• 5 years of prices for energy companies
• Collect prices for the 5 days after each annual
report is released
• Yahoo Finance API
• Clean data into -1/0/1 labels
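The cleaning step above can be sketched as follows. This is only an illustration: the slides do not say how the -1/0/1 values are derived, so the mapping (down / flat / up), the `label_trend` helper, the 1% `threshold`, and the sample prices are all assumptions.

```python
def label_trend(prices, threshold=0.01):
    """Map a post-release price window to -1 (down), 0 (flat) or 1 (up).

    The threshold and the first-to-last comparison are assumed rules,
    not taken from the original project.
    """
    if len(prices) < 2 or prices[0] <= 0:
        raise ValueError("need at least two prices and a positive first price")
    # Relative change between the first and the last price in the window.
    change = (prices[-1] - prices[0]) / prices[0]
    if change > threshold:
        return 1
    if change < -threshold:
        return -1
    return 0


# Hypothetical closing prices for the 5 days after a report is released.
window = [10.0, 10.1, 10.3, 10.2, 10.5]
print(label_trend(window))
```

A band around zero (rather than a strict sign test) keeps small day-to-day noise from being labelled as a real move; the width of that band would need tuning on real data.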
Part 3: Text Processing
Keywords processing
In this part, I am in charge of keyword
processing. To make the annual reports usable
in later steps, we need to convert the file format
and load the files into a Python 3 environment for
processing.
Artificial Intelligence vs. Human Being
An annual report summarizes a company's year, and it is extremely demanding for a human
to read because it contains a large amount of information. We therefore chose to process
the reports by machine, which is easier and far less exhausting.
Format Conversion
TextRank4
About TextRank4
This is a Python implementation of TextRank for automatic keyword
and sentence extraction (summarization), available on GitHub. However,
this implementation uses Levenshtein distance as the relation
between text units.
• 100-word summary
• The number of keywords extracted is relative to the size of the text
(a third of the number of nodes in the graph)
• Adjacent keywords in the text are concatenated into keyphrases
Python Code
Sample Outputs
Fit the Model
We will use Azure for our modelling
process, and our data should be formatted as:
Keyword1, Keyword2, Keyword3. The expected
output should look like the image shown below.
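As a rough illustration of the Keyword1, Keyword2, Keyword3 layout, the sketch below writes one comma-separated row of keywords per report. The `keywords_to_csv` helper, the header row, the padding of short rows, and the sample keywords are assumptions; the slides do not show the exact file that was uploaded to Azure.

```python
import csv
import io


def keywords_to_csv(reports, n_keywords=3):
    """Return CSV text with one row of keywords per report."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Header: Keyword1, Keyword2, ..., KeywordN
    writer.writerow([f"Keyword{i}" for i in range(1, n_keywords + 1)])
    for keywords in reports:
        row = list(keywords[:n_keywords])
        # Pad short rows with empty cells so every row has the same width.
        row += [""] * (n_keywords - len(row))
        writer.writerow(row)
    return buffer.getvalue()


# Hypothetical keywords extracted from two annual reports.
print(keywords_to_csv([["oil", "gas", "revenue"], ["coal"]]))
```

Using the `csv` module instead of joining strings by hand means keyphrases that themselves contain commas are quoted correctly.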
Final Output
Part 4: Model Design
(real-time demo with SVM)
Part 5: All in One System
Automatic System on Azure
Conclusion & Future Research
1. Insufficient data volume: only around 200
companies in the ASX energy sector
2. Implement into a system: Python package for
Azure
3. Report forms
Thanks!


Editor's Notes

  1. First of all, our companies' annual reports are in PDF format, so to use them in our analysis we need to convert the PDF reports into text files. In this project, we chose small PDF to do this job. After that, we load the results into a Jupyter notebook for preprocessing. The TextRank implementation splits the original text into sentences, filters out the stop words in each sentence, and retains only the words with the specified parts of speech. The result is a collection of sentences and a collection of words. In our project, we fed all of the companies' annual reports into it and obtained the keywords with their correlation and a summary of each report.
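The preprocessing described in this note can be sketched roughly as below. The regular expressions, the tiny `STOP_WORDS` set, and the function names are assumptions for illustration only; part-of-speech filtering is left out because it needs a tagger that the notes do not name.

```python
import re

# A deliberately tiny, illustrative stop-word list (an assumption).
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "was"}


def split_sentences(text):
    """Split text into sentences at '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def content_words(sentence):
    """Lower-case the sentence and keep alphabetic words that are not stop words."""
    words = re.findall(r"[A-Za-z]+", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]


# Hypothetical report text.
text = "The revenue of the company rose. Costs fell!"
for sentence in split_sentences(text):
    print(sentence, content_words(sentence))
```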