The stock market is an essential part of today's financial markets. We believe that predicting stock price volatility is a meaningful and attractive research problem, which is why we chose this topic. Stock price prediction raises several problems once we reach the methodology stage. The project was a genuinely challenging task for us, and any workable solution would be a significant contribution to the finance and investment sector. Although we know that perfectly accurate prediction is not possible, our project aims to provide a useful strategy focused solely on the Australian market and Australian companies.
3. Project Background
Stock Prediction Methods:
• Human Experience
• Analyze the stock trend
• Analyze the news of stock market
Shortcomings of Traditional Stock Prediction:
• Affected by many factors
• Difficult to predict
• Highly based on experience
11. Stock Price Crawling
• 5 years of data for energy companies
• Collect 5 days of prices after each annual report is released
• Yahoo Finance API
• Clean data into -1/0/1 labels
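The cleaning step could be sketched as follows. This is only an illustration under stated assumptions: the slide does not define the labels, so the sketch assumes 1 means the price rose over the 5-day window, -1 means it fell, and 0 means it stayed roughly flat. The function name `label_price_move` and the 1% `threshold` are hypothetical and not taken from the project.

```python
from typing import Sequence


def label_price_move(prices: Sequence[float], threshold: float = 0.01) -> int:
    """Map a window of closing prices to -1, 0 or 1.

    Assumption (not from the slides): the label compares the last price
    in the window with the first one, and moves smaller than `threshold`
    (as a fraction of the first price) count as flat.
    """
    if len(prices) < 2:
        raise ValueError("need at least two prices to measure a move")
    start, end = prices[0], prices[-1]
    if start <= 0:
        raise ValueError("prices must be positive")

    change = (end - start) / start  # relative change over the window
    if change > threshold:
        return 1    # assumed meaning: price rose
    if change < -threshold:
        return -1   # assumed meaning: price fell
    return 0        # assumed meaning: roughly unchanged


# Example: five closing prices after a report release
print(label_price_move([10.0, 10.2, 10.5, 10.4, 10.6]))
```

A fixed threshold keeps small day-to-day noise from being labelled as a rise or a fall; a different cut-off may suit the actual data better.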
13. Keyword Processing
In this part, I am in charge of keyword processing. To make the annual reports usable later, we need to convert the file format and load the result into a Python 3 environment for processing.
14. Artificial Intelligence VS Human Being
An annual report summarizes a company's year, and it is extremely complex for a human to read because it contains a large amount of information. We therefore chose to process the reports by machine, which is easier and far less exhausting.
16. About TextRank4
This is a Python implementation of TextRank for automatic keyword and sentence extraction (summarization), as found on GitHub. However, this implementation uses Levenshtein distance as the relation between text units. Its main features are:
• 100-word summary
• The number of keywords extracted is relative to the size of the text (a third of the number of nodes in the graph)
• Adjacent keywords in the text are concatenated into keyphrases
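To make the idea concrete, here is a minimal sketch of ranking words with a TextRank-style graph whose edge weights are Levenshtein distances. It is not the package's own code: the function names (`levenshtein`, `rank_units`, `top_keywords`), the damping factor, and the iteration count are assumptions chosen for illustration. Only the "keep a third of the nodes" rule comes from the description above.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def rank_units(units, damping=0.85, iterations=50):
    """Weighted PageRank over a complete graph of unique text units."""
    units = list(dict.fromkeys(units))  # drop repeats, keep order
    n = len(units)
    if n == 0:
        return []
    weights = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        w = float(levenshtein(units[i], units[j]))
        weights[i][j] = weights[j][i] = w
    out_sum = [sum(row) for row in weights]

    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            incoming = sum(weights[j][i] / out_sum[j] * scores[j]
                           for j in range(n) if out_sum[j] > 0)
            updated.append((1 - damping) / n + damping * incoming)
        scores = updated
    return sorted(zip(units, scores), key=lambda pair: pair[1], reverse=True)


def top_keywords(words):
    """Keep the top third of the ranked nodes (at least one)."""
    ranked = rank_units(words)
    keep = max(1, len(ranked) // 3) if ranked else 0
    return [word for word, _ in ranked[:keep]]


print(top_keywords(["energy", "gas", "oil", "profit", "revenue", "dividend"]))
```

The same `rank_units` function could be applied to whole sentences instead of words to pick sentences for a summary.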
19. Fit the model
We will use Azure for our modelling process, and our data format should be: Keyword1, Keyword2, Keyword3. The expected output should look like the image shown below.
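One way to produce rows in the Keyword1, Keyword2, Keyword3 layout is sketched here. The `Label` column, the function name `rows_to_csv`, and the sample values are assumptions for illustration. The slides do not specify how the -1/0/1 label is attached or how the file is uploaded to Azure.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize (keywords, label) pairs as CSV text.

    Assumption: each row holds exactly three keywords followed by the
    price label; the real project layout may differ.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Keyword1", "Keyword2", "Keyword3", "Label"])
    for keywords, label in rows:
        if len(keywords) != 3:
            raise ValueError("expected exactly three keywords per row")
        writer.writerow([*keywords, label])
    return buffer.getvalue()


# Hypothetical sample rows, not project data
print(rows_to_csv([(["energy", "gas", "oil"], 1),
                   (["loss", "debt", "impairment"], -1)]))
```

Using the `csv` module rather than joining strings by hand keeps keyphrases that contain commas correctly quoted.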
24. Conclusion & Future Research
1. Insufficient Data Volume: Around 200 companies in the ASX Energy Area
2. Implement into System: Python Package for Azure
3. Report forms
First, the companies' annual reports are in PDF format. To use them in our analysis, we need to convert the PDF reports into text files. In this project, we chose Smallpdf for this job. We then loaded the results into a Jupyter notebook for preprocessing.
The original text is split into sentences, stop words are filtered out of each sentence, and only words of the specified parts of speech are retained. The result is a collection of sentences and a collection of words. In our project, we fed in all of the companies' annual reports and obtained the keywords, their correlations, and a summary of each report.
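The splitting and filtering steps might look roughly like the sketch below. It uses only the standard library. The stop-word list is a tiny illustrative sample, the function names are hypothetical, and part-of-speech filtering is left out because it needs a tagger that is not named here.

```python
import re

# Tiny illustrative stop-word list; a real run would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "our"}


def split_sentences(text):
    """Split on whitespace that follows ., ! or ? (a simple heuristic)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def content_words(sentence, stop_words=STOP_WORDS):
    """Lower-case the sentence, tokenize it, and drop stop words."""
    tokens = re.findall(r"[A-Za-z']+", sentence.lower())
    return [token for token in tokens if token not in stop_words]


def preprocess(text):
    """Return the sentence collection and the per-sentence word lists."""
    sentences = split_sentences(text)
    words = [content_words(sentence) for sentence in sentences]
    return sentences, words


sentences, words = preprocess("Revenue rose in the year. Costs fell.")
print(sentences)
print(words)
```

The two return values correspond to the collection of sentences and the collection of words described above.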