SCHOOL OF ENGINEERING
COMPUTER ENGINEERING & INFORMATICS
DEPARTMENT
Stock Market Analysis using Data Mining and
Machine Learning Algorithms
DIPLOMA THESIS
Grivas G. Panagiotis
griva@ceid.upatras.gr
Advisor: Professor Vasileios Megalooikonomou
Patra, September 2014
Abstract
The huge volume of economic data available today has created the need for technical
analysis and information processing that help investors make sound decisions. The
subject of this Diploma Thesis is the extraction of useful information from
financial data. For the purposes of this work we took historical daily data from the
S&P 500 index. The basic data mining tasks studied are the following:
Preprocessing, Extraction of Technical Features, Clustering, Classification, Lag
Correlation and Forecasting. The thesis is organized into seven chapters.
The first chapter is introductory and states the aim and motivation of this
thesis. The second chapter presents the basic market analysis techniques, which use
graphs and indicators. The third chapter examines the data mining methods and
learning algorithms, which aim at discovering patterns in the data and constructing
useful models that fit the characteristics under study. The fourth chapter presents
how data mining techniques are applied to the analysis of shares, highlighting the
importance of each data mining algorithm for the stock market. The fifth chapter
describes the environments, Matlab and Weka, in which we run the data mining
algorithms to analyze stock market data.
The sixth chapter describes the experimental procedure of the present work. In
the first section of the chapter, preprocessing techniques are applied to improve
the quality of the share data by removing errors and incorrect attribute values.
The second section examines the clustering problem, where the k-Means and
Hierarchical algorithms are used to detect 'similar' shares. First we evaluate the
performance of the Hierarchical Clustering algorithm with the Euclidean and Dynamic
Time Warping (DTW) distance measures, for various types of linkage between clusters.
Then we evaluate the performance of the k-Means and Hierarchical (with the Ward
linkage criterion) clustering algorithms for various numbers of clusters. Finally we
apply the clustering algorithms for a fixed number of clusters and assess the quality
of the resulting clusters using the intra/inter-cluster distance and the Silhouette
value. The third
section applies the k-Nearest Neighbors classification algorithm so that each new
stock entering the stock market is assigned to one of the predefined groups obtained
through clustering. The classification method is then evaluated by checking whether
the shares are assigned to the appropriate class. In the fourth section we use the
Pearson correlation coefficient to find lag correlation between shares. First we
detect shares with a positive or inverse temporal association at non-zero delay, and
examine whether these shares belong to the same or different classes, as defined at
the outset by the Hierarchical-DTW clustering process. We also identify the shares
with positive or inverse correlation at zero delay. Finally we apply the lag
correlation algorithm and check for correlation between stocks not only over their
entire length, but also over a window of given length that starts at a specified
time. In the fifth section we
apply forecasting algorithms to a set of stocks, constructing a suitable prediction
model (using the first 225 closing values as the training set) that forecasts the
last 20 closing values of each share. The forecasting methods applied are the
following: the statistical technique ARIMA, Artificial Neural Networks (Multilayer
Perceptron), Decision Trees (M5P Tree), Support Vector Machines (SMOreg), Linear
Regression and Instance-Based Learning algorithms (k-Nearest Neighbors). Finally
we evaluate the performance of the forecasting algorithms using both the mean
absolute percentage error (MAPE) between actual and predicted values and the
prediction accuracy regarding the investment reliability of the shares over a
20-day term (Trend Prediction).
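
Two of the measures named in the sixth chapter can be stated precisely in a few lines: the Pearson coefficient evaluated at a delay, and MAPE. The following is a minimal Python sketch for illustration only. The thesis itself works in Matlab and Weka, and its code is not reproduced here. The function names and the convention that a positive lag pairs x[t] with y[t + lag] are assumptions, not taken from the thesis.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("sequences must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def lagged_pearson(x, y, lag):
    """Correlate x[t] with y[t + lag] for lag >= 0 (assumed convention).

    A value near +1 suggests y follows x after `lag` steps;
    a value near -1 suggests an inverse relation at that delay.
    """
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(x, y)
    return pearson(x[:-lag], y[lag:])


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes no actual value is zero, which holds for closing prices.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)
```

In this sketch, scanning `lag` over a small range and keeping the delays where the absolute coefficient is high gives candidate lag-correlated pairs. Restricting both sequences to a slice before the call corresponds to the windowed variant mentioned in the fourth section.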
The seventh chapter presents both the conclusions reached from the experiments
and future extensions that could be applied to the Financial Data Mining models
we constructed.
