Predicting IBM stock prices using machine learning
CC282 Introduction to Machine Learning - Autumn 2008
Dr. R. Palaniappan
Department of Computing and Electronic Systems
University of Essex
Solution to exercise set 1
This set covers material from Lecture 1
________________________________________________________________________________________
Question 1
Make the necessary design choices for E and P for the following tasks T:
Remember that:
Task T: The problem to be solved
Performance P: How well the problem is solved
Experience E: Data presented to the ML system
Also, notice that there can be several possible answers for the given tasks.
a) Face recognition using a digital camera.
As T is to recognize faces seen through a digital camera, the algorithm must learn to
associate many images with the corresponding person's name or identification number.
P: An obvious overall performance measure is the percentage of correct face recognition.
We can also extend this to include a measure of false positives, e.g., when the algorithm
thinks it recognizes a person's face according to the training data, but in fact the face
shown belongs to nobody included in the training data set.
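The two measures described above can be sketched in a few lines. This is only an illustration: the label "unknown" for faces outside the training set, the function name and the sample data are all assumptions made for this example.

```python
# Hypothetical convention: person IDs are strings and UNKNOWN marks a face
# that belongs to nobody in the training data set.
UNKNOWN = "unknown"

def recognition_metrics(true_ids, predicted_ids):
    """Return (accuracy, false_accept_rate) for a face-recognition test set."""
    if not true_ids or len(true_ids) != len(predicted_ids):
        raise ValueError("need two non-empty lists of equal length")
    pairs = list(zip(true_ids, predicted_ids))
    # Overall performance: fraction of faces given the correct label.
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    # False accept: the face is unknown, yet the algorithm returned a known ID.
    unknown_predictions = [p for t, p in pairs if t == UNKNOWN]
    if unknown_predictions:
        far = sum(p != UNKNOWN for p in unknown_predictions) / len(unknown_predictions)
    else:
        far = 0.0
    return accuracy, far

# Made-up test set: two unknown faces, one of them wrongly accepted as "bob".
true_ids = ["alice", "bob", UNKNOWN, UNKNOWN, "alice"]
predicted = ["alice", "alice", UNKNOWN, "bob", "alice"]
accuracy, far = recognition_metrics(true_ids, predicted)
```

Reporting the two numbers separately matters for the building-entrance scenario, where a false accept is far more costly than an ordinary misrecognition.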
E: What kind of data will be shown and how they will be presented to the algorithm is not
always trivial. To make the algorithm robust and reliable, we want to include the following
data in the training set:
1. As many face images as possible (as inputs) along with the corresponding face
IDs. This may include images obtained with different camera angles and lighting
variations, people with and without glasses, etc.
2. Many face images that will be associated with a `nobody I know' output or
similar. This is very important if, for example, the algorithm will be used to allow
entrance to a building. We do not want the algorithm to wrongly think an
unknown face belongs to someone who might be allowed into the building.
3. Images of objects that resemble real faces but that are not faces at all. These
may include random objects, dummy faces, cartoon drawings, etc. These too
should be associated with a `nobody I know' output.
b) Deciding whether to buy or to sell IBM shares in the stock market.
P: A possible performance measure would quantify how often the algorithm made the
`right decision'. The `right decision' usually means: i) selling right before share prices
begin to drop OR ii) buying right before share prices increase. Buying when prices are
stable may also indicate good performance in some cases (e.g., when prices are not
predicted to drop too soon AND there is a money surplus to buy more shares). However,
selling when prices are stable usually indicates poor performance UNLESS the investor is
in urgent need of money. Another alternative is to have P as a measure of how much
money has been lost or won in previous transactions.
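As a rough sketch of the two alternatives for P, the code below scores a sequence of decisions first by how often they were `right' and then by the money won or lost. The one-share-per-decision trading scheme, the function names and the prices are assumptions chosen for illustration only.

```python
def decision_hit_rate(decisions, prices):
    """Fraction of decisions followed by a favourable price move.

    decisions[i] is taken at prices[i] and judged against prices[i + 1],
    so prices must hold one more value than decisions.
    """
    if not decisions or len(prices) != len(decisions) + 1:
        raise ValueError("need len(prices) == len(decisions) + 1")
    hits = 0
    for i, decision in enumerate(decisions):
        change = prices[i + 1] - prices[i]
        # Right decision: buy just before a rise, sell just before a drop.
        if (decision == "buy" and change > 0) or (decision == "sell" and change < 0):
            hits += 1
    return hits / len(decisions)

def net_gain(decisions, prices):
    """Money won or lost when each decision trades one share (assumed scheme).

    Shares still held at the end are valued at the final price.
    """
    cash, shares = 0.0, 0
    for decision, price in zip(decisions, prices):
        if decision == "buy":
            cash -= price
            shares += 1
        elif decision == "sell" and shares > 0:
            cash += price
            shares -= 1
    return cash + shares * prices[-1]

# Made-up prices and decisions.
prices = [100, 102, 101, 105]
decisions = ["buy", "buy", "sell"]
hit_rate = decision_hit_rate(decisions, prices)
gain = net_gain(decisions, prices)
```

In this made-up run only one decision in three was `right', yet the trades still ended with a gain, which shows that the two measures can disagree.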
E: Choosing the right training data for this case is a critical problem and a major topic
of research for people in the field. However, one possibility is to include the following in
the training set:
1. As much data as possible as inputs, including IBM's past share prices, IBM
productivity parameter, indicator of general financial activities in the IT sector,
plus any other variables one may suspect of playing a role in how well IBM
shares will fare in the future, e.g., political knowledge, foreign currency
exchange rates, Microsoft share prices, Intel share prices, petrol prices, etc.
Data concerning the above should be gathered for as long a period as possible
(e.g., years) and presented to the algorithm as inputs. The desired outputs
during training would be the correct decisions the algorithm should make based
on a given input vector.
2. Because this situation depends on many dynamic variables (i.e., variables that
change with time AND whose present value is determined by past values), the
training set should include dynamic data, e.g., an input vector may include both
present and past data for a particular point in time. This can be done even if the
learning algorithm itself will include some form of dynamics in it (e.g., by using
learning with recurrence, to be seen later in the course).
3. Training data should include vectors whose corresponding outputs will lead to
one of the following decisions: i) `sell', ii) `buy', iii) `wait longer before deciding',
and possibly iv) `do not know what to do; get a human analyst to help'.
Interestingly, option iv) may be useful if we want the algorithm to make a
decision for us only when it has a high level of confidence in its decision. If the
algorithm's confidence in its decision is low, we would usually
prefer to be consulted rather than let the algorithm give the wrong investment
advice!
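Option iv) can be sketched as a simple confidence threshold. The example assumes the learning algorithm outputs one probability per decision; that output format, the threshold of 0.7 and the name `decide` are illustrative assumptions, not part of the exercise.

```python
def decide(probabilities, threshold=0.7):
    """Return the most probable decision, or defer to a human analyst.

    probabilities maps each decision ('sell', 'buy', 'wait') to the
    confidence the algorithm assigns to it (assumed output format).
    """
    decision, confidence = max(probabilities.items(), key=lambda item: item[1])
    if confidence < threshold:
        # Confidence too low: option iv), hand over to a human.
        return "ask analyst"
    return decision

confident = decide({"sell": 0.8, "buy": 0.1, "wait": 0.1})
unsure = decide({"sell": 0.4, "buy": 0.35, "wait": 0.25})
```

Raising the threshold makes the system consult the analyst more often, in exchange for fewer wrong automatic decisions.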
c) Predicting the value of IBM stock shares in the future.
P: As opposed to the case in b) above, this time we do not want to make a decision about
anything. We merely want to be able to predict future IBM share prices given a number of
present and past variables. Although this `Predictor' could be used as part of task b) above
to help make a decision, it is important to see that b) and c) are two entirely different tasks.
b) deals with pattern recognition and decision making, c) deals with modeling/regression.
An obvious performance parameter would be the prediction error for a known input-output set.
A more detailed P would also allow us to see how the error changes as the period between
the training `present' and `future' increases. For example, we may want to know whether
using training data for October 2007 allows us to make good predictions for November 2007 AND
for December 2007 as well.
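A minimal sketch of such a detailed P is to compute the prediction error separately for each prediction horizon. Mean absolute error is used here as one possible error measure, and the prices are invented for illustration.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between known and predicted prices."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Invented data: horizon (in months ahead) -> (known prices, predicted prices).
results = {
    1: ([100, 102], [101, 101]),
    2: ([105, 103], [101, 100]),
}

# One error value per horizon shows how quickly the predictions degrade.
error_by_horizon = {
    horizon: mean_absolute_error(actual, predicted)
    for horizon, (actual, predicted) in results.items()
}
```

With these invented numbers the error grows from one month ahead to two months ahead, which is the kind of trend the more detailed P is meant to expose.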
E: Training data in this case would include:
1. Past data including all the factors that are suspected to affect the price of IBM
shares (the inputs to the algorithm) along with known share prices.
2. The structure of each input-output pair should allow us to make predictions based on
various time distances between input and output, or, if we so desire, the model
should only be required to predict share prices for the day following that of each
input vector (i.e., we could use today's data to predict tomorrow's share prices,
but not the prices for the day after tomorrow or later). Which option we choose
will depend on how complete we want our model to be.
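The structure of the input-output pairs can be sketched as a sliding window over a price series. The function name `make_pairs`, the parameters `window` and `horizon`, and the prices are assumptions for this example; a real input vector would also hold the other factors listed in item 1.

```python
def make_pairs(series, window, horizon=1):
    """Build (input_vector, target) pairs from a price series.

    Each input holds `window` consecutive values ending at time t (present
    and past data); the target is the value `horizon` steps after t.
    """
    if window < 1 or horizon < 1:
        raise ValueError("window and horizon must be at least 1")
    pairs = []
    for t in range(window - 1, len(series) - horizon):
        inputs = series[t - window + 1 : t + 1]
        target = series[t + horizon]
        pairs.append((inputs, target))
    return pairs

prices = [10, 11, 12, 13, 14]
next_day = make_pairs(prices, window=2, horizon=1)    # today -> tomorrow
two_days = make_pairs(prices, window=2, horizon=2)    # today -> day after tomorrow
```

Training only on the `horizon=1` pairs gives the simpler next-day model; including pairs for several horizons gives the more complete one.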
Question 2
Describe the full process of designing a learning system (i.e., its requirements).
First, choices need to be made for T, P and E as shown in Question 1 above. Then, the
following must be determined:
• The target function to learn, i.e., what kind of input variables the ML system will use and
exactly what kind of output variables it will give.
• A suitable representation for the target function, i.e., how the
ML system will yield an output given a specific input vector.
• A mechanism for learning this function, i.e., a way to make sure the above target
function will actually lead to learning (as determined by P) as experience is given to
the ML system.
• The type of learning experience, i.e., whether the ML system will learn on its own or will be
provided with examples (e.g., a set of good moves).
Also, refer to slides 20-27 from Lecture 1.
Question 3
List four reasons why one may need to use machine learning for a particular task.
As given in the lecture, we would consider using ML when:
• The problem is hard and complex to solve, e.g. pattern recognition problems (like classifying
handwritten characters)
• We want to mine information (hidden data) in large data sets (e.g. analyse customer shopping
patterns in supermarkets)
• An ML system can (sometimes) be faster or more accurate
• We want to mimic human learning and replace certain monotonous tasks that still require some
intelligence (e.g. driving on highways for 24 hours)
Also, refer to slide 9 in Lecture 1.
Question 4
a) Given the abilities of machine learning (ML) algorithms, it is always wise to try ML
before trying any other alternative.
Answer: False
Reason: Machine learning can be computationally costly in some cases, and it can be unreliable as well.
We should thus never assume that ML should be used before we explore simpler
approaches.
b) A good ML algorithm will have the same performance whether or not training data are
pre-processed and/or pre-selected before being presented to the learning system.
Answer: False
Reason: As briefly discussed in the lecture, the quality of the training data with regard to noise and to
how closely it represents the situations the ML algorithm will face after learning has
stopped are critical factors in the design of a learning task. Bad or incomplete training data
will lead to poor ML performance regardless of how good the algorithm is.
c) Learning implies not only being able to solve a given problem, but, more specifically,
solving the problem better and better as experience is gained.
Answer: True
Reason: See the definitions in the lecture slides. In particular, Lecture 1 – slide 17:
"A computer program is said to learn from experience E with respect to some class
of tasks T and performance measure P, if its performance at tasks in T, as measured
by P, improves with experience E." Mitchell (1997)
i.e., learning = task performance improves with experience.
Question 5
Name the four main approaches in machine learning and briefly explain each one of them.
1. Supervised: given an input, the desired output is known during training, e.g., when
a target function is to be learned.
2. Unsupervised: no desired output; let algorithm find new representations and
features from the input data.
3. Reinforcement: give the algorithm punishment or reward according to a system's
behaviour. Note: this can be considered a form of supervised learning.
4. Rule-learning: find logical structures in the data (could be supervised or
unsupervised, and even reinforced).
Also, refer to slide 24 in Lecture 1.
Question 6
For the credit scoring classification problem as in slide 11 (Lecture 1), the LOW risk rule is
– IF income > Ѳ1 AND savings > Ѳ2 THEN low risk
Obtain similar rules for classifying HIGH risk.
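[Figure: scatter plot of Savings (vertical axis) against Income (horizontal axis), with threshold Ѳ1 marked on the Income axis and Ѳ2 on the Savings axis. Points are marked + and -, and the regions are labelled Low risk and High risk.]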
– IF income < Ѳ1 AND savings < Ѳ2 THEN high risk
– IF income > Ѳ1 AND savings < Ѳ2 THEN high risk
– IF income < Ѳ1 AND savings > Ѳ2 THEN high risk
NOTE: The solution
– IF income < Ѳ1 THEN high risk
– IF savings < Ѳ2 THEN high risk
is not a very good solution, even though it achieves the correct classification, because the two
rules overlap for some points (those with both income < Ѳ1 and savings < Ѳ2).
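The overlap can be checked directly by counting how many rules fire for a given applicant. In the sketch below the thresholds, the applicant values and the function names are hypothetical; as in the rules above, points lying exactly on a threshold are not covered.

```python
def disjoint_rules_fired(income, savings, theta1, theta2):
    """Names of the three non-overlapping high-risk rules that fire."""
    rules = {
        "income < theta1 AND savings < theta2": income < theta1 and savings < theta2,
        "income > theta1 AND savings < theta2": income > theta1 and savings < theta2,
        "income < theta1 AND savings > theta2": income < theta1 and savings > theta2,
    }
    return [name for name, fires in rules.items() if fires]

def simplified_rules_fired(income, savings, theta1, theta2):
    """Names of the two simplified high-risk rules that fire."""
    rules = {
        "income < theta1": income < theta1,
        "savings < theta2": savings < theta2,
    }
    return [name for name, fires in rules.items() if fires]

# Hypothetical thresholds and an applicant with low income AND low savings.
THETA1, THETA2 = 30_000, 10_000
applicant = (20_000, 5_000)

disjoint_fired = disjoint_rules_fired(*applicant, THETA1, THETA2)
simplified_fired = simplified_rules_fired(*applicant, THETA1, THETA2)
```

For this applicant exactly one of the three rules fires, whereas both simplified rules fire at once. The classification is the same either way, but the second rule set covers the point twice.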