Using Collaborative Filtering For Effective Training Programs

Using Linear & Logistic Regression along with Collaborative
Filtering Technique for Effective Training Program Deployment
Part I
Deepak Manjarekar

Deepak Manjarekar is a Program Manager at KPIT INFOSYSTEMS LTD, Hinjewadi, Pune, India, where he currently manages the company’s second-largest star customer account in offshore delivery. He has seventeen years of experience in the IT industry and has worked with many Fortune 100 clients, delivering solutions in the Data Warehousing and Business Intelligence space. Mr. Manjarekar is an Electronics Engineer from Bombay University and received his MBA from the Anderson School of Management at the University of California, Los Angeles (UCLA). He is also a PMI-certified PMP. He currently resides in Pune, India. You can reach him at Deepak.manjarkar@kpitcummins.com.

©Deepak Manjarekar, KPIT INFOSYSTEMS LTD

ABSTRACT

“If a man does not keep pace with his companions, perhaps it is because he hears a different drummer. Let him step to the music which he hears, however measured or far away.”
- Henry David Thoreau

All over the world, organizations are spending enormous amounts of resources to train their employees. The concept of the “Learning Organization” is emerging as a competitive necessity [1]. The surge in internal training programs is largely driven by the hope that employees will cope with fast-changing technologies and be productive in their jobs from day one. Internal training programs are also proliferating because external programs are usually very expensive, are not located close to the organization, and may have schedules that do not fit its needs. So in the current era, where companies like to outsource everything that is not their core business, we see a reverse trend of in-sourcing training programs for employees. Learning organizations within companies are thus cost centers whose primary responsibility is to deliver effective, custom-made training programs that prepare trainees in the latest technologies or processes. Yet despite all this customization, organizations are still grappling with little or no ROI on their training programs. What is happening, then? Maybe the delivery of the training was not right? Perhaps the selection process for the trainees is flawed? I tend to think that the ineffectiveness of training programs is largely due to the wrong selection of trainees by training program coordinators. This paper illustrates the use of linear regression, logistic regression and collaborative filtering techniques to correctly identify employees who are likely to enjoy the training, benefit from it, and continue to use the learned skills long after the training is over.

INTRODUCTION

Countless times we hear of a bright student failing a specific test or being completely indifferent towards learning a specific subject. Parents and teachers are equally perplexed about why a natural genius would fail or perform poorly in what might seem an easy subject. The problem does not lie in our classification of the student as a genius. Rather, it lies in the erroneous selection of the training program for that student. To make matters worse, we observe the same phenomenon in many organizations. High-performing employees, or high-ranking students freshly minted from top-notch universities, enter the four walls of the training room only to find themselves like a fish out of water. Each year organizations spend millions of dollars and countless hours training their new recruits and star performers, only to see dismal results. At best, employees come out of the training class with minimal familiarity with the subject. Thus the learning division within an organization usually suffers a low ROI on the aggregate spent on learning activities. It is clear that many corporate training programs are unable to deliver the results companies expect [2].

Main reasons why corporate training programs fail

We can attribute the marginal success of training programs mainly to the following three reasons:
1. Poorly organized training programs
2. Ineffective training delivery
3. Improper selection of trainees for the training program

Now let’s look in detail at what goes wrong in each of the three cases.

1. Poorly organized training programs
“Forward-thinking companies have reinvented their training organizations around the concept of running training like a business, and have tangible successes to show for it. These corporations now know what they are spending on training and what the investment yields,” say David van Adelsberg and Edward A. Trolley, co-authors of “Running Training Like a Business” [3].

But let’s be very honest with ourselves and ask, “How many companies are really forward thinking?” Let’s ask, “How many organizations treat

their learning organizations as profit centers versus cost centers?” The truth is that many organizations still treat their own learning organizations as cost centers. So every time the company tries to cut costs, it is usually the learning organization that takes the first hit.

Since organizations have started in-sourcing training programs, they are burdened with managing all of them using internal resources. These resources may or may not come from an education delivery background. Typically, most trainers are high performers in their own technical forte who are then delegated the task of training others in the same technology. The trainers may or may not have any teaching background. The same people are also delegated the task of creating the training materials for the program. Such training materials often lack both the simplicity and the appropriate depth that effective delivery requires.

Many times the training rooms lack the infrastructure required for effective learning. In today’s world, where real estate is such a prime commodity, organizations often create crammed training rooms to make room for corner offices. These crammed rooms are not conducive to learning at all. Sharing one computer among several trainees, having no set time for lab work, and missing charts, workbooks and other training materials all add to the poor organization of training programs.

2. Ineffective training delivery
“Traditional training methods such as classroom and workshop training are by far the most conventional and popular methods of delivering training to employees. Their effectiveness will depend upon the content delivered and how interesting the presenter can make the material in order to engage and involve employees,” writes Timothy F. Bednarz, Ph.D. [4]

As I have mentioned earlier, most trainers are high-performing individual contributors who have been delegated the task of training others in their own area of expertise. They have little or no experience in conducting formal training. Some trainers even have communication problems and find it difficult to conduct classes in languages that are not native to them. Many show little or no energy while teaching, thereby droning the class to sleep. Many lack passion for teaching. Often their commitment to the class and to the learning of its participants is questionable. Participants frequently report that the trainer lacked practical experience or had nothing to share from work outside the four classroom walls. Whatever the case may be, we can find countless examples where trainees tell the organization that they gained only marginally from the training. Many do not use what they learned because it is not applicable to their work, or because of a job or role change.

3. Improper selection of trainees for the training program
At KPIT Cummins, the learning organization frequently complains that Project Managers and Program Managers do not send their high-ranking personnel (or resources) to the various training programs. “We always get the not-so-bright or low-performing employees to train” is what they say [5]. One can argue about what the charter of the learning organization within a company should be. One might say that its prime responsibility should be to train employees who are not star performers, or employees who need some kind of technical training to progress in their jobs. But does this mean that the high flyers should be deprived of quality training? I could argue about the learning organization’s charter until the end of this article, but that is not its focus.

The following points highlight why trainee selection is usually full of flaws. These are industry observations and do not necessarily reflect the trends at KPIT Cummins.
a. Favoritism.
b. The crying baby gets the milk.
c. Reluctance to release bright and deserving candidates for higher and more appropriate training, for fear that they may surpass their supervisors.
d. Many deserving candidates are so busy with their day-to-day work that they cannot find time for activities outside work, such as training.
e. Supervisors simply avoid sending their high performers for training in order to maintain business continuity. Keeping their best people within the training walls may disrupt it.
f. Peer pressure on employees. Many times employees enroll in a training program because their colleagues are enrolled in it, only to find themselves in the wrong class.
g. Sending substitutes for the training when the actual enrollee cannot attend.
h. Supply and demand for training on a specific technology. Companies sometimes arrange mass training on technologies such as Oracle Apps or SAP and force everyone on the bench to attend, so that these people can fill any requirement that comes up.
i. Due to unforeseen circumstances, people sometimes find themselves inside the four walls of a classroom undergoing training that seems to suffocate them.

In summary, a lot of subjectivity is built into candidate selection for training. For the reasons listed above, and many more that are not listed, we find many misfits in the classroom scratching their heads about why they ended up in that class in the first place.

The Scope
The scope of this paper is limited to appropriate candidate selection via statistical and mathematical modeling. Therefore I will conveniently ignore the first two points I made earlier about why corporate training programs fail. Let us assume, for the sake of simplicity, that corporations will take care of the first two problems by setting up state-of-the-art training facilities and by hiring the best trainers money can buy. Let us assume that in the ideal scenario we have to deal only with the final point: improper selection of trainees for the training program. I am purposely limiting the scope of this paper to this last point, because, as you will soon realize, dealing with it alone is a daunting task. Taking the subjectivity and bias out of candidate selection may sound very simple, but it is difficult to implement, to say the least. Let’s see how we can formulate an objective candidate selection model for training programs.

Objective / subjective candidate selection model
In order to design the Objective / Subjective Candidate Selection Model (hereafter referred to as the OSCS Model), I will suggest the use of four mathematical/statistical techniques. But before moving on to the techniques, I owe an explanation of what the OSCS model is. In the beginning of this article I spent much time articulating how subjectivity from the managers’ or trainers’ point of view is detrimental to the success of a training program. Let’s call this the bad subjectivity factor. With the bad subjectivity factor, we generally choose inappropriate candidates for the training. So we must devise a way to make trainee selection objective.

But subjectivity from the trainee’s point of view is not necessarily bad. Confused? Maybe. Let me explain a little further. Suppose I have a couple of colleagues who are my best buddies and who also happen to share my professional profile. Their opinion about a specific training program will definitely matter to me, provided they have attended that program. It is like going to a movie because your best friend recommends it after watching it himself or herself. It works because you share many traits in your individual profiles. In that respect, a trainee should definitely value the opinions of former trainees of the same program if their profiles are more or less similar. But where should a trainee find such people? Should the person simply poll the people within his network inside the organization? Maybe not. These people may be in his network for various reasons, not just because they share the same profile, so arbitrary opinions may further pollute the trainee’s perception. To circumvent this problem, I will use the collaborative filtering technique to add subjectivity back into the selection process. But now this subjectivity will come from the trainee’s point of view and should further help the trainee decide whether s/he should attend the training. Let’s call this the good subjectivity factor.

The design of the OSCS Model
To design the OSCS model, I came up with three steps.
1. The first step takes the bad subjectivity factor out of the trainee selection process.
2. The second step makes an objective candidate selection using the results of the first step.
3. The third step adds the good subjectivity factor back into the selection process.

During OSCS Model building we will have to perform these steps in this specific order. I am essentially making two hypotheses here. To test them, I will use a survey of past trainees together with the linear multi-dimensional regression technique, measuring the statistical significance of my alternate hypothesis ($H_a$) in each case (step 1 above). After testing the two hypotheses, I will use the logistic regression technique to create a predictive model (step 2). This model will predict whether a certain employee, if selected, will be successful in a specific training program. Finally, we will add the good subjectivity factor back into the equation (step 3). The four techniques I will use are sample size determination for a finite population (for the survey), linear multi-dimensional regression, a logistic regression predictive model, and collaborative filtering.

1. Taking the bad subjectivity factor out of the trainee selection process

The first null hypothesis I am making here is:
$H_0$: Every employee who attends a specific training always benefits from the training.


Therefore my alternate hypothesis will be:
$H_a$: Not all employees benefit from the training they receive.

To test the first hypothesis, I will need to survey a statistically significant sample of employees who have undergone some kind of training in the past. This sample can be drawn from a finite population. We call it a finite population because at any given time the organization will be able to identify all the employees who have attended any training (footnote 1). The sample size will depend on the confidence level we want in the results of our survey (please see Appendix 1 on how to calculate the sample size for a finite population [6]). Once we have identified the random sample of trainees, we can survey them and find out how effective past training was for each trainee. We can reject the null hypothesis if the proportion of positive responses falls below a certain cut-off level. Such a cut-off level may or may not exist for a given learning organization.

Footnote 1: You can say it is a big assumption. Maybe so. If an organization is very large and does not have data for all past trainees, we can still formulate an experiment and derive a sample size based on an infinite population.

The second null hypothesis I am making here is:
$H_0$: Every personal and professional attribute of the candidate is equally important in the selection process.
Therefore my alternate hypothesis will be:
$H_a$: Certain personal and professional attributes matter more than others.

To test this second hypothesis, I will need to run a linear regression with multiple parameters. My approach will be to start with all the data attributes that the learning organization has captured so far about each candidate. I will then weed out the ones that have no statistical significance for how effectively the candidate receives the training. We can start with personal parameters like age, gender, native language, second, third and fourth languages spoken, domicile state, primary language of education, educational degrees, other training, etc. We can also start with professional parameters like years of experience, years in the last assignment, technical skill set, title, prior experience in the training subject, etc.

The attribute elimination process will be based on the covariance of each attribute with the expected outcome, i.e. the success of the trainee in the program. During this process, we will eliminate the many non-significant attributes (low covariance) that give us no predictive power. We will also eliminate the attributes that do not affect the desired outcome, i.e. success in training, at all (zero covariance). Thus at the end of this process we will have a set of attributes that are significant for unbiased candidate selection.

2. Making an objective candidate selection using the results of the first step

After identifying the set of attributes that are necessary and significant for candidate selection, we will use logistic regression to reach an objective decision on whether a certain candidate should be selected for a specific training. To achieve this, we will build a predictive model using the logistic regression technique. To build such a model, we start with the list of significant attributes from step one. Let’s say we have identified a set of 10 attributes, $x_1, x_2, x_3, \ldots, x_{10}$, from step one. To build the model we will once more use past training data.

In this case we will have to create a dataset (called the training dataset) with balanced outcomes. By this I mean that our training dataset must contain equal proportions of favorable and unfavorable outcomes (passed/failed, successful/unsuccessful, satisfied/unsatisfied, etc.), along with proper distributions of the significant attributes for each candidate. Again, the size of the training dataset really matters here. To achieve a high degree of predictability, the training dataset must contain a reasonable amount of data covering most of the possible domain values for all the attributes involved. When the training dataset is not sufficiently large, predictive models tend to overfit the data [7]. Overfitting causes a definite problem: the model works very well on the training dataset but fails spectacularly on the test dataset (unseen data).

We will need two other important datasets, called the holdout dataset and the test dataset [8]. The holdout dataset can be created from the training dataset by randomly selecting 10% to 15% of the training data. These records are kept aside and are not used during model training. Instead, once the final model is built, the holdout dataset is used to create the confusion matrix and assess the predictability of the model. The test dataset is created from sample data gathered after the model has been built, whose outcome was not known while the model was under construction. How to create these datasets is beyond the scope of this first part of the paper; I will cover it briefly in the second part.
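Steps one and two can be illustrated with a minimal Python sketch. It is not the author’s implementation and rests on several assumptions that are not in the original:

- each past trainee is a record of numeric attributes plus a 0/1 outcome;
- step one is reduced to keeping attributes whose absolute covariance with the outcome clears a hypothetical cut-off `min_abs_cov`;
- the training dataset is balanced by downsampling the more frequent outcome, with 15% set aside as the holdout dataset;
- the logistic model is fitted by plain batch gradient ascent.

All function names and the synthetic data are illustrative only.

```python
import math
import random


def covariance(xs, ys):
    """Sample covariance between one attribute and the 0/1 outcome."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / (len(xs) - 1)


def screen_attributes(records, min_abs_cov):
    """Step one (sketch): keep indices of attributes whose absolute covariance
    with the outcome is at least min_abs_cov (a hypothetical cut-off)."""
    labels = [label for _, label in records]
    n_attrs = len(records[0][0])
    return [j for j in range(n_attrs)
            if abs(covariance([x[j] for x, _ in records], labels)) >= min_abs_cov]


def logistic(y):
    """p(Y = 1 | y) = e^y / (1 + e^y), evaluated without overflow."""
    if y >= 0:
        return 1.0 / (1.0 + math.exp(-y))
    z = math.exp(y)
    return z / (1.0 + z)


def balanced_holdout_split(records, holdout_frac=0.15, seed=0):
    """Downsample the more frequent outcome so both outcomes are equally
    represented, then set aside holdout_frac of the records as the holdout set."""
    rng = random.Random(seed)
    pos = [r for r in records if r[1] == 1]
    neg = [r for r in records if r[1] == 0]
    k = min(len(pos), len(neg))
    balanced = rng.sample(pos, k) + rng.sample(neg, k)
    rng.shuffle(balanced)
    n_hold = round(holdout_frac * len(balanced))
    return balanced[n_hold:], balanced[:n_hold]  # (training, holdout)


def linear_score(w, x):
    """y = alpha_0 + alpha_1*x_1 + ...; w[0] is the intercept alpha_0."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def fit_logistic(train, lr=0.5, epochs=2000):
    """Fit the weights by batch gradient ascent on the log-likelihood."""
    w = [0.0] * (len(train[0][0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, label in train:
            err = label - logistic(linear_score(w, x))
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wi + lr * g / len(train) for wi, g in zip(w, grad)]
    return w


def confusion_matrix(w, data, threshold=0.5):
    """Count true/false positives/negatives of the thresholded model."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for x, label in data:
        pred = 1 if logistic(linear_score(w, x)) >= threshold else 0
        counts[("T" if pred == label else "F") + ("P" if pred == 1 else "N")] += 1
    return counts


# Synthetic, hypothetical data: attribute 0 drives success, attribute 1 is noise.
rng = random.Random(42)
records = []
for _ in range(300):
    x = (rng.random(), rng.random())
    records.append((x, 1 if x[0] > 0.5 else 0))

keep = screen_attributes(records, min_abs_cov=0.05)
reduced = [(tuple(x[j] for j in keep), label) for x, label in records]
train, holdout = balanced_holdout_split(reduced, holdout_frac=0.15)
weights = fit_logistic(train)
print(keep, confusion_matrix(weights, holdout))
```

Covariance depends on each attribute’s units. Standardizing the attributes first, which turns covariance into correlation, makes a single cut-off comparable across attributes. A statistics package would also report p-values for the regression weights, which is closer to the significance test described above.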


One might say we could just as well have used the linear multi-dimensional regression model from the first step here, rather than building a new model from the same attributes. The following graph shows the output of a simple linear regression with just one predictor variable.

[Figure: output of a simple linear regression with one predictor variable]

The reason we cannot use the equation from the first step (footnote 2) is that the predicted values of y may cross the bounds of 0 and 1, depending on the values of the ten weights and the trainee’s attribute values. The main reason for using logistic regression is that the model always returns a value between 0 and 1, which can be read as the probability of a Bernoulli outcome: 1 can be mapped to success and 0 to failure, or vice versa.

If
$$y = \alpha_0 \pm \alpha_1 x_1 \pm \alpha_2 x_2 \pm \cdots \pm \alpha_{10} x_{10} \pm \varepsilon$$
from the linear multi-dimensional regression, then
$$p(Y = 1 \mid y) = \frac{e^{y}}{1 + e^{y}}.$$
Thus we obtain the probability that $Y = 1$ (success), a value between zero and one, based on the given values of the attributes and their respective weights. The following figure depicts the output values of a logistic regression given one predictor input.

[Figure: output of a logistic regression with one predictor input]

Also, while creating the predictive model in the second step, we may or may not end up using all ten attributes. Using more variables may cause the model to overfit the data if the training dataset is not large enough to cover all the domain values. Thus by not using the equation from the first step, we may end up using fewer than 10 attributes and reduce the need for a huge training dataset.

Footnote 2: At the end of the linear multi-dimensional regression we will get an equation of the form $y = \alpha_0 \pm \alpha_1 x_1 \pm \alpha_2 x_2 \pm \cdots \pm \alpha_{10} x_{10} \pm \varepsilon$, where $\alpha_1, \alpha_2, \ldots$ are weights, $x_1, x_2, \ldots$ are the attributes, and $\varepsilon$ is the error term.

Thus at the end of the second stage we will have a predictive model whose output tells us whether to send a particular trainee to the training.

3. Adding the good subjectivity factor back into the selection process

After identifying the trainee in the second step, we may invite the trainee to the actual training program. But we will have to give the trainee the chance to decide whether s/he would like to self-select for this training. Here we add subjectivity back into the decision criteria, but from the trainee’s point of view.

The first thing we can expect from the trainee here is “self-selection”. Self-selection happens when a person applies his/her own decision criteria to a specific problem. This process is very complex and hard to quantify, but most of the time it delivers a correct assessment. We also call it “gut feel”. Everyone has it, and its accuracy increases as one gets older and experiences various decision-making situations.

Once the candidate goes through the “self-selection” process, the trainee himself has added half of the good subjectivity to the model. To add further value, we can use another technique, called “collaborative filtering”, to gauge the appropriateness of the training for the trainee. It draws on the experience of past trainees who have undergone the same training. As mentioned above, this is like getting a recommendation from colleagues whom you may or may not know but who share nearly the same profile as yours. It is therefore useful when you want to predict preferences by considering a long training history.

Now let me walk you through an example of how collaborative filtering works. Let’s assume a trainee named Akshay has already gone through our first two steps, and we arrived at the objective decision that Akshay should take the Hyperion training. We gave Akshay a chance to self-select for the program. After much contemplation, Akshay thinks he should go for the training. At this point he may or may not have reservations. In any case, we tell Akshay: let’s see how many people who are like you, and who have undergone similar training programs, would recommend this training to you.

In its simplest form, Akshay’s vote for the Hyperion training would equal the average of the other trainees’ votes for the Hyperion training. But we need to take into account the following:


• Different people have different “standards”.
• People more similar to Akshay should be given a higher weight in predicting Akshay’s preferences.

If a is the “active user” and c the “candidate training”, then $\hat{v}_a^c$, the predicted vote of user a for candidate training c, can be given as [9]:
$$\hat{v}_a^c - \bar{v}_a = \sum_{u \neq a} w(a,u)\,\bigl(v_u^c - \bar{v}_u\bigr)$$
where $\bar{v}_a$ is the mean vote of the active user a, $\bar{v}_u$ is the mean vote of user u, and $v_u^c$ is the vote of user u for training c.

We choose an appropriate weight $w(a,u)$ (footnote 3) as
$$w(a,u) = \frac{c(a,u)}{k_a}, \qquad k_a = \sum_{u \neq a} \lvert c(a,u) \rvert,$$
where $c(a,u)$ is the vote correlation between users a and u, and $k_a$ is the normalizing factor that makes the absolute weights sum to one [10].

Putting all the formula elements together, we can simplify and write:
$$\hat{v}_a^c = \bar{v}_a + \frac{\sum_{u \neq a} c(a,u)\,\bigl(v_u^c - \bar{v}_u\bigr)}{\sum_{u \neq a} \lvert c(a,u) \rvert}$$

Footnote 3: Weights can be defined in many ways. I am going to use a simple correlation method here. The other two techniques I am considering are cosine similarity and Pearson correlation; both formulas are given in Appendix 2.

Consider the predicted vote of the active user a for the candidate training c. Compare it with the average vote of all other users who have taken training c and rated it above average (meaning they liked it and it proved useful to them). If the predicted vote is greater, we give our “thumbs up” for training c to the active user a (i.e. Akshay in our example).

This may sound confusing without an actual example, but the idea is very simple. You recommend a specific training to the trainee if you find that most other candidates with similar profiles gave the training their “thumbs up” after taking it earlier. Another example comes from the Amazon.com site. When you search for a specific book and are ready to buy it, the site recommends a couple of other books, saying “customers who bought this book also bought books a, b and c”, etc.

In Part I of this paper, I am postulating my ideas on how we can use past training data in the organization, together with appropriate statistical analysis, to correctly identify deserving candidates for a specific training using objective criteria.

In Part II of this paper, I will either prove or disprove my postulation. The task will not be easy. I will have to uncover as much past training data as possible and deal with any data quality issues in what I find. It is going to be an interesting endeavor and may last more than a year. As you might know, building predictive models is very easy, but testing them with actual test datasets takes time, because the test data has to be gathered after the model is built.

So stay tuned…

APPENDIX 1
1. Statistical Theory for Sampling of a Finite Population
Suppose:
• The proxy for the total population is called the Sampling Frame (SF). The SF has N customers, where N is not large.
• The mean and variance of the quantity of interest (QI) across the SF are m and $s^2$ respectively.
• We draw a simple random sample of $n^*$ customers.

The sample mean $\bar{x}$ has a probability distribution with mean m and variance
$$s^2 \left(\frac{1}{n^*} - \frac{1}{N}\right).$$
The square root of the variance of the sampling distribution is called the standard error of the mean and is given as
$$S = s \sqrt{\frac{1}{n^*} - \frac{1}{N}}.$$

Key insight: a sample of $n^*$ out of N has the same error as a sample of n out of ∞ if
$$s \sqrt{\frac{1}{n^*} - \frac{1}{N}} = s \sqrt{\frac{1}{n}}
\;\Longrightarrow\; \frac{1}{n^*} - \frac{1}{N} = \frac{1}{n}
\;\Longrightarrow\; \frac{1}{n^*} = \frac{1}{N} + \frac{1}{n}$$
$$n^* = \frac{nN}{n + N} = \frac{n}{1 + \frac{n}{N}}$$
When N is small and n is large, $n^*$ is therefore noticeably smaller than n.
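Appendix 1 can be made concrete with a small Python sketch. The appendix works with a general standard deviation s. The sketch adds assumptions that are not in the original:

- the survey measures a yes/no proportion (for example, the share of past trainees who say the training helped), so $s^2 = p(1 - p)$ with the conservative choice p = 0.5;
- n for an infinite population comes from the usual normal-approximation formula $n = (z s / e)^2$ for a margin of error e.

The population size N = 1000 is hypothetical.

```python
import math
from statistics import NormalDist


def infinite_population_n(confidence=0.95, margin=0.05, p=0.5):
    """Sample size n for an (effectively) infinite population, assuming the
    quantity of interest is a yes/no proportion p, so that s^2 = p(1 - p):
    n = (z * s / margin)^2, with z the two-sided normal critical value."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return (z * math.sqrt(p * (1 - p)) / margin) ** 2


def finite_population_n(n, N):
    """Appendix 1: n* = n / (1 + n / N) has the same standard error as n out of infinity."""
    return n / (1 + n / N)


def required_sample_size(N, confidence=0.95, margin=0.05, p=0.5):
    """Round n* up, since we cannot survey a fraction of a trainee."""
    return math.ceil(finite_population_n(infinite_population_n(confidence, margin, p), N))


n = infinite_population_n()  # about 384.15 for 95% confidence and a +/-5% margin
print(math.ceil(n), required_sample_size(N=1000))  # → 385 278
```

With 1,000 past trainees, the finite-population correction cuts the survey from 385 to 278 respondents. As N grows, $n^*$ approaches n.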

APPENDIX 2
1. Cosine Similarity [11]
The similarity measure can be based on the cosine of the angle between two feature vectors. This technique was primarily used in information retrieval for calculating the similarity between two documents, where documents were usually represented as vectors of word frequencies. In this context, weights can be defined as:
$$w(u_1, u_2) = \sum_{i \in \text{items}} \frac{v_{u_1,i}}{\sqrt{\sum_{k \in I_1} v_{u_1,k}^2}} \cdot \frac{v_{u_2,i}}{\sqrt{\sum_{k \in I_2} v_{u_2,k}^2}}$$
where $I_1$ and $I_2$ are the sets of items voted on by users $u_1$ and $u_2$ respectively.

2. Pearson Correlation [12]
Weights can be defined in terms of the Pearson correlation coefficient. Pearson correlation is also used in statistics to evaluate the degree of linear relationship between two variables. It ranges from -1 (a perfect negative relationship) to +1 (a perfect positive relationship), with 0 indicating no linear relationship whatsoever. The formula is as follows:
$$w(u_1, u_2) = \frac{\sum_{j \in \text{items}} (v_{u_1,j} - \bar{v}_{u_1})(v_{u_2,j} - \bar{v}_{u_2})}{\sqrt{\sum_{j \in \text{items}} (v_{u_1,j} - \bar{v}_{u_1})^2 \sum_{j \in \text{items}} (v_{u_2,j} - \bar{v}_{u_2})^2}}$$
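The prediction formula of step three can be combined with the two weights in this appendix in a minimal Python sketch. It is not the author’s implementation. Assumptions not stated in the original:

- each user’s mean vote is taken over all of that user’s votes;
- the weights are computed only over trainings both users voted on;
- the sums in the prediction run only over users who actually voted on the candidate training.

The trainings, vote values and trainees T1 to T3 are hypothetical.

```python
import math


def mean_vote(votes, user):
    """Mean vote of a user over every training that user has voted on."""
    vs = list(votes[user].values())
    return sum(vs) / len(vs)


def pearson_weight(votes, u1, u2):
    """Pearson-style weight over the trainings both users voted on."""
    common = votes[u1].keys() & votes[u2].keys()
    m1, m2 = mean_vote(votes, u1), mean_vote(votes, u2)
    num = sum((votes[u1][j] - m1) * (votes[u2][j] - m2) for j in common)
    den = math.sqrt(sum((votes[u1][j] - m1) ** 2 for j in common)
                    * sum((votes[u2][j] - m2) ** 2 for j in common))
    return num / den if den else 0.0


def cosine_weight(votes, u1, u2):
    """Cosine weight: co-voted products over each user's full vote-vector norm."""
    common = votes[u1].keys() & votes[u2].keys()
    num = sum(votes[u1][i] * votes[u2][i] for i in common)
    den = (math.sqrt(sum(v * v for v in votes[u1].values()))
           * math.sqrt(sum(v * v for v in votes[u2].values())))
    return num / den if den else 0.0


def predict_vote(votes, active, training, weight=pearson_weight):
    """Predicted vote of the active user for a candidate training:
    mean vote + sum(c * deviation) / sum(|c|) over other users who voted on it."""
    base = mean_vote(votes, active)
    num = norm = 0.0
    for u in votes:
        if u == active or training not in votes[u]:
            continue
        c = weight(votes, active, u)
        num += c * (votes[u][training] - mean_vote(votes, u))
        norm += abs(c)
    return base + num / norm if norm else base


# Hypothetical votes on a 1-5 scale; T1, T2 and T3 are made-up past trainees.
votes = {
    "Akshay": {"Java": 4, "SQL": 5, "Testing": 2},
    "T1": {"Java": 5, "SQL": 5, "Testing": 1, "Hyperion": 5},
    "T2": {"Java": 2, "SQL": 1, "Testing": 5, "Hyperion": 2},
    "T3": {"Java": 4, "SQL": 4, "Testing": 3, "Hyperion": 4},
}
print(round(predict_vote(votes, "Akshay", "Hyperion"), 2))  # → 4.25
```

T1 and T3 vote much like Akshay and rated Hyperion above their own average. T2 has opposite tastes (a negative weight) and rated it below his average. All three therefore push the prediction above Akshay’s mean of about 3.67. Passing `weight=cosine_weight` swaps in the cosine measure. Because all votes here are positive, cosine weights are never negative, so that measure cannot express opposite tastes the way the Pearson weight does.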

REFERENCES

1. Peter Senge, The Fifth Discipline.
2. Timothy F. Bednarz, Ph.D., Maximizing Training Investment: The Executive Key to Achieving Results (e-book), p. 7.
3. David van Adelsberg and Edward A. Trolley, Running Training Like a Business.
4. Timothy F. Bednarz, Ph.D., Maximizing Training Investment: The Executive Key to Achieving Results (e-book), p. 6.
5. Based on comments made by participants in the Learning Organization Forum at KPIT INFOSYSTEMS LTD.
6. Russell V. Lenth, Department of Statistics, University of Iowa. Some Practical Guidelines for Effective Sample-Size Determination. March 1, 2001.
7. Michael J. A. Berry and Gordon S. Linoff, Data Mining Techniques, Second Edition, Wiley, p. 234.
8. Michael J. A. Berry and Gordon S. Linoff, Data Mining Techniques, Second Edition, Wiley, p. 52.
9. P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, J. Riedl. GroupLens: An Open Architecture for Collaborative Filtering of Netnews. Proceedings of CSCW ’94, 1994.
10. Prof. Anand Bodapati, Anderson School of Management, UCLA, CA, USA. Extracted from class notes of MGMT 267, One-on-one Marketing. Collaborative filtering: Weight calculation. Spring 2006.
11. Miha Grčar, User Profiling: Collaborative Filtering. Department of Knowledge Technologies, Jozef Stefan Institute, Jamova 39, 1000 Ljubljana, Slovenia.
12. P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, J. Riedl. GroupLens: An Open Architecture for Collaborative Filtering of Netnews. Proceedings of CSCW ’94, 1994.
