Using Collaborative Filtering For Effective Training Programs



This white paper was a finalist and runner-up in Tecxpedition 2007 at KPIT Cummins. It discusses how a trainee may benefit from training when the supervisor applies some statistical intelligence to select an appropriate training module that the person is likely to enjoy and value.



Using Linear & Logistic Regression along with Collaborative Filtering Technique for Effective Training Program Deployment, Part I

Deepak Manjarekar

Deepak Manjarekar is working as a Program Manager at KPIT INFOSYSTEMS LTD, Hinjewadi, Pune, India. Currently he is managing the company's second largest star customer account in offshore delivery. He has seventeen years of experience in the IT industry and has worked with many Fortune 100 clients, delivering solutions in the Data Warehousing and Business Intelligence space. Mr. Manjarekar is an Electronics Engineer from Bombay University and received his MBA from the Anderson School of Management at the University of California at Los Angeles (UCLA). He is also a PMI certified PMP. He currently resides in Pune, India.

©Deepak Manjarekar, KPIT INFOSYSTEMS LTD

"If a man does not keep pace with his companions, perhaps it is because he hears a different drummer. Let him step to the music which he hears, however measured or far away."
- Henry David Thoreau

ABSTRACT

All over the world organizations are spending enormous amounts of resources to train their employees. The concept of "Learning Organizations" is beginning to emerge as a competitive necessity [1]. The surge in internal training programs is largely driven by the hope that employees will be able to cope with fast changing technologies and be productive in their jobs right from day one. The proliferation of internal training programs is also due to the fact that external training programs are usually very expensive, are not in the close vicinity of the organization and may have schedules that do not fit the needs of the organization. So in the current era, where companies like to outsource everything that is not their core business, we see a reverse trend of in-sourcing the training programs for their employees. Learning organizations within the companies are thus cost centers whose primary responsibility is to deliver effective and custom made training programs that prepare the trainees in the latest technologies or processes. Yet despite all the customization, organizations are still grappling with the problem of little or no ROI on their training programs. What is happening then? Maybe the delivery of the training was not right? Perhaps the selection process of the trainees is flawed? I tend to think that the ineffectiveness of the training programs is largely due to the wrong selection of trainees by the training program coordinators. This paper will illustrate the use of regression, logistic regression and collaborative filtering techniques to correctly identify employees who may enjoy the training, benefit from it and continue to use the learned skills long after the training is over.

INTRODUCTION

Countless times we hear of a bright student failing a specific test or being completely indifferent towards learning a specific subject. Parents and teachers are equally perplexed about why a natural genius would fail or perform poorly in what someone might feel to be an easy subject. The problem does not lie in our incorrect classification of the student as a genius; rather it lies in the erroneous selection of the training program for that student. To make matters worse, we observe the same phenomenon in many organizations, where many high performing employees or high ranking students freshly minted from top notch universities go inside the four walls of the training room only to find themselves to be a fly on the wall. Each year organizations spend millions of dollars and countless hours in training their new recruits and star performers, only to find dismal results. At best, employees come out of the training class with minimal familiarity with the subject. Thus the learning division within an organization usually suffers from low ROI on the aggregate spent on the learning activities. It is clear that many corporate training programs are unable to deliver the results companies expect [2].

Main reasons why the corporate training programs fail?

We can attribute the marginal success of the training programs mainly to the following three reasons:

1. Poorly organized training programs
2. Ineffective training delivery
3. Improper selection of trainees for the training program

Now let's look in detail at what goes wrong in all three cases.

1. Poorly organized training programs

"Forward-thinking companies have reinvented their training organizations around the concept of running training like a business, and have tangible successes to show for it. These corporations now know what they are spending on training and what the investment yields," say David van Adelsberg & Edward A. Trolley, co-authors of "Running Training Like a Business" [3].

But let's be very honest with ourselves and ask the question, "How many companies are really forward thinking?" Let's ask, "How many organizations treat
their learning organizations as profit centers versus cost centers?" The truth is that many organizations still treat their own learning organizations as cost centers. So every time the company tries to cut cost, it is usually the learning organization that gets hit first.

Since organizations have started in-sourcing the training programs, they are burdened with managing all the training programs using internal resources. These resources may or may not come from an education delivery background. Typically most trainers are high performing folks in their own technical forte who are then delegated the task of training others in the same technology. The trainers may or may not have any teaching background. The same folks are then delegated the task of creating the training materials for the program. Such training materials lack the simplicity as well as the appropriate depth required to go with the delivery.

Many times the training rooms lack the proper infrastructure required for effective learning. In today's world, where real estate is such a prime commodity, we often find organizations taking the liberty of creating cramped training rooms to make some room for the corner offices. These cramped rooms are not conducive to any learning activity at all. Sharing one computer among more than one trainee, no set time for lab work, and missing charts, workbooks and other training materials further add to the poor organization of the training programs.

2. Ineffective training delivery

"Traditional training methods such as classroom and workshop training are by far the most conventional and popular methods of delivering training to employees. Their effectiveness will depend upon the content delivered and how interesting the presenter can make the material in order to engage and involve employees," writes Timothy F. Bednarz, Ph.D. [4]

As I have mentioned earlier, most trainers are high performing individual contributors who have been delegated the task of training others in their area of expertise. They have little or no experience in conducting formal training. Some trainers even have communication problems and find it difficult to conduct classes in languages that are not native to them. Many exhibit little or no energy during the teaching, thereby droning the class to sleep. Many lack passion for teaching. Often their commitment to the class and to the learning of its participants is questionable. It is frequent feedback from the participants that the trainer lacked any practical experience or that he/she did not have anything to share from the work outside of the four classroom walls.

3. Improper selection of trainees for the training program

At KPIT Cummins, it is a frequent gripe from the learning organization that Project Managers or Program Managers don't send their high ranking personnel (or resources) to the various training programs. "We always get the not so bright or low performing employees to train" is what they say [5]. Now one can argue about what the charter of the learning organization within a company should be. One might say that the learning organization's prime responsibility should be to train employees who are not really the star performers, or employees who need some kind of technical training to progress in their jobs. But does this mean that the high flyers should be deprived of quality training? I could argue about the learning organization's charter to the end of this article, but that is not the focus of this article.

The following points highlight why trainee selection is usually full of flaws. These are industry observations and do not necessarily reflect the trends at KPIT Cummins.

a. Favoritism
b. Crying baby gets the milk
c. Reluctance to release bright and deserving candidates for higher and more appropriate training, for fear that they may surpass their supervisors
d. Many deserving candidates are so busy in their day-to-day work that they don't find time for any outside work activities like training.
e. Supervisors simply avoid sending their high performers for training to maintain business continuity. Keeping their best people within the training walls may disrupt business continuity.
f. Peer pressure on employees. Many times employees enroll themselves in a training program because their colleagues are enrolled in the same program, only to find themselves in the wrong class.
g. Sending substitutes to the training when the actual enrollee can't attend.
h. Supply and demand for training on a specific technology. Companies sometimes arrange a mass training on technologies such as Oracle Apps or SAP and force all the people on the bench to attend, so that these people can fill a requirement in case one comes up.
i. Due to unforeseen circumstances, people sometimes find themselves inside the four walls of a classroom, undergoing training that seems to suffocate them.

Whatever may be the case, we can find countless
examples where the trainee provides feedback to the organization stating that he/she has gained only marginally from the training. Many don't use the learning from training due to lack of applicability, a job change or a role change.

In summary, there is a lot of subjectivity built into candidate selection for training. Due to the reasons listed above, and many more that are not listed, we find many misfits in the classroom scratching their heads about why they ended up in that class in the first place.

The Scope

The scope of this paper is limited to the appropriate candidate selection process via statistical and mathematical modeling. Therefore I will conveniently ignore the first two points I made earlier about why corporate training programs fail. Let us just assume, for the sake of simplicity, that corporations will take care of the first two problems by setting up state of the art training facilities and by hiring the best trainers money can buy. Let us just assume that, in the ideal scenario, we have to deal with just the final point: improper selection of trainees for the training program. I am purposely limiting the scope of this paper to this last point, as you will soon realize that dealing with just this last point is already a formidable task. Taking the subjectivity and bias out of the candidate selection process may sound very simple but is very difficult to implement. Let's see how we can formulate the objective candidate selection model for the training programs.

Objective / subjective candidate selection model

In order to design the Objective / Subjective Candidate Selection Model (hereafter referred to as the OSCS Model) I will suggest the use of three mathematical/statistical techniques. But before moving on to the techniques, I suppose I owe an explanation of what the OSCS model is. In the beginning of this article I spent much time articulating how subjectivity from the managers' or trainers' point of view is detrimental to the success of the training program. Let's call this the bad subjectivity factor. With the bad subjectivity factor we generally choose inappropriate candidates for the training. So we must devise a way to make the trainee selection process objective. But if we think about subjectivity from the trainee's point of view, it may not necessarily be as bad. Confused? Maybe. Let me explain a little further. If I have a couple of colleagues who are my best buddies and who also incidentally share the same professional profile as I do, then their opinion about a specific training program will definitely matter to me, provided they have attended that specific training program. It is like going to a movie because your best friend recommends it to you after watching it himself/herself. It works because you share many common traits in your individual profiles. So in that respect a trainee should definitely value the opinions of former trainees of the same program if their profiles are more or less similar. But where should a trainee find such folks? Should the person simply poll the people within his network inside the organization? Maybe not. The reason is that these people may be in his network for various reasons and not just because they share the same profile. So arbitrary opinions may further pollute the perception of the trainee. To circumvent this problem, I will use the collaborative filtering technique to add the subjectivity back into the selection process. But now this subjectivity will be from the trainee's point of view and should further help the trainee decide whether s/he should attend the training. Let's call this the good subjectivity factor.

The design of the OSCS Model

To design the OSCS model, I came up with three steps.

1. The first step will take the bad subjectivity factor out of the trainee selection process.
2. The second step will make an objective candidate selection by using the results of the first step.
3. The third step will add the good subjectivity factor back into the selection process.

While building the OSCS Model we will have to perform these steps in a specific order. I am essentially making two hypotheses here, and to test them I will have to use the linear multidimensional regression technique to measure the statistical significance of my alternate hypothesis ($H_a$) in each case (step 1 above). After testing the two hypotheses, I will use the logistic regression technique to create a predictive model (step 2). This model will predict whether a certain employee, if selected, will be successful in a specific training program or not. Finally, we will add the good subjectivity factor back into the equation (step 3). The four techniques I will use are: sample size determination for a finite survey population, linear multidimensional regression, a logistic regression predictive model, and collaborative filtering.

1. Taking out the bad subjectivity factor from the trainee selection process.

The first null hypothesis I am making here is,

$H_0$: Every employee who attends a specific training always benefits from the training.
Therefore my alternate hypothesis will be,

$H_a$: Not all employees benefit from the training they receive.

To test the first hypothesis I will need to conduct a survey of a statistically significant sample of employees who have undergone some kind of training in the past. This sample can be chosen from a finite population. We call it a finite population because at any given time the organization will be able to identify all the employees who have attended any training. (Note: you can say this is a big assumption. Maybe so. If an organization is very large and does not have data for all the past trainees, we can still formulate an experiment and derive a sample size based on an infinite population.) The sample size will depend on the confidence level we want in the results of our survey. (Please see appendix 1 on how to calculate the sample size for a finite population [6].)

Upon identifying the random sample of trainees, we can survey them and find out the effectiveness of the past training for each trainee. We can reject the null hypothesis if our positive responses are below a certain cut-off level. This cut-off level may or may not already exist for a learning organization.

The second null hypothesis I am making here is,

$H_0$: Every single personal & professional attribute of the candidate is equally important in the selection process.

Therefore my alternate hypothesis will be,

$H_a$: There are certain personal & professional attributes that matter more than others.

To test this second hypothesis I will need to conduct a linear regression with multiple parameters. My approach will be to start with all the data attributes that the learning organization has captured so far about each candidate and then weed out the ones that don't have any statistical significance for the effective training reception of the candidate. We can start with personal parameters like age, gender, native language, 2nd, 3rd & 4th languages if spoken, domicile state, primary language of education, educational degrees, other training, etc. We can also start with professional parameters like number of years of experience, number of years in the last assignment, technical skill set, title, prior experience in the training subjects, etc.

The attribute elimination process will be based on the covariance of the attributes with the expected outcome, i.e. the success of the trainee in the program. During this process, we will eliminate many non-significant attributes (low covariance) that do not provide us any predictive power. We will also eliminate the attributes that do not affect (zero covariance) the desired outcome, i.e. success in training.

Thus at the end of this process we will come out with a set of attributes that are significant for unbiased candidate selection.

2. Make an objective candidate selection using the results of the first step.

After identifying the set of attributes that are necessary and significant for candidate selection, we will need to use logistic regression to come to an objective decision on whether a certain candidate should be selected for a specific training.

In order to achieve this we will need to build a predictive model using the logistic regression technique. To build such a model we will start with the list of significant attributes from step one. Let's just say we have identified a set of 10 attributes, namely $x_1, x_2, x_3, \dots, x_{10}$, in step one. To build the model we will once more have to use the past training data.

In this case we will have to create a dataset (called the training dataset) with balanced outcomes. By this I mean that our training dataset must contain equal proportions of favorable and unfavorable outcomes (passed/failed, successful/unsuccessful, satisfied/unsatisfied, etc.) along with proper distributions of the significant attributes for each candidate. Again, the training dataset size really matters here. To achieve a high degree of predictability, the training dataset must contain a reasonable amount of data covering most of the possible domain values for all the attributes involved. When the training dataset is not sufficiently large, predictive models tend to overfit the data [7]. Overfitting causes a definite problem where the model works very, very well on the training dataset but fails spectacularly on the test dataset (unseen data). We will need two other important datasets, called the holdout dataset and the test dataset [8]. The holdout dataset can be created from the same training dataset by randomly selecting 10% to 15% of the training data. These records are kept aside and are not used during model training. Instead, once the final model is built, the holdout dataset is used to create the confusion matrix and assess the predictability of the model. The test dataset is created from sample data that is gathered after the model has been built and whose outcome was not known when the model was under construction. How to create these datasets is beyond the scope of this first part of the paper. I will cover it briefly in the second paper.
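The holdout procedure described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the record layout (years of experience, success flag), the function names `split_holdout` and `confusion_matrix`, and the threshold rule standing in for a trained model are all hypothetical and are not taken from the paper.

```python
import random


def split_holdout(records, holdout_fraction=0.15, seed=42):
    """Randomly set aside a fraction of the records as a holdout set.

    Returns (training_records, holdout_records).
    """
    rng = random.Random(seed)
    shuffled = records[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_holdout = round(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]


def confusion_matrix(actual, predicted):
    """Count true/false positives and negatives for 0/1 outcomes."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p == 1:
            counts["tp" if a == 1 else "fp"] += 1
        else:
            counts["fn" if a == 1 else "tn"] += 1
    return counts


# Hypothetical past-training records with balanced outcomes:
# (years_of_experience, succeeded_in_training)
records = [(y, 1) for y in range(10, 20)] + [(y, 0) for y in range(0, 10)]
train, holdout = split_holdout(records, holdout_fraction=0.15)


def predict(years):
    """Stand-in for a fitted model (assumption): success if >= 10 years."""
    return 1 if years >= 10 else 0


actual = [outcome for _, outcome in holdout]
predicted = [predict(years) for years, _ in holdout]
matrix = confusion_matrix(actual, predicted)
accuracy = (matrix["tp"] + matrix["tn"]) / len(holdout)
```

In a real study the `predict` stand-in would be replaced by the fitted logistic regression model, and the holdout records would never be shown to the model during fitting.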
One might say we could very well have used the linear multidimensional regression model from the first step here, rather than building a new model using the same attributes. The following graph shows the output of a simple linear regression with just one predictor variable.

[Figure: output of a simple linear regression with one predictor variable]

The reason we can't use the equation we get from the first step (see the note below) is that the predicted values of y may cross the bounds of 0 and 1, depending on the values of the ten weights and the respective attribute values for a trainee. The best reason for using logistic regression is that the model will always return a value between 0 and 1, which can be read as the probability of a Bernoulli outcome that is either 0 or 1, where 1 can be mapped to success and 0 to failure, or vice versa.

Note: at the end of the linear multidimensional regression we will get an equation of the form $y = \alpha_0 \pm \alpha_1 x_1 \pm \alpha_2 x_2 \pm \dots \pm \alpha_{10} x_{10} \pm \varepsilon$, where $\alpha_1, \alpha_2, \dots$ are weights, $x_1, x_2, \dots$ are the attributes and $\varepsilon$ is the error term.

If

$$y = \alpha_0 \pm \alpha_1 x_1 \pm \alpha_2 x_2 \pm \dots \pm \alpha_{10} x_{10} \pm \varepsilon$$

from the linear multidimensional regression, then

$$p(Y = 1 \mid y) = \frac{e^{y}}{1 + e^{y}}$$

Thus we will basically receive a probability value for Y = 1 (or success) between zero and one, based on the given values of the attributes and their respective weights. The following figure depicts the output values of a logistic regression given one predictor input.

[Figure: output of a logistic regression with one predictor input]

Also, while creating the predictive model in the second step, we may or may not end up using all ten attributes. Using more variables may cause the model to overfit the data if the training dataset is not large enough to cover all the domain values. Thus by not using the equation from the first step, we may end up using fewer than 10 attributes and reduce the need for a huge training dataset.

Thus at the end of the second stage we will get a predictive model that tells us, based on its output, whether or not to send a particular trainee to the training.

3. Add the good subjectivity factor back into the selection process.

After identifying the trainee in the second step, we may invite the trainee to the actual training program. But we will have to give the trainee the chance to assess whether s/he would like to self select for this training. Here we will be adding the subjectivity back into the decision criteria, but from the trainee's point of view.

The first thing we can expect here from the trainee is "self selection". Self selection happens when a person applies his/her own decision criteria to a specific problem. This process is very complex and hard to quantify, but most of the time it delivers a correct assessment. We also call this "gut feel". Everyone has it, and its accuracy increases as one gets older and experiences various decision making situations.

Once the candidate uses the "self selection" process, half of the good subjectivity is added to the model by the trainee himself. But to add further value to the model, we can use another technique, called "collaborative filtering", to gauge the appropriateness of the training for the trainee as experienced by other trainees who have undergone the same training in the past. As mentioned above, this is like getting a recommendation from colleagues who you may or may not know but who share nearly the same profile as yours. Thus it is useful when you want to make predictions on preferences by considering all of a long training history.

Now let me walk you through an example of how collaborative filtering works.

Let's assume a trainee named Akshay has already gone through our first two steps and we arrived at the objective decision that Akshay should take the Hyperion training. We gave Akshay a chance to self select for the program. After much contemplation, Akshay thinks he should go for the training. At this time he may or may not have any reservations. But in any case, we tell Akshay: let's see how many people who are like Akshay and who have undergone similar training programs will recommend the training to Akshay.

The vote of Akshay for the Hyperion training will be equal to the average of the other trainees' votes for the Hyperion training.

But we need to take into account the following:
• Different people have different "standards".
• People more similar to Akshay should be given a higher weighting in predicting Akshay's preferences.

If a is the "active user" and c the "candidate training", then $\hat{v}_c^a$, the predicted vote of user a for candidate training c, can be given as [9]:

$$\hat{v}_c^a - v_*^a = \sum_{u \neq a} w(a,u)\,\left(v_c^u - v_*^u\right)$$

where $v_*^a$ is the mean vote of the active user a and $v_*^u$ is the mean vote of user u.

We choose an appropriate weight $w(a,u)$ (see the note below), where

$$w(a,u) = \frac{c(a,u)}{k_a}, \qquad k_a = \sum_{u \neq a} \left|c(a,u)\right|$$

Here $c(a,u)$ is the correlation between the votes of users a and u, and $k_a$ is the normalizing factor that makes the absolute weights sum to one [10].

Note: weights can be defined in many ways. I am going to use a simple correlation method here. The other two techniques I contemplate using are cosine similarity and Pearson correlation. Both formulae are given in appendix 2.

Putting all the formula elements together, we can further simplify the formula and write it as,

$$\hat{v}_c^a = v_*^a + \frac{\sum_{u \neq a} c(a,u)\,\left(v_c^u - v_*^u\right)}{\sum_{u \neq a} \left|c(a,u)\right|}$$

Thus if the predicted vote of the active user a for the candidate training c is greater than the average vote of all other users who have taken the training c and have rated that training above average (meaning they liked it and it proved useful to them), then we will give our "thumbs up" for the training c to the active user a (i.e. Akshay in our example).

This may sound confusing without an actual example. But the idea is very simple. You recommend the specific training to the trainee if you find that most other candidates with a similar profile have given their "thumbs up" to the training after taking it earlier. Another example I could give you is from a book-selling site. When you search for a specific book and are ready to buy it, the site recommends a couple of other books, saying "customers who bought this book also bought books a, b & c", etc.

In this part I of the paper, I am postulating my ideas of how we can use past training data in the organization, together with appropriate statistical analysis, to correctly identify the deserving candidates for a specific training using objective criteria.

In part II of this paper, I will actually either prove or disprove my postulation. The task will not be easy. I will have to uncover as much past training data as possible and deal with the data quality issues of the found data, if any. It is going to be an interesting endeavor and may last for more than a year. As you might know, building predictive models is very easy, but testing them with actual test datasets takes time, as the test data needs to come after the model is built.

So stay tuned…

APPENDIX 1

1. Statistical Theory for Sampling of Finite Population

Suppose

• The proxy for the total population is called the Sampling Frame (SF). The SF has N customers, where N is not large.
• The mean and variance of the quantity of interest (QI) across the SF are $m$ and $s^2$ respectively.
• We draw a simple random sample of $n^*$ customers.

The sample mean $\bar{x}$ has a probability distribution which has mean $m$ and variance

$$s^2\left(\frac{1}{n^*} - \frac{1}{N}\right)$$

The square root of the variance of the sampling distribution is called the standard error of the mean and is given as,

$$S = s\sqrt{\frac{1}{n^*} - \frac{1}{N}}$$

Key insight: a sample of $n^*$ out of $N$ has the same error as a sample of $n$ out of $\infty$ if:

$$s\sqrt{\frac{1}{n^*} - \frac{1}{N}} = s\sqrt{\frac{1}{n}}$$

$$\frac{1}{n^*} - \frac{1}{N} = \frac{1}{n}$$

$$\frac{1}{n^*} = \frac{1}{N} + \frac{1}{n}$$
$$n^* = \frac{nN}{n + N}$$

$$n^* = \frac{n}{1 + \frac{n}{N}}$$

The correction matters most when N is very small and n is very large, in which case $n^*$ falls well below $n$.

APPENDIX 2

1. Cosine Similarity [11]

The similarity measure can be based on the cosine of the angle between two feature vectors. This technique was primarily used in information retrieval for calculating the similarity between two documents, where documents were usually represented as vectors of word frequencies. In this context, weights can be defined as:

$$w(u_1, u_2) = \sum_{i \in \text{items}} \frac{v_{u_1,i}}{\sqrt{\sum_{k \in I_1} v_{u_1,k}^2}} \cdot \frac{v_{u_2,i}}{\sqrt{\sum_{k \in I_2} v_{u_2,k}^2}}$$

Here $I_1$ and $I_2$ appear to denote the sets of items rated by users $u_1$ and $u_2$ respectively.

2. Pearson Correlation [12]

Weights can be defined in terms of the Pearson correlation coefficient. Pearson correlation is also used in statistics to evaluate the degree of linear relationship between two variables. It ranges from -1 (a perfect negative relationship) to +1 (a perfect positive relationship), with 0 stating that there is no relationship whatsoever. The formula is as follows:

$$w(u_1, u_2) = \frac{\sum_{j \in \text{items}} \left(v_{u_1,j} - \bar{v}_{u_1}\right)\left(v_{u_2,j} - \bar{v}_{u_2}\right)}{\sqrt{\sum_{j \in \text{items}} \left(v_{u_1,j} - \bar{v}_{u_1}\right)^2 \sum_{j \in \text{items}} \left(v_{u_2,j} - \bar{v}_{u_2}\right)^2}}$$

REFERENCES
[1] Peter Senge, The Fifth Discipline.
[2] Timothy F. Bednarz, Ph.D., Maximizing Training Investment: The Executive Key to Achieving Results (e-book), page 7.
[3] David van Adelsberg & Edward A. Trolley, Running Training Like a Business.
[4] Timothy F. Bednarz, Ph.D., Maximizing Training Investment: The Executive Key to Achieving Results (e-book), page 6.
[5] Based on the comments made by the participants in the Learning Organization Forum at KPIT INFOSYSTEMS LTD.
[6] Russell V. Lenth, Department of Statistics, University of Iowa, Some Practical Guidelines for Effective Sample-Size Determination. Published on March 1, 2001.
[7] Michael J. A. Berry & Gordon S. Linoff, Data Mining Techniques, Second Edition, Wiley Publications, page 234.
[8] Michael J. A. Berry & Gordon S. Linoff, Data Mining Techniques, Second Edition, Wiley Publications, page 52.
[9] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, J. Riedl, GroupLens: An Open Architecture for Collaborative Filtering for Netnews. Proceedings of CSCW '94, 1994.
[10] Prof. Anand Bodapati, Anderson School of Management, UCLA, CA, USA. Extracted from class notes of MGMT 267, One-on-one Marketing. Collaborative filtering: Weight calculation. Spring 2006.
[11] Miha Grčar, User Profiling: Collaborative Filtering. Department of Knowledge Technologies, Jozef Stefan Institute, Jamova 39, 1000 Ljubljana, Slovenia.
[12] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, J. Riedl, GroupLens: An Open Architecture for Collaborative Filtering for Netnews. Proceedings of CSCW '94, 1994.
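The predicted-vote formula from step 3, combined with the Pearson weights of appendix 2, can be sketched as below. This is a minimal illustration, not the paper's implementation: the vote scale (1 to 5), the trainee names other than Akshay, the training names other than Hyperion, and the choice to compute the correlation only over trainings both users have rated are all assumptions made for the example.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def pearson(votes_a, votes_u):
    """Pearson correlation c(a, u) over trainings both users have rated.

    Restricting to co-rated trainings is an assumption of this sketch.
    """
    common = [t for t in votes_a if t in votes_u]
    if len(common) < 2:
        return 0.0
    mean_a = mean([votes_a[t] for t in common])
    mean_u = mean([votes_u[t] for t in common])
    num = sum((votes_a[t] - mean_a) * (votes_u[t] - mean_u) for t in common)
    den = sqrt(sum((votes_a[t] - mean_a) ** 2 for t in common)
               * sum((votes_u[t] - mean_u) ** 2 for t in common))
    return num / den if den else 0.0


def predict_vote(active, candidate, votes):
    """Active user's mean vote plus the normalized, correlation-weighted
    deviations of the other users' votes for the candidate training."""
    mean_active = mean(list(votes[active].values()))
    weighted_sum = 0.0
    norm = 0.0  # k_a: sum of absolute correlations
    for user, user_votes in votes.items():
        if user == active or candidate not in user_votes:
            continue
        c = pearson(votes[active], user_votes)
        weighted_sum += c * (user_votes[candidate] - mean(list(user_votes.values())))
        norm += abs(c)
    if norm == 0:
        return mean_active  # no usable neighbours: fall back to own mean
    return mean_active + weighted_sum / norm


# Hypothetical past votes on a 1-5 scale.
votes = {
    "Akshay": {"SQL": 4, "Java": 2, "Cognos": 5},
    "Meera": {"SQL": 5, "Java": 3, "Cognos": 5, "Hyperion": 5},
    "Rohan": {"SQL": 2, "Java": 5, "Cognos": 1, "Hyperion": 2},
}

predicted = predict_vote("Akshay", "Hyperion", votes)
# One possible "thumbs up" rule (an assumption): recommend when the
# predicted vote exceeds the active user's own mean vote.
recommend = predicted > mean(list(votes["Akshay"].values()))
```

Note how a trainee whose tastes run opposite to Akshay's (negative correlation) and who rated Hyperion below his own average still pushes the prediction upward, which is exactly what the signed weights in the formula imply.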