Lesson 5: Predictability of Events (data mining) (M30P)


  1. Alberta Ingenuity & CMASTE
     Lesson 5: Predictability of Events

     Purpose: To see whether the predictability of events can be improved by having access to a larger data set.

     Problem: Consider a 64-team, single-knockout basketball tournament, and the probability of correctly predicting the winner of the final as well as the winners of all 63 games.

     Hypothesis: Forecasts made by prediction groups tend to be more accurate than those made by individuals.

     Prediction: The machine learning strategy of data mining will improve the success rate of picking game winners by at least 20%.

     Design: This model is similar, but not identical, to the NCAA championship; however, we will use the NCAA tournament as the model. (The 2008 tournament format with first-round teams is included.) Since there are 64 teams in a single-knockout tournament, the numbers of games played in each round form the geometric series $32 + 16 + 8 + 4 + 2 + 1$, with a sum of 63. If all teams are considered equal, the probability of picking the correct final-game winner is $\frac{1}{64}$, and the probability of picking the winner of every game is the very small value $\left(\frac{1}{2}\right)^{63}$, or about $\frac{1}{9.22 \times 10^{18}}$. Do you think your predictions could be improved if you were given some additional information? Specifically, you will receive the predictions of all the other participants, as percentages.

     The number of different predictions possible for the entire tournament is $2^{63}$, which is substantially more than the number of people on the planet. Still, if this contest were web-based, the number of prediction sheets to be analyzed would be very large. This is where the machine learning technique of data mining can be used to analyze the picks of all participants. Data mining is the process of sorting through large amounts of data using computer-designed queries to find relevant information and/or patterns. Since we are only using a classroom of students, this process can be done by hand.

     Once the students' first prediction sheets are handed in, the teacher will compile the data and give the results to each participant to look over. Each student will then fill out a second, identical form with their new picks; they may change their selections or not, based on the group predictions. After the tournament is complete, we want to compare the success rate of making your picks alone with the success rate of making your picks with the extra information from all the other participants' selections.

     Materials: One week before the tournament begins, students are given two tournament sheets with all competing teams paired to show the first round of 32 games. (Note: the sheets do not show the slots for the 2 semifinal games and the final game, but these can easily be fitted into the centre area of the sheet.)

     AICML5PredictabilityofEvents Centre for Machine Learning 1/9
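     The counts and probabilities quoted in the Design section can be checked with a few lines of Python. This is only a minimal sketch under the lesson's own simplifying assumption that every team is equally likely to win each game; the variable names are illustrative and not part of the lesson.

```python
from fractions import Fraction

TEAMS = 64

# Games per round in a single-knockout bracket: each round halves the field.
games_per_round = []
remaining = TEAMS
while remaining > 1:
    remaining //= 2
    games_per_round.append(remaining)

total_games = sum(games_per_round)  # the geometric series 32 + 16 + ... + 1

# Assumption taken from the lesson: all teams are considered equal.
p_champion = Fraction(1, TEAMS)                    # picking the tournament winner
p_perfect_bracket = Fraction(1, 2) ** total_games  # picking every game correctly

print("games per round:", games_per_round)
print("total games:", total_games)
print("P(correct champion):", p_champion)
print("P(perfect bracket):", float(p_perfect_bracket))
print(f"number of possible brackets: {2 ** total_games:.3e}")
```

     Exact fractions are used so that the tiny probability of a perfect bracket is not rounded until it is printed.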
  2. Procedure:
     1) Students may do any research they want on team strengths, rankings, etc., but must hand in their completed first selection sheet 3 days before the tournament begins.
     2) The teacher will compile all the prediction data from the first selection sheets and distribute this information to the students. (The “data mining” compilation sheets are included.) An appropriate summary would be the percentage of students choosing each team to win at each bracket, all the way to the final game.
     3) Students then use this data to fill out the second prediction sheet and hand it in the day before the tournament begins.

     Evidence: After the tournament has concluded, the teacher will count the number of correct predictions for each student on both the first and second prediction sheets and record these at the bottom of the sheets.

     Analysis:
     1) The teacher then compares the success rates on sheet 1 with the success rates on sheet 2, student by student.
     2) The teacher finds how many students had their rate of success in predictions from sheet 1 to sheet 2 a) increase, b) decrease, or c) stay the same. The teacher also finds the total change for the class and the percent change for the class as a whole (± %).

     Evaluation: The teacher then decides whether aggregating opinions was more effective than individual forecasting for this group of students, and whether collaborative problem solving produced better results.

     Synthesis: This tournament is followed by millions in North America and in other parts of the world. If this activity were done on a scale where a very large number of participants took part, the amount of data analysis would be onerous, but this is the nature of true data mining. Using computer-designed queries, the predictions of the large group can readily be made available to each individual. This idea can also be extended beyond sports predictions to areas such as the timely and effective use of medical records, better prediction of products that will be in demand, future fashion trends, forecasting stock market trends, etc.

     Sources:
     1. Chen, Yiling, et al., “Predicting the Future with Basketball Bets”, 2008, http://research.yahoo.com/node/1898
     2. Mladenic, Dunja, et al., Data Mining and Decision Support, pp. 3–8
     3. www.SampleWords.com, 64 Slot Regional Layout Tournament Brackets
     4. http://en.wikipedia.org/wiki/Data_mining
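     The Evidence step, counting the correct picks on each sheet, can be sketched in Python. The representation here is an assumption and not part of the lesson: each sheet is a dictionary mapping a game label to the team picked, and the game labels, team names, and the helper `count_correct` are all hypothetical.

```python
def count_correct(picks, results):
    """Return how many games on a prediction sheet match the actual winners."""
    return sum(1 for game, winner in results.items() if picks.get(game) == winner)


# Hypothetical three-game excerpt; a full sheet would have 63 entries.
results = {"R1-G1": "Team A", "R1-G2": "Team D", "R2-G1": "Team A"}
sheet1 = {"R1-G1": "Team A", "R1-G2": "Team C", "R2-G1": "Team C"}
sheet2 = {"R1-G1": "Team A", "R1-G2": "Team D", "R2-G1": "Team D"}

for name, sheet in (("sheet 1", sheet1), ("sheet 2", sheet2)):
    correct = count_correct(sheet, results)
    pct = 100 * correct / len(results)
    print(f"{name}: {correct}/{len(results)} correct = {pct:.1f}%")
```

     A game left blank on a sheet simply counts as incorrect, which mirrors how a hand count out of 63 would treat it.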
  3. Here is the tournament format from 2008 with first-round teams shown:

     The next pages are:
     pg 4) the first blank copy of the 64-team tournament format, to be completed by each individual student and handed in several days before the tournament,
     pg 5-7) the teacher “data mining” compilation sheets,
     pg 8) the second blank copy of the 64-team tournament format, to be completed after the data from the first sheets has been compiled and distributed to all students, and
     pg 9) the final evaluation sheet for the teacher.
  4. Sheet #1
     Due Date ________________    Name _____________________

     Results: Number Correct / 63 games = ______ / 63 = ______ %
  5. Teacher “Data Mining” Compilation Sheets

     Round 1           Round 2           Round 3           Semi Final        Final
     ___________ ___%  __________ ___%   __________ ___%   __________ ___%   ___________ ___%
     [blank rows repeat down the page]
  6. Round 1           Round 2           Round 3           Semi Final        Final
     ___________ ___%  __________ ___%   __________ ___%   __________ ___%   ___________ ___%
     [blank rows repeat down the page]
  7. Round 1           Round 2           Round 3           Semi Final        Final
     ___________ ___%  __________ ___%   __________ ___%   __________ ___%   ___________ ___%
     [blank rows repeat down the page]

     Teacher Note: These sheets allow all 64 teams to be accounted for at each stage of the tournament, but many of the blanks should not be needed, since many teams will not be selected by any student as the tournament progresses.
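     The compilation the teacher does by hand on these sheets amounts to a frequency count. A minimal Python sketch follows, assuming each student's sheet is stored as a dictionary from round name to the list of teams that student picked to win in that round. The data layout, the team names, and the function `compile_picks` are illustrative assumptions, not part of the lesson.

```python
from collections import Counter


def compile_picks(sheets):
    """For each round, return the percentage of students picking each team to win."""
    n_students = len(sheets)
    counts = {}
    for sheet in sheets:
        for round_name, teams in sheet.items():
            counts.setdefault(round_name, Counter()).update(teams)
    return {
        round_name: {team: 100 * n / n_students for team, n in counter.items()}
        for round_name, counter in counts.items()
    }


# Hypothetical class of four students, showing only the last two rounds.
sheets = [
    {"Semi Final": ["Team A", "Team B"], "Final": ["Team A"]},
    {"Semi Final": ["Team A", "Team C"], "Final": ["Team C"]},
    {"Semi Final": ["Team A", "Team B"], "Final": ["Team A"]},
    {"Semi Final": ["Team D", "Team B"], "Final": ["Team B"]},
]

compiled = compile_picks(sheets)
for round_name, shares in compiled.items():
    for team, pct in sorted(shares.items(), key=lambda item: -item[1]):
        print(f"{round_name:<10} {team:<8} {pct:5.1f}%")
```

     Teams that no student picked in a round never appear in the output, which corresponds to the unused blanks mentioned in the Teacher Note.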
  8. Sheet #2
     Due Date ________________    Name ____________________

     Results: Number Correct / 63 games = ______ / 63 = ______ %
  9. Final Evaluation Sheet

     Number of students in activity _________
     Total correct predictions on sheet 1 _________
     Total correct predictions on sheet 2 _________
     Net change from sheet 1 to sheet 2 (±) __________
     Percent change from sheet 1 to sheet 2 ________%
     Number of students who increased their success rate from sheet 1 to sheet 2 __________
     Number of students who decreased their success rate from sheet 1 to sheet 2 __________

     Conclusion (individual predictions vs. group predictions)
     _______________________________________________________________________
     _______________________________________________________________________
     _______________________________________________________________________
     _______________________________________________________________________
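     The entries on the final evaluation sheet can be computed from the per-student scores. The Python sketch below is one possible reading, not the lesson's prescribed method: it assumes the percent change is measured relative to the sheet 1 class total (the lesson does not state the base), and the scores and the function `evaluate` are hypothetical.

```python
def evaluate(sheet1_scores, sheet2_scores):
    """Summarise the class-level change between the two prediction sheets."""
    total1, total2 = sum(sheet1_scores), sum(sheet2_scores)
    changes = [after - before for before, after in zip(sheet1_scores, sheet2_scores)]
    return {
        "students": len(sheet1_scores),
        "total_sheet1": total1,
        "total_sheet2": total2,
        "net_change": total2 - total1,
        # Assumption: percent change is relative to the sheet 1 total,
        # which must be non-zero for this division to work.
        "percent_change": 100 * (total2 - total1) / total1,
        "increased": sum(1 for c in changes if c > 0),
        "decreased": sum(1 for c in changes if c < 0),
        "unchanged": sum(1 for c in changes if c == 0),
    }


# Hypothetical scores out of 63 for five students, in the same student order.
sheet1_scores = [40, 38, 45, 36, 41]
sheet2_scores = [43, 38, 44, 40, 45]

summary = evaluate(sheet1_scores, sheet2_scores)
for key, value in summary.items():
    print(f"{key}: {value}")
```

     With real class data, the printed summary could be compared against the lesson's Prediction of an improvement of at least 20%.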
