Amy Langville, Associate Professor of Mathematics, The College of Charleston in South Carolina at MLconf ATL

My talk will cover four ranking and clustering projects that I consulted on this past year. The projects range from ranking Olympic athletes, mixed martial arts fighters, and cell phone carriers to clustering sentences to rank individuals by how much humility they evidence in their written language. For each project, I will address the particular data challenges and the solutions and techniques we proposed.

Published in: Technology

  1. 1. 4 Consulting Projects from this past year. September 19, 2014. Machine Learning 2014. Amy Langville, Mathematics Department, College of Charleston, langvillea@cofc.edu
  2. 2. 4 Consulting Projects from this past year. Tyler Perini, Mathematics Department, College of Charleston, perinita@g.cofc.edu. Amy Langville, Mathematics Department, College of Charleston, langvillea@cofc.edu
  3. 3. 4 Consulting Projects from this past year. Tyler Perini, Mathematics Department, College of Charleston, perinita@g.cofc.edu. Amy Langville, Mathematics Department, College of Charleston, langvillea@cofc.edu
  4. 4. Outline: 2 Books generate questions; US Olympic Projects; CageRank; Ranking Cell Phone Carriers; The Humility Project
  5. 5. 2 Books generate questions. 1232-1315
  6. 6. 2 Books generate questions. 1232-1315. “Chapter 7 talks about . . . but I need to . . . Any advice?”
  7. 7. 2 Books generate questions. 1232-1315. “Chapter 7 talks about . . . but I need to . . . Any advice?” “I really enjoyed your book, but my problem is . . ., which you don’t mention. How do I solve it?”
  8. 8. Project 1: from U.S. Olympic Committee
  9. 9. Project 1: from U.S. Olympic Committee. Problem 1: “Your book talks a lot about ranking in head-to-head contests (and that was helpful), but we need to rank multi-competitor sports like downhill skiing and gymnastics.”
  10. 10. Project 1: from U.S. Olympic Committee. Problem 1: “Your book talks a lot about ranking in head-to-head contests (and that was helpful), but we need to rank multi-competitor sports like downhill skiing and gymnastics.” Solution 1: μ = average skill, σ = uncertainty
  11. 11. Project 1: from U.S. Olympic Committee. Solution 1 (to Problem 1): TrueSkill
  12. 12. Project 1: from U.S. Olympic Committee. 1st 3rd 2nd
  13. 13. Project 1: from U.S. Olympic Committee. 1st 3rd 2nd
  14. 14. Project 1: from U.S. Olympic Committee. 2nd 3rd 1st
  15. 15. Project 1: from U.S. Olympic Committee. Problem 2: “Your book talks a lot about ranking in head-to-head contests where there are multiple matches between competitors, but our data is sparse. Any advice?”
  16. 16. Project 2: CageRank. Problem: “You talk a lot about ranking head-to-head contests, like ours [MMA fights], but our data is really sparse. How do we deal with that?”
  17. 17. Project 2: CageRank. Problem: “You talk a lot about ranking head-to-head contests, like ours [MMA fights], but our data is really sparse. How do we deal with that?” Solution: densify the graph
  18. 18. UFC 163: Phil Davis, Lyoto Machida
  19. 19. UFC 163: Phil Davis and Lyoto Machida had never fought each other
  20. 20. College football vs. UFC
  21. 21. UFC 163: Phil Davis, Lyoto Machida. Find the 10 most similar fighters to each. Similar by? FightMetric stats.

          Rank  Phil Davis          Lyoto Machida
          1     Ricardo Arona       Rashad Evans
          2     Jason Brilz         Ryan Bader
          3     Ryan Bader          Alexander Gustafsson
          4     Stephan Bonnar      Antonio Rogerio Nogueira
          5     Randy Couture       Quinton “Rampage” Jackson
          6     Trevor Prangley     Chael Sonnen
          7     Tito Ortiz          Matt Hamill
          8     Mark Coleman        James Te-Huna
          9     Ovince St. Preux    Dan Henderson
          10    Chael Sonnen        Vladimir Matyushenko

  22. 22. UFC 163: Phil Davis, Lyoto Machida (same table of 10 most similar fighters as slide 21). 6
  23. 23. UFC 163: 1 2. Phil Davis, Lyoto Machida (same table of 10 most similar fighters as slide 21). 6. Question: is the goal to predict the winner or generate buzz?
  24. 24. Project 3: Ranking Cell Phone Carriers. Problem: “Rather than individual games between carriers, we have a distribution of game scores for each carrier. How do we use this data to rank carriers?”
  25. 25. Project 3: Ranking Cell Phone Carriers. Problem: “Rather than individual games between carriers, we have a distribution of game scores for each carrier. How do we use this data to rank carriers?” Solution: simulate head-to-head games by random draws from data, then rank aggregate by Borda count (#carriers each carrier outranks).
  26. 26. Project 3: Ranking Cell Phone Carriers. Problem: “Rather than individual games between carriers, we have a distribution of game scores for each carrier. How do we use this data to rank carriers?” Solution: simulate head-to-head games by random draws from data, then rank aggregate by Borda count (#carriers each carrier outranks). New Problem: data is loaded with ties!
  27. 27. Project 3: Ranking Cell Phone Carriers. Question: what makes a model good? Stability in the face of small data changes; explainability to the public.
  28. 28. Project 4: Humility Project. Problem: “We’re trying to analyze a person’s writing to predict his/her humility, but we lost our data guy. Can you help us?”
  29. 29. Project 4: Humility Project. Problem: “We’re trying to analyze a person’s writing to predict his/her humility, but we lost our data guy. Can you help us?” Solution: nonnegative matrix factorization (NMF) to find hidden clusters in text.
  30. 30. Project 4: Humility Project
  31. 31. Project 4: Humility Project
  32. 32. Project 4: Humility Project
  33. 33. Project 4: Humility Project
  34. 34. Project 4: Humility Project
  35. 35. Conclusions: We need you. You open our eyes to problems we never would have thought about. Iterative collaboration. Many exist. Some just need tweaking.
  36. 36. Conclusions: We need you. You open our eyes to problems we never would have thought about. Iterative collaboration. Many exist. Some just need tweaking. Future Work . . . (you tell me)
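
The Project 3 solution on slides 25 and 26 (simulate head-to-head games by random draws from each carrier's scores, then aggregate with a Borda count) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code used in the consulting project: the function name `borda_from_draws`, the carrier names, the sample scores, the "higher score is better" convention, and the choice to split a tied game as half a point each are all hypothetical.

```python
import random


def borda_from_draws(scores, n_rounds=1000, seed=0):
    """Rank carriers by a Borda count over simulated rounds.

    scores: dict mapping carrier name -> list of observed scores
            (assumption: a higher score is better).
    In each round one score is drawn at random for every carrier; a carrier
    earns 1 point per carrier it outranks in that round and, as an assumed
    way of handling ties, 0.5 points per carrier it ties with.
    """
    rng = random.Random(seed)
    carriers = sorted(scores)
    points = {c: 0.0 for c in carriers}
    for _ in range(n_rounds):
        # One simulated "game day": a random draw from each carrier's data.
        draw = {c: rng.choice(scores[c]) for c in carriers}
        for a in carriers:
            for b in carriers:
                if a == b:
                    continue
                if draw[a] > draw[b]:
                    points[a] += 1.0
                elif draw[a] == draw[b]:
                    points[a] += 0.5
    # Highest total Borda points first; names break exact ties in points.
    ranking = sorted(carriers, key=lambda c: (-points[c], c))
    return ranking, points


if __name__ == "__main__":
    # Hypothetical data, invented purely for illustration.
    sample = {
        "CarrierA": [7, 8, 8, 9],
        "CarrierB": [6, 8, 8, 7],
        "CarrierC": [3, 4, 8, 5],
    }
    ranking, points = borda_from_draws(sample, n_rounds=500, seed=42)
    print(ranking)
    print(points)
```

Because the draws are random, the totals shift with the seed and the number of rounds; rerunning with several seeds is one simple way to probe the "stability in the face of small data changes" question raised on slide 27.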
