MongoDB: Queries and Aggregation Framework with NBA Game Data

  1. MongoDB Queries and Aggregation
     Valeri Karpov, Kernel Tools Engineer, MongoDB
     www.thecodebarbarian.com, github.com/vkarpov15, @code_barbarian
  2. Introducing an Awesome Data Set
     • Scraped basketball-reference.com
     • Mad props to the npm module Cheerio
     • Box scores for all 31,686 NBA games since 1985
     • Download: http://bit.ly/1jlgs9u via S3
     • Untar and run mongorestore
  3. Data Set Structure
     • Contains the final score
     • Contains box scores for teams and players
  4. Data Set Structure - High Level
     • Contains _id and date
     • Info on the winning team and the losing team
  5. Data Set Structure - Box
     • The box score contains detailed stats by team
  6. Data Set Structure - Box
     • And also by individual player
  7. Queries and Aggregation
     • MongoDB has a rich query framework
     • The aggregation framework is like SQL's GROUP BY
  8. Query Basics - findOne()
     • When was Kobe Bryant's 81-point game?
  9. Query Basics - find()
     • Which teams have lost despite scoring more than 150 points?
  10. Query Basics - count()
      • How many games did the Lakers win in the 1999-2000 season?
  11. Query Basics - distinct()
      • Which teams have lost a game despite having a player make at least 10 three-pointers?
  12. Query Basics - $elemMatch operator
      • When did Michael Jordan score 60 points in a losing effort?
  13. Query Basics - $elemMatch operator (continued)
  14. Query Basics - .sort() and .limit()
      • What are the 5 highest point totals for a losing team?
  15. Query Basics - .sort() and .limit() (continued)
  16. Aggregation
      • Similar to SQL's GROUP BY
      • Filters and transforms documents in a pipeline of stages
      • Stages are chainable
      • Accessible via the .aggregate() function in the shell
  17. Aggregation - Lakers Season PPG
      • How many points did the Lakers average in games they won in the 2008-2009 season?
  18. Aggregation - Lakers Season PPG (continued)
  19. Aggregation - $sort and $limit
      • Compute the 5 teams with the best records in the 1999-2000 season
  20. Aggregation - $sort and $limit (continued)
  21. Aggregation - $sort and $limit (continued)
  22. Aggregation - $unwind
      • Random statistic: the player with the highest scoring average in games their team lost
  23. Aggregation - $unwind (continued)
  24. Aggregation - Fun With Steals
      • How often does a team win when it records more steals than the other team?
  25. Aggregation - Fun With Steals (continued)
  26. Aggregation - Fun With Steals (continued)
  27. Thanks for Listening!
      Slides on Twitter: @code_barbarian
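Slides 3 to 6 describe the document layout only at a high level (final score, winner and loser, box scores by team and by player). As an aid for writing queries against this data, one plausible shape is written down here as Python `TypedDict`s. Every field name (`teams`, `name`, `score`, `won`, `box`, `team`, `players`, `player`, `pts`, `fg3`, `stl`) is an assumption chosen for illustration, not confirmed against the actual dump, so inspect a real document with `findOne()` before relying on them.

```python
from datetime import datetime
from typing import Any, List, TypedDict


class TeamSummary(TypedDict):
    """One entry of the assumed top-level `teams` array (final result)."""
    name: str    # e.g. "Lakers"
    score: int   # final point total
    won: int     # assumed flag: 1 for the winner, 0 for the loser


class TeamTotals(TypedDict):
    """Assumed team-level box score stats (only the fields used here)."""
    pts: int
    stl: int     # steals


class PlayerLine(TypedDict):
    """Assumed per-player box score line (only the fields used here)."""
    player: str  # e.g. "Michael Jordan"
    pts: int
    fg3: int     # three-pointers made


class BoxScore(TypedDict):
    """One entry of the assumed `box` array: one entry per team."""
    won: int
    team: TeamTotals
    players: List[PlayerLine]


class Game(TypedDict):
    """Assumed shape of one document in a hypothetical `games` collection."""
    _id: Any     # an ObjectId in the real collection
    date: datetime
    teams: List[TeamSummary]
    box: List[BoxScore]
```

Keeping the team summary (`teams`) separate from the detailed stats (`box`) mirrors the slides' split between the high-level result and the box score.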
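Slide 8's question maps onto a single `findOne()` call. Here is a minimal PyMongo-style sketch, assuming a collection named `games` and the hypothetical field names `box`, `players`, `player`, and `pts`. In the shell the same call would read `db.games.findOne({ "box.players.player": "Kobe Bryant", "box.players.pts": 81 }, { date: 1 })`. The two dotted conditions can be satisfied by different players in the same game, so this relies on 81 being an unusual total. Slide 12's `$elemMatch` operator is the way to tie both conditions to one player.

```python
# Assumed field names (not confirmed by the slides): each game has a `box`
# array with one entry per team, and each entry has a `players` array of
# {"player": <name>, "pts": <points>, ...} lines.
KOBE_81_FILTER = {
    "box.players.player": "Kobe Bryant",
    "box.players.pts": 81,
}

# Projection: return only `date` (MongoDB also returns `_id` by default).
DATE_ONLY = {"date": 1}


def kobe_81_point_game_date(games):
    """Return the date of the first matching game, or None if none matches.

    `games` is expected to behave like a pymongo Collection.
    """
    doc = games.find_one(KOBE_81_FILTER, projection=DATE_ONLY)
    return None if doc is None else doc["date"]
```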
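Slide 9 (losing teams that still scored more than 150) is a `find()` with a condition on one array element. Here is a minimal sketch, assuming a hypothetical `teams` array whose entries carry `name`, `score`, and a `won` flag (1 or 0), and a `games` collection accessed through PyMongo.

```python
# Assumed schema: `teams` is an array of {"name", "score", "won"}, with
# won == 0 for the losing side. $elemMatch makes both conditions apply to
# the SAME team. {"teams.won": 0, "teams.score": {"$gt": 150}} would also
# match games where only the *winner* scored over 150, because every game
# has some team with won == 0.
LOST_OVER_150_FILTER = {
    "teams": {"$elemMatch": {"won": 0, "score": {"$gt": 150}}}
}


def losing_teams_over_150(games):
    """Yield (date, team name, score) for each qualifying loss.

    `games` is expected to behave like a pymongo Collection.
    """
    cursor = games.find(LOST_OVER_150_FILTER, projection={"date": 1, "teams": 1})
    for game in cursor:
        for team in game["teams"]:
            if team["won"] == 0:
                yield game["date"], team["name"], team["score"]
```

The server does the filtering; the loop only picks the losing entry out of each returned game.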
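Slide 10 counts documents instead of returning them. This sketch makes the same kind of assumptions (hypothetical `date`, `teams.name`, and `teams.won` fields), plus one more: that an August-to-August date window captures exactly one season. If the collection also stores playoff games, they are counted too. PyMongo 4 removed `Cursor.count()`, so the sketch uses `Collection.count_documents()`.

```python
from datetime import datetime

# Assumption: a season runs from August 1 to the following August 1, which
# brackets the whole regular season and playoffs.
SEASON_START = datetime(1999, 8, 1)
SEASON_END = datetime(2000, 8, 1)

LAKERS_WINS_1999_2000 = {
    "date": {"$gte": SEASON_START, "$lt": SEASON_END},
    # The same `teams` entry must be named "Lakers" AND have won == 1.
    "teams": {"$elemMatch": {"name": "Lakers", "won": 1}},
}


def count_lakers_wins(games):
    """`games` is expected to behave like a pymongo Collection."""
    # count_documents() is PyMongo's counterpart to the shell's count().
    return games.count_documents(LAKERS_WINS_1999_2000)
```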
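Slide 11 uses `distinct()`. One subtlety is worth spelling out: `distinct()` on an array path such as `teams.name` reports every value from each matching game, so it would list the winners alongside the losers. This sketch therefore keeps the filter on the server but builds the set of losing teams in Python. The field names (`box`, `won`, `players`, `fg3`, `teams`, `name`) are assumptions, and so is the rule that exactly one `teams` entry per game has `won == 0`.

```python
# Assumed schema: `box` has one entry per team with a `won` flag and a
# `players` array; each player line has `fg3` (three-pointers made).
# Nested $elemMatch: a losing box entry containing a player with >= 10 threes.
TEN_THREES_IN_LOSS = {
    "box": {"$elemMatch": {
        "won": 0,
        "players": {"$elemMatch": {"fg3": {"$gte": 10}}},
    }}
}


def losing_team_name(game):
    """Name of the team flagged won == 0 in the assumed `teams` array."""
    return next(t["name"] for t in game["teams"] if t["won"] == 0)


def teams_losing_with_ten_threes(games):
    """Sorted, de-duplicated losing team names, built client-side.

    games.distinct("teams.name", TEN_THREES_IN_LOSS) would also return the
    winners, because distinct reports every value of an array field in each
    matching document. `games` is expected to behave like a pymongo Collection.
    """
    cursor = games.find(TEN_THREES_IN_LOSS, projection={"teams": 1})
    return sorted({losing_team_name(game) for game in cursor})
```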
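Slides 12 and 13 introduce `$elemMatch`. This sketch contrasts an `$elemMatch` filter with the looser dotted-path filter and gives a plain-Python reading of each, so the difference can be checked without a server. It assumes the hypothetical `box`, `won`, `players`, `player`, and `pts` fields and reads "score 60 points" as "at least 60". The example game uses made-up numbers.

```python
# Assumed schema: `box` array with `won` flags and `players` arrays of
# {"player", "pts"} lines. "Scored 60" is read as "scored at least 60".
JORDAN_60_IN_LOSS = {
    "box": {"$elemMatch": {
        "won": 0,
        "players": {"$elemMatch": {"player": "Michael Jordan", "pts": {"$gte": 60}}},
    }}
}

# The tempting dotted-path version is too loose: each condition may be met by
# a different box entry or player, e.g. the losing side's won == 0 plus
# Jordan's 60+ points for the winning side.
TOO_LOOSE = {
    "box.won": 0,
    "box.players.player": "Michael Jordan",
    "box.players.pts": {"$gte": 60},
}


def matches_elem_match(game):
    """Plain-Python reading of JORDAN_60_IN_LOSS for one in-memory game."""
    return any(
        box["won"] == 0
        and any(p["player"] == "Michael Jordan" and p["pts"] >= 60 for p in box["players"])
        for box in game["box"]
    )


def matches_too_loose(game):
    """Plain-Python reading of TOO_LOOSE: conditions checked independently."""
    boxes = game["box"]
    players = [p for box in boxes for p in box["players"]]
    return (
        any(box["won"] == 0 for box in boxes)
        and any(p["player"] == "Michael Jordan" for p in players)
        and any(p["pts"] >= 60 for p in players)
    )


# Hypothetical in-memory game (made-up numbers): Jordan's side wins.
example_win = {
    "box": [
        {"won": 1, "players": [{"player": "Michael Jordan", "pts": 61}]},
        {"won": 0, "players": [{"player": "Someone Else", "pts": 20}]},
    ]
}
print(matches_elem_match(example_win), matches_too_loose(example_win))  # False True
```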
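Slides 14 and 15 use cursor `.sort()` and `.limit()`. One wrinkle: in current MongoDB versions, a descending sort on an array field compares each document's largest element, so sorting games by `teams.score` ranks them by the winner's score. This sketch swaps in a short aggregation pipeline that unwinds `teams` first, and adds an in-memory equivalent built on `heapq.nlargest`. The `teams`, `name`, `score`, `won`, and `date` field names are assumptions.

```python
import heapq

# Assumed schema: `teams` array of {"name", "score", "won"}.
# Unwinding gives one document per team, which can then be filtered to the
# losing side, sorted by score, and limited.
TOP_LOSING_SCORES = [
    {"$unwind": "$teams"},
    {"$match": {"teams.won": 0}},
    {"$sort": {"teams.score": -1}},
    {"$limit": 5},
    {"$project": {"_id": 0, "date": 1, "team": "$teams.name", "score": "$teams.score"}},
]


def top_losing_scores_from_db(games):
    """`games` is expected to behave like a pymongo Collection."""
    return list(games.aggregate(TOP_LOSING_SCORES))


def top_losing_scores(games, n=5):
    """In-memory equivalent over an iterable of game dicts."""
    losers = (
        {"date": g["date"], "team": t["name"], "score": t["score"]}
        for g in games
        for t in g["teams"]
        if t["won"] == 0
    )
    return heapq.nlargest(n, losers, key=lambda row: row["score"])
```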
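Slides 17 and 18 answer the Lakers question with `.aggregate()`. Here is a minimal pipeline sketch, assuming the hypothetical `date` and `teams` (`name`, `score`, `won`) fields and an August-to-August season window.

```python
from datetime import datetime

LAKERS_WIN_PPG_2008_09 = [
    # Restrict to the season first (assumed Aug 1 to Aug 1 window), which
    # keeps the later stages small.
    {"$match": {"date": {"$gte": datetime(2008, 8, 1), "$lt": datetime(2009, 8, 1)}}},
    # One document per team per game.
    {"$unwind": "$teams"},
    # Keep only the Lakers' entries from games they won.
    {"$match": {"teams.name": "Lakers", "teams.won": 1}},
    # Average their final score and count the wins it is based on.
    {"$group": {"_id": "$teams.name", "avg_pts": {"$avg": "$teams.score"}, "wins": {"$sum": 1}}},
]


def lakers_win_ppg(games):
    """`games` is expected to behave like a pymongo Collection."""
    results = list(games.aggregate(LAKERS_WIN_PPG_2008_09))
    return results[0]["avg_pts"] if results else None
```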
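Slides 19 to 21 rank teams by record with `$sort` and `$limit`. This sketch again assumes hypothetical `date` and `teams` (`name`, `won`) fields and an August-to-August season window. It also assumes `won` is stored as 1 or 0, so summing it counts wins. Playoff games are included if the collection stores them.

```python
from datetime import datetime

BEST_RECORDS_1999_2000 = [
    {"$match": {"date": {"$gte": datetime(1999, 8, 1), "$lt": datetime(2000, 8, 1)}}},
    # One document per team per game.
    {"$unwind": "$teams"},
    {"$group": {
        "_id": "$teams.name",
        "wins": {"$sum": "$teams.won"},  # won is assumed to be 1 or 0
        "games": {"$sum": 1},
    }},
    {"$project": {"wins": 1, "losses": {"$subtract": ["$games", "$wins"]}}},
    # Ties are broken alphabetically so the top 5 is deterministic.
    {"$sort": {"wins": -1, "_id": 1}},
    {"$limit": 5},
]


def best_records(games):
    """`games` is expected to behave like a pymongo Collection."""
    return list(games.aggregate(BEST_RECORDS_1999_2000))
```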
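Slides 22 and 23 use `$unwind` to turn an array into one document per element. This sketch assumes hypothetical `box` (`won`, `players`) and player-line (`player`, `pts`) fields. `MIN_GAMES` is a hypothetical threshold not taken from the slides. Without some minimum, a player with a single big game in a loss could top the list.

```python
# Hypothetical threshold (not from the slides): ignore players with fewer
# losing games than this.
MIN_GAMES = 50

BEST_SCORER_IN_LOSSES = [
    {"$unwind": "$box"},            # one doc per team per game
    {"$match": {"box.won": 0}},     # keep only the losing side
    {"$unwind": "$box.players"},    # one doc per player line
    {"$group": {
        "_id": "$box.players.player",
        # $avg ignores lines without a numeric pts; $sum: 1 still counts them.
        "avg_pts": {"$avg": "$box.players.pts"},
        "games": {"$sum": 1},
    }},
    {"$match": {"games": {"$gte": MIN_GAMES}}},
    {"$sort": {"avg_pts": -1}},
    {"$limit": 1},
]


def best_scorer_in_losses(games):
    """`games` is expected to behave like a pymongo Collection."""
    results = list(games.aggregate(BEST_SCORER_IN_LOSSES))
    return results[0] if results else None
```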
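Slides 24 to 26 compare steals across the two teams in each game. One way to do that, sketched under the assumption of hypothetical `box` entries with `won` and `team.stl` fields, has four steps: unwind `box`, regroup by game with each side's steals, drop games where the steals were equal, and compute the share of the remaining games won by the team with more steals. A plain-Python equivalent is included so the logic can be checked on hand-built data.

```python
STEALS_WIN_RATE = [
    {"$unwind": "$box"},
    # Back to one document per game, with each side's steals side by side.
    {"$group": {
        "_id": "$_id",
        "winner_stl": {"$sum": {"$cond": [{"$eq": ["$box.won", 1]}, "$box.team.stl", 0]}},
        "loser_stl": {"$sum": {"$cond": [{"$eq": ["$box.won", 0]}, "$box.team.stl", 0]}},
    }},
    {"$project": {"diff": {"$subtract": ["$winner_stl", "$loser_stl"]}}},
    # Games with equal steals say nothing either way.
    {"$match": {"diff": {"$ne": 0}}},
    {"$group": {
        "_id": None,
        "games": {"$sum": 1},
        "more_steals_won": {"$sum": {"$cond": [{"$gt": ["$diff", 0]}, 1, 0]}},
    }},
    {"$project": {
        "_id": 0,
        "games": 1,
        "more_steals_won": 1,
        "win_rate": {"$divide": ["$more_steals_won", "$games"]},
    }},
]


def steals_win_rate(games):
    """In-memory equivalent for an iterable of game dicts."""
    counted = more_steals_won = 0
    for game in games:
        winner = sum(b["team"]["stl"] for b in game["box"] if b["won"] == 1)
        loser = sum(b["team"]["stl"] for b in game["box"] if b["won"] == 0)
        if winner == loser:
            continue  # skip ties, as the $match stage does
        counted += 1
        if winner > loser:
            more_steals_won += 1
    return more_steals_won / counted if counted else None
```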