
- 1. David Hoerster 2014
- 2. C# MVP (since April 2011); Sr. Director of Web Solutions at RGP; Conference Director for Pittsburgh TechFest; Co-Founder of BrainCredits (braincredits.com); Past President of Pittsburgh .NET Users Group and organizer of recent Pittsburgh Code Camps and other tech events. Twitter: @DavidHoerster. Blog: http://geekswithblogs.net/DavidHoerster. Email: david@agileways.com
- 3. +
- 4. Basic understanding of document databases, like Mongo. Familiarity with querying (non-aggregate pipeline) in Mongo. General understanding of baseball.
- 5. Basics of AVG, OBP, and ERA have been around. An underground of advanced statistics has been growing since the early 70s; Bill James is probably the most well known. The Society for American Baseball Research (SABR) fosters the research of baseball statistical history. Stats like wOBA, wRAA, WAR, DIPS, NERD and more. Lends itself to computer modeling and big data.
- 6. Document database. A “NoSQL” solution. Wide range of querying and manipulation capabilities.
- 7. Issue a JSON document. find and findOne are like the LINQ Select and First/Single methods. Basic cursor functionality (think DataReader).
- 8. Download as a NuGet package. Actively worked on and contributed to. There is an “official” client, along with several community clients.
- 9. MongoDB’s data aggregation solution. Modeled on the concept of data processing pipelines. Operations are performed in stages. Results from one stage are “piped” to the next stage: $match → $project → $sort
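
The staged flow described in slide 9 can be imitated in a few lines of plain JavaScript. This is only an in-memory sketch of the idea, not how MongoDB implements it; the sample documents and the `runPipeline` helper are hypothetical.

```javascript
// Illustrative only: a tiny in-memory imitation of a staged pipeline.
// The sample documents and helper names are hypothetical.
const batting = [
  { PlayerId: "a", Year: 2013, Hits: 150, AtBats: 500 },
  { PlayerId: "b", Year: 2013, Hits: 192, AtBats: 600 },
  { PlayerId: "c", Year: 2012, Hits: 200, AtBats: 600 },
];

// Each stage takes an array of documents and returns a new array.
const match = (docs) => docs.filter((d) => d.Year === 2013);
const project = (docs) =>
  docs.map((d) => ({ PlayerId: d.PlayerId, AVG: d.Hits / d.AtBats }));
const sort = (docs) => [...docs].sort((x, y) => y.AVG - x.AVG);

// The output of one stage is "piped" into the next.
const runPipeline = (docs, stages) =>
  stages.reduce((current, stage) => stage(current), docs);

const result = runPipeline(batting, [match, project, sort]);
console.log(result);
```

In MongoDB the same flow is expressed declaratively as an array of stage documents, and the work happens on the server.
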
- 10. Number of operations available: $group, $match, $project, $sort, $limit, $skip, $redact, $out, … Essentially replaces the older mapReduce functionality. The aggregation pipeline generally provides better performance; mapReduce is more flexible. Aggregation combines a number of operations in order to produce a result set.
- 11. Maximum size of a returned document is 16 MB. The aggregation pipeline now returns results using a cursor (as of 2.6). Each stage of a pipeline has a maximum limit of 100 MB of RAM. Enable allowDiskUse in order to write to disk and avoid this limitation. MongoDB will also optimize the pipeline, if possible.
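
As a rough sketch of the allowDiskUse point in slide 11: in the mongo shell the option is passed as the second argument to `aggregate`. The wrapper function `aggregateWithDiskUse` below is a hypothetical helper, written so that it works with any object exposing `aggregate(pipeline, options)`, such as `db.batting`.

```javascript
// Sketch: pass allowDiskUse as an option to aggregate().
// `collection` is any object exposing aggregate(pipeline, options),
// for example db.batting in the mongo shell.
// aggregateWithDiskUse is a hypothetical helper name.
function aggregateWithDiskUse(collection, pipeline) {
  return collection.aggregate(pipeline, { allowDiskUse: true });
}

// A pipeline whose $sort could exceed the per-stage memory limit
// on a large collection.
const sortAllByHits = [
  { $match: { Year: 2013 } },
  { $sort: { Hits: -1 } },
];

// In the mongo shell: aggregateWithDiskUse(db.batting, sortAllByHits)
```
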
- 12. Count
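
The code for slide 12 did not survive in this transcript. One common way to count documents with the pipeline, assuming the `batting` collection and `Year` field used elsewhere in the deck, is a `$group` with a constant `_id` and a `$sum` of 1:

```javascript
// Sketch, not the slide's original code: count the 2013 batting documents.
// Grouping on a constant _id (null) puts every document in one group.
const countPipeline = [
  { $match: { Year: 2013 } },
  { $group: { _id: null, count: { $sum: 1 } } },
];

// In the mongo shell: db.batting.aggregate(countPipeline)
```
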
- 13. Batting Average (Hits / At Bats)
- 14. Batting Average
- 15. Batting Average in C#
- 16. Part of the Mongo C# Driver. Implements find and findOne. Other grouping and projecting is done client-side. Do you want all that data before manipulating it?
- 17. Add a $match pipeline operation
- 18. Now need to sort
- 19. But wait… we have incorrect results for top Batting Average. Need to enhance $match to include only those with at least 3.1 plate appearances (PA) per game over 162 games (502).
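
The shell code for slides 13 to 19 is not in this transcript. The sketch below assembles the stages those slides describe, using the same field names and structure as the C# BsonDocument version in slides 22 to 24 and the OBP query in slide 27. Like those slides, it applies the 502 cutoff to `AtBats`. The constant name `qualifyingThreshold` is an assumption for illustration.

```javascript
// 3.1 plate appearances per team game over a 162-game season,
// rounded down to a whole number.
const qualifyingThreshold = Math.floor(3.1 * 162);

const battingAveragePipeline = [
  // Keep only 2013 seasons from qualifying batters.
  { $match: { Year: 2013, AtBats: { $gte: qualifyingThreshold } } },
  // Compute AVG = Hits / AtBats, guarding against division by zero.
  { $project: {
      PlayerId: 1, Year: 1, TeamId: 1,
      AVG: { $cond: {
        if: { $eq: ["$AtBats", 0] },
        then: 0,
        else: { $divide: ["$Hits", "$AtBats"] },
      } },
  } },
  // Highest averages first, top 25 only.
  { $sort: { AVG: -1 } },
  { $limit: 25 },
];

// In the mongo shell: db.batting.aggregate(battingAveragePipeline)
```
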
- 20. In C# Using LINQ
- 21. Not truly aggregation pipeline in C#. Done on client, not server. Materialize on client with LINQ. Must use BsonDocument for aggregation pipeline. Yikes!
- 22. Creating the $match BsonDocument var match = new BsonDocument{ {"$match", new BsonDocument{ {"Year", 2013}, {"AtBats", new BsonDocument{ {"$gte", 502} }} }} };
- 23. Create the $project operation: var project = new BsonDocument { {"$project", new BsonDocument{ {"PlayerId", 1}, {"Year", 1}, {"TeamId", 1}, {"AVG", new BsonDocument{ {"$cond", new BsonDocument{ {"if", new BsonDocument{ {"$eq", new BsonArray{"$AtBats", 0}} }}, {"then", 0}, {"else", new BsonDocument{ {"$divide", new BsonArray{"$Hits", "$AtBats"}} }} }} }} }} };
- 24. Create the $sort and $limit operations and then combine them all in an Array var sort = new BsonDocument{ {"$sort", new BsonDocument{ {"AVG", -1} } } }; var limit = new BsonDocument{ {"$limit", 25} }; return new[] { match, project, sort, limit };
- 25. All the { } with BsonDocument and BsonArray reminds me of…
- 26. “A measure of how often a batter reaches base for any reason other than a fielding error, fielder's choice, dropped/uncaught third strike, fielder's obstruction, or catcher's interference.” - Wikipedia (http://en.wikipedia.org/wiki/On-base_percentage). Usually a better measure of a batter’s performance than straight average. OBP = (H + BB + HBP) / (AB + BB + HBP + SF)
- 27. (Hits + BB + HBP) / (AB + BB + HBP + SF) db.batting.aggregate([ {$match: { Year: 2013, AtBats: {$gte: 502} }}, {$project: { PlayerId: 1, Year: 1, TeamId: 1, OBP: { $cond: { if: {$eq: ["$AtBats", 0] }, then: 0, else: { $divide: [ {$add:["$Hits","$BaseOnBalls","$HitByPitch"]}, {$add:["$AtBats","$BaseOnBalls","$HitByPitch","$SacrificeFlies"]} ]} }} }}, {$sort: {OBP: -1}}, {$limit: 25} ])
- 28. $match $project $sort $limit
- 29. Early sabermetric type of stat, invented by Bill James. “With regard to an offensive player, the first key question is how many runs have resulted from what he has done with the bat and on the basepaths. Willie McCovey hit .270 in his career, with 353 doubles, 46 triples, 521 home runs and 1,345 walks -- but his job was not to hit doubles, nor to hit singles, nor to hit triples, nor to draw walks or even hit home runs, but rather to put runs on the scoreboard. How many runs resulted from all of these things?” - Bill James (James, Bill (1985). The Bill James Historical Baseball Abstract (1st ed.), pp. 273-4. Villard. ISBN 0-394-53713-0). RC = ((H + BB) x TB) / (AB + BB). Aggregated across a team, RC is usually within 5% of a team’s actual runs.
- 30. (Hits + Walks) * Total Bases / (At Bats + Walks) db.batting.aggregate([ {$match: {Year:2013, AtBats:{$gte:502}}}, {$project: { PlayerId: 1, Year: 1, TeamId: 1, RC: { $divide: [ {$multiply: [ {$add: ["$Hits","$BaseOnBalls"]}, {$add: ["$Hits","$Doubles","$Triples","$Triples", "$HomeRuns","$HomeRuns","$HomeRuns"] }] }, { $add: ["$AtBats","$BaseOnBalls"] }] } }}, {$sort: {RC:-1}}, {$limit: 25} ])
- 31. $match $project $sort $limit
- 32. db.batting.aggregate([ {$match: {Year:2013}}, {$group: { _id: "$TeamId", Hits: {$sum: "$Hits"}, Walks: {$sum: "$BaseOnBalls"}, Doubles: {$sum: "$Doubles"}, Triples: {$sum: "$Triples"}, HR: {$sum: "$HomeRuns"}, AtBats: {$sum: "$AtBats"} }}, {$project: { RC: { $divide: [ {$multiply: [ {$add: ["$Hits","$Walks"]}, {$add: ["$Hits","$Doubles","$Triples","$Triples","$HR","$HR","$HR"] } ]}, { $add: ["$AtBats","$Walks"] }] } }}, {$sort: {RC: -1}} ])
- 33. $match $group $project $sort
- 34. Babe Ruth highest paid player in 20’s ($80K in ‘30/’31); Babe and Ty Cobb were highest paid in 1920 at $20K; Joe DiMaggio highest paid in 1950 ($100K); Nolan Ryan made $1M in 1980 (1st time); Albert Belle made $10M in 1997; in 1999, made ~$12M (more than entire Pirates payroll); 2001: ARod made $22M; 2009: ARod made $33M
- 35. Hoerster copyrighted statistic. Compares the value of each base produced by a hitter. Who are the most expensive players?
- 36. Takes total bases: Hits + Doubles + (Triples x 2) + (HR x 3) + SB + BB + HBP – CS. Divides salary into it. Definitely not predictive. More of a value statistic.
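
The slides give the formula for this statistic but no query. A minimal sketch follows, under two loudly stated assumptions: that the statistic is salary divided by bases produced (one reading of "divides salary into it"), and that each document carries a `Salary` field, which the slides do not show. The names `costPerBase`, `basesExpr`, and `costPerBasePipeline` are hypothetical.

```javascript
// Plain-JavaScript version of the arithmetic, for checking by hand.
// ASSUMPTION: cost per base = Salary / bases produced.
function costPerBase(p) {
  const bases =
    p.Hits + p.Doubles + 2 * p.Triples + 3 * p.HomeRuns +
    p.StolenBases + p.BaseOnBalls + p.HitByPitch - p.CaughtStealing;
  return p.Salary / bases;
}

// The same bases formula as an aggregation expression.
const basesExpr = {
  $subtract: [
    { $add: [
      "$Hits", "$Doubles",
      { $multiply: ["$Triples", 2] },
      { $multiply: ["$HomeRuns", 3] },
      "$StolenBases", "$BaseOnBalls", "$HitByPitch",
    ] },
    "$CaughtStealing",
  ],
};

// ASSUMPTION: "$Salary" is a hypothetical field on the batting document.
const costPerBasePipeline = [
  { $match: { Year: 2013, AtBats: { $gte: 502 } } },
  { $project: {
      PlayerId: 1, Year: 1, TeamId: 1,
      CostPerBase: { $divide: ["$Salary", basesExpr] },
  } },
  { $sort: { CostPerBase: -1 } }, // most expensive bases first
  { $limit: 25 },
];

// Made-up sample player, for illustration only.
const sample = {
  Hits: 150, Doubles: 30, Triples: 5, HomeRuns: 20, StolenBases: 10,
  BaseOnBalls: 50, HitByPitch: 5, CaughtStealing: 5, Salary: 3100000,
};
console.log(costPerBase(sample));
```
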
- 37. “Is a statistic, created by Tom Tango and based on linear regression, designed to measure a player's overall offensive contributions per plate appearance.” - Wikipedia (http://en.wikipedia.org/wiki/Weighted_on-base_average). Weights each component of offense with a factor. wOBA = ((wBB*BB)+(wHBP*HBP)+(wH*Hits)+(w2B*2B)+(w3B*3B)+(wHR*HR)+(wSB*SB)+(wCS*CS)) / (AB+BB+HBP+SF-IBB)
- 38. var woba = db.WOBALookup.findOne({_id:2013}); db.batting.aggregate([ {$match: {Year: woba._id}}, {$redact: { $cond: { if: { $gte: ["$AtBats",502] }, then: "$$KEEP", else: "$$PRUNE" } }}, {$project: { Year: 1, PlayerId: 1, TeamId: 1, WOBA: { $divide: [ {$add: [{$multiply:[woba.wBB,"$BaseOnBalls"]}, {$multiply:[woba.wHBP,"$HitByPitch"]}, {$multiply:[woba.w1B,"$Hits"]}, {$multiply:[woba.w2B,"$Doubles"]}, {$multiply:[woba.w3B,"$Triples"]}, {$multiply:[woba.wHR,"$HomeRuns"]}, {$multiply:[woba.runSB,"$StolenBases"]}, {$multiply:[woba.runCS,"$CaughtStealing"]} ]}, {$add: [ … ]} ] } }}, {$limit:25}, {$sort: {WOBA:-1}}, {$out: "2013TopWOBA"} ])
- 39. $match $redact $project $limit $sort $out wOBA_Factors 2013TopWOBA
- 40. Calculates, on average, how many more runs a player generates than the average player in the league. Uses wOBA as a primary factor in the calculation. This then gets figured into the overall WAR of a player. Good description here: http://www.baseball-reference.com/about/war_explained_wraa.shtml
- 41. var woba = db.WOBALookup.findOne({_id:2013}); db.getCollection("2013TopWOBA").aggregate([ {$match: {Year: woba._id}}, {$project: { Year: 1, PlayerId: 1, TeamId: 1, wRAA: { $multiply: [ {$divide: [{$subtract: ["$WOBA",woba.wOBA]}, woba.wOBAScale]}, {$add: ["$AtBats","$BaseOnBalls","$HitByPitch", "$SacrificeFlies","$SacrificeHits"]} ] } }}, {$sort: { wRAA: -1 }}, {$out: 'TopWRAA013'} ]);
- 42. $match $project $sort $out wOBA_Factors TopWRAA013
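
The `$project` expression in slide 41 reduces to wRAA = ((wOBA - league wOBA) / wOBAScale) x PA, with plate appearances taken as AB + BB + HBP + SF + SH. A small plain-JavaScript version makes the arithmetic easy to check by hand. The function name `wRAA` and every number in the sample call are made up for illustration and are not real league factors.

```javascript
// wRAA = ((wOBA - league wOBA) / wOBAScale) * plate appearances
function wRAA(playerWoba, leagueWoba, wobaScale, plateAppearances) {
  return ((playerWoba - leagueWoba) / wobaScale) * plateAppearances;
}

// Hypothetical inputs: a .400 wOBA hitter in a .320 league,
// a scale of 1.25, and 600 plate appearances.
const runsAboveAverage = wRAA(0.4, 0.32, 1.25, 600);
console.log(runsAboveAverage.toFixed(1)); // roughly 38.4
```
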
- 43. Much of the aggregation pipeline in Mongo can be done with LINQ, but it will be client-side, not in Mongo! Take advantage of $out for intermediary tables during processing. Stage your operations. Maybe intermediary tables can be reused for other calcs. $group ids can be multi-valued; the id ends up as a sub-document and must be referenced accordingly.
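
The last two points of slide 43 can be sketched together: a `$group` with a two-field `_id`, a later stage that reads the parts back with dotted paths such as `"$_id.TeamId"`, and `$out` to stage the result. The field names follow the deck's `batting` collection; the output collection name `TeamYearAverages` is hypothetical.

```javascript
// Sketch: group by team AND year, then reference the sub-document _id.
const teamYearPipeline = [
  { $group: {
      _id: { TeamId: "$TeamId", Year: "$Year" }, // multi-valued group id
      Hits: { $sum: "$Hits" },
      AtBats: { $sum: "$AtBats" },
  } },
  { $project: {
      _id: 0,
      TeamId: "$_id.TeamId", // the id is now a sub-document
      Year: "$_id.Year",
      AVG: { $divide: ["$Hits", "$AtBats"] },
  } },
  { $sort: { AVG: -1 } },
  // Stage the result in an intermediary collection (hypothetical name).
  { $out: "TeamYearAverages" },
];

// In the mongo shell: db.batting.aggregate(teamYearPipeline)
```
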
- 44. Sean Lahman’s Baseball Database: http://seanlahman.com/baseball-archive/statistics/; Society for American Baseball Research: http://sabr.org/; wOBA Annual Factors: http://www.beyondtheboxscore.com/2011/1/4/1912914/custom-woba-and-linear-weights-through-2010-baseball-databank-data; Tom Tango’s Blog: http://espn.go.com/blog/statsinfo/tag/_/name/tom-tango; Annual Salary Leaders, 1874 – 2012: http://sabr.org/research/mlbs-annual-salary-leaders-1874-2012
