Mongo Baseball .NET
David Hoerster
2014
 C# MVP (Since April 2011)
 Sr. Director of Web Solutions at RGP
 Conference Director for Pittsburgh TechFest
 Co-Founder of BrainCredits (braincredits.com)
 Past President of the Pittsburgh .NET Users Group and organizer of recent Pittsburgh Code Camps and other tech events
 Twitter - @DavidHoerster
 Blog – http://geekswithblogs.net/DavidHoerster
 Email – david@agileways.com
 Basic understanding of document databases, like Mongo
 Familiarity with querying (outside the aggregation pipeline) in Mongo
 General understanding of baseball
 Basic stats like AVG, OBP and ERA have been around for a long time
 An underground movement in advanced statistics has been growing since the early 1970s
 Bill James is probably the best known
 Society for American Baseball Research (SABR)
 Fosters research into baseball's statistical history
 Stats like wOBA, wRAA, WAR, DIPS, NERD and more
 Lends itself to computer modeling and big data
 Document database
 A “NoSQL” solution
 Wide range of querying and manipulation capabilities
 Queries are expressed as JSON documents
 find and findOne are roughly analogous to LINQ's Where and First/Single methods
 Basic cursor functionality (think DataReader)
 The C# driver is available as a NuGet package
 Actively developed, with community contributions
 There is an “official” client, along with several community clients
 MongoDB’s data aggregation solution
 Modeled on the concept of data processing pipelines
 Operations are performed in stages
 Results from one stage “piped” to the next stage
$match → $project → $sort
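To make the "piping" idea concrete, here is a toy in-memory model in JavaScript: each stage is a function from an array of documents to a new array, and each stage's output feeds the next. It only illustrates the concept; the stage helpers and sample documents are hypothetical, and this is not how the server executes or optimizes a pipeline.

```javascript
// Toy model: each "stage" takes an array of documents and returns a new array.
const $match = (pred) => (docs) => docs.filter(pred);
const $project = (fn) => (docs) => docs.map(fn);
const $sort = (cmp) => (docs) => [...docs].sort(cmp);
const $limit = (n) => (docs) => docs.slice(0, n);

// Pipe the output of each stage into the next, in array order.
const runPipeline = (docs, stages) => stages.reduce((acc, stage) => stage(acc), docs);

// Hypothetical sample data
const batting = [
  { PlayerId: "a", Year: 2013, Hits: 150, AtBats: 500 },
  { PlayerId: "b", Year: 2013, Hits: 180, AtBats: 550 },
  { PlayerId: "c", Year: 2012, Hits: 200, AtBats: 600 },
];

const top = runPipeline(batting, [
  $match((d) => d.Year === 2013),
  $project((d) => ({ PlayerId: d.PlayerId, AVG: d.AtBats === 0 ? 0 : d.Hits / d.AtBats })),
  $sort((x, y) => y.AVG - x.AVG),
  $limit(1),
]);
console.log(top); // one document: player "b" (180 / 550)
```

Because every stage returns a new array, reordering the stages changes the result, which is why the order of stages in a real pipeline matters.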
 A number of operations are available
 $group, $match, $project, $sort, $limit, $skip, $redact, $out, …
 Essentially replaces the older mapReduce functionality
 The aggregation pipeline generally provides better performance
 mapReduce is more flexible
 Aggregation combines a number of operations in order to produce a result set
 Maximum size of a returned document is 16 MB
 Aggregation Pipeline now returns results using cursor (as of 2.6)
 Each pipeline stage is limited to 100 MB of RAM
 Enable allowDiskUse to let stages write temporary data to disk and avoid this limit
 MongoDB will also optimize the pipeline, if possible
 Count
 Batting Average (Hits / At Bats)
 Batting Average
 Batting Average in C#
 LINQ support is part of the Mongo C# Driver
 Translates LINQ queries into find and findOne
 Other grouping and projecting done client-side
 Do you want all that data before manipulating it?
 Add a $match pipeline operation
 Now need to sort
 But wait… the top batting averages are wrong: they include players without enough plate appearances
 Need to enhance $match to include only players with at least 3.1 PA per team game (3.1 x 162 = 502.2, so 502)
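The qualifying rule can be sketched client-side. This is a minimal JavaScript sketch, assuming the field names used elsewhere in this deck (plus SacrificeHits, as in the wRAA pipeline) and assuming the threshold is rounded to the nearest whole plate appearance; the helper names are hypothetical. Note that the rule counts plate appearances, while the $match in this deck filters on AtBats >= 502, which is stricter because at bats exclude walks, hit-by-pitches and sacrifices.

```javascript
// Batting-title threshold: 3.1 plate appearances per scheduled team game.
// Assumption: rounded to the nearest whole PA (3.1 * 162 = 502.2 -> 502).
function qualifyingPA(teamGames = 162) {
  return Math.round(3.1 * teamGames);
}

// Approximate plate appearances from the counting stats used in this deck
// (catcher's interference is not counted).
function plateAppearances(p) {
  return p.AtBats + p.BaseOnBalls + p.HitByPitch + p.SacrificeFlies + p.SacrificeHits;
}

function qualifies(p, teamGames = 162) {
  return plateAppearances(p) >= qualifyingPA(teamGames);
}

// 480 AB would fail an AtBats >= 502 filter, but 502 PA qualifies.
console.log(qualifies({ AtBats: 480, BaseOnBalls: 20, HitByPitch: 2, SacrificeFlies: 0, SacrificeHits: 0 }));
```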
 In C# Using LINQ
 Not truly the aggregation pipeline in C#
 Done on client, not server
 Materialize on client with LINQ
 Must use BsonDocument for aggregation pipeline
 Yikes!
 Creating the $match BsonDocument
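// Requires: using MongoDB.Bson;
// Shell equivalent: { $match: { Year: 2013, AtBats: { $gte: 502 } } }
var match = new BsonDocument {
    { "$match", new BsonDocument {
        { "Year", 2013 },
        { "AtBats", new BsonDocument {
            { "$gte", 502 }
        }}
    }}
};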
 Create the $project operation
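// Projects the player/team keys plus AVG = Hits / AtBats,
// returning 0 instead of dividing by zero when AtBats is 0
var project = new BsonDocument {
    { "$project", new BsonDocument {
        { "PlayerId", 1 },
        { "Year", 1 },
        { "TeamId", 1 },
        { "AVG", new BsonDocument {
            { "$cond", new BsonDocument {
                // compare against the number 0, not the string "0"
                { "if", new BsonDocument {
                    { "$eq", new BsonArray { "$AtBats", 0 } }
                }},
                { "then", 0 },
                { "else", new BsonDocument {
                    { "$divide", new BsonArray { "$Hits", "$AtBats" } }
                }}
            }}
        }}
    }}
};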
 Create the $sort and $limit operations and then combine them all in an Array
var sort = new BsonDocument {
    { "$sort", new BsonDocument { { "AVG", -1 } } }
};
var limit = new BsonDocument {
    { "$limit", 25 }
};
// Stages run in array order: $match -> $project -> $sort -> $limit
return new[] { match, project, sort, limit };
 All the { } with BsonDocument and BsonArray remind me of…
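For comparison, here is the same four-stage pipeline written as plain JavaScript objects, the form the mongo shell accepts. The builder function and its parameters are hypothetical conveniences, not part of any driver.

```javascript
// Builds the $match -> $project -> $sort -> $limit batting-average pipeline
// as plain objects (hypothetical helper; parameters are illustrative).
function battingAveragePipeline(year, minAtBats, topN) {
  return [
    { $match: { Year: year, AtBats: { $gte: minAtBats } } },
    {
      $project: {
        PlayerId: 1, Year: 1, TeamId: 1,
        AVG: {
          $cond: {
            if: { $eq: ["$AtBats", 0] },
            then: 0,
            else: { $divide: ["$Hits", "$AtBats"] },
          },
        },
      },
    },
    { $sort: { AVG: -1 } },
    { $limit: topN },
  ];
}

console.log(JSON.stringify(battingAveragePipeline(2013, 502, 25)));
```

In the mongo shell this could be run as `db.batting.aggregate(battingAveragePipeline(2013, 502, 25))`.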
A measure of how often a batter reaches base for any reason other than a fielding
error, fielder's choice, dropped/uncaught third strike, fielder's obstruction, or
catcher's interference.
- Wikipedia (http://en.wikipedia.org/wiki/On-base_percentage)
Usually a better measure of a batter's performance than straight batting average
OBP = (H + BB + HBP) / (AB + BB + HBP + SF)
(Hits + Walks + Hit By Pitch) / (At Bats + Walks + Hit By Pitch + Sacrifice Flies)
db.batting.aggregate([
  {$match: {Year: 2013, AtBats: {$gte: 502}}},
  {$project: {
    PlayerId: 1, Year: 1, TeamId: 1,
    OBP: {$cond: {
      if: {$eq: ["$AtBats", 0]},
      then: 0,
      else: {$divide: [
        {$add: ["$Hits", "$BaseOnBalls", "$HitByPitch"]},
        {$add: ["$AtBats", "$BaseOnBalls", "$HitByPitch", "$SacrificeFlies"]}
      ]}
    }}
  }},
  {$sort: {OBP: -1}}, {$limit: 25}
])
$match → $project → $sort → $limit
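The OBP arithmetic can be checked client-side with a small JavaScript function, a sketch assuming the same field names as the pipeline above. Unlike the $cond in that pipeline, which tests only AtBats, this version guards the whole denominator, so a player with walks but no at bats is not forced to 0.

```javascript
// On-base percentage: (H + BB + HBP) / (AB + BB + HBP + SF)
function onBasePercentage({ Hits, BaseOnBalls, HitByPitch, AtBats, SacrificeFlies }) {
  const timesOnBase = Hits + BaseOnBalls + HitByPitch;
  const opportunities = AtBats + BaseOnBalls + HitByPitch + SacrificeFlies;
  return opportunities === 0 ? 0 : timesOnBase / opportunities;
}

console.log(onBasePercentage({ Hits: 150, BaseOnBalls: 60, HitByPitch: 5, AtBats: 500, SacrificeFlies: 5 }));
```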
Runs Created (RC): an early sabermetric stat, invented by Bill James
With regard to an offensive player, the first key question is how many runs have resulted from
what he has done with the bat and on the basepaths. Willie McCovey hit .270 in his career,
with 353 doubles, 46 triples, 521 home runs and 1,345 walks -- but his job was not to hit
doubles, nor to hit singles, nor to hit triples, nor to draw walks or even hit home runs, but
rather to put runs on the scoreboard. How many runs resulted from all of these things?
- Bill James (James, Bill (1985). The Bill James Historical Baseball Abstract (1st ed.), pp. 273-4. Villard. ISBN 0-394-53713-0)
RC = ((H + BB) x TB) / (AB + BB), where TB (total bases) = H + 2B + (2 x 3B) + (3 x HR)
(Hits + Walks) x Total Bases / (At Bats + Walks)
Aggregated across a team, RC is usually within 5% of a team's actual runs
db.batting.aggregate([
  {$match: {Year: 2013, AtBats: {$gte: 502}}},
  {$project: {
    PlayerId: 1,
    Year: 1,
    TeamId: 1,
    // RC = (H + BB) x TB / (AB + BB), with TB = H + 2B + 2*3B + 3*HR
    RC: {$divide: [
      {$multiply: [
        {$add: ["$Hits", "$BaseOnBalls"]},
        {$add: ["$Hits", "$Doubles", "$Triples", "$Triples",
                "$HomeRuns", "$HomeRuns", "$HomeRuns"]}
      ]},
      {$add: ["$AtBats", "$BaseOnBalls"]}
    ]}
  }},
  {$sort: {RC: -1}}, {$limit: 25}
])
$match → $project → $sort → $limit
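A client-side check of the Runs Created arithmetic, as a JavaScript sketch using the deck's field names. The total-bases helper matches the repeated "$Triples" and "$HomeRuns" entries in the $add above; the function names are hypothetical.

```javascript
// Total bases from hits and extra-base hits: TB = H + 2B + 2*3B + 3*HR
function totalBases({ Hits, Doubles, Triples, HomeRuns }) {
  return Hits + Doubles + 2 * Triples + 3 * HomeRuns;
}

// Basic Runs Created: (H + BB) x TB / (AB + BB)
function runsCreated(p) {
  const denom = p.AtBats + p.BaseOnBalls;
  return denom === 0 ? 0 : ((p.Hits + p.BaseOnBalls) * totalBases(p)) / denom;
}

console.log(runsCreated({ Hits: 100, BaseOnBalls: 50, Doubles: 20, Triples: 5, HomeRuns: 10, AtBats: 100 }));
```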
db.batting.aggregate([
  {$match: {Year: 2013}},
  // one output document per team; _id holds the TeamId
  {$group: {
    _id: "$TeamId",
    Hits: {$sum: "$Hits"},
    Walks: {$sum: "$BaseOnBalls"},
    Doubles: {$sum: "$Doubles"},
    Triples: {$sum: "$Triples"},
    HR: {$sum: "$HomeRuns"},
    AtBats: {$sum: "$AtBats"}
  }},
  {$project: {
    RC: {$divide: [
      {$multiply: [
        {$add: ["$Hits", "$Walks"]},
        {$add: ["$Hits", "$Doubles", "$Triples", "$Triples", "$HR", "$HR", "$HR"]}
      ]},
      {$add: ["$AtBats", "$Walks"]}
    ]}
  }},
  {$sort: {RC: -1}}
])
$match → $group → $project → $sort
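What the $match/$group/$project/$sort pipeline computes can be mirrored client-side, which is handy for spot-checking results on a small sample. This JavaScript sketch assumes the same field names; `teamRunsCreated` is a hypothetical helper, and running it for real means pulling every batting line to the client, which is exactly what the server-side pipeline avoids.

```javascript
function teamRunsCreated(lines, year) {
  const teams = new Map();
  for (const l of lines) {
    if (l.Year !== year) continue; // $match
    // $group: one accumulator per TeamId
    const t = teams.get(l.TeamId) ??
      { _id: l.TeamId, Hits: 0, Walks: 0, Doubles: 0, Triples: 0, HR: 0, AtBats: 0 };
    t.Hits += l.Hits;
    t.Walks += l.BaseOnBalls;
    t.Doubles += l.Doubles;
    t.Triples += l.Triples;
    t.HR += l.HomeRuns;
    t.AtBats += l.AtBats;
    teams.set(l.TeamId, t);
  }
  return [...teams.values()]
    .map((t) => {
      // $project: RC = (H + BB) x TB / (AB + BB)
      const tb = t.Hits + t.Doubles + 2 * t.Triples + 3 * t.HR;
      const denom = t.AtBats + t.Walks;
      return { _id: t._id, RC: denom === 0 ? 0 : ((t.Hits + t.Walks) * tb) / denom };
    })
    .sort((a, b) => b.RC - a.RC); // $sort
}
```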
 Babe Ruth was the highest-paid player of the '20s (peaking at $80K in 1930 and 1931)
 Babe and Ty Cobb were the highest paid in 1920, at $20K
 Joe DiMaggio was the highest paid in 1950 ($100K)
 Nolan Ryan made $1M in 1980 (the first player to do so)
 Albert Belle made $10M in 1997
 In 1999, made ~$12M (more than entire Pirates payroll)
 2001 – ARod made $22M
 2009 – ARod made $33M
 Hoerster copyrighted statistic
 Compares the value of each base produced by a hitter
 Who are the most expensive players?
 Takes total bases
 Hits + Doubles + (Triples x 2) + (HR x 3) + SB + BB + HBP – CS
 Divides salary by it (dollars per base)
 Definitely not predictive
 More of a value statistic
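A minimal sketch of the calculation in JavaScript, assuming a per-player object that already carries a `Salary` field (a hypothetical name; the deck does not show where salary data is stored). Dividing salary by bases gives dollars per base, so a higher number means a more expensive base.

```javascript
// bases = H + 2B + 2*3B + 3*HR + SB + BB + HBP - CS, then salary / bases.
// "Salary" is an assumed field name.
function dollarsPerBase(p) {
  const bases = p.Hits + p.Doubles + 2 * p.Triples + 3 * p.HomeRuns +
    p.StolenBases + p.BaseOnBalls + p.HitByPitch - p.CaughtStealing;
  return bases > 0 ? p.Salary / bases : null; // undefined when no bases were produced
}
```

Returning `null` rather than `Infinity` for a player with no bases keeps such players out of a "most expensive" ranking instead of putting them at the top.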
Weighted on-base average (wOBA) is a statistic, created by Tom Tango and based on linear regression, designed to measure a player's overall offensive contributions per plate appearance.
- Wikipedia (http://en.wikipedia.org/wiki/Weighted_on-base_average)
Weights each component of offense with a factor
wOBA = ((wBB*BB)+(wHBP*HBP)+(wH*Hits)+(w2B*2B)+(w3B*3B)+(wHR*HR)+(wSB*SB)+(wCS*CS)) / (AB+BB+HBP+SF-IBB)
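The formula translates directly into a JavaScript function. This is a sketch: the weights object uses the names from the formula above, and any weight values you plug in for testing are placeholders, not real annual factors (those come from a per-year lookup such as the WOBALookup collection used in the pipeline). `IntentionalWalks` is an assumed field name for IBB.

```javascript
// wOBA per the formula above; `w` holds one season's weights.
// "IntentionalWalks" is an assumed field name for IBB.
function weightedOnBaseAverage(s, w) {
  const numerator =
    w.wBB * s.BaseOnBalls + w.wHBP * s.HitByPitch + w.wH * s.Hits +
    w.w2B * s.Doubles + w.w3B * s.Triples + w.wHR * s.HomeRuns +
    w.wSB * s.StolenBases + w.wCS * s.CaughtStealing;
  const denominator = s.AtBats + s.BaseOnBalls + s.HitByPitch + s.SacrificeFlies - s.IntentionalWalks;
  return denominator > 0 ? numerator / denominator : 0;
}
```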
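var woba = db.WOBALookup.findOne({_id: 2013});   // per-year wOBA factors
db.batting.aggregate([
  {$match: {Year: woba._id}},
  // $redact keeps or prunes whole documents; here it drops non-qualifiers
  {$redact: {
    $cond: { if: { $gte: ["$AtBats", 502] },
             then: "$$KEEP",
             else: "$$PRUNE" } }},
  {$project: {
    Year: 1,
    PlayerId: 1,
    TeamId: 1,
    // carried through so the wRAA step can compute plate appearances
    AtBats: 1, BaseOnBalls: 1, HitByPitch: 1, SacrificeFlies: 1, SacrificeHits: 1,
    WOBA: {
      $divide: [
        {$add: [{$multiply: [woba.wBB, "$BaseOnBalls"]}, {$multiply: [woba.wHBP, "$HitByPitch"]},
                {$multiply: [woba.w1B, "$Hits"]}, {$multiply: [woba.w2B, "$Doubles"]},
                {$multiply: [woba.w3B, "$Triples"]}, {$multiply: [woba.wHR, "$HomeRuns"]},
                {$multiply: [woba.runSB, "$StolenBases"]}, {$multiply: [woba.runCS, "$CaughtStealing"]}
        ]},
        // AB + BB + HBP + SF - IBB ("IntentionalWalks" is an assumed field name)
        {$subtract: [
          {$add: ["$AtBats", "$BaseOnBalls", "$HitByPitch", "$SacrificeFlies"]},
          "$IntentionalWalks"
        ]}
      ]
    }
  }},
  // sort before limiting; otherwise $limit keeps an arbitrary 25 documents
  {$sort: {WOBA: -1}}, {$limit: 25}, {$out: "TopWOBA2013"}
])
$match → $redact → $project → $sort → $limit → $out
Inputs: batting and wOBA factors (WOBALookup); output: TopWOBA2013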
 Calculates, on average, how many more runs a player generates than the average
player in the league
 Uses wOBA as a primary factor in calculation
 This then feeds into a player's overall WAR
 Good description here:
http://www.baseball-reference.com/about/war_explained_wraa.shtml
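wRAA rescales a player's wOBA edge over the league into runs: ((wOBA - league wOBA) / wOBA scale) x PA. Here is a JavaScript sketch with plate appearances computed the same way as in the pipeline that follows (AB + BB + HBP + SF + SH); the function names are hypothetical and any input values are illustrative, not real league factors.

```javascript
// Plate appearances as used in the wRAA pipeline: AB + BB + HBP + SF + SH
function plateAppearances(p) {
  return p.AtBats + p.BaseOnBalls + p.HitByPitch + p.SacrificeFlies + p.SacrificeHits;
}

// wRAA = ((player wOBA - league wOBA) / wOBA scale) x PA
function wRAA(playerWoba, leagueWoba, wobaScale, pa) {
  return ((playerWoba - leagueWoba) / wobaScale) * pa;
}
```

A league-average hitter (player wOBA equal to league wOBA) comes out at 0 runs regardless of playing time, which is the intended "above average" baseline.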
var woba = db.WOBALookup.findOne({_id: 2013});
db.TopWOBA2013.aggregate([
  {$match: {Year: woba._id}},
  {$project: {
    Year: 1, PlayerId: 1, TeamId: 1,
    // wRAA = ((wOBA - league wOBA) / wOBA scale) x PA
    wRAA: {
      $multiply: [
        {$divide: [{$subtract: ["$WOBA", woba.wOBA]}, woba.wOBAScale]},
        // plate appearances: AB + BB + HBP + SF + SH
        {$add: ["$AtBats", "$BaseOnBalls", "$HitByPitch",
                "$SacrificeFlies", "$SacrificeHits"]}
      ]
    }
  }},
  {$sort: {wRAA: -1}}, {$out: "TopWRAA2013"}
]);
$match → $project → $sort → $out
Inputs: TopWOBA2013 and wOBA factors (WOBALookup); output: TopWRAA2013
 Much of what the aggregation pipeline does can also be done with LINQ
 But it will run client-side, not in Mongo!
 Take advantage of $out for intermediate collections during processing
 Stage your operations
 Intermediate collections may be reusable for other calculations
 $group _ids can be multi-valued (compound keys)
 The key ends up as a sub-document and must be referenced accordingly (e.g. "$_id.TeamId")
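A compound $group key in shell syntax, plus a client-side mirror showing the shape of the documents it produces. The stage objects follow MongoDB's $group/$project syntax; `groupHomeRuns` is a hypothetical helper for illustration only.

```javascript
// Grouping on a compound key makes _id a sub-document...
const groupByTeamAndYear = {
  $group: {
    _id: { TeamId: "$TeamId", Year: "$Year" },
    HR: { $sum: "$HomeRuns" },
  },
};
// ...so later stages reach into it with dotted paths such as "$_id.TeamId".
const flattenKey = {
  $project: { _id: 0, TeamId: "$_id.TeamId", Year: "$_id.Year", HR: 1 },
};
// Shell usage: db.batting.aggregate([groupByTeamAndYear, flattenKey, { $sort: { HR: -1 } }])

// Client-side mirror of the $group above, showing the resulting document shape.
function groupHomeRuns(lines) {
  const out = new Map();
  for (const l of lines) {
    const key = `${l.TeamId}|${l.Year}`;
    const g = out.get(key) ?? { _id: { TeamId: l.TeamId, Year: l.Year }, HR: 0 };
    g.HR += l.HomeRuns;
    out.set(key, g);
  }
  return [...out.values()];
}
```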
 Sean Lahman’s Baseball Database
http://seanlahman.com/baseball-archive/statistics/
 Society for American Baseball Research
http://sabr.org/
 wOBA Annual Factors
http://www.beyondtheboxscore.com/2011/1/4/1912914/custom-woba-and-linear-weights-through-2010-baseball-databank-data
 Tom Tango’s Blog
http://espn.go.com/blog/statsinfo/tag/_/name/tom-tango
 Annual Salary Leaders, 1874 – 2012
http://sabr.org/research/mlbs-annual-salary-leaders-1874-2012

Harnessing the Power of NLP and Knowledge Graphs for Opioid Research
 
Apps Break Data
Apps Break DataApps Break Data
Apps Break Data
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
 

Mongo Baseball .NET

  • 2.
    - C# MVP (since April 2011)
    - Sr. Director of Web Solutions at RGP
    - Conference Director for Pittsburgh TechFest
    - Co-Founder of BrainCredits (braincredits.com)
    - Past President of Pittsburgh .NET Users Group and organizer of recent Pittsburgh Code Camps and other tech events
    - Twitter: @DavidHoerster
    - Blog: http://geekswithblogs.net/DavidHoerster
    - Email: david@agileways.com
  • 4.
    - Basic understanding of document databases, like Mongo
    - Familiarity with querying in Mongo (outside the aggregation pipeline)
    - General understanding of baseball
  • 5.
    - Basic stats like AVG, OBP, and ERA have been around for a long time
    - An underground of advanced statistics has been growing since the early ’70s
      - Bill James is probably the best known
    - Society for American Baseball Research (SABR)
      - Fosters research into baseball’s statistical history
    - Stats like wOBA, wRAA, WAR, DIPS, NERD, and more
    - Lends itself to computer modeling and big data
  • 6.
    - Document database
    - A “NoSQL” solution
    - Wide range of querying and manipulation capabilities
  • 7.
    - Queries are issued as JSON documents
    - find and findOne work roughly like LINQ’s Where/Select and First/Single methods
    - Basic cursor functionality (think DataReader)
  • 8.
    - Download as a NuGet package
    - Actively developed, with outside contributions
    - There is an “official” client, along with several community clients
  • 9.
    - MongoDB’s data aggregation solution
    - Modeled on the concept of data processing pipelines
    - Operations are performed in stages
    - Results from one stage are “piped” to the next stage, e.g. $match → $project → $sort
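The stage-by-stage flow described above can be illustrated with a toy, in-memory sketch in JavaScript (the language of the mongo shell examples later in the deck). Each stage is a plain function from an array of documents to a new array, and the output of one stage feeds the next. The helper names (runPipeline, match, project, sortBy) and the sample rows are hypothetical; they only mimic the shape of $match, $project, and $sort and say nothing about how MongoDB actually executes a pipeline on the server.

```javascript
// Toy illustration of a processing pipeline: each stage takes an array of
// documents and returns a new array; reduce() pipes one stage's output
// into the next. Helper names are hypothetical, not a MongoDB API.
function runPipeline(docs, stages) {
  return stages.reduce((current, stage) => stage(current), docs);
}

// Rough in-memory stand-ins for $match, $project, and $sort.
const match = (predicate) => (docs) => docs.filter(predicate);
const project = (shape) => (docs) => docs.map(shape);
// direction: 1 for ascending, -1 for descending (as in $sort)
const sortBy = (key, direction) => (docs) =>
  [...docs].sort((a, b) => direction * (a[key] - b[key]));

// Made-up sample rows for illustration.
const sampleRows = [
  { PlayerId: "a", Hits: 150, AtBats: 500 },
  { PlayerId: "b", Hits: 180, AtBats: 550 },
  { PlayerId: "c", Hits: 10, AtBats: 40 },
];

const pipelineResult = runPipeline(sampleRows, [
  match((d) => d.AtBats >= 100),
  project((d) => ({ PlayerId: d.PlayerId, AVG: d.Hits / d.AtBats })),
  sortBy("AVG", -1),
]);

console.log(pipelineResult.map((d) => d.PlayerId).join(", ")); // b, a
```

Because each stage only sees what the previous stage emitted, filtering early (as $match does) shrinks the work every later stage has to do.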
  • 10.
    - A number of operations are available
      - $group, $match, $project, $sort, $limit, $skip, $redact, $out, …
    - Essentially replaces the older mapReduce functionality
      - The Aggregation Pipeline generally provides better performance
      - mapReduce is more flexible
    - Aggregation combines a number of operations in order to produce a result set
  • 11.
    - Maximum size of a returned document is 16 MB
    - The Aggregation Pipeline now returns results using a cursor (as of 2.6)
    - Each stage of a pipeline has a maximum limit of 100 MB of RAM
      - Enable allowDiskUse to let stages write to disk and avoid this limit
    - MongoDB will also optimize the pipeline, if possible
  • 13.  Batting Average (Hits / At Bats)
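As a small worked example of the formula above, here is a plain JavaScript sketch (not tied to MongoDB or the C# driver). The function names are hypothetical, and the zero-at-bats guard mirrors the $cond used in the deck’s $project stages.

```javascript
// Batting average = hits / at bats. Returns 0 when there are no at bats,
// the same guard the deck expresses with $cond in a $project stage.
function battingAverage(hits, atBats) {
  return atBats === 0 ? 0 : hits / atBats;
}

// Averages are conventionally shown with three decimals and no leading
// zero, e.g. ".300".
function formatAverage(avg) {
  return avg.toFixed(3).replace(/^0\./, ".");
}

console.log(formatAverage(battingAverage(150, 500))); // .300
console.log(formatAverage(battingAverage(0, 0)));     // .000
```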
  • 16.
    - Part of Mongo C# Driver
    - Implements find and findOne
    - Other grouping and projecting done client-side
      - Do you want all that data before manipulating it?
  • 17.  Add a $match pipeline operation
  • 18.  Now need to sort
  • 19.
    - But wait… we have incorrect results for top Batting Average
    - Need to enhance $match to include only qualified hitters: at least 3.1 plate appearances per team game, i.e. 3.1 × 162 ≈ 502
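The match, sort, and qualifying-threshold steps from slides 17 to 19 can be sketched in memory as a chain of filter, map, sort, and slice. This is a hypothetical illustration with made-up rows, not driver code. Like the slides, it uses AtBats as a stand-in for plate appearances, which is what the official 3.1-per-game rule actually counts.

```javascript
// In-memory sketch of the "top batting average" query:
// $match (year + qualifying threshold) -> $project (AVG) -> $sort -> $limit.
// Field names follow the slides; AtBats stands in for plate appearances.
const TEAM_GAMES = 162;
const QUALIFYING_THRESHOLD = Math.round(3.1 * TEAM_GAMES); // 502

function topBattingAverages(rows, year, limit = 25) {
  return rows
    .filter((r) => r.Year === year && r.AtBats >= QUALIFYING_THRESHOLD) // $match
    .map((r) => ({ PlayerId: r.PlayerId, AVG: r.Hits / r.AtBats }))     // $project
    .sort((a, b) => b.AVG - a.AVG)                                      // $sort: {AVG: -1}
    .slice(0, limit);                                                   // $limit
}

// Made-up rows: "x" has a great average but too few at bats to qualify,
// and "w" is from a different season.
const seasonRows = [
  { PlayerId: "x", Year: 2013, Hits: 20, AtBats: 50 },
  { PlayerId: "y", Year: 2013, Hits: 180, AtBats: 550 },
  { PlayerId: "z", Year: 2013, Hits: 170, AtBats: 510 },
  { PlayerId: "w", Year: 2012, Hits: 200, AtBats: 600 },
];
const leaders = topBattingAverages(seasonRows, 2013);
console.log(leaders.map((r) => r.PlayerId).join(", ")); // z, y
```

Without the threshold, player "x" (.400 in 50 at bats) would top the list, which is exactly the "incorrect results" problem the slide describes.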
  • 20.  In C# Using LINQ
  • 21.
    - Not truly aggregation pipeline in C#
      - Done on client, not server
      - Materialize on client with LINQ
    - Must use BsonDocument for aggregation pipeline
      - Yikes!
  • 22. Creating the $match BsonDocument

        var match = new BsonDocument {
            { "$match", new BsonDocument {
                { "Year", 2013 },
                { "AtBats", new BsonDocument { { "$gte", 502 } } }
            } }
        };
  • 23. Create the $project operation

        var project = new BsonDocument {
            { "$project", new BsonDocument {
                { "PlayerId", 1 },
                { "Year", 1 },
                { "TeamId", 1 },
                { "AVG", new BsonDocument {
                    { "$cond", new BsonDocument {
                        // compare against the number 0, not the string "0",
                        // otherwise the guard never fires for a numeric AtBats
                        { "if", new BsonDocument {
                            { "$eq", new BsonArray { "$AtBats", 0 } }
                        } },
                        { "then", 0 },
                        { "else", new BsonDocument {
                            { "$divide", new BsonArray { "$Hits", "$AtBats" } }
                        } }
                    } }
                } }
            } }
        };
  • 24. Create the $sort and $limit operations, then combine them all in an array

        var sort = new BsonDocument {
            { "$sort", new BsonDocument { { "AVG", -1 } } }
        };
        var limit = new BsonDocument { { "$limit", 25 } };
        return new[] { match, project, sort, limit };
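For comparison with the C# BsonDocument construction in slides 22 to 24, the same four stages can be written as plain JavaScript object literals, which is the shape the mongo shell accepts in db.batting.aggregate(...). The builder function and its parameters are hypothetical; it only assembles the stage documents and does not contact a server.

```javascript
// Builds the $match / $project / $sort / $limit stages as plain objects.
// Hypothetical helper: nothing here talks to MongoDB.
function buildTopAveragePipeline(year, minAtBats, limit) {
  const match = { $match: { Year: year, AtBats: { $gte: minAtBats } } };
  const project = {
    $project: {
      PlayerId: 1,
      Year: 1,
      TeamId: 1,
      AVG: {
        $cond: {
          if: { $eq: ["$AtBats", 0] }, // numeric 0, not the string "0"
          then: 0,
          else: { $divide: ["$Hits", "$AtBats"] },
        },
      },
    },
  };
  const sort = { $sort: { AVG: -1 } };
  return [match, project, sort, { $limit: limit }];
}

const stages = buildTopAveragePipeline(2013, 502, 25);
console.log(JSON.stringify(stages[0])); // {"$match":{"Year":2013,"AtBats":{"$gte":502}}}
```

In the shell, an array like this would be passed straight to db.batting.aggregate(stages). The nesting is the same as in C#, just without the BsonDocument and BsonArray wrappers.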
  • 25. All the { }s with BsonDocument and BsonArray remind me of…
  • 26. On-base percentage: “A measure of how often a batter reaches base for any reason other than a fielding error, fielder's choice, dropped/uncaught third strike, fielder's obstruction, or catcher's interference.” - Wikipedia (http://en.wikipedia.org/wiki/On-base_percentage)
    - Usually a better measure of a batter’s performance than straight average
    - (H + BB + HBP) / (AB + BB + HBP + SF)
  • 27. (Hits + BB + HBP) / (AB + BB + HBP + SF)

        db.batting.aggregate([
            {$match: { Year: 2013, AtBats: {$gte: 502} }},
            {$project: {
                PlayerId: 1, Year: 1, TeamId: 1,
                OBP: { $cond: {
                    if: {$eq: ["$AtBats", 0]},
                    then: 0,
                    else: { $divide: [
                        {$add: ["$Hits", "$BaseOnBalls", "$HitByPitch"]},
                        {$add: ["$AtBats", "$BaseOnBalls", "$HitByPitch", "$SacrificeFlies"]}
                    ]}
                }}
            }},
            {$sort: {OBP: -1}},
            {$limit: 25}
        ])
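The OBP formula from slides 26 and 27 can be checked by hand with a small JavaScript function over one batting row. The function name and sample numbers are made up. One design choice differs from the slide: this sketch guards on the whole denominator rather than on AtBats alone, so a batter with walks but no at bats gets a nonzero OBP, whereas the slide’s $cond returns 0 in that case.

```javascript
// On-base percentage = (H + BB + HBP) / (AB + BB + HBP + SF), using the
// field names from the slides' batting collection.
function onBasePercentage(row) {
  const timesOnBase = row.Hits + row.BaseOnBalls + row.HitByPitch;
  const denominator =
    row.AtBats + row.BaseOnBalls + row.HitByPitch + row.SacrificeFlies;
  // Guard on the full denominator instead of AtBats alone.
  return denominator === 0 ? 0 : timesOnBase / denominator;
}

// Made-up season line.
const obpSample = {
  Hits: 150, BaseOnBalls: 60, HitByPitch: 5, AtBats: 500, SacrificeFlies: 5,
};
console.log(onBasePercentage(obpSample).toFixed(3)); // 0.377
```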
  • 29. Runs Created (RC): an early SABRmetric-type stat, invented by Bill James

    “With regard to an offensive player, the first key question is how many runs have resulted from what he has done with the bat and on the basepaths. Willie McCovey hit .270 in his career, with 353 doubles, 46 triples, 521 home runs and 1,345 walks -- but his job was not to hit doubles, nor to hit singles, nor to hit triples, nor to draw walks or even hit home runs, but rather to put runs on the scoreboard. How many runs resulted from all of these things?” - Bill James (James, Bill (1985). The Bill James Historical Baseball Abstract (1st ed.), pp. 273-4. Villard. ISBN 0-394-53713-0)

    - ((H + BB) x TB) / (AB + BB)
    - Aggregated across a team, RC is usually within 5% of a team’s actual runs
  • 30. (Hits + Walks) * Total Bases / (At Bats + Walks)

        db.batting.aggregate([
            {$match: {Year: 2013, AtBats: {$gte: 502}}},
            {$project: {
                PlayerId: 1, Year: 1, TeamId: 1,
                RC: { $divide: [
                    {$multiply: [
                        {$add: ["$Hits", "$BaseOnBalls"]},
                        // total bases = H + 2B + 2*3B + 3*HR
                        {$add: ["$Hits", "$Doubles", "$Triples", "$Triples",
                                "$HomeRuns", "$HomeRuns", "$HomeRuns"]}
                    ]},
                    {$add: ["$AtBats", "$BaseOnBalls"]}
                ]}
            }},
            {$sort: {RC: -1}},
            {$limit: 25}
        ])
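To make the Runs Created arithmetic concrete, here is a JavaScript sketch of the same formula for a single row. Total bases are rebuilt the way the slide’s $add does it: adding Hits once already counts every hit as one base, so doubles add one more, triples two more, and home runs three more. The function names and sample numbers are hypothetical.

```javascript
// Basic Runs Created: ((H + BB) * TB) / (AB + BB).
// TB = H + 2B + 2*3B + 3*HR, which equals 1B + 2*2B + 3*3B + 4*HR.
function totalBases(row) {
  return row.Hits + row.Doubles + 2 * row.Triples + 3 * row.HomeRuns;
}

function runsCreated(row) {
  const opportunities = row.AtBats + row.BaseOnBalls;
  if (opportunities === 0) return 0; // avoid dividing by zero
  return ((row.Hits + row.BaseOnBalls) * totalBases(row)) / opportunities;
}

// Made-up season line.
const rcSample = {
  Hits: 150, BaseOnBalls: 50, Doubles: 30, Triples: 5, HomeRuns: 20, AtBats: 500,
};
console.log(totalBases(rcSample));             // 250
console.log(runsCreated(rcSample).toFixed(1)); // 90.9
```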
  • 32.

        db.batting.aggregate([
            {$match: {Year: 2013}},
            {$group: {
                _id: "$TeamId",
                Hits: {$sum: "$Hits"},
                Walks: {$sum: "$BaseOnBalls"},
                Doubles: {$sum: "$Doubles"},
                Triples: {$sum: "$Triples"},
                HR: {$sum: "$HomeRuns"},
                AtBats: {$sum: "$AtBats"}
            }},
            {$project: {
                RC: { $divide: [
                    {$multiply: [
                        {$add: ["$Hits", "$Walks"]},
                        {$add: ["$Hits", "$Doubles", "$Triples", "$Triples", "$HR", "$HR", "$HR"]}
                    ]},
                    {$add: ["$AtBats", "$Walks"]}
                ]}
            }},
            {$sort: {RC: -1}}
        ])
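The team-level query above can be mirrored in memory to see what $group does: it buckets rows by TeamId and sums each counting stat, and only then is RC computed on the team totals. This is a hypothetical client-side sketch with made-up team IDs and numbers, not a description of how the server implements $group.

```javascript
// In-memory sketch of: $match on Year -> $group by TeamId with $sum
// accumulators -> $project RC -> $sort RC descending.
function teamRunsCreated(rows, year) {
  const totals = new Map();
  for (const r of rows) {
    if (r.Year !== year) continue; // $match
    const t = totals.get(r.TeamId) ??
      { Hits: 0, Walks: 0, Doubles: 0, Triples: 0, HR: 0, AtBats: 0 };
    // $group with $sum accumulators
    t.Hits += r.Hits;
    t.Walks += r.BaseOnBalls;
    t.Doubles += r.Doubles;
    t.Triples += r.Triples;
    t.HR += r.HomeRuns;
    t.AtBats += r.AtBats;
    totals.set(r.TeamId, t);
  }
  return [...totals]
    .map(([teamId, t]) => {
      const tb = t.Hits + t.Doubles + 2 * t.Triples + 3 * t.HR;
      const opportunities = t.AtBats + t.Walks;
      const rc = opportunities === 0 ? 0 : ((t.Hits + t.Walks) * tb) / opportunities;
      return { _id: teamId, RC: rc }; // $project keeps _id by default
    })
    .sort((a, b) => b.RC - a.RC); // $sort: {RC: -1}
}

// Made-up rows for hypothetical teams "A", "B", and "C".
const teamRows = [
  { TeamId: "A", Year: 2013, Hits: 100, BaseOnBalls: 20, Doubles: 20, Triples: 0, HomeRuns: 10, AtBats: 400 },
  { TeamId: "A", Year: 2013, Hits: 50, BaseOnBalls: 10, Doubles: 10, Triples: 2, HomeRuns: 5, AtBats: 200 },
  { TeamId: "B", Year: 2013, Hits: 80, BaseOnBalls: 20, Doubles: 10, Triples: 1, HomeRuns: 4, AtBats: 300 },
  { TeamId: "C", Year: 2012, Hits: 90, BaseOnBalls: 30, Doubles: 15, Triples: 3, HomeRuns: 8, AtBats: 350 },
];
const teamRC = teamRunsCreated(teamRows, 2013);
console.log(teamRC.map((t) => `${t._id}: ${t.RC.toFixed(1)}`).join(", ")); // A: 65.4, B: 32.5
```

Note that RC is computed from summed totals, not by adding up each player’s individual RC. That is why the $group stage has to run before $project.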
  • 34.
    - Babe Ruth was the highest-paid player in the ’20s ($80K in ’30/’31)
    - Babe and Ty Cobb were the highest paid in 1920 at $20K
    - Joe DiMaggio was the highest paid in 1950 ($100K)
    - Nolan Ryan made $1M in 1980 (the first time a player did)
    - Albert Belle made $10M in 1997
      - In 1999, made ~$12M (more than the entire Pirates payroll)
    - 2001: A-Rod made $22M
    - 2009: A-Rod made $33M
  • 35.
    - Hoerster copyrighted statistic
    - Compares the value of each base produced by a hitter
    - Who are the most expensive players?
  • 36.
    - Takes total bases
      - Hits + Doubles + (Triples x 2) + (HR x 3) + SB + BB + HBP - CS
    - Divides salary by it, giving the cost of each base
    - Definitely not predictive
    - More of a value statistic
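A minimal JavaScript sketch of this salary-per-base idea, assuming the statistic is salary divided by the bases total (the original slide wording, “divides salary into it”, is read here as cost per base, so higher means more expensive). The function names, the salary parameter, and the sample numbers are hypothetical; the batting field names follow the other slides.

```javascript
// Bases produced, per the slide's formula:
// Hits + Doubles + (Triples x 2) + (HR x 3) + SB + BB + HBP - CS
function basesProduced(row) {
  return (
    row.Hits + row.Doubles + 2 * row.Triples + 3 * row.HomeRuns +
    row.StolenBases + row.BaseOnBalls + row.HitByPitch - row.CaughtStealing
  );
}

// ASSUMPTION: the statistic is salary / bases, i.e. dollars per base.
// A player with no positive bases total is treated as infinitely expensive.
function costPerBase(row, salary) {
  const bases = basesProduced(row);
  return bases > 0 ? salary / bases : Infinity;
}

// Made-up season line and salary.
const costSample = {
  Hits: 150, Doubles: 30, Triples: 5, HomeRuns: 20,
  StolenBases: 10, BaseOnBalls: 50, HitByPitch: 5, CaughtStealing: 3,
};
console.log(basesProduced(costSample));          // 312
console.log(costPerBase(costSample, 3120000));   // 10000
```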
  • 37. Weighted on-base average (wOBA) “is a statistic, created by Tom Tango and based on linear regression, designed to measure a player's overall offensive contributions per plate appearance.” - Wikipedia (http://en.wikipedia.org/wiki/Weighted_on-base_average)
    - Weights each offensive component with a factor
    - ((wBB*BB) + (wHBP*HBP) + (wH*Hits) + (w2B*2B) + (w3B*3B) + (wHR*HR) + (wSB*SB) + (wCS*CS)) / (AB + BB + HBP + SF - IBB)
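  • 38.

        var woba = db.WOBALookup.findOne({_id: 2013});
        db.batting.aggregate([
            {$match: {Year: woba._id}},
            {$redact: {
                $cond: {
                    if: {$gte: ["$AtBats", 502]},
                    then: "$$KEEP",
                    else: "$$PRUNE"
                }
            }},
            {$project: {
                Year: 1, PlayerId: 1, TeamId: 1,
                // carried through so the wRAA step can compute plate appearances
                AtBats: 1, BaseOnBalls: 1, HitByPitch: 1, SacrificeFlies: 1, SacrificeHits: 1,
                WOBA: { $divide: [
                    {$add: [
                        {$multiply: [woba.wBB, "$BaseOnBalls"]},
                        {$multiply: [woba.wHBP, "$HitByPitch"]},
                        {$multiply: [woba.w1B, "$Hits"]},
                        {$multiply: [woba.w2B, "$Doubles"]},
                        {$multiply: [woba.w3B, "$Triples"]},
                        {$multiply: [woba.wHR, "$HomeRuns"]},
                        {$multiply: [woba.runSB, "$StolenBases"]},
                        {$multiply: [woba.runCS, "$CaughtStealing"]}
                    ]},
                    // AB + BB + HBP + SF - IBB; "$IntentionalWalks" is an assumed field name
                    {$subtract: [
                        {$add: ["$AtBats", "$BaseOnBalls", "$HitByPitch", "$SacrificeFlies"]},
                        "$IntentionalWalks"
                    ]}
                ]}
            }},
            // sort before limiting, so the 25 kept are the top 25
            {$sort: {WOBA: -1}},
            {$limit: 25},
            // read by the wRAA step as db.TopWOBA2013
            {$out: "TopWOBA2013"}
        ])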
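The wOBA ratio on slide 37 can be computed for one row in plain JavaScript to check the arithmetic outside MongoDB. This sketch follows the slide 37 formula (wH applied to all hits) and uses short, hypothetical field keys (BB, HBP, Hits, Doubles, Triples, HR, SB, CS, AB, SF, IBB) rather than the collection’s field names. The weights change every season and in the deck come from WOBALookup; the ones below are placeholders, not real factors.

```javascript
// wOBA per the slide formula:
// (wBB*BB + wHBP*HBP + wH*Hits + w2B*2B + w3B*3B + wHR*HR + wSB*SB + wCS*CS)
//   / (AB + BB + HBP + SF - IBB)
function weightedOnBaseAverage(s, w) {
  const numerator =
    w.wBB * s.BB + w.wHBP * s.HBP + w.wH * s.Hits + w.w2B * s.Doubles +
    w.w3B * s.Triples + w.wHR * s.HR + w.wSB * s.SB + w.wCS * s.CS;
  const denominator = s.AB + s.BB + s.HBP + s.SF - s.IBB;
  return denominator > 0 ? numerator / denominator : 0;
}

// Placeholder weights (made up, NOT real season factors).
const placeholderWeights = {
  wBB: 0.7, wHBP: 0.7, wH: 0.9, w2B: 0.4, w3B: 0.7, wHR: 1.1, wSB: 0.2, wCS: -0.4,
};
// Made-up season line using the hypothetical short keys.
const wobaSample = {
  BB: 10, HBP: 2, Hits: 50, Doubles: 10, Triples: 1, HR: 5,
  SB: 3, CS: 1, AB: 200, SF: 3, IBB: 5,
};
console.log(weightedOnBaseAverage(wobaSample, placeholderWeights).toFixed(3));
```

Subtracting IBB in the denominator mirrors the numerator: intentional walks are excluded from the opportunities a hitter is credited with.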
  • 40. wRAA (weighted Runs Above Average)
    - Calculates, on average, how many more runs a player generates than the average player in the league
    - Uses wOBA as a primary factor in the calculation
    - This then gets figured into the overall WAR of a player
    - Good description here: http://www.baseball-reference.com/about/war_explained_wraa.shtml
  • 41.

        var woba = db.WOBALookup.findOne({_id: 2013});
        db.TopWOBA2013.aggregate([
            {$match: {Year: woba._id}},
            {$project: {
                Year: 1, PlayerId: 1, TeamId: 1,
                wRAA: { $multiply: [
                    {$divide: [{$subtract: ["$WOBA", woba.wOBA]}, woba.wOBAScale]},
                    // plate appearances
                    {$add: ["$AtBats", "$BaseOnBalls", "$HitByPitch",
                            "$SacrificeFlies", "$SacrificeHits"]}
                ]}
            }},
            {$sort: {wRAA: -1}},
            {$out: "TopWRAA2013"}
        ]);
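The wRAA calculation above reduces to a short formula: the player’s wOBA minus the league wOBA, divided by the wOBA scale, times plate appearances. This JavaScript sketch reproduces it for one row. The league wOBA and scale values are placeholders (in the deck they come from woba.wOBA and woba.wOBAScale in WOBALookup), and the row is made up.

```javascript
// Plate appearances as summed in the slide's $add: AB + BB + HBP + SF + SH.
function plateAppearances(r) {
  return r.AtBats + r.BaseOnBalls + r.HitByPitch + r.SacrificeFlies + r.SacrificeHits;
}

// wRAA = ((wOBA - leagueWOBA) / wOBAScale) * PA
function weightedRunsAboveAverage(r, leagueWOBA, wOBAScale) {
  return ((r.WOBA - leagueWOBA) / wOBAScale) * plateAppearances(r);
}

// Made-up row; league wOBA and scale below are placeholders, not real factors.
const wraaSample = {
  WOBA: 0.4, AtBats: 540, BaseOnBalls: 50, HitByPitch: 5, SacrificeFlies: 3, SacrificeHits: 2,
};
console.log(plateAppearances(wraaSample));                                // 600
console.log(weightedRunsAboveAverage(wraaSample, 0.3, 1.25).toFixed(1));  // 48.0
```

A league-average hitter comes out at exactly 0, and the PA multiplier means playing time magnifies both above- and below-average production.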
  • 43.
    - Much of the aggregation pipeline in Mongo can be done with LINQ
      - But it will be client-side, not in Mongo!
    - Take advantage of $out for intermediary tables during processing
      - Stage your operations
      - Maybe intermediary tables can be reused for other calcs
    - $group ids can be multi-valued
      - The id ends up as a sub-document and must be referenced accordingly
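The point about multi-valued $group ids can be illustrated with an in-memory JavaScript sketch. When the grouping key is { TeamId, Year }, each result’s _id is itself a sub-document, so code (or a later pipeline stage, via "$_id.TeamId") has to reach into it. The function name and rows are hypothetical, and JSON.stringify is just one simple way to get a value-comparable Map key.

```javascript
// Groups on two fields, like a $group with _id: { TeamId: "$TeamId", Year: "$Year" }.
// Each result's _id is a sub-document: read it as g._id.TeamId / g._id.Year.
function sumHitsByTeamAndYear(rows) {
  const groups = new Map();
  for (const r of rows) {
    // Map compares object keys by reference, so build a string key instead.
    const key = JSON.stringify([r.TeamId, r.Year]);
    if (!groups.has(key)) {
      groups.set(key, { _id: { TeamId: r.TeamId, Year: r.Year }, Hits: 0 });
    }
    groups.get(key).Hits += r.Hits;
  }
  return [...groups.values()];
}

// Made-up rows.
const hitRows = [
  { TeamId: "PIT", Year: 2013, Hits: 100 },
  { TeamId: "PIT", Year: 2013, Hits: 40 },
  { TeamId: "PIT", Year: 2012, Hits: 90 },
];
const byTeamYear = sumHitsByTeamAndYear(hitRows);
for (const g of byTeamYear) {
  console.log(`${g._id.TeamId} ${g._id.Year}: ${g.Hits}`);
}
// PIT 2013: 140
// PIT 2012: 90
```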
  • 44.
    - Sean Lahman’s Baseball Database: http://seanlahman.com/baseball-archive/statistics/
    - Society for American Baseball Research: http://sabr.org/
    - wOBA Annual Factors: http://www.beyondtheboxscore.com/2011/1/4/1912914/custom-woba-and-linear-weights-through-2010-baseball-databank-data
    - Tom Tango’s Blog: http://espn.go.com/blog/statsinfo/tag/_/name/tom-tango
    - Annual Salary Leaders, 1874 – 2012: http://sabr.org/research/mlbs-annual-salary-leaders-1874-2012