Fantasy Football
Fisher
Paul Singman
Insight Data Engineering Fellow
October 2016
Motivation
● Current offerings:
○ Season long
○ Daily
● What’s missing?
○ Micro-leagues
1 Day | Full Season | Sub 1 minute
Data Simulation and Ingestion
• Play-by-play files obtained from
SportRadar API
• Plays simulated at rate of 300 /s
• K-P Producer sends onto Kafka queue
Example record:
{"player_name": "Marcus Mariota", "timestamp":
"2016-09-29_02:19:23", "touchdown": 0, "yards": 9,
"player_id": "7c16c04c-04de-41f3-ac16-ad6a9435e3f7",
"position": "QB"}
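
A minimal sketch of the simulation step, assuming each play is already a dict shaped like the example record. The `simulate` function and its `send` parameter are hypothetical names, not taken from the project; `send` stands in for whatever hands bytes to the Kafka producer.

```python
import json
import time

# Example play, copied from the record shown on the slide.
EXAMPLE_PLAY = {
    "player_name": "Marcus Mariota",
    "timestamp": "2016-09-29_02:19:23",
    "touchdown": 0,
    "yards": 9,
    "player_id": "7c16c04c-04de-41f3-ac16-ad6a9435e3f7",
    "position": "QB",
}


def simulate(plays, send, rate_per_sec=300):
    """Serialize each play to JSON and pass it to `send`, paced at rate_per_sec.

    `send` is any callable that accepts bytes (hypothetical hook for the producer).
    Returns the number of plays sent.
    """
    interval = 1.0 / rate_per_sec
    sent = 0
    for play in plays:
        started = time.monotonic()
        send(json.dumps(play).encode("utf-8"))
        sent += 1
        # Sleep off whatever remains of this play's time slot.
        remaining = interval - (time.monotonic() - started)
        if remaining > 0:
            time.sleep(remaining)
    return sent


# Assumed wiring with the kafka-python client (topic name "plays" is made up):
#   from kafka import KafkaProducer
#   producer = KafkaProducer(bootstrap_servers="localhost:9092")
#   simulate(plays, lambda payload: producer.send("plays", payload))
```

Passing the sender in as a callable keeps the pacing logic independent of the Kafka client, so it can be exercised with a plain list before pointing it at a real queue.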
Fantasy Football Fisher Architecture
Instances        Cost        Total
2 x 4 m4.large   $0.12 /hr   $0.96 /hr
Use of Windowed Streaming
30 sec 30 sec
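
To make the windowing idea concrete, here is a plain-Python sketch that groups records into 30-second windows and totals yards and touchdowns per player. It assumes non-overlapping (tumbling) windows aligned to the clock and UTC timestamps; the slide does not state either, and this is not the Spark Streaming API the project presumably used. The names `window_start` and `aggregate` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone

WINDOW_SECONDS = 30  # window length shown on the slide


def window_start(timestamp):
    """Return the epoch second at which this record's 30-second window begins."""
    # Timestamp format follows the example record: 2016-09-29_02:19:23
    dt = datetime.strptime(timestamp, "%Y-%m-%d_%H:%M:%S")
    epoch = int(dt.replace(tzinfo=timezone.utc).timestamp())  # UTC is an assumption
    return epoch - epoch % WINDOW_SECONDS


def aggregate(records):
    """Sum yards and touchdowns per (window, player_id)."""
    totals = defaultdict(lambda: {"yards": 0, "touchdowns": 0})
    for record in records:
        key = (window_start(record["timestamp"]), record["player_id"])
        totals[key]["yards"] += record["yards"]
        totals[key]["touchdowns"] += record["touchdown"]
    return dict(totals)
```

With this alignment, plays stamped 02:19:23 and 02:19:29 fall into the same window, while a play at 02:19:31 starts the next one.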
Optimizing Cassandra Reads, Writes, and Queries
○ Prepared Statements optimize query execution
○ Parsing and optimization occur on Cassandra node
○ Only the Query ID and variables are sent between Spark and Cassandra
during execution
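
A sketch of the prepared-statement pattern described above: prepare once, then execute many times with only the bound values changing. It assumes a session object with `prepare` and `execute` methods, as provided by the DataStax Python driver (`cassandra-driver`). The table `player_scores`, its columns, and the function `write_scores` are hypothetical and not taken from the project.

```python
# Hypothetical table and columns, for illustration only.
INSERT_CQL = (
    "INSERT INTO player_scores (player_id, window_start, yards) "
    "VALUES (?, ?, ?)"
)


def write_scores(session, rows):
    """Write (player_id, window_start, yards) rows using one prepared statement.

    The statement is parsed once by `prepare`; each `execute` call then
    sends only the statement's ID and the bound values.
    """
    prepared = session.prepare(INSERT_CQL)
    written = 0
    for player_id, window_start, yards in rows:
        session.execute(prepared, (player_id, window_start, yards))
        written += 1
    return written


# Assumed wiring with cassandra-driver (host and keyspace names are made up):
#   from cassandra.cluster import Cluster
#   session = Cluster(["127.0.0.1"]).connect("fantasy")
#   write_scores(session, [("7c16c04c-04de-41f3-ac16-ad6a9435e3f7", 1475115540, 14)])
```

The key point is that `prepare` sits outside the loop; preparing inside the loop would repeat the parsing work the slide says to avoid.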
• Bachelor of Science in Stats
from Penn
● Shelf full of O’Reilly books
● Serial online course taker
(and completer)
• Jr Data Engineer experience
at early-stage startup
(Mighty)
• Enjoy movies,
backgammon, and rooftop
yoga
