Fantasy Football Fisher
Paul Singman
Insight Data Engineering Fellow
October 2016
Motivation
• Leverage distributed, open-source technologies to fill gaps in current fantasy football offerings by providing 30-second “micro-leagues”
Data Simulation and Ingestion
• 100 plays per second are simulated from JSON files of NFL plays pulled from the SportRadar API
• Each JSON play is parsed to extract the fields shown below and sent to a Kafka topic
Example record:
{"player_name": "Marcus Mariota",
 "timestamp": "2016-09-29_02:19:23",
 "touchdown": 0,
 "yards": 9,
 "player_id": "7c16c04c-04de-41f3-ac16-ad6a9435e3f7",
 "position": "QB"}
Fantasy Football Fisher Architecture
Instances          Cost per instance   Total
2 x 4 m4.large     $0.12/hr            $0.96/hr
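The Total column is the instance count times the hourly rate (8 × $0.12 = $0.96). A quick check, extended to a 30-day month; the month length is an assumption for illustration, and on-demand prices change over time.

```python
INSTANCES = 2 * 4          # "2 x 4" m4.large instances, from the slide
RATE_PER_HOUR = 0.12       # USD per instance-hour, from the slide
HOURS_PER_MONTH = 24 * 30  # assumption: a 30-day month


def hourly_cost(instances: int = INSTANCES, rate: float = RATE_PER_HOUR) -> float:
    """Cluster cost per hour: instance count times per-instance rate."""
    return instances * rate


def monthly_cost(instances: int = INSTANCES, rate: float = RATE_PER_HOUR,
                 hours: int = HOURS_PER_MONTH) -> float:
    """Cost of running the whole cluster continuously for `hours` hours."""
    return hourly_cost(instances, rate) * hours


print(f"${hourly_cost():.2f}/hr, ${monthly_cost():.2f} per 30 days")
# prints: $0.96/hr, $691.20 per 30 days
```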
Use of Windowed Streaming
[Diagram: windowed stream, with intervals labeled 30 sec and 30 sec]
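The slides show 30-second windows but not the aggregation itself. Below is a plain-Python sketch of tumbling 30-second windows (window length equal to the slide interval, an assumption read from the two "30 sec" labels) that sums per-player fantasy points from records like the example record. The scoring rule, the UTC interpretation of timestamps, and the function names are hypothetical.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone

WINDOW_SECONDS = 30  # window length from the slides; slide interval assumed equal


def fantasy_points(record: dict) -> float:
    # Hypothetical scoring rule for illustration: 1 pt per 10 yards, 6 per touchdown
    return record["yards"] / 10 + record["touchdown"] * 6


def window_start(ts: str) -> int:
    # Parse the record's timestamp format, treating it as UTC (assumption)
    dt = datetime.strptime(ts, "%Y-%m-%d_%H:%M:%S").replace(tzinfo=timezone.utc)
    epoch = int(dt.timestamp())
    # Tumbling windows: each timestamp falls into exactly one 30 s bucket
    return epoch - epoch % WINDOW_SECONDS


def points_per_window(lines):
    """Sum fantasy points per player_id within each 30-second window."""
    totals = defaultdict(lambda: defaultdict(float))
    for line in lines:
        rec = json.loads(line)
        totals[window_start(rec["timestamp"])][rec["player_id"]] += fantasy_points(rec)
    # Convert nested defaultdicts to plain dicts for easy inspection
    return {start: dict(players) for start, players in totals.items()}
```

In PySpark's DStream API a comparable aggregation could be written as `pairs.reduceByKeyAndWindow(lambda a, b: a + b, None, 30, 30)` on a stream of `(player_id, points)` pairs; the slides do not confirm that the project used this exact call.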
Parameter Tuning
● Keep the streaming application stable
● Process data as quickly as it arrives (each batch must finish within the batch interval)
Ideally, batch interval = blockInterval × number of partitions
≈ 200 ms × 12 to 15 partitions = 2.4 to 3 seconds
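The sizing rule above can be checked with a few lines of arithmetic. This sketch assumes Spark Streaming's default block interval of 200 ms (`spark.streaming.blockInterval`) and a receiver-based input stream, where each receiver produces roughly batch interval ÷ block interval partitions per batch; the helper names are hypothetical.

```python
BLOCK_INTERVAL_MS = 200  # Spark Streaming default for spark.streaming.blockInterval
WINDOW_MS = 30_000       # 30-second micro-league window


def batch_interval_ms(num_partitions: int, block_interval_ms: int = BLOCK_INTERVAL_MS) -> int:
    # batch interval = blockInterval * number of partitions
    return block_interval_ms * num_partitions


def partitions_per_batch(batch_ms: int, block_interval_ms: int = BLOCK_INTERVAL_MS) -> int:
    # Inverse: how many blocks (tasks) one receiver produces per batch
    return batch_ms // block_interval_ms


def keeps_up(processing_ms: float, batch_ms: int) -> bool:
    # Stable only if each batch finishes before the next one is due
    return processing_ms < batch_ms


def divides_window(batch_ms: int, window_ms: int = WINDOW_MS) -> bool:
    # Window and slide durations must be multiples of the batch interval
    return window_ms % batch_ms == 0


for n in (12, 15):
    b = batch_interval_ms(n)
    print(f"{n} partitions -> {b / 1000} s batch, fits 30 s window: {divides_window(b)}")
# prints:
# 12 partitions -> 2.4 s batch, fits 30 s window: False
# 15 partitions -> 3.0 s batch, fits 30 s window: True
```

Because Spark requires window and slide durations to be multiples of the batch interval, a 3 s batch divides the 30-second window evenly while 2.4 s does not. In PySpark the batch interval is the `batchDuration` argument (in seconds) of `StreamingContext`.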
• Bachelor of Science in Statistics from Penn
● Shelf full of O’Reilly books
● Serial online course taker (and completer)
• Junior data engineer experience at an early-stage startup (Mighty)
• Enjoy movies, backgammon, and rooftop yoga
