
RStudio Conference


RStudio Conf 2018: data science presentation by Marco Blume.

Published in: Education


  1. Training an Army of Data Scientists
     Marco Blume – Trading Director, marco.blume@pinnacle.com
     RStudio Conf 2018
  2. Who is Pinnacle?
  3. • Large online international sportsbook
     • ~450 employees in 6 offices
     • Been around 20 years!
     • Unique model that relies heavily on data science
     • Risk Management, Trading
     • Similar to Financial Markets
  4. Who is Pinnacle?
     Several packages on CRAN related to our domain
         • Pinnacle.API
         • Odds.Converter
         • Pinnacle.Data
         • Other open source contributions
     Avid users of R technologies and RStudio products
         • RStudio Server Pro
         • RStudio Connect
         • Tidyverse!
         • RMarkdown
         • On the bleeding edge of R community users
  5. Why an Army of Data Scientists?
     Very complex modelling problems
         • Sports Models
         • Trading Algorithms
         • High transactional systems
         • Professional Algorithm Developers and Data Scientists
     Every aspect of the business needs to be data-driven
         • Finance / Payment providers
         • Marketing
         • Customer Service
         • Business to Business
         • Many “micro-problems” to solve, not enough Data Scientists
  6. Training an Army of Data Scientists
     Our Idea:
         • Every department needs Data Scientists
         • Focus on Tidyverse
         • Offer internal and external training to the entire company (around 450 staff)
         • Train Junior Data Scientists to do data analysis and produce RMD to communicate results
  7. Training an Army of Data Scientists
     Our Target Audience
         • Many non-technical employees in various positions
         • Never written a line of code
         • Many without college degrees
         • Example: 15 years in Customer Service
         • Similar talks, such as Mine’s keynote at UseR 2017, focus on more technical students
  8. Training an Army of Data Scientists
     Our Approach:
         • DataCamp as the basis for external training, with a defined curriculum
         • Internal training with 4 levels, based on Master the Tidyverse by Garrett
  9. DataCamp
     Why we like it:
         • Self-paced
         • Quality instructors and content
         • Many topics
         • Micro-Courses
     BUT…
         • For us, the curriculum was not ordered well
         • We defined our own DataCamp curriculum chapter by chapter
  10. Level 1: DataCamp – Current Curriculum
      Time: 8 hrs.
          • Introduction to R
              • Ch. 3 Matrices
              • Ch. 4 Factors
              • Ch. 6 Lists
          • Introduction to the tidyverse
  11. Level 2: DataCamp – Current Curriculum
      Time: 18 hrs.
          • Data Visualization with ggplot2 (Part 1)
              • Ch. 3 qplot and wrap-up
          • Data Manipulation in R with dplyr
          • Importing data in R Part 1
              • Ch. 1 Importing data from flat files with utils
              • Ch. 4 Reproducible Excel work with XLConnect
          • Introduction to R
              • Ch. 4 Factors
          • Working with the RStudio IDE Part 1
          • Importing and Cleaning Data in R case studies
  12. Level 3: DataCamp – Current Curriculum
      Time: 25 hrs.
          • Data Visualization with ggplot2 (Part 2)
          • Cleaning Data in R
          • Reporting with R Markdown
              • Ch. 4 Configuring R Markdown (optional)
          • Introduction to R
              • Ch. 3 Matrices
              • Ch. 6 Lists
          • Working with the RStudio IDE Part 2
          • Intermediate R
          • Exploratory data analysis in R case study
  13. Level 4: DataCamp – Current Curriculum
      Time: 25 hrs.
          • Joining Data in R with dplyr
          • Intermediate R Practice
          • String Manipulation in R with stringr
          • Data Visualization with ggplot2 (Part 3)
          • Writing Functions in R
          • Case study
      Level 5:
          • With the help of a mentor, you can develop a capstone project that results in an R Markdown document or a Shiny application.
  14. DataCamp - Lessons Learned
  15. DataCamp - Lessons Learned
  16. DataCamp - Lessons Learned
  17. DataCamp - Lessons Learned
      DataCamp “ReadCamp” package
      Available on GitHub: https://github.com/marcoblume/readcamp
  18. Additional Internal Support
      • Community of R experts eager to help
          • #r-programming: ~100 users
      • Many internal packages
          • ggplot theme / RMD template
      • RStudio Server Pro
          • Admins can fix difficult install / config issues for users
          • Basic environment works out of the box
  19. Lessons Learned
      • RStudio Server Pro
          • Allows us to set up / manage the environment for Junior DS
          • Control access to data / audit
      • RStudio Connect
          • Easy deployment / sharing
      • Anyone can become a Junior Data Scientist – any background
      • Motivation is key (use FUN datasets, not mpg / iris)
      • Experts / previous trainees helping
      • Internal ecosystem of packages to build upon
  20. Lessons Learned
      • Focus on TIDYVERSE only
      • ggplot very important to master
      • RMD is central to our business now
          • Common template and theme make it easier to read and interpret
      • Communication is key
      • Wrappers around data
          • No SQL required
      • Customize curriculum based on feedback and business needs
  21. Success Stories
      “About a year ago, I was offered the possibility to enroll in a paid-by-the-company R training. Being the kind of person who likes going beyond the so-called comfort zone, I decided to take on the challenge. I come from a humanistic background and math was never my favorite subject in school. After some time learning R, I realized that it is not that different from learning any other language. I usually tell myself: ‘If you were able to learn Russian, you are for sure able to learn R!’”
  22. Success Stories
      “I was a CSD manager in Pinnacle for 15 years until I was offered a new position as a Junior BI Analyst. I did not doubt to accept the new post as it gave me the opportunity to pursue a new career. I feel excited about starting this new path. The combination of my expertise within the CSD department and the R-tools that I am learning to use will help me analyze data in a more efficient way. I look forward to continue learning and becoming a better analyst!”
  23. Contact us – We are Hiring!
      Email: recruitment@pinnacle.com
      Twitter: @PinnacleSports
