Applied AI Tech Talk: How to Set Up a Data Science Dept

  1. How to Set Up a Data Science Business Function
     Applied AI Tech Talk, Jun 2015, www.applied.ai
  2. This is a totally biased talk
     ● We are data scientists:
       ○ variously quants, statisticians, actuarial & machine learning types
     ● We are consultants:
       ○ we do complex data analysis, predictive modelling, etc.
       ○ and we also help with the soft stuff...
     … enabling companies to learn from their data in a sustainable way.
  3. How to Set Up a Data Science Business Function, a.k.a. Making In-house Data Science Sustainable
     "Like any collaborative business effort involving research & development, a data science function should be built carefully in order to enable the best expertise and technologies."
     - Me, ~2 weeks ago, http://blog.applied.ai/how-to-build-a-data-science-business-function/
  4. Data Science is a broad discipline
     ● Including, for example:
       ○ one-off, scenario-specific modelling exercises
       ○ on-line predictive modelling of user actions
       ○ regular analysis of campaigns and customer discovery
     … and a significant amount of data acquisition, preparation, storage, etc.
  5. The most important thing is communication
     ● To be sustainable and minimise risk, we need to combine:
       ○ great people
       ○ advanced maths
       ○ scientific experimentation
       ○ software engineering
       ○ high-quality data
       ○ solid business practices
       ○ communication
     https://www.quora.com/How-could-the-Data-Science-Venn-Diagram-be-improved
  6. Four main areas to cover:
     1. Setting up and sizing the team
     2. Defining and operating projects
     3. Systemising the data pipeline and analyses
     4. Ensuring effective communication
     … to help us make in-house Data Science sustainable.
  7. Area 1: Setting up and sizing the team
     Data Scientists need a lot of skills!
     ● The practitioner will use a wide variety of tools to:
       ○ acquire, manipulate, store and access data efficiently
       ○ design surveys and scientific experiments to test hypotheses
       ○ undertake statistically valid analyses
       ○ implement high-quality, optimised predictive models
       ○ derive and communicate actionable insights
     … requiring diverse skills covering database management, software engineering, statistical analysis, machine learning, graphic design, ethics, social responsibility, domain knowledge and communication.
  8. Area 1: Setting up and sizing the team
     Don't believe in unicorns
     ● The days of hiring a single, unicorn-like, 'full-stack' data scientist are pretty much gone, and such unicorns probably never really existed.
  9. Area 1: Setting up and sizing the team
     Start with a small, focused team
     The team needs to be small, agile and focused:
     ● 2-6 data scientists is ample
     ● they should be proven generalists, team players and pragmatists
     ● they must be able to cope with vague requirements, messy data and high failure rates
     "The first hire(s) should help get three things ready: your data; a clear problem to be solved; and a process to evaluate the business impact of any new solution."
     - Simon Chan, Forbes, April 2015, http://www.forbes.com/sites/theyec/2015/04/30/how-to-do-your-first-data-science-hire-right/
  10. Area 2: Defining and operating projects
      Any piece of research or development likely to last more than a few days and/or involve more than one person should have:
      ● A primary sponsor and a project leader
      ● A well-defined (SMART) goal and a written spec
      ● Progress meetings to validate and update the plan, with full and frank communication between major stakeholders
      ● Knowledge sharing upon completion
      ● Ideally, a basic RACI matrix and a risks & issues register
  11. Area 3: Systemising the data pipeline and analyses
      Automate good workflows and deal with technical debt:
      ● Understand and map the data 'pipeline'
      ● Stop when the models are good enough
      ● Encourage a systematic, shared approach to the creation of all machine learning tools and analyses, with:
        ○ proper source control and documentation
        ○ code reviews & 'lunch and learn' seminar sessions
        ○ regular refactoring of algorithms, applications and data preparation scripts where appropriate.
  12. Area 4: Ensuring effective communication
      Strong communication within and outside the team is vital, helping to ensure that projects stay on track and issues are spotted early:
      ● Daily stand-up meetings (<10 mins), sharing immediate activities & issues
      ● An up-to-date communal task schedule, e.g. using the Kanban methodology
      ● Simplified and centralised comms tech: move written discussions away from email and towards wikis, message boards and group chats such as Slack
      ● Try to allow data scientists / software engineers the time & space to get into a productive flow state without meetings and interruptions.
  13. In review
      ● Start with a small team of capable generalists and work hard to define the business problems and success criteria, set timescales, and understand & access the available data
      ● Allow for and embrace failure; give data scientists time and space to research and experiment
      ● Specialise when necessary, automate where possible, and embed the work into an ongoing cycle of development, maintenance and support
      ● Require a corporate sponsor with clout, and encourage strong communication within the team and with the rest of the business
      http://blog.applied.ai/how-to-build-a-data-science-business-function/
  14. Thank You. Any questions?
      Applied AI is a data science consultancy: we provide data-driven insights and solutions using applied artificial intelligence.
      www.applied.ai
