From Data Science to Production - deploy, scale, enjoy! / PyData Amsterdam - Mar 12, 2016

Data cleaning is the first step of every data science project; the data science itself comes next. This talk covers the step that is often missing: deploying and scaling data applications in production. We go through the major steps of the process, such as Dockerizing an application, continuous deployment with AWS stack creation, and rolling deploys, and also cover new trends in serverless architecture.

Published in: Data & Analytics

  1. 1. From Data Science to Production: deploy, scale, enjoy! Sergii Khomenko, Data Scientist, sergii.khomenko@stylight.com, @lc0d3r. PyData Amsterdam, March 12, 2016
  2. 2. Sergii Khomenko: Data scientist at Stylight, one of the biggest fashion communities. Data analysis and visualisation hobbyist who works on problems not only at work but also in his free time, for fun and for personal data visualisations. Originally from a computer engineering background. Speaker at Berlin Buzzwords 2014, ApacheCon Europe 2014, Puppet Camp London 2015, Berlin Buzzwords 2015, Tableau Conference on Tour 2015, Budapest BI Forum 2015, Crunchsconf 2015, FOSDEM 2016
  3. 3. Fellow DevOps: Quentin Nerden, Milos Radovanovic, Patrick Roelke
  4. 4. Stylight – Make Style Happen. Core Target Group: Stylight helps aspiring women between 18 and 35 to evolve their style through shoppable inspiration. Profitable Leads: Stylight provides its partners with high-quality leads, enabling partner shops to leverage Stylight as an ROI-positive traffic channel. Inspiration: Stylight offers shoppable inspiration that makes it easy to know what to buy and how to style it. Branding & Reach: Stylight offers a unique opportunity for brands to reach an audience that is actively looking for style online. Shopping: Stylight helps users search and shop fashion and lifestyle products smarter across hundreds of shops.
  5. 5. Stylight – acting on a global scale
  6. 6. Experienced & Ambitious Team: an innovative cross-functional organisation with a flat hierarchy builds a unique team spirit. • 200+ employees • 40 PhDs/Engineers • 28 years average age • 63% female • 23 nationalities • 0 suits
  7. 7. Data Scientist: Person who is better at statistics than any software engineer and better at software engineering than any statistician.
  8. 8. Agenda: • Early days of startups • Software engineering • Immutable infrastructure • Serverless architecture
  9. 9. The Early Days of Startups
  10. 10. Problem definition: • Many different technologies • Hard to reproduce data science results • Issues with backward compatibility • Dependency hell • Hard to scale products • Hard to on-board new people
  11. 11.
  12. 12. Software engineering (built circa 2015-16)
  13. 13. Our stack
  14. 14.
  15. 15. You are most likely doing it already: • Version control • Cover code with tests • nosetests, pytest, unittest2 - start small with doctests - try out TDD: rednose, nose-watch
  16. 16. You are most likely doing it already: • Cover code with tests • yes, even your R application could have tests - testthat - devtools • Code reviews • Pair programming
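Starting small with doctests, as the slides above suggest, might look like the following sketch. The function `normalize` and its inputs are hypothetical and exist only for illustration; the `doctest` module is part of the Python standard library.

```python
import doctest


def normalize(values):
    """Scale a list of numbers so that they sum to 1.

    >>> normalize([1, 1, 2])
    [0.25, 0.25, 0.5]
    >>> normalize([])
    []
    """
    total = sum(values)
    if total == 0:
        # Avoid dividing by zero for empty or all-zero input.
        return [0.0 for _ in values]
    return [v / total for v in values]


if __name__ == "__main__":
    # Run every example embedded in the docstrings of this module.
    print(doctest.testmod())
```

The same examples can be picked up by a test runner, for instance with `python -m doctest yourfile.py` or `pytest --doctest-modules`, so the docstring doubles as documentation and as a first test.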
  17. 17. Some of the mentioned problems: • Many different technologies • Issues with backward compatibility • Dependency hell • Hard to on-board new people
  18. 18. (image from http://udaypal.com/)
  19. 19. (image from http://udaypal.com/)
  20. 20. (image from http://udaypal.com/)
  21. 21. Some of the mentioned problems: • Many different technologies • Issues with backward compatibility • Dependency hell • Hard to on-board new people
  22. 22. How it could help: • Every technology has its own container - just docker run • Every package has its version defined in the Dockerfile - have a base image for more advanced cases • New people - just docker run
  23. 23. r-base/Dockerfile (image from http://udaypal.com/)
  24. 24. lc0/docker-shiny-server (image from http://udaypal.com/)
  25. 25. (image from http://udaypal.com/)
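The idea of defining every package with its version in the Dockerfile can be sketched as a small generator that refuses unpinned dependencies. This is an illustration under assumptions, not the Dockerfiles shown in the talk: `render_dockerfile`, the base image tag, and the package pins are all hypothetical.

```python
import json


def render_dockerfile(base_image, pinned_packages, command):
    """Return Dockerfile text that installs exact package versions.

    All arguments are illustrative; nothing here comes from the talk's
    own Dockerfiles.
    """
    unpinned = sorted(name for name, version in pinned_packages.items() if not version)
    if unpinned:
        raise ValueError("no version pinned for: " + ", ".join(unpinned))
    requirements = " ".join(
        "{}=={}".format(name, version)
        for name, version in sorted(pinned_packages.items())
    )
    lines = [
        "FROM " + base_image,
        "RUN pip install --no-cache-dir " + requirements,
        "WORKDIR /app",
        "COPY . /app",
        # json.dumps produces the double-quoted exec form Docker expects.
        "CMD " + json.dumps(command),
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_dockerfile(
        "python:3.5-alpine",                       # hypothetical base image tag
        {"requests": "2.9.1", "flask": "0.10.1"},  # hypothetical pins
        ["python", "app.py"],
    ))
```

Because every version is explicit, two builds of the same file install the same dependencies, which is what makes `docker run` a workable on-boarding step.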
  26. 26. Known issues: • Images can be really huge • Skip anything you do not need • Alpine Linux as a base image • 5 MB base image (musl libc and BusyBox) • Iron.io has pre-built images based on Alpine • python, scala, java, elixir, etc.
  27. 27. Known issues: 16 MB, 232 MB
  28. 28. Some of the mentioned problems: • Hard to roll out • Hard to maintain production dependencies
  29. 29. AWS ECR (image from http://udaypal.com/)
  30. 30. (image from http://udaypal.com/)
  31. 31. CircleCI deployments (image from http://udaypal.com/)
  32. 32. CircleCI deployments (image from http://udaypal.com/)
  33. 33. CircleCI deployments (image from http://udaypal.com/)
  34. 34. CircleCI deployments (image from http://udaypal.com/)
  35. 35. Immutable infrastructure: Infrastructure as Code
  36. 36. Need to upgrade? No problem. Build a new, upgraded system and throw the old one away. New app revision? Same thing. Build a server (or image) with a new revision and throw away the old ones.
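The "build new, throw away old" principle combined with a rolling deploy can be illustrated with a small in-memory simulation. This is only a sketch: `Server`, `rolling_replace`, and the naming scheme for new servers are hypothetical, and no real infrastructure is touched.

```python
from collections import namedtuple
from itertools import count

# An immutable record: a server is never edited, only replaced.
Server = namedtuple("Server", ["name", "revision"])


def rolling_replace(fleet, new_revision, batch_size=1):
    """Yield the fleet after each batch of old servers has been replaced."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    fleet = list(fleet)
    stale = [s for s in fleet if s.revision != new_revision]
    ids = count(1)
    for start in range(0, len(stale), batch_size):
        batch = stale[start:start + batch_size]
        # Build fresh servers from the new revision...
        fresh = [
            Server("{}-{}".format(new_revision, next(ids)), new_revision)
            for _ in batch
        ]
        # ...then discard the old ones instead of upgrading them in place.
        fleet = [s for s in fleet if s not in batch] + fresh
        yield list(fleet)


if __name__ == "__main__":
    old_fleet = [Server("a", "v1"), Server("b", "v1"), Server("c", "v1")]
    for state in rolling_replace(old_fleet, "v2", batch_size=2):
        print([s.revision for s in state])
```

The fleet size stays constant after every step, so capacity is preserved while revisions change batch by batch.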
  37. 37.
  38. 38.
  39. 39.
  40. 40. CloudFormation
  41. 41. CloudFormation
  42. 42. cloudtools/troposphere
  43. 43. cloudtools/troposphere
  44. 44. cloudtools/troposphere
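Infrastructure as code with CloudFormation comes down to producing a JSON template. The sketch below builds a minimal one with the standard library only, so it runs without extra packages; troposphere, named on the slides above, generates the same kind of JSON from Python classes. The resource name `AppInstance`, the AMI id, and the instance type are placeholder assumptions, not values from the talk.

```python
import json


def build_template(ami_id, instance_type="t2.micro"):
    """Return a minimal CloudFormation template describing one EC2 instance."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Single application server (illustrative sketch)",
        "Resources": {
            "AppInstance": {  # hypothetical logical resource name
                "Type": "AWS::EC2::Instance",
                "Properties": {
                    "ImageId": ami_id,
                    "InstanceType": instance_type,
                },
            }
        },
        "Outputs": {
            # Ref on an EC2 instance resolves to its instance id.
            "InstanceId": {"Value": {"Ref": "AppInstance"}}
        },
    }


if __name__ == "__main__":
    # "ami-12345678" is a placeholder, not a real image id.
    print(json.dumps(build_template("ami-12345678"), indent=2, sort_keys=True))
```

Keeping the template in a function means a new application revision is just a new `ami_id` argument, which fits the immutable-infrastructure approach: the stack definition is regenerated rather than edited by hand.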
  45. 45. Terraform
  46. 46.
  47. 47. Terraform, Kubernetes and Docker {Swarm, Compose}
  48. 48. Serverless architecture
  49. 49.
  50. 50.
  51. 51.
  52. 52.
  53. 53.
  54. 54.
  55. 55.
  56. 56. Possibilities: • all Lambdas in one place with version control • integration tests with real events • proper CI/CD setup
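Testing a Lambda against events, as listed above, can be sketched as follows. The handler uses the standard `handler(event, context)` signature and the Kinesis event layout in which each record carries base64-encoded data under `record["kinesis"]["data"]`. The message schema (a JSON object with a `type` field) and the helper `make_kinesis_event` are assumptions for illustration; a real captured event could be stored as a JSON fixture and fed to the handler the same way.

```python
import base64
import json


def handler(event, context):
    """Decode Kinesis records from a Lambda event and count them by type."""
    counts = {}
    for record in event.get("Records", []):
        payload = base64.b64decode(record["kinesis"]["data"])
        message = json.loads(payload.decode("utf-8"))
        event_type = message.get("type", "unknown")  # hypothetical schema
        counts[event_type] = counts.get(event_type, 0) + 1
    return counts


def make_kinesis_event(messages):
    """Build an event shaped like the one Lambda receives from Kinesis."""
    return {
        "Records": [
            {
                "eventSource": "aws:kinesis",
                "kinesis": {
                    "data": base64.b64encode(
                        json.dumps(message).encode("utf-8")
                    ).decode("ascii")
                },
            }
            for message in messages
        ]
    }


if __name__ == "__main__":
    event = make_kinesis_event([{"type": "click"}, {"type": "view"}])
    print(handler(event, None))
```

Because the handler is a plain function, it can run in the same CI pipeline as the rest of the code before any deployment happens.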
  57. 57. CircleCI deployments
  58. 58. CircleCI deployments
  59. 59. CircleCI deployments
  60. 60. Cloud functions
  61. 61. Use-case of outlier detection
  62. 62.
  63. 63. Custom unification pipeline; departments; Business Intelligence; internal processes; variety of event types and structures
  64. 64. Outlier detection to Slack
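The slides do not say which detection method is used, so the sketch below swaps in a common robust choice, the modified z-score based on the median absolute deviation (MAD), and formats the result as a Slack incoming-webhook payload. The function names, the metric name, the threshold of 3.5, and the sample numbers are all assumptions for illustration; the webhook URL must be supplied by the caller.

```python
import json
import urllib.request
from statistics import median


def find_outliers(values, threshold=3.5):
    """Return values whose modified z-score exceeds the threshold."""
    if not values:
        return []
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread at all, so no point can be scored against it.
        return []
    return [v for v in values if abs(0.6745 * (v - med) / mad) > threshold]


def slack_payload(metric, outliers):
    """Serialize a message in the JSON shape Slack incoming webhooks accept."""
    text = "Outliers detected for {}: {}".format(
        metric, ", ".join(str(v) for v in outliers)
    )
    return json.dumps({"text": text})


def post_to_slack(webhook_url, payload):
    """Send the payload to a webhook URL provided by the caller."""
    request = urllib.request.Request(
        webhook_url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    return urllib.request.urlopen(request)


if __name__ == "__main__":
    hourly_counts = [10, 11, 9, 10, 12, 10, 11, 95]  # made-up sample data
    found = find_outliers(hourly_counts)
    if found:
        print(slack_payload("events per hour", found))
```

The script only prints the payload; calling `post_to_slack` with a real webhook URL is left to the deployment, for example inside a Lambda handler.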
  65. 65. www.stylight.com sergii.khomenko@stylight.com @lc0d3r
  66. 66. Related links: 1. Testing Your Code - The Hitchhiker's Guide to Python 2. https://hub.docker.com/_/r-base/ 3. http://www.alpinelinux.org/ 4. https://github.com/iron-io/dockers 5. Docker Hub: A new stack plus ecosystem partners automate developer workflows 6. Trash Your Servers and Burn Your Code: Immutable Infrastructure and Disposable Components
  67. 67. Related links: 7. https://github.com/cloudtools/troposphere 8. CloudFormation UpdatePolicy Attribute 9. https://www.terraform.io/ 10. (Docker Compose + Docker Swarm) or Kubernetes 11. Google Cloud Functions 12. https://github.com/apex/apex 13. Streaming Data Processing with Amazon Kinesis and AWS Lambda
  68. 68.
  69. 69.