I know what you did last session - clustering users with ML


This presentation was given by our Lead Data Scientist, Adauto Braz, sharing some insights about our clustering project, which used unsupervised learning algorithms such as DBSCAN and K-Means.

Published in: Data & Analytics

  1. 1. I know what you did last session: clustering users with ML
  2. 2. About me: Adauto Braz, Data Scientist at Stoodi (in/adauto-braz-687735a0/)
  3. 3. Data Science doesn’t have to be cool, to rule my world. - Prince
  4. 4. Data Science projects don’t need to use the latest technology to deliver value. There is low-hanging fruit everywhere.
  5. 5. This is a platform’s, platform’s world. - James Brown
  6. 6. Platforms!
  8. 8. Tell me what you want, what you really, really want. - Spice Girls
  9. 9. What are your objectives? What data is available? Is your data consistent and correct?
  10. 10. ● Aimed for simplicity ● Understandable results ● Is the data redundant? ○ PCA: 2 features could explain >80% of the variance
  11. 11. Hey I just met you, And this is crazy But here’s my data. Run an algorithm on it, maybe? - Carly Rae Jepsen
  12. 12. DBSCAN ● Density-based spatial clustering of applications with noise. ● Two parameters: ○ How close do the points have to be? ○ How many points should a cluster have at least?
  13. 13. Nobody said clustering was easy. No one ever said it would be this hard. - Coldplay
  14. 14. DBSCAN ● Outliers ● Different densities
  15. 15. KMeans ● One parameter: ○ How many clusters do you want to find? ● We don’t know! ○ Tested k from 2 up to 20 ● Doesn’t leave outliers: every point is assigned to a cluster
  16. 16. How will I know? Don’t trust your feelings. Clustering can be deceiving. - Whitney Houston
  17. 17. Inertia: how far are the points within a cluster from its centroid?
  18. 18. Silhouette Index: how far are the points of one cluster from the other clusters?
  19. 19. ● Run for different cohorts ● Check consistency ● You’re done!
  20. 20.
        User  Videos started  Videos finished  Study plans configured  Exercises submitted  Cluster
        1     0               0                0                       0                    A
        2     1               0                0                       2                    B
        3     6               4                1                       3                    C
        4     3               1                0                       20                   D
  21. 21. So, so what? I’m a data star I got my data moves And I don’t need you - Pink
  22. 22. ● We need context to understand the results ● There is value! ● Five types of users
  23. 23. I wanna scream and shout and let it all out - Britney Spears
  24. 24. ● Turn your insights into knowledge ● Translate results to accessible language
  25. 25. On the first day after sign-up, there are five main user types: ● Ingrid, the Inactive ● Ellen, the Excited ● Alex, the Athlete ● Lucas, the Lost ● Nina, the Novice. Behaviors listed on the slide: ● Doesn’t do anything ● Just configures the study plan ● Focus on exercises: 25+ exercises ● Tests a bit of the platform: 2 videos and 1 exercise ● Really engaged: study plan + 8 videos + 8 exercises
  26. 26. ● Palpable impact ● Data Science team wins!
  27. 27. Thank U, Next Adauto Braz
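
The DBSCAN and KMeans slides (12 to 18) can be sketched in code. What follows is a minimal, standard-library-only Python illustration on synthetic data, not the code used in the talk. The toy blobs, the values `eps=1.5` and `min_pts=5`, the farthest-first initialisation, and all function names are assumptions made for this example. It runs DBSCAN to flag isolated points as noise, then runs K-Means for k from 2 up to 20 and records the inertia and the mean silhouette for each k.

```python
import math
import random

NOISE = -1  # label used in this sketch for points that belong to no cluster


def dbscan(points, eps, min_pts):
    """Density-based clustering: eps = how close, min_pts = how many points."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable, but not dense
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nearby = neighbors(j)
            if len(nearby) >= min_pts:
                queue.extend(nearby)  # core point: keep expanding the cluster
    return labels


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (labels, inertia)."""
    # Assumption: deterministic farthest-first initialisation, for repeatability.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))

    def assign():
        return [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]

    for _ in range(max_iter):
        labels = assign()
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            mean = tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[c]
            updated.append(mean)
        if updated == centroids:
            break
        centroids = updated
    labels = assign()
    # Inertia: sum of squared distances from each point to its centroid.
    inertia = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, inertia


def silhouette(points, labels):
    """Mean of (b - a) / max(a, b) over all points."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton clusters contribute 0 by convention
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)  # own-cluster distance
        b = min(sum(math.dist(p, q) for q in other) / len(other)  # nearest other cluster
                for m, other in clusters.items() if m != lab)
        total += (b - a) / max(a, b)
    return total / len(points)


# Synthetic data (assumption): three tight blobs plus two isolated points.
rng = random.Random(0)
centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
blobs = [(cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3)) for cx, cy in centers for _ in range(30)]
outliers = [(20.0, 20.0), (-10.0, 15.0)]

dbscan_labels = dbscan(blobs + outliers, eps=1.5, min_pts=5)
print("DBSCAN clusters:", len(set(dbscan_labels) - {NOISE}), "noise points:", dbscan_labels.count(NOISE))

results = {}  # k -> (inertia, mean silhouette)
for k in range(2, 21):
    labels, inertia = kmeans(blobs, k)
    results[k] = (inertia, silhouette(blobs, labels))
best_k = max(results, key=lambda k: results[k][1])
print("k with the highest silhouette:", best_k)
```

In practice a library implementation, such as scikit-learn's `DBSCAN` and `KMeans`, would normally replace these hand-written functions. Inertia tends to keep shrinking as k grows, so it is usually read by looking for an elbow, while the silhouette gives a value that can be compared directly across different k.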