
Enterprise Data Science at Scale @ Princeton, NJ 14-Nov-2017


  1. Enterprise Data Science at Scale: Introducing Data Science Experience (DSX). Future of Data – Princeton Meetup, 14-November-2017. © Hortonworks Inc. 2011 – 2017. All Rights Reserved.
  2. Presenter: Tim Spann
  3. IBM + Hortonworks = Unlocking Actionable Insights
       • #1 Pure Open Source Hadoop Distribution
       • 1000+ customers and 2100+ ecosystem partners
       • Employs the original architects, developers and operators of Hadoop from Yahoo!
       • Best-in-class 24x7 customer support
       • Leading professional services and training
       • #1 Data Science Platform (Source: Gartner)
       • OpenPOWER performance leadership
       • Flexible, software-defined storage
       • #1 SQL Engine for complex, analytical workloads
       • Leader in On-premise and Hybrid Cloud solutions
  4. Data Science In Action: The Team and The Process
       • Data Scientists: responsible for “The Math”
       • Data Engineers: responsible for “The Data”
       • Business Analyst: responsible for “The Business”
       • Corporate IT: responsible for “Technology”
  5. Data Science Challenges
     The Team
       • Data Scientists: “I like my own tools.” “How can I productionize my model?”
       • Data Engineers: “I need a central place for data.” “How can I efficiently transform data?”
       • Business Analyst: “I need to visualize the shape of data.” “How can we fail fast and prototype quickly?”
       • Corporate IT: “How do I govern and secure this?” “I can’t support all of these tools.”
     The Process
       • Productionizing with data
       • So many tools & limited compute resources
       • Data discovery
       • Model deterioration & data evolution
  6. The IBM + HWK Data Science Experience
     The Team
       • Data Scientists: tools (RStudio, Jupyter, Zeppelin, H2O, etc.); model management
       • Data Engineers: place all data assets in one place; productionize models with REST endpoints
       • Business Analyst: rich data visualization; community and collaboration of knowledge
       • Corporate IT: run secure & governed data science; one experience to support many tools
     The Process
       • Collaboration
       • Community
  7. Data Science Solution (Data Science Experience)
     Community
       • Find tutorials and datasets
       • Connect with Data Scientists
       • Ask questions
       • Read articles and papers
       • Fork and share projects
     Open Source
       • Code in Scala/Python/R/SQL
       • Zeppelin & Jupyter Notebooks
       • RStudio IDE and Shiny
       • Apache Spark
       • Your favorite libraries
     Scale & Enterprise Security
       • Data Science at Scale
       • Run Spark Jobs on HDP Cluster
       • Secure Hadoop Support
       • Ranger Atlas Support for Data
       • Support for ABAC
     Model Management
       • Data Shaping Pipeline UI
       • Auto-data preparation & modeling
       • Advanced Visualizations
       • Model management & deployment
       • Documented Model APIs
  8. DEMO
  9. Use Case: Customer Churn Architecture
       • All industries are affected by churn.
       • Being able to predict churn helps companies take action and keep customers longer.
       • The more historical data, the better the model.
       • Data is collected and labeled over time based on churn.
       • Using a Random Forest, we will predict future churners.
  10. Demo Scenario: Assessing Customer Churn Probability in Real Time
       • Stored long-term data on customer churn behavior
       • New real-time data coming in
       • Predict a customer's churn probability before they churn
       • Alert the proper departments or manager
       • Business monitors customer retention outlook & performance
  11. Demo Scenario: Problems Solved
       • Data scientists collaborate and learn new tools & frameworks
       • Choice of tools, notebooks and languages
       • Run your favorite notebook on all data in the HDP Cluster
       • Deploy the model to production
       • Leverage the production model to deliver insights to the business
       • Monitor models and retrain them as new data comes in
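
The churn scenario in slides 9 to 11 (train on labeled history, score new customers, alert on high churn probability) can be sketched roughly as follows. This is not the demo's code: it is a minimal, standard-library-only stand-in that bags one-split decision stumps over random feature subsets, where a real project would use a library Random Forest. The feature names, the synthetic history, the customer IDs and the 0.5 alert threshold are all assumptions made up for illustration.

```python
import random

# Hypothetical feature layout for each customer row (an assumption).
FEATURES = ("tenure_months", "support_calls", "monthly_charge")
ALERT_THRESHOLD = 0.5  # assumed cut-off for raising an alert


def gini(labels):
    """Gini impurity of a list of 0/1 churn labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def fit_stump(rows, labels, features):
    """Pick the single-feature threshold with the lowest weighted Gini impurity."""
    base_rate = sum(labels) / len(labels)
    best = (float("inf"), None, None, base_rate, base_rate)
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best[0]:
                best = (score, f, t, sum(left) / len(left), sum(right) / len(right))
    return best[1:]  # (feature index, threshold, churn rate left, churn rate right)


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Bag stumps: each one sees a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        feats = rng.sample(range(len(rows[0])), 2)      # random feature subset
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return forest


def churn_probability(forest, row):
    """Average the leaf churn rates that each stump assigns to this customer."""
    votes = []
    for f, t, left_rate, right_rate in forest:
        if f is None:  # stump found no usable split; fall back to its base rate
            votes.append(left_rate)
        else:
            votes.append(left_rate if row[f] <= t else right_rate)
    return sum(votes) / len(votes)


# Synthetic labeled history (an assumption): churners have short tenure,
# many support calls and high charges; retained customers are the opposite.
history = [(1 + i, 5 + i, 90 + i) for i in range(10)] + \
          [(24 + i, i % 2, 40 + i) for i in range(10)]
churned = [1] * 10 + [0] * 10

forest = fit_forest(history, churned)

# "New real-time data coming in": score each customer and collect alerts.
new_customers = {"cust_a": (1, 14, 99), "cust_b": (33, 0, 40)}
scores = {cid: churn_probability(forest, row) for cid, row in new_customers.items()}
alerts = [cid for cid, p in scores.items() if p >= ALERT_THRESHOLD]
for cid in alerts:
    print(f"ALERT: {cid} churn probability {scores[cid]:.2f}")
```

Averaging leaf churn rates rather than hard votes gives a probability that can be thresholded per department. The "monitor and retrain" step from slide 11 would amount to calling `fit_forest` again once newly labeled history is available.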
