Turn Data Into Actionable Insights - StampedeCon 2016

  1. Turn Data into Actionable Insights!!
  2. About me: Vishnu Alavur Kannan, Analytics Technical Platforms Lead • 15+ years in IT, software engineer @heart • Led engineering teams throughout my career • A platform is just vaporware without passionate people • A players make all the difference in software engineering ✓ 50:1, 100:1, rarely so in any other profession @Monsanto for two reasons: • I strongly believe in our commitment to sustainable agriculture • I am able to do top-flight engineering R&D ✓ Complex engineering challenges keep me going ✓ Freedom to operate: o Use the right tool for the right job o Solve problems using cutting-edge technologies o Open-source friendly, open environment to contribute back
  3. Monsanto: A Sustainable Agriculture Company • Bringing a broad range of solutions to help nourish our growing world • Collaborating to help tackle some of the world’s biggest challenges • >20,000 employees in 66 countries • >50% of employees based outside of the United States • One of the 25 World’s Best Multinational Workplaces by Great Place to Work Institute
  4. Our systems approach integrates technology platforms to maximize farmer effectiveness Crop Protection • Weed Control (Roundup® Branded Agricultural Herbicides) • Insect Control • Disease Control Breeding • Stress Tolerance • Disease Control • Yield • Vegetables, corn, cotton, soybeans, wheat, canola Biologicals • Weed Control • Insect Control • Virus Control • Plant Health Biotechnology • Weed Control • Insect Control • Stress Tolerance • Yield / Yield Protection • Nutrients Data Science • Planting Script Creator • Increased production • Efficient land and water use • Efficient nutrient use
  5. https://www.youtube.com/watch?v=l5Tw0PGcyN0 Why do you do what you do?! What’s the purpose? How do you do what you do? What the hell do you do? THE GOLDEN CIRCLE Simon Sinek
  6. Identify the signals from the noise @SCALE Volume: DATA AT SCALE Variety: VARIOUS FORMS OF DATA Velocity: STREAMING IOT Veracity: DATA UNCERTAINTY • Digital media: 280 exabytes • FB: 300+ petabytes per day • Points, polygons, rasters, vectors • Connecting data across sources is where analysts spend most of their time • Sensors are the de facto way to gather data and detect anomalies across domains * Information from multiple sources is adapted and incorporated
  7. Monsanto re-inventing Agriculture through Analytics Other providers: Cost, Quality, Agility • No hardware administration, less software administration • Eleven 9’s of data durability • Harness state-of-the-art software services • DevOps moving towards NoOps • Provision infrastructure in seconds: infrastructure as code, automation • Grow or shrink compute to match seasonal workloads and pay smartly as we go Scale: MON has ~10^16+ bytes of data and growing rapidly • Global presence: taking data-driven products & services closer to business • Ability to accelerate feature development, integrating analytics rapidly into our workflows @scale • Ingest, store & retrieve massive data sets, using the right data store to our competitive advantage (NoSQL/SQL) • Service diversity, organizational maturity IOT, Imagery, Geo-spatial, Genomics, Molecular Breeding…..
  8. Vision A year ago as we started…
  9. Enable Analytics @SCALE for the Enterprise • Integrated • Extended • Enhanced • Scalable • Reliable • Open [Diagram labels: Field, Devices, Apps, Data, Models, Business Unit-1, Business Unit-2, Business Unit-3, Digital Business]
  10. Integrate Analytics with Product Platforms Data Data Science@scale Analytical Models Turn Data Into Actionable Insights …. …. APIs Data
  11. Predictive Product Placement @scale PFO PFO Topography Site boundary Zones Experiment metadata Planter A/B line Automap Elevation Soil Weather Topography Zones Location Data Assets Geo-spatial Catalog
  12. Analytics as a Service In Collaboration with IT & Business Scale across teams internalizing a self-service model
  13. Internalize the needs to stay ahead of the curve: addressing analytics needs based on persona • Descriptive: What happened? • Diagnostic: Why did it happen? • Predictive: What will happen? • Prescriptive: What should I do? • Cognitive: What can be learnt? Hindsight, Insight, Foresight [Diagram labels: 10’s K of users, 1’s K, 100’s, Science@Scale, Information Pro-Consumers, Information Consumers, Data Scientists, Business Users, Business Analysts, Statisticians, Business Intelligence, Ad-hoc Analysis, Statistical Analysis, Data Discovery, Reports, Dashboards, Drill Down, Machine Learning, Inferential, Causal, Exploratory, Machine, Power Users, 10’s, Computational Biologists, Neural Networks, Outsight, Systems, Natural Language Processing]
  14. Discovery Analytics – Development Environments Non-prime Exploratory Prime R & D Development Environments @SCALE • Big-data Infra. & DevOps • Data Provisioning @scale • Model Deployments @scale • Big-data workloads • Computational pipelines • Transformation pipelines • Training pipelines • Sizing & Auto-scaling • Cloud best practices • 24/7 availability • Monitoring • Alerting • ELK stack • …. Analytical models @SCALE • Co-engineering • Involve us sooner • Thinking scale ahead, accelerating Time to Market • Model development & refactoring • R, ASReml, Python, OPL… • Java, Scala, Clojure… • Infrastructure as code • AWS, GCP, AzureML • Docker, Kubernetes • Distributed computing • Architecture • Solutions Design • Development • API integrations • Kafka integrations • OAuth2 integrations • Security/ISO collaborations Build it once, deploy frameworks as needed for user groups: bundled in a centralized eco-system Non-prod to Prod BLUE / GREEN
  15. Discovery Analytics Development Environments: Data Scientists, Developers and Novice Users • From Discovery to Production • Culture, approach and adoption • Know Your Users • For Community, By Community • Tailor by Needs • Balance Freedom with Governance • Drive User Adoption • Environments iteratively served to everyone @monsanto • Enable analytical capabilities @scale for the enterprise, integrated with Product Platforms • As of today, # of unique data scientists across groups utilizing our discovery analytics environments • Model maturity • Global Scalability • Core teams: Train the trainee to share knowledge and best practices utilizing the environment
  16. Business Capabilities Make the platform robust, sharing a few use cases
  17. Environmental Classification @scale Engineered using Discovery Analytics - Development Environment Data Provisioning APIs Data Transformation QA/QC Rules Scala Python Scikit API API
  18. Molecular Breeding: Training Pipeline @Scale • Collaboration with Data Science teams: co-engineering an R-based machine learning model into a Scala-based model training pipeline for scalability • EMR (Amazon Hadoop) & DataProc (GCP) using the Apache Spark computation engine @scale • Iterative ON-DEMAND framework, auto-scaling up to N nodes • Training pipeline integration with APIs & co-engineering continuum [Diagram labels: Data, Learner, Model 1] Engineered using Discovery Analytics - Development Environment
  19. Cognitive Analytics Pipeline • Collaboration with the Cognitive Analytics data science team to build an integrated Predictive Product Pipeline from inception to commercialization Built on: • Apache Airflow (incubating): DAG-based model chaining & workflow management platform • Models written in Python, R • Parallelism achieved via Celery workers • Being customized now to utilize Spark • Apache Parquet: columnar storage format on a file system; extremely parallelizable • Facebook Presto query engine to query Parquet files via SQL through REST APIs; highly performant • Cloud Analytics platform integration • Co-engineering solutions @scale, mining millions of data points to derive actionable insights [Diagram labels: Workflow, DAGs, Libraries] Engineered using Discovery Analytics - Development Environment
  20. Deep learning @SCALE: Discovery Analytics Development Environments integrated with CloudML on GCP • First-ever Deep Learning platform for the Enterprise • Perform Deep Learning @scale on CloudML using TensorFlow via Jupyter from the Prime environment • Integrated with data, inputs, outputs and metadata, including TensorBoard to monitor your model training runs Training Pipeline stages: Collect, Store, Train, Predict, Evaluate, Retrain [Diagram labels: Discovery Analytics - Workflow, Production Deployment - Workflow]
  21. DATA INGESTION AND TRANSFORMATION VIA APIs AND STREAMS Streaming Business Intelligence RUN ANALYTICS@SCALE IN THE CLOUD Collaborative Data Science - DISCOVERY ANALYTICS DATA DRIVEN PRODUCTS Kafka Streams Data Warehouse * Big-data Model outputs via APIs & Streams In-house/Third-Party Platforms: AWS, GCP, Cloudera, DataStax, IBM, Azure, Domino labs… Prescriptive Predictive Cognitive Historical Models - Deep Learning, Computational Pipelines, Classification & Simulation Engines Turn Data into Actionable Insights
  22. Our Journey of Transformation We have just scratched the surface: • Science@scale: our Cloud Analytics Platform is only a year old • Talent, Behavior and Platform as our 3 key pillars of focus • Talent: • Building a big-data and cloud analytics engineering team from the ground up: 150+ interviews, a 15-person team now • Targeting A players, nurturing the team on new technologies, building leaders • Behavior/Cultural mind shift: Data Science & IT Engineering operating as ONE TEAM • Two extreme ends of the spectrum • Finding the sweet spot in the middle has been the cultural shift • Data science teams have been very supportive, adapting to change • Bringing in IT best practices: Agile methodologies, versioning, CI…. • Train the trainee approach to enable adoption across the enterprise • Leverage the best of both worlds by co-engineering solutions • Collaboration is our new competitive advantage • Platform: We are at ground zero now, continuing to deliver Minimum Viable Products each sprint • Continue to mature & stay cutting edge on technologies • Build vs. Buy [Cost, Time, Quality] • Miles to go before we sleep
  23. https://www.youtube.com/watch?v=l5Tw0PGcyN0 Why do you do what you do?! What’s the purpose? How do you do what you do? What the hell do you do? THE GOLDEN CIRCLE Simon Sinek • Help identify the signals from the noise @scale An Enterprise Cloud Analytics platform to serve: • Analytics as a service enabling Discovery Analytics environments for the data science community • Predictive, prescriptive, streaming, cognitive, IOT edge analytical capabilities @scale • Big Data Cloud Analytics Engineering • Internalize data science needs thinking scale ahead
  24. Thank You 
 Visit us at engineering.monsanto.com
 
 We are looking for passionate big data cloud analytics engineers to join our team.
 
 https://www.linkedin.com/in/vishnukannan
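
Illustrative sketch for slide 19 (DAG-based model chaining): the snippet below is not the Airflow pipeline from the talk and does not use Apache Airflow at all. It is a minimal stand-in, assuming only Python's standard-library graphlib module (Python 3.9+), to show the idea of running chained steps in dependency order and handing each step the outputs of its upstream steps. The task names (load, score, report) and the run_pipeline helper are hypothetical.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, dependencies):
    """Run every task after all of its upstream tasks and collect the results.

    tasks:        maps a task name to a callable that receives a dict of
                  upstream results keyed by upstream task name.
    dependencies: maps a task name to the set of task names it depends on.
    """
    results = {}
    # static_order() yields each node only after all of its predecessors.
    for name in TopologicalSorter(dependencies).static_order():
        upstream = {dep: results[dep] for dep in dependencies.get(name, ())}
        results[name] = tasks[name](upstream)
    return results


# Hypothetical three-step chain: load -> score -> report.
tasks = {
    "load": lambda up: [1.0, 2.0, 3.0],
    "score": lambda up: [2 * x for x in up["load"]],
    "report": lambda up: sum(up["score"]),
}
dependencies = {"load": set(), "score": {"load"}, "report": {"score"}}

results = run_pipeline(tasks, dependencies)
print(results["report"])
```

A workflow manager such as the one named on slide 19 adds scheduling, retries and distributed workers on top of this ordering idea; the sketch runs everything sequentially in one process.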