Data Science with Teams
Improve the efficiency of your data science teams with platforms that enhance collaboration and flexibility
Agenda
● Some Background
● Goals
● Data Science Project Teams
● Challenges
● Some Solutions
● Conclusions
Background
Integration experience with Oil & Gas, Financial, Insurance and Retail industries in multiple geographies
What did these customers have in common? All had data science teams that worked in Silos
Difficulties when taking a data science course
Source: Wikipedia
Background (cont)
What’s going on here?
We started to do some digging!
Source: http://cesarsway.com
Data Teams - The Old Way
Department Teams Data Scientist
Data Analyst IT Manager
The Analytics Deliverable
A dashboard! An interactive dashboard is even cooler.
Conway’s Data Scientist Venn Diagram
Data Science Teams - The New Way
Data Scientist
Finance Manager
Accountant
Tax and Compliance
Treasury
Data Gurus:
- Analytics
- Data Engineers
- Business Intelligence
- Compliance
IT Manager
The Data Science Deliverable
A machine or deep learning model!
I Want GPUs - And I Just Want Them to Work
Workaround for the NVIDIA Docker wrapper:
- nvidia-docker run -d -p 8888:8888 tensorflow/tensorflow:latest-gpu
OR
- docker run -ti --rm `curl -s http://localhost:3476/docker/cli` tensorflow/tensorflow:latest-gpu
OR
- docker run -ti --rm --volume-driver=nvidia-docker --volume=nvidia_driver_375.82:/usr/local/nvidia:ro --device=/dev/nvidiactl --device=/dev/nvidia-uvm --device=/dev/nvidia0 nvidia/cuda nvidia-smi
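The third workaround on this slide spells out by hand what the wrapper otherwise injects: a driver volume plus one `--device` flag per NVIDIA device node. A minimal Python sketch of assembling that argument list is shown here. The function name `gpu_docker_command` and the `gpu_count` parameter are hypothetical, and the sketch assumes additional GPUs appear as `/dev/nvidia1`, `/dev/nvidia2`, and so on; the driver volume name is copied from the slide and will differ per host.

```python
import shlex


def gpu_docker_command(image, command, driver_volume, gpu_count=1):
    """Build a manual `docker run` argument list like the one on the slide."""
    args = [
        "docker", "run", "-ti", "--rm",
        "--volume-driver=nvidia-docker",
        # Mount the driver volume read-only at the path CUDA images expect.
        f"--volume={driver_volume}:/usr/local/nvidia:ro",
        "--device=/dev/nvidiactl",
        "--device=/dev/nvidia-uvm",
    ]
    # Assumption: one device node per GPU, numbered from 0.
    args += [f"--device=/dev/nvidia{i}" for i in range(gpu_count)]
    args += [image] + shlex.split(command)
    return args


# Driver volume name taken from the slide; it depends on the installed driver.
cmd = gpu_docker_command("nvidia/cuda", "nvidia-smi", "nvidia_driver_375.82")
print(shlex.join(cmd))
```

Building the list in code rather than pasting the string makes it harder to drop a `--device` flag when the number of GPUs changes, which is the kind of friction the slide title points at.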
The Need for DevOps Chops
Registrator
Docker Container
EC2 Instance
Reverse Proxy with consul-template
The old way... The new way...
Docker Container
EC2 Instance
Reverse Proxy with static upstream location name
$$$ $
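The two proxy setups on this slide differ in where the upstream list comes from: one hard-codes a static upstream name, the other regenerates the proxy configuration from a service registry as containers start and stop. The sketch below illustrates only the regeneration idea in Python. It does not use the Registrator or consul-template APIs; the `registry` dictionary, the service name `notebook`, and the addresses are hypothetical stand-ins for what a registry would report.

```python
def render_upstream(name, instances):
    """Render an nginx-style upstream block from (host, port) pairs."""
    lines = [f"upstream {name} {{"]
    # Sort so the rendered config is stable between runs.
    for host, port in sorted(instances):
        lines.append(f"    server {host}:{port};")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical snapshot of running containers, keyed by service name.
registry = {
    "notebook": [("10.0.0.12", 32768), ("10.0.0.7", 32771)],
}

for service, instances in registry.items():
    print(render_upstream(service, instances))
```

In a real deployment the template tool re-renders this block and reloads the proxy whenever the registry changes, so nobody edits upstream entries by hand.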
Infrastructure Management
Uh-oh, someone has to manage this stuff!
Our Architecture - API First and Microservices
3Blades Hub for Data Scientists
Solutions
● Provide flexibility with the tools that data scientists use for exploratory data analysis and visualizations
● One central source for project files with support for version control
● Share visualizations from EDA
● Train and save Machine Learning and Deep Learning models with multiple frameworks, from within the same project
● Streamline deployment pipelines
Thank You!!
Email: hello@3blades.io
Web: https://3blades.io
Twitter: @3bladesio
GitHub: https://github.com/3blades
Email: gwerner@3blades.io
Twitter: @gwerner
LinkedIn: https://www.linkedin.com/in/wernergreg
GitHub: https://github.com/jgwerner

Greg Werner, CEO & Founder, 3Blades.io at MLconf ATL 2017

Editor's Notes

  1. Talking points:
     - Siloed data initiatives are a common denominator
     - Data scientists were segregated from the rest of the organization
     - Tooling was disparate

     Initially, the need to streamline Jupyter Notebook deployments came up with a class of students, many of whom complained about the time and effort involved in setting up the specific dependencies needed to complete their tasks. Package managers were not enough: users also needed an integrated, consistent, and reliable environment in which to complete their assignments using Jupyter Notebooks. We also noticed that companies, in general, did not provide a homogeneous environment for their data science teams. This led to many headaches but was considered business as usual.
  2. Talking points:
     - Issues encountered in the education vertical were common across industries, i.e. too much time spent on configuration
     - Basic ROI calculations justified the implementation and support of a data science hub

     The data science platform “aha” moment came when pitching a solution to consolidate project workspace environments for different people across different organizations, in particular for exploratory data analysis (EDA). Educational institutions are usually constrained by budget; however, once we provided ROI numbers on how much time and effort teaching assistants (TAs) spent on technical support for their users, the decision to implement a data science platform was a no-brainer. We also suspected that enterprises (SMBs and large enterprises alike) were encountering the same challenges, exacerbated by the fact that more personas were involved within data science and analytics teams.
  3. Talking points:
     - Disparate teams
     - Data scientists siloed from the rest of the organization
     - Ultimate goal is to automate certain processes within the organization
     - Automation helps improve the top and bottom lines and improves competitiveness

     Organizations struggle to become ‘data driven’. What does that mean? Data-driven organizations are those that wish to use the data they have available to improve insights and become more competitive. Assuming the organization has successfully consolidated its data into central data warehouses or data lakes, and assuming this data is defined with standard schemas, data science and data analytics teams have the power to analyze the data, obtain valuable insights, and start improving the agility of their organizations with ‘prescriptive analytics’ and ‘predictive analytics’. Prescriptive analytics involves creating machine learning and deep learning models that automate certain business processes, such as:
     - Automatically tagging images with classification types (cat or not a cat)
     - Automatically scoring a customer with the probability that the customer will churn
     - Recommending value-added products to increase the checkout dollar amount at an e-commerce site
     - Spam or not spam classification

     However, organizations have struggled to integrate data scientists. Data science teams that just ‘do the math’ and create visualizations on an organization's data sets do not provide much value in and of themselves. Likewise, a machine learning model that automatically recommends a product that is not strategic to the organization does not provide much value.
  4. Talking points:
     - Dashboards democratize data so that team members can quickly absorb meaningful insights and key performance indicators.
     - Exploratory data analysis (EDA) and model creation/deployment are not really part of the picture.

     Traditional Business Intelligence tools have been around for years. Some tools offer specific integrations with a variety of data sources and allow users to quickly create rich, interactive visualizations of their data. SQL, a language made popular by relational databases, is widely used for analytics. New developments, such as in-memory calculations and GPU-powered databases, help shorten the time from data source to dashboard. Big data tools such as Hadoop and Spark allow users to create dashboards from large data sets. However, BI tools traditionally rely on structured data. Also, traditional dashboards do not address how to create machine learning and deep learning models.
  5. Just a review of a Data Scientist’s skill set.
  6. Talking points:
     - Organizations realize they need to automate their processes and that automation must come from real-time analysis of data points
     - The deliverable is not just a BI dashboard anymore; the deliverable is a deployable machine learning or deep learning model
     - Embedding a data science team member into the group increases value

     As mentioned, data science teams have historically been isolated from the rest of the organization. Successful data-driven organizations embed their data scientists into various business groups. For example, data extraction and loading into a warehouse table are done by engineering teams; however, a data science liaison, embedded within a certain department or relevant company-wide project, can help data engineers improve the schema definition for the data being exposed, which could save valuable time during the exploratory data analysis phase. Data engineers can create tables using their favorite Extract, Transform, and Load (ETL) tools to remove rows with not-a-number (NaN) values, remove irrelevant columns such as database primary and foreign keys (PK/FKs), etc. Conversely, the data scientist could help the person telling the data story (who could be anyone in the group, including herself) explain which features are relevant and how certain normalizations were done, without delving into the technical details: “This was the only customer that bought a widget in Atlanta, so the attributes for this person were adjusted to not skew the dataset in their favor”.
  7. Talking points:
     - Move from predictive to prescriptive analytics
     - Deliver a machine learning or deep learning model that will allow organizations to automate processes
     - Visualizations are still important, but are used to tell a data story during EDA and to visualize how models are behaving in real time

     Predictive analytics looks at historical trends in data to provide insights. Organization members are then tasked with optimizing processes to improve organizational results based on those trends. However, companies need to automate tasks (remove the human from the actual task execution) based on certain indicators. In this case, visualizations are used in EDA to better understand the data, with the goal of creating and deploying machine learning and deep learning models that can automate certain organizational processes.
  8. Talking points:
     - Support data imports from multiple sources
     - EDA is needed as the first step to build and deliver artifacts that automate business processes. Artifacts in this context are machine learning and deep learning models.
     - Data engineers and DevOps need access to the data science hub to streamline their own processes

     Traditional teams use Excel spreadsheets, among other tools, and send files back and forth through email, chat applications, or external project management solutions. Even when all users work within shared environments such as Google Docs or Office 365, teams have no way of sharing all files and tools within one common environment, particularly for exploratory data analysis, since viewing and editing files within these environments is constrained to a certain set of file formats. In addition, certain organizations and individuals prefer one language over another. For example, a data science team working with the Finance department may lean toward the R programming language, while a data science team working with the marketing department may lean toward Python. In both cases, users may use multiple tools for one language. For example, some individuals may prefer RStudio for R, and others may prefer using R with Jupyter Notebooks. Server management is important to optimize compute resources.
  9. Traditional teams use Excel spreadsheets, among other tools, and send files back and forth through email, chat applications, or external project management solutions. Even when all users work within shared environments such as Google Docs or Office 365, teams have no way of sharing all files and tools within one common environment, particularly for exploratory data analysis, since viewing and editing files within these environments is constrained to a certain set of file formats. In addition, certain organizations and individuals prefer one language over another. For example, a data science team working with the Finance department may lean toward the R programming language, while a data science team working with the marketing department may lean toward Python. In both cases, users may use multiple tools for one language. For example, some individuals may prefer RStudio for R, and others may prefer using R with Jupyter Notebooks.

     A central source for project files eases the burden of compliance requirements. Usually, data engineers (either due to security requirements or simply because they don’t want to surface multiple schemas/formats for different users) would rather deliver the data product to a ‘clean’ table, so data scientists can do their work using self-service approaches. Having a centrally managed set of files for specific projects also helps keep things organized when different users are accessing project files, so version control becomes important as well.
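
The notes describe data engineers delivering a ‘clean’ table by removing NaN rows and dropping irrelevant key columns before data scientists start EDA. A minimal sketch of that cleaning step, using only the Python standard library, might look like the following. The function name `clean_rows`, the column names, and the sample records are all hypothetical; a real pipeline would more likely run inside an ETL tool or a dataframe library.

```python
import math


def clean_rows(rows, drop_columns):
    """Drop the named columns, then drop any row that still contains a NaN."""
    cleaned = []
    for row in rows:
        kept = {k: v for k, v in row.items() if k not in drop_columns}
        has_nan = any(
            isinstance(v, float) and math.isnan(v) for v in kept.values()
        )
        if not has_nan:
            cleaned.append(kept)
    return cleaned


# Hypothetical raw records, including key columns and one missing value.
raw = [
    {"customer_id": 1, "region_fk": 10, "city": "Atlanta", "spend": 120.0},
    {"customer_id": 2, "region_fk": 11, "city": "Austin", "spend": float("nan")},
]

clean = clean_rows(raw, drop_columns={"customer_id", "region_fk"})
print(clean)
```

Agreeing on the `drop_columns` set up front is one place where an embedded data scientist can save the data engineers a round of rework.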