Organizing for Data Science 
Dan Mallinger 
Data Science Practice Manager 
September 2014

Dan Mallinger – Data Science Practice Manager, Think Big Analytics at MLconf ATL


This talk introduces a paradigm for enabling access to large, unstructured, and novel datasets in enterprises while retaining value from existing tools and staff. Following a real-world example, the discussion walks through how small, central data science teams can make data discoveries and data value accessible to others. We will also review the tools, data science approaches, and best practices for uncovering, polishing, and digesting signal in data to support analytics at the front lines of business.

Published in: Technology


  1. Organizing for Data Science
     Dan Mallinger, Data Science Practice Manager, September 2014
  2. Dan Mallinger
     • Data Science Practice Manager
       − Think Big Analytics
     • Working with clients across
       − Financial Services
       − Advertising
       − Manufacturing
       − Social
       − Network Providers
  3. Today
     • Define Data Science in the Organization
     • Look at Current Perspectives on Organization
     • Discuss Shortcomings
     • Review a Real World Solution
  4. What Do We Hope to Do?
     • Use Data to Improve Our Business
     • Better Understand Customers
     • Act Proactively, Not Reactively
  5. Why Organize?
     • Scale
     • Robustness
     • Repeatability
  6. 6. Ÿ Revolutionizing Ad Targeting Ÿ Automating Deals and Recommendations Ÿ Alerting Admins to New Network Attacks CONFIDENTIAL | Perception: What Does Data Science Do? CONFIDENTIAL 6
  7. 7. CONFIDENTIAL | Ÿ Specific Data Expertise Ÿ Exploratory Analysis Ÿ Modeling Ÿ Creativity Ÿ Programming Ÿ Big Data Ÿ Communication Ÿ Ability to Target Impact Ÿ Unstructured Analysis Ÿ Organizational Politics Ÿ Visualization Ÿ … What Does It Take? CONFIDENTIAL 7
  8. 8. CONFIDENTIAL | The New Toy: A Center of Excellence Ÿ Centralized - Brings data, analysis, and processing together - Data scientists support one another Ÿ Distributed - Data scientists close to business - Multiple models for rotating data scientists into lines of business CONFIDENTIAL 8 Line of Business A CoE Line of Business B Line of Business C
  9. 9. CONFIDENTIAL | Ÿ Specific Data Expertise Ÿ Exploratory Analysis Ÿ Modeling Ÿ Creativity Ÿ Programming Ÿ Big Data Ÿ Communication Ÿ Ability to Target Impact Ÿ Unstructured Analysis Ÿ Organizational Politics Ÿ Visualization Ÿ … What Does It Still Take? CONFIDENTIAL 9
  10. 10. CONFIDENTIAL | Ÿ Designed a great home for unicorns Ÿ But they are still unicorns CONFIDENTIAL 10 If You Build It, They Will Come?
  11. 11. Ÿ Unravel Capability Ÿ Map Activities to Functional Roles Ÿ Align Functions with Process, Not Individuals Ÿ Don’t Forget to Scale CONFIDENTIAL | Working with Horses, Not Unicorns CONFIDENTIAL 11
  12. 12. Ÿ Identify Fraudulent Sessions Ÿ Cross Channel Analysis Ÿ Next Best Action Ÿ Optimize Pathways Ÿ Determine Session Interest Ÿ Customizing Experience Ÿ Proactive Outreach Ÿ Search Analysis Ÿ Content Optimization CONFIDENTIAL | CLIENT EXAMPLE Clickstream Data in Action CONFIDENTIAL 12
  13. 13. Ÿ Billions of clicks Ÿ Unstructured data Ÿ How do we model it?! CONFIDENTIAL | Ÿ Model the SIGNAL Ÿ Not the data CLIENT EXAMPLE Scaling Data Science CONFIDENTIAL 13
  14. 14. MPP Web CONFIDENTIAL | CLIENT EXAMPLE Clickstream Data Science in Action CONFIDENTIAL 14 Hadoop 1.0 Feature Selection & Dimensionality Reduction
  15. 15. CONFIDENTIAL | Ÿ Feature Selection - Forests - Clustering Ÿ Dimensionality Reduction - SVM Ÿ Challenges - Job Latency - Limited Iterations CLIENT EXAMPLE Extracting Signal: Hadoop 1.0 CONFIDENTIAL 15
  16. 16. CONFIDENTIAL | CLIENT EXAMPLE Extracting Signal: Hadoop 2.0 • Spark − Faster response in exploration − Better Support for Iterative Models • Genetic Algorithms • Neural Networks • Challenges − In memory: costly and limiting − MapReduce does not go away CONFIDENTIAL 16
  17. 17. Ÿ Focus on Technical Skills - EDA - Modeling - Programming / Big Data Ÿ Communication Skills - Capturing signal needs - Iterating with stakeholders CONFIDENTIAL | CLIENT EXAMPLE Horses, Not Unicorns CONFIDENTIAL 17 Hadoop 1.0
  18. 18. CONFIDENTIAL | CLIENT EXAMPLE CoE Next Steps • Continue to make signal available to analysts − Next up: Extracting signal from text • Act as a capability search party − Sprints of new insights and tools • Finalize operating model − Funding structure − Engagement model with lines of business CONFIDENTIAL 18
  19. 19. CONFIDENTIAL | Discussion Over Drinks CONFIDENTIAL 19
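
Slide 13's advice to "model the signal, not the data" can be illustrated with a small sketch. The slides do not say which features the client team extracted, so everything below is an assumption made for illustration: the `Click` record, the `session_signal` function, and the four features it computes are hypothetical names, not part of the talk. The idea shown is only that raw click events are collapsed into one compact row per session, which analysts can then use with their existing tools.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Click:
    """One raw click event (hypothetical schema, not from the talk)."""
    session_id: str
    timestamp: float  # seconds since some reference point
    page: str


def session_signal(clicks):
    """Collapse raw click events into one small feature row per session."""
    by_session = defaultdict(list)
    for click in clicks:
        by_session[click.session_id].append(click)

    features = {}
    for session_id, events in by_session.items():
        # Order events in time so the first and last clicks bound the session.
        events.sort(key=lambda c: c.timestamp)
        features[session_id] = {
            "n_clicks": len(events),
            "n_distinct_pages": len({c.page for c in events}),
            "duration_s": events[-1].timestamp - events[0].timestamp,
            # Illustrative flag: did the session touch a search page?
            "used_search": any(c.page.startswith("/search") for c in events),
        }
    return features


if __name__ == "__main__":
    raw = [
        Click("a", 0.0, "/home"),
        Click("a", 12.5, "/search?q=loans"),
        Click("a", 40.0, "/home"),
        Click("b", 5.0, "/rates"),
    ]
    for sid, row in sorted(session_signal(raw).items()):
        print(sid, row)
```

In a setting like the one the slides describe, an aggregation of this kind would run on the cluster and only the per-session rows would be handed on; the single-process version here is just the smallest form of the same reduction.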
