Asking the Right Questions of Your Data

Executives are still waiting on our “Big Data Deep Insights”. Many of us are well down the path of collecting, extracting, and analyzing our ever-growing data in Hadoop environments. We are building our data science expertise and expanding data governance. Yet we are still not getting the insights we are waiting for. This talk is about:
1. Getting to the right questions
2. Setting expectations with the executive team
3. The unintentional consequence of suddenly having lots of data
4. Framing the boundaries of our data science
5. Pragmatic data governance
6. Looking outside your data to third-party data

  • Sometimes clustering alone can be enough to solve a business problem.
  • We must understand the columns well before we can understand the relationships among them.
  • Data science results lead to better database marketing: churn analytics, upselling, cross-selling, RFM/LTV. These are some of the areas where we’ve used data science and machine learning to come up with some interesting models.

Asking the Right Questions of Your Data: Presentation Transcript

  • Asking the Right Questions of Your Data » Mike Peterson, VP of Platforms and Data Architecture, Neustar » Jun 26, 2013 » Copyright © Think Big Analytics and Neustar Inc.
  • We have come a long way!!! But where/when is the GOLD? » Unintended consequences of Big Data » We need to ask the right questions » Oh, and let’s remember religion and not forget GOVERNANCE
  • Big Data Evolution Status » New data platform is built (3-tier) » Collected many PBs of data » Hadoop infrastructure in place for 2 years » Established data science teams » Machine learning is in place » Increased technology skills » Focused data teams » Active in the community
  • Our Partners are still a part of our process » Expertise in technologies » Trusted partner » Collaborative teams » Open source leader » Invested in client success » Price/performance
  • Some Unintended Consequences » More customer reporting requests » Because we suddenly have lots of customer data available » Meaning more work for the DW team!!! » DR site is more necessary than ever » More data means more critical data to protect » Network stress to support DR and other additional access » Data governance is overwhelmed with requests » Retention policies need to be rethought
  • Questions » Customer-driven questions » Easy to understand » Subject questions » Discover the pivot and you have a good start » Exploratory questions » Thinking of the unformed questions » Working from the top down » Narrowing the answer before you test all the data
  • Questions - Approaches • Understand which manual process you want to automate: identify what is currently predicted by hand that could be automated, and determine whether there is any way to get training data consisting of <input,output> pairs. • Consider methods to augment existing data with a “pivot” column that can be used to join. For example, geo-location of an IP address could lead to joining with Census data based on zip+4.
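The “pivot” column idea on this slide can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the IP addresses are documentation-range placeholders, the zip+4 codes and income figures are invented, and `add_pivot_and_join` is a hypothetical helper, not something from the talk. A real pipeline would resolve IPs through a geo-location service and load actual Census tables.

```python
# Hypothetical lookup tables; none of these values come from the talk.
ip_to_zip4 = {
    "192.0.2.10": "12345-0001",
    "198.51.100.7": "12345-0002",
}
census_by_zip4 = {
    "12345-0001": {"median_income": 98000},
    "12345-0002": {"median_income": 121000},
}

# Existing data that has no column in common with the Census table.
events = [
    {"ip": "192.0.2.10", "page": "/pricing"},
    {"ip": "198.51.100.7", "page": "/docs"},
    {"ip": "203.0.113.5", "page": "/home"},  # IP that cannot be resolved
]


def add_pivot_and_join(events, ip_to_zip4, census_by_zip4):
    """Add a zip4 pivot column to each event, then left-join Census fields on it."""
    enriched = []
    for event in events:
        zip4 = ip_to_zip4.get(event["ip"])        # pivot column; None if unresolved
        census = census_by_zip4.get(zip4, {})     # no match -> no extra fields
        enriched.append({**event, "zip4": zip4, **census})
    return enriched


rows = add_pivot_and_join(events, ip_to_zip4, census_by_zip4)
for row in rows:
    print(row)
```

The join is deliberately a left join: events whose IP cannot be mapped keep their original columns and get an empty pivot, so the gap stays visible instead of silently dropping rows.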
  • Questions - Approaches • Determine if your problem is one of prediction or one of grouping (clustering). The latter is more of a task that can lead to better understanding rather than solving a direct business problem.
  • Questions - Approaches • Determine if you are more interested in finding “interesting” relationships among data columns rather than knowing the columns. This is a task I’d call more of “discovery” than prediction but the idea is to determine one column as the output column in terms of the other columns as input. • Doing this for all output columns can lead to “discovery” of those correlations that are the strongest (e.g., every time a customer buys beer at 5PM, he is likely to buy diapers). This is more of a fishing expedition, but can lead to unusual insights.
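As a rough sketch of the “discovery” pass described on this slide, the code below treats every column in turn as the output and every other column as the input, then ranks the pairs by strength. It swaps in a simple lift score (how much more likely the output is when the input is present) in place of a trained model per output column, which is an assumption of this example rather than the speaker’s method. The basket data and the `rank_relationships` helper are invented for illustration.

```python
from itertools import permutations

# Hypothetical purchase records; each column is a True/False flag.
baskets = [
    {"beer": True, "diapers": True, "milk": False},
    {"beer": True, "diapers": True, "milk": True},
    {"beer": False, "diapers": False, "milk": True},
    {"beer": True, "diapers": False, "milk": False},
    {"beer": False, "diapers": False, "milk": True},
    {"beer": False, "diapers": True, "milk": True},
]


def rank_relationships(rows):
    """Score every ordered (input, output) column pair and sort strongest first."""
    columns = sorted(rows[0])
    n = len(rows)
    results = []
    for inp, out in permutations(columns, 2):
        n_inp = sum(r[inp] for r in rows)
        n_out = sum(r[out] for r in rows)
        n_both = sum(r[inp] and r[out] for r in rows)
        if n_inp == 0 or n_out == 0:
            continue  # a column that is never True carries no signal
        confidence = n_both / n_inp        # P(output | input)
        lift = confidence / (n_out / n)    # > 1 means positive association
        results.append((lift, confidence, inp, out))
    return sorted(results, reverse=True)


ranked = rank_relationships(baskets)
for lift, confidence, inp, out in ranked:
    print(f"{inp} -> {out}: lift={lift:.2f}, confidence={confidence:.2f}")
```

Because every pair is tested, some high scores will appear by chance on real data, which is the “fishing expedition” risk the slide mentions; treat the top of the list as candidate questions to validate, not as findings.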
  • Impetus Approach to Questioning Data (diagram labels) » EXISTING DATA PROPERTY » BUSINESS STRATEGY » CUSTOMER PROBLEM STATEMENTS » ANALYSIS OF DATA PROPERTY » DISCUSSION WITH STAKEHOLDERS » ANALYSIS OF PROBLEM STATEMENT » DATA NEEDS STATEMENT » REFINED PROBLEM STATEMENT » DATA ANALYTICS PLAN
  • Who knew there was religion in Analytics » Statistical Analysis vs. Machine Learning » Stats people think “truth” » Machine Learning people think “near truth” » Truth is easy to bound » Cost models make sense to the org » Near truth is hard to explain and bound » It is where the real exploration happens » But it can consume the Data Scientist » Both can net real returns, and they need to co-exist
  • GOVERNANCE » Don’t forget about governance » Contracts » PII » Brand » CPO & CISO are your friends, honestly » Protect your CUSTOMER DATA » It will slow you down in the beginning » But you want your results to be reputable » We need to get to an automated policy framework at some point
  • About Impetus » Accelerated consulting and services leader for Big Data; Headquartered in San Jose since 1996; 1400+; Presences in Silicon Valley, Atlanta, NYC; offices in India; Expertise through Architects » Pioneers in distributed software engineering with vertical and functional expertise; Dedicated innovation labs; 200+ Big Data practitioners; 80+ dedicated to R&D
  • Data Science Approach » Drill: Incoming Question, Problem Landscape, Underlying Constraints, Specific Goals » Assess: Goal Driven Hypotheses, Data Requirement, Resource Requirements, Analysis Plan » Target: Data Collection, Quality Assessment, Cross Validation, Restructuring » Analyze: Test Previous Hypotheses, Explore New Hypotheses, Test, Quantify Results » Recommend: Summary of Results, Key Novel Insights, Impact Analysis, Action Items
  • Data Science Focus Areas » Recommender Systems » Sentiment Analysis » Topic Identification » Predictive Analytics » Data Stream Analytics » Contact us at bigdata@impetus.com
  • Thank you. Questions?