Big analytics best practices @ PARC

ThinkBig Analytics presents at PARC, a Xerox company

Notes
  • New Data Sources, Innovative Use Cases, Data Science & Predictive Analytics
    A new class of big data technologies was invented to address data management challenges at Web scale. These technologies enabled new approaches to analytic questions that were too complex for, or did not fit into, traditional systems:
    - Reduce cycle time when developing new analytic models
    - Run analyses that were previously impossible
    - Simplify modeling approaches by using larger datasets
    - Conduct analysis at a far lower cost
    - Keep flexibility for future unknowns
    + Compute processing $ & time, e.g. 26 days to 2 min, 42 hours to 40 min, 18 hours to 16 min
    = Business innovation velocity
    Big Data is changing the game. Organizations need to get smarter, leveraging substantial untapped data assets for sustained competitive advantage.
    Reduce cycle time -> 1. much lower effort to work with new datasets; 2. parallel, distributed infrastructure processes data much faster; 3. compute approximate answers before investing in projects to automate. A more detailed example of reduced cycle time: Hive lets you define the underlying structure of the raw data just enough to run SQL-like queries against it.
    Run new types of analysis -> 1) model across complex datasets that did not match the relational database model; 2) work with larger datasets and compute-intensive algorithms.
    Simpler modeling -> 1) Google whitepaper; 2) fewer assumptions and simpler models are required when looking at the entire customer database vs. looking at 3% and extrapolating.
    Lower cost -> shorter cycle times; lower infrastructure costs for storage and processing using commodity hardware and open source software; reduced processing time.
    Flexibility -> the promise that, by storing everything, you have the source data to keep generating and modeling new hypotheses, with reduced cycle times for experiments to increase value; you now have the ability to store 10 years of full data for yourself and your suppliers.
    Innovations in commodity hardware, elastic, distributed, open source software platforms such as Hadoop, and NoSQL database technologies are changing the game for advanced analytics at the core:
  • Leverages the manifest and latent signal of multi-structured data
      - We say "multi-structured," not "unstructured," now.
      - Most of the data in the world has latent signal: it's hidden in a messy tangle of other noise. BI tools are really designed to work with data where the relevant signal is overt (manifest variables), and the same is true of the corresponding models.
    * Emphasizes exploratory analysis to uncover novel topologies in the data
      - This is stuff like narrow strata and behavioral cohorts, just said in fancier terms.
    * Boosts power with diverse multivariate models and holistic data sets
      - The world is multivariate.
      - Integrating models designed for structured data with those designed for unstructured data gives new power.
      - It's not just more data; it's data from new sources, providing new lenses, new behaviors, etc., all in concert. E.g., integrating online and offline data by adding offline brand exposures to online ad-efficacy assessments and attribution analysis.
    * Triangulates truth with multiple approaches when problems are intractable
      - Stop trying to "prove" things; let validity and predictability testing guide you, and use theory to avoid spurious relationships.
    Playbook as talking points:
      - Rich data sets: URLs, social graphs, text feedback…
      - New, rich visualization
      - Exploration
      - Automated detection: highlights, trends, anomalies
      - Collaboration with data scientists…
  • DBA -> Big DBA (Database Administrator -> Big Data Administrator)
      Prior experience: diverse system environments; application performance management; systems appreciation; metrics-focused
      New skills: management & monitoring tools; metrics; automation for scale; lower-level workload tuning
    DA -> DA BDM (Data Architect -> Data Architect, Big Data Modeling)
      Prior experience: data-focused, digging into details; diverse database environments; deep domain knowledge; familiarity with unstructured data (XML); hybrid DBs and non-DB systems
      New skills: data modeling for unstructured data; alternative tools and documentation; languages and APIs (Hive, Pig, M/R); process models (M/R, key/value); lower-level optimizations
    BA -> DS MM (Business Analyst -> Data Science Math Modeler)
      New skills: introduction to Hadoop; new tools for data manipulation; variety of new models; challenging top-down approaches; working with unstructured data; bottom-up pattern discovery; efficient programming at scale; large-scale machine learning
    Dev -> Big Data Engineer
      New skills: processing models (MapReduce, key/value); data modeling; schemas for unstructured data; languages/APIs (Hive, Pig, M/R); work process from small to full scale; investigating approaches; manual optimization
    Incremental adoption stages:
      - Exploration: learning; 1st internal data; test workloads; process limited
      - Production: pilot apps; agile data; feedback loop; process limited
      - Portfolio: broad app range; intense analytics; new feeds, derived data; space limited
      - Data-Centric Org: impacts core biz; new products; analytic focus; space limited
  • Big Data Solution Factory
      - Big Data Labs asset, partnering with Think Big
      - Gather best practices
      - Quarterly review of brainstorms
      - Vendor briefings
      - Gartner and industry analysts and research
      - What are the criteria for technology selection on POC vs. pilot?
      - Ramp adoption and share assets
      - Collaboration tools
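The cycle-time note says Hive lets you define the structure of raw data just enough to run SQL-like queries against it. As a rough illustration of that schema-on-read idea (this is not Hive and not how Hive is implemented), the Python sketch below keeps hypothetical tab-delimited log lines as raw text, applies a minimal schema only at read time, and queries the result with SQL through Python's built-in `sqlite3` module. The log format, the `SCHEMA` list, and the `read_with_schema` and `query` helpers are assumptions made for illustration.

```python
import sqlite3

# Hypothetical raw, tab-delimited web log lines, kept exactly as they arrived.
RAW_LOG = [
    "2013-08-17\tu1\t/home\t200",
    "2013-08-17\tu2\t/search\t200",
    "2013-08-17\tu1\t/search\t404",
    "not a log line",  # malformed records are skipped at read time, not rejected at load time
    "2013-08-18\tu3\t/home\t200",
]

# "Just enough" structure: column names and converters, applied only when reading.
SCHEMA = [("day", str), ("user_id", str), ("path", str), ("status", int)]


def read_with_schema(lines, schema):
    """Yield typed tuples for lines that fit the schema; skip the rest."""
    for line in lines:
        fields = line.split("\t")
        if len(fields) != len(schema):
            continue
        yield tuple(convert(value) for (_, convert), value in zip(schema, fields))


def query(lines, sql, schema=SCHEMA):
    """Project the raw lines through the schema into a throwaway table and run SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        columns = ", ".join(name for name, _ in schema)
        placeholders = ", ".join("?" for _ in schema)
        conn.execute(f"CREATE TABLE logs ({columns})")
        conn.executemany(
            f"INSERT INTO logs VALUES ({placeholders})",
            read_with_schema(lines, schema),
        )
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(query(RAW_LOG, "SELECT path, COUNT(*) FROM logs GROUP BY path ORDER BY path"))
```

Running it prints `[('/home', 2), ('/search', 2)]`. The raw lines are never rewritten, so a different schema can be applied to the same data later; Hive applies the same principle at cluster scale with table definitions over files already stored in HDFS.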

    1. Confidential Think Big Analytics. Big Analytics Best Practices: An Executive Guide. September 26, 2012. Ron Bodkin, Founder and CEO, ron.bodkin@thinkbiganalytics.com, @ronbodkin
    2. Introduction • One of Silicon Valley’s fastest-growing Big Data start-ups • 100% focus on Big Data consulting & Data Science solution services • Management background: Cambridge Technology, C-bridge, Oracle, Sun Microsystems, Quantcast, Accenture; founder and executives of C-bridge Internet Solutions (CBIS), founded 1996, IPO 1999 • Clients: 40+; focuses: Technology, Financial Services, Retail, Advertising • North America locations: US East: Boston, New York, Miami; US Central: Chicago, Austin; US West: HQ Mountain View, San Diego, Salt Lake City. Think Big is the leading professional services firm that’s purpose-built for Big Data. 8/17/2013
    3. Big Analytics, enabled by Big Data. Big Data was invented to solve web-scale data challenges. Opportunity and mandate for enterprises to compete with advanced analytics. Now enabling new businesses and products.
    4. The 7 Myths of Big Data: 1. It’s just a new name for Business Intelligence. 2. The packaged applications are about to emerge. 3. The enterprise can wait. 4. Low-cost, low-skill staffing will work. 5. It’s simple to get results. 6. You can automate all the intelligence. 7. You can buy it all from a single vendor “stack.”
    5. Incremental Adoption
    6. Real-World Results
    7. 360 Customer View Analytics. Trends: • Compute model scores faster • Analyze full data sets • Incorporate new data • Build new services from data. [Diagram: Basic Reporting; Data Ingestion; Batch Processing; Fast Analytics; Data Enrichment; Data Science]
    8. Social Media. “The digital transformation occurring at American Express cuts across many business units, and it has to because of the breadth and depth of our business,” Leslie Berland, SVP of Digital Partnerships and Development, explains. “From customer service to merchant services to our entertainment and travel business units, to corporate affairs, as well as our newly formed digital partnerships and development team, social media is a company-wide initiative.” Source: http://mashable.com/2012/03/28/american-express-social-media/ (March 28, 2012)
    9. Think Big
    10. Envision. [Diagram: Current State; Future State; Gap Analysis; Prioritized Initiatives; Key Decisions & Impact Analysis; Reference Architecture; Design Patterns; Technology Rankings; Organization & Training; Optimized Project Selection; Big Analytics Roadmap & Recommendations; Big Data Strategy; Readiness Analysis; Technology Recommendations; Big Data Roadmap; Big Analytics Roadmap Methodology; Analytic Platform Decision Tree; Data Strategy]
    11. Data Strategy: Value from Integration. [Diagram: Ad Server; Mobile; Social; Web Site; Devices & Enterprise Applications; Outside Data (new)]
    12. Start Smart
    13. Organizing for Success • Driven by collaboration between data scientists, engineers and the business • Leverages the manifest and latent signal of multi-structured data • Emphasizes exploratory analysis to uncover novel topologies in the data • Boosts power with diverse multivariate models and holistic data sets • Triangulates truth with multiple approaches when problems are intractable
    14. Need for New Skills. Database Administrator -> Big Data Administrator; Business Analyst -> Data Science Math Modeler; Data Architect -> Data Architect, Big Data Modeling; Developers -> Big Data Engineer. Invest in and scale complementary skills to move to a data-centric organizational model. • Include expert training, mentoring and joint solution development
    15. Scale Fast
    16. An Integrated Approach: Creating value with nimble, incremental innovation. [Diagram: Brainstorm; POC; Pilot; Deploy; Training; GTM; Partners; Clients; Industry Analysts; Strategic Technology; Business & Technology Requirements; Data Science & Analytics Center of Excellence; Internal Solutions; External Solutions; QA Test Engineer; Risk Management; Big Data Lab; Technology Experts; Best in Class Analytics; Sand Box; Monitoring; Open Source Innovation; Business SMEs; Envision; Education; Engineering; Strategy; Management, Development & Operations; Support & Performance Measurement; Business Velocity; Administration & Optimization; Big Data Strategy; Readiness Analysis; Technology Recommendations; Big Data Roadmap]
    17. • Develop data and analytics platforms that bridge the old and new. • Understand integration patterns and use cases to effectively guide new initiatives. • Partner with the business on opportunities for innovation. • Build organizational maturity along a number of dimensions (platform, architecture, data engineering, data science). [Diagram: New IT Platforms (Source: Think Big Analytics). Labels: External Data Sources; Event Ingest; Landing Zone; DFS; MapReduce; Query (Hive/Pig); Data Mining (R, Mahout); Scheduler & Dependency Engine; Hadoop Cluster Management, Monitoring, and Security; Parallel Export; Messaging; Replication; MPP EDW: structured summary data; Fast Unstructured DB; Scale-out DB; Relational DBMS; Serving Engine; Secondary Index; Low-vol ACID Read/Write; Distributed Search; DB Sync; Data Science Tools; Traditional BI Tools; Interactions; Analysis; Realtime to Seconds; Minutes and Up; Prod Cycle (minutes); Science Cycle (days)]
    18. Data Science. A New Role Exists: the Data Scientist. Focused on data, not models. Works with analysts to create business value. • One Part Scientist/Statistician • One Part Sleuth • One Part Artist • One Part Programmer
    19. Conclusions: 1. Big Analytics is a critical capability. 2. Your organization can create value now. 3. Get help to get off on the right foot. 4. Adopt incrementally. Think Big. Start Smart. Scale Fast.
    20. Rick. Ron.Bodkin@thinkbiganalytics.com, @ronbodkin
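Slide 17 places MapReduce at the core of the new platform, and the skills notes list "processing models (MapReduce, Key/Value)" as a new skill for developers and data architects. The single-process Python sketch below shows only the shape of that programming model: a map step that emits key/value pairs, a shuffle that groups values by key, and an independent reduce step per key. It is not Hadoop's API; the `map_reduce` helper, the clickstream format, and the mapper and reducer functions are hypothetical, and a real cluster runs the map and reduce phases in parallel across many machines.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Single-process illustration of the MapReduce model (no distribution)."""
    # Map + shuffle: every (key, value) the mapper emits is grouped under its key.
    grouped = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            grouped[key].append(value)
    # Reduce: each key's values are combined independently of every other key,
    # which is what lets a cluster spread keys across many reducers.
    return {key: reducer(key, values) for key, values in sorted(grouped.items())}


# Hypothetical tab-delimited clickstream: day, user id, page path.
CLICKS = [
    "2013-08-17\tu1\t/home",
    "2013-08-17\tu2\t/search",
    "2013-08-17\tu1\t/search",
    "2013-08-18\tu3\t/home",
    "2013-08-18\tu1\t/home",
]


def pages_by_user(line):
    """Mapper: emit (user_id, path) for each click."""
    _day, user_id, path = line.split("\t")
    yield user_id, path


def distinct_pages(_user_id, paths):
    """Reducer: number of distinct pages a user visited."""
    return len(set(paths))


if __name__ == "__main__":
    print(map_reduce(CLICKS, pages_by_user, distinct_pages))
```

Running it prints `{'u1': 2, 'u2': 1, 'u3': 1}`. Swapping in a different mapper and reducer (for example, keying by path and summing ones) reuses the same skeleton, which is the point of the model: the framework owns grouping and distribution, and the analyst supplies two small functions.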

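Slide 7 lists "Analyze full data sets" as a trend, and the first speaker note contrasts modeling the entire customer database with looking at 3% and extrapolating. The Python sketch below uses a hypothetical customer base of 100,000 records in which a narrow segment makes up 0.2%; it computes the exact segment size from the full data and an estimate extrapolated from a 3% simple random sample, with a rough 95% interval from the normal approximation. The data, segment definition, random seed, and helper names are all assumptions, chosen only to show how uncertain extrapolation gets for narrow strata.

```python
import math
import random

POPULATION = 100_000
SAMPLE_RATE = 0.03  # the "3%" sample from the speaker note


def build_customers(n=POPULATION, every=500):
    """Hypothetical data: every `every`-th customer is in a narrow segment (0.2%)."""
    return [i % every == 0 for i in range(n)]


def full_count(in_segment):
    """Exact answer: scan every record."""
    return sum(in_segment)


def sampled_estimate(in_segment, rate=SAMPLE_RATE, seed=42):
    """Extrapolate the segment size from a simple random sample.

    Returns (estimate, (low, high)), where the interval is a rough 95% range from
    the normal approximation. It ignores the finite-population correction and
    collapses to zero width if the sample contains no segment members.
    """
    rng = random.Random(seed)
    n = len(in_segment)
    k = round(n * rate)
    hits = sum(rng.sample(in_segment, k))
    p_hat = hits / k
    estimate = p_hat * n
    half_width = 1.96 * n * math.sqrt(p_hat * (1 - p_hat) / k)
    return estimate, (estimate - half_width, estimate + half_width)


if __name__ == "__main__":
    customers = build_customers()
    print("full data:", full_count(customers))
    estimate, (low, high) = sampled_estimate(customers)
    print(f"3% sample: {estimate:.0f} (approx. 95% range {low:.0f} to {high:.0f})")
```

The full scan returns exactly 200. A 3,000-record sample is expected to contain only about six segment members, so the approximate interval is typically nearly as wide as the estimate itself; a cohort analysis built on that sample inherits the same uncertainty.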