MAKING BIG DATA COME ALIVE
Adding Hadoop to Your Analytics Mix:
Challenges and Strategies
Madina Kassengaliyeva
July 23, 2015
2
Presenters
Madina Kassengaliyeva
Director, Client Services, Think Big
Madina Kassengaliyeva is responsible for ensuring successful delivery of Think Big’s service engagements. Madina has led strategy, engineering and data science engagements in a variety of areas, including recommendation engines, customer interaction optimization, marketing analytics and compliance. Madina holds an MBA from the University of Chicago and a BA in International Studies from American University.
Paul Barsch
Director, Services Marketing, Think Big
Paul Barsch directs marketing programs for Think Big, a Teradata Company. Paul has been in IT for 15+ years in a variety of roles at Teradata, HP Enterprise Services and KPMG Consulting.
© 2015 Think Big, a Teradata Company 8/3/2015
3
Housekeeping
Use the widget bar below to…
Get valuable resources & complete exit survey
Ask Questions to the Presenters
Request online technical help
Go social… and follow the conversation
4
Agenda
• Hadoop Adoption Path
• Key Challenges – Data, Organization, Capabilities
• Ideas for Solutions
5
Common Hadoop Adoption Path

1. Address Immediate Needs
• Hadoop used to relieve a technology pain point
• Reduce data warehouse costs
• Speed up ETL
• The only users are in technology teams

2. Establish a Data Repository
• More and more data gets added to Hadoop as a result of Phase 1
• Greater data variety, more raw data, deeper history
• Initial data transfer, security, and governance practices are established
• Still perceived as largely a technology platform

3. Initial Analytics Exploration
• Limited number of people or teams conduct POCs using Hadoop
• Analytics techniques not available on traditional platforms are applied
• Early wins indicate promising business impact and excitement builds

4. Integrate Hadoop into the Analytics Capabilities
• Multiple teams use Hadoop as part of the analytics infrastructure
• Techniques, methods, best practices and access patterns get codified
• Business begins to capture consistent value

Transition from Phase 3 to Phase 4 is when key challenges emerge
6
Hadoop Adoption – Critical Point
7
Key Challenges
Data
• Impact of schema on read
• Consistent taxonomies and reference data
• Architecture – access patterns and flows
Organization
• Skills, roles and responsibilities
• Lack of common vocabulary
• Knowledge capture and sharing
Capabilities
• Foundational capabilities at the whim of changing business priorities
• Future that’s hard to envision is hard to build
8
Organization – Key Challenges
• Skills, roles and responsibilities
o Significant skills gaps between what’s currently available and what is needed
o Both business and technology teams do analytics and often engineering, blurring lines of responsibility and ownership
o “Throw it over the wall” hand-offs don’t work
• Lack of common vocabulary
o Every BU, and every leader, has a different understanding of the same words
o This is rarely discussed
• Knowledge capture and sharing
o Multiple teams work with the same data and similar techniques
o Organizational silos do not naturally support broad knowledge transfer
9
Organization – Ideas for Solutions
• Cross-BU committee to guide organizational change, define a common vocabulary, defend the effort to executive leadership and share successes
• Thorough, honest skills assessments to identify gaps, training needs and augmentation needs, mapped to roles and responsibilities
• Documented tool requirements based on current and projected skills
• Collaboration architecture
• Plug into existing knowledge transfer practices and tools, and allow for informal information exchange based on data access privileges
10
Organization – Key Functions
Strategy
Data Management & Governance
Architecture
Tools Market Research
Roadmap Planning
Value Realization
Future Data Sources
Services
Support
Visualization & Reporting
Data SMEs
Core Platform
Development
Testing
Operations
Core Platform Management
Metrics Tracking & Reporting
Platform Integration
Program Management
Roadmap Execution
Cross Group Coordination
Financial Management
Small Project Prioritization
Communication & Change Management
Application Development
Analytic Sandbox
Data Science
Integration, Interfaces & Ingestion
Training
Incident Management
Config, Change, Release Management
Problem Management
Help Desk
Knowledge Management
Technology Governance
Data Quality & Metrics
Access Controls
Data Governance
Metadata Management
11
Capabilities – Key Challenges
• Foundational capabilities at the whim of changing business priorities
o Lack of consensus on what the foundational capabilities are
o Let’s be honest: the “Top Project” changes often, and the resources go with it
o Foundational capabilities do not immediately impact the bottom line
• Future that’s hard to envision is hard to build
o Lack of shared vision
o Clarity needed at multiple levels – strategy, operational details, day-to-day
12
Capabilities – Ideas for Solutions
• Consolidate ownership in a team that has organizational influence and includes representatives from the business, infrastructure, architecture, data, and analytics
• Back to vocabulary – agree on what “capabilities” means for your business unit and your technology partners
• Roadmaps are useful – visual representations of high-level goals against a timeline that should define your projects
• Dedicate resources to capabilities and protect them
• Check in with your roadmap – does it still reflect your vision?
Photo courtesy of Flickr. Creative Commons.
By E.Bass.
13
Capabilities Pyramid
14
Capabilities: Roadmap Example
Analytics standardized methods, code, tools, team roles
Operations standardized processes, tools, team roles
Skills and roles matrix
Data Ingestion, Transfer, Structuring, and Governance approach
Unified Model Management
Integrated Data Science
Variables based on single source structured data
Variable selection in Hadoop
Integration with existing scoring engine
Batch data processing in Hadoop
Integration
Cross-channel and intraday variables generation
Batch scoring in Hadoop
Natural language processing to analyze text and voice
Initial real-time scoring
Execution Methodology and project management
Data and Models
Organization and Management
Analytics Knowledge Management
Scoring Architectural and Analytical design
Data Lifecycle Management
Real-time scoring design
Statistical and machine-learning-based modeling
Data Exploration of unstructured data components (e.g. URL, chat text)
Data Exploration of structured data components (e.g. page views)
Cross-channel variables, variables from unstructured data + intraday variables
15
Data – Key Challenges
• Impact of schema on read
o Hadoop supports a variety of data structures, which simplifies data ingestion and allows data users to define their preferred schemas
o This shifts the burden of defining the schema to the data users
• Consistent taxonomies and reference data
o Meaningful data analysis requires a known and consistent taxonomy
o New taxonomies can get created by individual teams
o Reference data changes over time
• Architecture – access patterns and flows
o Data flows across platforms, regular updates, physical and virtual constraints
o Decisions on what should be done where
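Schema on read means raw files land in Hadoop untyped, and each consumer applies its own schema when it reads them; tools such as Hive do this by attaching a table definition to files at query time. The minimal Python sketch below shows the burden shift in miniature. The records, field names, and both reader functions are hypothetical: two teams read the same raw events with different parsing rules and end up with different row counts.

```python
import json
from datetime import datetime

# Hypothetical raw events landed exactly as received.
# Nothing is validated or typed at ingestion time (schema on read).
RAW_EVENTS = [
    '{"customer": "C1", "amount": "19.99", "ts": "2015-07-23 10:15:00"}',
    '{"customer": "C2", "amount": "5", "ts": "2015-07-23 11:02:00", "channel": "web"}',
    '{"customer": "C1", "amount": "n/a", "ts": "2015-07-24 09:00:00"}',
]


def read_with_finance_schema(lines):
    """Hypothetical finance schema: amount must be numeric; bad rows are dropped."""
    rows = []
    for line in lines:
        rec = json.loads(line)
        try:
            amount = float(rec["amount"])
        except ValueError:
            continue  # this team's rule: exclude unparseable amounts
        rows.append({"customer": rec["customer"], "amount": amount})
    return rows


def read_with_marketing_schema(lines):
    """Hypothetical marketing schema: keep every row, derive a day, default the channel."""
    rows = []
    for line in lines:
        rec = json.loads(line)
        day = datetime.strptime(rec["ts"], "%Y-%m-%d %H:%M:%S").date()
        rows.append({
            "customer": rec["customer"],
            "day": day.isoformat(),
            "channel": rec.get("channel", "unknown"),
        })
    return rows


finance = read_with_finance_schema(RAW_EVENTS)
marketing = read_with_marketing_schema(RAW_EVENTS)
print(len(finance), len(marketing))  # 2 3
```

Neither reader is wrong; the parsing rules simply live in each team's code rather than in one shared definition, which is why consistent taxonomies and agreed schemas become an organizational question as much as a technical one.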
16
Data – Ideas for Solutions
• Big issue with lots of opinions – see Data Lake et al.
• Test and define common data manipulation patterns for different use cases – aggregations, reductions, basic statistical derivations
• Centralize the responsibility for data governance, data architecture, taxonomy, and maintenance
• Establish knowledge sharing for data post-analytics
Photo courtesy of Flickr. Creative Commons.
By Renzo Ferrante
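One way to act on “test and define common data manipulation patterns” is to publish small, tested helpers that every team reuses instead of re-deriving the same aggregations. The Python sketch below is illustrative only: `Stats` and `summarize_by_key` are hypothetical names, not part of Hadoop or any library. It reduces (key, value) pairs to a per-key count, mean, and population variance using a mergeable summary, so partial results from separate partitions can be combined, in the spirit of a MapReduce combiner. The merge step follows the Chan et al. parallel variance formula.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Stats:
    """Hypothetical mergeable summary: count, mean, and M2 (sum of squared deviations)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0

    @staticmethod
    def of(x: float) -> "Stats":
        # Summary of a single observation.
        return Stats(1, float(x), 0.0)

    def merge(self, other: "Stats") -> "Stats":
        # Chan et al. parallel combination; associative, so partitions
        # can be reduced in any grouping.
        if self.count == 0:
            return other
        if other.count == 0:
            return self
        n = self.count + other.count
        delta = other.mean - self.mean
        mean = self.mean + delta * other.count / n
        m2 = self.m2 + other.m2 + delta * delta * self.count * other.count / n
        return Stats(n, mean, m2)

    @property
    def variance(self) -> float:
        # Population variance; 0.0 for an empty summary.
        return self.m2 / self.count if self.count else 0.0


def summarize_by_key(pairs):
    """Hypothetical shared pattern: group (key, value) pairs and reduce each group to Stats."""
    partials = defaultdict(Stats)
    for key, value in pairs:
        partials[key] = partials[key].merge(Stats.of(value))
    return dict(partials)


data = [("web", 2.0), ("web", 4.0), ("store", 10.0), ("web", 6.0)]
summary = summarize_by_key(data)
print(summary["web"].count, summary["web"].mean, summary["web"].variance)
```

Because `merge` is associative, the helper gives the same answer whether it runs over one file or over many partitions whose partial summaries are merged afterwards. That property is worth testing once and documenting for every team.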
17
Client Example – Centralized Data Group
• Data management, knowledge, architecture, and processing assurance
• Investment justification, research, knowledge sharing
• Data aggregation and enhancement
Data Source 1
Data Source 2
Data Source 3
Data Source 3
Business Group
Product Group
Central Tech Group
18
Conclusions
Data
• Centralize data management
• Knowledge of data = knowledge of business
Organization
• Technology is not enough – need the right people and processes
• Executive commitment is key
Capabilities
• Tough conversations can yield much better alignment
• Dedicate and protect resources to build capabilities
19
Who is Think Big?
• 100% Big Data Focus
• Founded in 2010, with 100+ engagements across 70 clients
• Unlock the value of big data with data science and data engineering services
• Proven vendor-neutral open source integration expertise
• Agile team-based development methodology
• Think Big Academy for skills and organizational development
• Global delivery model
20
Questions
and Answers
Thank You!
