American Family Hadoop Journey
Case Study Discussion
UW E-Business Consortium
Business Intelligence Special Interest Group
April 2015
Objective
Give you a firsthand perspective on
• Why Hadoop?
• How we approached it
• What has worked and what hasn’t
Agenda
• Background & Context
• Organizational Frameworks
• Architectural Considerations
Background & context
Hadoop is a team sport
Our adoption effort has at times included 30+ staff members:
• Infrastructure
• Software developers
• Architects
• Business experts
• Consultants
• My Role
– For American Family:
• Enterprise Information Architecture
– For Hadoop Team:
• Visionary direction & strategies for experimentation & project work
Existing landscape
American Family has a long history with BI
• Mainframe files → function-specific DW → EDW
• Leading BI tools for various roles
• Function-specific analysis
• Standard reporting
• Ad hoc query & reporting
• Statistical analysis & modeling
Why Hadoop
• New business initiatives & applications
• Growing demand for flexible analysis platforms
• Increasing interest in analyzing “unstructured” data
What Hadoop is
(Image slide; source: www.facebook.com/hadoopers)
Why Hadoop is different
• Data storage flexibility
• Programming language flexibility
– Java, Python, R, SQL, APIs (see the streaming sketch below)
• Processing flexibility
– Batch, interactive, in-memory
– Flexible workload queues
• Storage-dense to compute-dense spectrum
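To make the language and queue flexibility concrete, here is a minimal sketch of a Hadoop Streaming job written in Python. It is illustrative only: the file names, HDFS paths, and the "adhoc" queue name are assumptions, not details from the deck. Because the mapper and reducer are plain scripts that read stdin and write tab-separated key/value pairs, any language that can handle a pipe can run on the cluster.

#!/usr/bin/env python
# mapper.py -- emit one (word, 1) pair per whitespace-separated token.
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print("%s\t%d" % (word.lower(), 1))

#!/usr/bin/env python
# reducer.py -- streaming delivers mapper output sorted by key,
# so a running total per word is enough to aggregate counts.
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t", 1)
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print("%s\t%d" % (current_word, current_count))
        current_word, current_count = word, int(count)
if current_word is not None:
    print("%s\t%d" % (current_word, current_count))

A job like this would typically be submitted with the standard streaming jar, along the lines of: hadoop jar hadoop-streaming.jar -D mapreduce.job.queuename=adhoc -files mapper.py,reducer.py -mapper mapper.py -reducer reducer.py -input /data/raw/notes -output /data/out/wordcount. The -D queue option is what ties a job to one of the flexible workload queues mentioned above.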
Expected benefits / challenges
Benefits
• Increased access to data otherwise unavailable
• Greater collaboration between IT & business technical experts
• Greater capacity and processing power at lower cost than MPP data warehouses

Challenges
• Rapid pace of technical change
• Limited availability of skilled staff
• Less optimized for performance
• Challenging administration
• Trial & error
A Journey Needs a Map
• Destination
– Data lake framework
– Technical architecture & principles
• Path & Traveling Companions
– Adoption framework
– Business & technology tracks
– Cross-functional team
– Organizational “architecture” & principles
• Mile Markers
– Use case categories
– Roadmap
Destination: Data Lake
Conceptual:
Enable the business with a rich and flexible environment able to store and analyze all of the data they are interested in using.

Technical:
The data lake is a platform capable of storing and processing the largest and most varied datasets that can be useful for the enterprise. The data lake supports the following capabilities:
– Capture and store raw data at scale for a low cost
– Store many types of data in the same repository
– Perform transformations on the data to support specific analysis or operational needs
– Define the structure through which the data should be interpreted at the time it is used (schema-on-read; see the sketch below)
– Perform many types of data processing rather than just SQL
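As a hedged illustration of the schema-on-read capability above: the sketch below declares a Hive external table over raw files already landed in HDFS, so the data is neither moved nor converted and the column structure is applied only when the table is queried. The table name, columns, path, and use of the Hive CLI are assumptions for illustration, not details from the deck.

#!/usr/bin/env python
# schema_on_read.py -- apply structure to raw files at query time by
# declaring a Hive EXTERNAL table over an existing HDFS directory.
# Table name, columns, and location are illustrative placeholders.
import subprocess

DDL = """
CREATE EXTERNAL TABLE IF NOT EXISTS raw_claims_notes (
    claim_id  STRING,
    note_ts   STRING,
    note_text STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'
STORED AS TEXTFILE
LOCATION '/data/raw/claims_notes';
"""

# The hive CLI's -e flag runs a quoted statement; dropping an EXTERNAL table
# later removes only the definition, never the underlying raw files.
subprocess.check_call(["hive", "-e", DDL])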
Destination: High level technical architecture
(Architecture diagram, summarized from the slide's labels)
• Clients: researchers, analysts, and admins working through HUE, BI tools, and the shell
• Access paths: ODBC over SSL, a REST API over SSL (see the WebHDFS sketch below), and SSH, all routed through edge nodes
• Hadoop cluster: HDFS data nodes and YARN running on commodity hardware, secured with Kerberos
• Controls: hardware and software ACLs, LDAP integration
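As a small, hedged illustration of the REST access path above: the sketch below lists an HDFS directory through the WebHDFS REST API over HTTPS, authenticating with Kerberos (SPNEGO), which is how a Kerberized cluster typically expects REST clients to identify themselves. The host name, port, path, and CA bundle location are placeholders, and the requests / requests_kerberos packages are an assumption about the client environment.

#!/usr/bin/env python
# list_hdfs_dir.py -- list an HDFS directory over WebHDFS with Kerberos auth.
# Requires a valid Kerberos ticket on the client (kinit) plus the
# requests and requests_kerberos packages; endpoint details are illustrative.
import requests
from requests_kerberos import HTTPKerberosAuth, OPTIONAL

GATEWAY = "https://hadoop-gateway.example.com:50470"  # HTTPS endpoint exposing webhdfs/v1 (placeholder)
PATH = "/data/raw/claims_notes"

resp = requests.get(
    GATEWAY + "/webhdfs/v1" + PATH,
    params={"op": "LISTSTATUS"},
    auth=HTTPKerberosAuth(mutual_authentication=OPTIONAL),
    verify="/etc/pki/tls/certs/ca-bundle.crt",        # validate the cluster's SSL certificate
)
resp.raise_for_status()
for entry in resp.json()["FileStatuses"]["FileStatus"]:
    print("%s\t%s\t%d" % (entry["pathSuffix"], entry["type"], entry["length"]))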
Path: Adoption framework
(Image slide: adoption framework diagram. Source: http://www.gartner.com/it/content/2604400/2604421/december_12_big_data_road_map_ssicular.pdf?userId=61955890)
Path: Dual tracks
• Parallel business and technology tracks
• Technology Track
– Technology team explores distribution options
– Realistic use cases drive proofs of concept
– Vendors onsite for 4-6 week experiments
• Business Track
– Focus on business extension use cases first
– Early adopter team received formal training and completed hands-on experimentation
• Principles
– Business requirements drive expansion
– Market “buzz” and business vision drive experimentation
Organizational architecture
(Organization diagram, summarized from the slide's labels)
• Core Technology Team: manager and technical SMEs
• Core Business Research Team: business SMEs
• Steering Committee and a Hadoop User Group, with many informal communication channels
• Team profile: early adopters, technically savvy, agile
• Staffing model: dedicated matrix in the 1st year, integrated in the 2nd year
Mile Markers: Use case categories
Value dimensions:
• Operational Efficiency: expense reduction, cycle-time reduction, quality improvement
• Information Advantage: knowledge, insight, prediction

Use case categories:
• Long-term data retention: moving data that must be retained in an accessible format to a lower-cost platform
• ETL Offload (Cleanse, Conform, Integrate): completing repeated data prep (ETL) steps on a lower-cost, or more highly parallel, platform (see the conform-step sketch below)
• EDW Optimization: balancing cost and performance of data storage and query execution by creating a logical data warehouse spanning SMP, MPP, and Hadoop
• Staging for Data Exploration: quickly making data accessible for profiling, visual exploration, etc. with low IT investment
• Full (360) View: connecting data of all types to create a complete perspective related to customers, products, processes, etc.
• Data Science: exploring and processing data using advanced statistical and text processing algorithms to “automate” insight discovery
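To ground the ETL Offload category, here is a hedged sketch of a cleanse/conform step written as a map-only Hadoop Streaming job: it standardizes a date field and a status code so that downstream integration sees a single format, and the work is parallelized across the cluster rather than the warehouse. The pipe-delimited record layout, field positions, and date formats are invented for illustration.

#!/usr/bin/env python
# conform_mapper.py -- map-only cleanse/conform step for Hadoop Streaming
# (run with the number of reducers set to 0). Reads pipe-delimited claim
# records, normalizes the loss date and status code, writes them back out.
import sys
from datetime import datetime

INPUT_DATE_FORMATS = ("%m/%d/%Y", "%Y-%m-%d", "%d-%b-%Y")

def conform_date(raw):
    """Return the date in ISO format, or an empty string if unparseable."""
    for fmt in INPUT_DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return ""

for line in sys.stdin:
    fields = line.rstrip("\n").split("|")
    if len(fields) < 3:
        continue  # skip records too short to conform
    claim_id, loss_date, status = fields[0], fields[1], fields[2]
    print("|".join([claim_id, conform_date(loss_date), status.strip().lower()]
                   + fields[3:]))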
Mile Markers: Roadmap
The roadmap is an intersection between
• Time (across the top)
• Category (down the left)
Each cell contains the kinds of work that need to be demonstrated for a category in that time period. Milestones at the top provide a way to indicate when work across all of the categories aligns to produce an observable capability.
Architecture choices to consider
• Analysis-specific clusters vs. the “data lake” concept
• Open source, supported distribution, or proprietary platforms
• Native tools vs. licensed accelerators
• SQL, NoSQL, search
• Metadata
• Workload balancing strategy
• Backup & disaster recovery approaches (see the DistCp sketch below)
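One common answer to the backup & disaster recovery bullet, offered here only as a hedged sketch (cluster addresses and directory names are invented), is to mirror critical HDFS directories to a second cluster with the standard DistCp tool; the wrapper below simply shells out to hadoop distcp for each directory.

#!/usr/bin/env python
# mirror_to_dr.py -- copy selected HDFS directories to a DR cluster with
# DistCp. -update copies only new or changed files; -delete removes files
# from the target that no longer exist at the source. Names are placeholders.
import subprocess

SOURCE = "hdfs://prod-nn.example.com:8020"
TARGET = "hdfs://dr-nn.example.com:8020"
DIRECTORIES = ["/data/raw/claims_notes", "/data/conformed/policies"]

for path in DIRECTORIES:
    subprocess.check_call([
        "hadoop", "distcp", "-update", "-delete",
        SOURCE + path, TARGET + path,
    ])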
Lessons Learned
• Cross-functional team is critical
• Work through data governance processes early
• Understand the state of the art with respect to handling sensitive data
• Engage “business programmers” early
• Expect BI tools to lag in integrating well with Hadoop
Questions?
Craig Jordan
https://www.linkedin.com/in/crjordan