Making the Case for Hadoop in a Large Enterprise
British Airways
Alan Spanos
Data Exploitation Manager
British Airways
Jay Aubby
Architect
British Airways
2. Making the Case for Hadoop in a Large Enterprise
15th April 2015
3. Making the Case for Hadoop in a Large Enterprise
Alan Spanos
Data Exploitation Manager
Jay Aubby
Architect
4. Making the Case for Hadoop in a Large Enterprise
Contents
1. How to Approach a Business Case for Hadoop
2. BA as a Case Study
3. Technical Implementation: HDP 2.2
5. Making the Case for Hadoop in a Large Enterprise
1. How to Approach a Business Case for Hadoop
6. Making the Case for Hadoop in a Large Enterprise
Things to Consider
When running a project to deploy Hadoop enterprise-wide, consider the following areas (shown on the slide around the label "Enterprise Hadoop"):
• Business Strategy
• Additional Complexity
• Current BI Landscape
• Key Stakeholders
• User Base
• Financial Processes
7. Making the Case for Hadoop in a Large Enterprise
Selecting your "Anchor" Use Case
Choosing a use case which has continual and undeniable value is key. This keeps the focus on solving the technical challenges.
• Suitable use cases include:
  - Data archiving of "cold" data on EDW
  - Move sandboxes from EDW to Hadoop
  - Move ETL / ELT layer to Hadoop from RDBMS
• Doing something new requires a "leap of faith", which is something many finance teams are unwilling to make
• Simpler projects will help pay for platform set up, which can, in future, enable more speculative projects
8. Making the Case for Hadoop in a Large Enterprise
Architecture vs Deployment
Hadoop's modular architecture enables it to quickly and easily accommodate future use cases
9. Making the Case for Hadoop in a Large Enterprise
Reuse not Reinvention
Reusing existing staff, processes and concepts is key to minimising the cost of implementing Hadoop
Function          EDW                  Hadoop
Data Governance   EDW Data Governance  EDW Data Governance
Administration    EDW DBA              EDW DBA
Hardware Support  IT Operations        IT Operations
Platform Support  EDW Support          EDW Support
Security          EDW Security         EDW Security
Development       EDW Dev              EDW Dev
Users             BI / Analysts        BI / Analysts
10. Making the Case for Hadoop in a Large Enterprise
Getting the Balance Right is Key
[Diagram: balancing Cost and Return. Labels: IT, Information and Data Governance, Processes, Security and Operations; IT and Data Innovation; Architectural Fit; ROI; TCO]
Proving the business value of Hadoop is often easier once BI / IT already have a useable environment available!
11. Making the Case for Hadoop in a Large Enterprise
2. BA as a Case Study
12. Making the Case for Hadoop in a Large Enterprise
Anchor Use Case: Data Archive for Legal Cases
A simple cost-saving project was identified as the ideal anchor case
Project Objectives
• Agree technical architecture for enabling the use case in Hadoop.
• Ensure agreed technical architecture is scalable for future planned uses of Hadoop in BA.
• Deliver agreed technical architecture.
• Deliver BI and IT processes to support new infrastructure.
Project cash positive after 12 months, with order-of-magnitude Opex savings once implemented.
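The "cash positive after 12 months" statement is a payback calculation. As a rough sketch of that arithmetic only: the function name and every figure below are hypothetical placeholders chosen for illustration, not BA's numbers.

```python
def months_to_cash_positive(setup_cost, monthly_saving, monthly_run_cost, horizon=60):
    """Return the first month in which cumulative net cash flow reaches zero
    or better, or None if that does not happen within `horizon` months."""
    net_monthly = monthly_saving - monthly_run_cost
    cumulative = -setup_cost  # the platform set-up is paid up front
    for month in range(1, horizon + 1):
        cumulative += net_monthly
        if cumulative >= 0:
            return month
    return None


# Placeholder figures only (hypothetical, not taken from the presentation).
example_months = months_to_cash_positive(
    setup_cost=600_000,
    monthly_saving=60_000,    # e.g. EDW storage no longer needed
    monthly_run_cost=10_000,  # e.g. Hadoop platform running cost
)
print(example_months)
```

With these placeholder inputs the net saving is 50,000 per month, so the set-up cost is recovered in month 12. A project whose running cost matches or exceeds its saving never pays back, and the function returns None.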
13. Making the Case for Hadoop in a Large Enterprise
Keeping It Simple!
Hadoop is complex enough, and the biggest risk in any project is people's ability to cope with the change
User Function         EDW                  Hadoop
Logon                 Corporate LDAP       Corporate LDAP
BI Tools              Existing Tools       Existing Tools + Hue
Direct Access Method  SQL                  HiveQL
Fault & Support       KITE                 KITE
Data Definitions      EDW Data Dictionary  EDW Data Dictionary
Data structures are mastered in EDW for the archiving project
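If data structures are mastered in the EDW, the Hive table definitions can be derived from them instead of being written by hand. The sketch below assumes a simple list of (column name, EDW type) pairs as input. The type mapping, the function name and the example table are illustrative assumptions, not BA's data dictionary or tooling.

```python
# Assumed mapping from generic EDW column types to Hive types (illustrative).
EDW_TO_HIVE = {
    "INTEGER": "INT",
    "BIGINT": "BIGINT",
    "DECIMAL": "DECIMAL",
    "VARCHAR": "STRING",
    "CHAR": "STRING",
    "DATE": "DATE",
    "TIMESTAMP": "TIMESTAMP",
}


def hive_ddl(table, columns):
    """Build a HiveQL CREATE TABLE statement from (name, edw_type) pairs."""
    parts = []
    for name, edw_type in columns:
        base = edw_type.split("(")[0].strip().upper()
        if base not in EDW_TO_HIVE:
            raise ValueError(f"No Hive mapping for EDW type {edw_type!r}")
        hive_type = EDW_TO_HIVE[base]
        # Carry precision and scale over for DECIMAL columns.
        if base == "DECIMAL" and "(" in edw_type:
            hive_type += edw_type[edw_type.index("("):].strip()
        parts.append(f"  {name} {hive_type}")
    body = ",\n".join(parts)
    return f"CREATE TABLE IF NOT EXISTS {table} (\n{body}\n) STORED AS ORC"


# Hypothetical table and columns, for illustration only.
ddl = hive_ddl(
    "archive.bookings",
    [("booking_id", "INTEGER"), ("fare", "DECIMAL(10,2)"), ("pnr", "VARCHAR(6)")],
)
print(ddl)
```

Raising an error on an unmapped type is deliberate: a silent fallback would let the two definitions drift apart, which is what mastering the structure in one place is meant to prevent.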
14. Making the Case for Hadoop in a Large Enterprise
3. Technical Implementation
15. Making the Case for Hadoop in a Large Enterprise
BA Architecture Challenges
Security
• Multiple identity providers across the group
• For data access, all security is controlled at the DB level via security roles
Infrastructure Constraints
• Policy of virtual infrastructure only
• Infrastructure Patterns (Puppet)
IT Service Delivery
• Alignment and integration with existing service delivery models
• Lots of application areas but only one equipped to support the Hadoop environment (EDW team)
Information Security
• Policies for information security are based upon actor, role, business unit, department, etc.
• Strong conformance and consistency
Support
• Due to segregated responsibilities, application support is spread across multiple COEs
• No to open source
16. Making the Case for Hadoop in a Large Enterprise
Challenges of Hadoop Implementation
17. Making the Case for Hadoop in a Large Enterprise
Hortonworks 2.2
Hadoop for the enterprise
18. Making the Case for Hadoop in a Large Enterprise
Hadoop Implementation Principles
Reuse, don't reinvent
• Reuse all EDW functions, capabilities and processes, e.g. Data and Information Governance, Database Administration, Support, and Exploitation
• Example: when provisioning access through a Corporate Directory Request, Hadoop access will be just another check box
• Naming standards and data/information definitions
The master of the data structure will remain the source systems
• All data structures must be kept in sync between systems, ensuring alignment with the corporate data/information dictionary.
The data archive/migration process will be in line with the current process, which is managed and controlled by the EDW team
• Modifying the existing approach
Alignment with BA EA vision
• The initial rollout of Hadoop is to introduce the capabilities to BA. Upon this platform, other business capabilities can be prototyped, validated and built, extending the use of Hadoop.
Capability Governance
• Hadoop has lots of capabilities, so governing and controlling what gets rolled out is critical to its success
• BA Hadoop readiness assessment
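One simple way to picture capability governance is as an allow-list: a component is only rolled out once it has been approved. The sketch below is a hypothetical illustration. The approved set is an assumption chosen to resemble an archiving use case, not BA's actual governance list, and the function name is invented.

```python
# Assumed set of approved components (illustrative, not BA's actual list).
APPROVED = {"HDFS", "YARN", "Sqoop", "Hive", "Kerberos", "Ranger"}


def review_request(requested):
    """Split requested components into those already approved and those
    that still need to go through capability governance."""
    requested = set(requested)
    return sorted(requested & APPROVED), sorted(requested - APPROVED)


approved, needs_review = review_request(["Hive", "Storm", "Sqoop"])
print(approved, needs_review)
```

In this example Hive and Sqoop are cleared for use, while Storm would be held back until it has been assessed.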
19. Making the Case for Hadoop in a Large Enterprise
Capability Governance Example
20. Making the Case for Hadoop in a Large Enterprise
Hortonworks
Addressing Architectural Challenges with HDP 2.2
[Diagram: HDP 2.2 components and existing BA tooling set against the challenge areas Security, Infrastructure, Service Delivery, Information Security and Support]
• Kerberos (Authentication)
• Ranger (Authorisation and Policies)
• BA LDAP (Identity Provider)
• Ambari, Falcon & Zookeeper (Management)
• YARN (Resource Manager)
• Oozie (Scheduler)
• BA Puppet (Provisioning)
• BA BMC (Monitoring)
• Control-M (Scheduler)
• BA Kite (Support Ticket)
Core
• HDFS (File System)
• YARN (Resource Manager)
• Kerberos (Authentication)
• Ranger (Authorisation)
• Ambari, Falcon & Zookeeper (Management)
• Oozie (Scheduler)
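The security components listed for this slide divide the work three ways: LDAP supplies identities and group membership, Kerberos authenticates the user, and Ranger decides what an authenticated user may do. The sketch below models only that last decision, as a conceptual illustration. It is not Ranger's API, and the `Policy` type, the group names and the table name are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    resource: str       # e.g. a Hive "database.table" name
    groups: frozenset   # directory groups the policy applies to
    actions: frozenset  # permitted actions, e.g. {"select"}


def is_allowed(user_groups, resource, action, policies):
    """Deny by default; allow only when some policy matches the resource,
    the action, and at least one of the user's groups."""
    return any(
        p.resource == resource
        and action in p.actions
        and bool(p.groups & set(user_groups))
        for p in policies
    )


# Hypothetical policy: the legal group may query the archived bookings table.
policies = [Policy("archive.bookings", frozenset({"legal"}), frozenset({"select"}))]
print(is_allowed(["legal"], "archive.bookings", "select", policies))
print(is_allowed(["marketing"], "archive.bookings", "select", policies))
```

Keying the decision on directory groups rather than individual users is what lets information security policies based on role, business unit or department carry over from the existing directory.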
21. Making the Case for Hadoop in a Large Enterprise
[Diagram: HDP 2.2 components by layer, split into Batch and Near Real-time]
Core
• HDFS (File System)
• YARN (Resource Manager)
• Kerberos (Authentication)
• Ranger (Authorisation)
• Ambari, Falcon & Zookeeper (Management)
• Oozie (Scheduler)
Acquisition
• Sqoop (Bulk Transfer of Data)
• Flume (Collect, move log data)
• Kafka (Message queue)
Processing
• Pig (Transformation)
• Storm (Computational Engine)
Persistent
• Hive (SQL Datastore)
• HBase (NoSQL)
• Solr (Search)
• HCatalog (Metastore)
Exploitation
• Analytics
• Visualisation
• Excel (Analysis)
• Hue (Web Interrogation Client)
22. Making the Case for Hadoop in a Large Enterprise
Building Business Solutions
Business Initiative: Improve Efficiency & Reduce Costs
IT Initiative: Data Archiving
Architecture Building Blocks:
• Bulk data transfer
• Persistent data store
• Off-line query capability
• Enforce Data and Information Security policies
• Ability to Restore
Solution Building Blocks:
• Sqoop
• Hive
• Kerberos
• Ranger
....
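For the data archiving row, the bulk-transfer building block maps to a Sqoop import into Hive. The sketch below assembles such a command line using standard Sqoop 1 import options. The helper name and the connection string, user, paths and table names in the example are hypothetical placeholders, not BA's configuration.

```python
def sqoop_import_args(jdbc_url, username, password_file,
                      source_table, hive_table, mappers=4):
    """Assemble a `sqoop import` command that copies one RDBMS table
    into a Hive table."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,  # avoids a password on the command line
        "--table", source_table,
        "--hive-import",
        "--hive-table", hive_table,
        "--num-mappers", str(mappers),
    ]


# Placeholder values only (hypothetical).
args = sqoop_import_args(
    jdbc_url="jdbc:teradata://edw.example.com/DATABASE=sales",
    username="archive_svc",
    password_file="/user/archive_svc/.edw_password",
    source_table="BOOKINGS_2009",
    hive_table="archive.bookings_2009",
)
print(" ".join(args))
```

On a node where Sqoop is installed, the list could be passed to `subprocess.run(args, check=True)`. Building it as a list rather than a single string avoids shell quoting problems.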