The document discusses the role of a Chief Data Officer (CDO) and how data architecture can help support that role. It provides examples of what BT is doing with data in the absence of a CDO, including building a data platform called HAAS to centralize data and facilitate self-service analytics. The challenges of BT's complex legacy systems are also outlined. Data architecture helps by developing a vision, building the data infrastructure platform, and educating others on its use to support data-driven initiatives without a formal CDO.
3. The architect's answer…
Do you have or need a CDO?
It depends on:
• State of the business
• State of enterprise architecture
• What’s going on externally
4. AGENDA
Framing View of the Industry
Discuss the CDO role
The organisation of BT & IT Challenges
Examples of what we’re doing with data
(in the absence of a CDO)
5. The Long View of Big Data
Data “Bigness” = (Volume, Velocity, Variety)
Timeline markers: 1960, 1990, Y2K, 06, 09, 14, 16
First Research cluster, Production Cluster
HAAS = Hadoop as a Service
Mainframe (1st Platform)
• Proprietary, Monolithic
• Batch, Interactive
• COBOL/ISAM/IDMS
• Linked Record sets
Client-Server Applications + RDBMS (2nd Platform)
• OPEN! 3GL, 4GL
• PC & Servers on premise
• RELATIONAL
Scale-out infrastructure (3rd Platform)
• Clusters, Data hub, pipelines
• Mobile, Social, Big Data, Cloud
?
cost/performance
VVV crunch
6. What does a Chief Data Office(r) do?
Evangelise
• Culture change to “data driven organisation”
• Self-Service Data & Analytics
Centralise (tackle the silo problem)
“Year 1: Build the House”
“Year 2: Throw Open the doors”
Facilitate
• Educate
• Design Pattern Cookbook for the Enterprise
• Briefings – All Hands calls, Leadership Team meetings, hackathons…
• Tooling
• SKOOL on GitHub (tool to simplify transferring tables from Oracle to Hadoop)
“Understanding the Chief Data Officer”
O’Reilly – Julie Steele of Silicon Valley Data Science
7. A Good CDO Role Model ?
Joy Bonaguro
City of San Francisco
data.sfgov.org
Cataloguing Data Assets
Facilitating data sharing
Building enabling infrastructure
8. BT Group Structure 1/Apr/2016
Customers
Chief Architects Office Enterprise Architecture
Data Architecture
For BT Group
~ 90K FTE in 61 countries, serving 180 countries
Research & Innovation
9. Legacy Systems Architecture in each BT Business Unit
[Diagram: per-business-unit stack showing Customers, CRM, Service Management, Network Management, ESB, Data Warehouse, Analytics, Networks & IT, and Field Engineers]
• Hundreds of systems in each business unit, grouped into 3 operational areas (CRM / Service Mgt / Network Mgt)
• Data Warehouse per business unit
• Client-server applications running on servers in BT Data Centres (~35K hosts)
• Mainframe applications (in Openreach)
• Total storage ~25 PB
• Lots of event / time series data
– Network alarms & telemetry
– Netflow traffic events, security events
– Call Detail Records, web clicks
– Mobile handset data (GPS, apps, browsing…)
• Business Unit CIOs manage the IT investment roadmap; each business unit deploys a “stack release” quarterly
10. Challenges - Complexity
Example from BT Global Services
Design for Release 17 of
Repair Systems for 1 product family
Where’s the Master Data ?
Which flows are data replication ?
Which flows are transactional ?
x 70 Similar “system stacks”
x 4 Releases / yr
14. What does Data architecture do…? 1. Sort the basics
Adopt/Adapt a framework
Establish lists (systems, data landscape…)
DAMA DMBOK, TOGAF…
15. What does Data architecture do…? 2. Develop Vision
[Diagram: target data platform]
Sources: CRM, ERP, DW (each shown as an RDBMS with a Web/APP server and map/reduce code)
Ingestion tools: Flume, GoldenGate, sqoop
Transfer modes labelled: snapshot, CDC
1. Event ingestion from Networks/IT/Web servers: collection with Flume agents, landing in HDFS files
2. DB table transfer using sqoop (map/reduce) jobs, landing in HDFS files
Platform: HDFS (FILES), Hive Metastore (TABLES), Impala + Sentry, Active Directory
Access:
• BI Tools: Tableau, Zoomdata… (HIVE TABLE ACCESS)
• Wrangling & Discovery, Data Science: Datameer, HUE… (HDFS FILE ACCESS)
Users: Data Scientists, SQL analysts, business users
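As an illustration of the DB table transfer using sqoop described on this slide, the sketch below assembles the argument list for a single-table `sqoop import` that lands in an HDFS directory. It is a minimal sketch and not BT's actual job definition: the JDBC URL, user name, password file, table name, and target directory are all hypothetical placeholders, and the function name `build_sqoop_import` is invented for this example. Only standard Sqoop 1 import options are used.

```python
from typing import List


def build_sqoop_import(jdbc_url: str, username: str, password_file: str,
                       table: str, target_dir: str, mappers: int = 4) -> List[str]:
    """Return the argv for a `sqoop import` of one table into an HDFS directory."""
    if mappers < 1:
        raise ValueError("mappers must be at least 1")
    return [
        "sqoop", "import",
        "--connect", jdbc_url,             # JDBC connection string of the source RDBMS
        "--username", username,
        "--password-file", password_file,  # file holding the password, not the password itself
        "--table", table,                  # source table to snapshot
        "--target-dir", target_dir,        # HDFS directory the files land in
        "--num-mappers", str(mappers),     # number of parallel map tasks
    ]


# Hypothetical values, for illustration only.
cmd = build_sqoop_import(
    jdbc_url="jdbc:oracle:thin:@//db.example.com:1521/CRM",
    username="haas_reader",
    password_file="/user/haas_reader/.sqoop.pwd",
    table="CUSTOMER",
    target_dir="/data/crm/customer",
)
print(" ".join(cmd))
```

Building the command as a list rather than a single string keeps each argument separate, so it could be handed to a process launcher without shell quoting issues.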
16. What does Data architecture do…? 3A. Build the data house
• Following a presentation to the TSO Leadership team in Dec 2013, an initial investment in a production cluster was agreed, backed by a plan to launch in Feb 2014
• 60 nodes optimised for Hadoop map/reduce deployed in the BT Data Centre in Sheffield (6TB local disks, 1:1 core:spindle ratio, 8GB for JVM per map/reduce slot)
• Existing Linux 3rd line team tasked with running a basic (Minimum Viable Product) Hadoop cluster as a shared service platform
BT HaaS Release 1: 60 nodes, ~2 PB, Feb 2014, Hadoop admin by Linux 3rd Line
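The Release 1 figures can be sanity-checked with some back-of-envelope arithmetic. The sketch below is an illustration under stated assumptions, not a description of the actual build: it assumes the "~ 2 PB" figure is raw capacity in decimal units, and it assumes the HDFS default replication factor of 3, which the slide does not state. The number of disks per node is derived here, not quoted from the source.

```python
NODES = 60               # from the slide
CLUSTER_RAW_TB = 2000.0  # "~ 2 PB", ASSUMED to be raw capacity (1 PB = 1000 TB)
DISK_TB = 6.0            # from the slide: 6TB local disks
REPLICATION = 3          # HDFS default replication factor; ASSUMED, not stated on the slide

raw_per_node_tb = CLUSTER_RAW_TB / NODES      # raw storage each node would need to hold
disks_per_node = raw_per_node_tb / DISK_TB    # implied number of 6TB disks per node
usable_tb = CLUSTER_RAW_TB / REPLICATION      # capacity left after 3-way replication

print(f"raw per node  : {raw_per_node_tb:.1f} TB")
print(f"disks per node: {disks_per_node:.1f}")
print(f"usable at 3x  : {usable_tb:.1f} TB")
```

Under these assumptions each node holds roughly 33 TB raw, which is consistent with about five or six 6TB disks per node, and the cluster offers roughly two thirds of a petabyte of replicated storage.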
17. What does Data architecture do…? 3B. Build the data house
[Diagram: HAAS tenant onboarding process]
HAAS Platform
• Hadoop Cluster A: tenant “HAASA AP 00307_12126” with HIVE, HDFS, Sentry, job queue, HUE, Impala, Flume, Sqoop, HttpFS, Kerberos, Datameer, BI Server
• Hadoop Cluster B (Openreach only)
Roles
• Tenant “Project Owner”
• User admin
• Platform Admin
• Analytics Review Board
• ARB
Process elements
• Order form (SharePoint), script, email
• Standard User Admin Process: User Access, Systems Access
• Active Directory: Create Security Group
• Create Hadoop Features
• Notification: “HAASA AP 00307_12126 is ready for you to use”
Existing Business APP 12126 (Oracle DB): APP extends footprint in HaaS
18. Governing Access to Data on the Platform ** WIP **
Three existing business applications (CRM, Orders, Faults) extended into HaaS:
• CRM (2029): RDBMS Customer Table, transferred by sqoop into Hive DB HAASA AP 00101_2029 as T_Customer, with view V_Customer
• Orders (3531): RDBMS Orders Table, transferred by sqoop into Hive DB HAASA AP 00202_3531 as T_Orders, with view V_Orders
• Faults (4369): RDBMS Faults Table, transferred by sqoop into Hive DB HAASA AP 00303_4369 as T_Faults, with view V_Faults
Business Data Stewards: CRM, Orders, Faults
Business Analysts / Data Scientists
1. Browse & select data
2. Get Steward Approval
3. Create VIEWs & GRANTs
4. Recommend joins / Views
Data Catalogue (Million Table Meta-store)
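Step 3 above (Create VIEWs & GRANTs) can be sketched as a small generator of SQL statements issued once a steward has approved a set of columns. This is a hedged illustration, not the platform's actual tooling: the function name, the database name `crm_2029`, the column names, the role, and the group are all hypothetical, and the statements follow the general Hive/Impala with Sentry form (`CREATE VIEW`, `CREATE ROLE`, `GRANT SELECT ON TABLE … TO ROLE`, `GRANT ROLE … TO GROUP`), which should be checked against the deployed versions.

```python
from typing import List


def steward_approved_sql(database: str, table: str, view: str,
                         columns: List[str], role: str, group: str) -> List[str]:
    """Build the statements that expose approved columns of a table through a view."""
    if not columns:
        raise ValueError("at least one column must be approved")
    col_list = ", ".join(columns)
    return [
        f"USE {database}",
        # The view exposes only the columns the data steward approved.
        f"CREATE VIEW {view} AS SELECT {col_list} FROM {table}",
        f"CREATE ROLE {role}",
        # Read access is granted on the view, not on the underlying table.
        f"GRANT SELECT ON TABLE {view} TO ROLE {role}",
        # The role is attached to a directory group rather than to individual users.
        f"GRANT ROLE {role} TO GROUP {group}",
    ]


# Hypothetical names, for illustration only.
statements = steward_approved_sql(
    database="crm_2029",
    table="T_Customer",
    view="V_Customer",
    columns=["customer_id", "segment"],
    role="crm_analyst",
    group="haas_crm_analysts",
)
for stmt in statements:
    print(stmt + ";")
```

Granting on the view and binding the role to a group keeps the raw table closed by default, so access follows group membership in the directory.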
19. What does Data architecture do…? 3. Educate
Cloudera “Resident” Solution Architect
BT HaaS Cookbook (snip.bt.com/haascook)
• Design patterns to ease project onboarding
• Included in “Learning Pathways”
Events by audience
• Research & Innovation Data Scientists: Dec 2015, 3rd BT Data Science Week (50 @ Adastral)
• Business Awareness: Sep 2014, UK Hadoop User Group (200 @ BT Centre)
• IT Operations: Jan 2014, RESOPS training week (Research + IT Ops, Adastral)
• Architecture: Mar 2014, Hadoop Summit (Doug Cutting, Cloudera + BT)
Big Data Centre of Excellence, Cardiff / Bangalore
• 20 designers / developers
• Working on > 50 opportunities & projects
• Published open source “skool” utility
20. Q & A
Phill Radley
Chief Data Architect
phillip.radley@bt.com