Many organisations are grappling with large, distributed, high-load data systems. Over time these systems have become complex and hard to maintain, and they do not provide a consolidated view of the data available to the organisation. Liberty Global faced these challenges and, together with EPAM, tackled them by creating an Operational Data Hub (ODH): a centralized operational data framework that replaces multiple monitoring solutions.
2. Who We Are
Liberty Global
Liberty Global is the world's largest international TV and broadband company, with operations in 10 European countries under the consumer brands Virgin Media, Ziggo, Telenet and UPC.
EPAM Systems: Complete Services for Digital Business
Our multi-disciplinary teams combine business expertise with design thinking, world-class engineering, modern operations practices and knowledge of leading tools and frameworks to optimize performance.
3. Partnership
Architecture, security, scalability, stability and business analysis
5. Connectivity and Entertainment
Entertainment:
• Millions of users
• Multiple clients: set-top boxes, TV apps, mobile, web
• Country-specific functionality
• Content availability
• Complexity: hundreds of components
Connectivity:
• Wi-Fi and broadband connectivity
• Millions of users
• Affects the digital TV experience, and vice versa
• Historically, a separate data silo from digital TV and entertainment
6. Initial Problem Statements
• Reactive problem solving: Traditionally, DTV platforms have been launched with minimal monitoring and then expanded reactively.
• Multiple monitoring solutions: Different parts of the system have different monitoring solutions.
• Separate data silos: Connectivity monitoring and entertainment monitoring are not connected.
• Hard to scale: Scaling traditional monitoring solutions is challenging.
• No data analysis: Lack of data and correlation capabilities makes troubleshooting expensive.
• Hard to integrate and improve: When monitoring is not part of the core platform, it is hard to integrate new components and costly to maintain.
7. One Monitoring Platform
A centralized operational data platform to replace multiple co-existing monitoring solutions with a holistic system, and to provide insight into the end-to-end health of the overall system.
9. Facts & Figures
• 138 servers with 87 VMs
• 18 production and satellite clusters
• 500,000 Kafka messages per second
• 100,000 Elasticsearch (ES) documents per second
• 6000 pull requests
• 100+ Spark jobs
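To put the per-second rates into daily terms, the arithmetic below assumes the quoted rates are sustained around the clock; the slide gives only per-second figures and does not say whether they are peaks or averages. On that assumption, roughly one Elasticsearch document is indexed for every five Kafka messages.

```latex
\begin{aligned}
500{,}000~\tfrac{\text{msg}}{\text{s}} \times 86{,}400~\tfrac{\text{s}}{\text{day}} &= 4.32 \times 10^{10}~\tfrac{\text{msg}}{\text{day}} \\
100{,}000~\tfrac{\text{doc}}{\text{s}} \times 86{,}400~\tfrac{\text{s}}{\text{day}} &= 8.64 \times 10^{9}~\tfrac{\text{doc}}{\text{day}} \\
\frac{100{,}000~\text{doc/s}}{500{,}000~\text{msg/s}} &= \frac{1}{5}
\end{aligned}
```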
10. Data Pipeline
Flume → Kafka → Spark → Elasticsearch → Kibana
• Flume: Aggregating logs from backend components
• Kafka: Queueing messages
• Spark: Aggregating messages
• Elasticsearch: Storage and parsing
• Kibana: Visualization
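The slide names the tools but not the message format or the logic of the Spark jobs. As an illustration only, the plain-Python sketch below mimics what an aggregation stage between Kafka and Elasticsearch might do with one micro-batch: parse JSON log events, bucket them into fixed time windows, and count events per component and log level. The event fields (`ts`, `component`, `level`), the 60-second window and the function name `aggregate_batch` are hypothetical; a real job would use Spark's own APIs rather than this code.

```python
import json
from collections import Counter
from typing import Iterable


def aggregate_batch(messages: Iterable[str], window_s: int = 60) -> list[dict]:
    """Count log events per (time window, component, level) in one micro-batch.

    Each message is assumed (hypothetically) to be a JSON object with
    "ts" (Unix seconds), "component" and "level" fields.
    """
    counts: Counter = Counter()
    for raw in messages:
        try:
            event = json.loads(raw)
            # Align the timestamp to the start of its fixed-size window
            window_start = int(event["ts"]) // window_s * window_s
            key = (window_start, str(event["component"]), str(event["level"]))
        except (ValueError, KeyError, TypeError):
            continue  # drop malformed events rather than failing the whole batch
        counts[key] += 1
    # One flat document per key, a shape that suits bulk indexing into a search store
    return [
        {"window_start": w, "component": c, "level": lvl, "count": n}
        for (w, c, lvl), n in sorted(counts.items())
    ]


batch = [
    '{"ts": 1700000005, "component": "epg", "level": "ERROR"}',
    '{"ts": 1700000030, "component": "epg", "level": "ERROR"}',
    '{"ts": 1700000070, "component": "vod", "level": "WARN"}',
    "not json",
]
print(aggregate_batch(batch))
```

Here the two `epg` errors fall into the same minute and collapse into one document with a count of 2, the `vod` warning lands in the next window, and the malformed line is skipped. In Spark itself, the same idea would be expressed as a windowed group-by over the stream.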
11. Agile Support
• Agile support: ODH supports Agile methodology by easily adapting to any release and testing cadence.
• Multi-country: ODH supports separate release trains in multiple countries and environments.
• Frequent releases: ODH supports Agile methodology by allowing frequent production code drops in a complex DTV ecosystem.
12. Data Availability
• Data processing options: Using historical data for proactive capacity management, e.g. consistent feedback on how and when to expand different elements of the infrastructure.
• Easy adoption & integration: Every new component is integrated, and component teams do not need to think about it.
• Holistic view: All the data is in one place, covering backend components, set-top boxes, modems, network and performance.
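The capacity-management idea, where historical data tells you when to expand infrastructure, can be illustrated with the simplest possible model: a straight-line trend. This is a sketch under stated assumptions, not the ODH method; the slides do not say which model is used, and the function name `days_until_capacity`, the daily sampling and the single capacity threshold are all hypothetical.

```python
from typing import Optional, Sequence


def days_until_capacity(history: Sequence[float], capacity: float) -> Optional[float]:
    """Fit y = a + b*t by ordinary least squares to daily usage samples
    (t = 0, 1, 2, ...) and return how many days after the last sample the
    trend line reaches `capacity`. Returns None if the trend is flat or falling."""
    n = len(history)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    t_mean = (n - 1) / 2
    y_mean = sum(history) / n
    # Least-squares slope: covariance(t, y) / variance(t)
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(history))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    if slope <= 0:
        return None  # usage is not growing, so the threshold is never reached
    intercept = y_mean - slope * t_mean
    t_hit = (capacity - intercept) / slope  # time index where the trend meets capacity
    return max(0.0, t_hit - (n - 1))  # 0.0 means the trend is already at or above capacity


# Hypothetical daily peak disk usage (%) growing by 2 points per day
print(days_until_capacity([50, 52, 54, 56, 58], capacity=70))
```

A linear fit is deliberately naive: it ignores seasonality such as evening TV peaks, which a production model would need to capture.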
13. Data Products
• Anomaly detection: Neural-network-based anomaly detection trained on historical data.
• Correlations: The capability to correlate data from different sources, giving the best insight into the end-user experience and enabling root-cause analysis.
• Predictive modelling: Using historical data for proactive capacity management.
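At its simplest, correlating data from different sources means computing a correlation coefficient between time-aligned metrics. The sketch below assumes two equally sampled series, for example per-minute modem reconnects and per-minute set-top box playback errors (hypothetical metrics), and computes Pearson's r at several lags; a peak at a non-zero lag hints that one signal leads the other. Correlation alone does not establish root cause, and the neural-network anomaly detection mentioned on the slide is not shown here. The function names `pearson` and `best_lag` are illustrative.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # r is undefined for a constant series; treat it as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def best_lag(cause: Sequence[float], effect: Sequence[float], max_lag: int) -> tuple[int, float]:
    """Return the (lag, r) pair that maximises r between cause[t] and effect[t + lag]."""
    best = (0, pearson(cause, effect))
    for lag in range(1, max_lag + 1):
        # Shift the effect series left by `lag` so cause[t] pairs with effect[t + lag]
        r = pearson(cause[:-lag], effect[lag:])
        if r > best[1]:
            best = (lag, r)
    return best


# Hypothetical per-minute counts: playback errors follow modem reconnects by 2 minutes
reconnects = [0, 0, 5, 0, 0, 3, 0, 0, 0, 4, 0, 0]
playback_errors = [0, 0, 0, 0, 5, 0, 0, 3, 0, 0, 0, 4]
print(best_lag(reconnects, playback_errors, max_lag=4))
```

With this synthetic data, the best match is at a lag of 2 minutes with r close to 1, which is the kind of lead/lag signal an engineer would then investigate.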
14. Lessons Learned
• Build a talented team; it will gain experience in the process.
• Easy integration is the way to get a new solution adopted.
• When people recognize the value, it's time to get pickier; but first, be flexible and willing to help.
• The platform worked perfectly well without data and without users. When the real integration began, we had masses of data, users and requirements, and it took effort to stabilize and optimize. Every architectural issue and missing optimization surfaced.
• Monitoring the monitoring system is important.
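One lightweight way to monitor the monitoring system is a freshness check on ingestion: if a source that should be reporting goes quiet, the dashboards downstream go stale without any visible error. The sketch below assumes a hypothetical map from each source name to the timestamp of its newest ingested event; the source names, the threshold and the function `check_ingestion` are invented for illustration, and the slides do not describe how ODH actually does this.

```python
import time
from typing import Iterable, Optional


def check_ingestion(
    last_seen: dict[str, float],
    expected: Iterable[str],
    max_silence_s: float,
    now: Optional[float] = None,
) -> list[tuple[str, str]]:
    """Flag expected data sources that never reported or have gone quiet.

    last_seen maps source name -> Unix timestamp of its newest ingested event.
    Returns (source, reason) pairs sorted by source name.
    """
    now = time.time() if now is None else now
    problems = []
    for source in sorted(set(expected)):
        ts = last_seen.get(source)
        if ts is None:
            problems.append((source, "no data ever received"))
        elif now - ts > max_silence_s:
            problems.append((source, f"silent for {now - ts:.0f}s"))
    return problems


last_seen = {"flume-epg": 1000.0, "flume-vod": 900.0}
expected = ["flume-epg", "flume-vod", "modem-telemetry"]
print(check_ingestion(last_seen, expected, max_silence_s=60, now=1010.0))
```

Checking against an explicit list of expected sources matters: a source that never sends anything would otherwise be invisible to a check that only looks at sources it has already seen.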
Innovative digital products depend on new digital technologies. Engineering capability figures heavily in our ability to deliver and in our credibility as a holistic digital service provider. But we need more capability to address all of these challenges. Technology still underpins business success.
Support Agile development – frequent drops of new code
Scale: up to 10 million (?) users
Provide an integrated, holistic view
Merge the data between platforms – Connectivity and Digital TV
Enable (complex) data analysis
Provide insight into customers' experience
Ease of integration
97 bare metal servers, 45 VMs in the main cluster; 42 servers, 42 VMs in other clusters
4 production clusters, 14 satellite clusters for local data ingestion
500k Kafka messages per second, 100k ES documents per second
100+ Spark jobs
6000 PRs (4000 PRs in the Ansible repo, 2000 PRs in the Spark jobs' repos)
TODO: Ilya Epifanov
process (supports agile, frequent releases, multiple countries)
data availability (sources and integrations, data processing options, holistic view)
data products (anomaly detection, correlations, predictive modelling)