A Better Rich Media Experience & Video Analytics at Arkena with Apache Hadoop
Welcome to today’s webinar.
We will begin shortly.
Today’s Presenters
Reda Benzair
VP Technical Development, Arkena
John Kreisa
VP International Marketing, Hortonworks
A Better Rich Media Experience & Video Analytics at Arkena with Apache Hadoop
Reda Benzair
VP Technical Development
Feb 2016
AGENDA
1. WHO WE ARE: the CDN / OTT business
2. Why media experience & video analytics are so important for a CDN
3. Video analytics challenges & difficulties
4. Why we chose Hadoop technology
5. Architecture & results
6. Why we selected Hortonworks
WHO WE ARE
YOUR TRUSTED MEDIA PARTNER
A TDF Group Business Unit
At a glance:
• 16 CDN PoPs
• 1 Tbps connectivity
• 400 live radios & 360 live TVs
• 630 hours of on-demand video processed daily
• 13 offices in 9 countries (United Kingdom, Norway, USA, Finland, Denmark, Poland, France, Spain, Sweden)
• A team of 400 employees
CUSTOMERS
REFERENCES
MEDIA COMPANIES & OPERATORS
SOLUTIONS AND SERVICES
Cloud4Media: a SaaS/PaaS service that provides all the necessary tools for managing and exchanging media assets.
PLAYOUT: optimized for the audiovisual industry, designed to distribute your live and on-demand content.
OTT / CDN: a solution with modular components that enables content owners, telecom operators and broadcasters to provide video content to viewers worldwide.
Video Platform: provides enterprises, organizations and the public sector with an all-in-one tool to publish, manage and distribute video, live or on demand, to every device.
Play: part of the Arkena Video Platform, it handles video playback. Learn how to customize PLAY to fit your needs.
Mobile Publisher: an add-on to the Arkena Video Platform that lets you publish live broadcasts and on-demand videos directly from your iPhone.
CDN PLATFORM (CONTENT DELIVERY NETWORK)
ARKENA OTT / CDN
A UNIQUE EUROPEAN PRESENCE, ESPECIALLY FRANCE AND THE NORDICS
We offer advanced CDN solutions for media:
• Live & on-demand streaming
• First-class origin
• Transmuxing service
• Ad insertion (audio: Triton, Radionomy, Adswizz)
• Timeshifting and catch-up services
Arkena will offer a set of new media analytics services:
• Real-time analytics
• Advanced media analytics
OTT SOLUTIONS (OVER-THE-TOP)
ARKENA OTT / CDN
A UNIQUE EUROPEAN PRESENCE, ESPECIALLY FRANCE AND THE NORDICS
Content management & animation
• Metadata and catalog organization
• Offer scheduling and promotions
• Subscription, rental and purchase models
• Automatic sorting and optimized APIs for OTT app display
User account management
• User ownership tracking
• DRM entitlements
• Device pairing and restrictions
• Multiscreen favorites and resume
Content processing & protection
• Adaptive streaming and download support
• Multiple audio tracks and subtitle support
• Smooth Streaming with PlayReady and DASH with Marlin
• Geoblocking and streaming limits
Arkena OTT / CDN Analytics Challenge
“An infrastructure capable of handling millions of simultaneous connections/requests.”
CDN Architecture
• Media-specialized CDN with a strong presence in Europe, especially France and the Nordics: a CDN dedicated to audiovisual media streaming
• Video and audio delivery, live and on-demand services
• Multiscreen workflow expertise and broadcast / IP convergence
We deliver your content with optimal performance on all devices.
[Diagram: an origin / transmux service feeds caching servers in each PoP over regional IP networks; the PoPs also ship their traffic logs to the analytics cluster.]
• More than 300 CDN customers in Europe
• 16 European PoPs, local to final end users
• Capacity: 1 Tbps, very high storage capacity (~PB)
• More than 1,000 streaming servers
Why Media Experience & Video Analytics Are So Important
• Customer trust
• Real-time analytics and advanced metrics
• Advanced media analytics to monetize your audience
• Billing & payment: reporting and billing
Why We Need an Efficient Analytics System
Video Analytics Challenge
Data overload every second:
• Average raw log data input rate: 20 Mbps to 120 Mbps
• Daily raw log size (uncompressed, no replication): 20 GB to 200 GB per day
• A peak rate of 60K events/second
• Raw logs are kept for 3-9 months
• One CDN "edge" server generates an average of 15-22 million log lines per day
• One movie (HD, 1 hour) in DASH format with 8 video tracks and 1 audio track produces about 4,200 log events, a consequence of DASH adaptive-bitrate streaming (a rough sanity check follows after this list)
We compute 15 metrics at every batch: volume, hits, session duration, concurrent sessions, unique viewers...
All metrics are available over 15 dimensions: country, city, user agent, browser, HTTP status code...
Real-time statistics should be provided within 3 minutes.
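As a rough sanity check on the 4,200 figure (the ~8-second segment duration is our assumption; the slides do not state it):

    8 video tracks + 1 audio track = 9 tracks requested in parallel
    3600 s / ~8 s per DASH segment ≈ 450 segment requests per track
    9 × 450 ≈ 4,050 segment requests, plus manifest refreshes ≈ 4,200 log events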
Video Analytics Challenge: Difficulties & Challenges
TRANSPORT: safely transport the data in real time from the different PoPs to the data cluster.
OPERATION: make life easy for the operations team.
STORE DATA: store the data safely over a long period; compute the metrics in real time; consolidate in batch.
COMPUTE DATA IN REAL TIME: compute the analytics metrics in real time.
Video Analytics Story
2012 - 2013, home-made open source: Arkena Analytics is built and developed in house; a major production problem causes significant downtime.
2014, POC: analysis of the market (make or buy); launch of the project with the partners and building of the team (1 project manager, 1 developer, 1 system engineer).
2015, V1: release of the analytics platform to the operations team, and opening of the service to customers.
Video Analytics Challenge
Hadoop Technology
TRANSPORT
Flume
• Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of streaming data into the HDP cluster.
• Flume is already integrated in HDP: YARN coordinates data ingest from Apache Flume and other services that deliver raw data into an HDP cluster.
Rsyslog
• Rsyslog is the rocket-fast system for log processing. It offers high performance, strong security features and a modular design to transport data from our edge servers.
• We use RELP (the Reliable Event Logging Protocol) to provide reliable delivery of event messages.
Safe transport
STORE DATA
Hortonworks Data Platform
Shared data set
• In-house solution: we couldn't query the whole data set.
• HDP: a single entry point through HDFS; we can query and cross-correlate everything from (almost) the beginning of time.
Opportunities
• In-house solution: rigid; a nightmare for the operational teams.
• HDP: gives us new opportunities (machine learning, new metrics, ...).
Stability & Trust
• In-house solution: add clusters to scale out (we had 3!).
• HDP: add nodes to scale out (storage + compute).
OPERATION
Reliability & Scalability
YARN
• View your cluster as a single data operating system
• Run multiple jobs on multiple processing engines
• High availability with a standby ResourceManager
• Easy scale-out by adding more YARN NodeManagers
Queue Management
• Make sure business-critical jobs never lack resources
• Separate operations tasks from business tasks
• Validate new job versions with no production impact (a minimal configuration sketch follows)
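To illustrate this kind of queue separation, a minimal capacity-scheduler.xml sketch; the queue names and percentages are hypothetical, not Arkena's actual configuration:

    <property>
      <name>yarn.scheduler.capacity.root.queues</name>
      <value>business,operations</value>
    </property>
    <!-- business-critical jobs are guaranteed the bulk of the cluster -->
    <property>
      <name>yarn.scheduler.capacity.root.business.capacity</name>
      <value>70</value>
    </property>
    <!-- operations and job-validation work share the rest -->
    <property>
      <name>yarn.scheduler.capacity.root.operations.capacity</name>
      <value>30</value>
    </property>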
OPERATION
Compute in real time: HDP stack, Spark Streaming
• HDP packages the most recent Hadoop software technology in a single stack (Spark, Hive, Tez, ...).
• Apache Spark is a fast, in-memory data processing engine; we process the data every 2 minutes.
• HDP's YARN-based architecture provides the foundation that enables Spark and other applications to share a common cluster and dataset while ensuring consistent levels of service and response.
• We use a Lambda architecture (see the sketch after this list):
  • Real-time processing: Spark Streaming.
  • Synchronize the data into HDFS.
  • Consolidate the data with Hive/Tez.
  • Ingest into Elasticsearch.
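A minimal PySpark sketch of the real-time leg of this Lambda architecture. The 2-minute batch interval comes from the slides; the socket feed, the log field layout and the hits/volume-per-country metric are illustrative assumptions, not Arkena's actual job:

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="cdn-log-metrics")
    ssc = StreamingContext(sc, batchDuration=120)  # one batch every 2 minutes

    # Hypothetical: access-log lines arrive on a local TCP socket fed by the transport layer
    lines = ssc.socketTextStream("localhost", 5141)

    def parse(line):
        # Assumed whitespace-separated format: country in field 3, bytes sent in field 9
        f = line.split()
        return (f[2], int(f[8]))

    # Per-batch metrics: hit count and delivered volume, keyed by country
    metrics = (lines.map(parse)
                    .mapValues(lambda b: (1, b))
                    .reduceByKey(lambda a, c: (a[0] + c[0], a[1] + c[1])))

    metrics.pprint()  # the production job would index each batch into Elasticsearch instead

    ssc.start()
    ssc.awaitTermination()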
OPERATION
Reduce day-to-day operational cost
Easy to use for the long run
• Easy setup and installation
• Machine provisioning and capacity planning
• Easier provisioning and faster cluster deployment
Ambari
• Expand clusters automatically as new nodes come online
• Track cluster health, job progress and KPIs with alerts, customizable views, customizable dashboards...
• A REST API makes deployment & configuration easy to automate with modern configuration management tools (Ansible); an example follows
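For instance, cluster state can be queried straight from Ambari's REST API; a sketch with curl, where the host, credentials and cluster name are placeholders:

    # List the services of cluster "arkena" and their current state
    curl -u admin:admin -H 'X-Requested-By: ambari' \
      'http://ambari-host:8080/api/v1/clusters/arkena/services?fields=ServiceInfo/state'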
ARKENA CDN: Analytics Cluster
The pipeline chains four stages: Transport, HDP Compute, Indexing, and the Customer Front-End.
ARKENA CDN: HDP Cluster
One cluster hosts multiple processing workloads: (1) transport, (2) live processing, (3) batch processing, (4) archiving, (5) operations.
ARKENA CDN: Hardware Cluster
Sized for a peak rate of 60K events/second, keeping raw logs for 3-9 months.
HDP compute cluster (8 machines):
• We chose the Dell R730; the configuration we set has 16 cores, 128 GB of RAM and 14 disks of 1 TB each.
• We tried to respect the Hadoop rule of thumb (1 disk -> 8 GB RAM -> 1 physical core) to optimize I/O performance, with 10 file channels per machine, and we kept 2 disks for the system (checked in the arithmetic below).
Elasticsearch cluster (5 machines):
• We chose 5 Dell M610 machines to have an odd number for redundancy and failover.
Cluster API: 6 VMs.
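Checking the R730 spec against that rule of thumb (our arithmetic, not from the slides):

    14 disks - 2 system disks = 12 data disks
    12 data disks -> at least 12 physical cores and 12 × 8 = 96 GB of RAM
    The 16-core / 128 GB configuration satisfies both, with headroom for the OS and the 10 Flume file channels.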
ARKENA CDN: Transport
From edge to cluster
• Rsyslog transports the logs from the edge to the log aggregator component.
• Rsyslog features we rely on: the RELP protocol, native availability on Linux, and disk-assisted queue buffering.
Ingest into HDFS
• Apache Flume is used to fetch the logs from rsyslog and push them to HDFS.
Transport technology
• It's not just how quickly you move the data, but how safely you move it from the edge to the cluster without losing any lines.
• To get a resilient solution we combined two pieces of software, as the sketches below show.
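A minimal rsyslog sketch of the edge side, assuming the PoP aggregator listens for RELP on port 2514; the hostname, port and queue sizes are illustrative, not Arkena's actual values:

    # /etc/rsyslog.d/cdn-edge.conf: forward access logs to the PoP log aggregator
    module(load="omrelp")
    action(type="omrelp" target="log-aggregator.popx.example" port="2514"
           queue.type="LinkedList"        # in-memory queue, spilled to disk when needed
           queue.filename="edge_fwd"      # giving a filename makes the queue disk-assisted
           queue.maxdiskspace="1g"
           action.resumeRetryCount="-1")  # retry forever rather than drop events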
ARKENA CDN: Transport
The log aggregator (with rsyslog)
• The log aggregator is responsible for reliably forwarding the logs to the compute cluster.
• If the compute cluster is unavailable, or there is a network issue, the logs are spooled on disk and stay on the aggregators until the compute cluster comes back online.
• Logs are sent from the edge servers to the log aggregators; there is one aggregator per PoP.
• Log aggregators are not specific to any PoP: we can reproduce this setup on any PoP, designated here as "PoPx" or "PoPy", just by deploying generic log aggregators (a configuration sketch follows).
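A matching sketch for the aggregator side, assuming it relays to the compute cluster over RELP as well; hostnames and sizes are again placeholders:

    # /etc/rsyslog.d/cdn-aggregator.conf: receive from edges, spool, relay to the cluster
    module(load="imrelp")
    input(type="imrelp" port="2514")
    action(type="omrelp" target="hdp-ingest.example" port="2514"
           queue.type="LinkedList"
           queue.filename="agg_spool"     # events spool here while the cluster is unreachable
           queue.maxdiskspace="50g"
           queue.saveonshutdown="on"
           action.resumeRetryCount="-1")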
ARKENA CDN: HDFS
Ingest into HDFS
• The logs are ingested into HDFS once the local rsyslog on each Hadoop node receives an event.
• Apache Flume is used to fetch the logs from rsyslog and push them to HDFS.
• The local rsyslog forwards each event to the local Flume agent (TCP connection to `localhost`); a one-line rule is enough, as shown below.
• The Flume agent then sends the logs on to HDFS, buffering them on disk for durability.
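The localhost hand-off needs only a classic rsyslog forwarding rule on each Hadoop node (the port is an assumption and must match the Flume source):

    # Forward every event to the local Flume syslog source over TCP ("@@" means TCP)
    *.* @@localhost:5140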
ARKENA CDN: HDFS
Ingest into HDFS
• A syslog source, listening on a TCP socket, receives the incoming rsyslog events.
• A "FileChannel" takes the events from the rsyslog TCP source and writes them locally to 10 different "dataDirs" on 10 separate physical hard disk drives.
• Each dataDir acts as a FIFO; load is balanced evenly from the single rsyslog TCP source across the 10 dataDirs.
• The "FileChannel" is plugged into 4 "HDFS sinks". When enough events have been buffered in the channel, they are sent to the 4 HDFS sinks in an evenly balanced fashion (a trimmed agent definition follows).
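A sketch of such a Flume agent definition, trimmed to 2 dataDirs and 2 sinks for readability (the production setup uses 10 and 4); the agent name and paths are illustrative:

    # Flume agent "a1": syslog TCP source -> file channel -> HDFS sinks
    a1.sources  = syslog
    a1.channels = fc
    a1.sinks    = hdfs1 hdfs2

    a1.sources.syslog.type = syslogtcp
    a1.sources.syslog.port = 5140
    a1.sources.syslog.channels = fc

    a1.channels.fc.type = file
    a1.channels.fc.checkpointDir = /data/0/flume-checkpoint
    # one dataDir per physical disk spreads the write load
    a1.channels.fc.dataDirs = /data/1/flume,/data/2/flume

    # both sinks drain the same channel, balancing events between them
    a1.sinks.hdfs1.type = hdfs
    a1.sinks.hdfs1.channel = fc
    a1.sinks.hdfs1.hdfs.path = /cdn/logs/%Y-%m-%d
    a1.sinks.hdfs1.hdfs.useLocalTimeStamp = true
    a1.sinks.hdfs2.type = hdfs
    a1.sinks.hdfs2.channel = fc
    a1.sinks.hdfs2.hdfs.path = /cdn/logs/%Y-%m-%d
    a1.sinks.hdfs2.hdfs.useLocalTimeStamp = true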
ARKENA CDN: Customer Front-End
ARKENA CDN: Spark Streaming
[Diagram: events enter the near-real-time Spark Streaming path (1) and are also stored for the batch path (2, 3) of the Lambda architecture.]
Arkena chose Hadoop (Hortonworks)
Why we selected Hortonworks
Avoid vendor lock-in
• The Hortonworks Data Platform stays as close to the open source trunk as possible and is developed 100% in the open, so you are never locked in.
• It presents a single, tested and completely open Hadoop platform with no proprietary bolt-ons.
Transparency
• Pricing model & unlimited support throughout our projects.
“Hortonworks loves and lives open source innovation”, and Arkena does as well!
Why we selected Hortonworks
Connect with the community
• Hortonworks employs a large number of Apache project committers & innovators, so that customers are represented in the open source community.
• Only Hortonworks can deliver the deepest level of support across all the components of the Hadoop platform.
Support from the experts
• They provide the highest quality of support for deploying at scale.
What Happened after the Release
We identified some improvement items after the production release, in two areas: transport and operations.
About the Team
Reda Benzair
Project roles: architect & project management.
Work experience: Executive MBA; graduate of an engineering school, with a Master of Advanced Studies (DEA). 15 years of experience at SmartJog SAS (which became Arkena in 2013), a TDF subsidiary. Since 2013, VP Technical Development, leading technical development teams located in Paris, Stockholm and Warsaw.
Erwan Queffelec
Project roles: senior software engineer, Spark, system.
Work experience: a passionate programmer with a strong interest in devops and software craftsmanship, Erwan has worked on complex distributed architectures for the last 10 years. He joined Arkena as a general-purpose analytics engineer, worked on the Hadoop data processing pipeline, and developed a decent chunk of it.
Julien Girardin
Project roles: senior system administrator and Python developer.
Work experience: passionate about Linux systems, with a strong interest in devops and Python development. Strong experience with complex distributed architectures.
About Hortonworks
• The ONLY 100% open source Apache Hadoop data platform
• Founded in 2011
• 1st Hadoop provider to go public: IPO 4Q14 (NASDAQ: HDP)
• 800+ subscription customers
• ~850 employees across 16 countries
• 1,600+ technology partners
• Fastest enterprise software company to reach $100 million in annual revenue (Barclays research, 2015)
[Slide: a mosaic of Hadoop use cases]
Social Mapping, Payment Tracking, Factory Yields, Defect Detection, Call Analysis, Machine Data, Product Design, M&A Due Diligence, Next Product Recs, Cyber Security, Risk Modeling, Ad Placement, Proactive Repair, Disaster Mitigation, Investment Planning, Inventory Predictions, Customer Support, Sentiment Analysis, Supply Chain, Basket Analysis, Segments, Cross-Sell, Customer Retention, Vendor Scorecards, Optimize Inventories, OPEX Reduction, Mainframe Offloads, Historical Records, Data as a Service, Public Data Capture, Fraud Prevention, Device Data Ingest, Rapid Reporting, Digital Protection.
Hadoop Summit 2016 - Dublin
Date: Wednesday 13 – Thursday 14 April, 2016
Venue: Convention Centre Dublin
Website: www.hadoopsummit.org
Why Should You Attend?
• Hadoop Summit is Europe's premier industry event for Apache Hadoop users, developers and vendors
• Two full days of practical and cutting-edge education designed by the community, for the community
• Over 90 sessions spanning 7 tracks dedicated to enabling the next-generation data platform
• A Community Showcase featuring the industry's who's who
• Crash courses for those just beginning with Hadoop
• Community-driven meetups
• Birds of a Feather (BoF) meetings to promote collaboration
• Comprehensive pre-event hands-on classroom training
• A social program which provides ample opportunity to network and make new industry connections
• An amazing event party at the Guinness Storehouse Brewery
Plus much, much more!
Register Now to take advantage of our Early Bird rates!
Questions & Next Steps
Attend our next webinar
Download the sandbox
Try Hortonworks Data Flow
Thank you for your attention!

Editor's Notes

  • #2 TALK TRACK Good morning. I’m Justin Sears and I run Industry Marketing at Hortonworks. I’m excited to be speaking with you today about how Hortonworks is powering the future of data. [NEXT SLIDE]
  • #3 TALK TRACK Here are just a few of the modern data apps that convert yesterday's impossible challenges into today's new products, cures, conveniences and life-saving innovations. These apps are either custom-built by our customers or they come off the shelf, created by Hortonworks or one of our ecosystem partners to solve a particular problem. Symantec and other cyber security leaders have built powerful apps to detect threats to digital information. Leading pharma, automotive, consumer electronics and packaged goods companies are building their factories of the future that use actionable intelligence to improve manufacturing yields. And age-old industries like automotive, agriculture and retail are taking connected data platforms on the road, through the field or to the cash register to do things that have never before been possible. [NEXT SLIDE]
  • #11 Hello, ladies and gentlemen. Let me introduce myself: Reda Benzair, VP of Technology at Arkena, a media services company of the TDF group, present in 9 countries, with 1,500 customers in media & telecoms. We provide content management solutions (exchange, storage, ...), linear and on-demand distribution, an OTT platform in the cloud, and a CDN for video distribution.
  • #12 Here are a few references from customers who trust us with media management and distribution, including examples in sports such as beIN for their CDN and OTT.
  • #15 We offer our media customers a CDN (caching) platform with a strong presence in France and Europe, transmux and origin services for distribution, and monetization of audio broadcasts. We also provide a real-time statistics service for our customers.
  • #18 The challenge for a CDN is to provide an infrastructure capable of supporting and processing...
  • #21 It is important for a broadcast company to have an efficient and robust statistics solution, for billing reasons: it lets us help our customers properly monetize their internet distribution. A stable platform builds a relationship of trust with our customers; it is the element the customer looks at.
  • #22 To give you an idea of the volumes and of the constraints we have to manage with the data flow generated by our distribution platforms.
  • #31 Like any IT company, we decided to build our own in-house analytics solution based on open source technologies. At the end of 2013, following an operational problem and difficulties in making it evolve...
  • #45 A quick view of our architecture running in production today. With the latest generation of machines we were able to deliver an HDP cluster with only 8 machines, which remains quite reasonable (price / performance): R730 (16 cores, 128 GB RAM, 14 disks of 1 TB) and M610 (12 cores, 48 GB RAM).
  • #57 The Hortonworks stack is completely open and open source, with no dependency or constraint, which gives us the freedom to change vendor if the need arises.
  • #64 We are pleased to have completed our first year as a publicly traded company, and 2015 marked several milestones for us. Customers: we more than doubled our customer base in 2015 and now have over 800 support subscription customers; we believe this traction is indicative of the unique value proposition we bring to the market in the form of 100% open source, our standing within the Apache community, and multi-product offerings that address both Data in Motion and Data at Rest. Scale: we became the fastest enterprise software company to reach $100 million in annual revenue (according to Barclays research); in fact, we once again experienced triple-digit annual revenue growth, coming in at $122 million for 2015. Employees: we hired the best and brightest people, exiting the year with about 850 employees; from an engineering perspective, our efforts are applied across well over 200 Apache committer seats, which allowed us to accelerate innovation through our teams and the community. Expand Market Opportunity: with the acquisition of Onyara last year, we expanded our Big Data and Analytics market focus to also target adjacent opportunities within the Internet of Things.