OVH Spark Cluster as a Service Overview

1
OVH Analytics Data Compute
April 2019
Apache Spark Cluster as a Service
Mojtaba Imani
DevOps Cloud and Bigdata @OVH

2
Deep Learning
IoT Platform
Artificial Intelligence
Smart cities
Edge Computing
Autonomous Vehicles
Blockchains
Biochips
Smart robots & drones
BY 2025, data will be 10x the total datasphere produced up to 2016
- IDC Survey (2017) « Data Age 2025 »
The Cloud Transformation: A data odyssey…
“Exponential growth of data, its costs & governance:
IS YOUR BUSINESS READY FOR IT?”

3
90% of all the data in the world has been created in the last 2 years.
Autonomous cars generate 20TB of data per hour

4
Computing Cluster
https://wiki.rc.hms.harvard.edu/display/O2/O2+HPC+Cluster+and+Computing+Nodes+Hardware

5
Apache Spark
• An open-source distributed general-purpose cluster-computing
framework.
• The largest open source community in big data
• Up to 100 times faster than Hadoop MapReduce (in-memory
processing and lazy evaluation)
• The leading platform for large-scale SQL, batch processing, stream
processing, and machine learning
• Easy-to-use API, with coding in Java, Scala, Python, R, and SQL
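
Lazy evaluation means that declaring a transformation does no work until a result is actually requested. The sketch below is only an analogy in plain Java: it uses java.util.stream, not the Spark API, and the class and method names are made up for illustration. It counts how often the map step runs before and after the terminal operation.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Plain-Java analogy for lazy evaluation (NOT the Spark API).
public class LazyEvaluationSketch {

    // Returns {map calls before the terminal operation,
    //          map calls after it, number of elements kept}.
    static int[] demo(List<String> lines) {
        AtomicInteger mapCalls = new AtomicInteger();

        // Declaring the pipeline runs nothing yet, much like
        // declaring transformations on a Spark dataset.
        Stream<Integer> pipeline = lines.stream()
                .map(line -> {
                    mapCalls.incrementAndGet();
                    return line.length();
                })
                .filter(length -> length > 3);

        int before = mapCalls.get();

        // The terminal operation triggers the actual work,
        // comparable to a Spark action such as collect().
        List<Integer> kept = pipeline.collect(Collectors.toList());

        return new int[] {before, mapCalls.get(), kept.size()};
    }

    public static void main(String[] args) {
        int[] r = demo(List.of("spark", "is", "lazy"));
        System.out.println("map calls before: " + r[0]
                + ", after: " + r[1] + ", kept: " + r[2]);
    }
}
```

Because the whole pipeline is known before anything runs, an engine can plan the steps together instead of materializing every intermediate result.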

6
Apache Spark Cluster
source: http://www.buyukveri.co/en/apache-spark-architecture/

7
Computing Cluster
https://wiki.rc.hms.harvard.edu/display/O2/O2+HPC+Cluster+and+Computing+Nodes+Hardware

8
Problem?
• Time and money
• Computer and Network skills and knowledge
• Maintenance
• Spark version
• Scale up
• Idle times
• Cloud Data Connection

11
Cloud Virtual Computing Cluster

12
Problem?
• Time and money (installation and configuration)
• Maintenance
• Spark version
• Scale up
• Idle times

13
Problem?
• Time and money (installation and configuration)
• Maintenance
• Spark version
• Scale up
• Idle times
ovh-spark-submit

16
What is the difference from the original spark-submit command?

17
OVH Analytics Data Compute
• Same command line and options as the original Spark command line
• Select the Spark version (--version 2.4.0)
• The cluster is only accessible through HTTPS.
• A new, dedicated cluster is created for each request and deleted once the
job finishes.
• You have the option to keep your cluster after the job finishes (--keep-infra).
• Your cluster is isolated from the internet.
• The machines of your cluster are created in your own OpenStack project.
• Results and output logs are saved in the Swift storage of your OpenStack project.
• Input and output data can come from any source and be in any format
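
As a hedged illustration of how the two options named above might sit alongside ordinary spark-submit arguments, the sketch below assembles an argument list in Java. Only --version and --keep-infra come from this slide, and --class is a standard spark-submit option. The class name org.example.WordCount, the jar wordcount.jar, the argument input.txt and the ordering of the options are assumptions made for illustration. Other options the real tool may require are not shown.

```java
import java.util.ArrayList;
import java.util.List;

// Builds a hypothetical ovh-spark-submit invocation as an argument list.
public class OvhSparkSubmitCommand {

    static List<String> build(String sparkVersion, boolean keepInfra,
                              String mainClass, String jar,
                              List<String> appArgs) {
        List<String> cmd = new ArrayList<>();
        cmd.add("ovh-spark-submit");
        cmd.add("--version");          // option named on the slide
        cmd.add(sparkVersion);
        if (keepInfra) {
            cmd.add("--keep-infra");   // option named on the slide
        }
        cmd.add("--class");            // standard spark-submit option
        cmd.add(mainClass);
        cmd.add(jar);                  // application jar (hypothetical name)
        cmd.addAll(appArgs);           // arguments passed to the application
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = build("2.4.0", true, "org.example.WordCount",
                "wordcount.jar", List.of("input.txt"));
        System.out.println(String.join(" ", cmd));
    }
}
```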

18
How to use it? www.ovh.com - Espace Client (Customer Area)

19
Download and use the command-line tool: ovh-spark-submit
Full manual: https://docs.ovh.com/gb/en/analytics-data-compute/labs/data-compute/getting-started-with-analytics-data-compute/

20
Customer Feedback
A French TV channel provider:
Analytics Data Compute helps us define new jobs without impacting
our production pipelines.
A French car manufacturer:
With the huge amount of data we have, we needed a solution that
could scale easily. Moreover, we did not need a full Hadoop stack. So
Analytics Data Compute was the perfect candidate.
A French bank:
We had trouble with Spark versions and Spark cluster management. With
Analytics Data Compute, we no longer have to bother with those kinds of
problems.

21
Global cloud hyperscalers concentration
Cloud providers' HQs dictate your data sovereignty options
Freedom Act
Cloud Act
GDPR

22
OVH, A GLOBAL HYPER-SCALE CLOUD PROVIDER
KEY FACTS & FIGURES
1,500,000+
CUSTOMERS*
2200+
EMPLOYEES WORLDWIDE
€1.5 BILLION
OF INVESTMENT OVER 5 YEARS
98%
OF HOSTING ROOMS FREE FROM AIR
CONDITIONING
28 DATACENTERS
350,000 SERVERS
5000+
Partners and communities
*May 2018

23
28 Datacenters
Owned, operated and racked by OVH
manufacturing units.
From sheet metal to automated
operations.
Hillsboro x1
Vint Hill x1
Beauharnois x7
Singapore x1
Sydney x1
Warsaw x1
London x1
Roubaix x7
Gravelines x2
Strasbourg x4
Paris x1
Frankfurt x1

24
We operate our own global network and server manufacturing
Keeping control of security, scalability & Quality of Service
PUE: 1.09
1M servers produced (Oct. 2018)
33 Points of Presence
260K Public Cloud instances
16 Tbps total network capacity
350K physical servers
Patented water cooling and manufacturing
Owned and operated data centers
Proprietary network & anti-DDoS security
« No ingress/egress charges on your network usage. »

25
OVH Data Convergence Team
• Analytics Data Platform: A one-click pre-configured Hadoop stack designed to store and
process high volumes of data across OVH Public Cloud infrastructure.
• Analytics Data Compute: A one-time Apache Spark Cluster on the fly
• Analytics Data Collector: A cloud-hosted agent to replicate, query and transport data

26
Demo:
Word Count of a big text file
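
The demo code itself is not included in these slides. As a rough local stand-in, the sketch below shows the word-count logic in plain Java with no Spark dependency, so it is not the job run in the talk. The class and method names are hypothetical. The comments note which Spark Java operation each stage would roughly correspond to.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Local word count in plain Java (a stand-in, not the Spark demo code).
public class WordCountSketch {

    static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
                // Split each line into words (Spark: flatMap).
                .flatMap(line -> Arrays.stream(line.trim().split("\\s+")))
                // Drop the empty token produced by blank lines.
                .filter(word -> !word.isEmpty())
                // Group identical words and count them
                // (Spark: mapToPair followed by reduceByKey).
                .collect(Collectors.groupingBy(
                        Function.identity(),
                        TreeMap::new,
                        Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "to be or not to be",
                "that is the question");
        countWords(lines).forEach(
                (word, count) -> System.out.println(word + " " + count));
    }
}
```

On a cluster the same three stages run in parallel over partitions of the file, which is what makes the approach practical for a big text file.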

27
Java + Swift Word Count Sample

28
Java + Amazon S3 Word Count Sample

29
Analytics Data Compute:
labs.ovh.com
docs.ovh.com
