OVH Analytics Data Compute with Apache Spark as a Service (Meetup OVH Bordeaux)

90% of the data in the world today has been created in the last two years. The world will be creating 163 zettabytes of data a year by 2025. So how do we want to process this volume of data?

Apache Spark is an open-source, distributed, general-purpose cluster computing framework that is trending today. But how do you create a computing cluster quickly and efficiently? Do you have to do all the network configuration and cluster management yourself? What should you do with the cluster when you no longer need it? Is the cluster secure?

After an introduction to Apache Spark principles and use cases, you will discover OVH Analytics Data Compute: a fast, secure, and efficient Spark Cluster as a Service that answers all of these questions.

Published in: Data & Analytics

  1. OVH Analytics Data Compute. April 2019. Apache Spark Cluster as a Service. Mojtaba Imani, DevOps Cloud and Bigdata @OVH
  2. Deep Learning, IoT Platform, Artificial Intelligence, Smart cities, Edge Computing, Autonomous Vehicles, Blockchains, Biochips, Smart robots & drones. By 2025, data will be 10 times the total datasphere produced till 2016 - IDC Survey (2017) « Data Age 2025 ». The Cloud Transformation: A data odyssey… “Exponential growth of data, its costs & governance: IS YOUR BUSINESS READY FOR IT?”
  3. 90% of all the data in the world has been created in the last 2 years. Autonomous cars generate 20 TB of data per hour.
  4. Computing Cluster. Source: https://wiki.rc.hms.harvard.edu/display/O2/O2+HPC+Cluster+and+Computing+Nodes+Hardware
  5. Apache Spark • An open-source, distributed, general-purpose cluster-computing framework • The largest open-source community in big data • Up to 100 times faster than Hadoop MapReduce (in-memory processing and lazy evaluation) • The leading platform for large-scale SQL, batch processing, stream processing, and machine learning • Easy-to-use API, with coding in Java, Scala, Python, R, and SQL
  6. Apache Spark Cluster. Source: http://www.buyukveri.co/en/apache-spark-architecture/
  7. Computing Cluster. Source: https://wiki.rc.hms.harvard.edu/display/O2/O2+HPC+Cluster+and+Computing+Nodes+Hardware
  8. Problem? • Time and money • Computer and network skills and knowledge • Maintenance • Spark version • Scale up • Idle times • Cloud data connection
  9.
  10. Computing Cluster
  11. Cloud Virtual Computing Cluster
  12. Problem? • Time and money (installation and configuration) • Computer and network skills and knowledge • Maintenance • Spark version • Scale up • Idle times • Cloud data connection
  13. Problem? • Time and money (installation and configuration) • Computer and network skills and knowledge • Maintenance • Spark version • Scale up • Idle times • Cloud data connection. Answer: ovh-spark-submit
  14. OVH Analytics Data Compute
  15. OVH Analytics Data Compute
  16. What is the difference from the original spark-submit command?
  17. OVH Analytics Data Compute • Same command line and options as the original Spark command line • Select the Spark version (--version 2.4.0) • The cluster is only accessible through HTTPS • A new, dedicated cluster is created for each request and deleted once the job finishes • You have the option to keep your cluster after the job finishes (--keep-infra) • Your cluster is isolated from the internet • Your cluster machines are created in your own OpenStack project • Results and output logs are saved in the Swift storage of your OpenStack project • Input and output data can come from any source and be in any format
  18. How to use? www.ovh.com - Espace Client (customer area)
  19. Download and use the command line: ovh-spark-submit. Full manual: https://docs.ovh.com/gb/en/analytics-data-compute/labs/data-compute/getting-started-with-analytics-data-compute/
  20. Customer Feedback • A French TV channel provider: Analytics Data Compute helps us define new jobs without impacting our production pipelines. • A French car manufacturer: With the huge amount of data we have, we needed a solution that could scale easily. Moreover, we did not need a full Hadoop stack. So Analytics Data Compute was the perfect candidate. • A French bank: We had trouble with Spark versions and Spark cluster management. With Analytics Data Compute, we no longer have to bother with those kinds of problems.
  21. Global cloud hyperscalers concentration: cloud providers' HQs dictate your data sovereignty options. Freedom Act, Cloud Act, GDPR
  22. OVH, a global hyper-scale cloud provider. Key facts & figures: 1,500,000+ customers* • 2,200+ employees worldwide • €1.5 billion of investment over 5 years • 98% of hosting rooms free from air conditioning • 28 datacenters • 350,000 servers • 5,000+ partners and communities (*May 2018)
  23. 28 Datacenters. Owned, operated and racked by OVH manufacturing units. From sheet metal to automated operations. Hillsboro x1, Vint Hill x1, Beauharnois x7, Singapore x1, Sydney x1, Warsaw x1, London x1, Roubaix x7, Gravelines x2, Strasbourg x4, Paris x1, Frankfurt x1
  24. We operate our own global network and server manufacturing. Keeping control of security, scalability & Quality of Service. P.U.E. 1.09 • 1M servers produced (Oct. 2018) • 33 Points of Presence • 260K public cloud instances • 16 Tbps total network capacity • 350K physical servers • Patented watercooling and manufacturing • Owned and operated data centers • Proprietary network & anti-DDoS security • « No ingress/egress charges on your network usage. »
  25. OVH Data Convergence Team • Analytics Data Platform: a one-click, pre-configured Hadoop stack designed to store and process high volumes of data across OVH Public Cloud infrastructure • Analytics Data Compute: a one-time Apache Spark cluster on the fly • Analytics Data Collector: a cloud-hosted agent to replicate, query and transport data
  26. Demo: Word Count of a big text file
  27. Java + Swift Word Count Sample
  28. Java + Amazon S3 Word Count Sample
  29. Analytics Data Compute: labs.ovh.com, docs.ovh.com
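
The transcript carries only the titles of the word-count demo slides (26 to 28), not their code. As a rough illustration of what such a job computes, here is a minimal sketch in plain Java that uses the standard Stream API instead of Spark. It is not the code from the talk: the class name WordCountSketch, the method countWords, and the sample lines are all hypothetical, and reading the input from Swift or Amazon S3 is left out entirely.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical illustration only: a single-machine word count with java.util.stream,
// not the Spark job shown in the talk.
public class WordCountSketch {

    // Splits each line into lowercase words and counts occurrences per word.
    // The shape loosely mirrors a typical Spark word count:
    // split lines into words (flatMap), then aggregate per word.
    public static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
                // Intermediate stream operations are lazy: nothing is
                // evaluated until the terminal collect() below is called.
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\W+")))
                // split() can yield an empty first token when a line starts
                // with a non-word character, so drop empty strings.
                .filter(word -> !word.isEmpty())
                // Group identical words and count them; TreeMap keeps the
                // result sorted by word for readable output.
                .collect(Collectors.groupingBy(
                        Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Made-up sample input standing in for the "big text file" of the demo.
        List<String> lines = List.of(
                "Spark makes cluster computing simple",
                "a cluster runs Spark jobs");
        countWords(lines).forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

A real Spark job would express the same split-then-aggregate steps on a distributed dataset, so that the work is spread across the cluster nodes rather than run in one JVM.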
