
Infinitely Scalable Clusters - Grid Computing on Public Cloud - London


Slides from our recent workshop for hedge funds, reviewing the cloud grid computing options. The workshop included live demos tackling 2TB of full depth market data using MATLAB on AWS, and Google BigQuery with Datalab.


  1. 変通 [hen-tsoo], noun
     1. Resourcefulness – the quality of being able to cope with a difficult situation
     2. Adaptability – the ability to change (or be changed) to fit changed circumstances
     3. Agility – the power of moving quickly and easily; nimbleness
     INFINITELY SCALABLE CLUSTERS
     Grid computing on public cloud
  2. WELCOME TO HENTSŪ
  3. AGENDA
     • Grid computing overview
     • Trusted tools moving into public cloud
     • Alternative cloud services
  4. SOME BACKGROUND
  5. TERMINOLOGY
     • Public Cloud (AWS, Azure, Google)
     • Private Cloud (Your datacentre)
     • High Performance Computing (HPC)
     • Grid computing
     • Compute cluster
     • MathWorks MATLAB
     • CPUs / Processors / Cores
     • RAM (processor storage)
     • Disk (physical storage)
     • IaaS (virtual hardware and networking)
     • PaaS (software services)
  6. WHAT IS PUBLIC CLOUD?
     “A service provider makes resources, such as virtual machines, applications and storage, available to the general public.”
     • Utility model
     • No contracts
     • Shared hardware / multi tenant
     • Self managed
  7. WHAT IS GRID COMPUTING?
     Traditional resource limitations:
     • Data store performance
     • PC Processor / Memory / Storage
     • Network bandwidth
     The researcher may wait a long time for results.
     • Grid computing moves the computational work from the PC to a cluster of servers
     • The cluster processes the data on behalf of the researcher and returns the results
     • Processing time is reduced
     • Larger datasets can be tackled
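The pattern described on this slide (split the work, hand it to a pool of workers, collect the results) can be sketched in a few lines of Python. This is only an illustration: a local thread pool stands in for the cluster of servers, and `analyse_chunk`, `split` and `run_on_grid` are hypothetical names that do not come from the slides.

```python
from concurrent.futures import ThreadPoolExecutor


def analyse_chunk(chunk):
    """Hypothetical per-task work: sum of squares of one slice of the data."""
    return sum(x * x for x in chunk)


def split(data, n_chunks):
    """Split data into at most n_chunks contiguous slices."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def run_on_grid(data, n_workers=4):
    """Farm the chunks out to a pool of workers and combine the partial results.

    Assumption: a local thread pool stands in for the remote cluster.
    """
    chunks = split(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(analyse_chunk, chunks))
    return sum(partials)


data = list(range(1000))
total = run_on_grid(data)
print(total)
```

On a real grid the pool would be replaced by a job scheduler, but the researcher-facing shape stays the same: submit chunks, wait, combine.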
  8. KEY CONCEPTS
     The challenges:
     • Big Data
     • High Throughput Computing
     • MapReduce
     • High Performance Computing
     [Diagram: axes are "Number of tasks" and "Size of data"]
     The workflows: Ingest, Process, Analyse, Visualise, Store
  9. CHOICE OF TOOLS AND PLATFORMS
  10. TRUSTED TOOLS & PUBLIC CLOUD
  11. HARDWARE INFLEXIBILITY
      • Buy 22-core processors at 2.2GHz or 6-core processors at 3.6GHz?
      • Buy 8GB, 16GB or 32GB memory modules (RAM per core ratio)?
      • Graphics Processing Units (GPUs)?
      • How much local storage per server?
      • What network devices between servers (32 or 48 port switches)?
      • What size file server?
      Grid usage varies depending on research priorities.
      [Chart: jobs per day (0 to 120), Monday to Sunday]
  12. PROFILING MATLAB RESOURCE USAGE
      • MATLAB uses one processor core at a time (50% on a 2 vCPU machine). Use the Parallel Computing Toolbox for multicore PCs.
      • MATLAB stores all data in RAM, so there is very little I/O while processing
      • I/O spike when writing out results
      [Screenshot: SysInternals Process Explorer]
  13. MATLAB GRID WITH PUBLIC CLOUD
      - Pay only for what you use
      - Scale compute resource up AND down
      - Minimal capital outlay on hardware
      - Experiment with grid computing platforms quickly, cheaply and with no commitment
  14. A DAY IN A PUBLIC CLOUD CLUSTER
      [Chart: workers and tasks in queue (0 to 180) against time of day (00:30 to 23:10)]
      - Cluster consisting of 32 nodes with 4 cores each
      - Max 128 workers
      - Ramps up as jobs get submitted
      - Tears down nodes when jobs have finished
      - Minimises costs when not in use
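The ramp-up and tear-down behaviour on this slide can be illustrated with a small scaling rule. The node and core counts come from the slide; the rule itself (one task per core, scale to demand) and the function names are assumptions made for the sketch, not the scheduler used in the workshop.

```python
import math

CORES_PER_NODE = 4  # from the slide: 4 cores per node
MAX_NODES = 32      # from the slide: 32 nodes, so at most 128 workers


def nodes_needed(running_tasks, queued_tasks):
    """Nodes required to serve current demand, assuming one task per core."""
    demand = running_tasks + queued_tasks
    return min(MAX_NODES, math.ceil(demand / CORES_PER_NODE))


def scaling_action(current_nodes, running_tasks, queued_tasks):
    """Positive result: nodes to start. Negative result: nodes to tear down."""
    return nodes_needed(running_tasks, queued_tasks) - current_nodes


# A burst of 500 queued tasks hits an idle cluster: start up to the cap.
print(scaling_action(0, 0, 500))
# All jobs have finished: tear everything down so nothing is billed.
print(scaling_action(32, 0, 0))
```

A production rule would usually add a cool-down period so that nodes are not torn down and restarted between closely spaced jobs.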
  15. IDEAL CLUSTER SIZE?
      [Chart: job run time in seconds (0 to 1400) against number of cores (8 to 224)]
      Ingest, Process, Analyse, Visualise, Store
      Optimise other parts of the workflow?
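The measured run times from the chart are not recoverable from this transcript, so the figures below are purely hypothetical. One common way to reason about the question this slide asks is Amdahl's law (not named in the slides): the part of the job that cannot be parallelised sets a floor on run time, which is why optimising other workflow stages can matter more than adding cores.

```python
def run_time(serial_s, parallel_s, cores):
    """Amdahl-style estimate: the serial part is fixed, the parallel part divides by cores."""
    return serial_s + parallel_s / cores


CORE_COUNTS = [8, 16, 32, 64, 96, 128, 160, 192, 224]  # x-axis values from the slide
SERIAL_S = 60.0      # hypothetical: ingest/store time that does not parallelise
PARALLEL_S = 9600.0  # hypothetical: total processing time on a single core

estimates = {cores: run_time(SERIAL_S, PARALLEL_S, cores) for cores in CORE_COUNTS}

for cores, seconds in estimates.items():
    print(f"{cores:>3} cores: {seconds:7.1f} s")
```

With these made-up inputs, each doubling of cores saves less time than the previous one, and the estimate never drops below the 60-second serial portion.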
  16. RUNNING MATLAB CLUSTER ON IAAS
      AWS vCPUs are hyper-threads: each vCPU is a hyper-thread of an Intel Xeon core for 2nd generation instance types (M4, M3, C4, C3, R3, HS1, G2, I2, and D2)
      https://aws.amazon.com/ec2/instance-types/
      Azure does not overcommit memory or cores. vCPUs are physical cores. Azure does not use hyper-threading.
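As a rough illustration of why this distinction matters when sizing a cluster, the sketch below converts a vCPU count into physical cores using only the statements on this slide (which reflect the providers at the time of the talk). The mapping and the function name are hypothetical helpers; two hardware threads per core is how Intel Hyper-Threading works.

```python
# Hardware threads exposed per physical core, as described on the slide.
THREADS_PER_CORE = {
    "aws": 2,    # each vCPU is one hyper-thread of an Intel Xeon core
    "azure": 1,  # each vCPU is a physical core
}


def physical_cores(provider, vcpus):
    """Physical cores behind a given vCPU count, per the slide's description."""
    return vcpus // THREADS_PER_CORE[provider]


print(physical_cores("aws", 4))
print(physical_cores("azure", 4))
```

If workers are sized by physical core, a 4 vCPU instance would host two workers under the AWS description and four under the Azure one.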
  17. GRID DEPLOYMENT OPTIONS
      1. Infrastructure as a Service (IaaS) DIY: spin up a compute cluster on VMs for additional capacity and new workloads
      2. Burst: use the existing on-premises compute cluster and burst to cloud as required
      3. Software as a Service (SaaS): software vendors and Managed Service Providers provide their own SaaS solutions. Pay for compute and application software per hour
      4. Platform as a Service (PaaS): cloud providers’ data analytics platforms as a service, such as Google BigQuery & Datalab, Microsoft HDInsight, Amazon EMR
  18. CLOUD HOSTED DATA AND ANALYTICS AS A SERVICE
  19. GOOGLE BIG DATA REFERENCE ARCHITECTURE
  20. WHAT IS BIGQUERY?
      A “service that enables interactive analysis of massively large datasets”. It shares two core ideas with Hadoop, although BigQuery itself is built on Google’s Dremel technology rather than on Hadoop:
      • Distributed File System – stores data that’s larger than can fit on a single machine
      • MapReduce – distributes processing across multiple systems
      http://blogs.forrester.com/mike_gualtieri/13-06-07-what_is_hadoop
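The MapReduce idea in the second bullet can be shown in miniature. The sketch below runs in a single process and uses made-up trade records, so it only illustrates the map, shuffle and reduce steps; it is not how BigQuery or Hadoop is implemented, and all names and data are hypothetical.

```python
from collections import defaultdict

# Hypothetical data: each chunk would live on a different machine.
chunks = [
    [("AAA", 100), ("BBB", 50)],
    [("AAA", 250), ("BBB", 25), ("CCC", 10)],
]


def map_trades(chunk):
    """Map: emit (symbol, volume) pairs for one chunk, independently of other chunks."""
    return [(symbol, volume) for symbol, volume in chunk]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_volumes(grouped):
    """Reduce: collapse each key's values to a single total."""
    return {key: sum(values) for key, values in grouped.items()}


mapped = [pair for chunk in chunks for pair in map_trades(chunk)]
totals = reduce_volumes(shuffle(mapped))
print(totals)
```

Because each map call touches only its own chunk, the map step can run on as many machines as there are chunks, which is the property the bullet refers to.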
  21. GOOGLE BIGQUERY AND DATALAB DEMO
  22. DON’T FORGET SECURITY
      Security considerations:
      • Secure transfer and storage of data and code
      • Secure remote access to cloud hosted environment
      • Secure authentication
        • Windows AD credentials
        • AWS IAM credentials
        • Google accounts
        • Microsoft accounts
      • Auditing (who accessed what, who changed what)
  23. SUMMARY
      • Traditional grid and HPC tools can benefit from moving into cloud
      • Vast landscape of available tools
      • Off-the-shelf PaaS offerings
      • Integrations and ecosystems
      • Cheap and very quick to experiment
  24. MORE INFORMATION?
      Hentsu Ltd
      1 Fore Street
      London EC2Y 9DT
      hello@hentsu.com
      https://hentsu.com
  25. NEXT EVENT: JANUARY 2017
      • Intellectual Property (IP) security for Public Cloud Services
      • Securing mobile email and cloud based file storage
