Programming the cloud with Skywriting
An introduction to cloud programming models and the Skywriting project. Talk originally given at Imperial College, London, on 13th May 2010.

More information about the Skywriting project can be found here: http://www.cl.cam.ac.uk/netos/skywriting/

Speaker notes

  • Thanks for the introduction, Eva. Well, as Eva said, my name’s Derek Murray, I’m a third-year PhD student at Cambridge, and today I’m going to talk about Skywriting, which is a little bit of work I’ve been doing with these guys: Malte, Chris, Anil and my supervisor Steve Hand. Skywriting is a system for large-scale distributed computation – in this respect it’s similar to things like Google MapReduce and Microsoft’s Dryad – so that’s systems where your data or compute need is so big that you have to use a cluster in parallel to get the job done. It was the success of these systems – in particular Hadoop, the open-source MapReduce – that motivated us to start this work. What I found interesting was that people were using these things in entirely unexpected ways… taking MapReduce, which is excellent for log-processing, and running some big iterative machine learning algorithm on it. We reckoned that people were using MapReduce not because of its programming model, but despite it. So we set out to build something that combines all the advantages of previous systems, with a very flexible programming model. The result was Skywriting, so let’s see what you think…
  • All the systems we’ll discuss today use the simple notion of task parallelism. Many algorithms can be divided into tasks, which are just chunks of sequential code. The key observation is that two independent tasks can run in parallel. And when your whole job divides into a fully independent bag of tasks, it’s said to be “embarrassingly parallel”.
  • And how do you run these embarrassingly parallel jobs? Well, you give your bag of tasks to a master, which doles them out on demand to a set of workers. This is a very simple architecture to program, and it has a lot of benefits. If one of the workers crashes, fine! The master will notice and give that worker’s current task to someone else. And if a worker is a bit slower than the others, that’s also fine! Each worker pulls a new task when it has completed the last one, so even a heterogeneous pool can do useful work. (A minimal sketch of this pull-based task farm follows these notes.)
  • Embarrassing parallelism is not very interesting: it only lets you do boring things like search for aliens and brute-force people’s passwords.
  • It gets much more interesting – i.e. commercially useful – when the tasks have dependencies between them. So here, we have two tasks A and B, and a relation that says A must run before B. The usual reason is that A writes some output, and B wants to read it. Think of these like makefile rules: you can build up graphs out of these dependencies, and resolve them in parallel. (A small dependency-resolution sketch follows these notes.) In fact, the original name for this project was “Cloud Make”. Fortunately it changed…
  • Are you all familiar with MapReduce? Introduced by Google in 2004, MapReduce used the observation that the map() function from functional programming can run in parallel over large lists. So they broke down their huge data into chunks, and ran each through a “map task”, generating some key-value pairs that are then sorted by key in the shuffle phase, and then the values for each key are folded in parallel using a “reduce task”. (A toy word-count sketch of this dataflow follows these notes.) This basically uses the same master-worker task farm that I showed on a previous slide, with the single constraint that all the map tasks must finish before the reduce tasks begin. It therefore had the benefit of working at huge scale, and being very reliable.
  • A couple of years later, Microsoft, which also has a search engine, released “Dryad”, which generalises MapReduce by allowing the user to specify a job as any directed acyclic graph. The graph has vertices – which are arbitrary sequential code in your favourite language – and channels, which could be files, in-memory FIFOs, TCP connections or whatever. Clearly you can implement MapReduce in Dryad, since it’s just a DAG. But Dryad makes things like joins much easier, because a task can have multiple inputs.
  • So far, we can run any finite directed acyclic graph using Dryad. As the name suggests, however, Dryad is not terribly good at cyclic data flows. These turn up all the time in fields like machine learning, scientific computing and information retrieval. Take PageRank, for example, which involves repeatedly premultiplying a vector by a large sparse matrix representing the web. You keep doing this until you reach a fixpoint, and the PageRank vector has converged. (A small fixpoint-iteration sketch follows these notes.) At present, all you can do is submit one job after another. This is bad for a number of reasons. First of all, it’s very slow: MapReduce and Dryad are designed for batch submission, and so starting an individual job takes on the order of 30 seconds. If your iteration is shorter than that, you’re losing out on parallel speedup. It also introduces a co-dependency between the client and the cluster. The client, which is just some simple program that submits jobs to the cluster, now has to stay running for the duration of the job, but since it’s outside the cluster, it gets none of the advantages of fault tolerance, data locality or fair scheduling. Since the client now contains critical job state, it’s necessary to add all these features manually.
  • Remember our master-worker architecture? Well, if you’ve ever tried to set up Hadoop or Dryad, you’ll know that you need to make sure all of the workers are the same: running the same operating system, on the same local network.
  • But what if all you have is a little ad-hoc cluster, with a Windows desktop, a Linux server and a Mac laptop?
  • Or, perhaps less contrived: what if your data are spread between different cloud providers? You might have some data in Amazon S3, some in Google’s App Engine, and some in Windows Azure. Our mantra is “put the computation near the data”, and it’s not practical to shift all the data to one place.
  • And what about this? Say you have a really important task to complete, but you don’t know how long it’ll take – maybe you’re using some kind of randomised algorithm. So you fire off three copies of the same task… and eventually one finishes. At this point, you can just kill the other two. Although MapReduce and Dryad have limited support for this, it’s not first-class: you can’t do it on demand, only in response to “straggler” nodes that take much longer to complete than others.
  • I’ve spent quite a lot of slides being rather coy about what’s to come, but if you’ve read the abstract, you’ll know that Skywriting is
  • …two things. First, instead of using DAGs to describe a job, we use the most powerful thing available to us: a Turing-complete coordination language. This sounds ominous and theoretical, but actually it’s just a programming language that looks a lot like JavaScript, with all the usual control flow structures, loops, ifs, functions and so on. Since we want to run things efficiently in parallel, it has support for spawning tasks, and a way to call external code. The other main component is the distributed execution engine, which actually executes Skywriting programs in the cluster. The interesting thing about this is that a “task” is just a Skywriting function – a continuation, to be more precise – which means that tasks can spawn other tasks, and thereby grow the job dynamically.
  • 1.0 – 1.2 GHz Xeon or Opteron. 1.7GB RAM, 150GB disk.
  • 50 x 50 on 50 workers. Input size is
  • Best score is 15x15 = 225 tasks, at 83 s (2.6x speedup).
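
The pull-based task farm described in the notes above can be sketched in a few lines of Python. This is an illustrative sketch only (the function and variable names are mine, not part of Skywriting, Hadoop or Dryad): a shared queue plays the role of the master's bag of tasks, and each worker pulls a new task whenever it finishes the previous one, so slow or heterogeneous workers simply end up taking fewer tasks.

    # Minimal pull-based task farm: the queue is the master's bag of tasks,
    # and each worker pulls a new task as soon as it finishes the last one.
    import queue
    import threading

    def run_task_farm(tasks, num_workers=4):
        bag = queue.Queue()
        for t in tasks:
            bag.put(t)
        results = []
        lock = threading.Lock()

        def worker():
            while True:
                try:
                    task = bag.get_nowait()   # pull a task on demand
                except queue.Empty:
                    return                    # bag is empty: this worker is done
                result = task()               # run the sequential chunk of work
                with lock:
                    results.append(result)

        workers = [threading.Thread(target=worker) for _ in range(num_workers)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        return results

    # Example: an "embarrassingly parallel" bag of independent tasks.
    if __name__ == "__main__":
        tasks = [(lambda i=i: i * i) for i in range(10)]
        print(sorted(run_task_farm(tasks)))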
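
The makefile-style "A runs before B" dependencies can be resolved with one rule: a task becomes runnable once everything it depends on has finished. A minimal sketch, again in plain Python with hypothetical names, assuming the dependency graph is known up front:

    # Resolve "A runs before B" dependencies: repeatedly run every task whose
    # dependencies have all completed; each wave of ready tasks is independent
    # and could run in parallel.
    def schedule(deps):
        """deps maps each task name to the set of tasks it must run after."""
        done = set()
        while len(done) < len(deps):
            ready = [t for t, before in deps.items()
                     if t not in done and before <= done]
            if not ready:
                raise ValueError("cycle detected: remaining tasks are all blocked")
            for t in ready:
                print("running", t)
                done.add(t)

    # A must run before B and C; D waits for both B and C (a small DAG).
    schedule({"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}})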
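
The map, shuffle and reduce phases can be illustrated with a toy single-process word count in Python. This shows only the dataflow, not Google's or Hadoop's actual API; note how every reduce depends on all of the maps, which is the single constraint mentioned in the notes.

    # The MapReduce dataflow in miniature: map each input chunk to key/value
    # pairs, shuffle (group) the pairs by key, then reduce each key's values.
    from collections import defaultdict

    def map_task(chunk):
        return [(word, 1) for word in chunk.split()]

    def reduce_task(key, values):
        return (key, sum(values))

    def mapreduce(chunks):
        # Map phase: in a real system each chunk runs through a map task on some worker.
        pairs = [kv for chunk in chunks for kv in map_task(chunk)]
        # Shuffle phase: group the intermediate key/value pairs by key.
        groups = defaultdict(list)
        for key, value in pairs:
            groups[key].append(value)
        # Reduce phase: may only start once every map task has finished.
        return [reduce_task(key, values) for key, values in groups.items()]

    print(mapreduce(["the cat sat", "the dog sat"]))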
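
The PageRank-style iteration, repeatedly premultiplying a vector by a matrix until a fixpoint, looks like the following loop. In MapReduce or Dryad each pass around this loop would be a separately submitted batch job, which is where the roughly 30-second per-job overhead hurts. The matrix here is a tiny made-up example, not web data.

    # PageRank-style fixpoint iteration: each step premultiplies the current
    # vector by a (tiny, dense) matrix; a real run would distribute each step.
    def iterate_to_fixpoint(matrix, vector, tolerance=1e-9):
        while True:
            new_vector = [sum(row[j] * vector[j] for j in range(len(vector)))
                          for row in matrix]
            if max(abs(a - b) for a, b in zip(new_vector, vector)) < tolerance:
                return new_vector          # converged: the fixpoint is reached
            vector = new_vector            # not converged: submit another "job"

    # A 2x2 column-stochastic example; the result approximates its dominant eigenvector.
    matrix = [[0.9, 0.5],
              [0.1, 0.5]]
    print(iterate_to_fixpoint(matrix, [0.5, 0.5]))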

Transcript

  • 1. Programming the cloud with Skywriting
    Derek Murray
    with Malte Schwarzkopf, Chris Smowton, Anil Madhavapeddy and Steve Hand
  • 2. Outline
    State of the art
    Skywriting by example
    Iterative algorithms
    Heterogeneous clusters
    Speculative execution
    Performance case studies
    Future directions
  • 3. Task farming
    (diagram: a bag of independent tasks)
  • 4. Task farming
    Master
    Worker
    Worker
    Worker
  • 5.
  • 6. Task farming
    A
    B
    runs before
  • 7. MapReduce
    Input
    Map
    Shuffle
    Reduce
    Output
  • 8.
  • 9. Dryad
  • 10.
  • 11.
  • 12. Problem: iterative algorithms
    Not converged
    Task
    Converged
  • 13. Problem: cluster heterogeneity
    Master
    Worker
    Worker
    Worker
  • 14. Problem: cluster heterogeneity
    Master
  • 15. Problem: cluster heterogeneity
    Master
  • 16. Problem: speculative execution
  • 17.
  • 18. Solution: Skywriting
    Turing-complete coordination language
    Support for spawning tasks
    Interface to external code
    Distributed execution engine
    Executes tasks in parallel on a cluster
    Handles failure, locality, data motion, etc.
  • 19. Spawning a Skywriting task
    function f(arg1, arg2) { … }
    result = spawn(f, [arg1, arg2]);
  • 20. Building a task graph
    function f(x, y) { … }
    function g(x, y) { … }
    function h(x, y) { … }
    a = spawn(f, [7, 8]);
    b = spawn(g, [a, 0]);
    c = spawn(g, [a, 1]);
    d = spawn(h, [b, c]);
    return d;
    (diagram: the resulting task graph, in which f produces a, two g tasks read a and produce b and c, and h combines them into d)
  • 21. Iterative algorithm
    current = …;
    do {
    prev = current;
    a = spawn(f, [prev, 0]);
    b = spawn(f, [prev, 1]);
    c = spawn(f, [prev, 2]);
    current = spawn(g, [a, b, c]);
    done = spawn(h, [current]);
    } while (!*done);
  • 22. Iterative algorithm
    (diagram: the task graph for one iteration; three f tasks feed g, h checks for convergence, and the next iteration’s f tasks follow)
  • 23. Aside: recursive algorithm
    function f(x) {
    if (/* x is small enough */) {
    return /* do something with x */;
    } else {
    x_lo = /* bottom half of x */;
    x_hi = /* top half of x */;
    return [spawn(f, [x_lo]),
    spawn(f, [x_hi])];
    }
    }
  • 24. Executing external code
    y = exec(executor_name, { "inputs" : [x1, x2, x3], … }, num_outputs);
    Run Java, C, .NET and pipe-based code
  • 25. Heterogeneous cluster support
  • 26. Workers advertise “execution facilities”
  • 27. Tasks migrate to necessary facilities
  • 28. Speculative execution
    x = …;
    a = spawn(f, [x]);
    b = spawn(f, [x]);
    c = spawn(f, [x]);
    result = waituntil(any, [a, b, c]);
    return result["available"];
  • 29. Performance case studies
    All experiments used Amazon EC2
    m1.small instances, running Ubuntu 8.10
    Microbenchmark
    Smith-Waterman
  • 30. Job creation overhead
  • 31. Smith-Waterman data flow
  • 32. Parallel Smith-Waterman
  • 33. Parallel Smith-Waterman
  • 34. Future work
    Distributed data structures
    Coping when the lists etc. get big
    Better language integration
    Compile to JVM, CLR, LLVM etc.
    Decentralised master-worker
    Run on multiple clouds
    Self-scaling clusters
    Add and remove workers as needed
  • 35. Conclusions
    Universal programming model for cloud computing
    Runs real jobs with low overhead
    Lots more still to do!
  • 36. Questions?
    Email: Derek.Murray@cl.cam.ac.uk
    Project website (source code, tutorial, etc.):
    http://www.cl.cam.ac.uk/research/srg/netos/skywriting/
    http://tinyurl.com/skywritingproj