3 f6 9_distributed_systems
Presentation Transcript

  • DISTRIBUTED SYSTEMS
    A whirlwind tour
    Dean Sheehan, CTO & SVP Product Strategy
    dsheehan@iwavesoftware.com
  • Contents
    • Introduction
    • Programming Models
    • Remote Invocation
    • Messaging
    • Distributed Data
    • Peer 2 Peer
    • Distributed Hash Tables
    • Cloud Computing
    • Security
    • And…
  • Introduction
    • Who is Dean Sheehan? Ok, enough about me…
    • What are Distributed Systems?
    • “A collection of computing devices that communicate with each other by way of a network to achieve a common shared goal”
    • Why do this?
    • Why is this different from “A collection of threads in a computer that communicate with each other, by way of shared memory, to achieve a common goal”?
  • Programming Models
    • Writing systems that coordinate the actions of multiple computers involves writing programs that communicate with each other
    • There are two general models:
    • Implicit – hiding the distributed nature of the system from the programmer
    • Explicit – requiring the programmer to treat the communication as a first-class programming problem
  • Implicit Model
    Given that the ‘function’ ‘add’ exists with a signature like
        int add (int param1, int param2)
    and a segment of program ‘myprog’
        { int result = add(50,50); }
    then what if we could do some magic that caused the execution of ‘add’ by ‘myprog’ on computer ‘A’ to suddenly appear as an execution of ‘add’ within another instance of ‘myprog’ on computer ‘B’?
  • Remote Procedure Call
    • This magic is generally referred to as RPC
    • It forms the basis of
        • Sun RPC
        • DCE
        • DCE++
        • CORBA
        • DCOM
        • Web Services
        • Enterprise Service Buses
        • Service Oriented Architectures
  • Proxy
    • Take ‘myprog’
    • Run it through a tool for creating distributed communication, which turns the ‘add’ in myprog into
        int add (int param1, int param2) {
            // package up data including param1 & param2
            // send to other computer
            // wait for other computer to reply
            // unpack response
            return result;
        }
  • Stub
    • We have our real ‘add’ function
    • We have ‘myprog’ which has our real call on ‘add’
    • The tool that created the Proxy that stands in for ‘add’ also creates a Stub that stands in for ‘myprog’
        {
            // wait for message from another computer
            // unpack message
            int response = add(param1, param2);
            // package up response message
            // send response
        }
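  • Proxies, Stubs & Marshalling
    [Diagram: on Computer A, MyProg calls the Add Proxy, which marshals the request and sends it across the Network; on Computer B, the Add Stub unmarshals it and calls the Add Function. On the way back the Stub marshals the response and the Proxy unmarshals it.]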
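The Proxy, Stub and marshalling steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: JSON is an arbitrary choice of wire format, and `send_to_computer_b` is a hypothetical stand-in for the network hop that simply calls the stub in the same process.

```python
import json


def add(param1, param2):
    """The real 'add' function, living on computer B."""
    return param1 + param2


def add_stub(request_bytes):
    """Stub on computer B: unmarshal the request, call the real add, marshal the reply."""
    request = json.loads(request_bytes.decode("utf-8"))
    result = add(request["param1"], request["param2"])
    return json.dumps({"result": result}).encode("utf-8")


def send_to_computer_b(request_bytes):
    """Hypothetical network hop; a real system would send the bytes over a socket."""
    return add_stub(request_bytes)


def add_proxy(param1, param2):
    """Proxy on computer A: looks like add to the caller, but forwards the call."""
    request_bytes = json.dumps({"param1": param1, "param2": param2}).encode("utf-8")
    reply_bytes = send_to_computer_b(request_bytes)
    return json.loads(reply_bytes.decode("utf-8"))["result"]


# 'myprog' calls the proxy exactly as it would call the local function.
result = add_proxy(50, 50)
```

The caller never sees the bytes on the wire, which is the point of the implicit model.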
  • Procedures or Objects
    • Remote Procedure Call (RPC, DCE) just translates a call to a Procedure (Function) into a network hop and then a call of that function elsewhere
    • Object Oriented languages like Java and C++ enable groups of functions to be associated with lumps of data (Objects)
    • DCE++, CORBA, J2EE, DCOM etc. take RPC to the next level and maintain a notion of object identity in the ‘function’ call, so it isn’t just ‘add’, it is ‘add’ on object instance ‘22’
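A minimal sketch of what carrying object identity in the call might look like on the receiving side. The `Counter` class, the `objects` table and the `dispatch` function are all hypothetical names chosen for illustration; real object brokers are considerably more involved.

```python
class Counter:
    """A lump of data (total) with a function (add) attached to it."""

    def __init__(self, total=0):
        self.total = total

    def add(self, amount):
        self.total += amount
        return self.total


# Server-side table mapping object identities to live instances.
objects = {22: Counter(), 23: Counter(100)}


def dispatch(object_id, method_name, *args):
    """Route an incoming call to the named method on the identified object."""
    target = objects[object_id]
    return getattr(target, method_name)(*args)


# The same 'add' call has different effects depending on the object identity.
first = dispatch(22, "add", 5)
second = dispatch(22, "add", 5)
```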
  • Stateful & Stateless
    • Thinking of distributed objects takes us into a discussion of state
        • Data stored in memory or potentially on disks that aren’t universally accessible (not with the same access semantics anyway)
    • If a distributed system holds a dialog between two computers where ‘state’ is built up as part of that dialog then it is considered stateful
        • Consider a shopping cart on an e-commerce site
    • The implications are generally ones of scalability and availability
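The shopping cart example can be sketched both ways. The function names and the session identifier below are hypothetical; the sketch only aims to show where the state lives in each style.

```python
# Stateful: the server keeps each cart in its own memory, keyed by session.
server_carts = {}


def stateful_add_item(session_id, item):
    """Add an item to the cart held on this server; returns the cart size."""
    cart = server_carts.setdefault(session_id, [])
    cart.append(item)
    return len(cart)


def stateless_add_item(cart, item):
    """Stateless: the caller sends the whole cart and gets a new cart back."""
    return cart + [item]


stateful_add_item("session-1", "bike")
size = stateful_add_item("session-1", "helmet")

cart = stateless_add_item([], "bike")
cart = stateless_add_item(cart, "helmet")
```

In the stateful version every request for `session-1` has to reach the server holding `server_carts` (or that state has to be shared somehow), which is where the scalability and availability implications come from. In the stateless version any server can handle any request.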
  • Language Neutrality
    • Take the notion of RPC
    • Extract out the notion of ‘Interface Definition’ for ‘add’ in a programming-language-neutral fashion:
        • IDL (CORBA)
        • MIDL (DCOM)
        • WSDL (Web Services)
        • etc.
    • Run the ‘add’ interface definition through a C++ Proxy tool
    • Run the ‘add’ interface definition through a Java Stub tool
    • Implement a C++ client and a Java server
  • Explicit Model
    • Programmer now has to be aware that the ‘add’ function is on the end of a network somewhere
    • Programmer of ‘myprog’ is responsible for marshalling that request and sending it somewhere as well as waiting for the response
    • Programmer of ‘add’ is responsible for waiting for a message, unmarshalling it, processing it, marshalling the response and sending it to the desired recipient
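A rough sketch of the explicit model, in which both programmers handle the messages themselves. It assumes newline-delimited JSON as the message format and uses `socket.socketpair()` plus a thread so that both ends can run in one process; a real deployment would connect two machines.

```python
import json
import socket
import threading


def add(param1, param2):
    return param1 + param2


def serve_one_request(conn):
    """Programmer of 'add': wait for a message, unmarshal, process, marshal, reply."""
    with conn, conn.makefile("rwb") as stream:
        request = json.loads(stream.readline())
        reply = {"result": add(request["param1"], request["param2"])}
        stream.write(json.dumps(reply).encode("utf-8") + b"\n")
        stream.flush()


def call_add(conn, param1, param2):
    """Programmer of 'myprog': marshal the request, send it, wait for the response."""
    with conn, conn.makefile("rwb") as stream:
        request = {"param1": param1, "param2": param2}
        stream.write(json.dumps(request).encode("utf-8") + b"\n")
        stream.flush()
        return json.loads(stream.readline())["result"]


# Two connected sockets stand in for the link between computer A and computer B.
client_end, server_end = socket.socketpair()
server = threading.Thread(target=serve_one_request, args=(server_end,))
server.start()
result = call_add(client_end, 50, 50)
server.join()
```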
  • Message Oriented Middleware
    • Provides a layer of abstraction over the network
    • Enables queue and topic based communication
    • Guarantees and reliability
    • Asynchronous and synchronous messaging
    • Content Based Routing
  • Topics & Queues
    • A Topic is best thought of as a broadcast channel
        • I publish a message to topic ‘cambridge.cycling’ and anyone listening gets to see the message
        • Partial (source) ordering
        • Possible guaranteed delivery (persistent listeners)
        • Publish (Pub) & Subscribe (Sub) is a general term used for this
    • Queues are pipes where the messages are ‘consumable’: I don’t just ‘peek’ at the message along with all the other listeners, I pull it off and consume it so others can’t see it
        • Partial (source) ordering
        • Possible guaranteed delivery (much more common in Qs)
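The difference between the two can be sketched with a toy in-memory broker. The `Broker` class and the ‘orders’ queue name are hypothetical, and the sketch leaves out ordering guarantees, persistence and delivery guarantees entirely.

```python
from collections import defaultdict, deque


class Broker:
    def __init__(self):
        self.subscribers = defaultdict(list)  # topic name -> listener callbacks
        self.queues = defaultdict(deque)      # queue name -> pending messages

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Topic: every current listener gets to see the message.
        for callback in self.subscribers[topic]:
            callback(message)

    def enqueue(self, queue, message):
        self.queues[queue].append(message)

    def dequeue(self, queue):
        # Queue: the message is pulled off, so no other consumer sees it.
        pending = self.queues[queue]
        return pending.popleft() if pending else None


broker = Broker()
alice, bob = [], []
broker.subscribe("cambridge.cycling", alice.append)
broker.subscribe("cambridge.cycling", bob.append)
broker.publish("cambridge.cycling", "ride at 9am")

broker.enqueue("orders", "order-1")
first_consumer = broker.dequeue("orders")
second_consumer = broker.dequeue("orders")
```

Both listeners receive the topic message, whereas only the first consumer receives the queued one.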
  • Products
    • Unlike RPCs, standards are few and far between
    • Java Message Service is probably the most successful
    • TIBCO Rendezvous
    • IBM MQ (Message Queue)
    • RabbitMQ (AMQP)
  • Synchronous & asynchronous
    • The RPC (Implicit) model tends to be ‘synchronous’
        • Procedure calls are synchronous
        • I place the call and I continue with the result
        • Natural ‘linear’ programming model
    • The Messaging (Explicit) model encourages ‘asynchronous’
        • Send a message
        • Send another message
        • Have a body of code associated with receipt of a message
        • May be a response, may be an unsolicited message
    • Pros & Cons
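One way to see the contrast in code, using a thread pool as a stand-in for a remote party. The `on_reply` handler is a hypothetical name, and a real messaging system would deliver replies over the network rather than through `concurrent.futures`.

```python
from concurrent.futures import ThreadPoolExecutor


def add(param1, param2):
    return param1 + param2


# Synchronous: place the call, then continue with the result.
sync_result = add(50, 50)

# Asynchronous: send several requests, and attach a body of code that runs
# whenever a reply arrives.
received = []


def on_reply(future):
    received.append(future.result())


with ThreadPoolExecutor(max_workers=2) as pool:
    for param1, param2 in [(1, 2), (3, 4)]:
        pool.submit(add, param1, param2).add_done_callback(on_reply)
# Leaving the with-block waits for the outstanding work to finish.
```

The replies may arrive in either order, which is one of the costs of the asynchronous style.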
  • Web Services
    • Nothing special
    • WSDL is the ‘Service Definition’
    • Language bindings and tools turn the WSDL into a form of Proxy and Stub
    • We tend to think of WSDL over HTTP, but technically it can be carried over any piece of ‘wet string’
  • REST
    • Very popular, probably the most popular ‘network’ API in use today
    • Stateless
    • URI based
    • GET, POST, DELETE HTTP Operations
    • XML and JSON are popular marshalling formats
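A small in-process sketch of the REST style: every request names a URI and an HTTP operation, and the bodies are marshalled as JSON. The `/notes` resource and the `handle` function are hypothetical, and no real HTTP server is involved; the status codes are the usual HTTP ones.

```python
import itertools
import json

notes = {}                     # resource store: URI -> representation
note_ids = itertools.count(1)  # source of identifiers for new resources


def handle(method, uri, body=None):
    """Handle one self-contained request and return (status_code, body)."""
    if method == "POST" and uri == "/notes":
        new_uri = f"/notes/{next(note_ids)}"
        notes[new_uri] = json.loads(body)
        return 201, json.dumps({"uri": new_uri})
    if method == "GET" and uri in notes:
        return 200, json.dumps(notes[uri])
    if method == "DELETE" and uri in notes:
        del notes[uri]
        return 204, ""
    return 404, json.dumps({"error": "not found"})


status, reply = handle("POST", "/notes", json.dumps({"text": "hello"}))
created_uri = json.loads(reply)["uri"]
```

Each request carries everything needed to process it; the handler keeps no record of any ongoing dialog with a particular client.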
  • Implicit v Explicit
    • What are the advantages of the implicit model?
    • What are its disadvantages?
    • Is the explicit model just the reverse?
  • Data
    • What is special about data in distributed systems?
    • What things can we rely on in non-distributed systems where data is concerned?
  • Transactions
    • Consider a banking application that moves money from one account to another
        • Simple debit here and a credit there
    • There are transactional concerns when it is a single system, a single computer program and a single database
        • These can all be handled by any serious database system
        • ACID
    • Now what if it is a transaction spanning two systems in two different banks with two separate databases!
  • 2-Phase Commit
    • Single phase commit works just fine for a single database
    • It doesn’t work for 2+ databases
    • The commit negotiation is more complicated
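The more complicated negotiation can be sketched for the bank transfer case. This is a toy, in-memory illustration: the `Account` class and `transfer` coordinator are hypothetical, and the sketch omits the logging, timeouts and crash recovery that a real two-phase commit needs.

```python
class Account:
    """One bank's database acting as a participant (in-memory, hypothetical)."""

    def __init__(self, balance):
        self.balance = balance
        self.pending = None

    def prepare(self, delta):
        # Phase 1: vote yes only if the change can definitely be applied later.
        if self.balance + delta < 0:
            return False
        self.pending = delta
        return True

    def commit(self):
        # Phase 2, all voted yes: apply the prepared change.
        self.balance += self.pending
        self.pending = None

    def rollback(self):
        # Phase 2, someone voted no: discard the prepared change.
        self.pending = None


def transfer(source, target, amount):
    """Coordinator: ask every participant to prepare, then commit or roll back all."""
    changes = [(source, -amount), (target, amount)]
    votes = [account.prepare(delta) for account, delta in changes]
    if all(votes):
        for account, _ in changes:
            account.commit()
        return True
    for account, _ in changes:
        account.rollback()
    return False


bank_a, bank_b = Account(100), Account(0)
first_ok = transfer(bank_a, bank_b, 60)   # both vote yes, both commit
second_ok = transfer(bank_a, bank_b, 60)  # bank_a votes no, nothing changes
```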
  • Peer 2 Peer
    • Traditional systems tend to be master / slave, client / server, service oriented etc.
    • These are all structured, hierarchical and organized in nature
    • There is a breed of distributed systems where the players are mostly equal: peers
  • Social Networking
    • Firewalls get in the way of our social networking
        • I can open connections from my home machine to a central server but I don’t want anyone opening connections to my computer
    • Do Skype, Yahoo, IM, BitTorrent (you name it) have massive central servers that all the clients connect to and route their traffic through?
    • What are the issues with that model?
  • Network Tunneling
    • Computer A is behind a firewall and wants to talk to Computer B
    • Computer B is behind a firewall and wants to talk to Computer A
    • Computer X is not behind a firewall and it will help A set up a connection to B
    [Diagram: computers X, A and B, with the connection set-up steps numbered 1 to 5]
  • Distributed Hash Tables
    • Maintaining a massive table {key->value} spread across multiple nodes with high availability
    • From any one of the representative nodes, ask it to get K1 or put {K1,V1}
        • DHT algorithm knows how to narrow in on the node that is actually currently responsible for K1 value storage
        • Different algorithms have different costs/benefits
            • New node entry
            • Existing node exit (planned or otherwise)
            • Read
            • Write
    • Interesting algorithms
        • CHORD
        • CAN
        • Many more
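The idea of narrowing in on the responsible node can be sketched with a simple hash ring. This is not CHORD or CAN: it is a simplified consistent-hashing illustration in which a single process knows every node, whereas real DHT algorithms route between nodes that each know only part of the ring. The `Ring` class and node names are hypothetical.

```python
import bisect
import hashlib


def ring_position(text):
    """Map a node name or a key to a point on the hash ring."""
    return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest(), "big")


class Ring:
    def __init__(self, node_names):
        self.nodes = sorted((ring_position(name), name) for name in node_names)
        self.stores = {name: {} for name in node_names}  # per-node storage

    def responsible_node(self, key):
        # The first node at or after the key's position, wrapping round the ring.
        positions = [position for position, _ in self.nodes]
        index = bisect.bisect_left(positions, ring_position(key)) % len(self.nodes)
        return self.nodes[index][1]

    def put(self, key, value):
        self.stores[self.responsible_node(key)][key] = value

    def get(self, key):
        return self.stores[self.responsible_node(key)].get(key)


ring = Ring(["node-a", "node-b", "node-c"])
ring.put("K1", "V1")
value = ring.get("K1")
```

Only one node ends up storing K1, and any caller can work out which one from the key alone.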
  • Cloud Computing
    • Means lots of things, but let’s just look at what is generally referred to as High Performance Computing (HPC)
    • Tens, hundreds, thousands, maybe more, of machines being used to solve problems
    • An explicit model of distribution and very master/slave oriented
        • Break the work up into pieces
        • Have those pieces executed all over the place
        • Collect the results
    • Requires a large degree of independence in the data and processing to be performed
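The break up, execute and collect pattern can be sketched on one machine, with a thread pool standing in for the farm of machines. The `split` and `process_piece` functions and the sum-of-squares workload are hypothetical examples of independent work.

```python
from concurrent.futures import ThreadPoolExecutor


def process_piece(piece):
    """The work done on one piece; here, the sum of squares of its values."""
    return sum(value * value for value in piece)


def split(data, piece_size):
    """Break the work up into independent pieces."""
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]


data = list(range(10))
pieces = split(data, 3)

# Have the pieces executed by a pool of workers, then collect the results.
with ThreadPoolExecutor(max_workers=4) as workers:
    partial_results = list(workers.map(process_piece, pieces))

total = sum(partial_results)
```

This only works because no piece depends on the result of another, which is the independence the last bullet refers to.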
  • Task Engines
    • Java (Tuple) Spaces (Gigaspaces)
    • Sun Grid Engine
    • Platform
    • Data Synapse (TIBCO now)
    • All take the workload, as expressed in a number of tasks to perform or chunks of data to process, and farm the work out to many machines (a compute farm)
    • Can get as far as Desktop Scavenging
  • Map Reduce
    • Essentially a Grid Engine
    • However, the work isn’t explicitly packaged up into parcels by the program
    • The program supplies a function to apply to each piece of the data (map) and a function to combine the results (reduce)
    • The Map Reduce framework (Hadoop, for example) takes the full dataset, carves it into pieces, distributes the pieces out for processing with the supplied Map function (possibly with further map and reduce stages) and then combines the results using the supplied Reduce function
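A single-process sketch of the idea, using the familiar word count example. The `map_reduce` driver here is a hypothetical stand-in for the framework: it runs everything in one loop, whereas Hadoop and similar systems distribute the map and reduce work across many machines.

```python
from collections import defaultdict


def map_words(line):
    """Map: turn one piece of input into (key, value) pairs."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_counts(word, counts):
    """Reduce: combine all the values collected for one key."""
    return word, sum(counts)


def map_reduce(pieces, map_fn, reduce_fn):
    grouped = defaultdict(list)
    for piece in pieces:                    # map phase, one call per piece
        for key, value in map_fn(piece):
            grouped[key].append(value)      # group the values by key
    return dict(reduce_fn(key, values) for key, values in grouped.items())


lines = ["the cat sat", "the dog sat", "The end"]
word_counts = map_reduce(lines, map_words, reduce_counts)
```

The program only supplies `map_words` and `reduce_counts`; how the lines are carved up and where they run is left to the driver.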
  • Security
    • Authentication
    • Privacy
    • Integrity
    • Non-repudiation
    • Authorisation – this one causes most of the problems
    • Who is the security ‘principal’ to be authorised when an activity spans multiple systems?
  • And… what about?
    • Grid, or Big, Data
    • Distributed Caching
    • Distributed Deadlocks
    • Enterprise Java Beans, Session Beans, Message Beans
    • Pi-Calculus
    • Virtualization