Slideshow transcript
Slide 1: Off the Grid: Introduction to Grid Computing with GridGain • QJUG, February 2007 • Tom Adams, Nick Partridge • Workingmouse, Veitch Lister Consulting
Slide 2: Why are we here?
Slide 3: Large distributed application
Slide 4: Grid-based solution worked
Slide 5: Flow
Slide 6: Grid? • Multiple independent computing clusters which act like a "grid" (Wikipedia) • Many nodes, each indistinguishable from the others • Complete machines rather than co-located CPUs? • Multiple processes? • Commodity hardware? • Homogeneous machines?
Slide 7: A tale of two grids
Slide 8: Partition data across grid
Slide 9: Partition processing across grid
Slide 10: http://www.jroller.com/nivanov/entry/grid_computing_compute_grid_data
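A data grid (slide 8) has to decide which node holds each record. The following is a minimal sketch, assuming plain hash partitioning over a fixed number of nodes. `HashPartitioner`, `nodeFor` and `partition` are hypothetical names and are not part of GridGain or the talk:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: spread record keys over a fixed number of grid nodes
// by hashing, as a data grid might. Node ids are simply 0..nodes-1 here.
class HashPartitioner {

    // Math.floorMod keeps the result in [0, nodes) even for negative hash codes.
    static int nodeFor(String key, int nodes) {
        if (nodes <= 0) throw new IllegalArgumentException("nodes must be positive");
        return Math.floorMod(key.hashCode(), nodes);
    }

    // Group keys by the node that would own them.
    static Map<Integer, List<String>> partition(List<String> keys, int nodes) {
        Map<Integer, List<String>> byNode = new TreeMap<>();
        for (String key : keys) {
            byNode.computeIfAbsent(nodeFor(key, nodes), n -> new ArrayList<>()).add(key);
        }
        return byNode;
    }

    public static void main(String[] args) {
        System.out.println(partition(List.of("foo", "bar", "baz", "quux"), 2));
    }
}
```

With modulo hashing, changing the node count remaps most keys. Consistent hashing is the usual remedy when nodes come and go.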
Slide 11: Selection
Slide 12: Requirements • Callable from a Rails webapp • Real-time: synchronous responses in under 30 seconds • Large dataset: 100 GB (computation runs across all data)
Slide 13: Rails webapp • Simple document-literal web service • Ruby: soap4r • Java: GlassFish, Spring-WS • Not really interesting for this talk; see Brisbane.rb
Slide 14: Data • Read-only • Full control • 45 TB (100 GB after pre-processing) • SQL? 3 tables, one query with 2 joins
Slide 15: Don’t want to roll our own
Slide 16: (Row) database good enough
Slide 17: And we can federate them
Slide 18: Result?
Slide 19: http://battellemedia.com/archives/2007_01.php
Slide 20: What about BigTable?
Slide 21: Column database
Slide 22: Result?
Slide 23: http://failblog.wordpress.com/2008/01/29/satellite/
Slide 24: Where are we?
Slide 25: Progress • Don’t need to distribute data, so no data grid • No off-the-shelf solutions that scale or go fast • Understand the data better, so happy to roll our own as a fallback
Slide 26: Data solution
Slide 27: Data • CSV files on the filesystem (now binary) • Directories form indices • Data files broken up into chunks
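The following sketch shows one possible shape for the directory index and chunking. It assumes a hypothetical layout `root/<key>/<key>/chunk-NNNNN.csv` and hypothetical names (`ChunkLayout`, the example keys). The talk does not give the actual layout:

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical layout: root/<key1>/<key2>/.../chunk-NNNNN.csv
// The directory names act as the index; each leaf directory holds numbered chunks.
class ChunkLayout {

    // Resolve the directory for a sequence of index keys.
    static Path indexDir(Path root, String... keys) {
        Path dir = root;
        for (String key : keys) {
            dir = dir.resolve(key);
        }
        return dir;
    }

    // Zero-padded chunk numbers keep the files in lexicographic order.
    static Path chunkFile(Path root, int chunk, String... keys) {
        return indexDir(root, keys).resolve(String.format("chunk-%05d.csv", chunk));
    }

    // Break rows into chunks of at most chunkSize rows each.
    static <T> List<List<T>> chunk(List<T> rows, int chunkSize) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be positive");
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += chunkSize) {
            chunks.add(new ArrayList<>(rows.subList(i, Math.min(i + chunkSize, rows.size()))));
        }
        return chunks;
    }

    public static void main(String[] args) {
        System.out.println(chunkFile(Paths.get("data"), 3, "keyA", "keyB"));
        System.out.println(chunk(List.of("r1", "r2", "r3", "r4", "r5"), 2));
    }
}
```

A query that only needs one index value reads just the chunk files under that directory, which is what lets the filesystem stand in for a database index here.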
Slide 28: What about the code? http://giapet.net/wp-content/uploads/2007/05/luluwtf.gif
Slide 29: Need to distribute the computation
Slide 30: Options?
Slide 31: Erlang
Slide 32: Scala
Slide 33: Java
Slide 34: Java frameworks • Hadoop • GridGain • Oracle Coherence • GigaSpaces • Terracotta • JavaSpaces/Jini • Shoal
Slide 35: GridGain
Slide 36: GridGain • “fully open source full-stack grid computing platform for Java” • Map/reduce-based computation • Easy to set up and use • Can be extended via SPI implementations • Just works • “Scalable” (we’ve had it up to 32 nodes)
Slide 37: Map/reduce
Slide 38: When does it work? • When data is independent (pure/referentially transparent) • When results can be combined (reduce) based solely on their inputs
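The second condition lets partial results be reduced in any grouping and order. The following is a minimal sketch, assuming the partial results are word-count maps. `MergeCounts.merge` is a hypothetical helper. The merge depends only on its two inputs, and it is associative and commutative:

```java
import java.util.HashMap;
import java.util.Map;

// Combining partial word counts depends only on the two inputs. The merge is
// associative and commutative, so partial results can be reduced in any
// grouping or order.
class MergeCounts {

    static Map<String, Integer> merge(Map<String, Integer> a, Map<String, Integer> b) {
        Map<String, Integer> out = new HashMap<>(a);
        b.forEach((word, n) -> out.merge(word, n, Integer::sum));
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> x = Map.of("foo", 1, "bar", 2);
        Map<String, Integer> y = Map.of("bar", 2, "baz", 1);
        Map<String, Integer> z = Map.of("quux", 1, "baz", 1);
        System.out.println(merge(merge(x, y), z).equals(merge(x, merge(y, z)))); // true
        System.out.println(merge(x, y).equals(merge(y, x)));                     // true
    }
}
```

A mean is a counterexample. In general, averaging per-node averages gives the wrong answer unless each node also returns its count.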
Slide 39: [Diagram: word count with map/reduce. The input is split into the chunks "foo bar", "bar baz", "quux bar" and "baz bar"; map emits word: 1 for every word; reduce sums them to foo: 1, bar: 4, baz: 2, quux: 1.]
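The word-count example can be written as plain sequential Java. This is a sketch with hypothetical names (`WordCount`, `split`, `map`, `reduce`). A split size of two words reproduces the diagram's chunks:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Word count in three steps: split the input into chunks, map each word
// to a (word, 1) pair, then reduce by summing the pairs per word.
class WordCount {

    // Split a whitespace-separated input into chunks of wordsPerChunk words.
    static List<List<String>> split(String input, int wordsPerChunk) {
        List<String> words = Arrays.asList(input.trim().split("\\s+"));
        List<List<String>> chunks = new ArrayList<>();
        for (int i = 0; i < words.size(); i += wordsPerChunk) {
            chunks.add(words.subList(i, Math.min(i + wordsPerChunk, words.size())));
        }
        return chunks;
    }

    // Map: each word becomes the pair (word, 1).
    static List<Map.Entry<String, Integer>> map(List<String> chunk) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : chunk) {
            pairs.add(Map.entry(word, 1));
        }
        return pairs;
    }

    // Reduce: sum the counts for each word (TreeMap keeps the output sorted).
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    static Map<String, Integer> count(String input, int wordsPerChunk) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (List<String> chunk : split(input, wordsPerChunk)) {
            pairs.addAll(map(chunk));
        }
        return reduce(pairs);
    }

    public static void main(String[] args) {
        System.out.println(count("foo bar bar baz quux bar baz bar", 2));
        // prints {bar=4, baz=2, foo=1, quux=1}
    }
}
```

Each call to `map` looks only at its own chunk. That independence is what allows the chunks to be handed to different nodes.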
Slide 40: GridGain grid
Slide 41: [Diagram: the chunks "foo bar", "bar baz", "quux bar" and "baz bar" go into the Grid, which returns foo: 1, bar: 4, baz: 2, quux: 1.]
Slide 42: [Diagram: the same chunks spread over two Nodes; one Node processes "foo bar" and "bar baz" (foo: 1, bar: 2, baz: 1), the other "quux bar" and "baz bar" (bar: 2, baz: 1, quux: 1). How the partial counts become the final result is marked "?".]
Slide 43: [Diagram: as slide 42, with a Master Node that combines the two Nodes' partial counts into foo: 1, bar: 4, baz: 2, quux: 1.]
Slide 44: [Diagram: four Nodes, two of them labelled Master, holding the partial counts foo: 1, bar: 2, baz: 1 and bar: 2, baz: 1, quux: 1.]
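The master/node arrangement of slides 42 to 44 can be simulated on a single machine. In this sketch, which is an assumption rather than code from the talk, each "node" is a thread from an `ExecutorService` that returns partial counts. The "master" then merges those counts. `LocalGrid` is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Simulated grid: each "node" is a thread that counts the words in its chunks
// and returns partial counts; the "master" merges the partials into the final
// result. Threads stand in for separate machines; this is not GridGain code.
class LocalGrid {

    static Map<String, Integer> countChunks(List<String> chunks) {
        Map<String, Integer> partial = new TreeMap<>();
        for (String chunk : chunks) {
            for (String word : chunk.trim().split("\\s+")) {
                partial.merge(word, 1, Integer::sum);
            }
        }
        return partial;
    }

    static Map<String, Integer> run(List<List<String>> chunksPerNode)
            throws InterruptedException, ExecutionException {
        ExecutorService nodes = Executors.newFixedThreadPool(chunksPerNode.size());
        try {
            // Each node works on its own chunks in parallel.
            List<Future<Map<String, Integer>>> partials = new ArrayList<>();
            for (List<String> chunks : chunksPerNode) {
                partials.add(nodes.submit(() -> countChunks(chunks)));
            }
            // Master: wait for each node and merge its partial counts.
            Map<String, Integer> total = new TreeMap<>();
            for (Future<Map<String, Integer>> partial : partials) {
                partial.get().forEach((word, n) -> total.merge(word, n, Integer::sum));
            }
            return total;
        } finally {
            nodes.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<List<String>> chunksPerNode = List.of(
                List.of("foo bar", "bar baz"),   // node 1: foo=1, bar=2, baz=1
                List.of("quux bar", "baz bar")); // node 2: bar=2, baz=1, quux=1
        System.out.println(run(chunksPerNode)); // {bar=4, baz=2, foo=1, quux=1}
    }
}
```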
Slide 45: Did you say map/reduce?
Slide 46: [Diagram: as slide 43, with the counting on each Node labelled "map" and the Master Node's combining step labelled "reduce".]
Slide 47: Show me the types!
Slide 48: [Diagram: as slide 46, annotated with the types map[A, B](List[A], A → B) → List[B] on the Nodes and reduce[B, C](List[B], C, (C, B) → C) → List[C] on the Master Node.]
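The slide's signatures can be rendered in Java as follows. The slide shows reduce's result as List[C]. This sketch instead writes reduce as an ordinary left fold that returns a single C, which is an assumption about the intended meaning. `MapReduceTypes` is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;

// map[A, B](List[A], A → B) → List[B]: apply f to every element.
// reduce here: (List[B], C, (C, B) → C) → C, a left fold from an initial C.
class MapReduceTypes {

    static <A, B> List<B> map(List<A> xs, Function<A, B> f) {
        List<B> out = new ArrayList<>(xs.size());
        for (A x : xs) {
            out.add(f.apply(x));
        }
        return out;
    }

    static <B, C> C reduce(List<B> xs, C init, BiFunction<C, B, C> f) {
        C acc = init;
        for (B x : xs) {
            acc = f.apply(acc, x);
        }
        return acc;
    }

    public static void main(String[] args) {
        List<String> chunks = List.of("foo bar", "bar baz", "quux bar", "baz bar");
        // A = String, B = Integer: words per chunk.
        List<Integer> lengths = map(chunks, s -> s.split(" ").length);
        // B = Integer, C = Integer: total number of words.
        System.out.println(reduce(lengths, 0, Integer::sum)); // 8
    }
}
```

B and C may differ. For example, folding a `List<String>` into an `Integer` with `(acc, s) -> acc + s.length()` is also valid.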
Slide 49: Terminology
Slide 50: [Diagram: the Master Node runs the Task and produces the Result; each of two Nodes runs one Job over two chunks, returning foo: 1, bar: 2, baz: 1 and bar: 2, baz: 1, quux: 1.]
Slide 51: [Diagram: the Task split into four Jobs ("foo bar", "bar baz", "quux bar", "baz bar"), two Jobs on each of two Nodes.]
Slide 52: [Diagram: the same four Jobs, one on each of four Nodes.]
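The following toy model illustrates the terminology. It assumes a Task that creates one Job per chunk and assigns Jobs to nodes round-robin. `TaskJobModel`, `Job` and the assignment policy are hypothetical. GridGain's real task interface is the one documented in the `GridTask` javadoc:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of the slide terminology (not the GridGain API):
// a Task splits its input into Jobs, each Job runs on a Node and returns a
// Result, and the Task reduces the Results into its final answer.
class TaskJobModel {

    // A Job is one unit of work: counting the words in one chunk.
    static final class Job {
        final String chunk;

        Job(String chunk) {
            this.chunk = chunk;
        }

        Map<String, Integer> execute() {
            Map<String, Integer> result = new TreeMap<>();
            for (String word : chunk.trim().split("\\s+")) {
                result.merge(word, 1, Integer::sum);
            }
            return result;
        }
    }

    // Split: one Job per chunk, assigned to nodes 0..nodes-1 round-robin.
    static Map<Integer, List<Job>> split(List<String> chunks, int nodes) {
        Map<Integer, List<Job>> jobsByNode = new TreeMap<>();
        for (int i = 0; i < chunks.size(); i++) {
            jobsByNode.computeIfAbsent(i % nodes, n -> new ArrayList<>()).add(new Job(chunks.get(i)));
        }
        return jobsByNode;
    }

    // Reduce: merge every Job's Result into the Task's Result.
    static Map<String, Integer> reduce(List<Map<String, Integer>> results) {
        Map<String, Integer> total = new TreeMap<>();
        for (Map<String, Integer> result : results) {
            result.forEach((word, n) -> total.merge(word, n, Integer::sum));
        }
        return total;
    }

    // Execute the Task. Jobs run locally here; on a grid each would run on its node.
    static Map<String, Integer> execute(List<String> chunks, int nodes) {
        List<Map<String, Integer>> results = new ArrayList<>();
        for (List<Job> jobs : split(chunks, nodes).values()) {
            for (Job job : jobs) {
                results.add(job.execute());
            }
        }
        return reduce(results);
    }

    public static void main(String[] args) {
        List<String> chunks = List.of("foo bar", "bar baz", "quux bar", "baz bar");
        System.out.println(split(chunks, 2).get(0).size()); // 2 Jobs per Node, as on slide 51
        System.out.println(split(chunks, 4).get(3).size()); // 1 Job per Node, as on slide 52
        System.out.println(execute(chunks, 4));              // {bar=4, baz=2, foo=1, quux=1}
    }
}
```

The Result does not depend on how many Nodes the Jobs were spread over. Only the degree of parallelism changes.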
Slide 53: What defines a grid?
Slide 54: [Diagram: six Nodes forming two separate grids, one on IP multicast group 228.1.2.4 and the other on 228.1.2.5.]
Slide 55: Failover
Slide 56: [Diagram: the Master Node's Task split into four Jobs, one on each of four Nodes.]
Slide 57: [Diagram: as slide 56, with one Node marked as failed (X).]
Slide 58: [Diagram: the failed Node's Job has moved to one of the surviving Nodes.]
Slide 59: [Diagram: two Nodes marked as failed (X), with their Jobs running on the remaining Nodes.]
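The failover shown in slides 56 to 59 can be sketched with simulated node failures. This sketch assumes a simple policy: a failed Job is retried on the next node in round-robin order. `Failover` and its methods are hypothetical and are not GridGain's failover SPI:

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical failover sketch: a Job sent to a failed node is retried on the
// next node in round-robin order. Node failure is simulated with a set of
// dead node ids.
class Failover {

    static class NodeFailedException extends RuntimeException {
        NodeFailedException(int node) {
            super("node " + node + " failed");
        }
    }

    // Run one word-count Job "on" a node; a dead node throws instead.
    static Map<String, Integer> runOn(int node, Set<Integer> dead, String chunk) {
        if (dead.contains(node)) throw new NodeFailedException(node);
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : chunk.trim().split("\\s+")) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    // Try the Job's home node first, then fail over to the following nodes.
    static Map<String, Integer> runWithFailover(int home, int nodes, Set<Integer> dead, String chunk) {
        for (int attempt = 0; attempt < nodes; attempt++) {
            int node = (home + attempt) % nodes;
            try {
                return runOn(node, dead, chunk);
            } catch (NodeFailedException e) {
                // Node is down: fall through and try the next one.
            }
        }
        throw new IllegalStateException("no live nodes left for job: " + chunk);
    }

    static Map<String, Integer> execute(List<String> chunks, int nodes, Set<Integer> dead) {
        Map<String, Integer> total = new TreeMap<>();
        for (int i = 0; i < chunks.size(); i++) {
            runWithFailover(i % nodes, nodes, dead, chunks.get(i))
                    .forEach((word, n) -> total.merge(word, n, Integer::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> chunks = List.of("foo bar", "bar baz", "quux bar", "baz bar");
        System.out.println(execute(chunks, 4, Set.of(1, 2))); // {bar=4, baz=2, foo=1, quux=1}
    }
}
```

Because each Job is pure (slide 38), re-running it elsewhere is safe, and the final counts are unchanged by the failures.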
Slide 60: Task execution
Slide 61: http://www.gridgain.com/javadoc/org/gridgain/grid/GridTask.html
Slide 62: GridGain demo
Slide 63: The good, the bad, the ugly
Slide 64: Just works, fast, easy, extensible, scalable
Slide 65: Error messages, doco, code quality, coupling, odd APIs, management overview
Slide 66: Nomenclature, JMS?
Slide 67: References • http://wiki.workingmouse.com/ • http://www.gridgain.com/ • http://labs.google.com/papers/mapreduce.html