0
StormThe Real-Time Layer Your Big    Data’s Been Missing        Dan Lynn      dan@fullcontact.com          @danklynn
Keeps Contact Information Current and Complete  Based in Denver, Colorado                              CTO & Co-Founder   ...
Turn Partial Contacts Into Full Contacts
Your data is old
Old data is confusing
First, there was the spreadsheet
Next, there was SQL
Then, there wasn’t SQL
Batch Processing        =   Stale Data
StreamingComputation
Queues / Workers
Queues / WorkersMessagesMessages Messages Messages  Messages  Messages          Queue   Messages                          ...
Queues / WorkersMe ssage rou tingcan be com ple x Messages Messages  Messages  Messages   Messages   Messages          Que...
Queues / Workers
Brittle
Hard to scale
Queues / Workers
Queues / Workers
Queues / Workers
Queues / WorkersRou ting mus t bereco nfig ured whe nsca ling out
Storm
StormDistributed and fault-tolerant real-time computation
StormDistributed and fault-tolerant real-time computation
StormDistributed and fault-tolerant real-time computation
StormDistributed and fault-tolerant real-time computation
Key Concepts
TuplesOrdered list of elements
Tuples           Ordered list of elements("search-01384", "e:dan@fullcontact.com")
StreamsUnbounded sequence of tuples
Streams        Unbounded sequence of tuplesTuple    Tuple   Tuple   Tuple   Tuple   Tuple
Spouts Source of streams
Spouts Source of streams
Spouts  Source of streamsTuple   Tuple   Tuple   Tuple   Tuple   Tuple
Spouts can talk with               some images from http://commons.wikimedia.org
Spouts can talk with•Queues                     some images from http://commons.wikimedia.org
Spouts can talk with•Queues•Web logs                      some images from http://commons.wikimedia.org
Spouts can talk with•Queues•Web logs•API calls                       some images from http://commons.wikimedia.org
Spouts can talk with•Queues•Web logs•API calls•Event data                       some images from http://commons.wikimedia....
BoltsProcess tuples and create new streams
Bolts                                                                                             Tuple                   ...
Bolts        some images from http://commons.wikimedia.org
Bolts•Apply functions / transforms                     some images from http://commons.wikimedia.org
Bolts•Apply functions / transforms•Filter                     some images from http://commons.wikimedia.org
Bolts•Apply functions / transforms•Filter•Aggregation                       some images from http://commons.wikimedia.org
Bolts•Apply functions / transforms•Filter•Aggregation•Streaming joins                       some images from http://common...
Bolts•Apply functions / transforms•Filter•Aggregation•Streaming joins•Access DBs, APIs, etc...                       some ...
TopologiesA directed graph of Spouts and Bolts
This is a Topology               some images from http://commons.wikimedia.org
This is also a topology                 some images from http://commons.wikimedia.org
TasksProcesses which execute Streams or Bolts
Running a Topology$ storm jar my-code.jar com.example.MyTopology arg1 arg2
Storm Cluster                Nathan Marz
Storm ClusterIf thi s we reHadoo p...                                 Nathan Marz
Storm ClusterIf thi s we reHadoo p...Job Tracke r                                 Nathan Marz
Storm ClusterIf thi s we reHadoo p...         Tas k Tracke rs         Nathan Marz
Storm ClusterBut it’s not Hado opCoo rdi nates eve ry thi ng                                       Nathan Marz
Example:Streaming Word Count
Streaming Word CountTopologyBuilder builder = new TopologyBuilder();builder.setSpout("sentences", new RandomSentenceSpout(...
Streaming Word CountTopologyBuilder builder = new TopologyBuilder();builder.setSpout("sentences", new RandomSentenceSpout(...
Streaming Word Countpublic static class SplitSentence extends ShellBolt implements IRichBolt {            public SplitSent...
Streaming Word Countpublic static class SplitSentence extends ShellBolt implements IRichBolt {            public SplitSent...
Streaming Word Countpublic static class SplitSentence extends ShellBolt implements IRichBolt {            public SplitSent...
Streaming Word CountTopologyBuilder builder = new TopologyBuilder();builder.setSpout("sentences", new RandomSentenceSpout(...
Streaming Word Countpublic static class WordCount extends BaseBasicBolt {    Map<String, Integer> counts = new HashMap<Str...
Streaming Word Count TopologyBuilder builder = new TopologyBuilder(); builder.setSpout("sentences", new RandomSentenceSpou...
Shuffle groupingTuples are randomly distributed across all of the             tasks running the bolt
Fields groupingGroups tuples by specific named fields and routes             them to the same task
o p’s                    ado r               t o H av i o          o us b e hAn a lo g i ng   rt i t io npa              F...
Distributed RPC
Before Distributed RPC,time-sensitive queries relied on a      pre-computed index
What if you didn’t need an index?
Distributed RPC
Try it out!Huge thanks to Nathan Marz - @nathanmarz   http://github.com/nathanmarz/stormhttps://github.com/nathanmarz/stor...
Questions? dan@fullcontact.com
Upcoming SlideShare
Loading in...5
×

Storm - The Real-Time Layer Your Big Data's Been Missing

1,631

Published on

Presentation on Twitter Storm by Dan Lynn, CTO of FullContact, at GlueCon 2012.

Published in: Technology, Business
0 Comments
12 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
1,631
On Slideshare
0
From Embeds
0
Number of Embeds
12
Actions
Shares
0
Downloads
116
Comments
0
Likes
12
Embeds 0
No embeds

No notes for slide

Transcript of "Storm - The Real-Time Layer Your Big Data's Been Missing"

  1. 1. StormThe Real-Time Layer Your Big Data’s Been Missing Dan Lynn dan@fullcontact.com @danklynn
  2. 2. Keeps Contact Information Current and Complete Based in Denver, Colorado CTO & Co-Founder dan@fullcontact.com @danklynn
  3. 3. Turn Partial Contacts Into Full Contacts
  4. 4. Your data is old
  5. 5. Old data is confusing
  6. 6. First, there was the spreadsheet
  7. 7. Next, there was SQL
  8. 8. Then, there wasn’t SQL
  9. 9. Batch Processing = Stale Data
  10. 10. StreamingComputation
  11. 11. Queues / Workers
  12. 12. Queues / WorkersMessagesMessages Messages Messages Messages Messages Queue Messages Workers
  13. 13. Queues / WorkersMe ssage rou tingcan be com ple x Messages Messages Messages Messages Messages Messages Queue Messages Workers
  14. 14. Queues / Workers
  15. 15. Brittle
  16. 16. Hard to scale
  17. 17. Queues / Workers
  18. 18. Queues / Workers
  19. 19. Queues / Workers
  20. 20. Queues / WorkersRou ting mus t bereco nfig ured whe nsca ling out
  21. 21. Storm
  22. 22. StormDistributed and fault-tolerant real-time computation
  23. 23. StormDistributed and fault-tolerant real-time computation
  24. 24. StormDistributed and fault-tolerant real-time computation
  25. 25. StormDistributed and fault-tolerant real-time computation
  26. 26. Key Concepts
  27. 27. TuplesOrdered list of elements
  28. 28. Tuples Ordered list of elements("search-01384", "e:dan@fullcontact.com")
  29. 29. StreamsUnbounded sequence of tuples
  30. 30. Streams Unbounded sequence of tuplesTuple Tuple Tuple Tuple Tuple Tuple
  31. 31. Spouts Source of streams
  32. 32. Spouts Source of streams
  33. 33. Spouts Source of streamsTuple Tuple Tuple Tuple Tuple Tuple
  34. 34. Spouts can talk with some images from http://commons.wikimedia.org
  35. 35. Spouts can talk with•Queues some images from http://commons.wikimedia.org
  36. 36. Spouts can talk with•Queues•Web logs some images from http://commons.wikimedia.org
  37. 37. Spouts can talk with•Queues•Web logs•API calls some images from http://commons.wikimedia.org
  38. 38. Spouts can talk with•Queues•Web logs•API calls•Event data some images from http://commons.wikimedia.org
  39. 39. BoltsProcess tuples and create new streams
  40. 40. Bolts Tuple Tuple Tuple Tuple Tuple TupleTuple Tuple Tuple Tuple Tuple Tuple Tuple Tuple Tuple Tuple Tuple Tuple some images from http://commons.wikimedia.org
  41. 41. Bolts some images from http://commons.wikimedia.org
  42. 42. Bolts•Apply functions / transforms some images from http://commons.wikimedia.org
  43. 43. Bolts•Apply functions / transforms•Filter some images from http://commons.wikimedia.org
  44. 44. Bolts•Apply functions / transforms•Filter•Aggregation some images from http://commons.wikimedia.org
  45. 45. Bolts•Apply functions / transforms•Filter•Aggregation•Streaming joins some images from http://commons.wikimedia.org
  46. 46. Bolts•Apply functions / transforms•Filter•Aggregation•Streaming joins•Access DBs, APIs, etc... some images from http://commons.wikimedia.org
  47. 47. TopologiesA directed graph of Spouts and Bolts
  48. 48. This is a Topology some images from http://commons.wikimedia.org
  49. 49. This is also a topology some images from http://commons.wikimedia.org
  50. 50. TasksProcesses which execute Streams or Bolts
  51. 51. Running a Topology$ storm jar my-code.jar com.example.MyTopology arg1 arg2
  52. 52. Storm Cluster Nathan Marz
  53. 53. Storm ClusterIf thi s we reHadoo p... Nathan Marz
  54. 54. Storm ClusterIf thi s we reHadoo p...Job Tracke r Nathan Marz
  55. 55. Storm ClusterIf thi s we reHadoo p... Tas k Tracke rs Nathan Marz
  56. 56. Storm ClusterBut it’s not Hado opCoo rdi nates eve ry thi ng Nathan Marz
  57. 57. Example:Streaming Word Count
  58. 58. Streaming Word CountTopologyBuilder builder = new TopologyBuilder();builder.setSpout("sentences", new RandomSentenceSpout(), 5);builder.setBolt("split", new SplitSentence(), 8) .shuffleGrouping("sentences");builder.setBolt("count", new WordCount(), 12) .fieldsGrouping("split", new Fields("word"));
  59. 59. Streaming Word CountTopologyBuilder builder = new TopologyBuilder();builder.setSpout("sentences", new RandomSentenceSpout(), 5);builder.setBolt("split", new SplitSentence(), 8) .shuffleGrouping("sentences");builder.setBolt("count", new WordCount(), 12) .fieldsGrouping("split", new Fields("word"));
  60. 60. Streaming Word Countpublic static class SplitSentence extends ShellBolt implements IRichBolt {            public SplitSentence() {        super("python", "splitsentence.py");    }    @Override    public void declareOutputFields(OutputFieldsDeclarer declarer) {        declarer.declare(new Fields("word"));    }    @Override    public Map<String, Object> getComponentConfiguration() {        return null;    }} SplitSentence.java
  61. 61. Streaming Word Countpublic static class SplitSentence extends ShellBolt implements IRichBolt {            public SplitSentence() {        super("python", "splitsentence.py");    }    @Override    public void declareOutputFields(OutputFieldsDeclarer declarer) {        declarer.declare(new Fields("word"));    }    @Override    public Map<String, Object> getComponentConfiguration() {        return null;    }} SplitSentence.java splitsentence.py
  62. 62. Streaming Word Countpublic static class SplitSentence extends ShellBolt implements IRichBolt {            public SplitSentence() {        super("python", "splitsentence.py");    }    @Override    public void declareOutputFields(OutputFieldsDeclarer declarer) {        declarer.declare(new Fields("word"));    }    @Override    public Map<String, Object> getComponentConfiguration() {        return null;    }} SplitSentence.java
  63. 63. Streaming Word CountTopologyBuilder builder = new TopologyBuilder();builder.setSpout("sentences", new RandomSentenceSpout(), 5);builder.setBolt("split", new SplitSentence(), 8) .shuffleGrouping("sentences");builder.setBolt("count", new WordCount(), 12) .fieldsGrouping("split", new Fields("word")); java
  64. 64. Streaming Word Countpublic static class WordCount extends BaseBasicBolt {    Map<String, Integer> counts = new HashMap<String, Integer>();    @Override    public void execute(Tuple tuple, BasicOutputCollector collector) {        String word = tuple.getString(0);        Integer count = counts.get(word);        if(count==null) count = 0;        count++;        counts.put(word, count);        collector.emit(new Values(word, count));    }    @Override    public void declareOutputFields(OutputFieldsDeclarer declarer) {        declarer.declare(new Fields("word", "count"));    }} WordCount.java
  65. 65. Streaming Word Count TopologyBuilder builder = new TopologyBuilder(); builder.setSpout("sentences", new RandomSentenceSpout(), 5); builder.setBolt("split", new SplitSentence(), 8) .shuffleGrouping("sentences"); builder.setBolt("count", new WordCount(), 12) .fieldsGrouping("split", new Fields("word")); javaGro upings con tro l how tup les are rou ted
  66. 66. Shuffle groupingTuples are randomly distributed across all of the tasks running the bolt
  67. 67. Fields groupingGroups tuples by specific named fields and routes them to the same task
  68. 68. o p’s ado r t o H av i o o us b e hAn a lo g i ng rt i t io npa Fields grouping Groups tuples by specific named fields and routes them to the same task
  69. 69. Distributed RPC
  70. 70. Before Distributed RPC,time-sensitive queries relied on a pre-computed index
  71. 71. What if you didn’t need an index?
  72. 72. Distributed RPC
  73. 73. Try it out!Huge thanks to Nathan Marz - @nathanmarz http://github.com/nathanmarz/stormhttps://github.com/nathanmarz/storm-starter @stormprocessor
  74. 74. Questions? dan@fullcontact.com
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×