Storm Basics

An introduction to the basics of Twitter Storm

Published in: Technology, Education

Transcript

  • 1. Storm Basics, 太奇
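  • 2. Storm cluster
      • Nimbus: the master node, responsible for task assignment, cluster task monitoring, and so on.
      • Zookeeper: coordination within the cluster and storage of shared data (such as heartbeat information).
      • Supervisor: the worker machines; tasks execute on these machines.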
  • 3. conceptstupleStreamsSpoutsBoltsTopologies
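  • 4. Tuple (Field 1 | Field 2 | Field 3 | Field 4): A tuple is the basic unit of processing in a stream, for example one acookie log record. It can contain multiple fields, and each field represents one attribute.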
  • 5. Streams: A stream is an unbounded, continuous sequence of tuples that can be created and processed in parallel in a distributed system.
  • 6. Spout: A spout is the source of a stream. Typically a spout reads data from an external source and emits tuples into the stream. Examples:
      • a Kestrel queue
      • the Twitter Streaming API
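  • 7. Bolt: A bolt processes input streams and produces new output streams. A bolt can perform a simple stream transformation; complex stream processing is usually broken into several steps, so multiple bolts are chained together, each handling one simpler function. A bolt can produce more than one output stream. What bolts can do:
      • Filtering
      • Functions
      • Aggregations
      • Joins
      • Talking to databases
      • And more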
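  • 8. Topology: A topology is a graph of spouts and bolts. A real-time processing program logically forms one Storm topology. The difference between a Storm topology and a traditional job: a Storm topology does not terminate; it keeps running until it is killed.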
  • 9. Task
      • Each spout and bolt runs as many tasks across the cluster.
      • Each task corresponds to one OS thread.
      • Stream groupings define how tuples are sent from one set of tasks to another.
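  • 10. Task grouping: when a tuple is emitted, how is it decided which task (or tasks) it is sent to?
      • Shuffle grouping: picks one task at random.
      • Fields grouping: applies a consistent hash to part of the tuple (the grouping fields), so tuples with equal values in those fields are sent to the same task.
      • All grouping: sends the tuple to all tasks.
      • Global grouping: the system chooses the target, generally the task with the lowest task id.
      • None grouping: the sender does not care which task receives the tuple; equivalent to shuffle grouping.
      • Direct grouping: sends the tuple directly to a specified task.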
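  • 11. Guaranteeing message processing
      • Spout tuple: a tuple emitted by a spout.
      • A spout tuple is fully processed when every child tuple in the tuple tree derived from it has been fully processed.
      • If the tuple tree is not fully processed within the specified time, the spout tuple is replayed.
      • Tuple tree: Storm tracks the tuple tree with a very efficient mechanism.
      • Programming model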
  • 12. Guaranteeing message processing
      • A tuple isn't acked because the task died: In this case the spout tuple ids at the root of the trees for the failed tuple will time out and be replayed.
      • Acker task dies: In this case all the spout tuples the acker was tracking will time out and be replayed.
      • Spout task dies: In this case the source that the spout talks to is responsible for replaying the messages. For example, queues like Kestrel and RabbitMQ will place all pending messages back on the queue when a client disconnects.
  • 13. Storm UI
  • 14. Storm design goals:
      • Guaranteed data processing
      • Horizontal scalability
      • Fault-tolerance
      • No intermediate message brokers!
      • Higher level abstraction than message passing
      • “Just works”
    What Storm can do today:
      • Distributes code and configurations
      • Robust process management
      • Monitors topologies and reassigns failed tasks
      • Provides reliability by tracking tuple trees
      • Routing and partitioning of streams
      • Serialization
      • Fine-grained performance stats of topologies
  • 15. Future directions for Storm
      • Higher-level abstractions: streaming joins, windowed aggregations, and other logical operators
      • Automatic load balancing (at runtime)
      • Support for development purely in non-Java languages
      • Online machine learning algorithms (something like Mahout but for online algorithms)
      • Transaction support (0.7.0)
      • Benchmarks for performance testing
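
The grouping rules on slide 10 can be illustrated with a small Python sketch. This is not Storm's API: the function names, the dict-based tuple, and the integer task ids are all assumptions made for illustration, and a plain CRC32 hash modulo the task count stands in for whatever hashing Storm uses internally.

```python
import random
import zlib


def shuffle_grouping(tup, tasks, rng=random):
    """Shuffle grouping: pick one target task at random."""
    return [rng.choice(tasks)]


def fields_grouping(tup, tasks, fields):
    """Fields grouping: hash the chosen fields so that tuples with
    equal values in those fields always reach the same task.
    (Hypothetical stand-in: CRC32 modulo the number of tasks.)"""
    key = "|".join(str(tup[f]) for f in fields).encode("utf-8")
    return [tasks[zlib.crc32(key) % len(tasks)]]


def all_grouping(tup, tasks):
    """All grouping: every task receives a copy of the tuple."""
    return list(tasks)


def global_grouping(tup, tasks):
    """Global grouping: the task with the lowest id receives the tuple."""
    return [min(tasks)]


def direct_grouping(tup, tasks, target):
    """Direct grouping: the sender names the receiving task."""
    if target not in tasks:
        raise ValueError("unknown task id: %r" % (target,))
    return [target]
```

With tasks `[3, 4, 5, 6]`, two tuples that share the same `user` value go to the same task under `fields_grouping(tup, tasks, ["user"])`, even if their other fields differ. None grouping is omitted because the slide describes it as equivalent to shuffle grouping.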

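Slide 11 says Storm tracks tuple trees efficiently without giving details. As described in Storm's own documentation, the acker keeps one 64-bit value per spout tuple and XORs tuple ids into it. The sketch below is a simplified model of that idea, not Storm's implementation: the `Acker` class, its method names, and `new_id` are hypothetical, and timeouts and replay are left out.

```python
import random


def new_id():
    """Random 64-bit tuple id (hypothetical helper)."""
    return random.getrandbits(64)


class Acker:
    """Tracks each tuple tree with a single XOR checksum."""

    def __init__(self):
        # spout tuple id -> running XOR of tuple ids seen so far
        self.pending = {}

    def track(self, root_id):
        # Called when the spout emits the root (spout) tuple.
        self.pending[root_id] = root_id

    def ack(self, root_id, acked_id, child_ids=()):
        # XOR in the acked tuple and every new child anchored to it.
        # Each id is XORed exactly twice over the life of the tree
        # (once when created, once when acked), so the value returns
        # to zero only when every tuple in the tree has been acked.
        value = self.pending[root_id] ^ acked_id
        for child in child_ids:
            value ^= child
        if value == 0:
            del self.pending[root_id]
            return True  # tuple tree fully processed
        self.pending[root_id] = value
        return False
```

The memory cost stays constant per spout tuple no matter how large the tree grows, which is what makes the approach cheap. A spout tuple still present in `pending` after the timeout would be the one that gets replayed.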