Data torrent meetup-productioneng

DataTorrent presentation at http://www.meetup.com/SF-Bay-Area-Large-Scale-Production-Engineering/events/137185282/

Published in: Technology, Education

Transcript

  • 1. Platform for Real-Time Production Operations. Prepared for LSPE Meet-up, November 21, 2013
  • 2. DataTorrent in Hadoop Ecosystem
      • Most powerful Hadoop platform for real-time stream computations
      • Massive Real-Time Production Monitoring, Analytics, and Alerting
          – Systems monitoring: Resource Utilization, Log Analysis
          – Predictive Maintenance, DoS Attack, Launch Validation, etc.
  • 3. DataTorrent Technology Stack
      • Malhar: Open Source Operators and Apps Library (Apache v2 License)
      • Platform features (diagram labels): SLA Alerts, Tools, Web Services, State Snapshot, Security, Scalability, Fault Tolerance, Partitioning, Dynamic Modifications
      • StrAM (Stream Application Master)
  • 4. DataTorrent’s Platform Differentiators
      • Extreme Scalability
          – Automatically scale to changing loads
          – Sub-second latency with linear scalability
          – Complex monitoring applications with massive computations
      • Mission Critical
          – Built-in stateful fault tolerance. 24/7 uptime guaranteed
          – Predictive analysis and troubleshooting
          – Update your application while it's running!
      • Hadoop-Native
          – Runs on your existing Apache Hadoop cluster.
          – Develop faster with our open-source framework.
          – Integrate seamlessly with your existing monitoring stack.
  • 5. Stream Processing
      [Diagram labels: Data Load; Stream 1, Stream 2, Stream 3, Stream 4; Window 1, Window 2, Window 3]
      • A Stream is a sequence of data events with a schema
      • An Operator takes input streams and computes output streams
      • An Application is a Directed Acyclic Graph (DAG)
      • In-memory asynchronous distributed computations
      • A Streaming Window is an atomic batch of sequential data events
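
The definitions on slide 5 can be illustrated with a small single-process sketch. This is not DataTorrent or Malhar code: the operator names (`parse`, `evens`, `total`), the fixed-size windowing, and the `run_window` helper are all hypothetical, and the sketch runs synchronously, whereas the slide describes asynchronous distributed execution.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def windows(events, size):
    """Split a list of events into consecutive fixed-size streaming windows."""
    for start in range(0, len(events), size):
        yield events[start:start + size]


# Hypothetical operators. Each receives a dict mapping an upstream node name
# to that node's output for the current window, and returns its own output.
def parse(inputs):
    return [int(x) for x in inputs["source"]]


def evens(inputs):
    return [x for x in inputs["parse"] if x % 2 == 0]


def total(inputs):
    return [sum(inputs["parse"])]


# The application is a DAG: each node maps to the set of nodes it reads from.
GRAPH = {"source": set(), "parse": {"source"}, "evens": {"parse"}, "total": {"parse"}}
OPERATORS = {"parse": parse, "evens": evens, "total": total}


def run_window(graph, operators, window):
    """Push one window through the DAG, visiting upstream operators first."""
    outputs = {"source": window}
    for name in TopologicalSorter(graph).static_order():
        if name == "source":
            continue
        inputs = {upstream: outputs[upstream] for upstream in graph[name]}
        outputs[name] = operators[name](inputs)
    return outputs


if __name__ == "__main__":
    for batch in windows(["1", "2", "3", "4", "5"], size=2):
        print(run_window(GRAPH, OPERATORS, batch))
```

Each window is processed as a unit, which is what makes it a natural boundary for bookkeeping such as aggregation or state snapshots.
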
  • 6. DataTorrent Hadoop GRID
      [Architecture diagram labels: DT Console, dtCLI, DT Gateway, Resource Manager, NM, StrAM, MapReduce, and operators numbered 1 to 6]
  • 7. Live Demonstration
  • 8. Open Sourced Production Operations Application: Real-Time Dashboards and Actions
      • DoS Attack
      • Predictive maintenance of servers
      • Pre- and post-launch analysis
      • 404 Response
      • Root cause analysis for LAMP architecture
      • Segmentation
          – Geo Location
          – Gender, Age
          – Resource usage (URLs)
          – Etc.
      • URL Analysis
          – Response times
          – Patterns
      • Seamless integration into monitoring stacks
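
As a rough illustration of two of the checks listed on slide 8 (404 responses and possible DoS traffic), the sketch below evaluates one streaming window of web-server events. The event layout `(ip, url, status)`, the function name `window_alerts`, and both thresholds are assumptions made for this example; the slides do not describe how the actual application computes its alerts.

```python
from collections import Counter


def window_alerts(window, max_404_ratio=0.2, max_requests_per_ip=3):
    """Return alerts for one window of (ip, url, status) events.

    Thresholds are illustrative defaults, not values from the presentation.
    """
    alerts = []
    if not window:
        return alerts

    # Share of requests in this window that returned HTTP 404.
    not_found = sum(1 for _, _, status in window if status == 404)
    ratio = not_found / len(window)
    if ratio > max_404_ratio:
        alerts.append(("404_RATE", round(ratio, 2)))

    # Clients sending unusually many requests within a single window.
    per_ip = Counter(ip for ip, _, _ in window)
    for ip, count in sorted(per_ip.items()):
        if count > max_requests_per_ip:
            alerts.append(("POSSIBLE_DOS", ip, count))
    return alerts


if __name__ == "__main__":
    sample = [("10.0.0.1", "/a", 200)] * 4 + [("10.0.0.2", "/missing", 404)]
    print(window_alerts(sample))
```

Because the function sees only one window at a time, it keeps no state between windows; a production check would more likely compare against a rolling baseline.
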
  • 9. How to Get Started?
      • DataTorrent
          – Try Sandbox (https://datatorrent.com)
          – Free for small to medium enterprises: contact us for details
      • Malhar Open Source (Apache 2.0) project
          – https://github.com/DataTorrent/Malhar
          – malhar-users@googlegroups.com
      • Applications available Jan 2014
          – LogStream: Site Operations
          – Map-Reduce Monitor
      DataTorrent Inc., 3200 Patrick Henry, 2nd Fl, Santa Clara, CA 95054
      info@datatorrent.com, www.datatorrent.com, Twitter.com/DataTorrent, Facebook.com/DataTorrent
  • 10. Platform Capabilities
      • Scalable High Performance
          – Throughput in billions of events/sec
          – Latency in milliseconds
      • Powerful Tools
          – GUI for cluster performance monitoring
          – GUI and debuggers for event data
          – Test framework, certification, versioning
          – CLI, macros
      • Fault-Tolerance
          – No state loss, no message loss, node outage recovery
          – State management
          – Efficient state checkpointing
      • Easy To Use
          – Library of operator templates
          – Focus on business logic
          – Connectors to current tools: HDFS, HBase, MySQL, ActiveMQ
          – APIs for tool integrations
      • Adaptability
          – Runtime scaling and resource optimization
          – Dynamic application modification
      • Native YARN Application
          – Integrates with Hadoop 2.0 distributions: Apache, Cloudera, Hortonworks, MapR, Pivotal
          – Co-exists with existing batch infrastructure
          – Multi-tenancy with existing Hadoop applications
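
The fault-tolerance items on slide 10 (no state loss, state checkpointing) can be pictured with a generic checkpoint-and-replay sketch. It is only an assumption-laden illustration of the general technique, not a description of DataTorrent's mechanism: `CountingOperator`, `run`, and the checkpoint interval are hypothetical, snapshots are kept in memory rather than on durable storage, and the sketch assumes the windows after the last checkpoint can be replayed from upstream.

```python
import copy


class CountingOperator:
    """Hypothetical stateful operator that counts the events it has seen."""

    def __init__(self):
        self.state = {"count": 0}
        self.checkpoints = {}  # window id -> snapshot of state after that window

    def process_window(self, window_id, events):
        self.state["count"] += len(events)

    def checkpoint(self, window_id):
        # Snapshot at a window boundary so the state is always consistent.
        self.checkpoints[window_id] = copy.deepcopy(self.state)

    def recover(self):
        """Restore the latest snapshot; return the window id it covers, or -1."""
        if not self.checkpoints:
            self.state = {"count": 0}
            return -1
        last = max(self.checkpoints)
        self.state = copy.deepcopy(self.checkpoints[last])
        return last


def run(op, windows, checkpoint_every=2, start=0):
    """Process windows[start:], checkpointing every `checkpoint_every` windows."""
    for window_id in range(start, len(windows)):
        op.process_window(window_id, windows[window_id])
        if (window_id + 1) % checkpoint_every == 0:
            op.checkpoint(window_id)


if __name__ == "__main__":
    stream = [[1, 2], [3], [4, 5, 6], [7]]
    op = CountingOperator()
    run(op, stream[:3])        # windows 0 to 2 processed, checkpoint after window 1
    op.state = {}              # simulate losing in-memory state
    last = op.recover()        # back to the snapshot taken after window 1
    run(op, stream, start=last + 1)  # replay windows 2 and 3
    print(op.state)
```

Replaying from the window after the last checkpoint is what avoids both lost and double-counted events in this sketch.
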
  • 11. Appendix