Data torrent meetup-productioneng

DataTorrent presentation at http://www.meetup.com/SF-Bay-Area-Large-Scale-Production-Engineering/events/137185282/

    Presentation Transcript

    • Platform for Real-Time Production Operations. Prepared for LSPE Meet-up, November 21, 2013
    • DataTorrent in Hadoop Ecosystem
      • Most powerful Hadoop platform for real-time stream computations
      • Massive Real-Time Production Monitoring, Analytics, and Alerting
        – Systems monitoring: Resource Utilization, Log Analysis
        – Predictive Maintenance, DoS Attack, Launch Validation, etc.
    • DataTorrent Technology Stack
      • Malhar – Open Source Operators and Apps Library (Apache v2 License)
      • Platform features (diagram labels): SLA, Alerts, Tools, Web Services, State Snapshot, Security, Scalability, Fault Tolerance, Partitioning, Dynamic Modifications
      • StrAM (Stream Application Master)
    • DataTorrent’s Platform Differentiators
      • Extreme Scalability
        – Automatically scale to changing loads
        – Sub-second latency with linear scalability
        – Complex monitoring applications with massive computations
      • Mission Critical
        – Built-in stateful fault tolerance; 24/7 uptime guaranteed
        – Predictive analysis and troubleshooting
        – Update your application while it's running!
      • Hadoop-Native
        – Runs on your existing Apache Hadoop cluster
        – Develop faster with our open-source framework
        – Integrate seamlessly with your existing monitoring stack
    • Stream Processing
      [Diagram: Data Load feeding Streams 1–4, divided into Windows 1–3]
      • A Stream is a sequence of data events with a schema
      • An Operator takes input streams and computes output streams
      • An Application is a Directed Acyclic Graph (DAG)
      • In-memory asynchronous distributed computations
      • A Streaming Window is an atomic batch of sequential data events
    • DataTorrent Hadoop Grid
      [Diagram: DT Console, dtCLI, DT Gateway, Resource Manager, Node Managers (NM), StrAM, numbered nodes 1–6, and MapReduce tasks]
    • Live Demonstration
    • Open Sourced Production Operations Application: Real-Time Dashboards and Actions
      • DoS Attack
      • Predictive maintenance of servers
      • Pre- and post-launch analysis
      • 404 Response
      • Root cause analysis for LAMP architecture
      • Segmentation
        – Geo Location
        – Gender, Age
        – Resource usage (URLs)
        – Etc.
      • URL Analysis
        – Response times
        – Patterns
      • Seamless integration into monitoring stacks
    • How to get Started?
      • DataTorrent
        – Try Sandbox (https://datatorrent.com)
        – Free for small to medium enterprises: Contact us for details
      • Malhar Open Source (Apache 2.0) project
        – https://github.com/DataTorrent/Malhar
        – malhar-users@googlegroups.com
      • Applications available Jan 2014
        – LogStream: Site Operations
        – Map-Reduce Monitor
      DataTorrent Inc., 3200 Patrick Henry, 2nd Fl, Santa Clara, CA 95054
      info@datatorrent.com, www.datatorrent.com, Twitter.com/DataTorrent, Facebook.com/DataTorrent
    • Platform Capabilities
      • Scalable High Performance
        – Throughput in billions of events/sec
        – Latency in milliseconds
      • Powerful Tools
        – GUI for cluster performance monitoring
        – GUI and debuggers for event data
        – Test framework, certification, versioning
        – CLI, macros
      • Fault-Tolerance
        – Node outage recovery with no state loss and no message loss
        – State management
        – Efficient state checkpointing
      • Easy To Use
        – Library of operator templates
        – Focus on business logic
        – Connectors to current tools: HDFS, HBase, MySQL, ActiveMQ
        – APIs for tool integrations
      • Adaptability
        – Runtime scaling and resource optimization
        – Dynamic application modification
      • Native YARN Application
        – Integrates with Hadoop 2.0 distributions: Apache, Cloudera, Hortonworks, MapR, Pivotal
        – Co-exists with existing batch infrastructure
        – Multi-tenancy with existing Hadoop applications
    • Appendix
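
The terms on the Stream Processing slide (stream, operator, DAG, streaming window) can be illustrated with a small, single-process sketch. This is not the DataTorrent or Malhar API: the names `streaming_windows`, `only_404`, `count_by_url`, and `run_chain` are hypothetical, the windows here are count-based purely for simplicity, and the "DAG" is the simplest possible one, a linear chain of two operators. The sample log events are made up for illustration.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Iterator, List

# Assumed event schema for this sketch: {"url": str, "status": int}
Event = Dict[str, object]
Window = List[Event]
Operator = Callable[[Window], Window]


def streaming_windows(events: Iterable[Event], size: int) -> Iterator[Window]:
    """Cut a stream into atomic batches of `size` sequential events."""
    window: Window = []
    for event in events:
        window.append(event)
        if len(window) == size:
            yield window
            window = []
    if window:  # emit the final, possibly shorter, window
        yield window


def only_404(window: Window) -> Window:
    """Operator: keep only the events whose HTTP status is 404."""
    return [e for e in window if e["status"] == 404]


def count_by_url(window: Window) -> Window:
    """Operator: emit one summary event mapping each URL to its count."""
    return [dict(Counter(e["url"] for e in window))]


def run_chain(events: Iterable[Event], operators: List[Operator],
              window_size: int) -> List[Window]:
    """Push each window through the operators in order (a linear DAG)."""
    results: List[Window] = []
    for window in streaming_windows(events, window_size):
        for op in operators:
            window = op(window)
        results.append(window)
    return results


if __name__ == "__main__":
    # Made-up access-log events, for illustration only.
    log = [
        {"url": "/a", "status": 404},
        {"url": "/b", "status": 200},
        {"url": "/a", "status": 404},
        {"url": "/c", "status": 404},
        {"url": "/b", "status": 200},
    ]
    for i, out in enumerate(run_chain(log, [only_404, count_by_url], 2), 1):
        print(f"window {i}: {out}")
```

Each window is processed as a unit, so a per-window result (here, 404 counts by URL) is available as soon as that window closes rather than after the whole stream has been read. A distributed platform would run the operators in separate containers and connect them with streams; the sketch only shows the data flow.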