Infochimps Hadoop Summit 2013


Published in: Technology, Business
  • The only part you have to worry about is in the yellow circle. This is the same deploy pack that runs on your local machine for development.
  • Key Messages: We help you leverage the people and resources you already have. Infochimps Cloud eliminates the implementation headaches caused by Big Data, so your Big Data applications can be completed quickly and fully achieve their objectives. Working with Big Data shouldn’t require you to hire rocket scientists or send your team to 12 weeks of Hadoop boot camp. Infochimps and our partners empower your existing teams to implement any data-driven application your Big Data vision requires.
    How it Works: Your largest, fastest data sources are streamed into the Infochimps cloud, where real-time transformation, aggregation, decoration, and matching can be done. Data is then saved to a database for querying (typically Elasticsearch, HBase, or MySQL). Simultaneously, data is saved to Hadoop for things like historical processing.
    Technical Points: Infochimps is a managed cloud service provider and handles everything except your application. All your application has to worry about is pointing to the database for querying data. The ETA for records is far less than 5 seconds, and we have customers who have SLAs of under 1 second.
  • Key Messages: Data and real-time analytics: easily handle millions of events per second with in-stream ETL and analytics. It’s not enough anymore to simply perform historical analysis and batch reports. In situations where you need to make well-informed decisions in real time, the data and insights must also be timely and immediately actionable. Cloud::Streams lets you process data as it flows into your application, powering real-time dashboards and on-the-fly analytics and delivering data seamlessly to Hadoop clusters and NoSQL databases. Single-purpose ETL solutions are rapidly being replaced with multi-node, multi-purpose data integration platforms, the universal glue that connects systems together and makes Big Data analytics feasible. Cloud::Streams is a linearly scalable, fault-tolerant distributed routing framework for data integration, collection, and streaming data processing. Ready-to-go integration connectors allow you to tap into virtually any internal or external data source that your application needs.
    Benefits:
      • Easily integrate with virtually any data source, both live/in-motion and bulk/at rest.
      • Process data as it flows, at scale, not only generating real-time insights but also delivering data to databases and Hadoop clusters that has already been cleaned, transformed, and augmented/enhanced.
      • Solve any business use case with the ability to handle business logic of any complexity and parallel stream computing.
      • Write your analytics once when leveraging Wukong, then run in both real time with Cloud::Streams and in batch with Cloud::Hadoop.
  • Key Messages: Ad hoc and interactive analytics: power your Big Data applications with data you can query. Cloud::Queries, a cloud service delivered by Infochimps Cloud, enables advanced distributed text search, any-format document storage, and database tables with more than 1B rows, structured and unstructured. Databases and data storage are provided as a cloud service, including worry-free database maintenance, updates, and support. Depending on your application requirements, multiple storage technologies may be appropriate, including NoSQL and NewSQL databases such as HBase, Cassandra, Elasticsearch, MongoDB, or even MySQL. Whatever your needs, with Cloud::Queries you’ll have the most powerful cloud database for the job, scaling to the needs of your business and providing APIs that will support your most demanding ad hoc and interactive queries and applications.
    Benefits:
      • Eliminate the frustrations of large-scale database administration and data management.
      • Tight integration with Big Data processing workflows and delivery paths results in a truly comprehensive Big Data stack.
      • Linearly scalable, distributed systems support the most demanding applications and analytics queries.
  • Key Messages: Elastic Hadoop and large-scale batch analytics: the easiest way to configure and manage Hadoop clusters in the cloud. Your team recognizes the power that massively parallel data analysis can provide, and Hadoop is the standard to handle massively scalable data. Cloud::Hadoop, a cloud service delivered by Infochimps™ Cloud, is the ideal Hadoop solution. Turn clusters on at a moment’s notice with advanced elastic spin-up/spin-down capabilities, scale and customize on the fly, and leverage tools such as Pig, Hive, and Wukong that make Hadoop easier to use and much more useful for enterprises.
    Benefits:
      • Focus on building applications and answering business questions, not on keeping an extremely complex Hadoop cluster happy and performant.
      • Scale up to meet any data processing demand through superior elasticity.
      • Be more efficient with resources, while still having quick access to HDFS data, with instantly elastic and high-performing clusters.
      • Write your analytics once when leveraging Wukong, then run both in batch with Cloud::Hadoop and in real-time streaming with Cloud::Streams.
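
The notes above describe records being transformed in-stream, then saved to a query database and simultaneously archived for batch work, with analytics written once and run in both modes. A minimal Python sketch of that idea follows. It is not Infochimps or Wukong code: `enrich`, `run_stream`, `run_batch`, the record fields, and the in-memory list sinks are all hypothetical stand-ins chosen for illustration.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional

Record = Dict[str, Any]
Processor = Callable[[Record], Optional[Record]]
Sink = Callable[[Record], None]


def enrich(record: Record) -> Optional[Record]:
    """Hypothetical per-record logic: drop records with no user, add a flag."""
    if not record.get("user"):
        return None
    out = dict(record)  # copy so the input record is left untouched
    out["is_large"] = float(record.get("amount", 0)) >= 100
    return out


def run_stream(source: Iterable[Record], process: Processor,
               sinks: List[Sink]) -> int:
    """Apply `process` to each record as it arrives and fan the result out
    to every sink (e.g. a query store and an archive). Returns the count
    of records delivered."""
    delivered = 0
    for record in source:
        result = process(record)
        if result is None:
            continue
        for sink in sinks:
            sink(result)
        delivered += 1
    return delivered


def run_batch(records: Iterable[Record], process: Processor) -> List[Record]:
    """Apply the same `process` function to a stored collection of records."""
    return [r for r in map(process, records) if r is not None]


# Example run with in-memory stand-ins for the two destinations.
events = [
    {"user": "a", "amount": 250},
    {"user": "", "amount": 5},
    {"user": "b", "amount": 20},
]
query_store: List[Record] = []
archive: List[Record] = []
count = run_stream(events, enrich, [query_store.append, archive.append])
batch_result = run_batch(events, enrich)
```

Because both runners take the same `process` function, the per-record logic is written once; only the way records are supplied and delivered differs between the streaming and batch paths.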

    1. Enterprise Big Data: Turning Data Into Revenue
    2. (8/17/2013) Which Do You Prefer? 24 Months, Over Budget, Failed Big Data Project; or 30 Days, 10% of Budget, Creating Huge Value
    3. Applications. Analytics: Real-Time, Ad hoc, Batch. Cloud Infrastructure: Public, Virtual Private, Private
    4. Batch Analytics, Ad hoc Analytics, Real-time Analytics
    5. Infochimps Big Data Platform. Diagram labels: HBase, Elasticsearch, Hadoop, Command Center, Platform API, Zabbix, Zookeepers, Chef, MySQL, NFS, Backup, Scheduler, Listener, Queue, Storm, HTTP(S), Syslog, Archive Storage. You only worry about a tiny part of the overall platform.
    6. Infochimps Confidential
    7. Hybrid Big Data Cloud: Public, Virtual Private, Private
    8. #1 Big Data Cloud
    9. Variety, Velocity, & Volume. Input Data: LOG, TXT, CSV, XML, HTTP, JSON. Diagram labels: Cloud::Streams, Cloud::Queries, Cloud::Hadoop, Command Center, Your Application. A complete managed service for custom analytics in the public, private, or hybrid cloud.
    10. Cloud::Streams. Inputs: LOG, TXT, CSV, XML, HTTP, JSON. Diagram labels: Universal Listeners, Data Queueing, JSON, Archiving, Downstream Data Loading (Cloud::Hadoop, Tuples), Direct Data Loading (Cloud::Queries, Tuples), Applications, Your Application. Streaming analytics happen in real time.
    11. Cloud::Queries (HBase or Elasticsearch). Diagram labels: Cloud::Streams, Tuple, Cloud::Hadoop, Archiving, Your Application. Ad hoc and interactive analytics on aggregates.
    12. Cloud::Hadoop. Diagram labels: Archiving, HDFS, Data Science Cluster, File, Cloud::Queries, Cloud::Streams, Tuple, Applications, Your Application. Run batch analytics against all of your historical data.
    13. Infochimps Cloud Pillars
        Fast • Completely Integrated & Unified Architecture • Deployed in hours • Expanded in minutes
        Simple • We focus on Infrastructure Managed Services • Customers focus on data & applications
        Flexible • Cloud Agnostic • Modular • Portable • Open Standards Based
        Scalable • Elastic Cloud Infrastructure • Linearly Scalable Across All Big Data Functions • Enterprise Class
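
Slide 10 shows listeners accepting several input formats (LOG, TXT, CSV, XML, HTTP, JSON) ahead of a JSON stage. One way to picture that step is a small normalizer that turns each incoming line into a JSON record. The Python sketch below is an assumption about what such a step could look like, not the platform's listener API; `normalize`, `from_csv_line`, `from_json_line`, and the field names are hypothetical, and only CSV and JSON are handled.

```python
import csv
import io
import json
from typing import Any, Dict, List, Optional


def from_json_line(line: str) -> Dict[str, Any]:
    """Parse one line of JSON text into a record."""
    return json.loads(line)


def from_csv_line(line: str, fields: List[str]) -> Dict[str, Any]:
    """Parse one CSV line, pairing values with the supplied field names."""
    row = next(csv.reader(io.StringIO(line)))
    return dict(zip(fields, row))


def normalize(line: str, fmt: str,
              fields: Optional[List[str]] = None) -> str:
    """Convert one input line in the given format to a JSON string."""
    if fmt == "json":
        record = from_json_line(line)
    elif fmt == "csv":
        if fields is None:
            raise ValueError("CSV input needs a list of field names")
        record = from_csv_line(line, fields)
    else:
        raise ValueError(f"unsupported format: {fmt}")
    # sort_keys gives a stable key order regardless of input format
    return json.dumps(record, sort_keys=True)


# Example: two differently formatted inputs end up as JSON records.
as_json = normalize('{"b": 1, "a": "x"}', "json")
as_csv = normalize('x,"hello, world"', "csv", fields=["a", "msg"])
```

Once every source is reduced to the same record shape, downstream steps such as queueing, archiving, and loading only need to understand one format.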