Enterprise Big Data
Turning Data Into Revenue
8/17/2013 2
Which Do You Prefer?
24 Months vs. 30 Days
Over Budget vs. 10% of Budget
Failed Big Data Project vs. Creating Huge Value
Real-Time
Ad-hoc
Batch
Applications
Cloud Infrastructure
Analytics
Public, Virtual Private, Private
Batch Analytics
Ad hoc Analytics
Real-time Analytics
Infochimps Big Data Platform
HBase
Elasticsearch
Hadoop
Command Center
Platform API
Zabbix
Zookeepers, Chef, MySQL, NFS
Backup
Scheduler
Listener Queue
Storm
HTTP(S)
Syslog
Archive Storage
You only worry about a tiny
part of the overall platform.
8/17/2013 Infochimps Confidential 6
Hybrid Big Data Cloud
Public, Virtual Private, Private
#1 Big Data Cloud
Variety, Velocity, & Volume
LOG
TXT
CSV
XML
HTTP
JSON
Input Data
Cloud::Streams
Your Application
Command Center
A complete managed service for
custom analytics in the
public, private, or hybrid cloud.
Cloud::Queries
Cloud::Hadoop
Cloud::Streams
LOG
TXT
CSV
XML
HTTP
JSON
Universal Listeners
Data Queueing
JSON
Archiving
Downstream Data Loading
Cloud::Hadoop
Tuples
Direct Data Loading
Cloud::Queries
Tuples
Cloud::Queries
Tuples
Streaming Analytics
happen in real time
Applications
Your Application
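The flow on this slide (listeners accept several input formats, records are queued as JSON, then delivered both to archiving and to a queryable store) can be sketched roughly as follows. This is only an illustration in plain Python, not Infochimps code: `parse_event`, `run_pipeline`, the CSV column order, and the in-memory queue, archive, and store are all hypothetical stand-ins.

```python
import csv
import io
import json
from collections import deque


def parse_event(raw, fmt):
    """Normalize one raw input line into a dict record (hypothetical helper)."""
    if fmt == "json":
        return json.loads(raw)
    if fmt == "csv":
        # Assumed column order (user, action), for illustration only.
        user, action = next(csv.reader(io.StringIO(raw)))
        return {"user": user, "action": action}
    raise ValueError(f"unsupported format: {fmt}")


def run_pipeline(inputs):
    """Queue normalized records, then fan each one out to two destinations."""
    queue = deque()      # stands in for the data queue
    archive = []         # stands in for archive storage feeding batch jobs
    query_store = {}     # stands in for a queryable database (events per user)

    for raw, fmt in inputs:
        queue.append(parse_event(raw, fmt))

    while queue:
        record = queue.popleft()
        # Archive path: every record is kept as a JSON line.
        archive.append(json.dumps(record, sort_keys=True))
        # Query path: a running aggregate is updated as records arrive.
        user = record["user"]
        query_store[user] = query_store.get(user, 0) + 1

    return archive, query_store
```

Calling `run_pipeline` with a mix of JSON and CSV lines returns the archived JSON lines plus a per-user event count, which mirrors the two delivery paths in the diagram.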
HBase or Elasticsearch
Cloud::Queries
Cloud::Streams
Tuple
Cloud::Hadoop Archiving
Ad Hoc and Interactive
Analytics on aggregates.
Your Application
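As a rough illustration of an ad hoc query on aggregates, the sketch below loads pre-aggregated tuples into a database and asks a question that was not planned in advance. It uses Python's built-in `sqlite3` module purely as a local stand-in for the hosted database; the `page_views` table, its columns, and `top_pages` are hypothetical names.

```python
import sqlite3


def top_pages(rows, min_views):
    """Return (page, total_views) pairs with at least min_views, busiest first."""
    conn = sqlite3.connect(":memory:")  # in-memory stand-in for a hosted database
    conn.execute("CREATE TABLE page_views (hour TEXT, page TEXT, views INTEGER)")
    # Each row is one aggregate tuple, e.g. as produced by a streaming step.
    conn.executemany("INSERT INTO page_views VALUES (?, ?, ?)", rows)
    cursor = conn.execute(
        "SELECT page, SUM(views) AS total FROM page_views "
        "GROUP BY page HAVING SUM(views) >= ? "
        "ORDER BY total DESC, page",
        (min_views,),
    )
    result = cursor.fetchall()
    conn.close()
    return result
```

The application only issues the query; in the architecture on this slide, loading the tuples would be handled upstream rather than by the application itself.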
Cloud::Hadoop
Archiving
HDFS
HDFS
HDFS
Data Science Cluster
File
File
Cloud::Queries
Cloud::Streams
Tuple
Run batch analytics against
all of your historical data.
Applications
Your Application
Infochimps Cloud Pillars
Fast
• Completely Integrated & Unified Architecture
• Deployed in hours
• Expanded in minutes
Simple
• We focus on Infrastructure Managed Services
• Customers focus on data & applications
Flexible
• Cloud Agnostic
• Modular
• Portable
• Open Standards Based
Scalable
• Elastic Cloud Infrastructure
• Linearly Scalable Across All Big Data Functions
• Enterprise Class

Infochimps Hadoop Summit 2013

Editor's Notes

  • #6 The only part you have to worry about is in the yellow circle. This is the same deploy pack that runs on your local machine for development.
  • #10 Key Messages: We help you leverage the people and resources you already have. Infochimps Cloud eliminates all the implementation headaches caused by Big Data, enabling your Big Data applications to be completed quickly and fully achieve their objectives. Working with Big Data shouldn’t require you to hire rocket scientists or send your team to 12 weeks of Hadoop boot camp. Infochimps and our partners empower your existing teams to implement any data-driven application your Big Data vision requires.
    How it Works: Your largest, fastest data sources are streamed in to the Infochimps cloud, where real-time transformation, aggregation, decoration, and matching can be done. Data is then saved to a database for querying (typically Elasticsearch, HBase, or MySQL). Simultaneously, data is saved to Hadoop for things like historical processing.
    Technical Points: Infochimps is a managed cloud service provider, and handles everything except your application. All your application has to worry about is pointing to the database for querying data. The ETA for records is far less than 5 seconds, and we have customers who have SLAs of under 1 second.
  • #11 Key Messages (from http://www.infochimps.com/infochimps-cloud/cloud-services/cloud-streams/)
    Streaming data and real-time analytics: easily handle millions of events per second with in-stream ETL and analytics.
    It’s not enough anymore to simply perform historical analysis and batch reports. In situations where you need to make well-informed decisions in real time, the data and insights must also be timely and immediately actionable. Cloud::Streams lets you process data as it flows into your application, powering real-time dashboards and on-the-fly analytics and delivering data seamlessly to Hadoop clusters and NoSQL databases.
    Single-purpose ETL solutions are rapidly being replaced with multi-node, multi-purpose data integration platforms, the universal glue that connects systems together and makes Big Data analytics feasible. Cloud::Streams is a linearly scalable, fault-tolerant distributed routing framework for data integration, collection, and streaming data processing. Ready-to-go integration connectors allow you to tap into virtually any internal or external data source that your application needs.
    Benefits:
      • Easily integrate with virtually any data source, both live/in-motion and bulk/at rest
      • Process data as it flows, at scale: not only generating real-time insights, but also delivering data to databases and Hadoop clusters that has already been cleaned, transformed, and augmented/enhanced
      • Solve any business use case with the ability to handle business logic of any complexity and parallel stream computing
      • Write your analytics once when leveraging Wukong, then run in both real time with Cloud::Streams and in batch with Cloud::Hadoop
  • #12 Key Messages
    Ad hoc and interactive analytics: power your Big Data applications with data you can query.
    Cloud::Queries, a cloud service delivered by Infochimps Cloud, enables advanced distributed text search, any-format document storage, and database tables with more than 1B rows, structured and unstructured. Databases and data storage are provided as a cloud service, including worry-free database maintenance, updates, and support. Depending on your application requirements, multiple storage technologies may be appropriate, including NoSQL and NewSQL databases such as HBase, Cassandra, Elasticsearch, MongoDB, or even MySQL. Whatever your needs, with Cloud::Queries you’ll have the most powerful cloud database for the job, scaling to the needs of your business and providing APIs that will support your most demanding ad hoc and interactive queries and applications.
    Benefits:
      • Eliminate frustrations of large-scale database administration and data management
      • Tight integration with Big Data processing workflows and delivery paths results in a truly comprehensive Big Data stack
      • Linearly scalable, distributed systems support the most demanding applications and analytics queries
  • #13 Key Messages (from http://www.infochimps.com/infochimps-cloud/cloud-services/cloud-hadoop/)
    Elastic Hadoop and large-scale batch analytics: the easiest way to configure and manage Hadoop clusters in the cloud.
    Your team recognizes the power that massively parallel data analysis can provide, and Hadoop is the standard to handle massively scalable data. Cloud::Hadoop, a cloud service delivered by Infochimps™ Cloud, is the ideal Hadoop solution. Turn clusters on at a moment’s notice with advanced elastic spin-up/spin-down capabilities, scale and customize on the fly, and leverage tools such as Pig, Hive, and Wukong that make Hadoop easier to use and much more useful for enterprises.
    Benefits:
      • Focus on building applications and answering business questions, not on keeping an extremely complex Hadoop cluster happy and performant
      • Scale up to meet any data processing demand through superior elasticity
      • Be more efficient with resources, while still having quick access to HDFS data, with instantly elastic and high-performing clusters
      • Write your analytics once when leveraging Wukong, then run both in batch with Cloud::Hadoop and in real-time streaming with Cloud::Streams
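
The "write your analytics once" point in notes #11 and #13 can be illustrated with a minimal sketch. This is plain Python, not Wukong (a Ruby framework) and not its API; `extract_status`, `run_batch`, `run_streaming`, and the record shape are hypothetical. The idea shown is only that one per-record function can be reused by a batch driver and a streaming driver.

```python
from collections import Counter


def extract_status(record):
    """Analytic step written once: map a log record to its status class."""
    return f"{record['status'] // 100}xx"


def run_batch(records):
    """Batch mode: process the full historical data set and return final counts."""
    return Counter(extract_status(r) for r in records)


def run_streaming(records):
    """Streaming mode: consume records one at a time, yielding running counts."""
    counts = Counter()
    for record in records:
        counts[extract_status(record)] += 1
        yield dict(counts)
```

Both drivers call the same `extract_status`, so a change to the analytic logic is made in one place and applies to historical and live data alike.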