Infochimps: Cloud for Big Data


In this slidecast, Jim Kaskade from Infochimps presents: Cloud for Big Data.

"Infochimps was founded by data scientists and cloud computing experts. Our solutions make it faster, easier and far less complex to build and manage Big Data systems behind applications to quickly deliver actionable insights. With Infochimps Cloud, enterprises benefit from the fastest way to deploy Big Data applications in complex, hybrid cloud environments."

Learn more at:

View the presentation video:

Published in: Technology, Business

Infochimps: Cloud for Big Data

  1. 1. Jim Kaskade, CEO, June 7, 2013
  2. 2. Agenda
     • History of Infochimps
     • Infochimps Cloud Services & Architecture
     • Use Cases
     • Cloud Deployment Models
     • How to build Analytic Applications
     6/7/13 Infochimps Confidential
  3. 3. Big Data Is Not the Created Content, nor Is It Even Its Consumption. It Is the Analysis of All the Data Surrounding or Swirling Around It.
     More Devices, More Content, More Applications, More Access
  4. 4. Building Data Driven Applications
     Non-Traditional Data Sources + Enterprise Data Sources + Your Application = Data-Driven Applications
  5. 5. What is the Infochimps Cloud?
     The Infochimps Cloud for Big Data is the fastest, easiest solution for performing data analytics at unlimited scale for enterprise companies.
     • Run in real-time with Cloud::Streams
     • Run in batch with Cloud::Hadoop
     • Interact with Cloud::Queries
  6. 6. In Big Data since 2005
     Timeline, 2004 to 2013, with two tracks: Infochimps Company Milestones and Big Data Industry Milestones.
     Timeline labels: Univ TX Research; Distributed Computing & Data Analytics; Google Releases Paper on MapReduce; Yahoo! Creates Hadoop (Nutch); Apache Hadoop 0.10; Infochimps Data Marketplace is Born; Launched (Hadoop-based); Very Large Network Research; Complete Big Data Stack, Real-Time to Batch; Infochimps Inc.; Data Platform (Hadoop + NoSQL); Infochimps Big Data Cloud V1.0 (Public); OpenStack Support; Storm Support; Infochimps Enterprise Cloud for Big Data V2.0 (Virt Private & Private); vSphere Support; Unified Data Processing Framework (Data PaaS); NoSQL Generation (e.g. CouchDB); Hadoop 1.0 Apache Release; Cloudera Founded; Cassandra; Storm Born (Backtype); MapR Founded; Amazon Web Services (IaaS); AWS EMR; Gartner Defines Big Data 3 'V's; Hortonworks Founded; Digital Universe Study: Data 2x / 2 yrs; VMware Serengeti (uses Ironfan); Ironfan Released; Impala 1.0 Released
  7. 7. Infochimps Cloud Services
  8. 8. Cloud::Streams, Cloud::Hadoop, Cloud::Queries, Command Center, Cloud APIs & Developer Tools
     Everything That You Need for Big Data Analytics.
  9. 9. Variety, Velocity, & Volume
     Input Data: LOG, TXT, CSV, XML, HTTP, JSON
     Diagram labels: Cloud::Streams; Cloud::Queries; Cloud::Hadoop; Command Center; Your Application
     A complete managed service for custom analytics in the public, private, or hybrid cloud.
  10. 10. Cloud::Streams
      Input data: LOG, TXT, CSV, XML, HTTP, JSON
      Diagram labels: Universal Listeners; Data Queueing; JSON; Archiving; In-Stream Processing; Downstream Data Loading; Cloud::Hadoop; Tuples; Direct Data Loading; Cloud::Queries; Tuples; Applications; Your Application
      Streaming Analytics happen in real time
  11. 11. Cloud::Queries
      Diagram labels: HBase or Elasticsearch; Cloud::Streams; Tuple; Cloud::Hadoop; Archiving; Your Application
      Ad Hoc and Analytics on aggregates.
  12. 12. Cloud::Hadoop
      Diagram labels: Archiving; HDFS (x3); Data Science Cluster; File (x2); Cloud::Queries; Cloud::Streams; Tuple; Applications; Your Application
      Run batch analytics against all of your historical data.
  13. 13. 6/7/13 Infochimps Confidential 13
  14. 14. Infochimps Cloud Architecture
  15. 15. Standard Reference Platform
      Diagram labels: HBase; Elasticsearch; Hadoop; Command Center; Platform API; Zabbix; Zookeepers; Chef; MySQL; NFS; Backup; Scheduler; Deploy Pack (Code Repository + Deploy Scripts); Listener; Queue; Storm; HTTP(S); Syslog; Archive; Storage; Storm API/Trident; Wukong; Archive Viewer; Hadoop CL; Pig; Wukong; HBase API; Elasticsearch API; Command Center; Platform API; Push to Storm; Push to Hadoop; Code Editor
      You only worry about a tiny part of the overall platform.
  16. 16. Infochimps Cloud Use Cases
  17. 17. Sentiment Analytics
      Cloud::Hadoop, Cloud::Queries
      Three scalable analytics services in one unified cloud.
  18. 18. Why Cloud?
  19. 19. The Cloud Value Proposition?
      Infochimps has created an analytic infrastructure that completely abstracts how and where data analysis executes.
      Your data scientists focus on analytics instead of infrastructure.
  20. 20. Infrastructure Agnostic
      Hybrid Cloud Support is Key.
  21. 21. Analytic Application Development & Deployment
  22. 22. The New Analytics Life Cycle
      Ad Hoc Analytics, Batch Analytics, Real-Time Analytics
  23. 23. Public Cloud, Virtual Private Cloud, Private Cloud
      Layer labels: IaaS; SaaS; PaaS
      Develop & Test Locally with Wukong DSL & Application Deploy Packs
      Abstract to any cloud with Ironfan Orchestration
      • Real-time with Cloud::Streams
      • Batch with Cloud::Hadoop
      • Interact with Cloud::Queries
  24. 24. Deployment
      Scope Big Data Application
        Steps: Business Discovery, Information Discovery, Big Data Architecture
        ✔ Identify Use-Cases ✔ Rank by Revenue Impact ✔ Data Sources ✔ Real-Time vs. Batch ✔ Stream Processing ✔ Ad-Hoc NoSQL ✔ Batch Analytics
      Deploy Big Data Application
        Steps: End-to-End Data Flow, Complete Build-out, QA/Test PerfTune
        ✔ Customer Repo Established ✔ Reference Design Launched ✔ Configure data pipeline ✔ All Data Sources ✔ Deploy / Iterate Analytics ✔ Final “Deploy Pack” ✔ Load Any Historic Data ✔ SLA Monitoring System ✔ Stage to Production
      Drive New Revenue
        Steps: Manage, Update, Expand & Iterate
        ✔ Assigned Cust Srvc Rep ✔ 24x7x365 ‘Virtual NOC’ ✔ Receive Deploy Pack Updates ✔ Stage to Production ✔ Existing Application Expansion ✔ Next Application Use-Case ✔ Self-Sufficient Customers
  25. 25. Broad Industry Application
      • Utilities: Weather impact analysis on power generation; Transmission monitoring; Smart grid management
      • Retail: 360° View of the Customer; Click-stream analysis; Real-time promotions
      • Law Enforcement: Real-time multimodal surveillance; Situational awareness; Cyber security detection
      • Transportation: Weather and traffic impact on logistics and fuel consumption
      • Financial Services: Fraud detection; Risk management; 360° View of the Customer
      • IT: Transition log analysis for multiple transactional systems; Cybersecurity
      • Health & Life Sciences: Epidemic early warning system; ICU monitoring; Remote healthcare monitoring
      • Telecommunications: CDR processing; Churn prediction; Geomapping / marketing; Network monitoring
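
As a rough illustration of the Cloud::Streams flow on slide 10 (listeners accept mixed input formats, records are processed in-stream, then loaded both into an archive for batch work and into a store for queries), here is a minimal Python sketch. It does not use any Infochimps API: `listen`, `process`, and `run_pipeline` are hypothetical names, and the in-memory list and counter merely stand in for Cloud::Hadoop archiving and a Cloud::Queries aggregate.

```python
import csv
import io
import json
from collections import Counter


def listen(raw, fmt):
    """Hypothetical 'universal listener': turn one raw payload into dict records."""
    if fmt == "json":
        return [json.loads(raw)]
    if fmt == "csv":
        return list(csv.DictReader(io.StringIO(raw)))
    raise ValueError(f"unsupported format: {fmt}")


def process(record):
    """Hypothetical in-stream step: normalize the user field on each record."""
    event = dict(record)
    event["user"] = event["user"].lower()
    return event


def run_pipeline(inputs):
    """Route every processed event to both downstream targets."""
    archive = []        # stand-in for the batch archive (assumption)
    counts = Counter()  # stand-in for a queryable aggregate (assumption)
    for raw, fmt in inputs:
        for record in listen(raw, fmt):
            event = process(record)
            archive.append(json.dumps(event, sort_keys=True))
            counts[event["user"]] += 1
    return archive, counts


inputs = [
    ('{"user": "Ann", "action": "click"}', "json"),
    ("user,action\nBob,view\nann,buy\n", "csv"),
]
archive, counts = run_pipeline(inputs)
```

The point of the sketch is only the shape of the flow: one normalization step in front, one processing step in the middle, and a fan-out to a batch target and an interactive target at the end.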