
ZettaVox: Content Mining and Analysis Across Heterogeneous Compute Clouds



Hadoop Summit 2010 - Application Track
ZettaVox: Content Mining and Analysis Across Heterogeneous Compute Clouds
Mark Davis, Kitenga



  1. ZettaVox: Content Mining and Analysis across Heterogeneous Compute Clouds. Mark Davis, Kitenga, Inc.
  2. Session Agenda: The Company; The Problem; The Solution; Demo
  3. Kitenga (1, 2): (Maori) A view or perception
     • 2004-present
     • CTO: Mark Davis, InXight Software (Business Objects/SAP), Microsoft, Defense R&D
     • CEO: Anil Uberoi, Lucid Imagination, Amdocs, Sun
     • Solutions for Information Overload
     • 2953 Bunker Hill Lane, Santa Clara, CA
     • Notes: (1) Kitenga is also a region in Uganda; (2) also a bed-and-breakfast in Clevedon, Auckland
  4. Support Prediction Logic, Inc.
  5. The Never-Ending Problem
     • Multimedia Data: Video, Imagery, Audio, Sensor Streams, Biometric data, 3D
     • Text: Email, Web pages, Tweets, Posts
     • Enterprise Data: Enterprise data, CDRs, Financial records, Access logs
  6. Solving the Problem is Hard
     • Content mining analysts, Machine learning specialists, Information retrieval specialists, Software Engineers: expensive and hard to find
     • Parallel Supercomputers, Racked clusters, Systems management, Enterprise storage solutions, Gigabit switches, Power management
     • Text analytics, Ontologies, Database reporting tools, ETL tools, Business intelligence, Open source components
  7. Defense Intelligence: Convert raw data into actionable intelligence
     • Situation Reports, Geotagged Imagery
     • Improve Force Effectiveness
     • ZettaVox: Named Entity Extraction, Image tagging, Video analytics, Linkage Analysis, Network Visualization, Search
     • Hadoop, GPUs, HDFS, HBase, Solr
  8. Pharmaceutical R&D: Increase speed of drug discovery
     • Patents, Genetic Sequence Data, Journal Articles
     • Faster Discovery
     • ZettaVox: Biological Named Entity Extraction, Author Name Extraction and Normalization, Linkage Analysis, Timelines, Faceted Search
     • Hadoop, HDFS, HBase, GPUs, Solr
  9. ZettaVox
     • Compose analysis workflows using out-of-the-box components
     • Interact with HDFS/Hadoop through Rich Internet Application
     • Monitor system progress
     • Visualize and analyze results
     • Batch mode via XML and JSON
     • Heterogeneous compute resources
  10. Heterogeneous Compute Clouds
     • 42 U ≈ 84-168 cores
     • 2 PCIe slots
     • 15 multiprocessors, 480 cores
     • $0.13-$0.35/Gflop
     • Amazon AWS, Rackspace Mosso, Private Cloud
  11. Author Analysis Solutions
  12. Interact with HDFS
  13. Monitor Analysis Jobs
  14. Use and Visualize Results
  15. ZettaVox: Internet-scale cloud and cluster-based content mining
     • Current Approach: Slow analytics, Methods don’t scale, Expensive hardware, Expensive software, Capital investment, Expertise investment
     • ZettaVox: Hadoop with GPU support, Scalable SaaS, Out-of-the-box expertise, Rich user experience
  16. Questions? Mark Davis, [email_address]
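
The hardware figures on the "Compute Clouds" slide (42 U, 84-168 cores, 2 PCIe slots, 15 multiprocessors, 480 cores) can be related with a little arithmetic. The following is only a rough sketch: it assumes the 42 U and 84-168 core figures describe one CPU rack, that the 15 multiprocessors and 480 cores describe one GPU card, and that the 2 PCIe slots mean two such cards per server. The slide does not state these pairings, and all function and constant names are illustrative.

```python
# Figures copied from the slide; how they pair up is an assumption.
RACK_UNITS = 42                 # height of one rack in U
RACK_CORES_LOW = 84             # lower bound on CPU cores per rack
RACK_CORES_HIGH = 168           # upper bound on CPU cores per rack
GPU_MULTIPROCESSORS = 15        # multiprocessors on one GPU card (assumed)
GPU_CORES = 480                 # cores on one GPU card (assumed)
PCIE_SLOTS = 2                  # assumed to mean GPU cards per server


def cores_per_rack_unit(cores, units=RACK_UNITS):
    """CPU cores packed into each rack unit."""
    return cores / units


def gpu_cores_per_multiprocessor():
    """GPU cores grouped under each multiprocessor."""
    return GPU_CORES // GPU_MULTIPROCESSORS


def gpu_cores_per_server(slots=PCIE_SLOTS):
    """GPU cores available if every PCIe slot holds one card."""
    return slots * GPU_CORES


if __name__ == "__main__":
    print("CPU cores per U:", cores_per_rack_unit(RACK_CORES_LOW),
          "to", cores_per_rack_unit(RACK_CORES_HIGH))
    print("GPU cores per multiprocessor:", gpu_cores_per_multiprocessor())
    print("GPU cores per server:", gpu_cores_per_server())
```

GPU cores and CPU cores are not directly comparable, so these ratios indicate packing density only, not relative performance.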