Solution Spotlight Presents
Pervasive DataRush™ for CDH: Get Faster Results from Hadoop on Less Hardware
Pervasive Software Overview
Global Software Company
  • Tens of thousands of users across the globe
  • Americas, EMEA, Asia
  • ~240 employees worldwide
Strong Financials
  • $46 million revenue (trailing 12-month)
  • 42 consecutive quarters of profitability
  • $36 million in the bank
  • NASDAQ:PVSW since 1997
Leader in Data Innovation
  • ~25% of top-line revenue re-invested in R&D
  • Software to manage, integrate and analyze data, in the cloud or on-premises, throughout the entire lifecycle
  • Highly Parallel Data-Intensive and Analytic Applications
Enterprises Need to Deal With Increased Data and Complexity
[Chart: axes labeled Data Size and Complexity; scale marks 100GB, 1TB, 10TB, 100TB, 1PB, 10PB, 100PB]
  • HPC: climate modeling, seismic analysis, fluid dynamics
  • Internet scale: web indexing, web search
  • Enterprise data: data quality, data analytics
Eliminating Big Data Bottlenecks
Typical Big Data Bottlenecks
  • Data Preparation: profiling, matching, cleansing, de-duplicating, auditing
  • Analytics: modeling, predicting, searching, discovering, stop sampling
Pervasive DataRush eliminates the performance bottlenecks of Data Preparation and Analytics
[Diagram labels: Raw Data (structured, unstructured or semi-structured); Knowledge; optional: persistent storage]
Pervasive DataRush™ V5.0
… a patented, parallel dataflow platform that eliminates performance bottlenecks in your data-intensive applications
  • Scalable: Performance dynamically scales with increased core counts and increased nodes.
  • High Throughput: Fast, deep analysis of large data sets with no limit on input data size.
  • Cost Efficient: Maximum performance from commodity multicore servers, SMP systems and clusters. Full integration with Apache Hadoop.
  • Easy to Implement: No complex parallel processing issues; visual and API level interfaces.
  • Extensible: Extensible platform so you remain in control of development.
DataRush Use Cases
Bioinformatics
  • Next Generation Sequencing
  • Nearly 1 TCUP throughput using Smith-Waterman
  • Scalable BFAST implementation
Telecom
  • Analyzing Call Data Records (network logs)
  • Operational intelligence
  • Fraud and waste detection
Public Sector
  • State income tax revenue recovery
  • Cyber security
Financial Services
  • Mortgage analysis
Healthcare
  • Claims processing and analysis
  • Fraud detection
Network
  • Analyzing network log data
  • Cyber security
DataRush Makes Hadoop Faster, Easier, More Cost-Effective
Pervasive TurboRush for Hive
  • Transparently increase performance of Hive queries
  • Faster on less hardware with no new development
[Diagram labels: Hive Query; Hive; Pervasive TurboRush for Hive; five DataRush instances; Hadoop Distributed File System]
“Pervasive TurboRush for Hive is exciting news for SQL analysts looking to discover trends, patterns and value in their Big Data on Hadoop. The faster their HIVE SQL queries perform, the faster their insights are achieved. Our work with Pervasive is another important step in Karmasphere’s commitment to deliver products that help developers, analysts and business users unlock the power of Big Data on Hadoop and expand its use beyond IT departments.”
Martin Hall, Co-founder, Chairman and Executive Vice President of Corporate Development, Karmasphere, Inc.
DataRush For Hadoop (1 of 2)
Call DataRush from MapReduce
  • MapReduce runs up to 4x faster
  • Less boilerplate code to write
  • Community Edition coming soon!
[Diagram labels: four Mappers and two Reducers, each paired with a DataRush instance; Hadoop Distributed File System]
Benchmark: Malstone B, 0.5 TB
  • MapReduce: 135 mins
  • DataRush+MapReduce: 33 mins
“We’ve put Pervasive DataRush 5.0 to work inside wukong, our open source tool for executing Ruby tasks on Hadoop clusters, which we use to drive Infochimps’ production jobs on Amazon’s EC2. The result is exactly on-target: improving computational efficiency means we can get the largest jobs done using fewer resources in the cloud, which translates to hard-dollar cost savings.”
Philip Kromer, CTO at Infochimps
DataRush For Hadoop (2 of 2)
DataRush with Hadoop file system (HDFS)
  • Alternative to MapReduce
  • Accesses data using HDFS readers/writers
  • Benchmarked up to 13x faster than MapReduce
[Diagram labels: five DataRush instances; Hadoop Distributed File System]
Benchmark: Malstone B, 0.5 TB
  • Hadoop (MapReduce): 135 mins
  • DataRush+MapReduce: 33 mins
  • DataRush + HDFS: 10 mins
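The "up to 4x" and "up to 13x" figures can be checked against the run times quoted for Malstone B on 0.5 TB. The sketch below is only that arithmetic. It assumes the chart labels pair up as 135 mins for plain MapReduce, 33 mins for DataRush+MapReduce and 10 mins for DataRush + HDFS (a pairing inferred from the speedup claims, not stated explicitly on the slides). The names `RUN_TIMES_MIN`, `speedup` and `SPEEDUPS` are illustrative.

```python
# Run times in minutes for Malstone B on 0.5 TB, as read from the slides.
# ASSUMPTION: the label-to-time pairing is inferred from the "4x" and "13x" claims.
RUN_TIMES_MIN = {
    "MapReduce": 135,
    "DataRush+MapReduce": 33,
    "DataRush+HDFS": 10,
}


def speedup(baseline_min: float, candidate_min: float) -> float:
    """Return how many times faster the candidate ran than the baseline."""
    return baseline_min / candidate_min


# Compare every configuration against plain MapReduce.
baseline = RUN_TIMES_MIN["MapReduce"]
SPEEDUPS = {name: speedup(baseline, t) for name, t in RUN_TIMES_MIN.items()}

for name, factor in SPEEDUPS.items():
    print(f"{name}: {RUN_TIMES_MIN[name]} min, {factor:.1f}x vs. MapReduce")
```

Under that pairing the ratios come out at roughly 4.1x and 13.5x, which matches the rounded claims on the two slides.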
Other Ways We Help Hadoop
  • High-speed loaders for HBase, Avro and other Hadoop-based storage systems
[Diagram labels: DataRush; Hadoop Distributed File System]
“As more and more organizations adopt Hadoop to take on large scale data challenges, organizations risk wasting resources by scaling out with inefficient server workloads. Products like Pervasive DataRush can help increase the efficiency of each node in a Hadoop cluster to maximize throughput and help harness ever-growing data volumes.”
David Menninger, VP and Research Director, Ventana Research
Summary
Scale up and scale out
  • Economical, environmental, and manageable
Scale to big data
  • Handle diverse, complex, massive data sets
Scale development
  • Easy for existing team to implement parallel applications
  • Extensible platform keeps you in control
Simplify how you develop Big Data applications
For More Information or to Download an Evaluation Copy
  • http://www.pervasivedatarush.com
  • [email_address]
  • Follow us on Twitter: @datarush
  • Call +1 866 980 RUSH (7874) or +1 512 231 6000
http://www.cloudera.com/partners
