Pervasive DataRush

Pervasive DataRush™ for CDH
Get Faster Results from Hadoop on Less Hardware

Pervasive Software Overview
Global software company: tens of thousands of users across the globe (Americas, EMEA, Asia); ~240 employees worldwide.
Strong financials: $46 million revenue (trailing 12 months); 42 consecutive quarters of profitability; $36 million in the bank; NASDAQ: PVSW since 1997.
Leader in data innovation: ~25% of top-line revenue reinvested in R&D; software to manage, integrate and analyze data, in the cloud or on-premises, throughout the entire lifecycle.
Highly parallel data-intensive and analytic applications.

Enterprises Need to Deal With Increased Data and Complexity
[Chart: data size (100 GB to 100 PB) vs. complexity. HPC: climate modeling, seismic analysis, fluid dynamics. Internet scale: web indexing, web search. Enterprise data: data quality, data analytics.]

Eliminating Big Data Bottlenecks
Typical big data bottlenecks:
Data preparation: profiling, matching, cleansing, de-duplicating, auditing.
Analytics: modeling, predicting, searching, discovering. Stop sampling.
Pervasive DataRush eliminates the performance bottlenecks of data preparation and analytics.
[Diagram: raw data (structured, unstructured or semi-structured) passes through data preparation and analytics to become knowledge; persistent storage is optional.]

Pervasive DataRush™ V5.0
… a patented, parallel dataflow platform that eliminates performance bottlenecks in your data-intensive applications.
Scalable: performance scales dynamically with increasing core counts and node counts.
High throughput: fast, deep analysis of large data sets with no limit on input data size.
Cost efficient: maximum performance from commodity multicore servers, SMP systems and clusters; full integration with Apache Hadoop.
Easy to implement: no complex parallel-processing issues to manage; visual and API-level interfaces.
Extensible: an extensible platform, so you remain in control of development.

DataRush Use Cases
Bioinformatics: next-generation sequencing; nearly 1 TCUPS (trillion cell updates per second) throughput using Smith-Waterman; scalable BFAST implementation.
Telecom: analyzing call data records (network logs); operational intelligence; fraud and waste detection.
Public sector: state income tax revenue recovery; cyber security.
Financial services: mortgage analysis.
Healthcare: claims processing and analysis; fraud detection.
Network: analyzing network log data; cyber security.

DataRush Makes Hadoop Faster, Easier, More Cost-Effective

Pervasive TurboRush for Hive
Transparently increases the performance of Hive queries: faster on less hardware, with no new development.
[Diagram layers: Hive query; Hive; Pervasive TurboRush for Hive; multiple DataRush instances; Hadoop Distributed File System.]

“Pervasive TurboRush for Hive is exciting news for SQL analysts looking to discover trends, patterns and value in their Big Data on Hadoop. The faster their Hive SQL queries perform, the faster their insights are achieved. Our work with Pervasive is another important step in Karmasphere’s commitment to deliver products that help developers, analysts and business users unlock the power of Big Data on Hadoop and expand its use beyond IT departments.”
Martin Hall, Co-founder, Chairman and Executive Vice President of Corporate Development, Karmasphere, Inc.

DataRush For Hadoop (1 of 2)
Call DataRush from MapReduce: MapReduce runs up to 4x faster, with less boilerplate code to write.
Community Edition coming soon!
[Diagram: mappers and reducers, each calling DataRush, on top of the Hadoop Distributed File System.]
[Chart: Malstone B, 0.5 TB. DataRush + MapReduce: 33 min; MapReduce: 135 min.]

“We’ve put Pervasive DataRush 5.0 to work inside wukong, our open source tool for executing Ruby tasks on Hadoop clusters, which we use to drive Infochimps’ production jobs on Amazon’s EC2. The result is exactly on-target: improving computational efficiency means we can get the largest jobs done using fewer resources in the cloud, which translates to hard-dollar cost savings.”
Philip Kromer, CTO at Infochimps

DataRush For Hadoop (2 of 2)
DataRush with the Hadoop file system (HDFS): an alternative to MapReduce that accesses data using HDFS readers/writers.
Benchmarked up to 13x faster than MapReduce.
[Diagram: DataRush instances running directly on the Hadoop Distributed File System.]
[Chart: Malstone B, 0.5 TB. DataRush + HDFS: 10 min; DataRush + MapReduce: 33 min; Hadoop MapReduce: 135 min.]
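The "up to 4x" and "up to 13x" claims on the two DataRush for Hadoop slides follow directly from the Malstone B (0.5 TB) timings. The minimal Python sketch below simply divides the baseline run time by each alternative's run time, assuming (as the chart labels suggest) that the 135-minute bar is plain Hadoop MapReduce. The dictionary keys and the `speedup` helper are illustrative names only, not part of any DataRush API, and the figures are the ones reported on the slides, not new measurements.

```python
# Malstone B (0.5 TB) run times in minutes, as reported on the slides.
# Assumption: the 135-minute bar is the plain Hadoop MapReduce baseline.
RUN_TIMES_MIN = {
    "Hadoop MapReduce": 135,
    "DataRush + MapReduce": 33,
    "DataRush + HDFS": 10,
}


def speedup(baseline_min: float, candidate_min: float) -> float:
    """Return how many times faster the candidate run is than the baseline."""
    return baseline_min / candidate_min


if __name__ == "__main__":
    base = RUN_TIMES_MIN["Hadoop MapReduce"]
    for name, minutes in RUN_TIMES_MIN.items():
        # Print each configuration's time and its speedup over the baseline.
        print(f"{name}: {minutes} min, {speedup(base, minutes):.1f}x")
```

Run as a script, it reports about 4.1x for DataRush + MapReduce and 13.5x for DataRush + HDFS, consistent with the slides' rounded "up to 4x" and "up to 13x" figures.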

Other Ways We Help Hadoop
High-speed loaders for HBase, Avro and other Hadoop-based storage systems.
[Diagram: DataRush on top of the Hadoop Distributed File System.]

“As more and more organizations adopt Hadoop to take on large scale data challenges, organizations risk wasting resources by scaling out with inefficient server workloads. Products like Pervasive DataRush can help increase the efficiency of each node in a Hadoop cluster to maximize throughput and help harness ever-growing data volumes.”
David Menninger, VP and Research Director, Ventana Research

Summary
Scale up and scale out: economical, environmental and manageable.
Scale to big data: handle diverse, complex, massive data sets.
Scale development: easy for your existing team to implement parallel applications; an extensible platform keeps you in control.
Simplify how you develop big data applications.

For More Information or to Download an Evaluation Copy
http://www.pervasivedatarush.com
[email_address]
Follow us on Twitter: @datarush
Call +1 866 980 RUSH (7874) or +1 512 231 6000

http://www.cloudera.com/partners
