Pervasive DataRush


Published in: Technology

  1. Solution Spotlight Presents
  2. Pervasive DataRush™ for CDH: Get Faster Results from Hadoop on Less Hardware
  3. Pervasive Software Overview
     - Global Software Company
       - Tens of thousands of users across the globe
       - Americas, EMEA, Asia
       - ~240 employees worldwide
     - Strong Financials
       - $46 million revenue (trailing 12 months)
       - 42 consecutive quarters of profitability
       - $36 million in the bank
       - NASDAQ: PVSW since 1997
     - Leader in Data Innovation
       - ~25% of top-line revenue re-invested in R&D
       - Software to manage, integrate and analyze data, in the cloud or on-premises, throughout the entire lifecycle
       - Highly parallel data-intensive and analytic applications
  4. Enterprises Need to Deal With Increased Data and Complexity
     [Chart: axes "Data Size" and "Complexity"; scale marks 100GB, 1TB, 10TB, 100TB, 1PB, 10PB, 100PB]
     - HPC: climate modeling, seismic analysis, fluid dynamics
     - Internet scale: web indexing, web search
     - Enterprise data: data quality, data analytics
  5. Eliminating Big Data Bottlenecks
     Typical Big Data Bottlenecks
     - Data Preparation: profiling, matching, cleansing, de-duplicating, auditing
     - Analytics: modeling, predicting, searching, discovering, stop sampling
     Pervasive DataRush eliminates the performance bottlenecks of Data Preparation and Analytics.
     [Diagram labels: Raw Data (structured, unstructured or semi-structured); Knowledge; OPTIONAL: persistent storage]
  6. Pervasive DataRush™ V5.0
     … a patented, parallel dataflow platform that eliminates performance bottlenecks in your data-intensive applications
     - Scalable: Performance dynamically scales with increased core counts and increased nodes.
     - High Throughput: Fast, deep analysis of large data sets with no limit on input data size.
     - Cost Efficient: Maximum performance from commodity multicore servers, SMP systems and clusters. Full integration with Apache Hadoop.
     - Easy to Implement: No complex parallel processing issues; visual and API level interfaces.
     - Extensible: Extensible platform so you remain in control of development.
  7. DataRush Use Cases
     - Bioinformatics
       - Next Generation Sequencing
       - Nearly 1 TCUP throughput using Smith-Waterman
       - Scalable BFAST implementation
     - Telecom
       - Analyzing Call Data Records (network logs)
       - Operational intelligence
       - Fraud and waste detection
     - Public Sector
       - State income tax revenue recovery
       - Cyber security
     - Financial Services
       - Mortgage analysis
     - Healthcare
       - Claims processing and analysis
       - Fraud detection
     - Network
       - Analyzing network log data
       - Cyber security
  8. DataRush Makes Hadoop Faster, Easier, More Cost-Effective
  9. Pervasive TurboRush for Hive
     - Transparently increase performance of Hive queries
       - Faster on less hardware with no new development
     [Diagram labels: Hive Query; Hive; Pervasive TurboRush for Hive; DataRush (x5); Hadoop Distributed File System]
  10. “Pervasive TurboRush for Hive is exciting news for SQL analysts looking to discover trends, patterns and value in their Big Data on Hadoop. The faster their HIVE SQL queries perform, the faster their insights are achieved. Our work with Pervasive is another important step in Karmasphere’s commitment to deliver products that help developers, analysts and business users unlock the power of Big Data on Hadoop and expand its use beyond IT departments.”
      Martin Hall, Co-founder, Chairman and Executive Vice President of Corporate Development, Karmasphere, Inc.
  11. DataRush For Hadoop (1 of 2)
      - Call DataRush from MapReduce
        - MapReduce runs up to 4x faster
        - Less boilerplate code to write
        - Community Edition coming soon!
      [Diagram labels: Mapper (x4); Reducer (x2); DataRush (x6); Hadoop Distributed File System]
      [Chart: Malstone B, 0.5 TB. DataRush+MapReduce: 33 mins; MapReduce: 135 mins]
  12. “We’ve put Pervasive DataRush 5.0 to work inside wukong, our open source tool for executing Ruby tasks on Hadoop clusters, which we use to drive Infochimps’ production jobs on Amazon’s EC2. The result is exactly on-target: improving computational efficiency means we can get the largest jobs done using fewer resources in the cloud, which translates to hard-dollar cost savings.”
      Philip Kromer, CTO at Infochimps
  13. DataRush For Hadoop (2 of 2)
      - DataRush with Hadoop file system (HDFS)
        - Alternative to MapReduce
        - Accesses data using HDFS readers/writers
        - Benchmarked up to 13x faster than MapReduce
      [Diagram labels: DataRush (x5); Hadoop Distributed File System]
      [Chart: Malstone B, 0.5 TB. DataRush + HDFS: 10 mins; DataRush+MapReduce: 33 mins; Hadoop: 135 mins]
  14. Other Ways We Help Hadoop
      - High-speed loaders for HBase, Avro and other Hadoop-based storage systems
      [Diagram labels: DataRush; Hadoop Distributed File System]
  15. “As more and more organizations adopt Hadoop to take on large scale data challenges, organizations risk wasting resources by scaling out with inefficient server workloads. Products like Pervasive DataRush can help increase the efficiency of each node in a Hadoop cluster to maximize throughput and help harness ever-growing data volumes.”
      David Menninger, VP and Research Director, Ventana Research
  16. Summary
      - Scale up and scale out
        - Economical, environmental, and manageable
      - Scale to big data
        - Handle diverse, complex, massive data sets
      - Scale development
        - Easy for existing team to implement parallel applications
        - Extensible platform keeps you in control
      - Simplify how you develop Big Data applications
  17. For More Information or to Download an Evaluation Copy
      - [email_address]
      - Follow us on Twitter: @datarush
      - Call +1 866 980 RUSH (7874)
      - +1 512 231 6000
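
The "up to 4x" and "up to 13x" figures on slides 11 and 13 appear to follow from the Malstone B timings shown in the charts on those slides. The Python sketch below is only an illustration of that arithmetic: the dictionary and function names are hypothetical, the pairing of each time with its configuration is read off the chart labels, and the "Hadoop" bar on slide 13 is assumed to be the same plain MapReduce baseline as on slide 11.

```python
# Malstone B run times on 0.5 TB, in minutes, as quoted on slides 11 and 13.
# Assumption: the "Hadoop" bar on slide 13 is the plain MapReduce baseline.
RUN_TIMES_MIN = {
    "MapReduce": 135,
    "DataRush+MapReduce": 33,
    "DataRush+HDFS": 10,
}


def speedup(baseline_min: float, candidate_min: float) -> float:
    """Return how many times faster the candidate run is than the baseline."""
    return baseline_min / candidate_min


baseline = RUN_TIMES_MIN["MapReduce"]

# Speedup of each DataRush configuration relative to plain MapReduce.
speedups = {
    name: speedup(baseline, minutes)
    for name, minutes in RUN_TIMES_MIN.items()
    if name != "MapReduce"
}

for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x faster than plain MapReduce")
# prints:
# DataRush+MapReduce: 4.1x faster than plain MapReduce
# DataRush+HDFS: 13.5x faster than plain MapReduce
```

The ratios of roughly 4.1 and 13.5 are consistent with the rounded claims on the slides.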