Overview of RedPoint Data Management for Hortonworks Hadoop

Published in: Technology

  1. 1. Overview of RedPoint Data Management for Hortonworks Hadoop 2014
  2. 2. What is Hadoop/Hadoop 2.0? (RedPoint Global Inc., 28 May 2014, © Confidential)
     Hadoop 1.0
       • All operations based on MapReduce
       • Intrinsic inconsistency of code-based solutions
       • Highly skilled and expensive resources needed
       • Third-party applications constrained by the need to generate code
     Lower-cost scaling; no need for structure; ease of data capture
     Hadoop 2.0
       • Introduction of YARN: “a general-purpose, distributed, application management framework that supersedes the classic Apache Hadoop MapReduce framework for processing data in Hadoop clusters.”
       • Mature applications can now operate directly on Hadoop
       • Reduced skill requirements and increased consistency
  3. 3. Challenges to Hadoop Adoption
     Skills Gap
       • Severe shortage of MapReduce-skilled resources
       • Resources are very expensive and hard to retain
       • Inconsistent skills lead to inconsistent results
       • Underutilizes existing resources
       • Prevents broad leverage of investments across the enterprise
     Maturity & Governance
       • A nascent technology ecosystem around Hadoop
       • Emerging technologies address only narrow slivers of functionality
       • New applications are not enterprise class
       • Legacy applications have built short-term capabilities
     Data Into Information
       • Data is not useful in its raw state; it must be turned into information
       • A benefit of Hadoop is that the same data can be used from many perspectives
       • Analysts must now structure the data based on its intended use
  4. 4. How RedPoint Helps
     First YARN-compliant ETL/data quality toolset on the market. Brings together both Big Data and traditional data to create Big Information!
     RANKED #1 by […] in:
       • Customer or Party Data
       • Processing Speed
       • Match Quality
       • Ease of Use
     The power to make your data the biggest asset your organization has
  5. 5. RedPoint in a Hortonworks environment
     Diagram layers: Applications; Data System; Sources
     Sources: OLTP, ERP, CRM systems; documents, emails; web logs, click streams; social networks; machine generated; sensor data; geolocation data
     Data system: Repositories (RDBMS, EDW, MPP); Governance & Integration; Security; Operations; Data Access; Data Management
     RedPoint: Data Quality; Data Integration
     One application, one graphical user interface for traditional and Big Data
       • ELT
       • ETL
       • Cleanse
       • Match
       • De-dupe
       • Merge/Purge
       • Household
       • Partition
       • Parse
       • Append
       • Standardize
       • Key
       • Automate
       • Monitor
       • Notify
     Pre-built adapters and ODBC drivers.
       • Pure YARN application
       • No MapReduce needed
       • No in-cluster installation
  6. 6. Typical Hadoop architecture without RedPoint
     [Architecture diagram; only its text labels survive]
     Diagram labels: Monitoring and Management Tools; AMBARI; MAPREDUCE; REST; DATA REFINEMENT; HIVE; PIG; HTTP; STREAM; STRUCTURE; HCATALOG (metadata services); Query/Visualization/Reporting/Analytical Tools and Apps; INTERACTIVE; HIVE Server2; LOAD; SQOOP; FLUME; WebHDFS; NFS; LOAD; SQOOP/Hive; WebHDFS; YARN; HDFS; nodes 1 … n
     Source data: sensor logs, clickstream, flat files, unstructured, sentiment, customer, inventory
     Data sources: DBs, JMS queues, files, RDBMS, EDW
  7. 7. Typical Hadoop architecture with RedPoint
     [Architecture diagram; only its text labels survive]
     Diagram labels: Monitoring and Management Tools; AMBARI; MAPREDUCE; REST; DATA REFINEMENT; HIVE; PIG; HTTP; STREAM; STRUCTURE; HCATALOG (metadata services); Query/Visualization/Reporting/Analytical Tools and Apps; INTERACTIVE; HIVE Server2; LOAD; SQOOP; WebHDFS; Flume; NFS; LOAD; SQOOP/Hive; WebHDFS; YARN; HDFS; nodes 1 … n
     Source data: sensor logs, clickstream, flat files, unstructured, sentiment, customer, inventory
     Data sources: DBs, JMS queues, files, RDBMS, EDW
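
Slide 3 notes that the same raw data can be used from many perspectives, with analysts structuring it for each intended use. A minimal Python sketch of that idea might look like the following. The log lines, field names, and helper functions are hypothetical illustrations; none of them come from the deck or from any RedPoint or Hortonworks API.

```python
from collections import Counter

# Hypothetical raw clickstream lines in the form "timestamp user_id url".
RAW_LINES = [
    "2014-05-28T10:00:00 u1 /home",
    "2014-05-28T10:00:05 u2 /pricing",
    "2014-05-28T10:01:00 u1 /pricing",
]


def parse(line):
    """Apply structure at read time: split one raw line into named fields."""
    timestamp, user_id, url = line.split()
    return {"timestamp": timestamp, "user_id": user_id, "url": url}


def page_views(lines):
    """Perspective 1: count how often each page was viewed."""
    return Counter(parse(line)["url"] for line in lines)


def pages_per_user(lines):
    """Perspective 2: list the pages each user visited, in input order."""
    result = {}
    for line in lines:
        record = parse(line)
        result.setdefault(record["user_id"], []).append(record["url"])
    return result
```

Both functions read the same unmodified `RAW_LINES`; only the structure applied at read time differs, which is the point the slide makes about raw data versus information.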

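Slide 5 lists data quality functions such as Standardize, Key, Match, and De-dupe. As a rough sketch of what those steps mean in general, the Python below standardizes records, builds a match key, and keeps one record per key. The records and function names are made up for illustration, and exact-key matching is a deliberate simplification; nothing here represents RedPoint's matching method.

```python
import re

# Hypothetical customer records; the first two describe the same person.
RECORDS = [
    {"name": "John  Smith", "email": "John.Smith@Example.com "},
    {"name": "john smith", "email": "john.smith@example.com"},
    {"name": "Ann Lee", "email": "ann.lee@example.com"},
]


def standardize(record):
    """Standardize: collapse whitespace, trim, and lowercase each field."""
    return {
        "name": re.sub(r"\s+", " ", record["name"]).strip().lower(),
        "email": record["email"].strip().lower(),
    }


def match_key(record):
    """Key: build an exact-match key from the standardized fields."""
    clean = standardize(record)
    return (clean["name"], clean["email"])


def dedupe(records):
    """De-dupe: keep the first record seen for each match key."""
    seen = {}
    for record in records:
        seen.setdefault(match_key(record), record)
    return list(seen.values())
```

Calling `dedupe(RECORDS)` keeps the first "John Smith" record and the "Ann Lee" record. Real matching tools generally go further, for example by tolerating misspellings, which this sketch does not attempt.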