Overview of RedPoint Data Management for Hortonworks Hadoop
- 2. 1 RedPoint Global Inc.28 May 2014© Confidential
What is Hadoop/Hadoop 2.0?
Hadoop 1.0
• All operations based on MapReduce
• Intrinsic inconsistency of code-based solutions
• Highly skilled and expensive resources needed
• Third-party applications constrained by the need to generate code
Lower cost scaling
No need for structure
Ease of data capture
Hadoop 2.0
• Introduction of YARN:
“a general-purpose, distributed, application management framework that supersedes the classic Apache Hadoop MapReduce framework for processing data in Hadoop clusters.”
• Mature applications can now operate directly on Hadoop
• Reduced skill requirements and increased consistency
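To make the "all operations based on MapReduce" point concrete, here is a minimal single-process sketch of the map, shuffle, and reduce phases for a word count. It is an illustration only: the function names are hypothetical, nothing here uses the Hadoop API, and a real Hadoop 1.0 job would need considerably more hand-written code than this.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as a framework would between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the three phases in one process (hypothetical helper, not Hadoop)."""
    return reduce_phase(shuffle_phase(map_phase(lines)))
```

Even a task this small has to be expressed as separate map and reduce steps, which is the kind of code-level work the Hadoop 1.0 bullets describe.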
Challenges to Hadoop Adoption
Skills Gap
• Severe shortage of MapReduce-skilled resources
• Very expensive resources that are hard to retain
• Inconsistent skills lead to inconsistent results
• Underutilizes existing resources
• Prevents broad leverage of investments across the enterprise
Maturity & Governance
• A nascent technology ecosystem around Hadoop
• Emerging technologies only address narrow slivers of functionality
• New applications are not enterprise class
• Legacy applications have built short-term capabilities
Data Into Information
• Data is not useful in its raw state; it must be turned into information
• Benefit of Hadoop is that the same data can be used from many perspectives
• Analysts must now structure the data based on its intended use
How RedPoint Helps
First YARN-compliant ETL/data quality toolset on the market. It brings together both Big Data and traditional data to create Big Information!
RANKED #1 by [ranking source shown as an image in the original slide] in:
• Customer or Party Data
• Processing Speed
• Match Quality
• Ease of Use
The power to make your data the biggest asset your organization has
RedPoint in a Hortonworks environment
[Diagram labels recovered from the slide]
Layers: Applications, Data System, Sources
Sources: OLTP, ERP, CRM Systems; Documents, Emails; Web Logs, Click Streams; Social Networks; Machine Generated; Sensor Data; Geolocation Data
Data System: Repositories (RDBMS, EDW, MPP); Governance & Integration; Security; Operations; Data Access; Data Management
RedPoint: Data Quality, Data Integration
One application, one graphical user interface for traditional and Big Data
Functions: ELT, ETL, Cleanse, Match, De-dupe, Merge/Purge, Household, Partition, Parse, Append, Standardize, Key, Automate, Monitor, Notify
Pre-built adapters and ODBC drivers
Pure YARN application
No MapReduce needed
No in-cluster installation
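As a rough illustration of what functions such as Standardize, Key, Match, and De-dupe mean in practice, the sketch below chains them on a few in-memory records. It is not RedPoint's implementation or API: the function names, the `name` and `zip` fields, and the matching rule are all assumptions made for the example.

```python
import re
from typing import Dict, List


def standardize(record: Dict[str, str]) -> Dict[str, str]:
    """Trim, lowercase, and collapse whitespace in every field."""
    return {k: re.sub(r"\s+", " ", v.strip().lower()) for k, v in record.items()}


def match_key(record: Dict[str, str]) -> str:
    """Build a simple match key from name letters plus postal code (illustrative rule)."""
    name = re.sub(r"[^a-z]", "", record["name"])
    return f"{name}|{record['zip']}"


def dedupe(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Standardize each record and keep the first one seen per match key."""
    seen: Dict[str, Dict[str, str]] = {}
    for raw in records:
        clean = standardize(raw)
        seen.setdefault(match_key(clean), clean)
    return list(seen.values())
```

Production matching usually relies on fuzzy comparison and survivorship rules rather than an exact key, so this should be read as the simplest possible instance of the idea.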
Typical Hadoop architecture without RedPoint
[Architecture diagram; labels recovered from the slide]
Monitoring and Management Tools: AMBARI
Source Data: Sensor Logs, Clickstream, Flat Files, Unstructured, Sentiment, Customer, Inventory; DBs, JMS Queues, Files
Data Sources: RDBMS, EDW
Load: SQOOP, FLUME, WebHDFS, NFS
Load: SQOOP/Hive, WebHDFS
Data Refinement: MAPREDUCE, HIVE, PIG
Structure: HCATALOG (metadata services)
Interactive: HIVE Server2
REST, HTTP, STREAM
Query/Visualization/Reporting/Analytical Tools and Apps
YARN, HDFS (nodes 1 to n)
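WebHDFS, one of the load paths named in the diagram, is a REST interface whose URLs take the form `http://<host>:<port>/webhdfs/v1/<path>?op=<operation>`. The sketch below only builds such a URL and makes no network call. The helper name, host, port, and file path are placeholders chosen for illustration, not values from the slides.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host: str, port: int, hdfs_path: str, op: str, **params: str) -> str:
    """Build a WebHDFS REST URL: http://host:port/webhdfs/v1/<path>?op=...

    hdfs_path is expected to be an absolute HDFS path starting with "/".
    """
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(hdfs_path)}?{query}"


# Example with placeholder values (assumed host and port):
create_url = webhdfs_url(
    "namenode.example.com", 50070, "/data/raw/clicks.log", "CREATE", overwrite="true"
)
```

An HTTP client would then issue a PUT against `create_url` to start the upload. Authentication and redirect handling depend on the cluster configuration and are left out here.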
Typical Hadoop architecture with RedPoint
[Architecture diagram; labels recovered from the slide]
Monitoring and Management Tools: AMBARI
Source Data: Sensor Logs, Clickstream, Flat Files, Unstructured, Sentiment, Customer, Inventory; DBs, JMS Queues, Files
Data Sources: RDBMS, EDW
Load: SQOOP, WebHDFS, Flume, NFS
Load: SQOOP/Hive, WebHDFS
Data Refinement: MAPREDUCE, HIVE, PIG
Structure: HCATALOG (metadata services)
Interactive: HIVE Server2
REST, HTTP, STREAM
Query/Visualization/Reporting/Analytical Tools and Apps
YARN, HDFS (nodes 1 to n)