Data Sheet
Data Blending
Data blending allows you to blend or mix data in any way you need to
comprehensively answer complex business questions. It doesn’t matter
whether the data resides in the cloud or in your database, streams from IoT
devices or social media, or is shared or bought through third-party sources
such as partners or data brokers. You can blend it all more easily and faster
using automated data blending toolsets than through traditional means,
which demand specialized talent, intensive labor, and more time than any
company can spare.
With the right data blending toolsets, your organization can access,
integrate, cleanse, and consume data in record time and with a high
degree of accuracy and security.
Not all data blending toolsets are created equal, however. Some are even
locked in to proprietary technologies that can restrict your options now
and later. That’s why the Open Source movement is so strong in Big Data
today.
But even Open Source choices can lead to an unforeseen lock-in. The
better strategy is to select data blending toolsets that offer the most
interoperability options and the most flexibility in analytics use while also
delivering the fastest and safest automated blending capabilities.
Overview
Data Blending Is the Process of…
• Gathering data from various sources
• Combining useful data into a functioning data set
• Eliminating unnecessary data
• Enabling uncommon types of data to coexist in a common data repository
With blended data, you can:
• Combine and analyze a larger variety of data – structured, semi-structured and unstructured – from a variety of sources, regardless of data format or physical location
• Gain truer, more comprehensive insights than single-format, single-source data analysis can provide
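To make the four steps above concrete, here is a minimal sketch in plain Python. The source names, fields and keys are invented purely for illustration and are not part of any specific product:

```python
import json

# Hypothetical sources: a CRM extract (rows) and an IoT feed (JSON text).
crm_rows = [
    {"customer_id": 1, "name": "Acme", "region": "EMEA", "internal_flag": "x"},
    {"customer_id": 2, "name": "Globex", "region": "APAC", "internal_flag": "y"},
]
iot_json = '[{"customer_id": 1, "events": 42}, {"customer_id": 2, "events": 7}]'

# 1. Gather: parse each source into a common in-memory form.
iot_rows = {r["customer_id"]: r for r in json.loads(iot_json)}

# 2. Combine: join the useful data on the shared key.
blended = []
for row in crm_rows:
    merged = {**row, **iot_rows.get(row["customer_id"], {})}
    # 3. Eliminate: drop fields the analysis does not need.
    merged.pop("internal_flag", None)
    blended.append(merged)

# 4. Coexist: the blended rows now share one schema and can land in one store.
```

Automated toolsets do the same gather/combine/eliminate work at scale, across formats and locations, without hand-written joins like this one.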
Things you can do with the right data blending tool
• Simplify data prep and reduce data prep time, which means you can make the data available for analysis faster.
• Gain more control over the data through automated error control, data quality monitoring, security alerts, and pipeline management ranging from pause to recover and cleanup controls.
• Increase data democratization throughout the company by enabling self-service data selection and analysis and eliminating the need for special skills such as coding.
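The pause/recover/cleanup controls mentioned above can be pictured as a small state machine over a checkpointed pipeline. This is a toy sketch of the general idea, not the product’s actual API:

```python
from enum import Enum, auto

class State(Enum):
    READY = auto()
    RUNNING = auto()
    FAILED = auto()
    DONE = auto()

class Pipeline:
    """Toy checkpointed pipeline: a list of steps plus a cursor,
    so a failed run can be recovered from the failing step."""
    def __init__(self, steps):
        self.steps = steps
        self.pos = 0              # next step to run (the checkpoint)
        self.state = State.READY

    def run(self, data):
        self.state = State.RUNNING
        while self.pos < len(self.steps):
            try:
                data = self.steps[self.pos](data)
            except Exception:
                self.state = State.FAILED  # checkpoint kept for recover()
                return data
            self.pos += 1
        self.state = State.DONE
        return data

    def recover(self, data):
        """Resume from the failed step instead of restarting from scratch."""
        return self.run(data)
```

Because the cursor survives a failure, `recover()` reruns only the remaining steps rather than the whole pipeline, which is the essence of pause/resume/recover controls.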
Data Blending Solution
Accelerate your time-to-insight with our unique data blending process. It
stands apart from competing products by being Open Source based; by
replacing a traditional ETL approach with a modern, self-service-driven data
blending approach; by providing data quality controls, including baked-in
security; and by offering superior workload control via pause, quick resume,
recover and other data pipeline controls.
All of it is user-friendly, with no need for special skills such as coding, and
easily executed through an intuitive user interface. With a few clicks, users
can blend diverse data sets irrespective of format or the physical location of
your data.
Specifically, our comprehensive toolset does the following:
• Provides an automated mechanism to build different data sets through blended data pipelines per your organization’s analytical needs
• Includes an intuitive workflow design with an easy-to-use drag-and-drop interface
• Allows you to build actionable analytic datasets with just a few clicks
• Alerts you to critical issues through reports, metrics and analysis
• Enables monitoring and measurement of data quality against business requirements
• Establishes an automated method to eliminate errors at the outset and reduce error remediation time
• Offers numerous pipeline management options such as pause, stop, recover, resume, cleanup, schedule and much more
• Provides pipeline preview
• Supports indexing for tables and HDFS files
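As an illustration of monitoring data quality against business requirements, a requirement can be expressed as a predicate and its pass rate tracked as a metric over time. A minimal sketch, with invented rule names and fields:

```python
# Hypothetical business rules; the names and fields are illustrative only.
rules = {
    "amount_nonnegative": lambda r: r["amount"] >= 0,
    "region_known": lambda r: r["region"] in {"EMEA", "APAC", "AMER"},
}

def quality_report(rows, rules):
    """Return each rule's pass rate so it can be monitored and alerted on."""
    report = {}
    for name, check in rules.items():
        passed = sum(1 for r in rows if check(r))
        report[name] = passed / len(rows) if rows else 1.0
    return report
```

A toolset would evaluate such rules continuously inside the pipeline and raise alerts when a pass rate drops below a threshold.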
Key Benefits
• Granular authorization and control – allows you far more control over who can access what data
• Centralized metadata harvesting, publishing and search – no more hours or even weeks lost to finding and sharing metadata
• Easily register HTTP services as transformations, such as Pig scripts, Java programs, MapReduce jobs and Spark jobs
• Mark business-critical entities under watch to receive notifications for any metadata change
• Virtualized access and query support
• Centralized monitoring of technology components – no more scurrying between dashboards to do routine monitoring and check alerts
• Easily resume a data blending pipeline from the point of failure through corrective suggestions, and effortlessly recover failed pipelines
• Pipeline lineage and audit – you need to know where your data is and where it’s been, because if you don’t, it’s been somewhere you didn’t want it to be
• Partial execution of pipelines – get what you need, to the level you need it, and abandon jobs at will when necessary
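The lineage-and-audit idea in particular is easy to sketch: wrap each transformation so every run appends a record of what ran, when, and what it produced. A toy illustration (the function and field names are invented, not the product’s API):

```python
import hashlib
import json
import time

def with_lineage(step_name, fn):
    """Wrap a transformation so each run appends an audit record:
    which step ran, when, how many rows came out, and a fingerprint
    of the output for later verification."""
    def wrapped(rows, lineage):
        out = fn(rows)
        lineage.append({
            "step": step_name,
            "at": time.time(),
            "rows_out": len(out),
            "fingerprint": hashlib.sha256(
                json.dumps(out, sort_keys=True, default=str).encode()
            ).hexdigest()[:12],
        })
        return out
    return wrapped

# Usage: build an audited dedupe step and run it.
dedupe = with_lineage("dedupe", lambda rows: sorted(set(rows)))
lineage = []
result = dedupe(["a", "b", "a"], lineage)
```

Chaining such wrapped steps yields an audit trail showing exactly where the data has been, step by step.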
• Build actionable data sets with a few simple clicks.
• Highly customized data pipelines to fit precise organizational and business line needs.
• Automated data blending toolsets are faster than traditional methods.
© 2016 Impetus Technologies, Inc.
All rights reserved. Product and
company names mentioned herein
may be trademarks of their
respective companies.
Mar 2016
Impetus is focused on creating big business impact through Big Data Solutions for Fortune 1000
enterprises across multiple verticals. The company brings together a unique mix of software products,
consulting services, Data Science capabilities and technology expertise. It offers full life-cycle services
for Big Data implementations and real-time streaming analytics, including technology strategy,
solution architecture, proof of concept, production implementation and on-going support to its clients.
To learn more, visit www.impetus.com or write to us at inquiry@impetus.com.
How It Works
Data from across the Big Data ecosystem – streaming sources, traditional
RDBMSs and other feeds, in different formats and from different physical
locations – flows into a blended data pipeline. There it passes through rich
transformation libraries (SQL, HQL, mapping, JOINs, user-defined functions),
cleanse-and-standardize steps (deduplication, alphabetical blocking, Soundex)
and profiling and analysis steps (classification, clustering, correlation,
frequency distribution, group distribution, basic profiling, RMF segmentation)
before landing in an enterprise DB sink to generate actionable insights.
[Diagram: the Big Data ecosystem feeding the blended data pipeline, which writes to an enterprise DB sink]
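Several of the matching steps named in the diagram, such as Soundex, are standard record-linkage techniques rather than anything product-specific. As a concrete example, here is the classic American Soundex algorithm in a short, self-contained sketch:

```python
def soundex(name: str) -> str:
    """Classic American Soundex: first letter plus three digits,
    used to match names that sound alike (e.g. Robert / Rupert)."""
    codes = {
        **dict.fromkeys("BFPV", "1"),
        **dict.fromkeys("CGJKQSXZ", "2"),
        **dict.fromkeys("DT", "3"),
        "L": "4",
        **dict.fromkeys("MN", "5"),
        "R": "6",
    }
    name = "".join(c for c in name.upper() if c.isalpha())
    if not name:
        return ""
    result = [name[0]]
    prev = codes.get(name[0], "")
    for ch in name[1:]:
        code = codes.get(ch, "")
        if ch in "HW":          # H and W do not separate same-coded letters
            continue
        if code and code != prev:
            result.append(code)
        prev = code             # vowels reset prev, allowing repeats later
        if len(result) == 4:
            break
    return "".join(result).ljust(4, "0")
```

In a blending pipeline, codes like these serve as blocking keys: records whose names share a Soundex code are candidates for deduplication and matching.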
