This presentation introduces the StreamSets ETL tool.
StreamSets is a modern ETL tool designed to process streaming data.
StreamSets has two engines: Data Collector and Data Transformer (based on Apache Spark).
2. What is StreamSets?
• Platform for data integration
• Multi-cloud architecture
• Easy connections to various sources/targets (Data Collector)
3. StreamSets Value Proposition
StreamSets Control Hub, introduced in 2017, provides a single software-as-a-service platform to design, deploy, monitor, and manage smart data pipelines at scale on any cloud and on-premises.
Why StreamSets?
• Minimizes adoption time for new technologies
• Smart, modern option for changing data sources
• Minimal developer intervention when data drifts
• Increased visibility for monitoring loads
• Reduced total cost of ownership (TCO)
• Designed to handle data drift
• Combines the capabilities of ETL and data integration
4. Informatica vs. StreamSets
Informatica
• Cost intensive
• In business for 20+ years
• Proven high performance
• Less adaptable to new source/target connections
  o Additional connections require paying additional license costs
• Requires high-end servers
• More client tools compared to StreamSets:
  o Designer
  o Workflow Manager
  o Repository Manager
  o Admin Console

StreamSets
• Cost effective
• Launched in 2015 and still on the path to wide adoption
• Transformer engine based on Apache Spark, an open-source platform
• Easy to adapt to new connections (highly flexible)
• Lightweight application
• All functionality is managed under Control Hub