SnapLogic Live – Big Data in Motion
Source: Internap
http://www.internap.com/resources/infographic-data-motion-vs-data-rest/
The Data Lake
“Just as data integration is the foundation of the data warehouse, an end-to-end data processing capability is the core of the data lake. The new environment needs a new workhorse.”
- Mark Madsen, Third Nature
snaplogic.com/resources
Data Lake Components
Current Data Lake Architecture

• On Prem Apps and Data: ERP, CRM, RDBMS
• Cloud Apps and Data: CRM, HCM, Social
• IoT Data: Sensors, Wearables, Devices

Data Acquisition: Collect and integrate data from multiple sources
• Kafka, Sqoop, Flume
• HDFS, AWS S3, MS Azure Blob

Data Management: Add information and improve data
• Spark, Python, Scala, Java, R, Pig

Data Access: Organize and prepare data for visualization
• HDFS, AWS S3, MS Azure Blob, Hive
• Impala, HiveSQL, SparkSQL

Batch, Streaming, Real-time
Schedule and manage: Oozie, Ambari

• Lakeshore Data Mart: MS Azure, AWS Redshift
• BI / Analytics: Tableau, MS PowerBI / Azure, AWS QuickSight
The Modern Data Lake Powered by SnapLogic
Modern Data Lake Architecture

• On Prem Apps and Data: ERP, CRM, RDBMS
• Cloud Apps and Data: CRM, HCM, Social
• IoT Data: Sensors, Wearables, Devices

Data Acquisition: Collect and integrate data from multiple sources
• SnapLogic pipelines with standard mode execution

Data Management: Sort, Aggregate, Join, Merge, Transform
• SnapLogic abstracts and operationalizes with MapReduce or Spark pipelines

Data Access: Organize and prepare data for visualization
• SnapLogic pipelines with standard mode execution

Batch, Streaming, Real-time
Schedule and manage: SnapLogic

• Lakeshore Data Mart: MS Azure, AWS Redshift
• BI / Analytics: Tableau, MS PowerBI / Azure, AWS QuickSight

SnapLogic Pipeline
SnapLogic in the Modern Data Fabric
Source, Store & Process, Consume

Source
• On Prem Applications
• Relational Databases
• Cloud Applications
• NoSQL Databases
• Web Logs
• Internet of Things

Data Integration and Transformation: INGEST, DELIVER

Store & Process
• HANA
• Data Warehouses & Data Marts
• Big Data and Data Lakes
Modern Architecture: Hybrid and Elastic
• Streams: No data is stored/cached
• Secure: 100% standards-based
• Elastic: Scales out & handles data and app integration use cases

Cloud-Based Designer, Manager, Dashboard
Cloudplex, Groundplex, Hadooplex
Databases, On Prem Apps, Big Data, Cloud Apps and Data
Metadata, Data
Firewall
Discussion
snaplogic.com
Unified Platform
Self-Service UX
Modern Architecture
Connected: 400+ Snaps

Editor's Notes

  • #4 http://blog.econocom.com/en/blog/whats-a-data-lake/
  • #5 This is just a sampling of the technologies that may go into a data lake. To date, most data lake deployments have been built through manual coding, open source tools and custom integration. Manual coding of data processing applications is common because data processing is thought of in terms of application-specific work. Unfortunately, this manual effort is a dead-end investment over the long term because the underlying technologies are constantly changing. Older data warehouse environments and ETL-type integration tools are good at what they do, but they can’t meet many of the new needs. The new environments are focused on data processing, but they require a lot of manual work. The data lake must incorporate aspects of old data warehouse environments, such as connecting to and extracting data from ERP or transaction processing systems, yet do this without clunky and inefficient tools like Sqoop. The data lake must also support new capabilities, such as reliable collection of large volumes of events at high speed and timely processing that makes data available immediately. Finally, it must support data coming from multiple sources in a hybrid model. This exceeds the abilities of traditional data integration tools.
  • #6 SnapLogic accelerates development of a modern data lake through:
      • Data acquisition: collecting and integrating data from multiple sources. SnapLogic goes beyond developer tools such as Sqoop and Flume with a cloud-based visual pipeline designer, and pre-built connectors for 300+ structured and unstructured data sources, enterprise applications and APIs.
      • Data transformation: adding information and transforming data. SnapLogic minimizes the manual tasks associated with data shaping and makes data scientists and analysts more efficient. SnapLogic includes Snaps for tasks such as transformations, joins and unions without scripting.
      • Data access: organizing and preparing data for delivery and visualization. SnapLogic makes data processed on Hadoop or Spark easily available to off-cluster applications and data stores such as statistical packages and business intelligence tools.
  • #8 Here is an example of a SnapLogic deployment. The SnapLogic control plane, including the Designer, Manager and Dashboard, does not store your data. It’s metadata only. Once a pipeline is executed, it looks for the associated Snaplex or Hadooplex. The plex dynamically scales out, adding more nodes as needed. We like to say that SnapLogic “respects data gravity” and runs as close to the data as need be. If you are integrating only cloud applications, it would make no sense to run your integrations behind the firewall. Similarly, if you’re doing ground-to-ground or cloud-to-ground integration, you may want to run your Snaplex on Windows or Linux servers. Note that the dotted line is sending instructions via metadata to the plex, which is waiting to run. The solid line indicates how data moves bi-directionally between systems.
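
The split described in note #8 (a control plane that holds only metadata, and a plex that runs next to the data and scales out) can be illustrated with a small Python sketch. This is not SnapLogic code or its API: `PipelineSpec`, `Plex`, `ControlPlane`, the `OPERATIONS` table and the `docs_per_node` capacity figure are all hypothetical names invented for illustration. The only points it tries to show are that the control plane passes instructions rather than records, and that the plex adds nodes when the workload grows.

```python
from dataclasses import dataclass, field
from math import ceil

# Hypothetical step implementations; a real pipeline would have far more.
OPERATIONS = {"strip": str.strip, "upper": str.upper}


@dataclass(frozen=True)
class PipelineSpec:
    """Metadata only: a name and step names, never the records themselves."""
    name: str
    steps: tuple


@dataclass
class Plex:
    """Hypothetical execution group that lives next to the data it processes."""
    location: str                 # e.g. "cloud" or "ground"
    local_records: list           # data stays on this side of the firewall
    docs_per_node: int = 2        # illustrative per-node capacity
    nodes: int = 1
    delivered: list = field(default_factory=list)

    def execute(self, spec):
        # Scale out: add nodes until capacity covers the workload.
        needed = ceil(len(self.local_records) / self.docs_per_node)
        self.nodes = max(self.nodes, needed)
        records = list(self.local_records)
        for step in spec.steps:
            records = [OPERATIONS[step](r) for r in records]
        self.delivered = records      # data is delivered locally
        return len(records)           # only a count goes back to the caller


@dataclass
class ControlPlane:
    """Hypothetical designer/manager: stores pipeline metadata, no data."""
    pipelines: dict = field(default_factory=dict)

    def register(self, spec):
        self.pipelines[spec.name] = spec

    def dispatch(self, name, plex):
        # The "dotted line": only the spec crosses over to the plex.
        return plex.execute(self.pipelines[name])


if __name__ == "__main__":
    groundplex = Plex("ground", [" erp ", " crm ", " rdbms "])
    control = ControlPlane()
    control.register(PipelineSpec("clean", ("strip", "upper")))
    print(control.dispatch("clean", groundplex), groundplex.nodes)
    print(groundplex.delivered)
```

In this sketch the control plane never touches `local_records`. It hands the plex a spec and gets back a record count, which mirrors the metadata-only claim in the note.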