A Resource-Oriented Approach to Data Services

Mike Pittaro
Co-Founder and Chief Community Officer
SnapLogic

July 26, 2007
  1. A resource oriented approach to Data Services
     Mike Pittaro, Co-Founder and Chief Community Officer, SnapLogic
     July 26, 2007
  2. A Resource-oriented approach to data services
     - Description: This technical session will describe how a resource-oriented approach can be used to transform data into data services.
     - Using a combination of REST, Python and RDF, we'll show you how to create data resources which can be composed into transformation pipelines. In this system, pipelines are also resources, allowing incremental composition of new data services based on existing ones.
     - This session will include an overview of the SnapLogic Open Source data integration toolkit.
     - We will also take a look at some real world examples:
       - service enable an existing application
       - create a transformation pipeline
       - combine data from multiple pipelines to create a 'mashup' resource
     Copyright © 2007 SnapLogic, Inc.
  3. SnapLogic Introduction
     - SnapLogic is a data transformation framework
       - Open Source project, GPL Version 2.0 license
       - Implemented in Python
     - Our goal is to provide a general solution for data access and transformation.
       - Data access and transformation is a universal problem.
       - So far, there has been no consistent solution.
       - The problem is getting worse, not better.
       - More APIs, versions, and formats than ever before.
  4. Fundamental Integration Problems
     - Very Complicated Applications and Systems
       - Tightly Coupled, Many Inter-Dependencies
       - Heterogeneous environments
       - Systems must continually evolve
       - Upgrades, Conversions, and Consolidations are difficult
       - Vendor proprietary internal details
       - Limited vendor support lifecycles
       - Real systems knowledge is possessed by the implementers
     - More data is being generated than ever.
       - Explosion of data formats
       - 'Unstructured' data, with little or no metadata
       - Data quality and validity
       - Data feeds and conversions everywhere
  5. What makes Integration so Complex?
     1. Multiple Access Protocols
     2. Multiple Access Methods
     3. Multiple Data Schemas
     [Diagram: systems connected point to point, each through its own access path. Labels: Oracle / ODBC, SAP / Native, 3rd Party / SSL, Web Services / SOAP, Flat Files / FTP, LDAP/AD / LDAP]
  6. Is There A Better Solution?
     - Design Goals
       - Scalable
       - Extensible (by ordinary developer)
       - Easier to use than writing code for every data interface
       - Target developers, not business users
         - 'Data Crunching / Data Munging' (Greg Wilson / David Cross)
       - Bridge the gap between the Web and Enterprise Data access
     - To solve the problem, we need to minimize the variables
       - The protocols
       - The access methods
       - The data formats / schemas
     - We started looking for a better integration solution...
       - I realized the Web seemed to be less affected by the problem
  7. The Web and Integration
     - The largest integration venture ever
       - 17 million web servers
       - Totally Decentralized
       - Fundamentally heterogeneous model
         - operating systems, web servers
         - applications, tools and frameworks
     - It should be a nightmare of compatibility problems...
       - but it's not!
     - All compatible and interoperable
       - Based on open standards and protocols
       - HTTP and (X)HTML
       - Using a common architecture
  8. The Web has an Architecture?!
     - There are deep design principles behind the web
       - Based on Representational State Transfer (REST)
       - Developed by Roy Fielding at UC Irvine
     - The key abstraction in REST is a Resource
       - A resource is any information that can be named
     - The (simplified) principles of REST:
       - state and functionality are divided into resources
       - resources are addressable using URIs
       - all resources share a uniform interface
         - constrained set of operations
         - limited set of content types
       - manipulating resources is done by exchanging representations
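The simplified REST principles on this slide can be illustrated with a small in-memory sketch. This is not SnapLogic code: the `Resource` class, the `RESOURCES` table, the `get` function, and the example URI and data are all assumptions made for illustration. It shows one resource, addressed by a URI, returning its state as one of a limited set of representations through a single uniform operation.

```python
import csv
import io
import json


class Resource:
    """A named piece of information with a small, fixed set of representations."""

    CONTENT_TYPES = ("application/json", "text/csv")  # limited set of content types

    def __init__(self, rows):
        self.rows = rows  # assumed non-empty list of dicts sharing the same keys

    def represent(self, content_type):
        """Return the resource state serialized as the requested content type."""
        if content_type == "application/json":
            return json.dumps(self.rows)
        if content_type == "text/csv":
            buffer = io.StringIO()
            writer = csv.DictWriter(
                buffer, fieldnames=list(self.rows[0]), lineterminator="\n"
            )
            writer.writeheader()
            writer.writerows(self.rows)
            return buffer.getvalue()
        raise ValueError("unsupported content type: " + content_type)


# Resources are addressable by URI (this URI and its data are made up).
RESOURCES = {
    "http://example.org/customer_list": Resource([{"id": 1, "name": "Acme"}]),
}


def get(uri, content_type="application/json"):
    """The uniform 'GET' operation: the same call works for every resource."""
    return RESOURCES[uri].represent(content_type)


as_json = get("http://example.org/customer_list")
as_csv = get("http://example.org/customer_list", "text/csv")
```

The caller never touches the rows directly; it only exchanges representations, which is the property the last bullet describes.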
  9. The SnapLogic Approach
     - Apply the principles of web architecture to data access and transformation.
     [Diagram: each system (Oracle, SAP, 3rd Party, Web Services, Flat Files, LDAP/AD) is fronted by an SL endpoint, and the SL endpoints connect to each other]
     - Consistent Protocol: HTTP
     - Consistent Methods: REST 'verbs'
     - Consistent Data Schema: Normalized Tables
  10. Basic Data Integration Operations
      - Read from Data Sources
        - Files, Databases, Applications, 'Feeds' (RSS/Atom), XML
      - Write to Data Sinks (targets)
        - Files, Databases, Applications, 'Feeds', XML
      - Transform Data
        - Filter, Sort, Aggregate, Join, Union
        - string operations, formatting, general calculation
      - Pipelined Operations
        - It's a data flow model, not really procedural
        - It's useful to cascade these operations in sequence.
        - The data really should stream when possible
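The data flow idea on this slide (cascade operations, stream when possible) can be sketched with plain Python generators. This is an illustration only, not the SnapLogic component API: the function names `read_source`, `filter_rows`, and `add_field` and the sample order data are assumptions.

```python
def read_source(rows):
    """Stand-in for a reader component: yields one record at a time."""
    for row in rows:
        yield row


def filter_rows(records, predicate):
    """Filter stage: passes through only the records that satisfy predicate."""
    for record in records:
        if predicate(record):
            yield record


def add_field(records, name, compute):
    """Calculation stage: adds a computed field to each record."""
    for record in records:
        yield {**record, name: compute(record)}


# Hypothetical sample data.
orders = [
    {"customer": "Acme", "qty": 2, "price": 10.0},
    {"customer": "Globex", "qty": 0, "price": 5.0},
    {"customer": "Initech", "qty": 3, "price": 4.0},
]

# Cascade the stages. Nothing runs until the final consumer pulls records,
# and each record flows through every stage before the next one is read.
pipeline = add_field(
    filter_rows(read_source(orders), lambda r: r["qty"] > 0),
    "total",
    lambda r: r["qty"] * r["price"],
)
totals = [(r["customer"], r["total"]) for r in pipeline]
```

Operations such as Sort or Aggregate cannot stream fully because they need to see every record before emitting a result, which is one reason the slide says "when possible".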
  11. Resource Oriented Data Services
      - Mapping Data Operations to REST
        - Data set => resource
        - Data description => resource description
        - data format => representation (MIME type)
        - read => HTTP GET from a resource
          [Diagram labels: HTTP GET, /customer_list, Response with Data]
        - write => HTTP POST to a resource, with the URL to GET from
          [Diagram labels: HTTP POST, HTTP GET, /customer_list, /new_location, Response with Data]
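The read and write mappings on this slide can be sketched without a network, using a toy dispatcher in place of an HTTP server. The `DataServer` class and its status codes are assumptions for illustration, and the sketch assumes that /new_location is the write target and /customer_list is the source it pulls from; the slide's diagram labels do not make the direction explicit.

```python
class DataServer:
    """Toy stand-in for an HTTP server: dispatches (method, path) to data sets."""

    def __init__(self):
        self.datasets = {}

    def handle(self, method, path, body=None):
        if method == "GET":
            # read => GET returns the current data for the resource
            if path not in self.datasets:
                return 404, None
            return 200, list(self.datasets[path])
        if method == "POST":
            # write => POST names the URL to GET from; the target pulls the data
            status, data = self.handle("GET", body)
            if status != 200:
                return 400, None
            self.datasets[path] = data
            return 201, None
        return 405, None


server = DataServer()
server.datasets["/customer_list"] = [{"id": 1, "name": "Acme"}]  # made-up data

read_status, customers = server.handle("GET", "/customer_list")
write_status, _ = server.handle("POST", "/new_location", body="/customer_list")
copy_status, copied = server.handle("GET", "/new_location")
```

The point of the design is that the writer is handed a URL rather than a payload, so the data moves directly between the two resources instead of through the client that requested the write.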
  12. Resource Oriented Pipelines
      - Pipelines are a set of coordinated resources
      [Diagram labels: HTTP POST, HTTP GET, HTTP GET; /customer_list, /remove_dups, /modified_list]
  13. Applications for Resource Oriented Services
      - Traditional Integration
        - Data Interfaces between systems
        - Data conversion and migration
        - ETL for warehousing and analysis/BI
      - Data for 'Mash up' Applications
        - Expose data as a service from any application
        - Allow data to be reprocessed and reused
      - General Purpose Data Manipulation
        - 'Data Crunching' / 'Data Munging'
  14. Benefits of a Resource Approach
      - All resources have consistent interfaces
        - Easy to mix, match, and compose them together.
        - Application / Interface details are hidden at the endpoints.
      - All resources have a full http://... URI
        - mix, match, and compose across servers easily
      - Pipelines are also resources
        - A pipeline has a URL
        - Can be read/written like any other resource
        - Simplifies the composition of complex scenarios
      [Diagram labels: /pipe1, /pipe2, /pipe3]
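"Pipelines are also resources" can be read as a composite pattern: a pipeline offers the same read interface as the resources it is built from, so one pipeline can serve as the source of another. The sketch below is an assumption-laden illustration, not SnapScript: the classes `Resource`, `RemoveDups`, `SortByName`, and `Pipeline`, the `registry` dictionary, and the sample rows are all hypothetical.

```python
class Resource:
    """Anything readable: read() yields records."""

    def __init__(self, rows):
        self.rows = rows

    def read(self):
        yield from self.rows


class RemoveDups:
    """Transformation resource: reads its upstream and drops repeated records."""

    def __init__(self, upstream):
        self.upstream = upstream

    def read(self):
        seen = set()
        for record in self.upstream.read():
            key = tuple(sorted(record.items()))
            if key not in seen:
                seen.add(key)
                yield record


class SortByName:
    """Transformation resource: reads its upstream and sorts by the 'name' field."""

    def __init__(self, upstream):
        self.upstream = upstream

    def read(self):
        yield from sorted(self.upstream.read(), key=lambda r: r["name"])


class Pipeline:
    """A pipeline has the same read() interface, so it can feed another pipeline."""

    def __init__(self, source, *stages):
        self.source = source
        self.stages = stages

    def read(self):
        current = self.source
        for stage in self.stages:
            current = stage(current)  # wrap the upstream in the next stage
        return current.read()


# Each entry stands for a resource reachable at that URL path (made-up data).
registry = {
    "/customer_list": Resource(
        [{"name": "Globex"}, {"name": "Acme"}, {"name": "Globex"}]
    ),
}
registry["/pipe1"] = Pipeline(registry["/customer_list"], RemoveDups)
registry["/pipe2"] = Pipeline(registry["/pipe1"], SortByName)  # reads a pipeline

modified_list = list(registry["/pipe2"].read())
```

Because /pipe2 only depends on the read interface of /pipe1, it would not need to change if /pipe1 were later replaced by a plain resource or a longer pipeline.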
  15. What's in the Download?
      - SnapLogic Data Server
        - Container for components
        - Coordinates pipeline execution
        - Maintains resource definition repository as an RDF store
        - Provides metadata services and client tool interfaces
      - Components
        - Database read/write, file read/write, RSS/JSON read/write
        - Salesforce read, QuickBooks read, Apache Log reader
        - Transformations: Sort, Aggregate, Filter, Join, Mixer, Sequence
      - Management Server
        - Support for graphical web client (Flex application)
      - SnapScript package
        - Python classes for programmers to define and access resources
      - SnapAdmin
        - command line management utility
  16. SnapLogic Development Model
      - Create resource definitions and load onto the server
        - Can be done with Python or through the Web client
      - Create pipelines that connect the resources
        - Again, via Python or the Web Client
      - Execute the pipeline
        - The server takes care of coordinating the HTTP operations behind the scenes.
  17. Example 1: Reading from SugarCRM
      - Create a resource to read an account list from SugarCRM
      - Using:
        - Python
        - The SnapLogic.SnapScript package
        - The Database Reader Component
        - The Connection Component
  18. Example 2: Reading from QuickBooks
      - Create a resource to read from QuickBooks
      - Using:
        - The QuickBooks read component
        - The Web client
  19. Example 3: Merge Sugar and QuickBooks
      - Create a pipeline using our two resources
        - Union the two streams, mark as customer or prospect.
      - Using:
        - Pipeline Component
        - Mixer Component
        - File Write Component
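The merge step in Example 3 (union two streams and mark each record as customer or prospect) can be sketched in plain Python. This does not use the Pipeline or Mixer components; the `tag` helper, the sample rows, and the "type" field are hypothetical. It also assumes QuickBooks rows are the customers and SugarCRM rows are the prospects, which the slide does not state.

```python
from itertools import chain


def tag(records, kind):
    """Mark each record with the category of the stream it came from."""
    for record in records:
        yield {**record, "type": kind}


# Hypothetical sample rows; real field names depend on the two source systems.
quickbooks_rows = [{"name": "Acme"}]
sugar_rows = [{"name": "Globex"}, {"name": "Initech"}]

# Union the two tagged streams into one, keeping each stream's own order.
merged = list(
    chain(tag(quickbooks_rows, "customer"), tag(sugar_rows, "prospect"))
)
```

Tagging before the union keeps the origin of each record available downstream, for example when the merged list is written to a file.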
  20. Metadata and Resource Descriptions
      - Metadata is required for serious integration
        - Lack of metadata is the biggest limitation of custom code
      - In SnapLogic, all resource definitions use RDF
        - We maintain a complete description of the resource
        - The SnapLogic server repository is an RDF Store
      - RDF is managed by the server and clients
        - Metadata is automatically generated for the web client or SnapScript
        - You need the metadata, but don't have to deal with it directly.
      - All resources can be queried for information
        - GET from /url....?target=meta
        - http://localhost:8088/OSCon/ReadSugarAccounts?target=meta
      - SnapScript can also generate metadata
        - Resource.getAsRDFString()
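The metadata query on this slide is an ordinary GET with `target=meta` added to the resource URL. A minimal sketch of building that URL with the Python standard library follows; the helper name `meta_url` is an assumption, and the code only constructs the URL, since fetching it would require a running SnapLogic server.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def meta_url(resource_url):
    """Return resource_url with target=meta set, keeping other query parameters."""
    parts = urlsplit(resource_url)
    query = dict(parse_qsl(parts.query))
    query["target"] = "meta"
    return urlunsplit(parts._replace(query=urlencode(query)))


# Resource URL taken from the slide.
url = meta_url("http://localhost:8088/OSCon/ReadSugarAccounts")
```

Here `url` is the same address the slide shows, and it could be requested with any HTTP client once a server is listening.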
  21. The URLs you need
      - Everything is at http://www.snaplogic.org
        - Full source, GPL V 2.0 license
        - Forums, mailing lists, Wiki, and bugs
      - http://packages.snaplogic.org
        - A download site for SnapLogic content
        - SugarCRM Data Mart
        - Apache Log Reader
        - Dojo / Javascript Mashup Example
  22. Thanks
      - Questions?
  23. Why Open Source the Product?
      - LAMP is the future of all infrastructure
      - Proprietary development model is broken
      - Integration remains a coding problem
      - Open Source economics work for integration
        - Eliminates deal-driven "adapter" development
        - No vendor can support the whole connectivity matrix
        - Enable reuse of data and components
  24. OSCON, July 26, 2007
      Mike Pittaro, Founder and Chief Community Officer