Data Integration with Server Side Mashups
              Juergen Brendel
        Principal Software Engineer

           OS...
Agenda

•   The SnapLogic project
•   Client-side mashups
•   Problems and solutions
•   Data integration with SnapLogic...
The SnapLogic project

• Founded 2005, data integration background
• Vision:
  –   Reusable data integration resources
  –...
What's a mashup?

• A 'Web 2.0 kind of thing'
• Combine, aggregate, visualise
  – Multiple sources
  – Multiple dimensions...
Self-made mashups

• Hand coded
• Mashup editors
  – GUI mashup-logic editor
  – Wiki-style
  – Hosted




               ...
Benefits for the enterprise?


              ...
Problems with client-side mashups

•   Skill
•   Internal data often not web-friendly
•   Maintenance
•   Security
    P...
Solution: Server-side mashups

• Flexible access
• Security
• Performance




             Data Integration with Server Si...
SnapLogic data integration philosophy

 •   Clearly defined, REST resources
 •   Data reuse and integration
 •   Pipeli...
Example: Resources

                HTTP://server1.example.com/customer_list
 Databases


                              Sn...
Example: Pipelines

                HTTP://server1.example.com/processed_customer_list
 Databases

                       ...
A simple pipeline: Filtering leads




Linking fields in a pipeline




Reusing a pipeline as a resource




Reusing a pipeline as a resource




Reusing a pipeline as a resource




Adding new components

•   For access logic
•   For data transformations
•   Independent of data format
•   Currently wr...
A simple processing component

 1: class IncreaseSalary(DataComponent):
 2:
 3:    def init(self):
 4:       '''Called whe...
An Apache log file reader
 1: class LogReader(DataComponent):
 2:
 3:     def startReading(self):
 4:        '''Called whe...
Programmatic access

• GUI is nice, but still limiting
• SnapScript: An API library
• Python, PHP, more to come




      ...
Creating a resource
 1:   # Create a new resource
 2:   staff_res_def = Resource(component='SnapLogic.Components.CsvRead')...
Creating a pipeline
 1:   # Create a new pipeline
 2:   p = Pipeline()
 3:   p.props.URI    = '/SnapLogic/Pipelines/empl_s...
Pipeline parameters
 1:   # Define the user-visible parameters of the pipeline
 2:   p.props.parameters = (
 3:       ('IN...
The end



   Any questions?

jbrendel@snaplogic.org




Data Integration with server side Mashups

The open source SnapLogic data integration framework. Overview, examples, screenshots.

Data Integration with server side Mashups

  1. Data Integration with Server Side Mashups
       Juergen Brendel, Principal Software Engineer
       OSDC 2007, Brisbane
  2. Agenda
       • The SnapLogic project
       • Client-side mashups
       • Problems and solutions
       • Data integration with SnapLogic
  3. The SnapLogic project
       • Founded 2005, data integration background
       • Vision:
         – Reusable data integration resources
         – REST
         – Web-based GUI
         – Programmatic interface
         – Open Source
       • Python... Why not?
       • www.snaplogic.com
  4. What's a mashup?
       • A 'Web 2.0 kind of thing'
       • Combine, aggregate, visualise
         – Multiple sources
         – Multiple dimensions
       • Typically on the client side
         – Browser
         – Ajax
  5. Self-made mashups
       • Hand coded
       • Mashup editors
         – GUI mashup-logic editor
         – Wiki-style
         – Hosted
  6. Benefits for the enterprise?
       Enable knowledge workers!!!
       Situational applications!
       Avoid the IT bottleneck!!
       Yeah, right...
  7. Problems with client-side mashups
       • Skill
       • Internal data often not web-friendly
       • Maintenance
       • Security
       • Performance
  8. Solution: Server-side mashups
       • Flexible access
       • Security
       • Performance
  9. SnapLogic data integration philosophy
       • Clearly defined, REST resources
       • Data reuse and integration
       • Pipelines
       • Framework for resource-specific scripting
       • Open source and community
  10. Example: Resources
       HTTP://server1.example.com/customer_list
       [Diagram labels: Databases, Files, Applications, Atom / RSS, JSON; SnapLogic Server; Component; Resource Definition; Client; HTTP Request and Response]
       Resource Definition:
       • Resource Name
       • HTTP://server1.example.com/customer_list
       • SQL Query or filename
       • Credentials
       • Parameters
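
From the client's point of view, a resource like the one on this slide is just a URL that returns data. The following is a minimal sketch of that idea, not SnapLogic's actual client API: the URL comes from the slide, while passing parameters as a query string, the JSON array response, and the helper names `build_resource_url` and `parse_json_records` are assumptions made for illustration.

```python
import json
from urllib.parse import urlencode

# URL taken from the slide; everything else here is an illustrative assumption.
RESOURCE_URL = "http://server1.example.com/customer_list"


def build_resource_url(base_url, params=None):
    """Append optional resource parameters as a query string (assumed convention)."""
    if not params:
        return base_url
    return base_url + "?" + urlencode(sorted(params.items()))


def parse_json_records(body):
    """Decode a JSON response body into a list of record dictionaries."""
    records = json.loads(body)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of records")
    return records


# A made-up response body, standing in for what a server might return.
sample_body = (
    '[{"name": "Acme", "city": "Brisbane"},'
    ' {"name": "Initech", "city": "Sydney"}]'
)

url = build_resource_url(RESOURCE_URL, {"city": "Brisbane"})
customers = parse_json_records(sample_body)
print(url)
print([c["name"] for c in customers])
```

The sketch deliberately avoids any network call, so it runs without a server; a real client would fetch the body over HTTP before parsing it.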
  11. Example: Pipelines
       HTTP://server1.example.com/processed_customer_list
       [Diagram labels: Databases, Files, Applications, Atom / RSS, JSON; SnapLogic Server; three Components, each with a Resource Definition: Read, Geocode, Sort; Client; HTTP Request and Response]
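
The Read, Geocode, Sort chain on this slide can be pictured as a sequence of stages, each consuming the record stream produced by the previous one. The sketch below models that with plain Python generators. It is not how SnapLogic implements pipelines; the stage functions, the sample CSV, and the `CITY_COORDS` lookup table (a stand-in for a real geocoding service) are all hypothetical.

```python
import csv
import io

# Hypothetical lookup table standing in for a real geocoding service.
CITY_COORDS = {"Brisbane": (-27.47, 153.03), "Sydney": (-33.87, 151.21)}


def read_stage(csv_text):
    """Read: yield one dictionary per CSV row."""
    yield from csv.DictReader(io.StringIO(csv_text))


def geocode_stage(records):
    """Geocode: attach latitude and longitude looked up by city name."""
    for rec in records:
        lat, lon = CITY_COORDS.get(rec["city"], (None, None))
        yield {**rec, "lat": lat, "lon": lon}


def sort_stage(records, key):
    """Sort: consume the whole stream, then emit records ordered by key."""
    yield from sorted(records, key=lambda rec: rec[key])


csv_text = "name,city\nInitech,Sydney\nAcme,Brisbane\n"
processed = list(sort_stage(geocode_stage(read_stage(csv_text)), key="name"))
for rec in processed:
    print(rec)
```

Read and Geocode can stream record by record, whereas Sort has to buffer its entire input before emitting anything; that difference is worth keeping in mind when ordering stages.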
  12. A simple pipeline: Filtering leads
  13. Linking fields in a pipeline
  14. Reusing a pipeline as a resource
  15. Reusing a pipeline as a resource
  16. Reusing a pipeline as a resource
  17. Adding new components
       • For access logic
       • For data transformations
       • Independent of data format
       • Currently written in Python
  18. A simple processing component

          class IncreaseSalary(DataComponent):

              def init(self):
                  '''Called when the component is started.'''
                  self.increase = float(self.moduleProperties['percent_increase'])

              def processRecord(self, record):
                  '''Called for every record.'''
                  record.fields['salary'] *= (1 + self.increase/100)
                  self.writeRecord(record)
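
To see how the framework might drive such a component, here is a self-contained sketch that wraps the slide's `IncreaseSalary` class in a tiny harness. The `Record` and `DataComponent` classes below are hypothetical stand-ins written for this example, and the `run` method is an assumption about the call order (`init` once, then `processRecord` per record); the real base class is not shown in the slides.

```python
class Record:
    """Stand-in record: just a dictionary of named fields."""

    def __init__(self, **fields):
        self.fields = dict(fields)


class DataComponent:
    """Hypothetical stand-in for the framework base class."""

    def __init__(self, module_properties):
        self.moduleProperties = module_properties
        self.output = []

    def writeRecord(self, record):
        # Collect output in a list instead of writing to an output view.
        self.output.append(record)

    def run(self, records):
        # Assumed call order: init() once, then processRecord() per record.
        self.init()
        for record in records:
            self.processRecord(record)
        return self.output


class IncreaseSalary(DataComponent):
    """The component from the slide, unchanged in behaviour."""

    def init(self):
        self.increase = float(self.moduleProperties['percent_increase'])

    def processRecord(self, record):
        record.fields['salary'] *= (1 + self.increase / 100)
        self.writeRecord(record)


component = IncreaseSalary({'percent_increase': '10'})
result = component.run([
    Record(name='Ann', salary=50000.0),
    Record(name='Bob', salary=40000.0),
])
salaries = [round(r.fields['salary'], 2) for r in result]
print(salaries)
```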
  19. An Apache log file reader

          class LogReader(DataComponent):

              def startReading(self):
                  '''Called when component does not have input stream.'''
                  logfile = open(self._filename, 'rbU')
                  format = self.moduleProperties['log_format']

                  if format == 'COMMON':
                      p = apachelog.parser(apachelog.formats['common'])
                  elif ...

                  # Read all lines in the logfile
                  for line in logfile:
                      out_rec = Record(self.getSingleOutputView())
                      raw_rec = p.parse(line)
                      out_rec.fields['remote_host'] = raw_rec['%h']
                      out_rec.fields['client_id'] = raw_rec['%l']
                      out_rec.fields['user'] = raw_rec['%u']
                      out_rec.fields['server_status'] = int(raw_rec['%>s'])
                      out_rec.fields['bytes'] = int(raw_rec['%b'])
                      ...

                      self.writeRecord(out_rec)
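
The reader on this slide delegates the parsing to the apachelog module, which is not part of the standard library. As a dependency-free sketch of the same step, the code below swaps in a standard-library regular expression that handles the Common Log Format only. The output field names mirror the slide; `COMMON_LOG`, `parse_common_log_line`, and the sample line are made up for this example.

```python
import re

# Common Log Format: host ident user [time] "request" status bytes
COMMON_LOG = re.compile(
    r'(?P<remote_host>\S+) (?P<client_id>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<server_status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_common_log_line(line):
    """Split one Common Log Format line into named, typed fields."""
    match = COMMON_LOG.match(line)
    if match is None:
        raise ValueError("not a Common Log Format line: %r" % line)
    fields = match.groupdict()
    fields['server_status'] = int(fields['server_status'])
    # Apache writes '-' when no bytes were sent; this sketch maps that to 0.
    fields['bytes'] = 0 if fields['bytes'] == '-' else int(fields['bytes'])
    return fields


sample = (
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326'
)
parsed = parse_common_log_line(sample)
print(parsed['remote_host'], parsed['server_status'], parsed['bytes'])
```

Mapping a `-` byte count to zero is a choice made here; a plain `int()` conversion would raise on such lines.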
  20. Programmatic access
       • GUI is nice, but still limiting
       • SnapScript: An API library
       • Python, PHP, more to come
  21. Creating a resource

          # Create a new resource
          staff_res_def = Resource(component='SnapLogic.Components.CsvRead')
          staff_res_def.props.URI         = '/SnapLogic/Resources/Staff'
          staff_res_def.props.description = 'Read from the employee file'
          staff_res_def.props.title       = 'Staff'
          staff_res_def.props.delimiter   = '$?{DELIMITER}'
          staff_res_def.props.filename    = '$?{INPUTFILE}'
          staff_res_def.props.parameters  = (
              ('INPUTFILE', Param.Required, ''),
              ('DELIMITER', Param.Optional, ',')
          )

          # Define the output view of the resource
          staff_res_def.props.outputview.output1 = (
              ('Last_Name',  'string', 'Employee last name'),
              ('First_Name', 'string', 'Employee first name'),
              ('Salary',     'number', 'Annual income')
          )
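
The `$?{NAME}` placeholders in the resource properties suggest that parameter values are substituted in when the resource is used. The slides do not spell out the exact rules, so the following is only a sketch under stated assumptions: a supplied value wins, an optional parameter falls back to its default, and a missing required parameter is an error. `resolve`, `PLACEHOLDER`, and the boolean `REQUIRED`/`OPTIONAL` flags are hypothetical names, not part of SnapScript.

```python
import re

# Matches placeholders of the form $?{NAME}, as written on the slide.
PLACEHOLDER = re.compile(r'\$\?\{(\w+)\}')

# Simplified stand-ins for Param.Required / Param.Optional.
REQUIRED, OPTIONAL = True, False

DECLARED = (
    ('INPUTFILE', REQUIRED, ''),
    ('DELIMITER', OPTIONAL, ','),
)


def resolve(template, declared, supplied):
    """Fill $?{NAME} placeholders from supplied values or declared defaults."""
    values = {}
    for name, required, default in declared:
        if name in supplied:
            values[name] = supplied[name]
        elif required:
            raise ValueError("missing required parameter: " + name)
        else:
            values[name] = default
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


supplied = {'INPUTFILE': 'staff.csv'}
filename = resolve('$?{INPUTFILE}', DECLARED, supplied)
delimiter = resolve('$?{DELIMITER}', DECLARED, supplied)
print(filename, repr(delimiter))
```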
  22. Creating a pipeline

          # Create a new pipeline
          p = Pipeline()
          p.props.URI   = '/SnapLogic/Pipelines/empl_salary_inc'
          p.props.title = 'Employee_Salary_Increase'

          # Select the resources in the pipeline
          p.resources.Staff    = staff_res_def.instance()
          p.resources.PayRaise = increase_salary_res_def.instance()

          # Link the resources in the pipeline
          link = (
              ('Last_Name',  'last'),
              ('First_Name', 'first'),
              ('Salary',     'salary')
          )
          p.linkViews('Staff', 'output1', 'PayRaise', 'input1', link)
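
The `link` tuple pairs each field of the upstream output view with a field of the downstream input view. One way to picture what such a link does to a record is the small sketch below; `apply_link` is a hypothetical helper written for this example, not a SnapScript function, and it assumes that only linked fields are passed on.

```python
# Field pairs copied from the slide: (output view field, input view field).
link = (
    ('Last_Name', 'last'),
    ('First_Name', 'first'),
    ('Salary', 'salary'),
)


def apply_link(record, link):
    """Rename fields from the upstream view to the downstream view.

    Fields that are not named in the link are dropped (an assumption).
    """
    return {dst: record[src] for src, dst in link}


staff_row = {'Last_Name': 'Smith', 'First_Name': 'Ann', 'Salary': 50000}
linked = apply_link(staff_row, link)
print(linked)
```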
  23. Pipeline parameters

          # Define the user-visible parameters of the pipeline
          p.props.parameters = (
              ('INCREASE', Param.Required, ''),
          )

          # Map values to the parameters of the pipeline's resources
          p.props.parammap = (
              (Param.Parameter, 'INCREASE', 'PayRaise', 'PERC_INCREASE'),
              (Param.Constant, 'file://foo/staff.csv', 'Staff', 'INPUTFILE')
          )

          # Confirm correctness and publish as a new resource
          p.check()
          p.saveToServer(connection)
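
Each `parammap` entry reads as: take either a pipeline parameter or a constant, and hand it to a named parameter of a named resource. The sketch below works through that mapping for the two entries on the slide. The `PARAMETER`/`CONSTANT` markers and `resolve_parammap` are hypothetical stand-ins for illustration; how the framework actually evaluates the map is not shown in the slides.

```python
# Simplified stand-ins for Param.Parameter / Param.Constant.
PARAMETER, CONSTANT = 'parameter', 'constant'

# Entries copied from the slide: (kind, source, resource, target parameter).
parammap = (
    (PARAMETER, 'INCREASE', 'PayRaise', 'PERC_INCREASE'),
    (CONSTANT, 'file://foo/staff.csv', 'Staff', 'INPUTFILE'),
)


def resolve_parammap(parammap, pipeline_args):
    """Work out which value each resource parameter receives."""
    per_resource = {}
    for kind, source, resource, target in parammap:
        # A PARAMETER entry looks up the caller's value; a CONSTANT is used as-is.
        value = pipeline_args[source] if kind == PARAMETER else source
        per_resource.setdefault(resource, {})[target] = value
    return per_resource


resolved = resolve_parammap(parammap, {'INCREASE': '5'})
print(resolved)
```

With this mapping the caller of the pipeline only sees `INCREASE`; the input file is fixed inside the pipeline and hidden from the caller.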
  24. The end
       Any questions?
       jbrendel@snaplogic.org
