TALEND OPEN STUDIO
                          OR, HOW I LEARNED TO RELAX AND ENJOY ETL




Thursday, March 5, 2009
TIM BERGLUND



DOSUG




DBA

ACTUAL DATA

                             DISPERSED
                           IDIOSYNCRATIC
                              MESSY


DATA THE BUSINESS
                        LIKES
                             CENTRALIZED
                             CONSISTENT
                          ANSWERS QUESTIONS


OUR SOFTWARE




OUR REFERENCE




ABOUT

                          AN OPEN-SOURCE STARTUP
                             BASED IN FRANCE
                          ENVISIONS LOW-COST ETL
                       (EXTRACT, TRANSFORM, AND LOAD)
                      FREE AND SUBSCRIPTION PRODUCTS
                              ECLIPSE-BASED
                      THEREFORE JAVA-BASED, BUT HAS
                          STRANGE PERL OPTION
                           HAS A FRENCH ACCENT
                          http://www.talend.com

BASIC COMPONENTS



Business Modeler


                            Data Processing
                                “Jobs”


                            Custom Java and
                            Groovy Scripts


                          Metadata Repository




BUSINESS MODELER




JOB DESIGNER

                             VISUAL WORKSPACE
                          WHERE DEVELOPMENT TAKES
                                   PLACE



METADATA REPOSITORY

                          DATABASE CONNECTIONS
                               INPUT FILES
                              OUTPUT FILES
                                SCHEMAS
                              WEB SERVICES
DEMO, PLX



AND NOW,
                          SOME THEORY


THE SYSTEM OF
                            RECORD

                           TRANSACTIONAL
                          HEAVILY NORMALIZED
                               (IDEALLY)
                          PRODUCTION SYSTEM

THE DATA
                          WAREHOUSE

                             DERIVED
                          NONTRANSACTIONAL
                           DENORMALIZED
                          IMPORTANT, BUT OFFLINE

HOW TO DO IT



EXTRACT, TRANSFORM,
                       AND LOAD

                          EXTRACTING
                          CLEANING
                          CONFORMING
                          THE SNOWFLAKE SCHEMA
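
As a rough illustration of the extracting, cleaning, and conforming steps, here is a minimal Python sketch. It is not Talend output (Talend jobs generate Java or Perl). The two CSV layouts, the field names, and the rule that two accounts are "the same" when their e-mail addresses match are all assumptions invented for this example.

```python
import csv
import io

# Hypothetical extracts from two source systems (layouts are assumed).
SALES_CSV = "account_id,email,name\n1,Ann@Example.com,Ann Lee\n2,,No Email\n"
SUPPORT_CSV = "ticket_owner,email_address\nA. Lee,ann@example.com \nBob Ray,bob@example.com\n"


def extract(text):
    """Extract: read raw rows from a source (here, CSV text) as dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows, email_field):
    """Clean: normalize e-mail values and set aside rows that fail a constraint."""
    good, rejected = [], []
    for row in rows:
        email = (row.get(email_field) or "").strip().lower()
        if "@" in email:
            good.append({**row, email_field: email})
        else:
            rejected.append(row)
    return good, rejected


def conform(sales_rows, support_rows):
    """Conform: reconcile both models into one account record per e-mail."""
    accounts = {}
    for row in sales_rows:
        accounts[row["email"]] = {
            "email": row["email"],
            "name": row["name"],
            "sources": ["sales"],
        }
    for row in support_rows:
        key = row["email_address"]
        if key in accounts:
            # Same e-mail seen in both systems: treat it as one account.
            accounts[key]["sources"].append("support")
        else:
            accounts[key] = {
                "email": key,
                "name": row["ticket_owner"],
                "sources": ["support"],
            }
    return sorted(accounts.values(), key=lambda a: a["email"])


sales, sales_rejects = clean(extract(SALES_CSV), "email")
support, support_rejects = clean(extract(SUPPORT_CSV), "email_address")
conformed = conform(sales, support)
```

Rejected rows are kept rather than dropped so they can be reviewed; a real job would likely route them to a reject output of some kind.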



THE SNOWFLAKE
                             SCHEMA

                  ONE FACT REFERENCING MANY
                         DIMENSIONS



THE FACT TABLE
                                  order_fact
                                    user_id (FK)
                              shipping_location_id (FK)
                               billing_location_id (FK)
                              payment_method_id (FK)
                              line_item_group_id (FK)
                               order_timestamp (fact)
                                     total (fact)
                                   subtotal (fact)
                                 shipping_cost (fact)



THE DIMENSION TABLE
                          user_dimension
                                  id (PK)
                           user_id (business key)
                                 username
                                first_name
                                last_name
                                 company
                           show_only_same_mfg
                          show_nonzero_inventory
                            mailing_list_opt_in
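
To make the fact/dimension relationship concrete, the sketch below builds a cut-down version of the two tables in an in-memory SQLite database and runs a report-style query against them. The column subset, the sample rows, the choice of SQLite, and the reading that order_fact.user_id points at user_dimension.id are assumptions for illustration only; the talk does not say which database backs the warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_dimension (
    id INTEGER PRIMARY KEY,          -- surrogate key owned by the warehouse
    user_id INTEGER,                 -- business key from the system of record
    username TEXT,
    company TEXT,
    mailing_list_opt_in INTEGER
);
CREATE TABLE order_fact (
    user_id INTEGER REFERENCES user_dimension(id),  -- FK to the dimension
    order_timestamp TEXT,
    total REAL,
    subtotal REAL,
    shipping_cost REAL
);
""")

# Made-up sample data: two users, three orders.
conn.executemany(
    "INSERT INTO user_dimension VALUES (?, ?, ?, ?, ?)",
    [(1, 1001, "alice", "Acme", 1), (2, 1002, "bob", "Initech", 0)],
)
conn.executemany(
    "INSERT INTO order_fact VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2009-03-01T10:00:00", 25.0, 20.0, 5.0),
        (1, "2009-03-02T11:30:00", 12.5, 10.0, 2.5),
        (2, "2009-03-02T12:00:00", 40.0, 35.0, 5.0),
    ],
)

# A typical analysis query: one join from the fact to a dimension,
# then aggregate the measured quantities by a dimension attribute.
revenue_by_company = conn.execute(
    """
    SELECT d.company, COUNT(*) AS orders, SUM(f.total) AS revenue
    FROM order_fact AS f
    JOIN user_dimension AS d ON d.id = f.user_id
    GROUP BY d.company
    ORDER BY d.company
    """
).fetchall()
```

The point of the shape is that a business question needs only one join per dimension, instead of a walk through many normalized tables.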



PUT TOGETHER

                  order_fact                      user_dimension
                    user_id (FK)                    id (PK)
                    shipping_location_id (FK)       internal_user_id (business key)
                    billing_location_id (FK)        username
                    payment_method_id (FK)          first_name
                    line_item_group_id (FK)         last_name
                    order_timestamp (fact)          company
                    total (fact)                    city
                    subtotal (fact)                 state
                    shipping_cost (fact)            mailing_list_opt_in

LET’S SEE ANOTHER
                          DEMO!


NOW WHAT?

                                REPORTING
                               OLAP TOOLS
                           BUSINESS DASHBOARDS
                          DATA LIFE CYCLE OPTIONS

A CRITIQUE

                          DOES VISUAL PROGRAMMING
                               REALLY WORK?
                          WHY NOT JUST USE GROOVY?

COMMERCIAL OPTIONS

                          THERE ARE MANY
                          THEY ALL PROBABLY INVOLVE
                                    GOLF
                          THIS IS DOSUG, SO WE’LL MOVE
                                     ON

ACKNOWLEDGEMENTS


                             THANKS TO
                     www.intellidata.net FOR THE
                             TEST DATA!



THANK YOU!

                          TIM BERGLUND
                          AUGUST TECHNOLOGY GROUP, LLC
                          http://www.augusttechgroup.com
                          tim.berglund@augusttechgroup.com
                          @tlberglund




PHOTO CREDITS

                            OIL DRUMS: HTTP://WWW.FLICKR.COM/PHOTOS/THE_JUSTIFIED_SINNER/2720599186/
                                  JUNGLE: HTTP://WWW.FLICKR.COM/PHOTOS/LOLLYKNIT/1155225799/
                          FRENCH GARDEN: HTTP://WWW.FLICKR.COM/PHOTOS/NOMAD-PHOTOGRAPHY/23295537/
                              SNOWFLAKE: HTTP://WWW.FLICKR.COM/PHOTOS/JOHNCHARLTON/360919818/




Talend Open Studio: How I Learned To Relax And Enjoy ETL

Editor's Notes

  1. I’m your presenter.
  2. I work with a lot of open source technologies.
  3. I’m also a Java developer.
  4. I haven’t written a lot of Java in the past year. Mostly these days I write Groovy.
  5. I participate in the local development community by serving on the boards of www.denveropensource.org and www.iasadenver.org.
  6. The August Technology Group is my consulting firm. The group’s cardinality is not large.
  7. I’m all those things, but I am not a DBA. Being a DBA is a valuable specialty one has to pursue to the exclusion of being a developer. Notwithstanding that fact, we’re going to pretend to be DBAs in this talk.
  8. As enterprise and web developers, our worlds are filled with data.
  9. Unfortunately, the data is not always in the condition we’d like. It’s in the wrong place, it’s not clean, and it’s stored in containers that might not suit our purposes.
  10. Businesses run with data in this condition, so it’s hardly worthless. There are some things it can’t do in this form, though.
  11. To be useful for analysis purposes, data should be in one place, and should be structured in such a way that it cooperates with the analysis process, rather than the read-write processes required by the operation of the business.
  12. We’re going to be looking at the free and open-source Talend Open Studio to help us with this problem.
  13. We’ll also refer to some basic data warehousing theory covered in this book. It’s a great reference if you need to know the topic.
  14. Talend wants to provide a lower-cost ETL tool set still capable of enterprise-grade work.
  15. We’ll be looking at the free product in this presentation, but they also offer subscription products with features useful for team environments.
  16. Your Talend jobs will generate code. We’ll be looking at the Java option, but you can have them generate Perl code as well. I suspect this doesn’t see many enterprise deployments.
  17. Often text that appears in the UI will sound kind of French. To American audiences, this lends the program an air of elegance! :)
  18. Their web site, for further research.
  19. Let’s look over the basic parts of Talend Open Studio that we’ll be interacting with.
  20. We will focus our attention on the Job Designer and the Metadata Repository. The ability to add custom Java and Groovy code is a very important feature of the product, but we don’t have time to cover it here.
  21. The Business Modeler is a very simple drawing program that is for documentation purposes only. You’re better off using OmniGraffle or Visio, and focusing your attention on the things Talend does well.
  22. In the Job Designer, we’ll drag visual components from a palette, configure their properties, and connect them with data and event flows.
  23. Here’s a screen shot of a simple job.
  24. The Metadata Repository holds more than just schemas. It also holds actual database connections (potentially to disparate relational sources), input and output file definitions, and even WSDL files if you’re connecting to SOAP web services.
  25. Let’s build a simple transform of an OPML file into an Excel spreadsheet containing a list of the RSS feeds and a text file listing the different types of links in the file.
  26. Now, a very brief review of data warehousing by a non-DBA.
  27. First there is the system or set of systems containing the data of interest.
  28. Transactionality ensures consistency in the presence of write operations that may fail while in process. It works great when there are many small reads and writes. It doesn’t work well for large bulk writes.
  29. We all know the great things normalization does for us, and we love it. However, a normalized schema is a lot harder for business users to query.
  30. We really shouldn’t be running reports against a production system. Even if we don’t break it (and given enough time, we probably will), we will certainly slow it down. We should keep our hands off.
  31. The data warehouse knows nothing of substance other than what it gets from the system of record.
  32. We only ever write to the data warehouse in large, bulk updates, so we don’t necessarily want transactions. They wouldn’t help us, but they would slow us down.
  33. The tables in the warehouse are often very wide, containing lots of null fields as necessary. They’re not particularly normalized, but they are easy to query.
  34. It may be a “production” system of sorts, but when it goes down, lines of business do not close. Decision-making may be compromised, but business can continue. This doesn’t mean it will necessarily lack an uptime SLA, but it does mean that only internal users will be affected by downtime.
  35. Let’s transform a transactional database into a data warehouse using Talend Open Studio.
  36. Start with this. This is an actual schema from a startup I was involved in. No, you can’t look any closer. :)
  37. This is what we want to produce. To be fair, this is just a part of the order schema. The whole ERD would take a bigger warehouse schema to represent it all.
  38. Extracting means obtaining and copying data from disparate sources. That might just mean queries against a relational DB, or it might mean XML files, text lookup tables, Excel, Access, web services, web logs...
  39. Extracted data should be validated. Make sure data meets constraints, referential integrity is enforced, business rules are met, etc.
  40. Because it comes from all over, different data models must be reconciled. Customer Support and Sales might both create account records—are they the same?
  41. This extracted, cleaned, conformed data is transformed into a standard schema form, regardless of how the DB designer built the transactional database.
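  42. The snowflake schema is a standard pattern of the data warehouse. Strictly, the shape used here (one fact table referencing flat, denormalized dimensions) is usually called a star schema; a snowflake schema is the variant in which the dimensions are themselves normalized into further tables.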
  43. A fact table represents a single measurable business event. In this case, we are measuring an order. It has the measured quantities that apply to the fact’s grain—the kind of thing it is measuring—plus foreign keys to dimensions.
  44. Dimensions supply detail about a fact. They are heavily denormalized, possibly 100 or 200 columns wide, and may have many nulls. Generally they do not have any foreign keys, so all columns are data values.
  45. After your transactional data is transformed into the warehouse, the offline snowflake schema is now available to standard tools that can perform analysis and reports more easily, and without stressing the production system.
  46. In general, I don’t believe it does. But this is not the general case; it is ETL. The problem domain is specialized enough that visual tools seem to be pretty effective.
  47. Custom coding is always an option for ETL systems (in Groovy or any other language), but it’s probably not the right option. The constraints imposed by a tool will probably result in a simpler, more robust system.
  48. ETL is not cheap.
  49. This is an open source group, so we’ll stick to the open source options. The commercial offerings are surely worthy products that have their own strengths.
  50. All images used in this presentation are either licensed from iStockphoto.com or are Creative Commons Attribution works from flickr.com.
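
For readers without Talend at hand, the first demo (note 25) can be approximated in a few lines of Python. This is only a sketch of the same transform, not what the Talend job generates: the sample OPML document is invented, CSV text stands in for the Excel spreadsheet, and reading "types of links" as the distinct values of each outline's type attribute is an assumption.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Invented sample input; a real OPML export would be read from a file.
OPML = """<?xml version="1.0"?>
<opml version="1.1">
  <body>
    <outline text="News">
      <outline type="rss" text="Example Feed" xmlUrl="http://example.com/feed.xml"/>
      <outline type="link" text="Example Site" url="http://example.com/"/>
    </outline>
    <outline type="rss" text="Other Feed" xmlUrl="http://example.org/rss"/>
  </body>
</opml>"""

root = ET.fromstring(OPML)
outlines = list(root.iter("outline"))  # every outline, nested or not

# Output 1: the list of RSS feeds (CSV here, as a stand-in for Excel).
feeds_out = io.StringIO()
writer = csv.writer(feeds_out, lineterminator="\n")
writer.writerow(["title", "xml_url"])
for node in outlines:
    if node.get("type") == "rss":
        writer.writerow([node.get("text"), node.get("xmlUrl")])

# Output 2: the distinct link types found in the file, one per line.
link_types = sorted({node.get("type") for node in outlines if node.get("type")})
types_text = "\n".join(link_types) + "\n"
```

One input feeding two differently shaped outputs is the same fan-out the Job Designer expresses by connecting one source component to two output components.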