TechFest – Open Source ETL Software David Morris Fort Worth, TX October, 2008 * For Internal Use Only *
Table of Contents <ul><li>Introduction </li></ul><ul><li>Open Source Software </li></ul><ul><ul><li>Brief History of Open ...
This presentation will attempt to clarify misconceptions about open source software and discuss how it can benefit IT orga...
Table of Contents <ul><li>Introduction </li></ul><ul><li>Open Source Software </li></ul><ul><ul><li>Brief History of Open ...
The idea of open source or free software has a rich history that began in the 1960s <ul><li>1969 - ARPANET - Advanced Rese...
“ Free as in speech, not beer.” – Richard Stallman <ul><li>“ Open source” is not free </li></ul><ul><li>Many open source s...
Open source doesn't just mean access to the source code. The distribution terms of open-source software must comply with t...
Open source doesn't just mean access to the source code. The distribution terms of open-source software must comply with t...
There are many different types of hard costs associated with leveraging open source software for enterprise IT projects <u...
There are many different types of hard costs associated with leveraging open source software for enterprise IT projects (c...
There are also soft or intangible costs associated with enterprise open source software projects and are typically harder ...
Many organizations do not have a formal process in place to do a comprehensive financial analysis of software commitments ...
There are a variety of business models that have proven to work for companies who want to make money using open source sof...
There are many barriers to open source adoption in IT organizations, most of which are risk related <ul><li>Open source li...
Enterprise open source adoption offers many benefits to IT organizations within any type of business <ul><li>Short term Co...
There are many different open source licenses and it can be difficult to distinguish one license from another <ul><li>Most...
Open source software licenses can range from very simple to relatively complex <ul><li>Software Licenses </li></ul><ul><ul...
Another great benefit of open source software is the ability to download and try it out for free, and even learn about the...
Open source software websites offer  free downloads, documentation SSIS Informatica Ab Initio Blender Talend Pentaho Propr...
Table of Contents <ul><li>Introduction </li></ul><ul><li>Open Source Software </li></ul><ul><ul><li>Brief History of Open ...
The concept of business intelligence has been around since the 1950s and became popularized in the software industry in th...
There are many “big players” in the business intelligence software space that have been around for much longer than their ...
Open source business intelligence solutions started coming onto the scene in 2000, but the space began to explode in 2005
There are a variety of open source business intelligence products that are beginning to compete against proprietary altern...
Companies that sponsor and develop open source projects or offer many different types of tools and support options that va...
There are often a wider variety of support options available when using open source software, but they tend to be less mat...
Most companies providing support for open source projects offer different support levels such as Bronze, Silver, Gold, and...
Table of Contents <ul><li>Introduction </li></ul><ul><li>Open Source Software </li></ul><ul><ul><li>Brief History of Open ...
Talend claims to be the “first provider of open source data integration software”, and while that is not really the case, ...
Talend’s software is written in Java and has a user interface built around the Eclipse IDE, and it includes a basic busine...
Pentaho Kettle provides a full spectrum of business intelligence capabilities including reporting, analysis, dashboards, d...
Pentaho Kettle provides a full spectrum of business intelligence capabilities including reporting, analysis, dashboards, d...
“ Yahoo Pipes is a powerful composition tool to aggregate, manipulate, and mashup content from around the web.”  <ul><li>W...
This Yahoo Pipes Demo uses data exported from the DeepBlue Employee List and transforms the data into a georss and kml fee...
Table of Contents <ul><li>Introduction </li></ul><ul><li>Open Source Software </li></ul><ul><ul><li>Brief History of Open ...
Thank you for attending, please let me know if you have any questions <ul><li>Special Thanks to: </li></ul><ul><li>Samir R...

Open Source ETL

A presentation I gave at the Pariveda Solutions 2008 Company Trip in Fort Worth, TX.

Published in: Technology, News & Politics
  • Hi everyone, my name is David Morris, and I’m a C2 in the Houston Office for those of you who don’t know me yet. In this presentation, I’m going to talk about open source software, and specifically ETL, or Extract, Transform and Load software. I am sure many of you are familiar with ETL software if you’ve ever used Microsoft SQL Server Integration Services. Regardless of your previous experience level with ETL software or business intelligence in general, I hope everyone learns something from my presentation today. Without further ado, thank you for coming, and I’m going to get started. I’m using an experimental plugin for PowerPoint today, so bear with me if the slide transitions are a little strange to you. I like to try new things, and it was free, so we’ll see how it goes… Go over agenda on next slide
  • Open Source ETL

    1. 1. TechFest – Open Source ETL Software David Morris Fort Worth, TX October, 2008 * For Internal Use Only *
    2. 2. Table of Contents <ul><li>Introduction </li></ul><ul><li>Open Source Software </li></ul><ul><ul><li>Brief History of Open Source Software </li></ul></ul><ul><ul><li>The Cost of Open Source </li></ul></ul><ul><ul><li>Barriers to Open Source Adoption </li></ul></ul><ul><ul><li>Open Source Licensing Overview </li></ul></ul><ul><li>Business Intelligence Software </li></ul><ul><ul><li>Business Intelligence Software Overview </li></ul></ul><ul><ul><li>Brief History of Open Source BI Software </li></ul></ul><ul><ul><li>Open Source BI Vendor Offerings </li></ul></ul><ul><li>Demonstrations </li></ul><ul><ul><li>Talend OpenStudio </li></ul></ul><ul><ul><li>Pentaho Kettle </li></ul></ul><ul><ul><li>Yahoo Pipes </li></ul></ul><ul><li>Conclusion </li></ul>
    3. 3. This presentation will attempt to clarify misconceptions about open source software and discuss how it can benefit IT organizations in many different ways <ul><li>Open Source vs. Free Software </li></ul><ul><li>Many people do not understand the definition of open source </li></ul><ul><li>There is no such thing as free software </li></ul><ul><li>Benefits of Open Source </li></ul><ul><li>The potential for cost savings is the number one motivation to use enterprise open source software </li></ul><ul><ul><li>Software license costs are the component where savings are most likely to occur </li></ul></ul><ul><ul><li>Cost savings in general are difficult to calculate </li></ul></ul><ul><li>Flexibility often turns out to be the most beneficial result of using open source software </li></ul><ul><li>Enterprise open source software has cost the proprietary software industry an estimated $60 billion per year </li></ul><ul><li>Many organizations do not have a formal process in place to do a comprehensive financial analysis of software commitments </li></ul><ul><li>Business Intelligence Software </li></ul><ul><li>Open source business intelligence software has matured over the last three years and many companies are beginning to offer an impressive set of tools free of up-front licensing costs </li></ul><ul><li>These open source projects are typically backed by a corporation that pays full-time employees to develop the core code base, and earns money through support contracts and consulting services </li></ul><ul><li>Many of the open source BI products are built using Java technology, with user interfaces built over the Eclipse IDE </li></ul><ul><li>Evaluating open source BI software can offer a fresh perspective on techniques and processes used by competing proprietary software </li></ul><ul><li>Many of these tools have the potential to be real competitors in the BI space </li></ul>
    4. 4. Table of Contents <ul><li>Introduction </li></ul><ul><li>Open Source Software </li></ul><ul><ul><li>Brief History of Open Source Software </li></ul></ul><ul><ul><li>The Cost of Open Source </li></ul></ul><ul><ul><li>Barriers to Open Source Adoption </li></ul></ul><ul><ul><li>Open Source Licensing Overview </li></ul></ul><ul><li>Business Intelligence Software </li></ul><ul><ul><li>Business Intelligence Software Overview </li></ul></ul><ul><ul><li>Brief History of Open Source BI Software </li></ul></ul><ul><ul><li>Open Source BI Vendor Offerings </li></ul></ul><ul><li>Demonstrations </li></ul><ul><ul><li>Talend OpenStudio </li></ul></ul><ul><ul><li>Pentaho Kettle </li></ul></ul><ul><ul><li>Yahoo Pipes </li></ul></ul><ul><li>Questions </li></ul>
    5. 5. The idea of open source or free software has a rich history that began in the 1960s <ul><li>1969 - ARPANET - Advanced Research Projects Agency Network </li></ul><ul><ul><li>First operational packet switching network </li></ul></ul><ul><ul><li>Predecessor of the internet </li></ul></ul><ul><li>1970s – Email (SMTP), File Transfer Protocol, Network Voice Protocol (NVP) standards developed </li></ul><ul><li>1985 – Free Software Foundation – Richard Stallman </li></ul><ul><ul><li>Universal freedom to distribute and modify computer software without restriction </li></ul></ul><ul><ul><li>Founded to support the free software movement </li></ul></ul><ul><ul><li>Enforcement of the General Public License </li></ul></ul><ul><li>1992 – Linux kernel released under GPL – Linus Torvalds </li></ul><ul><li>1998 - Open Source Initiative (OSI) – Bruce Perens and Eric Raymond </li></ul><ul><ul><li>Formalized open source software and brought the model to major software companies </li></ul></ul><ul><ul><li>Formulated the Open Source Definition to determine which licenses are actually “open source” licenses </li></ul></ul><ul><li>1998 – Netscape releases the Navigator source code </li></ul><ul><ul><li>the basis for what is known today as Firefox and Thunderbird </li></ul></ul><ul><li>1999 – Sun Microsystems acquires StarOffice </li></ul><ul><ul><li>source code released in 2000 as OpenOffice.org </li></ul></ul>
    6. 6. “Free as in speech, not beer.” – Richard Stallman <ul><li>“Open source” is not free </li></ul><ul><li>Many open source software licenses are free, but some licenses have costs associated with them </li></ul><ul><li>Many mature open source projects, especially operating systems, earn money from paid support and documentation </li></ul><ul><li>“Open source is free like a puppy is free.” – Scott McNealy, Chairman of Sun Microsystems </li></ul><ul><li>“Lowering the cost of goods tends to increase the total investment of the people and infrastructure that sustain it.” – Eric Raymond, The Magic Cauldron </li></ul><ul><li>http://www.gnu.org/philosophy/open-source-misses-the-point.html </li></ul>
    7. 7. Open source doesn't just mean access to the source code. The distribution terms of open-source software must comply with the following criteria <ul><li>Free Redistribution </li></ul><ul><li>The license shall not restrict any party from selling or giving away the software as a component of an aggregate software distribution containing programs from several different sources. The license shall not require a royalty or other fee for such sale. </li></ul><ul><li>Source Code </li></ul><ul><li>The program must include source code, and must allow distribution in source code as well as compiled form. Where some form of a product is not distributed with source code, there must be a well-publicized means of obtaining the source code for no more than a reasonable reproduction cost preferably, downloading via the Internet without charge. The source code must be the preferred form in which a programmer would modify the program. Deliberately obfuscated source code is not allowed. Intermediate forms such as the output of a preprocessor or translator are not allowed. </li></ul><ul><li>Derived Works </li></ul><ul><li>The license must allow modifications and derived works, and must allow them to be distributed under the same terms as the license of the original software. </li></ul><ul><li>Integrity of The Author's Source Code </li></ul><ul><li>The license may restrict source-code from being distributed in modified form  only  if the license allows the distribution of &quot;patch files&quot; with the source code for the purpose of modifying the program at build time. The license must explicitly permit distribution of software built from modified source code. The license may require derived works to carry a different name or version number from the original software. </li></ul><ul><li>No Discrimination Against Persons or Groups </li></ul><ul><li>The license must not discriminate against any person or group of persons. 
</li></ul><ul><li>http://opensource.org/docs/osd </li></ul>
    8. 8. Open source doesn't just mean access to the source code. The distribution terms of open-source software must comply with the following criteria (cont.) <ul><li>No Discrimination Against Fields of Endeavor </li></ul><ul><li>The license must not restrict anyone from making use of the program in a specific field of endeavor. For example, it may not restrict the program from being used in a business, or from being used for genetic research. </li></ul><ul><li>Distribution of License </li></ul><ul><li>The rights attached to the program must apply to all to whom the program is redistributed without the need for execution of an additional license by those parties. </li></ul><ul><li>License Must Not Be Specific to a Product </li></ul><ul><li>The rights attached to the program must not depend on the program's being part of a particular software distribution. If the program is extracted from that distribution and used or distributed within the terms of the program's license, all parties to whom the program is redistributed should have the same rights as those that are granted in conjunction with the original software distribution. </li></ul><ul><li>License Must Not Restrict Other Software </li></ul><ul><li>The license must not place restrictions on other software that is distributed along with the licensed software. For example, the license must not insist that all other programs distributed on the same medium must be open-source software. </li></ul><ul><li>License Must Be Technology-Neutral </li></ul><ul><li>No provision of the license may be predicated on any individual technology or style of interface. </li></ul>
    9. 9. There are many different types of hard costs associated with leveraging open source software for enterprise IT projects <ul><li>Software Licenses </li></ul><ul><ul><li>Referring to the licenses themselves, not legal terms and conditions </li></ul></ul><ul><ul><li>Often offers the most potential cost savings vs. proprietary software </li></ul></ul><ul><li>Hardware </li></ul><ul><ul><li>Open source software often has reduced hardware requirements </li></ul></ul><ul><li>Support </li></ul><ul><ul><li>Often more but less mature options for support in open source projects </li></ul></ul><ul><li>Development </li></ul><ul><ul><li>Access to source code can help make development easier and less costly </li></ul></ul><ul><ul><li>Lack of feature parity with proprietary software may create a need for more custom development </li></ul></ul><ul><ul><li>Opportunity to give source code back to the open source community </li></ul></ul><ul><li>Professional Services </li></ul><ul><ul><li>Development, installation, and configuration costs </li></ul></ul><ul><ul><li>Offered by many open source software vendors </li></ul></ul><ul><li>Training </li></ul><ul><ul><li>Offered directly by software vendor </li></ul></ul><ul><ul><li>Through a professional training center or educational institution </li></ul></ul><ul><ul><li>On-site or off-site, or online </li></ul></ul><ul><li>Testing </li></ul><ul><ul><li>Unit testing, performance testing, functional testing, test scripts, use-case scenarios, quality assurance costs </li></ul></ul>
    10. 10. There are many different types of hard costs associated with leveraging open source software for enterprise IT projects (cont.) <ul><li>Operations (Manageability) </li></ul><ul><ul><li>Mix of labor, management and monitoring tools configuration, creation of manuals to support operations </li></ul></ul><ul><ul><li>Open source tends to have less mature management capabilities </li></ul></ul><ul><li>Staffing </li></ul><ul><ul><li>No conclusive evidence to show that staffing open source projects is cheaper than for proprietary projects </li></ul></ul><ul><li>Maintenance Contracts </li></ul><ul><ul><li>15-25 percent of the license costs or equipment costs per year. </li></ul></ul><ul><ul><li>Calculated using the list price, not the actual paid price </li></ul></ul><ul><ul><li>Treated separate from support contracts in many open source projects </li></ul></ul><ul><ul><li>Costs associated with patching and updating software over time </li></ul></ul><ul><ul><li>Often free with most zero-cost open source software licenses </li></ul></ul><ul><li>Migration </li></ul><ul><ul><li>Especially for system replacement projects where existing data must be migrated to the new application </li></ul></ul><ul><li>Environmental </li></ul><ul><ul><li>Datacenter and hosting costs, floor space, power, bandwidth, hardware leasing </li></ul></ul><ul><li>Documentation </li></ul><ul><ul><li>Often coincides with the training category above </li></ul></ul><ul><li>Configuration </li></ul><ul><ul><li>Often captured with the development and operations categories </li></ul></ul>
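The maintenance-contract rule of thumb on this slide (15-25 percent of license cost per year, applied to the list price rather than the price actually paid) can be sketched as a short calculation. The function name and the dollar amounts below are hypothetical and chosen only for illustration; they do not come from the presentation.

```python
from typing import Tuple


def annual_maintenance_range(list_price: float,
                             low_pct: int = 15,
                             high_pct: int = 25) -> Tuple[float, float]:
    """Return the (low, high) yearly maintenance estimate.

    The percentages are applied to the LIST price, following the slide's
    note that the actual paid price is not the basis for the calculation.
    """
    return (list_price * low_pct / 100, list_price * high_pct / 100)


if __name__ == "__main__":
    # Hypothetical numbers: a license listed at $200,000 but negotiated
    # down to $150,000. The discount does not reduce the maintenance base.
    list_price = 200_000
    paid_price = 150_000
    low, high = annual_maintenance_range(list_price)
    print(f"Paid ${paid_price:,}, maintenance ${low:,.0f} to ${high:,.0f} per year")
```

The point of the sketch is that a negotiated license discount leaves the yearly maintenance bill unchanged when the vendor bases it on list price.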
    11. 11. There are also soft or intangible costs associated with enterprise open source software projects, which are typically harder to calculate than hard costs <ul><li>Downtime </li></ul><ul><ul><li>Financial impact of system outages </li></ul></ul><ul><li>IP Risk </li></ul><ul><ul><li>Legal/litigation costs </li></ul></ul><ul><li>License Auditing Risk </li></ul><ul><ul><li>Resources required to perform a vendor-required license audit </li></ul></ul><ul><li>License Management </li></ul><ul><ul><li>Resources required to manage deployment of licenses and purchase of additional licenses as the deployment grows </li></ul></ul><ul><li>License Negotiation Overhead </li></ul><ul><ul><li>Legal costs required in negotiating the software licensing contract </li></ul></ul><ul><li>Planning </li></ul><ul><ul><li>Resources for planning and overhead </li></ul></ul><ul><li>Process Inefficiencies </li></ul><ul><ul><li>Lost time and costs related to process activities </li></ul></ul><ul><li>Procurement Overhead </li></ul><ul><ul><li>Purchase cost and resources required to procure the software </li></ul></ul><ul><li>Productivity </li></ul><ul><ul><li>Efficiencies from using the software </li></ul></ul><ul><li>Reliability </li></ul><ul><ul><li>Financial impact of improved system reliability and uptime </li></ul></ul><ul><li>Support Quality </li></ul><ul><ul><li>Resources required for software support </li></ul></ul>
    12. 12. Many organizations do not have a formal process in place to do a comprehensive financial analysis of software commitments <ul><li>Below is an example financial analysis spreadsheet comparing the use of open source software vs. proprietary for an enterprise IT project. It is interesting to look at to see how misleading many of these open source software vendors can be when they are desperate for a new client. </li></ul>

        451 CAOS Report 2 - Cost Conscious: A practical guide for understanding and calculating the financial benefits of open source for enterprise IT projects (The 451 Group)
        Calculator - Example A

        Open Source Option
                                 Initial Investment        Year 1        Year 2        Year 3          TOTALS
        HARD COSTS                       $76,250.00    $41,200.00   $111,150.00    $49,100.00     $277,700.00
        Software Licenses                         -             -             -             -           $0.00
        Hardware                         $25,000.00     $3,000.00     $3,000.00     $3,000.00      $34,000.00
        Support                                   -    $25,000.00    $30,000.00    $35,000.00      $90,000.00
        Development                      $22,000.00     $2,500.00     $5,000.00     $2,500.00      $32,000.00
        Professional Services            $12,500.00     $3,000.00             -             -      $15,500.00
        Training                          $7,500.00             -             -             -       $7,500.00
        Testing                           $3,000.00     $3,000.00     $3,000.00     $3,000.00      $12,000.00
        Operations                        $2,500.00       $500.00       $500.00       $500.00       $4,000.00
        Staffing                                  -             -    $65,000.00             -      $65,000.00
        Maintenance Contracts             $3,750.00     $4,200.00     $4,650.00     $5,100.00      $17,700.00
        SOFT COSTS                                                                                          -
        INTERNAL COSTS                                                                                      -
        REVENUE                                       $250,000.00   $500,000.00   $800,000.00   $1,550,000.00
        CASHFLOW
        Period                          -$76,250.00   $208,800.00   $388,850.00   $750,900.00
        Cumulative (Payback)            -$76,250.00   $132,550.00   $521,400.00 $1,272,300.00
        RATE OF RETURN                                       607%          450%         1629%
        PAYBACK PERIOD: Year 1
        NPV: $954,643.20
        IRR: 340%
        Cost of capital: 12.00%

        Proprietary/Existing Option
                                 Initial Investment        Year 1        Year 2        Year 3          TOTALS
        HARD COSTS                      $296,250.00    $99,200.00   $176,150.00   $104,100.00     $675,700.00
        Software Licenses               $200,000.00    $50,000.00    $50,000.00    $50,000.00     $350,000.00
        Hardware                         $25,000.00     $3,000.00     $3,000.00     $3,000.00      $34,000.00
        Support                                   -    $30,000.00    $35,000.00    $40,000.00     $105,000.00
        Development                      $22,000.00     $2,500.00     $5,000.00     $2,500.00      $32,000.00
        Professional Services            $25,000.00     $6,000.00             -             -      $31,000.00
        Training                         $15,000.00             -             -             -      $15,000.00
        Testing                           $3,000.00     $3,000.00     $3,000.00     $3,000.00      $12,000.00
        Operations                        $2,500.00       $500.00       $500.00       $500.00       $4,000.00
        Staffing                                  -             -    $75,000.00             -      $75,000.00
        Maintenance Contracts             $3,750.00     $4,200.00     $4,650.00     $5,100.00      $17,700.00
        SOFT COSTS                                                                                          -
        INTERNAL COSTS                                                                                      -
        REVENUE                                       $250,000.00   $500,000.00   $800,000.00   $1,550,000.00
        CASHFLOW
        Period                         -$296,250.00   $150,800.00   $323,850.00   $695,900.00
        Cumulative (Payback)           -$296,250.00  -$145,450.00   $178,400.00   $874,300.00
        RATE OF RETURN                                       252%          284%          768%
        PAYBACK PERIOD: Year 2
        NPV: $591,891.97
        IRR: 82%
        Cost of capital: 12.00%

        NOTES:
        Use this spreadsheet to consider Cost Avoidance and Opportunity Costs…
        May add an additional "COST SAVINGS" column above the REVENUE column to account for existing sunk costs
        Not covered in this spreadsheet: Depreciation, Amortization, Capital and Expense Budgets, etc…
        Dave's Attempt at saving $3,750 by not buying the spreadsheet from the 451 group
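The summary figures in the spreadsheet above (payback period, NPV, IRR) can be recomputed from the per-period cashflows. The following is a minimal sketch, not the 451 Group's calculator: it assumes the initial investment is undiscounted (period 0), that later periods are discounted at the 12% cost of capital, and that IRR is found by simple bisection. All function and variable names are illustrative.

```python
from typing import List, Optional

# Per-period cashflows copied from the "Period" rows of the spreadsheet:
# index 0 is the initial investment, indexes 1-3 are Years 1-3.
OPEN_SOURCE = [-76_250.0, 208_800.0, 388_850.0, 750_900.0]
PROPRIETARY = [-296_250.0, 150_800.0, 323_850.0, 695_900.0]
COST_OF_CAPITAL = 0.12


def npv(rate: float, cashflows: List[float]) -> float:
    """Net present value, treating cashflows[0] as undiscounted (period 0)."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cashflows))


def cumulative(cashflows: List[float]) -> List[float]:
    """Running total of the cashflows (the 'Cumulative (Payback)' row)."""
    totals, running = [], 0.0
    for cf in cashflows:
        running += cf
        totals.append(running)
    return totals


def payback_year(cashflows: List[float]) -> Optional[int]:
    """First period in which the cumulative cashflow is no longer negative."""
    for year, total in enumerate(cumulative(cashflows)):
        if total >= 0:
            return year
    return None


def irr(cashflows: List[float], lo: float = 0.0, hi: float = 100.0,
        iterations: int = 200) -> Optional[float]:
    """Internal rate of return by bisection on npv(rate) = 0.

    Assumes NPV is positive at `lo` and negative at `hi`, which holds for
    an up-front cost followed by positive cashflows; returns None otherwise.
    """
    if npv(lo, cashflows) <= 0 or npv(hi, cashflows) >= 0:
        return None
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if npv(mid, cashflows) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    for name, flows in (("Open source", OPEN_SOURCE), ("Proprietary", PROPRIETARY)):
        print(name)
        print("  cumulative:", cumulative(flows))
        print("  payback year:", payback_year(flows))
        print(f"  NPV at 12%: {npv(COST_OF_CAPITAL, flows):,.2f}")
        print(f"  IRR: {irr(flows):.0%}")
```

Under these assumptions the results agree with the slide's NPV, IRR, and payback-period figures to within rounding, which suggests the spreadsheet used the same conventions. The RATE OF RETURN row appears to be each year's revenue divided by that year's hard costs and is not reproduced here.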
    13. 13. There are a variety of business models that have proven to work for companies that want to make money using open source software <ul><li>Support Sellers (otherwise known as “Give Away the Recipe, Open a Restaurant”): </li></ul><ul><ul><li>Give away the software product </li></ul></ul><ul><ul><li>Sell distribution, branding, and after-sale service </li></ul></ul><ul><ul><li>This is what Red Hat does. </li></ul></ul><ul><li>Loss Leader </li></ul><ul><ul><li>Give away open-source as a loss-leader and market positioner for closed software. </li></ul></ul><ul><ul><li>Netscape, Digium (Asterisk) </li></ul></ul><ul><li>Widget Frosting </li></ul><ul><ul><li>Hardware company goes open-source in order to get better drivers and interface tools cheaper. </li></ul></ul><ul><ul><li>Silicon Graphics (Samba), Apple (Darwin) </li></ul></ul><ul><li>Accessorizing </li></ul><ul><ul><li>Selling accessories – books, compatible hardware, complete systems with open-source software pre-installed </li></ul></ul><ul><ul><li>O'Reilly Associates, OLPC </li></ul></ul><ul><li>Source: The Open Source Initiative: http://www.opensource.org/advocacy/case_for_business.php </li></ul>
    14. 14. There are many barriers to open source adoption in IT organizations, most of which are risk related <ul><li>Open source licenses are viral </li></ul><ul><li>Open source software lacks formal support and training </li></ul><ul><li>Software changes too often and is difficult to keep up </li></ul><ul><li>Lack of a long term roadmap </li></ul><ul><li>Sunk costs in existing projects </li></ul><ul><li>Switching costs </li></ul><ul><li>De facto industry standards </li></ul>
    15. 15. Enterprise open source adoption offers many benefits to IT organizations within any type of business <ul><li>Short term cost savings </li></ul><ul><ul><li>Most IT organizations are motivated by short term cost savings when evaluating open source software adoption </li></ul></ul><ul><ul><li>The potential for saving money on software licensing fees is a huge factor in the cost equation </li></ul></ul><ul><ul><li>Software licensing fees can be a large percentage of the up-front costs for new projects as well as for massive expansion of existing projects </li></ul></ul><ul><li>Long term flexibility </li></ul><ul><ul><li>In the long run, the benefits of flexibility tend to outweigh the cost benefit of using open source software </li></ul></ul><ul><ul><li>Developers have access to the source code and have the ability to modify and customize it to suit their specific needs </li></ul></ul><ul><li>Reliability </li></ul><ul><ul><li>“If builders built houses the way programmers built programs, the first woodpecker to come along would destroy civilization.” – Gerald M. Weinberg </li></ul></ul><ul><ul><li>The internet depends on a variety of high reliability open source projects (DNS, sendmail, TCP/IP stacks, Perl) </li></ul></ul><ul><li>Avoiding vendor lock-in </li></ul><ul><ul><li>Organizations can become less vendor-dependent by using open source software </li></ul></ul><ul><ul><li>Avoiding vendor lock-in can help a company avoid severe switching costs </li></ul></ul><ul><ul><li>Royalty-Free standards vs. Free and Open Source Software (FOSS) </li></ul></ul><ul><li>Security </li></ul><ul><ul><li>“Given enough eyeballs, all bugs are shallow.” - Linus’ Law </li></ul></ul><ul><ul><li>Security problems can be identified quickly, and anyone with access to the source can fix them </li></ul></ul><ul><li>Performance </li></ul><ul><ul><li>An often-cited example is Linux vs. Windows server clusters </li></ul></ul>
    16. 16. There are many different open source licenses and it can be difficult to distinguish one license from another <ul><li>Most popular </li></ul><ul><ul><li>GNU General Public License </li></ul></ul><ul><ul><li>GNU Library or Lesser GPL </li></ul></ul><ul><ul><li>Apache Software License </li></ul></ul><ul><ul><li>Berkeley Software Distribution (BSD) </li></ul></ul><ul><ul><li>MIT License </li></ul></ul><ul><ul><li>Mozilla Public License </li></ul></ul><ul><ul><li>Eclipse Public License </li></ul></ul><ul><li>Special Purpose </li></ul><ul><ul><li>Educational Community License </li></ul></ul><ul><ul><li>NASA Open Source Agreement 1.3 </li></ul></ul><ul><ul><li>Open Group Test Suite License </li></ul></ul><ul><li>Miscellaneous </li></ul><ul><ul><li>Adaptive Public License </li></ul></ul><ul><ul><li>Artistic License 2.0 </li></ul></ul><ul><ul><li>Open Software License </li></ul></ul><ul><ul><li>Qt Public License </li></ul></ul><ul><li>And many more… </li></ul><ul><ul><li>http://www.opensource.org/licenses/category </li></ul></ul><ul><ul><li>http://en.wikipedia.org/wiki/Comparison_of_free_software_licences </li></ul></ul>
    17. 17. Open source software licenses can range from very simple to relatively complex <ul><li>Software Licenses </li></ul><ul><ul><li>Cost of the actual license </li></ul></ul><ul><ul><li>Many open source vendors have a dual license model </li></ul></ul><ul><ul><li>Not the legal licensing terms or conditions </li></ul></ul><ul><ul><li>Seen as the greatest potential for savings in an open source project </li></ul></ul><ul><ul><li>Savings on licenses often used to offset training and professional services costs </li></ul></ul><ul><ul><li>Can include client access licenses, desktop licenses, database license and development tools </li></ul></ul><ul><ul><li>Based on the number of CPUs or number of users </li></ul></ul><ul><ul><li>Every vendor has their own rules </li></ul></ul><ul><ul><li>Makes calculating project costs difficult </li></ul></ul><ul><li>Dual license model </li></ul><ul><ul><li>Choose between an open source (free) license or a commercial license that costs money </li></ul></ul><ul><ul><li>Trolltech Qt Example </li></ul></ul><ul><ul><li>Motivated by market segregation based business models and license compatibility needs </li></ul></ul><ul><li>Open Core License model </li></ul><ul><ul><li>core is GPL: if you embed the GPL in closed source, you pay a fee </li></ul></ul><ul><ul><li>technical support of GPL product may be offered for a fee (up for debate as to whether it must be offered) </li></ul></ul><ul><ul><li>annual commercial subscription includes: indemnity, technical support, and additional features and/or platform support. 
</li></ul></ul><ul><ul><li>Whether additional commercial features should have viewable or closed source, and whether they should become GPL after a time bomb period, are both up for debate </li></ul></ul><ul><ul><li>professional services and training are for a fee </li></ul></ul><ul><li>Licensing cost comparison works for new projects, but not necessarily existing projects </li></ul><ul><li>Must be estimated over the life of the project </li></ul><ul><li>Zero cost open source software has caused proprietary vendors to lower their prices and this trend will continue </li></ul><ul><ul><li>An estimated $60 billion per year is lost by proprietary software vendors </li></ul></ul>
    18. 18. Another great benefit of open source software is the ability to download and try it out for free, and even learn about the development history and statistics <ul><li>Ohloh.net is a site that gives everyone more visibility into open source software projects by providing statistics, tracking code commit history, providing package downloads, etc… </li></ul>
    19. 19. Open source software websites offer free downloads and documentation [Figure: product websites grouped as Proprietary (SSIS, Informatica, Ab Initio) and Open Source (Blender, Talend, Pentaho)]
    20. 20. Table of Contents <ul><li>Introduction </li></ul><ul><li>Open Source Software </li></ul><ul><ul><li>Brief History of Open Source Software </li></ul></ul><ul><ul><li>The Cost of Open Source </li></ul></ul><ul><ul><li>Barriers to Open Source Adoption </li></ul></ul><ul><ul><li>Open Source Licensing Overview </li></ul></ul><ul><li>Business Intelligence Software </li></ul><ul><ul><li>Business Intelligence Software Overview </li></ul></ul><ul><ul><li>Brief History of Open Source BI Software </li></ul></ul><ul><ul><li>Open Source BI Vendor Offerings </li></ul></ul><ul><li>Demonstrations </li></ul><ul><ul><li>Talend OpenStudio </li></ul></ul><ul><ul><li>Pentaho Kettle </li></ul></ul><ul><ul><li>Yahoo Pipes </li></ul></ul><ul><li>Questions </li></ul>
    21. 21. The concept of business intelligence has been around since the 1950s and became popularized in the software industry in the late 1980s and early 1990s <ul><li>1958: Business Intelligence term coined by Hans Peter Luhn – “to support better decision making” </li></ul><ul><ul><li>In this paper, business is a collection of activities carried on for whatever purpose, be it science, technology, commerce, industry, law, government, defense, et cetera. The communication facility serving the conduct of a business (in the broad sense) may be referred to as an intelligence system. The notion of intelligence is also defined here, in a more general sense, as “the ability to apprehend the interrelationships of presented facts in such a way as to guide action towards a desired goal.” – Hans Peter Luhn </li></ul></ul><ul><li>1989: “Business intelligence” is first used as an umbrella term to describe the set of “concepts and methods to improve business decision-making by using fact-based support systems.” – Howard Dresner </li></ul><ul><li>1990s: The business intelligence software market explodes, and it becomes very difficult to keep track of who is doing what. </li></ul>
    22. 22. There are many “big players” in the business intelligence software space that have been around for much longer than their open source counterparts <ul><li>Informatica </li></ul><ul><li>IBM Ascential </li></ul><ul><li>Microsoft DTS/SSIS </li></ul><ul><li>ACE*COMM </li></ul><ul><li>Ab Initio </li></ul><ul><li>Actuate </li></ul><ul><li>Comanche </li></ul><ul><li>CyberQuery </li></ul><ul><li>Dimensional Insight </li></ul><ul><li>IBM </li></ul><ul><ul><li>Applix </li></ul></ul><ul><ul><li>Cognos </li></ul></ul><ul><li>Informatica </li></ul><ul><li>Information Builders </li></ul><ul><li>LogiXML </li></ul><ul><li>LucidEra </li></ul><ul><li>Microsoft </li></ul><ul><ul><li>Microsoft Analysis Services </li></ul></ul><ul><ul><li>PerformancePoint Server 2007 </li></ul></ul><ul><ul><li>ProClarity </li></ul></ul><ul><ul><li>DTS/SSIS </li></ul></ul><ul><li>MicroStrategy </li></ul><ul><li>Oracle Corporation </li></ul><ul><ul><li>Hyperion Solutions Corporation </li></ul></ul><ul><li>Panorama Software </li></ul><ul><li>Pervasive </li></ul><ul><li>Pilot Software, Inc. </li></ul><ul><li>PRELYTIS </li></ul><ul><li>Prospero Business Suite </li></ul><ul><li>QlikTech </li></ul><ul><li>SAP Business Information Warehouse </li></ul><ul><ul><li>Business Objects </li></ul></ul><ul><ul><li>OutlookSoft </li></ul></ul><ul><li>SAS Institute </li></ul><ul><li>Siebel Systems </li></ul><ul><li>Tibco </li></ul><ul><li>StatSoft </li></ul><ul><li>SPSS </li></ul><ul><li>Telerik Reporting </li></ul><ul><li>Teradata </li></ul><ul><li>Thomson Data Analyzer </li></ul>
    23. 23. Open source business intelligence solutions started appearing in 2000, but the space did not begin to grow rapidly until 2005
    24. 24. There are a variety of open source business intelligence products that are beginning to compete against proprietary alternatives after only a few years of development <ul><li>Talend (Suresnes, France and Los Altos, CA) </li></ul><ul><ul><li>OpenStudio </li></ul></ul><ul><ul><li>Integration Suite </li></ul></ul><ul><ul><li>Open Profiler </li></ul></ul><ul><li>Pentaho (Orlando, FL and Belgium) </li></ul><ul><ul><li>Reporting Engine </li></ul></ul><ul><ul><li>Kettle Data Integration </li></ul></ul><ul><ul><li>Weka Data Mining </li></ul></ul><ul><li>Jasper (Dublin, Ireland) </li></ul><ul><ul><li>JasperServer – Interactive, ad hoc, and managed reporting and dashboards </li></ul></ul><ul><ul><li>JasperAnalysis – Interactive data analysis, OLAP </li></ul></ul><ul><ul><li>JasperETL – Data Integration </li></ul></ul><ul><ul><li>JasperReports </li></ul></ul><ul><li>Apatar (Chicopee, MA and Minsk, Belarus) </li></ul><ul><ul><li>Merge </li></ul></ul><ul><ul><li>OnDemand </li></ul></ul><ul><li>Mondrian </li></ul>
    25. 25. Companies that sponsor and develop open source projects offer many different types of tools and support options that vary in cost <ul><li>Professional Services </li></ul><ul><ul><li>Proof of Concept Development </li></ul></ul><ul><ul><li>On Demand Service Contracts </li></ul></ul><ul><ul><li>Consulting Services </li></ul></ul><ul><ul><li>Technology Assessments </li></ul></ul><ul><li>Professional Tools </li></ul><ul><ul><li>Offered in addition to the open source tools </li></ul></ul><ul><ul><li>Professional tools offer more functionality </li></ul></ul><ul><li>Training </li></ul><ul><ul><li>On-site or Online, Group or Individual </li></ul></ul><ul><ul><li>Certification Exams </li></ul></ul><ul><li>Support Contracts </li></ul><ul><ul><li>Typically priced on a per year or per incident basis </li></ul></ul><ul><ul><li>Variety of support options depends on popularity of the open source project </li></ul></ul><ul><li>Technology Partners / Alliance Program </li></ul><ul><ul><li>Training partners </li></ul></ul><ul><ul><li>Development partners </li></ul></ul><ul><ul><li>Platinum, Gold, Silver, Bronze levels or tiers </li></ul></ul>
    26. 26. There is often a wider variety of support options available for open source software, but they tend to be less mature than those for proprietary software <ul><li>Three common support models </li></ul><ul><ul><li>Professional support by open source software vendors </li></ul></ul><ul><ul><li>Third-party vendor or consultant support </li></ul></ul><ul><ul><li>Self-support </li></ul></ul><ul><li>Various Feature Levels </li></ul><ul><ul><li>Sold on a per user per year basis </li></ul></ul><ul><ul><li>Number of incident reports per year (1, 2, 3, or unlimited) with the ability to purchase extra incidents </li></ul></ul><ul><ul><li>Web-based support </li></ul></ul><ul><ul><li>Email support </li></ul></ul><ul><ul><li>Phone support </li></ul></ul><ul><ul><li>Guaranteed response times (8 hours, 1 day, 2-3 business days) </li></ul></ul><ul><ul><li>Guaranteed diagnostic turnaround times </li></ul></ul><ul><ul><li>Access to a certified version of the software </li></ul></ul><ul><ul><li>Automatic notification of bug fixes / updates </li></ul></ul><ul><ul><li>Access to community or professional support forum and bug tracker </li></ul></ul><ul><ul><li>Access to advanced tutorials </li></ul></ul><ul><li>Various Pricing Levels </li></ul><ul><ul><li>$1,000 - $5,000 per user per year </li></ul></ul><ul><ul><li>Can be even more expensive for “Enterprise” or “Professional” versions of the open source tools </li></ul></ul>
    27. 27. Most companies providing support for open source projects offer different support levels such as Bronze, Silver, Gold, and Platinum <ul><li>Talend Silver Support </li></ul><ul><ul><li>Description: Three incidents per year; Web support; guaranteed response times; access to a certified version of Talend’s products; automatic notification of updates and bug fixes; access to the support forum and bug tracker </li></ul></ul><ul><ul><li>1 user for 1 year: $1,150.00 </li></ul></ul><ul><ul><li>5 users for 1 year: $4,950.00 </li></ul></ul><ul><li>Talend Gold Support </li></ul><ul><ul><li>Description: Unlimited incidents; Web and email support; 24 hour access on business days; guaranteed response times; guaranteed diagnostic turnaround times; access to a certified version of Talend’s products; automatic notification of updates and bug fixes; access to the support forum and bug tracker </li></ul></ul><ul><ul><li>1 user for 1 year: $2,150.00 </li></ul></ul><ul><ul><li>3 users for 1 year: $5,750.00 </li></ul></ul><ul><ul><li>5 users for 1 year: $8,350.00 </li></ul></ul><ul><li>Talend Platinum Support </li></ul><ul><ul><li>Description: Unlimited incidents; Web, email and phone support; 24 hour access (email and Web) on business days; guaranteed response times; guaranteed diagnostic turnaround times; access to a certified version of Talend’s products; automatic notification of updates and bug fixes; access to the support forum and bug tracker </li></ul></ul><ul><ul><li>1 user for 1 year: ? </li></ul></ul><ul><ul><li>3 users for 1 year: ? </li></ul></ul><ul><ul><li>Unlimited users: ? </li></ul></ul>
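The Talend list prices on this slide imply a volume discount as the number of users grows. The following is a minimal sketch of that arithmetic, using only the Silver and Gold figures quoted above; the Platinum prices are not given on the slide, so they are left out. The function and variable names are illustrative assumptions, not part of any Talend tooling.

```python
# Talend support list prices as quoted on the slide (USD, one-year terms).
# Platinum is omitted because the slide does not give its prices.
PRICES = {
    ("Silver", 1): 1150.00,
    ("Silver", 5): 4950.00,
    ("Gold", 1): 2150.00,
    ("Gold", 3): 5750.00,
    ("Gold", 5): 8350.00,
}


def per_user_cost(level, users):
    """Return the yearly cost per user for a given support level and pack size."""
    return PRICES[(level, users)] / users


def discount_vs_single(level, users):
    """Return the fractional saving per user compared with the 1-user price."""
    single_user_price = PRICES[(level, 1)]
    return 1 - per_user_cost(level, users) / single_user_price


if __name__ == "__main__":
    for (level, users) in sorted(PRICES):
        cost = per_user_cost(level, users)
        saving = discount_vs_single(level, users)
        print(f"{level:6} x{users}: ${cost:,.2f} per user ({saving:.1%} below single-user price)")
```

Comparing the per-user figure across tiers is one simple way to fold support contracts into the hard-cost analysis discussed earlier in the deck.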
    29. 29. Talend claims to be the “first provider of open source data integration software”, and while that is not really the case, their software does provide some unique functionality <ul><li>Talend Open Studio </li></ul><ul><ul><li>“ the most open, innovative and powerful data integration solution on the market today.” </li></ul></ul><ul><ul><li>Provides connectors to almost any source or destination, and has an interface that is easy to learn and use </li></ul></ul><ul><li>Talend Integration Suite </li></ul><ul><ul><li>Open Studio with a subscription service for technical support and source control for team environments </li></ul></ul><ul><li>Talend On Demand </li></ul><ul><ul><li>SaaS version of Open Studio that stores all metadata and source code in a central repository hosted by Talend </li></ul></ul><ul><ul><li>Does not require much configuration or administration by the development team </li></ul></ul><ul><li>Talend Open Profiler </li></ul><ul><ul><li>“ The first open source data profiling tool” </li></ul></ul><ul><ul><li>Allows users to define metrics and goals about data quality for databases, files, applications, etc. </li></ul></ul><ul><ul><li>Produces reports and graphs to display data quality issues and KPIs based on the defined metrics and goals </li></ul></ul>
    30. 30. Talend’s software is written in Java and has a user interface built around the Eclipse IDE, and it includes a basic business modeler tool and real-time debugging capabilities <ul><li>Screenshots: Advanced Lookup/Join Editor; Familiar Eclipse Interface; Real-Time Debugger; Basic Business Modeler </li></ul>
    31. 31. Pentaho provides a full spectrum of business intelligence capabilities including reporting, analysis, dashboards, data mining, and data integration (Kettle) <ul><li>Data Integration (Kettle) </li></ul><ul><ul><li>“ Extract, Transform and Load (ETL) capabilities with an intuitive design environment” </li></ul></ul><ul><ul><li>“ Proven, scalable, standards-based architecture” </li></ul></ul><ul><ul><li>100% Java with broad, cross platform support </li></ul></ul><ul><ul><li>Advanced scheduling, process integration, reporting, and analysis </li></ul></ul><ul><li>Reporting Engine/Dashboards </li></ul><ul><ul><li>Visualize KPIs, metrics, etc. </li></ul></ul><ul><ul><li>Deploy as JSP pages </li></ul></ul><ul><ul><li>Integrates with Google Maps, uses AJAX, etc. </li></ul></ul><ul><li>Data Mining (Weka) </li></ul><ul><ul><li>Clustering, segmentation, decision trees, random forests, neural networks, and principal component analysis </li></ul></ul><ul><ul><li>Algorithms can be called from code or applied directly </li></ul></ul><ul><li>OLAP Server (Mondrian) </li></ul><ul><ul><li>Web-based interface </li></ul></ul><ul><ul><li>Excel Plugin </li></ul></ul><ul><ul><li>Drillable spreadsheets and charts </li></ul></ul>
    32. 32. Pentaho provides a full spectrum of business intelligence capabilities including reporting, analysis, dashboards, data mining, and data integration (Kettle) <ul><li>Screenshots: Debugger with Pause/Resume; User Friendly Job Designer GUI; Web-based Dashboards/Mashups; Advanced Logging/Statistics </li></ul>
    33. 33. “ Yahoo Pipes is a powerful composition tool to aggregate, manipulate, and mashup content from around the web.” <ul><li>Web-based ETL tool </li></ul><ul><li>Editor built with JavaScript and the HTML canvas element </li></ul><ul><li>Features </li></ul><ul><ul><li>Extract Data </li></ul></ul><ul><ul><ul><li>CSV Files </li></ul></ul></ul><ul><ul><ul><li>RSS Feeds </li></ul></ul></ul><ul><ul><ul><li>Screen Scrape HTML </li></ul></ul></ul><ul><ul><ul><li>Yahoo Local and Yahoo Search </li></ul></ul></ul><ul><ul><ul><li>Flickr and Google Base </li></ul></ul></ul><ul><ul><li>Transform Data </li></ul></ul><ul><ul><ul><li>Row Count, Filter </li></ul></ul></ul><ul><ul><ul><li>Geocode addresses </li></ul></ul></ul><ul><ul><ul><li>String, Date and Number Manipulation </li></ul></ul></ul><ul><ul><ul><li>Make web service calls </li></ul></ul></ul><ul><ul><li>Load/Output Data </li></ul></ul><ul><ul><ul><li>RSS Feeds </li></ul></ul></ul><ul><ul><ul><li>KML Files </li></ul></ul></ul><ul><ul><ul><li>JSON </li></ul></ul></ul><ul><ul><ul><li>PHP Objects </li></ul></ul></ul><ul><ul><ul><li>Interactive Yahoo Maps, etc. </li></ul></ul></ul><ul><ul><li>Combine many feeds into one, then sort, filter and translate it </li></ul></ul><ul><ul><li>Geocode your favorite feeds and browse the items on an interactive map </li></ul></ul><ul><ul><li>Embed widgets/badges on your own web site </li></ul></ul><ul><ul><li>Output data as RSS, JSON, KML, and other formats </li></ul></ul>
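The extract, transform, and output stages listed above can be illustrated outside of Pipes with a few lines of code. The sketch below is not how Yahoo Pipes is implemented (Pipes is a visual, hosted tool); it only mirrors the same three stages with the Python standard library. The sample CSV data and all function names are hypothetical.

```python
import csv
import io
import json


def extract_csv(text):
    """Extract: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def filter_rows(rows, field, value):
    """Transform: keep only the rows whose field equals the given value."""
    return [row for row in rows if row.get(field) == value]


def truncate(rows, limit):
    """Transform: keep at most `limit` rows."""
    return rows[:limit]


def to_json(rows):
    """Load/Output: serialize the rows, with a row count, as JSON."""
    return json.dumps({"count": len(rows), "items": rows}, indent=2)


# Hypothetical sample input standing in for a CSV source.
SAMPLE = """name,city
Ann,Dallas
Bob,Houston
Cy,Dallas
"""

if __name__ == "__main__":
    rows = extract_csv(SAMPLE)
    dallas_only = filter_rows(rows, "city", "Dallas")
    print(to_json(truncate(dallas_only, 10)))
```

Each function corresponds loosely to one module wired into a pipe: a source, one or more operators, and an output renderer.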
    34. 34. This Yahoo Pipes demo uses data exported from the DeepBlue Employee List and transforms the data into a GeoRSS and KML feed <ul><li>Demo: Pariveda Employee Map </li></ul><ul><ul><li>Uses data from DeepBlue Employee Contact List </li></ul></ul><ul><ul><li>Uses the Yahoo Geocoder service on each employee’s address </li></ul></ul><ul><ul><li>Uses a text input box to limit the size of the dataset </li></ul></ul><ul><ul><li>Displays as an interactive Yahoo Map or can be loaded into Google Earth as a KML file </li></ul></ul><ul><ul><li>http://tinyurl.com/64ny7o </li></ul></ul>
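The demo's flow (take a list of records, geocode each address, limit the dataset size, emit KML) can be sketched as follows. This is an illustration only and does not reproduce the actual pipe: the geocoder here is a hypothetical lookup table with approximate city coordinates rather than a call to the Yahoo Geocoder service, and the employee records are made up.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"
# Serialize KML elements without a namespace prefix.
ET.register_namespace("", KML_NS)

# Hypothetical stand-in for a geocoding service: address -> (lat, lon).
# Coordinates are approximate and for illustration only.
FAKE_GEOCODER = {
    "Fort Worth, TX": (32.7555, -97.3308),
    "Dallas, TX": (32.7767, -96.7970),
}

# Made-up records standing in for an exported contact list.
SAMPLE_EMPLOYEES = [
    {"name": "Employee A", "address": "Fort Worth, TX"},
    {"name": "Employee B", "address": "Dallas, TX"},
    {"name": "Employee C", "address": "Unknown"},
]


def geocode(address):
    """Return (lat, lon) for an address, or None if it cannot be resolved."""
    return FAKE_GEOCODER.get(address)


def build_kml(records, limit):
    """Build a KML document with one Placemark per geocodable record."""
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for record in records[:limit]:  # the limit mirrors the demo's text input box
        coords = geocode(record["address"])
        if coords is None:
            continue  # skip addresses the geocoder cannot resolve
        lat, lon = coords
        placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = record["name"]
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML expects longitude first, then latitude.
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")


if __name__ == "__main__":
    print(build_kml(SAMPLE_EMPLOYEES, limit=2))
```

The resulting string can be saved with a `.kml` extension and opened in a viewer such as Google Earth.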
    36. 36. Thank you for attending. Please let me know if you have any questions <ul><li>Special Thanks to: </li></ul><ul><li>Samir Ray </li></ul><ul><li>Jeff Townes </li></ul><ul><li>Brian Orell </li></ul><ul><li>Daniel Herrin </li></ul><ul><li>Sean Beard </li></ul><ul><li>Grant Sutton </li></ul>