These slides on the usage of open source solutions within the business intelligence and data warehousing market go with a webcast and research report. The webcast is archived at http://ow.ly/KLz0 along with a PDF of the report. This presentation describes what open source software is being deployed and presents the benefits, challenges and practices for organizations adopting open source technologies.
Open Source Solutions: Managing, Analyzing and Delivering Business Information
1. Open Source Solutions: Managing, Analyzing and Delivering Business Information
Leveraging Open Source Business Intelligence Across Your Organization
Mark R. Madsen – November 2009
Mark R. Madsen – February 2009
www.ThirdNature.net
4. The Origin of Copyright
• 1557: The Worshipful Company of Stationers
and Newspaper Makers is granted a Royal
Charter, giving it a monopoly over the
publishing industry until …
• 1710: “An Act for the Encouragement of
Learning, by vesting the Copies of Printed
Books in the Authors or purchasers of such
Copies, during the Times therein mentioned”,
otherwise known as the Statute of Anne, put
the rights into the hands of authors
February 2009 Mark Madsen Slide 4
5. After Each Revolution, the Old Pirates
Become the New Establishment
Pirate
Establishment
7. What Makes Software Open Source?
License spectrum, from more freedom to less freedom:
Academic licenses
Reciprocal licenses
“Freeware” licenses
Commercial licenses
The dividing line between open and closed source is fuzzy.
8. Some Quick Definitions
Proprietary Software
Software under a license that grants only limited
usage rights, provided in binary format.
Open Source Software (OSS)
Software under a license that allows
acquisition, modification and redistribution.
Freeware
Software that does not have licensing
limitations, generally distributed in binary
format. Not the same as open source.
9. Fauxpen Source
Software marketed as open source without actually being
open source. It appears with greater frequency as open
source becomes more popular and lower-tier
proprietary vendors seek a differentiator.
10. Evolution of the Software Market 1987
Source: John Prendergast (data: Bloomberg, Factset)
11. Evolution of the Software Market 1997
Source: John Prendergast (data: Bloomberg, Factset)
12. Evolution of the Software Market 2007
Source: John Prendergast (data: Bloomberg, Factset)
13. The DW & BI Software Market Today
According to IDC, the analytics and data warehouse software market is growing at a 10.3% CAGR.
[Chart: market size by year: 2005: 17,386; 2006: 19,342; 2007: 21,408; 2008: 23,601; 2009: 26,001; 2010: 28,682; 2011: 31,595]
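The 10.3% figure is a compound annual growth rate (CAGR): the constant yearly rate g that satisfies start × (1 + g)^n = end over n years. Below is a minimal Python sketch, assuming the chart's bar labels map to 2005 through 2011 in ascending order; the slide states neither the units nor which span IDC measured. The names `market_size`, `cagr` and `cagr_between` are hypothetical, chosen only for illustration.

```python
# Hedged sketch: compound annual growth rate (CAGR) from two endpoint values.
# Assumption: the slide's bar labels map to 2005..2011 in ascending order.
# Units are not stated on the slide, so the values are left unitless.

market_size: dict[int, float] = {
    2005: 17_386,
    2006: 19_342,
    2007: 21_408,
    2008: 23_601,
    2009: 26_001,
    2010: 28_682,
    2011: 31_595,
}


def cagr(start_value: float, end_value: float, years: int) -> float:
    """Return the constant annual rate g with start * (1 + g) ** years == end."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


def cagr_between(series: dict[int, float], first: int, last: int) -> float:
    """CAGR between two years of a {year: value} series."""
    return cagr(series[first], series[last], last - first)


if __name__ == "__main__":
    # Compare two candidate spans, since the slide does not say which one IDC used.
    for first_year in (2005, 2006):
        rate = cagr_between(market_size, first_year, 2011)
        print(f"{first_year}-2011 CAGR: {rate:.1%}")
```

Under these assumptions the 2006 to 2011 span comes out at about 10.3%, matching the slide, while 2005 to 2011 comes out at about 10.5%; each individual year-over-year step falls between 10% and 12%.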
14. Any Industry This Big is Maturing
[Chart: annual US software sales, 1970 to 2000]
Source: US Dept. of Commerce
15. “If the automobile had followed the same development as the computer, a Rolls-Royce would today cost $100, get a million miles per gallon, and explode once a year killing everyone inside.” – Robert Cringely
16. Software Revenue = Corporate IT Cost
[Chart: IT costs as a percent of equipment investment, 1968 to 2004]
Source: US Dept. of Commerce
17. Open Source is an Inevitable Consequence
If the means of production
is widely distributed at
commodity cost
And the internet connects
all those means of
production
And the supply of any
software program is infinite
Then we need to rethink
some things.
“The era of high capital industrial
production is giving way to a
different model.” – Peter Drucker
18. A Perfect Commodity Changes Things
Open source is a means of
production and distribution of
software, and is driving
change in the market.
But the fact that the internet is
a massive copying machine
for the perfect commodity is
the real change in conditions.
The basis of open source is economics, not ideology.
19. The Real State of Enterprise Software?
20. Enterprise Software Economics
The enterprise software model
is breaking down. Some facts:
• 70% - 80% of sales & marketing is
for new sales
• 76% of new license revenue goes
to sales & marketing
• Maintenance makes up 45% of
revenues and this number is
increasing
• 75% of R&D for mature products is
for updates, bug fixing, and non-
revenue enhancements
• Maintenance and support is
becoming the biggest factor in
software company profitability.
Sources: Goldman Sachs, Tech Strategy Partners, Forrester
21. Open Source Disruption
“Which sector of the industry is most vulnerable to
disruption by open source in the next five years?”
1. Web publishing and content management
2. Social software
3. Business Intelligence
Source: North Bridge Venture Partners
22. BI is Entering Mainstream Adoption
The BI market has lots of segments, most
new, some mature, some being rejuvenated.
[Diagram: BI market segments: Reporting & Analysis, Databases, Platforms, Data Integration, Predictive analytics]
23. Maturity for OSS Components of the Stack
Information delivery: Dashboards & Scorecards, Visualization, Analytics / OLAP clients, Predictive Analytics, Interactive Reporting, GIS & location, Standard Reporting, Modeling, Portal, Search/Discovery, Workflow
Information Management: DW/Mart/ODS, OLAP servers, MDM*, Data Quality
Integration / Management: ETL, EII, EAI, EDR, Metadata
Infrastructure: Servers, Operating Systems, Databases
24. Interest in and Use of Open Source
                          In production  Prototype or pilot  Evaluating  Considering  No plans
Database                  18%            13%                 18%         29%          22%
Data integration and ETL  18%            12%                 17%         31%          22%
Business intelligence     14%             8%                 22%         37%          19%
Advanced analytics         5%             8%                 18%         43%          26%
Source: Third Nature Open Source BI/DW adoption survey
25. Database Use
MySQL 75%
Postgres 44%
Infobright 11%
EnterpriseDB 10%
BerkeleyDB 8%
Ingres 7%
Firebird 7%
Palo 3%
CouchDB 3%
SQLite 3%
MonetDB 3%
LucidDB 2%
Kickfire 2%
Bizgres 2%
Source: Third Nature Open Source BI/DW adoption survey
26. Data Integration Tool Use
What’s popular
Pentaho DI / Kettle 42%
Talend 33%
Jitterbit 13%
DataCleaner 8%
Red Hat Teiid 5%
Apatar 5%
OSDQ 2%
Open Data Quality 2%
Clover 2%
What it’s being used for
Batch ETL for a data warehouse or mart 30%
Operational integration 21%
Data migration efforts 15%
Data quality efforts 15%
Master data management efforts 10%
Low-latency ETL for a data warehouse or mart 8%
Source: Third Nature Open Source BI/DW adoption survey
27. BI Tool Use
What’s popular
Pentaho 47%
Jaspersoft 28%
Mondrian 26%
BIRT 19%
Jfree 14%
SpagoBI 9%
OpenI 5%
MarvelIT 5%
Palo 2%
OpenReports 2%
What it’s being used for
Static reports 20.7%
Dashboards or scorecards 17.1%
End user or interactive reporting 16.5%
Reporting against an application database 15.9%
Reports embedded in an application or website 15.2%
OLAP 14.6%
Source: Third Nature Open Source BI/DW adoption survey
28. Advanced Analytics Use
R 46%
Weka 42%
RapidMiner 23%
Knime 8%
Graphviz 8%
Orange 7%
Processing 4%
Axiis 4%
Taverna 3%
Cytoscape 2%
Source: Third Nature Open Source BI/DW adoption survey
29. Usage of the tools
[Chart: how open source is being used, for each category (Database, Data Integration, BI, Adv. Analytics): replacing proprietary software; replacing internally developed software; supplementing a system with similar features; adding new functionality to an existing system; using as part of a new system or project]
Source: Third Nature Open Source BI/DW adoption survey
30. Who’s Adopting Open Source for BI/DW?
1. The under-budgeted
2. ISVs
3. The under-served
4. The over-served
5. Developers who never
had it before
Open source is used more often alongside
existing tools and in edge cases than as a
straight replacement, and it often competes
with no tool at all.
32. Adoption by Size of Organization
[Charts: share of organizations using and evaluating open source, by organization size (Small, Medium, Large)]
Medium and large organizations are the biggest evaluators, while small organizations have the most in production.
Source: Third Nature Open Source BI/DW adoption survey
33. Scope of System Deployment
[Chart: scope of deployment (Department or Division vs. Corporate-wide) by organization size (Small, Medium, Large)]
Source: Third Nature Open Source BI/DW adoption survey
34. Open Source Purchasing
[Chart: what organizations purchased, by organization size (Small, Medium, Large): no purchase; maintenance or support contract; training; consulting or installation services; phone, email or on-site support from the vendor; commercial license; phone, email or on-site support from a third party; subscription to value-added, enterprise features]
Source: Third Nature Open Source BI/DW adoption survey
35. Where Are People Getting Information?
Online articles 53%
Online documentation / wikis 53%
White papers 48%
Online demos 47%
Community forums 47%
Web seminars or screencasts 37%
Blogs 37%
Vendor evaluation / trial support (free) 32%
Print articles 29%
Web‐based training 28%
Third party books or documentation 27%
Vendor support, paid or as part of a subscription 20%
Outside consultant or systems integrator 19%
Software features in a paid "professional" version of the software 17%
Pre‐bundled software (e.g. a database packaged with a BI tool) 16%
Classroom training 14%
Support from a third party 14%
Internet relay chat (IRC) 7%
Source: Third Nature Open Source BI/DW adoption survey
36. Why Consider Open Source?
IT is after one of three things:
37. Rationale When Evaluating OSS
Lower cost and reduced vendor risk are the two big reasons.
Lower acquisition costs 66%
Open standards 48%
Reduced dependence on a vendor 44%
Lower maintenance costs 43%
Flexibility in deployment 33%
Speed of innovation of the software 32%
Easier to evaluate or procure 32%
Open development process and road… 32%
Extensibility, customizability of software 28%
Access to the source code 28%
Source: Third Nature Open Source BI/DW adoption survey
38. Good News: It Works
The benefits are largely being realized.
Lower costs 69%
Ease of integration / open standards 43%
Reduced dependence on vendor 40%
Flexibility in deployment 36%
Freedom from vendor lock‐in 34%
Access to the source code 33%
Extensibility / customizability of software 32%
Speed of innovation of the software 30%
Quicker turnaround on bug fixes 22%
Better performance 12%
Source: Third Nature Open Source BI/DW adoption survey
41. Why did the software evaluations fail?
Missing or incomplete features 72%
Scalability problems 34%
Required more internal expertise than expected 32%
Difficulty integrating into current environment 29%
Difficulty finding available solutions 28%
Reliability problems 25%
Lack of available consulting 21%
Interoperability problems 19%
Higher costs than anticipated 18%
Lack of vendor service or support 16%
The biggest reason is maturity of the software.
Source: Third Nature Open Source BI/DW adoption survey
42. Data Size, All Database Types
Source: Third Nature Open Source BI/DW adoption survey
67% of the sample is under 1TB.
[Chart: distribution of data size: less than 50GB; 50 to <100GB; 100 to <500GB; 500GB to <1TB; 1 to <5TB; 5 to <20TB; 20TB to 50TB; more than 50TB]
43. Performance problems
Poor interactive BI or analytics performance 69%
Poor performance loading data 37%
Poor ETL or data integration performance 33%
Poor batch reporting performance 33%
Source: Third Nature Open Source BI/DW adoption survey
44. Solving Performance Problems
Replace every single thing before the database?
Database or application tuning 38%
Buy more powerful hardware 34%
Change BI or analytics tools 32%
Redesign the ETL or data integration 32%
Limit the amount of data stored in the system 30%
Rewrite the BI application or reports 26%
Change ETL or data integration tools 18%
Limit the number of users accessing the system 18%
Migrate to an analytic database 10%
Buy a specialized accelerator 8%
Migrate to a different traditional database 4%
Migrating to an analytic database (10%) is more than twice as likely as migrating to another
row-store database (4%).
Source: Third Nature Open Source BI/DW adoption survey
45. Discontinuity Drives Open Source BI Use
The situations most appropriate
to open source BI tools often
involve discontinuous change.
• New interface requirements
• New integration requirements
• Platform change
• Schema change
• Data latency / real-time
requirements
• Segmenting the user population
The data warehouse is becoming
much more diverse – one BI vendor
can no longer be expected to provide
tools for all needs.
46. First Thought is Often “Replace”
47. Coexist is More Likely Than Replace
48. Augment is Also More Likely
49. Recommendations
1. Don't focus solely on cost
savings. Many of the benefits
people discovered later were
not among their up-front
reasons for adopting.
2. Plan to augment, not replace,
existing software with open
source. Rather than trying to
save money by replacing
software, look at gaps in the BI
portfolio or data warehouse
stack and use open source to
supplement your systems.
50. Recommendations
3. Consider developing open
source policies. Most
organizations are adopting open
source in an ad-hoc fashion,
project by project.
4. Evaluate open source like any
other software. It doesn't
matter if the software is free if it
takes longer to build, manage
and deploy solutions to end
users, if it is unstable, or if it is
missing a key feature.
5. Make open source the default
option. When there are no
internal tools, open source
should be the first alternative.
51. Questions?
“When a new technology rolls over you, you're either part of the steamroller or part of the road.” – Stewart Brand
52. Creative Commons
Thanks to the people who made their images available via Creative Commons:
glassblower - http://flickr.com/photos/cazasco/261229878/
canal - http://flickr.com/photos/mcsixth/150749007/
rc toy truck.jpg - http://flickr.com/photos/texas_hillsurfer/2683650363/
asymmetry_building_tokyo.jpg - http://flickr.com/photos/fukagawa/2004102417/
beer_free_beer2.jpg - http://flickr.com/photos/fzero/173386050
beer_free_beer3.jpg - http://flickr.com/photos/henrikmoltke/142750871/
condiments_salsa.jpg - http://flickr.com/photos/uberculture/2462506722/
london modern and ancient together.jpg - http://www.flickr.com/photos/cc_chapman/299509390/
firemen not noticing fire.jpg - http://flickr.com/photos/oldonliner/1485881035/
acapluco_cliff_divers_cc.jpg - http://flickr.com/photos/raveller/
highway storm.jpg - http://flickr.com/photos/areyoumyrik/235230688
Tennessee chicken - http://www.flickr.com/photos/mayhem/2495739721/
53. About the Presenter
Mark Madsen is president of Third
Nature, a technology research and
consulting firm focused on business
intelligence, data integration and
data management. Mark is an
award-winning author, architect and
CTO whose work has been featured
in numerous industry publications.
Over the past ten years Mark has
received awards for his work from
the American Productivity & Quality
Center, TDWI, and the Smithsonian
Institution. He is an international
speaker, a contributing editor at
Intelligent Enterprise, and manages
the open source channel at the
Business Intelligence Network. For
more information or to contact Mark,
visit http://ThirdNature.net.