Open Source BI
1. The State of Open Source Business Intelligence
Christian Donner
2. Getting from data to the source of a problem can be hard ...
A czar learned that the most disease-ridden province of
his empire was also the province with the most doctors.
His solution?
He promptly ordered all the doctors shot dead.
(He clearly lacked Business Intelligence)
Folktale from: Freakonomics - A Rogue Economist Explores the Hidden Side of Everything (Steven D. Levitt, Stephen J. Dubner)
www.molecular.com 2
3. … or easy …
"How would you rate the overall job President George W. Bush is doing as
president -- excellent, pretty good, only fair, or poor?"
[Chart: percentage of respondents answering "Excellent or pretty good", by poll month, 2001 to 2006; vertical axis from 20% to 90%]
Source: Harris Poll, published by the Wall Street Journal Online on 5/12/2006
4. Poll
• Who has implemented something that you would define as
a BI solution before, either in your own organization or for
someone else?
• Out of this group, who has used an Open Source BI
product?
• Survey on http://cdonner.com (20 responses):
Currently using BI 85%
Currently using OSBI 40%
Evaluated OSBI in the past 40%
Planning to use OSBI 35%
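With only 20 responses, each percentage stands for a handful of people. A minimal sketch that converts the reported shares back into head counts; the dictionary keys restate the survey labels above, and the function name is illustrative:

```python
# Convert the survey percentages above into respondent counts.
RESPONSES = 20  # sample size stated on the slide

shares_percent = {
    "Currently using BI": 85,
    "Currently using OSBI": 40,
    "Evaluated OSBI in the past": 40,
    "Planning to use OSBI": 35,
}


def to_counts(shares, total):
    """Return the number of respondents behind each percentage."""
    return {label: pct * total // 100 for label, pct in shares.items()}


counts = to_counts(shares_percent, RESPONSES)
for label, n in counts.items():
    print(f"{label}: {n} of {RESPONSES}")
```

Integer arithmetic is used so that the counts come out as whole respondents without floating-point rounding.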
5. Why this presentation?
• 2 years ago I started a low-budget BI project
• Researched many products and technologies
• OSBI was practically non-existent
• Decided to go with Microsoft DTS and SQL RS
• Today, the landscape has changed dramatically
• I wanted to know: would I go with Open Source BI today?
6. Agenda
• What is Business Intelligence?
• BI Trends
• OSBI Trends
• Products
• Pentaho
• Jaspersoft
• OpenI
• BIRT
• Bizgres
• Mondrian
• Demo
7. Business Intelligence – A Definition
• Business Intelligence
• In 1989 Howard Dresner (later at Gartner Group) proposed "BI" as an umbrella term:
“A set of concepts and methods to improve business decision-making by using fact-based support systems.”
• Wikipedia:
• the technology used for collecting and analyzing business
information
• a set of business processes for this purpose
• the information obtained from these processes
• Includes:
• ETL Tools
• OLAP/Data Analysis Tools
• Reporting Tools
• Databases
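The four component types listed above can be seen working together in a toy pipeline. The sketch below is an illustration only: the sample records, table name, and function names are assumptions, and Python's built-in sqlite3 module stands in for a real warehouse database.

```python
import sqlite3

# Hypothetical operational records: (region, product, amount).
orders = [
    ("north ", "Widget", 120.0),
    ("North", "Gadget", 80.0),
    ("South", "Widget", 200.0),
    ("South", "Widget", None),  # incomplete record, dropped during transform
]


def extract():
    """ETL step 1: read rows from the (hypothetical) operational source."""
    return list(orders)


def transform(rows):
    """ETL step 2: normalize region names and drop rows without an amount."""
    return [(r.strip().upper(), p, a) for r, p, a in rows if a is not None]


def load(conn, rows):
    """ETL step 3: store the cleaned rows in the analysis database."""
    conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)


def report(conn):
    """Analysis/reporting step: total sales per region."""
    cur = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
summary = report(conn)
for region, total in summary:
    print(f"{region}: {total:.2f}")
```

Real ETL, OLAP, and reporting products add scheduling, metadata, and scale, but the division of labor is the same.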
9. Business Intelligence Platform
• Integrate with business processes
• Manage and schedule reports
• Deliver reports through multiple channels, push and pull model support
• Maintain user security
• Seamlessly integrate via open standards with portals and applications
10. Agenda
• What is Business Intelligence?
• BI Trends
• OSBI Trends
• Products
• Pentaho
• Jaspersoft
• OpenI
• BIRT
• Bizgres
• Mondrian
• Demo
11. Forecast: Business intelligence market growth
[Chart: BI market size in US$ millions, actual and forecast, stacked by BI license, maintenance, and services revenue; vertical axis from $0 to $8,000]
Year     2003     2004     2005     2006     2007     2008
Size     $5,253M  $5,596M  $5,997M  $6,506M  $7,005M  $7,331M
Growth   N/A      6.5%     7.2%     8.5%     7.7%     4.7%
Source: Forrester Research, “Business Intelligence Growth Is Driven By Compliance, Standardization, And Performance Initiatives”
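The growth row follows directly from the size row. A minimal sketch that recomputes the year-over-year percentages from the market sizes quoted above (the variable and function names are illustrative):

```python
# BI market size in US$ millions, as quoted on the slide.
sizes = {2003: 5253, 2004: 5596, 2005: 5997, 2006: 6506, 2007: 7005, 2008: 7331}


def yearly_growth(series):
    """Return year-over-year growth in percent, rounded to one decimal."""
    years = sorted(series)
    return {
        year: round((series[year] / series[prev] - 1) * 100, 1)
        for prev, year in zip(years, years[1:])
    }


growth = yearly_growth(sizes)
for year, pct in growth.items():
    print(f"{year}: {pct}%")
```

The recomputed values match the slide: 6.5%, 7.2%, 8.5%, 7.7%, and 4.7% for 2004 through 2008.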
12. Mainstream BI Theme
• Keith Gile, Forrester Research:
“We are witness to a change in BI that shifts the
emphasis away from functionally powerful tools for
power-user “producers” toward context-sensitive BI
solutions for a large community of “consumers” of
information.”
• Paul Doscher, CEO Jaspersoft:
“The big commercial tool providers can handle
performance management applications well, but left
Operational BI behind.”
• License bottleneck
• Lower-level in-house user
• Public web sites
13. Forrester Wave™: BI Enterprise Reporting, Q1 ‘06
Where are the Open Source contenders?
Source: Forrester Research
14. Agenda
• What is Business Intelligence?
• BI Trends
• OSBI Trends
• Products
• Pentaho
• Jaspersoft
• OpenI
• BIRT
• Bizgres
• Mondrian
• Outlook
15. Why Open Source?
Source: Survey by Computer Economics, Frank Scavo
17. Organizational Involvement with OS BI
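[Pie chart. Segments: Don’t know; Has deployed open source BI software; Not considering open source BI software; In development with open source BI software; Considering open source BI software. Values shown (segment assignment not recoverable from the extracted text): 8%, 9%, 21%, 19%, 43%]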
(c) 2006 Ventana Research Open Source and BI Research
18. Comparison of OS BI with Commercial BI
Cost of Ownership 77%
Openness/Flexibility 80%
Database Support 72%
Reliability 69%
Metadata Support 62%
Manageability 57%
Scalability/Performance 61%
Ease of Use 57%
Rating scale: Significantly more Capable | More Capable | Equivalently Capable | Less Capable | Significantly less Capable | Don’t know
(c) 2006 Ventana Research Open Source and BI Research
19. Extranet Applications - The “Beachhead” of Open Source BI?
• Technology requirements favor open source
• Pure J2EE offerings provide a better technology fit than
legacy BI technology
• Licensing requirements contradict prevailing proprietary models
• “Named user” only – doesn’t map to extranet usage
• Role-based – meaningless in extranets
• >$1,000 USD per named user – cost prohibitive
• Net/net: The “old school” BI licensing model breaks down
20. Free software for sale!
• Community-based vs. for-profit companies
• Open Source has become a business model
• Acquisition of your vendor can change the terms under which you use OS SW
• Example: Bill Venners’ account of using Jive for Artima.com
• Example: Snort, and the attempted sale of Martin Roesch’s Sourcefire to Check Point Software
• Whatever you do, factor in that your Open Source product may not always remain open source.
22. Agenda
• What is Business Intelligence?
• BI Trends
• OSBI Trends
• Products
• Pentaho
• Jaspersoft
• OpenI
• BIRT
• Bizgres
• Mondrian
• Outlook
23. OSBI Explosion
• There are about 25 products competing
in this space, about half of which did not
exist prior to 2005.
• Many of them will probably return to
insignificance
• Because we are so early in the maturity
cycle, it is difficult to make judgments
about who will make it.
26. BIRT 2.0 Features
• Released January 20, 2006
• Re-Use Library – A report component environment allows developers
with a range of expertise to share report components or functions for
reuse.
• Page-on-Demand HTML – A page-on-demand navigation
mechanism enables the efficient viewing of large report documents over
the internet.
• CSS Style Sheets – External style sheets can be used across
multiple report designs, making it easy to establish a common look
across all reports in one application.
• Scripting Editor – BIRT supports the ability to code or script the
behavior of reports using a perspective for Java Code Editing for BIRT
reports.
• Large, Persistent Reports – Report developers can generate a
report and then distribute a URL to end-users.
• Improved Charting Facility, Scripting – BIRT 2.0 includes a
wizard for building common usage charts and advanced capabilities for
including detailed charts within a report design.
33. OpenI at a Glance
• J2EE Web Application
• Standards-based, integrates other Open Source
components
• Connectors for relational (JDBC), OLAP (XMLA), and data mining (RServe) data sources; currently only XMLA is operational
• Supports Jasper .jrxml and custom RDL
• JPivot for Pivot tables, JFreeChart
• Supports JSR-168
• Form-based authentication with J2EE Security
34. Bizgres
• Sponsored by Greenplum
• Bizgres is a distribution of PostgreSQL (Open Source DB)
• Bizgres includes the following components:
• PostgreSQL 8.1.3 (Open Source RDBMS)
• Bizgres Loader (Mass data loading utility)
• Demonstration Programs and Utilities
• KETL Integration (ETL solution for web log analysis)
• JasperReports Integration
• Bizgres Clickstream
37. Agenda
• What is Business Intelligence?
• BI Trends
• OSBI Trends
• Products
• BI suites
• ETL tools
• OLAP
• Reporting tools
• Databases
• Demo
38. The State of Open Source Business Intelligence
• “Business intelligence” is a broad umbrella term
• A lot of buzz in the media and from analysts
• Young and growing market
• Immature, but rapidly improving products
• No clear market leader
39. Thank you!
Q&A
Would I go with Open Source BI today? How about you?
Editor's Notes
• JasperReports: JasperReports is one of the oldies as well, starting in 2001. More recently a company, JasperSoft, has been formed to invest in JasperReports, as well as to provide support, training and various other services. JasperSoft represents the JasperReports project in consortiums, such as Bizgres.
• Agata Report: From their web site: "Agata Report is a Database Reporting Tool and EIS tool, MIS tool (graph generation), like Crystal Reports. It's written in PHP-GTK and allows you to edit and get SQL results from several databases (PostgreSQL, MySQL, Oracle, SyBase, MsSql, FrontBase, DB2, Informix and InterBase) as PostScript, plain text, HTML, XML, PDF, or spreadsheet (CSV) formats through its graphical interface. You can also define levels, subtotals, and a grand total for the report, merge the data into a document, generate address labels, or even generate a complete ER-diagram from your database."
• DataVision: DataVision is an Open Source Report Writer that allows drag-and-drop report design through its GUI. It is written in Java and can connect to any database supporting JDBC.
• OpenReports: From their website: "OpenReports is a flexible open source web reporting solution that allows users to generate dynamic reports in a browser. OpenReports uses JasperReports, an excellent full featured open source reporting engine, and was developed using leading open source components including WebWork, Velocity, Quartz, and Hibernate and includes full support for JasperReports." They've recently announced OpenReports Portal Edition, which blends OpenReports with the Apache Jetspeed Enterprise Portal system. Also of interest are the related projects ObjectVisualizer and OpenReports Designer.
• OpenRPT: OpenRPT is a full featured, cross-platform SQL report writer that stores its report definitions as XML, and has a WYSIWYG report writer that can be used in stand-alone or embedded fashion.
• JFreeReport: JFreeReport is a standalone Java report library with a nice series of capabilities and a decent community around it. In January 2006, JFreeReport became a part of the Pentaho suite.
(source: http://www.squidoo.com/osbi)
• Mondrian: Mondrian is one of the oldest open source BI components, having been registered in 2001. It is also used as the OLAP engine in other open source OLAP and BI suite projects.
• JPivot: JPivot is a JSP tag library supporting XMLA that provides a front-end OLAP table to the Mondrian OLAP engine, allowing typical OLAP functions such as slice-and-dice, drill-down and roll-up.
• gOLAP: Gratis OLAP [gOLAP] has been in the planning stage since its registration on SourceForge in 2001. There are some files in the CVS, but nothing has been released. From its SourceForge description: "gOLAP is a BSD-licensed OLAP server engine and client API. It is a hypercube-based Analytical Processing engine intended for general high performance applications."
• PALO: PALO is a recent entry to the open source OLAP field. It's different in that it is essentially an add-in for Microsoft Excel. PALO provides an MDDB for Excel, with future plans to allow access through other APIs as well. From their homepage: "Palo is an advanced data store for Microsoft Excel that allows you to handle large amounts of Excel data on a small number of worksheets. In addition, it also allows you to share Excel data real-time with your colleagues."
• pocOLAP: pocOLAP is a web-based, cross-tab reporting tool written in Java that also allows for drill-down. The name comes from "poco", meaning "little" in Italian and Spanish.
(source: http://www.squidoo.com/osbi)
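The OLAP functions named in the JPivot note (slice-and-dice, drill-down, roll-up) can be illustrated on a tiny in-memory data set. This is a sketch under stated assumptions: the fact rows, dimension names, and helper functions are hypothetical, and no real OLAP engine such as Mondrian is involved.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, region, sales).
facts = [
    (2005, "Q1", "East", 100),
    (2005, "Q1", "West", 150),
    (2005, "Q2", "East", 120),
    (2006, "Q1", "East", 130),
    (2006, "Q1", "West", 170),
]
DIMENSIONS = ("year", "quarter", "region")


def aggregate(rows, by):
    """Sum the sales measure grouped by the named dimensions."""
    idx = [DIMENSIONS.index(d) for d in by]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


def slice_rows(rows, dimension, value):
    """Keep only the rows where one dimension has a fixed value."""
    i = DIMENSIONS.index(dimension)
    return [row for row in rows if row[i] == value]


# Roll-up: aggregate to the coarsest time level.
by_year = aggregate(facts, ["year"])
# Drill-down: add the quarter level below year.
by_year_quarter = aggregate(facts, ["year", "quarter"])
# Slice: fix region to "East", then aggregate by year.
east_by_year = aggregate(slice_rows(facts, "region", "East"), ["year"])

print(by_year)
print(by_year_quarter)
print(east_by_year)
```

An OLAP server performs the same kinds of aggregation against a cube, typically with precomputed or cached aggregates so that each navigation step stays interactive.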
• KETL: KETL is an ETL tool for high-volume transactions developed by Kinetic Networks and delivered as part of the Bizgres suite.
• Enhydra Octopus: Enhydra Octopus is part of the ObjectWeb GForge project, providing JDBC data transformations.
• Pequel ETL: Pequel ETL is, according to their SourceForge description, a comprehensive and high performance data processing/transform system. It features a simple, user-friendly event driven scripting interface that transparently generates and executes highly efficient Perl/C code. Uses: ETL, data warehousing, statistics, and data cleansing.
• Clover ETL: Clover ETL is an open source Java-based framework for building data transformations (ETL applications).
• CpluSQL: The cplusql distributed ETL tool extracts and transforms row-based data from databases and flat files for terabyte-scale data warehouse loading.
• JetStream: JetStream is the first open source ETL tool that we used. It is described as a Java Extraction Transformation Service for Transmitting Records & Exchanging Application Metadata: a Java-based ETL/EAI tool.
• KETTLE: Don't confuse KETL and KETTLE; they're not related. K.E.T.T.L.E (Kettle ETTL Environment) is a metadata-driven ETTL (Extraction, Transformation, Transportation & Loading) tool.
• openDigger: OpenDigger is a Java-based compiler for the xETL language. xETL is a language specifically designed to read, manipulate and write data in any format and database. With OpenDigger/xETL you can build powerful Extraction-Transformation-Loading (ETL) programs.
(source: http://www.squidoo.com/osbi)
• BEE Project: BEE is one of the first open source BI suites, having been around since 2002. It provides ETL, ROLAP, reporting, and integration with the R Project, is written in Perl, and primarily supports MySQL.
• Bizgres: Bizgres is a distribution of PostgreSQL with specific modifications to improve its performance and suitability as a data warehouse. In addition, the Bizgres project comes with the KETL ETL tool and JasperReports. The Bizgres project is supported by a consortium of three companies: Greenplum, Kinetic Networks, and JasperSoft.
• OpenReports Portal: MarvelIT's OpenReports Portal provides reporting, charting and portal capabilities.
• OpenI: OpenI provides a web-driven interface to OLAP, relational, statistical and data mining sources, giving BI integrators user interface, report definition and connector tools.
• Pentaho: Pentaho has been getting a lot of attention since its launch and funding in 2005. This project has an impressive pedigree in its team leaders, and provides quite an array of capabilities: Reporting, Analysis, Dashboards, Data Mining and Workflow.
• SpagoBI: SpagoBI is a BI platform drawing its components from the ObjectWeb consortium. Tools include metadata management, ETL, Reporting, Analysis, and Dashboards.
(source: http://www.squidoo.com/osbi)
OpenI is a J2EE web application, by default running on Tomcat. It publishes web-based analytical reports from 3 types of data sources: OLAP servers, relational database servers, and data mining servers. It has 3 key component categories (connectors, report definitions, and user interface), plus integrated security:
• Connectors: The connectors’ job is to speak the native tongue of individual analytical data sources. For relational data sources, OpenI uses JDBC since it is well known and standardized. For OLAP data sources, OpenI uses XMLA as the standard protocol to communicate. This protocol is supported by several OLAP servers including Microsoft Analysis Services and Mondrian (an open source OLAP server). For data mining datasets, OpenI integrates with the R project, a popular open source data mining platform, using a native API called RServe. (Only XMLA is operational in the current release.)
• Report Definitions: OpenI uses data-source-specific report definition languages (RDLs) to define and track the reports created on the platform. Wherever possible, OpenI uses existing standard RDLs from other open source projects, such as the .jrxml definition from JasperReports for relational database reports. For OLAP and data mining reports, OpenI implements its own XML-based RDL to define the report schema. By publishing this codebase into open source space, we hope that these RDLs will become more standard (and robust) via community feedback and contribution.
• User Interface: The UI for OpenI brings various existing public domain work into a single platform, mainly with the intent to make the platform extremely user friendly to a non-technical user. It is designed more for the “business analyst” than the “database developer”. For charting and pivot table components, it relies heavily on JPivot and JFreeChart, and unifies them in a single, consistent navigation framework. Realizing that analytical applications usually need to be embedded into existing enterprise portals, we are also leveraging the upcoming portlet features of JPivot to better integrate with JSR-168 compliant portals. A key UI feature of OpenI is the administrator interface, where a user can create and publish new reports from existing data sources entirely via a web interface, without having to write any code or query. Also available are features like publishing in private versus public folders, customization of chart components, color palettes, etc.
• Security: OpenI uses form-based authentication that is integrated with the J2EE security structure, i.e. you can use any of the security realms defined in the J2EE configuration to authenticate the login. OpenI also provides integration between J2EE security and data source security, allowing the data source to enforce fine-grained data permissions. This way, user- or group-specific access policies get enforced at the data source level, enabling hierarchical data access policies. For example, a user may only see the specific subset of the cube data permitted by the OLAP security rules for their login.
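To make the XMLA connector described above more concrete: XMLA is a SOAP-based protocol in which the client posts an Execute request carrying an MDX statement. The sketch below only builds such a request body with Python's standard library. It is not OpenI's code (OpenI is a Java application), and the MDX query, data source string, and catalog name are hypothetical placeholders; real servers expect their own values.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
XMLA_NS = "urn:schemas-microsoft-com:xml-analysis"


def build_execute_request(mdx, data_source, catalog):
    """Build the SOAP body of an XMLA Execute call as a string."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    execute = ET.SubElement(body, f"{{{XMLA_NS}}}Execute")

    # The MDX statement to run against the cube.
    command = ET.SubElement(execute, f"{{{XMLA_NS}}}Command")
    ET.SubElement(command, f"{{{XMLA_NS}}}Statement").text = mdx

    # Request properties: which data source and catalog, and the result format.
    properties = ET.SubElement(execute, f"{{{XMLA_NS}}}Properties")
    plist = ET.SubElement(properties, f"{{{XMLA_NS}}}PropertyList")
    for name, value in (
        ("DataSourceInfo", data_source),
        ("Catalog", catalog),
        ("Format", "Multidimensional"),
    ):
        ET.SubElement(plist, f"{{{XMLA_NS}}}{name}").text = value

    return ET.tostring(envelope, encoding="unicode")


# Hypothetical query and connection values, for illustration only.
request = build_execute_request(
    "SELECT {[Measures].[Sales]} ON COLUMNS FROM [SalesCube]",
    "Provider=Example",
    "ExampleCatalog",
)
print(request)
```

A client would post this document over HTTP to the server's XMLA endpoint and parse the multidimensional result that comes back; that part is omitted here because the endpoint and authentication details are server-specific.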
The term “Business Intelligence” has only been in common use for a few years. From the stone age of computing until only a few years ago, it was called “reporting”. In the late 1980s, the Information Warehouse was conceived. The idea was to leave data where it was and access it from anywhere with tools. Needless to say, this fad was short-lived. Soon thereafter, in the mid-1990s, Ralph Kimball published his first Data Warehousing book. Arguably, what we mean by Business Intelligence today took shape in those days: data is extracted from operational systems, processed, and stored in repositories especially designed for analysis. I don’t remember hearing the term Business Intelligence until a few years ago, though, around 2001. Dashboards, Key Performance Indicators, and Scorecards brought Business Intelligence closer to the executive office. This trend is still happening. Only in the last year or two has Open Source appeared in the world of Business Intelligence.