Open Source BI: Deep Dive, Ravi Samji, 21/04/2011

Learning Open Source Business Intelligence


Interested in learning BI the open source way? We will walk through open source BI tools and the technology stack. We'll also explore how to build a BI application on relational data without creating a data warehouse, and look at the constraints and requirements of doing so. Then we discuss how to take advantage of some database features to scale open source BI to new heights. Learn about cube construction, shared dimensions, calculated facts and how Mondrian interprets the cube definition to construct a SQL statement. We will also learn how to replace some of the out-of-the-box components in the stack with a custom

Published in: Technology, Education

Learning Open Source Business Intelligence

  1. Open Source BI: Deep Dive, Ravi Samji, 21/04/2011
  2. Agenda
     • Business Intelligence – Why, What & Who?
     • Open Source BI – Introduction, Tech Stack
     • OLAP Engine – Mondrian
     • UI Layer – JPivot
     • Performance & Scalability
     • Constraints
  3. About Yodlee
  4. Business Intelligence – Why?
     • Data is the biggest asset
       • Structured and unstructured format
     • Most of our assets are buried
     • Helps us understand customer behavior
     • Helps us deliver better business value
     • Measure performance
  5. Business Intelligence – What?
     • Reporting
     • Analytics
     • Data/Text Mining
     • ETL
     • Predictive Analytics
  6. Business Intelligence – Who?
  7. Open Source BI – Introduction
     • Mondrian – OLAP Engine
       • Initially an independent open source initiative
       • Now part of the Pentaho Open Source BI Suite
     • 100% Pure Java
     • Supports MDX and XML/A
     • Bundled with other open source packages
  8. Open Source BI – Tech Stack
     [Diagram components: JFreeChart, WCF, log4j, log4j, JPivot, Mondrian, RDBMS]
  9. OLAP Engine – Mondrian
     • Cube definition – schema.xml
     • MDX – query language to access multi dimensional data
     • Operates on a normalized relational database
  10. Mondrian – schema.xml
     • Logical model of a multi dimensional database
     • Cube, VirtualCube
     • Dimensions, Hierarchies, Levels
     • Measure, CalculatedMember
  11. Logical Model – Multi Dimensional
      [Figure label: Database]
      <Schema>
        <Cube name="Sales">
          <Table name="sales_fact_1997"/>
          <Dimension name="Gender" foreignKey="customer_id">
            <Hierarchy hasAll="true" allMemberName="All Genders" primaryKey="customer_id">
              <Table name="customer"/>
              <Level name="Gender" column="gender" uniqueMembers="true"/>
            </Hierarchy>
          </Dimension>
          <Dimension name="Time" foreignKey="time_id">
            <Hierarchy hasAll="false" primaryKey="time_id">
              <Table name="time_by_day"/>
              <Level name="Year" column="the_year" type="Numeric" uniqueMembers="true"/>
              <Level name="Quarter" column="quarter" uniqueMembers="false"/>
              <Level name="Month" column="month_of_year" type="Numeric" uniqueMembers="false"/>
            </Hierarchy>
          </Dimension>
          <Measure name="Unit Sales" column="unit_sales" aggregator="sum" formatString="#,###"/>
          <Measure name="Store Sales" column="store_sales" aggregator="sum" formatString="#,###.##"/>
          <Measure name="Store Cost" column="store_cost" aggregator="sum" formatString="#,###.00"/>
          <CalculatedMember name="Profit" dimension="Measures"
                            formula="[Measures].[Store Sales] - [Measures].[Store Cost]">
            <CalculatedMemberProperty name="FORMAT_STRING" value="$#,##0.00"/>
          </CalculatedMember>
        </Cube>
      </Schema>
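The cube definition above carries everything needed to derive a star-join query. The following is a minimal sketch of that idea, not Mondrian's actual SQL generator: the helper `build_sql`, the trimmed schema, and the toy rows are all assumptions made for illustration, and only a single-level dimension with one measure is handled.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Trimmed copy of the Sales cube: one dimension, one measure.
SCHEMA = """
<Schema>
  <Cube name="Sales">
    <Table name="sales_fact_1997"/>
    <Dimension name="Gender" foreignKey="customer_id">
      <Hierarchy hasAll="true" allMemberName="All Genders" primaryKey="customer_id">
        <Table name="customer"/>
        <Level name="Gender" column="gender" uniqueMembers="true"/>
      </Hierarchy>
    </Dimension>
    <Measure name="Unit Sales" column="unit_sales" aggregator="sum"/>
  </Cube>
</Schema>
"""


def build_sql(schema_xml, cube_name, dim_name, measure_name):
    """Hypothetical helper: one GROUP BY query for one level and one measure."""
    root = ET.fromstring(schema_xml)
    cube = next(c for c in root.findall("Cube") if c.get("name") == cube_name)
    fact = cube.find("Table").get("name")
    dim = next(d for d in cube.findall("Dimension") if d.get("name") == dim_name)
    hier = dim.find("Hierarchy")
    dim_table = hier.find("Table").get("name")
    level_col = hier.find("Level").get("column")
    measure = next(m for m in cube.findall("Measure") if m.get("name") == measure_name)
    return (
        f"SELECT d.{level_col}, {measure.get('aggregator')}(f.{measure.get('column')}) "
        f"FROM {fact} f JOIN {dim_table} d "
        f"ON f.{dim.get('foreignKey')} = d.{hier.get('primaryKey')} "
        f"GROUP BY d.{level_col} ORDER BY d.{level_col}"
    )


# Toy data (invented) to show the generated statement actually runs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, gender TEXT);
CREATE TABLE sales_fact_1997 (customer_id INTEGER, unit_sales INTEGER);
INSERT INTO customer VALUES (1, 'F'), (2, 'M'), (3, 'F');
INSERT INTO sales_fact_1997 VALUES (1, 2), (2, 3), (3, 5), (1, 1);
""")
sql = build_sql(SCHEMA, "Sales", "Gender", "Unit Sales")
rows = conn.execute(sql).fetchall()
print(sql)
print(rows)
```

The `foreignKey` on the dimension and the `primaryKey` on the hierarchy supply the join condition, the level's `column` becomes the grouping column, and the measure's `aggregator` becomes the SQL aggregate function.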
  12. Dimensions & Shared Dimensions
      <Schema>
        <Dimension name="Time">
          <Hierarchy hasAll="false" primaryKey="time_id">
            <Table name="time_by_day"/>
            <Level name="Year" column="the_year" type="Numeric" uniqueMembers="true"/>
            <Level name="Quarter" column="quarter" uniqueMembers="false"/>
            <Level name="Month" column="month_of_year" type="Numeric" uniqueMembers="false"/>
          </Hierarchy>
        </Dimension>
        <Cube name="Sales">
          <Table name="sales_fact_1997"/>
          <DimensionUsage name="Time" source="Time" foreignKey="time_id"/>
          <Measure …/>
          <CalculatedMember …/>
        </Cube>
        <Cube name="Warehouse">
          <Table name="sales_fact_1997"/>
          <DimensionUsage name="Time" source="Time" foreignKey="time_id"/>
          <Measure …/>
          <CalculatedMember …/>
        </Cube>
      </Schema>
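A shared dimension is declared once at schema level and referenced from each cube through `DimensionUsage`. As a rough sketch of how such references could be resolved (the function `resolve_usages` is hypothetical and the elided `Measure` and `CalculatedMember` elements are simply left out), the `source` attribute is looked up among the schema-level dimensions:

```python
import xml.etree.ElementTree as ET

SCHEMA = """
<Schema>
  <Dimension name="Time">
    <Hierarchy hasAll="false" primaryKey="time_id">
      <Table name="time_by_day"/>
      <Level name="Year" column="the_year" type="Numeric" uniqueMembers="true"/>
      <Level name="Quarter" column="quarter" uniqueMembers="false"/>
      <Level name="Month" column="month_of_year" type="Numeric" uniqueMembers="false"/>
    </Hierarchy>
  </Dimension>
  <Cube name="Sales">
    <Table name="sales_fact_1997"/>
    <DimensionUsage name="Time" source="Time" foreignKey="time_id"/>
  </Cube>
  <Cube name="Warehouse">
    <Table name="sales_fact_1997"/>
    <DimensionUsage name="Time" source="Time" foreignKey="time_id"/>
  </Cube>
</Schema>
"""


def resolve_usages(schema_xml):
    """Map (cube, usage name) to (fact foreign key, level names of the shared dimension)."""
    root = ET.fromstring(schema_xml)
    # Only direct children of <Schema> are shared dimensions.
    shared = {d.get("name"): d for d in root.findall("Dimension")}
    resolved = {}
    for cube in root.findall("Cube"):
        for usage in cube.findall("DimensionUsage"):
            dim = shared[usage.get("source")]  # KeyError means a dangling reference
            levels = [level.get("name") for level in dim.iter("Level")]
            resolved[(cube.get("name"), usage.get("name"))] = (usage.get("foreignKey"), levels)
    return resolved


resolved = resolve_usages(SCHEMA)
for key, value in resolved.items():
    print(key, value)
```

Both cubes end up with the same level list, which is the point of sharing: the hierarchy is maintained in one place and only the fact-table foreign key is cube-specific.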
  13. Hierarchies
      <Schema>
        <Dimension name="Time">
          <Hierarchy hasAll="false" primaryKey="time_id">
            <Table name="time_by_day"/>
            <Level name="Year" column="the_year" type="Numeric" uniqueMembers="true"/>
            <Level name="Quarter" column="quarter" uniqueMembers="false"/>
            <Level name="Month" column="month_of_year" type="Numeric" uniqueMembers="false"/>
          </Hierarchy>
          <Hierarchy name="Fiscal Calendar" hasAll="false" primaryKey="time_id">
            <Table name="time_by_day"/>
            <Level name="Year" column="fiscal_year" type="Numeric" uniqueMembers="true"/>
            <Level name="Quarter" column="fiscal_quarter" uniqueMembers="false"/>
            <Level name="Month" column="fiscal_month_of_year" type="Numeric" uniqueMembers="false"/>
          </Hierarchy>
        </Dimension>
        <Cube name="Sales">
          <Table name="sales_fact_1997"/>
          <DimensionUsage name="Time" source="Time" foreignKey="time_id"/>
          <Measure …/>
          <CalculatedMember …/>
        </Cube>
      </Schema>
  14. Schema.xml – Extensions
     • Plug-in classes
     • In-line tables
     • Views
     • User defined functions
  15. Extensions – Plug-in Classes
     • Member Reader
     • Member Formatter
     • Cell Reader
     • Cell Formatter
     • Property Formatter
  16. Extensions – In-line Tables
      <Dimension name="Severity">
        <Hierarchy hasAll="true" primaryKey="severity_id">
          <InlineTable alias="severity">
            <ColumnDefs>
              <ColumnDef name="id" type="Numeric"/>
              <ColumnDef name="desc" type="String"/>
            </ColumnDefs>
            <Rows>
              <Row>
                <Value column="id">1</Value>
                <Value column="desc">High</Value>
              </Row>
              <Row>
                <Value column="id">2</Value>
                <Value column="desc">Medium</Value>
              </Row>
              <Row>
                <Value column="id">3</Value>
                <Value column="desc">Low</Value>
              </Row>
            </Rows>
          </InlineTable>
          <Level name="Severity" column="id" nameColumn="desc" uniqueMembers="true"/>
        </Hierarchy>
      </Dimension>
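An in-line table keeps a small lookup dimension inside schema.xml instead of in the database. To make the structure concrete, here is a small sketch that reads the `InlineTable` element into typed rows; `inline_rows` is a hypothetical helper, and treating `Numeric` as an integer is an assumption that suits this example only.

```python
import xml.etree.ElementTree as ET

INLINE_TABLE = """
<InlineTable alias="severity">
  <ColumnDefs>
    <ColumnDef name="id" type="Numeric"/>
    <ColumnDef name="desc" type="String"/>
  </ColumnDefs>
  <Rows>
    <Row><Value column="id">1</Value><Value column="desc">High</Value></Row>
    <Row><Value column="id">2</Value><Value column="desc">Medium</Value></Row>
    <Row><Value column="id">3</Value><Value column="desc">Low</Value></Row>
  </Rows>
</InlineTable>
"""


def inline_rows(xml_text):
    """Return the in-line table as a list of dicts keyed by column name."""
    table = ET.fromstring(xml_text)
    types = {col.get("name"): col.get("type") for col in table.find("ColumnDefs")}
    rows = []
    for row in table.find("Rows"):
        record = {}
        for value in row.findall("Value"):
            column = value.get("column")
            # Assumption: Numeric values in this example are whole numbers.
            record[column] = int(value.text) if types[column] == "Numeric" else value.text
        rows.append(record)
    return rows


rows = inline_rows(INLINE_TABLE)
# The Level uses column="id" as the key and nameColumn="desc" as the caption.
captions = {r["id"]: r["desc"] for r in rows}
print(captions)
```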
  17. Extensions – Views
      <Cube name="Operations">
        <View alias="StateCountyCity">
          <SQL dialect="generic">
            <![CDATA[
              SELECT s.state_name, c.county_name, t.city_name,
                     s.state_id, c.county_id, t.city_id
              FROM state s
              LEFT JOIN county c ON (c.state_id = s.state_id)
              LEFT JOIN city t ON (c.county_id = t.county_id)
            ]]>
          </SQL>
        </View>
      </Cube>
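The view above flattens three normalized tables into one row per state, county and city. A quick way to see what it returns is to run the same SELECT against a throwaway SQLite database. The table layouts and rows below are invented for illustration, and wrapping the statement as a derived table under the view's alias is an assumption about how such a view might be consumed, not a statement of what Mondrian emits.

```python
import sqlite3

# The SELECT from the <View> element, unchanged.
VIEW_SQL = """
SELECT s.state_name, c.county_name, t.city_name,
       s.state_id, c.county_id, t.city_id
FROM state s
LEFT JOIN county c ON (c.state_id = s.state_id)
LEFT JOIN city t ON (c.county_id = t.county_id)
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE state  (state_id INTEGER PRIMARY KEY, state_name TEXT);
CREATE TABLE county (county_id INTEGER PRIMARY KEY, state_id INTEGER, county_name TEXT);
CREATE TABLE city   (city_id INTEGER PRIMARY KEY, county_id INTEGER, city_name TEXT);
INSERT INTO state  VALUES (1, 'State A'), (2, 'State B');
INSERT INTO county VALUES (10, 1, 'County X');
INSERT INTO city   VALUES (100, 10, 'City P');
""")

# Use the view as a derived table under its alias, sorted for stable output.
rows = conn.execute(
    f"SELECT * FROM ({VIEW_SQL}) AS StateCountyCity ORDER BY state_id"
).fetchall()
for row in rows:
    print(row)
```

Because the joins are LEFT JOINs, a state without counties still appears, with NULLs in the county and city columns.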
  18. Extensions – User Defined Functions
     • Must implement mondrian.spi.UserDefinedFunction
     • Implementation must be available in classpath
     • UDF definition in schema.xml
       <Schema>
         ...
         <UserDefinedFunction name="PlusOne" className="my.udf.PlusOne"/>
       </Schema>
     • MDX usage
       WITH MEMBER [Measures].[Unit Sales Plus One]
         AS PlusOne([Measures].[Unit Sales])
       SELECT {[Measures].[Unit Sales]} ON COLUMNS,
              {[Gender].MEMBERS} ON ROWS
       FROM [Sales]
  19. MDX / JDBC Parallels
      Mondrian                                  JDBC
      Connection – mondrian.olap.Connection     Connection – java.sql.Connection
      Query – mondrian.olap.Query               Statement – java.sql.Statement
      Result – mondrian.olap.Result             ResultSet – java.sql.ResultSet
      Access Axis & Cell from Result            Access Rows & Columns from ResultSet
  20. UI Layer – JPivot
  21. Performance & Scalability
     • Enable SQL statement logging to analyze Mondrian-generated SQL statements
     • Index on foreign/join keys
     • Use Aggregate Tables & Materialized Views
     • Query results in session
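Two of the points above, indexing the join keys and pre-aggregating, can be sketched with SQLite. All table contents, the index name and the aggregate table name are assumptions for illustration, and the sketch does not cover how Mondrian itself is configured to recognize aggregate tables. It only shows that a summary table answers the same quarter-level question without touching the fact table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_by_day (time_id INTEGER PRIMARY KEY, the_year INTEGER, quarter TEXT);
CREATE TABLE sales_fact_1997 (time_id INTEGER, unit_sales INTEGER);
INSERT INTO time_by_day VALUES (1, 1997, 'Q1'), (2, 1997, 'Q1'), (3, 1997, 'Q2');
INSERT INTO sales_fact_1997 VALUES (1, 4), (1, 6), (2, 5), (3, 7);

-- Index the fact table's foreign key used in the star join.
CREATE INDEX idx_sales_time ON sales_fact_1997 (time_id);

-- Pre-aggregate to quarter level; fact_count records how many fact rows were rolled up.
CREATE TABLE agg_sales_by_quarter AS
  SELECT t.the_year, t.quarter,
         SUM(f.unit_sales) AS unit_sales,
         COUNT(*) AS fact_count
  FROM sales_fact_1997 f
  JOIN time_by_day t ON f.time_id = t.time_id
  GROUP BY t.the_year, t.quarter;
""")

# Same question answered from the detailed fact table and from the aggregate.
from_fact = conn.execute("""
  SELECT t.the_year, t.quarter, SUM(f.unit_sales)
  FROM sales_fact_1997 f
  JOIN time_by_day t ON f.time_id = t.time_id
  GROUP BY t.the_year, t.quarter
  ORDER BY 1, 2
""").fetchall()
from_agg = conn.execute("""
  SELECT the_year, quarter, unit_sales
  FROM agg_sales_by_quarter
  ORDER BY 1, 2
""").fetchall()
print(from_fact)
print(from_agg)
```

The aggregate has to be refreshed whenever the fact table changes, which is the trade-off a materialized view handles on the database side.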
  22. Constraints
     • Composite key joins are not supported
     • Uniqueness within a level is not based on id
     • Have had issues re-using the same table with a different alias
     • To keep Mondrian happy, the schema must be normalized
     • Requires a dedicated Time dimension table
  23. Summary
     • 100% Pure Java BI tool
     • Not too difficult to work with
     • Extensible for different front-end layers
     • Scalable
     • Viable alternative to proprietary tools
       • No vendor lock-in – Open Source
       • Less TCO
       • Quicker Time To Market
