Business Intelligence Open Source


Business Intelligence Open Source course, theory and principal vendors.

Published in: Technology

  1. Business Intelligence
  2. History
     ● The term "Business Intelligence" first appeared in 1958, in work by Hans Peter Luhn, an IBM researcher
     ● He described an automatic method for providing current-awareness services to scientists and engineers
     ● Current definition: a combination of processes and technologies for gathering, storing, analyzing and providing access to information, helping enterprise users make informed decisions
  3. Main concept
     ● Collect data from different sources
     ● Integrate and clean up the data in a common, easy-to-analyze repository
     ● Provide business-related analysis for managers and decision makers
     ● Focus on business, data integration and data presentation
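
The collect, integrate and clean-up steps above can be sketched as a tiny extract/clean/load pipeline. This is only an illustration under stated assumptions: the two sources (`CRM_CSV`, `ERP_ROWS`), their field names and the sample values are invented, and an in-memory SQLite database stands in for the common repository.

```python
import csv
import io
import sqlite3

# Two hypothetical sources with inconsistent formatting (invented sample data).
CRM_CSV = "customer,country,amount\nAcme , it,100.50\nBeta,IT,\n"
ERP_ROWS = [
    {"cust": "acme", "nation": "IT", "total": "49.50"},
    {"cust": "Gamma", "nation": "DE", "total": "20"},
]


def extract():
    """Collect raw records from both sources into one common shape."""
    for row in csv.DictReader(io.StringIO(CRM_CSV)):
        yield row["customer"], row["country"], row["amount"]
    for row in ERP_ROWS:
        yield row["cust"], row["nation"], row["total"]


def clean(records):
    """Normalize names and country codes; drop rows without an amount."""
    for customer, country, amount in records:
        if not amount.strip():
            continue
        yield customer.strip().title(), country.strip().upper(), float(amount)


def load(records):
    """Load the cleaned rows into a single, easy-to-query repository."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sales (customer TEXT, country TEXT, amount REAL)")
    db.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    return db


db = load(clean(extract()))
totals = db.execute(
    "SELECT country, SUM(amount) FROM sales GROUP BY country ORDER BY country"
).fetchall()
print(totals)
```

Once the data sits in one cleaned table, the business-facing analysis reduces to a simple aggregate query, which is the point of the integration step.
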
  4. Data warehouse
     ● Bill Inmon: a collection of data in support of the decision-making process
        ● End-user oriented
        ● Collected from different sources
        ● Time dependent
        ● Data is not editable
     ● In theory, the term refers to a group of processes
     ● In practice, it is often used to mean the database itself
  5. OLTP: On-Line Transaction Processing
     ● Commonly used in ERP and CRM systems and in database applications
     ● Focus on the transaction level (one invoice, one sales order, a search query, etc.)
     ● Updates and insertions are frequent
     ● Relational model with many tables, following normalization rules
  6. OLAP: On-Line Analytical Processing
     ● A system designed for analysis purposes
     ● Focused on exploring the data as a whole
     ● Once added, data changes far less frequently
     ● 13 (12+0) rules of Dr. Codd (1993)
        ● Multidimensional view
        ● Intuitive data manipulation
        ● Dimensions, Facts, Hierarchy levels, Cardinality
  7. On-Line Analytical Processing
  8. Relational OLAP
     ● Uses relational database schemas and SQL to store and access OLAP cubes
     ● Reuses RDBMS technology
     ● Many tools and vendors available
     ● SQL can be used directly by many tools
     ● Scalability
  9. Star schema
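
A star schema places one fact table at the center and joins it to the surrounding dimension tables through foreign keys. The sketch below shows how a ROLAP-style aggregation over such a schema could look in plain SQL. The table names (`fact_sales`, `dim_date`, `dim_store`), the columns and the sample rows are all assumptions made for illustration, and SQLite is used only because it ships with Python.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
-- Dimension tables: descriptive attributes used to slice the facts.
CREATE TABLE dim_date  (date_id  INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, country TEXT, state TEXT);

-- Fact table: one row per measured event, pointing at each dimension.
CREATE TABLE fact_sales (
    date_id     INTEGER REFERENCES dim_date(date_id),
    store_id    INTEGER REFERENCES dim_store(store_id),
    store_sales REAL
);

INSERT INTO dim_date  VALUES (1, 2002, 6), (2, 2003, 6);
INSERT INTO dim_store VALUES (1, 'USA', 'CA'), (2, 'USA', 'WA');
INSERT INTO fact_sales VALUES (1, 1, 100.0), (1, 2, 40.0), (2, 1, 120.0), (2, 1, 30.0);
""")

# A cube "cell" is computed at query time by joining facts to dimensions
# and grouping on the chosen dimension levels.
rows = db.execute("""
    SELECT d.year, s.state, SUM(f.store_sales)
    FROM fact_sales AS f
    JOIN dim_date  AS d ON d.date_id  = f.date_id
    JOIN dim_store AS s ON s.store_id = f.store_id
    GROUP BY d.year, s.state
    ORDER BY d.year, s.state
""").fetchall()
print(rows)
```

Changing the `GROUP BY` columns (for example to `s.country` only) moves the same query up or down the dimension hierarchy.
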
  10. Multidimensional OLAP (MOLAP), Hybrid OLAP
      ● MOLAP uses optimized multidimensional arrays
      ● Requires pre-computation and storage of the cube (processing)
      ● Often performs better than ROLAP, thanks to better caching and multidimensional indexing
      ● Compression techniques, statistical indexes
      ● Less scalable than ROLAP on high data volumes; fewer tools and vendors available
      ● Hybrid OLAP (HOLAP) is the combination of ROLAP and MOLAP
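
To make the pre-computation idea concrete, here is a minimal sketch of a cube stored as a dense two-dimensional array, with roll-up totals filled in during a "processing" step. It is a toy illustration only: real MOLAP engines add compression and indexing, and the dimension members, the `All` member and the sample facts are assumptions.

```python
from itertools import product

# Hypothetical dimension members; the position in each list is the array index.
YEARS = ["2002", "2003"]
STATES = ["CA", "WA"]
ALL = "All"  # extra member holding the pre-computed total of a dimension

facts = [
    ("2002", "CA", 100.0),
    ("2002", "WA", 40.0),
    ("2003", "CA", 120.0),
    ("2003", "CA", 30.0),
]


def process_cube(facts):
    """Pre-compute every cell, including roll-ups, into a dense 2-D array."""
    years, states = YEARS + [ALL], STATES + [ALL]
    cube = [[0.0] * len(states) for _ in years]
    for year, state, amount in facts:
        # Each fact feeds its own cell plus the three roll-up cells.
        for y, s in product((year, ALL), (state, ALL)):
            cube[years.index(y)][states.index(s)] += amount
    return years, states, cube


years, states, cube = process_cube(facts)


def cell(year, state):
    """Answer a query with an array lookup, with no aggregation at query time."""
    return cube[years.index(year)][states.index(state)]


print(cell("2003", "CA"), cell(ALL, "CA"), cell(ALL, ALL))
```

The trade-off described on the slide shows up directly: lookups are cheap, but the array grows with the product of the dimension sizes and must be reprocessed when facts change.
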
  11. Slowly Changing Dimensions
      ● In some Business Intelligence implementations, data is always added and almost never modified
      ● This makes it possible to go back along the timeline
      ● For example, if an employee was hired in a given time period, you can analyze the data as it was in that period and count the exact number of employees
      ● A common way to implement Slowly Changing Dimensions is to add special fields to the database records that give each record a time-related validity
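
The validity-field approach can be sketched as follows (this pattern is commonly called a Type 2 slowly changing dimension). The table `dim_employee`, the columns `valid_from` and `valid_to`, the `OPEN_END` sentinel and the sample employees are assumptions chosen for the example, not part of the slides.

```python
import sqlite3

OPEN_END = "9999-12-31"  # assumed sentinel meaning "this version is still valid"

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE dim_employee ("
    "employee_id INTEGER, department TEXT, valid_from TEXT, valid_to TEXT)"
)


def hire(emp_id, dept, day):
    """Add a new, currently valid version of the employee record."""
    db.execute(
        "INSERT INTO dim_employee VALUES (?, ?, ?, ?)",
        (emp_id, dept, day, OPEN_END),
    )


def change(emp_id, dept, day):
    """Close the current version and add a new one; nothing is overwritten."""
    db.execute(
        "UPDATE dim_employee SET valid_to = ? "
        "WHERE employee_id = ? AND valid_to = ?",
        (day, emp_id, OPEN_END),
    )
    hire(emp_id, dept, day)


def headcount(dept, day):
    """Count the employees whose record placed them in `dept` on `day`."""
    return db.execute(
        "SELECT COUNT(*) FROM dim_employee "
        "WHERE department = ? AND valid_from <= ? AND ? < valid_to",
        (dept, day, day),
    ).fetchone()[0]


hire(1, "Sales", "2010-01-15")
hire(2, "Sales", "2010-03-01")
change(1, "Marketing", "2011-06-01")

print(headcount("Sales", "2010-12-31"), headcount("Sales", "2011-12-31"))
```

Because ISO-formatted dates sort correctly as text, the validity check works with plain string comparison; the half-open interval (`valid_from` inclusive, `valid_to` exclusive) avoids counting an employee twice on the day of a change.
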
  12. MDX
      ● Multidimensional Expressions (MDX) is a query language for OLAP databases
      ● MDX is to OLAP databases what SQL is to OLTP databases
      ● Powerful for computing indexes and navigating through OLAP dimensions
      ● Example:
          SELECT {[Measures].[Store Sales]} ON COLUMNS,
                 {[Date].[2002], [Date].[2003]} ON ROWS
          FROM Sales
          WHERE ([Store].[USA].[CA])
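
The example query asks for the Store Sales measure on the columns, the years 2002 and 2003 on the rows, and restricts everything to the store member USA / CA. The sketch below mimics that evaluation in plain Python to show what each clause contributes. It is not an MDX engine, and the `FACTS` rows and the `evaluate` helper are hypothetical.

```python
# Hypothetical fact rows for a "Sales" cube: (year, country, state, store_sales).
FACTS = [
    ("2002", "USA", "CA", 100.0),
    ("2002", "USA", "WA", 40.0),
    ("2003", "USA", "CA", 120.0),
    ("2003", "USA", "CA", 30.0),
    ("2003", "Canada", "BC", 75.0),
]


def evaluate(rows_axis, slicer):
    """Return {year: total store sales}, keeping only facts inside the slicer.

    rows_axis plays the role of the ON ROWS set, slicer the role of WHERE;
    the single measure (store sales) is the only column.
    """
    country, state = slicer
    result = {year: 0.0 for year in rows_axis}
    for year, f_country, f_state, sales in FACTS:
        if (f_country, f_state) == (country, state) and year in result:
            result[year] += sales
    return result


grid = evaluate(rows_axis=["2002", "2003"], slicer=("USA", "CA"))
for year, total in grid.items():
    print(year, total)
```
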
  13. Features for a BI platform
      ● Data storage, data management
      ● Data integration, process scheduling
      ● Querying and reporting
      ● On-Line Analytical Processing (OLAP)
      ● Document management, versioning
      ● Statistical computations
      ● Microsoft Office or OpenOffice support
      ● Ease of use and end-user self-service document creation (independence from developers)
  14. Dashboards, KPIs
  15. Geoanalysis
  16. Data Mining
      ● Requires a strong background in computational statistics
  17. What-if analysis
  18. Open Source offers
      ● Reporting
      ● OLAP
      ● Charts
      ● Portal containers
      ● Data integration tools
      ● Libraries, CMS, scheduler
      ● Databases
  19. SpagoBI (BI Suite)
      ● Engineering Informatica (Italy)
      ● Integration of components using drivers
      ● Comprehensive
      ● Fully Open Source
  20. Pentaho (BI Suite)
      ● Pentaho (USA)
      ● Acquisition instead of integration
      ● Strong marketing
      ● Commercial and Open Source
  21. JasperServer (BI Suite)
      ● JasperSoft (USA)
      ● Famous for JasperReports
      ● Easy to use
      ● Commercial and Open Source
  22. Palo (In-memory OLAP)
      ● Jedox (Germany)
      ● Interesting technology (M-OLAP, GPU)
      ● Excel and OpenOffice plugins
      ● Web spreadsheet and reporting
      ● Open Source and Commercial support
  23. Talend (Data Integration)
      ● Talend (France)
      ● Named a Gartner "Cool Vendor" for Data Integration
      ● Data Integration, Data Quality, Data Management, ESB
      ● Open Source and Commercial support