Business Intelligence
History

●   The term "Business Intelligence" was first used in
    1958 by Hans Peter Luhn, an IBM researcher
●   Originally: an automatic method to provide current
    awareness services to scientists and engineers
●   Current definition of Business Intelligence: a
    combination of processes and technologies for
    gathering, storing, analyzing and providing
    access to information, helping enterprise users
    make informed decisions


                                           www.robertomarchetto.com
Main concept

●   Collect data from different sources
●   Integrate and clean up data in a common,
    easy-to-analyze repository
●   Provide business-related analysis for managers
    and decision makers
●   Focus on business, data integration, data
    presentation
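
The collect, integrate and analyze steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two sources (`crm_rows`, `erp_rows`), their field names, and the `COUNTRY_CODES` mapping are all hypothetical and do not come from any specific tool.

```python
# Hypothetical records from two source systems with different conventions.
crm_rows = [
    {"customer": " Alice ", "country": "it", "revenue": "1200.50"},
    {"customer": "Bob", "country": "IT", "revenue": "300"},
]
erp_rows = [
    {"name": "alice", "nation": "Italy", "amount": 99.5},
    {"name": "Carol", "nation": "France", "amount": None},  # missing value
]

# Assumed mapping used to bring country values to one convention.
COUNTRY_CODES = {"it": "IT", "italy": "IT", "fr": "FR", "france": "FR"}


def clean(name, country, amount):
    """Normalize one record into the common repository format."""
    if amount is None:
        return None  # drop rows with no usable measure
    return {
        "customer": name.strip().title(),
        "country": COUNTRY_CODES[country.strip().lower()],
        "revenue": float(amount),
    }


def integrate(crm, erp):
    """Collect both sources, clean them, and return one unified list."""
    rows = [clean(r["customer"], r["country"], r["revenue"]) for r in crm]
    rows += [clean(r["name"], r["nation"], r["amount"]) for r in erp]
    return [r for r in rows if r is not None]


repository = integrate(crm_rows, erp_rows)

# Business-level analysis: total revenue per customer.
revenue_by_customer = {}
for row in repository:
    revenue_by_customer[row["customer"]] = (
        revenue_by_customer.get(row["customer"], 0.0) + row["revenue"]
    )
```

Real data integration tools add scheduling, logging and error handling on top of this basic pattern.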




Data warehouse

●   Bill Inmon: a collection of data in support of
    the decision-making process
    ●   End-user oriented
    ●   Collected from different sources
    ●   Time-dependent
    ●   Data is not editable
●   In theory, the term refers to a group of processes
●   In practice, it is often used for the database itself


OLTP: On-Line Transaction Processing

 ●   Commonly used in ERP, CRM systems and
     database applications
 ●   Focus on the transaction level (one invoice, one
     sales order, a search query, etc.)
 ●   Updates and insertions are frequent
 ●   Relational model with many tables, using
     normalization rules




OLAP: On-Line Analytical Processing

●   A system designed for analysis purposes
●   Focused on exploring the data as a whole
●   Once added, data changes far less frequently
●   12 rules of Dr. Codd for OLAP (1993)
    ●   Multidimensional view
    ●   Intuitive data manipulation
    ●   Dimensions, Facts, Hierarchy levels, Cardinality
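
As a rough illustration of dimensions, facts and hierarchy levels, the sketch below rolls a sales measure up along a date hierarchy. The fact rows, the field names and the `roll_up` helper are hypothetical; it relies on ISO-formatted dates so that a string prefix identifies the year or the month.

```python
from collections import defaultdict

# Fact rows: each one carries dimension values (date, product) and a measure.
facts = [
    {"date": "2003-01-15", "product": "laptop", "sales": 1000},
    {"date": "2003-01-20", "product": "phone", "sales": 400},
    {"date": "2003-02-03", "product": "laptop", "sales": 1500},
    {"date": "2004-01-09", "product": "phone", "sales": 600},
]

# The date dimension has a hierarchy: year > month > day.
# Each level maps to the prefix length of an ISO date string.
LEVELS = {"year": 4, "month": 7, "day": 10}


def roll_up(rows, level):
    """Aggregate the sales measure at the given level of the date hierarchy."""
    totals = defaultdict(int)
    for row in rows:
        key = row["date"][: LEVELS[level]]
        totals[key] += row["sales"]
    return dict(totals)


by_month = roll_up(facts, "month")
by_year = roll_up(facts, "year")
```

Moving from `by_month` to `by_year` is a roll-up; the opposite direction is a drill-down.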



On-Line Analytical Processing




Relational OLAP

●   Uses relational database schemas and SQL to
    store and access OLAP cubes
●   Reuse of RDBMS technology
●   Many tools and vendors available
●   SQL can be used directly by many tools
●   Scalability
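
A minimal sketch of the ROLAP idea, using Python's built-in `sqlite3` module: one fact table linked to dimension tables, queried with plain SQL. The table and column names (`fact_sales`, `dim_date`, `dim_store`) and the sample rows are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables describe the axes of analysis.
    CREATE TABLE dim_date  (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, country TEXT, state TEXT);
    -- The fact table holds the measure plus one foreign key per dimension.
    CREATE TABLE fact_sales (
        date_id     INTEGER REFERENCES dim_date(date_id),
        store_id    INTEGER REFERENCES dim_store(store_id),
        store_sales REAL
    );
    INSERT INTO dim_date  VALUES (1, 2002, 6), (2, 2003, 6);
    INSERT INTO dim_store VALUES (1, 'USA', 'CA'), (2, 'USA', 'WA');
    INSERT INTO fact_sales VALUES (1, 1, 100.0), (2, 1, 150.0), (2, 1, 50.0), (2, 2, 75.0);
""")

# Sales per year for the California stores: join facts to dimensions,
# filter on one dimension, group by another.
rows = conn.execute("""
    SELECT d.year, SUM(f.store_sales)
    FROM fact_sales AS f
    JOIN dim_date  AS d ON d.date_id  = f.date_id
    JOIN dim_store AS s ON s.store_id = f.store_id
    WHERE s.country = 'USA' AND s.state = 'CA'
    GROUP BY d.year
    ORDER BY d.year
""").fetchall()
conn.close()
```

The aggregation is computed at query time by the database engine, which is what lets ROLAP reuse existing RDBMS technology.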




Star schema




Multidimensional OLAP (MOLAP), Hybrid OLAP

●   MOLAP uses optimized multidimensional arrays
●   Requires pre-computation and storage of the cube
    (processing)
●   Often performs better than ROLAP, thanks to better
    caching and multidimensional indexing
●   Compression techniques, statistical indexes
●   Less scalable than ROLAP on high volumes of data,
    fewer tools and vendors available
●   Hybrid OLAP (HOLAP) is the combination of ROLAP
    and MOLAP
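
The pre-computation step (processing) can be sketched as follows. This is a simplification: a Python dictionary stands in for the optimized multidimensional arrays a real engine would use, and the dimensions, fact rows and the `process_cube` helper are hypothetical.

```python
from itertools import combinations

DIMENSIONS = ("year", "state", "product")

facts = [
    {"year": 2002, "state": "CA", "product": "laptop", "sales": 100},
    {"year": 2003, "state": "CA", "product": "laptop", "sales": 150},
    {"year": 2003, "state": "CA", "product": "phone", "sales": 50},
    {"year": 2003, "state": "WA", "product": "phone", "sales": 75},
]


def process_cube(rows):
    """Pre-compute the sales total for every combination of dimensions.

    "*" marks a dimension that has been aggregated away.
    """
    cube = {}
    for size in range(len(DIMENSIONS) + 1):
        for kept in combinations(DIMENSIONS, size):
            for row in rows:
                cell = tuple(row[d] if d in kept else "*" for d in DIMENSIONS)
                cube[cell] = cube.get(cell, 0) + row["sales"]
    return cube


cube = process_cube(facts)

# After processing, queries are plain lookups instead of scans.
total = cube[("*", "*", "*")]
ca_2003 = cube[(2003, "CA", "*")]
phones = cube[("*", "*", "phone")]
```

The trade-off is visible even here: lookups are immediate, but the number of stored cells grows quickly with the number of dimensions and distinct values.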

Slowly Changing Dimensions

●   In some Business Intelligence implementations data is
    always added and almost never modified
●   This makes it possible to go back along the timeline
●   For example, if an employee was hired in a given period,
    you can analyze the data as it was in that period and
    count exactly the number of employees at that time
●   A common approach to implementing Slowly Changing
    Dimensions is to add some special fields to the
    database records, giving a time-related validity to
    each record
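
A minimal sketch of the validity-field approach (often called a Type 2 slowly changing dimension). The field names `valid_from` and `valid_to`, the sample employees and both helper functions are assumptions chosen for illustration.

```python
from datetime import date

OPEN_END = date.max  # sentinel meaning "still valid"

# Each version of an employee record carries its own validity interval.
employees = [
    {"name": "Anna", "dept": "Sales", "valid_from": date(2008, 1, 1), "valid_to": date(2009, 7, 1)},
    {"name": "Anna", "dept": "Marketing", "valid_from": date(2009, 7, 1), "valid_to": OPEN_END},
    {"name": "Luca", "dept": "Sales", "valid_from": date(2009, 3, 1), "valid_to": OPEN_END},
]


def as_of(rows, day):
    """Return the record versions that were valid on the given day."""
    return [r for r in rows if r["valid_from"] <= day < r["valid_to"]]


def change_dept(rows, name, new_dept, day):
    """Close the current version and append a new one.

    Only the end date of the current version is updated; its other
    values are kept, so the history stays queryable.
    """
    for r in rows:
        if r["name"] == name and r["valid_to"] == OPEN_END:
            r["valid_to"] = day
    rows.append({"name": name, "dept": new_dept, "valid_from": day, "valid_to": OPEN_END})


change_dept(employees, "Luca", "Support", date(2010, 1, 1))

headcount_2008 = len(as_of(employees, date(2008, 6, 1)))
headcount_2009 = len(as_of(employees, date(2009, 12, 1)))
sales_2009 = [r["name"] for r in as_of(employees, date(2009, 12, 1)) if r["dept"] == "Sales"]
```

Because old versions are never deleted, a report for December 2009 still places Luca in Sales even after his later move to Support.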


MDX

●   Multidimensional Expressions (MDX) is a query
    language for OLAP databases
●   MDX is to OLAP databases what SQL is to
    relational (OLTP) databases
●   Powerful for computing indexes and navigating
    through OLAP dimensions
●   SELECT
    {[Measures].[Store Sales]} ON COLUMNS,
    {[Date].[2002], [Date].[2003]} ON ROWS
    FROM Sales
    WHERE ([Store].[USA].[CA])

Features for a BI platform
●   Data storage, data management
●   Data integration, process scheduling
●   Querying and reporting
●   On-Line Analytical Processing (OLAP)
●   Document management, versioning
●   Statistical computations
●   Microsoft Office or OpenOffice support
●   Ease of use and self-service creation of documents
    by end users (independence from developers)
Dashboards, KPIs




Geoanalysis




Data Mining

●   Requires a strong background in computational statistics




What-if analysis




Open Source offerings

         ●   Reporting
         ●   OLAP
         ●   Charts
         ●   Portal containers
         ●   Data integration tools
         ●   Libraries, CMS,
             scheduler
         ●   Databases

SpagoBI (BI Suite)

         ●   Engineering
             Informatica (Italy)
         ●   Integration of
             components using
             drivers
         ●   Comprehensive
         ●   Fully Open Source




Pentaho (BI Suite)

         ●   Pentaho (USA)
         ●   Acquisition instead of
             integration
         ●   Strong marketing
         ●   Commercial and
             Open Source




JasperServer (BI Suite)

            ●   JasperSoft (USA)
            ●   Famous for
                JasperReports
            ●   Easy to use
            ●   Commercial and
                Open Source




Palo (In memory OLAP)

           ●   Jedox (Germany)
           ●   Interesting technology
               (M-OLAP, GPU)
           ●   Excel and OpenOffice
               plugins
           ●   Web spreadsheet and
               reporting
           ●   Open Source and
               Commercial support

Talend (Data Integration)

             ●   Talend (France)
             ●   Gartner "Cool Vendor"
                 for Data
                 Integration
             ●   Data Integration, Data
                 Quality, Data
                 Management, ESB
             ●   Open Source and
                 Commercial support
