20200713152029_PPT4-Business analytics using data science techniques and case study-R1.PPT
1. Course : ISYE8015_Selected Topic in
Industrial Engineering
Period : June 2020
Business analytics using data
science techniques
2. Topic
1. What is business analytics ?
2. Data preparation for business analytics
3. Business analytics and intelligence application
framework
3. Business Intelligence and Analytics
• Business intelligence
– Acquisition of data and information for use
in decision-making activities
• Business analytics
– Models and solution methods
• Data mining
– Applying models and methods to data to
identify patterns and trends
4. Data, Information, Knowledge
• Data
– Items that are the most elementary descriptions of
things, events, activities, and transactions
– May be internal or external
• Information
– Organized data that has meaning and value
• Knowledge
– Processed data or information that conveys
understanding or learning applicable to a problem or
activity
6. Database Model
• Hierarchical
– Top down, like inverted tree
– Fields have only one “parent”, each “parent” can have multiple
“children”
– Fast
• Network
– Relationships created through linked lists, using pointers
– “Children” can have multiple “parents”
– Greater flexibility, substantial overhead
• Relational
– Flat, two-dimensional tables with multiple access queries
– Examines relations between multiple tables
– Flexible, quick, and extendable with data independence
• Object oriented
– Data analyzed at conceptual level
– Inheritance, abstraction, encapsulation
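As a rough illustration of the hierarchical and relational models above, the sketch below stores the same toy data both ways. The department and employee names are hypothetical, and Python's built-in sqlite3 module stands in for a relational database; this is a minimal sketch, not a recommendation of any product.

```python
import sqlite3

# Hierarchical model: each child record hangs under exactly one parent.
hierarchy = {
    "Sales": {"employees": ["Ana", "Budi"]},
    "Finance": {"employees": ["Citra"]},
}

def find_department(tree, employee):
    """Walk top-down through each parent until the employee is found."""
    for dept, node in tree.items():
        if employee in node["employees"]:
            return dept
    return None

# Relational model: flat two-dimensional tables linked by keys.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employee (
        id INTEGER PRIMARY KEY,
        name TEXT,
        dept_id INTEGER REFERENCES department(id)
    );
    INSERT INTO department VALUES (1, 'Sales'), (2, 'Finance');
    INSERT INTO employee VALUES (1, 'Ana', 1), (2, 'Budi', 1), (3, 'Citra', 2);
""")

def employees_per_department(connection):
    """Relate the two tables with a join and count employees per department."""
    return connection.execute("""
        SELECT d.name, COUNT(e.id)
        FROM department d
        LEFT JOIN employee e ON e.dept_id = d.id
        GROUP BY d.name
        ORDER BY d.name
    """).fetchall()
```

The hierarchical lookup only works top-down along the parent-child path, while the relational query can combine the tables in whatever way a question requires.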
8. Migrating Data
• Business rules
– Stored in metadata repository
– Applied to data warehouse centrally
• Data extracted from all relevant sources
– Loaded through data-transformation tools or
programs
– Separate operation and decision support
environments
• Correct quality problems before data is
stored
– Cleanse and organize in consistent manner
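The migration steps above (extract from several sources, apply centrally held business rules, cleanse, then load into a separate decision-support store) can be sketched roughly as follows. All source records, field names, and rules here are hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3

# Hypothetical business rules, kept in one place as a stand-in
# for a metadata repository.
BUSINESS_RULES = {
    "country_aliases": {"indonesia": "ID", "id": "ID", "singapore": "SG", "sg": "SG"},
    "required_fields": ("order_id", "country", "amount"),
}

def extract():
    """Pull rows from two made-up operational sources."""
    crm = [{"order_id": 1, "country": " Indonesia ", "amount": "120.50"},
           {"order_id": 2, "country": "sg", "amount": None}]
    web_shop = [{"order_id": 3, "country": "SG", "amount": "75"},
                {"order_id": 4, "country": "id", "amount": " 10.00 "}]
    return crm + web_shop

def cleanse(rows, rules):
    """Drop incomplete rows and normalize values before anything is stored."""
    clean = []
    for row in rows:
        if any(row.get(field) in (None, "") for field in rules["required_fields"]):
            continue  # quality problem: missing value
        country = rules["country_aliases"].get(str(row["country"]).strip().lower())
        if country is None:
            continue  # quality problem: unknown country
        clean.append((int(row["order_id"]), country, float(row["amount"])))
    return clean

def load(rows):
    """Load cleansed rows into a warehouse kept apart from the sources."""
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute(
        "CREATE TABLE fact_orders (order_id INTEGER PRIMARY KEY, country TEXT, amount REAL)"
    )
    warehouse.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    warehouse.commit()
    return warehouse

warehouse = load(cleanse(extract(), BUSINESS_RULES))
totals = dict(warehouse.execute(
    "SELECT country, SUM(amount) FROM fact_orders GROUP BY country"
))
```

Keeping the rules in a single structure means every source is cleansed the same way, which is the point of applying them centrally.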
9. Business Analytics and
intelligence to support
visualization
• Technologies supporting visualization and
interpretation
– Digital imaging, GIS, GUI, tables,
multidimensions, graphs, VR, 3D, animation
– Identify relationships and trends
• Data manipulation allows a real-time look at
performance data
10. Data Analytic System
• Real-time queries and analysis
• Real-time decision-making
• Real-time data warehouses updated
daily or more frequently
– Updates may be made while queries
are active
– Not all data updated continuously
• Deployment of business analytic
applications
11. Business Analytics : GIS
• Computerized system for managing and
manipulating data with digitized maps
– Geographically oriented
– Geographic spreadsheet for models
– Software allows web access to maps
– Used for modeling and simulations
13. Business Analytics: Web
• Web analytics
– Application of business analytics to Web
sites
• Web intelligence
– Application of business intelligence
techniques to Web sites
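A very small example of web analytics is counting page views and unique visitors per page. The sketch below assumes a hypothetical list of (visitor, page) records; real web logs carry far more fields.

```python
from collections import Counter

# Hypothetical page-view records: (visitor_id, page).
page_views = [
    ("v1", "/home"), ("v1", "/pricing"), ("v2", "/home"),
    ("v3", "/home"), ("v2", "/pricing"), ("v1", "/home"),
]

def summarize(views):
    """Return total views and distinct visitors for each page."""
    hits = Counter(page for _, page in views)
    visitors = {}
    for visitor, page in views:
        visitors.setdefault(page, set()).add(visitor)
    return {
        page: {"views": hits[page], "unique_visitors": len(visitors[page])}
        for page in hits
    }

summary = summarize(page_views)
```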
15. About Pentaho
• Recognized leader in business analytics & data integration
• Subscription-based business model
• Achieved critical mass:
– Over 1,200 commercial customers
– Over 10,000 production deployments
– Over 185 countries
• Stewardship of most important open source analytics
projects
• Industry recognition
• Over 160 partners globally
16. Pentaho for Big Data Analytics
[Architecture diagram; labels recovered from the slide]
• Big data management
– Hadoop: Java MapReduce, Pig, Pentaho MapReduce
– NoSQL databases
– Analytic databases
• Data integration
– Job orchestration, workflow, scheduling
– High-performance visual IDE
• Big analytics
– Pentaho Business Analytics
– R
– 3rd-party BI tools and applications
17. Business Analytic Model
• Data Science (advanced power users & viewers)
– Advanced predictive analysis
• Dashboards (information consumers)
– Self-service interactive KPIs & metrics and visualization
• Analysis (knowledge workers/business users)
– Self-service interactive and ad hoc analysis
• Reporting (business users)
– Ad hoc and operational reports
• Data (power users, developers & DBAs)
– High-performance data integration, big data, cleansing
and presentation
• Components are independent
18. High Level Feature/Functions
• Data Science (advanced power users & viewers)
– Advanced predictive analysis
• Dashboards (information consumers)
– Self-service interactive KPIs & metrics and visualization
• Analysis (knowledge workers/business users)
– Self-service interactive and ad hoc analysis
• Reporting (business users)
– Ad hoc and operational reports
• Data (power users, developers & DBAs)
– High-performance data integration, big data, cleansing
and presentation
22. High Level Feature/Functions
• Data Mining (advanced power users & viewers)
– Advanced predictive analysis
• Dashboards (information consumers)
– Self-service interactive KPIs & metrics and visualization
• Analysis (knowledge workers/business users)
– Self-service interactive and ad hoc analysis
• Reporting (business users)
– Ad hoc and operational reports
• Data (power users, developers & DBAs)
– High-performance data integration, big data, cleansing
and presentation
27. Enhanced In-Memory Analytics
• Enhanced in-memory caching for speed-of-thought
visualization & analysis
– More reusability of in-memory data
– Fewer trips to the database/disk
• Builds on existing unique extreme-scale in-memory
analytics
– Support for external data grids
• Infinispan / JBoss Enterprise Data Grid
and Memcached
• Scale to caching hundreds of GBs
(potentially TBs) of data in memory
• Competition
– Java heap or C++ memory space: a few GB
at most (most BI products)
or
– Proprietary (hard to manage) in-memory
technology (e.g. QlikView, MicroStrategy)
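The "fewer trips to the database" idea can be illustrated with a small cache that remembers query results. This is a minimal sketch under stated assumptions: the class name, the `fake_database` function, and the query text are all hypothetical, and it does not represent how Pentaho implements its caching.

```python
class CachedQueryRunner:
    """Keep query results in memory so repeated queries skip the database."""

    def __init__(self, run_query):
        self._run_query = run_query   # function that actually hits the database
        self._cache = {}              # query text -> cached result
        self.database_trips = 0

    def query(self, sql):
        if sql not in self._cache:
            # Cache miss: go to the database once and remember the answer.
            self.database_trips += 1
            self._cache[sql] = self._run_query(sql)
        return self._cache[sql]

def fake_database(sql):
    """Stand-in for a slow database call (hypothetical)."""
    return f"result of {sql}"

runner = CachedQueryRunner(fake_database)
for _ in range(3):
    runner.query("SELECT region, SUM(sales) FROM orders GROUP BY region")
```

After the loop, `runner.database_trips` is 1: the first call reaches the database and the next two are answered from memory. A production cache would also need eviction and invalidation when the underlying data changes, which this sketch leaves out.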
Even the best interactive visualization is frustrating if the end user has to sit waiting interminably for the system to respond. Testing has shown that usage of BI systems drops dramatically once response time starts to exceed 5 seconds, as users tend to lose their train of thought. This is what we mean by “speed of thought” response times: a snappy system that keeps up with the user’s train of thought.
By avoiding database round trips, in-memory data caching is a popular and growing approach to providing this performance. But with the dramatic growth of data volumes, it has become more and more challenging for traditional BI applications to keep up. Pentaho is the “only” business analytics provider to use the extreme-scale in-memory caching technology used to power some of the world’s highest-volume consumer websites, such as YouTube and Amazon.com. That technology is known as data grids: a way of caching large amounts of data across an inexpensive cluster of commodity servers.
Pentaho’s analytics supports two of the leading data grids: Infinispan (the open source project behind JBoss Enterprise Data Grid) and Memcached.
Traditional in-memory products written in either Java or C++ are constrained to a limited amount of memory on the server on which they execute, at most a few GBs. By contrast, a data grid can be distributed across a cluster of commodity servers and can address hundreds of GBs, and potentially TBs, of memory as hardware memory sizes get larger and less expensive. This allows customers to load all or most of their data into memory and so deliver consistent speed-of-thought response times, orders of magnitude faster than querying a database because the data the user needed was not in memory.
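To make the contrast concrete, the sketch below spreads cached entries across several nodes so that total capacity is the sum of all nodes rather than the memory of one server. It is a toy model under stated assumptions: the `TinyDataGrid` class and node names are hypothetical, plain dictionaries stand in for servers, and simple hash-modulo placement is used where real data grids typically add replication and more robust partitioning.

```python
import hashlib

class TinyDataGrid:
    """Toy data grid: each key lives on exactly one of several nodes."""

    def __init__(self, node_names):
        self._names = sorted(node_names)
        self.nodes = {name: {} for name in self._names}  # dict per "server"

    def _node_for(self, key):
        # Stable hash so the same key always maps to the same node.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self._names)
        return self._names[index]

    def put(self, key, value):
        self.nodes[self._node_for(key)][key] = value

    def get(self, key, default=None):
        return self.nodes[self._node_for(key)].get(key, default)

grid = TinyDataGrid(["node-a", "node-b", "node-c"])
for i in range(300):
    grid.put(f"cell:{i}", i * 10)
```

Each entry is held by one node, so adding nodes increases how much data the grid as a whole can keep in memory, while callers still use a single `get` and `put` interface.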
A couple of vendors, QlikTech and MicroStrategy, do provide proprietary in-memory caching capabilities. But these require special training and skills to use and maintain, and they are single-server solutions, so they are constrained by the amount of physical memory that can be installed on a single server, typically no more than 64GB.