6. What is “Big Data”?
• Volume
• Variety
• Velocity
– Rate of Collection
– Need for Agility
7. Primary BI Systems
1. Reporting systems
2. Data-mining systems
3. Knowledge Management Systems
4. Expert Systems
8. Reporting Systems
• Integrate data from multiple systems.
• Calculate and summarize data.
• Sorting, grouping, summing, averaging, comparing
• Present data as meaningful information.
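The grouping, summing, averaging, and sorting operations listed above can be sketched in a few lines of Python. This is only an illustration, not a real reporting product: the `sales` rows, the field names, and the `summarize_by` helper are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows, as if already integrated from several operational systems.
sales = [
    {"region": "East", "amount": 120.0},
    {"region": "West", "amount": 80.0},
    {"region": "East", "amount": 60.0},
    {"region": "West", "amount": 70.0},
    {"region": "North", "amount": 50.0},
]


def summarize_by(rows, group_field, value_field):
    """Group rows by group_field, then count, sum, and average value_field."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_field]].append(row[value_field])
    report = [
        {
            group_field: name,
            "count": len(values),
            "total": sum(values),
            "average": mean(values),
        }
        for name, values in groups.items()
    ]
    # Sort by total, largest first, so groups are easy to compare.
    return sorted(report, key=lambda r: r["total"], reverse=True)


if __name__ == "__main__":
    for line in summarize_by(sales, "region", "amount"):
        print(line)
```

The output is a small summary table (one row per region), which is the "meaningful information" a reporting system would then format for its readers.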
10. Data Mining Systems
• Use sophisticated statistical techniques, such as regression analysis and decision tree analysis.
• Discover hidden patterns and relationships.
• Can be used as the basis for predictions.
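As a minimal sketch of how regression analysis can serve as the basis for a prediction, the following fits a straight line by ordinary least squares. The data points (advertising spend versus sales) and the function names `fit_line` and `predict` are made up for illustration; real data-mining tools use far richer models.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) model to a new x value."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical history: advertising spend (x) and resulting sales (y).
spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]

if __name__ == "__main__":
    model = fit_line(spend, sales)
    print("slope, intercept:", model)
    print("predicted sales at spend 5.0:", predict(model, 5.0))
```

The fitted relationship is the "hidden pattern"; applying it to a spend level that has not been observed yet is the prediction.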
11. Data Mining Systems
• MS Business Intelligence Development Studio
• SAS Enterprise Miner
• WEKA
12. Knowledge Management Systems
• Collect and share human knowledge.
• Best practices
• Employee training
• Process documentation
• Collaboration & Communication
16. Business Intelligence Problems
• Raw data is usually unsuitable for sophisticated reporting or data mining
• Dirty data (misspelled, wrong type, missing, duplicated, inconsistent)
• Multi-dimensionality
• Wrong granularity (summary vs. detail)
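The dirty-data problems above (misspelled, wrong type, missing, duplicated, inconsistent) can be illustrated with a small cleaning pass. This is a sketch under assumptions: the `RAW` records, the `STATE_FIXES` lookup, and the `clean` function are hypothetical, and a real pipeline would need many more rules.

```python
# Hypothetical raw records showing each kind of problem.
RAW = [
    {"id": "1", "name": " Alice ", "state": "WA", "amount": "100.50"},
    {"id": "2", "name": "bob", "state": "Wash.", "amount": "75"},  # misspelled state
    {"id": "2", "name": "Bob", "state": "WA", "amount": "75"},     # duplicate id
    {"id": "3", "name": "Carol", "state": "wa", "amount": ""},     # missing amount
    {"id": "4", "name": "Dan", "state": "OR", "amount": "abc"},    # wrong type
]

# Assumed lookup of known misspellings, keyed by upper-cased value.
STATE_FIXES = {"WASH.": "WA"}


def clean(rows):
    """Return (cleaned, rejected); each rejected entry is (row, reason)."""
    seen_ids = set()
    cleaned, rejected = [], []
    for row in rows:
        if row["id"] in seen_ids:
            rejected.append((row, "duplicate"))
            continue
        try:
            amount = float(row["amount"])
        except ValueError:
            rejected.append((row, "bad or missing amount"))
            continue
        seen_ids.add(row["id"])
        state = row["state"].strip().upper()
        cleaned.append(
            {
                "id": int(row["id"]),
                "name": row["name"].strip().title(),  # consistent spacing and case
                "state": STATE_FIXES.get(state, state),
                "amount": amount,
            }
        )
    return cleaned, rejected


if __name__ == "__main__":
    good, bad = clean(RAW)
    print(len(good), "clean rows;", len(bad), "rejected")
```

Keeping the rejected rows together with a reason, rather than silently dropping them, makes it possible to report back to the operational system that produced them.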
17. Data Warehousing
• Extract and clean data from various operational systems and other sources
• Store and catalog data for BI processing
• Extract, clean, prepare data
• Stored in data-warehouse DBMS
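The extract, clean, store, and catalog steps can be sketched with Python's built-in `sqlite3` module standing in for the data-warehouse DBMS. Everything specific here is an assumption made for illustration: the two source lists, the `sales_fact` and `catalog` table names, and the `load_warehouse` function.

```python
import sqlite3

# Hypothetical extracts from two operational systems: (date, customer, amount).
orders_system = [("2024-01-05", "alice", "120.0"), ("2024-01-06", "BOB", "80.0")]
web_system = [("2024-01-06", "Alice ", "40.0"), ("2024-01-07", "carol", None)]


def load_warehouse(conn, sources):
    """Clean each source extract, store it, and record it in a catalog table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_fact "
        "(sale_date TEXT, customer TEXT, amount REAL, source TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS catalog (source TEXT, rows_loaded INTEGER)"
    )
    for name, rows in sources.items():
        # Clean: drop rows with no amount, normalize names, convert types.
        clean_rows = [
            (date, customer.strip().lower(), float(amount), name)
            for date, customer, amount in rows
            if amount is not None
        ]
        conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", clean_rows)
        conn.execute("INSERT INTO catalog VALUES (?, ?)", (name, len(clean_rows)))
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_warehouse(conn, {"orders": orders_system, "web": web_system})
    query = "SELECT customer, SUM(amount) FROM sales_fact GROUP BY customer"
    for row in conn.execute(query):
        print(row)
```

Once both sources sit in one cleaned table, a BI query can total a customer's activity across systems, which neither operational system could answer on its own.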
19. Data Mart
• Created to address particular needs
• Smaller than data warehouse
• Users may not have data management expertise
• Data extracted from data warehouse for a functional area
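One way to picture a data mart is as a smaller, pre-summarized table extracted from the warehouse for a single functional area. The sketch below again uses `sqlite3` as a stand-in; the `warehouse_sales` table, its columns, and the `make_warehouse` and `build_mart` helpers are hypothetical.

```python
import sqlite3


def make_warehouse():
    """Create a tiny in-memory 'warehouse' with hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE warehouse_sales (area TEXT, product TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
        [
            ("sales", "kayak", 100.0),
            ("sales", "kayak", 50.0),
            ("sales", "tent", 30.0),
            ("hr", "training", 20.0),
        ],
    )
    conn.commit()
    return conn


def build_mart(conn, area):
    """Extract one functional area into its own summarized table."""
    if not area.isidentifier():
        raise ValueError("area must be a simple identifier")
    mart = f"{area}_mart"
    conn.execute(f"CREATE TABLE {mart} (product TEXT, total REAL)")
    conn.execute(
        f"INSERT INTO {mart} "
        "SELECT product, SUM(amount) FROM warehouse_sales "
        "WHERE area = ? GROUP BY product",
        (area,),
    )
    conn.commit()
    return mart


if __name__ == "__main__":
    conn = make_warehouse()
    mart = build_mart(conn, "sales")
    print(conn.execute(f"SELECT * FROM {mart} ORDER BY product").fetchall())
```

Because the mart holds only the rows and the level of detail that one area needs, its users can query it without knowing how the full warehouse is organized.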
20. BI & MRV
MRV could:
• Develop a plan to organize storage of data and how management might use it;
• Use a reporting system to provide information about how much repeat business each guide generates;
• Identify high value customers and customer referrals;
• Analyze equipment inventory usage to guide future equipment purchases.
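As a rough sketch of the repeat-business report mentioned above, the following counts, for each guide, how many trips came from customers who had already booked with that guide. The trip records, the guide and customer names, and the definition of "repeat" used here are assumptions for illustration, not MRV's actual data or rules.

```python
from collections import Counter, defaultdict

# Hypothetical booking records: (guide, customer).
trips = [
    ("Ana", "c1"), ("Ana", "c1"), ("Ana", "c2"),
    ("Ben", "c3"), ("Ben", "c3"), ("Ben", "c3"), ("Ben", "c4"),
    ("Cy", "c5"),
]


def repeat_business_by_guide(trip_rows):
    """For each guide, report total trips, repeat trips, and repeat customers.

    Assumption: every trip after a customer's first trip with the same
    guide counts as one repeat trip.
    """
    per_guide = defaultdict(Counter)
    for guide, customer in trip_rows:
        per_guide[guide][customer] += 1
    report = {}
    for guide, counts in per_guide.items():
        report[guide] = {
            "trips": sum(counts.values()),
            "repeat_trips": sum(n - 1 for n in counts.values() if n > 1),
            "repeat_customers": sum(1 for n in counts.values() if n > 1),
        }
    return report


if __name__ == "__main__":
    for guide, stats in repeat_business_by_guide(trips).items():
        print(guide, stats)
```

The same per-customer counts could also feed the high-value-customer bullet, for example by ranking customers on total trips across all guides.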
Editor's Notes
Volume is a benefit as much as a challenge. The main attraction of big data analytics is the ability to process large amounts of information. Having more data often beats having better models: simple bits of math can be unreasonably effective given large amounts of data. If you could run a forecast that takes 300 factors into account rather than 6, could you predict demand better? At the same time, volume presents the most immediate challenge to conventional IT structures. It calls for scalable storage and a distributed approach to querying. Many companies already have large amounts of archived data, perhaps in the form of logs, but lack the capacity to process it.
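The idea of a distributed approach to querying can be sketched as "count each partition separately, then merge the partial results". The example below runs in a single process and only imitates the pattern; the log lines, the partitioning, and the function names are hypothetical, and a real system would run the per-partition step on separate machines.

```python
from collections import Counter
from functools import reduce

# Hypothetical archived log lines, already split into partitions
# (for example, one partition per storage node).
partitions = [
    ["ERROR disk full", "INFO started"],
    ["INFO stopped", "ERROR timeout", "WARN slow"],
]


def count_levels(lines):
    """Map step: count log levels within one partition."""
    return Counter(line.split()[0] for line in lines if line.strip())


def merge_counts(left, right):
    """Reduce step: combine two partial counts."""
    return left + right


def distributed_count(parts):
    """Count log levels across all partitions by merging partial results."""
    partials = map(count_levels, parts)
    return reduce(merge_counts, partials, Counter())


if __name__ == "__main__":
    print(distributed_count(partitions))
```

Because each partition is counted independently, adding more data means adding more partitions and workers rather than a bigger single machine.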