1
Data and Knowledge
Management
2
Data Management:
A Critical Success Factor
• The difficulties and the process
• Data sources and collection
• Data quality
• Multimedia and object-oriented databases
• Document management
3
Difficulties
• The amount of data increases exponentially
• Data: multiple sources
• Small portion of data useful for specific
decisions
• Increased need for external data
4
Difficulties ..2
• Differing legal requirements among
countries
• Selecting a data management tool from the
large number available
• Data security, quality, and integrity
5
Data Life Cycle Process and
Knowledge Discovery
• Data collected and stored in databases
• Processed and stored in data warehouses
• Transformation - ready for analysis
• Data mining tools - knowledge
• Presentation
6
Data Sources and Collection
• Internal data
• Personal data
• External data
• Internet and commercial database services
7
Data Quality (DQ)
Intrinsic
– Accuracy, objectivity, believability, and
reputation
Accessibility
– Accessibility and access security
8
Data Quality ..2
Contextual DQ
– Relevancy, value added, timeliness,
completeness
Representation DQ
– Interpretability, ease of understanding, concise
representation, and consistent representation
9
10
Complex Databases
• Object-Oriented database
• Multimedia database
• Document management
11
Data Warehousing,
Mining, and Analysis
• Transaction versus analytical processing
• Data warehouse and data marts
• Knowledge discovery, analysis, and mining
12
Good Data Delivery System
• Easy data access by end users
• Quicker decision making
• Accurate and effective decision making
• Flexible decision making
13
Processing Solutions
• Business representation of data for end
users
• Client-server environment - end-user query
and reporting capability
• Server-based repository (data warehouse)
14
Data Warehouse and Marts
The purpose of a data warehouse is to
establish a data repository that makes data
accessible in a form readily acceptable for
analytical processing activities.
A data mart is a subset of a warehouse dedicated
to a functional or regional area.
15
Data Warehouse
• A data warehouse contains historical data,
not operational data
• It contains data from a number of databases
so the data must be ‘cleaned’ to ensure that
the data definitions are consistent
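A minimal Python sketch of the 'cleaning' step, assuming two hypothetical source databases that name and encode the same customer fields differently. The source names, field mappings, and records are invented for illustration and are not taken from any particular warehouse product.

```python
# Hypothetical field mappings: each source database uses its own names
# for the same data definitions.
FIELD_MAP = {
    "sales_db": {"cust_id": "customer_id", "ctry": "country"},
    "support_db": {"CustomerNo": "customer_id", "Country": "country"},
}

# Hypothetical value mapping so that one definition of "country" is used.
COUNTRY_CODES = {
    "usa": "US",
    "united states": "US",
    "us": "US",
    "germany": "DE",
    "de": "DE",
}


def clean_record(source, record):
    """Rename fields to the warehouse definitions and normalize country values."""
    mapping = FIELD_MAP[source]
    cleaned = {mapping[name]: value for name, value in record.items() if name in mapping}
    cleaned["customer_id"] = int(cleaned["customer_id"])
    country = str(cleaned["country"]).strip().lower()
    # Unknown values are kept, upper-cased, rather than guessed at.
    cleaned["country"] = COUNTRY_CODES.get(country, country.upper())
    return cleaned


warehouse_rows = [
    clean_record("sales_db", {"cust_id": "17", "ctry": "USA"}),
    clean_record("support_db", {"CustomerNo": 17, "Country": "united states"}),
]
```

After cleaning, both source records describe the same customer with the same field names and the same country code, so they can be stored under one definition.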
16
Characteristics of Data
Warehousing
• Organization
• Consistency
• Time variant
• Nonvolatile
• Relational
17
The Data Warehouse and Marts
• Benefits
• Cost
• Architecture
• Putting the data warehouse on the internet
• Suitability
18
Knowledge Discovery, Analysis,
and Mining
• Foundations of knowledge discovery in
databases (KDD)
• Tools and techniques of KDD
• Online analytical processing (OLAP)
• Data mining
19
The Foundations of Knowledge
Discovery in Databases (KDD)
• Massive data collection
• Powerful multiprocessor computers
• Data mining algorithms
20
21
OLAP Queries
• Access very large amounts of data
• Analyze the relationships between many
types of business elements
• Involve aggregated data
• Compare aggregated data over hierarchical
time periods
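Comparing aggregated data over hierarchical time periods can be sketched in Python as a roll-up from daily facts to months, quarters, and years. The sales figures and the `roll_up` helper are hypothetical; a real OLAP server would do this over far larger data.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales facts: (date, amount).
daily_sales = [
    (date(2023, 1, 15), 100.0),
    (date(2023, 2, 10), 150.0),
    (date(2023, 4, 5), 200.0),
    (date(2024, 1, 20), 300.0),
]


def roll_up(facts, level):
    """Aggregate amounts at one level of the time hierarchy."""
    totals = defaultdict(float)
    for day, amount in facts:
        if level == "month":
            key = (day.year, day.month)
        elif level == "quarter":
            key = (day.year, (day.month - 1) // 3 + 1)
        elif level == "year":
            key = (day.year,)
        else:
            raise ValueError(f"unknown level: {level}")
        totals[key] += amount
    return dict(totals)


by_quarter = roll_up(daily_sales, "quarter")
by_year = roll_up(daily_sales, "year")
```

Here `by_quarter` is keyed by (year, quarter) and `by_year` by (year,), so the same facts can be compared at either level.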
22
OLAP Queries ..2
• Present data in different perspectives
• Involve complex calculations between data
elements
• Able to respond quickly to user requests
23
Data Mining
• Automated prediction of trends
• Automated discovery of previously
unknown patterns
• Example: People who buy Barbie dolls also
buy a particular chocolate bar – What can
we do with that information?
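One way to quantify an association like this is with support and confidence. The sketch below is in Python and uses invented market baskets; the item names and counts are hypothetical and do not come from real sales data.

```python
# Hypothetical market baskets; each set is one customer's purchase.
baskets = [
    {"doll", "chocolate bar"},
    {"doll", "chocolate bar", "batteries"},
    {"doll", "juice"},
    {"chocolate bar"},
    {"juice", "batteries"},
]


def support(items, transactions):
    """Fraction of transactions that contain every item in `items`."""
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated share of antecedent baskets that also hold the consequent."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)


pair_support = support({"doll", "chocolate bar"}, baskets)
rule_confidence = confidence({"doll"}, {"chocolate bar"}, baskets)
```

In this toy data, two of five baskets contain both items, and two of the three doll baskets also contain the chocolate bar. A retailer might use numbers like these to decide on shelf placement or joint promotions.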
24
Data Mining
Characteristics and Objectives
• Data often buried deep within large
databases
• Data may be consolidated in data
warehouse or kept in internet and intranet
servers
• Usually client-server architecture
25
Data Mining
Characteristics and Objectives
• Data mining tools extract information
buried in corporate files or archived public
records
• The “miner” is often an end user
• “Striking it rich” usually involves finding
unexpected, valuable results
• Parallel processing
26
Data Mining
Characteristics and Objectives
• Data mining yields five types of
information
• Data miners can use one or several tools
27
Data Mining Yields Five Types of
Information
• Association
• Sequences
• Classifications
• Clusters
• Forecasting
28
Data Mining Techniques
• Case-based reasoning
• Neural computing
• Intelligent agents
• Others: decision trees, genetic algorithms,
nearest neighbor method, and rule reduction
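As an illustration of one technique from this list, the nearest neighbor method, here is a minimal Python sketch. The customer attributes, segment labels, and the use of Euclidean distance are assumptions chosen for the example.

```python
import math

# Hypothetical labeled customers: (age, yearly purchases) -> segment.
training = [
    ((25, 2), "occasional"),
    ((30, 3), "occasional"),
    ((45, 20), "frequent"),
    ((50, 25), "frequent"),
]


def nearest_neighbor(point, examples):
    """Return the label of the training example closest to `point`."""
    _, label = min(examples, key=lambda ex: math.dist(point, ex[0]))
    return label


prediction = nearest_neighbor((48, 18), training)
```

A new customer is assigned the segment of the most similar known customer. In practice the attributes would usually be scaled first so that no single one dominates the distance.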
29
Data Visualization Technologies
• Data visualization
• Multidimensionality
• Geographical information systems (GIS)
30
Data Visualization
Data visualization refers to the presentation of
data by technologies such as digital images,
geographical information systems, graphical
user interfaces, multidimensional tables and
graphs, virtual reality, three-dimensional
presentations, and animation.
31
Multidimensionality
Major advantage
– data can be organized the way
managers prefer to see the data
Three factors
– dimensions, measures, and time
32
Examples
Dimensions
– Products, salespeople, market segments,
business units, geographical locations
Measures
– Money, sales volume, head count, inventory,
profit, actual versus forecasted
Time
– Daily, weekly, monthly, quarterly, yearly
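The idea of organizing the same data the way a manager prefers to see it can be sketched in Python as regrouping one set of facts by different dimensions. The products, regions, and volumes below are hypothetical, and `view` is an illustrative helper rather than part of any OLAP tool.

```python
from collections import defaultdict

# Hypothetical facts: (product, region, quarter, sales volume).
facts = [
    ("widget", "east", "Q1", 10),
    ("widget", "west", "Q1", 5),
    ("gadget", "east", "Q1", 7),
    ("widget", "east", "Q2", 12),
]

# Two dimensions plus time; the measure is the last field of each fact.
DIMENSIONS = ("product", "region", "quarter")


def view(rows, *group_by):
    """Total the measure over the chosen dimensions, in the chosen order."""
    positions = [DIMENSIONS.index(name) for name in group_by]
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in positions)
        totals[key] += row[3]
    return dict(totals)


by_product = view(facts, "product")
by_region_quarter = view(facts, "region", "quarter")
```

The same four facts yield a product view for one manager and a region-by-quarter view for another, without changing the stored data.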
33
Geographical Information
Systems (GIS)
A GIS is a computer-based system for
capturing, storing, checking,
integrating, manipulating, and
displaying data using digitized maps.
34
Components of a GIS
• Software
• Data
• Emerging GIS applications
35
Emerging GIS Applications
Integration of GIS and GPS
– Reengineer aviation and shipping industries
Intelligent GIS (integration of GIS and ES)
User interface
– Multimedia, 3D graphics, animated and
interactive maps
Web applications
36
Knowledge Management
• Knowledge management or managing
knowledge databases
• A knowledge base is a database that
contains information or organizational
know-how.
37
Accenture’s
Learning Organization Knowledge Base
• Global best practices
• These data combined with ongoing research
identify areas to be developed
• Research analysis team with content experts
to develop best practices
• Qualitative and quantitative information and
tools on the intranet for corporate-wide access
38
Accenture’s Knowledge Base ..2
• Best company profiles
• Relevant Accenture engagement experience
• Top 10 case studies and articles
• World-class performance measures
• Diagnostic tools
39
Accenture’s Knowledge Base ..3
• Customizable presentations
• Process definitions
• Directory of internal experts
• Best control practice
• Tax implementations
40
Conclusion
• Cost-benefit analysis
• Where to store data physically
• Disaster recovery
• Internal or external
• Data security and ethics
• Data purging
41
Conclusion ..2
• The legacy data problem
• Data delivery
• Privacy – especially customer information
• What to do?
• When to do it?

DataMgmt.ppt
