Organising data in a traditional file system
• Data Hierarchy in Computer Systems
• Problems with the traditional file system
3.
Effective Information Systems
• Deliver accurate, timely, and relevant information
• Accurate: Free from errors or inconsistencies
• Timely: Available when decision-makers need it
• Relevant: Useful for the task or decision at hand
The Challenge in Businesses
• Many businesses lack quality information
• Common issues: delayed, incorrect, or irrelevant data
• Root cause: poorly organized and maintained data
4.
Why Data Management Matters
• Ensures data is structured, clean, and accessible
• Supports better decisions and efficiency
• Core to building reliable information systems
Traditional File Management Issues
• Data stored in isolated files
• Leads to duplication, inconsistency, and inefficiency
• Difficult to retrieve and analyze data effectively
* Data management is of paramount importance
5.
Data Hierarchy in Computer Systems
• Bit: Smallest data unit a computer can process
• Byte: Group of bits (typically 8); represents a single character (letter, number, symbol)
• Field: Group of characters (e.g., name, age)
• Record: Collection of related fields (e.g., student name, course, grade)
• File: Group of similar records (e.g., student course file)
• Database: Collection of related files (e.g., student records + financial data)
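The hierarchy above can be sketched in a few lines of Python. This is only an illustration: the student names, course code, and balance are made-up values, and plain Python lists and dictionaries stand in for real files and databases.

```python
# Illustrative sketch of the data hierarchy; all names and values are made up.

# Bit and byte: one ASCII character occupies one byte, which is eight bits.
byte = "A".encode("ascii")
bits = format(byte[0], "08b")   # the eight bits of that byte, as text

# Field: a group of characters, e.g. a student's name.
name_field = "Jane Doe"

# Record: related fields describing one student in one course.
record = {"student_name": name_field, "course": "IS 101", "grade": "B+"}

# File: a group of records of the same type.
course_file = [
    record,
    {"student_name": "John Roe", "course": "IS 101", "grade": "A"},
]

# Database: a collection of related files.
database = {
    "course_file": course_file,
    "financial_file": [{"student_name": "Jane Doe", "balance_due": 250.0}],
}

print(bits)
print(len(database["course_file"]), "records in the course file")
```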
6.
Key Concepts
• Entity: A person, place, thing, or event (e.g., student, course)
• Attribute: Characteristic of an entity (e.g., Student_ID, Course, Grade)
• Field Values: Specific data stored for each attribute in a record
For example…
• Entity: COURSE
• Attributes: Student_ID, Course, Date, Grade
• Record: Specific student’s data for one course
• File: All course records
• Database: Combines course files with other student data
8.
Problems in the Traditional File Environment
• Systems developed independently in each department
• No centralized planning or integration
• Each application had its own files and software programs
• Example: Human Resources System
• Separate files for:
• Personnel, Payroll, Medical insurance, Pension, Mailing list
• Leads to hundreds of isolated files over time
• Key Problems
• Data Redundancy: Same data stored in multiple files
• Data Inconsistency: Conflicting data across systems
• Difficult Maintenance: Hard to update, manage, or integrate systems
• Inefficient Operations: Time-consuming and resource-heavy
10.
Data Redundancy and Inconsistency
• Data Redundancy:
• Duplicate data stored in multiple files
• Collected independently by different departments
• Wastes storage and resources
• Data Inconsistency:
• Same attribute shows different values in different systems
• Causes confusion and errors in decision-making
11.
Examples of Inconsistency
• Course Data: Date updated in one system but not others
• Student_ID: Named differently across systems (e.g., Student_ID vs. ID)
• Coding Differences:
• Clothing size represented as “extra large” in one system, “XL” in another
• Leads to integration issues across departments
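A minimal sketch of how such inconsistencies show up when two departmental files are compared. The two files, their contents, and the helper find_date_conflicts are hypothetical; note that the same attribute is called Student_ID in one file and ID in the other, as in the example above.

```python
# Hypothetical extracts from two departmental files (values are made up).
registrar_file = [
    {"Student_ID": 1001, "Course": "IS 101", "Date": "2024-09-02"},
    {"Student_ID": 1002, "Course": "IS 101", "Date": "2024-09-02"},
]
finance_file = [
    {"ID": 1001, "Course": "IS 101", "Date": "2024-09-02"},
    {"ID": 1002, "Course": "IS 101", "Date": "2024-09-09"},  # not updated
]


def find_date_conflicts(registrar, finance):
    """Return student IDs whose course date differs between the two files."""
    finance_dates = {(row["ID"], row["Course"]): row["Date"] for row in finance}
    conflicts = []
    for row in registrar:
        key = (row["Student_ID"], row["Course"])
        if key in finance_dates and finance_dates[key] != row["Date"]:
            conflicts.append(row["Student_ID"])
    return conflicts


conflicts = find_date_conflicts(registrar_file, finance_file)
print(conflicts)
```

Every comparison like this has to know each file's private naming and coding conventions, which is part of why integration across departments is hard.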
Impact on Business Systems
• Hard to build integrated systems like:
• Customer Relationship Management (CRM)
• Supply Chain Management (SCM)
• Enterprise Systems (ERP)
• Results in poor data quality and decision-making
12.
Program-Data Dependence
• Data and programs are tightly linked
• Each program must define data format and location
Key Issues
• Any change in a program may require changes to the data it accesses
• Data format changes can break multiple programs
• Example: Changing ZIP code from 5-digit to 9-digit
• Affects all programs using the old 5-digit format
Consequences
• High maintenance costs
• Lack of flexibility in updating systems
• Increases risk of errors and system failures
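The ZIP code example can be sketched as follows. The fixed-width layout (name in columns 0-19, ZIP in 20-24, balance in 25-32) and the function parse_customer are assumptions made for illustration; the point is that the layout lives inside the program, so a change to the file silently breaks it.

```python
# Hypothetical program with the record layout hard-coded:
# name = columns 0-19, ZIP = columns 20-24, balance = columns 25-32.
def parse_customer(line):
    return {
        "name": line[0:20].strip(),
        "zip": line[20:25],
        "balance": float(line[25:33]),
    }


# Record written with the old 5-digit ZIP format.
old_line = "Jane Doe".ljust(20) + "12345" + "00150.75"
old_parsed = parse_customer(old_line)

# The file is later changed to store 9-digit ZIP codes,
# but the program above is not updated.
new_line = "Jane Doe".ljust(20) + "123456789" + "00150.75"
new_parsed = parse_customer(new_line)

print(old_parsed["zip"], old_parsed["balance"])
print(new_parsed["zip"], new_parsed["balance"])  # ZIP truncated, balance wrong
```

In this sketch the program does not crash; it reads part of the ZIP code as the balance, which is arguably worse than a visible failure.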
13.
Lack of Flexibility
• Can generate routine, scheduled reports only
• Ad hoc reports are difficult and time-consuming to produce
• Cannot easily respond to unexpected information needs
Challenges
• Data is available but hard to access
• Requires extensive programming
• May take weeks of effort to compile needed data
Impact
• Slows down decision-making
• Increases costs and delays
• Limits the organization’s ability to be agile and responsive
14.
Poor Security
• Lack of control over data access and usage
• Unauthorized access and data changes are possible
• No centralized way to track user activity
Key Risks
• Sensitive data may be exposed or altered
• No audit trail to monitor who accessed or modified information
• Increases chances of data breaches and internal misuse
Impact
• Compromises trust and data integrity
• Can lead to legal issues and financial losses
• Weakens overall information system security
15.
Lack of Data Sharing and Availability
• Data is stored in isolated systems across departments
• No connection between different data files
• Makes data sharing across functions very difficult
Key Issues
• Slow access to required information
• Inconsistent data leads to distrust and confusion
• Users may avoid using systems due to unreliable data
Impact
• Breakdown in communication between departments
• Limits collaboration and informed decision-making
• Reduces overall efficiency and effectiveness
16.
The database approach to data management
• Database management systems (DBMS)
• How a DBMS solves the problems of the traditional file system
• Relational DBMS
• Operations of relational DBMS
• Non-Relational databases in the Cloud
• The database approach to data management
• Querying and reporting
• Designing Databases
• Normalization and Entity relationship diagrams
17.
Database Technology: A Better Approach
• Centralized collection of organized data
• Serves multiple applications efficiently
• Eliminates redundancy by storing data once
• Key Benefits
• Data appears to be in one location for all users
• Supports data sharing across departments
• Reduces data inconsistency and duplication
• Example: use one centralized Human Resources database instead of separate files
• Impact
• Improved accuracy, accessibility, and efficiency
• Easier maintenance and updates
• Enables better decision-making and integration
18.
Database management system: What is a DBMS?
• Software that centralizes data and manages it efficiently
• Acts as an interface between applications and data files
• Provides easy access to data for multiple programs
Key Functions
• Retrieves data when requested by applications
• Separates logical and physical views of data
• Logical view: How users see the data
• Physical view: How data is stored
Benefits of DBMS
• Reduces programming effort
• Allows multiple users to access customized views
• Ensures data consistency and security
Example: Human Resources Database
• Benefits specialist sees: Name, SSN, Health Insurance, etc.
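A minimal sketch of logical views over one physical table, using Python's built-in sqlite3 module. The table layout, the sample row, and the second view (payroll_view) are assumptions added for illustration; the slides only name the benefits specialist's view.

```python
import sqlite3

# One physical table holds all employee data (hypothetical layout and values).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "emp_id INTEGER PRIMARY KEY, name TEXT, ssn TEXT, "
    "health_insurance TEXT, gross_pay REAL)"
)
conn.execute(
    "INSERT INTO employee VALUES (1, 'Jane Doe', '000-00-0000', 'Plan A', 5200.0)"
)

# Each view is a logical view: the same stored data, seen differently.
conn.execute(
    "CREATE VIEW benefits_view AS "
    "SELECT name, ssn, health_insurance FROM employee"
)
conn.execute(
    "CREATE VIEW payroll_view AS "
    "SELECT name, ssn, gross_pay FROM employee"
)

benefits_cols = [d[0] for d in conn.execute("SELECT * FROM benefits_view").description]
payroll_cols = [d[0] for d in conn.execute("SELECT * FROM payroll_view").description]
print(benefits_cols)
print(payroll_cols)
```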
20.
How a DBMS Solves Traditional File System Problems
• Reduces data redundancy by minimizing repeated data
• Helps maintain data consistency across all systems
• Decouples programs and data, allowing independent management
Key Advantages
• Improves data accuracy and reliability
• Enables centralized data management and security
• Supports ad hoc queries for flexible data access
Efficiency Gains
• Faster access to information
• Lower development and maintenance costs
• Enhanced information sharing across departments
22.
Relational DBMS (RDBMS)
• Most common type of Database Management System
• Stores data in two-dimensional tables (relations)
• Each table represents an entity and its attributes
Key Features
• Easy to organize, access, and update data
• Supports relationships between tables using keys
• Ideal for small to large-scale applications
Popular RDBMS Examples
• Microsoft Access – for desktop systems
• MySQL – open-source, widely used
• Oracle Database, SQL Server, DB2 – for enterprise use
• Oracle Database Lite – for mobile devices
23.
Organizing Data in Relational Databases
• Data is stored in tables (also called relations)
• Each table contains rows (records/tuples) and columns (fields/attributes)
• Separate tables are created for different entities (e.g., SUPPLIER, PART)
Structure of a Table
• Columns = Attributes (e.g., name, address, zip code)
• Rows = Records (each row = one instance of the entity)
• Fields = Individual data elements
Key Concepts
• Primary Key: Uniquely identifies each record in a table (e.g., Supplier_Number)
• Foreign Key: A field in one table that links to the primary key in another table (e.g., Supplier_Number in PART table)
• Enables relationships between tables
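The SUPPLIER and PART tables can be sketched with Python's built-in sqlite3 module. Column names beyond Supplier_Number, Part_Number and the sample values are assumptions. SQLite only enforces foreign keys when the foreign_keys pragma is switched on, so the sketch enables it first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# Supplier_Number is the primary key of SUPPLIER ...
conn.execute(
    "CREATE TABLE SUPPLIER ("
    "Supplier_Number INTEGER PRIMARY KEY, Supplier_Name TEXT)"
)
# ... and a foreign key in PART that links each part to its supplier.
conn.execute(
    "CREATE TABLE PART ("
    "Part_Number INTEGER PRIMARY KEY, Part_Name TEXT, "
    "Supplier_Number INTEGER REFERENCES SUPPLIER(Supplier_Number))"
)

conn.execute("INSERT INTO SUPPLIER VALUES (8259, 'Example Supplier')")
conn.execute("INSERT INTO PART VALUES (137, 'Door latch', 8259)")

# A part that points at a supplier that does not exist is rejected.
try:
    conn.execute("INSERT INTO PART VALUES (999, 'Orphan part', 1234)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print("orphan part rejected:", rejected)
```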
25.
• Key Operations of a Relational DBMS
• Relational tables can be combined using shared data elements (e.g., Supplier_Number)
• Three basic operations: Select, Join, and Project
• 1. Select Operation
• Retrieves rows (records) that meet a specific condition
• Example: Select parts where Part_Number = 137 or 150
• 2. Join Operation
• Combines tables based on a common field
• Example: Join PART and SUPPLIER tables using Supplier_Number to get supplier info for selected parts
26.
• 3. Project Operation
• Retrieves specific columns from a table
• Example: Extract only Part_Number, Part_Name, Supplier_Number, Supplier_Name
• Use Case Example
• Goal: Find suppliers for Part 137 or 150
• Use Select → Join → Project to get clear and focused data output
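The use case above can be sketched as a single SQL query run through Python's sqlite3 module. The supplier names, part names, and supplier numbers are made-up sample data; the comments mark which clause corresponds to which of the three operations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SUPPLIER (Supplier_Number INTEGER PRIMARY KEY, Supplier_Name TEXT);
CREATE TABLE PART (Part_Number INTEGER PRIMARY KEY, Part_Name TEXT,
                   Supplier_Number INTEGER REFERENCES SUPPLIER(Supplier_Number));
INSERT INTO SUPPLIER VALUES (8259, 'Supplier A');
INSERT INTO SUPPLIER VALUES (8261, 'Supplier B');
INSERT INTO SUPPLIER VALUES (8263, 'Supplier C');
INSERT INTO PART VALUES (137, 'Door latch', 8259);
INSERT INTO PART VALUES (145, 'Side mirror', 8263);
INSERT INTO PART VALUES (150, 'Door molding', 8261);
""")

rows = conn.execute("""
    SELECT p.Part_Number, p.Part_Name,
           s.Supplier_Number, s.Supplier_Name           -- project: chosen columns
    FROM PART AS p
    JOIN SUPPLIER AS s
      ON p.Supplier_Number = s.Supplier_Number          -- join: common field
    WHERE p.Part_Number IN (137, 150)                   -- select: matching rows
    ORDER BY p.Part_Number
""").fetchall()

for row in rows:
    print(row)
```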
27.
📦 Non-Relational Databases & Cloud Databases
• Need for Alternatives to Traditional Databases
• Traditional relational databases: based on tables, columns, and rows
• Not ideal for big data, web services, and unstructured data
• Rise of cloud computing and massive data volumes demand flexible solutions
• What Are Non-Relational (NoSQL) Databases?
• Use a flexible data model (no fixed schema)
• Designed for large-scale, distributed data processing
• Handle structured + unstructured data (e.g., social media, web content, images)
• Examples:
• Oracle NoSQL Database
• Amazon SimpleDB
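To make "no fixed schema" concrete, here is a toy in-memory collection in plain Python. It is not the API of Oracle NoSQL Database, Amazon SimpleDB, or any real product; the documents and the find helper are hypothetical and only show that records in one collection need not share the same fields.

```python
# A toy in-memory "collection"; documents need not share the same fields.
posts = [
    {"id": 1, "type": "tweet", "text": "Great service!", "hashtags": ["happy"]},
    {"id": 2, "type": "image", "url": "https://example.com/a.png", "width": 800},
    {"id": 3, "type": "tweet", "text": "Late delivery", "reply_to": 1},
]


def find(collection, **criteria):
    """Return documents whose fields match all the given criteria."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


tweets = find(posts, type="tweet")
print([doc["id"] for doc in tweets])

# A new kind of document can be added without altering any table definition.
posts.append({"id": 4, "type": "sensor", "reading": 21.5, "unit": "C"})
```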
28.
• Cloud-Based Relational Databases
• Offer traditional RDBMS features on a cloud platform
• Examples:
• Amazon RDS (MySQL, Oracle, SQL Server)
• Oracle Database Cloud Service
• Microsoft SQL Azure Database
• Private Cloud Databases
• Private cloud = secure, internal cloud infrastructure
• Example: Sabre Holdings (aviation industry SaaS provider)
• Runs Oracle Database 11g; supports 100+ projects & 700 users
• Reduces costs & improves performance
• Key Benefits
• Scalability (up or down as needed)
• Cost efficiency (pay-per-use model)
• Improved data management for web and mobile applications
• Better resource utilization through consolidation
29.
🛠️ Core Capabilities of a DBMS
• Data Definition Language (DDL)
• Defines structure and content of the database
• Used to create tables, define fields, and their characteristics
• Examples of definitions: data type, field size, constraints
• Data Dictionary
• A repository of metadata (data about data)
• Stores details like:
• Field names & descriptions
• Data types & sizes
• Formats & validation rules
• Key fields (e.g., Supplier_Number in Access)
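A minimal sketch of data definition and a basic data dictionary, using SQLite through Python's sqlite3 module rather than Access. The column names other than Supplier_Number and the ZIP length rule are assumptions. PRAGMA table_info is SQLite's way of reporting the stored definition of each field.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: table structure, data types, sizes, and constraints.
conn.execute(
    "CREATE TABLE SUPPLIER ("
    "Supplier_Number INTEGER PRIMARY KEY, "
    "Supplier_Name VARCHAR(50) NOT NULL, "
    "Supplier_Zip CHAR(9) CHECK (length(Supplier_Zip) IN (5, 9)))"
)

# Data dictionary: metadata about the fields, read back from the DBMS.
# PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk).
dictionary = [
    {"field": name, "type": col_type, "required": bool(notnull), "key": bool(pk)}
    for _cid, name, col_type, notnull, _default, pk
    in conn.execute("PRAGMA table_info(SUPPLIER)")
]

for entry in dictionary:
    print(entry)
```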
30.
• Advanced Data Dictionary Functions (Large Systems)
• Tracks:
• Data usage and ownership
• Authorization and access levels
• Responsible users or departments
• Linked programs, reports, and business functions
• Example: Microsoft Access
• Offers basic data dictionary functionality
• Shows:
• Field names
• Formats
• Key fields (indicated by a key icon)
31.
• 🔍 Querying and Reporting with DBMS
• Data Manipulation Language (DML)
• Used to add, update, delete, and retrieve data
• Allows end users and developers to extract and work with data
• Most popular DML: SQL (Structured Query Language)
• SQL – Structured Query Language
• Industry standard for querying relational databases
• Example SQL:
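The original example did not survive in these notes, so the following is a stand-in sketch, not the slide's own query. It runs the four basic DML statements (INSERT, UPDATE, DELETE, SELECT) against a hypothetical PART table through Python's sqlite3 module; the part names and prices are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PART ("
    "Part_Number INTEGER PRIMARY KEY, Part_Name TEXT, Unit_Price REAL)"
)

# Add data.
conn.executemany(
    "INSERT INTO PART VALUES (?, ?, ?)",
    [(137, "Door latch", 22.0), (145, "Side mirror", 12.0), (150, "Door molding", 6.0)],
)

# Update data.
conn.execute("UPDATE PART SET Unit_Price = 24.0 WHERE Part_Number = 137")

# Delete data.
conn.execute("DELETE FROM PART WHERE Part_Number = 145")

# Retrieve data.
rows = conn.execute(
    "SELECT Part_Number, Part_Name, Unit_Price "
    "FROM PART WHERE Unit_Price < 25 ORDER BY Part_Number"
).fetchall()
print(rows)
```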
32.
• DBMS Tools for Querying
• DB2, Oracle, SQL Server: use SQL
• Microsoft Access:
• Uses SQL behind the scenes
• Provides graphical tools for easy query building
• Users select tables, fields, and set conditions visually
• Reporting Features
• Organize retrieved data into polished reports
• Access simplifies reporting with built-in templates and layouts
33.
6.3: USING DATABASES TO IMPROVE BUSINESS PERFORMANCE AND DECISION MAKING
• The challenge of big data
• Business intelligence infrastructure
• Data warehouse and marts; Hadoop; In-memory computing; Analytic platforms
• ANALYTICAL TOOLS: RELATIONSHIPS, PATTERNS, TRENDS
• Online analytical processing
• Data mining
• Text mining and web mining
34.
• Businesses use databases to keep track of basic transactions, such as paying suppliers, processing orders, keeping track of customers, and paying employees.
• But they also need databases to provide information that will help the company run the business more efficiently, and help managers and employees make better decisions.
• If a company wants to know which product is the most popular or who is its most profitable customer, the answer lies in the data.
35.
🌐 The Challenge of Big Data
• Traditional Data Limitations
• Mostly transaction-based, fitting neatly in rows & columns
• Managed well by relational DBMS
• Examples: invoices, payments, payroll records
• Explosion of New Data Sources
• Web traffic, social media posts (tweets, status updates)
• Email messages, sensor outputs, and machine-generated data
• Unstructured/semi-structured formats incompatible with traditional DBMS
36.
• What is Big Data?
• Massive volumes: petabytes to exabytes
• Data types: structured, semi-structured, and unstructured
• Produced rapidly and continuously
• E.g., 10 TB from one jet engine in 30 mins
• Why Big Data Matters
• Reveals deep patterns, trends, and anomalies
• Can drive insights into:
• Customer behavior; Weather trends; Market activity
• Need for New Tools & Technologies
• Big data requires:
• Advanced analytics; Non-relational systems; Cloud and distributed platforms
• Goal: Extract business value from complex, diverse data
37.
🧠 Business Intelligence Infrastructure
• Goal of Business Intelligence (BI)
• Provide concise, reliable info about:
• Current operations
• Emerging trends
• Business changes
• Enable better decision-making across departments
• The Data Challenge
• Data spread across multiple systems:
• Sales, manufacturing, accounting
• External sources: demographics, competitors
• Big data often required for full insights
38.
• 🛠️ BI Infrastructure Components
• Data Warehouses & Data Marts
• Centralized repositories for structured data
• Support reporting and analysis
• Hadoop
• Open-source framework for processing massive datasets
• Handles structured & unstructured data across distributed systems
• In-Memory Computing
• Analyzes data held in main memory (RAM) rather than on disk
• Speeds up complex calculations and queries
• Analytical Platforms
• High-performance systems for data mining, predictive analytics, and machine learning
• Handle large-scale structured/unstructured data analysis
39.
🏢 Data Warehouses & Data Marts
• What is a Data Warehouse?
• A central database storing:
• Current and historical data
• From multiple sources (e.g., sales, customer service, manufacturing, websites)
• Supports decision making across the company
• Data Flow Process
• Extract data from operational systems
• Clean & transform (correct errors, fill gaps)
• Load into warehouse for analysis
• Data is read-only (cannot be altered)
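The extract, clean and transform, load flow can be sketched as below. The source rows, the SIZE_CODES mapping, and the rule of filling a missing amount with 0.0 are all assumptions chosen for illustration; a real warehouse would apply its own cleaning rules. An in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Extract: hypothetical rows pulled from operational systems.
sales_rows = [
    {"customer": " alice ", "amount": "120.50", "size": "extra large"},
    {"customer": "Bob", "amount": None, "size": "XL"},
]

# One agreed coding for clothing size across all source systems.
SIZE_CODES = {"extra large": "XL", "xl": "XL"}


def transform(row):
    """Clean one row: tidy the name, fill a missing amount, unify size codes."""
    return (
        row["customer"].strip().title(),
        float(row["amount"]) if row["amount"] is not None else 0.0,
        SIZE_CODES.get(row["size"].lower(), row["size"]),
    )


# Load: write the cleaned rows into the warehouse table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales_fact (customer TEXT, amount REAL, size TEXT)")
warehouse.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?)",
    [transform(row) for row in sales_rows],
)

loaded = warehouse.execute(
    "SELECT customer, amount, size FROM sales_fact ORDER BY customer"
).fetchall()
print(loaded)
```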
40.
• Key Features
• Enterprise-wide access
• Supports queries and analysis
• Includes:
• Ad hoc queries
• Standardized reporting tools
• Graphical dashboards
• What is a Data Mart?
• A subset of a data warehouse
• Contains focused, summarized data
• Created for specific user groups or departments
41.
⚙️ What is Hadoop?
• Open-source software framework
• Managed by Apache Software Foundation
• Designed for distributed parallel processing
• Handles structured, semi-structured, & unstructured big data
• Key Components
• HDFS (Hadoop Distributed File System)
Connects file systems across nodes to form a unified data system
• MapReduce
Breaks big data into sub-tasks, processes them in parallel, then aggregates results
• HBase
A non-relational database for fast access & real-time big data applications
42.
• How Hadoop Works
• Break down a large data problem
• Distribute across many low-cost servers
• Process in parallel
• Combine results into analyzable output
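The four steps above follow the MapReduce pattern, which can be imitated on one machine. This sketch does not use Hadoop itself: a small thread pool stands in for the low-cost servers, and the log lines and function names are made up.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_chunk(lines):
    """Map step: count the words in one chunk of the input."""
    return Counter(word for line in lines for word in line.lower().split())


def reduce_counts(partials):
    """Reduce step: merge the partial counts into one result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# 1. Break down the problem: split the input into chunks.
log_lines = ["error disk full", "login ok", "error timeout", "login ok"]
chunks = [log_lines[i:i + 2] for i in range(0, len(log_lines), 2)]

# 2-3. Distribute the chunks and process them in parallel
#      (threads here stand in for separate servers).
with ThreadPoolExecutor(max_workers=2) as pool:
    partial_counts = list(pool.map(map_chunk, chunks))

# 4. Combine the partial results into one output.
word_counts = reduce_counts(partial_counts)
print(word_counts.most_common(3))
```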
• Real-World Applications
• Facebook: Stores over 100 petabytes on Hadoop
• Yahoo: Uses Hadoop to track & personalize user experience
• NextBio: Supports genomic research with Hadoop + HBase
• Vendor Support
• Hadoop tools provided by:
• IBM; Oracle; HP; Microsoft
43.
⚡ In-Memory Computing: Supercharging Big Data Analytics
• What is In-Memory Computing?
• Stores data in main memory (RAM) instead of disks
• Bypasses slow disk I/O bottlenecks
• Enables real-time data access and lightning-fast analytics
• Key Benefits
• Faster query responses — from hours to seconds
• Supports large datasets (e.g., data marts, small warehouses)
• Works even on handheld/mobile devices
• Ideal for complex business calculations
44.
• Tech Enablers
• High-speed multicore processors
• Cheaper and more powerful RAM
• Advances in hardware optimization and parallel processing
• Leading In-Memory Solutions
• SAP HANA (High Performance Analytics Appliance)
• Oracle Exalytics
• Real-World Example: Centrica
• Energy utility uses SAP HANA to:
• Analyze smart meter data every 15 minutes
• Gain detailed insights by neighborhood, home size, business type
• Show customers real-time energy usage via web/mobile tools
45.
• 📊 Analytic Platforms: Power Tools for Big Data Insights
• What Are Analytic Platforms?
• High-speed systems optimized for complex analytics
• Use relational + non-relational technologies
• Designed for large-scale data analysis
• Key Features
• Preconfigured hardware + software integration
• Faster query processing (10–100x vs traditional DBMS)
• Support for:
• In-memory computing
• NoSQL databases
46.
• 🛠️ Leading Solutions
• IBM Netezza
• Integrated system for database, server, and storage
• Ideal for advanced analytic queries
• Oracle Exadata
• Engineered for enterprise-scale performance
• How It Fits in BI Infrastructure
• Integrates data from:
• Operational systems; Web & social media; Machines & sensors; External sources
• Outputs Delivered
• Interactive dashboards
• Standard and ad hoc reports
• Fast, data-driven business insights
47.
📊 Analytical Tools: Discovering Insights in Data
• Purpose of Analytical Tools
• Go beyond data storage to extract actionable insights
• Identify relationships, spot patterns, and track trends
• Key Types of Analytical Tools
• Database Querying & Reporting
• Use SQL or visual query tools
• Generate summaries, reports, and answers to specific questions
• OLAP (Online Analytical Processing)
• Enables multidimensional analysis
• Analyze data across multiple dimensions (e.g., time, region, product)
• Data Mining
• Uses AI and statistical methods to uncover hidden patterns
• Predict outcomes (e.g., customer churn, fraud)
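Multidimensional analysis, as in OLAP, amounts to totalling the same facts along different combinations of dimensions. The sketch below is a simplified illustration in plain Python, not an OLAP product; the sales figures and the roll_up helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales facts with three dimensions: product, region, month.
sales = [
    {"product": "Nuts", "region": "East", "month": "Jan", "units": 50},
    {"product": "Nuts", "region": "West", "month": "Jan", "units": 30},
    {"product": "Bolts", "region": "East", "month": "Jan", "units": 20},
    {"product": "Nuts", "region": "East", "month": "Feb", "units": 70},
]


def roll_up(facts, *dimensions):
    """Total the units sold for each combination of the chosen dimensions."""
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[dim] for dim in dimensions)
        totals[key] += fact["units"]
    return dict(totals)


# The same data viewed along one dimension, then along two.
by_product = roll_up(sales, "product")
by_product_region = roll_up(sales, "product", "region")

print(by_product)
print(by_product_region)
```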
48.
What is Data Mining?
• Discovery-driven analysis
• Finds hidden patterns, relationships, and rules in large datasets
• Predicts future behavior to guide strategic decisions
• Used in organizations to support more informed decision making
• Business Benefits
• Targeted marketing (e.g., 1-to-1 campaigns)
• Customer segmentation
• Churn prediction and retention strategies
• Revenue optimization
50.
📄 Text Mining & Web Mining: Unlocking Unstructured Data
• Text Mining
• Purpose: Extracts insights from unstructured data like emails, memos, call transcripts, surveys
• Benefits:
• Identifies patterns & relationships; Summarizes large volumes of text; Aids in decision-making
• Sentiment Analysis
• Detects positive, negative, or neutral sentiments in text (emails, social media, blogs, etc.)
• Example:
Charles Schwab uses Attensity Analyze to:
• Analyze customer interactions
• Detect dissatisfaction early
• Act quickly to retain customers
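A toy illustration of sentiment analysis using small word lists. This is not how Attensity Analyze or any commercial tool works; the word lists, the scoring rule, and the sample messages are assumptions, and real systems use far richer language models.

```python
# Tiny hand-made word lists (assumptions for illustration only).
POSITIVE = {"great", "helpful", "thanks", "love"}
NEGATIVE = {"late", "unhappy", "cancel", "frustrated"}


def sentiment(text):
    """Label text by counting positive and negative words."""
    words = [word.strip(".,!?").lower() for word in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


messages = [
    "Thanks, the agent was great!",
    "I am frustrated and may cancel my account.",
    "Please send my statement.",
]
labels = [sentiment(message) for message in messages]
print(labels)
```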
51.
• Web Mining: Discovery of patterns and insights from the World Wide Web
52.
6.4 Managing Data Resources: establishing an information policy and ensuring data quality (Page: 265)
Read this section yourself, write a note on it, and upload it to Moelium.