2. Task
What are data, information, knowledge and wisdom?
How do they differ?
Give an example of a context where you would apply each concept.
Working Example: The UK Census
3. Data, information & knowledge
• Data: raw facts or figures without context
• Information: data within a context
• Knowledge: information with added value
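As a rough illustration of the three terms, here is a small sketch built on the census working example. The town name and the figures are invented for the sketch and are not real census results.

```python
# Hypothetical figures, invented for illustration; not real census results.

# Data: bare values with no context.
data = [1200, 1500]

# Information: the same values placed in a context (what, where, when).
information = [
    {"area": "Town A", "year": 2001, "residents": 1200},
    {"area": "Town A", "year": 2011, "residents": 1500},
]


def growth_rate(earlier, later):
    """Return the fractional change in residents between two records."""
    return (later["residents"] - earlier["residents"]) / earlier["residents"]


# Knowledge: information with added value, here a derived growth rate
# that could inform a planning decision.
knowledge = growth_rate(information[0], information[1])
print(f"Town A grew by {knowledge:.0%} between 2001 and 2011")
```

The list `data` alone says nothing; only once each value is tied to an area and a year can anything useful be derived from it.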
6. Zeroth generation: Record Managers
4000BC -1900
• The first known writing describes
the royal assets and taxes in
Sumeria.
• The next six thousand years saw a
technological evolution from clay
tablets to papyrus to parchment
and then to paper.
• There were many innovations in
data representation: phonetic
alphabets, novels, ledgers,
libraries, paper and the printing
press.
7. TASK
Create a timeline which
shows the dates of
origin of the:
1. International Phonetic Alphabet (IPA)
2. First Novel in English
3. 1st Library
4. Printing Press
5. 1st Computational Device
Time: 10 mins
8. Answers
• 7th century BC: The Royal Library of Ashurbanipal in Nineveh, Assyria
• 1439: Johannes Gutenberg invented the printing press
• c. 1470: The first novel in English, Le Morte d’Arthur,
was written by Thomas Malory
• 1801: Jacquard invented his punched-card loom, considered
to be the first computational device
• 1886: The International Phonetic Association, which devised
the International Phonetic Alphabet, was founded (the alphabet
itself was first published in 1888)
9. First Generation: Record Managers
1900 -1955
• The first practical automated
information processing began
circa 1800 with the Jacquard
Loom that produced fabric
from patterns represented by
punched cards.
• Each data record was
represented as binary patterns
on a punched card
• By 1955, many companies had
entire floors dedicated to
storing punched cards, much as
the Sumerian archives had
stored clay tablets.
10. Second Generation: Programmed Unit
Record Equipment 1955-1970
• Stored-program electronic computers
had been developed in the 1940s for
scientific and numerical calculations.
At about the same time, Univac had
developed magnetic tape storage.
• Software was a key component of this
new technology. It made the computers
relatively easy to program and use. It
became much easier to sort, analyse and
process the data with languages like
COBOL.
• The software of the day provided a
file-oriented record processing model.
Typical programs sequentially read
several input files and produced new
files as output
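The file-oriented model in the last bullet can be sketched as a sequential "old master plus transactions gives new master" run. This is a minimal sketch, not period code: the `account_id,amount` line format and the account numbers are made up, `io.StringIO` stands in for tape files, and it assumes both inputs are sorted by account and that every transaction refers to an account present in the master file.

```python
import io


def apply_transactions(master, transactions, output):
    """Merge two files sorted by account id into a new master file.

    Each line is "account_id,amount". Both inputs are read once, front to
    back, as a tape would be; nothing is looked up at random.
    Assumes every transaction's account also appears in the master file.
    """
    txn = transactions.readline()
    for line in master:
        account, balance = line.strip().split(",")
        balance = int(balance)
        # Consume every transaction that belongs to this account.
        while txn:
            txn_account, amount = txn.strip().split(",")
            if txn_account != account:
                break
            balance += int(amount)
            txn = transactions.readline()
        output.write(f"{account},{balance}\n")


# Hypothetical input files.
old_master = io.StringIO("A001,100\nA002,50\nA003,0\n")
day_transactions = io.StringIO("A001,-30\nA003,20\nA003,5\n")
new_master = io.StringIO()

apply_transactions(old_master, day_transactions, new_master)
print(new_master.getvalue())
```

The old master is never modified; each run produces a fresh output file, which is what made the approach a natural fit for tape.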
11. Third Generation: Online Network
Databases 1965-1980
• Teleprocessing monitors provided the specialized
software to multiplex thousands of terminals onto the
modest server computers of the day
• Online transaction processing augmented the batch
transaction processing that performed background
reporting tasks.
• Simple indexed-sequential record organizations soon
evolved to a more powerful set-oriented record
model, because applications often need to relate two
or more records.
• The end product was, in essence, a network data
model.
12. Fourth Generation: Relational
Databases 1980-1995
• Despite the success of the
network data model, many
software designers felt that a
navigational programming
interface was too low-level
• The idea of the relational model is
to represent both entities and
relationships in a uniform way.
• The relational model had some
unexpected benefits beyond
programmer productivity and
ease-of-use. The relational model
was well suited to client-server
computing, to parallel processing,
and to graphical user interfaces.
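A minimal sketch of "entities and relationships in a uniform way", using Python's built-in `sqlite3` module. The table names, people and postcodes are hypothetical. The point is that the relationship `lives_in` is stored as a table just like the entities, and the query states what is wanted without navigating from record to record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Two entity tables.
    CREATE TABLE person    (person_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE household (household_id INTEGER PRIMARY KEY, postcode TEXT);
    -- The relationship is just another table of rows.
    CREATE TABLE lives_in  (person_id INTEGER REFERENCES person,
                            household_id INTEGER REFERENCES household);
    INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO household VALUES (10, 'AB1'), (20, 'CD2');
    INSERT INTO lives_in VALUES (1, 10), (2, 10), (3, 20);
""")

# Declarative query: residents per postcode, with no pointer chasing.
rows = conn.execute("""
    SELECT h.postcode, COUNT(*)
    FROM lives_in AS l
    JOIN household AS h USING (household_id)
    GROUP BY h.postcode
    ORDER BY h.postcode
""").fetchall()
print(rows)
```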
13. Fifth Generation: Multimedia
Databases 1995-?
• Relational systems offered huge improvements in
ease-of-use, graphical interfaces, client-server
applications, distributed databases, parallel data
search, and data mining. Nonetheless, in about
1985, the research community began to look
beyond the relational model.
• People coming from the object-oriented
programming community saw the problem
clearly: datatype design requires a good data
model and a unification of procedures and data.
14. Sixth Generation: The Future
• Defining the data models for new types and
integrating them with the traditional database
systems.
• Scaling databases in size (to petabytes), space
(distributed), and diversity (heterogeneous).
• Automatically discovering data trends,
patterns, and anomalies (data mining, data
analysis).
16. What is a database model?
A database model is the
theoretical foundation of a
database and fundamentally
determines in which manner
data can be stored, organised,
and manipulated in a database
system. It thereby defines the
infrastructure offered by a
particular database system.
17. Flat File Model
The flat (or table) model consists of a
single, two-dimensional array of data
elements, where all members of a
given column are assumed to be
similar values, and all members of a
row are assumed to be related to
one another.
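A flat file can be sketched as a single CSV table. The names, ages and towns below are invented, and `io.StringIO` stands in for a file on disk.

```python
import csv
import io

# One two-dimensional table: every column holds similar values,
# every row holds values that belong together.
flat_file = io.StringIO(
    "name,age,town\n"
    "Ann,34,Leeds\n"
    "Bob,51,York\n"
    "Cy,29,Leeds\n"
)

rows = list(csv.DictReader(flat_file))

# Reading down a column gives similar values.
ages = [int(row["age"]) for row in rows]

# Reading across a row gives related values, so a filter on one
# column can return values from another.
leeds = [row["name"] for row in rows if row["town"] == "Leeds"]

print(ages, leeds)
```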
18. Hierarchical Model
In a hierarchical model, data is
organized into a tree-like
structure, implying a single upward
link in each record to describe the
nesting, and a sort field to keep
the records in a particular order in
each same-level list.
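The two ingredients named above, a single upward link per record and a sort field within each same-level list, can be sketched as follows. The class name `Node` and the sort keys are illustrative choices, not part of any particular hierarchical database product.

```python
class Node:
    """One record in a tree: a single parent link plus ordered children."""

    def __init__(self, name, sort_key, parent=None):
        self.name = name
        self.sort_key = sort_key
        self.parent = parent      # the single upward link
        self.children = []        # same-level list, kept in sort order

    def add_child(self, name, sort_key):
        child = Node(name, sort_key, parent=self)
        self.children.append(child)
        # The sort field keeps siblings in a fixed order.
        self.children.sort(key=lambda node: node.sort_key)
        return child

    def path(self):
        """Follow the upward links to the root and return the full path."""
        node, parts = self, []
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))


uk = Node("UK", 0)
scotland = uk.add_child("Scotland", 2)
england = uk.add_child("England", 1)
leeds = england.add_child("Leeds", 1)

print(leeds.path())
print([child.name for child in uk.children])
```

Because each record has exactly one parent, a record that naturally belongs under two parents cannot be represented without duplication; the network model relaxes this.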
19. Network Model
The network model (defined
by the CODASYL specification)
organises data using two
fundamental concepts, called
records and sets. Records
contain fields (which may be
organized hierarchically, as in
the programming language
COBOL). Sets (not to be
confused with mathematical
sets) define one-to-many
relationships between records:
one owner, many members. A
record may be an owner in any
number of sets, and a member
in any number of sets.
CODASYL: COnference on DAta SYstems Languages.
The network model is the data model of its Data
Base Task Group (DBTG). The CODASYL group was
originally formed in 1959 to create the standards
for COBOL. After successfully developing the COBOL
specifications, the group's charter was
extended to create a set of database
standards.
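Records and sets can be sketched in a few lines. This is a toy model, not the CODASYL data manipulation language: the class names `Record` and `SetType` are invented, and the rule that a record has at most one owner within a given set is this sketch's reading of "one owner, many members".

```python
class Record:
    """A record is a bag of named fields."""

    def __init__(self, **fields):
        self.fields = fields


class SetType:
    """A named one-to-many link: each owner record has many members."""

    def __init__(self, name):
        self.name = name
        self._members = {}  # owner record -> list of member records
        self._owner = {}    # member record -> its single owner

    def connect(self, owner, member):
        if member in self._owner:
            raise ValueError(f"record already has an owner in set {self.name}")
        self._owner[member] = owner
        self._members.setdefault(owner, []).append(member)

    def owner(self, member):
        return self._owner[member]

    def members(self, owner):
        return list(self._members.get(owner, []))


# Hypothetical records.
sales = Record(name="Sales")
census = Record(title="Census 2011")
ann = Record(name="Ann")
bob = Record(name="Bob")

works_in = SetType("works_in")        # department owns employees
assigned_to = SetType("assigned_to")  # project owns employees

works_in.connect(sales, ann)
works_in.connect(sales, bob)
assigned_to.connect(census, ann)      # ann is a member of two sets

# Navigational access: from a member up to its owner, then down again.
colleagues = [r.fields["name"] for r in works_in.members(works_in.owner(ann))]
print(colleagues)
```

The program has to walk the links itself (member to owner, owner to members), which is the navigational style that the relational model later set out to replace.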
20. Relational Model
The relational model was
introduced by E.F. Codd in
1970 as a way to make
database management
systems more independent
of any particular application.
It is a mathematical model
defined in terms of
predicate logic and set
theory.
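To make the set-theory side concrete, here is a small sketch that treats a relation as a Python set of rows and defines selection, projection and natural join on it. The representation (each row a `frozenset` of attribute/value pairs) and the helper names are choices made for this sketch; real database systems do not store relations this way.

```python
def row(**attrs):
    """Build one tuple of a relation as a hashable set of (attribute, value)."""
    return frozenset(attrs.items())


def select(relation, predicate):
    """Keep the rows for which the predicate holds (predicate logic)."""
    return {r for r in relation if predicate(dict(r))}


def project(relation, attributes):
    """Keep only some attributes; duplicate rows vanish because it is a set."""
    return {frozenset((k, v) for k, v in r if k in attributes) for r in relation}


def natural_join(left, right):
    """Combine rows that agree on every attribute they share."""
    joined = set()
    for l in left:
        for r in right:
            merged = l | r
            # If a shared attribute had two different values, the merged set
            # would hold two pairs with the same key and dict() would shrink it.
            if len(dict(merged)) == len(merged):
                joined.add(merged)
    return joined


person = {row(name="Ann", town="Leeds"), row(name="Bob", town="York")}
town = {row(town="Leeds", region="Yorkshire"), row(town="York", region="Yorkshire")}

in_leeds = select(person, lambda r: r["town"] == "Leeds")
regions = project(town, {"region"})
people_regions = project(natural_join(person, town), {"name", "region"})

print(in_leeds, regions, people_regions, sep="\n")
```

Note that `regions` contains a single row: projecting two towns onto `region` gives one value, since a relation is a set and cannot hold duplicates.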
21. Strengths of the Relational Model
• The data model and access to it are simple to
understand and use, even for those who are not
experienced programmers.
• The model of data represented in tables is simple.
• There are straightforward database design
procedures.
• Efficient implementation techniques are well known
and widely used.
• Standards exist for query languages, such as SQL.
22. Object-Oriented Model
In recent years, the object-
oriented paradigm has been
applied to database
technology, creating a new
programming model known
as object databases. These
databases attempt to bring
the database world and the
application programming
world closer together, in
particular by ensuring that
the database uses the same
type system as the
application program.
24. What is a Data Warehouse?
A data warehouse is a database used
for reporting and analysis. The data
stored in the warehouse is
uploaded from the operational
systems. The data may pass
through an operational data store
for additional operations before it
is used in the data warehouse for
reporting.
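The upload from an operational system into a warehouse can be sketched with two in-memory SQLite databases. The table names, dates and balances are hypothetical, and the optional operational data store step is left out. The operational table only ever holds the current balance, while the warehouse keeps one dated copy per load.

```python
import sqlite3

operational = sqlite3.connect(":memory:")
warehouse = sqlite3.connect(":memory:")

operational.execute(
    "CREATE TABLE account (account_id INTEGER PRIMARY KEY, balance INTEGER)"
)
operational.execute("INSERT INTO account VALUES (1, 100), (2, 50)")

warehouse.execute(
    "CREATE TABLE account_history "
    "(snapshot_date TEXT, account_id INTEGER, balance INTEGER)"
)


def load_snapshot(snapshot_date):
    """Copy the current operational rows into the warehouse, stamped with a date."""
    rows = operational.execute("SELECT account_id, balance FROM account").fetchall()
    warehouse.executemany(
        "INSERT INTO account_history VALUES (?, ?, ?)",
        [(snapshot_date, account_id, balance) for account_id, balance in rows],
    )


load_snapshot("2024-01-01")
# The operational system overwrites the old value...
operational.execute("UPDATE account SET balance = 70 WHERE account_id = 1")
load_snapshot("2024-01-02")

# ...but the warehouse still has both.
history = warehouse.execute(
    "SELECT snapshot_date, balance FROM account_history "
    "WHERE account_id = 1 ORDER BY snapshot_date"
).fetchall()
print(history)
```

Reports run against `warehouse`, so analytic queries do not compete with the transactions hitting `operational`.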
26. Benefits of a Data Warehouse
A data warehouse maintains a copy of information from the source transaction
systems. This architectural complexity provides the opportunity to:
• Maintain data history, even if the source transaction systems do not.
• Integrate data from multiple source systems, enabling a central view across
the enterprise. This benefit is always valuable, but particularly so when the
organization has grown by merger.
• Improve data quality, by providing consistent codes and descriptions, flagging
or even fixing bad data.
• Present the organization's information consistently.
• Provide a single common data model for all data of interest regardless of the
data's source.
• Restructure the data so that it makes sense to the business users.
• Restructure the data so that it delivers excellent query performance, even for
complex analytic queries, without impacting the operational systems.
• Add value to operational business applications, notably customer relationship
management (CRM) systems.
27. Dimensional v Normalised
There are two leading approaches to storing data in a data
warehouse — the dimensional approach and the normalised
approach.
• Supporters of the dimensional approach, referred to as
“Kimballites”, follow Ralph Kimball, who holds that the
data warehouse should be modelled using a Dimensional
Model (DM).
• Supporters of the normalised approach, also called the
3NF model, are referred to as “Inmonites”. They follow Bill
Inmon, who holds that the data warehouse should be
modelled using an Entity-Relationship (ER) model.
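A dimensional model is commonly laid out as a star schema: one fact table of measurements surrounded by dimension tables that describe them. The sketch below is a minimal example in SQLite with invented products and sales figures; the table and column names are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables describe the "who, what, when" of each fact.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY,
                              name TEXT, category TEXT);
    CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY,
                              year INTEGER, month INTEGER);
    -- The fact table holds the numeric measurements, keyed by dimension.
    CREATE TABLE fact_sales  (product_id INTEGER, date_id INTEGER,
                              quantity INTEGER, revenue INTEGER);

    INSERT INTO dim_product VALUES
        (1, 'Cheddar', 'Cheese'), (2, 'Brie', 'Cheese'), (3, 'Baguette', 'Bakery');
    INSERT INTO dim_date VALUES (1, 2024, 1), (2, 2024, 2);
    INSERT INTO fact_sales VALUES
        (1, 1, 10, 50), (2, 1, 4, 40), (3, 1, 6, 12),
        (1, 2, 8, 40), (3, 2, 5, 10);
""")

# A typical analytic query: revenue by category and month.
revenue = conn.execute("""
    SELECT p.category, d.month, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p USING (product_id)
    JOIN dim_date AS d USING (date_id)
    GROUP BY p.category, d.month
    ORDER BY p.category, d.month
""").fetchall()
print(revenue)
```

In a normalised (3NF) design, `category` would typically move into its own table referenced from the product table, trading a simpler query shape for less redundancy.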
29. What is Data Mining?
Data mining, the analysis step of the Knowledge
Discovery in Databases (KDD) process, is a
relatively young and interdisciplinary field of
computer science. It is the process of
discovering new patterns from large data sets,
involving methods at the intersection of
artificial intelligence, machine learning,
statistics and database systems.
30. Classes of Task Examples 1-3
Data mining involves common classes of tasks, for example:
1. Classification: the task of generalising known structure
to apply to new data, eg, an email program might
attempt to classify an email as legitimate or spam.
2. Clustering: the task of discovering groups and
structures in the data that are in some way or another
similar, without using known structures in the data, eg,
market basket analysis: Age x Income x Type of Cheese
3. Summarisation: providing a more compact
representation of the data set, including visualisation
and report generation, eg, charts and graphs
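The classification, clustering and summarisation tasks can each be sketched in a few lines of plain Python. These are deliberately naive toys on invented data: the word-count scorer is not a proper statistical classifier, the clustering is a two-group, one-dimensional k-means that assumes at least two distinct values, and the email texts and ages are made up.

```python
import statistics
from collections import Counter


def train(examples):
    """Classification, step 1: count the words seen under each known label."""
    counts = {"legitimate": Counter(), "spam": Counter()}
    for text, label in examples:
        counts[label].update(text.lower().split())
    return counts


def classify(counts, text):
    """Classification, step 2: give new text the label whose words it shares most."""
    words = text.lower().split()
    scores = {label: sum(c[w] for w in words) for label, c in counts.items()}
    return max(scores, key=scores.get)


def two_means(values, rounds=10):
    """Clustering: split numbers into two groups with no labels given."""
    low, high = min(values), max(values)
    for _ in range(rounds):
        near_low = [v for v in values if abs(v - low) <= abs(v - high)]
        near_high = [v for v in values if abs(v - low) > abs(v - high)]
        low = sum(near_low) / len(near_low)
        high = sum(near_high) / len(near_high)
    return near_low, near_high


def summarise(values):
    """Summarisation: a compact description of the data set."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
    }


emails = [
    ("win a free prize now", "spam"),
    ("free money win big", "spam"),
    ("meeting agenda for monday", "legitimate"),
    ("lunch on monday", "legitimate"),
]
model = train(emails)
label_a = classify(model, "free prize inside")
label_b = classify(model, "agenda for monday meeting")

ages = [22, 25, 27, 61, 64, 70]
younger, older = two_means(ages)
summary = summarise(ages)

print(label_a, label_b, younger, older, summary)
```

The contrast to note is that `train` is given the labels in advance, whereas `two_means` is given only the numbers and has to find the groups itself.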