This document provides an overview of relational algebra and calculus, database normalization, and queries. It defines common relational algebra operations like selection, projection, join, etc. It explains database normalization forms like 1NF, 2NF, 3NF and their advantages. It also covers functional dependencies, integrity constraints, and different types of queries including subqueries and nested subqueries.
2. Relational algebra is a procedural language that takes
instances of relations as input and yields instances of relations
as output.
• It uses operators to perform queries.
• An operator can be either unary or binary.
• Operators accept relations as their input and yield relations as
their output.
Relational Algebra Operations:
Union
Intersection
Difference
Division
Cartesian Product
Selection
Projection
Join
3. Union: A binary operation that combines the tuples of
two relations, removing duplicates.
The two relations must have the same number of columns.
Each column of the first relation must have the same data type as
the corresponding column of the second relation.
It is denoted by the ‘∪’ symbol.
R ∪ S
R and S are the relations and ∪ is the operator.
Intersection: A binary operation. It results in a
relation containing the tuples that are in both relations.
It is denoted by the ‘∩’ symbol.
R ∩ S
4. Difference: A binary operator that creates a
new relation with the tuples that are in the first relation but not in
the second.
It is denoted by the ‘-’ symbol.
R - S
R and S are the relations.
Cartesian Product:
A binary operation. It pairs every tuple of one relation
with every tuple of the other, producing a single relation.
It is denoted by the ‘×’ symbol.
R × S
R and S are two relations and × is the operator.
Division: A binary operation used to answer queries that contain the
phrase ‘for all’. R ÷ S returns the values in R that are paired with
every tuple of S.
It is denoted by the ‘÷’ symbol.
R ÷ S
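The set operations above can be sketched in Python by modelling each relation as a set of tuples. This is only an illustration: the sample relations and the helper name divide are assumptions, not part of the slides, and divide handles only the simple case R(x, y) ÷ S(y).

```python
# Relations modelled as Python sets of tuples (hypothetical sample data).
R = {(1, "a"), (2, "b"), (3, "c")}
S = {(2, "b"), (4, "d")}

union = R | S                             # R ∪ S
intersection = R & S                      # R ∩ S
difference = R - S                        # R - S
product = {r + s for r in R for s in S}   # R × S: every pairing of tuples


def divide(r, s):
    """R ÷ S for R(x, y) and S(y): the x values paired with every y in S."""
    candidates = {x for (x, _) in r}
    return {x for x in candidates if all((x, y) in r for (y,) in s)}


# "Which students completed all required courses?" is a 'for all' query.
completed = {("ann", "db"), ("ann", "os"), ("bob", "db")}
required = {("db",), ("os",)}
all_done = divide(completed, required)
```

Here all_done contains only "ann", because "bob" is not paired with every required course.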
5. Selection Operation: A unary relational operation. It
returns the horizontal subset of a relation: the tuples that satisfy a
given condition.
The condition can use operators like <, >, <=, >=, = and != to filter the
tuples of the relation.
It can also use the logical operators AND, OR and NOT to combine
several filtering conditions.
Denoted by: σp(r), where σ is the selection operator, p is the selection
predicate and r is the relation. p is a propositional logic formula which may
use connectors like and, or, and not.
Projection Operation: A unary relational operation.
It returns the vertical subset of a relation: only the
listed columns/attributes are kept, and duplicate rows are
removed.
The selection operation above, by contrast, creates a subset of the tuples
but keeps all the attributes of the relation.
Denoted by: ∏A1, A2, ..., An (r)
∏ is the operator for projection, r is the relation and A1, A2, ..., An are the
attributes of the relation.
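A minimal sketch of selection and projection, assuming a relation is stored as a list of dictionaries. The function names select and project and the salary figures are hypothetical and chosen only for illustration.

```python
# A relation modelled as a list of dicts (hypothetical employee data).
employees = [
    {"empid": 1001, "ename": "Surya", "salary": 12000},
    {"empid": 1002, "ename": "Chandra", "salary": 9000},
    {"empid": 1003, "ename": "Rama", "salary": 15000},
]


def select(relation, predicate):
    """σp(r): keep the rows (horizontal subset) that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """∏A1..An(r): keep only the named columns and drop duplicate rows."""
    seen, result = set(), []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result


# The predicate combines two conditions with a logical AND.
high_paid = select(employees, lambda t: t["salary"] > 10000 and t["ename"] != "Rama")
names = project(employees, ["ename"])
```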
6. Join operations: A join means combining fields from two tables
by using values common to each.
The different types of join operations are :
Natural join
Inner join
Self join
Left Outer join
Right Outer join
Full Outer join
Cartesian join
7. Natural join: Natural join does not use an explicit comparison
operator.
We can perform a natural join only if there is at least one
common attribute between the two relations.
In addition, the common attributes must have the same name and
domain.
Natural join combines the tuples whose values for the common
attributes are the same in both relations.
R ⋈ S, where R and S are the relations.
Syntax: SELECT table1.column1, table2.column2, ... FROM table1
NATURAL JOIN table2;
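As an illustration, a natural join can be tried out with Python's built-in sqlite3 module. The employee and department tables and their contents are hypothetical; the two tables share only the deptid column, so that column drives the match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (empid INTEGER, ename TEXT, deptid INTEGER);
    CREATE TABLE department (deptid INTEGER, dname TEXT);
    INSERT INTO employee VALUES (1001, 'Surya', 10), (1002, 'Chandra', 20), (1003, 'Rama', 30);
    INSERT INTO department VALUES (10, 'Biology'), (20, 'Maths');
""")

# No ON or WHERE clause: rows are matched on the shared column name deptid.
rows = conn.execute(
    "SELECT ename, dname FROM employee NATURAL JOIN department ORDER BY ename"
).fetchall()
```

Rama does not appear in rows because department 30 has no row in the department table.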
8. Inner join: The inner join selects the rows from both participating
tables for which the join columns match.
INNER JOIN is the same as the plain JOIN clause, combining rows from two or
more tables.
Syntax: SELECT table1.column1, table2.column2, ... FROM table1
INNER JOIN table2 ON table1.common = table2.common;
9. Left Outer join: In this operation, all the tuples of the left-hand
side relation are retained.
Attributes from matching tuples of the right-hand relation are displayed
with their values; where no match exists they are shown as
NULL.
Syntax: SELECT table1.column1, table2.column2, ... FROM table1 LEFT
OUTER JOIN table2 ON table1.common = table2.common;
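The difference between an inner join and a left outer join can be seen by running both on the same data. The sketch below uses Python's sqlite3 module with hypothetical employee and department tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (empid INTEGER, ename TEXT, deptid INTEGER);
    CREATE TABLE department (deptid INTEGER, dname TEXT);
    INSERT INTO employee VALUES (1001, 'Surya', 10), (1002, 'Chandra', 20), (1003, 'Rama', 30);
    INSERT INTO department VALUES (10, 'Biology'), (20, 'Maths');
""")

# Inner join: only employees whose deptid has a matching department row.
inner = conn.execute("""
    SELECT e.ename, d.dname
    FROM employee e INNER JOIN department d ON e.deptid = d.deptid
    ORDER BY e.empid
""").fetchall()

# Left outer join: every employee is kept; a missing match becomes NULL (None).
left = conn.execute("""
    SELECT e.ename, d.dname
    FROM employee e LEFT OUTER JOIN department d ON e.deptid = d.deptid
    ORDER BY e.empid
""").fetchall()
```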
10. Right Outer Join: In this operation, all the tuples of the right-
hand side relation are retained, and the matching tuples of the left-hand
relation are found and displayed. If no match is found then
NULL is displayed.
Syntax: SELECT table1.column1, table2.column2, ... FROM table1
RIGHT OUTER JOIN table2 ON table1.common = table2.common;
11. Full Outer Join:
This operation is a combination of both the left and right outer joins.
It retains all the tuples from both relations.
If a matching tuple exists in the other relation, its attributes are
displayed; otherwise those attributes are shown as NULL.
Syntax: SELECT table1.column1, table2.column2, ... FROM table1 FULL
OUTER JOIN table2 ON table1.common_field = table2.common_field;
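Older SQLite releases do not accept the FULL OUTER JOIN keywords, so the sketch below emulates the operation instead: a left outer join is combined, through UNION ALL, with the right-hand rows that have no match. The tables and data are hypothetical, and this is one possible emulation rather than the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (empid INTEGER, ename TEXT, deptid INTEGER);
    CREATE TABLE department (deptid INTEGER, dname TEXT);
    INSERT INTO employee VALUES (1001, 'Surya', 10), (1003, 'Rama', 30);
    INSERT INTO department VALUES (10, 'Biology'), (40, 'Physics');
""")

rows = conn.execute("""
    -- Part 1: every employee, with the department if one matches.
    SELECT e.ename, d.dname
    FROM employee e LEFT OUTER JOIN department d ON e.deptid = d.deptid
    UNION ALL
    -- Part 2: departments that no employee refers to.
    SELECT NULL, d.dname
    FROM department d
    WHERE NOT EXISTS (SELECT 1 FROM employee e WHERE e.deptid = d.deptid)
""").fetchall()

# Row order is not guaranteed without ORDER BY, so compare as a set.
result = set(rows)
```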
12. Self join: The SQL self join is used to join a table to itself as if it
were two tables, temporarily renaming at least one table in the
SQL statement.
Syntax: SELECT a.column_name, b.column_name, ...
FROM table1 a, table1 b
WHERE a.common = b.common;
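A small self-join sketch using Python's sqlite3 module: the same employee table is given two aliases, a and b, to find pairs of employees who share an address. The data is hypothetical, and the extra condition a.empid < b.empid is an addition that stops each pair from being listed twice or an employee from being paired with itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (empid INTEGER, ename TEXT, address TEXT);
    INSERT INTO employee VALUES
        (1001, 'Surya', 'Bangalore'),
        (1002, 'Chandra', 'Mysore'),
        (1003, 'Rama', 'Bangalore');
""")

# The table is joined to itself under the temporary names a and b.
pairs = conn.execute("""
    SELECT a.ename, b.ename
    FROM employee a, employee b
    WHERE a.address = b.address AND a.empid < b.empid
""").fetchall()
```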
13. Cartesian Join: A Cartesian join or cross join returns the
Cartesian product of the sets of records from the two or more
joined tables. Thus, it equates to an inner join where the join
condition always evaluates to true or where the join condition is
absent from the statement.
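The equivalence described above can be checked with a short sqlite3 sketch. The size and colour tables are hypothetical; the point is that a cross join and an inner join with an always-true condition return the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE size (s TEXT);
    CREATE TABLE colour (c TEXT);
    INSERT INTO size VALUES ('S'), ('M'), ('L');
    INSERT INTO colour VALUES ('red'), ('blue');
""")

# Every size is paired with every colour: 3 x 2 rows.
cross = conn.execute("SELECT s, c FROM size CROSS JOIN colour").fetchall()

# An inner join whose condition is always true gives the same pairs.
always_true = conn.execute(
    "SELECT s, c FROM size INNER JOIN colour ON 1 = 1"
).fetchall()
```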
14. Relational Calculus: Relational calculus is a non-procedural query
language.
It tells the system what data to retrieve but
does not tell it how to retrieve the data.
Relational Calculus exists in two forms:
Tuple Relational Calculus (TRC)
Domain Relational Calculus (DRC)
Tuple Relational Calculus: Tuple relational calculus is a non-procedural
query language that specifies which tuples to select from a
relation.
It can select tuples with a range of values or tuples with certain
attribute values.
The resulting relation can have one or more tuples.
{t | P(t)} or {t | condition(t)}
t => resulting tuple
P(t) => condition used to fetch t
Example: {t | Employee(t) and t.salary > 10000}
It selects the tuples from the Employee relation such that the resulting
tuples have a salary greater than 10000.
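A Python set comprehension reads almost like the tuple calculus expression: it states which tuples are wanted, not how to loop over storage. The sketch below assumes a hypothetical Employee relation held as a set of named tuples.

```python
from collections import namedtuple

Employee = namedtuple("Employee", ["empid", "ename", "salary"])

# Hypothetical instance of the Employee relation.
employee = {
    Employee(1001, "Surya", 12000),
    Employee(1002, "Chandra", 9000),
    Employee(1003, "Rama", 15000),
}

# {t | Employee(t) and t.salary > 10000}
result = {t for t in employee if t.salary > 10000}
```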
15. Domain Relational Calculus: Domain relational calculus uses a list
of attributes to be selected from the relation based on a
condition.
It is the same as tuple relational calculus, but differs by selecting
attributes rather than whole tuples.
Denoted by: {<a1, a2, a3, ..., an> | P(a1, a2, a3, ..., an)}
a1, a2, a3, ... are the attributes.
P => condition
Example: {<EMPID, ENAME> | <EMPID, ENAME> ∈ EMPLOYEE ∧
DEPTID = 10}
It selects the EMPID and ENAME of the employees who work for
department 10.
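In the same spirit, the domain calculus example can be sketched by binding one Python variable per attribute instead of one variable per tuple. The EMPLOYEE data below is hypothetical.

```python
# Hypothetical EMPLOYEE relation as a set of (EMPID, ENAME, DEPTID) tuples.
employee = {
    (1001, "Surya", 10),
    (1002, "Chandra", 20),
    (1003, "Rama", 10),
}

# Each attribute gets its own variable; only EMPID and ENAME are returned.
result = {(empid, ename) for (empid, ename, deptid) in employee if deptid == 10}
```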
Constraints: Every relation has some conditions that must hold
for it to be a valid relation.
These conditions are called Relational Integrity Constraints.
16. There are 3 types of Integrity Constraints:
Key Constraints
Domain Constraints
Referential Constraints
Key Constraints: There must be at least one minimal subset of
attributes in the relation that can identify a tuple uniquely.
This minimal subset of attributes is called a key for that relation.
Key constraints are also referred to as Entity Constraints.
Key constraints enforce that:
In a relation with a key attribute, no two tuples can have
identical values for the key attributes.
A key attribute cannot have NULL values.
17. Domain Constraints: In real-world scenarios, each attribute can take
only values from a specific domain.
Example: Age can only be a positive integer.
Referential Integrity Constraints: Referential integrity
constraints work on the concept of foreign keys.
A foreign key is an attribute of a relation that refers to a key attribute
of the same or another relation.
A referential integrity constraint states that if a relation refers to a
key attribute of a different or the same relation, then that key
value must exist.
Integrity Rules: Integrity rules may sound very technical, but
they are simple and straightforward rules that each table must
follow.
They are very important in database design: when tables break
any of the integrity rules, the database may return erroneous results when
retrieving information.
Hence the name integrity, which describes the reliability and
consistency of values.
18. There are two types of integrity rules:
Entity integrity rule
Referential integrity rule
Entity integrity rule: The entity integrity rule refers to the rules
the primary key must follow.
The primary key value cannot be null.
The primary key value must be unique.
Referential Integrity Rule:
The referential integrity rule refers to the foreign key.
The foreign key may be null, and several rows may share the same
foreign key value.
A foreign key value that is not null must match a record in the table it
refers to.
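Both rules can be observed with Python's sqlite3 module. The sketch assumes hypothetical department and employee tables; the helper name rejected is also an assumption. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # foreign keys are off by default in SQLite
conn.executescript("""
    CREATE TABLE department (deptid INTEGER PRIMARY KEY, dname TEXT NOT NULL);
    CREATE TABLE employee (
        empid  INTEGER PRIMARY KEY,
        ename  TEXT NOT NULL,
        deptid INTEGER REFERENCES department(deptid)
    );
    INSERT INTO department VALUES (10, 'Biology');
    INSERT INTO employee VALUES (1001, 'Surya', 10);
    INSERT INTO employee VALUES (1002, 'Chandra', NULL);  -- a null foreign key is allowed
""")


def rejected(sql):
    """Return True if the statement violates an integrity constraint."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# Entity integrity: primary key 1001 already exists.
duplicate_key = rejected("INSERT INTO employee VALUES (1001, 'Rama', 10)")
# Referential integrity: department 99 does not exist.
dangling_fk = rejected("INSERT INTO employee VALUES (1003, 'Rama', 99)")
# A second employee may share the foreign key value 10.
shared_fk = rejected("INSERT INTO employee VALUES (1004, 'Rama', 10)")
```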
19. Normalization of Database:
It is a technique for organizing data in a database.
Normalization is a systematic approach of decomposing
tables to eliminate data redundancy and undesirable
characteristics
such as insertion, update and deletion anomalies.
It is a multi-step process that puts data into tabular form by
removing duplicated data from the relational tables.
Normalization is used mainly for two purposes (goals):
Eliminating redundant data.
Ensuring data dependencies make sense, i.e. data is logically
stored.
20. Problem without Normalization:
Empid   Ename     Address     Department
1001    Surya     Bangalore   Biology
1002    Chandra   Mysore      Maths
1003    Rama      Kolar       Maths
1004    Surya     Bangalore   Physics
21. Update Anomaly: To update the address of an employee who
occurs twice or more in a table, we have to
update the Address column in all of those rows; otherwise the data will
become inconsistent.
Insertion Anomaly: Suppose that for a new employee we have the
empid, ename and address, but the employee has
not yet opted for any department. We then have to insert NULL
in the Department column, leading to an insertion anomaly.
Deletion Anomaly: If empid 1001 has only one department and
temporarily drops it, then deleting that row deletes the entire
employee record along with it.
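The update anomaly, and how decomposition avoids it, can be sketched in a few lines of Python. The rows are hypothetical and assume employee 1001 appears once per department; the names addresses_of, employee and works_in are illustrative only.

```python
# Unnormalized rows: (empid, ename, address, department). Hypothetical data in
# which employee 1001 appears once per department.
rows = [
    (1001, "Surya", "Bangalore", "Biology"),
    (1001, "Surya", "Bangalore", "Physics"),
    (1002, "Chandra", "Mysore", "Maths"),
]


def addresses_of(table, empid):
    """Collect every address recorded for one employee."""
    return {address for (eid, _, address, _) in table if eid == empid}


# Update anomaly: changing only one of the two rows leaves two addresses.
partial = [(1001, "Surya", "Mysore", "Biology")] + rows[1:]
inconsistent = addresses_of(partial, 1001)

# Decomposition: the address is stored once per employee, so a single
# update is enough and the department links are untouched.
employee = {eid: (name, addr) for (eid, name, addr, _) in rows}
works_in = {(eid, dept) for (eid, _, _, dept) in rows}
employee[1001] = ("Surya", "Mysore")
```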
Types of Normal form:
First Normal Form
Second Normal Form
Third Normal Form
22. First Normal Form (1NF): As per the rule of first normal form,
an attribute of a table cannot hold multiple values.
It should hold only atomic values.
Table without Normalization:
24. Second Normal Form (2NF)
A table is said to be in 2NF if both the following conditions hold:
The table is in 1NF.
No non-prime attribute is dependent on a proper subset of
any candidate key of the table.
25. Third Normal Form (3NF)
A table design is said to be in 3NF if both the following conditions
hold:
The table must be in 2NF.
Any transitive functional dependency of a non-prime attribute on a
super key should be removed.
An attribute that is not part of any candidate key is known as a
non-prime attribute.
26. Table complies with 3NF:
The advantages of removing transitive dependencies are:
• The amount of data duplication is reduced.
• Data integrity is achieved.
27. Properties of Normalized Relation:
No data value should be duplicated in different rows
unnecessarily.
A value must be specified for every attribute in a row.
Each relation should be self-contained. In other words, if a
row from a relation is deleted, important information should
not be accidentally lost.
When a row is added to a relation, other relations in the
database should not be affected.
A value of an attribute in a tuple may be changed
independently of other tuples in the relation and of other
relations.
28. Advantages of Normalization:
More efficient data structure.
Avoids redundant fields or columns.
More flexible data structure, i.e. we should be able to add new
rows and data values easily.
Better understanding of data.
Ensures that distinct tables exist when necessary.
Easier to maintain data structure, i.e. it is easy to perform
operations, and complex queries can be easily handled.
Minimizes data duplication.
Close modeling of real-world entities, processes and their
relationships.
29. Disadvantages:
On normalizing the relations to higher normal forms, i.e.
4NF and 5NF, the performance degrades.
Normalizing relations of higher degree is a very time-consuming
and difficult process.
Careless decompositions may lead to bad design.
Functional Dependency: A functional dependency (FD) is a
constraint between two sets of attributes in a relation.
A functional dependency says that if two tuples have the same
values for attributes A1, A2, ..., An, then those two tuples
must have the same values for attributes B1, B2, ..., Bn.
A functional dependency is represented by an arrow sign (→),
that is X → Y, where X functionally determines Y.
The left-hand side attributes determine the values of the
attributes on the right-hand side.
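Whether a functional dependency holds in a given relation instance can be checked mechanically. The sketch below is a minimal illustration; the function name fd_holds and the sample rows are assumptions. Passing the check on one instance does not prove that the dependency holds for the schema in general.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in the given relation instance."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[b] for b in rhs)
        # Two tuples that agree on lhs must also agree on rhs.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical instance: one employee enrolled in two departments.
rows = [
    {"empid": 1001, "ename": "Surya", "department": "Biology"},
    {"empid": 1001, "ename": "Surya", "department": "Physics"},
    {"empid": 1002, "ename": "Chandra", "department": "Maths"},
]

empid_determines_name = fd_holds(rows, ["empid"], ["ename"])
name_determines_dept = fd_holds(rows, ["ename"], ["department"])
```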
30. An important property of functional dependencies is given by Armstrong’s
axioms, which are used in database normalization.
In a relation R with three sets of attributes (X, Y, Z), Armstrong’s axioms
state the following:
Transitivity: if X → Y and Y → Z, then X → Z
Reflexivity (Subset Property): if Y is a subset of X, then X → Y
Augmentation: if X → Y, then XZ → YZ
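A common way to apply these axioms is to compute the closure of a set of attributes, that is, everything the set functionally determines. The sketch below assumes functional dependencies are given as pairs of Python sets; the function name closure and the sample dependencies are illustrative.

```python
def closure(attributes, fds):
    """Return all attributes determined by `attributes` under the given FDs."""
    result = set(attributes)  # reflexivity: a set determines its own members
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left-hand side is already determined, so is the right-hand side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# Hypothetical dependencies: X -> Y and Y -> Z.
fds = [({"X"}, {"Y"}), ({"Y"}, {"Z"})]

x_closure = closure({"X"}, fds)
y_closure = closure({"Y"}, fds)
```

Because Z is in the closure of X, the dependency X → Z follows, which is the transitivity rule.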
31.
32. Queries: A query is a request for data or information from a
database table or combination of tables.
This data may be generated as results returned by Structured Query
Language (SQL).
Different query languages return different data types according to their
function.
SQL returns data in neat rows and columns.
Other query languages generate data as graphs or other complex
data manipulations.
Example: SELECT * FROM student;
Sub Queries: A sub query or inner query is a query within another
SQL query, typically embedded within the WHERE clause.
A sub query is used to return the data that will be used in the main
query as a condition to further restrict the data to be retrieved.
Sub queries can be used with the SELECT, INSERT, UPDATE
and DELETE statements, along with operators like
=, <, >, >=, <=, IN, BETWEEN etc.
33. Rules for Sub Queries:
Sub queries must be enclosed within parentheses.
A sub query can have only one column in the SELECT clause, unless
multiple columns are in the main query for the sub query to compare its
selected columns.
An ORDER BY cannot be used in a sub query, although the main query can
use an ORDER BY.
The GROUP BY can be used to perform the same function as the ORDER BY
in a sub query.
Sub queries that return more than one row can only be used with
multiple-value operators, such as the IN operator.
A sub query cannot be immediately enclosed in a set function.
The BETWEEN operator cannot be used with a sub query; however, the
BETWEEN operator can be used within the sub query.
Syntax: SELECT column_name [, column_name]
FROM table1 [, table2]
WHERE column_name OPERATOR
(SELECT column_name [, column_name]
FROM table1 [, table2]
[WHERE])
34. Example: SELECT COUNT(*) AS accident FROM participated WHERE regno
IN (SELECT regno FROM car WHERE model LIKE 'ALTO');
Nested Sub Queries: A sub query can be nested inside other sub
queries.
SQL has the ability to nest queries within one another.
A sub query is a SELECT statement that is nested within another
SELECT statement and which returns intermediate results.
Syntax: SELECT column1, column2, ... FROM table_name WHERE
column OPERATOR (SELECT column1, column2, ... FROM table_name WHERE
condition);
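The example query can be run with Python's sqlite3 module. The table layouts and rows below are assumptions made for illustration (the slides do not give the schema); only the names participated, car, regno and model come from the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: report_no is an assumed column name.
    CREATE TABLE car (regno TEXT PRIMARY KEY, model TEXT);
    CREATE TABLE participated (report_no INTEGER, regno TEXT);
    INSERT INTO car VALUES ('KA01', 'ALTO'), ('KA02', 'SWIFT'), ('KA03', 'ALTO');
    INSERT INTO participated VALUES (1, 'KA01'), (2, 'KA02'), (3, 'KA03'), (4, 'KA01');
""")

# The inner query returns the regno values of ALTO cars; the outer query
# counts the accident records that involve any of them.
(accident,) = conn.execute("""
    SELECT COUNT(*) AS accident
    FROM participated
    WHERE regno IN (SELECT regno FROM car WHERE model LIKE 'ALTO')
""").fetchone()
```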
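Nesting can go more than one level deep. The sketch below, again using sqlite3, assumes hypothetical student, enrolled and course tables and finds the students enrolled in any course offered by the Maths department.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (sid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE course (cid INTEGER PRIMARY KEY, dept TEXT);
    CREATE TABLE enrolled (sid INTEGER, cid INTEGER);
    INSERT INTO student VALUES (1, 'Asha'), (2, 'Ravi'), (3, 'Meena');
    INSERT INTO course VALUES (101, 'Maths'), (102, 'Physics');
    INSERT INTO enrolled VALUES (1, 101), (2, 102), (3, 101);
""")

# Innermost query: Maths course ids. Middle query: students enrolled in
# those courses. Outer query: the names of those students.
rows = conn.execute("""
    SELECT name FROM student
    WHERE sid IN (
        SELECT sid FROM enrolled
        WHERE cid IN (SELECT cid FROM course WHERE dept = 'Maths')
    )
    ORDER BY name
""").fetchall()
```

The ORDER BY appears only in the outermost query, in line with the sub query rules listed earlier.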