The Arithmetic Mean
An arithmetic mean is a fancy term for what most people call an "average." When
someone says the average of 10 and 20 is 15, they are referring to the arithmetic mean.
The simplest definition of a mean is the following: Add up all the numbers you want to
average, and then divide by the number of items you just added.
For example, if you want to average 10, 20, and 27, first add them together to get
10+20+27= 57. Then divide by 3 because we have three values, and we get an
arithmetic mean (average) of 19.
Want a formal, mathematical expression of the arithmetic mean?
mean = (x1 + x2 + ... + xk) / k
That's just a fancy way to say "the sum of k different numbers divided by k."
Check out a few examples of the arithmetic mean to make sure you understand:
Example:
Find the arithmetic mean (average) of the following numbers: 9, 3, 7, 3, 8, 10, and 2.
Solution:
Add up all the numbers to get 42. Then divide by 7 because there are 7 numbers: 42 / 7 = 6.
Example:
Find the arithmetic mean of -4, 3, 18, 0, 0, and -10.
Solution:
Sum the numbers. Divide by 6 because there are 6 numbers.
The answer is 7/6, or 1.167
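If you like to check these by computer, here is a minimal C sketch (the function name arithmetic_mean is my own, not from the text) that computes the mean of an array:

#include <stdio.h>

/* Arithmetic mean: sum the values, then divide by how many there are. */
double arithmetic_mean(const double *values, int count)
{
    double sum = 0.0;
    for (int i = 0; i < count; i++)
        sum += values[i];
    return sum / count;
}

int main(void)
{
    double a[] = { 9, 3, 7, 3, 8, 10, 2 };
    double b[] = { -4, 3, 18, 0, 0, -10 };
    printf("%f\n", arithmetic_mean(a, 7));   /* prints 6.000000       */
    printf("%f\n", arithmetic_mean(b, 6));   /* prints about 1.166667 */
    return 0;
}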
Geometric Mean
The geometric mean is NOT the arithmetic mean and it is NOT a simple average. It is
the nth root of the product of n numbers. That means you multiply a bunch of numbers
together, and then take the nth root, where n is the number of values you just multiplied.
Did that make sense? Here's a quick example:
Example:
What is the geometric mean of 2, 8 and 4?
Solution:
Multiply those numbers together: 2 × 8 × 4 = 64. Then take the third root (cube root) because there are
3 numbers: the cube root of 64 is 4, so the geometric mean is 4.
Naturally, the geometric mean can get very complicated. Here's a mathematical
definition of the geometric mean:
GM = ( Π xi )^(1/k), the k-th root of the product of the k numbers x1, x2, ..., xk.
Remember that the capital PI symbol means to multiply a series of numbers. That
definition says to multiply k numbers and then take the kth root. One thing you should
know is that the geometric mean only works with positive numbers. Negative numbers
could result in imaginary results depending on how many negative numbers are in a set.
Typically this isn't a problem, because most uses of the geometric mean involve real
data, such as the length of physical objects or the number of people responding to a
survey.
Try a few more examples until you understand the geometric mean.
Example:
What is the geometric mean of 4, 9, 9, and 2?
Solution:
Just multiply the four numbers and take the 4th root: 4 × 9 × 9 × 2 = 648, and the 4th root of 648 is about 5.045.
The geometric mean between two numbers a and b is √(a · b).
The arithmetic mean between two numbers a and b is (a + b) / 2.
Example: The cut-off frequencies of a phone line are f1 = 300 Hz
and f2 = 3300 Hz. What is the center frequency?
The center frequency is f0 = 995 Hz as geometric mean and
not f0 = 1800 Hz (arithmetic mean). What a difference!
The geometric mean of two numbers is the square root of their product.
The geometric mean of three numbers is the cubic root of their product.
The arithmetic mean is the sum of the numbers, divided by the quantity of the
numbers.
Other names for arithmetic mean: average, mean, arithmetic average.
In general, you can only take the geometric mean of positive numbers.
The geometric mean, by definition, is the nth root of the product of the n units in a data
set. For example, the geometric mean of 5, 7, 2, 1 is (5 × 7 × 2 × 1)^(1/4) = 2.893.
Alternatively, if you log-transform each of the individual units, the geometric mean will be the
exponential of the arithmetic mean of these log-transformed values. So, reusing the
example above, exp [ ( ln(5) + ln(7) + ln(2) + ln(1) ) / 4 ] = 2.893.
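A short C sketch (the function names are my own choices) showing both routes just described, the k-th root of the product and the exponential of the mean of the logs:

#include <stdio.h>
#include <math.h>

/* Geometric mean as the k-th root of the product (positive inputs only). */
double geometric_mean_product(const double *x, int k)
{
    double product = 1.0;
    for (int i = 0; i < k; i++)
        product *= x[i];
    return pow(product, 1.0 / k);
}

/* Same value computed as exp of the arithmetic mean of the logs. */
double geometric_mean_logs(const double *x, int k)
{
    double log_sum = 0.0;
    for (int i = 0; i < k; i++)
        log_sum += log(x[i]);
    return exp(log_sum / k);
}

int main(void)
{
    double x[] = { 5, 7, 2, 1 };
    printf("%f\n", geometric_mean_product(x, 4)); /* about 2.893 */
    printf("%f\n", geometric_mean_logs(x, 4));    /* about 2.893 */
    return 0;
}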
Geometric Mean
Arithmetic Mean
An arithmetic average is the sum of a series of numbers divided by the count of that
series of numbers.
If you were asked to find the class (arithmetic) average of test scores, you would simply
add up all the test scores of the students, and then divide that sum by the number of
students. For example, if five students took an exam and their scores were 60%, 70%,
80%, 90% and 100%, the arithmetic class average would be 80%.
This would be calculated as: (0.6 + 0.7 + 0.8 + 0.9 + 1.0) / 5 = 0.8.
The reason you use an arithmetic average for test scores is that each test score is an
independent event. If one student happens to perform poorly on the exam, the next
student's chances of doing poorly (or well) on the exam aren't affected. In other words,
each student's score is independent of all the other students' scores. However, there
are some instances, particularly in the world of finance, where an arithmetic mean is not
an appropriate method for calculating an average.
Consider your investment returns, for example. Suppose you have invested your
savings in the stock market for five years. If your returns each year were 90%, 10%,
20%, 30% and -90%, what would your average return be during this period? Well,
taking the simple arithmetic average, you would get an answer of 12%. Not too shabby,
you might think.
However, when it comes to annual investment returns, the numbers are not
independent of each other. If you lose a ton of money one year, you have that much
less capital to generate returns during the following years, and vice versa. Because of
this reality, we need to calculate the geometric average of your investment returns in
order to get an accurate measurement of what your actual average annual return over
the five-year period is.
To do this, we simply add one to each number (to avoid any problems with negative
percentages). Then, multiply all the numbers together, and raise their product to the
power of one divided by the count of the numbers in the series. And you're finished -
just don't forget to subtract one from the result!
That's quite a mouthful, but on paper it's actually not that complex. Returning to our
example, let's calculate the geometric average: Our returns were 90%, 10%, 20%, 30%
and -90%, so we plug them into the formula as [(1.9 x 1.1 x 1.2 x 1.3 x 0.1) ^ 1/5] - 1.
This equals a geometric average annual return of -20.08%. That's a heck of a lot worse
than the 12% arithmetic average we calculated earlier, and unfortunately it's also the
number that represents reality in this case.
It may seem confusing as to why geometric average returns are more accurate than
arithmetic average returns, but look at it this way: if you lose 100% of your capital in one
year, you don't have any hope of making a return on it during the next year. In other
words, investment returns are not independent of each other, so they require a
geometric average to represent their mean.
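To make the comparison concrete, here is a small C sketch (a rough illustration with names of my own) that reproduces the 12% and -20.08% figures from the returns above:

#include <stdio.h>
#include <math.h>

int main(void)
{
    double returns[] = { 0.90, 0.10, 0.20, 0.30, -0.90 };
    int n = 5;

    double sum = 0.0, growth = 1.0;
    for (int i = 0; i < n; i++) {
        sum += returns[i];              /* for the arithmetic average */
        growth *= 1.0 + returns[i];     /* compound the yearly growth */
    }

    double arithmetic = sum / n;                    /* 0.12, i.e. 12%  */
    double geometric  = pow(growth, 1.0 / n) - 1.0; /* about -0.2008   */

    printf("arithmetic average: %.2f%%\n", arithmetic * 100.0);
    printf("geometric  average: %.2f%%\n", geometric * 100.0);
    return 0;
}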
A matrix consists of a set of numbers arranged in rows and columns enclosed in
brackets.
The order of a matrix gives the number of rows followed by the number of columns
in a matrix. The order of a matrix with 3 rows and 2 columns is 3 × 2, or 3 by 2.
We usually denote a matrix by a capital letter.
C is a matrix of order 2 × 4 (read as ‘2 by 4’)
Elements In An Array
Each number in the array is called an entry or an element of the matrix. When we
need to read out the elements of an array, we read it out row by row.
Each element is defined by its position in the matrix.
In a matrix A, an element in row i and column j is represented by aij
Example: Let A be the 2 × 3 matrix with first row 2, 4, 5 and second row 7, 8, 9. Then
a11 (read as 'a one one') = 2 (first row, first column)
a12 (read as 'a one two') = 4 (first row, second column)
a13 = 5, a21 = 7, a22 = 8, a23 = 9
Matrix Multiplication
There are two matrix operations
which we will use in our matrix transformations, multiplying (concatenating) two matrices,
and transforming a vector by a matrix. We will now examine the first of these two
operations, matrix multiplication.
Matrix multiplication is the operation by which one matrix is transformed by another. A very
important thing to remember is that matrix multiplication is not commutative. That is, [a] *
[b] != [b] * [a]. For now, it will suffice to say that a matrix multiplication stores the results
of the sum of the products of matrix rows and columns. Here is some example code of a
matrix multiplication routine which multiplies matrix [a] * matrix [b], then copies the result
to matrix a.
void matmult(float a[4][4], float b[4][4])
{
    float temp[4][4];                   // temporary matrix for storing result
    int i, j;                           // row and column counters

    for (j = 0; j < 4; j++)             // transform by columns first
        for (i = 0; i < 4; i++)         // then by rows
            temp[i][j] = a[i][0] * b[0][j] + a[i][1] * b[1][j] +
                         a[i][2] * b[2][j] + a[i][3] * b[3][j];

    for (i = 0; i < 4; i++)             // copy result matrix into matrix a
        for (j = 0; j < 4; j++)
            a[i][j] = temp[i][j];
}
I have been informed that there is a faster way of multiplying matrices, which involves
taking the dot product of rows and columns. However, I have yet to implement such a
method, so I will not discuss it here at this time.
Transforming a Vector by a Matrix
This is the second operation
which is required for our matrix transformations. It involves projecting a stationary vector
onto transformed axis vectors using the dot product. One dot product is performed for each
coordinate axis.
x = x0 * matrix[0][0] + y0 * matrix[1][0] + z0 * matrix[2][0] + w0 * matrix[3][0];
y = x0 * matrix[0][1] + y0 * matrix[1][1] + z0 * matrix[2][1] + w0 * matrix[3][1];
z = x0 * matrix[0][2] + y0 * matrix[1][2] + z0 * matrix[2][2] + w0 * matrix[3][2];
The x0, y0, etc. coordinates are the original object space coordinates for the vector. That is,
they never change due to transformation.
"Alright," you say. "Where did all the w coordinates come from???" Good question :) The w
coordinates come from what is known as a homogeneous coordinate system, which is
basically a way to represent 3d space in terms of a 4d matrix. Because we are limiting
ourselves to 3d, we pick a constant, nonzero value for w (1.0 is a good choice, since
anything * 1.0 = itself). If we use this identity axiom, we can eliminate a multiply from each
of the dot products:
x = x0 * matrix[0][0] + y0 * matrix[1][0] + z0 * matrix[2][0] + matrix[3][0];
y = x0 * matrix[0][1] + y0 * matrix[1][1] + z0 * matrix[2][1] + matrix[3][1];
z = x0 * matrix[0][2] + y0 * matrix[1][2] + z0 * matrix[2][2] + matrix[3][2];
These are the formulas you should use to transform a vector by a matrix.
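Wrapped up as a function, the same three dot products might look like the following sketch (the struct and function names are illustrative additions, not part of the original code):

/* Transform an object-space point (x, y, z) by a 4x4 matrix,
   assuming w = 1.0 as described above. */
typedef struct { float x, y, z; } vec3;

vec3 transform_point(const float m[4][4], vec3 p)
{
    vec3 out;
    out.x = p.x * m[0][0] + p.y * m[1][0] + p.z * m[2][0] + m[3][0];
    out.y = p.x * m[0][1] + p.y * m[1][1] + p.z * m[2][1] + m[3][1];
    out.z = p.x * m[0][2] + p.y * m[1][2] + p.z * m[2][2] + m[3][2];
    return out;
}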
Object Space Transformations
Now that we know how to multiply matrices together, we can implement the actual formulas
used in our transformations. There are three operations performed on a vector by a matrix
transformation: translation, rotation, and scaling.
Translation can best be described as linear change in position. This change can be
represented by a delta vector [dx, dy, dz], where dx represents the change in the object's x
position, dy represents the change in its y position, and dz its z position.
If done correctly, object space translation allows objects to translate forward/backward,
left/right, and up/down, relative to the current orientation of the object. Using our airplane
as an example, if the nose of the airplane is oriented along the object's local z axis, then
translating the airplane in the +z direction will make the airplane move forward (the
direction in which its nose is pointing) regardless of the airplane's orientation.
Here is the translation matrix:
|  1   0   0   0 |
|  0   1   0   0 |
|  0   0   1   0 |
| dx  dy  dz   1 |
where [dx, dy, dz] is the displacement vector. After this operation, the object will have
moved in its own coordinate system, according to the displacement (translation) vector.
The next operation that is performed by our matrix transformation is rotation. Rotation can
be described as circular motion about some axis, in this case the axis is one of the object's
local axes. Since there are three axes in each object, we need to rotate around each of
them. Here are the matrices for rotation about each axis:
about the x axis:
|  1    0    0   0 |
|  0   cx   sx   0 |
|  0  -sx   cx   0 |
|  0    0    0   1 |
about the y axis:
| cy    0  -sy   0 |
|  0    1    0   0 |
| sy    0   cy   0 |
|  0    0    0   1 |
about the z axis:
|  cz   sz   0   0 |
| -sz   cz   0   0 |
|   0    0   1   0 |
|   0    0   0   1 |
The cx, sx, cy, sy, cz, and sz variables are the values of the cosines and sines of the angles
of rotation about the x, y, and z axes, respectively. Remember that the angles used represent
angular displacement just as the values used in the translation step denote a linear
displacement. Correct transformation CANNOT be accomplished with matrix multiplication if
you use the cumulative angles of rotation. I have been told that quaternions are able to
perform this operation correctly, however I know nothing of quaternions and how they are
implemented. The incremental angles used here represent rotation from the current object
orientation. In other words, by rotating 1 degree about the z axis, you are telling your
object "Rotate 1 degree about your z axis, regardless of your current orientation, and
regardless of how you got to that orientation." If you think about it a bit, you will realize
that this is how the real world operates. In object space, the series of rotations an object
undergoes to attain a certain orientation have no effect on the object space results of any
upcoming rotations.
Now that we know the matrix formulas for translation and rotation, we can combine them to
transform our objects. The formula for transformations in object space is
[O] = [O] * [T] * [X] * [Y] * [Z]
where O is the object's matrix, T is the translation matrix, and X, Y, and Z are the rotation
matrices for their respective axes. Remember, that order of matrix multiplication is very
important!
The recursive assignment of O poses a question: What is the original value of the object
matrix? To eliminate any terrible errors in transformation, the matrices which store an
object's orientation should always be initialized to identity.
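As a rough sketch of how these pieces fit together (the helper names other than matmult are my own), the object matrix can be initialized to identity and then concatenated with translation and rotation matrices using the matmult routine shown earlier:

#include <math.h>
#include <string.h>

void matmult(float a[4][4], float b[4][4]);   /* defined earlier: a = a * b */

/* Set a 4x4 matrix to identity. */
void identity(float m[4][4])
{
    memset(m, 0, 16 * sizeof(float));
    m[0][0] = m[1][1] = m[2][2] = m[3][3] = 1.0f;
}

/* Build a translation matrix for the displacement vector (dx, dy, dz). */
void translation(float m[4][4], float dx, float dy, float dz)
{
    identity(m);
    m[3][0] = dx;  m[3][1] = dy;  m[3][2] = dz;
}

/* Build a rotation matrix about the local z axis; az is in radians. */
void rotation_z(float m[4][4], float az)
{
    float cz = cosf(az), sz = sinf(az);
    identity(m);
    m[0][0] =  cz;  m[0][1] = sz;
    m[1][0] = -sz;  m[1][1] = cz;
}

/* One object-space update step: O = O * T * Z (X and Y rotations omitted). */
void update_object(float obj[4][4], float dx, float dy, float dz, float az)
{
    float t[4][4], z[4][4];
    translation(t, dx, dy, dz);
    rotation_z(z, az);
    matmult(obj, t);   /* obj = obj * T */
    matmult(obj, z);   /* obj = obj * Z */
}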
Matrix Multiplication
You probably know what a matrix is already if you are interested in matrix multiplication.
However, a quick example won't hurt. A matrix is just a two-dimensional group of
numbers. Instead of a list, called a vector, a matrix is a rectangle, like the following:
You can set a variable to be a matrix just as you can set a variable to be a number. In
this case, x is the matrix containing those four numbers (in that particular order). Now,
suppose you have two matrices that you need to multiply. Multiplication for numbers is
pretty easy, but how do you do it for a matrix?
Here is a key point: You cannot just multiply each number by the corresponding
number in the other matrix. Matrix multiplication is not like addition or subtraction. It is
more complicated, but the overall process is not hard to learn. Here's an example first,
and then I'll explain what I did:
Example: Multiply the 2 × 2 matrix with rows (1, 6) and (3, 8) by the 2 × 2 matrix with rows (2, 2) and (9, 7).
Solution: The product is the 2 × 2 matrix with rows (56, 44) and (78, 62).
You're probably wondering how in the world I got that answer. Well you're justified in
thinking that. Matrix multiplication is not an easy task to learn, and you do need to pay
attention to avoid a careless error or two. Here's the process:
• Step 1: Move across the top row of the first matrix, and down the first column of
the second matrix:
• Step 2: Multiply each number from the top row of the first matrix by the number in
the first column on the second matrix. In this case, that means multiplying 1*2
and 6*9. Then, take the sum of those values (2+54):
• Step 3: Insert the value you just got into the answer matrix. Since we are
multiplying the 1st row and the 1st column, our answer goes into that slot in the
answer matrix:
• Step 4: Repeat for the other rows and columns. That means you need to walk
down the first row of the first matrix and this time the second column of the
second matrix. Then the second row of the first matrix and the first column of
the second, and finally the bottom of the first matrix and the right column of the
second matrix:
• Step 5: Insert all of those values into the answer matrix. I just showed you how to
do top left and the bottom right. If you work the other two numbers, you will get
1*2+6*7=44 and 3*2+8*9=78. Insert them into the answer matrix in the
corresponding positions and you get:
Now I know what you're thinking. That was really hard!!! Well it will seem that way until
you get used to the process. It may help you to write out all your work, and even draw
arrows to remember which way you're moving in the rows and columns. Just remember
to multiply each row in the first matrix by each column in the second matrix.
What if the matrices aren't squares? Then you have to add another step. In order to
multiply two matrices, the matrix on the left must have as many columns as the matrix
on the right has rows. That way you can match up each pair while you're multiplying.
The size of the final matrix is determined by the rows in the left matrix and the columns
in the right. Here's what I do:
I write down the sizes of the matrices. The left matrix has 2 rows and 3 columns, so
that's how we write it. Rows, columns, in that order. The other matrix is a 3x1 matrix
because it has 3 rows and just 1 column. If the numbers in the middle match up you can
multiply. The outside numbers give you the size of the answer. Even if you mess this up
you'll figure it out eventually because you won't be able to multiply.
Here's an important reminder: Matrix Multiplication is not commutative. That means
you cannot switch the order and expect the same result! Regular multiplication tells
us that 4*3=3*4, but this is not multiplication in the usual sense.
Finally, here's an example with uneven matrix sizes to wrap things up:
Example:
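As an illustration of the size rule, here is a small C sketch (the numbers are chosen only for illustration) that multiplies a 2 × 3 matrix by a 3 × 1 matrix to get a 2 × 1 result:

#include <stdio.h>

int main(void)
{
    /* Left matrix: 2 rows, 3 columns.  Right matrix: 3 rows, 1 column. */
    /* The inner numbers (3 and 3) match, so the product is 2 x 1.      */
    double left[2][3]  = { { 1, 2, 3 },
                           { 4, 5, 6 } };
    double right[3][1] = { { 7 },
                           { 8 },
                           { 9 } };
    double result[2][1] = { { 0 }, { 0 } };

    for (int i = 0; i < 2; i++)           /* each row of the left matrix  */
        for (int j = 0; j < 1; j++)       /* each column of the right one */
            for (int k = 0; k < 3; k++)   /* walk across the row and down the column */
                result[i][j] += left[i][k] * right[k][j];

    printf("%.0f\n%.0f\n", result[0][0], result[1][0]);  /* prints 50 and 122 */
    return 0;
}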
Lab 1: Matrix Calculation Examples
Given the Following Matrices:
A=
1.00 2.00 3.00
4.00 5.00 6.00
7.00 8.00 9.00
B=
1.00 1.00 1.00
2.00 2.00 2.00
C=
1.00 2.00 1.00
3.00 2.00 2.00
1.00 5.00 3.00
1) Calculate A + C
2) Calculate A - C
3) Calculate A * C (A times C)
4) Calculate B * A (B times A)
5) Calculate A .* C (A element by element multiplication with C)
6) Inverse of C
Matrix Calculation Examples - Answers
A + C=
2.00 4.00 4.00
7.00 7.00 8.00
8.00 13.00 12.00
A - C=
0.00 0.00 2.00
1.00 3.00 4.00
6.00 3.00 6.00
A * C=
10.00 21.00 14.00
25.00 48.00 32.00
40.00 75.00 50.00
B * A=
12.00 15.00 18.00
24.00 30.00 36.00
Element by element multiplication A .* C=
1.00 4.00 3.00
12.00 10.00 12.00
7.00 40.00 27.00
Inverse of C=
0.80 0.20 -0.40
1.40 -0.40 -0.20
-2.60 0.60 0.80
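If you want to check these results by machine, here is a minimal C sketch (written as an illustration; the variable names are my own) that recomputes A * C from the matrices given above:

#include <stdio.h>

int main(void)
{
    double A[3][3] = { { 1, 2, 3 }, { 4, 5, 6 }, { 7, 8, 9 } };
    double C[3][3] = { { 1, 2, 1 }, { 3, 2, 2 }, { 1, 5, 3 } };
    double P[3][3] = { { 0 } };

    /* Standard row-by-column product: P = A * C. */
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++)
            for (int k = 0; k < 3; k++)
                P[i][j] += A[i][k] * C[k][j];

    for (int i = 0; i < 3; i++)
        printf("%6.2f %6.2f %6.2f\n", P[i][0], P[i][1], P[i][2]);
    /* Expected output matches the answer above:
       10.00  21.00  14.00
       25.00  48.00  32.00
       40.00  75.00  50.00 */
    return 0;
}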
Matrix Calculation Assignment
Given the Following Matrices:
A=
1.00 2.00 3.00
6.00 5.00 4.00
9.00 8.00 7.00
B=
2.00 2.00 2.00
3.00 3.00 3.00
C=
1.00 2.00 1.00
4.00 3.00 1.00
3.00 4.00 2.00
1) Calculate A + C
2) Calculate A - C
3) Calculate A * C (A times C)
4) Calculate B * A (B times A)
5) Calculate A .* C (A element by element multiplication with C)
Vector product VS dot product in matrix
hi, i don't really understand whats the difference between vector product and
dot product in matrix form.
for example
(1 2) X (1 2)
(3 4) (3 4) = ?
so when i take rows multiply by columns, to get a 2x2 matrix, i am doing
vector product?
so what then is dot producT?
lastly, my notes says |detT| = final area of basic box/ initial area of basic box
where detT = (Ti) x (Tj) . (Tk)
so, whats the difference between how i should work out x VS . ?
also, |detT| = magnitude of T right? so is there a formula i should use to find
magnitude?
so why is |k . k| = 1?
thanks
Apr25-10, 07:35 AM Last edited by HallsofIvy; Apr27-10 at 07:42 AM.. #2
HallsofIvy
Re: Vector product VS dot product in matrix
Originally Posted by quietrain
hi, i don't really understand whats the difference between
vector product and dot product in matrix form.
for example
(1 2) X (1 2)
(3 4) (3 4) = ?
so when i take rows multiply by columns, to get a 2x2
matrix, i am doing vector product?
No, you are doing a "matrix product". There are no vectors
here.
so what then is dot producT?
With matrices? It isn't anything. The matrix product is the only
multiplication defined for matrices. The dot product is defined
for vectors, not matrices.
lastly, my notes says |detT| = final area of basic box/ initial
area of basic box
where detT = (Ti) x (Tj) . (Tk)
Well, we don't have your notes so we have no idea what "T",
"Ti", "Tj", "Tk" are nor do we know what a "basic box" is.
I do know that if you have a "parallelepiped" with adjacent
sides given by the vectors u, v, and w, then the volume (not
area) of the parallelepiped is given by the "triple
product" u · (v × w), which can be represented by a determinant
having the components of the vectors as rows. That has
nothing at all to do with matrices.
so, whats the difference between how i should work out x
VS . ?
also, |detT| = magnitude of T right?
No, "det" applies only to square arrays for which "magnitude"
is not defined.
so is there a formula i should use to find magnitude?
so why is |k . k| = 1?
thanks
I guess you mean "k" to be the unit vector in the z direction
in a three dimensional coordinate system. If so, then k.k is,
by definition, the square of the length of k, which is, again by definition of
"unit vector", 1.
You seem to be confusing a number of very different concepts.
Go back and review.
Apr27-10, 01:02 AM #3
quietrain
Re: Vector product VS dot product in matrix
oh.. em..
ok lets say we have
(1 2) x (4 5)
(3 4) (6 7) = so this is just rows multiply by column to get a 2x2
matrix right? so what is the difference if i replace the x sign with the
dot sign now. do i still get the same?
i presume one is cross (x) product , one is dot (.) product? or is it for
matrix there is no such things as cross or dot product? thats weird. my
tutor tells us to know the difference between cross and dot matrix
product
so for the case of the parallelpiped, whats the significance of the triple
product (u x v) .w? why do we use x for u&v but . for w?
is it just to tell us that we have to use sin and cos respectively? but if u
v and w were square matrix, then there won't be any sin and cos to
use? so we just multiply as usual rows by columns?
oh by definition . so that means |k.k| = (k)(k)cos(0) = (1)(1)cos(0) =
1
so |i.k| = (1)(1)cos(90) = 0 ?
so if i x k gives us -j by the right hand rule, then does it mean the
magnitude, which is |i.k| = 0 is 0? in the direction of the -j?? or are
they 2 totally different aspects?
btw, sry for another question,
why is e^(wA),
where A =
(0 -1)
(1 0)
can be expressed as
( cosw -sinw)
( sinw cosw)
which is the rotational matrix anti-clockwise about the x-axis right?
thanks
Apr27-10, 08:08 AM Last edited by HallsofIvy; Apr27-10 at 08:16 AM.. #4
HallsofIvy
Re: Vector product VS dot product in matrix
Originally Posted by quietrain
oh.. em..
ok lets say we have
(1 2) x (4 5)
(3 4) (6 7) = so this is just rows multiply by column to get a 2x2
matrix right? so what is the difference if i replace the x sign with the
dot sign now. do i still get the same?
You can replace it by whatever symbol you like. As long as your
multiplication is "matrix multiplication" you will get the same result.
i presume one is cross (x) product , one is dot (.) product?
No, just changing the symbol doesn't make it one or the other.
or is it for matrix there is no such things as cross or dot product?
thats weird. my tutor tells us to know the difference between cross
and dot matrix product
I suspect your tutor was talking about vectors not matrices.
so for the case of the parallelpiped, whats the significance of the
triple product (u x v) .w? why do we use x for u&v but . for w?
Because you are talking about vectors not matrices!
is it just to tell us that we have to use sin and cos respectively? but
if u v and w were square matrix, then there won't be any sin and cos
to use? so we just multiply as usual rows by columns?
They are NOT matrices, they are vectors!!
You can think of vectors as "row matrices" (1 by n) or "column
matrices" (n by 1) but they still have properties that matrices in
general do not have.
oh by definition . so that means |k.k| = (k)(k)cos(0) = (1)(1)cos(0)
= 1
so |i.k| =(1)(1)cos(90) = 0 ?
Yes, that is correct.
so if i x k gives us -j by the right hand rule, then does it mean the
magnitude, which is |i.k| = 0 is 0? in the direction of the -j?? or are
they 2 totally different aspects?
No, the length of i x k is NOT |i.k|, it is |i||j|.
In general, the length of u x v is |u||v| sin(θ), where θ is the angle
between u and v.
btw, sry for another question,
why is e^(wA),
where A =
(0 -1)
(1 0)
can be expressed as
( cosw -sinw)
( sinw cosw)
which is the rotational matrix anti-clockwise about the x-axis right?
thanks
For objects other than numbers, where we have a notion of addition
and multiplication, we define higher functions by using their "Taylor
series", power series that are equal to the functions. In
particular, e^(wA) = I + wA + (wA)^2/2! + (wA)^3/3! + ... .
It should be easy to calculate that A^2 = -I and A^4 = I,
and, since that is the identity matrix, it all repeats:
A^5 = A, A^6 = -I, A^7 = -A, A^8 = I, etc.
That gives
e^(wA) = I (1 - w^2/2! + w^4/4! - ...) + A (w - w^3/3! + w^5/5! - ...),
and you should be able to recognise those as the Taylor's series about
0 for cos(w) and sin(w).
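A quick numerical check of this, as a hedged C sketch (variable names are illustrative): sum the first terms of the series for e^(wA) with A = [[0, -1], [1, 0]] and compare the result with cos(w) and sin(w).

#include <stdio.h>
#include <math.h>

int main(void)
{
    double w = 0.7;                               /* any test angle */
    double A[2][2]    = { { 0, -1 }, { 1, 0 } };
    double term[2][2] = { { 1, 0 }, { 0, 1 } };   /* current term, starts at I */
    double E[2][2]    = { { 1, 0 }, { 0, 1 } };   /* running sum of the series */

    for (int n = 1; n <= 20; n++) {
        /* term = term * (wA) / n, the next term of the exponential series */
        double next[2][2] = { { 0, 0 }, { 0, 0 } };
        for (int i = 0; i < 2; i++)
            for (int j = 0; j < 2; j++)
                for (int k = 0; k < 2; k++)
                    next[i][j] += term[i][k] * (w * A[k][j]) / n;
        for (int i = 0; i < 2; i++)
            for (int j = 0; j < 2; j++) {
                term[i][j] = next[i][j];
                E[i][j] += term[i][j];
            }
    }

    printf("e^(wA)   = [% .6f % .6f; % .6f % .6f]\n",
           E[0][0], E[0][1], E[1][0], E[1][1]);
    printf("rotation = [% .6f % .6f; % .6f % .6f]\n",
           cos(w), -sin(w), sin(w), cos(w));
    return 0;
}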
Apr27-10, 09:15 AM #5
quietrain
Re: Vector product VS dot product in matrix
wow.
ok, i went to check again what my tutor said and it was
"scalar and vector products in terms of matrices". so what does he
mean by this?
the scalar product is (A B C) x (D E F)^T
(so we can take the transpose
of DEF because it is symmetric matrix? or is it for some other
reason?)
so rows multiply by columns again?
but what about vector product?
for the parallelpiped, (u x v).w
so lets say u = (1,1) , v = (2,2), w = (3,3)
so u x v = (1x2, 1x2)sin(angle between vectors)
so .w = (2x3,2x3) cos(angle) ?
so if it yields 0, that vector w lies in the plane define by u and v, but if
its otherwise, then w doesn't lie in the plane of u v ?
for i x k, why is the length |i||j|? why is j introduced here? shouldn't it
be |i||k|sin(90) = 1?
oh i see.. so the right hand rule gives the direction but the magnitude
for i x k = |i||k|sin(90) = 1?
thanks a ton!
Apr27-10, 11:40 AM #6
HallsofIvy
Re: Vector product VS dot product in matrix
Originally Posted by quietrain
wow.
ok, i went to check again what my tutor said and it was
"scalar and vector products in terms of matrices". so what does he
mean by this?
the scalar product is (A B C) x (D E F)^T
(so we can take the
transpose of DEF because it is symmetric matrix? or is it for some
other reason?)
so rows multiply by columns again?
Okay, you think of one vector as a row matrix and the other as a
column matrix then the "dot product" is the matrix product
But the dot product is commutative isn't it? Does it really make sense
to treat the two vectors as different kinds of matrices? It is really
better here to think of this not as the product of two vectors but a
vector in a vector space and functional in the dual space.
but what about vector product
for the parallelpiped, (u x v).w
so lets say u = (1,1) , v = (2,2), w = (3,3)
so u x v = (1x2, 1x2)sin(angle between vectors)
so .w = (2x3,2x3) cos(angle) ?
so if it yields 0, that vector w lies in the plane define by u and v, but
if its otherwise, then w doesn't lie in the plane of u v ?
for i x k, why is the length |i||j|? why is j introduced here? shouldn't
it be |i||k|sin(90) = 1?
Yes, that was a typo. I meant |i||k|.
oh i see.. so the right hand rule gives the direction but the
magnitude for i x k = |i||k|sin(90) = 1?
Yes.
thanks a ton!
Matrix (mathematics)
From Wikipedia, the free encyclopedia
Specific entries of a matrix are often referenced by using pairs of subscripts.
In mathematics, a matrix (plural matrices, or less commonly matrixes) is a
rectangular array of numbers, such as
An item in a matrix is called an entry or an element. The example has entries 1,
9, 13, 20, 55, and 4. Entries are often denoted by a variable with two subscripts,
as shown on the right. Matrices of the same size can be added and subtracted
entrywise and matrices of compatible sizes can be multiplied. These operations
have many of the properties of ordinary arithmetic, except that matrix
multiplication is not commutative, that is, AB and BA are not equal in general.
Matrices consisting of only one column or row define the components of vectors,
while higher-dimensional (e.g., three-dimensional) arrays of numbers define the
components of a generalization of a vector called a tensor. Matrices with entries
in other fields or rings are also studied.
Matrices are a key tool in linear algebra. One use of matrices is to
represent linear transformations, which are higher-dimensional analogs of linear
functions of the form f(x) = cx, where c is a constant; matrix multiplication
corresponds to composition of linear transformations. Matrices can also keep
track of the coefficients in a system of linear equations. For a square matrix,
the determinant and inverse matrix (when it exists) govern the behavior of
solutions to the corresponding system of linear equations, and eigenvalues and
eigenvectors provide insight into the geometry of the associated linear
transformation.
Matrices find many applications. Physics makes use of matrices in various
domains, for example in geometrical optics and matrix mechanics; the latter led
to studying in more detail matrices with an infinite number of rows and
columns. Graph theory uses matrices to keep track of distances between pairs
of vertices in a graph. Computer graphics uses matrices to project 3-
dimensional space onto a 2-dimensional screen. Matrix calculus generalizes
classical analytical notions such as derivatives of functions or exponentials to
matrices. The latter is a recurring need in solving ordinary differential
equations. Serialism and dodecaphonism are musical movements of the 20th
century that use a square mathematical matrix to determine the pattern of music
intervals.
A major branch of numerical analysis is devoted to the development of efficient
algorithms for matrix computations, a subject that is centuries old but still an
active area of research. Matrix decomposition methods simplify computations,
both theoretically and practically. For sparse matrices, specifically tailored
algorithms can provide speedups; such matrices arise in the finite element
method, for example.
Definition
A matrix is a rectangular arrangement of numbers.[1]
For example,
An alternative notation uses large parentheses instead of box brackets:
The horizontal and vertical lines in a matrix are called rows and columns,
respectively. The numbers in the matrix are called its entries or its elements. To
specify a matrix's size, a matrix with m rows and n columns is called an m-by-n matrix
or m × n matrix, while m and n are called its dimensions. The above is a 4-by-3
matrix.
A matrix with one row (a 1 × n matrix) is called a row vector, and a matrix with one
column (an m × 1 matrix) is called a column vector. Any row or column of a matrix
determines a row or column vector, obtained by removing all other rows respectively
columns from the matrix. For example, the row vector for the third row of the above
matrix A is
When a row or column of a matrix is interpreted as a value, this refers to the
corresponding row or column vector. For instance one may say that two different
rows of a matrix are equal, meaning they determine the same row vector. In some
cases the value of a row or column should be interpreted just as a sequence of
values (an element of R^n if entries are real numbers) rather than as a matrix, for
instance when saying that the rows of a matrix are equal to the corresponding
columns of its transpose matrix.
Most of this article focuses on real and complex matrices, i.e., matrices whose
entries are real or complex numbers. More general types of entries are
discussed below.
Notation
The specifics of matrix notation vary widely, with some prevailing trends.
Matrices are usually denoted using upper-case letters, while the
corresponding lower-case letters, with two subscript indices, represent the entries. In
addition to using upper-case letters to symbolize matrices, many authors use a
special typographical style, commonly boldface upright (non-italic), to further
distinguish matrices from other variables. An alternative notation involves the use of
a double-underline with the variable name, with or without boldface style.
The entry that lies in the i-th row and the j-th column of a matrix is typically referred
to as the i,j, (i,j), or (i,j)-th entry of the matrix. For example, the (2,3) entry of the above
matrix A is 7. The (i,j)-th entry of a matrix A is most commonly written as ai,j.
Alternative notations for that entry are A[i,j] or Ai,j.
Sometimes a matrix is referred to by giving a formula for its (i,j)-th entry, often with
double parentheses around the formula for the entry; for example, if the (i,j)-th entry of
A were given by aij, A would be denoted ((aij)).
An asterisk is commonly used to refer to whole rows or columns in a matrix. For
example, ai,∗ refers to the i-th row of A, and a∗,j refers to the j-th column of A. The set of
all m-by-n matrices is denoted M(m, n).
A common shorthand is
A = [ai,j]i=1,...,m; j=1,...,n or more briefly A = [ai,j]m×n
to define an m × n matrix A. Usually the entries ai,j are defined separately for all
integers 1 ≤ i ≤ m and 1 ≤ j ≤ n. They can however sometimes be given by one
formula; for example the 3-by-4 matrix
can alternatively be specified by A = [i − j]i=1,2,3; j=1,...,4, or simply A = ((i-j)), where the
size of the matrix is understood.
Some programming languages start the numbering of rows and columns at zero, in
which case the entries of an m-by-n matrix are indexed by 0 ≤ i ≤ m − 1 and 0
≤ j ≤ n − 1.[2]
This article follows the more common convention in mathematical
writing where enumeration starts from 1.
Basic operations
Main articles: Matrix addition, Scalar multiplication, Transpose, and Row operations
There are a number of operations that can be applied to modify matrices
called matrix addition, scalar multiplication and transposition.[3]
These form the basic
techniques to deal with matrices.
Operation: Addition. The sum A + B of two m-by-n matrices A and B is calculated
entrywise: (A + B)i,j = Ai,j + Bi,j, where 1 ≤ i ≤ m and 1 ≤ j ≤ n.
Operation: Scalar multiplication. The scalar multiplication cA of a matrix A and a
number c (also called a scalar in the parlance of abstract algebra) is given by
multiplying every entry of A by c: (cA)i,j = c · Ai,j.
Operation: Transpose. The transpose of an m-by-n matrix A is the n-by-m matrix A^T
(also denoted A^tr or ^tA) formed by turning rows into columns and vice versa:
(A^T)i,j = Aj,i.
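For readers who like code, a minimal C sketch of these three operations on small matrices (the function names are illustrative, not standard library calls):

/* Entrywise sum of two m-by-n matrices: (A + B)[i][j] = A[i][j] + B[i][j]. */
void mat_add(int m, int n, double A[m][n], double B[m][n], double S[m][n])
{
    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++)
            S[i][j] = A[i][j] + B[i][j];
}

/* Scalar multiplication: (cA)[i][j] = c * A[i][j]. */
void mat_scale(int m, int n, double c, double A[m][n], double S[m][n])
{
    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++)
            S[i][j] = c * A[i][j];
}

/* Transpose: the m-by-n matrix A becomes the n-by-m matrix T. */
void mat_transpose(int m, int n, double A[m][n], double T[n][m])
{
    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++)
            T[j][i] = A[i][j];
}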
Familiar properties of numbers extend to these operations of matrices: for example,
addition is commutative, i.e. the matrix sum does not depend on the order of the
summands: A + B = B + A.[4] The transpose is compatible with addition and scalar
multiplication, as expressed by (cA)^T = c(A^T) and (A + B)^T = A^T + B^T. Finally,
(A^T)^T = A.
Row operations are ways to change matrices. There are three types of row
operations: row switching, that is interchanging two rows of a matrix, row
multiplication, multiplying all entries of a row by a non-zero constant and finally row
addition which means adding a multiple of a row to another row. These row
operations are used in a number of ways including solving linear equations and
finding inverses.
Matrix multiplication, linear equations and linear transformations
Main article: Matrix multiplication
Schematic depiction of the matrix product AB of two matrices A and B.
Multiplication of two matrices is defined only if the number of columns of the left
matrix is the same as the number of rows of the right matrix. If A is an m-by-n matrix
and B is an n-by-p matrix, then their matrix product AB is the m-by-p matrix whose
entries are given by the dot product of the corresponding row of A and the corresponding
column of B:
(AB)i,j = Ai,1 B1,j + Ai,2 B2,j + ... + Ai,n Bn,j,
where 1 ≤ i ≤ m and 1 ≤ j ≤ p.[5]
For example (the underlined entry 1 in the product is
calculated as the product 1 · 1 + 0 · 1 + 2 · 0 = 1):
Matrix multiplication satisfies the rules (AB)C = A(BC) (associativity), and
(A+B)C = AC+BC as well as C(A+B) = CA+CB (left and right distributivity),
whenever the size of the matrices is such that the various products are defined.
[6]
The product AB may be defined without BA being defined, namely
if A and B are m-by-n and n-by-k matrices, respectively, and m ≠ k. Even if both
products are defined, they need not be equal, i.e. generally one has
AB ≠ BA,
i.e., matrix multiplication is not commutative, in marked contrast to (rational, real, or
complex) numbers whose product is independent of the order of the factors. An
example of two matrices not commuting with each other is:
whereas
The identity matrix In of size n is the n-by-n matrix in which all the elements on
the main diagonal are equal to 1 and all other elements are equal to 0, e.g.
It is called identity matrix because multiplication with it leaves a matrix
unchanged: MIn = ImM = M for any m-by-n matrix M.
Besides the ordinary matrix multiplication just described, there exist other less
frequently used operations on matrices that can be considered forms of
multiplication, such as the Hadamard product and the Kronecker product.[7]
They arise
in solving matrix equations such as the Sylvester equation.
Linear equations
Main articles: Linear equation and System of linear equations
A particular case of matrix multiplication is tightly linked to linear equations:
if x designates a column vector (i.e. n×1-matrix) of n variables x1, x2, ..., xn, and A is
an m-by-n matrix, then the matrix equation
Ax = b,
where b is some m×1-column vector, is equivalent to the system of linear equations
A1,1x1 + A1,2x2 + ... + A1,nxn = b1
...
Am,1x1 + Am,2x2 + ... + Am,nxn = bm .[8]
This way, matrices can be used to compactly write and deal with multiple linear
equations, i.e. systems of linear equations.
Linear transformations
Main articles: Linear transformation and Transformation matrix
Matrices and matrix multiplication reveal their essential features when related
to linear transformations, also known as linear maps. A real m-by-n matrix A gives
rise to a linear transformation R^n → R^m mapping each vector x in R^n to the (matrix)
product Ax, which is a vector in R^m. Conversely, each linear
transformation f: R^n → R^m arises from a unique m-by-n matrix A: explicitly, the (i, j)-
entry of A is the i-th coordinate of f(ej), where ej = (0,...,0,1,0,...,0) is the unit vector with
1 in the j-th position and 0 elsewhere. The matrix A is said to represent the linear
map f, and A is called the transformation matrix of f.
The following table shows a number of 2-by-2 matrices with the associated linear
maps of R^2. The blue original is mapped to the green grid and shapes; the origin
(0,0) is marked with a black point.
(Examples pictured: vertical shear with m = 1.25; horizontal flip; squeeze mapping with r = 3/2; scaling by a factor of 3/2; rotation by π/6 = 30°.)
Under the 1-to-1 correspondence between matrices and linear maps, matrix
multiplication corresponds to composition of maps:[9] if a k-by-m matrix B represents
another linear map g: R^m → R^k, then the composition g ∘ f is represented
by BA since
(g ∘ f)(x) = g(f(x)) = g(Ax) = B(Ax) = (BA)x.
The last equality follows from the above-mentioned associativity of matrix
multiplication.
The rank of a matrix A is the maximum number of linearly independent row vectors of
the matrix, which is the same as the maximum number of linearly independent
column vectors.[10] Equivalently it is the dimension of the image of the linear map
represented by A.[11] The rank-nullity theorem states that the dimension of the kernel
of a matrix plus the rank equals the number of columns of the matrix.[12]
Square matrices
A square matrix is a matrix which has the same number of rows and columns. An n-
by-n matrix is known as a square matrix of order n. Any two square matrices of the
same order can be added and multiplied. A square matrix A is
called invertible or non-singular if there exists a matrix B such that
AB = In.[13] This is equivalent to BA = In.[14] Moreover, if B exists, it is unique and is called
the inverse matrix of A, denoted A^−1.
The entries Ai,i form the main diagonal of a matrix. The trace, tr(A) of a square
matrix A is the sum of its diagonal entries. While, as mentioned above, matrix
multiplication is not commutative, the trace of the product of two matrices is
independent of the order of the factors: tr(AB) = tr(BA).[15]
If all entries outside the main diagonal are zero, A is called a diagonal matrix. If
all entries above (below) the main diagonal are zero, A is called a
lower triangular matrix (upper triangular matrix, respectively).
Determinant
Main article: Determinant
A linear transformation on R^2 given by the indicated matrix. The determinant
of this matrix is −1, as the area of the green parallelogram at the right is 1,
but the map reverses the orientation, since it turns the counterclockwise
orientation of the vectors to a clockwise one.
The determinant det(A) or |A| of a square matrix A is a number encoding
certain properties of the matrix. A matrix is invertible if and only if its
determinant is nonzero. Its absolute value equals the area (in R^2) or volume
(in R^3) of the image of the unit square (or cube), while its sign corresponds
to the orientation of the corresponding linear map: the determinant is
positive if and only if the orientation is preserved.
The determinant of a 2-by-2 matrix with rows (a, b) and (c, d) is given by ad − bc;
the determinant of 3-by-3 matrices involves 6 terms (rule of Sarrus). The more
lengthy Leibniz formula generalises these two formulae to all dimensions.[16]
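As a quick illustration (the helper names are my own), the 2-by-2 and 3-by-3 cases can be coded directly in C:

/* Determinant of a 2-by-2 matrix with rows (a, b) and (c, d): ad - bc. */
double det2(double a, double b, double c, double d)
{
    return a * d - b * c;
}

/* Determinant of a 3-by-3 matrix by the rule of Sarrus (6 terms),
   equivalently a Laplace expansion along the first row. */
double det3(const double m[3][3])
{
    return m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
         - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
         + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
}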
The determinant of a product of square matrices equals the product of their
determinants: det(AB) = det(A) · det(B).[17]
Adding a multiple of any row to another
row, or a multiple of any column to another column, does not change the
determinant. Interchanging two rows or two columns affects the determinant by
multiplying it by −1.[18]
Using these operations, any matrix can be transformed to a
lower (or upper) triangular matrix, and for such matrices the determinant equals the
product of the entries on the main diagonal; this provides a method to calculate the
determinant of any matrix. Finally, the Laplace expansion expresses the determinant
in terms of minors, i.e., determinants of smaller matrices.[19]
This expansion can be
used for a recursive definition of determinants (taking as starting case the
determinant of a 1-by-1 matrix, which is its unique entry, or even the determinant of a
0-by-0 matrix, which is 1), that can be seen to be equivalent to the Leibniz formula.
Determinants can be used to solve linear systems using Cramer's rule, where the
division of the determinants of two related square matrices equates to the value of
each of the system's variables.[20]
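For a 2-by-2 system this is short enough to write out in full; a hedged C sketch (the function name solve2 is my own) solving Ax = b by Cramer's rule:

#include <stdio.h>

/* Solve the 2-by-2 system A x = b by Cramer's rule.
   Returns 0 on success, -1 if det(A) is zero (no unique solution). */
int solve2(const double A[2][2], const double b[2], double x[2])
{
    double det = A[0][0] * A[1][1] - A[0][1] * A[1][0];
    if (det == 0.0)
        return -1;
    /* Each unknown is a ratio of two determinants. */
    x[0] = (b[0] * A[1][1] - A[0][1] * b[1]) / det;
    x[1] = (A[0][0] * b[1] - b[0] * A[1][0]) / det;
    return 0;
}

int main(void)
{
    double A[2][2] = { { 2, 1 }, { 1, 3 } };   /* 2x + y = 5, x + 3y = 10 */
    double b[2] = { 5, 10 };
    double x[2];
    if (solve2(A, b, x) == 0)
        printf("x = %f, y = %f\n", x[0], x[1]);  /* x = 1, y = 3 */
    return 0;
}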
Eigenvalues and eigenvectors
Main article: Eigenvalues and eigenvectors
A number λ and a non-zero vector v satisfying
Av = λv
are called an eigenvalue and an eigenvector of A, respectively.[nb 1][21] The number λ
is an eigenvalue of an n×n-matrix A if and only if A − λIn is not invertible, which
is equivalent to det(A − λIn) = 0.[22]
The function pA(t) = det(A − tI) is called the characteristic polynomial of A;
its degree is n. Therefore pA(t) has at most n different roots, i.e., eigenvalues of the
matrix.[23] They may be complex even if the entries of A are real. According to
the Cayley-Hamilton theorem, pA(A) = 0, that is to say, the characteristic polynomial
applied to the matrix itself yields the zero matrix.
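For a 2-by-2 matrix the characteristic polynomial is a quadratic, so the eigenvalues can be computed directly; a small C sketch of my own, assuming the eigenvalues are real:

#include <stdio.h>
#include <math.h>

int main(void)
{
    /* Eigenvalues of A = [[a, b], [c, d]] are the roots of
       t^2 - (a + d) t + (ad - bc) = 0, i.e. of det(A - tI) = 0. */
    double a = 2, b = 1, c = 1, d = 2;
    double trace = a + d;
    double det   = a * d - b * c;
    double disc  = trace * trace - 4.0 * det;   /* assumed non-negative here */

    double l1 = (trace + sqrt(disc)) / 2.0;
    double l2 = (trace - sqrt(disc)) / 2.0;
    printf("eigenvalues: %f and %f\n", l1, l2);  /* 3 and 1 for this A */
    return 0;
}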
Symmetry
A square matrix A that is equal to its transpose, i.e. A = A^T, is a symmetric matrix; if
it is equal to the negative of its transpose, i.e. A = −A^T, then it is a skew-symmetric
matrix. In complex matrices, symmetry is often replaced by the concept of Hermitian
matrices, which satisfy A∗ = A, where the star or asterisk denotes the conjugate
transpose of the matrix, i.e. the transpose of the complex conjugate of A.
By the spectral theorem, real symmetric matrices and complex Hermitian matrices
have an eigenbasis; i.e., every vector is expressible as a linear combination of
eigenvectors. In both cases, all eigenvalues are real.[24]
This theorem can be
generalized to infinite-dimensional situations related to matrices with infinitely many
rows and columns, see below.
Definiteness
Matrix A; definiteness; associated quadratic form QA(x,y); set of vectors (x,y) such that QA(x,y) = 1:
positive definite: QA(x,y) = 1/4 x^2 + y^2; the set is an ellipse.
indefinite: QA(x,y) = 1/4 x^2 − 1/4 y^2; the set is a hyperbola.
A symmetric n×n-matrix is called positive-definite (respectively negative-definite;
indefinite), if for all nonzero vectors x ∈ R^n the associated quadratic form given by
Q(x) = x^T A x
takes only positive values (respectively only negative values; both some negative
and some positive values).[25] If the quadratic form takes only non-negative
(respectively only non-positive) values, the symmetric matrix is called positive-
semidefinite (respectively negative-semidefinite); hence the matrix is indefinite
precisely when it is neither positive-semidefinite nor negative-semidefinite.
A symmetric matrix is positive-definite if and only if all its eigenvalues are positive.[26]
The table at the right shows two possibilities for 2-by-2 matrices.
Allowing as input two different vectors instead yields the bilinear form associated
to A:
BA(x, y) = x^T A y.[27]
Computational aspects
In addition to theoretical knowledge of properties of matrices and their relation to
other fields, it is important for practical purposes to perform matrix calculations
effectively and precisely. The domain studying these matters is called numerical
linear algebra.[28]
As with other numerical situations, two main aspects are
the complexity of algorithms and their numerical stability. Many problems can be
solved by both direct algorithms or iterative approaches. For example, finding
eigenvectors can be done by finding a sequence of vectors xn converging to an
eigenvector when n tends to infinity.[29]
Determining the complexity of an algorithm means finding upper bounds or estimates
of how many elementary operations such as additions and multiplications of scalars
are necessary to perform some algorithm, e.g. multiplication of matrices. For
example, calculating the matrix product of two n-by-n matrices using the definition
given above needs n^3 multiplications, since for any of the n^2 entries of the
product, n multiplications are necessary. The Strassen algorithm outperforms this
"naive" algorithm; it needs only n^2.807 multiplications.[30] A refined approach also
incorporates specific features of the computing devices.
In many practical situations additional information about the matrices involved is
known. An important case are sparse matrices, i.e. matrices most of whose entries
are zero. There are specifically adapted algorithms for, say, solving linear
systems Ax = b for sparse matrices A, such as the conjugate gradient method.[31]
An algorithm is, roughly speaking, numerically stable if small deviations (such as
rounding errors) do not lead to big deviations in the result. For example, calculating
the inverse of a matrix via Laplace's formula (Adj(A) denotes the adjugate
matrix of A)
A^−1 = Adj(A) / det(A)
may lead to significant rounding errors if the
determinant of the matrix is very small.
The norm of a matrix can be used to capture
the conditioning of linear algebraic problems,
such as computing a matrix's inverse.[32]
Although most computer languages are not designed with commands or libraries for
matrices, as early as the 1970s, some engineering desktop computers such as
the HP 9830 had ROM cartridges to add BASIC commands for matrices. Some
computer languages such as APL were designed to manipulate matrices,
and various mathematical programs can be used to aid computing with matrices.[33]
Matrix decomposition methods
Main articles: Matrix decomposition, Matrix diagonalization, and Gaussian
elimination
There are several methods to render matrices into a more easily accessible form.
They are generally referred to as matrix transformation or matrix
decomposition techniques. The interest of all these decomposition techniques is that
they preserve certain properties of the matrices in question, such as determinant,
rank or inverse, so that these quantities can be calculated after applying the
transformation, or that certain matrix operations are algorithmically easier to carry
out for some types of matrices.
The LU decomposition factors matrices as a product of a lower (L) and an
upper triangular matrix (U).[34] Once this decomposition is calculated, linear
systems can be solved more efficiently, by a simple technique called forward and
back substitution. Likewise, inverses of triangular matrices are algorithmically easier
to calculate. Gaussian elimination is a similar algorithm; it transforms any matrix
to row echelon form.[35] Both methods proceed by multiplying the matrix by
suitable elementary matrices, which correspond to permuting rows or columns and
adding multiples of one row to another row. Singular value decomposition expresses
any matrix A as a product UDV∗, where U and V are unitary matrices and D is a
diagonal matrix.
A matrix in Jordan normal form. The grey blocks are called Jordan blocks.
The eigendecomposition or diagonalization expresses A as a product VDV^−1,
where D is a diagonal matrix and V is a suitable invertible matrix.[36] If A can be
written in this form, it is called diagonalizable. More generally, and applicable to all
matrices, the Jordan decomposition transforms a matrix into Jordan normal form,
that is to say matrices whose only nonzero entries are the eigenvalues λ1 to λn of A,
placed on the main diagonal and possibly entries equal to one directly above the
main diagonal, as shown at the right.[37] Given the eigendecomposition, the n-th power
of A (i.e. n-fold iterated matrix multiplication) can be calculated via
A^n = (VDV^−1)^n = VDV^−1 VDV^−1 ... VDV^−1 = VD^n V^−1
and the power of a diagonal matrix can be calculated by taking the corresponding
powers of the diagonal entries, which is much easier than doing the exponentiation
for A instead. This can be used to compute the matrix exponential e^A, a need
frequently arising in solving linear differential equations, matrix
logarithms and square roots of matrices.[38] To avoid numerically ill-
conditioned situations, further algorithms such as the Schur decomposition can be
employed.[39]
Abstract algebraic aspects and generalizations
Matrices can be generalized in different ways. Abstract algebra uses matrices with
entries in more general fields or even rings, while linear algebra codifies properties of
matrices in the notion of linear maps. It is possible to consider matrices with infinitely
many columns and rows. Another extension is tensors, which can be seen as
higher-dimensional arrays of numbers, as opposed to vectors, which can often be
realised as sequences of numbers, while matrices are rectangular or two-
dimensional arrays of numbers.[40]
Matrices, subject to certain requirements, tend to
form groups known as matrix groups.
Matrices with more general entries
This article focuses on matrices whose entries are real or complex
numbers. However, matrices can be considered with much more general types of
entries than real or complex numbers. As a first step of generalization, any field, i.e.
a set where addition, subtraction, multiplication and division operations are defined
and well-behaved, may be used instead of R or C, for example rational
numbers or finite fields. For example, coding theory makes use of matrices over
finite fields. Wherever eigenvalues are considered, as these are roots of a
polynomial they may exist only in a larger field than that of the coefficients of the
matrix; for instance they may be complex in case of a matrix with real entries. The
possibility to reinterpret the entries of a matrix as elements of a larger field (e.g., to
view a real matrix as a complex matrix whose entries happen to be all real) then
allows considering each square matrix to possess a full set of eigenvalues.
Alternatively one can consider only matrices with entries in an algebraically closed
field, such as C, from the outset.
More generally, abstract algebra makes great use of matrices with entries in
a ring R.[41] Rings are a more general notion than fields in that no division operation
exists. The very same addition and multiplication operations of matrices extend to
this setting, too. The set M(n, R) of all square n-by-n matrices over R is a ring
called matrix ring, isomorphic to the endomorphism ring of the left R-module R^n.[42] If
the ring R is commutative, i.e., its multiplication is commutative, then M(n, R) is a
unitary noncommutative (unless n = 1) associative algebra over R.
The determinant of square matrices over a commutative ring R can still be defined
using the Leibniz formula; such a matrix is invertible if and only if its determinant
is invertible in R, generalising the situation over a field F, where every nonzero
element is invertible.[43]
Matrices over superrings are called supermatrices.[44]
Matrices do not always have all their entries in the same ring - or even in any ring at
all. One special but common case is block matrices, which may be considered as
matrices whose entries themselves are matrices. The entries need not be square
matrices, and thus need not be members of any ordinary ring; but their sizes must
fulfil certain compatibility conditions.
Relationship to linear maps
Linear maps R^n → R^m are equivalent to m-by-n matrices, as described above. More
generally, any linear map f: V → W between finite-dimensional vector spaces can be
described by a matrix A = (aij), after choosing bases v1, ..., vn of V,
and w1, ..., wm of W (so n is the dimension of V and m is the dimension of W), which
is such that
f(vj) = a1,j w1 + ... + am,j wm for j = 1, ..., n.
In other words, column j of A expresses the image of vj in terms of the basis
vectors wi of W; thus this relation uniquely determines the entries of the matrix A.
Note that the matrix depends on the choice of the bases: different choices of bases
give rise to different, but equivalent matrices.[45] Many of the above concrete notions
can be reinterpreted in this light, for example, the transpose matrix A^T describes
the transpose of the linear map given by A, with respect to the dual bases.[46]
Graph theory
An undirected graph with adjacency matrix
The adjacency matrix of a finite graph is a basic notion of graph theory.[62] It records
which vertices of the graph are connected by an edge. Matrices containing just two
different values (0 and 1 meaning for example "yes" and "no") are called logical
matrices. The distance (or cost) matrix contains information about distances of the
edges.[63] These concepts can be applied to websites connected by hyperlinks or cities
connected by roads etc., in which case (unless the road network is extremely dense)
the matrices tend to be sparse, i.e. contain few nonzero entries. Therefore,
specifically tailored matrix algorithms can be used in network theory.
Analysis and geometry
The Hessian matrix of a differentiable function ƒ: Rn
→ R consists of the second
derivatives of ƒ with respect to the several coordinate directions, i.e.[64]
It encodes information about the local growth behaviour of the function: given
a critical point x = (x1, ..., xn), i.e., a point where the first partial
derivatives of ƒ vanish, the function has a local minimum if the Hessian
matrix is positive definite. Quadratic programming can be used to find global
minima or maxima of quadratic functions closely related to the ones attached to
matrices (see above).[65]
At the saddle point (x = 0, y = 0) (red) of the function f(x,−y) = x2
− y2
, the
Hessian matrix is indefinite.
Another matrix frequently used in geometrical situations is the Jacobi matrix of a
differentiable map f: R^n → R^m. If f1, ..., fm denote the components of f, then the
Jacobi matrix is defined as[66]
If n > m, and if the rank of the Jacobi matrix attains its maximal value m, f is
locally invertible at that point, by the implicit function theorem.[67]
Partial differential equations can be classified by considering the matrix of
coefficients of the highest-order differential operators of the equation.
For elliptic partial differential equations this matrix is positive definite, which
has decisive influence on the set of possible solutions of the equation in
question.[68]
The finite element method is an important numerical method to solve partial
differential equations, widely applied in simulating complex physical
systems. It attempts to approximate the solution to some equation by
piecewise linear functions, where the pieces are chosen with respect to a
sufficiently fine grid, which in turn can be recast as a matrix equation.[69]
Probability theory and statistics
Two different Markov chains. The chart depicts the number of particles (of a
total of 1000) in state "2". Both limiting values can be determined from the
transition matrices, shown in red and black in the original figure.
Stochastic matrices are square matrices whose rows are probability
vectors, i.e., whose entries sum up to one. Stochastic matrices are used to
define Markov chains with finitely many states.[70]
A row of the stochastic
matrix gives the probability distribution for the next position of some particle
which is currently in the state corresponding to the row. Properties of the
Markov chain like absorbing states, i.e. states that any particle attains
eventually, can be read off the eigenvectors of the transition matrices.[71]
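Since the transition matrices from the figure are not reproduced here, the following sketch uses an invented two-state transition matrix to show how the rows of a stochastic matrix drive the chain and how repeated multiplication approaches the limiting distribution:

# A small sketch with a made-up two-state Markov chain.
import numpy as np

P = np.array([[0.9, 0.1],     # row i = probability distribution of the next state,
              [0.5, 0.5]])    # given the particle is currently in state i

dist = np.array([1.0, 0.0])   # start every particle in state 1
for _ in range(100):
    dist = dist @ P           # one step of the chain

print(dist)                   # approaches the stationary distribution [5/6, 1/6]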
Statistics also makes use of matrices in many different forms.[72]
Descriptive
statistics is concerned with describing data sets, which can often be represented in
matrix form, by reducing the amount of data. The covariance matrix encodes the
mutual variance of several random variables.[73]
Another technique using matrices
is linear least squares, a method that approximates a finite set of pairs (x1, y1),
(x2, y2), ..., (xN, yN), by a linear function
yi ≈ axi + b, i = 1, ..., N
which can be formulated in terms of matrices, related to the singular value
decomposition of matrices.[74]
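A minimal sketch of that matrix formulation, with made-up data points: stack each xi with a 1 into a design matrix X, so that y ≈ X·[a, b], and solve the resulting least squares problem (np.linalg.lstsq solves it via a singular value decomposition):

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])     # invented sample data
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

X = np.column_stack([x, np.ones_like(x)])       # one row [xi, 1] per data point
(a, b), *_ = np.linalg.lstsq(X, y, rcond=None)  # minimizes the sum of (yi - a*xi - b)^2
print(a, b)                                     # slope and intercept of the best fit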
Random matrices are matrices whose entries are random numbers, subject to
suitable probability distributions, such as matrix normal distribution. Beyond
probability theory, they are applied in domains ranging from number
theory to physics.[75][76]
Symmetries and transformations in physics
Further information: Symmetry in physics
Linear transformations and the associated symmetries play a key role in modern
physics. For example, elementary particles in quantum field theory are classified as
representations of the Lorentz group of special relativity and, more specifically, by
their behavior under the spin group. Concrete representations involving the Pauli
matrices and more general gamma matrices are an integral part of the physical
description of fermions, which behave as spinors.[77]
For the three lightest quarks,
there is a group-theoretical representation involving the special unitary group SU(3);
for their calculations, physicists use a convenient matrix representation known as
the Gell-Mann matrices, which are also used for the SU(3) gauge group that forms
the basis of the modern description of strong nuclear interactions, quantum
chromodynamics. The Cabibbo–Kobayashi–Maskawa matrix, in turn, expresses the
fact that the basic quark states that are important for weak interactions are not the
same as, but linearly related to the basic quark states that define particles with
specific and distinct masses.[78]
Linear combinations of quantum states
The first model of quantum mechanics (Heisenberg, 1925) represented the theory's
operators by infinite-dimensional matrices acting on quantum states.[79]
This is also
referred to as matrix mechanics. One particular example is the density matrix that
characterizes the "mixed" state of a quantum system as a linear combination of
elementary, "pure" eigenstates.[80]
Another matrix serves as a key tool for describing the scattering experiments which
form the cornerstone of experimental particle physics: Collision reactions such as
occur in particle accelerators, where non-interacting particles head towards each
other and collide in a small interaction zone, with a new set of non-interacting
particles as the result, can be described as the scalar product of outgoing particle
states and a linear combination of ingoing particle states. The linear combination is
given by a matrix known as the S-matrix, which encodes all information about the
possible interactions between particles.[81]
Normal modes
A general application of matrices in physics is to the description of linearly coupled
harmonic systems. The equations of motion of such systems can be described in
matrix form, with a mass matrix multiplying a generalized velocity to give the kinetic
term, and a force matrix multiplying a displacement vector to characterize the
interactions. The best way to obtain solutions is to determine the
system's eigenvectors, its normal modes, by diagonalizing the matrix equation.
Techniques like this are crucial when it comes to describing the internal dynamics
of molecules: the internal vibrations of systems consisting of mutually bound
component atoms.[82]
They are also needed for describing mechanical vibrations, and
oscillations in electrical circuits.[83]
Geometrical optics
Geometrical optics provides further matrix applications. In this approximative theory,
the wave nature of light is neglected. The result is a model in which light rays are
indeed geometrical rays. If the deflection of light rays by optical elements is small,
the action of a lens or reflective element on a given light ray can be expressed as
multiplication of a two-component vector with a two-by-two matrix called ray transfer
matrix: the vector's components are the light ray's slope and its distance from the
optical axis, while the matrix encodes the properties of the optical element. Actually,
there will be two different kinds of matrices, viz. a refraction matrix describing the
refraction at a lens surface, and a translation matrix, describing the
translation of the plane of reference to the next refracting surface, where another
refraction matrix will apply. The optical system consisting of a combination of lenses
and/or reflective elements is simply described by the matrix resulting from the
product of the components' matrices.[84]
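Here is a hedged sketch of that bookkeeping in Python; the ray is written here as the vector [distance from the axis, slope], the focal length and travel distance are invented numbers, and the whole system is just the product of the element matrices:

import numpy as np

def translation(d):                      # free travel over a distance d
    return np.array([[1.0, d],
                     [0.0, 1.0]])

def thin_lens(f):                        # refraction by a thin lens of focal length f
    return np.array([[1.0, 0.0],
                     [-1.0 / f, 1.0]])

ray_in = np.array([1.0, 0.0])            # 1 unit off-axis, travelling parallel to the axis
system = translation(10.0) @ thin_lens(5.0)   # lens first, then 10 units of travel
ray_out = system @ ray_in
print(ray_out)                           # [-1, -0.2]: the ray has crossed the optical axis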
Electronics
The behaviour of many electronic components can be described using matrices.
Let A be a 2-dimensional vector with the component's input voltage v1 and input
current i1 as its elements, and let B be a 2-dimensional vector with the component's
output voltage v2 and output current i2 as its elements. Then the behaviour of the
electronic component can be described by B = H · A, where H is a 2 x 2 matrix
containing one impedance element (h12), one admittance element (h21) and
two dimensionless elements (h11 and h22). Calculating a circuit now reduces to
multiplying matrices.
History
Matrices have a long history of application in solving linear equations. The Chinese
text The Nine Chapters on the Mathematical Art (Jiu Zhang Suan Shu), from
between 300 BC and AD 200, is the first example of the use of matrix methods to
solve simultaneous equations,[85]
including the concept of determinants, almost 2000
years before its publication by the Japanese mathematician Seki in 1683 and the
German mathematician Leibniz in 1693. Cramer presented Cramer's rule in 1750.
Early matrix theory emphasized determinants more strongly than matrices and an
independent matrix concept akin to the modern notion emerged only in 1858,
with Cayley's Memoir on the theory of matrices.[86][87]
The term "matrix" was coined
by Sylvester, who understood a matrix as an object giving rise to a number of
determinants today called minors, that is to say, determinants of smaller matrices
which derive from the original one by removing columns and rows. Etymologically,
matrix derives from Latin mater (mother).[88]
The study of determinants sprang from several sources.[89]
Number-
theoretical problems led Gauss to relate coefficients of quadratic forms, i.e.,
expressions such as x^2 + xy − 2y^2, and linear maps in three dimensions to
matrices. Eisenstein further developed these notions, including the remark that, in
modern parlance, matrix products are non-commutative. Cauchy was the first to
prove general statements about determinants, using as definition of the determinant
of a matrix A = [ai,j] the following: replace the powers aj^k by ajk in the polynomial
where Π denotes the product of the indicated terms. He also showed, in 1829, that
the eigenvalues of symmetric matrices are real.[90]
Jacobi studied "functional
determinants"—later called Jacobi determinants by Sylvester—which can be used to
describe geometric transformations at a local (or infinitesimal) level,
see above; Kronecker's Vorlesungen über die Theorie der
Determinanten[91] and Weierstrass' Zur Determinantentheorie,[92] both published in
both published in
1903, first treated determinants axiomatically, as opposed to previous more concrete
approaches such as the mentioned formula of Cauchy. At that point, determinants
were firmly established.
Many theorems were first established for small matrices only, for example
the Cayley-Hamilton theorem was proved for 2×2 matrices by Cayley in the
aforementioned memoir, and by Hamilton for 4×4 matrices. Frobenius, working
on bilinear forms, generalized the theorem to all dimensions (1898). Also at the end
of the 19th century the Gauss-Jordan elimination (generalizing a special case now
known as Gauss elimination) was established by Jordan. In the early 20th century,
matrices attained a central role in linear algebra,[93] partially due to their use in
classification of the hypercomplex number systems of the previous century.
The inception of matrix mechanics by Heisenberg, Born and Jordan led to studying
matrices with infinitely many rows and columns.[94]
Later, von Neumann carried out
the mathematical formulation of quantum mechanics, by further
developing functional analytic notions such as linear operators on Hilbert spaces,
which, very roughly speaking, correspond to Euclidean space, but with an infinity
of independent directions.
Other historical usages of the word "matrix" in mathematics
The word has been used in unusual ways by at least two authors of historical
importance.
Bertrand Russell and Alfred North Whitehead in their Principia Mathematica (1910–
1913) use the word matrix in the context of their Axiom of reducibility. They
proposed this axiom as a means to reduce any function to one of lower type,
successively, so that at the "bottom" (0 order) the function will be identical to
its extension:
"Let us give the name of matrix to any function, of however many variables, which
does not involve any apparent variables. Then any possible function other than a
matrix is derived from a matrix by means of generalization, i.e. by considering the
proposition which asserts that the function in question is true with all possible values
or with some value of one of the arguments, the other argument or arguments
remaining undetermined".[95]
For example a function Φ(x, y) of two variables x and y can be reduced to
a collection of functions of a single variable, e.g. y, by "considering" the function for
all possible values of "individuals" ai substituted in place of variable x. And then the
resulting collection of functions of the single variable y, i.e. ∀ai: Φ(ai, y), can be
reduced to a "matrix" of values by "considering" the function for all possible values of
"individuals" bi substituted in place of variable y:
∀bj∀ai: Φ(ai, bj).
Alfred Tarski in his 1946 Introduction to Logic used the word "matrix" synonymously
with the notion of truth table as used in mathematical logic.[96]
Median
The median is the middle value in a set of numbers. In the set [1,2,3] the median is
2. If the set has an even number of values, the median is the average of the two in
the middle. For example, the median of [1,2,3,4] is 2.5, because that is the average
of 2 and 3.
The median is often used when analyzing statistical studies, for example the income
of a nation. While the arithmetic mean (average) is a simple calculation that most
people understand, it is skewed upwards by just a few high values. The average
income of a nation might be $20,000 but most people are much poorer. Many people
with $10,000 incomes are balanced out by just a single person with a $5,000,000
income. Therefore the median is often quoted because it shows a value that 50% of
the country makes more than and 50% of the country makes less than.
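A quick check of both median cases, plus the income effect described above (the sample of 100 incomes is invented for illustration):

import statistics

print(statistics.median([1, 2, 3]))        # odd count: the middle value, 2
print(statistics.median([1, 2, 3, 4]))     # even count: average of 2 and 3, i.e. 2.5

incomes = [10000] * 99 + [5000000]         # 99 modest incomes and one huge one
print(statistics.mean(incomes))            # 59900: the single outlier drags the mean up
print(statistics.median(incomes))          # 10000: the median stays put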
Exponential Functions
Take a look at x^3. What does it mean?
We have two parts here:
1) Exponent, which is 3.
2) Base, which is x.
x^3 = x times x times x
It's read two ways:
1) x cubed
2) x to the third power
With exponential functions, such as 3^x, 3 is the base and x is the exponent.
So, the idea is reversed in terms of exponential functions.
Here's what exponential functions look like:
y = 3^x, f(x) = 1.124^x, etc. In other words, the exponent will be a variable.
The general exponential function looks like this: b^x, where the base b is ANY
constant. So, the standard form for ANY exponential function is f(x) = b^x, where b is
a real number greater than 0.
Sample: Evaluate f(x) = 1.276^x for a chosen x.
Here x can be ANY number we select.
Say, x = 1.2.
f(1.2) = 1.276^1.2
NOTE: You must follow your calculator's instructions in terms of exponents. Every
calculator is different and thus has different steps.
I will use my TI-36 SOLAR Calculator to find an approximation for f(1.2).
f(1.2) = 1.33974088
Rounding off to two decimal places I get:
f(1.2) = 1.34
We can actually graph our point (1.2, 1.34) on the xy-plane but more on that in future
exponential function lessons.
We can use the formula B(t) = 100(1.12^t) to solve bacteria growth applications. We can
use the formula to find HOW MANY bacteria there are in a given region after a
certain amount of time. Of course, in the formula, lower case t = time. The number
100 indicates how many bacteria there were at the start of the LAB experiment. The
decimal number 1.12 indicates how fast the bacteria population grows.
Sample: How many bacteria are in LAB 3 after 2.9 hours of work?
Okay, t = 2.9 hours.
Replace t with 2.9 in the formula above and simplify.
B(2.9) = 100(1.12^2.9)
B(2.9) = 100(1.389096016)
B(2.9) = 138.9096
NOTE: An exponent can be ANY real number, positive or negative. For ANY
exponential function, the domain will be ALL real numbers.
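For readers who prefer to check the arithmetic without a calculator, here is the same computation as a short Python sketch:

print(1.276 ** 1.2)            # about 1.3397..., which rounds to 1.34

def B(t):
    return 100 * (1.12 ** t)   # 100 bacteria at the start, growing by a factor 1.12 per hour

print(B(2.9))                  # about 138.91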
Trig Addition Formulas
The trig addition formulas can be useful to simplify a complicated expression, or
perhaps find an exact value when you only have a small table of trig values. For
example, if you want the sine of 15 degrees, you can use a subtraction formula to
compute sin(15) as sin(45-30).
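A quick numerical check of that idea, using the subtraction formula sin(a − b) = sin(a)cos(b) − cos(a)sin(b):

import math

a, b = math.radians(45), math.radians(30)
via_formula = math.sin(a) * math.cos(b) - math.cos(a) * math.sin(b)
direct      = math.sin(math.radians(15))
print(via_formula, direct)     # both about 0.2588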
Trigonometry Derivatives
While you may know how to take the derivative of a polynomial, what happens when
you need to take the derivative of a trig function? What IS the derivative of a sine?
Luckily, the derivatives of trig functions are simple -- they're other trig functions! For
example, the derivative of sine is just cosine:
The rest of the trig functions are also straightforward once you learn them, but they
aren't QUITE as easy as the first two.
Derivatives of Trigonometry Functions
sin'(x) = cos(x)
cos'(x) = -sin(x)
tan'(x) = sec^2(x)
sec'(x) = sec(x)tan(x)
cot'(x) = -csc^2(x)
csc'(x) = -csc(x)cot(x)
Take a look at this graphic for an illustration of what this means. At the first point
(around x=2*pi), the cosine isn't changing. You can see that the sine is 0, and since
negative sine is the rate of change of cosine, cosine would be changing at a rate of
−0, that is, not changing at all.
At the second point I've illustrated (x=3*pi), you can see that the sine is decreasing
rapidly. This makes sense because the cosine is negative. Since cosine is the rate of
change of sine, a negative cosine means the sine is decreasing.
Double and Half Angle Formulas
The double and half angle formulas can be used to find the values of unknown trig
functions. For example, you might not know the sine of 15 degrees, but by using the
half angle formula for sine, you can figure it out based on the common value of
sin(30) = 1/2.
They are also useful for certain integration problems where a double or half angle
formula may make things much simpler to solve.
Double Angle Formulas:
You'll notice that there are several listings for the double angle for cosine. That's
because you can substitute for either of the squared terms using the basic trig
identity sin^2+cos^2=1.
Half Angle Formulas:
These are a little trickier because of the plus or minus. It's not that you can use
BOTH, but you have to figure out the sign on your own. For example, the sine of 30
degrees is positive, as is the sine of 15. However, if you were to use 200, you'd find
that the sine of 200 degrees is negative, while the sine of 100 is positive. Just
remember to look at a graph and figure out the sines and you'll be fine.
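As a small sketch of the half angle idea, here is sin(15°) computed from the known value cos(30°) = sqrt(3)/2, choosing the + sign because the sine of 15 degrees is positive:

import math

cos30 = math.sqrt(3) / 2
sin15 = math.sqrt((1 - cos30) / 2)        # sin(x/2) = +/- sqrt((1 - cos x) / 2)
print(sin15, math.sin(math.radians(15)))  # both about 0.2588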
The magic identity
Trigonometry is the art of doing algebra over the circle. So it is a mixture of algebra and
geometry. The sine and cosine functions are just the coordinates of a point on the unit circle.
This implies the most fundamental formula in trigonometry (which we will call here the
magic identity):
sin^2(θ) + cos^2(θ) = 1
where θ is any real number (of course θ measures an angle).
Example. Show that
Answer. By definitions of the trigonometric functions we have
Hence we have
Using the magic identity we get
This completes our proof.
Remark. The above formula is fundamental in many ways. For example, it is very useful in
techniques of integration.
Example. Simplify the expression
Answer. We have by definition of the trigonometric functions
Hence
Using the magic identity we get
Putting stuff together we get
This gives
Using the magic identity we get
Therefore we have
Example. Check that
Answer.
Example. Simplify the expression
Answer.
The following identities are very basic to the analysis of trigonometric expressions and
functions. These are called Fundamental Identities
Reciprocal identities
Pythagorean Identities
Quotient Identities
Understanding sine
A teaching guideline/lesson plan when first teaching sine (grades 7-9)
The sine is simply a RATIO of certain sides of a right triangle. Look at the triangles
below. They all have the same shape. That means they have the SAME ANGLES
but the lengths of the sides may be different. In other words, they are SIMILAR
figures.
Have your child/students measure the sides s1, h1, s2, h2, s3, h3 as accurately as
possible (or draw several similar right triangles on their own).
Then let her calculate the following ratios: s1/h1, s2/h2, and s3/h3. What can you
note?
Those ratios should all be the same (or close to same due to measuring errors).
That is so because the triangles have the same shape (or are similar), which means
their respective parts are PROPORTIONAL. That is why the ratio of those parts
remains the same. Now ask your child what would happen if we had a fourth triangle
with the same shape. The answer of course is that even in that fourth triangle the
ratio s4/h4 would be the same.
The ratio you calculated remains the same for all the triangles. Why? Because the
triangles were similar so their sides were proportional. SO, in all right triangles
where the angles are the same, this one ratio is the same too. We associate this
ratio with the angle α. THAT RATIO IS CALLED THE SINE OF THE ANGLE α.
What follows is that if you know the ratio, you can find what the angle α is. Or in
other words, if you know the sine of α, you can find α. Or, if you know what α is, you
can find this ratio - and when you know this ratio and one side of a right triangle, you
can find the other sides.
s1/h1 = s2/h2 = s3/h3 = sin α = 0.57358
In our pictures the angle α is 35 degrees. So sin 35 = 0.57358 (rounded to five
decimals). We can use this fact when dealing with OTHER right triangles that have
a 35 angle. See, other such triangles are, again, similar to these ones we see here,
so the ratio of the opposite side to the hypotenuse, WHICH IS THE SINE OF THE 35
ANGLE, is the same! So in another such triangle, if you only know the hypotenuse,
you can calculate the opposite side since you know the ratio, or vice versa.
Problem
Suppose we have a triangle that has the same shape as the triangles above. The
side opposite to the 35 angle is 5 cm. How long is the hypotenuse?
SOLUTION: Let h be that hypotenuse. Then
5 cm / h = sin 35 ≈ 0.57358
From this equation one can easily solve that
h = 5 cm / 0.57358 ≈ 8.72 cm
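The same solution, carried out numerically:

import math

opposite = 5.0                            # cm, the side opposite the 35 degree angle
h = opposite / math.sin(math.radians(35))
print(h)                                  # about 8.72 cm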
An example
The two triangles are pictured both overlapping and separate. We can find h3
simply by the fact that these two triangles are similar. Since the triangles are similar,
3.9/h3 = 2.6/6, from which h3 = (6 × 3.9)/2.6 = 9.
We didn't even need the sine to solve that, but note how closely it ties in with similar
triangles.
The triangles have the same angle α. Sin α of course would be
the ratio 2.6/6 or 3.9/9 ≈ 0.4333.
Now we can find the actual angle α from the calculator:
Since sin α = 0.4333, then α = sin⁻¹(0.4333) ≈ 25.7 degrees.
Test your understanding
1. Draw a right triangle that has a 40 angle. Then measure the opposite side and
the hypotenuse and use those measurements to calculate sin 40. Check your
answer by plugging into calculator sin 40 (remember the calculator has to be in the
degrees mode instead of radians mode).
2. Draw two right triangles that have a 70 angle - but that are of different sizes. Use
the first triangle to find sin 70 (like you did in problem 1). Then measure the
hypotenuse of your second triangle. Use sin 70 and the measurement of the
hypotenuse to find the opposite side in your second triangle. Check by measuring
the opposite side from your triangle.
3. Draw a right triangle that has a 48 angle. Measure the hypotenuse. Then use sin
48 (from a calculator) and your measurement to calculate the length of the opposite
side. Check by measuring the opposite side from your triangle.
Someone asked me once, "When I type in sine in my graphic calculator, why
does it give me a wave?"
Read my answer where we get the familiar sine wave.
My question is that if some one give us only the length of sides of triangle,
how can we draw a triangle?
sajjad ahmed shah
This is an easy construction. See
Constructing a Triangle
and
Constructing a triangle when the lengths of 3 sides are known
if i am in a plane flying at 30000 ft how many linear miles of ground can i see.
and please explain how that answer is generated. does it have anything to do
with right triangles and the pythagorean theorem
jim taucher
The image below is NOT to scale - it is just to help in the problem. The angle
α is much smaller in reality.
Yes, you have a right triangle. r is the Earth's radius. Now, Earth's radius is
not constant but varies because Earth is not a perfect sphere. For this
problem, I was using the mean radius, 3959.871 miles. This also means our
answer will be just approximate. I also converted 30000 feet to 5.6818182
miles.
First we calculate α using cosine. You should get α is 3.067476356 degrees.
Then, we use a proportion comparing α and 360 degrees and x and earth's
circumference. You will get x ≈ 212 miles. Even that result might be too
'exact'.
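For readers who want to reproduce those numbers, here is the calculation as a short Python sketch (using the same mean radius quoted above; the line of sight is tangent to the Earth, giving a right triangle with hypotenuse r plus the altitude):

import math

r = 3959.871                         # mean radius of the Earth in miles (as used above)
alt = 30000 / 5280                   # 30000 ft converted to miles, about 5.68
alpha = math.degrees(math.acos(r / (r + alt)))   # about 3.07 degrees
x = (alpha / 360) * 2 * math.pi * r              # arc length along the ground
print(alpha, x)                      # roughly 3.07 degrees and 212 miles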
Introduction:
The unit circle is used to understand the sines and cosines of angles in right (90
degree) triangles. Its radius is exactly one. The center of the circle is said to be the origin
and its perimeter comprises the set of all points that are exactly one unit from the
center of the circle while placed in the plane. It's just a circle with radius ‘one’.
Unit Circle - Standard Equation:
The distance from the origin to a point (x, y) is found using the Pythagorean Theorem:
sqrt(x^2 + y^2). Here, the radius is one, so the expression becomes sqrt(x^2 + y^2) = 1.
Take the square of both sides and the equation becomes
x^2 + y^2 = 1
Positive angles are measured counterclockwise from the positive x axis,
and negative angles are measured clockwise from the positive x axis.
Unit Circle - Sine and Cosine:
You can read off sine θ and cosine θ directly, because the radius of the unit
circle is one.
Using this, when the angle θ is 0 degrees:
cosine = 1, sine = 0, and tangent = 0.
If θ is 90 degrees: cosine = 0, sine = 1, and tangent is undefined.
Calculating 60°, 30°, 45°
Note: the radius of the unit circle is 1.
Take 60°
Consider an equilateral triangle. All sides are equal and all angles are the same. So the x
side is now 1/2 and the y side is found from
x^2 + y^2 = 1
(1/2)^2 + y^2 = 1
Therefore,
y = sqrt(3/4) = sqrt(3)/2
Therefore, cos 60° = 1/2 = 0.5 and sin 60° = sqrt(3)/2 ≈ 0.8660.
Take 30°
Here 30° is just 60° swapped over, so we get cos 30° = sqrt(3)/2 ≈ 0.8660 and sin 30°
= 1/2 = 0.5.
Take 45°
x = y, so x^2 + y^2 = 2x^2 = 1, giving x = 1/sqrt(2).
Therefore, cos 45° = 1/sqrt(2) ≈ 0.70 and sin 45° = 1/sqrt(2) ≈ 0.70.
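A quick numerical check of those values, and of the fact that every point on the unit circle satisfies x^2 + y^2 = 1:

import math

for deg in (30, 45, 60):
    rad = math.radians(deg)
    x, y = math.cos(rad), math.sin(rad)     # the point on the unit circle at angle deg
    print(deg, round(x, 4), round(y, 4), round(x * x + y * y, 4))   # x^2 + y^2 is always 1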
The standard definition of sine is usually introduced as sin(A) = opposite/hypotenuse,
which is correct, but makes the eyes glaze over and doesn’t even hint
that the ancient Egyptians used trigonometry to survey their land .
The ancient Egyptians noticed if two different triangles both have the
same angles (similar triangles), then no matter what their size, the
relationships between the sides were always the same.
The ratio of side a to side b is the same as the ratio of side A to side B.
This is true for all combinations of sides – their ratios are always the
same. Expressed as fractions we would write:
With numbers it might look like: if side a is 1 unit long and side b is 2
units long and the larger triangle has sides that are twice as long,
then side A is 2 units and side B is 4 units long. Writing the ratios as
fractions we see:
To determine if two triangles are similar (have the same angles), we have
to measure two angles. The angles inside a triangle always add up to 180
degrees. If you know two of the angles, then you can figure out the third.
Later, an insightful Greek realized that if the triangle is a right angle
triangle, then we already know one angle – it is 90 degrees, so we don’t
need to measure it, therefore we only have to measure one of the other
angles to determine if two right angle triangles are similar. This insight
turned out to be incredibly useful for measuring things.
We can make a bunch of small right angle triangles and measure their
sides and calculate the ratios of the sides. (or we could just write them in
a book, “A right angle triangle with a measured angle of 1 degree has the
following ratios ...”)
Knowing all these ratios for different angles, we can then measure things.
For example, if you are a landowner and want to measure how long your
fields are, you could do the following:
• send a servant with a 3 meter tall staff to the end of the field
• measure the angle to the top of the staff
• consult your table of similar right angle triangles and determine the
ratio of the sides (in this case, we would be looking for staff / distance to
measure)
• calculate the length of the field using the ratio and the known
height of the staff
If the measured angle was 5 degrees, then we know (from a similar
triangle) that the ratio of staff to distance to measure is 0.0874, or:
staff / distance to measure = 0.0874
Rearranging the equation we get:
distance to measure = 3 meters / 0.0874 ≈ 34.3 meters.
We call this ratio the tangent of the angle.
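Here is the surveying example as a short sketch (the 3 meter staff and the 5 degree angle are the numbers from the text; the tangent of the measured angle is the ratio staff / distance):

import math

staff = 3.0                               # meters
angle = math.radians(5)
distance = staff / math.tan(angle)        # rearranged: distance = staff / tan(angle)
print(math.tan(angle), distance)          # ratio about 0.0875, distance about 34.3 m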
The sine, cosine, tangent, secant, cosecant and cotangent refer to the
ratio of a specific pair of sides of the triangle at the given angle A. The
problem is that the names aren’t very informative or intuitive. Sine comes
from the Latin word sinus which means fold or bend. Looking at our
original definition, it now makes a little more sense:
The sine of angle A is equal to the ratio of the sides at the bend in the
triangle as seen from A. Or opposite divided by hypotenuse.
The ratios of the various pairs of sides in a right angle triangle are given the
following names:
There is no simple way to remember which ratios go with which
trigonometric function, although it is easier if you know some of the
history behind it.
sin, cos, tan, sec, csc, and cot are a shorthand way of referring to the
ratio of a specific pair of sides in a right angle triangle.
An Introduction to Trigonometry ... by Brandon Williams
Main Index...
Introduction
Well it is nearly one in the morning and I have tons of work to do and a fabulous idea
pops into my head: How about writing an introductory tutorial to trigonometry! I am
going to fall so far behind. And once again I did not have the chance to proof read this
or check my work so if you find any mistakes e-mail me.
I'm going to try my best to write this as if the reader has no previous knowledge of
math (outside of some basic Algebra at least) and I'll do my best to keep it consistent.
There may be flaws or gaps in my logic at which point you can e-mail me and I will
do my best to go back over something more specific. So let's begin with a standard
definition of trigonometry:
trig - o - nom - e - try n. - a branch of mathematics which deals with relations between
sides and angles of triangles
Basics
Well that may not sound very interesting at the moment but trigonometry is one of the most
interesting forms of math I have come across…and just to let you know I do not have
an extensive background in math. Well since trigonometry has a lot to do with angles
and triangles let's familiarize ourselves with some fundamentals. First a right triangle:
A right triangle is a triangle that has one 90-degree angle. The 90-degree angle is
denoted with a little square drawn in the corner. The two sides that are adjacent to the
90-degree angle, 'a' and 'b', are called the legs. The longer side opposite of the 90-
degree angle, 'c', is called the hypotenuse. The hypotenuse is always longer than the
legs. While we are on the subject lets brush up on the Pythagorean Theorem. The
Pythagorean Theorem states that the sum of the two legs squared is equal to the
hypotenuse squared. An equation you can use is:
c^2 = a^2 + b^2
So lets say we knew that 'a' equaled 3 and 'b' equaled 4 how would we find the length
of 'c'…assuming this is in fact a right triangle. Plug-in the values that you know into
your formula:
c^2 = 3^2 + 4^2
Three squared plus four squared is twenty-five so we now have this:
c^2 = 25 - - - > Take the square root of both sides and you now know that c = 5
So now we are passed some of the relatively boring parts. Let's talk about certain
types of right triangles. There is the 45-45-90 triangle and the 30-60-90 triangle. We
might as well learn these because we'll need them later when we get to the unit circle.
Look at this picture and observe a few of the things going on for a 45-45-90 triangle:
In a 45-45-90 triangle you have a 90-degree angle and two 45-degree angles (duh) but
also the two legs are equal. Also if you know the value of 'c' then the legs are simply
'c' multiplied by the square root of two divided by two. I rather not explain that
because I would have to draw more pictures…hopefully you will be able to prove it
through your own understanding. The 30-60-90 triangle is a little bit harder to get but
I am not going into to detail with it…here is a picture:
You now have one 30-degree angle, a 60-degree angle, and a 90-degree angle. This
time the relationship between the sides is a little different. The shorter side is half of
the hypotenuse. The longer side is the hypotenuse times the square root of 3 all
divided by two. That's all I'm really going to say on this subject but make sure you get
this before you go on because it is crucial in understanding the unit circle…which in
turn is crucial for understanding trigonometry.
Trigonometric Functions
The entire subject of trigonometry is mostly based on these functions we are about to
learn. The three basic ones are sine, cosine, and tangent. First to clear up any
confusion that some might have: these functions mean nothing without a number with
them i.e. sin (20) is something…sin is nothing. Make sure you know that. Now for
some quick definitions (these are my own definitions…if you do not get what I am
saying look them up on some other website):
Sine - the ratio of the side opposite of an angle in a right triangle over the hypotenuse.
Cosine - the ratio of the side adjacent of an angle in a right triangle over the
hypotenuse.
Tangent - the ratio of the side opposite of an angle in a right triangle over the adjacent
side.
Now before I go on I should also say that those functions only find ratios and nothing
more. It may seem kind of useless now but they are very powerful functions. Also I
am only going to explain the things that I think are useful in Flash…I could go off on
some tangent (no pun intended) on other areas of Trigonometry but I'll try to keep it
just to the useful stuff. OK lets look at a few pictures:
Angles are usually denoted with capital case letters so that is what I used. Now lets
find all of the trigonometry ratios for angle A:
sin A = 4/5
cos A = 3/5
tan A = 4/3
Now it would be hard for me to explain more than what I have done, for this at least,
so you are just going to have to look at the numbers and see where I got them from.
Here are the ratios for angle B:
sin B = 3/5
cos B = 4/5
tan B = 3/4
Once again just look at the numbers and reread the definitions to see where I came up
with that stuff. But now that I told you a way of thinking of the ratios like opposite
over hypotenuse there is one more way which should be easier and will also be
discussed more later on. Here is a picture…notice how I am only dealing with one
angle:
The little symbol in the corner of the triangle is a Greek letter called "theta"…its
usually used to represent an unknown angle. Now with that picture we can think of
sine, cosine and tangent in a different way:
sin (theta) = y/r
cos (theta) = x/r
tan (theta) = y/x -- and x <> 0
We will be using that form most of the time. Now although I may have skipped some
kind of fundamentally important step (I'm hoping I did not) I can only think of one
place to go from here: the unit circle. Becoming familiar with the unit circle will
probably take the most work but make sure you do because it is very important. First
let me tell you about radians just in case you do not know. Radians are just another
way of measuring angles very similar to degrees. You know that there are 90 degrees
in one-quarter of a circle, 180 degrees in one-half of a circle, and 360 degrees in a
whole circle right? Well if you are dealing with radians there are 2p radians in a
whole circle instead of 360 degrees. The reason that there are 2p radians in a full
circle really is not all that important and would only clutter this "tutorial" more…just
know that it is and it will stay that way. Now if there are 2p radians in a whole circle
there are also p radians in a half, and p/2 radians in a quarter. Now its time to think
about splitting the circle into more subdivisions than just a half or quarter. Here is a
picture to help you out:
If at all possible memorize those values. You can always have a picture to look at like
this one but it will do you well when you get into the more advanced things later on if
you have it memorized. However that is not the only thing you need to memorize.
Now you need to know (from memory if you have the will power) the sine and cosine
values for every angle measure on that chart.
OK I think I cut myself short on explaining what the unit circle is when I moved on to
explaining radians. For now the only thing we need to know is that it is a circle with a
radius of one centered at (0,0). Now the really cool thing about the unit circle is what
we are about to discuss. I'm going to just pick some random angle up there on the
graph…let's say…45 degrees. Do you see that line going from the center of the circle
(on the chart above) to the edge of the circle? That point at which the line intersects
the edge of the circle is very important. The "x" coordinate of that point on the edge is
the cosine of the angle and the "y" coordinate is the sine of the angle. Very interesting
huh? So lets find the sine and cosine of 45 degrees ourselves without any calculator or
lookup tables.
Well if you remember anything that I said at the beginning of this tutorial then you
now know why I even mentioned it. In a right triangle if there is an angle with a
measure of 45 degrees the third angle is also 45 degrees. And not only that but the two
legs of the triangle have the same length. So if we think of that line coming from the
center of the circle at a 45-degree angle as a right triangle we can find the x- and y-
position of where the line intersects…look at this picture:
If we apply some of the rules we learned about 45-45-90 triangles earlier we can
accurately say that:
sin 45 = sqrt(2) / 2
cos 45 = sqrt(2) / 2
Another way to think of sine is it's the distance from the x-axis to the point on the
edge of the circle…you can only think of it that way if you are dealing with a unit
circle. You could also think of cosine the same way except it's the distance from the
y-axis to the point on the border of the circle. If you still do not know where I came
up with those numbers look at the beginning of this tutorial for an explanation of 45-
45-90 triangles…and why you are there refresh yourself on 30-60-90 triangles
because we need to know those next.
Now lets pick an angle from the unit circle chart like 30 degrees. I'm not going to
draw another picture but you should know how to form a right triangle with a line
coming from the center of the circle to one of its edges. Now remember the rules that
governed the lengths of the sides of a 30-60-90 triangle…if you do then you can once
again accurately say that:
sin 30 = 1/2
cos 30 = sqrt(3) / 2
I was just about to type out another explanation of why I did this but it's basically the
same as what I did for sine just above. Also now that I am rereading this I am seeing
some things that may cause confusion so I thought I would try to clear up a few
things. If you look at this picture (it's the same as the one I used a the beginning of all
this) I will explain with a little bit more detail on how I arrived at those values for sine
and cosine of 45-degrees:
Our definition of sine states that the sine of an angle would be the opposite side of the
triangle divided by the hypotenuse. Well we know our hypotenuse is one since this a
unit circle so we can substitute a one in for "c" and get this:
sin 45 = ( 1 * sqrt(2)/2 ) / 1
Which even the most basic understanding of Algebra will tell us that the above is the
same as:
sin 45 = sqrt(2) / 2
Now if you do not get that look at it really hard until it comes to you…I'm sure it will
hit you sooner or later. And instead of my wasting more time making a complete unit
circle with everything on it I found this great link to
one: http://www.infomagic.net/~bright/research/untcrcl.gif . Depending on just how
far you want to go into this field of math as well as others like Calculus you may want
to try and memorize that entire thing. Whatever it takes just try your best. I always
hear people talking about different patterns that they see which helps them to
memorize the unit circle, and that is fine but I think it makes it much easier to
remember if you know how to come up with those numbers…that's what this whole
first part of this tutorial was mostly about.
Also while on the subject I might as well tell you about the reciprocal trigonometric
functions. They are as follow:
csc (theta) = r/y
sec (theta) = r/x
cot (theta) = x/y
Those are pronounced cosecant, secant, and cotangent. Just think of them as the same
as their matching trigonometric functions except flipped…like this:
sin (theta) = y/r - - - > csc (theta) = r/y
cos (theta) = x/r - - - > sec (theta) = r/x
tan (theta) = y/x - - - > cot (theta) = x/y
That makes it a little bit easier to understand doesn't it?
Well believe it or not that is it for an introduction to trigonometry. From here we can
start to go into much more complicate areas. There are many other fundamentals that I
would have liked to go over but this has gotten long and boring enough as it is. I guess
I am hoping that you will explore some of these concepts and ideas on your own…
you will gain much more knowledge that way as opposed to my sloppy words.
Before I go…
Before I go I want to just give you a taste of what is to come…this may actually turn
out to be just as long as the above so go ahead and make yourself comfortable. First I
want to introduce to you trigonometric identities, which are trigonometric equations
that are true for all values of the variables for which the expressions in the equation
are defined. Now that's probably a little hard to understand and monotonous but I'll
explain. Here is a list of what are know as the "fundamental identities":
Reciprocal Identities
csc (theta) = 1 / sin (theta) , sin (theta) <> 0
sec (theta) = 1 / cos (theta) , cos (theta) <> 0
cot (theta) = 1 / tan (theta) , tan (theta) <> 0
Ratio Identities
tan (theta) = sin (theta) / cos (theta) , cos (theta) <> 0
cot (theta) = cos (theta) / sin (theta) , sin (theta) <> 0
Pythagorean Identities
sin^2(theta) + cos^2(theta) = 1
1 + cot^2(theta) = csc^2(theta)
1 + tan^2(theta) = sec^2(theta)
Odd-even Identities
sin (-theta) = -sin (theta)
cos (-theta) = cos (theta)
tan (-theta) = -tan (theta)
csc (-theta) = -csc (theta)
sec (-theta) = sec (theta)
cot (-theta) = -cot (theta)
Now proving them…well that's gonna take a lot of room but here it goes. I'm only
going to prove a few out of each category of identities so maybe you can figure out
the others. Lets start with the reciprocal. Well if the reciprocal of a number is simply
one divided by that number then we can look at cosecant (which is the reciprocal of
sine) as:
csc (theta) = 1 / sin (theta) = 1 / (y/r)   <-- I hope you know that y/r is sine (theta)
If you multiply the numerator and the denominator by "r" you get:
csc (theta) = r/y   <-- Just like we said before. We just proved
an identity...I'll let you do the rest of them...
Now the ratio identities. If you think of tangent as y/x , sine as y/r , and cosine as x/r
then check this out:
tan (theta) = sin (theta) / cos (theta) --- > (y/r) / (x/r) --- >
Multiply top and bottom by "r" and you're left with --- > y/x
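As a quick numerical spot-check of the identities listed above, one can test them in Python at an arbitrary angle where everything is defined:

import math

t = 0.7                                    # any angle (in radians) avoiding the excluded values
s, c = math.sin(t), math.cos(t)

print(math.isclose(math.tan(t), s / c))                 # ratio identity: tan = sin/cos
print(math.isclose(1 / math.tan(t), c / s))             # cot = cos/sin
print(math.isclose(s * s + c * c, 1.0))                 # Pythagorean identity
print(math.isclose(1 + math.tan(t) ** 2, 1 / c ** 2))   # 1 + tan^2 = sec^2
print(math.isclose(math.sin(-t), -s))                   # odd-even: sine is odd
print(math.isclose(math.cos(-t), c))                    # cosine is even
print(math.isclose(1 / math.sin(-t), -1 / s))           # cosecant is odd too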
I'm going to save the proof for the Pythagorean Identities for another time. These
fundamental identities will help us prove much more complex identities later on.
Knowing trigonometric identities will help us understand some of the more abstract
things…at least they are abstract to me. Once I am finished with this I am going to
write another tutorial that will go into the somewhat more complex areas that I know
of and these fundamental things I have just talked about are required reading.
I was going to go over some laws that can be very useful but my study plan tells me
that I may not have provided enough information for you to understand it…therefore
that will be something coming in the next thing I write.
Closing thoughts
Well this concludes all the things that you will need to know before you start to do
more complicated things. I was a bit brief with some things so if you have any
questions or if you want me to go back and further explain something I implore you to
e-mail me and I will do my best to clear up any confusion. Also I want to reiterate that
this is a very basic introduction to trigonometry. I hope you were not expecting to
read this and learn all there is to know. Actually I have not really even mentioned
Flash or the possibilities yet…and quite honestly there is not really anything to work
with yet. However once I do start to mention Flash and the math that it will take to
create some of these effects everyone sees it will almost be just like a review. When
you sit down and want to write out a script it will be like merely translating
everything you learned about trigonometry from a piece of paper into actionscript.
If you want a little synopsis of what I plan on talking about in the next few things I
write here you go:
- Trigonometry curves
- More advanced look into trigonometry
- Programmatic movement using trigonometry
- Orchestrating it all into perfect harmony (pardon the cliché)
Well that's it for me…until next time.
Definition
The "mean", or "average", or "expected value" is the weighted sum of all possible
outcomes. The roll of two dice, for instance, has a mean of 7. Multiply 2 by 1/36, the
odds of rolling a 2. Multiply 3 by 2/36, the odds of rolling a 3. Do this for all
outcomes up to 12. Add them up, and the result is 7. Toss the dice 100 times and
the sum of all those throws is going to be close to 700, i.e. 100 times the expected
value of 7.
The mean need not be one of the possible outcomes. Toss one die, and the mean is
3.5, even though there is no single outcome with value 3.5. But toss the die 100
times and the sum of all those throws will be close to 350.
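A short check of both dice examples, using exact fractions:

from fractions import Fraction

one_die = sum(Fraction(1, 6) * face for face in range(1, 7))
print(one_die)             # 7/2, i.e. 3.5

two_dice = sum(Fraction(1, 36) * (a + b)
               for a in range(1, 7) for b in range(1, 7))
print(two_dice)            # 7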
Given a continuous density function f(x), the expected value is the integral of x×f(x).
This is the limit of the discrete weighted sum described above.
Let's consider a pathological example. Let f(x) = 1/x2, from 1 to infinity. This is a
valid density function with integral equal to 1. What is its expected value? Multiply
by x to get 1/x, and integrate to get log(x). Evaluate log(x) at 1 and infinity, giving an
infinite expected value. Whatever the outcome, you can expect larger outcomes in
the future.
Add a constant c to each outcome, and you add c to the expected value. Prove this
for discrete and continuous density functions.
Similarly, scale the output by a constant c, and the mean is multiplied by c. This is
proved using integration by substitution.
The sum of two independent variables adds their means. This is intuitive, but takes
a little effort to prove. If f and g are the density functions of x and y, then the density
function for both variables is f(x)g(y). Multiply by x+y and take the integral over the
xy plane. Treat it as two integrals:
∫{ f(x)g(y)x } + ∫{ f(x)g(y)y }
The first integral becomes the mean of x times 1, and the second becomes 1 times
the mean of y. Hence the mean of the sum is the sum of the means.
Arithmetic and Geometric Mean
The arithmetic mean is the mean, as described above. If all values are positive, the
geometric mean is computed by taking logs, finding the arithmetic mean, and taking
the exponential. If there are just a few values, the same thing can be accomplished
by multiplying them together and taking the nth root. In the arithmetic mean, you
add up and divide by n; in the geometric mean, you multiply up and take the
nth root. The geometric mean of 21, 24, and 147 is 42.
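Both routes to the geometric mean, checked on the example 21, 24, and 147:

import math

values = [21, 24, 147]

product_route = math.prod(values) ** (1 / len(values))                 # nth root of the product
log_route = math.exp(sum(math.log(v) for v in values) / len(values))   # mean of the logs, then exp
print(product_route, log_route)    # both 42, up to floating-point rounding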
The geometric mean is used when the log of a measurement is a better indicator (for
whatever reason) than the measurement itself. If we wanted to find, for example, the
"average" strength of a solar flare, we might use a geometric mean, because the
strength can vary by orders of magnitude. Of course, scientists usually develop
logarithmic scales for these phenomena - such as the Richter scale, the decibel
scale, and so on. When logs are already implicit in the measurements we can return
to the arithmetic mean.
The Arithmetic Mean Exceeds the Geometric Mean
The average of 2, 5, 8, and 9 is 6, yet the geometric mean is 5.18. The geometric
mean always comes out smaller.
Let f be a differentiable function that maps the reals, or an unbroken segment of the
reals, into the reals. Let f′ be everywhere positive, and let f′′ be everywhere
negative. Let g be the inverse of f.
Let s be a finite set of real numbers with mean m. Apply f to s, take the average, and
apply g. The result is less than m, or equal to m if everything in s is equal to m.
When f = log(x), the relationship between the geometric mean and the arithmetic
mean is a simple corollary.
Shift f(x) up or down, so that f(m) = 0. Let v = f′(m). If x is a value in s less than m,
and if f were a straight line with slope v, f(x) would be v×(x-m). Actually f(x) has to be
smaller, else the mean value theorem implies a first derivative ≤ v, and a second
derivative ≥ 0. On the other side, when x is greater than m, similar reasoning shows
f(x) is less than v×(x-m). The entire curve lies below the line with slope v passing
through the origin.
If f were a line, f(s) would have a mean of 0. But for every x ≠ m, f(x) is smaller. This
pulls the mean below 0, and when we apply f inverse, the result lies below m.
If f′′ is everywhere positive then the opposite is true; the mean of the image of s in f
pulls back to a value greater than m.
All this can be extended to the average of a continuous function h(x) from a to b.
Choose Riemann nets with regular spacing, and apply the theorem to the resulting
Riemann sums. As the spacing approaches 0, the average remains ≤ m, and in the
limit, the average of f(h), pulled back through g, is no larger than the average of h.
If h is nonconstant the average through f comes out strictly smaller than the average
of h. You'll need uniform continuity, which is assured by continuity across the closed
interval [a,b]. The scaled Riemann sums approach the average of f(h), and after a
while, the mean, according to each riemann sum, can be bounded below f(m). I'll
leave the details to you.
Variance and Standard Deviation
If the mean of a random variable is m, the variance is the sum or integral of f(x)(x-m)^2.
To illustrate, let m = 0. The variance is now the weighted sum of the outcomes
squared. In other words, how far does the random variable stray from its mean? If
the variance is 0, the outcome is always zero. Any nonzero outcome produces a
positive variance.
Consider the example of throwing two dice. The average throw produces 7, so
subtract 7 from everything. Ten times out of 36 you get a 6 or an 8, giving 10/36 × 1^2,
or 10/36. Eight times out of 36 you get a 5 or a 9, so add in 8 × 4/36. Continue
through all possible rolls. When you're done, the variance is 35/6.
Recall that (x-m)^2 = x^2 - 2mx + m^2. This lets us compute both mean and variance in
one pass, which is helpful if the data set is large. Add up f(x)×x, and f(x)×x^2. The
former becomes m, the mean. The latter is almost the variance, but we must add
m^2 times the sum of f(x) (which is m^2), and subtract 2m times the sum of xf(x) (which
becomes 2m^2). Hence the variance is the sum of f(x)x^2, minus the square of the
mean.
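A small sketch of that one-pass computation, applied to the two-dice distribution from the earlier example:

from fractions import Fraction

# probability f(x) of each total x of two dice
f = {}
for a in range(1, 7):
    for b in range(1, 7):
        f[a + b] = f.get(a + b, 0) + Fraction(1, 36)

mean = sum(p * x for x, p in f.items())
second_moment = sum(p * x * x for x, p in f.items())
variance = second_moment - mean ** 2     # sum of f(x) x^2 minus the square of the mean
print(mean, variance)                    # 7 and 35/6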
The above is also true for continuous variables. The variance is the integral of
f(x)x^2, minus the square of the mean. The proof is really the same as above.
Variance is a bit troublesome however, because the units are wrong. Let a random
variable indicate the height of a human being on earth. Height is measured in
meters, and the mean, the average height of all people, is also measured in meters.
Yet the variance, the variation of height about the mean, seems to be measured in
meters squared. To compensate for this, the standard deviation is the square root of
variance. Now we're back to meters again. If the average height is 1.7 meters, and
the standard deviation is 0.3 meters, we can be pretty sure that a person, chosen at
random, will be between 1.4 and 2.0 meters tall. How sure? We'll quantify that later
on. For now, the standard deviation gives a rough measure of the spread of a
random variable about its mean.
The Variance of the Sum
We showed that the mean of the sum of two random variables is the sum of the
individual means. What about variance?
Assume, without loss of generality, that mean(x) = mean(y) = 0. If x and y have
density functions f and g, the individual variances are the integrals of f(x)x^2 and
g(y)y^2, respectively. Taken together, the combined density function is f×g, and we
want to know the variance of x+y. Consider the following double integral.
∫∫ f(x)g(y)(x+y)^2 =
∫∫{ f(x)g(y)x^2 } + ∫∫{ 2f(x)g(y)xy } + ∫∫{ f(x)g(y)y^2 }
The first integral is the variance of x, and the third is the variance of y. The middle
integral is twice the mean of x times the mean of y, which is zero. Therefore the variance of the
sum is the sum of the variances.
Reverse Engineering
If a random variable has a mean of 0 and a variance of 1, what can we say about it?
Not a whole lot. The outcome could be 0 most of the time, and on rare occasions, a
million. That gives a variance of 1. But for all practical purposes the "random"
variable is always 0. Alternatively, x could be ±1, like flipping a coin. This has mean
0 and variance 1, yet the outcome is never 0. Other functions produce values of 1/3,
0.737, sqrt(½), and so on. There's really no way to know.
We can however say something about the odds of finding x ≥ c, for c ≥ 1. Let |x|
exceed c with probability p. The area of the curve, beyond c, is p. This portion of
the curve contributes at least pc^2 to the variance. Since this cannot exceed 1, the
probability of finding x beyond c is bounded by 1/c^2.
Generalize the above proof to a random variable with mean m and standard
deviation s. If c is at least s, x is at least c away from m with probability at most
s^2/c^2.
The Mean is Your Best Guess
Let a random variable x have a density function f and a mean m. You would like to
predict the value of x, in a manner that minimizes error. If your prediction is t, the
error is defined as (x-t)^2, i.e. the square of the difference between your prediction
and the actual outcome. What should you guess to minimize error?
The expected error is the integral of f(x)(x-t)^2, from -infinity to +infinity. Write this as
three separate integrals:
error = ∫{ f(x)x^2 } - ∫{ 2f(x)xt } + ∫{ f(x)t^2 }
The first integral becomes a constant, i.e. it does not depend on t. The second
becomes -2mt, where m is the mean, and the third becomes t^2. This gives a
quadratic in t. Find its minimum by setting its first derivative equal to 0. Thus t = m,
and the mean is your best guess. The expected error is the variance of f.
What is Log?
Date: 26 Feb 1995 22:46:28 -0500
From: charley
Subject: Math questions
Hi,
My name is Yutaka Charley and I'm in the 5th grade at PS150Q
in NYC.
What's 4 to the half power?
What does log mean?
Thank you.
Yutaka
Date: 27 Feb 1995 21:54:12 -0500
From: Dr. Ken
Subject: Re: Math questions
Hello there!
I'll address your second question, the one about Logs; and my
colleague and buddy Ethan has promised to answer your first
question,
the one about 4 to the 1/2 power.
Here's the definition of Log:
If a^b = x, then Log_a (x) = b.
When you read that, you say "if a to the b power equals x,
then the Log (or Logarithm) to the base a of x equals b." Log is short for
the word Logarithm. Here are a couple of examples: Since 2^3 = 8,
Log_2 (8) = 3.
For the rest of this letter we will use ^ to represent
exponents -
2^3 means 2 to the third power.
To find out what Log_5 (25) is, we'd ask ourselves "what power
do you raise 5 to to get 25?" Since 5^2 = 25, the answer to this one is 2.
So the Logarithm to the base 5 of 25 is 2.
Whenever you talk about a Logarithm, you have to say what base
you're
talking about. For instance, the Logarithm to the base 3 of
81 is 4, but
the Logarithm to the base 9 of 81 is 2.
Here are a couple of examples that you can try to figure out:
What is the
Logarithm to the base 2 of 16? What is the Logarithm to the
base 7 of 343?
How would you express the information, 4^3 = 64, in terms of
Logarithms?
_______________
Now that you have done Logarithms I will take over for my
buddy
Ken and talk about fractional exponents.
To help explain fractional exponents I need to teach you one
neat
fact about exponents:
3^4 times 3^5 equals 3^(4+5) or 3^9
This will be very important so I will show a few more
examples.
4^7 times 4^10 equals 4^17
5^2 times 5^6 equals 5^8
Now let's get to fractional exponents. Let's start with
9^(1/2).
We know from our adding rule that 9^(1/2) times 9^(1/2) is
9^(1/2 + 1/2),
which is 9^1; so whatever 9^(1/2) is, we know that it times
itself has to
equal nine. But what times itself equals 9? Well 3, so
9^(1/2) is 3.
All fractional exponents work this way. Lets look at 8^(1/3).
Again,
8^(1/3) times 8^(1/3) times 8^(1/3) is 8^(1/3 + 1/3 + 1/3),
which is
8; so we need to know what times itself three times is 8.
That is 2.
So now look at your problem, 4^(1/2). We know from experience
that this means what number times itself is 4? That is 2, so
4^(1/2)
equals 2.
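The examples from this exchange can also be checked numerically in Python (the printed values are approximate because of floating-point rounding):

import math

print(math.log(8, 2))      # about 3, since 2^3 = 8
print(math.log(25, 5))     # about 2, since 5^2 = 25
print(math.log(81, 3))     # about 4
print(math.log(81, 9))     # about 2
print(9 ** 0.5)            # 3.0  (9 to the one-half power is the square root of 9)
print(8 ** (1 / 3))        # 2.0  (8 to the one-third power is the cube root of 8)
print(4 ** 0.5)            # 2.0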
Geometrical Meaning of Matrix Multiplication
Definitions of 'matrix' Wordnet
1. (noun) matrix
(mathematics) a rectangular array of quantities or expressions set out by rows and
columns; treated as a single element and manipulated according to rules
2. (noun) matrix
(geology) a mass of fine-grained rock in which fossils, crystals, or gems are
embedded
3. (noun) matrix
an enclosure within which something originates or develops (from the Latin for
womb)
4. (noun) matrix, intercellular substance, ground substance
the body substance in which tissue cells are embedded
5. (noun) matrix
the formative tissue at the base of a nail
6. (noun) matrix
mold used in the production of phonograph records, type, or other relief surface
Definitions of 'matrix' Webster 1913 Dictionary
1. (noun) matrix
the womb
2. (noun) matrix
hence, that which gives form or origin to anything
3. (noun) matrix
the cavity in which anything is formed, and which gives it shape; a die; a mold, as
for the face of a type
4. (noun) matrix
the earthy or stony substance in which metallic ores or crystallized minerals are
found; the gangue
5. (noun) matrix
the five simple colors, black, white, blue, red, and yellow, of which all the rest are
composed
6. (noun) matrix
the lifeless portion of tissue, either animal or vegetable, situated between the cells;
the intercellular substance
7. (noun) matrix
a rectangular arrangement of symbols in rows and columns. The symbols
may express quantities or operations
Definitions of 'matrix' The New Hacker's Dictionary
1. matrix
[FidoNet]
1. What the Opus BBS software and sysops call FidoNet.
2. Fanciful term for a cyberspace expected to emerge from current networking
experiments (see the network). The name of the
rather good 1999 cypherpunk movie The Matrix played on this sense, which
however had been established for years before.
3. The totality of present-day computer networks (popularized in
this sense by John Quarterman; rare outside academic literature).
Matrix multiplication is a versatile tool for many aspects of scientific or technical
methods. One particular application of matrix multiplication is the transformation of
data in n-dimensional space. Data can be scaled, shifted, rotated, or distorted by a
simple matrix multiplication. In order to achieve all these operations by a single
transformation matrix, the original data has to be augmented by an additional
constant value (preferably 1). The following example shows the effects of matrix
multiplication on a small set of data points.
Example: transformation of two-dimensional points. Suppose you have seven data
points in two dimensions (x and y). These seven data points have to be submitted to
various transformation operations. Therefore we first augment the data points,
denoted by [xi,yi], with a constant value, resulting in the point vectors [xi,yi,1].
For performing the various transformations, we simply have to adjust the
transformation matrix.
Shift: The coordinates of the data points are shifted by the vector [t1, t2].
Scaling: The points are scaled by the factor s.
Scaling only the y coordinate: Here, only the y coordinates are scaled according to the factor s.
Rotation: A rotation of all points around the origin can be accomplished by using the sines and cosines of the rotation angle (remember the negative sign for the first sine term).
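The following is a minimal sketch of this augmented-coordinate approach. The seven data points and the particular shift, scale, and angle are made up, and the matrices are the standard homogeneous-coordinate forms rather than the ones shown in the original figures:

```python
import numpy as np

points = np.array([[0, 0], [1, 0], [1, 1], [0, 1],
                   [2, 1], [2, 2], [0, 2]], dtype=float)      # seven points
augmented = np.hstack([points, np.ones((len(points), 1))]).T  # columns [x, y, 1]

t1, t2 = 3.0, -1.0           # shift vector [t1, t2]
s = 2.0                      # scale factor
a = np.deg2rad(30)           # rotation angle

shift = np.array([[1, 0, t1],
                  [0, 1, t2],
                  [0, 0, 1]])
scale_y = np.array([[1, 0, 0],          # leaves x alone,
                    [0, s, 0],          # scales only the y coordinate
                    [0, 0, 1]])
rotate = np.array([[np.cos(a), -np.sin(a), 0],   # note the negative sign
                   [np.sin(a),  np.cos(a), 0],   # on the first sine term
                   [0,          0,         1]])

print(shift @ augmented)     # every point moved by (t1, t2)
print(scale_y @ augmented)   # only the y coordinates scaled by s
print(rotate @ augmented)    # all points rotated about the origin
```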
The ordinary matrix product is the most often used and the most important way to
multiply matrices. It is defined between two matrices only if the width of the first
matrix equals the height of the second matrix. Multiplying an m×n matrix with
an n×p matrix results in an m×p matrix. If many matrices are multiplied together, and
their dimensions are written in a list in order, e.g. m×n, n×p, p×q, q×r, the size of the
result is given by the first and the last numbers (m×r), and the values surrounding
each comma must match for the result to be defined. The ordinary matrix product is
not commutative: in general, AB ≠ BA.
The first coordinate in matrix notation denotes the row and the second the column;
this order is used both in indexing and in giving the dimensions. The element at
the intersection of row i and column j of the product matrix is the dot product (or
scalar product) of row i of the first matrix and column j of the second matrix. This
explains why the width of the first matrix and the height of the second must match:
otherwise the dot product is not defined.
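A plain-Python sketch of that rule (the example matrices are illustrative):

```python
# Entry (i, j) of the product is the dot product of row i of the first
# matrix with column j of the second matrix.
def matmul(A, B):
    m, n = len(A), len(A[0])
    n2, p = len(B), len(B[0])
    if n != n2:
        raise ValueError("width of A must equal height of B")
    return [[sum(A[i][k] * B[k][j] for k in range(n))
             for j in range(p)]
            for i in range(m)]

A = [[1, 2, 3],
     [4, 5, 6]]          # 2 x 3
B = [[7, 8],
     [9, 10],
     [11, 12]]           # 3 x 2

print(matmul(A, B))      # 2 x 2 result: [[58, 64], [139, 154]]
```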
The product of two matrices A and B can be pictured as a grid of intersections: each
intersection in the product matrix corresponds to a row of A and a column of B.
The size of the output matrix is always the largest possible, i.e. for each row of A and
for each column of B there is a corresponding intersection in the product matrix.
The product matrix AB consists of all combinations of dot products of rows
of A and columns of B.
Formal definition
Formally, for an m×n matrix A and an n×p matrix B over some field F, the product AB
is the m×p matrix whose elements are given by
(AB)i,j = Ai,1 B1,j + Ai,2 B2,j + ... + Ai,n Bn,j
for each pair i and j with 1 ≤ i ≤ m and 1 ≤ j ≤ p. The algebraic system of "matrix units"
summarizes the abstract properties of this kind of multiplication.
Relationship with the inner product and the outer product
The Euclidean inner product and outer product are the simplest special cases of the
ordinary matrix product. The inner product of two column vectors A and B is A^T B,
where T denotes the matrix transpose; the outer product is A B^T.
Matrix multiplication can be viewed in terms of these two operations by considering
how the matrix product works on block matrices.
Decomposing A into row vectors and B into column vectors, the product AB can be
written blockwise. The entry-by-entry method described above is an outer product in
which the real product inside is replaced with the inner product. In general, block
matrix multiplication works exactly like ordinary matrix
multiplication, but the real product inside is replaced with the matrix product.
An alternative method results when the decomposition is done the other way around
(A decomposed into column vectors and B into row vectors):
This method emphasizes the effect of individual column/row pairs on the result,
which is a useful point of view with e.g. covariance matrices, where each such pair
corresponds to the effect of a single sample point. An example for a small matrix:
One more useful decomposition results when B is decomposed into columns and A is
left undecomposed. Then A is seen to act separately on each column of B,
transforming them in parallel. Conversely, B acts separately on each row of A.
If x is a vector and A is decomposed into columns, then
Ax = x1·a1 + x2·a2 + ... + xn·an, where a1, ..., an are the columns of A.
The column vectors of A give directions and units for coordinate axes and the
elements of x are coordinates on the corresponding axes; Ax is then the vector
which has those coordinates along those axes.
Properties
• Matrix multiplication is not generally commutative: AB ≠ BA in general.
• If A and B are both n × n matrices, the determinant of their product is
independent of the order of the matrices in the product: det(AB) = det(BA).
• If both matrices are diagonal square matrices of the same dimension,
their product is commutative.
• If A is a matrix representative of a linear transformation L and B is a
matrix representative of a linear transformation P, then AB is a matrix
representative of the linear transformation P followed by the linear
transformation L.
• Matrix multiplication is associative: A(BC) = (AB)C.
• Matrix multiplication is distributive over matrix addition:
A(B + C) = AB + AC and (A + B)C = AC + BC.
• If the matrices are defined over a field (for example, over the
real or complex numbers), then matrix multiplication is compatible with scalar
multiplication in that field: c(AB) = (cA)B = A(cB), where c is a scalar.
Algorithms for efficient matrix multiplication
The running time of square matrix multiplication, if carried out naively, is O(n^3). The
running time for multiplying rectangular matrices (one m×p matrix with one p×n
matrix) is O(mnp). But more efficient algorithms do exist. Strassen's algorithm,
devised by Volker Strassen in 1969 and often referred to as "fast matrix
multiplication", is based on a clever way of multiplying two 2 × 2 matrices which
requires only 7 multiplications (instead of the usual 8), at the expense of several
additional addition and subtraction operations. Applying this trick recursively gives an
algorithm with a multiplicative cost of O(n^(log2 7)) ≈ O(n^2.807). Strassen's algorithm
is awkward to implement, compared to the naive algorithm, and it lacks numerical
stability. Nevertheless, it is beginning to appear in libraries such as BLAS, where it is
computationally interesting for matrices with dimensions n > 100,[1] and is very useful
for large matrices over exact domains such as finite fields, where numerical stability
is not an issue.
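Here is a compact sketch of the Strassen recursion, assuming square matrices whose size is a power of two; padding and the usual switch to the naive method for small blocks are omitted for brevity:

```python
import numpy as np

def strassen(A, B):
    n = A.shape[0]
    if n == 1:
        return A * B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]

    # Seven block products instead of eight
    M1 = strassen(A11 + A22, B11 + B22)
    M2 = strassen(A21 + A22, B11)
    M3 = strassen(A11, B12 - B22)
    M4 = strassen(A22, B21 - B11)
    M5 = strassen(A11 + A12, B22)
    M6 = strassen(A21 - A11, B11 + B12)
    M7 = strassen(A12 - A22, B21 + B22)

    C = np.empty_like(A)
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C

A = np.random.rand(4, 4)
B = np.random.rand(4, 4)
print(np.allclose(strassen(A, B), A @ B))   # True
```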
The O(n^k) algorithm with the lowest currently known exponent k is the Coppersmith–
Winograd algorithm. It was presented by Don Coppersmith and Shmuel Winograd in
1990 and has an asymptotic complexity of O(n^2.376). It is similar to Strassen's algorithm: a
clever way is devised for multiplying two k × k matrices with fewer than k^3
multiplications, and this technique is applied recursively. However, the
constant coefficient hidden by the Big O notation is so large that the Coppersmith–
Winograd algorithm is only worthwhile for matrices that are too large to handle on
present-day computers.[2]
Since any algorithm for multiplying two n × n matrices has to process all 2n²
entries, there is an asymptotic lower bound of Ω(n²) operations. Raz (2002) proves a
lower bound of Ω(m² log m) for bounded-coefficient arithmetic circuits over the real or
complex numbers.
Cohn et al. (2003, 2005) put methods such as the Strassen and Coppersmith–
Winograd algorithms in an entirely different group-theoretic context. They show that
if families of wreath products of Abelian groups with symmetric groups satisfying certain
conditions exist, then there are matrix multiplication algorithms with essentially
quadratic complexity. Most researchers believe that this is indeed the case;[3] a
lengthy attempt at proving this was undertaken by the late Jim Eve.[4]
Because of the nature of matrix operations and the layout of matrices in memory, it is
typically possible to gain substantial performance through the use
of parallelisation and vectorization. As a result, an algorithm with lower time
complexity on paper may have hidden costs and run more slowly on real
machines.
Relationship to linear transformations
Matrices offer a concise way of representing linear transformations between vector
spaces, and (ordinary) matrix multiplication corresponds to the composition of linear
transformations. This will be illustrated here by means of an example using three
vector spaces of specific dimensions, but the correspondence applies equally to any
other choice of dimensions.
Let X, Y, and Z be three vector spaces, with dimensions 4, 2, and 3, respectively, all
over the same field, for example the real numbers. The coordinates of a point
in X will be denoted as xi, for i = 1 to 4, and analogously for the other two spaces.
Two linear transformations are given: one from Y to X, which can be expressed by
the system of linear equations
and one from Z to Y, expressed by the system
These two transformations can be composed to obtain a transformation from Z to X.
By substituting, in the first system, the right-hand sides of the equations of the
second system for their corresponding left-hand sides, the xi can be expressed in
terms of the zk:
These three systems can be written equivalently in matrix–vector notation – thereby
reducing each system to a single equation – as follows:
Representing these three systems symbolically and more concisely as x = Ay, y = Bz,
and x = Cz, inspection of the entries of matrix C reveals that C = AB.
This can be used to formulate a more abstract definition of matrix multiplication,
given the special case of matrix–vector multiplication: the product AB of
matrices A and B is the matrix C such that for all vectors z of the appropriate
shape Cz = A(Bz).
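This characterization is easy to check numerically; a small sketch using the dimensions from the example above (4, 2, and 3), with arbitrary random entries:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((4, 2))   # transformation from Y (dim 2) to X (dim 4)
B = rng.random((2, 3))   # transformation from Z (dim 3) to Y (dim 2)
z = rng.random(3)

C = A @ B                # composed transformation from Z to X
print(np.allclose(C @ z, A @ (B @ z)))   # True
```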
Scalar multiplication
The scalar multiplication of a matrix A = (aij) and a scalar r gives a product r A of the
same size as A. The entries of r A are given by
For example, if
then
If we are concerned with matrices over a more general ring, then the above
multiplication is the left multiplication of the matrix A by the scalar r, while the right
multiplication Ar is defined entrywise by (Ar)ij = aij·r.
When the underlying ring is commutative, for example, the real or complex number
field, the two multiplications are the same. However, if the ring is not commutative,
such as the quaternions, they may be different.
Hadamard product
See also: (Function) pointwise product
For two matrices of the same dimensions, we have the Hadamard product (named
after French mathematician Jacques Hadamard), also known as the entrywise
product and the Schur product.[5]
Formally, for two matrices of the same dimensions:
the Hadamard product A · B is a matrix of the same dimensions
with elements given by
Note that the Hadamard product is a submatrix of the Kronecker product.
The Hadamard product is commutative.
The Hadamard product appears in lossy compression algorithms such as JPEG.
Kronecker product
Main article: Kronecker product
For any two arbitrary matrices A and B, we have the direct product
or Kronecker product A ⊗ B defined as
If A is an m-by-n matrix and B is a p-by-q matrix, then their Kronecker
product A ⊗ B is an mp-by-nq matrix.
The Kronecker product is not commutative.
If A and B represent linear transformations V1 → W1 and V2 → W2, respectively,
then A ⊗ B represents the tensor product of the two maps, V1 ⊗ V2 → W1 ⊗ W2.
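A short sketch contrasting the entrywise (Hadamard) and Kronecker products; the matrices are illustrative:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 5],
              [6, 7]])

print(A * B)            # Hadamard product: same 2x2 shape, entrywise
print(np.kron(A, B))    # Kronecker product: (2*2) x (2*2) = 4x4 matrix
print(np.array_equal(A * B, B * A))                   # True: Hadamard commutes
print(np.array_equal(np.kron(A, B), np.kron(B, A)))   # False: Kronecker does not
```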
Common properties
If A, B and C are matrices with appropriate dimensions defined over
a field (e.g. the real or complex numbers) and c is a scalar in that field, then for all
three types of multiplication:
• Matrix multiplication is associative: A(BC) = (AB)C.
• Matrix multiplication is distributive: A(B + C) = AB + AC and (A + B)C = AC + BC.
• Matrix multiplication is compatible with scalar multiplication: c(AB) = (cA)B = A(cB).
• Note that ordinary matrix multiplication is not commutative: in general AB ≠ BA,
although the order of multiplication can be reversed by transposing the matrices:
(AB)^T = B^T A^T.
The Frobenius inner product, sometimes denoted A:B, is the component-wise inner
product of two matrices as though they are vectors. In other words, it is the sum of
the entries of the Hadamard product, that is,
A:B = Σi Σj Aij Bij = trace(A^T B).
This inner product induces the Frobenius norm.
Square matrices can be multiplied by themselves repeatedly in the same way that
ordinary numbers can. This repeated multiplication can be described as a power of
the matrix. Using the ordinary notion of matrix multiplication, the usual power
identities hold for an n-by-n matrix A, positive integers k and l, and a scalar c; for
example, A^k A^l = A^(k+l) and (cA)^k = c^k A^k.
The naive way to compute a matrix power is to multiply the matrix A into the
result k times, starting with the identity matrix, just as in the scalar case. This can be
improved by using the binary representation of k, a method commonly used for
scalars. An even better method is to use the eigenvalue decomposition of A.
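A sketch of the binary-representation method (exponentiation by squaring) applied to matrices; the 2×2 example matrix is arbitrary:

```python
import numpy as np

def matrix_power(A, k):
    """Compute A**k for a square matrix A and integer k >= 0."""
    result = np.eye(A.shape[0])          # start from the identity matrix
    base = A.copy()
    while k > 0:
        if k & 1:                        # current binary digit of k is 1
            result = result @ base
        base = base @ base               # square for the next binary digit
        k >>= 1
    return result

A = np.array([[1.0, 1.0],
              [1.0, 0.0]])
print(matrix_power(A, 10))               # matches np.linalg.matrix_power(A, 10)
```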
Calculating high powers of matrices can be very time-consuming, but the complexity
of the calculation can be dramatically decreased by using the Cayley-Hamilton
theorem, which takes advantage of an identity found using the matrices'
characteristic polynomial and gives a much more efficient expression for A^k, which
instead raises a scalar to the required power, rather than a matrix.
Powers of diagonal matrices
The k-th power of a diagonal matrix A is the diagonal matrix whose diagonal entries
are the k-th powers of the corresponding entries of A.
When raising an arbitrary matrix (not necessarily a diagonal matrix) to a power, it is
often helpful to diagonalize the matrix first.
The Weighted Matrix Product
The Weighted Matrix Product (Weighted Matrix Multiplication) is a generalization of
ordinary matrix multiplication, in the following way.
Given a set of Weight Matrices, the Weighted Matrix Product of the
matrix pair is given by:
where c(A) is the number of columns of A.
The number of Weight Matrices is:
the number of columns of the left operand = the number of rows of the right operand
The number of rows of the Weight Matrices is:
number of rows of the left operand.
The number of columns of the Weight Matrices is:
the number of columns of the right operand.
The Weighted Matrix Product is defined only if the matrix operands are conformable
in the ordinary sense.
The resultant matrix has the number of rows of the left operand and the number
of columns of the right operand.
NOTE:
Ordinary Matrix Multiplication is the special case of Weighted Matrix Multiplication,
where all the weight matrix entries are 1s.
Ordinary Matrix Multiplication is Weighted Matrix Multiplication in a default "sea of 1s",
with the weight matrices formed out of the "sea" as necessary.
NOTE:
The Weighted Matrix Product is not generally associative:
Weighted matrix multiplication may be expressed in terms of ordinary matrix
multiplication, using matrices constructed from the constituent parts, as follows:
for an m×p matrix and a p×n matrix,
define:
then:
The Weighted Matrix product is especially useful in developing matrix bases closed
under a (not necessarily associative) product (algebras).
As an example, consider the following developments: It is convenient (although not
necessary) to begin with permutation matrices as the basis; since they are a known
basis and about as simple as there is.
The complex plane
Multiplication in the complex plane can be realized as weighted matrix multiplication,
with weights:
then:
So:
Thus a homomorphism between this construction and the complex plane is manifested.
Quaternions
Quaternion multiplication can likewise be realized as weighted matrix multiplication,
with weights:
then
Multiplying a 2 × 3 matrix by a 3 × 4 matrix is possible and it gives a 2 × 4 matrix as
the answer.
Multiplying a 7 × 1 matrix by a 1 × 2 matrix is okay; it gives a 7 × 2 matrix
A 4 × 3 matrix times a 2 × 3 matrix is NOT possible.
How to Multiply 2 Matrices
We use letters first to see what is going on. We'll see a numbers example after.
As an example, let's take a general 2 × 3 matrix multiplied by a 3 × 2 matrix.
The answer will be a 2 × 2 matrix.
We multiply and add the elements as follows. We work across the 1st row of the first
matrix, multiplying down the 1st column of the second matrix, element by element.
We add the resulting products. Our answer goes in position a11 (top left) of the
answer matrix.
We do a similar process for the 1st row of the first matrix and the 2nd column of the
second matrix. The result is placed in position a12.
Now for the 2nd row of the first matrix and the 1st column of the second matrix. The
result is placed in position a21.
Finally, we do the 2nd row of the first matrix and the 2nd column of the second
matrix. The result is placed in position a22.
So the result of multiplying our 2 matrices is as follows:
Now let's see a number example.
Example
Multiply:
Answer
Multiplying 2 × 2 Matrices
The process is the same for any size matrix. We multiply across rows of the first
matrix and down columns of the second matrix, element by element. We then add
the products:
In this case, we multiply a 2 × 2 matrix by a 2 × 2 matrix and we get a 2 × 2 matrix as
the result.
Example
Multiply:
Answer
Matrices and Systems of Simultaneous Linear Equations
We now see how to write a system of linear equations using matrix multiplication.
Example:
The system of equations
can be written as:
Matrices are ideal for computer-driven solutions of problems because computers
easily form arrays. We can leave out the algebraic symbols. A computer only
requires the first and last matrices to solve the system, as we will see in Matrices and
Linear Equations.
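As a sketch of how a computer works directly with the coefficient matrix and the constant matrix, here is a small example; the particular system of two equations is made up for illustration:

```python
import numpy as np

A = np.array([[2.0, 1.0],      # coefficient matrix
              [1.0, -3.0]])
b = np.array([5.0, -8.0])      # right-hand sides

x = np.linalg.solve(A, b)      # solves A @ x = b
print(x)                       # [1., 3.]
print(np.allclose(A @ x, b))   # True
```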
Note 1 - Notation
Take care when writing matrix multiplication.
The following expressions have different meanings:
AB is matrix multiplication
A×B cross product, which returns a vector
A*B used in computer notation, but not on paper
A•B dot product, which returns a scalar.
[See the Vector chapter for more information on vector and scalar quantities.]
Note 2 - Commutativity of Matrix Multiplication
Does AB = BA?
Let's see if it is true using an example.
Example
If
and
find AB and BA.
Answer
In general, when multiplying matrices, the commutative law doesn't hold,
i.e. AB ≠ BA. There are two common exceptions to this:
• The identity matrix: IA = AI = A.
• The inverse of a matrix: A⁻¹A = AA⁻¹ = I.
In the next section we learn how to find the inverse of a matrix.
Example - Multiplying by the Identity Matrix
Given that
find AI.
Answer
Exercises
1. If possible, find BA and AB.
Answer
2. Determine if B = A⁻¹.
Answer
3. In studying the motion of electrons, one of the Pauli spin matrices is
where
Show that s² = I.
[If you have never seen j before, go to the section on complex numbers].
Answer
4. Evaluate the following matrix multiplication which is used in directing the motion of
a robotic mechanism.
Answer
Diagonal matrix
From Wikipedia, the free encyclopedia
In linear algebra, a diagonal matrix is a square matrix in which the entries outside
the main diagonal (↘) are all zero. The diagonal entries themselves may or may not
be zero. Thus, the matrix D = (di,j) with n columns and n rows is diagonal if di,j = 0 whenever i ≠ j.
For example, the following matrix is diagonal:
The term diagonal matrix may sometimes refer to a rectangular diagonal matrix,
which is an m-by-n matrix with only the entries of the form di,i possibly non-zero.
However, in the remainder of this article we will consider only square matrices.
Any diagonal matrix is also a symmetric matrix. Also, if the entries come from
the field R or C, then it is a normal matrix as well.
Equivalently, we can define a diagonal matrix as a matrix that is
both upper- and lower-triangular.
The identity matrix In and any square zero matrix are diagonal. A one-dimensional
matrix is always diagonal.
Scalar matrix
A diagonal matrix with all its main diagonal entries equal is a scalar matrix, that is, a
scalar multiple λI of the identity matrix I. Its effect on a vector is scalar
multiplication by λ. For example, a 3×3 scalar matrix has the form:
The scalar matrices are the center of the algebra of matrices: that is, they are
precisely the matrices that commute with all other square matrices of the same
size.
For an abstract vector space V (rather than the concrete vector space K^n), or
more generally a module M over a ring R, with the endomorphism
algebra End(M) (the algebra of linear operators on M) replacing the algebra of
matrices, the analog of scalar matrices are scalar transformations. Formally,
scalar multiplication is a linear map, inducing a map R → End(M) (sending a
scalar λ to the corresponding scalar transformation, multiplication by λ) exhibiting
End(M) as an R-algebra. For vector spaces, or more generally free
modules M ≅ R^n, for which the endomorphism algebra is isomorphic to a
matrix algebra, the scalar transforms are exactly the center of the endomorphism
algebra, and similarly the invertible scalar transforms are the center of the general
linear group GL(V), where they are denoted by Z(V), following the usual notation for
the center.
Matrix operations
The operations of matrix addition and matrix multiplication are especially simple for
diagonal matrices. Write diag(a1,...,an) for a diagonal matrix whose diagonal entries,
starting in the upper left corner, are a1,...,an. Then, for addition, we have
diag(a1,...,an) + diag(b1,...,bn) = diag(a1+b1,...,an+bn)
and for matrix multiplication,
diag(a1,...,an) · diag(b1,...,bn) = diag(a1b1,...,anbn).
The diagonal matrix diag(a1,...,an) is invertible if and only if the entries a1,...,an are all
non-zero. In this case, we have
diag(a1,...,an)⁻¹ = diag(a1⁻¹,...,an⁻¹).
In particular, the diagonal matrices form a subring of the ring of all n-by-n matrices.
Multiplying an n-by-n matrix A from the left with diag(a1,...,an) amounts to multiplying
the i-th row of A by ai for all i; multiplying the matrix A from the right with
diag(a1,...,an) amounts to multiplying the i-th column of A by ai for all i.
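A brief sketch of these rules with small illustrative matrices:

```python
import numpy as np

D = np.diag([2.0, 3.0, 5.0])
A = np.arange(9.0).reshape(3, 3)

print(D @ A)   # row i of A multiplied by the i-th diagonal entry
print(A @ D)   # column i of A multiplied by the i-th diagonal entry

# Diagonal-by-diagonal products act entrywise on the diagonals,
# and the inverse just inverts each diagonal entry.
E = np.diag([1.0, 4.0, 10.0])
print(np.allclose(D @ E, np.diag([2.0 * 1.0, 3.0 * 4.0, 5.0 * 10.0])))  # True
print(np.allclose(np.linalg.inv(D), np.diag([1/2, 1/3, 1/5])))          # True
```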
Background
The variance of a random variable or distribution is the expectation, or mean, of the
squared deviation of that variable from its expected value or mean. Thus the
variance is a measure of the amount of variation within the values of that variable,
taking account of all possible values and their probabilities or weightings (not just the
extremes which give the range). For example, a perfect die, when thrown, has
expected value (1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5, expected absolute deviation 1.5 (the
mean of the equally likely absolute deviations (3.5 − 1, 3.5 − 2, 3.5 − 3, 4 − 3.5,
5 − 3.5, 6 − 3.5), giving 2.5, 1.5, 0.5, 0.5, 1.5, 2.5), but expected square deviation or
variance of 17.5/6 ≈ 2.9 (the mean of the equally likely squared deviations
2.5², 1.5², 0.5², 0.5², 1.5², 2.5²).
As another example, if a coin is tossed twice, the number of heads is: 0 with
probability 0.25, 1 with probability 0.5 and 2 with probability 0.25. Thus the variance
is 0.25 × (0 − 1)² + 0.5 × (1 − 1)² + 0.25 × (2 − 1)² = 0.25 + 0 + 0.25 = 0.5. (Note that
in this case, where tosses of coins are independent, the variance is additive, i.e., if
the coin is tossed n times, the variance will be 0.25n.)
Unlike expected deviation, the variance of a variable has units that are the square of
the units of the variable itself. For example, a variable measured in inches will have a
variance measured in square inches. For this reason, describing data sets via
their standard deviation or root mean square deviation is often preferred over
variance. In the dice example the standard deviation is √(17.5/6) ≈ 1.7, slightly larger
than the expected deviation of 1.5.
The standard deviation and the expected deviation can both be used as an indicator
of the "spread" of a distribution. The standard deviation is more amenable to
algebraic manipulation, and, together with variance and its
generalization covariance is used frequently in theoretical statistics; however the
expected deviation tends to be more robust as it is less sensitive to outliers arising
from measurement anomalies or an unduly heavy-tailed distribution.
Real-world distributions such as the distribution of yesterday’s rain throughout the
day are typically not fully known, unlike the behavior of perfect dice or an ideal
distribution such as the normal distribution, because it is impractical to account for
every raindrop. Instead one estimates the mean and variance of the whole
distribution as the computed mean and variance of n samples drawn suitably
randomly from the whole sample space, in this example yesterday’s rainfall.
This method of estimation is close to optimal, with the caveat that it underestimates
the variance by a factor of (n−1)/n (when n = 1 the variance of a single sample is
obviously zero regardless of the true variance), a bias which should be corrected for
when n is small. If the mean is determined in some other way than from the same
samples used to estimate the variance then this bias does not arise and the variance
can safely be estimated as that of the samples.
The variance of a real-valued random variable is its second central moment, and it
also happens to be its second cumulant. Just as some distributions do not have a
mean, some do not have a variance. The mean exists whenever the variance exists,
but not vice versa.
Definition
If a random variable X has the expected value (mean) μ = E[X], then the variance
of X is given by:
This definition encompasses random variables that are discrete, continuous, or
neither. It can be expanded as follows:
The variance of random variable X is typically designated as Var(X), σX², or simply
σ² (pronounced “sigma squared”). If a distribution does not have an expected value,
as is the case for the Cauchy distribution, it does not have a variance either. Many
other distributions for which the expected value does exist do not have a finite
variance because the relevant integral diverges. An example is a Pareto
distribution whose index k satisfies 1 < k ≤ 2.
Continuous case
If the random variable X is continuous with probability density function f(x),
where
and where the integrals are definite integrals taken for x ranging over the range of X.
Discrete case
If the random variable X is discrete with probability mass function x1 ↦ p1, ..., xn ↦ pn,
then
where
.
(When such a discrete weighted variance is specified by weights whose sum is
not 1, then one divides by the sum of the weights.) That is, it is the expected value of
the square of the deviation of X from its own mean. In plain language, it can be
expressed as “The mean of the square of the deviation of each data point from the
average”. It is thus the mean squared deviation.
Examples
Exponential distribution
The exponential distribution with parameter λ is a continuous distribution whose
support is the semi-infinite interval [0,∞). Its probability density function is given by:
and it has expected value μ = λ⁻¹. Therefore the variance is equal to 1/λ², so for an
exponentially distributed random variable, σ² = μ².
Fair dice
A six-sided fair die can be modelled with a discrete random variable with outcomes 1
through 6, each with equal probability 1/6. The expected value is
(1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5. Therefore the variance can be computed to be
((1 − 3.5)² + (2 − 3.5)² + (3 − 3.5)² + (4 − 3.5)² + (5 − 3.5)² + (6 − 3.5)²)/6
= 17.5/6 ≈ 2.92.
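A direct numeric check of this calculation:

```python
# Mean and variance of a fair six-sided die, computed from the definition.
outcomes = [1, 2, 3, 4, 5, 6]
p = 1 / 6

mean = sum(x * p for x in outcomes)
variance = sum((x - mean) ** 2 * p for x in outcomes)

print(mean)       # 3.5
print(variance)   # 2.9166... = 17.5 / 6
```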
Properties
Variance is non-negative because the squares are positive or zero. The variance of
a constant random variable is zero, and the variance of a variable in a data set is 0 if
and only if all entries have the same value.
Variance is invariant with respect to changes in a location parameter. That is, if a
constant is added to all values of the variable, the variance is unchanged. If all
values are scaled by a constant, the variance is scaled by the square of that
constant. These two properties can be expressed in the following formula:
The variance of a finite sum of uncorrelated random variables is equal to the sum of
their variances. This stems from the identity:
and that for uncorrelated variables covariance is zero.
In general, for the sum of N variables X1, ..., XN, we have
Var(X1 + ... + XN) = Σi Σj Cov(Xi, Xj).
Suppose that the observations can be partitioned into equal-
sized subgroups according to some second variable. Then the variance of the total
group is equal to the mean of the variances of the subgroups plus the variance of the
means of the subgroups. This property is known as variance decomposition or
the law of total variance and plays an important role in the analysis of variance. For
example, suppose that a group consists of a subgroup of men and an equally large
subgroup of women. Suppose that the men have a mean body length of 180 and that
the variance of their lengths is 100. Suppose that the women have a mean length of
160 and that the variance of their lengths is 50. Then the mean of the variances is
(100 + 50) / 2 = 75; the variance of the means is the variance of 180, 160 which is
100. Then, for the total group of men and women combined, the variance of the body
lengths will be 75 + 100 = 175. Note that this uses N for the denominator instead of
N − 1.
In a more general case, if the subgroups have unequal sizes, then they must be
weighted proportionally to their size in the computations of the means and variances.
The formula is also valid with more than two groups, and even if the grouping
variable is continuous.
This formula implies that the variance of the total group cannot be smaller than the
mean of the variances of the subgroups. Note, however, that the total variance is not
necessarily larger than the variances of the subgroups. In the above example, when
the subgroups are analyzed separately, the variance is influenced only by the man-
man differences and the woman-woman differences. If the two groups are combined,
however, then the men-women differences enter into the variance also.
Many computational formulas for the variance are based on this equality: The
variance is equal to the mean of the squares minus the square of the mean. For
example, if we consider the numbers 1, 2, 3, 4 then the mean of the squares is (1 ×
1 + 2 × 2 + 3 × 3 + 4 × 4) / 4 = 7.5. The regular mean of all four numbers is 2.5, so
the square of the mean is 6.25. Therefore the variance is 7.5 − 6.25 = 1.25, which is
indeed the same result obtained earlier with the definition formulas. Many pocket
calculators use an algorithm that is based on this formula and that allows them to
compute the variance while the data are entered, without storing all values in
memory. The algorithm is to adjust only three variables when a new data value is
entered: The number of data entered so far (n), the sum of the values so far (S), and
the sum of the squared values so far (SS). For example, if the data are 1, 2, 3, 4,
then after entering the first value, the algorithm would have n = 1, S = 1 and SS = 1.
After entering the second value (2), it would have n = 2, S = 3 and SS = 5. When all
data are entered, it would have n = 4, S = 10 and SS = 30. Next, the mean is
computed as M = S / n, and finally the variance is computed as SS / n − M × M. In
this example the outcome would be 30 / 4 − 2.5 × 2.5 = 7.5 − 6.25 = 1.25. If the
unbiased sample estimate is to be computed, the outcome will be multiplied by n /
(n − 1), which yields 1.667 in this example.
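The three-variable running scheme just described, written out as a small sketch:

```python
def running_variance(data):
    n, S, SS = 0, 0.0, 0.0
    for x in data:
        n += 1           # count of values entered so far
        S += x           # running sum of the values
        SS += x * x      # running sum of the squared values
    mean = S / n
    population_variance = SS / n - mean * mean   # can suffer from cancellation
    sample_variance = population_variance * n / (n - 1)   # unbiased (needs n > 1)
    return mean, population_variance, sample_variance

print(running_variance([1, 2, 3, 4]))   # (2.5, 1.25, 1.666...)
```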
Sum of uncorrelated variables (Bienaymé formula)
One reason for the use of the variance in preference to other measures of dispersion
is that the variance of the sum (or the difference) of uncorrelated random variables is
the sum of their variances:
This statement is called the Bienaymé formula[1] and was discovered in 1853. It is
often made with the stronger condition that the variables are independent, but
uncorrelatedness suffices. So if all the variables have the same variance σ², then,
since division by n is a linear transformation, this formula immediately implies that
the variance of their mean is
Var( x̄ ) = σ²/n.
That is, the variance of the mean decreases when n increases. This formula for the
variance of the mean is used in the definition of the standard error of the sample
mean, which is used in the central limit theorem.
Sum of correlated variables
In general, if the variables are correlated, then the variance of their sum is the sum of
their covariances:
(Note: This by definition includes the variance of each variable, since
Cov(X,X) = Var(X).)
Here Cov is the covariance, which is zero for independent random variables (if it
exists). The formula states that the variance of a sum is equal to the sum of all
elements in the covariance matrix of the components. This formula is used in the
theory of Cronbach's alpha in classical test theory.
So if the variables have equal variance σ² and the average correlation of distinct
variables is ρ, then the variance of their mean is
Var( x̄ ) = σ²/n + ((n − 1)/n)ρσ².
This implies that the variance of the mean increases with the average of the
correlations. Moreover, if the variables have unit variance, for example if they are
standardized, then this simplifies to
Var( x̄ ) = 1/n + ((n − 1)/n)ρ.
This formula is used in the Spearman-Brown prediction formula of classical test
theory. This converges to ρ if n goes to infinity, provided that the average correlation
remains constant or converges too. So for the variance of the mean of standardized
variables with equal correlations or converging average correlation we have
lim (n → ∞) Var( x̄ ) = ρ.
Therefore, the variance of the mean of a large number of standardized variables is
approximately equal to their average correlation. This makes clear that the sample
mean of correlated variables does generally not converge to the population mean,
even though the Law of large numbers states that the sample mean will converge for
independent variables.
Weighted sum of variables
The scaling property and the Bienaymé formula, along with this property from
the covariance page, Cov(aX, bY) = ab·Cov(X, Y), jointly imply that
Var(aX + bY) = a²·Var(X) + b²·Var(Y) + 2ab·Cov(X, Y).
This implies that in a weighted sum of variables, the variable with the largest weight
will have a disproportionately large weight in the variance of the total. For example,
if X and Y are uncorrelated and the weight of X is two times the weight of Y, then the
weight of the variance of X will be four times the weight of the variance of Y.
Decomposition
The general formula for variance decomposition or the law of total variance is:
If X and Y are two random variables and the variance of X exists, then
Var(X) = E[Var(X|Y)] + Var(E[X|Y]).
Here, E(X|Y) is the conditional expectation of X given Y, and Var(X|Y) is
the conditional variance of X given Y. (A more intuitive explanation is that given a
particular value of Y, then X follows a distribution with mean E(X|Y) and variance
Var(X|Y). The above formula tells how to find Var(X) based on the distributions of
these two quantities when Y is allowed to vary.) This formula is often applied
in analysis of variance, where the corresponding formula is
SSTotal = SSBetween + SSWithin.
It is also used in linear regression analysis, where the corresponding formula is
SSTotal = SSRegression + SSResidual.
This can also be derived from the additivity of variances, since the total (observed)
score is the sum of the predicted score and the error score, where the latter two are
uncorrelated.
Computational formula
Main article: computational formula for the variance
See also: algorithms for calculating variance
The computational formula for the variance follows in a straightforward manner
from the linearity of expected values and the above definition:
Var(X) = E[X²] − (E[X])².
This is often used to calculate the variance in practice, although it suffers
from catastrophic cancellation if the two components of the equation are similar in
magnitude.
Characteristic property
The second moment of a random variable attains its minimum value when taken
around the first moment (i.e., the mean) of the random variable,
i.e. argmin_m E[(X − m)²] = E[X]. Conversely, if a continuous function φ
satisfies argmin_m E[φ(X − m)] = E[X] for all random variables X, then it is
necessarily of the form φ(x) = a·x² + b, where a > 0. This also holds in the
multidimensional case.[2]
Calculation from the CDF
The population variance for a non-negative random variable can be expressed in
terms of the cumulative distribution function F using
Var(X) = 2∫₀^∞ u·H(u) du − ( ∫₀^∞ H(u) du )²,
where H(u) = 1 − F(u) is the right tail function. This expression can be used to
calculate the variance in situations where the CDF, but not the density, can be
conveniently expressed.
Approximating the variance of a function
The delta method uses second-order Taylor expansions to approximate the variance
of a function of one or more random variables: see Taylor expansions for the
moments of functions of random variables. For example, the approximate variance of
a function of one variable is given by
Var[f(X)] ≈ (f′(E[X]))²·Var(X),
provided that f is twice differentiable and that the mean and variance of X are finite.
Population variance and sample variance
In general, the population variance of a finite population of size N with values
x1, ..., xN is given by
σ² = (1/N) Σ (xi − μ)²,
where
μ = (1/N) Σ xi
is the population mean.
In many practical situations, the true variance of a population is not known a
priori and must be computed somehow. When dealing with extremely large
populations, it is not possible to count every object in the population.
A common task is to estimate the variance of a population from a sample.[3] We take
a sample with replacement of n values y1, ..., yn from the population, where n < N,
and estimate the variance on the basis of this sample. There are several good
estimators. Two of them are well known:
σn² = (1/n) Σ (yi − ȳ)²
and
s² = (1/(n − 1)) Σ (yi − ȳ)².[4]
Both are referred to as sample variance. Here, ȳ denotes the sample mean:
ȳ = (1/n) Σ yi.
The two estimators only differ slightly as can be seen, and for larger values of
the sample size n the difference is negligible. While the first one may be seen as the
variance of the sample considered as a population, the second one is the unbiased
estimator of the population variance, meaning that its expected value E[s²] is equal
to the true variance of the sampled random variable; the use of the term n − 1 is
called Bessel's correction. The sample variance with n − 1 is a U-statistic for the
function ƒ(x1, x2) = (x1 − x2)²/2, meaning that it is obtained by averaging a 2-sample
statistic over 2-element subsets of the population.
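A small simulation sketch of the bias and of Bessel's correction; the Normal population and the sample size here are arbitrary choices:

```python
import random

random.seed(1)
true_var = 4.0                      # population is Normal(0, sd = 2)
n, trials = 5, 50_000
biased_sum = unbiased_sum = 0.0

for _ in range(trials):
    ys = [random.gauss(0.0, 2.0) for _ in range(n)]
    ybar = sum(ys) / n
    ss = sum((y - ybar) ** 2 for y in ys)
    biased_sum += ss / n            # divide by n
    unbiased_sum += ss / (n - 1)    # divide by n - 1 (Bessel's correction)

print(biased_sum / trials)     # close to true_var * (n - 1) / n = 3.2
print(unbiased_sum / trials)   # close to true_var = 4.0
```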
Distribution of the sample variance
Being a function of random variables, the sample variance is itself a random
variable, and it is natural to study its distribution. In the case that yi are independent
observations from a normal distribution, Cochran's theorem shows that s² follows a
scaled chi-square distribution:
(n − 1)s²/σ² ~ χ²(n − 1).
As a direct consequence, it follows that E(s²) = σ².
If the yi are independent and identically distributed, but not necessarily normally
distributed, then
where κ is the kurtosis of the distribution. If the conditions of the law of large
numbers hold, s² is a consistent estimator of σ².
Generalizations
If X is a vector-valued random variable, with values in R^n and thought of as a
column vector, then the natural generalization of variance
is E[(X − μ)(X − μ)^T], where μ = E(X) and (X − μ)^T is the transpose of X − μ, and
so is a row vector. This variance is a positive semi-definite square matrix, commonly
referred to as the covariance matrix.
If X is a complex-valued random variable, with values in C^n, then its variance
is E[(X − μ)(X − μ)*], where (X − μ)* is the conjugate transpose of X − μ. This variance
is also a positive semi-definite square matrix.
History
The term variance was first introduced by Ronald Fisher in his 1918 paper The
Correlation Between Relatives on the Supposition of Mendelian Inheritance:[5]
The great body of available statistics show us that the deviations of a human
measurement from its mean follow very closely the Normal Law of Errors, and,
therefore, that the variability may be uniformly measured by the standard
deviation corresponding to the square root of the mean square error. When there are
two independent causes of variability capable of producing in an otherwise uniform
population distributions with standard deviations θ1 and θ2, it is found that the
distribution, when both causes act together, has a standard deviation √(θ1² + θ2²). It is
therefore desirable in analysing the causes of variability to deal with the square of
the standard deviation as the measure of variability. We shall term this quantity the
Variance...
Moment of inertia
The variance of a probability distribution is analogous to the moment of
inertia in classical mechanics of a corresponding mass distribution along a line, with
respect to rotation about its center of mass. It is because of this analogy that such
things as the variance are called moments of probability distributions. The
covariance matrix is related to the moment of inertia tensor for multivariate
distributions. The moment of inertia of a cloud of n points with a covariance matrix
of Σ is given by
This difference between moment of inertia in physics and in statistics is clear for
points that are gathered along a line. Suppose many points are close to the x-axis and
distributed along it. The covariance matrix might look like
That is, there is the most variance in the x direction. However, physicists would
consider this to have a low moment about the x axis so the moment-of-inertia tensor
is
Overview
The moment of inertia of an object about a given axis describes how difficult it is to
change its angular motion about that axis. Therefore, it encompasses not just how
much mass the object has overall, but how far each bit of mass is from the axis. The
farther out the object's mass is, the more rotational inertia the object has, and the
more force is required to change its rotation rate. For example, consider two hoops,
A and B, made of the same material and of equal mass. Hoop A is larger in diameter
but thinner than B. It requires more effort to accelerate hoop A (change its angular
velocity) because its mass is distributed farther from its axis of rotation: mass that is
farther out from that axis must, for a given angular velocity, move more quickly than
mass closer in. So in this case, hoop A has a larger moment of inertia than hoop B.
Divers reducing their moments of inertia to increase their rates of rotation
The moment of inertia of an object can change if its shape changes. A figure skater
who begins a spin with arms outstretched provides a striking example. By pulling in
her arms, she reduces her moment of inertia, causing her to spin faster (by the
conservation of angular momentum).
The moment of inertia has two forms, a scalar form, I, (used when the axis of rotation
is specified) and a more general tensor form that does not require the axis of rotation
to be specified. The scalar moment of inertia, I, (often called simply the "moment of
inertia") allows a succinct analysis of many simple problems inrotational dynamics,
such as objects rolling down inclines and the behavior of pulleys. For instance, while
a block of any shape will slide down a frictionless decline at the same rate, rolling
objects may descend at different rates, depending on their moments of inertia. A
hoop will descend more slowly than a solid disk of equal mass and radius because
more of its mass is located far from the axis of rotation, and thus needs to move
faster if the hoop rolls at the same angular velocity. However, for (more complicated)
problems in which the axis of rotation can change, the scalar treatment is
inadequate, and the tensor treatment must be used (although shortcuts are possible
in special situations). Examples requiring such a treatment include gyroscopes, tops,
and even satellites, all objects whose alignment can change.
The moment of inertia is also called the mass moment of inertia (especially by
mechanical engineers) to avoid confusion with the second moment of area, which is
sometimes called the moment of inertia (especially by structural engineers). The
easiest way to differentiate these quantities is through their units (kg·m² as opposed
to m⁴). In addition, moment of inertia should not be confused with polar moment of
inertia, which is a measure of an object's ability to resist torsion (twisting) only.
Scalar moment of inertia
Definition
A simple definition of the moment of inertia (with respect to a given axis of rotation)
of any object, be it a point mass or a 3D structure, is given by
I = ∫ r² dm,
where dm is an element of mass and r is its perpendicular distance to the axis of rotation.
Detailed analysis
The (scalar) moment of inertia of a point mass rotating about a known axis is defined
by I = m r².
The moment of inertia is additive. Thus, for a rigid body consisting of N point
masses mi with distances ri to the rotation axis, the total moment of inertia equals the
sum of the point-mass moments of inertia:
The mass distribution along the axis of rotation has no effect on the moment of
inertia.
For a solid body described by a mass density function, ρ(r), the moment of inertia
about a known axis can be calculated by integrating the square of the distance
(weighted by the mass density) from a point in the body to the rotation axis:
I = ∫V ρ(r) r² dV,
where
V is the volume occupied by the object.
ρ is the spatial density function of the object, and
r = (r,θ,φ), (x,y,z), or (r,θ,z) is the vector (orthogonal to the axis of rotation) between
the axis of rotation and the point in the body.
Diagram for the calculation of a disk's moment of inertia. Here c is 1/2 and r is the
radius used in determining the moment.
Based on dimensional analysis alone, the moment of inertia of a non-point object
must take the form I = c·M·L²,
where
M is the mass
L is a length dimension taken from the centre of mass (in some cases, the length of
the object is used instead.)
c is a dimensionless constant called the inertial constant that varies with the object in
consideration.
Inertial constants are used to account for the differences in the placement of the
mass from the center of rotation. Examples include:
c = 1, thin ring or thin-walled cylinder around its center,
c = 2/5, solid sphere around its center
c = 1/2, solid cylinder or disk around its center.
When c is 1, the length (L) is called the radius of gyration.
For more examples, see the List of moments of inertia.
Parallel axis theorem
Main article: Parallel axis theorem
Once the moment of inertia has been calculated for rotations about the center of
mass of a rigid body, one can conveniently recalculate the moment of inertia for all
parallel rotation axes as well, without having to resort to the formal definition. If the
axis of rotation is displaced by a distance r from the center-of-mass axis of rotation
(e.g., spinning a disc about a point on its periphery, rather than through its center),
the displaced and center-of-mass moments of inertia are related as follows:
I_displaced = I_center + m·r².
This theorem is also known as the parallel axes rule and is a special case
of Steiner's parallel-axis theorem.
Composite bodies
If a body can be decomposed (either physically or conceptually) into several
constituent parts, then the moment of inertia of the body about a given axis is
obtained by summing the moments of inertia of each constituent part around the
same given axis.[2]
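A minimal sketch combining the additivity of moments of inertia with the parallel axis theorem; the masses and distances are made-up illustrative values:

```python
def point_masses_moment(masses, distances):
    """I = sum of m_i * r_i**2 over the point masses."""
    return sum(m * r ** 2 for m, r in zip(masses, distances))

# Two "constituent parts" of a rigid body, described as point masses with
# their perpendicular distances to the chosen centre-of-mass axis.
part_a = point_masses_moment([1.0, 2.0], [0.5, 0.3])
part_b = point_masses_moment([0.5], [1.2])
I_total = part_a + part_b                 # composite-body rule: moments add

# Parallel axis theorem: shift the axis a distance d from the centre of mass.
M = 3.5                                   # total mass of the body
d = 0.8
I_shifted = I_total + M * d ** 2          # valid when the original axis passes
print(I_total, I_shifted)                 # through the centre of mass
```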
Equations involving the moment of inertia
The rotational kinetic energy of a rigid body can be expressed in terms of its moment
of inertia. For a system with N point masses mi moving with speeds vi, the rotational
kinetic energy T equals
T = Σ ½ mi vi² = Σ ½ mi (ri ω)² = ½ ω² Σ mi ri² = ½ I ω²,
where ω is the common angular velocity (in radians per second). The final
expression, I ω²/2, also holds for a mass density function, with a generalization of the
above derivation from a discrete summation to an integration.
In the special case where the angular momentum vector is parallel to the angular
velocity vector, one can relate them by the equation L = Iω,
where L is the angular momentum and ω is the angular velocity. However, this
equation does not hold in many cases of interest, such as the torque-
free precession of a rotating object, although its more general tensor form is always
correct.
When the moment of inertia is constant, one can also relate the torque on an object
and its angular acceleration in a similar equation: τ = Iα,
where τ is the torque and α is the angular acceleration.
Moment of inertia tensor
In three dimensions, if the axis of rotation is not given, we need to be able to
generalize the scalar moment of inertia to a quantity that allows us to compute a
moment of inertia about arbitrary axes. This quantity is known as the moment of
inertia tensor and can be represented as a symmetric positive semi-definite
matrix, I. This representation elegantly generalizes the scalar case: the angular
momentum vector L is related to the rotation velocity vector ω by L = I ω,
and the kinetic energy is given by T = ½ ω^T I ω,
as compared with L = Iω and T = ½ I ω² in the scalar case.
Like the scalar moment of inertia, the moment of inertia tensor may be calculated
with respect to any point in space, but for practical purposes, the center of mass is
almost always used.
Definition
For a rigid object of N point masses mk at positions (xk, yk, zk), the moment of inertia
tensor has components
Ixx = Σk mk(yk² + zk²),  Iyy = Σk mk(xk² + zk²),  Izz = Σk mk(xk² + yk²),
Ixy = −Σk mk xk yk,  Ixz = −Σk mk xk zk,  Iyz = −Σk mk yk zk,
and I12 = I21, I13 = I31, and I23 = I32. (Thus I is a symmetric tensor.)
Here Ixx denotes the moment of inertia around the x-axis when the objects are rotated
around the x-axis, Ixy denotes the moment of inertia around the y-axis when the
objects are rotated around the x-axis, and so on.
These quantities can be generalized to an object with distributed mass, described by
a mass density function, in a similar fashion to the scalar moment of inertia. One
then has
I = ∫V ρ(r) (‖r‖² E3 − r ⊗ r) dV,
where r ⊗ r is the outer product, E3 is the 3 × 3 identity matrix, and V is a region of
space completely containing the object. Alternatively, the equation above can be
represented in a component-based method. Recognizing that, in the above
expression, the scalars Iij with i ≠ j are called the products of inertia, a
generalized form of the products of inertia can be given as
The diagonal elements of I are called the principal moments of inertia.
Derivation of the tensor components
The distance r of a particle at position x from the axis of rotation passing through the
origin in the direction n̂ is ‖x × n̂‖. By using the formula I = mr² (and some simple
vector algebra) it can be seen that the moment of inertia of this particle (about the
axis of rotation passing through the origin in the direction n̂)
is I = m(‖x‖² − (x·n̂)²). This is a quadratic form in n̂ and, after a bit
more algebra, this leads to a tensor formula for the moment of inertia,
I = m(‖x‖² E3 − x ⊗ x).
This is exactly the formula given below for the moment of inertia in the case of a
single particle. For multiple particles we need only recall that the moment of inertia is
additive in order to see that this formula is correct.
Reduction to scalar
For any axis n̂, represented as a column vector with elements ni, the scalar
form I can be calculated from the tensor form I as
I = n̂^T I n̂ = Σi Σj ni Iij nj.
The range of both summations corresponds to the three Cartesian coordinates.
The following equivalent expression avoids the use of transposed vectors, which
are not supported in some maths libraries because internally vectors and their
transposes are stored as the same linear array:
I = Σi Σj Iij ni nj.
However, although this equation is mathematically equivalent to the equation above
for any matrix, inertia tensors are symmetrical. This means that it can be further
simplified to
I = Σi Iii ni² + 2 Σi Σj>i Iij ni nj.
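A small sketch computing the inertia tensor of a few point masses and reducing it to a scalar about a chosen axis; the masses and positions are illustrative:

```python
import numpy as np

masses = np.array([1.0, 2.0, 0.5])
positions = np.array([[1.0, 0.0, 0.0],
                      [0.0, 2.0, 0.0],
                      [1.0, 1.0, 3.0]])

# I = sum over masses of m * (|r|^2 * E3 - r (outer) r)
I = np.zeros((3, 3))
for m, r in zip(masses, positions):
    I += m * (np.dot(r, r) * np.eye(3) - np.outer(r, r))

n = np.array([0.0, 0.0, 1.0])        # axis of rotation (here the z-axis)
I_scalar = n @ I @ n                 # equals sum_ij n_i I_ij n_j
print(I_scalar)                      # same as sum of m * (x**2 + y**2)
print(sum(m * (r[0] ** 2 + r[1] ** 2) for m, r in zip(masses, positions)))
```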
Principal axes of inertia
By the spectral theorem, since the moment of inertia tensor is real and symmetric, it
is possible to find a Cartesian coordinate system in which it is diagonal, having the
form
where the coordinate axes are called the principal axes and the
constants I1, I2 and I3 are called the principal moments of inertia. The principal
axes of a body, therefore, are a cartesian coordinate system whose origin is located
at the center of mass. [3]
The unit vectors along the principal axes are usually
denoted as (e1, e2, e3). This result was first shown by J. J. Sylvester (1852), and is a
form of Sylvester's law of inertia. The principal axis with the highest moment of inertia
is sometimes called the figure axis or axis of figure.
When all principal moments of inertia are distinct, the principal axes are uniquely
specified. If two principal moments are the same, the rigid body is called
a symmetrical top and there is no unique choice for the two corresponding principal
axes. If all three principal moments are the same, the rigid body is called a spherical
top (although it need not be spherical) and any axis can be considered a principal
axis, meaning that the moment of inertia is the same about any axis.
The principal axes are often aligned with the object's symmetry axes. If a rigid body
has an axis of symmetry of order m, i.e., is symmetrical under rotations of
360°/m about a given axis, the symmetry axis is a principal axis. When m > 2, the
rigid body is a symmetrical top. If a rigid body has at least two symmetry axes that
are not parallel or perpendicular to each other, it is a spherical top, e.g., a cube or
any other Platonic solid.
The motion of vehicles is often described about these axes with
the rotations called yaw, pitch, and roll.
A practical example of this mathematical phenomenon is the routine automotive task
of balancing a tire, which basically means adjusting the distribution of mass of a car
wheel such that its principal axis of inertia is aligned with the axle so the wheel does
not wobble.
Parallel axis theorem
Once the moment of inertia tensor has been calculated for rotations about the center
of mass of the rigid body, there is a useful labor-saving method to compute the
tensor for rotations offset from the center of mass.
If the axis of rotation is displaced by a vector R from the center of mass, the new
moment of inertia tensor equals
I_displaced = I_center + m (‖R‖² E3 − R ⊗ R),
where m is the total mass of the rigid body, E3 is the 3 × 3 identity matrix, and R ⊗ R is
the outer product.
Rotational symmetry
Using the above equation to express all moments of inertia in terms of integrals of
variables either along or perpendicular to the axis of symmetry usually simplifies the
calculation of these moments considerably.
Comparison with covariance matrix
Main article: Moment (mathematics)
The moment of inertia tensor about the center of mass of a 3 dimensional rigid body
is related to the covariance matrix of a trivariate random vector whose probability
density function is proportional to the pointwise density of the rigid body by:
where n is the number of points.
The structure of the moment-of-inertia tensor comes from the fact that it is to be used
as a bilinear form on rotation vectors, in the form E = ½ ω^T I ω.
Each element of mass has a kinetic energy of ½ m v². The velocity of each element
of mass is v = ω × r, where r is a vector from the center of rotation to that element of
mass. The cross product can be converted to matrix multiplication, so that
v = −[r]× ω (where [r]× is the skew-symmetric matrix built from r), and similarly
v^T v = ω^T [r]×^T [r]× ω. Thus, plugging in the definition of the [r]× term leads
directly to the structure of the
moment tensor.
    set. For example,the geometric mean of 5, 7, 2, 1 is (5 × 7 × 2 × 1)1/4 = 2.893. Alternatively, if you log transform each of the individual units the geometric will be the exponential of the arithmetic mean of these log-transformed values. So, reusing the example above, exp [ ( ln(5) + ln(7) + ln(2) + ln(1) ) / 4 ] = 2.893. Geometric Mean Arithmetic Mean An arithmetic average is the sum of a series of numbers divided by the count of that series of numbers. If you were asked to find the class (arithmetic) average of test scores, you would simply add up all the test scores of the students, and then divide that sum by the number of students. For example, if five students took an exam and their scores were 60%, 70%, 80%, 90% and 100%, the arithmetic class average would be 80%. This would be calculated as: (0.6 + 0.7 + 0.8 + 0.9 + 1.0) / 5 = 0.8. The reason you use an arithmetic average for test scores is that each test score is an independent event. If one student happens to perform poorly on the exam, the next student's chances of doing poor (or well) on the exam isn't affected. In other words, each student's score is independent of the all other students' scores. However, there are some instances, particularly in the world of finance, where an arithmetic mean is not an appropriate method for calculating an average. Consider your investment returns, for example. Suppose you have invested your savings in the stock market for five years. If your returns each year were 90%, 10%, 20%, 30% and -90%, what would your average return be during this period? Well, taking the simple arithmetic average, you would get an answer of 12%. Not too shabby, you might think. However, when it comes to annual investment returns, the numbers are not
  • 5.
    independent of eachother. If you lose a ton of money one year, you have that much less capital to generate returns during the following years, and vice versa. Because of this reality, we need to calculate the geometric average of your investment returns in order to get an accurate measurement of what your actual average annual return over the five-year period is. To do this, we simply add one to each number (to avoid any problems with negative percentages). Then, multiply all the numbers together, and raise their product to the power of one divided by the count of the numbers in the series. And you're finished - just don't forget to subtract one from the result! That's quite a mouthful, but on paper it's actually not that complex. Returning to our example, let's calculate the geometric average: Our returns were 90%, 10%, 20%, 30% and -90%, so we plug them into the formula as [(1.9 x 1.1 x 1.2 x 1.3 x 0.1) ^ 1/5] - 1. This equals a geometric average annual return of -20.08%. That's a heck of a lot worse than the 12% arithmetic average we calculated earlier, and unfortunately it's also the number that represents reality in this case. It may seem confusing as to why geometric average returns are more accurate than arithmetic average returns, but look at it this way: if you lose 100% of your capital in one year, you don't have any hope of making a return on it during the next year. In other words, investment returns are not independent of each other, so they require a geometric average to represent their mean.
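As a quick sketch of that arithmetic (illustrative code, not part of the original text), the same five returns run through both averages:

#include <stdio.h>
#include <math.h>

int main(void)
{
    /* Annual returns from the example above: 90%, 10%, 20%, 30%, -90%. */
    double r[5] = {0.90, 0.10, 0.20, 0.30, -0.90};
    double sum = 0.0, product = 1.0;
    for (int i = 0; i < 5; i++) {
        sum += r[i];             /* arithmetic mean uses the plain sum  */
        product *= 1.0 + r[i];   /* geometric mean uses growth factors  */
    }
    double arithmetic = sum / 5.0;
    double geometric  = pow(product, 1.0 / 5.0) - 1.0;
    printf("arithmetic average: %+.2f%%\n", 100.0 * arithmetic); /* +12.00%      */
    printf("geometric  average: %+.2f%%\n", 100.0 * geometric);  /* about -20.08% */
    return 0;
}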
A matrix consists of a set of numbers arranged in rows and columns enclosed in brackets. The order of a matrix gives the number of rows followed by the number of columns in a matrix. The order of a matrix with 3 rows and 2 columns is 3 × 2, or 3 by 2. We usually denote a matrix by a capital letter. C is a matrix of order 2 × 4 (read as '2 by 4').
Elements In An Array
Each number in the array is called an entry or an element of the matrix. When we need to read out the elements of an array, we read it out row by row.
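A tiny illustrative snippet (added here, not part of the original lesson) showing a matrix of order 3 × 2 stored as an array and read out row by row:

#include <stdio.h>

int main(void)
{
    /* A 3-by-2 matrix: 3 rows, 2 columns (order 3 x 2). */
    int a[3][2] = { {1, 2},
                    {3, 4},
                    {5, 6} };
    /* Read the entries out row by row, printing 1-based positions
       to match the a11, a12, ... naming used above. */
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 2; j++)
            printf("a%d%d = %d\n", i + 1, j + 1, a[i][j]);
    return 0;
}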
Each element is defined by its position in the matrix. In a matrix A, an element in row i and column j is represented by aij.
Example: a11 (read as 'a one one') = 2 (first row, first column), a12 (read as 'a one two') = 4 (first row, second column), a13 = 5, a21 = 7, a22 = 8, a23 = 9.
Matrix Multiplication
There are two matrix operations which we will use in our matrix transformations: multiplying (concatenating) two matrices, and transforming a vector by a matrix. We will now examine the first of these two operations, matrix multiplication. Matrix multiplication is the operation by which one matrix is transformed by another. A very important thing to remember is that matrix multiplication is not commutative. That is, [a] * [b] != [b] * [a]. For now, it will suffice to say that a matrix multiplication stores the results of the sum of the products of matrix rows and columns. Here is some example code of a matrix multiplication routine which multiplies matrix [a] * matrix [b], then copies the result to matrix a.

void matmult(float a[4][4], float b[4][4])
{
    float temp[4][4];               // temporary matrix for storing result
    int i, j;                       // row and column counters

    for (j = 0; j < 4; j++)         // transform by columns first
        for (i = 0; i < 4; i++)     // then by rows
            temp[i][j] = a[i][0] * b[0][j] + a[i][1] * b[1][j] +
                         a[i][2] * b[2][j] + a[i][3] * b[3][j];

    for (i = 0; i < 4; i++)         // copy result matrix into matrix a
        for (j = 0; j < 4; j++)
            a[i][j] = temp[i][j];
}

I have been informed that there is a faster way of multiplying matrices, which involves taking the dot product of rows and columns. However, I have yet to implement such a method, so I will not discuss it here at this time.
Transforming a Vector by a Matrix
This is the second operation which is required for our matrix transformations. It involves projecting a stationary vector onto transformed axis vectors using the dot product. One dot product is performed for each coordinate axis.

x = x0 * matrix[0][0] + y0 * matrix[1][0] + z0 * matrix[2][0] + w0 * matrix[3][0];
y = x0 * matrix[0][1] + y0 * matrix[1][1] + z0 * matrix[2][1] + w0 * matrix[3][1];
z = x0 * matrix[0][2] + y0 * matrix[1][2] + z0 * matrix[2][2] + w0 * matrix[3][2];

The x0, y0, etc. coordinates are the original object space coordinates for the vector. That is, they never change due to transformation. "Alright," you say. "Where did all the w coordinates come from???" Good question :) The w coordinates come from what is known as a homogeneous coordinate system, which is basically a way to represent 3d space in terms of a 4d matrix. Because we are limiting ourselves to 3d, we pick a constant, nonzero value for w (1.0 is a good choice, since anything * 1.0 = itself). If we use this identity axiom, we can eliminate a multiply from each of the dot products:

x = x0 * matrix[0][0] + y0 * matrix[1][0] + z0 * matrix[2][0] + matrix[3][0];
y = x0 * matrix[0][1] + y0 * matrix[1][1] + z0 * matrix[2][1] + matrix[3][1];
z = x0 * matrix[0][2] + y0 * matrix[1][2] + z0 * matrix[2][2] + matrix[3][2];

These are the formulas you should use to transform a vector by a matrix.
Object Space Transformations
Now that we know how to multiply matrices together, we can implement the actual formulas used in our transformations. There are three operations performed on a vector by a matrix transformation: translation, rotation, and scaling. Translation can best be described as linear change in position. This change can be represented by a delta vector [dx, dy, dz], where dx represents the change in the object's x position, dy represents the change in its y position, and dz its z position. If done correctly, object space translation allows objects to translate forward/backward, left/right, and up/down, relative to the current orientation of the object. Using our airplane as an example, if the nose of the airplane is oriented along the object's local z axis, then translating the airplane in the +z direction will make the airplane move forward (the direction in which its nose is pointing) regardless of the airplane's orientation. Here is the translation matrix:

|  1    0    0    0 |
|  0    1    0    0 |
|  0    0    1    0 |
| dx   dy   dz    1 |

where [dx, dy, dz] is the displacement vector. After this operation, the object will have moved in its own coordinate system, according to the displacement (translation) vector. The next operation that is performed by our matrix transformation is rotation. Rotation can be described as circular motion about some axis, in this case the axis is one of the object's local axes. Since there are three axes in each object, we need to rotate around each of them. Here are the matrices for rotation about each axis:

about the x axis:
|  1    0    0    0 |
|  0   cx   sx    0 |
|  0  -sx   cx    0 |
|  0    0    0    1 |

about the y axis:

| cy    0  -sy    0 |
|  0    1    0    0 |
| sy    0   cy    0 |
|  0    0    0    1 |

about the z axis:

| cz   sz    0    0 |
|-sz   cz    0    0 |
|  0    0    1    0 |
|  0    0    0    1 |
The cx, sx, cy, sy, cz, and sz variables are the values of the cosines and sines of the angles of rotation about the x, y, and z axes, respectively. Remember that the angles used represent angular displacement, just as the values used in the translation step denote a linear displacement. Correct transformation CANNOT be accomplished with matrix multiplication if you use the cumulative angles of rotation. I have been told that quaternions are able to perform this operation correctly, however I know nothing of quaternions and how they are implemented. The incremental angles used here represent rotation from the current object orientation. In other words, by rotating 1 degree about the z axis, you are telling your object "Rotate 1 degree about your z axis, regardless of your current orientation, and regardless of how you got to that orientation." If you think about it a bit, you will realize that this is how the real world operates. In object space, the series of rotations an object undergoes to attain a certain orientation have no effect on the object space results of any upcoming rotations.
Now that we know the matrix formulas for translation and rotation, we can combine them to transform our objects. The formula for transformations in object space is
[O] = [O] * [T] * [X] * [Y] * [Z]
where O is the object's matrix, T is the translation matrix, and X, Y, and Z are the rotation matrices for their respective axes. Remember that the order of matrix multiplication is very important! The recursive assignment of O poses a question: what is the original value of the object matrix? To eliminate any terrible errors in transformation, the matrices which store an object's orientation should always be initialized to identity.
Matrix Multiplication
You probably know what a matrix is already if you are interested in matrix multiplication. However, a quick example won't hurt. A matrix is just a two-dimensional group of numbers. Instead of a list, called a vector, a matrix is a rectangle, like the following:
You can set a variable to be a matrix just as you can set a variable to be a number. In this case, x is the matrix containing those four numbers (in that particular order). Now, suppose you have two matrices that you need to multiply. Multiplication for numbers is pretty easy, but how do you do it for a matrix?
Here is a key point: you cannot just multiply each number by the corresponding number in the other matrix. Matrix multiplication is not like addition or subtraction. It is more complicated, but the overall process is not hard to learn. Here's an example first, and then I'll explain what I did:
Example:
Solution:
You're probably wondering how in the world I got that answer. Well, you're justified in thinking that. Matrix multiplication is not an easy task to learn, and you do need to pay attention to avoid a careless error or two. Here's the process:
• Step 1: Move across the top row of the first matrix, and down the first column of the second matrix:
• Step 2: Multiply each number from the top row of the first matrix by the number in the first column of the second matrix. In this case, that means multiplying 1*2 and 6*9. Then, take the sum of those values (2+54):
• Step 3: Insert the value you just got into the answer matrix. Since we are multiplying the 1st row and the 1st column, our answer goes into that slot in the answer matrix:
• Step 4: Repeat for the other rows and columns. That means you need to walk down the first row of the first matrix and this time the second column of the second matrix. Then the second row of the first matrix and the first column of the second, and finally the bottom of the first matrix and the right column of the second matrix:
• Step 5: Insert all of those values into the answer matrix. I just showed you how to do the top left and the bottom right. If you work the other two numbers, you will get 1*2+6*7=44 and 3*2+8*9=78. Insert them into the answer matrix in the corresponding positions and you get:
Now I know what you're thinking. That was really hard!!! Well, it will seem that way until you get used to the process. It may help you to write out all your work, and even draw arrows to remember which way you're moving in the rows and columns. Just remember to multiply each row in the first matrix by each column in the second matrix.
What if the matrices aren't squares? Then you have to add another step. In order to multiply two matrices, the matrix on the left must have as many columns as the matrix
on the right has rows. That way you can match up each pair while you're multiplying. The size of the final matrix is determined by the rows in the left matrix and the columns in the right. Here's what I do: I write down the sizes of the matrices. The left matrix has 2 rows and 3 columns, so that's how we write it: rows, columns, in that order. The other matrix is a 3x1 matrix because it has 3 rows and just 1 column. If the numbers in the middle match up, you can multiply. The outside numbers give you the size of the answer. Even if you mess this up you'll figure it out eventually, because you won't be able to multiply.
Here's an important reminder: matrix multiplication is not commutative. That means you cannot switch the order and expect the same result! Regular multiplication tells us that 4*3=3*4, but this is not multiplication in the usual sense. Finally, here's an example with uneven matrix sizes to wrap things up:
Example:
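To tie the steps above together, here is a small illustrative program (an addition, not from the original walk-through); the 2-by-2 matrices used are the ones implied by the arithmetic in Steps 2-5:

#include <stdio.h>

int main(void)
{
    /* Matrices implied by the worked steps above:
       row [1 6] times column [2 9] gives 1*2 + 6*9 = 56, and so on. */
    int a[2][2] = { {1, 6}, {3, 8} };
    int b[2][2] = { {2, 2}, {9, 7} };
    int c[2][2];

    for (int i = 0; i < 2; i++)          /* walk the rows of a     */
        for (int j = 0; j < 2; j++) {    /* and the columns of b   */
            c[i][j] = 0;
            for (int k = 0; k < 2; k++)
                c[i][j] += a[i][k] * b[k][j];
        }

    /* Expected answer matrix:
       56 44
       78 62 */
    printf("%d %d\n%d %d\n", c[0][0], c[0][1], c[1][0], c[1][1]);
    return 0;
}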
Lab 1: Matrix Calculation Examples
Given the Following Matrices:

A =
1.00  2.00  3.00
4.00  5.00  6.00
7.00  8.00  9.00
B =
1.00  1.00  1.00
2.00  2.00  2.00

C =
1.00  2.00  1.00
3.00  2.00  2.00
1.00  5.00  3.00

1) Calculate A + C
2) Calculate A - C
3) Calculate A * C (A times C)
4) Calculate B * A (B times A)
5) Calculate A .* C (A element by element multiplication with C)
6) Inverse of C
Matrix Calculation Examples - Answers

A + C =
2.00   4.00   4.00
7.00   7.00   8.00
8.00  13.00  12.00

A - C =
0.00  0.00  2.00
1.00  3.00  4.00
6.00  3.00  6.00

A * C =
10.00  21.00  14.00
25.00  48.00  32.00
40.00  75.00  50.00

B * A =
12.00  15.00  18.00
24.00  30.00  36.00

Element by element multiplication A .* C =
 1.00   4.00   3.00
12.00  10.00  12.00
 7.00  40.00  27.00

Inverse of C =
 0.80   0.20  -0.40
 1.40  -0.40  -0.20
-2.60   0.60   0.80
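As an added cross-check (not part of the lab hand-out), multiplying C by the inverse listed above should reproduce the 3-by-3 identity matrix, up to rounding:

#include <stdio.h>

int main(void)
{
    double c[3][3]    = { {1, 2, 1}, {3, 2, 2}, {1, 5, 3} };
    double cinv[3][3] = { { 0.8,  0.2, -0.4},
                          { 1.4, -0.4, -0.2},
                          {-2.6,  0.6,  0.8} };
    /* Row-by-column product; each entry of C * Cinv should be 1 on the
       diagonal and 0 elsewhere. */
    for (int i = 0; i < 3; i++) {
        for (int j = 0; j < 3; j++) {
            double s = 0.0;
            for (int k = 0; k < 3; k++)
                s += c[i][k] * cinv[k][j];
            printf("%6.2f ", s);
        }
        printf("\n");
    }
    return 0;
}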
Matrix Calculation Assignment
Given the Following Matrices:

A =
1.00  2.00  3.00
6.00  5.00  4.00
9.00  8.00  7.00

B =
2.00  2.00  2.00
3.00  3.00  3.00

C =
1.00  2.00  1.00
4.00  3.00  1.00
3.00  4.00  2.00
1) Calculate A + C
2) Calculate A - C
3) Calculate A * C (A times C)
4) Calculate B * A (B times A)
5) Calculate A .* C (A element by element multiplication with C)

Vector product VS dot product in matrix

quietrain:
hi, i don't really understand what's the difference between vector product and dot product in matrix form. for example
(1 2)   (1 2)
(3 4) X (3 4) = ?
so when i take rows multiplied by columns, to get a 2x2 matrix, am i doing a vector product? so what then is the dot product? lastly, my notes say |detT| = final area of basic box / initial area of basic box, where detT = (Ti) x (Tj) . (Tk). so, what's the difference between how i should work out x vs . ? also, |detT| = magnitude of T right? so is there a formula i should use to find the magnitude? so why is |k . k| = 1? thanks

HallsofIvy (reply #2):
> hi, i don't really understand what's the difference between vector product and dot product in matrix form. for example
> (1 2)   (1 2)
> (3 4) X (3 4) = ?
> so when i take rows multiplied by columns, to get a 2x2 matrix, am i doing a vector product?

No, you are doing a "matrix product". There are no vectors here.

> so what then is the dot product?

With matrices? It isn't anything. The matrix product is the only multiplication defined for matrices. The dot product is defined for vectors, not matrices.

> lastly, my notes say |detT| = final area of basic box / initial area of basic box, where detT = (Ti) x (Tj) . (Tk)

Well, we don't have your notes so we have no idea what "T", "Ti", "Tj", "Tk" are, nor do we know what a "basic box" is. I do know that if you have a parallelepiped with adjacent sides given by three vectors, then the volume (not area) of the parallelepiped is given by the "triple product", which can be represented by a determinant having the components of the vectors as rows. That has nothing at all to do with matrices.

> so, what's the difference between how i should work out x vs . ? also, |detT| = magnitude of T right?

No, "det" applies only to square arrays, for which "magnitude" is not defined.

> so is there a formula i should use to find the magnitude? so why is |k . k| = 1? thanks

I guess you mean "k" to be the unit vector in the z direction in a three dimensional coordinate system. If so, then |k.k| is, by definition, the length of k which is, again by definition of "unit vector", 1. You seem to be confusing a number of very different concepts. Go back and review.

quietrain (reply #3):
oh.. em.. ok, let's say we have
(1 2)   (4 5)
(3 4) x (6 7) =
so this is just rows multiplied by columns to get a 2x2 matrix, right? so what is the difference if i replace the x sign with the dot sign now? do i still get the same? i presume one is the cross (x) product, one is the dot (.) product? or is it that for a matrix there is no such thing as a cross or dot product? that's weird. my tutor tells us to know the difference between cross and dot matrix products.
so for the case of the parallelepiped, what's the significance of the triple product (u x v) . w? why do we use x for u & v but . for w? is it just to tell us that we have to use sin and cos respectively? but if u, v and w were square matrices, then there won't be any sin and cos to use? so we just multiply as usual, rows by columns?
oh, by definition. so that means |k.k| = (k)(k)cos(0) = (1)(1)cos(0) = 1, so |i.k| = (1)(1)cos(90) = 0? so if i x k gives us -j by the right hand rule, then does it mean the magnitude, which is |i.k| = 0, is 0? in the direction of the -j?? or are they 2 totally different aspects?
btw, sry for another question, why is e^(wA), where
A = (0 -1)
    (1  0)
able to be expressed as
(cos w  -sin w)
(sin w   cos w)
which is the rotational matrix anti-clockwise about the x-axis right? thanks

HallsofIvy (reply #4):
> oh.. em.. ok, let's say we have
> (1 2)   (4 5)
> (3 4) x (6 7) =
> so this is just rows multiplied by columns to get a 2x2 matrix, right? so what is the difference if i replace the x sign with the dot sign now? do i still get the same?

You can replace it by whatever symbol you like. As long as your multiplication is "matrix multiplication" you will get the same result.
> i presume one is the cross (x) product, one is the dot (.) product?

No, just changing the symbol doesn't make it one or the other.

> or is it that for a matrix there is no such thing as a cross or dot product? that's weird. my tutor tells us to know the difference between cross and dot matrix products

I suspect your tutor was talking about vectors, not matrices.

> so for the case of the parallelepiped, what's the significance of the triple product (u x v) . w? why do we use x for u & v but . for w?

Because you are talking about vectors, not matrices!

> is it just to tell us that we have to use sin and cos respectively? but if u, v and w were square matrices, then there won't be any sin and cos to use? so we just multiply as usual, rows by columns?

They are NOT matrices, they are vectors!! You can think of vectors as "row matrices" (1 by n) or "column matrices" (n by 1), but they still have properties that matrices in general do not have.

> oh, by definition. so that means |k.k| = (k)(k)cos(0) = (1)(1)cos(0) = 1, so |i.k| = (1)(1)cos(90) = 0?

Yes, that is correct.

> so if i x k gives us -j by the right hand rule, then does it mean the magnitude, which is |i.k| = 0, is 0? in the direction of the -j?? or are they 2 totally different aspects?

No, the length of i x k is NOT |i.k|, it is |i||j|. In general, the length of u x v is |u||v| sin(t), where t is the angle between u and v.

> btw, sry for another question, why is e^(wA), where
> A = (0 -1)
>     (1  0)
> able to be expressed as
> (cos w  -sin w)
> (sin w   cos w)
> which is the rotational matrix anti-clockwise about the x-axis right? thanks

For objects other than numbers, where we have a notion of addition and multiplication, we define higher functions by using their "Taylor series", power series that are equal to the functions.
In particular, e^(wA) = I + wA + (wA)^2/2! + (wA)^3/3! + ... . It should be easy to calculate that A^2 = -I and A^4 = I; since that is the identity matrix, it all repeats: A^5 = A, A^6 = -I, etc. That gives
e^(wA) = (1 - w^2/2! + w^4/4! - ...) I + (w - w^3/3! + w^5/5! - ...) A
and you should be able to recognise those as the Taylor series about 0 for cos(w) and sin(w).

quietrain (reply #5):
wow. ok, i went to check again what my tutor said and it was "scalar and vector products in terms of matrices". so what does he mean by this? the scalar product is (A B C) x (D E F)^T (so we can take the transpose of DEF because it is a symmetric matrix? or is it for some other reason?), so rows multiplied by columns again? but what about the vector product?
for the parallelepiped, (u x v).w, so let's say u = (1,1), v = (2,2), w = (3,3). so u x v = (1x2, 1x2)sin(angle between vectors), so .w = (2x3, 2x3)cos(angle)? so if it yields 0, that vector w lies in the plane defined by u and v, but if it's otherwise, then w doesn't lie in the plane of u and v?
for i x k, why is the length |i||j|? why is j introduced here? shouldn't it be |i||k|sin(90) = 1?
oh i see.. so the right hand rule gives the direction, but the magnitude for i x k = |i||k|sin(90) = 1? thanks a ton!

HallsofIvy (reply #6):
> wow. ok, i went to check again what my tutor said and it was "scalar and vector products in terms of matrices". so what does he mean by this? the scalar product is (A B C) x (D E F)^T (so we can take the transpose of DEF because it is a symmetric matrix? or is it for some other reason?), so rows multiplied by columns again?

Okay, if you think of one vector as a row matrix and the other as a column matrix, then the "dot product" is the matrix product. But the dot product is commutative, isn't it? Does it really make sense to treat the two vectors as different kinds of matrices? It is really better here to think of this not as the product of two vectors but as a vector in a vector space and a functional in the dual space.

> but what about the vector product? for the parallelepiped, (u x v).w, so let's say u = (1,1), v = (2,2), w = (3,3). so u x v = (1x2, 1x2)sin(angle between vectors), so .w = (2x3, 2x3)cos(angle)? so if it yields 0, that vector w lies in the plane defined by u and v, but if it's otherwise, then w doesn't lie in the plane of u and v? for i x k, why is the length |i||j|? why is j introduced here? shouldn't it be |i||k|sin(90) = 1?

Yes, that was a typo. I meant |i||k|.

> oh i see.. so the right hand rule gives the direction, but the magnitude for i x k = |i||k|sin(90) = 1?

Yes.

> thanks a ton!
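To make the thread's distinction concrete, here is a short added sketch (not from the thread itself): the dot product of two 3-vectors is a single number, the cross product is another vector, and the scalar triple product (u x v) . w is the signed volume of the parallelepiped they span.

#include <stdio.h>

/* Dot product of two 3-vectors: a single number. */
double dot(const double a[3], const double b[3])
{
    return a[0]*b[0] + a[1]*b[1] + a[2]*b[2];
}

/* Cross product of two 3-vectors: another 3-vector. */
void cross(const double a[3], const double b[3], double out[3])
{
    out[0] = a[1]*b[2] - a[2]*b[1];
    out[1] = a[2]*b[0] - a[0]*b[2];
    out[2] = a[0]*b[1] - a[1]*b[0];
}

int main(void)
{
    double u[3] = {1, 0, 0}, v[3] = {0, 1, 0}, w[3] = {0, 0, 1};
    double uxv[3];
    cross(u, v, uxv);
    printf("dot(u, v)   = %g\n", dot(u, v));                      /* 0          */
    printf("u x v       = (%g, %g, %g)\n", uxv[0], uxv[1], uxv[2]); /* (0, 0, 1) */
    printf("(u x v) . w = %g\n", dot(uxv, w));                    /* volume = 1 */
    return 0;
}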
Matrix (mathematics)
From Wikipedia, the free encyclopedia
In mathematics, a matrix (plural matrices, or less commonly matrixes) is a rectangular array of numbers, such as
An item in a matrix is called an entry or an element. The example has entries 1, 9, 13, 20, 55, and 4. Entries are often denoted by a variable with two subscripts, as shown on the right. Matrices of the same size can be added and subtracted entrywise and matrices of compatible sizes can be multiplied. These operations have many of the properties of ordinary arithmetic, except that matrix multiplication is not commutative, that is, AB and BA are not equal in general. Matrices consisting of only one column or row define the components of vectors, while higher-dimensional (e.g., three-dimensional) arrays of numbers define the components of a generalization of a vector called a tensor. Matrices with entries in other fields or rings are also studied.
Matrices are a key tool in linear algebra. One use of matrices is to represent linear transformations, which are higher-dimensional analogs of linear functions of the form f(x) = cx, where c is a constant; matrix multiplication corresponds to composition of linear transformations. Matrices can also keep track of the coefficients in a system of linear equations. For a square matrix, the determinant and inverse matrix (when it exists) govern the behavior of solutions to the corresponding system of linear equations, and eigenvalues and eigenvectors provide insight into the geometry of the associated linear transformation.
Matrices find many applications. Physics makes use of matrices in various domains, for example in geometrical optics and matrix mechanics; the latter led to studying in more detail matrices with an infinite number of rows and columns. Graph theory uses matrices to keep track of distances between pairs of vertices in a graph. Computer graphics uses matrices to project 3-dimensional space onto a 2-dimensional screen. Matrix calculus generalizes classical analytical notions such as derivatives of functions or exponentials to matrices. The latter is a recurring need in solving ordinary differential equations. Serialism and dodecaphonism are musical movements of the 20th century that use a square mathematical matrix to determine the pattern of music intervals. A major branch of numerical analysis is devoted to the development of efficient algorithms for matrix computations, a subject that is centuries old but still an active area of research. Matrix decomposition methods simplify computations, both theoretically and practically. For sparse matrices, specifically tailored
    algorithms can providespeedups; such matrices arise in the finite element method, for example. Definition A matrix is a rectangular arrangement of numbers.[1] For example, An alternative notation uses large parentheses instead of box brackets: The horizontal and vertical lines in a matrix are called rows and columns, respectively. The numbers in the matrix are called its entries or its elements. To specify a matrix's size, a matrix with m rows and ncolumns is called an m-by-n matrix or m × n matrix, while m and n are called its dimensions. The above is a 4-by-3 matrix. A matrix with one row (a 1 × n matrix) is called a row vector, and a matrix with one column (an m × 1 matrix) is called a column vector. Any row or column of a matrix determines a row or column vector, obtained by removing all other rows respectively columns from the matrix. For example, the row vector for the third row of the above matrix A is When a row or column of a matrix is interpreted as a value, this refers to the corresponding row or column vector. For instance one may say that two different rows of a matrix are equal, meaning they determine the same row vector. In some cases the value of a row or column should be interpreted just as a sequence of values (an element of Rn if entries are real numbers) rather than as a matrix, for instance when saying that the rows of a matrix are equal to the corresponding columns of its transpose matrix. Most of this article focuses on real and complex matrices, i.e., matrices whose entries are real or complex numbers. More general types of entries are discussed below.
Notation
The specifics of matrix notation vary widely, with some prevailing trends. Matrices are usually denoted using upper-case letters, while the corresponding lower-case letters, with two subscript indices, represent the entries. In addition to using upper-case letters to symbolize matrices, many authors use a special typographical style, commonly boldface upright (non-italic), to further distinguish matrices from other variables. An alternative notation involves the use of a double-underline with the variable name, with or without boldface style.
The entry that lies in the i-th row and the j-th column of a matrix is typically referred to as the i,j, (i,j), or (i,j)th entry of the matrix. For example, the (2,3) entry of the above matrix A is 7. The (i,j)th entry of a matrix A is most commonly written as ai,j. Alternative notations for that entry are A[i,j] or Ai,j. Sometimes a matrix is referred to by giving a formula for its (i,j)th entry, often with double parentheses around the formula for the entry; for example, if the (i,j)th entry of A were given by aij, A would be denoted ((aij)). An asterisk is commonly used to refer to whole rows or columns in a matrix. For example, ai,∗ refers to the ith row of A, and a∗,j refers to the jth column of A. The set of all m-by-n matrices is denoted M(m,n).
A common shorthand is A = [ai,j]i=1,...,m; j=1,...,n or more briefly A = [ai,j]m×n to define an m × n matrix A. Usually the entries ai,j are defined separately for all integers 1 ≤ i ≤ m and 1 ≤ j ≤ n. They can however sometimes be given by one formula; for example the 3-by-4 matrix can alternatively be specified by A = [i − j]i=1,2,3; j=1,...,4, or simply A = ((i−j)), where the size of the matrix is understood. Some programming languages start the numbering of rows and columns at zero, in which case the entries of an m-by-n matrix are indexed by 0 ≤ i ≤ m − 1 and 0 ≤ j ≤ n − 1.[2] This article follows the more common convention in mathematical writing where enumeration starts from 1.
Basic operations
Main articles: Matrix addition, Scalar multiplication, Transpose, and Row operations
There are a number of operations that can be applied to modify matrices, called matrix addition, scalar multiplication and transposition.[3] These form the basic techniques to deal with matrices.
Addition: The sum A + B of two m-by-n matrices A and B is calculated entrywise: (A + B)i,j = Ai,j + Bi,j, where 1 ≤ i ≤ m and 1 ≤ j ≤ n.
Scalar multiplication: The scalar multiplication cA of a matrix A and a number c (also called a scalar in the parlance of abstract algebra) is given by multiplying every entry of A by c: (cA)i,j = c · Ai,j.
Transpose: The transpose of an m-by-n matrix A is the n-by-m matrix AT (also denoted Atr or tA) formed by turning rows into columns and vice versa:
    (AT )i,j = Aj,i. Familiarproperties of numbers extend to these operations of matrices: for example, addition is commutative, i.e. the matrix sum does not depend on the order of the summands: A + B = B + A.[4] The transpose is compatible with addition and scalar multiplication, as expressed by (cA)T = c(AT ) and (A + B)T = AT + BT . Finally, (AT )T = A. Row operations are ways to change matrices. There are three types of row operations: row switching, that is interchanging two rows of a matrix, row multiplication, multiplying all entries of a row by a non-zero constant and finally row addition which means adding a multiple of a row to another row. These row operations are used in a number of ways including solving linear equations and finding inverses. [edit]Matrix multiplication, linear equations and linear transformations Main article: Matrix multiplication Schematic depiction of the matrix product AB of two matrices A and B. Multiplication of two matrices is defined only if the number of columns of the left matrix is the same as the number of rows of the right matrix. If A is an m-by-n matrix and B is an n-by-p matrix, then their matrix product AB is the m-by-p matrix whose entries are given by dot-product of the corresponding row of A and the corresponding column of B: where 1 ≤ i ≤ m and 1 ≤ j ≤ p.[5] For example (the underlined entry 1 in the product is calculated as the product 1 · 1 + 0 · 1 + 2 · 0 = 1):
    Matrix multiplication satisfiesthe rules (AB)C = A(BC) (associativity), and (A+B)C = AC+BC as well as C(A+B) = CA+CB (left and right distributivity), whenever the size of the matrices is such that the various products are defined. [6] The product AB may be defined without BA being defined, namely if A and B are m-by-n and n-by-k matrices, respectively, and m ≠ k. Even if both products are defined, they need not be equal, i.e. generally one has AB ≠ BA, i.e., matrix multiplication is not commutative, in marked contrast to (rational, real, or complex) numbers whose product is independent of the order of the factors. An example of two matrices not commuting with each other is: whereas The identity matrix In of size n is the n-by-n matrix in which all the elements on the main diagonal are equal to 1 and all other elements are equal to 0, e.g. It is called identity matrix because multiplication with it leaves a matrix unchanged: MIn = ImM = M for any m-by-n matrix M. Besides the ordinary matrix multiplication just described, there exist other less frequently used operations on matrices that can be considered forms of multiplication, such as the Hadamard product and theKronecker product.[7] They arise in solving matrix equations such as the Sylvester equation. [edit]Linear equations Main articles: Linear equation and System of linear equations A particular case of matrix multiplication is tightly linked to linear equations: if x designates a column vector (i.e. n×1-matrix) of n variables x1, x2, ..., xn, and A is an m-by-n matrix, then the matrix equation
Ax = b, where b is some m×1 column vector, is equivalent to the system of linear equations
A1,1x1 + A1,2x2 + ... + A1,nxn = b1
...
Am,1x1 + Am,2x2 + ... + Am,nxn = bm.[8]
This way, matrices can be used to compactly write and deal with multiple linear equations, i.e. systems of linear equations.
Linear transformations
Main articles: Linear transformation and Transformation matrix
Matrices and matrix multiplication reveal their essential features when related to linear transformations, also known as linear maps. A real m-by-n matrix A gives rise to a linear transformation Rn → Rm mapping each vector x in Rn to the (matrix) product Ax, which is a vector in Rm. Conversely, each linear transformation f: Rn → Rm arises from a unique m-by-n matrix A: explicitly, the (i, j)-entry of A is the ith coordinate of f(ej), where ej = (0,...,0,1,0,...,0) is the unit vector with 1 in the jth position and 0 elsewhere. The matrix A is said to represent the linear map f, and A is called the transformation matrix of f. The following table shows a number of 2-by-2 matrices with the associated linear maps of R2. The blue original is mapped to the green grid and shapes; the origin (0,0) is marked with a black point. (Pictured maps: vertical shear with m=1.25, horizontal flip, squeeze mapping with r=3/2, scaling by a factor of 3/2, and rotation by π/6 = 30°.)
    Under the 1-to-1correspondence between matrices and linear maps, matrix multiplication corresponds to composition of maps:[9] if a k-by-m matrix B represents another linear map g : Rm → Rk , then the composition g ∘ f is represented by BA since (g ∘ f)(x) = g(f(x)) = g(Ax) = B(Ax) = (BA)x. The last equality follows from the above-mentioned associativity of matrix multiplication. The rank of a matrix A is the maximum number of linearly independent row vectors of the matrix, which is the same as the maximum number of linearly independent column vectors.[10] Equivalently it is thedimension of the image of the linear map represented by A.[11] The rank-nullity theorem states that the dimension of the kernel of a matrix plus the rank equals the number of columns of the matrix.[12] Square matrices A square matrix is a matrix which has the same number of rows and columns. An n- by-n matrix is known as a square matrix of order n. Any two square matrices of the same order can be added and multiplied. A square matrix A is called invertible or non-singular if there exists a matrix B such that AB = In.[13] This is equivalent to BA = In.[14] Moreover, if B exists, it is unique and is called the inverse matrix of A, denoted A−1 . The entries Ai,i form the main diagonal of a matrix. The trace, tr(A) of a square matrix A is the sum of its diagonal entries. While, as mentioned above, matrix multiplication is not commutative, the trace of the product of two matrices is independent of the order of the factors: tr(AB) = tr(BA).[15] If all entries outside the main diagonal are zero, A is called a diagonal matrix. If only all entries above (below) the main diagonal are zero, A is called a lower triangular matrix (upper triangular matrix, respectively). For example, if n = 3, they look like
(diagonal), (lower) and (upper triangular matrix).
Determinant
Main article: Determinant
A linear transformation on R2 given by the indicated matrix. The determinant of this matrix is −1, as the area of the green parallelogram at the right is 1, but the map reverses the orientation, since it turns the counterclockwise orientation of the vectors to a clockwise one.
The determinant det(A) or |A| of a square matrix A is a number encoding certain properties of the matrix. A matrix is invertible if and only if its determinant is nonzero. Its absolute value equals the area (in R2) or volume (in R3) of the image of the unit square (or cube), while its sign corresponds to the orientation of the corresponding linear map: the determinant is positive if and only if the orientation is preserved. The determinant of a 2-by-2 matrix is given by ad − bc (for entries a, b in the first row and c, d in the second); the determinant of 3-by-3 matrices involves 6 terms (rule of Sarrus). The more lengthy Leibniz formula generalises these two formulae to all dimensions.[16] The determinant of a product of square matrices equals the product of their determinants: det(AB) = det(A) · det(B).[17] Adding a multiple of any row to another row, or a multiple of any column to another column, does not change the determinant. Interchanging two rows or two columns affects the determinant by multiplying it by −1.[18] Using these operations, any matrix can be transformed to a lower (or upper) triangular matrix, and for such matrices the determinant equals the
product of the entries on the main diagonal; this provides a method to calculate the determinant of any matrix. Finally, the Laplace expansion expresses the determinant in terms of minors, i.e., determinants of smaller matrices.[19] This expansion can be used for a recursive definition of determinants (taking as starting case the determinant of a 1-by-1 matrix, which is its unique entry, or even the determinant of a 0-by-0 matrix, which is 1), that can be seen to be equivalent to the Leibniz formula. Determinants can be used to solve linear systems using Cramer's rule, where the division of the determinants of two related square matrices equates to the value of each of the system's variables.[20]
Eigenvalues and eigenvectors
Main article: Eigenvalues and eigenvectors
A number λ and a non-zero vector v satisfying Av = λv are called an eigenvalue and an eigenvector of A, respectively.[nb 1][21] The number λ is an eigenvalue of an n×n matrix A if and only if A − λIn is not invertible, which is equivalent to det(A − λIn) = 0.[22] The function pA(t) = det(A − tI) is called the characteristic polynomial of A; its degree is n. Therefore pA(t) has at most n different roots, i.e., eigenvalues of the matrix.[23] They may be complex even if the entries of A are real. According to the Cayley-Hamilton theorem, pA(A) = 0, that is to say, the characteristic polynomial applied to the matrix itself yields the zero matrix.
Symmetry
A square matrix A that is equal to its transpose, i.e. A = AT, is a symmetric matrix; if it is equal to the negative of its transpose, i.e. A = −AT, then it is a skew-symmetric matrix. In complex matrices, symmetry is often replaced by the concept of Hermitian matrices, which satisfy A∗ = A, where the star or asterisk denotes the conjugate transpose of the matrix, i.e. the transpose of the complex conjugate of A. By the spectral theorem, real symmetric matrices and complex Hermitian matrices have an eigenbasis; i.e., every vector is expressible as a linear combination of eigenvectors. In both cases, all eigenvalues are real.[24] This theorem can be generalized to infinite-dimensional situations related to matrices with infinitely many rows and columns, see below.
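A small worked example of the eigenvalue definition above (added for illustration, not from the article):

\[
A=\begin{pmatrix}2&1\\1&2\end{pmatrix},\qquad
p_A(t)=\det(A-tI_2)=(2-t)^2-1=(t-1)(t-3),
\]

so the eigenvalues are 1 and 3, with eigenvectors (1,-1) and (1,1); indeed A(1,1)^T = (3,3)^T = 3·(1,1)^T.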
Definiteness
The table compares, for two 2-by-2 matrices A, their definiteness, the associated quadratic form QA(x,y), and the set of vectors (x,y) such that QA(x,y) = 1:
positive definite: QA(x,y) = (1/4)x² + y², and the level set QA(x,y) = 1 is an ellipse;
indefinite: QA(x,y) = (1/4)x² − (1/4)y², and the level set QA(x,y) = 1 is a hyperbola.
A symmetric n×n matrix is called positive-definite (respectively negative-definite; indefinite) if for all nonzero vectors x ∈ Rn the associated quadratic form given by Q(x) = xT Ax takes only positive values (respectively only negative values; both some negative and some positive values).[25] If the quadratic form takes only non-negative (respectively only non-positive) values, the symmetric matrix is called positive-semidefinite (respectively negative-semidefinite); hence the matrix is indefinite precisely when it is neither positive-semidefinite nor negative-semidefinite. A symmetric matrix is positive-definite if and only if all its eigenvalues are positive.[26] Allowing as input two different vectors instead yields the bilinear form associated to A: BA(x, y) = xT Ay.[27]
Computational aspects
In addition to theoretical knowledge of properties of matrices and their relation to other fields, it is important for practical purposes to perform matrix calculations effectively and precisely. The domain studying these matters is called numerical linear algebra.[28] As with other numerical situations, two main aspects are
the complexity of algorithms and their numerical stability. Many problems can be solved by either direct algorithms or iterative approaches. For example, finding eigenvectors can be done by finding a sequence of vectors xn converging to an eigenvector when n tends to infinity.[29] Determining the complexity of an algorithm means finding upper bounds or estimates of how many elementary operations such as additions and multiplications of scalars are necessary to perform some algorithm, e.g. multiplication of matrices. For example, calculating the matrix product of two n-by-n matrices using the definition given above needs n³ multiplications, since for any of the n² entries of the product, n multiplications are necessary. The Strassen algorithm outperforms this "naive" algorithm; it needs only about n^2.807 multiplications.[30] A refined approach also incorporates specific features of the computing devices.
In many practical situations additional information about the matrices involved is known. An important case is that of sparse matrices, i.e. matrices most of whose entries are zero. There are specifically adapted algorithms for, say, solving linear systems Ax = b for sparse matrices A, such as the conjugate gradient method.[31]
An algorithm is, roughly speaking, numerically stable if small deviations (such as rounding errors) do not lead to large deviations in the result. For example, calculating the inverse of a matrix via Laplace's formula (Adj(A) denotes the adjugate matrix of A)
A−1 = Adj(A) / det(A)
may lead to significant rounding errors if the determinant of the matrix is very small. The norm of a matrix can be used to capture the conditioning of linear algebraic problems, such as computing a matrix's inverse.[32]
Although most computer languages are not designed with commands or libraries for matrices, as early as the 1970s some engineering desktop computers such as the HP 9830 had ROM cartridges to add BASIC commands for matrices. Some computer languages such as APL were designed to manipulate matrices, and various mathematical programs can be used to aid computing with matrices.[33]
Matrix decomposition methods
Main articles: Matrix decomposition, Matrix diagonalization, and Gaussian elimination
There are several methods to render matrices into a more easily accessible form. They are generally referred to as matrix transformation or matrix
    decomposition techniques. Theinterest of all these decomposition techniques is that they preserve certain properties of the matrices in question, such as determinant, rank or inverse, so that these quantities can be calculated after applying the transformation, or that certain matrix operations are algorithmically easier to carry out for some types of matrices. The LU decomposition factors matrices as a product of lower (L) and an upper triangular matrices (U).[34] Once this decomposition is calculated, linear systems can be solved more efficiently, by a simple technique called forward and back substitution. Likewise, inverses of triangular matrices are algorithmically easier to calculate. The Gaussian elimination is a similar algorithm; it transforms any matrix to row echelon form.[35] Both methods proceed by multiplying the matrix by suitable elementary matrices, which correspond to permuting rows or columns and adding multiples of one row to another row. Singular value decomposition expresses any matrix A as a product UDV ∗ , where U and V are unitary matrices and D is a diagonal matrix. A matrix in Jordan normal form. The grey blocks are called Jordan blocks. The eigendecomposition or diagonalization expresses A as a product VDV−1 , where D is a diagonal matrix and V is a suitable invertible matrix.[36] If A can be written in this form, it is called diagonalizable. More generally, and applicable to all matrices, the Jordan decomposition transforms a matrix into Jordan normal form, that is to say matrices whose only nonzero entries are the eigenvalues λ1 to λn of A, placed on the main diagonal and possibly entries equal to one directly above the main diagonal, as shown at the right.[37] Given the eigendecomposition, the nth power of A (i.e. n-fold iterated matrix multiplication) can be calculated via An = (VDV−1 )n = VDV−1 VDV−1 ...VDV−1 = VDn V−1
    and the powerof a diagonal matrix can be calculated by taking the corresponding powers of the diagonal entries, which is much easier than doing the exponentiation for Ainstead. This can be used to compute the matrix exponential eA , a need frequently arising in solving linear differential equations, matrix logarithms and square roots of matrices.[38] To avoid numerically ill- conditioned situations, further algorithms such as the Schur decomposition can be employed.[39] [edit]Abstract algebraic aspects and generalizations Matrices can be generalized in different ways. Abstract algebra uses matrices with entries in more general fields or even rings, while linear algebra codifies properties of matrices in the notion of linear maps. It is possible to consider matrices with infinitely many columns and rows. Another extension are tensors, which can be seen as higher-dimensional arrays of numbers, as opposed to vectors, which can often be realised as sequences of numbers, while matrices are rectangular or two- dimensional array of numbers.[40] Matrices, subject to certain requirements tend to form groups known as matrix groups. [edit]Matrices with more general entries This article focuses on matrices whose entries are real or complex numbers. However, matrices can be considered with much more general types of entries than real or complex numbers. As a first step of generalization, any field, i.e. a set where addition, subtraction, multiplication and division operations are defined and well-behaved, may be used instead of R or C, for example rational numbers or finite fields. For example, coding theory makes use of matrices over finite fields. Wherever eigenvalues are considered, as these are roots of a polynomial they may exist only in a larger field than that of the coefficients of the matrix; for instance they may be complex in case of a matrix with real entries. The possibility to reinterpret the entries of a matrix as elements of a larger field (e.g., to view a real matrix as a complex matrix whose entries happen to be all real) then allows considering each square matrix to possess a full set of eigenvalues. Alternatively one can consider only matrices with entries in an algebraically closed field, such as C, from the outset. More generally, abstract algebra makes great use of matrices with entries in a ring R.[41] Rings are a more general notion than fields in that no division operation exists. The very same addition and multiplication operations of matrices extend to this setting, too. The set M(n, R) of all square n-by-n matrices over R is a ring called matrix ring, isomorphic to the endomorphism ring of the left R-moduleRn .[42] If
the ring R is commutative, i.e., its multiplication is commutative, then M(n, R) is a unitary noncommutative (unless n = 1) associative algebra over R. The determinant of square matrices over a commutative ring R can still be defined using the Leibniz formula; such a matrix is invertible if and only if its determinant is invertible in R, generalising the situation over a field F, where every nonzero element is invertible.[43] Matrices over superrings are called supermatrices.[44]
Matrices do not always have all their entries in the same ring - or even in any ring at all. One special but common case is block matrices, which may be considered as matrices whose entries themselves are matrices. The entries need not be square matrices, and thus need not be members of any ordinary ring; but their sizes must fulfil certain compatibility conditions.
Relationship to linear maps
Linear maps Rn → Rm are equivalent to m-by-n matrices, as described above. More generally, any linear map f: V → W between finite-dimensional vector spaces can be described by a matrix A = (aij), after choosing bases v1, ..., vn of V, and w1, ..., wm of W (so n is the dimension of V and m is the dimension of W), which is such that f(vj) = a1,j w1 + ... + am,j wm for j = 1, ..., n. In other words, column j of A expresses the image of vj in terms of the basis vectors wi of W; thus this relation uniquely determines the entries of the matrix A. Note that the matrix depends on the choice of the bases: different choices of bases give rise to different, but equivalent matrices.[45] Many of the above concrete notions can be reinterpreted in this light; for example, the transpose matrix AT describes the transpose of the linear map given by A, with respect to the dual bases.[46]
Graph theory
An undirected graph with its adjacency matrix.
The adjacency matrix of a finite graph is a basic notion of graph theory.[62] It records which vertices of the graph are connected by an edge. Matrices containing just two different values (0 and 1, meaning for example "yes" and "no") are called logical matrices. The distance (or cost) matrix contains information about distances of the edges.[63] These concepts can be applied to websites connected by hyperlinks or cities connected by roads etc., in which case (unless the road network is extremely dense) the matrices tend to be sparse, i.e. contain few nonzero entries. Therefore, specifically tailored matrix algorithms can be used in network theory.
Analysis and geometry
The Hessian matrix of a differentiable function ƒ: Rn → R consists of the second derivatives of ƒ with respect to the several coordinate directions, i.e. the matrix with entries ∂²ƒ/∂xi∂xj.[64] It encodes information about the local growth behaviour of the function: given a critical point x = (x1, ..., xn), i.e., a point where the first partial derivatives of ƒ vanish, the function has a local minimum if the Hessian matrix is positive definite. Quadratic programming can be used to find global minima or maxima of quadratic functions closely related to the ones attached to matrices (see above).[65]
At the saddle point (x = 0, y = 0) (red) of the function f(x, y) = x² − y², the Hessian matrix is indefinite.
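A quick added check of the saddle-point caption above (not from the article):

\[
f(x,y)=x^2-y^2,\qquad
H_f=\begin{pmatrix}\partial^2 f/\partial x^2 & \partial^2 f/\partial x\,\partial y\\
\partial^2 f/\partial y\,\partial x & \partial^2 f/\partial y^2\end{pmatrix}
=\begin{pmatrix}2&0\\0&-2\end{pmatrix},
\]

which has one positive and one negative eigenvalue, so the Hessian is indefinite and (0, 0) is a saddle point rather than a minimum or maximum.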
    Another matrix frequentlyused in geometrical situations is the Jacobi matrix of a differentiable map f: Rn → Rm . If f1, ..., fm denote the components of f, then the Jacobi matrix is defined as [66] If n > m, and if the rank of the Jacobi matrix attains its maximal value m, f is locally invertible at that point, by the implicit function theorem.[67] Partial differential equations can be classified by considering the matrix of coefficients of the highest-order differential operators of the equation. For elliptic partial differential equations this matrix is positive definite, which has decisive influence on the set of possible solutions of the equation in question.[68] The finite element method is an important numerical method to solve partial differential equations, widely applied in simulating complex physical systems. It attempts to approximate the solution to some equation by piecewise linear functions, where the pieces are chosen with respect to a sufficiently fine grid, which in turn can be recast as a matrix equation.[69] [edit]Probability theory and statistics Two different Markov chains. The chart depicts the number of particles (of a total of 1000) in state "2". Both limiting values can be determined from the transition matrices, which are given by (red) and (black). Stochastic matrices are square matrices whose rows are probability vectors, i.e., whose entries sum up to one. Stochastic matrices are used to define Markov chains with finitely many states.[70] A row of the stochastic matrix gives the probability distribution for the next position of some particle which is currently in the state corresponding to the row. Properties of the
    Markov chain likeabsorbing states, i.e. states that any particle attains eventually, can be read off the eigenvectors of the transition matrices.[71] Statistics also makes use of matrices in many different forms.[72] Descriptive statistics is concerned with describing data sets, which can often be represented in matrix form, by reducing the amount of data. The covariance matrix encodes the mutual variance of several random variables.[73] Another technique using matrices are linear least squares, a method that approximates a finite set of pairs (x1, y1), (x2, y2), ..., (xN, yN), by a linear function yi ≈ axi + b, i = 1, ..., N which can be formulated in terms of matrices, related to the singular value decomposition of matrices.[74] Random matrices are matrices whose entries are random numbers, subject to suitable probability distributions, such as matrix normal distribution. Beyond probability theory, they are applied in domains ranging from number theory to physics.[75][76] [edit]Symmetries and transformations in physics Further information: Symmetry in physics Linear transformations and the associated symmetries play a key role in modern physics. For example, elementary particles in quantum field theory are classified as representations of the Lorentz group of special relativity and, more specifically, by their behavior under the spin group. Concrete representations involving the Pauli matrices and more general gamma matrices are an integral part of the physical description of fermions, which behave as spinors.[77] For the three lightest quarks, there is a group-theoretical representation involving the special unitary group SU(3); for their calculations, physicists use a convenient matrix representation known as the Gell-Mann matrices, which are also used for the SU(3) gauge group that forms the basis of the modern description of strong nuclear interactions, quantum chromodynamics. The Cabibbo–Kobayashi–Maskawa matrix, in turn, expresses the fact that the basic quark states that are important for weak interactions are not the same as, but linearly related to the basic quark states that define particles with specific and distinct masses.[78] [edit]Linear combinations of quantum states The first model of quantum mechanics (Heisenberg, 1925) represented the theory's operators by infinite-dimensional matrices acting on quantum states.[79] This is also referred to as matrix mechanics. One particular example is the density matrix that
characterizes the "mixed" state of a quantum system as a linear combination of elementary, "pure" eigenstates.[80] Another matrix serves as a key tool for describing the scattering experiments which form the cornerstone of experimental particle physics: collision reactions such as occur in particle accelerators, where non-interacting particles head towards each other and collide in a small interaction zone, with a new set of non-interacting particles as the result, can be described as the scalar product of outgoing particle states and a linear combination of ingoing particle states. The linear combination is given by a matrix known as the S-matrix, which encodes all information about the possible interactions between particles.[81]

Normal modes
A general application of matrices in physics is to the description of linearly coupled harmonic systems. The equations of motion of such systems can be described in matrix form, with a mass matrix multiplying a generalized velocity to give the kinetic term, and a force matrix multiplying a displacement vector to characterize the interactions. The best way to obtain solutions is to determine the system's eigenvectors, its normal modes, by diagonalizing the matrix equation. Techniques like this are crucial when it comes to describing the internal dynamics of molecules: the internal vibrations of systems consisting of mutually bound component atoms.[82] They are also needed for describing mechanical vibrations, and oscillations in electrical circuits.[83]

Geometrical optics
Geometrical optics provides further matrix applications. In this approximative theory, the wave nature of light is neglected. The result is a model in which light rays are indeed geometrical rays. If the deflection of light rays by optical elements is small, the action of a lens or reflective element on a given light ray can be expressed as multiplication of a two-component vector with a two-by-two matrix called a ray transfer matrix: the vector's components are the light ray's slope and its distance from the optical axis, while the matrix encodes the properties of the optical element. Actually, there will be two different kinds of matrices, viz. a refraction matrix describing the refraction at a lens surface, and a translation matrix, describing the translation of the plane of reference to the next refracting surface, where another refraction matrix will apply. The optical system consisting of a combination of lenses and/or reflective elements is simply described by the matrix resulting from the product of the components' matrices.[84]
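The "product of the components' matrices" idea can be sketched in a few lines. The snippet below uses the common thin-lens/free-space ABCD convention with the ray written as a [height, slope] column vector; the focal length and distances are made-up values, not taken from the text.

```python
import numpy as np

def translation(d):            # free propagation over a distance d
    return np.array([[1.0, d],
                     [0.0, 1.0]])

def thin_lens(f):              # refraction by a thin lens of focal length f
    return np.array([[1.0, 0.0],
                     [-1.0 / f, 1.0]])

# A system is the product of its component matrices, applied right to left.
system = translation(50.0) @ thin_lens(50.0) @ translation(20.0)

ray_in = np.array([1.0, 0.0])   # ray parallel to the axis, height 1
print(system @ ray_in)          # height 0: the ray crosses the axis at the focal plane
```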
    [edit]Electronics The behaviour ofmany electronic components can be described using matrices. Let A be a 2-dimensional vector with the component's input voltage v1 and input current i1 as its elements, and let B be a 2-dimensional vector with the component's output voltage v2 and output current i2 as its elements. Then the behaviour of the electronic component can be described by B = H · A, where H is a 2 x 2 matrix containing one impedance element (h12), one admittance element (h21) and two dimensionless elements (h11 and h22). Calculating a circuit now reduces to multiplying matrices. [edit]History Matrices have a long history of application in solving linear equations. The Chinese text The Nine Chapters on the Mathematical Art (Jiu Zhang Suan Shu), from between 300 BC and AD 200, is the first example of the use of matrix methods to solve simultaneous equations,[85] including the concept of determinants, almost 2000 years before its publication by the Japanese mathematician Seki in 1683 and the German mathematician Leibniz in 1693. Cramer presented Cramer's rule in 1750. Early matrix theory emphasized determinants more strongly than matrices and an independent matrix concept akin to the modern notion emerged only in 1858, with Cayley's Memoir on the theory of matrices.[86][87] The term "matrix" was coined by Sylvester, who understood a matrix as an object giving rise to a number of determinants today called minors, that is to say, determinants of smaller matrices which derive from the original one by removing columns and rows. Etymologically, matrix derives from Latin mater (mother).[88] The study of determinants sprang from several sources.[89] Number- theoretical problems led Gauss to relate coefficients of quadratic forms, i.e., expressions such as x2 + xy − 2y2 , and linear maps in three dimensions to matrices. Eisenstein further developed these notions, including the remark that, in modern parlance, matrix products are non-commutative. Cauchy was the first to prove general statements about determinants, using as definition of the determinant of a matrix A = [ai,j] the following: replace the powers aj k by ajk in the polynomial where Π denotes the product of the indicated terms. He also showed, in 1829, that the eigenvalues of symmetric matrices are real.[90] Jacobi studied "functional determinants"—later called Jacobi determinants by Sylvester—which can be used to describe geometric transformations at a local (or infinitesimal) level, see above; Kronecker's Vorlesungen über die Theorie der
    Determinanten[91] andWeierstrass' Zur Determinantentheorie,[92] bothpublished in 1903, first treated determinants axiomatically, as opposed to previous more concrete approaches such as the mentioned formula of Cauchy. At that point, determinants were firmly established. Many theorems were first established for small matrices only, for example the Cayley-Hamilton theorem was proved for 2×2 matrices by Cayley in the aforementioned memoir, and by Hamilton for 4×4 matrices. Frobenius, working on bilinear forms, generalized the theorem to all dimensions (1898). Also at the end of the 19th century the Gauss-Jordan elimination (generalizing a special case now known asGauss elimination) was established by Jordan. In the early 20th century, matrices attained a central role in linear algebra.[93] partially due to their use in classification of the hypercomplex number systems of the previous century. The inception of matrix mechanics by Heisenberg, Born and Jordan led to studying matrices with infinitely many rows and columns.[94] Later, von Neumann carried out the mathematical formulation of quantum mechanics, by further developing functional analytic notions such as linear operators on Hilbert spaces, which, very roughly speaking, correspond to Euclidean space, but with an infinity ofindependent directions. [edit]Other historical usages of the word "matrix" in mathematics The word has been used in unusual ways by at least two authors of historical importance. Bertrand Russell and Alfred North Whitehead in their Principia Mathematica (1910– 1913) use the word matrix in the context of their Axiom of reducibility. They proposed this axiom as a means to reduce any function to one of lower type, successively, so that at the "bottom" (0 order) the function will be identical to its extension[disambiguation needed] : "Let us give the name of matrix to any function, of however many variables, which does not involve any apparent variables. Then any possible function other than a matrix is derived from a matrix by means of generalization, i.e. by considering the proposition which asserts that the function in question is true with all possible values or with some value of one of the arguments, the other argument or arguments remaining undetermined".[95] For example a function Φ(x, y) of two variables x and y can be reduced to a collection of functions of a single variable, e.g. y, by "considering" the function for all possible values of "individuals" ai substituted in place of variable x. And then the
    resulting collection offunctions of the single variable y, i.e. ∀ai: Φ(ai, y), can be reduced to a "matrix" of values by "considering" the function for all possible values of "individuals" bi substituted in place of variable y: ∀bj∀ai: Φ(ai, bj). Alfred Tarski in his 1946 Introduction to Logic used the word "matrix" synonymously with the notion of truth table as used in mathematical logic.[96] [edit]See also Median The median is the middle value in a set of numbers. In the set [1,2,3] the median is 2. If the set has an even number of values, the median is the average of the two in the middle. For example, the median of [1,2,3,4] is 2.5, because that is the average of 2 and 3. The median is often used when analyzing statistical studies, for example the income of a nation. While the arithmetic mean (average) is a simple calculation that most people understand, it is skewed upwards by just a few high values. The average income of a nation might be $20,000 but most people are much poorer. Many people with $10,000 incomes are balanced out by just a single person with a $5,000,000 income. Therefore the median is often quoted because it shows a value that 50% of the country makes more than and 50% of the country makes less than. Exponential Functions Take a look at x3 . What does it mean? We have two parts here: 1) Exponent, which is 3. 2) Base, which is x. x 3 = x times x times x
    It's read twoways: 1) x cubed 2) x to the third power With exponential functions, 3 is the base and x is the exponent. So, the idea is reversed in terms of exponential functions. Here's what exponential functions look like: y = 3 x , f(x) = 1.124 x , etc. In other words, the exponent will be a variable. The general exponential function looks like this: b x where the base b is ANY constant. So, the standard form for ANY exponential function is f(x) = b x where b is a real number greater than 0. Sample: Solve for x f(x) = 1.276 x Here x can be ANY number we select. Say, x = 1.2. f(1.2) = 1.276 1.2 NOTE: You must follow your calculator's instructions in terms of exponents. Every calculator is different and thus has different steps. I will use my TI-36 SOLAR Calculator to find an approximation for x. f(1.2) = 1.33974088 Rounding off to two decimal places I get: f(1.2) = 1.34 We can actually graph our point (1.2, 1.34) on the xy-plane but more on that in future exponential function lessons. We can use the formula B(t) = 100(1.12 t ) to solve bacteria applications. We can use the above formula to find HOW MUCH bacteria remains in a given region after a certain amount of time. Of course, in the formula, lower case t = time. The number 100 indicates how many bacteria there were at the start of the LAB experiment. The decimal number 1.12 indicates how fast bacteria grows. Sample: How much bacteria in LAB 3 after 2.9 hours of work? Okay, t = 2.9 hours. Replace t with 2.9 hours in the formula above and simplify.
    B(2.9 hours) =100(1.12 2.9 hours) B(2.9 hours) = 100(1.389096016) B(2.9 hours) = 138.9096 NOTE: An exponent can be ANY real number, positive or negative. For ANY exponential function, the domain will be ALL real numbers. Trig Addition Formulas The trig addition formulas can be useful to simplify a complicated expression, or perhaps find an exact value when you only have a small table of trig values. For example, if you want the sine of 15 degrees, you can use a subtraction formula to compute sin(15) as sin(45-30). Trigonometry Derivatives While you may know how to take the derivative of a polynomial, what happens when you need to take the derivative of a trig function? What IS the derivative of a sine? Luckily, the derivatives of trig functions are simple -- they're other trig functions! For example, the derivative of sine is just cosine:
    The rest ofthe trig functions are also straightforward once you learn them, but they aren't QUITE as easy as the first two. Derivatives of Trigonometry Functions sin'(x) = cos(x) cos'(x) = -sin(x) tan'(x) = sec2 (x) sec'(x) = sec(x)tan(x) cot'(x) = -csc2 (x) csc'(x) = -csc(x)cot(x) Take a look at this graphic for an illustration of what this means. At the first point (around x=2*pi), the cosine isn't changing. You can see that the sine is 0, and since negative sine is the rate of change of cosine, cosine would be changing at a rate of -0. At the second point I've illustrated (x=3*pi), you can see that the sine is decreasing rapidly. This makes sense because the cosine is negative. Since cosine is the rate of change of sine, a negative cosine means the sine is decreasing.
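Here is a quick numerical sanity check of the table above (a sketch, using only the standard math module): a small central difference approximates the slope of sin(x) and matches cos(x), including at the two points x = 2π and x = 3π discussed in the graphic.

```python
import math

# Estimate the derivative of sin at a few points with a central difference
# and compare it with cos, as the table above claims.
h = 1e-6
for x in [0.0, 1.0, 2 * math.pi, 3 * math.pi]:
    slope = (math.sin(x + h) - math.sin(x - h)) / (2 * h)
    print(f"x = {x:6.3f}  slope of sin ≈ {slope:.6f}  cos(x) = {math.cos(x):.6f}")

# At x = 2*pi the slope is about 1, and at x = 3*pi it is about -1,
# matching the sign of the cosine at those two points.
```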
    Double and HalfAngle Formulas The double and half angle formulas can be used to find the values of unknown trig functions. For example, you might not know the sine of 15 degrees, but by using the half angle formula for sine, you can figure it out based on the common value of sin(30) = 1/2. They are also useful for certain integration problems where a double or half angle formula may make things much simpler to solve. Double Angle Formulas:
    You'll notice thatthere are several listings for the double angle for cosine. That's because you can substitute for either of the squared terms using the basic trig identity sin^2+cos^2=1. Half Angle Formulas: These are a little trickier because of the plus or minus. It's not that you can use BOTH, but you have to figure out the sign on your own. For example, the sine of 30 degrees is positive, as is the sine of 15. However, if you were to use 200, you'd find that the sine of 200 degrees is negative, while the sine of 100 is positive. Just remember to look at a graph and figure out the sines and you'll be fine.
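The sin(15°) example and the sign discussion above can be checked numerically; this is just a sketch using the standard half-angle and double-angle identities.

```python
import math

# Half-angle: sin(15°) from cos(30°), taking the positive root because 15°
# is in the first quadrant (the sign choice discussed above).
cos30 = math.cos(math.radians(30))
sin15_half_angle = math.sqrt((1 - cos30) / 2)
print(sin15_half_angle, math.sin(math.radians(15)))        # both ≈ 0.258819

# Double angle: cos(2θ) = 1 - 2*sin(θ)**2, checked at θ = 200°, where the
# bookkeeping matters because sin(200°) is negative.
theta = math.radians(200)
print(math.cos(2 * theta), 1 - 2 * math.sin(theta) ** 2)   # both ≈ 0.766044
```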
    The magic identity Trigonometryis the art of doing algebra over the circle. So it is a mixture of algebra and geometry. The sine and cosine functions are just the coordinates of a point on the unit circle. This implies the most fundamental formula in trigonometry (which we will call here the magic identity) where is any real number (of course measures an angle). Example. Show that Answer. By definitions of the trigonometric functions we have Hence we have Using the magic identity we get This completes our proof. Remark. the above formula is fundamental in many ways. For example, it is very useful in techniques of integration. Example. Simplify the expression Answer. We have by definition of the trigonometric functions
    Hence Using the magicidentity we get Putting stuff together we get This gives Using the magic identity we get Therefore we have Example. Check that Answer. Example. Simplify the expression
    Answer. The following identitiesare very basic to the analysis of trigonometric expressions and functions. These are called Fundamental Identities Reciprocal identities Pythagorean Identities Quotient Identities Understanding sine A teaching guideline/lesson plan when first teaching sine (grades 7-9) The sine is simply a RATIO of certain sides of a right triangle. Look at the triangles below. They all have the same shape. That means they have the SAME ANGLES but the lengths of the sides may be different. In other words, they are SIMILAR figures. Have your child/students measure the sides s1, h1, s2, h2, s3, h3 as accurately as possible (or draw several similar right triangles on their own). Then let her calculate the following s1 , s2 s3 . What can you
    ratios: h1 h2 h3 note? Those ratiosshould all be the same (or close to same due to measuring errors). That is so because the triangles have the same shape (or are similar), which means their respective parts are PROPORTIONAL. That is why the ratio of those parts remains the same. Now ask your child what would happen if we had a fourth triangle with the same shape. The answer of course is that even in that fourth triangle the ratio s4/h4 would be the same. The ratio you calculated remains the same for all the triangles. Why? Because the triangles were similar so their sides were proportional. SO, in all right triangles where the angles are the same, this one ratio is the same too. We associate this ratio with the angle α. THAT RATIO IS CALLED THE SINE OF THE ANGLE α. What follows is that if you know the ratio, you can find what the angle α is. Or in other words, if you know the sine of α, you can find α. Or, if you know what α is, you can find this ratio - and when you know this ratio and one side of a right triangle, you can find the other sides. : s1 h1 = s2 h2 = s3 h3 = sin α = 0.57358 In our pictures the angle α is 35 degrees. So sin 35 = 0.57358 (rounded to five decimals). We can use this fact when dealing with OTHER right triangles that have a 35 angle. See, other such triangles are, again, similar to these ones we see here, so the ratio of the opposite side to the hypotenuse, WHICH IS THE SINE OF THE 35 ANGLE, is the same! So in another such triangle, if you only know the hypotenuse, you can calculate the opposite side since you know the ratio, or vice versa. Problem Suppose we have a triangle that has the same shape as the triangles above. The side opposite to the 35 angle is 5 cm. How long is the hypotenuse? SOLUTION: Let h be that hypotenuse. Then 5cm = sin 35 ≈ 0.57358
h
From this equation one can easily solve that h = 5 cm / 0.57358 ≈ 8.72 cm.

An example: the two triangles are pictured both overlapping and separate. We can find h3 simply by the fact that these two triangles are similar. Since the triangles are similar, 3.9/h3 = 2.6/6, from which h3 = 6 × 3.9 / 2.6 = 9. We didn't even need the sine to solve that, but note how closely it ties in with similar triangles. The triangles have the same angle α. Sin α of course would be the ratio 2.6/6 or 3.9/9 ≈ 0.4333. Now we can find the actual angle α from the calculator: since sin α = 0.4333, then α = sin⁻¹(0.4333) ≈ 25.7 degrees.

Test your understanding
1. Draw a right triangle that has a 40° angle. Then measure the opposite side and the hypotenuse and use those measurements to calculate sin 40°. Check your
    answer by plugginginto calculator sin 40 (remember the calculator has to be in the degrees mode instead of radians mode). 2. Draw two right triangles that have a 70 angle - but that are of different sizes. Use the first triangle to find sin 70 (like you did in problem 1). Then measure the hypotenuse of your second triangle. Use sin 70 and the measurement of the hypotenuse to find the opposite side in your second triangle. Check by measuring the opposite side from your triangle. 3. Draw a right triangle that has a 48 angle. Measure the hypotenuse. Then use sin 48 (from a calculator) and your measurement to calculate the length of the opposite side. Check by measuring the opposite side from your triangle. Someone asked me once, "When I type in sine in my graphic calculator, why does it give me a wave?" Read my answer where we get the familiar sine wave. My question is that if some one give us only the length of sides of triangle, how can we draw a triangle? sajjad ahmed shah This is an easy construction. See Constructing a Triangle and Constructing a triangle when the lengths of 3 sides are known if i am in a plane flying at 30000 ft how many linear miles of ground can i see. and please explain how that answer is generated. does it have anything to do with right triangles and the pythagorean therom jim taucher The image below is NOT to scale - it is just to help in the problem. The angle α is much smaller in reality. Yes, you have a right triangle. r is the Earth's radius. Now, Earth's radius is not constant but varies because Earth is not a perfect sphere. For this problem, I was using the mean radius, 3959.871 miles. This also means our answer will be just approximate. I also converted 30000 feet to 5.6818182 miles.
First we calculate α using cosine. You should get α ≈ 3.067476356 degrees. Then we use a proportion comparing α to 360 degrees and x to Earth's circumference. You will get x ≈ 212 miles. Even that result might be too 'exact'.

Introduction: the unit circle is used to understand the sines and cosines of angles in a right (90-degree) triangle. Its radius is exactly one. The center of the circle is the origin, and its perimeter comprises the set of all points in the plane that are exactly one unit from the center of the circle. It's just a circle with radius 'one'.

Unit circle - standard equation: the distance of a point (x, y) from the origin is found using the Pythagorean Theorem. Here the radius is one, so the expression becomes sqrt(x^2 + y^2) = 1. Taking the square on both sides, the equation becomes x^2 + y^2 = 1.
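Looping back to the airplane question answered a few paragraphs above (just before this unit-circle introduction), here is the same calculation as a short script; it uses the mean Earth radius quoted in the text, and the variable names are ours.

```python
import math

r = 3959.871                 # mean Earth radius in miles (value used above)
h = 30000 / 5280             # 30,000 feet converted to miles

alpha = math.degrees(math.acos(r / (r + h)))   # angle at the Earth's centre
distance = alpha / 360 * 2 * math.pi * r       # matching arc of ground

print(alpha)      # ≈ 3.067 degrees
print(distance)   # ≈ 212 miles
```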
Positive angles are measured counterclockwise from the positive x-axis, and negative angles are measured clockwise from the positive x-axis.

Unit circle - sine and cosine: you can read off sin θ and cos θ directly, because the radius of the unit circle is one. Using this, when the angle θ is 0 degrees, cosine = 1, sine = 0, and tangent = 0; when θ is 90 degrees, cosine = 0, sine = 1, and tangent is undefined.

Calculating 60°, 30°, 45° (note: the radius of the unit circle is 1).

Take 60°: consider an equilateral triangle with all sides equal to 1 and all angles the same. Splitting it down the middle gives a right triangle whose x side is 1/2, and the y side satisfies x^2 + y^2 = 1, so y^2 = 1 - (1/2)^2 = 3/4 and y = sqrt(3)/2. Therefore cos 60° = 1/2 = 0.5 and sin 60° = sqrt(3)/2 ≈ 0.8660.

Take 30°:
Here 30° is just 60° with the two sides swapped over, so we get cos 30° = sqrt(3/4) ≈ 0.8660 and sin 30° = 1/2 = 0.5.

Take 45°: here x = y, so x^2 + x^2 = 1^2 and x = y = sqrt(1/2). Therefore cos 45° = sqrt(1/2) ≈ 0.70 and sin 45° = sqrt(1/2) ≈ 0.70.

The standard definition of sine is usually introduced as "the sine of angle A is the length of the opposite side divided by the length of the hypotenuse", which is correct, but makes the eyes glaze over and doesn't even hint that the ancient Egyptians used trigonometry to survey their land. The ancient Egyptians noticed that if two different triangles both have the same angles (similar triangles), then no matter what their size, the relationships between the sides were always the same. The ratio of side a to side b is the same as the ratio of side A to side B. This is true for all combinations of sides – their ratios are always the same. Expressed as fractions we would write a/b = A/B. With numbers it might look like this: if side a is 1 unit long and side b is 2 units long, and the larger triangle has sides that are twice as long, then side A is 2 units and side B is 4 units long. Writing the ratios as fractions we see 1/2 = 2/4.
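The special values worked out above are easy to confirm numerically. This small sketch prints (cos θ, sin θ) for 30°, 45°, and 60° and checks that each point lies on the unit circle x^2 + y^2 = 1.

```python
import math

for degrees in (30, 45, 60):
    theta = math.radians(degrees)
    x, y = math.cos(theta), math.sin(theta)
    print(f"{degrees}°: cos = {x:.4f}, sin = {y:.4f}, x^2 + y^2 = {x*x + y*y:.4f}")

# 30°: cos ≈ 0.8660, sin = 0.5
# 45°: cos ≈ 0.7071, sin ≈ 0.7071
# 60°: cos = 0.5,    sin ≈ 0.8660   (every row gives x^2 + y^2 = 1)
```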
    To determine iftwo triangles are similar (have the same angles), we have to measure two angles. The angles inside a triangle always add up to 180 degrees. If you know two of the angles, then you can figure out the third. Later, an insightful Greek realized that if the triangle is a right angle triangle, then we already know one angle – it is 90 degrees, so we don’t need to measure it, therefore we only have to measure one of the other angles to determine if two right angle triangles are similar. This insight turned out to be incredibly useful for measuring things. We can make a bunch of small right angle triangles and measure their sides and calculate the ratios of the sides. (or we could just write them in a book, “A right angle triangle with a measured angle of 1 degree has the following ratios ...”) Knowing all these ratios for different angles, we can then measure things. For example, if you are a landowner and want to measure how long your fields are, you could do the following: • send a servant with a 3 meter tall staff to the end of the field • measure the angle to the top of the staff • consult your table of similar right angle triangles and determine the ratio of the sides (in this case, we would be looking for staff / distance to measure) • calculate the length of the field using the ratio and the known height of the staff If the measured angle was 5 degrees, then we know (from a similar triangle) that the ratio of staff to distance to measure is 0.0874, or: Rearranging the equation we get:
    a distance of34.3 meters. We call this ratio the tangent of the angle. The sine, cosine, tangent, secant, cosecant and cotangent refer to the ratio of a specific pair of sides of the triangle at the given angle A. The problem is that the names aren’t very informative or intuitive. Sine comes from the Latin word sinus which means fold or bend. Looking at our original definition, it now makes a little more sense: The sine of angle A is equal to the ratio of the sides at the bend in the triangle as seen from A. Or opposite divided by hypotenuse. The ratio of a given pair of sides in a right angle triangle are given the following names: There is no simple way to remember which ratios go with which trigonometric function, although it is easier if you know some of the history behind it. sin, cos, tan, sec, csc, and cot are a shorthand way of referring to the ratio of a specific pair of sides in a right angle triangle.
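Here is the field-surveying example from above in code form (a sketch; the 3-metre staff and the 5-degree angle are the values used in the text): the unknown distance is the staff height divided by the tangent of the measured angle.

```python
import math

staff_height = 3.0                  # metres
angle = math.radians(5)             # measured angle to the top of the staff

ratio = math.tan(angle)             # opposite / adjacent
distance = staff_height / ratio     # length of the field

print(ratio)      # ≈ 0.0875  (the table value quoted above is 0.0874)
print(distance)   # ≈ 34.3 metres
```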
    An Introduction toTrigonometry ... by Brandon Williams Main Index... Introduction Well it is nearly one in the morning and I have tons of work to do and a fabulous idea pops into my head: How about writing an introductory tutorial to trigonometry! I am going to fall so far behind. And once again I did not have the chance to proof read this or check my work so if you find any mistakes e-mail me. I'm going to try my best to write this as if the reader has no previous knowledge of math (outside of some basic Algebra at least) and I'll do my best to keep it consistent. There may be flaws or gaps in my logic at which point you can e-mail me and I will do my best to go back over something more specific. So let's begin with a standard definition of trigonometry: trig - o - nom - e - try n. - a branch of mathematics which deals with relations between sides and angles of triangles Basics Well that may not sound very interesting at the moment but trigonometry is the most interesting forms of math I have come across…and just to let you know I do not have an extensive background in math. Well since trigonometry has a lot to do with angles and triangles let's familiarize ourselves with some fundamentals. First a right triangle: A right triangle is a triangle that has one 90-degree angle. The 90-degree angle is denoted with a little square drawn in the corner. The two sides that are adjacent to the 90-degree angle, 'a' and 'b', are called the legs. The longer side opposite of the 90- degree angle, 'c', is called the hypotenuse. The hypotenuse is always longer than the legs. While we are on the subject lets brush up on the Pythagorean Theorem. The Pythagorean Theorem states that the sum of the two legs squared is equal to the hypotenuse squared. An equation you can use is: c^2 = a^2 + b^2 So lets say we knew that 'a' equaled 3 and 'b' equaled 4 how would we find the length of 'c'…assuming this is in fact a right triangle. Plug-in the values that you know into your formula:
    c^2 = 3^2+ 4^2 Three squared plus four squared is twenty-five so we now have this: c^2 = 25 - - - > Take the square root of both sides and you now know that c = 5 So now we are passed some of the relatively boring parts. Let's talk about certain types of right triangles. There is the 45-45-90 triangle and the 30-60-90 triangle. We might as well learn these because we'll need them later when we get to the unit circle. Look at this picture and observe a few of the things going on for a 45-45-90 triangle: In a 45-45-90 triangle you have a 90-degree angle and two 45-degree angles (duh) but also the two legs are equal. Also if you know the value of 'c' then the legs are simply 'c' multiplied by the square root of two divided by two. I rather not explain that because I would have to draw more pictures…hopefully you will be able to prove it through your own understanding. The 30-60-90 triangle is a little but harder to get but I am not going into to detail with it…here is a picture: You now have one 30-degree angle, a 60-degree angle, and a 90-degree angle. This time the relationship between the sides is a little different. The shorter side is half of the hypotenuse. The longer side is the hypotenuse times the square root of 3 all divided by two. That's all I'm really going to say on this subject but make sure you get this before you go on because it is crucial in understanding the unit circle…which in turn is crucial for understanding trigonometry. Trigonometric Functions The entire subject of trigonometry is mostly based on these functions we are about to learn. The three basic ones are sine, cosine, and tangent. First to clear up any confusion that some might have: these functions mean nothing with out a number with them i.e. sin (20) is something…sin is nothing. Make sure you know that. Now for some quick definitions (these are my own definitions…if you do not get what I am
    saying look themup on some other website): Sine - the ratio of the side opposite of an angle in a right triangle over the hypotenuse. Cosine - the ratio of the side adjacent of an angle in a right triangle over the hypotenuse. Tangent - the ratio of the side opposite of an angle in a right triangle over the adjacent side. Now before I go on I should also say that those functions only find ratios and nothing more. It may seem kind of useless now but they are very powerful functions. Also I am only going to explain the things that I think are useful in Flash…I could go off on some tangent (no pun intended) on other areas of Trigonometry but I'll try to keep it just to the useful stuff. OK lets look at a few pictures: Angles are usually denoted with capital case letters so that is what I used. Now lets find all of the trigonometry ratios for angle A: sin A = 4/5 cos A = 3/5 tan A = 4/3 Now it would be hard for me to explain more than what I have done, for this at least, so you are just going to have to look at the numbers and see where I got them from. Here are the ratios for angle B: sin B = 3/5 cos B = 4/5 tan B = 3/4 Once again just look at the numbers and reread the definitions to see where I came up with that stuff. But now that I told you a way of thinking of the ratios like opposite over hypotenuse there is one more way which should be easier and will also be discussed more later on. Here is a picture…notice how I am only dealing with one angle:
    The little symbolin the corner of the triangle is a Greek letter called "theta"…its usually used to represent an unknown angle. Now with that picture we can think of sine, cosine and tangent in a different way: sin (theta) = x/r cos (theta)= y/r tan (theta)= y/x -- and x <> 0 We will be using that form most of the time. Now although I may have skipped some kind of fundamentally important step (I'm hoping I did not) I can only think of one place to go from here: the unit circle. Becoming familiar with the unit circle will probably take the most work but make sure you do because it is very important. First let me tell you about radians just in case you do not know. Radians are just another way of measuring angles very similar to degrees. You know that there are 90 degrees in one-quarter of a circle, 180 degrees in one-half of a circle, and 360 degrees in a whole circle right? Well if you are dealing with radians there are 2p radians in a whole circle instead of 360 degrees. The reason that there are 2p radians in a full circle really is not all that important and would only clutter this "tutorial" more…just know that it is and it will stay that way. Now if there are 2p radians in a whole circle there are also p radians in a half, and p/2 radians in a quarter. Now its time to think about splitting the circle into more subdivisions than just a half or quarter. Here is a picture to help you out:
    If at allpossible memorize those values. You can always have a picture to look at like this one but it will do you well when you get into the more advanced things later on if you have it memorized. However that is not the only thing you need to memorize. Now you need to know (from memory if you have the will power) the sine and cosine values for every angle measure on that chart. OK I think I cut myself short on explaining what the unit circle is when I moved on to explaining radians. For now the only thing we need to know is that it is a circle with a radius of one centered at (0,0). Now the really cool thing about the unit circle is what we are about to discuss. I'm going to just pick some random angle up there on the graph…let's say…45 degrees. Do you see that line going from the center of the circle (on the chart above) to the edge of the circle? That point at which the line intersects the edge of the circle is very important. The "x" coordinate of that point on the edge is the cosine of the angle and the "y" coordinate is the sine of the angle. Very interesting huh? So lets find the sine and cosine of 45 degrees ourselves without any calculator or lookup tables. Well if you remember anything that I said at the beginning of this tutorial then you now know why I even mentioned it. In a right triangle if there is an angle with a measure of 45 degrees the third angle is also 45 degrees. And not only that but the two legs of the triangle have the same length. So if we think of that line coming from the center of the circle at a 45-degree angle as a right triangle we can find the x- and y- position of where the line intersects…look at this picture:
    If we applysome of the rules we learned about 45-45-90 triangles earlier we can accurately say that: sqrt (2) sin 45 = -------- 2 sqrt (2) cos 45 = ---------- 2 Another way to think of sine is it's the distance from the x-axis to the point on the edge of the circle…you can only think of it that way if you are dealing with a unit circle. You could also think of cosine the same way except it's the distance from the y-axis to the point on the border of the circle. If you still do not know where I came up with those numbers look at the beginning of this tutorial for an explanation of 45- 45-90 triangles…and why you are there refresh yourself on 30-60-90 triangles because we need to know those next. Now lets pick an angle from the unit circle chart like 30 degrees. I'm not going to draw another picture but you should know how to form a right triangle with a line coming from the center of the circle to one of its edges. Now remember the rules that governed the lengths of the sides of a 30-60-90 triangle…if you do then you can once again accurately say that: 1 sin 30 = ---- 2 sqrt (3) cos 30 = --------- 2 I was just about to type out another explanation of why I did this but it's basically the same as what I did for sine just above. Also now that I am rereading this I am seeing
    some things thatmay cause confusion so I thought I would try to clear up a few things. If you look at this picture (it's the same as the one I used a the beginning of all this) I will explain with a little bit more detail on how I arrived at those values for sine and cosine of 45-degrees: Our definition of sine states that the sine of an angle would be the opposite side of the triangle divided by the hypotenuse. Well we know our hypotenuse is one since this a unit circle so we can substitute a one in for "c" and get this: / 1*sqrt(2) | ------------ | 2 / sin 45 = ------------------- 1 Which even the most basic understand of Algebra will tell us that the above is the same as: sqrt (2) sin 45 = -------- 2 Now if you do not get that look at it really hard until it comes to you…I'm sure it will hit you sooner or later. And instead of my wasting more time making a complete unit circle with everything on it I found this great link to one: http://www.infomagic.net/~bright/research/untcrcl.gif . Depending on just how far you want to go into this field of math as well as others like Calculus you may want to try and memorize that entire thing. Whatever it takes just try your best. I always hear people talking about different patterns that they see which helps them to memorize the unit circle, and that is fine but I think it makes it much easier to remember if you know how to come up with those numbers…that's what this whole first part of this tutorial was mostly about. Also while on the subject I might as well tell you about the reciprocal trigonometric functions. They are as follow: csc (theta) = r/y sec (theta) = r/x
    cot (theta) =x/y Those are pronounced secant, cosecant, and cotangent. Just think of them as the same as their matching trigonometric functions except flipped…like this: sin (theta) = y/r - - - > csc (theta) = r/y cos (theta) = x/r - - - > sec (theta) = r/x tan (theta) = y/x - - - > cot (theta) = x/y That makes it a little bit easier to understand doesn't it? Well believe it or not that is it for an introduction to trigonometry. From here we can start to go into much more complicate areas. There are many other fundamentals that I would have liked to go over but this has gotten long and boring enough as it is. I guess I am hoping that you will explore some of these concepts and ideas on your own… you will gain much more knowledge that way as opposed to my sloppy words. Before I go… Before I go I want to just give you a taste of what is to come…this may actually turn out to be just as long as the above so go ahead and make yourself comfortable. First I want to introduce to you trigonometric identities, which are trigonometric equations that are true for all values of the variables for which the expressions in the equation are defined. Now that's probably a little hard to understand and monotonous but I'll explain. Here is a list of what are know as the "fundamental identities": Reciprocal Identities 1 csc (theta) = ---------- , sin (theta) <> 0 sin (theta) 1 sec (theta) = ---------- , COs (theta) <> 0 cos (theta) 1 cot (theta) = ---------- , tan (theta) <> 0 tan (theta) Ratio Identities sin (theta) tan (theta) = ------------ , cos (theta) <> 0 cos (theta)
    cos (theta) cot (theta)= ------------- , sin (theta) <> 0 sin (theta) Pythagorean Identities sin^2(theta) + cos^2(theta) = 1 1 + cot^2(theta) = csc^2(theta) 1 + tan^2(theta) = sec^2(theta) Odd-even Identities sin (-theta) = -sin (theta) cos (-theta) = cos (theta) tan (-theta) = -tan (theta) csc (-theta) = csc (theta) sec (-theta) = sec (theta) cot (-theta) = -cot (theta) Now proving them…well that's gonna take a lot of room but here it goes. I'm only going to prove a few out of each category of identities so maybe you can figure out the others. Lets start with the reciprocal. Well if the reciprocal of a number is simply one divided by that number then we can look at cosecant (which is the reciprocal of sine) as: 1 csc (theta) = ----- ----------------- >>> | If you multiply the numerator and the denominator by "r" you get: / y | |---- | < -- I hope you know | csc (theta) = r/y < -- Just like we said before. We just proved r / that is sine (theta) | an identity...I'll let you do the rest of them... Now the ratio identities. If you think of tangent as y/x , sine as y/r , and cosine as x/r then check this out: sin (theta) --- > y/r y tan (theta) = -------------- --- > ----- --- > Multiply top and bottom by "r" and you're left with ---
    > --- cos (theta)--- > x/r x I'm going to save the proof for the Pythagorean Identities for another time. These fundamental identities will help us prove much more complex identities later on. Knowing trigonometric identities will help us understand some of the more abstract things…at least they are abstract to me. Once I am finished with this I am going to write another tutorial that will go into the somewhat more complex areas that I know of and these fundamental things I have just talked about are required reading. I was going to go over some laws that can be very useful but my study plan tells me that I may not have provided enough information for you to understand it…therefore that will be something coming in the next thing I write. Closing thoughts Well this concludes all the things that you will need to know before you start to do more complicated things. I was a bit brief with some things so if you have any questions or if you want me to go back and further explain something I implore you to e-mail me and I will do my best to clear up any confusion. Also I want to reiterate that this is a very basic introduction to trigonometry. I hope you were not expecting to read this and learn all there is to know. Actually I have not really even mentioned Flash or the possibilities yet…and quite honestly there is not really anything to work with yet. However once I do start to mention Flash and the math that it will take to create some of these effects everyone sees it will almost be just like a review. When you sit down and want to write out a script it will be like merely translating everything you learned about trigonometry from a piece of paper into actionscript. If you want a little synopsis of what I plan on talking about in the next few things I write here you go: - Trigonometry curves - More advanced look into trigonometry - Programmatic movement using trigonometry - Orchestrating it all into perfect harmony (pardon the cliché) Well that's it for me…until next time. Definition
    The "mean", or"average", or "expected value" is the weighted sum of all possible outcomes. The roll of two dice, for instance, has a mean of 7. Multiply 2 by 1/36, the odds of rolling a 2. Multiply 3 by 2/36, the odds of rolling a 3. Do this for all outcomes up to 12. Add them up, and the result is 7. Toss the dice 100 times and the sum of all those throws is going to be close to 700, i.e. 100 times the expected value of 7. The mean need not be one of the possible outcomes. Toss one die, and the mean is 3.5, even though there is no single outcome with value 3.5. But toss the die 100 times and the sum of all those throws will be close to 350. Given a continuous density function f(x), the expected value is the integral of x×f(x). This is the limit of the discrete weighted sum described above. Let's consider a pathological example. Let f(x) = 1/x2, from 1 to infinity. This is a valid density function with integral equal to 1. What is its expected value? Multiply by x to get 1/x, and integrate to get log(x). Evaluate log(x) at 1 and infinity, giving an infinite expected value. Whatever the outcome, you can expect larger outcomes in the future. Add a constant c to each outcome, and you add c to the expected value. Prove this for discrete and continuous density functions. Similarly, scale the output by a constant c, and the mean is multiplied by c. This is proved using integration by substitution. The sum of two independent variables adds their means. This is intuitive, but takes a little effort to prove. If f and g are the density functions of x and y, then the density function for both variables is f(x)g(y). Multiply by x+y and take the integral over the xy plane. Treat it as two integrals: ∫{ f(x)g(y)x } + ∫{ f(x)g(y)y } The first integral becomes the mean of x times 1, and the second becomes 1 times the mean of y. Hence the mean of the sum is the sum of the means. Arithmetic and Geometric Mean The arithmetic mean is the mean, as described above. If all values are positive, the geometric mean is computed by taking logs, finding the arithmetic mean, and taking the exponential. If there are just a few values, the same thing can be accomplished
    by multiplying themtogether and taking the nth root. In the arithmetic mean, you add up and divide by n; in the geometric mean, you multiply up and take the nth root. The geometric mean of 21, 24, and 147 is 42. The geometric mean is used when the log of a measurement is a better indicator (for whatever reason) than the measurement itself. If we wanted to find, for example, the "average" strength of a solar flare, we might use a geometric mean, because the strength can vary by orders of magnitude. Of course, scientists usually develop logarithmic scales for these phenomenon - such as the ricter scale, the decibel scale, and so on. When logs are already implicit in the measurements we can return to the arithmetic mean. The Arithmetic Mean Exceeds the Geometric Mean The average of 2, 5, 8, and 9 is 6, yet the geometric mean is 5.18. The geometric mean always comes out smaller. Let f be a differentiable function that maps the reals, or an unbroken segment of the reals, into the reals. Let f′ be everywhere positive, and let f′′ be everywhere negative. Let g be the inverse of f. Let s be a finite set of real numbers with mean m. Apply f to s, take the average, and apply g. The result is less than m, or equal to m if everything in s is equal to m. When f = log(x), the relationship between the geometric mean and the arithmetic mean is a simple corollary. Shift f(x) up or down, so that f(m) = 0. Let v = f′(m). If x is a value in s less than m, and if f were a straight line with slope v, f(x) would be v×(x-m). Actually f(x) has to be smaller, else the mean value theorem implies a first derivative ≤ v, and a second derivative ≥ 0. On the other side, when x is greater than m, similar reasoning shows f(x) is less than v×(x-m). The entire curve lies below the line with slope v passing through the origin. If f was a line, f(s) would have a mean of 0. But for every x ≠ m, f(x) is smaller. This pulls the mean below 0, and when we apply f inverse, the result lies below m. If f′′ is everywhere positive then the opposite is true; the mean of the image of s in f pulls back to a value greater than m. All this can be extended to the average of a continuous function h(x) from a to b. Choose riemann nets with regular spacing, and apply the theorem to the resulting riemann sums. As the spacing approaches 0, the average remains ≤ m, and in the limit, the average of f(h), pulled back through g, is no larger than the average of h. If h is nonconstant the average through f comes out strictly smaller than the average of h. You'll need uniform continuity, which is assured by continuity across the closed interval [a,b]. The scaled riemann sums approach the average of f(h), and after a
    while, the mean,according to each riemann sum, can be bounded below f(m). I'll leave the details to you. Variance and Standard Deviation If the mean of a random variable is m, the variance is the sum or integral of f(x)(x- m)2. To illustrate, let m = 0. The variance is now the weighted sum of the outcomes squared. In other words, how far does the random variable stray from its mean? If the variance is 0, the outcome is always zero. Any nonzero outcome produces a positive variance. Consider the example of throwing two dice. The average throw produces 7, so subtract 7 from everything. Ten times out of 36 you get a 6 or an 8, giving 10/36×12, or 10/36. Eight times out of 36 you get a 5 or a 9, so add in 8×4/36. Continue through all possible rolls. When your done, the variance is 35/6. Recall that (x-m)2 = x2-2mx+m2. This lets us compute both mean and variance in one pass, which is helpfull if the data set is large. Add up f(x)×x, and f(x)×x2. The former becomes m, the mean. The latter is almost the variance, but we must add m2 times the sum of f(x) (which is m2), and subtract 2m times the sum of xf(x) (which becomes 2m2). Hence the variance is the sum of f(x)x2, minus the square of the mean. The above is also true for continuous variables. The variance is the integral of f(x)x2, minus the square of the mean. The proof is really the same as above. Variance is a bit troublesome however, because the units are wrong. Let a random variable indicate the height of a human being on earth. Height is measured in meters, and the mean, the average height of all people, is also measured in meters. Yet the variance, the variation of height about the mean, seems to be measured in meters squared. To compensate for this, the standard deviation is the square root of variance. Now we're back to meters again. If the average height is 1.7 meters, and the standard deviation is 0.3 meters, we can be pretty sure that a person, chosen at random, will be between 1.4 and 2.0 meters tall. How sure? We'll quantify that later on. For now, the standard deviation gives a rough measure of the spread of a random variable about its mean. The Variance of the Sum We showed that the mean of the sum of two random variables is the sum of the individual means. What about variance? Assume, without loss of generality, that mean(x) = mean(y) = 0. If x and y have density functions f and g, the individual variances are the integrals of f(x)x2 and
    g(y)y2, respectively. Takentogether, the combined density function is f×g, and we want to know the variance of x+y. Consider the following double integral. ∫∫f(x)g(y)(x+y)2 = ∫∫{ f(x)g(y)x2 } + ∫∫{ 2f(x)g(y)xy } + ∫∫{ f(x)g(y)y2 } The first integral is the variance of x, and the third is the variance of y. The middle integral is the mean of x times the mean of y, or zero. Therefore the variance of the sum is the sum of the variances. Reverse Engineering If a random variable has a mean of 0 and a variance of 1, what can we say about it? Not a whole lot. The outcome could be 0 most of the time, and on rare occasions, a million. That gives a variance of 1. But for all practical purposes the "random" variable is always 0. Alternatively, x could be ±1, like flipping a coin. This has mean 0 and variance 1, yet the outcome is never 0. Other functions produce values of 1/3, 0.737, sqrt(½), and so on. There's really no way to know. We can however say something about the odds of finding x ≥ c, for c ≥ 1. Let |x| exceed c with probability p. The area of the curve, beyond c, is p. This portion of the curve contributes at least pc2 to the variance. Since this cannot exceed 1, the probability of finding x beyond c is bounded by 1/c2. Generalize the above proof to a random variable with mean m and standard deviation s. If c is at least s, x is at least c away from m with probability at most s2/c2. The Mean is Your Best Guess Let a random variable x have a density function f and a mean m. You would like to predict the value of x, in a manner that minimizes error. If your prediction is t, the error is defined as (x-t)2, i.e. the square of the difference between your prediction and the actual outcome. What should you guess to minimize error? The expected error is the integral of f(x)(x-t)2, from -infinity to +infinity. Write this as three separate integrals: error = ∫{ f(x)x2 } - ∫{ 2f(x)xt } + ∫{ f(x)t2 } The first integral becomes a constant, i.e. it does not depend on t. The second becomes -2mt, where m is the mean, and the third becomes t2. This gives a quadratic in t. Find its minimum by setting its first derivative equal to 0. Thus t = m, and the mean is your best guess. The expected error is the variance of f.
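Several of the numbers quoted in the last few sections are easy to reproduce. The sketch below recomputes the two-dice mean of 7 and variance of 35/6 using the one-pass recipe E[x^2] - m^2 described above, and the geometric mean of 21, 24, 147 by averaging logs.

```python
import math
from itertools import product

# Mean and variance of the sum of two dice, via E[x] and E[x^2] - E[x]^2.
outcomes = [a + b for a, b in product(range(1, 7), repeat=2)]
mean = sum(outcomes) / len(outcomes)
mean_sq = sum(x * x for x in outcomes) / len(outcomes)
variance = mean_sq - mean ** 2
print(mean, variance)            # 7.0 and 5.833... (= 35/6)

# Geometric mean of 21, 24, 147: average the logs, then exponentiate.
values = [21, 24, 147]
geo = math.exp(sum(math.log(v) for v in values) / len(values))
print(geo)                       # 42.0, smaller than the arithmetic mean of 64
```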
What is Log?

Date: 26 Feb 1995 22:46:28 -0500
From: charley
Subject: Math questions

Hi, My name is Yutaka Charley and I'm in the 5th grade at PS150Q in NYC. What's 4 to the half power? What does log mean? Thank you. Yutaka

Date: 27 Feb 1995 21:54:12 -0500
From: Dr. Ken
Subject: Re: Math questions

Hello there! I'll address your second question, the one about Logs; and my colleague and buddy Ethan has promised to answer your first question, the one about 4 to the 1/2 power. Here's the definition of Log:

If a^b = x, then Log_a(x) = b.

When you read that, you say "if a to the b power equals x, then the Log (or Logarithm) to the base a of x equals b." Log is short for the word Logarithm. Here are a couple of examples: Since 2^3 = 8, Log_2(8) = 3. For the rest of this letter we will use ^ to represent exponents - 2^3 means 2 to the third power.
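Python's math.log takes an optional base argument, so the examples in this letter can be checked directly (a small sketch; the printed values may carry tiny floating-point rounding).

```python
import math

print(math.log(8, 2))     # ≈ 3.0, because 2^3 = 8
print(math.log(25, 5))    # ≈ 2.0, because 5^2 = 25
print(math.log(81, 3))    # ≈ 4.0
print(math.log(81, 9))    # ≈ 2.0  -- the base matters
print(math.log(64, 4))    # ≈ 3.0, i.e. 4^3 = 64 expressed as a logarithm
```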
    To find outwhat Log (25) is, we'd ask ourselves "what power do you raise 5 5 to to get 25?" Since 5^2 = 25, the answer to this one is 2. So the Logarithm to the base 5 of 25 is 2. Whenever you talk about a Logarithm, you have to say what base you're talking about. For instance, the Logarithm to the base 3 of 81 is 4, but the Logarithm to the base 9 of 81 is 2. Here are a couple of examples that you can try to figure out: What is the Logarithm to the base 2 of 16? What is the Logarithm to the base 7 of 343? How would you express the information, 4^3 = 64, in terms of Logarithms? _______________ Now that you have done Logarithms I will take over for my buddy Ken and talk about fractional exponents. To help explain fractional exponents I need to teach you one neat fact about exponents: 3^4 times 3^5 equals 3^(4+5) or 3^9 This will be very important so I will show a few more examples. 4^7 times 4^10 equals 4^17 5^2 times 5^6 equals 5^8 Now let's get to fractional exponents. Let's start with 9^(1/2). We know from our adding rule that 9^(1/2) times 9^(1/2) is 9^(1/2 + 1/2), which is 9^1; so whatever 9^(1/2) is, we know that it times itself has to equal nine. But what times itself equals 9? Well 3, so 9^(1/2) is 3. All fractional exponents work this way. Lets look at 8^(1/3). Again,
    8^(1/3) times 8^(1/3)times 8^(1/3) is 8^(1/3 + 1/3 + 1/3), which is 8; so we need to know what times itself three times is 8. That is 2. So now look at your problem, 4^(1/2). We know from experience that this means what number times itself is 4? That is 2, so 4^(1/2) equals 2. Geometrical Meaning of Matrix Multiplication Definitions of 'matrix' Wordnet 1. (noun) matrix (mathematics) a rectangular array of quantities or expressions set out by rows and columns; treated as a singleelement and manipulated according to rules 2. (noun) matrix (geology) amass of fine-grained rock in which fossils, crystals, or gems are embedded 3. (noun) matrix an enclosure within which something originates or develops (from the Latin for womb) 4. (noun) matrix, intercellular substance, ground substance the body substance in which tissue cells are embedded 5. (noun) matrix the formative tissue at the base of a nail 6. (noun) matrix mold used in the production of phonograph records, type, or other relief surface Definitions of 'matrix' Webster 1913 Dictionary 1. (noun) matrix the womb 2. (noun) matrix hence, that which gives form or origin to anything 3. (noun) matrix the cavity in which anything is formed, and which gives it shape; a die; a mold, as for the face of a type
    4. (noun) matrix theearthy or stony substance in which metallic ores or crystallized minerals are found; the gangue 5. (noun) matrix the five simple colors, black, white, blue, red, and yellow, of which all the rest are composed 6. (noun) matrix the lifeless portion of tissue, either animal or vegetable, situated between the cells; the intercellular substance 7. (noun) matrix a rectangular arrangement of symbols in rows and columns. The symbols may express quantities or operations Definitions of 'matrix' The New Hacker's Dictionary 1. matrix [FidoNet] 1. What the Opus BBS software and sysops call FidoNet. 2. Fanciful term for a cyberspace expected to emerge from current networking experiments (see the network). The name of the rather good 1999 cypherpunk movie The Matrix played on this sense, which however had been established for years before. 3. The totality of present-day computer networks (popularized in this sense by John Quarterman; rare outsideacademic literature). Matrix multiplication is a versatile tool for many aspects of scientific or technical methods. One particular application of matrix multiplication is the transformation of data in n-dimensional space. Data can be scaled, shifted, rotated, or distorted by a simple matrix multiplication. In order to achieve all these operations by a single transformation matrix, the original data has to be augmented by an additional constant value (preferably 1). In order to see the effects of matrix multiplication, you can start the following interactive example . Example: transformation of two-dimensional points. Suppose you have seven data points in two dimensions (x, and y). These seven data points have to be submitted to various transformation operations. Therefore we first augment the data points, denoted by [xi,yi], with a constant value, resulting in the point vectors [xi,yi,1].
For performing the various transformations, we simply have to adjust the transformation matrix.

Shift: The coordinates of the data points are shifted by the vector [t1, t2].

Scaling: The points are scaled by the factor s.

Scaling only the y coordinate: Here, only the y coordinates are scaled according to the factor s.

Rotation: A rotation of all points around the origin can be accomplished by using the sines and cosines of the rotation angle (remember the negative sign for the first sine term).
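The four operations above can be tried numerically. Here is a minimal Python sketch (it assumes the NumPy library, which the original material does not use; the coordinates, shift vector, scale factor and angle are arbitrary illustration values):

import numpy as np

# Seven 2-D points, augmented with a constant 1 so that shift, scale and
# rotation can each be written as a single 3x3 matrix multiplication.
points = np.array([[0, 0], [1, 0], [1, 1], [0, 1], [2, 3], [3, 1], [2, 0]], dtype=float)
augmented = np.hstack([points, np.ones((len(points), 1))])   # rows are [x_i, y_i, 1]

t1, t2 = 4.0, -2.0        # shift vector
s = 1.5                   # scale factor
a = np.deg2rad(30)        # rotation angle

shift = np.array([[1, 0, 0],
                  [0, 1, 0],
                  [t1, t2, 1]])                 # adds [t1, t2] to every point

scale = np.diag([s, s, 1.0])                    # scales both coordinates by s
scale_y = np.diag([1.0, s, 1.0])                # scales only the y coordinate

rotate = np.array([[ np.cos(a), np.sin(a), 0],
                   [-np.sin(a), np.cos(a), 0],
                   [ 0,         0,         1]]) # one sine term carries the negative sign;
                                                # its position depends on the row/column convention

print((augmented @ shift)[:, :2])               # shifted points
print((augmented @ scale)[:, :2])               # scaled points
print((augmented @ scale_y)[:, :2])             # points with only y scaled
print((augmented @ rotate)[:, :2])              # rotated points

With row vectors [x, y, 1] multiplied on the right by the 3x3 matrix, each transformation is a single matrix product, and transformations can be chained by multiplying their matrices together.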
The ordinary matrix product is the most often used and the most important way to multiply matrices. It is defined between two matrices only if the width of the first matrix equals the height of the second matrix. Multiplying an m×n matrix with an n×p matrix results in an m×p matrix. If many matrices are multiplied together, and their dimensions are written in a list in order, e.g. m×n, n×p, p×q, q×r, the size of the result is given by the first and the last numbers (m×r), and the values surrounding each comma must match for the result to be defined. The ordinary matrix product is not commutative.

The first coordinate in matrix notation denotes the row and the second the column; this order is used both in indexing and in giving the dimensions. The element at the intersection of row i and column j of the product matrix is the dot product (or scalar product) of row i of the first matrix and column j of the second matrix; for example, the element x3,4 of a product is computed from row 3 of the first matrix and column 4 of the second. This explains why the width and the height of the matrices being multiplied must match: otherwise the dot product is not defined. Pictorially, the product of two matrices A and B can be illustrated by showing how each intersection in the product matrix corresponds to a row of A and a column of B. The size of the output matrix is always the largest possible, i.e. for each row of A and for each column of B there are always corresponding intersections in the product matrix. The product matrix AB consists of all combinations of dot products of rows of A and columns of B.
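As a rough sketch of the row-times-column rule just described, here is a naive implementation in plain Python (the two matrices are arbitrary example values):

def matmul(A, B):
    # Entry (i, j) of the product is the dot product of row i of A and column j of B.
    # A is m x n; B must be n x p; the result is m x p.
    m, n = len(A), len(A[0])
    n2, p = len(B), len(B[0])
    if n != n2:
        raise ValueError("width of the first matrix must equal the height of the second")
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

A = [[1, 2, 3],
     [4, 5, 6]]           # 2 x 3
B = [[7, 8],
     [9, 10],
     [11, 12]]            # 3 x 2
print(matmul(A, B))       # [[58, 64], [139, 154]], a 2 x 2 result

Calling it with mismatched inner dimensions raises an error, mirroring the rule that the dot products would otherwise be undefined.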
Formal definition

Formally, if A is an m×n matrix and B is an n×p matrix over some field F, then AB is the m×p matrix whose elements are given by

(AB)i,j = Ai,1 B1,j + Ai,2 B2,j + … + Ai,n Bn,j

for each pair i and j with 1 ≤ i ≤ m and 1 ≤ j ≤ p. The algebraic system of "matrix units" summarizes the abstract properties of this kind of multiplication.

Relationship with the inner product and the outer product

The Euclidean inner product and outer product are the simplest special cases of the ordinary matrix product. The inner product of two column vectors A and B is A^T B, where T denotes the matrix transpose. More explicitly, A^T B is the 1×1 matrix whose single entry is A1 B1 + A2 B2 + … + An Bn.
The outer product is A B^T, the n×n matrix whose (i, j) entry is Ai Bj.

Matrix multiplication can be viewed in terms of these two operations by considering how the matrix product works on block matrices. Decomposing A into its row vectors and B into its column vectors, each entry of AB is the inner product of a row of A with a column of B; this is the method used in the introduction. Written this way, AB looks like an outer product in which the real product inside is replaced with the inner product. In general, block matrix multiplication works exactly like ordinary matrix multiplication, but the real product inside is replaced with the matrix product.
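The two special cases can be checked with a short sketch (assuming NumPy; the vectors are arbitrary illustration values):

import numpy as np

a = np.array([[1.0], [2.0], [3.0]])     # column vector, 3 x 1
b = np.array([[4.0], [5.0], [6.0]])     # column vector, 3 x 1

inner = a.T @ b        # 1 x 1 matrix a^T b: the Euclidean inner product, here 32
outer = a @ b.T        # 3 x 3 matrix a b^T: the outer product

print(inner)
print(outer)

The inner product is a (1 x n) times (n x 1) product and the outer product is an (n x 1) times (1 x n) product, so the shapes alone already show which of the two a given block multiplication reduces to.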
An alternative method results when the decomposition is done the other way around (A decomposed into column vectors and B into row vectors). In that case AB is the sum, over all column/row pairs, of the outer product of the i-th column of A with the i-th row of B. This method emphasizes the effect of individual column/row pairs on the result, which is a useful point of view with e.g. covariance matrices, where each such pair corresponds to the effect of a single sample point.

One more useful decomposition results when B is decomposed into columns and A is left undecomposed. Then A is seen to act separately on each column of B, transforming them in parallel. Conversely, B acts separately on each row of A. If x is a vector and A is decomposed into columns, then Ax = x1·(column 1 of A) + x2·(column 2 of A) + … + xn·(column n of A). The column vectors of A give directions and units for coordinate axes and the elements of x are coordinates on the corresponding axes; Ax is then the vector which has those coordinates with respect to the standard axes.
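The column/row-pair decomposition just described can be verified numerically. A minimal sketch, assuming NumPy and using made-up matrices:

import numpy as np

A = np.array([[1., 2.],
              [3., 4.],
              [5., 6.]])          # 3 x 2: two columns
B = np.array([[7., 8., 9.],
              [10., 11., 12.]])   # 2 x 3: two rows

# AB as the sum of outer products (column i of A)(row i of B)
by_pairs = sum(np.outer(A[:, i], B[i, :]) for i in range(A.shape[1]))

print(np.allclose(by_pairs, A @ B))   # True

Each term of the sum is the contribution of one column/row pair, which is exactly the "one sample point at a time" view that makes this decomposition useful for covariance matrices.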
Properties
• Matrix multiplication is not generally commutative.
• If A and B are both n×n matrices, the determinant of their product is independent of the order of the matrices in the product.
• If both matrices are diagonal square matrices of the same dimension, their product is commutative.
• If A is a matrix representative of a linear transformation L and B is a matrix representative of a linear transformation P, then AB is a matrix representative of the linear transformation P followed by the linear transformation L.
• Matrix multiplication is associative: (AB)C = A(BC).
• Matrix multiplication is distributive over matrix addition: A(B + C) = AB + AC and (A + B)C = AC + BC.
• If the matrix is defined over a field (for example, over the Real or Complex fields), then it is compatible with scalar multiplication in that field: c(AB) = (cA)B = A(cB), where c is a scalar.

Algorithms for efficient matrix multiplication

The running time of square matrix multiplication, if carried out naively, is O(n^3). The running time for multiplying rectangular matrices (one m×p matrix with one p×n matrix) is O(mnp). But more efficient algorithms do exist. Strassen's algorithm, devised by Volker Strassen in 1969 and often referred to as "fast matrix multiplication", is based on a clever way of multiplying two 2 × 2 matrices which requires only 7 multiplications (instead of the usual 8), at the expense of several additional addition and subtraction operations. Applying this trick recursively gives an algorithm with a multiplicative cost of O(n^(log2 7)) ≈ O(n^2.807). Strassen's algorithm is awkward to implement, compared to the naive algorithm, and it lacks numerical stability. Nevertheless, it is beginning to appear in libraries such as BLAS, where it is computationally interesting for matrices with dimensions n > 100[1], and it is very useful for large matrices over exact domains such as finite fields, where numerical stability is not an issue.

The O(n^k) algorithm with the lowest currently known exponent k is the Coppersmith–Winograd algorithm. It was presented by Don Coppersmith and Shmuel Winograd in 1990 and has an asymptotic complexity of O(n^2.376). It is similar to Strassen's algorithm: a clever way is devised for multiplying two k × k matrices with fewer than k^3 multiplications, and this technique is applied recursively. However, the constant coefficient hidden by the Big O notation is so large that the Coppersmith–Winograd algorithm is only worthwhile for matrices that are too large to handle on present-day computers.[2]

Since any algorithm for multiplying two n × n matrices has to process all 2n² entries, there is an asymptotic lower bound of Ω(n²) operations. Raz (2002) proves a lower bound of Ω(m² log m) for bounded-coefficient arithmetic circuits over the real or complex numbers. Cohn et al. (2003, 2005) put methods such as the Strassen and Coppersmith–Winograd algorithms in an entirely different group-theoretic context. They show that if families of wreath products of Abelian groups with symmetric groups satisfying certain conditions exist, then there are matrix multiplication algorithms with essentially quadratic complexity. Most researchers believe that this is indeed the case[3]; a lengthy attempt at proving this was undertaken by the late Jim Eve.[4]

Because of the nature of matrix operations and the layout of matrices in memory, it is typically possible to gain substantial performance through parallelisation and vectorization. Some algorithms with lower time complexity on paper may therefore carry indirect costs on real machines.

Relationship to linear transformations

Matrices offer a concise way of representing linear transformations between vector spaces, and (ordinary) matrix multiplication corresponds to the composition of linear transformations. This will be illustrated here by means of an example using three vector spaces of specific dimensions, but the correspondence applies equally to any other choice of dimensions. Let X, Y, and Z be three vector spaces, with dimensions 4, 2, and 3, respectively, all over the same field, for example the real numbers. The coordinates of a point in X will be denoted as xi, for i = 1 to 4, and analogously for the other two spaces. Two linear transformations are given: one from Y to X, which can be expressed by a system of linear equations, and one from Z to Y, expressed by another such system.
These two transformations can be composed to obtain a transformation from Z to X. By substituting, in the first system, the right-hand sides of the equations of the second system for their corresponding left-hand sides, the xi can be expressed in terms of the zk. These three systems can be written equivalently in matrix–vector notation – thereby reducing each system to a single equation – as x = Ay, y = Bz, and x = Cz. Representing the three systems this way, inspection of the entries of matrix C reveals that C = AB.
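This composition rule is easy to check numerically. A small sketch, assuming NumPy and using random matrices with the dimensions of the example (X of dimension 4, Y of dimension 2, Z of dimension 3):

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 2))   # transformation from Y to X
B = rng.standard_normal((2, 3))   # transformation from Z to Y
z = rng.standard_normal(3)        # a point in Z

C = A @ B                         # composed transformation from Z to X, a 4 x 3 matrix
print(np.allclose(C @ z, A @ (B @ z)))   # True: Cz = A(Bz)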
This can be used to formulate a more abstract definition of matrix multiplication, given the special case of matrix–vector multiplication: the product AB of matrices A and B is the matrix C such that for all vectors z of the appropriate shape, Cz = A(Bz).

Scalar multiplication

The scalar multiplication of a matrix A = (aij) and a scalar r gives a product rA of the same size as A; the entries of rA are given by (rA)ij = r·aij. If we are concerned with matrices over a more general ring, then the multiplication above is the left multiplication of the matrix A by the scalar r, while the right multiplication is defined entrywise as (Ar)ij = aij·r. When the underlying ring is commutative, for example the real or complex number field, the two multiplications are the same. However, if the ring is not commutative, such as the quaternions, they may be different.

Hadamard product

See also: (Function) pointwise product. For two matrices of the same dimensions, we have the Hadamard product (named after French mathematician Jacques Hadamard), also known as the entrywise product and the Schur product.[5] Formally, for two matrices of the same dimensions, the Hadamard product A · B is a matrix of the same dimensions
with elements given by (A · B)ij = aij · bij. Note that the Hadamard product is a submatrix of the Kronecker product. The Hadamard product is commutative. The Hadamard product appears in lossy compression algorithms such as JPEG.

Kronecker product

Main article: Kronecker product. For any two arbitrary matrices A and B, we have the direct product or Kronecker product A ⊗ B, defined as the block matrix whose (i, j) block is aij·B. If A is an m-by-n matrix and B is a p-by-q matrix, then their Kronecker product A ⊗ B is an mp-by-nq matrix. The Kronecker product is not commutative. If A and B represent linear transformations V1 → W1 and V2 → W2, respectively, then A ⊗ B represents the tensor product of the two maps, V1 ⊗ V2 → W1 ⊗ W2.
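Both products are available directly in NumPy, which makes the contrast with the ordinary product easy to see (a small sketch; the matrices are arbitrary examples):

import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 5],
              [6, 7]])

hadamard = A * B          # entrywise (Schur) product, same shape as A and B
kron = np.kron(A, B)      # Kronecker product: a 4 x 4 block matrix

print(hadamard)                                       # [[ 0 10] [18 28]]
print(np.array_equal(A * B, B * A))                   # True: the Hadamard product is commutative
print(np.array_equal(np.kron(A, B), np.kron(B, A)))   # False: the Kronecker product is not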
Common properties

If A, B and C are matrices with appropriate dimensions defined over a field (e.g. the real numbers) and c is a scalar in that field, then for all three types of multiplication:
• Matrix multiplication is associative: (AB)C = A(BC).
• Matrix multiplication is distributive: A(B + C) = AB + AC and (A + B)C = AC + BC.
• Matrix multiplication is compatible with scalar multiplication: c(AB) = (cA)B = A(cB).
• Note that matrix multiplication is not commutative: in general AB ≠ BA, although the order of multiplication can be reversed by transposing the matrices: (AB)^T = B^T A^T.

The Frobenius inner product, sometimes denoted A:B, is the component-wise inner product of two matrices as though they are vectors. In other words, it is the sum of the entries of the Hadamard product; that is, A:B = Σi,j Aij Bij = trace(A^T B). This inner product induces the Frobenius norm.

Square matrices can be multiplied by themselves repeatedly in the same way that ordinary numbers can. This repeated multiplication can be described as a power of the matrix. Using the ordinary notion of matrix multiplication, identities such as A^0 = I, A^(j+k) = A^j A^k, and (cA)^k = c^k A^k hold for an n-by-n matrix A, positive integers j and k, and a scalar c. The naive computation of matrix powers is to multiply the result by the matrix A, k times, starting with the identity matrix just as in the scalar case. This can be improved using the binary representation of k, a method commonly used for scalars. An even better method is to use the eigenvalue decomposition of A. Calculating high powers of matrices can be very time-consuming, but the complexity of the calculation can be dramatically decreased by using the Cayley–Hamilton theorem, which takes advantage of an identity found using the matrix's characteristic polynomial and gives a much more effective expression for A^k, one which raises scalars, rather than the matrix itself, to the required power.

Powers of diagonal matrices

The kth power of a diagonal matrix A is the diagonal matrix whose diagonal entries are the kth powers of the corresponding entries of A. When raising an arbitrary matrix (not necessarily a diagonal matrix) to a power, it is often helpful to diagonalize the matrix first.

The Weighted Matrix Product
The Weighted Matrix Product (Weighted Matrix Multiplication) is a generalization of ordinary matrix multiplication, in the following way. Given a set of weight matrices, the Weighted Matrix Product of a matrix pair is formed like the ordinary product, except that each product term is additionally weighted by the corresponding entry of one of the weight matrices.
• The number of weight matrices is the number of columns of the left operand, which equals the number of rows of the right operand.
• The number of rows of each weight matrix is the number of rows of the left operand.
• The number of columns of each weight matrix is the number of columns of the right operand.
The Weighted Matrix Product is defined only if the matrix operands are conformable in the ordinary sense. The resultant matrix has the number of rows of the left operand and the number of columns of the right operand.
NOTE: Ordinary matrix multiplication is the special case of weighted matrix multiplication where all the weight matrix entries are 1s. Ordinary matrix multiplication is weighted matrix multiplication in a default "sea of 1s", the weight matrices formed out of the "sea" as necessary.
NOTE: The Weighted Matrix Product is not generally associative. Weighted matrix multiplication may be expressed in terms of ordinary matrix multiplication, using matrices constructed from the constituent parts of an m×p matrix and a p×n matrix.

The Weighted Matrix Product is especially useful in developing matrix bases closed under a (not necessarily associative) product, i.e. algebras. As an example, consider the following development. It is convenient (although not necessary) to begin with permutation matrices as the basis, since they are a known basis and about as simple as there is. The complex plane can be realized as weighted matrix multiplication with suitably chosen weight matrices; working through the products of the basis elements then manifests a homomorphism between this algebra and the complex plane.
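The homomorphism with the complex plane can be checked without the weighted-product machinery by using the familiar 2 x 2 real-matrix representation of a complex number a + bi (a sketch assuming NumPy; the representation below is the standard one, not necessarily the basis produced by the weighted construction above):

import numpy as np

def as_matrix(z):
    # Standard 2x2 real-matrix representation of the complex number a + bi.
    a, b = z.real, z.imag
    return np.array([[a, -b],
                     [b,  a]])

z, w = 2 + 3j, -1 + 4j
lhs = as_matrix(z) @ as_matrix(w)   # matrix product of the representations
rhs = as_matrix(z * w)              # representation of the complex product
print(np.allclose(lhs, rhs))        # True: the map z -> as_matrix(z) is a homomorphism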
The quaternions can similarly be obtained as weighted matrix multiplication with suitably chosen weights.

• Multiplying a 2 × 3 matrix by a 3 × 4 matrix is possible, and it gives a 2 × 4 matrix as the answer.
• Multiplying a 7 × 1 matrix by a 1 × 2 matrix is okay; it gives a 7 × 2 matrix.
• A 4 × 3 matrix times a 2 × 3 matrix is NOT possible.

How to Multiply 2 Matrices

We use letters first to see what is going on. We'll see a numbers example after. As an example, let's take a general 2 × 3 matrix multiplied by a 3 × 2 matrix. The answer will be a 2 × 2 matrix. We multiply and add the elements as follows. We work across the 1st row of the first matrix, multiplying down the 1st column of the second matrix, element by element. We add the resulting products. Our answer goes in position a11 (top left) of the answer matrix. We do a similar process for the 1st row of the first matrix and the 2nd column of the second matrix. The result is placed in position a12. Now for the 2nd row of the first matrix and the 1st column of the second matrix. The result is placed in position a21. Finally, we do the 2nd row of the first matrix and the 2nd column of the second matrix. The result is placed in position a22.
So the result of multiplying our 2 matrices is as follows. Now let's see a number example.

Example
Multiply:
Answer

Multiplying 2 × 2 Matrices
The process is the same for any size matrix. We multiply across rows of the first matrix and down columns of the second matrix, element by element. We then add the products. In this case, we multiply a 2 × 2 matrix by a 2 × 2 matrix and we get a 2 × 2 matrix as the result.
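The four positions described above can be traced in a few lines of plain Python (the entries are made-up numbers):

A = [[1, 2],
     [3, 4]]
B = [[5, 6],
     [7, 8]]

a11 = A[0][0] * B[0][0] + A[0][1] * B[1][0]   # 1st row of A times 1st column of B -> 19
a12 = A[0][0] * B[0][1] + A[0][1] * B[1][1]   # 1st row of A times 2nd column of B -> 22
a21 = A[1][0] * B[0][0] + A[1][1] * B[1][0]   # 2nd row of A times 1st column of B -> 43
a22 = A[1][0] * B[0][1] + A[1][1] * B[1][1]   # 2nd row of A times 2nd column of B -> 50

print([[a11, a12],
       [a21, a22]])   # [[19, 22], [43, 50]]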
Example
Multiply:
Answer

Matrices and Systems of Simultaneous Linear Equations

We now see how to write a system of linear equations using matrix multiplication.

Example: The system of equations can be written as a coefficient matrix multiplied by a column of unknowns, set equal to a column of constants. Matrices are ideal for computer-driven solutions of problems because computers easily form arrays. We can leave out the algebraic symbols; a computer only requires the first and last matrices to solve the system, as we will see in Matrices and Linear Equations.

Note 1 - Notation
Take care when writing matrix multiplication. The following expressions have different meanings:
AB is matrix multiplication
A×B is the cross product, which returns a vector
A*B is used in computer notation, but not on paper
A•B is the dot product, which returns a scalar
[See the Vector chapter for more information on vector and scalar quantities.]

Note 2 - Commutativity of Matrix Multiplication
Does AB = BA? Let's see if it is true using an example.

Example
If
and
find AB and BA.
Answer
In general, when multiplying matrices, the commutative law doesn't hold, i.e. AB ≠ BA. There are two common exceptions to this:
• The identity matrix: IA = AI = A.
• The inverse of a matrix: A^(-1)A = AA^(-1) = I.
In the next section we learn how to find the inverse of a matrix.

Example - Multiplying by the Identity Matrix
Given that
find AI.
Answer
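The general failure of commutativity, and the two exceptions noted above, can be demonstrated with a short NumPy sketch (the matrix entries are arbitrary; A was chosen to be invertible):

import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [5, 2]])
I = np.eye(2)

print(np.array_equal(A @ B, B @ A))                         # False: in general AB != BA
print(np.array_equal(A @ I, A), np.array_equal(I @ A, A))   # True True: AI = IA = A

Ainv = np.linalg.inv(A)
print(np.allclose(Ainv @ A, I), np.allclose(A @ Ainv, I))   # True True: A^(-1)A = AA^(-1) = I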
Exercises
1. If possible, find BA and AB.
Answer
2. Determine if B = A^(-1).
Answer
3. In studying the motion of electrons, one of the Pauli spin matrices is s, where
Show that s^2 = I. [If you have never seen j before, go to the section on complex numbers.]
Answer
4. Evaluate the following matrix multiplication, which is used in directing the motion of a robotic mechanism.
Answer

Diagonal matrix

In linear algebra, a diagonal matrix is a square matrix in which the entries outside the main diagonal (↘) are all zero. The diagonal entries themselves may or may not be zero. Thus, the matrix D = (di,j) with n columns and n rows is diagonal if:
    For example, thefollowing matrix is diagonal: The term diagonal matrix may sometimes refer to a rectangular diagonal matrix, which is an m-by-n matrix with only the entries of the form di,i possibly non-zero; for example, , or However, in the remainder of this article we will consider only square matrices. Any diagonal matrix is also a symmetric matrix. Also, if the entries come from the field R or C, then it is a normal matrix as well. Equivalently, we can define a diagonal matrix as a matrix that is both upper- and lower-triangular. The identity matrix In and any square zero matrix are diagonal. A one-dimensional matrix is always diagonal. Scalar matrix A diagonal matrix with all its main diagonal entries equal is a scalar matrix, that is, a scalar multiple λI of the identity matrix I. Its effect on a vector is scalar multiplication by λ. For example, a 3×3 scalar matrix has the form: The scalar matrices are the center of the algebra of matrices: that is, they are precisely the matrices that commute with all other square matrices of the same size. For an abstract vector space V (rather than the concrete vector space Kn ), or more generally a module M over a ring R, with the endomorphism algebra End(M) (algebra of linear operators on M) replacing the algebra of matrices, the analog of scalar matrices are scalar transformations. Formally,
    scalar multiplication isa linear map, inducing a map (send a scalar λ to the corresponding scalar transformation, multiplication by λ) exhibiting End(M) as a R-algebra. For vector spaces, or more generally free modules , for which the endomorphism algebra is isomorphic to a matrix algebra, the scalar transforms are exactly the center of the endomorphism algebra, and similarly invertible transforms are the center of the general linear group GL(V), where they are denoted by Z(V), follow the usual notation for the center. Matrix operations The operations of matrix addition and matrix multiplication are especially simple for diagonal matrices. Write diag(a1,...,an) for a diagonal matrix whose diagonal entries starting in the upper left corner area1,...,an. Then, for addition, we have diag(a1,...,an) + diag(b1,...,bn) = diag(a1+b1,...,an+bn) and for matrix multiplication, diag(a1,...,an) · diag(b1,...,bn) = diag(a1b1,...,anbn). The diagonal matrix diag(a1,...,an) is invertible if and only if the entries a1,...,an are all non-zero. In this case, we have diag(a1,...,an)-1 = diag(a1 -1 ,...,an -1 ). In particular, the diagonal matrices form a subring of the ring of all n-by-n matrices. Multiplying an n-by-n matrix A from the left with diag(a1,...,an) amounts to multiplying the i-th row of A by ai for all i; multiplying the matrix A from the right with diag(a1,...,an) amounts to multiplying the i-thcolumn of A by ai for all i. Background The variance of a random variable or distribution is the expectation, or mean, of the squared deviation of that variable from its expected value or mean. Thus the variance is a measure of the amount of variation within the values of that variable, taking account of all possible values and their probabilities or weightings (not just the extremes which give the range). For example, a perfect die, when thrown, has expected value (1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5, expected absolute deviation 1.5 (the mean of the equally likely absolute deviations (3.5 − 1, 3.5 − 2, 3.5 − 3, 4 − 3.5, 5 − 3.5, 6 − 3.5), giving 2.5, 1.5, 0.5, 0.5, 1.5, 2.5), but expected square deviation or
    variance of 17.5/6≈ 2.9 (the mean of the equally likely squared deviations 2.52 , 1.52 , 0.52 , 0.52 , 1.52 , 2.52 ). As another example, if a coin is tossed twice, the number of heads is: 0 with probability 0.25, 1 with probability 0.5 and 2 with probability 0.25. Thus the variance is 0.25 × (0 − 1)2 + 0.5 × (1 − 1)2 + 0.25 × (2 − 1)2 = 0.25 + 0 + 0.25 = 0.5. (Note that in this case, where tosses of coins are independent, the variance is additive, i.e., if the coin is tossed n times, the variance will be 0.25n.) Unlike expected deviation, the variance of a variable has units that are the square of the units of the variable itself. For example, a variable measured in inches will have a variance measured in square inches. For this reason, describing data sets via their standard deviation or root mean square deviation is often preferred over variance. In the dice example the standard deviation is √(17.5/6) ≈ 1.7, slightly larger than the expected deviation of 1.5. The standard deviation and the expected deviation can both be used as an indicator of the "spread" of a distribution. The standard deviation is more amenable to algebraic manipulation, and, together with variance and its generalization covariance is used frequently in theoretical statistics; however the expected deviation tends to be more robust as it is less sensitive to outliers arising from measurement anomalies or an unduly heavy-tailed distribution. Real-world distributions such as the distribution of yesterday’s rain throughout the day are typically not fully known, unlike the behavior of perfect dice or an ideal distribution such as the normal distribution, because it is impractical to account for every raindrop. Instead one estimates the mean and variance of the whole distribution as the computed mean and variance of n samples drawn suitably randomly from the whole sample space, in this example yesterday’s rainfall. This method of estimation is close to optimal, with the caveat that it underestimates the variance by a factor of (n−1)/n (when n = 1 the variance of a single sample is obviously zero regardless of the true variance), a bias which should be corrected for when n is small. If the mean is determined in some other way than from the same samples used to estimate the variance then this bias does not arise and the variance can safely be estimated as that of the samples. The variance of a real-valued random variable is its second central moment, and it also happens to be its second cumulant. Just as some distributions do not have a mean, some do not have a variance. The mean exists whenever the variance exists, but not vice versa. Definition
If a random variable X has the expected value (mean) μ = E[X], then the variance of X is given by Var(X) = E[(X − μ)²]. This definition encompasses random variables that are discrete, continuous, or neither. It can be expanded as Var(X) = E[X²] − μ². The variance of random variable X is typically designated as Var(X), σX², or simply σ² (pronounced “sigma squared”). If a distribution does not have an expected value, as is the case for the Cauchy distribution, it does not have a variance either. Many other distributions for which the expected value does exist do not have a finite variance because the relevant integral diverges. An example is a Pareto distribution whose index k satisfies 1 < k ≤ 2.

Continuous case
If the random variable X is continuous with probability density function f(x), then Var(X) = ∫ (x − μ)² f(x) dx, where μ = ∫ x f(x) dx, and where the integrals are definite integrals taken for x ranging over the range of X.

Discrete case
If the random variable X is discrete with probability mass function x1 ↦ p1, ..., xn ↦ pn, then
    where . (When such adiscrete weighted variance is specified by weights whose sum is not 1, then one divides by the sum of the weights.) That is, it is the expected value of the square of the deviation of X from its own mean. In plain language, it can be expressed as “The mean of the square of the deviation of each data point from the average”. It is thus the mean squared deviation. Examples Exponential distribution The exponential distribution with parameter λ is a continuous distribution whose support is the semi-infinite interval [0,∞). Its probability density function is given by: and it has expected value μ = λ−1 . Therefore the variance is equal to: So for an exponentially distributed random variable σ2 = μ2 . [edit]Fair dice A six-sided fair die can be modelled with a discrete random variable with outcomes 1 through 6, each with equal probability 1 /6. The expected value is (1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5. Therefore the variance can be computed to be: Properties Variance is non-negative because the squares are positive or zero. The variance of a constant random variable is zero, and the variance of a variable in a data set is 0 if and only if all entries have the same value. Variance is invariant with respect to changes in a location parameter. That is, if a constant is added to all values of the variable, the variance is unchanged. If all values are scaled by a constant, the variance is scaled by the square of that constant. These two properties can be expressed in the following formula:
    The variance ofa finite sum of uncorrelated random variables is equal to the sum of their variances. This stems from the identity: and that for uncorrelated variables covariance is zero. In general, for the sum of N variables: , we have: Suppose that the observations can be partitioned into equal- sized subgroups according to some second variable. Then the variance of the total group is equal to the mean of the variances of the subgroups plus the variance of the means of the subgroups. This property is known as variance decomposition or the law of total variance and plays an important role in the analysis of variance. For example, suppose that a group consists of a subgroup of men and an equally large subgroup of women. Suppose that the men have a mean body length of 180 and that the variance of their lengths is 100. Suppose that the women have a mean length of 160 and that the variance of their lengths is 50. Then the mean of the variances is (100 + 50) / 2 = 75; the variance of the means is the variance of 180, 160 which is 100. Then, for the total group of men and women combined, the variance of the body lengths will be 75 + 100 = 175. Note that this uses N for the denominator instead of N − 1. In a more general case, if the subgroups have unequal sizes, then they must be weighted proportionally to their size in the computations of the means and variances. The formula is also valid with more than two groups, and even if the grouping variable is continuous. This formula implies that the variance of the total group cannot be smaller than the mean of the variances of the subgroups. Note, however, that the total variance is not necessarily larger than the variances of the subgroups. In the above example, when the subgroups are analyzed separately, the variance is influenced only by the man- man differences and the woman-woman differences. If the two groups are combined, however, then the men-women differences enter into the variance also. Many computational formulas for the variance are based on this equality: The variance is equal to the mean of the squares minus the square of the mean. For example, if we consider the numbers 1, 2, 3, 4 then the mean of the squares is (1 ×
    1 + 2× 2 + 3 × 3 + 4 × 4) / 4 = 7.5. The regular mean of all four numbers is 2.5, so the square of the mean is 6.25. Therefore the variance is 7.5 − 6.25 = 1.25, which is indeed the same result obtained earlier with the definition formulas. Many pocket calculators use an algorithm that is based on this formula and that allows them to compute the variance while the data are entered, without storing all values in memory. The algorithm is to adjust only three variables when a new data value is entered: The number of data entered so far (n), the sum of the values so far (S), and the sum of the squared values so far (SS). For example, if the data are 1, 2, 3, 4, then after entering the first value, the algorithm would have n = 1, S = 1 and SS = 1. After entering the second value (2), it would have n = 2, S = 3 and SS = 5. When all data are entered, it would have n = 4, S = 10 and SS = 30. Next, the mean is computed as M = S / n, and finally the variance is computed as SS / n − M × M. In this example the outcome would be 30 / 4 − 2.5 × 2.5 = 7.5 − 6.25 = 1.25. If the unbiased sample estimate is to be computed, the outcome will be multiplied by n / (n − 1), which yields 1.667 in this example. Sum of uncorrelated variables (Bienaymé formula) One reason for the use of the variance in preference to other measures of dispersion is that the variance of the sum (or the difference) of uncorrelated random variables is the sum of their variances: This statement is called the Bienaymé formula.[1] and was discovered in 1853. It is often made with the stronger condition that the variables are independent, but uncorrelatedness suffices. So if all the variables have the same variance σ2 , then, since division by n is a linear transformation, this formula immediately implies that the variance of their mean is That is, the variance of the mean decreases when n increases. This formula for the variance of the mean is used in the definition of the standard error of the sample mean, which is used in the central limit theorem. [edit]Sum of correlated variables In general, if the variables are correlated, then the variance of their sum is the sum of their covariances:
    (Note: This bydefinition includes the variance of each variable, since Cov(X,X) = Var(X).) Here Cov is the covariance, which is zero for independent random variables (if it exists). The formula states that the variance of a sum is equal to the sum of all elements in the covariance matrix of the components. This formula is used in the theory of Cronbach's alpha in classical test theory. So if the variables have equal variance σ2 and the average correlation of distinct variables is ρ, then the variance of their mean is This implies that the variance of the mean increases with the average of the correlations. Moreover, if the variables have unit variance, for example if they are standardized, then this simplifies to This formula is used in the Spearman-Brown prediction formula of classical test theory. This converges to ρ if n goes to infinity, provided that the average correlation remains constant or converges too. So for the variance of the mean of standardized variables with equal correlations or converging average correlation we have Therefore, the variance of the mean of a large number of standardized variables is approximately equal to their average correlation. This makes clear that the sample mean of correlated variables does generally not converge to the population mean, even though the Law of large numbers states that the sample mean will converge for independent variables. [edit]Weighted sum of variables The scaling property and the Bienaymé formula, along with this property from the covariance page: Cov(aX, bY) = ab Cov(X, Y) jointly imply that This implies that in a weighted sum of variables, the variable with the largest weight will have a disproportionally large weight in the variance of the total. For example,
    if X andY are uncorrelated and the weight of X is two times the weight of Y, then the weight of the variance of X will be four times the weight of the variance of Y. Decomposition The general formula for variance decomposition or the law of total variance is: If X and Y are two random variables and the variance of X exists, then Here, E(X|Y) is the conditional expectation of X given Y, and Var(X|Y) is the conditional variance of X given Y. (A more intuitive explanation is that given a particular value of Y, then X follows a distribution with mean E(X|Y) and variance Var(X|Y). The above formula tells how to find Var(X) based on the distributions of these two quantities when Y is allowed to vary.) This formula is often applied in analysis of variance, where the corresponding formula is SSTotal = SSBetween + SSWithin. It is also used in linear regression analysis, where the corresponding formula is SSTotal = SSRegression + SSResidual. This can also be derived from the additivity of variances, since the total (observed) score is the sum of the predicted score and the error score, where the latter two are uncorrelated. Computational formula Main article: computational formula for the variance See also: algorithms for calculating variance The computational formula for the variance follows in a straightforward manner from the linearity of expected values and the above definition: This is often used to calculate the variance in practice, although it suffers from catastrophic cancellation if the two components of the equation are similar in magnitude.
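The three-variable running procedure described earlier (tracking n, S and SS) is a direct implementation of this computational formula. A small Python sketch, reusing the 1, 2, 3, 4 example:

def running_variance(data):
    # Single-pass variance: keep the count n, the running sum S and the running
    # sum of squares SS, then return SS/n - (S/n)^2 (the population variance).
    # As noted above, this form can suffer catastrophic cancellation when the
    # mean is large compared with the spread of the data.
    n, S, SS = 0, 0.0, 0.0
    for x in data:
        n += 1
        S += x
        SS += x * x
    mean = S / n
    return SS / n - mean * mean

data = [1, 2, 3, 4]
var_pop = running_variance(data)
print(var_pop)                                  # 1.25
print(var_pop * len(data) / (len(data) - 1))    # 1.666..., the unbiased sample estimate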
    Characteristic property The secondmoment of a random variable attains the minimum value when taken around the first moment (i.e., mean) of the random variable, i.e. . Conversely, if a continuous function satisfies for all random variables X, then it is necessarily of the form , where a > 0. This also holds in the multidimensional case.[2] [edit]Calculation from the CDF The population variance for a non-negative random variable can be expressed in terms of the cumulative distribution function F using where H(u) = 1 − F(u) is the right tail function. This expression can be used to calculate the variance in situations where the CDF, but not the density, can be conveniently expressed. [edit]Approximating the variance of a function The delta method uses second-order Taylor expansions to approximate the variance of a function of one or more random variables: see Taylor expansions for the moments of functions of random variables. For example, the approximate variance of a function of one variable is given by provided that f is twice differentiable and that the mean and variance of X are finite. [citation needed] [edit]Population variance and sample variance In general, the population variance of a finite population of size N is given by where is the population mean. In many practical situations, the true variance of a population is not known a priori and must be computed somehow. When dealing with extremely large populations, it is not possible to count every object in the population.
A common task is to estimate the variance of a population from a sample.[3] We take a sample with replacement of n values y1, ..., yn from the population, where n < N, and estimate the variance on the basis of this sample. There are several good estimators. Two of them are well known:

σ̂² = (1/n) Σ (yi − ȳ)²  and  s² = (1/(n − 1)) Σ (yi − ȳ)².[4]

Both are referred to as sample variance. Here, ȳ denotes the sample mean: ȳ = (1/n) Σ yi. The two estimators only differ slightly, as can be seen, and for larger values of the sample size n the difference is negligible. While the first one may be seen as the variance of the sample considered as a population, the second one is the unbiased estimator of the population variance, meaning that its expected value E[s²] is equal to the true variance of the sampled random variable; the use of the term n − 1 is called Bessel's correction. The sample variance with n − 1 is a U-statistic for the function ƒ(x1, x2) = (x1 − x2)²/2, meaning that it is obtained by averaging a 2-sample statistic over 2-element subsets of the population.
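Both estimators are available in NumPy through the ddof argument (a minimal sketch; the sample values are arbitrary illustration numbers):

import numpy as np

sample = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

biased   = np.var(sample)           # divides by n: the sample treated as a population
unbiased = np.var(sample, ddof=1)   # divides by n - 1: Bessel's correction

print(biased, unbiased)             # 4.0 and about 4.571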
Distribution of the sample variance

Being a function of random variables, the sample variance is itself a random variable, and it is natural to study its distribution. In the case that the yi are independent observations from a normal distribution, Cochran's theorem shows that s² follows a scaled chi-square distribution: (n − 1)s²/σ² has a chi-square distribution with n − 1 degrees of freedom. As a direct consequence, it follows that E(s²) = σ². If the yi are independent and identically distributed, but not necessarily normally distributed, then the distribution of s² also depends on κ, the kurtosis of the distribution. If the conditions of the law of large numbers hold, s² is a consistent estimator of σ².
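The unbiasedness E(s²) = σ² can be illustrated by simulation (a sketch assuming NumPy; the true variance of 9 and the sample size of 5 are made-up values):

import numpy as np

rng = np.random.default_rng(1)
sigma2 = 9.0
samples = rng.normal(0.0, np.sqrt(sigma2), size=(100_000, 5))   # many samples of size n = 5
s2 = samples.var(axis=1, ddof=1)    # unbiased sample variance of each row

print(s2.mean())                    # close to 9.0, consistent with E(s^2) = sigma^2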
    [edit]Generalizations If X isa vector-valued random variable, with values in , and thought of as a column vector, then the natural generalization of variance is , where and is the transpose of X, and so is a row vector. This variance is a positive semi-definite square matrix, commonly referred to as the covariance matrix. If X is a complex-valued random variable, with values in , then its variance is , where is the conjugate transpose of X. This variance is also a positive semi-definite square matrix. [edit]History The term variance was first introduced by Ronald Fisher in his 1918 paper The Correlation Between Relatives on the Supposition of Mendelian Inheritance:[5] The great body of available statistics show us that the deviations of a human measurement from its mean follow very closely the Normal Law of Errors, and, therefore, that the variability may be uniformly measured by the standard deviation corresponding to the square root of the mean square error. When there are two independent causes of variability capable of producing in an otherwise uniform population distributions with standard deviations θ1 and θ2, it is found that the distribution, when both causes act together, has a standard deviation . It is therefore desirable in analysing the causes of variability to deal with the square of the standard deviation as the measure of variability. We shall term this quantity the Variance... [edit]Moment of inertia The variance of a probability distribution is analogous to the moment of inertia in classical mechanics of a corresponding mass distribution along a line, with respect to rotation about its center of mass. It is because of this analogy that such things as the variance are called moments of probability distributions. The covariance matrix is related to the moment of inertia tensor for multivariate distributions. The moment of inertia of a cloud of n points with a covariance matrix of Σ is given by This difference between moment of inertia in physics and in statistics is clear for points that are gathered along a line. Suppose many points are close to the x and distributed along it. The covariance matrix might look like
    That is, thereis the most variance in the x direction. However, physicists would consider this to have a low moment about the x axis so the moment-of-inertia tensor is Overview The moment of inertia of an object about a given axis describes how difficult it is to change its angular motion about that axis. Therefore, it encompasses not just how much mass the object has overall, but how far each bit of mass is from the axis. The farther out the object's mass is, the more rotational inertia the object has, and the more force is required to change its rotation rate. For example, consider two hoops, A and B, made of the same material and of equal mass. Hoop A is larger in diameter but thinner than B. It requires more effort to accelerate hoop A (change its angular velocity) because its mass is distributed farther from its axis of rotation: mass that is farther out from that axis must, for a given angular velocity, move more quickly than mass closer in. So in this case, hoop A has a larger moment of inertia than hoop B. Divers reducing their moments of inertia to increase their rates of rotation The moment of inertia of an object can change if its shape changes. A figure skater who begins a spin with arms outstretched provides a striking example. By pulling in her arms, she reduces her moment of inertia, causing her to spin faster (by the conservation of angular momentum).
    The moment ofinertia has two forms, a scalar form, I, (used when the axis of rotation is specified) and a more general tensor form that does not require the axis of rotation to be specified. The scalar moment of inertia, I, (often called simply the "moment of inertia") allows a succinct analysis of many simple problems inrotational dynamics, such as objects rolling down inclines and the behavior of pulleys. For instance, while a block of any shape will slide down a frictionless decline at the same rate, rolling objects may descend at different rates, depending on their moments of inertia. A hoop will descend more slowly than a solid disk of equal mass and radius because more of its mass is located far from the axis of rotation, and thus needs to move faster if the hoop rolls at the same angular velocity. However, for (more complicated) problems in which the axis of rotation can change, the scalar treatment is inadequate, and the tensor treatment must be used (although shortcuts are possible in special situations). Examples requiring such a treatment include gyroscopes, tops, and even satellites, all objects whose alignment can change. The moment of inertia is also called the mass moment of inertia (especially by mechanical engineers) to avoid confusion with the second moment of area, which is sometimes called the moment of inertia (especially by structural engineers). The easiest way to differentiate these quantities is through their units (kg·m2 as opposed to m4 ). In addition, moment of inertia should not be confused with polar moment of inertia, which is a measure of an object's ability to resist torsion (twisting) only. [edit]Scalar moment of inertia [edit]Definition A simple definition of the moment of inertia (with respect to a given axis of rotation) of any object, be it a point mass or a 3D-structure, is given by: where m is mass and r is the perpendicular distance to the axis of rotation. [edit]Detailed analysis The (scalar) moment of inertia of a point mass rotating about a known axis is defined by The moment of inertia is additive. Thus, for a rigid body consisting of N point masses mi with distances ri to the rotation axis, the total moment of inertia equals the sum of the point-mass moments of inertia:
    The mass distributionalong the axis of rotation has no effect on the moment of inertia. For a solid body described by a mass density function, ρ(r), the moment of inertia about a known axis can be calculated by integrating the square of the distance (weighted by the mass density) from a point in the body to the rotation axis: where V is the volume occupied by the object. ρ is the spatial density function of the object, and r = (r,θ,φ), (x,y,z), or (r,θ,z) is the vector (orthogonal to the axis of rotation) between the axis of rotation and the point in the body. Diagram for the calculation of a disk's moment of inertia. Here c is 1/2 and is the radius used in determining the moment. Based on dimensional analysis alone, the moment of inertia of a non-point object must take the form: where M is the mass L is a length dimension taken from the centre of mass (in some cases, the length of the object is used instead.) c is a dimensionless constant called the inertial constant that varies with the object in consideration. Inertial constants are used to account for the differences in the placement of the mass from the center of rotation. Examples include: c = 1, thin ring or thin-walled cylinder around its center, c = 2/5, solid sphere around its center
    c = 1/2,solid cylinder or disk around its center. When c is 1, the length (L) is called the radius of gyration. For more examples, see the List of moments of inertia. [Parallel axis theorem Main article: Parallel axis theorem Once the moment of inertia has been calculated for rotations about the center of mass of a rigid body, one can conveniently recalculate the moment of inertia for all parallel rotation axes as well, without having to resort to the formal definition. If the axis of rotation is displaced by a distance r from the center of mass axis of rotation (e.g., spinning a disc about a point on its periphery, rather than through its center,) the displaced and center-moment of inertia are related as follows: This theorem is also known as the parallel axes rule and is a special case of Steiner's parallel-axis theorem. Composite bodies If a body can be decomposed (either physically or conceptually) into several constituent parts, then the moment of inertia of the body about a given axis is obtained by summing the moments of inertia of each constituent part around the same given axis.[2] Equations involving the moment of inertia The rotational kinetic energy of a rigid body can be expressed in terms of its moment of inertia. For a system with N point masses mi moving with speeds vi, the rotational kinetic energy T equals where ω is the common angular velocity (in radians per second). The final expression I ω2 / 2 also holds for a mass density function with a generalization of the above derivation from a discrete summation to an integration. In the special case where the angular momentum vector is parallel to the angular velocity vector, one can relate them by the equation where L is the angular momentum and ω is the angular velocity. However, this equation does not hold in many cases of interest, such as the torque-
    free precession ofa rotating object, although its more general tensor form is always correct. When the moment of inertia is constant, one can also relate the torque on an object and its angular acceleration in a similar equation: where τ is the torque and α is the angular acceleration. Moment of inertia tensor In three dimensions, if the axis of rotation is not given, we need to be able to generalize the scalar moment of inertia to a quantity that allows us to compute a moment of inertia about arbitrary axes. This quantity is known as the moment of inertia tensor and can be represented as a symmetric positive semi-definite matrix, I. This representation elegantly generalizes the scalar case: The angular momentum vector, is related to the rotation velocity vector, ω by and the kinetic energy is given by as compared with in the scalar case. Like the scalar moment of inertia, the moment of inertia tensor may be calculated with respect to any point in space, but for practical purposes, the center of mass is almost always used. [edit]Definition For a rigid object of N point masses mk, the moment of inertia tensor is given by , where
    and I12 =I21, I13 = I31, and I23 = I32. (Thus I is a symmetric tensor.) Here Ixx denotes the moment of inertia around the x-axis when the objects are rotated around the x-axis, Ixy denotes the moment of inertia around the y-axis when the objects are rotated around the x-axis, and so on. These quantities can be generalized to an object with distributed mass, described by a mass density function, in a similar fashion to the scalar moment of inertia. One then has where is their outer product, E3 is the 3 × 3 identity matrix, and V is a region of space completely containing the object. Alternatively, the equation above can be represented in a component-based method. Recognizing that, in the above expression, the scalars Iij with are called the products of inertia, a generalized form of the products of inertia can be given as The diagonal elements of I are called the principal moments of inertia. Derivation of the tensor components The distance r of a particle at from the axis of rotation passing through the origin in the direction is . By using the formula I = mr2 (and some simple vector algebra) it can be seen that the moment of inertia of this particle (about the axis of rotation passing through the origin in the direction)
    is This isa quadratic form in and, after a bit more algebra, this leads to a tensor formula for the moment of inertia . This is exactly the formula given below for the moment of inertia in the case of a single particle. For multiple particles we need only recall that the moment of inertia is additive in order to see that this formula is correct. Reduction to scalar For any axis , represented as a column vector with elements ni, the scalar form I can be calculated from the tensor form I as The range of both summations correspond to the three Cartesian coordinates. The following equivalent expression avoids the use of transposed vectors which are not supported in maths libraries because internally vectors and their transpose are stored as the same linear array, However it should be noted that although this equation is mathematically equivalent to the equation above for any matrix, inertia tensors are symmetrical. This means that it can be further simplified to: [edit]Principal axes of inertia By the spectral theorem, since the moment of inertia tensor is real and symmetric, it is possible to find a Cartesian coordinate system in which it is diagonal, having the form where the coordinate axes are called the principal axes and the constants I1, I2 and I3 are called the principal moments of inertia. The principal axes of a body, therefore, are a cartesian coordinate system whose origin is located at the center of mass. [3] The unit vectors along the principal axes are usually denoted as (e1, e2, e3). This result was first shown by J. J. Sylvester (1852), and is a
    form ofSylvester's lawof inertia. The principal axis with the highest moment of inertia is sometimes called the figure axis or axis of figure. When all principal moments of inertia are distinct, the principal axes are uniquely specified. If two principal moments are the same, the rigid body is called a symmetrical top and there is no unique choice for the two corresponding principal axes. If all three principal moments are the same, the rigid body is called a spherical top (although it need not be spherical) and any axis can be considered a principal axis, meaning that the moment of inertia is the same about any axis. The principal axes are often aligned with the object's symmetry axes. If a rigid body has an axis of symmetry of order m, i.e., is symmetrical under rotations of 360°/m about a given axis, the symmetry axis is a principal axis. When m > 2, the rigid body is a symmetrical top. If a rigid body has at least two symmetry axes that are not parallel or perpendicular to each other, it is a spherical top, e.g., a cube or any other Platonic solid. The motion of vehicles is often described about these axes with the rotations called yaw, pitch, and roll. A practical example of this mathematical phenomenon is the routine automotive task of balancing a tire, which basically means adjusting the distribution of mass of a car wheel such that its principal axis of inertia is aligned with the axle so the wheel does not wobble. [edit]Parallel axis theorem Once the moment of inertia tensor has been calculated for rotations about the center of mass of the rigid body, there is a useful labor-saving method to compute the tensor for rotations offset from the center of mass. If the axis of rotation is displaced by a vector R from the center of mass, the new moment of inertia tensor equals where m is the total mass of the rigid body, E3 is the 3 × 3 identity matrix, and is the outer product. [edit]Rotational symmetry Using the above equation to express all moments of inertia in terms of integrals of variables either along or perpendicular to the axis of symmetry usually simplifies the calculation of these moments considerably. Comparison with covariance matrix Main article: Moment (mathematics)
The moment of inertia tensor about the center of mass of a three-dimensional rigid body is related to the covariance matrix of a trivariate random vector whose probability density function is proportional to the pointwise density of the rigid body; in this relation, n is the number of points.

The structure of the moment-of-inertia tensor comes from the fact that it is to be used as a bilinear form on rotation vectors, in the form (1/2) ωᵀ I ω. Each element of mass has a kinetic energy of (1/2) m v². The velocity of each element of mass is v = ω × r, where r is a vector from the center of rotation to that element of mass. The cross product can be converted to matrix multiplication, so that ω × r becomes the product of a skew-symmetric matrix with r, and similarly for the transposed factor. Thus, plugging in the definition of this term leads directly to the structure of the moment tensor.
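The point-mass formula for the tensor given earlier, I = sum over k of m_k (|r_k|^2 E3 - r_k r_k^T), can be coded directly (a sketch assuming NumPy; the masses and positions are made-up illustration values):

import numpy as np

def inertia_tensor(masses, positions):
    # Moment-of-inertia tensor of point masses about the origin.
    I = np.zeros((3, 3))
    for m, r in zip(masses, np.asarray(positions, dtype=float)):
        I += m * (np.dot(r, r) * np.eye(3) - np.outer(r, r))
    return I

masses = [1.0, 1.0, 1.0, 1.0]
positions = [[1, 0, 0], [-1, 0, 0], [0, 2, 0], [0, -2, 0]]

I = inertia_tensor(masses, positions)
print(I)                      # diagonal here, since the masses lie on the coordinate axes
print(np.allclose(I, I.T))    # True: the tensor is symmetric, as stated above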