Brief Introduction to Error Correction Coding
2. I. Overview of some basic image processing
Here is a quick story about an image on the internet to set the stage for our discussion.
A distant galaxy sits far beyond the edge of our own. It emits a wide spectrum of electromagnetic
waves. A satellite is designed and sent into space to image the wide range of
electromagnetic waves emitted from things like this galaxy. On board it has different
equipment including a camera that is capable of capturing visible light. To simplify
things let’s pretend this sensor is a lot like a modern digital camera’s and consists of a
matrix of photocells that are sensitive to red, green, and blue light. These build an array
of data based on the intensity of a particular band of light received. After capturing an
image our satellite encodes the data and transmits it using radio frequencies. These are
received on earth by satellite receivers, transmitted to computers, decoded, and
analyzed by scientists.
These scientists are pretty clever and create a color image by combining the data
received; they decide to share it as a .jpg on the Internet.
In this way the visible information emitted from the galaxy is turned into electronic
information that is then encoded and retransmitted as yet another type of
electromagnetic information, received, decoded and retranslated yet again into
different electronic signals and recombined only to be sent again through a series of
routers and servers to re-emerge through your monitor as visible light.
Let’s take a minute to look at just what a .jpg image is in terms of linear algebra. You can
think of any image as an n×m matrix where each entry is a pixel value.
In an 8-bit .jpg these values range from 0 to 255. A typical color .jpg consists of 3 such
matrices, each one describing a discrete color channel. When these channels combine they
form what we typically see as a color image in the visible spectrum of light.
Each pixel can in turn be thought of as a “binary” vector of the form
[b1 b2 b3 b4 b5 b6 b7 b8], where each bit is a member of the finite field F2 = {0,1}.
Basically that means they can only be 1’s or 0’s and follow special “mod 2” addition rules.
They can also be easily converted back into their integer values.
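As a minimal sketch (in Python, with illustrative function names), here is how a single 8-bit pixel value becomes one of these binary vectors and back, along with the mod-2 addition rule:

```python
# Sketch: a pixel value as a "binary" vector over F2, with mod-2 addition.
# Function names here are illustrative, not from any particular library.

def pixel_to_bits(value):
    """Turn an 8-bit pixel value (0-255) into a list [b1..b8] of bits."""
    return [(value >> (7 - i)) & 1 for i in range(8)]

def bits_to_pixel(bits):
    """Turn the bit vector back into its integer pixel value."""
    result = 0
    for b in bits:
        result = (result << 1) | b
    return result

def add_mod2(u, v):
    """Vector addition over F2: each component is added mod 2 (XOR)."""
    return [(a + b) % 2 for a, b in zip(u, v)]

print(pixel_to_bits(173))                        # [1, 0, 1, 0, 1, 1, 0, 1]
print(bits_to_pixel([1, 0, 1, 0, 1, 1, 0, 1]))   # 173
```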
It’s to the delivery of this single pixel vector our discussion now turns.
3. II. Overview Of Error Correction
One of the many technical challenges involved with our little satellite taking its
picture that will eventually end up on Wikipedia is getting that information, and our
little pixel, all the way back to earth in one piece.
Space is pretty big and there are lots of things that could corrupt that signal. Add in
the probability of the on-board electronic hardware introducing errors, along with the
whole information chain once it gets TO earth, and it’s pretty amazing we can send any
digitized information anywhere.
To top off our challenge, our satellite only has so much energy with which to
communicate, so each and every image, and each and every pixel, should be accounted
for! So how do we make sure every bit transmitted is received exactly as it was sent?
Sending duplicate copies would work, but it wastes energy, because some portion of
your signal was going to be received just fine anyway. Imagine having power for 100
images: duplicating the data 4 times means you can only send 25 images’ worth of
information.
We could just wait for a transmission, check it, and if it’s corrupted ask the satellite to
resend it. Again, that uses more energy, and there isn’t any guarantee you would get a
better message the second, third, fourth, nth time you asked. This could use up all the
energy for just a single image.
There has to be a better way that we can have our satellite send us data...
There is!
And the foundation for the techniques employed is called coding theory. It heavily
leverages linear algebra, and provides methods by which data is encoded in such a way
that even if it gets corrupted during transmission or reception it can be “fixed up”
once someone gets their hands on it.
This means that every message our little satellite sends consumes less power than if
we relied on redundancy or resending. Beyond its long, storied history, coding theory
has the added benefit of allowing us to automatically correct the errors that get
introduced.
4. III. History Of Error Correcting Codes And Introduction To Coding Theory
Error correction codes, and coding theory in general, got their start with our need to
send information through channels that mangle it, to machines that cannot tell
corrupted data from good data.
The individual credited with the start of modern error correction is Richard Hamming,
a mathematician who, among other things, ran the machine-level calculations to
determine whether the atomic bomb would ignite the atmosphere. During this early
era of computing he had a lot of automated jobs to run over weekends. If everything
went well he would have a pile of answers; if it didn’t, he would have a pile of garbage.
After multiple weekends of things not going so well, he began thinking about ways the
machine itself could fix the errors as they occurred, saving everyone, including himself,
a lot of time and headache.
The basic premise of coding theory is as follows: a message is encoded by a generator
matrix. Once encoded, the message becomes a code word consisting of two parts:
the message bits and a set of parity bits. Once received, the code word is run through
a parity-check matrix that produces a result called a syndrome. If the syndrome is the
zero vector, the message was received without errors and the original message can be
processed. If the syndrome is not the zero vector, it identifies the location of the error
introduced by the communication channel, making the error automatically fixable.
This is because the parity bits are constructed from the message bits themselves, and
are therefore linearly dependent on them. Each message bit is tied to a unique
combination of parity bits, so if a message bit changes you can simply check whether
the parity bits still make sense; if they don’t, you know where the error is coming
from. And because everything is 1’s and 0’s, fixing the bit just means flipping it.
For a traditional “Hamming code,” the number of errors you can detect and correct is
based on the relationship between how many data bits you have and how many parity
bits you use. A message contains k bits of data out of n total bits, so the number of
parity bits is n−k. A separate parameter d, the “distance,” is the minimum number of
positions in which any two code words differ; a code with distance d can detect up to
d−1 errors and correct up to (d−1)/2 of them (rounded down). Together these are
written as an [n,k,d] code. The ratio k/n is called the rate, and it tells you how
efficient an [n,k,d] code is versus redundantly sending a message, so you know how
much power per bit you are saving.
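These relationships can be sketched in a few lines of Python (the function name is illustrative):

```python
# Sketch: the basic parameters of an [n, k, d] block code.
def code_parameters(n, k, d):
    return {
        "parity_bits": n - k,                # bits added for protection
        "rate": k / n,                       # fraction of each transmission that is data
        "detectable_errors": d - 1,          # a distance-d code detects up to d-1 errors
        "correctable_errors": (d - 1) // 2,  # ...and corrects up to floor((d-1)/2)
    }

print(code_parameters(7, 4, 3))
# A [7,4,3] Hamming code: 3 parity bits, rate ~0.571, detects 2 errors, corrects 1.
```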
5. IV. Mechanical Explanation Of Error Correcting Codes
Below is the mechanical “bit by bit” process of encoding and decoding a signal using an
error correction code.
This example uses the standard setup of a [7,4,3] Hamming code with an extra 8th
parity bit.
Start with an empty array of 8 blank bits: {[ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ]}
Assign the 8 bits to be either parity bits or message bits. The parity positions follow a
pattern: they are the powers of two (1, 2, and 4). Bit 1 checks one position then skips
one (check 1, skip 1); bit 2 checks two then skips two; bit 4 checks four then skips four.
The remaining positions, 3, 5, 6, and 7, are already covered by those checks, so they
become our message bits, with the eighth bit reserved for later.
Labeling parity positions (p) and message positions (m), our empty bit array looks like this: {[p] [p] [m] [p] [m] [m] [m] [p]}
Now assign a binary message to be sent: {[ ] [ ] [1] [ ] [1] [0] [1] [ ]}
Now generate the parity bits using the message bits.
Bit 1 add (mod 2) bits 3, 5, and 7. 1+1+1 = 1. {[1] [ ] [1] [ ] [1] [0] [1] [ ]}
Bit 2 add (mod 2) bits 3, 6, and 7. 1+0+1 = 0. {[1] [0] [1] [ ] [1] [0] [1] [ ]}
Bit 4 add (mod 2) bits 5, 6, and 7. 1+0+1 = 0. {[1] [0] [1] [0] [1] [0] [1] [ ]}
Bit 8 add (mod 2) the first seven bits. 1+0+1+0+1+0+1 = 0. {[1] [0] [1] [0] [1] [0] [1] [0]}
So our final coded message is: {[1] [0] [1] [0] [1] [0] [1] [0]}
Say we send this message but it gets received as: {[1] [0] [1] [0] [1] [1] [1] [0]}
How do we detect the error? We just recalculate all the parity bits to see which ones
don’t match the received message.
In this case bit 1 recomputes to 1, bit 2 to 1, and bit 4 to 1. Comparing to our received
message, we have discrepancies at parity bits 2 and 4. To find the message bit with the
error, just add the positions of the discrepant parity bits: 2 + 4 = 6. Knowing this, we
simply flip bit 6 from 1 to 0 and recover our originally intended message!
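The whole walkthrough above can be sketched as a short Python program, with positions and parity rules exactly as in the steps above (function names are illustrative):

```python
# Sketch of the bit-by-bit procedure: a [7,4] Hamming code with an overall
# (8th) parity bit. Positions are 1-based; parity bits sit at 1, 2, 4, and 8.

def encode(m3, m5, m6, m7):
    """Place the four message bits at positions 3, 5, 6, 7 and fill in parity."""
    b = [0] * 9                      # index 0 unused; positions 1..8
    b[3], b[5], b[6], b[7] = m3, m5, m6, m7
    b[1] = (b[3] + b[5] + b[7]) % 2  # checks positions with the 1's bit set
    b[2] = (b[3] + b[6] + b[7]) % 2  # checks positions with the 2's bit set
    b[4] = (b[5] + b[6] + b[7]) % 2  # checks positions with the 4's bit set
    b[8] = sum(b[1:8]) % 2           # overall parity of the first seven bits
    return b[1:]

def correct(word):
    """Recompute each check; the sum of failing positions locates the error."""
    b = [0] + list(word)
    syndrome = 0
    if (b[3] + b[5] + b[7]) % 2 != b[1]: syndrome += 1
    if (b[3] + b[6] + b[7]) % 2 != b[2]: syndrome += 2
    if (b[5] + b[6] + b[7]) % 2 != b[4]: syndrome += 4
    if syndrome:
        b[syndrome] ^= 1             # flip the bad bit
    return b[1:]

sent = encode(1, 1, 0, 1)
print(sent)                          # [1, 0, 1, 0, 1, 0, 1, 0], as in the walkthrough
received = sent.copy()
received[5] ^= 1                     # corrupt bit 6 (list index 5)
print(correct(received))             # recovers [1, 0, 1, 0, 1, 0, 1, 0]
```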
6. V. Linear Algebra and Error Correction Codes
Having manually seen how to encode and decode a message we can use linear algebra
to give us a more general framework to understand them at a deeper mathematical
level.
Let’s look at the encoder, or Generator, matrix:
You start with a message vector m and a generator matrix G, where G has dimensions k (number of
message bits) by n (number of total bits).
When you send m through G via matrix multiplication you get c, a “code word”; we can write this
as mG = c. c is a linear combination of the rows of G with coefficients from m, and the rows of G
form a basis for all the code words of length n.
In notation this looks like this:
m = [m1 m2 … mk]
G = [ I | P ]
The first part of G is the k×k identity matrix, and the second part, P, records which parity bits
check which message bits (it is the transpose of the non-identity part of the parity-check matrix).
For our [7,4,3] Hamming code, G looks like this:
1 0 0 0 1 1 0
0 1 0 0 1 0 1
0 0 1 0 0 1 1
0 0 0 1 1 1 1
When you multiply mG you get the linear combination
c = [m1, m2, m3, m4, (m1+m2+m4), (m1+m3+m4), (m2+m3+m4)].
These columns correspond to bit positions {3, 5, 6, 7, 1, 2, 4} from the mechanical example.
Now for the decoder, Parity Check, matrix:
The key thing here is that the code words G creates form the null space of H. This means that Hc,
for a valid code word c, will equal the zero vector. If it doesn’t, the syndrome Hc equals the
column of H corresponding to the bit with the error in it. It’s the linear dependence that exists
between parity and message bits that allows this to happen.
Geometrically, you could think of all code words as being orthogonal to the rows of H. If a code
word has an error, then part of it is “tilted” into the row space of H.
In notation it looks like:
c = an n×1 column vector: [m1, m2, m3, m4, (m1+m2+m4), (m1+m3+m4), (m2+m3+m4)]ᵀ
H = [ Pᵀ | I ]
For our [7,4,3] Hamming code, H looks like this:
1 1 0 1 1 0 0
1 0 1 1 0 1 0
0 1 1 1 0 0 1
For a Hamming code, the columns of H turn out to be the bit positions written in binary,
column-shifted so the identity matrix sits at the back; here the columns are ordered by bit
positions {3, 5, 6, 7, 1, 2, 4}.
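Using the [7,4,3] G and H above, the encode-and-check cycle can be sketched in Python with plain lists and mod-2 arithmetic (function names are illustrative):

```python
# Sketch: encoding with G and syndrome checking with H for the [7,4,3] code.
# All arithmetic is mod 2.

G = [[1,0,0,0,1,1,0],
     [0,1,0,0,1,0,1],
     [0,0,1,0,0,1,1],
     [0,0,0,1,1,1,1]]

H = [[1,1,0,1,1,0,0],
     [1,0,1,1,0,1,0],
     [0,1,1,1,0,0,1]]

def encode(m):
    """c = mG (mod 2): each codeword bit is a parity of message bits."""
    return [sum(m[i] * G[i][j] for i in range(4)) % 2 for j in range(7)]

def syndrome(c):
    """Hc (mod 2): the zero vector iff c is a valid code word."""
    return [sum(H[i][j] * c[j] for j in range(7)) % 2 for i in range(3)]

c = encode([1, 0, 1, 1])
print(syndrome(c))        # [0, 0, 0] -- a valid code word
c[1] ^= 1                 # flip one bit (position 2)
print(syndrome(c))        # [1, 0, 1] -- equal to column 2 of H, locating the error
```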
7. VI. Vector Spaces and Sets
Let’s take a look at what is going on with the different spaces to get a better feel for what is happening.
K is the finite vector space consisting of all code words {c1, c2, c3, …} generated by the basis G and the
coefficients m. N is the finite vector space of all possible length-n combinations of {0,1}. There are 2^k
code words, but 2^n possible ways to receive a message.
How the generator and parity-check matrices are related is what allows error detection and correction to
take place. Any valid code word c is in the row space of G, and the row space of G is the null space of H.
The converse also holds: every row of H is orthogonal to every row of G, i.e. GHᵀ = 0. This means the
code words are in the null space of H, so to construct H you just need to build a system of equations
such that Hc = 0 for every code word.
You can also look at which sets of parity bits depend on which message bits. These relationships form
smaller sets of vectors within the code word itself, and it’s this interdependence that leads to the
interaction between the generator and parity-check matrices.
[Diagram: the code-word space K sits inside the larger space N; G generates K, and H sends everything in K to 0.]
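A brute-force Python sketch can confirm both claims for the [7,4,3] code above: every generated code word lands in the null space of H, and distinct code words differ in at least 3 positions:

```python
# Sketch: enumerate all 2^k code words of the [7,4] code and check that each
# lies in the null space of H, and that the minimum distance is 3.
from itertools import product

G = [[1,0,0,0,1,1,0],
     [0,1,0,0,1,0,1],
     [0,0,1,0,0,1,1],
     [0,0,0,1,1,1,1]]
H = [[1,1,0,1,1,0,0],
     [1,0,1,1,0,1,0],
     [0,1,1,1,0,0,1]]

codewords = []
for m in product([0, 1], repeat=4):          # all 2^4 = 16 messages
    c = [sum(m[i] * G[i][j] for i in range(4)) % 2 for j in range(7)]
    # Hc must be the zero vector for every valid code word.
    assert all(sum(H[i][j] * c[j] for j in range(7)) % 2 == 0 for i in range(3))
    codewords.append(c)

dist = min(sum(a != b for a, b in zip(u, v))
           for u in codewords for v in codewords if u != v)
print(len(codewords), dist)                  # 16 code words, minimum distance 3
```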
8. VII. Pseudo C++ Vs. Pseudo Matlab Encoding Comparison
The main thing to note here is the utility of the Hamming matrix: it has all of the functionality of
the C++ code implicitly integrated into its very structure. It’s a good lesson that clever use of
matrices can simplify programming.
C++
This pseudo C++ implementation of a generator matrix uses the bit-shift operator to work through
the data one bit at a time, mod-2 adding (XOR-ing) the covered bits, with the parity bit being
calculated controlling the number of bits shifted.
EncodeImage (inputData, parityBit) {
    result = 0
    dataLength = inputData.length
    while (0 < dataLength) {
        result ^= inputData & 1              // mod-2 add the current low bit
        inputData = inputData >> parityBit   // skip ahead by this parity bit's stride
        dataLength -= parityBit
    }
    return result
}
EncodeEveryThing (inputData) {
    outputData = 0
    // Encode all our parity bits
    for (parityBit in parityBits)
        outputData |= EncodeImage(inputData, parityBit) << parityBit
    // Shove in our data
    for (dataBit in dataBits) {
        dataBitShift = positionOf(dataBit)   // where this data bit lands in the code word
        outputData |= (inputData & dataBit) << dataBitShift
    }
    return outputData
}
Matlab
The pseudo MATLAB code is much more compact. It reads in the file, saves it as an array, then loops
through the array elements, performing the matrix encoding for each pixel and saving the result to a
new image matrix.
function [ imageCoded ] = ImageEncode( filename )
a = imread(filename);            % read the image into a matrix of pixel values
G = [1 0 0 0 1 1 0;
     0 1 0 0 1 0 1;
     0 0 1 0 0 1 1;
     0 0 0 1 1 1 1];             % the [7,4] generator matrix from above
[dataLengthR, dataLengthC] = size(a);
for i = 1:dataLengthR
    for j = 1:dataLengthC
        c = dec2bin(a(i,j), 4) - '0';   % pixel value -> binary message vector
                                        % (assumes 4-bit values for simplicity)
        b = mod(c * G, 2);              % encode: code word = mG (mod 2)
        imageCoded(i,j) = bin2dec(char(b + '0'));  % store the coded value
    end
end
end
9. Worksheet
1) Given the code [7,4,3]:
a) How many message bits are there?
b) What is the rate?
c) What is the distance?
d) What is the total size of the code generated?
e) Is this code more efficient than a [6,3,3] code?
f) How many potential messages are possible?
2) Assume you receive the bit array [10101110], encoded as in the manual example
above with a [7,4,3] generator matrix plus an eighth parity bit.
a) Is there an error?
b) Which bit has the error?
3) You have two words, [001] and [101].
a) What is the distance between them?
b) What is the maximum distance for ANY two words in this code?
c) What is the total number of words in this code?
4) Given the parity-check matrix H as
1 1 0 1 1 0 0
1 0 1 1 0 1 0
0 1 1 1 0 0 1
and a received word c whose syndrome is
Hc = [1
0
1]
a) Which column of H does the syndrome correspond to?
b) Which entry of c is the error in?
5) In the manual example above, what is the function of the eighth parity bit?
10. Sources
Error–correcting codes with linear algebra
http://www.math.union.edu/~jaureguj/ECC.pdf
Hamming, "Error-Correcting Codes" (April 21, 1995)
http://www.youtube.com/watch?v=BZh07Ew32UA
Encoding and Decoding with the Hamming Code
http://www.uwyo.edu/moorhouse/courses/4600/encoding.pdf
The Hamming Code in Matrix Form
http://www.ece.unb.ca/tervo/ee4253/hamming2.shtml
Error Correction and the Hamming Code
http://www.ece.unb.ca/tervo/ee4253/hamming.shtml
TLT-5400/5406 DIGITAL TRANSMISSION
http://www.cs.tut.fi/kurssit/TLT-5400/matlab_tehtavat/TLT5400_M5.pdf
Construction of Hamming codes using Matrix
http://www.gaussianwaves.com/2008/05/construction-of-hamming-codes-using-matrix/
Lecture - 15 Error Detection and Correction
http://www.youtube.com/watch?v=aNqiTCZ-nko
Parity Check
http://en.wikipedia.org/wiki/Parity_check
Linear Block Codes
http://en.wikipedia.org/wiki/Linear_block_codes
Generator Matrix
http://en.wikipedia.org/wiki/Generator_matrix
Parity Check Matrix
http://en.wikipedia.org/wiki/Parity_check_matrix
Hamming Code
http://en.wikipedia.org/wiki/Hamming_code
Systematic Code
http://en.wikipedia.org/wiki/Systematic_code
Forward Error Correction
http://en.wikipedia.org/wiki/Forward_error_correction
How To Count In Binary
http://www.wikihow.com/Count-in-Binary