This document summarizes a software project called Digital Speech within 125 Hz Bandwidth (DS-125). The project aims to transmit live voice over long distances using a low bandwidth of 125 Hz by breaking down voice into short audio clips and unique identification codes. It discusses how voice is sampled and converted into binary digits sent at 125 Hz, then reconstructed at the receiving end into synthetic voice. The project seeks to address voice transmission for applications like communication with astronauts on Mars by using a low bandwidth to overcome signal degradation issues.
Digital Speech within 125 Hz Bandwidth (DS-125)
Digital Speech within 125 Hz Bandwidth is a software project containing
three parts: a computer with a microphone and audio output, a transport
device, and a second, far-away computer with an audio input and a
speaker. The transport device has a bandwidth of 125 Hz, which violates the
Shannon-Hartley theorem; that theorem implies the channel bandwidth
cannot be less than the bandwidth of the voice itself without adding distortion.
Let’s use an analogy we can all understand. Get a crate of oranges and
make orange juice. Then take the water out, put the rest in a can, and
freeze the can. Take the frozen can from Florida to New York. Open the
can and add water: you get orange juice again, but not the original crate
of oranges.
All digital computers run on a very primitive form of math that has only two
numbers: one and zero. So when live voice enters this computer program
for a duration of one second, the sound card produces 48,000 samples,
each made of 16 ones and zeros. That is 768,000 ones and zeros per
second. This software sends only 125 ones and zeros per second to the
other computer. That computer takes those 125 ones and zeros per
second and turns them back into 768,000 ones and zeros per second so
that its sound card plays synthetic live voice on the headset.
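The arithmetic above can be checked in a few lines. All values come straight from the text:

```python
# Bit-rate arithmetic from the paragraph above (all values from the text).
SAMPLE_RATE = 48_000      # samples per second from the sound card
BITS_PER_SAMPLE = 16      # ones and zeros in each sample
LINK_RATE = 125           # ones and zeros per second through the transport device

raw_bits_per_second = SAMPLE_RATE * BITS_PER_SAMPLE   # 768,000 per the text
compression_ratio = raw_bits_per_second / LINK_RATE

print(raw_bits_per_second, compression_ratio)
```

So the software must shrink the data by a factor of 6,144 to fit the 125 Hz transport device.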
Long ago, writers needed a transport device to send the written word to
far-away readers. Morse code (not to be confused with software), with a
very small bandwidth, filled part of that need even though the penmanship
was removed. Today astronauts going to Mars need a transport device to
send live voice to Earth. The Lebo code (also not to be confused with
software) fills part of that need.
Why do this? Radios are used to send live voice from one place to
another. Lots of things get in the way of the radio signal. If there is not
enough signal, you cannot hear the live voice. To fix this you can use
more power by adding an amplifier to give the signal gain, or you can use
a bigger antenna to give the signal gain, or you can use DS-125 to reduce
the bandwidth and give the signal gain that way. The cost of amplifiers
with power supplies or huge antennas is prohibitive, but DS-125 could be
free. The uses of DS-125 include live voice to and from Mars, live voice
from Earth to the Moon and back, live voice for underwater
communications, and more.
For a practical development system (proof of concept), only one computer
is needed for simplex (one-direction) live voice from the microphone to the
headset, which prevents feedback. The microphone on a laptop computer
is always live, so there is no need to add an echo with an external
microphone.
The project name is digital speech, not digital voice. What if 40% of the
live voice were distorted? Would that constitute failure? What if your mind
did not perceive distortion during that 40% of the time? There are known
sounds that make up speech, and there must be transition times between
these sounds, because the parts of the mouth (tongue, lips, jaw, etc.) take
time to change from one sound to another. Although your ears can detect
these transitions, why should your mind need these transition sounds?
At the output of the headset, synthetic voice is played that is made of
extremely short pre-recorded audio clips. They are made from your live
voice as spoken into the microphone, and they contain many different
frequencies. The audio clips are connected together in the correct order to
make your synthetic voice, but the phase of each frequency at the end of
any audio clip must match the phase of each frequency at the beginning
of the next clip, or major distortion occurs. If the amplitude of each audio
clip at its beginning and end were zero, the phases of all frequencies
would match, but this reduction of the amplitude creates a new kind of
distortion.
The solution is to extend the duration of the audio clips by 0.04 seconds
and use that time to slowly reduce the amplitude at the end of the first
clip, use the same 0.04 seconds to slowly raise the amplitude at the
beginning of the next audio clip, and then overlap the two clips by that
same 0.04 seconds. This cross-fade is the 40% distortion.
On the original computer a one or a zero goes into the transport device
every 0.008 seconds, which is the 125 Hz bandwidth. A sound detector
that works every 0.008 seconds is needed to know whether the next digit
is a one or a zero. But sound detectors create distortion. I have limited the
total number of sounds in voice to at most 88, but what if more are
needed? Linguists have found between 40 and 47 phonemes that make
up the English language, so there is a low risk of adding distortion. I have
divided voice into 16 logarithmic bands of frequencies for this sound
detector. The cochlear implant used by deaf people to hear has only
eleven frequency bands, so there is also a low risk of adding distortion.
The 16 audio bandpass filters are not perfect filters, but they are very fast.
Because the same filters are used both to make the numbers and to
compare those numbers, the errors cancel. Each band of filters has an
amplitude detector and a frequency detector. The numbers from the two
detectors are also used to make their first and second derivatives, which
totals 96 numbers every 0.008 seconds. Only some of these numbers are
used to make the 88 comparisons for each sound. But most of the time
the 96 numbers from the sound detector do not fit any of your 88 known
sounds. Those 0.008-second time slots are assigned as unknowns.
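The 96-number count works out as 16 bands times 2 detectors times 3 values (the detector output plus its first and second derivatives). A sketch of assembling that vector each 0.008-second frame, with the band detectors themselves left as hypothetical inputs:

```python
# Sketch of the 96-number feature vector: 16 bands, an amplitude and a
# frequency detector per band, plus first and second derivatives.
# The detector outputs are hypothetical inputs here; only the bookkeeping
# is shown. Derivatives are simple frame differences, my assumption.

def feature_vector(amp_now, freq_now, amp_prev, freq_prev, amp_prev2, freq_prev2):
    """Each argument is a list of 16 detector outputs, from the current
    frame and the two previous 0.008 s frames."""
    features = []
    for band in range(16):
        for now, prev, prev2 in ((amp_now[band], amp_prev[band], amp_prev2[band]),
                                 (freq_now[band], freq_prev[band], freq_prev2[band])):
            d1 = now - prev               # first derivative, per frame
            d2 = now - 2 * prev + prev2   # second derivative, per frame
            features.extend([now, d1, d2])
    return features  # 16 bands * 2 detectors * 3 values = 96 numbers
```

Matching a sound then means comparing some subset of these 96 numbers against the stored cluster for each of the 88 known sounds.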
It is obvious that people cannot change their mouth, tongue, lips, jaw, etc.
in 0.008 seconds, so when the sound detector finds a match to the unique
cluster of numbers for a known sound, the same unique cluster of
numbers often repeats. Sounds come in repeating groups of 0.008-second
time slots. In between these groups are unknown sounds. The rule for
fixing these unknown time slots is to assign the first half of the unknown
time slots to the earlier group of a known sound and the rest of the
unknown time slots to the next group of a known sound. The total duration
of each of these new known-sound groups, without unknown time slots,
must fit a unique code of ones and zeros at the 125 Hz rate to be sent into
the transport device for each of your 88 sounds.
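The unknown-slot rule above can be sketched directly. Labels and the use of `None` to mark an unknown slot are my own notation; when a run has an odd length, I round the first half down, which the text does not specify:

```python
# Sketch of the unknown-slot rule: each run of unknown 0.008 s time
# slots between two known-sound groups is split, the first half joining
# the earlier group and the rest joining the next group.

def fill_unknowns(slots):
    """slots: list of sound labels, with None marking an unknown slot.
    Returns a list with every None replaced by a neighboring label."""
    out = list(slots)
    i = 0
    while i < len(out):
        if out[i] is None:
            j = i
            while j < len(out) and out[j] is None:
                j += 1                      # find the end of the unknown run
            right = out[j] if j < len(out) else out[i - 1]
            left = out[i - 1] if i > 0 else right
            half = (j - i) // 2
            for k in range(i, j):
                out[k] = left if k < i + half else right
            i = j
        else:
            i += 1
    return out
```

After this pass, the slot stream is a sequence of pure known-sound groups, each of which must then fit one of the 88 codes.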
The following is not part of the development system, but it must be
addressed now to understand the final product. I invented the Lebo code,
which contains both speech and timing. The way this is done is to make
all Lebo codes start with a one and end with two or more zeros, making
88 minimum-length Lebo codes with eleven or fewer digits. To extend the
time of any of the 88 codes in 0.008-second steps, add extra zeros. The
time to send any Lebo code through the transport device is exactly the
same time that its audio clip plays on the headset. The maximum number
of extra zeros needed is twice the size of the minimum-length Lebo code
for that sound, minus one. There is no rule that says the same Lebo code
cannot be repeated.
That is a total of 843 Lebo codes (843 audio clips) that make up all of
your speech and timing. You cannot say any word that the 125 Hz
bandwidth Lebo code cannot play on the headset. There is also an 844th
Lebo code, 1100, which turns on the squelch that mutes the headphones
when you stop sending. There are 19 different lengths of Lebo codes,
with associated audio clips lasting from 0.024 to 0.168 seconds. Some
sounds in live voice have a short duration and should be assigned to the
smaller minimum-length Lebo codes. The smallest Lebo code is assigned
to "no sound", which is the most used of all sounds, the easiest sound to
detect, and the one with no distortion. Some sounds in voice always have
a long duration and should be assigned to the larger minimum-length
Lebo codes.
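The minimum-length codes can be enumerated under one reading of the rule above. My added assumption: a minimum-length code ends with exactly two zeros (a third trailing zero would be indistinguishable from a shorter code padded with an extra zero), so every digit before the final "100" except the leading one is free:

```python
# Enumerate candidate minimum-length Lebo codes of eleven or fewer
# digits: start with 1, end with exactly two zeros (my assumption, so
# that padding with extra zeros stays unambiguous).
from itertools import product

def minimum_codes(max_len=11):
    codes = ["100"]  # the shortest code: 3 digits, 0.024 s at 125 digits/s
    for n in range(4, max_len + 1):
        # pattern: 1, then n-4 free digits, then 1, then exactly "00"
        for middle in product("01", repeat=n - 4):
            codes.append("1" + "".join(middle) + "100")
    return codes

codes = minimum_codes()
```

Under this reading there are far more than 88 candidates of eleven or fewer digits, so assigning one minimum-length code to each of the 88 sounds is easy, with short codes reserved for short, frequent sounds like "no sound".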
What happens when the sender talks too fast for a sound group to fit its
Lebo code? If a sound group without unknown time slots fills half or more
of its minimum-length Lebo code, the full Lebo code is sent, but one extra
zero is removed from each of the future sound groups for each digit
added. If a group fills less than half of its minimum-length Lebo code, the
group is removed, and one extra zero is added to each of the future
sound groups for each digit of the removed group. This time banking must
add a little time distortion, but our minds may compensate. In real life, you
should tell the sender to slow down.
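The send-or-drop half of that rule can be sketched as follows. This is my reading of the text: durations and code lengths are both counted in 0.008-second digits, and the returned count is what the time bank must later pay back or spend:

```python
# Sketch of the time-banking decision (one reading of the rule above).
# duration_slots: length of a sound group in 0.008 s slots.
# min_code_len: digits in that sound's minimum-length Lebo code.

def send_or_drop(duration_slots, min_code_len):
    """Returns ("send", digits_borrowed) when the group fills at least
    half of its minimum code, else ("drop", digits_banked)."""
    if 2 * duration_slots >= min_code_len:
        # send the full code; any shortfall is borrowed from future groups
        return ("send", max(0, min_code_len - duration_slots))
    # too short: drop the group and bank its slots as future extra zeros
    return ("drop", duration_slots)
```

A scheduler would then trim one extra zero from (or add one to) each following group until the borrowed or banked digits are used up.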
When all of your 88 sounds can be detected, the unknown time slots are
removed, and the groups that are less than half the minimum length are
removed, you can play your synthetic voice by sending your found audio
clips to the audio output. If you can understand yourself all the time, no
further proof is needed that this project works. If you cannot understand
yourself, the project fails. Unfortunately this project only works if all parts
are complete, because the timing cannot be scaled.
So how do you find your unique cluster combination, from some of the 96
known numbers for each of your 88 sounds, to compare with your 96
unknown numbers from the sound detector every 0.008 seconds? Your
unique cluster numbers must determine whether your unknown sound is
a match to one of your 88 known sounds. An example of your unique
cluster combination would be: if your 16 peak amplitude numbers from
the 16 audio bandpass filters are all below a fixed value, then your
unknown sound must be "no sound", which is the first of your 88 known
sounds. Your other 87 unique cluster number combinations and your 840
short audio clips need to be made. Would you like to know how to make
them?
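The "no sound" cluster test described above is the simplest of the 88 and can be sketched in a few lines. The threshold value is hypothetical; the text only says "a fixed value":

```python
# Sketch of the "no sound" cluster test: known sound #1 matches when
# all 16 band peak amplitudes sit below a fixed value.
SILENCE_THRESHOLD = 0.01  # hypothetical fixed value, not from the text

def is_no_sound(peak_amplitudes):
    """True when all 16 bandpass-filter peak amplitudes are below the
    fixed value, i.e. the 0.008 s slot is 'no sound'."""
    return all(a < SILENCE_THRESHOLD for a in peak_amplitudes)
```

The other 87 clusters would each use their own subset of the 96 numbers, but this shows the shape of a cluster test.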
Eventually someone like you must finish writing the software for this
project, with or without my help. My email address is
mike.lebo@gmail.com. If you want to know, as Paul Harvey said, "the
rest of the story" of how to make the missing parts, please contact me.