an easy-to-use home automation system that can be fully
operated based on speech commands. The system is
constructed in a way that is easy to install, configure, run, and
maintain. The functional blocks of the overall system are
shown in Figure 2.
Figure 2: Functional block diagram of the Wireless Home Automation System (WHAS)
The system consists of three modules:
• Handheld Microphone Module which incorporates a
microphone with RF module (ZigBee protocol).
• Central Controller Module (PC based).
• Appliance Control Modules.
Figure 3 illustrates the sequence of activities in the WHAS.
The voice is captured using a microphone, sampled, filtered
and converted to digital data using an analogue-to-digital
converter. The data is then compressed and sent serially as
packets of binary data. At the receiving end (Central Controller
Module), binary data are converted to analogue, filtered and
passed to the computer through the sound card. A Visual Basic application running on the PC uses the Microsoft Speech API library for the voice recognition. Upon recognition of a command, control characters are sent wirelessly to the specified appliance address, and the appliance is turned ON or OFF depending on the control characters received.
III. HARDWARE DESIGN
In this section we present the hardware descriptions of the
three modules that constitute the WHAS.
A. Handheld Microphone Module (MM)
The components of the microphone module are shown in
Figure 4. The system captures human voice using a sampling
rate (fs) of 8 kHz. It is known that the highest frequency component of the human voice is 20 kHz; however, the most significant part of the information is encoded in frequencies between 6 Hz and 3.5 kHz [6]. To meet the Nyquist sampling criterion, an anti-aliasing filter is used to block all frequencies above the Nyquist frequency (Fn).
Figure 3: Sequence of activities in the Wireless Home Automation System (WHAS). Legend - A: Analogue, D: Digital
Fn = fs / 2     (1)
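The sampling parameters can be checked against the Nyquist relation Fn = fs/2. The following lines are a quick numeric sanity check of the figures quoted above, not part of the firmware:

```python
fs = 8000          # sampling rate in Hz, as used by the microphone module
fn = fs / 2        # Nyquist frequency: half the sampling rate
band_top = 3500    # top of the significant speech band in Hz [6]

# The anti-aliasing filter must pass the speech band and attenuate
# everything above fn, so its transition band lies between 3.5 and 4 kHz.
headroom = fn - band_top
```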
Figure 4: Block diagram of the handheld Microphone Module.
The incoming speech wave goes through a low pass filter
(Figure 5). A 3-pole Butterworth low pass filter is used as an
anti-aliasing filter [7]. The signal is then amplified in order to
utilise the full range of the ADC. A voltage divider and a DC
blocking capacitor provide a voltage translation from the filters
to the ADC. In the microcontroller, the data is first converted to digital format using the built-in ADC and then compressed using the Differential Pulse Code Modulation (DPCM) algorithm, reducing each sample from 12 bits to 6 bits. Data are sent serially from the microcontroller to the ZigBee RF module at a baud rate of 115200 bit/s, the maximum configurable serial baud rate of the ZigBee module [8].
Figure 5: Portable microphone circuit.
B. Central Controller Module
The functional blocks of the central controller module are
shown in Figure 6. At the central controller module
(coordinator), when data are received, the received bytes are
decompressed using the DPCM algorithm [9]. The decompressed data is written to the digital-to-analogue converter (DAC). The analogue output of the DAC is filtered and fed to the computer as an analogue signal through the sound card of the PC. The filter and amplifier circuit is shown in Figure 7.
Figure 6: Block diagram of the Central Controller Module.
Figure 7: Filtering and amplification circuit of the received audio.
C. Appliance Control Module
Once the speech commands are recognised, control characters are sent to the specified appliance address through the ZigBee communication protocol. Each appliance to be controlled has the relay control circuit shown in Figure 8.
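On the appliance side, the control module only reacts to packets carrying its own address. The following is a minimal sketch of that addressing check, assuming a hypothetical two-character packet format (an address character followed by a '0'/'1' control character); the actual packet format of the WHAS is not given in the paper:

```python
MY_ADDRESS = "B"   # hypothetical one-character address of this module

def handle_packet(packet):
    """Parse a two-character packet '<address><control char>' and return
    the new relay state (True = ON, False = OFF) for this module,
    or None when the packet is addressed to a different appliance."""
    if len(packet) != 2 or packet[0] != MY_ADDRESS:
        return None            # ignore packets for other modules
    return packet[1] == "1"    # '1' -> relay ON, '0' -> relay OFF
```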
Figure 8: Circuit schematic for appliance control module
IV. SOFTWARE DESIGN
Software design includes ADC sampling and
compression/decompression algorithms, transmission and
receiving, and voice recognition.
A. ADC sampling and data compression / decompression
The portable microphone module implements the DPCM compression scheme. The algorithm is inherently lossy because of the quantisation error introduced when encoding the differences. It compresses each 12-bit ADC sample down to a 6-bit code that represents the difference between the actual sample and the predicted value of the sample. The predicted sample is obtained from the result of the previous iteration. The difference between the sample and the predicted value is quantised, and the resulting 6-bit codes are packed into bytes so that they can be sent serially. To calculate the new predicted value, the compression algorithm decodes the difference and adds it to the current predicted value.
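The byte-packing step can be sketched as follows. This is an illustrative implementation (four 6-bit codes fit into three bytes), not the module's actual firmware:

```python
def pack_codes(codes):
    """Pack 6-bit DPCM codes (values 0..63) into bytes; 4 codes -> 3 bytes."""
    bits, nbits, out = 0, 0, bytearray()
    for c in codes:
        assert 0 <= c < 64
        bits = (bits << 6) | c          # append 6 bits to the accumulator
        nbits += 6
        while nbits >= 8:               # emit full bytes as they form
            nbits -= 8
            out.append((bits >> nbits) & 0xFF)
    if nbits:                           # flush a final partial byte, zero-padded
        out.append((bits << (8 - nbits)) & 0xFF)
    return bytes(out)

def unpack_codes(data, n):
    """Recover n 6-bit codes from packed bytes."""
    bits, nbits, codes = 0, 0, []
    for b in data:
        bits = (bits << 8) | b
        nbits += 8
        while nbits >= 6 and len(codes) < n:
            nbits -= 6
            codes.append((bits >> nbits) & 0x3F)
    return codes
```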
Figure 9: DPCM Compression algorithm.
Figure 10: DPCM Decompression Algorithm.
Figure 9 shows the DPCM compression algorithm. At the receiving end, data are decompressed to their original form using the DPCM decompression algorithm. Figure 10 shows the decoding algorithm, which matches the received code with the quantised difference and adds this difference to the predictor [10].
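As a concrete illustration of the scheme described above, the sketch below implements a DPCM codec with a simple uniform quantiser. The step size and clipping limits are assumptions made for illustration; the paper does not give the actual quantisation table (see [10]):

```python
STEP = 64  # quantiser step size (an assumption; not given in the paper)

def dpcm_encode(samples):
    """Compress 12-bit samples (0..4095) to 6-bit codes (0..63)."""
    pred, codes = 0, []
    for s in samples:
        diff = s - pred
        q = max(-32, min(31, round(diff / STEP)))  # quantise to signed 6 bits
        codes.append(q & 0x3F)                     # store as unsigned 0..63
        # update the predictor exactly as the decoder will, so both stay in step
        pred = max(0, min(4095, pred + q * STEP))
    return codes

def dpcm_decode(codes):
    """Reconstruct samples from 6-bit codes by accumulating differences."""
    pred, out = 0, []
    for c in codes:
        q = c - 64 if c >= 32 else c               # sign-extend the 6-bit code
        pred = max(0, min(4095, pred + q * STEP))
        out.append(pred)
    return out
```

The key design point, matching the text, is that the encoder updates its predictor from the *decoded* difference rather than the true one, so encoder and decoder predictors never drift apart.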
B. Voice Recognition Application
The voice recognition application implements the Microsoft Speech API. The application compares incoming speech against a predefined dictionary. The Microsoft Speech API runtime relies on two main engines, Automatic Speech Recognition (ASR) and Text-To-Speech (TTS), as shown in Figure 11. The ASR engine applies the Fast Fourier Transform (FFT) to compute the spectrum of the fingerprint data [4]. Comparing the fingerprint with an existing database returns a string of the spoken text. This string is mapped to a control character that is sent to the corresponding appliance's address.
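For illustration, the mapping from recognised phrases to control characters might look like the following. The actual command set, control characters, and appliance addresses of the WHAS are not given in the paper; all names below are hypothetical:

```python
# Hypothetical dictionary: recognised phrase -> (appliance address, control char)
COMMANDS = {
    "lights on":  ("A", "1"),
    "lights off": ("A", "0"),
    "fan on":     ("B", "1"),
    "fan off":    ("B", "0"),
}

def handle_recognition(text):
    """Return the (address, control char) pair for a recognised phrase,
    or None when the phrase is not in the dictionary (command ignored)."""
    return COMMANDS.get(text.strip().lower())
```

Returning None for unrecognised phrases mirrors the behaviour described later in Section V: when a command is not recognised, no signal is transmitted to the control modules.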
Figure 11: Voice recognition application hierarchy.
The designed graphical user interface (GUI) lets the user select the desired serial communication port and provides a record of all the commands that have been recognised and executed. The application implements the hierarchy described earlier in Figure 11 and the flow chart shown in Figure 12. Because the target users need to avoid any possible complications in operating the system, user friendliness was a high priority in the GUI design. A screenshot of the GUI is shown in Figure 13.
Control characters corresponding to the recognised
commands are then sent serially from the central controller
module to the appliance control modules that are connected to
the home appliances.
Figure 12: Flow chart of the voice recognition application.
Figure 13: Voice recognition GUI
C. ZigBee RF communication
The ZigBee protocol is the communication protocol used in this system. ZigBee offers a maximum data rate of 250 kbit/s; however, 115200 bit/s was used for sending and receiving, as this was the highest speed at which the UART of the microcontroller could be programmed to operate.
For each byte transmitted there is a start bit and a stop bit, so 10 bits are sent on the line for every 8 bits of data. Hence the usable data rate is:

115200 × 8/10 = 92160 bits/s     (2)

The amount of data (bits/s) produced by the ADC is:

8000 samples/s × 12 bits/sample = 96000 bits/s     (3)

Since this exceeds the usable link rate, streaming is not possible without the voice data being compressed [11]. After compression, the total resultant data rate (bits/s) is:

8000 samples/s × 6 bits/sample = 48000 bits/s     (4)

This leaves a margin for error checking and resending data if necessary.
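These rates follow directly from figures given earlier in the text (8 kHz sampling, 12-bit samples, 6-bit DPCM codes, 115200 baud with one start and one stop bit per byte). A quick numeric check:

```python
# Data-rate budget for the ZigBee serial link, using values from the text.
baud = 115200                 # UART baud rate, bits/s on the line
payload = baud * 8 // 10      # 8 data bits out of every 10 line bits
adc_rate = 8000 * 12          # raw ADC output: 8 kHz x 12-bit samples
dpcm_rate = 8000 * 6          # after 12 -> 6 bit DPCM compression
```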
V. EXPERIMENTAL RESULTS AND DISCUSSIONS
The prototype of the system has been fabricated and tested.
Figure 14 shows the microphone module. Figure 15 shows the
appliances control module.
Figure 14: Microphone circuit board with ZigBee module
Figure 15: Fabricated relay control unit
Figure 16: Results of voice recognition experiments showing percentage of
correct recognition for different ethnicity/accent
The graph in Figure 16 and the data in Table I show the response of the speech recognition application to spoken commands. The tests involved 35 subjects, a mix of male and female; the trials were conducted with people with different English accents, and 35 different voice commands were sent by each person. The test thus involved sending a total of 1225 commands. 79.8% of these commands
were recognised correctly. When a command is not recognised
correctly, the software ignores the command and does not
transmit any signals to the device control modules. The accuracy of the recognition can be affected by background noise, the speed of the speaker, and the clarity of the spoken accent; these factors need to be studied further through additional tests. The system was tested in an apartment and performed well up to 40 m. With clear line-of-sight transmission (such as in a wide open gymnasium), reception was accurate up to 80 m.
Additional tests are being planned involving a bigger
variety of commands.
TABLE I: RESULTS OF VOICE COMMAND RECOGNITION TESTS: PERCENTAGE OF
COMMANDS CORRECTLY RECOGNISED
VI. CONCLUSIONS AND FUTURE WORK
A home automation system based on voice recognition was built and implemented. The system is targeted at elderly and disabled people. The prototype developed can control electrical devices in a home or office. The system implements Automatic Speech Recognition engines through the Microsoft Speech API, and builds its wireless network on ZigBee RF modules for their efficiency and low power consumption. Multimedia streaming through the network was implemented with the help of the Differential Pulse Code Modulation (DPCM) compression algorithm, which compresses the speech data to half of its original size. The preliminary test results are promising.
Future work will entail:
• Adding confirmation commands to the voice
recognition system.
• Integrating variable control functions to improve the
system versatility such as providing control
commands other than ON/OFF commands. For
example “Increase Temperature”, “Dim Lights” etc.
• Integration of GSM or mobile server to operate
from a distance.
• Design and integration of an online home control
panel.
REFERENCES
[1] T. Birtley. (2010). Japan debates care for elderly. [Cited 21 Sep 2010]. Available: http://www.youtube.com/watch?v=C0UTqfigSec
[2] Population Division, DESA, United Nations. (2009). World Population Ageing: Annual report 2009. [Cited 29 Jul 2010]. Available: http://www.un.org/esa/population/publications/WPA2009/WPA2009_WorkingPaper.pdf
[3] (2010). uControl Home security system website. [Cited 14 Oct 2010]. Available: http://www.itechnews.net/2008/05/20/ucontrol-home-security-system/
[4] R. Gadalla, “Voice Recognition System for Massey University
Smarthouse,” M. Eng thesis, Massey University, Auckland, New
Zealand, 2006.
[5] (2010). Home Automated Living website. [Cited 14 Oct 2010]. Available: http://www.homeautomatedliving.com/default.htm
[6] L. R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals,
New Jersey, US: Prentice Hall Inc, 1978.
[7] “XBee 2.5 Manual,” ZigBee RF communication protocol, Digi International Inc., Minnetonka, MN, 2008.
[8] B. Yuksekkaya, A. A. Kayalar, M. B. Tosun, M. K. Ozcan, and A. Z. Alkar, “A GSM, Internet and Speech Controlled Wireless Interactive Home Automation System,” IEEE Transactions on Consumer Electronics, vol. 52, pp. 837-843, August 2006.
[9] F. J. Owens, Signal Processing of Speech, New York, US: McGraw-Hill
Inc, 1993.
[10] Voice Recorder Reference Design (AN 278), Silicon Laboratories, 2006.
[11] D. Brunelli, M. Maggiorotti, L. Benini, and F. L. Bellifemine, “Analysis of Audio Streaming Capability of ZigBee Networks,” in Proc. EWSN 2008, LNCS 4913, pp. 189-204, 2008.
Category    Kiwi   Arab   Filipino   Pacific Island   Japan   Thai   African
Person 1    68     85.7   88.6       70.3             67.7    75.8   96.7
Person 2    86     80     85         77               70      81.8   88
Person 3    57.1   88     80         80               74      85     85
Person 4    90     77     90         67               78.6    82     90
Person 5    85     60     77         90               76      68     82
Average     77.3   78.1   84.1       78.7             73.3    78.5   88.3