Machine Learning Project for Georgetown University's Data Science Certificate Program. Our team collected sensor data from the classroom using Raspberry Pi 3 sensors and other devices, and then built supervised classification and regression models to predict the room's occupancy.
2. 2
PROJECT OBJECTIVES
CAPTURE SENSOR DATA
Use Raspberry Pi 3 and other devices to capture sensor data from the classroom
MACHINE LEARNING MODELS
Build and tune supervised classification and regression models
BUILD WEB APPLICATION
Create a web application that incorporates supervised machine learning models to predict real-time occupancy levels
MEET OUR TEAM
NIKOLAY BANDURA
KRISTEN MCINTYRE
ABRAHAM MONTILLA
MENGDI YUE
SVETLANA ZOLOTAREVA
3. 3
INITIAL SETUP
Original Raspberry Pi 3 setup:
Bluetooth Devices
Door Sensor
Motion Sensor
Dual Temperature & Humidity Sensor
Ultimately, the Motion Sensor was removed
and the following sensors were added:
Camera
CO2
Light
Noise
DATA INGESTION
4. 4
FINAL SETUP
Raspberry Pi 3 sensors are
connected to a breadboard.
SENSORS
BLUETOOTH DEVICES
CO2, ppm
DOOR SENSOR
HUMIDITY, %
IMAGE, 8-MP
LIGHT, Lux
NOISE, Hz
TEMPERATURE, °Celsius
DATA INGESTION
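One way to picture a single ingested row is as a timestamped record with one field per sensor listed above. The sketch below is only an illustration: the class name, field names, the door-state encoding, and the sample values are all assumptions, not the team's actual schema or data.

```python
from dataclasses import dataclass, asdict
from datetime import datetime
from typing import Optional


@dataclass
class SensorReading:
    """One timestamped row of classroom sensor data (hypothetical schema)."""
    timestamp: datetime
    bluetooth_devices: Optional[int] = None   # number of devices detected
    co2_ppm: Optional[float] = None           # CO2, ppm
    door_open: Optional[bool] = None          # door sensor state (assumed boolean)
    humidity_pct: Optional[float] = None      # humidity, %
    image_path: Optional[str] = None          # path to a camera frame
    light_lux: Optional[float] = None         # light, lux
    noise_hz: Optional[float] = None          # noise, Hz (unit as listed on the slide)
    temperature_c: Optional[float] = None     # temperature, degrees Celsius


def missing_fields(reading):
    """Return the names of sensors that reported no value in this reading."""
    return [name for name, value in asdict(reading).items() if value is None]


# Made-up sample values, for illustration only.
reading = SensorReading(
    timestamp=datetime(2017, 5, 5, 18, 0),
    co2_ppm=600.0,
    temperature_c=22.5,
)
gaps = missing_fields(reading)
```

Keeping every sensor field optional makes dropped readings explicit, so they can be counted or imputed later during data wrangling.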
14. 14
BLUETOOTH DEVICES DATA
Pearson score
Missing Data
Bluetooth Devices: Friday, May 5, 2017
DATA WRANGLING
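The slide pairs a Pearson score with missing data. As a minimal sketch of how the two interact, the function below computes the Pearson correlation coefficient over only those time steps where both series have a value. Dropping incomplete pairs is an assumption made for illustration; the slides do not say how the team handled missing readings.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two series, skipping pairs with a missing value (None)."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Sums of centered products and squares; the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Illustrative values only: the third reading of the first series is missing.
score = pearson([1, 2, None, 4], [2, 4, 6, 8])
```

A score near 1 or -1 between a sensor feature and the occupancy count suggests a strong linear relationship, while a score near 0 suggests the feature carries little linear signal on its own.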
15. 15
FEATURE CORRELATION
Bluetooth Devices & Non-Personal Bluetooth Devices
DATA WRANGLING
16. 16
MACHINE LEARNING
MAIN CHALLENGES
Missing Values
Sensor Errors
New Features
Time-Series Data
TimeSeriesSplit
CART Models
Class Imbalance
89% Occupied
Occupancy Category
0: Empty
1-16: Low
17-27: Mid-Level
> 27: High
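Two of the challenges above lend themselves to a short sketch: mapping a head count to the four occupancy categories, and splitting time-ordered data so that a model is never trained on readings recorded after the ones it is tested on. The function names are hypothetical, and the split shown is a simplified expanding-window scheme that illustrates the idea behind TimeSeriesSplit rather than reproducing any library's exact algorithm.

```python
def occupancy_category(count):
    """Map a room head count to the occupancy categories listed on the slide."""
    if count < 0:
        raise ValueError("count cannot be negative")
    if count == 0:
        return "Empty"
    if count <= 16:
        return "Low"
    if count <= 27:
        return "Mid-Level"
    return "High"


def expanding_window_splits(n_samples, n_splits):
    """Yield (train_indices, test_indices); training data always precedes test data."""
    fold = n_samples // (n_splits + 1)
    if fold == 0:
        raise ValueError("too few samples for the requested number of splits")
    for k in range(1, n_splits + 1):
        train_end = k * fold
        # The last test fold absorbs any leftover samples at the end.
        test_end = n_samples if k == n_splits else train_end + fold
        yield list(range(train_end)), list(range(train_end, test_end))


splits = list(expanding_window_splits(n_samples=10, n_splits=4))
```

Unlike a shuffled k-fold split, each fold here tests on a later time window than it trains on, which avoids leaking future readings into the training data.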
17. 17
CLASSIFICATION MODELS
INITIAL RESULTS
                     CLASSIFICATION REPORT          CROSS-VALIDATION   ACCURACY SCORES
MODEL                F1 SCORE  PRECISION  RECALL    SCORE (12-Fold)    TRAINING SET  TEST SET
GaussianNB           0.98      0.98       0.98      0.8056             0.962         0.983
kNN                  0.50      0.49       0.58      0.5755             0.922         0.583
LDA                  0.96      0.96       0.96      0.8699             0.896         0.960
Logistic Regression  0.94      0.95       0.94      0.8067             0.913         0.942
SGD                  0.42      0.33       0.55      0.6223             0.661         0.574
SVC                  0.64      0.81       0.70      0.6402             0.991         0.703
MACHINE LEARNING
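One quick way to read the accuracy scores is to compare training and test accuracy per model, since a large positive gap is a common sign of overfitting. The sketch below uses the training-set and test-set figures reported on this slide; the 0.1 threshold is an arbitrary assumption chosen for illustration, not a value from the project.

```python
# (training accuracy, test accuracy) per model, as reported on the slide.
results = {
    "GaussianNB": (0.962, 0.983),
    "kNN": (0.922, 0.583),
    "LDA": (0.896, 0.960),
    "Logistic Regression": (0.913, 0.942),
    "SGD": (0.661, 0.574),
    "SVC": (0.991, 0.703),
}

# Positive gap: the model scores better on data it has already seen.
gaps = {model: round(train - test, 3) for model, (train, test) in results.items()}

GAP_THRESHOLD = 0.1  # assumed cutoff for flagging a model as likely overfit
likely_overfit = [model for model, gap in gaps.items() if gap > GAP_THRESHOLD]
```

With these figures the flagged models are kNN and SVC, which also have the lowest cross-validation scores among the models with high training accuracy.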
23. 23
INITIAL STATE DASHBOARD
WEB APPLICATION
WAVES
An interactive viewer that allows users to mine data and collect targeted statistics
TRIGGERS
Set an action to occur for a specified condition in the data stream
TILES
Each tile is customized to display a unique event stream
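The trigger concept (an action that fires when a condition holds in the data stream) can be sketched generically as a condition paired with a callback. This is not the Initial State API: the function name, the event layout, and the CO2 threshold are all hypothetical and only illustrate the pattern.

```python
def make_trigger(condition, action):
    """Return a handler that runs `action` on each event satisfying `condition`."""
    def on_event(event):
        if condition(event):
            action(event)
            return True
        return False
    return on_event


# Hypothetical example: collect an alert whenever a CO2 reading exceeds 1000 ppm.
alerts = []
co2_trigger = make_trigger(
    condition=lambda e: e["key"] == "co2_ppm" and e["value"] > 1000,
    action=alerts.append,
)

stream = [
    {"key": "co2_ppm", "value": 600},    # made-up readings
    {"key": "co2_ppm", "value": 1200},
    {"key": "light_lux", "value": 1500},
]
fired = [co2_trigger(event) for event in stream]
```

Separating the condition from the action keeps each trigger easy to test on recorded events before it is attached to a live stream.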
24. 24
MODEL OPTIMIZATION
CLASSROOM OCCUPANCY CAPSTONE
LIMITATIONS
Specific to location
User privacy
IMPROVEMENTS
Gather sensor data from multiple rooms at the same location
Take into account the building's HVAC system
Gather data from rooms impacted by outdoor weather conditions