Lot Explorer Report
CORNELL UNIVERSITY
Final Report
Daniel Nosrati (dn259) | Kaushik Murali (km693) | Christopher Roman (cr469)
Report for May 15, 2018
1 PROJECT DESCRIPTION
Our project is an app that intends to help people find parking spaces. People often struggle
to find parking in heavily populated areas downtown during work commutes or nights out.
Given the user's current location, the application will be able to find the nearest parking spot
at the destination at the estimated time of arrival. The app will monitor parking availabilities
through sensors (e.g. ground sensors, cameras, parking meters, etc.) in real-time. Users will
be able to view forecasted available parking spots based on a regression model trained in
the Cloud. The parking lots will be filtered based on a distance range provided by the user,
representing how far from the destination they are willing to park. Users can then make reservations for
a specified time period at a particular lot. The parking lots are assumed to be commercial,
where reservations can be enforced. The app also provides a view of each user's reservations.
2 DATA
Streaming Input Data — (Timestamp, Lot Name, Latitude, Longitude, Available Spots)
Example Data Source — https://data.sandiego.gov/datasets/parking-meters-transactions/
Static information about each parking lot/street parking location
1. Parking lot ID (int)
2. Timestamps (Date/Time objects)
3. GPS coordinates (long, lat) for interfacing with the Google API
There are a few more pieces of data we would love to incorporate in the future:
1. The weekly schedule of each parking lot
2. The option for users to query our application based on price of parking
3. Personalized User Accounts
3 INTERFACE
We have provided an interface through a REST API.
LotRangeQuery(destLocation Latitude, destLocation Longitude, lotRange Distance (km))
The service stores a geographical mapping as well as a parking lot ID to server cluster mapping. The
query fetches the parking lot IDs that fall within the specified lotRange, and then
makes requests, through a load balancer, to the edge servers that hold information about these
parking lots.
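The report does not spell out how lots are filtered by distance. A minimal sketch, assuming a haversine great-circle distance; the names `haversine_km` and `lots_in_range` are hypothetical:

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometers."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def lots_in_range(dest_lat, dest_lon, lot_range_km, lots):
    """Return IDs of lots within lot_range_km of the destination.

    `lots` maps lot ID -> (lat, lon); the real service would consult the
    geographical mapping described above rather than a local dict.
    """
    return [lot_id for lot_id, (lat, lon) in lots.items()
            if haversine_km(dest_lat, dest_lon, lat, lon) <= lot_range_km]
```

A production version would then route each returned lot ID to its edge server through the load balancer.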
CreateReservation(userID, parkingLotID, startTimestamp, endTimestamp)
Creates a reservation and stores it in the DHT for querying.
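The report does not give the admission rule for a reservation. One plausible check, assuming each lot has a fixed capacity and a reservation is admitted only when overlapping reservations stay below it; the function name and data shapes are hypothetical:

```python
def can_reserve(capacity, existing, start, end):
    """Admit a reservation if, at every overlapping moment, fewer than
    `capacity` reservations are active.

    `existing` is a list of (start, end) pairs with comparable timestamps;
    intervals are half-open, so (s, e) overlaps [start, end) iff s < end and e > start.
    """
    overlapping = [(s, e) for s, e in existing if s < end and e > start]
    # Sweep interval endpoints to find peak concurrency; ends (-1) sort
    # before starts (+1) at equal timestamps, matching half-open intervals.
    events = sorted([(s, 1) for s, e in overlapping] +
                    [(e, -1) for s, e in overlapping])
    peak = cur = 0
    for _, delta in events:
        cur += delta
        peak = max(peak, cur)
    return peak < capacity
```

In the real system this check would run against the (lot_id, end_time) index before writing the reservation.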
GetUserReservations(userID)
Returns all reservations stored in the DHT for the given user.
DisplaySensorData(areaID)
This API uses the current time to retrieve the most up-to-date availability
for all the parking lots stored on that server.
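The lookup this describes can be sketched as selecting, for each lot, the most recent reading at or before the current time; the function name and tuple layout are hypothetical but mirror the streaming input format:

```python
def latest_availability(readings, now):
    """Pick the most recent reading at or before `now` for each lot.

    `readings` is a list of (timestamp, lot_id, available_spots) tuples
    held in the edge server's memory; timestamps are comparable.
    """
    latest = {}
    for ts, lot_id, spots in readings:
        if ts <= now and (lot_id not in latest or ts > latest[lot_id][0]):
            latest[lot_id] = (ts, spots)
    return {lot: spots for lot, (ts, spots) in latest.items()}
```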
SensorPrediction(areaID, time=15)
This API uses a machine learning model stored on the edge to forecast parking lot availability.
The forecast horizon is currently fixed at 15 minutes, as the model was trained for that use case.
Example 1:
Input: LotRangeQuery(destLat = 34.02516, destLon = -118.50977, lotRange=1.25)
Output:
{
  "data": {
    "1": {
      "prediction": 297.0987536268288,
      "updated_at": "03/15/18 07:45"
    },
    ...
  },
  "success": true
}
4 ARCHITECTURE
Our architecture consists of the following components: sensors, Amazon SNS pub/sub,
Proxy Servers, Edge Servers, a load balancer to route to the different Edge Servers and Proxy
Servers, Amazon S3 storage, MongoDB, Amazon EMR/Spark, and DynamoDB.
The lifetime of a request starts at the client, which contacts the DNS of the load balancer
in order to make a request. From there the proxy routes the request to the appropriate edge
server, which processes it using the data it has from the sensors as well as the data
from the ML model and DynamoDB. Meanwhile, in the background,
sensors publish the data obtained from the lots on a 5-minute interval to the
publish-subscribe topics, which get pulled in by the relevant edge servers (this is established
through a mapping of edge servers to IP addresses, explained below) and are kept
in memory due to their small size. Additionally, as the edge servers poll in this data, they push
some of the data to MongoDB, which stores it for processing by the Spark
cluster. The Spark cluster trains the model using the new data on a set time interval and
pushes the models to S3, from which the edge servers pull on a regular basis.
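The flow above — sensors publishing every 5 minutes, edge servers pulling readings and keeping only a small in-memory window — can be sketched with a plain deque standing in for the SNS/SQS pipeline. All names, and the four-hour window size taken from the ML section below, are illustrative:

```python
from collections import deque

WINDOW_SECONDS = 4 * 60 * 60   # edge servers keep the last four hours of readings
PUBLISH_PERIOD = 5 * 60        # sensors publish on a 5-minute interval

class EdgeServerBuffer:
    """In-memory window of recent sensor readings for one edge server."""

    def __init__(self):
        self.readings = deque()  # (timestamp, lot_id, available_spots)

    def ingest(self, ts, lot_id, spots):
        """Append a new reading and evict anything older than the window."""
        self.readings.append((ts, lot_id, spots))
        while self.readings and self.readings[0][0] < ts - WINDOW_SECONDS:
            self.readings.popleft()

# Simulate five hours of publishes from one sensor.
buf = EdgeServerBuffer()
for tick in range(0, 5 * 60 * 60, PUBLISH_PERIOD):
    buf.ingest(tick, "lot-1", 100 - tick // 3600)
```

After five simulated hours, only the last four hours of readings remain in memory, which is why the per-server footprint stays small.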
Both Edge Servers and Proxy Servers are containerized to make use of the fault tolerance
guarantees provided by AWS ECS and to allow for quick reboots, as restarting a container
takes minimal time (about 30 seconds).
As for the architecture, we made plenty of assumptions and decisions, which need to be
justified appropriately. Initially, we thought MySQL would be a great way to store reservations
in the backend, as we could query on the expiry time of reservations in order to remove them
from the database. Using the size of the table, we could determine whether new reservations
could be accommodated, so consistency of the database seemed essential. However, after
creating the MySQL EC2 cluster, we found a read-heavy load would slow down the queries
considerably. Essentially, our cluster distribution looked like a DHT backed by a
SQL rather than a NoSQL database. Additionally, the use of transactions for most queries
in MySQL would prove harmful for most endpoints, such as making
reservations.
After seriously considering the tradeoff in consistency, we found that, because of the way
we partition using our mapping, we are able to service range queries without touching multiple
nodes in the DHT. DynamoDB, a NoSQL database, provided us a good way of sharding our
data across machines. We used area as the partition key because it allows us to direct
requests to a specific machine in the cluster. Within the mapping, an area refers to a group
of lots grouped by location. This makes sense, as an entire area will probably fit
onto a single machine, so it is only necessary to talk to one machine to get all
necessary data. Additionally, our use of indices lets us avoid scans and
use queries for all operations. Thus DynamoDB proved to be a great choice.
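The area-to-machine mapping described above is not spelled out in the report. A minimal sketch, assuming areas are hashed onto a fixed set of DynamoDB-style partitions; the function names are hypothetical, and md5 is used only as a stable, well-distributed hash:

```python
import hashlib

def partition_for_area(area_id, num_partitions):
    """Deterministically map an area (the partition key) to one partition,
    so every lot in an area lands on the same machine and a range query
    over one area touches a single node."""
    digest = hashlib.md5(str(area_id).encode()).hexdigest()
    return int(digest, 16) % num_partitions

def lot_partition(area_of_lot, lot_id, num_partitions):
    """Route a lot-level request via its area's partition."""
    return partition_for_area(area_of_lot[lot_id], num_partitions)
```

Two lots in the same area always resolve to the same partition, which is what makes single-node range queries possible.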
We used our indices in the DHT to optimize our workload. For user requests (how many
reservations a user has), we used a primary index on user_id that gets us all the
data for a specific user. We used an index on lot_id and end_time to help with deletion of old
reservations as well as checking whether a reservation is possible. We used an index
on area_id ordered by lot_id to get specific information about lots and to allow for strongly
consistent reads in the event that our model requires them. Additionally, we are
able to shard the DHT itself in the event that our app goes global, thereby allowing us to
circumvent the scalability limits of the DHT.
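The access patterns those indexes serve can be imitated in memory with plain dicts standing in for DynamoDB's primary and secondary indexes. This is a toy stand-in with assumed record shapes, not the actual table schema:

```python
from collections import defaultdict

class ReservationStore:
    """Toy stand-in for the DynamoDB reservation table and its indexes."""

    def __init__(self):
        self.by_user = defaultdict(list)  # user_id index: all of a user's reservations
        self.by_lot = defaultdict(list)   # (lot_id, end_time) index: expiry + capacity checks

    def create(self, user_id, lot_id, start, end):
        rec = {"user_id": user_id, "lot_id": lot_id, "start": start, "end": end}
        self.by_user[user_id].append(rec)
        self.by_lot[lot_id].append(rec)

    def user_reservations(self, user_id):
        """The GetUserReservations access pattern: one index lookup, no scan."""
        return list(self.by_user[user_id])

    def expire(self, lot_id, now):
        """Drop reservations whose end time has passed (the deletion use case).
        For simplicity this toy version only cleans the lot index."""
        kept = [r for r in self.by_lot[lot_id] if r["end"] > now]
        expired = len(self.by_lot[lot_id]) - len(kept)
        self.by_lot[lot_id] = kept
        return expired
```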
Hotspots in the DHT, such as Manhattan or Downtown Los Angeles, can be avoided by
carefully designing the mapping that determines on which machine they lie. Additionally,
we should be able to alter the hash function to change how data gets distributed in the DHT
based on the partition keys. We are forced to make a tradeoff between locality of access and
hotspot elimination to handle such cases.
For the machine learning model, we forecasted a large amount of data flowing into the
application, including pertinent information on how users made reservations as well as direct
sensor data reflecting how cars flowed into parking lots over the course of the day. All
of the sensor data received by an edge server through an SQS queue flows through to a cluster
of EC2 nodes running MongoDB. Currently, all the data is stored in one MongoDB table
because the number of parking lots is less than 20, but we can easily scale by storing one
MongoDB table per parking lot.
We chose MongoDB because we needed a database that could service a heavy write load
fairly quickly while still allowing for some replication. MongoDB not only had the easiest option for
deploying a cluster, but it also had a simple connector to integrate with Spark. Thus, it was overall
better suited to our use case than other NoSQL services offered by cloud providers, such as
HBase.
Spark lends itself to our application, as it is the leading cluster computing framework
for creating distributed machine learning applications. Considering the machines
rented on the AWS EMR cluster had plenty of memory, using Hadoop as a direct MapReduce
framework would have significantly slowed down our rate of development. Spark takes the
best of Hadoop by incorporating pieces such as HDFS and YARN.
Spark Core functions were used to preprocess the data and create feature vectors. Spark
MLlib was used to train a linear regression model with SGD. Stochastic gradient
descent was a fantastic option because it updates the model using random samples
without making a pass through the entire dataset, and converges on the loss-minimizing
parameters faster. In our case, the parameters were weights that corresponded to sensor data
drawn from the last four hours. These parameters were stored as weight vectors in files in
AWS S3, as AWS EMR provides preinstalled dependencies to do so. We had a thread running on
each edge server that polled AWS S3 every 10 minutes to read an updated
model, if found.
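The training step can be illustrated in miniature. This is a pure-Python sketch of linear regression fit by SGD on a single feature — the real system uses Spark MLlib over a multi-feature dataset, so the function name and hyperparameters here are illustrative only:

```python
import random

def train_sgd(samples, lr=0.01, epochs=200, seed=0):
    """Fit y ~ w*x + b by stochastic gradient descent on squared error.

    Each step updates on one randomly drawn sample rather than a full
    pass over the dataset, which is the property the report highlights.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(epochs * len(samples)):
        x, y = rng.choice(samples)
        err = (w * x + b) - y          # gradient of 0.5 * err^2 w.r.t. prediction
        w -= lr * err * x
        b -= lr * err
    return w, b
```

In the real pipeline the resulting weight vector is written to S3, and each edge server makes predictions locally as a dot product of the weights with its last four hours of sensor data.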
As our model was just linear regression, it was very simple to make predictions, since each
edge server stores sensor data from the last four hours in local memory. If a more
complex model were used, we would use a separate cluster of machines to handle predictions.
We would route requests from the proxy server to a prediction server hosted
on a machine with a cheap GPU. The GPU would vectorize computations that happen
on the edge and compute results much faster. To increase the complexity of the
model, we would also incorporate many more features, such as the price of parking at different
times of the day, weather conditions, etc.
5 EVALUATION
To evaluate our system, we will focus on measuring the latency of requests and responses,
throughput of queries, fault tolerance, accuracy of predictions, and how easily nodes can be
added and removed. We will also evaluate our model through the theory behind the individual
portions that make up our architecture.
We assume the following request mix and will use it to appropriately
model all tests:

Get available parking lots     50%
Create reservations            30%
Delete reservations            15%
Get all reservations by user    5%
We also assume that the servers will be on average running at 60% of their maximum capacity.
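A benchmark harness can draw request types from this mix directly. A sketch using Python's weighted sampling; the operation names are shorthand for the endpoints above:

```python
import random

# Request mix assumed in the evaluation section above.
WORKLOAD = [
    ("get_available_lots", 0.50),
    ("create_reservation", 0.30),
    ("delete_reservation", 0.15),
    ("get_user_reservations", 0.05),
]

def sample_requests(n, seed=0):
    """Draw n request types following the assumed workload percentages."""
    rng = random.Random(seed)
    ops, weights = zip(*WORKLOAD)
    return rng.choices(ops, weights=weights, k=n)
```

Feeding 1000 such draws to the endpoints reproduces the "mix of requests modeled by workload" benchmark below.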
Described below are the metrics that we used.
5.1 THROUGHPUT - QUERIES / SECOND SERVED BY 1 MACHINE
We test the throughput of our machines by first warming up the cache of the servers, as well as
the rest of our infrastructure, by making some random requests. We then test each individual
request endpoint by making 1000 requests from different processes and taking the average
time it takes for every worker to finish its load. Between endpoints we make an effort
to reset the cache by using a random workload. Finally, we test our throughput for our
predicted workload by making 1000 requests in a random ordering that follows the percentages
set by our expectations.
We obtain the following values:
Global Benchmarks:
1000 Requests:
Mix of requests modeled by workload: 130 requests/s
Delete and create reservations: 42 requests/s
Get all user reservations: 44 requests/s
Get parking lot information: 189 requests/s
500 Requests:
Mix of requests modeled by workload: 111 requests/s
Delete and create reservations: 41 requests/s
Get all user reservations: 42 requests/s
Get parking lot information: 222 requests/s
5.2 LATENCY OF USER REQUEST
Now that we know our average throughput, we use it to simulate our predicted workload
at 60% of capacity. We then test the latency by putting the server under the expected workload,
making 10 requests per user and measuring the average latency per request (ms).

Cluster-specific benchmarks:
10 Requests:
Mix of requests modeled by workload: 80 ms/request
Delete and create reservations: 119 ms/request
Get all user reservations: 79 ms/request
Get parking lot information: 92 ms/request
5.3 LATENCY OF SENSOR DATA
This metric could not be fully tested, as physical sensors are not present.
5.4 ML MODEL
We evaluate the effectiveness of our model using RMSE (root mean squared error). For
n predictions, RMSE = sqrt((1/n) * Σ_i (ŷ_i − y_i)²), where ŷ_i represents the predicted
number of parking spots available m minutes into the future, and y_i is the true number
of parking spots available at that future time.
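The metric is straightforward to compute; a minimal sketch:

```python
from math import sqrt

def rmse(predictions, actuals):
    """Root mean squared error over paired predictions and ground truth."""
    n = len(predictions)
    return sqrt(sum((p - a) ** 2 for p, a in zip(predictions, actuals)) / n)
```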
With 241,196 rows of data distributed over 17 parking lots stored in MongoDB,
our Spark models are computed and stored to S3 in 344
seconds (under 6 minutes). This should scale extremely well, as we would retrain the model at
most once a day, and the MapReduce framework Spark is built on scales extremely well as
the size of the dataset increases.
When running the model on an AWS EMR (Elastic MapReduce) cluster with a single
m4.large instance, we observed a roughly 1.5x slowdown (about 9 minutes).
5.5 INTERPRETING BENCHMARK RESULTS:
We note that our servers are able to scale linearly with the number of requests they receive,
as seen in the difference between the 500- and 1000-request workloads. Our results could
be improved by switching from the Flask development server to a real web server like nginx,
interfaced through Gunicorn, to allow more than one worker per server. Additionally, the app
could benefit greatly from using Redis to cache sensor and reservation data amongst
the multiple workers per server. We also note that some of our accesses to reservations appear
slow because we have overloaded DynamoDB with a substantial amount of testing data. We are also
limited in benchmarking by our own computers, which only have a certain number of cores to run
the multiprocessing code and therefore have to wait to acquire and release resources.
5.6 FAULT TOLERANCE:
We test our fault tolerance by removing individual pieces of our infrastructure and noting that
the system still runs as expected. There are different instances of fault tolerance within the
system. The servers (EdgeServers and ProxyServers) are f +1 fault tolerant at the level of the
container — as long as there isn’t a total failure, the containerized applications will be restarted
automatically by AWS. It is still f +1 fault tolerant at the level of the EC2 Instances, but the
system takes longer to recover in this case. The ALB allows for fault tolerance of requests — if
a response is not received by the ALB (which passes requests between servers and client), it
will retransmit the request to another instance. DynamoDB is fault tolerant based on the DHT
architecture by replicating ranges in the key space across multiple nodes in the ring.
5.7 NOTES:
We minimize the error of our measurements by using the Python multiprocessing
library, which circumvents the false measurements observed when using classic Python threading
due to GIL (Global Interpreter Lock) overhead. Additional testing to verify the accuracy of our
findings could be done by profiling our code with a profiler.
6 MILESTONES
0 (Completed March 7, Kush) - Milestone Completed
Found good data sources online of parking lots and their availability over a long period
of time. Will decide how to use them to generate realistic data for when we test scaling
our application to handle multiple sensors at many locations.
0.5 (Completed March 7, Chris & Daniel) - Milestone Completed
Locally created a script to simulate multiple sensors that send data (in the form described
in the Data section) every 3 seconds to our server. Created script to send multiple client
query requests simultaneously.
1 (Completed March 15, Chris & Daniel & Kush) - Milestone Completed
Implemented Server + Database Infrastructure locally for multiple parking lots, by
running server and database instances locally. This involved taking in the data described
in the data section and finding some viable options for the user to park. Additionally, we
modeled the most recent sensor data held by the servers as an in-memory data structure.
We implemented a REST API for the clients to successfully make requests to the server,
and created connectors to handle parts of the business logic. This resulted in running a
full web server (nginx & WSGI) locally.
2 (Completed March 22, Chris & Daniel) - Milestone Completed
Test Run+DevOps: We locally tested our single server implementation for a few parking
lots. We deployed our single server application (along with the database) on an AWS EC2
t1.micro instance, using Ansible to handle server configuration.
* (March 27): INTERMEDIATE REPORT DUE
2 (Due April 1, Kush) - Milestone Completed
Implement ML model (for a single node) to predict trends in number of open parking
spots of parking lots and streets from the sample data generated in step 0. - Learn
time-series trends of how parking lots behave over the course of a day.
Update: Explored Pytorch for neural network architectures that would be feasible for
the limited number of features we have. Although Pytorch would support recurrent
neural network/LSTMs that function very well with time series data, integrating with
a distributed training framework would require lots of code rewriting. Decided on a
simple linear regression model that is best suited for distributed training using Spark.
A multilayer perceptron could also be entertained, if we obtained more training data
including features like weather.
4 (Due April 10, Chris) - Milestone Completed
Implement pub-sub model, with a subset of sensors publishing to a specific "Area" topic.
Allow the CH-micro-service to subscribe to an "Area" topic.
3 (Due April 12, Kush & Daniel & Chris) - Milestone Completed
Core business logic: Now that the ML model is functional we plan to update our core
business logic to have better routing based on what the model gives us. This involves
pushing the model to the edge (which we will figure out) and updating how our
predictions interface with the model. At this point we are still running on one machine.
4 (Due April 17, Daniel & Chris) - Milestone Completed
Create micro-service that is responsible for handling client requests (henceforth the
CH-micro-service), such as range queries or making parking spot reservations. Make
this micro-service an auto-scaling group, add load balancing, and ensure that it is
fault-tolerant.
4 (Due April 17, Chris & Kush) - Milestone Completed
Create Routing-micro-service that is responsible for mapping parking lots to "Area"
groups. These groups must be overlapping in order to add redundancy, and possibly
even dynamically add new mappings of parking lots. Add load-balancing and auto-
scaling to this micro-service.
6 (Due May 8, Kush) - Milestone Completed
Update the infrastructure for logging the data that the ML model(predicting user behav-
ior to predict which spots will open/close up in the future) will use to train itself. Train
ML model on a cluster of machines in a distributed manner.
7 (Due May 8, Kush & Daniel & Chris) - Milestone Completed
Crash mechanics for fault tolerance testing. We plan to implement a crash command
that can automatically shut down a server to conduct further fault tolerance testing.
8 (Due May 10, Chris & Daniel) Extra Features - Milestone Completed
Extra business logic to create reservations of parking spots.
9 (Due May 12, Daniel & Chris & Kush) Evaluations - Milestone Completed
Create some metrics of evaluations so we know how well our system is doing and if we
need to change some of our implementation. Create a reusable framework for this so
we can use it in the later stages of the project. Begin using the data from evaluations to
create the poster.
10 (Due May 15, Chris & Kush & Daniel) Poster + Presentation - Milestone Completed
Create our poster for the presentation and practice/prepare the presentation.
Updates
Finished all of them.