CORNELL UNIVERSITY
Final Report
Daniel Nosrati (dn259) | Kaushik Murali (km693) | Christopher Roman (cr469)
Report for May 15, 2018
1 PROJECT DESCRIPTION
Our project is an app that helps people find parking spaces. People often struggle
to find parking in heavily populated downtown areas during work commutes or nights out.
Given the user's current location, the application finds the nearest parking spot at the
destination at the estimated time of arrival. The app monitors parking availability in
real time through sensors (e.g. ground sensors, cameras, parking meters). Users can
view forecasted parking-spot availability based on a regression model trained in the
cloud. Parking lots are filtered by a distance range provided by the user, representing
how far away they are willing to park. Users can then make reservations for a specified
time period at a particular lot. The parking lots are assumed to be commercial, where
reservations can be enforced. The app also lets each user view their own reservations.
2 DATA
Streaming Input Data — (Timestamp, Lot Name, Latitude, Longitude, Available Spots)
Example Data Source — https://data.sandiego.gov/datasets/parking-meters-transactions/
Static information about each parking lot/street parking location
1. Parking lot ID (int)
2. Timestamps (Date/Time objects)
3. GPS coordinates (long, lat) for interfacing with the Google API
There are a few more pieces of data we would love to incorporate in the future:
1. Incorporate weekly schedule of a parking lot
2. The option for users to query our application based on price of parking
3. Personalized User Accounts
3 INTERFACE
We have provided an interface through a REST API.
LotRangeQuery(destLat: Latitude, destLon: Longitude, lotRange: Distance in km)
This endpoint stores a geographical mapping as well as a mapping from parking lot IDs to
server clusters. It fetches the parking lot IDs that fall within the specified lotRange,
then makes requests, through a load balancer, to the edge servers that hold information
about these lots.
CreateReservation(userID, parkingLotID, startTimestamp, endTimestamp)
Creates reservations, and stores them in a DHT for querying.
GetUserReservations(userID)
Retrieves all reservations stored in the DHT for the given user.
DisplaySensorData(areaID)
This API uses the current time to retrieve the most up-to-date availability for all the
parking lots stored on that server.
SensorPrediction(areaID, time=15)
This API uses a machine learning model stored on the edge to forecast parking lot availability.
The forecast horizon is currently fixed at 15 minutes, as the model was trained for that use case.
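Since the interface is exposed as a REST API, a client exercises it with plain HTTP calls. The sketch below is illustrative only: the host name and route paths are assumptions, since the report specifies operation signatures rather than URLs.

import requests

BASE = "http://lb.example.com"  # hypothetical load balancer DNS name

# Find lots within 1.25 km of the destination (route name is assumed).
resp = requests.get(BASE + "/lot_range_query", params={
    "destLat": 34.02516,
    "destLon": -118.50977,
    "lotRange": 1.25,
})
print(resp.json())  # {"data": {...}, "success": true}

# Reserve a spot in lot 1 for roughly an hour (Unix-second timestamps).
resp = requests.post(BASE + "/create_reservation", json={
    "userID": 100,
    "parkingLotID": 1,
    "startTimestamp": 1526407849,
    "endTimestamp": 1526411449,
})
print(resp.json()["success"])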
Example 1:
Input: LotRangeQuery(destLat = 34.02516, destLon = -118.50977, lotRange = 1.25)
Output:
{
  "data": {
    "1": {
      "prediction": 297.0987536268288,
      "updated_at": "03/15/18 07:45"
    },
    "5": {
      "prediction": 655.4239395889163,
      "updated_at": "03/15/18 07:40"
    },
    "6": {
      "prediction": 786.9102123088978,
      "updated_at": "03/15/18 07:45"
    },
    "8": {
      "prediction": 150.55680082440648,
      "updated_at": "03/15/18 07:40"
    }
  },
  "success": true
}
Example 2:
Input: GetUserReservations(userID = 100)
Output (raw DynamoDB items as returned to Python):
{
  u'data': {
    u'reservations': [
      {
        u'area_id': {u'N': u'1'},
        u'end_time': {u'N': u'1526411449'},
        u'lot_id': {u'N': u'1'},
        u'reservation_id': {u'S': u'100;1;1526407849'},
        u'start_time': {u'N': u'1526407849'},
        u'user_id': {u'N': u'100'}
      },
      {
        u'area_id': {u'N': u'0'},
        u'end_time': {u'N': u'1526411434'},
        u'lot_id': {u'N': u'0'},
        u'reservation_id': {u'S': u'100;0;1526407834'},
        u'start_time': {u'N': u'1526407834'},
        u'user_id': {u'N': u'100'}
      },
      {
        u'area_id': {u'N': u'1'},
        u'end_time': {u'N': u'1526411458'},
        u'lot_id': {u'N': u'1'},
        u'reservation_id': {u'S': u'100;1;1526407858'},
        u'start_time': {u'N': u'1526407858'},
        u'user_id': {u'N': u'100'}
      }
    ]
  },
  u'success': True
}
4 ARCHITECTURE
Our architecture consists of the following components: sensors, Amazon SNS pub/sub,
proxy servers, edge servers, a load balancer that routes to the different edge and proxy
servers, Amazon S3 storage, MongoDB, Amazon EMR/Spark, and DynamoDB.
The lifetime of a request starts at the client, which contacts the DNS name of the load
balancer to make a request. From there the proxy routes the request to the appropriate
edge server, which processes it using the data it has from the sensors as well as the ML
model and DynamoDB. Meanwhile, in the background, sensors publish lot data on a 5-minute
interval to the publish/subscribe topics, which the relevant edge servers pull from (this
is established through a mapping of edge servers to IP addresses, explained below); the
readings are kept in memory due to their small size. Additionally, as the edge servers
poll in this data, they push some of it to MongoDB, which stores the data for processing
by the Spark cluster. The Spark cluster retrains the model on the new data at a set time
interval and pushes the models to S3, from which the edge servers pull on a regular basis.
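As a concrete illustration of the sensor-to-topic path, the following sketch shows a simulated sensor publishing a reading every five minutes; the SNS topic ARN and the exact topic naming are assumptions, with the payload fields mirroring the streaming format from Section 2.

import json
import time
import boto3

sns = boto3.client("sns", region_name="us-east-1")
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:area-1"  # hypothetical topic

while True:
    reading = {
        "timestamp": int(time.time()),
        "lot_name": "Lot 1",
        "latitude": 34.02516,
        "longitude": -118.50977,
        "available_spots": 297,
    }
    sns.publish(TopicArn=TOPIC_ARN, Message=json.dumps(reading))
    time.sleep(300)  # sensors report on a 5-minute interval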
Both Edge Servers and Proxy Servers are containerized to take advantage of the fault
tolerance guarantees provided by AWS ECS and to allow for quick reboots, since restarting
a container takes minimal time (about 30 seconds).
As far as the architecture goes, we made plenty of assumptions and decisions that need to
be justified. Initially, we thought MySQL would be a great way to store reservations in
the backend, as we could query on the expiry time of reservations in order to remove them
from the database. Using the size of the table, we could determine whether new reservations
could be accommodated, so consistency of the database seemed essential. However, after
creating the MySQL EC2 cluster, we found that a read-heavy load slowed down queries
considerably. Essentially, our cluster distribution looked like a DHT backed by a SQL
rather than a NoSQL database. Additionally, the use of transactions for most queries in
MySQL would prove harmful for most endpoints, such as making reservations.
After seriously considering the tradeoff in consistency, we found that because of the way
we partition using our mapping, we are able to service range queries without touching
multiple nodes in the DHT. DynamoDB, a NoSQL database, provided us a good way of sharding
our data across machines. We used area as the partition key because it allows us to direct
requests to a specific machine in the cluster. Within the mapping, an area refers to a
group of lots grouped by location. This makes sense, as an entire area will probably fit
on a single machine, so only one machine needs to be contacted to get all the necessary
data. Additionally, our use of indices avoids table scans and lets us use queries for all
operations. Thus DynamoDB proved to be a great choice.
We used our indices in the DHT to optimize our workload. For user requests (e.g. how many
reservations a user has), we used a new primary index on user_id that retrieves all the
data for a specific user. We used an index on lot_id and end_time to help with deletion of
old reservations as well as checking whether new reservations are possible. We used an
index on area_id ordered by lot_id to get specific information about lots and to allow for
strongly consistent reads in the event that our model requires them. Additionally, we are
able to shard the DHT itself in the event that our app goes global, thereby allowing us to
circumvent the scalability limits of a single DHT.
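These access patterns translate directly into DynamoDB queries. Below is a minimal boto3 sketch; the table and index names are assumptions, while the key layout (user_id, lot_id plus end_time, area_id) follows the text.

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("reservations")  # hypothetical name

# All reservations for one user, via the user_id secondary index.
user_resp = table.query(
    IndexName="user_id-index",
    KeyConditionExpression=Key("user_id").eq(100),
)

# Reservations for lot 1 that end after a given time: used both to check
# whether a new reservation conflicts and to find expired rows to delete.
lot_resp = table.query(
    IndexName="lot_id-end_time-index",
    KeyConditionExpression=Key("lot_id").eq(1) & Key("end_time").gt(1526407849),
)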
Hotspots in the DHT, such as Manhattan or Downtown Los Angeles, can be avoided by carefully
designing the mapping that determines on which machine an area lies. Additionally, we can
alter the hash function to control how data gets distributed in the DHT based on the
partition keys. In such cases we are forced to trade off locality of access against
hotspot elimination.
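One standard way to realize that tradeoff is to salt the partition keys of known-hot areas so their traffic spreads over several partitions. The sketch below is a hypothetical illustration, not something the system currently implements; the area names and salt counts are made up.

import random

HOT_AREA_SALTS = {"manhattan": 8, "dtla": 4}  # hypothetical hot areas

def partition_key(area_id):
    # Cold areas keep a single partition (full locality); hot areas are
    # spread over n salted partitions, so reads must fan out over all n.
    n = HOT_AREA_SALTS.get(area_id, 1)
    if n == 1:
        return area_id
    return "%s#%d" % (area_id, random.randrange(n))

The cost is exactly the loss of locality noted above: a query over a salted area must issue one request per salt and merge the results.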
For the machine learning model, we anticipated a large amount of data flowing into the
application, including pertinent information on how users made reservations as well as
direct sensor data reflecting how cars flowed into parking lots over the course of the day.
100% of the sensor data received by an edge server through an SQS queue flows through to a
cluster of EC2 nodes running MongoDB. Currently, all the data is stored in one MongoDB
collection because the number of parking lots is under 20, but we can easily scale by
storing one collection per parking lot.
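A minimal sketch of this write path, assuming one shared collection as described; the connection string and database/collection names are hypothetical.

from pymongo import MongoClient

client = MongoClient("mongodb://mongo-node-1:27017")  # hypothetical host
readings = client["parking"]["sensor_readings"]

def store_reading(reading):
    # Persist one sensor reading (a dict in the Section 2 streaming format).
    readings.insert_one(reading)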
We chose MongoDB because we needed a database that could service a heavy write load fairly
quickly while still allowing for some replication. MongoDB was not only the easiest to
deploy as a cluster, but it also has a simple connector for integrating with Spark. Thus,
it was overall better suited to our use case than other NoSQL options offered by cloud
providers, such as HBase.
Spark lends itself to our application as it is the leading cluster-computing framework for
building distributed machine learning applications. Since the machines rented in the AWS
EMR cluster had plenty of memory, using Hadoop directly as a MapReduce framework would
have significantly slowed down our rate of development. Spark takes the best of Hadoop by
incorporating pieces such as HDFS and YARN.
Spark Core functions were used to preprocess the data and create feature vectors. Spark
MLlib was used to train a linear regression model using SGD. Stochastic gradient descent
was a fantastic option because it trains the model on random samples without making a full
pass through the entire dataset, converging on the loss-minimizing parameters faster. In
our case, the parameters were weights corresponding to sensor data drawn from the last
four hours. These parameters were stored as weight vectors in files in AWS S3, as AWS EMR
provides preinstalled dependencies to do so. A thread running on the edge server polled
S3 every 10 minutes to read an updated model, if one was found.
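The training job can be sketched as follows. It assumes the sensor rows have already been preprocessed into (label, features) pairs, where the label is the availability 15 minutes ahead and the features are the preceding readings; the input path and S3 bucket name are assumptions.

import boto3
from pyspark import SparkContext
from pyspark.mllib.regression import LabeledPoint, LinearRegressionWithSGD

sc = SparkContext(appName="lot-availability-train")

# Hypothetical preprocessed input: an RDD of (label, feature_list) pairs.
rows = sc.pickleFile("hdfs:///training/rows")
points = rows.map(lambda r: LabeledPoint(r[0], r[1]))

model = LinearRegressionWithSGD.train(points, iterations=100, step=0.01)

# Persist the weight vector to S3 for the edge servers to poll.
weights_csv = ",".join(str(w) for w in model.weights)
boto3.client("s3").put_object(
    Bucket="lot-explorer-models",   # hypothetical bucket
    Key="lot_model/weights.csv",
    Body=weights_csv,
)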
As our model was just linear regression, making a prediction was very simple, since each
edge server stores the last four hours of sensor data in local memory. If a more complex
model were used, we would run a separate cluster of machines dedicated to predictions:
requests would be routed from the proxy server to a prediction server hosted on a machine
with cheap GPUs, which would vectorize the computations that currently happen on the edge
and compute results much faster. To increase model complexity, we would also incorporate
many more features, such as the price of parking at different times of day, weather
conditions, etc.
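On the edge side, the polling thread and the prediction itself reduce to a few lines. This is a sketch under the same assumptions as the training sketch above (a CSV weight file in a hypothetical bucket), with the prediction computed as a dot product over the in-memory four-hour window.

import threading
import boto3
import numpy as np

s3 = boto3.client("s3")
weights = np.zeros(48)  # e.g. 4 hours of 5-minute readings = 48 features

def refresh_model():
    global weights
    obj = s3.get_object(Bucket="lot-explorer-models", Key="lot_model/weights.csv")
    body = obj["Body"].read().decode()
    weights = np.array([float(w) for w in body.split(",")])
    threading.Timer(600, refresh_model).start()  # poll again in 10 minutes

def predict(recent_readings):
    # recent_readings: availability samples from the last four hours.
    return float(np.dot(weights, recent_readings))

refresh_model()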
5 EVALUATION
To evaluate our system, we will focus on measuring the latency of requests and responses,
throughput of queries, fault tolerance, accuracy of predictions, and how easily nodes can be
added and removed. We will also evaluate our model through the theory behind the individual
portions that make up our architecture.
We assume the following request mix and use it to model all tests:
Get available parking lots: 50%
Create reservations: 30%
Delete reservations: 15%
Get all reservations by user: 5%
We also assume that the servers will on average be running at 60% of their maximum capacity.
Described below are the metrics that we used.
5.1 THROUGHPUT - QUERIES / SECOND SERVED BY 1 MACHINE
We test the throughput of our machines by first warming up the server caches and the rest
of our infrastructure with some random requests. We then test each individual endpoint by
making 1000 requests from different processes and taking the average time for every worker
to finish its load. Between endpoints we reset the cache using a random workload. Finally,
we test throughput under our predicted workload by making 1000 requests in a random order
that follows the percentages above.
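A sketch of the driver we have in mind appears below: worker processes (multiprocessing, to sidestep the GIL as noted in Section 5.7) fire requests drawn from the assumed workload mix, and throughput is total requests over wall time. The endpoint URLs are hypothetical, and all calls are GETs for brevity.

import random
import time
import requests
from multiprocessing import Pool

MIX = [("lots", 0.50), ("create", 0.30), ("delete", 0.15), ("user", 0.05)]
BASE = "http://lb.example.com"  # hypothetical load balancer DNS name
ENDPOINTS = {
    "lots": BASE + "/lot_range_query?destLat=34.0&destLon=-118.5&lotRange=1.25",
    "create": BASE + "/create_reservation",
    "delete": BASE + "/delete_reservation",
    "user": BASE + "/user_reservations?userID=100",
}

def one_request(_):
    kind = random.choices([k for k, _ in MIX], weights=[w for _, w in MIX])[0]
    requests.get(ENDPOINTS[kind])

if __name__ == "__main__":
    start = time.time()
    with Pool(processes=8) as pool:
        pool.map(one_request, range(1000))
    print("throughput: %.1f requests/s" % (1000 / (time.time() - start)))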
We obtain the following values:
Global Benchmarks:
1000 Requests:
Mix of requests modeled by workload: 130 requests/s
Delete and create reservations: 42 requests/s
Get all user reservations: 44 requests/s
Get parking lot information: 189 requests/s
500 Requests:
Mix of requests modeled by workload: 111 requests/s
Delete and create reservations: 41 requests/s
Get all user reservations: 42 requests/s
Get parking lot information: 222 requests/s
5.2 LATENCY OF USER REQUEST
Now that we know our average throughput, we use it to simulate our predicted workload at
60% capacity. We then measure latency by putting the server under the expected workload
and making 10 requests per user, reporting the average latency per request (ms).
Cluster-specific benchmarks:
10 Requests:
Mix of requests modeled by workload: 80 ms/request
Delete and create reservations: 119 ms/request
Get all user reservations: 79 ms/request
Get parking lot information: 92 ms/request
5.3 LATENCY OF SENSOR DATA
This section could not be fully tested, as physical sensors are not present.
5.4 ML MODEL
We evaluate the effectiveness of our model using RMSE (root mean squared error). For n
predictions this is sqrt((1/n) * Σᵢ (ŷᵢ − yᵢ)²), where ŷ represents the predicted number
of parking spots available m minutes into the future, and y is the true number of parking
spots available at that future time.
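Computing this is a one-liner; a minimal sketch:

import numpy as np

def rmse(y_pred, y_true):
    # Square root of the mean squared difference between predictions and truth.
    diff = np.asarray(y_pred) - np.asarray(y_true)
    return float(np.sqrt(np.mean(diff ** 2)))

# e.g. rmse([297.1, 655.4], [300.0, 650.0]) is roughly 4.3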
Assuming we have 241,196 rows of data distributed over 17 parking lots stored in MongoDB,
our Spark models are computed using Spark libraries and stored to S3 in 344 seconds
(under 6 minutes). This should scale extremely well, as we would retrain the model at
most once a day, and the MapReduce framework Spark is built on scales well as the size
of the dataset increases.
When running the model on the AWS EMR (Elastic MapReduce) service on a cluster with a
single M4.large instance, we observed a 1.5x slowdown (about 9 minutes).
5.5 INTERPRETING BENCHMARK RESULTS:
We note that our servers scale roughly linearly with the number of requests they receive,
as seen from the difference between the 500- and 1000-request workloads. Our results could
be improved by switching from the Flask development server to a real webserver like nginx,
interfaced through gunicorn, in order to have more than one worker per server. Additionally,
the app could benefit greatly from using Redis to cache sensor and reservation data amongst
the multiple workers on each server. We also note that some of our reservation accesses
appear slow because we have loaded DynamoDB with a substantial amount of testing data.
Finally, we are limited in benchmarking by our own computers, which have a limited number
of cores to run the multiprocessing code and therefore must wait to acquire and release
resources.
5.6 FAULT TOLERANCE:
We test fault tolerance by removing individual pieces of our infrastructure and verifying
that the system still runs as expected. There are different instances of fault tolerance
within the system. The servers (EdgeServers and ProxyServers) are f + 1 fault tolerant at
the level of the container: as long as there isn't a total failure, the containerized
applications will be restarted automatically by AWS. The system is still f + 1 fault
tolerant at the level of the EC2 instances, but takes longer to recover in that case. The
ALB provides fault tolerance for requests: if the ALB (which passes requests between
servers and clients) does not receive a response, it retransmits the request to another
instance. DynamoDB is fault tolerant through its DHT architecture, replicating ranges of
the key space across multiple nodes in the ring.
5.7 NOTES:
We minimize measurement error by using the Python multiprocessing library, which avoids
the skewed measurements observed with classic Python threading due to GIL (Global
Interpreter Lock) overhead. Additional verification of our findings could be done by
profiling our code.
6 MILESTONES
0 (Completed March 7, Kush) - Milestone Completed
Found good data sources online of parking lots and their availability over a long period
of time. Will decide how to use them to generate realistic data for when we test scaling
our application to handle multiple sensors at many locations.
0.5 (Completed March 7, Chris & Daniel) - Milestone Completed
Locally created a script to simulate multiple sensors that send data (in the form described
in the Data section) every 3 seconds to our server. Created script to send multiple client
query requests simultaneously.
1 (Completed March 15, Chris & Daniel & Kush) - Milestone Completed
Implemented server + database infrastructure locally for multiple parking lots, by
running server and database instances locally. This involved taking in the data described
in the Data section and finding some viable parking options for the user. Additionally, we
modeled the most recent sensor data held by the servers as an in-memory data structure.
We implemented a REST API for clients to successfully make requests to the server,
and created connectors to handle parts of the business logic. This resulted in running a
full webserver (nginx & WSGI) locally.
2 (Completed March 22, Chris & Daniel) - Milestone Completed
Test run + DevOps: We locally tested our single-server implementation for a few parking
lots. We deployed our single-server application (along with the database) to an AWS EC2
t1.micro instance, using Ansible to handle server configuration.
* (March 27): INTERMEDIATE REPORT DUE
2 (Due April 1, Kush) - Milestone Completed
Implement ML model (for a single node) to predict trends in number of open parking
spots of parking lots and streets from the sample data generated in step 0. - Learn
time-series trends of how parking lots behave over the course of a day.
Update: Explored PyTorch for neural network architectures that would be feasible for
the limited number of features we have. Although PyTorch supports recurrent neural
networks/LSTMs, which work very well with time-series data, integrating it with
a distributed training framework would have required lots of code rewriting. We decided
on a simple linear regression model, which is best suited for distributed training using
Spark. A multilayer perceptron could also be entertained if we obtained more training
data, including features like weather.
4 (Due April 10, Chris) - Milestone Completed
Implement pub-sub model, with a subset of sensors publishing to a specific "Area" topic.
Allow the CH-micro-service to subscribe to an "Area" topic.
3 (Due April 12, Kush & Daniel & Chris) - Milestone Completed
Core business logic: Now that the ML model is functional, we plan to update our core
business logic to have better routing based on what the model gives us. This involves
pushing the model to the edge (which we will figure out) and updating how our predictions
interface with the model. At this point we are still running on one machine.
4 (Due April 17, Daniel & Chris) - Milestone Completed
Create micro-service that is responsible for handling client requests (henceforth the
CH-micro-service), such as range queries or making parking spot reservations. Make
this micro-service an auto-scaling group, add load balancing, and ensure that it is
fault-tolerant.
4 (Due April 17, Chris & Kush) - Milestone Completed
Create Routing-micro-service that is responsible for mapping parking lots to "Area"
groups. These groups must be overlapping in order to add redundancy, and possibly
even dynamically add new mappings of parking lots. Add load-balancing and auto-
scaling to this micro-service.
6 (Due May 8, Kush) - Milestone Completed
Update the infrastructure for logging the data that the ML model (predicting user
behavior to predict which spots will open or close in the future) will use to train
itself. Train the ML model on a cluster of machines in a distributed manner.
7 (Due May 8, Kush & Daniel & Chris) - Milestone Completed
Crash mechanics for fault tolerance testing. We plan to implement a crash command
that can automatically shut down a server to conduct further fault tolerance testing.
8 (Due May 10, Chris & Daniel) Extra Features - Milestone Completed
Extra business logic to create reservations of parking spots.
9 (Due May 12, Daniel & Chris & Kush) Evaluations - Milestone Completed
Create some evaluation metrics so we know how well our system is doing and whether we
need to change parts of our implementation. Create a reusable framework for this so
we can use it in the later stages of the project. Begin using the data from the
evaluations to create the poster.
10 (Due May 15, Chris & Kush & Daniel) Poster + Presentation - Milestone Completed
Create our poster for the presentation and practice/prepare the presentation.
Updates
All milestones were completed.

More Related Content

What's hot

Location Analytics - Real Time Geofencing using Apache Kafka
Location Analytics - Real Time Geofencing using Apache KafkaLocation Analytics - Real Time Geofencing using Apache Kafka
Location Analytics - Real Time Geofencing using Apache KafkaGuido Schmutz
 
Locality Sensitive Hashing By Spark
Locality Sensitive Hashing By SparkLocality Sensitive Hashing By Spark
Locality Sensitive Hashing By SparkSpark Summit
 
A 3 dimensional data model in hbase for large time-series dataset-20120915
A 3 dimensional data model in hbase for large time-series dataset-20120915A 3 dimensional data model in hbase for large time-series dataset-20120915
A 3 dimensional data model in hbase for large time-series dataset-20120915Dan Han
 
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...Mathieu Dumoulin
 
Resilient Distributed Datasets
Resilient Distributed DatasetsResilient Distributed Datasets
Resilient Distributed DatasetsGabriele Modena
 
Big Data Analytics with Amazon Web Services
Big Data Analytics with Amazon Web ServicesBig Data Analytics with Amazon Web Services
Big Data Analytics with Amazon Web ServicesAmazon Web Services
 
MapR and Machine Learning Primer
MapR and Machine Learning PrimerMapR and Machine Learning Primer
MapR and Machine Learning PrimerMathieu Dumoulin
 
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...Carol McDonald
 
Application of MapReduce in Cloud Computing
Application of MapReduce in Cloud ComputingApplication of MapReduce in Cloud Computing
Application of MapReduce in Cloud ComputingMohammad Mustaqeem
 
Finding URL pattern with MapReduce and Apache Hadoop
Finding URL pattern with MapReduce and Apache HadoopFinding URL pattern with MapReduce and Apache Hadoop
Finding URL pattern with MapReduce and Apache HadoopNushrat
 
Approximation algorithms for stream and batch processing
Approximation algorithms for stream and batch processingApproximation algorithms for stream and batch processing
Approximation algorithms for stream and batch processingGabriele Modena
 
A comparative survey based on processing network traffic data using hadoop pi...
A comparative survey based on processing network traffic data using hadoop pi...A comparative survey based on processing network traffic data using hadoop pi...
A comparative survey based on processing network traffic data using hadoop pi...ijcses
 

What's hot (13)

Location Analytics - Real Time Geofencing using Apache Kafka
Location Analytics - Real Time Geofencing using Apache KafkaLocation Analytics - Real Time Geofencing using Apache Kafka
Location Analytics - Real Time Geofencing using Apache Kafka
 
Locality Sensitive Hashing By Spark
Locality Sensitive Hashing By SparkLocality Sensitive Hashing By Spark
Locality Sensitive Hashing By Spark
 
A 3 dimensional data model in hbase for large time-series dataset-20120915
A 3 dimensional data model in hbase for large time-series dataset-20120915A 3 dimensional data model in hbase for large time-series dataset-20120915
A 3 dimensional data model in hbase for large time-series dataset-20120915
 
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
 
Resilient Distributed Datasets
Resilient Distributed DatasetsResilient Distributed Datasets
Resilient Distributed Datasets
 
Big Data Analytics with Amazon Web Services
Big Data Analytics with Amazon Web ServicesBig Data Analytics with Amazon Web Services
Big Data Analytics with Amazon Web Services
 
MapR and Machine Learning Primer
MapR and Machine Learning PrimerMapR and Machine Learning Primer
MapR and Machine Learning Primer
 
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...
 
Application of MapReduce in Cloud Computing
Application of MapReduce in Cloud ComputingApplication of MapReduce in Cloud Computing
Application of MapReduce in Cloud Computing
 
Pig Experience
Pig ExperiencePig Experience
Pig Experience
 
Finding URL pattern with MapReduce and Apache Hadoop
Finding URL pattern with MapReduce and Apache HadoopFinding URL pattern with MapReduce and Apache Hadoop
Finding URL pattern with MapReduce and Apache Hadoop
 
Approximation algorithms for stream and batch processing
Approximation algorithms for stream and batch processingApproximation algorithms for stream and batch processing
Approximation algorithms for stream and batch processing
 
A comparative survey based on processing network traffic data using hadoop pi...
A comparative survey based on processing network traffic data using hadoop pi...A comparative survey based on processing network traffic data using hadoop pi...
A comparative survey based on processing network traffic data using hadoop pi...
 

Similar to Lot Explorer Report

Cloud Has Become the New Normal: TCS
Cloud Has Become the New Normal: TCS Cloud Has Become the New Normal: TCS
Cloud Has Become the New Normal: TCS Amazon Web Services
 
ESTIMATING CLOUD COMPUTING ROUND-TRIP TIME (RTT) USING FUZZY LOGIC FOR INTERR...
ESTIMATING CLOUD COMPUTING ROUND-TRIP TIME (RTT) USING FUZZY LOGIC FOR INTERR...ESTIMATING CLOUD COMPUTING ROUND-TRIP TIME (RTT) USING FUZZY LOGIC FOR INTERR...
ESTIMATING CLOUD COMPUTING ROUND-TRIP TIME (RTT) USING FUZZY LOGIC FOR INTERR...IJCI JOURNAL
 
Best Practices for Running MongoDB on AWS - AWS May 2016 Webinar Series
Best Practices for Running MongoDB on AWS - AWS May 2016 Webinar SeriesBest Practices for Running MongoDB on AWS - AWS May 2016 Webinar Series
Best Practices for Running MongoDB on AWS - AWS May 2016 Webinar SeriesAmazon Web Services
 
Whitepaper - Choosing the right cloud provider for your business
Whitepaper - Choosing the right cloud provider for your businessWhitepaper - Choosing the right cloud provider for your business
Whitepaper - Choosing the right cloud provider for your businessRick Blaisdell
 
Android chapter16-web-services
Android chapter16-web-servicesAndroid chapter16-web-services
Android chapter16-web-servicesAravindharamanan S
 
Fast Synchronization In IVR Using REST API For HTML5 And AJAX
Fast Synchronization In IVR Using REST API For HTML5 And AJAXFast Synchronization In IVR Using REST API For HTML5 And AJAX
Fast Synchronization In IVR Using REST API For HTML5 And AJAXIJERA Editor
 
Cloud computing-ieee-2014-projects
Cloud computing-ieee-2014-projectsCloud computing-ieee-2014-projects
Cloud computing-ieee-2014-projectsVijay Karan
 
Cloud Computing IEEE 2014 Projects
Cloud Computing IEEE 2014 ProjectsCloud Computing IEEE 2014 Projects
Cloud Computing IEEE 2014 ProjectsVijay Karan
 
Microsoft Windows Azure - SAOSTA Professional Services Simulate Real World In...
Microsoft Windows Azure - SAOSTA Professional Services Simulate Real World In...Microsoft Windows Azure - SAOSTA Professional Services Simulate Real World In...
Microsoft Windows Azure - SAOSTA Professional Services Simulate Real World In...Microsoft Private Cloud
 
Dataservices - Processing Big Data The Microservice Way
Dataservices - Processing Big Data The Microservice WayDataservices - Processing Big Data The Microservice Way
Dataservices - Processing Big Data The Microservice WayJosef Adersberger
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
Cost to Serve of large scale Online Systems - final
Cost to Serve of large scale Online Systems - finalCost to Serve of large scale Online Systems - final
Cost to Serve of large scale Online Systems - finalAndrés Paz
 
LOCALITY SIM: CLOUD SIMULATOR WITH DATA LOCALITY
LOCALITY SIM: CLOUD SIMULATOR WITH DATA LOCALITY LOCALITY SIM: CLOUD SIMULATOR WITH DATA LOCALITY
LOCALITY SIM: CLOUD SIMULATOR WITH DATA LOCALITY ijccsa
 
Locality Sim : Cloud Simulator with Data Locality
Locality Sim : Cloud Simulator with Data LocalityLocality Sim : Cloud Simulator with Data Locality
Locality Sim : Cloud Simulator with Data Localityneirew J
 
50 C o m m u n i C At i o n S o f t h E A C m A P.docx
50    C o m m u n i C At i o n S  o f  t h E  A C m       A P.docx50    C o m m u n i C At i o n S  o f  t h E  A C m       A P.docx
50 C o m m u n i C At i o n S o f t h E A C m A P.docxalinainglis
 
AWS Cloud Solutions Architects & Tech Enthusiasts
AWS Cloud Solutions Architects & Tech EnthusiastsAWS Cloud Solutions Architects & Tech Enthusiasts
AWS Cloud Solutions Architects & Tech EnthusiastsJasonRoy50
 
Schema-based multi-tenant architecture using Quarkus & Hibernate-ORM.pdf
Schema-based multi-tenant architecture using Quarkus & Hibernate-ORM.pdfSchema-based multi-tenant architecture using Quarkus & Hibernate-ORM.pdf
Schema-based multi-tenant architecture using Quarkus & Hibernate-ORM.pdfseo18
 

Similar to Lot Explorer Report (20)

Cloud Has Become the New Normal: TCS
Cloud Has Become the New Normal: TCS Cloud Has Become the New Normal: TCS
Cloud Has Become the New Normal: TCS
 
ESTIMATING CLOUD COMPUTING ROUND-TRIP TIME (RTT) USING FUZZY LOGIC FOR INTERR...
ESTIMATING CLOUD COMPUTING ROUND-TRIP TIME (RTT) USING FUZZY LOGIC FOR INTERR...ESTIMATING CLOUD COMPUTING ROUND-TRIP TIME (RTT) USING FUZZY LOGIC FOR INTERR...
ESTIMATING CLOUD COMPUTING ROUND-TRIP TIME (RTT) USING FUZZY LOGIC FOR INTERR...
 
Best Practices for Running MongoDB on AWS - AWS May 2016 Webinar Series
Best Practices for Running MongoDB on AWS - AWS May 2016 Webinar SeriesBest Practices for Running MongoDB on AWS - AWS May 2016 Webinar Series
Best Practices for Running MongoDB on AWS - AWS May 2016 Webinar Series
 
Whitepaper - Choosing the right cloud provider for your business
Whitepaper - Choosing the right cloud provider for your businessWhitepaper - Choosing the right cloud provider for your business
Whitepaper - Choosing the right cloud provider for your business
 
Android chapter16-web-services
Android chapter16-web-servicesAndroid chapter16-web-services
Android chapter16-web-services
 
Fast Synchronization In IVR Using REST API For HTML5 And AJAX
Fast Synchronization In IVR Using REST API For HTML5 And AJAXFast Synchronization In IVR Using REST API For HTML5 And AJAX
Fast Synchronization In IVR Using REST API For HTML5 And AJAX
 
Cloud computing-ieee-2014-projects
Cloud computing-ieee-2014-projectsCloud computing-ieee-2014-projects
Cloud computing-ieee-2014-projects
 
Cloud Computing IEEE 2014 Projects
Cloud Computing IEEE 2014 ProjectsCloud Computing IEEE 2014 Projects
Cloud Computing IEEE 2014 Projects
 
Microsoft Windows Azure - SAOSTA Professional Services Simulate Real World In...
Microsoft Windows Azure - SAOSTA Professional Services Simulate Real World In...Microsoft Windows Azure - SAOSTA Professional Services Simulate Real World In...
Microsoft Windows Azure - SAOSTA Professional Services Simulate Real World In...
 
Dataservices - Processing Big Data The Microservice Way
Dataservices - Processing Big Data The Microservice WayDataservices - Processing Big Data The Microservice Way
Dataservices - Processing Big Data The Microservice Way
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
 
Cost to Serve of large scale Online Systems - final
Cost to Serve of large scale Online Systems - finalCost to Serve of large scale Online Systems - final
Cost to Serve of large scale Online Systems - final
 
LOCALITY SIM: CLOUD SIMULATOR WITH DATA LOCALITY
LOCALITY SIM: CLOUD SIMULATOR WITH DATA LOCALITY LOCALITY SIM: CLOUD SIMULATOR WITH DATA LOCALITY
LOCALITY SIM: CLOUD SIMULATOR WITH DATA LOCALITY
 
Locality Sim : Cloud Simulator with Data Locality
Locality Sim : Cloud Simulator with Data LocalityLocality Sim : Cloud Simulator with Data Locality
Locality Sim : Cloud Simulator with Data Locality
 
Cloud sim report
Cloud sim reportCloud sim report
Cloud sim report
 
50 C o m m u n i C At i o n S o f t h E A C m A P.docx
50    C o m m u n i C At i o n S  o f  t h E  A C m       A P.docx50    C o m m u n i C At i o n S  o f  t h E  A C m       A P.docx
50 C o m m u n i C At i o n S o f t h E A C m A P.docx
 
p1365-fernandes
p1365-fernandesp1365-fernandes
p1365-fernandes
 
Scheduling in CCE
Scheduling in CCEScheduling in CCE
Scheduling in CCE
 
AWS Cloud Solutions Architects & Tech Enthusiasts
AWS Cloud Solutions Architects & Tech EnthusiastsAWS Cloud Solutions Architects & Tech Enthusiasts
AWS Cloud Solutions Architects & Tech Enthusiasts
 
Schema-based multi-tenant architecture using Quarkus & Hibernate-ORM.pdf
Schema-based multi-tenant architecture using Quarkus & Hibernate-ORM.pdfSchema-based multi-tenant architecture using Quarkus & Hibernate-ORM.pdf
Schema-based multi-tenant architecture using Quarkus & Hibernate-ORM.pdf
 

Recently uploaded

Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationkaushalgiri8080
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - InfographicHr365.us smith
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEOrtus Solutions, Corp
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningVitsRangannavar
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...aditisharan08
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyFrank van der Linden
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfPower Karaoke
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 

Recently uploaded (20)

Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - Infographic
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learning
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The Ugly
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdf
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 

Lot Explorer Report

  • 1. CORNELL UNIVERSITY Final Report Daniel Nosrati (dn259) | Kaushik Murali (km693) | Christopher Roman (cr469) Report for May 15, 2018 1 PROJECT DESCRIPTION Our project is an app that intends to help people find parking spaces. People often struggle to find parking in heavily populated areas downtown during work commutes or nights out. Given the users current location, the application will be able to find the nearest parking spot at the destination at the estimated time of arrival. The app will monitor parking availabilities through sensors (e.g. ground sensors, cameras, parking meters, etc.) in real-time. Users will be able to view forecasted available parking spots based on a regression model trained in the Cloud. The parking lots will be filtered based on a distance range provided by the user, representing how far away they want their parking lots. Users can then make reservations for a specified time period at a particular lot. The parking lots are assumed to be commercial, where reservations can be enforced. Capabilities for showing the reservations for each user is provided. 2 DATA Streaming Input Data — (Timestamp, Lot Name, Latitude, Longitude, Available Spots) Example Data Source — https://data.sandiego.gov/datasets/parking-meters-transactions/ Static information about each parking lot/street parking location 1. Parking lot ID (int) 2. Timestamps (Date/Time objects) 1
  • 2. 3. GPS coordinates (long, lat) for interfacing with Google API There are a couple more pieces of data we would love to be incorporated in the future: 1. Incorporate weekly schedule of a parking lot 2. The option for users to query our application based on price of parking 3. Personalized User Accounts 3 INTERFACE We have provided an interface through a REST API. LotRangeQuery (destLocation Latitude, destLocation Longitude, lotRange Distance (km) This stores a geographical mapping as well as a parking lot ID to server cluster mapping. It fetches parking lot IDs corresponding to those that are within the lotRange specified, and then makes requests to the edge servers that contain info about these parking lots through a load balancer: CreateReservation(userID, parkingLotID, startTimestamp, endTimestamp) Creates reservations, and stores them in a DHT for querying. GetUserReservations(userID) Creates reservations, and stores them in a DHT for querying. DisplaySensorData(areaID) This API uses the current time in order to retrieve the most up-to-date parking lot availability on all the parking lots stored on that server. SensorPrediction(areaID, time=15) This API uses a machine learning model stored on the edge to forecast parking lot availability . The future is currently fixed to be 15 minutes as the model was trained for that use case. Example1: Input: LotRangeQuery(destLat = 34.02516, destLon = -118.50977, lotRange=1.25) Output: 1 { 2 "data": { 3 "1": { 4 "prediction": 297.0987536268288, 5 "updated_at": "03/15/18 07:45" 6 }, 2
  • 3. 7 "5": { 8 "prediction": 655.4239395889163, 9 "updated_at": "03/15/18 07:40" 10 }, 11 "6": { 12 "prediction": 786.9102123088978, 13 "updated_at": "03/15/18 07:45" 14 }, 15 "8": { 16 "prediction": 150.55680082440648, 17 "updated_at": "03/15/18 07:40" 18 } 19 }, 20 "success": true 21 } Example2: Input: GetUserReservations(userID=100) Output: 1 { 2 u’data ’: { 3 u’reservations ’: [ 4 { 5 u’area_id ’: {u’N’: u’1’}, 6 u’end_time ’: {u’N’: u’1526411449’}, 7 u’lot_id ’: {u’N’: u’1’}, 8 u’reservation_id ’: {u’S’: u’100;1;1526407849’}, 9 u’start_time ’: {u’N’: u’1526407849’}, 10 u’user_id ’: {u’N’: u’100’} 11 }, 12 { 13 u’area_id ’: {u’N’: u’0’}, 14 u’end_time ’: {u’N’: u’1526411434’}, 15 u’lot_id ’: {u’N’: u’0’}, 16 u’reservation_id ’: {u’S’: u’100;0;1526407834’}, 17 u’start_time ’: {u’N’: u’1526407834’}, 18 u’user_id ’: {u’N’: u’100’} 19 }, 20 { 21 u’area_id ’: {u’N’: u’1’}, 22 u’end_time ’: {u’N’: u’1526411458’}, 23 u’lot_id ’: {u’N’: u’1’}, 24 u’reservation_id ’: {u’S’: u’100;1;1526407858’}, 25 u’start_time ’: {u’N’: u’1526407858’}, 26 u’user_id ’: {u’N’: u’100’} 3
  • 4. 27 } 28 ] 29 }, 30 u’success ’: True 31 } 4 ARCHITECTURE Our architecture consists of the following components: Sensors, Amazon SNS pub/sub, Proxy Servers, Edge Servers, A load balancer to route to different edge servers and Proxy Servers, Amazo S3 storage, MongoDB, Amazon EMR/Spark and DynamoDB. The lifetime of a requests starts at the client which contacts the DNS of the load balancer in order to make a request. From there the proxy routes the request ot the appropriate edge server which processes the request using the data it has from the sensors as well as the data from the ML model and DynamoDB to handle the request. Meanwhile in the background Sensors are publishing the data on a 5 mintue period interval obtained from the lots to the publish subscribe models which get pulled in by the relevant edge servers(this is established 4
  • 5. through a mapping of edge servers to ip adresses which will be explained below) and are kept in memory to due their small size. Additionally as the edge servers poll in this data, they push some of the data to MonogDB which will store the data to be used for processing by the Spark Cluster. The Spark cluster trains the model using the new data on a set time intevrval and pushes the models to S3 from which the edge servers pull on a regular basis. Both Edge Servers and Proxy Servers are containerized to make use of the fault tolerance guarantees imposed by AWS ECS and to allow for quick reboots as rebooting the container takes a minimal(30 seconds) amount of time. As far as the architecture, we made plenty of assumptions and decisions, which need to be justified appropriately. Initially, we thought MySQL would be a great way to store reservations in the backend, as we could query on the expiry time of reservations in order to remove them from the database. Using the size of the table, we could determine whether new reservations could be accommodated, thus consistency of the database seemed essential. However, after creating the MySQL EC2 cluster, we found a read heavy load would slow down the queries considerably. Essentially, our cluster distribution appeared like a DHT, which was backed by a SQL instead of a NoSQL database. Additionally, the use of transactions and database to for most queries in MySQL would prove to be harmful for most endpoint cases such as making reservations. After seriously considering the tradeoff in consistency, we found that because of the way that we partition using our mapping we are able to service range queries, without entering multiple nodes in the DHT. DynamoDB, a NoSQL database, provided us a good way of sharding our data across machines. We used area as the partition key because it would allow us to make requests to a specific machine through a cluster. Within the mapping an area refers to a group of lots that are grouped by location. This makes as an entire area will probably be able to fit onto a machine and therefore it would only be necessary to talk to one machine to get all necessary data. Additionally our use of indices prevents us from making scans and allow us to use queries for all operations. Thus DynamoDB proved to be a great choice. We used our indices in the DHT to optimize our workload. For user requests(how many reservations a user has) we used a new primary index on useri d that would get us all the data for a specific user. We used an index for loti d and endt ime to help with deletion of old reservations as well as being able to see whether reservations are possible. We used an index for areai d ordered on loti d to get specific information about lots and to allow for strongly consistent reads in the event that our model requires us to perform them. Additionally we are also able to shard the DHT itself in the event that our app goes global thereby allowing us to circumvent the scalability limits of the DHT. Hotspots in the DHT, such as Manhatten or Downtown Los Angeles, can be avoided by carefully designing our mapping that would determine on what machine they lie. Additionally we should be able to alter the hash function to deal with how data gets distributed in the DHT based on the partition keys. We are forced to make a tradeoff between locality of access and hotness elimination to handle such cases. 
For the machine learning model, we forecasted a large amount of data flowing into the application including pertinent information on how users made reservations as well as direct sensor data that reflected how cars flowed into parking lots over the course of the day. T = 100% 5
  • 6. of the sensor data received by an edge server through an SQS queue flows through to a cluster of EC2 nodes running MongoDB. Currently, all the data is stored in one MongoDB table because the number of parking lots is less than 20, but we can easily scale by storing one MongoDB table per parking lot. We chose MongoDB because we needed a database that could service a lot of write fairly quickly while still allowing for some replication. MongoDB had not only the easiest option to deploy a cluster, but it also had a simple connector to integrate with Spark. Thus, it was overall better suited to our use case than other NoSQL services that cloud providers provided, such as HBase. Spark lends itself to our application as it is the top of the market cluster computing frame- work used for creating distributed machine learning applications. Considering the machines rented on the AWS EMR cluster had plenty of memory, using Hadoop as a direct MapReduce framework would have significantly slowed down our rate of development. Spark takes the best of Hadoop by incorporating pieces such as HDFS and YARN. Spark Core functions were used to preprocess the data and create feature vectors. Spark MLLib was used for training a Linear Regression model trained using SGD. Stochastic gradient descent was a fantastic option available, because it trains the model using random samples without making a pass through the entire dataset, and converges in on the loss-minimizing parameters faster. In our case, the parameters were weights that corresponded to sensor data drawn from the last four hours. These parameters were stored in files in AWS S3 as weight vectors, as AWS EMR provides preinstalled dependecies to do so. We had a thread running on the edge server that polled the AWS S3 service every 10 minutes in order to read an updated model, if found. As our model was just linear regression, it was very simple to make a predictions, as each edge server stores sensor data drawn from the last four hours in local memory. If a more complex model was made, we would use a separate cluster of machines handling predictions. We would route requests from the proxy server to a prediction server, which would be hosted on a machine with a cheap GPUs. These GPUs would vectorize computations that happen on the edge, and compute results much faster. Once again, to increase the complexity of the model, we would incorporate many more features such as the price of parking at different times of the day, weather conditions, etc. 5 EVALUATION To evaluate our system, we will focus on measuring the latency of requests and responses, throughput of queries, fault tolerance, accuracy of predictions, and how easily nodes can be added and removed. We will also evaluate our model through the theory behind the individual portions that make up our architecture. We assume a workload of the following for each request and will use it to appropriately model all tests: Get available parking lots 50% Create reservations 30% 6
  • 7. Delete Reservations 15% Get all reservations by user 5% We also assume that the servers will be on average running at 60% of their maximum capacity. Described below are the metrics that we used. 5.1 THROUGHPUT - QUERIES / SECOND SERVED BY 1 MACHINE We test the Throughput of our machines by first heating up the cache of servers as well as the rest of our infrastructure by making some random requests. We then test each individual request endpoint by making 1000 requests from different processes and taking the average time it takes for every worker to finish his load. Between each endpoint we make an effort to reset the cache again by using a random workload. We finally test our throughput for our predicted workload by making 1000 requests in a random ordering that follow the percentages set by our expectations. We obtain the following values: Global Benchmarks: 1000 Requests: Mix of requests modeled by workload: 130 requests/s Delete and create reservations: 42 requests/s Get all user reservations: 44 requests/s Get parking lot information: 189 requests/s 500 Requests: Mix of requests modeled by workload: 111 requests/s Delete and create reservations: 41 requests/s Get all user reservations: 42 requests/s Get parking lot information: 222 requests/s 5.2 LATENCY OF USER REQUEST Now that we know what our average throughput is we use it to simulate our predicted workload of 60%. We then test the latency by putting the server under the expected workload and start by making 10 requests from the same user for each individual user and test the average latency per request(ms). Cluster Specific Benchmarks: 10 Requests: Mix of requests modeled by workload: 80 ms/request Delete and create reservations: 119 ms/request Get all user reservations: 79 ms/request Get parking lot information: 92 ms/request 5.3 LATENCY OF SENSORY DATA This section can’t be fully tested as sensors are not physically present. 7
5.2 LATENCY OF USER REQUEST
Knowing the average throughput, we use it to simulate our predicted workload at 60% of maximum capacity. We then measure latency by putting the server under this expected workload, making 10 requests per user, and averaging the latency per request (ms).
Cluster-specific benchmarks, 10 requests:
Mix of requests modeled by workload: 80 ms/request
Delete and create reservations: 119 ms/request
Get all user reservations: 79 ms/request
Get parking lot information: 92 ms/request
5.3 LATENCY OF SENSOR DATA
This metric could not be fully tested because physical sensors were not present; our sensor data is simulated.
5.4 ML MODEL
We evaluate the effectiveness of our model using the root mean squared error, $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)^2}$, where $\hat{y}_i$ is the predicted number of parking spots available m minutes into the future and $y_i$ is the true number of spots available at that future time. With 241,196 rows of data distributed over 17 parking lots stored in MongoDB, the Spark model is computed and stored to S3 in 344 seconds (under 6 minutes). This should scale extremely well, as we would retrain the model at most once a day, and the MapReduce framework Spark is built on scales well as the dataset grows. When running the model on an AWS EMR (Elastic MapReduce) cluster with a single m4.large instance, we observed a 1.5x slowdown (9 minutes).
5.5 INTERPRETING BENCHMARK RESULTS
Our servers scale linearly with the number of requests they receive, as seen in the difference between the 500- and 1000-request workloads. The results could be improved by switching from the Flask development server to a production web server such as nginx, interfaced through gunicorn so that each server runs more than one worker. The application would also benefit greatly from caching sensor and reservation data in Redis shared among the multiple workers per server. Some of our reservation accesses appear slow because we overloaded DynamoDB with a substantial amount of testing data. We are also limited in benchmarking by our own computers, which have only so many cores to run the multiprocessing code and therefore must wait to acquire and release resources.
5.6 FAULT TOLERANCE
We test fault tolerance by removing individual pieces of the infrastructure and confirming that the system still runs as expected. There are several instances of fault tolerance within the system. The servers (EdgeServers and ProxyServers) are f+1 fault tolerant at the container level: as long as there is not a total failure, the containerized applications are restarted automatically by AWS. The system is also f+1 fault tolerant at the level of the EC2 instances, though it takes longer to recover in that case. The ALB provides fault tolerance for requests: if the ALB (which passes requests between servers and clients) does not receive a response, it retransmits the request to another instance. DynamoDB is fault tolerant by virtue of its DHT architecture, which replicates ranges of the key space across multiple nodes in the ring.
5.7 NOTES
We minimize measurement error by using the Python multiprocessing library, which avoids the skewed measurements observed with classic Python threading due to GIL (Global Interpreter Lock) overhead. The accuracy of our findings could be further verified by profiling our code.
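The fault injection described in 5.6 (and in milestone 7 below) can be driven by a crash endpoint that kills a worker on demand. The following is a minimal Flask sketch under assumed names (the /crash route and port are hypothetical), not our exact implementation:

import os

from flask import Flask

app = Flask(__name__)

@app.route("/crash", methods=["POST"])
def crash():
    # Exit abruptly, bypassing cleanup, so the container dies with a non-zero
    # status and is restarted by the orchestrator; in the meantime the ALB
    # retries in-flight requests against another instance.
    os._exit(1)

if __name__ == "__main__":
    app.run(port=5001)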
6 MILESTONES
0 (Completed March 7, Kush) - Milestone Completed
Found good data sources online of parking lots and their availability over a long period of time. Decided how to use them to generate realistic data for testing our application at scale, with multiple sensors across many locations.
0.5 (Completed March 7, Chris & Daniel) - Milestone Completed
Locally created a script to simulate multiple sensors that send data (in the form described in the Data section) every 3 seconds to our server; a minimal sketch appears at the end of this report. Created a script to send multiple client query requests simultaneously.
1 (Completed March 15, Chris & Daniel & Kush) - Milestone Completed
Implemented the server and database infrastructure locally for multiple parking lots by running server and database instances locally. This involved taking in the data described in the Data section and finding viable parking options for the user. Additionally, we modeled the most recent sensor data held by the servers as an in-memory data structure. We implemented a REST API for clients to successfully make requests to the server and created connectors to handle parts of the business logic. This resulted in a full web server (nginx & WSGI) running locally.
2 (Completed March 22, Chris & Daniel) - Milestone Completed
Test run + DevOps: We locally tested our single-server implementation for a few parking lots. We deployed the single-server application (along with the database) on an AWS EC2 t1.micro instance, using Ansible to handle server configuration.
* (March 27): INTERMEDIATE REPORT DUE
2 (Due April 1, Kush) - Milestone Completed
Implement an ML model (for a single node) to predict trends in the number of open parking spots in lots and streets from the sample data generated in step 0, learning time-series trends of how parking lots behave over the course of a day.
Update: Explored PyTorch for neural network architectures that would be feasible for our limited number of features. Although PyTorch supports recurrent neural networks/LSTMs that work very well with time-series data, integrating it with a distributed training framework would require a lot of code rewriting. We decided on a simple linear regression model, which is best suited to distributed training with Spark. A multilayer perceptron could also be entertained if we obtained more training data, including features like weather.
4 (Due April 10, Chris) - Milestone Completed
Implement the pub-sub model, with a subset of sensors publishing to a specific "Area" topic. Allow the CH-micro-service to subscribe to an "Area" topic.
3 (Due April 12, Kush & Daniel & Chris) - Milestone Completed
Core business logic: Now that the ML model is functional, we plan to update our core business logic to have better routing based on what the model gives us. This involves pushing the model to the edge and updating how our predictions interface with the model. At this point we are still running on one machine.
4 (Due April 17, Daniel & Chris) - Milestone Completed
Create the micro-service responsible for handling client requests (henceforth the CH-micro-service), such as range queries or making parking spot reservations. Make this micro-service an auto-scaling group, add load balancing, and ensure that it is fault tolerant.
4 (Due April 17, Chris & Kush) - Milestone Completed
Create the Routing-micro-service responsible for mapping parking lots to "Area" groups. These groups must overlap in order to add redundancy, and possibly even dynamically add new mappings of parking lots. Add load balancing and auto-scaling to this micro-service.
6 (Due May 8, Kush) - Milestone Completed
Update the infrastructure for logging the data that the ML model (predicting user behavior to forecast which spots will open or close in the future) uses to train itself. Train the ML model on a cluster of machines in a distributed manner.
7 (Due May 8, Kush & Daniel & Chris) - Milestone Completed
Crash mechanics for fault-tolerance testing. We implemented a crash command that can automatically shut down a server to conduct further fault-tolerance testing.
8 (Due May 10, Chris & Daniel) Extra Features - Milestone Completed
Extra business logic to create reservations of parking spots.
9 (Due May 12, Daniel & Chris & Kush) Evaluations - Milestone Completed
Create metrics of evaluation so we know how well our system is doing and whether we need to change parts of our implementation. Create a reusable framework for this so we can use it in the later stages of the project. Begin using the data from the evaluations to create the poster.
10 (Due May 15, Chris & Kush & Daniel) Poster + Presentation - Milestone Completed
Create our poster for the presentation and practice/prepare the presentation.
Updates
All milestones were completed.
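For reference, the sensor simulator from milestone 0.5 can be sketched as below. The ingestion URL and the "Lot B" coordinates are hypothetical; the payload mirrors the streaming tuple from the Data section (Timestamp, Lot Name, Latitude, Longitude, Available Spots).

import random
import time

import requests

INGEST_URL = "http://localhost:5000/sensor"  # hypothetical ingestion endpoint
LOTS = [
    ("Lot A", 34.02516, -118.50977),
    ("Lot B", 34.03101, -118.49820),
]

while True:
    for name, lat, lon in LOTS:
        requests.post(INGEST_URL, json={
            "timestamp": int(time.time()),
            "lot_name": name,
            "latitude": lat,
            "longitude": lon,
            "available_spots": random.randint(0, 300),
        })
    time.sleep(3)  # each simulated sensor reports every 3 seconds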