Summer 2017 undergraduate research powerpoint

A Framework for Real-Time
Analysis, Storage, and
Visualization of Big Data

Sensor Network
• Composed of:
  • Microcontroller units (MCUs)
  • Various sensors
    • Temperature
    • Humidity
    • Photocells
  • Web API
    • Controls the MCUs
  • Message broker
    • Routes messages
    • Manages communication channels
    • Manages queues

The Ideal MCU
• Size
  • Small form factor
• Power
  • Low power consumption
  • Battery-powered
  • Sleep modes
• GPIO
  • Digital
  • Analog
• Communication
  • Wi-Fi 802.11 b/g/n
  • Bluetooth
  • Cellular 2G/3G
• Software
  • Well-developed support
  • Functionality libraries
  • Language support
    • C
    • JavaScript

Our Pick: ESP8266 (ESP-12E module)
● Size
○ 24 mm x 16 mm x 3 mm
● Power
○ 3.3 V
○ Sleep mode
● Full IP stack
○ Wi-Fi 802.11 b/g/n
○ Can act as client or host
● Low cost
○ $1.50
● Software
○ Several firmware options available
○ Huge online community
https://acrobotic.com/media/wysiwyg/products/esp8266_esp12e_horizontal-01.png

Web API
• Supervises MCUs
• MCU initialization
  • Connects to the API web server over WebSockets
  • Enters the REPL (Read-Evaluate-Print Loop) to inject code directly into the device
• Keeps the status of devices
  • Modes: Sleep, On, Off
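The status-keeping role of the Web API can be pictured as a small registry of device modes. The sketch below is a hypothetical, in-memory Python illustration: the names `Mode` and `DeviceRegistry`, the device IDs, and the choice that a newly registered device starts in the On mode are all assumptions, not the project's code. It does not cover the WebSocket connection or the REPL.

```python
from enum import Enum
from typing import Dict, List


class Mode(Enum):
    """Device modes listed on the slide: Sleep, On, Off."""
    SLEEP = "sleep"
    ON = "on"
    OFF = "off"


class DeviceRegistry:
    """Hypothetical in-memory status table the Web API could keep per MCU."""

    def __init__(self) -> None:
        self._modes: Dict[str, Mode] = {}

    def register(self, device_id: str) -> None:
        # Assumption: a device that has just connected is considered On.
        self._modes[device_id] = Mode.ON

    def set_mode(self, device_id: str, mode: Mode) -> None:
        if device_id not in self._modes:
            raise KeyError(f"unknown device: {device_id}")
        self._modes[device_id] = mode

    def mode(self, device_id: str) -> Mode:
        return self._modes[device_id]

    def devices_in(self, mode: Mode) -> List[str]:
        # Sorted so the result is deterministic.
        return sorted(d for d, m in self._modes.items() if m is mode)


if __name__ == "__main__":
    registry = DeviceRegistry()
    registry.register("esp-01")  # hypothetical device IDs
    registry.register("esp-02")
    registry.set_mode("esp-02", Mode.SLEEP)
    print(registry.devices_in(Mode.SLEEP))  # ['esp-02']
```

Keeping the status in one place lets the Web API answer "which devices are asleep?" without contacting every MCU.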

MQ Telemetry Transport
(MQTT)
• MQTT: lightweight communication protocol
  • Runs over TCP/IP
• Topic
  • Data stream subject/identifier
• Publisher
  • Transmits data on a selected topic
  • Active connection
• Subscriber
  • Listens for data on a selected topic
  • Passive connection
• MQTT broker
  • Routes traffic based on subscriptions
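To make the topic, publisher, subscriber, and broker roles concrete, here is a toy, in-process Python sketch of how a broker could route a published message to matching subscriptions. It is not an MQTT client or broker and does no networking; it only implements MQTT's topic-filter wildcards (`+` matches exactly one level, `#` matches the remaining levels and must come last), and it skips validating malformed filters. `ToyBroker` and the topic names are hypothetical.

```python
from typing import Callable, List, Tuple

Callback = Callable[[str, bytes], None]


def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if `topic` matches an MQTT topic filter with + and # wildcards."""
    f_levels = topic_filter.split("/")
    t_levels = topic.split("/")
    # Per the MQTT spec, filters starting with a wildcard do not match topics starting with '$'.
    if t_levels[0].startswith("$") and f_levels[0] in ("+", "#"):
        return False
    for i, level in enumerate(f_levels):
        if level == "#":
            return True  # matches this level (even if absent) and all levels below
        if i >= len(t_levels):
            return False  # filter is longer than the topic
        if level != "+" and level != t_levels[i]:
            return False  # '+' matches any single level; otherwise levels must be equal
    return len(f_levels) == len(t_levels)


class ToyBroker:
    """In-process stand-in for an MQTT broker: routes publishes by subscription."""

    def __init__(self) -> None:
        self._subscriptions: List[Tuple[str, Callback]] = []

    def subscribe(self, topic_filter: str, callback: Callback) -> None:
        self._subscriptions.append((topic_filter, callback))

    def publish(self, topic: str, payload: bytes) -> int:
        """Deliver to every matching subscription; return the delivery count."""
        delivered = 0
        for topic_filter, callback in self._subscriptions:
            if topic_matches(topic_filter, topic):
                callback(topic, payload)
                delivered += 1
        return delivered


if __name__ == "__main__":
    broker = ToyBroker()
    # Hypothetical topics: a server listening to every room's temperature.
    broker.subscribe("building1/+/temperature", lambda t, p: print("server got", t, p))
    broker.publish("building1/room101/temperature", b"22.5")  # delivered once
    broker.publish("building1/room101/humidity", b"40")       # no subscriber
```

The publisher never needs to know who is listening: it names a topic, and the broker decides delivery from the subscriptions it holds.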

Message Broker
• Intermediary between sender and receiver
• Manages
  • Delivery
  • Routing
  • Message queues
  • Protocol conversion
  • Message translation
Topic        Subscriber  Publisher
Temperature  Server      Temp. Sensor
Feedback     Server      Thermostat
[Diagram: Temp. Sensor, Thermostat, and the Message Broker with message queues (m0 … mn) for Topic: Temperature and Topic: Feedback, connected to Server Side Calculations.]
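The message queues in the diagram can be approximated with one FIFO queue per subscriber: publishing appends to the queue of every subscriber bound to that topic, and each message waits there until that subscriber reads it, oldest first. This is a hypothetical single-process Python sketch with exact topic names (no wildcards, no persistence to disk, no acknowledgements); the class and subscriber names are illustrative only.

```python
from collections import deque
from typing import Deque, Dict, List, Optional


class QueueingBroker:
    """Toy broker: one FIFO queue per subscriber, fed by exact-match topics."""

    def __init__(self) -> None:
        self._bindings: Dict[str, List[str]] = {}      # topic -> subscribers
        self._queues: Dict[str, Deque[str]] = {}       # subscriber -> pending messages

    def subscribe(self, subscriber: str, topic: str) -> None:
        self._bindings.setdefault(topic, []).append(subscriber)
        self._queues.setdefault(subscriber, deque())

    def publish(self, topic: str, message: str) -> int:
        """Append the message to each bound subscriber's queue; return the fan-out count."""
        subscribers = self._bindings.get(topic, [])
        for subscriber in subscribers:
            self._queues[subscriber].append(message)
        return len(subscribers)

    def read(self, subscriber: str) -> Optional[str]:
        """Pop the oldest pending message (m0 before m1, ...), or None if the queue is empty."""
        queue = self._queues.get(subscriber)
        return queue.popleft() if queue else None


if __name__ == "__main__":
    broker = QueueingBroker()
    broker.subscribe("server", "temperature")
    broker.publish("temperature", "m0")
    broker.publish("temperature", "m1")
    # Messages stay queued until the subscriber reads them, in arrival order.
    print(broker.read("server"), broker.read("server"), broker.read("server"))  # m0 m1 None
```

Giving each subscriber its own queue means a slow reader does not lose messages and does not hold up other subscribers.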

RabbitMQ
• Cross-language support
  • Java, Python, JavaScript, Ruby, and .NET
• Cross-protocol support
  • MQTT, AMQP, HTTP, and STOMP
• Asynchronous messaging
  • Many subscribers
  • Many publishers
• Quality of service (QoS)
  • Data persists in queues until read by a subscriber
• Low latency
  • Critical for real-time applications

Alternative Input Devices
• ROS robots
  • Drones
  • Rovers
• Multi-input, multi-output systems
  • Sensor + actuator pairs

Eclipse Kura: Gateway Device for IoT

What is Kura?
● OSGi-based framework for IoT gateways
● Runs in a JVM
● Built-in MQTT cloud services
● Browser-based GUI

Key Features: Network and Cloud Services
● Extensive MQTT configuration options
● Helps implement more complex interaction flows beyond publish/subscribe
● Remote management of M2M applications

Key Features: Configurable Services
● 3 ways to add service packages:
○ Eclipse Marketplace
○ URL
○ Uploaded files
● Can be configured at runtime through the web GUI
● Can access device hardware: GPIO, GPS, etc.

Key Features: Kura Wires
● Add new processes to Kura through a block-based visual representation
● Runs automatically once changes are applied; no need to compile
● Implementations for MQTT, building databases, data filtering, and more
● Additional assets can be added to Kura Wires by adding packages

Distributed Stream
Processing Engine

Prototype Project with Temperature Sensor
Processing tool requirements:
● Handle streams of big data in real-time
● Low Latency
● Robust
● Simple / Flexible Implementation

What is Storm?
Storm = Distributed Stream Processing Engine (DSPE)
Topology :
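The topology diagram is not reproduced here. As a rough stand-in, the Python sketch below mimics the idea of a Storm topology with plain generators: a spout emits a stream of tuples and bolts transform them in sequence. It does not use Storm's API and runs in a single process; the temperature readings and the two bolts (a Celsius-to-Fahrenheit conversion and a per-sensor running average) are hypothetical choices for the prototype temperature sensor.

```python
from typing import Dict, Iterable, Iterator, Tuple

Reading = Tuple[str, float]  # (sensor id, value)


def temperature_spout(source: Iterable[Reading]) -> Iterator[Reading]:
    """Spout: the entry point of the stream; here it simply replays a list."""
    for reading in source:
        yield reading


def fahrenheit_bolt(stream: Iterable[Reading]) -> Iterator[Reading]:
    """Bolt: converts each Celsius reading to Fahrenheit."""
    for sensor_id, celsius in stream:
        yield sensor_id, celsius * 9 / 5 + 32


def running_average_bolt(stream: Iterable[Reading]) -> Iterator[Reading]:
    """Bolt: emits the running average per sensor after every reading."""
    state: Dict[str, Tuple[int, float]] = {}
    for sensor_id, value in stream:
        count, total = state.get(sensor_id, (0, 0.0))
        count, total = count + 1, total + value
        state[sensor_id] = (count, total)
        yield sensor_id, total / count


if __name__ == "__main__":
    readings = [("s1", 0.0), ("s2", 20.0), ("s1", 100.0)]
    # Wire the "topology": spout -> fahrenheit bolt -> running-average bolt.
    for out in running_average_bolt(fahrenheit_bolt(temperature_spout(readings))):
        print(out)
```

In Storm itself, each spout and bolt would run as many parallel tasks across the cluster, and the wiring would be declared in the topology rather than by nesting function calls.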

What is Storm?
Parallel Processing with Apache ZooKeeper
Cluster Architecture :

Apache Storm - FEATURES
● Scalable infrastructure :
○ ZooKeeper
● Simple API :
○ Free and open source
● Guaranteed data processing :
○ Anchored tuples
○ At-least-once processing (default)
○ Exactly-once processing (Trident)
● Fault tolerant :
○ Daemons are stateless and fail-fast
○ ZooKeeper uses heartbeats
● Any language :
○ Apache Thrift
● Data variability :
○ Kryo serialization
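Storm's documentation describes how anchoring supports at-least-once processing: an acker task keeps, for each spout tuple, a 64-bit "ack val" that is the XOR of every tuple ID created or acked in that tuple's tree, and the tree counts as fully processed when the value returns to zero. The Python sketch below imitates only that bookkeeping (no timeouts, replay, or distribution); `ToyAcker` and its methods are hypothetical names, not Storm's API.

```python
import secrets
from typing import Dict, List


class ToyAcker:
    """Imitates the XOR bookkeeping of Storm's acker for anchored tuple trees."""

    def __init__(self) -> None:
        self.ack_vals: Dict[int, int] = {}  # root id -> XOR of ids created/acked so far
        self.completed: List[int] = []      # roots whose whole tree has been acked

    @staticmethod
    def _new_id() -> int:
        # Random 64-bit id; an accidental early zero is possible but astronomically unlikely.
        return secrets.randbits(64)

    def emit_root(self) -> int:
        """Spout emits a new tuple; its id is also the root of the tree."""
        root = self._new_id()
        self.ack_vals[root] = root
        return root

    def emit_anchored(self, root: int) -> int:
        """A bolt emits a child tuple anchored into the tree of `root`."""
        child = self._new_id()
        self.ack_vals[root] ^= child
        return child

    def ack(self, root: int, tuple_id: int) -> None:
        """A bolt acks one tuple of the tree: XOR its id back out."""
        self.ack_vals[root] ^= tuple_id
        if self.ack_vals[root] == 0:
            del self.ack_vals[root]
            self.completed.append(root)


if __name__ == "__main__":
    acker = ToyAcker()
    root = acker.emit_root()
    child = acker.emit_anchored(root)  # bolt emits a child anchored to the root
    acker.ack(root, root)              # bolt acks the root: the child is still pending
    print(root in acker.completed)     # False
    acker.ack(root, child)             # downstream bolt acks the child
    print(root in acker.completed)     # True
```

Because every ID is XORed in exactly twice (once when created, once when acked), the acker needs only a fixed-size value per tree, no matter how many tuples the tree contains.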

SQL & NoSQL Comparison
SQL
● Vertically Scalable
● Predetermined data structure
● ACID (Atomicity, Consistency, Isolation
and Durability)
● Uses SQL (Structured Query Language), a standard language that provides a powerful tool to define and manipulate data
● Allows for complex queries
● Examples:
○ MySQL, SQLite, and PostgreSQL
NoSQL
● Horizontally Scalable
● Unstructured data storage
● CAP theorem trade-offs (Consistency, Availability, and Partition tolerance)
● Typically relies on collections of documents or other non-tabular models (key-value, wide-column), and the syntax varies by database
● No standard interface for complex queries
● Examples:
○ MongoDB, Redis, and HBase

MySQL
● Open-source relational database management system (RDBMS)
● Provides a very flexible and secure password system
● Supports large databases, up to 50 million rows or more in a table
● Widely used in industry (Facebook, Twitter, Flickr, and YouTube)

Where does
MySQL fit?
● Data storage portion of the architecture
● Apache Storm writes data to MySQL
● Spark can then query the database and supply the visualization tool with data

Database Structure (MySQL)
● Place: Place_id (PK), Name
● Building: Bldg_id (PK), Name, Place_id (FK)
● Floor: Floor_id (PK), Name, Bldg_id (FK)
● Room: Room_id (PK), Name, Floor_id (FK)
● Device: Device_id (PK), Name, Device_Type (FK), Service_Date, Room_id (FK), Coordinates
● Sensor: Sensor_id (PK), Device_id (FK), Name, Sensor_Type (FK)
● Sensor Data: Data_id (PK), Sensor_id (FK), Time, Data_Value
● Device Type: Device_Type (PK), Name
● Sensor Type: Sensor_Type (PK), Name
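The normalized structure above can be written down as SQL DDL. The sketch below uses Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs without a server. Column types, lowercase names, the NOT NULL choices, and the placement of `Room_id` and `Coordinates` in the Device table are assumptions read from the diagram, and the demo rows are hypothetical. MySQL DDL would differ in details such as type names and auto-increment syntax.

```python
import sqlite3
from typing import List, Tuple

# Table and column names follow the diagram; types and NOT NULL choices are assumptions.
SCHEMA = """
CREATE TABLE place       (place_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE building    (bldg_id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                          place_id INTEGER NOT NULL REFERENCES place(place_id));
CREATE TABLE floor       (floor_id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                          bldg_id INTEGER NOT NULL REFERENCES building(bldg_id));
CREATE TABLE room        (room_id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                          floor_id INTEGER NOT NULL REFERENCES floor(floor_id));
CREATE TABLE device_type (device_type INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE sensor_type (sensor_type INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE device      (device_id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                          device_type INTEGER REFERENCES device_type(device_type),
                          service_date TEXT,
                          room_id INTEGER REFERENCES room(room_id),  -- assumed placement
                          coordinates TEXT);                        -- assumed placement
CREATE TABLE sensor      (sensor_id INTEGER PRIMARY KEY,
                          device_id INTEGER NOT NULL REFERENCES device(device_id),
                          name TEXT NOT NULL,
                          sensor_type INTEGER REFERENCES sensor_type(sensor_type));
CREATE TABLE sensor_data (data_id INTEGER PRIMARY KEY,
                          sensor_id INTEGER NOT NULL REFERENCES sensor(sensor_id),
                          time TEXT NOT NULL,
                          data_value REAL NOT NULL);
"""


def create_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn


def readings_in_room(conn: sqlite3.Connection, room_name: str) -> List[Tuple[str, str, float]]:
    """All readings from sensors on devices located in the named room."""
    sql = """
        SELECT s.name, sd.time, sd.data_value
        FROM sensor_data AS sd
        JOIN sensor AS s ON s.sensor_id = sd.sensor_id
        JOIN device AS d ON d.device_id = s.device_id
        JOIN room   AS r ON r.room_id   = d.room_id
        WHERE r.name = ?
        ORDER BY sd.time
    """
    return conn.execute(sql, (room_name,)).fetchall()


if __name__ == "__main__":
    conn = create_db()
    # Hypothetical demo rows, inserted parent-first so every foreign key resolves.
    conn.executescript("""
        INSERT INTO place       VALUES (1, 'Main Campus');
        INSERT INTO building    VALUES (1, 'Engineering', 1);
        INSERT INTO floor       VALUES (1, 'Floor 1', 1);
        INSERT INTO room        VALUES (1, 'Lab A', 1);
        INSERT INTO device_type VALUES (1, 'ESP8266');
        INSERT INTO sensor_type VALUES (1, 'Temperature');
        INSERT INTO device      VALUES (1, 'node-1', 1, '2017-06-01', 1, '0,0');
        INSERT INTO sensor      VALUES (1, 1, 'temp-1', 1);
        INSERT INTO sensor_data VALUES (1, 1, '2017-06-01 10:00:00', 22.5);
    """)
    print(readings_in_room(conn, "Lab A"))
```

Because each location level only references its parent, a reading's building or campus is recovered through joins instead of being repeated on every row, which is the point of normalizing.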

Design Challenges (Kyle)
● Normalizing the database was challenging because I had never done high-level data modeling before. It took a couple of office-hour sessions with my professor for me to grasp the concept.
● Creating the connection between Apache Storm and MySQL was incredibly frustrating. It was a grueling debugging process that a fellow engineer and I were stuck on for a day or two.
● Building up my knowledge of all the technologies integrated into the framework took considerable time.

Apache Spark Architecture
[Figure: Spark SQL, Spark Streaming, Spark MLlib, and Spark GraphX on top of Apache Spark; cluster managers: Standalone, YARN, Mesos; storage: HDFS (Hadoop), S3, or local.]

Cloud Batch Processing
• Four components (libraries) built on Apache Spark:
  • Spark SQL
    • Works with structured data (DataFrames) on top of Spark's core abstraction, the Resilient Distributed Dataset (RDD).
    • Enables reading, writing, and querying data, including databases, using SQL.
  • Spark MLlib
    • Machine learning algorithms
  • Spark Streaming
    • Real-time streaming through micro-batching.
  • Spark GraphX
    • Extends RDDs for graphs and graph-parallel computation.
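Micro-batching, the approach Spark Streaming uses, can be illustrated without Spark: incoming records are cut into fixed time intervals and each interval is processed as a small batch. The Python sketch below is not Spark's API. It assumes records are `(timestamp_seconds, value)` pairs and, as a hypothetical per-batch job, averages the values in each batch.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[float, float]  # (timestamp in seconds, sensor value)


def micro_batches(records: List[Record], interval: float) -> Dict[int, List[float]]:
    """Group records into consecutive, non-overlapping batches of `interval` seconds.

    Intervals with no records produce no batch in this sketch.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    batches: Dict[int, List[float]] = defaultdict(list)
    for timestamp, value in records:
        batches[int(timestamp // interval)].append(value)
    return dict(sorted(batches.items()))


def process_batches(batches: Dict[int, List[float]]) -> Dict[int, float]:
    """Hypothetical per-batch job: average the values that fell into each batch."""
    return {index: sum(values) / len(values) for index, values in batches.items()}


if __name__ == "__main__":
    readings = [(0.5, 20.0), (1.2, 22.0), (1.9, 24.0), (3.1, 30.0)]
    print(process_batches(micro_batches(readings, interval=1.0)))
    # {0: 20.0, 1: 23.0, 3: 30.0}
```

The trade-off is latency: a result for a batch is only available once its interval closes, which is why micro-batching sits between pure batch processing and record-at-a-time engines such as Storm.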

Data Visualization
• Web-based notebook for data analysis and visualization (Apache Zeppelin).
• Frames the question: “What is really happening under the hood?”
• Versatile
  • Interpreters for many languages.
  • Easy-to-share web-based notebook
  • Permissions can be set per person.
• Integration with Apache Spark
[Figure: Zeppelin architecture showing a Web Browser, the Zeppelin Daemon with its Web Server, Local and Remote Interpreters, a Spark Master Node, and two Spark Worker Nodes.]

Apache Spark
✔ Python rather than Scala.
✔ Plenty of examples and a big community.
✔ Troubleshooting was not that bad.
🗙 I used Python, but I’m also new to Python.
🗙 Understanding the program is one thing; using it is another.
🗙 New to open source and the Linux OS.

Apache Zeppelin
✔ Basically Spark, but visualized.
✔ Creating interpreters is useful.
✔ Fun to explore and use.
🗙 I learned Zeppelin before Spark.
🗙 Not as big of a community as Spark.
🗙 Troubleshooting was more difficult.
🗙 There is more to Zeppelin than just the Spark interpreter.

Parking on campus
• Time-to-park is an issue on the UTSA campus
• UTSA claims to have enough parking to meet typical demand
• Too many people go to the same parking lot, while newly opened parking spots are randomly distributed
• Solution: an accurate and reliable system to track (and predict) parking patterns, providing real-time information to drivers to assist in parking decisions

High-Level Design
Cloud processing:
• Percentage availability
• Predicted time-to-park (real time)
• Potential time-to-park (past data)
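Two of the outputs listed above reduce to simple arithmetic. The Python sketch below computes percentage availability as free spots over total spots, and a naive "potential time-to-park" as the mean of past observations for a given hour of the day. The function names, the hourly grouping, and the sample numbers are hypothetical; the real-time prediction is not attempted here.

```python
from statistics import mean
from typing import Dict, List


def percent_available(total_spots: int, occupied: int) -> float:
    """Percentage of a lot's spots that are currently free."""
    if total_spots <= 0:
        raise ValueError("total_spots must be positive")
    # Clamp noisy sensor counts so the result stays within 0-100 %.
    free = min(max(total_spots - occupied, 0), total_spots)
    return 100.0 * free / total_spots


def potential_time_to_park(history: Dict[int, List[float]], hour: int) -> float:
    """Mean past time-to-park, in minutes, observed at a given hour of the day."""
    samples = history.get(hour)
    if not samples:
        raise KeyError(f"no past data for hour {hour}")
    return float(mean(samples))


if __name__ == "__main__":
    print(percent_available(200, 150))  # 25.0
    past = {8: [4.0, 6.0, 8.0]}  # hypothetical minutes observed at 8 a.m.
    print(potential_time_to_park(past, 8))  # 6.0
```

A mean over past data is only a baseline; the "predicted time-to-park" output would additionally need the live occupancy stream.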
