Hype, buzzword, threat; however you want to characterize it, the Internet of Things (IoT) is here.
IoT scenarios that were hypothetical only a few years ago are real today. Still thinking along the lines of fleet management and temperature measurements? You’re out. Endless possibilities of IoT applications are surfacing every day, from the connected cow (huh?) to things that monitor and analyze your daily life (really?).
In this webinar, we will discuss the architecture of IoT data management solutions and the challenges that arise. We will explore how MongoDB features provide solutions to those problems. Time permitting, we will demonstrate an IoT Cloud service built on top of MongoDB.
11. CONNECTED COW
by VITAL HERD
E-pill ingested into stomach
Transmits heart rate, temp, chemical composition
Notifies farmer when abnormality is detected
Health management
94 Million Cows in US, Billions of savings
21. SAMPLE DESIGN 1
EVENT_ID  PLANE_ID  TIMESTAMP      LAT      LONG       ENGINE TEMP  FUEL LEVEL  …  SPEED
100001    3902      1437297148810  38.2031  -124.4904
100002    3902      1437297149213                                                  750
Modeling all metrics as columns in one relational table
Huge table, lots of wasted space caused by empty values
Frequent schema change and data migrations when adding new metrics
22. SAMPLE DESIGN 2
EVENT_ID METRIC_NAME METRIC_VALUE
100001 LAT 38.2031
100001 LONG -124.4904
100002 SPEED 750
Store variable metrics in an EAV table
EVENT_ID PLANE_ID TIMESTAMP
100001 3902 1437297148810
METRIC_VALUE needs to be defined as a TEXT field
Index implications for the METRIC_VALUE field
Multiple self joins necessary
23. Enormous Data Volume
A single flight, per minute interval:
3 * 60 * 100 = 18K data points/flight
100,000 flights per day:
1.8 Billion, 1.8TB per day
21,000 QPS
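The figures on this slide can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes the factors 3, 60 and 100 stand for flight hours, samples per hour and metrics per sample, and that each data point takes roughly 1 KB. Neither is stated on the slide; the second is simply what 1.8 TB for 1.8 billion points implies.

```python
# Back-of-the-envelope check of the slide's volume figures.
# Assumed meanings (not stated on the slide): 3 flight hours,
# 60 samples per hour (one per minute), 100 metrics per sample.
FLIGHT_HOURS = 3
SAMPLES_PER_HOUR = 60
METRICS_PER_SAMPLE = 100
FLIGHTS_PER_DAY = 100_000
BYTES_PER_POINT = 1_000  # assumption implied by 1.8 TB / 1.8 billion points
SECONDS_PER_DAY = 24 * 60 * 60

points_per_flight = FLIGHT_HOURS * SAMPLES_PER_HOUR * METRICS_PER_SAMPLE
points_per_day = points_per_flight * FLIGHTS_PER_DAY
terabytes_per_day = points_per_day * BYTES_PER_POINT / 1e12
writes_per_second = points_per_day / SECONDS_PER_DAY

print(points_per_flight)         # 18000
print(points_per_day)            # 1800000000
print(terabytes_per_day)         # 1.8
print(round(writes_per_second))  # 20833, which the slide rounds to 21,000
```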
28. AGILITY
Start coding now, without month long ER design.
Changing schema as you go without penalty.
Flexible schema models variable structure with ease
29. DATA MODEL
location: (-84.2391, 34.1039)
speed: 750
engine:
  fuel_level: 100 ,
  temperature: 88.48
1 Variable data structure
2 Sparse Indexes
3 Dynamic Schema
31. OPTIMIZE
With document model
A time series is a sequence of data points, typically consisting of successive measurements made over a time interval. Examples of time series are ocean tides, counts of sunspots, and the daily closing value of the Dow Jones Industrial Average.
--Wikipedia
35. CHOOSING A SHARD KEY FOR SENSORS
Cardinality - LARGE
Write distribution - EVEN
Query isolation - ISOLATED
36. CHOOSING A SHARD KEY
Cardinality
Write distribution
Query isolation
Reliability
Index locality
Shard key     Cardinality  Write distribution  Query isolation  Reliability           Index locality
_id           Doc level    One shard           Scatter/gather   All users affected    Good
hash(_id)     Hash level   All Shards          Scatter/gather   All users affected    Poor
asset_id      Many docs    All Shards          Targeted         Some assets affected  Good
asset_id, ts  Doc level    All Shards          Targeted         Some assets affected  Good
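The "Write distribution" column of the table above can be illustrated with a toy simulation. This is a simplified model, not how MongoDB routes writes internally: the shard count, chunk boundaries and the MD5-based hash are all assumptions chosen for illustration. It shows why a monotonically increasing key sends every new insert to one shard, while a hashed key spreads them out.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # hypothetical cluster size


def range_shard(key, boundaries):
    """Range sharding: pick the first chunk whose upper bound exceeds the key."""
    for shard, upper in enumerate(boundaries):
        if key < upper:
            return shard
    return len(boundaries)  # the last chunk is open-ended


def hashed_shard(key):
    """Hashed sharding: derive the shard from a hash of the key."""
    digest = hashlib.md5(str(key).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS


# Chunk boundaries were set from older data; new ids keep growing past them.
boundaries = [1000, 2000, 3000]
new_ids = range(5000, 6000)

by_range = Counter(range_shard(i, boundaries) for i in new_ids)
by_hash = Counter(hashed_shard(i) for i in new_ids)

print(by_range)         # Counter({3: 1000}): every insert lands on one shard
print(sorted(by_hash))  # the shards that received at least one insert
```

A compound key such as asset_id, ts keeps the spread across shards while still letting a query for one asset go to a single shard, which is the trade-off the last row of the table points at.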
43. WE CAN HELP
MongoDB Enterprise Advanced
The best way to run MongoDB in your data center
MongoDB Ops Manager
The easiest way to run MongoDB in your datacenter
Production Support
In production and under control
Development Support
Let’s get you running
Consulting
We solve problems
Training
Get your teams up to speed.
Editor's Notes
Okay so what is IoT, or Internet of Things?
It is ranked as the #2 or #3 buzzword in the tech industry, depending on whose ranking you prefer. According to profoundry, there are 93 million Google search results and 36 thousand Twitter mentions within a 30-day period. According to Cloud Computing, IoT is the second-biggest buzzword, and all of you know which one is at the top:
Being a buzzword means there are bazillions of interpretations, which makes it hard to agree on a universal definition.
The good thing about a buzzword is that everyone gets a chance to give it a spin. My personal definition of IoT is Internet 4.0.
Why 4.0?
The prototype of the Internet appeared in the 1960s, but only in the 1980s did it start to mature. At first it was mostly used by universities and research institutions. We call that 1.0.
In the 1990s, the commercialization of the internet took off as personal computers were connected to the network. This period can be considered version 2.0.
Less than 10 years ago the invention of the iPhone brought us into version 3.0. As of today 3 billion people are connected, thanks to the proliferation of mobile devices, which account for as much as 80% of connected devices.
From servers, to PCs, to mobile, we are finally expanding the reach to so-called things: devices that traditionally weren’t meant to be smart. Internet 4.0 is really an extension of today’s internet to devices equipped with sensors and network modules that allow them to sense their surroundings and communicate with the rest of the world.
We’re at the beta version of Internet 4.0. If you didn’t build a website during the 90s and didn’t create an iPhone app in 2008, now is your opportunity to catch the internet wave. Once the beta version becomes official, the competition will be a lot more severe.
Okay, so what exactly do we need to know about IoT from a technological perspective? Here I’m borrowing a few slides from postscapes.com, a website dedicated to tracking everything about the Internet of Things. It worked with Harbor Research to produce a very informative infographic. The folks there have done a phenomenal job visualizing the key aspects of IoT. If you are somewhat new to the IoT world, I’d highly recommend taking a look at the full version.
In the Postscapes version, there are three, or rather four, critical components that drive the smart systems in our world: sensors, connectivity, people, and processes.
Sensors come first when we talk about IoT. Sensors are the eyes and ears of the internet. In the early days, or 1.0, the keyboard was the main interface between the outside world and the internet. In 2.0, with personal computers, we added cameras and microphones. With mobile phones, GPS, accelerometers and gyroscopes became an almost universal standard configuration. In the IoT era, sensors come in an even wider range of options: temperature, pressure, ambient light, velocity, displacement, humidity, torque, strain. Anything you want to know, there is a sensor for it.
In addition to sensors, equally important are the actuators that allow IoT applications to respond to changing conditions. For instance, a connected HVAC controller might automatically raise the temperature slightly to save energy when the cloud detects a spike in electricity use.
Sensors, especially industrial sensors, are nothing new; they have existed for decades. What is different today is the connectivity to the network (internet) that brings them to the next level. With a variety of network protocols, such as WiFi, Bluetooth, Zigbee, Z-wave and GPRS, these sensors are now able to feed what they see, hear and sense to the connected networks or to the cloud. Among these, Zigbee and Z-wave are specifically designed for smart things. Compared to WiFi, Zigbee and Z-wave have a much shorter effective range but typically require a much smaller footprint in terms of power consumption and hardware resources. Zigbee can often be found on chips with less than 100KB of RAM.
We have the sensors and actuators as our input and output, and we have the connectivity to bring the data in, in real time; on the same communication channel we can send commands out to those devices to respond to changes. This kind of bi-directional infrastructure opens a new arena for us to design and build applications. The reason I think Internet 4.0 is a better name is that these things are connected to us, the people, and to our everyday life: the internet we have grown unable to live without.
IoT applications are found in all industries today. Geofencing is very popular for nursing homes, cattle management, etc. Fleet management becomes real time with GPS tracking and reporting. Agriculture is improving its produce by monitoring the surroundings and providing immediate feedback to growers. Crowd management becomes more effective with smart bracelets or mobile apps. Smart cities and smart homes are also hot areas for IoT applications.
The city of Santander in northern Spain has 20,000 sensors connecting buildings, infrastructure, transport, networks and utilities. These sensors monitor the levels of pollution, noise, traffic and parking and allow city management to make informed, timely decisions to best use the city resources.
Here we can look at a few exemplary use cases.
First is the connected cow app provided by Vital Herd. They produce an e-pill to be ingested by cows. The pill sinks to the bottom of the cow’s stomach and stays there for the animal’s lifetime, transmitting information about its vital signs: heart and respiratory rate, digestion information, core temperature and, one day soon, the chemical composition of its stomach. The data is sent to the cloud, where it is normalized and used to create vital-sign benchmarks for each animal; that information is then delivered back to the producer in an easy-to-understand, actionable dashboard format.
This data offers new and early insight into productivity-limiting illnesses, suboptimal nutrition programs and even environmental factors such as heat stress that can reduce production.
In the US alone there are 94 million cows; the company estimates the potential savings to be in the billions.
John Deere uses sensors added to their latest equipment to help farmers manage their fleet, decrease tractor downtime and save on fuel. The information is combined with historical and real-time data regarding weather prediction, soil conditions, crop features and many other data sets. It is presented on the MyJohnDeere.com platform as well as in the iPad and iPhone app Mobile Farm Manager, to help farmers figure out which crops to plant where and when, when and where to plough, where the best return will be made with the crops and even which path to follow when ploughing.
All this increases the productivity and efficiency of the crops, which in the end leads to higher production and revenue.
John Deere uses MongoDB to store the sensor data and provide real time analytics for the dashboards.
You may not realize it, but the number of sensors or connected devices is already astounding. According to Cisco’s research, in 2015 there are close to 20 billion connections from things. To put that into perspective, that’s about 3 sensors per person on earth. This number will only increase, and by 2020 it will reach 50 billion.
Another interesting chart shows that Asia will dominate other continents in terms of IoT application deployments. This is partly due to the fact that 60% of the population is in Asia.
According to IDC, the market size for global IoT will reach $7.1 trillion, compared with $1.9 trillion a couple of years ago. “The IoT ecosystem includes intelligent and embedded systems shipments, connectivity services, infrastructure, purpose-built IoT platforms, applications, security, analytics, and professional services.”
Here’s a diagram that depicts the relationship between these technology stacks. On the left you have sensors and actuators. The sensors typically work 24 x 7, sensing their surroundings and gathering data; they establish a connection, typically to a local gateway, and send the data over. The local gateway may then either relay the data immediately or aggregate a batch of data points before sending them over the internet to a centralized server, typically in the cloud or in an enterprise data center.
The use of a gateway is optional. Sometimes the device interacts directly with a smartphone and the data interaction ends right there. And sometimes the sensor is equipped with a GPRS module and can therefore communicate directly with remote servers.
http://www.forbes.com/sites/huawei/2015/05/12/the-challenges-of-iot-to-build-a-better-connected-future/
Here let’s look at a few challenges that you might face if you’re going to work on an IoT application.
How many of you still remember MH370, the passenger plane that vanished over the Indian Ocean last year? After the incident lots of people wondered why, with today’s technology, we couldn’t even track such a big monster.
Let’s use this example to look at what is involved.
The tracking data could come from multiple sources onboard the plane today, such as ADS-C, HFDL, EUROCONTROL and ACARS (Aircraft Communications Addressing and Reporting System). A robust tracking system may need to combine all these sources to piece together the whole picture when an unfortunate event happens. For instance, it is reported that the ACARS system was deliberately switched off by the pilot. In this case, multiple sources of data may help alleviate the difficulty of locating the plane afterwards.
Even when data comes from one source, it can vary widely depending on its nature. For instance, we could be tracking location data, which comes in the form of a pair of coordinates, and engine parameters, which could come as a complex data structure such as a nested document representing the engine’s performance.
How do you model this data in a database?
Data that is not necessarily well structured poses a big challenge for traditional relational databases, which work best when you have a clearly defined, well-structured data model across all of the data points. For instance, if you were to naively design one huge table in which each metric is a column, you would end up with lots of wasted space for the null values in rows that don’t have the data. The second downside is that you would need to change the schema whenever you add more metrics to the data.
A better alternative is to use an EAV table, which stands for Entity, Attribute, Value. The EAV pattern uses a two-table design. One table holds the data that typically does not vary among rows. The varying data fields, such as LAT/LONG, SPEED, ENGINE and FUEL, which may occur in some rows but not others, go into a separate metadata table.
This would appear to be a better design since it doesn’t have the empty cell problem. However, if you think carefully, the METRIC_VALUE column must be defined as a TEXT field, or a string long enough to hold the biggest possible metric value. You still end up with significant disk space waste and I/O inefficiencies. Things get worse if you ever want to index that column: its type variations will severely degrade your index performance.
Furthermore, this EAV design has another drawback: it makes queries more difficult, especially if you want to look up a row/document based on multiple fields. For instance: find the planes within 100 miles of New York City with fuel level below 10%. You would need multiple joins on the attribute-value table in order to filter on multiple attribute fields.
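The join problem can be seen in a small, self-contained sketch using SQLite. The table names, the third event and the FUEL_LEVEL metric are hypothetical additions for illustration, and the filter is simplified to speed and fuel level rather than a distance from New York City. Note that each attribute in the filter needs its own join, and each comparison needs a cast because METRIC_VALUE is stored as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events  (event_id INTEGER PRIMARY KEY, plane_id INTEGER, ts INTEGER);
CREATE TABLE metrics (event_id INTEGER, metric_name TEXT, metric_value TEXT);
""")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", [
    (100001, 3902, 1437297148810),
    (100002, 3902, 1437297149213),
    (100003, 4417, 1437297150000),  # hypothetical extra event
])
conn.executemany("INSERT INTO metrics VALUES (?, ?, ?)", [
    (100001, "LAT", "38.2031"),
    (100001, "LONG", "-124.4904"),
    (100002, "SPEED", "750"),
    (100003, "SPEED", "420"),       # hypothetical
    (100003, "FUEL_LEVEL", "8"),    # hypothetical
])

# Filtering on two attributes requires joining the metrics table twice.
rows = conn.execute("""
SELECT e.event_id, e.plane_id
FROM events e
JOIN metrics s ON s.event_id = e.event_id AND s.metric_name = 'SPEED'
JOIN metrics f ON f.event_id = e.event_id AND f.metric_name = 'FUEL_LEVEL'
WHERE CAST(s.metric_value AS REAL) < 500
  AND CAST(f.metric_value AS REAL) < 10
""").fetchall()

print(rows)  # [(100003, 4417)]
```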
So this is an example of the variable data structure problem I was trying to illustrate.
The other data challenge is data volume. Again using flight tracking: assume we’re sending data for various stats at a per-minute level. Some data may be more frequent, some less frequent; for the sake of the example, we use a one-minute interval.
MongoDB was built to address the way the world has changed while preserving the core database capabilities required to build functional apps
MongoDB is the only database that harnesses the innovations of NoSQL and maintains the foundation of relational databases
What do we mean by Agility?
Variable structure example
You can create indexes for selected fields, such as location. What happens to documents without this field? A sparse index simply leaves them out.
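The idea behind a sparse index can be sketched in plain Python, without a database. This is a conceptual model only, not MongoDB's implementation; the documents and the helper name `build_sparse_index` are hypothetical. In MongoDB itself the equivalent is requested with the `sparse: true` option when creating an index.

```python
from collections import defaultdict

# Documents with variable structure, as in the data model slide.
documents = [
    {"_id": 1, "location": (-84.2391, 34.1039), "speed": 750},
    {"_id": 2, "engine": {"fuel_level": 100, "temperature": 88.48}},
    {"_id": 3, "speed": 750},
]


def build_sparse_index(docs, field):
    """Map each value of `field` to the _ids of the documents containing it.

    Documents that lack the field get no index entry at all.
    """
    index = defaultdict(list)
    for doc in docs:
        if field in doc:
            index[doc[field]].append(doc["_id"])
    return dict(index)


speed_index = build_sparse_index(documents, "speed")
location_index = build_sparse_index(documents, "location")

print(speed_index)     # {750: [1, 3]}
print(location_index)  # {(-84.2391, 34.1039): [1]}
```

The index on location holds a single entry even though there are three documents, so documents without the field cost nothing in index space.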
Dynamic schema
Many sensor data cases fall into this time series pattern, such as wind sensors, tide monitors, location tracking, etc. It turns out the rich document model has another benefit for this type of data. Let’s use the engine fuel as an example.
Don’t take my word for it: many IoT apps out there are using MongoDB. Here are a few examples:
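One way the document model can help with time series, sketched here as an assumption rather than as the exact design shown in the webinar, is to fold per-minute readings into a single document per asset per hour. The function name, field names and fuel readings below are hypothetical.

```python
from datetime import datetime, timezone


def bucket_by_hour(readings):
    """Fold (asset_id, epoch_ms, fuel_level) readings into one document
    per asset per hour, keyed by the minute within that hour."""
    buckets = {}
    for asset_id, epoch_ms, fuel_level in readings:
        ts = datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)
        hour = ts.replace(minute=0, second=0, microsecond=0)
        doc = buckets.setdefault(
            (asset_id, hour),
            {"asset_id": asset_id, "hour": hour, "fuel_level": {}},
        )
        doc["fuel_level"][str(ts.minute)] = fuel_level
    return list(buckets.values())


# Hypothetical fuel readings for plane 3902, one per minute.
readings = [
    (3902, 1437297120000, 100),
    (3902, 1437297180000, 99),
    (3902, 1437297240000, 98),
]
docs = bucket_by_hour(readings)

print(len(docs))              # 1
print(docs[0]["fuel_level"])  # {'12': 100, '13': 99, '14': 98}
```

Three readings become one document instead of three, so a full hour of per-minute data would need one document and one index entry rather than sixty.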
Kaa
IoTgo
Sitewhere
iKeg
What We Sell
We are the MongoDB experts. Over 2,000 organizations rely on our commercial offerings, including leading startups and 30 of the Fortune 100. We offer software and services to make your life easier:
MongoDB Enterprise Advanced is the best way to run MongoDB in your data center. It’s a finely-tuned package of advanced software, support, certifications, and other services designed for the way you do business.
MongoDB Cloud Manager is the easiest way to run MongoDB in the cloud. It makes MongoDB the system you worry about the least and like managing the most.
Production Support helps keep your system up and running and gives you peace of mind. MongoDB engineers help you with production issues and any aspect of your project.
Development Support helps you get up and running quickly. It gives you a complete package of software and services for the early stages of your project.
MongoDB Consulting packages get you to production faster, help you tune performance in production, help you scale, and free you up to focus on your next release.
MongoDB Training helps you become a MongoDB expert, from design to operating mission-critical systems at scale. Whether you’re a developer, DBA, or architect, we can make you better at MongoDB.