This document discusses real-time data and real-time systems. It defines real-time systems as systems where the correctness depends on both the logical result and the time the result is produced. Real-time systems must respond to events in a fast and predictable way. The document discusses soft, firm, and hard deadlines and gives examples of hard real-time systems like nuclear reactor control. It also discusses challenges in validating that a real-time system can meet all its timing constraints and potential solutions like real-time operating systems and distributed systems.
2. What is a Real-Time System?
Real-time systems have been defined as: "those systems in
which the correctness of the system depends not only on the
logical result of the computation, but also on the time at which
the results are produced."
J. Stankovic, "Misconceptions About Real-Time Computing," IEEE Computer, 21(10), October 1988.
Real-time is the ability of the control system to respond to any
external or internal events in a fast and deterministic way.
We say that a system is deterministic if the response time is
predictable.
3. Some Definitions
Timing constraint: a constraint imposed on the timing behavior of a
job; it can be hard, firm, or soft.
Release time: the instant at which a job becomes available for
execution.
Deadline: the instant by which a job's execution is required to be
completed.
Response time: the length of time from a job's release time to the
instant it completes.
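The definitions above can be made concrete with a small sketch. The `Job` class and its field names are illustrative assumptions rather than part of any particular system. All times are treated as absolute instants in a single unit, such as milliseconds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """Timing parameters of one job (hypothetical type for illustration)."""
    release: float     # instant the job becomes available for execution
    deadline: float    # instant by which the job must be completed
    completion: float  # instant the job actually completed

    @property
    def response_time(self) -> float:
        # Response time = completion instant minus release instant.
        return self.completion - self.release

    @property
    def met_deadline(self) -> bool:
        # The timing constraint holds if the job completes no later than its deadline.
        return self.completion <= self.deadline


job = Job(release=10.0, deadline=25.0, completion=22.0)
print(job.response_time, job.met_deadline)  # → 12.0 True
```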
4. Soft, Firm and Hard deadlines
The instant at which a result is needed is called a
deadline.
If the result has utility even after the deadline has passed,
the deadline is classified as soft; otherwise it is firm.
If missing a firm deadline could result in a catastrophe, the
deadline is hard.
Examples?
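A common way to picture the three classes is to look at what a result is worth as a function of when it completes. The sketch below encodes the rules above under stated assumptions. The exponential decay used for soft deadlines and its rate are arbitrary choices. Treating a hard miss as negative infinity is simply a way of saying "unacceptable", not a standard value.

```python
import math
from enum import Enum


class DeadlineKind(Enum):
    SOFT = "soft"
    FIRM = "firm"
    HARD = "hard"


def result_utility(kind: DeadlineKind, completion: float, deadline: float,
                   decay_rate: float = 0.2) -> float:
    """Illustrative value of a result that completes at `completion`.

    On time, a result of any kind is worth 1.0. After the deadline:
      SOFT -> still useful, but less so the later it is (assumed exponential decay);
      FIRM -> worthless (0.0);
      HARD -> catastrophic, modelled here as negative infinity.
    """
    if completion <= deadline:
        return 1.0
    lateness = completion - deadline
    if kind is DeadlineKind.SOFT:
        return math.exp(-decay_rate * lateness)
    if kind is DeadlineKind.FIRM:
        return 0.0
    return -math.inf


# A result that arrives 3 time units after a deadline of 10:
for kind in DeadlineKind:
    print(kind.value, result_utility(kind, completion=13.0, deadline=10.0))
```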
5. Hard Real Time Systems
A system is hard real-time if it has a hard deadline for the
completion of an action, meaning that the deadline must always
be met; otherwise the task has failed.
These types of systems are deployed in embedded safety-critical
applications, in which a missed deadline can be catastrophic.
9. Soft Real Time Systems
Soft real time is, by default, defined as “not hard real time.”
Missing some deadlines by some amount under some circumstances may be
acceptable rather than counted as a failure.
In these systems there is usually a rising cost associated with lateness.
Soft real time refers to systems that have reduced constraints on “lateness”
but must still operate very quickly and repeatably.
Examples:
Multimedia
Video Game Systems
Real Time Data Analytics systems
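For a soft real-time system, the questions are how often deadlines are missed and how costly the lateness is. The sketch below summarizes a trace of (completion, deadline) pairs. Lateness is defined as max(0, completion - deadline). The linear cost model and the frame timings are assumptions made for illustration; a real system could use any increasing cost function.

```python
from typing import Sequence, Tuple


def soft_rt_metrics(jobs: Sequence[Tuple[float, float]],
                    cost_per_unit_late: float = 1.0) -> Tuple[float, float, float]:
    """Return (miss_ratio, max_lateness, total_cost) for (completion, deadline) pairs.

    Cost rises linearly with lateness (an assumed model).
    """
    if not jobs:
        return 0.0, 0.0, 0.0
    lateness = [max(0.0, completion - deadline) for completion, deadline in jobs]
    misses = sum(1 for late in lateness if late > 0)
    return misses / len(jobs), max(lateness), cost_per_unit_late * sum(lateness)


# Four frames of a hypothetical video pipeline: (completion_ms, deadline_ms)
trace = [(30.0, 33.0), (68.0, 66.0), (95.0, 100.0), (134.0, 133.0)]
print(soft_rt_metrics(trace))  # → (0.5, 2.0, 3.0)
```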
10. Validating a RTS is hard
Validation is the ability to prove that you will meet your timing constraints,
or, for a system that is not hard real-time, to prove that failure is rare.
This is a hard problem even with timing restrictions alone.
How do you know that you will meet all deadlines?
And how do you know the worst case for all these applications?
Sure, you can measure a billion instances of the program running, but could
something make it worse?
Caches are a pain here: execution time depends on what happens to be in the cache,
so an unmeasured cache state can produce a worse case than any run you observed.
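One classic analytical answer to "how do you know that you will meet all deadlines?" is a schedulability test. The sketch below applies the Liu and Layland utilization bound for rate-monotonic scheduling. This test was chosen here for illustration and is not taken from the slides.

The test rests on several assumptions:

- tasks are independent and periodic and run on one processor;
- each deadline equals its task's period;
- fixed priorities are assigned by rate.

The test is sufficient but not necessary. It is also only as trustworthy as its worst-case execution times (WCETs). A WCET must be a proven upper bound, and a maximum measured over many runs does not qualify, which is exactly the caching problem described above.

```python
from typing import Sequence, Tuple


def rm_utilization_test(tasks: Sequence[Tuple[float, float]]) -> Tuple[float, float, bool]:
    """Liu & Layland utilization-bound test for rate-monotonic scheduling.

    Each task is (wcet, period) in the same time unit. Returns
    (utilization, bound, passes). A task set that fails this sufficient
    test may still be schedulable; deciding that needs an exact analysis.
    """
    if not tasks:
        return 0.0, 1.0, True
    n = len(tasks)
    utilization = sum(wcet / period for wcet, period in tasks)
    bound = n * (2 ** (1 / n) - 1)   # tends to ln 2 (about 0.693) as n grows
    return utilization, bound, utilization <= bound


# Three hypothetical tasks: (wcet_ms, period_ms)
u, bound, ok = rm_utilization_test([(1, 4), (1, 5), (2, 10)])
print(f"U = {u:.3f}, bound = {bound:.3f}, guaranteed = {ok}")
# → U = 0.650, bound = 0.780, guaranteed = True
```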
11. Some Solutions
Embedded Systems
Real Time Operating Systems
Concurrent and Parallel Programming
Distributed Systems
12. What is an Embedded System?
An embedded system is a special-purpose computer system designed to perform
one or a few dedicated functions, often with real-time computing constraints.
Embedded systems contain a processor, software, and memory. The processor
may be anything from an 8051 microcontroller to a Pentium 4 processor, and the
memory typically includes both ROM and RAM.
Processor
Memory
Input/Output
13. What is an Embedded System?
Embedded systems also contain some type of inputs and outputs.
Inputs to the system generally take the form of sensors, communication
signals, or control knobs and buttons.
Outputs are generally displays, communication signals, or changes to the physical
world.
Real-time embedded systems are one major subclass of embedded systems, and
timing is the most important requirement for this type of system.
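A real-time embedded system often runs a periodic loop that reads its inputs, computes, and drives its outputs once per period. The sketch below illustrates that loop under stated assumptions:

- the deadline of each cycle is the next release instant;
- the thermostat, its readings, and the 20 °C threshold are hypothetical;
- Python on a general-purpose OS gives no timing guarantees, so this only shows the structure of the loop.

The clock and sleep functions are injectable so that the loop can be tested with a fake clock.

```python
import time
from typing import Callable, List


def run_periodic(step: Callable[[], None], period_s: float, cycles: int,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> List[bool]:
    """Run one sense -> compute -> actuate `step` per period.

    Returns one flag per cycle: True if the step finished by the next release
    instant (an implicit deadline equal to the period), False if it overran.
    """
    met: List[bool] = []
    next_release = clock()
    for _ in range(cycles):
        step()
        next_release += period_s            # releases stay on a fixed grid (no cumulative drift)
        now = clock()
        met.append(now <= next_release)
        if now < next_release:
            sleep(next_release - now)       # idle until the next release
    return met


# Hypothetical thermostat: input from a temperature sensor, output to a heater.
readings = iter([19.5, 20.4, 21.2])
heater_on: List[bool] = []


def thermostat_step() -> None:
    temp_c = next(readings)                 # input: sensor reading
    heater_on.append(temp_c < 20.0)         # output: actuator command


print(run_periodic(thermostat_step, period_s=0.01, cycles=3))
print(heater_on)  # → [True, False, False]
```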
16. Real-Time Operating System
An RTOS is an OS for response-time-controlled and event-controlled processes. It is
essential for large-scale embedded systems.
The main task of an RTOS is to manage the resources of the computer such that a particular
operation executes in a predictable, bounded amount of time every time it occurs.
Multitasking
Inter-Task communications
Deterministic response
Fast Response
Low Interrupt Latency
Synchronization
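Multitasking with deterministic response is usually built on preemptive, priority-based scheduling: the highest-priority ready task always runs. The tick-based simulation below sketches that idea. Lower numbers mean higher priority, and ties are broken in release order; both are assumptions of this sketch. It does not reflect how any particular RTOS kernel is implemented.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(order=True)
class Task:
    priority: int                            # lower number = higher priority
    seq: int                                 # release order; breaks priority ties FIFO
    name: str = field(compare=False)
    remaining: int = field(compare=False)    # execution ticks still needed


def simulate(releases: Dict[int, List[Task]], horizon: int) -> List[str]:
    """At every tick, run the highest-priority ready task for one unit.

    A newly released higher-priority task therefore preempts a running
    lower-priority one. Returns the task that ran in each tick ("idle" if none).
    """
    ready: List[Task] = []
    timeline: List[str] = []
    for t in range(horizon):
        for task in releases.get(t, []):
            heapq.heappush(ready, task)       # task becomes ready
        if not ready:
            timeline.append("idle")
            continue
        task = ready[0]                       # peek at the highest-priority ready task
        timeline.append(task.name)
        task.remaining -= 1
        if task.remaining == 0:
            heapq.heappop(ready)              # finished: leave the ready queue
    return timeline


seq = itertools.count()
releases = {
    0: [Task(priority=2, seq=next(seq), name="logger", remaining=3)],
    1: [Task(priority=1, seq=next(seq), name="motor", remaining=1)],
}
timeline = simulate(releases, horizon=5)
print(timeline)  # → ['logger', 'motor', 'logger', 'logger', 'idle']
```

The urgent `motor` task is released at tick 1 and runs immediately, preempting `logger`. `logger` then resumes where it left off.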
17. When RTOS is necessary?
An RTOS is essential when you need…
A common and effective way of handling hardware source calls from interrupts
I/O management with devices, files, and mailboxes, which becomes simple using an RTOS
Effective scheduling, running, and blocking of tasks when there are many tasks
…and many more
In conclusion, an RTOS may not be necessary in a small-scale embedded system.
An RTOS is necessary when scheduling of multiple processes and devices is
important.
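The mailbox pattern mentioned above can be sketched with Python's standard `queue` and `threading` modules. A producer task, standing in for interrupt-driven input, posts messages. A consumer task blocks on the mailbox with a timeout, much as an RTOS task might pend on a mailbox or message queue.

This only illustrates the pattern. It does not reproduce any RTOS's API, and Python threads give no real-time guarantees. The sentinel value, mailbox size, and timeout are assumptions.

```python
import queue
import threading
from typing import List

STOP = None  # sentinel meaning "no more messages" (a convention of this sketch)


def run_mailbox_demo(readings: List[int]) -> List[int]:
    """Two tasks communicating through a bounded mailbox."""
    mailbox: "queue.Queue" = queue.Queue(maxsize=4)  # bounded: producer blocks when full
    processed: List[int] = []

    def producer() -> None:
        for r in readings:
            mailbox.put(r)
        mailbox.put(STOP)

    def consumer() -> None:
        while True:
            try:
                msg = mailbox.get(timeout=1.0)   # bounded wait instead of blocking forever
            except queue.Empty:
                break
            if msg is STOP:
                break
            processed.append(msg * 2)            # stand-in for real processing

    tasks = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in tasks:
        t.start()
    for t in tasks:
        t.join()
    return processed


print(run_mailbox_demo([1, 2, 3]))  # → [2, 4, 6]
```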
21. Complexity
Relational Data (Tables/Transaction/Legacy Data)
Text Data (Web)
Semi-structured Data (XML)
Graph Data
Social Network, Semantic Web (RDF), …
Streaming Data
You can only scan the data once
Big Public Data (online, weather, finance, etc)
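"You can only scan the data once" rules out algorithms that revisit earlier values. Single-pass algorithms avoid that problem. Welford's online algorithm for mean and variance is one standard example, chosen here for illustration. Each value is used once and then discarded, so memory use stays constant however long the stream runs.

```python
import math
from typing import Iterable, Tuple


def one_pass_stats(stream: Iterable[float]) -> Tuple[int, float, float]:
    """Count, mean, and sample standard deviation in a single scan (Welford)."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in stream:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)   # second factor uses the updated mean
    std = math.sqrt(m2 / (n - 1)) if n > 1 else 0.0
    return n, mean, std


# A generator can be consumed only once, like a stream.
stream = (float(v) for v in [2, 4, 4, 4, 5, 5, 7, 9])
n, mean, std = one_pass_stats(stream)
print(n, mean, round(std, 3))
```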
22. Speed
Data is being generated fast and needs to be processed fast
Online Data Analytics
Late decisions mean missed opportunities
Social media and networks
(all of us are generating data)
Mobile devices
(tracking all objects all the time)
Sensor technology and networks
(measuring all kinds of data)
Let's look at an end-to-end architecture that puts together open source tools to do real-time stream processing.
Let's start with the sources of data.
You want to write this data to a reliable, high-throughput, low-latency messaging system. Kafka and Flume are popular choices, but there are many options out there, like ActiveMQ, RabbitMQ, etc.
Kafka is the system that is gaining the most popularity right now.
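A key property of a log-based messaging system such as Kafka is that records are appended to a log. Each consumer group tracks its own read offset, so several downstream applications can read the same stream independently.

The toy in-memory stand-in below illustrates only that idea. It is not the Kafka client API. It has no partitions, persistence, or replication, and the class and method names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class InMemoryLog:
    """Toy append-only log with per-consumer-group offsets (hypothetical)."""

    def __init__(self) -> None:
        self._records: List[str] = []
        self._offsets: Dict[str, int] = defaultdict(int)

    def produce(self, record: str) -> int:
        self._records.append(record)
        return len(self._records) - 1             # offset of the new record

    def poll(self, group: str, max_records: int = 10) -> List[Tuple[int, str]]:
        start = self._offsets[group]
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)  # "commit" the new position
        return list(enumerate(batch, start=start))


log = InMemoryLog()
for event in ["click:42", "click:7", "view:42"]:
    log.produce(event)
print(log.poll("analytics"))               # reads all three records
print(log.poll("alerts", max_records=1))   # independent offset: reads only the first
```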
A stream processing system like Spark Streaming can then read your data streams from the messaging system.
Filter
Enrich or embellish your data with relevant metadata
Transform
Compute statistics based on moving windows of time
Feature Engineering + Predictive Analytics
… and much more
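The steps listed above can be sketched as a chain of plain Python generators, each consuming the previous stage's output as events arrive. These generators stand in for the operators a stream engine such as Spark Streaming provides; this is not Spark's API.

The event fields, the sensor-to-site lookup table, and the Fahrenheit-to-Celsius transform are all hypothetical. Events are assumed to arrive in timestamp order.

```python
from collections import defaultdict, deque
from typing import Any, Deque, Dict, Iterable, Iterator, Tuple

Event = Dict[str, Any]

# Hypothetical lookup table used for enrichment.
SENSOR_SITES = {"s1": "plant-a", "s2": "plant-b"}


def filter_valid(events: Iterable[Event]) -> Iterator[Event]:
    """Filter: drop events that have no reading."""
    for e in events:
        if e.get("value") is not None:
            yield e


def enrich(events: Iterable[Event]) -> Iterator[Event]:
    """Enrich: attach metadata looked up from the sensor id."""
    for e in events:
        yield {**e, "site": SENSOR_SITES.get(e["sensor"], "unknown")}


def transform(events: Iterable[Event]) -> Iterator[Event]:
    """Transform: convert the reading from Fahrenheit to Celsius."""
    for e in events:
        yield {**e, "value_c": (e["value"] - 32) * 5 / 9}


def windowed_mean(events: Iterable[Event],
                  window_s: float) -> Iterator[Tuple[float, str, float]]:
    """Per-site mean of value_c over a sliding window of `window_s` (> 0) seconds."""
    windows: Dict[str, Deque[Tuple[float, float]]] = defaultdict(deque)
    for e in events:
        w = windows[e["site"]]
        w.append((e["ts"], e["value_c"]))
        while w[0][0] <= e["ts"] - window_s:   # evict readings that left the window
            w.popleft()
        yield e["ts"], e["site"], sum(v for _, v in w) / len(w)


raw = [
    {"ts": 0, "sensor": "s1", "value": 32},
    {"ts": 1, "sensor": "s2", "value": None},  # incomplete: filtered out
    {"ts": 2, "sensor": "s1", "value": 212},
    {"ts": 3, "sensor": "s2", "value": 68},
    {"ts": 5, "sensor": "s1", "value": 50},
]
results = list(windowed_mean(transform(enrich(filter_valid(raw))), window_s=3))
print(results)
```

Because every stage is a generator, no stage waits for the whole input, which is the property a streaming pipeline needs.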
Almost always, you want to take your full-fidelity raw data and put it in HDFS, or in an object store if you are running in the cloud.
The raw data can then be used in batch jobs where you may want to do deep, complex processing that cannot be done in a streaming fashion. Or you may have a team of data scientists who want to explore the data and uncover new insights.
Why the dotted line? How you dump your data to HDFS depends on your messaging system. Almost all messaging systems provide a way to transfer your data to HDFS.
All this real-time processing is great, but not very useful if you cannot serve the processed data to your application in real time. You need a system that enables a lot of fast reads and writes. That is where NoSQL stores come in. There are many choices here; HBase, Cassandra, and MongoDB are popular choices.
All those end applications can then read the processed data from the NoSQL store.
Also, for most pairs of stream processing engine and NoSQL store, there are libraries available that make it easy to read from or write to your NoSQL store from the stream processing engine. For example, the SparkOnHbase library makes it easy to write to HBase from Spark Streaming jobs.
Another common scenario is indexing your data, in real-time, into a search system.
This is great if the data you are dealing with is textual data.
There are libraries that enable real-time indexing of your data in your stream processing engine and writing it to a search engine.
Now the data is ready to be queried by your application.
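The core idea behind real-time indexing is that each document updates an inverted index (term to document ids) as it arrives, so it becomes searchable immediately. The toy index below is a hypothetical sketch of that idea. It is not a search engine's API, and its tokenizer is deliberately naive; real analyzers also handle stemming, stop words, and so on.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


class IncrementalIndex:
    """Toy inverted index updated one document at a time (hypothetical)."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[str]] = defaultdict(set)

    @staticmethod
    def _tokens(text: str) -> List[str]:
        # Naive tokenizer: lower-cased runs of letters and digits.
        return re.findall(r"[a-z0-9]+", text.lower())

    def index(self, doc_id: str, text: str) -> None:
        for term in self._tokens(text):
            self._postings[term].add(doc_id)

    def search(self, query: str) -> Set[str]:
        """Return ids of documents containing every query term (AND semantics)."""
        terms = self._tokens(query)
        if not terms:
            return set()
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result


idx = IncrementalIndex()
for doc_id, text in [("t1", "Server down in Plant A"), ("t2", "Plant B server healthy")]:
    idx.index(doc_id, text)                 # searchable as soon as it is indexed
print(sorted(idx.search("server plant")))  # → ['t1', 't2']
print(idx.search("down"))                  # → {'t1'}
```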
This is a very common and popular architecture, and I am guessing this is in keeping with what most of you would have expected.
Again, write your processed output to HDFS. Again, why the dotted arrow? Whether or not you need to dump data to HDFS depends upon your serving system of choice. If you write it to HBase, you may not need to duplicate it in HDFS. But if you are indexing the data in search or writing to a system like Redis, you may want to also write the processed output to HDFS. Why? If nothing else, for auditing purposes. Errors will happen, and you may need to go back and audit what was done in your stream processing engine. Hence, put the data in HDFS and keep it there for some amount of time.
With this architecture, the real-time processed data only gets leveraged when the next application query comes in. But often you want to take some action based on the real-time analysis of your data.
For proactive actions, write relevant events out to Kafka. Again, depending on your stream processing engine, you will find libraries that make this easy.
You can have an application that is continuously listening on your event queue and can issue alerts, emails, etc.
By writing it to a message queue, you enable multiple downstream applications to consume the data as it is produced, including further processing of your data with a stream processing engine. Such multi-stage architectures, where you consume from, say, Kafka, process the data, produce a new stream in Kafka, and process