Real Time Data
VAHID AMIRI
VAHIDAMIRY.IR
@VAHIDAMIRY
What is a Real-Time System?
 Real-time systems have been defined as: "those systems in
which the correctness of the system depends not only on the
logical result of the computation, but also on the time at which
the results are produced";
J. Stankovic, "Misconceptions About Real-Time Computing," IEEE Computer, 21(10),
October 1988.
 Real-time is the ability of the control system to respond to any
external or internal events in a fast and deterministic way.
 We say that a system is deterministic if the response time is
predictable.
Some Definitions
 Timing constraint: a constraint imposed on the timing behavior of a
job; it may be hard, firm, or soft.
 Release time: the instant at which a job becomes available for
execution.
 Deadline: the instant by which a job's execution is required to be
completed.
 Response time: the length of time from the release time to the instant the job
completes.
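The definitions above can be sketched in a few lines of Python. The `Job` class, its field names, and the numeric values are illustrative assumptions, not part of the original slides.

```python
from dataclasses import dataclass


@dataclass
class Job:
    release_time: float     # instant the job becomes available for execution
    deadline: float         # instant by which execution must be completed
    completion_time: float  # instant the job actually completes

    def response_time(self) -> float:
        # Length of time from release to completion.
        return self.completion_time - self.release_time

    def meets_deadline(self) -> bool:
        # The timing constraint holds if the job finishes by its deadline.
        return self.completion_time <= self.deadline


# Hypothetical job: released at t=10, due at t=25, finished at t=22.
job = Job(release_time=10.0, deadline=25.0, completion_time=22.0)
print(job.response_time())   # 12.0
print(job.meets_deadline())  # True
```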
Soft, Firm and Hard deadlines
 The instant at which a result is needed is called a
deadline.
 If the result has utility even after the deadline has passed,
the deadline is classified as soft, otherwise it is firm.
 If missing a firm deadline could result in a catastrophe, the
deadline is hard.
 Examples?
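One way to picture the three deadline classes is as a utility function of lateness. The sketch below is only an illustration: the function name, the linear decay for soft deadlines, the 10-unit decay window, and the use of negative infinity to stand for a catastrophe are all assumptions made for this example.

```python
def result_utility(kind: str, lateness: float) -> float:
    """Utility of a result delivered `lateness` time units after its deadline.

    A lateness of zero or less means the result arrived on time.
    """
    if lateness <= 0:
        return 1.0  # on time: full utility for every deadline class
    if kind == "soft":
        # Assumed model: utility decays linearly to zero over 10 time units.
        return max(0.0, 1.0 - lateness / 10.0)
    if kind == "firm":
        # A late result has no utility, but nothing worse happens.
        return 0.0
    if kind == "hard":
        # A miss is catastrophic; modeled here as unbounded negative utility.
        return float("-inf")
    raise ValueError(f"unknown deadline kind: {kind}")


for kind in ("soft", "firm", "hard"):
    print(kind, result_utility(kind, lateness=5.0))
```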
Hard Real Time Systems
 A system is hard real-time if it has a hard deadline for the completion of an action,
meaning that the deadline must always be met; otherwise
the task has failed.
 These types of systems are deployed in embedded safety-critical
systems in which a missed deadline can be catastrophic.
Power Station and Nuclear Reactor Control Systems
Missile Control System
Autopilot Control Systems
 Aircraft
 Train
 Car
Soft Real Time Systems
 Soft real time is defined by default as "not hard real time".
 Missing some deadlines by some amount under some circumstances may be
acceptable rather than counting as a failure.
 In these systems there is usually a rising cost associated with lateness.
 Soft real time means systems which have reduced constraints on "lateness"
but must still operate very quickly and repeatably.
 Examples:
 Multimedia
 Video Game Systems
 Real Time Data Analytics systems
Validating an RTS is hard
 Validation means being able to prove that you will meet your
constraints
 Or, for a non-hard real-time system, to prove that failure is rare.
 This is a hard problem even with timing restrictions alone
 How do you know that you will meet all deadlines?
 And how do you know the worst-case for all these applications?
 Sure you can measure a billion instances of the program running, but could
something make it worse?
 Caches are a pain here.
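The measurement problem can be illustrated with a small timing harness. This is a sketch under stated assumptions: `measure` and `workload` are hypothetical names, and the workload is a stand-in for a real task. The point is that the largest observed time is only a lower bound on the true worst case, since caches, interrupts, or other load could still make a later run slower.

```python
import time


def measure(fn, runs: int) -> list:
    """Run `fn` repeatedly and return each execution time in nanoseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter_ns()
        fn()
        samples.append(time.perf_counter_ns() - start)
    return samples


def workload() -> None:
    # Hypothetical stand-in for the task whose timing we care about.
    sum(range(1000))


samples = measure(workload, runs=10_000)
mean_ns = sum(samples) / len(samples)
observed_max_ns = max(samples)

# The observed maximum only bounds the worst case from below:
# no number of measurements proves that a slower run cannot happen.
print(f"mean: {mean_ns:.0f} ns, observed max: {observed_max_ns} ns")
```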
Some Solutions
 Embedded Systems
 Real Time Operating Systems
 Concurrent and Parallel Programming
 Distributed Systems
What is an Embedded System?
 An embedded system is a special-purpose computer system designed to perform
one or a few dedicated functions, often with real-time computing constraints.
 Embedded systems contain a processor, software, and memory. The processor
may be an 8051 microcontroller or a Pentium 4 processor, and the memory consists of ROM
and RAM.
Processor
Memory
Input Output
What is an Embedded System?
 Embedded systems also contain some type of inputs and outputs
 Inputs to the system generally take the form of sensors, communication
signals, or control knobs and buttons.
 Outputs are generally displays, communication signals, or changes to the physical
world.
 Real-time embedded systems are one major subclass of embedded systems, and
time is the most important factor for this type of system
Embedded Systems
Real-Time Operating System
 An RTOS is an OS for response-time-controlled and event-controlled processes. It is
essential for large-scale embedded systems.
 The main task of an RTOS is to manage the resources of the computer so that a particular
operation executes in a predictable, consistent amount of time every time it occurs.
 Multitasking
 Inter-Task communications
 Deterministic response
 Fast Response
 Low Interrupt Latency
 Synchronization
When is an RTOS necessary?
An RTOS is essential when…
 A common and effective way of handling hardware source calls from
interrupts is needed
 I/O management with devices, files, and mailboxes needs to be simple
 Tasks must be scheduled, run, and blocked effectively when there are many
tasks, and more
 In conclusion, an RTOS may not be necessary in a small-scale embedded system.
An RTOS is necessary when scheduling of multiple processes and devices is
important.
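The scheduling role described above can be sketched as a priority-ordered ready queue, a structure many RTOS kernels use in some form. This is a simplified illustration, not the design of any particular RTOS: the `ReadyQueue` class, the task names, and the convention that a lower number means a higher priority are assumptions, and preemption, blocking, and interrupts are left out.

```python
import heapq


class ReadyQueue:
    """Holds ready tasks and always dispatches the highest-priority one."""

    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker: equal priorities are served in FIFO order

    def make_ready(self, priority: int, name: str) -> None:
        # Assumed convention: lower number means higher priority.
        heapq.heappush(self._heap, (priority, self._seq, name))
        self._seq += 1

    def dispatch(self):
        # Return the name of the next task to run, or None if nothing is ready.
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


rq = ReadyQueue()
rq.make_ready(3, "logger")          # hypothetical low-priority task
rq.make_ready(1, "motor_control")   # hypothetical time-critical task
rq.make_ready(2, "sensor_read")

order = [rq.dispatch() for _ in range(3)]
print(order)  # ['motor_control', 'sensor_read', 'logger']
```

Because the dispatch decision depends only on priorities and arrival order, the order in which tasks run is predictable, which is the property the slides call a deterministic response.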
Distributed Systems
Big Data Characteristics
The world in 60 seconds
Complexity
 Relational Data (Tables/Transaction/Legacy Data)
 Text Data (Web)
 Semi-structured Data (XML)
 Graph Data
 Social Network, Semantic Web (RDF), …
 Streaming Data
 You can only scan the data once
 Big Public Data (online, weather, finance, etc)
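The "scan the data once" constraint on streaming data means statistics have to be updated incrementally, without storing or revisiting earlier items. A minimal sketch using Welford's online algorithm follows; the `RunningStats` class and the sample readings are assumptions made for illustration.

```python
class RunningStats:
    """Single-pass mean and variance (Welford's online algorithm)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        # Each item is seen exactly once and then discarded.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    def variance(self) -> float:
        # Population variance of everything seen so far.
        return self._m2 / self.count if self.count else 0.0


stats = RunningStats()
# iter() stands in for a stream that cannot be rewound.
for reading in iter([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]):
    stats.update(reading)

# For these readings the mean is about 5.0 and the variance about 4.0.
print(round(stats.mean, 6), round(stats.variance(), 6))
```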
Speed
 Data is being generated fast and needs to be processed fast
 Online Data Analytics
 Late decisions lead to missed opportunities
Social media and networks
(all of us are generating data)
Mobile devices
(tracking all objects all the time)
Sensor technology and
networks
(measuring all kinds of data)
Big Data Vs Real Time
Big Data Processing Timeline
 Batch processing
 Large amount of static data
 Scalable solution
 Volume
 Real-time processing
 Computing streaming data
 Low latency
 Velocity
 Hybrid computation
 Lambda Architecture
 Kappa Architecture
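The hybrid approach can be sketched with a toy version of the Lambda Architecture: a batch layer recomputes a view from the full static dataset, a speed layer updates a real-time view per event, and a query merges the two. All names and data below are hypothetical, and in-memory counters stand in for real batch and streaming engines.

```python
from collections import Counter


def build_batch_view(master_dataset) -> Counter:
    # Batch layer: recompute counts from the full, static dataset (volume).
    return Counter(master_dataset)


class SpeedLayer:
    """Speed layer: incrementally updates a view as events arrive (velocity)."""

    def __init__(self):
        self.realtime_view = Counter()

    def on_event(self, key: str) -> None:
        self.realtime_view[key] += 1


def query(batch_view: Counter, realtime_view: Counter, key: str) -> int:
    # Serving side: merge the batch view with events the batch has not seen yet.
    return batch_view[key] + realtime_view[key]


historical = ["a", "b", "a"]       # hypothetical data already in batch storage
batch_view = build_batch_view(historical)

speed = SpeedLayer()
for event in ["a", "c"]:           # hypothetical events since the last batch run
    speed.on_event(event)

print(query(batch_view, speed.realtime_view, "a"))  # 3
```

A Kappa Architecture, by contrast, drops the separate batch layer and handles reprocessing by replaying the stream through the same streaming code path.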
Big Data Solutions
Spark Stack
Conceptual and Physical View of Storm
Big Data Architecture
Lambda Architecture
Kappa Architecture
Unified Architecture
Demo Time
Case Study: Twitter
Data Sources
Kafka
NoSQL
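The flow in this case study (data sources, then a message bus, then a serving store) can be sketched with standard-library stand-ins. This is not Kafka or a NoSQL client: `queue.Queue` plays the role of a Kafka topic, a dictionary plays the role of the NoSQL store, and the event fields and the hashtag-counting step are hypothetical choices made only to have something to process.

```python
import queue
from collections import defaultdict

message_bus = queue.Queue()       # stand-in for a Kafka topic
serving_store = defaultdict(int)  # stand-in for a NoSQL table


def produce(events) -> None:
    # Data sources write raw events to the message bus.
    for event in events:
        message_bus.put(event)


def process_stream() -> None:
    # Stream processor: read from the bus, filter, transform, write results.
    while not message_bus.empty():
        event = message_bus.get()
        if not event.get("text"):
            continue  # filter out empty messages
        for word in event["text"].split():
            if word.startswith("#"):
                serving_store[word.lower()] += 1  # count hashtags


produce([
    {"user": "u1", "text": "Loving #Spark and #Kafka"},
    {"user": "u2", "text": ""},
    {"user": "u3", "text": "#kafka at scale"},
])
process_stream()
print(dict(serving_store))  # {'#spark': 1, '#kafka': 2}
```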

Editor's Notes

  • #34 Let's look at an end-to-end architecture for putting together open source tools to do real-time stream processing. Let's start with the sources of data.
  • #35 You want to write this data to a reliable, high-throughput, low-latency messaging system. Kafka and Flume are popular choices, but there are many options out there, like ActiveMQ, RabbitMQ, etc. Kafka is the system that is gaining the most popularity right now.
  • #36 A stream processing system like Spark Streaming can then read your data streams from the messaging system. It can filter; enrich or embellish your data with relevant metadata; transform; compute statistics based on moving windows of time; do feature engineering and predictive analytics; and much more.
  • #37 Almost always, you want to take your full-fidelity raw data and put it in HDFS, or an object store if you are running in the cloud. The raw data can then be used in batch jobs where you may want to do deep, complex processing that cannot be done in a streaming fashion. Or you may have a team of data scientists who want to explore the data and uncover new insights. Why the dotted line: how you dump your data to HDFS depends on your messaging system. Almost all messaging systems provide a way to transfer your data to HDFS.
  • #38 All this real-time processing is great, but not very useful if you cannot serve the processed data to your application in real time. You need a system that can enable a lot of fast reads and writes. That is where NoSQL stores come in. There are many choices here. HBase, Cassandra, and MongoDB are popular choices. All those end applications Also, for most stream processing engine and NoSQL store pairs, there are libraries available that make it easy to read from or write to your NoSQL store from the stream processing engine: for example, the SparkOnHBase library makes it easy to write to HBase from Spark Streaming jobs.
  • #39 Another common scenario is indexing your data, in real time, into a search system. This is great if the data you are dealing with is textual data. There are libraries that enable real-time indexing of your data in your stream processing engine, and writing it to a search engine.
  • #40 Now the data is ready to be queried by your application. This is a very common and popular architecture, and I am guessing this is in keeping with what most of you would have expected.
  • #41 Again, write your processed output to HDFS. Again, why the dotted arrow? Whether or not you need to dump data to HDFS depends upon your serving system of choice. If you write it to HBase, you may not need to duplicate it in HDFS. But if you are indexing the data in search or writing to a system like Redis, you may want to also write the processed output to HDFS. Why? If nothing else, for auditing purposes. Errors will happen, and you may need to go back and audit what was done in your stream processing engine. Hence, put the data in HDFS and keep it there for some amount of time.
  • #42 With this architecture, the real-time processed data only gets leveraged when the next application query comes in. But often you want to take some action based on the real-time analysis of your data. For proactive actions, write relevant events out to Kafka. Again, depending on your stream processing engine, you will find libraries that make this easy. You can have an application that is continuously listening on your event queue and can issue alerts, emails, etc.
  • #43 By writing it to a message queue, you enable multiple downstream applications to consume the data as it is produced, including enabling further processing of your data with a stream processing engine. Such multi-stage architectures, where you consume from say Kafka, process the data, produce a new stream in Kafka, and process