2. Introduction
Motivation and Background
Architecture
Framework
Results
Future work
Conclusion
Index terms
References
Wayne State University
3. This paper describes:
- A few key additional requirements that result from having to
support in-memory processing of data while updates proceed
concurrently
- RAF
- Two RAF-based solutions (discussed further)
4. A few examples of information in motion that may be just seconds old, and
not yet well categorized or linked to other data:
- GPS-based navigation: to reduce wasted energy, accidents, delays and
emergencies
- A credit card company: to detect and intercept suspicious transactions
- A metropolitan or regional power grid: to modulate power generation,
perform load balancing, direct repair actions, and take policy enforcement
steps
An essential feature of the above examples is the need to integrate new
transactions into analysis results within a very short time, sometimes as
short as a few tens of milliseconds.
5. RDDs (resilient distributed datasets) make in-memory solutions less
failure-prone. RAF enhances the RDD approach so that resiliency is blended
with a few additional characteristics, listed below:
• Efficient allocation and control of memory resources
• Resilient update of information at much finer resolution
• Flexible and highly efficient concurrency control
• Replication and partitioning of data transparent to clients
Architecturally, RAF elevates memory across an entire cluster to a first-class
storage entity and defines high-level mechanisms by which applications on RAF
can orchestrate distributed actions upon objects stored in cluster memory.
To promote responsible and transparent use of memory, RAF opts to use a
programming language such as C or C++ over mixed-language environments in
which garbage collection is opaque.
6. Data has a lot of value when mined. As data continues to compound at brisk
rates, institutions need to grapple with two broad demands:
- accumulating, processing, synopsizing and utilizing information in a
timely manner
- storing the refined data resiliently while keeping it accessible at high
speed
The term Big Data itself is elastic and serves well as a description of the scale
or volume of these solutions, but does not define a constraining principle for
organizing storage.
7. Requirements for low-latency, high-throughput analytics on datasets:
In-memory structures and storage
Resiliency
Sharing data through memory
Uniform interaction with storage
Minimizing memory recycling
Efficient integration of CRUD
Synchronizing efficiently
Searching efficiently
9. Translation of the eight requirements into five design elements:
- C and C++ based programming for efficient sharing of data
through memory
- Resilient storing of new content
- Efficient concurrency
- Processing information in motion
- Fast, general, ad hoc searches
10. This framework targets the execution of complex queries at
very low latency.
Information upon which queries operate may be available on
some storage medium, or generated dynamically as a result of
ongoing transactional activities.
RAF provides a distributed computing environment integrated
with a memory-centric, distributed storage system, in which
one application can pass data to another by sharing it in
memory.
11. RDD: stores information in the memory of one or more machines in such a
way that, if one or more machines fail, the RDD can be reconstructed.
Transformations: operations on an RDD that generate new datasets. RAF
transformations include join, map, union, etc.
Filter: a particular type of transformation. It produces a dataset whose
contents satisfy a specified condition.
Delegate: a bridging module. Its purpose is to create a version of the
datastore as of a particular time and present it as a memory-resident RDD.
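The four terms above can be illustrated with a small conceptual sketch. It is written in Python for brevity and is not RAF's C/C++ interface; `SketchRDD`, `SketchDelegate`, `lose_memory` and the sample records are all hypothetical names chosen for this illustration. The sketch assumes that a derived dataset keeps a reference to its parents (its lineage) so that lost in-memory contents can be recomputed, and that a delegate works by copying the datastore at a point in time.

```python
import copy


class SketchRDD:
    """Hypothetical read-only dataset that records how it was derived."""

    def __init__(self, compute, parents=()):
        self._compute = compute        # zero-argument function that yields the rows
        self.parents = tuple(parents)  # lineage: the datasets this one came from
        self._cache = None             # materialized rows, held "in memory"

    def collect(self):
        # Materialize on first use; later calls reuse the cached rows.
        if self._cache is None:
            self._cache = list(self._compute())
        return list(self._cache)

    def lose_memory(self):
        # Simulate losing the machine that held the materialized rows.
        self._cache = None

    def map(self, fn):
        return SketchRDD(lambda: (fn(row) for row in self.collect()), [self])

    def filter(self, predicate):
        # Filter is a transformation that keeps rows satisfying a condition.
        return SketchRDD(
            lambda: (row for row in self.collect() if predicate(row)), [self]
        )

    def union(self, other):
        return SketchRDD(lambda: self.collect() + other.collect(), [self, other])


class SketchDelegate:
    """Hypothetical bridge from a mutable datastore to a read-only dataset."""

    def __init__(self, datastore):
        self._datastore = datastore

    def snapshot(self):
        # Copy the datastore as of now; later updates do not affect the copy.
        frozen = copy.deepcopy(list(self._datastore.values()))
        return SketchRDD(lambda: frozen)


store = {1: {"id": 1, "amount": 40}, 2: {"id": 2, "amount": 900}}
snapshot = SketchDelegate(store).snapshot()
store[3] = {"id": 3, "amount": 5000}   # update arriving after the snapshot

large = snapshot.filter(lambda row: row["amount"] > 500)
large_ids = large.map(lambda row: row["id"])
first_result = large_ids.collect()

# Drop the derived datasets' rows, then rebuild them from their lineage.
large.lose_memory()
large_ids.lose_memory()
rebuilt_result = large_ids.collect()
```

In this sketch the update made after the snapshot is not visible to the query, and the derived datasets are recomputed from their parents once their cached rows are dropped. A real system would also need to rebuild the source snapshot from resilient storage, which the sketch does not model.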
13. Efficient storage sharing using Delegate
Memory-centric storage operation
- Reliability
Data and storage types
- Structured data
- Storage types (replicated store and partitioned store)
Distributed execution of analytics tasks
- Analytics tasks interface
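The difference between the two storage types named above can be sketched as follows. This is a minimal illustration in Python, not RAF's implementation: the class names, the use of CRC32 to choose an owning node, and the subscriber keys are assumptions made for the example. A partitioned store places each key on exactly one node, while a replicated store writes each key to every node.

```python
import zlib


class PartitionedStore:
    """Hypothetical store in which each key lives on exactly one node."""

    def __init__(self, num_nodes):
        self.nodes = [dict() for _ in range(num_nodes)]

    def _owner(self, key):
        # Deterministic hash, so every client maps a key to the same node.
        return zlib.crc32(str(key).encode("utf-8")) % len(self.nodes)

    def put(self, key, value):
        self.nodes[self._owner(key)][key] = value

    def get(self, key):
        return self.nodes[self._owner(key)].get(key)


class ReplicatedStore:
    """Hypothetical store in which every node holds a full copy."""

    def __init__(self, num_nodes):
        self.nodes = [dict() for _ in range(num_nodes)]

    def put(self, key, value):
        # Writes go to all nodes, so any node can serve a read.
        for node in self.nodes:
            node[key] = value

    def get(self, key, node_index=0):
        return self.nodes[node_index].get(key)


keys = [f"subscriber-{i}" for i in range(100)]
partitioned = PartitionedStore(num_nodes=4)
replicated = ReplicatedStore(num_nodes=3)
for key in keys:
    partitioned.put(key, key.upper())
    replicated.put(key, key.upper())
```

Clients call the same `put` and `get` in both cases, which is one way to keep replication and partitioning transparent to them. The trade-off in this sketch is that partitioning spreads memory use across nodes, while replication multiplies it in exchange for local reads on any node.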
16. Unit testing:
- Scalability testing results (how well update operations scale)
- Latency relative to Hive/HDFS (how long it takes to complete a
query)
NOTE: These unit test results show the advantage of RAF's design,
which is oriented toward in-memory distributed processing.
Solution-level implementation and testing:
- Telecommunications subscriber management
- Safe City solution
19. Motivated by the high degree of familiarity that many developers have
with database interfaces, we are incrementally introducing
SQL-92/JDBC/ODBC-like interfaces on top of RAF. A number of optimizations
are also being added.
These optimizations include:
- application-requested indexing, to accelerate searches
- blending in column-store capabilities where appropriate (for example, for
rarely written data)
- compression, to reduce the data transported between nodes
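As an illustration of the first optimization, the sketch below shows one way application-requested indexing can accelerate searches. It is a hedged example in Python, not RAF's interface: `IndexedTable`, `create_index`, `find` and the sample rows are hypothetical. The assumption is that an index maps each value of a chosen field to the positions of the matching rows, so an equality search no longer scans every row.

```python
from collections import defaultdict


class IndexedTable:
    """Hypothetical in-memory table with optional per-field indexes."""

    def __init__(self):
        self.rows = []
        self.indexes = {}  # field name -> {field value -> [row positions]}

    def create_index(self, field):
        # Built only when the application asks for it.
        index = defaultdict(list)
        for position, row in enumerate(self.rows):
            index[row.get(field)].append(position)
        self.indexes[field] = index

    def insert(self, row):
        position = len(self.rows)
        self.rows.append(row)
        # Keep every existing index consistent with the new row.
        for field, index in self.indexes.items():
            index[row.get(field)].append(position)

    def find(self, field, value):
        if field in self.indexes:
            # Indexed path: jump directly to the matching positions.
            return [self.rows[p] for p in self.indexes[field].get(value, [])]
        # Fallback path: scan every row.
        return [row for row in self.rows if row.get(field) == value]


table = IndexedTable()
table.insert({"subscriber": "a", "city": "Detroit"})
table.insert({"subscriber": "b", "city": "Ann Arbor"})

scanned = table.find("city", "Detroit")   # no index yet, full scan
table.create_index("city")
table.insert({"subscriber": "c", "city": "Detroit"})
indexed = table.find("city", "Detroit")   # served from the index
```

The cost of the index in this sketch is extra memory and extra work on every insert, which is one reason to build it only on application request.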
20. Conclusion:
- Discussed RAF, an architectural approach that meshes memory-centric,
non-relational query processing for low-latency analytics with
memory-centric update processing to accommodate high volumes of updates.
- Introduced Delegate, which participates as a special type of content
transformer in a hierarchy of RDD transformations.
- Used protocol buffers in RAF to obtain data abstraction and efficient
conveyance among applications, providing applications with a high degree
of independence in location, representation, and transmission of data.
- Provided a lightweight but expressive interface for RAF.
- Showed, using unit tests, high cluster scaling capability for transactions
and an order-of-magnitude latency improvement for query processing.
- Discussed two real-world usage scenarios in which RAF is being used.