Apache Ignite is a high-performance, integrated, and distributed in-memory platform for computing and transacting on large-scale data sets in real time. But did you know it also provides streaming and complex event processing (CEP)? In this hands-on demonstration we will take Apache Ignite’s streaming and CEP features for a test drive. We will start with an example streaming use case, then demonstrate how to implement each component in Apache Ignite. Finally, we will show how to connect a dashboard application to Apache Ignite to display the results.
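As a taste of what windowed CEP over a stream looks like, here is a minimal, framework-agnostic sketch in Python (Ignite’s actual streaming API is Java-based; the detector class, window size, and threshold below are invented for illustration):

```python
from collections import deque

class SlidingWindowDetector:
    """Toy CEP rule: raise an alert when the average of the last
    `size` sensor readings exceeds a threshold."""
    def __init__(self, size, threshold):
        self.window = deque(maxlen=size)
        self.threshold = threshold

    def on_event(self, value):
        self.window.append(value)
        if len(self.window) == self.window.maxlen:
            avg = sum(self.window) / len(self.window)
            if avg > self.threshold:
                return f"ALERT: avg {avg:.1f} over last {len(self.window)} events"
        return None  # window not full yet, or condition not met

detector = SlidingWindowDetector(size=3, threshold=50)
# Feed a small event stream; collect any alerts the rule fires.
alerts = [a for v in [10, 20, 90, 95, 99]
          if (a := detector.on_event(v)) is not None]
```

A real Ignite deployment would express the same rule against a continuously updated cache via a continuous query; the moving parts (a bounded window, a predicate, an emitted alert) are the same.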
With persistent memory solutions quickly moving from concept designs to mass-production reality, IT architects are faced with significant questions: How do I get the most value out of my system? How will the broader market adopt and implement today’s NVDIMM portfolio? What applications gain the most benefit from today’s solutions? What are the current challenges for adoption? How should I plan to ensure I keep up with industry trends?
Gordon Patrick, director of Micron’s enterprise computing memory business, will provide a view of how current products are driving new opportunities in persistent memory and provide insight on important industry trends affecting tomorrow’s persistent memory platforms.
Three key audience takeaways:
What are the clearest routes to add value to your systems through today’s persistent memory solutions?
What key design elements should be considered given the broader shift to non-volatile memory systems?
What changes are needed to truly extract the value from today’s persistent memory technology?
Let’s face it: distributed computing is hard. The truth is that most systems and vendor solutions work great under regular conditions; what separates them is what happens when things go wrong. If you’re building a mission-critical distributed system, you need to take the time to build infrastructure to test for failure. In this talk we’ll outline how we think about testing a distributed system, and share some real-world experience in ferreting out issues before they become problems in production. We’ll provide a hands-on overview of our test framework and show you how you too can be prepared.
This talk describes the future memory and storage architecture created by the convergence of In-Memory computing and emerging Persistent Memory technologies. The audience will learn:
- The new memory and storage architecture created by these technologies;
- The new operating system file-system and memory-management architectures under development by the major OS vendors;
- The new APIs for In-Memory computing with Persistent Memory;
- Opportunities for software innovation based on this disruptive shift in cloud architecture.
Join Dr. Konstantin Boudnik, VP of Open Source Development at WANdisco and Member of the Apache Software Foundation, on Thursday, August 20, 2015 at 11:00 AM PDT / 2:00 PM EDT as he explains how Hadoop, Apache Spark and Apache Ignite™ (incubating) are integrated under Apache Bigtop. In this one-hour webinar he’ll go in-depth, including live demos and benchmarking examples, on how to turbocharge Hadoop back-end storage access with Apache Ignite™ (incubating) MapReduce and caching.
Today, many companies are faced with a huge quantity of data and a wide variety of tools with which to process it. This potentially allows for great opportunities to satisfy customers’ needs and bring user experience to the next level. However, in order to achieve this and provide a competitive solution, sophisticated and complex data processing is needed. Such processing can rarely be done with one tool or framework — a number of tools are often involved, each having prowess in a particular field of the processing pipeline.
In this session, we will see the latest endeavors of Apache Ignite to integrate with other big data platforms and provide its in-memory computing strengths for data processing pipelines. In particular, we will take a closer look at how it can be integrated and used with Apache Kafka and/or Flume, and outline several usage scenarios.
Deploying Distributed Databases and In-Memory Computing Platforms with Kubernetes (Stephen Darlington)
Presented at OpenStack Summit, Berlin 2018.
In this presentation, attendees will learn how Kubernetes can orchestrate a distributed database like Apache Ignite, in particular:
* Cluster assembly - database node auto-discovery in Kubernetes.
* Database resilience - automated horizontal scalability.
* Database availability - the respective roles of Kubernetes and the database.
* Utilizing both RAM and disk - setting up Apache Ignite to get in-memory performance with the durability of disk.
Octo and the DevSecOps Evolution at Oracle by Ian Van Hoven (InfluxData)
The transition from 40 years of successful licensed software development to an agile-based SaaS business involves many challenges. Octo, a real-time streaming metrics framework built around the InfluxDB time series database, is aimed specifically at one: simplifying the collection and visualization of mission-critical operational data to enable a culture change toward metrics immersion and product ownership. Learn more by viewing this InfluxDays NYC 2019 presentation.
New generations of database technology are enabling organizations to build applications never before possible, at a speed and scale previously unimaginable. MongoDB is the fastest-growing database in the world. The new version 3.2 brings the benefits of modern database architectures to an ever wider range of applications and users.
Amazon EC2 F1 is a new compute instance with programmable hardware for application acceleration. With F1, you can directly access custom FPGA hardware on the instance in a few clicks.
Learning Objectives:
• Learn about the capabilities, features, and benefits of the new F1 instances
• Develop your FPGA using the F1 Hardware Developer Kit and FPGA Developer AMI
• Deploy your FPGA acceleration code using F1 instances
• Use F1 instances for hardware acceleration in your applications
• Learn how to offer pre-packaged Amazon FPGA Machine Images (AFIs) to your customers through the AWS Marketplace
Using Databases and Containers From Development to Deployment (Aerospike, Inc.)
We cover the following topics:
Using Docker to orchestrate a multi-container application (Flask + Aerospike)
Injecting HAProxy and other production requirements as we deploy to production
Scaling the Web and Aerospike clusters to grow to meet demand
How to Get a Game-Changing Performance Advantage with Intel SSDs and Aerospike (Aerospike, Inc.)
Frank Ober of Intel’s Solutions Group will review how he achieved more than one million transactions per second on a single dual-socket Xeon server with SSDs, using Aerospike’s open source benchmarking tools. The presentation will include a live demo showing the performance of a sample system. We will cover:
The state of key-value stores on modern SSDs.
Which hardware choices will most benefit a consistent deployment of Aerospike.
How to run an Aerospike mesh on a single machine.
How replication works across that mesh, and which settings allow for maximum threading and scale.
We will also focus on some key learnings and the Total Cost of Ownership choices that will make your deployment more effective long term.
Running Analytics at the Speed of Your Business (Redis Labs)
The speed at which you can extract insights from your data is increasingly a competitive edge for your business. Data and analytics have to move at lightning-fast speeds to seriously impact your user acquisition.
Join this webinar featuring Forrester analyst Noel Yuhanna and Leena Joshi, VP Product Marketing at Redis Labs to learn how you can glean insights faster with new open source data processing frameworks like Spark and Redis.
In this webinar you will learn:
* Why analytics has to run at the real time speed of business
* How this can be achieved with next generation Big Data tools
* How data structures can optimize your hybrid transaction-analytics processing scenarios
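To illustrate the last point, here is a pure-Python stand-in for the Redis sorted-set pattern (ZINCRBY on the transactional write path, ZREVRANGE on the analytic read path); a real deployment would issue the equivalent commands against a Redis server, and all names below are illustrative:

```python
class SortedSet:
    """Minimal stand-in for a Redis sorted set: the same structure serves
    transactional score updates and analytic top-N reads."""
    def __init__(self):
        self.scores = {}  # member -> score

    def zincrby(self, member, amount):
        # Write path: each transaction bumps a member's score.
        self.scores[member] = self.scores.get(member, 0) + amount
        return self.scores[member]

    def zrevrange(self, n):
        # Read path: top-N members by score, like ZREVRANGE ... WITHSCORES.
        return sorted(self.scores.items(), key=lambda kv: -kv[1])[:n]

sales = SortedSet()
for sku, qty in [("a", 3), ("b", 5), ("a", 4), ("c", 1)]:
    sales.zincrby(sku, qty)   # each order updates the running total
top = sales.zrevrange(2)      # real-time "top sellers" analytics query
```

The hybrid transaction-analytics benefit is that no separate ETL step is needed: the analytics view is maintained incrementally as transactions arrive.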
TidalScale has created a software defined computer.
At TidalScale, we have created a simple, cost-effective way for a data scientist, an analyst, an engineer, a scientist, a database administrator, or a software developer to access a group of servers through a single operating system instance as if it were a single supercomputer. This dramatically simplifies development and reduces software scaling complexity, not to mention delivering dramatic cost savings in hardware and software.
We configure hosted hardware into one or more TidalPods. Each TidalPod is a virtual supercomputer comprising a set of commodity servers configured with the TidalScale HyperKernel. What the user sees is standard Linux, FreeBSD or Windows running with the sum of all memory, processors, networks, and I/O. The secret sauce is the HyperKernel that fools the guest OS into thinking it’s running directly on a huge, expensive machine when in fact it’s running on a set of smaller, less expensive servers.
We offer an incredibly simple user experience.
• Define the computer size you want (number of CPUs, amount of memory), boot the virtual machine, then log in to the computer…
ACID & CAP: Clearing CAP Confusion and Why C in CAP ≠ C in ACID (Aerospike, Inc.)
Aerospike founder & VP of Engineering & Operations Srini Srinivasan, and Engineering Lead Sunil Sayyaparaju, will review the principles of the CAP Theorem and how they apply to the Aerospike database. They will give a brief technical overview of ACID support in Aerospike and describe how Aerospike’s continuous availability and practical approach to avoiding partitions provides the highest levels of consistency in an AP system. They will also show how to optimize Aerospike and describe how this is achieved in numerous real world scenarios.
Hadoop and NoSQL databases have emerged as leading choices by bringing new capabilities to the field of data management and analysis. At the same time, the RDBMS, firmly entrenched in most enterprises, continues to advance in features and varieties to address new challenges.
Join us for a special roundtable webcast on April 7th to learn:
The key differences between Hadoop, NoSQL and RDBMS today
The key use cases
How to choose the best platform for your business needs
When a hybrid approach will best fit your needs
Best practices for managing, securing and integrating data across platforms
Webinar | Building Apps with the Cassandra Python Driver (DataStax Academy)
With the new Python driver for Cassandra, it is easy to build integrations and apps that use Cassandra seamlessly as a back end. This session will explore what it takes to build such an app and the features available in the new Python driver.
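A minimal sketch of what such an app’s data path might look like with the DataStax Python driver is below. It assumes the `cassandra-driver` package and a node on localhost, and the keyspace, table, and helper names are invented for illustration (call `main()` only with a cluster running):

```python
def insert_user_cql(keyspace, table):
    """Build the parameterized INSERT used below (pure helper)."""
    return f"INSERT INTO {keyspace}.{table} (id, name) VALUES (?, ?)"

def main():
    # Requires `pip install cassandra-driver` and a Cassandra node
    # listening on localhost; keyspace/table are example names.
    from cassandra.cluster import Cluster
    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect()
    # Prepared statements are parsed once server-side and reused.
    prepared = session.prepare(insert_user_cql("demo", "users"))
    session.execute(prepared, (1, "alice"))
    cluster.shutdown()
```

Prepared statements plus the driver’s built-in connection pooling and load balancing are the main reasons the driver feels "seamless" compared to hand-rolled Thrift-era clients.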
In this presentation we look at the roadmap for Apache Ignite 2.0 as it moves toward becoming one of the first convergent data platforms, combining a cross-channel tiered storage model (DRAM, Flash, HDD) and multi-paradigm access patterns (K/V, SQL, MapReduce, MPP) into one highly integrated, easy-to-use data platform.
The advent of non-volatile memory (NVM) will fundamentally change the dichotomy between memory and durable storage in database management systems (DBMSs). These new NVM devices are almost as fast as DRAM, but all writes to them are potentially persistent even after power loss. Existing DBMSs are unable to take full advantage of this technology because their internal architectures are predicated on the assumption that memory is volatile. That means when NVM finally arrives, just like when you finally passed that kidney stone after three weeks, everyone will be relieved but the transition will be painful. Many of the components of legacy DBMSs will become unnecessary and will degrade the performance of data-intensive applications.
Modern transactional systems need to be fast, always available, and able to scale constantly to meet the ever-changing needs of the business. It is becoming increasingly commonplace for next-generation e-commerce systems to demand double- or single-digit-millisecond response times, for financial trading systems to require maximum latencies on the order of microseconds, and for gaming and analytic engines to consume hundreds of thousands of transactions a second. It is a common and tempting mistake to believe that we can meet the extreme needs of such systems by simply replacing traditional disk-based storage with in-memory data grids while keeping traditional application architectures. Such an approach will take us only so far, after which the system’s demands will once again overtake its capabilities. To truly meet the extreme needs of these systems, and to continue to scale as demand scales, we need to think differently about how such systems are architected and employ modern techniques to unlock the full potential of memory-oriented computing. This talk explains why and how.
Join Girish Mutreja, CEO of Neeve Research and author of the X Platform, as he discusses the above and provides a unique perspective on what’s different about memory-oriented TP applications and how application architectures, particularly for mission-critical applications, need to adapt to the new world of memory-oriented computing. Girish will outline the key architectural elements of TP applications and explain how they need to function in the world of memory-oriented computing. He will delve into why such systems need to be architected as a marriage between messaging and data storage; why message routing and data gravity are of critical importance to these systems; how structured, in-memory state lends itself to extreme agility; how fault tolerance, load balancing, transaction processing and threading need to function in such systems; and why architectural precepts such as transaction pipelining and agent-oriented design are critical to reliability, performance and scalability. Girish will illustrate how these concepts have enabled enterprises such as MGM Resorts to transition to game-changing, memory-oriented architectures by leveraging the X Platform.
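As a rough sketch of one of those precepts, the marriage of messaging and data storage, here is a toy memory-oriented transaction processor in Python (this is not the X Platform’s API; all names are illustrative): messages mutate structured in-memory state and are appended to a log, and replaying the log rebuilds the state after a failure.

```python
class InMemoryTP:
    """Sketch: transactions arrive as messages, mutate structured
    in-memory state, and are appended to a log; replaying the log
    rebuilds the state (the zero-data-loss story)."""
    def __init__(self):
        self.balances = {}   # structured in-memory state
        self.log = []        # append-only message log

    def on_message(self, msg):
        self.log.append(msg)            # persist the message first
        account, amount = msg
        self.balances[account] = self.balances.get(account, 0) + amount

    @classmethod
    def recover(cls, log):
        """Failover path: rebuild state by replaying the message log."""
        node = cls()
        for msg in log:
            node.on_message(msg)
        return node

primary = InMemoryTP()
for m in [("acct-1", 100), ("acct-2", 50), ("acct-1", -30)]:
    primary.on_message(m)

backup = InMemoryTP.recover(primary.log)   # backup converges to primary
```

In a production system the log would be replicated to a hot standby rather than replayed from scratch, but the principle is the same: the message stream is the source of truth, and the in-memory state is a deterministic function of it.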
There are many computational paradigms that could be used to harness the power of a herd of computers. In financial services, a shared-nothing approach can be used to speed up CPU-intensive calculations, while the hierarchical nature of rollups requires tight synchronization. Some interesting use cases are:
In wealth management, the SQL approach is traditionally used, but it lacks efficient support for hierarchical structures and iterative calculation, and provides limited scalability. Unlike traditional, centralized scale-up enterprise systems, an in-memory-based architecture scales out and takes advantage of cost-effective, high-volume commodity hardware that maximizes compute power efficiently. It improves the user experience by speeding up response times through distributed implementations of calculation algorithms. OData enables DaaS to expose financial data and calculation capabilities.
In the insurance industry, in-memory computing was used for Monte Carlo simulation to estimate the value of life insurance policies. This is a very CPU-intensive task: it requires 2,000 cores to build ~1 million simulated policies in 30 minutes (about 25 trillion numbers, or 100 TB of data), which are then aggregated and compressed into 40 GB of data for analysis.
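A toy version of such a Monte Carlo valuation might look like the following Python sketch (the discount rate, mortality model, and policy parameters are invented for illustration, not taken from the talk):

```python
import random
import statistics

def simulate_policy_value(premium, payout, annual_death_prob, years, rng):
    """One scenario: discounted premiums collected until a (possible) death.
    Uses a flat mortality rate and a 4% discount rate (both assumed)."""
    value = 0.0
    for year in range(years):
        if rng.random() < annual_death_prob:
            return value - payout / (1.04 ** year)  # payout owed, discounted
        value += premium / (1.04 ** year)           # premium collected
    return value

def monte_carlo_value(n_scenarios, seed=42, **policy):
    """Average over many independent scenarios; each scenario is
    embarrassingly parallel, which is why this scales across cores."""
    rng = random.Random(seed)
    samples = [simulate_policy_value(rng=rng, **policy)
               for _ in range(n_scenarios)]
    return statistics.mean(samples)

est = monte_carlo_value(10_000, premium=1_000, payout=100_000,
                        annual_death_prob=0.005, years=20)
```

Because scenarios are independent, the per-policy work partitions cleanly across a compute grid, which is exactly what makes the 2,000-core in-memory deployment described above practical.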
To speed up CPU-intensive iterative financial calculations, we use a shared-nothing approach, while the hierarchical nature of rollups requires tight synchronization. We will discuss several algorithms typical of the financial industry, different approaches to distribution and synchronization, and the benefits of in-memory data grid technologies.
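The shared-nothing/synchronization split can be sketched as follows: leaf valuations are independent and can run in parallel, while the rollup up the hierarchy is the synchronization point (the tree and numbers below are invented for illustration):

```python
from concurrent.futures import ThreadPoolExecutor

# A tiny account hierarchy: parent -> children; leaves hold positions.
tree = {"firm": ["desk-a", "desk-b"],
        "desk-a": ["acct-1", "acct-2"],
        "desk-b": ["acct-3"]}
positions = {"acct-1": [10.0, 5.0], "acct-2": [2.5], "acct-3": [7.5, 7.5]}

def leaf_value(acct):
    # The CPU-intensive per-leaf valuation would go here; it touches only
    # its own data (shared-nothing), so leaves can run on any node/thread.
    return sum(positions[acct])

def rollup(node, leaf_values):
    if node in positions:
        return leaf_values[node]
    # A parent total needs all of its children: the synchronization point.
    return sum(rollup(child, leaf_values) for child in tree[node])

with ThreadPoolExecutor() as pool:
    leaf_values = dict(zip(positions, pool.map(leaf_value, positions)))

total = rollup("firm", leaf_values)
```

In a real grid the leaf computations would be distributed across cluster nodes rather than threads, but the shape is the same: fan out the independent work, then synchronize only at the aggregation step.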
In this talk I’ll present the SharedRDD, a high-performance in-memory caching layer for Spark jobs. We’ll work through (1) the design and architecture of this component, (2) its configuration, and (3) actual Java and Scala usage examples.
In this presentation, Dmitriy will describe the strategy and architecture behind the Apache Ignite(TM) (incubating) In-Memory Data Fabric, a high-performance, distributed in-memory data management software layer that boosts application performance and scale by orders of magnitude. We will dive into the technical details of distributed clusters and compute grids as well as distributed data grids, and provide code samples for each. As integral parts of an In-Memory Data Fabric, Dmitriy will also cover distributed streaming, CEP and Hadoop acceleration. This presentation is particularly relevant for software developers and architects who work on the front lines of high-speed, low-latency Fast Data systems, high-performance transactional systems and real-time analytics applications.
In-memory computing frameworks such as Spark are gaining tremendous popularity for Big Data processing because their in-memory primitives make it possible to eliminate the disk I/O bottleneck. Logically, the more memory they have available, the better the performance they can achieve. However, unpredictable GC activity from on-heap memory management, the high cost of serialization/deserialization (SerDe), and bursts of temporary object creation/destruction greatly impact their performance and scale-out ability. In Spark, for example, when the volume of the datasets is much larger than system memory, SerDe significantly impacts almost every in-memory computing step, such as caching, checkpointing, shuffling/dispatching, data loading and storing.
With fast-growing advanced server platforms offering significantly more non-volatile memory, such as NVMe devices powered by Intel 3D XPoint technology and fast SSD array storage, how best to use the various hybrid memory-like resources, from DRAM to NVMe/SSD, determines Big Data applications’ performance and scalability.
In this presentation, we will first introduce our non-volatile generic Java object programming model for in-memory computing. This programming model defines in-memory non-volatile objects that can be operated on directly in memory-like resources. We then discuss our structured-data in-memory persistence library, which can load/store non-volatile generic Java objects from/to underlying heterogeneous memory-like resources such as DRAM, NVMe, and even SSD.
We then present a non-volatile computing case using Spark. We will show that this model can (1) lazily load data to minimize memory footprint, (2) naturally fit both non-volatile RDDs and off-heap RDDs, (3) use non-volatile/off-heap RDDs to transform Spark datasets, and (4) avoid memory caching by using in-place non-volatile datasets.
Finally, we will show that an up-to-2X performance boost can be achieved on Spark ML tests after applying this non-volatile computing approach, which removes SerDe, caches hot data, and dramatically reduces GC pause time.
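The SerDe round trip being targeted can be illustrated with a small Python sketch, using `pickle` as a stand-in for Spark’s serializers (the point is the encode/decode boundary, not the specific format):

```python
import pickle

records = [{"id": i, "features": [float(i)] * 16} for i in range(1000)]

# Serialized cache path (what generic caching/shuffle does): every
# access that crosses the boundary pays an encode + decode round trip.
blob = pickle.dumps(records)    # SerDe cost on write
restored = pickle.loads(blob)   # SerDe cost on read

# In-place access path (what a non-volatile object store avoids paying):
# the structure is usable directly, with no round trip.
first_feature = records[0]["features"][0]
```

An in-place non-volatile object model eliminates the `dumps`/`loads` boundary entirely: the object layout in the memory-like resource is the working representation, which is where the GC and SerDe savings in the 2X result come from.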
In-memory computing is all about now. It’s the art of collecting and processing data as quickly as it is created in order to provide instant actionable insights. Databases, however, are all about the past. They are a record of what happened, not what is happening right now.
In this presentation, you will learn how to turn your enterprise databases, and the applications they support, into real-time sources of what’s currently happening throughout the business. By capturing database changes and applying in-memory processing and analytics, you can tap into your enterprise’s activity and make decisions while the data is still relevant.
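The pattern described, consuming database change events to keep an always-current in-memory aggregate, can be sketched in a few lines of Python (the event shape and field names are invented for illustration):

```python
class LiveRevenue:
    """Consumes row-change events and keeps a running aggregate in
    memory, so "revenue right now" is one dictionary lookup away
    instead of a query against the source database."""
    def __init__(self):
        self.revenue_by_region = {}

    def on_change(self, event):
        op, row = event["op"], event["row"]
        # An insert adds the row's amount; a delete reverses it.
        delta = row["amount"] if op == "insert" else -row["amount"]
        region = row["region"]
        self.revenue_by_region[region] = (
            self.revenue_by_region.get(region, 0) + delta)

feed = [
    {"op": "insert", "row": {"region": "emea", "amount": 120}},
    {"op": "insert", "row": {"region": "apac", "amount": 80}},
    {"op": "delete", "row": {"region": "emea", "amount": 20}},
]
live = LiveRevenue()
for event in feed:
    live.on_change(event)
```

In production the `feed` would come from a change-data-capture stream off the database’s transaction log; the key property is that the aggregate is maintained incrementally, never recomputed from the past.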
Simplicity, accuracy, and speed are three things everyone wants from their data architecture. A content delivery network based in LA was looking to achieve these goals and developed a framework that handled batch and stream processing with open source software. The objective was to manage the real-time aggregation of over 32 TB of daily web server log data. The problem? Everything. Listen as Dennis Duckworth explains how VoltDB reduced the number of environments, used one-tenth the CPU cycles, and achieved 100% billing accuracy on 32 TB of daily web server data.
Neeve Research offers the X Platform, a revolutionary memory-oriented transaction processing platform for extreme enterprise applications. The platform uniquely integrates structured in-memory state, advanced messaging, multi-agency and decoupled enterprise data management to enable a true no-compromise extreme TP platform. The true innovation of the platform lies in its ability to provide a no-compromise blend of extreme performance, reliability, scalability and developmental agility. It is extremely fast, extremely easy to use, and can be used to build a wide variety of applications, and the applications built with it exhibit zero data loss and scale linearly. After almost a decade of hard engineering and close-quarters field hardening with an exclusive set of Fortune 300 companies, Neeve is opening the platform for wider use. Listen as Girish Mutreja unveils the X Platform and shows how easy it is to build an application that performs at hundreds of thousands of transactions per second with sub-100-microsecond latencies, zero garbage and zero data loss.
Online decision making over time needs interacting with an ever changing environment. And underlying machine learning models need to change and adapt to this changing environment. We discuss class of algorithms and provide details of how the computation is parallelized using the Spark framework. Our implementation follows the architectural style of the Lambda Architecture—a batch layer to process bulk data and create models, a speed layer to process incremental data and create updates to models, and a serving layer to respond to decision requests in near real time. The batch layer is implemented as a Spark application, the speed layer is a Spark Streaming application, and the serving layer is implemented using the Play Framework. Spark’s MlLib and low-level API are used for training and creating models in both the batch and speed layers.
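The three layers described above can be sketched in a few lines of Python (illustrative names only; the actual implementation uses Spark, Spark Streaming and the Play Framework):

```python
# Minimal sketch of the Lambda Architecture's query-time merge, assuming a
# batch view rebuilt periodically and a speed view of incremental updates.

def batch_layer(events):
    """Recompute the full view from all historical events (bulk path)."""
    view = {}
    for key, value in events:
        view[key] = view.get(key, 0) + value
    return view

def speed_layer(view, event):
    """Apply one incremental event to the real-time view (streaming path)."""
    key, value = event
    view[key] = view.get(key, 0) + value
    return view

def serving_layer(batch_view, speed_view, key):
    """Answer a query by merging the batch and speed views."""
    return batch_view.get(key, 0) + speed_view.get(key, 0)

historical = [("clicks", 10), ("clicks", 5)]
batch_view = batch_layer(historical)

speed_view = {}
speed_layer(speed_view, ("clicks", 2))  # arrives after the last batch run

print(serving_layer(batch_view, speed_view, "clicks"))  # 17
```

At query time the serving layer never waits for a batch rebuild; the speed view covers whatever arrived since the last batch run.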
Speedment SQL Reflector is a software solution that allows applications to get automatically updated data in real time. The SQL Reflector loads data from your existing SQL database and feeds it into an in-memory data grid, e.g., GridGain. When started, the SQL Reflector loads your selected existing relational data into your map cluster. Any subsequent changes made to the relational database (regardless of how: via your application, scripts, SQL commands or even stored procedures) are then continuously fed to your GridGain nodes. Even SQL transactions are preserved, so that your maps always reflect a valid state of the underlying SQL database.
Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi)
Are your read latencies not meeting your SLAs? Do you want to write SQL-99 queries against your Cassandra Data? Do you need transactions and ACID compliance?
Well, look no further! Apache Ignite can slide between your application and your Cassandra cluster, provide true in-memory performance, supply full SQL-99 support and maintain the same “Always On” availability guarantees that you have come to know and love with Cassandra.
In this session you will learn how Apache Ignite can turbocharge your Cassandra cluster without sacrificing availability guarantees. We’ll cover:
An overview of the Apache Ignite architecture
How to deploy Apache Ignite in minutes on top of Cassandra
How companies use this powerful combination to handle extreme OLTP workloads
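Conceptually, Ignite sits in front of Cassandra as a read-through/write-through cache. Here is a minimal Python sketch of that pattern (illustrative only; the real integration is configured through Ignite's Cassandra cache store, not hand-written code like this):

```python
class ReadThroughCache:
    """Conceptual sketch of a read-through/write-through cache fronting a
    durable backing store, the pattern Ignite uses when deployed on top of
    Cassandra. All names here are illustrative."""

    def __init__(self, backing_store):
        self.memory = {}            # in-memory layer (the Ignite tier)
        self.store = backing_store  # durable layer (the Cassandra tier)

    def get(self, key):
        if key not in self.memory:              # cache miss:
            self.memory[key] = self.store[key]  # read through to the store
        return self.memory[key]                 # later reads stay in memory

    def put(self, key, value):
        self.memory[key] = value  # write to memory...
        self.store[key] = value   # ...and write through to the store

cassandra = {"user:1": "alice"}   # stand-in for the Cassandra cluster
cache = ReadThroughCache(cassandra)
print(cache.get("user:1"))   # loaded from the store: alice
cache.put("user:2", "bob")
print(cassandra["user:2"])   # durable copy kept in sync: bob
```

Because every write also lands in the backing store, the in-memory tier can be lost and rebuilt without sacrificing Cassandra's durability guarantees.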
About the Speakers
Rachel Pedreschi, Principal Solutions Architect, GridGain
Rachel is Principal Solutions Architect at GridGain Systems. A "Big Data Geek-ette," Rachel is no stranger to the world of high-performance database systems. She is a Cassandra, Vertica, Informix and Redbrick certified DBA on top of her work with Apache Ignite and has 20 years of business intelligence and ETL tool experience. Rachel has an MBA from SFSU and a BA in Math from the University of California, Santa Cruz. She loves collecting new experiences around the world!
Stibo Systems recently released its in-memory component for our Master Data Management (MDM) platform, giving significant speed-ups in most parts of the system. Our MDM platform provides high volume data management with many concurrent users. This in-memory component is built in-house and this talk is about how and why we did this, including:
- Off-heap, compact, MVCC (Multi-Version Concurrency Control)-aware map.
- Lock-free MVCC-aware indexing.
- Wait-free MVCC-aware querying that goes directly to the metal.
- Clustering and MVCC with recovery support.
- Why we built our own in-memory technology, how we integrated it into our existing 200+ man years system and the speed-ups we gained.
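As a rough illustration of the MVCC idea behind these components, here is a toy versioned map in Python (the talk's off-heap, lock-free implementation is of course far more involved):

```python
import itertools

class MVCCMap:
    """Toy multi-version map: writes append (version, value) pairs and a
    reader sees the newest version at or below its snapshot, so readers
    never block writers. Illustrative only."""

    def __init__(self):
        self._clock = itertools.count(1)  # monotonically increasing versions
        self._versions = {}               # key -> [(version, value)], ascending

    def put(self, key, value):
        v = next(self._clock)
        self._versions.setdefault(key, []).append((v, value))
        return v

    def snapshot(self):
        """A reader captures a version once and reads consistently from it."""
        return next(self._clock)

    def get(self, key, snapshot):
        result = None
        for version, value in self._versions.get(key, []):
            if version <= snapshot:   # only versions visible to this snapshot
                result = value
        return result

m = MVCCMap()
m.put("sku", "v1")
snap = m.snapshot()
m.put("sku", "v2")         # a concurrent writer updates the key
print(m.get("sku", snap))  # the earlier reader still sees: v1
```

A later snapshot sees "v2"; the old version can be garbage-collected once no snapshot can reach it.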
This will help you as a developer to navigate the landscape of in-memory products and identify the trade-offs involved, helping you choose the right path.
This presentation describes an intelligent IT monitoring solution that uses Nagios as the source of information, Esper as the CEP engine, and a PCA algorithm.
What is going on? Application Diagnostics on Azure - Copenhagen .NET User Group (Maarten Balliauw)
We all like building and deploying cloud applications. But what happens once that’s done? How do we know if our application behaves like we expect it to behave? Of course, logging! But how do we get that data off of our machines? How do we sift through a bunch of seemingly meaningless diagnostics? In this session, we’ll look at how we can keep track of our Azure application using structured logging, AppInsights and AppInsights analytics to make all that data more meaningful.
What is going on - Application diagnostics on Azure - TechDays Finland (Maarten Balliauw)
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/1yyaHb8.
The authors discuss Netflix's new stream processing system that supports a reactive programming model, allows auto scaling, and is capable of processing millions of messages per second. Filmed at qconsf.com.
Danny Yuan is an architect and software developer in Netflix’s Platform Engineering team. Justin Becker is Senior Software Engineer at Netflix.
The hidden engineering behind machine learning products at Helixa (Alluxio, Inc.)
Data Orchestration Summit 2020 organized by Alluxio
https://www.alluxio.io/data-orchestration-summit-2020/
The hidden engineering behind machine learning products at Helixa
Gianmario Spacagna (Helixa)
About Alluxio: alluxio.io
Engage with the open source community on slack: alluxio.io/slack
Webinar: Cutting Time, Complexity and Cost from Data Science to Production (Iguazio)
Imagine a system where one collects real-time data, develops a machine learning model… Runs analysis and training on powerful GPUs… Clicks on a magic button and then deploys code and ML models to production… All without any heavy lifting from data and DevOps engineers. Today, data scientists work on laptops with just a subset of data and time is wasted while waiting for data and compute.
It’s about efficient use of time! Join Iguazio and NVIDIA so that you can get home early today! Learn how to speed up data science from development to production:
- Access to large scale, real-time and operational data without waiting for ETL
- Run high performance analytics and ML on NVIDIA GPUs (Rapids)
- Work on a shared, pre-integrated Kubernetes cluster with Jupyter notebooks and leading data science tools
- One-click (really!) deployment to production
Speakers: Yaron Haviv, CTO at Iguazio; Or Zilberman, Data Scientist at Iguazio; and Jacci Cenci, Sr. Technical Marketing Engineer at NVIDIA
Analyzing Data Streams in Real Time with Amazon Kinesis: PNNL's Serverless Da... (Amazon Web Services)
Amazon Kinesis makes it easy to collect, process, and analyze real-time, streaming data so you can get timely insights and react quickly to new information. In this session, we first present an end-to-end streaming data solution using Amazon Kinesis Data Streams for data ingestion, Amazon Kinesis Data Analytics for real-time processing, and Amazon Kinesis Data Firehose for persistence. We review in detail how to write SQL queries for operational monitoring using Kinesis Data Analytics.
Learn how PNNL is building the ingestion flow into their serverless data lake leveraging the Kinesis platform: migrating existing NiFi processes to various parts of the Kinesis platform where applicable, replacing complex NiFi flows that bundle and compress data with Kinesis Data Firehose, leveraging Kinesis Data Streams for their enrichment and transformation pipelines, and using Kinesis Data Analytics to filter, aggregate, and detect anomalies.
Informix Spark Streaming is an extension of Informix that allows data to be streamed out of the database as soon as it is inserted, updated, or deleted.
The protocol currently used to stream the changes is MQTT v3.1.1 (older versions not supported!). This extension is able to stream data to any MQTT broker where it can be processed or passed on to subscribing clients for processing.
Ingesting streaming data for analysis in Apache Ignite (StreamSets theme) - Tom Diederich
Apache Ignite provides a distributed platform for a wide variety of workloads, but often the issue is simply in getting data into the database in the first place. The wide variety of data sources and formats presents a challenge to any data engineer; in addition, 'data drift', the constant and inevitable mutation of the incoming data's structure and semantics, can break even the most well-engineered integration.
This session, aimed at data architects, data engineers and developers, will explore how we can use the open source StreamSets Data Collector to build robust data pipelines. Attendees will learn how to collect data from cloud platforms such as Amazon and Salesforce, devices, relational databases and other sources, continuously stream it to Ignite, and then use features such as Ignite's continuous queries to perform streaming analysis.
We'll start by covering the basics of reading files from disk, move on to relational databases, then look at more challenging sources such as APIs and message queues. You will learn how to:
* Build data pipelines to ingest a wide variety of data into Apache Ignite
* Anticipate and manage data drift to ensure that data keeps flowing
* Perform simple and complex ad-hoc queries in Ignite via SQL
* Write applications using Ignite to run continuous queries, combining data from multiple sources
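The continuous-query idea mentioned above can be pictured with a small Python sketch: listeners whose predicate matches an incoming update are notified as data streams in (illustrative only; Ignite's actual feature is its ContinuousQuery API, not code like this):

```python
class ContinuousQueryCache:
    """Toy cache with continuous queries: each registered listener has a
    predicate, and matching updates trigger its callback as data arrives.
    Names are illustrative, not Ignite's API."""

    def __init__(self):
        self._data = {}
        self._listeners = []  # list of (predicate, callback) pairs

    def register(self, predicate, callback):
        self._listeners.append((predicate, callback))

    def put(self, key, value):
        self._data[key] = value
        for predicate, callback in self._listeners:
            if predicate(key, value):   # push matching updates to listeners
                callback(key, value)

alerts = []
cache = ContinuousQueryCache()
# Alert on any reading above a threshold as events are ingested.
cache.register(lambda k, v: v > 100, lambda k, v: alerts.append((k, v)))
cache.put("sensor-1", 42)    # below threshold: no alert
cache.put("sensor-2", 187)   # above threshold: alert fires
print(alerts)  # [('sensor-2', 187)]
```

The analysis runs as the data flows in, rather than as a query issued after the fact.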
Independent of the source of data, the integration of event streams into an enterprise architecture is getting more and more important in the world of sensors, social media streams and the Internet of Things. Events have to be accepted quickly and reliably, and they have to be distributed and analyzed, often with many consumers or systems interested in all or part of the events. Storing such huge event streams in HDFS or a NoSQL datastore is feasible and no longer much of a challenge. But if you want to be able to react fast, with minimal latency, you cannot afford to first store the data and do the analysis/analytics later. You have to be able to include part of your analytics right after you consume the data streams. Products for doing event processing, such as Oracle Event Processing or Esper, have been available for quite some time and used to be called Complex Event Processing (CEP). In the past few years, another family of products has appeared, mostly out of the big data technology space, called Stream Processing or Streaming Analytics. These are mostly open source products/frameworks such as Apache Storm, Spark Streaming, Flink and Kafka Streams, as well as supporting infrastructure such as Apache Kafka. In this talk I will present the theoretical foundations of stream processing, discuss the core properties a stream processing platform should provide, and highlight the differences you might find between the more traditional CEP and the more modern Stream Processing solutions.
EDA Meets Data Engineering – What's the Big Deal? (Confluent)
Presenter: Guru Sattanathan, Systems Engineer, Confluent
Event-driven architectures have been around for many years, much like Apache Kafka®, which was first open sourced in 2011. The reality is that the true potential of Kafka is only being realised now. Kafka is becoming the central nervous system of many of today’s enterprises. It is bringing a profound paradigm shift to the way we think about enterprise IT. What has changed in Kafka to enable this paradigm shift? How is it more than just a message broker, and how are enterprises using it today? This session will explore these key questions.
Sydney: https://content.deloitte.com.au/20200221-tel-event-tech-community-syd-registration
Melbourne: https://content.deloitte.com.au/20200221-tel-event-tech-community-mel-registration
Webinar - Big Data: Let's SMACK - Jorg Schad (Codemotion)
For many use cases, such as fraud detection or reacting to sensor data, the response times of traditional batch processing are simply too slow. In order to react to such events close to real time, we need to go beyond classical batch processing and utilize stream processing systems such as Apache Spark Streaming, Apache Flink, or Apache Storm. But these systems are not sufficient by themselves. One common example of such a fast data pipeline is the SMACK stack: Apache Spark, Mesos, Akka, Cassandra, and Kafka.
Measure anything, measure everything.
Effortless monitoring with Statsd, Collectd and Graphite can increase software development productivity and quality at the same time.
Streaming data analytics (Kinesis, EMR/Spark) - Pop-up Loft Tel Aviv (Amazon Web Services)
Low-latency analytics is becoming a very popular scenario. In this session we will discuss several architectural options for doing analytics on moving data using Amazon Kinesis and EMR/Spark Streaming, and share some best practices and real-world examples.
As the dangers of global climate change multiply, utility companies seek methods to reduce carbon emissions, such as integrating renewable and sustainable energy sources like wind, solar, and hydroelectric power. Renewable energy not only has the power to improve climate conditions, it also encourages economic growth. By combining advances in sensor technology with machine learning algorithms and environmental data, utility companies can monitor energy sources in real time to make faster decisions and speed innovation.
In this session, Nikita Shamgunov, CTO and co-founder of MemSQL, will conduct a live demonstration based on real-time data from 2 million sensors on 197,000 wind turbines installed on wind farms around the world. This Internet of Things (IoT) simulation explores the ways utility companies can integrate new data pipelines into established infrastructure. Attendees will learn how to deploy this breakthrough technology composed of Apache Kafka, a real-time message queue; Streamliner, an integrated Apache Spark solution; MemSQL Ops, a cluster management and monitoring interface; and a set of simulated data producers written in Python. By applying machine learning to analyze millions of data points in real time, the data pipeline predicts and visualizes health of wind farms at global scale. This architecture propels innovation in the energy industry and is replicable across other IoT applications including smart cities, connected cars, and digital healthcare.
PipelineDB is an open-source relational database that runs SQL queries continuously on streaming data, incrementally storing results in tables. Our talk will include an overview of PipelineDB’s architecture, the use cases for continuous SQL queries on streams, and user case studies, and will outline how PipelineDB can be used to easily build scalable and highly available streaming and realtime analytics applications using only SQL, with no external dependencies.
Much industry focus is on All-Flash Arrays with traditional databases, but new databases using native direct-attached Flash have proven reliable, performant, and popular for operational use cases. Today, these operational databases store account information for banking and retail applications, real-time routing information for telecoms, and user profiles for advertising; they also support machine learning for applications in the financial industry, such as fraud detection. While proprietary PCIe and “wide SATA” had previously been popular, NVMe has finally come into operational use. Aerospike will discuss the benefits of NVMe for these use cases (including specific configurations and performance numbers), as well as the architectural implications of low-latency Flash and Storage Class Memory.
Do you need to move enterprise database information into a Data Lake in real time, and keep it current? Or maybe you need to track real-time customer actions in order to engage them while they are still accessible. Perhaps you have been tasked with ingesting and processing large amounts of IoT data.
While everyone is talking about ‘stateless’ services as a way to achieve scalability and high availability, the truth is that they are about as real as the unicorns. Building applications and services that way simply pushes the problem further down the stack, which only makes it worse and more difficult to solve (although, on the upside, it might make it somebody else’s problem). This is painfully obvious when building microservices, where each service must truly own its state.
The reality is that you don’t need ‘stateless’ services to either scale out or be fault tolerant — what you really need is a scalable, fault tolerant state management solution that you can build your services around.
In this talk we will discuss how some of the popular microservices frameworks are tackling this problem, and will look at technologies available today that make it possible to build scalable, highly available systems without ‘stateless’ service layers, whether you are building microservices or good ol’ monoliths.
Caching is a frequently used and misused technique for speeding up performance, off-loading non-scalable or expensive infrastructure, scaling systems and coping with large processing peaks. In this talk Greg introduces you to the theory of caching and highlights key things to keep in mind when you apply caching. Then we take a comprehensive look at how the JCache standard standardises Java usage of caching.
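As a taste of the theory, here is a minimal cache-aside sketch with expiry in Python, the basic pattern that JCache (JSR-107) standardizes for Java (all names here are illustrative, not the JCache API):

```python
import time

class TTLCache:
    """Minimal cache-aside sketch with time-to-live expiry: a hit skips the
    expensive backend; a miss or expired entry reloads and re-caches."""

    def __init__(self, loader, ttl_seconds):
        self.loader = loader          # the expensive backend call to off-load
        self.ttl = ttl_seconds
        self._entries = {}            # key -> (value, expires_at)

    def get(self, key):
        entry = self._entries.get(key)
        now = time.monotonic()
        if entry is None or entry[1] < now:
            value = self.loader(key)                 # miss or expired: reload
            self._entries[key] = (value, now + self.ttl)
            return value
        return entry[0]                              # hit: no backend call

calls = []
def slow_backend(key):
    calls.append(key)        # record every backend hit
    return key.upper()

cache = TTLCache(slow_backend, ttl_seconds=60)
cache.get("a"); cache.get("a")   # second read is served from the cache
print(calls)  # ['a'] — the backend was hit only once
```

The TTL is the knob that trades freshness for off-loading: too short and the backend is hammered, too long and readers see stale data.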
With tremendous growth in big data, low latency and high throughput are the key asks for many big data applications. The in-memory technology market is growing rapidly. We see that traditional database vendors are extending their platforms to support in-memory capability, while others are offering in-memory data grid and NoSQL solutions for high performance and scalability. In this talk, we will share our point of view on in-memory data grid and NoSQL technology. It is all about how to build an architecture that meets low-latency and high-throughput requirements. We will share our thoughts and experiences in implementing use cases that demand low latency and high throughput with inherent scale-out features.
You will learn how in-memory data grids and NoSQL are used to meet low-latency and high-throughput needs, and how to choose an in-memory technology that is a good fit for your use case.
In-memory data grids (IMDGs) are widely used as distributed, key-value stores for serialized objects, providing fast data access, location transparency, scalability, and high availability. With its support for built-in data structures, such as hashed sets and lists, Redis has demonstrated the value of enhancing standard create/read/update/delete (CRUD) APIs to provide extended functionality and performance gains. This talk describes new techniques which can be used to generalize this concept and enable the straightforward creation of arbitrary, user-defined data structures both within single objects and sharded across the IMDG.
A key challenge for IMDGs is to minimize network traffic when accessing and updating stored data. Standard CRUD APIs place the burden of implementing data structures on the client and require that full objects move between client and server on every operation. In contrast, implementing data structures within the server streamlines communication since only incremental changes to stored objects or requested subsets of this data need to be transferred. However, building extended data structures within IMDG servers creates several challenges, including, how to extend this mechanism, how to efficiently implement data-parallel operations spanning multiple shards, and how to protect the IMDG from errors in user-defined extensions.
This talk will describe two techniques which enable IMDGs to be extended to implement user-defined data structures. One technique, called single method invocation (SMI), allows users to define a class which implements a user-defined data structure stored as an IMDG object and then remotely execute a set of class methods within the IMDG. This enables IMDG clients to pass parameters to the IMDG and receive a result from method execution.
A second technique, called parallel method invocation (PMI), extends this approach to execute a method in parallel on multiple objects sharded across IMDG servers. PMI also provides an efficient mechanism for combining the results of method execution and returning a single result to the invoking client. In contrast to client-based techniques, this combining mechanism is integrated into the IMDG and completes in O(logN) time, where N is the number of IMDG servers.
The talk will describe how user-defined data structures can be implemented within the IMDG to run in a separate process (e.g., a JVM) to ensure that execution errors do not impair the stability of the IMDG. It will examine the associated performance trade-offs and techniques that can be used to minimize overhead.
Lastly, the talk will describe how popular Redis data structures, such as hashed sets, can be implemented as a user-defined data structure using SMI and then extended using both SMI and PMI to build a scalable hashed set that spans multiple shards. It will also examine other examples of user-defined data structures that can be built using these techniques.
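The difference between client-side CRUD and server-side invocation can be sketched with a toy sharded set in Python. The names (Shard, ShardedSet, invoke, pmi_size) are hypothetical; a real IMDG would run the per-shard methods in parallel and combine the partial results in a tree, in O(log N) steps:

```python
class Shard:
    """One IMDG server holding a partition of the sharded set."""
    def __init__(self):
        self.items = set()

    def invoke(self, method, *args):
        """SMI: run a method next to the data; only the result travels back,
        not the whole object (unlike client-side CRUD)."""
        return getattr(self, method)(*args)

    def add(self, item):
        self.items.add(item)

    def size(self):
        return len(self.items)

class ShardedSet:
    """A hashed set sharded across multiple 'servers'."""
    def __init__(self, num_shards):
        self.shards = [Shard() for _ in range(num_shards)]

    def add(self, item):
        # Route each item to one shard by hash, then invoke server-side.
        self.shards[hash(item) % len(self.shards)].invoke("add", item)

    def pmi_size(self):
        """PMI: invoke size() on every shard (in parallel in a real IMDG),
        then combine the partial results into a single answer."""
        partials = [shard.invoke("size") for shard in self.shards]
        return sum(partials)

s = ShardedSet(num_shards=4)
for item in ["a", "b", "c", "a"]:   # the duplicate "a" lands on one shard
    s.add(item)
print(s.pmi_size())  # 3
```

Only method parameters and small results cross the network; the set's contents never leave the servers, which is the traffic saving the talk describes.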
Non-Volatile DIMMs, or NVDIMMs, have emerged as a go-to technology for boosting performance for next generation storage platforms. The standardization efforts around NVDIMMs have paved the way to simple, plug-n-play adoption. This session will highlight the state of NVDIMMs today and give a glimpse into the future – what customers, storage developers, and the industry would like to see to fully unlock the potential of NVDIMMs.
Long gone are the halcyon days of multi-year big-budget waterfall-style projects serviced by traditional relational databases, overnight batch processing and monthly reports.
Today’s world is about immediate and global access to vast oceans of information, insights derived from the analysis of floods of data and the ability to quickly and effectively move to market and deliver value to our customers.
Financial services operate in this increasingly complex and demanding environment with escalating demands on security, performance and scalability. Regulatory requirements abound from a proliferation of financial regulatory authorities like the Financial Conduct Authority (FCA), Prudential Regulatory Authority (PRA) and Bank of England in the UK, the Security and Exchange Commission (SEC) and Federal Reserve (Fed) in the US and dozens more in the countries that we operate in around the world, spawning regulatory jargon like SOX, MAS, SCAP, and BASEL III.
In this talk we’ll look at the evolution of In-Memory Computing in Financial services and how it adds to and helps to address the challenges faced by large scale banking enterprises. We’ll also look a little bit ahead at emerging technologies and discuss the opportunities and challenges that they present.
In his keynote, Jason looks at some of the challenges that in-memory approaches to data and data processing are helping to overcome today. But he also gazes into his crystal ball to ask what the future holds for in-memory computing. Specifically, how will in-memory approaches change in the next three to five years, as it is increasingly relied upon to support the emerging Internet of Things? Jason also briefly looks at some of the other nascent technologies that are likely to be used in parallel with in-memory computing, and he wraps up by asking what kind of role in-memory is likely to play in related areas such as cloud computing and edge analytics.
In-memory computing is a reality. So are the limits of memory capacity. Data size constantly increases, while application developers and IT staff push for in-memory efficiencies; the conclusion is inevitable: we need to be able to access more memory than the DRAM capacity that the server provides. ScaleMP’s Software Defined Memory (SDM) technology allows for more system memory to be available per server, far beyond the hardware limits, by utilizing memory from other nodes (over fabric) or from locally installed non-volatile memory (NVM) such as NAND Flash or 3D XPoint – transparently and without any changes to operating system or applications. We shall present the benefits of SDM, discuss the relevant use-cases, and share performance data.
Organizations increasingly require real-time, highly scalable computing platforms in industries such as financial services, telecommunications, retail, SaaS, and IoT. This has spurred rapid advancements in chips, servers, storage and software for in-memory computing. We will review the state of in-memory computing today and offer some thoughts on where it is headed tomorrow as companies strive to create real-time, massively scalable Fast Data solutions.
Yesterday's thinking may hold that NVMe (NVM Express) is still in transition to a production-ready solution. In this session, we will discuss how NVMe has evolved to be production ready, review the history and evolution of NVMe and the Linux stack, and show how NVMe has progressed to become the low-latency, highly reliable key-value store mechanism that will drive the future of cloud expansion. Examples of protocol efficiencies and the types of storage engines that are optimizing for NVMe will be discussed. Please join us for an exciting session on how in-memory computing and persistence have evolved.
Techniques to optimize the PageRank algorithm usually fall into two categories: reducing the work per iteration, and reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices that have already converged can save iteration time. Skipping in-identical vertices, i.e. those with the same in-links, reduces duplicate computation and can further cut iteration time. Road networks often contain chains that can be short-circuited before the PageRank computation to improve performance, since the final ranks of chain nodes are easy to calculate; this can reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order, which can reduce the iteration time and the number of iterations, and also enables multi-iteration concurrency in the PageRank computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
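To make the first of these ideas concrete, here is a minimal sketch of PageRank that skips recomputing a vertex whenever none of its in-neighbors changed in the previous iteration (a safe form of convergence skipping). The graph, damping factor, and tolerance are made-up illustration values, not taken from the STICD paper:

```java
import java.util.Arrays;

// Sketch: PageRank with convergence skipping. A vertex is recomputed only
// if at least one of its in-neighbors changed last iteration; otherwise its
// rank provably cannot change and the work is skipped.
public class PageRankSkip {
    public static double[] pagerank(int[][] inLinks, int[] outDeg,
                                    double d, double tol, int maxIter) {
        int n = inLinks.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);
        boolean[] changed = new boolean[n];
        Arrays.fill(changed, true);              // everything is "dirty" at the start
        for (int it = 0; it < maxIter; it++) {
            double[] next = rank.clone();
            boolean[] nextChanged = new boolean[n];
            boolean any = false;
            for (int v = 0; v < n; v++) {
                boolean dirty = false;           // did any in-neighbor move last round?
                for (int u : inLinks[v])
                    if (changed[u]) { dirty = true; break; }
                if (!dirty && it > 0) continue;  // inputs unchanged -> rank unchanged
                double sum = 0.0;
                for (int u : inLinks[v])
                    sum += rank[u] / outDeg[u];
                next[v] = (1.0 - d) / n + d * sum;
                nextChanged[v] = Math.abs(next[v] - rank[v]) > tol;
                any |= nextChanged[v];
            }
            rank = next;
            changed = nextChanged;
            if (!any) break;                     // fixed point reached
        }
        return rank;
    }

    public static void main(String[] args) {
        // Hypothetical 3-vertex graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0
        // (given as in-link lists per vertex, plus each vertex's out-degree)
        int[][] inLinks = { {2}, {0}, {0, 1} };
        int[] outDeg = { 2, 1, 1 };
        double[] r = pagerank(inLinks, outDeg, 0.85, 1e-9, 500);
        System.out.println("ranks: " + Arrays.toString(r));
    }
}
```

Note that naively freezing any vertex whose own value stopped moving is unsafe: an upstream vertex may still change later. Gating the skip on the in-neighbors' activity, as above, avoids that trap.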
5. WHAT IS STREAMING?
Most commonly, streaming refers to processing unbounded data sets as they arrive to achieve lower
latency and therefore more timely results.
If you haven’t already, read these helpful posts that clarify the terms, techniques, and design patterns:
https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101
https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102
6. WHAT IS CEP?
Complex event processing, or CEP, is event processing that combines data from multiple sources to
infer events or patterns that suggest more complicated circumstances. The goal of complex event
processing is to identify meaningful events (such as opportunities or threats) and respond to them as
quickly as possible (https://en.wikipedia.org/wiki/Complex_event_processing)
7. IN THE APACHE IGNITE CONTEXT
Apache Ignite In-Memory Data Fabric is a high-performance, integrated and distributed in-memory
platform for computing and transacting on large-scale data sets in real-time, orders of magnitude faster
than possible with traditional disk-based or flash technologies.
8. APACHE IGNITE STREAMING
Primarily a high-performance means of inserting unbounded data sets into the Ignite Data Grid (cache) using the IgniteDataStreamer API
StreamReceiver API offers custom pre-processing
Other data processing through queries (including continuous queries) and cache policies
Backed by all kinds of Ignite goodness:
Scalable
Fault-tolerant
High throughput
Streaming functionality atop a convergent data platform – the future is bright!
9. IGNITE DATA STREAMER API
[Diagram: IgniteDataStreamer at the center, fed by adapters such as the MQTT Streamer, Kafka Streamer, Camel Streamer, JMS Data Streamer, and others; StreamReceiver implementations (StreamTransformer, StreamVisitor) sit between the streamer and the cache.]
10. IGNITE DATA STREAMER API
IgniteDataStreamer API is the basic building block for writing unbounded data to Ignite
Scalable
Fault-tolerant
At-least-once guarantee (watch out for duplicate data)
Buffers data and writes in batches (may introduce unwanted latency; set perNodeBufferSize() and autoFlushFrequency() accordingly)
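The batching knobs above can be sketched as follows. This assumes a running Ignite node; the cache name sensorCache and the data values are hypothetical:

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.Ignition;

public class StreamerSketch {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            ignite.getOrCreateCache("sensorCache");

            // try-with-resources flushes remaining buffers and closes the streamer
            try (IgniteDataStreamer<Integer, Long> stmr = ignite.dataStreamer("sensorCache")) {
                stmr.allowOverwrite(true);     // allow later values to replace earlier ones
                stmr.perNodeBufferSize(512);   // smaller batches -> lower latency, less throughput
                stmr.autoFlushFrequency(1000); // flush at least once a second (ms)

                for (int i = 0; i < 100_000; i++)
                    stmr.addData(i % 100, (long) i); // at-least-once: duplicates possible on retry
            }
        }
    }
}
```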
11. STREAM RECEIVER API
StreamReceiver API allows you to add custom, collocated pre-processing of the streaming data prior to putting it into the cache.
Does not put data into the cache automatically; you need to handle that during processing
Single receiver per IgniteDataStreamer
Two out-of-the-box implementations of StreamReceiver:
StreamTransformer updates data in the stream cache based on its previous value
StreamVisitor visits every key-value tuple in the stream
Might be possible to implement watermark, trigger, and accumulation patterns (depending on use case; see https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102)
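A typical StreamTransformer use is a running count keyed by event source, in the style of Ignite's streaming examples. The cache name eventCounts and the keys are hypothetical; this assumes a running Ignite node:

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.Ignition;
import org.apache.ignite.stream.StreamTransformer;

public class ReceiverSketch {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            ignite.getOrCreateCache("eventCounts");

            try (IgniteDataStreamer<String, Long> stmr = ignite.dataStreamer("eventCounts")) {
                stmr.allowOverwrite(true); // receivers require overwrite to be allowed

                // Collocated pre-processing: update the entry based on its previous value.
                // The receiver (not addData) is what actually writes the cache entry.
                stmr.receiver(StreamTransformer.from((e, arg) -> {
                    Long prev = e.getValue();
                    e.setValue(prev == null ? 1L : prev + 1);
                    return null;
                }));

                stmr.addData("line-1", 1L);
                stmr.addData("line-1", 1L); // second event increments the count to 2
            }
        }
    }
}
```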
12. WINDOWING
Achieved through cache eviction and expiry policies
Use eviction policies for size/batch based
Consider SortedEvictionPolicy with custom comparator for “x most recent events”
Use expiry policies for time based
Consider notion of event time, ingestion time, and processing time
CreatedExpiryPolicy is ingestion time based
What if data is delayed?
Consider a custom expiry policy based on event time
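An ingestion-time window of 60 seconds can be configured with CreatedExpiryPolicy roughly as follows (a configuration fragment; the cache name and value type are hypothetical):

```java
import java.util.concurrent.TimeUnit;
import javax.cache.expiry.CreatedExpiryPolicy;
import javax.cache.expiry.Duration;
import org.apache.ignite.configuration.CacheConfiguration;

// Entries expire 60 seconds after creation (ingestion time), giving a
// sliding 60-second window over the streamed data.
CacheConfiguration<Integer, Double> cfg = new CacheConfiguration<>("sensorCache");
cfg.setExpiryPolicyFactory(CreatedExpiryPolicy.factoryOf(new Duration(TimeUnit.SECONDS, 60)));
```

For event-time windows with delayed data, a custom ExpiryPolicy keyed off a timestamp inside the value would be needed, as the slide suggests.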
13. QUERYING
All Ignite data indexing capabilities as well as Ignite SQL, TEXT, and Predicate-based cache queries are available (it's just another cache, after all)
Leverage continuous queries to filter events on the node and receive real-time notifications that match your criteria
Another option to implement watermark, trigger, and accumulation patterns
This is where the complex event processing (CEP) magic happens, leveraging distributed joins and cross-cache joins
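A continuous query for the notification pattern above might look like this. The cache name, value semantics (items/second per line), and the 10.0 threshold are illustrative assumptions, and a running Ignite node is required:

```java
import javax.cache.event.CacheEntryEvent;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.ContinuousQuery;
import org.apache.ignite.cache.query.QueryCursor;

public class CepSketch {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            IgniteCache<Integer, Double> cache = ignite.getOrCreateCache("sensorCache");

            ContinuousQuery<Integer, Double> qry = new ContinuousQuery<>();

            // Remote filter runs on the data nodes: only slow lines pass through
            qry.setRemoteFilterFactory(() -> evt -> evt.getValue() < 10.0);

            // Local listener fires on this node for every matching update
            qry.setLocalListener(evts -> {
                for (CacheEntryEvent<? extends Integer, ? extends Double> e : evts)
                    System.out.println("Line " + e.getKey()
                        + " dropped to " + e.getValue() + " items/s");
            });

            try (QueryCursor<?> cur = cache.query(qry)) {
                cache.put(1, 5.0);  // below threshold, triggers the listener
                Thread.sleep(1000); // give the async notification time to arrive
            } catch (InterruptedException ignored) {
            }
        }
    }
}
```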
15. A SIMPLE IOT USE CASE
Monitor productivity on manufacturing lines
Sensors stream number of items per second through IgniteDataStreamer
Data is retained in the cache for 60 seconds (windowing)
Dashboard shows number of items per minute for each active line and the total items per minute for the entire factory
[Diagram: sensors 1, 2, …, n stream into the Ignite Data Streamer, which writes into the Ignite Cache; the Dashboard reads from the cache.]