With most machine learning (ML) and deep learning (DL) frameworks, it can take hours to move data and hours more to train models. Scaling is also hard, because data sets are increasingly larger than the capacity of any single server. The size of the data also makes it hard to incrementally test and retrain models in near real time to improve results. Learn how Apache Ignite and GridGain address these limitations in model training and execution and help achieve near-real-time, continuous learning. You will also see how ML/DL works with Apache Ignite and how to get started.
Topics include:
• Overview of distributed ML/DL, including design, implementation, usage patterns, pros and cons
• Overview of Apache Ignite ML/DL, including prebuilt ML/DL, and how to add your own ML/DL algorithms
• Model execution with Apache Ignite, including how to build models with Apache Spark and deploy them in Ignite
• How Apache Ignite and TensorFlow can be used together to build distributed DL model training and execution
Fraud prevention. A bank has built a model from historical data of what indicates that a loan application is likely fraudulent. As the system ingests new credit applications, it continually updates the machine learning model with the new data, so it can identify in real time any emerging trends that might indicate a new concerted effort to acquire credit fraudulently. Any related fraudulent activity can then be identified immediately.
Ecommerce recommendations. Online shopping recommendation engines are based on historical data such as web page visits and purchase patterns, but they are far more powerful, and deliver a higher ROI, when they incorporate real-time continuous learning. Feeding the latest web page information, referral information, and purchase patterns into the machine learning model improves the recommendation engine in real time, so recommendations reflect the latest data available.
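Both use cases above depend on updating a model record by record instead of retraining it from scratch. The following is a minimal sketch of that idea in plain Python, using logistic regression trained with stochastic gradient descent. It is not the Apache Ignite or GridGain API. The class name OnlineLogisticRegression, the two features, and the sample stream are all hypothetical and chosen only for illustration.

```python
import math


class OnlineLogisticRegression:
    """Binary classifier that is updated one record at a time (SGD)."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, features):
        """Return the estimated probability that the label is 1."""
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        z = max(-30.0, min(30.0, z))  # clamp to keep math.exp in range
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, features, label):
        """Apply one gradient step using a single new record."""
        error = self.predict_proba(features) - label
        for i, x in enumerate(features):
            self.weights[i] -= self.learning_rate * error * x
        self.bias -= self.learning_rate * error


model = OnlineLogisticRegression(n_features=2)

# Hypothetical stream of (features, label) records; label 1 marks a
# fraudulent application. In a real system these would arrive over time.
stream = [([1.0, 0.0], 1), ([0.0, 1.0], 0)] * 200

for features, label in stream:
    model.update(features, label)  # the model improves with each record

suspicious = model.predict_proba([1.0, 0.0])
ordinary = model.predict_proba([0.0, 1.0])
```

After the stream is consumed, suspicious is above 0.5 and ordinary is below 0.5. The point of the design is that each update touches only the newest record, so the cost of keeping the model current does not grow with the size of the historical data.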
The GridGain Platform
GridGain is a memory-centric data platform used to build fast, scalable, and resilient solutions.
At the heart of the GridGain platform is a distributed, memory-centric data store with ACID semantics and powerful processing APIs, including SQL, compute, key-value, and transactions. This memory-centric design lets GridGain use memory for high throughput and low latency while using local disk or SSD for durability and fast recovery.
The GridGain platform can be integrated with third-party databases and external storage media and can be deployed on any infrastructure. It provides linear scalability, built-in fault tolerance, comprehensive security and auditing, and advanced monitoring and management.
The GridGain platform serves a range of use cases, including core banking services, real-time product pricing, reconciliation and risk calculation engines, analytics, and machine learning.
* Architectural simplification
Apache Ignite incorporates distributed SQL database capabilities as part of its platform. The database is horizontally scalable, fault tolerant, and ANSI-99 SQL compliant. It supports all SQL and DML commands, including SELECT, UPDATE, INSERT, MERGE, and DELETE queries. It also supports a subset of DDL commands relevant to distributed databases.
Data sets as well as indexes can be stored both in RAM and on disk thanks to the durable memory architecture. This allows distributed SQL operations to run across different memory layers, achieving in-memory performance with the durability of disk.
You can interact with Apache Ignite using SQL through natively developed APIs for Java, .NET, and C++, or through the Ignite JDBC or ODBC drivers. The drivers provide true cross-platform connectivity from languages such as PHP, Ruby, and more.
Do not assume that your model is perfect. Calculate a classification metric, such as accuracy, to evaluate the quality of the model.
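The accuracy check mentioned above can be sketched in a few lines of plain Python. Accuracy is the fraction of predictions that match the true labels. The function name accuracy and the sample label lists are hypothetical and stand in for the output of whatever model and test set you use.

```python
def accuracy(true_labels, predicted_labels):
    """Return the fraction of predictions that equal the true labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("cannot compute accuracy of an empty set")
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return correct / len(true_labels)


# Hypothetical held-out labels and the model's predictions for them.
true_labels = [1, 0, 1, 1, 0, 1, 0, 0]
predicted_labels = [1, 0, 0, 1, 0, 1, 1, 0]

score = accuracy(true_labels, predicted_labels)
print(f"accuracy = {score:.2f}")  # prints "accuracy = 0.75"
```

Six of the eight predictions match, so the score is 0.75. Accuracy alone can mislead when one class is rare, as with fraud, so it is usually worth checking alongside other classification metrics.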
The Apache Ignite memory-centric platform is based on an in-memory architecture that allows data and indexes to be stored and processed both in memory and on disk when the Ignite Persistent Store feature is enabled. This memory architecture helps achieve in-memory performance with the durability of disk, using all the available resources of the cluster.
The GridGain in-memory data store is built and operates much like the virtual memory of operating systems such as Linux. However, one significant difference between the two architectures is that Durable Memory always keeps the whole data set and indexes on disk if the Ignite Persistent Store is used, while virtual memory uses the disk for swapping only.
In-Memory
• Off-Heap memory
• Removes noticeable GC pauses
• Automatic Defragmentation
• Predictable memory consumption
• Boosts SQL performance
On Disk
• Optional Persistence
• Support for flash, SSD, and Intel 3D XPoint
• Stores superset of data
• Fully Transactional
◦ Write-Ahead Log (WAL)
• Instantaneous Cluster Restarts
Abstraction layer on top of Ignite storage and computation
MapReduce using Compute Grid
Partition data
Can be recovered from another node
Partition context
ML algorithms are iterative and require context
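The points above (partitioned data, per-partition context, and MapReduce-style computation for iterative algorithms) can be illustrated with a small single-process sketch in plain Python. This is not the Apache Ignite ML API. The names Partition, compute, gradient_map, and sum_pairs are hypothetical, and the partitions here are ordinary in-memory lists rather than data spread across cluster nodes.

```python
from dataclasses import dataclass, field
from functools import reduce


@dataclass
class Partition:
    data: list                                   # rows held by one partition
    context: dict = field(default_factory=dict)  # state kept across iterations


def compute(partitions, map_fn, reduce_fn):
    """Run map_fn on every partition, then merge the results pairwise."""
    return reduce(reduce_fn, (map_fn(p) for p in partitions))


def gradient_map(weight):
    """Build a map function for the model y = weight * x."""
    def map_fn(partition):
        ctx = partition.context
        if "rows" not in ctx:
            ctx["rows"] = len(partition.data)    # cached on first use
        ctx["iterations"] = ctx.get("iterations", 0) + 1
        # Sum of squared-error gradients over this partition's rows only.
        grad = sum(2 * (weight * x - y) * x for x, y in partition.data)
        return grad, ctx["rows"]
    return map_fn


def sum_pairs(a, b):
    """Reduce step: add partial gradients and row counts."""
    return a[0] + b[0], a[1] + b[1]


# Hypothetical data set following y = 2x, split into two partitions.
partitions = [
    Partition(data=[(1.0, 2.0), (2.0, 4.0)]),
    Partition(data=[(3.0, 6.0), (4.0, 8.0)]),
]

weight = 0.0
for _ in range(100):                             # iterative training loop
    grad_sum, rows = compute(partitions, gradient_map(weight), sum_pairs)
    weight -= 0.05 * grad_sum / rows             # gradient descent step
```

After the loop, weight has converged to approximately 2.0. Each iteration sends only the current weight to the partitions and brings back one small pair per partition, while the rows themselves and the per-partition context stay where they are. That is the reason an iterative algorithm benefits from keeping context next to the data.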
Part of the reason behind our growth is the growth of Apache Ignite.
HAVE YOU HEARD OF APACHE IGNITE?
GridGain Systems donated the code to the Apache Ignite project in late 2014. Ignite became a top-level project of the Apache Software Foundation (ASF) in mid-2015, the second fastest to do so. Apache Ignite is now one of the top 5 ASF projects and has been for the last 2 years. We continue to be the leading contributor, though there are several others.
With over 4 million total downloads, Ignite has reached a 2 million download-a-year run rate.
[1] 2019 numbers: http://globenewswire.com/news-release/2019/07/09/1534470/0/en/The-Apache-Software-Foundation-Announces-Annual-Report-for-2019-Fiscal-Year.html
[2] 2018 numbers: https://blogs.apache.org/foundation/entry/apache-in-2018-by-the
[3] 2017 numbers: https://blogs.apache.org/foundation/entry/apache-in-2017-by-the
Today there are hundreds of leading companies that rely on GridGain to support their mission-critical applications. While GridGain started in Financial Services, today that is about 25% of its total business …
USE THIS OPPORTUNITY TO TELL SOME OF THE RELEVANT STORIES.
GridGain is used by FinTech and SaaS companies to add speed and scale, usually to support their larger customers as those customers adopt FinTech/SaaS technologies.
In FinTech, Finastra, which supports 48 out of the 50 top banks worldwide, adopted GridGain for their Cloud platform to add the speed and scale needed for their offerings and to support FRTB real-time regulatory requirements.
In SaaS, Microsoft Azure uses GridGain for real-time attack prevention as part of its identity services for all customer applications on Azure.
In telco, all of RingCentral’s VoIP relies on GridGain for storing all call/service sessions and making sure connections continue even as calls connect through different datacenters.
In IoT, Itron supports hundreds of millions of smart meters globally and relies on GridGain for real-time data ingestion at scale. Itron first adopted GridGain to support its larger customers.
American Airlines uses GridGain for real-time rerouting of customers and their luggage as they land.
Multiplan uses GridGain to better manage healthcare costs at scale.