Gearpump is an Akka-based real-time streaming engine that uses the Actor model for everything, giving it both high performance and flexibility: it has achieved throughput of 18 million messages per second with 8 ms latency on a cluster of 4 machines.
Deep learning at supercomputing scale by Rangan Sukumar from Cray (Bill Liu)
Presented at AI NEXTCon Seattle 1/17-20, 2018
http://aisea18.xnextcon.com
Join our free online AI group of 50,000+ tech engineers to learn and practice AI technology, including the latest AI news, tech articles and blogs, tech talks, tutorial videos, and hands-on workshops/codelabs on machine learning, deep learning, data science, and more.
Real-time machine learning visualization with Spark -- Hadoop Summit 2016 (Chester Chen)
This document discusses enabling real-time machine learning visualization with Spark. It presents a callback interface for Spark ML algorithms to send messages during training and a task channel to deliver messages from the Spark driver to a client. The messages are pushed to a browser using server-sent events and HTTP chunked responses. This allows visualizing training metrics, determining early stopping points, and monitoring algorithm convergence in real time.
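The push mechanism described above can be sketched in a few lines. This is a minimal illustration of formatting training metrics as server-sent events (SSE); the class and method names are hypothetical and the talk's actual Spark callback interface may differ.

```python
import json

class TrainingListener:
    """Hypothetical callback: collects metrics emitted during training and
    formats them as server-sent events (SSE) for a browser client."""

    def __init__(self):
        self.events = []

    def on_iteration(self, iteration, loss):
        # Each SSE message is a "data:" line terminated by a blank line.
        payload = json.dumps({"iteration": iteration, "loss": loss})
        self.events.append(f"data: {payload}\n\n")

    def stream(self):
        # In a real server this generator would feed an HTTP chunked response.
        yield from self.events

listener = TrainingListener()
for i, loss in enumerate([0.9, 0.5, 0.3]):
    listener.on_iteration(i, loss)
body = "".join(listener.stream())
```

A browser `EventSource` pointed at the streaming endpoint would receive each metric as it is produced, which is what makes early-stopping decisions possible during training rather than after it.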
Big Data and High Performance Computing Solutions in the AWS Cloud (Amazon Web Services)
Managing big data and running supercomputing jobs used to be for only well-funded research organizations and large corporations, but not any longer. AWS has democratized supercomputing and big data for the masses! AWS can provide you with the 64th fastest supercomputer in the world, on-demand and pay as you go. Hear from Ben Butler, Head of AWS Big Data Marketing, to learn how our customers are using big data and high performance computing to change the world. Not only is AWS technology available to everyone, but it is self-service and cheaper than ever before, featuring innovative technology and flexible pricing models – our AWS cloud computing platform has disrupted big data and HPC. Learn from customer successes, as Ben shares real-world case studies describing the specific big data and high performance computing challenges being solved on AWS. We will conclude with a discussion around the tutorials, public datasets, test drives, and our grants program - all of the tools needed to get you started quickly.
Workload-Aware Auto-Scaling: A New Paradigm for Big Data Workloads (Vasu S)
Learn more about Workload-Aware-Auto-Scaling-- an alternative architectural approach to Auto-Scaling that is better suited for the Cloud and applications like Hadoop, Spark, and Presto.
qubole.com/resources/white-papers/workload-aware-auto-scaling-qubole
Amazon EC2 provides resizable compute capacity in the cloud and makes web-scale computing easier for customers. It offers a wide variety of compute instances well suited to every imaginable use case, from static websites to high-performance supercomputing on demand, all available via highly flexible pricing options. This session covers the latest EC2 features and capabilities, including new instance families available in Amazon EC2, the differences among their hardware types and capabilities, and their optimal use cases. We will also cover best practices for optimizing your spend on EC2 to make the most of your EC2 instances, saving time and money.
Real-world Cloud HPC at Scale, for Production Workloads (BDT212) | AWS re:Inv... (Amazon Web Services)
"Running high-performance scientific and engineering applications is challenging no matter where you do it. Join IT executives from Hitachi Global Storage Technology, The Aerospace Corporation, Novartis, and Cycle Computing and learn how they have used the AWS cloud to deploy mission-critical HPC workloads.
Cycle Computing leads the session on how organizations of any scale can run HPC workloads on AWS. Hitachi Global Storage Technology discusses experiences using the cloud to create next-generation hard drives. The Aerospace Corporation provides perspectives on running MPI and other simulations, and offer insights into considerations like security while running rocket science on the cloud. Novartis Institutes for Biomedical Research talks about a scientific computing environment to do performance benchmark workloads and large HPC clusters, including a 30,000-core environment for research in the fight against cancer, using the Cancer Genome Atlas (TCGA)."
In this session, learn the basics of HPC on AWS and watch a demonstration of how to launch a large cluster.
Presenter: Scott Eberhardt, Specialist Solutions Architect, AWS UK
Hard Truths About Streaming and Eventing (Dan Rosanova, Microsoft) Kafka Summ... (confluent)
Eventing and streaming open a world of compelling new possibilities to our software and platform designs. They can reduce time to decision and action while lowering total platform cost. But they are not a panacea. Understanding the edges and limits of these architectures can help you avoid painful missteps. This talk will focus on event driven and streaming architectures and how Apache Kafka can help you implement these. It will also discuss key tradeoffs you will face along the way from partitioning schemes to the impact of availability vs. consistency (CAP Theorem). Finally we’ll discuss some challenges of scale for patterns like Event Sourcing and how you can use other tools and even features of Kafka to work around them. This talk assumes a basic understanding of Kafka and distributed computing, but will include brief refresher sections.
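One of the partitioning trade-offs the abstract alludes to can be made concrete: Kafka routes each record to a partition by hashing its key, so one key's events stay ordered on one partition, at the cost of potential hot partitions. This sketch uses MD5 for illustration only (Kafka's default partitioner actually uses murmur2); the function name is our own.

```python
import hashlib

def partition_for(key: bytes, num_partitions: int) -> int:
    # Hash the key and map it onto a partition; every event with the same
    # key lands on the same partition, so per-key ordering is preserved.
    digest = int.from_bytes(hashlib.md5(key).digest()[:4], "big")
    return digest % num_partitions

p = partition_for(b"order-42", 6)
```

The design consequence is the trade-off the talk raises: a skewed key distribution (one very busy customer, say) concentrates load on a single partition no matter how many partitions exist.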
Drug discovery at 2x speed. Faster, more comprehensive testing approval processes. Identifying gene targets in massive sequencing data sets. These goals are ambitious yet attainable, but not without increasing the computational capabilities of today's researchers. While everyone agrees that simply deploying more infrastructure is not the answer, running that work in the cloud is not without challenges. In this talk we will discuss and illustrate elements of those workloads that Cycle Computing's customers have run on AWS, generating vastly better results than would have been attained on traditional infrastructure. We will cover some common problems they encountered, and how they resolved them using Amazon EC2, S3, Glacier, and Cycle's software.
Presenters: Dougal Ballantyne, Business Development, AWS; Rob Futrick, CTO, Cycle Computing
Those who out-compute can often out-compete. The cloud gives you access to a massive amount of compute power when you need it. This talk presents an introduction to HPC in the cloud, including its benefits, how to get started, tools to use, and how to manage data. We will showcase several examples of HPC in the cloud from a number of public sector and commercial customers.
Created by: Dr. Jeff Layton, Principal, Solutions Architect
(BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014 (Amazon Web Services)
Leveraging big data and high performance computing (HPC) solutions enables your organization to make smarter and faster decisions that influence strategy, increase productivity, and ultimately grow your business. We kick off the Big Data and HPC track with the latest advancements in data analytics, databases, storage, and HPC at AWS. Hear customer success stories and discover how to put data to work in your own organization.
The Art of The Event Streaming Application: Streams, Stream Processors and Sc... (confluent)
1) The document discusses the art of building event streaming applications using various techniques like bounded contexts, stream processors, and architectural pillars.
2) Key aspects include modeling the application as a collection of loosely coupled bounded contexts, handling state using Kafka Streams, and building reusable stream processing patterns for instrumentation.
3) Composition patterns involve choreographing and orchestrating interactions between bounded contexts to capture business workflows and functions as event-driven data flows.
Machine learning in the physical world by Kip Larson from AWS IoT (Bill Liu)
Presented at AI NEXTCon Seattle 1/17-20, 2018
http://aisea18.xnextcon.com
Ten reasons to choose Apache Pulsar over Apache Kafka for Event Sourcing_Robe... (StreamNative)
More and more developers want to build cloud-native distributed applications or microservices using high-performing, cloud-agnostic messaging technology for maximum decoupling. The only thing we do not want is the hassle of managing the complex messaging infrastructure needed for the job, or the risk of vendor lock-in. Developers generally know Apache Kafka, but for event sourcing or the CQRS pattern Kafka is not really suitable. In this talk I will give you at least ten reasons to choose Pulsar over Kafka for event sourcing and data consensus.
Visit http://aws.amazon.com/hpc for more information about HPC on AWS.
High Performance Computing (HPC) allows scientists and engineers to solve complex science, engineering, and business problems using applications that require high bandwidth, low latency networking, and very high compute capabilities. AWS allows you to increase the speed of research by running high performance computing in the cloud and to reduce costs by providing Cluster Compute or Cluster GPU servers on-demand without large capital investments. You have access to a full-bisection, high bandwidth network for tightly-coupled, IO-intensive workloads, which enables you to scale out across thousands of cores for throughput-oriented applications.
(CMP202) Engineering Simulation and Analysis in the Cloud (Amazon Web Services)
"Building great products, ones that are aesthetically appealing as well as functionally sound, requires cutting-edge design and engineering. Given the high cost of physical testing prototypes, engineering organizations are turning to simulation and analysis using digital models, but compute requirements for these have traditionally required expensive on-premises infrastructure. But now, engineering organizations can use high-performance computing services from AWS and solutions from AWS technology partners to innovate at scale globally, with no up-front capital infrastructure investment.
In this session, AWS Partner Ansys shares how they help customers of all sizes design and engineer better products through digital simulation and analysis using HPC on AWS."
More info: https://cnfl.io/cloud-native-experience-for-kafka-in-cloud | Neha Narkhede is co-founder and CTO at Confluent, a company backing the popular Apache Kafka messaging system. Prior to founding Confluent, Neha led streams infrastructure at LinkedIn, where she was responsible for LinkedIn’s streaming infrastructure built on top of Apache Kafka and Apache Samza. She is one of the initial authors of Apache Kafka and a committer and PMC member on the project.
Apache Kafka® and Analytics in a Connected IoT World (confluent)
Apache Kafka® and Analytics in a Connected IoT World, Kai Waehner, Sr. Solutions Engineer Advanced Technology Group, Confluent
https://www.meetup.com/Berlin-Apache-Kafka-Meetup-by-Confluent/events/273166575/
Serverless Streaming Architectures and Algorithms for the Enterprise (Arun Kejariwal)
In recent years, serverless has gained momentum in the realm of cloud computing. Broadly speaking, it comprises function as a service (FaaS) and backend as a service (BaaS). The distinction between the two is that under FaaS, one writes and maintains the code (e.g., the functions) for serverless compute; in contrast, under BaaS, the platform provides the functionality and manages the operational complexity behind it. Serverless provides a great means to boost development velocity. With greatly reduced infrastructure costs, more agile and focused teams, and faster time to market, enterprises are increasingly adopting serverless approaches to gain a key advantage over their competitors.
Early use cases of serverless include data transformation in batch and ETL scenarios and data processing using MapReduce patterns. As a natural extension, serverless is being used in streaming contexts such as (but not limited to) real-time bidding, fraud detection, and intrusion detection. Serverless is arguably naturally suited to extracting insights from fast data, that is, high-volume, high-velocity data. Example tasks in this regard include filtering and reducing noise in the data and leveraging machine learning and deep learning models to provide continuous insights about business operations.
We walk the audience through the landscape of streaming systems for each stage of an end-to-end data processing pipeline—messaging, compute, and storage. We overview the inception and growth of the serverless paradigm. Further, we deep dive into Apache Pulsar, which provides native serverless support in the form of Pulsar functions, and paint a bird’s-eye view of the application domains where Pulsar functions can be leveraged.
Baking intelligence into a serverless flow is paramount from a business perspective. To this end, we detail different serverless patterns—event processing, machine learning, and analytics—for different use cases and highlight the trade-offs. We present perspectives on how advances in hardware technology and the emergence of new applications will impact the evolution of serverless streaming architectures and algorithms. The topics covered include an introduction to streaming, an introduction to serverless, serverless and streaming requirements, Apache Pulsar, application domains, serverless event processing patterns, serverless machine learning patterns, and serverless analytics patterns.
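The filtering and enrichment tasks mentioned above can be sketched as small stateless functions that a serverless streaming platform (such as Pulsar Functions) wires between topics. The function names and ranges below are illustrative, not any platform's actual API.

```python
def denoise(reading):
    # Filtering / noise-reduction step: drop readings outside a sane range
    # (0-100 is an assumed valid range for this hypothetical sensor).
    return reading if 0.0 <= reading <= 100.0 else None

def enrich(reading):
    # "Insight" step: a model or rule attaches a decision to each event.
    return {"value": reading, "alert": reading > 90.0}

def pipeline(readings):
    # The platform would run each function per message between topics;
    # here we chain them over an in-memory iterable for illustration.
    for r in readings:
        r = denoise(r)
        if r is not None:
            yield enrich(r)

out = list(pipeline([12.0, -5.0, 95.0]))
```

Keeping each stage a pure function is what lets the platform scale, retry, and compose them independently, which is the core appeal of FaaS for streaming.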
Machine learning at scale by Amy Unruh from Google (Bill Liu)
Presented at AI NEXTCon Seattle 1/17-20, 2018
http://aisea18.xnextcon.com
Lessons Learned Replatforming A Large Machine Learning Application To Apache ... (Databricks)
Morningstar’s Risk Model project is created by stitching together statistical and machine learning models to produce risk and performance metrics for millions of financial securities. Previously, we were running a single version of this application, but needed to expand it to allow for customizations based on client demand. With the goal of running hundreds of custom Risk Model runs at once at an output size of around 1TB of data each, we had a challenging technical problem on our hands! In this presentation, we’ll talk about the challenges we faced replatforming this application to Spark, how we solved them, and the benefits we saw.
Some things we’ll touch on include how we created customized models, the architecture of our machine learning application, how we maintain an audit trail of data transformations (for rigorous third party audits), and how we validate the input data our model takes in and output data our model produces. We want the attendees to walk away with some key ideas of what worked for us when productizing a large scale machine learning platform.
This document provides an overview of Amazon Web Services (AWS) and its capabilities. It describes AWS's global consumer and seller businesses as well as its cloud infrastructure business. It then discusses why researchers love using AWS due to benefits like time to science, global accessibility, low costs, security, and elasticity. Popular high-performance computing workloads on AWS are also listed.
Intelligent Auto-scaling of Kafka Consumers with Workload Prediction | Ming S... (HostedbyConfluent)
In a typical deployment of Kafka with many topics and partitions, scaling the Kafka consumer efficiently is one of the important tasks in maintaining overall smooth Kafka operations. Traditional Kubernetes Horizontal Pod Autoscaling (HPA), which uses basic CPU and/or memory metrics, is not suitable for scaling Kafka consumers. A more appropriate workload metric for a Kafka consumer is the number of messages in the Kafka broker queue; more specifically, the message production rate of a specific topic is the right workload metric for a Kafka consumer.
While using the message production rate is a better way to decide the number of consumer replicas, this is still reaction-based auto-scaling. Using machine-learning-based forecasting, it is possible to predict an upcoming increase or decrease in the message production rate. With the predicted workload, the Kafka consumers can be scaled in a more timely manner, resulting in better performance KPIs.
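The predict-then-size loop described above can be sketched as two steps: forecast the next production rate, then derive a replica count from it. The forecaster here is a deliberately naive linear extrapolation standing in for the ML model the talk uses; the function names and per-consumer throughput figure are assumptions.

```python
import math

def forecast_next(rates):
    # Naive linear extrapolation over the last two observations; a real
    # system would use a trained forecasting model here instead.
    if len(rates) < 2:
        return rates[-1]
    return rates[-1] + (rates[-1] - rates[-2])

def desired_replicas(predicted_rate, per_consumer_rate, min_r=1, max_r=10):
    # Replicas needed to keep up with the predicted production rate,
    # assuming each consumer sustains per_consumer_rate messages/sec,
    # clamped to a configured min/max band as an HPA would.
    needed = math.ceil(predicted_rate / per_consumer_rate)
    return max(min_r, min(max_r, needed))

predicted = forecast_next([100.0, 120.0])   # extrapolates to 140.0
replicas = desired_replicas(predicted, per_consumer_rate=50.0)
```

Because the replica count is derived from the forecast rather than the current backlog, scale-out can begin before lag accumulates, which is exactly the advantage over reaction-based HPA that the abstract claims.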
Architecting Microservices Applications with Instant Analytics (confluent)
View recording here: https://www.confluent.io/online-talks/architecting-microservices-applications-with-instant-analytics
The next generation architecture for exploring and visualizing event-driven data in real-time requires the right technology. Microservices deliver significant deployment and development agility, but raise questions of how data will move between services and how it will be analyzed. This online talk explores how Apache Druid and Apache Kafka® can turn a microservices ecosystem into a distributed real-time application with instant analytics. Apache Kafka and Druid form the backbone of an architecture that meet the demands imposed on the next generation applications you are building right now. Join industry experts Tim Berglund, Confluent, and Rachel Pedreschi, Imply, as they discuss architecting microservices apps with Druid and Apache Kafka.
In this video from the 2014 HPC User Forum in Seattle, David Pellerin from Amazon presents: Update on HPC Use on AWS.
"High Performance Computing (HPC) allows scientists and engineers to solve complex science, engineering, and business problems using applications that require high bandwidth, low latency networking, and very high compute capabilities. AWS allows you to increase the speed of research by running high performance computing in the cloud and to reduce costs by providing Cluster Compute or Cluster GPU servers on-demand without large capital investments. You have access to a full-bisection, high bandwidth network for tightly-coupled, IO-intensive workloads, which enables you to scale out across thousands of cores for throughput-oriented applications."
Watch the video presentation: http://wp.me/p3RLHQ-d0n
This document discusses how Amazon Web Services (AWS) provides high-performance computing (HPC) capabilities in the cloud. It notes that HPC workloads require large amounts of data, compute, and people. AWS offers elastic, on-demand infrastructure services like Amazon EC2 and S3 that allow users to pay for only what they use. The document outlines how users can run HPC workloads on AWS using EC2 instance types optimized for compute and GPU-accelerated instances. It shares examples of organizations using AWS for applications like computational fluid dynamics and molecular modeling.
Auto-Train a Time-Series Forecast Model With AML + ADB (Databricks)
Supply chain, healthcare, insurance, and finance often require highly accurate forecasting models at enterprise scale. With Azure Machine Learning on Azure Databricks, the scale and speed needed for large-scale many-models scenarios can be achieved, and time-to-product decreases drastically. The better-together story offers an enterprise approach to AI/ML.
Azure AutoML offers an elegant solution for efficiently building forecasting models on Azure Databricks compute, solving sophisticated business problems. The presentation covers the Azure Machine Learning + Azure Databricks approach (see attached slides), while the demo covers a hands-on business problem: building a forecasting model in Azure Databricks using Azure Machine Learning. The AI/ML better-together story is elevated as MLflow for data science lifecycle management and Hyperopt for distributed model execution complete AI/ML enterprise readiness for industry problems.
IoT Sensor Analytics with Apache Kafka, KSQL, TensorFlow and MQTT => Kafka-Native End-to-End IoT Data Integration and Processing.
Large numbers of IoT devices lead to big data and the need for further processing and analysis. Apache Kafka is a highly scalable and distributed open source streaming platform, which can connect to MQTT and other IoT standards. Kafka ingests, stores, processes and forwards high volumes of data from thousands of IoT devices.
The rapidly expanding world of stream processing can be daunting, with new concepts such as various types of time semantics, windowed aggregates, changelogs, and programming frameworks to master. KSQL is the streaming SQL engine on top of Apache Kafka that simplifies all this and makes stream processing available to everyone, without the need to write source code.
This talk shows how to leverage Kafka and KSQL in an IoT sensor analytics scenario for predictive maintenance and integration with real time monitoring systems. A live demo shows how to embed and deploy Machine Learning models - built with frameworks like TensorFlow, DeepLearning4J or H2O - into mission-critical and scalable real time applications.
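The pattern of embedding a model inside the stream processor, as the demo above does, can be illustrated with a toy scorer. The rolling-mean threshold rule below stands in for a trained TensorFlow/DeepLearning4J/H2O model; the class name, window size, and threshold are all assumptions for illustration.

```python
from collections import deque

class SensorMonitor:
    """Sketch of per-event scoring inside a stream processor. A trained
    model would replace the rolling-mean rule; the embedding pattern
    (state + score per message) is the same."""

    def __init__(self, window=5, threshold=2.0):
        self.readings = deque(maxlen=window)  # bounded per-key state
        self.threshold = threshold

    def score(self, value):
        # Flag readings that deviate far from the recent rolling mean.
        self.readings.append(value)
        mean = sum(self.readings) / len(self.readings)
        return abs(value - mean) > self.threshold

monitor = SensorMonitor()
alerts = [monitor.score(v) for v in [1.0, 1.1, 0.9, 5.0]]
```

Scoring in-stream like this, rather than calling out to a separate model service per message, is what keeps the predictive-maintenance path low-latency enough for real-time monitoring.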
This profile summarizes Bruce Roshanravan's experience and qualifications. He has over 20 years of experience in software development, consultancy, and web development, with expertise in programming languages including Java, PL/SQL, C++, and Visual Basic, and proficiency in software design methodologies such as Waterfall, Scrum, and RUP. He has worked as a senior developer for various companies on projects involving software maintenance, new feature development, database management, and project documentation. His education includes a BSc in Computer Studies and a BEng in Civil Engineering.
We walk the audience through the landscape of streaming systems for each stage of an end-to-end data processing pipeline—messaging, compute, and storage. We overview the inception and growth of the serverless paradigm. Further, we deep dive into Apache Pulsar, which provides native serverless support in the form of Pulsar functions, and paint a bird’s-eye view of the application domains where Pulsar functions can be leveraged.
Baking in intelligence in a serverless flow is paramount from a business perspective. To this end, we detail different serverless patterns (event processing, machine learning, and analytics) for different use cases and highlight the trade-offs. We present perspectives on how advances in hardware technology and the emergence of new applications will impact the evolution of serverless streaming architectures and algorithms. The topics covered include an introduction to streaming, an introduction to serverless, serverless and streaming requirements, Apache Pulsar, application domains, serverless event processing patterns, serverless machine learning patterns, and serverless analytics patterns.
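The filtering-and-noise-reduction task mentioned above is a natural fit for a stateless serverless function invoked once per message. The sketch below is a FaaS-style handler in plain Python; the field names and valid range are illustrative assumptions, and this is not the actual Pulsar Functions API.

```python
# FaaS-style sketch of the filter/reduce-noise step: a stateless handler that a
# serverless runtime (e.g. a Pulsar function) would invoke once per message.
# Returning None means the message is dropped as noise.

def handle(event):
    """Drop out-of-range readings; enrich and forward the rest."""
    reading = event.get("value")
    if reading is None or not (0.0 <= reading <= 150.0):
        return None  # filtered out as noise
    return {"sensor": event["sensor"], "value": reading, "alert": reading > 100.0}

stream = [
    {"sensor": "s1", "value": 42.0},
    {"sensor": "s1", "value": -7.0},   # out of range: noise
    {"sensor": "s2", "value": 120.0},  # valid, triggers alert
]
results = [r for r in (handle(e) for e in stream) if r is not None]
print(results)
```

Keeping the handler a pure function of its input is what makes it trivially scalable: the platform can run any number of copies in parallel without coordination.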
Machine learning at scale by Amy Unruh from GoogleBill Liu
Presented at AI NEXTCon Seattle 1/17-20, 2018
http://aisea18.xnextcon.com
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...Databricks
Morningstar’s Risk Model project is created by stitching together statistical and machine learning models to produce risk and performance metrics for millions of financial securities. Previously, we were running a single version of this application, but needed to expand it to allow for customizations based on client demand. With the goal of running hundreds of custom Risk Model runs at once at an output size of around 1TB of data each, we had a challenging technical problem on our hands! In this presentation, we’ll talk about the challenges we faced replatforming this application to Spark, how we solved them, and the benefits we saw.
Some things we’ll touch on include how we created customized models, the architecture of our machine learning application, how we maintain an audit trail of data transformations (for rigorous third party audits), and how we validate the input data our model takes in and output data our model produces. We want the attendees to walk away with some key ideas of what worked for us when productizing a large scale machine learning platform.
This document provides an overview of Amazon Web Services (AWS) and its capabilities. It describes AWS's global consumer and seller businesses as well as its cloud infrastructure business. It then discusses why researchers love using AWS due to benefits like time to science, global accessibility, low costs, security, and elasticity. Popular high-performance computing workloads on AWS are also listed.
Intelligent Auto-scaling of Kafka Consumers with Workload Prediction | Ming S...HostedbyConfluent
In a typical deployment of Kafka with many topics and partitions, scaling the Kafka consumer efficiently is one of the important tasks in maintaining overall smooth Kafka operations. The traditional Kubernetes Horizontal Pod Autoscaler (HPA), which uses basic CPU and/or memory metrics, is not suitable for scaling Kafka consumers. A more appropriate workload metric for a Kafka consumer is the number of messages in the Kafka broker queue. More specifically, the message production rate of a specific topic is the right workload metric for a Kafka consumer.
While using the message production rate is a better way to decide the number of consumer replicas, this is still reaction-based auto-scaling. Using machine-learning-based forecasting, it is possible to predict an upcoming increase or decrease in the message production rate. With the predicted workload, the Kafka consumers can be scaled in a more timely manner, resulting in better performance KPIs.
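The scaling decision itself reduces to a small calculation: divide the predicted production rate by the throughput one replica can sustain, and clamp to HPA-style bounds. The per-replica capacity and bounds below are illustrative assumptions, not figures from the talk.

```python
# Sketch of predictive consumer scaling: given a forecast message production
# rate (msgs/s) and the rate one consumer replica can sustain, compute the
# replica count ahead of the load change instead of reacting to lag.
import math

def desired_replicas(predicted_rate, per_replica_capacity,
                     min_replicas=1, max_replicas=30):
    """Replicas needed to keep up with the predicted rate, clamped to bounds."""
    needed = math.ceil(predicted_rate / per_replica_capacity)
    return max(min_replicas, min(max_replicas, needed))

# Forecast says the rate will jump from 800 to 4500 msgs/s; each replica
# sustains about 1000 msgs/s.
print(desired_replicas(4500, 1000))  # 5
```

In practice the upper bound should also be capped at the topic's partition count, since consumers in a group beyond the partition count sit idle.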
Architecting Microservices Applications with Instant Analyticsconfluent
View recording here: https://www.confluent.io/online-talks/architecting-microservices-applications-with-instant-analytics
The next generation architecture for exploring and visualizing event-driven data in real-time requires the right technology. Microservices deliver significant deployment and development agility, but raise questions of how data will move between services and how it will be analyzed. This online talk explores how Apache Druid and Apache Kafka® can turn a microservices ecosystem into a distributed real-time application with instant analytics. Apache Kafka and Druid form the backbone of an architecture that meets the demands imposed on the next-generation applications you are building right now. Join industry experts Tim Berglund, Confluent, and Rachel Pedreschi, Imply, as they discuss architecting microservices apps with Druid and Apache Kafka.
In this video from the 2014 HPC User Forum in Seattle, David Pellerin from Amazon presents: Update on HPC Use on AWS.
Watch the video presentation: http://wp.me/p3RLHQ-d0n
This document discusses how Amazon Web Services (AWS) provides high-performance computing (HPC) capabilities in the cloud. It notes that HPC workloads require large amounts of data, compute, and people. AWS offers elastic, on-demand infrastructure services like Amazon EC2 and S3 that allow users to pay for only what they use. The document outlines how users can run HPC workloads on AWS using EC2 instance types optimized for compute and GPU-accelerated instances. It shares examples of organizations using AWS for applications like computational fluid dynamics and molecular modeling.
Auto-Train a Time-Series Forecast Model With AML + ADBDatabricks
Supply Chain, Healthcare, Insurance, and Finance often require highly accurate forecasting models in an enterprise large-scale fashion. With Azure Machine Learning on Azure Databricks, the scale and speed to large-scale many-models can be achieved and time-to-product decreases drastically. The better-together story poses an enterprise approach to AI/ML.
Azure AutoML offers an elegant solution efficiently to build forecasting models on Azure Databricks compute solving sophisticated business problems. The presentation covers the Azure Machine Learning + Azure Databricks approach (see slides attached) while the demo covers a hands-on business problem building a forecasting model in Azure Databricks using Azure Machine Learning. The AI/ML better-together story is elevated as MLFlow for Data Science Lifecycle Management and Hyperopt for distributed model execution completes AI/ML enterprise readiness for industry problems.
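Before reaching for AutoML, it helps to see what a forecasting baseline looks like, since that is the kind of model automated tooling tries (and usually beats). The sketch below is a moving-average forecaster with a one-step-ahead holdout evaluation; it is a generic illustration in plain Python, not Azure AutoML code, and the window size is an assumption.

```python
# Minimal forecasting baseline: predict the next value as the mean of the last
# `window` observations, and score the model with mean absolute error (MAE)
# over one-step-ahead forecasts.

def moving_average_forecast(history, window=3):
    """Predict the next value as the mean of the last `window` observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def mae(series, window=3):
    """Mean absolute error of one-step-ahead forecasts over the series."""
    errors = [abs(series[i] - moving_average_forecast(series[:i], window))
              for i in range(window, len(series))]
    return sum(errors) / len(errors)

demand = [100, 102, 101, 105, 110, 108, 112]
print(moving_average_forecast(demand))  # 110.0
print(round(mae(demand), 2))            # 4.58
```

An AutoML run earns its keep when its best candidate beats this kind of baseline on the same holdout metric.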
IoT Sensor Analytics with Apache Kafka, KSQL, TensorFlow and MQTT => Kafka-Native End-to-End IoT Data Integration and Processing.
Large numbers of IoT devices lead to big data and the need for further processing and analysis. Apache Kafka is a highly scalable and distributed open source streaming platform, which can connect to MQTT and other IoT standards. Kafka ingests, stores, processes and forwards high volumes of data from thousands of IoT devices.
The rapidly expanding world of stream processing can be daunting, with new concepts such as various types of time semantics, windowed aggregates, changelogs, and programming frameworks to master. KSQL is the streaming SQL engine on top of Apache Kafka which simplifies all this and makes stream processing available to everyone without the need to write source code.
This talk shows how to leverage Kafka and KSQL in an IoT sensor analytics scenario for predictive maintenance and integration with real time monitoring systems. A live demo shows how to embed and deploy Machine Learning models - built with frameworks like TensorFlow, DeepLearning4J or H2O - into mission-critical and scalable real time applications.
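The windowed-aggregate idea at the heart of this scenario can be sketched in plain Python: keep a sliding window of recent sensor readings and flag a machine when the windowed mean drifts past a threshold. In the talk this logic lives in a KSQL windowed query or an embedded ML model; the window size and threshold below are illustrative assumptions.

```python
# Sliding-window monitor for predictive maintenance: flag a machine when the
# mean of its recent sensor readings breaches a threshold.
from collections import deque

class WindowedMonitor:
    def __init__(self, window_size=5, threshold=80.0):
        self.window = deque(maxlen=window_size)  # old readings fall off automatically
        self.threshold = threshold

    def observe(self, reading):
        """Add a reading; return True if the windowed mean breaches the threshold."""
        self.window.append(reading)
        mean = sum(self.window) / len(self.window)
        return mean > self.threshold

monitor = WindowedMonitor(window_size=3, threshold=80.0)
alerts = [monitor.observe(v) for v in [70, 75, 78, 90, 95]]
print(alerts)  # the last readings push the windowed mean above 80
```

Averaging over a window rather than alerting on single readings is what filters out one-off sensor spikes while still catching sustained drift.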
This profile summarizes Bruce Roshanravan's experience and qualifications. He has over 20 years of experience in software development, consultancy, and web development. He has expertise in various programming languages like Java, PLSQL, C++, and Visual Basic. He is proficient in software design methodologies like Waterfall, Scrum, and RUP. He has worked as a senior developer for various companies on projects involving software maintenance, new feature development, database management, and project documentation. His education includes a BSc in Computer Studies and a BEng in Civil Engineering.
Sperry Van Ness International Corporation owns the Sperry Van Ness and SVN marks. Each Sperry Van Ness office is independently owned and operated. SVN welcomes a new office in Mexico City with advisors Manuel Pumarejo Gonzalez and others focusing on retail, land, and tenant representation with over 14 years of experience.
This document summarizes Philip Engström's master's thesis project on interactive GPU-based volume rendering. The project investigated two approaches - one based on textured slices of proxy geometry and one based on ray casting. It was found that the ray casting implementation provided far superior image quality. Most of the project work focused on improving ray casting performance through an empty space skipping method using a complex bounding geometry. The report provides background on volume rendering and GPU technology to make the project accessible to readers with basic computer graphics knowledge.
Christian Aid has developed a Resilience Framework to help empower marginalized communities manage risks and improve well-being. The framework is based on principles of community-led processes, power and inclusion, accountability, and do no harm. It recognizes that communities face various interconnected risks at different levels. The framework guides programs to support communities in identifying risks, taking action, and accessing resources to build sustainable resilience through interventions that address issues like power relations, agriculture, markets, health, and conflict.
Social networks and the world of archives: revolving around 2.0Julián Marquina
This document presents the social networks and 2.0 platforms that archives can use. It briefly explains the concept of Web 2.0 and how it has evolved toward a participatory model. It then describes social tools such as blogs, Twitter, and Facebook that archives can employ to disseminate information and generate content with users in a dynamic way. Finally, it offers advice on how archives can manage these platforms successfully.
This report covers supervision results at SMAN 2 Semarang, including assessment of teachers' lesson preparation and delivery, an analysis of the school's management effectiveness through the four Balanced Scorecard perspectives, and the aims and scope of supervision, which span academic and managerial aspects.
This document presents an introduction to the Literary Theory course. It explains that the course will develop competencies for analyzing literary texts from different theoretical perspectives. It also argues that future teachers must know literary theories in order to understand the complexity of texts. The objectives are to introduce students to the theoretical currents and cultural debates around literature. Finally, the first unit covers basic concepts such as literature, literary genres, and the functions of literature.
Apache Kafka® + Machine Learning for Supply Chain confluent
Watch this talk here: https://www.confluent.io/online-talks/apache-kafka-machine-learning-for-supply-chain
Automating multifaceted, complex workflows requires hybrid solutions: streaming analytics of IoT data, batch analytics such as machine learning, and real-time visualizations. Leaders responsible for global supply chain planning must work with and integrate data from disparate sources around the world. Many of these data sources output information in real time, which assists planners in operationalizing plans and interacting with manufacturing output. IoT sensors on manufacturing equipment and inventory control systems feed real-time processing pipelines that match actual production figures against planned schedules to calculate yield efficiency.
Using information from both real-time systems and batch optimization, supply chain managers are able to economize operations and automate tedious inventory and manufacturing accounting processes. Sitting on top of all these systems is a supply chain visualization tool, enabling users' visibility over the global supply chain. If you are responsible for key data integration initiatives, join for a detailed walk through of a customer's use of this system built using Confluent and Expero tools.
WHAT YOU'LL LEARN:
• See different use cases in automation industry and Industrial IoT (IIoT) where an event streaming platform adds business value.
• Understand different architecture options to leverage Apache Kafka and Confluent.
• How to leverage different analytics tools and machine learning frameworks in a flexible and scalable way.
• How real-time visualization ties together streaming and batch analytics for business users, interpreters, and analysts.
• Understand how streaming and batch analytics optimize the supply chain planning workflow.
• Conceptualize the intersection between resource utilization and manufacturing assets with long term planning and supply chain optimization.
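The yield-efficiency calculation mentioned above is, at its core, a per-line comparison of actual output against the planned schedule. The sketch below shows that calculation in plain Python; the line names and figures are illustrative, and in the described system these numbers would arrive from IoT sensors via Kafka topics.

```python
# Yield efficiency per production line: actual units produced as a fraction of
# the planned schedule. Values above 1.0 mean the line beat its plan.

def yield_efficiency(planned, actual):
    """Map each production line to actual/planned output, rounded to 3 places."""
    return {line: round(actual.get(line, 0) / planned[line], 3)
            for line in planned}

planned = {"line-A": 1000, "line-B": 800}
actual = {"line-A": 930, "line-B": 812}
print(yield_efficiency(planned, actual))  # {'line-A': 0.93, 'line-B': 1.015}
```

Computed continuously over streaming production data, this ratio is the signal planners watch to spot lines drifting behind schedule.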
IIoT with Kafka and Machine Learning for Supply Chain Optimization In Real Ti...Kai Wähner
I did a webinar with Confluent's partner Expero about "Apache Kafka and Machine Learning for Real Time Supply Chain Optimization". This is a great example for anybody in automation industry / Industrial IoT (IIoT) like automotive, manufacturing, logistics, etc.
We explain how a real time event streaming platform can integrate in real time with the legacy world and proprietary IIoT protocols (like Siemens S7, Modbus, Beckhoff ADS, OPC-UA, et al). You can process the data at scale and then ingest it into a modern database (like AWS S3, Snowflake or MongoDB) or analytic / machine learning framework (like TensorFlow, PyTorch or Azure Machine Learning Service).
The document outlines 19 potential project titles for a Cisco summer internship in 2011. The projects cover a wide range of topics including network performance testing, automation, monitoring, management, and security tools.
Kalix: Tackling the Cloud to Edge ContinuumJonas Bonér
Read this blog for an overview of Kalix:
https://www.kalix.io/blog/kalix-move-to-the-cloud-extend-to-the-edge-go-beyond
Abstract:
What will the future of the Cloud and Edge look like for us as developers? We have great infrastructure nowadays, but that only solves half of the problem. The Serverless developer experience shows the way, but it’s clear that FaaS is not the final answer. What we need is a programming model and developer UX that takes full advantage of new Cloud and Edge infrastructure, allowing us to build general-purpose applications, without needless complexity.
What if you only had to think about your business logic, public API, and how your domain data is structured, not worry about how to store and manage it? What if you could not only be serverless but become “databaseless” and forget about databases, storage APIs, and message brokers?
Instead, what if your data just existed wherever it needed to be, co-located with the service and its user, at the edge, in the cloud, or in your own private network—always there and available, always correct and consistent? Where the data is injected into your services on an as-needed basis, automatically, timely, efficiently, and intelligently.
Services, powered with this “data plane” of application state—attached to and available throughout the network—can run anywhere in the world: from the public Cloud to 10,000s of PoPs out at the Edge of the network, in close physical proximity to its users, where the co-location of state, processing, and end-user ensures ultra-low latency and high throughput.
Sounds exciting? Let me show you how we are making this vision a reality building a distributed real-time Data Plane PaaS using technologies like Akka, Kubernetes, gRPC, Linkerd, and more.
ALT-F1.BE : The Accelerator (Google Cloud Platform)Abdelkrim Boujraf
The Accelerator is an IT infrastructure able to collect and analyze a massive amount of public data on the WWW.
The Accelerator leverages the untapped potential of web data with the first solution designed for diverse sectors,
completely scalable, available on-premise, and cloud-provider agnostic.
This session introduces you to Amazon EC2 F1 instances and walks you through a typical development and deployment process, including the approved Amazon EC2 F1 C/C++ development workflow. We also discuss a number of use cases in different domains, including financial risk simulation, genomics, video processing, big data and analytics, with a discussion about acceleration work on top of EC2 F1.
Do Clouds Compute? A Framework for Estimating the Value of Cloud Computing.Markus Klems
My slides of the WeB 2008 workshop on e-Business in Paris. The framework that I describe in the presentation assists decision makers to understand the value proposition of Cloud Computing technology.
kreuzwerker AWS Modernizing Legacy Operations with Containerized Solutions 20...kreuzwerker GmbH
This document summarizes a presentation about modernizing legacy operations with containerized solutions on AWS. It discusses how kreuzwerker helped a client streamline data flow between their CRM, ERP, and LIMS systems using a serverless architecture on AWS. The architecture designed by kreuzwerker used AWS services like EventBridge and Step Functions to extract data from their MDM via REST, transform it, and load it into various external systems in parallel via REST. This provided benefits like low maintenance, simple usage, and cost optimization while handling errors and retries.
Cloud computing allows users to access computing resources like software, storage, databases, and more over the internet. It has several key characteristics including on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service. There are three main service models: Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS). Cloud computing provides benefits for applications, storage, and databases by offering the latest versions and flexibility without the need for local hardware and maintenance.
This document discusses modern application architectures on AWS. It covers key concepts like containers, serverless computing using AWS Lambda, and managed Kubernetes with Amazon EKS. Specific services are highlighted, like Amazon ECS for container orchestration and Amazon Fargate for serverless containers without managing infrastructure. Case studies are presented on companies like FINRA and McDonald's using these architectures on AWS for speed, scale, and cost efficiency. The principles of cloud native applications are also summarized, focusing on pay-as-you-go models, self-service, elasticity, and other advantages over traditional data center architectures.
Unlocking the Power of IoT: A comprehensive approach to real-time insightsconfluent
In today's data-driven world, the Internet of Things (IoT) is revolutionizing industries and unlocking new possibilities. Join Data Reply, Confluent, and Imply as we unveil a comprehensive solution for IoT that harnesses the power of real-time insights.
Lew Tucker discusses the rise of cloud computing and its impact. He defines various cloud service models like SaaS, PaaS, and IaaS. Tucker analogizes the shift to cloud computing from individual data centers generating their own power to today's electrical grid. Major drivers of cloud computing include the growth of web APIs and massive amounts of user-generated data. Tucker outlines how cloud computing changes what developers can access and how applications are designed and scaled.
This document discusses the high availability and disaster recovery solution implemented at QR to support their mission critical SAP application. The solution involved migrating from a mainframe database to a SQL Server database, with databases clustered across two data centers for redundancy. The new solution met all of QR's requirements including zero data loss, recovery within 5 minutes, and the ability to run the full solution on a single node.
All you need to know about Yelowsoft's new version updateYelowsoft
At Yelowsoft, our team is dedicated to achieving excellence in all the work we do. We have achieved this in the past by continuously reinventing ourselves, and this continuous process of reinvention has become our trademark.
Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...Amazon Web Services
Amazon Kinesis makes it easy to speed up the time it takes for you to get valuable, real-time insights from your streaming data. In this session, we walk through the most popular applications that customers implement using Amazon Kinesis, including streaming extract-transform-load, continuous metric generation, and responsive analytics. Our customer Autodesk joins us to describe how they created real-time metrics generation and analytics using Amazon Kinesis and Amazon Elasticsearch Service. They walk us through their architecture and the best practices they learned in building and deploying their real-time analytics solution.
Scalable AutoML for Time Series Forecasting using RayDatabricks
Time series forecasting is widely used in real-world applications, such as network quality analysis in telcos, log analysis for data center operations, and predictive maintenance for high-value equipment.
Wire data provides deep insights across IT, security and business use cases by capturing the communications transmitted over the wire between machines and applications in real-time. The Splunk App for Stream enables new operational intelligence by indexing this wire data without needing instrumentation. It provides enhanced visibility, efficient cloud-ready collection, and fast time to value through interface-driven deployment. Key features include protocol decoding, attribute filtering, aggregations, and custom content extraction for analysis in Splunk.
Data Con LA 2022 - Event Sourcing with Apache Pulsar and Apache QuarkusData Con LA
This document discusses event sourcing with Apache Pulsar and Quarkus. It provides background on event sourcing and how it supports microservices architecture. Apache Pulsar is presented as a cloud-native event streaming platform that is well-suited for event sourcing due to its guaranteed message delivery, scalability, and other features. Quarkus is discussed as a Kubernetes-native Java framework that is optimized for container-based architectures. An example application architecture is shown that uses Pulsar for event storage and Quarkus microservices that generate views of the event data.
SRV205 Architectures and Strategies for Building Modern Applications on AWSAmazon Web Services
Rapid growth of technology and tooling in the cloud has enabled us to build modern applications that are more secure, scalable, and focused on our business. In this session, we cover the key compute primitives that enable us to accelerate towards building and running modern, cloud-native applications. We highlight what we’ve learned from customers running applications with AWS Lambda and AWS Fargate, two modern compute technologies for running applications in the cloud. In addition, we cover architecture patterns of modern application, key primitives required for building modern systems, steps you can take to start building and monitoring modern applications today, and secrets to fearlessly going faster and farther in the cloud.
The document discusses monitoring and managing infrastructure as a service (IaaS) and platform as a service (PaaS) solutions with Hyperic HQ. It notes that current data center realities often fall short of goals like resilience, efficiency, and accommodation of new technologies. Open source tools provide opportunities for innovation through virtualization, standardization, and cloud computing. Case studies show how open source technologies helped commercial and government clients reduce costs, improve flexibility and provisioning, and consolidate infrastructure.
Bridging the Digital Gap Brad Spiegel Macon, GA Initiative.pptxBrad Spiegel Macon GA
Brad Spiegel Macon GA’s journey exemplifies the profound impact that one individual can have on their community. Through his unwavering dedication to digital inclusion, he’s not only bridging the gap in Macon but also setting an example for others to follow.
Meet up Milano 14 _ Axpo Italia_ Migration from Mule3 (On-prem) to.pdfFlorence Consulting
The fourteenth Milan meetup, held in Milan on 23 May 2024 from 17:00 to 18:30, in person and remote.
We discussed how Axpo Italia S.p.A. reduced its technical debt by migrating its APIs from Mule 3.9 to Mule 4.4, also moving from on-premises to CloudHub 1.0.
Gen Z and the marketplaces - let's translate their needsLaura Szabó
The product workshop focused on exploring the requirements of Generation Z in relation to marketplace dynamics. We delved into their specific needs, examined the specifics of their shopping preferences, and analyzed their preferred methods for accessing information and making purchases within a marketplace. Through the study of real-life cases, we tried to gain valuable insights into enhancing the marketplace experience for Generation Z.
The workshop was held on the DMA Conference in Vienna June 2024.
Instagram has become one of the most popular social media platforms, allowing people to share photos, videos, and stories with their followers. Sometimes, though, you might want to view someone's story without them knowing.
Ready to Unlock the Power of Blockchain!Toptal Tech
Imagine a world where data flows freely, yet remains secure. A world where trust is built into the fabric of every transaction. This is the promise of blockchain, a revolutionary technology poised to reshape our digital landscape.
Toptal Tech is at the forefront of this innovation, connecting you with the brightest minds in blockchain development. Together, we can unlock the potential of this transformative technology, building a future of transparency, security, and endless possibilities.
1. GearPump – Real-time Streaming Engine Using Akka*
Sean Zhong, Kam Kasravi, Huafeng Wang, Manu Zhang, Weihua Jiang
{xiang.zhong, kam.d.kasravi, huafeng.wang, tianlun.zhang, weihua.jiang}@intel.com
Intel Big Data Technology Team
Dec. 2014

2. Contents
About GearPump
  Motive
  Example
  Overview
Implementation
  DAG API
  Application Submission
  Remote Actor Code Provisioning
  High Performance Message Passing
  Flow Control
  Message Loss Detection
  Stream Replay
  Master High Availability
  Global Clock Service
  Fault Tolerant and At-Least-Once Message Delivery
  Exactly Once Message Delivery (ongoing)
  Dynamic Graph (ongoing)
Observations on Akka
Experiment Results
  Throughput
  Latency
  Fault Recovery Time
Future Plan(s)
Summary
19
About GearPump

GearPump (https://github.com/intel-hadoop/gearpump) is a lightweight, real-time, big data streaming engine. It is inspired by recent advances in Akka and by a desire to improve on existing streaming frameworks. GearPump draws from a number of existing frameworks, including MillWheel[1], Apache Storm[2], Spark Streaming[3], Apache Samza[4], Apache Tez[5], and Hadoop YARN[6], while leveraging Actors throughout its architecture.

The name GearPump refers to the engineering term "gear pump": a very simple pump that consists of only two gears, yet is very effective at pumping fluid. In this article, we present how we've leveraged Akka to build this engine.
Motive

Today's big data streaming frameworks face high demand to process immense amounts of data from a rapidly growing set of disparate data sources. Beyond the requirements of fault tolerance, persistence, and scalability, streaming engines need to provide a programming model in which computations can be easily expressed and deployed in a distributed manner. GearPump meets this objective while differentiating itself from other streaming engines by elevating and promoting the Actor Model as the primary entity permeating the framework.

What we set out to build is a simple and powerful streaming framework. The Scala language and Akka offer a higher level of abstraction, allowing the framework to focus more on the application and keeping the engine lightweight. We've adhered to several basic principles:
• A message-driven architecture is the right choice for real-time streaming. Real-time applications need to respond to messages immediately. Akka gives us all the facilities and building blocks for a message-driven architecture; it simplifies design and coding and allows the code to evolve more quickly.
• The Actor is the right level of concurrency. It gives us better performance when scaling up and scaling out. We have seen other big data engines use processes, threads, or self-managed thread pools to parallelize. Actors are much more lightweight. Moreover, they carry many concurrency optimizations; actor execution can be swapped in and out efficiently with respect to fairness and performance.
• The flexibility of the Actor Model imbues streaming with some unique abilities. For an example of this, let's look at a video streaming problem in the next section.
Example

Suppose we are using streaming to solve a public security problem: we want to find and track a crime suspect's location by correlating all video data from HD cameras across the city. This requires processing raw video data in order to run face recognition algorithms on it, but we cannot afford to pass ALL of the data back to the data center, as bandwidth is limited. Figure 1 shows how GearPump approaches this problem:

Figure 1: Video Stream Processing and Dynamic Graph
1. Cameras pass coarse-grained feature data to the backend data center. Raw video data cannot be transferred because bandwidth is limited.
2. The backend correlates some of the data with nearby cameras and identifies a potential suspect. This requires additional examination of all related video frames for this target.
3. A task is dynamically delegated to one or more edge devices to gather the raw video data and send it back. The backend then runs face recognition algorithms on this data and passes the results downstream.

The interesting part of this use case is step 3, where new processing directives are dynamically deployed to specific edge devices in order to allow subsequent ingestion and analysis. If we generalize this capability, we can transform a DAG at runtime so that relevant computation can be deployed beyond the boundary of the data center. The location transparency of the Actor Model makes this possible.
Overview

In this section we describe how we model streaming within the Actor Model (Figure 2). Note that there is an obvious parallel to YARN; however, our model is centered on the application, something YARN architects explicitly delegate to "a host of framework implementations"[7]. Integration with YARN is on our roadmap and will pair a compelling application streaming architecture with YARN's broadly recognized de facto resource management architecture.

Figure 2: GearPump Actor Hierarchy
Everything in the diagram is an actor. The actors fall into two categories: Cluster Actors and Application Actors.

Cluster Actors

• Worker: Maps to a physical worker machine. It is responsible for managing resources and reporting metrics on that machine.
• Master: The heart of the cluster, which manages workers, resources, and applications. The main functions are delegated to three child actors: App Manager, Worker Manager, and Resource Scheduler.
Application Actors

• AppMaster: Responsible for scheduling tasks to workers and managing the state of the application. Different applications have different AppMaster instances and are isolated from each other.
• Executor: Child of AppMaster; represents a JVM process. Its job is to manage the lifecycle of tasks and recover them in case of failure.
• Task: Child of Executor; does the real work. Every task actor has a globally unique address, and one task actor can send data to any other task actor. This gives us great flexibility in how the computation DAG is distributed.
All actors in the graph are woven together with actor supervision and actor watching, and all errors are handled properly via supervisors. In the Master, risky jobs are isolated and delegated to child actors, making it more robust. In the application, an extra intermediate layer, the Executor, is created so that we can do fine-grained and fast recovery in case of task failure. A Master watches the lifecycles of AppMasters and Workers to handle their failures, but the lifecycles of Workers and AppMasters are not bound to a Master actor by supervision, so a Master node can fail independently. Several Master actors form an Akka cluster, and the Master state is exchanged using the Gossip protocol in a conflict-free, consistent way so that there is no single point of failure. With this hierarchical design, we are able to achieve high availability.
Implementation

DAG API

As Figure 3 shows, an application is a DAG of processors. Each processor handles messages.

Figure 3: Processor DAG
At runtime, each processor can be parallelized into a list of tasks; each task is an actor within the data pipeline. Data is passed between tasks, and the Partitioner (Figure 4) defines how data is transferred.

Figure 4: Task Data Shuffling
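To illustrate what a partitioner does, here is a minimal sketch in Python (GearPump's real Partitioner interface is Scala-based; this class and its names are only hypothetical): a hash partitioner maps each message key to one of the parallel downstream tasks.

```python
class HashPartitioner:
    """Routes each message to a downstream task by hashing its key.

    Illustrative sketch only; the real partitioner runs inside the
    GearPump task pipeline.
    """
    def __init__(self, num_downstream_tasks):
        self.n = num_downstream_tasks

    def get_partition(self, key):
        # Use a content-based hash so routing is deterministic across runs
        # (Python's built-in hash() is salted per process, so avoid it here).
        h = 0
        for byte in str(key).encode("utf-8"):
            h = (h * 31 + byte) % (2 ** 32)
        return h % self.n

p = HashPartitioner(4)
# The same key always lands on the same downstream task, within range.
assert p.get_partition("camera-17") == p.get_partition("camera-17")
assert 0 <= p.get_partition("camera-17") < 4
```

Because the mapping is stable, all messages for one key stay on one task, which is what makes keyed state (counts, joins) possible downstream.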
We have an easy syntax to express the computation. To represent the DAG above, we define the computation as:

Graph(A~>B~>C~>D, A~>C, B~>E~>D)

A~>B~>C~>D is a Path of Processors, and a Graph is composed of a list of such Paths. This representation is cleaner than a vertex and edge set. A high-level domain-specific language (DSL) using combinators like flatMap will be defined on top of this. Our aim is to extend the existing Scala collections framework, likely via the "pimp my library" pattern[8].
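To make the Path-based representation concrete, here is a small Python sketch (the helper name is hypothetical; GearPump's real Graph is a Scala class) that expands a list of paths into the vertex and edge sets of the DAG:

```python
def graph(*paths):
    """Build (vertices, edges) from processor paths such as ["A","B","C","D"].

    Each path contributes an edge for every adjacent pair; duplicates
    from overlapping paths are collapsed by the sets.
    """
    vertices, edges = set(), set()
    for path in paths:
        vertices.update(path)
        edges.update(zip(path, path[1:]))
    return vertices, edges

# The DAG from the text: Graph(A~>B~>C~>D, A~>C, B~>E~>D)
v, e = graph(["A", "B", "C", "D"], ["A", "C"], ["B", "E", "D"])
assert v == {"A", "B", "C", "D", "E"}
assert ("A", "C") in e and ("B", "E") in e and len(e) == 6
```

The path form stays close to how one draws the DAG on a whiteboard, while the expansion above recovers the conventional vertex/edge view when the engine needs it.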
Application Submission

In GearPump, each application consists of an AppMaster and a set of Executors, each running in a standalone JVM. Thus, an application is isolated from other applications by JVM isolation.

Figure 5: User Submits an Application

To submit an application (Figure 5), a GearPump client specifies a computation defined within a DAG and submits this to an active master. The SubmitApplication message is sent to the Master, which then forwards it to an AppManager.
Figure 6: Launching Executors and Tasks

The AppManager locates an available worker and launches an AppMaster in a sub-process JVM of the worker. As shown in Figure 6, the AppMaster then negotiates with the Master for resource allocation in order to distribute the DAG as defined within the application. The allocated workers then launch Executors (new JVMs) and subsequently start Tasks.
Remote Actor Code Provisioning

Akka currently does not support remote code provisioning. That is, instantiating an actor on a remote node assumes that the actor class and its dependencies are already available to and accessible by the JVM's system class loader. GearPump provides several mechanisms that allow new resources to be loaded at a remote node. This is a prerequisite to enabling run-time DAG modification, since you cannot assume new DAG sub-graphs have the requisite resources on the remote node.

As noted above, when a worker launches an executor, the worker may modify the executor's classpath in several specific ways. First, it looks for any serialized jar sent as part of the application definition. If one exists, the worker saves it locally and appends the jar name to the executor's classpath. Second, the worker looks at the configuration settings, also included as part of the application definition, which may define an extra classpath for the executor. In this latter scenario, the resources are already resident on the remote node but need to be appended to the executor's classpath. For example, an AppMaster and its task actors may need to access various Hadoop components at runtime that have already been preinstalled on the worker's machine.
High performance Message passing

For streaming applications, message-passing performance is extremely important. A streaming platform may need to process millions of messages per second with millisecond-level latency. High throughput and low latency are not easy to achieve, and there are a number of challenges:

1. First challenge: the network is not efficient for small messages

In streaming, the typical message is very small, usually less than 100 bytes (floating-car GPS data, for example), but network efficiency is poor when transferring small messages. As you can see in Figure 7, with a message size of 50 bytes, only about 20% of the bandwidth can be used. How do we improve throughput?

Figure 7: Messaging Throughput vs. Message Size

2. Second challenge: per-message overhead is too big

Each message sent between two actors contains the sender and receiver actor paths. When sent over the wire, the overhead of this ActorPath is not trivial. For example, the actor path below takes more than 200 bytes:

akka.tcp://system1@192.168.1.53:51582/remote/akka.tcp/2120193a-e10b-474e-bccb-8ebc4b3a0247@192.168.1.53:48948/remote/akka.tcp/system2@192.168.1.54:43676/user/master/Worker1/app_0_executor_0/group_1_task_0#-768886794
How do we solve these challenges? We've implemented a custom Netty transport layer as an Akka extension. In Figure 8, the Netty Client translates an ActorPath to a TaskId, and the Netty Server translates it back. Only the TaskId is passed on the wire; at only about 10 bytes, the overhead is minimized. Different Netty Client actors are isolated and do not block each other.

Figure 8: Netty Transport and Task Addressing

For performance, effective batching is really the key! We group multiple messages into a single batch and send it on the wire. The batch size is not fixed; it is adjusted dynamically based on network status. If the network is available, we flush pending messages immediately without waiting; otherwise, we put the message in a batch and trigger a timer to flush the batch later.
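The batching policy described above can be sketched as follows (a simplified Python model with hypothetical names; the real implementation lives in GearPump's Netty transport and reacts to actual channel state):

```python
class Batcher:
    """Flush immediately while the channel is writable; otherwise buffer
    messages and rely on a periodic timer to flush the accumulated batch."""
    def __init__(self, transport):
        self.transport = transport   # object with .writable and .send(batch)
        self.pending = []

    def on_message(self, msg):
        self.pending.append(msg)
        if self.transport.writable:
            self.flush()             # network free: send now, no added latency

    def on_flush_timer(self):
        if self.pending:             # network was busy: send what piled up
            self.flush()

    def flush(self):
        self.transport.send(self.pending)
        self.pending = []

class FakeTransport:
    def __init__(self):
        self.writable, self.sent = True, []
    def send(self, batch):
        self.sent.append(list(batch))

t = FakeTransport()
b = Batcher(t)
b.on_message("m1")                   # flushed at once: a batch of one
t.writable = False
b.on_message("m2"); b.on_message("m3")
t.writable = True
b.on_flush_timer()                   # timer flushes the batch of two
assert t.sent == [["m1"], ["m2", "m3"]]
```

The key property is that batching only happens when it is free: under light load every message goes out alone with minimal latency, and under heavy load the batch size grows automatically.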
Flow Control

Without flow control, one task can easily flood another task with too many messages, causing out-of-memory errors. Typical flow control uses a TCP-like sliding window, so that source and target can run concurrently without blocking each other.

Figure 9: Flow Control; each task is "star" connected to its input and output tasks
The difficult part of our problem is that each task can have multiple input and output tasks. The inputs and outputs must be entangled so that backpressure can be properly propagated from downstream to upstream, as shown in Figure 9. The flow control also needs to consider failures; it must be able to recover when there is message loss.

Another challenge is that the overhead of the flow control messages themselves can be large. If we ack every message, a huge number of ack messages will be in the system, degrading streaming performance. The approach we adopted is to use explicit AckRequest messages. A target task sends an Ack only when it receives an AckRequest message, and the source sends an AckRequest only when necessary. With this approach, we largely reduced the overhead.
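A minimal model of this scheme (Python, with hypothetical names): the source keeps a sliding window of unacked messages, attaches an AckRequest only every `ack_interval` messages instead of expecting an ack per message, and stops sending when the window is full.

```python
class Source:
    """Sliding-window sender that attaches an AckRequest every
    `ack_interval` messages rather than requiring an ack per message."""
    def __init__(self, window=1000, ack_interval=100):
        self.window, self.ack_interval = window, ack_interval
        self.sent, self.acked = 0, 0

    def can_send(self):
        # Backpressure: stop when too many messages are in flight.
        return self.sent - self.acked < self.window

    def send(self):
        assert self.can_send()
        self.sent += 1
        # True means "attach an AckRequest to this message".
        return self.sent % self.ack_interval == 0

    def on_ack(self, acked_count):
        self.acked = acked_count           # cumulative ack from the target

s = Source(window=5, ack_interval=2)
requests = [s.send() for _ in range(5)]
assert requests == [False, True, False, True, False]
assert not s.can_send()        # window full: backpressure kicks in
s.on_ack(4)                    # target has acked everything up to message 4
assert s.can_send()
```

The ack traffic is thus reduced by roughly a factor of `ack_interval` compared with per-message acking, while the window still bounds memory at the receiver.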
Message Loss Detection

The streaming platform needs to track effectively which messages have been lost and to recover as fast as possible. For example, with web ads, the advertiser is charged for every ad click, so miscounts are not acceptable.

Figure 10: Message Loss Detection

We use the flow control messages AckRequest and Ack to detect message loss (Figure 10). The target task counts how many messages it has received since the last AckRequest and acks that count back to the source task. The source task checks the count and detects any message loss.
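The detection itself is just a count comparison. Sketched in Python (hypothetical names): the source remembers how many messages it sent before each AckRequest, and a mismatch with the count the target acks back reveals a loss.

```python
class LossDetector:
    """Source-side check: compare the number of messages sent since the
    last AckRequest with the count the target reports in its Ack."""
    def __init__(self):
        self.sent_since_request = 0

    def on_send(self):
        self.sent_since_request += 1

    def on_ack_request(self):
        expected = self.sent_since_request
        self.sent_since_request = 0
        return expected                  # count the next Ack should carry

    def check(self, expected, received_count):
        # False means some messages never arrived at the target.
        return expected == received_count

d = LossDetector()
for _ in range(10):
    d.on_send()
expected = d.on_ack_request()
assert d.check(expected, 10)        # target saw all 10: no loss
assert not d.check(expected, 8)     # target saw only 8: loss detected
```

Because the counters ride on the existing AckRequest/Ack traffic, loss detection adds no extra messages of its own.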
Stream Replay

In some applications, a message cannot be lost and must be replayed. For example, during a money transfer, the bank sends an SMS with a verification code. If that message is lost, the system must send another one so that the transfer can continue. We made the decision to use source-end message storage and timestamp-based replay.
Figure 11: Replay with Source-End Message Store

Every message is immutable and tagged with a timestamp. We assume that timestamps are approximately incremental (a small ratio of message disorder is allowed) and that messages come from a replayable source, like an Apache Kafka queue; otherwise, messages are stored in a customizable source-end "message store." In Figure 11, when the source task sends a message downstream, the timestamp and offset of the message are also checkpointed periodically to offset-timestamp storage. During recovery, the system first retrieves the right timestamp and offset from the offset-timestamp storage, then replays the message store from that timestamp and offset. A Timestamp Filter filters out old messages in case the messages in storage are not strictly time-ordered.
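Sketched in Python (hypothetical names and data layout): recovery looks up the newest checkpointed (timestamp, offset) pair not past the recovery point, replays the store from that offset, and the timestamp filter drops out-of-order stragglers older than the checkpoint.

```python
def recover(checkpoints, store, recover_time):
    """Replay messages from the newest checkpoint at or before recover_time.

    checkpoints: list of (timestamp, offset), ascending by timestamp.
    store: list of (timestamp, payload); list index plays the role of offset.
    """
    # Find the last checkpoint not newer than the recovery timestamp.
    ts, offset = max((c for c in checkpoints if c[0] <= recover_time),
                     key=lambda c: c[0])
    # Replay from that offset; the timestamp filter drops messages older
    # than the checkpoint (storage may be mildly out of order).
    return [m for m in store[offset:] if m[0] >= ts]

checkpoints = [(100, 0), (200, 3)]
store = [(100, "a"), (150, "b"), (190, "c"), (200, "d"), (195, "e"), (210, "f")]
# Message (195, "e") sits after offset 3 but predates the checkpoint: filtered.
assert recover(checkpoints, store, 205) == [(200, "d"), (210, "f")]
```

The filter is what makes the "approximately incremental" timestamp assumption safe: a few disordered messages cost a little re-reading, never a duplicate delivery of pre-checkpoint data.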
Master High Availability

In a distributed streaming system, any part can fail. The system must stay responsive and recover in case of errors.
Figure 12: Master High Availability

We use Akka clustering, as shown in Figure 12, to implement Master high availability. The cluster consists of several master nodes but no worker nodes. With the clustering facilities, we can easily detect and handle a master node crash. The master state is replicated on all master nodes with the Typesafe akka-data-replication[9] library; when one master node crashes, a standby master reads the master state and takes over. The master state contains the submission data of all applications, so if an application dies, a master can use that state to recover it. The CRDT LwwMap[10] is used to represent the state; it is a hash map that can converge on distributed nodes without conflict. For strong data consistency, state reads and writes must happen on a quorum of master nodes.
Global Clock Service

The global clock service tracks the minimum timestamp of all pending messages in the system. Every task reports its own minimum clock to the global clock service (Figure 13); the minimum clock of a task is the minimum of:

• The minimum timestamp of all pending messages in the inbox.
• The minimum timestamp of all unacked outgoing messages. When there is message loss, the minimum clock does not advance.
• The minimum clock of all task states. If the state has accumulated many input messages, the clock value is decided by the oldest message's timestamp. The state clock advances by taking snapshots to persistent storage or by fading out the effect of old messages.
Figure 13: Global Clock Service

The global clock service keeps track of all task minimum clocks and maintains a global view of the minimum clock. The global minimum clock value is monotonically increasing, meaning that all source messages before this clock value have been processed. If there is message loss or a task crash, the global minimum clock stops.
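A condensed Python model of the clock computation (hypothetical names): each task's minimum clock is the minimum over its three components, and the global clock is the minimum over all tasks.

```python
def task_min_clock(inbox, unacked, state_clock):
    """Minimum clock of one task: oldest pending input, oldest unacked
    output, and the task state's own clock."""
    return min(list(inbox) + list(unacked) + [state_clock])

def global_min_clock(task_clocks):
    """Global view: every source message older than this timestamp has
    been fully processed somewhere in the DAG."""
    return min(task_clocks)

t1 = task_min_clock(inbox=[105, 110], unacked=[98], state_clock=120)
t2 = task_min_clock(inbox=[130], unacked=[], state_clock=101)
assert (t1, t2) == (98, 101)
assert global_min_clock([t1, t2]) == 98   # message 98 is still in flight
```

This also shows why the global clock stalls on message loss: a lost message stays unacked at its source, pins that task's minimum, and therefore pins the global minimum.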
Fault Tolerance and At-Least-Once Message Delivery

With Akka's powerful actor supervision, we can implement a resilient system relatively easily. In GearPump, different applications have different AppMaster instances and are totally isolated from each other. For each application, there is a supervision tree, AppMaster -> Executor -> Task. With this supervision hierarchy, we free ourselves from the headache of zombie processes; for example, if an AppMaster is down, its Akka supervisor ensures the whole tree shuts down.
Figure 14: Possible Failure Scenarios and the Error Supervision Hierarchy

This also makes it easier to handle message loss and other failures. Multiple failure scenarios are possible, as shown in Figure 14. If an AppMaster crashes, the Master schedules a new resource to restart the AppMaster, and the AppMaster then requests resources to start all executors and tasks in the tree. If an Executor crashes, its supervisor AppMaster is notified and requests a new resource from the active master to start the executor's child tasks. If a task throws an exception, its supervisor executor restarts that task.

When "at least once" message delivery is a requirement, we replay messages in the case of message loss. First, the AppMaster reads the latest minimum clock from the global clock service (or from clock storage, if the clock service has crashed). Then the AppMaster restarts all the task actors to get a fresh task state, and the message source replays messages from that minimum clock. This can be further improved to restart only a subset of the tasks.
Exactly Once Message Delivery (ongoing)

For some applications, "exactly once" message delivery is extremely important. For example, in a real-time billing system, we do not want to bill the customer twice. The goals of exactly once message delivery are to make sure that:

• Errors don't accumulate: today's error will not carry over into tomorrow.
• State management is transparent to the application developer.

We use the global clock to synchronize the distributed transactions. We assume every message from the data source has a unique timestamp. The timestamp can be a part of the message body, or it can be attached later from the system clock when the message is injected into the streaming system. With this globally synchronized clock, we can coordinate all tasks to checkpoint at the same timestamp.
Figure 15: Checkpointing and Exactly Once Message Delivery

The workflow for state checkpointing (Figure 15):

1. The coordinator asks the streaming system to do a checkpoint at timestamp Tc.
2. Each application task maintains two states: the checkpoint state and the current state. The checkpoint state only contains information from before timestamp Tc; the current state contains all information.
3. When the global minimum clock is larger than Tc, all messages older than Tc have been processed. The checkpoint state will no longer change, so we can safely persist it to storage.
4. When there is message loss, we start the recovery process.
5. To recover, we load the latest checkpoint state from storage and use it to restore the application status.
6. The data source replays messages from the checkpoint timestamp.
The checkpoint interval is determined dynamically by the global clock service (Figure 16). Each data source tracks the maximum timestamp of its input messages. Upon receiving a minimum clock update, the data source reports the time delta back to the global clock service. The maximum time delta is an upper bound on the timespan of the application state. The checkpoint interval is chosen to be larger than this maximum delta time:

checkpointInterval = 2^⌈log₂(maxDeltaTime)⌉
Figure 16: How the Checkpoint Interval Is Determined

After the checkpoint interval is communicated to tasks by the global clock service, each task calculates its next checkpoint timestamp autonomously, without global synchronization:

nextCheckpointTimestamp = checkpointInterval × (1 + ⌊maxMessageTimestampOfThisTask / checkpointInterval⌋)
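In other words, the next checkpoint lands on the next multiple of the interval after the newest message this task has seen. A quick Python check of the floor-division formula:

```python
def next_checkpoint_timestamp(interval, max_message_timestamp):
    """Next multiple of `interval` strictly after max_message_timestamp
    (floor division matches the formula above)."""
    return interval * (1 + max_message_timestamp // interval)

assert next_checkpoint_timestamp(100, 250) == 300
assert next_checkpoint_timestamp(100, 300) == 400   # exact multiples move on
```

Because every task rounds up to the same grid of interval multiples, the checkpoints of all tasks align without any coordination messages.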
Each task contains two states: the checkpoint state and the current state. The code to update the state is shown in the listing below (List 1).
TaskState(stateStore, initialTimeStamp):
    currentState = stateStore.load(initialTimeStamp)
    checkpointState = currentState.clone
    checkpointTimestamp = nextCheckpointTimestamp(initialTimeStamp)

onMessage(msg):
    if (msg.timestamp < checkpointTimestamp):
        checkpointState.updateMessage(msg)
    currentState.updateMessage(msg)
    maxClock = max(maxClock, msg.timestamp)

onMinClock(minClock):
    if (minClock > checkpointTimestamp):
        stateStore.persist(checkpointState)
        checkpointTimestamp = nextCheckpointTimestamp(maxClock)
        checkpointState = currentState.clone

onNewCheckpointInterval(newStep):
    step = newStep

nextCheckpointTimestamp(timestamp):
    return (1 + timestamp / step) * step

List 1: Task Transactional State Implementation
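An executable Python rendering of List 1, with a plain dict standing in for the state store and per-key counters standing in for task state (everything beyond the listing's names is an illustrative assumption):

```python
class TaskState:
    """Transactional task state: `current` sees every message, `checkpoint`
    only messages older than the checkpoint timestamp, and the checkpoint
    is persisted once the global minimum clock passes that timestamp."""
    def __init__(self, store, initial_ts, interval):
        self.store, self.interval = store, interval
        self.current = dict(store.get(initial_ts, {}))
        self.checkpoint = dict(self.current)
        self.checkpoint_ts = self.next_checkpoint_ts(initial_ts)
        self.max_clock = initial_ts

    def next_checkpoint_ts(self, ts):
        return (1 + ts // self.interval) * self.interval

    def on_message(self, ts, key):
        if ts < self.checkpoint_ts:          # belongs to the open checkpoint
            self.checkpoint[key] = self.checkpoint.get(key, 0) + 1
        self.current[key] = self.current.get(key, 0) + 1
        self.max_clock = max(self.max_clock, ts)

    def on_min_clock(self, min_clock):
        if min_clock > self.checkpoint_ts:   # checkpoint can no longer change
            self.store[self.checkpoint_ts] = dict(self.checkpoint)
            self.checkpoint_ts = self.next_checkpoint_ts(self.max_clock)
            self.checkpoint = dict(self.current)

store = {}
t = TaskState(store, initial_ts=0, interval=100)
t.on_message(50, "clicks")      # inside the first checkpoint window
t.on_message(120, "clicks")     # beyond checkpoint_ts=100: current only
t.on_min_clock(101)             # everything before ts=100 is processed
assert store[100] == {"clicks": 1}
assert t.current == {"clicks": 2}
```

Note how the two states diverge exactly at the checkpoint timestamp: the persisted snapshot excludes the late message, so replaying from the checkpoint never double-counts it.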
The in-memory current state can be queried directly by the client; it represents the real-time view of the data.
Dynamic Graph (ongoing)

We want to be able to modify the DAG dynamically, so that we can add, remove, and replace sub-graphs (Figure 17).

Figure 17: Dynamic Graph: Attach, Replace, and Remove
Observations on Akka

Akka is a great tool for streaming. The message-driven architecture and intuitive Actor Model largely influence how a distributed service should be designed. Akka and its libraries give us enough flexibility and power to build a real-time, scalable, distributed streaming platform.
• With Akka remoting, we can easily provision a computation DAG on remote nodes, including code provisioning.
• With Akka clustering, we immediately have a distributed consistent state on which to build a highly available service.
• With location transparency, we can handle scale-up and scale-out in the same way; it also makes it easier to test our code on a single node.
• With the error supervision tree, we are better positioned to handle errors and recover with the best approach.
• Akka has very good extensibility; we have customized Akka's serialization, message passing, and metrics to meet particular needs.
Because streaming applications have extreme demands on throughput and latency when processing large volumes of data, we customized Akka to better meet our performance requirements:
• For message passing between tasks, we implemented a customized message-passing extension that batches messages before sending them across the network.
• To reduce the actor path overhead, we developed a lightweight method to address tasks uniquely within the application.
• To reduce the CPU time spent in serialization, we avoid using a pooled serializer and the heavy serializer lookup process. We also avoid serializing the same message multiple times.
• To reduce the Akka clustering messaging overhead, workers are designed to NOT be part of the Akka cluster.
• For flow control, our implementation differs slightly from Akka streaming: the rate of Ack messages is smaller.
One case we have not solved is timeout detection. We want to avoid the following two scenarios:

A) When one node crashes, the watching node takes too long to note its death.
B) When the CPU load is high, actors may start to disassociate from other nodes.

The reality is that either or both of these may occur. We want a faster and more robust way to differentiate real failures from temporary unavailability, so we can recover faster in case of real failures and be more resilient in the face of temporary load increases.
Experiment Results

To illustrate the performance of GearPump, we focused mainly on two aspects, throughput and latency, using a micro-benchmark called SOL (an example in the GearPump distro) whose topology is quite simple: a SOLStreamProducer constantly delivers messages to a SOLStreamProcessor, which does nothing. We set up a 4-node cluster (each node a 32-core Intel® Xeon® E5-2680 processor at 2.70 GHz with 128 GB memory) on a 10 GbE network.
Throughput

Figure 18 shows that the total throughput of the cluster can reach about 11 million messages/second (100 bytes per message).

Figure 18: Performance Evaluation: Throughput and Latency
Latency

When we transfer messages at the maximum throughput above, the average latency (Figure 19) between two tasks is 17 ms, with a standard deviation of 13 ms.
Figure 19: Latency between Two Tasks (ms)

Fault Recovery Time

When corruption is detected (for example, an Executor is down), GearPump reallocates the resources and restarts the application. It takes about 10 seconds to recover the application.
Future Plan(s)

We are still working on transactional support and dynamic graph support for our next release. Besides these two, we also want to incorporate the following features:

• StreamSQL. Per O'Reilly Media Strata's 2013 Data Science Salary Survey report, SQL is the data tool most used by data scientists, so StreamSQL is a must-have feature. We want to support translating a SQL statement into a running DAG, as well as streaming-specific functions like window operations.
• Visualization. Everything is moving in streaming applications, and it is hard to know when unexpected things happen, so visualization is extremely important. Visualization will give us insight into how the computation DAG is composed, latency and throughput, system and network load status, and error information. We have preliminary REST support that emits a given AppMaster's DAG in JSON; we'll likely add visualization using viz.js in the next release.
• Data skew handling with auto-split. Data skew is common in big data, and it strongly impacts application performance. In the case of data skew, some tasks may be too busy while others are idle. We want a busy task to be able to split into smaller tasks automatically in response to load changes.
Summary

Akka is characterized by its simplicity and elegance. The concepts of being message driven and providing computation isolation make Akka a natural fit for streaming frameworks. Akka provides a solid foundation that allows developers to focus on the data logic and the specific requirements of streaming. GearPump exploits the current Akka feature set and is small and nimble, only 5000 lines of code in total, yet it provides compelling streaming features including scalability, low latency, resilience, error isolation, and high availability. Performance is not traded for features: 11 million messages per second is a strong performance metric. Finally, we provide a growing number of innovations, including code provisioning and in-flight DAG modification. The platform is rapidly evolving and will be directed towards solving modern-day challenges at scale. We invite those interested to explore the tutorial at https://github.com/intel-hadoop/gearpump.
References

[1] MillWheel: http://research.google.com/pubs/pub41378.html
[2] Apache Storm: http://storm.apache.org/
[3] Spark Streaming: https://spark.apache.org/streaming/
[4] Apache Samza: http://samza.incubator.apache.org/
[5] Apache Tez: http://zh.hortonworks.com/hadoop/tez/
[6] Hadoop YARN: http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html
[7] YARN architecture paper, Section 3: http://dl.acm.org/citation.cfm?id=2523633
[8] Scala class implicits ("pimp my library"): https://www.google.com/search?q=scala+pimp+my+library
[9] Akka data replication library: https://github.com/patriknw/akka-data-replication
[10] A comprehensive study of Convergent and Commutative Replicated Data Types: https://hal.archives-ouvertes.fr/file/index/docid/555588/filename/techreport.pdf
Notices

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. UNLESS OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR INTENDED FOR ANY APPLICATION IN WHICH THE FAILURE OF THE INTEL PRODUCT COULD CREATE A SITUATION WHERE PERSONAL INJURY OR DEATH MAY OCCUR.

Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information. The products described in this document may contain design defects or errors known as errata which may