The document discusses concurrency and distribution in applications built with Akka, Java, and Scala. It covers key concepts such as actors, messages, and message passing in Akka. It describes how actors encapsulate state and behavior, communicate asynchronously via message passing, and provide built-in concurrency without shared state or locks. The document also discusses patterns for building distributed, fault-tolerant, and scalable applications using Akka actors deployed locally or remotely.
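As a rough illustration of the actor model described above (this is a minimal Python sketch of the idea, not Akka itself, and the CounterActor class and its messages are invented for illustration), an actor can be modeled as private state plus a mailbox, with messages processed one at a time:

```python
import queue
import threading

class CounterActor:
    """A toy actor: private state, a mailbox, and sequential message processing."""
    def __init__(self):
        self._count = 0                 # state is never shared; only messages cross the boundary
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def tell(self, message):
        """Asynchronous, fire-and-forget send."""
        self._mailbox.put(message)

    def _run(self):
        while True:
            message = self._mailbox.get()
            if message == "stop":
                break
            if message == "increment":
                self._count += 1        # safe: only this thread ever touches _count

actor = CounterActor()
for _ in range(5):
    actor.tell("increment")
actor.tell("stop")
actor._thread.join()
print(actor._count)  # 5
```

Because senders never touch the actor's state directly, no locks are needed even though `tell` is called from another thread; this is the core property the summary refers to.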
Apache Spark is a fast, general engine for large-scale data processing. It supports batch, interactive, and stream processing using a unified API. Spark uses resilient distributed datasets (RDDs), which are immutable distributed collections of objects that can be operated on in parallel. RDDs support transformations like map, filter, and reduce and actions that return final results to the driver program. Spark provides high-level APIs in Scala, Java, Python, and R and an optimized engine that supports general computation graphs for data analysis.
Apache Spark is a fast and general engine for large-scale data processing. It provides a unified API for batch, interactive, and streaming data processing built on in-memory primitives. In the Daytona GraySort benchmark, Spark sorted 100 TB of data three times faster than Hadoop MapReduce while using ten times fewer machines.
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLab
Big Data with Hadoop & Spark Training: http://bit.ly/2L4rPmM
This CloudxLab Basics of RDD tutorial helps you to understand Basics of RDD in detail. Below are the topics covered in this tutorial:
1) What is RDD - Resilient Distributed Datasets
2) Creating RDD in Scala
3) RDD Operations - Transformations & Actions
4) RDD Transformations - map() & filter()
5) RDD Actions - take() & saveAsTextFile()
6) Lazy Evaluation & Instant Evaluation
7) Lineage Graph
8) flatMap and Union
9) Scala Transformations - Union
10) Scala Actions - saveAsTextFile(), collect(), take() and count()
11) More Actions - reduce()
12) Can We Use reduce() for Computing Average?
13) Solving Problems with Spark
14) Compute Average and Standard Deviation with Spark
15) Pick Random Samples From a Dataset using Spark
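Several of these topics can be previewed without a cluster. The snippet below is a plain-Python sketch (no Spark required; the sample data is invented) of the map()/filter() transformations from item 4 and of the question in item 12: a pairwise average is not associative, so reduce() alone cannot compute it, but reducing (sum, count) pairs works:

```python
from functools import reduce

data = [3, 1, 4, 1, 5, 9, 2, 6]

# Transformations: map() and filter() (lazy in Spark; eager on plain lists)
doubled = list(map(lambda x: x * 2, data))
evens = list(filter(lambda x: x % 2 == 0, data))

# reduce() works directly for associative operations like sum...
total = reduce(lambda a, b: a + b, data)

# ...but averaging pairwise is NOT associative, so reduce (sum, count) pairs instead:
s, n = reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]),
              [(x, 1) for x in data])
average = s / n
print(total, average)  # 31 3.875
```

The (sum, count) trick is the standard answer to item 12, and it extends naturally to the average and standard deviation computation in item 14.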
Apache Spark - Key-Value RDD | Big Data Hadoop Spark Tutorial | CloudxLab
Big Data with Hadoop & Spark Training: http://bit.ly/2sewz2m
This CloudxLab Key-Value RDD tutorial helps you to understand Key-Value RDD in detail. Below are the topics covered in this tutorial:
1) Spark Key-Value RDD
2) Creating Key-Value Pair RDDs
3) Transformations on Pair RDDs - reduceByKey(func)
4) Count Word Frequency in a File using Spark
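The reduceByKey(func) pattern in item 3 can be sketched locally in plain Python (no Spark needed; the helper name reduce_by_key and the sample text are invented for illustration): build (word, 1) pairs, group values by key, then fold each group with the reduction function. That is essentially the word-frequency example in item 4:

```python
from collections import defaultdict
from functools import reduce

def reduce_by_key(pairs, func):
    """Local stand-in for Spark's reduceByKey: fold the values of each key with func."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: reduce(func, values) for key, values in groups.items()}

text = "to be or not to be"
pairs = [(word, 1) for word in text.split()]      # the map step: word -> (word, 1)
counts = reduce_by_key(pairs, lambda a, b: a + b)
print(counts)  # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```

In real Spark the grouping happens across partitions during the shuffle, but the per-key fold is the same idea.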
Writing MapReduce Programs using Java | Big Data Hadoop Spark Tutorial | CloudxLab
Big Data with Hadoop & Spark Training: http://bit.ly/2kyXPo0
This CloudxLab Writing MapReduce Programs tutorial helps you to understand how to write MapReduce Programs using Java in detail. Below are the topics covered in this tutorial:
1) Why MapReduce?
2) Write a MapReduce Job to Count Unique Words in a Text File
3) Create Mapper and Reducer in Java
4) Create Driver
5) MapReduce Input Splits, Secondary Sorting, and Partitioner
6) Combiner Functions in MapReduce
7) Job Chaining and Pipes in MapReduce
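The slides use Java; as a language-neutral sketch, the combiner in item 6 is simply "reduce locally on each mapper's output before the shuffle". This hedged Python simulation (the splits and words are invented) shows that with an associative, commutative function such as sum, pre-aggregating per split gives the same final counts while sending fewer records:

```python
from collections import Counter

splits = [["apple", "banana", "apple"],
          ["banana", "cherry"],
          ["apple", "cherry", "cherry"]]

# Without a combiner: every single (word, 1) pair crosses the "network"
shuffled = [(w, 1) for split in splits for w in split]

# With a combiner: each split is pre-aggregated locally first
combined = [Counter(split) for split in splits]

# The final reduce produces identical results either way
no_combiner = Counter()
for w, n in shuffled:
    no_combiner[w] += n

with_combiner = sum(combined, Counter())

print(no_combiner == with_combiner, len(shuffled), sum(len(c) for c in combined))
# True 8 6
```

Here 8 raw pairs shrink to 6 partial counts before the shuffle; on real workloads with skewed keys the savings are far larger, which is why Hadoop lets you plug a combiner class into the job.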
MapReduce - Basics | Big Data Hadoop Spark Tutorial | CloudxLab
Big Data with Hadoop & Spark Training: http://bit.ly/2skCodH
This CloudxLab Understanding MapReduce tutorial helps you to understand MapReduce in detail. Below are the topics covered in this tutorial:
1) Thinking in Map / Reduce
2) Understanding Unix Pipeline
3) Examples to understand MapReduce
4) Merging
5) Mappers & Reducers
6) Mapper Example
7) Input Split
8) mapper() & reducer() Code
9) Example - Count number of words in a file using MapReduce
10) Example - Compute Max Temperature using MapReduce
11) Hands-on - Count number of words in a file using MapReduce on CloudxLab
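The max-temperature example in item 10 can be sketched end to end without Hadoop. This toy Python pipeline (the "year,temperature" record format and values are invented for illustration) mimics the three phases: map each record to a (year, temp) pair, shuffle by sorting on the key, and reduce each group to its maximum:

```python
from itertools import groupby

records = ["1950,22", "1950,35", "1951,14", "1951,31", "1950,7"]

# Map phase: each line -> (year, temperature)
mapped = []
for line in records:
    year, temp = line.split(",")
    mapped.append((year, int(temp)))

# Shuffle phase: sort by key so equal keys become adjacent
mapped.sort(key=lambda kv: kv[0])

# Reduce phase: one output per key
result = {year: max(t for _, t in group)
          for year, group in groupby(mapped, key=lambda kv: kv[0])}
print(result)  # {'1950': 35, '1951': 31}
```

The sort-then-group step is exactly what Hadoop's shuffle does between the mapper and reducer, which is why reducers can assume all values for a key arrive together.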
My Hadoop Ecosystem presentation at the 2011 BreizhCamp.
See the talk video (in French):
http://mediaserver.univ-rennes1.fr/videos/?video=MEDIA110628093346744
Introduction to MapReduce - Hadoop Streaming | Big Data Hadoop Spark Tutorial | CloudxLab
Big Data with Hadoop & Spark Training: http://bit.ly/2sh5b3E
This CloudxLab Hadoop Streaming tutorial helps you to understand Hadoop Streaming in detail. Below are the topics covered in this tutorial:
1) Hadoop Streaming and Why Do We Need it?
2) Writing Streaming Jobs
3) Testing Streaming jobs and Hands-on on CloudxLab
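Hadoop Streaming runs any executable that reads records on stdin and writes key<TAB>value lines on stdout, with the framework sorting the mapper output by key before feeding it to the reducer. Below is a minimal Python word-count pair in that style, simulated on an in-memory list of lines rather than real stdin so it can be tested locally (the sample lines are invented):

```python
def mapper(lines):
    """Streaming-style mapper: emit 'word<TAB>1' for every word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(sorted_lines):
    """Streaming-style reducer: sum consecutive counts for the same key."""
    current, total = None, 0
    for line in sorted_lines:
        word, count = line.split("\t")
        if word != current:
            if current is not None:
                yield f"{current}\t{total}"
            current, total = word, 0
        total += int(count)
    if current is not None:
        yield f"{current}\t{total}"

lines = ["hello world", "hello streaming"]
shuffled = sorted(mapper(lines))          # Hadoop performs this sort between the phases
print(list(reducer(shuffled)))
# ['hello\t2', 'streaming\t1', 'world\t1']
```

Testing locally with `cat input | mapper | sort | reducer` before submitting the streaming job is the standard workflow the tutorial's item 3 refers to.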
This was the first session about Hadoop and MapReduce. It introduces what Hadoop is and its main components. It also covers how to program your first MapReduce task and how to run it on a pseudo-distributed Hadoop installation.
This session was given in Arabic, and I may provide a video for the session soon.
Slides of the workshop conducted at Model Engineering College, Ernakulam, and Sree Narayana Gurukulam College, Kadayiruppu, Kerala, India, in December 2010.
Pig is an open-source dataflow system that allows users to analyze large datasets through a high-level language called Pig Latin. It sits on top of Hadoop and compiles Pig Latin queries into MapReduce jobs. Pig provides simple operations for data manipulation like filtering, grouping, joining, and generating new columns. It is commonly used by companies like Yahoo, Twitter, and LinkedIn to process web logs, build user behavior models, and perform other large-scale data analysis tasks.
Introduction to SparkR | Big Data Hadoop Spark Tutorial | CloudxLab
Big Data with Hadoop & Spark Training: http://bit.ly/2LCTufA
This CloudxLab Introduction to SparkR tutorial helps you to understand SparkR in detail. Below are the topics covered in this tutorial:
1) SparkR (R on Spark)
2) SparkR DataFrames
3) Launch SparkR
4) Creating DataFrames from Local DataFrames
5) DataFrame Operation
6) Creating DataFrames - From JSON
7) Running SQL Queries from SparkR
Apache Spark is a fast and general engine for large-scale data processing. It uses RDDs (Resilient Distributed Datasets), which allow data to be partitioned across clusters. Spark supports transformations, which create new RDDs, and actions, which return values. Key operations include map, filter, and reduceByKey. RDDs can be persisted in memory to improve the performance of iterative jobs. Spark runs on clusters managed by YARN, Spark Standalone, or Mesos and provides a driver program and executors on worker nodes to process data in parallel.
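The lazy-transformation/eager-action split mentioned above can be mimicked with Python generators (a loose analogy, not Spark itself; the traced helper is invented for illustration): like an RDD lineage, a chain of generators records what to do but computes nothing until an "action" consumes it:

```python
log = []

def traced(name, iterable):
    """Yield items while recording when each stage actually runs."""
    for item in iterable:
        log.append(name)
        yield item

data = range(5)
# "Transformations": nothing runs yet, just like building up an RDD lineage
pipeline = traced("map", (x * x for x in traced("source", data)))

assert log == []          # lazy: no work has happened so far

# "Action": consuming the pipeline finally triggers the whole chain
result = list(pipeline)
print(result)  # [0, 1, 4, 9, 16]
```

After the action runs, `log` shows the stages interleaved per element, which is also roughly how Spark pipelines narrow transformations within a stage.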
The document provides an overview of various Apache Pig features including:
- The Grunt shell which allows interactive execution of Pig Latin scripts and access to HDFS.
- Advanced relational operators like SPLIT, ASSERT, CUBE, SAMPLE, and RANK for transforming data.
- Built-in functions and user defined functions (UDFs) for data processing. Macros can also be defined.
- Running Pig in local or MapReduce mode and accessing HDFS from within Pig scripts.
Shark is a SQL query engine built on top of Spark, a fast MapReduce-like engine. It extends Spark to support SQL and complex analytics efficiently while maintaining the fault tolerance and scalability of MapReduce. Shark uses techniques from databases like columnar storage and dynamic query optimization to improve performance. Benchmarks show Shark can perform SQL queries and machine learning algorithms faster than traditional MapReduce systems like Hive and Hadoop. The goal of Shark is to provide a unified system for both SQL and complex analytics processing at large scale.
Tokyo Cabinet is a library of routines for managing a database. The database is a simple data file containing records; each record is a pair of a key and a value. Every key and value is a sequence of bytes of variable length, so both binary data and character strings can be used as keys and values. There is no concept of data tables or data types. Records are organized in a hash table, a B+ tree, or a fixed-length array.
Onyx is a data processing framework for Clojure that allows users to define workflows, functions, and windows to process streaming and batch data across distributed clusters. It uses concepts like peers, virtual peers, and Zookeeper for scheduling and Aeron for messaging. Users can write Onyx jobs in Clojure to perform ETL, analytics, and other data processing tasks in a declarative way.
This is a deck of slides from a recent meetup of AWS Usergroup Greece, presented by Ioannis Konstantinou from the National Technical University of Athens.
The presentation gives an overview of the MapReduce framework and a description of its open-source implementation (Hadoop). Amazon's own Elastic MapReduce (EMR) service is also mentioned. With the growing interest in Big Data, this is a good introduction to the subject.
dmapply: A functional primitive to express distributed machine learning algorithms (Bikash Chandra Karmokar)
ddR is a package that introduces distributed data structures in R, such as darray, dframe, and dlist. It provides a standardized API for distributed iteration and data manipulation through functions like dmapply. ddR aims to make distributed computing in R easier to use, with good performance, by letting algorithms be written once and run on different distributed backends, such as Spark and HPE Distributed R, through its unified interface. Evaluation shows ddR algorithms perform comparably to or better than custom implementations and other machine learning libraries.
This document provides an introduction and overview of Apache Spark. It discusses why in-memory computing is important for speed, compares Spark and Ignite, describes what Spark is and how it works using Resilient Distributed Datasets (RDDs) and a directed acyclic graph (DAG) model. It also provides examples of Spark operations on RDDs and shows a word count example in Java, Scala and Python.
Shark is a new data analysis system that marries SQL queries with complex analytics like machine learning on large clusters. It uses Spark as an execution engine and provides in-memory columnar storage with extensions like partial DAG execution and co-partitioning tables to optimize query performance. Shark also supports expressing machine learning algorithms in SQL to avoid moving data out of the database. It aims to efficiently support both SQL and complex analytics while retaining fault tolerance and allowing users to choose loading frequently used data into memory for fast queries.
Mesos provides a distributed systems kernel that allows organizations to dynamically share resources between distributed applications like Hadoop, Spark, and Storm. It addresses issues with static resource partitioning, like increased complexity and poor resource utilization. Mesos introduces an abstraction layer that bundles all machines in a cluster into a single shared pool. It provides APIs for building frameworks to run applications that leverage the shared resources.
This document provides a technical introduction to Hadoop, including:
- Hadoop has been tested on a 4,000-node cluster with 32,000 cores and 16 petabytes of storage.
- Key Hadoop concepts are explained, including jobs, tasks, task attempts, mappers, reducers, and the JobTracker and TaskTrackers.
- The process of launching a MapReduce job is described, from the client submitting the job to the JobTracker distributing tasks to TaskTrackers and running the user-defined mapper and reducer classes.
Apache Pig performance optimizations talk at ApacheCon 2010 (Thejas Nair)
Pig provides a high-level language called Pig Latin for analyzing large datasets. It optimizes Pig Latin scripts by restructuring the logical query plan through techniques like predicate pushdown and operator rewriting, and by generating efficient physical execution plans that leverage features like combiners, different join algorithms, and memory management. Future work aims to improve memory usage and allow joins and groups within a single MapReduce job when keys are the same.
The document discusses using Ruby for big data applications, including using Ruby with NoSQL databases like Cassandra and Hadoop for distributed storage and processing, and integrating Ruby with real-time streaming frameworks like Storm. It also covers using REST APIs to allow Ruby applications to interact with these big data systems and perform batch and real-time processing of data.
Hadoop is an open source framework for running large-scale data processing jobs across clusters of computers. It has two main components: HDFS for reliable storage and Hadoop MapReduce for distributed processing. HDFS stores large files across nodes through replication and uses a master-slave architecture. MapReduce allows users to write map and reduce functions to process large datasets in parallel and generate results. Hadoop has seen widespread adoption for processing massive datasets due to its scalability, reliability and ease of use.
Speaking of big data analysis, what often comes to mind is using HDFS and MapReduce within Hadoop. But to write a MapReduce program, one must learn to write native Java. One might wonder: is it possible to use R, the language most widely adopted by data scientists, to implement a MapReduce program? And through the integration of R and Hadoop, can one truly unleash the power of parallel computing for big data analysis?
This slide deck introduces how to install RHadoop step by step and how to write a MapReduce program in R. It also discusses whether RHadoop is really a guiding light for big data analysis, or just another way to write MapReduce programs.
Please mail me if you find any problems with the slides. EMAIL: tr.ywchiu@gmail.com
This document introduces Scala collections and their key features:
- Collections provide a concise, safe, and fast way to process collections of data through built-in functions.
- Collections can be mutable or immutable, with immutable being the default. Mutable collections require importing specific packages.
- The core abstractions are Traversable, Iterable, and Seq, with traits like Set and Map defining specific collection types.
- Common collection types include lists, arrays, buffers, and queues, each with its own performance characteristics for different use cases.
System Integration with Akka and Apache Camel (krasserm)
This document summarizes the Apache Camel integration framework and how it can be used with Akka actors. Camel provides a domain-specific language and components for integration patterns and protocols. Akka actors handle asynchronous message processing and can be used as Camel consumers and producers through Akka-Camel integration. Consumer actors receive messages from Camel endpoints, while producer actors send messages to endpoints. Actor components allow actors to be used directly in Camel routes.
Presentation on handling non-existence of data in Java and similar languages (e.g., the problem with pesky nulls), and an introduction to the Option monad in Scala as a "solution" to this problem.
I presented this talk on June 28th, 2013 at the CPH Scala Group meeting, and a week later, on July 3rd, at the "Scala User Group Århus" meetup.
In this short introduction, I try to frame the problem, i.e. the large amounts of error-prone null-checking code we usually have to write in Java, and introduce the Option monad (Some/None) in Scala as a solution. I explain the basics of what the Option class provides and various ways of using it, ranging from the basic isEmpty, over pattern matching, to more advanced fully functional "collection-style" operations (e.g. map, flatMap), and finally the for-comprehension.
Also includes links to relevant resources for further reading on the last slide.
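For readers who want to experiment with the idea outside Scala, the core of Some/None with map/flatMap/getOrElse can be sketched in a few lines of Python (a toy model written for this summary, not real Scala Option; the class names and the users lookup are invented):

```python
class Option:
    @staticmethod
    def of(value):
        """Wrap a possibly-missing value, like Scala's Option(...)."""
        return Some(value) if value is not None else NONE

class Some(Option):
    def __init__(self, value):
        self.value = value
    def is_empty(self):
        return False
    def map(self, f):
        return Option.of(f(self.value))
    def flat_map(self, f):
        return f(self.value)
    def get_or_else(self, default):
        return self.value

class _None(Option):
    def is_empty(self):
        return True
    def map(self, f):
        return self          # operations on a missing value do nothing
    def flat_map(self, f):
        return self
    def get_or_else(self, default):
        return default

NONE = _None()

# Chained lookups with no explicit null checks anywhere
users = {"alice": {"email": "alice@example.com"}}

def find_user(name):
    return Option.of(users.get(name))

email = find_user("alice").flat_map(lambda u: Option.of(u.get("email"))).get_or_else("n/a")
missing = find_user("bob").flat_map(lambda u: Option.of(u.get("email"))).get_or_else("n/a")
print(email, missing)  # alice@example.com n/a
```

The point of the talk carries over directly: the absent case short-circuits through the chain, so the null-checking boilerplate disappears from the call sites.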
The document discusses exception handling in Scala code. It provides examples of the original code with try/catch blocks, a modified version that handles specific exception types in catch, and an improved approach that centralizes exception handling. The improved approach throws a custom CandAllExceptions for any exception caught, allowing exceptions to be handled in one place rather than throughout the code. It also discusses best practices, such as using ProcessLogger instead of Process to catch process exceptions, creating custom exceptions with messages and causes, and annotating methods with @throws only if they are called from Java.
Introduction to Functional Programming with Scala (pramode_ce)
The document provides an introduction to functional programming with Scala. It outlines the following topics that will be covered: learning Scala syntax and writing simple programs; important functional programming concepts like closures, higher-order functions, purity, lazy evaluation, currying, tail calls, immutability, and type inference; and understanding the functional programming paradigm through Scala. It also provides some background information on Scala and examples of Scala code demonstrating various concepts.
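Most of the concepts listed above are language-independent. As a quick taste (in Python rather than Scala, so treat it as an analogy to the slides' examples; the function names are invented), here are closures, higher-order functions, and currying in a few lines:

```python
# Closure: the inner function "closes over" n from the enclosing scope
def make_adder(n):
    def add(x):
        return x + n
    return add

add5 = make_adder(5)

# Higher-order function: takes a function and returns a new one
def twice(f):
    return lambda x: f(f(x))

# Currying: a 2-argument function expressed as a chain of 1-argument functions
def curried_mul(a):
    return lambda b: a * b

print(add5(3), twice(add5)(0), curried_mul(4)(6))  # 8 10 24
```

In Scala these same ideas get direct syntax support (multiple parameter lists for currying, `lazy val` for lazy evaluation), which is part of what the talk covers.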
Embrace NoSQL and Eventual Consistency with RippleSean Cribbs
So, there's this "NoSQL" thing you may have heard of, and this related thing called "eventual consistency". Supposedly, they help you scale, but no one has ever explained why! Well, wonder no more! This talk will demystify NoSQL, eventual consistency, how they might help you scale, and -- most importantly -- why you should care.
We'll look closely at how Riak, a linearly-scalable, distributed and fault-tolerant NoSQL datastore, implements eventual consistency, and how you can harness it from Ruby via the slick Ripple client/ORM. When the talk is finished, you'll have the tools both to understand eventual consistency and to handle it like a pro inside your next Ruby application.
This document discusses using Akka and microservices architecture for building distributed applications. It covers key Akka concepts like actors and messaging. It also discusses domain-driven design patterns for modeling application domains and boundaries. The document recommends using asynchronous messaging between microservices. It provides an example of using Apache Camel for enterprise integration. Finally, it discusses using Akka Clustering and Persistence for building highly available stateful services in a distributed fashion.
Declarative Multilingual Information Extraction with SystemT - Laura Chiticariu
Information extraction (IE), the task of extracting structured information from unstructured or semi-structured data, is increasingly important to a wide array of enterprise applications, ranging from Business Intelligence to Data-as-a-Service.
In the first part of the talk, we give an overview of SystemT, a declarative IE system designed and developed to address the requirements driven by modern applications: scalability, expressivity, and transparency. SystemT is based on the basic principle underlying relational database technology: complete separation of specification from execution. SystemT uses a declarative language for expressing NLP algorithms called AQL, and an optimizer that generates high-performance algebraic execution plans for AQL rules. It makes IE orders of magnitude more scalable and easy to use, maintain and customize. Today, SystemT ships with multiple products across 4 IBM Software Brands and is being taught in universities. Our ongoing research and development efforts focus on making SystemT more usable for both technical and business users, and on continuing to enhance its core functionalities based on natural language processing, machine learning, and database technology.
In the second part of the talk we present POLYGLOT, a multilingual semantic role labeling system capable of semantically parsing sentences in 9 different languages from 4 different language groups. The key feature of the system is that it treats the semantic labels of the English Proposition Bank as “universal semantic labels”: Given a sentence in any of the supported languages, POLYGLOT will predict appropriate English PropBank frame and role annotation. We illustrate how these universal semantic labels can be used within SystemT to create information extractors that immediately work across different languages. In addition, we illustrate how we automatically generate Proposition Banks for new languages in order to enable multilingual SRL and discuss some challenges of crosslingual semantics.
Notes on a High-Performance JSON Protocol - Daniel Austin
This is my presentation from JSConf 2011. I am proposing a new Web protocol to improve performance across the Internet. It's based on a dual-band protocol layered over TCP/IP and UDP and is backward compatible with existing HTTP-based systems.
Actors, a Unifying Pattern for Scalable Concurrency | C4 2006 - Real Nobile
Actors are a design pattern for scalable concurrency that encapsulate state, instructions, and execution context. They communicate asynchronously via message passing. This model naturally scales across machines using the same pattern. Actors avoid issues with traditional threading by preventing direct access to shared state and using message-passing. This allows scaling across cores by using user-level threads for concurrency on a single core and OS processes across multiple cores. Future hardware may implement the actor model directly by distributing processing across cores in a machine-independent manner.
The document introduces Akka, an open-source toolkit for building distributed, concurrent applications on the JVM. It provides a programming model called the actor model that makes it easier to build scalable and fault-tolerant systems. Actors process messages asynchronously and avoid shared state, providing a simpler approach to concurrency than traditional threads and locks. Akka allows actors to be distributed across a network, enabling applications to scale out elastically.
Keynote given at BOSC, 2010.
Does the hype surrounding cloud match the reality?
Can we use them to solve the problems in provisioning IT services to support next-generation sequencing?
Real-World Pulsar Architectural Patterns - Devin Bost
This presentation covers Real-World Pulsar Architectural Patterns involving Distributed Caching and Distributed Tracing. We also cover the use of Apache Ignite, Jaeger, Apache Flink, and many other technologies, as well as industry best-practices.
Project Tungsten Phase II: Joining a Billion Rows per Second on a Laptop - Databricks
Tech-talk at Bay Area Apache Spark Meetup.
Apache Spark 2.0 will ship with the second generation Tungsten engine. Building upon ideas from modern compilers and MPP databases, and applying them to data processing queries, we have started an ongoing effort to dramatically improve Spark’s performance and bringing execution closer to bare metal. In this talk, we’ll take a deep dive into Apache Spark 2.0’s execution engine and discuss a number of architectural changes around whole-stage code generation/vectorization that have been instrumental in improving CPU efficiency and gaining performance.
Towards Scalable Service Composition on Multicores - Cesare Pautasso
The document discusses scaling service composition engines to leverage multicore architectures. It proposes a topology-aware deployment approach that replicates the engine architecture across cores instead of just increasing threads. Each replica's threads would be bound to specific affinity groups, and resources like memory and threads distributed proportionally among replicas based on hardware resources and number of replicas. An example shows binding two engine instances to separate sets of cores instead of letting all threads span all cores. This improves scalability over a single instance approach.
This document discusses Reactive Programming and Reactive Streams. It introduces Reactor, a reactive programming framework, and how it addresses issues like latency in microservices architectures. Reactive Streams provide an interoperable way to work with asynchronous data streams in a non-blocking manner. Streams represent sequences of data that can be processed reactively through operators like map and filter.
Modern javascript localization with c-3po and the good old gettext - Alexander Mostovenko
This document summarizes a presentation about localization in modern JavaScript applications using GNU gettext. Some key points:
- GNU gettext is recommended over ICU due to better tooling and compatibility with existing backend formats.
- C-3po is an open source library that improves on gettext by allowing extraction and resolution of translations directly from JavaScript code using tagged template literals.
- It implements an extraction/merge/resolve workflow that allows developers and translators to work independently and precompiles translations for faster loading.
This document discusses challenges faced in implementing Presto, an open source distributed SQL query engine, for targeted audience delivery at TiVo. It describes choosing appropriate instance types for Presto worker nodes based on memory needs. It also addresses scaling the Presto cluster elastically to handle query concurrency and maturity issues with the Presto software. The document provides insights on testing Presto using Docker containers and connecting to mocked tables.
Andrzej Ludwikowski - Event Sourcing - what could possibly go wrong? - Codemo... - Codemotion
Yet another presentation about Event Sourcing? Yes and no. Event Sourcing is a really great concept. Some could say it’s a Holy Grail of the software architecture. True, but everything comes with a price. This session is a summary of my experience with ES gathered while working on 3 different commercial products. Instead of theoretical aspects, I will focus on possible challenges with ES implementation. What could explode? How and where to store events effectively? What are possible schema evolution solutions? How to achieve the highest level of scalability and live with eventual consistency?
The document discusses the future of server-side JavaScript. It covers various Node.js frameworks and libraries that support both synchronous and asynchronous programming styles. CommonJS aims to provide interoperability across platforms by implementing synchronous proposals using fibers. Examples demonstrate how CommonJS allows for synchronous-like code while maintaining asynchronous behavior under the hood. Benchmarks show it has comparable performance to Node.js. The author advocates for toolkits over frameworks and continuing development of common standards and packages.
This document discusses Edge Side Includes (ESI) and its use at Yahoo. ESI allows content to be assembled at the edge from different sources, improving performance. Yahoo uses ESI to assemble pages, support legacy modules, and handle combinations of assets. ESI enables availability through caching and fallbacks. The future may include deeper HTTP integration and smarter assembly of includes.
Concurrent and Distributed Applications with Akka, Java and Scala
1. Concurrent and Distributed Applications
with
Akka, Java and Scala
!
Buenos Aires, Argentina, Oct 2012
!
@frodriguez
2. Moore’s law
Moore's law says that every 18 months,
the number of transistors that can fit within a
given area on a chip doubles.
3. Moore’s law
Moore's law says that every 18 months,
the number of transistors that can fit within a
given area on a chip doubles.
Page's law says that every 18 months
software becomes twice as slow
21. Traditional Threads ?
process(...){
Computing
Reading State from Heap
I/O (e.g. Disk, Network, DBs)
Processing Results
Updating State in the Heap
Returning Results
}
22. Traditional Threads ?
process(...){
Computing
Reading State from Heap
I/O (e.g. Disk, Network, DBs)
Processing Results
Updating State in the Heap
Returning Results
}
23. Traditional Threads ?
process(...){
Computing
Reading State from Heap
I/O (e.g. Disk, Network, DBs)
Processing Results
Blocked
Updating State in the Heap
Returning Results
}
Thread
Suspended
24. Traditional Threads ?
process(...){
Computing
Reading State from Heap
I/O (e.g. Disk, Network, DBs)
Processing Results
Updating State in the Heap
Returning Results
}
25. Traditional Threads ?
process(...){
Computing
Reading State from Heap
I/O (e.g. Disk, Network, DBs)
Processing Results
Updating State in the Heap
Returning Results
}
26. Traditional Threads ?
process(...){
Computing
Reading State from Heap
I/O (e.g. Disk, Network, DBs)
Processing Results
Updating State in the Heap
Returning Results
}
Add concurrency...
27. Traditional Threads ?
process(...){
Computing
Reading State from Heap
I/O (e.g. Disk, Network, DBs)
Processing Results
Updating State in the Heap
Returning Results
}
28. Traditional Threads ?
process(...){
Computing
Reading State from Heap
I/O (e.g. Disk, Network, DBs)
Processing Results
Updating State in the Heap
Returning Results
Requires
Synchronization
can be blocked
Requires
Synchronization
can be blocked
}
29. Traditional Threads ?
process(...){
Computing
Reading State from Heap
I/O (e.g. Disk, Network, DBs)
Processing Results
Updating State in the Heap
Returning Results
Requires
Synchronization
can be blocked
Requires
Synchronization
can be blocked
}
Bad for
CPU caches
30. Traditional Threads ?
How many threads ?
Computing
Reading State from Heap
I/O (e.g. Disk, Network, DBs)
Processing Results
Updating State in the Heap
Returning Results
31. Traditional Threads ?
How many threads ?
Computing
Reading State from Heap
I/O (e.g. Disk, Network, DBs)
Processing Results
Updating State in the Heap
Returning Results
32. Traditional Threads ?
How many threads ?
Computing
Reading State from Heap
I/O (e.g. Disk, Network, DBs)
Processing Results
Updating State in the Heap
Returning Results
Improves with threads
(Assuming blocking,
non-async I/O is used...)
33. Traditional Threads ?
How many threads ?
Computing
Reading State from Heap
I/O (e.g. Disk, Network, DBs)
Processing Results
Updating State in the Heap
Returning Results
Degrades with more
threads than cores
Context Switching,
Contention,
L1 & L2 Caches
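The degradation described above, from context switching, contention, and cache pressure, is why CPU-bound work is usually run on a pool sized to the core count rather than one thread per task. A minimal plain-Java sketch of that sizing rule (class name and the synthetic workload are illustrative, not from the slides):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CoreSizedPool {
    public static void main(String[] args) throws Exception {
        // CPU-bound work: at most one thread per core. More threads than
        // cores only add context switching and L1/L2 cache thrashing.
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);

        List<Future<Long>> results = new ArrayList<>();
        for (int i = 0; i < cores; i++) {
            results.add(pool.submit(() -> {
                long sum = 0;                          // pure computation, no blocking I/O
                for (long j = 1; j <= 1_000_000; j++) sum += j;
                return sum;
            }));
        }
        // Every task computes the same sum: 1_000_000 * 1_000_001 / 2
        for (Future<Long> f : results) {
            if (f.get() != 500_000_500_000L) throw new AssertionError();
        }
        pool.shutdown();
        System.out.println("sum per task = " + results.get(0).get());
    }
}
```

With blocking I/O in the mix, as the earlier slides note, a larger pool can help, because threads that are suspended on I/O free up the cores.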
34. What About Latency ?
Client
Biz
DB
Fetching
and
Mapping N Items
Latency (time to first item)
Thread by task (instead of by layer). Sync Results
35. What About Latency ?
Client
Biz
DB
Fetching
and
Mapping N Items
Latency (time to first item)
Parallelism by Layer - Asynchronous and Partial Results
From Request/Response to Request Stream/Response Stream
38. Traditional Approach
RPC (WS, RMI, ...)
Queues (JMS, AMQP, STOMP, etc),
Raw Sockets
Local != Remote
Local should be an optimization,
not a forced early decision...
39. Akka
“Akka is a toolkit and runtime for
building highly concurrent,
distributed, and fault tolerant event-driven
applications on the JVM. ”
Based on the actor model
40. What is an Actor ?
Actors are objects which
encapsulate state and behavior
Communicate exclusively by
exchanging messages
Conceptually have their own
light-weight thread
No Need for Synchronization
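The "no need for synchronization" point can be illustrated even without Akka: give each actor a private single-threaded mailbox, so only that one thread ever touches its state. A minimal plain-Java sketch of this idea (the class and method names are hypothetical, not the Akka API):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ToyActor {
    // The "mailbox": one dedicated thread processes messages in FIFO order.
    private final ExecutorService mailbox = Executors.newSingleThreadExecutor();
    private long count = 0;  // only ever touched by the mailbox thread: no locks needed

    // Asynchronous send: enqueue the message, never block the sender.
    public void tell(long delta) {
        mailbox.submit(() -> { count += delta; });
    }

    // Reads also go through the mailbox, so they see a consistent state.
    public long getCount() throws Exception {
        return mailbox.submit(() -> count).get();
    }

    public void shutdown() {
        mailbox.shutdown();
    }

    public static void main(String[] args) throws Exception {
        ToyActor actor = new ToyActor();
        for (int i = 0; i < 1000; i++) actor.tell(1);  // any number of threads could do this safely
        System.out.println(actor.getCount());          // prints 1000
        actor.shutdown();
    }
}
```

Akka's actors add much more on top (supervision, routing, remoting), but this is the core concurrency trick: serializing all access to state through one message queue.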
55. Actors: Processing Messages
/myactor
State
Behavior
/someactor
State
B Behavior
Change State
Change Behavior
Send a Message
56. Actors: Processing Messages
/myactor
State
Behavior
/someactor
State
Behavior
B
Change State
Change Behavior
Send a Message
57. Actors: Processing Messages
/myactor
State
Behavior
/someactor
State
Behavior
B
Change State
Change Behavior
Send a Message
58. Actors: Processing Messages
/myactor
State
Behavior
/someactor
State
Behavior
/myactor/child
State
Behavior
Change State
Change Behavior
Send a Message
Create Actors
59. Hello World Actor
Define
class HelloWorld extends Actor {
def receive = {
case msg =>
printf("Received %s\n", msg)
}
}
Create
val system = ActorSystem("MySystem")
val hello = system.actorOf(Props[HelloWorld], "hello")
Send Message
hello ! "World"
60. Counter Actor
Define
class Counter extends Actor {
var total = 0
!
def receive = {
case Count(value) =>
total += value
case GetStats =>
sender ! Stats(total)
}
}
Protocol
case class Count(n: Int)
case class Stats(total: Int)
case object GetStats
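Since the talk also covers Java, the same message protocol can be sketched in the Java style: immutable message classes with final fields, which are safe to share between actors without locking (the class names mirror the Scala protocol above; this is an illustrative sketch, not Akka API code):

```java
public class Protocol {
    // Equivalent of: case class Count(n: Int)
    public static final class Count {
        public final int n;
        public Count(int n) { this.n = n; }
    }

    // Equivalent of: case class Stats(total: Int)
    public static final class Stats {
        public final int total;
        public Stats(int total) { this.total = total; }
    }

    // Equivalent of: case object GetStats (a singleton message)
    public static final class GetStats {
        public static final GetStats INSTANCE = new GetStats();
        private GetStats() {}
    }

    public static void main(String[] args) {
        Count c = new Count(3);
        Stats s = new Stats(c.n + 4);
        System.out.println(s.total);  // prints 7
    }
}
```

Immutability matters here: because messages cannot change after being sent, two actors never share mutable state through them.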
62. Sending a Message
/actorB
State
Behavior
/actorA
State
Behavior
actorB ! A
63. Sending a Message
/actorB
State
Behavior
/actorA
State
Behavior
A
actorB ! A
64. Sending a Message
/actorB
State
Behavior
/actorA
State
Behavior
A
actorB ! A
65. Sending a Message
/actorB
State
Behavior
/actorA
State
Behavior A
actorB ! A
66. Sending a Message
/actorB
State
Behavior
/actorA
State
Behavior A
actorB ! A sender ! B
67. Sending a Message
/actorB
State
Behavior
/actorA
State
Behavior
B
actorB ! A sender ! B
68. Sending a Message
/actorB
State
Behavior
/actorA
State
Behavior
actorB ! A sender ! B
B
69. Sending a Message
/actorB
State
Behavior
/actorA
State
Behavior
B
actorB ! A sender ! B
70. Sending a Message
/actorB
State
Behavior
/actorA
State
Behavior
actorB ! A sender ! B
71. Sending a Message
/actorB
State
Behavior
/actorA
State
Behavior
actorB ! A
actorB tell A
sender ! B
sender tell B
72. Sending a Message
/actorB
State
Behavior
/actorC
State
Behavior
/actorA
State
Behavior
73. Sending a Message
/actorB
State
Behavior
/actorC
State
Behavior
/actorA
State
Behavior
actorB tell (A, actorC)
74. Sending a Message
/actorB
State
Behavior
/actorC
State
Behavior
/actorA
State
Behavior
A
actorB tell (A, actorC)
75. Sending a Message
/actorB
State
Behavior
/actorC
State
Behavior
/actorA
State
Behavior
A
actorB tell (A, actorC)
76. Sending a Message
/actorB
State
Behavior
/actorC
State
Behavior
/actorA
State
Behavior A
actorB tell (A, actorC)
77. Sending a Message
/actorB
State
Behavior
/actorC
State
Behavior
/actorA
State
Behavior A
actorB tell (A, actorC) sender ! B
78. Sending a Message
/actorB
State
Behavior
B
/actorC
State
Behavior
/actorA
State
Behavior
actorB tell (A, actorC) sender ! B
79. Sending a Message
/actorB
State
Behavior
/actorC
State
Behavior
/actorA
State
Behavior
actorB tell (A, actorC) sender ! B
B
80. Sending a Message
/actorB
State
Behavior
/actorC
State
Behavior
/actorA
State
Behavior
actorB tell (A, actorC) sender ! B
B
81. Sending a Message
/actorB
State
Behavior
/actorC
State
Behavior
/actorA
State
Behavior
actorB tell (A, actorC) sender ! B
82. Forward a Message
/actorB
State
Behavior
/actorC
State
Behavior
/actorA
State
Behavior
83. Forward a Message
/actorB
State
Behavior
/actorC
State
Behavior
/actorA
State
Behavior
actorB ! A
84. Forward a Message
/actorB
State
Behavior
/actorC
State
Behavior
/actorA
State
Behavior
A
actorB ! A
85. Forward a Message
/actorB
State
Behavior
/actorC
State
Behavior
/actorA
State
Behavior
A
actorB ! A
86. Forward a Message
/actorB
State
Behavior
/actorC
State
Behavior
/actorA
State
Behavior A
actorB ! A
87. Forward a Message
/actorB
State
Behavior
/actorC
State
Behavior
/actorA
State
Behavior A
actorB ! A actorC forward B
88. Forward a Message
/actorB
State
Behavior
B
/actorC
State
Behavior
/actorA
State
Behavior
actorB ! A actorC forward B
89. Forward a Message
/actorB
State
Behavior
/actorC
State
Behavior
/actorA
State
Behavior
actorB ! A actorC forward B
B
90. Forward a Message
/actorB
State
Behavior
/actorC
State
Behavior
/actorA
State
Behavior
actorB ! A actorC forward B
B
91. Forward a Message
/actorB
State
Behavior
/actorC
State
Behavior
/actorA
State
Behavior
actorB ! A actorC forward B
sender ! C B
92. Forward a Message
/actorB
State
Behavior
/actorC
State
Behavior
/actorA
State
Behavior
actorB ! A actorC forward B
sender ! C C
93. Forward a Message
/actorB
State
Behavior
/actorC
State
Behavior
/actorA
State
Behavior
actorB ! A actorC forward B
sender ! C
C
94. Forward a Message
/actorB
State
Behavior
/actorC
State
Behavior
/actorA
State
Behavior
C
actorB ! A actorC forward B
sender ! C
95. Forward a Message
/actorB
State
Behavior
/actorC
State
Behavior
/actorA
State
Behavior
actorB ! A actorC forward B
sender ! C
96. Ask & Pipe Patterns
Ask
val response = actor ? Message
!
response onSuccess {
case Response(a) =>
printf("Response %s", a)
}
Pipe
val response = actor ? Message
!
response pipeTo actor2
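The shape of ask and pipe, request a future of the reply, then forward the eventual result to another recipient instead of blocking, can be sketched with plain Java futures. This is only an analogue of the pattern, not Akka's API (in Akka, `?` additionally needs an implicit timeout), and the names here are hypothetical:

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

public class AskPipeSketch {
    // "ask" (?): send a request and get back a future of the reply, without blocking.
    static CompletableFuture<String> ask(String message) {
        return CompletableFuture.supplyAsync(() -> "Response " + message);
    }

    // "pipeTo": when the reply eventually arrives, hand it straight to the
    // next recipient; the asking thread never waits for it.
    static CompletableFuture<Void> pipeTo(CompletableFuture<String> reply,
                                          Consumer<String> actor2) {
        return reply.thenAccept(actor2);
    }

    public static void main(String[] args) {
        CompletableFuture<String> response = ask("Message");
        // actor2 stands in for the second actor's message handler.
        pipeTo(response, r -> System.out.println(r)).join();  // prints "Response Message"
    }
}
```

The benefit in both cases is the same: no thread sits blocked waiting for the answer between the request and the response.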
97. Mailbox
UnboundedMailbox (default)
UnboundedPriorityMailbox
BoundedMailbox (*)
BoundedPriorityMailbox (*)
* May produce deadlocks if used improperly
98. Routing
Round Robin Router
val actor = system.actorOf(
Props[MyActor].withRouter(RoundRobinRouter(4)),
name = "myrouter"
)
Using actor with routers (no changes)
!
actor ! Message
100. Routing Configuration
Configuration overrides code
akka.actor.deployment {
/myrouter {
router = round-robin
nr-of-instances = 8
}
}
Routers from Config
val actor = system.actorOf(
Props[MyActor].withRouter(FromConfig()),
name = "myrouter"
)
101. Remoting
Accessing remote actor
val actor = system.actorFor(
"akka://sys@server:2552/user/actor"
)
Using remote actor (no changes)
!
actor ! Message
!
// Replies also work ok
sender ! Response
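For the remote lookup above to work, remoting has to be switched on in the configuration. A sketch of the Akka 2.0-era settings this deck targets (key names changed in later Akka versions, so treat this as illustrative, with "server" standing in for the real hostname):

```
akka {
  actor {
    provider = "akka.remote.RemoteActorRefProvider"
  }
  remote {
    netty {
      hostname = "server"   # host part of the akka://sys@server:2552 address
      port = 2552           # port part of the address
    }
  }
}
```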
102. Remote Deployment
Code without changes
val actor = system.actorOf(
Props[MyActor],
name = "myactor"
)
Configuration
!
akka.actor.deployment {
/myactor {
remote = "akka://sys@server:2553"
}
}
103. Remote Deployment (routers)
akka.actor.deployment {
/myrouter {
router = round-robin
nr-of-instances = 8
!
target {
nodes = ["akka://sys@server1:2552",
"akka://sys@server2:2552"]
}
}
}
Routers from Config
val actor = system.actorOf(
Props[MyActor].withRouter(FromConfig()),
name = "myrouter"
)
104. Fault Tolerance
override val supervisorStrategy = OneForOneStrategy(...)
{
case _: ArithmeticException => Resume
case _: NullPointerException => Restart
case _: IllegalArgumentException => Stop
case _: Exception => Escalate
}
Supervision Hierarchies across machines
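The strategy above is essentially a function from failure type to directive: Resume keeps the actor and its state, Restart recreates it with fresh state, Stop terminates it, and Escalate passes the failure up the supervision hierarchy. That decision function can be sketched on its own in plain Java (an analogue for illustration, not the Akka API):

```java
public class SupervisionSketch {
    enum Directive { RESUME, RESTART, STOP, ESCALATE }

    // Mirrors the Scala pattern match on the slide above:
    // each failure type gets its own recovery decision.
    static Directive decide(Throwable t) {
        if (t instanceof ArithmeticException)      return Directive.RESUME;   // drop the message, keep state
        if (t instanceof NullPointerException)     return Directive.RESTART;  // recreate the actor, fresh state
        if (t instanceof IllegalArgumentException) return Directive.STOP;     // terminate the actor
        return Directive.ESCALATE;                                            // let the parent supervisor decide
    }

    public static void main(String[] args) {
        System.out.println(decide(new ArithmeticException()));
        System.out.println(decide(new NullPointerException()));
        System.out.println(decide(new IllegalArgumentException()));
        System.out.println(decide(new RuntimeException()));
    }
}
```

Because supervisors are themselves actors, the Escalate case is what lets these hierarchies span machines: a remote child's failure climbs the same parent chain as a local one.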