MILAN - 08TH OF MAY - 2015
PARTNERS
Scala in increasingly demanding environments
Stefano Rocco – Roberto Bentivoglio
DATABIZ
Agenda
Introduction
Command Query Responsibility Segregation
Event Sourcing
Akka persistence
Apache Spark
Real-time “bidding”
Live demo (hopefully)
FAQ
1. Introduction
The picture
Highly demanding environments
- Data is increasing dramatically
- Applications are needed faster than ever
- Customers are more demanding
- Customers are becoming more sophisticated
- Services are becoming more sophisticated and complex
- Performance & Quality is becoming a must
- Rate of business change is ever increasing
- And more…
Reactive Manifesto
Introduction – The way we see
Responsive
Message Driven
Resilient
Elastic
We need to embrace change!
Introduction – The world is changing…
Introduction - Real Time “Bidding”
High level architecture
Akka
Persistence
Input
Output
Cassandra
Kafka
Training
Prediction
Scoring
Spark
Batch
Real Time
Action
Dispatch
Publish
Store
Journaling
2. Command Query Responsibility Segregation
Multi-tier stereotypical architecture + CRUD
CQRS
Presentation Tier
Business Logic Tier
Data Tier
Integration Tier
RDBMS
Client Systems
External Systems
DTO/VO
Multi-tier stereotypical architecture + CRUD
CQRS
- Pros
- Simplicity
- Tooling
- Cons
- Difficult to scale (RDBMS is usually the bottleneck)
- Domain Driven Design not applicable (using CRUD)
Think different!
CQRS
- Is there a different architecture model that does not rely heavily on:
- CRUD
- RDBMS transactions
- J2EE/Spring technologies stack
Command and Query Responsibility Segregation
Originated with Bertrand Meyer’s Command and Query Separation Principle
“It states that every method should either be a command that performs an action, or a query that
returns data to the caller, but not both. In other words, asking a question should not change the
answer. More formally, methods should return a value only if they are referentially transparent
and hence possess no side effects” (Wikipedia)
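A minimal sketch of the principle quoted above, in plain Scala. The `Counter` class and its method names are hypothetical and chosen only for illustration: the command mutates state and returns `Unit`, while the query returns a value and mutates nothing.

```scala
// Illustrative only: a counter that keeps commands and queries separate.
final class Counter {
  private var value: Int = 0

  // Command: performs an action, returns nothing.
  def increment(): Unit = value += 1

  // Query: returns data to the caller, changes nothing.
  def current: Int = value
}

object CqsExample {
  def demo(): (Int, Int) = {
    val counter = new Counter
    counter.increment()
    // Asking the question twice does not change the answer.
    (counter.current, counter.current)
  }
}
```

Calling `CqsExample.demo()` reads the counter twice after a single command, and both reads agree.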
CQRS
Command and Query Responsibility Segregation (Greg Young)
CQRS
Available Services
- The service has been split into:
- Command → Write side service
- Query → Read side service
CQRS
Change status → Status changed
Get status → Status retrieved
Main architectural properties
- Consistency
- Command → consistent by definition
- Query → eventually consistent
- Data Storage
- Command → normalized
- Query → denormalized
- Scalability
- Command → low transaction rate
- Query → high transaction rate
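The split between the two sides can be sketched in plain Scala as follows. This is an in-memory illustration under assumptions, not a production design: `WriteSide`, `ReadSide`, `catchUp` and `StatusChanged` are hypothetical names, and the read side only sees a change after it has caught up with the write side's events, which is what "eventually consistent" means here.

```scala
object CqrsSketch {
  // Event emitted by the write side when a command is accepted.
  final case class StatusChanged(id: String, status: String)

  // Write side: validates commands and records the resulting events.
  final class WriteSide {
    private var log: Vector[StatusChanged] = Vector.empty

    def changeStatus(id: String, status: String): Either[String, StatusChanged] =
      if (status.isEmpty) Left("status must not be empty")
      else {
        val event = StatusChanged(id, status)
        log = log :+ event
        Right(event)
      }

    def eventsFrom(offset: Int): Vector[StatusChanged] = log.drop(offset)
  }

  // Read side: a denormalized view that is updated from events later on.
  final class ReadSide {
    private var view: Map[String, String] = Map.empty
    private var offset: Int = 0

    def catchUp(write: WriteSide): Unit = {
      val fresh = write.eventsFrom(offset)
      fresh.foreach(e => view = view.updated(e.id, e.status))
      offset += fresh.size
    }

    def getStatus(id: String): Option[String] = view.get(id)
  }
}
```

Until `catchUp` runs, `getStatus` still returns the old answer even though the command has already been accepted.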
CQRS
3. Event Sourcing
Storing Events…
Event Sourcing
Systems today usually rely on
- Storing of current state
- Usage of RDBMS as storage solution
Architectural choices are often “RDBMS centric”
Many systems need to store every event that occurred, not only the updated state
Commands vs Events
Event Sourcing
- Commands
- Ask to perform an operation (imperative mood)
- Can be rejected
- Events
- Something happened in the past (past tense)
- Cannot be undone
Command received → Command validation → Event persisted → State mutation
Command and Event sourcing
Event Sourcing
An informal and short definition...
Append every command (or event) received (or generated)
to a journal instead of storing the current state
of the application!
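The flow from command to state change can be sketched in plain Scala. This is an in-memory illustration: the `Account` class, the `Deposit`/`Withdraw` commands and the `Vector` used as a journal are assumptions made for the example, not part of any library.

```scala
object CommandHandling {
  sealed trait Command
  final case class Deposit(amount: Long) extends Command
  final case class Withdraw(amount: Long) extends Command

  sealed trait Event
  final case class Deposited(amount: Long) extends Event
  final case class Withdrawn(amount: Long) extends Event

  final class Account {
    private var journal: Vector[Event] = Vector.empty // append-only
    private var balance: Long = 0L

    def handle(cmd: Command): Either[String, Event] = {
      // 1. Validate the command; at this point it can still be rejected.
      val validated: Either[String, Event] = cmd match {
        case Deposit(a) if a > 0                  => Right(Deposited(a))
        case Withdraw(a) if a > 0 && a <= balance => Right(Withdrawn(a))
        case other                                => Left(s"rejected: $other")
      }
      // 2. Append the event to the journal, 3. then mutate the state.
      validated.foreach { event =>
        journal = journal :+ event
        balance = event match {
          case Deposited(a) => balance + a
          case Withdrawn(a) => balance - a
        }
      }
      validated
    }

    def currentBalance: Long = balance
    def events: Vector[Event] = journal
  }
}
```

A rejected command leaves both the journal and the state untouched; only accepted commands produce events.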
CRUD vs Event sourcing
Event Sourcing
Deposited 100 EUR → Withdrawn 40 EUR → Deposited 200 EUR
- CRUD
- The account table keeps the current available balance (260)
- Occurred events are stored in a separate table
- Event Sourcing
- The current state is kept in memory or rebuilt by processing all events
- 100 – 40 + 200 = 260
Account created
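The arithmetic on this slide is a left fold over the event log. A minimal sketch in plain Scala, with hypothetical event types named after the labels on the slide:

```scala
object BalanceFromEvents {
  sealed trait Event
  case object AccountCreated extends Event
  final case class Deposited(eur: Int) extends Event
  final case class Withdrawn(eur: Int) extends Event

  // Rebuild the current balance by processing every event in order.
  def replay(events: Seq[Event]): Int =
    events.foldLeft(0) {
      case (balance, AccountCreated) => balance
      case (balance, Deposited(eur)) => balance + eur
      case (balance, Withdrawn(eur)) => balance - eur
    }

  // The events shown on the slide.
  val journal: Seq[Event] =
    Seq(AccountCreated, Deposited(100), Withdrawn(40), Deposited(200))
}
```

`BalanceFromEvents.replay(BalanceFromEvents.journal)` yields 260, the same value the CRUD table would hold as current state.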
Main properties
- There is no delete
- Performance and Scalability
- “Append only” models are easier to scale
- Horizontal Partitioning (Sharding)
- Rolling Snapshots
- No Impedance Mismatch
- Event Log can bring great business value
Event Sourcing
4. Akka persistence
Introduction
We can think about it as
AKKA PERSISTENCE = CQRS + EVENT SOURCING
Akka Persistence
Main properties
- Akka persistence enables stateful actors to persist their internal state
- Recover state after
- Actor start
- Actor restart
- JVM crash
- By supervisor
- Cluster migration
Akka Persistence
Main properties
- Changes are appended to storage
- Nothing is mutated
- High transaction rates
- Efficient replication
- Stateful actors are recovered by replaying stored changes
- From the beginning or from a snapshot
- Also provides point-to-point communication with at-least-once message delivery semantics
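Recovery "from the beginning or from a snapshot" can be sketched in plain Scala as below. This models the idea only and is not the Akka API: `Snapshot`, `Journaled` and `recover` are hypothetical names, and the state is a single integer to keep the example short.

```scala
object RecoverySketch {
  // State captured at a given journal position.
  final case class Snapshot(sequenceNr: Long, state: Int)
  // One journaled change, identified by its sequence number.
  final case class Journaled(sequenceNr: Long, delta: Int)

  // Start from the snapshot if there is one, then replay only newer entries.
  def recover(snapshot: Option[Snapshot], journal: Seq[Journaled]): Int = {
    val (fromSeqNr, initial) =
      snapshot.map(s => (s.sequenceNr, s.state)).getOrElse((0L, 0))
    journal
      .filter(_.sequenceNr > fromSeqNr)
      .foldLeft(initial)((state, entry) => state + entry.delta)
  }
}
```

Both paths must arrive at the same state; the snapshot only shortens the replay.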
Akka Persistence
Components
- PersistentActor → persistent stateful actor
- Command or event sourced actor
- Persist commands/events to a journal
- PersistentView → Receives journaled messages written by another persistent actor
- AtLeastOnceDelivery → at-least-once message delivery, also in case of sender or receiver JVM crashes
- Journal → stores the sequence of messages sent to a persistent actor
- Snapshot store → stores snapshots, which are used for optimizing recovery times
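The idea behind at-least-once delivery can be sketched without Akka: the sender keeps every message until it is confirmed and sends unconfirmed ones again, so the receiver may see duplicates and has to tolerate them. `Sender`, `Receiver`, `Envelope` and their methods are hypothetical names for this illustration and do not mirror the Akka trait's API.

```scala
object AtLeastOnceSketch {
  final case class Envelope(deliveryId: Long, payload: String)

  final class Sender {
    private var nextId = 0L
    private var unconfirmed: Map[Long, String] = Map.empty

    def send(payload: String): Envelope = {
      nextId += 1
      unconfirmed = unconfirmed.updated(nextId, payload)
      Envelope(nextId, payload)
    }

    def confirm(deliveryId: Long): Unit = unconfirmed -= deliveryId

    // Everything that has not been confirmed yet is sent again.
    def redeliver(): Seq[Envelope] =
      unconfirmed.toSeq.sortBy(_._1).map { case (id, p) => Envelope(id, p) }
  }

  final class Receiver {
    private var seen: Set[Long] = Set.empty
    private var processed: Vector[String] = Vector.empty

    // Returns the id to acknowledge; duplicates are acknowledged but not processed twice.
    def receive(e: Envelope): Long = {
      if (!seen(e.deliveryId)) {
        seen += e.deliveryId
        processed = processed :+ e.payload
      }
      e.deliveryId
    }

    def handled: Vector[String] = processed
  }
}
```

If an acknowledgement is lost, the message arrives a second time; deduplicating on the delivery id is one way for the receiver to stay correct.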
Akka Persistence
Code example
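import akka.persistence.PersistentActor

class BookActor extends PersistentActor {
  // Stable identifier under which this actor's messages are journaled
  override val persistenceId: String = "book-persistence"

  // Receives journaled messages while the actor is recovering
  override def receiveRecover: Receive = {
    case _ => // RECOVER AFTER A CRASH HERE...
  }

  // Receives incoming commands during normal operation
  override def receiveCommand: Receive = {
    case _ => // VALIDATE COMMANDS AND PERSIST EVENTS HERE...
  }
}

// Receive is an alias for a partial function over messages:
// type Receive = PartialFunction[Any, Unit]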
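Because `Receive` is just a partial function, handlers can be composed with the standard `orElse` combinator. The sketch below declares the alias locally so that it compiles without Akka on the classpath; the `Handler` class and its fields are hypothetical.

```scala
object ReceiveSketch {
  // Same shape as the alias shown on the slide, declared locally for this example.
  type Receive = PartialFunction[Any, Unit]

  final class Handler {
    var log: Vector[String] = Vector.empty

    val commands: Receive = { case s: String => log = log :+ s"command:$s" }
    val numbers: Receive = { case n: Int => log = log :+ s"number:$n" }

    // orElse chains handlers; the first one defined for a message handles it.
    val receive: Receive = commands orElse numbers
  }
}
```

A message that no handler is defined for can be detected with `isDefinedAt` before it is applied.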
Akka Persistence
5. Apache Spark
Apache Spark is a cluster computing platform designed to be fast and general-purpose
Spark SQL (structured data)
Spark Streaming (real time)
MLlib (machine learning)
GraphX (graph processing)
Spark Core
Standalone Scheduler | YARN | Mesos
Apache Spark
The Stack
Apache Spark
The Stack
- Spark SQL: Allows querying data via SQL as well as the Apache Hive variant of SQL (HiveQL), and supports
many sources of data, including Hive tables, Parquet and JSON
- Spark Streaming: Component that enables processing of live streams of data in an elegant, fault-tolerant,
scalable and fast way
- MLlib: Library containing common machine learning (ML) functionality, including algorithms for
classification, regression, clustering, collaborative filtering etc., designed to scale out across a cluster
- GraphX: Library for manipulating graphs and performing graph-parallel computation
- Cluster Managers: Spark is designed to scale efficiently from one to many thousands of compute
nodes. It can run over a variety of cluster managers, including Hadoop YARN and Apache Mesos. Spark
also includes a simple cluster manager of its own, called the Standalone Scheduler
Apache Spark
Core Concepts
SparkContext
Driver Program
Worker Node
Worker Node
Executor
Task Task
Worker Node
Executor
Task Task
Apache Spark
Core Concepts
- Every Spark application consists of a driver program that launches various parallel operations
on the cluster. The driver program contains your application’s main function and defines
distributed datasets on the cluster, then applies operations to them
- Driver programs access Spark through the SparkContext object, which represents a connection
to a computing cluster.
- The SparkContext can be used to build RDDs (Resilient Distributed Datasets) on which you can
run a series of operations
- To run these operations, driver programs typically manage a number of worker processes called executors
Apache Spark
RDD (Resilient Distributed Dataset)
It is an immutable distributed collection of data, which is partitioned across
machines in a cluster.
It supports two types of operations: transformations and actions
- Resilient: It can be recreated when data in memory is lost
- Distributed: Stored in memory across the cluster
- Dataset: Data that comes from a file or is created programmatically
Apache Spark
Transformations
- A transformation is an operation such as map(), filter() or union() on an RDD that yields
another RDD.
- Transformations are lazily evaluated: they don’t run until an action is executed.
- The Spark driver remembers the transformations applied to an RDD, so if a partition is lost,
that partition can easily be reconstructed on some other machine in the cluster
(this is the “Resilient” in RDD).
- Resiliency is achieved via a lineage graph.
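Lazy evaluation and lineage can be illustrated with a toy dataset type in plain Scala. This is a single-machine sketch, not Spark's implementation: `Dataset`, `Source`, `Mapped` and `Filtered` are hypothetical names. Each transformation only records its parent and its function; nothing is computed until the `collect()` action walks the recorded chain, and the same chain can be walked again whenever a result has to be rebuilt.

```scala
object LineageSketch {
  // A dataset is either source data or a transformation of a parent dataset.
  sealed trait Dataset[A] {
    // Transformations: build a new node in the lineage, compute nothing.
    def map[B](f: A => B): Dataset[B] = Mapped(this, f)
    def filter(p: A => Boolean): Dataset[A] = Filtered(this, p)
    // Action: only here is the chain of transformations actually run.
    def collect(): List[A]
  }

  final case class Source[A](load: () => List[A]) extends Dataset[A] {
    def collect(): List[A] = load()
  }

  final case class Mapped[A, B](parent: Dataset[A], f: A => B) extends Dataset[B] {
    def collect(): List[B] = parent.collect().map(f)
  }

  final case class Filtered[A](parent: Dataset[A], p: A => Boolean) extends Dataset[A] {
    def collect(): List[A] = parent.collect().filter(p)
  }
}
```

Defining `source.map(...).filter(...)` does not touch the source; each call to `collect()` recomputes the result from the lineage.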
Apache Spark
Actions
- Compute a result based on an RDD and either return it to the driver program
or save it to an external storage system.
- Typical RDD actions are count(), first(), take(n)
Apache Spark
Transformations vs Actions
Transformation: RDD → RDD
Action: RDD → Value
Transformations: define new RDDs based on the current one. E.g. map, filter, reduceByKey etc.
Actions: return values. E.g. count, sum, collect, etc.
Apache Spark
Benefits
Scalable: Can be deployed on very large clusters
Fast: In-memory processing for speed
Resilient: Recovers in case of data loss
Written in Scala: Has a simple high-level API for Scala, Java and Python
Apache Spark
Lambda Architecture – One fits all technology!
New data
Batch Layer
Speed Layer
Serving Layer
Data
Consumers
Query
Spark
Spark
- Spark Streaming receives streaming input, and divides the data into batches which are then
processed by the Spark Core
Input data stream → Spark Streaming → Batches of input data → Spark Core → Batches of processed data
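The micro-batching idea can be sketched with plain Scala collections. This is not Spark code: `Record`, `toBatches` and `wordCount` are hypothetical names, records carry an explicit timestamp, and intervals with no records simply produce no batch in this simplified model.

```scala
object MicroBatchSketch {
  final case class Record(timestampMs: Long, line: String)

  // Split timestamped records into consecutive batches of fixed duration.
  def toBatches(records: Seq[Record], batchMs: Long): Seq[Seq[Record]] =
    records
      .groupBy(_.timestampMs / batchMs)
      .toSeq
      .sortBy(_._1)
      .map(_._2)

  // The job applied to each batch: a plain word count over comma-separated lines.
  def wordCount(batch: Seq[Record]): Map[String, Long] =
    batch
      .flatMap(_.line.split(",").toSeq)
      .groupBy(identity)
      .map { case (word, occurrences) => (word, occurrences.size.toLong) }
}
```

With a batch length of 2000 ms, records stamped 100 and 1900 fall into the first batch and a record stamped 2100 into the second; each batch is then counted independently.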
Apache Spark
Speed Layer
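import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val numThreads = 1
val group = "test"
// Topic name -> number of consumer threads (the topic list reuses the group name here)
val topicMap = group.split(",").map((_, numThreads)).toMap
val conf = new SparkConf().setMaster("local[*]").setAppName("KafkaWordCount")
val sc = new SparkContext(conf)
// Micro-batch interval of 2 seconds
val ssc = new StreamingContext(sc, Seconds(2))
// Receiver-based stream via ZooKeeper on localhost:2181; keep only the message value
val lines = KafkaUtils.createStream(ssc, "localhost:2181", group, topicMap).map(_._2)
val words = lines.flatMap(_.split(","))
val wordCounts = words.map { x => (x, 1L) }.reduceByKey(_ + _)
....
ssc.start()
ssc.awaitTermination()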
Apache Spark – Streaming word count example
Streaming with Spark and Kafka
6. Real-time “bidding”
Real Time “Bidding”
High level architecture
Akka
Persistence
Input
Output
Cassandra
Kafka
Training
Prediction
Scoring
Spark
Batch
Real Time
Action
Dispatch
Publish
Store
Journaling
Apache Kafka
Distributed messaging system
- Fast: High throughput for both publishing and subscribing
- Scalable: Very easy to scale out
- Durable: Supports persistence of messages
- Consumers are responsible for tracking their position in each log
Producer 1
Producer 2
Consumer
A
Consumer
B
Consumer
C
Partition 1
Partition 2
Partition 3
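The statement that consumers track their own position can be sketched in plain Scala. This is an in-memory model, not the Kafka client API: `Partition`, `Consumer`, `append` and `poll` are hypothetical names. The partition is an append-only log addressed by offset, and each consumer keeps its own offset, so the log needs no per-consumer state.

```scala
object PartitionLogSketch {
  // An append-only log: messages are only ever added at the end.
  final class Partition {
    private var messages: Vector[String] = Vector.empty

    // Returns the offset assigned to the appended message.
    def append(message: String): Long = {
      messages = messages :+ message
      messages.size - 1L
    }

    def read(fromOffset: Long, max: Int): Vector[String] =
      messages.slice(fromOffset.toInt, fromOffset.toInt + max)
  }

  // Each consumer remembers how far it has read.
  final class Consumer(partition: Partition) {
    private var offset: Long = 0L

    def poll(max: Int): Vector[String] = {
      val batch = partition.read(offset, max)
      offset += batch.size
      batch
    }
  }
}
```

Two consumers reading the same partition advance independently, which is why a slow consumer does not hold back a fast one.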
Apache Cassandra
Massively Scalable NoSQL datastore
- Elastic Scalability
- No single point of failure
- Fast, linearly scalable performance
1 Clients write to any Cassandra node
2 The coordinator node replicates to nodes and zones
3 Nodes return an ack to the client
4 Data is written to the internal commit log on disk
5 If a node goes offline, hinted handoff completes the write
when the node comes back up
- Regions = Datacenters
- Zones = Racks
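Step 5 of the write path above, hinted handoff, can be sketched in plain Scala. This is a strongly simplified in-memory model and not Cassandra's implementation: `Node`, `Coordinator` and `nodeRecovered` are hypothetical names, and consistency levels, the commit log and hint expiry are all left out.

```scala
object HintedHandoffSketch {
  final class Node(val name: String) {
    var up: Boolean = true
    private var data: Map[String, String] = Map.empty

    def write(key: String, value: String): Unit = {
      data = data.updated(key, value)
    }

    def read(key: String): Option[String] = data.get(key)
  }

  final class Coordinator(replicas: Seq[Node]) {
    // Writes that could not be delivered, kept per unavailable node.
    private var hints: Map[String, Vector[(String, String)]] = Map.empty

    // Returns the number of replicas that acknowledged the write.
    def write(key: String, value: String): Int = {
      val (alive, down) = replicas.partition(_.up)
      alive.foreach(_.write(key, value))
      down.foreach { node =>
        val pending = hints.getOrElse(node.name, Vector.empty)
        hints = hints.updated(node.name, pending :+ (key -> value))
      }
      alive.size
    }

    // When a node is back, replay its hints in order and forget them.
    def nodeRecovered(node: Node): Unit = {
      node.up = true
      hints.getOrElse(node.name, Vector.empty).foreach { case (k, v) => node.write(k, v) }
      hints -= node.name
    }
  }
}
```

The write succeeds on the replicas that are up, and the node that was offline receives the missed write once it recovers.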
Node
Node
Node
Node
Node
Node
Cluster
7. Live demo
THANK YOU!
Stefano Rocco - @whispurr_it
Roberto Bentivoglio - @robbenti
@DATABIZit
FAQ
We’re hiring!

How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
Claudio Di Ciccio
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 

Recently uploaded (20)

20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 

Stefano Rocco, Roberto Bentivoglio - Scala in increasingly demanding environments

  • 1. MILAN - 08TH OF MAY - 2015 PARTNERS Scala in increasingly demanding environments Stefano Rocco – Roberto Bentivoglio DATABIZ
  • 2. Agenda Introduction Command Query Responsibility Segregation Event Sourcing Akka persistence Apache Spark Real-time “bidding” Live demo (hopefully) FAQ
  • 4. The picture Highly demanding environments - Data is increasing dramatically - Applications are needed faster than ever - Customers are more demanding - Customers are becoming more sophisticated - Services are becoming more sophisticated and complex - Performance & Quality is becoming a must - Rate of business change is ever increasing - And more…
  • 5. Reactive Manifesto Introduction – The way we see Responsive Message Driven Resilient Elastic
  • 6. We need to embrace change! Introduction – The world is changing…
  • 7. Introduction - Real Time “Bidding” High level architecture Akka Persistence Input Output Cassandra Kafka Training Prediction Scoring Spark Batch Real Time Action Dispatch Publish Store Journaling
  • 9. Multi-tier stereotypical architecture + CRUD CQRS Presentation Tier Business Logic Tier Data Tier Integration Tier RDBMS Client Systems External Systems DTO/VO
  • 10. Multi-tier stereotypical architecture + CRUD CQRS - Pro - Simplicity - Tooling - Cons - Difficult to scale (RDBMS is usually the bottleneck) - Domain Driven Design not applicable (using CRUD)
  • 11. Think different! CQRS - Is there a different architecture model that does not rely heavily on: - CRUD - RDBMS transactions - J2EE/Spring technology stack
  • 12. Command and Query Responsibility Segregation Originated with Bertrand Meyer’s Command and Query Separation Principle “It states that every method should either be a command that performs an action, or a query that returns data to the caller, but not both. In other words, asking a question should not change the answer. More formally, methods should return a value only if they are referentially transparent and hence possess no side effects” (Wikipedia) CQRS
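To make the principle on slide 12 concrete, here is a minimal Scala sketch. The `Account` class and its members are hypothetical names chosen for illustration, not code from the talk: the command returns `Unit` and mutates state, while the query returns a value and changes nothing.

```scala
object CqsSketch {
  // Hypothetical example class, not from the original slides.
  final class Account {
    private var balance: Int = 0

    // Command: performs an action (mutates state) and returns nothing.
    def deposit(amount: Int): Unit = {
      require(amount > 0, "amount must be positive")
      balance += amount
    }

    // Query: returns data to the caller and has no side effects,
    // so asking twice gives the same answer.
    def currentBalance: Int = balance
  }
}
```

Calling `currentBalance` any number of times between two commands always yields the same value, which is the "asking a question should not change the answer" part of the definition.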
  • 13. Command and Query Responsibility Segregation (Greg Young) CQRS
  • 14. Available Services - The service has been split into: - Command → Write side service - Query → Read side service CQRS Change status Status changed Get status Status retrieved
  • 15. Main architectural properties - Consistency - Command → consistent by definition - Query → eventually consistent - Data Storage - Command → normalized - Query → denormalized - Scalability - Command → low transaction rate - Query → high transaction rate CQRS
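The split described on slides 14 and 15 could be sketched as follows. This is an illustrative toy under stated assumptions: `CommandService`, `QueryService`, `StatusChanged`, `drain` and `project` are hypothetical names, and the hand-over between the two sides is a plain method call rather than a message bus.

```scala
object CqrsSketch {
  // Hypothetical notification emitted by the write side.
  final case class StatusChanged(id: String, status: String)

  // Write side: accepts commands and records what changed.
  final class CommandService {
    private var pending = Vector.empty[StatusChanged]

    def changeStatus(id: String, status: String): Unit = {
      pending = pending :+ StatusChanged(id, status)
    }

    // Hands over the changes that have not yet reached the read side.
    def drain(): Vector[StatusChanged] = {
      val out = pending
      pending = Vector.empty
      out
    }
  }

  // Read side: a denormalized lookup table that is updated after the fact.
  final class QueryService {
    private var view = Map.empty[String, String]

    def project(changes: Seq[StatusChanged]): Unit =
      changes.foreach(c => view = view.updated(c.id, c.status))

    def getStatus(id: String): Option[String] = view.get(id)
  }
}
```

Until `project` has been called with the drained changes, `getStatus` still returns the old answer. That window is what "eventually consistent" means for the query side.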
  • 17. Storing Events… Event Sourcing Systems today usually rely on - Storing the current state - Using an RDBMS as the storage solution Architectural choices are often “RDBMS centric” Many systems need to store every event that occurred instead of storing only the updated state
  • 18. Commands vs Events Event Sourcing - Commands - Ask to perform an operation (imperative mood) - Can be rejected - Events - Something happened in the past (past tense) - Cannot be undone Command received → Command validation → Event persisted → State mutation
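The distinction on slide 18 can be expressed in types. In this sketch (all names are hypothetical, and persistence itself is left out), validating a command may fail, so it returns an `Either`, while applying an event always succeeds because the event already happened.

```scala
object CommandEventSketch {
  // Commands: requests in the imperative, which may be rejected.
  sealed trait Command
  final case class Deposit(amount: Int) extends Command
  final case class Withdraw(amount: Int) extends Command

  // Events: facts in the past tense, which cannot be undone.
  sealed trait Event
  final case class Deposited(amount: Int) extends Event
  final case class Withdrawn(amount: Int) extends Event

  // Command validation: Left rejects, Right yields the event to persist.
  def validate(balance: Int, cmd: Command): Either[String, Event] = cmd match {
    case Deposit(a) if a > 0                  => Right(Deposited(a))
    case Withdraw(a) if a > 0 && a <= balance => Right(Withdrawn(a))
    case other                                => Left(s"rejected: $other")
  }

  // State mutation: applying an event is a total function.
  def applyEvent(balance: Int, event: Event): Int = event match {
    case Deposited(a) => balance + a
    case Withdrawn(a) => balance - a
  }
}
```

In an event-sourced system the event returned by `validate` would be appended to the journal before `applyEvent` is called.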
  • 19. Command and Event sourcing Event Sourcing An informal and short definition... Append every command (or event) received (or generated) to a journal instead of storing the current state of the application!
  • 20. CRUD vs Event sourcing Event Sourcing Account created Deposited 100 EUR Withdrawn 40 EUR Deposited 200 EUR - CRUD - Account table keeps the current available amount (260) - Occurred events are stored in a separate table - Event Sourcing - The current state is kept in memory or rebuilt by processing all events - 100 – 40 + 200 => 260
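The arithmetic on slide 20 is a left fold over the journal. A minimal sketch, assuming hypothetical event types and an in-memory `List` standing in for the journal:

```scala
object AccountReplay {
  sealed trait Event
  case object AccountCreated extends Event
  final case class Deposited(eur: Int) extends Event
  final case class Withdrawn(eur: Int) extends Event

  // The journal from the slide, oldest event first.
  val journal: List[Event] =
    List(AccountCreated, Deposited(100), Withdrawn(40), Deposited(200))

  // Current state = fold of all events, starting from an empty account.
  def replay(events: Seq[Event]): Int =
    events.foldLeft(0) {
      case (_, AccountCreated)     => 0
      case (balance, Deposited(x)) => balance + x
      case (balance, Withdrawn(x)) => balance - x
    }
}
```

`AccountReplay.replay(AccountReplay.journal)` evaluates 0 + 100 - 40 + 200 and returns 260, the same figure the CRUD variant keeps in its account table.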
  • 21. Main properties - There is no delete - Performance and Scalability - “Append only” models are easier to scale - Horizontal Partitioning (Sharding) - Rolling Snapshots - No Impedance Mismatch - Event Log can bring great business value Event Sourcing
  • 23. Introduction We can think about it as AKKA PERSISTENCE = CQRS + EVENT SOURCING Akka Persistence
  • 24. Main properties - Akka persistence enables stateful actors to persist their internal state - Recover state after - Actor start - Actor restart - JVM crash - By supervisor - Cluster migration Akka Persistence
  • 25. Main properties - Changes are appended to storage - Nothing is mutated - High transaction rates - Efficient replication - Stateful actors are recovered by replaying stored changes - From the beginning or from a snapshot - Also provides P2P communication with at-least-once message delivery semantics Akka Persistence
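Recovery "from the beginning or from a snapshot" can be illustrated without any Akka dependency. The sketch below is a plain-Scala model of the idea, not the Akka Persistence API: `Deposited`, `Snapshot` and `recover` are hypothetical names, and the journal is an in-memory `Vector` where the element at index i carries sequence number i + 1.

```scala
object SnapshotRecovery {
  final case class Deposited(eur: Int)                      // hypothetical event
  final case class Snapshot(sequenceNr: Int, balance: Int)  // state as of sequenceNr

  // Replays only the events written after the snapshot;
  // with no snapshot, replays the whole journal from the beginning.
  def recover(journal: Vector[Deposited], snapshot: Option[Snapshot]): Int = {
    val start = snapshot.getOrElse(Snapshot(0, 0))
    journal.drop(start.sequenceNr).foldLeft(start.balance)(_ + _.eur)
  }
}
```

Both paths reach the same state; the snapshot only shortens the replay, which is why it is described as an optimization of recovery time.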
  • 26. Components - PersistentActor → persistent stateful actor - Command or event sourced actor - Persists commands/events to a journal - PersistentView → receives journaled messages written by another persistent actor - AtLeastOnceDelivery → at-least-once delivery, even if the sender or receiver JVM crashes - Journal → stores the sequence of messages sent to a persistent actor - Snapshot store → used to optimize recovery times Akka Persistence
  • 27. Code example (Akka Persistence)
        import akka.persistence.PersistentActor
        class BookActor extends PersistentActor {
          // Stable identifier under which this actor's events are journaled
          override val persistenceId: String = "book-persistence"
          override def receiveRecover: Receive = {
            case _ => // RECOVER AFTER A CRASH HERE...
          }
          override def receiveCommand: Receive = {
            case _ => // VALIDATE COMMANDS AND PERSIST EVENTS HERE...
          }
        }
        // where: type Receive = PartialFunction[Any, Unit]
  • 29. Apache Spark is a cluster computing platform designed to be fast and general-purpose Spark SQL Structured data Spark Streaming Real Time MLlib Machine Learning GraphX Graph Processing Spark Core Standalone Scheduler YARN Mesos Apache Spark The Stack
  • 30. Apache Spark The Stack - Spark SQL: It allows querying data via SQL as well as the Apache Hive variant of SQL (HQL) and supports many sources of data, including Hive tables, Parquet and JSON - Spark Streaming: Component that enables processing of live streams of data in an elegant, fault tolerant, scalable and fast way - MLlib: Library containing common machine learning (ML) functionality including algorithms such as classification, regression, clustering, collaborative filtering etc. to scale out across a cluster - GraphX: Library for manipulating graphs and performing graph-parallel computation - Cluster Managers: Spark is designed to efficiently scale up from one to many thousands of compute nodes. It can run over a variety of cluster managers including Hadoop YARN, Apache Mesos etc. Spark also includes a simple cluster manager of its own, called the Standalone Scheduler
  • 31. Apache Spark Core Concepts SparkContext Driver Program Worker Node Worker Node Executor Task Task Worker Node Executor Task Task
  • 32. Apache Spark Core Concepts - Every Spark application consists of a driver program that launches various parallel operations on the cluster. The driver program contains your application’s main function and defines distributed datasets on the cluster, then applies operations to them - Driver programs access Spark through the SparkContext object, which represents a connection to a computing cluster. - The SparkContext can be used to build RDDs (Resilient Distributed Datasets) on which you can run a series of operations - To run these operations, driver programs typically manage a number of executors running on the worker nodes
  • 33. Apache Spark RDD (Resilient Distributed Dataset) It is an immutable distributed collection of data, which is partitioned across machines in a cluster. It supports two types of operations: transformations and actions - Resilient: It can be recreated when data in memory is lost - Distributed: stored in memory across the cluster - Dataset: data that comes from a file or is created programmatically
  • 34. Apache Spark Transformations - A transformation is an operation such as map(), filter() or union() on an RDD that yields another RDD. - Transformations are lazily evaluated: they don’t run until an action is executed. - The Spark driver remembers the transformations applied to an RDD, so if a partition is lost, that partition can easily be reconstructed on some other machine in the cluster. (Resilient) - Resiliency is achieved via a Lineage Graph.
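Lazy evaluation and lineage can be demonstrated with a toy model in plain Scala. To be clear about the assumption: this is not Spark's RDD API. The hypothetical `Lazy` class below only stores the recipe for computing its data, so transformations build a longer recipe and actions run it from the start.

```scala
object LineageSketch {
  // Toy stand-in for an RDD: holds the lineage (a recipe), never the data itself.
  final class Lazy[A](compute: () => List[A]) {
    // Transformations: return a new recipe, run nothing yet.
    def map[B](f: A => B): Lazy[B] = new Lazy(() => compute().map(f))
    def filter(p: A => Boolean): Lazy[A] = new Lazy(() => compute().filter(p))

    // Actions: execute the whole recipe and return a value to the caller.
    def collect(): List[A] = compute()
    def count(): Int = compute().size
  }

  def fromList[A](data: List[A]): Lazy[A] = new Lazy(() => data)
}
```

Because every action recomputes from the recipe, losing an intermediate result costs nothing but time. Spark applies the same idea per partition, and adds caching so that frequently used results are not recomputed each time.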
  • 35. Apache Spark Actions - Compute a result based on an RDD and either return it to the driver program or save it to an external storage system. - Typical RDD actions are count(), first(), take(n)
  • 36. Apache Spark Transformations vs Actions RDD RDD RDD Value Transformations: define new RDDs based on the current one. E.g. map, filter, union etc. Actions: return values. E.g. count, sum, reduce, collect, etc.
  • 37. Apache Spark Benefits Scalable Can be deployed on very large clusters Fast In memory processing for speed Resilient Recover in case of data loss Written in Scala… has a simple high level API for Scala, Java and Python
  • 38. Apache Spark Lambda Architecture – One fits all technology! New data Batch Layer Speed Layer Serving Layer Data Consumers Query Spark Spark
  • 39. - Spark Streaming receives streaming input, and divides the data into batches which are then processed by the Spark Core Input data Stream Batches of input data Batches of processed data Spark Streaming Spark Core Apache Spark Speed Layer
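  • 40. Apache Spark – Streaming word count example: Streaming with Spark and Kafka
        import org.apache.spark.{SparkConf, SparkContext}
        import org.apache.spark.streaming.{Seconds, StreamingContext}
        import org.apache.spark.streaming.kafka.KafkaUtils
        val numThreads = 1
        val group = "test"
        // The consumer group name doubles as the topic list here
        val topicMap = group.split(",").map((_, numThreads)).toMap
        val conf = new SparkConf().setMaster("local[*]").setAppName("KafkaWordCount")
        val sc = new SparkContext(conf)
        val ssc = new StreamingContext(sc, Seconds(2)) // 2-second batches
        // Receiver-based stream via ZooKeeper; keep only the message value
        val lines = KafkaUtils.createStream(ssc, "localhost:2181", group, topicMap).map(_._2)
        val words = lines.flatMap(_.split(","))
        val wordCounts = words.map { x => (x, 1L) }.reduceByKey(_ + _)
        ....
        ssc.start()
        ssc.awaitTermination()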
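The micro-batch model behind slides 39 and 40 can be tried out without a cluster. The following sketch uses plain Scala collections as a stand-in, which is an assumption made for illustration: `countWords` and `process` are hypothetical helpers, and batches are cut by element count here, whereas Spark Streaming cuts them by time interval.

```scala
object MicroBatchWordCount {
  // Counts words in one batch, mirroring flatMap -> map -> reduceByKey.
  def countWords(batch: Seq[String]): Map[String, Long] =
    batch
      .flatMap(_.split(","))
      .map(word => (word, 1L))
      .groupBy(_._1)
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) }

  // Splits the input into fixed-size batches and processes each one independently.
  def process(lines: Seq[String], batchSize: Int): List[Map[String, Long]] =
    lines.grouped(batchSize).map(countWords).toList
}
```

Each batch produces its own result, so a word that appears in two batches is counted separately in each, just as `wordCounts` on the slide is computed per 2-second batch.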
  • 42. Real Time “Bidding” High level architecture Akka Persistence Input Output Cassandra Kafka Training Prediction Scoring Spark Batch Real Time Action Dispatch Publish Store Journaling
  • 43. Apache Kafka Distributed messaging system - Fast: High throughput for both publishing and subscribing - Scalable: Very easy to scale out - Durable: Supports persistence of messages - Consumers are responsible for tracking their position in each log Producer 1 Producer 2 Consumer A Consumer B Consumer C Partition 1 Partition 2 Partition 3
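The last point on slide 43, that consumers track their own position, is what lets several consumers read the same log independently. A toy in-memory model, assuming hypothetical `PartitionLog` and `Consumer` classes that are not part of the Kafka client API:

```scala
object LogOffsetSketch {
  // Append-only partition log: reading never removes messages.
  final class PartitionLog {
    private var messages = Vector.empty[String]

    def append(msg: String): Unit = {
      messages = messages :+ msg
    }

    def read(fromOffset: Int, max: Int): Vector[String] =
      messages.slice(fromOffset, fromOffset + max)
  }

  // Each consumer keeps its own offset; the log knows nothing about it.
  final class Consumer(log: PartitionLog) {
    private var offset = 0

    def poll(max: Int): Vector[String] = {
      val batch = log.read(offset, max)
      offset += batch.size
      batch
    }
  }
}
```

Two consumers created over the same log each see every message, at their own pace, because advancing one offset has no effect on the other.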
  • 44. Apache Cassandra Massively Scalable NoSQL datastore - Elastic Scalability - No single point of failure - Fast linear scale performance 1. Clients write to any Cassandra node 2. Coordinator node replicates to nodes and zones 3. Nodes return ack to client 4. Data written to internal commit log disk 5. If a node goes offline, hinted handoff completes the write when the node comes back up - Regions = Datacenters - Zones = Racks Node Node Node Node Node Node Cluster
  • 46. MILAN - 08TH OF MAY - 2015 PARTNERS THANK YOU! Stefano Rocco - @whispurr_it Roberto Bentivoglio - @robbenti @DATABIZit PARTNERS FAQ We’re hiring!

Editor's Notes

  1. Responsive -> The system responds in a timely manner if at all possible Elastic -> The system stays responsive under varying workload Resilient -> The system stays responsive in the face of failure Message Driven -> Reactive Systems rely on asynchronous message-passing to establish a boundary between components that ensures loose coupling, isolation, location transparency, and provides the means to delegate errors as messages
  2. Remember to mention and explain CRUD
  3. Simplicity - One could teach a Junior developer how to interact with a system built using this architecture in a very short period of time - the architecture is completely generic. Tooling (Framework) - For instance ORM Scaling - RDBMS are at this point not horizontally scalable and vertically scaling becomes prohibitively expensive very quickly DDD - CRUD => Anemic Model (object containing only data and not behavior)
  4. Method command => perform an action (MUTATES THE STATE, SO THERE IS A SIDE EFFECT) query => return data to the caller (NO SIDE EFFECT, IT IS REFERENTIALLY TRANSPARENT)
  5. In this slide you don’t need to introduce Event Sourcing but only to speak about command/write/left side vs query/read/right side
  6. Explain the meaning of Command and Query in other words/figures
  7. Main properties of CQRS
  8. An event is something that has happened in the past.
  9. Remember to speak about the append on journal
  10. Remind the audience that with an append-only journal there is no deletion; instead we have events with the opposite sign
  11. Main properties of CQRS