Did you miss Scala Days 2015 in San Francisco? Have no fear! BoldRadius was there and we've compiled the best of the best! Here are the highlights of a great conference.
Improving Mobile Payments With Real-time Spark (datamantra)
A talk about a real-world Spark Streaming implementation for improving the mobile payments experience. Presented at the Target data meetup in Bangalore by Madhukara Phatak on 22/08/2015.
Anatomy of Data Source API: A Deep Dive into the Spark Data Source API (datamantra)
In this presentation, we discuss how to build a data source from scratch using the Spark Data Source API. All the code discussed in this presentation is available at https://github.com/phatak-dev/anatomy_of_spark_datasource_api
We are a company driven by inquisitive data scientists who have developed a pragmatic, interdisciplinary approach, evolved over decades of working with over 100 clients across multiple industries. Combining data science techniques from statistics, machine learning, deep learning, decision science, cognitive science, and business intelligence with our ecosystem of technology platforms, we have produced novel solutions. Welcome to the Data Science Analytics team that can do it all, from architecture to algorithms.
Our practice delivers data-driven solutions, including descriptive, diagnostic, predictive, and prescriptive analytics. We employ a number of technologies in the area of Big Data and Advanced Analytics, such as DataStax (Cassandra), Databricks (Spark), Cloudera, Hortonworks, MapR, R, SAS, MATLAB, SPSS, and advanced data visualizations.
This presentation is designed to get Spark enthusiasts started; the course outline is below.
1. Introduction to Apache Spark
2. Functional Programming + Scala
3. Spark Core
4. Spark SQL + Parquet
5. Advanced Libraries
6. Tips & Tricks
7. Where do I go from here?
Lessons learned and the system built while solving the last-mile problem in machine learning: taking models to production. Used for the talk at http://sched.co/BLvf
Slides from my presentation on Lambda Architecture at Indix, presented at Fifth Elephant 2014.
It describes our experience using the Lambda Architecture at Indix to build a large-scale analytics system over unstructured, dynamically changing data sources using Hadoop, HBase, Scalding, Spark, and Solr.
How to Choose a Deep Learning Framework (Navid Kalaei)
The rise of neural networks has attracted a huge community of researchers and practitioners. However, not all newcomers are masters of deep learning, and the colorful array of frameworks can be confusing. In this presentation, I demystify the leading deep learning frameworks and provide a guideline on how to choose the most suitable option.
When Apache Spark Meets TiDB, with Xiaoyu Ma (Databricks)
During the past 10 years, big-data storage layers have mainly focused on analytical use cases. For such cases, users usually offload data onto a Hadoop cluster and perform queries on HDFS files. People struggle with modifications to append-only storage and with maintaining fragile ETL pipelines.
On the other hand, although Spark SQL has proven to be an effective parallel query processing engine, some tricks common in traditional databases are unavailable due to the characteristics of the underlying storage. TiSpark sits directly on top of the storage engine of a distributed database (TiDB), extends Spark SQL's planning with its own extensions, and utilizes unique features of the database storage engine to achieve functionality not possible for Spark SQL on HDFS. With TiSpark, users can perform queries directly on changing, fresh data in real time.
The takeaways from this talk are twofold:
— How to integrate Spark SQL with a distributed database engine and the benefit of it
— How to leverage Spark SQL's experimental methods to extend its capabilities.
The story of one project's architecture evolution from zero to the Lambda Architecture. Also includes how we scaled the cluster once the architecture was in place.
Contains performance charts after every architecture change.
Infrastructure Provisioning in the Context of an Organization (Katarína Valaliková)
Nowadays companies and organizations migrate to and operate their infrastructure in virtual environments (Cloud/IaaS). To operate efficiently and adapt to everyday changes and requirements, they need to leverage automation that handles not only configuration, but orchestration, backup/recovery, reporting, and monitoring as well. All of these processes are tied to the organization and are used by the people in it.
Imagine a tool that can automate and simplify the whole process around IaaS: spinning up a whole project's infrastructure, setting it up, helping to operate it, assigning accounts and permissions, and deprovisioning when the project ends. In this presentation we propose such a solution, using OpenStack for the private cloud infrastructure, with Chef and midPoint as the orchestrators. And we will try to cover a little bit more: think about user management and the connection between users/employees and the infrastructure...
Erich Ess, CTO of SimpleRelevance, introduces the Spark distributed computing platform and explains how to integrate it with Cassandra. He demonstrates running a distributed analytic computation on a data set stored in Cassandra.
This presentation was first held at the OpenSQL Camp 2009, part of the FrOSCon conference in St. Augustin, Germany. It gives a nice overview of the project, the technology, and how it will progress. Find more information at http://www.blackray.org
A brief history of the RDF4J Project and an overview of tools and code examples that demonstrate how to work with it in your applications.
Slides accompanying the Lotico Webinar event on May 14, 2020 - see http://www.lotico.com/index.php/Eclipse_RDF4J_-_Working_with_RDF_in_Java
Senior Software Developer and Lead Trainer Alejandro Lujan explains pattern matching, a very powerful and elegant feature of Scala, using a series of examples.
Learn more about this topic and find more presentations on Scala at:
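A minimal sketch of the kind of pattern matching the talk covers (the names and shapes here are illustrative, not taken from the talk itself):

```scala
// A small sealed hierarchy to match against (illustrative names).
sealed trait Shape
case class Circle(radius: Double) extends Shape
case class Rectangle(width: Double, height: Double) extends Shape

// Pattern matching deconstructs case classes and binds their fields.
def area(shape: Shape): Double = shape match {
  case Circle(r)       => math.Pi * r * r
  case Rectangle(w, h) => w * h
}
```

Because `Shape` is sealed, the compiler can warn if a match forgets one of the cases.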
In this video, senior software developer Alejandro Lujan explores the features of the Scala language that allow you to write clean, powerful code more concisely.
This presentation explores the benefits of functional programming, especially with respect to reliability. It presents a sample of types that allow many program invariants to be enforced by compilers. We also discuss the industrial adoption of functional programming, and conclude with a live coding demo in Scala.
Alejandro Lujan introduces us to String Interpolation, a feature of Scala that allows us to have placeholders inside of string definitions, and explains why you would want to use them. Video included!
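The interpolators Alejandro describes can be sketched in a couple of lines (a minimal example with made-up values):

```scala
// The s-interpolator substitutes expressions into the string.
val name = "Scala"
val version = 2
val greeting = s"Hello, $name ${version + 9}!" // "Hello, Scala 11!"

// The f-interpolator adds printf-style, type-checked formatting.
val pi = f"pi is approximately ${math.Pi}%.2f"
```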
Senior Software Developer Alejandro Lujan discusses the collections API in Scala and provides some insight into what it can do, with some examples.
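A taste of the collections API in question (illustrative data, not examples from the talk):

```scala
// A few common collection operations.
val nums = List(1, 2, 3, 4, 5)

val doubled = nums.map(_ * 2)         // List(2, 4, 6, 8, 10)
val evens   = nums.filter(_ % 2 == 0) // List(2, 4)
val total   = nums.foldLeft(0)(_ + _) // 15

// groupBy partitions the list by a key, here even vs. odd.
val grouped = nums.groupBy(_ % 2 == 0)
```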
This presentation provides an overview of value classes in Scala, explained in the video on the last slide by Alejandro Lujan. He explains why you would want to use them, outlines the restrictions associated with them, and shows examples of how to use them. Value classes are a mechanism Scala provides to create a certain kind of wrapper class with memory and performance optimizations. In this video, we show a use case for tiny types with value classes.
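The tiny-types use case can be sketched like this (illustrative names, not the talk's own code):

```scala
// Extending AnyVal makes this a value class: in most uses the wrapper
// compiles down to the underlying Double, avoiding an allocation.
class Meters(val value: Double) extends AnyVal {
  def +(other: Meters): Meters = new Meters(value + other.value)
}

// A second tiny type; Meters and Seconds can no longer be mixed up
// at call sites, even though both wrap a Double.
class Seconds(val value: Double) extends AnyVal

def distanceAfter(speed: Double, t: Seconds): Meters =
  new Meters(speed * t.value)
```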
In his latest Typesafe tutorial video, Alejandro Lujan explains for expressions in Scala, and provides an example of them in action.
For expressions are a very useful construct that can simplify manipulation of collections and several other data structures. They can be used in place of nested for loops, or to replace calls to map and flatMap in non-collection structures.
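Both uses mentioned above can be sketched in a few lines (made-up data; the desugaring shown is the standard one):

```scala
// A for expression over two generators with a guard...
val pairs = for {
  x <- List(1, 2, 3)
  y <- List(10, 20)
  if x % 2 == 1
} yield (x, y)

// ...desugars to withFilter / flatMap / map:
val same = List(1, 2, 3).withFilter(_ % 2 == 1)
  .flatMap(x => List(10, 20).map(y => (x, y)))

// For expressions also work on non-collection types such as Option.
val sum = for {
  a <- Option(1)
  b <- Option(2)
} yield a + b
```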
As a full-time Scala developer, I often find myself talking about Scala and functional programming in different kinds of situations, ranging from meeting a friend working in J2EE, Ruby or C++, to dedicated Scala Meetups aiming to promote deeper understanding of the language. However, something occurred to me lately. By hanging out with people who have some Scala knowledge or experience, I am somewhat holding on to a safe place. By presenting only to people who are curious about Scala, I'm preaching to the converted.
To make a long story short, I recently attempted to get out of my comfort zone by presenting on why the transition from Java to Scala makes total sense (from a Java developer's point of view). The presentation was delivered to approximately 60 experienced Java programmers (with almost no prior Scala knowledge) gathered in one room for a Lunch & Learn. Here are my slides.
Punishment Driven Development #agileinthecity (Louise Elliott)
What is the first thing we do when a major issue occurs in a live system? Sort it out of course. Then we start the hunt for the person to blame so that they can suffer the appropriate punishment. What do we do if a person is being awkward in the team and won’t agree to our ways of doing things? Ostracise them of course, and see how long it is until they hand in their notice – problem solved.
This highly interactive talk delves into why humans have this tendency to blame and punish. It looks at real examples of punishment within the software world and the results which were achieved. These stories not only cover managers punishing team members but also punishment within teams and self-punishment. We are all guilty of some of the behaviours discussed.
This is aimed at everyone involved in software development. It covers:
• Why we tend to blame and punish others.
• The impact of self-blame.
• The unintended (but predictable) results from punishment.
• The alternatives to punishment, which get real results.
The world has changed, and having one huge server won't do the job anymore. When you're talking about vast amounts of data that grow all the time, the ability to scale out is your savior. Apache Spark is a fast, general engine for big data processing, with built-in modules for streaming, SQL, machine learning, and graph processing.
This lecture covers the basics of Apache Spark and distributed computing, and the development tools needed for a functional environment.
Stream, Stream, Stream: Different Streaming Methods with Spark and Kafka (DataWorks Summit)
At NMC (Nielsen Marketing Cloud) we provide our customers (marketers and publishers) real-time analytics tools to profile their target audiences.
To achieve that, we need to ingest billions of events per day into our big data stores, and we need to do it in a scalable yet cost-efficient manner.
In this session, we will discuss how we continuously transform our data infrastructure to support these goals.
Specifically, we will review how we went from CSV files and standalone Java applications all the way to multiple Kafka and Spark clusters, performing a mixture of Streaming and Batch ETLs, and supporting 10x data growth.
We will share our experience as early-adopters of Spark Streaming and Spark Structured Streaming, and how we overcame technical barriers (and there were plenty...).
We will present a rather unique solution of using Kafka to imitate streaming over our Data Lake, while significantly reducing our cloud services' costs.
Topics include :
* Kafka and Spark Streaming for stateless and stateful use-cases
* Spark Structured Streaming as a possible alternative
* Combining Spark Streaming with batch ETLs
* "Streaming" over Data Lake using Kafka
What is Distributed Computing, and Why We Use Apache Spark (Andy Petrella)
In this talk we introduce the notion of distributed computing and then cover Spark's advantages.
The Spark core content is very brief because the whole explanation was done live using a Spark Notebook (https://github.com/andypetrella/spark-notebook/blob/geek/conf/notebooks/Geek.snb).
This talk was given jointly by @xtordoir and myself at the University of Liège, Belgium.
Data Science Salon: A Journey of Deploying a Data Science Engine to Production (Formulatedby)
Presented by Mostafa Madjipour, Senior Data Scientist at Time Inc.
Next DSS NYC Event 👉 https://datascience.salon/newyork/
Next DSS LA Event 👉 https://datascience.salon/la/
Reducing the gap between R&D and production is still a challenge for data science and machine learning engineering groups in many companies. Typically, data scientists develop data-driven models in a research-oriented programming environment (such as R or Python). Next, the data/machine learning engineers rewrite the code (typically in another programming language) in a way that is easy to integrate with production services.
This process has some disadvantages: 1) it is time-consuming; 2) it slows the data science team's impact on the business; 3) code rewriting is prone to errors.
A possible solution to overcome these disadvantages is a deployment strategy that easily embeds or transforms the model created by the data scientists. Packages such as jPMML, MLeap, PFA, and PMML, among others, were developed for this purpose.
In this talk we review some of the mentioned packages, motivated by a project at Time Inc. The project involves development of a near real-time recommender system, which includes a predictor engine, paired with a set of business rules.
Scalable Monitoring Using Prometheus with Apache Spark Clusters, with Diane F... (Databricks)
As Apache Spark applications move to a containerized environment, there are many questions about how to best configure server systems in the container world. In this talk we will demonstrate a set of tools to better monitor performance and identify optimal configuration settings. We will demonstrate how Prometheus, a project that is now part of the Cloud Native Computing Foundation (CNCF: https://www.cncf.io/projects/), can be applied to monitor and archive system performance data in a containerized Spark environment.
In our examples, we will gather spark metric output through Prometheus and present the data with Grafana dashboards. We will use our examples to demonstrate how performance can be enhanced through different tuned configuration settings. Our demo will show how to configure settings across the cluster as well as within each node.
JPoint'15: Mom, I so wish Hibernate for my NoSQL database... (Alexey Zinoviev)
Alexey Zinoviev presented this paper at the JPoint'15 conference (javapoint.ru/talks/#zinoviev).
This paper covers the following topics: Java, JPA, Morphia, Hibernate OGM, Spring Data, Hector, Kundera, NoSQL, Mongo, Cassandra, HBase, Riak.
AWS Big Data Demystified #1: Big Data Architecture Lessons Learned (Omid Vahdaty)
A quick overview of the big data technologies that were selected or disregarded in our company.
The video: https://youtu.be/l5KmaZNQxaU
Don't forget to subscribe to the YouTube channel.
The website: https://amazon-aws-big-data-demystified.ninja/
The meetup : https://www.meetup.com/AWS-Big-Data-Demystified/
The facebook group : https://www.facebook.com/Amazon-AWS-Big-Data-Demystified-1832900280345700/
In this webinar, Michael Nash of BoldRadius explores the Typesafe Reactive Platform.
The Typesafe Reactive Platform is a suite of technologies and tools that support the creation of reactive applications, that is, applications that handle the kind of responsiveness requirements, data volume, and user load that were out of practical reach only a few years ago.
From analysis of the human genome to wearable technology to communications at a massive scale, BoldRadius has the premier team of experts with decades of collective experience in designing and building these types of applications, and in helping teams adopt these tools.
Patrick Premont of BoldRadius presented this talk at Scala By The Bay 2015.
Why do data structure lookups often return Options? Could we safely eliminate all the recovery code that we hope is never called? We will see how Scala’s type system lets us express referential integrity constraints to achieve unparalleled reliability. We apply the technique to in-memory data structures using the Total-Map library and consider how to extend the benefits to persisted data.
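The Total-Map library is not reproduced here, but the core idea, making a lookup total so the Option and its recovery code disappear, can be sketched with a sealed key type (illustrative names, not the library's API):

```scala
// A closed set of keys: the compiler knows every possible case.
sealed trait ConfigKey
case object Host extends ConfigKey
case object Port extends ConfigKey

// Because the match covers every key, lookup returns a plain String,
// not an Option[String]: there is no missing-key case to recover from.
def lookup(key: ConfigKey): String = key match {
  case Host => "localhost"
  case Port => "8080"
}
```

Adding a new key without handling it turns a would-be runtime failure into a compile-time exhaustiveness warning.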
How You Convince Your Manager To Adopt Scala.js in Production (BoldRadius Solutions)
Dave Sugden and Katrin Shechtman of BoldRadius presented this talk at Scala By The Bay 2015.
The talk presents a fully functional sample application developed with Scala.js, scalatags, scalacss, and other Scala and Typesafe technologies. We aim to show the pros and cons of a Scala coast-to-coast approach to web application development and encourage people not to shy away from asking difficult questions challenging this approach. Participants can expect to gain a clear view of the current state of Scala-based client-side technologies and take away an activator template with application code that can be used as a basis for technical discussions with their peers and managers.
Domain Driven Design with Onion Architecture is a powerful combination of Architecture Patterns that can dramatically improve code quality and can help you learn a great deal about writing "clean" code.
Senior Software Developer and Trainer Alejandro Lujan explains sealed classes, why they are needed, and how to implement them in Scala. Read more on the BoldRadius blog: http://boldradius.com/blog-post/VBB3uzIAADYAiiSy/sealed-classes-in-scala
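The key property of sealed classes, compiler-checked exhaustiveness, can be sketched as follows (illustrative names):

```scala
// "sealed" means all subtypes must live in this file, so the
// compiler can warn when a match is not exhaustive.
sealed trait PaymentStatus
case object Pending  extends PaymentStatus
case object Settled  extends PaymentStatus
case object Declined extends PaymentStatus

// Omitting a case here would produce a non-exhaustive-match warning.
def describe(s: PaymentStatus): String = s match {
  case Pending  => "waiting"
  case Settled  => "done"
  case Declined => "rejected"
}
```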
BoldRadius' Senior Software Developer Alejandro Lujan explains how to use higher order functions in Scala and illustrates them with some examples.
See the accompanying video at www.boldradius.com/blog
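Higher-order functions of both kinds, taking a function and returning one, can be sketched in a few lines (illustrative example, not from the talk):

```scala
// A higher-order function that takes a function as a parameter
// and returns a new function that applies it twice.
def twice(f: Int => Int): Int => Int = x => f(f(x))

// A higher-order function that returns a function.
def adder(n: Int): Int => Int = x => x + n

// Composing the two: add 3, then add 3 again.
val addSix = twice(adder(3))
```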
Mike Kelland and the BoldRadius team lead an interactive discussion at Scala Days 2014 in Berlin on adopting the Typesafe Reactive Platform and creating change in your organization.
We explored approaches to solving the pain points that may arise, presenting tools, strategies and resources designed to help you adopt the Typesafe Reactive Platform today.
BoldRadius senior developer Alejandro Lujan shows us some examples of using case classes in Scala, explains why they are beneficial, and shares some items to be mindful of. Learn more about using case classes in Scala on our blog.
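The benefits mentioned can be sketched quickly (illustrative data):

```scala
// Case classes get equals, hashCode, toString, copy, and pattern
// matching support for free.
case class User(name: String, age: Int)

val a = User("Ada", 36)
val b = a.copy(age = 37)               // non-destructive update
val structural = a == User("Ada", 36)  // structural equality, not reference equality
```

One thing to be mindful of: equality is structural, so two independently constructed `User("Ada", 36)` values are equal.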
2. Who?
● BoldRadius Solutions
○ boldradius.com
○ Typesafe Partner
○ Scala, Akka and Play specialists
○ Ottawa, Saskatoon, San Francisco, Boston, Chicago, Montreal, New York, Toronto
● Michael Nash, VP Capabilities
● Adam Murray, VP Business
4. What?
What was ScalaDays all about?
● Held in San Francisco March 16th through 18th
○ About 800 attendees
○ Three categories (beginner, intermediate, advanced)
○ Four tracks, 55 presentations, three keynotes
● Followed by two intensive training days
○ All the Typesafe courses offered
7. Keynotes
● Martin Odersky, Chief Architect & Co-Founder at Typesafe
○ Scala - where it came from, where it’s going
● Danese Cooper, Distinguished Member of Technical Staff - Open Source at PayPal
○ Open Languages
● Dianne Marsh, Director of Engineering Tools at Netflix
○ Technical Leadership from wherever you are
9. Highlights
Too many great sessions to summarize them all; here are a few recurring themes…
○ Distributed Application Design and Development
○ Big Data/Fast Data
○ Types and Safer Scala
○ Performance and Scalability
10. Distributed Applications
● Life Beyond the Illusion of Present
● Reactive Reference Architectures
● Akka in Production: Why and How
● Easy Scalability with Akka
● Scalable task distribution with Scala, Akka and Mesos
● A Guided Tour of a Distributed Application
11. Performance and Scalability
● Scala Collections and Performance
● Shattering Hadoop’s Large-Scale Sort Record with Spark and Scala
● Type-safe off-heap memory for Scala
● Akka in Production: Why and How
● Easy Scalability with Akka
● The JVM Backend and Optimizer in Scala 2.12
12. Big Data, Fast Data
● Shattering Hadoop’s Large-Scale Sort Record with Spark and Scala
● Scala - The Real Spark of Data Science
● Apache Spark: A Large Community Project in Scala
● S3 at Scala: Async Scala Client with Play Iteratees and Composable Operations
● The Unreasonable Effectiveness of Scala for Big Data
● Scalable task distribution with Scala, Akka and Mesos
13. Types and Safer Scala
● Keynote: Scala - where it came from, where it’s going
● Towards a Safer Scala
● Leveraging Scala Macros for Better Validation
● Type-level programming in Scala 101
● Improving Correctness with Types
● Happy Paths: Functional Constructs in the Wild
● The Scalactic Way
● Delimited dependency-typed monadic checked exceptions
14. The Rest...
Many excellent talks were outside these categories
● Why Scala.js
● Reactive Slick for Database Programming
● Exercise in machine learning
● Functional Natural Language Processing
● If I Only Had a Brain...in Scala
● Akka HTTP: the Reactive Web Toolkit
● many many others
15. Highlights of Specific Sessions
A quick taster of what was in some of the more popular sessions...
17. Where it’s from, where it’s going
“Scala is a gateway drug to Haskell” (in actual fact it’s going well beyond Haskell.)
Slides: http://www.slideshare.net/Odersky/scala-days-san-francisco-45917092
Came from a practical combination of OOP and functional programming
- Funny story about hipster syntax (..) instead of [..], <..> instead of [..], ??? instead of <..>
- Trend in Type Systems
- Scala JS is no longer experimental
- TASTY: new scala-specific platform
- Introduction to DOT
- Type Parameters
- Better treatment of effects with implicit Functions instead of Monads
18. TASTY
Scala faces challenges:
● binary compatibility
● having to pick a platform: JDK (7, 8, 9, 10, ?) or JavaScript.
Proposing a Scala-specific platform called TASTY (serialized typed abstract syntax trees) as an intermediate representation before bytecode: it carries type metadata, can be compiled with different versions of the JDK, and can target JavaScript.
19. TASTY will enable:
● instrumentation
● optimization
● code analysis
● refactoring
● publish once run anywhere
● automated remapping to solve binary compatibility issues.
21. Explorations:
Hope to find something cooler than Monads to handle effects.
● Monads don’t commute
● Require Monad transformers for composition
● Monad transformers make Martin’s head explode
22. Toward a Safer Scala
Leif Wickland
http://tinyurl.com/sd15lint
● Scalac enables some error-prone code.
○ head of empty List?
● Using Static Analysis to detect errors early
● IDE based solutions
○ Inconsistencies
○ If not in release build process, doesn’t exist
● Web-based analysis
○ outside of compile loop
○ relatively immature analysis
25. Life Beyond the Illusion of Present
Jonas Bonér:
The idea of the present is an illusion. Everything we see, hear and feel is just an echo from the past. But this illusion has influenced us and the way we view the world in so many ways.
There is no present; all we have is facts derived from the merging of multiple pasts. The truth is closer to Einstein's physics, where everything is relative to one's perspective. As developers we need to wake up and break free from the perceived reality of living in a single globally consistent present.
26. The advent of multicore and cloud computing architectures meant that most applications today are distributed systems (multiple cores separated by the memory bus, or multiple nodes separated by the network), which puts a harsh end to this illusion.
The only way to design truly scalable and performant systems that can construct a sufficiently consistent view of history, and thereby our local "present", is to treat time as a first-class construct in our programming model and to model the present as facts derived from the merging of multiple concurrent pasts.
27. How do we deal with failure and communication unreliability in real life?
Confirmation and repetition
We can’t force the world into a globally consistent present (CRUD).
Mentioned 2 paradigms/theories:
● CALM (consistency as logical monotonicity)
● CRDT (Commutative Replicated Data Type)
CRDTs are eventually consistent data types that:
● minimize contention and coordination in a distributed system
● offer rich data types: sets, maps, graphs
● use a monotonic merge function: all state change is monotonically increasing; there is no way back
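The monotonic merge idea can be sketched as a grow-only counter, one of the simplest CRDTs (this is a from-scratch sketch, not Akka's data replication API):

```scala
object Crdts {
  // Minimal grow-only counter (G-Counter) sketch: each node only bumps
  // its own slot, and merge takes the per-node max, so merging is
  // commutative, associative and idempotent -- replicas can sync in any
  // order without coordination, and state only ever grows.
  final case class GCounter(counts: Map[String, Long] = Map.empty) {
    def increment(node: String): GCounter =
      GCounter(counts.updated(node, counts.getOrElse(node, 0L) + 1))

    def merge(that: GCounter): GCounter =
      GCounter((counts.keySet ++ that.counts.keySet).iterator.map { k =>
        k -> math.max(counts.getOrElse(k, 0L), that.counts.getOrElse(k, 0L))
      }.toMap)

    def value: Long = counts.values.sum
  }
}
```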
29. ● Using Wrapper Types (aka Tiny Types) instead of primitives
● Never use null, or throw Exceptions
● use === org.scalactic.TypeCheckedTripleEquals
○ requires the types of the two values compared to be in a subtype/supertype relationship
● Use non-empty lists when a list must be populated (org.scalactic.Every)
● Use Type Tags (à la Shapeless, Scalaz)
● Use Path Dependent Types
● Other reading
○ Self recursive types
○ Phantom Types
○ Shapeless
○ Scalactic
Types: Defensive Programming. Fail Fast. Design By Contract.
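The wrapper-type bullet in action. `CustomerId` and `OrderId` are hypothetical names for illustration; the point is that two same-shaped primitives become incompatible types, so swapped arguments fail at compile time rather than corrupting data:

```scala
object TinyTypes {
  // Wrapper ("tiny") types stop same-shaped primitives from being mixed
  // up; extending AnyVal avoids the boxing cost in most call sites.
  final case class CustomerId(value: Long) extends AnyVal
  final case class OrderId(value: Long) extends AnyVal

  def describe(c: CustomerId, o: OrderId): String =
    s"order ${o.value} for customer ${c.value}"

  // describe(OrderId(1), CustomerId(2))  // does not compile: types swapped
}
```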
30. Function Passing Style
Heather Miller:
● A new programming model called function passing designed to overcome many of
imperative / weakly-typed issues found in traditional “big data” processing
systems.
● Provides a more principled substrate on which to build data-centric distributed
systems.
● Pass safe, well-typed serializable functions to immutable distributed data
● Based on her work on Pickling and Spores
● Uses Spores (Serializable Functions) for a distributed model.
● Kind of an inverse of the Actor Model
● Stateless: data is stationary, functions are passed around.
● Uses data silos accessed through a SiloRef.
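A toy sketch of the model (this is NOT the real Spores/SiloRef API, just an illustration of the idea): the data stays stationary inside a "silo"; callers send functions to it and get back a reference to the derived, equally stationary result. In the real system the functions would be spores, i.e. closures checked for safe serializability.

```scala
object FunctionPassing {
  // Toy silo: holds immutable data in place; only functions travel.
  final class Silo[T] private (private val data: T) {
    // Apply a (in the real model: serializable, spore-checked) function
    // to the stationary data, producing a new silo.
    def apply[U](f: T => U): Silo[U] = new Silo(f(data))
    def unwrap: T = data
  }
  object Silo {
    def apply[T](data: T): Silo[T] = new Silo(data)
  }
}
```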
31. The Unreasonable Effectiveness of Scala for Big Data
Dean Wampler
● How Hadoop Works - Map Reduce
○ Problems
■ Hard to implement algorithms
■ The Hadoop API is horrible
● Scalding
○ An improved Hadoop API in Scala
○ Problems
■ Still uses a batch mode
● Spark
○ An elegant, functional API
○ Still in batch mode, but with mini-batches that approach real time.
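The "elegant, functional API" point is easiest to see in the classic word count. Sketched here on a local Seq so it is self-contained (on Spark, `sc.textFile(...)` yields an RDD with the same `flatMap`/`map` operators, and the `groupBy` + sum step is a single `reduceByKey(_ + _)`):

```scala
object WordCountSketch {
  // The flatMap / map / reduceByKey pipeline Spark exposes on RDDs,
  // expressed on plain Scala collections -- the functional style the
  // talk contrasts with pages of raw Hadoop MapReduce boilerplate.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))            // line -> words
      .map(word => (word, 1))              // word -> (word, 1)
      .groupBy(_._1)                       // group pairs by word
      .map { case (word, pairs) => word -> pairs.map(_._2).sum }
}
```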
32. Akka HTTP: The Reactive Web Toolkit
Roland Kuhn
● Replaces Spray
● Uses Akka Streams
○ Sources emit values to the stream
○ Sinks receive values, act on them
○ Sources can compose using Zip and Graph shapes
● “The pinball interpreter”
○ produce data
○ move downstream through transformations
○ get to the effect
○ go up and ask for more data
○ Filters interrupt the flow before getting to the effect, make the pinball go back
upstream.
● A live coded demonstration of using Streams and Http
● Expected timeline for Streams - 4 weeks
● Expected timeline for Http - 8 weeks
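The "pinball" loop described above can be sketched in a few lines. This is an illustrative toy, not Akka Streams' actual interpreter: demand pulls one element from the source, it falls downstream through a transformation to the effect (the sink), then demand goes back upstream; a filtering stage sends the ball back up before any effect runs.

```scala
object Pinball {
  // Toy pinball interpreter: returns how many elements reached the effect.
  def run[A, B](source: Iterator[A], stage: A => Option[B], sink: B => Unit): Int = {
    var reachedEffect = 0
    while (source.hasNext) {              // sink signals demand upstream
      stage(source.next()) match {        // element moves downstream
        case Some(b) =>
          sink(b)                         // the ball hits the effect
          reachedEffect += 1
        case None => ()                   // filtered: ball goes back upstream
      }
    }
    reachedEffect
  }
}
```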
33. The Scalactic Way
Bill Venners
http://www.slideshare.net/bvenners/the-scalactic-way
ScalaTest: quality through tests
Scalactic: quality through types
SuperSafe: quality through static analysis
36. Reactive Slick for Database Programming
Stefan Zeiger
http://slick.typesafe.com/talks/scaladays2015sf/Reactive_Slick_for_Database_Programming.pdf
Slick 3.0
● JDBC is inherently blocking (and blocking ties up threads)
● Traditional Model
○ Fully synchronous
○ One thread per web request
○ Contention for Connections (getConnection blocks)
○ Database back-pressure creates more blocked threads
○ Doesn’t scale
37. The new Slick architecture introduces a new data type to provide
asynchronous database I/O:
● based on the State, I/O and Free monads
● returns a Future[R]
● runs on its own ExecutionContext, so the calling thread is never blocked
● works with akka-streams to create back-pressure, so the database only sends as much data as the client can process
● for performance it pre-fetches some data, keeping the client busy while it waits for the next portion from the database
sealed trait DBIOAction[+R, +S <: NoStream, -E <: Effect] {
  def map[R2](f: R => R2)(implicit executor: ExecutionContext): DBIOAction[R2, NoStream, E]
  def flatMap[R2, S2 <: NoStream, E2 <: Effect](f: R => DBIOAction[R2, S2, E2])(implicit executor: ExecutionContext): DBIOAction[R2, S2, E with E2]
  ...
}
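The composition pattern this enables can be sketched with plain Futures (the `query` function is a stand-in, not a real Slick call): "database" work runs on its own ExecutionContext so the caller's thread is never blocked, steps compose monadically in a for-comprehension, and blocking happens only at the outermost edge, if at all.

```scala
import java.util.concurrent.Executors
import scala.concurrent.duration._
import scala.concurrent.{Await, ExecutionContext, Future}

object AsyncDbSketch {
  // Dedicated pool for "database" work, mirroring Slick 3.0's separate
  // ExecutionContext for blocking JDBC calls.
  private val dbEc: ExecutionContext =
    ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(4))

  // Stand-in for a blocking JDBC call; "result" is just the query length.
  private def query(sql: String): Future[Int] =
    Future(sql.length)(dbEc)

  def runComposed(): Int = {
    implicit val ec: ExecutionContext = dbEc
    // Steps compose monadically, like chained DBIOActions.
    val action = for {
      a <- query("select 1")   // 8 characters
      b <- query("select 22")  // 9 characters
    } yield a + b
    // Block only here, at the edge of the program (for the demo).
    Await.result(action, 5.seconds)
  }
}
```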
39. Easy Scalability with Akka
Michael Nash
● Reviewed Akka, CQRS, ES
● Introduced Distributed DDD
● Identical clustered system with DDDD and Without
● Gatling performance tests on both
40. ConductR
● Application manager that empowers ops to deploy distributed systems
● Uses Akka, Play, Akka Streams, Akka Cluster, FSM, Akka Data Replication
● How can we run cluster based apps ensuring
the seed nodes are started first?
○ State replicated using Data
Replication
● How can we consolidate logging?
○ Using Akka Streams
● How can we avoid batching?
○ Use Event Driven Architecture
● How can we monitor/test multiple nodes?
○ Use the visualizer built into ConductR
● How can we share state among the nodes?
○ Use Akka Data Replication