BugSense is a company that analyzes data from 400 million mobile devices using custom databases built with Erlang, C, and Lisp. They process over 120,000 writes per second using their distributed, concurrent database called LDB. LDB is an in-memory time-series stream processor that allows for complex event processing and distributed, modular architecture through Erlang nodes. They use Lisp for custom data processing and views due to its expressive power for parallel computing on large datasets.
By Eduardo Lima.
If you are running Linux on an Intel GPU, chances are that your graphics driver just got much better. Mesa, the most popular open source OpenGL implementation, has got a new intermediate language to represent GLSL shader programs. It is called NIR, and is based on modern knowledge on compilers and GPU architecture. The Intel i965 driver is fully powered by NIR now, after support to non-scalar shaders has been recently added.
This talk will give the audience a tour around the work done during 2015, that culminated in the rewrite of the i965 non-scalar backend to use NIR, and was released in Mesa 11.0. It will present a brief technical overview of how NIR integrates into a Mesa backend, some important considerations, and the main challenges we faced. It will also illustrate the benefits of the new backend by comparing to the old one in terms of performance and backend code complexity.
(c) 2016 FOSDEM VZW
CC BY 2.0 BE
https://archive.fosdem.org/2016/
Good news, everybody! Guile 2.2 performance notes (FOSDEM 2016)Igalia
By Andy Wingo.
With the new compiler and virtual machine in Guile 2.2, Guile hackers need to update their mental performance models. This talk will give a bit of a state of the union of Guile performance, with an updated overview of the cost of various kinds of abstractions. Sometimes abstraction is free!
(c) 2016 FOSDEM VZW
CC BY 2.0 BE
https://archive.fosdem.org/2016/
Asynchronous single page applications without a line of HTML or Javascript, o...Robert Schadek
AngularJS, together with Node.js, is an extremely powerful combination for building single page applications. Unfortunately, its development requires writing HTML and Javascript, which is tedious and error prone. By using vibe.d, HTML is no longer necessary, and the developers can use the full power of a static-typed language for the development of the backend. Substituting Javascript with Typescript in addition to a little bit of CTFE D magic then removes the need for redundant data type declarations, and makes everything statically typed. At the end of the talk, the attendee will have witnessed the creation of a statically typed, asynchronous single page application that required little extra typing than its dynamically typed equivalent. Additionally, the attendees will be motivated to explore the presented combination of frameworks as a viable desktop application UI framework.
The session is about how you can use dgraph in your project as a database with java. It has detail information's about the dgraph sync client and overview of the asynchronous client. It also has information about dgraph HTTP endpoints.
By Eduardo Lima.
If you are running Linux on an Intel GPU, chances are that your graphics driver just got much better. Mesa, the most popular open source OpenGL implementation, has got a new intermediate language to represent GLSL shader programs. It is called NIR, and is based on modern knowledge on compilers and GPU architecture. The Intel i965 driver is fully powered by NIR now, after support to non-scalar shaders has been recently added.
This talk will give the audience a tour around the work done during 2015, that culminated in the rewrite of the i965 non-scalar backend to use NIR, and was released in Mesa 11.0. It will present a brief technical overview of how NIR integrates into a Mesa backend, some important considerations, and the main challenges we faced. It will also illustrate the benefits of the new backend by comparing to the old one in terms of performance and backend code complexity.
(c) 2016 FOSDEM VZW
CC BY 2.0 BE
https://archive.fosdem.org/2016/
Good news, everybody! Guile 2.2 performance notes (FOSDEM 2016)Igalia
By Andy Wingo.
With the new compiler and virtual machine in Guile 2.2, Guile hackers need to update their mental performance models. This talk will give a bit of a state of the union of Guile performance, with an updated overview of the cost of various kinds of abstractions. Sometimes abstraction is free!
(c) 2016 FOSDEM VZW
CC BY 2.0 BE
https://archive.fosdem.org/2016/
Asynchronous single page applications without a line of HTML or Javascript, o...Robert Schadek
AngularJS, together with Node.js, is an extremely powerful combination for building single page applications. Unfortunately, its development requires writing HTML and Javascript, which is tedious and error prone. By using vibe.d, HTML is no longer necessary, and the developers can use the full power of a static-typed language for the development of the backend. Substituting Javascript with Typescript in addition to a little bit of CTFE D magic then removes the need for redundant data type declarations, and makes everything statically typed. At the end of the talk, the attendee will have witnessed the creation of a statically typed, asynchronous single page application that required little extra typing than its dynamically typed equivalent. Additionally, the attendees will be motivated to explore the presented combination of frameworks as a viable desktop application UI framework.
The session is about how you can use dgraph in your project as a database with java. It has detail information's about the dgraph sync client and overview of the asynchronous client. It also has information about dgraph HTTP endpoints.
Optimizing with persistent data structures (LLVM Cauldron 2016)Igalia
By Andy Wingo.
Is there life beyond phi variables and basic blocks? Andy will report on his experience using a new intermediate representation for compiler middle-ends, "CPS soup". The CPS soup language represents programs using Clojure-inspired maps, allowing optimizations to be neatly expressed as functions from one graph to another. Using these persistent data structures also means that the source program doesn't change while the residual program is being created, eliminating one class of problems that the optimizer writer has to keep in mind. Together we will look at some example transformations from an expressiveness as well as a performance point of view, and we will also cover some advantages which a traditional SSA graph maintains over CPS soup.
Knit, Chisel, Hack: Building Programs in Guile Scheme (Strange Loop 2016)Igalia
By Andy Wingo.
This talk makes the case that Guile is a delightful medium for making crafty programs, from the most ephemeral scripts to long-lived systems that you can rely on for years. Guile takes the elegant Scheme programming language, integrates it with the POSIX environments that you know and loathe and love, and wraps it all up in a responsive, hackable environment that nurtures programs from the small up to the large. Guile hacker will give you a gentle introduction to the language as they lead you through the process of building cool stuff in Scheme. With all this going for it, maybe you will choose to make your next program in Guile!
(c) Strange Loop 2016
http://www.thestrangeloop.com/2016/sessions.html
There is garbage in our code - Talks by SoftbinatorAdrian Oprea
A presentation packed with useful information about garbage collection algorithms, JavaScript memory allocation as well as V8's approach to garbage collection.
They're in the news, the big boys are talking about them, Microsoft, Amazon, Intel, etc. but what are they? This talk is an introduction to FPGAs from the software developer's perspective. I will address what they are and how they are different from CPUs and GPUs. We will look at examples of when you would use FPGAs over other technology. In the remainder of the talk, I will introduce you to high-level development for FPGAs.
Optimizing Communicating Event-Loop Languages with TruffleStefan Marr
Communicating Event-Loop Languages similar to E and AmbientTalk are recently gaining more traction as a subset of actor languages. With the rise of JavaScript, E’s notion of vats and non-blocking communication based on promises entered the mainstream. For implementations, the combination of dynamic typing, asynchronous message sending, and promise resolution pose new optimization challenges.
This paper discusses these challenges and presents initial experiments for a Newspeak implementation based on the Truffle framework. Our implementation is on average 1.65x slower than Java on a set of 14 benchmarks. Initial optimizations improve the performance of asynchronous messages and reduce the cost of encapsulation on microbenchmarks by about 2x. Parallel actor benchmarks further show that the system scales based on the workload characteristics. Thus, we conclude that Truffle is a promising platform also for communicating event-loop languages.
We show GEO map search request with The Nominatim Search Service and one GPS Example. It has a Map Quest Search with a simple and efficient interface and powerful capabilities, and relies solely on data contributed to OpenStreetMap.
On the menu View/GEO Map View you can use the function without scripting.
Optimizing with persistent data structures (LLVM Cauldron 2016)Igalia
By Andy Wingo.
Is there life beyond phi variables and basic blocks? Andy will report on his experience using a new intermediate representation for compiler middle-ends, "CPS soup". The CPS soup language represents programs using Clojure-inspired maps, allowing optimizations to be neatly expressed as functions from one graph to another. Using these persistent data structures also means that the source program doesn't change while the residual program is being created, eliminating one class of problems that the optimizer writer has to keep in mind. Together we will look at some example transformations from an expressiveness as well as a performance point of view, and we will also cover some advantages which a traditional SSA graph maintains over CPS soup.
Knit, Chisel, Hack: Building Programs in Guile Scheme (Strange Loop 2016)Igalia
By Andy Wingo.
This talk makes the case that Guile is a delightful medium for making crafty programs, from the most ephemeral scripts to long-lived systems that you can rely on for years. Guile takes the elegant Scheme programming language, integrates it with the POSIX environments that you know and loathe and love, and wraps it all up in a responsive, hackable environment that nurtures programs from the small up to the large. Guile hacker will give you a gentle introduction to the language as they lead you through the process of building cool stuff in Scheme. With all this going for it, maybe you will choose to make your next program in Guile!
(c) Strange Loop 2016
http://www.thestrangeloop.com/2016/sessions.html
There is garbage in our code - Talks by SoftbinatorAdrian Oprea
A presentation packed with useful information about garbage collection algorithms, JavaScript memory allocation as well as V8's approach to garbage collection.
They're in the news, the big boys are talking about them, Microsoft, Amazon, Intel, etc. but what are they? This talk is an introduction to FPGAs from the software developer's perspective. I will address what they are and how they are different from CPUs and GPUs. We will look at examples of when you would use FPGAs over other technology. In the remainder of the talk, I will introduce you to high-level development for FPGAs.
Optimizing Communicating Event-Loop Languages with TruffleStefan Marr
Communicating Event-Loop Languages similar to E and AmbientTalk are recently gaining more traction as a subset of actor languages. With the rise of JavaScript, E’s notion of vats and non-blocking communication based on promises entered the mainstream. For implementations, the combination of dynamic typing, asynchronous message sending, and promise resolution pose new optimization challenges.
This paper discusses these challenges and presents initial experiments for a Newspeak implementation based on the Truffle framework. Our implementation is on average 1.65x slower than Java on a set of 14 benchmarks. Initial optimizations improve the performance of asynchronous messages and reduce the cost of encapsulation on microbenchmarks by about 2x. Parallel actor benchmarks further show that the system scales based on the workload characteristics. Thus, we conclude that Truffle is a promising platform also for communicating event-loop languages.
We show GEO map search request with The Nominatim Search Service and one GPS Example. It has a Map Quest Search with a simple and efficient interface and powerful capabilities, and relies solely on data contributed to OpenStreetMap.
On the menu View/GEO Map View you can use the function without scripting.
Introduction into scalable graph analysis with Apache Giraph and Spark GraphXrhatr
Graph relationships are everywhere. In fact, more often than not, analyzing relationships between points in your datasets lets you extract more business value from your data.
Consider social graphs, or relationships of customers to each other and products they purchase, as two of the most common examples. Now, if you think you have a scalability issue just analyzing points in your datasets, imagine what would happen if you wanted to start analyzing the arbitrary relationships between those data points: the amount of potential processing will increase dramatically, and the kind of algorithms you would typically want to run would change as well.
If your Hadoop batch-oriented approach with MapReduce works reasonably well, for scalable graph processing you have to embrace an in-memory, explorative, and iterative approach. One of the best ways to tame this complexity is known as the Bulk synchronous parallel approach. Its two most widely used implementations are available as Hadoop ecosystem projects: Apache Giraph (used at Facebook), and Apache GraphX (as part of a Spark project).
In this talk we will focus on practical advice on how to get up and running with Apache Giraph and GraphX; start analyzing simple datasets with built-in algorithms; and finally how to implement your own graph processing applications using the APIs provided by the projects. We will finally compare and contrast the two, and try to lay out some principles of when to use one vs. the other.
Hadoop became the most common systm to store big data.
With Hadoop, many supporting systems emerged to complete the aspects that are missing in Hadoop itself.
Together they form a big ecosystem.
This presentation covers some of those systems.
While not capable to cover too many in one presentation, I tried to focus on the most famous/popular ones and on the most interesting ones.
Hadoop became the most common systm to store big data.
With Hadoop, many supporting systems emerged to complete the aspects that are missing in Hadoop itself.
Together they form a big ecosystem.
This presentation covers some of those systems.
While not capable to cover too many in one presentation, I tried to focus on the most famous/popular ones and on the most interesting ones.
My talk for the Scala meetup at PayPal's Singapore office.
The intention is to focus on 3 things:
(a) two common functions in Apache Spark "aggregate" and "cogroup"
(b) Spark SQL
(c) Spark Streaming
The umbrella event is http://www.meetup.com/Singapore-Scala-Programmers/events/219613576/
The Fundamentals Guide to HDP and HDInsightGert Drapers
This session will give you the architectural overview and introduction in to inner workings of HDP 2.0 (http://hortonworks.com/products/hdp-windows/) and HDInsight. The world has embraced the Hadoop toolkit to solve their data problems from ETL, data warehouses to event processing pipelines. As Hadoop consists of many components, services and interfaces, understanding its architecture is crucial, before you can successfully integrate it in to your own environment.
Leveraging Hadoop in your PostgreSQL EnvironmentJim Mlodgenski
This talk will begin with a discussion of the strengths of PostgreSQL and Hadoop. We will then lead into a high level overview of Hadoop and its community of projects like Hive, Flume and Sqoop. Finally, we will dig down into various use cases detailing how you can leverage Hadoop technologies for your PostgreSQL databases today. The use cases will range from using HDFS for simple database backups to using PostgreSQL and Foreign Data Wrappers to do low latency analytics on your Big Data.
Hadoop isn't limited to running Java code, you can write your jobs in a variety of dynamic languages.
This talk is about Hadoop's Streaming API, and the best way we found to run Perl jobs on Amazon's Elastic MapReduce platform.
Big Data Scala by the Bay: Interactive Spark in your Browsergethue
Supporting running Spark scripts directly from a browser would bring the user experience up. Indeed, everybody has a Web navigator, the command line can be avoided, built-in graphing and visualization make it easy to explore and understand data with just a few clicks. This also simplifies the administration as now everything becomes centralized in a service and is accessible by non native clients. For this purpose, an open source Spark Job Server was developed in order to provide Scala, SQL and Python in a Web shell. The main Hadoop components of the platform are also integrated in the same interface. This talk describes the architecture of the Spark Server and its main features: # Scala, Python, SQL submissions # Impersonation # Security # Job progress / canceling # YARN / HDFS / Hive integration The server also ships with a friendly user interface built as a Hue app. We will focus on explaining how they were built, how to use the API and which lessons were learned. The final end user interaction will be live demoed.
Big Data Scala by the Bay: Interactive Spark in your Browser
Big Data for Mobile
1. BugSense
Big Data for mobile
with Erlang, C, LISP
Dionisis "Dio" Kakoliris
Head of Engineering
dgk@bugsense.com
2. BugSense Trivia
● Third biggest mobile SDK in the world
● Analyzing data from ~400M devices
● More than ~120k writes/updates per sec
● Custom Big Data database (LDB)
● Eleven engineers
● Cash positive
7. Overview
● Complex Event Processing, In-Memory DB
● Time-series Stream Processor
● Super easy to setup/use - one package
● No big locks (fine-grained locking)
● Describe-your-data mentality
● C is fast
● C is ideal for destructive updates (imperative)
● "Let it crash!"
8. Overview (cont'd)
● LISP-like DSL for custom processing / views
● Ideal for parallelism (functional)
● Lazy loading of files (asynchronous)
● Saves data to disk (asynchronous)
● Modular / Extendable architecture
● Custom reducers / off-line processing
9. ...And now, let's go BIG
● We love Erlang!
● Node isolation, replication, supervision
● Request processing and forwarding
● Distributed
● Ideal for building real-time systems
● Fast, robust, reliable
● YOU TRUST ERLANG
10. Why Erlang?
● Scaling
● Handles lots of connections efficiently (HTTP)
● Sending/receiving messages from/to nodes is trivial
● Building a replication/take over engine is easy
● Mnesia for storage and shared info
● C Linkedin Drivers
● Being able to connect to remote nodes
● All of the above - integrated in one language
13. Erlang <--> LDB
● Erlang is the right tool for the job
● Small language that handles a very crucial sector
● Actors are magnificent!
● Mnesia is your "faithful" companion
● Constantly evolving / getting better
● Erlang processes are your (million) friends!
More at: http://highscalability.com/blog/2012/11/26/bigdata-using-erlang-c-and-lisp-to-fight-the-
tsunami-of-mobi.html
16. Bonus Stage: Why LISP?
● Very easy lexer, parser, analyzer, interpreter impl.
● SQL-ish queries
● Expressive power, abstraction
● S-expressions, heterogeneous multi-dim. lists
● Immutability => Ideal for parallel computing
● Tons of Data => Prefix notation
● Tons of Data => Data transformation FTW!
17. Yet another snippet...
(define (in-all? arrays uid)
(reduce (lambda (x acc) (and x acc))
(map (lambda (x) (in? x uid))
arrays)))
(define (in-all-timespan arrays uid from to)
(map (lambda (x)
(timebubble (timewarp (current-timespace) x)
(in-all? arrays uid)))
(range from to)))
(define (at-least-once-in-all? arrays uid from to)
(reduce (lambda (x acc) (or x acc))
(in-all-timespan arrays uid from to)))
(define (always-in-all? arrays uid from to)
(reduce (lambda (x acc) (and x acc))
(in-all-timespan arrays uid from to)))
(define (never-in-all? arrays uid from to)
(not (at-least-once-in-all? arrays uid from to)))
18. Moving Forward
● Not easy hot code swapping
● Auxiliary tools for monitoring, support, configuration
● GUIs for the above
● Cross-platform
● Interoperability with other systems
● Support role for big and complex systems
● Tweaking - LDB has the potential to run everywhere!
● ...From your cell-phone to an HPC cluster!
19. And?
● Use Erlang for your projects!
● It's small, easy and above-all it DELIVERS!
● Don't be scared by the complexity of modern systems
● Distributed systems are the present and future
● Harness this massive potential with Erlang
● START A PROJECT NOW!