Talk at Stanford's CS442 (High Productivity and Performance with Domain Specific Languages in Scala http://www.stanford.edu/class/cs442/), on Rogue. 5/24/2011
Presented by Gregg Donovan, Senior Software Engineer, Etsy.com, Inc.
Understanding the impact of garbage collection, both at a single node and a cluster level, is key to developing high-performance, high-availability Solr and Lucene applications. After a brief overview of garbage collection theory, we will review the design and use of the various collectors in the JVM.
At a single-node level, we will explore GC monitoring -- how to understand GC logs, how to monitor what % of your Solr request time is spend on GC, how to use VisualGC, YourKit, and other tools, and what to log and monitor. We will review GC tuning and how to measure success.
At a cluster-level, we will review how to design for partial availability -- how to avoid sending requests to a GCing node and how to be resilient to mid-request GC pauses.For application development, we will review common memory leak scenarios in custom Solr and Lucene application code and how to detect them.
Etsy is using Solr and Lucene to serve queries at a rate of more than 8 billion per year (and growing). In this case study, we will describe how Etsy has integrated Solr/Lucene into our continuous deployment infrastructureallowing for Solr configuration, Java-based indexers, and query parsing logic to go from passing tests to production code in minutes
Presented by Gregg Donovan, Senior Software Engineer, Etsy.com, Inc.
Understanding the impact of garbage collection, both at a single node and a cluster level, is key to developing high-performance, high-availability Solr and Lucene applications. After a brief overview of garbage collection theory, we will review the design and use of the various collectors in the JVM.
At a single-node level, we will explore GC monitoring -- how to understand GC logs, how to monitor what % of your Solr request time is spend on GC, how to use VisualGC, YourKit, and other tools, and what to log and monitor. We will review GC tuning and how to measure success.
At a cluster-level, we will review how to design for partial availability -- how to avoid sending requests to a GCing node and how to be resilient to mid-request GC pauses.For application development, we will review common memory leak scenarios in custom Solr and Lucene application code and how to detect them.
Etsy is using Solr and Lucene to serve queries at a rate of more than 8 billion per year (and growing). In this case study, we will describe how Etsy has integrated Solr/Lucene into our continuous deployment infrastructureallowing for Solr configuration, Java-based indexers, and query parsing logic to go from passing tests to production code in minutes
Part of the JavaScript training series offered by Bitovi. Full course schedule is available here: http://blog.bitovi.com/free-weekly-online-javascript-training/
An Elephant of a Different Colour: HackVic Metcalfe
Slides from my GTA-PHP Meetup talk about Hack which is the Facebook version of the PHP programming language which runs under their HHVM runtime environment for PHP. The focus of my talk was the language improvements that the Facebook team has added to PHP.
There's a lot of information in the presenter's notes, so if you're interested in Hack scroll down to see the extras.
Patterns for slick database applicationsSkills Matter
Slick is Typesafe's open source database access library for Scala. It features a collection-style API, compact syntax, type-safe, compositional queries and explicit execution control. Community feedback helped us to identify common problems developers are facing when writing Slick applications. This talk suggests particular solutions to these problems. We will be looking at reducing boiler-plate, re-using code between queries, efficiently modeling object references and more.
As a result of an engine rewrite with focus on more efficient data structures, PHP 7 offers much improved performance and memory usage. This session describes important aspects of the new implementation and how it compares to PHP 5. A particular focus will be on the representation of values, arrays and objects.
PHP 7 – What changed internally? (Forum PHP 2015)Nikita Popov
One of the main selling points of PHP 7 is greatly improved performance, with many real-world applications now running twice as fast… But where do these improvements come from?
At the core of PHP 7 lies an engine rewrite with focus on improving memory usage and performance. This talk provides an overview of the most significant changes, briefly covering everything from data structure changes, over enhancements in the executor, to the new compiler implementation.
Teaching Your Machine To Find FraudstersIan Barber
The slides from my talk at PHP Tek 11.
When dealing with money online, fraud is an ongoing problem for both
consumers and sellers. Researchers have been developing statistical
and machine learning techniques to detect shady sellers on auction
sites, spot fraudulent payments on e-commerce systems and catch click
fraud on adverts. While there is no silver bullet, you will learn to
flag suspicious activity and help protect your site from scammers
using PHP and a little help from some other technologies.
PhoneGap: Local Storage
This presentation has been developed in the context of the Mobile Applications Development course, DISIM, University of L'Aquila (Italy), Spring 2013.
http://www.ivanomalavolta.com
Part of the JavaScript training series offered by Bitovi. Full course schedule is available here: http://blog.bitovi.com/free-weekly-online-javascript-training/
An Elephant of a Different Colour: HackVic Metcalfe
Slides from my GTA-PHP Meetup talk about Hack which is the Facebook version of the PHP programming language which runs under their HHVM runtime environment for PHP. The focus of my talk was the language improvements that the Facebook team has added to PHP.
There's a lot of information in the presenter's notes, so if you're interested in Hack scroll down to see the extras.
Patterns for slick database applicationsSkills Matter
Slick is Typesafe's open source database access library for Scala. It features a collection-style API, compact syntax, type-safe, compositional queries and explicit execution control. Community feedback helped us to identify common problems developers are facing when writing Slick applications. This talk suggests particular solutions to these problems. We will be looking at reducing boiler-plate, re-using code between queries, efficiently modeling object references and more.
As a result of an engine rewrite with focus on more efficient data structures, PHP 7 offers much improved performance and memory usage. This session describes important aspects of the new implementation and how it compares to PHP 5. A particular focus will be on the representation of values, arrays and objects.
PHP 7 – What changed internally? (Forum PHP 2015)Nikita Popov
One of the main selling points of PHP 7 is greatly improved performance, with many real-world applications now running twice as fast… But where do these improvements come from?
At the core of PHP 7 lies an engine rewrite with focus on improving memory usage and performance. This talk provides an overview of the most significant changes, briefly covering everything from data structure changes, over enhancements in the executor, to the new compiler implementation.
Teaching Your Machine To Find FraudstersIan Barber
The slides from my talk at PHP Tek 11.
When dealing with money online, fraud is an ongoing problem for both
consumers and sellers. Researchers have been developing statistical
and machine learning techniques to detect shady sellers on auction
sites, spot fraudulent payments on e-commerce systems and catch click
fraud on adverts. While there is no silver bullet, you will learn to
flag suspicious activity and help protect your site from scammers
using PHP and a little help from some other technologies.
PhoneGap: Local Storage
This presentation has been developed in the context of the Mobile Applications Development course, DISIM, University of L'Aquila (Italy), Spring 2013.
http://www.ivanomalavolta.com
Apache Spark for Library Developers with William Benton and Erik ErlandsonDatabricks
As a developer, data engineer, or data scientist, you’ve seen how Apache Spark is expressive enough to let you solve problems elegantly and efficient enough to let you scale out to handle more data. However, if you’re solving the same problems again and again, you probably want to capture and distribute your solutions so that you can focus on new problems and so other people can reuse and remix them: you want to develop a library that extends Spark.
You faced a learning curve when you first started using Spark, and you’ll face a different learning curve as you start to develop reusable abstractions atop Spark. In this talk, two experienced Spark library developers will give you the background and context you’ll need to turn your code into a library that you can share with the world. We’ll cover: Issues to consider when developing parallel algorithms with Spark, Designing generic, robust functions that operate on data frames and datasets, Extending data frames with user-defined functions (UDFs) and user-defined aggregates (UDAFs), Best practices around caching and broadcasting, and why these are especially important for library developers, Integrating with ML pipelines, Exposing key functionality in both Python and Scala, and How to test, build, and publish your library for the community.
We’ll back up our advice with concrete examples from real packages built atop Spark. You’ll leave this talk informed and inspired to take your Spark proficiency to the next level and develop and publish an awesome library of your own.
My name is Neta Barkay , and I'm a data scientist at LivePerson.
I'd like to share with you a talk I presented at the Underscore Scala community on "Efficient MapReduce using Scalding".
In this talk I reviewed why Scalding fits big data analysis, how it enables writing quick and intuitive code with the full functionality vanilla MapReduce has, without compromising on efficient execution on the Hadoop cluster. In addition, I presented some examples of Scalding jobs which can be used to get you started, and talked about how you can use Scalding's ecosystem, which includes Cascading and the monoids from Algebird library.
Read more & Video: https://connect.liveperson.com/community/developers/blog/2014/02/25/scalding-reaching-efficient-mapreduce
Apple's Swift has achieved the top place in Stack Overflow's "Most Loved" list of programming languages in its 2015 Developer Survey. Based on information gleaned from GitHub and Stack Overflow, analyst firm RedMonk has seen Swift's popularity ranking soar from 68 to 22 in an unprecedented 6 months.
The "Extreme Swift" event does not require advanced, or even any, knowledge of Swift. Learn about some of the more outrageous features of the language which help explain what the fuss is all about!
Never look at programming the same way again — even if you never end up writing a single line of Swift code in your life.
Monitoring Your ISP Using InfluxDB Cloud and Raspberry PiInfluxData
When a large group of people change their habits, it can be tricky for infrastructures! Working from home and spending time indoor today means attending video calls and streaming movies and tv shows. This leads to increased internet traffic that can create congestion on the network infrastructure. So how do you get real-time visibility into your ISP connection? In this meetup, Mirko presents his setup based on a time series database and Raspberry Pi to better understand his ISP connection quality and speed — including upload and download speeds. Join us to discover how he does it using Telegraf, InfluxDB Cloud, Astro Pi, Telegram and Grafana! Finally, proof that your ISP connection is (or is not) as fast as it promises.
Similar to CS442 - Rogue: A Scala DSL for MongoDB (20)
Monitoring Your ISP Using InfluxDB Cloud and Raspberry Pi
CS442 - Rogue: A Scala DSL for MongoDB
1. rogue:
a scala dsl for mongodb
CS442 - 5/24/2011
Jorge Ortiz (@jorgeortiz85)
2. what is foursquare?
location-based social network - “check-in” to bars,
restaurants, museums, parks, etc
friend-finder (where are my friends right now?)
virtual game (badges, points, mayorships)
city guide (local, personalized recommendations)
location diary + stats engine (where was I a year ago?)
specials (get rewards at your favorite restaurant)
4. foursquare: the tech
Nginx, HAProxy
Scala, Lift
MongoDB, PostgreSQL (legacy)
(Kestrel, Munin, Ganglia, Python, Memcache, ...)
All on EC2
5. what is mongodb?
fast, schema-less document store
indexes & rich queries on any attribute
sharding, auto-balancing
replication
geo-indexes
6. mongodb: our numbers
8 clusters
some sharded, some not
some master/slave, some replica set
~40 machines (68.4GB, m2.4xl on EC2)
2.3 billion records
~15k QPS
10. mongodb: query example
val query =
(BasicDBOBjectBuilder
.start
.push(“mayorid”)
.add(“$lte”, 100)
.pop
.push(“veneuname”)
.add(“$eq”, “Starbucks”)
.pop
.get)
11. rogue: a scala dsl for mongo
type-safe
all mongo query features
logging & validation hooks
pagination
index-aware
cursors
http://github.com/foursquare/rogue
13. rogue: code example
val vs: List[Venue] =
(Venue where (_.mayorid <= 100)
and (_.venuename eqs “Starbucks”)
and (_.tags contains “wifi”)
and (_.latlng near
(39.0, -74.0, Degrees(0.2))
orderDesc (_._id)
fetch (5))