In this talk, I will discuss why log files are horrible, how to log structured events rather than log lines, and how to collect performance metrics from large-scale production applications, as well as how to build reliable, scalable and flexible large-scale software systems in multiple languages.
I will explain why (almost) all log formats are horrible and why JSON is a good solution for logging, and survey a number of message queuing, middleware and network transport technologies, including STOMP, AMQP and ZeroMQ.
The Message::Passing framework will be introduced, along with the logstash.net project, with which the Perl code is interoperable. These are pluggable frameworks in Ruby/Java (JRuby) and Perl, with pre-written sets of inputs, filters and outputs for many different systems, message formats and transports.
They were initially designed to be aggregators and filters of data for logging. However they are flexible enough to be used as part of your messaging middleware, or even as a replacement for centralised message queuing systems.
You can have your cake and eat it too - an architecture which is flexible, extensible, scalable and distributed. Build discrete, loosely coupled components which just pass messages to each other easily.
Integrate and interoperate with your existing code and code bases easily, consume from or publish to any existing message queue, logging or performance metrics system you have installed.
Simple examples using common input and output classes will be demonstrated with the framework, as will how easy it is to add your own custom filters. A number of common messaging middleware patterns will be shown to be trivial to implement.
Some higher level use-cases will also be explored, demonstrating log indexing in ElasticSearch and how to build a responsive platform API using webhooks.
Interoperability is also an important goal for messaging middleware. The logstash.net project will be highlighted and we'll discuss crossing the single-language barrier, allowing full integration between Java, Ruby and Perl components, and making it easy to write bindings into libraries we want to reuse in any of those languages.
2. Who are you?
• Perl Developer
• Been paid to write perl code for ~14 years
• Open Source hacker
• Catalyst core team
• >160 CPAN dists
• Also C, JavaScript, Ruby, etc.
3. Sponsored by
• state51
• PB of MogileFS, 100+ boxes.
• > 4 million tracks on-demand via API
• > 400 reqs/s per server, >1Gb peak from backhaul
• Suretec VOIP Systems
• UK voice over IP provider
• Extensive API, including WebHooks for notifications
• TIM Group
• International Financial apps
• Java / ruby / puppet
5. Why?
• I’d better stop, and explain a specific
problem.
• The solution that grew out of this is more
generic.
• But it illustrates my concerns and design
choices well.
• And everyone likes a story, right?
6. Once upon a time...
• I was bored of tailing log files across dozens
of servers
• Splunk was amazing, but unaffordable
8. Centralised logging
• Syslog isn’t good enough
• UDP is lossy, TCP not much better
• Limited fields
• No structure to actual message
• RFC3164 - “This document describes the
observed behaviour of the syslog protocol”
9. Centralised logging
• Syslog isn’t good enough
• Structured app logging
• We want to log data, rather than text
from our application
• E.g. HTTP request - vhost, path, time to
generate, N db queries etc..
11. Centralised logging
• Syslog isn’t good enough
• Structured app logging
• Post-process log files to re-structure
• We can do this in cases we don’t control
• Apache logs, etc..
• SO MANY DATE FORMATS. ARGHH!!
12. Centralised logging
• Syslog isn’t good enough
• Structured app logging
• Post-process log files to re-structure
• Publish logs as JSON to a message queue
• JSON is fast, and widely supported
• Great for arbitrary structured data!
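The slide's "publish logs as JSON" idea can be sketched in a few lines. This is a Python illustration with hypothetical field names (the talk's Perl stack would build the same kind of event hash): log data, not text.

```python
import json
import time

def make_event(vhost, path, duration_ms, db_queries):
    """Build a structured log event (hypothetical field names)
    instead of formatting a text line."""
    return {
        "type": "http_request",
        "epoch": int(time.time()),  # one unambiguous time format
        "vhost": vhost,
        "path": path,
        "duration_ms": duration_ms,
        "db_queries": db_queries,
    }

# Serialize to JSON: fast, widely supported, arbitrary structure.
event = make_event("example.com", "/index", 42, 7)
print(json.dumps(event, sort_keys=True))
```

Because the event is data rather than a formatted string, no date-format guessing or regex parsing is needed downstream.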
13. Message queue
• Flattens load spikes!
• Only have to keep up with average message
volume, not peak volume.
• Logs are bursty! (Peak rate 1000x average.)
• Easy to scale - just add more consumers
• Allows smart routing
• Great as a common integration point.
14. elasticsearch
• Just tip JSON documents into it
• Figures out type for each field, indexes
appropriately.
• Free sharding and replication
• Histograms!
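"Just tip JSON documents into it" means indexing is a plain HTTP request. A minimal sketch of building that request, assuming a local Elasticsearch on localhost:9200 and a hypothetical index name "logs" (the request is constructed but not sent here):

```python
import json

def build_index_request(event, index="logs", doc_type="event",
                        host="http://localhost:9200"):
    """Return the URL and JSON body for indexing one event.
    Elasticsearch infers a type for each field and indexes it;
    no schema needs to be declared up front."""
    url = "%s/%s/%s" % (host, index, doc_type)
    body = json.dumps(event)
    return url, body

url, body = build_index_request({"vhost": "example.com", "status": 200})
print(url)   # the POST target
print(body)  # the document goes in as-is
```

POSTing that body to that URL (with any HTTP client) is all the indexing step is; sharding and replication happen behind the scenes.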
15. Histograms!
• elasticsearch does ‘big data’, not just text
search.
• Ask arbitrary questions
• Get back aggregate metrics / counts
• Very powerful.
16. Logstash
In JRuby, by Jordan Sissel
Input
Simple: Filter
Output
Flexible
Extensible
Plays well with others
Nice web interface
28. Java (JRuby) decoding
AMQP is, however, much much faster than perl doing that... JVM +/-
29. Logstash on each host
is totally out...
• Running it on elasticsearch servers which
are already dedicated to this is fine..
• I’d still like to reuse all of its parsing
30. This talk
• Is about my new library: Message::Passing
• The clue is in the name...
• Hopefully really simple
• Maybe even useful!
31. Wait a second!
• My app logs are already structured!
• Why don’t I just publish AMQP from the app?
32. Good question!
• I tried that.
• App logging relies on RabbitMQ being up
• Adds a single point of failure.
• Logging isn’t that important!
• ZeroMQ to the rescue
33. ZeroMQ has the
correct semantics
• Pub/Sub sockets
• Never, ever blocking
• Lossy! (If needed)
• Buffer sizes / locations configurable
• Arbitrary message size
• IO done in a background thread
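The "never blocking, lossy if needed" behaviour above is the key property. As an illustration only (real code would use a ZeroMQ PUB socket with a high-water mark; this stdlib sketch just models the semantics):

```python
import queue

class LossyBuffer:
    """Models ZeroMQ-style buffer semantics: a bounded buffer where
    sends never block -- when full, messages are dropped instead."""
    def __init__(self, size):
        self.q = queue.Queue(maxsize=size)
        self.dropped = 0

    def send(self, msg):
        try:
            self.q.put_nowait(msg)  # never blocks the caller
        except queue.Full:
            self.dropped += 1       # lossy: logging must not stall the app

buf = LossyBuffer(size=2)
for i in range(5):
    buf.send(i)
print(buf.q.qsize(), buf.dropped)  # → 2 3
```

Dropping a log message is preferable to blocking the application, which is exactly why these semantics remove the single point of failure that direct AMQP publishing introduced.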
34. On host log collector
• ZeroMQ SUB socket
• App logs - pre structured
• Syslog listener
• Forward rsyslogd
• Log file tailer
• Ship to AMQP
36. Let’s make it generic!
• So, I wanted a log shipper
• I ended up with a framework for messaging
interoperability
• Whoops!
• Got sick of writing scripts..
37. Events - my model for
message passing
• a hash {}
• Output consumes events:
• method consume ($event) { ...
• Input produces events:
• has output_to => (..
• Filter does both
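A minimal Python analog of that event model (the real framework is Perl; the names here deliberately mirror the slide's consume / output_to):

```python
class Output:
    """An output consumes events; each event is just a hash (dict)."""
    def consume(self, event):
        raise NotImplementedError

class CollectingOutput(Output):
    def __init__(self):
        self.seen = []
    def consume(self, event):
        self.seen.append(event)

class Input:
    """An input produces events and hands them to its output_to."""
    def __init__(self, output_to):
        self.output_to = output_to
    def emit(self, event):
        self.output_to.consume(event)

class UppercaseFilter(Input, Output):
    """A filter does both: it consumes events and produces new ones."""
    def consume(self, event):
        self.emit({k: str(v).upper() for k, v in event.items()})

out = CollectingOutput()
chain = UppercaseFilter(output_to=out)
chain.consume({"msg": "hello"})
print(out.seen)  # → [{'msg': 'HELLO'}]
```

Chains of any length fall out of this: every stage only needs consume() and/or an output_to to pass events along.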
40. That’s it.
• No, really - that’s all the complexity you
have to care about!
• Except for the complexity introduced by
the inputs and outputs you use.
• Unified attribute names / reconnection
model, etc.. This helps, somewhat..
41. Inputs and outputs
• ZeroMQ In / Out
• AMQP (RabbitMQ) In / Out
• STOMP (ActiveMQ) In / Out
• elasticsearch Out
• Redis PubSub In/Out
• Syslog In
• HTTP POST (“WebHooks”) Out
42. DSL
• Building more complex chains is easy!
• Multiple inputs
• Multiple outputs
• Multiple independent chains
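A chain in the DSL reads roughly like this. This is reconstructed from memory of the Message::Passing docs — class names and option spellings may differ in your installed version, and it won’t run without Message::Passing from CPAN, so treat it purely as an illustration:

```perl
use Message::Passing::DSL;

run_message_server message_chain {
    # Outputs are declared first, so inputs can refer to them by name
    output es => (
        class => 'ElasticSearch',
    );
    input zmq => (
        class       => 'ZeroMQ',
        socket_bind => 'tcp://*:5558',
        output_to   => 'es',
    );
};
```

Multiple inputs can point at the same named output, and several independent chains can live in one `message_chain` block.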
43. CLI
• 1 Input
• 1 Output
• 1 Filter (default Null)
• For simple use, or testing.
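The CLI form of the simplest possible chain looks like this (taken from the Message::Passing docs synopsis; exact option names may vary between versions):

```
message-pass --input STDIN --output STDOUT
```

One input, one output, the Null filter in between — enough to eyeball events flowing through.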
44. CLI
• Encode / Decode step is just a Filter
• JSON by default
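Because the default encode/decode filter is just JSON, the wire step can be sketched with core Perl alone (JSON::PP ships with perl since 5.14; the event fields here are made up):

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use JSON::PP;   # core module

# An event is just a hashref
my $event = { message => 'GET /index.html', status => 200 };

my $wire  = encode_json($event);   # what an output ships over the network
my $again = decode_json($wire);    # what the next hop's input sees

print "$again->{message}\n";       # prints GET /index.html
```

Swapping the encode/decode filter swaps the wire format without touching any input or output.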
52-54. Does this actually work?
• YES - In production at two sites.
• Some of the adaptors are partially complete
• Dumber than logstash - no multiple threads/cores
• ZeroMQ is insanely fast
55. Other people are using it in production!
Two people I know of already writing adaptors!
56. What about logstash?
• Use my lightweight code on end nodes.
• Use ‘proper’ logstash for parsing/filtering on the dedicated hardware (elasticsearch boxes)
• Filter to change my hashes to logstash compatible hashes
• For use with MooseX::Storage and/or Log::Message::Structured
57. Other applications
• Anywhere an asynchronous event stream is useful!
• Monitoring
• Metrics transport
• Queued jobs
58. Other applications (Web stuff)
• User activity (ajax ‘what are your users doing’)
• WebSockets / MXHR
• HTTP Push notifications - “WebHooks”
64. Demo?
• The last demo wasn’t silly enough!
• How could I top that?
• Plan - Re-invent mongrel2
• Badly
65. PSGI
• PSGI $env is basically just a hash.
• (With a little fiddling), you can serialize it as JSON
• PSGI response is just an array.
• Ignore streaming responses!
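The “little fiddling” amounts to dropping the `psgi.*`/`psgix.*` keys, which hold filehandles and objects JSON can’t represent. A sketch using only core modules — the env keys here are a minimal made-up subset of a real PSGI request:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use JSON::PP;   # core module

# A cut-down PSGI $env: plain strings plus the handles JSON chokes on
my $env = {
    REQUEST_METHOD => 'GET',
    PATH_INFO      => '/',
    SERVER_NAME    => 'localhost',
    'psgi.input'   => \*STDIN,     # not JSON-serializable
    'psgi.errors'  => \*STDERR,    # not JSON-serializable
};

# Keep everything except the psgi./psgix. keys
my %clean = map  { $_ => $env->{$_} }
            grep { !/^psgix?\./ } keys %$env;

my $json = JSON::PP->new->canonical->encode(\%clean);
print "$json\n";
```

The response side is even easier: a PSGI response is already a plain array of status, headers and body strings (as long as you ignore streaming responses).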
66. PUSH socket does fan out between multiple handlers.
Reply-to address embedded in request
Mention JFDI, and I really don’t care what language it’s in
Mention state51 are hiring in London
Mention Tim Group are hiring in London/Boston.
But, before I talk about perl at you, I’m going to go off on a tangent..
I wrote code. And writing code is never something to be proud of; at least if your code looks like mine it isn’t... So I’d better justify this hubris somehow..
Isn’t he cute? And woody!
Who knows what this is?
Ok, so logstash is an open source project, in ruby.
Before I talk about it in detail, I’ll go through some of the design choices for supporting technologies.
Does anyone need convincing why centralised logging is something you want?
MooseX::Storage!
This isn’t mandatory - you can just log plain hashes if you’re concerned about performance.
SPOT THE TYPO
Every language has a JSON library. This makes passing hashes of JSON data around a great way to interoperate.
No, really - JSON::XS is lightning fast
There are a whole pile of different queue products. Why would you want to use one (for logging to)?
Average volume is really important!
A solution with hosts polling the database server has (at least) a cost of O(n).
A message queue (can at least theoretically) perform as O(1), no matter how many consumers.
By ‘smart routing’, I mean you can publish a ‘firehose’ message stream.
Most message queue products allow you to get a subset of that stream.
Most message queues have bindings in most languages.. So by abstracting message routing out of your application, and passing JSON hashes - you are suddenly nicely cross language!
If you haven’t yet heard of elasticsearch, I recommend you check it out.
It’s big, it’s Java, it needs some care and feeding, but!
You can just throw data into it.
elasticsearch is smart - and works out the field types for you.
Given you do things sensibly, elasticsearch is pretty amazing for scalability and replication - you can just add more boxes to your cluster and it all goes faster!
Ponies and unicorns for everyone.
These deserve a little of their own description!
You can query across an arbitrary set of JSON documents, fast!
And then get stats about the documents out. Like averages, sums, counts, max/min etc.
If you think about this for a bit, you can re-implement all your RRDs in elasticsearch quite easily. Ponies and unicorns for everyone.
You may not actually want to re-invent RRD, especially given you have no (native) way of collapsing data points down... However it’s brilliant for making up metrics you may want an RRD for, and asking elasticsearch to generate you a graph to see if it might be useful!
Very simple model - input (pluggable), filtering (pluggable by type), output (pluggable)
Lots of backends - AMQP and elasticsearch + syslog and many others
Pre-built parser library for various line based log formats
Comes with web app for searches.. Everything I need!
And it has an active community.
This is the alternate viewer app..
Let’s take a simple case here - I’ll shove my apache logs from N servers into elasticsearch
I run a logstash on each host (writer), and one on each elasticsearch server (reader)..
So, that has 2 logstashes - one reading files and writing AMQP
One reading AMQP and writing to elasticsearch
However, my raw apache log lines need parsing (in the filter stage) - to be able to do things like ‘all apache requests with 500 status’, rather than ‘all apache requests containing the string 500’
So, the ‘filter’ step, for example - is parsing the apache logs and re-structuring them.
Red indicates the filtering
Except I could instead do the filtering here, if I wanted to.
Doesn’t really matter - depends what’s best for me..
Right, so... Let’s try that then?
First problem...
Well then, I’m not going to be running this on the end nodes.
And it’s not tiny, even on machines dedicated to log parsing / filtering / indexing
But sure, I spun it up on a couple of spare machines...
It works really well, as advertised.
The JVM giveth (lots of awesome software), the JVM taketh away (any RAM you had).
ruby is generally slower than perl. jruby is generally faster than perl. jruby trounces perl at (pure ruby) AMQP decoding. MRI is 30% slower than perl. JRuby is 30% faster than perl!
So I’m not actually knocking the technology here - just saying it won’t work in this situation for me.
So, anyway, I’m totally stuffed... The previous plan is a non-starter.
So I need something to collect logs from each host and ship them to AMQP
Ok, cool, I can write that in plain ruby or plain perl and it’s gotta be slimmer, right?
I still plan to reuse logstash - just not on end nodes!
It has a whole library of pre-built parsers for common log formats.
Also, as noted, it’s faster, and notably it’s multi-threaded, so it’ll use multiple cores..
Ok, so hopefully I’ve explained one of the problems I want to solve.
And I’ve maybe explained why I have the hubris to solve it myself
I’ve tried to keep things (at least conceptually) as simple as possible
At the same time, I want something that can be used for real work (i.e. not just a toy)
Good question!
But wait a second... I just want to get something ‘real’ running here...
So, I’m already tipping stuff into AMQP..
ZeroMQ looked like the right answer.
I played with it. It works REALLY well.
I’d recommend you try it.
The last point here is most important - ZMQ networking works entirely in a background thread perl knows nothing about, which means that you can asynchronously ship messages with no changes to your existing codebase.
Yes, this could still be ‘a script’; in fact I did that at first...
But I now have 3 protocols - who’s to say I won’t want a 4th..
Note the fact that we have a cluster of ES servers here.
And we have two log indexers.
You can cluster RabbitMQ also.
Highly reliable solution (against machine failure). Highly scalable solution (just add ES servers)
This is where I went crazy.
This isn’t how I started.
I am blaming AMQP! Too complex for simple cases
I had a log shipper script. A log indexer script. An alerting (nagios) script. An irc notification script.
I mean, solving this in the simple case has got to be easy, right?
I stole logstash’s terminology!
And here’s the API: we have Outputs, which consume messages
We have Inputs, which output messages.
Filters are just a combination of input and output
So the input has an output, and that output always has a consume method...
TADA!
You can build a “chain” of events. This can work either way around.
The input can be a log file, the output can be a message queue (publisher)
Input can be a message queue, output can be a log file (consumer)
STOMP is very different to AMQP is very different to ZeroMQ. I can’t really help much here, except for trying to make the docs not suck.
The docs still suck, sorry - I have tried ;)
All of these are on CPAN already.
DSL - Domain specific language.
Try to make writing scripts really simple.
But you shouldn’t have to write ANY code to play around.
How are we doing for time?
I can do some demos, or we can have some questions, or both!
(Remember to click the next slides as people ask questions)
Demo 1
Simple demo of the CLI in one process (STDOUT/STDIN)
Less simple demo - let’s actually pass messages between two processes.
Arrows indicate message flow. ZeroMQ is a lightning bolt as it’s not quite so trivial..
By insanely fast, I mean I can generate, encode as JSON, send, receive, and decode as JSON over 25k messages a second. On this 3 year old macbook..
I’ll talk a very little more about webhooks
Error stream
Demo PUBSUB and round robin..
So, let’s play Jenga with message queues!
I would have added ZeroMQ. Except then the diagram doesn’t fit on the page.
I’ll leave this as an exercise for the reader!