Druid: Sub-Second OLAP queries over Petabytes of Streaming DataDataWorks Summit
When interacting with analytics dashboards in order to achieve a smooth user experience, two major key requirements are sub-second response time and data freshness. Cluster computing frameworks such as Hadoop or Hive/Hbase work well for storing large volumes of data, although they are not optimized for ingesting streaming data and making it available for queries in realtime. Also, long query latencies make these systems sub-optimal choices for powering interactive dashboards and BI use-cases.
In this talk we will present Druid as a complementary solution to existing hadoop based technologies. Druid is an open-source analytics data store, designed from scratch, for OLAP and business intelligence queries over massive data streams. It provides low latency realtime data ingestion and fast sub-second adhoc flexible data exploration queries.
Many large companies are switching to Druid for analytics, and we will cover how druid is able to handle massive data streams and why it is a good fit for BI use cases.
Agenda -
1) Introduction and Ideal Use cases for Druid
2) Data Architecture
3) Streaming Ingestion with Kafka
4) Demo using Druid, Kafka and Superset.
5) Recent Improvements in Druid moving from lambda architecture to Exactly once Ingestion
6) Future Work
Introduction to memcached, a caching service designed for optimizing performance and scaling in the web stack, seen from perspective of MySQL/PHP users. Given for 2nd year students of professional bachelor in ICT at Kaho St. Lieven, Gent.
VictoriaLogs: Open Source Log Management System - PreviewVictoriaMetrics
VictoriaLogs Preview - Aliaksandr Valialkin
* Existing open source log management systems
- ELK (ElasticSearch) stack: Pros & Cons
- Grafana Loki: Pros & Cons
* What is VictoriaLogs
- Open source log management system from VictoriaMetrics
- Easy to setup and operate
- Scales vertically and horizontally
- Optimized for low resource usage (CPU, RAM, disk space)
- Accepts data from Logstash and Fluentbit in Elasticsearch format
- Accepts data from Promtail in Loki format
- Supports stream concept from Loki
- Provides easy to use yet powerful query language - LogsQL
* LogsQL Examples
- Search by time
- Full-text search
- Combining search queries
- Searching arbitrary labels
* Log Streams
- What is a log stream?
- LogsQL examples: querying log streams
- Stream labels vs log labels
* LogsQL: stats over access logs
* VictoriaLogs: CLI Integration
* VictoriaLogs Recap
Presentation by Lorenzo Mangani of QXIP at the October 26 SF Bay Area ClickHouse meetup
https://www.meetup.com/San-Francisco-Bay-Area-ClickHouse-Meetup
https://qxip.net/
Roko Kruze of vectorized.io describes real-time analytics using Redpanda event streams and ClickHouse data warehouse. 15 December 2021 SF Bay Area ClickHouse Meetup
How to use Impala query plan and profile to fix performance issuesCloudera, Inc.
Apache Impala is an exceptional, best-of-breed massively parallel processing SQL query engine that is a fundamental component of the big data software stack. Juan Yu demystifies the cost model Impala Planner uses and how Impala optimizes queries and explains how to identify performance bottleneck through query plan and profile and how to drive Impala to its full potential.
Avro Tutorial - Records with Schema for Kafka and HadoopJean-Paul Azar
Covers how to use Avro to save records to disk. This can be used later to use Avro with Kafka Schema Registry. This provides background on Avro which gets used with Hadoop and Kafka.
Kafka on ZFS: Better Living Through Filesystems confluent
(Hugh O'Brien, Jet.com) Kafka Summit SF 2018
You’re doing disk IO wrong, let ZFS show you the way. ZFS on Linux is now stable. Say goodbye to JBOD, to directories in your reassignment plans, to unevenly used disks. Instead, have 8K Cloud IOPS for $25, SSD speed reads on spinning disks, in-kernel LZ4 compression and the smartest page cache on the planet. (Fear compactions no more!)
Learn how Jet’s Kafka clusters squeeze every drop of disk performance out of Azure, all completely transparent to Kafka.
-Striping cheap disks to maximize instance IOPS
-Block compression to reduce disk usage by ~80% (JSON data)
-Instance SSD as the secondary read cache (storing compressed data), eliminating >99% of disk reads and safe across host redeployments
-Upcoming features: Compressed blocks in memory, potentially quadrupling your page cache (RAM) for free
We’ll cover:
-Basic Principles
-Adapting ZFS for cloud instances (gotchas)
-Performance tuning for Kafka
-Benchmarks
Storage tiering and erasure coding in Ceph (SCaLE13x)Sage Weil
Ceph is designed around the assumption that all components of the system (disks, hosts, networks) can fail, and has traditionally leveraged replication to provide data durability and reliability. The CRUSH placement algorithm is used to allow failure domains to be defined across hosts, racks, rows, or datacenters, depending on the deployment scale and requirements.
Recent releases have added support for erasure coding, which can provide much higher data durability and lower storage overheads. However, in practice erasure codes have different performance characteristics than traditional replication and, under some workloads, come at some expense. At the same time, we have introduced a storage tiering infrastructure and cache pools that allow alternate hardware backends (like high-end flash) to be leveraged for active data sets while cold data are transparently migrated to slower backends. The combination of these two features enables a surprisingly broad range of new applications and deployment configurations.
This talk will cover a few Ceph fundamentals, discuss the new tiering and erasure coding features, and then discuss a variety of ways that the new capabilities can be leveraged.
Where's the source, Luke? : How to find and debug the code behind PloneVincenzo Barone
Plone, being a python based CMS written as a project for the Zope application server, consist almost entirely of python modules and a number of configuration files. Python source code is loved by many in the community for its explicit readablity; however, for many experienced software developers, coming over to the Plone technology stack can be a haunting experience. It seems everything is hidden away as pickled object in the ZODB, and that layers of magic prevent one from understanding how it works and how to affect change. This presentation will explain to the novice: - how to track down the python source behind Plone - how to take advantage of rich open source tools like ctags and pdb - best practices for getting started with file system product development
Druid: Sub-Second OLAP queries over Petabytes of Streaming DataDataWorks Summit
When interacting with analytics dashboards in order to achieve a smooth user experience, two major key requirements are sub-second response time and data freshness. Cluster computing frameworks such as Hadoop or Hive/Hbase work well for storing large volumes of data, although they are not optimized for ingesting streaming data and making it available for queries in realtime. Also, long query latencies make these systems sub-optimal choices for powering interactive dashboards and BI use-cases.
In this talk we will present Druid as a complementary solution to existing hadoop based technologies. Druid is an open-source analytics data store, designed from scratch, for OLAP and business intelligence queries over massive data streams. It provides low latency realtime data ingestion and fast sub-second adhoc flexible data exploration queries.
Many large companies are switching to Druid for analytics, and we will cover how druid is able to handle massive data streams and why it is a good fit for BI use cases.
Agenda -
1) Introduction and Ideal Use cases for Druid
2) Data Architecture
3) Streaming Ingestion with Kafka
4) Demo using Druid, Kafka and Superset.
5) Recent Improvements in Druid moving from lambda architecture to Exactly once Ingestion
6) Future Work
Introduction to memcached, a caching service designed for optimizing performance and scaling in the web stack, seen from perspective of MySQL/PHP users. Given for 2nd year students of professional bachelor in ICT at Kaho St. Lieven, Gent.
VictoriaLogs: Open Source Log Management System - PreviewVictoriaMetrics
VictoriaLogs Preview - Aliaksandr Valialkin
* Existing open source log management systems
- ELK (ElasticSearch) stack: Pros & Cons
- Grafana Loki: Pros & Cons
* What is VictoriaLogs
- Open source log management system from VictoriaMetrics
- Easy to setup and operate
- Scales vertically and horizontally
- Optimized for low resource usage (CPU, RAM, disk space)
- Accepts data from Logstash and Fluentbit in Elasticsearch format
- Accepts data from Promtail in Loki format
- Supports stream concept from Loki
- Provides easy to use yet powerful query language - LogsQL
* LogsQL Examples
- Search by time
- Full-text search
- Combining search queries
- Searching arbitrary labels
* Log Streams
- What is a log stream?
- LogsQL examples: querying log streams
- Stream labels vs log labels
* LogsQL: stats over access logs
* VictoriaLogs: CLI Integration
* VictoriaLogs Recap
Presentation by Lorenzo Mangani of QXIP at the October 26 SF Bay Area ClickHouse meetup
https://www.meetup.com/San-Francisco-Bay-Area-ClickHouse-Meetup
https://qxip.net/
Roko Kruze of vectorized.io describes real-time analytics using Redpanda event streams and ClickHouse data warehouse. 15 December 2021 SF Bay Area ClickHouse Meetup
How to use Impala query plan and profile to fix performance issuesCloudera, Inc.
Apache Impala is an exceptional, best-of-breed massively parallel processing SQL query engine that is a fundamental component of the big data software stack. Juan Yu demystifies the cost model Impala Planner uses and how Impala optimizes queries and explains how to identify performance bottleneck through query plan and profile and how to drive Impala to its full potential.
Avro Tutorial - Records with Schema for Kafka and HadoopJean-Paul Azar
Covers how to use Avro to save records to disk. This can be used later to use Avro with Kafka Schema Registry. This provides background on Avro which gets used with Hadoop and Kafka.
Kafka on ZFS: Better Living Through Filesystems confluent
(Hugh O'Brien, Jet.com) Kafka Summit SF 2018
You’re doing disk IO wrong, let ZFS show you the way. ZFS on Linux is now stable. Say goodbye to JBOD, to directories in your reassignment plans, to unevenly used disks. Instead, have 8K Cloud IOPS for $25, SSD speed reads on spinning disks, in-kernel LZ4 compression and the smartest page cache on the planet. (Fear compactions no more!)
Learn how Jet’s Kafka clusters squeeze every drop of disk performance out of Azure, all completely transparent to Kafka.
-Striping cheap disks to maximize instance IOPS
-Block compression to reduce disk usage by ~80% (JSON data)
-Instance SSD as the secondary read cache (storing compressed data), eliminating >99% of disk reads and safe across host redeployments
-Upcoming features: Compressed blocks in memory, potentially quadrupling your page cache (RAM) for free
We’ll cover:
-Basic Principles
-Adapting ZFS for cloud instances (gotchas)
-Performance tuning for Kafka
-Benchmarks
Storage tiering and erasure coding in Ceph (SCaLE13x)Sage Weil
Ceph is designed around the assumption that all components of the system (disks, hosts, networks) can fail, and has traditionally leveraged replication to provide data durability and reliability. The CRUSH placement algorithm is used to allow failure domains to be defined across hosts, racks, rows, or datacenters, depending on the deployment scale and requirements.
Recent releases have added support for erasure coding, which can provide much higher data durability and lower storage overheads. However, in practice erasure codes have different performance characteristics than traditional replication and, under some workloads, come at some expense. At the same time, we have introduced a storage tiering infrastructure and cache pools that allow alternate hardware backends (like high-end flash) to be leveraged for active data sets while cold data are transparently migrated to slower backends. The combination of these two features enables a surprisingly broad range of new applications and deployment configurations.
This talk will cover a few Ceph fundamentals, discuss the new tiering and erasure coding features, and then discuss a variety of ways that the new capabilities can be leveraged.
Where's the source, Luke? : How to find and debug the code behind PloneVincenzo Barone
Plone, being a python based CMS written as a project for the Zope application server, consist almost entirely of python modules and a number of configuration files. Python source code is loved by many in the community for its explicit readablity; however, for many experienced software developers, coming over to the Plone technology stack can be a haunting experience. It seems everything is hidden away as pickled object in the ZODB, and that layers of magic prevent one from understanding how it works and how to affect change. This presentation will explain to the novice: - how to track down the python source behind Plone - how to take advantage of rich open source tools like ctags and pdb - best practices for getting started with file system product development
In this deck, I quickly summarize how people have dealt with logging in Docker historically and then describe a comprehensive approach to logging and monitoring in Docker, based on research and customer interviews.
While Docker adds a welcome layer of abstraction to the deployment of applications, it also challenges assumptions on how those applications should be managed. I discuss a comprehensive approach for collecting logs and metrics into a centralized platform and dissect the latest additions to Docker itself (log drivers, stats).
The most hated thing a developer can imageine is writing documentation but on the other hand nothing can compare with a well documented source code if you want to change or extend some code. PhpDocumentor is one of many tools enabling you to parse the inline documentation and generate well structured and referenced documents. This tallk will show you how to get the most out of phpDocumentor and shall enable you to write fantastic documentation.
Von Entwicklern zu tiefst verachtet und in vielen Situationen dennoch heiß geliebt, ist eine ausführliche Dokumentation des Quellcodes. Grade, wenn es um die Anpassung und/oder Erweiterung von legacy Code geht, wird der Ruf nach Dokumentation laut.
PhpDocumentor ist eines von vielen Tools, die uns Entwicklern das dokumentatorische Leben etwas leichter machen können. Es scannt den Quellcode nach Annotationen, Vererbungen, etc. und generiert strukturierte Dokumentationen daraus.
Dieser Vortrag stellt PhpDocumentor im Detail vor und geht nicht nur auf die zahlreichen Möglichkeiten dieses Tools ein, sondern zeigt detailliert anhand von Beispielen, wie diese optimal eingesetzt werden können.
Covers Performance improvements with the Symfony web framework for PHP.
- Google cares about user happiness, Google owns your search traffic ...so Google put page speed in PageRank (and crawl speed)
- Your site is more trustworthy and less frustrating
- Increase page views and ad impressions
- Increase conversions and revenue! It pays for itself!
- Bonus: run less app servers
Overview of RPM packaging in Fedora project. How to get started with RPM packaging and how RPMs are built for Fedora and what tools are used for the process.
These are the slides for the seminar to have a basic overview on the GO Language, By Alessandro Sanino.
They were used on a Lesson in University of Turin (Computer Science Department) 11-06-2018
Why I like PHPStorm
Advantages of Using Docker
Client, Docker Host, Registry
Docker Usage
Solr Docker File
Every Day Docker Commands
Docker Search
One Line Scripts
Portainer
Kinematic
Docker Compose
Grafana
Coding style guide
PHPCS/MD
Documentation Rules
Xdebug
Postman
Fluentd meetup dive into fluent plugin (outdated)N Masahiro
Fluentd meetup in Japan. I talked about "Dive into Fluent plugin".
Some contents are outdated. See this slide: http://www.slideshare.net/repeatedly/dive-into-fluentd-plugin-v012
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features
available on those devices, but many of the features provide convenience and capability but sacrifice security. This best practices guide outlines steps the users can take to better protect personal devices and information.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfPeter Spielvogel
Building better applications for business users with SAP Fiori.
• What is SAP Fiori and why it matters to you
• How a better user experience drives measurable business benefits
• How to get started with SAP Fiori today
• How SAP Fiori elements accelerates application development
• How SAP Build Code includes SAP Fiori tools and other generative artificial intelligence capabilities
• How SAP Fiori paves the way for using AI in SAP apps
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing discussed on the talk. ICT and testing must carry their part of global responsibility to help with the climat warming. We can minimize the carbon footprint but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be added with sustainability, and then measured continuously. Test environments can be used less, and in smaller scale and on demand. Test techniques can be used in optimizing or minimizing number of tests. Test automation can be used to speed up testing.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
GridMate - End to end testing is a critical piece to ensure quality and avoid...ThomasParaiso2
End to end testing is a critical piece to ensure quality and avoid regressions. In this session, we share our journey building an E2E testing pipeline for GridMate components (LWC and Aura) using Cypress, JSForce, FakerJS…
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
8. Agenda: This slide
talk about
- an example of Fluentd plugins
- Fluentd and libraries
- how to develop a Fluentd plugin
don’t talk about
- the details of each plugin
- the experience of production
25. Serialization:
JSON like fast and compact format.
RPC:
Async and parallelism for high performance.
IDL:
Easy to integrate and maintain the service.
37. We use ‘register_xxx’ to register a plugin.
’xxx’ is ’input’, ’filter’, ’buffer’ and ’output’
38. We can load the plugin configuration using
config_param and configure method.
config_param set config value to
@<config name> automatically.
Supported types are below:
http://docs.fluentd.org/articles/config-
file#supported-data-types-for-values
39. class TailInput < Input
Plugin.register_input(’tail’, self)
config_param :path, :string
...
end
<source>
type tail
path /path/to/log
...
</source> fluentd.conf
in_tail.rb
40. One trick is here:
Fluentd’s configuration module does not
verify a default value. So,
we can use the nil like Tribool :)
config_param :tag, :string, :default => nil
Fluentd does not check the type
42. SetTagKeyMixin:
Provide ‘tag_key’ and ‘include_tag_key’.
SetTimeKeyMixin:
Provide ‘time_key’ and ‘include_time_key’.
HandleTagNameMixin:
Provide
‘remove_tag_prefix’ and ‘add_tag_prefix’
‘remove_tag_suffix’ and ‘add_tag_suffix’
43. Code Flow
Mixin usage
class MongoOutput <
BufferedOutput
...
include SetTagKeyMixin
config_set_default
:include_tag_key, false
...
end MongoOutput
SetTagKeyMixin
BufferedOutput
super
super
super
45. Default 3rd party
Available plugins
exec
forward
http
syslog
tail
monitor_agent
etc...
mongo_tail
scribe
msgpack
dstat
kafka
amqp2
etc...
46. class NewInput < Input
...
def configure(conf)
# parse a configuration manually
end
def start
# invoke action
end
def shutdown
# cleanup resources
end
end
47. In action method,
we use router.emit to input data.
tag = "app.tag"
time = Engine.now
record = {"key" => "value", ...}
router.emit(tag, time, record)
Sample:
48. How to read an input in an efficient way?
We use a thread and an event loop.
49. class ForwardInput < Fluent::Input
...
def start
...
@thread = Thread.new(&method(:run))
end
def run
...
end
end
Thread
50. class ForwardInput < Fluent::Input
...
def start
@loop = Coolio::Loop.new
@lsock = listen
@loop.attach(@lsock)
...
end
...
end
Event loop
53. Default 3rd party
Available plugins
grep
stdout
record_transformer
parser
geoip
record_map
typecast
record_modifier
etc...
54. class NewFilter < Filter
# configure, start and shutdown
# are same as input plugin
def filter(tag, time, record)
# Modify record and return it.
# If returns nil, that records are ignored
record
end
end
62. class NewOutput < BufferedOutput
# configure, start and shutdown
# are same as input plugin
def format(tag, time, record)
# convert event to raw string
end
def write(chunk)
# write chunk to target
# chunk has multiple formatted data
end
end
63. Output has 3 buffering modes.
None
Buffered
ObjectBuffered
Time sliced
64. Buffered Time sliced
Buffering type
chunk
go out
chunk
chunk
from in
chunk
limit
queue
limit
Buffer has an internal
map to manage a chunk.
A key is “” in Buffered,
but a key is time slice in
TimeSliced buffer.
def write(chunk)
# chunk.key is time slice
end
67. class MongoOutputTest < Test::Unit::TestCase
def setup
Fluent::Test.setup
require 'fluent/plugin/out_mongo'
end
def create_driver(conf = CONFIG)
Fluent::Test::BufferedOutputTestDriver.new
(Fluent::MongoOutput) {
def start # prevent external access
super
end
...
}.configure(conf)
end
68. ...
def test_format
# test format using emit and expect_format
end
def test_write
d = create_driver
t = emit_documents(d)
# return a result of write method
collection_name, documents = d.run
assert_equal([{...}, {...}, ...], documents)
assert_equal('test', collection_name)
end
...
end