Masahiro Nakagawa from Treasure Data gave a presentation on Fluentd, an open source log collector. Fluentd allows for reliable and structured logging, forwarding, and processing of data through its pluggable architecture. It can collect logs from various sources and output to different destinations using plugins. Common uses of Fluentd include log aggregation, monitoring, and analysis on large-scale architectures.
7. Related Products
easier & shorter time
Collect: ???
Store / Process: Cloudera, Hortonworks, Treasure Data
Visualize: Excel, Tableau, R
Thursday, October 31, 13
12. In short
> Open sourced log collector written in Ruby
> Using rubygems ecosystem for plugins
It’s like syslogd, but uses JSON for log messages
14. Event structure (log message)
✓ Time
  > default second unit
  > from data source or adding parsed time
✓ Tag
  > for message routing
✓ Record
  > JSON format
  > MessagePack internally
  > structured (not unstructured)
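The event structure above can be sketched in a few lines of Python. This is only an illustration of the tag / time / record triple, not Fluentd's own code: the `Event` class and the helper names are hypothetical, and JSON is used for display where the slide says MessagePack is used internally.

```python
import json
import time
from typing import Any, Dict, NamedTuple, Optional


class Event(NamedTuple):
    """Hypothetical stand-in for one event: tag, time, record."""
    tag: str                  # used for message routing
    time: int                 # Unix time, second resolution by default
    record: Dict[str, Any]    # structured, JSON-compatible payload


def make_event(tag: str, record: Dict[str, Any],
               timestamp: Optional[float] = None) -> Event:
    """Take the time from the data source if given, else use the current time."""
    ts = time.time() if timestamp is None else timestamp
    return Event(tag=tag, time=int(ts), record=record)


def to_json_line(event: Event) -> str:
    """Render the event as a JSON array [tag, time, record] for display."""
    return json.dumps([event.tag, event.time, event.record])


event = make_event("apache.access",
                   {"method": "GET", "path": "/", "code": 200},
                   timestamp=1383177600.9)
print(to_json_line(event))
```

The fractional part of the source timestamp is dropped, which mirrors the "default second unit" point on the slide.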
17. Configuration and operation
● No central / master node
  > HTTP include helps conf sharing
● Operation depends on your environment
  > Use your daemon management
  > Use Chef in Treasure Data
● Apache like syntax and Ruby DSL
18.
# receive events via HTTP
<source>
  type http
  port 8888
</source>

# read logs from a file
<source>
  type tail
  path /var/log/httpd.log
  format apache
  tag apache.access
</source>

# save alerts to a file
<match alert.**>
  type file
  path /var/log/fluent/alerts
</match>

# save access logs to MongoDB
<match apache.access>
  type mongo
  database apache
  collection log
</match>

# forward other logs to servers
# (match directives are checked top to bottom, so this catch-all goes last)
<match **>
  type forward
  <server>
    host 192.168.0.11
    weight 20
  </server>
  <server>
    host 192.168.0.12
    weight 60
  </server>
</match>

include http://example.com/conf
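How the three match patterns in this configuration select an output can be illustrated with a small Python sketch. It is a simplified model rather than Fluentd's implementation: it assumes `*` matches exactly one tag part, a trailing `**` matches zero or more parts, and the first matching pattern wins. The function names and the `ROUTES` table are hypothetical.

```python
import re
from typing import List, Optional, Tuple


def compile_pattern(pattern: str) -> "re.Pattern":
    """Translate a tag pattern into a regex (simplified sketch).

    '*'  matches exactly one tag part.
    '**' matches zero or more tag parts (handled here for the trailing case).
    """
    regex = ""
    for i, part in enumerate(pattern.split(".")):
        sep = r"\." if i > 0 else ""
        if part == "**":
            # Zero or more parts; the separating dot is optional with them.
            regex += r"(?:%s[^.]+(?:\.[^.]+)*)?" % sep
        elif part == "*":
            regex += sep + r"[^.]+"
        else:
            regex += sep + re.escape(part)
    return re.compile(r"\A" + regex + r"\Z")


# (pattern, output) pairs, listed with the catch-all last.
ROUTES: List[Tuple[str, str]] = [
    ("alert.**", "file"),
    ("apache.access", "mongo"),
    ("**", "forward"),
]


def route(tag: str, routes: List[Tuple[str, str]] = ROUTES) -> Optional[str]:
    """Return the output of the first pattern that matches the tag."""
    for pattern, output in routes:
        if compile_pattern(pattern).match(tag):
            return output
    return None


for tag in ("alert.disk.full", "apache.access", "app.login"):
    print(tag, "->", route(tag))
```

Under this first-match model, putting `**` ahead of `apache.access` would send access logs to the forward output and the MongoDB block would never be reached, which is why order matters.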
19. Reliability (core + plugin)
● Buffering
  > Use file buffer for persistent data
  > buffer chunk has ID for idempotency
● Retrying
● Error handling
  > transaction, failover, etc. on forward plugin
  > secondary
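The three reliability ideas on this slide (buffered chunks with an ID, retrying, and a secondary output) can be sketched together in Python. This is a toy in-memory model under stated assumptions, not Fluentd's buffer or retry code: the class names, the exponential wait, and the `max_retries` and `base_wait` parameters are all hypothetical, and a real file buffer would persist chunks to disk.

```python
import time
import uuid
from typing import Any, Callable, Dict, List


class Chunk:
    """A batch of buffered records with a unique ID."""
    def __init__(self, records: List[Dict[str, Any]]):
        self.chunk_id = uuid.uuid4().hex
        self.records = list(records)


class DedupingReceiver:
    """Stores each chunk at most once, keyed by chunk ID (idempotent writes)."""
    def __init__(self) -> None:
        self.seen = set()
        self.stored: List[Dict[str, Any]] = []

    def __call__(self, chunk: Chunk) -> None:
        if chunk.chunk_id in self.seen:
            return  # a resend of an already stored chunk is ignored
        self.seen.add(chunk.chunk_id)
        self.stored.extend(chunk.records)


class FlakySender:
    """Fails the first `failures` calls, then delivers to the receiver."""
    def __init__(self, receiver: Callable[[Chunk], None], failures: int):
        self.receiver = receiver
        self.failures = failures

    def __call__(self, chunk: Chunk) -> None:
        if self.failures > 0:
            self.failures -= 1
            raise IOError("temporary network error")
        self.receiver(chunk)


def flush_with_retry(chunk: Chunk,
                     primary: Callable[[Chunk], None],
                     secondary: Callable[[Chunk], None],
                     max_retries: int = 3,
                     base_wait: float = 0.01,
                     sleep: Callable[[float], None] = time.sleep) -> str:
    """Try the primary output with growing waits, then fall back to secondary."""
    for attempt in range(max_retries):
        try:
            primary(chunk)
            return "primary"
        except IOError:
            sleep(base_wait * (2 ** attempt))
    secondary(chunk)
    return "secondary"


receiver = DedupingReceiver()
backup = DedupingReceiver()
chunk = Chunk([{"message": "hello"}])
print(flush_with_retry(chunk, FlakySender(receiver, 2), backup))
```

Because the receiver remembers chunk IDs, a chunk that is resent after an ambiguous failure does not produce duplicate records.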
30. Other status
● Localizing docs into Japanese
  > https://github.com/fluent/fluentd-docs/tree/master/docs/ja
● Windows support
  > Started by JBAT
  > https://github.com/fluent/fluentd/tree/windows
Feedback and patches are welcome!
31. v11
● Spec is not fixed yet
● Breaking source code compatibility
● Several improvements
  > routing label, filter, error stream, etc.
  > serverengine based: multi-process, signal, etc.
● http://magazine.rubyist.net/?0044FluentdV11NewFeatures
32. td-agent
● Open sourced distribution package of Fluentd
  > ETL part of Treasure Data
  > deb, rpm, homebrew
● Including useful components
  > ruby, jemalloc, fluentd
  > 3rd party gems: td, mongo, webhdfs, etc...
● http://packages.treasure-data.com/
36. Pros and Cons
● Pros
  > Using central master to manage all nodes
● Cons
  > Java culture (Pros for Java-er?)
  > Difficult configuration and setup
  > Difficult topology
  > Mainly for Hadoop
  > fewer plugins?
38. Pros and Cons
● Pros
  > Built-in ElasticSearch and Kibana
  > Bundled 140 plugins (input/filter/codec/output)
  > Works on Windows but unstable...
● Cons
  > mainly for JRuby
  > Need external daemon for centralized env: Redis, RabbitMQ, etc.
40. Treasure Data
[Diagram: applications (Frontend, Worker, Job Queue, Hadoop) push metrics to Fluentd (via local Fluentd). Fluentd sums up data per minute (partial aggregation) and feeds Librato Metrics for realtime analysis and Treasure Data for historical analysis.]
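The partial aggregation step in this diagram can be illustrated with a short Python sketch. It assumes each metric arrives as a (name, Unix timestamp, value) triple and that values are summed per metric per minute; the function name and the sample data are hypothetical, and the actual aggregation used at Treasure Data is not shown in the slides.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Metric = Tuple[str, int, float]  # (name, unix_timestamp, value)


def aggregate_per_minute(events: Iterable[Metric]) -> Dict[Tuple[str, int], float]:
    """Sum values per (metric name, start of minute) bucket."""
    totals: Dict[Tuple[str, int], float] = defaultdict(float)
    for name, timestamp, value in events:
        minute_start = timestamp - (timestamp % 60)  # floor to the minute
        totals[(name, minute_start)] += value
    return dict(totals)


events = [
    ("requests", 120, 1.0),
    ("requests", 150, 2.0),
    ("requests", 185, 4.0),
    ("errors", 121, 1.0),
]
print(aggregate_per_minute(events))
```

Summing before forwarding means the downstream service receives one data point per metric per minute instead of one per raw event.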
41. Cookpad
[Diagram: hundreds of app servers, each a Rails app whose td-agent sends event logs to Treasure Data. Other labels: MySQL, Daily/Hourly Batch, Google Spreadsheet, Feedback rankings, KPI visualization. Logs are available after several mins.]
● Unlimited scalability
● Flexible schema
● Realtime
● Less performance impact
✓ Over 100 RoR servers (2012/2/4)
45. Conclusion
● Fluentd is now a widely-used project
  > There are many use cases
  > Many contributors and plugins
● Keep it simple
  > Easy to use and integrate into your environment