Timeseries data in Riak - Riak Meetup Stockholm 1/11/2012


Bip Thelin


Metrics with Riak
A retrospective

Martin Törnwall

Metrics?
Many definitions, but here's ours...

Recording things
that change over time
So we can visualize them and search for
patterns

OS
CPU, network, memory and disk usage, ...

Application
Number of requests, errors, events, ...

External events
Text messages or emails sent, customer
service calls, ...

What is a Metric?
● A named variable: "sys.mem.free"
● With tags: "host=sl075", "code=403", ...

avg("sys.mem.free") from 1 hour ago
where host="sl075"
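A minimal sketch of what such a query could mean, assuming samples are held in memory as plain records. The `Sample` class and the `avg` helper are hypothetical names for illustration and are not Metyr's API; the integer average follows the "integer only" design decision.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional


@dataclass
class Sample:
    name: str                      # metric name, e.g. "sys.mem.free"
    time: int                      # Epoch seconds
    value: int                     # integer only
    tags: Dict[str, str] = field(default_factory=dict)


def avg(samples: Iterable[Sample], name: str, since: int, **where: str) -> Optional[int]:
    """Integer average of one metric since a point in time, filtered by tags."""
    values = [
        s.value
        for s in samples
        if s.name == name
        and s.time >= since
        and all(s.tags.get(k) == v for k, v in where.items())
    ]
    if not values:
        return None                # no matching samples in the interval
    return sum(values) // len(values)
```

With `now` as the current Epoch time, `avg(samples, "sys.mem.free", now - 3600, host="sl075")` corresponds to the query above.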

Going Technical

We have distributed
services
Why not have distributed metrics?

Reinventing the wheel?
Solutions exist, but rely on technology stacks
we had no experience with (e.g., HBase)

I mean, really...
Just how hard can it be?

Introducing Metyr
Our weekend hack (sorry: glorious metrics
storage and processing software)

Design Decisions
● Use familiar tools: Erlang, Riak, HTTP
● Not a critical service but ...
● ... Avoid SPOF
● Write performance >> read performance
● Centralized reference clock
● Integer only
● Avoid 2i if possible
● When in doubt, leave it to Riak

In Theory...

[Diagram: several clients at the top, several Metyr nodes in the middle, one Riak cluster at the bottom]

Storing metrics in Riak
No SQL, no schemas, no indices (?), no
aggregate operations

Attempt 1
The naïve way just never works...

Make each sample an
object
A bucket per metric; index by Epoch time

The Good™
Atomicity, write-once, fast range queries

The Bad
Slow, large overhead, requires 2i

Attempt 2
Combine samples into chunks by time

Key Points
● One bucket per metric as before
● Split into hour-sized chunks
(configurable)
● Chunk key: Epoch time
● Chunk value: List of samples
● To read: Fetch chunks within interval
● To write: Fetch chunk, add sample, write
back
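The key points above can be sketched as follows. This is a toy under stated assumptions: a nested dict stands in for Riak buckets, and `chunk_key` and `ChunkStore` are hypothetical names, not Metyr's code.

```python
CHUNK_SECONDS = 3600  # hour-sized chunks (configurable)


def chunk_key(epoch_seconds, chunk_seconds=CHUNK_SECONDS):
    """Epoch time at which the chunk containing this sample starts."""
    return epoch_seconds - epoch_seconds % chunk_seconds


class ChunkStore:
    """In-memory stand-in for Riak: one bucket per metric, one object per chunk."""

    def __init__(self, chunk_seconds=CHUNK_SECONDS):
        self.chunk_seconds = chunk_seconds
        self.buckets = {}  # metric name -> {chunk key -> list of (time, value)}

    def write(self, metric, time, value):
        bucket = self.buckets.setdefault(metric, {})
        key = chunk_key(time, self.chunk_seconds)
        chunk = list(bucket.get(key, []))  # fetch chunk
        chunk.append((time, value))        # add sample
        bucket[key] = chunk                # write back

    def read(self, metric, start, end):
        """Fetch every chunk overlapping [start, end], then trim to the interval."""
        bucket = self.buckets.get(metric, {})
        out = []
        key = chunk_key(start, self.chunk_seconds)
        while key <= end:
            out.extend((t, v) for t, v in bucket.get(key, []) if start <= t <= end)
            key += self.chunk_seconds
        return sorted(out)
```

Because chunk keys are computable from the interval alone, a read needs only direct key lookups, which is what makes it possible to avoid 2i here.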

Chunk Anatomy

Time0 | Value0 | Tags0... | ... | TimeN | ValueN | TagsN...

One sample: Time (64 bits), Value (64 bits), then its tags
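One way such a layout might be packed, as a hedged sketch. The 64-bit time and value come from the slide; the tag encoding (a 16-bit length prefix followed by comma-separated `key=value` pairs in UTF-8) is purely an assumption, since the slides do not say how tags are stored. `encode_sample` and `decode_chunk` are hypothetical helpers.

```python
import struct

# Big-endian: unsigned 64-bit time, signed 64-bit value,
# then an ASSUMED 16-bit length of the tag bytes that follow.
HEADER = struct.Struct(">QqH")


def encode_sample(time, value, tags):
    """Pack one sample; tags must not contain ',' or '=' in this toy encoding."""
    tag_bytes = ",".join(f"{k}={v}" for k, v in sorted(tags.items())).encode("utf-8")
    return HEADER.pack(time, value, len(tag_bytes)) + tag_bytes


def decode_chunk(data):
    """Walk a chunk (concatenated samples) and return (time, value, tags) tuples."""
    samples, offset = [], 0
    while offset < len(data):
        time, value, tag_len = HEADER.unpack_from(data, offset)
        offset += HEADER.size
        raw = data[offset:offset + tag_len].decode("utf-8")
        offset += tag_len
        tags = dict(pair.split("=", 1) for pair in raw.split(",")) if raw else {}
        samples.append((time, value, tags))
    return samples
```

Appending a sample to a chunk is then plain byte concatenation, which keeps the "add sample" step of a write cheap once the chunk has been fetched.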

Writing just got harder
Slower since we must fetch a chunk first;
potential race conditions, ...

(Arbitrary) Goal:
Write 1K samples/sec
Tests showed that the solution described
so far was inadequate

Buffer them writes
Keep per-metric write buffers, flushed
every 10 seconds or so
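A rough sketch of per-metric write buffering, assuming a `flush` callback that performs the actual chunk update. `WriteBuffer` and its parameters are hypothetical; the injectable `clock` exists only so the behavior can be tested without waiting.

```python
import time


class WriteBuffer:
    """Collect samples per metric and hand them over in batches."""

    def __init__(self, flush, interval=10.0, clock=time.monotonic):
        self.flush_fn = flush      # callable(metric, samples); hypothetical sink
        self.interval = interval   # seconds between flushes ("10 seconds or so")
        self.clock = clock
        self.pending = {}          # metric name -> list of (time, value)
        self.last_flush = clock()

    def add(self, metric, sample_time, value):
        self.pending.setdefault(metric, []).append((sample_time, value))
        if self.clock() - self.last_flush >= self.interval:
            self.flush()

    def flush(self):
        # Swap the buffers out first so new samples can keep arriving.
        pending, self.pending = self.pending, {}
        self.last_flush = self.clock()
        for metric, samples in pending.items():
            self.flush_fn(metric, samples)
```

Each flush costs one fetch and one write-back per touched chunk instead of one per sample. The trade-off in this sketch is that buffered samples are lost if the process dies before a flush, and a flush only happens when a new sample arrives after the interval has passed.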

Some Remaining Issues
● Race condition on write
● Storage requirements
● Downsampling of old data
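The slides do not say how downsampling was meant to work. As one possible approach, and purely as an assumption, old samples could be averaged into coarser fixed-size windows; `downsample` is a hypothetical helper that keeps values as integers.

```python
def downsample(samples, window_seconds):
    """Replace (time, value) samples with one integer average per time window."""
    windows = {}  # window start -> (running total, sample count)
    for t, v in samples:
        start = t - t % window_seconds
        total, count = windows.get(start, (0, 0))
        windows[start] = (total + v, count + 1)
    return [
        (start, total // count)
        for start, (total, count) in sorted(windows.items())
    ]
```

For example, reducing 10-second samples to one value per hour would shrink an hour-sized chunk to a single sample, which also eases the storage requirement listed above.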

Thank you!
