Lessons learned while taking Presto from alpha to production at Twitter. Presented at the Presto meetup at Facebook on 2015.03.22.
Video: https://www.facebook.com/prestodb/videos/531276353732033/
(CMP310) Data Processing Pipelines Using Containers & Spot Instances - Amazon Web Services
It's difficult to find off-the-shelf, open-source solutions for creating lean, simple, and language-agnostic data-processing pipelines for machine learning (ML). This session shows you how to use Amazon S3, Docker, Amazon EC2, Auto Scaling, and a number of open source libraries as cornerstones to build one. We also share our experience creating elastically scalable and robust ML infrastructure leveraging the Spot instance market.
A True Story About Database Orchestration - InfluxData
During this talk, Gianluca will share the architecture of the project, describe the critical aspects of the infrastructure, and explain how the team strives to make this powerful service secure, fast, and reliable for all customers using InfluxCloud.
Data-Driven Development Era and Its Technologies - Satoshi Tagomori
This document discusses data-driven development and the technologies used in the data analytics process. It covers topics like data collection, storage, processing, and visualization. The document advocates using managed cloud services for data and analytics to focus on data instead of managing infrastructure. Choosing technologies should be based on the type of data and problems to solve, not the other way around. Services like Google BigQuery, Amazon Redshift, and Treasure Data are recommended for their ease of use.
The document summarizes a workshop agenda for new InfluxData practitioners. It outlines the schedule of presentations and topics to be covered throughout the day-long workshop, including installing and querying the TICK stack, Chronograf dashboarding, writing queries, architecting InfluxEnterprise, optimizing the TICK stack, and downsampling data. The final presentation on downsampling data is given by Michael DeSa and covers the concepts of downsampling, why it is useful, and how to perform it in InfluxDB using continuous queries and Kapacitor.
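The downsampling concept this workshop covers can be sketched outside InfluxDB too. A minimal Python illustration (the 10-minute interval and sample values are my own assumptions, not from the workshop) of averaging raw points into fixed time buckets, which is essentially what a continuous query computes on a schedule:

```python
from collections import defaultdict

def downsample(points, interval_s=600):
    """Average (timestamp, value) points into fixed-width time buckets.

    This mimics what an InfluxDB continuous query with GROUP BY time(10m)
    produces on a schedule: one mean value per bucket.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - (ts % interval_s)].append(value)
    return {start: sum(vs) / len(vs) for start, vs in sorted(buckets.items())}

raw = [(0, 10.0), (60, 20.0), (300, 30.0), (700, 40.0)]
print(downsample(raw))  # two 10-minute buckets: {0: 20.0, 600: 40.0}
```

The same idea extends to other aggregates (min, max, count); the point of downsampling is that queries over long time ranges hit the small rollup series instead of the raw data.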
Presentation given at Coolblue B.V. demonstrating Apache Airflow (incubating), what we learned from the underlying design principles, and how an implementation of these principles reduces the amount of ETL effort. Why choose Airflow? Because it makes your engineering life easier and lets more people contribute to how data flows through the organization, so you can spend more time applying your brain to more difficult problems like machine learning, deep learning, and higher-level analysis.
Netflix running Presto in the AWS Cloud - Zhenxiao Luo
Netflix runs Presto in its AWS cloud environment to enable low-latency ad-hoc queries on petabyte-scale data stored in S3. Some key things Netflix did include optimizing Presto to read from and write directly to S3, fixing bugs, integrating Presto with its EMR and Ganglia monitoring, and deploying a 100+ node Presto cluster that handles over 1000 queries per day. Performance testing showed Presto was often 10x faster than Hive for various queries and joins. Netflix continues optimizing Presto for its needs like supporting Parquet, ODBC/JDBC drivers, and looking to address current limitations.
Introducing TiDB [Delivered: 09/27/18 at NYC SQL Meetup] - Kevin Xu
This presentation was delivered at the NYC SQL meetup on September 27, 2018. It provided a technical overview of the TiDB platform, a deep dive into TiDB's MySQL-compatible layer and MySQL ecosystem tools, a use case from Mobike, and an appendix with detailed materials on the coprocessor and transaction model.
Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teo...
Big data is amazing. You can get insights from your users, find interesting patterns and have lots of geek fun. Problem is big data usually means many servers, a complex set up, intensive monitoring and a steep learning curve. All those things cost money. If you don’t have the money, you are losing all the fun.
In my talk I show you how you can use Google BigQuery to manage big data from your application using a hosted solution. And you can start with less than $1 per month.
SYNCING IN JAVASCRIPT: MULTI-CLIENT COLLABORATION THROUGH DATA SHARING (Steve... - Future Insights
Presentation taken from Future of Web Apps Boston (http://futureofwebapps.com/boston-2014)
In this talk, Steve will build a system from scratch for cross-device data synchronization in JavaScript. Through demos, he will explore all the things you're probably not thinking about when rolling your own sync engine, like offline caching, change notification, and conflict resolution. Drawing on his experience from Dropbox, Steve will discuss the thorny challenges around sync and how to solve them.
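Of the sync problems named above, conflict resolution is the easiest to sketch in a few lines. The last-writer-wins merge below is only one possible policy and is my own illustration, not the approach from Steve's talk:

```python
def merge(local, remote):
    """Merge two replicas of key -> (timestamp, value); last writer wins.

    Real sync engines need much more than this: tombstones for deletes,
    vector clocks or server-assigned ordering instead of wall-clock
    timestamps, and a way to surface true conflicts to the user.
    """
    merged = dict(local)
    for key, (ts, value) in remote.items():
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, value)
    return merged

a = {"title": (1, "Draft"), "body": (5, "hello")}
b = {"title": (3, "Final"), "tags": (2, "notes")}
print(merge(a, b))  # {'title': (3, 'Final'), 'body': (5, 'hello'), 'tags': (2, 'notes')}
```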
WordPress RESTful API & Amazon API Gateway - WordCamp Kansai 2016 - 崇之 清水
This document summarizes a presentation given at WordCamp Kansai 2016 about building REST APIs and microservices with Amazon API Gateway and WordPress. The presentation covered:
1. Using REST APIs with WordPress
2. Integrating WordPress with Amazon API Gateway
3. Examples of building WordPress APIs to access third party services and custom backends
The presentation provided examples of using API Gateway as a proxy for the WordPress REST API, enabling CORS, and building microservices architectures with API Gateway, Lambda, and other AWS services behind the WordPress frontend. Attendees were encouraged to explore building scalable WordPress sites and applications with REST APIs and serverless architectures on AWS.
1Spatial: Cardiff FME World Tour: Live vessel tracking - FME Cloud - 1Spatial
FME Cloud was used to build a solution, within a week, that ingests live shipping data from an API into an ArcGIS Online map for a client. An FME workspace was created to retrieve vessel positions from the MarineTraffic API every 2 minutes and write them to an ArcGIS Online feature service using a truncate-and-update pattern. It also archived positions to a Google Fusion Table. FME Cloud monitoring and notifications ensured the solution ran continuously and that issues were detected. The solution met all requirements, including being live by Friday, without requiring on-premise hardware or ongoing costs beyond the initial subscription period.
Pinot is a realtime distributed OLAP datastore, which is used at LinkedIn to deliver scalable real time analytics with low latency. It can ingest data from offline data sources (such as Hadoop and flat files) as well as online sources (such as Kafka). Pinot is designed to scale horizontally.
Interactive Data Analysis with Apache Flink @ Flink Meetup in Berlin - Till Rohrmann
This talk shows how we can use Apache Flink and Apache Zeppelin to do interactive data analysis. The examples show the usage of FlinkML to solve a linear regression and classification problem.
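As a rough idea of what the linear-regression example in this talk computes, here is ordinary least squares in plain Python; this sketch is mine (FlinkML solves the equivalent problem with distributed optimization over Flink data sets, and the sample points are invented):

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b: the model a simple
    linear regression (as in the FlinkML example) is fitting."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    return slope, mean_y - slope * mean_x  # (a, b)

a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])  # points lie exactly on y = 2x + 1
print(a, b)  # 2.0 1.0
```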
The document discusses evolving schemas in NoSQL databases. It describes starting with a simple data structure and search index, then enhancing it to support dynamic filtering and cached previews without hitting the main data store. It also covers approaches for migrating data to a new format, such as adding new fields, while the system is live using techniques like versioning the data and writing upgrade functions. Finally, it recommends some lessons learned, such as that schemaless does not mean no schema, changes should be painless, and agile code needs agile data.
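The versioning-plus-upgrade-function approach described above can be made concrete with a small sketch; the field names and version numbers here are invented for illustration, not taken from the document:

```python
# Each stored document carries a schema version; an upgrade function
# migrates one version forward, and they are chained on read so old
# and new documents can coexist while the system stays live.
def v1_to_v2(doc):          # add a new field with a default
    return {**doc, "_v": 2, "tags": []}

def v2_to_v3(doc):          # rename a field
    doc = dict(doc)
    doc["name"] = doc.pop("title")
    doc["_v"] = 3
    return doc

UPGRADES = {1: v1_to_v2, 2: v2_to_v3}
CURRENT_VERSION = 3

def upgrade(doc):
    while doc.get("_v", 1) < CURRENT_VERSION:
        doc = UPGRADES[doc.get("_v", 1)](doc)
    return doc

print(upgrade({"_v": 1, "title": "first post"}))
# {'_v': 3, 'tags': [], 'name': 'first post'}
```

Writing each upgrade as a pure function that moves exactly one version forward is what makes live migration "painless": new code reads any old document and the chain brings it up to date.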
Docker for Mac & local developer environment optimization - Radek Baczynski
Docker can be used to optimize a local development environment by providing the same environment as production. Issues with performance on Docker for Mac can be addressed through techniques like using delegated volume mounts, removing xdebug, and using a solution like mutagen that syncs files without mounted volumes for faster performance. Mutagen provides near native performance, easy setup and monitoring, and works with any dockerized application.
Kapacitor is the brains of the TICK Stack. Nathaniel will cover the stream processing capabilities of Kapacitor, how to process data before it gets stored in InfluxDB and after it is stored, best practices around anomaly detection and machine learning. In addition, Nathaniel will discuss how to configure the clustered version of Kapacitor.
Technologies, Data Analytics Service and Enterprise Business - Satoshi Tagomori
This document discusses technologies for data analytics services for enterprise businesses. It begins by defining enterprise businesses as those "not about IT" and data analytics services as providing insights into business metrics like customer reach, ad views, purchases, and more using data. It then outlines some key technologies needed for such services, including data management systems, distributed processing systems, queues and schedulers, tools for connecting systems, and methods for controlling jobs and workflows with retries to handle failures. Specific challenges around deadlines, idempotent operations, and replay-able workflows are also addressed.
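The retry-and-idempotency point above can be sketched in a few lines of Python; the helper and the flaky job below are my own illustration, not code from the talk:

```python
import time

def run_with_retries(job, attempts=3, backoff_s=0.0):
    """Run a job, retrying on failure.

    As the talk notes, retries are only safe when the job is idempotent:
    running it twice must leave the same result as running it once
    (e.g. overwrite an output partition rather than appending to it).
    """
    for attempt in range(1, attempts + 1):
        try:
            return job()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the failure to the scheduler
            time.sleep(backoff_s)

calls = []
def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise RuntimeError("transient failure")
    return "done"

print(run_with_retries(flaky))  # "done", after two failed attempts
```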
Wisely Chen Spark Talk at Spark Gathering in Taiwan - Wisely Chen
- The document discusses SparkSQL and Parquet as part of Appier's data pipeline. Appier uses SparkSQL with Parquet on HDFS to enable SQL queries on large datasets and support machine learning applications.
- Parquet was chosen because it has good performance, supports nested data structures, and is the preferred file format for SparkSQL. Storing data in Parquet files on HDFS provides a low-cost and scalable solution that gives Appier full control over their data.
- SparkSQL allows any Spark or SQL code to be reused across ETL, machine learning, and SQL querying applications. This makes development and maintenance more efficient for Appier's data team.
This document discusses Deezer's use of Elasticsearch for search, recommendations, and analysis of music metadata.
It provides an overview of Deezer's Elasticsearch architecture, which includes indexing over 50 million tracks from Hadoop and replicating indexes across clusters. It also discusses how Deezer queries Elasticsearch using custom analyzers, multi search APIs, and function score queries for recommendations. Finally, it describes Deezer's use of the ELK stack to analyze over 2 billion logs and metrics documents through Kibana dashboards.
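The function score queries mentioned above layer extra ranking signals on top of text relevance. A sketch of what such an Elasticsearch request body can look like, built as a plain Python dict (the field names, signal, and boost mode are invented for illustration, not Deezer's actual ranking):

```python
import json

# A function_score query boosts matching documents by additional signals
# (here, a popularity counter) on top of the text-match relevance score.
query = {
    "query": {
        "function_score": {
            "query": {"match": {"track_title": "daft punk"}},
            "functions": [
                {"field_value_factor": {"field": "play_count",
                                        "modifier": "log1p"}},
            ],
            "boost_mode": "sum",  # add the popularity factor to the text score
        }
    }
}
print(json.dumps(query, indent=2))
```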
This document discusses data collection and ingestion tools. It begins with an overview of data collection versus ingestion, with collection happening at the source and ingestion receiving the data. Examples of data collection tools include rsyslog, Scribe, Flume, Logstash, Heka, and Fluentd. Examples of ingestion tools include RabbitMQ, Kafka, and Fluentd. The document concludes with a case study of asynchronous application logging and challenges to consider.
FUTURESTACK13: Software analytics with Project Rubicon from Alex Kroman Engin... - New Relic
The document discusses Project Rubicon, a software analytics tool from New Relic. It summarizes Rubicon's ability to capture raw event data from applications, allowing users to ask complex questions. It then demonstrates how to write NRQL queries to analyze metrics like page views and custom events over time. NRQL makes it easy to aggregate large amounts of data through functions, time windows, time series, and facets. The document also provides an overview of Rubicon's architecture and how it handles billions of events through techniques like using memory efficiently and building for failure.
Muga Nishizawa discusses Embulk, an open-source bulk data loader. Embulk loads records from various sources to various targets in parallel using plugins. Treasure Data customers use Embulk to upload different file formats and data sources to their TD database. While Embulk is focused on bulk loading, TD also develops additional tools to generate Embulk configurations, manage loads over time, and scale Embulk using a MapReduce executor on Hadoop clusters for very large data loads.
A closer look at the fast, fully managed data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing Business Intelligence (BI) tools. We'll show how to run complex analytic queries against petabytes of structured data, using sophisticated query optimization, columnar storage on high-performance local disks, and massively parallel query execution.
Speakers:
Karan Desai - Solutions Architect, AWS
Neel Mitra - Solutions Architect, AWS
tado° Makes Your Home Environment Smart with InfluxDB - InfluxData
Michal Knizek, Head of Research and Development at tado° GmbH, will share how they use InfluxData to gather data collected from their Smart Thermostat to help turn any home thermostat into a smart device. This device uses a variety of information collected (geo-location, temperature, user settings, current device functional state) to serve information to automatically control the environment temperature as well as letting users know when the device may need maintenance.
ApacheCon 2021 - Apache NiFi Deep Dive 300 - Timothy Spann
21-September-2021 - ApacheCon - Tuesday 17:10 UTC - Apache NiFi Deep Dive 300
* https://github.com/tspannhw/EverythingApacheNiFi
* https://github.com/tspannhw/FLiP-ApacheCon2021
* https://www.datainmotion.dev/2020/06/no-more-spaghetti-flows.html
* https://github.com/tspannhw/FLiP-IoT
* https://github.com/tspannhw/FLiP-Energy
* https://github.com/tspannhw/FLiP-SOLR
* https://github.com/tspannhw/FLiP-EdgeAI
* https://github.com/tspannhw/FLiP-CloudQueries
* https://github.com/tspannhw/FLiP-Jetson
* https://www.linkedin.com/pulse/2021-schedule-tim-spann/
For Data Engineers who have flows already in production, I will dive deep into best practices, advanced use cases, performance optimizations, tips, tricks, edge cases, and interesting examples. This is a master class for those looking to quickly learn things I have picked up after years in the field running Apache NiFi in production.
This will be interactive and I encourage questions and discussions.
You will take away examples and tips in slides, github, and articles.
This talk will cover:
Load Balancing
Parameters and Parameter Contexts
Stateless vs Stateful NiFi
Reporting Tasks
NiFi CLI
NiFi REST Interface
DevOps
Advanced Record Processing
Schemas
RetryFlowFile
Lookup Services
RecordPath
Expression Language
Advanced Error Handling Techniques
Tim Spann is a Developer Advocate @ StreamNative where he works with Apache NiFi, Apache Pulsar, Apache Flink, Apache MXNet, TensorFlow, Apache Spark, big data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, streaming technologies, and Java programming. Previously, he was a Principal Field Engineer at Cloudera, a senior solutions architect at AirisData and a senior field engineer at Pivotal. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on big data, the IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as IoT Fusion, Strata, ApacheCon, Data Works Summit Berlin, DataWorks Summit Sydney, and Oracle Code NYC. He holds a BS and MS in computer science.
The document discusses the challenges of processing and storing billions of data inserts per day from vehicle telematics projects. Some key points:
- The project involves receiving continuous data streams from over 500 vehicles with 2500 data points captured per vehicle per second, resulting in over 1.5 billion MySQL inserts daily.
- A message queue is used to receive the streaming data and buffer inserts to help scale processing. Additional optimizations include bulk loading data via LOAD DATA INFILE for speed.
- Sharding and splitting the data across multiple databases by vehicle and time period (weekly tables) helps improve query performance for both live and historical data access.
- Techniques like asynchronous requests, caching, and a single entry point
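The sharding scheme in the bullets above can be made concrete with a small routing helper; the database count and table-naming convention below are my own guesses at such a scheme, not the project's actual layout:

```python
from datetime import date

def shard_table(vehicle_id, day, n_databases=4):
    """Route a telematics reading to a (database, table) pair.

    Splitting by vehicle spreads write load across databases, while
    weekly tables keep each table small enough that both live and
    historical queries stay fast.
    """
    db = f"telematics_{vehicle_id % n_databases}"
    year, week, _ = day.isocalendar()
    return db, f"readings_{year}w{week:02d}"

print(shard_table(517, date(2024, 3, 8)))  # ('telematics_1', 'readings_2024w10')
```

Queries for one vehicle over a known time range then touch a single database and only the handful of weekly tables that overlap the range.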
Netflix running Presto in the AWS CloudZhenxiao Luo
Netflix runs Presto in its AWS cloud environment to enable low-latency ad-hoc queries on petabyte-scale data stored in S3. Some key things Netflix did include optimizing Presto to read from and write directly to S3, fixing bugs, integrating Presto with its EMR and Ganglia monitoring, and deploying a 100+ node Presto cluster that handles over 1000 queries per day. Performance testing showed Presto was often 10x faster than Hive for various queries and joins. Netflix continues optimizing Presto for its needs like supporting Parquet, ODBC/JDBC drivers, and looking to address current limitations.
Introducing TiDB [Delivered: 09/27/18 at NYC SQL Meetup]Kevin Xu
This presentation was delivered at the NYC SQL meetup on September 27, 2018. It provided a technical overview of the TiDB Platform, a deep dive into TiDB's MySQL compatible layer and MySQL ecosystem tools, use case of Mobike, and appendix with detail materials on coprocessor and transaction model.
Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teo...javier ramirez
Big data is amazing. You can get insights from your users, find interesting patterns and have lots of geek fun. Problem is big data usually means many servers, a complex set up, intensive monitoring and a steep learning curve. All those things cost money. If you don’t have the money, you are losing all the fun.
In my talk I show you how you can use Google BigQuery to manage big data from your application using a hosted solution. And you can start with less than $1 per month.
SYNCING IN JAVASCRIPT: MULTI-CLIENT COLLABORATION THROUGH DATA SHARING (Steve...Future Insights
Presentation taken from Future of Web Apps Boston (http://futureofwebapps.com/boston-2014)
In this talk, Steve will build a system from scratch for cross-device data synchronization in JavaScript. Through demos, he will explore all the things you're probably not thinking about when rolling your own sync engine, like offline caching, change notification, and conflict resolution. Drawing on his experience from Dropbox, Steve will discuss the thorny challenges around sync and how to solve them.
WordPress RESTful API & Amazon API Gateway - WordCamp Kansai 2016崇之 清水
This document summarizes a presentation given at WordCamp Kansai 2016 about building REST APIs and microservices with Amazon API Gateway and WordPress. The presentation covered:
1. Using REST APIs with WordPress
2. Integrating WordPress with Amazon API Gateway
3. Examples of building WordPress APIs to access third party services and custom backends
The presentation provided examples of using API Gateway as a proxy for the WordPress REST API, enabling CORS, and building microservices architectures with API Gateway, Lambda, and other AWS services behind the WordPress frontend. Attendees were encouraged to explore building scalable WordPress sites and applications with REST APIs and serverless architectures on AWS.
1Spatial: Cardiff FME World Tour: Live vessel tracking - FME Cloud1Spatial
FME Cloud was used to build a solution to ingest live shipping data from an API into an ArcGIS Online map for a client within a week. An FME workspace was created to retrieve vessel positions from the MarineTraffic API every 2 minutes and write them to truncate and update an ArcGIS Online feature service. It also archived positions to a Google Fusion Table. FME Cloud monitoring and notifications ensured the solution ran continuously and issues were detected. The solution met all requirements including being live by Friday without requiring on-premise hardware or ongoing costs beyond the initial subscription period.
Pinot is a realtime distributed OLAP datastore, which is used at LinkedIn to deliver scalable real time analytics with low latency. It can ingest data from offline data sources (such as Hadoop and flat files) as well as online sources (such as Kafka). Pinot is designed to scale horizontally.
Interactive Data Analysis with Apache Flink @ Flink Meetup in BerlinTill Rohrmann
This talk shows how we can use Apache Flink and Apache Zeppelin to do interactive data analysis. The examples show the usage of FlinkML to solve a linear regression and classification problem.
The document discusses evolving schemas in NoSQL databases. It describes starting with a simple data structure and search index, then enhancing it to support dynamic filtering and cached previews without hitting the main data store. It also covers approaches for migrating data to a new format, such as adding new fields, while the system is live using techniques like versioning the data and writing upgrade functions. Finally, it recommends some lessons learned, such as that schemaless does not mean no schema, changes should be painless, and agile code needs agile data.
Docker for mac & local developer environment optimizationRadek Baczynski
Docker can be used to optimize a local development environment by providing the same environment as production. Issues with performance on Docker for Mac can be addressed through techniques like using delegated volume mounts, removing xdebug, and using a solution like mutagen that syncs files without mounted volumes for faster performance. Mutagen provides near native performance, easy setup and monitoring, and works with any dockerized application.
Kapacitor is the brains of the TICK Stack. Nathaniel will cover the stream processing capabilities of Kapacitor, how to process data before it gets stored in InfluxDB and after it is stored, best practices around anomaly detection and machine learning. In addition, Nathaniel will discuss how to configure the clustered version of Kapacitor.
Technologies, Data Analytics Service and Enterprise BusinessSATOSHI TAGOMORI
This document discusses technologies for data analytics services for enterprise businesses. It begins by defining enterprise businesses as those "not about IT" and data analytics services as providing insights into business metrics like customer reach, ad views, purchases, and more using data. It then outlines some key technologies needed for such services, including data management systems, distributed processing systems, queues and schedulers, tools for connecting systems, and methods for controlling jobs and workflows with retries to handle failures. Specific challenges around deadlines, idempotent operations, and replay-able workflows are also addressed.
Wisely Chen Spark Talk At Spark Gathering in Taiwan Wisely chen
- The document discusses SparkSQL and Parquet as part of Appier's data pipeline. Appier uses SparkSQL with Parquet on HDFS to enable SQL queries on large datasets and support machine learning applications.
- Parquet was chosen because it has good performance, supports nested data structures, and is the preferred file format for SparkSQL. Storing data in Parquet files on HDFS provides a low-cost and scalable solution that gives Appier full control over their data.
- SparkSQL allows any Spark or SQL code to be reused across ETL, machine learning, and SQL querying applications. This makes development and maintenance more efficient for Appier's data team.
This document discusses Deezer's use of Elasticsearch for search, recommendations, and analysis of music metadata.
It provides an overview of Deezer's Elasticsearch architecture, which includes indexing over 50 million tracks from Hadoop and replicating indexes across clusters. It also discusses how Deezer queries Elasticsearch using custom analyzers, multi search APIs, and function score queries for recommendations. Finally, it describes Deezer's use of the ELK stack to analyze over 2 billion logs and metrics documents through Kibana dashboards.
This document discusses data collection and ingestion tools. It begins with an overview of data collection versus ingestion, with collection happening at the source and ingestion receiving the data. Examples of data collection tools include rsyslog, Scribe, Flume, Logstash, Heka, and Fluentd. Examples of ingestion tools include RabbitMQ, Kafka, and Fluentd. The document concludes with a case study of asynchronous application logging and challenges to consider.
FUTURESTACK13: Software analytics with Project Rubicon from Alex Kroman Engin...New Relic
The document discusses Project Rubicon, a software analytics tool from New Relic. It summarizes Rubicon's ability to capture raw event data from applications, allowing users to ask complex questions. It then demonstrates how to write NRQL queries to analyze metrics like page views and custom events over time. NRQL makes it easy to aggregate large amounts of data through functions, time windows, time series, and facets. The document also provides an overview of Rubicon's architecture and how it handles billions of events through techniques like using memory efficiently and building for failure.
Muga Nishizawa discusses Embulk, an open-source bulk data loader. Embulk loads records from various sources to various targets in parallel using plugins. Treasure Data customers use Embulk to upload different file formats and data sources to their TD database. While Embulk is focused on bulk loading, TD also develops additional tools to generate Embulk configurations, manage loads over time, and scale Embulk using a MapReduce executor on Hadoop clusters for very large data loads.
A closer look at the fast, fully managed data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing Business Intelligence (BI) tools. We'll show how to run complex analytic queries against petabytes of structured data, using sophisticated query optimization, columnar storage on high-performance local disks, and massively parallel query execution.
Speakers:
Karan Desai - Solutions Architect, AWS
Neel Mitra - Solutions Architect, AWS
tado° Makes Your Home Environment Smart with InfluxDBInfluxData
Michal Knizek, Head of Research and Development at tado° GmbH, will share how they use InfluxData to gather data collected from their Smart Thermostat to help turn any home thermostat into a smart device. This device uses a variety of information collected (geo-location, temperature, user settings, current device functional state) to serve information to automatically control the environment temperature as well as letting users know when the device may need maintenance.
ApacheCon 2021 - Apache NiFi Deep Dive 300Timothy Spann
21-September-2021 - ApacheCon - Tuesday 17:10 UTC Apache NIFi Deep Dive 300
* https://github.com/tspannhw/EverythingApacheNiFi
* https://github.com/tspannhw/FLiP-ApacheCon2021
* https://www.datainmotion.dev/2020/06/no-more-spaghetti-flows.html
* https://github.com/tspannhw/FLiP-IoT
* https://github.com/tspannhw/FLiP-Energy
* https://github.com/tspannhw/FLiP-SOLR
* https://github.com/tspannhw/FLiP-EdgeAI
* https://github.com/tspannhw/FLiP-CloudQueries
* https://github.com/tspannhw/FLiP-Jetson
* https://www.linkedin.com/pulse/2021-schedule-tim-spann/
Tuesday 17:10 UTC
Apache NIFi Deep Dive 300
Timothy Spann
For Data Engineers who have flows already in production, I will dive deep into best practices, advanced use cases, performance optimizations, tips, tricks, edge cases, and interesting examples. This is a master class for those looking to learn quickly things I have picked up after years in the field with Apache NiFi in production.
This will be interactive and I encourage questions and discussions.
You will take away examples and tips in slides, github, and articles.
This talk will cover:
Load Balancing
Parameters and Parameter Contexts
Stateless vs Stateful NiFi
Reporting Tasks
NiFi CLI
NiFi REST Interface
DevOps
Advanced Record Processing
Schemas
RetryFlowFile
Lookup Services
RecordPath
Expression Language
Advanced Error Handling Techniques
Tim Spann is a Developer Advocate @ StreamNative where he works with Apache NiFi, Apache Pulsar, Apache Flink, Apache MXNet, TensorFlow, Apache Spark, big data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, streaming technologies, and Java programming. Previously, he was a Principal Field Engineer at Cloudera, a senior solutions architect at AirisData and a senior field engineer at Pivotal. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on big data, the IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as IoT Fusion, Strata, ApacheCon, Data Works Summit Berlin, DataWorks Summit Sydney, and Oracle Code NYC. He holds a BS and MS in computer science.
The document discusses the challenges of processing and storing billions of data inserts per day from vehicle telematics projects. Some key points:
- The project involves receiving continuous data streams from over 500 vehicles with 2500 data points captured per vehicle per second, resulting in over 1.5 billion MySQL inserts daily.
- A message queue is used to receive the streaming data and buffer inserts to help scale processing. Additional optimizations include bulk loading data via LOAD DATA INFILE for speed.
- Sharding and splitting the data across multiple databases by vehicle and time period (weekly tables) helps improve query performance for both live and historical data access.
- Techniques like asynchronous requests, caching, and a single entry point also help the system scale.
A general introduction to Spring Data / Neo4J (Florent Biville)
Spring Data Neo4j provides a framework for mapping graph data to Java objects and interacting with Neo4j from Spring applications. It allows defining entities as nodes and relationships and provides repositories with built-in CRUD operations. Queries can be written using Cypher or the template API. This reduces boilerplate code and provides a familiar Spring programming model for graph databases.
At the beginning of 2021, Shopify Data Platform decided to adopt Apache Flink to enable modern stateful stream-processing. Shopify had a lot of experience with other streaming technologies, but Flink was a great fit due to its state management primitives.
After about six months, Shopify now has a flourishing ecosystem of tools, tens of prototypes from many teams across the company and a few large use-cases in production.
Yaroslav will share a story about not just building a single data pipeline but building a sustainable ecosystem. You can learn about how they planned their platform roadmap, the tools and libraries Shopify built, the decision to fork Flink, and how Shopify partnered with other teams and drove the adoption of streaming at the company.
The document describes the Neo4j graph database and platform vision. It discusses key components like index-free adjacency, ACID transactions, clustering, and hardware optimizations. It outlines use cases for graph analytics, transactions, AI, and data integration. It also covers drivers, APIs, visualization, and administration tools. Finally, it previews upcoming innovations in Neo4j 3.4 like geospatial support, native string indexes, and rolling upgrades.
Case Study: VF Corporation Takes a Practical Approach to Improving its MOJO w... (CA Technologies)
VF Corporation uses CA Application Performance Management (APM) to monitor their ecommerce system called MOJO. With minimal customizations to APM, VF has been able to improve MOJO's performance, minimize downtime, and deliver useful performance data to application teams. Some key results include reduced average response times from 30 to under 15 seconds after a data center upgrade, preventing downtime from a growing directory issue, and correlating batch job runs to system impacts.
Data science for infrastructure dev week 2022 (ZainAsgar1)
The document discusses using data science and automation for infrastructure monitoring. It introduces Pixie, a tool that allows users to collect raw data, transform it into signals, and then take actions based on those signals. Two examples are provided: 1) detecting SQL injections from application logs and sending Slack alerts, and 2) automatically scaling a deployment based on HTTP request throughput metrics. Pixie uses an embedded domain-specific language called PxL to define logical data workflows and queries.
Data Labs supports LINE services by performing high-level data analysis and machine learning model development using their Hadoop data lake. The machine learning lifecycle involves many steps beyond just model training, including data collection, preprocessing, deployment, and monitoring. LINE's platform provides the necessary infrastructure to efficiently perform each step of the lifecycle, allowing for rapid continuous development and experimentation through tools like HDFS, Kubernetes, Jupyter notebooks, and CI/CD pipelines.
Near real-time anomaly detection at Lyft (markgrover)
Near real-time anomaly detection at Lyft, by Mark Grover and Thomas Weise at Strata NY 2018.
https://conferences.oreilly.com/strata/strata-ny/public/schedule/detail/69155
This document discusses various techniques for finding and exploiting vulnerabilities during a penetration test when vulnerabilities are marked as "low" or "medium" in severity. It argues that penetration testers and clients should not rely solely on vulnerability scanners and should thoroughly investigate even lower severity issues. Specific techniques mentioned include exploiting default credentials on services like VNC, exploiting exposed admin interfaces found through tools like Metasploit, taking advantage of browsable directories with backups or other sensitive files, exploiting SharePoint misconfigurations, exploiting HTTP PUT or WebDAV configurations, exploiting Apple Filing Protocol, and exploiting trace.axd to view request details in .NET applications. The document emphasizes finding overlooked vulnerabilities and keeping "a human in the mix" rather than relying fully on automated tools.
This document provides an overview of Apache Flink, an open-source stream processing framework. It discusses Flink's capabilities in supporting streaming, batch, and iterative processing natively through a streaming dataflow model. It also describes Flink's architecture including the client, job manager, task managers, and various execution setups like local, remote, YARN, and embedded. Finally, it compares Flink to other stream and batch processing systems in terms of their APIs, fault tolerance guarantees, and strengths.
The devops approach to monitoring, Open Source and Infrastructure as Code StyleJulien Pivotto
Monitoring is critical for every decent application that runs in production. Many widely used monitoring tools show their limits in the age of Infrastructure as Code and cloud computing. Let's investigate how monitoring can face the new challenges: scalability, reproducibility, and automation.
The document discusses improvements made to Apache Flink by Alibaba, called Blink. Blink provides a unified SQL layer for both batch and streaming processes. It supports features like UDF/UDTF/UDAGG, stream-stream joins, windowing, and retraction. Blink also improves Flink's runtime to be more reliable and production-quality when running on large YARN clusters. It has a new architecture using a JobMaster and TaskExecutors. Checkpointing and state management were optimized for incremental backups. Blink has been running in production supporting many of Alibaba's critical systems and processing massive amounts of data.
This document provides an overview of Neo4j's vision and roadmap. It discusses Neo4j's goal of being a modern, enterprise data platform that can power both operational and analytical workloads. Key aspects of Neo4j's strategy include building a fully cloud-native database designed for operational and analytical graph workloads, with autonomous clustering to provide unlimited horizontal scalability. The document also briefly reviews recent Neo4j releases and highlights some new features like graph pattern matching and change data capture.
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D... (Demi Ben-Ari)
Once you start working with distributed Big Data systems, you start discovering a whole bunch of problems you won't find in monolithic systems.
All of a sudden, monitoring all of the components becomes a big data problem itself.
In the talk we'll mention all of the aspects you should take into consideration when monitoring a distributed system built on tools like:
Web Services, Apache Spark, Cassandra, MongoDB, Amazon Web Services.
Beyond the tools, what should you monitor about the actual data that flows through the system?
We'll also cover the simplest solution using your day-to-day open source tools; the surprising thing is that it comes not from an Ops guy.
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion... (Codemotion)
Once you start working with Big Data systems, you discover a whole bunch of problems you won't find in monolithic systems. Monitoring all of the components becomes a big data problem itself. In the talk we'll mention all of the aspects you should take into consideration when monitoring a distributed system using tools like Web Services, Spark, Cassandra, MongoDB, and AWS. Beyond the tools, what should you monitor about the actual data that flows in the system? We'll cover the simplest solution with your day-to-day open source tools; the surprising thing is that it comes not from an Ops guy.
This document discusses Python packaging tools like setuptools and pip. It notes that setuptools is the core API that most packaging tools use for building, packaging, metadata, and dependency management. Pip is an implementation of the setuptools programming interface and is useful for finding, installing, and managing dependencies. The document recommends using Gradle as a build orchestrator to resolve dependencies, run builds, tests, and publishing. It proposes ways to integrate Python packaging metadata with Gradle.
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024 (Sinan KOZAK)
Sinan from the Delivery Hero mobile infrastructure engineering team shares a deep dive into performance acceleration with Gradle build cache optimizations. Sinan shares their journey into solving complex build-cache problems that affect Gradle builds. By understanding the challenges and solutions found in our journey, we aim to demonstrate the possibilities for faster builds. The case study reveals how overlapping outputs and cache misconfigurations led to significant increases in build times, especially as the project scaled up with numerous modules using Paparazzi tests. The journey from diagnosing to defeating cache issues offers invaluable lessons on maintaining cache integrity without sacrificing functionality.
Embedded machine learning-based road conditions and driving behavior monitoring (IJECEIAES)
Car accident rates have increased in recent years, resulting in losses in human lives, properties, and other financial costs. An embedded machine learning-based system is developed to address this critical issue. The system can monitor road conditions, detect driving patterns, and identify aggressive driving behaviors. The system is based on neural networks trained on a comprehensive dataset of driving events, driving styles, and road conditions. The system effectively detects potential risks and helps mitigate the frequency and impact of accidents. The primary goal is to ensure the safety of drivers and vehicles. Data collection involved gathering information on three key road events (normal driving on a normal street, speed bumps, and circular yellow speed bumps) and three aggressive driving actions (sudden start, sudden stop, and sudden entry). The gathered data is processed and analyzed using a machine learning system designed for devices with limited power and memory. The developed system resulted in 91.9% accuracy, 93.6% precision, and 92% recall. The achieved inference time on an Arduino Nano 33 BLE Sense with a 32-bit CPU running at 64 MHz is 34 ms, and the model requires 2.6 kB peak RAM and 139.9 kB program flash memory, making it suitable for resource-constrained embedded systems.
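As a side note on the reported figures: accuracy, precision, and recall all derive from confusion-matrix counts. The counts in the sketch below are made up for illustration and are not the paper's data; only the formulas reflect how such metrics are computed.

```go
package main

import "fmt"

// metrics computes accuracy, precision, and recall from confusion-matrix
// counts (true positives, false positives, false negatives, true negatives).
func metrics(tp, fp, fn, tn float64) (acc, prec, rec float64) {
	acc = (tp + tn) / (tp + fp + fn + tn) // correct over total
	prec = tp / (tp + fp)                 // of flagged events, how many were real
	rec = tp / (tp + fn)                  // of real events, how many were caught
	return
}

func main() {
	// Illustrative counts only, not from the paper.
	acc, prec, rec := metrics(92, 8, 8, 92)
	fmt.Printf("acc=%.2f prec=%.2f rec=%.2f\n", acc, prec, rec)
}
```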
Introduction- e - waste – definition - sources of e-waste– hazardous substances in e-waste - effects of e-waste on environment and human health- need for e-waste management– e-waste handling rules - waste minimization techniques for managing e-waste – recycling of e-waste - disposal treatment methods of e- waste – mechanism of extraction of precious metal from leaching solution-global Scenario of E-waste – E-waste in India- case studies.
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw... (IJECEIAES)
Medical image analysis has witnessed significant advancements with deep learning techniques. In the domain of brain tumor segmentation, the ability to precisely delineate tumor boundaries from magnetic resonance imaging (MRI) scans holds profound implications for diagnosis. This study presents an ensemble convolutional neural network (CNN) with transfer learning, integrating the state-of-the-art Deeplabv3+ architecture with the ResNet18 backbone. The model is rigorously trained and evaluated, exhibiting remarkable performance metrics, including an impressive global accuracy of 99.286%, a high class accuracy of 82.191%, a mean intersection over union (IoU) of 79.900%, a weighted IoU of 98.620%, and a Boundary F1 (BF) score of 83.303%. Notably, a detailed comparative analysis with existing methods showcases the superiority of the proposed model. These findings underscore the model's competence in precise brain tumor localization, underscoring its potential to revolutionize medical image analysis and enhance healthcare outcomes. This research paves the way for future exploration and optimization of advanced CNN models in medical imaging, emphasizing addressing false positives and resource efficiency.
The CBC machine is a common diagnostic tool used by doctors to measure a patient's red blood cell count, white blood cell count and platelet count. The machine uses a small sample of the patient's blood, which is then placed into special tubes and analyzed. The results of the analysis are then displayed on a screen for the doctor to review. The CBC machine is an important tool for diagnosing various conditions, such as anemia, infection and leukemia. It can also help to monitor a patient's response to treatment.
CHINA'S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECT (jpsjournal1)
The rivalry between prominent international actors for dominance over Central Asia's hydrocarbon reserves and the ancient silk trade route, along with China's diplomatic endeavours in the area, has been referred to as the "New Great Game." This research centres on that power struggle, considering geopolitical, geostrategic, and geoeconomic variables. Topics including trade, political hegemony, oil politics, and conventional and nontraditional security are all explored and explained. Using Mackinder's Heartland theory, Spykman's Rimland theory, and Hegemonic Stability theory, the study examines China's role in Central Asia. It adheres to an empirical epistemological method, takes care to remain objective, and critically analyzes primary and secondary research documents to elaborate the role of China's geo-economic outreach in Central Asian countries and its future prospects. According to this study, China is seeing significant success in commerce, pipeline politics, and gaining influence over other governments, a success attributable to the effective use of key instruments such as the Shanghai Cooperation Organisation and the Belt and Road Economic Initiative.
Software Engineering and Project Management - Introduction, Modeling Concepts... (Prakhyath Rai)
Introduction, Modeling Concepts and Class Modeling: What is Object orientation? What is OO development? OO Themes; Evidence for usefulness of OO development; OO modeling history. Modeling
as Design technique: Modeling, abstraction, The Three models. Class Modeling: Object and Class Concept, Link and associations concepts, Generalization and Inheritance, A sample class model, Navigation of class models, and UML diagrams
Building the Analysis Models: Requirement Analysis, Analysis Model Approaches, Data modeling Concepts, Object Oriented Analysis, Scenario-Based Modeling, Flow-Oriented Modeling, class Based Modeling, Creating a Behavioral Model.
2. About Me: Ryan Neal
- Head of Infrastructure at Netlify
- Simultaneously fixing and breaking everything
- Senior Dev at Yelp
- Internal tools and metrics team
- Handled about 400k metrics/sec
- 12-18k pageviews/sec
- FDE at Palantir
- Developed counter-terrorism software
- 4 billion records/day
@ry_boflavin @netlify
3. @ry_boflavin @netlify
A developer’s toolkit for deploying git-backed,
browser-driven sites to an intelligent CDN
- Global CDN
- CI cluster
- Redundant DNS
- Prerender cluster
- Mongo cluster
- Rails cluster
- 4 cloud providers
- 14 PoPs
4. Distributed systems are cool
[architecture diagram: a Global CDN of many CDN nodes fronting an API cluster, a CI cluster of buildbot workers, a Pre-Render cluster, and DB land]
13. Immediate Problem
- Make the logs searchable
- Easy to add more logs
Long Term Vision
- A generic system to let services push data out
- An easy way to access that data for new and fun uses
Tool Requirements
- Easy installation
- Good scaling factors
- Secure
Spec before building
@ry_boflavin @netlify
14. And so the story begins...
@ry_boflavin @netlify
RabbitMQ
- Existing infrastructure
- Didn’t need enterprise messaging features
- Data was only metrics, telemetry and logs
Kafka
- Didn’t want to run ZooKeeper
- Didn’t need rewind or buffering
15.–20. Creating the Data plane
@ry_boflavin @netlify
[diagram sequence, one addition per slide: a random service publishes logs into nats (15); a streamer subscribes to the stream (16); a second random service publishes as well (17); elastinats consumes from nats and writes into es (18); elastinats is scaled out to a pool of instances (19); taptap joins the pipeline alongside the streamer (20)]
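The fan-out these data-plane slides build up can be mimicked in-process. This is a toy stand-in, not the nats client API: `hub`, `subscribe`, and `publish` are invented names, but the shape matches the diagrams, where the streamer and the elastinats pool each see the full log stream.

```go
package main

import "fmt"

// hub is a tiny in-process stand-in for the nats server in the slides:
// every published message is delivered to all subscribers.
type hub struct{ subs []chan string }

// subscribe registers a new consumer and returns its channel.
func (h *hub) subscribe() chan string {
	ch := make(chan string, 16)
	h.subs = append(h.subs, ch)
	return ch
}

// publish fans the message out to every subscriber.
func (h *hub) publish(msg string) {
	for _, ch := range h.subs {
		ch <- msg
	}
}

func main() {
	h := &hub{}
	streamer := h.subscribe()   // live log tailing
	elastinats := h.subscribe() // indexing into es
	h.publish(`{"level":"info","msg":"deploy finished"}`)
	fmt.Println(<-streamer == <-elastinats) // true
}
```

The key property is that consumers are independent: adding a new use for the data (the "new and fun uses" from the long-term vision) is just another subscription, with no change to the publishers.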
21. Elastinats lessons
@ry_boflavin @netlify
func(m *nats.Msg) {
    stats.IncrementMessagesConsumed()
    // hand the work off so the NATS subscription callback returns right away
    go func() {
        payload := message.NewPayload(string(m.Data), m.Subject)
        // maybe it is json!
        _ = json.Unmarshal(m.Data, payload)
        c <- payload
    }()
}
func(m *nats.Msg) {
    stats.IncrementMessagesConsumed()
    payload := message.NewPayload(string(m.Data), m.Subject)
    // maybe it is json!
    _ = json.Unmarshal(m.Data, payload)
    c <- payload // this send blocks the subscriber callback when c is full
}
- Don’t block the consumer
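One possible refinement of the non-blocking version above: spawning a goroutine per message keeps the callback fast, but the number of in-flight goroutines is unbounded under load. A bounded alternative is a non-blocking send that drops (and counts) messages when the pipeline is saturated. `Msg` and `handler` here are illustrative stand-ins for the real `*nats.Msg` callback, not elastinats code.

```go
package main

import "fmt"

// Msg is a stand-in for *nats.Msg in this sketch.
type Msg struct {
	Subject string
	Data    []byte
}

// handler returns a callback that never blocks the consumer: it attempts
// a non-blocking send into a bounded channel and drops the message
// (incrementing a counter) when downstream cannot keep up.
func handler(c chan *Msg, dropped *int) func(*Msg) {
	return func(m *Msg) {
		select {
		case c <- m: // fast path: downstream keeps up
		default: // channel full: drop rather than block
			*dropped++
		}
	}
}

func main() {
	c := make(chan *Msg, 2) // bounded buffer
	dropped := 0
	h := handler(c, &dropped)
	for i := 0; i < 3; i++ {
		h(&Msg{Subject: "logs", Data: []byte("{}")})
	}
	fmt.Println(len(c), dropped) // 2 1
}
```

Dropping telemetry under pressure is often preferable to back-pressuring the message bus, since logs and metrics are lossy by nature.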
26. Future Work
@ry_boflavin @netlify
- Use a nats_metrics library to measure and push to nats
- Add more taps for log analysis
- Migrate legacy services to push based metrics and logs