Talk I did on log aggregation with the ELK stack at Leeds DevOps. Covers how we process over 800,000 logs per hour at LateRooms, and the cultural changes this has helped drive.
This document summarizes an ELK meetup that took place on March 2nd 2015. It discusses using ELK for log processing, in public clouds like AWS, and activities like kite surfing. The document also provides information on Wind Analytics and their next steps, monitoring large AWS environments, implementing ELK with the right architecture, and Logz.io which provides an ELK as a service solution and insights. It includes demos of Logz.io's architecture and log processing. The meetup concluded with information on job opportunities at Logz.io.
Interactive learning analytics dashboards with ELK (Elasticsearch, Logstash, Kibana), by Andrii Vozniuk
My workshop at the Learning Analytics Summer Institute (LASI) 2016: http://lasi16.snola.es/#!/schedule/113
Educational data continues to grow in volume, velocity and variety. Making sense of educational data under these conditions requires deploying and using appropriately scalable, real-time processing tools that support a flexible data schema. Elasticsearch is one of the popular open-source tools that meets these requirements. Initially envisioned as a search engine capable of operating at scale and in real time, Elasticsearch is used by organisations such as Wikimedia and GitHub, which deal with big data on a daily basis. In addition, Elasticsearch is increasingly used as an analytics platform thanks to its scalable architecture and expressive query language. Until recently, the exploitation of Elasticsearch for (learning) analytics by practitioners was hindered by a high entry barrier due to the complexity and specificities of the query language. This is currently changing with the ongoing development of Kibana, an open-source tool that lets users conduct analysis and build visualisations of Elasticsearch data through a graphical user interface. Kibana does not require the user to dive into the technical details of the queries (although it is still possible) and hence makes big educational data visualisations accessible to regular users. The additional value of Kibana comes into play when several visualisations are combined on a single dashboard, enabling multiple coordinated views for interactive exploratory analysis. Elasticsearch and Kibana, together with Logstash, are part of an analytics stack often referred to as ELK. Logstash supports data acquisition from multiple sources (including Twitter, RSS and event logs) thanks to its rich set of available connectors, and custom connectors can be developed for case-specific sources.
Beyond the values mentioned above, ELK enables building an analytics infrastructure that is decoupled from the learning platform, i.e., the learning environment (with its analytics functionality) and the data storage can be hosted separately without affecting the end-user experience.
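To give a feel for the query complexity that Kibana hides, here is a sketch of the kind of Elasticsearch aggregation request a single dashboard panel might issue, e.g. daily activity with a unique-student count. The index and field names (`@timestamp`, `student_id`) are illustrative assumptions, not taken from the talk:

```json
{
  "size": 0,
  "query": { "range": { "@timestamp": { "gte": "now-7d" } } },
  "aggs": {
    "activity_per_day": {
      "date_histogram": { "field": "@timestamp", "calendar_interval": "1d" },
      "aggs": {
        "unique_students": { "cardinality": { "field": "student_id" } }
      }
    }
  }
}
```

In Kibana the same result is a few clicks on a date histogram visualisation, which is precisely the entry-barrier reduction the workshop describes.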
Log analysis using Logstash, Elasticsearch and Kibana, by Avinash Ramineni
This document provides an overview of Logstash, Elasticsearch, and Kibana for log analysis. It discusses how logging is used for troubleshooting, security, and monitoring. It then introduces Logstash as an open-source log collection and parsing tool. Elasticsearch is described as a search and analytics engine that indexes log data from Logstash. Kibana provides a web interface for visualizing and searching logs stored in Elasticsearch. The document concludes by discussing a demo, installation, scaling, and deployment considerations for these log analysis tools.
The document discusses log aggregation and analysis using the Elastic Stack. It describes how the Elastic Stack collects logs from various sources using lightweight data shippers called Beats. The logs are then processed and structured by Logstash before being stored in Elasticsearch for exploration and visualization using Kibana. Demos are provided showing how the Elastic Stack can parse nginx logs, capture logs from a Django application, and monitor node metrics.
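The nginx-parsing demo mentioned above boils down to turning an unstructured access-log line into a structured event, which in Logstash is done with a grok filter. As a minimal sketch of the same extraction in plain Python (the field names mirror what a combined-format log contains; the sample line is made up):

```python
import re

# Combined log format, as written by nginx and Apache by default.
# The named groups mirror what Logstash's %{COMBINEDAPACHELOG}-style
# grok pattern would extract.
LOG_PATTERN = re.compile(
    r'(?P<client>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def parse_access_log_line(line):
    """Turn one raw access-log line into a structured dict, or None."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    event = match.groupdict()
    event["status"] = int(event["status"])
    event["bytes"] = 0 if event["bytes"] == "-" else int(event["bytes"])
    return event

line = '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326'
event = parse_access_log_line(line)
print(event["method"], event["status"], event["bytes"])  # GET 200 2326
```

Once events are structured like this, storing them in Elasticsearch makes per-field search and aggregation (status codes, paths, byte counts) straightforward.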
Toronto High Scalability meetup - Scaling ELK, by Andrew Trossman
The document discusses scaling logging and monitoring infrastructure at IBM. It describes:
1) User scenarios that generate varying amounts of log data, from small internal groups generating 3-5 TB/day to many external users generating kilobytes to gigabytes per day.
2) The architecture uses technologies like OpenStack, Docker, Kafka, Logstash, Elasticsearch, Grafana to process and analyze logs and metrics.
3) Key aspects of scaling include automating deployments with Heat and Ansible, optimizing components like Logstash and Elasticsearch, and techniques like sharding indexes across multiple nodes.
ELK (Elasticsearch, Logstash, Kibana) is an open source toolset for centralized logging, where Logstash collects, parses, and filters logs, Elasticsearch stores and indexes logs for search, and Kibana visualizes logs. Logstash processes logs through an input, filter, output pipeline using plugins. It can interpret various log formats and event types. Elasticsearch allows real-time search and scaling through replication/sharding. Kibana provides browser-based dashboards and visualization of Elasticsearch query results.
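The input, filter, output pipeline described above maps directly onto the three sections of a Logstash configuration file. A minimal sketch (the file path and Elasticsearch host are assumptions for illustration):

```conf
input {
  # Tail an access log on disk.
  file { path => "/var/log/nginx/access.log" }
}
filter {
  # Parse the raw line into fields, then use its timestamp as the event time.
  grok { match => { "message" => "%{COMBINEDAPACHELOG}" } }
  date { match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ] }
}
output {
  # Index the structured event for search and Kibana dashboards.
  elasticsearch { hosts => ["localhost:9200"] }
}
```

Each section accepts any number of plugins, which is how the same pipeline shape accommodates the many log formats and event types mentioned above.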
A presentation about the deployment of an ELK stack at bol.com
At bol.com we use Elasticsearch, Logstash and Kibana in a log search system that allows our developers and operations people to easily access and search through log events coming from all layers of our infrastructure.
The presentation explains the initial design and its failures, then the latest design (mid 2014) and its improvements. Finally, a set of tips is given on scaling Logstash and Elasticsearch.
These slides were first presented at the Elasticsearch NL meetup on September 22nd 2014 at the Utrecht bol.com HQ.
How bol.com makes sense of its logs, using the Elastic technology stack, by Renzo Tomà
Bol.com uses the Elastic (ELK) stack to make sense of logs from over 1,600 servers and 500-600 million events per day. Key aspects of their system include:
1. Shipping JSON-formatted log events from sources like Apache, databases, and applications to Redis queues to allow multiple Logstash instances to process events in real-time without data loss.
2. Enriching log events with information like request IDs to correlate requests across services, and IP-to-role mappings to identify client roles.
3. Using Elasticsearch aggregations and transformations to generate a directed graph of service dependencies based on logs, to help understand their distributed architecture.
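The third point, deriving a directed service-dependency graph from enriched log events, is essentially counting caller-to-callee edges, which in Elasticsearch would be a nested terms aggregation. A small pure-Python sketch of the same idea (the service names and field names are invented for illustration, not taken from bol.com's data):

```python
from collections import Counter

# Each enriched log event carries the calling role (from the IP-to-role
# mapping) and the service that handled the request.
events = [
    {"client_role": "webshop", "service": "basket-service"},
    {"client_role": "webshop", "service": "search-service"},
    {"client_role": "basket-service", "service": "price-service"},
    {"client_role": "webshop", "service": "basket-service"},
]

def dependency_graph(events):
    """Count directed caller -> callee edges, as a terms aggregation would."""
    return Counter((e["client_role"], e["service"]) for e in events)

for (caller, callee), count in sorted(dependency_graph(events).items()):
    print(f"{caller} -> {callee} ({count} calls)")
```

Feeding the resulting edge list into any graph renderer yields the architecture picture the talk describes, kept up to date simply because the logs are.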
Centralized Logging System Using ELK Stack, by Rohit Sharma
The document discusses setting up a centralized logging system (CLS) using the ELK stack. The ELK stack consists of Logstash to capture and filter logs, Elasticsearch to index and store logs, and Kibana to visualize logs. Logstash agents on each server ship logs to Logstash, which filters and sends logs to Elasticsearch for indexing. Kibana queries Elasticsearch and presents logs through interactive dashboards. A CLS provides benefits like log analysis, auditing, compliance, and a single point of control. The ELK stack is an open-source solution that is scalable, customizable, and integrates with other tools.
This document introduces the (B)ELK stack, which consists of Beats, Elasticsearch, Logstash, and Kibana. It describes each component and how they work together. Beats are lightweight data shippers that collect data from logs and systems. Logstash processes and transforms data from inputs like Beats. Elasticsearch stores and indexes the data. Kibana provides visualization and analytics capabilities. The document provides examples of using each tool and tips for working with the ELK stack.
This presentation deals with logging in the course of mobile development, namely describing an open-source logging environment built with the ELK stack (Elasticsearch, Logstash and Kibana).
Presentation by Igor Rudyk (Software Engineer, GlobalLogic, Lviv), delivered at Mobile TechTalk Lviv on April 28, 2015.
More details - http://globallogic.com.ua/mobile-techtalk-lviv-2015-report
This document discusses the ELK stack, which consists of Elasticsearch, Logstash, and Kibana. It provides an overview of each component, including that Elasticsearch is a search and analytics engine, Logstash is a data collection engine, and Kibana is a data visualization platform. The document then discusses setting up an ELK stack to index and visualize application logs.
This document discusses using the ELK stack (Elasticsearch, Logstash, Kibana) for log analysis. It describes the author's experience using Splunk and alternatives like Graylog and Elasticsearch before settling on the ELK stack. The key components - Logstash for input, Elasticsearch for storage and searching, and Kibana for the user interface - are explained. Troubleshooting tips are provided around checking that the components are running and communicating properly.
Log management has always been a complex topic, and over time various solutions of differing complexity have been tried, often difficult to integrate into one's application stack. We give a general overview of the main systems for advanced real-time log aggregation (Fluentd, Graylog, etc.) and explain what drove us to choose ELK to solve a need of our client: making logs readable by non-technical people.
The ELK stack (Elasticsearch, Logstash, Kibana) lets developers consult logs during debugging and in production without relying on the sysadmin staff. We demonstrate how we deployed the ELK stack and implemented it to parse and structure
Magento's application logs.
This document provides an overview of Presto as a Service in Treasure Data, including how Treasure Data deploys and monitors Presto. Key points include:
- Treasure Data offers Presto as an interactive query engine accessible through its API and web console.
- Treasure Data uses blue-green deployments and a private Maven repository to deploy new Presto versions with no downtime.
- Treasure Data monitors Presto using its REST API and collects query logs to analyze performance and detect anomalies.
- Treasure Data implements multi-tenancy in Presto by allocating resources like worker nodes based on customers' price plans and resource usage.
Kibana + Timelion: time series with the Elastic Stack, by Sylvain Wallez
The document discusses Kibana and Timelion, which are tools for visualizing and analyzing time series data in the Elastic Stack. It provides an overview of Kibana's evolution and capabilities for creating dashboards. Timelion is introduced as a scripting language that allows users to transform, aggregate, and calculate on time series data from multiple sources to create visualizations. The document demonstrates Timelion's expression language, which includes functions, combinations, filtering, and attributes to process and render time series graphs.
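To illustrate the expression language described above, here is a sketch of a Timelion expression that chains functions to transform and label two series. The index name and query string are assumptions for illustration:

```
.es(index=logs-*, q='status:500').label('5xx errors'),
.es(index=logs-*, q='status:500').movingaverage(10).label('smoothed')
```

Each comma-separated expression renders as one series on the same chart, which is how Timelion combines multiple sources and calculations in a single visualization.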
Monitoring, Hold the Infrastructure - Getting the Most out of AWS Lambda – Da..., by Amazon Web Services
This document discusses monitoring AWS Lambda functions. It begins with an introduction to AWS Lambda and important concepts like triggers, statelessness, and serverlessness. It then covers how to create and add Lambda functions to infrastructure, and provides examples of common uses. The document emphasizes that collecting data is cheap but not having it when needed can be expensive. It outlines three options for monitoring Lambda functions and how Datadog specifically handles it by adding lines to CloudWatch logs. The presentation concludes with a thank you and opportunities to follow up.
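The "adding lines to CloudWatch logs" approach mentioned above relies on the fact that anything a Lambda handler prints ends up in CloudWatch Logs, where a monitoring tool can parse it into metrics. A hedged Python sketch of that pattern; the pipe-delimited line shape, metric name and tag here are assumptions for illustration, not Datadog's exact specification:

```python
import time

def report_metric(name, value, metric_type="count", tags=()):
    """Emit a metric as a structured, parseable log line. Inside a Lambda
    handler this print lands in CloudWatch Logs, from which a log-based
    monitor can extract the metric without extra infrastructure."""
    tag_part = "#" + ",".join(tags) if tags else ""
    line = f"MONITORING|{int(time.time())}|{value}|{metric_type}|{name}|{tag_part}"
    print(line)
    return line

def handler(event, context):
    # Business logic would go here; report one processed order.
    report_metric("orders.processed", 1, tags=("env:prod",))
    return {"statusCode": 200}
```

The appeal of this option is exactly the document's point that collecting data is cheap: emitting a line costs almost nothing, while lacking the data during an incident can be expensive.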
This document summarizes recent updates to Norikra, an open source stream processing server. Key updates include:
1) The addition of suspended queries, which allow queries to be temporarily stopped and resumed later, and NULLABLE fields, which handle missing fields as null values.
2) New listener plugins that allow processing query outputs in customizable ways, such as pushing to users, enqueueing to Kafka, or filtering records.
3) Dynamic plugin reloading that loads newly installed plugins without requiring a restart, improving uptime.
Centralized logging system using MongoDB, by Vivek Parihar
This talk covers the need for a centralized logging system and showcases its architecture: how we ended up building it, why we needed such a system, what problems we faced, how MongoDB fits into this, and what others can learn from it.
I also covered how MongoDB can be used to turn the logging system into a real-time analytics and alerting system.
The major use case of this system is keeping track of meaningful events, such as:
1. How many users registered?
2. How many registrations failed?
3. The most frequent errors during a given operation.
4. Real-time analytics and alerts.
5. Identifying possible threats.
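The event-tracking questions above reduce to filtered counts over event documents; in MongoDB the second one would be roughly `db.events.count_documents({"type": "registration", "ok": False})`. A pure-Python sketch of the same queries over sample documents (the field names and sample data are illustrative assumptions):

```python
from collections import Counter

# Sample event documents, as they might land in a logging collection.
events = [
    {"type": "registration", "ok": True},
    {"type": "registration", "ok": False, "error": "email_taken"},
    {"type": "registration", "ok": True},
    {"type": "login", "ok": False, "error": "bad_password"},
    {"type": "login", "ok": False, "error": "bad_password"},
]

# 1. How many users registered?
registered = sum(1 for e in events if e["type"] == "registration" and e["ok"])
# 2. How many registrations failed?
failed = sum(1 for e in events if e["type"] == "registration" and not e["ok"])
# 3. Most frequent errors across all failed events.
top_errors = Counter(e["error"] for e in events if not e["ok"]).most_common()

print("registered:", registered)
print("failed registrations:", failed)
print("most common errors:", top_errors)
```

Running the same counts continuously over incoming events, and alerting when a threshold is crossed, is the real-time analytics and alerting use the talk describes.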
This document summarizes Johan Gustavsson's presentation on scaling Hadoop in the cloud. It discusses replacing an on-premise Hadoop cluster with Plazma storage on S3 and job execution in isolated pools. It also covers Treasure Data's Patchset project which aims to support multiple Hadoop versions and allow job-preserving restarts of the Elephant server.
Spark is used to perform in-memory transformations on customer data collected by Totango to generate analytics and insights. Luigi is used as a workflow engine to manage dependencies between batch processing tasks like metrics generation, health scoring, and alerting. The tasks are run on Spark and output to S3. A custom Gameboy controller provides monitoring and management of the Luigi workflow.
Technologies, Data Analytics Service and Enterprise Business, by Satoshi Tagomori
This document discusses technologies for data analytics services for enterprise businesses. It begins by defining enterprise businesses as those "not about IT" and data analytics services as providing insights into business metrics like customer reach, ad views, purchases, and more using data. It then outlines some key technologies needed for such services, including data management systems, distributed processing systems, queues and schedulers, tools for connecting systems, and methods for controlling jobs and workflows with retries to handle failures. Specific challenges around deadlines, idempotent operations, and replay-able workflows are also addressed.
PHP Johannesburg meetup - talk 2014 - Scaling PHP in the enterprise, by Sarel van der Walt
This document discusses scaling PHP applications for enterprise environments. It provides tips on optimizing various aspects of PHP applications and infrastructure to improve scalability. These include optimizing databases, caching, background tasks, frameworks, monitoring, and more. Specific technologies and strategies mentioned include Redis, memcached, haproxy, MySQL optimization techniques like archiving, and moving work to the client side where possible using techniques like AngularJS.
Dapper: the microORM that will change your life, by Davide Mauri
ORM or Stored Procedures? Code First or Database First? Ad-Hoc Queries? Impedance Mismatch? If you're a developer, or a DBA working with developers, you have heard all these terms at least once in your life, usually in the middle of a heated debate about one or the other. Well, thanks to StackOverflow's Dapper, all these fights are finished. Dapper is a blazing-fast microORM that allows developers to map SQL queries to classes automatically, leaving (and encouraging) the usage of stored procedures, parameterized statements and all the good stuff that SQL Server offers (JSON and TVPs are supported too!). In this session I'll show how to use Dapper in your projects, from the very basics to some more complex usages that will help you create *really fast* applications without the burden of huge and complex ORMs. The days of Impedance Mismatch are finally over!
Leveraging Databricks for Spark PipelinesRose Toomey
How Coatue Management saved time and money by moving Spark pipelines to Databricks.
Talk given at AWS + Databricks ML Dev Day workshop in NYC on 27 February 2020.
Leveraging Databricks for Spark pipelinesRose Toomey
How Coatue Management saved time and money by moving Spark pipelines to Databricks.
Talk given at AWS + Databricks ML Dev Day workshop in NYC on 27 February 2020.
The document discusses the hype around NoSQL databases and provides guidance on selecting the right database solution. It summarizes different database types and evaluates databases based on characteristics like concurrency control, data storage, replication, and transaction support. The document advises profiling applications carefully before selecting a database and avoiding premature decoupling of data.
PostgreSQL is the new NoSQL - at Devoxx 2018Quentin Adam
Have you seen the latest updates for traditional RDBNS lately? It's insane. They are all catching up and won't be left out. While all NoSQL stores are proposing SQL, all RDMS are proposing top notch JSON support. And it does not stop there.
Latest PostgreSQL version have added new scalability features like table partitioning, query parallelism, pub/sub framework, a new quorum system for data sync. They have also improved their window functions for better time series queryability.
And as it happens, we are using some of these new functionalities at Clever Cloud. In this talk I will showcase some of them to try to convince you that PostgreSQL is the new NoSQL.
talk is recorded here: https://www.youtube.com/watch?v=t8-BQjWJFKw
https://dvbe18.confinabox.com/talk/BLA-3308/PostgreSQL_is_the_new_NoSQL
The document discusses various hosting solutions for Drupal including web hosting, virtual private servers, dedicated servers, and Amazon EC2. It provides details on the costs, reliability, customization options, and maintenance requirements for each solution. Additionally, it covers some key terms and tools related to using Amazon EC2, such as instances, AMIs, EBS, S3 storage, the command line interface, and the ElasticFox browser plugin.
Deferred Processing in Ruby - Philly rb - August 2011rob_dimarco
The document discusses various options for deferred processing and queuing in Ruby, including Delayed::Job, Resque, Amazon SQS, and AMQP. It provides an overview of how each works, how to install and use them, their advantages and disadvantages, and when each may or may not be a good fit for different needs.
Amazing Speed: Elasticsearch for the .NET Developer- Adrian Carr, Codestock 2015Adrian Carr
The document summarizes a presentation about using Elasticsearch to improve search performance for applications with large amounts of data. It describes how the presenter previously used Elasticsearch at a previous job to speed up searches of a growing product catalog. The presenter then demonstrates how to install and use Elasticsearch with .NET applications using the Nest client library. Issues that may arise with integrating Elasticsearch into existing applications are also discussed, such as differences from relational databases and potential rework of user interfaces.
PostgreSQL is a well-known relational database. But in the last few years, it has gained capabilities that previously belonged only to "NoSQL" databases. In this talk, I describe several of PostgreSQL that give it such capabilities.
Slides for a talk.
Talk abstract:
In the dark of the night, if you listen carefully enough, you can hear databases cry. But why? As developers, we rarely consider what happens under the hood of widely used abstractions such as databases. As a consequence, we rarely think about the performance of databases. This is especially true to less widespread, but often very useful NoSQL databases.
In this talk we will take a close look at NoSQL database performance, peek under the hood of the most frequently used features to see how they affect performance and discuss performance issues and bottlenecks inherent to all databases.
BigDataCloud meetup - July 8th - Cost effective big-data processing using Ama...BigDataCloud
This document discusses using Amazon Elastic MapReduce (EMR) for cost-effective big data processing. It describes the author's experience using EMR to process 1TB of log data per week for a startup. Key advantages of EMR include only paying for usage, no hardware to maintain, and ability to customize cluster resources for different jobs. The author outlines best practices learned, such as splitting logs by type and processing in smaller windows, as well as next steps like using spot instances and NoSQL for improved performance and cost savings.
Slides for GUUG FFG2018 talk on rsyslog and containers. Describes the initial steps the rsyslog project took towards containers, uses cases seen by the team, problems we have seen and use of docker inside rsyslog's CI.
From a student to an apache committer practice of apache io tdbjixuan1989
This talk is introduce by Xiangdong Huang, who is a PPMC of Apache IoTDB (incubating) project, at Apache Event at Tsinghua University in China.
About the Event:
The open source ecosystem plays more and more important role in the world. Open source software is widely used in operating systems, cloud computing, big data, artificial intelligence, and industrial Internet. Many companies have gradually increased their participation in the open source community. Developers with open source experience are increasingly valued and favored by large enterprises. The Apache Software Foundation is one of the most important open source communities, contributing a large number of valuable open source software and communities to the world.
The invited guests of this lecture are all from ASF community, including the chairman of the Apache Software Foundation, three Apache members, Top 5 Apache code committers (according to Apache annual report), the first Committer in the Hadoop project in China, several Apache project mentors or VPs, and many Apache Committers. They will tell you what the open source culture is, how to join the Apache open source community, and the Apache Way.
Public Terabyte Dataset Project: Web crawling with Amazon Elastic MapReduceHadoop User Group
The document discusses the Public Terabyte Dataset Project which aims to create a large crawl of top US domains for public use on Amazon's cloud. It describes how the project uses various Amazon Web Services like Elastic MapReduce and SimpleDB along with technologies like Hadoop, Cascading, and Tika for web crawling and data processing. Common issues encountered include configuration problems, slow performance from fetching all web pages or using Tika language detection, and generating log files instead of results.
A presentation on the selection criteria, testing + evaluation and successful, zero-downtime migration to MongoDB. Additionally details on Wordnik's speed and stability are covered as well as how NoSQL technologies have changed the way Wordnik scales.
The document discusses a presentation on using PostgreSQL as a schemaless database. It provides an overview of different document storage options in PostgreSQL, including XML, hstore, and JSON. It then describes some performance tests conducted to compare loading and querying data stored in these PostgreSQL document formats versus a traditional relational schema and MongoDB. The test results showed PostgreSQL with a relational schema performed best for bulk loading, while PostgreSQL with B-tree indexes outperformed hstore, XML, JSON and MongoDB for primary key lookups. Hstore indexes were much slower than B-tree indexes for simple queries.
Get more than a cache back! The Microsoft Azure Redis Cache (NDC Oslo)Maarten Balliauw
The document discusses Azure Cache and Redis. It provides an overview of Redis, including its data types, transactions, pub/sub capabilities, scripting, and sharding/partitioning. It then discusses common patterns for using Redis, such as caching, counting likes on Facebook, getting the latest reviews, rate limiting, and autocompletion. The document emphasizes that Redis is very flexible and can be used for more than just caching, acting as a general datastore. It concludes by recommending a Redis reference book for further learning.
First Hive Meetup London 2012-07-10 - Tomas Cervenka - VisualDNATomas Cervenka
Tomáš Červenka will discuss Hive, an open-source data warehousing system built on Hadoop that provides SQL-like queries over large datasets. He will explain what Hive is useful for (big data analytics and processing), and not useful for (real-time queries and algorithms difficult to parallelize). He will demonstrate how to get started with Hive using Amazon EMR and provide a sample query, and discuss how VisualDNA uses Hive for analytics, reporting pipelines, and machine learning inference. Tips provided include using fast instance types, compression, and partitioning data.
Generating privacy-protected synthetic data using Secludy and MilvusZilliz
During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of LLMs under privacy constraints. Attendees, including machine learning engineers, data scientists, and data managers, will witness first-hand how Secludy's integration with Milvus empowers organizations to harness the power of LLMs securely and efficiently.
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...Alex Pruden
Folding is a recent technique for building efficient recursive SNARKs. Several elegant folding protocols have been proposed, such as Nova, Supernova, Hypernova, Protostar, and others. However, all of them rely on an additively homomorphic commitment scheme based on discrete log, and are therefore not post-quantum secure. In this work we present LatticeFold, the first lattice-based folding protocol based on the Module SIS problem. This folding protocol naturally leads to an efficient recursive lattice-based SNARK and an efficient PCD scheme. LatticeFold supports folding low-degree relations, such as R1CS, as well as high-degree relations, such as CCS. The key challenge is to construct a secure folding protocol that works with the Ajtai commitment scheme. The difficulty, is ensuring that extracted witnesses are low norm through many rounds of folding. We present a novel technique using the sumcheck protocol to ensure that extracted witnesses are always low norm no matter how many rounds of folding are used. Our evaluation of the final proof system suggests that it is as performant as Hypernova, while providing post-quantum security.
Paper Link: https://eprint.iacr.org/2024/257
Your One-Stop Shop for Python Success: Top 10 US Python Development Providersakankshawande
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
Have you ever been confused by the myriad of choices offered by AWS for hosting a website or an API?
Lambda, Elastic Beanstalk, Lightsail, Amplify, S3 (and more!) can each host websites + APIs. But which one should we choose?
Which one is cheapest? Which one is fastest? Which one will scale to meet our needs?
Join me in this session as we dive into each AWS hosting service to determine which one is best for your scenario and explain why!
Dandelion Hashtable: beyond billion requests per second on a commodity serverAntonios Katsarakis
This slide deck presents DLHT, a concurrent in-memory hashtable. Despite efforts to optimize hashtables, that go as far as sacrificing core functionality, state-of-the-art designs still incur multiple memory accesses per request and block request processing in three cases. First, most hashtables block while waiting for data to be retrieved from memory. Second, open-addressing designs, which represent the current state-of-the-art, either cannot free index slots on deletes or must block all requests to do so. Third, index resizes block every request until all objects are copied to the new index. Defying folklore wisdom, DLHT forgoes open-addressing and adopts a fully-featured and memory-aware closed-addressing design based on bounded cache-line-chaining. This design offers lock-free index operations and deletes that free slots instantly, (2) completes most requests with a single memory access, (3) utilizes software prefetching to hide memory latencies, and (4) employs a novel non-blocking and parallel resizing. In a commodity server and a memory-resident workload, DLHT surpasses 1.6B requests per second and provides 3.5x (12x) the throughput of the state-of-the-art closed-addressing (open-addressing) resizable hashtable on Gets (Deletes).
Ivanti’s Patch Tuesday breakdown goes beyond patching your applications and brings you the intelligence and guidance needed to prioritize where to focus your attention first. Catch early analysis on our Ivanti blog, then join industry expert Chris Goettl for the Patch Tuesday Webinar Event. There we’ll do a deep dive into each of the bulletins and give guidance on the risks associated with the newly-identified vulnerabilities.
Monitoring and Managing Anomaly Detection on OpenShift.pdfTosin Akinosho
Monitoring and Managing Anomaly Detection on OpenShift
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system.
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
Driving Business Innovation: Latest Generative AI Advancements & Success StorySafe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
Skybuffer SAM4U tool for SAP license adoptionTatiana Kojar
Manage and optimize your license adoption and consumption with SAM4U, an SAP free customer software asset management tool.
SAM4U, an SAP complimentary software asset management tool for customers, delivers a detailed and well-structured overview of license inventory and usage with a user-friendly interface. We offer a hosted, cost-effective, and performance-optimized SAM4U setup in the Skybuffer Cloud environment. You retain ownership of the system and data, while we manage the ABAP 7.58 infrastructure, ensuring fixed Total Cost of Ownership (TCO) and exceptional services through the SAP Fiori interface.
GraphRAG for Life Science to increase LLM accuracyTomaz Bratanic
GraphRAG for life science domain, where you retriever information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers
Digital Marketing Trends in 2024 | Guide for Staying AheadWask
https://www.wask.co/ebooks/digital-marketing-trends-in-2024
Feeling lost in the digital marketing whirlwind of 2024? Technology is changing, consumer habits are evolving, and staying ahead of the curve feels like a never-ending pursuit. This e-book is your compass. Dive into actionable insights to handle the complexities of modern marketing. From hyper-personalization to the power of user-generated content, learn how to build long-term relationships with your audience and unlock the secrets to success in the ever-shifting digital landscape.
How information systems are built or acquired puts information, which is what they should be about, in a secondary place. Our language adapted accordingly, and we no longer talk about information systems but applications. Applications evolved in a way to break data into diverse fragments, tightly coupled with applications and expensive to integrate. The result is technical debt, which is re-paid by taking even bigger "loans", resulting in an ever-increasing technical debt. Software engineering and procurement practices work in sync with market forces to maintain this trend. This talk demonstrates how natural this situation is. The question is: can something be done to reverse the trend?
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
3. Home-growing a metrics culture
Needed visibility of live issues
Had trialled off-the-shelf tools before (Splunk)
They hadn't gained traction
Still wanted the data
4. Options...
Tried Splunk
...a bit pricey: you pay for hardware and for the volume of data indexed
Looked at cloud-based options; they were also expensive
19. Why the Queue?
● Resilience
● Single source of data for everyone
● Logstash used to recommend RabbitMQ; now they recommend Redis
● We still use RabbitMQ; it works for us
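As a sketch, a shipper/indexer pair in this shape might look like the following Logstash configs (hostnames, paths, exchange and queue names are illustrative, not our real setup):

```conf
# Shipper: tail application logs and publish them to RabbitMQ
input {
  file { path => "/var/log/app/*.log" }
}
output {
  rabbitmq {
    host          => "rabbitmq.example.com"
    exchange      => "logs"
    exchange_type => "topic"
    key           => "app.logs"
  }
}

# Indexer (a separate Logstash instance): consume the queue, index into Elasticsearch
input {
  rabbitmq {
    host  => "rabbitmq.example.com"
    queue => "logstash"
    key   => "app.#"
  }
}
output {
  elasticsearch { host => "es.example.com" }
}
```

The queue in the middle is what lets the indexers fall behind (or be restarted) without losing events.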
20. Kibana
● Easy to build dashboards
● Gateway drug to Elasticsearch queries
● Examples!
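Every Kibana panel is just an Elasticsearch query underneath — for example, an events-per-hour histogram boils down to a `date_histogram` aggregation roughly like this (index and field names are illustrative):

```json
GET /logstash-2015.03.02/_search
{
  "size": 0,
  "aggs": {
    "events_per_hour": {
      "date_histogram": { "field": "@timestamp", "interval": "1h" }
    }
  }
}
```

Once people see the query a dashboard generates, writing their own is a small step.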
27. Mistake: Using Elasticsearch as a TSDB
Lots of our graphs only cared about top-level values; those should use a time-series database (such as Graphite) instead.
Elasticsearch's use case is more in-depth data analysis.
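For those top-level values, Graphite's plaintext protocol is about as simple as it gets: one `metric value timestamp` line per datapoint, sent to carbon (port 2003 by default). A minimal sketch, with a made-up metric name and host:

```python
import socket
import time

def graphite_line(path, value, timestamp=None):
    """Format one metric in Graphite's plaintext protocol: 'path value unix_ts'."""
    if timestamp is None:
        timestamp = int(time.time())
    return "%s %s %d\n" % (path, value, timestamp)

def send_metric(path, value, host="graphite.example.com", port=2003):
    """Push a single top-level value straight to carbon's plaintext port."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(graphite_line(path, value).encode("ascii"))
```

No index, no mapping, no JSON: for "how many per minute" style graphs that's all a TSDB needs.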
28. Mistake: Trying to keep too much data
● Nodes running out of memory or disk space is bad
● Long GC pauses can cause nodes to drop out of the cluster
● That can lead to split brain
● More shards = more memory usage; watch your scaling
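Shard count per index is one of the knobs here: Elasticsearch 1.x defaults to five shards plus a replica for every daily logstash index, which adds up fast. An index template can pin it down — the values below are illustrative, not a recommendation:

```json
PUT /_template/logstash
{
  "template": "logstash-*",
  "settings": {
    "index.number_of_shards": 2,
    "index.number_of_replicas": 1
  }
}
```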
29. Scaling
We hit two bottlenecks:
- Ingestion (solved with SSDs)
- Search (solved by scaling horizontally)
1.4.0 brings stability improvements and should handle OOM better
30. Other Mistakes
Should have automated
sooner
(Good chef/puppet support)
Should have used
“normal” logstash more
More
node
More
awesome??
31. What went right?
● Free and easy access to data
● It doesn't need to be in Elasticsearch, but the tooling makes it easy
● Give people access and they'll seek out the data to drive decisions: start the feedback loop
● Dev/Test instance
32. ELK in the wild
Data Driven QA
Data Driven...Managering
33. But wait, there's more!
Curator, Kibana 4 (woo, aggregations!), alerting, linking logs together…
Too much to cover here!
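Curator, for instance, closes or deletes old indices on a schedule; with the 3.x-era CLI (flags vary between Curator versions, so check yours) deleting month-old daily indices looks roughly like:

```shell
# Assumes daily logstash-YYYY.MM.DD indices; run from cron
curator --host localhost delete indices --older-than 30 --time-unit days --timestring '%Y.%m.%d'
```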
34. Thanks for Listening!
More: elasticsearch.org, logstash.net
Blog: www.tegud.net
Twitter: @tegud
Github: www.github.com/tegud
Come say hi!