The document discusses architecting and sizing a Splunk deployment. It covers key factors to consider like data volume, search volume, and roles of servers in a distributed Splunk topology. Recommendations are provided around server configurations based on roles like indexer, search head, and forwarder. A reference server specification is also outlined for estimating hardware needs.
Apache Spark Listeners: A Crash Course in Fast, Easy Monitoring (Databricks)
The Spark Listener interface provides a fast, simple and efficient route to monitoring and observing your Spark application - and you can start using it in minutes. In this talk, we'll introduce the Spark Listener interfaces available in core and streaming applications, and show a few ways in which they've changed our world for the better at SpotX. If you're looking for a "Eureka!" moment in monitoring or tracking of your Spark apps, look no further than Spark Listeners and this talk!
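The listener idea the talk builds on can be sketched in a few lines: an event bus posts scheduler events to registered callbacks. The sketch below is pure Python, not Spark's actual implementation; only the callback naming mirrors the real Scala `SparkListener` API (e.g. `onStageCompleted`), and the bus and metric names are invented for illustration.

```python
# Pure-Python sketch of the listener pattern behind Spark's SparkListener
# interface. The ListenerBus here is a simplified stand-in: Spark's real
# bus posts many event types (job start, stage completed, task end, ...).

class MetricsListener:
    """Collects per-stage runtimes, the way a custom listener might."""
    def __init__(self):
        self.stage_runtimes = {}

    def on_stage_completed(self, stage_id, runtime_ms):
        self.stage_runtimes[stage_id] = runtime_ms

class ListenerBus:
    """Delivers each posted event to every registered listener."""
    def __init__(self):
        self._listeners = []

    def add_listener(self, listener):
        self._listeners.append(listener)

    def post_stage_completed(self, stage_id, runtime_ms):
        for listener in self._listeners:
            listener.on_stage_completed(stage_id, runtime_ms)

bus = ListenerBus()
metrics = MetricsListener()
bus.add_listener(metrics)
bus.post_stage_completed(stage_id=0, runtime_ms=1200)
bus.post_stage_completed(stage_id=1, runtime_ms=300)
print(metrics.stage_runtimes)  # {0: 1200, 1: 300}
```

In Spark itself you would extend `SparkListener` and register it with the context; the point of the pattern is that monitoring code stays decoupled from the application logic.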
Scaling ML-Based Threat Detection For Production Cyber Attacks (Databricks)
Vulnerabilities such as Spectre and Meltdown continue to plague many production servers based on Intel CPUs. Our solution involves software-based monitoring of hardware counters and sending that data to Apache Spark clusters for threat detection. We leverage Spark's support for support vector machine (SVM) inference. Our machine learning models are trained offline by a data scientist within a Jupyter notebook environment. As new models are validated, they can be easily deployed to the Spark cluster from the notebook. We have standardized model export and import using the ONNX open machine learning file format. In our presentation, we will demo the full pipeline, from model training to deployment, and discuss the various challenges of deploying ML-based cyber-threat detection at scale with Apache Spark. For example, we found that gaps in detection can occur when Spark models are updated; we will describe a novel data ingestion architecture, based on Apache Kafka, that we developed to deal with this issue.
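The inference step at the heart of this pipeline is simple to state: a linear SVM scores a feature vector against a separating hyperplane. The sketch below shows that scoring in plain Python; the weights, bias, and the counter-derived features are invented for illustration (in the talk's pipeline the trained model would arrive via ONNX, not be hand-written).

```python
# Minimal sketch of linear-SVM inference over hardware-counter features.

def svm_decision(weights, bias, features):
    """Signed distance from the separating hyperplane: w . x + b."""
    return sum(w * x for w, x in zip(weights, features)) + bias

def classify(weights, bias, features):
    """Positive side of the hyperplane -> flag as suspicious."""
    return "suspicious" if svm_decision(weights, bias, features) > 0 else "benign"

# Hypothetical model over two counter-derived features
# (say, cache-miss rate and branch-miss rate), scaled to [0, 1].
w, b = [2.0, 1.5], -1.0
print(classify(w, b, [0.9, 0.8]))  # suspicious: 2*0.9 + 1.5*0.8 - 1 = 2.0 > 0
print(classify(w, b, [0.1, 0.1]))  # benign: 0.2 + 0.15 - 1 = -0.65 < 0
```

Training happens offline; only this cheap dot-product scoring needs to run at ingest rate on the cluster, which is what makes the approach scale.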
R&D to Product Pipeline Using Apache Spark in AdTech: Spark Summit East talk ... (Spark Summit)
The central premise of DataXu is to apply data science to better marketing. At its core is the Real-Time Bidding Platform, which processes 2 petabytes of data per day and responds to ad auctions at a rate of 2.1 million requests per second across 5 continents. On top of this platform sits DataXu's analytics engine, which gives clients insightful reports addressing their marketing business questions. Common requirements for both platforms are the ability to do real-time processing, scalable machine learning, and ad-hoc analytics. This talk will showcase DataXu's successful use cases of the Apache Spark framework and Databricks to address all of the above challenges while maintaining the agility and rapid prototyping strengths needed to take a product from initial R&D phase to full production. The team will share their best practices and highlight the steps of large-scale Spark ETL processing, model testing, all the way through to interactive analytics.
Headaches and Breakthroughs in Building Continuous Applications (Databricks)
At SpotX, we have built and maintained a portfolio of Spark Streaming applications -- all of which process records in the millions per minute. From pure data ingestion, to ETL, to real-time reporting, to live customer-facing products and features, continuous applications are in our DNA. Come along with us as we outline our journey from square one to present in the world of Spark Streaming. We'll detail what we've learned about efficient processing and monitoring, reliability and stability, and long term support of a streaming app. Come learn from our mistakes, and leave with some handy settings and designs you can implement in your own streaming apps.
Life occurs in real-time, and not surprisingly, more solutions are being built using streaming technologies. Event-based architectures are becoming the norm, and customers are expecting immediate access to their data. This new world offers many exciting opportunities, but also some new challenges. What do you do when your streaming data is not complete? What if it relies on another data source? Does the dependent data exist yet, and does it come from a 3rd party? How do we merge a complete picture when data is sourced from multiple places at the same time? This is the new norm in the world of distributed services. Join us as we dive deep into the technical details around these scenarios and more. Expect to learn about stream-stream joins, enriching stream data using local or remote data, and ways to anticipate and correct errors within the stream. Leave with a better understanding of managing data dependencies within a Spark Structured Streaming application.
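The core difficulty of a stream-stream join can be sketched concisely: either side of the join may arrive first, so both sides must be buffered by key until a match lands. The toy below is pure Python with invented event names; Structured Streaming does the same thing with watermark-bounded state, whereas this sketch has no state eviction at all.

```python
# Sketch of the buffering behind a stream-stream join: events are held
# per key until the matching record from the other stream arrives.

from collections import defaultdict

class StreamJoiner:
    def __init__(self):
        self.left = defaultdict(list)   # e.g. impressions awaiting clicks
        self.right = defaultdict(list)  # e.g. clicks awaiting impressions
        self.joined = []

    def on_left(self, key, value):
        if self.right[key]:
            for r in self.right.pop(key):
                self.joined.append((key, value, r))
        else:
            self.left[key].append(value)

    def on_right(self, key, value):
        if self.left[key]:
            for l in self.left.pop(key):
                self.joined.append((key, l, value))
        else:
            self.right[key].append(value)

j = StreamJoiner()
j.on_right("ad42", "click@t2")      # click arrives first: buffered
j.on_left("ad42", "impression@t1")  # impression lands: join emits
print(j.joined)  # [('ad42', 'impression@t1', 'click@t2')]
```

The open question the talk tackles is exactly the part this toy omits: how long to keep unmatched state around, which is what watermarks bound in a real Structured Streaming app.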
WSO2Con ASIA 2016: Patterns for Deploying Analytics in the Real World (WSO2)
Abundant data is all around. The most important aspect is how you as an organization can access the data, process it, and present information to the relevant authorities on time. To gain competitive advantage, the means of accessing, processing and presenting the data should be optimal, highly available and scalable.
In this talk, we will discuss how you can leverage WSO2 Data Analytics Server, WSO2 IoT Server, WSO2 Enterprise Service Bus and other WSO2 products in order to analyze the data. We will also discuss different deployment patterns that can provide you with a suitable solution that lets you analyze relevant data historically, in real-time or interactively and predict future states to make better decisions for your organization’s success.
Real-time Analytics with Presto and Apache Pinot (Xiang Fu)
Presto Con 2021
Most analytics products today focus either on ad-hoc analytics, which requires query flexibility without guaranteed latency, or on low-latency analytics with limited query capability.
In this talk, we will explore how to get the best of both worlds using Apache Pinot and Presto:
1. How people trade off latency and flexibility in analytics today: a comparison of analytics on raw data vs. pre-joined/pre-cubed datasets.
2. Introduce Apache Pinot as a column store for fast real-time data analytics and the Presto Pinot Connector to cover the entire landscape.
3. Deep dive into the Presto Pinot Connector to see how the connector does predicate and aggregation push down.
4. Benchmark results for the Presto Pinot Connector.
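The pushdown idea in point 3 can be illustrated without either system: instead of pulling every row into the query engine and filtering there, the connector translates the filter and aggregation into the store's own query language, so only the aggregated result crosses the wire. The table, columns, and generated query string below are invented for illustration and are not the real connector's output format.

```python
# Sketch of predicate + aggregation pushdown: the engine-side plan is
# rewritten into a query the storage layer executes itself.

def pushed_down_query(table, filters, agg_col):
    """Translate a filter dict and an aggregate into a store-side query."""
    where = " AND ".join(f"{col} = '{val}'" for col, val in filters.items())
    return f"SELECT SUM({agg_col}) FROM {table} WHERE {where}"

# Without pushdown the engine would scan and ship every matching row;
# with pushdown a single aggregated value comes back.
q = pushed_down_query("ad_events", {"country": "US"}, "impressions")
print(q)  # SELECT SUM(impressions) FROM ad_events WHERE country = 'US'
```

This is why the combination works: Pinot answers the pushed-down fragment at low latency, while Presto retains full SQL flexibility for whatever cannot be pushed.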
Building an intelligent big data application on top of xPatterns using tools that leverage Spark, Shark, Mesos, Tachyon and Cassandra; Jaws, the open-sourcing of our own Spark SQL RESTful service; our contributions to the Spark and Mesos projects; and lessons learned.
Logging, Metrics, and APM: The Operations Trifecta (P) (Elasticsearch)
Take your operational visibility to the next level by bringing your logs, metrics, and now APM data under one roof. Learn how Elasticsearch efficiently combines these types of data in a single store and see how Kibana is used to search logs, analyze metrics, and leverage APM features for better performance monitoring and faster troubleshooting.
The evolution of the big data platform @ Netflix (OSCON 2015) (Eva Tse)
At Netflix, the big data platform is the foundation for analytics that drive all product decisions that directly impact our customer experience. As for scale, it is one of the top three largest services running at Netflix, in terms of compute power and data size.
In this talk, we will take the audience through a journey to understand how we scale the platform to handle the increasing amount of data (over 500 billion events generated daily), the increasing demand of analytics (which translates to compute power), and the increasing number of users dependent on our platform to make business decisions.
Grab is a “superapp,” an all-in-one app used by tens of millions of users daily across Southeast Asia to shop, hail rides, make payments and order food deliveries. Discover how Grab is using Scylla today to continuously evolve and expand its platform and capabilities to new use cases.
GPU databases - How to use them and what the future holds (Arnon Shimoni)
GPU databases are the hottest new thing, with about 7 different companies producing their own variant. In this session, we will discuss why they were created, how they are already disrupting the database world, and what the future of computing holds for them.
This presentation demonstrates how the power of NVIDIA GPUs can be leveraged to both accelerate speed to insight and to scale the amount of hot and warm data analyzed to meet the increasing demands of data scientists and business intelligence professionals alike, as well as to find tactical and strategic insights with greater speed on exponentially growing datasets.
Organizations commonly believe that they are advancing in analytical capabilities due to the rise in the data science profession and the myriad of technologies available for analytics, business intelligence, artificial intelligence and machine learning. However, if you do the math, they are actually falling behind as the increases in the rates of data collection volume far outpace the rate of increases in hot and warm data used for analytics. This is causing organizations to rely on an ever-decreasing percentage of their information assets for decision making.
We talk about why GPU databases were created and share what sets SQream apart from other GPU databases, MPP solutions, in memory and Hadoop based analytic alternatives.
We will also outline how an organization can use GPU databases to thrive in the information revolution by using a significantly greater percentage of its data for analytical purposes, obtaining insights that are desired today, and will remain cost-effective into the next few years when data lakes are expected to balloon from petabytes to exabytes.
Bridging the Gap Between Datasets and DataFrames (Databricks)
Apple leverages Apache Spark for processing large datasets to power key components of Apple's production services. The majority of users rely on Spark SQL to benefit from state-of-the-art optimizations in Catalyst and Tungsten. As there are multiple APIs to interact with Spark SQL, users have to decide which one to pick. While DataFrames and SQL are widely used, they lack type safety, so analysis errors such as invalid column names or types will not be detected at compile time. Also, the ability to apply the same functional constructions as on RDDs is missing in DataFrames. Datasets expose a type-safe API and support for user-defined closures at the cost of performance. This talk will explain cases when Spark SQL cannot optimize typed Datasets as much as it can optimize DataFrames. We will also present an effort to use bytecode analysis to convert user-defined closures into native Catalyst expressions. This helps Spark to avoid the expensive conversion between the internal format and JVM objects as well as to leverage more Catalyst optimizations. As a consequence, we can bridge the gap in performance between Datasets and DataFrames, so that users do not have to sacrifice the benefits of Datasets for performance reasons.
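The optimization gap the talk describes comes down to one distinction: an expression is data the optimizer can inspect, while a user-defined closure is an opaque function it can only call. The pure-Python sketch below makes that concrete; the tuple-based expression encoding is invented for illustration and has nothing to do with Catalyst's real expression trees.

```python
# Why closures are harder to optimize than expressions: an expression
# tree exposes its column and constant (so a planner could push it into
# a columnar scan), while a lambda must be executed row by row.

rows = [{"age": 17}, {"age": 30}, {"age": 45}]

# Opaque closure: the "optimizer" can do nothing but invoke it.
opaque = lambda row: row["age"] > 18

# Expression as data: op, column, constant are all visible.
expr = ("gt", "age", 18)

def eval_expr(expression, row):
    op, col, const = expression
    assert op == "gt"  # only one operator in this toy
    return row[col] > const

# Both produce the same rows; only the second is analyzable.
assert [r for r in rows if opaque(r)] == [r for r in rows if eval_expr(expr, r)]
print([r["age"] for r in rows if eval_expr(expr, r)])  # [30, 45]
```

The bytecode-analysis effort mentioned above is, in these terms, a way of recovering the right-hand representation from the left-hand one.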
eBay Pulsar: Real-time analytics platform (KyoungMo Yang)
http://blog.embian.com/74
Pulsar is an open-source, real-time analytics platform and stream processing framework. It can be used to collect and process user and business events in real time, providing key insights and enabling systems to react to user activities within seconds. In addition to real-time sessionization and multi-dimensional metrics aggregation over time windows, Pulsar uses a SQL-like event processing language to offer custom stream creation through data enrichment, mutation, and filtering. Pulsar scales to a million events per second with high availability. It can be easily integrated with metrics stores like Cassandra and Druid.
Enterprises are increasingly demanding realtime analytics and insights to power use cases like personalization, monitoring and marketing. We will present Pulsar, a realtime streaming system used at eBay which can scale to millions of events per second with high availability and SQL-like language support, enabling realtime data enrichment, filtering and multi-dimensional metrics aggregation.
We will discuss how Pulsar integrates with a number of open source Apache technologies like Kafka, Hadoop and Kylin (Apache incubator) to achieve the high scalability, availability and flexibility. We use Kafka to replay unprocessed events to avoid data loss and to stream realtime events into Hadoop enabling reconciliation of data between realtime and batch. We use Kylin to provide multi-dimensional OLAP capabilities.
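The multi-dimensional, time-windowed aggregation described above reduces to bucketing: each event falls into a (window, dimension) bucket, and the buckets are counted. The sketch below shows that in plain Python with invented timestamps and a single country dimension; Pulsar would express the same thing in its SQL-like event language.

```python
# Sketch of time-windowed, per-dimension event counting.

from collections import Counter

def aggregate(events, window_sec):
    """Count events per (window-start, dimension) bucket."""
    buckets = Counter()
    for ts, dimension in events:
        window = ts - (ts % window_sec)  # align timestamp to window start
        buckets[(window, dimension)] += 1
    return buckets

events = [(1, "US"), (3, "US"), (7, "DE"), (12, "US")]
print(dict(aggregate(events, window_sec=10)))
# {(0, 'US'): 2, (0, 'DE'): 1, (10, 'US'): 1}
```

At a million events per second the hard parts are the ones this toy skips, which is where Kafka replay and the realtime/batch reconciliation described above come in.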
About 99Taxis
The challenges that led to Splunk
Troubleshooting: error and performance analysis
Monitoring system metrics
Monitoring business metrics
Here is some interesting material on what Vodafone is doing with Splunk.
This one in particular was presented at .conf2013, Splunk's worldwide conference; .conf2014 takes place this October, so plan ahead and attend, it's worth every penny!
As a reminder, registration for .conf2014 is already open at a promotional price.
More information here: http://conf.splunk.com/?r=homepage
About ExxonMobil and Geoffrey Martins
Why Shared Service?
The Four Major Challenges
Final Unified Network
Next Steps
Takeaways on how to build a successful Shared Service
Q&A
Splunk conf2014 - Splunk Monitoring - New Native Tools for Monitoring your Sp... (Splunk)
Collecting, interpreting and reporting on what Splunk is doing, especially in a distributed Splunk deployment can be challenging for the Splunk administrator. Where is the data that I'm indexing in Splunk coming from? What searches are taking up large amounts of system resources? How are the machines that Splunk is running on performing? This session covers new native tools in the Splunk platform for performing these and other administrative activities.
Splunk is a powerful platform that can harness your machine data and turn it into valuable information, enabling your business to make informed decisions and taking your organization from reactive to proactive. Just like any other platform, Splunk is only as powerful as the data it has access to; therefore, in this session we will conduct a walk-through of how to successfully on-board data, with samples ranging from simple to complex. We will also look at how to use common TAs to bring valuable data into Splunk. This session is designed to give you a better understanding of how to on-board data into Splunk, enabling you to unlock the power of your data.
Introduced in Splunk 6.2, the Distributed Management Console helps Splunk Admins deal with the monitoring and health of their Splunk deployment. In Splunk 6.3, we built views for Splunk Index and Volume Usage, Forwarder Monitoring, Search Head Cluster Monitoring, Index Cluster Monitoring, and tools for visualizing your Splunk Topology. Leverage Splunk DMC and come see the forest -and- the trees in your Splunk deployment!
Taking Splunk to the Next Level – Architecture (Splunk)
Are you outgrowing your initial Splunk deployment? Is Splunk becoming mission critical and you need to make sure it's Enterprise ready? Attend this session led by Splunk experts to learn about taking your Splunk deployment to the next level. Learn about Splunk high availability architectures with Splunk Search Head Clustering and Index Replication. Additionally, learn how to manage your deployment with Splunk’s operational and management controls to manage Splunk capacity and end user experience.
Julian Harty, Sr. Sales Engineer at Splunk, reviews the internals of how a Splunk search is performed, the use of the job inspector and the search log, and gives a review of where and when to use certain commands.
You Can't Protect What you Can't See. AWS Security Best Practices - Session S... (Amazon Web Services)
AWS utilises a shared security model where both AWS and the customer share responsibility for the security of data, applications and resources. As part of this model, it is critical that customers leverage services such as AWS CloudTrail, Config, and more. Attend this session to learn best practices on how to leverage these and other AWS services to gain end-to-end visibility and robust security on AWS. You will also hear how customers leverage third-party tools such as the Splunk App for AWS as critical elements of their security posture.
Speakers: Dan Miller, Cloud Sales Director, APAC, Splunk & Simon O'Brien, Senior Systems Engineer, Splunk
Are you outgrowing your initial Splunk deployment? Is Splunk becoming mission critical and you need to make sure it's Enterprise ready? Learn about Splunk high availability architectures with Splunk Search Head Clustering and Index Replication. Additionally, learn how to manage your deployment with Splunk’s operational and management controls to manage Splunk capacity and end user experience.
Taking Splunk to the Next Level - Architecture (Splunk)
This session led by Michael Donnelly will teach you how to take your Splunk deployment to the next level. Learn about Splunk high availability architectures with Splunk Search Head Clustering and Index Replication. Additionally, learn how to manage your deployment with Splunk’s operational and management controls to manage Splunk capacity and end user experience.
This session covers how Search Head Clustering provides horizontal scalability to support more users and searches, and high availability to ensure users can access their searches at all times. We also cover the architecture, how it works, and best practices guides for large scale deployment.
Splunk Ninjas: New Features, Pivot and Search Dojo (Splunk)
Besides seeing the newest features in Splunk Enterprise and learning the best practices for data models and pivot, we will show you how to use a handful of search commands that will solve most search needs. Learn these well and become a ninja.
MongoDB World 2019: Finding the Right MongoDB Atlas Cluster Size: Does This I...MongoDB
How do you determine whether your MongoDB Atlas cluster is over provisioned, whether the new feature in your next application release will crush your cluster, or when to increase cluster size based upon planned usage growth? MongoDB Atlas provides over a hundred metrics enabling visibility into the inner workings of MongoDB performance, but how do you apply all this information to make capacity planning decisions? This presentation will enable you to effectively analyze your MongoDB performance to optimize your MongoDB Atlas spend and ensure smooth application operation into the future.
Who should attend? Splunk Admin-Intermediate ITOA user-intermediate Security user-intermediate
Description: Join this workshop session to harness the power of the Splunk Search Processing Language (SPL). In this hands-on workshop, you'll learn how to use Splunk's simple search language for searching and filtering through data, charting statistics and predicting values, converging data sources and grouping transactions, and finally data science and exploration. We'll begin with basic search commands and build up to more powerful advanced tactics to help you harness your SplunkFu!
You’ll need to install Splunk Enterprise to participate, so download it in advance (https://www.splunk.com/download), and bring your laptop to follow along for a hands-on experience.
DoneDeal AWS Data Analytics Platform build using AWS products: EMR, Data Pipeline, S3, Kinesis, Redshift and Tableau. Custom built ETL was written using PySpark.
Webinar replay: MySQL Query Tuning Trilogy: Query tuning process and toolsSeveralnines
This is part 1 of a webinar trilogy on MySQL Query Tuning, in which we look at the query tuning process and the tools that help with it. We cover topics such as SQL tuning, indexing, the optimizer, and how to leverage EXPLAIN to gain insight into execution plans. Part 1: Query tuning process and tools.
AGENDA
• Query tuning process
- Build
- Collect
- Analyze
- Tune
- Test
• Tools
- tcpdump
- pt-query-digest
SPEAKER
Krzysztof Książek, Senior Support Engineer at Severalnines, is a MySQL DBA with experience managing complex database environments for companies like Zendesk, Chegg, Pinterest and Flipboard.
Deploying any software can be a challenge if you don't understand how resources are used or how to plan for the capacity of your systems. Whether you need to deploy or grow a single MongoDB instance, replica set, or tens of sharded clusters then you probably share the same challenges in trying to size that deployment.
This webinar will cover what resources MongoDB uses, and how to plan for their use in your deployment. Topics covered will include understanding how to model and plan capacity needs for new and growing deployments. The goal of this webinar will be to provide you with the tools needed to be successful in managing your MongoDB capacity planning tasks.
Similar to Deploying Splunk: Arquitetura e dimensionamento do Splunk
Splunk is the engine for machine-generated data
Your IT infrastructure generates enormous amounts of data: machine data generated by websites, applications, servers, networks, mobile devices, and the like. By monitoring and analyzing everything from clickstreams and customer transactions to network activity and call records, Splunk turns your machine data into valuable insights.
Troubleshoot problems and investigate security incidents in minutes (not hours or days). Monitor your infrastructure end to end to prevent service degradation or outages. And gain real-time visibility into customer experience, transactions, and behavior.
We collect machine-generated data and make sense of it.
IT sense. Security sense. Business sense. Common sense. Splunk delivers real-time visibility and insight for IT and the business.
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring of JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, companies that fail to adapt and embrace new ideas often struggle to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership, and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on countries – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
4. About Me
• 5+ years @ Splunk
• Experience:
  – Supporting, administering, and architecting large scale deployments
  – OEM – technical sales
  – Strategic accounts – technical sales
• Based in HQ (San Francisco office)
• Currently: Business Development, Technical Synergies
8. Sizing Factors
• How much data (raw sizes)?
  – Daily volume
  – Peak volume
  – Retained volume (archive size)
  – Future volume?
• How much searching?
  – Use cases
  – How many people? How often?
• Jobs
  – Summarization, alerting, reporting
9. Data Volumes
• Estimate input volume
  – Verify raw log sizes
  – Leverage _internal metrics to get actual input volumes
• Confirm estimates with actual data
  – Create a baseline with real or simulated data
  – Find compression rates (range from 30%–120%, typically 50%)
  – Determine retention needs
• Document use cases
  – Use case determines search needs
  – Plan for expansion as adoption grows (search and volume)
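As a back-of-envelope companion to the bullets above, here is a minimal Python sketch that turns daily raw volume, retention, and the quoted 30%–120% compression range into a storage estimate. The function name and sample figures are illustrative, not anything from Splunk itself.

```python
# Hypothetical sizing sketch: estimate on-disk storage for a retention window,
# using the compression range quoted on the slide (30%-120% of raw, typically ~50%).

def storage_needed_gb(daily_raw_gb, retention_days, compression_ratio=0.50):
    """Return estimated index storage in GB for the retention window."""
    return daily_raw_gb * retention_days * compression_ratio

# Example: 100 GB/day raw, 90-day retention
typical = storage_needed_gb(100, 90)        # typical ~50% compression -> 4500.0 GB
worst = storage_needed_gb(100, 90, 1.20)    # pessimistic end of the quoted range
print(f"typical: {typical:.0f} GB, worst case: {worst:.0f} GB")
```

Running the typical and worst-case ratios side by side shows why the slide insists on confirming estimates with real data: the spread between them is more than a factor of two.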
10. Data Sizing Exercise
• Via filesystem: use the Splunk log files metrics.log or license_usage.log
• Optionally:
  – License report view in Splunk Enterprise 6
  – S.o.S app in 5.x
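If you pull volumes from license_usage.log as suggested above, a small script can do the summing. This sketch assumes the common `b=<bytes>` field seen in LicenseUsage lines; verify the format against your own $SPLUNK_HOME/var/log/splunk/license_usage.log before trusting the numbers.

```python
# Sketch: sum indexed volume from license_usage.log lines. The b=<bytes> field
# is an assumption based on typical LicenseUsage entries; confirm it locally.
import re

BYTES_FIELD = re.compile(r'\bb=(\d+)')

def indexed_gb(log_lines):
    """Sum b=<bytes> fields across log lines and convert to GB."""
    total = sum(int(m.group(1)) for line in log_lines
                for m in [BYTES_FIELD.search(line)] if m)
    return total / (1024 ** 3)

sample = [
    '... INFO  LicenseUsage - type=Usage st=main b=1073741824 pool="auto_generated"',
    '... INFO  LicenseUsage - type=Usage st=web b=536870912 pool="auto_generated"',
]
print(f"{indexed_gb(sample):.2f} GB")   # 1.50 GB for the sample lines
```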
11. Search Volumes
• Gather use case information
  – How much ad-hoc searching?
  – How much background searching?
• Ad-hoc searching
  – Evaluate the data being searched
  – Evaluate the time duration (real-time vs historic)
  – Real-time searches are typically less overhead
• Background searching
  – Alerting and monitoring
  – General reports
  – Summary indexing
12. Final Sizing Numbers
• Data capacity – daily and peak
• User capacity – concurrent and total
• Search capacity – concurrent and total
*Document the use cases!!
18. What's a Reference Server?
• Sizing based on commodity x86 servers
• Dual quad-core CPUs at 3.0 GHz (dual six core is common)
• 8 GB of RAM (16 GB is common)
• 64-bit OS
• 4x10k RPM local SAS drives in RAID 1+0 (800+ IOPs)
• Variations cause corresponding changes in performance/requirements
19. Rules of Thumb
• These all have exceptions and qualifications
• 1 reference indexer per 100 GB/day
• 1 reference search head per 8 to 12 users
• 1 reference job server per 20 concurrent jobs
• 1 deployment server per 3000 polls/min
• Replication later…
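The rules above fold into a quick calculator. This is a sketch, not anything Splunk ships: the helper name and inputs are ours, it uses the conservative end of the 8–12 user range, and real deployments carry the exceptions the slide warns about.

```python
# Rules-of-thumb calculator mirroring the bullets above (illustrative only).
import math

def reference_servers(daily_gb, concurrent_users, concurrent_jobs, polls_per_min):
    return {
        "indexers": math.ceil(daily_gb / 100),          # 1 per 100 GB/day
        "search_heads": math.ceil(concurrent_users / 8),  # 1 per 8-12 users (conservative end)
        "job_servers": math.ceil(concurrent_jobs / 20),   # 1 per 20 concurrent jobs
        "deployment_servers": math.ceil(polls_per_min / 3000),  # 1 per 3000 polls/min
    }

print(reference_servers(daily_gb=250, concurrent_users=20,
                        concurrent_jobs=35, polls_per_min=4500))
# {'indexers': 3, 'search_heads': 3, 'job_servers': 2, 'deployment_servers': 2}
```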
20. How Many Indexers?
• Rule of thumb says: 1 per 100 GB/day
• Leaves room for:
  – Daily peaks
  – Light searching and reporting for about 3 concurrent users
• Need more indexers for:
  – Heavy reporting
  – More users
  – Slower disks, slower CPUs, fewer CPUs
21. How Many Search Heads?
• Rule of thumb says: 1 per 8 to 12 concurrent users
• Limit is concurrent queries
• 30–50 web sessions
• 1:1 ratio of search query to CPU core
• Only add first search head if ≥3 indexers
• Don't add search heads; add indexers (indexers do most of the work)
• But you need more if:
  – Running a lot of scheduled jobs on the search head
22. Search Head vs. Job Server
• Search Head Pooling (SHP): uses NFS to manage user profiles/configurations and job queue
• Search head and job server are equivalent with SHP
• Use job servers for scheduled searches (summaries, alerts, and reports)
• Use search heads for ad-hoc searching
23. How Many Deployment Servers?
• Rule of thumb says: 1 per 3000 polls/minute
• Just use one deployment server, and adjust the polling period
• Small deployments can share the same splunkd
• Low requirement for disk performance (good candidate for virtualization)
• Windows OS – 1 per 500 polls/minute
• Or use something other than deployment server
  – Puppet, SCCM, cfengine, chef…
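Since the deployment server rule is really about poll rate, a one-line check is handy: polls per minute is just client count divided by the phone-home interval. A minimal sketch (names and numbers are ours):

```python
# Back-of-envelope deployment server load: clients phoning home every N seconds.

def polls_per_minute(num_clients, phone_home_interval_sec):
    return num_clients * 60 / phone_home_interval_sec

# 3000 forwarders phoning home every 60 s sits right at the 3000 polls/min rule
# of thumb; stretching the interval to 300 s drops the same fleet to 600 polls/min.
print(polls_per_minute(3000, 60), polls_per_minute(3000, 300))
```

This is the "adjust the polling period" bullet in numbers: a longer phone-home interval lets one deployment server carry far more clients.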
24. More is Better?
• CPUs
  – Search process utilizes up to 1 CPU core (1:1)
  – Indexers still need to do the heavy lifting (search exists on indexer AND search head)
  – Limited benefit for indexing (up to 2 CPU cores for indexing)
• Memory
  – Good for search heads and indexers (16+ GB)
• Disks
  – Faster is better (15k rpm)
  – More disks in RAID 1+0 = faster
25. Performance and Sizing Tips

System change                   Search Speed    Indexing Speed
Faster disks                    ++              ++
Add an indexer                  ++              ++
Add a search head               +
Report acceleration/summaries   ++
26. Performance and Sizing Tips

System change                   Search Speed    Indexing Speed
Optimize searches               +++
Optimize field extraction       +               +
Optimize input parsing                          +
Faster CPU                                      +
27. Capacity → Architecture
• Sizing recipe: capacity + rules of thumb determines the number of servers
• Building blocks for architecture
28. Architecture Factors
• What are my sizing requirements?
• Where is the data?
• Where are the users?
• What is the security policy?
• What are the retention and compliance policies?
• What is the availability requirement?
• What about the cloud?
29. Architecture Factors
• What are my sizing requirements?
  – Data capacity
  – Search capacity
  – User capacity
• Obtained from the sizing process
30. Architecture Factors
• Where is the data?
  – Local or remote to the indexing machine
  – If remote – use forwarders when possible
  – Index in local data center (zone) or index centrally
  – Persist network data to disk as a best practice
  – Use intermediate forwarders to distribute data
• Where are the users?
  – User experience affected by search head location
    · Time zone tuning (5.x+)
    · Distributed search over LAN vs WAN
31. Architecture Factors
• What is the security policy?
  – Apply user security policies
    · Auth method
    · Roles
    · Filters
  – Apply physical security policies
    · Index location
32. Architecture Factors
• Retention, compliance, governance
  – Where is the data allowed to be?
  – Where is the data not allowed to go?
  – Where must the data go?
• Availability
  – Local failover, fault-tolerance, clustering
  – Geographic disaster recovery/fault-tolerance
  – Index replication!
33. Architecture Factors
• Same old story
• Cloud considerations
  – Authentication restrictions
  – Data transfer costs
  – Security – SSL tunnel
  – Zones
39. Scaling and Expansion
• Add to your indexer pool for more performance or capacity
  – Mixed platform and hardware is okay
• Use search head pooling for more UI capacity
  – Requires NFS
• Create new indexes for new data types
  – Follows best practices
40. Index Replication (aka Clustering)
• What is it?
  – Indexes are replicated to 1 or more indexers (tunable)
  – Splunk controlled
• Basics
  – Master node (manages indexing and searching location)
  – Distributed deployment
  – NOT = "index and forward"
• HA license (HAL) vs. index replication (IR)
  – HAL: separate, fully functioning Splunk deployments
  – IR: data is made available on 1 or more indexers
41. Index Replication
• Rule of thumb: 1 per 50 GB/day
  – Assumes simple replication (2 copies in existence)
  – Increase in I/O, CPU, and disk requirement
• Need more indexers if:
  – Increase in replication factor
  – Performance or capacity needs (search and index)
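Combining this 50 GB/day rule with the earlier 100 GB/day rule gives a quick before/after comparison. `indexers_needed` is a hypothetical helper, not a Splunk API:

```python
# Sketch: indexer count with and without simple index replication, per the
# slide's rules of thumb (1 per 100 GB/day plain, 1 per 50 GB/day replicated).
import math

def indexers_needed(daily_gb, replication=False):
    per_indexer_gb = 50 if replication else 100
    return math.ceil(daily_gb / per_indexer_gb)

print(indexers_needed(300))                    # 3 without replication
print(indexers_needed(300, replication=True))  # 6 with 2 copies kept
```

In other words, turning on simple replication roughly doubles the indexer budget before any increase in replication factor.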
42. Index Replication (aka Clustering)
[Topology diagram: Forwarders send data to Peer Nodes; a Cluster Master coordinates the peers; Search Head Pooling searches across them]
43. Index Replication
• Data is replicated and made available
• WAN configuration is not recommended
• Careful consideration when inserted into standard topologies
• Increases:
  – Storage requirement
  – I/O requirement (disk, network)
  – Total indexer requirement
• Disaster recovery and high availability: see the dedicated .conf session
44. Final Thoughts
• Sizing is more than data volume – it's also search load
• Centralized architecture is the baseline
• Variations on architecture are driven by:
  – Sizing
  – Data location
  – User location
  – Retention/access/governance
  – Availability requirements
45. Next Steps
1. Download the .conf2013 Mobile App. If not iPhone, iPad, or Android, use the Web App.
2. Take the survey & WIN A PASS FOR .CONF2014… or one of these bags!
3. View the sessions listed on the next slide. Available on the Mobile App.
46. More Information
• Contact: syep@splunk.com
• Documentation: http://docs.splunk.com
• Answers: http://answers.splunk.com
• Other presentations:
  – Best Practices: Deploying Splunk on Physical, Virtual, and Cloud Environments
  – Architecting Splunk for High Availability and Disaster Recovery