Amazon EC2 may offer the possibility of high performance computing to programmers on a budget. Instead of building and maintaining a permanent Beowulf cluster, we can launch a cluster on-demand using Python and EC2. This talk will cover the basics involved in getting your own cluster running using Python, demonstrate how to run some large parallel computations using Python MPI wrappers, and show some initial results on cluster performance.
The data science team at Zymergen is applying machine learning techniques to identify genetic targets, work that is supported by extensive analytical automation that systematically identifies outliers, removes process-related bias, and quantifies performance improvements. We’re using Apache Airflow to construct robust data pipelines that allow us to produce clean, reliable inputs to our predictive models. In this talk, I’ll discuss the unique data processing challenges we face in working with high-throughput, biological data and provide an overview of how we’re using Apache Airflow to meet those challenges.
Online statistical analysis using transducers and sketch algorithms - Simon Belak
Online statistical analysis using transducers and sketch algorithms. Don’t know what either is? You are going to learn something very cool (and perspective-changing) then. Know them, but want an experience report? Got you covered, fam.
Transducers -- composable algorithmic transformation decoupled from input or output sources -- are Clojure’s take on data transformation. In this talk we will look at what makes a transducer; push their composability to the limit chasing the panacea of building complex single-pass transformations out of reusable components (e.g. calculating a bunch of descriptive statistics like sum, sum of squares, mean, variance, ... in a single pass without resorting to a spaghetti ball fold); explore how the fact they are decoupled from input and output traversal opens up some interesting possibilities as they can be made to work in both online and batch settings; all drawing from practical examples of using Clojure to analyze “awkward-size” data.
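The single-pass idea can be sketched outside Clojure as well. The block below is a rough Python analogue, not the speaker's code: reducers are modelled as hypothetical (init, step, complete) triples, `fuse` runs several of them side by side in one pass, and `mapping` is a transducer-style wrapper that transforms a reducer without knowing where the input comes from. All names here are assumptions made for illustration, and the variance uses Welford's update rather than anything the talk specifies.

```python
from functools import reduce
import math

def fuse(**reducers):
    """Combine named (init, step, complete) reducers into one single-pass reducer."""
    names = list(reducers)

    def init():
        return {n: reducers[n][0]() for n in names}

    def step(acc, x):
        return {n: reducers[n][1](acc[n], x) for n in names}

    def complete(acc):
        return {n: reducers[n][2](acc[n]) for n in names}

    return init, step, complete

def mapping(f):
    """Transducer-style wrapper: apply f to each input before the wrapped step sees it."""
    def xform(reducer):
        init, step, complete = reducer
        return init, (lambda acc, x: step(acc, f(x))), complete
    return xform

def _welford_step(acc, x):
    # Welford's online update of (count, mean, sum of squared deviations).
    n, mean, m2 = acc
    n += 1
    delta = x - mean
    mean += delta / n
    m2 += delta * (x - mean)
    return n, mean, m2

total = (lambda: 0.0, lambda acc, x: acc + x, lambda acc: acc)
sum_sq = (lambda: 0.0, lambda acc, x: acc + x * x, lambda acc: acc)
mean = (lambda: (0, 0.0, 0.0), _welford_step, lambda acc: acc[1])
variance = (
    lambda: (0, 0.0, 0.0),
    _welford_step,
    # Sample variance; undefined for fewer than two observations.
    lambda acc: acc[2] / (acc[0] - 1) if acc[0] > 1 else math.nan,
)

def run(reducer, xs):
    """Drive a reducer over any iterable; the reducer never sees the source itself."""
    init, step, complete = reducer
    return complete(reduce(step, xs, init()))

stats = fuse(sum=total, sum_sq=sum_sq, mean=mean, variance=variance)

# The input is a one-shot iterator of strings, so the data is traversed exactly once.
raw = iter(["2", "4", "4", "4", "5", "5", "7", "9"])
result = run(mapping(float)(stats), raw)
```

Because `step` only ever needs the accumulator and the next value, the same `stats` reducer could be fed one element at a time from a stream (the online case) or driven over a whole collection (the batch case).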
Become Thanos of the LambdaLand: Wield all the Infinity Stones - Srushith Repakula
Each infinity stone has one significant power, even in the land of lambdas! The six stones are: Space (code size), Time (time), Mind (code), Power (memory), Soul (design principles), Reality (pragmatics). Wielding all of them was not easy, not even for the mad Titan - Thanos.
This talk highlights the importance of Lambda best practices and how you can weld them into a gauntlet and use it to snap away more than half of your problems/bugs. It also focuses on various real-life scenarios and experiences relating to the best practices, how and when to apply them, and finally when to violate them.
Cost Effective Presto on AWS with Spot Nodes - Strata SF 2019 - Shubham Tagra
Strata SF 2019 presentation about Presto's limitations in leveraging spot nodes, Qubole's features for reliably using spot nodes in Presto, and a case study on the efficacy of the solution.
Learn about the core functions and architecture of Zentral. Zentral is an open source hub to process event streams from osquery and other sources into the Elastic Stack. Besides support for distinct osquery features like file carving, Zentral provides numerous integrations for inventory acquisition and alerting.
A presentation discussing how to run a large-scale Drupal installation using Amazon Web Services (AWS). The final system is capable of serving millions of unique pages, and storing tens of terabytes of data.
First presented at DrupalCamp Brighton in January 2015. There is an hour long recording of this presentation at https://www.youtube.com/watch?v=Rh_yBzRpOnk
SF ElasticSearch Meetup 2013.04.06 - Monitoring - Sushant Shankar
Using the monitoring tools Zabbix for systems-level monitoring of ElasticSearch and SPM (http://sematext.com/spm/elasticsearch-performance-monitoring/index.html) for ElasticSearch-specific monitoring. Using these tools was crucial for optimizing index building performance as well as query performance. Some general tips for index building and query performance.
Powering Interactive Data Analysis at Pinterest by Amazon Redshift - Jie Li
In the last six months, we have set up Amazon Redshift to power our interactive data analysis at Pinterest. It has tremendously improved the speed of analyzing our data.
Real-Time Data Exploration and Analytics with Amazon Elasticsearch Service - Amazon Web Services
Elasticsearch is a fully featured search engine used for real-time analytics, and Amazon Elasticsearch Service makes it easy to deploy Elasticsearch clusters on AWS. With Amazon ES, you can ingest and process billions of events per day, and explore the data using Kibana to discover patterns. In this session, we use Apache web logs as an example and show you how to build an end-to-end analytics solution.
Deploying any software can be a challenge if you don't understand how resources are used or how to plan for the capacity of your systems. Whether you need to deploy or grow a single MongoDB instance, replica set, or tens of sharded clusters then you probably share the same challenges in trying to size that deployment.
This webinar will cover what resources MongoDB uses, and how to plan for their use in your deployment. Topics covered will include understanding how to model and plan capacity needs for new and growing deployments. The goal of this webinar will be to provide you with the tools needed to be successful in managing your MongoDB capacity planning tasks.
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch Service - Amazon Web Services
Everything generates logs. Applications, infrastructure, security ... everything. Keeping track of the flood of log data is a big challenge, yet critical to your ability to understand your systems and troubleshoot (or prevent) issues. In this session, we will use both Amazon CloudWatch and application logs to show you how to build an end-to-end log analytics solution. First, we cover how to configure an Amazon Elasticsearch Service domain and ingest data into it using Amazon Kinesis Firehose, demonstrating how easy it is to transform data with Firehose. We look at best practices for choosing instance types, storage options, shard counts, and index rotations based on the throughput of incoming data and configure a secure analytics environment. We demonstrate how to set up a Kibana dashboard and build custom dashboard widgets. Finally, we dive deep into the Elasticsearch query DSL and review approaches for generating custom, ad-hoc reports.
What is Deep Learning
Rise of Deep Learning
Phases of Deep Learning - Training and Inference
AI & Limitations of Deep Learning
Apache MXNet History, Apache MXNet concepts
How to use Apache MXNet and Spark together for Distributed Inference.
Metrics-Driven Tuning of Apache Spark at Scale with Edwina Lu and Ye Zhou - Databricks
Tuning Apache Spark can be complex and difficult, since there are many different configuration parameters and metrics. As the Spark applications running on LinkedIn’s clusters become more diverse and numerous, it is no longer feasible for a small team of Spark experts to help individual users debug and tune their Spark applications. Users need to be able to get advice quickly and iterate on their development, and any problems need to be caught promptly to keep the cluster healthy.
In order to achieve this, we automated the process of identifying performance issues and providing custom tuning advice to users, and made improvements for scaling to handle thousands of Spark applications per day. We leverage Spark History Server (SHS) to gather application metrics, but as the number of Spark applications and the size of individual applications have increased, the SHS has not been able to keep up. It can fall hours behind during peak usage. We will discuss changes to the SHS to improve efficiency, performance and stability, enabling SHS to analyze large volumes of logs. Another challenge we encountered was a lack of proper metrics related to Spark application performance. We will present new metrics added to Spark which can precisely report resource usage during runtime, and discuss how these are used in heuristics to identify problems.
Based on this analysis, custom recommendations are provided to help users tune their applications. We will also show the impact provided by these tuning recommendations, including improvements in application performance itself and the overall cluster utilization.
An Open Talk at DeveloperWeek Austin 2017 by Kimberly Wilkins (@dba_denizen), Principal Engineer - Databases at ObjectRocket. Featuring new use cases like Bitcoin, AI, IoT, and all the cool things.
Speaker: Jay Runkel, Principal Solution Architect, MongoDB
Session Type: 40 minute main track session
Track: Operations
When architecting a MongoDB application, one of the most difficult questions to answer is how much hardware (number of shards, number of replicas, and server specifications) am I going to need for an application. Similarly, when deploying in the cloud, how do you estimate your monthly AWS, Azure, or GCP costs given a description of a new application? While there isn’t a precise formula for mapping application features (e.g., document structure, schema, query volumes) into servers, there are various strategies you can use to estimate the MongoDB cluster sizing. This presentation will cover the questions you need to ask and describe how to use this information to estimate the required cluster size or cloud deployment cost.
What You Will Learn:
- How to architect a sharded cluster that provides the required computing resources while minimizing hardware or cloud computing costs
- How to use this information to estimate the overall cluster requirements for IOPS, RAM, cores, disk space, etc.
- What you need to know about the application to estimate a cluster size
In this webinar, we will be covering general best practices for running MongoDB on AWS.
Topics will range from instance selection to storage selection and service distribution to ensure service availability. We will also look at any specific best practices related to using WiredTiger. We will then shift gears and explore recommended strategies for managing your MongoDB instance on AWS.
This session also includes a live Q&A portion during which you are encouraged to ask questions of our team.
Performance Monitoring for the Cloud - Java2Days 2017 - Werner Keil
Performance monitoring tools like Performance Co-Pilot (PCP) have existed for almost as long as the World Wide Web. PCP was developed in the early 90s by SGI. Parts were made available as open source from 2000 on, which led to a further spread of the tool. In recent years an active community has formed and a variety of new features and enhancements have been added. PCP is now part of Red Hat and SUSE Linux Enterprise editions and is included in many other Linux distributions. Versions for other Unix variants, OS X and Windows also exist. This session compares popular open source monitoring tools like Performance Co-Pilot, StatsD, Dropwizard Metrics, Prometheus and MicroProfile Metrics: how they each support containers or virtualization, share data with IT monitoring systems like Nagios or Zabbix, or process, analyze and visualize it via Carbon, Graphite or Grafana/Elasticsearch.
We created a web application for a well-known US newspaper that provides a maps-like zooming interface on top of the 60,000 newspapers published since 1850, using Solr over the 28,000,000 articles to create an interactive heatmap over them. The out-of-the-box faceting solution was optimized by an order of magnitude using domain knowledge, which allowed us to create a great visual way of exploring trends in historical newspapers.
Securing your Kubernetes cluster: a step-by-step guide to success! - KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Neuro-symbolic is not enough, we need neuro-*semantic* - Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this is illustrated with link prediction over knowledge graphs, but the argument is general.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024 - Tobias Schneck
As AI technology pushes into IT, I was wondering, as an “infrastructure container Kubernetes guy”, how this fancy AI technology gets managed from an infrastructure operations point of view. Is it possible to apply our lovely cloud native principles as well? What benefits could both technologies bring to each other?
Let me take these questions and take you on a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premise strategy we may need to apply it to our own infrastructure and get it to work from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and of what could benefit or limit your AI use cases in an enterprise environment. An interactive demo will give you some insights into which approaches I already have working for real.
JMeter webinar - integration with InfluxDB and Grafana - RTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring of JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
State of ICS and IoT Cyber Threat Landscape Report 2024 preview - Prayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf - 91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... - UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Leading Change strategies and insights for effective change management pdf 1.pdf
SF ElasticSearch Meetup 2012.10.03
1. Scaling ElasticSearch
SF Meetup
2012.10.03
Sushant Shankar
sushant.shankar@33across.com
2. Agenda
• Why we need a search engine
• Monitoring
• Index Building
• Query Performance
3. Who is asdfas
>600,000 Publishers
Machine Learning and Graph algorithms to:
- Build advertising segments
- Extract insights out of social and interest data
- Target via high-performance distributed systems that integrate with our advertising partners
Website | Facebook | Twitter
4. Why we really need a search engine
Batch! Good for complicated tasks
(Machine Learning, Graph Algorithms, etc.)
… …
6. Mappers to build index
6 nodes, 24GB RAM
16GB for ES service
4 cores
3x 1.5TB drive
>1TB/index (replicated)
Build index using MR job and Bulk API
~300M documents
~5KB / document
~3 hours
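The figures on this slide can be sanity-checked with some quick arithmetic. The sketch below takes the slide's approximate values at face value and assumes decimal units (1 TB = 10^9 KB) and an even spread of indexing work across the 6 nodes. Both are assumptions, not statements from the talk.

```python
# Approximate values read off the slide (all marked "~" there).
docs = 300_000_000      # ~300M documents
doc_size_kb = 5         # ~5KB per document
build_hours = 3         # ~3 hours to build the index
nodes = 6               # 6 nodes in the cluster

# Raw source size, before any replication or index overhead (decimal units assumed).
raw_size_tb = docs * doc_size_kb / 1e9

# Average indexing throughput implied by the build time.
docs_per_sec = docs / (build_hours * 3600)

# Per-node share, assuming the load is spread evenly (an assumption).
docs_per_sec_per_node = docs_per_sec / nodes

print(f"raw size: {raw_size_tb:.1f} TB")
print(f"throughput: {docs_per_sec:,.0f} docs/s overall, "
      f"{docs_per_sec_per_node:,.0f} docs/s per node")
```

The raw size of about 1.5 TB is consistent with the ">1TB/index" figure on the slide.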
10. Index Building: Learnings
• Bulk API
• No replicas
• 2 shards / CPU
• 10,000 documents (users) per indexing request
• Refresh off (index.refresh_interval = -1)
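The learnings above can be sketched as follows. This is a minimal illustration, not the speaker's code: it only builds the newline-delimited bulk payloads and the settings body, and leaves the HTTP calls out. The index name `users`, the document shape, and the helper names are hypothetical; Elasticsearch versions from the time of this talk also expected a `_type` field in each action line, which is omitted here.

```python
import json

def bulk_body(docs, index_name):
    """Build a newline-delimited bulk payload: one action line plus one source line per doc."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"

def batches(iterable, size=10_000):
    """Yield lists of at most `size` items (10,000 documents per request, as on the slide)."""
    batch = []
    for item in iterable:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

# Settings to apply while building: refresh off and no replicas.
BUILD_SETTINGS = {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}

# Hypothetical documents: (id, source) pairs standing in for user records.
users = ((str(i), {"user_id": i}) for i in range(25_000))
payloads = [bulk_body(batch, "users") for batch in batches(users)]
```

Each payload would be sent as the body of one bulk request. The shard count from the "2 shards / CPU" guideline has to be chosen when the index is created, so it is not part of the settings body shown here.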
12. Query Performance: Learnings
• 1-2 Replicas (and for reliability)
• Turn refresh on again (5s default)
• Warm up effect (Index Warm up API 0.20+)
• Optimize API
• Simulate multiple users
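One way to act on the last point is to drive queries from a pool of worker threads and look at the latency distribution rather than a single timing. The sketch below is an assumption about how such a test might look, not the method used in the talk: `run_query` is a hypothetical stand-in that just sleeps, and would be replaced by a real search request.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

def run_query(user_id):
    """Hypothetical stand-in for one search request; replace with a real client call."""
    time.sleep(0.001)
    return user_id

def timed(user_id):
    """Return the wall-clock latency of a single query in seconds."""
    start = time.perf_counter()
    run_query(user_id)
    return time.perf_counter() - start

def simulate(concurrent_users, requests_per_user):
    """Issue queries from `concurrent_users` worker threads and summarize latencies."""
    total = concurrent_users * requests_per_user
    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        latencies = sorted(pool.map(timed, range(total)))
    return {
        "requests": total,
        "median_s": statistics.median(latencies),
        # Simple nearest-rank style 95th percentile on the sorted latencies.
        "p95_s": latencies[int(0.95 * (total - 1))],
    }

report = simulate(concurrent_users=8, requests_per_user=25)
print(report)
```

Running the simulation twice against a freshly opened index is one way to observe the warm-up effect mentioned above, since the first run would include cold-cache latencies.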
Collect information on over 1B users internationally – text copied from over 600K publisher sites, images, searches, pages visited
Different slices of data – now!