The document discusses composing reusable extract-transform-load (ETL) processes on Hadoop. It covers the data science lifecycle of acquiring, analyzing and taking action on data. It states that 80% of work in data science is spent on acquiring and preparing data. The document then discusses using Cascading, an abstraction framework for building MapReduce jobs, to create reusable ETL processes that are linearly scalable and follow a single-purpose composable design.
Benchmark OpenERP : testez les performances et la robustesse d'OpenERP sur vos volumes de données.
Le benchmark technique d'OpenERP propose des scripts automatiques pour tester la charge d'OpenERP vis-à-vis de vos données et de votre activité.
My presentation from Optimise Oxford in November 2016.
In it I discuss why you should be making use of server logs, and how to go about utilising them.
Stress Testing at Twitter: a tale of New Year EvesHerval Freire
Failure testing is a fundamental piece of Twitter’s reliability engineering. Over the years, we developed a rich toolchain that allows us to detect and fix scalability problems long before they happen. In this talk, we’ll cover some of the strategies we employ and discuss our always evolving approach to API stress testing and its “unit test” equivalent, redline testing.
These are the slides from my Senacor DevCon Talk 2017. The talk is about dos and don'ts when designing REST APIs. It comes with a couple of easy to follow rules for your daily REST API designing needs.
To Hire, or to train, that is the question (Percona Live 2014)Geoffrey Anderson
"We're hiring!"
How many times have you heard this phrase at a conference? Every database-driven company is hiring and that makes for pretty stiff competition when trying to get a new DBA. Instead of searching for the perfect database administrator from a conference or Linkedin, why not look internally at your organization for system administrators or engineers who may be an equally good fit given the right training.
In this talk, I'll explain how the DBAs at Box developed a knowledge-sharing culture around databases and disseminated important learnings to other members of the company. I'll also cover the mentorship process we established to train other members of our Operations team to become rock star DBAs and manage our MySQL and HBase infrastructure at Box.
Benchmark OpenERP : testez les performances et la robustesse d'OpenERP sur vos volumes de données.
Le benchmark technique d'OpenERP propose des scripts automatiques pour tester la charge d'OpenERP vis-à-vis de vos données et de votre activité.
My presentation from Optimise Oxford in November 2016.
In it I discuss why you should be making use of server logs, and how to go about utilising them.
Stress Testing at Twitter: a tale of New Year EvesHerval Freire
Failure testing is a fundamental piece of Twitter’s reliability engineering. Over the years, we developed a rich toolchain that allows us to detect and fix scalability problems long before they happen. In this talk, we’ll cover some of the strategies we employ and discuss our always evolving approach to API stress testing and its “unit test” equivalent, redline testing.
These are the slides from my Senacor DevCon Talk 2017. The talk is about dos and don'ts when designing REST APIs. It comes with a couple of easy to follow rules for your daily REST API designing needs.
To Hire, or to train, that is the question (Percona Live 2014)Geoffrey Anderson
"We're hiring!"
How many times have you heard this phrase at a conference? Every database-driven company is hiring and that makes for pretty stiff competition when trying to get a new DBA. Instead of searching for the perfect database administrator from a conference or Linkedin, why not look internally at your organization for system administrators or engineers who may be an equally good fit given the right training.
In this talk, I'll explain how the DBAs at Box developed a knowledge-sharing culture around databases and disseminated important learnings to other members of the company. I'll also cover the mentorship process we established to train other members of our Operations team to become rock star DBAs and manage our MySQL and HBase infrastructure at Box.
Systems Monitoring with Prometheus (Devops Ireland April 2015)Brian Brazil
Monitoring means many things to many people. This talk looks at Systems Monitoring, that is how to keep an eye on a given system and use this as part of overall management of a system. This talk will cover Why one monitors, What to monitor, How to monitor, the general design of a monitoring system and how Prometheus is a good fit for this in terms of instrumentation, consoles, alerts, general system health and sanity.
Prometheus is a next-generation monitoring system publicly announced earlier this year, developed by companies including SoundCloud, locals Boxever and Docker. Since launch there has been wide-spread interest, and many community contributions.
For more information see http://prometheus.io or http://www.boxever.com/tag/monitoring
Andrew Betts Web Developer, The Financial Times at Fastly Altitude 2016
Running custom code at the Edge using a standard language is one of the biggest advantages of working with Fastly’s CDN. Andrew gives you a tour of all the problems the Financial Times and Nikkei solve in VCL and how their solutions work.
These slides show how to reduce latency on websites and reduce bandwidth for improved user experience.
Covering network, compression, caching, etags, application optimisation, sphinxsearch, memcache, db optimisation
Realtime Analytics Using MongoDB, Python, Gevent, and ZeroMQRick Copeland
With over 180,000 projects and over 2 million users, SourceForge has tons of data about people developing and downloading open source projects. Until recently, however, that data didn't translate into usable information, so Zarkov was born. Zarkov is system that captures user events, logs them to a MongoDB collection, and aggregates them into useful data about user behavior and project statistics. This talk will discuss the components of Zarkov, including its use of Gevent asynchronous programming, ZeroMQ sockets, and the pymongo/bson driver.
Reverse proxy & web cache with NGINX, HAProxy and VarnishEl Mahdi Benzekri
Discover the very wide world of web servers, in addition to the basic web deliverance fonctionnality, we will cover the reverse proxy, the resource caching and the load balancing.
Nginx and apache HTTPD will be used as web server and reverse proxy, and to illustrate some caching features we will also present varnish a powerful caching server.
To introduce load balancers we will compare between Nginx and Haproxy.
DSLing your System For Scalability Testing Using Gatling - Dublin Scala User ...Aman Kohli
The power of Gatling is the DSL it provides to allow writing meaningful and expressive tests. We provide an overview of the framework, a description of their development environment and goals, and present their test results.
Source code available https://github.com/lawlessc/random-response-time
Systems Monitoring with Prometheus (Devops Ireland April 2015)Brian Brazil
Monitoring means many things to many people. This talk looks at Systems Monitoring, that is how to keep an eye on a given system and use this as part of overall management of a system. This talk will cover Why one monitors, What to monitor, How to monitor, the general design of a monitoring system and how Prometheus is a good fit for this in terms of instrumentation, consoles, alerts, general system health and sanity.
Prometheus is a next-generation monitoring system publicly announced earlier this year, developed by companies including SoundCloud, locals Boxever and Docker. Since launch there has been wide-spread interest, and many community contributions.
For more information see http://prometheus.io or http://www.boxever.com/tag/monitoring
Andrew Betts Web Developer, The Financial Times at Fastly Altitude 2016
Running custom code at the Edge using a standard language is one of the biggest advantages of working with Fastly’s CDN. Andrew gives you a tour of all the problems the Financial Times and Nikkei solve in VCL and how their solutions work.
These slides show how to reduce latency on websites and reduce bandwidth for improved user experience.
Covering network, compression, caching, etags, application optimisation, sphinxsearch, memcache, db optimisation
Realtime Analytics Using MongoDB, Python, Gevent, and ZeroMQRick Copeland
With over 180,000 projects and over 2 million users, SourceForge has tons of data about people developing and downloading open source projects. Until recently, however, that data didn't translate into usable information, so Zarkov was born. Zarkov is system that captures user events, logs them to a MongoDB collection, and aggregates them into useful data about user behavior and project statistics. This talk will discuss the components of Zarkov, including its use of Gevent asynchronous programming, ZeroMQ sockets, and the pymongo/bson driver.
Reverse proxy & web cache with NGINX, HAProxy and VarnishEl Mahdi Benzekri
Discover the very wide world of web servers, in addition to the basic web deliverance fonctionnality, we will cover the reverse proxy, the resource caching and the load balancing.
Nginx and apache HTTPD will be used as web server and reverse proxy, and to illustrate some caching features we will also present varnish a powerful caching server.
To introduce load balancers we will compare between Nginx and Haproxy.
DSLing your System For Scalability Testing Using Gatling - Dublin Scala User ...Aman Kohli
The power of Gatling is the DSL it provides to allow writing meaningful and expressive tests. We provide an overview of the framework, a description of their development environment and goals, and present their test results.
Source code available https://github.com/lawlessc/random-response-time
This presentation was given to the Dublin Node (JS) Community on May 29th 2014.
Presented by: Chris Lawless, Kevin Yu Wei Xia, Fergal Carroll @phergalkarl, Ciarán Ó hUallacháin, and Aman Kohli @akohli
From zero to hero - Easy log centralization with Logstash and ElasticsearchRafał Kuć
Presentation I gave during DevOps Days Warsaw 2014 about combining Elasticsearch, Logstash and Kibana together or use our Logsene solution instead of Elasticsearch.
From Zero to Hero - Centralized Logging with Logstash & ElasticsearchSematext Group, Inc.
Originally presented at DevOpsDays Warsaw 2014. How to set up centralized logging either using ELK stack - Logstash, Elasticsearch, and Kibana or using Logsene.
High-level Programming Languages: Apache Pig and Pig LatinPietro Michiardi
This slide deck is used as an introduction to the Apache Pig system and the Pig Latin high-level programming language, as part of the Distributed Systems and Cloud Computing course I hold at Eurecom.
Course website:
http://michiard.github.io/DISC-CLOUD-COURSE/
Sources available here:
https://github.com/michiard/DISC-CLOUD-COURSE
Web Performance, Scalability, and Testing Techniques - Boston PHP MeetupJonathan Klein
I gave this talk on 4/27/11 at the Boston PHP Meetup Group. It covers both server side and client side optimizations, as well as monitoring tools and techniques.
Text Editors (Atom / Sublime)
Apache Server (sftp/ssh/php) – Todd's Server!
CPanel / Wordpress (server side details)
Working with any Web API (Mapping Example)
(facebook, linkedin, twitter, maps, d3.js, jquary)
JSON and HTML <img>
GIT http://www.github.com
Jeff Scudder, Eric Bidelman
The number of APIs made available for Google products has exploded from a handful to a slew! Get
the big picture on what is possible with the APIs for everything from YouTube, to Spreadsheets, to
Search, to Translate. We'll go over a few tools to help you get started and the things these APIs share
in common. After this session picking up new Google APIs will be a snap.
CouchDB is a document-oriented database that uses JSON documents, has a RESTful HTTP API, and is queried using map/reduce views. Each of these properties alone, especially MapReduce views, may seem foreign to developers more familiar with relational databases. This tutorial will teach web developers the concepts they need to get started using CouchDB in their projects. CouchDB’s RESTful HTTP API makes it suitable for interfacing with any programming language. CouchDB libraries are available for many programming languages and we will take a look at some of the more popular ones.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
5. To these
✤ Removing bots and crawlers
✤ Picking out relevant events
✤ Grouping events by users
✤ Sequencing the events
✤ Structuring as a matrix
✤ Graphing
6. Using this
✤ an abstraction framework for building MapReduce jobs
✤ linearly scalable data processing
✤ www.cascading.org
7. Word Count - MapReduce
✤ public static class Map extends MapReduceBase implements
Mapper<LongWritable, Text, Text, IntWritable>
✤ public void map(LongWritable key, Text value, OutputCollector<Text,
IntWritable> output, Reporter reporter) throws IOException
✤ “take this line and split it to word tokens”
✤ public static class Reduce extends MapReduceBase implements Reducer<Text,
IntWritable, Text, IntWritable>
✤ public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text,
IntWritable> output, Reporter reporter) throws IOException
✤ “take each word token and increment a counter”
9. Benefits of
Domain Specific Language
✤ At same level of abstraction of the problem
✤ split words, then do a count on them
✤ Fewer custom code, less prone to implementation bugs
✤ More readable
✤ More productive
10. TF-IDF
✤ Extended from word
count example
✤ Single-purpose
methods
✤ Composition of
functions
✤ github.com/Quantisan/Impatient
✤ github.com/Cascading/Impatient
11. Our data processing methodology
✤ Apply single-purpose
functions to immutable data
✤ Only build what we need as
we go
✤ Composability, extensibility,
maintainability
✤ Use the right tool for the right
task
12. Contact
✤ Paul Lam, data scientist at uSwitch
✤ @Quantisan
✤ paul.lam@forward.co.uk