ClickHouse 2018. How to stop waiting for your queries to complete and start having fun, by Alexander Zaitsev, Altinity CTO
Presented at Percona Live Frankfurt
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a... (Altinity Ltd)
Columnar stores like ClickHouse enable users to pull insights from big data in seconds, but only if you set things up correctly. This talk will walk through how to implement a data warehouse that contains 1.3 billion rows using the famous NY Yellow Cab ride data. We'll start with basic data implementation including clustering and table definitions, then show how to load efficiently. Next, we'll discuss important features like dictionaries and materialized views, and how they improve query efficiency. We'll end by demonstrating typical queries to illustrate the kind of inferences you can draw rapidly from a well-designed data warehouse. It should be enough to get you started--the next billion rows is up to you!
From Overnight to Always On @ Jfokus 2019 (Enno Runne)
Systems integration is everywhere, not because we want it, but because we need it.
It's the download of exchange rates, the list of yesterday's orders and the latest inventory. Not long ago, we'd pull this kind of information in overnight batches and every system had something to work on. That was the age when we still had printed newspapers.
Today, data needs to be there. Instantaneously. Or "as fast as possible". We don't want to transfer huge piles of data once every night; we want the updates to come through just after the change happened. We want streaming data.
In this talk, we exemplify the path to move from overnight file exchanges to streaming data by using Alpakka, which is an integration library based on Reactive Streams and Akka.
Polyglot ClickHouse -- ClickHouse SF Meetup Sept 10 (Altinity Ltd)
Presentation by Robert Hodges introducing the many ways that ClickHouse can read and write data from other systems, including MySQL, Kafka, S3, and Snowflake.
In this lecture we analyze document-oriented databases. In particular, we consider why they were the first approach to NoSQL and what their main features are. Then we analyze MongoDB as an example: we consider the data model, CRUD operations, write concerns, and scaling (replication and sharding).
Finally, we present other document-oriented databases and discuss when to use (and when not to use) a document-oriented database.
AWS Athena vs. Google BigQuery for interactive SQL Queries (DoiT International)
At re:Invent 2016, AWS released Amazon Athena, an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries you run.
We took a look at AWS Athena and compared it to Google BigQuery, another player in serverless interactive data analysis.
Would you like to know which one is the right tool for you? Join us for this meetup to learn about AWS Athena and for a test drive of querying exactly the same dataset with AWS Athena and Google BigQuery, to see where each one shines (or totally blows it).
Paul Dix (Founder InfluxDB) - Organising Metrics at #DOXLON (Outlyer)
Paul Dix (Founder of InfluxDB) talking about his awesome open-source projects for monitoring.
For more info visit InfluxDB: www.influxdb.com
Join DevOps Exchange London here: http://www.meetup.com/DevOps-Exchange-London/
Follow DOXLON on twitter: twitter.com/doxlon
InfluxDB 1.0 - Optimizing InfluxDB by Sam Dillard (InfluxData)
Learn how to optimize InfluxDB 1.0 for performance including hardware and architecture choices, schema design, configuration setup, and running queries. In this InfluxDays NYC 2019 presentation, Sam Dillard provides numerous actionable tips and insights into InfluxDB optimization.
The workshop will present how to combine tools to quickly query, transform and model data using command line tools. The goal is to show that command line tools are efficient at handling reasonable sizes of data and can accelerate the data science process. We will show that in many instances, command line processing ends up being much faster than 'big-data' solutions. The content of the workshop is derived from the book of the same name (http://datascienceatthecommandline.com/). In addition, we will cover vowpal-wabbit (https://github.com/JohnLangford/vowpal_wabbit) as a versatile command line tool for modeling large datasets.
I inherited a MongoDB database server with 60 collections and 100 or so indexes.
The business users are complaining about slow report completion times. What can I do to improve performance?
Importing Data into Neo4j quickly and easily - StackOverflow (Neo4j)
In this GraphConnect presentation Mark and Michael show several ways to import large amounts of highly connected data from different formats into Neo4j. Both Cypher's LOAD CSV and the bulk importer are demonstrated, along with many tips.
We use the well-known StackOverflow Q&A site data, which is interestingly very graphy.
Simon Elliston Ball – When to NoSQL and When to Know SQL - NoSQL matters Barc... (NoSQLmatters)
With NoSQL, NewSQL and plain old SQL, there are so many tools around that it's not always clear which is the right one for the job. This is a look at a series of NoSQL technologies, comparing them against traditional SQL technology. I'll compare real use cases and show how they are solved with both NoSQL options and traditional SQL servers, and then see who wins. We'll look at some code and architecture examples that fit a variety of NoSQL techniques, and some where SQL is a better answer. We'll see some big data problems, little data problems, and a bunch of new and old database technologies to find whatever it takes to solve the problem. By the end you'll hopefully know more NoSQL, and maybe even have a few new tricks with SQL, and what's more, how to choose the right tool for the job.
Apache Calcite is a dynamic data management framework. Think of it as a toolkit for building databases: it has an industry-standard SQL parser, validator, highly customizable optimizer (with pluggable transformation rules and cost functions, relational algebra, and an extensive library of rules), but it has no preferred storage primitives. In this tutorial, the attendees will use Apache Calcite to build a fully fledged query processor from scratch with very few lines of code. This processor is a full implementation of SQL over an Apache Lucene storage engine. (Lucene does not support SQL queries and lacks a declarative language for performing complex operations such as joins or aggregations.) Attendees will also learn how to use Calcite as an effective tool for research.
More Data, More Problems: Evolving big data machine learning pipelines with S... (Alex Sadovsky)
These are the slides from the Denver/Boulder Spark meet-up on February 24th, 2016. (deck build animations are all broken here... sorry!)
This talk provides an evaluation of existing machine learning pipelines in the eyes of different key stakeholders in the data science ecosystem. Focus is be placed upon the entire process from data to product (and keeping everyone in-between happy). Ultimately I explore how to utilize Spotify’s Luigi pipeline tool in combination with Spark to produce batch processing machine learning pipelines that have operational insights and redundancy built in.
Slick: Bringing Scala's Powerful Features to Your Database Access (Rebecca Grenier)
This talk will teach you how to use Slick in practice, based on our experience at EatingWell Media Group. Slick is a totally different (and better!) relational database mapping tool that brings Scala’s powerful features to your database interactions, namely: static-checking, compile-time safety, and compositionality.
Here at EatingWell, we have learned quite a bit about Slick over the past two years as we transitioned from a PHP website to Scala. I will share with you tips and tricks we have learned, as well as everything you need to get started using Slick in your Scala application.
I will begin with Slick fundamentals: how to get started making your connection, the types of databases it can access, and how to actually create table objects and make queries to and from them. We will use these fundamentals to demonstrate the powerful features inherited from the Scala language itself: static-checking, compile-time safety, and compositionality. And throughout I will share plenty of tips that will help you with everything from getting started to connection pooling options and configuration for use at scale.
Lightning Talk: Why and How to Integrate MongoDB and NoSQL into Hadoop Big Da... (MongoDB)
Drawn from Think Big's experience on real-world client projects, Think Big Academy Director and Principal Architect Jeffrey Breen will review specific ways to integrate NoSQL databases into Hadoop-based Big Data systems: preserving state in otherwise stateless processes; storing pre-computed metrics and aggregates to enable interactive analytics and reporting; and building a secondary index to provide low-latency, random access to data stored on the high-latency HDFS. A working example of secondary indexing is presented in which MongoDB is used to index web site visitor locations from Omniture clickstream data stored on HDFS.
Streaming ETL - from RDBMS to Dashboard with KSQL (Bjoern Rost)
Apache Kafka is a massively scalable message queue that is being used at more and more places connecting more and more data sources. This presentation will introduce Kafka from the perspective of a mere mortal DBA and share the experience of (and challenges with) getting events from the database to Kafka using Kafka connect including poor-man’s CDC using flashback query and traditional logical replication tools. To demonstrate how and why this is a good idea, we will build an end-to-end data processing pipeline. We will discuss how to turn changes in database state into events and stream them into Apache Kafka. We will explore the basic concepts of streaming transformations using windows and KSQL before ingesting the transformed stream in a dashboard application.
node.js and native code extensions by example (Philipp Fehre)
Over the last years node.js has evolved into a great language for building web applications. The reason is not only that it is based on JavaScript, which is already established around "the web", but also that it provides excellent facilities for extensions, not only via JavaScript but also via integration of native C libraries. Couchbase makes good use of this fact: the Couchbase node.js SDK (Couchnode) is a wrapper around the C library, providing a node.js-like API while leveraging the power of a native C library underneath. So how is this done? What does such a package look like? Let me show you how integration of C in node.js works and how to "read" a package like Couchnode.
JRuby is a great way to use native Java libraries and get around the project overhead of Java, but how do you actually use Java from JRuby? This talk explores building a JRuby application, backed by the portable Java version of Couchbase Mobile.
While JRuby is built to interface with Java, when calling out to JVM land there are all those little hurdles to overcome: handling strings correctly, using native collection types, interfacing with libraries which expect those native types instead of the ones provided by JRuby, and last but not least, implementing native interfaces to pass around.
Oh and by the way all of this runs on a little Raspberry Pi!
This presentation was given by David Maier @magicable @munichnosql in May 2014. The code can be found at https://github.com/dmaier-couchbase/cbl-android-tasklist
Before joining Couchbase, Phil was a consultant on many different node.js and NoSQL projects, working with many different languages and databases. By helping clients solve problems regarding scalability, as well as building completely new APIs, he gained a broad knowledge of the available platforms and their tradeoffs in the big and small. He's a Developer Evangelist for Couchbase, where he works to educate developers on the different parts of using a NoSQL database, from mobile to big iron servers.
Walk through some basic examples for Riak's Solr integration (Yokozuna), CRDTs, and authentication.
Find all the examples on GitHub: https://github.com/sideshowcoder/whats_new_in_riak_2_0
Introduction to Riak, and Riak-CS at "Munich Rubyshift The big Ruby & Database shootout!" 9/2013 http://www.meetup.com/Munich-Rubyshift-Ruby-User-Group/
Starting up Rails is crazy slow! Sometimes I dread using some tools just because they start up Rails and it takes like 30 seconds. It's bad, and it breaks flow. Zeus is here to change this by giving you a fast way to run your Rails environment.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices want to take full advantage of the features available on those devices, but many of those features provide convenience and capability while sacrificing security. This best practices guide outlines steps users can take to better protect personal devices and information.
Climate Impact of Software Testing at Nordic Testing Days (Kari Kakkonen)
My slides at Nordic Testing Days 6.6.2024
The talk discusses the climate impact and sustainability of software testing. ICT and testing must carry their part of the global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be extended with sustainability, which can then be measured continuously. Test environments can be used less, at smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
Securing your Kubernetes cluster: a step-by-step guide to success! (KatiaHIMEUR1)
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... (DanBrown980551)
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Epistemic Interaction - tuning interfaces to provide information for AI support (Alan Dix)
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
PHP Frameworks: I want to break free (IPC Berlin 2024) (Ralf Eggert)
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
DevOps and Testing slides at DASA Connect (Kari Kakkonen)
Rik Marselis' and my slides from the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps is. We ended with a lovely workshop in which the participants tried to find different ways to think about quality and testing in different parts of the DevOps infinity loop.
UiPath Test Automation using UiPath Test Suite series, part 4 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Pushing the limits of ePRTC: 100ns holdover for 100 days (Adtran)
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Communications Mining Series - Zero to Hero - Session 1 (DianaGray10)
This session provides an introduction to UiPath Communications Mining, its importance, and a platform overview. You will acquire a good understanding of the phases in Communications Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
Removing Uninteresting Bytes in Software Fuzzing (Aftab Hussain)
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
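The byte-removal idea described above can be illustrated with a toy greedy minimizer. This is a sketch of the general concept only, not DIAR's actual algorithm; the block passed in stands in for "does the fuzzer still observe the interesting behaviour with this seed?":

```ruby
# Toy seed minimizer: greedily try dropping each byte and keep the deletion
# whenever the "interesting behaviour" oracle still fires on the smaller seed.
def minimize_seed(seed, &interesting)
  bytes = seed.bytes
  i = 0
  while i < bytes.length
    candidate = bytes[0...i] + bytes[i + 1..]
    if interesting.call(candidate.pack("C*"))
      bytes = candidate   # byte was uninteresting: drop it
    else
      i += 1              # byte matters: keep it and move on
    end
  end
  bytes.pack("C*")
end

# Example: only the "<x>" marker matters, so everything else is removed.
minimize_seed("aa<x>bb") { |s| s.include?("<x>") }  # => "<x>"
```

Real tools are far more sophisticated (they reason about coverage, not a single predicate), but the payoff is the same: mutations are no longer wasted on bytes that cannot change the program's behaviour.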
State of ICS and IoT Cyber Threat Landscape Report 2024 preview (Prayukth K V)
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities, spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
30. class AddFields < ActiveRecord::Migration
  def up
    change_table :quotas do |t|
      t.column :free, :integer
    end
    Quota.find_each do |quota|
      quota.free = calculate_free(quota)
      quota.save
    end
  end
end
32. class AddFields < ActiveRecord::Migration
  def up
    change_table :quotas do |t|
      t.column :free, :integer
    end
    execute "COMMIT"
    Quota.find_each do |quota|
      quota.free = calculate_free(quota)
      quota.save
    end
  end
end
38. module DB
  STORE = {
    keyOne: [{ id: 1, prop: "bar", vsn: 2, free: 20 },
             { id: 2, prop: "foo", vsn: 2, free: 10 }],
    keyTwo: [{ id: 1, prop: "bar" },
             { id: 3, prop: "bar" },
             { id: 4, prop: "bar" }]
  }

  def self.find_all keys, prop
    keys.map { |k|
      STORE.fetch(k.to_sym, []).map { |e|
        e if e[:prop] == prop
      }.compact
    }
  end
end
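For reference, here is a self-contained, runnable version of the slide's in-memory store with a sample lookup. `find_all` returns one result array per requested key, keeping only the entries whose `prop` matches:

```ruby
# Self-contained copy of the slide's DB module, plus a sample lookup.
module DB
  STORE = {
    keyOne: [{ id: 1, prop: "bar", vsn: 2, free: 20 },
             { id: 2, prop: "foo", vsn: 2, free: 10 }],
    keyTwo: [{ id: 1, prop: "bar" },
             { id: 3, prop: "bar" },
             { id: 4, prop: "bar" }]
  }

  def self.find_all keys, prop
    keys.map { |k|
      STORE.fetch(k.to_sym, []).map { |e| e if e[:prop] == prop }.compact
    }
  end
end

# One array per key: :keyOne contributes the entry with id 1,
# :keyTwo contributes the entries with ids 1, 3 and 4.
DB.find_all([:keyOne, :keyTwo], "bar")
```

Note the `fetch(k.to_sym, [])` default: a key that is missing from the store yields an empty result array rather than raising.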
39. def is_version2? data; data[:vsn] == 2; end
def get_free id; 20; end
def save data; data; end

def transform_to_v2 data
  return data if is_version2?(data)

  data[:vsn] = 2
  data[:free] = get_free(data[:id])
  save data
end
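The slide's "upgrade on read" pattern can be exercised as a runnable sketch. Note the stubs: `get_free` returns a constant here, standing in for the deck's real free-quota calculation, and `save` just echoes its argument instead of writing to a store:

```ruby
# Lazy migration: records are upgraded to the v2 schema whenever they are read.
def is_version2? data; data[:vsn] == 2; end
def get_free id; 20; end        # stub for the real free-quota calculation
def save data; data; end        # stub for writing back to the store

def transform_to_v2 data
  return data if is_version2?(data)

  data[:vsn] = 2
  data[:free] = get_free(data[:id])
  save data
end

record = { id: 1, prop: "bar" }  # a v1 record, as read from the store
transform_to_v2(record)          # vsn becomes 2, free becomes 20
```

Records already at version 2 pass through untouched, so the transform is safe to run on every read.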
41. DRY it up: Deprecator
https://github.com/sideshowcoder/deprecator
42. class Thing
  def initialize attrs = {}
    attrs.each do |k, v|
      instance_variable_set "@#{k}", v
    end
    @version = 0 unless @version
  end
  attr_accessor :version

  include Deprecator::Versioning
  ensure_version 2, :upgrade_to

  def upgrade_to expected_version
    # handle the version upgrade
    save
  end

  def save
    # save back to the store
  end
end
50. Talks to listen to

• Schemalessness: http://cloud.dzone.com/articles/martinfowler-schemalessness
• Introduction to NoSQL: http://www.youtube.com/watch?v=qI_g07C_Q5I