Title: Dive into ClickHouse storage system
Speaker: Alexander Sapin (https://github.com/alesapin/)
Date: Thursday, March 5, 2020
Event: https://meetup.com/Athens-Big-Data/events/268379195/
Big Data in Real-Time: How ClickHouse powers Admiral's visitor relationships ... (Altinity Ltd)
Slides for the webinar presented on June 16, 2020
By James Hartig, Co-Founder of Admiral, and Robert Hodges, Altinity CEO
Advertising is dying in the wake of privacy and adblockers. Join us for a conversation with James Hartig, a Co-Founder at Admiral (getadmiral.com), who helps publishers diversify their revenue and build more meaningful relationships with users. We'll start with an overview of Admiral's platform and how they use large scale session data to power their engagement engine. We'll then discuss the ClickHouse features that Admiral uses to power these real-time decisions. Finally, we'll walk through how Admiral migrated from MongoDB to ClickHouse and some of their plans for future projects. Join us to learn how ClickHouse drives cutting edge real-time applications today!
Speaker Bios:
James Hartig is one of the Co-Founders of Admiral working on distributed systems in Golang. Before this, he worked at the online music streaming platform, Grooveshark.
Robert Hodges is CEO of Altinity, which offers enterprise support for ClickHouse. He has over three decades of experience in data management spanning 20 different DBMS types. ClickHouse is his current favorite. ;)
When we talk about bucketing, we are essentially talking about ways to split Cassandra partitions into several smaller parts rather than having only one large partition.
Bucketing of Cassandra partitions can be crucial for optimizing queries, preventing large partitions, and fighting the TombstoneOverwhelmingException that can occur when too many tombstones are created.
In this talk I want to show how to recognize large partitions during data modeling. I will also show different strategies we used in our projects to create, use, and maintain buckets for our partitions.
About the Speaker
Markus Hofer IT Consultant, codecentric AG
Markus Hofer works as an IT Consultant for codecentric AG in Münster, Germany. He works on microservice architectures backed by DSE and/or Apache Cassandra. Markus supports and trains customers building Cassandra-based applications.
ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEO (Altinity Ltd)
From webinar on December 3, 2019
New users of ClickHouse love the speed but may run into a few surprises when designing applications. Column storage turns classic SQL design precepts on their heads. This talk shares our favorite tricks for building great applications. We'll talk about fact tables and dimensions, materialized views, codecs, arrays, and skip indexes, to name a few of our favorites. We'll show examples of each and also reserve time to handle questions. Join us to take your next step to ClickHouse guruhood!
Speaker Bio:
Robert Hodges is CEO of Altinity, which offers enterprise support for ClickHouse. He has over three decades of experience in data management spanning 20 different DBMS types. ClickHouse is his current favorite. ;)
MariaDB and ClickHouse: Percona Live 2019 talk, by Alexander Rubin
Running an analytical (OLAP) workload on top of MySQL can be slow and painful. A specifically designed storage format ("Column Store") can significantly improve analytical queries' performance. There are a number of open-source column store databases around. In this talk, I will focus on two of them that support the MySQL protocol: MariaDB ColumnStore and ClickHouse.
I will show some real-time benchmarks and use cases, and demonstrate how MariaDB ColumnStore and ClickHouse can be used for typical OLAP queries.
ClickHouse Materialized Views: The Magic Continues (Altinity Ltd)
Slides for the webinar, presented on February 26, 2020
By Robert Hodges, Altinity CEO
Materialized views are the killer feature of ClickHouse, and the Altinity 2019 webinar on how they work was very popular. Join this updated webinar to learn how to use materialized views to speed up queries hundreds of times. We'll cover basic design, last point queries, using TTLs to drop source data, counting unique values, and other useful tricks. Finally, we'll cover recent improvements that make materialized views more useful than ever.
OSDC 2012 | Scaling with MongoDB, by Ross Lawley (NETWAYS)
MongoDB's architecture features built-in support for horizontal scalability, and high availability through replica sets. Auto-sharding allows users to easily distribute data across many nodes. Replica sets enable automatic failover and recovery of database nodes within or across data centers. This session will provide an introduction to scaling with MongoDB by one of the developers working on the project.
Conference: HP Big Data Conference 2015
Session: Real-world Methods for Boosting Query Performance
Presentation: "Extra performance out of thin air"
Presenter: Konstantine Krutiy, Principal Software Engineer / Vertica Whisperer
Company: Localytics
Description:
Learn how to get extra performance out of Vertica from areas you never expected.
This presentation will illustrate how you can improve performance of your Vertica cluster without extra budget.
All you need is ingenuity, knowledge of Vertica internals, and the ability to challenge conventional wisdom.
We will show you real world examples on gaining performance by eliminating unneeded work, eliminating unneeded system waits and making your system operate more efficiently.
Visit my blog http://www.dbjungle.com for more Vertica insights
Unified Data Platform, by Pauline Yeung of Cisco Systems (Altinity Ltd)
Presented at the December ClickHouse Meetup, Dec 3, 2019
Our journey from using ClickHouse in an internal threat library web application, to experimenting with ClickHouse, to migrating production data from Elasticsearch, Postgres, and HBase, to trying ClickHouse for error metrics in a product under development.
Webinar slides: MORE secrets of ClickHouse Query Performance. By Robert Hodge... (Altinity Ltd)
Webinar May 27, 2020
ClickHouse is famously fast, but a small amount of extra work makes it much faster. Join us for the latest version of our popular talk on single-node ClickHouse performance. We start by examining the system log to see what ClickHouse queries are doing. Then we introduce standard tricks to increase speed: adding CPUs, reducing I/O with filters, restructuring joins, adding indexes, and using materialized views, plus many more. In each case we show how to measure the results of your work. There will as usual be time for questions as well at the end. Sign up now to polish your ClickHouse performance skills!
SCALE 15x: Minimizing PostgreSQL Major Version Upgrade Downtime, by Jeff Frost
Let's face it, major version upgrades can be a pain. Most people are familiar with the dump/restore method, but if your database happens to be larger than a few GB, the downtime required for dump/restore is likely going to exceed your maximum maintenance window.
We'll briefly explore upgrade options including dump/restore, pg_upgrade, and logical replication tools like Slony, and why I think Slony is currently the best option. Then we'll run through a tutorial on how to use Slony for a major version upgrade with minimal downtime.
You can find the scripts used in the demos here:
https://github.com/jfrost/scale-15x-talk
Positive Hack Days. Matrosov. Master class: Conducting a forensic ex... (Positive Hack Days)
The master class will cover the following topics:
methods of injection and operation of the TDL4 rootkit;
tools and methods of data collection for a forensic examination of an infected machine;
debugging the bootkit component at an early stage of system boot using the Bochs emulator;
analyzing an infected machine with WinDbg;
removing the rootkit from the system after all the necessary data has been collected.
This presentation will recount the story of how Macys.com (and Bloomingdales.com) selected and migrated from a legacy RDBMS to NoSQL Cassandra in partnership with DataStax.
We'll start with a mercifully brief backgrounder on our website and our business. Then we will go over the various technologies that we considered, as well as our use case-based performance benchmarks that led to the decision to go with Cassandra.
We'll cover the various schema options that we tried and how we settled on the current one. We'll show you a selection of some of our extensive performance tuning benchmarks.
One thing that differentiates this talk from others on Cassandra is Macy's philosophy of "doing more with less." You will see why we emphasize the performance tuning aspects of iterative development when you see how much processing we can support on relatively small configurations.
And, finally, we will wrap up with our "lessons learned" and a brief look at our future plans.
Transparent sharding with Spider: what's new and getting started (MariaDB plc)
OpenWorks 2019 Session
MariaDB Server 10.3 introduced transparent, built-in sharding with the Spider storage engine to scale out reads, writes and storage. MariaDB Server 10.4 will include a number of improvements, including DDL pushdown. In this session, Ralf Gebhardt and Kentoku Shiba of MariaDB show how to set up a sharded MariaDB cluster and scale out on demand, as well as explore best practices for high availability and consistency in a sharded deployment.
Optimizing Parallel Reduction in CUDA: NOTES (Subhajit Sahu)
Highlighted notes on Optimizing Parallel Reduction in CUDA
While doing research work under Prof. Dip Banerjee, Prof. Kishore Kothapalli.
Interesting optimizations; I should try these soon, as PageRank is basically lots of sums.
All You Need to Know About HCL Notes 64-Bit Clients (panagenda)
Webinar Recording: https://www.panagenda.com/webinars/all-you-need-to-know-about-hcl-notes-64-bit-clients/
Let’s face it, it took a while but now chances are that with the release of HCL Domino 12.0.2 there will finally be an HCL Notes 64-bit client! Time to get excited! Or maybe not? At first glance, you might think it’s a simple upgrade and you’re done. But when was life ever that simple?
Upgrading to an HCL Notes 64-bit client comes with quite a few things you want to consider: changes in install location, problems with certain coding constructs in Domino applications, and third-party add-in dependencies are just a few bumps in the road. And as with any new software, there are also considerations about performance, stability and security. You want to be prepared, of course.
In our webinar, about how to run HCL Nomad Configurations on any device we showed you that MarvelClient Roaming can help you solve many challenges. It enables you to automatically back up, restore, and share configurations (desktop, recent apps, settings, and more) among devices using Nomad.
Of course, we went a little further. So in our webinar and live demo about how to bring HCL Nomad and Domino together without SafeLinx, we showed you the skills needed to operate Nomad Web with Domino directly. By checking out the recordings, you can learn how to install and configure the Nomad Web Server, how it works from a user’s point of view, and in which scenarios you might want to keep using SafeLinx.
Join our webinar on January 31, and you will leave with a good understanding of any issues and considerations around the new HCL Notes 64-bit client which allows you to plan your way forward with confidence.
What you will learn
- How to install HCL Notes 64-bit clients from scratch
- How to upgrade old 32-bit clients
- Daily challenges when operating 64-bit clients
- A look at performance and stability
- Domino application design issues
- Third-party add-in dependencies
These slides detail the x86 system architecture, including register basics, memory paging, and security handling. The system architecture is configured by the BIOS during boot of x86-based PCs/desktops.
22nd Athens Big Data Meetup - 1st Talk - MLOps Workshop: The Full ML Lifecycl... (Athens Big Data)
Title: MLOps Workshop: The Full ML Lifecycle - How to Use ML in Production
Speakers: Spyros Cavadias (https://www.linkedin.com/in/spyros-cavadias/), Konstantinos Pittas (https://www.linkedin.com/in/konstantinos-pittas-83310270/), Thanos Gkinakos (https://www.linkedin.com/in/thanos-gkinakos-03582a128/)
Date: Saturday, December 17, 2022
Event: https://www.meetup.com/athens-big-data/events/289927468/
19th Athens Big Data Meetup - 2nd Talk - NLP: From news recommendation to wor... (Athens Big Data)
Title: NLP: From news recommendation to word prediction
Speaker: Mihalis Papakonstantinou (https://linkedin.com/in/mihalispapak/)
Date: Thursday, December 12, 2019
Event: https://meetup.com/Athens-Big-Data/events/266762191/
21st Athens Big Data Meetup - 1st Talk - Fast and simple data exploration wit... (Athens Big Data)
Title: Fast and simple data exploration with ClickHouse
Speaker: Alexander Kuzmenkov (https://github.com/akuzm/)
Date: Thursday, March 5, 2020
Event: https://meetup.com/Athens-Big-Data/events/268379195/
20th Athens Big Data Meetup - 2nd Talk - Druid: under the covers (Athens Big Data)
Title: Druid: under the covers
Speaker: Peter Marshall (https://linkedin.com/in/amillionbytes/)
Date: Tuesday, January 28, 2020
Event: https://meetup.com/Athens-Big-Data/events/266900242/
20th Athens Big Data Meetup - 1st Talk - Druid: the open source, performant, ... (Athens Big Data)
Title: Druid: the open source, performant, real-time, analytical datastore
Speaker: Peter Marshall (https://linkedin.com/in/amillionbytes/)
Date: Tuesday, January 28, 2020
Event: https://meetup.com/Athens-Big-Data/events/266900242/
18th Athens Big Data Meetup - 2nd Talk - Run Spark and Flink Jobs on Kubernetes (Athens Big Data)
Title: Run Spark and Flink Jobs on Kubernetes
Speaker: Chaoran Yu (https://linkedin.com/in/chaoran-yu-97b1144a/)
Date: Thursday, November 14, 2019
Event: https://meetup.com/Athens-Big-Data/events/265957761/
18th Athens Big Data Meetup - 1st Talk - Timeseries Forecasting as a Service (Athens Big Data)
Title: Timeseries Forecasting as a Service
Speaker: Thanassis Spyrou (https://linkedin.com/in/thanassis-spyrou-92911959/)
Date: Thursday, November 14, 2019
Event: https://meetup.com/Athens-Big-Data/events/265957761/
17th Athens Big Data Meetup - 2nd Talk - Data Flow Building and Calculation P... (Athens Big Data)
Title: Data Flow Building and Calculation Pipelines via PySpark and ML Modeling via Python
Speaker: Theodoros Michalareas (https://linkedin.com/in/theodorosmichalareas/)
Date: Tuesday, September 24, 2019
Event: https://meetup.com/Athens-Big-Data/events/264702584/
16th Athens Big Data Meetup - 2nd Talk - A Focus on Building and Optimizing M... (Athens Big Data)
Title: A Focus on Building and Optimizing ML Models with Amazon SageMaker
Speaker: Pavlos Mitsoulis-Ntompos (https://linkedin.com/in/pavlosmitsoulis/)
Date: Thursday, March 14, 2019
Event: https://www.meetup.com/Athens-Big-Data/events/259091496/
16th Athens Big Data Meetup - 1st Talk - An Introduction to Machine Learning ... (Athens Big Data)
Title: An Introduction to Machine Learning with Python and Scikit-Learn
Speaker: Julien Simon (https://linkedin.com/in/juliensimon/)
Date: Thursday, March 14, 2019
Event: https://www.meetup.com/Athens-Big-Data/events/259091496/
15th Athens Big Data Meetup - 1st Talk - Running Spark On Mesos (Athens Big Data)
Title: Running Spark On Mesos
Speaker: Mr. Chris Sidiropoulos (https://linkedin.com/in/chris-sidiropoulos-2a6b156a//)
Date: Tuesday, November 27, 2018
Event: https://meetup.com/Athens-Big-Data/events/256098657/
5th Athens Big Data Meetup - PipelineIO Workshop - Real-Time Training and Dep... (Athens Big Data)
Title: Real-Time Training and Deploying Spark ML Recommendations With Kafka and NetflixOSS
Speaker: Chris Fregly (https://linkedin.com/in/cfregly/)
Date: Monday, October 17, 2016
Event: https://meetup.com/Athens-Big-Data/events/234546355/
13th Athens Big Data Meetup - 2nd Talk - Training Neural Networks With Enterp... (Athens Big Data)
Title: Training Neural Networks With Enterprise Relational Data
Speaker: Dr. Christos Malliopoulos (https://linkedin.com/in/cmalliopoulos//)
Date: Thursday, June 21, 2018
Event: https://meetup.com/Athens-Big-Data/events/251411957/
11th Athens Big Data Meetup - 2nd Talk - Beyond Bitcoin; Blockchain Technolog... (Athens Big Data)
Title: Beyond Bitcoin; Blockchain Technologies And How They Disrupt Every Industry
Speaker: Mr. Dimitrios Kouzis-Loukas (https://linkedin.com/in/lookfwd//)
Date: Wednesday, December 27, 2017
Event: https://meetup.com/Athens-Big-Data/events/246033156/
9th Athens Big Data Meetup - 2nd Talk - Lead Scoring And Grading (Athens Big Data)
Title: Lead Scoring And Lead Grading
Speaker: Dr. Lefteris Mantelas (https://linkedin.com/in/lefterismantelas//)
Date: Tuesday, October 24, 2017
Event: https://meetup.com/Athens-Big-Data/events/243829781/
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0! (SOFTTECHHUB)
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... (DanBrown980551)
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses.
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
GridMate - End to end testing is a critical piece to ensure quality and avoid... (ThomasParaiso2)
End-to-end testing is a critical piece to ensure quality and avoid regressions. In this session, we share our journey building an E2E testing pipeline for GridMate components (LWC and Aura) using Cypress, JSForce, FakerJS…
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs (Alex Pruden)
This paper presents Reef, a system for generating publicly verifiable succinct non-interactive zero-knowledge proofs that a committed document matches or does not match a regular expression. We describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Reef supports the Perl Compatible Regular Expression syntax, including wildcards, alternation, ranges, capture groups, Kleene star, negations, and lookarounds. Reef introduces a new type of automata, Skipping Alternating Finite Automata (SAFA), that skips irrelevant parts of a document when producing proofs without undermining soundness, and instantiates SAFA with a lookup argument. Our experimental evaluation confirms that Reef can generate proofs for documents with 32M characters; the proofs are small and cheap to verify (under a second).
Paper: https://eprint.iacr.org/2023/1886
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices want to take full advantage of the features available on those devices, but many of those features provide convenience and capability at the expense of security. This best practices guide outlines steps users can take to better protect personal devices and information.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor... (SOFTTECHHUB)
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
Essentials of Automations: The Art of Triggers and Actions in FME (Safe Software)
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
8. MergeTree Engines Family
Advantages:
› Inserts are atomic
› Selects are blazing fast
› Primary and secondary indexes
› Data modification!
› Inserts and selects don’t block each other
Features:
› Infrequent INSERTs required (work in progress)
› Background operations on records with the same keys
› Primary key is NOT unique
11. Create Table and Fill Some Data
Create Table:
CREATE TABLE mt (
EventDate Date,
OrderID Int32,
BannerID UInt64,
GoalNum Int8
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(EventDate) ORDER BY (OrderID, BannerID)
Fill Data (twice):
INSERT INTO mt SELECT toDate('2018-09-26'),
number, number + 10000, number % 128 FROM numbers(1000000);
INSERT INTO mt SELECT toDate('2018-10-15'),
number, number + 10000, number % 128 FROM numbers(1000000, 1000000);
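The parts these two INSERTs create can be inspected through the system.parts system table (a sketch; the exact part names depend on insert order and background merges):

```sql
-- Each INSERT creates one part per touched partition; active = 1
-- filters out parts that have already been replaced by a merge.
SELECT partition, name, rows, active
FROM system.parts
WHERE database = 'default' AND table = 'mt'
ORDER BY name;
```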
14. Table on Disk
metadata:
$ ls /var/lib/clickhouse/metadata/default/
mt.sql
data:
$ ls /var/lib/clickhouse/data/default/mt
201809_2_2_0 201809_3_3_0 201810_1_1_0 201810_4_4_0 201810_1_4_1
detached format_version.txt
Contents:
› Format file format_version.txt
› Directories with parts
› Directory for detached parts
15. Parts: Details
› Part of the data in PK order
› Contains an interval of insert numbers
› Created for each INSERT
› Cannot be changed (immutable)
19. Parts: Why?
PK ordering is needed for efficient OLAP queries
› In our case (OrderID, BannerID)
But the data comes in order of time
› By EventDate
Re-sorting all the data is expensive
› ClickHouse handles hundreds of terabytes
Solution: Store data in a set of ordered parts!
31. Parts: Data
$ ls /var/lib/clickhouse/data/default/mt/201810_1_4_1
GoalNum.bin GoalNum.mrk BannerID.bin ...
primary.idx checksums.txt count.txt columns.txt
partition.dat minmax_EventDate.idx
Contents:
› primary.idx – primary key on disk
› GoalNum.bin – compressed column
› GoalNum.mrk – marks for column
› partition.dat – partition id
› ... – a lot of other useful files
33. Columns
› Each column in a separate file
› Compressed by blocks
› Checksums for each block
[Diagram: OrderID.bin as a sequence of compressed blocks starting at file offsets 0, 65795, 131614, 197433; each block decompresses into an uncompressed block of values (14324, 17214, 17215, 17216, 17217, ...) and carries checksums]
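To see how well each column of the example table compresses, per-column byte counts are available in system.columns (column names here follow current ClickHouse releases):

```sql
-- Compressed vs. uncompressed size of each column file, plus the
-- size of its .mrk marks; the ratio shows per-column compression.
SELECT name, data_compressed_bytes, data_uncompressed_bytes, marks_bytes
FROM system.columns
WHERE database = 'default' AND table = 'mt';
```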
34. How to Use Index?
Problem:
› Index is sparse and contains rows
› Columns are stored as compressed blocks
How to match the index with the columns?
35. Solution: Marks
› Mark – offset in the compressed file and offset within the uncompressed block
› Stored in column_name.mrk files
› One for each index row
36. Solution: Marks
[Diagram: entries in OrderID.mrk — pairs of (offset in compressed OrderID.bin, offset within the uncompressed block), e.g. (65795, 0) and (65795, 32768) — point into the compressed blocks of OrderID.bin at file offsets 0, 65795, 131614, 197433; one mark per 8192-row granule]
37. Put it all together
Algorithm:
› Determine required index rows
› Find corresponding marks
› Distribute granules (stripes of marks) among threads
› Read required granules
Properties:
› Granules are read concurrently
› Threads can steal tasks
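The outcome of the first two steps — which granules the primary index selects — can be observed directly with EXPLAIN in recent ClickHouse versions (this arrived after the talk, so treat it as a later addition):

```sql
-- Reports the primary-key analysis, including how many granules
-- were selected out of the total for this range condition.
EXPLAIN indexes = 1
SELECT any(EventDate), max(GoalNum)
FROM mt
WHERE OrderID BETWEEN 6123 AND 17345;
```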
38. Read Data from Disk
SELECT any(EventDate), max(GoalNum)
FROM mt
WHERE OrderID BETWEEN 6123 AND 17345
[Diagram: the KeyCondition on OrderID selects rows of primary.idx (OrderID, BannerID); the corresponding marks in the EventDate, OrderID and GoalNum .mrk files (row widths 2b, 4b, 1b) point into the .bin files, and the selected granule ranges (0–10000, 8192–16384, 18192–26384, ..., 1998848–2008848) are read concurrently by two threads]
51. Problem: Number of Files in Parts
Test example: Almost OK
$ ls /var/lib/clickhouse/data/default/mt/201810_1_4_1 | wc -l
14
Production: Too many
$ ssh -A root@mtgiga075-2.metrika.yandex.net
# ls /var/lib/clickhouse/data/merge/visits_v2/202002_4462_4462_0 | wc -l
1556
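One way to keep an eye on this in production is to count active parts per partition from system.parts; MergeTree settings such as parts_to_delay_insert and parts_to_throw_insert react to exactly this number:

```sql
-- Too many active parts per partition slows reads and merges, and
-- eventually makes INSERTs fail with a 'Too many parts' error.
SELECT table, partition, count() AS active_parts
FROM system.parts
WHERE active AND database = 'default'
GROUP BY table, partition
ORDER BY active_parts DESC;
```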
52. Solution: Merges
[Diagram: two parts laid out along the primary-key axis — part [M,N] covering insert numbers M..N and part [N+1] from the next insert]
53. Solution: Merges
[Diagram: a background merge combines parts [M,N] and [N+1] into a single part covering insert numbers M..N+1, preserving primary-key order]
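Merges are scheduled automatically, but they can also be requested by hand, e.g. to collapse the example table into as few parts as possible (expensive, so not for routine use):

```sql
-- FINAL asks ClickHouse to merge all parts of each partition down
-- to a single part, instead of waiting for the background scheduler.
OPTIMIZE TABLE mt FINAL;
```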
55. Properties of Merge
› Each part participates in a single successful merge
› Source parts become inactive
› Additional logic can run during the merge
56. Things to do while merging
Replace/update records
› ReplacingMergeTree – replace
› SummingMergeTree – sum
› CollapsingMergeTree – fold
› VersionedCollapsingMergeTree – fold rows + versioning
Pre-aggregate data
› AggregatingMergeTree – merge aggregate function states
Metrics rollup
› GraphiteMergeTree – rollup in graphite fashion
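As a sketch of the "replace" variant: a ReplacingMergeTree table collapses rows with an equal sorting key during background merges (the table below is illustrative, not from the talk):

```sql
-- Rows with the same (OrderID, BannerID) are reduced to one row
-- when parts are merged; until then, duplicates remain visible
-- (SELECT ... FINAL forces collapsing at read time).
CREATE TABLE goals_last (
    OrderID  Int32,
    BannerID UInt64,
    GoalNum  Int8
) ENGINE = ReplacingMergeTree()
ORDER BY (OrderID, BannerID);
```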
60. Partitioning
ENGINE = MergeTree() PARTITION BY toYYYYMM(EventDate) ORDER BY ...
› Logical entities (not stored on disk as such)
› Table can be partitioned by any expression (default: by month)
› Parts from different partitions are never merged
› MinMax index by partition columns
› Easy manipulation of partitions:
ALTER TABLE mt DROP PARTITION 201810
ALTER TABLE mt DETACH/ATTACH PARTITION 201809
61. Mutations
ALTER TABLE mt DELETE WHERE OrderID < 1205
ALTER TABLE mt UPDATE GoalNum = 3 WHERE BannerID = 235433;
Features
› NOT designed for regular usage
› Rewrite all touched parts on disk
› Run in the background
› Original parts become inactive
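Because mutations run asynchronously, their progress is visible in the system.mutations table (column names follow current ClickHouse releases):

```sql
-- Unfinished mutations for the example table; parts_to_do counts
-- parts still waiting to be rewritten, and latest_fail_reason is
-- populated if a mutation is stuck.
SELECT mutation_id, command, parts_to_do, is_done, latest_fail_reason
FROM system.mutations
WHERE table = 'mt' AND NOT is_done;
```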
86. Things to remember
Control the total number of parts
› Rate of INSERTs
Merging runs in the background
› Even when there are no queries!
› With additional fold logic
Index is sparse
› Must fit into memory
› Determines order of data on disk
› Using the index is always beneficial
Partitions are logical entities
› Easy manipulation of portions of data
› Cannot improve SELECT performance