This document discusses measuring and optimizing data processing performance on CPUs and GPUs. It provides examples using Cupy, NumPy, Apache Ignite, and ROCm to perform operations and measure throughput on different hardware. Specific examples include sorting random data on CPU and GPU to compare performance, running Apache Ignite benchmarks on different CPUs, and using ROCm and PyOpenCL to leverage an AMD GPU for general-purpose computing. The document emphasizes that tightly coupling fast memory and computation is important for workloads with large data processing requirements.
Video games are written as a main loop: process player input, update the state of the game, render a new frame to the screen, repeat. They do this 60 times a second, with millisecond timing. Most monitoring tools are also written as loops: send a probe, wait for the response, update a data store, sleep. Often this is done pretty slowly, maybe once a second! In video games if you can’t update fast enough, you skip the rendering step and the frame rate drops. With monitoring tools if your loop takes to long you also stop logging data as often, and instead of choppy gameplay you get gaps in your graphs, often when you need that data the most!
Let’s use ping as an example and see how we can rewrite its main loop to function more like a video game, keeping a high frame rate.
In the Container world, there is a need to build observability around apps and backing services running in containers. The observability should allow to capture on demand low level metrics at a low overhead. The proposal is to use ebpf as the tracing technology to capture details at kernel & user level, and generate on demand flamegraphs, heat maps for applications & backing services. The Linux kernel has a built-in BPF JIT compiler, and an in-kernel verifier which is used to validate eBPF programs. This allows user defined instrumentation on a live kernel image that can never crash, hang or interfere with the kernel negatively. eBPF provides in-kernel implementation of storage maps such as histograms and hash-maps, which helps in efficient copy of summarized monitoring data from kernel to user space with low overhead.
These features make eBPF programs safe to run in production and allow admins and engineers to collect crucial data from systems for performance analysis and monitoring.
pg_proctab: Accessing System Stats in PostgreSQLMark Wong
pg_proctab is a collection of PostgreSQL stored functions that provide access to the operating system process table using SQL. We'll show you which functions are available and where they collect the data, and give examples of their use to collect processor and I/O statistics on SQL queries. These stored functions currently only work on Linux-based systems.
This slide will show you how to use SOFA to do performance analysis of CPU/GPU cooperative programs, especially programs running with deep software stacks like TensorFlow, PyTorch, etc.
source code at:
https://github.com/cyliustack/sofa
Obtaining the Perfect Smoke By Monitoring Your BBQ with InfluxDB and TelegrafInfluxData
Did you know you can use InfluxDB to monitor your BBQ and to ensure the tastiest results? Join this meetup to learn two different approaches to using a time series database to monitor a BBQ or a smoker. Learn how Will Cooke uses Python, MQTT, Telegraf and InfluxDB 2.0 to monitor his smoker and to gain insight into temperature changes, the stall, and other important stats about his brisket. Scott Anderson will demonstrate how he uses a FireBoard wireless thermometer, Telegraf and InfluxDB 2.0 to continuously work towards the perfect smoke.
pg_proctab: Accessing System Stats in PostgreSQLMark Wong
pg_proctab is a collection of PostgreSQL stored functions that provide access to the operating system process table using SQL. We'll show you which functions are available and where they collect the data, and give examples of their use to collect processor and I/O statistics on SQL queries.
PLNOG20 - Paweł Małachowski - Stress your DUT–wykorzystanie narzędzi open sou...PROIDEA
Wybór docelowej platformy sieciowej (np. routera, firewalla, scrubbera DDoS) jest często poprzedzony jej testami. Jednym z celów testów jest sprawdzenie, czy parametry wydajnościowe deklarowane przez producenta odpowiadają rzeczywistości. Zespół rozwijający redGuardian Anty DDoS testuje rozwiązanie regresyjnie i wydajnościowo w sposób zautomatyzowany od początku jego istnienia. W czasie prezentacji przeanalizujemy aspekty, na które warto zwrócić uwagę w czasie testów wydajnościowych urządzeń IP oraz przyjrzymy się narzędziom open source pomocnym w realizacji tego zadania.
"If we were able to take a microscope and observe how our programs work on the lowest level, we would be surprised and shocked. Close to the wire, programs behave very differently from what we expect.
In this session we will go through code examples that show the counter-intuitive behavior of Java on the microscopic scale. We will take a detailed look at how the underlying technology works that causes the surprising behavior and how we can measure our programs on the lowest level. Topics covered will be the cache hierarchy, false sharing, pipelining, branch prediction, and out-of-order execution.
After this talk you will have a good understanding of how modern CPUs work and how this can affect the performance of your programs."
Development of hardware-based Elements for GStreamer 1.0: A case study (GStre...Igalia
By Víctor Jáquez.
GStreamer 1.0 brings new features for memory handling, particularly the management of buffers. Now it is possible to implement elements that make use of memory areas that are not accessed directly by CPU, such as the video memory, continuous memory areas and so on.
The purpose of this talk is to show how we can use these new interfaces for developing GStreamer elements, that are tightly integrated with the hardware. In particular, we will show how we implemented a simple video sink for the pandaboard, using the Direct Rendering Management (DRM) interface.
This element, called kmssink, uses many of the new concepts for memory handling, added in GStreamer v1.0, such as allocators, buffer pools, and so on. We will review these concepts and how they were used in the element.
This presentation will give a quick introduction on how to use slurm, the scheduler that runs programs (scripts) in HPC. Targeted for audience who are new to the Lawrencium or who may want to learn a few more things in troubleshooting their jobs.
Video games are written as a main loop: process player input, update the state of the game, render a new frame to the screen, repeat. They do this 60 times a second, with millisecond timing. Most monitoring tools are also written as loops: send a probe, wait for the response, update a data store, sleep. Often this is done pretty slowly, maybe once a second! In video games if you can’t update fast enough, you skip the rendering step and the frame rate drops. With monitoring tools if your loop takes to long you also stop logging data as often, and instead of choppy gameplay you get gaps in your graphs, often when you need that data the most!
Let’s use ping as an example and see how we can rewrite its main loop to function more like a video game, keeping a high frame rate.
In the Container world, there is a need to build observability around apps and backing services running in containers. The observability should allow to capture on demand low level metrics at a low overhead. The proposal is to use ebpf as the tracing technology to capture details at kernel & user level, and generate on demand flamegraphs, heat maps for applications & backing services. The Linux kernel has a built-in BPF JIT compiler, and an in-kernel verifier which is used to validate eBPF programs. This allows user defined instrumentation on a live kernel image that can never crash, hang or interfere with the kernel negatively. eBPF provides in-kernel implementation of storage maps such as histograms and hash-maps, which helps in efficient copy of summarized monitoring data from kernel to user space with low overhead.
These features make eBPF programs safe to run in production and allow admins and engineers to collect crucial data from systems for performance analysis and monitoring.
pg_proctab: Accessing System Stats in PostgreSQLMark Wong
pg_proctab is a collection of PostgreSQL stored functions that provide access to the operating system process table using SQL. We'll show you which functions are available and where they collect the data, and give examples of their use to collect processor and I/O statistics on SQL queries. These stored functions currently only work on Linux-based systems.
This slide will show you how to use SOFA to do performance analysis of CPU/GPU cooperative programs, especially programs running with deep software stacks like TensorFlow, PyTorch, etc.
source code at:
https://github.com/cyliustack/sofa
Obtaining the Perfect Smoke By Monitoring Your BBQ with InfluxDB and TelegrafInfluxData
Did you know you can use InfluxDB to monitor your BBQ and to ensure the tastiest results? Join this meetup to learn two different approaches to using a time series database to monitor a BBQ or a smoker. Learn how Will Cooke uses Python, MQTT, Telegraf and InfluxDB 2.0 to monitor his smoker and to gain insight into temperature changes, the stall, and other important stats about his brisket. Scott Anderson will demonstrate how he uses a FireBoard wireless thermometer, Telegraf and InfluxDB 2.0 to continuously work towards the perfect smoke.
pg_proctab: Accessing System Stats in PostgreSQLMark Wong
pg_proctab is a collection of PostgreSQL stored functions that provide access to the operating system process table using SQL. We'll show you which functions are available and where they collect the data, and give examples of their use to collect processor and I/O statistics on SQL queries.
PLNOG20 - Paweł Małachowski - Stress your DUT–wykorzystanie narzędzi open sou...PROIDEA
Wybór docelowej platformy sieciowej (np. routera, firewalla, scrubbera DDoS) jest często poprzedzony jej testami. Jednym z celów testów jest sprawdzenie, czy parametry wydajnościowe deklarowane przez producenta odpowiadają rzeczywistości. Zespół rozwijający redGuardian Anty DDoS testuje rozwiązanie regresyjnie i wydajnościowo w sposób zautomatyzowany od początku jego istnienia. W czasie prezentacji przeanalizujemy aspekty, na które warto zwrócić uwagę w czasie testów wydajnościowych urządzeń IP oraz przyjrzymy się narzędziom open source pomocnym w realizacji tego zadania.
"If we were able to take a microscope and observe how our programs work on the lowest level, we would be surprised and shocked. Close to the wire, programs behave very differently from what we expect.
In this session we will go through code examples that show the counter-intuitive behavior of Java on the microscopic scale. We will take a detailed look at how the underlying technology works that causes the surprising behavior and how we can measure our programs on the lowest level. Topics covered will be the cache hierarchy, false sharing, pipelining, branch prediction, and out-of-order execution.
After this talk you will have a good understanding of how modern CPUs work and how this can affect the performance of your programs."
Development of hardware-based Elements for GStreamer 1.0: A case study (GStre...Igalia
By Víctor Jáquez.
GStreamer 1.0 brings new features for memory handling, particularly the management of buffers. Now it is possible to implement elements that make use of memory areas that are not accessed directly by CPU, such as the video memory, continuous memory areas and so on.
The purpose of this talk is to show how we can use these new interfaces for developing GStreamer elements, that are tightly integrated with the hardware. In particular, we will show how we implemented a simple video sink for the pandaboard, using the Direct Rendering Management (DRM) interface.
This element, called kmssink, uses many of the new concepts for memory handling, added in GStreamer v1.0, such as allocators, buffer pools, and so on. We will review these concepts and how they were used in the element.
This presentation will give a quick introduction on how to use slurm, the scheduler that runs programs (scripts) in HPC. Targeted for audience who are new to the Lawrencium or who may want to learn a few more things in troubleshooting their jobs.
Mark Wong
pg_proctab is a collection of PostgreSQL stored functions that provide access to the operating system process table using SQL. We'll show you which functions are available and where they collect the data, and give examples of their use to collect processor and I/O statistics on SQL queries.
Using open source tools for network device dataplane testing.
Our experiences from redGuardian DDoS mitigation scrubber testing.
Presented at PLNOG 20 (2018).
NS2 - the network simulator which is proved useful in studying the dynamic nature of communication networks. Simulation of wired as well as wireless network functions and protocols( e.g. routing algorithms, TCP, UDP ) can be done using NS2
pg_proctab: Accessing System Stats in PostgreSQLMark Wong
pc_proctab is a collection of PostgreSQL stored functions that allow you to access the operating system process table using SQL. See examples on how to use these stored functions to collect processor and I/O statistics on SQL statements run against the database.
The search for faster computing remains of great importance to the software community. Relatively inexpensive modern hardware, such as GPUs, allows users to run highly parallel code on thousands, or even millions of cores on distributed systems.
Building efficient GPU software is not a trivial task, often requiring a significant amount of engineering hours to attain the best performance. Similarly, distributed computing systems are inherently complex. In recent years, several libraries were developed to solve such problems. However, they often target a single aspect of computing, such as GPU computing with libraries like CuPy, or distributed computing with Dask.
Libraries like Dask and CuPy tend to provide great performance while abstracting away the complexity from non-experts, being great candidates for developers writing software for various different applications. Unfortunately, they are often difficult to be combined, at least efficiently.
With the recent introduction of NumPy community standards and protocols, it has become much easier to integrate any libraries that share the already well-known NumPy API. Such changes allow libraries like Dask, known for its easy-to-use parallelization and distributed computing capabilities, to defer some of that work to other libraries such as CuPy, providing users the benefits from both distributed and GPU computing with little to no change in their existing software built using the NumPy API.
In a world where microservices are more and more a standard architecture for Java based applications running in the cloud, the JVM warmup time can become a limitation. Especially when you look at spinning up new instances of an app as response to changes in load, the warmup time can be a problem. Native images are one solution to solve these problems because their statically ahead of time compiled code simply doesn’t have to warmup and so has short startup time. But even with the shorter startup time and smaller footprint it doesn’t come without a drawback. The overall performance might be slower because of the missing JIT optimizations at runtime. There is a new OpenJDK project called CRaC (Coordinated Restore at Checkpoint) which goal it is to address the JVM warmup problem with a different approach. The idea is to take a snapshot of the running JVM, store it in files and restore the JVM at a later point in time (or even on another machine).
This session will give you a short overview of the CRaC project and shows some results from a proof of concept implementation.
In this deck from the UK HPC Conference, Gunter Roeth from NVIDIA presents: Hardware & Software Platforms for HPC, AI and ML.
"Data is driving the transformation of industries around the world and a new generation of AI applications are effectively becoming programs that write software, powered by data, vs by computer programmers. Today, NVIDIA’s tensor core GPU sits at the core of most AI, ML and HPC applications, and NVIDIA software surrounds every level of such a modern application, from CUDA and libraries like cuDNN and NCCL embedded in every deep learning framework and optimized and delivered via the NVIDIA GPU Cloud to reference architectures designed to streamline the deployment of large scale infrastructures."
Watch the video: https://wp.me/p3RLHQ-l2Y
Learn more: http://nvidia.com
and
http://hpcadvisorycouncil.com/events/2019/uk-conference/agenda.php
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party will share these foundational concepts to build on:
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing discussed on the talk. ICT and testing must carry their part of global responsibility to help with the climat warming. We can minimize the carbon footprint but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be added with sustainability, and then measured continuously. Test environments can be used less, and in smaller scale and on demand. Test techniques can be used in optimizing or minimizing number of tests. Test automation can be used to speed up testing.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Welcome to the first live UiPath Community Day Dubai! Join us for this unique occasion to meet our local and global UiPath Community and leaders. You will get a full view of the MEA region's automation landscape and the AI Powered automation technology capabilities of UiPath. Also, hosted by our local partners Marc Ellis, you will enjoy a half-day packed with industry insights and automation peers networking.
📕 Curious on our agenda? Wait no more!
10:00 Welcome note - UiPath Community in Dubai
Lovely Sinha, UiPath Community Chapter Leader, UiPath MVPx3, Hyper-automation Consultant, First Abu Dhabi Bank
10:20 A UiPath cross-region MEA overview
Ashraf El Zarka, VP and Managing Director MEA, UiPath
10:35: Customer Success Journey
Deepthi Deepak, Head of Intelligent Automation CoE, First Abu Dhabi Bank
11:15 The UiPath approach to GenAI with our three principles: improve accuracy, supercharge productivity, and automate more
Boris Krumrey, Global VP, Automation Innovation, UiPath
12:15 To discover how Marc Ellis leverages tech-driven solutions in recruitment and managed services.
Brendan Lingam, Director of Sales and Business Development, Marc Ellis
Enhancing Performance with Globus and the Science DMZGlobus
ESnet has led the way in helping national facilities—and many other institutions in the research community—configure Science DMZs and troubleshoot network issues to maximize data transfer performance. In this talk we will present a summary of approaches and tips for getting the most out of your network infrastructure using Globus Connect Server.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionAggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™UiPathCommunity
In questo evento online gratuito, organizzato dalla Community Italiana di UiPath, potrai esplorare le nuove funzionalità di Autopilot, il tool che integra l'Intelligenza Artificiale nei processi di sviluppo e utilizzo delle Automazioni.
📕 Vedremo insieme alcuni esempi dell'utilizzo di Autopilot in diversi tool della Suite UiPath:
Autopilot per Studio Web
Autopilot per Studio
Autopilot per Apps
Clipboard AI
GenAI applicata alla Document Understanding
👨🏫👨💻 Speakers:
Stefano Negro, UiPath MVPx3, RPA Tech Lead @ BSP Consultant
Flavio Martinelli, UiPath MVP 2023, Technical Account Manager @UiPath
Andrei Tasca, RPA Solutions Team Lead @NTT Data
21. ALL FLASH DATACENTER & IN-MEMORY COMPUTING: HOT TOPICS
21
SOURCE: SAKURA Internet Research Center. (2017/10), Project Sprig.
22. ClickHouse column-oriented database Install memo
22
# uname -sr; cat /etc/issue
Linux 4.10.0-35-generic
Ubuntu 17.04
# apt install software-properties-common
# apt-key adv --keyserver keyserver.ubuntu.com --recv E0C56BD4
# apt-add-repository "deb http://repo.yandex.ru/clickhouse/trusty stable main"
# apt-get update
# apt-get install clickhouse-server-common clickhouse-client -y
# service clickhouse-server start
# clickhouse-client --multiline
ClickHouse client version 1.1.54304.
Connecting to localhost:9000.
Connected to ClickHouse server version 1.1.54304.
:) CREATE TABLE ontime
(
Year UInt16,
Quarter UInt8,
Month UInt8,
:
Div5TailNum String
)
ENGINE = MergeTree(FlightDate, (Year, FlightDate), 8192);
or
# xz -v -c -d < ontime.csv.xz | clickhouse-client --query="INSERT INTO ontime FORMAT CSV"
23. MariaDB ColumnStore column-oriented database Install memo
23
# uname -sr; cat /etc/redhat-release
Linux 3.10.0-514.el7.x86_64
Red Hat Enterprise Linux Server release 7.4 (Maipo)
# mkdir mcs; cd mcs;
# wget https://downloads.mariadb.com/ColumnStore/1.0.11/centos/x86_64/7/mariadb-columnstore-1.0.11-1-centos7.x86_64.rpm.tar.gz
# tar xzvf ./mariadb-columnstore-1.0.11-1-centos7.x86_64.rpm.tar.gz
# yum install boost boost-devel boost-doc expect perl-DBD-MySQL -y
# rpm -i mariadb-columnstore-1.0.11-1-x86_64-centos7-common.rpm -vh
# rpm -i mariadb-columnstore-1.0.11-1-x86_64-centos7-common.rpm -vh
# rpm -i mariadb-columnstore-1.0.11-1-x86_64-centos7-client.rpm -vh
# rpm -i mariadb-columnstore-1.0.11-1-x86_64-centos7-server.rpm -vh
# rpm -i mariadb-columnstore-1.0.11-1-x86_64-centos7-libs.rpm -vh
# rpm -i mariadb-columnstore-1.0.11-1-x86_64-centos7-shared.rpm -vh
# rpm -i mariadb-columnstore-1.0.11-1-x86_64-centos7-gssapi-client.rpm -vh
# rpm -i mariadb-columnstore-1.0.11-1-x86_64-centos7-gssapi-server.rpm -vh
# rpm -i mariadb-columnstore-1.0.11-1-x86_64-centos7-platform.rpm -vh
# rpm -i mariadb-columnstore-1.0.11-1-x86_64-centos7-storage-engine.rpm -vh
# /usr/local/mariadb/columnstore/bin/postConfigure
Select the type of System Server install [1=single, 2=multi] (2) > 1
Enter System Name (columnstore-1) > sprig-1
Select the type of Data Storage [1=internal, 2=external, 3=GlusterFS] (1) > 1
Enter the list (Nx,Ny,Nz) or range (Nx-Nz) of DBRoot IDs assigned to module 'pm1' (1) > 1
# . /usr/local/mariadb/columnstore/bin/columnstoreAlias
# mcsadmin
MariaDB ColumnStore Admin Console
enter 'help' for list of commands
enter 'exit' to exit the MariaDB ColumnStore Command Console
use up/down arrows to recall commands
mcsadmin>