Exploiting parallelism opportunities in non-parallel architectures to improve... - GreenLSI Team, LSI, UPM
This document discusses improving software implementations of non-linear feedback shift registers (NLFSRs) through parallelism. It presents two approaches: one based on lookup tables (LUTs) and one based on algebraic normal forms (ANFs). The goal is to automatically generate different implementations to introduce variability and improve resistance against side-channel attacks. Experimental results on a KeeLoq implementation for the MSP430 show that applying optimizations to the ANF-based approach improved performance by 2.45x in cycles compared to a baseline one-bit-at-a-time implementation, though code size grew by 2.27x.
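To make the LUT-vs-ANF distinction concrete, here is a minimal sketch of a toy NLFSR (not the paper's generator and not KeeLoq itself; the register width, taps, and feedback function are invented for illustration). The same nonlinear feedback function is evaluated two ways: directly from its algebraic normal form, and via a table precomputed over all tap combinations.

```python
# Toy 8-bit NLFSR: evaluate the nonlinear feedback either from its
# algebraic normal form (ANF) or from a precomputed lookup table (LUT).

def feedback_anf(x0, x1, x2):
    # ANF: f = x0 XOR x2 XOR (x0 AND x1)  -- an arbitrary nonlinear example
    return x0 ^ x2 ^ (x0 & x1)

# LUT approach: precompute f once for all 8 tap-bit combinations.
LUT = [feedback_anf(i & 1, (i >> 1) & 1, (i >> 2) & 1) for i in range(8)]

def step(state, taps=(0, 3, 7), width=8, use_lut=True):
    """Shift the register one bit, inserting the nonlinear feedback bit."""
    bits = [(state >> t) & 1 for t in taps]
    if use_lut:
        fb = LUT[bits[0] | (bits[1] << 1) | (bits[2] << 2)]
    else:
        fb = feedback_anf(*bits)
    out = state & 1                               # output bit is the LSB
    state = (state >> 1) | (fb << (width - 1))    # shift in the feedback
    return state, out

def keystream(seed, n, use_lut=True):
    state, out = seed, []
    for _ in range(n):
        state, bit = step(state, use_lut=use_lut)
        out.append(bit)
    return out
```

Both evaluation strategies produce the same keystream; the trade-off the document explores is which form maps better onto word-level parallelism on a given microcontroller.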
Librato's Joseph Ruscio at Heroku's Waza 2013: Instrumenting 12-Factor Apps - Heroku
Librato's CTO Joseph Ruscio took to the Waza 2013 stage to present "Instrumenting Twelve-Factor Apps". For more from Ruscio ping him at @josephruscio. For more on Waza visit http://waza.heroku.com/2013.
For Waza videos stay tuned at http://blog.heroku.com or visit http://vimeo.com/herokuwaza
Embedded Recipes 2017 - Reliable monitoring with systemd - Jérémy Rosen - Anne Nicolas
Embedded systems are autonomous. This simple fact is a driving force in the design of embedded systems, which cannot afford the luxury of an operator pressing a reset button or even a remote sysadmin checking what happened. Monitoring an application in an embedded system is a complex problem: the monitor must deal with the various ways an application can fail, detect them, and restart the application if need be.
Systemd provides a comprehensive toolbox for the embedded developer to diagnose, monitor and restart the main application of an embedded system, especially if that application is black-box software. This talk will review the tools provided by systemd for process monitoring and discuss how to easily deploy them in an embedded system.
Jérémy Rosen – Smile-Embedded and connected systems
The RTX kernel is a royalty-free real-time operating system designed for ARM and Cortex-M devices. It allows programs to perform multiple functions simultaneously using parallel tasks. The RTX kernel provides functions to create and manage concurrent tasks, prioritize tasks, and initialize the kernel. Some basic RTX kernel functions include os_sys_init to initialize the kernel, os_tsk_create to create tasks, and os_tsk_delete to terminate tasks.
State Management in Apache Flink : Consistent Stateful Distributed Stream Pro... - Paris Carbone
An overview of state management techniques employed in Apache Flink, including pipelined consistent snapshots and intuitive usages for reconfiguration, presented at VLDB 2017.
BKK16-203 Irq prediction or how to better estimate idle time - Linaro
Review of the design. The current approach to predicting idle time duration is based on statistics over previous idle durations. The presentation will show the weaknesses of this approach and how, by tracking IRQ behavior, we can predict the next event and better estimate the idle duration.
The document discusses run-time environments and activation records. It explains that activation records are used to manage information for each procedure call and are allocated on the stack. Activation records contain fields for return values, parameters, local variables, and more. When a procedure is called, its activation record is pushed onto the stack and popped off when it returns. Activation records allow recursive calls by creating a new record each time a procedure is activated.
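The push/pop behavior described above can be sketched with an explicit stack of activation records. This is a hedged illustration, not any particular compiler's layout: each call pushes a record holding its parameter, locals, and a return-value slot, and pops it on return, so recursion naturally gets one fresh record per activation.

```python
# Model a runtime stack of activation records for a recursive call.
stack = []  # the simulated runtime stack

def call_factorial(n):
    # Push a fresh activation record for this call.
    frame = {"param_n": n, "locals": {}, "return_value": None}
    stack.append(frame)
    if n <= 1:
        frame["return_value"] = 1
    else:
        # The recursive call pushes (and later pops) its own record.
        frame["return_value"] = n * call_factorial(n - 1)
    # Pop this call's record on return; by now all nested records are gone.
    return stack.pop()["return_value"]
```

After `call_factorial(5)` returns 120, the simulated stack is empty again, mirroring how activation records unwind as calls return.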
TimTrack is software for tracking charged particles. It is written in C for speed and flexibility. Previous versions used the LAPACK or Intel IPP libraries for linear algebra operations. The newest version, TimTrack v2.0, uses LAPACK and is 23.6 seconds faster than earlier versions when tracking 1 million particles. Future plans include parallelizing with OpenMP and MPI, and implementing on GPUs using CUDA.
Higher-order finite-volume methods for solving conservation laws can achieve high arithmetic intensity (AI) and improved performance. Theoretical analysis showed that 6th and 8th order methods reach the target AI for modern machines with infinite cache. Measurements of AI using hardware counters on an IBM Blue Gene/Q supercomputer matched the theoretical predictions when using multi-dimensional cache blocking. However, 3D blocking requires too much cache space due to wide halos from higher-order stencils. Iterating rectangular blocks in columns reduces cache usage and allows 6th and 8th order methods to achieve high AI with realistic cache sizes.
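The AI argument can be made concrete with a back-of-the-envelope model (illustrative assumptions, not the paper's cost model): a 1D stencil of width 2k+1 over double-precision data does one multiply and one add per tap, and cache blocking determines whether each input value is loaded once or once per tap.

```python
# Arithmetic intensity AI = flops / bytes for a (2k+1)-point stencil
# on 8-byte doubles, under two caching assumptions.

def stencil_ai(k, reuse=True):
    flops_per_point = 2 * (2 * k + 1)  # one multiply + one add per tap
    if reuse:
        # Perfect blocking: each input loaded once, plus one store.
        bytes_per_point = 8 + 8
    else:
        # No reuse: every tap is a fresh 8-byte load, plus the store.
        bytes_per_point = 8 * (2 * k + 1) + 8
    return flops_per_point / bytes_per_point
```

Under reuse, AI grows with the stencil order (more flops over the same traffic), which is why higher-order methods can approach a machine's target AI, while losing reuse pins AI near the flop-to-tap ratio regardless of order.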
Apache Flink: Better, Faster & Uncut - Piotr Nowojski, data Artisans - Evention
This talk will start with a brief introduction to stream processing and Flink itself. Next, we will take a look at some of the most interesting recent improvements in Flink, such as incremental checkpointing,
the end-to-end exactly-once processing guarantee, and network latency optimizations. We'll discuss real problems that Flink's users were facing and how they were addressed by the community and data Artisans.
Aggregate Sharing for User-Defined Data Stream Windows - Paris Carbone
Aggregation queries on data streams are evaluated over evolving and often overlapping logical views called windows. While the aggregation of periodic windows was studied extensively in the past through aggregate sharing techniques such as Panes and Pairs, little to no work has gone into optimizing the aggregation of very common, non-periodic windows. Typical examples of non-periodic windows are punctuations and sessions, which can implement complex business logic and are often expressed as user-defined operators on platforms such as Google Dataflow or Apache Storm. The aggregation of such non-periodic or user-defined windows either falls back to expensive, best-effort aggregate sharing methods, or is not optimized at all.
In this paper we present a technique to perform efficient aggregate sharing for data stream windows, which are declared as user-defined functions (UDFs) and can contain arbitrary business logic. To this end, we first introduce the concept of User-Defined Windows (UDWs), a simple, UDF-based programming abstraction that allows users to programmatically define custom windows. We then define semantics for UDWs, based on which we design Cutty, a low-cost aggregate sharing technique. Cutty improves on and outperforms the state of the art for aggregate sharing on single and multiple queries. Moreover, it enables aggregate sharing for a broad class of non-periodic UDWs. We implemented our techniques on Apache Flink, an open source stream processing system, and performed experiments demonstrating orders of magnitude of reduction in aggregation costs compared to the state of the art.
This document provides an overview of using ClickHouse and Grafana for DNS analytics. Some key points:
- ClickHouse is a column-oriented database that is fast, scalable, and easy to use for analytics on large datasets like DNS logs.
- Grafana is used to visualize the DNS data by connecting it as a data source to ClickHouse.
- Examples show querying ClickHouse to analyze DNS data and identify top clients by ASN, response types, and flag combinations. Visualizations like histograms are also demonstrated.
- The installation process outlines adding the ClickHouse and Grafana repositories, installing the packages, and configuring the ClickHouse data source plugin for Grafana.
This document discusses using cReComp to develop ROS-compliant FPGA components. cReComp is a tool that takes specifications written in scrp and generates FPGA IP cores and C++ driver code. An example is presented where cReComp is used to generate a FIR filter component from a scrp specification. The component communicates with ROS using topics and processes data in real-time on the FPGA to provide latency of less than 1ms. Details are provided on the component architecture generated by cReComp and how it integrates FPGA hardware acceleration with the ROS framework.
Impatience is a Virtue: Revisiting Disorder in High-Performance Log Analytics - Badrish Chandramouli
There is a growing interest in processing real-time queries over out-of-order streams in this big data era. This paper presents a comprehensive solution to meet this requirement. Our solution is based on Impatience sort, an online sorting technique that is based on an old technique called Patience sort. Impatience sort is tailored for incrementally sorting streaming datasets that present themselves as almost sorted, usually due to network delays and machine failures. With several optimizations, our solution can adapt to both input streams and query logic. Further, we develop a new Impatience framework that leverages Impatience sort to reduce the latency and memory usage of query execution, and supports a range of user latency requirements, without compromising on query completeness and throughput, while leveraging existing efficient in-order streaming engines and operators. We evaluate our proposed solution in Trill, a high-performance streaming engine, and demonstrate that our techniques significantly improve sorting performance and reduce memory usage – in some cases, by over an order of magnitude.
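Since Impatience sort is presented as building on classic Patience sort, a minimal Patience sort sketch helps fix the intuition (this is the textbook algorithm, not the paper's optimized variant): deal each element onto an existing sorted run if its top allows it, else start a new run, then k-way merge the runs. Almost-sorted input produces very few runs, which is exactly the property that makes the approach attractive for streams disordered only by network delays.

```python
import heapq

def patience_sort(seq):
    """Sort by dealing elements into ascending runs, then merging them."""
    piles = []  # each pile is an ascending run
    for x in seq:
        # Append to the first run whose last element is <= x...
        for pile in piles:
            if pile[-1] <= x:
                pile.append(x)
                break
        else:
            # ...or start a new run. Nearly-sorted input makes few runs.
            piles.append([x])
    # k-way merge of the sorted runs.
    return list(heapq.merge(*piles))
```

For fully sorted input this creates a single run; each inversion in the input can at worst start one extra run, so the merge stays cheap when disorder is mild.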
Flink Forward Berlin 2017: Andreas Kunft - Efficiently executing R Dataframes... - Flink Forward
This document discusses providing an R dataframe abstraction for efficient distributed computation on Apache Flink. The goals are to provide a natural API for R and achieve performance comparable to Flink's native dataflow. The approach represents R dataframes as Flink data sets and compiles R functions into the native execution plan where possible. For user-defined R functions, they are evaluated within worker tasks using a just-in-time compiler. This allows executing R code within the same Java virtual machine as Flink for good performance, even on a single node. Results show it can achieve native Flink performance even for functions containing R code.
Registers can store multiple bits and are used for temporary storage in a processor. Flip-flops can only store one bit, so registers are needed for tasks like storing 32-bit integers. Registers are faster and more convenient than main memory. Having more registers can help speed up complex calculations. The document then discusses different types of shift registers and how a basic 4-bit register is implemented using D flip-flops.
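The D-flip-flop construction mentioned above can be simulated in a few lines. This is a hedged behavioral sketch (class and method names are invented for illustration): a D flip-flop holds one bit and updates it only on a clock edge, and chaining four of them Q-to-D with a shared clock yields a 4-bit serial-in shift register.

```python
class DFlipFlop:
    """One bit of storage: Q takes the value of D on the clock edge."""
    def __init__(self):
        self.q = 0
    def clock(self, d):
        self.q = d & 1
        return self.q

class ShiftRegister4:
    """Four D flip-flops chained Q -> D, clocked together."""
    def __init__(self):
        self.ffs = [DFlipFlop() for _ in range(4)]
    def clock(self, serial_in):
        # Sample all Q outputs first, then clock every flip-flop, so the
        # whole register updates simultaneously on one shared edge.
        prev = [ff.q for ff in self.ffs]
        self.ffs[0].clock(serial_in)
        for i in range(1, 4):
            self.ffs[i].clock(prev[i - 1])
        return [ff.q for ff in self.ffs]
```

Sampling before clocking models why real shift registers need edge-triggered storage: every stage must capture its neighbor's old output, not the freshly shifted one.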
Reintroducing the Stream Processor: A universal tool for continuous data anal... - Paris Carbone
The talk motivates the use of data stream processing technology for different aspects of continuous data computation, beyond "real-time" analysis, to incorporate historical data computation, reliable application logic and interactive analysis.
Experiments were conducted to evaluate the impact of virtualization on an RTOS by measuring its overheads and latencies when run natively and in a virtual machine. A real-time Linux system was used as the host OS with KVM/Qemu virtualization and Litmus^RT as the guest RTOS. Performance degradation in the virtualized RTOS was found due to the emulation of I/O interrupts by the virtual machine monitor and scheduling of virtual machine processes by the host OS.
We will explain the purpose of the PMWG farm and the current goals we have (e.g. collect power measurements, share reference platforms, monitor power trends of the kernel). We will also address the limitations of our farm and invite everyone to discuss which results should be displayed for further analysis.
This document describes the gate-level synthesis of a FIFO design using Synopsys Design Compiler. It discusses the FIFO description, introduces Design Compiler and the libraries used. It then outlines the steps to set up Design Compiler and synthesize the design, including specifying libraries, reading the HDL file, setting constraints, and compiling. Timing and reference reports are generated and the synthesized netlist is written out.
Combining Phase Identification and Statistic Modeling for Automated Parallel ... - Mingliang Liu
Parallel application benchmarks are indispensable for evaluating/optimizing HPC software and hardware. However, it is very challenging and costly to obtain high-fidelity benchmarks reflecting the scale and complexity of state-of-the-art parallel applications. Hand-extracted synthetic benchmarks are time- and labor-intensive to create. Real applications themselves, while offering the most accurate performance evaluation, are expensive to compile, port, reconfigure, and often plainly inaccessible due to security or ownership concerns. This work contributes APPrime, a novel tool for trace-based automatic parallel benchmark generation. Taking as input standard communication-I/O traces of an application's execution, it couples accurate automatic phase identification with statistical regeneration of event parameters to create compact, portable, and to some degree reconfigurable parallel application benchmarks. Experiments with four NAS Parallel Benchmarks (NPB) and three real scientific simulation codes confirm the fidelity of APPrime benchmarks. They retain the original applications' performance characteristics, in particular their relative performance across platforms. Also, the resulting benchmarks, already released online, are much more compact and easier to port than the original applications.
http://dl.acm.org/citation.cfm?id=2745876
An Introduction to Distributed Data Streaming - Paris Carbone
A lecture on distributed data streaming, introducing all basic abstractions such as windowing, synopses (state), partitioning and parallelism, and applying them to an example pipeline for detecting fires. It also offers a brief introduction and motivation on reliability guarantees and the need for repeatable sources and application-level fault tolerance and consistency.
Presentation given by Tim Walsh at Archivematica Camp Baltimore 2018 about his and the Canadian Centre for Architecture's experience with the Archivematica Automation Tools.
Streamlining pipeline execution for large scale RNA-Seq analysis - Deepak Purushotham
This document describes a pipeline for streamlining RNA-seq execution on high-performance clusters. The pipeline involves importing read files for each sample, performing rRNA filtering, Bowtie alignment, chromosome filtering, transcriptome mapping, mapping quality control, HTSeq counting, and read quality control. It achieves a 75% reduction in read file size through gzip compression and removes temporary files to greatly reduce I/O load for efficient cluster processing.
Talk @ APT Group, University of Manchester, 06 August 2014
Abstract:
Today's HPC systems, such as those in the Top500, are equipped with a range of different processors, from multi-core CPUs to GPUs. Programming them can be a tough job, especially if we want to squeeze every last FLOP of performance out of them.
As a PhD student, I am currently on a brief research visit in the APT group, working on topics related to the programmability and efficient use of GPUs and many-core coprocessors. In particular, I am implementing a large database operation using OpenCL on these state-of-the-art systems. In this talk I will summarize my work in Manchester and discuss future work on this topic.
The document discusses the basics of RISC instruction set architectures and pipelining in CPUs. It begins by describing properties of RISC ISAs, including that operations apply to full registers, only load/store instructions affect memory, and instructions are typically one size. It then describes different types of RISC instructions like ALU, load/store, and branches. The document goes on to explain the implementation of a RISC pipeline in 5 stages and the concept of pipelining to improve CPU performance by overlapping instruction execution. It also discusses potential hazards that can degrade pipeline performance like structural, data, and control hazards.
The document discusses RISC instruction set basics and pipelining concepts. It begins by describing properties of RISC architectures, including that operations apply to full registers and only load/store instructions affect memory. It then describes different types of RISC instructions like ALU, load/store, and branches. The document goes on to explain the implementation of instructions in a MIPS64 pipeline with 5 stages: instruction fetch, decode/register fetch, execute, memory access, and write-back. It concludes by defining pipelining and describing how it can increase throughput by overlapping instruction execution.
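The overlap-and-stall arithmetic these decks describe can be made concrete with a toy cycle counter. The model below is a deliberate simplification (one instruction per stage, a fixed two-cycle penalty for each un-forwarded RAW hazard between adjacent instructions), not MIPS-accurate timing:

```python
def pipeline_cycles(instructions, stages=5, stall_per_hazard=2):
    """Estimate cycles for a simple in-order 5-stage pipeline.

    `instructions` is a list of (dest, [sources]) tuples. Without
    forwarding, an instruction that reads a register written by the
    immediately preceding instruction stalls `stall_per_hazard` cycles.
    """
    # An ideal pipeline retires one instruction per cycle after the
    # initial fill of (stages - 1) cycles.
    cycles = stages - 1 + len(instructions)
    for (prev_dest, _), (_, curr_srcs) in zip(instructions, instructions[1:]):
        if prev_dest is not None and prev_dest in curr_srcs:
            cycles += stall_per_hazard
    return cycles

# Three dependent adds: r1 = r2+r3; r4 = r1+r5; r6 = r4+r7
prog = [("r1", ["r2", "r3"]), ("r4", ["r1", "r5"]), ("r6", ["r4", "r7"])]
```

With forwarding, `stall_per_hazard` would drop to 0 or 1 depending on the producing instruction, which is exactly the payoff the slides attribute to forwarding units.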
CSW2017 – Richard Johnson: Harnessing Intel Processor Trace on Windows for Vulner... – CanSecWest
This document discusses using Intel Processor Trace (Intel PT) for hardware-based tracing on Windows. It provides an overview of Intel PT capabilities and how it can be used for fuzzing and vulnerability discovery. Specifically, it describes the development of WinAFL IntelPT, which integrates Intel PT tracing with the WinAFL evolutionary fuzzer to enable high-performance, hardware-driven fuzzing on Windows.
Speculative aspects of high-speed processor design – ssuser7dcef0
• Highest total system speed – ex. TOP500 speed, application speed of supercomputers
• Highest processor chip performance – ex. SPEC CPU rate, NAS parallel benchmarks
• Highest single-core performance – ex. SPEC CPU int, SPEC CPU fp, Dhrystone
This document discusses instruction pipelining in computer processors. It begins by defining pipelining and explaining how it works like an assembly line to increase throughput. It then discusses different types of pipelines and introduces the MIPS instruction pipeline as an example. The document goes on to explain different types of pipeline hazards like structural hazards, control hazards, and data hazards. It provides examples of how to detect and resolve these hazards through techniques like forwarding, stalling, predicting, and delayed branching. Key concepts covered include pipeline registers, control signals, forwarding units, and branch prediction buffers.
1. The document discusses research activities related to reducing energy consumption by at least 30% through the development of core source technologies for universal operating systems.
2. It describes four papers being presented, including ones on system and device latency modeling, power management frameworks for embedded systems, and automatic selection of power policies for operating systems.
3. It also summarizes four research topics from the National University, including performance evaluation of parallel applications using a power-aware paging method on next-generation memory architectures.
MemGuard: Memory Bandwidth Reservation System for Efficient Performance Isola... – Heechul Yun
This document describes MemGuard, an operating system mechanism for providing efficient per-core memory performance isolation on commercial off-the-shelf hardware. MemGuard uses memory bandwidth reservation to guarantee each core's minimum memory bandwidth. It then performs predictive bandwidth donation and on-demand reclaiming to redistribute excess bandwidth, improving overall utilization. Evaluation shows MemGuard isolates performance and eliminates over 50% slowdown of a foreground real-time task due to interference, while maximizing throughput via bandwidth sharing.
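MemGuard's reserve/donate/reclaim cycle can be illustrated with a toy accounting model. This sketch only mimics the budget bookkeeping described above; the real system works inside the kernel with per-core performance counters and periodic interrupts, none of which appear here:

```python
class BandwidthRegulator:
    """Toy per-core memory-bandwidth regulator in the spirit of MemGuard.

    Each core gets a reserved budget per regulation period; budget a
    core predicts it will not use is donated to a shared pool that
    other cores can reclaim on demand. (A simplified illustration,
    not MemGuard's kernel implementation.)
    """

    def __init__(self, reservations):
        self.reservations = dict(reservations)  # core -> budget per period
        self.new_period()

    def new_period(self):
        self.budget = dict(self.reservations)
        self.shared_pool = 0

    def donate(self, core, amount):
        # Predictive donation: give up bandwidth the core won't use.
        amount = min(amount, self.budget[core])
        self.budget[core] -= amount
        self.shared_pool += amount

    def consume(self, core, amount):
        """Account `amount` of traffic; False means the core is throttled."""
        if self.budget[core] >= amount:
            self.budget[core] -= amount
            return True
        # On-demand reclaiming from the shared pool before throttling.
        needed = amount - self.budget[core]
        if self.shared_pool >= needed:
            self.shared_pool -= needed
            self.budget[core] = 0
            return True
        return False  # caller would stall the core until the next period
```

The `consume` path shows why the scheme improves utilization: a core that exhausts its reservation is only throttled once the donated surplus is gone too.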
Operating Systems 1 (10/12) – Scheduling – Peter Tröger
The document discusses processes and scheduling in operating systems. It covers key concepts like processes, threads, scheduling criteria, and different scheduling algorithms like round robin and priority-based scheduling. It also discusses scheduling in multiprocessor systems and provides examples of scheduling in Windows.
In this video from the 2015 Stanford HPC Conference, Pavel Shamis from ORNL presents: Preparing OpenSHMEM for Exascale.
"OpenSHMEM is a partitioned global address space (PGAS) one-sided communications library that enables remote memory access (RMA) across processing elements (PEs). Its API allows data to be transferred from one PE memory space to another PE's symmetric memory space, decoupling the data transfers from synchronizations. OpenSHMEM is useful for applications that are latency driven or that have irregular communication patterns, because its one-sided API can be mapped very efficiently to hardware (e.g. RDMA interconnects), and its one-sided programming model helps the overlapping of communication with computation. Summit is Oak Ridge National Laboratory's next high performance supercomputer system that will be based on a many core/GPU hybrid architecture. In order to prepare OpenSHMEM for future systems, it is important to enhance its programming model to enable efficient utilization of the new hardware capabilities (e.g. massively multithreaded systems, access to different types of memory, next generation of interconnects, etc.). This session will present recent advances in the area of OpenSHMEM extensions, implementations, and tools."
Watch the video: http://insidehpc.com/2015/02/video-preparing-openshmem-for-exascale/
See more talks in the Stanford HPC Conference Video Gallery: http://wp.me/P3RLHQ-dOO
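The one-sided model described in the abstract can be mimicked with a tiny in-process simulation: every PE owns a symmetric region with the same layout, and any PE may put/get into another PE's region without the target participating. The class and method names only loosely echo the SHMEM API; this is a sketch of the semantics, not the real library:

```python
# Toy model of OpenSHMEM-style one-sided RMA over symmetric heaps.
class PE:
    def __init__(self, pe_id, heap_size):
        self.id = pe_id
        self.heap = [0] * heap_size  # symmetric heap: same layout on every PE

class Shmem:
    def __init__(self, n_pes, heap_size=16):
        self.pes = [PE(i, heap_size) for i in range(n_pes)]

    def put(self, target_pe, offset, values, source_pe):
        # One-sided write: the target PE does not synchronize or copy.
        # (source_pe is kept only to mirror the initiator-side call shape.)
        heap = self.pes[target_pe].heap
        heap[offset:offset + len(values)] = values

    def get(self, target_pe, offset, length, source_pe):
        # One-sided read from the target's symmetric heap.
        return self.pes[target_pe].heap[offset:offset + length]

world = Shmem(n_pes=4)
world.put(target_pe=2, offset=0, values=[7, 8, 9], source_pe=0)  # PE 0 writes into PE 2
```

Because the transfer never involves the target's control flow, synchronization (barriers, fences) is a separate concern, which is exactly the decoupling the abstract highlights.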
AN INTRODUCTION TO OPERATING SYSTEMS: CONCEPTS AND PRACTICE – PHI Learning Pvt. Ltd.
The book, now in its Fifth Edition, aims to provide a practical view of GNU/Linux and Windows 7, 8 and 10, covering different design considerations and patterns of use. The section on concepts covers fundamental principles, such as file systems, process management, memory management, input-output, resource sharing, interprocess communication (IPC), distributed computing, OS security, real-time and microkernel design. This thoroughly revised edition comes with a description of an instructional OS to support teaching of OS and also covers Android, currently the most popular OS for handheld systems. Basically, this text enables students to learn by practicing with the examples and doing exercises.
This document provides an overview of CPU scheduling concepts including multiprogramming, multitasking, process creation, the short-term scheduler, process control blocks, the long-term scheduler, and the medium-term scheduler. It also discusses classifications of processes as interactive, batch, or real-time processes and as I/O-bound or CPU-bound processes. Finally, it introduces common CPU scheduling algorithms like first-come first-served (FCFS), shortest job first (SJF), and round-robin (RR).
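The round-robin algorithm mentioned above is easy to pin down with a short simulation. For simplicity, all processes are assumed to arrive at time 0:

```python
from collections import deque

def round_robin(bursts, quantum):
    """Simulate round-robin scheduling; return completion time per process.

    `bursts` maps process name -> total CPU burst. Each process runs for
    at most `quantum` time units before being moved to the back of the
    ready queue.
    """
    remaining = dict(bursts)
    ready = deque(bursts)          # dict iteration preserves arrival order
    clock = 0
    finish = {}
    while ready:
        p = ready.popleft()
        run = min(quantum, remaining[p])
        clock += run
        remaining[p] -= run
        if remaining[p] == 0:
            finish[p] = clock
        else:
            ready.append(p)        # preempted: back of the queue
    return finish
```

Swapping the `deque` for a priority queue keyed on remaining burst would turn this into preemptive SJF, which makes the relationship between the algorithms easy to see.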
RTOS Material – adugnanegero
This document provides an overview of real-time operating systems and kernel concepts across 34 slides. The key topics covered include real-time kernels, tasks and processes, scheduling algorithms like priority-based and cyclic executives, intertask communication methods like mailboxes and semaphores, and synchronization techniques.
EKernel Thesis: an object-oriented micro-kernel – Murphy Chen
The document describes the design and implementation of EKernel, an object-oriented microkernel. It aims to address issues of portability, maintainability, extensibility, and efficiency in operating system design. The key aspects of EKernel's design include using processes and threads as core abstractions, implementing inter-process communication via messaging, and providing a modular architecture with well-defined interfaces. Performance tests show EKernel achieves lower overhead than other microkernels for operations like context switches and IPC. Future work plans to enhance EKernel's scheduler and implement a networking subsystem.
Provenance for Data Munging EnvironmentsPaul Groth
Data munging is a crucial task across domains ranging from drug discovery and policy studies to data science. Indeed, it has been reported that data munging accounts for 60% of the time spent in data analysis. Because data munging involves a wide variety of tasks using data from multiple sources, it often becomes difficult to understand how a cleaned dataset was actually produced (i.e. its provenance). In this talk, I discuss our recent work on tracking data provenance within desktop systems, which addresses problems of efficient and fine-grained capture. I also describe our work on scalable provenance tracking within a triple store/graph database that supports messy web data. Finally, I briefly touch on whether we will move from ad hoc data munging approaches to more declarative knowledge representation languages such as Probabilistic Soft Logic.
Presented at Information Sciences Institute - August 13, 2015
The document discusses CERN's use of Oracle's In-Memory Column Store to perform real-time analysis of physics experiment data from the Large Hadron Collider. Benchmark tests showed significant performance improvements over traditional row-based storage, with analytic queries running 10-100x faster. The columnar format also improved data compression rates. Additionally, OLTP workloads saw no negative impacts. CERN plans to consider the technology for future projects given its ability to enable real-time analysis that was previously not possible.
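The row-versus-column trade-off behind these results can be shown with a toy layout. The schema and data below are invented; the point is that an analytic aggregate only has to touch one contiguous array in the columnar form, while the row layout forces the scan to walk whole tuples:

```python
# Toy illustration of row-store vs column-store scans.
rows = [(i, f"evt{i}", i % 7, i * 0.5) for i in range(1000)]

def sum_row_store(rows):
    # The scan must visit every tuple to reach attribute 3.
    return sum(r[3] for r in rows)

# Column store: each attribute is its own contiguous array.
columns = {
    "id":     [r[0] for r in rows],
    "name":   [r[1] for r in rows],
    "bucket": [r[2] for r in rows],
    "value":  [r[3] for r in rows],
}

def sum_column_store(columns):
    # Only the "value" array is touched; other columns stay cold.
    return sum(columns["value"])
```

The same layout also explains the compression gains reported: low-cardinality columns like `bucket` above are runs of a few repeated values, which dictionary or run-length encoding handles far better than interleaved row bytes.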
This document discusses the slides for Unit 2 of the Operating Systems course. It includes an index of lecture topics that will be covered, such as process concepts and threads, scheduling criteria and algorithms, thread scheduling, case studies of UNIX/Linux and Windows operating systems, and revision. Key concepts that will be covered include processes and threads, process state diagrams, process control blocks, CPU scheduling queues, producer-consumer problem solutions, scheduling criteria and algorithms like FCFS, SJF, priority and round robin, and thread scheduling models.
This document discusses operating systems and their core abstractions like uninterrupted computation, infinite memory, and simple I/O. It describes how operating systems provide these abstractions using mechanisms like context switching, virtual memory, and system calls. It also covers different types of operating systems and characteristics of embedded operating systems like real-time capabilities.
Using the big guns: Advanced OS performance tools for troubleshooting databas... – Nikolay Savvinov
Using OS performance tools and basic alternatives to troubleshoot production database issues
The document discusses using Linux performance tools like pidstat, ps, and tracing tools like perf, systemtap, and dtrace to troubleshoot complex database problems that may involve issues at the operating system, hardware, or network level. It provides examples of using these tools to diagnose specific issues like memory fragmentation, I/O problems, and network congestion and presents a methodology around reproducing issues, analyzing tool output, identifying root causes, and developing solutions.
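A common pattern with these tools is to capture their output and post-process it to surface the culprit process. The sketch below parses a hand-made approximation of `pidstat -u` output; the column layout is an assumption, so check it against the sysstat version you actually run:

```python
# Pick the busiest processes out of a captured pidstat report.
SAMPLE = """\
12:00:01      UID       PID    %usr %system  %CPU   Command
12:00:01     1000      4242   55.00   12.00  67.00  oracle
12:00:01     1000      4243    3.00    1.00   4.00  sshd
12:00:01     1000      4244   20.00   30.00  50.00  rsync
"""

def busiest(report, top=2):
    """Return (command, %CPU) pairs sorted by CPU usage, highest first."""
    procs = []
    for line in report.splitlines()[1:]:   # skip the header row
        fields = line.split()
        # Command is the last field, %CPU the one before it (assumed layout).
        procs.append((fields[-1], float(fields[-2])))
    return sorted(procs, key=lambda p: p[1], reverse=True)[:top]
```

In a live session you would feed this the output of `pidstat -u 1 5` captured via `subprocess`, then drill into the top hits with perf or systemtap as the talk suggests.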
Similar to It Takes Two: Instrumenting the Interaction between In-Memory Databases and Solid-State Drives CIDR 2020 presentation (20)
Beyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link Prediction – eXascale Infolab
1) The document presents HINGE, a new method for embedding hyper-relational knowledge graphs that aims to better capture information from facts containing multiple relations and entities.
2) HINGE uses a CNN to learn representations from base triplets and their associated key-value pairs to characterize the plausibility of facts.
3) An evaluation on link prediction tasks shows HINGE outperforms baselines and demonstrates that the triplet structure encodes essential information, while other representations discard important information.
Representation Learning on Graphs with Complex Structures
Invited talk, Deep Learning for Graphs and Structured Data Embedding Workshop
WWW2019, San Francisco, May 13, 2019
A force directed approach for offline GPS trajectory map – eXascale Infolab
SIGSPATIAL 2018 paper
A Force-Directed Approach for Offline GPS Trajectory Map Matching
Efstratios Rappos (University of Applied Sciences of Western Switzerland (HES-SO)),
Stephan Robert (University of Applied Sciences of Western Switzerland (HES-SO)),
Philippe Cudré-Mauroux (University of Fribourg)
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit... – eXascale Infolab
This document proposes HistoSketch, a method for sketching streaming histograms that preserves similarity and adapts to concept drift. It works by:
1) Generating weighted samples from histograms such that the probability two sketches match equals histogram similarity.
2) Incrementally updating sketches using a weight decay factor to forget older data and adapt to drift over time.
3) Evaluating HistoSketch on classification tasks involving synthetic and real-world streaming data, finding it approximates histogram similarity well using small, fixed-size sketches while adapting rapidly to drift.
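The similarity-preserving idea in step 1 can be illustrated with a toy stand-in for consistent weighted sampling: each sketch slot holds the winner of a weight-biased "race" among histogram elements, so similar histograms agree on many slots. This is not HistoSketch's exact construction, and the weight-decay update of step 2 is omitted:

```python
import math
import random

def histo_sketch(histogram, size=64, seed=0):
    """Fixed-size, similarity-preserving sketch of a histogram (toy version).

    For each slot, every element draws an exponential variate with rate
    equal to its weight; the slot records the minimizer. Deterministic
    string seeds make the sketch reproducible across histograms.
    """
    sketch = []
    for slot in range(size):
        best_key, best_val = None, math.inf
        for element, weight in histogram.items():
            if weight <= 0:
                continue
            rng = random.Random(f"{seed}:{slot}:{element}")
            draw = -math.log(rng.random()) / weight
            if draw < best_val:
                best_key, best_val = element, draw
        sketch.append(best_key)
    return sketch

def sketch_similarity(a, b):
    """Fraction of agreeing slots: an estimate of histogram similarity."""
    return sum(x == y for x, y in zip(a, b)) / len(a)
```

The key property, as in the paper, is that the sketch has small fixed size regardless of how many distinct elements the streaming histogram accumulates.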
This document presents SwissLink, a high-precision context-free entity linking system. It extracts unambiguous surface forms (labels) from knowledge bases like DBpedia and Wikipedia to link entity mentions without context. It catalogs the surface forms, removes ambiguous ones using ratio and percentile methods, and performs fast string matching to link mentions. Evaluation on 30 Wikipedia articles shows the percentile-ratio method achieves over 95% precision and 45% recall, balancing precision and recall.
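The ratio-based pruning step can be sketched as follows. The data structure and thresholds are invented for illustration; SwissLink's actual catalog construction and percentile method are more involved:

```python
def unambiguous_surface_forms(counts, ratio=0.9, min_count=2):
    """Filter a surface-form catalog down to unambiguous labels.

    `counts` maps surface form -> {entity: link count}. A form is kept
    (mapped to its dominant entity) only when the dominant entity
    accounts for at least `ratio` of the form's links, so context-free
    matching stays high-precision.
    """
    catalog = {}
    for form, entities in counts.items():
        total = sum(entities.values())
        entity, top = max(entities.items(), key=lambda e: e[1])
        if total >= min_count and top / total >= ratio:
            catalog[form] = entity
    return catalog
```

Linking then reduces to fast string matching of mentions against the surviving catalog, which is what lets the system skip context modeling entirely.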
The document proposes a novel crowdsourcing system architecture and scheduling algorithm to address job starvation in multi-tenant crowd-powered systems. The architecture introduces HIT-Bundles to group heterogeneous tasks and control task serving. The Worker Conscious Fair Scheduling algorithm balances fairness and priority while minimizing worker context switching between tasks. Experiments on Amazon Mechanical Turk show the approach increases throughput over baseline schedulers and adapts to varying workforce levels and job priorities.
This document presents SANAPHOR, an ontology-based coreference resolution system that improves upon existing approaches by leveraging semantic information. It first links entities in document clusters to semantic types and ontologies. It then splits or merges clusters based on these semantic relationships. The system was evaluated on the CoNLL-2012 dataset, where it improved coreference resolution performance over the baseline Stanford system, particularly for noun clusters. By utilizing semantic knowledge, SANAPHOR demonstrates the benefits of enhancing syntactic coreference resolution with an additional semantic layer.
Efficient, Scalable, and Provenance-Aware Management of Linked Data – eXascale Infolab
The proliferation of heterogeneous Linked Data on the Web requires data management systems to constantly improve their scalability and efficiency. Despite recent advances in distributed Linked Data management, efficiently processing large amounts of Linked Data in a scalable way is still very challenging. In spite of their seemingly simple data models, Linked Data actually encode rich and complex graphs mixing both instance and schema level data. At the same time, users are increasingly interested in investigating or visualizing large collections of online data by performing complex analytic queries. The heterogeneity of Linked Data on the Web also poses new challenges to database systems. The capacity to store, track, and query provenance data is becoming a pivotal feature of Linked Data Management Systems. In this thesis, we tackle issues revolving around processing queries on big, unstructured, and heterogeneous Linked Data graphs.
1) Entity-centric data management stores information at the entity level and integrates information by interlinking entities. This provides advantages over keyword-based and relational database approaches.
2) The XI Pipeline extracts mentions from text and performs named entity recognition, entity linking, and entity typing to associate entities with text.
3) Approaches like ZenCrowd and TRank leverage both algorithms and human computation through crowdsourcing to improve entity linking and fine-grained entity typing.
This document summarizes a presentation given at SSSW 2015 on making sense of semantic data. It discusses challenges in understanding semantic web data, including a "language gap" between semantic web languages like SPARQL and natural language. It presents an approach to bridging this gap through automatically verbalizing SPARQL queries in English. Evaluation results show this helps non-experts understand queries better and faster than the SPARQL format. It also discusses the "semantic gap" caused by mismatches between a question's semantics and a knowledge graph, and presents an approach using templates to generate SPARQL queries from natural language questions.
LDOW2015 - Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked Data – eXascale Infolab
Uduvudu exploits the semantic and structured nature of Linked Data to generate the best possible representation for a human, based on a catalog of available Matchers and Templates. Matchers and Templates are designed so that they can be built through an intuitive editor interface.
Executing Provenance-Enabled Queries over Web Data – eXascale Infolab
The proliferation of heterogeneous Linked Data on the Web poses new challenges to database systems. In particular, because of this heterogeneity, the capacity to store, track, and query provenance data is becoming a pivotal feature of modern triple stores. In this paper, we tackle the problem of efficiently executing provenance-enabled queries over RDF data. We propose, implement and empirically evaluate five different query execution strategies for RDF queries that incorporate knowledge of provenance. The evaluation is conducted on Web Data obtained from two different Web crawls (The Billion Triple Challenge, and the Web Data Commons). Our evaluation shows that using an adaptive query materialization execution strategy performs best in our context. Interestingly, we find that because provenance is prevalent within Web Data and is highly selective, it can be used to improve query processing performance. This is a counterintuitive result as provenance is often associated with additional overhead.
Micro-task crowdsourcing is rapidly gaining popularity among research communities and businesses as a means to leverage Human Computation in their daily operations. Unlike any other service, a crowdsourcing platform is in fact a marketplace subject to human factors that affect its performance, both in terms of speed and quality. Indeed, such factors shape the dynamics of the crowdsourcing market. For example, a known behavior of such markets is that increasing the reward of a set of tasks would lead to faster results. However, it is still unclear how different dimensions interact with each other: reward, task type, market competition, requester reputation, etc.
In this paper, we adopt a data-driven approach to (A) perform a long-term analysis of a popular micro-task crowdsourcing platform and understand the evolution of its main actors (workers, requesters, and platform). (B) We leverage the main findings of our five year log analysis to propose features used in a predictive model aiming at determining the expected performance of any batch at a specific point in time. We show that the number of tasks left in a batch and how recent the batch is are two key features of the prediction. (C) Finally, we conduct an analysis of the demand (new tasks posted by the requesters) and supply (number of tasks completed by the workforce) and show how they affect task prices on the marketplace.
Fixing the Domain and Range of Properties in Linked Data by Context Disambigu... – eXascale Infolab
This document proposes three methods - LEXT, REXT, and LERIXT - for disambiguating the domain and range of properties in linked data by using context information. LEXT uses the type of subject resources, REXT uses the type of object resources, and LERIXT uses both. The methods were evaluated against expert judgments and achieved up to 96.5% precision for LEXT and 91.4% for REXT. LERIXT generated too many new sub-properties.
CIKM14: Fixing grammatical errors by preposition ranking – eXascale Infolab
The detection and correction of grammatical errors still represent very hard problems for modern error-correction systems. As an example, the top-performing systems at the preposition correction challenge CoNLL-2013 only achieved an F1 score of 17%.
In this paper, we propose and extensively evaluate a series of approaches for correcting prepositions, analyzing a large body of high-quality textual content to capture language usage. Leveraging n-gram statistics, association measures, and machine learning techniques, our system is able to learn which words or phrases govern the usage of a specific preposition. Our approach makes heavy use of n-gram statistics generated from very large textual corpora. In particular, one of our key features is the use of n-gram association measures (e.g., Pointwise Mutual Information) between words and prepositions to generate better aggregated preposition rankings for the individual n-grams.
We evaluate the effectiveness of our approach using cross-validation with different feature combinations and on two test collections created from a set of English language exams and StackExchange forums. We also compare against state-of-the-art supervised methods. Experimental results from the CoNLL-2013 test collection show that our approach to preposition correction achieves ~30% in F1 score which results in 13% absolute improvement over the best performing approach at that challenge.
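The association measure named above, Pointwise Mutual Information, is straightforward to compute from corpus counts. The counts in the example are invented, and a production system would smooth them and combine PMI with the other features the paper describes:

```python
import math

def pmi(pair_count, word_count, prep_count, total):
    """PMI(w, p) = log( P(w, p) / (P(w) * P(p)) ), estimated from counts."""
    p_wp = pair_count / total
    p_w = word_count / total
    p_p = prep_count / total
    return math.log(p_wp / (p_w * p_p))

def rank_prepositions(word, stats, total):
    """Rank candidate prepositions for `word` by PMI, best first.

    `stats` maps preposition -> (pair_count, word_count, prep_count).
    """
    scored = [(p, pmi(*c, total)) for p, c in stats.items()]
    return sorted(scored, key=lambda s: s[1], reverse=True)

# Invented counts for the word "depend" in a 100k-token corpus.
stats = {"on": (50, 100, 1000), "in": (5, 100, 2000)}
```

Positive PMI means the word and preposition co-occur more than chance predicts, which is exactly the signal used to aggregate preposition rankings over the individual n-grams.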
OLTPBenchmark is a multi-threaded load generator. The framework is designed to be able to produce variable rate, variable mixture load against any JDBC-enabled relational database. The framework also provides data collection features, e.g., per-transaction-type latency and throughput logs.
Together with the framework we provide the following OLTP/Web benchmarks:
TPC-C
Wikipedia
Synthetic Resource Stresser
Twitter
Epinions.com
TATP
AuctionMark
SEATS
YCSB
JPAB (Hibernate)
CH-benCHmark
Voter (Japanese "American Idol")
SIBench (Snapshot Isolation)
SmallBank
LinkBench
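The framework's variable-rate, variable-mixture design can be mimicked in miniature. The sketch below drives plain Python callables instead of JDBC transactions and makes no claim about OLTPBenchmark's actual internals; the rate-spreading heuristic in particular is an invented simplification:

```python
import random
import threading
import time

def run_load(txn_mix, rate_per_sec, duration_sec, workers=4):
    """Tiny OLTPBench-style load generator (illustrative sketch only).

    `txn_mix` is a list of (weight, callable) pairs; calls are issued at
    roughly `rate_per_sec` across `workers` threads, and per-transaction
    latencies are collected for later aggregation.
    """
    interval = 1.0 / rate_per_sec
    weights = [w for w, _ in txn_mix]
    latencies = []
    lock = threading.Lock()
    stop_at = time.monotonic() + duration_sec

    def worker():
        rng = random.Random()
        while time.monotonic() < stop_at:
            _, txn = rng.choices(txn_mix, weights=weights)[0]  # pick by mixture
            start = time.monotonic()
            txn()
            with lock:
                latencies.append(time.monotonic() - start)
            time.sleep(interval * workers)  # spread the target rate over workers

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return latencies
```

Replacing the callables with JDBC-style transaction functions and the latency list with per-transaction-type histograms gets you to the data-collection features the framework advertises.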
Internet Infrastructures for Big Data (Verisign's Distinguished Speaker Series) – eXascale Infolab
Internet Infrastructures for Big Data
Talk given at Verisign's Distinguished Speaker Series, 2014
Prof. Philippe Cudre-Mauroux
eXascale Infolab
http://exascale.info/
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake – Walaa Eldin Moustafa
Dynamic policy enforcement is becoming an increasingly important topic in today's world, where data privacy and compliance are a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) They are auto-generated from declarative data annotations. (2) They respect user-level consent and preferences. (3) They are context-aware, encoding a different set of transformations for different use cases. (4) They are portable; while the SQL logic is implemented in only one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W... – Social Samosa
The Modern Marketing Reckoner (MMR) is a comprehensive resource packed with POVs from 60+ industry leaders on how AI is transforming the 4 key pillars of marketing – product, place, price and promotions.
Codeless Generative AI Pipelines
(GenAI with Milvus)
https://ml.dssconf.pl/user.html#!/lecture/DSSML24-041a/rate
Discover the potential of real-time streaming in the context of GenAI as we delve into the intricacies of Apache NiFi and its capabilities. Learn how this tool can significantly simplify the data engineering workflow for GenAI applications, allowing you to focus on the creative aspects rather than the technical complexities. I will guide you through practical examples and use cases, showing the impact of automation on prompt building. From data ingestion to transformation and delivery, witness how Apache NiFi streamlines the entire pipeline, ensuring a smooth and hassle-free experience.
Timothy Spann
https://www.youtube.com/@FLaNK-Stack
https://medium.com/@tspann
https://www.datainmotion.dev/
milvus, unstructured data, vector database, zilliz, cloud, vectors, python, deep learning, generative ai, genai, nifi, kafka, flink, streaming, iot, edge
State of Artificial Intelligence Report 2023 – kuntobimo2016
Artificial intelligence (AI) is a multidisciplinary field of science and engineering whose goal is to create intelligent machines.
We believe that AI will be a force multiplier on technological progress in our increasingly digital, data-driven world. This is because everything around us today, ranging from culture to consumer products, is a product of intelligence.
The State of AI Report is now in its sixth year. Consider this report as a compilation of the most interesting things we’ve seen with a goal of triggering an informed conversation about the state of AI and its implication for the future.
We consider the following key dimensions in our report:
Research: Technology breakthroughs and their capabilities.
Industry: Areas of commercial application for AI and its business impact.
Politics: Regulation of AI, its economic implications and the evolving geopolitics of AI.
Safety: Identifying and mitigating catastrophic risks that highly-capable future AI systems could pose to us.
Predictions: What we believe will happen in the next 12 months and a 2022 performance review to keep us honest.
The Ipsos AI Monitor 2024 Report – Social Samosa
According to Ipsos AI Monitor's 2024 report, 65% Indians said that products and services using AI have profoundly changed their daily life in the past 3-5 years.
It Takes Two: Instrumenting the Interaction between In-Memory Databases and Solid-State Drives CIDR 2020 presentation
1. It Takes Two: Instrumenting the Interaction between In-Memory Databases and Solid-State Drives
Alberto Lerner (1), Jaewook Kwak (2), Sangjin Lee (2), Kibin Park (2), Yong Ho Song (2,3), Philippe Cudré-Mauroux (1)
(1) XI Lab – University of Fribourg, Switzerland
(2) ENC Lab – Hanyang University, Korea
(3) Samsung Electronics, Korea
CIDR – January 2020 – Amsterdam
2. Motivation
• Where is time going?
• CPU/cache utilization → HW performance counters
• Per-instruction cost → pprof, Linux perf tool
• Operating System impact → systemtap, several others
• SSD performance → ?
3. Challenges in In-Memory Database Durability
• The log needs to be written as fast as possible
• Checkpointing competes with client requests for memory and disk access
• Can we understand the interference? Was the TX Log IO pattern efficient to begin with?
(Figure: users issue transactions to the host, where the Txn Log and the Checkpoint workers compete for access to storage)
4. Cosmos+ OpenSSD
• Idea: let's instrument an actual device!
• SSD rapid prototyping platform
• SoC-based
• Fully functional
• Open-source firmware
• Next generation is in the final stages of development
9. Performance Event Records (PEV)
• Currently four types of records:
• IO_TIMESTAMP – regular timestamp stations
• GC_TIMESTAMP – FTL timestamp stations
• PERFORMANCE_INDEX – aggregated counter
• PERFORMANCE_INDEX_PER_CH – per-channel counters
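A host-side decoder for performance event records of this kind might look like the following. The slides do not give the wire format, so the 16-byte layout sketched here (a 4-byte type tag, a 4-byte station/channel id, an 8-byte value or timestamp) is purely an assumption for illustration:

```python
import struct

# Record type tags matching the four PEV kinds listed above
# (the numeric values are hypothetical).
IO_TIMESTAMP, GC_TIMESTAMP = 0, 1
PERFORMANCE_INDEX, PERFORMANCE_INDEX_PER_CH = 2, 3

# Assumed little-endian layout: type, source (station/channel), value.
RECORD = struct.Struct("<IIQ")

def decode_pev(buf):
    """Yield (type, source, value) tuples from a buffer of PEV records."""
    for off in range(0, len(buf), RECORD.size):
        yield RECORD.unpack_from(buf, off)

def encode_pev(records):
    """Pack an iterable of (type, source, value) tuples into bytes."""
    return b"".join(RECORD.pack(*r) for r in records)
```

Whatever the real firmware emits, the host side reduces to exactly this kind of fixed-stride unpacking followed by grouping on the type tag.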
13. Research Agenda I – Instrumentation
• Functionality limitations
• Currently limited to 4 channels
• Further annotations to trace back valid copies
• Contextual triggers
• Signal generation
• Process instrumentation records on-the-fly
• Identify scenarios where a scheduling policy change is beneficial
14. Research Agenda II – SSD as a Platform
• Adaptive scheduling
• Respond instantaneously to signals generated by changing priorities
• In-storage checkpoint "derivation"
• Move the checkpoint process partially or entirely into the device
15. Conclusion
• SSDs don't have to be black boxes
• The instrumented Cosmos+ allows designers of both databases and FTLs to analyze and understand interference in workloads
• Opportunities to:
• Have SSDs interact with applications in richer ways
• Exploit new possibilities of Near-Data Computing for databases