An O(1) algorithm for implementing the LFU cache eviction scheme

Prof. Ketan Shah, Anirban Mitra, Dhruv Matani

August 16, 2010
Abstract

Cache eviction algorithms are used widely in operating systems, databases and other systems that use caches to speed up execution by caching data that is used by the application. There are many policies such as MRU (Most Recently Used), MFU (Most Frequently Used), LRU (Least Recently Used) and LFU (Least Frequently Used), each of which has its advantages and drawbacks and is hence used in specific scenarios. By far the most widely used algorithm is LRU, both for its O(1) speed of operation and its close resemblance to the kind of behaviour expected by most applications. The LFU algorithm also has behaviour desirable for many real world workloads. However, in many places the LRU algorithm is preferred over the LFU algorithm because of its lower runtime complexity of O(1) versus O(log n). We present here an LFU cache eviction algorithm that has a runtime complexity of O(1) for all of its operations, which include insertion, access and deletion (eviction).
1 Introduction

The paper is organized as follows.

• A description of the use cases of LFU where it can prove to be superior to other cache eviction algorithms

• The dictionary operations that should be supported by an LFU cache implementation. These are the operations which determine the strategy's runtime complexity

• A description of how the currently best known LFU algorithm works, along with its runtime complexity

• A description of the proposed LFU algorithm, every operation of which has a runtime complexity of O(1)
2 Uses of LFU

Consider a caching network proxy application for the HTTP protocol. This proxy typically sits between the internet and the user or a set of users. It ensures that all the users are able to access the internet and enables sharing of all the shareable resources for optimum network utilization and improved responsiveness. Such a caching proxy should try to maximize the amount of data that it can cache in the limited amount of storage or memory that it has at its disposal [4, 8, 7].

Typically, lots of static resources such as images, CSS style sheets and JavaScript code can very easily be cached for a fairly long time before they are replaced by newer versions. These static resources, or "assets" as programmers call them, are included in pretty much every page, so it is most beneficial to cache them since pretty much every request is going to require them. Furthermore, since a network proxy is required to serve thousands of requests per second, the overhead needed to do so should be kept to a minimum.

To that effect, it should evict only those resources that are not used very frequently. Hence, the frequently used resources should be kept at the expense of the not so frequently used ones, since the former have proved themselves to be useful over a period of time. Of course, there is a counter argument to that which says that resources that may have been used extensively may not be required in the future, but we have observed this not to be the case in most situations. For example, static resources of heavily used pages are always requested by every user of that page. Hence, the LFU cache replacement strategy can be employed by these caching proxies to evict the least frequently used items in its cache when there is a dearth of memory.
LRU might also be an applicable strategy here, but it would fail when the request pattern is such that all requested items don't fit into the cache and the items are requested in a round robin fashion. What will happen in the case of LRU is that items will constantly enter and leave the cache, with no user request ever hitting the cache. Under the same conditions, however, the LFU algorithm will perform much better, with most of the cached items resulting in a cache hit.
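To make this failure mode concrete, the short Python sketch below simulates it: four items cycled round robin through a three-slot LRU cache never produce a single hit. The lru_hits helper and the specific capacity and request values are illustrative assumptions, not taken from the paper.

from collections import OrderedDict

def lru_hits(capacity, requests):
    # Simulate an LRU cache of the given capacity; count the hits.
    cache, hits = OrderedDict(), 0
    for key in requests:
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # key becomes most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used
            cache[key] = True
    return hits

# Every request evicts the item that will be requested next, so no
# request ever hits the cache.
print(lru_hits(3, [1, 2, 3, 4] * 10))      # prints 0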
Pathological behaviour of the LFU algorithm is not impossible, though. We are not trying to present a case for LFU here, but are instead trying to show that if LFU is an applicable strategy, then there is a better way to implement it than has been previously published.
3 Dictionary operations supported by an LFU cache

When we speak of a cache eviction algorithm, we need to concern ourselves primarily with 3 different operations on the cached data (sketched as a minimal interface after this list).

• Set (or insert) an item in the cache

• Retrieve (or lookup) an item in the cache, simultaneously incrementing its usage count (for LFU)

• Evict (or delete) the Least Frequently Used (or, as the strategy of the eviction algorithm dictates) item from the cache
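As a minimal sketch, these three operations can be written down as a Python interface; the class and method names here are our own rendering, not from the paper.

from abc import ABC, abstractmethod

class LFUCacheInterface(ABC):
    # The three dictionary operations an LFU cache must support.

    @abstractmethod
    def insert(self, key, value):
        """Set an item in the cache."""

    @abstractmethod
    def access(self, key):
        """Look up an item, incrementing its usage count; return its value."""

    @abstractmethod
    def evict(self):
        """Delete and return the least frequently used item."""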
4 The currently best known complexity of the LFU algorithm

As of this writing, the best known runtimes for each of the operations mentioned above for an LFU cache eviction strategy are as follows:

• Insert: O(log n)

• Lookup: O(log n)

• Delete: O(log n)

These complexity values follow directly from the complexity of the binomial heap implementation and a standard collision free hash table. An LFU caching strategy can be easily and efficiently implemented using a min heap data structure and a hash map. The min heap is created based on the usage count (of the item) and the hash table is indexed via the element's key. All operations on a collision free hash table are of order O(1), so the runtime of the LFU cache is governed by the runtime of operations on the min heap [5, 6, 9, 1, 2].

When an element is inserted into the cache, it enters with a usage count of 1, and since insertion into a min heap costs O(log n), inserts into the LFU cache take O(log n) time [3].
When an element is looked up, it is found via a hashing function which hashes the key to the actual element. Simultaneously, the usage count (the key in the min heap) is incremented by one, which results in the reorganization of the min heap, and the element moves away from the root. Since the element can move down up to log n levels at any stage, this operation too takes O(log n) time.

When an element is selected for eviction and then eventually removed from the heap, it can cause significant reorganization of the heap data structure. The element with the least usage count is present at the root of the min heap. Deleting the root of the min heap involves replacing the root node with the last leaf node in the heap, and bubbling this node down to its correct position. This operation too has a runtime complexity of O(log n).
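For reference, the Python sketch below renders this heap-plus-hash-table design. Since Python's heapq module does not expose an in-place increase-key operation, the sketch uses lazy invalidation (the old heap entry is marked stale and a fresh one is pushed), which keeps every operation O(log n) amortized; this substitution, and all names in the sketch, are ours rather than the paper's.

import heapq

class HeapLFU:
    # LFU via a min heap ordered by usage count, plus a hash table.
    # Every operation pays an O(log n) heap cost, as described above.

    def __init__(self):
        self.by_key = {}  # key -> its current (live) heap entry
        self.heap = []    # entries are lists: [count, seq, key, value, live]
        self.seq = 0      # insertion counter; keeps list comparisons total

    def _push(self, key, value, count):
        entry = [count, self.seq, key, value, True]
        self.seq += 1
        self.by_key[key] = entry
        heapq.heappush(self.heap, entry)             # O(log n)

    def insert(self, key, value):
        if key in self.by_key:
            raise KeyError("Key already exists")
        self._push(key, value, 1)                    # enters with count 1

    def access(self, key):
        old = self.by_key[key]                       # KeyError if absent
        old[4] = False                               # mark old entry stale
        self._push(key, old[3], old[0] + 1)          # reinsert with count + 1
        return old[3]

    def evict(self):
        while self.heap:
            count, seq, key, value, live = heapq.heappop(self.heap)  # O(log n)
            if live:                                 # skip stale entries
                del self.by_key[key]
                return key, value
        raise KeyError("The cache is empty")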
5 The proposed LFU algorithm

The proposed LFU algorithm has a runtime complexity of O(1) for each of the dictionary operations (insertion, lookup and deletion) that can be performed on an LFU cache. This is achieved by maintaining two linked lists: one over the access frequencies and, for each access frequency, one for all elements that have that access frequency.

A hash table is used to access elements by key (not shown in the diagram below for clarity). A doubly linked list is used to link together nodes which represent a set of nodes that have the same access frequency (shown as rectangular blocks in the diagram below). We refer to this doubly linked list as the frequency list. Each set of nodes that have the same access frequency is actually a doubly linked list of such nodes (shown as circular nodes in the diagram below). We refer to this doubly linked list (which is local to a particular frequency) as a node list. Each node in the node list has a pointer to its parent node in the frequency list (not shown in the diagram for clarity). Hence, nodes x and y will have a pointer back to node 1, nodes z and a will have a pointer back to node 2, and so on.
Figure 1: The LFU dict with 6 elements
The pseudocode below shows how to initialize an LFU cache. The hash table used to locate elements by key is indicated by the variable bykey. We use a SET in lieu of a linked list for storing elements with the same access frequency, for simplicity of implementation. The variable items is a standard SET data structure which holds the keys of such elements that have the same access frequency. Its insertion, lookup and deletion runtime complexity is O(1).

Figure 2: After element with key 'z' has been accessed once more
Creates a new frequency node with an access frequency value of 0 (zero)

NEW-FREQ-NODE()
    Object o
    o.value ← 0
    o.items ← SET()
    o.prev ← o
    o.next ← o
    return o
Creates a new LFU item, which is stored in the lookup table bykey

NEW-LFU-ITEM(data, parent)
    Object o
    o.data ← data
    o.parent ← parent
    return o
Creates a new LFU cache

NEW-LFU-CACHE()
    Object o
    o.bykey ← HASH-MAP()
    o.freq_head ← NEW-FREQ-NODE()
    return o

The LFU cache object is accessible via the lfu_cache variable

lfu_cache ← NEW-LFU-CACHE()
We also define some helper functions that aid linked list manipulation.

Creates a new node and sets its previous and next pointers to prev and next

GET-NEW-NODE(value, prev, next)
    nn ← NEW-FREQ-NODE()
    nn.value ← value
    nn.prev ← prev
    nn.next ← next
    prev.next ← nn
    next.prev ← nn
    return nn
Removes (unlinks) a node from the linked list

DELETE-NODE(node)
    node.prev.next ← node.next
    node.next.prev ← node.prev
Initially, the LFU cache starts off as an empty hash map and an empty frequency list. When the first element is added, a single entry in the hash map is created which points to this new element (by its key), and a new frequency node with a value of 1 is added to the frequency list. It should be clear that the number of elements in the hash map will be the same as the number of items in the LFU cache. A new node is added to frequency node 1's node list. This node points back to the frequency node whose member it is. For example, if x was the node added, then the node x will point back to the frequency node 1. Hence the runtime complexity of element insertion is O(1).
Accesses (fetches) an element from the LFU cache, simultaneously incrementing its usage count

ACCESS(key)
    tmp ← lfu_cache.bykey[key]
    if tmp equals null then
        throw Exception("No such key")
    freq ← tmp.parent
    next_freq ← freq.next

    if next_freq equals lfu_cache.freq_head or
       next_freq.value does not equal freq.value + 1 then
        next_freq ← GET-NEW-NODE(freq.value + 1, freq, next_freq)
    next_freq.items.add(key)
    tmp.parent ← next_freq

    freq.items.remove(key)
    if freq.items.length equals 0 then
        DELETE-NODE(freq)
    return tmp.data
When this element is accessed once again, the element's frequency node is looked up and its next sibling's value is queried. If the sibling does not exist or its value is not 1 more than this frequency node's value, then a new frequency node with a value of 1 more than the value of this frequency node is created and inserted into the correct place. The node is removed from its current set and inserted into the new frequency node's set. The node's frequency pointer is updated to point to its new frequency node. For example, if the node z (Figure 1) is accessed once more, then it is removed from the frequency list having the value of 2 and added to the frequency list having the value of 3 (Figure 2). Hence the runtime complexity of element access is O(1).
Inserts a new element into the LFU cache

INSERT(key, value)
    if key in lfu_cache.bykey then
        throw Exception("Key already exists")

    freq ← lfu_cache.freq_head.next
    if freq.value does not equal 1 then
        freq ← GET-NEW-NODE(1, lfu_cache.freq_head, freq)

    freq.items.add(key)
    lfu_cache.bykey[key] ← NEW-LFU-ITEM(value, freq)
When an element with the least access frequency needs to be deleted, any element from the first (leftmost) frequency node's node list is chosen and removed. If this frequency node's node list becomes empty due to this deletion, then the frequency node is also deleted. The element's reference in the hash map is also deleted. Hence the runtime complexity of deleting the least frequently used element is O(1).
Fetches the item with the least usage count (the least frequently used item) in the cache

GET-LFU-ITEM()
    if lfu_cache.bykey.length equals 0 then
        throw Exception("The set is empty")
    return lfu_cache.bykey[lfu_cache.freq_head.next.items[0]]
Thus, we see that the runtime complexity of each of the dictionary operations on an LFU cache is O(1).
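For concreteness, here is a minimal runnable Python sketch of the whole structure, following the pseudocode above. Python's built-in dict and set stand in for HASH-MAP and SET; the class names (FreqNode, LfuItem, LfuCache) and the evict method, which combines GET-LFU-ITEM with the deletion step just described, are our own rendering rather than the paper's.

class FreqNode:
    # A node in the circular doubly linked frequency list; items holds
    # the keys of all cached elements that share this access frequency.
    def __init__(self, value=0):
        self.value = value
        self.items = set()
        self.prev = self
        self.next = self

class LfuItem:
    # An entry in the by-key hash table; parent points at its FreqNode.
    def __init__(self, data, parent):
        self.data = data
        self.parent = parent

class LfuCache:
    def __init__(self):
        self.by_key = {}
        self.freq_head = FreqNode()        # sentinel with frequency 0

    @staticmethod
    def _link(value, prev, nxt):
        # Splice a new frequency node between prev and nxt: O(1).
        nn = FreqNode(value)
        nn.prev, nn.next = prev, nxt
        prev.next = nn
        nxt.prev = nn
        return nn

    @staticmethod
    def _unlink(node):
        # Remove a frequency node from the circular list: O(1).
        node.prev.next = node.next
        node.next.prev = node.prev

    def insert(self, key, value):
        if key in self.by_key:
            raise KeyError("Key already exists")
        freq = self.freq_head.next
        if freq.value != 1:                # no frequency-1 node yet
            freq = self._link(1, self.freq_head, freq)
        freq.items.add(key)
        self.by_key[key] = LfuItem(value, freq)

    def access(self, key):
        tmp = self.by_key[key]             # raises KeyError if absent
        freq = tmp.parent
        next_freq = freq.next
        if next_freq is self.freq_head or next_freq.value != freq.value + 1:
            next_freq = self._link(freq.value + 1, freq, next_freq)
        next_freq.items.add(key)
        tmp.parent = next_freq
        freq.items.remove(key)
        if not freq.items:                 # frequency node now empty
            self._unlink(freq)
        return tmp.data

    def evict(self):
        if not self.by_key:
            raise KeyError("The cache is empty")
        freq = self.freq_head.next         # leftmost = least frequently used
        key = next(iter(freq.items))       # any key at this frequency
        freq.items.remove(key)
        if not freq.items:
            self._unlink(freq)
        return key, self.by_key.pop(key).data

if __name__ == "__main__":
    cache = LfuCache()
    cache.insert("x", "data-x")
    cache.insert("z", "data-z")
    cache.access("z")                      # z now has frequency 2
    print(cache.evict())                   # ('x', 'data-x'); x is least used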
References

[1] Hyokyung Bahn, Sang Lyul Min, Sam H. Noh, and Kern Koh, Using full reference history for efficient document replacement in web caches, USENIX (1999).

[2] Sorav Bansal and Dharmendra S. Modha, CAR: Clock with adaptive replacement, USENIX (2004).

[3] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein, Introduction to algorithms, second edition, (2002), 130-132.

[4] G. Karakostas and D. N. Serpanos, Practical LFU implementation for web caching, (June 19, 2000).

[5] Donghee Lee, Jongmoo Choi, Jong-hun Kim, Sam H. Noh, Sang Lyul Min, Yookun Cho, and Chong-sang Kim, LRFU: A spectrum of policies that subsumes the least recently used and least frequently used policies, (March 10, 2000).

[6] Elizabeth J. O'Neil, Patrick E. O'Neil, and Gerhard Weikum, An optimality proof of the LRU-K page replacement algorithm, (1996).

[7] Junho Shim, Peter Scheuermann, and Radek Vingralek, Proxy cache design: Algorithms, implementation and performance, IEEE Transactions on Knowledge and Data Engineering (1999).

[8] Dong Zheng, Differentiated web caching - a differentiated memory allocation model on proxies, (2004).

[9] Yuanyuan Zhou and James F. Philbin, The multi-queue replacement algorithm for second level buffer caches, USENIX (2001).