This document presents techniques for compressing web-graph representations to reduce storage requirements. It begins by describing a naive representation that stores successor lists and offsets, taking 288 bits per node under a modest average-outdegree assumption (m = 8n). It then develops ideas for compression, including variable-length coding of successors based on their expected distribution and exploitation of the locality and similarity of web links. Specific coding techniques discussed include Golomb coding, Elias gamma and delta coding, k-bit-variable coding, and differential (gap) encoding of successor lists. The document analyzes how these techniques can compress the graph representation to around 10% of the original size by modeling the distributions of degrees, gaps between successors, and other values.
1. Compression techniques
Compressed codings · Web Graph Compression · Implementative issues
Paolo Boldi, DSI, Università degli studi di Milano
2. What do we mean by... "storing the web graph"?

Having a data structure that allows you, for a given node, to know its successors.
Possibly: having a way to know which URL corresponds to a given node and vice versa.
We already studied good data structures for the URL → node map. The other one (less useful) is not covered here.
3. Naive representation

    index:   0  1  2  3  4  5  6  7  8  9 10 ... m−1
    succ:    3  7 12 14  2 27  3  4  7 15  7 ...

    index:   0  1  2  3  4  5 ... n−1
    offset:  0  3  4  4  8 10 ...

The offset vector tells, for each given node x, where the successor list of x starts from. Implicitly, it also gives the degree of each node.
4. Naive representation

How much space does this representation take?
Successor array: m elements (arcs), each containing a node (log n bits); with 32 bits, we can store up to 4 billion nodes (half of that, if we don't have unsigned types).
Offset array: n elements (nodes), each containing an index into the successor array (log m bits); with 32 bits, we can store up to 4 billion arcs.
All in all, 32(n + m) bits. If we assume m = 8n (a very modest assumption on the outdegree), we need 288n bits, i.e., 288 bits/node, 36 bits/arc.
We show how to reduce this by an order of magnitude.
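To make the layout concrete, here is a minimal Python sketch of the naive representation (illustrative, not from the slides), using the example arrays of the previous slide; the trailing sentinel entry in `offset` is an assumption made here to close the last list shown.

```python
# Minimal sketch of the naive representation: a flat successor array
# plus an offset array indexed by node. Values reproduce the example on
# the previous slide; the trailing sentinel (= m) is assumed here so
# that the last successor list can be sliced like the others.
succ   = [3, 7, 12, 14, 2, 27, 3, 4, 7, 15]
offset = [0, 3, 4, 4, 8, 10]

def successors(x):
    """Successor list of node x: a slice of the flat array."""
    return succ[offset[x]:offset[x + 1]]

def degree(x):
    """The offset vector implicitly gives the degree of each node."""
    return offset[x + 1] - offset[x]

assert successors(0) == [3, 7, 12]
assert degree(2) == 0   # node 2 has an empty successor list
```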
5. Idea

Use a variable-length representation for successors. Such a representation should obviously...
- be instantaneously decodable;
- minimize the expected bitlength.
What about the offset array?
- bit displacement vs. byte displacement (with alignment);
- we have to keep an explicit representation of the node degrees (e.g., in the successor array, before every successor list).
6. Variable-length representation

    values:  3  3 7 12  2  14 5 2  ...
    succ:    101110001101111101010111011111001101 ...  (bit positions 0 .. 40 ...)
    offset:  0 20 33 33 ...  (indexed 0 .. n−1)

Node degrees (on a blue background in the original figure), followed by successors. Each number is represented using an instantaneous code (possibly a different one for degrees and for successors).
7. Instantaneous code

An instantaneous (binary) code for the set S is a function c : S → {0, 1}* such that, for all x, y ∈ S, if c(x) is a prefix of c(y), then x = y.
Let l_x be the length (in bits) of c(x).
Kraft–McMillan: there exists an instantaneous code with lengths l_x (x ∈ S) if and only if
    ∑_{x∈S} 2^(−l_x) ≤ 1.
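A tiny sanity check of the Kraft–McMillan condition on a made-up four-symbol code (an illustrative sketch, not from the slides):

```python
# The lengths (1, 2, 3, 3) satisfy the Kraft-McMillan inequality, and an
# instantaneous code with those lengths indeed exists (made-up example).
code = {"a": "0", "b": "10", "c": "110", "d": "111"}
assert sum(2.0 ** -len(w) for w in code.values()) <= 1.0

# Instantaneous: no codeword is a proper prefix of another.
words = list(code.values())
assert all(not v.startswith(w) for v in words for w in words if v != w)
```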
8. Intended distribution

If p : S → [0, 1] is the probability distribution of the source, then the ideal code for S is such that (Shannon)
    l_x = −log p(x)
(all logarithms are in base 2).
So a code with lengths l_x has intended distribution
    p(x) = 2^(−l_x).
The choice of the code to use will be based on the expected distribution of the data.
9. Fixed-length coding

If S = {1, 2, ..., N}, to represent an element of S it is sufficient to use ⌈log N⌉ bits.
The fixed-length representation for S uses exactly that number of bits for every element (and represents x using the standard binary coding of x − 1 on ⌈log N⌉ bits).
Intended distribution:
    p(x) = 2^(−⌈log N⌉) ≈ 1/N: the uniform distribution.
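A minimal sketch of this fixed-length coding (illustrative; `fixed_length` is a hypothetical helper name):

```python
# Fixed-length coding for S = {1, ..., N}: the standard binary coding of
# x - 1 on ceil(log2 N) bits, as defined above.
def fixed_length(x, N):
    bits = (N - 1).bit_length()          # = ceil(log2 N) for N >= 2
    return format(x - 1, "b").zfill(bits)

assert fixed_length(1, 12) == "0000"     # 12 values fit in 4 bits
assert fixed_length(12, 12) == "1011"
```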
10. Unary coding

If S = ℕ, one can represent x ∈ S by writing x zeroes followed by a one.
So l_x = x + 1, and the intended distribution is
    p(x) = 2^(−x−1): the geometric distribution of ratio 1/2.

    0 → 1
    1 → 01
    2 → 001
    3 → 0001
    4 → 00001
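A minimal sketch of unary encoding and decoding, matching the table above (illustrative, not from the slides):

```python
# Unary coding as defined above: x zeroes followed by a one.
def unary(x):
    return "0" * x + "1"

def unary_decode(bits):
    """Read one unary codeword from the front; return (value, rest)."""
    x = bits.index("1")
    return x, bits[x + 1:]

assert unary(3) == "0001"
assert unary_decode("000101") == (3, "01")
```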
11. A more general viewpoint

Unary coding can be seen as a special case of a more general kind of coding for ℕ. Suppose you group ℕ into slots: every slot is made of consecutive integers; let
    V = s1, s2, s3, ...
be the slot sizes (in the unary case s1 = s2 = ··· = 1).
Then, to represent x ∈ ℕ one can:
- encode in unary the index i of the slot containing x;
- encode in binary the offset of x within its slot (using ⌈log s_i⌉ bits).
Golomb coding
Golomb coding with modulus b is obtained by choosing
  V = b, b, b, . . .
To represent x ∈ N you specify the slot where x falls (that is, ⌊x/b⌋) in unary, and then represent the offset using ⌈log b⌉ bits (or ⌊log b⌋ bits: truncated binary).
So
  l_x = ⌊x/b⌋ + 1 + ⌈log b⌉.
The intended distribution is
  p(x) = 2^{−l_x} ∝ (2^{1/b})^{−x},
the geometric distribution of ratio 2^{−1/b} (the b-th root of 1/2).
More precisely. . .
A finer analysis shows that Golomb coding is optimal (= Huffman) for a geometric distribution with parameter p, provided that b is chosen as
  b = ⌈ log(2 − p) / (−log(1 − p)) ⌉.
For example, with b = 3:
0  10
1  110
2  111
3  010
4  0110
5  0111
6  0010
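A sketch of Golomb encoding under the usual truncated-binary convention for the remainder (the first 2^⌈log b⌉ − b remainders take ⌊log b⌋ bits, the rest ⌈log b⌉ bits); the assert checks that it reproduces the b = 3 table above.

```python
from math import ceil, log2

def golomb_encode(x: int, b: int) -> str:
    """Golomb code, modulus b: quotient floor(x/b) in unary (zeroes, then
    a one), remainder in truncated binary."""
    q, r = divmod(x, b)
    unary = "0" * q + "1"
    if b == 1:
        return unary                 # no remainder bits at all
    c = ceil(log2(b))
    t = (1 << c) - b                 # the first t remainders use c - 1 bits
    if r < t:
        return unary + format(r, "0%db" % (c - 1))
    return unary + format(r + t, "0%db" % c)

# Reproduces the b = 3 table above:
assert [golomb_encode(x, 3) for x in range(7)] == \
    ["10", "110", "111", "010", "0110", "0111", "0010"]
```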
Elias’ γ
Elias’ γ coding of x ∈ N⁺ is obtained by representing x in binary, preceded by a unary representation of its length (minus one).
More precisely, to represent x we write ⌊log x⌋ in unary and then x − 2^{⌊log x⌋} in binary (on ⌊log x⌋ bits). So
  l_x = 1 + 2⌊log x⌋ ⟹ p(x) ∝ 1/(2x^2) (Zipf of exponent 2)
1 1
2 010
3 011
4 00100
5 00101
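A minimal sketch of γ encoding and decoding, checked against the table above; again, bit strings are used for clarity only.

```python
def gamma_encode(x: int) -> str:
    """Elias gamma: |binary(x)| - 1 zeroes, then binary(x) itself
    (whose leading 1 terminates the unary part)."""
    assert x >= 1
    n = x.bit_length() - 1            # floor(log2 x)
    return "0" * n + "1" + format(x, "b")[1:]

def gamma_decode(bits: str, pos: int = 0):
    """Return (value, position just past the codeword)."""
    n = bits.index("1", pos) - pos    # length of the unary prefix
    start = pos + n + 1
    x = (1 << n) | int(bits[start:start + n] or "0", 2)
    return x, start + n

assert [gamma_encode(x) for x in (1, 2, 3, 4, 5)] == \
    ["1", "010", "011", "00100", "00101"]
assert gamma_decode("00101") == (5, 5)
```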
Elias’ δ
Elias’ δ coding of x ∈ N⁺ is obtained by representing x in binary, preceded by a representation of its length in γ.
So
  l_x = 1 + 2⌊log log x⌋ + ⌊log x⌋ ⟹ p(x) ∝ 1/(2x (log x)^2)
1 1
2 0100
3 0101
4 01100
5 01101
8 00100000
9 00100001
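A corresponding sketch for δ, exploiting the fact that the γ code of n is just binary(n) preceded by |binary(n)| − 1 zeroes; the asserts check it against the table above.

```python
def delta_encode(x: int) -> str:
    """Elias delta: binary(x) without its leading 1, preceded by the
    gamma code of |binary(x)|."""
    assert x >= 1
    n = x.bit_length()                                 # |binary(x)|
    gamma_of_n = "0" * (n.bit_length() - 1) + format(n, "b")
    return gamma_of_n + format(x, "b")[1:]

assert [delta_encode(x) for x in (1, 2, 3, 4, 5)] == \
    ["1", "0100", "0101", "01100", "01101"]
assert delta_encode(8) == "00100000"
assert delta_encode(9) == "00100001"
```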
An alternative way. . .
. . . to think of γ coding is that x is represented using its usual binary representation (except for the initial “1”, which is omitted), with every bit “coming with” a continuation bit that tells whether the representation continues or stops there.
For example (up to bit permutation) γ coding of 724 (in binary:
1011010100) is
011111011101110100
k-bit-variable coding
What happens if we group the digits k by k? For 724:
k = 1: 011111011101110100
k = 2: 001111011011000
k = 3: 011101011000
k = 4: 1101101000
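A sketch of this grouping scheme under one concrete convention of ours (each k-bit group preceded by its continuation bit); since the slide's layout coincides only "up to bit permutation", the asserts check codeword lengths, not exact bit patterns, against the 724 example above. The handling of x = 1, which has no digits after the omitted leading one, is an arbitrary choice.

```python
def kbit_variable_encode(x: int, k: int) -> str:
    """Binary digits of x (leading 1 dropped), left-padded to a multiple
    of k, split into k-bit groups, each preceded by a continuation bit
    (1 = more groups follow, 0 = last group)."""
    assert x >= 1 and k >= 1
    data = format(x, "b")[1:]                    # digits after the leading 1
    data = "0" * ((-len(data)) % k) + data       # pad to a multiple of k
    groups = [data[i:i + k] for i in range(0, len(data), k)] or ["0" * k]
    return "".join(("1" if i < len(groups) - 1 else "0") + g
                   for i, g in enumerate(groups))

# Lengths match the k = 1 and k = 3 groupings of 724 shown above:
assert len(kbit_variable_encode(724, 1)) == 18
assert len(kbit_variable_encode(724, 3)) == 12
```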
k-bit-variable coding (cont’d)
For x, we use ⌈log(x)/k⌉ bits for the unary part, and the same number of bits multiplied by k for the binary part. So
  l_x = (k + 1)⌈log(x)/k⌉ ⟹ p(x) ∝ x^{−(k+1)/k} (Zipf of exponent (k + 1)/k)
A more efficient variant: the ζ_k codes (intended for Zipf distributions with exponent between 1 and 2).
x   γ = ζ_1   ζ_2      ζ_3      ζ_4
1   1         10       100      1000
2   010       110      1010     10010
3   011       111      1011     10011
4   00100     01000    1100     10100
5   00101     01001    1101     10101
6   00110     01010    1110     10110
7   00111     01011    1111     10111
8   0001000   011000   0100000  11000
Comparing codings
[Plot comparing the intended distributions p(x) of the codings on a log-log scale, for x up to about 100. Legend: unary = Golomb 1; Golomb 3; γ = 1-bit-variable; 3-bit-variable; δ.]
Coding techniques. . .
. . . alone do not yield compression: we first have to make sure that the data we represent have a distribution close to the intended one (which depends on the coding we are going to use). In particular, they must follow a monotonic distribution (smaller values are more probable than larger ones).
By the way, some codings (e.g., Elias γ and δ) are universal: for every monotonic distribution, they guarantee an expected length that is within a constant factor of the optimal one.
- Degrees are distributed like a Zipf with exponent ≈ 2.7: they can be safely encoded using γ.
- What about successors? Let us assume that the successors of x are y_1, . . . , y_k: how should we encode them?
Locality
In general, we cannot say much about their distribution, unless we make some assumption on the way in which nodes are ordered.
- Many hypertextual links contained in a web page are navigational (“home”, “next”, “up”, . . . ). If we compare the URL they point to with that of the page containing them, the two share a long common prefix. This property is known as locality, and it was first observed by the authors of the Connectivity Server.
- To exploit this property, assume that URLs are ordered lexicographically (that is, node 0 is the first URL in lexicographic order, etc.). Then, if x → y is an arc, most of the time |x − y| will be “small”.
Exploiting locality
If x has successors y_1 < y_2 < · · · < y_k, we represent its successor list through the gaps (differentiation):
  y_1 − x, y_2 − y_1 − 1, . . . , y_k − y_{k−1} − 1
(only the first value can be negative: we turn it into a natural number. . . ). How are such differences distributed?
[Log-log plot of gap frequencies, for gaps between 1 and 100000: the points lie approximately on a straight line, i.e., follow a power law.]
Zipf with exponent 1.2 ⟹ we use ζ_3.
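A sketch of differentiation. The slide leaves the negative-to-natural mapping for the first gap unspecified; the zig-zag map used here (v ≥ 0 ↦ 2v, v < 0 ↦ 2|v| − 1) is one standard choice, and is labeled as an assumption in the code.

```python
def differentiate(x, succ):
    """Turn the sorted successor list of node x into gaps."""
    first = succ[0] - x
    # assumption: zig-zag map of the (possibly negative) first gap
    nat = 2 * first if first >= 0 else 2 * (-first) - 1
    return [nat] + [succ[i] - succ[i - 1] - 1 for i in range(1, len(succ))]

# Node 17 with successors 13, 15, 16, 22:
# gaps are -4, 1, 0, 5, and -4 maps to 7.
assert differentiate(17, [13, 15, 16, 22]) == [7, 1, 0, 5]
```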
Similarity
URLs close to each other (in lexicographic order) have similar
successor sets: this fact (known as similarity) was exploited for the
first time in the Link database.
We may encode the successor list of x as follows:
- we write the differences with respect to the successor list of some previous node x − r (called the reference node);
- we explicitly encode (as before) only the successors of x that were not successors of x − r.
Similarity (cont’d)
More explicitly, the successor list of x is encoded as (referencing):
- an integer r (the reference): if r > 0, the list is described by difference with respect to the successor list of x − r; in this case, we write a bitvector (of length equal to d⁺(x − r)) discriminating the elements of N⁺(x − r) ∩ N⁺(x) from those of N⁺(x − r) \ N⁺(x);
- an explicit list of extra nodes, containing the elements of N⁺(x) \ N⁺(x − r) (or the whole N⁺(x), if r = 0), encoded as explained before.
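A sketch of this encoding step; the assert uses the successor lists of nodes 15 and 16 from the table on the next slide.

```python
def encode_with_reference(succ_x, succ_ref):
    """Copy bitvector over N+(x - r), plus the extra nodes of N+(x)."""
    x_set, ref_set = set(succ_x), set(succ_ref)
    copy_bits = "".join("1" if s in x_set else "0" for s in succ_ref)
    extra = [s for s in succ_x if s not in ref_set]
    return copy_bits, extra

# Node 16 (successors below) referencing node 15:
assert encode_with_reference(
    [15, 16, 17, 22, 23, 24, 315, 316, 317, 3041],
    [13, 15, 16, 17, 18, 19, 23, 24, 203, 315, 1034],
) == ("01110011010", [22, 316, 317, 3041])
```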
Blocks (differential compression)
Instead of using a bitvector, we use run-length encoding, giving the lengths of the successive runs (blocks) of “0”s and “1”s:

Node  Outd.  Ref.  Copy list     Extra nodes
...   ...    ...   ...           ...
15    11     0                   13, 15, 16, 17, 18, 19, 23, 24, 203, 315, 1034
16    10     1     01110011010   22, 316, 317, 3041
17    0
18    5      3     11110000000   50
...   ...    ...   ...           ...

Node  Outd.  Ref.  # blocks  Copy blocks          Extra nodes
...   ...    ...   ...       ...                  ...
15    11     0                                    13, 15, 16, 17, 18, 19, 23, . . .
16    10     1     7         0, 0, 2, 1, 1, 0, 0  22, 316, . . .
17    0
18    5      3     1         4                    50
...   ...    ...   ...       ...                  ...
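A sketch that derives the copy blocks of the second table from the copy lists of the first, under the convention the data suggests: runs alternate starting with a (possibly empty) run of 1s, the first length is stored as-is, every later length is decremented by one (those runs are necessarily nonempty), and the final run of 0s is left implicit.

```python
from itertools import groupby

def copy_blocks(bits: str):
    """Run-length encode a copy bitvector into its list of block lengths."""
    runs = [(b, len(list(g))) for b, g in groupby(bits)]
    if runs and runs[0][0] == "0":
        runs.insert(0, ("1", 0))     # force the first run to be of 1s
    if runs and runs[-1][0] == "0":
        runs.pop()                   # the trailing run of 0s is implicit
    return [n if i == 0 else n - 1 for i, (_, n) in enumerate(runs)]

# Matches the table above:
assert copy_blocks("01110011010") == [0, 0, 2, 1, 1, 0, 0]   # node 16
assert copy_blocks("11110000000") == [4]                     # node 18
```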
Consecutivity
Among the extra nodes, many happen to exhibit the consecutivity property: they appear in clusters of consecutive integers. This phenomenon, observed empirically, has some possible explanations:
- most pages contain groups of navigational links that correspond to a certain hierarchical level of the website, and these are often consecutive to one another;
- in the transpose graph, moreover, consecutivity is the dual of similarity with reference 1: when there is a cluster of consecutive pages with many similar links, the transpose contains intervals of consecutive outgoing links.
Consecutivity (cont’d)
To exploit consecutivity, we use a special representation for the extra-node list, called intervalization:
- sufficiently long (say, of length ≥ T) intervals of consecutive integers are represented by their left extreme and their length minus T;
- the other extra nodes, if any, are called residual nodes and are represented one by one.
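A sketch of intervalization, applied to the extra nodes of node 15 from the table above; the threshold T = 4 in the example is an arbitrary choice.

```python
def intervalize(extra, T):
    """Split a sorted extra-node list into intervals (left extreme,
    length - T) for maximal runs of >= T consecutive integers, plus
    the remaining residual nodes."""
    intervals, residuals, i = [], [], 0
    while i < len(extra):
        j = i
        while j + 1 < len(extra) and extra[j + 1] == extra[j] + 1:
            j += 1                       # extend the run of consecutive ints
        run = j - i + 1
        if run >= T:
            intervals.append((extra[i], run - T))
        else:
            residuals.extend(extra[i:j + 1])
        i = j + 1
    return intervals, residuals

assert intervalize([13, 15, 16, 17, 18, 19, 23, 24, 203, 315, 1034], 4) == \
    ([(15, 1)], [13, 23, 24, 203, 315, 1034])
```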
Reference window
When the reference node is chosen, how far back in the “past” are we allowed to go? We need to keep track of a window of the last W successor lists. The choice of W is critical:
- a large W guarantees better compression, but increases compression time and space;
- beyond W = 7 there is no significant improvement in compression.
The choice of W does not affect decompression time.
Reference chain length
Referencing involves recursion: to decode the successor list of x, we first need to decompress the successor list of x − r, and so on. This chain is called the reference chain of x: decompression speed depends on the length of such chains.
During compression, it is possible to limit their length by keeping track of how long the reference chain of every node in the window is, and avoiding nodes whose reference chain already has a given maximum length R.
The choice of R influences the compression ratio (with R = ∞ giving the best possible compression) but also the decompression speed (R = ∞ may produce access times two orders of magnitude larger than R = 1, and may even produce stack overflows).
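A sketch of the bookkeeping this requires during compression. The candidate scoring used here (size of the successor-set overlap) is a simplification of the real cost function (bits actually saved), and all names are illustrative.

```python
def choose_reference(x, succ, window, chain_len, R):
    """Pick a reference r for node x among the candidates in `window`,
    never using a node whose reference chain already has length R.
    chain_len[y] is the current reference-chain length of node y."""
    x_set = set(succ[x])
    best_r, best_score = 0, 0          # r = 0 means "no reference"
    for y in window:
        if chain_len[y] >= R:
            continue                   # referencing y would exceed the bound
        score = len(x_set & set(succ[y]))
        if score > best_score:
            best_r, best_score = x - y, score
    chain_len[x] = 0 if best_r == 0 else chain_len[x - best_r] + 1
    return best_r
```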