This document summarizes Hannaneh Najdataei's licentiate seminar on parallel data streaming analytics in the context of Internet of Things. The seminar covered continuous clustering of LiDAR point cloud data, elasticity in stream processing, and efficient processing across edge, fog and cloud computing environments. A key focus was designing analytics that can continuously analyze streaming data, adaptively reconfigure to changes in data rates and hardware resources, and process data efficiently across different platforms in a hardware-independent manner.
Classifying Multi-Variate Time Series at Scale:
Characterizing and understanding the runtime behavior of large-scale Big Data production systems is extremely important. Typical systems consist of hundreds to thousands of machines in a cluster with hundreds of terabytes of storage, costing millions of dollars and solving business-critical problems. By instrumenting each running process and measuring its resource utilization (CPU, memory, I/O, network, etc.) as time series, it is possible to understand and characterize the workload on these massive clusters. Each time series consists of tens to tens of thousands of data points that must be ingested and then classified. At Pepperdata, our instrumentation of the clusters collects over three hundred metrics from each task every five seconds, resulting in millions of data points per hour. At this scale the data are equivalent to the biggest IoT data sets in the world. Our objective is to classify the collection of time series into a set of classes that represent different workload types. Phrased differently, our problem is essentially that of classifying multivariate time series.
In this talk, we propose a unique, off-the-shelf approach to classifying time series that achieves near best-in-class accuracy for univariate series and generalizes to multivariate time series. Our technique maps each time series to a Gramian Angular Difference Field (GADF), interprets that as an image, uses Google’s pre-trained Inception v3 CNN to map the GADF images into a 2048-dimensional vector space, and then uses a small MLP with two hidden layers of fifty nodes each and a softmax output to produce the final classification. Our work is not domain specific, as shown by our achieving accuracies competitive with published results on the univariate UCR data set as well as the multivariate UCI data set.
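The first step of the pipeline described above, turning a series into a GADF image, can be sketched as follows. This is a minimal illustration under the usual definition (rescale to [-1, 1], take the arccosine as an angle, then form sin(phi_i - phi_j)); the function name `gadf` and the handling of constant series are assumptions made for this sketch, not details from the talk.

```python
import math

def gadf(series):
    """Gramian Angular Difference Field of a 1-D series (hypothetical helper)."""
    lo, hi = min(series), max(series)
    if hi == lo:
        # Assumption: a constant series is mapped to the midpoint of [-1, 1].
        scaled = [0.0 for _ in series]
    else:
        # Min-max rescale every value into [-1, 1].
        scaled = [2.0 * (x - lo) / (hi - lo) - 1.0 for x in series]
    # Polar encoding: each value becomes an angle in [0, pi].
    phi = [math.acos(max(-1.0, min(1.0, x))) for x in scaled]
    # Entry (i, j) is sin(phi_i - phi_j), giving an n x n "image".
    return [[math.sin(pi - pj) for pj in phi] for pi in phi]

field = gadf([0.0, 1.0, 2.0, 3.0])
```

The resulting matrix is antisymmetric with a zero diagonal; in the described approach it would be rendered as an image before being passed to the pre-trained CNN.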
Bio: Before joining Pepperdata, Ash was executive chairman for Marianas Labs, a deep learning startup sold in December 2015. Prior to that he was CEO for Graphite Systems, a big data storage startup that was sold to EMC DSSD in August 2015. Munshi also served as CTO of Yahoo, as a CEO of both public and private companies, and is on the board of several technology startups.
This document describes research on implementing Curran's approximation algorithm for pricing Asian options using a dataflow architecture. The algorithm was implemented on a Maxeler dataflow engine (DFE) and compared to a CPU implementation. Different fixed-point precisions were tested on the DFE and 54-bit fixed-point provided the best balance of precision and resource usage. Implementing the algorithm across multiple DFEs provided speedups of 5-12x over a 48-core CPU. Further optimization of dynamic ranges allowed increasing the unrolling factor, improving performance and energy efficiency.
This presentation discusses dynamically tuning the RTS threshold value in IEEE 802.11 networks based on the packet distribution. It outlines previous work that assumed static nodes and traffic. The proposal is to use the basic transmission scheme for relatively small packets and RTS/CTS for large packets, setting the threshold dynamically each interval so that η% of packets are below it. Simulations using NS-3 evaluate the system throughput with varying node counts, packet rates and η values.
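The per-interval threshold rule can be illustrated with a nearest-rank percentile over the packet sizes seen in the last interval. This is only a sketch: the function name `rts_threshold`, the choice of nearest-rank percentile, and treating packets at or below the threshold as "small" are assumptions, not details taken from the presentation.

```python
import math

def rts_threshold(packet_sizes, eta):
    """Smallest observed size T such that at least eta% of packets are <= T.

    packet_sizes: sizes (bytes) observed in the last interval.
    eta: percentage in (0, 100].
    """
    if not packet_sizes:
        raise ValueError("no packets observed in this interval")
    if not 0 < eta <= 100:
        raise ValueError("eta must be in (0, 100]")
    ordered = sorted(packet_sizes)
    # Nearest-rank percentile: 1-based rank of the eta-th percentile.
    rank = math.ceil(eta * len(ordered) / 100)
    return ordered[rank - 1]

sizes = [100, 200, 300, 400, 1500]
threshold = rts_threshold(sizes, 80)
```

With this rule, packets no larger than `threshold` would use the basic access scheme and larger ones would use RTS/CTS.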
Il tempo vola: rappresentare e manipolare sequenze di eventi e time series co... - Codemotion
Representing the passage of time is no simple feat, especially with "traditional" tools. Unfortunately, the temporal dimension is fundamental in a thousand different contexts, from statistical analysis to the representation of cause-and-effect relationships, from forecasting to automatic control. In this talk we will see how to make the best use of OrientDB, a Document-Graph Database, for storing, processing and querying this kind of information.
This document summarizes several algorithms for parallel matrix operations, including matrix-vector multiplication, matrix-matrix multiplication, and solving systems of linear equations via Gaussian elimination. For matrix-vector multiplication, it describes row-wise and column-wise partitioning approaches. For matrix-matrix multiplication, it discusses algorithms based on row/column broadcasting, Cannon's algorithm, and a 3D domain decomposition approach. For Gaussian elimination, it analyzes pipelined and 2D mapping implementations. The key aspects of parallelization, communication costs, computation loads, scalability, and cost efficiency are analyzed for each algorithm.
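Row-wise partitioning for matrix-vector multiplication can be sketched sequentially: each worker owns a contiguous block of rows plus a full copy of the vector, computes its slice of the result, and the slices are gathered in order. The helper names below are hypothetical, and the loop over blocks stands in for work that would run on separate processes.

```python
def row_blocks(n_rows, n_workers):
    """Split row indices 0..n_rows-1 into n_workers contiguous, balanced blocks."""
    base, extra = divmod(n_rows, n_workers)
    blocks, start = [], 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)
        blocks.append((start, start + size))
        start += size
    return blocks

def matvec_rowwise(a, x, n_workers):
    """Compute y = A x, one row block per (simulated) worker."""
    y = []
    for lo, hi in row_blocks(len(a), n_workers):
        # Local work of one worker: dot products for its own rows only.
        local = [sum(aij * xj for aij, xj in zip(row, x)) for row in a[lo:hi]]
        # Stand-in for the gather step that assembles the full result.
        y.extend(local)
    return y

y = matvec_rowwise([[1, 2], [3, 4], [5, 6]], [1, 1], n_workers=2)
```

In the column-wise variant each worker would instead hold a block of columns and the matching slice of the vector, and the partial result vectors would be summed rather than concatenated.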
The document summarizes a research paper on Deep Crossing, a deep learning model that automatically combines features for web-scale modeling without manually crafted combinatorial features. The key points are:
1. Deep Crossing uses a neural network to automatically learn combinatorial features from individual features, avoiding the manual feature engineering required by previous models.
2. It was shown to outperform previous models like DSSM that used late feature crossing. Deep Crossing's early feature crossing was more effective.
3. Deep Crossing was able to achieve better performance than production models using much less training data, and is easier to build and maintain than manually engineered models.
This document summarizes a paper that proposes methods for continuous and parallel LiDAR point cloud clustering. It introduces Lisco, which continuously processes LiDAR data streams to cluster points. P-Lisco parallelizes Lisco's processing pipeline to further improve performance. Evaluation on synthetic and real LiDAR datasets shows P-Lisco achieves real-time processing and outperforms alternative methods like PCL. Future work involves specialized implementations and applying the continuous analysis approach to other related problems.
Pipelined Compression in Remote GPU Virtualization Systems using rCUDA: Early... - Carlos Reaño González
This document discusses pipelined compression in remote GPU virtualization systems using rCUDA. It introduces remote GPU virtualization and the challenges of slow networks. It then describes a pipelined compression architecture that can compress data on the fly during transfer. Experimental results show that compression libraries reduce execution time by 1-6 minutes for various machine learning models. Analysis finds that over 90% of transfers are small, between 1 byte and 1 KB, and could benefit from further compression. The initial implementation shows potential for reducing execution time but leaves room for improvement.
Crash course on data streaming (with examples using Apache Flink) - Vincenzo Gulisano
These are the slides I used for a four-hour crash course on data streaming. They cover both theory and research aspects as well as examples based on Apache Flink (DataStream API).
This thesis focuses on performance management techniques for cloud services. It presents work in three key areas: 1) Developing a scalable and generic resource allocation protocol for large cloud environments. 2) Building performance models to predict response times and capacity for a distributed key-value store. 3) Enabling real-time prediction of service metrics using analytics on low-level system statistics. The thesis contributes solutions for these challenging problems and identifies open questions around decentralized resource allocation, online performance management, and analytics-based forecasting at large scales.
Personal Research Overview presented at the KU-NAIST Research Meeting - Chawanat Nakasan
Chawanat Nakasan presented a personal research overview for a KU-NAIST research meeting. He summarized his academic history, including undergraduate studies in computer engineering at Kasetsart University and graduate studies at NAIST, where he focused on software-defined networking and multipath networking. At NAIST, his research optimized Multipath TCP performance using a software-defined network with an OpenFlow controller that identifies MPTCP subflow groups and assigns them to network paths. Evaluation showed the approach improved throughput. Nakasan discussed achievements including publications and awards. He is graduating from NAIST and the PRAGMA Students Steering Committee, and moving to an academic career in Japan.
The document discusses data streaming in IoT and big data analytics. It begins with an introduction to data streaming and the need for streaming techniques due to the complexity of analyzing large volumes of IoT data. It then covers the data streaming processing paradigm, including continuous queries, stateless and stateful operators, and windows. Challenges and research questions in data streaming are also discussed, such as distributed deployment, parallelism, and fault tolerance. The document concludes that data streaming is well-suited for real-time analysis of IoT data due to its ability to perform online, parallel and distributed processing.
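A stateful operator over a tumbling window can be sketched as a generator that keeps per-key state for the current window and emits results when a tuple from a later window arrives. The tuple layout `(timestamp, key, value)`, the function name, and the assumption that input arrives in timestamp order are all choices made for this illustration.

```python
from collections import defaultdict

def tumbling_window_sum(events, window_size):
    """Per-key sum over non-overlapping windows of length window_size.

    events: iterable of (timestamp, key, value), assumed sorted by timestamp.
    Yields (window_start, key, total) each time a window closes.
    """
    current_start = None
    state = defaultdict(float)  # operator state: running sum per key
    for ts, key, value in events:
        start = (ts // window_size) * window_size
        if current_start is not None and start != current_start:
            # A tuple from a later window closes the current one.
            for k in sorted(state):
                yield (current_start, k, state[k])
            state.clear()
        current_start = start
        state[key] += value
    # End of a finite input: flush the last window (an unbounded stream never reaches this).
    if current_start is not None:
        for k in sorted(state):
            yield (current_start, k, state[k])

results = list(tumbling_window_sum(
    [(0, "a", 1), (3, "a", 2), (4, "b", 5), (10, "a", 7)], window_size=10))
```

A stateless operator such as a filter or map would need no `state` at all, which is what makes it trivially parallelizable; the keyed state here is what a parallel deployment would have to partition by key.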
Semantic Segmentation on Satellite Imagery - RAHUL BHOJWANI
This is an image semantic segmentation project targeting satellite imagery. The goal was to produce a pixel-wise segmentation map for various objects in satellite imagery, including buildings, water bodies, roads, etc. The data was taken from the Kaggle competition <https://www.kaggle.com/c/dstl-satellite-imagery-feature-detection>.
We implemented the FCN, U-Net and SegNet deep learning architectures for this task.
The presentation slides of my Ph.D. thesis proposal ("CAT" as known in my university). I received a score of 18/20.
Supervisors:
Prof. Luís Veiga (IST, ULisboa)
Prof. Peter Van Roy (UCLouvain)
Jury:
Prof. Javid Taheri (Karlstad University)
Prof. Fernando Mira da Silva (IST, ULisboa)
This document discusses deep learning initiatives at NECSTLab focused on hardware acceleration of convolutional neural networks using FPGAs. It proposes a framework called CNNECST that provides high-level APIs to design CNNs, integrates with machine learning frameworks for training, and generates customized hardware for FPGA implementation through C++ libraries and Vivado. Experimental results show speedups and energy savings for CNNs like LeNet and MNIST on FPGA boards compared to CPU. Challenges and future work include supporting more layer types and reduced precision computations.
This document appears to be a project report submitted by two students for their Bachelor of Technology degree. It summarizes their simulation of ADIAN, a new ad-hoc network routing protocol based on distributed intelligence. The report includes chapters on introduction, background study of existing protocols, implementation details of ADIAN using NS2, and plans for conclusion and future work. The background study analyzes research papers on ad-hoc routing protocols and compares the performance of protocols like AODV, DSR, DSDV and OLSR. It also discusses networking simulation tools with a focus on NS2. The implementation chapter outlines the requirements, architecture, data structures, testing approach and risk mitigation plan for simulating ADIAN in NS2.
This document is a dissertation submitted by Theofylaktos Papapanagiotou for the degree of Master of Science. It evaluates different grid performance monitoring tools and information services for distributing monitoring data in a multi-level architecture. It describes how tools like Ganglia, Nagios, BDII, and WSRF can be used to monitor load averages on grid nodes, aggregate the data, and present performance visualizations. The dissertation aims to understand how standards like the GLUE schema are used to organize information in the services and evaluate which approach better supports the multi-level monitoring model.
Slides for Ph.D. Thesis Defense of Dheryta Jaisinghani at IIIT-Delhi, INDIA - Dheryta Jaisinghani
Dheryta Jaisinghani defended her Ph.D. thesis on understanding the role of active scans in large-scale WiFi networks. She investigated three main problems: unnecessary active scans, WiFi indoor localization with minimal active scans, and using active scans for data transfer. Her research methodology involved analyzing real-world WiFi traffic datasets, designing solutions to address the problems, and implementing and evaluating the solutions. Some key contributions included techniques to detect growth in probe traffic, infer the causes of active scanning, and reduce unnecessary probe requests from client devices.
The presentation slides of my Ph.D. thesis. For more information - https://kkpradeeban.blogspot.com/2019/07/my-phd-defense-software-defined-systems.html
This is the 2nd defense of my Ph.D. double degree.
More details - https://kkpradeeban.blogspot.com/2019/08/my-phd-defense-software-defined-systems.html
RAMSES: Robust Analytic Models for Science at Extreme Scales - Ian Foster
This document discusses the RAMSES project, which aims to develop a new science of end-to-end analytical performance modeling of science workflows in extreme-scale science environments. The RAMSES research agenda involves developing component and end-to-end models, tools to provide performance advice, data-driven estimation methods, automated experiments, and a performance database. The models will be evaluated using five challenge workflows: high-performance file transfer, diffuse scattering experimental data analysis, data-intensive distributed analytics, exascale application kernels, and in-situ analysis placement.
Slides for my Associate Professor (oavlönad docent) lecture.
The lecture is about Data Streaming (its evolution and basic concepts) and also contains an overview of my research.
Clustering-based Analysis for Heavy-Hitter Flow Detection - APNIC
This document summarizes a research paper that proposes using unsupervised machine learning clustering techniques rather than thresholds to detect heavy hitter (HH) flows in a network. It describes collecting network flow data and analyzing it using algorithms like K-means and Gaussian mixtures to group flows. This identified multiple clusters rather than just two groups (elephants and mice). Further clustering an ambiguous zone revealed patterns that could better classify HH flows without relying on thresholds. The clustering results were then passed to an SDN controller to mark flows and take appropriate actions like re-routing.
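The idea of letting clusters, rather than a fixed threshold, separate flow sizes can be illustrated with a small one-dimensional K-means. This is a toy sketch, not the paper's pipeline: the function name, the deterministic initialization, the single feature (bytes per flow) and the sample values are all assumptions.

```python
def kmeans_1d(values, k, iterations=20):
    """Plain Lloyd's K-means on scalar values with a deterministic start."""
    ordered = sorted(values)
    # Assumption: spread the initial centers evenly over the sorted data.
    centers = [float(ordered[(2 * i + 1) * len(ordered) // (2 * k)]) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return centers, labels

# Hypothetical flow sizes: small, medium and very large flows.
flow_bytes = [1, 2, 3, 100, 101, 102, 10000, 10010]
centers, labels = kmeans_1d(flow_bytes, k=3)
```

Here three groups emerge instead of the two (elephants and mice) that a single threshold would impose; the middle group plays the role of the ambiguous zone that the summary says was clustered further.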
IoT platforms must enable communication between applications and devices in accordance with their non-functional requirements. Some of the main non-functional requirements are quality of service (QoS) and quality of experience (QoE), among others. This talk presents an Autonomic Platform for the IoT (Internet of Things) for managing QoS and QoE, based on the concept of an autonomic cycle of data analysis tasks. Several autonomic cycles of data analysis have been defined in this platform. The talk presents some of these autonomic cycles and analyzes their diagnostic capabilities, based on the operational state profile they determine.
This document discusses various techniques for optimizing deep neural network models and hardware for efficiency. It covers approaches such as exploiting activation and weight statistics, sparsity, compression, pruning neurons and synapses, decomposing trained filters, and knowledge distillation. The goal is to reduce operations, memory usage, and energy consumption to enable efficient inference on hardware like mobile phones and accelerators. Evaluation methodologies are also presented to guide energy-aware design space exploration.
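One of the listed techniques, pruning synapses, is commonly done by weight magnitude. The sketch below zeroes the smallest-magnitude fraction of a flat weight list; the function name and the flat-list representation are assumptions for illustration, and real pipelines usually fine-tune the network after pruning.

```python
def prune_by_magnitude(weights, sparsity):
    """Return a copy of weights with the smallest-magnitude fraction set to zero.

    sparsity: fraction of weights to remove, between 0 and 1.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned

pruned = prune_by_magnitude([0.5, -0.1, 0.05, -2.0], sparsity=0.5)
```

The zeros only save memory and operations if the storage format and the hardware can skip them, which is why the summary pairs pruning with sparsity-aware compression and accelerator design.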
The thematic appreciation test is a psychological assessment tool used to measure an individual's appreciation and understanding of specific themes or topics. The test helps to evaluate an individual's ability to connect different ideas and concepts within a given theme, as well as their overall comprehension and interpretation skills. Its results can provide valuable insights into an individual's cognitive abilities, creativity, and critical thinking skills.
Describing and Interpreting an Immersive Learning Case with the Immersion Cub...Leonel Morgado
Current descriptions of immersive learning cases are often difficult or impossible to compare. This is due to a myriad of different options on what details to include, which aspects are relevant, and on the descriptive approaches employed. Also, these aspects often combine very specific details with more general guidelines or indicate intents and rationales without clarifying their implementation. In this paper we provide a method to describe immersive learning cases that is structured to enable comparisons, yet flexible enough to allow researchers and practitioners to decide which aspects to include. This method leverages a taxonomy that classifies educational aspects at three levels (uses, practices, and strategies) and then utilizes two frameworks, the Immersive Learning Brain and the Immersion Cube, to enable a structured description and interpretation of immersive learning cases. The method is then demonstrated on a published immersive learning case on training for wind turbine maintenance using virtual reality. Applying the method results in a structured artifact, the Immersive Learning Case Sheet, that tags the case with its proximal uses, practices, and strategies, and refines the free text case description to ensure that matching details are included. This contribution is thus a case description method in support of future comparative research of immersive learning cases. We then discuss how the resulting description and interpretation can be leveraged to change immersion learning cases, by enriching them (considering low-effort changes or additions) or innovating (exploring more challenging avenues of transformation). The method holds significant promise to support better-grounded research in immersive learning.
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptxMAGOTI ERNEST
Although Artemia has been known to man for centuries, its use as a food for the culture of larval organisms apparently began only in the 1930s, when several investigators found that it made an excellent food for newly hatched fish larvae (Litvinenko et al., 2023). As aquaculture developed in the 1960s and ‘70s, the use of Artemia also became more widespread, due both to its convenience and to its nutritional value for larval organisms (Arenas-Pardo et al., 2024). The fact that Artemia dormant cysts can be stored for long periods in cans, and then used as an off-the-shelf food requiring only 24 h of incubation makes them the most convenient, least labor-intensive, live food available for aquaculture (Sorgeloos & Roubach, 2021). The nutritional value of Artemia, especially for marine organisms, is not constant, but varies both geographically and temporally. During the last decade, however, both the causes of Artemia nutritional variability and methods to improve poorquality Artemia have been identified (Loufi et al., 2024).
Brine shrimp (Artemia spp.) are used in marine aquaculture worldwide. Annually, more than 2,000 metric tons of dry cysts are used for cultivation of fish, crustacean, and shellfish larva. Brine shrimp are important to aquaculture because newly hatched brine shrimp nauplii (larvae) provide a food source for many fish fry (Mozanzadeh et al., 2021). Culture and harvesting of brine shrimp eggs represents another aspect of the aquaculture industry. Nauplii and metanauplii of Artemia, commonly known as brine shrimp, play a crucial role in aquaculture due to their nutritional value and suitability as live feed for many aquatic species, particularly in larval stages (Sorgeloos & Roubach, 2021).
The cost of acquiring information by natural selectionCarl Bergstrom
This is a short talk that I gave at the Banff International Research Station workshop on Modeling and Theory in Population Biology. The idea is to try to understand how the burden of natural selection relates to the amount of information that selection puts into the genome.
It's based on the first part of this research paper:
The cost of information acquisition by natural selection
Ryan Seamus McGee, Olivia Kosterlitz, Artem Kaznatcheev, Benjamin Kerr, Carl T. Bergstrom
bioRxiv 2022.07.02.498577; doi: https://doi.org/10.1101/2022.07.02.498577
The binding of cosmological structures by massless topological defectsSérgio Sacani
Assuming spherical symmetry and weak field, it is shown that if one solves the Poisson equation or the Einstein field
equations sourced by a topological defect, i.e. a singularity of a very specific form, the result is a localized gravitational
field capable of driving flat rotation (i.e. Keplerian circular orbits at a constant speed for all radii) of test masses on a thin
spherical shell without any underlying mass. Moreover, a large-scale structure which exploits this solution by assembling
concentrically a number of such topological defects can establish a flat stellar or galactic rotation curve, and can also deflect
light in the same manner as an equipotential (isothermal) sphere. Thus, the need for dark matter or modified gravity theory is
mitigated, at least in part.
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...University of Maribor
Slides from talk:
Aleš Zamuda: Remote Sensing and Computational, Evolutionary, Supercomputing, and Intelligent Systems.
11th International Conference on Electrical, Electronics and Computer Engineering (IcETRAN), Niš, 3-6 June 2024
Inter-Society Networking Panel GRSS/MTT-S/CIS Panel Session: Promoting Connection and Cooperation
https://www.etran.rs/2024/en/home-english/
ESPP presentation to EU Waste Water Network, 4th June 2024 “EU policies driving nutrient removal and recycling
and the revised UWWTD (Urban Waste Water Treatment Directive)”
Or: Beyond linear.
Abstract: Equivariant neural networks are neural networks that incorporate symmetries. The nonlinear activation functions in these networks result in interesting nonlinear equivariant maps between simple representations, and motivate the key player of this talk: piecewise linear representation theory.
Disclaimer: No one is perfect, so please mind that there might be mistakes and typos.
dtubbenhauer@gmail.com
Corrected slides: dtubbenhauer.com/talks.html
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...Sérgio Sacani
Context. With a mass exceeding several 104 M⊙ and a rich and dense population of massive stars, supermassive young star clusters
represent the most massive star-forming environment that is dominated by the feedback from massive stars and gravitational interactions
among stars.
Aims. In this paper we present the Extended Westerlund 1 and 2 Open Clusters Survey (EWOCS) project, which aims to investigate
the influence of the starburst environment on the formation of stars and planets, and on the evolution of both low and high mass stars.
The primary targets of this project are Westerlund 1 and 2, the closest supermassive star clusters to the Sun.
Methods. The project is based primarily on recent observations conducted with the Chandra and JWST observatories. Specifically,
the Chandra survey of Westerlund 1 consists of 36 new ACIS-I observations, nearly co-pointed, for a total exposure time of 1 Msec.
Additionally, we included 8 archival Chandra/ACIS-S observations. This paper presents the resulting catalog of X-ray sources within
and around Westerlund 1. Sources were detected by combining various existing methods, and photon extraction and source validation
were carried out using the ACIS-Extract software.
Results. The EWOCS X-ray catalog comprises 5963 validated sources out of the 9420 initially provided to ACIS-Extract, reaching a
photon flux threshold of approximately 2 × 10−8 photons cm−2
s
−1
. The X-ray sources exhibit a highly concentrated spatial distribution,
with 1075 sources located within the central 1 arcmin. We have successfully detected X-ray emissions from 126 out of the 166 known
massive stars of the cluster, and we have collected over 71 000 photons from the magnetar CXO J164710.20-455217.
4. Edge Computing
Fog Computing
Cloud Computing
IoT Analytics
Sections: Introduction, Continuous clustering, Elasticity in stream processing, Conclusions
6. 3-tier IoT Architecture
• Cloud tier: data centers
• Fog tier: nodes
• Edge tier: devices
7. Scope of the Thesis
• Design and implement analytics
The challenges
• Unbounded data
• Unpredictable data rate
• Various platforms
• Time requirements
Computational power (figure labels): High, Medium, Low
8. Scope of the Thesis
• Design and implement analytics
The challenges
• Unbounded data
• Unpredictable data rate
• Various platforms
• Time requirements
The objectives
• Continuous analysis
• Adaptive reconfiguration
• Hardware independent
• Efficient processing
9. Conventional Data Analytics (Batch processing)
[Diagram labels: Data, Database, Analysis, Results]
13. Stream Processing Operators
• Stateless
o E.g. filter
• Stateful
State is the memory of the operator
Tuple format: <ts,x>
Tuples shown: <3,1> <2,4> <1,3> <4,3>
14. Stream Processing Operators
• Stateless
o E.g. filter
• Stateful
o E.g. aggregate
State is the memory of the operator
Tuple format: <ts,x>
Tuples shown: <3,1> <2,4> <1,3> and <1,3> <4,3>
Window
Tuple shown after the window: <3,8>
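The two operator kinds can be sketched as follows. This is only an illustration: the predicate `x == 3` and the count-based tumbling window of three tuples are assumptions, chosen so that the outputs agree with the tuples visible on the slide. The function names are hypothetical.

```python
from collections import deque

# A stream tuple is <ts, x>: a timestamp and a value.
stream = [(1, 3), (2, 4), (3, 1), (4, 3)]


def filter_op(tuples, predicate):
    """Stateless operator: each tuple is handled on its own, nothing is remembered."""
    for tup in tuples:
        if predicate(tup):
            yield tup


def windowed_sum(tuples, size):
    """Stateful operator: the window contents are the operator's state (its memory)."""
    window = deque()
    for ts, x in tuples:
        window.append((ts, x))
        if len(window) == size:
            # Emit <ts of last tuple, sum of values>, then start a new window.
            yield (ts, sum(value for _, value in window))
            window.clear()


filtered = list(filter_op(stream, lambda t: t[1] == 3))  # assumed predicate
aggregated = list(windowed_sum(stream, size=3))          # assumed window size
```

With these assumptions the filter forwards <1,3> and <4,3>, and the aggregate emits <3,8> once the first three tuples fill the window.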
15. Outline
1. Introduction
• Motivation
• Thesis objectives
o Continuous analysis
o Adaptive reconfiguration
o Hardware independent
o Efficient processing
2. Continuous clustering
3. Elasticity in stream processing
4. Conclusions
16. LiDAR Point Cloud Clustering
[Figure: raw LiDAR data points in side view and top view; the figure marks a quantity 𝑑]
17. LiDAR Point Cloud Clustering
[Figure: raw LiDAR data points and the corresponding clustered data points]
18. Batch Clustering
1. Collect data points for one rotation
2. Store the points in a search-optimized data structure
3. Apply the clustering
Euclidean clustering
Parameters: 𝑚𝑖𝑛𝑃𝑡𝑠, 𝜖
*[Ester et al., Density-based, 1996] [Rusu et al., Semantic3D, 2010] [Rusu et al., pcl, 2011] [Patwary et al., DBSCAN, 2012]
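A minimal sketch of the three batch steps, under stated assumptions: points within distance 𝜖 end up in the same cluster, and 𝑚𝑖𝑛𝑃𝑡𝑠 is read as the minimum cluster size (that reading is an assumption, not taken from the slide). For brevity the neighbor search is brute force; a real implementation would use a search-optimized structure such as a k-d tree for step 2. The function name is hypothetical.

```python
import math
from collections import deque


def euclidean_clustering(points, eps, min_pts):
    """Group point indices so that neighbors within eps share a cluster."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        cluster = [seed]
        queue = deque([seed])
        while queue:
            i = queue.popleft()
            # Brute-force neighbor search (stand-in for a k-d tree query).
            near = [j for j in unvisited
                    if math.dist(points[i], points[j]) <= eps]
            for j in near:
                unvisited.remove(j)
                cluster.append(j)
                queue.append(j)
        # Assumed meaning of minPts: discard clusters that are too small.
        if len(cluster) >= min_pts:
            clusters.append(sorted(cluster))
    return sorted(clusters)


# Step 1: one "rotation" worth of points (made-up coordinates).
rotation = [(0, 0, 0), (0.5, 0, 0), (1.0, 0, 0),
            (10, 10, 10), (10.4, 10, 10), (50, 50, 50)]
# Steps 2 and 3: cluster the whole batch at once.
result = euclidean_clustering(rotation, eps=0.6, min_pts=2)
```

Here the isolated point at (50, 50, 50) is dropped because its cluster has fewer than 𝑚𝑖𝑛𝑃𝑡𝑠 members.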
19. Batch Clustering
1. Collect data points for one rotation
2. Store the points in a search-optimized data structure
3. Apply the clustering
Velodyne HDL-64E
• ~8 rotations per second
• Up to ~2.2 million points per second
Challenge?
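A quick back-of-the-envelope calculation from the approximate figures above hints at the challenge. The variable names are illustrative, and the numbers are the slide's approximate peak values, not measurements.

```python
rotations_per_second = 8         # approximate, from the slide
points_per_second = 2_200_000    # approximate upper bound, from the slide

# Points a batch approach must collect before clustering can even start.
points_per_rotation = points_per_second / rotations_per_second

# Time between two full rotations; to keep up, the batch pipeline has to
# store and cluster one rotation within roughly this period.
rotation_period_ms = 1000 / rotations_per_second
```

That is up to about 275,000 points per rotation and about 125 ms per rotation.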
20. Continuous Clustering
1. Collect data points for one rotation
2. Store the points in a search-optimized data structure
3. Apply the clustering
Lisco: continuous clustering while the data is being collected
• H. Najdataei, Y. Nikolakopoulos, V. Gulisano, M. Papatriantafilou. “Continuous and Parallel LiDAR Point-cloud Clustering”. The 38th International Conference on Distributed Computing Systems (ICDCS). IEEE, 2018.
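The general idea of clustering while points arrive, instead of after a full rotation, can be illustrated with a union-find structure. This is not Lisco's algorithm: it is a generic sketch with a brute-force neighbor check, and the class and method names are hypothetical.

```python
import math


class IncrementalClusterer:
    """Keeps clusters up to date as each point arrives (illustrative only)."""

    def __init__(self, eps):
        self.eps = eps
        self.points = []
        self.parent = []  # union-find forest over point indices

    def _find(self, i):
        # Find the cluster representative, compressing the path on the way.
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]
            i = self.parent[i]
        return i

    def add_point(self, p):
        idx = len(self.points)
        self.points.append(p)
        self.parent.append(idx)
        # Merge the new point with every earlier point within eps.
        for j in range(idx):
            if math.dist(p, self.points[j]) <= self.eps:
                self.parent[self._find(j)] = self._find(idx)
        return idx

    def clusters(self, min_pts=1):
        groups = {}
        for i in range(len(self.points)):
            groups.setdefault(self._find(i), []).append(i)
        return sorted(g for g in groups.values() if len(g) >= min_pts)


clusterer = IncrementalClusterer(eps=0.6)
for point in [(0, 0), (5, 5), (0.5, 0), (5.4, 5)]:
    clusterer.add_point(point)
partial = clusterer.clusters(min_pts=2)   # result available mid-stream
clusterer.add_point((1.0, 0))
updated = clusterer.clusters(min_pts=2)
```

The clustering can be queried at any time, so the work is spread over the collection period rather than deferred to the end of a rotation.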
29. Use case (1-Vehicle 1-Day)
GPS data over time t: ⟨𝑥1, 𝑦1⟩ ⟨𝑥2, 𝑦2⟩ ⟨𝑥3, 𝑦3⟩ ⟨𝑥4, 𝑦4⟩ ⟨𝑥5, 𝑦5⟩ ⟨𝑥6, 𝑦6⟩ ⟨𝑥7, 𝑦7⟩
Annotations: Heavy traffic; Exceeding speed limit
30. System Model
• Continuous bounded error approximation
• Compress volumes of data
• Utilize communication bandwidth
• Generalized form of Lisco
• Leverage the inherent ordering of spatial and temporal data
• B. Havers, R. Duvignau, H. Najdataei, V. Gulisano, A. Chaitanya Koppisetty, M. Papatriantafilou. “DRIVEN: a framework for efficient Data Retrieval and clustering in Vehicular Networks”. The 35th International Conference on Data Engineering (ICDE). IEEE, 2019.
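To illustrate what a bounded error approximation can look like, here is a greedy piecewise-constant compressor. It is a generic sketch and not DRIVEN's actual method: the function names and the segment format are assumptions. Every reconstructed value differs from the original by at most `eps`.

```python
def compress(values, eps):
    """Return segments (start_index, length, representative_value)."""
    segments = []
    start, lo, hi = 0, None, None
    for i, v in enumerate(values):
        if lo is None:
            start, lo, hi = i, v, v
            continue
        new_lo, new_hi = min(lo, v), max(hi, v)
        if new_hi - new_lo > 2 * eps:
            # The value no longer fits: close the segment at its midpoint.
            segments.append((start, i - start, (lo + hi) / 2))
            start, lo, hi = i, v, v
        else:
            lo, hi = new_lo, new_hi
    if lo is not None:
        segments.append((start, len(values) - start, (lo + hi) / 2))
    return segments


def decompress(segments):
    """Rebuild an approximation of the original sequence."""
    out = []
    for _, length, value in segments:
        out.extend([value] * length)
    return out


readings = [1.0, 1.1, 0.9, 5.0, 5.2, 5.1]   # made-up sensor values
segments = compress(readings, eps=0.2)
restored = decompress(segments)
```

Six readings shrink to two segments here, which is the kind of saving that reduces the volume of data sent over a limited communication link.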
33. Stream Processing Performance
• Throughput
Number of tuples processed per time unit
34. Stream Processing Performance
• Throughput
• Latency
Time difference between receiving a tuple and producing the corresponding results
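Both metrics follow directly from their definitions. The sketch below uses made-up timestamps, and the function names are hypothetical.

```python
def throughput(num_tuples, elapsed_seconds):
    """Tuples processed per time unit (here: per second)."""
    return num_tuples / elapsed_seconds


def latencies(events):
    """events holds (receive_time, result_time) pairs, in seconds."""
    return [result - received for received, result in events]


# Made-up measurements for three tuples observed over a 3-second run.
events = [(0.0, 0.5), (1.0, 1.25), (2.0, 2.75)]
per_tuple_latency = latencies(events)
average_latency = sum(per_tuple_latency) / len(per_tuple_latency)
tuples_per_second = throughput(len(events), elapsed_seconds=3.0)
```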
40. STRETCH Framework
Components:
• State manager
• Virtual shared-nothing parallelism
• H. Najdataei, Y. Nikolakopoulos, M. Papatriantafilou, P. Tsigas, V. Gulisano. “STRETCH: Scalable and Elastic Deterministic Streaming Analysis with Virtual Shared-Nothing Parallelism”. To appear in the 13th International Conference on Distributed and Event-Based Systems (DEBS). ACM, 2019.
43. ScaleGate
[Diagram: two sources add tuples (t); two readers retrieve the tuples that are ready to be retrieved by readers]
• Methods
o addTuple(tuple, sourceID)
o getNextReadyTuple(readerID)
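The two methods can be illustrated with a single-threaded sketch. It is not the actual ScaleGate, which is a concurrent data structure, and it rests on two assumptions that are not stated on the slide: each source adds tuples in non-decreasing timestamp order, and a tuple becomes ready once every source has added a tuple with an equal or greater timestamp. The class name and the snake_case method names are hypothetical stand-ins for addTuple and getNextReadyTuple.

```python
import heapq


class SimpleScaleGate:
    """Single-threaded illustration of the addTuple/getNextReadyTuple interface."""

    def __init__(self, source_ids, reader_ids):
        self.latest_ts = {s: None for s in source_ids}
        self.pending = []   # min-heap of (ts, arrival order, tuple)
        self.ready = []     # ready tuples, in timestamp order
        self.cursor = {r: 0 for r in reader_ids}
        self.arrivals = 0

    def add_tuple(self, tup, source_id):
        ts = tup[0]
        self.latest_ts[source_id] = ts
        heapq.heappush(self.pending, (ts, self.arrivals, tup))
        self.arrivals += 1
        if any(v is None for v in self.latest_ts.values()):
            return  # some source has not delivered anything yet
        # Assumed readiness rule: no source can still deliver an earlier tuple.
        bound = min(self.latest_ts.values())
        while self.pending and self.pending[0][0] <= bound:
            self.ready.append(heapq.heappop(self.pending)[2])

    def get_next_ready_tuple(self, reader_id):
        """Each reader sees every ready tuple, in the same order."""
        pos = self.cursor[reader_id]
        if pos >= len(self.ready):
            return None
        self.cursor[reader_id] = pos + 1
        return self.ready[pos]


gate = SimpleScaleGate(["s1", "s2"], ["r1", "r2"])
gate.add_tuple((1, "a"), "s1")
nothing_yet = gate.get_next_ready_tuple("r1")   # s2 has not added a tuple
gate.add_tuple((2, "b"), "s2")
gate.add_tuple((3, "c"), "s1")
seen_by_r1 = [gate.get_next_ready_tuple("r1") for _ in range(3)]
seen_by_r2 = [gate.get_next_ready_tuple("r2") for _ in range(2)]
```

In this run the tuple with timestamp 3 stays pending, because source s2 could still add a tuple with a smaller timestamp.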
44. Elastic ScaleGate
• Methods
o addTuple(tuple, sourceID)
o getNextReadyTuple(readerID)
• Additional methods
o announceReaders(List reader_IDs, rID)
o removeReaders(List reader_IDs)
o announceSources(List source_IDs, min_ts)
o removeSources(List source_IDs)
[Diagram: two sources add tuples (t); two readers retrieve the tuples that are ready to be retrieved by readers]