How can we learn from the data of small ubiquitous systems? Do we need to send the data to a server or cloud and do all learning there? Or can we learn on some small devices directly? Are smartphones small? Are navigation systems small? How complex is learning allowed to be in times of big data? What about graphical models? Can they be applied on small devices or even learned on restricted processors?
Big data are produced by various sources. Most often, they are stored in a distributed fashion at computing farms or clouds. Analytics on the Hadoop Distributed File System (HDFS) then follows the MapReduce programming model. According to the Lambda architecture of Nathan Marz and James Warren, this is the batch layer. It is complemented by the speed layer, which aggregates and integrates incoming data streams in real time. When considering big data and small devices, we naturally imagine the small devices hosting only the speed layer. Analytics on the small devices is restricted by memory and computation resources.
The interplay of streaming and batch analytics offers a multitude of configurations. In this talk, we discuss opportunities for learning sophisticated spatio-temporal models. In particular, we investigate graphical models, which generate the probabilities for connected (sensor) nodes. First, we present spatio-temporal random fields that take as input data from small devices, are computed at a server, and send results to (possibly different) small devices. Second, we go even further: the Integer Markov Random Field approximates the likelihood estimates such that it can be computed on small devices. We illustrate our learning models with applications from traffic management.
MediaEval 2014: THU-HCSIL Approach to Emotion in Music Task using Multi-level...multimediaeval
This working notes paper describes the system proposed by the THU-HCSIL team for dynamic music emotion recognition. The procedure is divided into two modules: feature extraction and regression. Both feature selection and feature combination are used to form the final THU feature set. In the regression module, a Booster-based Multi-level Regression method is presented, which significantly outperforms the baseline on the test data in the RMSE metric for the dynamic task.
http://ceur-ws.org/Vol-1263/mediaeval2014_submission_15.pdf
Uav route planning for maximum target coveragecseij
Utilization of Unmanned Aerial Vehicles (UAVs) in military and civil operations is becoming popular. One of the challenges in effectively tasking these expensive vehicles is planning the flight routes to monitor the targets. In this work, we aim to develop an algorithm which produces routing plans for a limited number of UAVs to cover the maximum number of targets, considering their flight range. The proposed solution for this practical optimization problem is designed by modifying the Max-Min Ant System (MMAS) algorithm. To evaluate the success of the proposed method, an alternative approach based on the Nearest Neighbour (NN) heuristic has been developed as well. The results showed the success of the proposed MMAS method by increasing the number of covered targets compared to the solution based on the NN heuristic.
Solving real world delivery problem using improved max-min ant system with lo...ijaia
This paper presents a solution to real-world delivery problems (RWDPs) for home delivery services where a large number of roads exist in cities and the traffic on the roads rapidly changes with time. The methodology for finding the shortest-travel-time tour includes a hybrid meta-heuristic that combines ant colony optimization (ACO) with Dijkstra's algorithm, a search technique that uses both real-time traffic and predicted traffic, and a way to use a real-world road map and measured traffic in Japan. We previously proposed a hybrid ACO for RWDPs that used a MAX-MIN Ant System (MMAS) and proposed a method to improve the search rate of MMAS. Since traffic on roads changes with time, the search rate is important in RWDPs. In the current work, we combine the hybrid ACO method with the improved MMAS. Experimental results using a map of central Tokyo and historical traffic data indicate that the proposed method can find a better solution than conventional methods.
Exact Data Reduction for Big Data by Jieping YeBigMine
Recent technological innovations have enabled data collection of unprecedented size and complexity. Examples include web text data, social media data, gene expression images, neuroimages, and genome-wide association study (GWAS) data. Such data have incredible potential to address complex scientific and societal questions; however, analysis of these data poses major challenges for scientists. As an emerging and powerful tool for analyzing massive collections of data, data reduction in terms of the number of variables and/or the number of samples has attracted tremendous attention in the past few years and has achieved great success in a broad range of applications. The intuition behind data reduction is the observation that many real-world data with complex structures and billions of variables and/or samples can usually be well explained by a few of the most relevant explanatory features and/or samples. Most existing methods for data reduction are based on sampling or random projection, and the final model based on the reduced data is an approximation of the true (original) model. In this talk, I will present fundamentally different approaches to data reduction in which there is no approximation in the model, that is, the final model constructed from the reduced data is identical to the original model constructed from the complete data. Finally, I will use several real-world examples to demonstrate the potential of exact data reduction for analyzing big data.
Social Media Strategies for Powerful Communicationscourtneymbarnes
Teleseminar I presented for PRSA on the topic of the book I co-authored with Paul Argenti, "Digital Strategies for Powerful Corporate Communications," which is being published by McGraw-Hill in August 2009.
Self-Similarity in Complex Networks. Building on a paper by O'Shanker with the ultimate idea of applying fractal dimension to characterize the anatomy of a website.
A Predictive Stock Data Analysis with SVM-PCA Model (p. 1)
Divya Joseph and Vinai George Biju
HOV-kNN: A New Algorithm to Nearest Neighbor Search in Dynamic Space (p. 12)
Mohammad Reza Abbasifard, Hassan Naderi and Mohadese Mirjalili
A Survey on Mobile Malware: A War without End (p. 23)
Sonal Mohite and Prof. R. S. Sonar
An Efficient Design Tool to Detect Inconsistencies in UML Design Models (p. 36)
Mythili Thirugnanam and Sumathy Subramaniam
An Integrated Procedure for Resolving Portfolio Optimization Problems using Data Envelopment Analysis, Ant Colony Optimization and Gene Expression Programming (p. 45)
Chih-Ming Hsu
Emerging Technologies: LTE vs. WiMAX (p. 66)
Mohammad Arifin Rahman Khan and Md. Sadiq Iqbal
Introducing E-Maintenance 2.0 (p. 80)
Abdessamad Mouzoune and Saoudi Taibi
Detection of Clones in Digital Images (p. 91)
Minati Mishra and Flt. Lt. Dr. M. C. Adhikary
The Significance of Genetic Algorithms in Search, Evolution, Optimization and Hybridization: A Short Review (p. 103)
DESIGN OF QUATERNARY LOGICAL CIRCUIT USING VOLTAGE AND CURRENT MODE LOGICVLSICS Design
In VLSI technology, designers' main concerns have been the area required and the performance of the device. In VLSI design, power consumption is one of the major concerns due to the continuous increase in chip density, the shrinking size of CMOS circuits, and the frequency at which circuits operate. Considering these parameters, logical circuits are designed using quaternary voltage mode logic and quaternary current mode logic. The power consumption of quaternary voltage mode logic is 51.78% less than that of binary. The area, in terms of the number of transistors, required for quaternary voltage mode logic is 3 times more than that of binary. The quaternary voltage mode circuit requires a larger area than the quaternary current mode circuit, but its power consumption is lower than that of the quaternary current mode circuit.
Representing Simplicial Complexes with MangrovesDavid Canino
These slides have been presented at the 22nd International Meshing Roundtable, Orlando, FL, USA. They describe our GPL software, the Mangrove TDS Library: http://mangrovetds.sourceforge.net. It is a C++ tool for the fast prototyping of topological data structures, representing dynamically simplicial and cell complexes.
ScaleGraph - A High-Performance Library for Billion-Scale Graph AnalyticsToyotaro Suzumura
Please cite the following paper:
Toyotaro Suzumura and Koji. Ueno, "ScaleGraph: A high-performance library for billion-scale graph analytics," 2015 IEEE International Conference on Big Data (Big Data), Santa Clara, CA, 2015, pp. 76-84.
doi: 10.1109/BigData.2015.7363744
Recently, large-scale graph analytics has become a very popular topic owing to the emergence of gigantic graphs whose number of vertices and edges is in millions, billions or even trillions. Many graph analytics libraries and frameworks have been proposed with various computational models and programming languages to deal with such graphs. X10 programming language is a PGAS language that aims at both software performance and programmer's productivity. We introduce ScaleGraph library developed using X10 programming to illustrate the use of X10 for large-scale graph analytics. ScaleGraph library provides XPregel framework that is inspired by Google's Pregel computation model, serving as a building block for implementing graph kernels. We also optimized X10 runtime in some parts such as collective communication and memory management. We evaluated the performance and scalability of ScaleGraph libraries. The result shows that most graph kernels have good performance and scalability. ScaleGraph library is 9.4 times faster than Giraph in the experiment of PageRank with 16 machine nodes. To the best of our knowledge, ScaleGraph is the first X10-based library to address performance, scalability and productivity issues in dealing with large-scale graph analytics.
We present a causal speech enhancement model working on the raw waveform that runs in real-time on a laptop CPU. The proposed model is based on an encoder-decoder architecture with skip-connections. It is optimized on both time and frequency domains, using multiple loss functions. Empirical evidence shows that it is capable of removing various kinds of background noise including stationary and non-stationary noises, as well as room reverb. Additionally, we suggest a set of data augmentation techniques applied directly on the raw waveform which further improve model performance and its generalization abilities. We perform evaluations on several standard benchmarks, both using objective metrics and human judgements. The proposed model matches state-of-the-art performance of both causal and non-causal methods while working directly on the raw waveform.
Index Terms: Speech enhancement, speech denoising, neural networks, raw waveform
Adaptive blind multiuser detection under impulsive noise using principal comp...csandit
In this paper we consider blind signal detection for an asynchronous code division multiple access (CDMA) system with principal component analysis (PCA) in impulsive noise. The blind multiuser detector requires no training sequences compared with the conventional multiuser detection receiver. The proposed PCA blind multiuser detector is robust when compared with detectors that require knowledge of the signature waveforms and the timing of the user of interest. PCA is a statistical method for reducing the dimension of a data set via spectral decomposition of the covariance matrix of the data set, i.e., first and second order statistics are estimated. PCA makes no assumption on the independence of the data vectors. It searches for linear combinations with the largest variances and, when several linear combinations are needed, it considers variances in decreasing order of importance. PCA improves the SNR of signals used for differential side channel analysis. In contrast to other approaches, the linear minimum mean-square-error (MMSE) detector is obtained blindly; the detector does not use any training sequence, as subspace methods do, to detect the multiuser receiver. The algorithm need not estimate the subspace rank, which reduces the computational complexity. Simulation results show that the new algorithm offers substantial performance gains over the traditional subspace methods.
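The dimension-reduction step the abstract describes (estimating first and second order statistics, taking the spectral decomposition of the covariance matrix, and keeping the directions of largest variance) can be illustrated with a few lines of generic PCA. This is only a hedged sketch of that step on synthetic data, not the authors' blind multiuser detector; the data shape and the number of retained components are assumptions.

```python
import numpy as np

# Generic PCA via spectral decomposition of the covariance matrix.
# Synthetic data and k are illustrative assumptions.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 8)) @ rng.standard_normal((8, 8))  # 500 samples, 8 dims

X_centered = X - X.mean(axis=0)                 # first-order statistics
cov = np.cov(X_centered, rowvar=False)          # second-order statistics
eigvals, eigvecs = np.linalg.eigh(cov)          # spectral decomposition

order = np.argsort(eigvals)[::-1]               # variances in decreasing order
k = 3                                           # keep the k largest components
components = eigvecs[:, order[:k]]
X_reduced = X_centered @ components             # projected, lower-dimensional data
print(X_reduced.shape)                          # (500, 3)
```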
This presentation introduces Deep Learning (DL) concepts, such as neural networks, backprop, activation functions, and Convolutional Neural Networks, followed by an Angular application that uses TypeScript in order to replicate the Tensorflow playground.
Similar to Big Data and Small Devices by Katharina Morik (20)
Inside the Atoms: Mining a Network of Networks and Beyond by HangHang Tong at...BigMine
Networks (i.e., graphs) appear in many high-impact applications. Often these networks are collected from different sources, at different times, at different granularities. In this talk, I will present our recent work on mining such multiple networks. First, we will present two models: one on modeling a set of inter-connected networks (NoN), and the other on modeling a set of inter-connected co-evolving time series (NoT). For both models, we will show that by treating networks as context, we are able to model more complicated real-world applications. Second, we will present some algorithmic examples on how to do mining with such new models, including ranking, imputation and prediction. Finally, we will demonstrate the effectiveness of our new models and algorithms in some applications, including bioinformatics and sensor networks.
From Practice to Theory in Learning from Massive Data by Charles Elkan at Big...BigMine
This talk will discuss examples of how Amazon applies machine learning to large-scale data, and open research questions inspired by these applications. One important question is how to distinguish between users that can be influenced, versus those who are merely likely to respond. Another question is how to measure and maximize the long-term benefit of movie and other recommendations. A third question is how to share data while provably protecting the privacy of users. Note: Information in the talk is already public, and opinions expressed will be strictly personal.
Foundations for Scaling ML in Apache Spark by Joseph Bradley at BigMine16BigMine
Apache Spark has become the most active open source Big Data project, and its Machine Learning library MLlib has seen rapid growth in usage. A critical aspect of MLlib and Spark is the ability to scale: the same code used on a laptop can scale to 100’s or 1000’s of machines. This talk will describe ongoing and future efforts to make MLlib even faster and more scalable by integrating with two key initiatives in Spark. The first is Catalyst, the query optimizer underlying DataFrames and Datasets. The second is Tungsten, the project for approaching bare-metal speeds in Spark via memory management, cache-awareness, and code generation. This talk will discuss the goals, the challenges, and the benefits for MLlib users and developers. More generally, we will reflect on the importance of integrating ML with the many other aspects of big data analysis.
About MLlib: MLlib is a general Machine Learning library providing many ML algorithms, feature transformers, and tools for model tuning and building workflows. The library benefits from integration with the rest of Apache Spark (SQL, streaming, Graph, core), which facilitates ETL, streaming, and deployment. It is used in both ad hoc analysis and production deployments throughout academia and industry.
Processing Reachability Queries with Realistic Constraints on Massive Network...BigMine
Massive graphs are ubiquitous in various application domains, such as social networks, road networks, communication networks, biological networks, RDF graphs, and so on. Such graphs are massive (for example, with hundreds of millions of nodes and edges or even more) and contain rich information (for example, node/edge weights, labels and textual contents). In such massive graphs, an important class of problems is to process various graph structure related queries. Graph reachability, as an example, asks whether a node can reach another in a graph. However, the large graph scale presents new challenges for efficient query processing.
In this talk, I will introduce two new yet important types of graph reachability queries: weight constraint reachability, which imposes an edge weight constraint on the answer path, and k-hop reachability, which imposes a length constraint on the answer path. With such realistic constraints, we can find more meaningful and practically feasible answers. These two reachability queries have wide applications in many real-world problems, such as QoS routing and trip planning.
Challenging Problems for Scalable Mining of Heterogeneous Social and Informat...BigMine
In today’s interconnected real world, social and informational entities are interconnected, forming gigantic, interconnected, integrated social and information networks. By structuring these data objects into multiple types, such networks become semi-structured heterogeneous social and information networks. Most real world applications that handle big data, including interconnected social media and social networks, medical information systems, online e-commerce systems, or database systems, can be structured into typed, heterogeneous social and information networks. For example, in a medical care network, objects of multiple types, such as patients, doctors, diseases, medication, and links such as visits, diagnosis, and treatments are intertwined together, providing rich information and forming heterogeneous information networks. Effective analysis of large-scale heterogeneous social and information networks poses an interesting but critical challenge.
In this talk, we present a set of data mining scenarios in heterogeneous social and information networks and show that mining typed, heterogeneous networks is a new and promising research frontier in data mining research. However, such mining may raise some serious and challenging problems of scalable computation. We identify a set of problems on scalable computation and call for serious studies on such problems. These include how to efficiently compute (1) meta path-based similarity search, (2) rank-based clustering, (3) rank-based classification, (4) meta path-based link/relationship prediction, and (5) topical hierarchies from heterogeneous information networks. We introduce some recent efforts, discuss the trade-offs between query-independent pre-computation vs. query-dependent online computation, and point out some promising research directions.
Big & Personal: the data and the models behind Netflix recommendations by Xa...BigMine
Since the Netflix $1 million Prize, announced in 2006, our company has been known for having personalization at the core of our product. Even at that point in time, the dataset that we released was considered “large”, and we stirred innovation in the (Big) Data Mining research field. Our current product offering is now focused around instant video streaming, and our data is now many orders of magnitude larger. Not only do we have many more users in many more countries, but we also receive many more streams of data. Besides the ratings, we now also use information such as what our members play, browse, or search.
In this talk I will discuss the different approaches we follow to deal with these large streams of data in order to extract information for personalizing our service. I will describe some of the machine learning models used, as well as the architectures that allow us to combine complex offline batch processes with real-time data streams.
Large Graph Mining – Patterns, tools and cascade analysis by Christos FaloutsosBigMine
What do graphs look like? How do they evolve over time? How does influence/news/viruses propagate, over time? We present a long list of static and temporal laws, and some recent observations on real graphs. We show that fractals and self-similarity can explain several of the observed patterns, and we conclude with cascade analysis and a surprising result on virus propagation and immunization.
Unexpected Challenges in Large Scale Machine Learning by Charles ParkerBigMine
Talk by Charles Parker (BigML) at BigMine12 at KDD12.
In machine learning, scale adds complexity. The most obvious consequence of scale is that data takes longer to process. At certain points, however, scale makes trivial operations costly, thus forcing us to re-evaluate algorithms in light of the complexity of those operations. Here, we will discuss one important way a general large-scale machine learning setting may differ from the standard supervised classification setting and show the results of some preliminary experiments highlighting this difference. The results suggest that there is potential for significant improvement beyond obvious solutions.
Big Data Analytics: Applications and Opportunities in On-line Predictive Mode...BigMine
Talk by Usama Fayyad at BigMine12 at KDD12.
Virtually all organizations are having to deal with Big Data in many contexts: marketing, operations, monitoring, performance, and even financial management. Big Data is characterized not just by its size, but by its Velocity and its Variety, for which keeping up with the data flux, let alone its analysis, is challenging at best and impossible in many cases. In this talk I will cover some of the basics in terms of infrastructure and design considerations for effective and efficient Big Data. In many organizations, the lack of consideration of effective infrastructure and data management leads to unnecessarily expensive systems for which the benefits are insufficient to justify the costs. We will refer to example frameworks and clarify the kinds of operations where Map-Reduce (Hadoop and its derivatives) is appropriate and the situations where other infrastructure is needed to perform segmentation, prediction, analysis, and reporting appropriately, these being the fundamental operations in predictive analytics. We will then pay specific attention to on-line data and the unique challenges and opportunities represented there. We cover examples of Predictive Analytics over Big Data with case studies in eCommerce marketing, on-line publishing and recommendation systems, and advertising targeting. Special focus will be placed on the analysis of on-line data with applications in Search, Search Marketing, and targeting of advertising. We conclude with some technical challenges as well as the solutions that can be used to address these challenges in social network data.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD with in UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
Communications Mining Series - Zero to Hero - Session 1DianaGray10
This session provides an introduction to UiPath Communication Mining, its importance and a platform overview. You will acquire a good understanding of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
UiPath Test Automation using UiPath Test Suite series, part 6DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
UiPath Test Automation with generative AI and Open AI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with Open AI advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features
available on those devices, but many of the features provide convenience and capability but sacrifice security. This best practices guide outlines steps the users can take to better protect personal devices and information.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
A tale of scale & speed: How the US Navy is enabling software delivery from l...sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party and will share these foundational concepts to build on:
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Big Data and Small Devices by Katharina Morik
1. Big Data and Small Devices
Katharina Morik, Artificial Intelligence,
TU Dortmund University, Germany
2. Overview
! Big data -- small devices
! Graphical Models
! Computation on the batch
layer
! Computation on the speed
layer
! Integer Random Fields
! Spatio-Temporal Random Fields
! Traffic Scenarios
! Using the streams framework
3. Big data – small devices
! Machine learning for the
enhancement of small
devices
! Machine learning on small
devices
! Reduce
! Computational
complexity
! Memory consumption
! Communication
! Energy
4. Overview
! Big data -- small devices
! Graphical Models
! Computation on the batch
layer
! Computation on the speed
layer
! Integer Random Fields
! Spatio-Temporal Random Fields
! Traffic Scenarios
! Using the streams framework
5. Graphical Models
! Graph G = (V, E)
! Multi-variate random variable X with finite discrete domain 𝒳
! Mapping of a joint vertex assignment into a vector space: φ(x): 𝒳 → R^d
! Parameter to be learned: θ ∈ R^d
CRF:
p(y | x) = 1/Z(θ, x) · exp( Σ_i θ_i φ_i(y, x) )
         = exp( Σ_i θ_i φ_i(y, x) − ln Z(θ, x) )        (using 1/a = exp(−ln a))
         = exp( ⟨θ, φ(y, x)⟩ − A(θ, x) )
MRF:
p(x) = 1/Z(θ) · exp( Σ_i θ_i φ_i(x) )
     = exp( ⟨θ, φ(x)⟩ − A(θ) )
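To make the log-linear form above concrete, here is a minimal sketch that evaluates the MRF p(x) = exp(⟨θ, φ(x)⟩ − A(θ)) on a toy two-vertex graph by brute-force enumeration. The feature map and the θ values are illustrative assumptions, not parameters from the talk.

```python
import math
from itertools import product

states = [0, 1]

def phi(x):
    a, b = x
    # vertex indicator features followed by edge indicator features
    vertex = [a == 0, a == 1, b == 0, b == 1]
    edge = [(a, b) == s for s in product(states, repeat=2)]
    return [float(f) for f in vertex + edge]

theta = [0.5, -0.5, 0.2, -0.2, 1.0, 0.0, 0.0, 1.0]   # illustrative; favors a == b

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

# log-partition function A(theta) = ln sum_x exp(<theta, phi(x)>)
A = math.log(sum(math.exp(dot(theta, phi(x)))
                 for x in product(states, repeat=2)))

# p(x) = exp(<theta, phi(x)> - A(theta)); the probabilities sum to 1
p = {x: math.exp(dot(theta, phi(x)) - A) for x in product(states, repeat=2)}
print(p)
```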
12. Belief propagation example
m_{v→u}(y) = Σ_{x ∈ 𝒳_v} exp( θ_{vu=xy} + θ_{v=x} ) · Π_{w ∈ N(v)\{u}} m_{wu}(x)
Example: three connected nodes with states Lights {red, yellow, green}, Rain {dry, wet}, Jam {jam, free}.
Compute this for each node u, each of its neighbors, and each of the states.
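The following is a small sum-product sketch of the message recursion above on the Lights/Rain/Jam example. The edge set and all θ values are illustrative assumptions; on this small tree a few synchronous sweeps are enough.

```python
import math
from itertools import product

# Minimal sum-product message passing for the Lights/Rain/Jam example.
# Graph, state spaces, and theta values below are illustrative assumptions.
states = {"Lights": ["red", "yellow", "green"],
          "Rain":   ["dry", "wet"],
          "Jam":    ["jam", "free"]}
edges = [("Lights", "Jam"), ("Rain", "Jam")]          # star around "Jam"
theta_node = {(v, x): 0.1 for v in states for x in states[v]}
theta_edge = {(v, u, x, y): 0.2 for v, u in edges
              for x, y in product(states[v], states[u])}

def neighbors(v):
    return [u for a, b in edges for u, w in ((b, a), (a, b)) if w == v]

def message(v, u, msgs):
    """m_{v->u}(y) = sum_x exp(theta_{vu=xy} + theta_{v=x}) * prod over incoming messages at v."""
    out = {}
    for y in states[u]:
        total = 0.0
        for x in states[v]:
            th = theta_edge.get((v, u, x, y), theta_edge.get((u, v, y, x), 0.0))
            prod_in = 1.0
            for w in neighbors(v):
                if w != u:
                    prod_in *= msgs[(w, v)][x]
            total += math.exp(th + theta_node[(v, x)]) * prod_in
        out[y] = total
    return out

# initialize all messages to 1 and iterate a few sweeps (enough on a tree)
msgs = {(v, u): {x: 1.0 for x in states[u]}
        for v in states for u in neighbors(v)}
for _ in range(3):
    msgs = {(v, u): message(v, u, msgs) for (v, u) in msgs}

# unnormalized belief of "Jam", then normalize
belief = {x: math.exp(theta_node[("Jam", x)]) *
             math.prod(msgs[(w, "Jam")][x] for w in neighbors("Jam"))
          for x in states["Jam"]}
Z = sum(belief.values())
print({x: b / Z for x, b in belief.items()})
```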
13. Parallel Belief Propagation (GPU)
! Data-parallel: all node and neighbor pairs become blocks with b threads. A thread computes the message v → u.
! Thread-cooperative: all node and instance pairs become blocks. A thread computes partial outgoing messages, which are cooperatively processed by all threads in a block. Partial messages are merged afterwards.
! Nico Piatkowski (2011) Parallel Algorithms for GPU Accelerated Probabilistic Inference
! http://biglearn.org/2011/files/papers/biglearn2011_submission_28.pdf
14. File Prefetching for Smartphone Energy Saving
! Accessing files that are stored in cache
consumes less energy than accessing those on the hard disc.
! Prefetching files which soon will be accessed
allows for grouping requests to a batched one. This saves energy.
" Prediction of file access saves energy!
Given a system call:
1,open,1812,179,178,201,200,firefox,/etc/hosts,524288,438,7
Predict: hosts full
Prefetch: hosts
2,read,1812,179,178,201,200,firefox,/etc/hosts,4096,361
3,read,1812,179,178,201,200,firefox,/etc/hosts,4096,0
4,close,1812,179,178,201,200,firefox,/etc/hosts
Format: timestamp, syscall, thread-id, process-id, parent, user, group, exec, file, parameters (optional)
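To make the prediction task concrete, here is a minimal sketch that parses system-call records in the format above and guesses which file will be accessed next, so it can be prefetched. The "most frequent successor" heuristic is an assumption for illustration, not the Naive Bayes or CRF models evaluated in the talk.

```python
from collections import Counter, defaultdict

FIELDS = ["timestamp", "syscall", "thread", "process", "parent",
          "user", "group", "exec", "file"]

def parse(line):
    parts = line.strip().split(",")
    rec = dict(zip(FIELDS, parts[:len(FIELDS)]))
    rec["params"] = parts[len(FIELDS):]          # optional parameters
    return rec

log = [
    "1,open,1812,179,178,201,200,firefox,/etc/hosts,524288,438,7",
    "2,read,1812,179,178,201,200,firefox,/etc/hosts,4096,361",
    "3,read,1812,179,178,201,200,firefox,/etc/hosts,4096,0",
    "4,close,1812,179,178,201,200,firefox,/etc/hosts",
]

# count which file tends to follow which (exec, syscall, file) context
successors = defaultdict(Counter)
records = [parse(l) for l in log]
for prev, nxt in zip(records, records[1:]):
    key = (prev["exec"], prev["syscall"], prev["file"])
    successors[key][nxt["file"]] += 1

def predict_next(record):
    key = (record["exec"], record["syscall"], record["file"])
    ranked = successors[key].most_common(1)
    return ranked[0][0] if ranked else None      # file to prefetch

print(predict_next(records[0]))                  # -> /etc/hosts
```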
15. Results: Memory
! Naive Bayes models use less memory than those of regular CRF.
! Implementing CRF on GPGPU reduces memory needs.
! CRF on GPU scales best.
[Chart: memory usage of NB, CRF, and GPU CRF models as the number of instances grows from 67k to 606k]
Felix Jungermann, Nico Piatkowski, Katharina
Morik (2010) Enhancing Ubiquitous Systems
Through System Call Mining, LACIS at ICDM
http://www.computer.org/csdl/proceedings/
icdmw/2010/4257/00/4257b338-abs.html
16. Overview
! Big data -- small devices
! Graphical Models
! Computation on the batch
layer
! Computation on the speed
layer
! Integer Random Fields
! Spatio-Temporal Random Fields
! Traffic Scenarios
! Using the streams framework
17. Graphical models on resource-restricted processors
! Floating point arithmetic (real) costs more clock cycles than integer arithmetic.
! “The most obvious technique
to conserve power is to
reduce the number of cycles
it takes to complete a
workload.” (Intel 64, IA-32
architectures optimization
reference manual, guidelines
for extending battery life).
! Restrict the parameter space of the MRF: θ ∈ {0, 1, ..., K} ⊂ ℕ
Clock cycles for arithmetic on different processors (real vs. integer):
Operation    Sandy Bridge (Real / Int)    ARM 11 (Real / Int)
+            3 / 1                        8 / 1
*            5 / 3                        8 / 4-5
/            14 / 13-15                   19 / -
Bit shift    - / 3                        - / 2
18. Parameter space transformation
! Graph model tree-structured
! Transform the parameter space: η_i(θ) = θ_i ln 2
MRF:
p(x) = 1/Z(θ) · exp( Σ_i θ_i φ_i(x) ) = exp( ⟨θ, φ(x)⟩ − A(θ) )
Integer MRF:
p(x) = exp( ⟨η(θ), φ(x)⟩ − A(η(θ)) ) = 2^( ⟨θ, φ(x)⟩ − A(η(θ)) ) = 2^⟨θ, φ(x)⟩ / Σ_{y ∈ 𝒳} 2^⟨θ, φ(y)⟩
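A quick numeric check of the reparametrization: with η_i(θ) = θ_i ln 2, the base-e model exp(⟨η(θ), φ(x)⟩) and the base-2 model 2^⟨θ, φ(x)⟩ define the same distribution after normalization. The random θ and φ values are illustrative only.

```python
import math, random

random.seed(0)
d, n_states = 4, 5
theta = [random.uniform(-1, 1) for _ in range(d)]
phis = [[random.randint(0, 1) for _ in range(d)] for _ in range(n_states)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

eta = [t * math.log(2) for t in theta]                  # eta_i = theta_i * ln 2

p_exp = [math.exp(dot(eta, phi)) for phi in phis]       # exp(<eta, phi(x)>)
p_two = [2 ** dot(theta, phi) for phi in phis]          # 2^<theta, phi(x)>

Z_exp, Z_two = sum(p_exp), sum(p_two)
for a, b in zip(p_exp, p_two):
    assert math.isclose(a / Z_exp, b / Z_two)           # same distribution
print([round(a / Z_exp, 4) for a in p_exp])
```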
19. Integer belief propagation
! Simply replacing the exp(·) by 2^(·) is not sufficient.
! Overflows are normally avoided by normalization.
! Normalization is impossible in integer division.
! The magnitude of a message corresponds to a probability.
! Use the length of each message.
! The bit-length is similar to a log.
m_{v→u}(y) = Σ_{x ∈ 𝒳_v} exp( θ_{vu=xy} + θ_{v=x} ) · Π_{w ∈ N(v)\{u}} m_{wu}(x)
m̃_{v→u}(y) = Σ_{x ∈ 𝒳_v} 2^( θ_{vu=xy} + θ_{v=x} ) · Π_{w ∈ N(v)\{u}} m̃_{wu}(x)
β_{v→u}(y) = max_{x ∈ 𝒳_v} [ θ_{vu=xy} + θ_{v=x} + Σ_{w ∈ N(v)\{u}} β_{wu}(x) ]
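A sketch of the integer recursion (the β form above) on the same Lights/Rain/Jam example, using integer arithmetic only. The small non-negative integer θ values are illustrative assumptions, not learned parameters.

```python
# Integer-only message recursion on the Lights/Rain/Jam example.
states = {"Lights": ["red", "yellow", "green"],
          "Rain":   ["dry", "wet"],
          "Jam":    ["jam", "free"]}
edges = [("Lights", "Jam"), ("Rain", "Jam")]
theta_node = {(v, x): 1 for v in states for x in states[v]}
theta_edge = {(a, b): {(x, y): 2 for x in states[a] for y in states[b]}
              for a, b in edges}

def nbrs(v):
    return [b if a == v else a for a, b in edges if v in (a, b)]

def edge_theta(v, u, x, y):
    # look the pairwise parameter up in either edge direction
    if (v, u) in theta_edge:
        return theta_edge[(v, u)][(x, y)]
    return theta_edge[(u, v)][(y, x)]

# beta_{v->u}(y) = max_x [ theta_{vu=xy} + theta_{v=x} + sum_w beta_{w->v}(x) ]
beta = {(v, u): {y: 0 for y in states[u]} for v in states for u in nbrs(v)}
for _ in range(3):                                # a few sweeps suffice on a tree
    beta = {(v, u): {y: max(edge_theta(v, u, x, y) + theta_node[(v, x)]
                            + sum(beta[(w, v)][x] for w in nbrs(v) if w != u)
                            for x in states[v])
                     for y in states[u]}
            for (v, u) in beta}

# integer score per state of "Jam": larger means more probable
score = {x: theta_node[("Jam", x)]
            + sum(beta[(w, "Jam")][x] for w in nbrs("Jam"))
         for x in states["Jam"]}
print(score)
```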
20. Evaluation: accuracy
[Plot: accuracy between roughly 0.85 and 0.95 for integer models with vertex degrees 4, 8, and 16, over 1000 to 8000 nodes]
! Different vertex degrees
! Up to 8000 nodes in the graph
! 100 runs of IntMRF
! Real MRF is 100% accuracy.
⇒ Scalable!
21. Integer models use fewer clock cycles and less memory
! The Integer MRF shows
! high accuracy: 86 – 92% of the full MRF for up to 8 states and 8000 nodes
! high speed-up: 2.56 on the Raspberry Pi and 2.34 on Sandy Bridge
! Graphical models can now be computed on small devices.
! Using fewer clock cycles really saves energy.
Nico Piatkowski, Sangkyun Lee, Katharina Morik (2014)
The Integer Approximation of Undirected Graphical
Models, ICPRAM
http://www-ai.cs.uni-dortmund.de/
PublicPublicationFiles/piatkowski_etal_2014a.pdf
22. Overview
! Big data -- small devices
! Graphical Models
! Computation on the batch
layer
! Computation on the speed
layer
! Integer Random Fields
! Spatio-Temporal Random Fields
! Traffic Scenarios
! Using the streams framework
23. Spatio-temporal random fields
! The spatio-temporal graph is
trained to predict each node’s
maximum a posteriori
probability with the marginal
probabilities.
! Generative model predicting
all nodes.
! Dimension of the parameter vector (see the helper sketched below):
T × |V0| × |X| + [(T−1)(|V0| + 3|E0|) + |E0|] × |X|²
! Remember: the vectors are sparse; we have to exploit that!
User queries:
Given traffic densities at all nodes at t1, t2, t3, what is the probability of traffic density at node A at time t5?
Given state "jam" at place A at time ts, which other places have a higher probability for "jam" in ts < t < te?
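The dimension formula can be turned into a small helper. The example sizes passed to it are illustrative assumptions, not the exact sizes of the graphs used in the talk.

```python
# Parameter dimension of the spatio-temporal random field, from the slide:
# T*|V0|*|X| + [(T-1)*(|V0| + 3*|E0|) + |E0|] * |X|^2
def strf_dimension(T, n_vertices, n_edges, n_states):
    node_part = T * n_vertices * n_states
    edge_part = ((T - 1) * (n_vertices + 3 * n_edges) + n_edges) * n_states ** 2
    return node_part + edge_part

# e.g. 48 half-hour layers, 6 density states, and an assumed spatial graph
print(strf_dimension(T=48, n_vertices=150, n_edges=400, n_states=6))
```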
24. Training efficiently
! Reparametrization
! Sparseness and smoothness
! There are not many changes
over time
! Small ratio of non-zero
entries
! Regularization
! L1 and L2 norms now penalize the difference between θ(t−1) and θ(t) (see the sketch below)
! STRF uses 8 times less memory than the MRF model
! Its distributed optimization converges about 2 times faster.
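A minimal sketch of the temporal regularizer mentioned above: penalize the change between consecutive parameter layers instead of the parameters themselves, which stays small when traffic changes little over time. Shapes and weights are illustrative assumptions.

```python
import numpy as np

def temporal_penalty(theta, l1=0.1, l2=0.1):
    """theta has shape (T, d): one parameter block per time layer."""
    diff = np.diff(theta, axis=0)                 # theta(t) - theta(t-1)
    return l1 * np.abs(diff).sum() + l2 * (diff ** 2).sum()

theta = np.zeros((48, 10))
theta[20:, 2] = 1.0                               # a single change at layer 20
print(temporal_penalty(theta))                    # only that change is penalized
```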
25. Compressible parametrization
! STRF is 8 times smaller than
MRF model
! The ratio of non-zero entries
is quite small already for
quite small L1 regularization
! STRF’s distributed
optimization converges
faster than MRF:
! Run MRF for 100
iterations, record the
likelihood value;
! Run STRF until it reaches
the same result of the
objective function.
! Compare seconds.
26. Evaluation: accuracy
! Smaller state spaces result
in higher accuracies.
! Again, for MRF accuracy
100%.
⇒ State space is important!
[Plot: accuracy between roughly 0.4 and 1.0 for integer models with state space sizes |X| = 2, 4, 8, 16, over 0 to 8000 nodes]
27. Flood in the City
! The city of Dublin suffers from flooding due to rainfall and tides. Wanted:
! Optimal traffic
regulations based on
traffic predictions.
! Insight EU Project
www.insight-ict.eu
Intelligent Synthesis and
Real-time Response using
Massive Streaming of
Heterogeneous Data
Coordinator:
Dimitrios Gunopoulos, Univ.
Athens, Greece
28. Smart trip modeling for Dublin
! Open Street Map " graph
topology
! Open Trip Planner: user query
(v,w), route planning based on
traffic costs.
! Traffic costs learned:
! Spatio-temporal random
field based on sensor data
stream;
! Gaussian process estimates
values for non-sensor
locations.
! Framework for real-time
processing of data streams, XML
configuration of data flow,
connecting data, traffic model
and planner.
[Illustration: a trip query from v to w, shown at 7:00 and at 8:00]
29. Constructing the spatio-temporal graph of Dublin
OpenStreetMap streets segmented according to junctions.
966 sensors transmit traffic flow every 6 minutes (www.dublinked.ie).
Aggregate sensor readings for 30 minutes, aggregate sensor nodes by 7NN.
Traffic flow discretized into 6 intervals of density.
48 time layers for each day (48 × 30 = 1440 minutes per day).
Training for every weekday.
Predicting density of each node and edge – interpreted as costs.
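A small sketch of the discretization steps listed above. The interval bounds follow the six density intervals shown in the confusion matrix on slide 30 (0, 1-5, 6-20, 21-30, 31-60, 61-); summing the 6-minute readings over a 30-minute window is an assumption for illustration.

```python
import numpy as np

BIN_LOWER = [0, 1, 6, 21, 31, 61]                 # lower bounds of the 6 states

def density_state(vehicle_count):
    """Map a 30-minute vehicle count to one of the 6 density states 0..5."""
    return int(np.searchsorted(BIN_LOWER, vehicle_count, side="right")) - 1

def time_layer(minute_of_day):
    """48 layers per day, 30 minutes each (48 * 30 = 1440)."""
    return minute_of_day // 30

readings_6min = [3, 5, 2, 7, 4]                   # five 6-minute readings = one window
window_count = sum(readings_6min)
print(time_layer(13 * 60), density_state(window_count))   # layer 26, state 3
```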
30. Using STRF for smart trip modeling -- Evaluation
! Confusion matrix for predicting the number of vehicles (6 intervals) for all sensors and all half hours following 1 pm on Fridays, tested on March 1, 8, 15, 22 and 29,
! given the traffic at 1 p.m. (rows: predicted interval, columns: true interval).
            0      1-5    6-20    21-30   31-60   61-    Prec
0           840    32     10      6       3       0      0.943
1-5         2      632    498     3       0       1      0.556
6-20        91     156    12169   2006    83      25     0.838
21-30       32     0      1223    5637    717     14     0.739
31-60       43     0      60      893     1945    29     0.655
61-         0      0      16      3       12      35     0.530
Recall      0.833  0.771  0.871   0.659   0.705   0.34
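Recomputing the Prec column and Recall row from the matrix (rows taken as predicted intervals, columns as true intervals) reproduces the values on the slide:

```python
import numpy as np

labels = ["0", "1-5", "6-20", "21-30", "31-60", "61-"]
cm = np.array([
    [840,  32,    10,    6,    3,    0],
    [  2, 632,   498,    3,    0,    1],
    [ 91, 156, 12169, 2006,   83,   25],
    [ 32,   0,  1223, 5637,  717,   14],
    [ 43,   0,    60,  893, 1945,   29],
    [  0,   0,    16,    3,   12,   35],
])
precision = cm.diagonal() / cm.sum(axis=1)        # correct / all predicted as class
recall = cm.diagonal() / cm.sum(axis=0)           # correct / all truly in class
for lab, p, r in zip(labels, precision, recall):
    print(f"{lab:>6}  precision {p:.3f}  recall {r:.3f}")
```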
32. Smart traffic for smart cities
! Spatio-temporal modeling is
natural for traffic prediction.
! Several questions can be
answered using the same
learned model.
! The answers come along with
their probabilities. This might
be helpful for decision
makers.
! Integration of Spatio-temporal
random fields into
the Open Trip Planner and
Gaussian information
completion resulted in an
excellent navigation system.
Piatkowski, Lee, Morik (2013) Spatio-temporal random
fields: compressible representation and distributed
estimation, Machine Learning Journal 93:1, 115 – 140.
Liebig, Piatkowski, Bockermann, Morik (2014)
Predictive Trip Planning – Smart Routing in Smart Cities,
Mining Urban Data Workshop at 17th Intern. Conf. on
Extending Database Technology.
33. Overview
! Introduction
! Big data
! Small devices
! Graphical Models
! Computation on the batch
layer
! Computation on the speed
layer
! Integer Random Fields
! Spatio-Temporal Random Fields
! Traffic Scenarios
! Using the streams framework
34. Streams framework
! Open-source Java
environment for abstract
orchestration of streaming
functions by Christian
Bockermann, GNU AGPL
license.
! High-level API for sources, processing nodes, and topology, executable on the STORM engine.
! Graph of nodes can be used
for distributed processing.
! Easy integration of MOA
library.
! Eases integration of various
engines.
Christian Bockermann, Hendrik Blom
2012
The Streams Framework
Tech.Rep. SFB 876
http://www-ai.cs.uni-dortmund.de/
PublicPublicationFiles/
bockermann_blom_2012c.pdf
35. Stream Processing Engines
! Fault tolerance by resending
not acknowledged data
(streams) vs. sophisticated
recovery methods
(MillWheel).
! State management by storing
state in process context
(streams) vs. links to
distributed storage systems
(Samza, MillWheel on Apache
Cassandra, BigTable).
! Explicit partitioning by flow
graph vs. built upon
distributed computing
environment managed by a
central node.
36. Comparison of stream processing engines
Christian Bockermann
A Survey of the Stream
Processing Landscape
2014
TechRep. SFB 876
http://sfb876.tu-dortmund.de/
PublicPublicationFiles/
bockermann_2014b.pdf
37. Streams framework
! Streams map to MapReduce jobs for an easy execution in Hadoop, reading HDFS files.
38. Conclusion
! Complex models such as MRF
can be computed on small
devices such as smartphones.
! Graphical Models
! Parallel BP using GPU
! Approximation to Integer
Random Fields
! Spatio-Temporal Random
Fields
! Using the streams
framework for integrating
processes (e.g. MOA for
learning) and real-time
processing and
documentation.
39. References
! Nico Piatkowski (2011) Parallel Algorithms
for GPU Accelerated Probabilistic
Inference
http://biglearn.org/2011/files/papers/
biglearn2011_submission_28.pdf
! Felix Jungermann, Nico Piatkowski,
Katharina Morik (2010) Enhancing
Ubiquitous Systems Through System Call
Mining, LACIS at ICDM
http://www.computer.org/csdl/proceedings/
icdmw/2010/4257/00/4257b338-abs.html
! Nico Piatkowski, Sangkyun Lee, Katharina
Morik (2014) The Integer Approximation of
Undirected Graphical Models, ICPRAM
http://www-ai.cs.uni-dortmund.de/
PublicPublicationFiles/
piatkowski_etal_2014a.pdf
! Piatkowski, Lee, Morik (2013) Spatio-temporal
random fields: compressible
representation and distributed
estimation, Machine Learning Journal
93:1, 115 – 140.
! Liebig, Piatkowski, Bockermann, Morik
(2014) Predictive Trip Planning – Smart
Routing in Smart Cities, Mining Urban
Data Workshop at 17th Intern. Conf. on
Extending Database Technology.
! Streams framework
Christian Bockermann, Hendrik Blom
2012 The Streams Framework
Tech.Rep. SFB 876
http://www-ai.cs.uni-dortmund.de/
PublicPublicationFiles/
bockermann_blom_2012c.pdf
! Christian Bockermann (2014) A Survey of
the Stream Processing Landscape
TechRep. SFB 876
http://sfb876.tu-dortmund.de/
PublicPublicationFiles/
bockermann_2014b.pdf
! For streams software (not just scripts!)
check:
http://www-ai.cs.uni-dortmund.de/
SOFTWARE/streams/