Can we use information from social media and crowdsourced images to detect smoke and assist rescue forces? While there are computer vision methods for detecting smoke, they require movement information extracted from video data. In this paper we propose SmokeBlock: a method that is able to segment and detect smoke in still images. SmokeBlock uses superpixel segmentation and extracts local color and texture features from images to spot smoke. We used real data from Flickr and compared SmokeBlock against state-of-the-art methods for feature extraction. Our method achieved performance superior to that of its competitors in the task of smoke detection. Our findings shall support further investigations in the field of image analysis, in particular concerning images captured with mobile devices.
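The pipeline sketched in the abstract (superpixel segmentation plus local color features) can be illustrated in a few lines. This is a hedged simplification: a fixed grid stands in for real superpixels (e.g., SLIC), and the brightness/saturation rule and its thresholds are invented for illustration, not SmokeBlock's actual features.

```python
# Toy sketch of a SmokeBlock-style pipeline: segment a still image into
# superpixel-like regions, extract local color statistics, and flag regions
# whose color profile resembles smoke (bright and grayish).

def grid_segments(image, rows, cols):
    """Split an image (list of rows of (r, g, b) pixels) into grid cells."""
    h, w = len(image), len(image[0])
    for i in range(rows):
        for j in range(cols):
            yield [image[y][x]
                   for y in range(i * h // rows, (i + 1) * h // rows)
                   for x in range(j * w // cols, (j + 1) * w // cols)]

def mean_color(cell):
    n = len(cell)
    return tuple(sum(p[c] for p in cell) / n for c in range(3))

def looks_like_smoke(cell, min_brightness=100, max_saturation=30):
    """Smoke tends to be bright and grayish: high mean, low channel spread."""
    r, g, b = mean_color(cell)
    brightness = (r + g + b) / 3
    saturation = max(r, g, b) - min(r, g, b)
    return brightness >= min_brightness and saturation <= max_saturation

# A toy 4x4 image: left half grayish (smoke-like), right half saturated red.
img = [[(180, 180, 185)] * 2 + [(200, 30, 30)] * 2 for _ in range(4)]
flags = [looks_like_smoke(c) for c in grid_segments(img, 2, 2)]
```

A real implementation would replace the grid with proper superpixels and add texture descriptors before classification.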
Effective and Unsupervised Fractal-based Feature Selection for Very Large Dat... (Universidade de São Paulo)
Given a very large dataset of moderate-to-high dimensionality, how to mine useful patterns from it? In such cases, dimensionality reduction is essential to overcome the “curse of dimensionality”. Although there exist algorithms to reduce the dimensionality of Big Data, unfortunately, they all fail to identify/eliminate non-linear correlations between attributes. This paper tackles the problem by exploring concepts of the Fractal Theory and massive parallel processing to present Curl-Remover, a novel dimensionality reduction technique for very large datasets. Our contributions are: Curl-Remover eliminates linear and non-linear attribute correlations as well as irrelevant attributes; it is unsupervised and suits analytical tasks in general, not only classification; it presents linear scale-up; it does not require the user to guess the number of attributes to be removed; and it preserves the attributes' semantics. We performed experiments on synthetic and real data spanning up to 1.1 billion points, and Curl-Remover outperformed a PCA-based algorithm, being up to 8% more accurate.
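The abstract leans on the Fractal Theory notion of intrinsic dimension. A minimal box-counting sketch (not the Curl-Remover algorithm itself): count occupied grid cells at two resolutions and take the slope of log(count) versus log(1/cell size) as a crude dimension estimate.

```python
# Crude fractal (box-counting) dimension estimate for a point set.
import math

def box_count(points, cell):
    """Number of grid cells of side `cell` occupied by the point set."""
    return len({tuple(int(c // cell) for c in p) for p in points})

def fractal_dimension(points, c1=0.25, c2=0.125):
    n1, n2 = box_count(points, c1), box_count(points, c2)
    return math.log(n2 / n1) / math.log(c1 / c2)

# Points on a line embedded in 2-D: the intrinsic dimension is close to 1,
# hinting that one of the two attributes is redundant (a "curl" to remove).
line = [(i / 100, i / 100) for i in range(100)]
dim = fractal_dimension(line)
```

An embedding dimension of 2 with an intrinsic dimension near 1 is exactly the kind of attribute redundancy a fractal-based feature selector exploits.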
Several graph visualization tools exist. However, they are not able to handle large graphs, and/or they do not allow interaction. We are interested in large graphs, with hundreds of thousands of nodes. Such graphs bring two challenges: the first is that any straightforward interactive manipulation will be prohibitively slow. The second is sensory overload: even if we could plot and replot the graph quickly, the user would be overwhelmed by the vast volume of information, because the screen would be too cluttered as nodes and edges overlap each other. The GMine system addresses both issues by using summarization and multi-resolution. GMine offers multi-resolution graph exploration by partitioning a given graph into a hierarchy of communities-within-communities and storing it in a novel R-tree-like structure which we name G-Tree. GMine offers summarization by implementing an innovative subgraph extraction algorithm and then visualizing its output.
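The communities-within-communities idea can be sketched as a tiny tree in which each node holds either a community of vertices or subcommunities. This is a hypothetical toy, not the paper's G-Tree, which is an R-tree-like disk structure; the class and community names below are invented.

```python
# Toy hierarchy of graph partitions: each node is a community that contains
# either vertices directly or nested subcommunities.

class GTreeNode:
    def __init__(self, name, vertices=None, children=None):
        self.name = name
        self.vertices = set(vertices or [])
        self.children = children or []

    def path_to(self, v, path=()):
        """Return the chain of community names containing vertex v."""
        path = path + (self.name,)
        if v in self.vertices:
            return path
        for child in self.children:
            found = child.path_to(v, path)
            if found:
                return found
        return None

root = GTreeNode("graph", children=[
    GTreeNode("community-A", children=[
        GTreeNode("A1", vertices=[1, 2]),
        GTreeNode("A2", vertices=[3]),
    ]),
    GTreeNode("community-B", vertices=[4, 5]),
])
path = root.path_to(3)
```

Multi-resolution exploration then amounts to rendering only one level of this tree at a time and drilling down on demand.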
Jose Rodrigues, Agma J M Traina, Caetano Traina Jr (2003) Frequency Plot and Relevance Plot to Enhance Visual Data Exploration In: XVI Brazilian Symposium on Computer Graphics and Image Processing 117-124 IEEE Press.
@inproceedings{DBLP:conf/sibgrapi/RodriguesTT03,
title = "Frequency Plot and Relevance Plot to Enhance Visual Data Exploration",
year = "2003",
author = "Jose Rodrigues and Agma J M Traina and Caetano Traina Jr",
booktitle = "XVI Brazilian Symposium on Computer Graphics and Image Processing",
pages = "117-124",
publisher = "IEEE Press",
doi = "10.1109/SIBGRA.2003.1240999",
url = "http://www.icmc.usp.br/~junio/PublishedPapers/RodriguesJr_et_al_Frequency_Plot-SIBGRAPI2003.pdf",
urllink = "http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=1240999&",
abstract = "We present two techniques aiming at exploring databases through multivariate visualizations. Both techniques intend to deal with the problem caused by the limited amount of elements that can be presented simultaneously in traditional visual exploration procedures. The first technique, the Frequency Plot, combines data frequency with interactive filtering to identify clusters and trends in subsets of the database. Thus, graphical elements (lines, pixels, icons, or graphical marks) are color differentiated proportionally to how frequent the value being represented is, while interactive filtering allows the selection of interesting partitions of the database. The second technique, the Relevance Plot, corresponds to assigning different levels of color distinguishably to visual elements according to their relevance to a user's specified data properties set, which can be chosen visually and dynamically.",
keywords = "Computer science , Data analysis , Data visualization , Filtering , Frequency , Humans , Image databases , Information retrieval , Layout , Visual databases"}
Jose Rodrigues, Agma J M Traina, Christos Faloutsos, Caetano Traina Jr (2006) SuperGraph Visualization In: 8th IEEE International Symposium on Multimedia 227-234 IEEE Press.
@inproceedings{DBLP:conf/ism/RodriguesTFT06,
title = "SuperGraph Visualization",
year = "2006",
author = "Jose Rodrigues and Agma J M Traina and Christos Faloutsos and Caetano Traina Jr",
booktitle = "8th IEEE International Symposium on Multimedia",
pages = "227-234",
publisher = "IEEE Press",
doi = "10.1109/ISM.2006.143",
url = "http://www.icmc.usp.br/~junio/PublishedPapers/RodriguesJr_et_al-ISM2006.pdf",
urllink = "http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4061172",
abstract = "Given a large social or computer network, how can we visualize it, find patterns, outliers, communities? Although several graph visualization tools exist, they cannot handle large graphs with hundreds of thousands of nodes and possibly millions of edges. Such graphs bring two challenges: interactive visualization demands prohibitive processing power and, even if we could interactively update the visualization, the user would be overwhelmed by the excessive number of graphical items. To cope with this problem, we propose a formal innovation on the use of graph hierarchies that leads to the GMine system. GMine promotes scalability using a hierarchy of graph partitions, promotes concomitant presentation of the graph hierarchy and of the original graph, and extends analytical possibilities with the integration of the graph partitions in an interactive environment.",
keywords = "Application software , Bipartite graph , Computer networks , Computer science , Data structures , Scalability , Technological innovation , Tree graphs , Visualization , Web pages"}
On the Support of a Similarity-Enabled Relational Database Management System ... (Universidade de São Paulo)
Crowdsourcing solutions can be helpful for extracting information from disaster-related data during crisis management. However, certain information can only be obtained through similarity operations. Some of them also depend on additional data stored in a Relational Database Management System (RDBMS). In this context, several works focus on crisis management supported by data. Nevertheless, none of them provides a methodology for employing a similarity-enabled RDBMS in disaster-relief tasks. To fill this gap, we introduce a similarity-enabled methodology together with a supporting architecture named Data-Centric Crisis Management (DCCM), which employs our methods over an RDBMS. We evaluate our proposal through three tasks: classification of incoming data regarding current events, identifying relevant information to guide rescue teams; filtering of incoming data, enhancing decision support by removing near-duplicate data; and similarity retrieval of historical data, supporting analytical comprehension of the crisis context. To make this possible, similarity-based operations were implemented within a popular, open-source RDBMS. Results using real data from Flickr show that the proposed methodology over DCCM is feasible for real-time applications. In addition to high performance, accurate results were obtained with a proper combination of techniques for each task. Finally, given its accuracy and efficiency, we expect our work to provide a framework for further developments of crisis management solutions.
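The methodology hinges on similarity operations, most commonly k-nearest-neighbor and range queries over feature vectors. A standalone sketch of both query types over toy data; inside a real similarity-enabled RDBMS these would be exposed as SQL operators, which is not reproduced here.

```python
# Minimal k-NN and range queries over feature vectors, the two similarity
# operations that a similarity-enabled RDBMS typically offers.
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn(query, data, k):
    """k nearest neighbors: ids of the k vectors closest to `query`."""
    return [i for i, _ in sorted(data.items(),
                                 key=lambda kv: euclidean(query, kv[1]))[:k]]

def range_query(query, data, radius):
    """Range query: ids of vectors within `radius` of `query`."""
    return {i for i, v in data.items() if euclidean(query, v) <= radius}

vectors = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (5.0, 5.0)}
nearest = knn((0.2, 0.0), vectors, 2)
close = range_query((0.2, 0.0), vectors, 1.0)
```

Near-duplicate filtering, one of the three evaluated tasks, is essentially a range query with a small radius.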
StructMatrix: large-scale visualization of graphs by means of structure detec... (Universidade de São Paulo)
Given a large-scale graph with millions of nodes and edges, how to reveal macro patterns of interest, like cliques, bipartite cores, stars, and chains? Furthermore, how to visualize such patterns altogether, getting insights from the graph to support wise decision-making? Although there are many algorithmic and visual techniques to analyze graphs, none of the existing approaches is able to present the structural information of graphs at large scale. Hence, this paper describes StructMatrix, a methodology aimed at highly scalable visual inspection of graph structures with the goal of revealing macro patterns of interest. StructMatrix combines algorithmic structure detection and adjacency matrix visualization to present cardinality, distribution, and relationship features of the structures found in a given graph. We performed experiments on real, large-scale graphs with up to one million nodes and millions of edges. StructMatrix revealed that graphs of high relevance (e.g., Web, Wikipedia, and DBLP) have characterizations that reflect the nature of their corresponding domains; our findings have not been seen in the literature so far. We expect that our technique will bring deeper insights into large graph mining, leveraging its use for decision making.
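One of the structure classes mentioned above can be detected with a simple degree test. This is an illustrative rule for one detector only (stars), not the paper's actual detection algorithm: a vertex set is star-shaped when a single hub links to all other members and no spokes touch.

```python
# Degree-based star detector: one hub of degree n-1, n-1 spokes of degree 1.

def is_star(edges, vertices):
    degree = {v: 0 for v in vertices}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    hub_like = [v for v, d in degree.items() if d == len(vertices) - 1]
    spokes = [v for v, d in degree.items() if d == 1]
    return len(hub_like) == 1 and len(spokes) == len(vertices) - 1

star_edges = [(0, 1), (0, 2), (0, 3)]    # vertex 0 is the hub
chain_edges = [(0, 1), (1, 2), (2, 3)]   # a chain, not a star
star = is_star(star_edges, {0, 1, 2, 3})
chain = is_star(chain_edges, {0, 1, 2, 3})
```

Analogous degree and connectivity tests distinguish cliques, chains, and bipartite cores before the adjacency-matrix rendering step.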
Link recommendation has gained increasing attention as networked data becomes abundant in several scenarios. However, existing methods for this task have failed to consider solely the structure of dynamic networks for improved performance and accuracy. Hence, in this work, we present a methodology based on the use of multiple topological metrics to achieve prospective link recommendations considering time constraints. The combination of such metrics is used as input to binary classification algorithms that state whether a pair of authors will/should define a link. We experimented with five algorithms, which allowed us to reach high rates of accuracy and to evaluate the different classification paradigms. Our results also demonstrated that time parameters and the activity profile of the authors can significantly influence the recommendation. In the context of DBLP, this research is strategic, as it may assist in identifying potential partners, research groups with similar themes, research competition (absence of obvious links), and related work.
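The pipeline feeds topological metrics into a binary classifier. A sketch of two common metrics (common neighbors and the Jaccard coefficient) plus a stand-in threshold rule; the paper evaluates real learning algorithms rather than this fixed rule, and the coauthor names are invented.

```python
# Topological link-prediction features on a tiny coauthorship graph.

def neighbors(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def metrics(adj, u, v):
    """(common-neighbor count, Jaccard coefficient) for a candidate pair."""
    common = adj[u] & adj[v]
    union = adj[u] | adj[v]
    return len(common), len(common) / len(union) if union else 0.0

def recommend(adj, u, v, min_common=2):
    common, _jaccard = metrics(adj, u, v)
    return common >= min_common  # placeholder for a trained classifier

coauthors = [("ana", "bob"), ("ana", "carla"), ("dan", "bob"), ("dan", "carla")]
adj = neighbors(coauthors)
common, jaccard = metrics(adj, "ana", "dan")
suggested = recommend(adj, "ana", "dan")
```

In the full methodology, several such metrics form the feature vector, and time windows restrict which edges count.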
Techniques for effective and efficient fire detection from social media images (Universidade de São Paulo)
Social media provides information, in the form of images, that is valuable to a vast set of human activities, including salvage and rescue in crisis situations (such as accidents, explosions, and fire). However, these services produce images at a rate that is impossible for human beings to absorb and analyze; thus, methods for automatic analysis are required. Yet, despite the multiple works on image analysis, there are no studies on the specific topic of fire detection over social media. To fill this gap, this work describes the use and evaluation of an ample set of content-based image retrieval and classification techniques in the task of fire detection. To this end, we (1) built a ground-truth set of annotated images regarding fire occurrence; (2) engineered the Fast-Fire Detection and Retrieval (FFDnR) architecture to combine configurations of feature extractors and distance functions to work with instance-based learning; and (3) evaluated 36 image descriptors in the task of fire detection. Our results demonstrated that, for fire detection, the best image descriptors concerning efficacy (F-measure, Precision-Recall, and ROC) and processing efficiency (wall-clock time) are achieved with the MPEG-7 feature extractors Color Structure and Scalable Color, and with the distance functions City-Block and Euclidean. Our work shall provide a basis for further developments regarding the monitoring of images from social media.
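The winning configuration pairs a color descriptor with the City-Block (L1) distance under instance-based learning. A toy version of that idea: 3-bin-per-channel color histograms classified by the 1-nearest neighbor. The labeled "images" below are made-up pixel lists, not the ground-truth set described above, and the histogram is far simpler than MPEG-7 Color Structure.

```python
# 1-NN classification with color histograms and City-Block distance.

def histogram(pixels, bins=3):
    """Normalized per-channel color histogram of a list of (r, g, b) pixels."""
    hist = [0] * (bins * 3)
    for channel in range(3):
        for p in pixels:
            hist[channel * bins + min(p[channel] * bins // 256, bins - 1)] += 1
    total = len(pixels)
    return [h / total for h in hist]

def city_block(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def nn_label(query, labeled):
    """Label of the nearest labeled histogram (instance-based learning)."""
    return min(labeled, key=lambda item: city_block(query, item[0]))[1]

fire = [(250, 120, 20)] * 8    # hot orange pixels
street = [(90, 95, 100)] * 8   # gray pixels
labeled = [(histogram(fire), "fire"), (histogram(street), "no-fire")]
label = nn_label(histogram([(240, 110, 30)] * 8), labeled)
```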
Multimodal graph-based analysis over the DBLP repository: critical discoverie... (Universidade de São Paulo)
The use of graph theory for analyzing network-like data has gained central importance with the rise of the Web 2.0. However, many graph-based techniques are neither well disseminated nor explored to their full potential, which may call for a complementary approach achieved with the combination of multiple techniques. This paper describes the systematic use of graph-based techniques of different types (multimodal), combining the resulting analytical insights around a common domain, the Digital Bibliography & Library Project (DBLP). To do so, we introduce an analytical ensemble based on statistical (degree and weakly-connected-component distributions), topological (average clustering coefficient and effective diameter evolution), algorithmic (link prediction/machine learning), and algebraic techniques to inspect non-evident features of DBLP, at the same time interpreting the heterogeneous discoveries found along the way. As a result, we have put together a set of techniques demonstrating over DBLP what we call multimodal analysis, an innovative process of information understanding that demands wide technical knowledge and a deep understanding of the data domain. We expect that our methodology and our findings will foster other multimodal analyses and also shed light on Computer Science research.
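Two of the ensemble's statistical/topological measures can be computed in a few lines on a toy graph: the degree distribution and the average clustering coefficient. The graph below is invented for illustration, not DBLP data.

```python
# Degree distribution and average clustering coefficient of a small graph.
from collections import Counter

def degree_distribution(adj):
    return Counter(len(ns) for ns in adj.values())

def clustering(adj, v):
    """Fraction of a vertex's neighbor pairs that are themselves connected."""
    ns = list(adj[v])
    k = len(ns)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if ns[j] in adj[ns[i]])
    return 2 * links / (k * (k - 1))

def average_clustering(adj):
    return sum(clustering(adj, v) for v in adj) / len(adj)

# A triangle (0-1-2) plus a pendant vertex 3 attached to 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
dist = degree_distribution(adj)
avg_cc = average_clustering(adj)
```

On a repository like DBLP, the evolution of these quantities over publication years is what reveals the trends the paper discusses.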
Fire Detection on Unconstrained Videos Using Color-Aware Spatial Modeling and... (Universidade de São Paulo)
The semantic segmentation of events in emergency contexts involves the identification of previously defined events of interest. In this work, the semantic event in focus is the presence of fire in videos. The literature presents several methods for automatic video fire detection, but these methods were built under assumptions, such as stationary cameras and controlled lighting conditions, that are often in contrast with the videos acquired by hand-held devices. To fill this gap, we propose a fire detection method called SPATFIRE. Our method innovates on three aspects: (1) it relies on a specifically tailored color model named Fire-like Pixel Detector, able to improve the accuracy of fire detection; (2) it employs a new technique for motion compensation, diminishing the problems observed in videos captured with non-stationary cameras; and (3) it defines a segmentation method able to identify not only the presence of fire in a video but also the segments of the video where fire occurs. We evaluated our proposal on two video datasets with different characteristics, and the results demonstrate superior efficacy, in terms of true positives and negatives, compared to state-of-the-art methods.
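SPATFIRE starts from a color model (the Fire-like Pixel Detector). Its exact model is not reproduced here; the sketch below uses a common rule of thumb from the fire-detection literature, that fire pixels are red-dominant and bright, with a threshold chosen for illustration.

```python
# Per-pixel color rule for fire-like regions in one RGB video frame.

def fire_like(pixel, red_threshold=150):
    r, g, b = pixel
    return r > red_threshold and r >= g >= b

def fire_mask(frame):
    """Boolean mask of fire-like colors for a frame (list of pixel rows)."""
    return [[fire_like(p) for p in row] for row in frame]

frame = [[(220, 140, 40), (60, 80, 200)],   # orange vs. blue
         [(180, 170, 90), (10, 10, 10)]]    # yellowish vs. black
mask = fire_mask(frame)
```

In the full method, this mask would then be refined by motion compensation and temporal segmentation across frames.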
Relational databases are rigid-structured data sources characterized by complex relationships among a set of relations (tables). Making sense of such relationships is a challenging problem because users must consider multiple relations, understand their ensemble of integrity constraints, interpret dozens of attributes, and draw complex SQL queries for each desired data exploration. In this scenario, we introduce a twofold methodology: we use a hierarchical graph representation to efficiently model the database relationships and, on top of it, we design a visualization technique for rapid relational exploration. Our results demonstrate that the exploration of databases is profoundly simplified, as the user is able to visually browse the data with little or no knowledge about its structure, dismissing the need for complex SQL queries. We believe our findings will bring a novel paradigm to relational data comprehension.
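The first half of the methodology, modeling database relationships as a graph, can be sketched directly: tables become nodes and foreign keys become edges, so the user can browse one hop at a time instead of writing joins. The table names and foreign keys below are invented for illustration.

```python
# Schema-as-graph sketch: tables are nodes, foreign keys are edges.

foreign_keys = [("orders", "customers"), ("orders", "products"),
                ("reviews", "products"), ("reviews", "customers")]

def schema_graph(fks):
    adj = {}
    for child, parent in fks:
        adj.setdefault(child, set()).add(parent)
        adj.setdefault(parent, set()).add(child)
    return adj

def related_tables(adj, table):
    """Tables reachable in one hop while visually browsing the schema."""
    return sorted(adj.get(table, ()))

adj = schema_graph(foreign_keys)
hop = related_tables(adj, "products")
```

The hierarchical part of the representation would then group related tables into clusters for top-down exploration.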
Vertex Centric Asynchronous Belief Propagation Algorithm for Large-Scale Graphs (Universidade de São Paulo)
Inference problems on networks and their algorithms have always been important subjects, all the more so now, with so much data available and so little time to make sense of it.
Common applications range from product recommendation to social networks and protein interaction.
One of the main inferences in these types of networks is the guilt-by-association method, where labeled nodes propagate their information throughout the network, towards unlabeled nodes.
While there is a widely used algorithm for this context, called Belief Propagation, it lacks the necessary convergence guarantees for loopy networks.
More recently, an alternative method called LinBP was proposed; while it solves the convergence issue, scalability to large graphs that do not fit in memory remains a challenge.
Additionally, most works that try to use BP considering large scale graphs rely on specific infrastructure such as supercomputers and computational clusters.
Therefore we propose a new algorithm, that leverages state-of-the-art asynchronous vertex-centric parallel processing techniques in conjunction with the state-of-the-art BP alternative LinBP, to provide a scalable framework for large graph inference that runs on a single commodity machine.
Our results show that our algorithm is up to 200 times faster than LinBP's SQL implementation on tested networks, while achieving the same accuracy rate.
We also show that, due to the asynchronous processing, our algorithm needs fewer iterations to converge than LinBP when using the same parameters.
Finally, we believe that our methodology highlights the still not fully explored parallelism available on commodity machines, leaning towards a more cost-efficient computational paradigm.
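The guilt-by-association propagation described above can be sketched as a minimal synchronous loop: scores from labeled nodes spread to neighbors until they stabilize. LinBP and the proposed asynchronous vertex-centric algorithm refine exactly this pattern; the damping factor and toy graph below are illustrative choices, not the paper's parameters.

```python
# Synchronous guilt-by-association propagation on a small undirected graph.

def propagate(adj, seeds, damping=0.5, iterations=20):
    scores = {v: seeds.get(v, 0.0) for v in adj}
    for _ in range(iterations):
        # Each node keeps its seed label (if any) plus a damped average
        # of its neighbors' current scores.
        scores = {
            v: seeds.get(v, 0.0) + damping * sum(
                scores[u] for u in adj[v]) / len(adj[v])
            for v in adj
        }
    return scores

# A 4-node path: node 0 is labeled positive (+1), node 3 negative (-1);
# nodes 1 and 2 are unlabeled and inherit the sign of their nearer seed.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
scores = propagate(adj, seeds={0: 1.0, 3: -1.0})
```

An asynchronous variant updates one vertex at a time using the freshest neighbor scores, which is why it typically converges in fewer sweeps.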
Fast Billion-scale Graph Computation Using a Bimodal Block Processing Model (Universidade de São Paulo)
Recent graph computation approaches have demonstrated that a single PC can perform efficiently on billion-scale graphs. While these approaches achieve scalability by optimizing I/O operations, they do not fully exploit the capabilities of modern hard drives and processors. To surpass their performance, in this work we introduce Bimodal Block Processing (BBP), an innovation that is able to boost graph computation by minimizing the I/O cost even further. With this strategy, we achieved the following contributions: (1) M-Flash, the fastest graph computation framework to date; (2) a flexible and simple programming model to easily implement popular and essential graph algorithms, including the first single-machine billion-scale eigensolver; and (3) extensive experiments on real graphs with up to 6.6 billion edges, demonstrating M-Flash's consistent and significant speedup.
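The core idea of block processing can be shown in miniature: split edges into (source block, destination block) groups so each group touches only a bounded slice of vertex data, then stream the groups instead of random-accessing the whole vertex array. The block size and the "sum of in-neighbor values" computation below are illustrative, not BBP's actual bimodal scheme.

```python
# Blocked edge processing: group edges by source/destination vertex blocks
# so each inner loop touches only a bounded range of the vertex arrays.

BLOCK = 2  # vertices per block (real systems size this to cache/RAM)

def blocked_sum(edges, values, num_vertices):
    """For each vertex, sum the values of its in-neighbors, block by block."""
    result = [0.0] * num_vertices
    blocks = {}
    for src, dst in edges:
        blocks.setdefault((src // BLOCK, dst // BLOCK), []).append((src, dst))
    for key in sorted(blocks):          # process one block pair at a time
        for src, dst in blocks[key]:
            result[dst] += values[src]
    return result

edges = [(0, 2), (1, 2), (3, 0)]
out = blocked_sum(edges, values=[1.0, 2.0, 3.0, 4.0], num_vertices=4)
```

This gather step is the building block of PageRank-style algorithms, where locality of the vertex slices is what turns random I/O into sequential I/O.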
Stephan Makowski
Seal Digitisation with Reflectance Transformation Imaging (RTI)
Making Sigillographic Material Accessible to Researchers – Digitising, Catalogues, Editions of Seals
13 October 2016, Provincial Archives, Opava – branch Olomouc
MediaEval 2016 - UNED-UV @ Retrieving Diverse Social Images Task (multimediaeval)
Ana Garcia Serrano
UNED-UV @ Retrieving Diverse Social Images Task. In: Working Notes Proceedings of the MediaEval 2016 Workshop, Hilversum, Netherlands, October 20-21, CEUR-WS.org (2016), by Angel C. González, Xaro B. Garcia, Ana García-Serrano, Esther de Ves Cuenca.
Paper: http://ceur-ws.org/Vol-1739/MediaEval_2016_paper_17.pdf
Video: https://youtu.be/EyuN7IA1HFk
Abstract: This paper details the participation of the UNED-UV group in the MediaEval 2016 Retrieving Diverse Social Images Task using a multimodal approach. Several Local Logistic Regression models, which use visual low-level features, estimate the relevance probability for all the images in the dataset. Then, the images are ranked by selecting the highest-probability image from each of the textual clusters. These textual clusters are generated using a textual algorithm based on Formal Concept Analysis (FCA) and Hierarchical Agglomerative Clustering (HAC) to detect the latent topics addressed. The images are then diversified according to the detected topics.
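The diversification step can be sketched as round-robin selection: rank images by relevance within each textual cluster, then interleave clusters so the top of the list covers many topics. The cluster contents and scores below are invented; the actual system uses FCA/HAC clusters and logistic-regression relevance scores.

```python
# Round-robin diversification over per-topic relevance rankings.

def diversify(clusters):
    """clusters: list of lists of (image_id, relevance), each pre-sorted
    by descending relevance. Returns an interleaved ranked id list."""
    ranked, depth = [], 0
    while any(depth < len(c) for c in clusters):
        for cluster in clusters:
            if depth < len(cluster):
                ranked.append(cluster[depth][0])
        depth += 1
    return ranked

clusters = [[("a1", 0.9), ("a2", 0.7)],
            [("b1", 0.8)],
            [("c1", 0.6), ("c2", 0.5)]]
order = diversify(clusters)
```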
About Multimedia Presentation Generation and Multimedia Metadata: From Synthe... (Ansgar Scherp)
ACM SIGMM Rising Stars Symposium
The ACM SIGMM Rising Stars Symposium, inaugurated in 2015, will highlight plenary presentations of six selected rising SIGMM members on their vision and research achievements, and dialogs with senior members about the future of multimedia research.
See: http://www.acmmm.org/2016/?page_id=706
On the Support of a Similarity-Enabled Relational Database Management System ...Universidade de São Paulo
Crowdsourcing solutions can be helpful to extract information from disaster-related data during crisis management. However, certain information can only be obtained through similarity operations. Some of them also depend on additional data stored in a Relational Database Management System (RDBMS). In this context, several works focus on crisis management supported by data. Nevertheless, none of them provides a methodology for employing a similarity-enabled RDBMS in disaster-relief tasks. To fill this gap, we introduce a similarity-enabled methodology together with a supporting architecture named Data-Centric Crisis Management (DCCM), which employs our methods over a RDBMS. We evaluate our proposal through three tasks: classification of incoming data regarding current events, identifying relevant information to guide rescue teams; filtering of incoming data, enhancing the decision support by removing near-duplicate data; and similarity retrieval of historical data, supporting analytical comprehension of the crisis context. To make it possible, similarity-based operations were implemented within one popular, open-source RDBMS. Results using real data from Flickr show that the proposed methodology over DCCM is feasible for real-time applications. In addition to high performance, accurate results were obtained with a proper combination of techniques for each task. At last, given its accuracy and efficiency, we expect our work to provide a framework for further developments on crisis management solutions.
StructMatrix: large-scale visualization of graphs by means of structure detec...Universidade de São Paulo
Given a large-scale graph with millions of nodes and edges, how to reveal macro patterns of interest, like cliques, bi-partite cores, stars, and chains? Furthermore, how to visualize such patterns altogether getting insights from the graph to support wise decision-making? Although there are many algorithmic and visual techniques to analyze graphs, none of the existing approaches is able to present the structural information of graphs at large-scale. Hence, this paper describes StructMatrix, a methodology aimed at high-scalable visual inspection of graph structures with the goal of revealing macro patterns of interest. StructMatrix combines algorithmic structure detection and adjacency matrix visualization to present cardinality, distribution, and relationship features of the structures found in a given graph. We performed experiments in real, large-scale graphs with up to one million nodes and millions of edges. StructMatrix revealed that graphs of high relevance (e.g., Web, Wikipedia and DBLP) have characterizations that reflect the nature of their corresponding domains; our findings have not been seen in the literature so far. We expect that our technique will bring deeper insights into large graph mining, leveraging their use for decision making.
Currently, link recommendation has gained more attention as networked data becomes abundant in several scenarios. However, existing methods for this task have failed in considering solely the structure of dynamic networks for improved performance and accuracy. Hence, in this work, we present a methodology based on the use of multiple topological metrics in order to achieve prospective link recommendations considering time constraints. The combination of such metrics is used as input to binary classification algorithms that state whether two pairs of authors will/should define a link. We experimented with five algorithms, what allowed us to reach high rates of accuracy and to evaluate the different classification paradigms. Our results also demonstrated that time parameters and the activity profile of the authors can significantly influence the recommendation. In the context of DBLP, this research is strategic as it may assist on identifying potential partners, research groups with similar themes, research competition (absence of obvious links), and related work.
Techniques for effective and efficient fire detection from social media imagesUniversidade de São Paulo
Social media provides information, in the form of images, that is valuable to a vast set of human activities, including salvage and rescue in the case of crisis situations (such as accidents, explosions, and fire). However, these services produce images in a rate that is impossible for human beings to absorb and analyze; thus, it is a requirement to have methods for automatic analysis. However, despite the multiple works on image analysis, there are no studies on the specific topic of fire detection over social media. To fill this gap, this work describes the use and the evaluation of an ample set of content-based image retrieval and classification techniques in the task of fire detection. In our intent, we (1) built a ground-truth set of annotated images regarding fire occurrence; (2) engineered the Fast-Fire Detection and Retrieval ($\FFDnR$) architecture to combine configurations of feature extractors and distance functions to work with instance-based learning; and (3) evaluated 36 image descriptors in the task of fire detection. Our results demonstrated that, for fire detection, the best image descriptors concerning efficacy (F-measure, Precision-Recall, and ROC) and processing efficiency (wall-clock time) are achieved with MPEG-7 feature extractors Color Structure and Scalable Color, and with distance functions City-Block and Euclidean. Our work shall provide basis for further developments regarding monitoring of images from social media.
Multimodal graph-based analysis over the DBLP repository: critical discoverie... – Universidade de São Paulo
The use of graph theory for analyzing network-like data has gained central importance with the rise of the Web 2.0. However, many graph-based techniques are neither well-disseminated nor explored to their full potential, a situation that calls for a complementary approach achieved by combining multiple techniques. This paper describes the systematic use of graph-based techniques of different types (multimodal), combining the resulting analytical insights around a common domain, the Digital Bibliography & Library Project (DBLP). To do so, we introduce an analytical ensemble based on statistical (degree and weakly-connected-component distributions), topological (average clustering coefficient and effective diameter evolution), algorithmic (link prediction/machine learning), and algebraic techniques to inspect non-evident features of DBLP, while interpreting the heterogeneous discoveries found along the way. As a result, we have put together a set of techniques demonstrating over DBLP what we call multimodal analysis, an innovative process of information understanding that demands wide technical knowledge and a deep understanding of the data domain. We expect that our methodology and our findings will foster other multimodal analyses and shed light on Computer Science research.
Fire Detection on Unconstrained Videos Using Color-Aware Spatial Modeling and... – Universidade de São Paulo
The semantic segmentation of events in emergency contexts involves the identification of previously defined events of interest. In this work, the semantic event of focus is the presence of fire in videos. The literature presents several methods for automatic video fire detection, but these methods were built under assumptions, such as stationary cameras and controlled lighting conditions, that are often in contrast to the videos acquired by hand-held devices. To fill this gap, we propose a fire detection method called SPATFIRE. Our method innovates on three aspects: (1) it relies on a specifically tailored color model named Fire-like Pixel Detector, able to improve the accuracy of fire detection; (2) it employs a new technique for motion compensation, diminishing the problems observed in videos captured with non-stationary cameras; and (3) it defines a segmentation method able to identify not only the presence of fire in a video, but also the segments of the video where fire occurs. We evaluated our proposal on two video datasets with different characteristics, and the results demonstrate superior efficacy, in terms of true positives and negatives, compared to state-of-the-art methods.
Relational databases are rigid-structured data sources characterized by complex relationships among a set of relations (tables). Making sense of such relationships is a challenging problem because users must consider multiple relations, understand their ensemble of integrity constraints, interpret dozens of attributes, and draw complex SQL queries for each desired data exploration. In this scenario, we introduce a twofold methodology: we use a hierarchical graph representation to efficiently model the database relationships and, on top of it, we design a visualization technique for rapid relational exploration. Our results demonstrate that the exploration of databases is profoundly simplified, as the user is able to visually browse the data with little or no knowledge about its structure, dispensing with the need for complex SQL queries. We believe our findings will bring a novel paradigm to relational data comprehension.
Vertex Centric Asynchronous Belief Propagation Algorithm for Large-Scale Graphs – Universidade de São Paulo
Inference problems on networks and their algorithms have always been important subjects, but even more so now, with so much data available and so little time to make sense of it.
Common applications range from product recommendation to social networks and protein interaction.
One of the main inference approaches in these types of networks is the guilt-by-association method, where labeled nodes propagate their information throughout the network towards unlabeled nodes.
While there is a widely used algorithm for this context, called Belief Propagation (BP), it lacks convergence guarantees for loopy networks.
More recently, an alternative method called LinBP was proposed; while it solved the convergence issue, scalability to large graphs that do not fit in memory remains a challenge.
Additionally, most works that apply BP to large-scale graphs rely on specific infrastructure such as supercomputers and computational clusters.
Therefore, we propose a new algorithm that leverages state-of-the-art asynchronous vertex-centric parallel processing techniques in conjunction with LinBP to provide a scalable framework for large-graph inference that runs on a single commodity machine.
Our results show that our algorithm is up to 200 times faster than LinBP's SQL implementation on the tested networks, while achieving the same accuracy.
We also show that, due to the asynchronous processing, our algorithm needs fewer iterations to converge than LinBP when using the same parameters.
Finally, we believe that our methodology highlights the not yet fully explored parallelism available on commodity machines, leaning towards a more cost-efficient computational paradigm.
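As an illustration of the guilt-by-association idea described above, a minimal sketch (not the paper's BP or LinBP implementation; matrix, prior, and parameter names are illustrative) in which labeled nodes repeatedly pull their neighbors toward their class through linear updates:

```python
import numpy as np

def propagate_beliefs(A, prior, h=0.1, iters=50):
    """Linearized propagation sketch: each node's belief is repeatedly
    nudged by its neighbors' beliefs, scaled by a small coupling h.
    Converges when the spectral radius of h*A is below 1."""
    b = prior.copy()
    for _ in range(iters):
        b = prior + h * A @ b  # neighbors pull unlabeled nodes toward their class
    return b

# Toy undirected graph: nodes 0-1 labeled positive (+1), node 4 negative (-1),
# nodes 2-3 unlabeled (prior 0).
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
prior = np.array([1.0, 1.0, 0.0, 0.0, -1.0])
beliefs = propagate_beliefs(A, prior)
print(np.sign(beliefs))  # node 2 follows its positive neighbors, node 3 its negative one
```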
Fast Billion-scale Graph Computation Using a Bimodal Block Processing Model – Universidade de São Paulo
Recent graph computation approaches have demonstrated that a single PC can perform efficiently on billion-scale graphs. While these approaches achieve scalability by optimizing I/O operations, they do not fully exploit the capabilities of modern hard drives and processors. To surpass their performance, in this work we introduce the Bimodal Block Processing (BBP) model, an innovation that is able to boost graph computation by minimizing the I/O cost even further. With this strategy, we achieved the following contributions: (1) M-Flash, the fastest graph computation framework to date; (2) a flexible and simple programming model to easily implement popular and essential graph algorithms, including the first single-machine billion-scale eigensolver; and (3) extensive experiments on real graphs with up to 6.6 billion edges, demonstrating M-Flash's consistent and significant speedup.
Stephan Makowski
Seal Digitisation with Reflectance Transformation Imaging (RTI)
Making Sigillographic Material Accessible to Researchers – Digitising, Catalogues, Editions of Seals
13 October 2016, Provincial Archives, Opava – branch Olomouc
MediaEval 2016 - UNED-UV @ Retrieving Diverse Social Images Task – multimediaeval
Ana Garcia Serrano
UNED-UV @ Retrieving Diverse Social Images Task In Working Notes Proceedings of the MediaEval 2016 Workshop, Hilversum, Netherlands, October 20-21, CEUR-WS.org (2016) by Angel C. González, Xaro B. Garcia, Ana García-Serrano, Esther de Ves Cuenca
Paper: http://ceur-ws.org/Vol-1739/MediaEval_2016_paper_17.pdf
Video: https://youtu.be/EyuN7IA1HFk
Abstract: This paper details the participation of the UNED-UV group at the MediaEval 2016 Retrieving Diverse Social Images Task using a multimodal approach. Several Local Logistic Regression models, which use the visual low-level features, estimate the relevance probability for all the images in the dataset. Then, the images are ranked by selecting the highest-probability image at each of the textual clusters. These textual clusters are generated by a textual algorithm based on Formal Concept Analysis (FCA) and Hierarchical Agglomerative Clustering (HAC) to detect the latent topics addressed. The images are then diversified according to the detected topics.
About Multimedia Presentation Generation and Multimedia Metadata: From Synthe... – Ansgar Scherp
ACM SIGMM Rising Stars Symposium
The ACM SIGMM Rising Stars Symposium, inaugurated in 2015, will highlight plenary presentations of six selected rising SIGMM members on their vision and research achievements, and dialogs with senior members about the future of multimedia research.
See: http://www.acmmm.org/2016/?page_id=706
Information about the study and career opportunities promoted by the Computer Science program of the Universidade de São Paulo, São Carlos campus.
An introduction to the Business Intelligence tools of the Hadoop ecosystem:
Business Intelligence and Big Data
Big Data warehousing
Architecture of a data warehouse
Hadoop and Apache Hive
Extract Transform Load
Data warehouse vs. operational database
OLAP – Online Analytical Processing
Apache Kylin
Conventional OLAP solutions
Advanced Analytics with Apache Mahout
MetricSPlat - a platform for quick development, testing and visualization of... – Universidade de São Paulo
Jose Rodrigues, Luciana A S Romani, Luciana Zaina, Ricardo Ciferri (2009) MetricSPlat - A platform for quick development, testing and visualization of content-based retrieval techniques In: Simpósio Brasileiro de Bancos de Dados - SBBD2009 1-6.
@inproceedings { RodriguesSBBD09,
title = "MetricSPlat - A platform for quick development, testing and visualization of content-based retrieval techniques",
year = "2009",
author = "Jose Rodrigues and Luciana A S Romani and Luciana Zaina and Ricardo Ciferri",
booktitle = "Simpósio Brasileiro de Bancos de Dados - SBBD2009",
pages = "1-6",
url = "http://www.icmc.usp.br/~junio/PublishedPapers/RodriguesSBBD09-MetricSPlat.pdf",
urllink = "http://www.icmc.usp.br/~junio/MetricSPlat/index.htm",
abstract = "The development and testing of content-based data retrieval systems is a time-consuming task. Over the concept of metric space, such systems must integrate the three factors that define an indexing environment. These factors are features extraction, metric structures and distance functions, not to mention a suitable user interface. This integration deviates the work from the real focus of research, suppressing quick experimentation of ideas. In this context, we present the Metric Space Platform (MetricSPlat), a system designed for content-based retrieval enabled with plug-in features. With minimal effort, MetricSPlat substantially speeds up the experimentation of new techniques by providing a well-defined framework aided with interactive data visualization techniques.",
note = "8 pages",
keywords = "visualization, content-based data retrieval"}
Hierarchical Visual Filtering, pragmatic and epistemic actions for database vi... – Universidade de São Paulo
Jose Rodrigues, Carlos E Cirilo, Luciana A M Zaina, Antonio F Prado (2013) Hierarchical Visual Filtering, pragmatic and epistemic actions for database visualization In: Proceedings of the ACM Symposium on Applied Computing Edited by:ACM Press. 946-952 ACM Press.
@inproceedings { ref35,
title = "Hierarchical Visual Filtering, pragmatic and epistemic actions for database visualization",
year = "2013",
author = "Jose Rodrigues and Carlos E Cirilo and Luciana A M Zaina and Antonio F Prado",
booktitle = "Proceedings of the ACM Symposium on Applied Computing",
editor = "A C M Press",
pages = "946-952",
publisher = "ACM Press",
doi = "10.1145/2480362.2480545",
url = "http://www.icmc.usp.br/~junio/PublishedPapers/RodriguesJr_et_al-ACMSAC2013.pdf",
urllink = "http://www.icmc.usp.br/~junio/VisTree/VisTree.htm",
abstract = "Visualization techniques of all sorts suffer from visual cluttering, the occlusion of visual information due to the overlap of graphical items, and from excessive complexity in analytical tasks due to multiple parallel perspectives drawn from the data at hand. To cope with these problems, we introduce Hierarchical Visual Filtering, a novel interaction principle that brings pragmatic and epistemic actions to visualization techniques. Pragmatic actions here mean that the analyst is able to visually select and filter information, determining visual configurations that reveal different perspectives of the data; epistemic actions mean that the analyst can record, annotate, and recall intermediate visualizations created over his pragmatic actions. To do so, we use a tree-like structure to keep multiple visualization workspaces linked according to the analytical decisions taken by the user. Our goal is to promote an innovative systematization that can augment the potential for database visual inspection, and for visualization systems in general. It is our contention that Hierarchical Visual Filtering can inspire a novel scheme of visualization environments in which space limitations and complexity are treated by means of interactive tasks.",
keywords = "Information Visualization, Multiple Views, Visual Data Analysis, Databases, Interactive Filtering, Hierarchical Filtering"}
Techniques to optimize the PageRank algorithm usually fall into two categories. One is to reduce the work per iteration, and the other is to reduce the number of iterations. These goals are often at odds with one another. Skipping computation on vertices that have already converged has the potential to save iteration time. Skipping in-identical vertices (those with the same in-links) helps avoid duplicate computations and thus could also reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance, since the final ranks of chain nodes can be easily calculated. This could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order. This could help reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in the PageRank computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ... – Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components and processes them in topological order, one level at a time. This enables the calculation of ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It does, however, come with the precondition that the input graph contain no dead ends. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by a large submission of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
Unveiling smoke in social images with the SmokeBlock approach
1. Unveiling smoke in social images with the SmokeBlock approach
Jessica Andressa de Souza (speaker)
jessicasouza@usp.br
Mirela T. Cazzolato, Marcos V. N. Bedo, Alceu F. Costa, Caetano Traina Jr., Jose F. Rodrigues Jr. and Agma J. M. Traina
SAC 2016 – 31st ACM Symposium on Applied Computing – Pisa, Italy (April 4-8, 2016)
3. ACM SAC 2016 (Pisa, Italy) SmokeBlock April 04-08, 2016
RESCUER Project
• The RESCUER project is a Brazil-Europe consortium
• Goal: develop solutions to improve the decision-making process in disaster situations
• Scenarios: industrial plants, densely populated areas, crowded events
4. RESCUER Project
• Smartphone user sends data (including photos) of the situation
• The user may also upload multimedia content such as photo and video
• Uploaded multimedia data is automatically analyzed
http://www.rescuer-project.org
5. Research Question
Given a collection of RESCUER user reports containing images of the situation, can we automatically detect the presence of smoke?
6. Problem Definition
Given a set of images from social media/crowdsourcing, find the subset of images that depict smoke while minimizing the rate of false-positives.
7. Smoke Detection - Challenges
• Absence of movement: harder than smoke detection in videos
• Smoke may have different colors, depending on temperature, material, and illumination
• Smoke may be transparent
• Smoke may be visually similar to clouds, mist, and rain
9. Proposed Method: SmokeBlock
• Input: an unlabeled image and a set of labeled superpixels
• Output: the segmented image and a global classification (smoke vs. not-smoke)
10. SmokeBlock Pipeline: Superpixel Extraction
Pipeline: Superpixel Extraction → Feature Extraction → Superpixel Classification → Global Smoke Detection
• Problem: not all regions of the input image may contain smoke
• Solution: break the image into visually homogeneous regions
• Algorithm: SLIC
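The superpixel-extraction step can be illustrated with a toy SLIC-style segmentation. This is not the SLIC implementation used by SmokeBlock; it is a minimal sketch, assuming a grayscale image as a NumPy array, that clusters pixels on (intensity, x, y) features from grid-initialized centers. That is the core idea of SLIC; the real algorithm adds local search windows and connectivity enforcement:

```python
import numpy as np

def simple_superpixels(img, k=4, m=0.5, iters=10):
    """Toy SLIC-style segmentation: k-means on (intensity, x, y) features,
    so clusters are both color-homogeneous and spatially compact.
    m trades off spatial compactness against color homogeneity."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    feats = np.stack([img.ravel(), m * xs.ravel() / w, m * ys.ravel() / h], axis=1)
    # Initialize cluster centers on a regular grid, as SLIC does
    # (exactly k centers only when k divides the image evenly).
    step = int((h * w / k) ** 0.5)
    seeds = [(y, x) for y in range(step // 2, h, step) for x in range(step // 2, w, step)]
    centers = np.array([[img[y, x], m * x / w, m * y / h] for y, x in seeds], dtype=float)
    for _ in range(iters):
        d = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)          # assign each pixel to nearest center
        for c in range(len(centers)):
            if (labels == c).any():
                centers[c] = feats[labels == c].mean(axis=0)  # recenter
    return labels.reshape(h, w)

# Toy "image": bright region in the top-left quadrant, dark elsewhere.
img = np.zeros((16, 16))
img[:8, :8] = 1.0
labels = simple_superpixels(img, k=4)
```

With this setup the bright quadrant ends up in its own spatially compact cluster, which is exactly the "visually homogeneous regions" the slide describes.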
11. SmokeBlock Pipeline: Feature Extraction
• From each superpixel we extract a feature vector
• A feature vector is a numerical representation of low-level visual content
V1 = (0.31, 0.41, 0.59, …, 0.26)
V2 = (1.41, 4.21, 3.56, ..., 2.37)
VN = (0.31, 0.41, 0.59, …, 0.26)
12. SmokeBlock Pipeline: Feature Extraction (continued)
Why extract features from superpixels?
1. Parts of the input image may not contain smoke, so global features are not adequate
2. By analyzing groups of pixels we can extract texture features, which is not possible when analyzing individual pixels
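The per-superpixel feature extraction described above can be sketched as follows. The descriptors here (a normalized intensity histogram for color, and the mean local gradient as a crude texture cue) are illustrative stand-ins, not the extractors evaluated in the paper:

```python
import numpy as np

def superpixel_features(img, labels, n_bins=8):
    """For each superpixel (a connected region sharing one label), build a
    feature vector: an n_bins intensity histogram plus the mean absolute
    gradient inside the region -- texture is only measurable over a group
    of pixels, which is why superpixels are used."""
    gy, gx = np.gradient(img.astype(float))
    grad = np.abs(gx) + np.abs(gy)
    feats = []
    for sp in np.unique(labels):
        mask = labels == sp
        hist, _ = np.histogram(img[mask], bins=n_bins, range=(0.0, 1.0))
        hist = hist / hist.sum()                 # normalized color histogram
        feats.append(np.append(hist, grad[mask].mean()))
    return np.array(feats)

# Toy image split into two superpixels: left half dark, right half bright.
img = np.zeros((8, 8))
img[:, 4:] = 1.0
labels = np.zeros((8, 8), dtype=int)
labels[:, 4:] = 1
F = superpixel_features(img, labels)  # one 9-dimensional vector per superpixel
```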
13. SmokeBlock Pipeline: Superpixel Classification
• SmokeBlock uses a binary classifier that predicts the class of a superpixel given its feature vector
• Classes: smoke vs. not-smoke
• Training set: manually annotated superpixels from different images
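A minimal stand-in for the superpixel-classification step, using a hand-rolled k-nearest-neighbors vote over toy 2-D feature vectors (the slide does not name a specific classifier, so this choice and all values are illustrative):

```python
import numpy as np

def knn_classify(train_X, train_y, X, k=3):
    """Label each feature vector by majority vote of its k nearest
    annotated training superpixels."""
    preds = []
    for x in X:
        d = np.linalg.norm(train_X - x, axis=1)      # distance to every training vector
        nearest = train_y[np.argsort(d)[:k]]         # labels of the k closest
        preds.append(np.bincount(nearest).argmax())  # majority vote
    return np.array(preds)

# Toy 2-D features: smoke superpixels near (1, 1), non-smoke near (0, 0).
train_X = np.array([[1.0, 1.1], [0.9, 1.0], [1.1, 0.9],
                    [0.0, 0.1], [0.1, 0.0], [0.0, 0.0]])
train_y = np.array([1, 1, 1, 0, 0, 0])  # 1 = smoke, 0 = not-smoke
preds = knn_classify(train_X, train_y, np.array([[0.95, 1.0], [0.05, 0.05]]))
```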
14. SmokeBlock Pipeline: Global Smoke Detection
• Goal: decide whether an image contains smoke
• Application: find all images uploaded to RESCUER that contain smoke
• Naive approach: classify an image as positive (has smoke) if at least one superpixel is classified as containing smoke
• Problem: if a single superpixel is wrongly classified, then the whole image will be wrongly classified
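One simple remedy for the fragile naive rule is to require a minimum fraction of smoke superpixels before flagging the whole image. The sketch below illustrates that idea only; the 10% threshold is a made-up value, not the decision rule SmokeBlock actually uses:

```python
def has_smoke(superpixel_preds, min_fraction=0.1):
    """Global decision: flag the image as smoke only if at least
    min_fraction of its superpixels were classified as smoke, so one
    misclassified superpixel cannot flip the whole image."""
    frac = sum(superpixel_preds) / len(superpixel_preds)
    return frac >= min_fraction

# One mislabeled superpixel out of 30 no longer flags the whole image.
print(has_smoke([1] + [0] * 29))      # False: 1/30 is below the threshold
print(has_smoke([1] * 5 + [0] * 25))  # True: 5/30 is above the threshold
```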
17. Flickr-Smoke Dataset
• Downloaded a set of images using the Flickr API
• Image retrieval with textual descriptors such as `smoke fire´ and `smoke forest´
• 7 subjects annotated the images as containing or not containing traces of smoke
• 832 images were labeled as smoke, and 834 as non-smoke
19. Outline
• Introduction
• Proposed Approach: SmokeBlock
• Experimental Results: visual features, smoke segmentation, global classification
• Conclusions
20. Experimental Analysis
• Identification of the two best feature extraction methods for smoke detection
• Evaluation over 1202 superpixels
24. Evaluation: Smoke Segmentation
• We compared SmokeBlock to Celik and Chen
• Both methods classify individual pixels based on color information
27. Evaluation: Global Classification
• Problem: given a set of unlabeled images, assign a class (smoke vs. not-smoke) to each image
• Celik and Chen are not able to classify images
• Baselines: global feature vectors
28. Evaluation: Global Classification (results)
SmokeBlock was the most accurate approach.
30. Conclusions
• We designed a new approach for smoke detection, SmokeBlock, which is more accurate than state-of-the-art approaches when analyzing still images
• Visual feature evaluation: we analyzed different feature extraction methods w.r.t. smoke detection accuracy
• Flickr-Smoke dataset: an annotated and publicly available dataset
31. Unveiling smoke in social images with the SmokeBlock approach
Mirela T. Cazzolato, Marcos V. N. Bedo, Alceu F. Costa, Jessica A. de Souza, Caetano Traina Jr., Jose F. Rodrigues Jr. and Agma J. M. Traina
SAC 2016 – 31st ACM Symposium on Applied Computing – Pisa, Italy (April 4-8, 2016)
jessicasouza@usp.br
Thank You!