The use of graph theory for analyzing network-like data has gained central importance with the rise of Web 2.0. However, many graph-based techniques are neither well disseminated nor explored to their full potential, a gap that a complementary approach combining multiple techniques can address. This paper describes the systematic use of graph-based techniques of different types (multimodal), combining the resulting analytical insights around a common domain, the Digital Bibliography & Library Project (DBLP). To do so, we introduce an analytical ensemble based on statistical (degree and weakly-connected-component distributions), topological (average clustering coefficient and effective-diameter evolution), algorithmic (link prediction/machine learning), and algebraic techniques to inspect non-evident features of DBLP, while interpreting the heterogeneous discoveries made along the way. As a result, we have assembled a set of techniques, demonstrated over DBLP, into what we call multimodal analysis, an innovative process of information understanding that demands broad technical knowledge and a deep understanding of the data domain. We expect that our methodology and our findings will foster other multimodal analyses and shed light on Computer Science research.
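As an illustration of the statistical techniques named above (degree distribution and weakly-connected components), here is a minimal Python sketch over a toy edge list. The edge list is hypothetical, not DBLP data, and the functions are generic stand-ins for the paper's analysis:

```python
from collections import Counter, deque

def degree_distribution(edges):
    """Count how many nodes have each degree (undirected edge list)."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return Counter(deg.values())  # maps degree -> number of nodes

def weakly_connected_components(edges):
    """BFS over the undirected view of the graph; returns component sizes."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, sizes = set(), []
    for start in adj:
        if start in seen:
            continue
        queue, size = deque([start]), 0
        seen.add(start)
        while queue:
            node = queue.popleft()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        sizes.append(size)
    return sorted(sizes, reverse=True)

edges = [("a", "b"), ("b", "c"), ("d", "e")]  # toy co-authorship links
degree_distribution(edges)             # Counter({1: 4, 2: 1})
weakly_connected_components(edges)     # [3, 2]
```

On a co-authorship graph such as DBLP's, the degree distribution is typically inspected on log-log axes for power-law behavior, and the size of the largest weakly-connected component tracks the emergence of a giant connected community.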
Porosity Calculation Using Techlog Software (Mohamed Qasim)
Calculation of the effective porosity using sonic log theory in Techlog software; this presentation also shows how to visualize the data in star plots.
A Dense Depth Representation for VLAD Descriptors in... (Federico Magliani)
Recent advances in deep learning have improved performance on image retrieval tasks. Through the many convolutional layers of a Convolutional Neural Network (CNN), it is possible to obtain a hierarchy of features from the evaluated image. At every step, the extracted patches are smaller and more representative than at the previous levels. Following this idea, this paper introduces a new detector applied to the feature maps extracted from a pre-trained CNN. Specifically, this approach increases the number of features in order to improve the performance of aggregation algorithms such as the widely used VLAD embedding. The proposed approach is tested on several public datasets: Holidays, Oxford5k, Paris6k, and UKB.
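The VLAD embedding mentioned above aggregates many local descriptors into one fixed-length vector: each descriptor is assigned to its nearest cluster center, and the residuals to each center are summed and L2-normalized. A minimal NumPy sketch (the descriptors and centers below are toy values, not CNN feature maps):

```python
import numpy as np

def vlad(descriptors, centers):
    """VLAD aggregation: sum of residuals to the nearest cluster center,
    one block per center, then global L2 normalization."""
    k, d = centers.shape
    v = np.zeros((k, d))
    # assign each descriptor to its nearest center (squared Euclidean)
    assign = np.argmin(
        ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
    for i, x in zip(assign, descriptors):
        v[i] += x - centers[i]          # accumulate the residual
    v = v.ravel()                       # flatten to a k*d vector
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

descriptors = np.array([[0.0, 0.0], [2.0, 2.0]])   # toy local features
centers = np.array([[0.0, 1.0], [2.0, 1.0]])       # toy k-means centers
embedding = vlad(descriptors, centers)             # unit-norm vector of length 4
```

The paper's contribution, as described in the abstract, is a denser detector over CNN feature maps; the densely extracted features would then feed an aggregation like this one.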
A Multimodal Discourse Analysis of Video Games (Toh Weimin)
This is a presentation of my PhD dissertation at the International Conference on Narrative 2016 at the University of Amsterdam on 17 June 2016 from 1:15 - 2:45 pm (Panel G7 - Narrative and Video Game Characters: Perspectives on Cognition, Meaning-making, and Subjectivity)
Can we use information from social media and crowdsourced images to detect smoke and assist rescue forces? While there are computer vision methods for detecting smoke, they require movement information extracted from video data. In this paper we propose SmokeBlock: a method able to segment and detect smoke in still images. SmokeBlock uses superpixel segmentation and extracts local color and texture features from images to spot smoke. We used real data from Flickr and compared SmokeBlock against state-of-the-art methods for feature extraction. Our method achieved performance superior to that of the competitors for the task of smoke detection. Our findings shall support further investigations in the field of image analysis, in particular concerning images captured with mobile devices.
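Extracting local color and texture features per superpixel, as SmokeBlock does, can be sketched in a few lines. This is a generic illustration, not the paper's actual feature set: per-region mean color plus intensity variance as a crude texture proxy, over a hypothetical label map such as one produced by a superpixel algorithm:

```python
import numpy as np

def region_features(image, labels):
    """Per-region color mean and intensity variance (a crude texture proxy).
    `image`: H x W x 3 float array; `labels`: H x W array of region ids."""
    feats = {}
    for r in np.unique(labels):
        pix = image[labels == r]            # pixels of region r, shape (n, 3)
        gray = pix.mean(axis=1)             # per-pixel intensity
        feats[r] = np.concatenate([pix.mean(axis=0),   # mean R, G, B
                                   [gray.var()]])      # texture proxy
    return feats
```

A classifier over such per-region feature vectors can then label each superpixel as smoke or non-smoke, which yields both detection and a coarse segmentation.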
Effective and Unsupervised Fractal-based Feature Selection for Very Large Dat... (Universidade de São Paulo)
Given a very large dataset of moderate-to-high dimensionality, how to mine useful patterns from it? In such cases, dimensionality reduction is essential to overcome the "curse of dimensionality". Although there exist algorithms to reduce the dimensionality of Big Data, unfortunately, they all fail to identify and eliminate non-linear correlations between attributes. This paper tackles the problem by exploring concepts of Fractal Theory and massive parallel processing to present Curl-Remover, a novel dimensionality reduction technique for very large datasets. Our contributions are: Curl-Remover eliminates linear and non-linear attribute correlations as well as irrelevant attributes; it is unsupervised and suits analytical tasks in general, not only classification; it presents linear scale-up; it does not require the user to guess the number of attributes to be removed; and it preserves the attributes' semantics. We performed experiments on synthetic and real data spanning up to 1.1 billion points, and Curl-Remover outperformed a PCA-based algorithm, being up to 8% more accurate.
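Fractal-based feature selection builds on estimating a dataset's intrinsic (fractal) dimension. As a rough illustration of that underlying quantity, here is a minimal box-counting estimator in NumPy. This generic estimator is a stand-in for illustration only, not Curl-Remover's actual algorithm:

```python
import numpy as np

def box_counting_dimension(points, scales):
    """Estimate the box-counting (fractal) dimension of a point cloud:
    the slope of log(#occupied boxes) versus log(1 / box_size)."""
    counts = []
    for s in scales:
        # snap each point to a grid of cell size s, count occupied cells
        boxes = np.unique(np.floor(points / s), axis=0)
        counts.append(len(boxes))
    logs = np.log(1.0 / np.asarray(scales))
    slope, _ = np.polyfit(logs, np.log(counts), 1)
    return slope

# a line embedded in 2-D has intrinsic dimension ~1, not 2
t = np.linspace(0.0, 1.0, 1000)
pts = np.column_stack([t, t])
box_counting_dimension(pts, [0.1, 0.05, 0.025, 0.0125])  # close to 1.0
```

The gap between embedding dimension (here 2) and intrinsic dimension (here about 1) is what signals correlated, removable attributes.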
Several graph visualization tools exist. However, they are not able to handle large graphs, and/or they do not allow interaction. We are interested in large graphs with hundreds of thousands of nodes. Such graphs bring two challenges: the first is that any straightforward interactive manipulation will be prohibitively slow; the second is sensory overload: even if we could plot and replot the graph quickly, the user would be overwhelmed by the vast volume of information, because the screen would be too cluttered as nodes and edges overlap one another. The GMine system addresses both issues by using summarization and multi-resolution. GMine offers multi-resolution graph exploration by partitioning a given graph into a hierarchy of communities-within-communities and storing it into a novel R-tree-like structure which we name G-Tree. GMine offers summarization by implementing an innovative subgraph extraction algorithm and then visualizing its output.
Jose Rodrigues, Agma J M Traina, Caetano Traina Jr (2003) Frequency Plot and Relevance Plot to Enhance Visual Data Exploration In: XVI Brazilian Symposium on Computer Graphics and Image Processing 117-124 IEEE Press.
@inproceedings { DBLP:conf/sibgrapi/RodriguesTT03,
title = "Frequency Plot and Relevance Plot to Enhance Visual Data Exploration",
year = "2003",
author = "Jose Rodrigues and Agma J M Traina and Caetano Traina Jr",
booktitle = "XVI Brazilian Symposium on Computer Graphics and Image Processing",
pages = "117-124",
publisher = "IEEE Press",
doi = "10.1109/SIBGRA.2003.1240999",
url = "http://www.icmc.usp.br/~junio/PublishedPapers/RodriguesJr_et_al_Frequency_Plot-SIBGRAPI2003.pdf",
urllink = "http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=1240999&",
abstract = "We present two techniques aiming at exploring databases through multivariate visualizations. Both techniques intend to deal with the problem caused by the limited amount of elements that can be presented simultaneously in traditional visual exploration procedures. The first technique, the Frequency Plot, combines data frequency with interactive filtering to identify clusters and trends in subsets of the database. Thus, graphical elements (lines, pixels, icons, or graphical marks) are color differentiated proportionally to how frequent the value being represented is, while interactive filtering allows the selection of interesting partitions of the database. The second technique, the Relevance Plot, corresponds to assigning different levels of color distinguishably to visual elements according to their relevance to a user's specified data properties set, which can be chosen visually and dynamically.",
keywords = "Computer science , Data analysis , Data visualization , Filtering , Frequency , Humans , Image databases , Information retrieval , Layout , Visual databases"}
Jose Rodrigues, Agma J M Traina, Christos Faloutsos, Caetano Traina Jr (2006) SuperGraph Visualization In: 8th IEEE International Symposium on Multimedia 227-234 IEEE Press.
@inproceedings { DBLP:conf/ism/RodriguesTFT06,
title = "SuperGraph Visualization",
year = "2006",
author = "Jose Rodrigues and Agma J M Traina and Christos Faloutsos and Caetano Traina Jr",
booktitle = "8th IEEE International Symposium on Multimedia",
pages = "227-234",
publisher = "IEEE Press",
doi = "10.1109/ISM.2006.143",
url = "http://www.icmc.usp.br/~junio/PublishedPapers/RodriguesJr_et_al-ISM2006.pdf",
urllink = "http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4061172",
abstract = "Given a large social or computer network, how can we visualize it, find patterns, outliers, communities? Although several graph visualization tools exist, they cannot handle large graphs with hundred thousand nodes and possibly million edges. Such graphs bring two challenges: interactive visualization demands prohibitive processing power and, even if we could interactively update the visualization, the user would be overwhelmed by the excessive number of graphical items. To cope with this problem, we propose a formal innovation on the use of graph hierarchies that leads to GMine system. GMine promotes scalability using a hierarchy of graph partitions, promotes concomitant presentation for the graph hierarchy and for the original graph, and extends analytical possibilities with the integration of the graph partitions in an interactive environment.",
keywords = "Application software , Bipartite graph , Computer networks , Computer science , Data structures , Scalability , Technological innovation , Tree graphs , Visualization , Web pages"}
On the Support of a Similarity-Enabled Relational Database Management System ... (Universidade de São Paulo)
Crowdsourcing solutions can be helpful for extracting information from disaster-related data during crisis management. However, certain information can only be obtained through similarity operations, some of which also depend on additional data stored in a Relational Database Management System (RDBMS). In this context, several works focus on data-supported crisis management; nevertheless, none of them provides a methodology for employing a similarity-enabled RDBMS in disaster-relief tasks. To fill this gap, we introduce a similarity-enabled methodology together with a supporting architecture named Data-Centric Crisis Management (DCCM), which employs our methods over an RDBMS. We evaluate our proposal through three tasks: classification of incoming data regarding current events, identifying relevant information to guide rescue teams; filtering of incoming data, enhancing decision support by removing near-duplicate data; and similarity retrieval of historical data, supporting analytical comprehension of the crisis context. To make this possible, similarity-based operations were implemented within a popular, open-source RDBMS. Results using real data from Flickr show that the proposed methodology over DCCM is feasible for real-time applications. In addition to high performance, accurate results were obtained with a proper combination of techniques for each task. Finally, given its accuracy and efficiency, we expect our work to provide a framework for further developments on crisis management solutions.
StructMatrix: large-scale visualization of graphs by means of structure detec... (Universidade de São Paulo)
Given a large-scale graph with millions of nodes and edges, how to reveal macro patterns of interest, such as cliques, bipartite cores, stars, and chains? Furthermore, how to visualize such patterns altogether, gaining insights from the graph to support wise decision-making? Although there are many algorithmic and visual techniques to analyze graphs, none of the existing approaches is able to present the structural information of graphs at large scale. Hence, this paper describes StructMatrix, a methodology aimed at highly scalable visual inspection of graph structures with the goal of revealing macro patterns of interest. StructMatrix combines algorithmic structure detection and adjacency matrix visualization to present cardinality, distribution, and relationship features of the structures found in a given graph. We performed experiments on real, large-scale graphs with up to one million nodes and millions of edges. StructMatrix revealed that graphs of high relevance (e.g., Web, Wikipedia, and DBLP) have characterizations that reflect the nature of their corresponding domains; our findings have not been reported in the literature so far. We expect that our technique will bring deeper insights into large graph mining, leveraging its use for decision making.
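Detecting structures such as cliques and stars can be illustrated by classifying a node's egonet by how many edges its neighbors share. This simple degree-pattern test is a hypothetical illustration of structure detection in general, not StructMatrix's actual detector:

```python
def egonet_label(adj, node):
    """Classify a node's egonet as 'clique', 'star', or 'other' by
    counting the edges among its neighbors. `adj`: node -> set of neighbors."""
    nbrs = adj[node]
    n = len(nbrs)
    # each inner edge is seen from both endpoints, so divide by 2
    inner = sum(1 for u in nbrs for v in adj[u] if v in nbrs) // 2
    if n >= 2 and inner == n * (n - 1) // 2:
        return "clique"          # neighbors fully connected among themselves
    if n >= 2 and inner == 0:
        return "star"            # neighbors connected only through the hub
    return "other"

star = {"h": {"a", "b", "c"}, "a": {"h"}, "b": {"h"}, "c": {"h"}}
egonet_label(star, "h")          # 'star'
```

Aggregating such labels over all nodes, and plotting them over the adjacency matrix, is the kind of cardinality-and-distribution summary the abstract describes.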
Link recommendation has gained attention as networked data becomes abundant in several scenarios. However, existing methods for this task have failed to consider solely the structure of dynamic networks for improved performance and accuracy. Hence, in this work, we present a methodology based on the use of multiple topological metrics in order to achieve prospective link recommendations under time constraints. The combination of such metrics is used as input to binary classification algorithms that state whether a pair of authors will/should define a link. We experimented with five algorithms, which allowed us to reach high accuracy and to evaluate different classification paradigms. Our results also demonstrated that time parameters and the activity profile of the authors can significantly influence the recommendation. In the context of DBLP, this research is strategic as it may assist in identifying potential partners, research groups with similar themes, research competition (absence of obvious links), and related work.
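Typical topological metrics fed to such binary classifiers include common neighbors, Jaccard similarity, and Adamic-Adar. A minimal sketch, assuming a plain adjacency-set representation (the specific metric set used by the paper is not stated in the abstract):

```python
import math

def topological_features(adj, u, v):
    """Features for a candidate link (u, v): common-neighbor count,
    Jaccard coefficient, and Adamic-Adar score.
    `adj` maps each node to its set of neighbors."""
    cn = adj[u] & adj[v]
    union = adj[u] | adj[v]
    # Adamic-Adar: shared low-degree neighbors count more
    aa = sum(1.0 / math.log(len(adj[w])) for w in cn if len(adj[w]) > 1)
    return (len(cn), len(cn) / len(union) if union else 0.0, aa)
```

Each candidate author pair becomes one feature vector; positive examples are pairs that do co-author later, and any binary classifier (the abstract mentions five) can then be trained on them.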
Techniques for effective and efficient fire detection from social media images (Universidade de São Paulo)
Social media provides information, in the form of images, that is valuable to a vast set of human activities, including salvage and rescue in crisis situations (such as accidents, explosions, and fires). However, these services produce images at a rate that is impossible for human beings to absorb and analyze; thus, methods for automatic analysis are required. Moreover, despite the multiple works on image analysis, there are no studies on the specific topic of fire detection over social media. To fill this gap, this work describes the use and evaluation of an ample set of content-based image retrieval and classification techniques in the task of fire detection. To this end, we (1) built a ground-truth set of annotated images regarding fire occurrence; (2) engineered the Fast-Fire Detection and Retrieval (FFDnR) architecture to combine configurations of feature extractors and distance functions to work with instance-based learning; and (3) evaluated 36 image descriptors in the task of fire detection. Our results demonstrated that, for fire detection, the best image descriptors concerning efficacy (F-measure, Precision-Recall, and ROC) and processing efficiency (wall-clock time) are achieved with the MPEG-7 feature extractors Color Structure and Scalable Color, and with the distance functions City-Block and Euclidean. Our work shall provide a basis for further developments regarding the monitoring of images from social media.
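The instance-based learning mentioned above reduces, at its core, to ranking database descriptors by a distance function such as City-Block (L1) or Euclidean (L2). A minimal nearest-neighbor sketch with both metrics (toy descriptors, not the paper's MPEG-7 features):

```python
import numpy as np

def knn(query, db, k=3, metric="cityblock"):
    """Return indices of the k database descriptors closest to the query."""
    diff = db - query
    if metric == "cityblock":
        dist = np.abs(diff).sum(axis=1)            # L1 / City-Block
    else:
        dist = np.sqrt((diff ** 2).sum(axis=1))    # L2 / Euclidean
    return np.argsort(dist)[:k]

db = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])  # toy image descriptors
knn(np.array([0.0, 0.0]), db, k=2)                   # nearest two indices
```

Labeling a query image by the labels of its nearest annotated neighbors (fire / non-fire) is exactly the instance-based classification scheme the architecture evaluates across descriptor and distance combinations.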
Fire Detection on Unconstrained Videos Using Color-Aware Spatial Modeling and... (Universidade de São Paulo)
The semantic segmentation of events in emergency contexts involves the identification of previously defined events of interest. In this work, the semantic event of focus is the presence of fire in videos. The literature presents several methods for automatic video fire detection, but these methods were built under assumptions, such as stationary cameras and controlled lighting conditions, that often do not hold for videos acquired with hand-held devices. To fill this gap, we propose a fire detection method called SPATFIRE. Our method innovates in three aspects: (1) it relies on a specifically tailored color model, named Fire-like Pixel Detector, able to improve the accuracy of fire detection; (2) it employs a new technique for motion compensation, diminishing the problems observed in videos captured with non-stationary cameras; and (3) it defines a segmentation method able to identify not only the presence of fire in a video but also the segments of the video where fire occurs. We tested our proposal on two video datasets with different characteristics and summarize the results to demonstrate superior efficacy, in terms of true positives and negatives, compared to state-of-the-art methods.
Relational databases are rigid-structured data sources characterized by complex relationships among a set of relations (tables). Making sense of such relationships is a challenging problem, because users must consider multiple relations, understand their ensemble of integrity constraints, interpret dozens of attributes, and draw complex SQL queries for each desired data exploration. In this scenario, we introduce a twofold methodology: we use a hierarchical graph representation to efficiently model the database relationships and, on top of it, we designed a visualization technique for rapid relational exploration. Our results demonstrate that the exploration of databases is profoundly simplified, as the user is able to visually browse the data with little or no knowledge of its structure, dismissing the need for complex SQL queries. We believe our findings will bring a novel paradigm to relational data comprehension.
Fast Billion-scale Graph Computation Using a Bimodal Block Processing Model (Universidade de São Paulo)
Recent graph computation approaches have demonstrated that a single PC can perform efficiently on billion-scale graphs. While these approaches achieve scalability by optimizing I/O operations, they do not fully exploit the capabilities of modern hard drives and processors. To surpass their performance, in this work we introduce Bimodal Block Processing (BBP), an innovation able to boost graph computation by minimizing the I/O cost even further. With this strategy, we achieved the following contributions: (1) M-Flash, the fastest graph computation framework to date; (2) a flexible and simple programming model to easily implement popular and essential graph algorithms, including the first single-machine billion-scale eigensolver; and (3) extensive experiments on real graphs with up to 6.6 billion edges, demonstrating M-Flash's consistent and significant speedup.
Vertex Centric Asynchronous Belief Propagation Algorithm for Large-Scale Graphs (Universidade de São Paulo)
Inference problems on networks and their algorithms have always been important subjects, but even more so now, with so much data available and so little time to make sense of it. Common applications range from product recommendation to social networks and protein interaction. One of the main inferences in these types of networks is the guilt-by-association method, where labeled nodes propagate their information throughout the network towards unlabeled nodes. While there is a widely used algorithm for this context, called Belief Propagation (BP), it lacks the necessary convergence guarantees for loopy networks. More recently, an alternative method called LinBP was proposed; while it solved the convergence issue, scalability for large graphs that do not fit in memory remains a challenge. Additionally, most works that try to use BP on large-scale graphs rely on specific infrastructure such as supercomputers and computational clusters. Therefore, we propose a new algorithm that leverages state-of-the-art asynchronous vertex-centric parallel processing techniques in conjunction with the state-of-the-art BP alternative LinBP, to provide a scalable framework for large graph inference that runs on a single commodity machine. Our results show that our algorithm is up to 200 times faster than LinBP's SQL implementation on the tested networks, while achieving the same accuracy. We also show that, due to the asynchronous processing, our algorithm needs fewer iterations to converge than LinBP when using the same parameters. Finally, we believe that our methodology highlights the not yet fully explored parallelism available on commodity machines, leaning towards a more cost-efficient computational paradigm.
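The linearized propagation at the heart of LinBP-style inference can be sketched as a simple fixed-point iteration: final beliefs equal explicit (prior) beliefs plus neighbor beliefs pushed through a coupling matrix. This is a minimal in-memory sketch under the assumption that the coupling is weak enough to converge, not the paper's asynchronous vertex-centric implementation:

```python
import numpy as np

def linbp_sketch(A, E, H, iters=50):
    """LinBP-style iteration: B = E + A @ B @ H, repeated to a fixed point.
    A: n x n adjacency matrix, E: n x k explicit (prior) beliefs,
    H: k x k coupling matrix (converges when the coupling is weak)."""
    B = E.copy()
    for _ in range(iters):
        B = E + A @ B @ H    # prior plus propagated neighbor beliefs
    return B

A = np.array([[0.0, 1.0], [1.0, 0.0]])          # two connected nodes
E = np.array([[0.1, -0.1], [0.0, 0.0]])         # node 0 leans to class 0
H = 0.1 * np.array([[1.0, -1.0], [-1.0, 1.0]])  # homophily coupling
B = linbp_sketch(A, E, H)   # node 1 inherits node 0's leaning
```

The asynchronous variant proposed in the work updates vertices with the freshest neighbor beliefs instead of synchronizing full matrix sweeps, which is why it converges in fewer iterations.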
A multimodal discourse analysis of video games (toh weimin)Toh Weimin
This is a presentation of my PhD dissertation at the International Conference on Narrative 2016 at the University of Amsterdam on 17 June 2016 from 1:15 - 2:45 pm (Panel G7 - Narrative and Video Game Characters: Perspectives on Cognition, Meaning-making, and Subjectivity)
Can we use information from social media and crowdsourced images to detect smoke and assist rescue forces? While there are computer vision methods for detecting smoke, they require movement information extracted from video data. In this paper we propose SmokeBlock: a method that is able to segment and detect smoke in still images. SmokeBlock uses superpixel segmentation and extracts local color and texture features from images to spot smoke. We used real data from Flickr and compared SmokeBlock against state-of-the-art methods for feature extraction. Our method achieved performance superior than the competitors, for the task of smoke detection. Our findings shall support further investigations in the field of image analysis, in particular, concerning images captured with mobile devices.
Effective and Unsupervised Fractal-based Feature Selection for Very Large Dat...Universidade de São Paulo
Given a very large dataset of moderate-to-high di-
mensionality, how to mine useful patterns from it? In such
cases, dimensionality reduction is essential to overcome the
“curse of dimensionality”. Although there exist algorithms to
reduce the dimensionality of Big Data, unfortunately, they
all fail to identify/eliminate non-linear correlations between
attributes. This paper tackles the problem by exploring con-
cepts of the Fractal Theory and massive parallel processing
to present Curl-Remover, a novel dimensionality reduction
technique for very large datasets. Our contributions are: Curl-
Remover eliminates linear and non-linear attribute correlations
as well as irrelevant ones; it is unsupervised and suits for
analytical tasks in general – not only classification; it presents
linear scale-up; it does not require the user to guess the
number of attributes to be removed, and; it preserves the
attributes’ semantics. We performed experiments on synthetic
and real data spanning up to 1.1 billion points and Curl-
Remover outperformed a PCA-based algorithm, being up to
8% more accurate.
Several graph visualization tools exist. However, they are not able to handle large graphs, and/or they do not allow interaction. We are interested on large graphs, with hundreds of thousands of nodes. Such graphs bring two challenges: the first one is that any straightforward interactive manipulation will be prohibitively slow. The second one is sensory overload: even if we could plot and replot the graph quickly, the user would be overwhelmed with the vast volume of information because the screen would be too cluttered as nodes and edges overlap each other. GMine system addresses both these issues, by using summarization and multi-resolution. GMine offers multi-resolution graph exploration by partitioning a given graph into a hierarchy of com-munities-within-communities and storing it into a novel R-tree-like structure which we name G-Tree. GMine offers summarization by implementing an innovative subgraph extraction algorithm and then visualizing its output.
http://www.icmc.usp.br/~junio/PublishedPapers/RodriguesJr_et_al_Frequency_Plot-SIBGRAPI2003.pdf
Jose Rodrigues, Agma J M Traina, Caetano Traina Jr (2003) Frequency Plot and Relevance Plot to Enhance Visual Data Exploration In: XVI Brazilian Symposium on Computer Graphics and Image Processing 117-124 IEEE Press.
@inproceedings { DBLP:conf/sibgrapi/RodriguesTT03,
title = "Frequency Plot and Relevance Plot to Enhance Visual Data Exploration",
year = "2003",
author = "Jose Rodrigues and Agma J M Traina and Caetano Traina Jr",
booktitle = " XVI Brazilian Symposium on Computer Graphics and Image Processing",
pages = "117-124",
publisher = "IEEE Press",
doi = "10.1109/SIBGRA.2003.1240999",
url = "http://www.icmc.usp.br/~junio/PublishedPapers/RodriguesJr_et_al_Frequency_Plot-SIBGRAPI2003.pdf",
urllink = "http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=1240999&",
abstract = "We present two techniques aiming at exploring databases through multivariate visualizations. Both techniques intend to deal with the problem caused by the limited amount of elements that can be presented simultaneously in traditional visual exploration procedures. The first technique, the Frequency Plot, combines data frequency with interactive filtering to identify clusters and trends in subsets of the database. Thus, graphical elements (lines, pixels, icons, or graphical marks) are color differentiated proportionally to how frequent the value being represented is, while interactive filtering allows the selection of interesting partitions of the database. The second technique, the Relevance Plot, corresponds to assigning different levels of color distinguishably to visual elements according to their relevance to a user's specified data properties set, which can be chosen visually and dynamically.",
keywords = "Computer science , Data analysis , Data visualization , Filtering , Frequency , Humans , Image databases , Information retrieval , Layout , Visual databases"}
Jose Rodrigues, Agma J M Traina, Christos Faloutsos, Caetano Traina Jr (2006) SuperGraph Visualization In: 8th IEEE International Symposium on Multimedia 227-234 IEEE Press.
@inproceedings { DBLP:conf/ism/RodriguesTFT06,
title = "SuperGraph Visualization",
year = "2006",
author = "Jose Rodrigues and Agma J M Traina and Christos Faloutsos and Caetano Traina Jr",
booktitle = "8th IEEE International Symposium on Multimedia",
pages = "227-234",
publisher = "IEEE Press",
doi = "10.1109/ISM.2006.143",
url = "http://www.icmc.usp.br/~junio/PublishedPapers/RodriguesJr_et_al-ISM2006.pdf",
urllink = "http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4061172",
abstract = "Given a large social or computer network, how can we visualize it, find patterns, outliers, communities? Although several graph visualization tools exist, they cannot handle large graphs with hundred thousand nodes and possibly million edges. Such graphs bring two challenges: interactive visualization demands prohibitive processing power and, even if we could interactively update the visualization, the user would be overwhelmed by the excessive number of graphical items. To cope with this problem, we propose a formal innovation on the use of graph hierarchies that leads to GMine system. GMine promotes scalability using a hierarchy of graph partitions, promotes concomitant presentation for the graph hierarchy and for the original graph, and extends analytical possibilities with the integration of the graph partitions in an interactive environment.",
keywords = "Application software , Bipartite graph , Computer networks , Computer science , Data structures , Scalability , Technological innovation , Tree graphs , Visualization , Web pages"}
On the Support of a Similarity-Enabled Relational Database Management System ...Universidade de São Paulo
Crowdsourcing solutions can be helpful to extract information from disaster-related data during crisis management. However, certain information can only be obtained through similarity operations. Some of them also depend on additional data stored in a Relational Database Management System (RDBMS). In this context, several works focus on crisis management supported by data. Nevertheless, none of them provides a methodology for employing a similarity-enabled RDBMS in disaster-relief tasks. To fill this gap, we introduce a similarity-enabled methodology together with a supporting architecture named Data-Centric Crisis Management (DCCM), which employs our methods over a RDBMS. We evaluate our proposal through three tasks: classification of incoming data regarding current events, identifying relevant information to guide rescue teams; filtering of incoming data, enhancing the decision support by removing near-duplicate data; and similarity retrieval of historical data, supporting analytical comprehension of the crisis context. To make it possible, similarity-based operations were implemented within one popular, open-source RDBMS. Results using real data from Flickr show that the proposed methodology over DCCM is feasible for real-time applications. In addition to high performance, accurate results were obtained with a proper combination of techniques for each task. At last, given its accuracy and efficiency, we expect our work to provide a framework for further developments on crisis management solutions.
StructMatrix: large-scale visualization of graphs by means of structure detec...Universidade de São Paulo
Given a large-scale graph with millions of nodes and edges, how to reveal macro patterns of interest, like cliques, bi-partite cores, stars, and chains? Furthermore, how to visualize such patterns altogether getting insights from the graph to support wise decision-making? Although there are many algorithmic and visual techniques to analyze graphs, none of the existing approaches is able to present the structural information of graphs at large-scale. Hence, this paper describes StructMatrix, a methodology aimed at high-scalable visual inspection of graph structures with the goal of revealing macro patterns of interest. StructMatrix combines algorithmic structure detection and adjacency matrix visualization to present cardinality, distribution, and relationship features of the structures found in a given graph. We performed experiments in real, large-scale graphs with up to one million nodes and millions of edges. StructMatrix revealed that graphs of high relevance (e.g., Web, Wikipedia and DBLP) have characterizations that reflect the nature of their corresponding domains; our findings have not been seen in the literature so far. We expect that our technique will bring deeper insights into large graph mining, leveraging their use for decision making.
Currently, link recommendation has gained more attention as networked data becomes abundant in several scenarios. However, existing methods for this task have failed in considering solely the structure of dynamic networks for improved performance and accuracy. Hence, in this work, we present a methodology based on the use of multiple topological metrics in order to achieve prospective link recommendations considering time constraints. The combination of such metrics is used as input to binary classification algorithms that state whether two pairs of authors will/should define a link. We experimented with five algorithms, what allowed us to reach high rates of accuracy and to evaluate the different classification paradigms. Our results also demonstrated that time parameters and the activity profile of the authors can significantly influence the recommendation. In the context of DBLP, this research is strategic as it may assist on identifying potential partners, research groups with similar themes, research competition (absence of obvious links), and related work.
Techniques for effective and efficient fire detection from social media images - Universidade de São Paulo
Social media provides information, in the form of images, that is valuable to a vast set of human activities, including salvage and rescue in the case of crisis situations (such as accidents, explosions, and fire). However, these services produce images at a rate that is impossible for human beings to absorb and analyze; thus, it is a requirement to have methods for automatic analysis. However, despite the multiple works on image analysis, there are no studies on the specific topic of fire detection over social media. To fill this gap, this work describes the use and the evaluation of an ample set of content-based image retrieval and classification techniques in the task of fire detection. To this end, we (1) built a ground-truth set of annotated images regarding fire occurrence; (2) engineered the Fast-Fire Detection and Retrieval (FFDnR) architecture to combine configurations of feature extractors and distance functions to work with instance-based learning; and (3) evaluated 36 image descriptors in the task of fire detection. Our results demonstrated that, for fire detection, the best image descriptors concerning efficacy (F-measure, Precision-Recall, and ROC) and processing efficiency (wall-clock time) are achieved with MPEG-7 feature extractors Color Structure and Scalable Color, and with distance functions City-Block and Euclidean. Our work shall provide a basis for further developments regarding monitoring of images from social media.
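The best-performing distance functions reported above, City-Block (L1) and Euclidean (L2), are simple to state; a minimal sketch of ranking candidate images by descriptor distance, using hypothetical feature vectors (the actual MPEG-7 descriptors are much longer):

```python
import math

def city_block(a, b):
    # L1 (Manhattan) distance: sum of absolute differences
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    # L2 distance: square root of summed squared differences
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical color descriptors for a query image and two candidates
query = [0.2, 0.8, 0.1]
cand1 = [0.3, 0.7, 0.1]
cand2 = [0.9, 0.1, 0.5]

# Instance-based retrieval: rank candidates by distance (nearest first)
ranked = sorted([cand1, cand2], key=lambda c: city_block(query, c))
```

The same ranking loop works with `euclidean` swapped in, which is how different extractor/distance configurations can be compared.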
Fire Detection on Unconstrained Videos Using Color-Aware Spatial Modeling and... - Universidade de São Paulo
The semantic segmentation of events in emergency contexts involves the identification of previously defined events of interest. In this work, the focused semantic event is the presence of fire in videos. The literature presents several methods for automatic video fire detection, but these methods were built under assumptions, such as stationary cameras and controlled lighting conditions, that are often in contrast to the videos acquired by hand-held devices. To fill this gap, we propose a fire detection method called SPATFIRE. Our method innovates on three aspects: (1) it relies on a specifically tailored color model named Fire-like Pixel Detector able to improve the accuracy of fire detection, (2) it employs a new technique for motion compensation, diminishing the problems observed in videos captured with non-stationary cameras, and (3) it defines a segmentation method able to identify not only the presence of fire in a video, but also the segments in the video where fire occurs. We evaluated our proposal on two video datasets with different characteristics and summarize the results to demonstrate its superior efficacy, in terms of true positives and negatives, as compared to state-of-the-art methods.
Relational databases are rigid-structured data sources characterized by complex relationships among a set of relations (tables). Making sense of such relationships is a challenging problem because users must consider multiple relations, understand their ensemble of integrity constraints, interpret dozens of attributes, and draw complex SQL queries for each desired data exploration. In this scenario, we introduce a twofold methodology: we use a hierarchical graph representation to efficiently model the database relationships and, on top of it, we designed a visualization technique for rapid relational exploration. Our results demonstrate that the exploration of databases is profoundly simplified, as the user is able to visually browse the data with little or no knowledge about its structure, dispensing with the need for complex SQL queries. We believe our findings will bring a novel paradigm to relational data comprehension.
Fast Billion-scale Graph Computation Using a Bimodal Block Processing Model - Universidade de São Paulo
Recent graph computation approaches have demonstrated that a single PC can perform efficiently on billion-scale graphs. While these approaches achieve scalability by optimizing I/O operations, they do not fully exploit the capabilities of modern hard drives and processors. To surpass their performance, in this work, we introduce the Bimodal Block Processing (BBP), an innovation that is able to boost graph computation by minimizing the I/O cost even further. With this strategy, we achieved the following contributions: (1) M-Flash, the fastest graph computation framework to date; (2) a flexible and simple programming model to easily implement popular and essential graph algorithms, including the first single-machine billion-scale eigensolver; and (3) extensive experiments on real graphs with up to 6.6 billion edges, demonstrating M-Flash's consistent and significant speedup.
Vertex Centric Asynchronous Belief Propagation Algorithm for Large-Scale Graphs - Universidade de São Paulo
Inference problems on networks and their algorithms have always been important subjects, all the more so now with so much data available and so little time to make sense of it.
Common applications range from product recommendation to social networks and protein interaction.
One of the main inferences in these types of networks is the guilt-by-association method, where labeled nodes propagate their information throughout the network, towards unlabeled nodes.
While there is a widely used algorithm for this context, called Belief Propagation, it lacks the necessary convergence guarantees for loopy networks.
More recently, a new alternative method called LinBP was proposed; while it solved the convergence issue, scalability for large graphs that do not fit in memory remains a challenge.
Additionally, most works that try to use BP on large-scale graphs rely on specific infrastructure such as supercomputers and computational clusters.
Therefore, we propose a new algorithm that leverages state-of-the-art asynchronous vertex-centric parallel processing techniques in conjunction with the state-of-the-art BP alternative LinBP, to provide a scalable framework for large graph inference that runs on a single commodity machine.
Our results show that our algorithm is up to 200 times faster than LinBP's SQL implementation on the tested networks, while achieving the same accuracy rate.
We also show that, due to the asynchronous processing, our algorithm needs fewer iterations to converge than LinBP when using the same parameters.
Finally, we believe that our methodology highlights the not yet fully explored parallelism available on commodity machines, leaning towards a more cost-efficient computational paradigm.
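The linearized guilt-by-association idea behind LinBP can be sketched in a few lines (this is an illustrative toy, not the authors' vertex-centric implementation): each node's belief is its prior plus a damped sum of its neighbors' beliefs, iterated until convergence. The graph, priors, and damping factor `eps` below are assumptions for illustration.

```python
def propagate(adj, priors, eps=0.1, iters=50):
    """Linearized guilt-by-association propagation (in the spirit of LinBP):
    belief(node) = prior(node) + eps * sum of neighbor beliefs, iterated."""
    beliefs = dict(priors)
    for _ in range(iters):
        beliefs = {node: priors.get(node, 0.0)
                         + eps * sum(beliefs[n] for n in nbrs)
                   for node, nbrs in adj.items()}
    return beliefs

# Tiny path graph: node 0 labeled positive (+1), node 3 negative (-1),
# nodes 1 and 2 unlabeled
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
priors = {0: 1.0, 1: 0.0, 2: 0.0, 3: -1.0}
b = propagate(adj, priors)
# Unlabeled nodes inherit the sign of their nearest labeled seed
```

A small `eps` keeps the iteration contractive, which is exactly the convergence condition that plain loopy BP lacks.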
Larry will discuss what data science means in general, and more specifically at Udemy. He will describe some key data science frameworks, and what it means for them to be agile. He will also discuss ideally what it would mean to be a data scientist at Udemy.
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...Carole Goble
Over the past 5 years we have seen a change in expectations for the management of all the outcomes of research – that is, the “assets” of data, models, codes, SOPs and so forth. Don’t stop reading. Data management isn’t likely to win anyone a Nobel prize. But publications should be supported and accompanied by data, methods, procedures, etc. to assure reproducibility of results. Funding agencies expect data (and increasingly software) management, retention and access plans as part of the proposal process for projects to be funded. Journals are raising their expectations of the availability of data and codes for pre- and post-publication. The multi-component, multi-disciplinary nature of Systems Biology demands the interlinking and exchange of assets and the systematic recording of metadata for their interpretation.
The FAIR Guiding Principles for scientific data management and stewardship (http://www.nature.com/articles/sdata201618) have been an effective rallying cry for EU and USA Research Infrastructures. The FAIRDOM (Findable, Accessible, Interoperable, Reusable Data, Operations and Models) Initiative has 8 years of experience of asset sharing and data infrastructure ranging across European programmes (SysMO and EraSysAPP ERANets), national initiatives (de.NBI, German Virtual Liver Network, UK SynBio centres) and PI's labs. It aims to support Systems and Synthetic Biology researchers with data and model management, with an emphasis on standards smuggled in by stealth and sensitivity to asset sharing and credit anxiety.
This talk will use the FAIRDOM Initiative to discuss the FAIR management of data, SOPs, and models for Sys Bio, highlighting the challenges of and approaches to sharing, credit, citation and asset infrastructures in practice. I'll also highlight recent experiments in affecting sharing using behavioural interventions.
http://www.fair-dom.org
http://www.fairdomhub.org
http://www.seek4science.org
Presented at COMBINE 2016, Newcastle, 19 September.
http://co.mbine.org/events/COMBINE_2016
Webinar: A bi-objective multiperiod fuzzy scheduling for a multimodal urban t...BRTCoE
2015-02-05 by Paula Ávila
Get the video and more info here:
http://www.brt.cl/webinar-a-bi-objetive-multiperiod-fuzzy-scheduling-for-a-multimodal-urban-transport-system
EzPAARSE is open source software that analyses your locally gathered proxy logfiles and provides you with COUNTER-deduplicated, KBART-formatted and geolocalised reports of your users’ accesses to subscribed e-resources. Come and watch us demo it live to understand how it works and learn how to install it in your institution for producing your own enriched measures and indicators.
Conference: 23rd ICE/IEEE ITMC Conference (ICE2017), Madeira, Portugal, June 27-30, 2017
Title of the paper: An Approach to Production Scheduling Optimization: A Case of an Oil Lubrication and Hydraulic Systems Manufacturer
Authors: Artem Katasonov, Toni Lastusilta, Timo Korvola, Leila Saari, Dan Bendas, Roberto Camp, Wael M. Mohammed, Angelica Nieto Lee
If you would like to receive a reprint of the original paper, please contact us.
Model-Based Optimization for Effective and Reliable Decision-Making - Bob Fourer
Optimization originated as an advanced mathematical technique, but it has become an accessible and widely used decision-making tool. A key factor in the spread of successful optimization applications has been the adoption of a model-based approach: A domain expert or operations analyst focuses on modeling the problem of interest, while the computation of a solution is left to general-purpose, off-the-shelf solvers; powerful yet intuitive modeling software manages the difficulties of translating between the human modeler’s formulation and the solver software’s needs. This talk introduces model-based optimization by contrasting it to a method-based approach that relies on customized implementation of rules and algorithms. Model-based implementations are illustrated using the AMPL modeling language and popular solvers. The presentation concludes by surveying the variety of modeling languages and solvers available for model-based optimization today.
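The model-based separation of concerns described above can be illustrated with a toy example (not AMPL, and deliberately simplistic): the problem is declared purely as data, and a generic solver that knows nothing about the domain consumes it. All coefficients below are hypothetical.

```python
from itertools import product

# "Model-based" setup: the problem is a declaration (objective + constraints),
# with no algorithmic rules embedded in it. Coefficients are hypothetical.
model = {
    "objective": lambda x, y: 3 * x + 2 * y,   # maximize this
    "constraints": [
        lambda x, y: x + y <= 4,
        lambda x, y: x <= 3,
    ],
}

def solve(model, bounds=range(0, 5)):
    """Generic brute-force solver over an integer grid: it works for any
    model with this shape, mirroring how off-the-shelf solvers are reused."""
    feasible = [(x, y) for x, y in product(bounds, repeat=2)
                if all(c(x, y) for c in model["constraints"])]
    return max(feasible, key=lambda p: model["objective"](*p))

best = solve(model)   # (3, 1): objective value 11
```

Swapping in a different model dict requires no solver changes, which is the point of the model-based approach; a method-based approach would have baked the constraint logic into the search itself.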
Talk given at the International Conference on Cognitive Modelling, University of Groningen, on 10 April 2015.
CC0 - Public Domain
To the extent possible under law, Caspar Addyman has waived all copyright and related or neighboring rights to Open science in cognitive modeling. This work is published from: United Kingdom.
Our research demonstrates how data assimilation can be used, with a non-hydrostatic coastal ocean model, to study sub-mesoscale processes and accurately estimate the state variables. The implementation is non-trivial for physical ocean models, which are highly nonlinear, sensitive to perturbations, and require a dense spatial discretization in order to correctly reproduce the dynamics. A major challenge of this approach is the high computational cost incurred by a high-resolution numerical model with a three-dimensional data assimilation scheme in a complicated stratified system. Interfacing the General Curvilinear Coastal Ocean Model (GCCOM) with the faster data assimilation framework, NCAR Data Assimilation Research Testbed (DART), allowed us to assimilate very high resolution observations into the system. Observing System Simulation Experiments (OSSEs) in very steep seamount test cases are presented. These were used to explore the proper initial ensemble members for the model, estimate the observation error variance needed to reproduce the dynamics in a turbulent flow experiment, and to analyze the impact of localization in such small processes. Our results demonstrate that the DART-GCCOM model can assimilate high resolution observations (tenths of meters) using as few as 30 ensemble members.
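The ensemble assimilation idea can be sketched in its simplest scalar form (a generic EnKF-style update, not DART's actual scheme): the Kalman gain weighs the ensemble's forecast spread against the observation error variance. This sketch omits observation perturbations and assumes an identity observation operator; all numbers are illustrative.

```python
import statistics

def enkf_update(ensemble, obs, obs_var):
    """One scalar ensemble Kalman analysis step (simplified sketch)."""
    p = statistics.variance(ensemble)   # forecast error variance from spread
    k = p / (p + obs_var)               # Kalman gain
    # Pull every member toward the observation, weighted by the gain
    return [x + k * (obs - x) for x in ensemble]

# 30 ensemble members spread around 10 (hypothetical), observation at 12
forecast = [10 + 0.1 * i for i in range(30)]
analysis = enkf_update(forecast, obs=12.0, obs_var=0.5)
# The analysis mean moves toward the observation, and the spread shrinks
```

With few members (like the 30 above), the sample covariance is noisy, which is why localization matters in practice.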
- Complexity of climate systems
- Climate modelling
- The need for modelling
- System thinking
- Analytical vs Numerical modeling
- Mathematical models
- Modeling process and model selection
- Model Uncertainty
- Modeling application and tools
We participated in the MediaEval Benchmark, whose goal is to concentrate on multimodal geo-location prediction on the Yahoo! Flickr Creative Commons 100M dataset – the placing task. It challenges participants to develop models and/or techniques to estimate the geographic locations of Flickr resources based on textual metadata, e.g. titles, descriptions and tags. We aim to find a procedure that is easy to understand, simple to implement, and flexible enough to integrate different techniques. In this paper, we present a three-step approach to tackle the locale-based sub-task.
http://ceur-ws.org/Vol-1436/
http://www.multimediaeval.org
Information about the study and career opportunities promoted by the Computer Science program of the Universidade de São Paulo, São Carlos campus.
Introduction to the Business Intelligence tools of the Hadoop ecosystem:
Business Intelligence and Big Data
Big Data warehousing
Architecture of a data warehouse
Hadoop and Apache Hive
Extract Transform Load
Data warehouse vs. operational database
OLAP – Online Analytical Processing
Apache Kylin
Conventional OLAP solutions
Advanced analytics with Apache Mahout
MetricSPlat - A platform for quick development, testing and visualization of... - Universidade de São Paulo
Jose Rodrigues, Luciana A S Romani, Luciana Zaina, Ricardo Ciferri (2009) MetricSPlat - A platform for quick development, testing and visualization of content-based retrieval techniques In: Simpósio Brasileiro de Bancos de Dados - SBBD2009 1-6.
@inproceedings { RodriguesSBBD09,
title = "MetricSPlat - A platform for quick development, testing and visualization of content-based retrieval techniques",
year = "2009",
author = "Jose Rodrigues and Luciana A S Romani and Luciana Zaina and Ricardo Ciferri",
booktitle = "Simpósio Brasileiro de Bancos de Dados - SBBD2009",
pages = "1-6",
url = "http://www.icmc.usp.br/~junio/PublishedPapers/RodriguesSBBD09-MetricSPlat.pdf",
urllink = "http://www.icmc.usp.br/~junio/MetricSPlat/index.htm",
abstract = "The development and testing of content-based data retrieval systems is a time-consuming task. Over the concept of metric space, such systems must integrate the three factors that define an indexing environment. These factors are features extraction, metric structures and distance functions, not to mention a suitable user interface. This integration deviates the work from the real focus of research, suppressing quick experimentation of ideas. In this context, we present the Metric Space Platform (MetricSPlat), a system designed for content-based retrieval enabled with plug-in features. With minimal effort, MetricSPlat substantially speeds up the experimentation of new techniques by providing a well-defined framework aided with interactive data visualization techniques.",
note = "8 pages",
keywords = "visualization, content-based data retrieval"}
Hierarchical visual filtering pragmatic and epistemic actions for database vi... - Universidade de São Paulo
Jose Rodrigues, Carlos E Cirilo, Luciana A M Zaina, Antonio F Prado (2013) Hierarchical Visual Filtering, pragmatic and epistemic actions for database visualization In: Proceedings of the ACM Symposium on Applied Computing, ACM Press, 946-952.
@inproceedings { ref35,
title = "Hierarchical Visual Filtering, pragmatic and epistemic actions for database visualization",
year = "2013",
author = "Jose Rodrigues and Carlos E Cirilo and Luciana A M Zaina and Antonio F Prado",
booktitle = "Proceedings of the ACM Symposium on Applied Computing",
editor = "A C M Press",
pages = "946-952",
publisher = "ACM Press",
doi = "10.1145/2480362.2480545",
url = "http://www.icmc.usp.br/~junio/PublishedPapers/RodriguesJr_et_al-ACMSAC2013.pdf",
urllink = "http://www.icmc.usp.br/~junio/VisTree/VisTree.htm",
abstract = "Visualization techniques of all sorts suffer from visual cluttering, the occlusion of visual information due to the overlap of graphical items, and from excessive complexity in analytical tasks due to multiple parallel perspectives drawn from the data at hand. To cope with these problems, we introduce Hierarchical Visual Filtering, a novel interaction principle that brings pragmatic and epistemic actions to visualization techniques. Pragmatic actions here mean that the analyst is able to visually select and filter information, determining visual configurations that reveal different perspectives of the data; epistemic actions mean that the analyst can record, annotate, and recall intermediate visualizations created over his pragmatic actions. To do so, we use a tree-like structure to keep multiple visualization workspaces linked according to the analytical decisions taken by the user. Our goal is to promote an innovative systematization that can augment the potential for database visual inspection, and for visualization systems in general. It is our contention that Hierarchical Visual Filtering can inspire a novel scheme of visualization environments in which space limitations and complexity are treated by means of interactive tasks.",
keywords = "Information Visualization, Multiple Views, Visual Data Analysis, Databases, Interactive Filtering, Hierarchical Filtering"}
Multimodal graph-based analysis over the DBLP repository: critical discoveries and hypotheses
1. Introduction Methodology Experiments Conclusions
Gabriel Perri Gimenes, Hugo Gualdron, Jose F Rodrigues Jr (1), Mario Gazziro (2)
(1) University of Sao Paulo, Av Trab Sao-carlense, 400, Sao Carlos, SP, Brazil - 13566-590
(2) Federal University of Santo Andre, Av dos Estados, 500, Santo Andre, SP, Brazil - 09210-580
{ggimenes,gualdron,junio}@icmc.usp.br, mario.gazziro@ufabc.edu.br
This work has financial support from Fapesp (2013/10026-7)
http://www.icmc.usp.br/pessoas/junio/Site/index.htm
The 30th ACM/SIGAPP Symposium On Applied Computing, 2015 1/21
Introduction
High demand for information about the behavior of scientists: from authors, editors, funding agencies, and society
Combining analytical techniques - the multimodal approach
Problem
Finding non-evident facts about DBLP is a non-trivial task
Single-technique approaches - limited analytical potential
Systematic process - can be applied to similar data from other domains
Hypothesis
The use of multiple analytical techniques, through a well-defined process, is capable of revealing important aspects of the scientific community in computer science
Materials
Cardinality of the entities extracted from the DBLP XML:
Entity         Count
Authors        1,060,221
Articles       1,801,576
Events         14,654
Publications   4,262
Data migration
Semi-structured format ⇒ Relational model
Need for specific software for the migration
Definition of the entity-relationship model
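The migration step can be sketched as follows, assuming a DBLP-like XML fragment (the element names mirror the real DBLP schema, but the records themselves are hypothetical):

```python
import xml.etree.ElementTree as ET

# Minimal DBLP-like XML fragment (hypothetical records; real DBLP uses
# <article>/<inproceedings> elements with nested <author> children)
xml_data = """
<dblp>
  <article key="a1">
    <author>Alice</author><author>Bob</author>
    <title>Graph Mining</title><year>2013</year>
  </article>
  <article key="a2">
    <author>Alice</author>
    <title>Link Prediction</title><year>2014</year>
  </article>
</dblp>
"""

# Migrate the semi-structured records into flat relational tuples:
# an Authors entity set and a writes(author, article_key) relationship
authors, writes = set(), []
for art in ET.fromstring(xml_data).iter("article"):
    key = art.get("key")
    for a in art.iter("author"):
        authors.add(a.text)
        writes.append((a.text, key))
```

From tuples like `writes`, the co-authorship relationship falls out as a self-join on the article key.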
Extracted relationships
Relationship    Description
Co-authorship   Authors that published an article together.
Co-edition      Authors that appear as editors in the same event or journal.
Multimodal Analysis - WCC
Weakly-connected components distribution - Co-authorship
13% small components with up to 30 nodes
Giant component with 87% of the authors
44,000 sub-networks of co-authorship - occasional researchers, industry white papers
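Component statistics like these can be computed with a standard union-find pass; a stdlib-only sketch on a toy co-authorship graph (for the undirected co-authorship network, weak and ordinary connectivity coincide):

```python
def component_sizes(edges):
    """Connected component sizes via union-find with path halving."""
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    for u, v in edges:
        parent[find(u)] = find(v)          # union the two components
    sizes = {}
    for node in list(parent):
        root = find(node)
        sizes[root] = sizes.get(root, 0) + 1
    return sorted(sizes.values(), reverse=True)

# Toy co-authorship edges: one "giant" component and one isolated pair
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("x", "y")]
sizes = component_sizes(edges)   # [4, 2]
```

The giant-component fraction reported above is then just `sizes[0] / sum(sizes)`.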
Multimodal Analysis - ACC
Node degree × average clustering coefficient - Co-authorship
High coefficient values are found in nodes with degree < 10
Coefficient value decreases as the node degree increases - ACC ∝ degree^(-1.06)
Authors tend to collaborate with the co-authors of their co-authors - triangles
Young authors vs. older authors
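The coefficient being averaged is the local clustering coefficient: the fraction of a node's neighbor pairs that are themselves connected (closed triangles). A minimal sketch on a toy neighborhood:

```python
def clustering(adj, node):
    """Local clustering coefficient of `node` in an adjacency-list graph."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0   # no neighbor pairs, coefficient defined as 0
    # Count neighbor pairs that are directly linked (closed triangles)
    links = sum(1 for i, u in enumerate(nbrs) for v in nbrs[i + 1:]
                if v in adj[u])
    return 2 * links / (k * (k - 1))

# Toy co-authorship neighborhood: a, b, c form a triangle; d dangles off a
adj = {"a": ["b", "c", "d"], "b": ["a", "c"], "c": ["a", "b"], "d": ["a"]}
assert clustering(adj, "b") == 1.0   # both of b's co-authors collaborate
coeff_a = clustering(adj, "a")       # 1/3: only one of three pairs closed
```

Averaging this value over all nodes of a given degree produces the degree-vs-ACC curve discussed above.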
Multimodal Analysis - Densification
Degree distribution - Co-authorship
As new authors appear, new edges also appear - e(t) ∝ n(t)^1.47 - densification
Edges appear exponentially vs. publication of elaborated articles
Master's and Ph.D. as regular courses
Funding agencies - numbers
More authors per paper
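The densification exponent (the 1.47 above) is conventionally estimated as the least-squares slope of log(edges) versus log(nodes) across graph snapshots; a sketch with hypothetical snapshot counts:

```python
import math

def fit_exponent(nodes, edges):
    """Least-squares slope of log(edges) vs. log(nodes):
    the exponent alpha in e(t) ∝ n(t)^alpha."""
    xs = [math.log(n) for n in nodes]
    ys = [math.log(e) for e in edges]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Hypothetical yearly snapshots following e = n^1.5 exactly
nodes = [100, 1000, 10000]
edges = [n ** 1.5 for n in nodes]
alpha = fit_exponent(nodes, edges)   # ≈ 1.5
```

An exponent above 1, as found for DBLP, means edges grow superlinearly in the number of nodes: the network densifies over time.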
Multimodal Analysis - Diameter
Effective diameter evolution - Co-edition
Peaked near 1995 - beginning of a shrinking period
Before that - new editors/publication vehicles vs. after that - same editors/same vehicles
Densification period: more new edges than new nodes - editor committees rotate among the same members
Editor: experience and expertise - limitations for new researchers
Multimodal Analysis - Predictability
Predictability analysis - Co-authorship
Can we predict new interactions in the DBLP network?
Extraction of topological features → supervised learning
Figure: Results - Interval G[1995, 2005], G[2006, 2007]
Multimodal Analysis - Counting and algebraic analysis
Counting - Bipartite author-article network with timestamps
Accomplishment: number of years with at least one publication
Silence: number of consecutive years with no publications
Multimodal Analysis - Counting and algebraic analysis
Proposed metric
Importance = (1 / √(Silence + 1)) · log(Accomplishment)
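The metric translates directly into code (the argument values below are hypothetical):

```python
import math

def importance(accomplishment, silence):
    """Proposed metric: rewards years with publications and penalizes
    consecutive years of silence."""
    return (1 / math.sqrt(silence + 1)) * math.log(accomplishment)

# A steadily active author scores above an equally accomplished
# but recently silent one
active = importance(accomplishment=10, silence=0)
dormant = importance(accomplishment=10, silence=3)
```

With silence = 0 the metric reduces to log(Accomplishment); each additional silent year shrinks it by the square-root factor.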
Conclusions
Well-defined analytical process - combination of multiple techniques
Non-trivial extraction of information from DBLP
Multi-perspective interpretations about the past and future of the academic community in computer science
Application in the decision-making process of funding agencies and academic personnel