This presentation was given at the SSW workshop, co-located with VLDB 2012.
Exploratory search applications upon structured Web content are becoming one of the main information-seeking paradigms for users. This is mainly due to the move towards mobile and pervasive Web access and to the increasingly tight intertwining between everyday life and information seeking.
Structured data is typically distributed on the Web and accessible through a service-oriented paradigm. This paper proposes a vision of: (1) a semantically-enabled service registration framework for describing Web data services in a convenient way; and (2) a design-pattern-based method for designing applications that exploit such a model.
A tutorial on query auto-completion, drawing on more than 10 conference papers, about the development of current query auto-completion and its personalized, time-sensitive, and mobile features.
A recommendation engine for your PHP application, by Michele Orselli
Nowadays a lot of websites try to guess what we might like: "Recommendations for you in books",
"People you may like".
Sounds familiar, doesn't it? Wouldn't it be cool if you could do the same in your application? Well, this session is for you! In the first part of this talk, recommendation systems will be introduced, focusing on collaborative filtering (CF) algorithms. After that we'll dive into Prediction.io, an open source machine learning server that lets software developers create predictive features, such as personalization, recommendation and content discovery. In the last part we'll cover the integration details with a PHP application.
Improving Software Languages: usage patterns to the rescue, by Jordi Cabot
In this presentation at PAME 2015 (a STAF co-located workshop) I argue that most patterns are useless and the only ones I'm interested in are usage patterns showing how people really use a language. Then I go on to claim that these usage patterns are the key to evolving a language in order to improve its usability.
Read more on: http://modeling-languages.com and http://jordicabot.com
From Declarative to Imperative Operation Specifications (ER 2007), by Jordi Cabot
Declarative specifications are better but have ambiguity problems. Here I present my common-sense-based approach to interpreting declarative postconditions in a non-ambiguous way.
Full paper: http://jordicabot.com/papers/ER07.pdf
(presented in the ER'07 conference)
Modeling Safe Interface Interactions in Web Applications (ER'09), by Jordi Cabot
Moving the Web (and the supporting browsers) from the browsing paradigm based on Pages, with related Back and Forward actions, to a full-fledged interactive application paradigm, based on the concept of State, that features Undo and Redo capabilities, and transactional properties
Execution Semantics of BPMN through MDE Web Application Generation, using BPM..., by Marco Brambilla
We describe a pragmatic approach based on Model Driven Engineering (MDE) principles for implementing the execution semantics of BPMN. The approach is based on a two-step model transformation that transforms BPMN models into Web application models specified according to the WebML notation, and then into running Web applications. Thanks to the proposed chain of model transformations it is also possible to fine-tune the final application in several ways by refining the intermediate WebML application models.
Model driven crowdsourcing of search (CrowdSearch2012 workshop at WWW2012), by Marco Brambilla
Even though search systems are very efficient in retrieving world-wide information, they cannot capture some peculiar aspects and features of user needs, such as subjective opinions and recommendations, or information that requires local or domain-specific expertise. In this kind of scenario, the human opinion provided by an expert or knowledgeable user can be more useful than any factual information retrieved by a search engine.
In this paper we propose a model-driven approach for the specification of crowd-search tasks, i.e. activities where real people, in real time, take part in the generalized search process that involves search engines. In particular we define two models: the "Query Task Model", representing the meta-model of the query that is submitted to the crowd and the associated answers; and the "User Interaction Model", which shows how the user can interact with the query model to fulfill her needs. Our solution allows for a top-down design approach, from the crowd-search task design down to the crowd answering system design. Our approach also grants automatic code generation, thus leading to quick prototyping of search applications based on human responses collected over social networking or crowdsourcing platforms.
Looking at WordPress through the eyes of a Software Researcher, by Jordi Cabot
Talk given at WordCamp Europe 2015 on:
"What does a researcher have to say about the WordPress source code and the community behind it? Join us on this talk on unusual “WordPress analytics” and see what we can learn, and improve!, from the way WordPress (and the plugin and theme ecosystem around it) is developed nowadays."
More on: http://modeling-languages.com and http://jordicabot.com
The international post-graduate specialization Diploma in Model Driven Engineering (MDE) for Software Management is offered by Ecole des Mines de Nantes. Its objective is to train engineers to manage complex projects in various IT fields with the latest cutting-edge modeling technologies.
Current conceptual models and methodologies for Web applications concentrate on content, navigation, and service modeling. Although some of them are meant to address semantic web applications too, they do not fully exploit the whole potential deriving from interaction with ontological data sources and from Semantic annotations. This paper proposes an extension to Web application conceptual models toward the Semantic Web. We devise an extension of the WebML modeling framework that fulfills most of the design requirements emerging in the new area of the Semantic Web. We generalize the development process to cover the Semantic Web and we devise a set of new primitives for ontology importing and querying. Finally, an implementation prototype of the proposed concepts is presented within the commercial tool WebRatio.
WebML and WebRatio 5 - TOOLS conference, Zurich 2008, by Marco Brambilla
We present WebRatio 5.0, a design tool that supports WebML (Web Modelling Language). WebML is a domain specific language (DSL) for designing complex, distributed, multi-actor, and adaptive applications deployed on the Web and on Service Oriented Architectures using Web Services. WebRatio 5.0 provides visual design facilities based on the WebML notation and code generation engines for J2EE Web applications. The tool is developed as a set of Eclipse plug-ins and takes advantage of all the features of this IDE framework. It also provides support for customized extensions to the models, project documentation, and requirements specifications. The overall approach moves towards a full coverage of the specification, design, verification, and implementation of Web applications.
Strategic scenarios in digital content and digital business, by Marco Brambilla
This lesson was given in May 2009 at MIP, Politecnico di Milano. The audience included members of the Acer academy program.
Rights on reused content are maintained by respective owners.
See further information on my activity at:
http://home.dei.polimi.it/mbrambil/
and:
http://twitter.com/marcobrambi
Collaborative definition of Domain-Specific Languages (DSLs) - CAiSE'13, by Jordi Cabot
Proposing a community-aware language development process where all community members (both developers and end-users of the DSL) participate: voting and discussing proposals, solutions and decisions.
This tutorial gives an overview of how search engines and machine learning techniques can be tightly coupled to address the need for building scalable recommender or other prediction-based systems. Typically, most of them architect retrieval and prediction in two phases. In Phase I, a search engine returns the top-k results based on constraints expressed as a query. In Phase II, the top-k results are re-ranked in another system according to an optimization function that uses a supervised trained model. However, this approach presents several issues, such as the possibility of returning sub-optimal results due to the top-k limit at query time, as well as the presence of inefficiencies in the system due to the decoupling of retrieval and ranking.
To address this issue the authors created ML-Scoring, an open source framework that tightly integrates machine learning models into Elasticsearch, a popular search engine. ML-Scoring replaces the default information retrieval ranking function with a custom supervised model, trained through Spark, Weka, or R, that is loaded as a plugin in Elasticsearch. This tutorial will not only review basic methods in information retrieval and machine learning, but will also walk through practical examples, from loading a dataset into Elasticsearch, to training a model in Spark, Weka, or R, to creating the ML-Scoring plugin for Elasticsearch. No prior experience is required in any system listed (Elasticsearch, Spark, Weka, R), though some programming experience is recommended.
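For context, a hedged sketch of the two-phase baseline described above (Phase I retrieval in Elasticsearch, Phase II re-ranking with an external supervised model). The index name, fields, and features are hypothetical, the model is trained on toy data, and the client calls assume elasticsearch-py 8.x:

```python
import numpy as np
from elasticsearch import Elasticsearch
from sklearn.linear_model import LogisticRegression

es = Elasticsearch("http://localhost:9200")  # assumed local instance

# Phase I: the search engine returns the top-k candidates.
resp = es.search(index="products",  # hypothetical index
                 query={"match": {"title": "running shoes"}},
                 size=100)
hits = resp["hits"]["hits"]

# Phase II: re-rank with a supervised model (toy training data here;
# in practice the model is trained offline on click/conversion labels).
X_train = np.array([[1.0, 10.0], [0.2, 1.0], [2.5, 30.0], [0.1, 0.5]])
y_train = np.array([1, 0, 1, 0])
model = LogisticRegression().fit(X_train, y_train)

def features(hit):
    # hypothetical features: retrieval score plus a document field
    return [hit["_score"], hit["_source"].get("popularity", 0.0)]

scores = model.predict_proba(np.array([features(h) for h in hits]))[:, 1]
order = sorted(range(len(hits)), key=lambda i: -scores[i])
reranked = [hits[i] for i in order]
```

ML-Scoring's point is precisely to collapse these two phases into one scoring pass inside the engine, avoiding the top-k cutoff.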
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning..., by S. Diana Hu
Search engines have focused on solving the document retrieval problem, so their scoring functions do not naturally handle non-traditional IR data types, such as numerical or categorical ones. Therefore, on domains beyond traditional search, scores representing strengths of associations or matches may vary widely. As such, the original model doesn't suffice, so relevance ranking is performed as a two-phase approach: 1) regular search, 2) an external model to re-rank the filtered items. Metrics such as click-through and conversion rates are associated with the users' response to items served. The predicted selection rates that arise in real time can be critical for optimal matching. For example, in recommender systems, the predicted performance of a recommended item in a given context, also called response prediction, is often used in determining a set of recommendations to serve in relation to a given serving opportunity. Similar techniques are used in the advertising domain. To address this issue the authors have created ML-Scoring, an open source framework that tightly integrates machine learning models into a popular search engine (SOLR/Elasticsearch), replacing the default IR-based ranking function. A custom model is trained through either Weka or Spark and loaded as a plugin used at query time to compute custom scores.
Keyword research tools for Search Engine Optimisation (SEO), by Duncan MacGruer
Presentation given to the University of Edinburgh web publishers community in January 2018 on the use of Keyword research tools for Search Engine Optimisation (SEO).
Search is nothing without context. In this session we help you build your strategy around best serving your customers for Office 365 through planning and best practices. SharePoint 2013 is compared to Office 365 as well.
Determining the overall system performance and measuring the quality of complex search systems are tough questions. Changes come from all subsystems of the complex system, at the same time, making it difficult to assess which modification came from which sub-component and whether they improved or regressed the overall performance. If this wasn’t hard enough, the target against which you are measuring your search system is also constantly evolving, sometimes in real time. Regression testing of the system and its components is crucial, but resources are limited. In this talk I discuss some of the issues involved and some possible ways of dealing with these problems. In particular I want to present an academic view of what I should have known about search quality before I joined Cuil in 2008.
Making IA Real: Planning an Information Architecture Strategy, by Chiara Fox Ogan
Presented at the Internet Librarian conference in 2001. Provides an introduction to what information architecture is and how you can use its methods to develop a good website.
Slides from Enterprise Search & Analytics Meetup @ Cisco Systems - http://www.meetup.com/Enterprise-Search-and-Analytics-Meetup/events/220742081/
Relevancy and Search Quality Analysis - by Mark David and Avi Rappoport
The Manifold Path to Search Quality
To achieve accurate search results, we must come to an understanding of the three pillars involved.
1. Understand your data
2. Understand your customers’ intent
3. Understand your search engine
The first path passes through Data Analysis and Text Processing.
The second passes through Query Processing, Log Analysis, and Result Presentation.
Everything learned from those explorations feeds into the final path of Relevancy Ranking.
Search quality is focused on end users finding what they want -- technical relevance is sometimes irrelevant! Working with the short head (very frequent queries) has the most return on investment for improving the search experience: tuning the results, for example, to emphasize recent documents or de-emphasize archive documents; near-duplicate detection; exposing diverse results in ambiguous situations; using synonyms; and guiding search via best bets and auto-suggest. Long-tail analysis can reveal user intent by detecting patterns, discovering related terms, and identifying the most fruitful results by aggregated behavior. All this feeds back into the regression testing, which provides reliable metrics to evaluate the changes.
By merging these insights, you can improve the quality of the search overall, in a scalable and maintainable fashion.
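A minimal sketch of the short-head idea above: rank queries in a log by frequency and see how few distinct queries cover half of all traffic. The log format (one query per line) is an assumption:

```python
from collections import Counter

# count query frequencies from a one-query-per-line log (assumed format)
with open("queries.log", encoding="utf-8") as f:
    counts = Counter(line.strip().lower() for line in f if line.strip())

total = sum(counts.values())
ranked = counts.most_common()

# the "short head": the few queries that cover a large share of volume
covered, head = 0, []
for query, n in ranked:
    head.append(query)
    covered += n
    if covered / total >= 0.5:   # head = queries covering 50% of searches
        break

print(f"{len(head)} distinct queries cover 50% of {total} searches")
```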
Slides for the iDB summer school (Sapporo, Japan) http://db-event.jpn.org/idb2013/
Typically, Web mining approaches have focused on enhancing or learning about user seeking behavior, ranging from query log analysis and click-through usage, to employing the web graph structure for ranking, to detecting spam or web page duplicates. Lately, there is a trend toward mining web content semantics and dynamics in order to enhance search capabilities, by either providing direct answers to users or allowing for advanced interfaces or capabilities. In this tutorial we will look into different ways of mining textual information from Web archives, with a particular focus on how to extract and disambiguate entities and how to put them to use in various search scenarios. Further, we will discuss how Web dynamics affect information access and how to exploit them in a search context.
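As a hedged illustration of the entity-extraction step discussed above, here is a minimal spaCy sketch (spaCy is one common choice, not necessarily the tutorial's toolkit); entity disambiguation and linking would be separate steps:

```python
import spacy

# assumes the small English model has been installed:
#   python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

doc = nlp("Barack Obama visited Sapporo during the iDB summer school.")
for ent in doc.ents:
    print(ent.text, ent.label_)   # e.g. "Barack Obama PERSON", "Sapporo GPE"
```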
Slides for my full-day information architecture workshop. Will teach in Minneapolis, MN (November 12, 2012) and Toronto, ON (November 29, 2012) Details: http://rosenfeldmedia.com/workshops/
"The greater promise of Big Data lies not in doing old things in slightly new ways. Instead, it lies in doing new things that were previously not possible. One major class of new things is adding intelligence to large-scale systems. In this session I will present a survey of how machine learning can be applied to real-life situations without having to get a PhD in advanced mathematics. These systems can be built today from open source components to increase business revenues by understanding what customers need and want. I will provide real world examples of best practices and pitfalls in machine learning including practical ways to build maintainable, high performance systems." - Ted Dunning
Similar to: Exploratory Search upon Semantically Described Web Data Sources: Service registration and methodology. At VLDB 2012
Hierarchical Transformers for User Semantic Similarity - ICWE 2023, by Marco Brambilla
We discuss the use of hierarchical transformers for user semantic similarity in the context of analyzing users' behavior and profiling social media users. The objectives of the research include finding the best model for computing semantic user similarity, exploring the use of transformer-based models, and evaluating whether the embeddings reflect the desired similarity concept and can be used for other tasks.
We use a large dataset of Twitter users and apply an automatic labeling approach. The dataset consists of English tweets posted in November and December 2020, totaling about 27GB of compressed data. Preprocessing steps include filtering out short texts, cleaning user connections, and selecting a benchmark set of users for evaluation.
Since Transformer architectures are known to work well on short texts, we cannot apply them directly to the extensive collections of tweets that describe a user's activity. Therefore, we propose a hierarchical structure of transformer models to be used first on tweets and then on their aggregations.
The models used in the study include hierarchical transformers, and the tweet embeddings are obtained using four Transformer-based models: RoBERTa, BERTweet, Sentence BERT, and Twitter4SSE. The researchers test different techniques for processing tweet embeddings to generate accurate user embeddings, including mean pooling, recurrence over BERT (RoBERT), and transformer over BERT (ToBERT).
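As a hedged illustration of the mean-pooling variant described above, the sketch below embeds each tweet with a sentence-level transformer and averages the vectors into one user embedding; the checkpoint name is a public model chosen for illustration, not necessarily the paper's:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Stage-1: a sentence-level transformer embeds individual tweets
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

def user_embedding(tweets: list[str]) -> np.ndarray:
    tweet_vecs = model.encode(tweets)   # one vector per tweet
    return tweet_vecs.mean(axis=0)      # Stage-2: mean pooling into a user vector

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

u1 = user_embedding(["just finished a 10k run", "training for a marathon"])
u2 = user_embedding(["new PB on my half marathon!", "rest day today"])
print(cosine(u1, u2))   # higher cosine = more semantically similar users
```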
The evaluation of the models is done on a set of 5,000 users, comparing user similarities with 30 other candidate users, 5 of which are considered similar and 25 considered dissimilar. The evaluation metrics used include mean average precision (MAP), mean reciprocal rank (MRR) at 10, and normalized discounted cumulative gain (nDCG).
The optimization process involves selecting a loss function and using the AdamW optimizer with specific hyperparameters. The results show that the hierarchical approach with a Stage-1 Twitter4SSE model and a Stage-2 Transformer model performs the best among the alternatives.
In conclusion, the research provides a large unbiased dataset for user similarity analysis, presents a hierarchical language model optimized for accurate user similarity computation, and validates the models' performance on similarity tasks, with potential applications to related problems.
The future work includes investigating the impact of time and topic drift on the models' performance.
Exploring the Bi-verse. A trip across the digital and physical ecospheres, by Marco Brambilla
The Web and social media are the environments where people post their content, opinions, activities, and resources. Therefore, a considerable amount of user-generated content is produced every day for a wide variety of purposes. On the other side, people live their everyday life immersed in the physical world, where society, economy, politics and personal relations continuously evolve. These two opposite and complementary environments are today fully integrated: they reflect each other and interact with each other ever more strongly.
Exploring and studying content and data coming from both environments offers a great opportunity to understand the ever-evolving modern society, in terms of topics of interest, events, relations, and behaviour.
In this speech I will discuss, through business cases and socio-political scenarios, how we can extract insights and understand reality by combining and analyzing data from the digital and physical worlds, so as to reach a better overall picture of reality itself. Along this path, we need to take into account that reality is complex and varies in time, space and along many other dimensions, including societal and economic variables. The speech highlights the main challenges that need to be addressed and outlines some data science strategies that can be applied to tackle these specific challenges.
This slide deck has been presented as a keynote speech at WISE 2022 in Biarritz, France.
In online social media platforms, users can express their ideas by posting original content or by adding comments and responses to existing posts, thus generating virtual discussions and conversations. Studying these conversations is essential for understanding the online communication behavior of users. This study proposes a novel approach to retrieve popular patterns in online conversations using network-based analysis. The analysis consists of two main stages: intent analysis and network generation. Users' intention is detected using keyword-based categorization of posts and comments, integrated with classification through Naïve Bayes and Support Vector Machine algorithms for uncategorized comments. A continuous human-in-the-loop approach further improves the keyword-based classification. To build and understand communication patterns among the users, we build conversation graphs starting from the hierarchical structure of posts and comments, using a directed multigraph network. The experiments categorize 90% of comments with 98% accuracy on a real social media dataset. The model then identifies relevant patterns in terms of shape and content, and finally determines the relevance and frequency of the patterns. Results show that the most popular online discussion patterns obtained from conversation graphs resemble real-life interactions and communication.
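A minimal sketch of the conversation-graph stage described above, using a networkx directed multigraph; the post/comment schema and the intent labels are hypothetical stand-ins:

```python
import networkx as nx

# toy posts/comments with author, parent, and a detected intent label
items = [
    {"id": "p1", "author": "alice", "parent": None, "intent": "question"},
    {"id": "c1", "author": "bob",   "parent": "p1", "intent": "answer"},
    {"id": "c2", "author": "carol", "parent": "p1", "intent": "opinion"},
    {"id": "c3", "author": "alice", "parent": "c1", "intent": "thanks"},
]
by_id = {it["id"]: it for it in items}

# directed multigraph: the same pair of users can interact repeatedly
G = nx.MultiDiGraph()
for it in items:
    if it["parent"] is not None:
        target = by_id[it["parent"]]["author"]
        G.add_edge(it["author"], target, intent=it["intent"])

print(G.number_of_nodes(), "users,", G.number_of_edges(), "interactions")
```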
Trigger.eu: Cocteau game for policy making - introduction and demo, by Marco Brambilla
COCTEAU stands for "Co-Creating the European Union".
It's a project supported by the European Union whose objective is to involve citizens in cooperating alongside policy makers, contributing to building a better future.
Generation of Realistic Navigation Paths for Web Site Testing using RNNs and ..., by Marco Brambilla
A large audience of users and typically a long time frame are needed to produce sensible and useful log data, making it an expensive task.
To address this limit, we propose a method that focuses on the generation of realistic navigational paths, i.e., web logs.
Our approach is extremely relevant because it can simultaneously tackle the lack of publicly available data about web navigation logs and be adopted in industry for the automatic generation of realistic test settings for Web sites yet to be deployed.
The generation has been implemented using deep learning methods for generating more realistic navigation activities, namely:
- Recurrent Neural Networks, which are very well suited to temporally evolving data;
- Generative Adversarial Networks, neural networks aimed at generating new data, such as images or text, very similar to the original ones and sometimes indistinguishable from them, which have become increasingly popular in recent years.
We ran experiments using open data sets of weblogs for training, and we ran tests to assess the performance of the methods. Results in generating new weblog data are quite good with respect to the two evaluation metrics adopted (BLEU and human evaluation).
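As a rough illustration of the RNN side (not the paper's implementation), the sketch below trains a small LSTM to predict the next page in a session and then samples synthetic navigation paths; sizes and data are toy stand-ins:

```python
import torch
import torch.nn as nn

n_pages, emb_dim, hidden = 50, 32, 64   # toy vocabulary of page IDs

class PathRNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(n_pages, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_pages)
    def forward(self, x):
        h, _ = self.rnn(self.emb(x))
        return self.out(h)               # next-page logits at every step

model = PathRNN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# toy training batch: sessions as page-ID sequences (random stand-in data)
batch = torch.randint(0, n_pages, (8, 10))
x, y = batch[:, :-1], batch[:, 1:]      # predict each next page
loss = loss_fn(model(x).reshape(-1, n_pages), y.reshape(-1))
opt.zero_grad(); loss.backward(); opt.step()

# generation: repeatedly sample the next page from the model
seq = torch.tensor([[0]])               # start from page 0
for _ in range(9):
    nxt = torch.distributions.Categorical(logits=model(seq)[:, -1]).sample()
    seq = torch.cat([seq, nxt.unsqueeze(1)], dim=1)
print(seq.tolist())                     # one synthetic navigation path
```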
Our study is described in detail in the paper published at ICWE 2020 – International Conference on Web Engineering with DOI: 10.1007/978-3-030-50578-3. It’s available online on the Springer Web site.
Analyzing rich club behavior in open source projects, by Marco Brambilla
The network of collaborations in an open source project can reveal relevant emergent properties that influence its prospects of success.
In this work, we analyze open source projects to determine whether they exhibit a rich-club behavior, i.e., a phenomenon where contributors with a high number of collaborations (i.e., strongly connected within the collaboration network) are likely to cooperate with other well-connected individuals. The presence or absence of a rich club has an impact on the sustainability and robustness of the project.
For this analysis, we build and study a dataset with the 100 most popular projects on GitHub, exploiting connectivity patterns in the graph structure of collaborations that arise from commits, issues and pull requests. Results show that rich-club behavior is present in all the projects, but only a few of them have an evident club structure. We compute coefficients both for single-source graphs and the overall interaction graph, showing that rich-club behavior varies across different layers of software development. We provide possible explanations of our results, as well as implications for further analysis.
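For illustration, networkx ships a rich-club coefficient; here is a minimal sketch on a stand-in scale-free graph rather than the paper's GitHub dataset (the `seed` argument assumes a recent networkx):

```python
import networkx as nx

# toy stand-in for a collaboration graph (hubs emerge by construction)
G = nx.barabasi_albert_graph(1000, 3, seed=42)

# normalized against degree-preserving random rewirings
rc = nx.rich_club_coefficient(G, normalized=True, seed=42)

# rc maps degree k -> coefficient; values well above 1 at high k
# indicate that hubs preferentially connect to each other
for k in sorted(rc)[-5:]:
    print(k, round(rc[k], 2))
```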
Analysis of On-line Debate on Long-Running Political Phenomena. The Brexit C..., by Marco Brambilla
In this study, we demonstrate that computational social science is important for understanding people's behavior in political phenomena. Based on an analysis of the long-running Brexit debate on Twitter, we predict the public stance and discussion topics, and we measure the involvement of automated accounts and politicians' social media accounts.
Community analysis using graph representation learning on social networks, by Marco Brambilla
In a world more and more connected, new and complex interaction patterns can be extracted in the communication between people. This is extremely valuable for brands that can better understand the interests of users and the trends on social media to better target their products. In this paper, we aim to analyze the communities that arise around commercial brands on social networks to understand the meaning of similarity, collaboration, and interaction among users. We exploit the network that builds around the brands by encoding it into a graph model. We build a social network graph, considering user nodes and friendship relations; then we compare it with a heterogeneous graph model, where also posts and hashtags are considered as nodes and connected to the different node types; we finally build also a reduced network, generated by inducing direct user-to-user connections through the intermediate nodes (posts and hashtags). These different variants are encoded using graph representation learning, which generates a numerical vector for each node. Machine learning techniques are applied to these vectors to extract valuable insights for each user and for the communities they belong to. In the paper, we report on our experiments performed on an emerging fashion brand on Instagram, and we show that our approach is able to discriminate potential customers for the brand, and to highlight meaningful sub-communities composed of users that share the same kind of content on social networks.
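A hedged sketch of the representation-learning step: node2vec is one common choice for generating a numerical vector per node (the paper's exact embedding method may differ), followed by clustering to surface sub-communities:

```python
import networkx as nx
from node2vec import Node2Vec
from sklearn.cluster import KMeans

G = nx.karate_club_graph()   # small stand-in for the Instagram graph

# learn one numerical vector per node via biased random walks
n2v = Node2Vec(G, dimensions=32, walk_length=10, num_walks=50)
model = n2v.fit(window=5, min_count=1)   # returns a gensim Word2Vec

# cluster the node vectors into candidate sub-communities
vectors = [model.wv[str(n)] for n in G.nodes]   # node2vec keys are strings
labels = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(vectors)
print(dict(zip(list(G.nodes)[:10], labels[:10])))
```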
Data Cleaning for social media knowledge extraction, by Marco Brambilla
Social media platforms let users share their opinions through textual or multimedia content. In many settings, this becomes a valuable source of knowledge that can be exploited for specific business objectives. Brands and companies often ask to monitor social media as a source for understanding the stance, opinion, and sentiment of their customers, audience and potential audience. This is crucial for them because it lets them understand the trends and future commercial and marketing opportunities.
However, all this relies on a solid and reliable data collection phase, which grants that all the analyses, extractions and predictions are applied to clean, solid and focused data. Indeed, topic-based collection of social media content performed through keyword-based search typically entails very noisy results.
We recently implemented a simple study aimed at cleaning the data collected from social content, within specific domains or related to given topics of interest. We propose a basic method for data cleaning and removal of off-topic content based on supervised machine learning techniques, i.e. classification, over data collected from social media platforms based on keywords regarding a specific topic. We define a general method and then validate it through an experiment of data extraction from Twitter, with respect to a set of famous cultural institutions in Italy, including theaters, museums, and other venues.
For this case, we collaborated with domain experts to label the dataset, and then we evaluated and compared the performance of classifiers that are trained with different feature extraction strategies.
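A minimal sketch of the cleaning step under the setup described above: a supervised classifier trained on expert-labeled posts that flags off-topic content collected by keyword search (toy texts and labels, hypothetical feature choice):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# toy expert-labeled examples: 1 = on-topic, 0 = off-topic noise
texts = [
    "Amazing exhibition at the museum tonight",      # on-topic
    "The theater's new season opens with Verdi",     # on-topic
    "Museum of my dreams: this new phone is great",  # off-topic
    "win free tickets click here!!!",                # off-topic
]
labels = [1, 1, 0, 0]

# one possible feature-extraction strategy: word + bigram TF-IDF
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, labels)

print(clf.predict(["La Scala announces its opera program"]))
```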
Iterative knowledge extraction from social networks. The Web Conference 2018, by Marco Brambilla
Knowledge in the world continuously evolves, and ontologies are largely incomplete, especially regarding data belonging to the so-called long tail. We propose a method for discovering emerging knowledge by extracting it from social content. Once initialized by domain experts, the method is capable of finding relevant entities by means of a mixed syntactic-semantic method. The method uses seeds, i.e. prototypes of emerging entities provided by experts, for generating candidates; then, it associates candidates to feature vectors built by using terms occurring in their social content and ranks the candidates by using their distance from the centroid of seeds, returning the top candidates. Our method can run iteratively, using the results as new seeds.
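A minimal sketch of the ranking step described above: score candidates by their cosine distance from the centroid of the seed vectors (random toy vectors stand in for the real social-content features):

```python
import numpy as np

rng = np.random.default_rng(0)
seeds = rng.random((5, 100))        # feature vectors of expert-provided seeds
candidates = rng.random((50, 100))  # feature vectors of generated candidates

centroid = seeds.mean(axis=0)

def cos_dist(v, w):
    return 1 - np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w))

# rank candidates by distance from the seed centroid; the top ones
# are returned and can be fed back as new seeds for the next iteration
dists = np.array([cos_dist(c, centroid) for c in candidates])
top = np.argsort(dists)[:10]
print(top)
```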
In this paper we address the following research questions: (1) How does the reconstructed domain knowledge evolve if the candidates of one extraction are recursively used as seeds? (2) How does the reconstructed domain knowledge spread geographically? (3) Can the method be used to inspect the past, present, and future of knowledge? (4) Can the method be used to find emerging knowledge?
This work was presented at The Web Conference 2018, MSM workshop.
Driving Style and Behavior Analysis based on Trip Segmentation over GPS Info..., by Marco Brambilla
Over one billion cars interact with each other on the road every day. Each driver has their own driving style, which can impact safety, fuel economy and road congestion. Knowledge about the driver's style could be used to encourage "better" driving behaviour through immediate feedback while driving, or by scaling auto insurance rates based on the aggressiveness of the driving style.
In this work we report on our study of driving behaviour profiling based on unsupervised data mining methods. The main goal is to detect the different driving behaviours, and thus to cluster drivers with similar behaviour. This paves the way to new business models related to the driving sector, such as Pay-How-You-Drive insurance policies and car rentals.
Driver behavioral characteristics are studied by collecting information from GPS sensors on the cars and by applying three different analysis approaches (DP-means, Hidden Markov Models, and Behavioural Topic Extraction) to the contextual scene detection problem on car trips, in order to detect different behaviours along each trip. Subsequently, drivers are clustered into similar profiles based on that, and the results are compared with a human-defined ground truth on driver classification. The proposed framework is tested on a real dataset containing sampled car signals. While the different approaches show relevant differences in trip segment classification, the coherence of the final driver clustering results is surprisingly high.
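As a hedged illustration of one of the three approaches (a Gaussian HMM over trip signals), the sketch below segments synthetic trips into latent behaviour states with hmmlearn; the features and sizes are toy stand-ins for the real GPS data:

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

# each row: [speed, acceleration] sampled along a trip (synthetic here)
rng = np.random.default_rng(0)
trips = [rng.normal(size=(120, 2)) for _ in range(10)]
X = np.vstack(trips)
lengths = [len(t) for t in trips]

# 3 hidden states as a stand-in for, e.g., calm / normal / aggressive
hmm = GaussianHMM(n_components=3, covariance_type="diag", n_iter=50)
hmm.fit(X, lengths)

states = hmm.predict(trips[0])   # per-sample behaviour state on one trip
print(np.bincount(states))       # time spent in each behaviour state
```

Per-trip state histograms like this one can then serve as features for clustering drivers into similar profiles.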
Myths and challenges in knowledge extraction and analysis from human-generate..., by Marco Brambilla
For centuries, science (in German "Wissenschaft") has aimed to create ("schaften") new knowledge ("Wissen") from the observation of physical phenomena, their modelling, and empirical validation. Recently, a new source of knowledge has emerged: not (only) the physical world any more, but the virtual world, namely the Web with its ever-growing stream of data materialized in the form of social network chattering, content produced on demand by crowds of people, messages exchanged among interlinked devices in the Internet of Things. The knowledge we may find there can be dispersed, informal, contradicting, unsubstantiated and ephemeral today, while already tomorrow it may be commonly accepted. The challenge is once again to capture and create knowledge that is new, has not been formalized yet in existing knowledge bases, and is buried inside a big, moving target (the live stream of online data). The myth is that existing tools (spanning fields like semantic web, machine learning, statistics, NLP, and so on) suffice to the objective. While this may still be far from true, some existing approaches are actually addressing the problem and provide preliminary insights into the possibilities that successful attempts may lead to.
The talk explores the mixed realistic-utopian domain of knowledge extraction and reports on some tools and cases where the digital and physical worlds have been brought together for a better understanding of our society.
Harvesting Knowledge from Social Networks: Extracting Typed Relationships amo..., by Marco Brambilla
Knowledge bases like DBpedia, Yago or Google's Knowledge Graph contain huge amounts of ontological knowledge harvested from (semi-)structured, curated data sources, such as relational databases or XML and HTML documents. Yet, the Web is full of knowledge that is not curated and/or structured and, hence, not easily indexed, for example social data. Most work so far in this context has been dedicated to the extraction of entities, i.e., people, things or concepts. This poster describes our work toward the extraction of relationships among entities. The objective is reconstructing a typed graph of entities and relationships to represent the knowledge contained in social data, without the need for a-priori domain knowledge. The experiments with real datasets show promising performance across a variety of domains.
The key distinguishing feature of the work is its focus on highly unstructured social data (tweets and Facebook posts) without reliable grammar structures. Traditional relation extraction approaches, whether supervised, semi-supervised or unsupervised, commonly assume the availability of grammatically correct language corpora.
Model-driven Development of User Interfaces for IoT via Domain-specific Comp..., by Marco Brambilla
Internet of Things technologies and applications are evolving and continuously gaining traction in all fields and environments, including homes, cities, services, industry and commercial enterprises. However, many problems still need to be addressed. For instance, the IoT vision is mainly focused on the technological and infrastructure aspects, and on the management and analysis of the huge amount of generated data, while so far the development of front-ends and user interfaces for IoT has not played a relevant role in research. On the contrary, user interfaces in the IoT ecosystem can play a key role in the acceptance of solutions by final adopters. In this paper we present a model-driven approach to the design of IoT interfaces, by defining a specific visual design language and design patterns for IoT applications, and we show them at work. The language we propose is defined as an extension of the OMG standard language called IFML.
A Model-Based Method for Seamless Web and Mobile Experience. Splash 2016 conf., by Marco Brambilla
Consumer-centered software applications nowadays are required to be available both as mobile and desktop versions. However, the app design is frequently made only for one of the two (i.e., mobile first or web first) while missing an appropriate design for the other (which, in turn, simply mimics the interaction of the first one). This results in poor quality of the interaction on one or the other platform. Current solutions would require different designs, realized through different design methods and tools, which may double development and maintenance costs.
In order to mitigate this issue, this paper proposes a novel approach that supports the design of both web and mobile applications at once. Starting from a unique requirement and business specification, where web- and mobile-specific aspects are captured through tagging, we derive a platform-independent design of the system specified in IFML. This model is subsequently refined and detailed for the two platforms, and used to automatically generate both the web and mobile versions. If more precise interactions are needed for the mobile part, a blending with MobML, a mobile-specific modeling language, is devised. Full traceability of the relations between artifacts is granted.
The Web Science course focuses on the study of large-scale socio-technical systems associated with the World Wide Web. It considers the relationship between people and technology, the ways that society and technology complement one another and the way they impact on broader society. These analyses are inherently associated with Big Data management issues.
The course is organised in four parts.
1. Syntax
In the first part, the course introduces the basics of content analysis. It focuses on the syntactic aspects, covering the fundamentals of natural language processing and text mining. It describes the structure and typical characteristics of the different web sources, spanning search results, social media contents, social network structures, Web APIs, and so on. It also provides an overview of the basic Web analysis techniques applied in Web search and Web recommendation.
2. Semantics
In the second part, the course presents semantic technologies. These technologies are very important nowadays because they make it possible to handle the "variety" dimension of Big Data, i.e., they enable integration of multiple and diverse sources of information, which is typical of the modern Web platform. Covered topics include:
- RDF - a flexible data model to represent heterogeneous data
- OWL - a flexible ontological language to model heterogeneous data sources
- SPARQL - a query language for RDF.
It shows how to put all the pieces together in order to achieve interoperability among heterogeneous information sources.
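A minimal sketch of the RDF and SPARQL pieces with rdflib (one common Python toolkit; the namespace and the tiny graph are illustrative):

```python
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/")

# build a tiny RDF graph: one typed resource with one property
g = Graph()
g.add((EX.alice, RDF.type, EX.Person))
g.add((EX.alice, EX.name, Literal("Alice")))

# query it back with SPARQL
results = g.query("""
    PREFIX ex: <http://example.org/>
    SELECT ?name WHERE { ?p a ex:Person ; ex:name ?name . }
""")
for row in results:
    print(row[0])   # -> Alice
```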
3. Time
The third part covers the realm of time-dependent data. The topics covered here make it possible to handle the "velocity" dimension of Big Data. It shows the importance, in many Big Data analysis scenarios, of processing data streams, coming for instance from Internet of Things (IoT) and Social Media sources; and it describes how to apply semantic and syntactic techniques in the context of time-dependent information. For instance, it shows how to extend RDF to model RDF streams, how to extend SPARQL to continuously process RDF streams, and how to reason on those RDF streams.
4. Applications
In the fourth part, the course focuses on specific application scenarios and presents the typical settings and problems where the presented techniques can be applied. This part discusses settings such as: big data analysis for smart cities; data analytics for brand monitoring (marketing) and event monitoring; data analysis for trend detection and user engagement; and so on.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -..., by DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
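As a small taste of the Python binding mentioned above, a hedged sketch with pypowsybl: load one of the bundled IEEE test networks and run an AC power flow (API as documented for recent pypowsybl releases):

```python
import pypowsybl as pp

# bundled IEEE 14-bus example grid (no input files needed)
network = pp.network.create_ieee14()

# run an AC power flow and inspect convergence
results = pp.loadflow.run_ac(network)
print(results[0].status)          # convergence status of the main component

# network data is exposed as pandas DataFrames
print(network.get_buses().head())
```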
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
JMeter webinar - integration with InfluxDB and Grafana, by RTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
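As a hedged companion to the demonstration above, a sketch of reading JMeter metrics back out of InfluxDB with the Python influxdb client; the database and measurement/field names follow the defaults of JMeter's Backend Listener but may differ in your setup:

```python
from influxdb import InfluxDBClient

# connect to the InfluxDB instance the Backend Listener writes to
client = InfluxDBClient(host="localhost", port=8086, database="jmeter")

# average response time per minute over the last 30 minutes
rs = client.query(
    'SELECT mean("avg") FROM "jmeter" '
    "WHERE time > now() - 30m GROUP BY time(1m)"
)
for point in rs.get_points():
    print(point["time"], point["mean"])
```

Grafana runs the same kind of query under the hood to render its dashboards.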
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Neuro-symbolic is not enough, we need neuro-*semantic*, by Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply doing machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as "predictable inference".
All of this is illustrated with link prediction over knowledge graphs, but the argument is general.
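As a hedged illustration of the link-prediction setting mentioned above (not the talk's own material), a TransE-style scoring sketch: a triple (h, r, t) is considered plausible when the embedding of h plus the embedding of r lands close to the embedding of t; the vectors here are random toys, not a trained model:

```python
import numpy as np

rng = np.random.default_rng(0)

# toy entity and relation embeddings (a real model would learn these)
entities = {e: rng.normal(size=16) for e in ["paris", "france", "berlin"]}
relations = {"capital_of": rng.normal(size=16)}

def score(h: str, r: str, t: str) -> float:
    # TransE: higher score (smaller ||h + r - t||) = more plausible triple
    return -float(np.linalg.norm(entities[h] + relations[r] - entities[t]))

print(score("paris", "capital_of", "france"))
print(score("berlin", "capital_of", "france"))
```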
UiPath Test Automation using UiPath Test Suite series, part 4, by DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo..., by James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to release software to market, along with traditionally slow and manual security checks, has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
DevOps and Testing slides at DASA Connect, by Kari Kakkonen
Slides by me and Rik Marselis at the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what Testing in DevOps is. Finally, we held a lovely workshop with the participants, trying to find different ways to think about quality and testing in different parts of the DevOps infinity loop.
Generating a custom Ruby SDK for your web service or Rails API using Smithy, by g2nightmarescribd
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview, by Prayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Key Trends Shaping the Future of Infrastructure, by Cheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Epistemic Interaction - tuning interfaces to provide information for AI support, by Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova..., by Ramesh Iyer
In today's fast-changing business world, companies that fail to adapt and embrace new ideas often struggle to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do..., by UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
- See how to accelerate model training and optimize model performance with active learning
- Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
- Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Exploratory Search upon Semantically Described Web Data Sources: Service registration and methodology. At vldb2012
1. Exploratory Search upon
Semantically Described
Web Data Sources
Marco Brambilla
Politecnico di Milano
marco.brambilla@polimi.it
marcobrambi
SSW workshop @ VLDB 2012, Istanbul, Turkey
3. Context
Web is a huge, heterogeneous data source:
Structured, unstructured and semi-structured data
Known problems of trust, reputation, consistency
User needs to solve real-life problems, not to find a web site
5. Context
User needs to solve real-life problems, not to find a web site
Web queries get increasingly complex and specialized
Exploratory search
From document search to object search
Search as a service
Viability of systems based upon search service orchestration
6. What are search services?
• APIs over Web data sources
• Structured data
• Domain-specific
• Wrapping of information utility sites
7. How can we use them?
• Applying complex queries (also with “joins”)
“… search for upcoming concerts close to an attractive location (like a
beach, lake, mountain, natural park, and so on), considering also availability
of good, close-by hotels …”
8. Background: semantic multi-domain search
“… expand the search to get information about available restaurants near
the candidate concert locations, news associated to the event and possible
options to combine further events …”
9. Liquid Query: Query Submission
Example Scenario 1: Trip planner for events
(Screenshot: query condition forms for the Concert and Hotels services)
11. Liquid Query: alternative visualizations
and domain-independent platform
Example Scenario 2: Scientific Publication search
12. Problem 1: Service specification
• No service description per se
• Focused on search
• Ranking aware
• Description
• Bottom-up
• Based on the service interface
• Annotation
• Relying on an external reference knowledge base
15. The registration of services
Bottom-up approach from the service signatures
Registration process fully specified and implemented
• Starts from SI details (name, type of service, etc.) and SI field details, i.e., name, data type and I/O directionality
• The name and I/O fields of the SI are scanned with NLP and semantic techniques in order to identify the most suitable Domain Diagram items to represent them
• The expert user's intervention is required to provide feedback concerning system-hypothesized mappings
• When all mappings have been validated, a newly created Access Pattern and its corresponding Service Mart are committed
See demo video at: http://search-computing.it/registration_demo
16. Example of resulting service mart
• A set of predefined combinations of services, to be reused for specific cases
17. Problem 2: Reduce flexibility
Maximum flexibility over huge amounts of search services is not always the best solution
People want straightforward paths and want to be quick
Commercial implementations are likely to be on fixed sets of domains and fixed exploration directions
18. Design Patterns
• A set of blueprint combinations of services, to be reused for different cases
• Very much like UML design patterns or datamart patterns
23. Exploration implementation
• Not just a matter of data sources
• Also: data visualization, user interface specification, usability, ...
See demo videos at:
http://demo.search-computing.net/night_planner_demo/seco/seco.html
http://demo.search-computing.net/new_job_demo/seco/seco.html
24. Problem 3 - Outlook
When dealing with real-life problems, people do not trust the web completely
Want to go back to discussion with people
Expect insights, opinions, reassurance
Exploratory search must be blended with social-network-based recommendations and inputs
25. Social Search: increasing quality in search
• From exploratory search to friends and experts feedback
(Diagram: the initial query feeds an exploratory search process; each exploration step can involve either the search system, through a system API over a database / IR index, or a human search system, through a social API over a crowd / community.)
26. Example: Find your job (social invitation)
Selected data items can be transferred to the crowd question
28. Conclusions and future work
Well, I've shown everything...
See our papers at WWW 2010 (Liquid Query) and WWW 2012 (CrowdSearcher)
Future work
• More experiments (e.g., vs. sociality of users, vs. crowds, ...)
• Not only search: active integration of web structured data and social sensors
Some ads
• Search Computing book series (Springer LNCS)
• Workshop Very Large Data Search at VLDB
• VLDB Journal special issue (deadline Sept 2012)
29. Thanks!
Questions?
Marco Brambilla
marco.brambilla@polimi.it
marcobrambi