This document discusses different model representations for large meta-model based datasets. It compares object-by-object representation to fragmentation strategies. Fragmentation breaks models into multiple fragments stored separately. The document evaluates different fragmentation strategies through theoretical analysis and implementation tests. It also compares part-of-source and relational representations and discusses applications of model fragmentation including software engineering and scientific data analysis.
Reference Representation in Large Metamodel-based Datasets
1. Markus Scheidgen
Model representations for large meta-model based data-sets
■ Introduction: Technological spaces and model representations
■ Comparison of representations
■ Implementation
■ Application
2. Introduction: Technological Spaces
[Diagram: technological spaces around software models — code (reverse engineering, code generation; static analysis/compilation/refactoring), XML (persistence/exchange; processing e.g. DOM/JAXB; XSLT/XSL/XQuery/XPath; exchange e.g. in web services), SQL databases (persistence/versioning; processing via ORMs, e.g. JPA), objects, e.g. POJOs (debugging/profiling, reflection, runtime modeling), running programs, model transformation/constraints/queries, and other data.]
3. Introduction: State of the Art
[Diagram: meta-models and models (*) alongside XML schemas, grammars and code, classes and objects, and ER-schemas with relational data; models are used for visualization and editing by human users, processing in computer programs, exchange, and large data-sets / persistence and querying.]
4. Introduction: New Class of DBMS
[Diagram: the state-of-the-art spaces extended towards big data — graphs and big relational data (ER-schemas); how meta-model based models map onto this new class of DBMS is marked as an open question.]
8. Representation: Object-by-object vs. Fragmentation (considering traversal, implementation with actual model)
■ Model traversal of Grabats models with four different sizes and different characteristics
[Charts: objects per second (10^4) and number of fragments for the Grabats sets set0–set4, comparing XMI, CDO, Morsa, EMFFrag coarse, and EMFFrag fine; some data points not measured but extrapolated.]
9. Representation: Object-by-object vs. Fragmentation (considering query, implementation with actual model)
■ Query of Grabats models with four different sizes and different characteristics
[Charts: number of fragments and execution time (in s) for the Grabats sets set0–set4, comparing XMI, CDO (with and without SQL), Morsa (with and without index), EMFFrag coarse, and EMFFrag fine; some data points not measured but extrapolated.]
10. Representation: Part-of-source vs. Relations (real implementation, artificial model)
[Charts: execution time in ms over the number of outgoing references (both log scale) for access of one outgoing reference and traversal of all outgoing references — part-of-source implementation vs. relation implementation with individual access.]
11. Representation: Part-of-source vs. Relations (real implementation, artificial model)
[Charts: execution time in ms over the number of outgoing references (both log scale) for access of one outgoing reference and traversal of all outgoing references — part-of-source implementation vs. relation implementation with scanning.]
16. Applications: Mining and Analyzing Software Repositories
■ Software repositories contain more information than the current software code:
■ “developers who changed class/method/statement X also changed class/method/statement Y” (see the sketch below)
■ this information leads to knowledge about dependencies that cannot be determined through static or even dynamic analysis
■ this can be used to
• predict/find bugs
• understand/improve the code-base
■ dependency information should be stored as relational data
■ When a piece of software evolves, its metrics change. Such dynamic metrics describe software better than static code metrics and could lead to a better assessment of methodologies or to a better understanding of software engineering in general.
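To illustrate the co-change idea above, here is a minimal sketch in plain Java (not part of the presented framework; the revision contents and file names are hypothetical) that counts how often pairs of files were changed in the same revision:

```java
import java.util.*;

// Minimal co-change mining sketch: count how often two files are changed
// together in the same revision. Input: one set of changed file paths per revision.
public class CoChange {
    public static Map<String, Integer> coChangeCounts(List<Set<String>> revisions) {
        Map<String, Integer> counts = new HashMap<>();
        for (Set<String> changed : revisions) {
            List<String> files = new ArrayList<>(changed);
            Collections.sort(files);
            for (int i = 0; i < files.size(); i++) {
                for (int j = i + 1; j < files.size(); j++) {
                    String pair = files.get(i) + " <-> " + files.get(j);
                    counts.merge(pair, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Set<String>> revisions = List.of(
                Set.of("A.java", "B.java"),           // hypothetical revision 1
                Set.of("A.java", "B.java", "C.java"), // hypothetical revision 2
                Set.of("A.java", "D.java"));          // hypothetical revision 3
        // Pairs with high counts hint at dependencies that static analysis cannot see.
        coChangeCounts(revisions).forEach((pair, n) -> System.out.println(pair + ": " + n));
    }
}
```

In the presented setting, such pair counts would be derived from the repository model and stored as relational data rather than computed ad hoc.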
17. Applications: Mining and Analyzing Software Repositories
■ JGit: Java implementation of the Git version control system (see the sketch below)
■ MoDisco: Reverse engineering framework for Eclipse Java projects based on EMF
■ EMF-Compare: Determines matches and differences between models
■ EMF-Fragments: My own framework for large models
■ over 300 Git repositories with Eclipse plug-ins that constitute the whole Eclipse Foundation source base as “example” data-set
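As a minimal sketch of the JGit part (the repository path is hypothetical; the actual framework combines this with MoDisco and EMF-Fragments to build models per revision):

```java
import java.io.File;
import org.eclipse.jgit.api.Git;
import org.eclipse.jgit.revwalk.RevCommit;

// Walk all revisions of a Git repository with JGit and list the commits
// that a repository model would be built from.
public class ListRevisions {
    public static void main(String[] args) throws Exception {
        try (Git git = Git.open(new File("/path/to/some-eclipse-plugin"))) { // hypothetical path
            for (RevCommit commit : git.log().all().call()) {
                // For each revision, a reverse engineering step (e.g. with MoDisco)
                // could create an AST-level model of the checked-out sources.
                System.out.println(commit.getName() + " " + commit.getShortMessage());
            }
        }
    }
}
```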
18. Applications: Model of a Software Repository
[Diagram: meta-model of a software repository — a Repository contains Revisions (linked via prev/next), Revisions reference Diffs, Diffs point to CompilationUnits, and CompilationUnits contain the model (Package, Class, ..., with usageInPackage/Access and extends relations); the repository side comes from JGit, the model side from MoDisco; references are annotated with «fragmentation» and «relation» to declare their representation. Example instance data shows branches B1/B2 with revisions R1–R4 containing files A–D. A small Ecore sketch of an excerpt follows below.]
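A hedged sketch of how a small excerpt of such a repository meta-model could be defined with the plain EMF Ecore API (class and feature names follow the diagram; this is an illustrative reconstruction, not the framework's actual meta-model):

```java
import org.eclipse.emf.ecore.*;

// Build a tiny excerpt of the repository meta-model with the Ecore API:
// Repository 1--* Revision 1--* Diff.
public class RepositoryMetamodel {
    public static EPackage create() {
        EcoreFactory f = EcoreFactory.eINSTANCE;

        EPackage pkg = f.createEPackage();
        pkg.setName("repo");
        pkg.setNsURI("http://example.org/repo"); // hypothetical namespace URI
        pkg.setNsPrefix("repo");

        EClass repository = f.createEClass();
        repository.setName("Repository");
        EClass revision = f.createEClass();
        revision.setName("Revision");
        EClass diff = f.createEClass();
        diff.setName("Diff");

        // Repository.revisions : Revision[*] (containment)
        EReference revisions = f.createEReference();
        revisions.setName("revisions");
        revisions.setEType(revision);
        revisions.setUpperBound(ETypedElement.UNBOUNDED_MULTIPLICITY);
        revisions.setContainment(true);
        repository.getEStructuralFeatures().add(revisions);

        // Revision.diffs : Diff[*] (containment)
        EReference diffs = f.createEReference();
        diffs.setName("diffs");
        diffs.setEType(diff);
        diffs.setUpperBound(ETypedElement.UNBOUNDED_MULTIPLICITY);
        diffs.setContainment(true);
        revision.getEStructuralFeatures().add(diffs);

        pkg.getEClassifiers().add(repository);
        pkg.getEClassifiers().add(revision);
        pkg.getEClassifiers().add(diff);
        return pkg;
    }
}
```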
19. Summary
■ Choosing the right representation makes a difference
■ Meta-model-based declaration of representations works (might not be good enough)
■ There are applications that can benefit from different representations

                  Objects:
References:       Object-by-object   Fragments
Part-of-source    Morsa              (Java) XMI, EMF-Frag
Relations         CDO                ?
21. Possible Approaches: Different Target Platforms
[Diagram: mapping meta-model based models (*) to different target platforms — XML schemas via XMI and XMI+Resources; ER-schemas/relational data via ORM (ACID, structured data); big data graph stores (BASE, CAP theorem¹); and big relational data with ER-schemas (BASE, structured data) via a possible ORM?².]
¹ Eric A. Brewer: Towards robust distributed systems; 19th ACM Symposium on Principles of Distributed Computing, 2000
² K. Barmpis and D.S. Kolovos: Comparative Analysis of Data Persistence Technologies for Large-Scale Models; XM 2012
22. Possible Approaches: Different Types of Mapping
[Diagram: mapping meta-model based models (*) to (big) relational data with ER-schemas — per-object mapping gives fast queries, slow traversal, slow data entry, and fine-grained transactions¹; fragmentation gives slow queries, fast traversal, fast data entry, and coarse transactions.]
¹ Javier Espinazo-Pagán, Jesús Sánchez Cuadrado, Jesús García Molina: Morsa, A Scalable Approach for Persisting and Accessing Large Models; MoDELS 2011
23. Fragmentation: Types of references
■ organizing large artifacts in different resources is already implemented in EMF
■ resources are loaded if necessary; objects in unloaded resources are represented by proxy objects
■ objects in different resources (as all related objects) are related through references, therefore models are fragmented along references
■ EMF-Fragments automatically fragments large models based on annotations in the meta-model
■ resources are identified via URIs and can be serialized (e.g. as XMI), therefore resources can be stored in a key-value store (see the sketch below)
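To make the plain EMF part above concrete, a minimal sketch (standard EMF API, not the EMF-Fragments API itself; file names and the use of EClasses as example content are assumptions for illustration): two objects in two XMI resources related by a cross-resource reference, so that loading one resource only creates a proxy for the object in the other.

```java
import org.eclipse.emf.common.util.URI;
import org.eclipse.emf.ecore.EClass;
import org.eclipse.emf.ecore.EcoreFactory;
import org.eclipse.emf.ecore.resource.Resource;
import org.eclipse.emf.ecore.resource.ResourceSet;
import org.eclipse.emf.ecore.resource.impl.ResourceSetImpl;
import org.eclipse.emf.ecore.xmi.impl.XMIResourceFactoryImpl;

// Two objects in two resources, related by a cross-resource reference.
// Loading only one resource later represents the referenced object as a proxy
// until it is actually accessed (lazy loading along the reference).
public class TwoResources {
    public static void main(String[] args) throws Exception {
        ResourceSet rs = new ResourceSetImpl();
        rs.getResourceFactoryRegistry().getExtensionToFactoryMap()
          .put("xmi", new XMIResourceFactoryImpl());

        EClass base = EcoreFactory.eINSTANCE.createEClass();
        base.setName("Base");
        EClass derived = EcoreFactory.eINSTANCE.createEClass();
        derived.setName("Derived");
        derived.getESuperTypes().add(base); // reference that crosses resource boundaries

        Resource r1 = rs.createResource(URI.createFileURI("base.xmi"));    // hypothetical file
        Resource r2 = rs.createResource(URI.createFileURI("derived.xmi")); // hypothetical file
        r1.getContents().add(base);
        r2.getContents().add(derived);

        r1.save(null);
        r2.save(null); // "derived.xmi" refers to "base.xmi" via the base resource's URI
        // EMF-Fragments builds on this mechanism: references annotated in the
        // meta-model decide where a model is split into separately stored fragments.
    }
}
```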
24. Fragmentation: Types of references
[Diagram: three kinds of references — normal references, fragmenting references (annotated «fragments»), and large value sets.]
25. Applications
■ HWL sensor and network operation data (or experiment data in general)
■ realtime persistence required ➜ fast data entry
■ hierarchically structured data (different sensors and other data sources) ➜ meta-modeling
■ queries for experiments, sensors, specific time periods ➜ only coarse, simple queries
■ traversal of larger sub-trees, mostly applications based on data aggregation
■ actual demand for big data depends on the size of the sensor network ➜ scalability
■ CityGML models (or geo-spatial data in general)
■ standardized as XML schemas ➜ XML-based data
■ special proprietary indexes (e.g. spatial indexes like R-trees) and corresponding queries
■ rather query-intensive applications
■ actual demand for big data depends on the LOD of the models ➜ scalability
■ Software Engineering
■ Code/Model Version Control
■ Mining Software Repositories (MSR)
■ revisions of AST-trees and differences between AST-trees ➜ existing meta-model based frameworks (e.g. designed for reverse engineering purposes)
■ large number of revisions causes many large value sets
■ queries for revisions, compilation-units ➜ rather coarse queries
■ aggregations and statistics ➜ can be expressed in an OCL-like language
■ immediate demand for processing in (at least smaller) clusters
■ has to be mixed with relational data for some applications
27. Applications: CityGML
■ XML-based standard ➜ meta-models can be generated (1-to-1 mapping)
■ different standards define XML schemas that extend each other: GML ⇽ CityGML ⇽ extensions
■ transparent use of spatial indexes
■ map onto existing platforms (e.g. SpatialHadoop)
■ use existing implementations and persist into the key-value store
■ extensions to CityGML can be facilitated to reference CityGML models as spatial context for sensor data
29. Research Overview
[Diagram: research overview — wireless sensor networks (sensor data, heterogeneous networks, mesh networks, cellular networks), a data analysis framework (distributed data stores, distributed analysis, data homogenisation, domain specific analysis languages), and geo-information systems (spatial data, regular databases, spatial databases).]
32. ‣ 120+ Nodes
‣ indoor and outdoor
‣ dense and sparse
‣ short and long links
‣ stationary and mobile nodes
33. ‣ 120+ Nodes
‣ indoor and outdoor
‣ dense and sparse
‣ short and long links
‣ stationary and mobile nodes
34.
35. Experiments: The Test Site
[Diagram: site plan — nodes 1–10 along the road with roughly 10 m spacing, with arrows towards Groß-Berliner Damm on one side and towards the institute on the other.]
§ simplest case: two-lane, newly paved road
§ spatially equally distributed nodes on both sides of the road
§ 2x5 nodes
§ homogeneous test-bed: same nodes, equally calibrated, same stone ground
§ one camera to record control data
36. Experiments: Example Data
[Charts: example data — time signal of all 3 channels (X, Y, Z; accelerator value over time samples at 1/400 s) and the single-sided amplitude spectrum |Y(fr)| over frequency (Hz); amplitudes and frequencies.]
40. Research Overview
[Diagram (repeated from slide 29): wireless sensor networks, data analysis framework, and geo-information systems with sensor data, heterogeneous networks, mesh and cellular networks, spatial data, regular and spatial databases, distributed data stores, distributed analysis, data homogenisation, and domain specific analysis languages.]
44. Complex Data Types
➡ complex data structures
➡ lots of links between data objects
➡ evolving structures
➡ requires a type-safe programming environment that proliferates re-use
45. Large Amounts of Data
➡ a certain amount of data needs to be stored per second (HWL: 120 nodes): ~140x10³ data objects per second, ~7 MB/s serialized
➡ a certain amount of data needs to be stored all together (24h): ~12x10⁹ data objects, ~600 GB serialized (i.e. 140x10³ objects/s and 7 MB/s over 86,400 s)
➡ Data analysis must complete in reasonable time; for live applications, in real time.
46. From Click to ClickWatch
[Diagram: the Click API — software built from Elements grouped in a Compound, each Element exposing Handlers, accessed through a NetworkInterface.]
47. Complex Data Types: Meta-Modeling
This [ ] happens all the time in software modeling: state charts, class diagrams, MSCs, OCL
context Foo
self.properties->forAll(a | a.x != a.y)
eclipse modeling framework (EMF)
➡ Distributed storage and links between different types of data is only a simple extension of existing technology: multi-resource persistence is already implemented
48. Large Amounts of Data: Problem Statement
[Diagram: generic layers — “share nothing” nodes (cluster, adhoc-network), a DFS (HDFS), a key-value store¹ (HBase), and map/reduce² (Hadoop) — below the domain-specific layers: hierarchical data (XML, OGC standards), data series (sensor data), and signal analysis, statistics, sensor-fusion; see the key-value store sketch below.]
1. Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Michael Burrows, Tushar Chandra, Andrew Fikes, and Robert Gruber. Bigtable: A distributed storage system for structured data (awarded best paper!). In Brian N. Bershad and Jeffrey C. Mogul, editors, OSDI, pages 205–218. USENIX Association, 2006.
2. Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified data processing on large clusters. In OSDI, pages 137–150. USENIX Association, 2004.
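To illustrate the key-value store layer, a minimal sketch using the standard HBase client API (table, column family, key scheme, and content are hypothetical illustrations, not the framework's actual storage scheme) that stores a serialized model fragment under its URI:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Store and load serialized model fragments (e.g. XMI) in an HBase table,
// keyed by the fragment's resource URI.
public class FragmentStore {
    private static final byte[] FAMILY = Bytes.toBytes("f");      // hypothetical column family
    private static final byte[] QUALIFIER = Bytes.toBytes("xmi"); // hypothetical qualifier

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("fragments"))) { // hypothetical table

            String fragmentUri = "repo/revision-42/some-unit.xmi";   // hypothetical key
            byte[] serializedFragment = Bytes.toBytes("<xmi .../>"); // placeholder content

            // write one fragment
            Put put = new Put(Bytes.toBytes(fragmentUri));
            put.addColumn(FAMILY, QUALIFIER, serializedFragment);
            table.put(put);

            // read it back, e.g. when a proxy to an object in this fragment is resolved
            Result result = table.get(new Get(Bytes.toBytes(fragmentUri)));
            byte[] loaded = result.getValue(FAMILY, QUALIFIER);
            System.out.println(Bytes.toString(loaded));
        }
    }
}
```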
49. Large Amounts of Data: Approach
[Diagram: the same stack as slide 48 — “share nothing” nodes (cluster, adhoc-network), DFS (HDFS), key-value store (HBase), map/reduce (Hadoop); hierarchical data (XML, OGC standards), data series (sensor data), signal analysis, statistics, sensor-fusion — connected in four numbered steps via meta-model structured data and model transformations.]