This document presents Resilient Distributed Datasets (RDDs), a fault-tolerant abstraction for in-memory cluster computing introduced by Spark. RDDs allow programmers to perform iterative and interactive computations over large datasets in a fault-tolerant manner. RDDs are distributed immutable collections of records that can be operated on through transformations and actions. They track the lineage of transformations to allow recovering lost data partitions. This provides an efficient abstraction for iterative algorithms compared to MapReduce.
Cortana Analytics Workshop: Insights and Predictions -- Integrating and Deplo... (MSAdvAnalytics)
Jeremy Reynolds.
This technical talk will describe how you can leverage Revolution R Enterprise to model Big Data and then leverage a new R package (AzureML) in order to deploy a production-level web service for scoring and consumption. Go to https://channel9.msdn.com/ to find the recording of this session.
Recent developments in Hadoop version 2 are pushing the system from the traditional, batch-oriented computational model based on MapReduce towards becoming a multi-paradigm, general-purpose platform. In the first part of this talk we will review and contrast three popular processing frameworks. In the second part we will look at how the ecosystem (e.g. Hive, Mahout, Spark) is making use of these new advancements. Finally, we will illustrate use cases of batch, interactive and streaming architectures to power traditional and advanced analytics applications.
Approximation algorithms for stream and batch processing (Gabriele Modena)
At Improve Digital (http://www.improvedigital.com) we collect and process large amounts of machine-generated and behavioral data. Our systems address a variety of use cases that involve both batch and streaming technologies. One common denominator of the overall architecture is the need to share models and workflows across both worlds. Another is that the analysis of large amounts of data often requires trade-offs; for instance, trading accuracy for timeliness in streaming applications. One approach to satisfying these constraints is to make "big data" small. In this talk we will review a number of approximation methods for sketching, summarization and clustering and discuss how they are starting to change the way we think about certain types of analytics, and how they are being integrated into our data pipelines.
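The sketching methods mentioned above can be illustrated with a minimal count-min sketch (a standard construction, not Improve Digital's pipeline code): item frequencies are estimated in fixed memory, and estimates may be too high but never too low, which is exactly the accuracy-for-space trade-off the talk describes.

```java
import java.util.Random;

// Minimal count-min sketch (illustrative): estimates item frequencies in
// fixed memory. Point queries over-estimate at worst, never under-estimate.
class CountMinSketch {
    private final int depth, width;
    private final long[][] table;
    private final int[] seeds;

    CountMinSketch(int depth, int width) {
        this.depth = depth;
        this.width = width;
        this.table = new long[depth][width];
        this.seeds = new int[depth];
        Random r = new Random(42); // fixed seed for reproducibility
        for (int i = 0; i < depth; i++) seeds[i] = r.nextInt();
    }

    private int bucket(int row, String item) {
        // One cheap hash per row: perturb hashCode with the row's seed.
        return Math.floorMod(item.hashCode() ^ seeds[row], width);
    }

    void add(String item) {
        for (int i = 0; i < depth; i++) table[i][bucket(i, item)]++;
    }

    // Point query: the minimum over rows bounds the true count from above.
    long estimate(String item) {
        long min = Long.MAX_VALUE;
        for (int i = 0; i < depth; i++)
            min = Math.min(min, table[i][bucket(i, item)]);
        return min;
    }
}
```

Widening the table reduces collisions (lower over-estimates); adding rows reduces the chance an item collides in every row.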
Minimizing Overprocessing Waste in Business Processes via Predictive Activity... (Marlon Dumas)
This document presents an approach to minimize overprocessing waste in business processes by predictively ordering activities. It does so by first executing the activity that is most likely to reject a case based on predictive models of each case's attributes. This avoids performing expensive downstream activities unnecessarily. The approach is evaluated on two real-world datasets, showing it reduces the average number of process checks required by selectively performing the knockout activity earlier. Overall, predictive activity ordering provides a way to reduce overprocessing waste in business processes.
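The intuition behind predictive activity ordering can be sketched with a toy rule (illustrative only, not the paper's predictive models): run the check most likely to reject a case per unit of cost first, so expensive downstream checks are often skipped.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy knock-out ordering: sort checks by rejection probability per unit cost.
class KnockoutOrdering {
    static class Check {
        final String name;
        final double rejectProb; // estimated probability the check rejects the case
        final double cost;       // processing cost of running the check

        Check(String name, double rejectProb, double cost) {
            this.name = name;
            this.rejectProb = rejectProb;
            this.cost = cost;
        }
    }

    // Order checks by descending rejection probability per unit cost.
    static List<Check> order(List<Check> checks) {
        List<Check> sorted = new ArrayList<>(checks);
        sorted.sort(Comparator.comparingDouble((Check c) -> c.rejectProb / c.cost)
                              .reversed());
        return sorted;
    }

    // Expected total cost: check i runs only if all earlier checks passed.
    static double expectedCost(List<Check> order) {
        double cost = 0, passProb = 1;
        for (Check c : order) {
            cost += passProb * c.cost;
            passProb *= 1 - c.rejectProb;
        }
        return cost;
    }
}
```

With a cheap filter (reject 0.5, cost 1) and an expensive audit (reject 0.1, cost 10), running the filter first gives expected cost 1 + 0.5 * 10 = 6 instead of 10 + 0.9 * 1 = 10.9.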
Predictive Process Monitoring with Hyperparameter Optimization (Marlon Dumas)
1. The document presents a predictive process monitoring framework that is enhanced with technique and hyperparameter optimization.
2. The framework evaluates multiple configurations of machine learning techniques and their hyperparameters on historical execution traces to identify the most suitable configuration for a given prediction problem and dataset.
3. An evaluation of the framework on two prediction problems and datasets found that it was able to identify suitable configurations within a reasonable time frame, though the best configuration varied depending on the prediction problem and users' performance criteria.
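The configuration search described above can be sketched as a toy exhaustive grid search (illustrative only; the framework's search space, techniques, and evaluation on execution traces are richer, and the hyperparameter names here are hypothetical):

```java
// Toy grid search: score every combination of two hypothetical
// hyperparameters and keep the best-scoring one.
class GridSearch {
    interface Scorer {
        double score(double learningRate, int depth);
    }

    // Returns {bestLearningRate, bestDepth, bestScore}.
    static double[] best(double[] learningRates, int[] depths, Scorer scorer) {
        double bestScore = Double.NEGATIVE_INFINITY;
        double bestLr = learningRates[0];
        int bestDepth = depths[0];
        for (double lr : learningRates)
            for (int d : depths) {
                double s = scorer.score(lr, d); // e.g. validation accuracy
                if (s > bestScore) { bestScore = s; bestLr = lr; bestDepth = d; }
            }
        return new double[] { bestLr, bestDepth, bestScore };
    }
}
```

In the framework's setting, the scorer would train and validate a model on historical traces for each configuration, which is why the best configuration varies by prediction problem and performance criteria.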
My business processes are deviant! What should I do about it? (Marlon Dumas)
This document discusses techniques for identifying and addressing deviant business processes. It defines deviance as processes that violate compliance rules, service level objectives, or cost targets. The document recommends a two-pronged approach of deviance mining and predictive monitoring. Deviance mining involves analyzing process event logs to discover patterns that distinguish normal and deviant cases, in order to explain the causes of deviance. Predictive monitoring uses the patterns to predict future deviance and generate alerts. Several case studies are described where organizations successfully applied these techniques to problems like late deliveries, faulty products, and software issues. The key takeaway is that organizations should quantify, analyze, monitor, and predict deviance to preempt problems in their business processes.
Business Process Performance Mining with Staged Process Flows (Marlon Dumas)
This document proposes a new technique called staged process flows to analyze how business process performance evolves over time. It introduces measures to quantify the performance of each process stage and the overall process. These measures are calculated over time periods to show how performance changes. The approach is evaluated on two real-world event logs where it is able to answer questions about how overall performance, bottlenecks, and demand/capacity affect the processes over time. The technique provides novel visual analytics to help investigate process patterns and bottlenecks.
Complete and Interpretable Conformance Checking of Business Processes (Marlon Dumas)
This document presents a new approach for conformance checking of business processes that identifies all differences between a process model and an event log. It generates natural language statements to describe each difference. The approach works by translating the model and log into prime event structures and extracting mismatches by comparing their partially synchronized product. It can identify seven elementary mismatch patterns to characterize deviations. The approach was implemented in a standalone Java tool and evaluated on a real-life process with over 150,000 event traces.
Beyond Tasks and Gateways: Automated Discovery of BPMN Models with Subprocess... (Marlon Dumas)
Paper presentation at the 12th International BPM Conference, Eindhoven, The Netherlands, September 2014. The corresponding paper can be found at: http://math.ut.ee/~dumas/pubs/bpm2014bpmnminer.pdf
Predictive Business Process Monitoring with Structured and Unstructured Data (Marlon Dumas)
Presentation delivered by Irene Teinemaa at the BPM'2016 conference, Rio de Janeiro, 22 September 2016. Paper available at: http://kodu.ut.ee/~dumas/pubs/bpm2016predictivemonitoring.pdf
Process Mining Reloaded: Event Structures as a Unified Representation of Proc... (Marlon Dumas)
Keynote talk at the 36th International Conference on Application and Theory of Petri Nets and Concurrency (Petri Nets 2015).
Screencast available at: https://youtu.be/9bQr0r_WaoE
In Processes We Trust: Privacy and Trust in Business Processes (Marlon Dumas)
This document discusses challenges and opportunities around privacy and trust in business processes. It begins by defining key concepts like security, privacy, and trust. It then outlines topics related to business process security and privacy, such as access control, flow analysis to detect unauthorized data access, and privacy-aware business process execution. The document proposes approaches for privacy-aware business processes using techniques like k-anonymization and multi-party computation. It describes a system called Pleak.io that aims to model stakeholders, data flows, and privacy-enhancing technologies to quantify privacy leaks and accuracy loss in processes. The document concludes by discussing challenges around collaborative processes with untrusted parties and the potential use of distributed ledgers and smart contracts to address issues of
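The k-anonymization idea mentioned above can be illustrated with a toy check (not Pleak.io's machinery): a table is k-anonymous with respect to its quasi-identifier columns if every combination of quasi-identifier values occurs in at least k rows.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy k-anonymity check. Each row here is assumed to contain only the
// quasi-identifier columns (e.g. generalized birth decade and gender).
class KAnonymity {
    static boolean isKAnonymous(List<List<String>> quasiIdentifierRows, int k) {
        Map<List<String>, Integer> counts = new HashMap<>();
        for (List<String> row : quasiIdentifierRows)
            counts.merge(row, 1, Integer::sum);
        // Every equivalence class must contain at least k rows.
        for (int c : counts.values())
            if (c < k) return false;
        return true;
    }
}
```

Anonymization systems generalize or suppress values until this predicate holds, which is one concrete source of the accuracy loss the document quantifies.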
Automated Discovery of Structured Process Models: Discover Structured vs Disc... (Marlon Dumas)
Research paper presentation at the 35th International Conference on Conceptual Modeling (ER'2016), Gifu, Japan, 15 Nov. 2016
Presentation delivered by Raffaele Conforti.
Paper available at: http://goo.gl/5EN3l2
Evidence-Based Business Process Management (Marlon Dumas)
1. The document discusses trends in business process management, specifically the rise of evidence-based business process management using process mining techniques.
2. Process mining allows companies to analyze process data from event logs to understand their actual processes, quantify the impact of changes, and discover opportunities for improvement.
3. The techniques discussed include process discovery, conformance checking, predictive monitoring, and rule mining to provide insights into deviations, bottlenecks, and other process issues.
Process Mining and Predictive Process Monitoring (Marlon Dumas)
This document discusses process mining and predictive process monitoring. It begins with an overview of offline process mining techniques like process discovery, conformance checking, and deviance mining. It then discusses applying these techniques online for predictive process monitoring, including predicting outcomes, deviations, or failures. Various techniques are presented like nearest neighbor classification of partial traces and clustering traces before classification. The goal is to accurately predict outcomes during process execution based on control flow, data attributes, and textual case data.
Introduction to Business Process Analysis and Redesign (Marlon Dumas)
Special course on business process analysis and design delivered at University of Granada on 23-24 January 2014. The course covers qualitative and quantitative process analysis techniques and redesign heuristics. Based on the textbook Fundamentals of Business Process Management by Dumas et al.
Minería de Procesos y de Reglas de Negocio (Marlon Dumas)
Talk on process mining and business rules at the 1st Colombian BPM Forum, organized by the Universidad de los Andes (Bogotá), 29 November 2013 - http://forosisis.uniandes.edu.co/bpm/1er-forodebpm/
GPR, or ground-penetrating radar, is a non-destructive geophysical technique that uses high-frequency electromagnetic waves to image the shallow subsurface. It works by transmitting waves into the ground from an antenna and detecting the reflected signals, with the reflection times corresponding to layer depths. GPR can create 2D or 3D images of underground structures based on contrasts in electrical properties like conductivity and dielectric permittivity, which are affected by material and moisture. Common applications include utility detection, archaeology, and mapping stratigraphy, but performance depends on ground conditions.
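The time-to-depth conversion above follows from two standard relations: the wave velocity in the ground is v = c / sqrt(epsR), where epsR is the relative dielectric permittivity, and a reflector's depth is d = v * t / 2 for two-way travel time t. A small sketch (the permittivity value is an assumption; about 4 is typical for dry sand):

```java
// Standard GPR depth conversion: v = c / sqrt(epsR), d = v * t / 2,
// where t is the two-way travel time of the reflection.
class GprDepth {
    static final double C = 3.0e8; // speed of light in vacuum, m/s

    static double velocity(double epsR) {
        return C / Math.sqrt(epsR);
    }

    static double depth(double twoWayTimeSec, double epsR) {
        return velocity(epsR) * twoWayTimeSec / 2.0;
    }
}
```

For dry sand (epsR about 4), v is 1.5e8 m/s, so a reflection arriving 20 ns after transmission corresponds to a reflector about 1.5 m deep. Higher moisture raises epsR, slowing the wave and shrinking the imaged depth for the same travel time.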
Fundamentals of Business Process Management: A Quick Introduction to Value-Dr... (Marlon Dumas)
Marlon Dumas of University of Tartu gives an introduction and quick tour of the business process management lifecycle. Seminar given at the Estonian BPM Roundtable, 10 October 2013.
1 Project 1 Introduction - the SeaPort Project seri.docx (oswald1horne84988)
Project 1
Introduction - the SeaPort Project series
For this set of projects for the course, we wish to simulate some of the aspects of a number of Sea Ports.
Here are the classes and their instance variables we wish to define:
SeaPortProgram extends JFrame
o variables used by the GUI interface
o world: World
Thing implements Comparable<Thing>
o index: int
o name: String
o parent: int
World extends Thing
o ports: ArrayList <SeaPort>
o time: PortTime
SeaPort extends Thing
o docks: ArrayList <Dock>
o que: ArrayList <Ship> // the list of ships waiting to dock
o ships: ArrayList <Ship> // a list of all the ships at this port
o persons: ArrayList <Person> // people with skills at this port
Dock extends Thing
o ship: Ship
Ship extends Thing
o arrivalTime, dockTime: PortTime
o draft, length, weight, width: double
o jobs: ArrayList <Job>
PassengerShip extends Ship
o numberOfOccupiedRooms: int
o numberOfPassengers: int
o numberOfRooms: int
CargoShip extends Ship
o cargoValue: double
o cargoVolume: double
o cargoWeight: double
Person extends Thing
o skill: String
Job extends Thing - optional till Projects 3 and 4
o duration: double
o requirements: ArrayList <String>
// should be some of the skills of the persons
PortTime
o time: int
Eventually, in Projects 3 and 4, you will be asked to show the progress of the jobs using JProgressBars.
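One way to start the Thing base class from the outline above (a sketch only; the constructor assumes each record supplies name, index, and parent tokens in that order, which you should adapt to the actual data file format):

```java
import java.util.Scanner;

// Sketch of the Thing base class: index, name, parent fields per the outline,
// Comparable so instances can be sorted inside ArrayLists.
class Thing implements Comparable<Thing> {
    int index;
    String name;
    int parent;

    // Assumed token order: name, index, parent (adapt to the real file format).
    Thing(Scanner sc) {
        if (sc.hasNext())    name   = sc.next();
        if (sc.hasNextInt()) index  = sc.nextInt();
        if (sc.hasNextInt()) parent = sc.nextInt();
    }

    @Override
    public int compareTo(Thing other) {
        return name.compareTo(other.name); // sort by name; index also works
    }

    @Override
    public String toString() {
        return name + " " + index;
    }
}
```

Subclasses such as World, SeaPort, and Ship can then call this constructor via super(sc) and read their extra fields from the same Scanner.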
Here's a very quick overview of all projects:
1. Read a data file, create the internal data structure, create a GUI to display the structure, and let the user search the structure.
2. Sort the structure, use hash maps to create the structure more efficiently.
3. Create a thread for each job (a job cannot run until its ship has a dock), and create a GUI to show the progress of each job.
4. Simulate competing for resources (persons with particular skills) for each job.
Project 1 General Objectives
Project 1 - classes, text data file, GUI, searching
Define and implement appropriate classes, including:
o instance and class variables,
o constructors,
o toString methods, and
o other appropriate methods.
Read data from a text file:
o specified at run time,
JFileChooser jfc = new JFileChooser (".");
// start at dot, the current directory
o using that data to create instances of the classes,
o creating a multi-tree (class instances related in hierarchical, has-some relationships), and
o organizing those instances in existing JDK structures which can be sorted, such as ArrayLists.
Create a simple GUI:
o presenting the data in the structures with some buttons, and
o text fields supporting SEARCHING on the various fields of each class.
Documentation Requirements:
You should start working on a documentation file before you do anything else with these projects, and fill in items as you go along. Leaving the documentation until the project is finished is not a good idea for any number of reasons.
20160818 Semantics and Linkage of Archived Catalogs (andrea huang)
1. The document discusses representing archive catalog data as linked data using semantic web technologies. It involves mapping catalog metadata from XML and CSV formats to RDF and linking to external vocabularies.
2. A system is presented that converts archive catalogs to linked data, stores it using CKAN and provides SPARQL querying. It allows browsing catalog records, performing spatial and temporal queries.
3. An ontology called voc4odw is introduced for organizing open data. It is based on the R4R ontology and aims to semantically enrich catalog records by linking objects, events, places and times using common vocabularies.
EDF2014: Daniel Vila-Suero, Researcher, Ontology Engineering Group, Universid... (European Data Forum)
Selected Talk of Daniel Vila-Suero, Researcher, Ontology Engineering Group, Universidad Politecnica de Madrid, Spain at the European Data Forum 2014, 19 March 2014 in Athens, Greece: 3LD: Towards high quality, industry-ready Linguistic Linked Licensed Data
This second version of last year's workshop will shed light on a modern solution to application portability, building, delivery, packaging, and system-dependency issues. Containers, especially Docker, have seen accelerated adoption on the web, in the cloud, and recently in the enterprise. HPC environments are seeing something similar with the introduction of the HPC containers Singularity and Shifter, which provide a good use case for solving software portability and for ensuring repeatability of results. Their ecosystem also enables better development, delivery, and testing workflows that were previously alien to most HPC environments. The workshop will cover the theory and hands-on use of containers and their ecosystem, introducing Docker and Singularity: Docker as a general-purpose container for almost any app, Singularity as the container technology suited to HPC. It will go over the foundations of the container platform, including an overview of its system components: images, containers, repositories, clustering, and orchestration. The strategy is to demonstrate through live demos and hands-on exercises, including the use case of containers in building a portable distributed application cluster running a variety of workloads, HPC workloads included.
The document describes a student project titled "Error Detection in Big Data on Cloud". The project aims to develop a time-efficient approach for detecting and correcting errors in large sensor data stored on the cloud. If errors are found, the approach also involves error recovery and storing the corrected data in its original format. The proposed method uses algorithms like cyclic redundancy check, Hamming code, and secure hash algorithm to detect and locate errors in big data sets efficiently. Design documents like data flow diagrams, use case diagrams and class diagrams were created to plan the system architecture and implementation of the project.
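The Hamming-code building block mentioned above can be illustrated with a minimal Hamming(7,4) encoder (a standard construction, not the project's exact code): three parity bits cover overlapping subsets of four data bits, and the syndrome of a received word gives the 1-based position of any single flipped bit, enabling both detection and correction.

```java
// Minimal Hamming(7,4): parity bits p1,p2,p3 sit at codeword positions
// 1,2,4 and the data bits d1..d4 at positions 3,5,6,7.
class Hamming74 {
    // d = {d1, d2, d3, d4}, each 0 or 1 -> codeword {p1, p2, d1, p3, d2, d3, d4}
    static int[] encode(int[] d) {
        int p1 = d[0] ^ d[1] ^ d[3]; // covers positions 3, 5, 7
        int p2 = d[0] ^ d[2] ^ d[3]; // covers positions 3, 6, 7
        int p3 = d[1] ^ d[2] ^ d[3]; // covers positions 5, 6, 7
        return new int[] { p1, p2, d[0], p3, d[1], d[2], d[3] };
    }

    // Syndrome decoding: returns the 1-based position of a single flipped
    // bit, or 0 if the codeword is consistent.
    static int errorPosition(int[] c) {
        int s1 = c[0] ^ c[2] ^ c[4] ^ c[6]; // parity over positions 1,3,5,7
        int s2 = c[1] ^ c[2] ^ c[5] ^ c[6]; // parity over positions 2,3,6,7
        int s3 = c[3] ^ c[4] ^ c[5] ^ c[6]; // parity over positions 4,5,6,7
        return s1 + 2 * s2 + 4 * s3;
    }
}
```

CRC and secure hashes, the project's other primitives, detect (but do not locate) errors over much longer blocks; Hamming-style codes trade coverage for the ability to correct in place.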
Trivadis TechEvent 2016 Polybase challenges Hive relational access to non-rel... (Trivadis)
In this presentation, Olaf Nimz talks about a proposed marriage between SQL Server and Hadoop: building bridges to HDFS, distributed query processing, and sensible hybrid scenarios.
Emerging Communications Tech: Lessons from Hurricane Sandy and Super Typhoon ...Cisco Crisis Response
This presentation takes a look at two recent Cisco TacOps deployments, Hurricane Sandy and Super Typhoon Haiyan, and examines the emerging communications technologies that are bringing innovation to disasters and humanitarian crises.
containerit at useR!2017 conference, BrusselsDaniel Nüst
**Webpage**
https://github.com/o2r-project/containerit/
**Abstract**
Reproducibility of computations is crucial in an era where data is born digital and analysed algorithmically. Most studies, however, only publish the results, often with figures as the main interpreted outputs. But where do these figures come from? Scholarly articles must not only provide a description of the work but also be accompanied by data and software. R offers excellent tools to create reproducible works, e.g. Sweave and RMarkdown. Several approaches to capture the workspace environment in R have been made, working around CRAN’s deliberate choice not to provide explicit versioning of packages and their dependencies. They preserve a collection of packages locally (packrat, pkgsnap, switchr/GRANBase) or remotely (MRAN timemachine/checkpoint), or install specific versions from CRAN or source (requireGitHub, devtools). Installers for old versions of R are archived on CRAN. A user can manually re-create a specific environment, but this is a cumbersome task.
We introduce a new possibility to preserve a runtime environment, including both packages and R itself, by adding an abstraction layer in the form of a container, which can execute a script or run an interactive session. The package containeRit automatically creates such containers based on Docker. Docker is a solution for packaging an application and its dependencies, and has proven useful in the context of reproducible research (Boettiger 2015). The package creates a container manifest, the Dockerfile (usually written by hand), from sessionInfo(), R scripts, or RMarkdown documents. The Dockerfiles use the Rocker community images as base images. Docker can build an executable image from a Dockerfile, and the image is executable anywhere a Docker runtime is present. containeRit uses harbor for building images and running containers, and sysreqs for installing system dependencies of R packages. Before the planned CRAN release we want to share our work, discuss open challenges such as handling linked libraries (see the discussion on geospatial libraries in Rocker), and welcome community feedback.
containeRit is developed within the DFG-funded project Opening Reproducible Research to support the creation of Executable Research Compendia (ERC) (Nüst et al. 2017).
**References**
Boettiger, Carl. 2015. “An Introduction to Docker for Reproducible Research, with Examples from the R Environment.” ACM SIGOPS Operating Systems Review 49 (January): 71–79. doi:10.1145/2723872.2723882.
Nüst, Daniel, Markus Konkol, Edzer Pebesma, Christian Kray, Marc Schutzeichel, Holger Przibytzin, and Jörg Lorenz. 2017. “Opening the Publication Process with Executable Research Compendia.” D-Lib Magazine 23 (January). doi:10.1045/january2017-nuest.
Grehack2013-RuoAndo-Unraveling large scale geographical distribution of vulne...Ruo Ando
A feasibility study of large-scale attacks on DNS. We found 10,334,293 DNS servers in 34 hours during a first measurement (2013/05/31 – 2013/06/02) and 30,285,322 DNS servers in 26 hours during a second measurement (2013/07/05).
2. Develop a MapReduce program to calculate the frequency of a given word in ...Prof. Maulik Trivedi
The document describes installing and configuring Apache Hadoop in standalone mode on a single machine. It involves installing Java, downloading and extracting the Hadoop binaries, and starting the necessary Hadoop daemons like HDFS and YARN. A sample MapReduce wordcount program is then run on some sample input text to test the Hadoop installation.
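The map/shuffle/reduce structure of the wordcount program mentioned above can be sketched in a few lines of plain Python; this is a single-machine illustration of the programming model, not the Hadoop Java API.

```python
from collections import Counter
import itertools

def map_phase(line):
    # Mapper: emit a (word, 1) pair for every word in the input line
    return [(word.lower(), 1) for word in line.split()]

def reduce_phase(pairs):
    # Reducer: sum the counts grouped by word key
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return counts

lines = ["Hadoop runs MapReduce", "MapReduce counts words", "words words words"]
pairs = itertools.chain.from_iterable(map_phase(line) for line in lines)
counts = reduce_phase(pairs)
print(counts["words"])      # 4
print(counts["mapreduce"])  # 2
```

In a real Hadoop job the framework performs the grouping (shuffle) between the map and reduce phases and distributes both phases across the cluster's nodes.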
This document discusses SQL on Druid. It provides an overview of Druid, benchmarks comparing Druid to Spark, and details how SQL can be used with Druid through Hive integration and Druid's built-in SQL functionality. Hive allows SQL queries over Druid data through a Druid storage handler and by translating Hive queries into the appropriate Druid query format. Druid also natively supports SQL queries through its Avatica server, enabling SQL queries directly against Druid data sources.
We now have larger Knowledge Bases than ever before. (10 billion facts is now a small number).
We now have the instruments to observe and analyse these very large Knowledge Bases.
We can use these insights for better tools for querying, inferencing, publishing, maintaining, visualising and explaining.
This document discusses Teledyne Brown Engineering's space systems department and achievements. It summarizes Teledyne's history in supporting NASA missions dating back to the 1950s. It outlines Teledyne's current work on the International Space Station, including operating the Microgravity Science Glovebox and Life Science Glovebox payloads, and developing and integrating science experiments. One highlighted experiment is a 2019 tissue engineering experiment to observe protein fibril formation under shear forces. The document also mentions Teledyne's work on future vehicles like Dream Chaser and the Deep Space Gateway.
The DISTRILAKE project aims to integrate and coordinate the management of available reservoirs in the Lake Como basin to increase the overall efficiency of the water system and its ability to meet various user needs, especially during periods of water stress. A key part of the project is developing a geo web system with a data model, web services, and client interface to enable collection and sharing of water-related data among project participants and stakeholders. The system will make all relevant data immediately and easily accessible. Setting up metadata standards and catalog services are also important aspects for implementing the geo web system.
This talk was given at the 13th International Conference on Principles of Knowledge Representation and Reasoning (KR 2012), held in Rome, Italy, June 10-14, 2012, by Ilias Tahmazidis (FORTH).
Abstract:
We are witnessing an explosion of available data from the Web, government authorities, scientific databases, sensors and more. Such datasets could benefit from the introduction of rule sets encoding commonly accepted rules or facts, application- or domain-specific rules, commonsense knowledge etc. This raises the question of whether, how, and to what extent knowledge representation methods are capable of handling the vast amounts of data for these applications. In this paper, we consider nonmonotonic reasoning, which has traditionally focused on rich knowledge structures. In particular, we consider defeasible logic, and analyze how parallelization, using the MapReduce framework, can be used to reason with defeasible rules over huge data sets. Our experimental results demonstrate that defeasible reasoning over billions of facts is performant, and has the potential to scale to trillions of facts.
This document discusses the Data Distribution Service (DDS) and how it provides a data sharing abstraction for building distributed systems. DDS implements a fully distributed tuple space, allowing applications to share information through the publication and subscription of data topics. It provides location transparency, anonymity, temporal decoupling and other benefits over traditional messaging approaches. The document demonstrates how to build a basic DDS application by creating a domain participant, topics, publishers, writers and readers to share data between distributed applications.
How GenAI will (not) change your business?Marlon Dumas
Not all new technology waves are the same. Some waves are vertical (3D printing, digital twins, blockchain) while others are horizontal (the PC in the 80s, the Web in the 90s). GenAI is a horizontal wave. The question is not if GenAI will impact my business, but what will be the scope of this impact. In this talk, we will go through a journey of collisions: GenAI colliding with customer service, clerical work, information search, content production, IT development, product design, and other knowledge work. A common thread to understand the impact of GenAI is to distinguish between descriptive use cases (search, summarize, expand, transcribe & translate) versus creative use.
Walking the Way from Process Mining to AI-Driven Process OptimizationMarlon Dumas
While generative AI grabs headlines, most organizations are yet to achieve continuous process improvement from predictive and prescriptive analytics.
Why? It’s largely about data, people, and a methodical approach to deploy AI to connect data and people. The good news is that if your organization has built a process mining capability, you are well placed to climb the ladder to achieve AI-driven process optimization. But to get there, you need a disciplined step-by-step approach along two tracks: a tactical management track and an operational management track.
First, it’s about predicting what will happen if you leave your process as-is, and what will happen if you implement a change in your process. At a tactical level, a predictive capability allows you to prioritize improvement opportunities. At an operational level, it allows you to predict issues, such as deadline violations. The challenges here are how to manage the inherent uncertainty of data-driven AI systems, and how to change your people and culture to manage processes proactively, rather than reactively. One thing is to deploy predictive dashboards, another entirely different thing is to get people to use them effectively to improve the processes.
Next, it’s about becoming preemptive: continuously optimizing your processes by leveraging streams of data-driven recommendations to trigger changes and actions. At the tactical level, this prescriptive capability allows you to implement the right changes to maximize competing KPIs. At the operational level, it means triggering interventions in your processes to “wow” customers and to meet SLAs in a cost-effective manner. The challenge here is how to help process owners, workers, and other stakeholders understand the causes of performance issues, and how the recommendations generated by the AI-driven optimization system will tackle those causes.
And finally, as an icing on the cake, generative AI allows you to produce improvement scenarios to adapt to external changes. Importantly, the transformative potential of generative AI in the context of process improvement does not come from its ability to provide question-and-answer interfaces to query data. It comes from its ability to support continuous process adaptation by generating and validating hypotheses based on a holistic view of your organization.
In this talk, we will discuss how organizations are driving sustainable business value by strategically layering predictive, prescriptive, and generative AI onto a process mining foundation, one brick at a time.
Industry keynote talk by Marlon Dumas at the 5th International Conference on Process Mining (ICPM'2023), Rome, Italy, 25 October 2023
Discovery and Simulation of Business Processes with Probabilistic Resource Av...Marlon Dumas
In the field of business process simulation, the availability of resources is captured by assigning a calendar to each resource, e.g., Monday-Friday 9:00-18:00. Resources are assumed to be always available to perform activities during their calendar. This assumption often does not hold due to interruptions, breaks, or because resources time-share across multiple processes. A simulation model that captures availability via crisp time slots (a resource is either on or off during a slot) does not capture these behaviors, leading to inaccuracies in the simulation output. This paper presents a simulation approach wherein resource availability is modeled probabilistically. In this approach, each availability time slot is associated with a probability, allowing us to capture, for example, that a resource is available on Fridays between 14:00-15:00 with 90% probability and between 17:00-18:00 with 50% probability. The paper proposes an algorithm to discover probabilistic availability calendars from event logs. An empirical evaluation shows that simulation models with probabilistic calendars discovered from event logs, replicate the temporal distribution of activity instances and cycle times of a process more closely than simulation models with crisp calendars.
This presentation was delivered at the 5th International Conference on Process Mining (ICPM'2023), Rome, Italy, October 2023.
The paper is available at: https://easychair.org/publications/preprint/Rz9g
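The probabilistic calendar idea in the abstract above can be sketched compactly: each slot carries a probability, and the simulator samples availability per occurrence. The data structure and function names below are illustrative, not the paper's implementation.

```python
import random

# Hypothetical probabilistic calendar: slot -> probability that the
# resource is actually available in that slot (values mirror the
# example in the abstract).
calendar = {
    ("Friday", "14:00-15:00"): 0.9,
    ("Friday", "17:00-18:00"): 0.5,
}

def is_available(slot, rng):
    """Sample the resource's availability for one simulated occurrence of slot."""
    return rng.random() < calendar.get(slot, 0.0)

rng = random.Random(42)
runs = 10_000
hits = sum(is_available(("Friday", "17:00-18:00"), rng) for _ in range(runs))
print(abs(hits / runs - 0.5) < 0.05)  # empirical rate matches the configured 0.5
```

A crisp calendar is the special case where every probability is 0 or 1; the probabilistic variant lets the simulation reproduce interruptions and time-sharing that crisp slots cannot express.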
Can I Trust My Simulation Model? Measuring the Quality of Business Process Si...Marlon Dumas
Business Process Simulation (BPS) is an approach to analyze the performance of business processes under different scenarios. For example, BPS allows us to estimate what would be the cycle time of a process if one or more resources became unavailable. The starting point of BPS is a process model annotated with simulation parameters (a BPS model). BPS models may be manually designed, based on information collected from stakeholders and empirical observations, or automatically discovered from execution data. Regardless of its origin, a key question when using a BPS model is how to assess its quality. In this paper, we propose a collection of measures to evaluate the quality of a BPS model w.r.t. its ability to replicate the observed behavior of the process. We advocate an approach whereby different measures tackle different process perspectives. We evaluate the ability of the proposed measures to discern the impact of modifications to a BPS model, and their ability to uncover the relative strengths and weaknesses of two approaches for automated discovery of BPS models. The evaluation shows that the measures not only capture how close a BPS model is to the observed behavior, but they also help us to identify sources of discrepancies.
Presentation delivered by David Chapela-Campa at the BPM'2023 conference, Utrecht, September 2023.
Business Process Optimization: Status and PerspectivesMarlon Dumas
For decades, business process optimization has been largely about art and craft (and sometimes wizardry). Apart from narrowly scoped approaches to optimize resource allocation (often assuming that workers behave like robots), a lot of business process optimization relies on high-level guidelines, with A/B testing for idea validation, which is hard to scale to complex processes. As a result, managers end up settling for a "good enough" process. Can we do more? In this talk, we review recent work on the use of high-fidelity simulation models discovered from execution data. The talk also explores the possibilities (and perils) that LLMs bring to the field of business process optimization.
This talk was delivered at the Workshop on Data-Driven Business Process Optimization at the BPM'2023 conference.
Learning When to Treat Business Processes: Prescriptive Process Monitoring wi...Marlon Dumas
Paper presentation at the 35th International Conference on Advanced Information Systems Engineering (CAiSE'2023).
Abstract.
Increasing the success rate of a process, i.e. the percentage of cases that end in a positive outcome, is a recurrent process improvement goal. At runtime, there are often certain actions (a.k.a. treatments) that workers may execute to lift the probability that a case ends in a positive outcome. For example, in a loan origination process, a possible treatment is to issue multiple loan offers to increase the probability that the customer takes a loan. Each treatment has a cost. Thus, when defining policies for prescribing treatments to cases, managers need to consider the net gain of the treatments. Also, the effect of a treatment varies over time: treating a case earlier may be more effective than later in a case. This paper presents a prescriptive monitoring method that automates this decision-making task. The method combines causal inference and reinforcement learning to learn treatment policies that maximize the net gain. The method leverages a conformal prediction technique to speed up the convergence of the reinforcement learning mechanism by separating cases that are likely to end up in a positive or negative outcome, from uncertain cases. An evaluation on two real-life datasets shows that the proposed method outperforms a state-of-the-art baseline.
Why am I Waiting Data-Driven Analysis of Waiting Times in Business ProcessesMarlon Dumas
Presentation of a research paper at the 35th International Conference on Advanced Information Systems Engineering (CAiSE) in Zaragoza Spain. The paper presents a classification of causes of waiting times in business processes and a method to automatically detect and quantify the presence of each of these causes in a business process recorded in an event log.
This talk introduces the concept of an Augmented Business Process Management System (ABPMS): a process-aware information system that relies on trustworthy AI technology to reason and act upon data, within a set of restrictions, with the aim of continuously adapting and improving a set of business processes with respect to one or more key performance indicators.
The talk describes the transition from existing process mining technology to AI-Augmented BPM as a pyramid, where predictive, prescriptive, conversational and reasoning capabilities are stacked up incrementally to reach the level of Augmented BPM.
Talk delivered at the AAAI'2023 Workshop on AI for Business Process Management.
Process Mining and Data-Driven Process SimulationMarlon Dumas
Guest lecture delivered at the Institut Teknologi Sepuluh on 8 December 2022.
This lecture gives an overview of process mining and simulation techniques, and how the two can be used together in process improvement projects.
Modeling Extraneous Activity Delays in Business Process SimulationMarlon Dumas
This paper presents a technique to enhance the fidelity of business process simulation models by detecting unexplained (extraneous) delays from business process execution data, and modeling these delays in the simulation model, via timer events.
The presentation was delivered at the 4th International Conference on Process Mining (ICPM'2022).
Paper available at: https://arxiv.org/abs/2206.14051
Business Process Simulation with Differentiated Resources: Does it Make a Dif...Marlon Dumas
Existing methods for discovering business process simulation models from execution data (event logs) assume that all resources in a pool have the same performance and share the same availability calendars. This paper proposes a method for discovering simulation models, wherein each resource is treated as an individual entity, with its own performance and availability calendar. An evaluation shows that simulation models with differentiated resources more closely replicate the distributions of cycle times and the work rhythm in a process than models with undifferentiated resources. The paper is available at: https://link.springer.com/chapter/10.1007/978-3-031-16103-2_24
Prescriptive Process Monitoring Under Uncertainty and Resource ConstraintsMarlon Dumas
This paper presents an approach to trigger interventions at runtime, in order to improve the success rate of a process, when the number of resources who can perform these interventions is limited.
The paper is available at: https://link.springer.com/chapter/10.1007/978-3-031-16171-1_13
The presentation delivered at the 20th International Conference on Business Process Management (BPM'2022), in Muenster, Germany, September 2022.
Slides of a lecture delivered at the First Process Mining Summer School in Aachen, Germany, July 2022.
This lecture introduces techniques in the area of "task mining" with an emphasis on Robotic Process Mining. Robotic Process Mining (RPM) is a family of techniques to discover repetitive routines that can be automated using Robotic Process Automation (RPA) technology, by analyzing interactions between one or more workers and one or more software applications during the performance of one or more tasks in a business process. In general, RPM techniques take as input logs of User Interactions (UI logs). These UI logs are recorded while workers interact with one or more applications, typically desktop applications. Based on these logs, RPM techniques produce specifications of one or more routines that can be automated using RPA or related tools.
Accurate and Reliable What-If Analysis of Business Processes: Is it Achievable?Marlon Dumas
This document discusses using event logs to generate business process simulation models. It describes traditional discrete event simulation approaches that discover simulation models from event logs recorded by information systems. Deep learning techniques are also discussed that can generate traces without an explicit process model. The document suggests that combining discrete event simulation and deep learning may produce more accurate simulations, but challenges remain around validating such hybrid approaches and testing them in previously unseen scenarios. More research is needed before these data-driven simulation methods can reliably predict the effects of interventions.
Learning Accurate Business Process Simulation Models from Event Logs via Auto...Marlon Dumas
Paper presentation at the International Conference on Advanced Information Systems Engineering (CAiSE).
This paper presents an approach to automatically discover business process simulation models from event logs by combining process mining and deep learning techniques.
Paper available at: https://link.springer.com/chapter/10.1007/978-3-031-07472-1_4
Process Mining: A Guide for PractitionersMarlon Dumas
This document presents a guide for practitioners on process mining. It introduces process mining and discusses its main use cases. These use cases are categorized into discovery oriented, future and change oriented, alignment oriented, variant oriented, and performance oriented. The document also provides a framework to classify use cases and discusses the business-oriented questions that can be answered using different process mining use cases, such as improving transparency, quality, agility, efficiency and conformance.
Process Mining for Process Improvement.pptxMarlon Dumas
Presentation of a research paper at the 16th International Conference on Research Challenges in Information Science (RCIS). The paper presents the results of an empirical study on how practitioners use process mining to identify business process improvement opportunities. The paper is available at: https://link.springer.com/chapter/10.1007/978-3-031-05760-1_13
Data-Driven Analysis of Batch Processing Inefficiencies in Business ProcessesMarlon Dumas
Slides of a research paper presentation at the 16th International Conference on Research Challenges in Information Science (RCIS).
The research paper presents an approach to analyze event logs of business processes in order to identify batched activities and to analyze the waiting times caused by these activities.
Paper available at: https://link.springer.com/chapter/10.1007/978-3-031-05760-1_14
Optimización de procesos basada en datosMarlon Dumas
Talk delivered at BPM Day Lima 2021.
In this talk, we discuss emerging methods and applications in the area of data-driven process optimization. We cover advances in process mining, methods for building digital twins of processes, and predictive monitoring methods. Through examples and case studies, we show how these methods can guide digital transformation and continuous process improvement initiatives. In particular, we illustrate the use of these methods to: (1) analyze the performance of business processes so as to identify frictions and automation opportunities; (2) predict the impact of changes, and in particular the impact of an automation initiative; and (3) make predictions about process performance and adjust process execution so as to prevent SLA violations, customer complaints, and other undesirable events.
Process Mining and AI for Continuous Process ImprovementMarlon Dumas
Talk delivered at BPM Day Rio Grande do Sul on 11 November 2021.
Abstract.
Process mining is a technology that marries methods from business process management and from data science, to support operational excellence and digital transformation. Process mining tools can transform data extracted from enterprise systems, into visualizations and reports that allow managers to improve organizational performance along different dimensions, such as efficiency, quality, and compliance. In this talk, we will give an overview of the capabilities of process mining tools, and we will illustrate the benefits of process mining via several case studies in the fields of insurance, manufacturing, and IT service management.
Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...University of Maribor
Slides from talk presenting:
Aleš Zamuda: Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapter and Networking.
Presentation at IcETRAN 2024 session:
"Inter-Society Networking Panel GRSS/MTT-S/CIS
Panel Session: Promoting Connection and Cooperation"
IEEE Slovenia GRSS
IEEE Serbia and Montenegro MTT-S
IEEE Slovenia CIS
11TH INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONIC AND COMPUTING ENGINEERING
3-6 June 2024, Niš, Serbia
Advanced control scheme of doubly fed induction generator for wind turbine us...IJECEIAES
This paper describes a speed control device for generating electrical energy on an electricity network based on the doubly fed induction generator (DFIG) used for wind power conversion systems. At first, a double-fed induction generator model was constructed. A control law is formulated to govern the flow of energy between the stator of a DFIG and the energy network using three types of controllers: proportional integral (PI), sliding mode controller (SMC) and second order sliding mode controller (SOSMC). Their different results in terms of power reference tracking, reaction to unexpected speed fluctuations, sensitivity to perturbations, and resilience against machine parameter alterations are compared. MATLAB/Simulink was used to conduct the simulations for the preceding study. Multiple simulations have shown very satisfying results, and the investigations demonstrate the efficacy and power-enhancing capabilities of the suggested control system.
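Of the three controllers compared in the abstract, the PI law is the simplest to sketch. Below is a minimal discrete-time PI speed-control loop on a toy first-order plant; the gains and plant model are illustrative only and are not the DFIG model or tuning from the paper.

```python
def simulate_pi(kp=2.0, ki=5.0, ref=1.0, dt=0.01, steps=1000):
    """Track a speed reference with a PI controller on a toy plant dw/dt = -w + u."""
    speed, integral = 0.0, 0.0
    for _ in range(steps):
        error = ref - speed
        integral += error * dt
        u = kp * error + ki * integral   # PI control law
        speed += dt * (-speed + u)       # Euler step of the first-order plant
    return speed

print(abs(simulate_pi() - 1.0) < 0.01)  # integral action removes steady-state error
```

Sliding-mode and second-order sliding-mode controllers replace this linear law with a switching law on a sliding surface, which is what gives them the robustness to parameter alterations that the paper measures.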
Batteries: Introduction – types of batteries – discharging and charging of a battery – characteristics of a battery – battery rating – various tests on a battery – primary battery: silver button cell – secondary battery: Ni-Cd battery – modern battery: lithium-ion battery – maintenance of batteries – choice of batteries for electric vehicle applications.
Fuel Cells: Introduction – importance and classification of fuel cells – description, principle, components, and applications of fuel cells: H2-O2 fuel cell, alkaline fuel cell, molten carbonate fuel cell, and direct methanol fuel cell.
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024Sinan KOZAK
Sinan from the Delivery Hero mobile infrastructure engineering team shares a deep dive into performance acceleration with Gradle build cache optimizations. Sinan shares their journey into solving complex build-cache problems that affect Gradle builds. By understanding the challenges and solutions found in our journey, we aim to demonstrate the possibilities for faster builds. The case study reveals how overlapping outputs and cache misconfigurations led to significant increases in build times, especially as the project scaled up with numerous modules using Paparazzi tests. The journey from diagnosing to defeating cache issues offers invaluable lessons on maintaining cache integrity without sacrificing functionality.
International Conference on NLP, Artificial Intelligence, Machine Learning an...gerogepatton
International Conference on NLP, Artificial Intelligence, Machine Learning and Applications (NLAIM 2024) offers a premier global platform for exchanging insights and findings in the theory, methodology, and applications of NLP, Artificial Intelligence, Machine Learning, and their applications. The conference seeks substantial contributions across all key domains of NLP, Artificial Intelligence, Machine Learning, and their practical applications, aiming to foster both theoretical advancements and real-world implementations. With a focus on facilitating collaboration between researchers and practitioners from academia and industry, the conference serves as a nexus for sharing the latest developments in the field.
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMSIJNSA Journal
The smart irrigation system represents an innovative approach to optimize water usage in agricultural and landscaping practices. The integration of cutting-edge technologies, including sensors, actuators, and data analysis, empowers this system to provide accurate monitoring and control of irrigation processes by leveraging real-time environmental conditions. The main objective of a smart irrigation system is to optimize water efficiency, minimize expenses, and foster the adoption of sustainable water management methods. This paper conducts a systematic risk assessment by exploring the key components/assets and their functionalities in the smart irrigation system. The crucial role of sensors in gathering data on soil moisture, weather patterns, and plant well-being is emphasized in this system. These sensors enable intelligent decision-making in irrigation scheduling and water distribution, leading to enhanced water efficiency and sustainable water management practices. Actuators enable automated control of irrigation devices, ensuring precise and targeted water delivery to plants. Additionally, the paper addresses the potential threat and vulnerabilities associated with smart irrigation systems. It discusses limitations of the system, such as power constraints and computational capabilities, and calculates the potential security risks. The paper suggests possible risk treatment methods for effective secure system operation. In conclusion, the paper emphasizes the significant benefits of implementing smart irrigation systems, including improved water conservation, increased crop yield, and reduced environmental impact. Additionally, based on the security analysis conducted, the paper recommends the implementation of countermeasures and security approaches to address vulnerabilities and ensure the integrity and reliability of the system. 
By incorporating these measures, smart irrigation technology can revolutionize water management practices in agriculture, promoting sustainability, resource efficiency, and safeguarding against potential security threats.
Literature Review Basics and Understanding Reference Management.pptxDr Ramhari Poudyal
A three-day training on academic research, focusing on analytical tools, at United Technical College, supported by the University Grants Commission, Nepal, 24-26 May 2024.
Understanding Inductive Bias in Machine LearningSUTEJAS
This presentation explores the concept of inductive bias in machine learning. It explains how algorithms come with built-in assumptions and preferences that guide the learning process. You'll learn about the different types of inductive bias and how they can impact the performance and generalizability of machine learning models.
The presentation also covers the positive and negative aspects of inductive bias, along with strategies for mitigating potential drawbacks. We'll explore examples of how bias manifests in algorithms like neural networks and decision trees.
By understanding inductive bias, you can gain valuable insights into how machine learning models work and make informed decisions when building and deploying them.
A review on techniques and modelling methodologies used for checking electrom...nooriasukmaningtyas
The proper function of the integrated circuit (IC) in an inhibiting electromagnetic environment has always been a serious concern throughout the decades of revolution in the world of electronics, from disjunct devices to today’s integrated circuit technology, where billions of transistors are combined on a single chip. The automotive industry and smart vehicles in particular, are confronting design issues such as being prone to electromagnetic interference (EMI). Electronic control devices calculate incorrect outputs because of EMI and sensors give misleading values which can prove fatal in case of automotives. In this paper, the authors have non exhaustively tried to review research work concerned with the investigation of EMI in ICs and prediction of this EMI using various modelling methodologies and measurement setups.
3. Conflicting goals: Privacy vs. Utility
We need to release aggregate information about the data without leaking information about any individual involved in the incident.
• Aggregate info: number of crew members of nationality X in the ship
• Individual info: is a particular crew member of nationality X?
Problem: aggregate information may leak information on individuals:
• Number of crew members of nationality X in the ship
• Number of crew members of nationality X in the ship, excluding Y
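The differencing attack sketched above can be made concrete in a few lines of Python. The crew data, names, and helper function are invented for illustration:

```python
# Differencing attack: two individually harmless aggregate queries
# combine to reveal one crew member's nationality.
crew = {"Alice": "X", "Bob": "Y", "Carol": "X"}  # hypothetical data

def count_nationality(db, nationality):
    """Aggregate query: number of crew members of a given nationality."""
    return sum(1 for n in db.values() if n == nationality)

full = count_nationality(crew, "X")
without_carol = count_nationality(
    {k: v for k, v in crew.items() if k != "Carol"}, "X")

# The difference of the two aggregates is exactly Carol's individual bit.
carol_is_x = (full - without_carol) == 1
print(carol_is_x)  # True: Carol's nationality leaked
```

Each query alone looks like a harmless aggregate; their difference answers the forbidden individual question exactly, which is the tension differential privacy is designed to resolve.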
4. Differential privacy (Dwork 2006)
K gives ε-differential privacy if, for all values of DB, DB′ differing in a single element, and all S ⊆ Range(K):

Pr[K(DB) ∈ S] / Pr[K(DB′) ∈ S] ≤ e^ε ≈ (1 + ε)
5. Differential privacy (Laplacian noise)
[Figure: Laplace noise added to query answers, illustrating the resulting accuracy loss. Source: Gerome Miklau and Michael Hay]
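A minimal Python sketch of the Laplace mechanism for a counting query. The function names and the example count are ours, not the talk's: a count has sensitivity 1, so Laplace noise with scale 1/ε gives ε-differential privacy, at the cost of an expected absolute error of 1/ε — the accuracy loss the slide refers to.

```python
import math
import random

def laplace_noise(scale):
    """Sample Laplace(0, scale) noise by inverse-transform sampling."""
    u = random.random() - 0.5              # uniform on [-0.5, 0.5)
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_count(true_count, epsilon):
    """ε-DP release of a counting query (sensitivity 1, so scale = 1/ε)."""
    return true_count + laplace_noise(1.0 / epsilon)

random.seed(0)                             # reproducible demo
true_count = 42                            # e.g. crew members of nationality X
noisy_count = dp_count(true_count, epsilon=0.1)
# Smaller ε means stronger privacy but larger expected error (1/ε).
```

With this mechanism, the differencing attack above only reveals a noisy difference, while the expected error of each released count grows as ε shrinks.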
6. NAPLES project’s goal
Develop theoretical foundations for implementing tools that:
• Let one model stakeholders and flows in the Business Process Model and Notation (BPMN)
• Find data leaks in these process models, taking into account the Privacy-Enhancing Technologies used in the as-is models
• Quantify leakages using differential privacy
• Quantify accuracy loss
• Suggest relevant privacy-enhancing technologies to reduce privacy leaks
See http://pleak.io
7. Usage scenarios
• Support a privacy audit of an existing system: what will each stakeholder of the system learn about a private data object, e.g. with respect to differential privacy?
• Build a new privacy-aware system: what will each stakeholder of the system learn about each private data object? Which Privacy-Enhancing Technologies would help reduce the leakage?
9. Adding privacy-enhancing technologies
[BPMN collaboration model: OrganizationX (Dispatcher and Emergency Officer lanes) performs the tasks "Retrieve Communities Requiring Intensive Care" and "Adjust Ship to Community Assignment", using the data objects Communities, Ships Reserved For Evacuees, Communities ICU requirements, Ships, and Ship to Community Assignment. OrganizationY performs "Retrieve Nearby Ships with Spare Capacity & Equipment", marked as a dp-task with dp-flows. The process starts with "Need to assign ship" and ends with "Send ship assignment". The dp-task runs the following script:]
(ships, disaster) -> {
  avail_food = 0;
  avail_ships = [];
  for (ship in ships) do {
    fuzzed_loc = ship.loc() + Lap2(3);
    if (dist(fuzzed_loc, disaster.loc()) / ship.speed() <= 2
        && ship.cargo_type() == "food"
        && !ship.contains(dangerous_materials)) {
      avail_food += ship.cargo();
      avail_ships.append({ship.name(), fuzzed_loc});
    }
  }
  avail_food += Lap(2);
  return (avail_food, avail_ships);
}
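The dp-task's script above can be paraphrased as runnable Python. This is our interpretation of the pseudocode, not pleak's actual execution semantics: `Lap2(3)` is read as independent Laplace(3) noise on both location coordinates, `Lap(2)` as Laplace(2) noise on the released aggregate, and the ship records are made-up dictionaries.

```python
import math
import random

def lap(scale):
    """Sample Laplace(0, scale) noise by inverse-transform sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def retrieve_nearby_ships(ships, disaster, deadline_h=2.0):
    """Fuzz each ship's location, keep safe food carriers that can reach the
    disaster site in time, and release a noisy total of their food cargo."""
    avail_food = 0.0
    avail_ships = []
    for ship in ships:
        # Lap2(3): independent Laplace(3) noise on each coordinate.
        fuzzed_loc = (ship["loc"][0] + lap(3), ship["loc"][1] + lap(3))
        reach_time = math.dist(fuzzed_loc, disaster["loc"]) / ship["speed"]
        if (reach_time <= deadline_h
                and ship["cargo_type"] == "food"
                and not ship["dangerous_materials"]):
            avail_food += ship["cargo"]
            avail_ships.append((ship["name"], fuzzed_loc))
    avail_food += lap(2)  # Lap(2): noise on the released aggregate
    return avail_food, avail_ships

random.seed(1)  # reproducible demo on made-up data
ships = [{"name": "MS Demo", "loc": (0.0, 0.0), "speed": 30.0,
          "cargo_type": "food", "cargo": 100.0, "dangerous_materials": False}]
food, selected = retrieve_nearby_ships(ships, {"loc": (10.0, 0.0)})
```

Note that both the released aggregate (the food total) and the per-ship locations are noised, so OrganizationY's answer limits what the dispatcher learns about any single ship.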
10. Annotating the model with DP and sensitivity bounds
[Same BPMN model as in slide 9; each dp-flow around the dp-task "Retrieve Nearby Ships with Spare Capacity & Equipment" is annotated with the bound (ε: 0.1, c: 0.2).]
11. Differential Privacy Disclosure (Roles)
[Same annotated BPMN model as in slide 10, with the (ε: 0.1, c: 0.2) bounds on each dp-flow, used to compute the differential-privacy disclosure per role.]
14. Generalized sensitivity
• Generalized distances: any partial order with addition and a least element
• f has sensitivity c if d(f(x), f(x′)) ≤ c · d(x, x′)
• Differential privacy is a specific case of generalized sensitivity
• Generalized sensitivity is composable, i.e. if f has sensitivity c1 and g has sensitivity c2, then g ∘ f has sensitivity c1 · c2
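The composability claim can be illustrated numerically with our own toy example (not the talk's formal statement): for real-valued functions, sensitivity is a Lipschitz constant, and Lipschitz constants multiply under composition.

```python
def f(x):
    return 2 * x              # sensitivity 2: |f(a) - f(b)| <= 2 * |a - b|

def g(y):
    return 3 * y + 1          # sensitivity 3

def h(x):
    return g(f(x))            # composition: sensitivity at most 2 * 3 = 6

# Verify the Lipschitz bound over a grid of integer input pairs.
pairs = [(a, b) for a in range(-20, 21) for b in range(-20, 21)]
bound_holds = all(abs(h(a) - h(b)) <= 6 * abs(a - b) for a, b in pairs)
print(bound_holds)  # True
```

This multiplicative rule is what lets a per-task sensitivity analysis be chained along a whole BPMN process model.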
19. Outlook
• Extend the privacy analyzer to cover a broader class of BPMN process models, e.g. adding conditional branching
• Principles of program analysis for DP, for arbitrary generalized metrics and sensitivities
• Defining a superset of BPMN with ad-hoc constructs to model privacy-related concerns (i.e. PA-BPMN)
• Building the PETs library and extending PA-BPMN to cater for other PETs besides differential privacy