The document describes an ontology-based approach to handling information quality in e-science. It presents an initial quality framework that captures scientists' quality requirements and allows defining domain-specific quality characteristics. It introduces a web service that annotates datasets with quality metrics based on how well their elements conform to relevant ontologies, using transcriptomics as an example domain. The approach aims to make quality definitions reusable and the computation of quality measurements over large datasets cost-effective.
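As a rough illustration of annotating a dataset by how well its elements conform to an ontology, the sketch below scores a dataset as the fraction of its elements found in a set of ontology terms. The `QualityAnnotation` type, the `annotate` function, and the use of a plain fraction as the metric are assumptions made for illustration only; the presentation does not specify its metric or its service interface.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class QualityAnnotation:
    """Hypothetical quality annotation attached to one dataset."""
    dataset_id: str
    conforming: int
    total: int

    @property
    def score(self) -> float:
        # Fraction of elements that matched an ontology term (0.0 if empty).
        return self.conforming / self.total if self.total else 0.0


def annotate(dataset_id: str, elements: List[str],
             ontology_terms: Iterable[str]) -> QualityAnnotation:
    """Count how many elements match an ontology term, ignoring case."""
    known = {term.lower() for term in ontology_terms}
    conforming = sum(1 for element in elements if element.lower() in known)
    return QualityAnnotation(dataset_id, conforming, len(elements))
```

Because the annotation is stored next to the dataset identifier, a score computed once could in principle be reused by later consumers instead of being recomputed.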
Novel Database-Centric Framework for Incremental Information Extraction ijsrd.com
Information extraction (IE) is an active research area that seeks techniques to uncover information from large collections of text. IE is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents. In most cases this activity involves processing human language texts by means of natural language processing (NLP). Recent activities in document processing, such as automatic annotation and content extraction, can be seen as information extraction. Many applications call for methods that automatically extract structured information from unstructured natural language text. Because of the inherent challenges of natural language processing, most existing methods for information extraction from text tend to be domain specific. This project proposes a new paradigm for information extraction. In this extraction framework, the intermediate output of each text processing component is stored, so that only the improved component has to be deployed to the entire corpus. Extraction is then performed on both the previously processed data from the unchanged components and the updated data generated by the improved component. Such incremental extraction can greatly reduce processing time. The framework also includes a mechanism to generate extraction queries from both labeled and unlabeled data. Query generation is critical so that casual users can specify their information needs without learning the query language.
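The incremental idea described above can be sketched as a store that keeps each component's output per document and reruns a component only when its version changes. This is a minimal in-memory sketch, not the framework from the paper: the `IncrementalStore` class, its method names, and the version strings are hypothetical, and it assumes the corpus texts themselves do not change between runs.

```python
from typing import Callable, Dict, Tuple


class IncrementalStore:
    """Keeps per-document output of each component, keyed by component version."""

    def __init__(self) -> None:
        # component name -> (version, function applied to the document text)
        self.components: Dict[str, Tuple[str, Callable[[str], object]]] = {}
        # (doc_id, component name) -> (version that produced it, output)
        self.outputs: Dict[Tuple[str, str], Tuple[str, object]] = {}
        self.runs = 0  # number of component executions, for illustration

    def register(self, name: str, version: str,
                 fn: Callable[[str], object]) -> None:
        """Add a component, or replace it with an improved version."""
        self.components[name] = (version, fn)

    def process(self, corpus: Dict[str, str]) -> None:
        """Run only components whose stored output is missing or outdated."""
        for doc_id, text in corpus.items():
            for name, (version, fn) in self.components.items():
                cached = self.outputs.get((doc_id, name))
                if cached is not None and cached[0] == version:
                    continue  # unchanged component: reuse stored output
                self.outputs[(doc_id, name)] = (version, fn(text))
                self.runs += 1

    def output(self, doc_id: str, name: str) -> object:
        """Return the stored output of one component for one document."""
        return self.outputs[(doc_id, name)][1]
```

In this sketch, registering a new version of one component and calling `process` again touches only that component's outputs; extraction logic would then read both the old and the refreshed outputs through `output`.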
ONTOLOGY VISUALIZATION PROTÉGÉ TOOLS – A REVIEW ijait
Protégé is one of the most popular tools for ontology visualization. The “Protégé” tools are being applied for further development in various disciplines for better understanding of knowledge. These tools commonly use four methods of ontology visualization, namely indented list, node-link and tree, zoomable, and focus+context. The purpose of this work is to study how these four methods are applied in the development of different kinds of Protégé visualization tools, and to categorize their characteristics and features so as to assist in method selection and promote further research in the area of ontology visualization.
SDTM (Study Data Tabulation Model) defines a standard structure for human clinical trial (study) data tabulations and for nonclinical study data tabulations that are to be submitted as part of a product application to a regulatory authority such as the United States Food and Drug Administration (FDA).
Experimental Result Analysis of Text Categorization using Clustering and Clas...ijtsrd
In a world that routinely produces ever more textual data, managing that data is a critical task. Many text analysis methods are available for managing and visualizing such data, but many techniques give lower accuracy because of the ambiguity of natural language. To provide fine-grained analysis, this paper introduces efficient machine learning algorithms for categorizing text data. To improve accuracy, the proposed system uses the Natural Language Toolkit (NLTK) Python library to perform natural language processing. The main aim of the proposed system is to generalize the model for real-time text categorization applications by using efficient text classification and clustering machine learning algorithms, and to find the most efficient and accurate model for the input dataset using performance measures. Patil Kiran Sanajy | Prof. Kurhade N. V. "Experimental Result Analysis of Text Categorization using Clustering and Classification Algorithms" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-4, June 2019, URL: https://www.ijtsrd.com/papers/ijtsrd25077.pdf
Paper URL: https://www.ijtsrd.com/engineering/computer-engineering/25077/experimental-result-analysis-of-text-categorization-using-clustering-and-classification-algorithms/patil-kiran-sanajy
Text mining seeks to discover new, previously unknown, or hidden information by automatically extracting it from various written resources. Applying knowledge discovery methods to unstructured text is known as Knowledge Discovery in Text, text data mining, or simply text mining. Most text mining techniques are founded on the statistical study of a term, either a word or a phrase. Previous methods use a variety of algorithms. For example, the Single-Link algorithm and Self-Organizing Mapping (SOM) introduce an approach for visualizing high-dimensional data and are very useful tools for processing textual data based on the projection method. Genetic and sequential algorithms provide the capability for multiscale representation of datasets and are fast to compute, with less CPU time, based on the Isolet reduced subsets in unsupervised feature selection. We propose the Vector Space Model and a concept-based analysis algorithm, which should improve text clustering quality and may achieve a better clustering result. We believe the proposed algorithm behaves well in terms of robustness and stability with respect to the formation of the neural network.
USING NLP APPROACH FOR ANALYZING CUSTOMER REVIEWS csandit
The Web is considered one of the main sources of customer opinions and reviews, which are represented in two formats: structured data (numeric ratings) and unstructured data (textual comments). Millions of textual comments about goods and services are posted on the web by customers, and thousands more are added every day, which makes it a big challenge to read and understand them and to turn them into useful structured data for customers and decision makers. Sentiment analysis, or opinion mining, is a popular technique for summarizing and analyzing those opinions and reviews. In this paper, we use natural language processing techniques to generate rules that help us understand customer opinions and reviews (textual comments) written in the Arabic language, with the aim of understanding each one and then converting it to structured data. We use adjectives as key points to highlight important information in the text, then work around them to tag the attributes that describe the subject of the reviews, and we associate those attributes with their values (adjectives).
Controlling informative features for improved accuracy and faster predictions...Damian R. Mingle, MBA
Identification of suitable biomarkers for accurate prediction of phenotypic outcomes is a goal for personalized medicine. However, current machine learning approaches are either too complex or perform poorly.
For more information:
http://societyofdatascientists.com/controlling-informative-features-for-improved-accuracy-and-faster-predictions-in-omentum-cancer-models/?src=slideshare
ONTOLOGY-DRIVEN INFORMATION RETRIEVAL FOR HEALTHCARE INFORMATION SYSTEM : ...IJNSA Journal
In health research, one of the major tasks is to retrieve and analyze heterogeneous databases containing a single patient's information gathered from a large volume of data over a long period of time. The main objective of this paper is to present our ontology-based information retrieval approach for clinical information systems. We performed a case study in a real-life hospital setting. The results illustrate the feasibility of the proposed approach, which significantly improved the information retrieval process on a large volume of data collected over a long period, from August 2011 until January 2012.
Ontology Based Approach for Semantic Information Retrieval System IJTET Journal
Abstract: Information retrieval systems play an important role in current search engines, which search on keywords and return an enormous amount of data from which the user cannot pick out the essential and most important information. This limitation may be overcome by a new web architecture known as the semantic web, which replaces keyword-based search with conceptual, or semantic, search. Natural language processing is mostly implemented in QA systems to accept users' questions, and several steps are followed to convert a question into query form so that an exact answer can be retrieved. In conceptual search, the search engine interprets the meaning of the user's query and the relations among the concepts that a document contains with respect to a particular domain, producing specific answers instead of lists of answers. In this paper, we propose an ontology-based semantic information retrieval system built on the Jena semantic web framework, in which the user enters an input query that is parsed by the Stanford Parser, after which a triplet extraction algorithm is applied. For each input query, a SPARQL query is formed and fired on the knowledge base (ontology), which finds the appropriate RDF triples and retrieves the relevant information using the Jena framework.
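The step from an extracted triplet to a SPARQL query can be sketched as plain string construction. This is only an illustration under stated assumptions: the `triple_to_sparql` function, the convention that `None` marks the unknown slot, and the `http://example.org/onto#` namespace are all hypothetical, and the sketch does not call Jena or reproduce the paper's triplet extraction algorithm.

```python
import re
from typing import Optional

# Conservative check so that a value is safe to use as a prefixed local name.
_LOCAL_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def triple_to_sparql(subject: Optional[str], predicate: Optional[str],
                     obj: Optional[str],
                     namespace: str = "http://example.org/onto#") -> str:
    """Build a SELECT query from a triple; None marks a slot to be retrieved."""
    terms = []
    variables = []
    for var, value in (("s", subject), ("p", predicate), ("o", obj)):
        if value is None:
            terms.append("?" + var)
            variables.append("?" + var)
        elif _LOCAL_NAME.match(value):
            terms.append("ex:" + value)
        else:
            raise ValueError("not a simple local name: %r" % value)
    if not variables:
        raise ValueError("at least one slot must be None")
    select = " ".join(variables)
    pattern = " ".join(terms)
    return ("PREFIX ex: <" + namespace + ">\n"
            "SELECT " + select + " WHERE { " + pattern + " . }")
```

The resulting string could then be handed to any SPARQL engine; in the paper's setting that role is played by the Jena framework.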
Filter-Wrapper Approach to Feature Selection Using PSO-GA for Arabic Document...iosrjce
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
A Novel Method for Keyword Retrieval using Weighted Standard Deviation: “D4 A...idescitation
Genetic Algorithm (GA) has been a successful method for extracting keywords. This paper presents a full method by which keywords can be derived from various corpora. We have built equations that exploit the structure of the documents from which the keywords need to be extracted. The procedure is broken into two distinct profiles: one weighs the words in the whole document content, and the other explores the possible occurrences of key terms by using a genetic algorithm. The basic equations of the heuristic mechanism are varied to allow complete exploitation of the document. The Genetic Algorithm and the enhanced standard deviation method are used to their full potential to generate the key terms that describe the given text document. The new technique has enhanced performance and better time complexity.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Text mining is a new and exciting research area that tries to solve the information overload problem by using techniques from machine learning, natural language processing (NLP), data mining, information retrieval (IR), and knowledge management. Text mining involves the pre-processing of document collections, such as information extraction, term extraction, text categorization, and storage of intermediate representations. Techniques such as clustering, distribution analysis, association rules, and visualisation of the results are then used to analyse these intermediate representations.
Paper talk (presented by Prof. Ludaescher), WORKS workshop, 2010 Paolo Missier
Missier, P., Ludascher, B., Bowers, S., Anand, M. K., Altintas, I., Dey, S., et al. (2010). Linking Multiple Workflow Provenance Traces for Interoperable Collaborative Science. Proc. 5th Workshop on Workflows in Support of Large-Scale Science (WORKS).
Your data won’t stay smart forever:exploring the temporal dimension of (big ...Paolo Missier
Much of the knowledge produced through data-intensive computations is liable to decay over time, as the underlying data drifts, and the algorithms, tools, and external data sources used for processing change and evolve. Your genome, for example, does not change over time, but our understanding of it does. How often should we look back at it, in the hope of gaining new insight, e.g. into genetic diseases, and how much does that cost when you scale re-analysis to an entire population?
The "total cost of ownership” of knowledge derived from data (TCO-DK) includes the cost of refreshing the knowledge over time in addition to the initial analysis, but is often not a primary consideration.
The ReComp project aims to provide models, algorithms, and tools to help humans understand TCO-DK, i.e., the nature and impact of changes in data, and assess the cost and benefits of knowledge refresh.
In this talk we try to map the scope of ReComp by giving a number of patterns that cover typical analytics scenarios where re-computation is appropriate. We specifically describe two such scenarios, where we are conducting small-scale, proof-of-concept ReComp experiments to help us sketch the general ReComp architecture. This initial exercise reveals a multiplicity of problems and research challenges, which will inform the rest of the project.
Resource Description Framework Approach to Data Publication and Federation Pistoia Alliance
Bob Stanley, CEO, IO Informatics, explains the utility of RDF as a standard way of defining and redefining data that could be useful in managing life science information.
I gave this talk at the EDBT 2014 conference, which took place in Athens, Greece.
I show how data examples can be used to characterize the behavior of scientific modules. I present a new method that automatically generates the data examples, and show that such data examples help the human user understand the task of the modules, and that they can be used to assist curators in repairing broken workflows (i.e., workflows for which one or more modules are no longer supplied by their providers).
Reference Domain Ontologies and Large Medical Language Models.pptx Chimezie Ogbuji
Large Language Models (LLMs) have exploded into the modern research and development consciousness and triggered an artificial intelligence revolution. They are well-positioned to have a major impact on Medical Informatics. However, much of the data used to train these revolutionary models are general-purpose and, in some cases, synthetically generated from LLMs. Ontologies are a shared and agreed-upon conceptualization of a domain and facilitate computational reasoning. They have become important tools in biomedicine, supporting critical aspects of healthcare and biomedical research, and are integral to science. In this talk, we will delve into ontologies, their representational and reasoning power, and how terminology systems such as SNOMED-CT, an international master terminology providing comprehensive coverage of the entire domain of medicine, can be used with Controlled Natural Languages (CNL) to advance how LLMs are used and trained.
Curation-Friendly Tools for the Scientific Researcher bwestra
Presentation for Online Northwest Conference, in Corvallis, Oregon, February 10, 2012.
Highlights electronic lab notebooks (ELN) and OMERO (Open Microscopy Environment) as two tools that enable researchers to better manage their research data.
IEEE Projects 2012 For Me Cse @ Seabirds ( Trichy, Chennai, Thanjavur, Pudukk...SBGC
ieee projects 2012 for cse, ieee projects 2012, ieee projects for cse, ieee projects for cse 2012, ieee project for cse 2012, ieee projects for cse 2012 titles, ieee projects for cse 2012 free download, ieee mini projects for cse 2012, ieee projects 2012 for cse with abstract, ieee final year projects 2012 for cse, ieee projects titles 2012 for cse, ieee projects titles 2012 for mca, ieee projects titles 2012 for it, ieee projects titles 2012, ieee projects 2012 for it, ieee projects 2012 for mca, ieee projects 2012 for me, ieee projects 2012 for me cse, ieee projects 2012 for me cse with abstract, latest ieee projects 2012 for cse, latest ieee projects 2012 for it, latest ieee projects 2012, ieee projects 2012 in networking, ieee projects 2012 in data mining, ieee 2012 projects on cloud computing, ieee projects mobile computing 2012, ieee projects networking, ieee projects network security, ieee projects 2012 for it with abstract, ieee image processing projects 2012
Similar to Paper presentations: UK e-science AHM meeting, 2005 (20)
Design and Development of a Provenance Capture Platform for Data Science Paolo Missier
A talk given at the DATAPLAT workshop, co-located with the IEEE ICDE conference (May 2024, Utrecht, NL).
Data Provenance for Data Science is our attempt to provide a foundation to add explainability to data-centric AI.
It is a prototype, with lots of work still to do.
Towards explanations for Data-Centric AI using provenance records Paolo Missier
In this presentation, given to graduate students at Universita' RomaTre, Italy, we suggest that concepts well-known in Data Provenance can be exploited to provide explanations in the context of data-centric AI processes. Through use cases (incremental data cleaning, training set pruning), we build up increasingly complex provenance patterns, culminating in an open question:
how to describe "why" a specific data item has been manipulated as part of data processing, when such processing may consist of a complex data transformation algorithm.
Interpretable and robust hospital readmission predictions from Electronic Hea...Paolo Missier
A talk given at the BDA4HM workshop, IEEE BigData conference, Dec. 2023
please see paper here:
https://drive.google.com/file/d/1vN08G0FWxOSH1Yeak5AX6a0sr5-EBbAt/view
Data-centric AI and the convergence of data and model engineering:opportunit...Paolo Missier
A keynote talk given to the IDEAL 2023 conference (Evora, Portugal Nov 23, 2023).
Abstract.
The past few years have seen the emergence of what the AI community calls "Data-centric AI", namely the recognition that some of the limiting factors in AI performance are in fact in the data used for training the models, as much as in the expressiveness and complexity of the models themselves. One analogy is that of a powerful engine that will only run as fast as the quality of the fuel allows. A plethora of recent literature has started the connection between data and models in depth, along with startups that offer "data engineering for AI" services. Some concepts are well-known to the data engineering community, including incremental data cleaning, multi-source integration, or data bias control; others are more specific to AI applications, for instance the realisation that some samples in the training space are "easier to learn from" than others. In this "position talk" I will suggest that, from an infrastructure perspective, there is an opportunity to efficiently support patterns of complex pipelines where data and model improvements are entangled in a series of iterations. I will focus in particular on end-to-end tracking of data and model versions, as a way to support MLDev and MLOps engineers as they navigate through a complex decision space.
Realising the potential of Health Data Science:opportunities and challenges ...Paolo Missier
A guest lecture given to a group of healthcare professionals as part of an Information Management course at Newcastle University, on working with healthcare data to generate disease risk prediction models
A Data-centric perspective on Data-driven healthcare: a short overviewPaolo Missier
a brief intro on the data challenges associated with working with Health Care data, with a few examples, both from literature and our own, of traditional approaches (Latent Class Analysis, Topic Modelling) and a perspective on Language-based modelling for Electronic Health Records (EHR).
probably more references than actual content in here!
Tracking trajectories of multiple long-term conditions using dynamic patient...Paolo Missier
Momentum has been growing into research to better understand the dynamics of multiple long-term conditions-multimorbidity (MLTC-M), defined as the co-occurrence of two or more long-term or chronic conditions within an individual. Several research efforts make use of Electronic Health Records (EHR), which represent patients' medical histories. These range from discovering patterns of multimorbidity, namely by clustering diseases based on their co-occurrence in EHRs, to using EHRs to predict the next disease or other specific outcomes. One problem with the former approach is that it discards important temporal information on the co-occurrence, while the latter requires "big" data volumes that are not always available from routinely collected EHRs, limiting the robustness of the resulting models. In this paper we take an intermediate approach, where initially we use about 143,000 EHRs from UK Biobank to perform time-independent clustering using topic modelling, and Latent Dirichlet Allocation specifically. We then propose a metric to measure how strongly a patient is "attracted" into any given cluster at any point through their medical history. By tracking how such gravitational pull changes over time, we may then be able to narrow the scope for potential interventions and preventative measures to specific clusters, without having to resort to full-fledged predictive modelling. In this preliminary work we show exemplars of these dynamic associations, which suggest that further exploration may lead to On behalf of the AI-MULTIPLY consortium. Funded by NIHR AIM Development grant to AI-MULTIPLY actionable insights into patients' medical trajectories.
Digital biomarkers for preventive personalised healthcarePaolo Missier
A talk given to the Alan Turing Institute, UK, Oct 2021, reporting on the preliminary results and ongoing research in our lab, on self-monitoring using accelerometers for healthcare applications
Digital biomarkers for preventive personalised healthcarePaolo Missier
A talk given to the Alan Turing Institute, UK, Oct 2021, reporting on the preliminary results and ongoing research in our lab, on self-monitoring using accelerometers for healthcare applications
Capturing and querying fine-grained provenance of preprocessing pipelines in ...Paolo Missier
a talk given at the VLDB 2021 conference, August, 2021, presenting our paper:
Capturing and Querying Fine-grained Provenance of Preprocessing Pipelines in Data Science. Chapman, A., Missier, P., Simonelli, G., & Torlone, R. PVLDB, 14(4):507–520, January, 2021.
http://doi.org/10.14778/3436905.3436911
Paper presentations: UK e-science AHM meeting, 2005
1. An Ontology-Based Approach to Handling Information Quality in e-Science. Paolo Missier, Suzanne Embury, Mark Greenwood (School of Computer Science, University of Manchester); Alun Preece, Binling Jin (Department of Computing Science, University of Aberdeen). www.qurator.org. Describing the Quality of Curated e-Science Information Resources.
21. www.qurator.org. Describing the Quality of Curated e-Science Information Resources. Suzanne Embury, Paolo Missier, Mark Greenwood, Andy Brass, Brian Warboys, Alun Preece, Binling Jin, Edoardo Pignotti, Al Brown, David Stead, Dawn Field, Bela Tiwari, Joe Wood. The Qurator project is funded by the EPSRC Programme Fundamental Computer Science for e-Science: GR/S67593 & GR/S67609. Qurator logo by Irene Christensen.