Slides from the Ontology Access Kit (OAK) workshop, https://incatools.github.io/ontology-access-kit/
OAK is a pluralistic Python library for accessing a variety of ontologies, using either the command line or the Python library
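As a quick taste of both modes, here is a minimal sketch (assuming a recent oaklib release, where get_adapter is the entry point; earlier releases used get_implementation_from_shorthand, and the "sqlite:obo:" selector syntax is explained later in the deck):

from oaklib import get_adapter

adapter = get_adapter("sqlite:obo:envo")  # fetches and caches a prebuilt ENVO SQLite file
for curie in adapter.basic_search("forest"):
    print(curie, adapter.label(curie))

The equivalent lookup from the command line: runoak -i sqlite:obo:envo search forest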
2. Agenda
Part 1 (first hour)
● Background (these slides):
○ Motivation & Background
○ Key Concepts
● Demo/Tutorial
○ Installation
○ Command Line Usage
● Example apps:
○ Mapping Walker
○ …
Part 2 (second hour)
● Roadmap and milestones
https://github.com/INCATools/ontology-access-kit/milestones
● Using SQLite
● Code walkthrough
○ LexMatch
○ SemSim
● Design Decisions
○ Nomenclature
○ Architecture
3. Why would I want a generic ontology library?
● To build data infrastructure
○ Data repositories
○ Knowledge bases
● To clean and analyze data
○ Data annotation and alignment
○ Data interpretation and discovery
■ Gene set enrichment and pathway analysis
■ Semantic similarity and knowledge base embedding
■ Rolling up data
● To explore and build ontologies
○ Visualization, lookup, mapping, quality control
4. Why would I want ANOTHER generic ontology library?
“I already use X, it’s great!”
I assume if you’re here, then X isn’t a great fit for all your problems
5. Methods of ontology access (i.e. computational use of ontologies)
External API or query server
REST-ish API
● BioPortal / OntoPortal
● OLS
Query Interface (SPARQL)
● Ubergraph
● Ontobee
6. Methods of ontology access (i.e. computational use of ontologies)
External API or query server
REST-ish API
● BioPortal / OntoPortal
● OLS
Query Interface (SPARQL)
● Ubergraph
● Ontobee
Local File
● RDF/OWL
● OBO Format
● OBO JSON
Libraries
● Pronto/fastobo
● OWLAPI
● FunOWL
● OwlReady
● Obonet
● Or simply: Curl/requests
7. Advantages and disadvantages of different methods
External API or query server
Advantages:
● No local download necessary
● Minimal local compute
Disadvantages:
● Doesn’t have my ontology/version
● Doesn’t do the thing I need
○ Many operations are more suited to in-memory processing
○ Encourages the many-high-latency-calls antipattern
● Occasional downtime
[counterpoint: run service locally]
8. Advantages and disadvantages of different methods
External API or query server
Advantages:
● No local download necessary
● Minimal local compute
Disadvantages:
● Doesn’t have my ontology/version
● Doesn’t do the thing I need
○ Many operations are more suited to in-memory processing
○ Encourages the many-high-latency-calls antipattern
● Occasional downtime
[counterpoint: run service locally]
Local File
Advantages
● Control over ontology/version
● Efficient operations (e.g. graph/recursive)
Disadvantages
● Doesn’t scale for long-tail sized ontologies
○ PRO, CHEBI, NCBITaxon
Increased effort / increased control
9. The format/datamodel landscape
OWL, RDF
● Great for communicating to OWL reasoners
● OWL and RDF are two datamodels, multiple formats…
● Wrong abstraction for many problems
○ Triples != Edges
○ Axioms != Edges
○ Most code using OWL written by non-OWL gurus is dangerously wrong
● Poor historic support outside JVM
● Poor scaling for long-tail ontologies
○ Even mid-size ontologies like Uberon are slow to parse with rdflib
https://www.w3.org/TR/owl2-primer
Use of OWL within the Gene Ontology
Christopher J Mungall, Heiko Dietze, David Osumi-Sutherland (OWLED)
doi: https://doi.org/10.1101/010090
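To make "Triples != Edges" concrete, here is a small sketch using rdflib (CURIEs invented for illustration): one conceptual edge, finger part_of some hand, serializes to four RDF triples.

from rdflib import Graph

ttl = """
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex: <https://example.org/> .
ex:finger rdfs:subClassOf [
    a owl:Restriction ;
    owl:onProperty ex:part_of ;
    owl:someValuesFrom ex:hand
] .
"""
g = Graph()
g.parse(data=ttl, format="turtle")
print(len(g))  # -> 4: a single conceptual part_of edge needs a blank-node restriction plus three supporting triples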
10. The format/datamodel landscape
OBO Format
● Poor for long-tail of expressivity
○ Good enough for 95% of purposes
○ Many rarely used OWL constructs can’t be expressed
○ Poor internationalization
● High hidden legacy cost of some design decisions
○ E.g. identifiers
● Parsing
○ Easy to do quick hacky parsers
○ Surprisingly hard to write a robust parser
● Not used outside biology
● Impossible to wean bioinformaticians off it
https://owlcollab.github.io/oboformat/doc/obo-syntax.html
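For flavor, a minimal OBO stanza (IDs invented for illustration); the line-oriented syntax is what makes quick hacky parsers tempting, while escaping rules, trailing qualifiers, and header clauses are what make robust parsing surprisingly hard:

[Term]
id: EX:0000002
name: finger
is_a: EX:0000003 ! digit
relationship: part_of EX:0000001 ! hand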
11. The format/datamodel landscape
OBO JSON
● Designed for the long-tail
○ Core structures intended to serve 95% of purposes in an easy fashion
○ Additional constructs can be added for remaining 5%
● Low uptake
https://github.com/geneontology/obographs
https://douroucouli.wordpress.com/2016/10/04/a-developer-friendly-json-exchange-format-for-ontologies
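The core datamodel is small enough to sketch inline as a Python dict (a hand-written example following the structures in the obographs repository; IDs invented):

minimal_obograph = {
    "graphs": [{
        "id": "https://example.org/ex.json",
        "nodes": [
            {"id": "EX:0000001", "lbl": "hand"},
            {"id": "EX:0000002", "lbl": "finger"},
        ],
        "edges": [
            # BFO:0000050 is the standard part-of relation
            {"sub": "EX:0000002", "pred": "BFO:0000050", "obj": "EX:0000001"},
        ],
    }]
}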
13. Existing Ontology Libraries
● There is a multitude of ontology libraries
● For brevity we will highlight a few key ones
○ Emphasis on Python
14. OWLAPI (Java)
● Complete support for all OWL specifications
○ This is highly non-trivial!!!
● Only way to directly communicate with multiple reasoners
● Has been indispensable for ontology development cycle
○ ROBOT, Protege
● Less widely used for ontology application cycle
● Challenges:
○ JVM
○ Long-tail of ontology sizes: local file / memory bound, no DBMS access
○ OWL is not the right level of abstraction for many problems
https://github.com/owlcs/owlapi
15. FunOWL (Python)
● Partial support for OWL specifications
○ Read: Functional
○ Write: Functional, RDF
○ Not intended for communication with reasoners
● Less well supported
○ But does what is in scope quite well
https://github.com/hsolbrig/funowl
16. Horned OWL (Rust)
● Rust OWL library
○ Implements all of OWL2
○ Faster than OWL API
● Experimental python bindings
○ https://github.com/jannahastings/py-horned-owl
https://github.com/phillord/horned-owl
17. Pronto (Python, with Rust bindings)
● Support for OBO-Format and OBO-Format OWL profile
● Pros: Python, Fast (fastobo), and robust
● Cons (all related to coupling with OBO-Format):
○ Most ontologies don’t conform to its strict profile (fixable)
○ long-tail of expressivity (hard to fix)
https://github.com/althonos/pronto
https://github.com/fastobo/fastobo
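A minimal pronto sketch (assuming pronto 2.x; from_obo_library fetches from the OBO Library, and ms.obo / MS:1000031 is the example used in pronto's own documentation):

import pronto

ont = pronto.Ontology.from_obo_library("ms.obo")
term = ont["MS:1000031"]
print(term.id, term.name)
# Immediate subclasses (includes the term itself by default)
for sub in term.subclasses(distance=1):
    print("  ", sub.id, sub.name)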
20. owlery
● Web API for OWL
● Supports reasoning
● Easy to stand up
https://github.com/phenoscape/owlery
21. BioPortal
● Comprehensive and broad
○ Nearly 1k ontologies
○ Up-to-date
● Robust, well-documented APIs
○ Term access
○ Automated Mappings
○ Annotator
● API key required
● Broader than biomedicine: OntoPortal instances
○ MatPortal
○ AgroPortal
○ EcoPortal
https://data.bioontology.org/documentation
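A sketch of the term-access pattern via requests (assuming the documented /search endpoint; you must supply your own API key):

import requests

API_KEY = "..."  # register at BioPortal to obtain a key

resp = requests.get(
    "https://data.bioontology.org/search",
    params={"q": "melanoma", "apikey": API_KEY},
)
for hit in resp.json()["collection"]:
    print(hit["@id"], hit.get("prefLabel"))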
22. OLS: Ontology Lookup Service
● Less comprehensive but curated for quality
○ 275 ontologies
○ Up-to-date
● Robust, well-documented APIs
○ Term access
○ Curated Mappings (OxO)
○ Annotator (ZOOMA)
● Many local Docker installations
https://www.ebi.ac.uk/ols/docs/api
23. OntoBee SPARQL endpoint
● OBO plus others
● SPARQL endpoint
○ More expressive than an API
○ But sometimes more work to express yourself!
● Main drawback:
○ RDF/SPARQL not the right level of abstraction for many tasks
■ E.g. impossible to get part-ancestors
https://www.ontobee.org/sparql
25. Ubergraph SPARQL endpoint uses Relation Graph
● Edges (direct and indirect) as simple triples
https://github.com/balhoff/relation-graph
[Figure: example query returning parts of organs that are parts of abdomens]
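A sketch of what this buys you (assuming the public endpoint and the redundant named graph documented in the Ubergraph/relation-graph READMEs; UBERON:0002107 is the liver, BFO:0000050 is part-of): entailed ancestors become a single triple pattern, with no property paths or query-time reasoning.

from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://ubergraph.apps.renci.org/sparql")
sparql.setQuery("""
SELECT ?ancestor WHERE {
  GRAPH <http://reasoner.renci.org/redundant> {
    # entailed (direct and indirect) part-of edges, pre-materialized as plain triples
    <http://purl.obolibrary.org/obo/UBERON_0002107>
        <http://purl.obolibrary.org/obo/BFO_0000050> ?ancestor .
  }
}
""")
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["ancestor"]["value"])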
26. Relational Database: Semantic SQL
● Uses rdftab (Rust) for fast loading into triple tables
● Uses relation-graph (Scala) for entailed edge tables
● Uses SQL views to provide higher level constructs (OWL and RG)
● Python ORM for those that like that sort of thing
https://github.com/INCATools/semantic-sql/
27. Relational Database: Semantic SQL
● Uses rdftab (Rust) for fast loading into triple tables
● Uses relation-graph (Scala) for entailed edge tables
● Uses SQL views to provide higher level constructs (OWL and RG)
● Python ORM for those that like that sort of thing
https://github.com/INCATools/semantic-sql/
[Figure: example query finding terms with mappings to the Allen Brain Atlas that are not part of the brain]
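Because it is plain SQLite, no special client is needed; a sketch of a graph query over the entailed_edge table (assuming a prebuilt uberon.db, obtainable as described later in the deck, and the standard semantic-sql column names):

import sqlite3

con = sqlite3.connect("uberon.db")
# All entailed part-of ancestors of the liver (UBERON:0002107);
# BFO:0000050 is the part-of relation
rows = con.execute(
    "SELECT object FROM entailed_edge WHERE subject = ? AND predicate = ?",
    ("UBERON:0002107", "BFO:0000050"),
)
for (ancestor,) in rows:
    print(ancestor)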
30. Just tell me which one to use to do X
Historically we have said: IT DEPENDS ON X (and Y and Z)
● End result: multiple square pegs in round holes
31. OAK: The People’s Pluralistic Python Ontology Library
Support multiple conceptualizations of what an ontology is:
● A relational graph for data analysis
● A collection of logical statements
● A vocabulary for text mining
● A collection of concepts and mappings for tagging data
● Terminological units plus rich metadata (e.g. conforming to the OMO datamodel)
These are all to some extent interlocking
32. OAK: The People’s Pluralistic Python Ontology Library
Support multiple conceptualizations of what an ontology is:
● A relational graph for data analysis
● A collection of logical statements
● A vocabulary for text mining
● A collection of concepts and mappings for tagging data
● Terminological units plus rich metadata (e.g. conforming to the OMO datamodel)
These are all to some extent interlocking
Support multiple modes of access
● Local files
○ obo, json, rdf, owl
● Remote API services
○ Ontology portals
○ Large scale annotation
● Local or remote database
○ SPARQL
○ SQL
33. OAK: The People’s Pluralistic Python Ontology Library
Support multiple conceptualizations of what an ontology is:
● A relational graph for data analysis
● A collection of logical statements
● A vocabulary for text mining
● A collection of concepts and mappings for tagging data
● Terminological units plus rich metadata (e.g. conforming to the OMO datamodel)
These are all to some extent interlocking
Support multiple modes of access
● Local files
○ obo, json, rdf, owl
● Remote API services
○ Ontology portals
○ Large scale annotation
● Local or remote database
○ SPARQL
○ SQL
Mid-term goal: Speed
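In code, the pluralism amounts to swapping a selector string (a sketch assuming a recent oaklib; the bioportal: selector needs an API key configured, and UBERON:0000955 is the brain):

from oaklib import get_adapter

for selector in ["sqlite:obo:uberon", "ubergraph:", "bioportal:"]:
    adapter = get_adapter(selector)  # same interface, different backend
    print(selector, "->", adapter.label("UBERON:0000955"))  # -> brain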
47. Demo / Tutorial
Part 1: Command Line
● Basic lookup/search
● Different implementations
○ OBO
○ Ubergraph
○ OWL
○ BioPortal
● Using SQLite
● OboGraphViz
Part 2: Python usage
https://incatools.github.io/ontology-access-kit/intro
48. [Switch to demo / tutorial here]
https://incatools.github.io/ontology-access-kit/intro
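For readers following along from the slides alone, this is the flavor of the commands demoed (illustrative; see the intro page above for exact syntax):

runoak -i sqlite:obo:obi search assay
runoak -i ubergraph: ancestors -p i,p UBERON:0002107
runoak -i sqlite:obo:go viz GO:0005634 -o nucleus.png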
49. Using SQLite
Why use the SQLite backend?
● No long-tail expressivity loss
● It’s fast
○ No parse penalty: once downloaded, access is instantaneous
○ entailed_edge pre-loaded, for fast graph queries
○ SQLite is in general fast
○ Further optimizations are easy, e.g. concretizing (materializing) views
51. Behind the scenes
$ chebi -vv ancestors -p i CHEBI:15356
DEBUG:root:Ancestors query:
SELECT entailed_edge.subject AS entailed_edge_subject, entailed_edge.predicate AS entailed_edge_predicate,
entailed_edge.object AS entailed_edge_object
FROM entailed_edge
WHERE entailed_edge.subject IN (__[POSTCOMPILE_subject_1]) AND entailed_edge.predicate IN (__[POSTCOMPILE_predicate_1])
52. How to use SQLite
Method 1: Use ready-made SQLite files for OBO ontologies
● Protocol A
○ Download from S3
■ https://s3.amazonaws.com/bbop-sqlite/hp.db
● Protocol B
○ Download using semsql
■ semsql download obi -o obi.db
● Protocol C
○ Use the obo sqlite selector
■ runoak -i sqlite:obo:obi COMMAND
Method 2: build your own
53. How to use SQLite
Method 1: Use ready-made files
Method 2: build your own
● Protocol A:
○ Install rdftab
○ Install relation-graph
○ Prepare an RDF/XML (pre-merged) file
■ E.g. obi.owl
○ semsql make obi.db
● Protocol B:
○ Use the ODK Docker image
○ semsql make --docker obi.db
■ Requires ODK v1.3.1
54. Plans to make this easier
Rdftab.rs
● Job: load “statements” table from RDF
● Currently only accepts RDF/XML
Make it easier to install:
● In Rust, so easy to bind via PyO3
● OR could write in Python
○ Don’t need the “stanza” functionality
Relation-graph
● Job: load the entailed_edge table using a reasoner
○ Only requires a tiny OWL profile (SubC, Some, Transitivity, Property Chain)
● Written in Scala
○ Possible Soufflé rewrite
Can we make it part of the Python install?
● Proposed path:
○ Write the reasoner using https://github.com/ekzhang/crepe
■ (~10 lines of Datalog)
○ Provide PyO3 bindings
https://github.com/INCATools/semantic-sql/issues/41