Tutorial given at European Open Science Cloud data interoperability workshop.
The purpose was to evaluate whether schema.org markup could be used in a variety of scientific disciplines.
1) JSON-LD has seen widespread adoption: over 2 million HTML pages include it, and it is a required format for Linked Data platforms.
2) A primary goal of JSON-LD was to let JSON developers work with it just as they would with plain JSON, while also providing mechanisms to reshape JSON-LD documents into a deterministic structure for processing.
3) JSON-LD 1.1 includes additional features like using objects to index into collections, scoped contexts, and framing capabilities.
Published on Feb 29, 2016 by PMR
An overview of Text and Data Mining (ContentMining) including live demonstrations. The fundamentals (discover, scrape, normalize, facet/index, analyze, publish) are exemplified using the recent Zika outbreak. Mining covers textual and non-textual content, and examples from chemistry and phylogenetic trees are given.
Automatic Extraction of Science and Medicine from the scholarly literature - petermurrayrust
Many scientists have to extract many facts out of the scholarly literature, either to evaluate other work or to build useful collections of facts. This talk shows the approach, especially for systematic reviews of animal or clinical trials.
Research data catalogues and data interoperability in life sciences - BlueBRIDGE
Presentation by Rafael C Jimenez, ELIXIR CTO
This presentation gives an overview of data catalogues in the life sciences and describes different approaches to data interoperability and federation. It also explains the relationships and differences among ELIXIR registries, data repositories, data archives and knowledge-bases. The presentation introduces a few ideas for discussion about how to facilitate data interoperability in the European Open Science Cloud.
The document outlines a workshop on using mining techniques to help with systematic reviews. It discusses demonstrating dictionaries and having participants share what they would like a system to do. Attendees will see their dictionaries in action and learn about advanced mining techniques like chemistry and diagram mining. Any early adopter can obtain the open source software to run mining at home on various resource types. All material will be made available under a Creative Commons license.
This document provides an overview of database resources for molecular and cell biology students at UCT Libraries. It describes how to access databases through the library website or subject guides. Specific databases covered include Scopus, Henry Stewart Talks, and Knovel. Features of each database are demonstrated, such as searching, filtering results, and exporting citations. Tips are provided for advanced searching and assistance is available from the librarian.
Hydra: A Vocabulary for Hypermedia-Driven Web APIs - Markus Lanthaler
Presentation of the paper "Hydra: A Vocabulary for Hypermedia-Driven Web APIs" at the 6th Workshop on Linked Data on the Web (LDOW2013) at the WWW2013 in Rio de Janeiro, Brazil
Liberating facts from the scientific literature - Jisc Digifest 2016 - TheContentMine
Published on Mar 4, 2016 by PMR
Text and data mining (TDM) techniques can be applied to a wide range of materials, from published research papers, books and theses, to cultural heritage materials, digitised collections, administrative and management reports and documentation, etc. Use cases include academic research, resource discovery and business intelligence.
This workshop will show the value and benefits of TDM techniques and demonstrate how ContentMine aims to liberate 100,000,000 facts from the scientific literature, and ContentMine will provide a hands on demo on a topical and accessible scientific/medical subject.
Automatic Extraction of Knowledge from the Literature - petermurrayrust
ContentMine tools (and the Harvest alliance) can be used to search the literature for knowledge, especially in biomedicine. All tools are Open, and shortly we shall be indexing the complete daily scholarly literature.
Presentation of the paper "On Using JSON-LD to Create Evolvable RESTful Services" at the 3rd International Workshop on RESTful Design (WS-REST 2012) at WWW2012 in Lyon, France
JSON-LD provides a standard representation for expressing linked data using JSON objects. It allows objects to represent entities with keys as properties, and arrays to express property values. Contexts define terms and associate properties and values with IRIs. JSON-LD brings the benefits of linked data to web applications by mapping JSON structures to RDF triples.
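The key-to-IRI mapping described above can be sketched in code. This is a toy, one-level illustration of what a context does, not the full JSON-LD expansion algorithm; the identifiers and IRIs are illustrative:

```python
import json

# A minimal JSON-LD document: the @context maps plain JSON keys to IRIs.
doc = {
    "@context": {
        "name": "http://schema.org/name",
        "homepage": {"@id": "http://schema.org/url", "@type": "@id"},
    },
    "@id": "http://example.org/people/alice",  # illustrative identifier
    "name": "Alice",
    "homepage": "http://example.org/alice",
}

def expand_keys(document):
    """Toy one-level expansion: rewrite each term to the IRI its context defines."""
    ctx = document.get("@context", {})
    expanded = {}
    for key, value in document.items():
        if key == "@context":
            continue
        mapping = ctx.get(key)
        if isinstance(mapping, str):
            expanded[mapping] = value
        elif isinstance(mapping, dict):
            expanded[mapping["@id"]] = value
        else:
            expanded[key] = value  # keywords like @id pass through unchanged
    return expanded

print(json.dumps(expand_keys(doc), indent=2))
```

A real processor (such as the pyld library) additionally handles nested nodes, typed values and RDF conversion, but the core idea is exactly this substitution.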
Digital Scholarship: Enlightenment or Devastated Landscape? - TheContentMine
Published on Dec 17, 2015 by PMR
Every year USD 500 billion of public funding is spent on research, but much of this lies hidden in papers that are never read. I describe how machines can help us to read the literature. However, there is massive opposition from publishers, who are trying to prevent open scholarship and who build walled gardens that they control.
This document provides an introduction to using ElasticSearch. It discusses installing ElasticSearch and making API calls. It demonstrates indexing employee documents with fields like name, age, interests. It shows how to search for documents by field values or full text, do phrase searches, and highlight search terms. It also introduces analytics capabilities like aggregations to analyze field values.
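The kinds of searches the introduction describes are all expressed as JSON request bodies sent to a search endpoint. The sketch below only builds those bodies as Python dicts; the index, type and field names are illustrative, and against a live cluster each would be POSTed to a path such as `/megacorp/employee/_search`:

```python
import json

# Full-text search on a single field (matches documents containing either word).
full_text = {"query": {"match": {"about": "rock climbing"}}}

# Phrase search: the words must appear together, in order.
phrase = {"query": {"match_phrase": {"about": "rock climbing"}}}

# Same phrase search, asking Elasticsearch to highlight the matched terms.
highlighted = {
    "query": {"match_phrase": {"about": "rock climbing"}},
    "highlight": {"fields": {"about": {}}},
}

# Here we only print the JSON that would be sent over HTTP.
print(json.dumps(highlighted, indent=2))
```

The difference between `match` and `match_phrase` is the core distinction between the "full text" and "phrase" searches mentioned above.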
dkNET is a portal that provides a single point of entry for discovering NIDDK-relevant research resources and data to help researchers make efficient decisions. It allows searching across community databases, literature, and over 200 biomedical databases. Key features include personalized search capabilities, the ability to save searches and set up alerts, and creating collections of search results. The portal aims to interconnect research communities by providing access to large pools of interconnected data and resources.
Cassavabase is a database for cassava and related crops that allows users to search accessions and trial plots, manage breeding programs and trials, analyze genomic selection tools, and visualize genomic data. It contains data for over 80,000 accessions, 300,000 trial plots, 8 million phenotypic observations, and 2 billion genotypic datapoints. Users must create an account to access full database privileges like submitting trials or phenotypic data. The database is open to all cassava breeders and part of a collaborative effort for genomic selection in African cassava breeding programs.
Hypermedia In Practice - FamilySearch Developers Conference 2014 - Ryan Heaton
The document discusses how hypermedia principles can make web APIs more robust and adaptable to changes. It provides examples of how resources and their representations may change, such as property order, whitespace, namespaces, caching policies, available resources, domain names, resource locations, and inclusion of subresources. The document advocates using hypermedia to link related resources and handle changes gracefully.
Automatic Extraction of Knowledge from Biomedical literature TheContentMine
Published on Mar 16, 2016 by PMR
A plenary lecture to the Cochrane Collaboration in Birmingham, on the value of automatically extracting knowledge. Covers the Why, How, What, and Who, discusses open problems, and invites collaboration.
The Nuclear Receptor Signaling Atlas (NURSA) is partnering with dkNET (NIDDK Information Network) to host a dataset challenge, and we invite you to join! Everyone is talking about Big Data. How can we ensure that the impact of individual scientists, working on a myriad of small and focused studies that discover and probe new phenomena, is not lost in the Big Data world? In fact, there is more than one way to generate big data, and we would like your help in creating and expanding "big data" for NIDDK! In this 30-minute webinar, the dkNET team will give an overview of the challenge task, show how to use dkNET to find research resources, and share top tips!
This document discusses the importance of making data "sticky" by using shared identifiers that create links between data sources. It notes that identifiers should be globally unique, resolvable by both humans and machines, and widely used. Examples of shared identifiers that enable discovery and metrics include DOIs, specimen identifiers from databases like GBIF, and citations. The value, it argues, comes from the links between nodes, not just the nodes themselves.
This document provides an overview of key concepts for working with Elasticsearch including:
- What documents and their metadata fields like _index, _type, and _id represent
- How to index, retrieve, update, delete and check for existence of documents
- Using versions for optimistic concurrency control
- Partially updating documents with scripts
- Retrieving multiple documents with _mget
- Reducing overhead with bulk indexing operations
- Setting default types to reduce repetition
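The "reducing overhead with bulk indexing" point in the list above refers to the `_bulk` endpoint, whose body is newline-delimited JSON: an action line followed by a source line per document. A minimal sketch of building such a payload (index, type and document values are illustrative; `_type` appears because the summary above uses it, as in pre-7.x Elasticsearch):

```python
import json

def bulk_payload(index, doc_type, docs):
    """Build the newline-delimited body for the _bulk endpoint:
    one action line, then one source line, per document."""
    lines = []
    for _id, source in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type, "_id": _id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline

body = bulk_payload("website", "blog", [
    (1, {"title": "First post"}),
    (2, {"title": "Second post"}),
])
print(body)
```

Sending one such body replaces many individual index requests, which is where the overhead saving comes from.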
Automatic Extraction of Knowledge from Biomedical literature - petermurrayrust
A plenary lecture to the Cochrane Collaboration in Birmingham, on the value of automatically extracting knowledge. Covers the Why, How, What, and Who, discusses open problems, and invites collaboration.
The increase in online and web-only publishing has made it easier for organisations to create and distribute grey literature. Use these tips and tricks to track it down.
Example-driven Web API Specification Discovery - Javier Canovas
Slides of my presentation at European Conference on Modelling Foundations and Applications (ECMFA'17). To be presented during the session on Thursday 16:00-17:30
OData: Universal Data Solvent or Clunky Enterprise Goo? (GlueCon 2015) - Pat Patterson
Why would anyone but the most pedestrian enterprise developer be interested in a data access protocol originally designed by Microsoft, implemented in XML and handed to OASIS for standardization? The Open Data Protocol, or OData for short, has evolved into a clean, RESTful interface for CRUD operations against data services. Alongside the usual enterprise suspects such as Microsoft, Salesforce and IBM, OData has been adopted by government and non-profit agencies to open up their data and make it accessible to the public. For developers wanting to consume data, or create their own OData services, there's no shortage of open source options, from Apache Olingo in Java to node-odata and ODataCpp. Whether you're accessing customer orders in SAP or the Whitehouse visitor book, you're going to need some OData smarts.
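The "clean, RESTful interface" in practice means system query options appended to a resource URL. A small sketch of composing such a request with the standard library; the service root and field names are hypothetical:

```python
from urllib.parse import urlencode

# Hypothetical OData service root; system query options are prefixed with '$'.
base = "https://services.example.com/odata/Orders"

params = {
    "$filter": "Status eq 'Shipped' and Total gt 100",  # server-side filtering
    "$select": "OrderID,Customer,Total",                # project only these fields
    "$orderby": "Total desc",                           # sort on the server
    "$top": "10",                                       # page size
}
url = base + "?" + urlencode(params)  # urlencode percent-escapes '$' and spaces
print(url)
```

Fetching `url` with any HTTP client would return the top ten shipped orders over 100, already filtered, projected and sorted by the service, which is the CRUD-over-HTTP convenience the talk describes.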
CEDAR is a metadata management tool that lets users define metadata templates using a well-described yet flexible metadata format. CEDAR then presents the forms represented by those templates to other users to fill out. CEDAR offers semantic precision (with support from the BioPortal ontology repository), metadata completion assistance, intelligent recommendations, support for JSON-LD and RDF metadata export, and an easy-to-use user interface.
Rafael C Jimenez presents the Omics Discovery Index | OSFair2017 Workshop
Workshop title: How FAIR friendly is your data catalogue?
Workshop overview:
This workshop will build upon the work planned by the EOSCpilot data interoperability task and the BlueBridge workshop held on April 3 at the RDA meeting. We will investigate common mechanisms for interoperation of data catalogues that preserve established community standards, norms and resources, while simplifying the process of being/becoming FAIR. Can we have a simple interoperability architecture based on a common set of metadata types? What are the minimum metadata requirements to expose FAIR data to EOSC services and EOSC users?
DAY 3 - PARALLEL SESSION 6 & 7
The core Search frameworks in Liferay 7 have been significantly retooled to benefit not only from Liferay's new modular architecture, but also from one of the most innovative players in the market: Elasticsearch, which replaces Lucene as the default search engine in Portal. This session will cover topics like clustering and scalability, unveil improvements (both Elasticsearch and Solr) like aggregations, filters, geolocation, "more like this" and other new query types, and also hot new features for the Enterprise like out-of-the-box Marvel cluster monitoring and Shield security.
André "Arbo" Oliveira joined Liferay in early 2014 as a senior engineer and leads the Search Infrastructure team. He's been writing code for a living for 22 years, 14 of them as a Java developer and architect. Ever since discovering Elasticsearch, he's vowed never to write another SQL WHERE clause again.
Presented on 10/11/12 at the Boston Elasticsearch meetup held at the Microsoft New England Research & Development Center. This talk gave a very high-level overview of Elasticsearch to newcomers and explained why ES is a good fit for Traackr's use case.
Elasticsearch is presented as an expert in real-time search, aggregation, and analytics. The document outlines Elasticsearch concepts like indexing, mapping, analysis, and the query DSL. Examples are provided for real-time search queries, aggregations including terms, date histograms, and geo distance. Lessons learned from using Elasticsearch at LARC are also discussed.
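The three aggregation kinds mentioned (terms, date histogram, geo distance) are all declared under one `aggs` key in the request body. This sketch only shows the shape of such a request; field names and the origin point are illustrative:

```python
import json

# An aggregations-only request combining the three kinds discussed above.
aggs_body = {
    "size": 0,  # return aggregation results only, no document hits
    "aggs": {
        "by_tag": {"terms": {"field": "tags"}},
        "per_month": {"date_histogram": {"field": "created", "interval": "month"}},
        "near_office": {
            "geo_distance": {
                "field": "location",
                "origin": "52.37,4.89",  # illustrative lat,lon
                "ranges": [{"to": 1}, {"from": 1, "to": 5}],
            }
        },
    },
}
print(json.dumps(aggs_body, indent=2))
```

Setting `size` to 0 is the usual way to run pure analytics queries like these without paying for hit retrieval.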
The document discusses JSON, SQLite, and persistence. It provides an overview of JSON and SQLite, including their advantages and disadvantages. It then describes how to use SQLite as a persistence layer for storing and retrieving JSON data using a JSON Persistence Framework. The framework provides a zero configuration approach to persisting JSON objects to an SQLite database without any custom mappings or modifications to the domain objects.
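The zero-configuration idea, persisting whole JSON objects without per-field mappings, can be reduced to a single key/body table. This is a minimal sketch of the approach, not the framework the document describes:

```python
import json
import sqlite3

# One table holds every object: an id column plus the serialized JSON body.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (id TEXT PRIMARY KEY, body TEXT NOT NULL)")

def save(obj_id, obj):
    """Persist any JSON-serializable object; no schema or mapping required."""
    conn.execute(
        "INSERT OR REPLACE INTO objects (id, body) VALUES (?, ?)",
        (obj_id, json.dumps(obj)),
    )

def load(obj_id):
    """Return the stored object, or None if the id is unknown."""
    row = conn.execute("SELECT body FROM objects WHERE id = ?", (obj_id,)).fetchone()
    return json.loads(row[0]) if row else None

save("42", {"name": "Ada", "tags": ["math", "computing"]})
print(load("42"))
```

The trade-off is the one the document weighs: no per-field mapping to maintain, but also no SQL queries over individual JSON fields without extra indexing.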
Demo presentation given at the Semantic Web Applications and Tools for Life Science (SWAT4LS) 2014 meeting in Berlin, Dec 10, 2014. http://www.swat4ls.org/workshops/berlin2014/scientific-programme/
This document discusses consuming APIs with Python. It introduces the requests library for making HTTP requests to APIs from Python. It shows examples of making GET, POST, PUT, and DELETE requests to a sample TODO list API to perform operations like retrieving, adding, updating, and deleting tasks. It then builds out these requests into reusable functions within a todo.py library module to easily interact with the API. Finally, it demonstrates using the library functions to add a task and retrieve all tasks from the API.
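The talk uses the requests library; the stdlib-only sketch below builds the same GET and POST calls with `urllib.request` against a hypothetical TODO API, separating request construction from sending so the functions can be exercised offline:

```python
import json
import urllib.request

BASE = "https://api.example.com/todos"  # hypothetical TODO API base URL

def build_request(method, path="", payload=None):
    """Build (but do not send) the HTTP request for one API operation."""
    data = json.dumps(payload).encode() if payload is not None else None
    req = urllib.request.Request(BASE + path, data=data, method=method)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req

def add_task(title):
    return build_request("POST", payload={"title": title, "done": False})

def get_tasks():
    return build_request("GET")

# Sending would be: urllib.request.urlopen(add_task("write slides"))
req = add_task("write slides")
print(req.get_method(), req.full_url)
```

Wrapping each operation in a named function is the same `todo.py` library pattern the document describes, just with a different HTTP client underneath.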
This document summarizes how Elasticsearch can be used for scaling analytics applications. Elasticsearch is an open source, distributed search and analytics engine that can index large volumes of data. It automatically shards and replicates data across nodes for redundancy and high availability. Analytics queries like date histograms, statistical facets, and geospatial searches can retrieve insightful results from large datasets very quickly. The document provides an example of using Elasticsearch to perform sentiment analysis, location tagging, and analytical queries on over 100 million social media documents.
This document discusses using Elasticsearch for SQL users. It covers search queries, data modeling, and architecture approaches. The agenda includes search queries, data modeling, and architecture. A live demo shows searching a single field, multiple fields, and phrases. Data modeling discusses analyzing or not analyzing fields. Relationships can be modeled through application joins, data denormalization, nested objects, or parent-child documents. Architecture approaches include using triggers, asynchronous replication, and forked writes from applications with or without Logstash.
JSON Schema is an extremely powerful, yet easily approachable, tool for describing data structures. In fact, OpenAPI has embraced JSON Schema and currently uses it for describing the inputs and outputs of your APIs. JSON Schema is a technology that is often misunderstood and often used in ways that leave people scratching their heads when it does not work the way they expected. This talk will introduce JSON Schema from the ground up, complete with gotchas and best practices. In the end, the hope is that the attendee will see the value of JSON Schema and understand it well enough to use in their OpenAPI documents and even their own applications.
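To make the idea concrete, here is a hand-rolled check for two of the most common JSON Schema keywords, `type` and `required`. A real validator (e.g. the jsonschema package) handles far more, and this sketch even contains one of the gotchas the talk alludes to:

```python
# Map JSON Schema type names to Python types. Gotcha: in Python, True is an
# int, so a boolean would wrongly satisfy "integer" here; real validators
# special-case this.
TYPES = {"object": dict, "array": list, "string": str, "integer": int,
         "number": (int, float), "boolean": bool}

def validate(instance, schema):
    """Return a list of error messages; an empty list means the instance is valid."""
    errors = []
    expected = schema.get("type")
    if expected and not isinstance(instance, TYPES[expected]):
        errors.append(f"expected type {expected}")
    if expected == "object" and isinstance(instance, dict):
        for name in schema.get("required", []):
            if name not in instance:
                errors.append(f"missing required property {name!r}")
        for name, subschema in schema.get("properties", {}).items():
            if name in instance:
                errors.extend(validate(instance[name], subschema))
    return errors

schema = {
    "type": "object",
    "required": ["name"],
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
}
print(validate({"age": "forty"}, schema))
```

Note that `required` lists property names at the object level rather than being a flag on each property, itself a frequent source of the head-scratching the abstract mentions.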
Closing the Loop in Extended Reality with Kafka Streams and Machine Learning ... - confluent
We’ve built a real-time streaming platform that enables prediction based on user behavior, with events occurring in virtual and augmented reality environments. The solution enables organizations to train people in an extended reality environment, where real-life training may be costly and dangerous. Kafka Streams enables analyzing spatial and event data to detect gestural feature and analyze user behavior in real-time to be able to predict any future mistake the user might make. Kafka is the backbone of our real-time analytics and extended reality communication platform with our cluster and applications being deployed on Kubernetes.
In this talk, we will mainly focus on the following: 1. Why Extended Reality with Kafka is a step in the right direction. 2. Architecture & Power of Schema Registry in building a generic platform for pluggable XR apps and analytics models 3. How KSQL and Kafka Streams fits in Kafka Ecosystem to help analyze human motion data and detect features for real-time prediction. 4. Demo of a VR application with real-time analytics feedback, which assists people to be trained in how to work with chemical laboratory equipment.
The document provides an overview of Elasticsearch including that it is easy to install, horizontally scalable, and highly available. It discusses Elasticsearch's core search capabilities using Lucene and how data can be stored and retrieved. The document also covers Elasticsearch's distributed nature, plugins, scripts, custom analyzers, and other features like aggregations, filtering and sorting.
BioThings API: Building a FAIR API Ecosystem for Biomedical Knowledge - Chunlei Wu
My talk about BioThings API project at ISMB 2018 Chicago, as part of BD2K special session. BioThings API project provides a collection of high-performance APIs (MyGene.info, MyVariant.info, MyChem.info), an SDK for building a new biomedical API (BioThings SDK), and a JSON-LD and OpenAPI based solution for across-API interoperability and knowledge exploration.
Make your Web resources more discoverable with Bioschemas markup ... - Bioschemas
This tutorial will introduce you to Bioschemas, and will present how to include schema.org markup to make your resource(s) more discoverable on the Web. At the end of the session, attendees will be able to
1) Understand what Bioschemas is and how to use it
2) Have examples of deployments using Bioschemas
This document discusses Bioschemas, which aims to enable findability and interoperability of life sciences data on the web. It defines schemas using Schema.org for different types of life sciences data, including datasets and data catalogs. Bioschemas has over 200 members across 35 organizations that have deployed Bioschemas markup. It has ongoing work to increase adoption of Bioschemas across different types of life sciences resources and provide training and events for the community.
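The markup Bioschemas deployments embed is schema.org JSON-LD inside a script tag in the page head. A minimal sketch of generating a Dataset record of that kind; all values are illustrative, and Bioschemas profiles additionally constrain which properties are minimal, recommended and optional:

```python
import json

# Illustrative schema.org Dataset description of the sort Bioschemas builds on.
dataset = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example protein expression dataset",
    "description": "Illustrative dataset record for markup purposes.",
    "url": "https://example.org/datasets/1",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "keywords": ["proteomics", "example"],
}

# Embedded in a page so search-engine crawlers can pick it up:
markup = '<script type="application/ld+json">\n%s\n</script>' % json.dumps(dataset, indent=2)
print(markup)
```

Because the block is plain JSON-LD, the same record a human-readable page carries can also be harvested by aggregators, which is the findability-plus-interoperability goal the document describes.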
Bioschemas findability and interoperability - Bioschemas
This document discusses Bioschemas, which uses Schema.org vocabulary to add semantic markup to life sciences web pages in order to improve findability and interoperability of data. It notes that Bioschemas defines types and minimum properties for life sciences data, links to domain ontologies, and has a community-driven process for developing specifications. The document outlines the size of the Bioschemas community and number of specifications, deployments, and pages marked up, and proposes initiatives for increasing adoption, staff exchange, and providing training.
Bioschemas community: Developing profiles over Schema.org to make life scienc... - Bioschemas
The Bioschemas community (http://bioschemas.org) is a loose collaboration formed by a wide range of life science resource providers and informaticians. The community is developing profiles over Schema.org to enable life science resources such as data about a specific protein, sample, or training event, to be more discoverable on the web. While the content of well-known resources such as Uniprot (for protein data) are easily discoverable, there is a long tail of specialist resources that would benefit from embedding Schema.org markup in a standardised approach.
The community have developed twelve profiles for specific types of life science resources (http://bioschemas.org/specifications/), with another six at an early draft stage. For each profile, a set of use cases have been identified. These typically focus on search, but several facilitate lightweight data exchange to support data aggregators such as Identifiers.org, FAIRsharing.org, and BioSamples. The next stage of the development of a profile consists of mapping the terms used in the use cases to existing properties in Schema.org and domain ontologies. The properties are then prioritised in order to support the use cases, with a minimal set of about six properties identified, along with a larger set of recommended and optional properties. For each property, an expected cardinality is defined and where appropriate, object values are specified from controlled vocabularies. Before a profile is finalised, it must first be demonstrated that resources can deploy the markup.
In this talk, we will outline the progress that has been made by the Bioschemas Community in a single year through three hackathon events. We will discuss the processes followed by the Bioschemas Community to foster collaboration, and highlight the benefits and drawbacks of using open Google documents and spreadsheets to support the community develop the profiles. We will conclude by summarising future opportunities and directions for the community.
This document discusses Bioschemas, which uses Schema.org to add structured metadata tags to life sciences web pages. It provides an overview of Bioschemas activities in 2017, including new specifications and demonstrators for data repositories, datasets, samples, and phenotypes. It also discusses adoption of Bioschemas through tools for discovery and validation, and deployment in resources like OmicsDI and Reactome to improve data findability on the web.
Brief overview of Bioschemas presented at the 2017 bio hackathon in Japan. The presentation introduces the new proposal for a new Schema.org type called BiologicalEntity.
Bioschemas: Introduction and Implementation Study Overview (Bioschemas)
Bioschemas aims to improve data interoperability in life sciences. It does this by encouraging people in life science to use schema.org markup, so that their websites and services contain consistently structured information. This structured information then makes it easier to discover, collate and analyse distributed data.
This presentation gives an overview of the project and the ELIXIR funded Implementation Study running through 2017.
8. Describes metadata for data repositories and data catalogues so they can be more easily indexed by search engines and registries.
Specification
Examples
9.
{
  "@context": "http://schema.org",
  "@type": "DataCatalog",
  "@id": "http://www.uniprot.org",
  "name": "UniProt",
  "description": "The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and annotation data",
  "url": "http://www.uniprot.org",
  "keywords": "protein, protein sequence, protein annotation",
  "provider": {
    "@type": "Organization",
    "name": "UniProt Consortium"
  }
}
10. Describes metadata for datasets so they can be more easily indexed by search engines and registries.
Specification
Examples
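By analogy with the DataCatalog example above, a minimal Dataset description might look like the following sketch. The property names come from the Schema.org Dataset type; the values here are illustrative placeholders, not taken from the slides:

```json
{
  "@context": "http://schema.org",
  "@type": "Dataset",
  "name": "Example dataset",
  "description": "A short human-readable description of the dataset",
  "url": "http://example.org/dataset",
  "keywords": "example, life sciences, dataset",
  "provider": {
    "@type": "Organization",
    "name": "Example Organisation"
  }
}
```

As with DataCatalog, the Bioschemas Dataset profile narrows the full Schema.org vocabulary down to a small set of minimum, recommended, and optional properties.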
22. Examples & template
Creating your own DataCatalog metadata:
1. Copy the template.json example.
2. Change the values of the minimum fields.
3. Add as many recommended and optional fields as possible.
4. Wrap up the example using script tags:
<script type="application/ld+json">
....
</script>
5. Embed the example in your landing page.
• Name
• Description
• License
• Release
• Citation
• Metrics
• Tools
• …
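Putting steps 1 to 5 together, the markup embedded in a landing page would look something like this, reusing the UniProt DataCatalog description from slide 9:

```html
<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "DataCatalog",
  "@id": "http://www.uniprot.org",
  "name": "UniProt",
  "description": "The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and annotation data",
  "url": "http://www.uniprot.org",
  "keywords": "protein, protein sequence, protein annotation",
  "provider": {
    "@type": "Organization",
    "name": "UniProt Consortium"
  }
}
</script>
```

The script block does not affect how the page renders; search engines and registries read it directly, and it can be checked with a structured data validator before deployment.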
23. Data Catalogue description for UniProt using the minimum fields.
https://search.google.com/structured-data/testing-tool