This talk reviews some of the software packages available for R and Bioconductor to access NCBI Gene Expression Omnibus (GEO) and Sequence Read Archive (SRA). In particular, the GEOquery, GEOmetadb, and SRAdb packages are discussed.
Jack Zhu presents a quick set of overview slides on the Bioconductor SRAdb package. The package allows SQL access to the Sequence Read Archive (SRA) repository at NCBI.
Jack Zhu presents a quick set of overview slides on the Bioconductor SRAdb package. The package allows SQL access to the Sequence Read Archive (SRA) repository at NCBI.
Life Science Database Cross Search and MetadataMaori Ito
Life science databases are sometimes difficult to understand due to lack of information. I'd like to add metadata into databases and improve search results.
Making powerful science: an introduction to NGS data analysisAdamCribbs1
This slide deck is from the Botnar Research Centre introduction to NGS sequencing workshop 2021- an overview of the theoretical concepts behind sequencing data analysis are given
High Performance and Scalable Geospatial Analytics on Cloud with Open SourceDataWorks Summit
During the rise and innovation of “big data,” the geospatial analytics landscape has grown and evolved. We are beyond just analyzing static maps. Geospatial data is streaming from devices, sensors, infrastructure systems, or social media, and our applications and use cases must dynamically scale to meet the increased demands.
Cloud can provide cost-effective storage and that ephemeral resource-burst needed for fast processing and low latency, all to monetize the immediate value of fresh geospatial data. Geospatial analytics require optimized spatial data types and algorithms to distill data to knowledge. Such processing, especially with strict latency requirements, has always been a challenge.
We propose an open source big data stack for geospatial analytics on Cloud based on Apache NiFi, Apache Spark and LocationTech GeoMesa. GeoMesa is a geospatial framework deployed in a modern big data platform that provides a scalable and low latency solution for indexing volumes of historical data and generating live views and streaming geospatial analytics. CONSTANTIN STANCA, Solutions Engineer, Hortonworks and JAMES HUGHES, Mathematician, CCRi
Lightweight data engineering, tools, and software to facilitate data reuse an...Sean Davis
Lightweight tools, software, and publication processes that tie together data resources, analysis tools, documentation can powerful stimuli for the high-quality reuse of available data. While developed with reproducibility as a core value, Bioconductor tooling and infrastructure has reduced barriers to data reuse and established best practices for rich data and metadata sharing in genomics and proteomics. In this talk, I give a few examples and motivation for how the Bioconductor data ecosystem can be a model for other communities to enhance the value of available data.
2016 07 12_purdue_bigdatainomics_seandavisSean Davis
Newer, faster, cheaper molecular assays are driving biomedical research. I discuss the history of biomedical data including concepts of data sharing, hypothesis-driven vs generating research, and the potential to expand our thinking on biomedical research to be much more integrated through smart, creative, and open use of technologies and more flexible, longitudinal studies.
Life Science Database Cross Search and MetadataMaori Ito
Life science databases are sometimes difficult to understand due to lack of information. I'd like to add metadata into databases and improve search results.
Making powerful science: an introduction to NGS data analysisAdamCribbs1
This slide deck is from the Botnar Research Centre introduction to NGS sequencing workshop 2021- an overview of the theoretical concepts behind sequencing data analysis are given
High Performance and Scalable Geospatial Analytics on Cloud with Open SourceDataWorks Summit
During the rise and innovation of “big data,” the geospatial analytics landscape has grown and evolved. We are beyond just analyzing static maps. Geospatial data is streaming from devices, sensors, infrastructure systems, or social media, and our applications and use cases must dynamically scale to meet the increased demands.
Cloud can provide cost-effective storage and that ephemeral resource-burst needed for fast processing and low latency, all to monetize the immediate value of fresh geospatial data. Geospatial analytics require optimized spatial data types and algorithms to distill data to knowledge. Such processing, especially with strict latency requirements, has always been a challenge.
We propose an open source big data stack for geospatial analytics on Cloud based on Apache NiFi, Apache Spark and LocationTech GeoMesa. GeoMesa is a geospatial framework deployed in a modern big data platform that provides a scalable and low latency solution for indexing volumes of historical data and generating live views and streaming geospatial analytics. CONSTANTIN STANCA, Solutions Engineer, Hortonworks and JAMES HUGHES, Mathematician, CCRi
Lightweight data engineering, tools, and software to facilitate data reuse an...Sean Davis
Lightweight tools, software, and publication processes that tie together data resources, analysis tools, documentation can powerful stimuli for the high-quality reuse of available data. While developed with reproducibility as a core value, Bioconductor tooling and infrastructure has reduced barriers to data reuse and established best practices for rich data and metadata sharing in genomics and proteomics. In this talk, I give a few examples and motivation for how the Bioconductor data ecosystem can be a model for other communities to enhance the value of available data.
2016 07 12_purdue_bigdatainomics_seandavisSean Davis
Newer, faster, cheaper molecular assays are driving biomedical research. I discuss the history of biomedical data including concepts of data sharing, hypothesis-driven vs generating research, and the potential to expand our thinking on biomedical research to be much more integrated through smart, creative, and open use of technologies and more flexible, longitudinal studies.
RNA-seq: A High-resolution View of the TranscriptomeSean Davis
The molecular microscopes that we use to examine human biology have advanced significantly with the advent of next generation sequencing. RNA-seq is one application of this technology that leads to a very high-resolution view of the transcriptome. With these new technologies come increased data analysis and data handling burdens as well as the promise of new discovery. These slides present a high-level overview of the RNA-seq technology with a focus on the analysis approaches, quality control challenges, and experimental design.
ShinySRAdb: an R package using shiny to wrap the SRAdb Bioconductor packageSean Davis
The Sequence Read Archive (SRA) is hosted at the National Center for Biomedical Informatics at the National Institutes of Health. These slides showcase Olivia Zhang's summer internship work to wrap a Bioconductor package, SRAdb, using the the shiny R package.
This slide set is meant to be a teaching guide to R functionality. It includes hands-on exercises meant to be used for an audience sitting in front of a computer.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
3. Objec#ves
• Navigate
NCBI
GEO
website
• Understand
NCBI
GEO
data
en<<es
and
rela#onships
between
them
• Use
GEOquery
package
to
import
data
from
NCBI
GEO
into
R
• Convert
GEOquery
data
structures
into
R
data
structures
• Use
GEOmetadb
to
find
data
in
NCBI
GEO
• Know
rela#onship
between
NCBI
GEO
and
NCBI
SRA
• Understand
how
to
use
the
SRAdb
package
to
query
SRA
metadata
• Use
SRAdb
package
to
control
the
Integrated
Genome
Viewer
(IGV)
from
R
4.
5. MIAME-‐compliant
Data
• The
raw
data
for
each
hybridiza#on
(e.g.,
CEL
or
GPR
files)
• The
final
processed
(normalized)
data
for
the
set
of
hybridiza#ons
in
the
experiment
(study)
(e.g.,
the
gene
expression
data
matrix
used
to
draw
the
conclusions
from
the
study)
• The
essen#al
sample
annota#on
including
experimental
factors
and
their
values
(e.g.,
compound
and
dose
in
a
dose
response
experiment)
• The
experimental
design
including
sample
data
rela#onships
(e.g.,
which
raw
data
file
relates
to
which
sample,
which
hybridiza#ons
are
technical,
which
are
biological
replicates)
• Sufficient
annota#on
of
the
array
(e.g.,
gene
iden#fiers,
genomic
coordinates,
probe
oligonucleo#de
sequences
or
reference
commercial
array
catalog
number)
• The
essen#al
laboratory
and
data
processing
protocols
(e.g.,
what
normaliza#on
method
has
been
used
to
obtain
the
final
processed
data)
6. What’s
in
GEO
• Gene
expression
profiling
by
microarray
or
next-‐genera#on
sequencing
• Non-‐coding
RNA
profiling
by
microarray
or
next-‐genera#on
sequencing
• Chroma#n
immunoprecipita#on
(ChIP)
profiling
by
microarray
or
next-‐genera#on
sequencing
• Genome
methyla#on
profiling
by
microarray
or
next-‐
genera#on
sequencing
• Genome
varia#on
profiling
by
array
(arrayCGH)
• SNP
arrays
• Serial
Analysis
of
Gene
Expression
(SAGE)
• Protein
arrays
7. GEO
data
en##es
• GEO
Samples
(GSM)
• GEO
PlaYorm
(GPL)
• GEO
Series
(GSE)
– Collec#ons
of
related
GSM
and
GPL
along
with
free-‐text
study
metadata
– Two
data
“flavors”
• GSE
• GSEMatrix
• GEO
Dataset
(GDS)
– Only
data
en#ty
that
is
curated
by
NCBI
GEO
staff
– Typically,
samples
are
divided
into
sta#s#cally
and
biologically
relevant
groups
upon
which
we
can
compute
11. GEOquery
• Singular
goal:
Get
data
from
NCBI
GEO
and
parse
into
R
objects
• Secondary
goals:
– Get
supplemental
files
from
NCBI
GEO
– Allow
large-‐scale
data
mining
by
doing
all-‐of-‐the-‐
above
in
a
lossless
manner
14. GEOquery
Walkthrough
• h^p://watson.nci.nih.gov/~sdavis/
– Click
on
Tutorials
and
then
on
“Accessing
public
data….”
• Download
the
R
script
for
cut-‐and-‐paste
to
follow
along
15. GEOmetadb
• Finding
data
in
NCBI
GEO
can
be
challenging
• Some
inves#gators
need
large-‐scale,
computable
access
to
GEO
metadata
• Given
one
GEO
en#ty
type,
it
may
be
useful
to
find
all
rela#onships
with
other
en#ty
types
(eg.,
find
all
hgu133a
arrays
in
GEO)
16. GEOmetadb
• What
is
GEOmetadb?
–
A
Bioconductor
package
that
offers
an
alterna#ve
to
eU#ls
data
mining
of
GEO
metadata
– A
SQLite
database
which
stores
all
the
GEO
metadata
in
a
rela#onal
format
for
easy
querying,
par#cularly
in
bulk
• What
is
GEOmetadb
NOT?
– We
have
not
a^empted
to
alter
or
standardize
in
any
way
the
data
from
GEO
20. SRAdb
• Similar
goals
to
GEOmetadb,
but
with
SRA
data
• Data
for
SRA
(ENA,
DRA)
are
mirrored,
but
exact
policies
are
a
bit
unclear
(to
me)
• Accessing
to
the
data
also
provided
by
SRAdb
package,
but
further
processing
(SRA
SDK)
now
needed
to
get
FASTQ
files
21. SRAdb
and
NCBI
SRA
Recently,
NCBI
announced
that
due
to
budget
constraints,
it
would
be
discon#nuing
its
Sequence
Read
Archive
(SRA)
and
Trace
Archive
repositories
for
high-‐throughput
sequence
data.
However,
NIH
has
since
commi^ed
interim
funding
for
SRA
in
its
current
form
un#l
October
1,
2011.
In
addi#on,
NCBI
has
been
working
with
staff
from
other
NIH
Ins#tutes
and
NIH
grantees
to
develop
an
approach
to
con#nue
archiving
a
widely
used
subset
of
next
genera#on
sequencing
data
ajer
October
1,
2011.
We
now
plan
to
con#nue
handling
sequencing
data
associated
with:
• RNA-‐Seq,
ChIP-‐Seq,
and
epigenomic
data
that
are
submi^ed
to
GEO
• Genomic
and
Transcriptomic
assemblies
that
are
submi^ed
to
GenBank
• 16S
ribosomal
RNA
data
associated
with
metagenomics
that
are
submi^ed
to
GenBank
In
addi#on,
NCBI
will
con#nue
to
provide
access
to
exis#ng
SRA
and
Trace
Archive
data
for
the
foreseeable
future.
NCBI
is
also
con#nuing
to
discuss
with
NIH
Ins#tutes
approaches
for
handling
other
next-‐genera#on
sequencing
data
associated
with
specific
large-‐scale
studies.
22. SRAdb
and
IGV
Walkthrough
• Download
and
start
IGV:
– h^p://www.broadins#tute.org/sojware/igv/
download