The document outlines the usage of the NCBI E-utilities API for programmatically accessing data from NCBI databases like PubMed, Nucleotide, and Gene. It describes the 8 E-utilities programs (ESearch, EPost, ESummary, EFetch, ELink, EInfo, EGQuery, ESpell) and provides examples of building analysis pipelines using combinations of the E-utilities, such as searching PubMed and linking results to Gene records or nucleotide sequences. Sample applications demonstrated include finding related human genes to PubMed articles on osteosarcoma, downloading nucleotide sequences for Burkholderia cepacia complex, and identifying PubMed articles discussing cancer copy number changes that used specific GEO microarray
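As a rough illustration of the kind of pipeline described above (search PubMed, then link the results to Gene records), the sketch below only builds the request URLs and makes no network calls. The base URL and the `db`, `term`, `usehistory`, `dbfrom`, `query_key`, `WebEnv`, and `cmd` parameters are standard E-utilities names; the helper functions `esearch_url` and `elink_url` and the placeholder `EXAMPLE_WEBENV` value are assumptions made for this example.

```python
from urllib.parse import urlencode, urlparse, parse_qs

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db, term, use_history=True):
    """Build an ESearch URL; usehistory=y asks NCBI to keep the result set server-side."""
    params = {"db": db, "term": term}
    if use_history:
        params["usehistory"] = "y"
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def elink_url(dbfrom, db, query_key, webenv):
    """Build an ELink URL that links a stored result set in dbfrom to records in db."""
    params = {
        "dbfrom": dbfrom,
        "db": db,
        "query_key": query_key,
        "WebEnv": webenv,
        "cmd": "neighbor_history",  # store the linked records on the history server
    }
    return f"{EUTILS_BASE}/elink.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    # Step 1: search PubMed. The real response would supply QueryKey and WebEnv.
    print(esearch_url("pubmed", "osteosarcoma"))
    # Step 2: link the stored PubMed results to Gene (placeholder history values).
    print(elink_url("pubmed", "gene", "1", "EXAMPLE_WEBENV"))
```

In a real pipeline the `query_key` and `WebEnv` values would be read from the ESearch response instead of being hard-coded.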
This document discusses biological databases. It defines a biological database as a collection of structured, searchable, and periodically updated biological data like protein sequences, molecular structures, and DNA sequences. It notes that biological data is heterogeneous, high-volume, uncertain, dynamically changing, and integrated from various global sources. The key functions of biological databases are to make biological data available worldwide in a computer-readable format. They are broadly classified into sequence, structure, and pathway databases. Some examples of important biological databases discussed are Swiss-Prot, PDB, GenBank, and COGs.
Abstract: The focus in this session will be put on the differences between standard DNA mapping and RNAseq-specific transcript mapping: identifying splice variants and isoforms. The issue of transcript quantification and genomic variants that can be identified from RNAseq data will be discussed.
This document provides an overview of the PubMed database. PubMed is an online database developed by the National Center for Biotechnology Information that allows users to access MEDLINE, a database of biomedical literature. The document discusses PubMed's history, coverage of over 19 million citations back to the 1950s, features like advanced search capabilities and links to full texts, and tutorials for searching and using PubMed.
The document discusses data mining tools and techniques for analyzing diffraction pattern data. It provides examples of analyzing an unknown catalytic converter specimen using x-ray diffraction data mining of the Powder Diffraction File. Through phase identification, Rietveld refinement, and searching the PDF database, it is able to determine that the specimen contains cordierite, cerium-stabilized zirconia, and around 3% rhodium oxide. Additional online searching finds a patent that matches these materials and likely synthesis process. The analysis demonstrates how data mining of diffraction data can be used to elucidate unknown sample compositions and properties.
Word embedding, Vector space model, language modelling, Neural language model, Word2Vec, GloVe, fastText, ELMo, BERT, DistilBERT, RoBERTa, SBERT, Transformer, Attention
# Correction
P.18 formula: x+z*z/2±z√(x+z*z/4)
# Abstract
Reporting confidence intervals for low-frequency items in large samples is desirable, but in practice they are rarely shown, both because the calculation is cumbersome and because the sample size is sometimes unknown. In this paper, we address this problem by taking the simple Wilson score interval calculation to the limit of infinite sample size. We also apply the Wilson score interval to the Poisson distribution and confirm that the results agree. Finally, we consider frequency prediction using empirical Bayes.
Institute of Statistical Mathematics, Language-Related Joint Research Groups, 2nd Joint Research Presentation Meeting, FY2020
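The infinite-sample limit mentioned in this abstract can be sketched numerically. The first function below evaluates the P.18 expression x+z*z/2±z√(x+z*z/4) for an observed count x. The second evaluates the standard finite-sample Wilson score interval rescaled to counts, so that the two can be compared for a large n. The function names and the default z = 1.96 are assumptions made for illustration, not taken from the slides.

```python
import math


def wilson_count_interval(x, z=1.96):
    """Limit of the Wilson score interval for a count x as the sample size grows without bound."""
    center = x + z * z / 2
    half_width = z * math.sqrt(x + z * z / 4)
    return center - half_width, center + half_width


def wilson_interval_scaled(x, n, z=1.96):
    """Standard Wilson score interval for x successes in n trials, multiplied by n (count scale)."""
    p = x / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return n * (center - half_width), n * (center + half_width)


if __name__ == "__main__":
    x = 5
    print("limit      :", wilson_count_interval(x))
    print("n = 10**9  :", wilson_interval_scaled(x, 10**9))
```

The limiting form needs only the observed count, which is why it remains usable when the sample size is unknown.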
Swiss-Prot is a protein database that provides highly annotated and non-redundant protein sequences. It features annotation of proteins, minimal redundancy, integration with other databases, and documentation. Swiss-Prot provides organized data and information on proteins and can be used as a starting point for protein research by allowing searches with various strings.
This document provides an introduction to biological databases. It discusses what databases are and features of an ideal database. It describes the relationships between primary sequence databases like GenBank that contain original submissions, and derived databases like RefSeq that are curated by NCBI. Key databases at NCBI are described, including GenBank, RefSeq, and Entrez, which allows integrated searching across multiple databases. The benefits of data integration through linking related information are highlighted.
The document discusses Prosite, a database of protein family signatures that can be used to determine the function of uncharacterized proteins. It contains patterns and profiles formulated to identify which known protein family a new sequence belongs to. The Prosite database consists of two files - a data file containing information for scanning sequences, and a documentation file describing each pattern and profile. New Prosite entries are mainly profiles developed by collaborators at the SIB Swiss Institute of Bioinformatics to identify distantly related proteins based on conserved residues.
This document provides an overview of a computational materials science lecture at Tokyo Tech. The lecture will cover first principles calculations, focusing on numerical analysis of electronic states. First principles calculations determine electronic states without experimental parameters by only using fundamental physical constants and numerical parameters. The lecture notes can be downloaded online and questions are welcome. Example materials that will be discussed include graphene and magnetic materials interfaces. Computational methods like density functional theory and plane wave basis sets will be introduced.
This document provides summaries of numerous protein and genome databases. It describes databases that contain protein sequence information and annotations, protein structure information, genomic and gene information, information on transcriptional regulation, and several other types of biological databases. The databases serve various purposes like housing protein and DNA sequences, functional annotations, protein structures and classifications, genomic and gene data, and information on transcriptional regulation and interactions.
This document discusses several common file formats used for storing bioinformatics sequence data, including their history and key features. It describes early ASCII text file formats, followed by flat file formats adopted for collaboration between sequence databases in 1986 which allowed for annotations. Two common modern formats are then outlined - FASTA format which represents sequences with single letter codes and allows programs to read sequences, and PIR format which identifies sequence type and includes the sequence itself along with optional descriptions. The document also briefly discusses ALN/ClustalW and GCG/MSF formats for storing aligned multiple sequences.
BITS: Overview of important biological databases beyond sequences (BITS)
Module 4 Other relevant biological data sources beyond sequences
Part of training session "Basic Bioinformatics concepts, databases and tools" - http://www.bits.vib.be/training
This document discusses genome database systems. It begins with an introduction to bioinformatics and genomes. It then discusses the background of genome databases, including some examples. The major characteristics of genome database systems are described as highly complex data, rapidly changing schemas, and complex queries. The key areas of data management in genome databases are discussed as non-standard data, complex queries, data interpretation, integration across databases, and uniform management solutions. Major research areas and applications that impact society are also summarized.
Introduction to HMMs in Bioinformatics (avrilcoghlan)
This document provides an introduction to Hidden Markov Models (HMMs) for modeling DNA sequence evolution. It explains that HMMs are an advancement over simpler Markov and multinomial models because they allow the probability of a base to depend on the hidden state (e.g. GC-rich vs. AT-rich region) of the previous position, rather than just the previous base. The key components of an HMM - the transition matrix describing the probabilities of changing between states, and the emission matrix describing the probabilities of bases for each state - are also introduced.
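The two components named above, the transition matrix and the emission matrix, can be made concrete with a small sketch. It uses the standard HMM formulation, in which each base is emitted by the hidden state at its own position and that state depends on the state at the previous position. The two states follow the GC-rich and AT-rich example, but all probability values are made-up numbers for illustration, and `forward_probability` is a hypothetical helper name.

```python
STATES = ("GC-rich", "AT-rich")

# Probability of starting in each hidden state (illustrative values).
START = {"GC-rich": 0.5, "AT-rich": 0.5}

# Transition matrix: probability of moving from one hidden state to the next.
TRANSITION = {
    "GC-rich": {"GC-rich": 0.9, "AT-rich": 0.1},
    "AT-rich": {"GC-rich": 0.1, "AT-rich": 0.9},
}

# Emission matrix: probability of each base given the hidden state.
EMISSION = {
    "GC-rich": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "AT-rich": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}


def forward_probability(seq):
    """Total probability of a DNA sequence under the HMM (forward algorithm)."""
    if not seq:
        return 1.0
    # alpha[s] = probability of the bases seen so far, ending in state s.
    alpha = {s: START[s] * EMISSION[s][seq[0]] for s in STATES}
    for base in seq[1:]:
        alpha = {
            s: EMISSION[s][base] * sum(alpha[r] * TRANSITION[r][s] for r in STATES)
            for s in STATES
        }
    return sum(alpha.values())


if __name__ == "__main__":
    for seq in ("GCGC", "GCAT", "ATAT"):
        print(seq, forward_probability(seq))
```

Because the states are "sticky" (0.9 probability of staying), a uniformly GC-rich sequence scores higher than one that switches composition halfway.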
BLAST is a program that compares a query DNA or protein sequence against a database to find sequences that resemble the query above a certain threshold. It works by breaking the query into short words and searching the database for those words, then extending any matches. The output includes alignments of high-scoring segment pairs and statistical measures like E-values and bit scores to indicate match significance. BLAST is faster than FASTA and more specific due to low complexity filtering.
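The word-based search described above can be illustrated with a toy seed-and-extend sketch. This is a deliberate simplification and not BLAST's actual implementation: it uses exact word matches and extends only while characters are identical, whereas real BLAST uses scoring matrices, neighborhood words, and drop-off thresholds. All function names here are hypothetical.

```python
def find_seeds(query, subject, word_size=3):
    """Return (query_pos, subject_pos) pairs where a word of the query occurs in the subject."""
    index = {}
    for j in range(len(subject) - word_size + 1):
        index.setdefault(subject[j:j + word_size], []).append(j)
    seeds = []
    for i in range(len(query) - word_size + 1):
        for j in index.get(query[i:i + word_size], []):
            seeds.append((i, j))
    return seeds


def extend_seed(query, subject, i, j, word_size=3):
    """Extend a seed left and right while the characters keep matching exactly."""
    left = 0
    while i - left - 1 >= 0 and j - left - 1 >= 0 and query[i - left - 1] == subject[j - left - 1]:
        left += 1
    right = word_size
    while i + right < len(query) and j + right < len(subject) and query[i + right] == subject[j + right]:
        right += 1
    # (start in query, start in subject, length of the matching segment)
    return i - left, j - left, left + right


def best_hit(query, subject, word_size=3):
    """Longest exact-match segment found by seeding and extending, or None if no seed exists."""
    hits = {extend_seed(query, subject, i, j, word_size) for i, j in find_seeds(query, subject, word_size)}
    return max(hits, key=lambda h: h[2], default=None)


if __name__ == "__main__":
    print(best_hit("ACGTTGCA", "TTACGTTGAA"))
```

Indexing the subject by word first is what makes the search fast: only positions that share a word with the query are ever extended.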
Basics of Data Analysis in Bioinformatics (Elena Sügis)
This presentation gives an introduction to the basics of data analysis in bioinformatics.
The following topics are covered:
Data acquisition
Data summary (selecting the needed columns/rows from the file and showing basic descriptive statistics)
Preprocessing (missing values imputation, data normalization, etc.)
Principal Component Analysis
Data Clustering and cluster annotation (k-means, hierarchical)
Cluster annotations
This document provides a short guide to using Entrez and the E-utilities. It describes Entrez as a global query system that searches over 40 databases at NCBI, including PubMed, Nucleotide, and Protein. It also describes the E-utilities, which provide a stable interface for programmers to query and retrieve data from Entrez. The document outlines the main functions of nine E-utilities including ESearch, ESummary, EFetch, and ELink. It also provides examples of constructing pipelines using multiple E-utilities to retrieve relevant document summaries and sequences from Entrez based on search queries or lists of IDs.
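A step that such pipelines typically need is reading the IDs out of an ESearch response and passing them to the next utility. The sketch below parses a small XML string shaped like an ESearch result and builds an ESummary URL from it, without contacting NCBI. The sample XML values are made up, and `parse_esearch` and `esummary_url` are hypothetical helper names. The element names (`Count`, `IdList`, `Id`) and the `db` and `id` parameters follow the E-utilities conventions.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode, urlparse, parse_qs

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Made-up response in the shape ESearch returns; the IDs are placeholders.
SAMPLE_ESEARCH_XML = """<eSearchResult>
  <Count>2</Count>
  <RetMax>2</RetMax>
  <RetStart>0</RetStart>
  <IdList>
    <Id>11111111</Id>
    <Id>22222222</Id>
  </IdList>
</eSearchResult>"""


def parse_esearch(xml_text):
    """Extract the hit count and the list of record IDs from ESearch XML."""
    root = ET.fromstring(xml_text)
    return {
        "count": int(root.findtext("Count")),
        "ids": [elem.text for elem in root.findall("./IdList/Id")],
    }


def esummary_url(db, ids):
    """Build an ESummary URL for a list of record IDs (comma-separated in the id parameter)."""
    params = {"db": db, "id": ",".join(ids)}
    return f"{EUTILS_BASE}/esummary.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    result = parse_esearch(SAMPLE_ESEARCH_XML)
    print(result)
    print(esummary_url("pubmed", result["ids"]))
```

For large result sets, passing explicit ID lists becomes unwieldy, which is the usual reason to rely on the history server instead.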
BioThings API: Building a FAIR API Ecosystem for Biomedical Knowledge (Chunlei Wu)
My talk at NCI's CBIIT speaker series:
https://wiki.nci.nih.gov/display/CBIITSpeakers/2019/01/02/Jan+16%2C+Chunlei+Wu%2C+BioThings+API
A companion blog post: https://ncip.nci.nih.gov/blog/the-network-of-biothings/
See more details about BioThings project at http://biothings.io.
The Role of Metadata in Reproducible Computational Research (Jeremy Leipzig)
Reproducible computational research (RCR) provides the keystone to the scientific method, packaging the transformation of raw data to published results in a manner that can be communicated to others. Developing RCR standards has been a growing concern of statisticians, data scientists, and informatics professionals. Metadata provides context and provenance to raw data, and is essential to both discovery and validation in RCR. This presentation will give an overview of emerging metadata standards in data, analysis, pipelines, tools, and publications.
The document describes Biothings.api, a framework for building RESTful web APIs that query biological data stored in Elasticsearch. It generalizes code from existing APIs like MyGene.info and MyVariant.info. The framework includes handlers for common API endpoints, classes for constructing Elasticsearch queries, and a project template. Using this framework, new biological data APIs can be quickly created with minimal additional code. It has been used to rebuild MyVariant.info and create new APIs for taxonomy and chemicals. Future work includes improving documentation and integrating data loading/indexing functionality.
Metadata-based tools at the ENCODE Portal (ENCODE-DCC)
This document describes metadata-driven tools developed by the ENCODE Data Coordination Center to provide access to data from the ENCODE project. It discusses how detailed metadata describing experimental variables is captured and structured to enable faceted browsing and filtering of the large dataset. Elasticsearch allows full-text searches. The metadata schema and relationships between metadata objects are shown. The ENCODE data portal allows browsing and searching across 4000+ experiments and associated files, biosamples, antibodies, and annotations.
GenBank, EMBL, and DDBJ are primary nucleotide sequence databases that collaborate to store publicly available DNA sequences. NCBI's GenBank is one of the largest primary sequence databases, containing sequences from over 240,000 organisms submitted by laboratories. PubMed and Entrez are literature and biomedical databases maintained by NCBI that allow users to search biomedical research articles and integrate related data from multiple sources. SRS is a sequence retrieval system developed by EBI that integrates over 250 molecular biology databases and allows complex queries across data sources.
Bioinformatic Harvester is a software tool that acts as a meta search engine for genes and protein information. It collects and indexes data from 16 major bioinformatics databases and allows users to search across these databases simultaneously. Search results are displayed on a single HTML page and are ranked based on relevance. Users can query the system using terms like gene names, sequences, protein domains, and literature to retrieve integrated information from databases on genes and proteins.
Role of bioinformatics in life sciences research (Anshika Bansal)
1. The document discusses bioinformatics and summarizes some of its key applications and tools. It describes how bioinformatics merges biology and computer science to solve biological problems by applying computational tools to molecular data.
2. It provides examples of common bioinformatics tasks like retrieving sequences from databases, comparing sequences, analyzing genes and proteins, and viewing 3D structures.
3. The document lists several popular databases for nucleotide sequences, protein sequences, literature, and other biological data. It also introduces common bioinformatics tools for tasks like sequence alignment, translation, and structure analysis.
The document provides information about a bioinformatics course including the syllabus, topics, and schedule. The course focuses on using relational databases to manage and analyze large biological datasets. Topics include relational databases, web application development, genome browsers, text mining, and systems biology. The schedule lists the topics to be covered each class over 12 weeks from March to May.
The document describes tools and web services from the National Center for Biomedical Ontology (NCBO) including the Ontology Web Services, Ontology Widgets, NCBO Annotator, NCBO Resource Index, and Ontology Recommender. The NCBO Annotator is an open access web service that annotates text with terms from ontologies in BioPortal and includes a variety of customization parameters. The NCBO Resource Index provides an ontology-based search across publicly available biomedical resources.
This document provides an overview of downstream analyses that can be performed after variant identification and filtering in a typical variant calling pipeline. It discusses visualization of variant data in each gene to identify potential causative variants. It also mentions association studies as another type of downstream analysis where variants are tested for association with disease phenotypes. The goal of downstream analyses is to help prioritize variants for further investigation.
Metagenomic Data Provenance and Management using the ISA infrastructure - overview, implementation patterns & software tools (Alejandra Gonzalez-Beltran)
Slides presented at EBI Metagenomics Bioinformatics course: http://www.ebi.ac.uk/training/course/metagenomics2014
Databases store organized information in tables and fields. A database management system interacts with users and applications to capture and analyze data. Biological databases contain life sciences information from experiments, literature, and computational analysis. They classify sequences, structures, and functions. Common biological databases include GenBank, UniProt, and PDB.
This document describes several text-based biological databases and how to search them. It discusses Entrez, which searches multiple databases and links related entries. It also describes the Sequence Retrieval System (SRS) which allows searching over 80 biological databases. Additionally, it outlines DBGET/LinkDB, an integrated system that searches about 20 databases and links results to associated information. The document provides an example of using each system to retrieve information on a specific protein entry.
Biological databases store and organize large amounts of biological data for research use. There are many types of biological databases that classify data by type, such as nucleotide sequences, protein sequences, genomes, protein structures, gene expression, and metabolic pathways. Databases can also be classified by their data source as primary databases containing experimental results or secondary databases that analyze primary database results. Database availability varies, with some publicly open and others proprietary. Common biological databases discussed include GenBank, UniProt, PDB, KEGG, and FlyBase.
Bioinformatics is defined as the application of tools of computation and analysis to the capture and interpretation of biological data. It is an interdisciplinary field, which harnesses computer science, mathematics, physics, and biology.
openEHR Developers Workshop at #MedInfo2015 (Pablo Pazos)
The document discusses Pablo Pazos Gutiérrez's experiences developing open source software using the openEHR standard, including EHRServer, a clinical data repository, and EHRGen, an EHR generator framework. EHRServer allows committing and querying clinical data in an openEHR format. EHRGen auto-generates user interfaces and menus from openEHR archetypes and templates. The document also describes an XML rule engine for clinical decision support using openEHR data and an approach to generalized UI generation across technologies.
Presentation to the ImmPort Science Meeting, February 27, 2014, on the proper treatment of value sets in the ImmPort Immunology Database and Analysis Portal
The document discusses different text-based database retrieval systems for accessing biological data, including Entrez, SRS, and DBGET/LinkDB. It describes their key features and how each system allows users to search text databases using queries, with Entrez providing linked related data across multiple databases. An example shows how each system can be used to retrieve and view related information for a SwissProt protein entry.
PPT on Sustainable Land Management presented at the three-day 'Training and Validation Workshop on Modules of Climate Smart Agriculture (CSA) Technologies in South Asia' workshop on April 22, 2024.
TOPIC OF DISCUSSION: CENTRIFUGATION SLIDESHARE.pptx (shubhijain836)
Centrifugation is a powerful technique used in laboratories to separate components of a heterogeneous mixture based on their density. This process utilizes centrifugal force to rapidly spin samples, causing denser particles to migrate outward more quickly than lighter ones. As a result, distinct layers form within the sample tube, allowing for easy isolation and purification of target substances.
Microbial interaction
Microorganisms interact with each other and can be physically associated with other organisms in a variety of ways.
One organism can be located on the surface of another organism as an ectobiont, or within another organism as an endobiont.
Microbial interactions may be positive, such as mutualism, proto-cooperation, and commensalism, or negative, such as parasitism, predation, and competition.
Types of microbial interaction
Positive interaction: mutualism, proto-cooperation, commensalism
Negative interaction: Amensalism (antagonism), parasitism, predation, competition
I. Mutualism:
It is defined as a relationship in which each interacting organism benefits from the association. It is an obligatory relationship in which the mutualist and the host are metabolically dependent on each other.
A mutualistic relationship is very specific: one member of the association cannot be replaced by another species.
Mutualism requires close physical contact between the interacting organisms.
A mutualistic relationship allows organisms to exist in a habitat that could not be occupied by either species alone.
A mutualistic relationship between organisms allows them to act as a single organism.
Examples of mutualism:
i. Lichens:
Lichens are an excellent example of mutualism.
They are associations between specific fungi and certain genera of algae. In a lichen, the fungal partner is called the mycobiont and the algal partner is called
II. Syntrophism:
It is an association in which the growth of one organism either depends on or is improved by the substrate provided by another organism.
In syntrophism, both organisms in the association benefit.
Compound A → (utilized by population 1) → Compound B → (utilized by population 2) → Compound C → (utilized by both populations 1 and 2) → Products
In this theoretical example of syntrophism, population 1 is able to utilize and metabolize compound A, forming compound B, but cannot metabolize beyond compound B without the cooperation of population 2. Population 2 is unable to utilize compound A, but it can metabolize compound B, forming compound C. Both populations 1 and 2 are then able to carry out a metabolic reaction that leads to the formation of an end product that neither population could produce alone.
Examples of syntrophism:
i. Methanogenic ecosystem in sludge digester
Methane production by methanogenic bacteria depends on interspecies hydrogen transfer from other fermentative bacteria.
Anaerobic fermentative bacteria generate CO2 and H2 from carbohydrates, and these are then utilized by methanogenic bacteria (Methanobacter) to produce methane.
ii. Lactobacillus arabinosus and Enterococcus faecalis:
In minimal media, Lactobacillus arabinosus and Enterococcus faecalis are able to grow together but not alone.
A synergistic relationship occurs between E. faecalis and L. arabinosus, in which E. faecalis requires folic acid
3. NCBI & Entrez
• The National Center for Biotechnology Information advances science and health by providing access to biomedical and genomic information.
• Entrez is NCBI’s primary text search and retrieval system that integrates the PubMed database of biomedical literature with 39 other literature and molecular databases including DNA and protein sequence, structure, gene, genome, genetic variation and gene expression.
4. E-utilities
• Entrez Programming Utilities
– The Entrez Programming Utilities (E-utilities) are a set of eight server-side programs that provide a stable interface into the Entrez query and database system at the NCBI.
– The E-utilities use a fixed URL syntax that translates a standard set of input parameters into the values necessary for various NCBI software components to search for and retrieve the requested data.
– Input: URL → E-utilities → Output: XML, FASTA, Text …
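The fixed URL syntax described above can be sketched in a few lines of Python. This is only an illustration: `build_eutils_url` is a hypothetical helper name, and the base URL is the HTTPS form of the one given in this deck (an assumption, since the slides show `http://`).

```python
from urllib.parse import urlencode

# Assumption: HTTPS form of the base URL shown on the slides.
BASE_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def build_eutils_url(program, **params):
    """Return the URL for one E-utility call, e.g. program='esearch'."""
    # Drop parameters left as None so optional arguments can be omitted.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{BASE_URL}{program}.fcgi?{query}"


# Example: the EInfo call for the Entrez Protein database.
print(build_eutils_url("einfo", db="protein"))
```

The same helper covers every program in the deck, because only the program name and the parameter set change between calls.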
5. Usage Guidelines and Requirements
• Use the E-utility URL
– baseURL: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/ …
– Python urllib/urlopen, Perl LWP::Simple, Linux wget, …
• Frequency, Timing and Registration of E-utility URL Requests
– Make no more than 3 requests per second → sleep(0.5)
– Run large jobs on weekends or between 5 PM and 9 AM EST
– Include &tool and &email in all requests
• Minimizing the Number of Requests
– &retmax=500
• Handling Special Characters Within URLs
– Space → +, " → %22, # → %23
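A minimal sketch of how these guidelines might look in Python. `Throttle` and `with_identity` are hypothetical helpers, and the `tool` and `email` values are placeholders to be replaced with your own; the 0.5 s interval is the `sleep(0.5)` suggested above.

```python
import time
from urllib.parse import quote_plus, urlencode


class Throttle:
    """Sleep as needed so successive calls are at least `min_interval` seconds apart."""

    def __init__(self, min_interval=0.5, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def with_identity(params, tool="my_script", email="user@example.org"):
    """Return an encoded query string with &tool and &email added (placeholder values)."""
    merged = dict(params)
    merged.update(tool=tool, email=email)
    return urlencode(merged)


# quote_plus applies the substitutions listed above: space -> +, " -> %22, # -> %23.
print(quote_plus('"copy number" #1'))  # %22copy+number%22+%231
```

Calling `throttle.wait()` immediately before each request keeps a loop at two requests per second, which stays under the limit of three.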
7. ESearch (text searches)
• Responds to a text query with the list of matching UIDs in a given database (for later use in ESummary, EFetch or ELink), along with the term translations of the query.
• Syntax: esearch.fcgi?db=<database>&term=<query>
– Input: Entrez database (&db); Any Entrez text query (&term)
– Output: List of UIDs matching the Entrez query
• Example: Get the PubMed IDs (PMIDs) for articles about osteosarcoma
– http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=%22osteosarcoma%22[majr:noexp]
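The UID list returned by ESearch can be pulled out of the XML with the standard library. The sketch below works on a hand-written sample rather than a live response: the element names (`Count`, `IdList/Id`, `QueryKey`, `WebEnv`) follow the ESearch XML output as I understand it, and the three IDs are simply reused from the ESummary example in this deck, not real search results.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the shape of an ESearch response (illustrative only).
SAMPLE_ESEARCH = """<eSearchResult>
  <Count>3</Count><RetMax>3</RetMax><RetStart>0</RetStart>
  <IdList><Id>24450072</Id><Id>24333720</Id><Id>24333432</Id></IdList>
</eSearchResult>"""


def parse_esearch(xml_text):
    """Extract the hit count, UID list and (if present) history parameters."""
    root = ET.fromstring(xml_text)
    return {
        "count": int(root.findtext("Count", default="0")),
        "ids": [elem.text for elem in root.findall("./IdList/Id")],
        # These two appear only when the request included usehistory=y.
        "query_key": root.findtext("QueryKey"),
        "web_env": root.findtext("WebEnv"),
    }


result = parse_esearch(SAMPLE_ESEARCH)
print(result["count"], result["ids"])
```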
9. ESummary (document summary downloads)
• Responds to a list of UIDs from a given database with the corresponding document summaries.
• Syntax: esummary.fcgi?db=<database>&id=<uid_list>
– Input: List of UIDs (&id); Entrez database (&db)
– Output: XML DocSums
• Example: Download DocSums for these PubMed IDs: 24450072, 24333720, 24333432
– http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&id=24450072,24333720,24333432
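When the UID list is long, it can be split into comma-separated batches so that each ESummary URL stays a manageable size. This is a sketch under assumptions: `esummary_urls` is a hypothetical helper, the batch size of 200 is an arbitrary illustrative choice, and the base URL is the HTTPS form of the one on the slides.

```python
from urllib.parse import urlencode

BASE_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"  # assumption: HTTPS form


def esummary_urls(db, uids, batch_size=200):
    """Yield one ESummary URL per batch of UIDs."""
    uids = [str(u) for u in uids]
    for start in range(0, len(uids), batch_size):
        batch = uids[start:start + batch_size]
        # safe="," keeps the commas literal, matching the URLs on the slide.
        query = urlencode({"db": db, "id": ",".join(batch)}, safe=",")
        yield f"{BASE_URL}esummary.fcgi?{query}"


for url in esummary_urls("pubmed", [24450072, 24333720, 24333432]):
    print(url)
```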
11. EFetch (data record downloads)
• Responds to a list of UIDs in a given database with the corresponding data records in a specified format.
• Syntax: efetch.fcgi?db=<database>&id=<uid_list>&rettype=<retrieval_type>&retmode=<retrieval_mode>
– Input: List of UIDs (&id); Entrez database (&db); Retrieval type (&rettype); Retrieval mode (&retmode)
– Output: Formatted data records as specified
• Example: Download the abstract of PubMed ID 24333432
– http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=24333432&rettype=abstract&retmode=text
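A possible way to issue the EFetch call above from Python with `urllib`, which the usage guidelines mention. `efetch_url` and `fetch_text` are hypothetical helpers; the `opener` argument exists only so the download step can be swapped out when testing, and the base URL is the HTTPS form of the one on the slides.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"  # assumption: HTTPS form


def efetch_url(db, uids, rettype, retmode):
    """Build an EFetch URL for a list of UIDs and a rettype/retmode pair."""
    params = {
        "db": db,
        "id": ",".join(str(u) for u in uids),
        "rettype": rettype,
        "retmode": retmode,
    }
    return BASE_URL + "efetch.fcgi?" + urlencode(params, safe=",")


def fetch_text(url, opener=urlopen, timeout=30):
    """Download `url` and decode the response body as UTF-8 text."""
    with opener(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


url = efetch_url("pubmed", [24333432], "abstract", "text")
print(url)
# abstract = fetch_text(url)  # uncomment to perform the network request
```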
12. ELink (Entrez links)
• Responds to a list of UIDs in a given database with either a list of related UIDs (and relevancy scores) in the same database or a list of linked UIDs in another Entrez database.
• Checks for the existence of a specified link from a list of one or more UIDs.
• Creates a hyperlink to the primary LinkOut provider for a specific UID and database, or lists LinkOut URLs and attributes for multiple UIDs.
13. ELink (Entrez links)
• Syntax: elink.fcgi?dbfrom=<source_db>&db=<destination_db>&id=<uid_list>
– Input: List of UIDs (&id); Source Entrez database (&dbfrom); Destination Entrez database (&db)
– Output: XML containing linked UIDs from source and destination databases
• Example: Find one set/separate sets of Gene IDs linked to PubMed IDs 24333432 and 24314238
– One combined set (comma-separated &id): http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&db=gene&id=24333432,24314238
– Separate sets (repeated &id): http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&db=gene&id=24333432&id=24314238
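The two ELink URLs in the example differ only in how `&id` is passed: one comma-separated value versus one `&id` per UID. A small sketch of both forms, using a hypothetical `elink_url` helper and the HTTPS form of the base URL (an assumption):

```python
from urllib.parse import urlencode

BASE_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"  # assumption: HTTPS form


def elink_url(dbfrom, db, uids, separate=False):
    """Build an ELink URL; `separate=True` repeats &id once per UID."""
    ids = [str(u) for u in uids]
    params = [("dbfrom", dbfrom), ("db", db)]
    if separate:
        params += [("id", uid) for uid in ids]  # id=A&id=B
    else:
        params.append(("id", ",".join(ids)))    # id=A,B
    return BASE_URL + "elink.fcgi?" + urlencode(params, safe=",")


pmids = [24333432, 24314238]
print(elink_url("pubmed", "gene", pmids))
print(elink_url("pubmed", "gene", pmids, separate=True))
```

Passing a list of pairs to `urlencode` is what allows the `id` key to appear more than once.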
15. EGQuery (global query)
• Responds to a text query with the number of records matching the query in each Entrez database.
• Syntax: egquery.fcgi?term=<query>
– Input: Entrez text query (&term)
– Output: XML containing the number of hits in each database.
• Example: Determine the number of records for mouse in Entrez.
– http://eutils.ncbi.nlm.nih.gov/entrez/eutils/egquery.fcgi?term=mouse[orgn]&retmode=xml
17. ESpell (spelling suggestions)
• Retrieves spelling suggestions for a text query in a given database.
• Syntax: espell.fcgi?term=<query>&db=<database>
– Input: Entrez text query (&term); Entrez database (&db)
– Output: XML containing the original query and spelling suggestions.
• Example: Find spelling suggestions for the PubMed Central (PMC) query "osteosacoma".
– http://eutils.ncbi.nlm.nih.gov/entrez/eutils/espell.fcgi?term=osteosacoma&db=pmc
18. EInfo (database statistics)
• Provides the number of records indexed in each field of a given database, the date of the last update of the database, and the available links from the database to other Entrez databases.
• Syntax: einfo.fcgi?db=<database>
– Input: Entrez database (&db)
– Output: XML containing database statistics
• Example: Find database statistics for Entrez Protein.
– http://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=protein
19. EPost (UID uploads)
• Accepts a list of UIDs from a given database, stores the set on the History Server, and responds with a query key and web environment for the uploaded dataset.
• Syntax: epost.fcgi?db=<database>&id=<uid_list>
– Input: List of UIDs (&id); Entrez database (&db)
– Output: Web environment (&WebEnv) and query key (&query_key) parameters specifying the location on the Entrez history server of the list of uploaded UIDs
• Example: Upload five Gene IDs (7173, 22018, 54314, 403521, 525013) for later processing.
– http://eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi?db=gene&id=7173,22018,54314,403521,525013
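For longer UID lists, the same EPost parameters can be sent in the body of an HTTP POST instead of the URL. The sketch below builds such a request and parses the two values EPost returns. The helper names are hypothetical, the sample XML is hand-written with a placeholder `WebEnv`, and the element names (`QueryKey`, `WebEnv`) are my understanding of the EPost output.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"  # assumption: HTTPS form

# Hand-written sample in the shape of an EPost response (placeholder values).
SAMPLE_EPOST = "<ePostResult><QueryKey>1</QueryKey><WebEnv>EXAMPLE_WEBENV</WebEnv></ePostResult>"


def epost_request(db, uids):
    """Build (but do not send) a POST request uploading `uids` to the History Server."""
    body = urlencode({"db": db, "id": ",".join(str(u) for u in uids)})
    # Supplying `data` makes urllib issue a POST rather than a GET.
    return Request(BASE_URL + "epost.fcgi", data=body.encode("ascii"))


def parse_epost(xml_text):
    """Return (query_key, web_env) from an EPost XML response."""
    root = ET.fromstring(xml_text)
    return root.findtext("QueryKey"), root.findtext("WebEnv")


request = epost_request("gene", [7173, 22018, 54314, 403521, 525013])
print(request.get_method(), request.full_url)
print(parse_epost(SAMPLE_EPOST))
```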
20. Application 1
• Find human genes related to articles retrieved with the non-exploded MeSH major topic "Osteosarcoma" (PubMed → Gene)
1. http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=%22osteosarcoma%22[majr:noexp]&usehistory=y
2. http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&db=gene&query_key=1&WebEnv=NCID_1_220057266_130.14.18.34_9001_1396281951_1196950266&term=%22homo+sapiens%22[organism]&cmd=neighbor_history
3. http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=gene&query_key=3&WebEnv=NCID_1_220057266_130.14.18.34_9001_1396281951_1196950266
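Steps 2 and 3 are chained through the History Server: the `query_key` and `WebEnv` returned by one call are passed to the next. The sketch below shows that hand-off on a hand-written sample. The `LinkSetDbHistory/QueryKey` layout is my understanding of what ELink returns with `cmd=neighbor_history`, the `WebEnv` value is a placeholder, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

BASE_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"  # assumption: HTTPS form

# Hand-written sample in the shape of an ELink neighbor_history response.
SAMPLE_ELINK = """<eLinkResult><LinkSet>
  <DbFrom>pubmed</DbFrom>
  <LinkSetDbHistory>
    <DbTo>gene</DbTo><LinkName>pubmed_gene</LinkName><QueryKey>3</QueryKey>
  </LinkSetDbHistory>
  <WebEnv>EXAMPLE_WEBENV</WebEnv>
</LinkSet></eLinkResult>"""


def history_from_elink(xml_text):
    """Return (query_key, web_env) for the linked set stored by ELink."""
    root = ET.fromstring(xml_text)
    return root.findtext(".//LinkSetDbHistory/QueryKey"), root.findtext(".//WebEnv")


def esummary_from_history(db, query_key, web_env):
    """Build the ESummary URL for a set already stored on the History Server."""
    params = {"db": db, "query_key": query_key, "WebEnv": web_env}
    return BASE_URL + "esummary.fcgi?" + urlencode(params)


key, env = history_from_elink(SAMPLE_ELINK)
print(esummary_from_history("gene", key, env))
```

Reading the key from the response, rather than hard-coding it, avoids breaking the pipeline if the numbering of stored sets changes.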
21. Application 1
• Find human genes related to articles retrieved with the non-exploded MeSH major topic "Osteosarcoma" (PubMed → Gene)
– ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2pubmed.gz
• It can be used instead of "ELink".
– ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz
• It can be used instead of "ESummary".
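As a rough sketch of the file-based alternative to ELink, the function below scans gene2pubmed rows for a set of PMIDs and one taxonomy ID. It assumes the tab-separated layout `tax_id`, `GeneID`, `PubMed_ID` with a `#`-prefixed header line; the rows in `SAMPLE` are made up for illustration, and `genes_for_pmids` is a hypothetical name.

```python
import csv
import io
from collections import defaultdict


def genes_for_pmids(lines, pmids, tax_id="9606"):
    """Map each requested PMID to the Gene IDs listed for `tax_id` (9606 = human)."""
    wanted = {str(p) for p in pmids}
    hits = defaultdict(set)
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and the header
        row_tax, gene_id, pmid = row[:3]
        if row_tax == tax_id and pmid in wanted:
            hits[pmid].add(gene_id)
    return dict(hits)


# Made-up rows in the assumed gene2pubmed layout.
SAMPLE = "#tax_id\tGeneID\tPubMed_ID\n9606\t1001\t111\n10090\t2002\t111\n9606\t1003\t222\n"
print(genes_for_pmids(io.StringIO(SAMPLE), [111]))  # {'111': {'1001'}}

# With the real download: gzip.open("gene2pubmed.gz", "rt") yields the same kind of lines.
```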
22. Application 2
• Find nucleotide sequences of "Burkholderia cepacia complex" and download in GenBank format
1. http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nuccore&term=%22burkholderia+cepacia+complex%22[organism]&usehistory=y
2. http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&query_key=1&WebEnv=NCID_1_264773253_130.14.22.215_9001_1396244608_457974498&rettype=gb&retmode=text
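A large result set stored on the History Server is usually downloaded in batches rather than in one request. The sketch below generates one EFetch URL per batch using the `retstart` and `retmax` parameters; `efetch_batches` is a hypothetical helper, the batch size of 500 echoes the `&retmax=500` suggestion in the usage guidelines, and `total` would come from the `Count` reported by step 1.

```python
from urllib.parse import urlencode

BASE_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"  # assumption: HTTPS form


def efetch_batches(db, query_key, web_env, total, batch_size=500,
                   rettype="gb", retmode="text"):
    """Yield EFetch URLs that together cover records 0 .. total-1 of a stored set."""
    for retstart in range(0, total, batch_size):
        params = {
            "db": db,
            "query_key": query_key,
            "WebEnv": web_env,
            "retstart": retstart,   # index of the first record in this batch
            "retmax": batch_size,   # maximum number of records in this batch
            "rettype": rettype,
            "retmode": retmode,
        }
        yield BASE_URL + "efetch.fcgi?" + urlencode(params)


# Placeholder WebEnv and an invented total of 1200 records.
for url in efetch_batches("nuccore", 1, "EXAMPLE_WEBENV", total=1200):
    print(url)
```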
23. Application 3
• Find "cancer copy number" articles with "Affymetrix Genome-Wide Human SNP Array" platform GEO Datasets
[Flowchart; elements as labeled on the slide]
– PubMed branch: esearch.fcgi?db=pubmed with the query cancer "copy number" → WebEnv, query_key → esummary.fcgi?db=pubmed; elink.fcgi?dbfrom=pubmed&db=gds
– GEO branch: esearch.fcgi?db=gds with the query Affymetrix "Genome-Wide" Human "SNP Array" AND gpl[Filter] → WebEnv, query_key → esummary.fcgi?db=gds → platforms GPL9704, GPL8226, GPL6804, GPL6801 → esearch.fcgi?db=gds
– Parsing → Common → Result table (PubMed title)
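One plausible reading of the "Common" step in this flowchart is a set intersection: keep the articles whose linked GEO records also appear among the records found for the chosen platforms. The sketch below shows only that step, on invented IDs; the function name and both inputs are assumptions about how the parsed ELink and ESearch results might be held in memory.

```python
def articles_on_platforms(article_to_gds, platform_gds):
    """Return {pmid: shared GEO UIDs} for articles linked to any platform record."""
    platform_gds = set(platform_gds)
    common = {}
    for pmid, linked in article_to_gds.items():
        shared = set(linked) & platform_gds
        if shared:
            common[pmid] = sorted(shared)
    return common


# Invented IDs standing in for parsed results.
links = {"pmid_A": ["g1", "g2"], "pmid_B": ["g3"], "pmid_C": []}
on_platform = ["g2", "g3", "g9"]
print(articles_on_platforms(links, on_platform))  # {'pmid_A': ['g2'], 'pmid_B': ['g3']}
```

The PubMed titles for the surviving PMIDs would then come from the ESummary output of the PubMed branch.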
30. EBot
• EBot is an interactive web tool that first allows users to construct an arbitrary E-utility analysis pipeline and then generates a Perl script to execute the pipeline. The Perl script can be downloaded and executed on any computer with a Perl installation. For more details, see the EBot page linked below.
– http://www.ncbi.nlm.nih.gov/Class/PowerTools/eutils/ebot/ebot.cgi
31. Entrez Direct
• E-utilities on the UNIX Command Line
• Download from ftp://ftp.ncbi.nih.gov/entrez/entrezdirect/
• Entrez Direct Functions
– esearch performs a new Entrez search using terms in indexed fields.
– elink looks up neighbors (within a database) or links (between databases).
– efilter filters or restricts the results of a previous query.
– efetch downloads records or reports in a designated format.
– xtract converts XML into a table of data values.
– einfo obtains information on indexed fields in an Entrez database.
– epost uploads unique identifiers (UIDs) or sequence accession numbers.
– nquire sends a URL request to a web page or CGI service.
• Entering Query Commands
– esearch -db pubmed -query "opsin gene conversion" | elink -related
32. Links
• References
– Entrez Programming Utilities Help
• http://www.ncbi.nlm.nih.gov/books/NBK25501/
– Entrez Help
• http://www.ncbi.nlm.nih.gov/books/NBK3836/
• Useful Links
– Entrez Unique Identifiers (UIDs) for selected databases
• http://www.ncbi.nlm.nih.gov/books/NBK25497/table/chapter2.chapter2_table1/?report=objectonly
– Valid values of &retmode and &rettype for EFetch (null = empty string)
• http://www.ncbi.nlm.nih.gov/books/NBK25499/table/chapter4.chapter4_table1/?report=objectonly
– The full list of Entrez links
• http://eutils.ncbi.nlm.nih.gov/entrez/query/static/entrezlinks.html