Interoperable data have been a long-time goal in many scientific communities. The recent growth in analysis, visualization, and mash-up applications that expect data stored in a standardized manner has brought the interoperability issue to the fore. On the other hand, producing interoperable data is often regarded as a sideline task in a typical research team, one for which resources are not readily available. The HDF Group is developing a software tool aimed at lessening the burden of creating data in standards-compliant, interoperable HDF5 files. The tool, named HDF Product Designer, lowers the threshold for designing such files by providing a user interface that combines the rich HDF5 feature set with applicable metadata conventions. Users can quickly devise new HDF5 files while seamlessly incorporating the latest best practices and conventions from their community. That is what the term interoperability in the first mile means: enabling generation of interoperable data in HDF5 files from the onset of their production. The tool also incorporates collaborative features, allowing a team approach to file design as well as easy transfer of best practices as they are developed. The current state of the tool and plans for future development will be presented. Constructive input from interested parties is always welcome.
HDF Augmentation: Interoperability in the Last Mile (Ted Habermann)
Science data files are generally written to serve well-defined purposes for a small science team. In many cases, the organization of the data and the metadata is designed for custom tools developed and maintained by and for the team. Using these data outside of this context often involves restructuring, re-documenting, or reformatting the data. This expensive and time-consuming process usually prevents data reuse and thus decreases the total life-cycle value of the data considerably. If the data are unique or critically important to solving a particular problem, they can be modified into a more generally usable form or metadata can be added in order to enable reuse. This augmentation process can be done to enhance data for the intended purpose or for a new purpose, to make the data available to new tools and applications, to make the data more conventional or standard, or to simplify preservation of the data. The HDF Group has addressed augmentation needs in many ways: by adding extra information, by renaming objects or moving them around in the file, by reducing the complexity of the organization, and sometimes by hiding data objects that are not understood by specific applications. In some cases these approaches require rewriting the data into new files; in others the augmentation can be done externally, without affecting the original file. We will describe and compare several examples of each approach.
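As a rough illustration of the in-place augmentation idea (not The HDF Group's actual tooling; the file name, object paths, and attribute values below are hypothetical), attributes can be added and objects moved with h5py without rewriting the file:

```python
import h5py

# Hypothetical file and object paths, purely for illustration.
with h5py.File("granule.h5", "a") as f:
    # Add conventional metadata so generic tools can interpret the data.
    dset = f["/raw/temp"]
    dset.attrs["units"] = "K"
    dset.attrs["long_name"] = "brightness temperature"

    # Move (rename) the dataset to a more conventional location in the file.
    f.require_group("/science")
    f.move("/raw/temp", "/science/brightness_temperature")

    # Hide an object that confuses some applications by moving it aside
    # instead of deleting it; the data stays in the file.
    if "/raw/debug_table" in f:
        f.require_group("/internal")
        f.move("/raw/debug_table", "/internal/debug_table")
```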
Using Neo4j for exploring the research graph connections made by RD-Switchboard (amiraryani)
In this talk, Jingbo Wang (NCI) and Amir Aryani (ANDS) presented Neo4j queries that can help data managers explore the connections between datasets, researchers, grants, and publications using the graph model and the Research Data Switchboard. In addition, they discussed a paper on "Graph connections made by RD-Switchboard using NCI's metadata", presented at the Reproducible Open Science workshop in Hannover in September 2016.
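As a hedged illustration of the kind of query involved (the node labels, relationship type, property names, and connection settings below are assumptions, not the actual Research Data Switchboard schema), such exploration can be driven from Python with the official neo4j driver:

```python
from neo4j import GraphDatabase

# Connection details and graph schema are hypothetical placeholders.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# Publications and grants connected to datasets from a given source,
# assuming Dataset/Publication/Grant labels and a generic RELATES_TO edge.
QUERY = """
MATCH (d:Dataset)-[:RELATES_TO]-(p:Publication)
WHERE d.source = $source
OPTIONAL MATCH (d)-[:RELATES_TO]-(g:Grant)
RETURN d.title AS dataset,
       collect(DISTINCT p.title) AS publications,
       collect(DISTINCT g.title) AS grants
LIMIT 25
"""

with driver.session() as session:
    for record in session.run(QUERY, source="NCI"):
        print(record["dataset"], record["publications"], record["grants"])

driver.close()
```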
3.24.15 Slides, “New Possibilities: Developments with DSpace and ORCID” (DuraSpace)
Hot Topics: The DuraSpace Community Webinar Series
Series 11: Integrating ORCID Persistent Identifiers with DSpace, Fedora and VIVO
Webinar 1: “New Possibilities: Developments with DSpace and ORCID”
Tuesday, March 24, 2015
Curated by Josh Brown, ORCID
Presented by: Bram Luyten, Co-Founder, @mire - Andrea Bollini, CRIS Solution Product Manager, CINECA - Michele Mennielli, International Relations Manager, CINECA - João Moreira, Head of Scientific Information, FCT-FCCN - Paulo Graça, RCAAP Team Member
Large Scale Image Forensics using Tika and Tensorflow [ICMR MFSec 2017] (Thamme Gowda)
This paper describes the applications of deep learning-based image recognition in the DARPA Memex program and its repository of 1.4 million weapons-related images collected from the Deep web. We develop a fast, efficient, and easily deployable framework for integrating Google’s Tensorflow framework with Apache Tika for automatically performing image forensics on the Memex data. Our framework and its integration are evaluated qualitatively and quantitatively, and our work suggests that automated, large-scale, and reliable image classification and forensics can be widely used and deployed in bulk analysis for answering domain-specific questions.
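The Memex-specific models and the Tika integration are not reproduced here, but the TensorFlow side of such a bulk image-classification workflow can be sketched roughly as follows (the pretrained ImageNet model and the directory path stand in for the paper's weapons-domain setup):

```python
import numpy as np
import tensorflow as tf
from pathlib import Path

# A pretrained ImageNet classifier stands in for the paper's domain model.
model = tf.keras.applications.InceptionV3(weights="imagenet")

def classify(path):
    # Load and preprocess one image to the network's expected 299x299 input.
    img = tf.keras.utils.load_img(path, target_size=(299, 299))
    x = tf.keras.utils.img_to_array(img)
    x = tf.keras.applications.inception_v3.preprocess_input(x[np.newaxis, ...])
    preds = model.predict(x, verbose=0)
    # Return the top-3 (label, probability) pairs.
    return tf.keras.applications.inception_v3.decode_predictions(preds, top=3)[0]

# Bulk analysis over a (hypothetical) crawl directory.
for image_path in Path("memex_images").glob("*.jpg"):
    for _, label, prob in classify(str(image_path)):
        print(f"{image_path.name}: {label} ({prob:.2f})")
```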
This is a talk from the Coalition for Networked Information Fall 2010 Member Meeting (CNIfall2010). I talked about our project to use Fedora as archival storage for social science research data and documentation.
DSpace-CRIS: new features and contribution to the DSpace mainstream (Andrea Bollini)
The presentation focuses on the latest releases of DSpace-CRIS, compatible with DSpace 5 and 6, with exciting new features. Particularly interesting is the recent integration between DSpace-CRIS and CKAN, released as an independent module. The DSpace-CKAN Integration Module has already been released as open source (same license as DSpace) and can easily be adopted by standard DSpace installations as well, with both JSPUI and XMLUI.
Starting with DSpace-CRIS 5.6.1, along with the security fixes of DSpace JSPUI 5.6, the following features have been introduced: an extendible UI to deliver bitstreams with dedicated viewers; simple metadata editing of any DSpace object; editing of archived items using the submission UI; a deduplication and duplicate-alert tool; improved ORCID synchronization; an improved submission form; an improved security model for CRIS entities; creation of CRIS objects as part of the submission process; automatic calculation of metrics; an advanced import framework; on-demand DOI registration; and template services.
The DSpace-CKAN Integration Module allows users to preview dataset content deposited in a CKAN instance directly from DSpace via a “curation task”. DSpace-CRIS and DSpace-CKAN will also be supported by 4Science for future major versions of the platform, and the roadmap to DSpace 7 compatibility will also be presented.
As of Drupal 7 we'll have RDFa markup in core. In this session I will:
- explain the implications of this and why it matters
- give a short introduction to the Semantic Web, RDF, RDFa and SPARQL in human language
- give a short overview of the RDF modules that are available in contrib
- talk about some of the potential use cases of all these magical technologies
Presentation given* at the 13th International Semantic Web Conference (ISWC), in which we present a compressed format to represent RDF data streams. See the original article at: http://dataweb.infor.uva.es/wp-content/uploads/2014/07/iswc14.pdf
* Presented by Alejandro Llaves (http://www.slideshare.net/allaves)
Introduction of semantic technology for SAS programmers (Kevin Lee)
There is a new technology for expressing and searching data that can provide more meaning and relationships: semantic technology. Semantic technology can easily add, change, and implement meaning and relationships in current data. Companies such as Facebook and Google already use it; for example, Facebook Graph Search uses semantic technology to provide more meaningful search for users.
The paper will introduce the basic concepts of semantic technology and its graph data model, the Resource Description Framework (RDF). RDF links data elements in a self-describing way using three parts: subject, predicate, and object. The paper will introduce applications and examples of RDF elements, as well as three different representations of RDF: RDF/XML, Turtle, and N-Triples.
The paper will also introduce the “CDISC standards RDF representation, Reference and Review Guide” published by CDISC and PhUSE CSS, discuss the RDF representation, reference, and review guide, and show how CDISC standards are represented and displayed in RDF format.
The paper will also introduce the SPARQL Protocol and RDF Query Language (SPARQL), which can retrieve and manipulate data in RDF format, and show how programmers can use SPARQL to re-represent the RDF form of CDISC standards metadata as structured tabular data.
Finally, the paper will discuss the benefits and future of semantic technology, what it means to SAS programmers, and how programmers can take advantage of this new technology.
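For readers who want to see the triple model and SPARQL in action outside of SAS, here is a minimal Python/rdflib sketch; the vocabulary is invented for illustration and is not the CDISC RDF representation:

```python
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/study/")

g = Graph()
# Each triple links a subject and an object through a predicate.
g.add((EX.Demographics, EX.hasVariable, EX.AGE))
g.add((EX.AGE, EX.label, Literal("Age in years")))
g.add((EX.AGE, EX.datatype, Literal("integer")))

# Serialize the same graph in Turtle, one of the representations mentioned above.
print(g.serialize(format="turtle"))

# SPARQL re-represents the graph content as a tabular result set.
results = g.query("""
    PREFIX ex: <http://example.org/study/>
    SELECT ?variable ?label ?datatype WHERE {
        ex:Demographics ex:hasVariable ?variable .
        ?variable ex:label ?label ;
                  ex:datatype ?datatype .
    }
""")
for row in results:
    print(row.variable, row.label, row.datatype)
```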
Distributed Query Processing for Federated RDF Data Management (OlafGoerlitz)
PhD defense talk about SPLENDID, a state-of-the-art implementation for efficient distributed SPARQL query processing on Linked Data using SPARQL endpoints and voiD descriptions.
In grammars we trust: LeadMine, a knowledge driven solution (NextMove Software)
We present a system employing large grammars and dictionaries to recognize a broad range of chemical entities. The system utilizes these resources to identify chemical entities without an explicit tokenization step. To allow recognition of terms slightly outside the coverage of these resources we employ spelling correction, entity extension, and merging of adjacent entities. Recall is enhanced by the use of abbreviation detection and precision is enhanced by the removal of abbreviations of non-entities. With the use of training data to produce further dictionaries of terms to recognize/ignore, our system achieved 86.2% precision and 85.0% recall on an unused development set.
Tackling the difficult areas of chemical entity extraction: Misspelt chemical... (dan2097)
Extracting the structures of small molecules from unstructured text is now a mature field; however, there remain areas that present considerable difficulty or have until this point remained unexplored.
One such area is the identification of chemical names with misspellings or errors introduced by optical character recognition. The approach we have taken employs a formal grammar describing the syntax of a systematic name. To provide coverage over the vast majority of organic nomenclature, including carbohydrates, amino acids and natural products, we have developed a new way of representing the grammar that allows an order of magnitude more states than previous efforts (1) while simultaneously reducing memory consumption. We will also describe a heuristic algorithm for efficiently performing spelling correction against this grammar.
Another area that remains underexplored is the identification and resolution of chemical line formulae, in which we also include domain-specific line formulae such as those used to describe oligosaccharides and peptides. We describe the recognition and resolution of these often overlooked chemical entities.
We also show how one can identify entities such as journal and patent references, which can aid in the navigation of semantically enhanced documents.
(1) Sayle, R.; Xie, P. H.; Muresan, S. Improved Chemical Text Mining of Patents with Infinite Dictionaries and Automatic Spelling Correction. J. Chem. Inf. Model. 2011, 52, 51–62.
Presented at the seminar Libraries and the Semantic Web: the role of International Standard Bibliographic Description (ISBD), National Library of Scotland, Edinburgh, 25 Feb 2011
NASA's Earth Observing System (EOS) archive includes data collected over many years by many satellite instruments. These data are stored in the HDF format that includes data and metadata. The content of the metadata was examined for compliance with a set of conventions developed by the NASA science community at the beginning of the EOS Project (the HDF-EOS conventions). The initial results show that ~50% of the data files and 76% of the datasets have metadata that allows them to be used easily in standard tools. This talk was presented at the ESIP (esipfed.org) meeting during January 2014.
New data access paradigms support a variety of human and machine access paths with data servers (THREDDS, https://www.unidata.ucar.edu/software/thredds/current/tds/ and Hyrax, http://opendap.org) that support multiple services for a given dataset. We need metadata that can describe those services and unambiguously differentiate between access paths for humans and for machines. The ISO 19115 metadata standard includes service metadata and allows data and services for that data to be described in the same record. I propose that we use the service metadata for machine access and the more traditional distribution information for human access. This talk was presented at the ESIP (esipfed.org) meeting during January 2014.
The NASA Earth Science Data and Information System (ESDIS) is migrating documentation for their data and products towards International Standards developed by ISO Technical Committee 211 (ISO/TC211). In order to do this effectively, NASA must understand and participate in the ISO process. This presentation was given at a NASA ISO Seminar during November 2012. It outlines the ISO standards process and describes some extensions to the ISO standards that are being proposed to address ESDIS requirements not addressed in the original standard.
The ISO Metadata Standards include the capability to add citations to many kinds of external resources. This is very important for providing complete documentation required to understand and reproduce scientific results.
For many years metadata development activities have focused on developing and sharing metadata for discovering data. This is important. Once data are discovered, metadata supporting use and understanding become important. Efforts to encourage scientists and data providers to create those metadata have had limited success. This talk describes some approaches and tools for supporting the organizational change efforts required to integrate use and understanding metadata into organizational cultures. These approaches are described in terms of the ideas presented in Switch: How to Change Things When Change is Hard.
We are interested in developing a standard method for writing ISO TC211 compliant metadata into HDF data files. This presentation shows some initial workflows for this using the HDF Product Designer.
Science platforms are made up of (at least) four planks: data formats, services, tools and conventions. I focus here on formats and conventions, specifically the HDF5 format, already used in many disciplines, and the Climate-Forecast and HDF-EOS Conventions. Many science disciplines have already agreed on HDF as the preferred format for storing and sharing data. It is well established in high performance computing and supports arbitrary grouping and annotation. Community conventions are critical for useful data on top of the format. The Climate-Forecast (CF) conventions were created for relatively simple gridded data types while the HDF-EOS conventions originally considered more complex data (swaths). Making simple conventions more complex makes adoption more difficult. Community input and the need for stable data processing systems must be balanced in governance of conventions.
Communities use many different dialects to document their data. We need to be able to translate between these dialects and to understand how much is lost in translation.
Wikis, Rubrics and Views: An Integrated Approach to Improving Documentation (Ted Habermann)
For many years scientists and data managers have focused on creating metadata that supports the discovery of available data. This is important, but once data sets are discovered, users need metadata that supports use and understanding of those data. This talk describes a system developed to support the required metadata improvements using wikis, rubrics, and metadata views. The wikis provide a mechanism for the community to record experiences and lessons learned and to share high-quality examples. Rubrics provide a mechanism for consistent and clear quantitative evaluation of the completeness of metadata records, and the results displays include integrated links to the wiki. Views present the metadata with connections to the wiki and support ongoing interactive learning. These tools can be used with metadata from any standard and can facilitate translation of metadata between multiple standards.
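A rubric of the kind described can be reduced, in spirit, to a completeness check of a metadata record against a list of recommended fields. A minimal sketch, with an illustrative field list rather than the actual rubric:

```python
# Recommended discovery/use attributes (illustrative ACDD-like subset).
RECOMMENDED_FIELDS = [
    "title", "summary", "keywords", "creator_name",
    "time_coverage_start", "time_coverage_end",
    "geospatial_lat_min", "geospatial_lat_max",
]

def rubric_score(record):
    """Return the completeness fraction and the list of missing fields."""
    missing = [f for f in RECOMMENDED_FIELDS if not record.get(f)]
    score = 1.0 - len(missing) / len(RECOMMENDED_FIELDS)
    return score, missing

record = {"title": "Sea surface temperature", "summary": "Daily global SST",
          "keywords": "oceans"}
score, missing = rubric_score(record)
print(f"Completeness: {score:.0%}; missing: {missing}")
```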
The HDF format is the foundation for sharing data in many communities that have created domain-specific conventions on top of HDF. This presentation was given at the Winter meeting of the Earth Science Information Partnership (ESIP).
ISO Metadata Improvements - Questions and Answers (Ted Habermann)
The ISO Standards for describing geospatial data, services, and other resources are changing. These slides describe a few of these changes in terms of documentation needs and how the new standards address those needs. I presented these slides at a recent webinar that is available at https://www.youtube.com/watch?v=un-PtJLclIM&feature=youtu.be
Can ISO 19157 support current NASA data quality metadata? (Ted Habermann)
ISO 19157 provides a powerful framework for describing quality of Earth science datasets. As NASA migrates towards using that standard, it is important to understand whether and how existing data quality content fits into the ISO 19157 model. This talk demonstrates that fit and concludes that ISO 19157 can include all existing content and also includes new capabilities that can be very useful for all kinds of NASA data users.
Should We Expect a Bang or a Whimper? Will Linked Data Revolutionize Scholar Authoring and Workflow Tools?
Jeff Baer, Senior Director of Product Management, Research Development Services, Proquest
A preponderance of data from NASA's Earth Observing System (EOS) is archived in the HDF Version 4 (HDF4) format. The long-term preservation of these data is critical for climate and other scientific studies going many decades into the future. HDF4 is very effective for working with the large and complex collection of EOS data products. Unfortunately, because of the complex internal byte layout of HDF4 files, future readability of HDF4 data depends on preserving a complex software library that can interpret that layout. Having a way to access HDF4 data independent of a library could improve its viability as an archive format, and consequently give confidence that HDF4 data will be readily accessible forever, even if the HDF4 library is gone.
To address the need to simplify long-term access to EOS data stored in HDF4, a collaborative project between The HDF Group and NASA Earth Science Data Centers is implementing an approach to accessing data in HDF4 files based on the use of independent maps that describe the data in HDF4 files and tools that can use these maps to recover data from those files. With this approach, relatively simple programs will be able to extract the data from an HDF4 file, bypassing the need for the HDF4 library.
A demonstration project has shown that this approach is feasible. This involved an assessment of NASA's HDF4 data holdings and development of a prototype XML-based layout mapping language, along with tools to generate layout maps and to read HDF4 files using those maps. Future plans call for a second phase of the project, in which the mapping tools and XML schema are made production quality, the mapping schema is integrated with existing XML metadata files in several data centers, and outreach activities are carried out to encourage and facilitate acceptance of the technology.
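The essence of the map-based approach, reading bytes at offsets recorded in an external map instead of calling the HDF4 library, can be sketched in a few lines of Python. The map entry below is a made-up stand-in for a simple contiguous dataset, not the project's actual XML mapping language (which also has to handle chunked and compressed layouts):

```python
import numpy as np

# Hypothetical map entry: where one unchunked, uncompressed dataset's bytes
# live in the HDF4 file, plus enough information to interpret them.
map_entry = {
    "name": "Brightness_Temperature",
    "offset": 294912,          # byte offset of the data block in the file
    "dtype": ">i2",            # big-endian 16-bit integers
    "shape": (2030, 1354),     # rows x columns
}

def read_mapped_dataset(path, entry):
    """Recover a dataset from an HDF4 file using only its layout map."""
    count = int(np.prod(entry["shape"]))
    with open(path, "rb") as f:
        f.seek(entry["offset"])
        data = np.fromfile(f, dtype=entry["dtype"], count=count)
    return data.reshape(entry["shape"])

array = read_mapped_dataset("MOD021KM.A2014001.hdf", map_entry)
print(array.shape, array.dtype)
```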
Semantic Integration with Apache Jena and Stanbol (All Things Open)
All Things Open 2014 - Day 1
Wednesday, October 22nd, 2014
Phillip Rhodes
Founder & President of Fogbeam Labs
Big Data
Semantic Integration with Apache Jena and Stanbol
Keynote at the Chilean Week of Computer Science. I present a brief overview of recommender system algorithms and then present my work on tag-based recommendation, implicit feedback, and visual interactive interfaces.
ResourceSync: Web-Based Resource Synchronization. Also for Data.
Herbert Van de Sompel, Digital Library Researcher, Los Alamos National Laboratory, and Co-chair of NISO’s ResourceSync Working Group
Web applications frequently leverage resources made available by remote Web servers. As resources are created, updated, or deleted these applications face challenges to remain in lockstep with the server’s change dynamics. Several approaches exist to help meet this challenge for use cases where “good enough” synchronization is acceptable. But when strict resource coverage or low synchronization latency is required, commonly accepted Web-based solutions remain elusive. Motivated by the need to synchronize resources for applications in the realm of cultural heritage and research communication, the National Information Standards Organization (NISO) and the Open Archives Initiative (OAI) have launched the ResourceSync project that aims at designing an approach for resource synchronization that is aligned with the web architecture and that has a fair chance of adoption by different communities. The presentation will discuss some motivating use cases and will provide a perspective on the resource synchronization problem that results from ResourceSync project discussions. It will provide an overview of the ongoing thinking regarding an approach to address the challenges and will pay special attention to aspects that are relevant for the synchronization of data.
Brief information about the SCOP protein database used in bioinformatics.
The Structural Classification of Proteins (SCOP) database is a comprehensive and authoritative resource for the structural and evolutionary relationships of proteins. It provides a detailed and curated classification of protein structures, grouping them into families, superfamilies, and folds based on their structural and sequence similarities.
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ... (Sérgio Sacani)
We characterize the earliest galaxy population in the JADES Origins Field (JOF), the deepest imaging field observed with JWST. We make use of the ancillary Hubble optical images (5 filters spanning 0.4−0.9 µm) and novel JWST images with 14 filters spanning 0.8−5 µm, including 7 medium-band filters, and reaching total exposure times of up to 46 hours per filter. We combine all our data at > 2.3 µm to construct an ultradeep image, reaching as deep as ≈ 31.4 AB mag in the stack and 30.3−31.0 AB mag (5σ, r = 0.1" circular aperture) in individual filters. We measure photometric redshifts and use robust selection criteria to identify a sample of eight galaxy candidates at redshifts z = 11.5−15. These objects show compact half-light radii of R_1/2 ~ 50−200 pc, stellar masses of M⋆ ~ 10^7−10^8 M⊙, and star-formation rates of SFR ~ 0.1−1 M⊙ yr^−1. Our search finds no candidates at 15 < z < 20, placing upper limits at these redshifts. We develop a forward modeling approach to infer the properties of the evolving luminosity function without binning in redshift or luminosity that marginalizes over the photometric redshift uncertainty of our candidate galaxies and incorporates the impact of non-detections. We find a z = 12 luminosity function in good agreement with prior results, and that the luminosity function normalization and UV luminosity density decline by a factor of ~2.5 from z = 12 to z = 14. We discuss the possible implications of our results in the context of theoretical models for evolution of the dark matter halo mass function.
Seminar on U.V. Spectroscopy (SAMIR PANDA)
Spectroscopy is a branch of science dealing with the study of the interaction of electromagnetic radiation with matter.
Ultraviolet-visible spectroscopy refers to absorption or reflectance spectroscopy in the UV-VIS spectral region.
Ultraviolet-visible spectroscopy is an analytical method that measures the amount of light absorbed by the analyte.
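The abstract does not state it, but the standard relation underlying such quantitative UV-VIS measurements is the Beer-Lambert law:

```latex
% Beer-Lambert law: absorbance A from incident intensity I_0 and transmitted
% intensity I, with molar absorptivity \varepsilon, concentration c, and path length \ell.
A \;=\; \log_{10}\!\left(\frac{I_0}{I}\right) \;=\; \varepsilon\, c\, \ell
```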
Cancer cell metabolism: special reference to the lactate pathway (AADYARAJPANDEY1)
Normal Cell Metabolism:
Cellular respiration describes the series of steps that cells use to break down sugar and other chemicals to get the energy they need to function.
Energy is stored in the bonds of glucose, and when glucose is broken down, much of that energy is released.
Cells utilize energy in the form of ATP.
The first step of respiration is called glycolysis. In a series of steps, glycolysis breaks glucose into two smaller molecules of a chemical called pyruvate. A small amount of ATP is formed during this process.
Most healthy cells continue the breakdown in a second process, called the Krebs cycle. The Krebs cycle allows cells to “burn” the pyruvate made in glycolysis to get more ATP.
The last step in the breakdown of glucose is called oxidative phosphorylation (Ox-Phos).
It takes place in specialized cell structures called mitochondria. This process produces a large amount of ATP. Importantly, cells need oxygen to complete oxidative phosphorylation.
If a cell completes only glycolysis, only 2 molecules of ATP are made per glucose. However, if the cell completes the entire respiration process (glycolysis, Krebs cycle, oxidative phosphorylation), about 36 molecules of ATP are created, giving it much more energy to use.
IN CANCER CELLS:
Unlike healthy cells that "burn" the entire molecule of sugar to capture a large amount of energy as ATP, cancer cells are wasteful.
Cancer cells only partially break down sugar molecules. They overuse the first step of respiration, glycolysis. They frequently do not complete the second step, oxidative phosphorylation.
This results in only 2 molecules of ATP per each glucose molecule instead of the 36 or so ATPs healthy cells gain. As a result, cancer cells need to use a lot more sugar molecules to get enough energy to survive.
Introduction to the Warburg phenomenon:
Warburg effect: Usually, cancer cells are highly glycolytic (glucose addiction) and take up more glucose from outside than normal cells do.
Otto Heinrich Warburg (8 October 1883 – 1 August 1970) was awarded the Nobel Prize in Physiology or Medicine in 1931 for his “discovery of the nature and mode of action of the respiratory enzyme.”
Warburg effect: the tendency of cancer cells under aerobic (well-oxygenated) conditions to metabolize glucose to lactate (aerobic glycolysis) is known as the Warburg effect. Warburg made the observation that tumor slices consume glucose and secrete lactate at a higher rate than normal tissues.
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt... (Sérgio Sacani)
Since volcanic activity was first discovered on Io from Voyager images in 1979, changes on Io’s surface have been monitored from both spacecraft and ground-based telescopes. Here, we present the highest spatial resolution images of Io ever obtained from a ground-based telescope. These images, acquired by the SHARK-VIS instrument on the Large Binocular Telescope, show evidence of a major resurfacing event on Io’s trailing hemisphere. When compared to the most recent spacecraft images, the SHARK-VIS images show that a plume deposit from a powerful eruption at Pillan Patera has covered part of the long-lived Pele plume deposit. Although this type of resurfacing event may be common on Io, few have been detected due to the rarity of spacecraft visits and the previously low spatial resolution available from Earth-based telescopes. The SHARK-VIS instrument ushers in a new era of high resolution imaging of Io’s surface using adaptive optics at visible wavelengths.
Nutraceutical market, scope and growth: Herbal drug technology (Lokesh Patil)
As consumer awareness of health and wellness rises, the nutraceutical market—which includes goods like functional foods, beverages, and dietary supplements that provide health advantages beyond basic nutrition—is growing significantly. As healthcare expenses rise, the population ages, and people increasingly want natural and preventative health solutions, this industry is growing quickly. Further driving market expansion are product formulation innovations and the use of cutting-edge technology for customized nutrition. With its worldwide reach, the nutraceutical industry is expected to keep growing and provide significant chances for research and investment in a number of categories, including vitamins, minerals, probiotics, and herbal supplements.
The HDF Product Designer – Interoperability in the First Mile
1. The HDF Group | www.hdfgroup.org | December 17, 2014 | American Geophysical Union Fall Meeting
HDF Product Designer: Interoperability in the First Mile
H. Joe Lee (hyoklee@hdfgroup.org), Aleksandar Jelenak, and Ted Habermann
The HDF Group
2.–9. Data Life Cycle – First and Last Miles | www.hdfgroup.org | American Geophysical Union Fall Meeting
[Diagram built up across slides 2–9: the data life cycle stages Question, Data Collection, Processing, Archive, Distribution, Discovery, Analysis, and Repurposing, with labels added in successive builds: Principal Investigator / Someone Else, Experts / Non-Experts, # Users, and Standards and Conventions (with a question mark over where they fit in the cycle).]
Goal (slides 8–9): To facilitate collaborative design of interoperable and standards-compliant data products in HDF5 as early as possible in the mission development process. Interoperability in the First Mile.
10.–13. Mission Data Producer’s Conundrum | www.hdfgroup.org | American Geophysical Union Fall Meeting | December 17, 2014
[Diagram built up across slides 10–13: three sets of concerns, with HDF Product Designer added in the final build.]
• Mission Requirements: science objectives; data processing; data discovery & distribution; data documentation; user engagement, preparedness, feedback
• Interoperability: standards; conventions; best practices; metadata; software tools; netCDF4, CF
• HDF Features: datatypes; groups; attributes; scale/offset; dimension scales; compression; chunking
19.–21. HDF5 Product Design Architecture | www.hdfgroup.org | American Geophysical Union Fall Meeting | December 17, 2014
[Architecture diagram built up across slides 19–21: a Desktop Client, a RESTful Service (Tornado/Python), a Data Store (PostgreSQL), and an HDF5 Server.]
• Flexible Input: HDF5 JSON, HDF4 MAP XML, NcML
• Flexible Output: HDF5 JSON, Fortran, IDL, MATLAB, Python, CSV (Excel), HDF5 File Template
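To make the “HDF5 File Template” output concrete, here is a rough sketch of turning a small JSON-like design (groups, datasets, attributes) into an empty HDF5 template file with h5py; the design structure is invented for illustration and is not HDF Product Designer's actual JSON schema or REST interface:

```python
import h5py

# Hypothetical design document, not the HDF Product Designer JSON schema.
design = {
    "attributes": {"Conventions": "CF-1.6", "title": "Example L2 product"},
    "groups": {
        "ScienceData": {
            "datasets": {
                "radiance": {"shape": (2030, 1354), "dtype": "float32",
                             "attributes": {"units": "W m-2 sr-1 um-1"}},
            }
        }
    },
}

def build_template(path, design):
    """Create an empty HDF5 file whose structure follows the design."""
    with h5py.File(path, "w") as f:
        for key, value in design.get("attributes", {}).items():
            f.attrs[key] = value
        for gname, gspec in design.get("groups", {}).items():
            grp = f.create_group(gname)
            for dname, dspec in gspec.get("datasets", {}).items():
                dset = grp.create_dataset(dname, shape=dspec["shape"],
                                          dtype=dspec["dtype"])
                for key, value in dspec.get("attributes", {}).items():
                    dset.attrs[key] = value

build_template("product_template.h5", design)
```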
22. Convention Support | www.hdfgroup.org | American Geophysical Union Fall Meeting | December 17, 2014
• Initial:
  • NetCDF User’s Guide (NUG)
  • Attribute Convention for Data Discovery (ACDD)
  • Object Convention for Data Discovery (OCDD)
  • Climate and Forecast (CF)
  • HDF-EOS
• Implementation:
  • Conventions for groups and variables
  • Convention for connected variables (e.g. coordinate dimensions)
  • Compliance checkers on the entire file
  • Support for community components
23. Conclusion | www.hdfgroup.org | American Geophysical Union Fall Meeting | December 17, 2014
HDF Product Designer is being built using a flexible architecture to support multiple front and back ends. It will:
• enable individuals and mission teams to design products quickly and easily
• enable collaboration at many levels
• promote data management best practices