UNM Cyberinfrastructure Day 2010 presentation: Applications in Biocomputing, covering biomedical and cheminformatics research computing and cyberinfrastructure issues.
DNA computing has several advantages including performing millions of operations in parallel, using large amounts of data storage in a small space, being lightweight, low power, and environmentally friendly. However, it also faces challenges such as molecular operations not being perfect and having a relatively high error rate. DNA computing shows promise for medical applications such as cancer diagnosis and targeted drug delivery but introducing genetic material into humans safely requires overcoming challenges like immune system reactions. While concerns around ethics and computers/DNA taking control exist, DNA computing remains an emerging field with opportunities in healthcare.
Biological computers use biological components like DNA to store and process data analogous to human body processes. They are implantable devices with a CPU and use DNA as software to monitor body activities and process data faster than traditional computers. DNA contains all genetic information in its molecular structure and biological computers use DNA computing, storing information in DNA molecules that can perform calculations much faster than regular computers using DNA's four basic components - adenine, cytosine, guanine and thymine. While biological computers are more efficient, accurate and environmentally friendly than traditional silicon-based computers, they also face challenges like potential hacking, need for human assistance, DNA degradation over time and rare pairing errors.
Bio computing uses DNA and biochemical processes to store and manipulate data similarly to human biology. DNA can store vast amounts of data densely due to its structure of paired chemical bases. A DNA computer operates massively in parallel and with extraordinary energy efficiency compared to conventional computers. While DNA computing shows potential for medical and data applications, it still requires further development to overcome challenges such as reduced accuracy compared to conventional computing.
Bio computers use systems of biologically derived molecules—such as DNA and proteins—to perform computational calculations involving storing, retrieving, and processing data. The development of biocomputers has been made possible by the expanding new science of nanobiotechnology.
The document discusses DNA computers as an alternative to silicon-based computers. DNA computers use DNA strands as a means of data storage and processing. Some key advantages of DNA computers include massive parallelism, as all DNA strands can be operated on simultaneously, and large storage capacity, as a single gram of DNA can store over 1x10^14 megabytes of data. DNA computers also have error correction mechanisms that allow them to resist viruses. They may help solve computationally difficult problems like the Hamiltonian problem more efficiently through massive parallelization. However, more work is still needed to develop DNA computing into a fully practical technology.
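As a rough sanity check on storage claims like the one above, DNA's data density can be estimated from the average mass of a nucleotide. The figures below (average nucleotide mass of about 330 daltons, 2 bits per base) are back-of-envelope assumptions for illustration, not values taken from the summarized document.

```python
# Back-of-envelope estimate of DNA storage density (assumed figures, order-of-magnitude only).
AVOGADRO = 6.022e23            # molecules per mole
AVG_NT_MASS_G_PER_MOL = 330    # assumed average mass of one nucleotide (g/mol)
BITS_PER_BASE = 2              # A/C/G/T -> 2 bits per base, assuming a simple direct encoding

nucleotides_per_gram = AVOGADRO / AVG_NT_MASS_G_PER_MOL   # ~1.8e21 bases
megabytes_per_gram = nucleotides_per_gram * BITS_PER_BASE / 8 / 1e6

print(f"{nucleotides_per_gram:.2e} bases per gram")
print(f"{megabytes_per_gram:.2e} MB per gram")   # on the order of 10^14-10^15 MB, consistent with the claim
```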
The Blue Brain is a virtual brain, a machine that can function like the human brain and keep working even after the person's death.
This topic covers the functionality of the Blue Brain, its advantages and disadvantages, and what a virtual brain actually is.
Much research is under way in this field, with results expected after roughly 2020. It is a new technology that requires a good understanding of the brain, its internal parts, and their functions. The basic aim is to upload a human brain into a machine so that thinking and remembering take no effort; even after death, the virtual brain could act in the person's place.
Professional Issues in IT course project presentation discussing how DNA can be used to store and manipulate information, and why and how DNA can be used in computing.
This seminar presentation discusses bio molecular computing. It explains that bio molecular computing uses biological molecules like DNA instead of silicon chips for computing. It works by pairing DNA bases and using enzymes to cut or splice DNA molecules. While bio molecular computing has potential advantages in terms of memory, it also faces challenges like being resource intensive, producing errors, and not allowing easy transmission of information. Whether bio molecular computing becomes practical will depend on overcoming these challenges and finding applications where it has clear advantages over traditional computing.
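The Watson-Crick pairing that this summary mentions (A with T, C with G) is easy to illustrate in software. The following is a minimal, hypothetical sketch of computing the reverse complement of a strand; it is not code from the presentation.

```python
# Minimal illustration of Watson-Crick base pairing (A-T, C-G).
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(strand: str) -> str:
    """Return the strand that would hybridize to the input, read 5' to 3'."""
    return "".join(PAIR[base] for base in reversed(strand))

print(reverse_complement("ATCGGC"))  # -> GCCGAT
```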
The document presents an overview of DNA computers. DNA computers use DNA molecules as the data storage medium and enzymes as the processing units. Some key advantages of DNA computers include massive data storage capacity using a small physical space, highly parallel processing, and low cost. However, DNA computers also currently have limitations such as high error rates and the need for human assistance in laboratory procedures. Potential applications of DNA computing include DNA chips, genetic programming, and pharmaceutical analysis. While DNA computers show promise, further work is still needed to develop them into a practical product.
Molecular computing is an emerging field to which chemistry, biophysics, molecular biology, electronic engineering, solid-state physics and computer science all contribute. It involves the encoding, manipulation and retrieval of information at a macromolecular level, in contrast to current techniques, which accomplish these functions via IC miniaturization of bulk devices. Bio-molecular computers have real potential for solving problems of high computational complexity, although many problems are still associated with this field.
DNA computers have potential to replace silicon-based computers by storing vast amounts of data within DNA strands. DNA computers operate in parallel through chemical reactions rather than linearly like silicon. While early DNA computers were test tubes and gold plates, they now include a 2002 gene analysis biochip and a 2003 self-powered programmable computer. DNA computers could be smaller and more powerful than supercomputers, but current challenges include lack of full accuracy and DNA degradation. Further development is still needed but DNA computing shows promise for medical and data processing uses.
This document discusses DNA computing and provides an overview of key concepts. It summarizes Adleman's 1994 experiment solving the Hamiltonian path problem using DNA strands to represent graph connections. While DNA computing shows promise for massively parallel processing, current limitations include slow laboratory procedures and inability to represent data universally. Future advances may address these issues and enable DNA computers to solve certain complex problems not feasible with electronic computers.
Sequencing Genomics: The New Big Data Driver - Larry Smarr
1. Genomic sequencing is driving big data as the cost of sequencing DNA falls faster than Moore's Law and the amount of data produced increases dramatically.
2. The Beijing Genome Institute is the world's largest genomic institute, using over 130 sequencing machines each producing 25 gigabases per day, with over 12 petabytes of data storage in total (a rough throughput calculation follows this list).
3. Interdisciplinary teams of computer scientists, data analysts, and geneticists are needed to analyze the massive amounts of genomic and metagenomic data being produced to gain insights into human health and disease.
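To put the figures quoted above in perspective, here is a rough throughput calculation using only the numbers stated (130 machines at 25 gigabases per day). The bytes-per-base figure is an illustrative assumption, since real storage depends on read format, quality scores, and compression.

```python
# Rough daily throughput implied by the quoted figures (illustrative assumptions).
machines = 130
gigabases_per_machine_per_day = 25
daily_gigabases = machines * gigabases_per_machine_per_day       # 3,250 gigabases/day

bytes_per_base = 1   # assumption: ~1 byte per base for sequence alone, order-of-magnitude only
daily_terabytes = daily_gigabases * 1e9 * bytes_per_base / 1e12  # ~3.25 TB/day of raw bases

print(f"{daily_gigabases} gigabases/day, roughly {daily_terabytes:.2f} TB/day")
```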
DNA has potential as a material for computing due to its large storage capacity and ability to perform calculations in parallel. While DNA computing is currently slow and can only return yes or no answers, researchers are working to address these issues and develop more complex DNA-based computers. Examples include the MAYA computers which used DNA logic gates to play Tic-Tac-Toe, and experiments using light-sensitive bacteria to create a basic binary system for information storage and processing.
The document discusses using DNA tokens for parallel computing. It outlines the basic structure and operations on DNA, including hybridization, denaturation, cutting, and amplification. It then discusses how the travelling salesman problem was solved using DNA computing. An experiment was conducted using DNA tokens to send data between molecular processors. The results showed this approach was successful for massively parallel computation.
High Performance Cyberinfrastructure to Support Data-Intensive Biomedical Res... - Larry Smarr
08.06.16
Invited Talk
Association of University Research Parks BioParks 2008
"From Discovery to Innovation"
Salk Institute
Title: High Performance Cyberinfrastructure to Support Data-Intensive Biomedical Research Instruments
La Jolla, CA
PowerPoint presentation of seminar topic DNA-based computing - Paushali Sen
This document provides an overview of DNA computing. It discusses how Adleman solved the Hamiltonian path problem using DNA molecules to represent the graph and encode possible paths. The key steps involved encoding the cities as DNA sequences, encoding all possible paths as complementary DNA strands, merging the strands so that complementary bases adhere to represent all possible solutions simultaneously, and using various DNA techniques like PCR and electrophoresis. While DNA computing shows promise for massively parallel processing and energy efficiency, current limitations include error rates and the need for manual intervention.
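The generate-and-filter strategy described above can be mimicked in ordinary software. The sketch below enumerates candidate paths for a small directed graph and keeps only the Hamiltonian ones, standing in for the biochemical steps (hybridization, PCR, electrophoresis) that the presentation describes; the graph is a made-up example, not Adleman's original instance.

```python
# In-silico analogue of Adleman's generate-and-filter DNA computation:
# generate candidate paths, then keep only Hamiltonian paths (every vertex
# visited exactly once) from a chosen start vertex to a chosen end vertex.
from itertools import permutations

edges = {("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("B", "D")}  # toy directed graph (assumption)
vertices = {v for e in edges for v in e}

def hamiltonian_paths(start, end):
    for middle in permutations(vertices - {start, end}):
        path = (start, *middle, end)
        if all((u, v) in edges for u, v in zip(path, path[1:])):
            yield path

for p in hamiltonian_paths("A", "D"):
    print(" -> ".join(p))   # e.g. A -> B -> C -> D
```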
The document discusses DNA computers. It explains that DNA computers can store vastly more information than conventional computers and solve complex problems faster. DNA computers use DNA's ability to store genetic information through nucleotide base pairing to process and solve computational problems in a massively parallel way. The first successful DNA computer was demonstrated in 1994 by Leonard Adleman, who used DNA to solve a small instance of the Hamiltonian path problem. The document then provides details on the structure of DNA, including its double helix shape, nucleotide base pairing rules of A-T and C-G, and directionality of strands.
Molecular computers are systems in which molecules or macromolecules individually mediate information processing functions. Molecular computing provides an alternative to computing using silicon integrated circuits. It aims at developing intelligent computers using biological molecules as computational devices. It is a promising means of unconventional computation owing to its capability for massive parallelism. It offers to augment digital computing with biology-like capabilities. This paper provides a brief introduction to molecular computing.
Genomic Big Data Management, Integration and Mining - Emanuel Weitschek, Data Driven Innovation
This document summarizes genomic big data management, integration and mining. It discusses the exponential growth of biological data due to advances in sequencing technologies. Next generation sequencing techniques generate large amounts of short DNA reads. Several public databases contain heterogeneous biological data sources. Effective data management and integration methods are needed to analyze these large and complex datasets. Supervised machine learning can be used to extract knowledge and classify samples. Tools like CAMUR apply rule-based classification to problems like analyzing gene expression from cancer datasets. Future work involves advanced integration systems and new big data approaches for biological data.
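Rule-based classification of the kind attributed to CAMUR above can be illustrated with a toy example. The rules, genes, and thresholds below are invented for illustration only and are not CAMUR's actual output or API.

```python
# Toy rule-based classifier over gene-expression values (illustrative only; not CAMUR).
samples = [
    {"GENE1": 8.2, "GENE2": 1.1, "label": "tumor"},
    {"GENE1": 2.0, "GENE2": 5.4, "label": "normal"},
    {"GENE1": 7.9, "GENE2": 0.8, "label": "tumor"},
]

# Each rule: (gene, threshold, class predicted when expression exceeds the threshold).
rules = [("GENE1", 6.0, "tumor"), ("GENE2", 4.0, "normal")]

def classify(sample, default="unknown"):
    for gene, threshold, predicted in rules:
        if sample.get(gene, 0.0) > threshold:
            return predicted
    return default

correct = sum(classify(s) == s["label"] for s in samples)
print(f"{correct}/{len(samples)} correctly classified")
```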
A DNA computer can store billions of times more information than your PC hard drive and solve complex problems in less time. Computer chip manufacturers are racing to make the next microprocessor that will run faster, but microprocessors made of silicon will eventually reach their limits of speed and miniaturization. Chip makers need a new material to produce faster computing speeds.
2015-08-13 ESA: NextGen tools for scaling from seeds to traits to ecosystems - TimeScience
This document discusses using new technologies to monitor ecosystems and plants at high resolution over time. It proposes collecting detailed data on individual plants and trees in fields and forests through methods like:
- Automated imaging of plant growth in controlled lab conditions
- Sensor networks and remote sensing to generate 3D models of field sites from aerial drones and ground-based laser scanning
- Genotyping every plant to correlate phenotypes with genetics
The goal is to generate massive, multilayer datasets that track environmental and genetic factors over time at plant and ecosystem scales, analogous to advances in high-throughput genomics and phenomics. This would transform ecological understanding and address global challenges around food security and climate change.
1. Synthetic biology enables the extreme genetic engineering of lifeforms through techniques like designing DNA, splicing genes, and synthesizing genomes.
2. There are concerns about potential misuse for biowarfare and rapid digital biopiracy that can bypass traditional benefit-sharing systems.
3. As synthetic biology advances, it may allow for the mass construction of new lifeforms and genomes at an unprecedented scale and speed, with uncertain and potentially dangerous implications.
The document discusses the potential for nanomedicine and cryonics to revolutionize medicine in the future. It describes how nanotechnology could enable the precise arrangement of atoms to create molecular machines like robotic arms and computers smaller than cells. These devices could guide cell repair and restore lost functions, preserving life. The document also discusses cryonics and the possibility that those preserved could be revived in the future when medical technologies are advanced enough to repair the damage of freezing.
The document discusses the growth of data-intensive science and the need for new computing infrastructures to manage the large amounts of data being produced. It covers three perspectives on infrastructure: grid computing which enables sharing of distributed resources over the internet, data centers which provide integrated storage and computing services, and e-science which combines grids, collaboration tools, and data analysis services. Examples are given of different scientific domains using these infrastructures.
The document discusses how computation can accelerate the generation of new knowledge by enabling large-scale collaborative research and extracting insights from vast amounts of data. It provides examples from astronomy, physics simulations, and biomedical research where computation has allowed more data and researchers to be incorporated, advancing various fields more quickly over time. Computation allows for data sharing, analysis, and hypothesis generation at scales not previously possible.
Cool Informatics Tools and Services for Biomedical Research - David Ruau
This document provides an overview of bioinformatics tools and services for analyzing big data in biomedical research. It discusses traditional bioinformatics tools, analyzing genomic data from microarrays and next-generation sequencing without and with code, interpreting results using protein interaction networks and pathways, tools for data storage, cleaning and visualization, and making research reproducible. Galaxy, R, and programming are presented as useful for automated, reproducible analysis of large genomic datasets.
Introduction to Biological Network Analysis and Visualization with Cytoscape ... - Keiichiro Ono
Introduction to biological network analysis and visualization with Cytoscape (using the latest version 3.4).
This is a first half of the lecture for Applied Bioinformatics lecture at TSRI.
This is a talk I gave at a Northwestern University - Complete Genomics Workshop on April 21, 2011 about using clouds to support research in genomics and related areas.
Opening talk at the "Interdisciplinary Data Resources to Address the Challenges of Urban Living" Workshop at the Urban Big Data Centre, University of Glasgow, 4 April 2016
Foundations for the future of science discusses using artificial intelligence and machine learning to advance scientific research. Key points discussed include using AI to analyze large datasets, develop scientific models, and automate experimental workflows. The document also outlines several examples of how the Globus data platform is currently enabling AI-powered scientific applications across multiple domains. Overall, the document advocates that embracing "AI for science" has the potential to accelerate scientific discovery by overcoming limitations in human analysis capabilities and computational resources.
Building an Information Infrastructure to Support Microbial Metagenomic Sciences - Larry Smarr
06.01.14
Presentation for the Microbe Project Interagency Team
Title: Building an Information Infrastructure to Support Microbial Metagenomic Sciences
La Jolla, CA
This document discusses challenges and opportunities for integrating large, heterogeneous biological data sets. It outlines the types of analysis and discovery that could be enabled, such as comparing data across studies. Technical challenges include incompatible identifiers and schemas between data sources. Common solutions attempt standardization but have limitations. The document examines Amazon's approach as a model, with principles like exposing all data through programmatic interfaces. It argues for a "platform" approach and combining data-driven and model-driven analysis to gain new insights. Developing services with end users in mind could help maximize data reuse.
Stephen Friend, Dana Farber Cancer Institute, 2011-10-24 - Sage Base
The document discusses building disease models using data intensive science and open medical information systems, with the goal of better understanding disease biology before testing drugs. It describes the Sage Bionetworks non-profit organization, which aims to create a commons for shared disease maps and models through several pilot projects including clinical trial data sharing and identifying cancer patients who do not respond to approved drug regimens.
This document discusses the challenges of handling large-scale genomic and biological data and proposes potential solutions. It notes that data volumes are increasing rapidly due to advances in sequencing technology but dissemination and data handling methods have not kept pace. Several hurdles to data sharing are described including technical issues around data size, heterogeneity and longevity as well as economic and cultural barriers. Potential solutions discussed include providing incentives for data sharing through attribution and citation, adopting data citation practices using Digital Object Identifiers, establishing funding models for long-term curation, and launching new databases and journals focused on publishing and analyzing large-scale datasets.
In this deck from the 2014 HPC User Forum in Seattle, Jack Collins from the National Cancer Institute presents: Genomes to Structures to Function: The Role of HPC.
Watch the video presentation: http://wp.me/p3RLHQ-d28
08.04.14
Invited Talk
National Astrobiology Institute Executive Council Meeting
Astrobiology Science Conference 2008
Santa Clara Convention Center
Title: High Performance Collaboration
Santa Clara, CA
Taverna is a free and open-source workflow management system that allows researchers to design and execute scientific workflows. It was developed by the University of Manchester to support in silico experiments in biology. Taverna provides a graphical user interface for designing workflows using a variety of distributed data sources and web services without having to learn complex programming. It has been widely adopted by researchers in fields such as biology, healthcare, astronomy, and cheminformatics to automate analysis pipelines and share workflows.
This talk presents areas of investigation underway at the Rensselaer Institute for Data Exploration and Applications. First presented at Flipkart, Bangalore India, 3/2015.
Jubatus: Realtime deep analytics for BigData @ Rakuten Technology Conference 2012 - Preferred Networks
Currently, we face new challenges in realtime analytics of BigData, such as social monitoring, M2M sensors, online advertising optimization, smart energy management and security monitoring. To analyze these data, scalable machine learning technologies are essential. Jubatus is an open-source platform for online distributed machine learning on BigData streams. We explain the technologies inside Jubatus and show how it can achieve realtime analytics on various problems.
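The online/streaming learning idea behind Jubatus can be sketched with scikit-learn's incremental API as a generic stand-in (this is not Jubatus's API): each mini-batch arriving from a stream updates the model without retraining from scratch.

```python
# Online (incremental) learning sketch using scikit-learn as a stand-in for Jubatus-style streaming ML.
import numpy as np
from sklearn.linear_model import SGDClassifier

clf = SGDClassifier()                  # linear classifier trained by stochastic gradient descent
classes = np.array([0, 1])
rng = np.random.default_rng(0)

for _ in range(100):                   # each iteration stands in for a mini-batch from a data stream
    X = rng.normal(size=(32, 5))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)   # synthetic labels for illustration
    clf.partial_fit(X, y, classes=classes)    # incremental update, no full retrain

X_test = rng.normal(size=(8, 5))
print(clf.predict(X_test))
```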
The National Resource for Network Biology aims to provide freely available, open-source software tools to enable researchers to assemble biological data into networks and pathways and use these networks to better understand biological systems and disease; it pursues this mission through technology research and development projects, driving biological projects, collaboration and service projects, training, and dissemination; key components include the Cytoscape software platform, supercomputing infrastructure, and partnerships with over 30 external research groups.
TIGA: Target Illumination GWAS Analytics - Jeremy Yang
Aggregating and assessing experimental evidence for interpretable, explainable, accountable gene-trait associations. Presentation for NIH IDG Annual Meeting, Feb 9-11, 2021.
DrugCentralDb and BioClients: Dockerized PostgreSql with Python API-tizer - Jeremy Yang
DrugCentralDb, a biomedical research database developed at UNM and widely used by drug discovery scientists, has been Dockerized and deployed via AWS EC2. Additionally, we have developed a Python package BioClients, with module 'drugcentral' API for DrugCentral. Source code and Docker image are available via GitHub and DockerHub, respectively. These tools are new and in testing, with full release planned for later in 2020.
Mining ClinicalTrials.gov via CTTI AACT for drug target hypotheses - Jeremy Yang
This document discusses methods for mining clinical trial data from ClinicalTrials.gov to infer disease-target associations. It presents several proposed confidence metrics for evaluating these associations and provides examples. Over 1.2 million associations were found involving 164,000 unique disease-gene pairs. Named entity recognition tools were used to extract drugs, diseases, and other information from trial descriptions. Top drugs, diseases, and targets were identified based on number of mentions.
TIN-X v2: modernized architecture with REST API - Jeremy Yang
TIN-X v2: modernized architecture with REST API for sustainability and interoperability. Presented at the IDG Face2Face meeting in Arlington, VA, Feb 26-27, 2019.
Ex-files: Sex-Specific Gene Expression Profiles Explorer - Jeremy Yang
Poster prepared for NIH Data Commons Pilot Project Consortium (DCPPC) scientific use case, developed in 2018, with GTEx gene expression data and deployed as online application.
Illuminating the Druggable Genome with Knowledge Engineering and Machine Lear... - Jeremy Yang
Talk given at 14th Annual New Mexico BioInformatics, Science and Technology (NMBIST) Symposium, entitled Integrative Omics, on March 14-15, 2019. Most slides c/o IDG KMC PI Tudor Oprea, MD, PhD.
Badapple: promiscuity patterns from noisy evidence (poster) - Jeremy Yang
Badapple: promiscuity patterns from noisy evidence. Bioassay data analysis using scaffold associations. Presented at the UNM Staff Research Expo, Jan 27, 2017. Adapted from "Badapple: promiscuity patterns from noisy evidence", Yang JJ, Ursu O, Lipinski CA, Sklar LA, Oprea TI Bologa CG, J. Cheminfo. 8:29 (2016), DOI: 10.1186/s13321-016-0137-3.
Bibliological data science and drug discovery - Jeremy Yang
Presented at the 2016 ACS Fall Meeting in Philadelphia, session "Effectively Harnessing the World's Literature to Inform Rational Compound Design", on 8/21/16.
BioMISS: Language Diversity of Computing - Jeremy Yang
Talk given at the UNM BioMedical Informatics Seminar Series, Oct 15, 2015. Because the languages of computing are numerous and diverse, it can be challenging to choose an appropriate language for a given task. Yet data are of little value unless represented by semantic systems of languages with appropriate levels of abstraction. We consider the analogy between object-oriented programming and abstraction in biomedical vocabulary and the Sapir-Whorf Hypothesis (that an individual’s thoughts and actions are determined by the language he or she speaks). As an example, we consider the differences between ICD-10 and disease ontology.
This document discusses the many languages used in computing to communicate and represent information. It begins by providing examples of programming languages like Python and Perl, as well as other computer languages for various purposes. It then defines what is meant by the term "language" and lists some major advances in computing languages over time. The document notes challenges like language gaps, differing standards, and issues representing biomedical knowledge and disease codes across systems. It describes a project to illuminate the druggable genome and the language challenges of representing drug names and disease ontologies consistently. The conclusion reflects on the goal of computers being able to effectively "talk" with each other through the diversity of languages.
RMSD (root mean square deviation) is commonly used to measure the geometric difference between molecular conformations, but it has limitations. It reduces multiple atomic distance comparisons to a single number, missing important details. While useful, RMSD may be insufficient on its own, and larger values should be inspected more closely. The document examines alternative measures like variance, maximum distance, shape similarity, and graph-based measures that can provide additional insight into structural relationships not fully captured by RMSD alone. It concludes that investigators should be aware of RMSD's limitations and consider supplementing it with other measures.
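To make the RMSD discussion concrete, here is a minimal numpy sketch comparing two pre-aligned conformations by both RMSD and maximum per-atom displacement. The coordinates are fabricated for illustration, and no superposition/alignment step is shown.

```python
# RMSD vs. maximum per-atom displacement for two pre-aligned conformations (toy coordinates).
import numpy as np

conf_a = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [3.0, 0.2, 0.0]])
conf_b = np.array([[0.1, 0.0, 0.0], [1.4, 0.1, 0.0], [3.0, 1.8, 0.0]])

per_atom = np.linalg.norm(conf_a - conf_b, axis=1)   # distance moved by each atom
rmsd = np.sqrt(np.mean(per_atom ** 2))

print(f"RMSD      = {rmsd:.3f}")
print(f"Max shift = {per_atom.max():.3f}")  # a large local change can hide inside a modest RMSD
```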
Canonicalized systematic nomenclature in cheminformatics - Jeremy Yang
This document discusses canonicalization in chemoinformatics and new canonicalization tools from OpenEye. It reviews existing canonicalization methods like the Morgan algorithm and describes how OpenEye has implemented and expanded on these methods to canonicalize molecular structures, tautomers, and pKa states. OpenEye tools like OEChem and QuacPac can generate canonical SMILES, connection tables, and representations of different chemical forms and standard file formats.
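Canonicalization itself can be demonstrated with the open-source RDKit toolkit as a stand-in for the OpenEye tools discussed above (the API shown is RDKit's, not OEChem's or QuacPac's): different input SMILES for the same molecule map to a single canonical string.

```python
# Canonical SMILES with RDKit (illustrative stand-in for the OpenEye canonicalization tools discussed).
from rdkit import Chem

inputs = ["OCC", "CCO", "C(O)C"]   # three ways to write ethanol
canonical = {Chem.MolToSmiles(Chem.MolFromSmiles(s)) for s in inputs}
print(canonical)                   # a single canonical form, e.g. {'CCO'}
```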
Molecular scaffolds are special and useful guides to discovery, poster (36x54"). Presented at ACS National Meeting SciMix in Indianapolis, Sep 9, 2013.
Molecular scaffolds are special and useful guides to discovery - Jeremy Yang
Molecular scaffolds are special structures that can be used to guide discovery in fields like chemical biology and drug discovery. Scaffolds represent the core structure or framework of molecules. They are useful because they allow clustering and organization of chemical data, exploration of chemical space, and prediction of properties like bioactivity. Examples of famous drug scaffolds discussed include the beta-lactam, steroid, and benzodiazepine scaffolds. Software tools are available for scaffold analysis and applications include database clustering, navigation of chemical space, and prediction of promiscuity. While the definition of a scaffold is not always consistent, cheminformatics methods can help address challenges in scaffold analysis.
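Scaffold extraction of the kind described can be shown with RDKit's Bemis-Murcko scaffold utility, an open-source stand-in rather than whichever tools the presentation itself used; the example molecule is chosen here for illustration.

```python
# Bemis-Murcko scaffold extraction with RDKit (illustrative).
from rdkit import Chem
from rdkit.Chem.Scaffolds import MurckoScaffold

# Diazepam, a benzodiazepine, as an example molecule.
smiles = "CN1C(=O)CN=C(c2ccccc2)c2cc(Cl)ccc21"
scaffold = MurckoScaffold.GetScaffoldForMol(Chem.MolFromSmiles(smiles))
print(Chem.MolToSmiles(scaffold))   # the benzodiazepine ring framework with side chains stripped
```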
How am I supposed to organize a protein database when I can't even organize m... - Jeremy Yang
This document discusses challenges in organizing protein and human databases using cheminformatics approaches. It provides examples of using algorithms to determine if protein or small molecule structures are the same, but questions how useful this is given the complex nature of proteins and humans. While cheminformatics has been very successful in organizing small molecule data, the document argues it may not directly transfer to domains like proteins or human identities that are more complex and context-dependent. The conclusion is that cheminformatics approaches could potentially be used for indexing related domains, but their power is limited when logical assumptions break down, such as determining the number of individuals with the same name.
Promiscuous patterns and perils in PubChem and the MLSCN - Jeremy Yang
The document discusses analyzing promiscuous compounds and patterns in PubChem and the Molecular Libraries Screening Center Network (MLSCN) database. It defines promiscuity and discusses types of promiscuous compounds like aggregators and reactives. It also examines very active scaffolds and histograms of compounds versus assays for different scaffolds. The goal is to improve hit rates in high-throughput screening by pre-filtering or post-analysis of promiscuous compounds and patterns.
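The promiscuity statistics described (compounds or scaffolds active across many assays) reduce to simple ratios. The sketch below computes an illustrative active/tested ratio per scaffold from made-up counts; it is not the Badapple scoring formula.

```python
# Illustrative promiscuity ratio per scaffold (made-up counts; not the Badapple score).
scaffold_stats = {
    # scaffold_id: (assays in which compounds containing it were tested, assays in which they were active)
    "benzodiazepine": (420, 12),
    "quinone":        (380, 190),
    "rhodanine":      (300, 210),
}

for scaffold, (tested, active) in sorted(
        scaffold_stats.items(), key=lambda kv: kv[1][1] / kv[1][0], reverse=True):
    ratio = active / tested
    flag = "promiscuous?" if ratio > 0.25 else ""
    print(f"{scaffold:15s} active/tested = {active}/{tested} = {ratio:.2f} {flag}")
```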
Cyberinfrastructure Day 2010: Applications in Biocomputing
1. Jeremy Yang
Software Systems Manager
Division of Biocomputing
Dept. of Biochemistry & Molecular Biology
UNM School of Medicine
Cyberinfrastructure Day -- April 22, 2010
2. I. What is Biocomputing?
II. Cyber Revolution (~1980-2010+)
III. Cyberinfrastructure (To be or not to be?)
IV. Super Computing, Redefined
3. Division of Biocomputing
http://biocomp.health.unm.edu/
Department of Biochemistry & Molecular Biology
School of Medicine
Also affiliated with the NIH Roadmap-funded UNM Center for Molecular Discovery
4. Biomolecular screening informatics
Cheminformatics
Bioinformatics
Genomics
Virtual screening
Molecular modeling
SAR (Structure-Activity-Relationship)
Data mining, machine learning
3D visualization
Public data integration
Collaborations in chemistry, biology, medicine, comp sci
BIOMED 505 course
Software development, management, deployment & support
5. Larry Sklar, et al., UNMCMD (NIH Roadmap)
~$20M NIH awarded to date
6. 32 cpu Linux cluster
32GB RAM server
Linux: OpenSUSE, CentOS, RedHat, Fedora, Ubuntu
SGI/IRIX
Windows, Mac OS X
Automated integration with NIH databases
2+ Oracle instances
PostgreSQL, MySQL
Stereo graphics workstation
25+ scientific software packages
Supported in-house applications
We are cyberinfrastructure users and providers!
9. Nucleotide and protein sequence analysis
Genomics, proteomics
Merging with chemical biology, etc.
10. Computational search for likely biological actives
Database may be real or virtual compounds
2D and 3D methods
2D similarity search
3D similarity search (shape, pharmacophore)
docking (3D, protein binding site)
Example: 3D shape search; prozac & paxil (c/o OpenEye ROCS)
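As an aside, the 2D similarity search mentioned on this slide can be illustrated with an open-source toolkit. The sketch below uses RDKit Morgan fingerprints and Tanimoto similarity as a generic example (the slide itself credits OpenEye ROCS, a 3D shape method, which is not what is shown here); the SMILES are approximate structures for fluoxetine (Prozac) and paroxetine (Paxil) with stereochemistry omitted.

```python
# 2D fingerprint similarity (Tanimoto) with RDKit -- a generic illustration of "2D similarity search".
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

query = Chem.MolFromSmiles("CNCCC(Oc1ccc(C(F)(F)F)cc1)c1ccccc1")             # fluoxetine (approx.)
candidate = Chem.MolFromSmiles("C1CC(c2ccc(F)cc2)C(COc2ccc3c(c2)OCO3)CN1")   # paroxetine (approx.)

fp_q = AllChem.GetMorganFingerprintAsBitVect(query, 2, nBits=2048)
fp_c = AllChem.GetMorganFingerprintAsBitVect(candidate, 2, nBits=2048)
print("Tanimoto:", DataStructs.TanimotoSimilarity(fp_q, fp_c))
```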
12. Computational models for protein-ligand binding
Abl kinase (1iep.pdb)
[Figure: interaction potentials; hydrophobic (green), H-bond acceptors; Gleevec in binding site]
Gleevec is a leukemia drug known to bind with Abl kinase.
17. Rapid change, challenge and opportunity
Learning from history, trends (new not enough)
Winners and losers
Science, experts have led and followed.
~1980-2010 covers 3σ (99.7%)
And evolution...
18. Rapid change, challenge and opportunity
Learning from history, trends
Winners and losers
Science, experts have led and followed.
~1980-2010 covers 3σ (99.7%)
And evolution...
19. 1977: Atari 2600
1978: Space Invaders
1981: IBM-PC (MS-DOS)
1983: cellphone
1983: GNU Project
1984: Neuromancer, William Gibson, "cyberspace"
1984: Apple Mac, mouse, windows & icons
20. 1985: Oracle 5 (client-server)
1989: Intel 486 (1M transistors, 50MHz)
1990: MS Windows 3.0
1990: WWW (Berners-Lee)
1991: High Perf Comp & Comm Act (Al Gore)
1991: Linux (Linus Torvalds)
1991: AOL
1991: ETrade
21. 1993: Jurassic Park (via SGI)
1993: NCSA Mosaic
1994: Netscape Navigator
1994: “Good Times” hoax
1994: Match.com
1995: “Concept” virus (Word)
1995: Internet Explorer
1995: Apache project
1995: Yahoo!
22. 1995: Amazon.com
1995: My mother gets email
1997: Google
1997: eBay
1999: Melissa virus (Outlook)
1999: Napster (p2p)
2000: MS convicted
2000: 3M USA broadband*
2000: dot-com bubble pops
*Fixed non dial-up internet connections >56k (FCC).
23. 2000: 802.11b wireless
2001: Apple iPod
2001: Apple iTunes
2001: Wikipedia
2003: Skype
2005: YouTube
2005: Rio power grid hacked
2005: NSA domestic surveillance
2006: Facebook
24. 2006: Amazon Cloud
2007: DOD hacked
2008: 70M USA broadband*
2009: Cyberdefense USA priority
2009: Twitter role in Iran election
protests
2010: UAVs are SOPs
2011: Cyber terrorism?
*Fixed non dial-up internet connections >56k (FCC).
25. The dotted line keeps moving...
Case study: database cheminformatics in pharma research, 1990→2000.
26. In 1990, high speed chemical searching was beyond standard capabilities.
Research groups managed local servers in their labs & specialized DB engines (e.g. Daylight Inc.).
By 2000, this function had moved to IT (via Oracle cartridges, etc.) and corporate informatics infrastructure.
Transition not smooth, but very beneficial.
27. Standard chemical searching functions: substructure, similarity, identity
[Figure examples: cocaine; imidazoles]
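The substructure and identity searches named on this slide can likewise be illustrated in a few lines with RDKit, a generic open-source stand-in for the Daylight and Oracle-cartridge systems mentioned on the previous slide; the small compound library below is invented for the example.

```python
# Substructure and identity checks with RDKit -- a generic illustration of these search functions.
from rdkit import Chem

library = {
    "histamine": "NCCc1c[nH]cn1",
    "caffeine":  "CN1C=NC2=C1C(=O)N(C)C(=O)N2C",
    "ibuprofen": "CC(C)Cc1ccc(C(C)C(=O)O)cc1",
}
imidazole = Chem.MolFromSmarts("c1cnc[nH]1")        # substructure query

for name, smi in library.items():
    mol = Chem.MolFromSmiles(smi)
    print(name, "contains imidazole:", mol.HasSubstructMatch(imidazole))

# Identity check: two input forms of ethanol reduce to the same canonical SMILES.
print(Chem.MolToSmiles(Chem.MolFromSmiles("OCC")) == Chem.MolToSmiles(Chem.MolFromSmiles("CCO")))  # True
```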
28. (1) office equipment
(2) lab equipment
(3) experimental apparatus
(4) the experiment
(5) a commodity
(6) custom configured experimental vehicle for exploration
(7) all of the above
29. (1) office equipment
(2) lab equipment
(3) experimental apparatus
(4) the experiment
(5) a commodity
(6) custom configured experimental vehicle for exploration
(7) all of the above
31. Scientific research
Computational research
High performance computing as a research tool
High performance infrastructure as a productivity tool
Appropriate tiers and domains
Scientific software for experts
Enabling software for scientists
Commoditization (e.g. cloud computing)
Plumbing vs. experimental apparatus
32. IT: "Poorly managed computers and needy ill-trained users put the system at risk."
Research: "We need power, flexibility and access and not another lame PC."
34. In ~5 yrs, super → un-super
Super computing? Define computer.
Advances from unexpected places:
gaming, movies (graphics -- vs. AI)
social networking (crowdsourcing)
even business (web standards, UIs, security)
Super computing is pushing the current limits
But where are the key frontiers?