The document discusses UCSD's efforts to build a research cyberinfrastructure (RCI) to support the large and growing data needs of researchers across campus. It outlines how various campus organizations like SDSC, libraries, and Calit2 are providing resources for data storage, computing, networking and expertise. The RCI aims to connect researchers and their instruments, some of which are generating terabytes of data daily, to shared resources to enable collaborative, data-driven research. Upcoming initiatives include a new high-performance storage system and efforts to integrate data from key instruments in areas like genomics and imaging.
The document discusses challenges in analyzing next generation sequencing (NGS) data from genome sequencing and the potential for real-time analysis using in-memory technologies. Specifically, it notes that conventional genome analysis can take days to weeks, whereas the Hasso Plattner Institute has developed an in-memory approach that performs alignment and variant calling on 10 GB of sequencing data from the 1000 Genomes Project in under 45 minutes, enabling interactive, real-time analysis. This approach uses an in-memory column-oriented database to store and query sequencing data without disk access, allowing faster processing and analysis of genomic data.
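The column-oriented idea described above can be illustrated with a minimal sketch (this is not HPI's actual system; the variant records and field names are invented for the example). Each field lives in its own list, so a filter on one column never touches the others — the property that makes column stores fast for analytical queries:

```python
# Toy in-memory column store for variant calls: one list per field.
variants = {
    "chrom": ["chr1", "chr1", "chr2", "chr2"],
    "pos":   [10177,  10352,  45895,  46021],
    "ref":   ["A",    "T",    "G",    "C"],
    "alt":   ["AC",   "TA",   "A",    "T"],
    "qual":  [40.0,   25.0,   60.0,   15.0],
}

def query(store, column, predicate):
    """Return row indices whose value in `column` satisfies `predicate`."""
    return [i for i, v in enumerate(store[column]) if predicate(v)]

# Interactive-style query: high-confidence calls on chr2.
rows = set(query(variants, "qual", lambda q: q >= 30.0)) & \
       set(query(variants, "chrom", lambda c: c == "chr2"))
print(sorted(rows))  # -> [2]
```

Because each predicate scans a single contiguous column in RAM, queries like this avoid both disk I/O and reading unrelated fields, which is the core of the speedup the summary describes.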
This document discusses the potential of personalized medicine using genomic and other patient data. It notes that current medicine treats all patients as averages, but personalized approaches could use large databases to predict individual risk. New technologies like genome sequencing, biosensors, and artificial intelligence could enable superconvergence of data to precisely tailor treatment. This would increase efficiency and allow treatment of interconnected biological systems rather than reductionist views. The document advocates for integrated electronic medical records and biobanks to enable these personalized approaches.
This document discusses issues around data sharing in genomics research. It provides background on the history of genomics projects like the Human Genome Project. It then discusses BGI's role in large-scale sequencing efforts and their goal of making sequencing data highly accessible. It also discusses challenges around sharing large volumes of genomic data and ensuring proper attribution and credit for data sharing. Issues around data citation are examined, including the need for data citations to be tracked by citation indexes and for metrics around data citations to be utilized by the research community.
The document discusses big data and its applications in various domains including commerce, science, and healthcare. It provides examples of using big data for fraud detection in credit card transactions and customizing product shelves based on social media posts. It also discusses challenges in defining typical behaviors in large datasets and how approaches like building models from training data or using existing data directly can help detect outliers. The document emphasizes that big data is driving new approaches in integrative research like analyzing millions of nuclear features from whole slide images to classify brain tumors.
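The model-from-training-data approach to outlier detection mentioned above can be sketched in a few lines (the transaction amounts and threshold here are invented; real fraud detection uses far richer features and models):

```python
# Fit a simple model (mean / standard deviation) to training data,
# then flag new points that lie far from it.
import statistics

training = [12.0, 15.5, 14.2, 13.8, 16.1, 15.0, 14.7, 13.2]
mu = statistics.mean(training)
sigma = statistics.stdev(training)

def is_outlier(x, z_threshold=3.0):
    """Flag x if it is more than z_threshold standard deviations from the mean."""
    return abs(x - mu) / sigma > z_threshold

print(is_outlier(14.9))   # typical transaction -> False
print(is_outlier(250.0))  # anomalous transaction -> True
```

The alternative the summary mentions — using existing data directly, e.g. nearest-neighbor distance to past observations — trades the model-fitting step for more computation at query time.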
This document introduces Raona, a software engineering company. It describes its technology solutions, including infrastructure, networking, business intelligence, and CRM. It also highlights its 105 expert, certified engineers and their passion for technology. It includes positive comments from several clients about the quality of Raona's engineers and their ability to solve problems. Finally, it summarizes Raona's services as intellectual services, consulting, and managed services.
Open Homes for sale in Cheyenne, WY hosted by Coldwell Banker The Property Exchange for Saturday May 24 and Sunday May 25, 2014.
If you can't make it this weekend, you can view these or any Cheyenne home for sale by going to www.propertyex.com or calling us at 307-632-6481.
Please note: prices and properties are subject to change; listings are accurate only through May 23, 2014. Open houses are weather permitting.
A National Big Data Cyberinfrastructure Supporting Computational Biomedical R... — Larry Smarr
Invited Presentation
Symposium on Computational Biology and Bioinformatics:
Remembering John Wooley
National Institutes of Health
Bethesda, MD
July 29, 2016
This document discusses the challenges of handling large-scale genomic and biological data and proposes potential solutions. It notes that data volumes are increasing rapidly due to advances in sequencing technology but dissemination and data handling methods have not kept pace. Several hurdles to data sharing are described including technical issues around data size, heterogeneity and longevity as well as economic and cultural barriers. Potential solutions discussed include providing incentives for data sharing through attribution and citation, adopting data citation practices using Digital Object Identifiers, establishing funding models for long-term curation, and launching new databases and journals focused on publishing and analyzing large-scale datasets.
Scott Edmunds talk in the "Policies and Standards for Reproducible Research" session on Revolutionizing Data Dissemination: GigaScience, at the Genomic Standards Consortium meeting at Shenzhen. 6th March 2012
1) Quantitative medicine uses large amounts of medical data and advanced analytics to determine the most effective treatment for individual patients based on their specific clinical profile and biomarkers. This approach can help reduce healthcare costs and improve outcomes compared to the traditional one-size-fits-all model.
2) However, realizing the promise of quantitative personalized medicine is challenging due to the huge quantities of diverse medical data located in dispersed systems, lack of computing capabilities, and barriers to data sharing.
3) Grid and service-oriented computing approaches are helping to address these challenges by enabling federated querying, analysis, and sharing of medical data and services across organizations through virtual integration rather than true consolidation.
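The "virtual integration" idea in point 3 can be sketched schematically (the site names, record fields, and interfaces below are invented for illustration): a federated query fans out to independent sites and merges the hits, without copying the underlying records into one consolidated store.

```python
# Schematic federated query: each site keeps its own records locally;
# only matching results are gathered centrally.
def federated_query(sites, predicate):
    """Run `predicate` against each site's local records; merge the hits."""
    results = []
    for name, records in sites.items():
        results.extend((name, r) for r in records if predicate(r))
    return results

sites = {
    "hospital_a": [{"id": 1, "biomarker": 0.9}, {"id": 2, "biomarker": 0.2}],
    "hospital_b": [{"id": 7, "biomarker": 0.8}],
}
hits = federated_query(sites, lambda r: r["biomarker"] > 0.5)
print(len(hits))  # -> 2
```

In a real grid or service-oriented deployment, each site would expose this filtering step as a service behind its own governance and access controls, which is what lets data be shared across organizations without true consolidation.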
Driving Applications on the UCSD Big Data Freeway System — Larry Smarr
This document provides a summary of a keynote lecture about driving data-intensive applications using high-performance cyberinfrastructure at UC San Diego. The lecture discusses:
1) The exponential growth of digital data and need for dedicated high-bandwidth infrastructure to analyze large datasets.
2) Examples of data-intensive applications at UCSD including climate modeling, protein structure analysis, and medical research requiring fast access to remote supercomputers and large datasets.
3) UCSD's development of an optical "Big Data Freeway System" using high-speed fiber to connect resources and enable real-time analysis of large datasets up to 1000 times faster than the shared internet.
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014 — Robert Grossman
This document discusses how biomedical discovery is being disrupted by big data. Large genomic, phenotype, and environmental datasets are needed to understand complex diseases that result from combinations of many rare variants. However, analyzing large biomedical data is costly and difficult given the standard model of local computing. The document proposes creating large "commons" of community data and computing as an instrument for big data discovery. Examples are given of the Cancer Genome Atlas project, which has petabytes of research data on thousands of cancer patients, and how tumors evolve over time. Overall, the document argues that new models of shared biomedical clouds and commons are needed to enable cost-effective analysis of big biomedical data.
Sequencing Genomics: The New Big Data Driver — Larry Smarr
1. Genomic sequencing is driving big data as the cost of sequencing DNA falls faster than Moore's Law and the amount of data produced increases dramatically.
2. The Beijing Genomics Institute (BGI) is the world's largest genomic institute, using over 130 sequencing machines, each producing 25 gigabases per day, with over 12 petabytes of data storage in total.
3. Interdisciplinary teams of computer scientists, data analysts, and geneticists are needed to analyze the massive amounts of genomic and metagenomic data being produced to gain insights into human health and disease.
High Performance Cyberinfrastructure to Support Data-Intensive Biomedical Res... — Larry Smarr
Invited Talk
Association of University Research Parks BioParks 2008
"From Discovery to Innovation"
Salk Institute
La Jolla, CA
08.06.16
Title: High Performance Cyberinfrastructure to Support Data-Intensive Biomedical Research Instruments
2016 07 12_purdue_bigdatainomics_seandavis — Sean Davis
Newer, faster, cheaper molecular assays are driving biomedical research. I discuss the history of biomedical data, including concepts of data sharing, hypothesis-driven vs. hypothesis-generating research, and the potential to expand our thinking on biomedical research to be much more integrated through smart, creative, and open use of technologies and more flexible, longitudinal studies.
Building bioinformatics resources for the global community — ExternalEvents
1. The document evaluates different methods for inferring relationships between Salmonella samples based on whole genome sequencing data from large databases. It compares k-mer based methods and site-based methods using 18,997 Salmonella isolates from public databases.
2. Site-based methods like NUCmer and MLST produced more accurate results, but require more computing resources when dealing with large databases. K-mer based methods are faster but more sensitive to assembly and contamination issues.
3. While k-mer methods may be useful for initial filtering, site-based methods are superior for accuracy, though challenges remain in applying them to databases containing tens of thousands of samples. Quality control and computing resources are important considerations.
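A toy illustration (not the evaluated pipeline) of why k-mer methods are fast: each genome reduces to a set of k-mers, and relatedness becomes a set comparison with no alignment step. The sequences below are invented; note that contamination or assembly errors would simply add spurious k-mers to a set, which is exactly why these methods are sensitive to data quality:

```python
# k-mer set comparison between two sequences via Jaccard similarity.
def kmers(seq, k=4):
    """All overlapping substrings of length k, as a set."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def jaccard(a, b, k=4):
    """Jaccard similarity of the k-mer sets of two sequences."""
    ka, kb = kmers(a, k), kmers(b, k)
    return len(ka & kb) / len(ka | kb)

isolate_a = "ACGTACGTGACC"
isolate_b = "ACGTACGTGACT"   # one trailing substitution
print(round(jaccard(isolate_a, isolate_b), 2))  # -> 0.78
```

Site-based methods such as NUCmer or MLST instead compare specific aligned positions or loci, which costs more computation per pair but is less perturbed by assembly artifacts — matching the accuracy/speed trade-off the summary describes.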
Next generation genomics: Petascale data in the life sciences — Guy Coates
Keynote presentation at OGF 28.
The year 2000 saw the release of "The" human genome, the product of the combined sequencing effort of the whole planet. In 2010, single institutions are sequencing thousands of genomes a year, producing petabytes of data. Furthermore, many of the large-scale sequencing projects are based around international collaboration and consortia. The talk will explore how Grid and Cloud technologies are being used to share genomics data around the planet, revolutionizing life science research.
Development of FDA MicroDB: A Regulatory-Grade Microbial Reference Database — nist-spin
"Development of FDA MicroDB: A Regulatory-Grade Microbial Reference Database" presentation at the Standards for Pathogen Identification via NGS (SPIN) workshop hosted by the National Institute of Standards and Technology, October 2014, by Heike Sichtig, PhD, from the FDA and Luke Tallon from IGS UMSOM.
This document provides an introduction to bioinformatics. It defines bioinformatics as the analysis of large amounts of biological data, such as DNA sequences, using computer programs. It discusses how next-generation sequencing technologies are generating terabytes of nucleotide sequence data that is analyzed by automated computer programs. The document then provides examples of the types of biological data that is analyzed in bioinformatics, including DNA, RNA, protein sequences and their interactions. It also discusses some common programming languages and analysis techniques used in bioinformatics.
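A minimal example of the kind of automated sequence analysis the summary describes — computing GC content, a basic composition metric, over a DNA sequence (the sequence here is made up):

```python
# GC content: the fraction of bases in a DNA sequence that are G or C.
def gc_content(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

print(gc_content("ATGCGCGCATTA"))  # -> 0.5
```

Real NGS pipelines apply this sort of per-sequence computation across millions of reads, which is why the terabyte-scale analysis mentioned above must be fully automated.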
This document discusses the need for annotation of genomic data given the deluge of information from next generation sequencing. It outlines that clinical-grade annotation is important for application. Many sources of annotation are discussed, including databases, literature, testing labs, and crowdsourcing. However, it emphasizes that specialized human curation remains essential for high quality annotation.
This document provides an overview of DNA microarrays (DNA chips). It discusses that DNA chips allow scientists to simultaneously measure gene expression levels or genotype multiple genomic regions. It describes the principle technologies used in DNA chips, including attaching cDNA or oligonucleotide probes to glass or silicon surfaces. The document also provides background on DNA and microarrays, their history, applications in gene expression analysis and disease research, and principle of hybridization. It discusses alternative bead-based array technologies and how microarrays enabled large-scale genomic experiments.
My Remembrances of Mike Norman Over The Last 45 Years — Larry Smarr
Mike Norman has been a leader in computational astrophysics for over 45 years. Some of his influential work includes:
- Cosmic jet simulations in the early 1980s which helped explain phenomena from galactic centers.
- Pioneering the use of adaptive mesh refinement in the 1990s to achieve dynamic load balancing on supercomputers.
- Massive cosmology simulations in the late 2000s with over 100 trillion particles using thousands of processors across multiple supercomputing sites, producing petabytes of data.
- Developing end-to-end workflows in the 2000s to couple supercomputers, high-speed networks, and large visualization systems to enable real-time analysis of extremely large astrophysics simulations.
Metagenics: How Do I Quantify My Body and Try to Improve its Health? June 18, 2019 — Larry Smarr
Larry Smarr discusses quantifying his body and health over time through extensive self-tracking. He measures various biomarkers through regular blood tests and analyzes his gut microbiome by sequencing stool samples. This revealed issues like chronic inflammation and an unhealthy microbiome. Smarr then took steps like a restricted eating window and increasing plant diversity in his diet, which reversed metabolic syndrome issues and correlated with shifts in his microbiome ecology. His goal is to continue precisely measuring factors like toxins, hormones, gut permeability and food/supplement impacts to further optimize his health.
Similar to Health Sciences Driving UCSD Research Cyberinfrastructure
Panel: Reaching More Minority Serving Institutions — Larry Smarr
This document discusses engaging more minority serving institutions (MSIs) in cyberinfrastructure development through regional networks. It provides data showing the importance of MSIs like historically black colleges and universities (HBCUs) in educating underrepresented minority students in STEM fields. Regional networks can help equalize opportunities by assisting MSIs in overcoming barriers to resources through training, networking infrastructure support, and helping institutions obtain necessary staffing and funding. Strategies mentioned include collaborating with MSIs on grants and addressing issues identified in surveys like lack of vision for data use beyond compliance. The goal is to broaden participation in STEAM fields by leveraging the success MSIs have shown in supporting underrepresented students.
Global Network Advancement Group - Next Generation Network-Integrated Systems — Larry Smarr
This document summarizes a presentation on global petascale to exascale workflows for data intensive sciences. It discusses a partnership convened by the GNA-G Data Intensive Sciences Working Group with the mission of meeting challenges faced by data-intensive science programs. Cornerstone concepts that will be demonstrated include integrated network and site resource management, model-driven frameworks for resource orchestration, end-to-end monitoring with machine learning-optimized data transfers, and integrating Qualcomm's GradientGraph with network services to optimize applications and science workflows.
Wireless FasterData and Distributed Open Compute Opportunities and (some) Us... — Larry Smarr
This document discusses opportunities for ESnet to support wireless edge computing through developing a strategy around self-guided field laboratories (SGFL). It outlines several potential science use cases that could benefit from wireless and distributed computing capabilities, both in the short term through technologies like 5G, LoRa and Starlink, and longer term through the vision of automated SGFL. The document proposes some initial ideas for deploying and testing wireless edge computing technologies through existing projects to help enable the SGFL vision and further scientific opportunities. It emphasizes that exploring these emerging areas could help drive new science possibilities if done at a reasonable scale.
The Asia Pacific and Korea Research Platforms: An Overview, Jeonghoon Moon — Larry Smarr
This document provides an overview of Asia Pacific and Korea research platforms. It discusses the Asia Pacific Research Platform working group in APAN, including its objectives to promote HPC ecosystems and engage members. It describes the Asi@Connect project which provides high-capacity internet connectivity for research across Asia-Pacific. It also discusses the Korea Research Platform and efforts to expand it to 25 national research institutes in Korea. New related projects on smart hospitals, agriculture, and environment are mentioned. The conclusion discusses enhancing APAN and the Korea Research Platform and expanding into new areas like disaster and AI education.
Panel: Reaching More Minority Serving Institutions — Larry Smarr
This document discusses engaging more minority serving institutions (MSIs) in the National Research Platform (NRP). It provides data showing that MSIs serve a disproportionate number of underrepresented minority students and are important producers of STEM graduates from these groups. The NRP can help broaden participation in STEAM fields by providing MSIs access to advanced cyberinfrastructure resources, new learning modalities, and opportunities for collaborative research between MSIs and other institutions. Regional networks also have a role to play in helping MSIs overcome barriers and attracting them to collaborative grants. The goal is to tear down walls between research and teaching and reinvent the university experience for more inclusive learning and innovation.
Panel: The Global Research Platform: An Overview — Larry Smarr
The document provides an overview of the Global Research Platform (GRP), an international collaborative partnership creating a distributed environment for data-intensive global science. The GRP facilitates high-performance data gathering, analytics, transport up to terabits per second, computing, and storage to support large-scale global science cyberinfrastructure ecosystems. It aims to orchestrate research across multiple domains using international testbeds for investigating new technologies related to data-intensive science. Examples of instruments generating exabytes of data that would benefit include the Korea Superconducting Tokamak, the High Luminosity LHC, genomics, the SKA radio telescope, and the Vera Rubin Observatory.
Panel: Future Wireless Extensions of Regional Optical Networks (Larry Smarr)
CENIC is a non-profit organization that operates an 8,000+ mile fiber optic network connecting over 12,000 sites across California, including K-12 schools, universities, libraries, and research organizations. It has over 750 private sector partners and contributes over $100 million annually to the California economy. CENIC's network enables research and education collaborations, innovation, and economic growth statewide. It also operates a wireless research network called PRP that connects wireless sensors to supercomputers, supporting applications like wildfire modeling.
Global Research Platform Workshops - Maxine Brown (Larry Smarr)
The document announces a workshop on global research platforms that will be held virtually in 2021 and in Salt Lake City in 2022, with topics including large-scale science, next-generation platforms, data transport, and international testbeds. It also announces the 4th Global Research Platform Workshop to be held in October 2023 in Limassol, Cyprus co-located with the IEEE eScience 2023 conference.
EPOC and NetSage provide engagement and network monitoring services to support research and education. NetSage collects anonymized network flow data to help understand traffic patterns and troubleshoot performance issues. It provides dashboards and analysis to answer common questions from network engineers and end users. Examples of NetSage deployments and use cases were shown for the CENIC network, including top sources and destinations of traffic, debugging slow flows, and analyzing international traffic patterns by country over time.
The document discusses accelerating science discovery with AI inference-as-a-service. It describes showcases using this approach for high energy physics and gravitational wave experiments. It outlines the vision of the A3D3 institute to unite domain scientists, computer scientists, and engineers to achieve real-time AI and transform science. Examples are provided of using AI inference-as-a-service to accelerate workflows for CMS, ProtoDUNE, LIGO, and other experiments.
Democratizing Science through Cyberinfrastructure - Manish Parashar (Larry Smarr)
This document summarizes a presentation by Manish Parashar on democratizing science through cyberinfrastructure. The key points are:
1) Broad, fair, and equitable access to advanced cyberinfrastructure is essential for democratizing 21st century science, but there are significant barriers related to knowledge, technical issues, social factors, and balancing capabilities.
2) An advanced cyberinfrastructure ecosystem for all requires integrated portals, access to local and national resources through high-speed networks, diverse allocation modes, embedded expertise networks, and broad training.
3) Realizing this vision will require a scalable federated ecosystem with diverse capabilities and incentives for partnerships to meet growing needs for cyberinfrastructure and
Panel: Building the NRP Ecosystem with the Regional Networks on their Campuses (Larry Smarr)
This document summarizes a panel discussion on building the National Research Platform ecosystem with regional networks. The panelists discussed how their regional networks are connecting to and using the Nautilus nodes of the NRP. Examples included using NRP for deep learning and computer vision research at the University of Missouri, challenges of adoption in Nevada and potential solutions, and Georgia Tech's new involvement through the Southern Crossroads regional network. The regional networks see opportunities to expand NRP access and training to enable more researchers in their regions to take advantage of the platform.
Open Force Field: Scavenging pre-emptible CPU hours* in the age of COVID - Je... (Larry Smarr)
The document discusses Open Force Field (OpenFF), an open-source project that enables rapid development of molecular force fields through automated infrastructure, open data and software, and an open science approach. OpenFF provides access to large quantum chemical datasets, runs quantum chemistry calculations on pre-emptible cloud resources with minimal human intervention, and facilitates easy iteration and testing of new force field hypotheses through an open development model.
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B... (Larry Smarr)
The document discusses open infrastructure for an open society and the role of commercial clouds. It describes how the National Research Platform (NRP), Open Science Grid (OSG), and Open Science Data Federation (OSDF) provide open infrastructure through open source components that anyone can contribute to and use. It then discusses how Southwestern Oklahoma State University leveraged NRP resources on their campus and engaged students and local teachers. Finally, it outlines the pros and cons of commercial clouds, when they may be suitable to use, and how tools like CloudBank and Kubernetes can help facilitate science users' access to cloud resources.
Frank Würthwein - NRP and the Path forward (Larry Smarr)
NRP will replace PRP and aims to democratize access to national research cyberinfrastructure. The long term vision is to create an open national cyberinfrastructure by federating resources across research institutions. Key innovations include an innovative network fabric, application libraries for FPGAs, a "bring your own resource" model, and innovative scheduling and data infrastructure. The NSF has funded the Prototype National Research Platform project to support NRP for the next 5 years. NRP aims to grow resources, introduce new capabilities, and be driven by the research community.
Health Sciences Driving UCSD Research Cyberinfrastructure
1. Health Sciences Driving UCSD Research Cyberinfrastructure
Invited Talk
UCSD Health Sciences Faculty Council
UC San Diego
April 3, 2012
Dr. Larry Smarr
Director, California Institute for Telecommunications and Information Technology
Harry E. Gruber Professor, Dept. of Computer Science and Engineering
Jacobs School of Engineering, UCSD
Follow me at http://lsmarr.calit2.net
2. UCSD Researcher Research Cyberinfrastructure Needs
• UCSD Researchers Surveyed in 2008 to Determine Their Unmet CI Needs
  [Figure: Diverse Sources of Data]
• Answer: DATA – Help!
  – Data Infrastructure (Storage, Transmission, Curation)
  – Data Expertise (Management, Analysis, Visualization, Curation)
Source: Mike Norman, SDSC
4. UCSD RCI Provider Organizations

RCI element    SDSC     UCSD ACT   Calit2    Libraries
Co-Location    Lead
Storage        Lead     Partner    Partner
Curation       Partner                       Lead
Computing      Lead
Networking     Partner  Lead       Partner

Source: Mike Norman, SDSC
5. From One to a Billion Data Points Defining Me:
The Exponential Rise in Body Data in Just One Decade
[Chart labels: Weight → Blood Variables → SNPs → Full Genome]
6. First Stage of Metagenomic Sequencing of My Gut Microbiome at J. Craig Venter Institute
• I Received a Disk Drive Today With 30-50 GigaBytes
• Gel Image of Extract from Smarr Sample - Next is Library Construction
Manny Torralba, Project Lead - Human Genomic Medicine, J. Craig Venter Institute
January 25, 2012
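Datasets of this size are why the deck keeps returning to 10 Gbps campus networking. As a back-of-envelope illustration (the 80% link-efficiency figure and the link speeds below are assumptions for the sketch, not numbers from the slides), moving a 50 GB sequencing dataset looks like this:

```python
# Rough transfer-time estimate for a sequencing dataset over common links.
# Assumes the link sustains a given fraction of line rate (illustrative).

def transfer_time_hours(size_gb: float, link_gbps: float, efficiency: float = 0.8) -> float:
    """Hours to move size_gb gigabytes over a link_gbps link."""
    size_gigabits = size_gb * 8
    return size_gigabits / (link_gbps * efficiency) / 3600

for link in (0.1, 1.0, 10.0):  # 100 Mbps, 1 GbE, 10 GbE
    print(f"{link:>5} Gbps: {transfer_time_hours(50, link):.2f} h")
```

At 10 GbE the 50 GB drive's contents move in minutes rather than the hours a commodity link would take, which is the practical argument for connecting instruments directly to the RCI fabric.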
7. The Coming Digital Transformation of Health
www.technologyreview.com/biomedicine/39636
8. Integrative Personal Omics Profiling Reveals Details of Clinical Onset of Viruses and Diabetes
Cell 148, 1293–1307, March 16, 2012
• Michael Snyder, Chair of Genomics, Stanford Univ.
• Genome 140x Coverage
• Blood Tests 20 Times in 14 Months
  – Tracked nearly 20,000 distinct transcripts coding for 12,000 genes
  – Measured the relative levels of more than 6,000 proteins and 1,000 metabolites in Snyder's blood
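The "140x coverage" figure implies a substantial raw-data volume on its own. A quick sketch of the arithmetic (the ~3.1 Gbp genome size and ~2 bytes per base for an uncompressed base call plus quality score are generic assumptions, not figures from the paper):

```python
# Back-of-envelope scale of sequencing a genome at 140x coverage.
GENOME_BP = 3.1e9   # approximate human genome size, base pairs (assumption)
COVERAGE = 140      # from the slide
BYTES_PER_BASE = 2  # base call + quality score, uncompressed (assumption)

raw_bases = GENOME_BP * COVERAGE       # total sequenced bases
raw_bytes = raw_bases * BYTES_PER_BASE # rough uncompressed payload
print(f"~{raw_bases / 1e9:.0f} Gbp sequenced, ~{raw_bytes / 1e12:.1f} TB raw")
```

Even before the transcript, protein, and metabolite time series are added, a single subject at this depth lands in the hundreds-of-gigabytes-to-terabyte range, which is the storage problem the RCI is meant to absorb.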
9. iDASH: Outcome of NIH Botstein-Smarr Report (1999)
http://acd.od.nih.gov/agendas/060399_Biomed_Computing_WG_RPT.htm
Source: Lucila Ohno-Machado, UCSD SOM
10. integrating Data for Analysis, Anonymization, and SHaring (iDASH)
• Private Cloud at SD Supercomputer Center
• Medical Center Data Hosting
• HIPAA-certified facility
Funded by NIH U54HL108460
Source: Lucila Ohno-Machado, UCSD SOM
11. Data + Ontologies + Tools
UCSF | UC Davis | UC Irvine | UCLA | UCSD
• Question: Complications associated with a new drug or device?
• Pipeline: Extraction → Transformation → Load → Semantic Integration → Query → Information
  (even with the same vendor, the EMRs are configured differently)
Source: Lucila Ohno-Machado, UCSD SOM
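The pipeline above can be sketched in miniature: each campus EMR encodes the same clinical fact differently, so the transform step maps site-local codes onto shared concepts before a cross-site query runs. All site names and code strings below are invented for illustration; real systems would map to standard vocabularies.

```python
# Hypothetical extract-transform-query sketch of semantic integration
# across differently configured EMRs. Codes and sites are made up.

SITE_CODE_MAPS = {
    "site_a": {"DX_250.00": "diabetes_type2"},
    "site_b": {"ICD9:250.0": "diabetes_type2"},
}

def harmonize(site: str, codes: list) -> list:
    """Transform step: rewrite site-local codes as common concepts."""
    mapping = SITE_CODE_MAPS[site]
    return [mapping.get(code, "unmapped") for code in codes]

# Load + query: once harmonized, one concept counts across both sites.
combined = harmonize("site_a", ["DX_250.00"]) + harmonize("site_b", ["ICD9:250.0"])
print(combined.count("diabetes_type2"))  # -> 2
```

The point of the slide survives the toy scale: without the mapping step, "DX_250.00" and "ICD9:250.0" look like different conditions and a federated query undercounts, which is exactly the A1ATD-finding difficulty mentioned in the Editor's Notes.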
12. Personalized Care and Population Health
• Genomics
  – SNP-based therapy (cancer)
• ‘Phenomics’
  – Electronic Health Records
  – Personal monitoring (blood pressure, glucose)
  – Behavior (adherence to medication, exercise)
• Public Health and Environment
  – Air quality, food
  – Surveillance
Source: DOE
Source: Lucila Ohno-Machado, UCSD SOM
13. NCMIR’s Integrated Infrastructure of Shared Resources
[Diagram: Scientific Instruments ↔ Shared Infrastructure ↔ Local SOM Infrastructure ↔ End User Workstations]
Source: Steve Peltier, NCMIR
16. Moving to Shared Enterprise Data Storage & Analysis Resources: SDSC Triton Resource & Calit2 GreenLight
http://tritonresource.sdsc.edu
SDSC Large Memory Nodes (x28)
• 256/512 GB/sys
• 8TB Total
• 128 GB/sec
• ~9 TF
SDSC Shared Resource Cluster (x256)
• 24 GB/Node
• 6TB Total
• 256 GB/sec
• ~20 TF
SDSC Data Oasis Large Scale Storage
• 2 PB
• 50 GB/sec
• 3000 – 6000 disks
• Phase 0: 1/3 PB, 8 GB/s
Connected to UCSD Research Labs and Calit2 GreenLight via N x 10Gb/s Campus Research Network
Source: Philip Papadopoulos, SDSC, UCSD
17. SOM Use of SDSC Triton Resource
• 10 SOM PIs Received Substantial Allocations (100K CPU-hours or more)
• 8 SOM PIs / Labs Currently Using Triton with Time Purchased from Grant Funds
• 30+ Active Trial Accounts
• Supporting ~6 Next Generation Sequencing Projects with PIs from SOM, SIO, and 2 Outside Research Institutes (TSRI, LIAI)
19. Calit2 Microbial Metagenomics Cluster - Next Generation Optically Linked Science Data Server
• 512 Processors, ~5 Teraflops
• Sun X4500 Storage, ~200 Terabytes
• 1GbE and 10GbE Switched/Routed Core
• 4000 Users From 90 Countries
Source: Phil Papadopoulos, SDSC, Calit2
20. Creating CAMERA 2.0 - Advanced Cyberinfrastructure Service Oriented Architecture
Source: CAMERA CTO Mark Ellisman
21. Access to Computing Resources Tailored by User’s Requirements and Resources
• Advanced HPC Platforms
• CAMERA Core HPC Resource
• NSF/DOE TeraScale Resources
Source: Jeff Grethe, CAMERA
22. NSF Funds a Data-Intensive Track 2 Supercomputer: SDSC’s Gordon - Coming Summer 2011
• Data-Intensive Supercomputer Based on SSD Flash Memory and Virtual Shared Memory SW
  – Emphasizes MEM and IOPS over FLOPS
  – Supernode has Virtual Shared Memory: 2 TB RAM Aggregate, 8 TB SSD Aggregate
  – Total Machine = 32 Supernodes
  – 4 PB Disk Parallel File System, >100 GB/s I/O
• System Designed to Accelerate Access to Massive Data Bases being Generated in Many Fields of Science, Engineering, Medicine, and Social Science
Source: Mike Norman, Allan Snavely, SDSC
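A small calculation shows why "IOPS over FLOPS" matters for data-intensive work. The IOPS figures below are generic orders of magnitude for flash versus spinning disk, chosen for illustration; they are not Gordon's measured numbers.

```python
# Time to issue random 4 KB reads over an 8 TB dataset, flash vs disk.
# IOPS values are generic assumptions for illustration.

def random_read_hours(total_bytes: float, iops: float, io_size: int = 4096) -> float:
    """Hours to complete total_bytes of random reads at a given IOPS rate."""
    num_ios = total_bytes / io_size
    return num_ios / iops / 3600

ssd_h = random_read_hours(8e12, 1e6)   # ~1M aggregate IOPS across flash
disk_h = random_read_hours(8e12, 1e4)  # ~10K IOPS across a disk array
print(f"flash ≈ {ssd_h:.1f} h, disk ≈ {disk_h:.0f} h")
```

For random-access workloads like database scans and graph traversal, the two orders of magnitude in IOPS translate directly into two orders of magnitude in wall-clock time, independent of peak FLOPS.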
23. Rapid Evolution of 10GbE Port Prices Makes Campus-Scale 10Gbps CI Affordable
• Port Pricing is Falling
• Density is Rising – Dramatically
• Cost of 10GbE Approaching Cluster HPC Interconnects
[Chart, approximate per-port prices: 2005 Chiaro $80K/port (60 max); 2007 Force 10 $5K (40 max), later ~$1000 (300+ max); 2009 Arista $500 (48 ports); 2010 Arista $400 (48 ports)]
Source: Philip Papadopoulos, SDSC/Calit2
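The chart's trend can be stated as arithmetic. Using only the first and last points as read off the slide (approximate values), the implied annual price decline is:

```python
# Implied per-port price decline for 10GbE, from the slide's endpoints
# (approximate chart values: $80K/port in 2005, $400/port in 2010).
prices = {2005: 80_000, 2010: 400}
span = 2010 - 2005
ratio = prices[2010] / prices[2005]  # overall multiplier (0.005 = 200x drop)
annual = ratio ** (1 / span)         # equivalent per-year multiplier
print(f"{1 / ratio:.0f}x cheaper over {span} years ≈ {100 * (1 - annual):.0f}%/year")
```

A sustained decline of roughly two-thirds per year is what moves 10GbE from a backbone luxury to something a campus can deploy to individual labs, which is the slide's affordability argument.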
25. 2012 RCI Initiatives
• RCI is Preparing an Attractive Storage Offering for All UCSD Researchers to Encourage Adoption
  – “Wide and Deep”
  – On-Ramp to Digital Curation Efforts
• SOM Possesses Many of the Most Data-Intensive Instruments on Campus (NGS, MassSpec, MRI)
  – Effort to Connect Them to RCI Resources This Year
• SDSC Working with DBMI to Define a HIPAA-Compliant Cloud Computing Resource that Would Leverage or Extend RCI Resources
• RCI Implementation Team Needs Your Input and Collaboration (email Richard Moore @ SDSC)
Source: Mike Norman, SDSC
26. Potential UCSD Optical Networked Biomedical Researchers and Instruments
• Connects at 10 Gbps:
  – Microarrays
  – Genome Sequencers
  – Mass Spectrometry
  – Light and Electron Microscopes
  – Whole Body Imagers
  – Computing
  – Storage
[Campus map: San Diego Supercomputer Center, Calit2@UCSD, CryoElectron Microscopy Facility, Cellular & Molecular Medicine East and West, Bioengineering, Radiology Imaging Lab, National Center for Microscopy & Imaging, Center for Molecular Genetics, Pharmaceutical Sciences Building, Biomedical Research]
Developing Detailed Plan
Editor's Notes
I will quickly hint at the problem of data harmonization without getting into details, and speak about how difficult it is to find A1ATD patients despite ICD-9 codes.
This is a production cluster with its own Force10 E1200 switch. It is connected to Quartzite and is labeled as the “CAMERA Force10 E1200”. We built CAMERA this way because of technology deployed successfully in Quartzite.