This document summarizes the opening talk, given by Dr. Larry Smarr with Professor Ken Kreutz-Delgado at the CHASE-CI Workshop, on distributed machine learning platforms and brain-inspired computing. It discusses the Pacific Research Platform (PRP), which connects multiple universities and research institutions. The PRP uses FIONA appliances and Kubernetes to distribute storage and processing. A new NSF grant (CHASE-CI) will add 256 GPUs across 10 campuses for training AI algorithms on big data. The talk envisions surrounding the PRP with clouds of GPUs and non-von Neumann processors such as IBM's TrueNorth chip. Calit2's Pattern Recognition Lab uses a range of processors, including TrueNorth, to explore machine learning algorithms.
Creating a Big Data Machine Learning Platform in California - Larry Smarr
Big Data Tech Forum: Big Data Enabling Technologies and Applications
San Diego Chinese American Science and Engineering Association (SDCASEA)
Sanford Consortium
La Jolla, CA
December 2, 2017
The Pacific Research Platform (PRP) is a multi-institutional cyberinfrastructure project that connects researchers across California and beyond to share large datasets. It spans the 10 University of California campuses, major private research universities, supercomputer centers, and some out-of-state universities. Fifteen multi-campus research teams in fields like physics, astronomy, earth sciences, biomedicine, and multimedia will drive the technical needs of the PRP over five years. The goal is to create a "big data freeway" to allow high-speed sharing of data between research labs, supercomputers, and repositories across multiple networks without performance loss over long distances.
- The Pacific Research Platform (PRP) interconnects campus DMZs across multiple institutions to provide high-speed connectivity for data-intensive research.
- The PRP utilizes specialized data transfer nodes called FIONAs that provide disk-to-disk transfer speeds of 10-100Gbps.
- Early applications of the PRP include distributing telescope data between UC campuses, connecting particle physics experiments to computing resources, and enabling real-time wildfire sensor data analysis.
Opening Keynote Lecture
15th Annual ON*VECTOR International Photonics Workshop
Calit2’s Qualcomm Institute
University of California, San Diego
February 29, 2016
Building the Pacific Research Platform: Supernetworks for Big Data Science - Larry Smarr
The document summarizes Dr. Larry Smarr's presentation on building the Pacific Research Platform (PRP) to enable big data science across research universities on the West Coast. The PRP provides 100-1000 times more bandwidth than today's internet to support research fields from particle physics to climate change. In under 2 years, the prototype PRP has connected researchers and datasets across California through optical networks and is now expanding nationally and globally. The next steps involve adding machine learning capabilities to the PRP through GPU clusters to enable new discoveries from massive datasets.
Towards a High-Performance National Research Platform Enabling Digital Research - Larry Smarr
The document summarizes Dr. Larry Smarr's keynote presentation on enabling a high-performance national research platform. It describes how multi-institutional research increasingly relies on access to large datasets, requiring new cyberinfrastructure. The Pacific Research Platform provides high-bandwidth networking between universities to support research collaborations across disciplines. The next steps involve scaling this model into a national and global platform. The presentation highlights how the PRP enables various scientific applications and drives innovation through improved data transfer capabilities and distributed computing resources.
My talk at the Winter School on Big Data in Tarragona, Spain.
Abstract: We have made much progress over the past decade toward harnessing the collective power of IT resources distributed across the globe. In high-energy physics, astronomy, and climate, thousands work daily within virtual computing systems with global scope. But we now face a far greater challenge: Exploding data volumes and powerful simulation tools mean that many more--ultimately most?--researchers will soon require capabilities not so different from those used by such big-science teams. How are we to meet these needs? Must every lab be filled with computers and every researcher become an IT specialist? Perhaps the solution is rather to move research IT out of the lab entirely: to leverage the “cloud” (whether private or public) to achieve economies of scale and reduce cognitive load. I explore the past, current, and potential future of large-scale outsourcing and automation for science, and suggest opportunities and challenges for today’s researchers.
Discovery Engines for Big Data: Accelerating Discovery in Basic Energy Sciences - Ian Foster
Argonne’s Discovery Engines for Big Data project is working to enable new research modalities based on the integration of advanced computing with experiments at facilities such as the Advanced Photon Source (APS). I review science drivers and initial results in diffuse scattering, high energy diffraction microscopy, tomography, and ptychography. I also describe the computational methods and infrastructure that we leverage to support such applications, which include the Petrel online data store, ALCF supercomputers, Globus research data management services, and Swift parallel scripting. This work points to a future in which tight integration of DOE’s experimental and computational facilities enables both new science and more efficient and rapid discovery.
Accelerating Discovery via Science Services - Ian Foster
[A talk presented at Oak Ridge National Laboratory on October 15, 2015]
We have made much progress over the past decade toward harnessing the collective power of IT resources distributed across the globe. In big-science projects in high-energy physics, astronomy, and climate, thousands work daily within virtual computing systems with global scope. But we now face a far greater challenge: Exploding data volumes and powerful simulation tools mean that many more--ultimately most?--researchers will soon require capabilities not so different from those used by such big-science teams. How are we to meet these needs? Must every lab be filled with computers and every researcher become an IT specialist? Perhaps the solution is rather to move research IT out of the lab entirely: to develop suites of science services to which researchers can dispatch mundane but time-consuming tasks, and thus to achieve economies of scale and reduce cognitive load. I explore the past, current, and potential future of large-scale outsourcing and automation for science, and suggest opportunities and challenges for today’s researchers. I use examples from Globus and other projects to demonstrate what can be achieved.
Machine Learning in Healthcare Diagnostics - Larry Smarr
Machine learning and artificial intelligence are rapidly transforming healthcare and medicine. Advances in genetic sequencing have enabled the mapping of human and microbial genomes at low costs. Researchers are using machine learning to analyze genomic and microbiome data to better understand health and disease. Non-von Neumann brain-inspired computing architectures are being developed for machine learning applications and could accelerate medical research and diagnostics. These technologies may help create personalized health coaching and move medicine from reactive sickcare to proactive healthcare.
This document summarizes a presentation given at ICME 2015 at the Cheyenne Mountain Resort in Colorado Springs, Colorado. The presentation was given by James Belak from Lawrence Livermore National Laboratory and discussed use-inspired research and development for integrated computational materials engineering. It addressed key computations needed for ICME like databases, expert systems, simulation methods, and continuum models. Integrating computations into materials engineering was a focus.
1) Scientists at the Advanced Photon Source use the Argonne Leadership Computing Facility for data reconstruction and analysis from experimental facilities in real-time or near real-time. This provides feedback during experiments.
2) Using the Swift parallel scripting language and ALCF supercomputers like Mira, scientists can process terabytes of data from experiments in minutes rather than hours or days. This enables errors to be detected and addressed during experiments.
3) Key applications discussed include near-field high-energy X-ray diffraction microscopy, X-ray nano/microtomography, and determining crystal structures from diffuse scattering images through simulation and optimization. The workflows developed provide significant time savings and improved experimental outcomes.
06.07.26
Invited Talk
Cyberinfrastructure for Humanities, Arts, and Social Sciences, A Summer Institute, SDSC
Title: The OptIPuter and Its Applications
La Jolla, CA
The Pacific Research Platform: A Regional-Scale Big Data Analytics Cyberinfra... - Larry Smarr
The document discusses the Pacific Research Platform (PRP), a regional big data cyberinfrastructure connecting researchers across California universities. PRP provides high-speed networks and data transfer nodes to enable sharing of large datasets for projects like medical imaging, cryo-electron microscopy, and machine learning. Recent grants are expanding PRP to add GPUs and non-von Neumann processors to support these computationally intensive applications.
Distributed Cyberinfrastructure to Support Big Data Machine Learning - Larry Smarr
Panel on the Future of Machine Learning
California Institute for Telecommunications and Information Technology
University of California, Irvine
May 24, 2018
Accelerating Data-driven Discovery in Energy Science - Ian Foster
A talk given at the US Department of Energy, covering our work on research data management and analysis. Three themes:
(1) Eliminate data friction (use of SaaS for research data management)
(2) Liberate scientific data (research on data extraction, organization, publication)
(3) Create discovery engines at DOE facilities (services that organize data + computation)
In this deck from the 2014 HPC User Forum in Seattle, Jack Collins from the National Cancer Institute presents: Genomes to Structures to Function: The Role of HPC.
Watch the video presentation: http://wp.me/p3RLHQ-d28
This document discusses using cloud computing and virtualization for scientific research. Some key points:
- Scientists can access remote sensors, share data and workflows, and store personal data in the cloud. Beginners can click to code, while experts can build complex workflows.
- Services allow publishing, finding, and binding to distributed resources through registries. Data can be queried through standards like Simple Image Access Protocol.
- Distributed registries from various organizations harvest metadata to enable semantic search across sky regions, identifiers, tags, vocabularies, schemas, and service descriptions.
- Tools provide code/presentation environments and access to distributed data in the cloud. Services include astronomical cross-matching and event notification through Sky
Peering The Pacific Research Platform With The Great Plains Network - Larry Smarr
The Pacific Research Platform (PRP) connects research institutions across the western United States with high-speed networks to enable data-intensive science collaborations. Key points:
- The PRP connects 15 campuses across California and links to the Great Plains Network, allowing researchers to access remote supercomputers, share large datasets, and collaborate on projects like analyzing data from the Large Hadron Collider.
- The PRP utilizes Science DMZ architectures with dedicated data transfer nodes called FIONAs to achieve high-speed transfer of large files. Kubernetes is used to manage distributed storage and computing resources.
- Early applications include distributed climate modeling, wildfire science, plankton imaging, and cancer genomics. The PR
Looking Back, Looking Forward NSF CI Funding 1985-2025 - Larry Smarr
This document provides an overview of the development of national research platforms (NRPs) from 1985 to the present, with a focus on the Pacific Research Platform (PRP). It describes the evolution of the PRP from early NSF-funded supercomputing centers to today's distributed cyberinfrastructure utilizing optical networking, containers, Kubernetes, and distributed storage. The PRP now connects over 15 universities across the US and internationally to enable data-intensive science and machine learning applications across multiple domains. Going forward, the document discusses plans to further integrate regional networks and partner with new NSF-funded initiatives to develop the next generation of NRPs through 2025.
Pacific Research Platform Science Drivers - Larry Smarr
The document discusses the vision and progress of the Pacific Research Platform (PRP) in creating a "big data freeway" across the West Coast to enable data-intensive science. It outlines how the PRP builds on previous NSF and DOE networking investments to provide dedicated high-performance computing resources, like GPU clusters and Jupyter hubs, connected by high-speed networks at multiple universities. Several science driver teams are highlighted, including particle physics, astronomy, microbiology, earth sciences, and visualization, that will leverage PRP resources for large-scale collaborative data analysis projects.
A National Big Data Cyberinfrastructure Supporting Computational Biomedical R... - Larry Smarr
Invited Presentation
Symposium on Computational Biology and Bioinformatics:
Remembering John Wooley
National Institutes of Health
Bethesda, MD
July 29, 2016
The document discusses the Pacific Research Platform (PRP), a distributed cyberinfrastructure that connects researchers and data across multiple campuses in California and beyond using optical fiber networking. Key points:
- The PRP uses high-speed networking infrastructure like the CENIC network to connect data generators and consumers across 15+ campuses, creating an integrated "big data freeway system".
- It deploys specialized data transfer nodes called FIONAs to enable high-speed transfer of large datasets between sites at near the full network speed.
- Recent additions include using Kubernetes to orchestrate containers across the PRP infrastructure and integrating machine learning resources through the CHASE-CI grant to support data-intensive AI applications.
The Pacific Research Platform: a Science-Driven Big-Data Freeway System - Larry Smarr
The Pacific Research Platform will create a regional "Big Data Freeway System" along the West Coast to support science. It will connect major research institutions with high-speed optical networks, allowing them to share vast amounts of data and computational resources. This will enable new forms of collaborative, data-intensive research for fields like particle physics, astronomy, biomedicine, and earth sciences. The first phase aims to establish a basic networked infrastructure, with later phases advancing capabilities to 100Gbps and beyond with security and distributed technologies.
Global Research Platforms: Past, Present, Future - Larry Smarr
Cognitive Hardware and Software Ecosystem Community Infrastructure (CHASE-CI) - Larry Smarr
This document summarizes Dr. Larry Smarr's presentation on the Cognitive Hardware and Software Ecosystem Community Infrastructure (CHASE-CI) project. The project received two NSF grants totaling over $5 million to create a regional cyberinfrastructure linking multiple universities. This includes a "Big Data Superhighway" linking campus networks and a machine learning layer with 256 GPUs for faculty and students. The goal is to map machine learning algorithms to novel architectures like GPUs, FPGAs, and neuromorphic chips to support data science and AI applications.
A California-Wide Cyberinfrastructure for Data-Intensive Research - Larry Smarr
The document discusses creating a California-wide cyberinfrastructure for data-intensive research. It outlines efforts to connect all UC campuses and other research institutions across California with high-speed optical networks. This would create a "big data plane" to share large datasets. Several campuses have received NSF grants to upgrade their networks and implement Science DMZ architectures with 10-100Gbps connections to CENIC. Connecting these resources would provide researchers access to high-performance computing, large scientific instruments, and datasets. This would support collaborative big data science across disciplines like physics, climate modeling, genomics and microscopy.
The document provides an overview of the Pacific Research Platform (PRP) and discusses its role in connecting researchers across institutions and enabling new applications. It summarizes the PRP's key components like Science DMZs, Data Transfer Nodes (FIONAs), and use of Kubernetes for container management. Several examples are given of how the PRP facilitates high-performance distributed data analysis, access to remote supercomputers, and sensor networks coupled to real-time computing. Upcoming work on machine learning applications and expanding the PRP internationally is also outlined.
Similar to CHASE-CI: A Distributed Big Data Machine Learning Platform (20)
My Remembrances of Mike Norman Over The Last 45 Years - Larry Smarr
Mike Norman has been a leader in computational astrophysics for over 45 years. Some of his influential work includes:
- Cosmic jet simulations in the early 1980s which helped explain phenomena from galactic centers.
- Pioneering the use of adaptive mesh refinement in the 1990s to achieve dynamic load balancing on supercomputers.
- Massive cosmology simulations in the late 2000s with over 100 trillion particles using thousands of processors across multiple supercomputing sites, producing petabytes of data.
- Developing end-to-end workflows in the 2000s to couple supercomputers, high-speed networks, and large visualization systems to enable real-time analysis of extremely large astrophysics simulations.
Metagenics: How Do I Quantify My Body and Try to Improve its Health? June 18, 2019 - Larry Smarr
Larry Smarr discusses quantifying his body and health over time through extensive self-tracking. He measures various biomarkers through regular blood tests and analyzes his gut microbiome by sequencing stool samples. This revealed issues like chronic inflammation and an unhealthy microbiome. Smarr then took steps like a restricted eating window and increasing plant diversity in his diet, which reversed metabolic syndrome issues and correlated with shifts in his microbiome ecology. His goal is to continue precisely measuring factors like toxins, hormones, gut permeability and food/supplement impacts to further optimize his health.
Panel: Reaching More Minority Serving Institutions - Larry Smarr
This document discusses engaging more minority serving institutions (MSIs) in cyberinfrastructure development through regional networks. It provides data showing the importance of MSIs like historically black colleges and universities (HBCUs) in educating underrepresented minority students in STEM fields. Regional networks can help equalize opportunities by assisting MSIs in overcoming barriers to resources through training, networking infrastructure support, and helping institutions obtain necessary staffing and funding. Strategies mentioned include collaborating with MSIs on grants and addressing issues identified in surveys like lack of vision for data use beyond compliance. The goal is to broaden participation in STEAM fields by leveraging the success MSIs have shown in supporting underrepresented students.
Global Network Advancement Group - Next Generation Network-Integrated Systems - Larry Smarr
This document summarizes a presentation on global petascale to exascale workflows for data intensive sciences. It discusses a partnership convened by the GNA-G Data Intensive Sciences Working Group with the mission of meeting challenges faced by data-intensive science programs. Cornerstone concepts that will be demonstrated include integrated network and site resource management, model-driven frameworks for resource orchestration, end-to-end monitoring with machine learning-optimized data transfers, and integrating Qualcomm's GradientGraph with network services to optimize applications and science workflows.
Wireless FasterData and Distributed Open Compute Opportunities and (some) Us... - Larry Smarr
This document discusses opportunities for ESnet to support wireless edge computing through developing a strategy around self-guided field laboratories (SGFL). It outlines several potential science use cases that could benefit from wireless and distributed computing capabilities, both in the short term through technologies like 5G, LoRa and Starlink, and longer term through the vision of automated SGFL. The document proposes some initial ideas for deploying and testing wireless edge computing technologies through existing projects to help enable the SGFL vision and further scientific opportunities. It emphasizes that exploring these emerging areas could help drive new science possibilities if done at a reasonable scale.
The Asia Pacific and Korea Research Platforms: An Overview - Jeonghoon Moon, Larry Smarr
This document provides an overview of Asia Pacific and Korea research platforms. It discusses the Asia Pacific Research Platform working group in APAN, including its objectives to promote HPC ecosystems and engage members. It describes the Asi@Connect project which provides high-capacity internet connectivity for research across Asia-Pacific. It also discusses the Korea Research Platform and efforts to expand it to 25 national research institutes in Korea. New related projects on smart hospitals, agriculture, and environment are mentioned. The conclusion discusses enhancing APAN and the Korea Research Platform and expanding into new areas like disaster and AI education.
Panel: Reaching More Minority Serving Institutions - Larry Smarr
This document discusses engaging more minority serving institutions (MSIs) in the National Research Platform (NRP). It provides data showing that MSIs serve a disproportionate number of underrepresented minority students and are important producers of STEM graduates from these groups. The NRP can help broaden participation in STEAM fields by providing MSIs access to advanced cyberinfrastructure resources, new learning modalities, and opportunities for collaborative research between MSIs and other institutions. Regional networks also have a role to play in helping MSIs overcome barriers and attracting them to collaborative grants. The goal is to tear down walls between research and teaching and reinvent the university experience for more inclusive learning and innovation.
Panel: The Global Research Platform: An Overview - Larry Smarr
The document provides an overview of the Global Research Platform (GRP), an international collaborative partnership creating a distributed environment for data-intensive global science. The GRP facilitates high-performance data gathering, analytics, transport up to terabits per second, computing, and storage to support large-scale global science cyberinfrastructure ecosystems. It aims to orchestrate research across multiple domains using international testbeds for investigating new technologies related to data-intensive science. Examples of instruments generating exabytes of data that would benefit include the Korea Superconducting Tokamak, the High Luminosity LHC, genomics, the SKA radio telescope, and the Vera Rubin Observatory.
Panel: Future Wireless Extensions of Regional Optical Networks - Larry Smarr
CENIC is a non-profit organization that operates an 8,000+ mile fiber optic network connecting over 12,000 sites across California, including K-12 schools, universities, libraries, and research organizations. It has over 750 private sector partners and contributes over $100 million annually to the California economy. CENIC's network enables research and education collaborations, innovation, and economic growth statewide. It also operates a wireless research network called PRP that connects wireless sensors to supercomputers, supporting applications like wildfire modeling.
Global Research Platform Workshops - Maxine Brown, Larry Smarr
The document announces a workshop on global research platforms that will be held virtually in 2021 and in Salt Lake City in 2022, with topics including large-scale science, next-generation platforms, data transport, and international testbeds. It also announces the 4th Global Research Platform Workshop to be held in October 2023 in Limassol, Cyprus co-located with the IEEE eScience 2023 conference.
EPOC and NetSage provide engagement and network monitoring services to support research and education. NetSage collects anonymized network flow data to help understand traffic patterns and troubleshoot performance issues. It provides dashboards and analysis to answer common questions from network engineers and end users. Examples of NetSage deployments and use cases were shown for the CENIC network, including top sources and destinations of traffic, debugging slow flows, and analyzing international traffic patterns by country over time.
The document discusses accelerating science discovery with AI inference-as-a-service. It describes showcases using this approach for high energy physics and gravitational wave experiments. It outlines the vision of the A3D3 institute to unite domain scientists, computer scientists, and engineers to achieve real-time AI and transform science. Examples are provided of using AI inference-as-a-service to accelerate workflows for CMS, ProtoDUNE, LIGO, and other experiments.
Democratizing Science through Cyberinfrastructure - Manish Parashar, Larry Smarr
This document summarizes a presentation by Manish Parashar on democratizing science through cyberinfrastructure. The key points are:
1) Broad, fair, and equitable access to advanced cyberinfrastructure is essential for democratizing 21st century science, but there are significant barriers related to knowledge, technical issues, social factors, and balancing capabilities.
2) An advanced cyberinfrastructure ecosystem for all requires integrated portals, access to local and national resources through high-speed networks, diverse allocation modes, embedded expertise networks, and broad training.
3) Realizing this vision will require a scalable federated ecosystem with diverse capabilities and incentives for partnerships to meet growing needs for cyberinfrastructure and
Panel: Building the NRP Ecosystem with the Regional Networks on their Campuses - Larry Smarr
This document summarizes a panel discussion on building the National Research Platform ecosystem with regional networks. The panelists discussed how their regional networks are connecting to and using the Nautilus nodes of the NRP. Examples included using NRP for deep learning and computer vision research at the University of Missouri, challenges of adoption in Nevada and potential solutions, and Georgia Tech's new involvement through the Southern Crossroads regional network. The regional networks see opportunities to expand NRP access and training to enable more researchers in their regions to take advantage of the platform.
Open Force Field: Scavenging pre-emptible CPU hours* in the age of COVID - Je... - Larry Smarr
The document discusses Open Force Field (OpenFF), an open-source project that enables rapid development of molecular force fields through automated infrastructure, open data and software, and an open science approach. OpenFF provides access to large quantum chemical datasets, runs quantum chemistry calculations on pre-emptible cloud resources with minimal human intervention, and facilitates easy iteration and testing of new force field hypotheses through an open development model.
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B... - Larry Smarr
The document discusses open infrastructure for an open society and the role of commercial clouds. It describes how the National Research Platform (NRP), Open Science Grid (OSG), and Open Science Data Federation (OSDF) provide open infrastructure through open source components that anyone can contribute to and use. It then discusses how Southwestern Oklahoma State University leveraged NRP resources on their campus and engaged students and local teachers. Finally, it outlines the pros and cons of commercial clouds, when they may be suitable to use, and how tools like CloudBank and Kubernetes can help facilitate science users' access to cloud resources.
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B... - Larry Smarr
The document discusses open infrastructure for an open society and the role of commercial clouds. It describes how the National Research Platform (NRP), Open Science Grid (OSG), and Open Science Data Federation (OSDF) provide open infrastructure through open source components that anyone can contribute to and use. It then discusses how Southwestern Oklahoma State University leveraged NRP resources on their campus and engaged students and local teachers. Finally, it outlines the pros and cons of commercial clouds, noting they provide huge capacity and variety but are very expensive for regular use. Facilitating science users on clouds requires services like CloudBank and Kubernetes federation.
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B... - Larry Smarr
The document discusses open infrastructure for an open society and the role of commercial clouds. It describes how the National Research Platform (NRP), Open Science Grid (OSG), and Open Science Data Federation (OSDF) provide open infrastructure through open source components that anyone can contribute to and use. It then discusses how Southwestern Oklahoma State University leveraged NRP resources on their campus and engaged students and local teachers. Finally, it outlines the pros and cons of commercial clouds, noting they provide huge capacity and variety but are very expensive for regular use. Facilitating science users on clouds requires tools for account management, documentation, and integrating cloud resources through HTCondor and Kubernetes.
Frank Würthwein - NRP and the Path forward - Larry Smarr
NRP will replace PRP and aims to democratize access to national research cyberinfrastructure. The long term vision is to create an open national cyberinfrastructure by federating resources across research institutions. Key innovations include an innovative network fabric, application libraries for FPGAs, a "bring your own resource" model, and innovative scheduling and data infrastructure. The NSF has funded the Prototype National Research Platform project to support NRP for the next 5 years. NRP aims to grow resources, introduce new capabilities, and be driven by the research community.
CHASE-CI: A Distributed Big Data Machine Learning Platform
1. “CHASE-CI: A Distributed Big Data
Machine Learning Platform”
Opening Talk With Professor Ken Kreutz-Delgado
CHASE-CI Workshop
Calit2’s Qualcomm Institute
University of California, San Diego
May 14, 2018
Dr. Larry Smarr
Director, California Institute for Telecommunications and Information Technology
Harry E. Gruber Professor,
Dept. of Computer Science and Engineering
Jacobs School of Engineering, UCSD
http://lsmarr.calit2.net
2. DOE ESnet’s Science DMZ
Creates a Separate Network for Big Data Applications
• A Science DMZ Integrates 4 Key Concepts Into a Unified Whole:
– A network architecture designed for high-performance applications,
with the science network distinct from the general-purpose network
– The use of dedicated systems as data transfer nodes (DTNs)
– Performance measurement and network testing systems that are
regularly used to characterize and troubleshoot the network
– Security policies and enforcement mechanisms that are tailored for
high performance science environments
http://fasterdata.es.net/science-dmz/
Science DMZ
Coined 2010
3. Based on Community Input and on ESnet’s Science DMZ Concept,
NSF Has Made Over 200 Campus-Level Awards in 44 States
Source: Kevin Thompson, NSF
4. Science DMZ Data Transfer Nodes (DTNs) -
Flash I/O Network Appliances (FIONAs)
UCSD Designed FIONAs
To Solve the Disk-to-Disk
Data Transfer Problem
at Full Speed
on 10G, 40G and 100G Networks
FIONAS—10/40G, $8,000
Phil Papadopoulos, SDSC &
Tom DeFanti, Joe Keefe & John Graham, Calit2
FIONette—1G, $250
Five Racked FIONAs at Calit2
• Each Contains:
• Dual 12-Core CPUs
• 96GB RAM
• 1TB SSD
• 2 10GbE interfaces
• Total ~$10,500
• With 8 GPUs
• total ~$18,500
5. Logical Next Step: The Pacific Research Platform Networks Campus DMZs
to Create a Regional End-to-End Science-Driven “Big Data Superhighway” System
(GDC)
NSF CC*DNI Grant
$5M 10/2015-10/2020
PI: Larry Smarr, UC San Diego Calit2
Co-PIs:
• Camille Crittenden, UC Berkeley CITRIS,
• Tom DeFanti, UC San Diego Calit2/QI,
• Philip Papadopoulos, UCSD SDSC,
• Frank Wuerthwein, UCSD Physics and SDSC
Letters of Commitment from:
• 50 Researchers from 15 Campuses
• 32 IT/Network Organization Leaders
NSF Program Officer: Amy Walton
Source: John Hess, CENIC
6. PRP National-Scale Experimental Distributed Testbed:
Using Internet2 to Connect Early-Adopter Quilt Regional R&E Networks
Original PRP
Extended PRP
Testbed
Announced at Internet2 Global Summit May 8, 2018
7. PRP’s First 2.5 Years:
Connecting Multi-Campus Application Teams and Devices
Earth Sciences
8. 100 Gbps FIONA at UCSC Allows for Downloads to the UCSC Hyades Cluster
from the LBNL NERSC Supercomputer for Telescope Survey Analysis
300 images per night.
100MB per raw image
120GB per night
250 images per night.
530MB per raw image
800GB per night
Source: Peter Nugent, LBNL
Professor of Astronomy, UC Berkeley
NSF-Funded Cyberengineer
Shaw Dong @UCSC
Receiving FIONA
Feb 7, 2017
CENIC 2018 Innovations
in Networking Award for
Research Applications
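As a rough check on why DTN link speed matters for these nightly volumes, here is a back-of-the-envelope calculation; the 120 GB and 800 GB per-night figures are taken from the slide, while the link speeds and the assumption of a single saturated flow are mine:

```python
# Back-of-the-envelope transfer times for one night of survey data,
# assuming one flow that saturates the link and ignoring protocol overhead.
nightly_volumes_gb = {"120 GB/night": 120, "800 GB/night": 800}
link_speeds_gbps = [10, 40, 100]

for label, gigabytes in nightly_volumes_gb.items():
    gigabits = gigabytes * 8
    for speed in link_speeds_gbps:
        minutes = gigabits / speed / 60
        print(f"{label} over {speed:3d} Gbps: ~{minutes:.1f} minutes")
```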
9. Game Changer: Using Kubernetes
to Manage Containers Across the PRP
“Kubernetes is a way of stitching together
a collection of machines into, basically, a big computer,”
--Craig McLuckie, Google
and now CEO and Founder of Heptio
"Everything at Google runs in a container."
--Joe Beda, Google
“Kubernetes has emerged as
the container orchestration engine of choice
for many cloud providers including
Google, AWS, Rackspace, and Microsoft,
and is now being used in HPC and Science DMZs.”
--John Graham, Calit2/QI UC San Diego
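The deck does not reproduce the PRP's actual Kubernetes recipes, but a minimal sketch of what "stitching machines into one big computer" looks like from a user's side might use the official kubernetes Python client; the namespace, container image, and GPU count below are placeholder values, not PRP configuration:

```python
from kubernetes import client, config

# Load credentials from the local kubeconfig (placeholder cluster/namespace).
config.load_kube_config()

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="gpu-smoke-test", namespace="demo"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="cuda",
                image="nvidia/cuda:11.0-base",          # placeholder CUDA image
                command=["nvidia-smi"],                  # just report the GPU it was given
                resources=client.V1ResourceRequirements(
                    limits={"nvidia.com/gpu": "1"}       # request one GPU from the scheduler
                ),
            )
        ],
    ),
)

# The scheduler picks any node in the federation with a free GPU.
client.CoreV1Api().create_namespaced_pod(namespace="demo", body=pod)
```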
10. Rook is Ceph Cloud-Native Object Storage
‘Inside’ Kubernetes
https://rook.io/
Source: John Graham, Calit2/QI
11. Running Kubernetes/Rook/Ceph on the PRP
Allows Us to Deploy a Distributed PB+ of Storage for Posting Science Data
(March 2018, John Graham, UCSD)
[Deployment diagram: FIONA8 nodes and storage appliances (40G 160TB, 100G NVMe 6.4T, 100G Gold/Epyc NVMe) plus an sdx-controller (controller-0), deployed at Calit2, SDSC, SDSU, Caltech, UCAR, UCI, UCR, USC, UCLA, Stanford, UCSB, UCSC, and Hawaii]
• Rook/Ceph - Block/Object/FS
• Swift API compatible with SDSC, AWS, and Rackspace
• Kubernetes on CentOS 7
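Ceph's object gateway speaks an S3-compatible dialect alongside Swift, so posting a dataset into the distributed pool can look like an ordinary object-store upload. A sketch using boto3 with a placeholder endpoint, bucket, and credentials (not the PRP's actual gateway or workflow):

```python
import boto3

# Placeholder endpoint and credentials for a Ceph object gateway
# fronting the Rook-managed storage pool.
s3 = boto3.client(
    "s3",
    endpoint_url="https://ceph-gateway.example.edu",
    aws_access_key_id="YOUR_ACCESS_KEY",
    aws_secret_access_key="YOUR_SECRET_KEY",
)

s3.create_bucket(Bucket="telescope-survey")
s3.upload_file("night_2018_03_14.tar", "telescope-survey", "night_2018_03_14.tar")

# List what is now posted for collaborators to pull at DTN speeds.
for obj in s3.list_objects_v2(Bucket="telescope-survey").get("Contents", []):
    print(obj["Key"], obj["Size"])
```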
12. The Rise of Brain-Inspired Computers:
Left & Right Brain Computing: Arithmetic vs. Pattern Recognition
Adapted from D-Wave
13. New NSF CHASE-CI Grant Creates a Community Cyberinfrastructure:
Adding a Machine Learning Layer Built on Top of the Pacific Research Platform
Campuses: Caltech, UCB, UCI, UCR, UCSD, UCSC, Stanford, MSU, UCM, SDSU
NSF Grant for High Speed “Cloud” of 256 GPUs
For 30 ML Faculty & Their Students at 10 Campuses
for Training AI Algorithms on Big Data
NSF Program Officer: Mimi McClure
14. FIONA8: Adding GPUs to FIONAs
Supports Data Science Machine Learning
Multi-Tenant Containerized GPU JupyterHub
Running Kubernetes / CoreOS
Eight Nvidia GTX-1080 Ti GPUs
32GB RAM, 3TB SSD, 40G & Dual 10G ports
Source: John Graham, Calit2
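From a notebook inside one of these JupyterHub containers, a user might confirm which GPUs were allotted with a few lines of PyTorch; this is a generic check, not part of the deployment itself:

```python
import torch

# On a full FIONA8 one would expect up to eight devices,
# fewer if the container was granted only a slice of the node.
print("CUDA available:", torch.cuda.is_available())
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))
```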
15. UCSD Adding >350 Game GPUs to Data Sciences Cyberinfrastructure -
Devoted to Data Analytics and Machine Learning
• 48 GPUs for OSG Applications
• SunCAVE: 70 GPUs
• WAVE + Vroom: 48 GPUs
• FIONAs with 8 Game GPUs: 95 GPUs for Students
• CHASE-CI Grant Provides 96 GPUs at UCSD for Training AI Algorithms on Big Data
• Plus 288 64-bit GPUs on SDSC's Comet
16. Next Step: Surrounding the PRP Machine Learning Platform
With Clouds of GPUs and Non-von Neumann Processors
• Microsoft Installs Altera FPGAs into Bing Servers & 384 into TACC for Academic Access
• CHASE-CI 64-TrueNorth Cluster
• 64-bit GPUs
• 4352x NVIDIA Tesla V100 GPUs
17. The Future of Supercomputing Will Blend Traditional HPC and Data Analytics
Integrating Non-von Neumann Architectures
“High Performance Computing Will Evolve
Towards a Hybrid Model,
Integrating Emerging Non-von Neumann Architectures,
with Huge Potential in Pattern Recognition,
Streaming Data Analysis,
and Unpredictable New Applications.”
Horst Simon, Deputy Director,
U.S. Department of Energy’s
Lawrence Berkeley National Laboratory
18. Calit2’s Qualcomm Institute Has Established a Pattern Recognition Lab
For Machine Learning on GPUs and von Neumann and NvN Processors
Source: Dr. Dharmendra Modha
Founding Director, IBM Cognitive Computing Group
August 8, 2014
UCSD ECE Professor Ken Kreutz-Delgado Brings
the IBM TrueNorth Chip
to Start Calit2’s Qualcomm Institute
Pattern Recognition Laboratory
September 16, 2015
19. Ken Kreutz-Delgado
Director, Calit2/QI Pattern Recognition Laboratory
Professor of Electrical & Computer Engineering
Irwin & Joan Jacobs School of Engineering
University of California, San Diego
Calit2/QI Pattern Recognition Laboratory (PRLab)
20. Pattern Recognition Lab (PRLab)
– A Nexus for a Community of Researchers and Practitioners
in Theory and Applications of Pattern Recognition & Machine Learning
– All Disciplines and Application Areas (Medicine, Education, Finance,
Economics, Science, Engineering, Art…) Can Be Involved
– Computing “On-The-Edge-of-The-Edge”: Real-Time, Local, Fast and
Robust Processing for Critical Control and Decision Making
– (e.g., Robotic Surgical Assistance, Autonomous Aircraft)
21. The PRLab Community - I
Calit2 Technical Leadership:
Tom DeFanti, Engineering Systems Scientist
Srinjoy Das, Principal Chip Algorithms Designer
John Graham, Systems Development & Integration
Joe Keefe, Systems Integration
Regional UC Campuses
23. Mapping Machine Learning Algorithm Families onto Novel Architectures
for Real-time On-the-Edge Embedded Computing
• Deep & Recurrent Neural Networks (DNN, RNN)
• Graph Theoretic Approaches (Bayes Nets, Markov Random Fields)
• Reinforcement Learning and Control (RL)
• Markov Decision Processes; Time Series Analysis
• Clustering and other Neighborhood Approaches
• Support Vector Machines (SVM)
• Sparse Signal Processing, Source Localization & Compressive Sensing
• Stochastic Sampling & Variational Approximation for Bayesian Reasoning
• Dimensionality Reduction & Manifold Learning
• Ensemble Learning (Boosting, Bagging)
• Latent Variable Analysis (PCA, ICA)
24. Example Hard Problem –
Real-Time EEG-Based BCI
Research Performed by Grad Students Jason Palmer, Nima Bigdely-Shamlo,
Ozgur Balkan, Luca Pion-Tonachini, Alejandro Pineda, Ramon Martinez, Ching-fu Chen,
in Collaboration with Dr. Scott Makeig, Director SCCN
Localize and Isolate Dynamically Changing Brain Sources in Real Time
Image source: emotiv.com
25. Computing on the Edge-of-the-Edge
– New Computational Paradigms are Needed for Real-Time Pattern Recognition
and Machine Learning Algorithms
– Exploit and Enhance the Performance of:
– Advanced SOC Mobile Device Processors (e.g., Qualcomm Snapdragon)
– Non-von Neumann (NvN) Processors, Including:
– Field Programmable Gate Arrays (FPGAs)
– Digital Neuromorphic Processors (e.g., IBM TrueNorth)
26. Horst Simon, Deputy Director,
Lawrence Berkeley National Laboratory’s
National Energy Research Scientific Computing Center
Qualcomm
Institute
Brain-Inspired Computation
• Straightforward Extrapolation Results in a Real Time Human
Brain Scale Simulation at 1–10 Exaflop/s with 4 PB of Memory
• A Digital Computer with this Performance Might be Available
in 2022–2024 with a Power Consumption of >20–30 MW
• The Human Brain Runs on 20 W
• Our Brain is a Million Times More Power Efficient!
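The "million times" figure follows directly from the two power numbers above, taking the slide's 20-30 MW digital estimate against the brain's 20 W:

\[
\frac{P_{\mathrm{digital}}}{P_{\mathrm{brain}}} \approx \frac{2\times 10^{7}\ \mathrm{W}\ \text{to}\ 3\times 10^{7}\ \mathrm{W}}{20\ \mathrm{W}} \approx 10^{6}\ \text{to}\ 1.5\times 10^{6}
\]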
27. Brain-Inspired Computing
13 May 2016
[Diagram: canonical families of pattern recognition algorithms (Deep Networks, Stochastic Sampling, Support Vector Machines) mapped onto a heterogeneous mix of cybercores: spike-based processors, Snapdragon, and others]
28. Pushing the NvN Envelope:
Deep Generative Neural Networks for Real-Time Embedded Hardware-Based IoT Applications
• Approximate and Efficient Arithmetic (Adders, Comparators, Multipliers); Finite Precision; Memory Optimization; Data Flow Optimization
• Functional Approximation; Weights and Modes Criticality & Sensitivity Analysis; Sparsity & Pruning; Application- and Processor-Specific Architecture Determination and Optimization
• Training and Inference Methodologies; Gibbs Sampling; Variational Approximation; Transfer Learning; Reinforcement Learning
• Performance-at-Power & Power-at-Performance Measures; Distributional Similarity Measures; Statistical Hypothesis Testing; Design Optimization Criteria and Tools
• Low-Power, Embedded Real-Time Decision Making, Control & Scene/Scenario Generation and Situation Analysis
29. Brain-Inspired Processors
Are Accelerating the Non-von Neumann Architecture Era
“On the drawing board are collections of 64, 256, 1024, and 4096 chips.
‘It’s only limited by money, not imagination,’ Modha says.”
Source: Dr. Dharmendra Modha
IBM Chief Scientist for Brain-inspired Computing
August 8, 2014
30. Example: Stochastic Sampling for Deep Learning Algorithms
On Analog & Digital (IBM TrueNorth) Neuromorphic Chips
• Hierarchical, probabilistic learning and inference (Restricted Boltzmann Machines, Deep Belief Networks) mapped onto massively parallel computational substrates (“brain-like” VLSI platforms)
• Analog neurons (UCSD IFAT): neuromorphic VLSI, stochastic sampling & RBMs, using neural sampling with event-driven contrastive divergence for RBM learning/inference (Neftci et al., Frontiers in Neuroscience 2014)
• Digital neurons (IBM TrueNorth): digital Gibbs sampling for RBM/DBN inference (Das et al., ISCAS 2015)
• Benchmark: MNIST digit recognition, completion, and generation, comparing low-power neuromorphic platforms with conventional von Neumann processors
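For reference, the Gibbs sampling that the analog and digital substrates implement in spiking form is, in its conventional von Neumann formulation, just alternating conditional samples in a binary RBM. A toy NumPy sketch with made-up layer sizes and untrained weights:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy binary RBM: 784 visible units (MNIST-sized) and 128 hidden units,
# with small random (untrained) weights.
n_visible, n_hidden = 784, 128
W = 0.01 * rng.standard_normal((n_visible, n_hidden))
b_visible = np.zeros(n_visible)
b_hidden = np.zeros(n_hidden)

def gibbs_step(v):
    """One block-Gibbs sweep: sample hidden given visible, then visible given hidden."""
    p_h = sigmoid(v @ W + b_hidden)
    h = (rng.random(n_hidden) < p_h).astype(float)
    p_v = sigmoid(h @ W.T + b_visible)
    v_new = (rng.random(n_visible) < p_v).astype(float)
    return v_new, h

# Run a short chain from a random visible configuration; with trained weights
# the chain would wander toward digit-like patterns (recognition, completion, generation).
v = (rng.random(n_visible) < 0.5).astype(float)
for _ in range(100):
    v, h = gibbs_step(v)
```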
31. TrueNorth Chip
is on the PRP
• Our IBM TrueNorth Platform is Available for Use for Anyone on the PRP
• We Encourage All Who are Interested, Particularly Students, to Do Similar
Research.
• Last Summer:
– Three MS Students Remotely Accessed Our TrueNorth Chip From Berkeley
– Four UCSD Students Learned How to Program the TrueNorth Chip
32. Current Focus on FPGA Applications
• Application: Real-Time, Low Power Inference in
Restricted Boltzmann Machines (RBM), Deep Belief
Networks (DBN) and Generative Adversarial Networks
(GAN) using FPGA
• Current Shortcomings of Neuromorphic Approach:
– Analog Platforms (Homoeostasis Necessary)
– Digital Platforms Like TrueNorth – Algorithm Conversion
to “Spiking” Version for Maximum Effectiveness
Required (b/c Rate-Based is Inefficient)
– FPGA: Very Flexible NvN Platform
– Rapid Prototyping Enabling Algorithm Exploration
– If Necessary, Mapping to ASIC is Possible
34. Can Optimize Power/Performance Trade-Off
Synthetically
Generated Faces:
Lower Bitwidth = Less Power
Higher Bitwidth = More Realistic
Metric Says Use 12 bits
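The bitwidth knob behind this trade-off can be mimicked in software by uniformly quantizing a network's weights. A toy sketch (random weights, not the face-generating network from the slide) showing how quantization error shrinks as bitwidth grows, which is the "more realistic" side of the trade-off; power modeling is out of scope here:

```python
import numpy as np

def quantize(w, bits):
    """Uniformly quantize weights onto a signed fixed-point grid with `bits` bits."""
    levels = 2 ** (bits - 1) - 1          # e.g. 7 positive levels at 4 bits
    scale = np.max(np.abs(w)) / levels
    return np.round(w / scale) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(10_000)           # stand-in for generator weights

for bits in (4, 8, 12, 16):
    err = np.mean((w - quantize(w, bits)) ** 2)
    print(f"{bits:2d} bits -> mean squared quantization error {err:.2e}")
```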
35. Google Released Its AI Software as Open Source
Accelerating Development
https://exponential.singularityu.org/medicine/big-data-machine-learning-with-jeremy-howard/
From Programming Computers
Step by Step To Achieve a Goal
To Showing the Computer
Some Examples of
What You Want It to Achieve
and Then Letting the Computer
Figure It Out On Its Own
--Jeremy Howard, Singularity Univ.
2015
November 9, 2015
36. Google Designed a NvN
Machine Learning Accelerator
Calit2 is Negotiating Access for CHASE-CI
37. Join the Fun –
and Do Good Science!
• The PRLab is a Nexus for Pattern Recognition, Machine Learning, and Neural
Networks That Arise in Any Domain – From Medicine to the Arts
• As We Expand Our Suite of Processors, Opportunities for Students and Others
To Do Important Research and Development will Expand
• The Applications are Important and Will Become Even More So…
This is the Future!
38. Our Support:
• US National Science Foundation (NSF) awards
CNS 0821155, CNS-1338192, CNS-1456638, CNS-1730158,
ACI-1540112, & ACI-1541349
• University of California Office of the President CIO
• UCSD Chancellor’s Integrated Digital Infrastructure Program
• UCSD Next Generation Networking initiative
• Calit2 and Calit2 Qualcomm Institute
• CENIC, PacificWave and StarLight
• DOE ESnet