Talk at the EarthCube End-User Domain Workshop for Rock Deformation and Mineral Physics Research.
By Martin Kunz, Lawrence Berkeley National Laboratory
Welcome & Workshop Objectives: Introduction to COMPRES by Jay Bass, University of Illinois at Urbana-Champaign (EarthCube)
Talk at the EarthCube End-User Domain Workshop for Rock Deformation and Mineral Physics Research.
By Jay Bass, University of Illinois at Urbana-Champaign
The Pacific Research Platform: A Regional-Scale Big Data Analytics Cyberinfrastructure, by Larry Smarr
National Ocean Exploration Forum 2017
Ocean Exploration in a Sea of Data
Calit2’s Qualcomm Institute
University of California, San Diego
October 21, 2017
The Pacific Research Platform (PRP) aims to achieve transparent and rapid data access among collaborating scientists at multiple institutions through an integrated implementation of data-focused networking that extends the university campus Science DMZ model to a regional, national, and, eventually, a global scale.
PRP researchers are routinely achieving high-performance end-to-end networking from their labs to their collaborators’ labs and data centers, traversing multiple, heterogeneous Science DMZs and wide-area networks connecting multiple campus gateways, enabling researchers across the partnership to transfer data over dedicated optical lightpaths at speeds from 10Gb/s to 100Gb/s.
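As a sanity check on those line rates, ideal transfer time is simply dataset size over bandwidth. The helper below is illustrative and not from the talk; it ignores protocol overhead and disk limits.

```python
def transfer_time_seconds(size_bytes: float, rate_gbps: float) -> float:
    """Ideal transfer time at line rate, ignoring protocol and disk overhead."""
    bits = size_bytes * 8
    return bits / (rate_gbps * 1e9)

# A 1 TB dataset over a 100 Gb/s lightpath takes 80 seconds at line rate;
# the same transfer at 10 Gb/s takes ten times as long.
print(transfer_time_seconds(1e12, 100))  # 80.0
print(transfer_time_seconds(1e12, 10))   # 800.0
```

In practice, sustained disk-to-disk rates fall below these ideals, which is why the PRP pairs the lightpaths with tuned data transfer nodes.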
Cyberinfrastructure to Support Ocean Observatories, by Larry Smarr
05.03.18
Invited Talk to the Ocean Studies Board
National Research Council
Title: Cyberinfrastructure to Support Ocean Observatories
University of California San Diego
CENIC: Pacific Wave and PRP Update: Big News for Big Data, by Larry Smarr
The document discusses the Pacific Wave exchange and Pacific Research Platform (PRP). It provides an overview of Pacific Wave, including its history and connectivity across the Pacific and western US. It then discusses how the PRP will build on infrastructure projects to create a high-speed "big data freeway" for science across California universities. This will allow researchers to more easily share and analyze large datasets for projects in areas like climate modeling, cancer genomics, astronomy and particle physics. Details are provided on specific science applications and datasets that will benefit from the enhanced connectivity of the PRP.
Creating a Big Data Machine Learning Platform in California, by Larry Smarr
Big Data Tech Forum: Big Data Enabling Technologies and Applications
San Diego Chinese American Science and Engineering Association (SDCASEA)
Sanford Consortium
La Jolla, CA
December 2, 2017
Using the Pacific Research Platform for Earth Sciences Big Data, by Larry Smarr
Grand Challenge Lecture
Big Data and the Earth Sciences: Grand Challenges Workshop
Calit2’s Qualcomm Institute
University of California, San Diego
May 31, 2017
Opening Keynote Lecture
15th Annual ON*VECTOR International Photonics Workshop
Calit2’s Qualcomm Institute
University of California, San Diego
February 29, 2016
GeoCENS Source Talk: Results from an Atlantic Rainforest Micrometeorology Sen..., by Cybera Inc.
Rob Fatland gave this presentation to the GeoCENS SSC Workshop on the current efforts, projects, and tools towards advancing environmental science in Banff, AB, September 23, 2010.
Reusable Software and Open Data To Optimize Agriculture, by David LeBauer
Abstract:
Humans need a secure and sustainable food supply, and science can help. We have an opportunity to transform agriculture by combining knowledge of organisms and ecosystems to engineer ecosystems that sustainably produce food, fuel, and other services. The challenge is that the information we have is difficult to combine: measurements, theories, and laws are scattered across publications, notebooks, software, and human brains. We homogenize, encode, and automate the synthesis of data and mechanistic understanding in a way that links understanding at different scales and across domains. This allows extrapolation, prediction, and assessment. Reusable components allow automated construction of new knowledge that can be used to assess, predict, and optimize agro-ecosystems.
Developing reusable software and open-access databases is hard, and examples will illustrate how we use the Predictive Ecosystem Analyzer (PEcAn, pecanproject.org), the Biofuel Ecophysiological Traits and Yields database (BETYdb, betydb.org), and ecophysiological crop models to predict crop yield, decide which crops to plant, and identify which traits can be selected for the next generation of data-driven crop improvement. A next step is to automate the use of sensors mounted on robots, drones, and tractors to assess plants in the field. The TERRA Reference Phenotyping Platform (TERRA-Ref, terraref.github.io) will provide an open-access database and computing platform on which researchers can use and develop tools that use sensor data to assess and manage agricultural and other terrestrial ecosystems.
TERRA-Ref will adopt existing standards and develop modular software components and common interfaces, in collaboration with researchers from iPlant, NEON, AgMIP, USDA, rOpenSci, ARPA-E, and many scientists and industry partners. Our goal is to advance science by enabling efficient use, reuse, exchange, and creation of knowledge.
---
Invited talk for the "Informatics for Reproducibility in Earth and Environmental Science Research" session at the American Geophysical Union Fall Meeting, Dec 17 2015.
In this video from the HPC User Forum at Argonne, Dr. Brett Bode from NCSA presents: Research on Blue Waters.
"Blue Waters is one of the most powerful supercomputers in the world and is one of the fastest supercomputers on a university campus. Scientists and engineers across the country use the computing and data power of Blue Waters to tackle a wide range of challenging problems, from predicting the behavior of complex biological systems to simulating the evolution of the cosmos."
Watch the video: https://wp.me/p3RLHQ-kYx
Learn more: http://www.ncsa.illinois.edu/enabling/bluewaters
and
http://hpcuserforum.com
ADASS XXV: LSST DM - Building the Data System for the Era of Petascale Optica..., by Mario Juric
The document discusses the Large Synoptic Survey Telescope (LSST) data management system. It describes how LSST will image the entire visible sky every few nights over 10 years, generating 5 petabytes of data per year. It outlines the LSST data system, which will process and archive the data, producing catalogs and other data products that will be accessible to scientists. The ultimate goal is to transform the sky into a fully searchable database for astronomical research.
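To put 5 petabytes per year in perspective, the implied sustained ingest rate is roughly 160 MB/s around the clock. This is a rough back-of-the-envelope calculation, not a figure from the talk.

```python
# Average ingest rate implied by 5 PB/year (using 1 PB = 1e15 bytes).
PETABYTE = 1e15
SECONDS_PER_YEAR = 365.25 * 24 * 3600

rate_bytes_per_s = 5 * PETABYTE / SECONDS_PER_YEAR
print(f"{rate_bytes_per_s / 1e6:.0f} MB/s")  # ~158 MB/s sustained, every day for 10 years
```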
Applying Photonics to User Needs: The Application Challenge, by Larry Smarr
05.02.28
Invited Talk to the 4th Annual On*VECTOR International Photonics Workshop
Sponsored by NTT Network Innovation Laboratories
Title: Applying Photonics to User Needs: The Application Challenge
University of California, San Diego
AstroInformatics 2015: Large Sky Surveys: Entering the Era of Software-Bound ..., by Mario Juric
The document discusses large sky surveys and how they are transforming astronomy into a software-driven field. It focuses on the Large Synoptic Survey Telescope (LSST) project, which will be the largest sky survey to date. Some key points:
- LSST will image the entire visible sky every few nights for 10 years, collecting enormous amounts of data on billions of objects.
- Processing and analyzing this data poses major computational challenges and requires new techniques for extracting science from massive catalogs and datasets.
- LSST aims to deliver real-time alerts of changing/transient objects, yearly source catalogs with positions/measurements of billions of objects, and deep co-added images.
LSST Solar System Science: MOPS Status, the Science, and Your Questions, by Mario Juric
1. The presentation summarized the status and plans for LSST's Moving Object Processing System (MOPS) and the expected science outcomes from LSST's solar system surveys.
2. MOPS development is ongoing, with plans to validate the system on data from the Zwicky Transient Facility in 2018 and integrate it into LSST's data processing system starting in 2019.
3. LSST is expected to discover over 600,000 new solar system objects and obtain high-precision light curves for millions of known objects over its 10-year mission.
The document provides an overview of the Pacific Research Platform (PRP) and discusses its role in connecting researchers across institutions and enabling new applications. It summarizes the PRP's key components like Science DMZs, Data Transfer Nodes (FIONAs), and use of Kubernetes for container management. Several examples are given of how the PRP facilitates high-performance distributed data analysis, access to remote supercomputers, and sensor networks coupled to real-time computing. Upcoming work on machine learning applications and expanding the PRP internationally is also outlined.
Peering The Pacific Research Platform With The Great Plains Network, by Larry Smarr
The Pacific Research Platform (PRP) connects research institutions across the western United States with high-speed networks to enable data-intensive science collaborations. Key points:
- The PRP connects 15 campuses across California and links to the Great Plains Network, allowing researchers to access remote supercomputers, share large datasets, and collaborate on projects like analyzing data from the Large Hadron Collider.
- The PRP utilizes Science DMZ architectures with dedicated data transfer nodes called FIONAs to achieve high-speed transfer of large files. Kubernetes is used to manage distributed storage and computing resources.
- Early applications include distributed climate modeling, wildfire science, plankton imaging, and cancer genomics.
LambdaGrids--Earth and Planetary Sciences Driving High Performance Networks and High Resolution Visualizations, by Larry Smarr
05.02.04
Invited Talk to the NASA Jet Propulsion Laboratory
Title: LambdaGrids--Earth and Planetary Sciences Driving High Performance Networks and High Resolution Visualizations
Pasadena, CA
This document discusses several projects related to connecting research institutions through high-speed networks:
1) The Pacific Research Platform connects campuses in California through a "big data superhighway" funded by NSF from 2015-2020.
2) CHASE-CI adds machine learning capabilities for researchers across 10 campuses in California using NSF-funded GPU resources.
3) A pilot project is using CENIC and Internet2 to connect regional research networks on a national scale, funded by NSF from 2018-2019.
Looking Back, Looking Forward: NSF CI Funding 1985-2025, by Larry Smarr
This document provides an overview of the development of national research platforms (NRPs) from 1985 to the present, with a focus on the Pacific Research Platform (PRP). It describes the evolution of the PRP from early NSF-funded supercomputing centers to today's distributed cyberinfrastructure utilizing optical networking, containers, Kubernetes, and distributed storage. The PRP now connects over 15 universities across the US and internationally to enable data-intensive science and machine learning applications across multiple domains. Going forward, the document discusses plans to further integrate regional networks and partner with new NSF-funded initiatives to develop the next generation of NRPs through 2025.
The Pacific Research Platform Enables Distributed Big-Data Machine-Learning, by Larry Smarr
The Pacific Research Platform enables distributed big data machine learning by connecting scientific instruments, sensors, and supercomputers across California and the United States with high-speed optical networks. Key components include FIONA data transfer nodes that allow fast disk-to-disk transfers near the theoretical maximum, Kubernetes to orchestrate distributed computing resources, and the Nautilus hypercluster which aggregates thousands of CPU cores and GPUs into a unified platform. This infrastructure has accelerated many scientific workflows and supported cutting-edge research in fields such as astronomy, oceanography, climate science, and particle physics.
Predictive analysis aims to forecast future events or outcomes based on available data. This type of analysis can help anticipate needs or problems before they occur. By analyzing patterns in existing data, predictive models try to identify risks and opportunities that may impact business decisions going forward.
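The core mechanic behind such predictive models is a scoring function whose parameters were fit to historical patterns and are then applied to new records. The toy sketch below is entirely illustrative: the churn-risk features and weights are invented, not taken from any source above.

```python
import math

# Hypothetical churn-risk model: weights assumed to have been fit offline
# on historical customer data (values here are made up for illustration).
WEIGHTS = {"days_since_last_order": 0.04, "support_tickets": 0.5, "bias": -2.0}

def churn_risk(days_since_last_order: int, support_tickets: int) -> float:
    """Logistic score in [0, 1]; higher means greater predicted churn risk."""
    z = (WEIGHTS["bias"]
         + WEIGHTS["days_since_last_order"] * days_since_last_order
         + WEIGHTS["support_tickets"] * support_tickets)
    return 1 / (1 + math.exp(-z))

print(churn_risk(5, 0) < 0.5)    # recently active, no tickets: low risk -> True
print(churn_risk(90, 4) > 0.5)   # long inactive, many tickets: high risk -> True
```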
Fast Data: Achieving Real-Time Data Analysis Across the Financial Data Continuum, by VoltDB
In this webinar Marc Firenze, CTO of Eagle Investment Systems, and VoltDB will discuss the latest market and data-management trends and the growing need for real-time data consistency. He will also address an in-memory database architecture for high-performance, scale-out applications that doesn't sacrifice data guarantees, and how the combination of streaming analytics and a fast operational data store represents the future of next-generation data management services.
Streaming data analysis in real time is becoming the fastest and most efficient way to obtain useful knowledge from what is happening now, allowing organizations to react quickly when problems appear or to detect new trends that help improve their performance. Evolving data streams are contributing to the growth of data created over the last few years: we now create as much data every two days as was created from the dawn of time up to 2003. Evolving data stream methods are becoming a low-cost, green methodology for real-time online prediction and analysis. We discuss the current and future trends of mining evolving data streams, and the challenges the field will have to overcome in the coming years.
Non-interactive big-data analysis prohibits experimentation and can interrupt the analyst's train of thought, but analyzing and drawing insights in real time is no easy task, with jobs often taking minutes or hours to complete. What if you want to put an interactive interface in front of that data that allows iterative insights? What if you need that interactive experience to be sub-second?
Traditional SQL and most MPP/NoSQL databases cannot run complex calculations over large data in a performant manner. Popular distributed systems such as Hadoop or Spark can execute the jobs, but their job overhead prohibits sub-second response times. Learn how an in-memory computing framework enabled us to perform complex analysis jobs on massive numbers of data points with sub-second response times, allowing us to plug it into a simple drag-and-drop Web 2.0 interface.
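One common way to get sub-second interactivity, sketched here in plain Python with an in-memory dict standing in for the computing framework described above, is to maintain pre-aggregated rollups so each dashboard query becomes a lookup rather than a scan:

```python
from collections import defaultdict

# In-memory rollup: (region, product) -> running sales total.
rollup = defaultdict(float)

def ingest(event: dict) -> None:
    """Update the aggregate as each record arrives, amortizing the scan cost."""
    rollup[(event["region"], event["product"])] += event["amount"]

def query(region: str, product: str) -> float:
    """Dashboard query: an O(1) lookup instead of a full-table scan."""
    return rollup[(region, product)]

for e in [{"region": "west", "product": "a", "amount": 10.0},
          {"region": "west", "product": "a", "amount": 2.5},
          {"region": "east", "product": "a", "amount": 7.0}]:
    ingest(e)

print(query("west", "a"))  # 12.5
```

The trade-off is that the set of answerable queries must be chosen when the rollups are designed; real frameworks recompute or extend the aggregates as needs change.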
Presentation by Dr. Peter Bruce, Statistics.com. Presented on April 27, 2012 at the MRA Spring Research Symposium hosted by the Mid-Atlantic Chapter of the Marketing Research Association.
This document discusses real-time big data analytics from deployment to production. It covers:
1) Distilling raw data like log files and sensor streams into structured data using Hadoop for analytics.
2) Developing predictive models using techniques like decision trees, clustering, and ensembles on structured data.
3) Deploying models for real-time scoring via SQL, code, or PMML on either batch lookup tables or streaming data factors.
4) Scoring billions of predictions daily for applications like determining why customers buy products and attributing marketing channels.
5) Regularly refreshing models to incorporate new data and outcomes using techniques like exploratory analysis and time-to-event modeling.
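The deployment step above (scoring via code rather than SQL or PMML) amounts to a pure function mapped over each incoming record. The model, features, and scores in this sketch are invented for illustration only:

```python
# A decision-tree-like rule as exported from a hypothetical offline training run.
def score(record: dict) -> float:
    """Return a purchase-propensity score for one incoming event."""
    if record["visits_last_week"] >= 3:
        return 0.8 if record["cart_items"] > 0 else 0.6
    return 0.3 if record["cart_items"] > 0 else 0.1

# Streaming deployment: apply the scorer to events as they arrive.
stream = [
    {"user": "u1", "visits_last_week": 5, "cart_items": 2},
    {"user": "u2", "visits_last_week": 1, "cart_items": 0},
]
scores = {e["user"]: score(e) for e in stream}
print(scores)  # {'u1': 0.8, 'u2': 0.1}
```

Because the scorer is a side-effect-free function of one record, the same code can run in a batch lookup-table build or inside a stream processor.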
This document provides an overview of big data and real-time analytics, defining big data as high volume, high velocity, and high variety data that requires new technologies and techniques to capture, manage and process. It discusses the importance of big data, key technologies like Hadoop, use cases across various industries, and challenges in working with large and complex data sets. The presentation also reviews major players in big data technologies and analytics.
The document discusses predictive analytics and forecasting. It defines predictive analytics as producing predictive scores for each customer or organizational element, while forecasting provides aggregate estimates such as total sales. Prediction involves classifying outcomes like customer retention, while forecasting captures trends and seasonality. Predictive modeling creates statistical models of future behavior by collecting and analyzing data to predict outcomes. Common predictive algorithms include logistic regression, decision trees, naive Bayes, and clustering.
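That distinction can be made concrete: forecasting summarizes a series into one aggregate number, while prediction assigns a score to each individual. The numbers and scoring rule below are synthetic, purely to illustrate the contrast:

```python
# Forecasting: one aggregate estimate from a series, here a naive
# trend-adjusted moving average over recent monthly totals.
monthly_sales = [100, 110, 120, 130]
forecast_next = (sum(monthly_sales[-3:]) / 3
                 + (monthly_sales[-1] - monthly_sales[-2]))

# Prediction: one score per customer, from that customer's own features.
def retention_score(months_active: int, complaints: int) -> float:
    """Toy per-customer score clamped to [0, 1]; higher means likelier to stay."""
    return max(0.0, min(1.0, 0.5 + 0.05 * months_active - 0.2 * complaints))

print(forecast_next)           # 130.0 -- a single aggregate number
print(retention_score(12, 0))  # 1.0  -- loyal customer
print(retention_score(2, 3))   # 0.0  -- new customer with many complaints
```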
1. Real-time analytics of social networks can help companies detect new business opportunities by understanding customer needs and reactions in real-time.
2. MOA and SAMOA are frameworks for analyzing massive online and distributed data streams. MOA deals with evolving data streams using online learning algorithms. SAMOA provides a programming model for distributed, real-time machine learning on data streams.
3. Both tools allow companies to gain insights from social network and other real-time data to understand customers and react to opportunities.
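The core idea behind MOA-style online learning is that the model is updated from one labelled example at a time and never needs to revisit the history. The minimal online perceptron below illustrates that property; it is a generic sketch, not MOA's or SAMOA's actual API.

```python
# Online perceptron: learns incrementally from a stream of labelled examples.
weights = [0.0, 0.0]
bias = 0.0

def predict(x):
    return 1 if weights[0] * x[0] + weights[1] * x[1] + bias > 0 else 0

def learn(x, label, lr=0.1):
    """Single-pass update: only the current example is needed, never the past."""
    global bias
    error = label - predict(x)  # zero when the prediction was already correct
    weights[0] += lr * error * x[0]
    weights[1] += lr * error * x[1]
    bias += lr * error

# A tiny stream of (features, label) pairs; with real data this loop runs
# indefinitely, adapting as the stream evolves.
stream = [((1.0, 1.0), 1), ((0.0, 0.0), 0), ((1.0, 0.9), 1), ((0.1, 0.0), 0)]
for _ in range(5):              # replay the small stream a few times
    for x, label in stream:
        learn(x, label)

print(predict((1.0, 1.0)), predict((0.0, 0.0)))  # 1 0
```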
Analytics at the Real-Time Speed of Business: Spark Summit East talk by Manis... (Spark Summit)
Redis accelerates Apache Spark execution by 45 times when used as a shared distributed in-memory datastore for Spark in analyses like time-series data range queries. With the Redis module for machine learning, redis-ml, implementations of spark-ml models gain a new real-time serving layer that offloads processing of models directly into Redis, allows multiple applications to reuse the same models, and speeds up classification and execution of these models by 13x. Join this session to learn more about the Redis Labs connector for Apache Spark that enhances production implementations of real-time big data processing.
This document provides an introduction to predictive analytics. It defines analytics and predictive analytics, comparing their purposes and differences. Analytics uses past data to understand trends while predictive analytics anticipates the future. Business intelligence involves using data to support decision making and aims to provide historical, current and predictive views of business. As technologies advanced, business intelligence evolved from being organized under IT to potentially being aligned under strategy management. Effective communication between business and analytics professionals is important for organizations to benefit from predictive analytics. The business case for predictive analytics includes enabling strategic planning, competitive analysis, and improving business processes to work smarter.
Real Time Analytics: Algorithms and Systems, by Arun Kejariwal
In this tutorial, an in-depth overview of the streaming analytics landscape -- applications, algorithms, and platforms -- is presented. We walk through how the field has evolved over the last decade and then discuss the current challenges: the impact of the other three Vs, viz. Volume, Variety, and Veracity, on Big Data streaming analytics.
Big Data Real Time Analytics - A Facebook Case Study, by Nati Shalom
Building Your Own Facebook Real Time Analytics System with Cassandra and GigaSpaces.
Facebook's real time analytics system is a good reference for those looking to build their real time analytics system for big data.
The first part covers the lessons from Facebook's experience and the reason they chose HBase over Cassandra.
In the second part of the session, we learn how to build our own real-time analytics system using the new version of Cassandra and GigaSpaces Cloudify: achieving better performance, gaining real business insights and analytics from our big data, and making deployment and scaling significantly simpler.
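A common thread in these architectures is folding events into pre-aggregated, per-window counters at write time so that dashboard reads are cheap. A minimal in-memory sketch of that counter model follows; it is illustrative only (Facebook's system does the increments in HBase, and the key and window choices here are hypothetical):

```python
from collections import defaultdict

class RealtimeCounters:
    """Sketch of the counter-centric model used by real-time analytics
    systems: each event is folded into a (key, time-window) counter as it
    arrives, so dashboards read pre-aggregated cells instead of scanning
    raw event logs."""

    def __init__(self, window_secs=60):
        self.window = window_secs
        self.counts = defaultdict(int)  # (key, window_start) -> count

    def _bucket(self, event_time):
        return event_time - (event_time % self.window)

    def record(self, key, event_time):
        self.counts[(key, self._bucket(event_time))] += 1

    def read(self, key, event_time):
        return self.counts.get((key, self._bucket(event_time)), 0)

c = RealtimeCounters()
for t in (10, 20, 59, 61):          # three events in window 0, one in window 60
    c.record("url:/article/42", t)
print(c.read("url:/article/42", 30))   # 3
print(c.read("url:/article/42", 70))   # 1
```

In a distributed store the increment must be atomic, which is exactly why counter support in HBase or Cassandra matters for this workload.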
This presentation introduces big data and explains how to generate actionable insights using analytics techniques. The deck explains general steps involved in a typical analytics project and provides a brief overview of the most commonly used predictive analytics methods and their business applications.
Vijay Adamapure is a Data Science Enthusiast with extensive experience in the field of data mining, predictive modeling and machine learning. He has worked on numerous analytics projects ranging from healthcare, business analytics, renewable energy to IoT.
Vijay presented these slides during the Internet of Everything Meetup event 'Predictive Analytics - An Overview' that took place on Jan. 9, 2015 in Mumbai. To join the Meetup group, register here: http://bit.ly/1A7T0A1
This document provides an overview of predictive analytics, including its evolution, definition, process, tools and techniques. It discusses how predictive analytics is being used across various industries to optimize outcomes, increase revenue and reduce costs. Specific use cases are outlined, such as using IoT sensor data and predictive models to improve risk calculations for auto insurance, optimize energy usage in buildings, enhance customer recommendations, and optimize policy interventions. Business cases focus on how companies in various sectors leverage customer data and predictive analytics to increase digital marketing effectiveness, revenues, and customer loyalty. Overall, the document examines current and emerging applications of predictive analytics across different domains.
This is a reduced PDF version of the hardcover book available at http://www.lulu.com/shop/jeffrey-strickland/predictive-analytics-using-r/hardcover/product-22000910.html, at a 40% discount. It will soon be available on Amazon.
Enabling Real Time Analysis & Decision Making - A Paradigm Shift for Experime...PyData
By Kerstin Kleese van Dam
PyData New York City 2017
New instrument technologies are enabling a new generation of in-situ and in-operando experiments, with extremely fine spatial and temporal resolution, that allow researchers to observe physics, chemistry and biology as they happen. These new methodologies go hand in hand with an exponential growth in data volumes and rates: petabyte-scale data collections and terabyte-per-second streams. At the same time, scientists are pushing for a paradigm shift. Now that they can observe processes in intricate detail, they want to analyze, interpret and control those processes. Given the multitude of voluminous, heterogeneous data streams involved in every single experiment, novel real-time, data-driven analysis and decision-support approaches are needed to realize this vision. This talk will discuss state-of-the-art streaming analysis for experimental facilities, its challenges and early successes. It will present where commercial technologies can be leveraged and how many of the novel approaches differ from commonly available solutions.
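One building block for this kind of real-time, data-driven analysis is one-pass statistics that never buffer the stream. A small sketch using Welford's online algorithm (a generic illustration of streaming analysis, not code from the talk):

```python
class OnlineStats:
    """Welford's one-pass algorithm: mean and variance of a data stream
    updated per sample, so a monitor can flag drift or anomalies without
    storing the stream itself."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0   # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Sample variance; zero until at least two samples have arrived.
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

s = OnlineStats()
for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    s.update(x)
print(round(s.mean, 3))      # 5.0
print(round(s.variance, 3))  # 4.571
```

The per-sample cost is constant, so the same update loop works whether the detector delivers kilobytes or terabytes per second.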
Toward a Global Interactive Earth Observing CyberinfrastructureLarry Smarr
The document discusses the need for a new generation of cyberinfrastructure to support interactive global earth observation. It outlines several prototyping projects that are building examples of systems enabling real-time control of remote instruments, remote data access and analysis. These projects are driving the development of an emerging cyber-architecture using web and grid services to link distributed data repositories and simulations.
Big Fast Data in High-Energy Particle PhysicsAndrew Lowe
Experiments at CERN (the European Organization for Nuclear Research) generate colossal amounts of data. Physicists must sift through about 30 petabytes of data produced annually in their search for new particles and interesting physics. The tidal wave of data produced by the Large Hadron Collider (LHC) at CERN places an unprecedented challenge for experiments' data acquisition systems, and it is the need to select rare physics processes with high efficiency while rejecting high-rate background processes that drives the architectural decisions and technology choices. Although filtering and managing large data sets is of course not exclusive to particle physics, the approach that has been taken is somewhat unique. In this talk, I describe the typical journey taken by data from the readout electronics of one experiment to the results of a physics analysis.
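The staged selection described above can be caricatured as a cascade of increasingly expensive, increasingly selective cuts: each level rejects most of the rate so the next, slower level stays affordable. A toy sketch follows; the thresholds, fields and event model are invented for illustration and are not real trigger menus:

```python
import random

def level1(event):
    # Fast, hardware-style cut: keep events with enough total energy.
    return event["energy"] > 20.0

def hlt(event):
    # Slower, more selective software cut applied only to Level-1 survivors.
    return event["energy"] > 50.0 and event["n_tracks"] >= 2

def trigger(events):
    """Cascade filter in the spirit of an LHC trigger chain: cheap cuts
    first, expensive cuts only on what survives."""
    return [e for e in events if level1(e) and hlt(e)]

rng = random.Random(0)
events = [{"energy": rng.uniform(0.0, 100.0), "n_tracks": rng.randint(0, 5)}
          for _ in range(1000)]
kept = trigger(events)
print(len(kept), "of", len(events), "events kept")
```

In the real experiments the same shape holds, but Level-1 runs in custom electronics at the bunch-crossing rate and the high-level trigger on a compute farm.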
CERN is a global scientific research organization located in Geneva, Switzerland that operates the largest particle physics laboratory in the world. It was founded in 1954 and has over 10,000 scientists from over 100 countries working on experiments to study the fundamental constituents of matter and the forces that act between them. CERN generates enormous amounts of data from experiments like the Large Hadron Collider, with over 15 petabytes of new data generated each year that is distributed to computing centers around the world for analysis. Solving the mysteries of the universe through these experiments requires advanced computing technologies and global collaboration to process and make sense of the massive volumes of data being collected.
The Pacific Research Platform Two Years InLarry Smarr
This document provides an overview of the Pacific Research Platform (PRP) after two years of operation. It describes several science drivers that are using the PRP, including biomedical research on cancer genomics and microbiomes, earth sciences like earthquake modeling, and astronomy. It highlights how the PRP is connecting sites like UC San Diego, UC Santa Cruz, UC Berkeley to share and analyze large datasets using high-speed networks. The PRP is expanding to support new areas like deep learning, cultural heritage projects, and connecting additional UC campuses through network upgrades.
The document discusses how computation can accelerate the generation of new knowledge by enabling large-scale collaborative research and extracting insights from vast amounts of data. It provides examples from astronomy, physics simulations, and biomedical research where computation has allowed more data and researchers to be incorporated, advancing various fields more quickly over time. Computation allows for data sharing, analysis, and hypothesis generation at scales not previously possible.
The document discusses the evolving landscape of semantic technologies and their applications to scientific domains like eScience. It introduces the Tetherless World Constellation, a research group applying semantic web techniques. Examples are given of projects applying semantics to areas like virtual observatories and provenance capture. The value of semantic technologies is discussed for integration, discovery, and validation of scientific data and models. Modular ontologies and semantically-enabled frameworks are presented as important directions for reuse and collaboration.
How HPC and large-scale data analytics are transforming experimental scienceinside-BigData.com
In this deck from DataTech19, Debbie Bard from NERSC presents: Supercomputing and the scientist: How HPC and large-scale data analytics are transforming experimental science.
"Debbie Bard leads the Data Science Engagement Group at NERSC. NERSC is the mission supercomputing center for the US Department of Energy, and supports over 7,000 scientists and 700 projects with supercomputing needs. A native of the UK, her career spans research in particle physics, cosmology and computing on both sides of the Atlantic. She obtained her PhD at Edinburgh University, and has worked at Imperial College London as well as the Stanford Linear Accelerator Center (SLAC) in the USA, before joining the Data Department at NERSC, where she focuses on data-intensive computing and research, including supercomputing for experimental science and machine learning at scale."
Watch the video: https://wp.me/p3RLHQ-kLV
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Science and Cyberinfrastructure in the Data-Dominated EraLarry Smarr
10.02.22
Invited talk
Symposium #1610, How Computational Science Is Tackling the Grand Challenges Facing Science and Society
Title: Science and Cyberinfrastructure in the Data-Dominated Era
San Diego, CA
A National Big Data Cyberinfrastructure Supporting Computational Biomedical R...Larry Smarr
Invited Presentation
Symposium on Computational Biology and Bioinformatics:
Remembering John Wooley
National Institutes of Health
Bethesda, MD
July 29, 2016
The Transformation of Systems Biology Into A Large Data ScienceRobert Grossman
Systems biology is becoming a data-intensive science due to the exponential growth of genomic and biological data. Large projects now produce petabytes of data that require new computational infrastructure to store, manage, and analyze. Cloud computing provides elastic resources that can scale to support the increasing data needs of systems biology. Case studies show how clouds are used for large-scale data integration and analysis, running combinatorial analysis over genomic marks, and enabling reanalysis of biological data through elastic virtual machines. The Open Cloud Consortium is working to provide open cloud resources for biological and biomedical research through testbeds and proposed bioclouds.
Opportunities for X-Ray science in future computing architecturesIan Foster
The world of computing continues to evolve rapidly. In just the past 10 years, we have seen the emergence of petascale supercomputing, cloud computing that provides on-demand computing and storage with considerable economies of scale, software-as-a-service methods that permit outsourcing of complex processes, and grid computing that enables federation of resources across institutional boundaries. These trends show no signs of slowing down: the next 10 years will surely see exascale, new cloud offerings, and terabit networks. In this talk I review several of these developments and discuss their potential implications for X-ray science and X-ray facilities.
The document provides an overview of plant genome sequence assembly, including:
1) A brief history of sequencing technologies and their improvements over time, from Sanger sequencing to newer technologies producing longer reads.
2) Key steps in a sequencing project including read processing, filtering, and corrections before assembly into contigs and scaffolds using appropriate software.
3) Factors to consider for experimental design and assembly optimization such as sequencing depth, library types, and software choices depending on the genome and data characteristics.
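The assembly step in point 2 can be illustrated with a toy greedy assembler that repeatedly merges the pair of reads with the longest suffix-prefix overlap; this is a classic teaching example, not production practice, and it ignores sequencing errors, repeats and reverse complements that real assemblers (overlap-graph or de Bruijn-graph based) must handle:

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is a prefix of b (>= min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=3):
    """Toy greedy assembler: merge the best-overlapping pair of reads
    until no overlap of at least min_len remains."""
    reads = list(reads)
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:          # no remaining overlaps: stop with several contigs
            break
        merged = reads[i] + reads[j][k:]
        reads = [r for n, r in enumerate(reads) if n not in (i, j)] + [merged]
    return reads

# Four overlapping reads collapse into one contig.
print(greedy_assemble(["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]))
# ['ATTAGACCTGCCGGAATAC']
```

Sequencing depth matters here in an obvious way: with too few reads the overlaps disappear and the assembly fragments into many contigs.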
Dr. Frank Würthwein from the University of California, San Diego presents at the International Supercomputing Conference on Big Data, 2013, US. Until recently, the large CERN experiments, ATLAS and CMS, owned and controlled the computing infrastructure they operated on in the US, and accessed data only when it was locally available on the hardware they operated. However, Würthwein explains, with data-taking rates set to increase dramatically by the end of LS1 in 2015, the current operational model is no longer viable for satisfying peak processing needs. Instead, he argues, large-scale processing centers need to be created dynamically to cope with spikes in demand. To this end, Würthwein and colleagues carried out a successful proof-of-concept study, in which the Gordon supercomputer at the San Diego Supercomputer Center was dynamically and seamlessly integrated into the CMS production system to process a 125-terabyte data set.
Plenary talk at the international Synchrotron Radiation Instrumentation conference in Taiwan, on work with great colleagues Ben Blaiszik, Ryan Chard, Logan Ward, and others.
Rapidly growing data volumes at light sources demand increasingly automated data collection, distribution, and analysis processes, in order to enable new scientific discoveries while not overwhelming finite human capabilities. I present here three projects that use cloud-hosted data automation and enrichment services, institutional computing resources, and high-performance computing facilities to provide cost-effective, scalable, and reliable implementations of such processes. In the first, Globus cloud-hosted data automation services are used to implement data capture, distribution, and analysis workflows for Advanced Photon Source and Advanced Light Source beamlines, leveraging institutional storage and computing. In the second, such services are combined with cloud-hosted data indexing and institutional storage to create a collaborative data publication, indexing, and discovery service, the Materials Data Facility (MDF), built to support a host of informatics applications in materials science. The third integrates components of the previous two projects with machine learning capabilities provided by the Data and Learning Hub for science (DLHub) to enable on-demand access to machine learning models from light source data capture and analysis workflows, and provides simplified interfaces to train new models on data from sources such as MDF on leadership-scale computing resources. I draw conclusions about best practices for building next-generation data automation systems for future light sources.
Computational Training and Data Literacy for Domain ScientistsJoshua Bloom
This document discusses training domain scientists in computational and data skills. It notes the increasing amount of data in fields like astronomy and challenges of traditional approaches. It advocates teaching skills like statistics, machine learning, and programming. Examples are given of bootcamps, seminars and degree programs in these areas at UC Berkeley taught by CS and statistics faculty. Challenges discussed include fitting such training into formal curricula and ensuring participation from underrepresented groups. The creation of collaborative spaces is proposed to better connect domain scientists with methodological experts to help scientists address the growing role of data in their fields.
In this video from the DDN User Group Meeting at SC14, Steve Simms from Indiana University presents: Indiana University's Data Capacitor II.
"The High Performance File Systems unit of UITS Research Technologies operates two separate high-speed file systems for temporary storage of research data. Both use the open source Lustre parallel distributed file system running on a version of the Linux operating system: Data Capacitor II (DC2) is a larger, faster replacement for the former Data Capacitor, which was decommissioned January 7, 2014. Like its predecessor, DC2 is a large-capacity, high-throughput, high-bandwidth Lustre-based file system serving all IU campuses. It is mounted on the Big Red II, Karst, Quarry, and Mason research computing systems."
Similar to Toward Real-Time Analysis of Large Data Volumes for Diffraction Studies by Martin Kunz, LBNL (20)
EarthCube Community Webinar held Tuesday, Dec. 9th at 11:00 PST/2:00 EST for a virtual kick-off of the new 'Demonstration Phase' of EarthCube, including statements from your Leadership Council members and an update from NSF Program Officer, Eva Zanzerkia.
Engagement Team monthly meeting 10.10.2014EarthCube
The document outlines the agenda and priorities for an EarthCube Demonstration Governance Engagement Team meeting in October 2014. The agenda includes an introduction, announcing a team representative to the Leadership Council, developing internal leadership, reviewing priorities and logistical functions, and discussing future meeting schedules. Key priorities and deliverables for the team are to develop an outreach and communications plan to engage the EarthCube community and stakeholders through compiling science use cases. Housekeeping, meeting leadership, point of contact roles, work management, and collaboration with other groups are listed as important logistical functions for the team.
The document summarizes the agenda and priorities for an October meeting of the Science Standing Committee. The agenda includes an introduction, announcing committee representatives, developing internal leadership, and reviewing priorities and logistical functions. The committee's year 1 intended outcome is to support work to complete the year 1 deliverable of developing science use cases. Their priorities are housekeeping tasks like assigning a meeting lead and point of contact for the oversight office.
This document summarizes an EarthCube meeting to discuss funded demonstration projects and governance. It outlines the agenda, including introductions from new project teams and a discussion of the role of funded projects. Key points include that the Test Governance project will coordinate the demonstration governance process and report outcomes to NSF. Both the Technology & Architecture Committee and Science Committee outlined initial steps, including forming subcommittees to analyze use cases and gaps. The meeting concluded with a discussion of how funded projects can best work with standing committees through formal work plans, representatives, and regular communication.
Technology and Architecture Committee meeting slides 10.06.14EarthCube
The October meeting agenda of the EarthCube Technology and Architecture Standing Committee included:
1) Welcome and introductions
2) Announcement of new committee representatives
3) Discussion of the committee's internal leadership structure and responsibilities, including coordinating with other groups, monitoring working groups, and sponsoring new working groups.
4) Review of timelines for upcoming milestones and deliverables and discussion of future meeting schedules.
EarthCube Governance Intro for Solar Terrestrial End-user WorkshopEarthCube
Presentation by the EarthCube Test Enterprise Governance project for the Solar Terrestrial Research End-User Workshop, Newark, New Jersey, August 14, 2014.
AHM 2014: The CSDMS Standard Names, Cross-Domain Naming Conventions for Descr...EarthCube
The document discusses the CSDMS Standard Names, which provide unambiguous naming conventions for describing process models, data sets, and their associated variables. The standard names aim to avoid ambiguity and domain-specific terminology. They support naming quantities, processes, mathematical operations, assumptions, and more. Developing and applying standard names helps different models to automatically match variables and understand each other.
AHM 2014: Addressing Data and Heterogeneity, Semantic Building Blocks & CI Pe...EarthCube
This panel will address data heterogeneity issues in EarthCube from the perspective of semantic building blocks and cyberinfrastructure. The panel, convened by Gary Berg-Cross of SOCoP, will feature co-conveners Pascal Hitzler of Wright State University, Kerstin Lehnert of LDEO, Columbia University, and Peter Wiebe of Woods Hole Oceanographic Institution. Additional panelists will include Scott Peckham of University of Colorado Boulder, Anthony Aufdenkampe of Stroud Water Research Center, Tim Finin of University of Maryland Baltimore County, and Krzysztof Janowicz of University of California Santa Barbara.
AHM 2014: Revisiting Governance Model, Preparing for Next StepsEarthCube
The document lists several potential priorities for EarthCube including developing an emergent architecture, identifying and promoting success stories, providing guidelines for shared services, developing common end user training, benchmarking progress against scientific needs, creating a prototype to demonstrate connectivity and functionality, documenting scientific workflows, and coordinating projects. Additional options mentioned are scoping and articulating a vision, identifying collaborations, documenting use cases, engaging academia in education, improving data management plans and data discovery, establishing light governance led by scientists, tying different design efforts together, determining funding mechanisms, adopting standards, enabling participation from diverse fields, and engaging stakeholders.
AHM 2014: Integrated Data Management System for Critical Zone ObservatoriesEarthCube
Presentation by Anthony Aufdenkampe during the Addressing Data Heterogeneity, Semantic Building Blocks & CI Perspective session on Day 2, June 25, at the EarthCube All-Hands Meeting
The document discusses the CSDMS Standard Names, which are naming conventions developed by the Community Surface Dynamics Modeling System (CSDMS) modeling framework to facilitate automatic coupling of models and data sets from different contributors. The naming conventions follow an object-oriented approach where each standard variable name is composed of an object name and quantity name joined by double underscores. This allows framework software to retrieve numerical values for variables based on their standardized names. The naming conventions were designed according to criteria such as avoiding ambiguity, using widely understood terminology, and supporting mathematical operations and assumptions. They address challenges of automatic semantic mediation when coupling diverse resources that use different naming systems.
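Since each standard name is just an object part and a quantity part joined by a double underscore, composing and parsing names is mechanical, which is what lets framework software match variables automatically. A minimal sketch (illustrative helper functions, not the CSDMS reference tooling):

```python
def compose_standard_name(object_part, quantity_part):
    """Join an object name and a quantity name with the double-underscore
    separator used by CSDMS Standard Names."""
    return f"{object_part}__{quantity_part}"

def parse_standard_name(name):
    """Split a CSDMS-style standard name back into its two parts.
    Single underscores inside each part are ordinary word separators;
    only the double underscore delimits object from quantity."""
    object_part, quantity_part = name.split("__")
    return {"object": object_part, "quantity": quantity_part}

name = compose_standard_name("atmosphere_bottom_air", "temperature")
print(name)                       # atmosphere_bottom_air__temperature
print(parse_standard_name(name))  # {'object': 'atmosphere_bottom_air', 'quantity': 'temperature'}
```

Two models that both expose `atmosphere_bottom_air__temperature` can then be coupled by exact string match, with no per-pair translation table.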
The document discusses a watershed modeling system called BCube that aims to decrease the effort of watershed initialization by brokering various global geospatial and environmental data required for watershed modeling. BCube allows researchers to focus on scientific research by providing a single access point to the different data formats and sources for elevation, soils, land use, weather, and other data needed to set up and run watershed models. The document provides an overview of the types of data BCube can broker and the workflow where a scientist requests data for a watershed area and BCube returns the available options to choose from.
Full-RAG: A modern architecture for hyper-personalizationZilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
UiPath Test Automation using UiPath Test Suite series, part 6DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
UiPath Test Automation with generative AI and Open AI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with Open AI's advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
The talk discusses the climate impact and sustainability of software testing. ICT and testing must carry their part of the global responsibility to help mitigate climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be extended with sustainability, which can then be measured continuously. Test environments can be used less, at smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Infrastructure Challenges in Scaling RAG with Custom AI modelsZilliz
Building Retrieval-Augmented Generation (RAG) systems with open-source and custom AI models is a complex task. This talk explores the challenges in productionizing RAG systems, including retrieval performance, response synthesis, and evaluation. We’ll discuss how to leverage open-source models like text embeddings, language models, and custom fine-tuned models to enhance RAG performance. Additionally, we’ll cover how BentoML can help orchestrate and scale these AI components efficiently, ensuring seamless deployment and management of RAG systems in the cloud.
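At its core, the retrieval step of any RAG system ranks documents by similarity to the query and hands the top hits to the generator for response synthesis. A self-contained sketch follows, with a toy bag-of-words "embedding" standing in for a real embedding model; the documents, scoring and function names are invented for illustration:

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding'. In a production RAG stack this would
    be a call to a served text-embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=2):
    """The retrieval step of RAG: rank documents by similarity to the
    query and return the top-k to be placed in the generator's prompt."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

docs = [
    "vector databases store embeddings for similarity search",
    "bentoml serves machine learning models in the cloud",
    "retrieval augmented generation grounds llm answers in documents",
]
print(retrieve("how do llm answers get grounded in retrieved documents", docs, k=1))
```

Swapping the toy `embed` for a fine-tuned embedding model, and the list scan for a vector database, is exactly where the retrieval-performance challenges in the talk arise.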
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD with in UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
GraphRAG for Life Science to increase LLM accuracyTomaz Bratanic
GraphRAG for the life science domain, where you retrieve information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceIndexBug
Imagine a world where machines not only perform tasks but also learn, adapt, and make decisions. This is the promise of Artificial Intelligence (AI), a technology that's not just enhancing our lives but revolutionizing entire industries.
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren't traditionally found in software curriculums, so many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is part of our current company's observability stack.
While the dev and ops silo continues to crumble, many organizations still relegate monitoring & observability to ops, infra and SRE teams. This is a mistake: achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and will share these foundational concepts to build on.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slackshyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
HCL Notes and Domino License Cost Reduction in the World of DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc
- Practical examples and best practices to implement right away
HCL Notes and Domino License Cost Reduction in the World of DLAU
Toward Real-Time Analysis of Large Data Volumes for Diffraction Studies by Martin Kunz, LBNL
1. Towards real-time analysis of large data volumes for synchrotron experiments
Martin Kunz, Nobumichi Tamura
Advanced Light Source, Lawrence Berkeley National Lab
2. Towards real-time analysis of large data volumes for synchrotron experiments
Acknowledgements
- Jack Deslippe, David Skinner (NERSC)
- Abdelilah Essiari, Craig E. Tull (LBNL-CRD)
- Eli Dart (ESnet)
- Dula Parkinson (LBNL – ALS)
3. Towards real-time analysis of large data volumes for synchrotron experiments
X-rays and Earth sciences: the story of a moving bottleneck
1960s / 1970s: X-ray source -> X-ray detectors -> data analysis -> publication
Photo: Henry Levy with the Picker 5-circle diffractometer and a PDP-5
4. Towards real-time analysis of large data volumes for synchrotron experiments
X-rays and Earth sciences: the story of a moving bottleneck
1980s / 1990s: X-ray source -> X-ray detectors -> data analysis -> publication
1995: “MD Storm”, readout time: 45 minutes
5. Towards real-time analysis of large data volumes for synchrotron experiments
X-rays and Earth sciences: the story of a moving bottleneck
2000s / 2010s: X-ray source -> X-ray detectors -> data analysis -> publication
6. Towards real-time analysis of large data volumes for synchrotron experiments
X-rays and Earth sciences: the story of a moving bottleneck
Future: X-ray source -> X-ray detectors -> interactive access to supercomputers -> data analysis -> publication
7. Towards real-time analysis of large data volumes for synchrotron experiments
Examples of mineral-physics-related experiments with high data rates:
1) In situ powder diffraction with automated P-T stepping:
ALS BL 12.2.2 with Perkin Elmer detector (~0 read-out delay)
http://www.ltp-oldenburg.de
Data rate on the order of 1000s of frames per day (i.e. 10s of GB/day)
8. Towards real-time analysis of large data volumes for synchrotron experiments
Examples of mineral-physics-related experiments with high data rates:
2) Micro-diffraction: phase/orientation/strain mapping at high spatial resolution
Micro-diffraction set-up at ALS beamline 12.3.2 with Pilatus-1M detector.
Left: Distribution of Re3N (black) and Re (blue) grown in a laser-heated DAC.
Right: Relative orientation of Re3N grains.
Source: Friedrich et al. (2010), PRL (105), 085504.
Data rate on the order of 10,000s of frames per day (i.e. 100s of GB/day)
9. Towards real-time analysis of large data volumes for synchrotron experiments
Examples of mineral-physics-related experiments with high data rates:
3) Tomography: 3D mapping of geo-materials
Tomography set-up at ALS beamline 8.3.2 (X-rays -> scintillator)
Supercritical CO2 penetrating sandstone on ALS BL 8.3.2 (courtesy J. Ajo-Franklin)
Distribution of Fe-alloy melt prepared at 64 GPa, measured at SSRL. Shi et al. (2013), Nature Geoscience, DOI: 10.1038/NGEO1956
Data rate on the order of 100,000s of frames per day (i.e. TBs/day)
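The three data-rate estimates above follow from simple arithmetic once a frame size is assumed. The ~8 MB/frame figure below is an illustrative assumption (a 2048 x 2048 16-bit image), not a number from the talk:

```python
def daily_volume_gb(frames_per_day, frame_mb=8.0):
    """Back-of-envelope daily data volume, assuming ~8 MB per frame
    (a hypothetical 2048 x 2048 16-bit image; real frame sizes vary)."""
    return frames_per_day * frame_mb / 1024.0

# thousands of frames/day -> tens of GB/day
print(daily_volume_gb(5_000))    # ~39 GB/day
# tens of thousands of frames/day -> hundreds of GB/day
print(daily_volume_gb(50_000))   # ~391 GB/day
# hundreds of thousands of frames/day -> TBs/day
print(daily_volume_gb(500_000))  # ~3900 GB/day, i.e. ~3.8 TB/day
```

With a larger or smaller assumed frame size the absolute numbers shift, but the three regimes quoted in the slides remain an order of magnitude apart.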
10. Towards real-time analysis of large data volumes for synchrotron experiments
How do we tackle this at the ALS?
1) Not-quite-real-time - local cluster for micro-diffraction analysis
- 24 dual-socket AMD Opteron 248 2.2 GHz nodes (48 CPUs)
- 48 GB aggregate memory
- 14 TB shared disk storage
- Gigabit Ethernet interconnect
- 212 GFLOPS (theoretical peak)
11. Towards real-time analysis of large data volumes for synchrotron experiments
How do we tackle this at the ALS?
1) Not-quite-real-time - local cluster for micro-diffraction analysis
1) User tunes parameters manually on some ‘typical’ patterns
12. Towards real-time analysis of large data volumes for synchrotron experiments
How do we tackle this at the ALS?
1) Not-quite-real-time - local cluster for micro-diffraction analysis
2) Analysis parameters are written into an instruction file
13. Towards real-time analysis of large data volumes for synchrotron experiments
How do we tackle this at the ALS?
1) Not-quite-real-time - local cluster for micro-diffraction analysis
2) Analysis parameters are written into an instruction file
14. Towards real-time analysis of large data volumes for synchrotron experiments
How do we tackle this at the ALS?
1) Not-quite-real-time - local cluster for micro-diffraction analysis
3) Launch parsing script:
-> reads the instruction file and distributes the data files across the available CPUs
-> writes batch files which manage the individual CPUs
-> launches the analysis software on each node
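The partitioning step of such a parsing script can be sketched as follows. The file-naming scheme, the batch-file format and the chunking policy are illustrative assumptions, not the actual ALS cluster scripts:

```python
import math
from pathlib import Path

def chunk_frames(frames, n_cpus):
    """Split the list of data files into near-equal batches, one per CPU."""
    size = math.ceil(len(frames) / n_cpus)
    return [frames[i:i + size] for i in range(0, len(frames), size)]

def write_batch_files(frames, n_cpus, outdir):
    """Write one batch file per CPU; each batch file lists the frames
    that CPU's analysis job should process (hypothetical format)."""
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, batch in enumerate(chunk_frames(frames, n_cpus)):
        p = out / f"batch_{i:02d}.txt"
        p.write_text("\n".join(batch))
        paths.append(p)
    # a launcher would then start one analysis job per batch file
    return paths
```

The essential design point from the slide is that the per-frame analysis is embarrassingly parallel, so a static split of the frame list across nodes is enough; no inter-node communication is needed during the run.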
15. Towards real-time analysis of large data volumes for synchrotron experiments
How do we tackle this at the ALS?
1) Not-quite-real-time - local cluster for micro-diffraction analysis
4) Results are written to a single file which can be viewed, further analyzed and published:
Relative lattice orientation: gives the domain structure. The total color range from blue to red corresponds to 4 degrees of rotation.
Average intensity: gives the high-resolution fine structure of the grain.
16. Towards real-time analysis of large data volumes for synchrotron experiments
How do we tackle this at the ALS?
2) Real time - collaboration with the National Energy Research Scientific Computing Center (NERSC) (in development)
1) Data are sent directly to NERSC for analysis and storage during data collection
Data are packaged:
- After every n images, a ‘trigger file’ is deposited in a directory which is monitored by NERSC.
- A SPADE web app wraps the data (512 files at a time) in HDF5 (Hierarchical Data Format) and ships them to NERSC via a Gigabit line (to be upgraded to a 10G line).
- At NERSC, the data are received by a SPADE instance, which places them in the target folder and on tape and sends an acknowledgment.
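The batching logic behind the trigger files can be sketched as below. This is a minimal stand-in that covers only the grouping policy; the SPADE transfer, the HDF5 wrapping and the tape archiving are omitted. The batch size of 512 is the figure quoted above:

```python
def ready_batches(image_files, batch_size=512):
    """Group the images collected so far into full batches of `batch_size`.
    Each full batch corresponds to one trigger: SPADE would wrap it in HDF5
    and ship it to NERSC. The remainder waits for the next trigger."""
    n_full = len(image_files) // batch_size
    batches = [image_files[i * batch_size:(i + 1) * batch_size]
               for i in range(n_full)]
    leftover = image_files[n_full * batch_size:]
    return batches, leftover

# e.g. 1100 images collected so far -> two full shipments, 76 images waiting
```

Shipping in fixed-size batches rather than per frame amortizes the per-transfer overhead (HDF5 wrapping, network handshake, tape placement) over many files, which matters at rates of 100,000s of frames per day.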
17. Towards real-time analysis of large data volumes for synchrotron experiments
How do we tackle this at the ALS?
2) Real time - collaboration with the National Energy Research Scientific Computing Center (NERSC) (in development)
1) Data are sent directly to NERSC for analysis and storage during data collection: up and running
Transfer control is web-based
18. Towards real-time analysis of large data volumes for synchrotron experiments
How do we tackle this at the ALS?
2) Real time - collaboration with the National Energy Research Scientific Computing Center (NERSC) (in development)
1) Data are sent directly to NERSC for analysis and storage during data collection: up and running
Transfer control is web-based
19. Towards real-time analysis of large data volumes for synchrotron experiments
How do we tackle this at the ALS?
2) Real time - collaboration with the National Energy Research Scientific Computing Center (NERSC) (in development)
1) Data are sent directly to NERSC for analysis and storage during data collection: up and running
Transfer control is web-based
20. Towards real-time analysis of large data volumes for synchrotron experiments
How do we tackle this at the ALS?
2) Real time - collaboration with the National Energy Research Scientific Computing Center (NERSC) (in development)
2) Analysis parameters are set up with a web app - under development
21. Towards real-time analysis of large data volumes for synchrotron experiments
How do we tackle this at the ALS?
2) Real time - collaboration with the National Energy Research Scientific Computing Center (NERSC) (in development)
2) Analysis parameters are set up with a web app - under development
Jobs are launched manually by the user via the same web page.
Test runs indicate an analysis time on the order of the data collection time; the analysis can in principle run synchronously with data collection.
22. Towards real-time analysis of large data volumes for synchrotron experiments
How do we tackle this at the ALS?
2) Real time - collaboration with the National Energy Research Scientific Computing Center (NERSC) (in development)
3) Analysis jobs are executed on Carver - under development
Carver is an IBM iDataPlex cluster:
- 1202 nodes with a total of 9984 processor cores
- 106 Tflop/s peak performance
- largest allocated parallel job is 512 cores
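For scale, the quoted Carver figures can be compared with those of the local Opteron cluster from the earlier slide. The arithmetic below simply divides the peak numbers quoted in the talk; it illustrates relative scale, not measured performance:

```python
# Peak figures as quoted in the slides
carver_cores, carver_gflops = 9984, 106_000   # 106 Tflop/s total
local_cores, local_gflops = 48, 212           # local Opteron cluster

per_core_carver = carver_gflops / carver_cores   # ~10.6 GFLOP/s per core
per_core_local = local_gflops / local_cores      # ~4.4 GFLOP/s per core
aggregate_ratio = carver_gflops / local_gflops   # ~500x the local cluster's peak
```

Even a modest 512-core allocation on Carver thus offers roughly an order of magnitude more cores, each a few times faster, than the entire local micro-diffraction cluster.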
23. Towards real-time analysis of large data volumes for synchrotron experiments
Summary:
- Data analysis is the new bottleneck limiting progress in many aspects of experimental mineral physics.
- Real-time analysis with immediate feedback is increasingly important in experimental mineral physics.
- These challenges cannot always be met with traditional desktop machines: software has to be automated and parallelized, and collaborations with supercomputing centers are becoming important for experimental scientists as well (at least for a few more iterations of Moore’s law).
- Data analysis on supercomputers, remotely controlled through web applications, is a very promising avenue, allowing big-data methods to enter mineral physics.
- Future developments may (must?) evolve away from supercomputers toward highly parallelized (GPU-based) local computers and/or cloud computing.
Editor's Notes
I would like to start off by giving a brief, slightly personalized historic perspective on the application of X-rays in mineral physics research:
X-rays have been applied in the Earth sciences on a routine basis for about 50 years; this story thus pretty much parallels my life. In the sixties and seventies, when I was just learning how to spell “X-ray”, the first automated diffractometers replaced fully manual film techniques. The brightness of the X-rays available in those days limited a powder or single-crystal data collection to days and weeks.
This changed most dramatically with the advent of dedicated light sources, in particular high-energy third-generation sources such as the ESRF in Grenoble, home of the first dedicated mineral physics beamline, ID30. I had meanwhile managed to learn to spell “X-ray” and thus was fortunate enough to be involved in the early days of said dedicated beamline. The brilliance of the ID30 undulator enabled experiments through a diamond anvil cell to be performed in a matter of seconds. However, each data point required the physical transport of a 1 x 1 ft image plate to the one and only IP reader on the floor, plus a read-out time of about 45 minutes. Sadly, the tremendous increase in brightness and flux of the X-ray sources could thus only be utilized in a limited way.
Another twenty years later - the age-appropriate number of light sources meanwhile doesn’t fit on my birthday cake anymore - we hail the advent of ultra-fast and ultra-low-noise direct-detection X-ray detectors such as the Perkin Elmer or the Pilatus, which in principle allow data-point rates of up to 30 Hz. This makes very large data rates possible. However, our ability to deal with these data is largely still at the level of high-end desktops and serial workflow software. The opportunity given to us by the combination of ever brighter light sources and fast detectors, i.e. to apply big-data methods to mineral physics research, can therefore not be fully harnessed.
The way out of this bottleneck is to automate and parallelize the analysis workflow using - at least for the time being - massively parallel supercomputers. This is the approach we are presently taking at the Advanced Light Source in collaboration with the National Energy Research Scientific Computing Center.
Let me quickly give you three examples of the orders of magnitude of the data rates we have to deal with:
Intense X-rays and fast detectors, coupled with programmable T and P changes, allow a much denser coverage of the P-V-T surface and thus a much better description of the thermo-elastic properties of Earth materials and their phase transitions.
Mineral physics experiments involving very high temperatures and pressures invariably force us to deal with large spatial and temporal gradients of pressure, temperature and chemical composition. High spatial or temporal resolution is therefore needed to explore these inhomogeneities. Fast detectors and bright X-rays thus allow us to collect spatially and/or temporally highly resolved maps of our sample.
Going beyond diffraction, various flavors of tomographic techniques now allow us to create three-dimensional images of samples in and ex situ, if needed even with chemical or phase selectivity.
This solution works fairly well with medium-sized datasets of up to 10,000 frames. With larger data volumes and/or tricky data, the analysis can take much longer than the data collection, even on a 48-CPU cluster.