Keynote presentation at the HPC User Forum 2012 in Dearborn, MI, September 19, 2012. http://www.hpcuserforum.com/registration/dearborn2012/dearbornagenda.pdf
The Matsu Project - Open Source Software for Processing Satellite Imagery Data - Robert Grossman
The Matsu Project is an Open Cloud Consortium project that is developing open source software for processing satellite imagery data using Hadoop, OpenStack and R.
Using the Open Science Data Cloud for Data Science Research - Robert Grossman
The Open Science Data Cloud is a petabyte scale science cloud for managing, analyzing, and sharing large datasets. We give an overview of the Open Science Data Cloud and how it can be used for data science research.
LambdaGrids--Earth and Planetary Sciences Driving High Performance Networks a... - Larry Smarr
05.02.04
Invited Talk to the NASA Jet Propulsion Laboratory
Title: LambdaGrids--Earth and Planetary Sciences Driving High Performance Networks and High Resolution Visualizations
Pasadena, CA
Analyzing Large Earth Data Sets: New Tools from the OptiPuter and LOOKING Pro... - Larry Smarr
05.05.03
Presentation to 3rd Annual GEON Meeting
Bahia Resort
Title: Analyzing Large Earth Data Sets: New Tools from the OptiPuter and LOOKING Projects
San Diego, CA
Calit2-a Persistent UCSD/UCI Framework for Collaboration - Larry Smarr
05.02.16
Invited Talk
Sun Microsystems Global Education and Research
Conference 2005
Title: Calit2-a Persistent UCSD/UCI Framework for Collaboration
San Francisco, CA
Positioning University of California Information Technology for the Future: S... - Larry Smarr
05.02.15
Invited Talk
The Vice Chancellor of Research and Chief Information Officer Summit
“Information Technology Enabling Research at the University of California”
Title: Positioning University of California Information Technology for the Future: State, National, and International IT Infrastructure Trends and Directions
Oakland, CA
Evolving Storage and Cyber Infrastructure at the NASA Center for Climate Simu... - inside-BigData.com
Ellen Salmon from NASA gave this talk at the 2017 MSST conference. "This talk will describe recent developments at the NASA Center for Climate Simulation, which is funded by NASA’s Science Mission Directorate, and supports the specialized data storage and computational needs of weather, ocean, and climate researchers, as well as astrophysicists, heliophysicists, and planetary scientists. To meet requirements for higher-resolution, higher-fidelity simulations, the NCCS augments its High Performance Computing and storage/retrieval environment. As the petabytes of model and observational data grow, the NCCS is broadening data services offerings and deploying and expanding virtualization resources for high performance analytics."
Watch the video: http://wp.me/p3RLHQ-gPj
Learn more: https://www.nccs.nasa.gov/
and
http://storageconference.us/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Digital Science: Reproducibility and Visibility in Astronomy - Jose Enrique Ruiz
The science done in Astronomy is digital science, from observing proposals to final publication, including the data and software used: each of the elements and actions involved in the scientific output can be recorded in electronic form. Even so, the final outcome of an experiment is often still difficult to reproduce: the procedure can be long, tedious, and not easily accessible or understandable, even to its author. At the same time, we have a rich infrastructure of files, observational data and publications that could be used more efficiently if the scientific production were more visible, avoiding duplication of effort and reinvention.
Reproducibility is a cornerstone of the scientific method, and extraction of relevant information from the current and future data flood is key in Astronomy. The AMIGA group (Analysis of the interstellar Medium of Isolated GAlaxies, IAA-CSIC, http://amiga.iaa.es) faces these two challenges in the European project "Wf4Ever: Advanced Workflow Preservation Technologies for Enhanced Science", which aims to preserve the methodology in scalable semantic repositories and thereby facilitate its discovery, access, inspection, exploitation and distribution. These repositories store experiments as "Research Objects" whose main constituents are digital scientific workflows. These provide a comprehensive view and clear scientific interpretation of the experiment as well as automation of the method, going beyond the usual pipelines that normally end at data processing.
The quantitative leap in volume and complexity of the next generation of archives will require analysis and data mining tasks to live closer to the data, in distributed computing and storage environments, but these tasks should also be modular enough to allow customization by scientists and be easily accessible to foster their dissemination among the community. Astronomy is a collaborative science, but, like many other disciplines, it has also become highly specialized. Sharing, preservation, discovery and much simpler access to resources in the composition of scientific workflows will let astronomers benefit greatly from each other's highly specialized know-how; they also constitute a way to push Astronomy to share and publish not only results and data, but also processes and methodologies.
We will show how the use of scientific workflows can help improve the reproducibility of experiments, support a more efficient exploitation of astronomical archives, and increase the visibility and reuse of the scientific methodology.
Keynote talk at the 3rd International Conference on Supercomputing in Mexico: www.isum.mx. A great group of people!
A substantially revised version of a talk with the same title given on previous occasions.
The science performed in Astronomy is digital science, from observing proposals to final publication, including the data and software used: each of the elements and actions involved in the scientific output can be recorded in electronic form.
Even so, the final outcome of an experiment is often still difficult to reproduce. Exhaustive documentation can be long and tedious, access to all the resources must be granted, and even then the repeatability of results is not guaranteed. At the same time, we have access to a wealth of files, observational data and publications that could be used more efficiently if the scientific production were more visible, avoiding duplication of effort and reinvention.
How e-Science tools are needed for the new data-intensive science, with a specific focus on the Square Kilometre Array. Talk given at Special Symposium 15 on Data Intensive Astronomy, held during the General Assembly of the International Astronomical Union in Beijing, 2012.
The Square Kilometre Array is currently undergoing the Preliminary Design Reviews for its constituent elements, and is thus at a critical point on its way to becoming ready for construction starting in 2018. In this talk we provide an overview of the SKA, its constituent elements, and their status, with emphasis on the Telescope Manager and the Science Data Processor – respectively, the Monitoring & Control system and the processing pipeline. We will see how they compare with their ALMA equivalents, and how the SKA is similar to and different from ALMA.
Presentation given at Yapı Kredi Kültür on 26 April 2007 – The "turning the corner" ("Köşeyi Dönme") option for Turkey: Does Turkey have a chance of evolving into an Information Society? Obstacles, supports and policies; opportunities and chances. Where are we today, and what must we overcome?
Wengines, Workflows, and 2 years of advanced data processing in Apache OODT - Chris Mattmann
With the advent of OODT-215 and OODT-491, there has been a tremendous amount of work to port our next generation Workflow Management system (cutely dubbed "WEngine" for "workflow engine") from an isolated branch into the mainline trunk.
The WEngine system brings major advantages, including explicit support for branching and bounds in workflow models; prioritized thread pooling and queueing at the per-task and per-workflow level; global workflow-level conditions (pre and post); condition and workflow timeouts; and an entirely new and more descriptive state model, complete with failure codes and checkpointing.
WEngine is currently processing the NPOESS Preparatory Project (NPP) PEATE testbed and its thousands of jobs per day, and is being slowly introduced into processing of an entire snow and ice climatology for the Western US and Alaska for the U.S. National Climate Assessment (NCA), working with the world's best snow hydrologists and snow scientists.
With all of those new features, what's an Apache OODT user and fan to do? How can you use WEngine in your system? How does it work today? How will it work tomorrow? We'll answer those questions and more in this fly-by-the-seat-of-your-pants exciting super talk!
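To make the ideas above concrete, here is a minimal, purely illustrative Python sketch of a prioritized task queue with checkpointing and a coarse failure state. It is not the Apache OODT WEngine API (which is Java and far richer); every class and method name below is invented for illustration only.

```python
import heapq
import json

# Conceptual sketch only: WEngine's real per-task/per-workflow priorities,
# conditions, timeouts, and state model are much richer than this toy model.

class Task:
    def __init__(self, task_id, priority, func):
        self.task_id = task_id
        self.priority = priority   # lower number = runs sooner
        self.func = func

class Workflow:
    def __init__(self, workflow_id, tasks):
        self.workflow_id = workflow_id
        # (priority, insertion index, task) tuples keep the heap stable
        self.queue = [(t.priority, i, t) for i, t in enumerate(tasks)]
        heapq.heapify(self.queue)
        self.state = {"status": "QUEUED", "completed": []}

    def checkpoint(self, path):
        # Persist enough state to resume after a failure.
        with open(path, "w") as f:
            json.dump({"workflow": self.workflow_id, **self.state}, f)

    def run(self, checkpoint_path):
        self.state["status"] = "RUNNING"
        while self.queue:
            _, _, task = heapq.heappop(self.queue)
            try:
                task.func()
            except Exception:
                # Coarse stand-in for WEngine's descriptive failure codes.
                self.state["status"] = "FAILURE"
                self.checkpoint(checkpoint_path)
                raise
            self.state["completed"].append(task.task_id)
            self.checkpoint(checkpoint_path)
        self.state["status"] = "SUCCESS"
        self.checkpoint(checkpoint_path)

if __name__ == "__main__":
    wf = Workflow("demo", [Task("extract", 2, lambda: None),
                           Task("ingest", 1, lambda: None)])
    wf.run("/tmp/demo-checkpoint.json")   # "ingest" runs first (higher priority)
```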
• “Detecting radio-astronomical "Fast Radio Transient Events" via an OODT-based metadata processing pipeline”, Chris Mattmann, Andrew Hart, Luca Cinquini, David Thompson, Kiri Wagstaff, Shakeh Khudikyan. ApacheCon NA 2013, February 2013
Opening Keynote Lecture
15th Annual ON*VECTOR International Photonics Workshop
Calit2’s Qualcomm Institute
University of California, San Diego
February 29, 2016
A High-Performance Campus-Scale Cyberinfrastructure: The Technical, Political... - Larry Smarr
10.10.11
Presentation by Larry Smarr to the NSF Campus Bridging Workshop
Title: A High-Performance Campus-Scale Cyberinfrastructure: The Technical, Political, and Economic
Anaheim, CA
Set My Data Free: High-Performance CI for Data-Intensive Research - Larry Smarr
10.11.03
Keynote Speaker
Cyberinfrastructure Days
University of Michigan
Title: Set My Data Free: High-Performance CI for Data-Intensive Research
Ann Arbor, MI
A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Int... - Larry Smarr
11.12.12
Seminar Presentation
Princeton Institute for Computational Science and Engineering (PICSciE)
Princeton University
Title: A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research
Princeton, NJ
Cyberinfrastructure to Support Ocean Observatories - Larry Smarr
05.03.18
Invited Talk to the Ocean Studies Board
National Research Council
Title: Cyberinfrastructure to Support Ocean Observatories
University of California San Diego
Key Trends Shaping the Future of Infrastructure.pdf - Cheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Generating a custom Ruby SDK for your web service or Rails API using Smithy - g2nightmarescribd
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview - Prayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024 - Tobias Schneck
As AI technology pushes into IT, I found myself wondering, as an "infrastructure container Kubernetes guy", how this fancy AI technology gets managed from an infrastructure and operations point of view. Is it possible to apply our beloved cloud-native principles as well? What benefits could the two technologies bring to each other?
Let me take these questions and give you a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premise strategy we may need to apply them to our own infrastructure and make them work from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and of what could benefit or limit your AI use cases in an enterprise environment. An interactive demo will give you some insights into the approaches I have already gotten working for real.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 - Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
The Art of the Pitch: WordPress Relationships and Sales - Laura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... - UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Epistemic Interaction - tuning interfaces to provide information for AI support - Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Scalable Data Mining and Archiving in the Era of the Square Kilometre Array
1. Scalable Data Mining and Archiving in the Era of the Square Kilometre Array
Chris A. Mattmann
Senior Computer Scientist, JPL/Caltech; Adjunct Assistant Professor, USC; Member, Apache Software Foundation
2. Agenda
• JPL Radio Array Initiative: Scalable Data Archiving
• Case Study: KAT-7 Archiving System / U.S. Replicated Archive
• Case Study: NRAO Experience
• Future Work
3. And you are?
• Senior Computer Scientist at NASA JPL in Pasadena, CA, USA
• Software Architecture/Engineering Prof at Univ. of Southern California
• Apache Member involved in:
  – OODT (VP, PMC), Tika (VP, PMC), Nutch (PMC), Incubator (PMC), SIS (Mentor), Gora (PMC), Airavata (Mentor), cTAKES (Mentor), Any23 (Mentor)
4. Square Kilometre Array (SKA)
• What is it?
  – Next-generation radio astronomy instrument that will be built jointly by South Africa and Australia to image the sky like never before
• What’s the status?
  – Currently in design phase; will be built over the next decade
• Why do you care?
Credit: http://scienceray.com/astronomy/stunning-square-kilometer-array/
5. Key Science for the SKA
SKA: Key Science (a.k.a. m- and cm-λ astronomy)
• Emerging from the Dark Ages & the Epoch of Reionization
• Strong-field Tests of Gravity with Pulsars and Black Holes
• Galaxy Evolution, Cosmology, & Dark Energy
• The Cradle of Life & Astrobiology
• Origin & Evolution of Cosmic Magnetism
Credit: Joe Lazio, JPL
6. Radio Initiative: Archiving
• Initiative Lead: Dayton Jones; Champion: Robert Preston
• We will define the necessary data services and underlying substrate to position JPL to compete for and lead “big data” management efforts in astronomy, specifically SKA, HERA, SKA precursors, and NRAO.
• Perform prototyping and deployment to demonstrate JPL’s leadership in the “big data” and astronomy space.
• Collaborate on Data Products and Algorithms from the Adaptive Data Processing task
• Establish partnerships with major SKA potential sites and precursor efforts (South Africa, Australia)
7. JPL “Big Data” Initiative
• The Big Picture
  – Astronomy, Earth science, planetary science, life/physical science all drowning in data
  – Fundamental technologies and emerging techniques in archiving and data science
  – Largely center around open source communities and …
• Research challenges (adapted from NSF)
  – More data is being collected than we can store
  – Many data sets are too large to download
  – Many data sets are too poorly organized to be useful
  – Many data sets are heterogeneous in type, structure
  – Data utility is limited by our ability to use it
• My Focus: Big Data Archiving
  – Research methods for integrating intelligent algorithms for data triage, subsetting, summarization
  – Construct technologies for smart data movement
  – Evaluate cloud computing for storage/processing
  – Construct data/metadata translators (“Babel Fish”)
8. Some “Big Data” Grand Challenges
• How do we handle 700 TB/sec of data coming off the wire when we actually have to keep it around?
  – Required by the Square Kilometre Array
• Joe Scientist says: “I’ve got an IDL or Matlab algorithm that I will not change, and I need to run it on 10 years of data from the Colorado River Basin and store and disseminate the output products.”
  – Required by the Western Snow Hydrology project
• How do we compare petabytes of climate model output data in a variety of formats (HDF, NetCDF, GRIB, etc.) with petabytes of remote sensing data to improve climate models for the next IPCC assessment?
  – Required by the 5th IPCC assessment, the Earth System Grid, and NASA
• How do we catalog all of NASA’s current planetary science data?
  – Required by the NASA Planetary Data System
Image Credit: http://www.jpl.nasa.gov/news/news.cfm?release=2011-295. Copyright 2012 Jet Propulsion Laboratory, California Institute of Technology. US Government Sponsorship Acknowledged.
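To get a rough sense of scale for the 700 TB/sec figure above, here is a back-of-envelope calculation (illustrative only; actual SKA data-rate estimates vary by design stage):

```python
# Back-of-envelope arithmetic for a sustained 700 TB/s stream.
RATE_TB_PER_SEC = 700
SECONDS_PER_DAY = 86_400

tb_per_day = RATE_TB_PER_SEC * SECONDS_PER_DAY   # 60,480,000 TB per day
eb_per_day = tb_per_day / 1_000_000              # terabytes -> exabytes
print(f"{tb_per_day:,} TB/day  (~{eb_per_day:.1f} EB/day)")
```

Keeping such a stream around, even briefly, implies on the order of 60 exabytes per day, which is why triage, subsetting, and summarization appear as research thrusts on the next slide.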
9. Technologies/Thrusts
– Framework for rapidly and unobtrusively integrating science algorithms
– Research study identifying a set of appropriate/inappropriate data movement technologies for big data systems
– Cloud computing research study for spaceborne and airborne missions and ground-based sensors
– Extensions to JPL-led Apache Tika and Apache OODT technologies to handle the data formats used by modern big data systems
10. Apache OODT
• Entered “incubation” at the Apache Software Foundation in 2010
• Selected as a top-level Apache Software Foundation project in January 2011
• Developed by a community of participants from many companies, universities, and organizations
• Used for a diverse set of science data system activities in planetary science, earth science, radio astronomy, biomedicine, astrophysics, and more
http://oodt.apache.org
OODT development & user community includes:
11. Apache OODT Press
[Screenshot of an InformationWeek article: “NASA To Host Open Source Summit”, by Elizabeth Montalbano, InformationWeek, March 15, 2011 – http://www.informationweek.com/news/government/enterprise-app...]
The agency plans to bring experts together March 29-30 to discuss open source policy and how NASA can better support the community. NASA will continue its support of the open source community by hosting its first-ever summit around the technology.
12. Why Apache and OODT?
• OODT is meant to be a set of tools to help build data systems
  – It’s not meant to be “turn key”
  – It attempts to exploit the boundary between bringing in capability vs. being overly rigid in science
  – Each discipline/project extends it
• Apache is the elite open source community for software developers
  – Fewer than 100 projects have been promoted to top level (Apache Web Server, Tomcat, Solr, Hadoop)
  – Differs from other open source communities; it provides a governance and management structure
13. Governance Model: Merit
• NASA and other government agencies require foundation/structure
14. OODT Core Components
• All core components implemented as web services
15. Getting back to SKA…
• The U.S. National Science Foundation does not highly prioritize the SKA in its response to the Astro2010 Decadal Survey, New Worlds, New Horizons
• It is taking a “wait 5 years and see” approach
• How do we work with South Africa and other countries on the SKA?
16. Case Study: KAT-7
• SKA Banff Meeting, 2011
• Jasper Horrell and Simon Ratcliffe express the need for a U.S. archive for MeerKAT
• Joe Lazio and Chris Mattmann decide it’s a good fit for NSF SI^2 – establish a U.S.-based archive
Credit: Jasper Horrell, Simon Ratcliffe
17. MeerKAT: the Science
MeerKAT International GigaHertz Tiered Extragalactic Exploration (MIGHTEE) Survey: This project aims to construct radio luminosity functions, by conducting deep radio continuum observations, in order to track over cosmic time the contribution of accretion (onto active galactic nuclei [AGN]) versus fusion (from star formation) to galaxy luminosities.
Table 1. Mapping between the Science Frontier Areas of the NWNH and MeerKAT Large Projects.
(Science Frontier Area | Question or Discovery Area | MeerKAT Large Project)
Discovery | Gravitational wave astronomy | Key Science Project on Radio Pulsar Timing
Discovery | Time-domain astronomy | TRAPUM, ThunderKAT
Discovery | Astrometry | VLBI with MeerKAT
Origins | Epoch of Reionization | MESMER
Origins | What were the first objects to light up the Universe and when did they do it? | MESMER
Origins | What is the fossil record of galaxy assembly and evolution from the first stars to the present? | Deep H I Field, MHONGOOSE, H I Survey of Fornax
Origins | How do baryons cycle in and out of galaxies …? | Deep H I Field, MHONGOOSE, H I Survey of Fornax, Absorption Line Survey
Understanding the Cosmic Order | How do black holes work and influence their surroundings? | MIGHTEE
Understanding the Cosmic Order | How do massive stars end their lives? | ThunderKAT, TRAPUM
Understanding the Cosmic Order | How do rotation and magnetic fields affect stars? | Key Science Project on Radio Pulsar Timing, TRAPUM
Understanding the Cosmic Order | What controls the mass-energy-chemical cycles within galaxies? | MeerGAL
Frontiers of Knowledge | What controls the masses, spins, and radii of compact stellar remnants? | Key Science Project on Radio Pulsar Timing, TRAPUM
U.S.-based MeerKAT projects and their relationship to the Astro2010 areas
19. Establishing a U.S. archive is good because…
• Bandwidth limitations in South Africa will make sharing the MeerKAT data with U.S. PIs difficult
• The JPL team has significant expertise in the study and evaluation of selecting the best data movement technologies for dissemination
  – MSST 2006, Mattmann Dissertation, IEEE IT Pro 2011, SECLOUD 2011
• The JPL team are the leaders of the first NASA data management technology (OODT) to be stewarded at Apache, making software co-development with South Africa amenable and tech transfer easy
  – South Africans (Bennett) already embracing OODT
20. Early prototype: KAT-7
(… Figure 3). After successfully writing the initial HDF-5 file, it is augmented with relevant sensor data that is collected from the KAT sensor data store. Once the augmentation process is complete, the HDF-5 file is made available to the OODT Crawler daemon. The OODT Crawler is a high-powered XML-RPC-based component that uses Apache Tika [C. Mattmann and Zitting, 2011] and its MIME detection capabilities to identify the HDF-5 file in the staging directory. MIME detection can be based on regular expressions; digital …
Figure 3. Prototype deployment of Apache OODT for the KAT-7 array.
Credit: Tom Bennett, SKA; Chris Mattmann, JPL
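To illustrate the "identify HDF-5 files in a staging directory" step described above, here is a simplified Python stand-in for what the OODT Crawler plus Apache Tika (both Java components) do in the prototype. The staging path and the print handoff are illustrative assumptions, not the project's actual configuration.

```python
from pathlib import Path

# Every HDF-5 file begins with this 8-byte signature; checking it is a
# minimal substitute for Tika's MIME detection in this sketch.
HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"

def is_hdf5(path: Path) -> bool:
    with path.open("rb") as f:
        return f.read(8) == HDF5_MAGIC

def crawl(staging_dir: str) -> None:
    for path in sorted(Path(staging_dir).iterdir()):
        if path.is_file() and is_hdf5(path):
            # In the real deployment, this is where the Crawler would extract
            # metadata and ingest the product into the OODT File Manager.
            print(f"found HDF-5 product: {path.name}")

if __name__ == "__main__":
    crawl("/data/kat7/staging")   # hypothetical staging area
```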
21. Establishing the U.S. archive
• Deploy OODT at JPL
• Data movement from Cape Town to the U.S. (researching data movement approaches)
• Rapidly and easily stand up archive technology in the U.S. and disseminate
• Collaborate on updates with Cape Town
Credit: Tom Bennett, SKA; Chris Mattmann, JPL
22. Synergistic efforts in Astronomy
• November 2010: JPL hosts Peter Quinn and Andreas Wicenec – presents data management strategy to ICRAR and discusses potential opportunities for collaboration
• December 2010: JPL presents to Duncall from the SKA Program Office on data management work packages
• April 2011: JPL works with NRAO EVLA to develop an OODT prototype (Bryan Butler) – the next portion of this talk will highlight this
• September 2011: JPL presents at the NRAO ALMA Science Software Leads workshop on OODT: NRAO desires to leverage OODT in its pipeline
• January 2012: JPL Co-PI on an MIT Haystack Observatory NSF MRI proposal to support a reconfigurable, adaptable array (RAPID): Colin Lonsdale
• February 2012: JPL reaches out to Murchison Widefield Array (MWA) scientists: Melanie Johnston-Hollitt and Lisa Harvey-Smith
• February 2012: JPL re-engages NRAO: NRAO EVLA collaboration continues
23. U.S. National Radio Astronomy Observatory (NRAO)
• Explore JPL data system expertise
  – Leverage Apache OODT
  – Leverage architecture experience
  – Build on NRAO Socorro F2F given in April 2011 and Innovations in Data-Intensive Astronomy meeting in May 2011
• Define achievable prototype
  – Focus on EVLA summer school pipeline
• Heavy focus on CASApy, simple pipelining, metadata extraction, archiving of directory-based products
• Ideal for OODT system
24. Architecture
[Architecture diagram, not reproduced in this transcript: EVLA data (e.g., day2_TDEM0003_10s_norx) flows into a staging area on ska-dc.jpl.nasa.gov, where Apache OODT CAS data services – Crawler, Curator, File Manager (FM), Workflow Manager (WM), PCS status monitoring – together with WWW/browser science system services, a disk area, and an operator handle products, metadata, and the “evlascube” event; the legend distinguishes data flow from control flow.]
25. Demonstration Use Case
• Run EVLA Spectral Line Cube generation
  – First step is to ingest EVLARawDataOutput from Joe
  – Then fire off the evlascube event
  – Workflow Manager writes a CASApy script dynamically, via CAS-PGE (sketched below)
  – CAS-PGE starts CASApy
  – CASApy generates Cal tables and 2 Spectral Line Cube images
  – CAS-PGE ingests them into the File Manager
• Gravy: UIs, command-line tools, services
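As a rough picture of the "write a CASApy script dynamically, then run it" step above, here is a minimal Python sketch. In the real system this is done by CAS-PGE inside the OODT Workflow Manager (Java); the template contents, output paths, and the casapy invocation flags below are assumptions for illustration, not the project's actual generated script.

```python
import subprocess
from string import Template

# Illustrative template for a dynamically generated spectral-line-cube script.
# The CASA `clean` task of that era took a measurement set and image name;
# real scripts would include calibration steps and many more parameters.
CASAPY_TEMPLATE = Template("""\
# auto-generated spectral line cube script (illustrative)
clean(vis='$ms', imagename='$cube', mode='channel')
""")

def generate_script(ms_path: str, cube_name: str, out_path: str) -> str:
    """Fill the template with product metadata and write it to disk."""
    with open(out_path, "w") as f:
        f.write(CASAPY_TEMPLATE.substitute(ms=ms_path, cube=cube_name))
    return out_path

if __name__ == "__main__":
    script = generate_script("day2_TDEM0003_10s_norx.ms",   # dataset name taken from the slides
                             "day2_TDEM0003_10s_norx_cube",
                             "/tmp/evlascube.py")
    # Hand the generated script to CASA; exact command-line flags are an assumption.
    subprocess.run(["casapy", "--nologger", "-c", script], check=True)
```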
27. NRAO and EVLA
• The Extended Very Large Array has deployed Apache OODT for its data reduction pipeline
• Working to enable more portions of Apache OODT (currently only using the Workflow Manager)
• Evaluating Apache OODT for data ingestion and archiving
28. Wrap-up
• JPL’s efforts in Scalable Data Mining for Big Data and the SKA
• Successful collaboration with SKA South Africa and NRAO
• Open source big data management framework from Apache (“Apache OODT”)