"Some Reflections on Data in the Public Sector" : Communia: The European Thematic Network on the Digital Public Domain, London School of Economics, London, UK March 26-27, 2009
Using the Open Science Data Cloud for Data Science Research - Robert Grossman
The Open Science Data Cloud is a petabyte-scale science cloud for managing, analyzing, and sharing large datasets. We give an overview of the Open Science Data Cloud and how it can be used for data science research.
Architectures for Data Commons (XLDB 15 Lightning Talk) - Robert Grossman
These are the slides from a 5 minute Lightning Talk that I gave at XLDB 2015 on May 19, 2015 at Stanford. It is based in part on our experiences developing the NCI Genomic Data Commons (GDC).
The Matsu Project - Open Source Software for Processing Satellite Imagery Data - Robert Grossman
The Matsu Project is an Open Cloud Consortium project that is developing open source software for processing satellite imagery data using Hadoop, OpenStack and R.
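As an illustration of the per-tile processing such a Hadoop-based stack enables (a sketch, not code from the Matsu Project itself; the input format and field layout are assumptions), a Hadoop Streaming mapper in Python could compute a vegetation index for each image tile:

```python
#!/usr/bin/env python
# Illustrative Hadoop Streaming mapper: one input line per image tile,
# "tile_id,red_mean,nir_mean" (hypothetical format), emitting tab-separated
# key/value pairs of tile id and NDVI for a reducer to aggregate.
import sys

for line in sys.stdin:
    line = line.strip()
    if not line:
        continue
    tile_id, red, nir = line.split(",")
    red, nir = float(red), float(nir)
    # Normalized difference vegetation index, guarding against division by zero.
    ndvi = (nir - red) / (nir + red) if (nir + red) else 0.0
    print(f"{tile_id}\t{ndvi:.4f}")
```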
Multiple Regression, Covid Mobility, and Covid-19 Policy Recommendation - Kan Yuenyong
Multiple regression analysis of Covid-19 policy is a contemporary agenda. This presentation demonstrates how to use Python for data wrangling and R for statistical analysis, in a form suitable for publication in a standard academic journal. The model examines whether lockdown policy is relevant to controlling the Covid-19 outbreak.
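A minimal sketch of the kind of model the abstract describes, done here entirely in Python with statsmodels; the numbers and variable names are invented for illustration (a real analysis would use sources such as mobility reports and a policy stringency index):

```python
# Hypothetical multiple regression: weekly case growth regressed on
# mobility change and lockdown stringency. Data are made up for illustration.
import pandas as pd
import statsmodels.api as sm

df = pd.DataFrame({
    "case_growth": [0.22, 0.20, 0.16, 0.12, 0.09, 0.06, 0.05, 0.04],  # weekly growth rate
    "mobility":    [-2, -8, -15, -25, -35, -44, -50, -54],            # % change vs. baseline
    "stringency":  [15, 25, 40, 55, 68, 78, 84, 87],                  # policy stringency index
})

X = sm.add_constant(df[["mobility", "stringency"]])  # add intercept term
model = sm.OLS(df["case_growth"], X).fit()
print(model.summary())  # coefficient signs show whether stringency tracks slower growth
```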
This talk will examine issues of workflow execution, in particular using the Pegasus Workflow Management System, on distributed resources and how these resources can be provisioned ahead of the workflow execution. Pegasus was designed, implemented and supported to provide abstractions that enable scientists to focus on structuring their computations without worrying about the details of the target cyberinfrastructure. To support these workflow abstractions Pegasus provides automation capabilities that seamlessly map workflows onto target resources, sparing scientists the overhead of managing the data flow, job scheduling, fault recovery and adaptation of their applications. In some cases, it is beneficial to provision the resources ahead of the workflow execution, enabling the re-use of resources across workflow tasks. The talk will examine the benefits of resource provisioning for workflow execution.
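The sketch below is not the Pegasus API; it only illustrates the two ideas the talk pairs: a workflow expressed as an abstract DAG of tasks, and a pool of resources provisioned ahead of execution and reused across tasks. The pipeline and task names are hypothetical.

```python
# Illustrative workflow-as-DAG execution over a pre-provisioned worker pool.
from graphlib import TopologicalSorter
from concurrent.futures import ThreadPoolExecutor

# task -> set of tasks it depends on (hypothetical pipeline)
dag = {
    "extract": set(),
    "clean":   {"extract"},
    "analyze": {"clean"},
    "plot":    {"analyze"},
}

def run(task: str) -> None:
    print(f"running {task}")  # placeholder for staging data and launching a job

sorter = TopologicalSorter(dag)
sorter.prepare()
# The pool stands in for resources provisioned ahead of execution and
# reused across workflow tasks, as the talk describes.
with ThreadPoolExecutor(max_workers=4) as pool:
    while sorter.is_active():
        ready = list(sorter.get_ready())
        list(pool.map(run, ready))  # run all currently-ready tasks in parallel
        sorter.done(*ready)
```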
Lehnert: Making Small Data Big, IACS, April 2015 - Kerstin Lehnert
Seminar presentation at the Institute for Advanced Computational Science at Stony Brook University, April 9, 2015, describing achievements and challenges of data infrastructure in a long-tail science domain with the example of geochemistry.
A talk at the RPI-NSF Workshop on Multiscale Modeling of Complex Data, September 12, 2011, Troy NY, USA.
We have made much progress over the past decade toward effectively harnessing the collective power of IT resources distributed across the globe. In fields such as high-energy physics, astronomy, and climate, thousands benefit daily from tools that manage and analyze large quantities of data produced and consumed by large collaborative teams.
But we now face a far greater challenge: exploding data volumes and powerful simulation tools mean that far more--ultimately most?--researchers will soon require capabilities not so different from those used by these big-science teams. How is the general population of researchers and institutions to meet these needs? Must every lab be filled with computers loaded with sophisticated software, and every researcher become an information technology (IT) specialist? Can we possibly afford to equip our labs in this way, and where would we find the experts to operate them?
Consumers and businesses face similar challenges, and industry has responded by moving IT out of homes and offices to so-called cloud providers (e.g., GMail, Google Docs, Salesforce), slashing costs and complexity. I suggest that by similarly moving research IT out of the lab, we can realize comparable economies of scale and reductions in complexity. More importantly, we can free researchers from the burden of managing IT, giving them back their time to focus on research and empowering them to go beyond the scope of what was previously possible.
I describe work we are doing at the Computation Institute to realize this approach, focusing initially on research data lifecycle management. I present promising results obtained to date and suggest a path toward large-scale delivery of these capabilities.
Reusable Software and Open Data To Optimize Agriculture - David LeBauer
Abstract:
Humans need a secure and sustainable food supply, and science can help. We have an opportunity to transform agriculture by combining knowledge of organisms and ecosystems to engineer ecosystems that sustainably produce food, fuel, and other services. The challenge is that the information we have (measurements, theories, and laws found in publications, notebooks, software, and human brains) is difficult to combine. We homogenize, encode, and automate the synthesis of data and mechanistic understanding in a way that links understanding at different scales and across domains. This allows extrapolation, prediction, and assessment. Reusable components allow automated construction of new knowledge that can be used to assess, predict, and optimize agro-ecosystems.
Developing reusable software and open-access databases is hard; examples will illustrate how we use the Predictive Ecosystem Analyzer (PEcAn, pecanproject.org), the Biofuel Ecophysiological Traits and Yields database (BETYdb, betydb.org), and ecophysiological crop models to predict crop yield, decide which crops to plant, and identify which traits to select for the next generation of data-driven crop improvement. A next step is to automate the use of sensors mounted on robots, drones, and tractors to assess plants in the field. The TERRA Reference Phenotyping Platform (TERRA-Ref, terraref.github.io) will provide an open-access database and computing platform on which researchers can use and develop tools that use sensor data to assess and manage agricultural and other terrestrial ecosystems.
TERRA-Ref will adopt existing standards and develop modular software components and common interfaces, in collaboration with researchers from iPlant, NEON, AgMIP, USDA, rOpenSci, and ARPA-E, as well as many other scientists and industry partners. Our goal is to advance science by enabling efficient use, reuse, exchange, and creation of knowledge.
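As one concrete illustration of what such open databases make possible, trait records can be pulled programmatically and fed into crop models. The sketch below is hedged: the endpoint shape, parameter names ("genus", "limit", "key"), and response format are assumptions for illustration rather than a documented description of the BETYdb API; see betydb.org for the real interface.

```python
# Hedged sketch of programmatic access to an open trait database such as BETYdb.
# Endpoint and parameter names are assumptions; consult betydb.org for the
# actual REST API and authentication details.
import requests

BASE = "https://www.betydb.org/api/v1"  # assumed base URL shape

resp = requests.get(
    f"{BASE}/traits.json",
    params={"genus": "Miscanthus", "limit": 10, "key": "YOUR_API_KEY"},
    timeout=30,
)
resp.raise_for_status()
for record in resp.json():  # assumed: a JSON list of trait records
    print(record)           # e.g., feed into an ecophysiological crop model
```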
---
Invited talk for the "Informatics for Reproducibility in Earth and Environmental Science Research" session at the American Geophysical Union Fall Meeting, Dec 17 2015.
Parallelizing Bioinformatics Applications Using a Conducive Hadoop Cluster
Similar to "Some Reflections on Data in the Public Sector" : Communia: The European Thematic Network on the Digital Public Domain, London School of Economics, London, UK March 26-27, 2009
Toward a Global Interactive Earth Observing Cyberinfrastructure - Larry Smarr
05.01.12
Invited Talk to the 21st International Conference on Interactive Information Processing Systems (IIPS) for Meteorology, Oceanography, and Hydrology Held at the 85th AMS Annual Meeting
Title: Toward a Global Interactive Earth Observing Cyberinfrastructure
San Diego, CA
A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research - Larry Smarr
11.12.12
Seminar Presentation
Princeton Institute for Computational Science and Engineering (PICSciE)
Princeton University
Title: A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research
Princeton, NJ
Dr. Frank Würthwein from the University of California, San Diego, presenting at the International Supercomputing Conference on Big Data, 2013. Until recently, the large CERN experiments, ATLAS and CMS, owned and controlled the computing infrastructure they operated on in the US, and accessed data only when it was locally available on the hardware they operated. However, Würthwein explains, with data-taking rates set to increase dramatically by the end of LS1 in 2015, the current operational model is no longer viable for satisfying peak processing needs. Instead, he argues, large-scale processing centers need to be created dynamically to cope with spikes in demand. To this end, Würthwein and colleagues carried out a successful proof-of-concept study in which the Gordon Supercomputer at the San Diego Supercomputer Center was dynamically and seamlessly integrated into the CMS production system to process a 125-terabyte data set.
In this video from the HPC User Forum at Argonne, Dr. Brett Bode from NCSA presents: Research on Blue Waters.
"Blue Waters is one of the most powerful supercomputers in the world and is one of the fastest supercomputers on a university campus. Scientists and engineers across the country use the computing and data power of Blue Waters to tackle a wide range of challenging problems, from predicting the behavior of complex biological systems to simulating the evolution of the cosmos."
Watch the video: https://wp.me/p3RLHQ-kYx
Learn more: http://www.ncsa.illinois.edu/enabling/bluewaters
and
http://hpcuserforum.com
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
From a keynote talk given at the ESIP Winter 2011 meeting in Washington, DC. The talk was titled "Some Reflections on 'The Burden of Proof: Data as Evidence'".
Toward Real-Time Analysis of Large Data Volumes for Diffraction Studies by Martin Kunz - EarthCube
Talk at the EarthCube End-User Domain Workshop for Rock Deformation and Mineral Physics Research.
By Martin Kunz, Lawrence Berkeley National Laboratory
Making Small Data BIG (UT Austin, March 2016) - Kerstin Lehnert
Presentation given at the Texas Advanced Computing Center. It describes the potential of re-using small data for new science, the achievements to date, and the challenges of making small data re-usable.
In this video from the DDN User Group Meeting at SC14, Steve Simms from Indiana University presents: Indiana University's Data Capacitor II.
"The High Performance File Systems unit of UITSResearch Technologies operates two separate high-speed file systems for temporary storage of research data. Both use the open sourceLustre parallel distributed file system running on a version of theLinux operating system: Data Capacitor II (DC2) is a larger, faster replacement for the former Data Capacitor, which was decommissioned January 7, 2014. Like its predecessor, DC2 is a large-capacity, high-throughput, high-bandwidth Lustre-based file system serving all IU campuses. It is mounted on the Big Red II, Karst,Quarry, and Mason research computing systems."
High Performance Cyberinfrastructure to Support Data-Intensive Biomedical Research Instruments - Larry Smarr
08.06.16
Invited Talk
Association of University Research Parks BioParks 2008
"From Discovery to Innovation"
Salk Institute
Title: High Performance Cyberinfrastructure to Support Data-Intensive Biomedical Research Instruments
La Jolla, CA
Cool big data tools - DeepDive (Stanford) and Apache Spark as well as fundamentals of big data
Similar to "Some Reflections on Data in the Public Sector" : Communia: The European Thematic Network on the Digital Public Domain, London School of Economics, London, UK March 26-27, 2009 (20)
Ecological Society of America Science Commons - Tom Moritz
Ecological Society of America
"Obstacles to Data Sharing in Ecology"
(NSF Workshop)
National Evolutionary Synthesis Center
Durham, North Carolina
May 30, 2007
Science and the limits of our current regime for intellectual property.
How to Make a Field Invisible in Odoo 17 - Celine George
It is possible to hide or make fields invisible in Odoo, commonly by using the "invisible" attribute in the field definition. This slide shows how to make a field invisible in Odoo 17.
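A minimal sketch of the pattern the slide describes, with hypothetical model, field, and view names. In Odoo 17 the "invisible" attribute on a view field accepts either a constant or a Python-like expression evaluated against the record; the model below is Python, with the corresponding view XML shown as comments.

```python
# Hypothetical Odoo 17 model (names invented for illustration).
from odoo import fields, models

class LibraryBook(models.Model):
    _name = "library.book"
    _description = "Library Book"

    name = fields.Char(required=True)
    internal_notes = fields.Text()

# Corresponding form-view snippets (XML, shown here as comments):
#   <field name="internal_notes" invisible="1"/>          <!-- always hidden -->
#   <field name="internal_notes" invisible="not name"/>   <!-- hidden until name is set -->
```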
A review of the growth of the Israel Genealogy Research Association Database Collection over the last 12 months. Our collection has now passed the 3 million mark and is still growing. See which archives have contributed the most, the different types of records we have, and which years have had records added. You can also see what we have planned for the future.
Exploiting Artificial Intelligence for Empowering Researchers and Faculty - Dr. Vinod Kumar Kanvaria
Exploiting Artificial Intelligence for Empowering Researchers and Faculty,
International FDP on Fundamentals of Research in Social Sciences
at Integral University, Lucknow, 06.06.2024
By Dr. Vinod Kumar Kanvaria
Read: The latest issue of The Challenger is here! We are thrilled to announce that our school paper has qualified for the NATIONAL SCHOOLS PRESS CONFERENCE (NSPC) 2024. Thank you for your unwavering support and trust. Dive into the stories that made us stand out!
Operation "Blue Star" is the only event in the history of independent India where the state went to war with its own people. Even after about 40 years, it is not clear whether it was the culmination of the state's anger toward the people of the region, a political game of power, or the start of a dictatorial chapter in the democratic setup.
The people of Punjab felt alienated from the mainstream due to the denial of their just demands during a long democratic struggle since independence. As has happened all over the world, this led to a militant struggle with great loss of life among military, police, and civilian personnel. The killing of Indira Gandhi and the massacre of innocent Sikhs in Delhi and other Indian cities were also associated with this movement.
Unit 8 - Information and Communication Technology (Paper I) - Thiyagu K
These slides describe the basic concepts of ICT, the basics of email, emerging technology, and digital initiatives in education. The presentation aligns with the UGC Paper I syllabus.
This slide deck is intended for master's students (MIBS & MIFB) at UUM. It is also useful for readers interested in the topic of contemporary Islamic banking.
Introduction to AI for Nonprofits with Tapp Network - TechSoup
Dive into the world of AI! Experts Jon Hill and Tareq Monaur will guide you through AI's role in enhancing nonprofit websites and basic marketing strategies, making it easy to understand and apply.
Digital Artifact 2 - Investigating Pavilion Designs
"Some Reflections on Data in the Public Sector" : Communia: The European Thematic Network on the Digital Public Domain, London School of Economics, London, UK March 26-27, 2009
1. Some Reflections on Data in the Public Sector… Tom Moritz, Internet Archive. London School of Economics, London, March 26-27, 2009.
3. Internet Archive: http://www.archive.org/stream/pragmatismnewnam00jame Note date of publication: 1922
6. Data is now more available than ever, in highly diverse formats and from very disparate sources: validation of data, and critical awareness and analysis of data sources, are essential.
8. NCAR Research Data Archive (RDA): C.A. Jacobs and S.J. Worley, "Data Curation in Climate and Weather: Transforming Our Ability to Improve Predictions through Global Knowledge Sharing," 4th International Digital Curation Conference, December 2008, page 7. www.dcc.ac.uk/events/dcc-2008/programme/papers/Data%20Curation%20in%20Climate%20and%20Weather.pdf [03 02 09]
11. The $3.6 billion Large Hadron Collider (LHC) will sample and record the results of up to 600 million proton collisions per second, producing roughly 15 petabytes (15 million gigabytes) of data annually in search of new fundamental particles. To allow thousands of scientists from around the globe to collaborate on the analysis of these data over the next 15 years (the estimated lifetime of the LHC), tens of thousands of computers located around the world are being harnessed in a distributed computing network called the Grid. Within the Grid, described as the most powerful supercomputer system in the world, the avalanche of data will be analyzed, shared, re-purposed, and combined in innovative new ways designed to reveal the secrets of the fundamental properties of matter. Source: http://public.web.cern.ch/public/en/LHC/LHC-en.html
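A back-of-envelope check of the quoted figures (an illustrative sketch; in practice the LHC trigger systems discard the vast majority of collisions before anything is written to storage):

```python
# Average sustained recording rate implied by ~15 PB of data per year.
petabyte = 1e15                    # bytes
data_per_year = 15 * petabyte      # ~15 PB/year, as quoted above
seconds_per_year = 365 * 24 * 3600

avg_rate = data_per_year / seconds_per_year
print(f"average recording rate ~ {avg_rate / 1e6:.0f} MB/s")  # roughly 475 MB/s
```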
12. [Slide diagram: the data landscape, spanning individuals and local/personal archiving, individual libraries, cooperative projects, national disciplinary initiatives, and international collaborative research efforts; "small science" to "BIG science"; data centers and grids]
21. Stages of Digital Library Development. Howard Besser, adapted from "The Next Stage: Moving from Isolated Digital Collections to Interoperable Digital Libraries," First Monday, volume 7, number 6 (June 2002), http://firstmonday.org/issues/issue7_6/besser/index.html
Stage I: Experimental (1994; sponsors: NSF/ARPA/NASA) - experiments on collections of digital materials
Stage II: Developing (1998/1999; sponsors: NSF/ARPA/NASA, DLF/CLIR) - begin to consider custodianship, sustainability, user communities
Stage III: Mature (date: ?; funded through normal channels?) - real, sustainable, interoperable digital libraries
22. The Role of Scientific and Technical Data and Information in the Public Domain: Proceedings of a Symposium. Julie M. Esanu and Paul F. Uhlir, editors. Steering Committee on the Role of Scientific and Technical Data and Information in the Public Domain, Office of International Scientific and Technical Information Programs, Board on International Scientific Organizations, Policy and Global Affairs Division, National Research Council of the National Academies, p. 5. "Research Commons" / The Public Domain / Knowledge Commons: "the institutional ecology of the digital environment" (Yochai Benkler)
23. References to "Intellectual Property" in U.S. federal cases (Professor Hank Greely, cited in Lessig, L., The Future of Ideas: The Fate of the Commons in a Connected World. New York: Random House, 2001, p. 294).
24. Julian Birkinshaw and Tony Sheehan, "Managing the Knowledge Life Cycle," MIT Sloan Management Review, 44 (2), Fall 2002: 77. Is scientific knowledge a "commodity"?
26. [Metadata: Strikes and lockouts -- Motion picture industry; Walt Disney Productions; Disney characters; Mickey Mouse; Motion picture industry -- Employees -- Labor unions; American Federation of Labor; Animators; Brotherhood of Painters, Decorators, and Paperhangers of America; Screen Cartoonists Local Union No. 852 (Hollywood, Calif.); Animation Guild and Affiliated Optical Electronic and Graphic Arts, Local 839 I.A.T.S.E. (North Hollywood, Los Angeles, Calif.); Motion Pictures Screen Cartoonists Local 839, I.A.T.S.E.] Flier from the 1941 cartoonists' strike at Disney Studios: "Mickey Mouse wears an AFL (American Federation of Labor) button and carries a placard that reads 'Disney UNFAIR.' Bottom edge reads 'Printed by Disney Strikers on Offset Duplicator. Hand made Stencil.'" http://digitallibrary.csun.edu/cdm4/results.php?CISOOP1=any&CISOBOX1=Disney&CISOFIELD1=CISOSEARCHALL&CISOROOT=all&submit=search (Cal State Univ Northridge)
36. ALL knowledge? Or perhaps an ethical spectrum? The polemics of support for the Science Knowledge Commons: Human Health, Agriculture, Earth Science/Conservation, Education, Science-Tech, [Nuclear Technology], [Biotechnology]
41. objet trouvé: gutter, 10th & Colorado, Santa Monica, California (section: Preservation)
43. Wednesday, January 21, 2009. Freedom of Information Act. MEMORANDUM FOR THE HEADS OF EXECUTIVE DEPARTMENTS AND AGENCIES. SUBJECT: Freedom of Information Act.
A democracy requires accountability, and accountability requires transparency. As Justice Louis Brandeis wrote, "sunlight is said to be the best of disinfectants." In our democracy, the Freedom of Information Act (FOIA), which encourages accountability through transparency, is the most prominent expression of a profound national commitment to ensuring an open Government. At the heart of that commitment is the idea that accountability is in the interest of the Government and the citizenry alike.
The Freedom of Information Act should be administered with a clear presumption: In the face of doubt, openness prevails. The Government should not keep information confidential merely because public officials might be embarrassed by disclosure, because errors and failures might be revealed, or because of speculative or abstract fears. Nondisclosure should never be based on an effort to protect the personal interests of Government officials at the expense of those they are supposed to serve. In responding to requests under the FOIA, executive branch agencies (agencies) should act promptly and in a spirit of cooperation, recognizing that such agencies are servants of the public.
All agencies should adopt a presumption in favor of disclosure, in order to renew their commitment to the principles embodied in FOIA, and to usher in a new era of open Government. The presumption of disclosure should be applied to all decisions involving FOIA.
http://www.whitehouse.gov/the_press_office/Freedom_of_Information_Act/