Tufts Spatial Data Rescue: Crawling at-risk Government Data



Much U.S. federal data is perceived to be at risk of becoming inaccessible through lack of maintenance and shifts in funding. Organizations such as the End of Term Project and Data Rescue have emerged to coordinate the backup and rescue of at-risk federal data. Tufts University conducted a curated harvest to back up potentially at-risk federal, environmental and social justice geospatial data and associated tabular data. Tufts' ongoing harvest has recovered over 40 TB of data. Tufts developed the Crawler to crawl all of the harvested data: it unzips files, identifies file types, sizes, and local directories, and harvests and processes all related metadata using data mining and natural language processing (NLP) techniques. The Crawler produces detailed collection-level and layer-level metadata analytics for assessment, search, and discovery.


Tufts Spatial Data Rescue: Crawling at-risk Government Data
Kyle Monahan
Statistics and Research Technology Specialist
Tufts University
FOSS4G | Boston, MA | 8/17/2017

Background
• What is a data rescue?
• Methods and techniques to identify, store and preserve datasets
• Predominantly data associated with government entities
• *.gov, *.mil, *.edu, *.org, etc.
• Especially critical during election transitions
12/23/2017 FOSS4G Conference | Boston, MA 2
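
The domain patterns on this slide could serve as a first filter when deciding which URLs are in scope for a rescue. A minimal sketch, assuming a plain list of candidate URLs; the function name `in_scope` and the example URLs are hypothetical, and the suffix list only mirrors the slide (its "etc." means the list is not exhaustive):

```python
from urllib.parse import urlparse

# Suffixes taken from the slide; the slide's "etc." means more may apply.
RESCUE_SUFFIXES = (".gov", ".mil", ".edu", ".org")


def in_scope(url, suffixes=RESCUE_SUFFIXES):
    """Return True if the URL's host name ends with one of the suffixes."""
    host = (urlparse(url).hostname or "").lower()
    return host.endswith(suffixes)


# Hypothetical candidate URLs, for illustration only.
candidates = [
    "https://www.census.gov/data.html",
    "ftp://ftp.example.mil/pub/",
    "https://example.com/blog",
]
selected = [u for u in candidates if in_scope(u)]
print(selected)
```

Matching on the parsed host name rather than on the raw string avoids false hits such as a `.gov` that only appears in a path or query string.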

Example: 2008 End of Term Harvest
• National Archives and Records Administration announced they would be unable to rescue data as they did in 2004.
• International Internet Preservation Consortium (IIPC) responded by organizing a crawl:
• California Digital Library
• Internet Archive
• Government Printing Office
• Library of Congress
• University of North Texas
• Goal: “comprehensive harvest” (EOTerm Archive, 2016)

Example: 2008 End of Term Harvest
• Consisted of three main crawls:
• Pre-election
• Post-election
• Post-inauguration
• Produced over 16 TB of data
• And 160,211,356 URIs (Phillips, 2016)
End of Term, 2008

Methods of Tufts Crawl
• Access to up-to-date Federal data is critical for our Data Lab
• GIS and statistics classes rely on federal data (e.g. US Census, TRI, HUD)

Methods of Tufts Crawl
• Inquired about key data for faculty and staff at Tufts
• Also reached out to the Open Geoportal community
• Created a list of critical data sources that enable research and learning at Tufts and beyond
Google Docs
OGP Outreach

Methods of Tufts Crawl
• Used an FTP program called FileZilla
• Can re-initiate connections after failure
• Ran on multiple computers overnight, set to mirror different FTP sites
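
The talk used FileZilla, a GUI client, for this step. The same reconnect-after-failure idea can be sketched in a script with Python's standard `ftplib`. This is an illustration only, not the method used at Tufts: the helper names `retry` and `fetch_file` are hypothetical, as are the host and paths in the usage comment.

```python
import ftplib
import os
import time


def retry(action, attempts=5, delay=30.0, errors=ftplib.all_errors):
    """Call action() until it succeeds, sleeping `delay` seconds after a failure."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except errors:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the last error
            time.sleep(delay)


def fetch_file(host, remote_path, local_path, user="anonymous", password=""):
    """Download one file over FTP, opening a fresh connection for each call."""
    os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
    with ftplib.FTP(host, timeout=60) as ftp:
        ftp.login(user, password)
        with open(local_path, "wb") as out:
            ftp.retrbinary("RETR " + remote_path, out.write)
    return local_path


# Hypothetical usage (host and paths are placeholders):
# retry(lambda: fetch_file("ftp.example.gov", "/pub/data.zip", "mirror/data.zip"))
```

Because `fetch_file` opens a new connection on every call, each retry reconnects from scratch. A failed download restarts from the beginning in this sketch; it does not resume a partial file.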

Methods of Tufts Crawl
• Also completed collection of data from speed-limited locations by traditional mail
• Placed a 128 GB flash drive in an envelope
• Caught in a storm, but still much faster than dial-up speed
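
The "faster than dial-up" claim can be checked with a little arithmetic. The sketch below assumes a 56 kbit/s modem and a three-day delivery time; neither figure comes from the talk, and only the 128 GB drive size does.

```python
DRIVE_BYTES = 128 * 10**9      # 128 GB flash drive (decimal gigabytes)
DIALUP_BITS_PER_S = 56_000     # assumption: 56 kbit/s dial-up modem
MAIL_DAYS = 3                  # assumption: delivery time, not from the talk

SECONDS_PER_DAY = 24 * 60 * 60
drive_bits = DRIVE_BYTES * 8

# Time to move the whole drive over dial-up, in days.
dialup_days = drive_bits / DIALUP_BITS_PER_S / SECONDS_PER_DAY

# Effective bandwidth of the mailed envelope, in bits per second.
mail_bits_per_s = drive_bits / (MAIL_DAYS * SECONDS_PER_DAY)

print(f"Dial-up transfer: about {dialup_days:.0f} days")
print(f"Mailed drive: about {mail_bits_per_s / 1e6:.2f} Mbit/s effective")
```

Under these assumptions the dial-up transfer would take roughly 212 days, so the mailed drive comes out about 70 times faster even with the storm delay folded into the three days.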

Summary of Results
[Bar chart: Data Rescue, Estimated Harvest]
Data Rescued, TB, by Year of Harvest:
• 2008: 16
• 2012: 31
• 2017: 40
From Tufts alone – likely much higher for all data rescues!
Source: Phillips, 2016

Results of Tufts Crawl
[Figure: word cloud of the highest-frequency terms]
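
A word cloud of this kind is driven by term counts. A minimal sketch of that counting step, assuming the input is a list of metadata text strings; the stop-word list, the function name `top_terms`, and the sample titles are all hypothetical, and the talk does not say which NLP tooling was used:

```python
import re
from collections import Counter

# Small assumed stop-word list; a real run would use a fuller one.
STOP_WORDS = {"the", "of", "and", "a", "in", "for", "to", "by", "is", "on"}


def top_terms(texts, n=10):
    """Count lowercase word tokens across texts and return the n most common."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS and len(t) > 2)
    return counts.most_common(n)


# Hypothetical metadata titles, for illustration only.
titles = [
    "Census tract boundaries for the state of Massachusetts",
    "Toxics release inventory by census tract",
    "Housing units by census block group",
]
print(top_terms(titles, n=2))
```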

Development of Tufts Crawler
• High volume of data – much of it zipped
• Some further compressed inside zip files
• Needed a lightweight tool to assess what data was captured
• Solution: Python script
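
The problem described on this slide (zip files inside zip files, and a need to see what was captured) can be sketched with the standard library alone. This is not the Tufts Crawler's code, which the slides do not show; the function names `unzip_nested` and `inventory` and the `_unzipped` folder suffix are assumptions made for illustration.

```python
import os
import zipfile


def unzip_nested(root):
    """Extract every .zip under root, repeating until no unprocessed zip remains."""
    done = set()
    found_new = True
    while found_new:
        found_new = False
        for folder, _dirs, files in os.walk(root):
            for name in files:
                path = os.path.join(folder, name)
                if path in done or not name.lower().endswith(".zip"):
                    continue
                if not zipfile.is_zipfile(path):
                    continue  # wrong extension or corrupt archive: skip it
                # Extract next to the archive, into a folder named after it.
                with zipfile.ZipFile(path) as archive:
                    archive.extractall(path + "_unzipped")
                done.add(path)
                found_new = True
    return done


def inventory(root):
    """Return one record per file: relative path, extension, size in bytes."""
    records = []
    for folder, _dirs, files in os.walk(root):
        for name in sorted(files):
            path = os.path.join(folder, name)
            records.append({
                "path": os.path.relpath(path, root),
                "type": os.path.splitext(name)[1].lower() or "(none)",
                "size": os.path.getsize(path),
            })
    return records
```

Running `unzip_nested(folder)` and then `inventory(folder)` gives a flat list that can be written to CSV or summarized by file type. The outer loop repeats the directory walk because each extraction can reveal further archives.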

Development of Tufts Crawler
• Packaged the Python script in a GUI using Tkinter
• Object-oriented layer on Tcl/Tk
• Allows users unfamiliar with Python to use the tool
• Provides a simple interface and clear results
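
As a rough idea of what wrapping a script in Tkinter involves, here is a minimal sketch with one button and one results label. The layout, the window title, and the names `summarize_folder` and `build_gui` are hypothetical; the actual Crawler interface is not reproduced in this transcript.

```python
import os


def summarize_folder(folder):
    """Return (file_count, total_bytes) for everything under folder."""
    count, total = 0, 0
    for current, _dirs, files in os.walk(folder):
        for name in files:
            count += 1
            total += os.path.getsize(os.path.join(current, name))
    return count, total


def build_gui():
    """Build a small window with a folder picker and a results label."""
    # Imported here so summarize_folder stays usable where Tk is not installed.
    import tkinter as tk
    from tkinter import filedialog

    root = tk.Tk()
    root.title("Folder summary (sketch)")
    result = tk.StringVar(value="Choose a folder to summarize.")

    def choose():
        folder = filedialog.askdirectory()
        if folder:
            count, total = summarize_folder(folder)
            result.set(f"{count} files, {total / 1e6:.1f} MB")

    tk.Button(root, text="Choose folder...", command=choose).pack(padx=20, pady=10)
    tk.Label(root, textvariable=result).pack(padx=20, pady=10)
    return root


# To launch the window on a machine with a display: build_gui().mainloop()
```

Keeping the file-system logic in a plain function and the widgets in a separate one means the same logic can be run from a script or tested without opening a window.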

Results of Tufts Crawler
• Unzips files
• Records type of file
• Organizes XML data
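
The "organizes XML data" step could look something like the sketch below, which pulls a few fields out of a metadata record. It assumes FGDC-style element paths, which is a guess about the harvested files; the `FIELDS` mapping, the function name `read_metadata`, and the sample record are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed FGDC-style element paths; other metadata standards use different ones.
FIELDS = {
    "title": "idinfo/citation/citeinfo/title",
    "abstract": "idinfo/descript/abstract",
}


def read_metadata(xml_text):
    """Pull selected fields out of one metadata document; missing ones are None."""
    root = ET.fromstring(xml_text)
    record = {}
    for field, path in FIELDS.items():
        value = root.findtext(path)
        record[field] = value.strip() if value else None
    return record


# Hypothetical record, for illustration only.
sample = """<metadata>
  <idinfo>
    <citation><citeinfo><title> Example Layer </title></citeinfo></citation>
    <descript><abstract>Example abstract text.</abstract></descript>
  </idinfo>
</metadata>"""
print(read_metadata(sample))
```

Returning `None` for absent fields, rather than raising, lets a crawl over many files continue and report which records are incomplete.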

Using the Tufts Crawler – Take a Look

Summary & Future Work
• Tufts identified federal data perceived as “at-risk”
• Harvested over 40 TB of data, mostly compressed
• Developed Tufts Crawler to unpack and categorize types, sizes and other metadata
• Future work: package into an .exe, add a progress bar with estimated progress

Acknowledgements
• Tufts Geospatial Team: Carolyn Talmadge, Chris Barnett, Szuhui Wu, Annie Swafford, Kristen Lee, Adrian Sharpe, Patrick Florance.
• Graduate students: Sam Boiler.
• Others: Faculty and members of OGP who assisted in data selection, DataRescue Boston, the #DataRefuge Slack channel, all in Bromfield House.

Thank you!
Kyle M. Monahan
Statistics & Research Technology Specialist
Tufts University
kyle.monahan@tufts.edu
For more information: kylemonahan.info, datalab.tufts.edu

Questions?
5 minutes

Extra Slides – Python Code

Extra Slides – Python Code

Extra Slides – Details about tkinter GUI


办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样

apvysm8

原版一模一样【微信：741003700 】【(uts毕业证书)悉尼科技大学毕业证学历证书】【微信：741003700 】学位证，留信认证（真实可查，永久存档）offer、雅思、外壳等材料/诚信可靠,可直接看成品样本，帮您解决无法毕业带来的各种难题！外壳，原版制作，诚信可靠，可直接看成品样本。行业标杆！精益求精，诚心合作，真诚制作！多年品质 ,按需精细制作，24小时接单,全套进口原装设备。十五年致力于帮助留学生解决难题，包您满意。本公司拥有海外各大学样板无数，能完美还原海外各大学 Bachelor Diploma degree, Master Degree Diploma 1:1完美还原海外各大学毕业材料上的工艺：水印，阴影底纹，钢印LOGO烫金烫银，LOGO烫金烫银复合重叠。文字图案浮雕、激光镭射、紫外荧光、温感、复印防伪等防伪工艺。材料咨询办理、认证咨询办理请加学历顾问Q/微741003700 留信网认证的作用: 1:该专业认证可证明留学生真实身份 2:同时对留学生所学专业登记给予评定 3:国家专业人才认证中心颁发入库证书 4:这个认证书并且可以归档倒地方 5:凡事获得留信网入网的信息将会逐步更新到个人身份内，将在公安局网内查询个人身份证信息后，同步读取人才网入库信息 6:个人职称评审加20分 7:个人信誉贷款加10分 8:在国家人才网主办的国家网络招聘大会中纳入资料，供国家高端企业选择人才

在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样

v7oacc3l

学校原件一模一样【微信：741003700 】《(英国UCA毕业证书)创意艺术大学毕业证》【微信：741003700 】学位证，留信认证（真实可查，永久存档）原件一模一样纸张工艺/offer、雅思、外壳等材料/诚信可靠,可直接看成品样本，帮您解决无法毕业带来的各种难题！外壳，原版制作，诚信可靠，可直接看成品样本。行业标杆！精益求精，诚心合作，真诚制作！多年品质 ,按需精细制作，24小时接单,全套进口原装设备。十五年致力于帮助留学生解决难题，包您满意。本公司拥有海外各大学样板无数，能完美还原。 1:1完美还原海外各大学毕业材料上的工艺：水印，阴影底纹，钢印LOGO烫金烫银，LOGO烫金烫银复合重叠。文字图案浮雕、激光镭射、紫外荧光、温感、复印防伪等防伪工艺。材料咨询办理、认证咨询办理请加学历顾问Q/微741003700 【主营项目】一.毕业证【q微741003700】成绩单、使馆认证、教育部认证、雅思托福成绩单、学生卡等！二.真实使馆公证(即留学回国人员证明,不成功不收费) 三.真实教育部学历学位认证（教育部存档！教育部留服网站永久可查）四.办理各国各大学文凭(一对一专业服务,可全程监控跟踪进度) 如果您处于以下几种情况： ◇在校期间，因各种原因未能顺利毕业……拿不到官方毕业证【q/微741003700】 ◇面对父母的压力，希望尽快拿到； ◇不清楚认证流程以及材料该如何准备； ◇回国时间很长，忘记办理； ◇回国马上就要找工作，办给用人单位看； ◇企事业单位必须要求办理的 ◇需要报考公务员、购买免税车、落转户口 ◇申请留学生创业基金留信网认证的作用: 1:该专业认证可证明留学生真实身份 2:同时对留学生所学专业登记给予评定 3:国家专业人才认证中心颁发入库证书 4:这个认证书并且可以归档倒地方 5:凡事获得留信网入网的信息将会逐步更新到个人身份内，将在公安局网内查询个人身份证信息后，同步读取人才网入库信息 6:个人职称评审加20分 7:个人信誉贷款加10分 8:在国家人才网主办的国家网络招聘大会中纳入资料，供国家高端企业选择人才

原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理

wyddcwye1

原版制作【微信:41543339】【利兹贝克特大学毕业证(LeedsBeckett毕业证书)】【微信:41543339】《成绩单、外壳、雅思、offer、真实留信官方学历认证（永久存档/真实可查）》采用学校原版纸张、特殊工艺完全按照原版一比一制作（包括：隐形水印，阴影底纹，钢印LOGO烫金烫银，LOGO烫金烫银复合重叠，文字图案浮雕，激光镭射，紫外荧光，温感，复印防伪）行业标杆！精益求精，诚心合作，真诚制作！多年品质 ,按需精细制作，24小时接单,全套进口原装设备，十五年致力于帮助留学生解决难题，业务范围有加拿大、英国、澳洲、韩国、美国、新加坡，新西兰等学历材料，包您满意。【我们承诺采用的是学校原版纸张（纸质、底色、纹路）我们拥有全套进口原装设备，特殊工艺都是采用不同机器制作，仿真度基本可以达到100%，所有工艺效果都可提前给客户展示，不满意可以根据客户要求进行调整，直到满意为止！】【业务选择办理准则】一、工作未确定，回国需先给父母、亲戚朋友看下文凭的情况，办理一份就读学校的毕业证【微信41543339】文凭即可二、回国进私企、外企、自己做生意的情况，这些单位是不查询毕业证真伪的，而且国内没有渠道去查询国外文凭的真假，也不需要提供真实教育部认证。鉴于此，办理一份毕业证【微信41543339】即可三、进国企，银行，事业单位，考公务员等等，这些单位是必需要提供真实教育部认证的，办理教育部认证所需资料众多且烦琐，所有材料您都必须提供原件，我们凭借丰富的经验，快捷的绿色通道帮您快速整合材料，让您少走弯路。留信网认证的作用: 1:该专业认证可证明留学生真实身份 2:同时对留学生所学专业登记给予评定 3:国家专业人才认证中心颁发入库证书 4:这个认证书并且可以归档倒地方 5:凡事获得留信网入网的信息将会逐步更新到个人身份内，将在公安局网内查询个人身份证信息后，同步读取人才网入库信息 6:个人职称评审加20分 7:个人信誉贷款加10分 8:在国家人才网主办的国家网络招聘大会中纳入资料，供国家高端企业选择人才留信网服务项目： 1、留学生专业人才库服务（留信分析） 2、国（境）学习人员提供就业推荐信服务 3、留学人员区块链存储服务【关于价格问题（保证一手价格）】我们所定的价格是非常合理的，而且我们现在做得单子大多数都是代理和回头客户介绍的所以一般现在有新的单子我给客户的都是第一手的代理价格，因为我想坦诚对待大家不想跟大家在价格方面浪费时间对于老客户或者被老客户介绍过来的朋友，我们都会适当给一些优惠。选择实体注册公司办理，更放心，更安全！我们的承诺：客户在留信官方认证查询网站查询到认证通过结果后付款，不成功不收费！

ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake

Walaa Eldin Moustafa

Dynamic policy enforcement is becoming an increasingly important topic in today’s world where data privacy and compliance is a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) They are auto-generated from declarative data annotations. (2) They respect user-level consent and preferences (3) They are context-aware, encoding a different set of transformations for different use cases (4) They are portable; while the SQL logic is only implemented in one SQL dialect, it is accessible in all engines. #SQL #Views #Privacy #Compliance #DataLake

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...

Aggregage

一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理

nuttdpt

毕业原版【微信:176555708】【(UCSB毕业证书)圣芭芭拉分校毕业证】【微信:176555708】成绩单、外壳、offer、留信学历认证（永久存档真实可查）采用学校原版纸张、特殊工艺完全按照原版一比一制作（包括：隐形水印，阴影底纹，钢印LOGO烫金烫银，LOGO烫金烫银复合重叠，文字图案浮雕，激光镭射，紫外荧光，温感，复印防伪）行业标杆！精益求精，诚心合作，真诚制作！多年品质 ,按需精细制作，24小时接单,全套进口原装设备，十五年致力于帮助留学生解决难题，业务范围有加拿大、英国、澳洲、韩国、美国、新加坡，新西兰等学历材料，包您满意。【我们承诺采用的是学校原版纸张（纸质、底色、纹路），我们拥有全套进口原装设备，特殊工艺都是采用不同机器制作，仿真度基本可以达到100%，所有工艺效果都可提前给客户展示，不满意可以根据客户要求进行调整，直到满意为止！】【业务选择办理准则】一、工作未确定，回国需先给父母、亲戚朋友看下文凭的情况，办理一份就读学校的毕业证【微信176555708】文凭即可二、回国进私企、外企、自己做生意的情况，这些单位是不查询毕业证真伪的，而且国内没有渠道去查询国外文凭的真假，也不需要提供真实教育部认证。鉴于此，办理一份毕业证【微信176555708】即可三、进国企，银行，事业单位，考公务员等等，这些单位是必需要提供真实教育部认证的，办理教育部认证所需资料众多且烦琐，所有材料您都必须提供原件，我们凭借丰富的经验，快捷的绿色通道帮您快速整合材料，让您少走弯路。留信网认证的作用: 1:该专业认证可证明留学生真实身份 2:同时对留学生所学专业登记给予评定 3:国家专业人才认证中心颁发入库证书 4:这个认证书并且可以归档倒地方 5:凡事获得留信网入网的信息将会逐步更新到个人身份内，将在公安局网内查询个人身份证信息后，同步读取人才网入库信息 6:个人职称评审加20分 7:个人信誉贷款加10分 8:在国家人才网主办的国家网络招聘大会中纳入资料，供国家高端企业选择人才留信网服务项目： 1、留学生专业人才库服务（留信分析） 2、国（境）学习人员提供就业推荐信服务 3、留学人员区块链存储服务 → 【关于价格问题（保证一手价格）】我们所定的价格是非常合理的，而且我们现在做得单子大多数都是代理和回头客户介绍的所以一般现在有新的单子我给客户的都是第一手的代理价格，因为我想坦诚对待大家不想跟大家在价格方面浪费时间对于老客户或者被老客户介绍过来的朋友，我们都会适当给一些优惠。选择实体注册公司办理，更放心，更安全！我们的承诺：客户在留信官方认证查询网站查询到认证通过结果后付款，不成功不收费！

Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf

Fernanda Palhano

Learn SQL from basic queries to Advance queries

manishkhaire30

Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively. Key Highlights: Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation. Advanced Queries: Learn to craft complex queries to uncover deep insights from your data. Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets. Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios. Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making. Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data! #DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics

DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx

SaffaIbrahim1

Analysis insight about a Flyball dog competition team's performance

roli9797

DSSML24_tspann_CodelessGenerativeAIPipelines

Timothy Spann

Codeless Generative AI Pipelines (GenAI with Milvus) https://ml.dssconf.pl/user.html#!/lecture/DSSML24-041a/rate Discover the potential of real-time streaming in the context of GenAI as we delve into the intricacies of Apache NiFi and its capabilities. Learn how this tool can significantly simplify the data engineering workflow for GenAI applications, allowing you to focus on the creative aspects rather than the technical complexities. I will guide you through practical examples and use cases, showing the impact of automation on prompt building. From data ingestion to transformation and delivery, witness how Apache NiFi streamlines the entire pipeline, ensuring a smooth and hassle-free experience. Timothy Spann https://www.youtube.com/@FLaNK-Stack https://medium.com/@tspann https://www.datainmotion.dev/ milvus, unstructured data, vector database, zilliz, cloud, vectors, python, deep learning, generative ai, genai, nifi, kafka, flink, streaming, iot, edge

Open Source Contributions to Postgres: The Basics POSETTE 2024

ElizabethGarrettChri

Postgres is the most advanced open-source database in the world and it's supported by a community, not a single company. So how does this work? How does code actually get into Postgres? I recently had a patch submitted and committed and I want to share what I learned in that process. I’ll give you an overview of Postgres versions and how the underlying project codebase functions. I’ll also show you the process for submitting a patch and getting that tested and committed.

Global Situational Awareness of A.I. and where its headed

vikram sood

You can see the future first in San Francisco. Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum. The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be un-leashed, and before long, The Project will be on. If we’re lucky, we’ll be in an all-out race with the CCP; if we’re unlucky, an all-out war. Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the wilful blindness of “it’s just predicting the next word”. They see only hype and business-as-usual; at most they entertain another internet-scale technological change. Before long, the world will wake up. But right now, there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that have situational awareness. Through whatever peculiar forces of fate, I have found myself amongst them. 
A few years ago, these people were derided as crazy—but they trusted the trendlines, which allowed them to correctly predict the AI advances of the past few years. Whether these people are also right about the next few years remains to be seen. But these are very smart people—the smartest people I have ever met—and they are the ones building this technology. Perhaps they will be an odd footnote in history, or perhaps they will go down in history like Szilard and Oppenheimer and Teller. If they are seeing the future even close to correctly, we are in for a wild ride. Let me tell you what we see.

一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理

bopyb

毕业原版【微信:176555708】【(GWU,GW毕业证书)乔治·华盛顿大学毕业证】【微信:176555708】成绩单、外壳、offer、留信学历认证（永久存档真实可查）采用学校原版纸张、特殊工艺完全按照原版一比一制作（包括：隐形水印，阴影底纹，钢印LOGO烫金烫银，LOGO烫金烫银复合重叠，文字图案浮雕，激光镭射，紫外荧光，温感，复印防伪）行业标杆！精益求精，诚心合作，真诚制作！多年品质 ,按需精细制作，24小时接单,全套进口原装设备，十五年致力于帮助留学生解决难题，业务范围有加拿大、英国、澳洲、韩国、美国、新加坡，新西兰等学历材料，包您满意。【我们承诺采用的是学校原版纸张（纸质、底色、纹路），我们拥有全套进口原装设备，特殊工艺都是采用不同机器制作，仿真度基本可以达到100%，所有工艺效果都可提前给客户展示，不满意可以根据客户要求进行调整，直到满意为止！】【业务选择办理准则】一、工作未确定，回国需先给父母、亲戚朋友看下文凭的情况，办理一份就读学校的毕业证【微信176555708】文凭即可二、回国进私企、外企、自己做生意的情况，这些单位是不查询毕业证真伪的，而且国内没有渠道去查询国外文凭的真假，也不需要提供真实教育部认证。鉴于此，办理一份毕业证【微信176555708】即可三、进国企，银行，事业单位，考公务员等等，这些单位是必需要提供真实教育部认证的，办理教育部认证所需资料众多且烦琐，所有材料您都必须提供原件，我们凭借丰富的经验，快捷的绿色通道帮您快速整合材料，让您少走弯路。留信网认证的作用: 1:该专业认证可证明留学生真实身份 2:同时对留学生所学专业登记给予评定 3:国家专业人才认证中心颁发入库证书 4:这个认证书并且可以归档倒地方 5:凡事获得留信网入网的信息将会逐步更新到个人身份内，将在公安局网内查询个人身份证信息后，同步读取人才网入库信息 6:个人职称评审加20分 7:个人信誉贷款加10分 8:在国家人才网主办的国家网络招聘大会中纳入资料，供国家高端企业选择人才留信网服务项目： 1、留学生专业人才库服务（留信分析） 2、国（境）学习人员提供就业推荐信服务 3、留学人员区块链存储服务 → 【关于价格问题（保证一手价格）】我们所定的价格是非常合理的，而且我们现在做得单子大多数都是代理和回头客户介绍的所以一般现在有新的单子我给客户的都是第一手的代理价格，因为我想坦诚对待大家不想跟大家在价格方面浪费时间对于老客户或者被老客户介绍过来的朋友，我们都会适当给一些优惠。选择实体注册公司办理，更放心，更安全！我们的承诺：客户在留信官方认证查询网站查询到认证通过结果后付款，不成功不收费！

一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理

nuttdpt

毕业原版【微信:176555708】【(UCSF毕业证书)旧金山分校毕业证】【微信:176555708】成绩单、外壳、offer、留信学历认证（永久存档真实可查）采用学校原版纸张、特殊工艺完全按照原版一比一制作（包括：隐形水印，阴影底纹，钢印LOGO烫金烫银，LOGO烫金烫银复合重叠，文字图案浮雕，激光镭射，紫外荧光，温感，复印防伪）行业标杆！精益求精，诚心合作，真诚制作！多年品质 ,按需精细制作，24小时接单,全套进口原装设备，十五年致力于帮助留学生解决难题，业务范围有加拿大、英国、澳洲、韩国、美国、新加坡，新西兰等学历材料，包您满意。【我们承诺采用的是学校原版纸张（纸质、底色、纹路），我们拥有全套进口原装设备，特殊工艺都是采用不同机器制作，仿真度基本可以达到100%，所有工艺效果都可提前给客户展示，不满意可以根据客户要求进行调整，直到满意为止！】【业务选择办理准则】一、工作未确定，回国需先给父母、亲戚朋友看下文凭的情况，办理一份就读学校的毕业证【微信176555708】文凭即可二、回国进私企、外企、自己做生意的情况，这些单位是不查询毕业证真伪的，而且国内没有渠道去查询国外文凭的真假，也不需要提供真实教育部认证。鉴于此，办理一份毕业证【微信176555708】即可三、进国企，银行，事业单位，考公务员等等，这些单位是必需要提供真实教育部认证的，办理教育部认证所需资料众多且烦琐，所有材料您都必须提供原件，我们凭借丰富的经验，快捷的绿色通道帮您快速整合材料，让您少走弯路。留信网认证的作用: 1:该专业认证可证明留学生真实身份 2:同时对留学生所学专业登记给予评定 3:国家专业人才认证中心颁发入库证书 4:这个认证书并且可以归档倒地方 5:凡事获得留信网入网的信息将会逐步更新到个人身份内，将在公安局网内查询个人身份证信息后，同步读取人才网入库信息 6:个人职称评审加20分 7:个人信誉贷款加10分 8:在国家人才网主办的国家网络招聘大会中纳入资料，供国家高端企业选择人才留信网服务项目： 1、留学生专业人才库服务（留信分析） 2、国（境）学习人员提供就业推荐信服务 3、留学人员区块链存储服务 → 【关于价格问题（保证一手价格）】我们所定的价格是非常合理的，而且我们现在做得单子大多数都是代理和回头客户介绍的所以一般现在有新的单子我给客户的都是第一手的代理价格，因为我想坦诚对待大家不想跟大家在价格方面浪费时间对于老客户或者被老客户介绍过来的朋友，我们都会适当给一些优惠。选择实体注册公司办理，更放心，更安全！我们的承诺：客户在留信官方认证查询网站查询到认证通过结果后付款，不成功不收费！

Build applications with generative AI on Google Cloud

Márton Kodok

We will explore Vertex AI - Model Garden powered experiences, we are going to learn more about the integration of these generative AI APIs. We are going to see in action what the Gemini family of generative models are for developers to build and deploy AI-driven applications. Vertex AI includes a suite of foundation models, these are referred to as the PaLM and Gemini family of generative ai models, and they come in different versions. We are going to cover how to use via API to: - execute prompts in text and chat - cover multimodal use cases with image prompts. - finetune and distill to improve knowledge domains - run function calls with foundation models to optimize them for specific tasks. At the end of the session, developers will understand how to innovate with generative AI and develop apps using the generative ai industry trends.

Recently uploaded (20)

06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM

A presentation that explain the Power BI Licensing

End-to-end pipeline agility - Berlin Buzzwords 2024

Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...

办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样

在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样

原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理

ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...

一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理

Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf

Learn SQL from basic queries to Advance queries

DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx

Analysis insight about a Flyball dog competition team's performance

DSSML24_tspann_CodelessGenerativeAIPipelines

Open Source Contributions to Postgres: The Basics POSETTE 2024

Global Situational Awareness of A.I. and where its headed

一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理

一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理

Build applications with generative AI on Google Cloud

Tufts Spatial Data Rescue: Crawling at-risk Government Data

1. Tufts Spatial Data Rescue: Crawling at-risk Government Data
Kyle Monahan
Statistics and Research Technology Specialist
Tufts University
FOSS4G | Boston, MA | 8/17/2017

2. Background
•What is a data rescue?
  • Methods and techniques to identify, store and preserve datasets
  • Predominantly data associated with government entities
  • *.gov, *.mil, *.edu, *.org, etc.
  • Especially critical during election transitions
12/23/2017 FOSS4G Conference | Boston, MA 2

3. Background - History

4. Example: 2008 End of Term Harvest
• National Archives and Records Administration announced they would be unable to rescue data as they did in 2004.
• International Internet Preservation Consortium (IIPC) responded by organizing a crawl:
  • California Digital Library
  • Internet Archive
  • Government Printing Office
  • Library of Congress
  • University of North Texas
• Goal: “comprehensive harvest” (EOTerm Archive, 2016)

5. Example: 2008 End of Term Harvest
• Consisted of three main crawls:
  • Pre-election
  • Post-election
  • Post-inauguration
• Produced over 16 TB of data
• And 160,211,356 URIs (Phillips, 2016)
End of Term, 2008

6. Methods of Tufts Crawl
• Access to up-to-date Federal data is critical for our Data Lab
• GIS and statistics classes rely on federal data (e.g. US Census, TRI, HUD)

7. Methods of Tufts Crawl
• Inquired about key data for faculty and staff at Tufts
• Also reached out to the Open Geoportal community
• Created a list of critical data sources that enable research and learning at Tufts and beyond
Google Docs, OGP Outreach

8. Methods of Tufts Crawl
• Used an FTP program called FileZilla
  • Can re-initiate connections after failure
• Ran on multiple computers overnight, each set to mirror a different FTP site.

9. Methods of Tufts Crawl
•Also completed collection of data from speed-limited locations by traditional mail
•Placed a 128 GB flash drive in an envelope
  • Caught in a storm, but still much faster than dial-up speed
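The "faster than dial-up" remark can be checked with quick arithmetic. The sketch below is illustrative only: the 56 kbit/s modem rate, decimal gigabytes, and the absence of protocol overhead are assumptions, not figures from the slides.

```python
# Back-of-the-envelope check: mailing a flash drive vs. sending it over dial-up.
# Assumptions (not from the slides): a nominal 56 kbit/s modem, decimal
# gigabytes, and no protocol overhead or dropped connections.
DRIVE_BYTES = 128 * 10**9        # 128 GB flash drive
DIALUP_BITS_PER_SEC = 56_000     # nominal 56k modem rate
SECONDS_PER_DAY = 86_400


def transfer_days(n_bytes: int, bits_per_sec: int) -> float:
    """Return the time in days needed to move n_bytes at bits_per_sec."""
    seconds = n_bytes * 8 / bits_per_sec
    return seconds / SECONDS_PER_DAY


days = transfer_days(DRIVE_BYTES, DIALUP_BITS_PER_SEC)
print(f"{days:.0f} days over dial-up")
```

Under these assumptions the transfer would take roughly 212 days of uninterrupted connection, so an envelope that arrives within a week wins comfortably even after a storm.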

10. Summary of Results
[Bar chart: "Data Rescue, Estimated Harvest." x-axis: Year of Harvest; y-axis: Data Rescued, TB. Values: 2008, 16 TB; 2012, 31 TB; 2017, 40 TB.]
Annotation: From Tufts alone; likely much higher for all data rescues!
Source: Phillips, 2016

11. Results of Tufts Crawl
Word Cloud (highest frequency terms)

12. Development of Tufts Crawler
•High volume of data, much of it zipped
  • Some further compressed inside zip files
•Needed a lightweight tool to assess what data was captured
  • Solution → Python script
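The nested-zip problem described on this slide can be sketched with the standard library alone. This is a minimal illustration, not the Tufts Crawler's actual code: the function name `unpack_nested`, the `.unpacked` folder suffix, and the `max_depth` guard are all hypothetical choices made for the example.

```python
import zipfile
from pathlib import Path


def unpack_nested(archive: Path, dest: Path, max_depth: int = 5) -> list[Path]:
    """Extract archive into dest, then extract any .zip files found inside.

    Returns the paths of all non-zip files that end up on disk.
    dest is expected to be empty or not yet created.
    max_depth (a hypothetical safeguard) bounds how deep the nesting may go.
    """
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)

    # Snapshot the listing first so files from inner archives are not revisited.
    members = sorted(p for p in dest.rglob("*") if p.is_file())
    found: list[Path] = []
    for path in members:
        # Only treat real .zip files as nested archives; other zip-based
        # formats (for example .kmz or .docx) are left untouched.
        is_inner_zip = path.suffix.lower() == ".zip" and zipfile.is_zipfile(path)
        if is_inner_zip and max_depth > 0:
            inner_dest = path.parent / (path.name + ".unpacked")
            found.extend(unpack_nested(path, inner_dest, max_depth - 1))
        else:
            found.append(path)
    return found
```

Calling `unpack_nested(Path("harvest.zip"), Path("out"))` (both paths are placeholders) would return every data file, including those that were zipped inside other zips. Extracting inner archives into a sibling folder keeps the original inner zip on disk, which may or may not be desirable for a 40 TB collection.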

13. Development of Tufts Crawler
•Packaged the Python script in a GUI using Tkinter
  • Object-oriented layer on Tcl/Tk
•Allows for users unfamiliar with Python to use the tool
•Provides a simple interface and clear results
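Wrapping a script in Tkinter can be as small as one button and one label. The sketch below is an assumption about what such a wrapper might look like, not the layout of the real tool; `count_files` and `build_app` are hypothetical names, and the file count stands in for the actual crawl step.

```python
from pathlib import Path


def count_files(folder: str) -> int:
    """Count regular files under folder (a stand-in for the real crawl step)."""
    return sum(1 for p in Path(folder).rglob("*") if p.is_file())


def build_app():
    """Build a small window: pick a folder, run the crawl, show the result."""
    # Imported here so count_files still works on machines without a display.
    import tkinter as tk
    from tkinter import filedialog

    root = tk.Tk()
    root.title("Crawler (sketch)")
    result = tk.StringVar(value="Choose a folder to crawl.")

    def on_choose():
        folder = filedialog.askdirectory()
        if folder:  # an empty value means the dialog was cancelled
            result.set(f"{count_files(folder)} files found in {folder}")

    tk.Button(root, text="Choose folder...", command=on_choose).pack(padx=20, pady=10)
    tk.Label(root, textvariable=result).pack(padx=20, pady=10)
    return root


# To launch the window: build_app().mainloop()
```

Keeping the crawl logic in a plain function and the widgets in a separate builder means the same code can be run from the command line or tested without opening a window.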

14. Results of Tufts Crawler
• Unzips files
• Records type of file
• Organizes XML data
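Recording file types and organizing XML metadata could look roughly like the following. It is a sketch under stated assumptions: the function names `inventory` and `xml_titles` are hypothetical, the metadata is assumed to be un-namespaced XML with a `<title>` element somewhere in the tree (as in FGDC-style records), and only files with a lowercase `.xml` extension are guaranteed to be picked up on case-sensitive file systems.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from pathlib import Path


def inventory(root: Path) -> dict[str, dict[str, int]]:
    """Tally file count and total bytes per file extension under root."""
    stats = defaultdict(lambda: {"files": 0, "bytes": 0})
    for path in root.rglob("*"):
        if path.is_file():
            ext = path.suffix.lower() or "(none)"
            stats[ext]["files"] += 1
            stats[ext]["bytes"] += path.stat().st_size
    return dict(stats)


def xml_titles(root: Path) -> dict[str, str]:
    """Map each parseable .xml file to the text of its first <title> element."""
    titles = {}
    for path in sorted(root.rglob("*.xml")):
        try:
            tree = ET.parse(path)
        except ET.ParseError:
            continue  # skip malformed metadata rather than abort the crawl
        node = tree.find(".//title")  # assumes no XML namespaces
        if node is not None and node.text:
            titles[str(path.relative_to(root))] = node.text.strip()
    return titles
```

The per-extension tally gives a quick collection-level picture (how many shapefiles, CSVs, and so on, and how large), while the title map is one simple way to make layer-level metadata searchable. Namespaced schemas such as ISO 19139 would need a namespace-aware lookup instead.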

15. Using the Tufts Crawler – Take a Look

16. Summary & Future Work
• Tufts identified federal data perceived “at-risk”
• Harvested over 40 TB of data, mostly compressed
• Developed Tufts Crawler to unpack and categorize types, sizes and other metadata.
• Future work: pack into .exe, estimate progress bar

17. Acknowledgements
• Tufts Geospatial Team: Carolyn Talmadge, Chris Barnett, Szuhui Wu, Annie Swafford, Kristen Lee, Adrian Sharpe, Patrick Florance.
• Graduate students: Sam Boiler.
• Others: Faculty and members of OGP who assisted in data selection, DataRescue Boston, the #DataRefuge Slack channel, all in Bromfield House.

18. Thank you!
Kyle M. Monahan
Statistics & Research Technology Specialist
Tufts University
kyle.monahan@tufts.edu
For more information: kylemonahan.info, datalab.tufts.edu

19. Questions? 5 minutes

20. Extra Slides – Python Code

21. Extra Slides – Python Code

22. Extra Slides – Details about tkinter GUI

Editor's Notes

Sources: "End of Term Presidential Harvest 2008," University of North Texas Digital Library, retrieved August 14, 2017. Phillips, Mark Edward. End of Term Web Archives: 2008, 2012, 2016 ..., presentation, April 5, 2016 (digital.library.unt.edu/ark:/67531/metadc848587/, accessed August 14, 2017), University of North Texas Libraries, Digital Library, digital.library.unt.edu; crediting UNT Libraries Digital Projects Unit.
EDGI. “Homepage: Environmental Data and Governance Initiative.” https://envirodatagov.org/ Accessed on 8-15-2017 (KMM).
Image source: http://netpreserve.org/wp-content/uploads/2017/04/IIPC-logo.png End Of Term Archive. “Project Background: End of Term Archive.” 2008. http://eotarchive.cdlib.org/background.html Accessed on 8-15-2017.
URI is uniform resource identifier – be sure to say it!
Data is critical to our teaching and research computing lab, called the Data Lab. We focus on data reference, analysis and visualization, of which federal data provides an integral base. Many of our GIS and statistics courses rely on access to federal data, such as the US Census, the EPA’s Toxic Release Inventory, the US Department of Housing and Urban Development (HUD), among others.
We took a similar approach as the previously mentioned crawls. We inquired about key data that was critical for faculty and staff at Tufts.
