Connections between big data and open data. Includes a case study of Data.gov and the ways that companies, charities, and others are using open data to improve the lives of people around the planet.
Computer-aided content analysis of digitally enabled movements, by Alexander Hanna
The document discusses computer-aided content analysis methods for studying digitally-enabled social movements. It outlines applying supervised machine learning to categorize messages from a Facebook group for Egypt's April 6 Youth Movement. Key points:
1. Categories like offline coordination, online actions, and event reporting are defined to classify a training set of messages.
2. Validation is done using cross-validation, and analysis is applied to the full dataset.
3. Results show peaks in offline coordination before protest dates, but other categories did not change as expected, possibly due to errors in training.
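The pipeline the summary describes (hand-label a training set, fit a classifier, check it with cross-validation) can be sketched with a toy Naive Bayes text classifier. The messages, category names, and fold count below are illustrative stand-ins, not the study's actual data or model.

```python
from collections import Counter, defaultdict
import math

def train_nb(docs, labels):
    """Train a multinomial Naive Bayes model on whitespace-tokenized documents."""
    word_counts = defaultdict(Counter)   # label -> word frequencies
    label_counts = Counter(labels)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.split())
    return word_counts, label_counts

def predict_nb(model, doc):
    """Return the most probable label for a document (add-one smoothing)."""
    word_counts, label_counts = model
    vocab = {w for c in word_counts.values() for w in c}
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in doc.split():
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def cross_validate(docs, labels, k=2):
    """k-fold cross-validation: hold out each fold, train on the rest."""
    correct = 0
    for fold in range(k):
        train = [(d, l) for i, (d, l) in enumerate(zip(docs, labels)) if i % k != fold]
        test = [(d, l) for i, (d, l) in enumerate(zip(docs, labels)) if i % k == fold]
        model = train_nb([d for d, _ in train], [l for _, l in train])
        correct += sum(predict_nb(model, d) == l for d, l in test)
    return correct / len(docs)

# Hypothetical messages with hand-assigned categories
docs = ["meet at tahrir square friday", "protest meeting friday square",
        "share this post online", "retweet and share online"]
labels = ["offline", "offline", "online", "online"]
print(cross_validate(docs, labels))
```

On realistic data the accuracy from cross-validation is what signals whether the hand-coded categories are learnable before the model is applied to the full dataset.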
GitHub as Transparency Device in Data Journalism, Open Data and Data Activism, by Liliana Bounegru
Slides from a presentation of a research agenda on uses of GitHub in journalism, given at the Digital Methods Summer School 2015. More details here: http://lilianabounegru.org/2015/07/08/github-as-transparency-device-in-data-journalism-open-data-and-data-activism/
The team consists of Matt, a political scientist in Chicago; Libby and Jahna, informationists in Chicago and Cyprus respectively; and Andrew, a graduate student in technical communication in Chicago. They are studying how politicians and constituents use social media like Twitter to discuss public policy, and the consequences of these conversations. They will collect Twitter data and analyze it using tools like Python, Stata, MALLET, and Weka, with data stored on an in-house server, Dropbox for sharing files, and Excel for organizing data.
This document discusses cross-platform profiling as a method for studying issues across multiple online spaces. It provides examples of profiling controversies and issues like Fukushima, the economic crisis, hashtags on climate change, and the WCIT conference. Profiling involves analyzing actor composition, key platforms, framing, and variation over time. It demonstrates profiling using tools like the Google scraper and TCAT associational profiler to map word frequencies, co-occurrence networks, and changing associations for issues on Google, Twitter and other platforms. The document raises questions about how media liveliness relates to issue liveliness and how profiling can capture social dynamics and platform specificities.
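A co-occurrence network of the kind the TCAT associational profiler maps starts from pair counts over documents. This stdlib sketch uses invented tweets; real profiling would run over a full platform export.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(documents):
    """Count how often each pair of distinct words appears in the same document.

    The resulting Counter maps (word_a, word_b) tuples to counts and can be
    read as a weighted edge list for a co-occurrence network.
    """
    pairs = Counter()
    for doc in documents:
        words = sorted(set(doc.lower().split()))   # dedupe within a document
        pairs.update(combinations(words, 2))       # all unordered word pairs
    return pairs

# Hypothetical tweets about an issue
tweets = ["fukushima radiation leak", "radiation leak update", "fukushima cleanup"]
net = cooccurrence(tweets)
print(net[("leak", "radiation")])   # -> 2
```

Repeating the count per time window yields the "changing associations" over time that the profiling approach visualizes.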
This document summarizes Anatoliy Gruzd's presentation on research with social media data and considerations around data stewardship and ethics. It discusses key aspects of working with social big data including collection from APIs and data resellers, analysis through visualization, network and geo-based analysis, and preservation efforts from public archives, private companies and personal archiving. It also covers ethical considerations for researchers, industry and users around topics like transparency, privacy and expectations of data use. The presentation emphasizes the importance of responsible data stewardship across the whole data lifecycle from collection to analysis to preservation.
Cottbus Brandenburg University of Technology, lecture series on Smart Regions. Critically Assembling Data, Processes & Things: Toward an Open Smart City. June 5, 2018.
This lecture will critically examine smart cities through a data-based socio-technological assemblage approach: a theoretical and methodological framework that allows for an empirical examination of how smart cities are socially and technically constructed, and for studying them both as discursive regimes and as large technological infrastructural systems.
The lecture will refer to the research outcomes of the ERC-funded Programmable City Project led by Rob Kitchin at Maynooth University and will feature examples of empirical research conducted in Dublin and other Irish cities.
In addition, the lecture will discuss the research outcomes of the Canadian Open Smart Cities project funded by the Government of Canada GeoConnections Program. Examples will be drawn from five case studies, namely the cities of Edmonton, Guelph, Ottawa and Montreal and the Ontario Smart Grid, as well as a number of international best practices. The recent Infrastructure Canada Smart City Challenge and the controversial Sidewalk Labs Waterfront Toronto project will also be discussed.
It will be argued that no two smart cities are alike, although technological solutionist and networked urbanist approaches dominate, and it is suggested that these kinds of smart cities may not live up to the promise of being better places to live.
In this lecture, the ideals of an Open Smart City are offered instead. In this kind of city, residents, civil society, academics, and the private sector collaborate with public officials to mobilize data and technologies, when warranted, in an ethical, accountable and transparent way, in order to govern the city as a fair, viable and livable commons that balances economic development, social progress and environmental responsibility. Although an Open Smart City does not yet exist, it will be argued that one is possible.
From Data Platforms to Dataspaces: Enabling Data Ecosystems for Intelligent S..., by Edward Curry
Digital transformation is driving a new wave of large-scale datafication in every aspect of our world. Today our society creates data ecosystems where data moves among actors within complex information supply chains that can form around an organization, community, sector, or smart environment. These ecosystems of data can be exploited to transform our world and present new challenges and opportunities in the design of intelligent systems. This talk presents my recent work on using the dataspace paradigm as a best-effort approach to data management within data ecosystems. The talk explores the theoretical foundations and principles of dataspaces and details a set of specialized best-effort techniques and models to enable loose administrative proximity and semantic integration of heterogeneous data sources. Finally, I share my perspectives on future dataspace research challenges, including multimedia data, data governance and the role of dataspaces to enable large-scale data sharing within Europe to power data-driven AI.
Extending Memory on the Web via Human-Centric Knowledge Exchange Network. Presented at W3C Workshop on Social Standards: The Future of Business, 7-8 August 2013, San Francisco, USA
This document summarizes a presentation on linked open government data. It discusses how government data is being opened through initiatives like Data.gov and how linked data approaches can help address challenges in making open government data more interoperable, scalable, and able to maintain provenance. Key points discussed include the growth of open government data, challenges in working with raw open data, benefits of converting data to linked open formats, and open questions around improving interoperability, addressing scalability issues, and maintaining provenance as open government data continues to expand.
Conference of Irish Geographies 2018
The Earth as Our Home
Automating Homelessness, May 12, 2018
The research for these studies is funded by a European Research Council Advanced Investigator award ERC-2012-AdG-323636-SOFTCITY.
Pie chart or pizza: identifying chart types and their virality on Twitter, by Elena Simperl
This document summarizes a study on identifying chart types from images shared on Twitter and predicting how viral (retweeted and liked) the images will be. The study used a convolutional neural network to identify 11 different chart types from images. It also used a multi-modal neural network to jointly predict retweets and likes by using features from the images as well as author metadata. The study introduced two new datasets (DataTweet+ and DataTweet) of charts collected from Twitter and found the models performed best when trained on these realistic datasets compared to benchmark datasets.
Allan Glen from the City and County of Denver presented on Denver's Open Data Initiative at the 2013 Regional Data Summit. Previously, Denver's GIS data was available for download but required restrictive licensing agreements. Denver has since launched an open data portal, data.denvergov.org, to make city data more freely available under a Creative Commons license. The portal has seen increasing usage, with over 14,000 visits and 105,000 page views in its first six months. Denver plans to continue expanding its open data program by adding more datasets and hiring an Open Data Architect.
Noshir Contractor's view on the future of Linked Data, by Carlos Pedrinaci
This document discusses adding social network data and analysis to linked open data. It suggests that while linked data is growing, killer applications that use linked data may require integrating social network information as well. It presents an initial test bed for integrating semantic web data that includes social network analysis and text mining to analyze variables across data sources and make recommendations of workflows, datasets, documents, and people. The integration of social data and analysis with linked open data could provide larger, higher resolution longitudinal data and enable reasoning and inferences over the combined data sources.
UNT: Scientific Data Management and Sharing, by Carly Strasser
This document discusses scientific data management and sharing. It begins with an overview of the transition from traditional research methods using paper to modern digital data and complex workflows. It then covers barriers to data stewardship such as costs, sociocultural issues, and lack of incentives. The document also discusses why open science and data sharing are important, including reproducibility, credibility, and faster scientific progress. Funders and publishers now require data management plans and data sharing to support open science.
The workshop opens with a discussion of how to repurpose digital "methods of the medium" for social and cultural scholarly research, including their limitations, critiques and ethics. Subsequently, participants are trained in using digital methods in hands-on sessions. How to use crawlers for dynamic URL sampling and issue network mapping? How to employ scrapers to create a bias or partisanship diagnostic instrument? We also consider how to deploy online platforms for social research. How to transform Wikipedia from an online encyclopaedia into a device for cross-cultural memory studies? How to make use of social media so as to profile the preferences and tastes of politicians' friends, and also locate the most engaged-with content? How to make use of Twitter analytics to debanalize tweets and provide compelling accounts of events on the ground? Finally, the workshop turns to the question of employing web data and metrics as societal indices more generally.
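The crawler exercises above begin with link extraction from fetched pages. A minimal sketch using Python's standard-library HTML parser is below; it is fed an inline snippet rather than a live fetch, and the URLs are invented.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect href targets from anchor tags, the first step of issue-network crawling."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# Hypothetical page snippet standing in for a fetched issue page
page = ('<p><a href="https://example.org/ngo">NGO</a> and '
        '<a href="https://example.org/gov">agency</a></p>')
parser = LinkExtractor()
parser.feed(page)
print(parser.links)
```

Mapping which sites link to which, across many such pages, is what turns raw crawling output into an issue network.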
What Your Tweets Tell Us About You, Speaker Notes, by KrisKasianovitz
This document discusses the challenges of archiving and analyzing social media data from platforms like Twitter. It notes that researchers currently focus more on collecting and analyzing this data rather than addressing long-term preservation or privacy issues. Different disciplines also have varying views on whether social media posts should be treated as human subject research or public publications, leading to conflicts in how privacy is handled. The document aims to start a discussion on developing best practices for balancing privacy and open access when curating social media archives.
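One concrete practice such a best-practices discussion might weigh is pseudonymizing account handles before archiving. This sketch (the salt and handles are invented) uses a stable salted hash, which hides account names while keeping an archive's records linkable to each other.

```python
import hashlib

def pseudonymize(username, salt):
    """Replace a handle with a stable salted hash so records can be linked
    across an archive without exposing the original account name."""
    return hashlib.sha256((salt + username).encode("utf-8")).hexdigest()[:12]

# Hypothetical handles; the salt would be kept secret by the archive
salt = "archive-secret"
a = pseudonymize("@alice", salt)
b = pseudonymize("@alice", salt)
c = pseudonymize("@bob", salt)
print(a == b, a == c)   # -> True False
```

The trade-off is the usual one: a stable pseudonym supports longitudinal analysis, but anyone holding the salt can re-identify users, so the salt itself needs the same protection as the raw data.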
This document discusses leveraging social big data and the evolution from existing rigid operations to predictive analytics using social media. It begins with an overview of handouts and reference materials on big data, Hadoop, Spark, and data science projects. It then discusses areas for conversation around social content, structure and analytics, data science primers and resources, and data science innovation. It presents a roadmap showing the evolution from rigid and siloed operations to being more flexible, connected, adaptive and predictive using social media. Finally, it discusses types of intentionality and how social CRM can integrate social data.
A Framework Concept for Profiling Researchers on Twitter using the Web of Data, by Laurens De Vocht
Based upon findings and results from our recent research, we propose a generic framework concept for researcher profiling, applied to the areas of "Science 2.0" and "Research 2.0". The intensive growth of social networks such as Twitter has generated a vast amount of information, and many previous works have shown that social network users produce content that is valuable for profiling and recommendation. Our research focuses on identifying and locating experts for a specific research area or topic. In our approach we apply semantic technologies (RDF, SPARQL), common vocabularies (SIOC, FOAF, MOAT, the Tag Ontology) and Linked Data (GeoNames, COLINDA).
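A query in the spirit of this approach might look up experts through FOAF profiles. The graph shape assumed below (foaf:topic_interest pointing to a topic with an rdfs:label) is an illustration, not the framework's actual schema, and the query is only built as a string, not sent to an endpoint.

```python
def expert_query(topic):
    """Build a SPARQL query that finds people whose FOAF profile lists an
    interest in the given topic. The graph layout is an assumption."""
    return f"""
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?person ?name WHERE {{
        ?person a foaf:Person ;
                foaf:name ?name ;
                foaf:topic_interest ?t .
        ?t rdfs:label "{topic}" .
    }}
    """

print(expert_query("Linked Data"))
```

Against a combined graph (FOAF profiles enriched with SIOC posts and GeoNames locations), the same pattern extends to filtering experts by what they post and where they are based.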
Transforming Instagram data into location intelligence, by suresh sood
This document discusses using data from Instagram to conduct location intelligence and internet of things research. It motivates an Instagram project using data like user trajectories to enable predictive capabilities, location-based services, and tourism recommendations. It outlines workflows for data science and discovery analytics on Instagram data, stored using MongoDB due to its support for geospatial data and JSON format. Tools are presented for Instagram analytics and push notification providers.
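MongoDB's geospatial support, which the summary cites as a reason for choosing it, centers on queries over GeoJSON points. The sketch below only builds the query document as a plain dictionary (no server connection); the field name, coordinates, and radius are illustrative.

```python
# Shape of a MongoDB $near geospatial query over GeoJSON points, built as a
# plain dictionary; collection layout and values are illustrative.
def near_query(lon, lat, max_meters):
    """Query document matching posts geotagged within max_meters of a point.
    Assumes a 2dsphere index on the 'location' field."""
    return {
        "location": {
            "$near": {
                "$geometry": {"type": "Point", "coordinates": [lon, lat]},
                "$maxDistance": max_meters,
            }
        }
    }

q = near_query(151.2093, -33.8688, 500)   # hypothetical point of interest
print(q["location"]["$near"]["$maxDistance"])
```

With a real connection this dictionary would be passed to a collection's find() call; note that GeoJSON orders coordinates longitude first.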
Open Data in a Big Data World: easy to say, but hard to do? (LEARN Project)
Presentation at 3rd LEARN workshop on Research Data Management, “Make research data management policies work”
Helsinki, 28 June 2016, by Sarah Callaghan, STFC Rutherford Appleton Laboratory
Biodiversity—A Healthy Ecosystem Thrives on Fresh Ideas (Part 1 of 3), Phil J..., by Allen Press
The document discusses open science and how it has transformed research and collaboration in several key ways:
- Data and research outputs are increasingly shared openly online in citable and contextualized ways to maximize their impact.
- Tools exist to support every stage of the research cycle from getting ideas to documenting findings.
- Funders increasingly require data to be shared openly to make publicly funded research a public good.
- Repositories provide places for researchers to store and organize different types of research data and outputs.
- Open science engages stakeholders throughout the entire research process from initial collaboration to downstream metrics and data publishing.
Next generation data services at the Marriott Library, by Rebekah Cummings
This document discusses next generation data services at the Marriott Library. It begins by asking how data needs in the social sciences and humanities may change over the next five years, and how libraries can partner with faculty on data needs. The document then discusses the library's role in data curation, challenges, and examples of data services like research data consultation, metadata assistance, and repository services. It provides examples of collaborations like embedded librarianship and a project with the UCLA Civil Rights Project to archive publications and datasets. The discussion emphasizes the changing landscape and growing importance of data sharing and management.
This presentation was provided by Daniella Lowenberg of the California Digital Library during the NISO Virtual Conference, Advancing Altmetrics, held on Wednesday, December 13, 2017.
Data sharing in the age of the Social Machine, by Ulrik Lyngs
Social machines generate and consume data. A Web Observatory is proposed as an infrastructure to support data sharing among social machines and the human elements involved in data generation, analysis, and consumption. Key challenges include designing such an infrastructure for generality while addressing ethical, legal, and technological issues around metadata standards, access control, and performance across different computing platforms and stakeholder needs.
This document summarizes a presentation on implementing digital provenance on the World Wide Web using semantic web technology. The presentation introduces digital provenance, discusses use cases, reviews the current state of the art, and considers tool development approaches. Digital provenance tracks the origin and changes made to data over time, enabling trust decisions. The W3C is working to define standards to represent and exchange provenance information. Semantic web ontologies are well-suited to capture complex provenance metadata and link it to related data. Open source, standards-compliant tools are needed to generate and manage provenance information.
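The core idea of provenance tracking (recording where each artifact came from and walking that chain when making trust decisions) can be sketched with plain derivation triples. The entity names below are invented; a real system would use the W3C PROV vocabulary and RDF tooling rather than tuples.

```python
# Minimal PROV-style derivation tracking: store (entity, predicate, source)
# triples and walk the chain back to the original artifact.
def lineage(triples, entity):
    """Follow wasDerivedFrom links from an entity back to its origin."""
    derived_from = {s: o for s, p, o in triples if p == "wasDerivedFrom"}
    chain = [entity]
    while chain[-1] in derived_from:
        chain.append(derived_from[chain[-1]])
    return chain

triples = [
    ("chart.png", "wasDerivedFrom", "clean.csv"),
    ("clean.csv", "wasDerivedFrom", "raw.csv"),
]
print(lineage(triples, "chart.png"))   # -> ['chart.png', 'clean.csv', 'raw.csv']
```

Expressing the same triples in an ontology like PROV-O is what lets provenance from different tools be merged and queried together, which is the interoperability argument the presentation makes.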
Internet Archives and Social Science Research - Yeungnam University (mwe400)
The document discusses using large datasets from the Internet Archive to conduct social science research on emerging organizational forms. It presents examples of previous research leveraging archive data on topics like natural disasters, political activity, and social movements. The author proposes analyzing hyperlink, news coverage, Twitter, and website data on the Occupy Wall Street movement to test hypotheses about its emerging networked structure over time. Results are presented showing the growth of the movement's online presence and core clusters within its organizational network.
The document discusses data, data science, and finding data sources. It defines data as raw facts about the world and notes that data comes from various sources like government, scientific research, citizens, and private companies. It then discusses the growth of digital data and issues around open data. The document defines data science as using analysis methods to describe facts, detect patterns, and test hypotheses. Finally, it provides tips on finding needed data, such as searching open data sources, APIs, scraping, and joining datasets.
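To make the "searching open data sources and APIs" tip concrete: Data.gov's catalog runs on CKAN, whose action API exposes a `package_search` endpoint for free-text dataset queries. The sketch below is a minimal, illustrative client, not a full implementation; the endpoint is real, but the helper names and the choice of parameters are my own.

```python
# Minimal sketch of querying an open data catalog via its CKAN action API.
import json
import urllib.parse
import urllib.request

CKAN_BASE = "https://catalog.data.gov/api/3/action"

def build_search_url(query, rows=5):
    """Build a CKAN package_search URL for a free-text dataset query."""
    params = urllib.parse.urlencode({"q": query, "rows": rows})
    return f"{CKAN_BASE}/package_search?{params}"

def search_datasets(query, rows=5):
    """Fetch matching datasets; returns a list of (title, publisher) pairs."""
    with urllib.request.urlopen(build_search_url(query, rows)) as resp:
        payload = json.load(resp)
    return [(d["title"], d.get("organization", {}).get("title", ""))
            for d in payload["result"]["results"]]
```

A call like `search_datasets("air quality")` would return the titles and publishing agencies of the first few matching datasets, which can then be joined with other sources as the document suggests.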
Wire Workshop: Overview slides for ArchiveHub Project (mwe400)
The document discusses using large datasets from the Internet Archive to conduct research. It outlines an agenda with three parts: large scale data, developing new tools, and testing and building theory. The Internet Archive contains over 10 petabytes of cultural data, including 410 billion archived web pages. The ArchiveHub project aims to create tools and guidelines for longitudinal research on archived web data. Examples of potential research topics are discussed, such as studying social movements using link and text data from websites about Occupy Wall Street. Challenges discussed include accessing and preparing the large datasets for research purposes and connecting the data to theoretical frameworks.
How Data is Transforming Health and Society (Jeanne Holm)
Open data refers to data that can be freely used, shared, and modified by anyone for any purpose. Many sectors benefit from open data, including healthcare, financial services, government, and non-profits. Open data is estimated to unlock roughly $3 trillion in economic value annually by powering innovation. The open data movement aims to provide transparency and allow people to make better decisions by democratizing access to data.
Data for Good: How Data is Transforming Business and Society (Jeanne Holm)
From high tech to rural Uganda, the data that companies and governments share is being used around the world by all kinds of people to make life better.
The document discusses open satellite data and geographic information systems (GIS). It notes that satellites gather high-fidelity, near-real-time data about locations and conditions on Earth, covering the oceans, the atmosphere, and socioeconomic factors. Examples are given of how such data has been used to analyze global climate change and earthquake activity, advance earth and ocean science, provide early warnings for disasters, and underpin a multi-billion-dollar weather and GPS industry. The document advocates for open data and citizen science, including crowdsourcing to solve food-security challenges and to correct data. It asks what GIS data is needed in Uganda and suggests ways to explore and use such open data.
Data.gov provides access to over 400,000 datasets from 180 US agencies and organizations in easy-to-use formats. It encourages developers to build innovative applications using open data and drives knowledge sharing and innovation globally and across 18 topic communities. Data.gov is migrating its platform to the open-source Open Government Platform to improve search, enable application statistics tracking, and allow federated searching of agency and local-government catalogs. The presentation calls for continued collaboration to securely share and link government data and to empower communities and businesses to use data to address global issues.
1) The document discusses how open data and interoperability can drive innovation by empowering people and communities through access to government data.
2) Key points include how open data can help agencies meet regulatory needs, communicate with citizens, and spur new economic development and innovation.
3) An open data ecosystem is created by gathering and connecting data, infrastructure, developers, and communities to empower choices and change behavior.
Using Data.gov Communities to Drive Innovation and Collaboration aims to foster communities on Data.gov around priority topics to connect innovators, industry, academia, and government. Communities are public spaces that present data from multiple organizations on a single topic. Examples include Health, Energy, Education, and Ocean communities. Agencies are encouraged to lead or contribute to communities, engage their networks, and sponsor challenges to drive innovation using government data.
Knowledge Sharing and Social Media at NASA (Jeanne Holm)
This represents work done while I was serving as the Chief Knowledge Architect at NASA's Jet Propulsion Laboratory using social media to encourage collaboration inside and outside the agency
The document discusses how open data and knowledge sharing can drive innovation. It provides examples of how government data from sources like NASA, NOAA, and Health and Human Services have been used by developers to create applications that improve lives. Open data initiatives like Data.gov and Health.Data.gov aim to gather data, connect communities of developers and experts, and encourage the creation of technologies and visualizations that empower citizens. The ultimate goal is to fuel innovation and economic opportunities through making vast amounts of government data openly available.
The document discusses the goals and progress of Data.gov, a US government platform that provides access to government data. It aims to 1) gather data from agencies and make it openly available, 2) connect developers, scientists and citizens to find solutions, 3) provide infrastructure based on standards, and 4) encourage apps and visualizations using the data. Since 2009, Data.gov has grown from 47 to over 400,000 datasets and driven the creation of hundreds of applications and visualizations that have improved lives. The document outlines plans to further open data internationally and drive innovation.
A presentation on knowledge sharing, innovation, and open government data presented to the University of Adelaide MBA program during Dr. David Pender's class
3. Data, Data Everywhere
Smart phones
Smart cars
Smart people
Sensors
RFID
Cameras…everywhere
June 11, 2014 3
4. Move from Structured to Unstructured
Heterogeneous sources of data
Structured (tables, transactions): fixed schema
Semi-structured (human-readable formats such as XML, JSON): partial, self-describing structure
Unstructured (images, audio, video): no defined schema or relationships
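The structured/semi-structured/unstructured distinction above can be made concrete with a small invented example: the same fact held as a fixed-schema row and as a self-describing JSON document. Unstructured data (an image, an audio clip) would carry no field structure at all, so it has no counterpart here.

```python
# Sketch: the same fact in two of the three shapes named above.
import json

# Structured: position in the tuple *is* the schema (id, city, pm25).
row = (101, "Los Angeles", 12.4)

# Semi-structured: self-describing; fields may nest, vary, or be omitted.
doc = json.loads('{"id": 101, "city": "Los Angeles", "readings": {"pm25": 12.4}}')

# Flattening semi-structured data into a structured row means choosing a schema:
flat = (doc["id"], doc["city"], doc["readings"]["pm25"])
assert flat == row  # same fact, different shapes
```

The flattening step is where most of the work in integrating heterogeneous sources happens: someone has to decide which fields matter and where they live.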
5. Recent Data Growth: Web 2.0
Social media: Facebook, Twitter, Skype
The Internet of Things: many sources, varied formats, relatively timely
Web content: many authors, unstructured, highly variable trust and provenance
Gaming: highly specific, huge transactional data, real-time, high-bandwidth usage
6. Recent Data Growth: Web 2.0
Open data: government and industry, structured and unstructured, accessible
Private data: apps, health data, credit card and financial data
The Web: browsers, search engines, web site metrics
7. Creating Order from the Chaos
Open vs. closed
Multiple formats
Unstructured
Trusted vs. unvalidated
8. Releasing and using open data is about empowering people to make better decisions
10. Project Open Data: Open Source Policy
Open-source government policy, technical guidance, and software
Citizen contributions to policy, code, and content
http://project-open-data.github.io/
12. Creating the Open Data Community
Open Data is an Ecosystem
13. Open Exchanges with Citizens
Questions and answers at the new Open Data Stack Exchange: http://opendata.stackexchange.com/
Data jams and datapaloozas at the White House
16. Open Exchange with Developers
Created a new Open Data Stack Exchange to field questions to the global community: http://opendata.stackexchange.com/
17. Citizen Participation: Redesigning Data.gov
For the redesign, provided multiple channels for citizens to say what they wanted:
Formal usability testing (3 rounds)
Blogs
Next.Data.gov
Quora
Twitter @usdatagov
Open Data Stack Exchange
Multiple social media platforms
All the comments collected in one place: GitHub
Issues tracked at https://github.com/GSA/data.gov/issues?labels=&milestone=&page=1&state=open
18. Usability Testing
Created and vetted a usability test focused on what actions people completed on the site and what they expected to find
Face-to-face testing in Washington, D.C.
Virtual testing via Skype and phone
Online testing using Loop11
Reached out to key users
Data journalists
Researchers
Developers
Entrepreneurs
Data scientists
Businesses
Students and teachers
Advocacy groups
People who had complained about Data.gov
19. Evaluate the Feedback
All issues were copied, connected, or added to GitHub from every public communication channel
Issues were assigned to a person and a build
Discussion was encouraged on each issue, and people were invited to the conversation
21. U.S. Open Data for Cities, Counties, and States
22. Linked Data and the Semantic Web
Join the W3C eGovernment Interest Group: www.w3.org/egov
23. Open Communities
Community
Developers ✓
Safety ✓
Energy ✓
Health ✓
Law ✓
Education ✓
Ocean ✓
Manufacturing ✓
Business ✓
Ethics ✓
States ✓
Counties ✓
Cities ✓
Agriculture ✓
+ many more…
25. Open Government Platform (OGPL)
Email, GitHub, Facebook, and Twitter for discussion
https://github.com/opengovtplatform
http://www.opengovplatform.org
36. International Space Apps
Open annual event, held around the world and in space
95 physical locations in 46 countries participated; 8,195 participants; 671 projects
Where on Earth
ExoMars Rover is My Robot
Asteroid Prospector
Space Wearables
Growing Food for a Martian Table
41. Organizing and Understanding the Data
Web searching, mining, and crawling
Algorithms
Visualizations
Text mining
Clustering
Semantic analysis
Linked data
Machine learning
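As a toy illustration of the text mining and clustering techniques listed above, the sketch below compares short documents by cosine similarity of word-count vectors. A real pipeline would add stemming, TF-IDF weighting, and a proper clustering library; the documents here are invented.

```python
# Sketch: grouping texts by cosine similarity of word-count vectors.
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words vector: word -> count."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

docs = [
    "open data portal for city budgets",
    "city budgets published as open data",
    "satellite imagery of ocean temperature",
]
v = [vectorize(d) for d in docs]

# The two budget documents are far more similar to each other
# than either is to the satellite document, which shares no words.
assert cosine(v[0], v[1]) > cosine(v[0], v[2])
```

Clustering algorithms such as k-means build directly on a similarity (or distance) measure like this one to group related documents at scale.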
42. New Role: Data Scientist
Combines technical and business skills
Looks at complex data problems with subject-matter expertise
Applies technologies to mine, analyze, and visualize the data
Understands statistics and math, coding and algorithms
Can explain the significance of the data to others
Leader of the data scientists: the Chief Data Officer
44. Open Data Matters
Connect citizens to open data to transform their world and empower them through education
Connect developers to open data to create new ways of using the data to inform others
Connect businesses to open data to provide new services and products for everyone to use
Connect data scientists to open data to analyze the past and predict the future
Encourage governments to release more open data
45. Helping to improve the lives of people in our community