This document discusses MongoDB and its use by Sunlight Foundation for three open data projects: the National Data Catalog, Real-Time Congress API, and Open State Project. MongoDB allows storing disparate data sources and formats in a schemaless manner. This enables aggregating large amounts of raw data and serving it through lightweight RESTful APIs. The document provides examples of congressional data stored and filtered in MongoDB.
The trajectory schema.org has taken, starting with a history that is less a retrospective than a narrative. I'll follow this narrative to the fortunately-timed emergence of JSON-LD, providing as it does a flexible, standards-based serialization of the vocabulary.
This, I'll explain, helped fuel the popularity of schema.org, which in turn has caused a demand for more schemas, growing the vocabulary and its capabilities. I'll make the case that schema.org has started to resemble exactly what everyone involved in the initiative declared it shouldn't be: an ontology of everything.
Whether or not that be the case, I'll say, the utility of having a relatively simple, well thought-out, well-understood and very broad vocabulary available has made schema.org (along with JSON-LD) a go-to tool for linked data modelers.
Finally, and with a look at the many ways Google, in particular, has made use of schema.org, I'll explore to what extent its utility extends past being a convenient starting point for back-of-the-napkin knowledge graph development, or whether it's making a significant contribution to realizing the promise of a web of data.
GraphQL and its schema as a universal layer for database access
Connected Data World
GraphQL is a query language mostly used to streamline access to REST APIs. It is seeing tremendous growth and adoption, in organizations like Airbnb, Coursera, Docker, GitHub, Twitter, Uber, and Facebook, where it was invented.
As REST APIs are proliferating, the promise of accessing them all through a single query language and hub, which is what GraphQL and GraphQL server implementations bring, is alluring.
A significant recent addition to GraphQL was SDL, its schema definition language. SDL enables developers to define a schema governing interaction with the back-end that GraphQL servers can then implement and enforce.
Prisma is a productized version of the data layer leveraging GraphQL to access any database. Prisma works with MySQL, Postgres, and MongoDB, and is adding to this list.
Prisma sees the GraphQL community really coming together around the idea of schema-first development, and wants to use GraphQL SDL as the foundation for all interfaces between systems.
Reach Force Marketing Automation Mini Conference - 6/18/2013
Steve Susina
Studies show that there is an inverse relationship between the number of fields on a registration form and the number of conversions. A famous example provided by Marketo shows that reducing a form from 9 fields to 5 fields increased the form completion rate by 45%.
Theoretically, if we have enough information, we can eliminate the registration form altogether. The data already exists in marketing automation (Name, Company, Job Title, Phone, etc.). The presentation describes results using formless registration.
I was asked to do a short presentation to the Flick team on the stuff I saw at ETech 2009. I don’t normally take notes at conferences, and I was only there for two days, so this was the best I could do on short notice. There is only one image.
Use of Open Data in Hong Kong (LegCo 2014)
Sammy Fung
Presentation on use of open data in HK given to Legislative Council Secretariat. Content is mixed from my presentations at startmeup 2013 and opendatahk meetup.
In this webinar hosts Kristina Lisacki, Aaron Sawitsky and Marc Tollin walk you through all the ways that you can benefit from accessing the raw data collected by Localytics.
In this webinar you'll learn:
What is raw data and how it differs from the data seen in the Localytics Dashboard.
How raw data can help you obtain a true omni-channel view of your customers.
The various ways that you can access your raw data, along with the benefits and drawbacks to each approach.
How Localytics Direct Access provides a new option for accessing your raw data (including a demo of the product).
For the 28th Civic User Testing Group (CUTGroup) test, Smart Chicago Collaborative tested the redesigned homepage of the City of Chicago’s Open Data Portal. The Open Data Portal allows users to find resources and various datasets regarding the city of Chicago. The City of Chicago Department of Innovation and Technology is working with Socrata to redesign the Open Data Portal, focused currently on the homepage, to be more user-friendly while representing multiple data and technology initiatives and applications created with open data.
Dissecting and Mitigating the Privacy Risk of Personal Cloud Apps (at PETS 2016)
Hamza Harkous
Presentation at the 16th Privacy Enhancing Technologies Symposium (PETS 2016)
The paper is available here: https://infoscience.epfl.ch/record/218003
Video slides available here: https://www.youtube.com/watch?v=w3u3lkbGaXg
You can find more about my work on my personal website: http://hamzaharkous.com
Social Media Data Collection & Analysis
Scott Sanders
A non-technical primer on how to collect and analyze social media data. This was an invited lecture by Biostatistics and Bioinformatics Department in the School of Public Health at the University of Louisville.
Google Developer Days Munich 2008 - Open Social Update
Patrick Chanezon
Updates about the OpenSocial ecosystem at Google Developer Days Munich, including presentations from Xing, Lokalisten, Netlog and Viadeo.
OpenSocial is an open specification defining a common API that works on many different social websites, including MySpace, Plaxo, Hi5, Ning, orkut, Friendster, Salesforce.com and LinkedIn, among others. This allows developers to learn one API, then write a social application for any of those sites: Learn once, write anywhere.
In addition, in order to make it easier for developers of social sites to implement the API and make their site an OpenSocial container, the Apache project Shindig provides reference implementations for OpenSocial containers in two languages (Java, PHP). Shindig will define a language specific Service Provider Interface (SPI) that a social site can implement to connect Shindig to People, Persistence and Activities backend services for the social site. Shindig will then expose these services as OpenSocial JavaScript and REST APIs.
In this session we will explain what OpenSocial is, show examples of OpenSocial containers and applications, demonstrate how to create an OpenSocial application, and explain how to leverage Apache Shindig in order to implement an OpenSocial container.
Google Developer Days London 2008 - Open Social Update
Patrick Chanezon
Updates about the OpenSocial ecosystem at Google Developer Days London, including presentations from Netlog and Viadeo.
Amundsen: From discovering to security data
markgrover
Hear about how Lyft and Square are solving data discovery and data security challenges using a shared open source project - Amundsen.
Talk details and abstract:
https://www.datacouncil.ai/talks/amundsen-from-discovering-data-to-securing-data
8. Question? @LuigiMontanez
Opening Up Data
✴ Storing data from disparate sources
✴ Data dumps
✴ Web scraping
✴ Text/PDF parsing
✴ Serving RESTful JSON APIs
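The last bullet, serving stored documents through a lightweight JSON API with filtering, can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: documents are plain Python dicts held in memory, the helper names `build_filter` and `filter_documents` are hypothetical, the second sample record is made up, and only exact-match filters are handled. A real service would presumably hand an equivalent filter to MongoDB instead of scanning a list.

```python
import json
from urllib.parse import parse_qs


def build_filter(query_string):
    """Turn 'field=value&other=value' into an exact-match filter dict.

    If a field is repeated, only its first value is used.
    """
    parsed = parse_qs(query_string)
    return {field: values[0] for field, values in parsed.items()}


def filter_documents(documents, criteria):
    """Return the documents whose fields equal every value in criteria."""
    return [
        doc for doc in documents
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


# Sample records; the second one is invented purely for illustration.
DATASETS = [
    {"title": "Worldwide M1+ Earthquakes, Past Hour", "original_catalog": "data.gov"},
    {"title": "Hypothetical Example Dataset", "original_catalog": "example-catalog"},
]

if __name__ == "__main__":
    criteria = build_filter("original_catalog=data.gov")
    # Serialize the matches the way a JSON endpoint might return them.
    print(json.dumps(filter_documents(DATASETS, criteria), indent=2))
```

Because the documents carry no fixed schema, any field present in a record can be used as a filter without changing the code.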
16. {
  "title": "Worldwide M1+ Earthquakes, Past Hour",
  "description": "Real-time, worldwide earthquake list for the past h
  "homepage": "http://data.gov/raw/32",
  "official_docs": "http://earthquake.usgs.gov/eqcenter/catalogs/",
  "organization": "Department of the Interior",
  "original_catalog": "data.gov"
}
17. {
  "title": "Worldwide M1+ Earthquakes, Past Hour",
  "description": "Real-time, worldwide earthquake list for the past
  "homepage": "http://data.gov/raw/32",
  "official_docs": "http://earthquake.usgs.gov/eqcenter/catalogs/",
  "organization_id": "4cbcc0ff2c34576ba4000001",
  "catalog_id": "4cbcc0ab2d34d76b97020433"
}
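When a document stores only `organization_id` and `catalog_id`, displaying it requires extra lookups to turn those IDs back into names. The sketch below illustrates that cost under simplifying assumptions: the two lookup tables are in-memory dicts standing in for separate collections, and `resolve_references` is a hypothetical helper, not part of any library.

```python
# In-memory stand-ins for separate organization and catalog collections.
ORGANIZATIONS = {
    "4cbcc0ff2c34576ba4000001": {"name": "Department of the Interior"},
}
CATALOGS = {
    "4cbcc0ab2d34d76b97020433": {"name": "data.gov"},
}

SOURCE = {
    "title": "Worldwide M1+ Earthquakes, Past Hour",
    "homepage": "http://data.gov/raw/32",
    "organization_id": "4cbcc0ff2c34576ba4000001",
    "catalog_id": "4cbcc0ab2d34d76b97020433",
}


def resolve_references(source):
    """Return a copy of source with ID fields replaced by display names.

    Each replaced ID costs one additional lookup per document.
    """
    resolved = dict(source)
    resolved["organization"] = ORGANIZATIONS[resolved.pop("organization_id")]["name"]
    resolved["original_catalog"] = CATALOGS[resolved.pop("catalog_id")]["name"]
    return resolved


if __name__ == "__main__":
    print(resolve_references(SOURCE))
```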
18. {
  "title": "Worldwide M1+ Earthquakes, Past Hour",
  "description": "Real-time, worldwide earthquake list for the past h
  "homepage": "http://data.gov/raw/32",
  "official_docs": "http://earthquake.usgs.gov/eqcenter/catalogs/",
  "organization": {
    "name": "Department of the Interior",
    "id": "4cbcc0ff2c34576ba4000001",
    "slug": "us-dept-of-interior"
  },
  "original_catalog": {
    "name": "data.gov",
    "id": "4cbcc0ab2d34d76b97020433",
    "slug": "datagov"
  }
}
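Embedding a small summary (`name`, `id`, `slug`) inside each source document lets it be rendered without a second lookup, at the price of keeping the copies in sync. A rough sketch of both sides of that trade-off follows; the helper names and the extra `url` field on the full organization record are assumptions made for illustration.

```python
SUMMARY_FIELDS = ("name", "id", "slug")


def embed_summary(record):
    """Copy only the fields needed for display into an embeddable summary."""
    return {field: record[field] for field in SUMMARY_FIELDS}


def rename_organization(sources, organization_id, new_name):
    """Update the embedded name in every source that references the organization.

    Returns the number of source documents touched.
    """
    touched = 0
    for source in sources:
        if source["organization"]["id"] == organization_id:
            source["organization"]["name"] = new_name
            touched += 1
    return touched


# Full organization record; the "url" field is hypothetical.
ORGANIZATION = {
    "name": "Department of the Interior",
    "id": "4cbcc0ff2c34576ba4000001",
    "slug": "us-dept-of-interior",
    "url": "http://example.org/interior",
}

if __name__ == "__main__":
    source = {
        "title": "Worldwide M1+ Earthquakes, Past Hour",
        "organization": embed_summary(ORGANIZATION),
    }
    print(source)
```

Reads become a single document fetch, while a change to an organization's name has to be propagated to every document that embeds it.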
19.
20. {
  "title": "Worldwide M1+ Earthquakes, Past Hour",
  "description": "Real-time, worldwide earthquake list for the past h
  "homepage": "http://data.gov/raw/32",
  "official_docs": "http://earthquake.usgs.gov/eqcenter/catalogs/",
  "organization": {
    "name": "Department of the Interior",
    "id": "4cbcc0ff2c34576ba4000001",
    "slug": "us-dept-of-interior"
  },
  "original_catalog": {
    "name": "data.gov",
    "id": "4cbcc0ab2d34d76b97020433",
    "slug": "datagov"
  },
  "downloads": [ { "type": "csv", "url": "http://data.gov/download/32
  "ratings" : {
    "average_rating": 3.5,
    "rating_count": 23
  },
  "comments": []
}
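Storing `average_rating` together with `rating_count` makes it possible to fold in a new rating without re-reading every individual rating. The following is one way that update could look; `add_rating` is a hypothetical helper, and the slides do not say how the summary is actually maintained.

```python
def add_rating(ratings, new_rating):
    """Return an updated ratings summary that includes new_rating.

    The running total is recovered as average * count, so no list of
    individual ratings is needed.
    """
    count = ratings["rating_count"]
    total = ratings["average_rating"] * count + new_rating
    new_count = count + 1
    return {"average_rating": total / new_count, "rating_count": new_count}


if __name__ == "__main__":
    current = {"average_rating": 3.5, "rating_count": 23}
    print(add_rating(current, 5))
```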
21.
22. User-centric data?
✴ Source document: contains collection of
user data
✴ User document: contains collection of
source data
✴ UserSource document
✴ Rating, Favorite, Note docs
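As one illustration of the third option, a UserSource document could hold everything a single user has attached to a single source. The field names below are assumptions, not taken from the slides, and `summarize_ratings` is a hypothetical helper showing how a per-source `ratings` summary might be derived from such documents.

```python
def make_user_source(user_id, source_id, rating=None, favorite=False, note=None):
    """Build one document per (user, source) pair; all field names are illustrative."""
    return {
        "user_id": user_id,
        "source_id": source_id,
        "rating": rating,
        "favorite": favorite,
        "note": note,
    }


def summarize_ratings(user_sources, source_id):
    """Aggregate the ratings given to one source across all users."""
    ratings = [
        doc["rating"]
        for doc in user_sources
        if doc["source_id"] == source_id and doc["rating"] is not None
    ]
    if not ratings:
        return {"average_rating": None, "rating_count": 0}
    return {
        "average_rating": sum(ratings) / len(ratings),
        "rating_count": len(ratings),
    }


if __name__ == "__main__":
    docs = [
        make_user_source("user-1", "source-32", rating=3),
        make_user_source("user-2", "source-32", rating=4, favorite=True),
        make_user_source("user-3", "source-32", note="check the CSV download"),
    ]
    print(summarize_ratings(docs, "source-32"))
```

This keeps both the source and user documents small, but listing a user's favorites or a source's ratings then requires querying the UserSource collection.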