Google Public Data Explorer

  • A DSPL dataset is a bundle that contains an XML file and a set of CSV files. The CSV files are simple tables containing the data of the dataset. The XML file describes the metadata of the dataset, including informational metadata such as descriptions of measures, and structural metadata such as references between tables. The metadata lets non-expert users explore and visualize your data. The only prerequisite for understanding this tutorial is a good working knowledge of XML. Some understanding of simple database concepts (e.g., tables, primary keys) may help, but it's not required. For reference, the completed XML file and complete dataset bundle associated with this tutorial are also available for review.
  • The <info> element contains general information about the dataset: its name, a description, and a URL where more information can be found. The <provider> element contains information about the provider of the dataset: its name and a URL where more information can be found (generally the data provider's home page).
  • Now that we have provided some general information about the dataset, we're ready to start defining its contents. A concept is a definition of a type of data that appears in a dataset. The data values that correspond to a given concept are called instances of that concept. Concepts that are categorical, such as state, are associated with concept tables, which enumerate all their possible values (California, Arizona, etc.). Concept tables may have additional columns for properties such as the name or the country of a state. Every concept must provide an id that uniquely identifies the concept within the dataset. Just as for the dataset and its provider, the <info> element provides textual information about the concept, such as its name and description. The <type> element specifies the data type for the instances of the concept (in other words, its "values"). Finally, the school concept has a <table> element, which references a table that enumerates all schools. The schools table specifies the columns of the table and their types, and references a CSV file that contains the data.
  • Slices define each combination of concepts for which there is statistical data in the dataset. A slice contains dimensions and metrics: the values of metrics vary with the values of dimensions. In the above picture, the dimensions are blue and the metrics are orange. In this example, the slice gender_country_slice has data for the metric population and the dimensions country, year and gender. Another slice, called country_slice, gives total yearly population numbers (metric) for countries. Just like concepts, slices include a reference to a table that contains the data of the slice. The referenced table must have one column for each dimension and metric of the slice, and the slice's dimensions and metrics are mapped to the table columns with the same ids.
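Since a DSPL dataset is one XML metadata file plus the CSV files it points to, a quick way to check that a bundle is complete is to list every CSV file the XML references. The following is a minimal Python sketch using only the standard library. The function name referenced_csv_files is hypothetical, and the enclosing <dspl> and <tables> elements and the namespace URI in the sample are assumptions for illustration; the slides only show fragments.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def referenced_csv_files(dspl_xml: str) -> list[str]:
    """Return the names of CSV files referenced by <file> elements, in document order."""
    root = ET.fromstring(dspl_xml)
    # "{*}file" matches <file> in any namespace or none (supported since Python 3.8).
    return [
        f.text.strip()
        for f in root.findall(".//{*}file")
        if f.get("format") == "csv" and f.text
    ]


# Hypothetical skeleton: only the <table> fragments come from the slides.
EXAMPLE = """<dspl xmlns="http://schemas.google.com/dspl/2010">
  <tables>
    <table id="schools_table">
      <data><file format="csv" encoding="utf-8">schools.csv</file></data>
    </table>
    <table id="enrolment_slice_table">
      <data><file format="csv" encoding="utf-8">school_enrolment_slice.csv</file></data>
    </table>
  </tables>
</dspl>"""

print(referenced_csv_files(EXAMPLE))
```

Comparing this list with the files actually present in the bundle catches a missing or misnamed CSV before upload.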

Google Public Data Explorer Presentation Transcript

  • Google Public Data Explorer
    Aftab Iqbal
    Digital Enterprise Research Institute, www.deri.ie
    Stefan.Decker@deri.org, http://www.StefanDecker.org/
    Copyright 2010 Digital Enterprise Research Institute. All rights reserved.
  • Introduction
    DSPL consists of:
    - an XML file (the metadata)
    - CSV files (the data)
  • DSPL Dataset
    - General information: about the dataset
    - Concepts: definitions of "things" that appear in the dataset (e.g., counties, unemployment rate, gender)
    - Slices: combinations of concepts for which there are data
    - Tables: data for concepts and slices; concept tables hold enumerations and slice tables hold statistical data
    - Topics: organize the concepts of the dataset in a meaningful hierarchy through labeling
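Concepts and slices do not hold data themselves; they point at tables with <table ref="..."/>, while the tables are defined with <table id="...">. A small consistency check, sketched in Python under the assumption that the whole dataset is a single XML document, reports references that no table definition satisfies. The function name dangling_table_refs and the deliberately incomplete sample dataset are hypothetical.

```python
import xml.etree.ElementTree as ET


def dangling_table_refs(dspl_xml: str) -> list:
    """Return ids used in <table ref="..."> that no <table id="..."> defines."""
    root = ET.fromstring(dspl_xml)
    # Both table definitions and table references are <table> elements.
    tables = root.findall(".//{*}table")
    defined = {t.get("id") for t in tables if t.get("id")}
    referenced = {t.get("ref") for t in tables if t.get("ref")}
    return sorted(referenced - defined)


# Hypothetical dataset in which the slice table definition has been forgotten.
EXAMPLE = """<dspl>
  <concepts>
    <concept id="school"><table ref="schools_table"/></concept>
  </concepts>
  <slices>
    <slice id="enrolment_slice"><table ref="enrolment_slice_table"/></slice>
  </slices>
  <tables>
    <table id="schools_table"/>
  </tables>
</dspl>"""

print(dangling_table_refs(EXAMPLE))
```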
  • School Enrollment 2009-2010*
      School_Roll_No  Short_Name      Level    Male  Female
      00697S          ST BRIDGIDS NS  Primary   377     447
      01170G          NAUL NS         Primary    40      61
      09492W          BALSCADDEN NS   Primary    98     133
      …               …               …          …       …
    * Snapshot taken from http://data.fingal.ie/ViewDataSets/Details/default.aspx?datasetID=385
  • DSPL – Contd.
    General information about the dataset and its provider:
      <info>
        <name>
          <value>School</value>
        </name>
        <description>
          <value>Statistics about Fingal County Schools</value>
        </description>
        <url>
          <value></value>
        </url>
      </info>
      <provider>
        <name>
          <value>County Fingal School Enrollment Statistics</value>
        </name>
        <url>
          <value>http://data.fingal.ie/ViewDataSets/Details/default.aspx?datasetID=385</value>
        </url>
      </provider>
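Every human-readable string in this metadata sits inside a <value> child, so reading the dataset and provider details means walking info/name/value and similar paths. A minimal Python sketch, assuming <info> and <provider> are direct children of a <dspl> root element (the wrapper is not shown on the slide); the helper names _value and dataset_summary are hypothetical. The sample reuses the slide's values, including the empty dataset URL.

```python
import xml.etree.ElementTree as ET


def _value(parent, child):
    """Text of parent/child/value, or "" when the element is absent or empty."""
    # The doubled braces produce the literal "{*}" namespace wildcard.
    el = parent.find(f"{{*}}{child}/{{*}}value")
    return (el.text or "").strip() if el is not None else ""


def dataset_summary(dspl_xml: str) -> dict:
    """Collect the dataset and provider name/URL fields from the metadata."""
    root = ET.fromstring(dspl_xml)
    info = root.find("{*}info")
    provider = root.find("{*}provider")
    return {
        "name": _value(info, "name"),
        "description": _value(info, "description"),
        "url": _value(info, "url"),
        "provider_name": _value(provider, "name"),
        "provider_url": _value(provider, "url"),
    }


EXAMPLE = """<dspl>
  <info>
    <name><value>School</value></name>
    <description><value>Statistics about Fingal County Schools</value></description>
    <url><value></value></url>
  </info>
  <provider>
    <name><value>County Fingal School Enrollment Statistics</value></name>
    <url><value>http://data.fingal.ie/ViewDataSets/Details/default.aspx?datasetID=385</value></url>
  </provider>
</dspl>"""

summary = dataset_summary(EXAMPLE)
print(summary["name"], "/", summary["provider_name"])
```

An empty "url" in the result is a cheap signal that the dataset URL still needs filling in.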
  • DSPL – Contd.
    Concepts: the types of data that appear in a dataset.
      <concept id="school" extends="geo:location">
        <info>
          <name>
            <value>Schools</value>
          </name>
          <description>
            <value>List of schools for Co. Fingal</value>
          </description>
        </info>
        <type ref="string"/>
        <table ref="schools_table"/>
      </concept>

      <table id="schools_table">
        <column id="school" type="string"/>
        <column id="name" type="string"/>
        <column id="latitude" type="float"/>
        <column id="longitude" type="float"/>
        <data>
          <file format="csv" encoding="utf-8">schools.csv</file>
        </data>
      </table>

    Contents of schools.csv:
      school  name                                 latitude  longitude
      00697S  Saint Bridgids National School       53.37514  -6.36221
      01170G  S N Na H Aille Naul National School  53.57887  -6.28564
      09492W  Balscadden National School           53.61528  -6.23218
      09642P  Burrow National School               53.39129  -6.10028
      …       …                                    …         …
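A concept table enumerates the instances of a concept, one row per instance, with the concept's own column holding the instance id and the other columns holding properties. The Python sketch below loads such a table into a lookup keyed by instance id. It assumes schools.csv is comma-separated with a header row of column ids (the slide shows the data only as a rendered table), and the function name load_concept_table is hypothetical.

```python
import csv
import io

# First rows of schools.csv as shown on the slide; the header line is assumed.
SCHOOLS_CSV = """school,name,latitude,longitude
00697S,Saint Bridgids National School,53.37514,-6.36221
01170G,S N Na H Aille Naul National School,53.57887,-6.28564
09492W,Balscadden National School,53.61528,-6.23218
"""


def load_concept_table(csv_text, key_column, converters=None):
    """Map each concept instance id to a dict of its remaining column values."""
    converters = converters or {}
    instances = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = row.pop(key_column)
        if key in instances:
            # A concept table enumerates distinct values, so ids must be unique.
            raise ValueError(f"duplicate instance id {key!r}")
        # Convert typed columns (e.g. the float coordinates); everything else stays str.
        instances[key] = {col: converters.get(col, str)(val) for col, val in row.items()}
    return instances


schools = load_concept_table(SCHOOLS_CSV, "school", {"latitude": float, "longitude": float})
print(schools["00697S"]["name"])
```

The converters mirror the type="float" declarations in schools_table; a value that does not parse raises immediately instead of surfacing later as a bad map position.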
  • DSPL – Contd.
    Slices: a slice is a combination of concepts for which data exists. It contains two kinds of concept references: dimensions and metrics.
      <slice id="enrolment_slice">
        <dimension concept="school"/>
        <dimension concept="time:year"/>
        <metric concept="M"/>
        <metric concept="F"/>
        <table ref="enrolment_slice_table"/>
      </slice>

      <table id="enrolment_slice_table">
        <column id="school" type="string"/>
        <column id="M" type="integer"/>
        <column id="F" type="integer"/>
        <column id="year" type="date" format="yyyy"/>
        <data>
          <file format="csv" encoding="utf-8">school_enrolment_slice.csv</file>
        </data>
      </table>
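The slice table must have exactly one column per dimension and metric, matched by id. The Python sketch below compares the two sides and reports what is missing or extra. It assumes, as the time:year / year pair on this slide suggests, that a prefixed concept is matched on the part after the colon; the function name check_slice_columns is hypothetical.

```python
def check_slice_columns(dimensions, metrics, column_ids):
    """Compare a slice's concept references with its table's column ids.

    Returns (missing, extra): ids the table lacks, and columns no concept maps to.
    A prefixed concept such as "time:year" is matched on its local part ("year").
    """
    expected = {concept.split(":")[-1] for concept in list(dimensions) + list(metrics)}
    actual = set(column_ids)
    return sorted(expected - actual), sorted(actual - expected)


# Ids taken from enrolment_slice and enrolment_slice_table on this slide.
missing, extra = check_slice_columns(
    dimensions=["school", "time:year"],
    metrics=["M", "F"],
    column_ids=["school", "M", "F", "year"],
)
print(missing, extra)  # [] []
```

Renaming the metrics (for example to Male and Female) without renaming the table columns would show up here as both missing and extra ids.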
  • School Enrollment Slice
    Dimensions: School, Year. Metrics: Male, Female.
      School                          Male  Female  Year
      Saint Bridgids National School   377     447  2009
      Saint Bridgids National School   475     392  2010
      Balscadden National School        98     133  2009
      Balscadden National School       126     102  2010
      …                                  …       …     …
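Each row of a slice table is one combination of dimension values (school, year) with the metric values measured for it. To make that concrete, the Python sketch below sums the two metrics per (school, year) for the rows shown on the slide. This total is a client-side calculation for illustration, not something the DSPL file defines, and the function name total_enrolment is hypothetical.

```python
from collections import defaultdict

# Rows of the enrolment slice as shown on the slide: (school, male, female, year).
SLICE_ROWS = [
    ("Saint Bridgids National School", 377, 447, 2009),
    ("Saint Bridgids National School", 475, 392, 2010),
    ("Balscadden National School", 98, 133, 2009),
    ("Balscadden National School", 126, 102, 2010),
]


def total_enrolment(rows):
    """Sum the Male and Female metrics for each (school, year) dimension combination."""
    totals = defaultdict(int)
    for school, male, female, year in rows:
        totals[(school, year)] += male + female
    return dict(totals)


totals = total_enrolment(SLICE_ROWS)
print(totals[("Saint Bridgids National School", 2009)])  # 824
```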
  • DSPL – Contd.
    Topics classify concepts hierarchically and are used by applications to help users navigate to your data.
      <topic id="Male_indicators">
        <info>
          <name><value>Male Students Enrollment</value></name>
        </info>
      </topic>
      <topic id="Female_indicators">
        <info>
          <name><value>Female Students Enrollment</value></name>
        </info>
      </topic>
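The slides define the topics but do not show how concepts are attached to them. Assuming, as in Google's DSPL tutorial, that a concept lists its topics through <topic ref="..."/> child elements, the Python sketch below groups concept ids by topic and rejects references to undefined topics. The Male and Female concept definitions in the sample are hypothetical, as is the function name concepts_by_topic.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def concepts_by_topic(dspl_xml: str) -> dict:
    """Group concept ids by the topics they reference via <topic ref="..."/>."""
    root = ET.fromstring(dspl_xml)
    # Topic definitions carry an id; topic references inside concepts carry a ref.
    defined = {t.get("id") for t in root.findall(".//{*}topic") if t.get("id")}
    grouped = defaultdict(list)
    for concept in root.findall(".//{*}concept"):
        for ref in concept.findall("{*}topic"):
            topic_id = ref.get("ref")
            if topic_id not in defined:
                raise ValueError(f"concept {concept.get('id')!r} references unknown topic {topic_id!r}")
            grouped[topic_id].append(concept.get("id"))
    return dict(grouped)


# Topics from the slide; the two concept definitions are hypothetical.
EXAMPLE = """<dspl>
  <topics>
    <topic id="Male_indicators"><info><name><value>Male Students Enrollment</value></name></info></topic>
    <topic id="Female_indicators"><info><name><value>Female Students Enrollment</value></name></info></topic>
  </topics>
  <concepts>
    <concept id="Male"><topic ref="Male_indicators"/><type ref="integer"/></concept>
    <concept id="Female"><topic ref="Female_indicators"/><type ref="integer"/></concept>
  </concepts>
</dspl>"""

print(concepts_by_topic(EXAMPLE))
```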
  • Data Cleansing
    School Enrollment 2009:
      School_Roll_No  Short_Name      Level    Male  Female
      00697S          ST BRIDGIDS NS  Primary   377     447
      01170G          NAUL NS         Primary    40      61
      …               …               …          …       …
    School Enrollment 2010:
      School_Roll_No  Short_Name      Level    Male  Female
      00697S          ST BRIDGIDS NS  Primary   475     392
      01170G          NAUL NS         Primary    58      40
      …               …               …          …       …
    School_Enrollment_Slice.csv:
      School  Male  Female  Year
      00697S   377     447  2009
      00697S   475     392  2010
      01170G    40      61  2009
      01170G    58      40  2010
      …          …       …     …
    Schools.csv:
      School  Name                                 Latitude  Longitude
      00697S  Saint Bridgids National School       53.37514  -6.36221
      01170G  S N Na H Aille Naul National School  53.57887  -6.28564
      …       …                                    …         …
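The cleansing step turns two per-year tables into one slice table, with the year moved out of the table name and into its own column. A minimal Python sketch of that reshaping with the standard csv module, assuming the source tables are available as comma-separated text with the headers shown; the function name to_slice_csv is hypothetical.

```python
import csv
import io

# The two yearly source tables from the slide, as CSV text (first rows only).
ENROLMENT_2009 = """School_Roll_No,Short_Name,Level,Male,Female
00697S,ST BRIDGIDS NS,Primary,377,447
01170G,NAUL NS,Primary,40,61
"""
ENROLMENT_2010 = """School_Roll_No,Short_Name,Level,Male,Female
00697S,ST BRIDGIDS NS,Primary,475,392
01170G,NAUL NS,Primary,58,40
"""


def to_slice_csv(tables_by_year):
    """Stack per-year tables into one slice table with a Year column.

    Short_Name and Level are dropped, matching the slide's School_Enrollment_Slice.csv.
    """
    rows = []
    for year, text in tables_by_year.items():
        for row in csv.DictReader(io.StringIO(text)):
            rows.append((row["School_Roll_No"], int(row["Male"]), int(row["Female"]), year))
    rows.sort(key=lambda r: (r[0], r[3]))  # keep each school's years next to each other
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["School", "Male", "Female", "Year"])
    writer.writerows(rows)
    return out.getvalue()


slice_csv = to_slice_csv({2009: ENROLMENT_2009, 2010: ENROLMENT_2010})
print(slice_csv)
```

Parsing Male and Female with int() also rejects non-numeric cells, which would otherwise clash with the type="integer" columns in the slice table.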
  • Slice definition for the cleansed data:
      <slice id="enrolment_slice">
        <dimension concept="school"/>
        <dimension concept="time:year"/>
        <metric concept="Male"/>
        <metric concept="Female"/>
        <table ref="enrolment_slice_table"/>
      </slice>

      <table id="enrolment_slice_table">
        <column id="school" type="string"/>
        <column id="Male" type="integer"/>
        <column id="Female" type="integer"/>
        <column id="year" type="date" format="yyyy"/>
        <data>
          <file format="csv" encoding="utf-8">School_Enrollment_Slice.csv</file>
        </data>
      </table>
    Deployment: the CSV files and the XML metadata are compressed into a single bundle.
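For deployment, the metadata file and the CSV files travel together as one compressed bundle. A minimal Python sketch that builds such a zip archive in memory with the standard zipfile module; the function name build_bundle, the XML file name school_enrollment.xml, and the placeholder file contents are hypothetical.

```python
import io
import zipfile


def build_bundle(files: dict) -> bytes:
    """Compress the XML metadata file and its CSV files into one zip archive in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as bundle:
        for name, content in files.items():
            bundle.writestr(name, content)  # str content is stored UTF-8 encoded
    return buffer.getvalue()


# Placeholder contents; "school_enrollment.xml" is a hypothetical file name.
bundle_bytes = build_bundle({
    "school_enrollment.xml": "<dspl>...</dspl>",
    "Schools.csv": "School,Name,Latitude,Longitude\n",
    "School_Enrollment_Slice.csv": "School,Male,Female,Year\n",
})

# Re-open the archive to confirm what it contains.
with zipfile.ZipFile(io.BytesIO(bundle_bytes)) as bundle:
    print(sorted(bundle.namelist()))
```

Writing the archive to a file instead only requires passing a path to ZipFile in place of the BytesIO buffer.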