Exploring Open Data with BigQuery
Jenny Tong
Developer Advocate
Google Cloud Platform
@MimmingCodes
Agenda
● Origin story
● Count stuff
● How it works
● Some cool open data
● Do something useful
Google Research Publications
● Bigtable
● Flume
● Dremel
Managed Cloud Versions
● Bigtable
● Dataflow
● BigQuery
Google BigQuery
Let's count some stuff
SELECT count(word)
FROM [publicdata:samples.shakespeare]
Words in Shakespeare
SELECT sum(requests) as total
FROM [fh-bigquery:wikipedia.pagecounts_20150511_05]
Wikipedia hits over 1 hour
SELECT sum(requests) as total
FROM [fh-bigquery:wikipedia.pagecounts_201505]
Wikipedia hits over 1 month
Several years of Wikipedia data
SELECT sum(requests) as total
FROM
[fh-bigquery:wikipedia.pagecounts_201105],
[fh-bigquery:wikipedia.pagecounts_201106],
[fh-bigquery:wikipedia.pagecounts_201107],
...
SELECT
SUM(requests) AS total
FROM
TABLE_QUERY(
[fh-bigquery:wikipedia],
'REGEXP_MATCH(
table_id,
r"pagecounts_2015[0-9]{2}$")')
Several years of Wikipedia data
How about a RegExp
SELECT
SUM(requests) AS total
FROM
TABLE_QUERY(
[fh-bigquery:wikipedia],
'REGEXP_MATCH(
table_id,
r"pagecounts_2015[0-9]{2}$")')
WHERE
(REGEXP_MATCH(title, '.*[dD]inosaur.*'))
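The two REGEXP_MATCH calls above do different jobs: the one inside TABLE_QUERY chooses which tables get scanned, and the one in WHERE filters rows by title. Here is a minimal Python sketch of the same matching, assuming REGEXP_MATCH behaves like an unanchored search. The table ids and titles are made up for illustration and are not a listing of the real dataset.

```python
import re

# Hypothetical table ids, for illustration only.
table_ids = [
    "pagecounts_201412",
    "pagecounts_201501",
    "pagecounts_201505",
    "pagecounts_20150511_05",
]

# Same pattern as in the TABLE_QUERY call: monthly 2015 tables only.
# The trailing $ rejects the hourly table, which has extra characters
# after the six-digit year and month.
table_pattern = re.compile(r"pagecounts_2015[0-9]{2}$")
selected = [t for t in table_ids if table_pattern.search(t)]

# Same pattern as in the WHERE clause, applied to hypothetical titles.
title_pattern = re.compile(r".*[dD]inosaur.*")
titles = ["Dinosaur", "List_of_dinosaurs", "Dino_Crisis"]
matching_titles = [t for t in titles if title_pattern.search(t)]

print(selected)
print(matching_titles)
```

Note that the pattern only selects 2015 tables. Widening the year part of the pattern (for example to 201[1-5]) would be one way to cover several years, assuming the matching tables share a schema.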
How did it do that?
o_O
Qualities of a good RDBMS
● Inserts & locking
● Indexing
● Cache
● Query planning
Storing data
[Diagram: a Table is split into its Columns, and the columns are spread across Disks]
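A rough sketch of the idea in the diagram, assuming a simple one-list-per-column layout (the slides do not describe the real storage format): each column is kept separately, so a query that needs only requests never reads title. The rows are invented for illustration.

```python
# Invented rows, for illustration only.
rows = [
    {"title": "Jenny", "language": "en", "requests": 12},
    {"title": "Dinosaur", "language": "en", "requests": 40},
    {"title": "Jena", "language": "de", "requests": 7},
]


def to_columns(rows):
    """Pivot row-oriented records into one list per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Equivalent of SELECT sum(requests): only the 'requests' column is read.
total = sum(columns["requests"])
print(total)
```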
Reading data: Life of a BigQuery
SELECT sum(requests) as sum
FROM (
SELECT requests, title
FROM [fh-bigquery:wikipedia.pagecounts_201501]
WHERE
(REGEXP_MATCH(title, '[Jj]en.+'))
)
Life of a BigQuery
[Diagram, built up over several slides: the Query enters at the Root Mixer, fans out through Mixers to Leaf nodes, and the Leaf nodes read from Storage. Annotations, from the bottom of the tree up:]
● Storage: SELECT requests, title (5.4 Bil)
● Leaf: WHERE (REGEXP_MATCH(title, '[Jj]en.+')) (5.8 Mil)
● Mixer: SELECT sum(requests)
● Root Mixer: SELECT sum(requests)
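The tree above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not BigQuery's implementation: the shards, the two-level mixer layout, and the function names leaf and mixer are all hypothetical. Each leaf filters its own shard and returns a partial sum, and each mixer only adds up the partial sums from the level below.

```python
import re

# Invented (title, requests) rows, split across four storage shards.
shards = [
    [("Jenny", 5), ("Dinosaur", 40)],
    [("jenga", 3), ("Jen", 9)],
    [("Jennifer_Lawrence", 20)],
    [("Tree", 1), ("Jena", 2)],
]

# Same pattern as the WHERE clause in the query.
PATTERN = re.compile(r"[Jj]en.+")


def leaf(shard):
    """Leaf: scan one shard, apply the WHERE filter, return a partial sum."""
    return sum(req for title, req in shard if PATTERN.search(title))


def mixer(partials):
    """Mixer: combine the partial sums from the level below."""
    return sum(partials)


leaf_results = [leaf(s) for s in shards]
mixer_results = [mixer(leaf_results[:2]), mixer(leaf_results[2:])]
total = mixer(mixer_results)  # the root mixer produces the final answer
print(leaf_results, mixer_results, total)
```

Because a sum can be computed from partial sums, no node above the leaves ever sees individual rows. Note that "Jen" alone does not match, since the pattern requires at least one character after "Jen".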
Open Data
Finding Open Data
opendata.stackexchange.com
Finding Open Data
reddit.com/r/dataisbeautiful
Time to explore
GSOD
Weather in Half Moon Bay
SELECT DATE(year+mo+da) day, min, max
FROM [fh-bigquery:weather_gsod.gsod2013]
WHERE stn IN (
SELECT usaf FROM [fh-bigquery:weather_gsod.stations]
WHERE name = 'HALF MOON BAY AIRPOR')
AND max < 200
ORDER BY day;
Global high temperatures
SELECT year, max(max) as max
FROM
TABLE_QUERY(
[fh-bigquery:weather_gsod],
'table_id CONTAINS "gsod"')
where max < 200
group by year order by year asc
GDELT
Stories per month - Massachusetts
SELECT DATE(STRING(MonthYear) + '01') month,
SUM(ActionGeo_ADM1Code='USMA') US
FROM [gdelt-bq:full.events]
WHERE MonthYear > 0
GROUP BY 1 ORDER BY 1
SELECT DATE(STRING(MonthYear) + '01') month,
SUM(ActionGeo_ADM1Code='USMA') / COUNT(*) newsyness
FROM [gdelt-bq:full.events]
WHERE MonthYear > 0
GROUP BY 1 ORDER BY 1
Stories per month, normalized
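The normalized query relies on a small trick: summing a boolean comparison counts the rows where it is true, and dividing by COUNT(*) turns that count into a share of all events in the month. A minimal Python sketch of the same arithmetic, using invented events rather than GDELT data:

```python
from collections import defaultdict

# Invented (MonthYear, ActionGeo_ADM1Code) events, for illustration only.
events = [
    (201501, "USMA"), (201501, "USCA"), (201501, "USMA"), (201501, "UK00"),
    (201502, "USCA"), (201502, "USMA"), (201502, "USNY"), (201502, "USTX"),
]

matches = defaultdict(int)  # per month: events located in Massachusetts
totals = defaultdict(int)   # per month: all events

for month, code in events:
    totals[month] += 1
    matches[month] += (code == "USMA")  # True counts as 1, False as 0

# Share of each month's events that are in Massachusetts.
newsyness = {m: matches[m] / totals[m] for m in sorted(totals)}
print(newsyness)
```

Dividing by the monthly total removes the effect of overall event volume changing over time, which the raw count cannot do.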
https://developers.google.com/genomics/
Genomics
Genomics
SELECT Sample, SUM(single), SUM(double),
FROM (
SELECT call.call_set_name AS Sample,
SOME(call.genotype > 0) AND NOT EVERY(call.genotype > 0) WITHIN call AS single,
EVERY(call.genotype > 0) WITHIN call AS double,
FROM [genomics-public-data:1000_genomes.variants]
OMIT RECORD IF reference_name IN ("X","Y","MT"))
GROUP BY Sample ORDER BY Sample
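SOME ... WITHIN call and EVERY ... WITHIN call act like any and all over the genotype values of one call. A minimal Python sketch of the per-sample counting, assuming each call carries a list of genotype values where 0 means the reference allele (as the query's call.genotype > 0 test suggests). The calls are invented, and the sketch skips the OMIT RECORD IF filter on X, Y, and MT.

```python
# Invented calls: (call_set_name, genotype values). 0 = reference allele.
calls = [
    ("sample_A", [0, 1]),
    ("sample_A", [1, 1]),
    ("sample_A", [0, 0]),
    ("sample_B", [1, 2]),
    ("sample_B", [0, 1]),
    ("sample_B", [2, 2]),
]

counts = {}  # sample -> (single, double)
for sample, genotype in calls:
    some = any(g > 0 for g in genotype)    # SOME(call.genotype > 0)
    every = all(g > 0 for g in genotype)   # EVERY(call.genotype > 0)
    single, double = counts.get(sample, (0, 0))
    # single: a variant on some copies but not all; double: on every copy.
    counts[sample] = (single + (some and not every), double + every)

for sample in sorted(counts):
    print(sample, counts[sample])
```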
Something useful:
Use Wikipedia data to pick a movie
1. Wikipedia edits
2. ???
3. Movie recommendation
Follow the edits
Same editor
select title, id, count(id) as edits
from [publicdata:samples.wikipedia]
where
title contains 'Hackers'
and title contains '(film)'
and wp_namespace = 0
group by title, id
order by edits
limit 10
Pick a great movie
select title, id, count(id) as edits
from [publicdata:samples.wikipedia]
where contributor_id in (
select contributor_id
from [publicdata:samples.wikipedia]
where
id=264176
and contributor_id is not null
and is_bot is null
and wp_namespace = 0
and title CONTAINS '(film)'
group by contributor_id)
and wp_namespace = 0
and id != 264176
and title CONTAINS '(film)'
group each by title, id
order by edits desc
limit 100
Find edits in common
Discover the most broadly popular films
select id from (
select id, count(id) as edits
from [publicdata:samples.wikipedia]
where
wp_namespace = 0
and title CONTAINS '(film)'
group each by id
order by edits desc
limit 20)
Edits in common, minus broadly popular
select title, id, count(id) as edits
from [publicdata:samples.wikipedia]
where contributor_id in (
select contributor_id
from [publicdata:samples.wikipedia]
where
id=264176
and contributor_id is not null
and is_bot is null
and wp_namespace = 0
and title CONTAINS '(film)'
group by contributor_id)
and wp_namespace = 0
and id != 264176
and title CONTAINS '(film)'
and id not in (
select id from (
select id, count(id) as edits
from [publicdata:samples.wikipedia]
where
wp_namespace = 0
and title CONTAINS '(film)'
group each by id
order by edits desc
limit 20
)
)
group each by title, id
order by edits desc
limit 100
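The three queries above combine into one recipe: find the people who edited the seed film, find the films that everyone edits, then rank what the seed film's editors touched after removing the seed and the broadly popular films. A minimal Python sketch of that logic on invented data; the ids, the SEED value, and TOP_N_POPULAR are hypothetical, and the sketch skips the is_bot, wp_namespace, and '(film)' filters that the SQL applies.

```python
from collections import Counter

# Invented (contributor_id, film_id) edits; film 1 stands in for the seed film.
edits = [
    (10, 1), (10, 2), (10, 3),
    (11, 1), (11, 2), (11, 4),
    (12, 3), (12, 3), (12, 3), (12, 4),
    (13, 1), (13, 3),
]
SEED = 1
TOP_N_POPULAR = 1  # the slides use 20; 1 keeps the toy data readable

# Step 1: contributors who edited the seed film.
fans = {c for c, f in edits if f == SEED}

# Step 2: the most-edited films overall ("broadly popular").
edits_per_film = Counter(f for _, f in edits)
popular = {f for f, _ in edits_per_film.most_common(TOP_N_POPULAR)}

# Step 3: count the fans' edits to other films, skipping the popular ones.
scores = Counter(
    f for c, f in edits if c in fans and f != SEED and f not in popular
)
recommendations = scores.most_common()
print(recommendations)
```

Film 3 is edited by fans of the seed film too, but it is dropped because it is the most-edited film overall, which is the point of the final query.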
What we talked about
● Origin story
● Count stuff
● How it works
● Some cool open data
● Practical applications
● Try BigQuery
○ bigquery.cloud.google.com
● Queries we ran
○ github.com/mimming/snippets
● Me
○ @MimmingCodes
○ google.com/+mimming
The end

WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 

Recently uploaded (20)

The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 

Exploring Open Data with BigQuery: Jenny Tong

  • 1. Exploring Open Data with BigQuery
  • 2. Jenny Tong Developer Advocate Google Cloud Platform @MimmingCodes
  • 3. Agenda ● Origin story ● Count stuff ● How it works ● Some cool open data ● Do something useful
  • 10. Wikipedia hits over 1 hour SELECT sum(requests) as total FROM [fh-bigquery:wikipedia.pagecounts_20150511_05]
  • 11. Wikipedia hits over 1 month SELECT sum(requests) as total FROM [fh-bigquery:wikipedia.pagecounts_201505]
  • 12. Several years of Wikipedia data SELECT sum(requests) as total FROM [fh-bigquery:wikipedia.pagecounts_201105], [fh-bigquery:wikipedia.pagecounts_201106], [fh-bigquery:wikipedia.pagecounts_201107], ...
  • 14. How about a RegExp SELECT SUM(requests) AS total FROM TABLE_QUERY( [fh-bigquery:wikipedia], 'REGEXP_MATCH( table_id, r"pagecounts_2015[0-9]{2}$")') WHERE (REGEXP_MATCH(title, '.*[dD]inosaur.*'))
  • 15. How did it do that? o_O
  • 16. Qualities of a good RDBMS
  • 17. Qualities of a good RDBMS ● Inserts & locking ● Indexing ● Cache ● Query planning
  • 22. Storing data (diagram: Table, Columns, Disks)
  • 23. Reading data: Life of a BigQuery SELECT sum(requests) as sum FROM ( SELECT requests, title FROM [fh-bigquery:wikipedia.pagecounts_201501] WHERE (REGEXP_MATCH(title, '[Jj]en.+')) )
  • 24. Life of a BigQuery (diagram layers: Mixer, Leaf, Storage)
  • 25. Life of a BigQuery (diagram layers: Root Mixer, Mixer, Leaf, Storage)
  • 26. Life of a BigQuery (diagram layers: Root Mixer, Mixer, Leaf, Storage; label: Query)
  • 27. Life of a BigQuery (diagram layers: Root Mixer, Mixer, Leaf, Storage; label: SELECT requests, title)
  • 28. Life of a BigQuery (diagram layers: Root Mixer, Mixer, Leaf, Storage; labels: 5.4 Bil; SELECT requests, title; WHERE (REGEXP_MATCH(title, '[Jj]en.+')))
  • 29. Life of a BigQuery (diagram layers: Root Mixer, Mixer, Leaf, Storage; labels: 5.4 Bil; SELECT sum(requests); 5.8 Mil; WHERE (REGEXP_MATCH(title, '[Jj]en.+')); SELECT requests, title)
  • 30. Life of a BigQuery (diagram layers: Root Mixer, Mixer, Leaf, Storage; labels: 5.4 Bil; SELECT sum(requests); 5.8 Mil; WHERE (REGEXP_MATCH(title, '[Jj]en.+')); SELECT requests, title; SELECT sum(requests))
  • 35. GSOD
  • 36. Weather in Half Moon Bay SELECT DATE(year+mo+da) day, min, max FROM [fh-bigquery:weather_gsod.gsod2013] WHERE stn IN ( SELECT usaf FROM [fh-bigquery:weather_gsod.stations] WHERE name = 'HALF MOON BAY AIRPOR') AND max < 200 ORDER BY day;
  • 38. Global high temperatures SELECT year, max(max) as max FROM TABLE_QUERY( [fh-bigquery:weather_gsod], 'table_id CONTAINS "gsod"') where max < 200 group by year order by year asc
  • 39. GDELT
  • 40. Stories per month - Massachusetts SELECT DATE(STRING(MonthYear) + '01') month, SUM(ActionGeo_ADM1Code='USMA') US FROM [gdelt-bq:full.events] WHERE MonthYear > 0 GROUP BY 1 ORDER BY 1
  • 41. Stories per month, normalized SELECT DATE(STRING(MonthYear) + '01') month, SUM(ActionGeo_ADM1Code='USMA') / COUNT(*) newsyness FROM [gdelt-bq:full.events] WHERE MonthYear > 0 GROUP BY 1 ORDER BY 1
  • 44. Genomics SELECT Sample, SUM(single), SUM(double), FROM ( SELECT call.call_set_name AS Sample, SOME(call.genotype > 0) AND NOT EVERY(call.genotype > 0) WITHIN call AS single, EVERY(call.genotype > 0) WITHIN call AS double, FROM [genomics-public-data:1000_genomes.variants] OMIT RECORD IF reference_name IN ("X","Y","MT")) GROUP BY Sample ORDER BY Sample
  • 46. Something useful: Use Wikipedia data to pick a movie
  • 47. 1. Wikipedia edits 2. ??? 3. Movie recommendation
  • 49. Pick a great movie select title, id, count(id) as edits from [publicdata:samples.wikipedia] where title contains 'Hackers' and title contains '(film)' and wp_namespace = 0 group by title, id order by edits limit 10
  • 50. Find edits in common select title, id, count(id) as edits from [publicdata:samples.wikipedia] where contributor_id in ( select contributor_id from [publicdata:samples.wikipedia] where id=264176 and contributor_id is not null and is_bot is null and wp_namespace = 0 and title CONTAINS '(film)' group by contributor_id) and wp_namespace = 0 and id != 264176 and title CONTAINS '(film)' group each by title, id order by edits desc limit 100
  • 51. Discover the most broadly popular films select id from ( select id, count(id) as edits from [publicdata:samples.wikipedia] where wp_namespace = 0 and title CONTAINS '(film)' group each by id order by edits desc limit 20)
  • 52. Edits in common, minus broadly popular select title, id, count(id) as edits from [publicdata:samples.wikipedia] where contributor_id in ( select contributor_id from [publicdata:samples.wikipedia] where id=264176 and contributor_id is not null and is_bot is null and wp_namespace = 0 and title CONTAINS '(film)' group by contributor_id) and wp_namespace = 0 and id != 264176 and title CONTAINS '(film)' and id not in ( select id from ( select id, count(id) as edits from [publicdata:samples.wikipedia] where wp_namespace = 0 and title CONTAINS '(film)' group each by id order by edits desc limit 20 ) ) group each by title, id order by edits desc limit 100
  • 53. What we talked about ● Origin story ● Count stuff ● How it works ● Some cool open data ● Practical applications
  • 54. The end ● Try BigQuery ○ bigquery.cloud.google.com ● Queries we ran ○ github.com/mimming/snippets ● Me ○ @MimmingCodes ○ google.com/+mimming
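
The "Life of a BigQuery" slides (23 to 30) trace one query through storage, leaf, and mixer layers. The following is a rough Python sketch of that idea, not BigQuery's actual implementation: each leaf scans its own shard, applies the title filter, and returns a partial sum, and mixers add partial sums until one total remains. The shard contents, the function names (`leaf_scan`, `mixer_combine`, `run_query`), and the `fan_out` parameter are all made up for illustration, and the use of `re.search` assumes that `REGEXP_MATCH` accepts a partial match.

```python
import re
from typing import List, Tuple

Row = Tuple[str, int]  # (title, requests): the only two columns the query reads

# Same pattern as the slide-23 query; search semantics are an assumption.
PATTERN = re.compile(r"[Jj]en.+")


def leaf_scan(shard: List[Row]) -> int:
    """Leaf: filter rows by title and return a partial sum of requests."""
    return sum(requests for title, requests in shard if PATTERN.search(title))


def mixer_combine(partials: List[int]) -> int:
    """Mixer: combine the partial sums handed up by its children."""
    return sum(partials)


def run_query(shards: List[List[Row]], fan_out: int = 2) -> int:
    """Run leaves over all shards, then combine level by level up to the root.

    fan_out is the hypothetical number of children per mixer and must be >= 2.
    """
    if fan_out < 2:
        raise ValueError("fan_out must be at least 2")
    partials = [leaf_scan(shard) for shard in shards]
    while len(partials) > 1:
        partials = [
            mixer_combine(partials[i:i + fan_out])
            for i in range(0, len(partials), fan_out)
        ]
    return partials[0] if partials else 0


# Hypothetical toy shards standing in for the Wikipedia pagecounts table.
SHARDS = [
    [("Jenny", 10), ("Dinosaur", 7)],
    [("jenga", 3), ("Jen", 100)],  # "Jen" fails: the pattern needs a character after "en"
    [("Bigtable", 5)],
    [("Jennifer", 2), ("Dremel", 1)],
]

total = run_query(SHARDS)
print(total)  # 15
```

Because addition is associative, the mixers only ever pass single numbers upward, so the amount of data shrinks at every level of the tree regardless of how many rows the leaves scanned.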