Google BigQuery
for the BigData win
There are lots of tools to analyze Big Data
You are here to learn about one of the tools
The current OSS tools can be complicated
You would like a fast and cost-effective tool
Use Google BigQuery to analyze your Big Data
BigQuery makes analyzing Big Data easy!
Google learned from initial BigData technologies
Dremel provides the query system
Data can be nested
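Nested data means a single record can hold repeated, structured fields instead of being spread across several joined tables. To query such a record row by row it is usually flattened (FLATTEN in BigQuery's legacy SQL, UNNEST in standard SQL). The Python sketch below imitates that flattening on one made-up record; the field names are hypothetical and this is not BigQuery's implementation.

```python
from typing import Any, Dict, Iterator, List

# Hypothetical record: one customer with a repeated, nested "orders" field.
customer: Dict[str, Any] = {
    "name": "Ada",
    "orders": [
        {"sku": "A-1", "qty": 2},
        {"sku": "B-7", "qty": 1},
    ],
}


def flatten(record: Dict[str, Any], repeated_field: str) -> Iterator[Dict[str, Any]]:
    """Yield one flat row per element of `repeated_field`.

    The record's other top-level fields are copied onto every row, and child
    fields are prefixed with the repeated field's name ("orders.sku").
    A record whose repeated field is empty produces no rows in this sketch.
    """
    parent = {k: v for k, v in record.items() if k != repeated_field}
    for child in record.get(repeated_field, []):
        row = dict(parent)
        for key, value in child.items():
            row[f"{repeated_field}.{key}"] = value
        yield row


rows: List[Dict[str, Any]] = list(flatten(customer, "orders"))
for row in rows:
    print(row)
```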
The storage format is columnar
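Columnar storage keeps each column's values together, so a query that touches one column does not have to read the others. The Python sketch below pivots a few made-up rows into columns and counts how many values each layout would read for a single-column aggregate; counting values rather than bytes is a simplification.

```python
from typing import Dict, List, Tuple

# Hypothetical table with three columns; values are made up for illustration.
COLUMN_NAMES = ("user_id", "country", "bytes_downloaded")
rows: List[Tuple[int, str, int]] = [
    (1, "US", 500),
    (2, "DE", 1200),
    (3, "US", 300),
]


def to_columnar(records: List[Tuple[int, str, int]]) -> Dict[str, list]:
    """Pivot row tuples into one list of values per column."""
    return {name: [r[i] for r in records] for i, name in enumerate(COLUMN_NAMES)}


columns = to_columnar(rows)

# SUM(bytes_downloaded) only needs one column, so a column store can read
# just that list, while a row store has to read every field of every row.
total = sum(columns["bytes_downloaded"])
values_read_row_store = len(rows) * len(COLUMN_NAMES)
values_read_column_store = len(columns["bytes_downloaded"])

print(total, values_read_row_store, values_read_column_store)  # 2000 9 3
```

The gap grows with the number of columns: on a wide table, a query that names only a few columns skips most of the stored data.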
Google provides a familiar developer experience
Queries have a common SQL-ish syntax
Dremel has functions
Table joins are a little different
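The slide does not spell out how joins differ. One distinction that distributed engines of this kind commonly draw (BigQuery's legacy SQL, for example, had a separate JOIN EACH form intended for joining two large tables) is between broadcasting a small table to every worker and shuffling both tables by join key. The Python sketch below simulates both strategies in one process; the data and worker counts are made up, and it is not how BigQuery itself is implemented.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, int]          # (join key, value)
Joined = Tuple[str, int, int]  # (join key, left value, right value)


def hash_join(left: List[Row], right: List[Row]) -> List[Joined]:
    """Inner join on the key by building a hash table over the right side."""
    index: Dict[str, List[int]] = defaultdict(list)
    for key, value in right:
        index[key].append(value)
    return [(key, lv, rv) for key, lv in left for rv in index.get(key, [])]


def broadcast_join(left_shards: List[List[Row]], small_right: List[Row]) -> List[Joined]:
    """Each worker joins its shard of the big table against a full copy of the small one."""
    out: List[Joined] = []
    for shard in left_shards:
        out.extend(hash_join(shard, small_right))
    return out


def shuffle_join(left: List[Row], right: List[Row], workers: int) -> List[Joined]:
    """Re-partition both sides by hash(key) so matching keys land on the same worker."""
    left_parts: List[List[Row]] = [[] for _ in range(workers)]
    right_parts: List[List[Row]] = [[] for _ in range(workers)]
    for row in left:
        left_parts[hash(row[0]) % workers].append(row)
    for row in right:
        right_parts[hash(row[0]) % workers].append(row)
    out: List[Joined] = []
    for lp, rp in zip(left_parts, right_parts):
        out.extend(hash_join(lp, rp))
    return out


big = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
small = [("a", 10), ("b", 20)]
print(sorted(broadcast_join([big[:2], big[2:]], small)))
print(sorted(shuffle_join(big, small, workers=3)))
```

Both strategies produce the same rows; broadcasting avoids moving the big table but only works when the small side fits on every worker.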
BigQuery Building Blocks
Projects
Tables and Datasets
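Projects contain datasets, and datasets contain tables, so a table is identified by all three parts. As a sketch, the hypothetical TableRef class below builds and parses the two spellings BigQuery has used: [project:dataset.table] in legacy SQL and `project.dataset.table` in standard SQL. The names my-project, web_logs, and requests are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableRef:
    """Hypothetical helper: a table lives in a dataset, which lives in a project."""

    project: str
    dataset: str
    table: str

    def legacy_sql(self) -> str:
        # Legacy SQL style: [project:dataset.table]
        return f"[{self.project}:{self.dataset}.{self.table}]"

    def standard_sql(self) -> str:
        # Standard SQL style: `project.dataset.table`
        return f"`{self.project}.{self.dataset}.{self.table}`"

    @classmethod
    def parse(cls, text: str) -> "TableRef":
        """Parse 'project:dataset.table' or 'project.dataset.table'.

        Simplification: project IDs that themselves contain ':' or '.'
        (e.g. domain-scoped projects) are not handled.
        """
        cleaned = text.strip("[]`")
        if ":" in cleaned:
            project, rest = cleaned.split(":", 1)
            dataset, table = rest.split(".", 1)
        else:
            project, dataset, table = cleaned.split(".", 2)
        return cls(project, dataset, table)


ref = TableRef.parse("[my-project:web_logs.requests]")
print(ref.standard_sql())  # `my-project.web_logs.requests`
```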
Jobs
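Work such as queries, loads, extracts, and copies runs as a job, and jobs are asynchronous: you submit one, then check its state as it moves through PENDING and RUNNING to DONE. The sketch below polls with exponential backoff. get_state is a hypothetical callable standing in for whatever client call fetches the job's status, and the delays and timeout are arbitrary. A job that reaches DONE can still have failed, so a real client would also inspect the job's error status.

```python
import time
from typing import Callable


def wait_for_job(
    get_state: Callable[[], str],
    initial_delay: float = 0.5,
    max_delay: float = 8.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a job's state until it reports DONE, doubling the delay each time."""
    delay = initial_delay
    waited = 0.0
    while True:
        state = get_state()
        if state == "DONE":
            return state
        if waited >= timeout:
            raise TimeoutError(f"job still {state} after {waited:.1f}s")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay)


# Usage with a fake job that finishes on the fourth check; sleeps are recorded, not slept.
states = iter(["PENDING", "RUNNING", "RUNNING", "DONE"])
slept = []
print(wait_for_job(lambda: next(states), sleep=slept.append), slept)  # DONE [0.5, 1.0, 2.0]
```

Injecting the sleep function keeps the sketch testable without real waiting.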
There are several options for working with data
Google Cloud Storage
BigQuery REST API
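The REST API is the layer the other tools sit on. The sketch below only builds (it does not send) a request to the jobs.query method of the v2 API using Python's standard library. The base URL and the useLegacySql field reflect the current v2 API as I understand it and may differ from what was in use when this talk was given; the project ID and token are placeholders, and obtaining a real OAuth 2.0 access token is out of scope.

```python
import json
import urllib.request

BASE_URL = "https://bigquery.googleapis.com/bigquery/v2"


def build_query_request(project_id: str, sql: str, access_token: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the jobs.query endpoint."""
    body = {"query": sql, "useLegacySql": False}
    return urllib.request.Request(
        url=f"{BASE_URL}/projects/{project_id}/queries",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_query_request("my-project", "SELECT 1", "placeholder-token")
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen would actually send it, which requires a valid token and network access.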
Command-line Tools
BigQuery is very cost effective
No equipment to maintain
On-Demand Pricing
Reserved Capacity Pricing
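On-demand pricing charges by the amount of data your queries scan, while reserved capacity is a fixed charge regardless of volume. The sketch below compares the two. The prices are parameters, not real figures (check Google's current price list), and billing per TiB is an assumption of this sketch.

```python
TIB = 1024 ** 4  # bytes in one tebibyte


def on_demand_cost(bytes_scanned: int, price_per_tib: float) -> float:
    """On-demand: cost grows with the bytes the queries scan."""
    return bytes_scanned / TIB * price_per_tib


def break_even_tib(monthly_reservation: float, price_per_tib: float) -> float:
    """Monthly scan volume (in TiB) above which a fixed reservation is cheaper."""
    return monthly_reservation / price_per_tib


# Hypothetical numbers, not real prices.
price = 5.0
reservation = 2000.0
print(on_demand_cost(2 * TIB, price))      # 10.0
print(break_even_tib(reservation, price))  # 400.0
```

Because the storage is columnar, on-demand bytes scanned depend on which columns a query references, so selecting fewer columns lowers the on-demand figure directly.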
Questions?
Important Links:
BigQuery: http://goo.gl/hCOMZ
Dremel: http://goo.gl/0EMwl
MapReduce: http://goo.gl/n0agd
GFS: http://goo.gl/WeuPy4
About Me:
Ken Taylor
Twitter: @taylorka
Blog: switchspan.com

Editor's Notes

  1. Introduce self and Outsite.
     What is Big Data? Big data[1][2] is the term for a collection of data sets so large and complex that they become difficult to process using on-hand database management tools or traditional data processing applications. It is the same idea as “web-scale” data. The challenges include capture, curation, storage, search, sharing, transfer, analysis and visualization.
  2. OLAP (Online Analytical Processing) is not a good option because of the volume of data. OLTP (Online Transaction Processing) is not designed for that type of reporting.
  3. The Hadoop ecosystem is made up of a lot of companies. Hadoop also has its origins in Google research, which I will talk about shortly. There are also visualization tools such as Tableau (out of scope for this talk).
  4. Google BigQuery! BigQuery is a RESTful web service that enables interactive analysis of massively large datasets, working in conjunction with Google Cloud Storage. It is a Platform as a Service (PaaS) that may be used as a complement to MapReduce.
  5. Apache Hadoop's MapReduce and HDFS components were originally derived from Google's MapReduce and Google File System (GFS) papers, respectively. Hadoop was created by Doug Cutting and Mike Cafarella[5] in 2005. Cutting, who was working at Yahoo! at the time,[6] named it after his son's toy elephant.[7] It was originally developed to support distribution for the Nutch search engine project.[8]
     The Apache Hadoop framework is composed of the following modules:
     • Hadoop Common – contains libraries and utilities needed by other Hadoop modules
     • Hadoop Distributed File System (HDFS) – a distributed file system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster
     • Hadoop YARN – a resource-management platform responsible for managing compute resources in clusters and using them to schedule users' applications
     • Hadoop MapReduce – a programming model for large-scale data processing
     Beyond HDFS, YARN and MapReduce, the Apache Hadoop “platform” is now commonly considered to include a number of related projects as well: Apache Pig, Apache Hive, Apache HBase, Apache Spark, and others.
  6. Big data requires massive amounts of storage on multiple drives and a file system that overcomes hardware bottlenecks when processing large data sets. Multiple CPUs are required to map/reduce the data (this includes managing the individual jobs). Running jobs can take time, so both the time to map/reduce and the time to compose a query matter.
  7. If you don’t, a kitten dies every minute.
  8. There is no need to install any server software, because everything is hosted. A lot of data science and engineering effort went into creating BigQuery, and Google uses it internally.
  9. Google’s initial technologies were GFS and MapReduce (Google released research papers on both):
     • “The Google File System” (GFS) in 2003, by Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung
     • “MapReduce: Simplified Data Processing on Large Clusters” in 2004, by Jeffrey Dean and Sanjay Ghemawat
     GFS is a proprietary distributed file system. The main goals of a distributed file system are: 1. speed, 2. scalability, 3. reliability. Google File System grew out of an earlier Google effort, “BigFiles”, developed by Larry Page and Sergey Brin in the early days of Google, while it was still located at Stanford. It is designed to provide efficient, reliable access to data using large clusters of commodity hardware. A new version of the Google File System is codenamed Colossus.
  10. Commodity computing, or commodity cluster computing, is the use of large numbers of already available computing components for parallel computing, to get the greatest amount of useful computation at low cost.[1] It is computing done on commodity computers as opposed to high-cost supermicrocomputers or boutique computers. Commodity computers make it easy to populate data centers. Some of the general characteristics of a commodity computer are:
     • It shares a base instruction set common to many different models.
     • It shares an architecture (memory, I/O map and expansion capability) that is common to many different models.
     • It has a high degree of mechanical compatibility: internal components (CPU, RAM, motherboard, peripheral cards, drives) are interchangeable with other models.
     • Software is widely available off the shelf.
     • It is compatible with most available peripherals and works with most of them right out of the box.
  11. MapReduce is a programming model for processing large data sets with a parallel, distributed algorithm on a cluster.[1] A MapReduce program is composed of two procedures:
     • a Map() procedure that performs filtering and sorting (such as sorting students by first name into queues, one queue for each name)
     • a Reduce() procedure that performs a summary operation (such as counting the number of students in each queue, yielding name frequencies)
     The "MapReduce System" (also called "infrastructure" or "framework") orchestrates the processing by marshalling the distributed servers, running the various tasks in parallel, managing all communications and data transfers between the various parts of the system, and providing for redundancy and fault tolerance.
     The model is inspired by the map and reduce functions commonly used in functional programming,[2] although their purpose in the MapReduce framework is not the same as in their original forms.[3] The key contributions of the MapReduce framework are not the actual map and reduce functions. They are the scalability and fault tolerance achieved for a variety of applications by optimizing the execution engine once.
     MapReduce is useful in a wide range of applications, including distributed pattern-based searching, distributed sorting, web link-graph reversal, term-vector per host, web access log stats, inverted index construction, document clustering, machine learning,[7] and statistical machine translation.
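The Map()/Reduce() split described in note 11 can be sketched in a single Python process, using the same students-by-first-name example. This illustrates the programming model only and is not Hadoop's or Google's implementation. The function names are hypothetical. The framework's contribution (distributing work across servers, parallelism, fault tolerance) is deliberately left out.

```python
from collections import defaultdict
from itertools import chain


def map_phase(first_name):
    # Map(): emit a (key, value) pair for each input record.
    return [(first_name, 1)]


def shuffle(pairs):
    # Group values by key: one "queue" per first name.
    queues = defaultdict(list)
    for key, value in pairs:
        queues[key].append(value)
    return queues


def reduce_phase(key, values):
    # Reduce(): summarize one queue, here by counting its entries.
    return key, sum(values)


def map_reduce(records):
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


students = ["Ana", "Ben", "Ana", "Cy", "Ben", "Ana"]
print(map_reduce(students))  # {'Ana': 3, 'Ben': 2, 'Cy': 1}
```

In a real cluster, each map_phase call and each reduce_phase call could run on a different machine. That independence is what lets the framework scale the same two small functions.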
  12. Google released the Dremel white paper in Sept 2010. Dremel is a brand of power tools that rely primarily on speed as opposed to torque. Google's goal for BigQuery was to query 1 TB of data in less than 1 second.
     Dremel has been in production since 2006 and has thousands of users within Google. It replaced MapReduce in many instances, but the two can be complementary. Multiple instances of Dremel are deployed in the company, ranging from tens to thousands of nodes. Examples of using the system include:
     • Analysis of crawled web documents.
     • Tracking install data for applications on Android Market.
     • Crash reporting for Google products.
     • OCR results from Google Books.
     • Spam analysis.
     • Debugging of map tiles on Google Maps.
     • Tablet migrations in managed Bigtable instances.
     • Results of tests run on Google’s distributed build system.
     • Disk I/O statistics for hundreds of thousands of disks.
     • Resource monitoring for jobs run in Google’s data centers.
     • Symbols and dependencies in Google’s codebase.
     Dremel builds on ideas from web search and parallel DBMSs. In contrast to layers such as Pig and Hive for Hadoop, it executes queries natively without translating them into MR jobs.
  13. Dremel allows data to be nested. At the time of this talk, nested data could only be loaded as JSON. Nesting makes the data for a single table more concise, which allows a more compact file to be imported into BigQuery. It also makes BigQuery easily interoperable with many current JavaScript technologies and NoSQL databases such as MongoDB. Data can also be imported as CSV.
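This sketch makes the conciseness point in note 13 concrete. It compares one nested record, written as a single line of newline-delimited JSON, with the flat rows a CSV file would need to hold the same data. The order/customer/items schema is a hypothetical example, not from the talk.

```python
import json

# Hypothetical order record with a nested "customer" and a repeated "items" field.
order = {
    "order_id": 1001,
    "customer": {"name": "Ana", "city": "Austin"},
    "items": [
        {"sku": "A-1", "qty": 2},
        {"sku": "B-7", "qty": 1},
    ],
}

# Newline-delimited JSON: the whole nested record fits on one line.
ndjson_line = json.dumps(order)

# A flat CSV layout needs one row per item and repeats the parent fields in each row.
header = ["order_id", "customer_name", "customer_city", "sku", "qty"]
flat_rows = [
    [order["order_id"], order["customer"]["name"], order["customer"]["city"],
     item["sku"], item["qty"]]
    for item in order["items"]
]

print(ndjson_line)
print(len(flat_rows))  # 2
```

The duplication in flat_rows grows with every repeated child. The nested form stores the parent fields once.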
  14. ** The data is read-only/append-only. **
     Dremel uses a column-striped storage representation. This lets it read less data from secondary storage and reduces CPU cost thanks to cheaper compression. Column stores had been adopted for analyzing relational data [1] but, according to the Dremel paper, had not been extended to nested data models. One of the ingredients for building interoperable data management components is a shared storage format. Columnar storage proved successful for flat relational data, but making it work for Google required adapting it to a nested data model. Figure 1 of the Dremel paper illustrates the main idea: all values of a nested field such as A.B.C are stored contiguously. Hence, A.B.C can be retrieved without reading A.E, A.B.D, etc. The challenge Dremel addresses is how to preserve all structural information and still be able to reconstruct records from an arbitrary subset of fields.
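Here is a minimal sketch of the column-striping idea from note 14, assuming records are plain Python dicts and lists. Every leaf value is appended to a column keyed by its dotted path (A.B.C), so reading one field touches one list. The sketch leaves out the repetition and definition levels that the Dremel paper adds. That omission is exactly why this version cannot reconstruct which values belonged to which record, and that reconstruction is the hard part the note describes.

```python
from collections import defaultdict


def leaf_paths(value, prefix=""):
    """Yield (dotted_path, leaf_value) pairs for a nested dict/list record."""
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            yield from leaf_paths(child, path)
    elif isinstance(value, list):
        # Repeated field: every element shares the same column path.
        for child in value:
            yield from leaf_paths(child, prefix)
    else:
        yield prefix, value


def stripe(records):
    """Store each leaf field's values contiguously, one list per dotted path."""
    columns = defaultdict(list)
    for record in records:
        for path, leaf in leaf_paths(record):
            columns[path].append(leaf)
    return dict(columns)


records = [
    {"A": {"B": [{"C": 1, "D": "x"}, {"C": 2, "D": "y"}], "E": True}},
    {"A": {"B": [{"C": 3, "D": "z"}], "E": False}},
]
columns = stripe(records)
print(columns["A.B.C"])  # [1, 2, 3], read without touching A.B.D or A.E
```

Note that [1, 2, 3] alone no longer says that 1 and 2 came from the first record and 3 from the second. Dremel's extra per-value levels exist to recover exactly that structure.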
  15. Google provides a familiar developer experience:
     • A web-based management interface
     • Flat files (CSV/JSON)
     • Libraries in most of the major programming languages
     • A RESTful API
     • SQL syntax for querying
  16. BigQuery queries are written using a variation of the standard SQL SELECT statement. BigQuery supports a wide variety of functions, such as COUNT, arithmetic expressions, and string functions: https://developers.google.com/bigquery/query-reference
     Query syntax: SELECT, WITHIN, FROM, FLATTEN, JOIN, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT
     ** Retrieving large result sets can be time-consuming – USE LIMIT and/or AGGREGATES!
  17. Dremel has most of the standard SQL aggregate functions, such as COUNT, SUM, MIN, MAX and AVG. Dremel also has functions for extracting JSON from a field using JSONPath syntax. Dremel has URL and IP functions, which can make quick work of network and web logs.
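Note 17 does not name the URL and IP functions, so this sketch does not guess at their exact names or outputs. Instead it uses Python's standard library to show the kind of web-log work they are meant to simplify: pulling the host out of a URL, and turning a dotted-quad IPv4 address into an integer for range comparisons. The log rows and helper names are hypothetical.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical web-log rows: (client IP, requested URL).
log_rows = [
    ("192.168.0.10", "http://www.example.com/index.html?q=1"),
    ("10.0.0.5", "https://shop.example.org/cart"),
    ("192.168.0.10", "http://www.example.com/about"),
]


def host(url):
    # Hostname part of the URL (no scheme, port, path, or query string).
    return urlsplit(url).hostname


def ip_to_int(ip):
    # Dotted-quad IPv4 address as an integer, handy for range comparisons.
    return int(ipaddress.IPv4Address(ip))


hits_per_host = {}
for _, url in log_rows:
    h = host(url)
    hits_per_host[h] = hits_per_host.get(h, 0) + 1

print(hits_per_host)           # {'www.example.com': 2, 'shop.example.org': 1}
print(ip_to_int("10.0.0.5"))   # 167772165
```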
  18. BigQuery supports multiple JOIN operations in each SELECT statement.
     JOIN types: BigQuery supports INNER, LEFT OUTER and CROSS JOIN operations. The default is INNER. CROSS JOIN clauses must not contain an ON clause. CROSS JOIN operations can return a large amount of data and might result in a slow and inefficient query. When possible, use a regular JOIN instead.
     EACH modifier: Normal JOIN operations require that the right-side table contain less than 8 MB of compressed data. The EACH modifier is a hint that informs the query execution engine that the JOIN might reference two large tables. The EACH modifier can't be used in CROSS JOIN clauses. When possible, use JOIN without the EACH modifier for best performance. Use JOIN EACH when the tables are too large for JOIN.
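Note 18 does not say why a normal JOIN limits the right-side table to 8 MB. One common explanation for this kind of limit is a broadcast join, where every worker receives a full copy of the small right-hand table. This explanation is an assumption, not something taken from the talk. A join over two large tables instead partitions both sides by the join key (a shuffle join), and this sketch assumes that is what the EACH hint requests. The code models both strategies for an INNER JOIN in plain Python. The table contents and function names are hypothetical.

```python
from collections import defaultdict
from operator import itemgetter


def hash_join(left, right, key):
    """INNER JOIN: index the right rows by key, then probe with each left row."""
    index = defaultdict(list)
    for rrow in right:
        index[rrow[key]].append(rrow)
    return [{**lrow, **rrow} for lrow in left for rrow in index.get(lrow[key], [])]


def broadcast_join(left_partitions, right, key):
    """Assumed model of a plain JOIN: each worker gets a full copy of the right table."""
    results = []
    for partition in left_partitions:  # one left partition per worker
        results.extend(hash_join(partition, right, key))
    return results


def shuffle_join(left, right, key, workers):
    """Assumed model of JOIN EACH: both tables are partitioned by a hash of the key,
    so each worker joins only matching slices and never holds a whole table."""
    left_parts = [[] for _ in range(workers)]
    right_parts = [[] for _ in range(workers)]
    for row in left:
        left_parts[hash(row[key]) % workers].append(row)
    for row in right:
        right_parts[hash(row[key]) % workers].append(row)
    results = []
    for lpart, rpart in zip(left_parts, right_parts):
        results.extend(hash_join(lpart, rpart, key))
    return results


users = [{"uid": 1, "name": "Ana"}, {"uid": 2, "name": "Ben"}]
orders = [
    {"oid": 10, "uid": 1},
    {"oid": 11, "uid": 2},
    {"oid": 12, "uid": 1},
    {"oid": 13, "uid": 3},  # no matching user, so the INNER JOIN drops it
]

by_oid = itemgetter("oid")
broadcast = sorted(broadcast_join([orders[:2], orders[2:]], users, "uid"), key=by_oid)
shuffled = sorted(shuffle_join(orders, users, "uid", workers=4), key=by_oid)
print(broadcast == shuffled)              # True
print([row["oid"] for row in broadcast])  # [10, 11, 12]
```

Both strategies return the same rows. The trade-off is cost: broadcasting is cheap while the right table is small, but copying a large table to every worker is not. That would explain why the plain form is preferred when it fits.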
  19. The building blocks of BigQuery are: Projects, Tables, Datasets, Jobs.
  20. Projects are top-level containers in Google's Cloud Platform. They store information about billing and authorized users, and they contain BigQuery data. Each project has a friendly name and a unique ID. BigQuery bills on a per-project basis, so it’s usually easiest to create a single project for your company that’s maintained by your billing department. For more information on how to grant access to your project, see Access Control.
  21. Tables contain your data in BigQuery, along with a corresponding table schema that describes field names, types, and other information. BigQuery also supports views, virtual tables defined by a SQL query. BigQuery creates tables in one of the following ways:
     • Loading data into a new table
     • Running a query
     • Copying a table
  22. Jobs are actions that you construct and BigQuery executes on your behalf to load data, export data, query data, or copy data. Because jobs can take a long time to complete, they execute asynchronously and can be polled for their status. BigQuery saves a history of all jobs associated with a project, accessible via the Google Developers Console.
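Because jobs run asynchronously (note 22), a client typically polls until the job finishes. This sketch shows polling with exponential backoff against a hypothetical FakeJob class that stands in for a real job. The state strings mirror the PENDING/RUNNING/DONE states that BigQuery's REST API reports for jobs. FakeJob and the wait_for_job helper are illustrative only and are not part of any Google library.

```python
import time


class FakeJob:
    """Hypothetical stand-in for an asynchronous job (not a Google API)."""

    def __init__(self, polls_until_done):
        self._remaining = polls_until_done

    def state(self):
        # Report RUNNING for a fixed number of polls, then DONE.
        if self._remaining > 0:
            self._remaining -= 1
            return "RUNNING"
        return "DONE"


def wait_for_job(job, initial_delay=0.01, max_delay=0.1, timeout=5.0):
    """Poll job.state() with exponential backoff; return the number of polls."""
    deadline = time.monotonic() + timeout
    delay = initial_delay
    polls = 0
    while True:
        polls += 1
        if job.state() == "DONE":
            return polls
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"job still not DONE after {polls} polls")
        time.sleep(delay)
        delay = min(delay * 2, max_delay)  # back off, but cap the wait


print(wait_for_job(FakeJob(polls_until_done=3)))  # 4
```

Backing off keeps a long-running job from generating a flood of status requests, while the cap keeps the latency after completion bounded.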
  23. BigQuery can be accessed or used in three ways:
     • Browser tool (limited in functionality – can’t update tables)
     • Command-line tool
     • API
     BigQuery supports two data formats for import/export (and streaming): CSV and JSON (newline-delimited). Data can be compressed with gzip.
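Here is a sketch of preparing a gzip-compressed, newline-delimited JSON file of the kind note 23 describes, using only Python's gzip and json modules. The rows and file name are hypothetical. Loading the file into BigQuery (through the browser tool, bq, or the API) is not shown.

```python
import gzip
import json
import os
import tempfile

rows = [
    {"name": "Ana", "visits": 3},
    {"name": "Ben", "visits": 5},
]


def write_ndjson_gz(path, records):
    """Write one JSON object per line, gzip-compressed."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def read_ndjson_gz(path):
    """Read the file back, skipping any blank lines."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


path = os.path.join(tempfile.mkdtemp(), "visits.json.gz")  # hypothetical file name
write_ndjson_gz(path, rows)
print(read_ndjson_gz(path) == rows)  # True
```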
  24. The Google BigQuery API is built on HTTP and JSON, so any standard HTTP client can send requests to it and parse the responses. Current client libraries: .NET (C#), Go, Google Web Toolkit, Java, JavaScript, Node.js, Objective-C, PHP, Python, Ruby. The API uses OAuth 2.0 for authentication.
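Since the API is plain HTTP and JSON (note 24), a query request can be built with Python's standard library alone. This sketch builds, but does not send, a POST to the jobs.query endpoint. The project ID "my-project" and the token "ya29.EXAMPLE" are placeholders, and obtaining a real OAuth 2.0 token is out of scope. The request sets useLegacySql to false (standard SQL). The FLATTEN and JOIN EACH syntax in notes 16 and 18 belongs to BigQuery's older legacy SQL dialect, which the API uses when that flag is left unset.

```python
import json
import urllib.request

API_ROOT = "https://bigquery.googleapis.com/bigquery/v2"


def build_query_request(project_id, sql, access_token, max_results=100):
    """Build (but do not send) a jobs.query POST request."""
    body = {
        "query": sql,
        "useLegacySql": False,  # standard SQL; omit or set True for legacy SQL
        "maxResults": max_results,
    }
    return urllib.request.Request(
        url=f"{API_ROOT}/projects/{project_id}/queries",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",  # OAuth 2.0 bearer token
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder project ID and token, for illustration only.
req = build_query_request("my-project", "SELECT 1 AS x", access_token="ya29.EXAMPLE")
print(req.get_method(), req.full_url)
```

Sending it would be a matter of passing req to urllib.request.urlopen with a valid token and parsing the JSON response body.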
  25. BigQuery has excellent command-line tools written in Python: gcloud, bq and gsutil. gcloud lets you update and use all of the Google Cloud services from the command line. bq is a Python-based tool that accesses BigQuery from the command line. gsutil is another command-line tool, which can upload and download files to and from Google Cloud Storage. These tools give you the option to script via PowerShell or other means if you do not want to use the API.
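If you would rather script bq from Python than from PowerShell (note 25), one approach is to build the argument list and hand it to subprocess. I believe current versions of bq accept the flags used here (--format, --use_legacy_sql, --max_rows), but treat them as assumptions and check `bq help query` for your installed version. run_bq also assumes bq is on PATH and already authenticated. Only the command-building part is exercised below.

```python
import shlex
import subprocess


def bq_query_argv(sql, max_rows=100, output_format="json"):
    """Build a bq command line; --format is a global flag, so it precedes 'query'."""
    return [
        "bq",
        f"--format={output_format}",
        "query",
        "--use_legacy_sql=false",
        f"--max_rows={max_rows}",
        sql,
    ]


def run_bq(sql):
    # Assumes the Cloud SDK's bq tool is on PATH and an account is authenticated.
    result = subprocess.run(bq_query_argv(sql), capture_output=True, text=True, check=True)
    return result.stdout


argv = bq_query_argv("SELECT 1 AS x", max_rows=10)
print(shlex.join(argv))
# bq --format=json query --use_legacy_sql=false --max_rows=10 'SELECT 1 AS x'
```

Passing a list (rather than one shell string) sidesteps quoting problems when the SQL itself contains quotes or spaces.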
  26. Rented massive parallelism is much more cost-effective than trying to set up the infrastructure to do it yourself. BigQuery's pricing is comparable to that of Amazon Elastic MapReduce (EMR) and Cloudera’s Hadoop offerings. With Amazon EMR you can launch a 10-node Hadoop cluster for as little as $0.15 per hour. BigQuery does not price by node, however.
  27. Processing big data properly requires large clusters of commodity hardware. Maintaining a data center while trying to implement something like Hadoop can be very challenging for even the most veteran neck-beards. Cloud computing provides all of the redundancy, scalability and other ‘ilities’. BigQuery has two pricing plans: On-Demand and Reserved Capacity.
  28. Pay-as-you-go model.
     Resource pricing:
     • Loading data: free
     • Exporting data: free
     • Table reads: free
     • Storage: $0.026 (per GB/month)
     • Streaming inserts: free until July 1, 2014 (after July 1, 2014, $0.01 per 100,000 rows)
     How am I charged for queries? BigQuery uses a columnar data structure, which means that for a given query, you are only charged for data processed in each column, not the entire table. For instance, if a table has 26 columns, and you run the following query:
     SELECT a, b, f FROM table1 WHERE d > 100 ORDER BY e
     you would be charged for processing data in columns a, b, f, d, and e only. For more information on column-oriented database structures, see Column-oriented DBMS.
     BigQuery accesses all rows of a table when you run a query on the table, and charges according to the total data processed in the columns your query references. ** For this reason, if you expect your queries to be generally focused on data from a particular time frame, it can be economical and sometimes better performing to shard your data into separate tables based on a timestamp. ** If you receive a query error, you aren't charged for that query.
     Query pricing:
     • Interactive queries: $0.005 (per GB processed)
     • Batch queries: $0.005 (per GB processed)
     Notes on query pricing:
     1. Charges are rounded up to the nearest MB, with a minimum of 10 MB of data processed per table referenced by a query.
     2. The first 100 GB of data processed per month is at no charge.
     3. Charges are based on the uncompressed data size.
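The query-pricing rules in note 28 can be turned into a rough estimator: pay only for referenced columns, round up to the nearest MB, a 10 MB minimum per table, $0.005 per GB, and the first 100 GB per month free. This sketch uses the 2014 figures quoted on the slide, which may no longer match current pricing. It handles a single table and treats 1 GB as 1024 MB. The column sizes in the example are hypothetical uncompressed sizes.

```python
import math
import string

# Figures as quoted on the slide (2014); current pricing may differ.
PRICE_PER_GB = 0.005      # USD per GB processed
FREE_GB_PER_MONTH = 100   # first 100 GB processed each month is free
MIN_MB_PER_TABLE = 10     # minimum billed per table referenced
BYTES_PER_MB = 1024 ** 2


def billed_mb(column_bytes, columns_used):
    """Uncompressed bytes in the referenced columns of ONE table, rounded up
    to whole MB, with the 10 MB per-table minimum applied."""
    scanned = sum(column_bytes[c] for c in columns_used)
    return max(MIN_MB_PER_TABLE, math.ceil(scanned / BYTES_PER_MB))


def query_cost(mb_processed, gb_already_used_this_month=0.0):
    """USD cost of one query after whatever is left of the monthly free tier."""
    gb = mb_processed / 1024
    free_left = max(0.0, FREE_GB_PER_MONTH - gb_already_used_this_month)
    return max(0.0, gb - free_left) * PRICE_PER_GB


# Hypothetical 26-column table (a..z), 2 GB of uncompressed data per column.
column_bytes = {name: 2 * 1024 ** 3 for name in string.ascii_lowercase}
# SELECT a, b, f FROM table1 WHERE d > 100 ORDER BY e touches a, b, f, d, e.
used = {"a", "b", "f", "d", "e"}

mb = billed_mb(column_bytes, used)
print(mb)                                                         # 10240
print(f"${query_cost(mb, gb_already_used_this_month=100):.2f}")  # $0.05
print(f"${query_cost(mb):.2f}")                                  # $0.00
```

Scanning only 5 of the 26 columns bills 10 GB instead of the table's 52 GB. The same arithmetic explains why sharding by timestamp helps: a query that reads one day's table scans fewer bytes.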
  29. For customers with consistent or larger workloads, reserved capacity can save as much as 70% off On-Demand Pricing. To sign up for reserved capacity, contact a sales representative.