Exploring BigData with Google BigQuery


An introduction to Google BigQuery. I presented this talk at Google DevFest 2014, Mumbai.

Published in: Technology

  1. Dharmesh Vaya @DRVaya
  2. Agenda
     ● What is Big Data?
     ● Available Big Data Solutions & Issues
     ● Why Google BigQuery?
     ● Inside BigQuery
     ● Features & Components
     ● RESTful API
     ● Development with BigQuery (Live Demo)
       ○ Query History, Projects, DataSets, Public Datasets, Table Details, Writing Queries, Save Results
       ○ Integration with Applications
     ● BigQuery Tools
     ● Big Data Solution with BigQuery & Google Cloud Platform
     ● Pricing Model
     ● Any questions?
  3. What is Big Data? Is it a data type? No. It's a buzzword for a massive volume of structured and/or unstructured data, so large that it is difficult to process or analyze using traditional databases.
  4. What is Big Data? Data that has the following attributes can be ‘Big Data’
  5. So how Big is B - I - G?
  6. So how Big is B - I - G? Library of Congress - Textual Data: 20 Terabytes (20 000 000 000 000 bytes)
  7. So how Big is B - I - G? - Inventory & Customer Data: 42 Terabytes (42 000 000 000 000 bytes)
  8. So how Big is B - I - G? - Media Data: 100+ Terabytes (100 000 000 000 000 bytes)
  9. So how Big is B - I - G? - Search, Mail, Media & anything you can think of!! 850+ Terabytes (850 000 000 000 000 bytes) (Speculated Figures)
  10. So how Big is B - I - G? World Data Center for Climate - Meteorology Data: 6.2 Petabytes (6 200 000 000 000 000 bytes)
  11. Available Big Data Solutions & Issues
      - Highly scalable, distributed computing.
      - Storage (HDFS) optimized for high throughput.
      - Security is disabled by default.
      - MapReduce is batch based, hence no real-time operations.
      - Costly to maintain.

      - Highly scalable; claims to handle petabytes.
      - Elastic set of resources to return result sets.
      - Almost 10x faster than Hadoop.
      - High costs of data migration and integration.
      - Operations/maintenance costs may shoot up.
  12. Why Google BigQuery? Hadoop (with Hive), Amazon Redshift, Google BigQuery = 1.4 TB. On average, it's within 8-10 seconds!!
  13. Inside Google BigQuery
      ● BigQuery is based on Dremel, a technology pioneered by Google and used extensively within the company.
      ● It uses columnar storage and multi-level execution trees to achieve interactive performance for queries against multi-terabyte datasets.
      ● BigQuery's performance advantage comes from its parallel processing architecture.
      ● The query is processed by thousands of servers in a multi-level execution tree structure, with the final results aggregated at the root. BigQuery stores the data in a columnar format so that only data from the columns being queried are read.
      ● All this and more is now available as a public service for any business or developer to use. This release made it possible for those outside of Google to utilize the power of Dremel for their Big Data processing requirements.
  14. Columnar Storage & Trees
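The two ideas on this slide can be illustrated with a toy sketch. It is not how Dremel or BigQuery is implemented; it only shows, under simplifying assumptions, that a column-oriented layout lets a query touch just the columns it names, and that partial results can be combined level by level until a single value reaches the root. The table contents, the function names (`scan`, `leaf_sum`, `tree_sum`) and the `fan_in` parameter are all hypothetical.

```python
from typing import Dict, List, Tuple

# Toy table stored column by column (hypothetical data, not from the talk).
COLUMNS: Dict[str, List] = {
    "city": ["Mumbai", "Pune", "Mumbai", "Delhi", "Pune", "Mumbai"],
    "fare": [120, 80, 200, 150, 60, 90],
    "notes": ["a", "b", "c", "d", "e", "f"],  # never read by the query below
}


def scan(column_names: List[str], start: int, end: int) -> Dict[str, List]:
    """Leaf step: read only the requested columns for one slice of rows."""
    return {name: COLUMNS[name][start:end] for name in column_names}


def leaf_sum(start: int, end: int) -> int:
    """Each leaf computes a partial SUM(fare) over its own row range."""
    return sum(scan(["fare"], start, end)["fare"])


def tree_sum(row_ranges: List[Tuple[int, int]], fan_in: int = 2) -> int:
    """Merge partial sums in groups of `fan_in` until one value is left (the root)."""
    partials = [leaf_sum(start, end) for start, end in row_ranges]
    while len(partials) > 1:
        partials = [sum(partials[i:i + fan_in])
                    for i in range(0, len(partials), fan_in)]
    return partials[0]


# Three "leaf servers", each responsible for two rows.
total = tree_sum([(0, 2), (2, 4), (4, 6)])
print(total)  # 700
```

In this sketch the `city` and `notes` columns are never accessed, which is the point of the columnar layout: the work done depends on the columns queried, not on the full width of the table.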
  15. Inside Google BigQuery
      "Hey, you mentioned Dremel, isn't MapReduce based on it?" There's a difference:
      ● Dremel is designed as an interactive data analysis tool for large datasets.
      ● MapReduce is designed as a programming framework to batch process large datasets.
  16. Features & Components
      Features:
      ● Web GUI for BigQuery
      ● Affordable
      ● Run in Background
      ● Easy Data Importation
      ● Flexible (Addition of Columns, Native Support for Timestamp Type of Data)
      ● REST API Support
      ● More than just Standard SQL
      Components:
      ● Project
      ● Tables
      ● DataSets
      ● Jobs
  17. RESTful API: For Datasets
      Method   HTTP Request
      delete   DELETE /projects/projectId/datasets/datasetId
      get      GET /projects/projectId/datasets/datasetId
      insert   POST /projects/projectId/datasets
      list     GET /projects/projectId/datasets
      patch    PATCH /projects/projectId/datasets/datasetId
      update   PUT /projects/projectId/datasets/datasetId
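As a rough illustration of the dataset methods listed on this slide, the sketch below turns a method name into an HTTP verb and a full URL. The base URL `https://www.googleapis.com/bigquery/v2` is an assumption (the slide only shows relative paths), and `DATASET_ROUTES` and `dataset_request` are hypothetical helper names, not part of any Google client library. No request is sent.

```python
# Assumed base URL for the BigQuery v2 REST API; the slide shows only relative paths.
BASE_URL = "https://www.googleapis.com/bigquery/v2"

# Method name -> (HTTP verb, path template), as listed on the slide.
DATASET_ROUTES = {
    "delete": ("DELETE", "/projects/{projectId}/datasets/{datasetId}"),
    "get": ("GET", "/projects/{projectId}/datasets/{datasetId}"),
    "insert": ("POST", "/projects/{projectId}/datasets"),
    "list": ("GET", "/projects/{projectId}/datasets"),
    "patch": ("PATCH", "/projects/{projectId}/datasets/{datasetId}"),
    "update": ("PUT", "/projects/{projectId}/datasets/{datasetId}"),
}


def dataset_request(method: str, **ids: str) -> tuple:
    """Return (verb, url) for a dataset method; raises KeyError for unknown names or missing ids."""
    verb, template = DATASET_ROUTES[method]
    return verb, BASE_URL + template.format(**ids)


# Hypothetical project and dataset names, for illustration only.
print(dataset_request("get", projectId="my-project", datasetId="sales"))
print(dataset_request("list", projectId="my-project"))
```

Keeping the routes in one table makes the symmetry visible: collection-level methods (`insert`, `list`) act on `/datasets`, while the rest act on a single `datasetId`.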
  18. RESTful API: For Jobs
      Method            HTTP Request
      get               GET /projects/projectId/jobs/jobId
      getQueryResults   GET /projects/projectId/queries/jobId
      insert            POST https://www.googleapis.com/upload/bigquery/v2/projects/projectId/jobs and POST /projects/projectId/jobs
      list              GET /projects/projectId/jobs
      query             POST /projects/projectId/queries
      Similar methods for:
      ● Projects
      ● Tables
      ● TableData
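To make the `query` method more concrete, here is a minimal sketch that builds (but does not send) a `POST /projects/projectId/queries` request with Python's standard library. The base URL, the project name, the SQL text and the token are placeholders chosen for illustration; `build_query_request` is a hypothetical helper. The body fields `query` and `timeoutMs` are taken from the v2 API as I understand it, so check them against the current reference before relying on them.

```python
import json
import urllib.request

# Assumed base URL for the BigQuery v2 REST API.
BASE_URL = "https://www.googleapis.com/bigquery/v2"


def build_query_request(project_id: str, sql: str, access_token: str,
                        timeout_ms: int = 10000) -> urllib.request.Request:
    """Build a synchronous query request; nothing is sent over the network here."""
    body = {"query": sql, "timeoutMs": timeout_ms}
    return urllib.request.Request(
        url=f"{BASE_URL}/projects/{project_id}/queries",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # OAuth 2.0 bearer token; obtaining one is outside this sketch.
            "Authorization": f"Bearer {access_token}",
        },
        method="POST",
    )


# Placeholder values, for illustration only.
request = build_query_request("my-project", "SELECT 17 AS answer", "PLACEHOLDER_TOKEN")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would execute it, provided the token is valid. If the query does not finish within the timeout, the job can be polled with `getQueryResults` from the table above.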
  19. Demo using Web Interface
  20. Demo: Excel Connector
  21. BigQuery Tools
      ● BigQuery Excel Connector
      ● bq Command Line
      ● BigQuery Browser Tool
      ● Visualization & BI Tools
      ● ETL Tools
      ● ODBC Connector
  22. Big Data Solution with BigQuery
  23. Big Data Solution with BigQuery
      Data Pipeline - transforming and loading data into BigQuery: The process of using the Google Cloud Platform to upload data into BigQuery involves uploading the CSV files or JavaScript Object Notation (JSON) files to Google Cloud Storage before loading the data into BigQuery. Alternatively, the REST API can also be used to provide programmatic integration into the current computing environment.
      Data Visualization - performing data analysis on BigQuery and visualizing the results: A custom, web-based dashboard can be built on Google App Engine, using the BigQuery REST API to execute the queries and Google Chart Tools to visualize the results.
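The load step of the pipeline described above can be sketched as the JSON body of a load job, which would be submitted with the jobs `insert` method. This is only an outline under assumptions: the bucket, project, dataset, table and column names are invented, `build_load_job` is a hypothetical helper, and the field names (`configuration.load.sourceUris`, `sourceFormat`, `schema.fields`, `destinationTable`) follow the v2 API as I understand it.

```python
import json
from typing import Dict, List


def build_load_job(project_id: str, dataset_id: str, table_id: str,
                   gcs_uri: str, schema_fields: List[Dict[str, str]],
                   source_format: str = "CSV") -> Dict:
    """Build the request body for a job that loads a Cloud Storage file into a table."""
    return {
        "configuration": {
            "load": {
                # File already uploaded to Google Cloud Storage (step one of the pipeline).
                "sourceUris": [gcs_uri],
                # "CSV" or "NEWLINE_DELIMITED_JSON".
                "sourceFormat": source_format,
                "schema": {"fields": schema_fields},
                "destinationTable": {
                    "projectId": project_id,
                    "datasetId": dataset_id,
                    "tableId": table_id,
                },
            }
        }
    }


# Hypothetical names, for illustration only.
job = build_load_job(
    "my-project", "demo_dataset", "rides",
    "gs://example-bucket/rides.csv",
    [{"name": "pickup_time", "type": "TIMESTAMP"},
     {"name": "fare", "type": "FLOAT"}],
)
print(json.dumps(job, indent=2))
```

For JSON input, the same helper would be called with `source_format="NEWLINE_DELIMITED_JSON"`, which expects one JSON object per line.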
  24. Pricing Model
      Action           Example
      Loading Data     Loading files/data into BigQuery
      Exporting Data   Exporting data, saving results from BigQuery
      Table Reads      Browsing through data
      Table Copies     Copy existing table to new table

      Storage
      Action              Cost
      Storage             $0.020 per GB, per month
      Streaming Inserts   Free until January 1, 2015. After January 1, 2015, $0.01 per 100,000 rows

      Query
      Pricing                               Cost
      On-demand                             $5 per TB
      Reserved Capacity (5 GB per second)   $20k/month

      Wow, that's like 800 MB for 1 Rupee; even the Internet ain't that cheap here.
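The prices on this slide translate into simple arithmetic, sketched below. The rates are the 2014 figures quoted above. The exchange rate of 60 rupees per US dollar is my assumption (roughly the 2014 rate, not stated on the slide), and the function names are hypothetical. Under that assumption, the "800 MB for 1 Rupee" remark appears to match the storage price rather than the query price.

```python
# Rates as quoted on the slide (2014 figures).
STORAGE_USD_PER_GB_MONTH = 0.020
QUERY_USD_PER_TB = 5.0
STREAMING_USD_PER_100K_ROWS = 0.01  # applies after January 1, 2015

# Assumption: rough 2014 exchange rate, not taken from the slides.
INR_PER_USD = 60.0


def storage_cost_usd(gigabytes: float, months: float = 1.0) -> float:
    """Cost of keeping `gigabytes` of data stored for `months` months."""
    return gigabytes * STORAGE_USD_PER_GB_MONTH * months


def query_cost_usd(terabytes_processed: float) -> float:
    """On-demand cost of queries that process `terabytes_processed` TB."""
    return terabytes_processed * QUERY_USD_PER_TB


def streaming_cost_usd(rows: int) -> float:
    """Cost of streaming `rows` rows into a table."""
    return rows / 100_000 * STREAMING_USD_PER_100K_ROWS


def storage_mb_per_rupee() -> float:
    """How many MB (1 GB = 1000 MB) one rupee stores for a month."""
    usd_per_rupee = 1.0 / INR_PER_USD
    return usd_per_rupee / STORAGE_USD_PER_GB_MONTH * 1000


print(f"1 TB stored for a month: ${storage_cost_usd(1000):.2f}")
print(f"Query processing 1.4 TB: ${query_cost_usd(1.4):.2f}")
print(f"1 million streamed rows: ${streaming_cost_usd(1_000_000):.2f}")
print(f"Storage per rupee:       {storage_mb_per_rupee():.0f} MB per month")
```

With a different exchange rate the last figure shifts proportionally, so it should be read as an order-of-magnitude check rather than a quote.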
  25. Where to use?
      ● Not a replacement for traditional systems, but it complements the ecosystem!!
      ● Major strength is handling large datasets
      ● Major usage in data analytics
      ● Important component of Google Cloud Platform
      ● People are interested in numbers/data, and they want them quickly...
      Google BigQuery is the future of Analytics!!
  26. Any questions? What we covered ...
      ✓ What is Big Data?
      ✓ Available Big Data Solutions & Issues
      ✓ Why Google BigQuery?
      ✓ Features, Components & Tools
      ✓ RESTful API
      ✓ Demo using Web Interface
      ✓ BigQuery Tools
      ✓ Big Data Solution with BigQuery
      ✓ Pricing Model
      ✓ Usage
  27. No registration, just sign in with your Google account.
      About the presenter: Follow Dharmesh Vaya on @DRVaya or subscribe to my
      You can also add me on +DharmeshVaya
  28. http://www.reddit.com/r/bigquery/comments/28ialf/173_million_2013_nyc_taxi_rides_shared_on_bigquery/