Big datalab


Published in: Technology, Education


  1. 1. BIGDATA LAB BigQuery & Query Visualization
  2. 2. Outline • BigQuery • BigQuery Visualization • BigData Lab Open Source!
  3. 3. About me: David Chen • Tagtoo CTO • PyCon APAC 2014 PR • Taipei.py co-organizer • GDE • Speaker: PyCon APAC 2014, PyCon 2013, Google Festival, Google Launch Event
  4. 4. Big Query
  5. 5. BigQuery: Big Data Analytics in the cloud • RealTime • BigData • SQL
  6. 6. SQL
  7. 7. Basic Characteristics • Types: STRING, INTEGER, FLOAT, BOOLEAN, TIMESTAMP, RECORD • Schema: supports repeated / nested fields (JSON) • Import / (parallel) export with CSV / JSON • Streaming (real-time insert) • 100,000 rows/s
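
A minimal sketch of the characteristics listed above: a schema with a nested, repeated RECORD field, rows serialized as newline-delimited JSON for import, and a helper that splits rows into batches for streaming inserts. The field names (`user`, `ts`, `events`) and the batch size are assumptions chosen for illustration and do not come from the slides.

```python
import json

# Hypothetical schema for illustration; field names are not from the slides.
# "events" is a RECORD with mode REPEATED, i.e. a nested, repeated field.
SCHEMA = [
    {"name": "user", "type": "STRING", "mode": "REQUIRED"},
    {"name": "ts", "type": "TIMESTAMP", "mode": "NULLABLE"},
    {
        "name": "events",
        "type": "RECORD",
        "mode": "REPEATED",
        "fields": [
            {"name": "kind", "type": "STRING", "mode": "NULLABLE"},
            {"name": "value", "type": "FLOAT", "mode": "NULLABLE"},
        ],
    },
]


def to_ndjson(rows):
    """Serialize rows as newline-delimited JSON, one object per line."""
    return "\n".join(json.dumps(row, sort_keys=True) for row in rows)


def batches(rows, size):
    """Yield successive lists of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


rows = [
    {"user": "alice", "ts": "2014-05-01 12:00:00",
     "events": [{"kind": "click", "value": 1.0},
                {"kind": "view", "value": 0.5}]},
    {"user": "bob", "ts": "2014-05-01 12:00:05", "events": []},
    {"user": "carol", "ts": None,
     "events": [{"kind": "view", "value": 2.0}]},
]

ndjson = to_ndjson(rows)
batch_sizes = [len(batch) for batch in batches(rows, 2)]
```

Each line of `ndjson` is one row, so a file in this form can be split and loaded in parallel; `batches` only illustrates how a client might group rows before sending them.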
  8. 8. Big Join
  9. 9. Nested / Repeated
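
To make the nested / repeated idea concrete, the sketch below expands one row with a repeated field into one flat row per repeated element, which is roughly what flattening a repeated field in a query produces. It is plain Python for illustration only, not BigQuery code; the row contents are hypothetical, and rows whose repeated field is empty are simply dropped here, which may differ from BigQuery's own behavior.

```python
def flatten(row, repeated_field):
    """Return one flat row per element of row[repeated_field].

    Parent fields are copied onto every output row, and nested keys are
    prefixed with the repeated field's name (for example "events.kind").
    """
    parent = {k: v for k, v in row.items() if k != repeated_field}
    out = []
    for item in row.get(repeated_field, []):
        flat = dict(parent)
        for key, value in item.items():
            flat["{}.{}".format(repeated_field, key)] = value
        out.append(flat)
    return out


# Hypothetical row: one user with two nested events.
row = {"user": "alice", "events": [{"kind": "click"}, {"kind": "view"}]}
flat_rows = flatten(row, "events")
```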
  10. 10. Table wildcard / decorators
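
As a sketch of what table wildcards and decorators look like in BigQuery's legacy SQL, the helpers below build the table expressions as strings: `TABLE_DATE_RANGE` selects daily tables that share a prefix, and a decorator such as `@-3600000` or `@-3600000-` restricts a table to a snapshot or to recently added data (times are in milliseconds). The dataset and table names are assumptions made up for this example.

```python
from datetime import date


def table_date_range(prefix, start, end):
    """Wildcard over daily tables named prefix + YYYYMMDD (legacy SQL)."""
    return "TABLE_DATE_RANGE([{}], TIMESTAMP('{}'), TIMESTAMP('{}'))".format(
        prefix, start.isoformat(), end.isoformat())


def snapshot_decorator(table, ms_ago):
    """Table as it existed `ms_ago` milliseconds ago."""
    return "[{}@-{}]".format(table, ms_ago)


def range_decorator(table, ms_ago):
    """Only the data added during the last `ms_ago` milliseconds."""
    return "[{}@-{}-]".format(table, ms_ago)


# Hypothetical names: daily tables mydataset.events_20140325, ... and a
# single table mydataset.events.
wildcard = table_date_range("mydataset.events_",
                            date(2014, 3, 25), date(2014, 3, 27))
last_hour = range_decorator("mydataset.events", 60 * 60 * 1000)
query = "SELECT COUNT(*) FROM " + last_hour
```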
  11. 11. User defined function
  12. 12. Big Query Visualization
  13. 13. BigQuery Taiwan http://littleq0903.github.io/bq-taiwan/
  14. 14. With Google Charts https://gcdc2013-coder.appspot.com/app#
  15. 15. http://googlegeodevelopers.blogspot.tw/2013/09/visualizing-airport-delay-correlations.html BigQuery + Map
  16. 16. http://nbviewer.ipython.org/gist/fhoffa/6459195 BigQuery + IPython Notebook
  17. 17. Even More
 • BigQuery with R: http://thinktostart.wordpress.com/2014/03/10/using-google-bigquery-with-r/
 • BigQuery with Pandas: http://pandas.pydata.org/pandas-docs/stable/io.html#google-bigquery-experimental
 • BigQuery with Hadoop: http://googlecloudplatform.blogspot.tw/2014/04/google-bigquery-and-datastore-connectors-for-hadoop.html
 • Excel Connector: https://developers.google.com/bigquery/bigquery-connector-for-excel
  18. 18. Real Time
  19. 19. BigQuery + Hadoop
  20. 20. https://www.youtube.com/watch?v=yKBHEznag-g#t=231 Live Dashboard
  21. 21. Big Data Lab Open Source
  22. 22. Google Developer Challenge 2013
  23. 23. • AppEngine: manipulate data with MapReduce • Cloud Storage: low-price, highly consistent storage • Predict API*: machine learning in the cloud • BigQuery: ad hoc queries to Google Sheets & visualization • No deployment / configuration needed • Easy to use (for kids) but still powerful • Open source
  24. 24. Big Data Pipeline
  25. 25. [Architecture diagram. Labels: AppEngine Task Client, Pipeline Worker, Virtual Env, MapReduce, GCE Task Client, Hadoop, GCE Task Controller, Cron Tab, Task Graph, Controller UI.] Currently uses Luigi
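
The slide mentions a task graph and says the pipeline currently uses Luigi, which models a pipeline as tasks that declare their dependencies and are run in dependency order. The sketch below imitates that idea with the standard library only; it is not Luigi's API, and the task names (`Extract`, `Transform`, `Load`, `Report`) are hypothetical.

```python
class Task:
    """Minimal task: subclasses declare dependencies in requires()."""

    def requires(self):
        return []

    def run(self, log):
        # A real task would do work here; this one only records its name.
        log.append(type(self).__name__)


class Extract(Task):
    pass


class Transform(Task):
    def requires(self):
        return [Extract()]


class Load(Task):
    def requires(self):
        return [Transform()]


class Report(Task):
    def requires(self):
        # Both dependencies share Transform; it must still run only once.
        return [Load(), Transform()]


def build(task, log, done=None):
    """Run dependencies depth-first, then the task; each task type once."""
    done = set() if done is None else done
    name = type(task).__name__
    if name in done:
        return
    for dep in task.requires():
        build(dep, log, done)
    task.run(log)
    done.add(name)


log = []
build(Report(), log)
```

After the call, `log` holds the task names in the order they ran, with every dependency appearing before the task that requires it.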
  26. 26. • Task Worker: https://github.com/Tagtoo/TaskWorker
 • Predefined Pipeline: https://github.com/Tagtoo/TaskWorker
 • Virtual Package: https://github.com/Tagtoo/BigDataLabWorker
 AppEngine: manipulate data with MapReduce
  27. 27. Reference • https://cloud.google.com/events/google-cloud-platform-live/
