The BI Guy's Little Guide to Big Data

You know Pig is more than a farm animal and that Hive is not some ultra-hip bar. You're beyond the buzzwords and the word count demos. Now you're ready to figure out how it all fits in. In this session we will review common integration scenarios, proven patterns and best practices for integrating Big Data solutions into your existing data warehouse and BI architecture. Learn how you too can ride the Big Data wave without reinventing the wheel, enhancing the information you currently deliver while solving problems that were previously unapproachable.

Published in: Technology

The BI Guy's Little Guide to Big Data

  1. MAKING BUSINESS INTELLIGENT (www.pragmaticworks.com)
     The BI Guy's Little Guide to Big Data
     Chris Price, Senior BI Consultant, @BluewaterSQL
  2. Introductions
       • Chris Price
       • Senior BI Consultant with Pragmatic Works
       • Author
       • Regular Speaker
       • Data Geek & Super Dad!
       • @BluewaterSQL
       • http://bluewatersql.wordpress.com/
       • cprice@pragmaticworks.com
  3. Big Data
       • Data Explosion
           • As recently as 2000, only ¼ of data was digital
               • The rest was on paper, film or other analog media
           • According to IBM, 90% of data was created in the last 2 years
           • Data volume now growing 10% every 5 years
           • Approximately 85% comes from new sources
       • Consumerization
           • 4.3 connected devices per adult
           • 27% use social media input
  4. Big Data
     [Figure: Data Complexity: Variety and Velocity. Scale labels: Megabytes, Gigabytes, Terabytes, Petabytes. Example sources: service logs, spatial & GPS coordinates, data market feeds, eGov feeds, weather, text/image, click stream, wikis/blogs, sensors, RFID/devices, SMS, HD audio/video. Source: Brian Mitchel, TechEd 2013]
  5. Big Data is well…Big
       • Drove $28b in IT investment in 2012
       • Expected to grow to $34b in 2014
       • Challenges:
           • Data Volumes (Hardware/Storage Economics)
           • Data Diversity (Multiple Types & Sources)
           • Data Velocity (Real-Time)
           • User Expectations
       • How do we plan/integrate…
  6. Agenda
       • Hadoop Landscape
       • Current BI/DW Landscape
       • BI/DW & Hadoop Intersection
       • Tools/Techniques/Strategies
  7. Hadoop Ecosystem
  8. Hadoop on Windows
       • HDInsight on Windows Azure
           • Seamlessly scale in the cloud
           • Backed by Azure Storage Vault (ASV)
       • Hortonworks Data Platform (HDP)
           • On-Premise
           • Based on HDFS
  9. Current Landscape
     [Diagram components]
       • Client Tools: Reporting Services / SharePoint / Microsoft Applications
       • BI/DW System: DW, Cubes
       • Data Sources: Traditional Sources (CRM/ERP/LOB/Web)
  10. Future Landscape
     [Diagram components]
       • Client Tools: Reporting Services / SharePoint / Microsoft Applications
       • BI/DW System: DW, Cubes
       • Hadoop
       • Data Sources: Traditional Sources (CRM/ERP/LOB/Web), New Sources (Email / Logs / Social Media / Sensor)
  11. Business Scenario
     [Diagram components]
       • Systems: DW, Cube, Hadoop/HDFS, Reporting Tools, Sensor Data
       • Connections: ODBC, Sqoop, Flume, WebHDFS
  12. What about Azure?
     [Diagram components]
       • Systems: DW, Cube, Hadoop, Reporting Tools, Azure Blob Storage
       • Connections: ODBC, Sqoop, AzCopy
  13. Tools, Techniques & Strategies
       • Enterprise Data Services
           • WebHDFS
           • Sqoop
           • HCatalog
           • Pig/Hive
       • Enterprise Operational Services
           • Oozie
       • Other
           • Windows Azure Blob Storage & AzCopy
           • Hive ODBC
           • PolyBase
  14. WebHDFS
       • Born from HFTP, intended as a replacement
       • Widely used by Yahoo!
       • High performance, first class native protocol using industry standard RESTful mechanism
       • Complete interface for reading, writing & managing files
       • Supports secure authentication
       • Data Locality – requests sent to data nodes
  15. WebHDFS – Get Example
      Request:
          curl -i -L "http://host:port/webhdfs/v1/foo/bar?op=OPEN"
      Response:
          HTTP/1.1 307 TEMPORARY_REDIRECT
          Content-Type: application/octet-stream
          Location: http://datanode:50075/webhdfs/v1/foo/bar?op=OPEN&offset=0
          Content-Length: 0

          HTTP/1.1 200 OK
          Content-Type: application/octet-stream
          Content-Length: 22

          Hello, webhdfs user!
  16. WebHDFS – More Examples
      Rename request:
          curl -i -X PUT "http://host:port/webhdfs/v1/foo/bar?op=RENAME&destination=/foo/bar2"
      Create directory request:
          curl -i -X PUT "http://host:port/webhdfs/v1/foo2?op=MKDIRS"
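The WebHDFS requests on slides 15 and 16 all follow one URL pattern, and the read goes through a redirect from the name node to a data node. The Python sketch below is one way to build those URLs and to pull the data node address out of the 307 response. The helper names (`webhdfs_url`, `redirect_target`, `HTTP_METHOD`) are hypothetical, and the host and port in the usage lines are placeholders; only the operations, paths and port 50075 come from the slides.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# HTTP verb used for each operation on the slides (hypothetical lookup table).
HTTP_METHOD = {"OPEN": "GET", "RENAME": "PUT", "MKDIRS": "PUT"}


def webhdfs_url(host, port, path, op, **params):
    """Build a URL of the form http://host:port/webhdfs/v1/<path>?op=<OP>&..."""
    # safe="/" keeps slashes in values such as destination=/foo/bar2 readable.
    query = urlencode({"op": op, **params}, safe="/")
    return f"http://{host}:{port}/webhdfs/v1/{path.lstrip('/')}?{query}"


def redirect_target(response_headers):
    """Return (host, port, offset) from the Location header of a 307 response."""
    parts = urlsplit(response_headers["Location"])
    offset = int(parse_qs(parts.query).get("offset", ["0"])[0])
    return parts.hostname, parts.port, offset


if __name__ == "__main__":
    # "namenode.example.com" and 50070 are placeholder values for illustration.
    print(webhdfs_url("namenode.example.com", 50070, "/foo/bar", "OPEN"))
    print(webhdfs_url("namenode.example.com", 50070, "/foo/bar", "RENAME",
                      destination="/foo/bar2"))
    print(redirect_target(
        {"Location": "http://datanode:50075/webhdfs/v1/foo/bar?op=OPEN&offset=0"}))
```

The redirect is what gives WebHDFS its data locality: the name node only answers with a location, and the file bytes are then served by the data node named in that header.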
  17. Sqoop
       • Tool designed to efficiently move data between Hadoop (Hive & HBase) and RDBMS
           • Importing (single and all tables)
           • Exporting
           • Eval (Query Execution)
           • Merge (Multiple HDFS datasets)
           • Incremental Imports
       • Generates MapReduce jobs
       • Can control the level of parallelism
  18. Sqoop Demo
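The demo itself is not part of the transcript. As a stand-in, the Python sketch below assembles a `sqoop import` command line that covers two points from the Sqoop slide: controlling parallelism (`--num-mappers`) and incremental imports (`--incremental append` with `--check-column` and `--last-value`). The function name, the JDBC URL, the table and the paths are all hypothetical; running the command assumes a machine with Sqoop and a suitable JDBC driver installed.

```python
import shlex


def sqoop_import_args(jdbc_url, table, target_dir, mappers=4,
                      check_column=None, last_value=None):
    """Return an argv list for a `sqoop import` of one table into HDFS."""
    args = ["sqoop", "import",
            "--connect", jdbc_url,
            "--table", table,
            "--target-dir", target_dir,
            "--num-mappers", str(mappers)]   # level of parallelism
    if check_column is not None:
        if last_value is None:
            raise ValueError("an incremental import needs last_value")
        # Only rows whose check column exceeds last_value are imported.
        args += ["--incremental", "append",
                 "--check-column", check_column,
                 "--last-value", str(last_value)]
    return args


if __name__ == "__main__":
    # Hypothetical source database and table, for illustration only.
    cmd = sqoop_import_args("jdbc:sqlserver://dbhost:1433;databaseName=Sales",
                            "Orders", "/data/orders", mappers=8,
                            check_column="OrderID", last_value=1000)
    print(shlex.join(cmd))
    # To execute for real: subprocess.run(cmd, check=True)
```

Building the arguments as a list rather than a single string avoids shell-quoting problems with the semicolons in the JDBC URL.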
  19. HCatalog/Hive/Pig
       • HCatalog – Metadata & table management
           • Users interact with a set of defined tables
           • Abstracts away the where/how of data storage
           • Allows for consistent access
       • Pig – ETL/Data Transformation Scripting
           • Pig Latin
           • Java User-Defined Functions (Piggybank/DataFu)
       • Hive – SQL-like interface
           • Allows ad-hoc queries for data summarization and analysis
           • ODBC Connector
  20. Demo: Pig & Hive
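The Pig and Hive demo is not captured in the transcript either. To show the kind of work both tools do, here is a plain-Python analogue of a filter, group and aggregate step. It is not Pig Latin or HiveQL, and the sensor data and function name are made up for illustration; the docstring shows roughly what the same summarization might look like as a Hive query over a hypothetical `readings` table.

```python
from collections import defaultdict

# Hypothetical sensor readings: (sensor_id, reading). None marks a bad record.
ROWS = [("s1", 20.0), ("s2", 31.5), ("s1", 22.0), ("s2", None), ("s3", 18.5)]


def average_by_sensor(rows):
    """Drop bad records, group by sensor, and average each group.

    Roughly the shape of a HiveQL summarization such as:
        SELECT sensor_id, AVG(reading)
        FROM readings
        WHERE reading IS NOT NULL
        GROUP BY sensor_id;
    """
    groups = defaultdict(list)
    for sensor_id, reading in rows:
        if reading is not None:                # filter step
            groups[sensor_id].append(reading)  # group step
    # aggregate step: one average per sensor
    return {sensor: sum(vals) / len(vals) for sensor, vals in groups.items()}


if __name__ == "__main__":
    print(average_by_sensor(ROWS))
```

On a cluster, Pig or Hive would compile the equivalent script or query into MapReduce jobs, so the same three steps run in parallel across the data.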
  21. Oozie
       • Scalable, Reliable, Extensible Workflow Management System/Job Scheduler
       • Triggered by:
           • Time
           • Data Availability
       • Can run and orchestrate multiple jobs:
           • MapReduce and Streaming MapReduce
           • Hive
           • Pig
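The two trigger types and the orchestration of several jobs can be illustrated with a small conceptual model. This Python sketch is not Oozie's API (Oozie is configured through XML workflow and coordinator definitions); the function names and dataset paths are hypothetical, and requiring both the time and the data condition is an assumption made here for simplicity.

```python
from datetime import datetime


def ready_to_run(now, scheduled_for, required_inputs, available_inputs):
    """True when the scheduled time has arrived and every required dataset exists."""
    time_ok = now >= scheduled_for
    data_ok = set(required_inputs) <= set(available_inputs)
    return time_ok and data_ok


def run_workflow(actions):
    """Run (name, callable) actions in order; stop at the first one that fails.

    Returns (completed_names, failed_name), with failed_name None on success.
    """
    completed = []
    for name, action in actions:
        if not action():
            return completed, name
        completed.append(name)
    return completed, None


if __name__ == "__main__":
    inputs = ["/data/logs/2013-06-01"]  # hypothetical dataset path
    if ready_to_run(datetime(2013, 6, 1, 2, 0), datetime(2013, 6, 1, 1, 0),
                    inputs, inputs):
        # Stand-ins for a Pig job, a Hive job and a MapReduce job.
        print(run_workflow([("pig", lambda: True),
                            ("hive", lambda: True),
                            ("mapreduce", lambda: True)]))
```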
  22. Windows Azure Blob Storage
       • Also called Azure Storage Vault (ASV)
       • Persistent, highly scalable storage with built-in geo-replication
       • Azure HDInsight clusters are wired for ASV
           • On-Premise HDP uses HDFS
       • Separates data from compute nodes:
           • Clusters can be created and dropped, minimizing costs
           • Multiple clusters can share data
       • The Azure Flat (Quantum 10) mesh grid network is the key
           • Violates the principle of data locality, but out-performs HDFS and Azure competitors
  23. Windows Azure Blob Storage
     [Figure] Source: http://dennyglee.com/2013/03/18/why-use-blob-storage-with-hdinsight-on-azure/
  24. AzCopy
       • Copies files to and from Windows Azure Blob Storage
       • Similar to Robocopy
       • Command-line: /S /V
           • Recursively (/S) copies all files in the Beer directory with Verbose (/V) logging
  25. AzCopy Demo
  26. PolyBase
       • Part of Parallel Data Warehouse, allows integration of relational and non-relational data
       • Creates external tables via an HDFS bridge
       • Allows on-the-fly joins within SQL Server
       • Supports parallel:
           • Imports from HDFS
           • Exports to HDFS
  27. Resources
       • Bloggers:
           • Denny Lee: http://dennyglee.com/
           • Carl Nolan: http://blogs.msdn.com/b/carlnol/archive/tags/hadoop+streaming/
           • Cindy Gross: http://blogs.msdn.com/b/cindygross/archive/tags/big+data/
       • Books:
           • Hadoop: The Definitive Guide - Tom White
           • Programming Pig - Alan Gates
           • Programming Hive - Edward Capriolo
           • Hadoop MapReduce Cookbook - Srinath Perera
       • Links to this Presentation:
           • http://bluewatersql.wordpress.com/resources/
           • http://www.slideshare.net/bluewatersql/big-dataguide
  28. Thank you!
       • @BluewaterSQL
       • http://bluewatersql.wordpress.com/
       • cprice@pragmaticworks.com