The BI Guy's Little Guide to Big Data

You know Pig is more than a farm animal and that Hive is not some ultra-hip bar. You've moved beyond the buzzwords and the word-count demos. Now you're ready to figure out how it all fits in. In this session we will review common integration scenarios, proven patterns and best practices for integrating Big Data solutions into your existing data warehouse and BI architecture. Learn how you too can ride the Big Data wave without reinventing the wheel, enhancing the information you currently deliver while solving problems that were previously unapproachable.

Published in: Technology


Transcript

  • 1. MAKING BUSINESS INTELLIGENT www.pragmaticworks.com The BI Guy's Little Guide to Big Data. Chris Price, Senior BI Consultant, @BluewaterSQL
  • 2. Introductions
      • Chris Price
      • Senior BI Consultant with Pragmatic Works
      • Author
      • Regular Speaker
      • Data Geek & Super Dad!
      • @BluewaterSQL
      • http://bluewatersql.wordpress.com/
      • cprice@pragmaticworks.com
  • 3. Big Data
      • Data Explosion
          • As recently as 2000, only ¼ of data was digital; the rest was on paper, film or other analog media
          • According to IBM, 90% of data was created in the last 2 years
          • Data volume is now growing 10x every 5 years
          • Approximately 85% comes from new sources
      • Consumerization
          • 4.3 connected devices per adult
          • 27% use social media input
  • 4. Big Data [Figure: "Data Complexity: Variety and Velocity". Volume scale: Megabytes, Gigabytes, Terabytes, Petabytes. Big Data sources shown: service logs, spatial & GPS coordinates, data market feeds, eGov feeds, weather, text/image, click stream, wikis/blogs, sensors, RFID/devices, SMS, HD audio/video. Source: Brian Mitchel, TechEd 2013]
  • 5. Big Data is well…Big
      • Drove $28b in IT investment in 2012
      • Expected to grow to $34b in 2014
      • Challenges:
          • Data Volumes (Hardware/Storage Economics)
          • Data Diversity (Multiple Types & Sources)
          • Data Velocity (Real-Time)
          • User Expectations
      • How do we plan/integrate…
  • 6. Agenda
      • Hadoop Landscape
      • Current BI/DW Landscape
      • BI/DW & Hadoop Intersection
      • Tools/Techniques/Strategies
  • 7. Hadoop Ecosystem
  • 8. Hadoop on Windows
      • HDInsight on Windows Azure
          • Seamlessly scale in the cloud
          • Backed by Azure Storage Vault (ASV)
      • Hortonworks Data Platform (HDP)
          • On-premise
          • Based on HDFS
  • 9. Current Landscape [Diagram layers: Data Sources: traditional sources (CRM/ERP/LOB/Web). BI/DW System: DW, Cubes. Client Tools: Reporting Services / SharePoint / Microsoft Applications]
  • 10. Future Landscape [Diagram layers: Data Sources: traditional sources (CRM/ERP/LOB/Web) and new sources (Email / Logs / Social Media / Sensor). BI/DW System: DW, Cubes, Hadoop. Client Tools: Reporting Services / SharePoint / Microsoft Applications]
  • 11. Business Scenario [Diagram components: DW, Cube, Hadoop/HDFS, Reporting Tools, Sensor Data. Connectors shown: ODBC (three connections), Sqoop, Flume, WebHDFS]
  • 12. What about Azure? [Diagram components: DW, Cube, Hadoop, Reporting Tools, Azure Blob Storage. Connectors shown: ODBC (three connections), Sqoop, AzCopy]
  • 13. Tools, Techniques & Strategies
      • Enterprise Data Services
          • WebHDFS
          • Sqoop
          • HCatalog
          • Pig/Hive
      • Enterprise Operational Services
          • Oozie
      • Other
          • Windows Azure Blob Storage & AzCopy
          • Hive ODBC
          • PolyBase
  • 14. WebHDFS
      • Born from HFTP, intended as a replacement
      • Widely used by Yahoo!
      • High-performance, first-class native protocol using an industry-standard RESTful mechanism
      • Complete interface for reading, writing & managing files
      • Supports secure authentication
      • Data locality: requests are sent to data nodes
  • 15. WebHDFS – Get Example
      Request:
          curl -i -L "http://host:port/webhdfs/v1/foo/bar?op=OPEN"
      Response:
          HTTP/1.1 307 TEMPORARY_REDIRECT
          Content-Type: application/octet-stream
          Location: http://datanode:50075/webhdfs/v1/foo/bar?op=OPEN&offset=0
          Content-Length: 0

          HTTP/1.1 200 OK
          Content-Type: application/octet-stream
          Content-Length: 22

          Hello, webhdfs user!
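The 307 response in this example is where data locality shows up: the name node answers with a `Location` header pointing at a data node, and `curl -L` follows it. As a rough sketch of that step, the hypothetical helper below (`redirect_target` is not part of WebHDFS or curl) pulls the redirect target out of raw response headers, using the sample response from the slide as input.

```shell
#!/bin/sh
# Hypothetical helper: read raw HTTP response headers on stdin and
# print the value of the first Location header (the data node URL).
redirect_target() {
  tr -d '\r' | awk 'tolower($1) == "location:" { print $2; exit }'
}

# Sample 307 response copied from the slide.
sample_response='HTTP/1.1 307 TEMPORARY_REDIRECT
Content-Type: application/octet-stream
Location: http://datanode:50075/webhdfs/v1/foo/bar?op=OPEN&offset=0
Content-Length: 0'

printf '%s\n' "$sample_response" | redirect_target
```

A client that does not follow redirects automatically would issue a second GET against the printed URL to read the file contents.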
  • 16. WebHDFS – More Examples
      Rename Request:
          curl -i -X PUT "http://host:port/webhdfs/v1/foo/bar?op=RENAME&destination=/foo/bar2"
      Create Directory Request:
          curl -i -X PUT "http://host:port/webhdfs/v1/foo2?op=MKDIRS"
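The WebHDFS requests in these slides all share one URL shape: `/webhdfs/v1`, the HDFS path, an `op` parameter, then any extra parameters. A minimal sketch of a URL builder follows; the function name `webhdfs_url`, the host name `namenode` and port `50070` are assumptions for illustration, not values from the deck.

```shell
#!/bin/sh
# Hypothetical helper: build a WebHDFS REST URL.
# Usage: webhdfs_url HOST PORT HDFS_PATH OP [key=value ...]
# The body runs in a subshell, so its variables do not leak to the caller.
webhdfs_url() (
  host=$1; port=$2; path=$3; op=$4
  shift 4
  url="http://${host}:${port}/webhdfs/v1${path}?op=${op}"
  # Append any extra query parameters, each joined with "&".
  for param in "$@"; do
    url="${url}&${param}"
  done
  printf '%s\n' "$url"
)

# Print the URLs for the three operations shown in the slides.
webhdfs_url namenode 50070 /foo/bar OPEN
webhdfs_url namenode 50070 /foo/bar RENAME destination=/foo/bar2
webhdfs_url namenode 50070 /foo2 MKDIRS
```

When passing such a URL to curl, for example `curl -i -X PUT "$(webhdfs_url namenode 50070 /foo2 MKDIRS)"`, keep it in double quotes. An unquoted `&` is treated by the shell as the background operator, so everything after it would be cut off from the request.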
  • 17. Sqoop
      • Tool designed to efficiently move data between Hadoop (Hive & HBase) and RDBMS
          • Importing (single and all tables)
          • Exporting
          • Eval (Query Execution)
          • Merge (Multiple HDFS datasets)
          • Incremental Imports
      • Generates MapReduce jobs
      • Can control the level of parallelism
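To make the incremental import and parallelism points concrete, here is a sketch that only prints a Sqoop command rather than running it. The server, database, user, table and column names are all hypothetical placeholders; the flags themselves (`--num-mappers`, `--split-by`, `--incremental append`, `--check-column`, `--last-value`) are standard Sqoop 1 import options.

```shell
#!/bin/sh
# Print (but do not run) a Sqoop incremental import command.
# All connection details and object names are hypothetical placeholders.
sqoop_import_command() (
  table=$1
  target_dir=$2
  mappers=$3
  last_value=$4
  cat <<EOF
sqoop import \\
  --connect 'jdbc:sqlserver://dwserver:1433;databaseName=SalesDW' \\
  --username etl_user -P \\
  --table ${table} \\
  --target-dir ${target_dir} \\
  --split-by SalesKey \\
  --num-mappers ${mappers} \\
  --incremental append \\
  --check-column SalesKey \\
  --last-value ${last_value}
EOF
)

sqoop_import_command FactSales /data/raw/fact_sales 4 0
```

Here `--num-mappers` sets how many parallel map tasks read from the database, `--split-by` names the column used to divide the rows among them, and the three incremental options restrict the import to rows whose check column exceeds the last imported value.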
  • 18. Sqoop Demo
  • 19. HCatalog/Hive/Pig
      • HCatalog – Metadata & table management
          • Users interact with a set of defined tables
          • Abstracts away the where/how of data storage
          • Allows for consistent access
      • Pig – ETL/Data Transformation Scripting
          • Pig Latin
          • Java User-Defined Functions (Piggybank/DataFu)
      • Hive – SQL-like interface
          • Allows ad-hoc queries for data summarization and analysis
          • ODBC Connector
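As an illustration of the kind of ad-hoc summarization Hive is described as supporting, the sketch below writes a small HiveQL script to a temporary file and prints it. The table `sensor_readings` and its columns are hypothetical; nothing here comes from the deck's demo.

```shell
#!/bin/sh
# Write an illustrative HiveQL script to a temporary file.
# Table and column names are hypothetical.
hql_file=$(mktemp)
cat > "$hql_file" <<'EOF'
-- Summarize sensor readings per device and day.
SELECT device_id,
       to_date(reading_time) AS reading_date,
       COUNT(*)              AS readings,
       AVG(temperature)      AS avg_temperature
FROM sensor_readings
GROUP BY device_id, to_date(reading_time);
EOF

cat "$hql_file"
# On a machine with the Hive CLI installed, the script could be run with:
#   hive -f "$hql_file"
```

The same query could be issued from a reporting tool through the Hive ODBC connector, which is what lets existing BI clients treat Hive tables much like relational ones.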
  • 20. Demo: Pig & Hive
  • 21. Oozie
      • Scalable, reliable, extensible workflow management system/job scheduler
      • Triggered by:
          • Time
          • Data Availability
      • Can run and orchestrate multiple jobs:
          • MapReduce and Streaming MapReduce
          • Hive
          • Pig
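To show what orchestrating multiple jobs can look like, the sketch below writes a minimal Oozie workflow definition that chains a Pig action into a Hive action. The workflow name, action names and script file names are hypothetical, and the schema versions are one plausible choice rather than a requirement. The time and data-availability triggers would live in a separate coordinator definition, which is not shown.

```shell
#!/bin/sh
# Write an illustrative Oozie workflow definition to a temporary file.
# Workflow, action and script names are hypothetical placeholders.
# The quoted 'EOF' keeps ${...} literal so Oozie, not the shell, resolves it.
workflow_file=$(mktemp)
cat > "$workflow_file" <<'EOF'
<workflow-app name="sensor-etl" xmlns="uri:oozie:workflow:0.4">
  <start to="clean-with-pig"/>
  <action name="clean-with-pig">
    <pig>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <script>clean_sensor_data.pig</script>
    </pig>
    <ok to="summarize-with-hive"/>
    <error to="fail"/>
  </action>
  <action name="summarize-with-hive">
    <hive xmlns="uri:oozie:hive-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <script>summarize_sensor_data.hql</script>
    </hive>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Workflow failed at [${wf:lastErrorNode()}]</message>
  </kill>
  <end name="end"/>
</workflow-app>
EOF

cat "$workflow_file"
```

Each action names where to go on success (`ok`) and on failure (`error`), which is how the workflow expresses ordering between the Pig and Hive steps.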
  • 22. Windows Azure Blob Storage
      • Also called Azure Storage Vault (ASV)
      • Persistent, highly scalable storage with built-in geo-replication
      • Azure HDInsight clusters are wired for ASV
      • On-premise HDP uses HDFS
      • Separates data from compute nodes:
          • Clusters can be created and dropped, minimizing costs
          • Multiple clusters can share data
      • The Azure Flat (Quantum 10) mesh grid network is the key
      • Violates the principle of data locality, but out-performs HDFS and Azure competitors
  • 23. Windows Azure Blob Storage. Source: http://dennyglee.com/2013/03/18/why-use-blob-storage-with-hdinsight-on-azure/
  • 24. AzCopy
      • Copies files to and from Windows Azure Blob Storage
      • Similar to Robocopy
      • Command-line: /S /V
          • Recursively (/S) copies all files in the Beer directory with Verbose (/V) logging
  • 25. AzCopy Demo
  • 26. PolyBase
      • Part of Parallel Data Warehouse; allows integration of relational and non-relational data
      • Creates external tables via an HDFS bridge
      • Allows on-the-fly joins within SQL Server
      • Supports parallel:
          • Imports from HDFS
          • Exports to HDFS
  • 27. Resources
      • Bloggers:
          • Denny Lee: http://dennyglee.com/
          • Carl Nolan: http://blogs.msdn.com/b/carlnol/archive/tags/hadoop+streaming/
          • Cindy Gross: http://blogs.msdn.com/b/cindygross/archive/tags/big+data/
      • Books:
          • Hadoop: The Definitive Guide - Tom White
          • Programming Pig - Alan Gates
          • Programming Hive - Edward Capriolo
          • Hadoop MapReduce Cookbook - Srinath Perera
      • Links to this Presentation:
          • http://bluewatersql.wordpress.com/resources/
          • http://www.slideshare.net/bluewatersql/big-dataguide
  • 28. Thank you! @BluewaterSQL http://bluewatersql.wordpress.com/ cprice@pragmaticworks.com