MAKING BUSINESS INTELLIGENT
www.pragmaticworks.com
The BI Guy's Little Guide to Big Data
Chris Price
Senior BI Consultant

You know Pig is more than a farm animal and that Hive is not some ultra-hip bar. You've moved beyond the buzzwords and the word-count demos. Now you're ready to figure out how it all fits together. In this session we will review common integration scenarios, proven patterns, and best practices for integrating Big Data solutions into your existing data warehouse and BI architecture. Learn how you too can ride the Big Data wave without reinventing the wheel, both enhancing the information you currently deliver and solving problems that were previously unapproachable.

Introductions
- Chris Price
- Senior BI Consultant with Pragmatic Works
- Author, regular speaker, data geek & super dad!
- @BluewaterSQL
- http://bluewatersql.wordpress.com/
- cprice@pragmaticworks.com

Big Data
- Data explosion
  - As recently as 2000, only ¼ of data was digital
    - The rest was on paper, film, or other analog media
  - According to IBM, 90% of the world's data was created in the last two years
  - Data volume is now growing 10x every 5 years
  - Approximately 85% of it comes from new sources
- Consumerization
  - 4.3 connected devices per adult
  - 27% use social media input

Big Data
Data Complexity: Variety and Velocity
[Figure: data volumes rising from megabytes through gigabytes and terabytes to petabytes as variety and velocity increase. Big Data sources shown: service logs; spatial & GPS coordinates; data market feeds; eGov feeds; weather; text/images; click streams; wikis/blogs; sensors; RFID/devices; SMS; HD audio/video. Source: Brian Mitchel, TechEd 2013]

Big Data Is, Well… Big
- Drove $28B in IT investment in 2012
- Expected to grow to $34B in 2014
- Challenges:
  - Data volumes (hardware/storage economics)
  - Data diversity (multiple types and sources)
  - Data velocity (real time)
  - User expectations
- How do we plan and integrate?

Agenda
- Hadoop landscape
- Current BI/DW landscape
- BI/DW & Hadoop intersection
- Tools/techniques/strategies

Hadoop Ecosystem

Hadoop on Windows
- HDInsight on Windows Azure
  - Seamlessly scales in the cloud
  - Backed by Azure Storage Vault (ASV)
- Hortonworks Data Platform (HDP)
  - On-premises
  - Based on HDFS

Current Landscape
[Diagram layers: Data sources: traditional sources (CRM/ERP/LOB/Web) | BI/DW system: DW, cubes | Client tools: Reporting Services / SharePoint / Microsoft applications]

Future Landscape
[Diagram layers: Data sources: traditional sources (CRM/ERP/LOB/Web) | BI/DW system: DW, cubes | Client tools: Reporting Services / SharePoint / Microsoft applications | New in this view: Hadoop and new sources (email/logs/social media/sensors)]

Business Scenario
[Diagram components: sensor data, Flume, WebHDFS, Hadoop/HDFS, Sqoop, DW, cube, reporting tools, and ODBC connections]

What about Azure?
[Diagram components: AzCopy, Azure Blob Storage, Hadoop, Sqoop, DW, cube, reporting tools, and ODBC connections]

Tools, Techniques & Strategies
- Enterprise data services
  - WebHDFS
  - Sqoop
  - HCatalog
  - Pig/Hive
- Enterprise operational services
  - Oozie
- Other
  - Windows Azure Blob Storage & AzCopy
  - Hive ODBC
  - PolyBase

WebHDFS
- Born from HFTP and intended as its replacement
- Widely used by Yahoo!
- High-performance, first-class native protocol using an industry-standard RESTful mechanism
- Complete interface for reading, writing, and managing files
- Supports secure authentication
- Data locality: requests are redirected to the DataNodes that hold the data

WebHDFS: GET Example
Request:
  curl -i -L "http://host:port/webhdfs/v1/foo/bar?op=OPEN"
Response:
  HTTP/1.1 307 TEMPORARY_REDIRECT
  Content-Type: application/octet-stream
  Location: http://datanode:50075/webhdfs/v1/foo/bar?op=OPEN&offset=0
  Content-Length: 0

  HTTP/1.1 200 OK
  Content-Type: application/octet-stream
  Content-Length: 22

  Hello, webhdfs user!

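The two-step exchange in the GET example (a 307 redirect from the NameNode, then the bytes from a DataNode) is what gives WebHDFS its data locality, and it is why the request uses curl -L. The Python sketch below reproduces that flow offline: a hypothetical stub server on localhost plays both roles (the /datanode path prefix and the file contents are invented for the demo), and the urllib opener follows the 307 the way curl -L does. It illustrates the protocol flow only; it is not a Hadoop client.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlsplit

FILE_BYTES = b"Hello, webhdfs user!\n"  # hypothetical contents of /foo/bar


class StubWebHDFS(BaseHTTPRequestHandler):
    """Stub playing both roles: 'NameNode' (redirects) and 'DataNode' (serves bytes)."""

    def do_GET(self):
        parts = urlsplit(self.path)
        if parts.path.startswith("/datanode/"):
            # DataNode role: return the file contents.
            self.send_response(200)
            self.send_header("Content-Type", "application/octet-stream")
            self.send_header("Content-Length", str(len(FILE_BYTES)))
            self.end_headers()
            self.wfile.write(FILE_BYTES)
        elif parse_qs(parts.query).get("op") == ["OPEN"]:
            # NameNode role: redirect the client to where the data lives.
            host, port = self.server.server_address[:2]
            self.send_response(307)
            self.send_header("Location",
                             f"http://{host}:{port}/datanode{parts.path}?op=OPEN&offset=0")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            self.send_error(400, "unsupported operation")

    def log_message(self, format, *args):
        pass  # silence per-request logging


def webhdfs_open(host, port, path):
    """GET ?op=OPEN; the opener follows the 307 redirect automatically, like curl -L."""
    # ProxyHandler({}) ignores environment proxies so the request stays local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    url = f"http://{host}:{port}/webhdfs/v1{path}?op=OPEN"
    with opener.open(url, timeout=5) as resp:
        return resp.read()


def demo():
    server = HTTPServer(("127.0.0.1", 0), StubWebHDFS)  # port 0 picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        host, port = server.server_address[:2]
        return webhdfs_open(host, port, "/foo/bar")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

Pointed at a real cluster, webhdfs_open would make the same two hops; secured clusters also need authentication, which this sketch leaves out.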
WebHDFS: More Examples
Rename request:
  curl -i -X PUT "http://host:port/webhdfs/v1/foo/bar?op=RENAME&destination=/foo/bar2"
Create directory request:
  curl -i -X PUT "http://host:port/webhdfs/v1/foo2?op=MKDIRS"

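The rename and create-directory calls can also be issued from code. A minimal standard-library Python sketch that builds the same PUT requests; "host" and port 50070 (the usual NameNode web port in Hadoop 1.x/2.x) are placeholders, and nothing is sent until the request is passed to urlopen.

```python
import urllib.request
from urllib.parse import quote, urlencode


def webhdfs_put(host, port, path, op, **params):
    """Build (but do not send) a WebHDFS PUT request such as RENAME or MKDIRS."""
    # safe="/" keeps HDFS paths in query values readable (destination=/foo/bar2)
    query = urlencode({"op": op, **params}, safe="/")
    url = f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"
    return urllib.request.Request(url, method="PUT")


rename = webhdfs_put("host", 50070, "/foo/bar", "RENAME", destination="/foo/bar2")
mkdirs = webhdfs_put("host", 50070, "/foo2", "MKDIRS")

for req in (rename, mkdirs):
    print(req.get_method(), req.full_url)
```

Sent with urllib.request.urlopen against a live NameNode, either request should come back with a small JSON body such as {"boolean": true}.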
Sqoop
- Tool designed to efficiently move data between Hadoop (Hive & HBase) and relational databases (RDBMS)
  - Importing (single tables and all tables)
  - Exporting
  - Eval (query execution)
  - Merge (combining multiple HDFS datasets)
  - Incremental imports
- Generates MapReduce jobs
- Can control the level of parallelism

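An incremental, parallel import is typically a single Sqoop command line. The Python sketch below assembles one as an argument list, which avoids shell-quoting problems such as the semicolon in a SQL Server JDBC URL. The connection string, user, table, column, and HDFS directory are hypothetical. The flags used (--connect, --username, -P, --table, --target-dir, --num-mappers, --incremental, --check-column, --last-value) are standard Sqoop 1 import options.

```python
import shlex


def sqoop_incremental_import(jdbc_url, username, table, target_dir,
                             check_column, last_value, mappers=4):
    """Return a Sqoop 1 'import' command as an argv list (nothing is executed)."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "-P",                                # prompt for the password at run time
        "--table", table,
        "--target-dir", target_dir,          # HDFS directory that receives the files
        "--num-mappers", str(mappers),       # number of parallel map tasks
        "--incremental", "append",           # only pull rows beyond --last-value
        "--check-column", check_column,
        "--last-value", str(last_value),
    ]


cmd = sqoop_incremental_import(
    jdbc_url="jdbc:sqlserver://dwserver:1433;databaseName=SensorDW",  # hypothetical
    username="etl_user",                                              # hypothetical
    table="SensorReadings",                                           # hypothetical
    target_dir="/data/sensor_readings",                               # hypothetical
    check_column="ReadingId",                                         # hypothetical
    last_value=100000,
    mappers=8,
)
print(shlex.join(cmd))  # shell-safe rendering; the JDBC URL gets single-quoted
```

On a machine with Sqoop installed, subprocess.run(cmd, check=True) would launch the import; raising mappers increases parallelism at the cost of more concurrent load on the source database.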
Sqoop Demo

HCatalog/Hive/Pig
- HCatalog: metadata & table management
  - Users interact with a set of defined tables
  - Abstracts away where and how the data is stored
  - Allows for consistent access
- Pig: ETL/data transformation scripting
  - Pig Latin
  - Java user-defined functions (Piggybank/DataFu)
- Hive: SQL-like interface
  - Allows ad hoc queries for data summarization and analysis
  - ODBC connector

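Besides plain HiveQL and Java UDFs, Hive can stream rows through an external script with SELECT TRANSFORM (...) USING '...'. Each row reaches the script on stdin as tab-separated text, and each line the script prints becomes an output row. The following is a minimal Python sketch of such a script, assuming a hypothetical clicks table whose first column is a page URL. It only illustrates the streaming contract; a plain GROUP BY is the idiomatic way to count in Hive.

```python
import sys
from collections import Counter


def page_counts(lines):
    """Count rows per first column; input and output are tab-separated lines."""
    counts = Counter()
    for line in lines:
        page = line.rstrip("\n").split("\t", 1)[0]
        counts[page] += 1
    return [f"{page}\t{n}" for page, n in sorted(counts.items())]


def main():
    # Hive writes rows to the script's stdin and reads result rows from stdout.
    for row in page_counts(sys.stdin):
        print(row)
```

Saved as page_counts.py with a final main() call, the script could be registered with ADD FILE page_counts.py; and invoked as SELECT TRANSFORM (page, user_id) USING 'python page_counts.py' AS (page, hits) FROM clicks;. Without a CLUSTER BY on page, each task counts only its own slice of the rows.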
Demo: Pig & Hive

Oozie
- Scalable, reliable, extensible workflow management system/job scheduler
- Triggered by:
  - Time
  - Data availability
- Can run and orchestrate multiple jobs:
  - MapReduce and Streaming MapReduce
  - Hive
  - Pig

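An Oozie workflow is an XML definition of actions and the transitions between them. Below is a minimal sketch, built with Python's xml.etree.ElementTree, of a workflow that runs a Pig script and then a Hive script. The script names and the ${jobTracker}/${nameNode} properties are placeholders. The schema versions (workflow:0.4, hive-action:0.2) are assumptions to check against the cluster's Oozie release. Time- or data-triggered scheduling would be a separate coordinator definition, which is not shown.

```python
import xml.etree.ElementTree as ET

WF_NS = "uri:oozie:workflow:0.4"       # assumed schema version
HIVE_NS = "uri:oozie:hive-action:0.2"  # assumed schema version


def add_action(app, name, kind, script, ok_to, ns=None):
    """Append an action that runs `script`, then goes to `ok_to` (or 'fail' on error)."""
    action = ET.SubElement(app, "action", name=name)
    job = ET.SubElement(action, kind, {"xmlns": ns} if ns else {})
    ET.SubElement(job, "job-tracker").text = "${jobTracker}"
    ET.SubElement(job, "name-node").text = "${nameNode}"
    ET.SubElement(job, "script").text = script
    ET.SubElement(action, "ok", to=ok_to)
    ET.SubElement(action, "error", to="fail")


def build_workflow(name, pig_script, hive_script):
    app = ET.Element("workflow-app", {"xmlns": WF_NS, "name": name})
    ET.SubElement(app, "start", to="pig-etl")
    add_action(app, "pig-etl", "pig", pig_script, ok_to="hive-load")
    add_action(app, "hive-load", "hive", hive_script, ok_to="end", ns=HIVE_NS)
    kill = ET.SubElement(app, "kill", name="fail")
    ET.SubElement(kill, "message").text = "Failed: ${wf:errorMessage(wf:lastErrorNode())}"
    ET.SubElement(app, "end", name="end")
    return ET.tostring(app, encoding="unicode")


workflow_xml = build_workflow("sensor-etl", "clean_readings.pig", "load_readings.q")
print(workflow_xml)
```

The ok/error transitions are what let Oozie chain the Pig ETL step into the Hive load and route any failure to a single kill node.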
Windows Azure Blob Storage
- Also called Azure Storage Vault (ASV)
- Persistent, highly scalable storage with built-in geo-replication
- Azure HDInsight clusters are wired for ASV
  - On-premises HDP uses HDFS
- Separates data from compute nodes:
  - Clusters can be created and dropped as needed, minimizing costs
  - Multiple clusters can share data
- Azure's flat (Quantum 10) mesh grid network is the key
  - Violates the principle of data locality, but outperforms HDFS and Azure's competitors

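Because the data lives in blob storage rather than on a cluster's own disks, any cluster attached to the storage account can address the same files by URI. This small Python sketch assumes the asv://<container>@<account>.blob.core.windows.net/<path> form used by HDInsight at the time; later releases renamed the scheme wasb://. The container, account, and path names are hypothetical.

```python
from urllib.parse import urlsplit


def asv_uri(container, account, path):
    """Build a blob-storage URI that Hadoop jobs on HDInsight can read or write."""
    return f"asv://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


def parse_asv_uri(uri):
    """Split an asv:// URI back into (container, account, path)."""
    parts = urlsplit(uri)
    if parts.scheme != "asv":
        raise ValueError(f"not an asv URI: {uri}")
    container, _, host = parts.netloc.partition("@")
    account = host.split(".", 1)[0]
    return container, account, parts.path


uri = asv_uri("sensordata", "contosostore", "/raw/2013/06/readings.csv")
print(uri)
print(parse_asv_uri(uri))
```

A string like this is what a Hive, Pig, or MapReduce job would be given as its input or output location. Since it names the storage account rather than a cluster, it stays valid after the cluster that wrote the data has been dropped.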
Windows Azure Blob Storage
Source: http://dennyglee.com/2013/03/18/why-use-blob-storage-with-hdinsight-on-azure/

AzCopy
- Copies files to and from Windows Azure Blob Storage
- Similar to Robocopy
- Command-line options /S /V: recursively (/S) copies all files in the Beer directory with verbose (/V) logging

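Blob storage has a flat namespace: a "directory" is just a / inside a blob name. A recursive copy such as the /S example therefore has to walk the local tree and turn each file's relative path into a blob name. Below is a minimal Python illustration of that mapping. It is not AzCopy's own code, and the Beer folder layout is hypothetical.

```python
import tempfile
from pathlib import Path


def blob_names(local_root, recursive=True):
    """Map files under local_root to blob names; recursive=True mimics /S."""
    root = Path(local_root)
    files = root.rglob("*") if recursive else root.glob("*")
    # as_posix() turns Windows '\' separators into the '/' used in blob names
    return sorted(p.relative_to(root).as_posix() for p in files if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    beer = Path(tmp, "Beer")
    (beer / "ales").mkdir(parents=True)
    (beer / "ales" / "ipa.csv").write_text("name,abv\n")
    (beer / "stouts.csv").write_text("name,abv\n")
    print(blob_names(beer))                   # ['ales/ipa.csv', 'stouts.csv']
    print(blob_names(beer, recursive=False))  # ['stouts.csv']
```

Without /S, only the top-level files would be copied, which is why the example pairs /S with the Beer directory.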
AzCopy Demo

PolyBase
- Part of Parallel Data Warehouse (PDW); allows integration of relational and non-relational data
- Creates external tables via an HDFS bridge
- Allows on-the-fly joins within SQL Server
- Supports parallel:
  - Imports from HDFS
  - Exports to HDFS

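PolyBase itself is used through T-SQL inside PDW, so the following is only an analogy. In this Python sketch, the standard-library sqlite3 module stands in for the relational engine, and a pipe-delimited string stands in for a file in HDFS. It shows the idea of exposing delimited files as a table and joining them with relational data in one query. Unlike PolyBase, it simply loads the rows into a temporary table. All table, column, and data values are hypothetical.

```python
import csv
import io
import sqlite3

# Stand-in for a pipe-delimited file in HDFS: sensor_id|temperature
HDFS_TEXT = "1|72.5\n2|68.1\n1|74.0\n"


def join_relational_and_file(conn, delimited_text):
    """Expose delimited rows as a table, then join them with relational data."""
    conn.execute("CREATE TEMP TABLE ext_readings (sensor_id INTEGER, temperature REAL)")
    rows = csv.reader(io.StringIO(delimited_text), delimiter="|")
    conn.executemany("INSERT INTO ext_readings VALUES (?, ?)", rows)
    return conn.execute(
        """SELECT s.location, AVG(r.temperature)
           FROM sensors AS s
           JOIN ext_readings AS r ON r.sensor_id = s.sensor_id
           GROUP BY s.location
           ORDER BY s.location"""
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sensors (sensor_id INTEGER PRIMARY KEY, location TEXT)")
conn.executemany("INSERT INTO sensors VALUES (?, ?)", [(1, "Dock"), (2, "Freezer")])
print(join_relational_and_file(conn, HDFS_TEXT))
```

The point of PolyBase is that the equivalent of ext_readings stays in HDFS and is read in parallel at query time, rather than being copied in first as this sketch does.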
Resources
- Bloggers:
  - Denny Lee: http://dennyglee.com/
  - Carl Nolan: http://blogs.msdn.com/b/carlnol/archive/tags/hadoop+streaming/
  - Cindy Gross: http://blogs.msdn.com/b/cindygross/archive/tags/big+data/
- Books:
  - Hadoop: The Definitive Guide by Tom White
  - Programming Pig by Alan Gates
  - Programming Hive by Edward Capriolo
  - Hadoop MapReduce Cookbook by Srinath Perera
- Links to this presentation:
  - http://bluewatersql.wordpress.com/resources/
  - http://www.slideshare.net/bluewatersql/big-dataguide

Thank you!
@BluewaterSQL
http://bluewatersql.wordpress.com/
cprice@pragmaticworks.com