The BI Guy's Little Guide to Big Data

You know Pig is more than a farm animal and that Hive is not some ultra-hip bar. You've moved beyond the buzzwords and the word count demos. Now you're ready to figure out how it all fits in. In this session we will review common integration scenarios, proven patterns and best practices for integrating Big Data solutions into your existing data warehouse and BI architecture. Learn how you too can ride the Big Data wave without reinventing the wheel, both enhancing the information you currently deliver and solving problems that were previously unapproachable.

Published in: Technology


  • 1. MAKING BUSINESS INTELLIGENT. The BI Guy's Little Guide to Big Data. Chris Price, Senior BI Consultant (@BluewaterSQL)
  • 2. Introductions: Chris Price, Senior BI Consultant with Pragmatic Works. Author, regular speaker, data geek and super dad! @BluewaterSQL
  • 3. Big Data
      • Data Explosion
          • As recently as 2000, only ¼ of data was digital; the rest was paper, film or other analog media
          • According to IBM, 90% of data was created in the last 2 years
          • Data volume is now growing 10x every 5 years
          • Approximately 85% comes from new sources
      • Consumerization
          • 4.3 connected devices per adult
          • 27% use social media input
  • 4. Big Data. Data Complexity: Variety and Velocity
      • Volume scale: Megabytes, Gigabytes, Terabytes, Petabytes
      • Big Data sources: service logs, spatial & GPS coordinates, data market feeds, eGov feeds, weather, text/image, click stream, wikis/blogs, sensors, RFID/devices, SMS, HD audio/video
      • Source: Brian Mitchel, TechEd 2013
  • 5. Big Data is well…Big
      • Drove $28b in IT investment in 2012
      • Expected to grow to $34b in 2014
      • Challenges:
          • Data Volumes (Hardware/Storage Economics)
          • Data Diversity (Multiple Types & Sources)
          • Data Velocity (Real-Time)
          • User Expectations
      • How do we plan/integrate…
  • 6. Agenda
      • Hadoop Landscape
      • Current BI/DW Landscape
      • BI/DW & Hadoop Intersection
      • Tools/Techniques/Strategies
  • 8. Hadoop on Windows
      • HDInsight on Windows Azure
          • Seamlessly scale in the cloud
          • Backed by Azure Storage Vault (ASV)
      • Hortonworks Data Platform (HDP)
          • On-Premise
          • Based on HDFS
  • 9. Current Landscape (diagram)
      • Client Tools: Reporting Services / SharePoint / Microsoft Applications
      • BI/DW System: DW, Cubes
      • Data Sources: Traditional Sources (CRM/ERP/LOB/Web)
  • 10. Future Landscape (diagram)
      • Client Tools: Reporting Services / SharePoint / Microsoft Applications
      • BI/DW System: DW, Cubes, Hadoop
      • Data Sources: Traditional Sources (CRM/ERP/LOB/Web), New Sources (Email / Logs / Social Media / Sensor)
  • 11. Business Scenario (diagram)
      • Components: DW, Cube, Hadoop/HDFS, Reporting Tools, Sensor Data
      • Connectors shown: ODBC, Sqoop, Flume, WebHDFS
  • 12. What about Azure? (diagram)
      • Components: DW, Cube, Hadoop, Reporting Tools, Azure Blob Storage
      • Connectors shown: ODBC, Sqoop, AzCopy
  • 13. Tools, Techniques & Strategies
      • Enterprise Data Services: WebHDFS, Sqoop, HCatalog, Pig/Hive
      • Enterprise Operational Services: Oozie
      • Other: Windows Azure Blob Storage & AzCopy, Hive ODBC, PolyBase
  • 14. WebHDFS
      • Born from HFTP, intended as a replacement
      • Widely used by Yahoo!
      • High-performance, first-class native protocol using an industry-standard RESTful mechanism
      • Complete interface for reading, writing & managing files
      • Supports secure authentication
      • Data locality: requests are sent to data nodes
  • 15. WebHDFS – Get Example
      Request:
          curl -i -L "http://host:port/webhdfs/v1/foo/bar?op=OPEN"
      Response:
          HTTP/1.1 307 TEMPORARY_REDIRECT
          Content-Type: application/octet-stream
          Location: http://datanode:50075/webhdfs/v1/foo/bar?op=OPEN&offset=0
          Content-Length: 0

          HTTP/1.1 200 OK
          Content-Type: application/octet-stream
          Content-Length: 22

          Hello, webhdfs user!
  • 16. WebHDFS – More Examples
      Rename request:
          curl -i -X PUT "http://host:port/webhdfs/v1/foo/bar?op=RENAME&destination=/foo/bar2"
      Create directory request:
          curl -i -X PUT "http://host:port/webhdfs/v1/foo2?op=MKDIRS"
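The request and redirect pattern from the WebHDFS slides can be sketched in Python with only the standard library. This is an illustration, not part of the original deck: the helper names `webhdfs_url` and `datanode_from_redirect` are hypothetical, and the `namenode` host and port 50070 in the usage lines are placeholder assumptions standing in for the slides' `host:port`.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def webhdfs_url(host, port, path, op, **params):
    """Build a WebHDFS REST URL of the form shown on the slides."""
    # Keep "/" unescaped so path-valued parameters stay readable.
    query = urlencode({"op": op, **params}, safe="/")
    return f"http://{host}:{port}/webhdfs/v1{path}?{query}"


def datanode_from_redirect(raw_headers):
    """Pull the datanode host, port and offset out of a 307 response's Location header."""
    for line in raw_headers.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "location":
            parts = urlsplit(value.strip())
            offset = int(parse_qs(parts.query).get("offset", ["0"])[0])
            return parts.hostname, parts.port, offset
    raise ValueError("no Location header found")


# The 307 response headers from the Get example.
redirect = (
    "HTTP/1.1 307 TEMPORARY_REDIRECT\n"
    "Content-Type: application/octet-stream\n"
    "Location: http://datanode:50075/webhdfs/v1/foo/bar?op=OPEN&offset=0\n"
    "Content-Length: 0\n"
)

print(webhdfs_url("namenode", 50070, "/foo/bar", "OPEN"))
print(webhdfs_url("namenode", 50070, "/foo/bar", "RENAME", destination="/foo/bar2"))
print(datanode_from_redirect(redirect))
```

The second helper makes the data-locality point concrete: the first request goes to the name node, and the `Location` header names the data node that actually serves the bytes.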
  • 17. Sqoop
      • Tool designed to efficiently move data between Hadoop (Hive & HBase) and RDBMS
      • Importing (single and all tables)
      • Exporting
      • Eval (Query Execution)
      • Merge (Multiple HDFS datasets)
      • Incremental Imports
      • Generates MapReduce jobs
      • Can control the level of parallelism
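As a rough sketch of the import, incremental import and parallelism points above, the following Python helper assembles a `sqoop import` command line without running it. The function name `sqoop_import_args`, the JDBC URL, the table name and the HDFS path are all hypothetical; only the Sqoop flags themselves (`--connect`, `--table`, `--target-dir`, `--num-mappers`, `--incremental`, `--check-column`, `--last-value`) come from the Sqoop CLI.

```python
import shlex


def sqoop_import_args(jdbc_url, table, target_dir, num_mappers=4,
                      check_column=None, last_value=None):
    """Return the argument list for a `sqoop import` invocation."""
    args = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
        # Number of parallel map tasks used for the transfer.
        "--num-mappers", str(num_mappers),
    ]
    if check_column is not None:
        # Incremental append: only rows whose check column exceeds last_value.
        args += ["--incremental", "append", "--check-column", check_column]
        if last_value is not None:
            args += ["--last-value", str(last_value)]
    return args


# Hypothetical connection details, for illustration only.
cmd = sqoop_import_args(
    "jdbc:sqlserver://dwserver;databaseName=Sales",
    "Orders",
    "/data/orders",
    num_mappers=8,
    check_column="OrderID",
    last_value=1000,
)
print(" ".join(shlex.quote(a) for a in cmd))
```

Building the command as a list keeps each argument separate, so it could be handed to `subprocess.run` on a machine where Sqoop is installed without extra shell quoting.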
  • 19. HCatalog/Hive/Pig
      • HCatalog: metadata & table management
          • Users interact with a set of defined tables
          • Abstracts away the where/how of data storage
          • Allows for consistent access
      • Pig: ETL/data transformation scripting
          • Pig Latin
          • Java User-Defined Functions (Piggybank/DataFu)
      • Hive: SQL-like interface
          • Allows ad-hoc queries for data summarization and analysis
          • ODBC Connector
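To give a feel for the kind of transformation Pig scripts and Hive queries express, here is a plain-Python stand-in for a filter, group and count dataflow. It is not Pig Latin or HiveQL and does not touch Hadoop; the sample click-stream records and the function name `hits_per_url` are invented for illustration.

```python
from collections import Counter

# Hypothetical click-stream records: (user, url, http_status)
records = [
    ("alice", "/home", 200),
    ("bob", "/home", 200),
    ("alice", "/cart", 500),
    ("carol", "/cart", 200),
    ("bob", "/home", 404),
]


def hits_per_url(rows):
    """Count successful requests per URL."""
    # Step 1: keep only successful requests (a FILTER in Pig, a WHERE in Hive).
    ok = (row for row in rows if row[2] == 200)
    # Step 2: group by URL and count (GROUP plus COUNT in either tool).
    return Counter(url for _, url, _ in ok)


print(hits_per_url(records))
```

In Hive the same summary would be a single SQL-like statement with a `WHERE` and a `GROUP BY`, while Pig would spell out each step as a named relation.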
  • 21. Oozie
      • Scalable, reliable, extensible workflow management system/job scheduler
      • Triggered by:
          • Time
          • Data availability
      • Can run and orchestrate multiple jobs:
          • MapReduce and Streaming MapReduce
          • Hive
          • Pig
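The two trigger conditions above (time and data availability) can be illustrated with a small Python check. This is a toy model of the idea, not Oozie's implementation or API: the function name `ready_to_run` is hypothetical, and using a `_SUCCESS` marker file to signal that a dataset is complete is an assumption borrowed from the flag file Hadoop jobs commonly write when they finish.

```python
import os
import tempfile
from datetime import datetime


def ready_to_run(now, nominal_time, dataset_dir, done_flag="_SUCCESS"):
    """True when the scheduled time has arrived and the input dataset is flagged complete."""
    time_reached = now >= nominal_time
    data_available = os.path.exists(os.path.join(dataset_dir, done_flag))
    return time_reached and data_available


with tempfile.TemporaryDirectory() as dataset_dir:
    scheduled = datetime(2013, 1, 1, 6, 0)
    now = datetime(2013, 1, 1, 6, 5)
    # The schedule has passed, but the input is not flagged complete yet.
    print(ready_to_run(now, scheduled, dataset_dir))
    # Simulate the upstream job finishing by writing the marker file.
    open(os.path.join(dataset_dir, "_SUCCESS"), "w").close()
    print(ready_to_run(now, scheduled, dataset_dir))
```

A scheduler built this way only launches the downstream Hive, Pig or MapReduce step once both conditions hold, which is the behavior the slide attributes to Oozie.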
  • 22. Windows Azure Blob Storage
      • Also called Azure Storage Vault (ASV)
      • Persistent, highly scalable storage with built-in geo-replication
      • Azure HDInsight clusters are wired for ASV
      • On-Premise HDP uses HDFS
      • Separates data from compute nodes:
          • Clusters can be created and dropped, minimizing costs
          • Multiple clusters can share data
      • The Azure Flat (Quantum 10) mesh grid network is the key
      • Violates the principle of data locality, but out-performs HDFS and Azure competitors
  • 23. Windows Azure Blob Storage [figure; source not given]
  • 24. AzCopy
      • Copies files to and from Windows Azure Blob Storage
      • Similar to Robocopy
      • Command-line: /S /V
      • Recursively (/S) copies all files in the Beer directory with Verbose (/V) logging
  • 26. PolyBase
      • Part of Parallel Data Warehouse; allows integration of relational and non-relational data
      • Creates external tables via an HDFS bridge
      • Allows on-the-fly joins within SQL Server
      • Supports parallel:
          • Imports from HDFS
          • Exports to HDFS
  • 27. Resources
      • Bloggers: Denny Lee, Carl Nolan, Cindy Gross
      • Books:
          • Hadoop: The Definitive Guide - Tom White
          • Programming Pig - Alan Gates
          • Programming Hive - Edward Capriolo
          • Hadoop MapReduce Cookbook - Srinath Perera
  • 28. Thank you! www.pragmaticworks.com @BluewaterSQL