Data Visualisation with Hadoop Mashups, Hive, Power BI and Excel 2013

This session focused on visualising big data with Power BI. It gave examples of Hive and HDFS file storage, and an overview of Microsoft HDInsight.


  1. Visualising Big Data: Big Data Visualisation with Hadoop, Hive and Excel 2013
  2. Sponsors
  3. Explore Everything PASS Has to Offer: free SQL Server and BI web events; free one-day training events; regional events; business analytics training; local user groups around the world; session recordings; the PASS newsletter; free online technical training. This is Community.
  4. About me: Director-at-Large (Elect), PASS Board, from January 2014; SQL Server MVP; blogger, data strategist, public speaker, technologist; joint owner of Copper Blue Consulting Ltd
  5. Agenda: Overview of Big Data Technologies; Data Visualisation with Office 365 and Power BI; Hive; Visualising Big Data with Microsoft
  6. Big Data.
  7. HDInsight Ecosystem: Distributed Storage (HDFS), Distributed Processing (MapReduce), ODBC, Windows Azure Storage and the Azure Data Marketplace
  8. What is Hadoop? “Flexible and Available Architecture for Large Scale computation and data processing on a network of highly available commodity hardware.”
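The Distributed Processing (MapReduce) layer of the ecosystem follows a simple pattern: a map step emits key/value pairs, the framework shuffles them so that equal keys meet, and a reduce step aggregates each group. Below is a single-process Python sketch of that data flow, using the classic word count. It is not Hadoop code (real jobs use the Hadoop Java API or Hadoop Streaming and read their input from HDFS), and the function names are our own.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: aggregate all values that share a key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence, locally."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data big insight", "data everywhere"]))
    # {'big': 2, 'data': 2, 'insight': 1, 'everywhere': 1}
```

On a cluster, each map task would process one block of the input and the shuffle would move data between machines; the logic per record stays the same.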
  9. Hadoop’s Lineage (source: Kerberos Konference, Yahoo, 2010)
  10. Data Visualisation Background: “We have the tools. All we’ve got to do is imagine what could be. We can reinvent the present; we can transform the world around us.” (Jason Silva)
  11. “Almost 50% of your brain is dedicated to visual processing.” (David van Essen) Researchers found that colour visuals increase the willingness to read by 80%. About 70% of your sensory receptors are in your eyes.
  12. Why is Data Visualisation Important? “It’s clearly a budget. It has a lot of numbers in it.” (George W. Bush) “I could never figure out where the decimal point went.” (Lord Randolph Churchill)
  13. The Unknown Unknowns: “That is to say, there are things that we know we don't know. But there are also unknown unknowns. There are things we don't know we don't know.” (Donald Rumsfeld)
  14. What is the purpose of Hive? Hive is a solution to a business problem: how do you analyse large amounts of data? Data scientists want to study the data and communicate with it; businesses want to reap the benefits of the data, with results that make sense of it.
  15.
  16. What is the purpose of Hive? Hive is a data warehousing system for Hadoop, built to meet the needs of businesses, data scientists, analysts and BI professionals. Data, summarized: fit a structure onto data. Data, analyzed: analysis of large datasets stored in Hadoop file systems. A SQL-like language called HiveQL, with custom mappers and reducers when HiveQL isn’t enough.
  17. Agenda: Hive solves the business problem of analysing large amounts of data. What is the purpose of Hive? Why Hive? A history of Hive. What are Hive’s constituents?
  18. Why Hive? Can’t Hadoop be used to solve these problems? Why is there a need for Hive? Writing MapReduce jobs in Java can be difficult: you don’t know it’s wrong until it’s fallen over! Joining large datasets can be difficult, and there is a steep learning curve.
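To show why joins are a pain point in hand-written MapReduce, here is a minimal Python sketch of a reduce-side join, one common way of joining two datasets in MapReduce: every row is tagged with its source, rows are shuffled on the join key, and the reducer pairs up the two sides. It runs in a single process, and the tuple-based rows and function name are illustrative assumptions rather than any Hadoop API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, ...]


def reduce_side_join(
    left: Iterable[Row], right: Iterable[Row], left_key: int = 0, right_key: int = 0
) -> List[Row]:
    """Inner-join two row sets on one column each, MapReduce style."""
    # Map: tag each row with its source so the reducer can tell the sides apart.
    tagged = [(row[left_key], "L", row) for row in left]
    tagged += [(row[right_key], "R", row) for row in right]

    # Shuffle: bring together all rows that share a join key.
    groups: Dict[str, Dict[str, List[Row]]] = defaultdict(lambda: {"L": [], "R": []})
    for key, tag, row in tagged:
        groups[key][tag].append(row)

    # Reduce: emit the cross product of left and right rows for each key.
    # Keys with rows on only one side produce nothing (inner join).
    joined: List[Row] = []
    for key in sorted(groups):
        for l_row in groups[key]["L"]:
            for r_row in groups[key]["R"]:
                joined.append(l_row + r_row)
    return joined
```

For example, joining hypothetical `customers` rows `("c1", "Ann")` and `("c2", "Bob")` with `orders` rows `("o1", "c1", "12.50")` and `("o3", "c3", "9.99")` on column 0 and column 1 respectively yields only `("c1", "Ann", "o1", "c1", "12.50")`. In HiveQL the whole thing is a single clause, such as `SELECT * FROM customers c JOIN orders o ON c.id = o.customer_id` (same hypothetical names).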
  19. Agenda: Hive solves the business problem of analysing large amounts of data. What is the purpose of Hive? Why Hive? A history of Hive. What are Hive’s constituents?
  20. Hive History
  21. Hive History
  22. What can Hive offer you? Hive can help with a range of business problems: log processing, predictive modelling, hypothesis testing and business intelligence.
  23. Hive is not a replacement for SQL, so don’t throw out your SQL Server instances! Hive is for processing large data sets that may span hundreds, or even thousands, of machines. Hive has a high overhead for starting a job: it translates queries into MapReduce jobs, which takes time. Unlike SQL Server, Hive does not cache data. Hive performance tuning is mainly Hadoop performance tuning. The query languages are similar, but the architectures differ because they serve different purposes.
  24. Agenda: Hive solves the business problem of analysing large amounts of data. What is the purpose of Hive? Why Hive? A history of Hive. What are Hive’s constituents? Hive as a SQL-like query language; Hive as a translation tool; Hive as a structuring tool.
  25. HiveQL: HiveQL is a SQL-like language. It outputs naturally occurring groups for further analysis. Easy data summarization: large datasets, summarized. Fit a structure onto data. Analysis of large datasets stored in Hadoop file systems. Custom mappers and reducers when HiveQL isn’t enough.
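When HiveQL isn’t enough, Hive’s `TRANSFORM ... USING` clause streams rows through an external script: each input row arrives on the script’s stdin as one tab-separated line (with NULL written as `\N` in the default text format), and each tab-separated line the script prints becomes an output row. The sketch below is such a script in Python; the input columns (`url`, `referrer`), the output columns and the file name `extract_domain.py` are hypothetical.

```python
import sys
from typing import Iterable, Iterator

NULL = "\\N"  # how Hive's default text serialisation writes NULL


def transform(lines: Iterable[str]) -> Iterator[str]:
    """Turn 'url<TAB>referrer' rows into 'domain<TAB>has_referrer' rows."""
    for line in lines:
        url, referrer = line.rstrip("\n").split("\t")
        # 'https://example.com/a' -> 'example.com'; 'example.com/a' -> 'example.com'
        domain = url.split("/")[2] if "://" in url else url.split("/")[0]
        has_referrer = "0" if referrer in (NULL, "") else "1"
        yield f"{domain}\t{has_referrer}"


if __name__ == "__main__":
    # Hive feeds rows on stdin and reads result rows from stdout.
    for out_row in transform(sys.stdin):
        print(out_row)
```

In Hive the script would be shipped with `ADD FILE extract_domain.py;` and called with `SELECT TRANSFORM (url, referrer) USING 'python extract_domain.py' AS (domain, has_referrer) FROM weblogs;`, where `weblogs` is again a hypothetical table.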
  26. HiveQL queries like SQL queries? Similarities in syntax and features: SELECT, FROM, WHERE; GROUP BY / HAVING; table aliases; computed columns.
  27. HiveQL queries like SQL queries? Similarities in syntax and features: aggregate functions; nested SELECT; CASE; LIKE / RLIKE; JOIN; ORDER BY / SORT BY.
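The last pair on this slide hides an important difference. In Hive, ORDER BY produces a total order by sending every row through a single reducer, while SORT BY only sorts the rows inside each reducer, so the combined output is not globally sorted when there are several reducers. The Python sketch below imitates both; how rows are spread across reducers is modelled as `partition(row) % num_reducers`, which is an assumption for illustration.

```python
from typing import Callable, List, Sequence


def order_by(rows: Sequence[int]) -> List[int]:
    """ORDER BY: one total order (Hive uses a single reducer for this)."""
    return sorted(rows)


def sort_by(
    rows: Sequence[int],
    num_reducers: int,
    partition: Callable[[int], int] = hash,
) -> List[List[int]]:
    """SORT BY: each reducer sorts only the rows it received."""
    reducers: List[List[int]] = [[] for _ in range(num_reducers)]
    for row in rows:
        reducers[partition(row) % num_reducers].append(row)
    return [sorted(chunk) for chunk in reducers]


if __name__ == "__main__":
    data = [5, 3, 8, 1, 4, 7]
    print(order_by(data))                            # [1, 3, 4, 5, 7, 8]
    print(sort_by(data, 2, partition=lambda r: r))   # [[4, 8], [1, 3, 5, 7]]
```

Concatenating the SORT BY output gives `[4, 8, 1, 3, 5, 7]`: each chunk is sorted, the whole is not. SORT BY scales across reducers; ORDER BY pays for its guarantee with a single-reducer bottleneck.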
  28. How does Hive work? Hive as a translation tool: Hive compiles and executes queries. It translates each SQL-like query into one or more MapReduce jobs, which are chained together and executed.
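As a rough illustration of the translation step, the Python sketch below spells out the single map/shuffle/reduce stage that a simple grouped aggregate corresponds to. The `emp` table, its `(name, dept, salary)` columns and the function name are hypothetical, and Hive's real plans include optimisations (such as map-side partial aggregation) that are not modelled here. A query with several stages, for example a GROUP BY followed by an ORDER BY, would become several such jobs chained together.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Employee = Tuple[str, str, float]  # (name, dept, salary), a hypothetical schema


def group_by_dept(rows: Iterable[Employee], min_count: int = 2) -> List[Tuple[str, int, float]]:
    """Roughly the job behind:
    SELECT dept, COUNT(*), SUM(salary) FROM emp GROUP BY dept HAVING COUNT(*) >= 2
    """
    # Map: project each row down to (group key, value the aggregates need).
    mapped = ((dept, salary) for _name, dept, salary in rows)

    # Shuffle: the framework groups map output by key.
    groups: Dict[str, List[float]] = defaultdict(list)
    for dept, salary in mapped:
        groups[dept].append(salary)

    # Reduce: compute the aggregates per key, then apply the HAVING filter.
    result: List[Tuple[str, int, float]] = []
    for dept in sorted(groups):
        salaries = groups[dept]
        if len(salaries) >= min_count:
            result.append((dept, len(salaries), sum(salaries)))
    return result


if __name__ == "__main__":
    emp = [("ann", "eng", 100.0), ("bob", "eng", 50.0), ("cy", "ops", 70.0)]
    print(group_by_dept(emp))  # [('eng', 2, 150.0)]
```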
  29. How does Hive work? Hive as a structuring tool: Hive creates a schema around the data. Hive tables have rows and columns, like SQL tables, and are stored in directories. The Hive metastore is a namespace with a set of tables; it holds the table definitions: physical layout, column types and partition information.
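The structuring idea is often called "schema on read": the files in a table's directory are plain text, and the definition stored in the metastore is applied only when rows are read. The Python sketch below imitates that with a toy, hypothetical metastore entry. The `weblogs` table and its columns are made up; the `/user/hive/warehouse` location, the Ctrl-A (`\x01`) field delimiter and `key=value` partition directories follow Hive's defaults and conventions, and missing trailing fields read as NULL (here `None`).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TableDef:
    """A much-simplified, hypothetical metastore entry."""
    location: str               # directory holding the table's files
    columns: List[str]          # column names, in file order
    partition_keys: List[str]   # encoded in sub-directory names, e.g. dt=2013-10-01
    delimiter: str = "\x01"     # Hive's default field delimiter (Ctrl-A)


METASTORE: Dict[str, TableDef] = {
    "weblogs": TableDef("/user/hive/warehouse/weblogs", ["url", "status"], ["dt"]),
}


def partition_path(table: str, **partition_values: str) -> str:
    """Build the directory that holds one partition of a table."""
    t = METASTORE[table]
    parts = [f"{k}={partition_values[k]}" for k in t.partition_keys]
    return "/".join([t.location] + parts)


def read_row(table: str, line: str, partition_dir: str) -> Dict[str, Optional[str]]:
    """Apply the table definition to one raw line at read time."""
    t = METASTORE[table]
    fields = line.rstrip("\n").split(t.delimiter)
    row: Dict[str, Optional[str]] = {}
    for i, col in enumerate(t.columns):
        row[col] = fields[i] if i < len(fields) else None  # missing field -> NULL
    # Partition columns come from the directory name, not from the file itself.
    for segment in partition_dir.split("/"):
        if "=" in segment:
            key, value = segment.split("=", 1)
            if key in t.partition_keys:
                row[key] = value
    return row
```

For example, `partition_path("weblogs", dt="2013-10-01")` gives `/user/hive/warehouse/weblogs/dt=2013-10-01`, and reading the line `"/index.html\x01200"` from that directory gives `{"url": "/index.html", "status": "200", "dt": "2013-10-01"}`. Because nothing is validated at load time, changing the definition changes how the same files are read.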
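  30. Hive and SQL Data Types
      Hive         SQL Server
      TINYINT      tinyint (note: Hive's TINYINT is signed, -128 to 127, while SQL Server's tinyint holds 0 to 255)
      SMALLINT     smallint
      INT          int
      BIGINT       bigint
      BOOLEAN      bit (set as NOT NULL)
      FLOAT        real (both 4-byte, single precision)
      DOUBLE       float (both 8-byte, double precision)
      DECIMAL      decimal (Hive's DECIMAL is backed by Java's BigDecimal)
  31. Hive and SQL Data Types (continued)
      Hive         SQL Server
      STRING       char, varchar, nvarchar, ntext, text
      BINARY       binary, image
      TIMESTAMP    timestamp (note: SQL Server's timestamp is a deprecated synonym for rowversion, a row-version counter rather than a date/time value)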
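When moving a Hive result set into SQL Server, a type mapping like this can drive the column list of a `CREATE TABLE` statement. The Python sketch below is a hypothetical helper built on the Hive-to-SQL Server mapping slides, with two stated assumptions: `STRING` maps to `nvarchar(max)` and `BINARY` to `varbinary(max)` out of the several possible targets, and `TINYINT` is widened to `smallint` because Hive's `TINYINT` is signed. `TIMESTAMP` and `DECIMAL` are deliberately left unmapped (they raise `ValueError`), since their target type, and for `DECIMAL` the precision and scale, is a design decision.

```python
from typing import List, Tuple

# Hypothetical Hive -> SQL Server mapping; the chosen defaults are assumptions.
HIVE_TO_SQLSERVER = {
    "TINYINT": "smallint",        # Hive TINYINT is signed (-128..127); tinyint is 0..255
    "SMALLINT": "smallint",
    "INT": "int",
    "BIGINT": "bigint",
    "BOOLEAN": "bit",
    "FLOAT": "real",              # 4-byte single precision on both sides
    "DOUBLE": "float",            # 8-byte double precision on both sides
    "STRING": "nvarchar(max)",    # assumed default among char/varchar/nvarchar
    "BINARY": "varbinary(max)",   # assumed default for variable-length binary
}


def to_sqlserver_columns(hive_columns: List[Tuple[str, str]]) -> str:
    """Render (name, hive_type) pairs as the column part of a CREATE TABLE."""
    lines = []
    for name, hive_type in hive_columns:
        key = hive_type.upper()  # Hive type names are case-insensitive
        if key not in HIVE_TO_SQLSERVER:
            raise ValueError(f"no mapping chosen for Hive type {hive_type!r}")
        lines.append(f"[{name}] {HIVE_TO_SQLSERVER[key]}")
    return ",\n".join(lines)


if __name__ == "__main__":
    print(to_sqlserver_columns([("id", "bigint"), ("score", "double"), ("name", "string")]))
    # [id] bigint,
    # [score] float,
    # [name] nvarchar(max)
```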
  32. Hive Mathematical Operations
      Primitive types: plus, negative, addition, subtraction, multiplication, division, modulus
      Complex types: arrays, maps, structs, unions
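A few of these operators behave differently from both T-SQL and Python, so here is a small Python sketch of the semantics as we understand them for Hive releases of this era. `/` returns a DOUBLE even for integer operands, `DIV` is integer division truncating toward zero, and `%` is assumed to follow Java's remainder rule, where the sign follows the dividend (Python's own `%` follows the divisor). For the complex types, we assume an out-of-range array index or a missing map key reads as NULL (here `None`), and a STRUCT is shown as a dataclass accessed with dot notation. All function names are hypothetical, and division by zero is not modelled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, TypeVar

T = TypeVar("T")


def hive_divide(a: float, b: float) -> float:
    """'A / B': the result is a DOUBLE, so 7 / 2 is 3.5."""
    return float(a) / float(b)


def hive_div(a: int, b: int) -> int:
    """'A DIV B': integer division that truncates toward zero, like Java."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def hive_mod(a: int, b: int) -> int:
    """'A % B': assumed Java-style remainder, whose sign follows the dividend."""
    return a - b * hive_div(a, b)


def array_get(arr: List[T], i: int) -> Optional[T]:
    """ARRAY<T> access arr[i]; out of range is assumed to read as NULL."""
    return arr[i] if 0 <= i < len(arr) else None


def map_get(m: Dict[str, T], key: str) -> Optional[T]:
    """MAP<STRING,T> access m['key']; a missing key is assumed to read as NULL."""
    return m.get(key)


@dataclass
class Address:
    """Stands in for STRUCT<street:STRING, city:STRING>, read as addr.city."""
    street: str
    city: str


if __name__ == "__main__":
    print(hive_divide(7, 2), hive_div(-7, 2), hive_mod(-7, 2), -7 % 2)
    # 3.5 -3 -1 1   (the last value is Python's own % for comparison)
```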
  33. How does Hive work? Hive as a structuring tool: Hive creates a schema around the data. Hive tables have rows and columns, like SQL tables, and are stored in directories. The Hive metastore is a namespace with a set of tables; it holds the table definitions: physical layout, column types and partition information.
  34. Visualising Big Data: Self-Service, Insights, Actions
  35. Different Tools for Different Jobs. Power View: a highly visual design experience; an interactive, ad hoc query and visualization experience; it is for solving business-question ‘mysteries’. Power Map: a new 3D visualization add-in for Excel that helps you analyse geographical and temporal data; mapping, exploring, interacting.
  36. Data where you want it
  37. Data you want about ‘where’
  38. Data you want to share
  39. Your data… Fresh.
  40. Demo
  41. What did we learn from the demo?
  42. JOIN US for our second annual event to get the best learning for analyzing, managing, and sharing business information and insights through the Microsoft Data Platform of technologies.
  43. Don’t be shy… questions?
  44. Thank you for listening
  45. Sponsors
