In this session, you will learn how technologies such as Low Latency Analytical Processing (LLAP) and Hive 2.x make it possible to analyze petabytes of data with sub-second latency using common file formats such as CSV and JSON, without converting to columnar file formats like ORC or Parquet. We will go deep into LLAP’s performance and architecture benefits and how it compares with Spark and Presto. We will also look at how business analysts can use familiar tools such as Microsoft Excel and Power BI to run interactive queries over their data lake without moving data out of it.
29. • Hive Low Latency Analytical Processing (LLAP)
• Serves queries directly from Azure BLOB/ADLS
• Works with TEXT, JSON, CSV, TSV, ORC, Parquet
• Super fast performance with TEXT data
• Modern scalable query concurrency architecture
• Security with Apache Ranger and Active Directory
@ashishth
33. Intelligent cache
Automatically reacts to changes in underlying data
o Shared cache between queries
o Cache eviction is based on source file last modified date
o Every query will check modified date, and reload if a new file has arrived
[Diagram labels: DRAM, SSD, ADLS/BLOBStore, Updates]
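The eviction rule on this slide (check the source file's last modified date on every query, reload if it changed) can be sketched as follows. This is a minimal illustration, not LLAP's actual implementation: `ModifiedDateCache`, `stat`, and `read` are hypothetical names, and the storage calls are passed in as plain functions so the sketch runs without any cloud SDK.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class CacheEntry:
    modified: float  # last-modified timestamp observed when the data was cached
    data: bytes


class ModifiedDateCache:
    """Hypothetical cache shared between queries, keyed by source file path."""

    def __init__(self, stat: Callable[[str], float], read: Callable[[str], bytes]):
        self._stat = stat    # assumed: returns the file's last-modified timestamp
        self._read = read    # assumed: returns the file's current contents
        self._entries: Dict[str, CacheEntry] = {}
        self.reloads = 0     # counts how often the source had to be re-read

    def get(self, path: str) -> bytes:
        current = self._stat(path)
        entry = self._entries.get(path)
        # Reload when the file was never cached or its modified date changed.
        if entry is None or entry.modified != current:
            entry = CacheEntry(current, self._read(path))
            self._entries[path] = entry
            self.reloads += 1
        return entry.data


# Usage with an in-memory stand-in for the blob store.
store = {"sales.csv": (1.0, b"v1")}
cache = ModifiedDateCache(lambda p: store[p][0], lambda p: store[p][1])
cache.get("sales.csv")               # first query loads the file
cache.get("sales.csv")               # second query is served from cache
store["sales.csv"] = (2.0, b"v2")    # a new file arrives
cache.get("sales.csv")               # modified date differs, so it reloads
```

The per-query check costs only a metadata lookup, which is why stale data can be avoided without re-reading unchanged files.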
35. • LLAP, Spark, and Presto against 1 TB derived from the TPC-DS benchmark
• Out-of-the-box HDInsight configuration
• 45 queries derived from the TPC-DS benchmark that ran on all engines successfully
38. • We used a number of different concurrency levels to test concurrency performance
• 99 queries on 1 TB of data with a 32 worker node cluster, with max concurrency set to 32.
Test 1: Run all 99 queries, 1 at a time - Concurrency = 1
Test 2: Run all 99 queries, 2 at a time - Concurrency = 2
Test 3: Run all 99 queries, 4 at a time - Concurrency = 4
Test 4: Run all 99 queries, 8 at a time - Concurrency = 8
Test 5: Run all 99 queries, 16 at a time - Concurrency = 16
Test 6: Run all 99 queries, 32 at a time - Concurrency = 32
Test 7: Run all 99 queries, 64 at a time - Concurrency = 64
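A test plan like the one above could be driven by a small harness such as this sketch. It is not the harness used for the talk: `run_query` is a hypothetical callable that submits one query to the engine under test and blocks until it finishes, and the timing covers the whole batch of queries at each level.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Concurrency levels taken from Tests 1 to 7 on the slide.
CONCURRENCY_LEVELS = [1, 2, 4, 8, 16, 32, 64]


def run_at_concurrency(queries: List[str], concurrency: int,
                       run_query: Callable[[str], None]) -> float:
    """Run every query with at most `concurrency` in flight; return wall time in seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # list() forces completion and re-raises any query failure.
        list(pool.map(run_query, queries))
    return time.perf_counter() - start


def run_all_tests(queries: List[str],
                  run_query: Callable[[str], None]) -> Dict[int, float]:
    """Repeat the full query set once per concurrency level."""
    return {level: run_at_concurrency(queries, level, run_query)
            for level in CONCURRENCY_LEVELS}


# Usage with a stand-in that only records the query name.
submitted: List[str] = []
timings = run_all_tests([f"query{i}" for i in range(1, 100)], submitted.append)
```

Threads are sufficient here because each worker only waits on a remote engine; the client does no heavy computation of its own.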
40.
Capability | Interactive Query | Spark SQL | Presto
Interactive Query Speed | High | High | Medium
Scale | High | High | Low
Caching | Yes | Yes | Early Support
Intelligent Cache Eviction | Yes | No | No
Complex Fact to Fact Joins | Yes | Yes | No
Transactions | Yes | No | No
Query Concurrency | High | Low | Low
Row, Column level security | Yes [Apache Ranger + AAD] | High | Medium
Rich end user Tools | Yes | Yes | Yes
Language Support | SQL, UDF | SQL, Scala, Python | SQL
Data Source Connector Support | Storage Handlers | Data Sources | High number of connectors
59. HDInsight Log Analytics Architecture
[Architecture diagram. Labels: OMS Agent for Linux; HDInsight nodes (Head, Worker, Zookeeper); FluentD HDInsight plugin; Config for HBase, Spark, Hive/LLAP, Storm, Kafka; osmconfig; Log Analytics (OMS) Service]
FluentD HDInsight plugin:
1. Plugin for ‘in_tail’ for all logs; allows regexp to create JSON object
2. Filter for WARN and above for each log type, using the `grep` filter plugin
3. Output to out_oms_api type
4. Exec plugin for metrics
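The first two FluentD steps on this slide (turn each tailed log line into a JSON object via a regexp, then keep only WARN and above) can be illustrated in plain Python. This is a sketch of the idea, not the plugin's configuration: the log line layout in `LINE_RE` is an assumed log4j-style format, and `parse_line`, `keep`, and `to_payload` are hypothetical names.

```python
import json
import re
from typing import List, Optional

# Assumed line layout: "2018-04-13 19:48:34,123 WARN message text"
LINE_RE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+)\s+(?P<message>.*)$"
)

# Severity ordering used to decide what counts as "WARN and above".
LEVELS = {"TRACE": 0, "DEBUG": 1, "INFO": 2, "WARN": 3, "ERROR": 4, "FATAL": 5}


def parse_line(line: str, log_type: str) -> Optional[dict]:
    """Step 1: regexp turns one log line into a dict; unparseable lines yield None."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["log_type"] = log_type
    return record


def keep(record: dict) -> bool:
    """Step 2: pass only records at WARN severity or higher."""
    return LEVELS.get(record["level"], 0) >= LEVELS["WARN"]


def to_payload(lines: List[str], log_type: str) -> str:
    """Serialize the surviving records as a JSON array, ready to be sent onward."""
    parsed = (parse_line(line, log_type) for line in lines)
    return json.dumps([r for r in parsed if r is not None and keep(r)])


sample = [
    "2018-04-13 19:48:34,123 INFO starting",
    "2018-04-13 19:48:35,456 WARN disk nearly full",
    "2018-04-13 19:48:36,789 ERROR task failed",
]
payload = to_payload(sample, "hive")
```

Filtering on the node before upload keeps the volume sent to the analytics service small, which is the likely motivation for the WARN threshold.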
60. Microsoft Azure Estimate
Your Estimate
Service type | Custom name | Region | Description | Estimated Cost
HDInsight | | East US | Interactive Query Component: 2 A3 (4 cores, 7 GB RAM) Head nodes x 730 Hours, 6 D14V2 (16 cores, 112 GB RAM) Region nodes x 730 Hours, 3 A1 (1 cores, 1.75 GB RAM) Zookeeper nodes x 730 Hours, 0 D4V2 (8 cores, 28 GB RAM) Edge nodes x 730 Hours | $7,163.27
Storage | | East US | Block Blob Storage, General Purpose V2, LRS Redundancy, Hot Access Tier, 100 TB Capacity, 10,000,000 Write operations, 100,000 List and Create Container Operations, 99,999,000 Read operations, 9,990,000 Other operations. 500 TB Data Retrieval, 50 TB Data Write | $2,181.82
Support | | | Support | $0.00
Monthly Total | | | | $9,345.09
Annual Total | | | | $112,141.06
Disclaimer
All prices shown are in US Dollar ($). This is a summary estimate, not a quote. For up to date pricing information please visit https://azure.microsoft.com/pricing/calculator/
This estimate was created at 4/13/2018 7:48:34 PM UTC.
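The totals in this estimate can be checked with a few lines of arithmetic. The sketch below uses only the figures printed on the slide; the variable names are illustrative.

```python
from decimal import Decimal

# Estimated monthly cost per service, as printed in the estimate.
line_items = {
    "HDInsight": Decimal("7163.27"),
    "Storage": Decimal("2181.82"),
    "Support": Decimal("0.00"),
}

monthly_total = sum(line_items.values(), Decimal("0"))
annual_from_monthly = monthly_total * 12

print(monthly_total)        # 9345.09
print(annual_from_monthly)  # 112141.08
```

The monthly sum matches the stated $9,345.09. Twelve times that figure is $112,141.08, two cents above the stated annual total of $112,141.06; a plausible explanation, though not confirmed by the estimate itself, is that the calculator annualizes unrounded line items before rounding.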