SlideShare a Scribd company logo
1 of 60
Download to read offline
Page 1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Apache Hive 0.13 Performance Benchmarks
Query Times in Hive 0.13 v. Hive 0.10
June 2014
Results from the Stinger Initiative
Page 2 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
The Stinger Initiative
•  Apache Hive is the de facto standard for SQL-in-Hadoop
•  The Stinger Initiative drove improved SQL semantics & performance
•  Stinger Highlights:
–  13 months
–  145 developers
–  44 companies
–  3 Hive releases (0.11, 0.12 & 0.13)
–  392K lines of new Java code
Page 3 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Benchmark Overview: Hive 10 v Hive 13
•  The TPC Benchmark™DS is a decision support benchmark that models queries and data
maintenance. It evaluates decision support systems that examine large volumes of data to
answer real-world business questions.
•  Test: 50 SQL queries on Hive 0.10 (RCFile) and Hive 0.13 (ORCFile) at scale of 30TB
•  Results: More than 100x speed up for 6 of 50 queries, with average acceleration of 52x
across all queries
•  Test Environment
–  Driven by the Hive Testbench: https://github.com/cartershanklin/hive-testbench
–  Nodes: 20 nodes, 256 GB per node
–  Drives: 6x 4TB WDC WD4000FYYZ-0 drives per node
–  Interconnect: 10GB
–  Processors: 2x Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz for total of 16 CPU cores per machine
Page 4 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Results for Interactive Queries
Queries #3, 7, 12, 15, 18,19, 26, 27, 42, 43, 52, 55, 82, 84, 91 & 96
Page 5 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Results for Interactive Queries
Query # Query Description Hive 13 Hive 10 Change
55 For a given year, month and store manager calculate the total store sales of any combination all brands. 45.30 4,768.74 105X
27 For all items sold in stores located in six states during a given year, find summary statistics 99.05 9,988.27 101X
52 Report the total of extended sales price for all items of a specific brand in a specific year and month. 47.64 4,783.76 100X
42 For each item and a specific year and month calculate the sum of the extended sales price of store transactions. 51.16 4,681.42 92X
7 Compute the averages for promotional items sold in stores where promotion is not offered by mail or a special event. 111.89 9,795.81 88X
15 Report the total catalog sales for customers in selected geographical regions or who made large purchases. 129.43 9,299.68 72X
26 Computes averages for promotional items sold through the catalog channel… 79.14 4,718.24 60X
19 Select the top 10 revenue generating products bought by out of zip code customers for a given year… 106.94 5,668.60 53X
3 Report the total extended sales price per item brand of a specific manufacturer for all sales in a specific month. 127.99 5,433.11 42X
96 Count of sales from a named store to customers with a given number of dependents... 200.53 7,888.14 39X
91 Display total returns of catalog sales by call center and manager in a particular month… 51.61 1,460.19 28X
43 Report the sum of all sales from Sunday to Saturday for stores in a given data range by stores. 305.50 6,153.43 20X
82 Find customers who tend to spend more money (net-paid) on-line than in stores. 753.00 9,302.69 12X
84 List all customers living in a specified city, with an income between 2 values. 494.38 2,654.84 5X
12 Compute the revenue ratios across item classes. 163.42 NA ∞
18 Compute catalog sales in a given year by customers meeting certain characteristics. 162.55 NA ∞
Interactive Queries: Star schema joins over single fact tables, which may
involve advances SQL features such as windowing functions or rollups
All times in seconds
Page 6 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Query 55
For	
  a	
  given	
  year,	
  month	
  and	
  store	
  manager	
  calculate	
  the	
  total	
  store	
  sales	
  of	
  any	
  
combina7on	
  all	
  brands.
4768.74
45.30
0.00 1000.00 2000.00 3000.00 4000.00 5000.00 6000.00
Hive 10
Hive 13
X Factor
105
All	
  Values	
  in	
  Seconds	
  
Page 7 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Query 27
For all items sold in stores located in six states during a given year, find the
average quantity, average list price, average list sales price, average coupon
amount for a given gender, marital status, education and customer
demographic.
9988.27
99.05
0.00 2000.00 4000.00 6000.00 8000.00 10000.00 12000.00
Hive 10
Hive 13
X Factor
101
All	
  Values	
  in	
  Seconds	
  
Page 8 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Query 52
Report the total of extended sales price for all items of a specific brand in a
specific year and month.
4783.76
47.64
0.00 1000.00 2000.00 3000.00 4000.00 5000.00 6000.00
Hive 10
Hive 13
X Factor
100
All	
  Values	
  in	
  Seconds	
  
Page 9 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Query 42
For each item and a specific year and month calculate the sum of the extended
sales price of store transactions.
4681.42
51.16
0.00 500.00 1000.00 1500.00 2000.00 2500.00 3000.00 3500.00 4000.00 4500.00 5000.00
Hive 10
Hive 13
X Factor
92
All	
  Values	
  in	
  Seconds	
  
Page 10 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Query 7
Compute the average quantity, list price, discount, and sales price for
promotional items sold in stores where the promotion is not offered by mail or a
special event. Restrict the results to a specific gender, marital and educational
status.
9795.81
111.89
0.00 2000.00 4000.00 6000.00 8000.00 10000.00 12000.00
Hive 10
Hive 13
X Factor
88
All	
  Values	
  in	
  Seconds	
  
Page 11 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Query 15
Report the total catalog sales for customers in selected geographical regions or
who made large purchases for a given year and quarter.
9299.68
129.43
0.00 1000.00 2000.00 3000.00 4000.00 5000.00 6000.00 7000.00 8000.00 9000.00 10000.00
Hive 10
Hive 13
X Factor
72
All	
  Values	
  in	
  Seconds	
  
Page 12 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Query 26
Computes the average quantity, list price, discount, sales price for promotional
items sold through the catalog channel where the promotion was not offered by
mail or in an event for given gender, marital status and educational status.
4718.24
79.14
0.00 500.00 1000.00 1500.00 2000.00 2500.00 3000.00 3500.00 4000.00 4500.00 5000.00
Hive 10
Hive 13
X Factor
60
All	
  Values	
  in	
  Seconds	
  
Page 13 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Query 19
Select the top 10 revenue generating products bought by out of zip code
customers for a given year, month and manager.
5668.60
106.94
0.00 1000.00 2000.00 3000.00 4000.00 5000.00 6000.00
Hive 10
Hive 13
X Factor
53
All	
  Values	
  in	
  Seconds	
  
Page 14 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Query 3
Report the total extended sales price per item brand of a specific manufacturer
for all sales in a specific month of the year.
5433.11
127.99
0.00 1000.00 2000.00 3000.00 4000.00 5000.00 6000.00
Hive 10
Hive 13
X Factor
42
All	
  Values	
  in	
  Seconds	
  
Page 15 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Query 96
Compute a count of sales from a named store to customers with a given
number of dependents made in a specified half hour period of the day.
7888.14
200.53
0.00 1000.00 2000.00 3000.00 4000.00 5000.00 6000.00 7000.00 8000.00 9000.00
Hive 10
Hive 13
X Factor
39
All	
  Values	
  in	
  Seconds	
  
Page 16 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Query 91
Display total returns of catalog sales by call center and manager in a particular
month for male customers of unknown education or female customers with
advanced degrees with a specified buy potential and from a particular time
zone.
1460.19
51.61
0.00 200.00 400.00 600.00 800.00 1000.00 1200.00 1400.00 1600.00
Hive 10
Hive 13
X Factor
28
All	
  Values	
  in	
  Seconds	
  
Page 17 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Query 43
Report the sum of all sales from Sunday to Saturday for stores in a given data
range by stores.
6153.43
303.50
0.00 1000.00 2000.00 3000.00 4000.00 5000.00 6000.00 7000.00
Hive 10
Hive 13
X Factor
20
All	
  Values	
  in	
  Seconds	
  
Page 18 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Query 82
Find customers who tend to spend more money (net-paid) on-line than in
stores.
9302.69
753.00
0.00 1000.00 2000.00 3000.00 4000.00 5000.00 6000.00 7000.00 8000.00 9000.00 10000.00
Hive 10
Hive 13
X Factor
12
All	
  Values	
  in	
  Seconds	
  
Page 19 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Query 84
List all customers living in a specified city, with an income between 2 values.
2654.84
494.38
0.00 500.00 1000.00 1500.00 2000.00 2500.00 3000.00
Hive 10
Hive 13
X Factor
5
All	
  Values	
  in	
  Seconds	
  
Page 20 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Query 12
Compute the revenue ratios across item classes: For each item in a list of
given categories, during a 30 day time period, sold through the web channel
compute the ratio of sales of that item to the sum of all of the sales in that item's
class.
0.00
163.42
0.00 20.00 40.00 60.00 80.00 100.00 120.00 140.00 160.00 180.00
Hive 10
Hive 13
X Factor
∞
All	
  Values	
  in	
  Seconds	
  
Page 21 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Query 18
Compute, for each county, the average quantity, list price, coupon amount, sales price, net
profit, age, and number of dependents for all items purchased through catalog sales in a given
year by customers who were born in a given list of six months and living in a given list of seven
states and who also belong to a given gender and education demographic.
0.00
162.55
0.00 20.00 40.00 60.00 80.00 100.00 120.00 140.00 160.00 180.00
Hive 10
Hive 13
X Factor
∞
All	
  Values	
  in	
  Seconds	
  
Page 22 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Results for Deep Reporting Queries
Queries #13, 17, 20, 21, 25, 28, 32, 40, 45, 46, 48, 49, 50, 58, 66, 68, 76, 79, 85, 87, 88, 89, 90, 92, 93, 94, 95 & 97
Page 23 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Results for Deep Reporting Queries
Query # Query Description Hive 13 Hive 10 Change
58 Retrieve the items generating the highest revenue… 217.68 34,620.94 159X
40 Compute the impact of an item price change on the sales by computing the total sales for items in a 30 day period… 231.04 19,434.19 84X
68 Compute the per customer extended sales price, extended list price and extended tax for "out of town" shoppers… 85.51 7,091.34 83X
66 Compute web and catalog sales and profits by warehouse 619.73 40,677.91 66X
95 Produce a count of web sales and total shipping cost and net profit in a given 60 day period… 334.37 20,473.84 61X
93 For a given merchandise return reason, report on customers’ total cost of purchases minus the cost of returned items. 3,670.04 200,501.26 55X
21 For all items whose price was changed on a given date, compute the percentage change in inventory… 29.00 1,393.71 48X
46 Compute the per-customer coupon amount and net profit of all "out of town" customers buying from stores… 171.84 8,236.25 48X
88 How many items do we sell between pacific times of a day in certain stores to a certain type of customer? 1,767.31 72,721.59 41X
32 Compute the total discounted amount for a particular manufacturer in a particular 90 day period for catalog sales… 151.89 6,103.18 39X
17 Analyze, for each state, all items that were sold in stores in a particular quarter and returned in the next three quarters… 300.94 11,578.61 38X
94 Produce a count of web sales and total shipping cost and net profit in a given 60 day period… 181.77 5,859.67 32X
79 Compute the per customer coupon amount and net profit of Monday shoppers 272.71 8,568.70 31X
76 Computes the average quantity, list price, discount, sales price for promotional items sold through the web channel… 257.89 7,346.07 28X
92 Compute the total discount on web sales of items from a given manufacturer over a particular 90 day period… 860.82 9,768.99 11X
97 Generate counts of promotional sales and total sales, and their ratio from the web channel… 1,178.89 10,802.96 9X
Deep Reporting: Complex queries involving multiple fact tables or large
intermediate datasets
All times in seconds
Page 24 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Results for Deep Reporting Queries (cont)
Query # Query Description Hive 13 Hive 10 Change
87 Count how many customers have ordered on the same day items on the web and the catalog… 1,672.35 12,308.54 7X
13 Calculate the average sales quantity, average sales price, average wholesale cost, total wholesale cost for store sales… 8,528.86 46,260.90 5X
50 For each store count the number of items in a specified month that were returned after 30, 60, 90, 120 and >120 days… 3,125.99 8,063.41 3X
20 Compute the total revenue and the ratio of total revenue to revenue by item class for specified item categories… 77.59 NA ∞
25 Get all items that were sold in stores in a particular month and year and returned in the next three quarters… 318.87 NA ∞
28 Calculate the average list price, number of non empty (null) list prices and number of distinct list prices… 2,227.67 NA ∞
45 Report the total web sales for customers in specific zip codes, cities, counties or states, or specific items… 112.09 NA ∞
48 Calculate the total sales by different types of customers… 1,813.69 NA ∞
49 Report the top 10 worst return ratios (sales to returns) of all items for each channel by quantity and currency… 559.75 NA ∞
85 For all web return reasons calculate the average sales, average refunded cash and average return fee… 500.67 NA ∞
89 All month and combination of item categories, classes and brands that have had monthly sales larger than 0.1 percent… 164.49 NA ∞
90 The ratio between the number of items sold over the internet in the morning versus items sold in the evening… 131.18 NA ∞
All times in seconds
Deep Reporting: Complex queries involving multiple fact tables or large
intermediate datasets
Page 25 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Query 58
Retrieve the items generating the highest revenue and which had a
revenue that was approximately equivalent across all of store, catalog
and web within the week ending a given date.
34620.94
217.68
0.00 5000.00 10000.00 15000.00 20000.00 25000.00 30000.00 35000.00 40000.00
Hive 10
Hive 13
X Factor
159
All	
  Values	
  in	
  Seconds	
  
Page 26 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Query 40
Compute the impact of an item price change on the sales by computing the total
sales for items in a 30 day period before and after the price change. Group the
items by location of warehouse where they were delivered from.
19434.19
231.04
0.00 5000.00 10000.00 15000.00 20000.00 25000.00
Hive 10
Hive 13
X Factor
84
All	
  Values	
  in	
  Seconds	
  
Page 27 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Query 68
Compute the per customer extended sales price, extended list price and
extended tax for "out of town" shoppers buying from stores located in two cities
in the first two days of each month of three consecutive years. Only consider
customers with specific dependent and vehicle counts.
7091.34
85.51
0.00 1000.00 2000.00 3000.00 4000.00 5000.00 6000.00 7000.00 8000.00
Hive 10
Hive 13
X Factor
83
All	
  Values	
  in	
  Seconds	
  
Page 28 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Query 66
Compute web and catalog sales and profits by warehouse. Report results by
month for a given year during a given 8-hour period.
40677.91
619.73
0.00 5000.00 10000.00 15000.00 20000.00 25000.00 30000.00 35000.00 40000.00 45000.00
Hive 10
Hive 13
X Factor
66
All	
  Values	
  in	
  Seconds	
  
Page 29 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Query 95
Produce a count of web sales and total shipping cost and net profit in a given
60 day period to customers in a given state from a named web site for returned
orders shipped from more than one warehouse.
20473.84
334.37
0.00 5000.00 10000.00 15000.00 20000.00 25000.00
Hive 10
Hive 13
X Factor
61
All	
  Values	
  in	
  Seconds	
  
Page 30 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Query 93
For a given merchandise return reason, report on customers’ total cost of
purchases minus the cost of returned items. Limit the output to the 100
customers with the highest value of total purchases.
200501.26
3670.04
0.00 50000.00 100000.00 150000.00 200000.00 250000.00
Hive 10
Hive 13
X Factor
55
All	
  Values	
  in	
  Seconds	
  
Page 31 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Query 21
For all items whose price was changed on a given date, compute the
percentage change in inventory between the 30-day period BEFORE the price
change and the 30-day period AFTER the change. Group this information by
warehouse.
1393.71
29.00
0.00 200.00 400.00 600.00 800.00 1000.00 1200.00 1400.00 1600.00
Hive 10
Hive 13
X Factor
48
All	
  Values	
  in	
  Seconds	
  
Page 32 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Query 46
Compute the per-customer coupon amount and net profit of all "out of town" customers buying
from stores located in 5 cities on weekends in three consecutive years. The customers need to
fit the profile of having a specific dependent count and vehicle count. For all these customers
print the city they lived in at the time of purchase, the city in which the store is located, the
coupon amount and net profit.
8236.25
171.84
0.00 1000.00 2000.00 3000.00 4000.00 5000.00 6000.00 7000.00 8000.00 9000.00
Hive 10
Hive 13
X Factor
48
All	
  Values	
  in	
  Seconds	
  
Page 33 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Query 88
How many items do we sell between pacific times of a day in certain stores to customers with
one dependent count and 2 or less vehicles registered or 2 dependents with 4 or fewer vehicles
registered or 3 dependents and five or less vehicles registered. In one row break the counts
into sells from 8:30 to 9, 9 to 9:30, 9:30 to 10 ... 12 to 12:30
72721.59
1767.31
0.00 10000.00 20000.00 30000.00 40000.00 50000.00 60000.00 70000.00 80000.00
Hive 10
Hive 13
X Factor
41
All	
  Values	
  in	
  Seconds	
  
Page 34 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Query 32
Compute the total discounted amount for a particular manufacturer in a
particular 90 day period for catalog sales whose discounts exceeded the
average discount by at least 30%.
6103.18
154.89
0.00 1000.00 2000.00 3000.00 4000.00 5000.00 6000.00 7000.00
Hive 10
Hive 13
X Factor
39
All	
  Values	
  in	
  Seconds	
  
Page 35 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Query 17
Analyze, for each state, all items that were sold in stores in a particular quarter
and returned in the next three quarters and then re-purchased by the customer
through the catalog channel in the three following quarters.
11578.61
300.94
0.00 2000.00 4000.00 6000.00 8000.00 10000.00 12000.00 14000.00
Hive 10
Hive 13
X Factor
38
All	
  Values	
  in	
  Seconds	
  
Page 36 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Query 94
Produce a count of web sales and total shipping cost and net profit in a given
60 day period to customers in a given state from a named web site for non
returned orders shipped from more than one warehouse.
5859.67
181.77
0.00 1000.00 2000.00 3000.00 4000.00 5000.00 6000.00 7000.00
Hive 10
Hive 13
X Factor
32
All	
  Values	
  in	
  Seconds	
  
Page 37 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Query 79
Compute the per customer coupon amount and net profit of Monday shoppers.
Only purchases of three consecutive years made on Mondays in large stores by
customers with a certain dependent count and with a large vehicle count are
considered.
8568.70
272.71
0.00 1000.00 2000.00 3000.00 4000.00 5000.00 6000.00 7000.00 8000.00 9000.00
Hive 10
Hive 13
X Factor
31
All	
  Values	
  in	
  Seconds	
  
Page 38 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Query 76
Computes the average quantity, list price, discount, sales price for promotional
items sold through the web channel where the promotion is not offered by mail
or in an event for given gender, marital status and educational status.
7346.07
257.89
0.00 1000.00 2000.00 3000.00 4000.00 5000.00 6000.00 7000.00 8000.00
Hive 10
Hive 13
X Factor
28
All	
  Values	
  in	
  Seconds	
  
Page 39 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Query 92
Compute the total discount on web sales of items from a given manufacturer
over a particular 90 day period for sales whose discount exceeded 30% over
the average discount of items from that manufacturer in that period of time.
9768.99
860.82
0.00 2000.00 4000.00 6000.00 8000.00 10000.00 12000.00
Hive 10
Hive 13
X Factor
11
All	
  Values	
  in	
  Seconds	
  
Page 40 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Query 97
Generate counts of promotional sales and total sales, and their ratio from the
web channel for a particular item category and month to customers in a given
time zone.
10802.96
1178.89
0.00 2000.00 4000.00 6000.00 8000.00 10000.00 12000.00
Hive 10
Hive 13
X Factor
9
All	
  Values	
  in	
  Seconds	
  
Page 41 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Query 87
Count how many customers have ordered on the same day items on the web
and the catalog and on the same day have bought items in a store.
12308.54
1672.35
0.00 2000.00 4000.00 6000.00 8000.00 10000.00 12000.00 14000.00
Hive 10
Hive 13
X Factor
7
All	
  Values	
  in	
  Seconds	
  
Page 42 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Query 13
Calculate the average sales quantity, average sales price, average wholesale cost, total
wholesale cost for store sales of different customer types (e.g., based on marital status,
education status) including their household demographics, sales price and different
combinations of state and sales profit for a given year.
46260.90
8528.86
0.00 5000.00 10000.00 15000.00 20000.00 25000.00 30000.00 35000.00 40000.00 45000.00 50000.00
Hive 10
Hive 13
X Factor
5
All	
  Values	
  in	
  Seconds	
  
Page 43 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Query 50
For each store count the number of items in a specified month that were
returned after 30, 60, 90, 120 and more than 120 days from the day of
purchase.
8063.41
3125.99
0.00 1000.00 2000.00 3000.00 4000.00 5000.00 6000.00 7000.00 8000.00 9000.00
Hive 10
Hive 13
X Factor
3
All	
  Values	
  in	
  Seconds	
  
Page 44 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Query 20
Compute the total revenue and the ratio of total revenue to revenue by item
class for specified item categories and time periods.
0.00
77.59
0.00 10.00 20.00 30.00 40.00 50.00 60.00 70.00 80.00 90.00
Hive 10
Hive 13
X Factor
∞
All	
  Values	
  in	
  Seconds	
  
Page 45 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Query 25
Get all items that were sold in stores in a particular month and year AND returned in the next
three quarters AND re-purchased by the customer through the catalog channel in the six
following months.
For these items, compute the sum of net profit of store sales, net loss of store loss and net profit
of catalog. Group this information by item and store.
0.00
318.87
0.00 50.00 100.00 150.00 200.00 250.00 300.00 350.00
Hive 10
Hive 13
X Factor
∞
All	
  Values	
  in	
  Seconds	
  
Page 46 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Query 28
Calculate the average list price, number of non empty (null) list prices and
number of distinct list prices of six different sales buckets of the store sales
channel. Each bucket is defined by a range of distinct items and information
about list price, coupon amount and wholesale cost.
0.00
2227.67
0.00 500.00 1000.00 1500.00 2000.00 2500.00
Hive 10
Hive 13
X Factor
∞
All	
  Values	
  in	
  Seconds	
  
Page 47 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Query 45
Report the total web sales for customers in specific zip codes, cities, counties or
states, or specific items for a given year and quarter.
0.00
112.09
0.00 20.00 40.00 60.00 80.00 100.00 120.00
Hive 10
Hive 13
X Factor
∞
All	
  Values	
  in	
  Seconds	
  
Page 48 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Query 48
Calculate the total sales by different types of customers (e.g., based on marital
status, education status), sales price and different combinations of state and
sales profit.
0.00
1813.69
0.00 200.00 400.00 600.00 800.00 1000.00 1200.00 1400.00 1600.00 1800.00 2000.00
Hive 10
Hive 13
X Factor
∞
All	
  Values	
  in	
  Seconds	
  
Page 49 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Query 49
Report the top 10 worst return ratios (sales to returns) of all items for each
channel by quantity and currency sorted by ratio. Quantity ratio is defined as
total number of sales to total number of returns. Currency ratio is defined as
sum of return amount to sum of net paid.
0.00
559.75
0.00 100.00 200.00 300.00 400.00 500.00 600.00
Hive 10
Hive 13
X Factor
∞
All	
  Values	
  in	
  Seconds	
  
Page 50 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Query 85
For all web return reason calculate the average sales, average refunded cash
and average return fee by different combinations of customer and sales types
(e.g., based on marital status, education status, state and sales profit).
0.00
500.67
0.00 100.00 200.00 300.00 400.00 500.00 600.00
Hive 10
Hive 13
X Factor
∞
All	
  Values	
  in	
  Seconds	
  
Page 51 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Query 89
Within a year list all month and combination of item categories, classes and
brands that have had monthly sales larger than 0.1 percent of the total yearly
sales.
0.00
164.49
0.00 20.00 40.00 60.00 80.00 100.00 120.00 140.00 160.00 180.00
Hive 10
Hive 13
X Factor
∞
All	
  Values	
  in	
  Seconds	
  
Page 52 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Query 90
What is the ratio between the number of items sold over the internet in the
morning (8 to 9am) to the number of items sold in the evening (7 to 8pm) of
customers with a specified number of dependents. Consider only websites with
a high amount of content.
0.00
131.18
0.00 20.00 40.00 60.00 80.00 100.00 120.00 140.00
Hive 10
Hive 13
X Factor
∞
All	
  Values	
  in	
  Seconds	
  
Page 53 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Results for Data Mining Queries
Queries #34, 39, 64, 71, 73 & 98
Page 54 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Results for Data Mining Queries
Query # Query Description Hive 13 Hive 10 Change
73 Count the number of customers with specific buy potentials and whose dependent count to vehicle count ratio is… 48.98 5,866.06 120X
71 Select the top 10 revenue generating products, sold during breakfast or dinner time for one month… 91.61 10,498.57 115X
34 Display all customers with specific buy potentials and whose dependent count to vehicle count ratio is larger than 1.2… 125.72 6,745.30 54X
39 Query with multiple, related iterations… 111.74 2,452.08 22X
64 Find those stores that sold more cross-sales items from one year to another 6,821.24 34,289.66 5X
98 Report on items sold in a given 30 day period, belonging to the specified category. 1,085.06 NA ∞
All times in seconds
Deep Mining: Queries that return large amounts of data for further processing
by other tools
Page 55 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Query 73
Count	
  the	
  number	
  of	
  customers	
  with	
  specific	
  buy	
  poten7als	
  and	
  whose	
  dependent	
  
count	
  to	
  vehicle	
  count	
  ra7o	
  is	
  larger	
  than	
  1	
  and	
  who	
  in	
  three	
  consecu7ve	
  years	
  bought	
  
in	
  stores	
  located	
  in	
  4	
  coun7es	
  between	
  1	
  and	
  5	
  items	
  in	
  one	
  purchase.	
  	
  Only	
  purchases	
  
in	
  the	
  first	
  2	
  days	
  of	
  the	
  months	
  are	
  considered.	
  
5866.06
48.98
0.00 1000.00 2000.00 3000.00 4000.00 5000.00 6000.00 7000.00
Hive 10
Hive 13
X Factor
120
All	
  Values	
  in	
  Seconds	
  
Page 56 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Query 71
Select	
  the	
  top	
  10	
  revenue	
  genera7ng	
  products,	
  sold	
  during	
  breakfast	
  or	
  dinner	
  7me	
  for	
  
one	
  month	
  managed	
  by	
  a	
  given	
  manager	
  across	
  all	
  three	
  sales	
  channels.
10498.57
91.61
0.00 2000.00 4000.00 6000.00 8000.00 10000.00 12000.00
Hive 10
Hive 13
X Factor
115
All	
  Values	
  in	
  Seconds	
  
Page 57 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Query 34
Display all customers with specific buy potentials and whose dependent count
to vehicle count ratio is larger than 1.2, who in three consecutive years made
purchases with between 15 and 20 items in the beginning or the end of each
month in stores located in 8 counties.
6745.30
125.72
0.00 1000.00 2000.00 3000.00 4000.00 5000.00 6000.00 7000.00 8000.00
Hive 10
Hive 13
X Factor
54
All	
  Values	
  in	
  Seconds	
  
Page 58 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Query 39
This query contains multiple, related iterations:
•  Iteration 1: Calculate the coefficient of variation and mean of every
item and warehouse of two consecutive months
•  Iteration 2: Find items that had a coefficient of variation in the first
months of 1.5 or large
2452.08
111.74
0.00 500.00 1000.00 1500.00 2000.00 2500.00 3000.00
Hive 10
Hive 13
X Factor
22
All	
  Values	
  in	
  Seconds	
  
Page 59 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Query 64
Find those stores that sold more cross-sales items from one year to another.
Cross-sale items are items that are sold over the Internet, by catalog and in
store.
34289.66
6821.24
0.00 5000.00 10000.00 15000.00 20000.00 25000.00 30000.00 35000.00 40000.00
Hive 10
Hive 13
X Factor
5
All	
  Values	
  in	
  Seconds	
  
Page 60 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Query 98
Report on items sold in a given 30 day period, belonging to the specified
category.
0.00
1085.06
0.00 200.00 400.00 600.00 800.00 1000.00 1200.00
Hive 10
Hive 13
X Factor
∞
All	
  Values	
  in	
  Seconds	
  

More Related Content

What's hot

Managing Enterprise Hadoop Clusters with Apache Ambari
Managing Enterprise Hadoop Clusters with Apache AmbariManaging Enterprise Hadoop Clusters with Apache Ambari
Managing Enterprise Hadoop Clusters with Apache AmbariHortonworks
 
Accelerating query processing
Accelerating query processingAccelerating query processing
Accelerating query processingDataWorks Summit
 
Hive present-and-feature-shanghai
Hive present-and-feature-shanghaiHive present-and-feature-shanghai
Hive present-and-feature-shanghaiYifeng Jiang
 
Attunity Hortonworks Webinar- Sept 22, 2016
Attunity Hortonworks Webinar- Sept 22, 2016Attunity Hortonworks Webinar- Sept 22, 2016
Attunity Hortonworks Webinar- Sept 22, 2016Hortonworks
 
Double Your Hadoop Hardware Performance with SmartSense
Double Your Hadoop Hardware Performance with SmartSenseDouble Your Hadoop Hardware Performance with SmartSense
Double Your Hadoop Hardware Performance with SmartSenseHortonworks
 
Hortonworks Hadoop summit 2011 keynote - eric14
Hortonworks Hadoop summit 2011 keynote - eric14Hortonworks Hadoop summit 2011 keynote - eric14
Hortonworks Hadoop summit 2011 keynote - eric14Hortonworks
 
Hortonworks Data in Motion Webinar Series - Part 1
Hortonworks Data in Motion Webinar Series - Part 1Hortonworks Data in Motion Webinar Series - Part 1
Hortonworks Data in Motion Webinar Series - Part 1Hortonworks
 
Hortonworks Data Cloud for AWS
Hortonworks Data Cloud for AWS Hortonworks Data Cloud for AWS
Hortonworks Data Cloud for AWS Hortonworks
 
Enabling Diverse Workload Scheduling in YARN
Enabling Diverse Workload Scheduling in YARNEnabling Diverse Workload Scheduling in YARN
Enabling Diverse Workload Scheduling in YARNDataWorks Summit
 
Apache Ambari - HDP Cluster Upgrades Operational Deep Dive and Troubleshooting
Apache Ambari - HDP Cluster Upgrades Operational Deep Dive and TroubleshootingApache Ambari - HDP Cluster Upgrades Operational Deep Dive and Troubleshooting
Apache Ambari - HDP Cluster Upgrades Operational Deep Dive and TroubleshootingDataWorks Summit/Hadoop Summit
 
Apache Ambari: Managing Hadoop and YARN
Apache Ambari: Managing Hadoop and YARNApache Ambari: Managing Hadoop and YARN
Apache Ambari: Managing Hadoop and YARNHortonworks
 
Webinar Series Part 5 New Features of HDF 5
Webinar Series Part 5 New Features of HDF 5Webinar Series Part 5 New Features of HDF 5
Webinar Series Part 5 New Features of HDF 5Hortonworks
 
What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?DataWorks Summit
 
YARN Ready - Integrating to YARN using Slider Webinar
YARN Ready - Integrating to YARN using Slider WebinarYARN Ready - Integrating to YARN using Slider Webinar
YARN Ready - Integrating to YARN using Slider WebinarHortonworks
 
Comparative Performance Analysis of AWS EC2 Instance Types Commonly Used for ...
Comparative Performance Analysis of AWS EC2 Instance Types Commonly Used for ...Comparative Performance Analysis of AWS EC2 Instance Types Commonly Used for ...
Comparative Performance Analysis of AWS EC2 Instance Types Commonly Used for ...DataWorks Summit
 
Hortonworks Data In Motion Series Part 3 - HDF Ambari
Hortonworks Data In Motion Series Part 3 - HDF Ambari Hortonworks Data In Motion Series Part 3 - HDF Ambari
Hortonworks Data In Motion Series Part 3 - HDF Ambari Hortonworks
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseDataWorks Summit
 
Get Started Building YARN Applications
Get Started Building YARN ApplicationsGet Started Building YARN Applications
Get Started Building YARN ApplicationsHortonworks
 
Transactional operations in Apache Hive: present and future
Transactional operations in Apache Hive: present and futureTransactional operations in Apache Hive: present and future
Transactional operations in Apache Hive: present and futureDataWorks Summit
 

What's hot (20)

Managing Enterprise Hadoop Clusters with Apache Ambari
Managing Enterprise Hadoop Clusters with Apache AmbariManaging Enterprise Hadoop Clusters with Apache Ambari
Managing Enterprise Hadoop Clusters with Apache Ambari
 
Accelerating query processing
Accelerating query processingAccelerating query processing
Accelerating query processing
 
Hive present-and-feature-shanghai
Hive present-and-feature-shanghaiHive present-and-feature-shanghai
Hive present-and-feature-shanghai
 
Attunity Hortonworks Webinar- Sept 22, 2016
Attunity Hortonworks Webinar- Sept 22, 2016Attunity Hortonworks Webinar- Sept 22, 2016
Attunity Hortonworks Webinar- Sept 22, 2016
 
Double Your Hadoop Hardware Performance with SmartSense
Double Your Hadoop Hardware Performance with SmartSenseDouble Your Hadoop Hardware Performance with SmartSense
Double Your Hadoop Hardware Performance with SmartSense
 
Hortonworks Hadoop summit 2011 keynote - eric14
Hortonworks Hadoop summit 2011 keynote - eric14Hortonworks Hadoop summit 2011 keynote - eric14
Hortonworks Hadoop summit 2011 keynote - eric14
 
Hortonworks Data in Motion Webinar Series - Part 1
Hortonworks Data in Motion Webinar Series - Part 1Hortonworks Data in Motion Webinar Series - Part 1
Hortonworks Data in Motion Webinar Series - Part 1
 
Hortonworks Data Cloud for AWS
Hortonworks Data Cloud for AWS Hortonworks Data Cloud for AWS
Hortonworks Data Cloud for AWS
 
Enabling Diverse Workload Scheduling in YARN
Enabling Diverse Workload Scheduling in YARNEnabling Diverse Workload Scheduling in YARN
Enabling Diverse Workload Scheduling in YARN
 
Apache Ambari - HDP Cluster Upgrades Operational Deep Dive and Troubleshooting
Apache Ambari - HDP Cluster Upgrades Operational Deep Dive and TroubleshootingApache Ambari - HDP Cluster Upgrades Operational Deep Dive and Troubleshooting
Apache Ambari - HDP Cluster Upgrades Operational Deep Dive and Troubleshooting
 
Apache Ambari: Managing Hadoop and YARN
Apache Ambari: Managing Hadoop and YARNApache Ambari: Managing Hadoop and YARN
Apache Ambari: Managing Hadoop and YARN
 
Webinar Series Part 5 New Features of HDF 5
Webinar Series Part 5 New Features of HDF 5Webinar Series Part 5 New Features of HDF 5
Webinar Series Part 5 New Features of HDF 5
 
What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?
 
Database as a Service - Tutorial @ICDE 2010
Database as a Service - Tutorial @ICDE 2010Database as a Service - Tutorial @ICDE 2010
Database as a Service - Tutorial @ICDE 2010
 
YARN Ready - Integrating to YARN using Slider Webinar
YARN Ready - Integrating to YARN using Slider WebinarYARN Ready - Integrating to YARN using Slider Webinar
YARN Ready - Integrating to YARN using Slider Webinar
 
Comparative Performance Analysis of AWS EC2 Instance Types Commonly Used for ...
Comparative Performance Analysis of AWS EC2 Instance Types Commonly Used for ...Comparative Performance Analysis of AWS EC2 Instance Types Commonly Used for ...
Comparative Performance Analysis of AWS EC2 Instance Types Commonly Used for ...
 
Hortonworks Data In Motion Series Part 3 - HDF Ambari
Hortonworks Data In Motion Series Part 3 - HDF Ambari Hortonworks Data In Motion Series Part 3 - HDF Ambari
Hortonworks Data In Motion Series Part 3 - HDF Ambari
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data Warehouse
 
Get Started Building YARN Applications
Get Started Building YARN ApplicationsGet Started Building YARN Applications
Get Started Building YARN Applications
 
Transactional operations in Apache Hive: present and future
Transactional operations in Apache Hive: present and futureTransactional operations in Apache Hive: present and future
Transactional operations in Apache Hive: present and future
 

Viewers also liked

YARN Ready: Apache Spark
YARN Ready: Apache Spark YARN Ready: Apache Spark
YARN Ready: Apache Spark Hortonworks
 
Hortonworks technical workshop operations with ambari
Hortonworks technical workshop   operations with ambariHortonworks technical workshop   operations with ambari
Hortonworks technical workshop operations with ambariHortonworks
 
Apache Ambari - What's New in 2.1
Apache Ambari - What's New in 2.1Apache Ambari - What's New in 2.1
Apache Ambari - What's New in 2.1Hortonworks
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Hortonworks
 
YARN webinar series: Using Scalding to write applications to Hadoop and YARN
YARN webinar series: Using Scalding to write applications to Hadoop and YARNYARN webinar series: Using Scalding to write applications to Hadoop and YARN
YARN webinar series: Using Scalding to write applications to Hadoop and YARNHortonworks
 
Webinar turbo charging_data_science_hawq_on_hdp_final
Webinar turbo charging_data_science_hawq_on_hdp_finalWebinar turbo charging_data_science_hawq_on_hdp_final
Webinar turbo charging_data_science_hawq_on_hdp_finalHortonworks
 
Hadoop crashcourse v3
Hadoop crashcourse v3Hadoop crashcourse v3
Hadoop crashcourse v3Hortonworks
 
Hortonworks Technical Workshop: HBase For Mission Critical Applications
Hortonworks Technical Workshop: HBase For Mission Critical ApplicationsHortonworks Technical Workshop: HBase For Mission Critical Applications
Hortonworks Technical Workshop: HBase For Mission Critical ApplicationsHortonworks
 
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...
Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...Hortonworks
 
Spark Summit East 2015 Keynote -- Databricks CEO Ion Stoica
Spark Summit East 2015 Keynote -- Databricks CEO Ion StoicaSpark Summit East 2015 Keynote -- Databricks CEO Ion Stoica
Spark Summit East 2015 Keynote -- Databricks CEO Ion StoicaDatabricks
 
IBM Hadoop-DS Benchmark Report - 30TB
IBM Hadoop-DS Benchmark Report - 30TBIBM Hadoop-DS Benchmark Report - 30TB
IBM Hadoop-DS Benchmark Report - 30TBGord Sissons
 
Beyond SQL: Speeding up Spark with DataFrames
Beyond SQL: Speeding up Spark with DataFramesBeyond SQL: Speeding up Spark with DataFrames
Beyond SQL: Speeding up Spark with DataFramesDatabricks
 
Spark Summit East 2015 Advanced Devops Student Slides
Spark Summit East 2015 Advanced Devops Student SlidesSpark Summit East 2015 Advanced Devops Student Slides
Spark Summit East 2015 Advanced Devops Student SlidesDatabricks
 
Hug Hbase Presentation.
Hug Hbase Presentation.Hug Hbase Presentation.
Hug Hbase Presentation.Jack Levin
 
Kylin OLAP Engine Tour
Kylin OLAP Engine TourKylin OLAP Engine Tour
Kylin OLAP Engine TourLuke Han
 
Hadoop-DS: Which SQL-on-Hadoop Rules the Herd
Hadoop-DS: Which SQL-on-Hadoop Rules the HerdHadoop-DS: Which SQL-on-Hadoop Rules the Herd
Hadoop-DS: Which SQL-on-Hadoop Rules the HerdIBM Analytics
 
Apache Kylin – Cubes on Hadoop
Apache Kylin – Cubes on HadoopApache Kylin – Cubes on Hadoop
Apache Kylin – Cubes on HadoopDataWorks Summit
 
ORC: 2015 Faster, Better, Smaller
ORC: 2015 Faster, Better, SmallerORC: 2015 Faster, Better, Smaller
ORC: 2015 Faster, Better, SmallerDataWorks Summit
 
Hive+Tez: A performance deep dive
Hive+Tez: A performance deep diveHive+Tez: A performance deep dive
Hive+Tez: A performance deep divet3rmin4t0r
 

Viewers also liked (20)

YARN Ready: Apache Spark
YARN Ready: Apache Spark YARN Ready: Apache Spark
YARN Ready: Apache Spark
 
Hortonworks technical workshop operations with ambari
Hortonworks technical workshop   operations with ambariHortonworks technical workshop   operations with ambari
Hortonworks technical workshop operations with ambari
 
Apache Ambari - What's New in 2.1
Apache Ambari - What's New in 2.1Apache Ambari - What's New in 2.1
Apache Ambari - What's New in 2.1
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It!
 
YARN webinar series: Using Scalding to write applications to Hadoop and YARN
YARN webinar series: Using Scalding to write applications to Hadoop and YARNYARN webinar series: Using Scalding to write applications to Hadoop and YARN
YARN webinar series: Using Scalding to write applications to Hadoop and YARN
 
Webinar turbo charging_data_science_hawq_on_hdp_final
Webinar turbo charging_data_science_hawq_on_hdp_finalWebinar turbo charging_data_science_hawq_on_hdp_final
Webinar turbo charging_data_science_hawq_on_hdp_final
 
Hadoop crashcourse v3
Hadoop crashcourse v3Hadoop crashcourse v3
Hadoop crashcourse v3
 
Hortonworks Technical Workshop: HBase For Mission Critical Applications
Hortonworks Technical Workshop: HBase For Mission Critical ApplicationsHortonworks Technical Workshop: HBase For Mission Critical Applications
Hortonworks Technical Workshop: HBase For Mission Critical Applications
 
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...
Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...
 
Spark Summit East 2015 Keynote -- Databricks CEO Ion Stoica
Spark Summit East 2015 Keynote -- Databricks CEO Ion StoicaSpark Summit East 2015 Keynote -- Databricks CEO Ion Stoica
Spark Summit East 2015 Keynote -- Databricks CEO Ion Stoica
 
IBM Hadoop-DS Benchmark Report - 30TB
IBM Hadoop-DS Benchmark Report - 30TBIBM Hadoop-DS Benchmark Report - 30TB
IBM Hadoop-DS Benchmark Report - 30TB
 
Beyond SQL: Speeding up Spark with DataFrames
Beyond SQL: Speeding up Spark with DataFramesBeyond SQL: Speeding up Spark with DataFrames
Beyond SQL: Speeding up Spark with DataFrames
 
Spark Summit East 2015 Advanced Devops Student Slides
Spark Summit East 2015 Advanced Devops Student SlidesSpark Summit East 2015 Advanced Devops Student Slides
Spark Summit East 2015 Advanced Devops Student Slides
 
Hug Hbase Presentation.
Hug Hbase Presentation.Hug Hbase Presentation.
Hug Hbase Presentation.
 
Kylin OLAP Engine Tour
Kylin OLAP Engine TourKylin OLAP Engine Tour
Kylin OLAP Engine Tour
 
Hadoop-DS: Which SQL-on-Hadoop Rules the Herd
Hadoop-DS: Which SQL-on-Hadoop Rules the HerdHadoop-DS: Which SQL-on-Hadoop Rules the Herd
Hadoop-DS: Which SQL-on-Hadoop Rules the Herd
 
February 2014 HUG : Pig On Tez
February 2014 HUG : Pig On TezFebruary 2014 HUG : Pig On Tez
February 2014 HUG : Pig On Tez
 
Apache Kylin – Cubes on Hadoop
Apache Kylin – Cubes on HadoopApache Kylin – Cubes on Hadoop
Apache Kylin – Cubes on Hadoop
 
ORC: 2015 Faster, Better, Smaller
ORC: 2015 Faster, Better, SmallerORC: 2015 Faster, Better, Smaller
ORC: 2015 Faster, Better, Smaller
 
Hive+Tez: A performance deep dive
Hive+Tez: A performance deep diveHive+Tez: A performance deep dive
Hive+Tez: A performance deep dive
 

Similar to Apache Hive 0.13 Performance Benchmarks

Business Analytics Forum - Sheffield November 2018
Business Analytics Forum - Sheffield November 2018Business Analytics Forum - Sheffield November 2018
Business Analytics Forum - Sheffield November 2018Simon Harrison ACMA CGMA
 
Improving profitability of campaigns through data science
Improving profitability of campaigns through data scienceImproving profitability of campaigns through data science
Improving profitability of campaigns through data scienceswebi
 
The 100 Task Playbook - Sample
The 100 Task Playbook - SampleThe 100 Task Playbook - Sample
The 100 Task Playbook - SampleMartin Bell
 
Technology Update: Channel IQ Tech Innovation and Product Roadmap with CEO An...
Technology Update: Channel IQ Tech Innovation and Product Roadmap with CEO An...Technology Update: Channel IQ Tech Innovation and Product Roadmap with CEO An...
Technology Update: Channel IQ Tech Innovation and Product Roadmap with CEO An...Channel IQ
 
Aligned Supply Chain Metrics
Aligned Supply Chain MetricsAligned Supply Chain Metrics
Aligned Supply Chain MetricsElaine Twomey
 
2 strategic sourcing.pptx
2 strategic sourcing.pptx2 strategic sourcing.pptx
2 strategic sourcing.pptxAnish993330
 
Management consultancy-chapter-26-and-35
Management consultancy-chapter-26-and-35Management consultancy-chapter-26-and-35
Management consultancy-chapter-26-and-35Holy Cross College
 
Operation strategy assignment sheema raza
Operation strategy assignment sheema razaOperation strategy assignment sheema raza
Operation strategy assignment sheema razaSheema Adil
 
Retail managment ppt
Retail managment pptRetail managment ppt
Retail managment ppttejasvaidya01
 
Data mining to improve e-mail marketing
Data mining to improve e-mail marketing Data mining to improve e-mail marketing
Data mining to improve e-mail marketing Ritu Sarkar
 
eSUG-April 5-final
eSUG-April 5-finaleSUG-April 5-final
eSUG-April 5-finalShrey Tandon
 
[DSC Europe 22] Using AI to solve complex business problems: optimizing markd...
[DSC Europe 22] Using AI to solve complex business problems: optimizing markd...[DSC Europe 22] Using AI to solve complex business problems: optimizing markd...
[DSC Europe 22] Using AI to solve complex business problems: optimizing markd...DataScienceConferenc1
 
Telstra Case Solution for its company.pptx
Telstra Case Solution for its company.pptxTelstra Case Solution for its company.pptx
Telstra Case Solution for its company.pptx22pgp075
 
Walmart Sales Prediction Using Rapidminer Prepared by Naga.docx
Walmart Sales Prediction Using Rapidminer Prepared by  Naga.docxWalmart Sales Prediction Using Rapidminer Prepared by  Naga.docx
Walmart Sales Prediction Using Rapidminer Prepared by Naga.docxcelenarouzie
 
Pricing techniques for profit maximization
Pricing techniques for profit maximizationPricing techniques for profit maximization
Pricing techniques for profit maximizationskillsdeveloper
 
Sales Wave Apps - Partner Training
Sales Wave Apps - Partner TrainingSales Wave Apps - Partner Training
Sales Wave Apps - Partner TrainingSalesforce Partners
 
Uop acc 561 week 6 final guide
Uop acc 561 week 6 final guideUop acc 561 week 6 final guide
Uop acc 561 week 6 final guidekrystalhero123
 
Capstone_Guide_English.pdf
Capstone_Guide_English.pdfCapstone_Guide_English.pdf
Capstone_Guide_English.pdfAjitChandgude3
 

Similar to Apache Hive 0.13 Performance Benchmarks (20)

Business Analytics Forum - Sheffield November 2018
Business Analytics Forum - Sheffield November 2018Business Analytics Forum - Sheffield November 2018
Business Analytics Forum - Sheffield November 2018
 
Improving profitability of campaigns through data science
Improving profitability of campaigns through data scienceImproving profitability of campaigns through data science
Improving profitability of campaigns through data science
 
The 100 Task Playbook - Sample
The 100 Task Playbook - SampleThe 100 Task Playbook - Sample
The 100 Task Playbook - Sample
 
Technology Update: Channel IQ Tech Innovation and Product Roadmap with CEO An...
Technology Update: Channel IQ Tech Innovation and Product Roadmap with CEO An...Technology Update: Channel IQ Tech Innovation and Product Roadmap with CEO An...
Technology Update: Channel IQ Tech Innovation and Product Roadmap with CEO An...
 
Aligned Supply Chain Metrics
Aligned Supply Chain MetricsAligned Supply Chain Metrics
Aligned Supply Chain Metrics
 
2 strategic sourcing.pptx
2 strategic sourcing.pptx2 strategic sourcing.pptx
2 strategic sourcing.pptx
 
Chap010
Chap010Chap010
Chap010
 
Management consultancy-chapter-26-and-35
Management consultancy-chapter-26-and-35Management consultancy-chapter-26-and-35
Management consultancy-chapter-26-and-35
 
Operation strategy assignment sheema raza
Operation strategy assignment sheema razaOperation strategy assignment sheema raza
Operation strategy assignment sheema raza
 
Business Eye 360 EN
Business Eye 360 ENBusiness Eye 360 EN
Business Eye 360 EN
 
Retail managment ppt
Retail managment pptRetail managment ppt
Retail managment ppt
 
Data mining to improve e-mail marketing
Data mining to improve e-mail marketing Data mining to improve e-mail marketing
Data mining to improve e-mail marketing
 
eSUG-April 5-final
eSUG-April 5-finaleSUG-April 5-final
eSUG-April 5-final
 
[DSC Europe 22] Using AI to solve complex business problems: optimizing markd...
[DSC Europe 22] Using AI to solve complex business problems: optimizing markd...[DSC Europe 22] Using AI to solve complex business problems: optimizing markd...
[DSC Europe 22] Using AI to solve complex business problems: optimizing markd...
 
Telstra Case Solution for its company.pptx
Telstra Case Solution for its company.pptxTelstra Case Solution for its company.pptx
Telstra Case Solution for its company.pptx
 
Walmart Sales Prediction Using Rapidminer Prepared by Naga.docx
Walmart Sales Prediction Using Rapidminer Prepared by  Naga.docxWalmart Sales Prediction Using Rapidminer Prepared by  Naga.docx
Walmart Sales Prediction Using Rapidminer Prepared by Naga.docx
 
Pricing techniques for profit maximization
Pricing techniques for profit maximizationPricing techniques for profit maximization
Pricing techniques for profit maximization
 
Sales Wave Apps - Partner Training
Sales Wave Apps - Partner TrainingSales Wave Apps - Partner Training
Sales Wave Apps - Partner Training
 
Uop acc 561 week 6 final guide
Uop acc 561 week 6 final guideUop acc 561 week 6 final guide
Uop acc 561 week 6 final guide
 
Capstone_Guide_English.pdf
Capstone_Guide_English.pdfCapstone_Guide_English.pdf
Capstone_Guide_English.pdf
 

More from Hortonworks

Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks
 
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT StrategyIoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT StrategyHortonworks
 
Getting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with CloudbreakGetting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with CloudbreakHortonworks
 
Johns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log EventsJohns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log EventsHortonworks
 
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysCatch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysHortonworks
 
HDF 3.2 - What's New
HDF 3.2 - What's NewHDF 3.2 - What's New
HDF 3.2 - What's NewHortonworks
 
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerCuring Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerHortonworks
 
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsInterpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsHortonworks
 
IBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeIBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeHortonworks
 
Premier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidPremier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidHortonworks
 
Accelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at ScaleAccelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at ScaleHortonworks
 
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATATIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATAHortonworks
 
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Hortonworks
 
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: ClearsenseDelivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: ClearsenseHortonworks
 
Making Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseMaking Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseHortonworks
 
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationWebinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationHortonworks
 
Driving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementDriving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementHortonworks
 
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHortonworks
 
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks
 
Unlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDCUnlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDCHortonworks
 

More from Hortonworks (20)

Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
 
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT StrategyIoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
 
Getting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with CloudbreakGetting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with Cloudbreak
 
Johns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log EventsJohns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log Events
 
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysCatch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
 
HDF 3.2 - What's New
HDF 3.2 - What's NewHDF 3.2 - What's New
HDF 3.2 - What's New
 
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerCuring Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
 
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsInterpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
 
IBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeIBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data Landscape
 
Premier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidPremier Inside-Out: Apache Druid
Premier Inside-Out: Apache Druid
 
Accelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at ScaleAccelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at Scale
 
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATATIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
 
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
 
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: ClearsenseDelivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
 
Making Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseMaking Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with Ease
 
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationWebinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
 
Driving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementDriving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data Management
 
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
 
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
 
Unlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDCUnlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDC
 

Recently uploaded

Presentation of project of business person who are success
Presentation of project of business person who are successPresentation of project of business person who are success
Presentation of project of business person who are successPratikSingh115843
 
Digital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdfDigital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdfNicoChristianSunaryo
 
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...ThinkInnovation
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...Dr Arash Najmaei ( Phd., MBA, BSc)
 
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelBoston Institute of Analytics
 
Decision Making Under Uncertainty - Is It Better Off Joining a Partnership or...
Decision Making Under Uncertainty - Is It Better Off Joining a Partnership or...Decision Making Under Uncertainty - Is It Better Off Joining a Partnership or...
Decision Making Under Uncertainty - Is It Better Off Joining a Partnership or...ThinkInnovation
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Boston Institute of Analytics
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaManalVerma4
 
Statistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdfStatistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdfnikeshsingh56
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...Jack Cole
 
DATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etcDATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etclalithasri22
 
Role of Consumer Insights in business transformation
Role of Consumer Insights in business transformationRole of Consumer Insights in business transformation
Role of Consumer Insights in business transformationAnnie Melnic
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBoston Institute of Analytics
 

Recently uploaded (16)

Presentation of project of business person who are success
Presentation of project of business person who are successPresentation of project of business person who are success
Presentation of project of business person who are success
 
Digital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdfDigital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdf
 
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
 
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
 
Decision Making Under Uncertainty - Is It Better Off Joining a Partnership or...
Decision Making Under Uncertainty - Is It Better Off Joining a Partnership or...Decision Making Under Uncertainty - Is It Better Off Joining a Partnership or...
Decision Making Under Uncertainty - Is It Better Off Joining a Partnership or...
 
Data Analysis Project: Stroke Prediction
Data Analysis Project: Stroke PredictionData Analysis Project: Stroke Prediction
Data Analysis Project: Stroke Prediction
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in India
 
Statistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdfStatistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdf
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
 
2023 Survey Shows Dip in High School E-Cigarette Use
2023 Survey Shows Dip in High School E-Cigarette Use2023 Survey Shows Dip in High School E-Cigarette Use
2023 Survey Shows Dip in High School E-Cigarette Use
 
DATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etcDATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etc
 
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 
Role of Consumer Insights in business transformation
Role of Consumer Insights in business transformationRole of Consumer Insights in business transformation
Role of Consumer Insights in business transformation
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
 

Apache Hive 0.13 Performance Benchmarks

  • 1. Page 1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Apache Hive 0.13 Performance Benchmarks Query Times in Hive 0.13 v. Hive 0.10 June 2014 Results from the Stinger Initiative
  • 2. Page 2 © Hortonworks Inc. 2011 – 2014. All Rights Reserved The Stinger Initiative •  Apache Hive is the de facto standard for SQL-in-Hadoop •  The Stinger Initiative drove improved SQL semantics & performance •  Stinger Highlights: –  13 months –  145 developers –  44 companies –  3 Hive releases (0.11, 0.12 & 0.13) –  392K lines of new Java code
  • 3. Page 3 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Benchmark Overview: Hive 10 v Hive 13 •  The TPC Benchmark™DS is a decision support benchmark that models queries and data maintenance. It evaluates decision support systems that examine large volumes of data to answer real-world business questions. •  Test: 50 SQL queries on Hive 0.10 (RCFile) and Hive 0.13 (ORCFile) at scale of 30TB •  Results: More than 100x speed up for 6 of 50 queries, with average acceleration of 52x across all queries •  Test Environment –  Driven by the Hive Testbench: https://github.com/cartershanklin/hive-testbench –  Nodes: 20 nodes, 256 GB per node –  Drives: 6x 4TB WDC WD4000FYYZ-0 drives per node –  Interconnect: 10GB –  Processors: 2x Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz for total of 16 CPU cores per machine
  • 4. Page 4 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Results for Interactive Queries Queries #3, 7, 12, 15, 18,19, 26, 27, 42, 43, 52, 55, 82, 84, 91 & 96
  • 5. Page 5 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Results for Interactive Queries Query # Query Description Hive 13 Hive 10 Change 55 For a given year, month and store manager calculate the total store sales of any combination all brands. 45.30 4,768.74 105X 27 For all items sold in stores located in six states during a given year, find summary statistics 99.05 9,988.27 101X 52 Report the total of extended sales price for all items of a specific brand in a specific year and month. 47.64 4,783.76 100X 42 For each item and a specific year and month calculate the sum of the extended sales price of store transactions. 51.16 4,681.42 92X 7 Compute the averages for promotional items sold in stores where promotion is not offered by mail or a special event. 111.89 9,795.81 88X 15 Report the total catalog sales for customers in selected geographical regions or who made large purchases. 129.43 9,299.68 72X 26 Computes averages for promotional items sold through the catalog channel… 79.14 4,718.24 60X 19 Select the top 10 revenue generating products bought by out of zip code customers for a given year… 106.94 5,668.60 53X 3 Report the total extended sales price per item brand of a specific manufacturer for all sales in a specific month. 127.99 5,433.11 42X 96 Count of sales from a named store to customers with a given number of dependents... 200.53 7,888.14 39X 91 Display total returns of catalog sales by call center and manager in a particular month… 51.61 1,460.19 28X 43 Report the sum of all sales from Sunday to Saturday for stores in a given data range by stores. 305.50 6,153.43 20X 82 Find customers who tend to spend more money (net-paid) on-line than in stores. 753.00 9,302.69 12X 84 List all customers living in a specified city, with an income between 2 values. 494.38 2,654.84 5X 12 Compute the revenue ratios across item classes. 163.42 NA ∞ 18 Compute catalog sales in a given year by customers meeting certain characteristics. 162.55 NA ∞ Interactive Queries: Star schema joins over single fact tables, which may involve advances SQL features such as windowing functions or rollups All times in seconds
  • 6. Page 6 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Query 55 For  a  given  year,  month  and  store  manager  calculate  the  total  store  sales  of  any   combina7on  all  brands. 4768.74 45.30 0.00 1000.00 2000.00 3000.00 4000.00 5000.00 6000.00 Hive 10 Hive 13 X Factor 105 All  Values  in  Seconds  
  • 7. Page 7 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Query 27 For all items sold in stores located in six states during a given year, find the average quantity, average list price, average list sales price, average coupon amount for a given gender, marital status, education and customer demographic. 9988.27 99.05 0.00 2000.00 4000.00 6000.00 8000.00 10000.00 12000.00 Hive 10 Hive 13 X Factor 101 All  Values  in  Seconds  
  • 8. Page 8 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Query 52 Report the total of extended sales price for all items of a specific brand in a specific year and month. 4783.76 47.64 0.00 1000.00 2000.00 3000.00 4000.00 5000.00 6000.00 Hive 10 Hive 13 X Factor 100 All  Values  in  Seconds  
  • 9. Page 9 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Query 42 For each item and a specific year and month calculate the sum of the extended sales price of store transactions. 4681.42 51.16 0.00 500.00 1000.00 1500.00 2000.00 2500.00 3000.00 3500.00 4000.00 4500.00 5000.00 Hive 10 Hive 13 X Factor 92 All  Values  in  Seconds  
  • 10. Page 10 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Query 7 Compute the average quantity, list price, discount, and sales price for promotional items sold in stores where the promotion is not offered by mail or a special event. Restrict the results to a specific gender, marital and educational status. 9795.81 111.89 0.00 2000.00 4000.00 6000.00 8000.00 10000.00 12000.00 Hive 10 Hive 13 X Factor 88 All  Values  in  Seconds  
  • 11. Page 11 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Query 15 Report the total catalog sales for customers in selected geographical regions or who made large purchases for a given year and quarter. 9299.68 129.43 0.00 1000.00 2000.00 3000.00 4000.00 5000.00 6000.00 7000.00 8000.00 9000.00 10000.00 Hive 10 Hive 13 X Factor 72 All  Values  in  Seconds  
  • 12. Page 12 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Query 26 Computes the average quantity, list price, discount, sales price for promotional items sold through the catalog channel where the promotion was not offered by mail or in an event for given gender, marital status and educational status. 4718.24 79.14 0.00 500.00 1000.00 1500.00 2000.00 2500.00 3000.00 3500.00 4000.00 4500.00 5000.00 Hive 10 Hive 13 X Factor 60 All  Values  in  Seconds  
  • 13. Page 13 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Query 19 Select the top 10 revenue generating products bought by out of zip code customers for a given year, month and manager. 5668.60 106.94 0.00 1000.00 2000.00 3000.00 4000.00 5000.00 6000.00 Hive 10 Hive 13 X Factor 53 All  Values  in  Seconds  
  • 14. Page 14 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Query 3 Report the total extended sales price per item brand of a specific manufacturer for all sales in a specific month of the year. 5433.11 127.99 0.00 1000.00 2000.00 3000.00 4000.00 5000.00 6000.00 Hive 10 Hive 13 X Factor 42 All  Values  in  Seconds  
  • 15. Page 15 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Query 96 Compute a count of sales from a named store to customers with a given number of dependents made in a specified half hour period of the day. 7888.14 200.53 0.00 1000.00 2000.00 3000.00 4000.00 5000.00 6000.00 7000.00 8000.00 9000.00 Hive 10 Hive 13 X Factor 39 All  Values  in  Seconds  
  • 16. Page 16 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Query 91 Display total returns of catalog sales by call center and manager in a particular month for male customers of unknown education or female customers with advanced degrees with a specified buy potential and from a particular time zone. 1460.19 51.61 0.00 200.00 400.00 600.00 800.00 1000.00 1200.00 1400.00 1600.00 Hive 10 Hive 13 X Factor 28 All  Values  in  Seconds  
  • 17. Page 17 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Query 43 Report the sum of all sales from Sunday to Saturday for stores in a given data range by stores. 6153.43 303.50 0.00 1000.00 2000.00 3000.00 4000.00 5000.00 6000.00 7000.00 Hive 10 Hive 13 X Factor 20 All  Values  in  Seconds  
  • 18. Page 18 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Query 82 Find customers who tend to spend more money (net-paid) on-line than in stores. 9302.69 753.00 0.00 1000.00 2000.00 3000.00 4000.00 5000.00 6000.00 7000.00 8000.00 9000.00 10000.00 Hive 10 Hive 13 X Factor 12 All  Values  in  Seconds  
  • 19. Page 19 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Query 84 List all customers living in a specified city, with an income between 2 values. 2654.84 494.38 0.00 500.00 1000.00 1500.00 2000.00 2500.00 3000.00 Hive 10 Hive 13 X Factor 5 All  Values  in  Seconds  
  • 20. Page 20 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Query 12 Compute the revenue ratios across item classes: For each item in a list of given categories, during a 30 day time period, sold through the web channel compute the ratio of sales of that item to the sum of all of the sales in that item's class. 0.00 163.42 0.00 20.00 40.00 60.00 80.00 100.00 120.00 140.00 160.00 180.00 Hive 10 Hive 13 X Factor ∞ All  Values  in  Seconds  
  • 21. Page 21 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Query 18 Compute, for each county, the average quantity, list price, coupon amount, sales price, net profit, age, and number of dependents for all items purchased through catalog sales in a given year by customers who were born in a given list of six months and living in a given list of seven states and who also belong to a given gender and education demographic. 0.00 162.55 0.00 20.00 40.00 60.00 80.00 100.00 120.00 140.00 160.00 180.00 Hive 10 Hive 13 X Factor ∞ All  Values  in  Seconds  
  • 22. Page 22 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Results for Deep Reporting Queries Queries #13, 17, 20, 21, 25, 28, 32, 40, 45, 46, 48, 49, 50, 58, 66, 68, 76, 79, 85, 87, 88, 89, 90, 92, 93, 94, 95 & 97
  • 23. Page 23 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Results for Deep Reporting Queries Query # Query Description Hive 13 Hive 10 Change 58 Retrieve the items generating the highest revenue… 217.68 34,620.94 159X 40 Compute the impact of an item price change on the sales by computing the total sales for items in a 30 day period… 231.04 19,434.19 84X 68 Compute the per customer extended sales price, extended list price and extended tax for "out of town" shoppers… 85.51 7,091.34 83X 66 Compute web and catalog sales and profits by warehouse 619.73 40,677.91 66X 95 Produce a count of web sales and total shipping cost and net profit in a given 60 day period… 334.37 20,473.84 61X 93 For a given merchandise return reason, report on customers’ total cost of purchases minus the cost of returned items. 3,670.04 200,501.26 55X 21 For all items whose price was changed on a given date, compute the percentage change in inventory… 29.00 1,393.71 48X 46 Compute the per-customer coupon amount and net profit of all "out of town" customers buying from stores… 171.84 8,236.25 48X 88 How many items do we sell between pacific times of a day in certain stores to a certain type of customer? 1,767.31 72,721.59 41X 32 Compute the total discounted amount for a particular manufacturer in a particular 90 day period for catalog sales… 151.89 6,103.18 39X 17 Analyze, for each state, all items that were sold in stores in a particular quarter and returned in the next three quarters… 300.94 11,578.61 38X 94 Produce a count of web sales and total shipping cost and net profit in a given 60 day period… 181.77 5,859.67 32X 79 Compute the per customer coupon amount and net profit of Monday shoppers 272.71 8,568.70 31X 76 Computes the average quantity, list price, discount, sales price for promotional items sold through the web channel… 257.89 7,346.07 28X 92 Compute the total discount on web sales of items from a given manufacturer over a particular 90 day period… 860.82 9,768.99 11X 97 Generate counts of promotional sales and total sales, and their ratio from the web channel… 1,178.89 10,802.96 9X Deep Reporting: Complex queries involving multiple fact tables or large intermediate datasets All times in seconds
  • 24. Page 24 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Results for Deep Reporting Queries (cont) Query # Query Description Hive 13 Hive 10 Change 87 Count how many customers have ordered on the same day items on the web and the catalog… 1,672.35 12,308.54 7X 13 Calculate the average sales quantity, average sales price, average wholesale cost, total wholesale cost for store sales… 8,528.86 46,260.90 5X 50 For each store count the number of items in a specified month that were returned after 30, 60, 90, 120 and >120 days… 3,125.99 8,063.41 3X 20 Compute the total revenue and the ratio of total revenue to revenue by item class for specified item categories… 77.59 NA ∞ 25 Get all items that were sold in stores in a particular month and year and returned in the next three quarters… 318.87 NA ∞ 28 Calculate the average list price, number of non empty (null) list prices and number of distinct list prices… 2,227.67 NA ∞ 45 Report the total web sales for customers in specific zip codes, cities, counties or states, or specific items… 112.09 NA ∞ 48 Calculate the total sales by different types of customers… 1,813.69 NA ∞ 49 Report the top 10 worst return ratios (sales to returns) of all items for each channel by quantity and currency… 559.75 NA ∞ 85 For all web return reasons calculate the average sales, average refunded cash and average return fee… 500.67 NA ∞ 89 All month and combination of item categories, classes and brands that have had monthly sales larger than 0.1 percent… 164.49 NA ∞ 90 The ratio between the number of items sold over the internet in the morning versus items sold in the evening… 131.18 NA ∞ All times in seconds Deep Reporting: Complex queries involving multiple fact tables or large intermediate datasets
  • 25. Page 25 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Query 58 Retrieve the items generating the highest revenue and which had a revenue that was approximately equivalent across all of store, catalog and web within the week ending a given date. 34620.94 217.68 0.00 5000.00 10000.00 15000.00 20000.00 25000.00 30000.00 35000.00 40000.00 Hive 10 Hive 13 X Factor 159 All  Values  in  Seconds  
  • 26. Page 26 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Query 40 Compute the impact of an item price change on the sales by computing the total sales for items in a 30 day period before and after the price change. Group the items by location of warehouse where they were delivered from. 19434.19 231.04 0.00 5000.00 10000.00 15000.00 20000.00 25000.00 Hive 10 Hive 13 X Factor 84 All  Values  in  Seconds  
  • 27. Page 27 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Query 68 Compute the per customer extended sales price, extended list price and extended tax for "out of town" shoppers buying from stores located in two cities in the first two days of each month of three consecutive years. Only consider customers with specific dependent and vehicle counts. 7091.34 85.51 0.00 1000.00 2000.00 3000.00 4000.00 5000.00 6000.00 7000.00 8000.00 Hive 10 Hive 13 X Factor 83 All  Values  in  Seconds  
  • 28. Page 28 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Query 66 Compute web and catalog sales and profits by warehouse. Report results by month for a given year during a given 8-hour period. 40677.91 619.73 0.00 5000.00 10000.00 15000.00 20000.00 25000.00 30000.00 35000.00 40000.00 45000.00 Hive 10 Hive 13 X Factor 66 All  Values  in  Seconds  
  • 29. Page 29 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Query 95 Produce a count of web sales and total shipping cost and net profit in a given 60 day period to customers in a given state from a named web site for returned orders shipped from more than one warehouse. 20473.84 334.37 0.00 5000.00 10000.00 15000.00 20000.00 25000.00 Hive 10 Hive 13 X Factor 61 All  Values  in  Seconds  
  • 30. Page 30 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Query 93 For a given merchandise return reason, report on customers’ total cost of purchases minus the cost of returned items. Limit the output to the 100 customers with the highest value of total purchases. 200501.26 3670.04 0.00 50000.00 100000.00 150000.00 200000.00 250000.00 Hive 10 Hive 13 X Factor 55 All  Values  in  Seconds  
  • 31. Page 31 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Query 21 For all items whose price was changed on a given date, compute the percentage change in inventory between the 30-day period BEFORE the price change and the 30-day period AFTER the change. Group this information by warehouse. 1393.71 29.00 0.00 200.00 400.00 600.00 800.00 1000.00 1200.00 1400.00 1600.00 Hive 10 Hive 13 X Factor 48 All  Values  in  Seconds  
  • 32. Page 32 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Query 46 Compute the per-customer coupon amount and net profit of all "out of town" customers buying from stores located in 5 cities on weekends in three consecutive years. The customers need to fit the profile of having a specific dependent count and vehicle count. For all these customers print the city they lived in at the time of purchase, the city in which the store is located, the coupon amount and net profit. 8236.25 171.84 0.00 1000.00 2000.00 3000.00 4000.00 5000.00 6000.00 7000.00 8000.00 9000.00 Hive 10 Hive 13 X Factor 48 All  Values  in  Seconds  
  • 33. Page 33 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Query 88 How many items do we sell between pacific times of a day in certain stores to customers with one dependent count and 2 or less vehicles registered or 2 dependents with 4 or fewer vehicles registered or 3 dependents and five or less vehicles registered. In one row break the counts into sells from 8:30 to 9, 9 to 9:30, 9:30 to 10 ... 12 to 12:30 72721.59 1767.31 0.00 10000.00 20000.00 30000.00 40000.00 50000.00 60000.00 70000.00 80000.00 Hive 10 Hive 13 X Factor 41 All  Values  in  Seconds  
  • 34. Page 34 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Query 32 Compute the total discounted amount for a particular manufacturer in a particular 90 day period for catalog sales whose discounts exceeded the average discount by at least 30%. 6103.18 154.89 0.00 1000.00 2000.00 3000.00 4000.00 5000.00 6000.00 7000.00 Hive 10 Hive 13 X Factor 39 All  Values  in  Seconds  
  • 35. Page 35 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Query 17 Analyze, for each state, all items that were sold in stores in a particular quarter and returned in the next three quarters and then re-purchased by the customer through the catalog channel in the three following quarters. 11578.61 300.94 0.00 2000.00 4000.00 6000.00 8000.00 10000.00 12000.00 14000.00 Hive 10 Hive 13 X Factor 38 All  Values  in  Seconds  
  • 36. Page 36 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Query 94 Produce a count of web sales and total shipping cost and net profit in a given 60 day period to customers in a given state from a named web site for non returned orders shipped from more than one warehouse. 5859.67 181.77 0.00 1000.00 2000.00 3000.00 4000.00 5000.00 6000.00 7000.00 Hive 10 Hive 13 X Factor 32 All  Values  in  Seconds  
  • 37. Page 37 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Query 79 Compute the per customer coupon amount and net profit of Monday shoppers. Only purchases of three consecutive years made on Mondays in large stores by customers with a certain dependent count and with a large vehicle count are considered. 8568.70 272.71 0.00 1000.00 2000.00 3000.00 4000.00 5000.00 6000.00 7000.00 8000.00 9000.00 Hive 10 Hive 13 X Factor 31 All  Values  in  Seconds  
  • 38. Page 38 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Query 76 Computes the average quantity, list price, discount, sales price for promotional items sold through the web channel where the promotion is not offered by mail or in an event for given gender, marital status and educational status. 7346.07 257.89 0.00 1000.00 2000.00 3000.00 4000.00 5000.00 6000.00 7000.00 8000.00 Hive 10 Hive 13 X Factor 28 All  Values  in  Seconds  
  • 39. Page 39 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Query 92 Compute the total discount on web sales of items from a given manufacturer over a particular 90 day period for sales whose discount exceeded 30% over the average discount of items from that manufacturer in that period of time. 9768.99 860.82 0.00 2000.00 4000.00 6000.00 8000.00 10000.00 12000.00 Hive 10 Hive 13 X Factor 11 All  Values  in  Seconds  
  • 40. Page 40 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Query 97 Generate counts of promotional sales and total sales, and their ratio from the web channel for a particular item category and month to customers in a given time zone. 10802.96 1178.89 0.00 2000.00 4000.00 6000.00 8000.00 10000.00 12000.00 Hive 10 Hive 13 X Factor 9 All  Values  in  Seconds  
  • 41. Page 41 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Query 87 Count how many customers have ordered on the same day items on the web and the catalog and on the same day have bought items in a store. 12308.54 1672.35 0.00 2000.00 4000.00 6000.00 8000.00 10000.00 12000.00 14000.00 Hive 10 Hive 13 X Factor 7 All  Values  in  Seconds  
  • 42. Page 42 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Query 13 Calculate the average sales quantity, average sales price, average wholesale cost, total wholesale cost for store sales of different customer types (e.g., based on marital status, education status) including their household demographics, sales price and different combinations of state and sales profit for a given year. 46260.90 8528.86 0.00 5000.00 10000.00 15000.00 20000.00 25000.00 30000.00 35000.00 40000.00 45000.00 50000.00 Hive 10 Hive 13 X Factor 5 All  Values  in  Seconds  
  • 43. Page 43 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Query 50 For each store count the number of items in a specified month that were returned after 30, 60, 90, 120 and more than 120 days from the day of purchase. 8063.41 3125.99 0.00 1000.00 2000.00 3000.00 4000.00 5000.00 6000.00 7000.00 8000.00 9000.00 Hive 10 Hive 13 X Factor 3 All  Values  in  Seconds  
  • 44. Page 44 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Query 20 Compute the total revenue and the ratio of total revenue to revenue by item class for specified item categories and time periods. 0.00 77.59 0.00 10.00 20.00 30.00 40.00 50.00 60.00 70.00 80.00 90.00 Hive 10 Hive 13 X Factor ∞ All  Values  in  Seconds  
  • 45. Page 45 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Query 25 Get all items that were sold in stores in a particular month and year AND returned in the next three quarters AND re-purchased by the customer through the catalog channel in the six following months. For these items, compute the sum of net profit of store sales, net loss of store loss and net profit of catalog. Group this information by item and store. 0.00 318.87 0.00 50.00 100.00 150.00 200.00 250.00 300.00 350.00 Hive 10 Hive 13 X Factor ∞ All  Values  in  Seconds  
  • 46. Page 46 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Query 28 Calculate the average list price, number of non empty (null) list prices and number of distinct list prices of six different sales buckets of the store sales channel. Each bucket is defined by a range of distinct items and information about list price, coupon amount and wholesale cost. 0.00 2227.67 0.00 500.00 1000.00 1500.00 2000.00 2500.00 Hive 10 Hive 13 X Factor ∞ All  Values  in  Seconds  
  • 47. Page 47 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Query 45 Report the total web sales for customers in specific zip codes, cities, counties or states, or specific items for a given year and quarter. 0.00 112.09 0.00 20.00 40.00 60.00 80.00 100.00 120.00 Hive 10 Hive 13 X Factor ∞ All  Values  in  Seconds  
  • 48. Page 48 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Query 48 Calculate the total sales by different types of customers (e.g., based on marital status, education status), sales price and different combinations of state and sales profit. 0.00 1813.69 0.00 200.00 400.00 600.00 800.00 1000.00 1200.00 1400.00 1600.00 1800.00 2000.00 Hive 10 Hive 13 X Factor ∞ All  Values  in  Seconds  
  • 49. Page 49 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Query 49 Report the top 10 worst return ratios (sales to returns) of all items for each channel by quantity and currency sorted by ratio. Quantity ratio is defined as total number of sales to total number of returns. Currency ratio is defined as sum of return amount to sum of net paid. 0.00 559.75 0.00 100.00 200.00 300.00 400.00 500.00 600.00 Hive 10 Hive 13 X Factor ∞ All  Values  in  Seconds  
  • 50. Page 50 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Query 85 For all web return reason calculate the average sales, average refunded cash and average return fee by different combinations of customer and sales types (e.g., based on marital status, education status, state and sales profit). 0.00 500.67 0.00 100.00 200.00 300.00 400.00 500.00 600.00 Hive 10 Hive 13 X Factor ∞ All  Values  in  Seconds  
  • 51. Page 51 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Query 89 Within a year list all month and combination of item categories, classes and brands that have had monthly sales larger than 0.1 percent of the total yearly sales. 0.00 164.49 0.00 20.00 40.00 60.00 80.00 100.00 120.00 140.00 160.00 180.00 Hive 10 Hive 13 X Factor ∞ All  Values  in  Seconds  
  • 52. Page 52 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Query 90 What is the ratio between the number of items sold over the internet in the morning (8 to 9am) to the number of items sold in the evening (7 to 8pm) of customers with a specified number of dependents. Consider only websites with a high amount of content. 0.00 131.18 0.00 20.00 40.00 60.00 80.00 100.00 120.00 140.00 Hive 10 Hive 13 X Factor ∞ All  Values  in  Seconds  
  • 53. Page 53 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Results for Data Mining Queries Queries #34, 39, 64, 71, 73 & 98
  • 54. Page 54 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Results for Data Mining Queries Query # Query Description Hive 13 Hive 10 Change 73 Count the number of customers with specific buy potentials and whose dependent count to vehicle count ratio is… 48.98 5,866.06 120X 71 Select the top 10 revenue generating products, sold during breakfast or dinner time for one month… 91.61 10,498.57 115X 34 Display all customers with specific buy potentials and whose dependent count to vehicle count ratio is larger than 1.2… 125.72 6,745.30 54X 39 Query with multiple, related iterations… 111.74 2,452.08 22X 64 Find those stores that sold more cross-sales items from one year to another 6,821.24 34,289.66 5X 98 Report on items sold in a given 30 day period, belonging to the specified category. 1,085.06 NA ∞ All times in seconds Deep Mining: Queries that return large amounts of data for further processing by other tools
  • 55. Page 55 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Query 73 Count  the  number  of  customers  with  specific  buy  poten7als  and  whose  dependent   count  to  vehicle  count  ra7o  is  larger  than  1  and  who  in  three  consecu7ve  years  bought   in  stores  located  in  4  coun7es  between  1  and  5  items  in  one  purchase.    Only  purchases   in  the  first  2  days  of  the  months  are  considered.   5866.06 48.98 0.00 1000.00 2000.00 3000.00 4000.00 5000.00 6000.00 7000.00 Hive 10 Hive 13 X Factor 120 All  Values  in  Seconds  
  • 56. Page 56 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Query 71 Select  the  top  10  revenue  genera7ng  products,  sold  during  breakfast  or  dinner  7me  for   one  month  managed  by  a  given  manager  across  all  three  sales  channels. 10498.57 91.61 0.00 2000.00 4000.00 6000.00 8000.00 10000.00 12000.00 Hive 10 Hive 13 X Factor 115 All  Values  in  Seconds  
  • 57. Page 57 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Query 34 Display all customers with specific buy potentials and whose dependent count to vehicle count ratio is larger than 1.2, who in three consecutive years made purchases with between 15 and 20 items in the beginning or the end of each month in stores located in 8 counties. 6745.30 125.72 0.00 1000.00 2000.00 3000.00 4000.00 5000.00 6000.00 7000.00 8000.00 Hive 10 Hive 13 X Factor 54 All  Values  in  Seconds  
  • 58. Page 58 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Query 39 This query contains multiple, related iterations: •  Iteration 1: Calculate the coefficient of variation and mean of every item and warehouse of two consecutive months •  Iteration 2: Find items that had a coefficient of variation in the first months of 1.5 or large 2452.08 111.74 0.00 500.00 1000.00 1500.00 2000.00 2500.00 3000.00 Hive 10 Hive 13 X Factor 22 All  Values  in  Seconds  
  • 59. Page 59 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Query 64 Find those stores that sold more cross-sales items from one year to another. Cross-sale items are items that are sold over the Internet, by catalog and in store. 34289.66 6821.24 0.00 5000.00 10000.00 15000.00 20000.00 25000.00 30000.00 35000.00 40000.00 Hive 10 Hive 13 X Factor 5 All  Values  in  Seconds  
  • 60. Page 60 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Query 98 Report on items sold in a given 30 day period, belonging to the specified category. 0.00 1085.06 0.00 200.00 400.00 600.00 800.00 1000.00 1200.00 Hive 10 Hive 13 X Factor ∞ All  Values  in  Seconds