Microsoft and Hadoop
• Apache Hadoop-based distributions for Windows Server and Windows Azure, with Active Directory support
– HDInsight Server
– HDInsight Service
• Hive ODBC Driver and Add-in for Excel
• JavaScript Framework for Hadoop
• SQL Server and SQL Server Parallel Data Warehouse connectors for Hadoop
• SharePoint, PowerPivot, and Power View as the presentation front end
Using HiveQL
SELECT sourceIP, totalRevenue, avgPageRank
FROM (
  SELECT UV.sourceIP AS sourceIP, sum(UV.adRevenue) AS totalRevenue,
         avg(R.pageRank) AS avgPageRank
  FROM Rankings AS R, Uservisits AS UV
  WHERE R.pageURL = UV.destURL AND UV.visitDate
        BETWEEN Date('2000-01-15') AND Date('2000-01-22')
  GROUP BY UV.sourceIP
) t
ORDER BY totalRevenue DESC LIMIT 1;
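To check what this query computes without a Hadoop cluster, the same join, filter, group, and top-1 shape can be run on a few rows. The sketch below is Python with the standard-library sqlite3 module standing in for Hive; it is not HiveQL. The column types, the sample rows, and the function name top_source_ip are assumptions made for illustration, and dates are compared as ISO strings rather than with Date().

```python
import sqlite3

# Hypothetical sample rows, invented for illustration only.
SAMPLE_RANKINGS = [("a.com", 10), ("b.com", 30)]
SAMPLE_VISITS = [
    ("1.1.1.1", "a.com", "2000-01-16", 2.0),
    ("1.1.1.1", "b.com", "2000-01-20", 3.0),
    ("2.2.2.2", "a.com", "2000-01-17", 4.0),
    ("2.2.2.2", "b.com", "2000-02-01", 100.0),  # outside the date range
]


def top_source_ip(rankings, visits, start, end):
    """Return (sourceIP, totalRevenue, avgPageRank) for the top-earning IP."""
    conn = sqlite3.connect(":memory:")
    # Assumed schemas: only the columns the query touches.
    conn.execute("CREATE TABLE Rankings (pageURL TEXT, pageRank INTEGER)")
    conn.execute(
        "CREATE TABLE Uservisits "
        "(sourceIP TEXT, destURL TEXT, visitDate TEXT, adRevenue REAL)"
    )
    conn.executemany("INSERT INTO Rankings VALUES (?, ?)", rankings)
    conn.executemany("INSERT INTO Uservisits VALUES (?, ?, ?, ?)", visits)
    row = conn.execute(
        """
        SELECT sourceIP, totalRevenue, avgPageRank
        FROM (
            SELECT UV.sourceIP AS sourceIP,
                   SUM(UV.adRevenue) AS totalRevenue,
                   AVG(R.pageRank) AS avgPageRank
            FROM Rankings AS R, Uservisits AS UV
            WHERE R.pageURL = UV.destURL
              AND UV.visitDate BETWEEN ? AND ?
            GROUP BY UV.sourceIP
        ) AS t
        ORDER BY totalRevenue DESC LIMIT 1
        """,
        (start, end),
    ).fetchone()
    conn.close()
    return row


if __name__ == "__main__":
    print(top_source_ip(SAMPLE_RANKINGS, SAMPLE_VISITS,
                        "2000-01-15", "2000-01-22"))
```

The inner query does the join and per-IP aggregation; the outer query only sorts and keeps the single highest-revenue row, which is why the visit on 2000-02-01 does not affect the result.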
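Tables in Hive
• As in a relational database, data is stored in tables
• Richer column types than SQL
– Primitive types: ints, floats, strings, date
– Complex types: associative arrays (MAP), lists (ARRAY), structs (STRUCT)
Example: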
CREATE TABLE Employees
(
  Name STRING,
  Salary INT,
  -- one STRUCT per child, held in an ARRAY (Hive's list type)
  Children ARRAY<STRUCT<firstName:STRING, DOB:DATE>>
);
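To make the nested Children column concrete, here is a small Python sketch that models one Employees row and flattens it to one row per child. This is roughly the result Hive's LATERAL VIEW explode() gives for an array column, where employees with an empty list produce no rows. The class names, the helper explode_children, and the sample people are hypothetical and not part of Hive.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Tuple


@dataclass
class Child:
    """Mirrors STRUCT<firstName, DOB> from the table definition."""
    first_name: str
    dob: date


@dataclass
class Employee:
    """Mirrors one row of Employees; children plays the role of the list column."""
    name: str
    salary: int
    children: List[Child] = field(default_factory=list)


def explode_children(employees: List[Employee]) -> List[Tuple[str, str, date]]:
    """Flatten to (employee name, child first name, child DOB), one per child."""
    return [(e.name, c.first_name, c.dob) for e in employees for c in e.children]


# Hypothetical sample data, invented for illustration only.
SAMPLE_EMPLOYEES = [
    Employee("Alice", 5000, [Child("Bob", date(2005, 3, 1)),
                             Child("Carol", date(2008, 7, 15))]),
    Employee("Dave", 4000),  # no children, so no flattened rows
]

if __name__ == "__main__":
    for row in explode_children(SAMPLE_EMPLOYEES):
        print(row)
```

Keeping the children inside the employee row avoids a separate child table and a join, at the cost of needing a flattening step whenever a query works per child.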