Types of data commonly processed with Hadoop
1. Sentiment
Understand how your customers feel about your brand
2. Clickstream
Capture and analyze website visitors’ data trails and optimize your website
3. Sensor/machine
Discover patterns in data streaming automatically from remote sensors and machines
4. Geographic
Analyze location-based data to manage operations where they occur
5. Server logs
Research logs to diagnose process failures and prevent security breaches
6. Unstructured data (text, video, pictures, etc.)
Understand patterns in files across millions of web pages, emails, and documents
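Introduction to Azure HDInsight
Hadoop Meets the Cloud
A Hadoop service managed by Microsoft
Uses 100% open-source Apache Hadoop
Compatible with .NET and Java tools
Hadoop versions can be upgraded automatically
Can be set up and running within minutes, with no hardware to purchase
Runs on Windows or Linux
Provision and configure the service, use it, then cancel it; the data can be retained
Technical support is provided by Microsoft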
[Diagram: cluster layout showing a Name Node, a Job Tracker, an HMaster, a Coordination service, four Data Nodes each with a Task Tracker, and four Region Servers]
Other Hadoop components and tools
Ambari: Cluster provisioning, management, and monitoring
Avro (Microsoft .NET Library for Avro): Data serialization for the Microsoft .NET environment
MapReduce and YARN: Distributed processing and resource management
Oozie: Workflow management
Phoenix: Relational database layer over HBase
Pig: Simpler scripting for MapReduce transformations
Sqoop: Data import and export
Tez: Allows data-intensive processes to run efficiently at scale
ZooKeeper: Coordination of processes in distributed systems
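
To make the "distributed processing" role of MapReduce in the list above more concrete, here is a minimal single-process sketch of the map, shuffle, and reduce phases, written in plain Python. It is not Hadoop or HDInsight code: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample log lines are hypothetical and chosen only for illustration. On a real cluster the framework would run the map and reduce steps in parallel across nodes and perform the shuffle over the network.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every whitespace-separated word in one line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key (done by the framework in a real cluster)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Collapse all values for one key into a single total."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence, in one process, over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical log-like input, used only to exercise the sketch.
    sample = ["error disk full", "warning disk slow", "error network down"]
    print(word_count(sample))
```

The same three-phase shape is what tools such as Pig generate on the user's behalf, which is why the list describes Pig as simpler scripting for MapReduce transformations.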