2. Gartner - Top 10 Technology Trends for the Energy and Utilities Sector in 2013
Social Media and Web 2.0
Big Data
Mobile and Location-Aware Technology
Cloud Computing and SaaS
Sensor Technology
In-Memory Computing
IT and OT Convergence
Advanced Metering Infrastructure
Communication Technology
Predictive Analytics
http://www.gartner.com/newsroom/id/2426515
3. The Gartner
"By 2017, the CMO will spend more on IT than the CIO." (Gartner)
According to Gartner's global CIO survey, BYOD (employees bringing their own devices) adoption will pass 50% in 2017 and eventually reach 85%.
http://my.gartner.com/portal/server.pt?open=512&objID=202&mode=2&PageID=5553&resId=1871515
http://blog.cloudsherpas.com/cloud-strategy-2/how-to-use-itsm-to-manage-your-byod-strategy/
10. The Application Value of Hadoop
Medium latency, batch jobs
For lower latency: In-Memory Computing / Streaming Computing (CEP)
11. Open Source Big Data Projects
Solution  Developer           Type       Description
Storm     Twitter             Streaming  Twitter's new streaming big-data analytics solution
S4        Yahoo!              Streaming  Distributed stream computing platform from Yahoo!
Hadoop    Apache              Batch      First open source implementation of the MapReduce paradigm
Spark     UC Berkeley AMPLab  Batch      Recent analytics platform that supports in-memory data sets and resiliency
Disco     Nokia               Batch      Nokia's distributed MapReduce framework
HPCC      LexisNexis          Batch      HPC cluster for big data
http://www.ibm.com/developerworks/library/os-twitterstorm/
12. Big Data: 4V
Ref: http://contest.trendmicro.com/2013/tw/train.htm
15. HDFS Read
[Diagram: one Name Node and four Data Nodes, each with its own local disk]
Name Node metadata on local disk: name, replica, block_id (e.g. /home/data/test, 3 …etc)
Step 1: name
Step 2: block_id, loc
Step 3: Transfer data
Name Node to Data Nodes: block operations (heartbeat, replication, re-balancing)
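The read path on this slide can be sketched as a small in-memory simulation. This is an illustrative model only, not the Hadoop client API: the `NameNode` and `DataNode` classes, the `read_file` function, the node names, and the block IDs are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Hypothetical data node: holds block contents on its 'local disk'."""
    name: str
    blocks: dict = field(default_factory=dict)  # block_id -> bytes


@dataclass
class NameNode:
    """Hypothetical name node: keeps metadata only, never file contents."""
    # file name -> ordered list of (block_id, [names of data nodes with a replica])
    metadata: dict = field(default_factory=dict)

    def lookup(self, name):
        # Steps 1-2: the client sends a file name and receives block IDs and locations.
        return self.metadata[name]


def read_file(name, name_node, data_nodes):
    """Step 3: fetch each block from the first reachable replica and concatenate."""
    content = b""
    for block_id, locations in name_node.lookup(name):
        for loc in locations:
            node = data_nodes.get(loc)
            if node is not None and block_id in node.blocks:
                content += node.blocks[block_id]
                break
        else:
            # No replica could be read for this block.
            raise IOError(f"no live replica for block {block_id}")
    return content


# Toy cluster: two blocks, each replicated on two of three data nodes.
data_nodes = {
    "dn1": DataNode("dn1", {"blk_1": b"hello "}),
    "dn2": DataNode("dn2", {"blk_1": b"hello ", "blk_2": b"hdfs"}),
    "dn3": DataNode("dn3", {"blk_2": b"hdfs"}),
}
name_node = NameNode({"/home/data/test": [("blk_1", ["dn1", "dn2"]),
                                          ("blk_2", ["dn2", "dn3"])]})
```

Calling `read_file("/home/data/test", name_node, data_nodes)` reassembles the file from its blocks. Removing `dn1` from `data_nodes` still succeeds, because the second replica of `blk_1` on `dn2` is used instead.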
16. HDFS Write
[Diagram: one Name Node and four Data Nodes, each with its own local disk]
Step 1: create
Step 2: Write data
Step 3: doing replica (data replication)
Step 4: Ack packet
Step 5: Finish
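The five write steps can be sketched as a toy replication pipeline. This is a simplified model under stated assumptions, not the HDFS implementation: `write_block`, `write_file`, the round-robin placement, and the block naming scheme are hypothetical, and `replication` is assumed to be no larger than the number of data nodes.

```python
def write_block(block_id, data, pipeline, stores):
    """Send one block down a pipeline of data nodes and collect acks.

    pipeline: ordered list of (hypothetical) data node names
    stores:   dict of node name -> {block_id: bytes}, standing in for local disks
    """
    if not pipeline:
        return []
    head, rest = pipeline[0], pipeline[1:]
    stores[head][block_id] = data                       # write to this node's disk
    downstream_acks = write_block(block_id, data, rest, stores)  # forward the replica
    return downstream_acks + [head]                     # acks travel back upstream


def write_file(name, data, block_size, replication, node_names, metadata, stores):
    """Walk through the slide's steps: create, write, replicate, ack, finish."""
    metadata[name] = []                                 # step 1: create on the name node
    for offset in range(0, len(data), block_size):
        index = offset // block_size
        block_id = f"{name}#blk_{index}"
        # Toy round-robin placement; real HDFS placement is rack-aware.
        start = index % len(node_names)
        pipeline = [node_names[(start + k) % len(node_names)]
                    for k in range(replication)]
        chunk = data[offset:offset + block_size]
        acks = write_block(block_id, chunk, pipeline, stores)    # steps 2-4
        if sorted(acks) != sorted(pipeline):
            raise IOError(f"block {block_id} not acknowledged by all replicas")
        metadata[name].append((block_id, pipeline))
    return metadata[name]                               # step 5: finish


# Toy cluster with four data nodes and a replication factor of 3.
stores = {n: {} for n in ["dn1", "dn2", "dn3", "dn4"]}
metadata = {}
blocks = write_file("/home/data/test", b"abcdefgh", 4, 3,
                    list(stores), metadata, stores)
```

After the call, `blocks` lists each block ID together with the data nodes that hold its replicas, which is the same metadata the name node would later serve to readers.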
17. HDFS Fault Tolerance
[Diagram: a Name Node with a Secondary NameNode, and four Data Nodes, each with its own local disk]
18. NameNode HA?
Cloudera: NAS
Hortonworks: Linux Cluster HA or vSphere vMotion
Apache Hadoop: QJM, BackupNode
Other industry solutions: Veritas VCS/VVR, Linux DRBD
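19. Hadoop – Big Database?
Storage footprint ranges from TB to PB and beyond
Stores both unstructured and structured data
Semi-structured data can include application (AP) logs, GPS data, and machine/sensor data
Unstructured data includes images, video, audio recordings, or any other files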
20. MapReduce - Dataflow
[Diagram: Input HDFS (Split 0, Split 1, Split 2) -> Map 0, Map 1, Map 2 -> intermediate pairs with Key a and Key b -> Merge -> Reduce 0, Reduce 1 -> Output HDFS (Split 0, Split 1)]
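The dataflow in this diagram can be illustrated with a single-process word count. This is a teaching sketch, not Hadoop code: the function names and the byte-sum partitioner are assumptions chosen to keep the example deterministic.

```python
from collections import defaultdict


def map_phase(split):
    """Map: emit a (key, 1) pair for every word in one input split."""
    return [(word, 1) for word in split.split()]


def shuffle(mapped, num_reducers):
    """Shuffle/merge: route each key to exactly one reducer and group its values."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in mapped:
        for key, value in pairs:
            # Toy deterministic partitioner (sum of the key's bytes).
            index = sum(key.encode()) % num_reducers
            partitions[index][key].append(value)
    return partitions


def reduce_phase(partition):
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in sorted(partition.items())}


# Three input splits, three map tasks, two reduce tasks, as in the diagram.
splits = ["a b a", "b a", "a b"]
mapped = [map_phase(s) for s in splits]
outputs = [reduce_phase(p) for p in shuffle(mapped, num_reducers=2)]
```

Every occurrence of a given key ends up at the same reducer, which is what lets each reducer produce a complete count for its keys without talking to the others.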
25. Highlighted Components in Hadoop
Component  Description
Mahout     Scalable machine learning algorithms
YARN       NextGen MapReduce: resource management and job scheduling
Shark      SQL engine claimed to be up to 100x faster than Hive
Spark      Can run up to 100x faster than Hadoop MapReduce
Rhipe      MapReduce in the R language; faster to write than Java MR
Solr       A NoSQL search server and big data analytics tool
Storm      "The Hadoop of real-time stream processing"
26. HDFS
Hadoop Distributed File System
Design based on the Google File System (GFS)
Provides high-throughput data access
Well suited to large-table access
Fault tolerance
27. HBase
Hadoop-based project
NoSQL/key-value database
Distributed real-time database
Structured data storage for large tables
To access data, developers must implement the equivalent of SQL functions and joins themselves in Java
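To make the last point concrete, here is a sketch of what implementing a join and an aggregate on the application side looks like when the store offers only key lookups and scans. It uses plain Python dictionaries rather than the HBase Java client; the `get` and `scan` helpers, the table layouts, and the sample rows are all hypothetical.

```python
def get(table, row_key):
    """Point lookup by row key, the basic access path of a key-value store."""
    return table.get(row_key)


def scan(table):
    """Full scan in row-key order."""
    for row_key in sorted(table):
        yield row_key, table[row_key]


def join_orders_with_users(orders, users):
    """Application-side equivalent of:
    SELECT o.id, u.name, o.amount FROM orders o JOIN users u ON o.user_id = u.id
    """
    joined = []
    for order_id, order in scan(orders):
        user = get(users, order["user_id"])    # one lookup per order row
        if user is not None:                   # inner join: drop unmatched rows
            joined.append((order_id, user["name"], order["amount"]))
    return joined


def total_by_user(rows):
    """Application-side equivalent of SUM(amount) ... GROUP BY name."""
    totals = {}
    for _, name, amount in rows:
        totals[name] = totals.get(name, 0) + amount
    return totals


# Hypothetical sample tables keyed by row key.
users = {"u1": {"name": "Ann"}, "u2": {"name": "Bob"}}
orders = {
    "o1": {"user_id": "u1", "amount": 30},
    "o2": {"user_id": "u2", "amount": 20},
    "o3": {"user_id": "u1", "amount": 5},
    "o4": {"user_id": "u9", "amount": 99},     # no matching user
}
rows = join_orders_with_users(orders, users)
totals = total_by_user(rows)
```

The cost is visible in the code: the join issues one lookup per scanned row, so row-key design matters far more than it would in a relational database with a query planner.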
28. Pig
Pig Latin: a data-flow language
No Java knowledge required; it implements and simplifies Java MapReduce
User-defined functions (UDF)
Parse and manipulate HDFS data (GROUP, SORT, FILTER, JOIN)
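The data-flow style described here, where each step takes a relation and produces a new one, can be imitated in a few lines of Python. This is not Pig Latin and does not touch HDFS; it only mirrors the operators named on the slide, and the record layout, the `is_slow` function (standing in for a UDF), and the sample lines are assumptions.

```python
from itertools import groupby
from operator import itemgetter


def parse_line(line):
    """Parse one comma-separated log line into a record."""
    user, url, ms = line.split(",")
    return {"user": user, "url": url, "ms": int(ms)}


def is_slow(record, threshold_ms=100):
    """Stand-in for a user-defined function used inside a filter."""
    return record["ms"] > threshold_ms


lines = ["ann,/a,120", "bob,/b,80", "ann,/c,300", "bob,/a,150"]

records = [parse_line(line) for line in lines]        # load and parse
slow = [r for r in records if is_slow(r)]             # FILTER
slow.sort(key=itemgetter("user"))                     # groupby needs sorted input
grouped = {user: [r["ms"] for r in rows]              # GROUP by user
           for user, rows in groupby(slow, key=itemgetter("user"))}
ranked = sorted(grouped.items(),                      # SORT by worst latency
                key=lambda item: max(item[1]), reverse=True)
```

Each intermediate name (`records`, `slow`, `grouped`, `ranked`) plays the role of a named relation in a data-flow script, which is what makes such pipelines easy to read step by step.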
37. Hadoop Appliances from Major Vendors
HP AppSystem for Apache Hadoop
HP HAVEn
Oracle's Big Data Appliance
IBM PureData
EMC Greenplum DCA (Data Computing Appliance)
NetApp Open Solution for Hadoop
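Hacking skills: the skills that make for a successful data hacker, meaning expertise in obtaining data from the internet. Both approved data and personal data that is not permitted to circulate may turn up online.
Math and Stats knowledge: the expertise of mathematicians and statisticians; an academic profile.
Substantive Expertise: someone who has substantial experience and knowledge; experts with solid, hands-on experience in their domain.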