Transcript

  • 1. 2010/5/20 OSS OSS Laboratories Inc. http://www.ossl.co.jp Mail: funai@ossl.co.jp Twitter: http://twitter.com/satoruf LinkedIn: http://jp.linkedin.com/in/satorufunai/ja Copyright 2010(C) OSS Laboratories Inc. All Rights Reserved
  • 2.
      • OSS
      • Apache
      • Google
          • GFS (Google File System) : HDFS (Hadoop Distributed File System)
          • Google MapReduce : Hadoop MapReduce
          • Google Chubby : Hadoop ZooKeeper
          • DSL Google Sawzall : Hadoop Pig
          • Google BigTable : Hadoop HBase
          • Google ? : Hadoop Hive
      • Yahoo!, Facebook, Amazon, China Mobile, VISA, JP Morgan Chase
      • UFJ, NTT
      • ACID (Atomic, Consistent, Isolated, Durable); BASE (Basically Available, Soft-State, Eventual Consistency)
  • 3. Apache Hadoop
      • ETL Tools, BI Reporting, RDBMS
      • Pig (Data Flow), Hive (SQL), Sqoop
      • Avro (Serialization)
      • ZooKeeper (Coordination)
      • MapReduce (Job Scheduling/Execution System) (Streaming/Pipes APIs)
      • HBase (key-value store)
      • HDFS (Hadoop Distributed File System)
  • 4. HDFS: Hadoop Distributed File System HDFS 64MB
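Slide 4 pairs HDFS with the figure 64MB. Assuming that figure is the HDFS block size (the stock default in Hadoop releases of that period), the sketch below shows how a file would be cut into blocks. The function name `split_into_blocks` and the 200MB example file are hypothetical and do not come from the slides.

```python
BLOCK_SIZE_MB = 64  # assumption: the 64MB on the slide is the HDFS block size


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the sizes (in MB) of the blocks a file of the given size occupies."""
    if file_size_mb < 0:
        raise ValueError("file size must be non-negative")
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    sizes = [block_size_mb] * full_blocks
    if remainder:
        # The last block only holds the leftover data; it is not padded.
        sizes.append(remainder)
    return sizes


if __name__ == "__main__":
    # Hypothetical 200MB file: three full blocks plus one 8MB block.
    print(split_into_blocks(200))
```

Each block is what HDFS replicates and places on DataNodes, so block count, rather than file count alone, drives how work can be spread across a cluster.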
  • 5. MapReduce: Distributed Processing / Map Reduce 1
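The Map and Reduce steps named on slide 5 can be illustrated with a word count. This is a single-process sketch of the programming model only, not the Hadoop API; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` and the sample input are all hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the list of values for one key into a single total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["Hadoop stores data", "Hadoop processes data"]))
```

In a real cluster the map calls run in parallel on the nodes holding the input blocks, and the shuffle moves data over the network; here everything runs in one process to keep the data flow visible.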
  • 6. Hadoop [diagram labels: Business Intelligence, Interactive Application, OLAP Data Mart, OLTP Data Store, Engineers, Hadoop: Storage and Batch Processing, ETL/Sqoop]
  • 7. Hadoop
      • 2x Quad Core Nehalems
      • 24GB
      • 12 * 1TB SATA (JBOD , RAID )
      • 1 Gigabit Ethernet
      • HDFS :
          • ... reserved for temp shuffle space, which leaves 9TB/node
          • 3-way replication leads to 3TB effective HDFS space/node
          • But assuming 7x compression that becomes ~20TB/node
      • TB :2 5 /TB
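The per-node capacity figures on slide 7 follow from simple arithmetic, sketched below. Only the inputs (12 disks of 1TB, 9TB left after shuffle space, 3-way replication, 7x compression) come from the slide; the function name `node_capacity` is hypothetical, and the reserved share is derived here from the slide's numbers rather than stated on it.

```python
def node_capacity(disks=12, disk_tb=1, usable_tb=9, replication=3, compression=7):
    """Reproduce the slide's per-node storage arithmetic, step by step."""
    raw_tb = disks * disk_tb                 # 12 x 1TB SATA
    reserved_tb = raw_tb - usable_tb         # set aside for temp shuffle space (derived)
    effective_tb = usable_tb / replication   # every block is stored `replication` times
    logical_tb = effective_tb * compression  # uncompressed data that fits after compression
    return {
        "raw_tb": raw_tb,
        "reserved_tb": reserved_tb,
        "reserved_fraction": reserved_tb / raw_tb,
        "effective_tb": effective_tb,
        "logical_tb": logical_tb,
    }


if __name__ == "__main__":
    for name, value in node_capacity().items():
        print(f"{name}: {value}")
```

With the slide's inputs the last step gives 21TB, which the slide rounds to roughly 20TB per node.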
  • 8. Yahoo!
      • Hadoop
      • 25,000 82PB Hadoop
      • 4,000 64TB 16PB 32,000
      • 500
      • SearchAssist(TM) 26 20
      • 1,500 1TB 62
      • 3,700 1PB 16
  • 9. Facebook
      • 200 Hive DWH Hadoop
      • +12TB/
      • 135TB/
      • 1,050 32TB 12.5PB 4,800
      • [diagram labels: Node, Disks, DataNode, Map-Reduce, 1 Gigabit, 4 Gigabit]
  • 10. VISA
      • 2 Hadoop 340TB
      • Hadoop #1 ~40TB / 42 node
      • Hadoop #2 ~300TB / 28 node
      • Hadoop ( )
      • ( ) Hadoop IP
      • 2 7 3000 36TB
      • 1 Hadoop 13
  • 11.
      • 5
      • CMCC CDR 1 5TB~9TB 2,000 1 300GB
      • BC-PDM (Big Cloud based Parallel Data Mining)
      • Hadoop HDFS Hyper-DFS Hadoop
      • 16
      • ETL 12 16
      • 10 50
      • 3 7
      • Hadoop 256 Hadoop
  • 12. JP
      • Hadoop
      • PC (RDBMS)
      • RDBMS SAN/NAS
  • 13.
      • Hadoop
      • 4,000 2,000 GB
      • GB x
      • 150%
  • 14.
      • COOKPAD: 3.9 816 64 30 4 1
      • Amazon EC2 50 Hadoop
      • http://business.nikkeibp.co.jp/article/tech/20100416/214016/
  • 15. http://www.cloudera.com/
      • Hadoop
      • Cloudera: Mike Olson (Oracle, Sleepycat Software CEO), Christophe Bisciglia (Google), Dr. Amr Awadallah (Yahoo!, VivaSmart), Jeff Hammerbacher (Facebook)
      • Cloudera: Diane Greene (VMware CEO), Mike Abbott (Palm), Caterina Fake (Flickr), Dr. Qi Lu (Microsoft, Yahoo!), Marten Mickos (MySQL CEO), Jeff Weiner (LinkedIn, Yahoo!), Gideon Yu (Facebook CFO, YouTube CFO)
      • Yahoo! Facebook OpenPDC Codeplex Hadoop
  • 16. Pentaho + Hadoop
      • 2010/7
      • Hadoop BI Hive Hadoop DFS
  • 17. IBM InfoSphere BigInsights
      • Apache Hadoop BigInsights Core Web BigSheets 2
      • BigSheets BigSheets BigInsights Core
      • BigSheets