Hadoop Case Studies
Transcript

  • 1. 2010/5/20 OSS Laboratories Inc. http://www.ossl.co.jp Mail: funai@ossl.co.jp Twitter: http://twitter.com/satoruf LinkedIn: http://jp.linkedin.com/in/satorufunai/ja Copyright 2010(C) OSS Laboratories Inc. All Rights Reserved
  • 2.
      • OSS
      • Apache
      • Google
          • GFS (Google File System) : HDFS (Hadoop Distributed File System)
          • Google MapReduce : Hadoop MapReduce
          • Google Chubby : Hadoop ZooKeeper
          • DSL Google Sawzall : Hadoop Pig
          • Google BigTable : Hadoop HBase
          • Google ? : Hadoop Hive
      • Yahoo!, Facebook, Amazon, China Mobile, VISA, JP Morgan Chase
      • UFJ, NTT
      • ACID (Atomic, Consistent, Isolated, Durable); BASE (Basically Available, Soft-State, Eventual Consistency)
  • 3. Apache Hadoop
      • ETL Tools, BI Reporting, RDBMS
      • Pig (Data Flow), Hive (SQL), Sqoop
      • MapReduce (Job Scheduling/Execution System), (Streaming/Pipes APIs)
      • HBase (key-value store)
      • HDFS (Hadoop Distributed File System)
      • Avro (Serialization)
      • ZooKeeper (Coordination)
  • 4. HDFS: Hadoop Distributed File System. HDFS block size: 64MB
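As a rough illustration of slide 4, the sketch below counts how many blocks a file occupies, assuming the 64MB figure on the slide is the HDFS block size (the default in Hadoop releases of that period). The function names are hypothetical, and the replication factor of 3 is borrowed from slide 7; this is plain arithmetic, not a call into any Hadoop API.

```python
import math

BLOCK_SIZE_MB = 64   # assumption: the 64MB on slide 4 is the HDFS block size
REPLICATION = 3      # assumption: 3-way replication, as mentioned on slide 7


def hdfs_block_count(file_size_mb: float, block_size_mb: int = BLOCK_SIZE_MB) -> int:
    """Number of blocks needed to hold a file (the last block may be partial)."""
    if file_size_mb <= 0:
        return 0
    return math.ceil(file_size_mb / block_size_mb)


def replicated_storage_mb(file_size_mb: float, replication: int = REPLICATION) -> float:
    """Raw disk space consumed once every block has been replicated."""
    return file_size_mb * replication


if __name__ == "__main__":
    size_mb = 1024  # a 1GB file
    print(hdfs_block_count(size_mb), "blocks,",
          replicated_storage_mb(size_mb), "MB of raw disk")
```

Under these assumptions a 1GB file splits into 16 blocks and consumes 3GB of raw disk across the cluster.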
  • 5. MapReduce: Distributed Processing (Map / Reduce)
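The Map / Reduce split named on slide 5 can be sketched in a few lines of single-process Python. This is only an illustration of the programming model using word counting as the usual example; it is not the Hadoop API, and all function names here are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in line.split():
        yield word.lower(), 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key, as the framework would."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["Hadoop stores data", "Hadoop processes data"]))
```

In a real cluster the map and reduce calls run in parallel on many nodes and the shuffle moves data over the network; here everything runs in one process to keep the structure visible.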
  • 6. Hadoop
      • Business Intelligence, Interactive Application
      • OLAP Data Mart, OLTP Data Store
      • Engineers
      • Hadoop: Storage and Batch Processing
      • ETL/sqoop
  • 7. Hadoop
      • Node hardware:
          • 2x Quad Core Nehalems
          • 24GB RAM
          • 12 * 1TB SATA (JBOD, RAID)
          • 1 Gigabit Ethernet
      • HDFS capacity:
          • Of the 12TB raw, 3TB is reserved for temp shuffle space, which leaves 9TB/node
          • 3 way replication leads to 3TB effective HDFS space/node
          • But assuming 7x compression that becomes ~ 20TB/node
      • TB :2 5 /TB
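The per-node capacity estimate on slide 7 can be reproduced with a short calculation. The sketch assumes 12TB of raw disk (12 * 1TB) and infers the 3TB temp-space reservation from the slide's "leaves 9TB/node"; the function and parameter names are hypothetical.

```python
def effective_hdfs_capacity(
    disks: int = 12,
    disk_tb: float = 1.0,
    temp_tb: float = 3.0,      # assumption: inferred from 12TB raw minus 9TB left
    replication: int = 3,
    compression: float = 7.0,
) -> dict:
    """Walk through the slide's per-node capacity estimate step by step."""
    raw_tb = disks * disk_tb                    # 12 * 1TB SATA
    usable_tb = raw_tb - temp_tb                # after temp shuffle space
    effective_tb = usable_tb / replication      # after 3-way replication
    compressed_tb = effective_tb * compression  # logical data with 7x compression
    return {
        "raw_tb": raw_tb,
        "usable_tb": usable_tb,
        "effective_tb": effective_tb,
        "compressed_tb": compressed_tb,
    }


if __name__ == "__main__":
    for name, value in effective_hdfs_capacity().items():
        print(f"{name}: {value:g}")
```

With the default values the last step gives 21TB, which the slide rounds to roughly 20TB/node.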
  • 8. Yahoo!
      • Hadoop
      • 25,000 82PB Hadoop
      • 4,000 64TB 16PB 32,000
      • 500
      • SearchAssist(TM) 26 20
      • 1,500 1TB 62
      • 3,700 1PB 16
  • 9. Facebook
      • 200 Hive DWH Hadoop
      • +12TB/
      • 135TB/
      • 1,050 32TB 12.5PB 4,800
      • [Diagram: cluster nodes with disks (DataNode + Map-Reduce), 1 Gigabit and 4 Gigabit links]
  • 10. VISA
      • 2 Hadoop 340TB
      • Hadoop #1 ~40TB / 42 node
      • Hadoop #2 ~300TB / 28 node
      • Hadoop ( )
      • ( ) Hadoop IP
      • 2 7 3000 36TB
      • 1 Hadoop 13
  • 11. China Mobile
      • 5
      • CMCC CDR 1 5TB~9TB 2,000 1 300GB
      • BC-PDM (Big Cloud based Parallel Data Mining)
      • Hadoop HDFS Hyper-DFS Hadoop
      • 16
      • ETL 12 16
      • 10 50
      • 3 7
      • Hadoop 256 Hadoop
  • 12. JP Morgan Chase
      • Hadoop
      • PC (RDBMS)
      • RDBMS SAN/NAS
  • 13.
      • Hadoop
      • 4,000 2,000 GB
      • GB x
      • 150%
  • 14.
      • COOKPAD: 3.9 816 64 30 4 1
      • Amazon EC2 50 Hadoop
      • http://business.nikkeibp.co.jp/article/tech/20100416/214016/
  • 15. http://www.cloudera.com/
      • Hadoop
      • Cloudera
          • Mike Olson (Oracle, Sleepycat Software CEO)
          • Christophe Bisciglia (Google)
          • Dr. Amr Awadallah (Yahoo!, VivaSmart)
          • Jeff Hammerbacher (Facebook)
      • Cloudera
          • Diane Greene (VMware CEO)
          • Mike Abbott (Palm)
          • Caterina Fake (Flickr)
          • Dr. Qi Lu (Microsoft, Yahoo!)
          • Marten Mickos (MySQL CEO)
          • Jeff Weiner (LinkedIn, Yahoo!)
          • Gideon Yu (Facebook CFO, YouTube CFO)
      • Yahoo! Facebook OpenPDC Codeplex Hadoop
  • 16. Pentaho + Hadoop
      • 2010/7
      • Hadoop BI Hive Hadoop DFS
  • 17. IBM InfoSphere BigInsights
      • Apache Hadoop BigInsights Core Web BigSheets 2
      • BigSheets BigSheets BigInsights Core
      • BigSheet