Transcript of "hadoop事例紹介" (Hadoop Case Studies)

  1. 2010/5/20
     OSS
     OSS Laboratories Inc. (http://www.ossl.co.jp)
     Mail: funai@ossl.co.jp
     Twitter: http://twitter.com/satoruf
     LinkedIn: http://jp.linkedin.com/in/satorufunai/ja
     Copyright 2010(C) OSS Laboratories Inc. All Rights Reserved
  2. • OSS
     • Apache
     • Google
       - GFS (Google File System) → HDFS (Hadoop Distributed File System)
       - Google MapReduce → Hadoop MapReduce
       - Google Chubby → Hadoop ZooKeeper
       - DSL: Google Sawzall → Hadoop Pig
       - Google BigTable → Hadoop HBase
       - Google ? → Hadoop Hive
     • Yahoo!, Facebook, Amazon, China Mobile, VISA, JP Morgan Chase
     • UFJ, NTT
     • ACID (Atomic, Consistent, Isolated, Durable) vs. BASE (Basically Available, Soft-State, Eventual Consistency)
  3. Apache Hadoop
     • ETL Tools, BI Reporting, RDBMS
     • Pig (Data Flow), Hive (SQL), Sqoop
     • MapReduce (Job Scheduling/Execution System), Streaming/Pipes APIs
     • HBase (key-value store)
     • HDFS (Hadoop Distributed File System)
     • Avro (Serialization), ZooKeeper (Coordination)
  4. HDFS: Hadoop Distributed File System
     • Block size: 64MB
  5. MapReduce: Distributed Processing
     • Map
     • Reduce
  6. Hadoop
     • Business Intelligence: OLAP Data Mart
     • Interactive Application: OLTP Data Store
     • Engineers
     • Hadoop: Storage and Batch Processing
     • ETL/Sqoop
  7. Hadoop
     • Node hardware:
       - 2x Quad Core Nehalems
       - 24GB RAM
       - 12 * 1TB SATA (JBOD, no RAID)
       - 1 Gigabit Ethernet
     • HDFS capacity:
       - 3TB of the 12TB raw disk reserved for temp shuffle space, which leaves 9TB/node
       - 3-way replication leads to 3TB effective HDFS space/node
       - But assuming 7x compression, that becomes ~20TB/node
     • TB: 2 5 /TB
  8. Yahoo!
     • Hadoop
     • 25,000 82PB Hadoop
     • 4,000 64TB 16PB 32,000
     • 500
     • SearchAssist™ 26 20
     • 1,500 1TB 62
     • 3,700 1PB 16
  9. Facebook
     • 200 Hive DWH Hadoop
     • +12TB/
     • 135TB/
     • 1,050 32TB 12.5PB 4,800
     [Figure: Hadoop cluster diagram; recoverable labels: Node, Disks, DataNode, Map-Reduce, 1 Gigabit, 4 Gigabit]
  10. VISA
      • 2 Hadoop 340TB
      • Hadoop #1: ~40TB / 42 nodes
      • Hadoop #2: ~300TB / 28 nodes
      • Hadoop
      • Hadoop IP
      • 2 7 3000 36TB
      • 1 Hadoop 13
  11. China Mobile (CMCC)
      • 5
      • CMCC CDR (call detail records) 1 5TB~9TB 2,000 1 300GB
      • BC-PDM (Big Cloud based Parallel Data Mining)
      • Hadoop HDFS Hyper-DFS Hadoop
      • 16
      • ETL 12 16
      • 10 50
      • 3 7
      • Hadoop 256 Hadoop
  12. JP Morgan Chase
      • Hadoop
      • PC (RDBMS)
      • RDBMS SAN/NAS
  13. • Hadoop
      • 4,000 2,000 GB
      • GB x
      • 150%
  14. • COOKPAD: 3.9 816 64 30 4 1
      • Amazon EC2 50 Hadoop
      • http://business.nikkeibp.co.jp/article/tech/20100416/214016/
  15. Cloudera (http://www.cloudera.com/)
      • Hadoop
      • Cloudera: Mike Olson (Oracle, Sleepycat Software CEO), Christophe Bisciglia (Google), Dr. Amr Awadallah (Yahoo!, VivaSmart), Jeff Hammerbacher (Facebook)
      • Cloudera: Diane Greene (VMware CEO), Mike Abbott (Palm), Caterina Fake (Flickr), Dr. Qi Lu (Microsoft, Yahoo!), Mårten Mickos (MySQL CEO), Jeff Weiner (LinkedIn, Yahoo!), Gideon Yu (Facebook CFO, YouTube CFO)
      • Yahoo!, Facebook, OpenPDC (Codeplex), Hadoop
  16. Pentaho + Hadoop
      • 2010/7
      • Hadoop, BI, Hive, Hadoop DFS
  17. IBM InfoSphere BigInsights
      • Apache Hadoop, BigInsights Core, Web, BigSheets, 2
      • BigSheets, BigInsights Core
      • BigSheets
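Slide 2 contrasts ACID with BASE (Basically Available, Soft-State, Eventual Consistency). Below is a minimal Python sketch of what "eventual consistency" means in practice. It assumes a hypothetical `Replica` class that resolves conflicting writes by last-writer-wins on a logical timestamp. That is only one simple convergence rule among many, and it is not meant to describe how any particular Hadoop component (for example HBase) replicates data.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical replica of a key-value store; each value carries a logical timestamp."""
    name: str
    data: dict = field(default_factory=dict)  # key -> (timestamp, value)

    def write(self, key, value, ts):
        # A write is accepted locally without coordinating with other replicas.
        self.data[key] = (ts, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

    def merge_from(self, other):
        # Anti-entropy step (assumed last-writer-wins): keep the newer timestamp.
        for key, (ts, value) in other.data.items():
            if key not in self.data or self.data[key][0] < ts:
                self.data[key] = (ts, value)


a, b = Replica("a"), Replica("b")
a.write("cart", ["book"], ts=1)
b.write("cart", ["pen"], ts=2)
print(a.read("cart"), b.read("cart"))  # ['book'] ['pen']: replicas temporarily disagree
a.merge_from(b)
b.merge_from(a)
print(a.read("cart"), b.read("cart"))  # ['pen'] ['pen']: both converge on the newer write
```

Both replicas stay available for writes while they disagree (soft state) and agree once they exchange data. The price of last-writer-wins is visible here: the older concurrent write (`["book"]`) is silently discarded, which an ACID system would not allow.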
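Slide 4 gives 64MB for HDFS, the default block size at the time, and slide 7 assumes 3-way replication. The sketch below shows how a file maps onto fixed-size blocks and how each block ends up on several DataNodes. The helper names (`split_into_blocks`, `place_replicas`) and the DataNode names are hypothetical, and the round-robin placement is a deliberate simplification: real HDFS lets the NameNode choose replica locations with a rack-aware policy.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # 64MB, the figure on slide 4
REPLICATION = 3                # 3-way replication, as on slide 7


def split_into_blocks(file_size):
    """Return (offset, length) for each block of a file of file_size bytes."""
    return [(off, min(BLOCK_SIZE, file_size - off))
            for off in range(0, file_size, BLOCK_SIZE)]


def place_replicas(num_blocks, datanodes, replication=REPLICATION):
    """Assign each block to `replication` distinct DataNodes (simplified round-robin)."""
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    return [[datanodes[(b + r) % len(datanodes)] for r in range(replication)]
            for b in range(num_blocks)]


blocks = split_into_blocks(200 * 1024 * 1024)  # a hypothetical 200MB file
placement = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
print(len(blocks), blocks[-1][1] // (1024 * 1024))  # 4 blocks; the last holds 8MB
print(placement[0])  # ['dn1', 'dn2', 'dn3']
```

The last block is shorter than 64MB. HDFS stores it at its actual length rather than padding it to the full block size.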
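Slide 5 names the two phases of MapReduce. The following single-process Python sketch walks through the Map, shuffle, and Reduce flow on the classic word-count example. The function names are hypothetical. In Hadoop the shuffle (grouping and sorting intermediate pairs by key) is performed by the framework across nodes, not in one in-memory dictionary.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit (word, 1) for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group intermediate values by key (done by the framework in Hadoop)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values collected for one key."""
    return key, sum(values)


def run_job(lines):
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    # Hadoop hands keys to reducers in sorted order; sorted() mimics that here.
    return dict(reduce_phase(k, vs) for k, vs in sorted(grouped.items()))


print(run_job(["Hadoop stores data", "Hadoop processes data"]))
# {'data': 2, 'hadoop': 2, 'processes': 1, 'stores': 1}
```

Because `map_phase` looks at one record and `reduce_phase` at one key, each call is independent. This independence is what lets the framework spread both phases over many machines.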
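The capacity figures on slide 7 follow from simple arithmetic, which the sketch below reproduces. It assumes the reserved shuffle space is one quarter of raw disk, since 12 × 1TB raw leaving 9TB implies that. It also takes the 7x compression ratio as the slide's stated assumption rather than a measured value. `effective_capacity_tb` is a hypothetical helper.

```python
def effective_capacity_tb(raw_tb=12, shuffle_fraction=0.25, replication=3, compression=7):
    """Per-node capacity chain from slide 7 (shuffle_fraction and compression are assumptions)."""
    usable = raw_tb * (1 - shuffle_fraction)  # 12TB raw -> 9TB after reserving shuffle space
    hdfs = usable / replication               # 9TB -> 3TB of unique data with 3 copies
    logical = hdfs * compression              # 3TB -> 21TB of uncompressed data at 7x
    return usable, hdfs, logical


print(effective_capacity_tb())  # (9.0, 3.0, 21.0)
```

With the defaults it returns 21TB of logical data per node, which the slide rounds to ~20TB/node. Changing `compression` shows how sensitive that headline number is to the assumed ratio.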