1st Korea Community Day Presentation: Bigdata
Bigdata Platform, Hadoop, Hive, ...

Published in: Technology, Business

  1. Cloud Computing, BigData. 2011.12. 2.0
  2. babokim@gmail.com; (www.gruter.com); SDS, NHN; www.jaso.co.kr; www.cloudata.org; www.cloumon.org; www.twitter.com/babokim; www.facebook.com/babokim
  3. BigData Definition (1): What is BigData? Big Data (BD): DB (McKinsey, 2011); SW, DB (IDC, 2011). Gartner (2011.03), McKinsey (2011.05), Economist (2010.05). SNS, M2M, Information silo. Big Data (KT).
  4. BigData Definition (2): Very large, distributed aggregations of loosely structured data
     - Petabytes/exabytes of data
     - Millions/billions of people
     - Billions/trillions of records
     - Loosely structured and often distributed data
     - Flat schemas with few complex interrelationships
     - Often involving time-stamped events
     - Often made up of incomplete data
     - Often including connections between data elements that must be probabilistically inferred
     - Applications that involve Big-data can be Transactional (e.g., Facebook, PhotoBox) or Analytic (e.g., ClickFox, Merced Applications)
     - Source: http://wikibon.org/wiki/v/Enterprise_Big-data
  5. Big-data Analytics Complements Data Warehouse
     Traditional Data Warehouse
     - Complete record from transactional system
     - All data centralized
     - Analytics designed against stable environment
     - Many reports run on a production basis
     Big-data Analytic Environment
     - Data from many sources inside and outside of the organization (including traditional DW)
     - Data often physically distributed
     - Need to iterate on the solution to test/improve models
     - Large-memory analytics also part of the iteration
     - Every iteration usually requires a complete reload of information
     Source: http://wikibon.org/wiki/v/Enterprise_Big-data
  6. Facebook Social plug-in. Transactional; Feedback; Analytic. Processes over 20 billion events per day (200,000 events per second) with a lag of less than 30 seconds.
  7. BigData: Collecting, Store, Analysis, Reporting/Searching. SNS. Collecting: Robot, RSS Reader, ETL, OpenAPI, Data Aggregator. Store: Repository (DBMS, NoSQL), Index. Analysis: Sentimental Analysis, Clustering, Classification, Indexing. User Define Query Script.
  8. Twitter: backtype
     - Distribute tweets randomly on multiple queues
     - Workers share the load of schemifying tweets
     - Workers schemify tweets and append to Hadoop
     - Workers choose queue to enqueue to using hash/mod of URL
     - All updates for same URL guaranteed to go to same worker
     - Workers update statistics on URLs by incrementing counters in Cassandra
  9. BigData Architectural Requirements
     - Scalability: Scale-out, Elasticity
     - Reliability: Hadoop
     - Flexibility: Component; easy to add Analysis Rules; support for various data formats
     - Latency: Real time, Near Real time, Batch
     - High Throughput: Global web scale traffic, ~ /sec
     - IBM, HP, Oracle; BI/DW
  10. BigData: Flume, Scribe, Chukwa; Hadoop FileSystem, MogileFS; NoSQL (Cloudata, HBase, Cassandra); Katta, ElasticSearch; count, sum aggregation; S4, Storm; Hadoop MapReduce (Hive, Pig); Giraph, GoldenOrb; Cluster, Classification; Mahout, R; ZooKeeper, HUE, Cloumon; Serialization: Thrift, Avro, ProtoBuf
  11. Hadoop Ecosystem: http://indoos.wordpress.com/2010/08/16/hadoop-ecosystem-world-map/
  12. Software Stack. Interface: Web, Phone, Pad; Rule Management; Data Visualization. Analysis: (Near) Real-time Analysis, Batch Analysis. Management: Analysis Job Monitoring (cloumon), Cluster Management (ZooKeeper). Analysis Job: Mining Lib (Mahout), Statistics Lib (R), Script Language (Hive, Pig). Real-time Analysis Platform: CEP Engine (Esper). Data Analysis Platform (hadoop); Job Workflow Engine (oozie, cascade). Aggregator: Collector (flume, scribe, chukwa). Data Store: File System (HadoopFS), NoSQL (Cloudata, HBase, Cassandra), Search (ElasticSearch).
  13. Application Server (Application, Log4j, Agent, local log); Collector #1, Collector #2 (Temp Log); Centralized Storage (HDFS). Chukwa (Yahoo): Hadoop FileSystem, HDFS, MapReduce. Scribe (Facebook): thrift, Hadoop JNI. Flume (Cloudera): Hadoop, HBase, Search Engine.
  14. Esper. Event: Gruter ClouStream, Yahoo S4, Twitter Storm, Facebook Puma. ClouStream; Puma.
  15. Hadoop File System. BigData de facto standard; x86; NameNode SPOF (Single Point Of Failure).
  16. MapReduce
  17. Hadoop MapReduce. MapReduce; Hadoop FileSystem; DB, FTP Server; FIFO, Fair, Capacity; MapReduce (streaming).
  18. Script Language
      Hive:
        hive> CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING);
        hive> LOAD DATA LOCAL INPATH './examples/files/kv1.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-15');
        hive> SELECT a.foo FROM invites a WHERE a.ds='2008-08-15';
        hive> FROM pokes t1 JOIN invites t2 ON (t1.bar = t2.bar) INSERT OVERWRITE TABLE events SELECT t1.bar, t1.foo, t2.foo;
      Pig:
        Visits = load '/data/visits' as (user, url, time);
        Visits = foreach Visits generate user, Canonicalize(url) as url, time;
        Pages = load '/data/pages' as (url, pagerank);
        VP = join Visits by url, Pages by url;
        UserVisits = group VP by user;
        UserPageranks = foreach UserVisits generate user, AVG(VP.pagerank) as avgpr;
        GoodUsers = filter UserPageranks by avgpr > 0.5;
        store GoodUsers into '/data/good_users';
  19. Next Generation Hadoop (0.23): YARN (Next MapReduce Framework), HDFS Federation
  20. NoSQL. Scale-out; Key/value, Document, Simple Column; Schema Free; Big Data; x86; CAP (Brewer's Conjecture); Eventually consistent / BASE (not ACID); Simple API.
      - Twitter: Cassandra, HBase, Hadoop, Scribe, FlockDB, Redis
      - Facebook: Cassandra, HBase, Hadoop, Scribe, Hive
      - Netflix: Amazon SimpleDB, Cassandra
      - Digg: Cassandra
      - SimpleGeo: Cassandra
      - StumbleUpon: HBase, OpenTSDB
      - Yahoo!: Hadoop, HBase, PNUTS
      - Rackspace: Cassandra
      - DAUM: MongoDB
      - NCSoft: Cassandra
  21. NoSQL: Cloudata/HBase
      - Distributed Data Storage: semi-structured data store (not file system)
      - Google Bigtable clone: Data Model, Architecture, Features
      - Open source: http://www.cloudata.org
      - Create, drop, modify table schema
      - Single row operation / Multi row operation: like, between
      - Scanner, Direct Uploader, MapReduce Adapter
      - Automatic table split & re-assignment
      - Failover (Hadoop)
      - Goal: 500 nodes, 300 GB/node, Peta bytes ~
  22. seenal.com
  23. 10.29
  24. BigData; Data; (6 ~ 1)
  25. Facebook: babokim@gruter.com, www.jaso.co.kr
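
Slide 6 quotes "over 20 billion events per day (200,000 events per second)". As a quick sanity check, the sketch below converts the daily figure into an average per-second rate and estimates how many events can be in flight within the 30-second lag. The function names are illustrative and not from the slides, and the calculation assumes traffic is spread evenly over the day, which real traffic is not.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate(events_per_day: float) -> float:
    """Average events per second, assuming a uniform load over one day."""
    return events_per_day / SECONDS_PER_DAY


def events_in_flight(rate_per_second: float, lag_seconds: float) -> float:
    """Rough count of events that arrive during one lag window."""
    return rate_per_second * lag_seconds


if __name__ == "__main__":
    rate = average_rate(20_000_000_000)
    print(f"average rate: {rate:,.0f} events/sec")
    print(f"in flight at 200,000/sec and 30 s lag: {events_in_flight(200_000, 30):,.0f}")
```

The average works out to roughly 231,000 events per second, so the slide's 200,000 figure reads as a rounded lower bound rather than an exact conversion.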

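Slide 8 says workers pick a queue "using hash/mod of URL" so that all updates for the same URL go to the same worker. A minimal sketch of that routing idea follows. The queue count, the function names, and the choice of CRC32 as the hash are assumptions made for illustration; the slides do not say which hash function backtype used.

```python
import zlib
from collections import defaultdict

NUM_QUEUES = 4  # hypothetical number of queues/workers


def queue_for(url: str, num_queues: int = NUM_QUEUES) -> int:
    """Choose a queue index by hashing the URL and taking it modulo the queue count."""
    # CRC32 is stable across processes, unlike Python's salted built-in hash().
    return zlib.crc32(url.encode("utf-8")) % num_queues


def route(urls, num_queues: int = NUM_QUEUES):
    """Distribute URL updates over queues; a given URL always lands on one queue."""
    queues = defaultdict(list)
    for url in urls:
        queues[queue_for(url, num_queues)].append(url)
    return queues


def count_urls(queue):
    """What one worker does with its queue: increment a counter per URL."""
    counts = defaultdict(int)
    for url in queue:
        counts[url] += 1
    return dict(counts)


if __name__ == "__main__":
    updates = ["http://a.example/x", "http://b.example/y", "http://a.example/x"]
    for index, queue in sorted(route(updates).items()):
        print(index, count_urls(queue))
```

Because each URL maps to exactly one queue, each worker can keep its per-URL counters without coordinating with the others. The trade-off is that changing the number of queues remaps most URLs.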
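
Slides 16 and 17 name MapReduce without showing the programming model. The sketch below imitates the three phases (map, shuffle, reduce) in a single Python process using word count as the job. It is not the Hadoop API; every function name here is hypothetical, and a real cluster would run the map and reduce calls in parallel across nodes.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every input record and yield its (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Call the reducer once per key with all values collected for that key."""
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


def word_mapper(line):
    """Emit (word, 1) for each whitespace-separated word in a line."""
    for word in line.lower().split():
        yield word, 1


def sum_reducer(key, values):
    """Sum the counts emitted for one word."""
    return sum(values)


def word_count(lines):
    return reduce_phase(shuffle(map_phase(lines, word_mapper)), sum_reducer)


if __name__ == "__main__":
    print(word_count(["big data", "Big Hadoop"]))
```

Only the mapper and reducer are job-specific. The grouping step in between is generic, which is why the framework can handle distribution on the job author's behalf.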