Hadoop in SIGMOD 2011

  1. Hadoop in SIGMOD 2011
     2011/5/20
  2. Papers
     - LCI: a social channel analysis platform for live customer intelligence
     - Bistro data feed management system
     - Apache Hadoop goes realtime at Facebook
     - Nova: continuous Pig/Hadoop workflows
     - A Hadoop based distributed loading approach to parallel data warehouses
     - A batch of PNUTS: experiences connecting cloud batch and serving systems
  3. Papers (Continued)
     - Turbocharging DBMS buffer pool using SSDs
     - Online reorganization in read optimized MMDBS
     - Automated partitioning design in parallel database systems
     - Oracle database filesystem
     - Emerging trends in the enterprise data analytics: connecting Hadoop and DB2 warehouse
     - Efficient processing of data warehousing queries in a split execution environment
     - SQL Server column store indexes
     - An analytic data engine for visualization in Tableau
  4. Apache Hadoop Goes Realtime at Facebook
  5. Workload Types
     - Facebook Messaging
       - High Write Throughput
       - Large Tables
       - Data Migration
     - Facebook Insights
       - Realtime Analytics
       - High Throughput Increments
     - Facebook Metrics System (ODS)
       - Automatic Sharding
       - Fast Reads of Recent Data and Table Scans
  6. Why Hadoop & HBase
     - Requirements
       - Elasticity
       - High write throughput
       - Efficient and low-latency strong consistency semantics within a data center
       - Efficient random reads from disk
       - High Availability and Disaster Recovery
       - Fault Isolation
       - Atomic read-modify-write primitives
       - Range Scans
     - Non-requirements
       - Tolerance of network partitions within a single data center
       - Zero Downtime in case of individual data center failure
       - Active-active serving capability across different data centers
  7. Realtime HDFS
     - High Availability - AvatarNode
       - Hot Standby - AvatarNode
       - Enhancements to HDFS transaction logging
       - Transparent Failover: DAFS (client enhancement + ZooKeeper)
     - Hadoop RPC compatibility
     - Block Availability: Placement Policy
       - A pluggable block placement policy
  8. Realtime HDFS (Cont.)
     - Performance Improvements for a Realtime Workload
       - RPC Timeout
       - Recover File Lease
         - HDFS-append
         - recoverLease
       - Reads from Local Replicas
     - New Features
       - HDFS sync
       - Concurrent Readers (last chunk of data)
  9. Production HBase
     - ACID Compliance (RWCC: Read Write Consistency Control)
       - Atomicity (WALEdit)
       - Consistency
     - Availability Improvements
       - HBase Master Rewrite
         - Region assignment in memory -> ZooKeeper
       - Online Upgrades
       - Distributed Log Splitting
     - Performance Improvements
       - Compaction
       - Read Optimizations
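A small model can illustrate the RWCC item on the Production HBase slide. The sketch below is a simplified, single-process Python approximation. It assumes three things: every write receives an increasing write number, all cells of one put share that number, and readers only see cells at or below a read point. The read point advances only across a contiguous run of completed writes. The class and method names (`ReadWriteConsistencyControl`, `begin_write`, `complete_write`, `Row`) are hypothetical. They are not HBase's Java API.

```python
import threading


class ReadWriteConsistencyControl:
    """Simplified model of HBase-style RWCC (hypothetical names, not the HBase API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._last_write = 0     # last write number handed out
        self._read_point = 0     # highest write number visible to readers
        self._completed = set()  # finished writes the read point has not reached yet

    def begin_write(self):
        with self._lock:
            self._last_write += 1
            return self._last_write

    def complete_write(self, write_number):
        with self._lock:
            self._completed.add(write_number)
            # Advance only across a contiguous run of finished writes, so readers
            # never observe a later write while an earlier one is still in flight.
            while self._read_point + 1 in self._completed:
                self._read_point += 1
                self._completed.remove(self._read_point)

    def read_point(self):
        with self._lock:
            return self._read_point


class Row:
    """A row whose cells are tagged with the write number of the put that wrote them."""

    def __init__(self, rwcc):
        self._rwcc = rwcc
        self._cells = []  # (write_number, column, value); single-threaded use assumed

    def put(self, columns):
        # All cells of one put share a write number, so they become visible together.
        write_number = self._rwcc.begin_write()
        self._cells.extend((write_number, c, v) for c, v in columns.items())
        return write_number  # the caller calls complete_write() once the put is applied

    def get(self):
        read_point = self._rwcc.read_point()
        visible = {}
        for write_number, column, value in sorted(self._cells, key=lambda cell: cell[0]):
            if write_number <= read_point:
                visible[column] = value  # newer visible writes overwrite older ones
        return visible


if __name__ == "__main__":
    rwcc = ReadWriteConsistencyControl()
    row = Row(rwcc)
    w1 = row.put({"a": 1, "b": 1})
    w2 = row.put({"a": 2, "b": 2})
    rwcc.complete_write(w2)  # the second put finishes first...
    print(row.get())         # {} : ...but stays hidden while w1 is in flight
    rwcc.complete_write(w1)
    print(row.get())         # {'a': 2, 'b': 2}
```

In this model a reader never sees half of a multi-column put, and never sees a later put while an earlier one is still in flight.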
  10. Deployment and Operational Experiences
     - Testing
       - Auto Testing Tool
       - HBase Verify
     - Monitoring and Tools
       - HBCK
       - More metrics
     - Manual versus Automatic Splitting
       - Add new RegionServers, not region splitting
     - Dark Launch (gradual rollout)
     - Dashboards/ODS integration
     - Backups at the Application layer
     - Schema Changes
     - Importing Data
       - LZO & zip
     - Reducing Network IO
       - Major compaction
  11. Nova: Continuous Pig/Hadoop Workflows
  12. Nova Overview
     - Scenarios
       - Ingesting and analyzing user behavior logs
       - Building and updating a search index from a stream of crawled web pages
       - Processing semi-structured data feeds
     - Two-layer programming model (Nova over Pig)
     - Continuous processing
     - Independent scheduling
     - Cross-module optimization
     - Manageability features
  13. Abstract Workflow Model
     - Workflow
       - Two kinds of vertices: tasks (processing steps) and channels (data containers)
       - Edges connect tasks to channels and channels to tasks
       - Edge annotations (all, new, B and Δ)
     - Four common patterns of processing
       - Non-incremental (template detection)
       - Stateless incremental (shingling)
       - Stateless incremental with lookup table (template tagging)
       - Stateful incremental (de-duping)
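The stateful incremental pattern (de-duping) can be sketched as a single run of a task. That run consumes only the pages that are new in its input channel, reads its own prior state, and emits both an output delta and a state delta. This is a hedged Python illustration, not Nova or Pig code. The SHA-1 content fingerprint stands in for the shingle-based near-duplicate detection a real pipeline would use, and the names `fingerprint` and `dedupe_step` are hypothetical.

```python
import hashlib


def fingerprint(text):
    # Hypothetical stand-in for shingle-based near-duplicate detection:
    # an exact-content hash, so only identical pages count as duplicates.
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def dedupe_step(state, new_pages):
    """One run of a stateful incremental task.

    state:     dict fingerprint -> URL of the first page seen with that content
               (the task's state channel, read in full)
    new_pages: list of (url, text) pairs added to the input channel since the
               previous run (consumed in "new" mode)
    Returns (output_delta, state_delta); both are deltas, not full snapshots.
    """
    output_delta, state_delta = [], {}
    for url, text in new_pages:
        fp = fingerprint(text)
        if fp in state or fp in state_delta:
            continue  # already seen in an earlier run or earlier in this batch
        state_delta[fp] = url
        output_delta.append((url, text))
    return output_delta, state_delta


if __name__ == "__main__":
    state = {}
    out1, d1 = dedupe_step(state, [("u1", "hello"), ("u2", "hello"), ("u3", "world")])
    state.update(d1)  # merge the state delta into the base state before the next run
    out2, d2 = dedupe_step(state, [("u4", "world"), ("u5", "new page")])
    print(out1)  # [('u1', 'hello'), ('u3', 'world')]
    print(out2)  # [('u5', 'new page')]
```

Each run touches only the new input plus the lookup state, which is the property that separates this pattern from the non-incremental one.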
  14. Abstract Workflow Model (Cont.)
     - Data and Update Model
       - Blocks: base blocks and delta blocks
       - Channel functions: merge, chain and diff
     - Task/Data Interface
       - Consumption mode: all or new
       - Production mode: B or Δ
     - Workflow Programming and Scheduling
     - Data Compaction and Garbage Collection
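A toy data model makes the three channel functions on this slide concrete. The sketch below assumes that a block is a Python dict from record key to value. It also assumes that a delta marks deletions with a tombstone, `DELETE`, which is a hypothetical sentinel. The slide does not describe Nova's actual block format.

- `merge` applies a delta to a base.
- `chain` collapses two consecutive deltas into one.
- `diff` derives the delta between two bases.

```python
DELETE = object()  # hypothetical tombstone: "this key was removed"


def merge(base, delta):
    """Apply a delta block to a base block, producing a new base block."""
    result = dict(base)
    for key, value in delta.items():
        if value is DELETE:
            result.pop(key, None)
        else:
            result[key] = value
    return result


def chain(delta1, delta2):
    """Collapse two consecutive delta blocks into one; entries in delta2 win."""
    result = dict(delta1)
    result.update(delta2)
    return result


def diff(old_base, new_base):
    """Compute the delta that turns old_base into new_base."""
    delta = {k: v for k, v in new_base.items() if k not in old_base or old_base[k] != v}
    for k in old_base:
        if k not in new_base:
            delta[k] = DELETE
    return delta


if __name__ == "__main__":
    base = {"a": 1, "b": 2}
    d1, d2 = {"b": 3, "c": 4}, {"a": DELETE}
    print(merge(merge(base, d1), d2))  # {'b': 3, 'c': 4}
    print(merge(base, chain(d1, d2)))  # {'b': 3, 'c': 4}
```

In this model, merging two deltas one at a time gives the same base as merging their chain. Merging a diff onto the older base also reproduces the newer base. These properties would let a system compact delta blocks without changing what downstream tasks read.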
  15. Nova System Architecture
  16. Efficient Processing of Data Warehousing Queries in a Split Execution Environment
  17. Introduction
     - Two approaches
       - Starting with a parallel database system and adding some MapReduce features
       - Starting with MapReduce and adding database system technology
     - HadoopDB follows the second approach
     - Two heuristics for HadoopDB optimizations
       - Database systems can process data at a faster rate than Hadoop
       - Minimize the number of MapReduce jobs in the SQL execution plan
  18. HadoopDB
     - HadoopDB Architecture
       - Database Connector
       - Data Loader
       - Catalog
       - Query Interface
     - VectorWise/X100 Database (SIMD) vs. PostgreSQL
     - HadoopDB Query Execution
       - Selection, projection, and partial aggregation (Map and Combine) -> database system
       - Co-partitioned tables
       - MR for redistributing data
       - SideDB (a "database task done on the side")
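The sketch below illustrates how HadoopDB pushes selection, projection, and partial aggregation into each node's database. In-memory SQLite databases stand in for each node's DBMS; HadoopDB itself used PostgreSQL or VectorWise. Plain Python functions take the place of Map and Reduce tasks. The `sales` table, its columns, and the function names are hypothetical.

```python
import sqlite3
from collections import defaultdict


def make_partition(rows):
    # One in-memory SQLite database per "node" (illustration only).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


def map_phase(conn, min_amount):
    # Selection, projection and partial aggregation run inside the local database;
    # only one (region, sum, count) row per group leaves the node.
    return conn.execute(
        "SELECT region, SUM(amount), COUNT(*) FROM sales "
        "WHERE amount >= ? GROUP BY region",
        (min_amount,),
    ).fetchall()


def reduce_phase(partials):
    # Merge the partial (sum, count) pairs per group into a final average.
    totals = defaultdict(lambda: [0, 0])
    for region, s, c in partials:
        totals[region][0] += s
        totals[region][1] += c
    return {region: s / c for region, (s, c) in totals.items()}


if __name__ == "__main__":
    nodes = [
        make_partition([("east", 10), ("west", 5), ("east", 30)]),
        make_partition([("west", 15), ("east", 2), ("west", 25)]),
    ]
    partials = [row for conn in nodes for row in map_phase(conn, min_amount=5)]
    print(reduce_phase(partials))
```

The partial aggregates carry (sum, count) rather than averages, because averages of averages would be wrong when groups differ in size across nodes.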
  19. Split Query Execution
     - Referential Partitioning
       - Join in the database engine
       - Local join
       - Foreign key -> Referential Partitioning
     - Split MR/DB Joins
       - Directed join: one of the tables is already partitioned by the join key
       - Broadcast join: the small table is shipped to every node
       - Adding specialized joins to the MR framework -> Map-side join
       - Tradeoff: the join requires a temporary table
       - Another type of join: MR redistributes data -> Directed join
     - Split MR/DB Semijoin, e.g. 'foreignKey IN (listOfValues)'
       - Can be split into two MapReduce jobs
       - SideDB eliminates the first MapReduce job
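The two split MR/DB join strategies on this slide differ in what they move over the network. The Python sketch below models partitions as lists of tuples. The broadcast join ships the whole small table to every partition. The directed join redistributes only the other table, using the same partitioning function that already places the big table. All names and the modulo partitioning function are assumptions for illustration.

```python
def build_hash_table(rows, key_index):
    table = {}
    for row in rows:
        table.setdefault(row[key_index], []).append(row)
    return table


def probe(partition, hash_table, key_index):
    # Local hash join: stream one partition and look up matches in the hash table.
    return [row + match for row in partition for match in hash_table.get(row[key_index], [])]


def broadcast_join(partitions, small_rows, big_key, small_key):
    # The small table is built once and "shipped" to every node; the big table never moves.
    hash_table = build_hash_table(small_rows, small_key)
    return [out for part in partitions for out in probe(part, hash_table, big_key)]


def directed_join(partitions, other_rows, big_key, other_key, partition_of):
    # partitions[p] already holds exactly the big-table rows with partition_of(key) == p,
    # so only other_rows is redistributed, using the same partitioning function.
    buckets = [[] for _ in partitions]
    for row in other_rows:
        buckets[partition_of(row[other_key])].append(row)
    return [
        out
        for part, bucket in zip(partitions, buckets)
        for out in probe(part, build_hash_table(bucket, other_key), big_key)
    ]


if __name__ == "__main__":
    # (customer_id, order_id) rows, already partitioned by customer_id % 2
    orders = [[(2, "o2")], [(1, "o1"), (3, "o3")]]
    customers = [(1, "alice"), (2, "bob")]
    print(broadcast_join(orders, customers, 0, 0))
    print(directed_join(orders, customers, 0, 0, lambda key: key % 2))
```

Both strategies produce the same rows. The broadcast join pays for copying the small table to every node. The directed join pays for one redistribution of the smaller side only.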
  20. Split Query Execution (Cont.)
     - Post-join Aggregation
       - Two MapReduce jobs
       - Hash-based partial aggregation -> saves significant I/O
       - A similar technique is applied to TOP N selections
     - Pre-join Aggregation
       - For MR-based joins
       - Useful when the cardinality of the group-by and join-key columns is smaller than the cardinality of the entire table
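The TOP N remark relies on a simple fact: the global top N rows are always contained in the union of each node's local top N. Each node therefore needs to ship at most N rows instead of its whole partition. Below is a minimal Python sketch using `heapq.nlargest`; the function names are hypothetical.

```python
import heapq


def local_top_n(partition, n, key):
    # Runs on each node: only these n rows leave the node.
    return heapq.nlargest(n, partition, key=key)


def global_top_n(partitions, n, key):
    # Any row in the global top n is also in the top n of its own partition,
    # so the union of the local top-n lists always contains the answer.
    candidates = []
    for partition in partitions:
        candidates.extend(local_top_n(partition, n, key))
    return heapq.nlargest(n, candidates, key=key)


if __name__ == "__main__":
    parts = [[("a", 5), ("b", 1), ("c", 9)], [("d", 7), ("e", 8)], [("f", 2)]]
    print(global_top_n(parts, 2, key=lambda row: row[1]))  # [('c', 9), ('e', 8)]
```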
  21. A Query Plan in HadoopDB
  22. Performance
     - No hash partition feature in Hive
  23. Emerging Trends in the Enterprise Data Analytics: Connecting Hadoop and DB2 Warehouse
  24. DB2 and Hadoop/Jaql Interactions
  25. A Hadoop-Based Distributed Loading Approach to Parallel Data Warehouses
  26. Introduction
     - Why Hadoop for Teradata EDW
       - More disk space, and space can be easily added
       - HDFS as storage
       - MapReduce
       - Distributed
     - Assignment problem: HDFS blocks to Teradata EDW nodes
       - Parameters: n blocks, k copies, m nodes
       - Goal: assign HDFS blocks to nodes evenly while minimizing network traffic
  27. Block Assignment Problem
     - HDFS file F is stored on a cluster of P nodes (each node is uniquely identified by an integer i, where 1 ≤ i ≤ P)
     - The problem is defined by assignment(X, Y, n, m, k, r), where:
       - X is the set of n blocks of F (X = {1, ..., n})
       - Y is the set of m nodes running the PDBMS (called PDBMS nodes), Y ⊆ {1, ..., P}
       - k is the number of copies of each block; m is the number of PDBMS nodes
       - r is the mapping that records the replica locations of each block: r(i) returns the set of nodes that hold a copy of block i
     - An assignment g from the blocks in X to the nodes in Y is a mapping from X = {1, ..., n} to Y, where g(i) = j (i ∈ X, j ∈ Y) means that block i is assigned to node j
  28. Block Assignment Problem (Cont.)
     - The problem is defined by assignment(X, Y, n, m, k, r)
     - An even assignment g is an assignment such that $\forall i, j \in Y:\ \bigl|\,|\{x \mid 1 \le x \le n,\ g(x) = i\}| - |\{y \mid 1 \le y \le n,\ g(y) = j\}|\,\bigr| \le 1$, i.e. the numbers of blocks assigned to any two nodes differ by at most one
     - The cost of an assignment g is $\mathrm{cost}(g) = |\{i \mid 1 \le i \le n,\ g(i) \notin r(i)\}|$, the number of blocks assigned to remote nodes
     - |g| denotes the number of blocks that g assigns to local nodes, so $|g| = n - \mathrm{cost}(g)$
     - The optimal assignment problem is to find an even assignment with the smallest cost
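The definitions on this slide translate directly into code. The sketch below makes two representation assumptions: g is a Python dict from block to node, and r is a dict from block to the set of nodes holding a replica. The function names are also assumptions. The exhaustive search enumerates all m^n assignments, so it only serves to check the definitions on tiny inputs, such as the example instance on the OBA slide.

```python
from itertools import product


def loads(g, Y):
    """Number of blocks that g assigns to each node in Y (nodes with no blocks count as 0)."""
    counts = {j: 0 for j in Y}
    for node in g.values():
        counts[node] += 1
    return counts


def is_even(g, Y):
    # |load_i - load_j| <= 1 for all pairs i, j in Y is equivalent to max - min <= 1.
    c = list(loads(g, Y).values())
    return max(c) - min(c) <= 1


def cost(g, r):
    """cost(g): number of blocks assigned to a node that holds no replica of them."""
    return sum(1 for i, j in g.items() if j not in r[i])


def local_blocks(g, r):
    return len(g) - cost(g, r)  # |g| = n - cost(g)


def optimal_assignment_bruteforce(X, Y, r):
    X, Y = sorted(X), sorted(Y)
    best = None
    for choice in product(Y, repeat=len(X)):
        g = dict(zip(X, choice))
        if is_even(g, Y) and (best is None or cost(g, r) < cost(best, r)):
            best = g
    return best


if __name__ == "__main__":
    r = {1: {1}, 2: {1}, 3: {2}}
    g = optimal_assignment_bruteforce({1, 2, 3}, {1, 2}, r)
    print(g, cost(g, r))  # {1: 1, 2: 1, 3: 2} 0
```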
  29. OBA algorithm
     - Example: (X, Y, n, m, k, r) = ({1, 2, 3}, {1, 2}, 3, 2, 1, {1 -> {1}, 2 -> {1}, 3 -> {2}})
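The slide gives only an example instance for OBA, not the algorithm's steps. One way to compute an optimal even assignment is a max-flow reduction. This is sketched below under our own assumptions and is not claimed to be the paper's OBA procedure. The flow network works as follows:

- Each block sends one unit of flow only to a node that already stores a replica of it.
- Each node absorbs up to ⌊n/m⌋ units directly, plus at most one extra unit through a shared hub whose capacity is n mod m.

The local part of any even assignment is a feasible flow in this network. Conversely, any flow can be completed into an even assignment by placing the leftover blocks remotely. The maximum flow therefore equals the largest achievable |g|. The graph is solved with a textbook Edmonds-Karp routine, and all names are hypothetical.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp: returns (flow value, residual capacities) for a dict-of-dicts graph."""
    residual = {u: dict(edges) for u, edges in capacity.items()}
    for u, edges in capacity.items():
        for v in edges:
            residual.setdefault(v, {}).setdefault(u, 0)  # reverse edge for cancelling flow
    flow = 0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:  # BFS for a shortest augmenting path
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow, residual
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        flow += bottleneck


def optimal_even_assignment(X, Y, r):
    """Return (g, cost) for an even assignment that keeps as many blocks local as possible."""
    X, Y = sorted(X), sorted(Y)
    base, extra = divmod(len(X), len(Y))  # every node gets `base` blocks; `extra` nodes get one more
    cap = {"s": {}, "hub": {"t": extra}}
    for i in X:
        cap["s"][("b", i)] = 1
        # Only local edges: block i may flow to a node that already stores a replica of it.
        cap[("b", i)] = {("n", j): 1 for j in r[i] if j in Y}
    for j in Y:
        cap[("n", j)] = {"t": base, "hub": 1}  # at most base + 1 local blocks per node
    _, residual = max_flow(cap, "s", "t")

    g = {}
    for i in X:
        for j in r[i]:
            if j in Y and residual[("b", i)][("n", j)] == 0:  # saturated edge: block i stays local on j
                g[i] = j
    load = {j: 0 for j in Y}
    for j in g.values():
        load[j] += 1
    remote = [i for i in X if i not in g]
    for i in remote:
        # Fill nodes up to `base` first; only then use the remaining "+1" slots.
        under = [j for j in Y if load[j] < base]
        j = under[0] if under else next(j for j in Y if load[j] == base)
        g[i] = j
        load[j] += 1
    return g, len(remote)


if __name__ == "__main__":
    g, c = optimal_even_assignment({1, 2, 3}, {1, 2}, {1: {1}, 2: {1}, 3: {2}})
    print(g, c)  # {1: 1, 2: 1, 3: 2} 0
```

On the example instance, blocks 1 and 2 stay on node 1 and block 3 on node 2, giving a cost of 0. Unlike exhaustive search, this formulation runs in polynomial time in n and m.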