Hadoop in SIGMOD 2011 (2011/5/20)
Papers
- LCI: a social channel analysis platform for live customer intelligence
- Bistro data feed management system
- Apache Hadoop goes realtime at Facebook
- Nova: continuous Pig/Hadoop workflows
- A Hadoop-based distributed loading approach to parallel data warehouses
- A batch of PNUTS: experiences connecting cloud batch and serving systems
Papers (Continued)
- Turbocharging DBMS buffer pool using SSDs
- Online reorganization in read-optimized MMDBS
- Automated partitioning design in parallel database systems
- Oracle database filesystem
- Emerging trends in the enterprise data analytics: connecting Hadoop and DB2 warehouse
- Efficient processing of data warehousing queries in a split execution environment
- SQL Server column store indexes
- An analytic data engine for visualization in Tableau
Apache Hadoop Goes Realtime at Facebook
Workload Types
- Facebook Messaging
  - High write throughput
  - Large tables
  - Data migration
- Facebook Insights
  - Realtime analytics
  - High-throughput increments
- Facebook Metrics System (ODS)
  - Automatic sharding
  - Fast reads of recent data and table scans
Why Hadoop & HBase
- Elasticity
- High write throughput
- Efficient, low-latency strong consistency semantics within a data center
- Efficient random reads from disk
- High availability and disaster recovery
- Fault isolation
- Atomic read-modify-write primitives
- Range scans
- Tolerance of network partitions within a single data center
- Zero downtime in case of individual data center failure
- Active-active serving capability across different data centers
Realtime HDFS
- High availability: AvatarNode
  - Hot standby AvatarNode
  - Enhancements to HDFS transaction logging
  - Transparent failover: DAFS (client enhancement + ZooKeeper)
  - HadoopRPC compatibility
- Block availability: a pluggable block placement policy
Realtime HDFS (Cont.)
- Performance improvements for a realtime workload
  - RPC timeout
  - Recover file lease (HDFS-append, recoverLease)
  - Reads from local replicas
- New features
  - HDFS sync
  - Concurrent readers (last chunk of data)
Production HBase
- ACID compliance (RWCC: Read Write Consistency Control)
  - Atomicity (WALEdit)
  - Consistency
- Availability improvements
  - HBase Master rewrite: region assignment state moved from memory to ZooKeeper
  - Online upgrades
  - Distributed log splitting
- Performance improvements
  - Compaction
  - Read optimizations
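RWCC gives readers a consistent snapshot without locking. Below is a minimal sketch of one way write-number-based visibility can work: each write gets a monotonically increasing number, and the read point advances past a write only once all earlier writes have also completed. This is an illustrative model, not HBase's actual implementation.

```python
# Illustrative sketch of RWCC-style visibility (not HBase's real code):
# readers only see writes at or below the current read point, which
# advances past a write number only when no earlier write is still pending.
class Rwcc:
    def __init__(self):
        self.next_write = 1
        self.pending = set()   # write numbers begun but not yet completed
        self.read_point = 0    # highest write number visible to readers

    def begin_write(self):
        wn = self.next_write
        self.next_write += 1
        self.pending.add(wn)
        return wn

    def complete_write(self, wn):
        self.pending.remove(wn)
        # advance the read point past every contiguous completed write
        while (self.read_point + 1 < self.next_write
               and self.read_point + 1 not in self.pending):
            self.read_point += 1

    def visible(self, wn):
        return wn <= self.read_point
```

Note that if write 2 finishes before write 1, it stays invisible until write 1 completes; readers therefore never observe a partially ordered history.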
Deployment and Operational Experiences
- Testing
  - Auto testing tool
  - HBase Verify
- Monitoring and tools
  - HBCK
  - More metrics
- Manual versus automatic splitting: add new RegionServers rather than splitting regions
- Dark launch (gradual rollout)
- Dashboards / ODS integration
- Backups at the application layer
- Schema changes
- Importing data
  - LZO & zip compression
  - Reducing network IO: major compaction
Nova: Continuous Pig/Hadoop Workflows
Nova Overview
- Scenarios
  - Ingesting and analyzing user behavior logs
  - Building and updating a search index from a stream of crawled web pages
  - Processing semi-structured data feeds
- Two-layer programming model (Nova over Pig)
  - Continuous processing
  - Independent scheduling
  - Cross-module optimization
  - Manageability features
Abstract Workflow Model
- Workflow
  - Two kinds of vertices: tasks (processing steps) and channels (data containers)
  - Edges connect tasks to channels and channels to tasks
  - Edge annotations (all, new, B and Δ)
- Four common patterns of processing
  - Non-incremental (template detection)
  - Stateless incremental (shingling)
  - Stateless incremental with lookup table (template tagging)
  - Stateful incremental (de-duping)
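The contrast between the first two patterns can be shown with a toy sketch (function and variable names here are made up for illustration): a non-incremental task consumes its input channel in "all" mode and recomputes everything on each run, while a stateless incremental task consumes only the newly arrived blocks, processing each record independently.

```python
# Toy illustration of "all" vs. "new" consumption modes over a channel
# represented as a list of blocks (each block is a list of records).
def run_non_incremental(all_blocks, process):
    # consumption mode "all": recompute over the full input every run
    return [process(rec) for block in all_blocks for rec in block]

def run_stateless_incremental(new_blocks, process):
    # consumption mode "new": touch only data arrived since the last run
    return [process(rec) for block in new_blocks for rec in block]
```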
Abstract Workflow Model (Cont.)
- Data and update model
  - Blocks: base blocks and delta blocks
  - Channel functions: merge, chain and diff
- Task/data interface
  - Consumption mode: all or new
  - Production mode: B or Δ
- Workflow programming and scheduling
- Data compaction and garbage collection
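As a rough illustration of the three channel functions, here is a sketch assuming simple key→value "upsert" delta semantics. In Nova these functions are application-defined, so this is only one possible instantiation.

```python
# Sketch of Nova-style channel functions over key->value blocks, assuming
# pure-upsert delta semantics (a delta maps keys to their new values).
def merge(base, delta):
    """Apply a delta block to a base block, producing a new base block."""
    out = dict(base)
    out.update(delta)
    return out

def chain(delta1, delta2):
    """Combine two consecutive delta blocks (delta2 is the later one)."""
    out = dict(delta1)
    out.update(delta2)
    return out

def diff(old_base, new_base):
    """Compute the delta that turns old_base into new_base
    (additions and updates only, under pure-upsert semantics)."""
    return {k: v for k, v in new_base.items()
            if k not in old_base or old_base[k] != v}
```

One useful property under these semantics: merging a chained pair of deltas gives the same base block as merging the deltas one at a time, which is what lets compaction fold deltas together safely.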
Nova System Architecture
Efficient Processing of Data Warehousing Queries in a Split Execution Environment
Introduction
- Two approaches
  - Starting with a parallel database system and adding some MapReduce features
  - Starting with MapReduce and adding database system technology
  - HadoopDB follows the second approach
- Two heuristics for HadoopDB optimizations
  - Database systems can process data at a faster rate than Hadoop
  - Minimize the number of MapReduce jobs in a SQL execution plan
HadoopDB
- HadoopDB architecture
  - Database Connector
  - Data Loader
  - Catalog
  - Query Interface
  - VectorWise/X100 database (SIMD) vs. PostgreSQL
- HadoopDB query execution
  - Selection, projection, and partial aggregation (Map and Combine phases) pushed into the database system
  - Co-partitioned tables
  - MR for redistributing data
  - SideDB (a "database task done on the side")
Split Query Execution
- Referential partitioning
  - Join in the database engine (local join)
  - Foreign-key joins enabled by referential partitioning
- Split MR/DB joins
  - Directed join: one of the tables is already partitioned by the join key
  - Broadcast join: the small table is shipped to every node
  - Adding specialized joins to the MR framework: map-side join
    - Tradeoffs: temporary table for the join
  - Another type of join: MR redistributes data, then a directed join
- Split MR/DB semijoin, e.g. 'foreignKey IN (listOfValues)'
  - Can be split into two MapReduce jobs
  - SideDB used to eliminate the first MapReduce job
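The broadcast join can be sketched in a few lines: the small table is replicated to every node and hashed in memory, so each mapper joins its own partition of the large table locally, with no shuffle of the large side. Table contents and key positions below are illustrative only.

```python
# Sketch of a broadcast (map-side) join: build a hash table from the small
# broadcast table once per mapper, then stream the large partition past it.
def broadcast_join(large_partition, small_table, key_idx_large, key_idx_small):
    # build phase: hash the broadcast (small) side
    lookup = {}
    for row in small_table:
        lookup.setdefault(row[key_idx_small], []).append(row)
    # probe phase: stream the large side, emitting joined rows
    joined = []
    for row in large_partition:
        for match in lookup.get(row[key_idx_large], []):
            joined.append(row + match)
    return joined
```

This trades memory (the hash table per mapper) for avoiding a redistribution of the large table, which is exactly why it only pays off when one side is small.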
Split Query Execution (Cont.)
- Post-join aggregation
  - Two MapReduce jobs
  - Hash-based partial aggregation saves significant I/O
  - A similar technique is applied to TOP N selections
- Pre-join aggregation
  - For MR-based joins
  - Applies when the cardinality of the group-by and join-key columns is smaller than the cardinality of the entire table
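Hash-based partial aggregation is a combiner-style trick: each map task collapses its rows into per-group partial results before the shuffle, so far fewer (group, partial) pairs are written and transferred, and the reduce step merges the partials. A sketch using SUM as the aggregate (the same idea applies to other decomposable aggregates):

```python
# Sketch of hash-based partial aggregation with SUM as the aggregate.
def partial_aggregate(rows):
    """Per-mapper combine step: (group, value) rows -> partial sums."""
    partials = {}
    for group, value in rows:
        partials[group] = partials.get(group, 0) + value
    return partials

def final_aggregate(partials_per_mapper):
    """Reduce step: merge the partial sums produced by all mappers."""
    totals = {}
    for partials in partials_per_mapper:
        for group, subtotal in partials.items():
            totals[group] = totals.get(group, 0) + subtotal
    return totals
```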
A Query Plan in HadoopDB
Performance
- No hash partition feature in Hive
Emerging Trends in the Enterprise Data Analytics: Connecting Hadoop and DB2 Warehouse
DB2 and Hadoop/Jaql Interactions
A Hadoop-Based Distributed Loading Approach to Parallel Data Warehouses
Introduction
- Why Hadoop for Teradata EDW
  - More disk space, and space can be easily added
  - HDFS as a storage layer
  - MapReduce
  - Distributed
- The assignment problem: mapping HDFS blocks to Teradata EDW nodes
  - Parameters: n blocks, k copies, m nodes
  - Goal: assign HDFS blocks to nodes evenly while minimizing network traffic
Block Assignment Problem
- HDFS file F on a cluster of P nodes, each node uniquely identified by an integer i with 1 ≤ i ≤ P
- The problem is defined by assignment(X, Y, n, m, k, r)
  - X = {1, …, n} is the set of n blocks of F
  - Y ⊆ {1, …, P} is the set of m nodes running the PDBMS (the PDBMS nodes)
  - k is the number of copies of each block
  - r is the mapping recording the replica locations of each block: r(i) returns the set of nodes holding a copy of block i
- An assignment g from the blocks in X to the nodes in Y is a mapping from X = {1, …, n} to Y, where g(i) = j (i ∈ X, j ∈ Y) means that block i is assigned to node j
Block Assignment Problem (Cont.)
- An even assignment g satisfies ∀i ∈ Y, ∀j ∈ Y: | |{x | 1 ≤ x ≤ n and g(x) = i}| − |{y | 1 ≤ y ≤ n and g(y) = j}| | ≤ 1
- The cost of an assignment g is cost(g) = |{i | 1 ≤ i ≤ n and g(i) ∉ r(i)}|, i.e. the number of blocks assigned to remote nodes
- |g| denotes the number of blocks assigned to local nodes by g, so |g| = n − cost(g)
- The optimal assignment problem: find an even assignment with the smallest cost
OBA algorithm
- Example instance: (X, Y, n, m, k, r) = ({1, 2, 3}, {1, 2}, 3, 2, 1, {1 → {1}, 2 → {1}, 3 → {2}})
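To make the definitions concrete, here is a sketch of the cost function together with a simple locality-first greedy heuristic, applied to the example instance above. This greedy pass is a simplification for illustration only, not the paper's OBA algorithm.

```python
# Sketch: cost function from the definitions, plus a locality-first greedy
# heuristic (an illustrative simplification, not the paper's OBA algorithm).
def cost(g, r):
    """Number of blocks assigned to a node holding no local copy."""
    return sum(1 for block, node in g.items() if node not in r[block])

def greedy_assign(blocks, nodes, r):
    # even capacities: each node receives floor(n/m) or ceil(n/m) blocks
    n, m = len(blocks), len(nodes)
    capacity = {node: n // m for node in nodes}
    for node in nodes[: n % m]:
        capacity[node] += 1
    g = {}
    for block in blocks:
        # prefer a PDBMS node that already holds a local copy of the block
        local = [v for v in nodes if v in r[block] and capacity[v] > 0]
        target = local[0] if local else max(nodes, key=lambda v: capacity[v])
        capacity[target] -= 1
        g[block] = target
    return g
```

On the example instance this yields g = {1 → 1, 2 → 1, 3 → 2}, which is even (node sizes 2 and 1) and has cost 0, since every block lands on a node that already stores a replica of it.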