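NoSQL has entered the stage of practical adoption, yet many people may still not know how it can actually be used. In this session, we explain how MapR-DB (an HBase-compatible NoSQL database) is being used in enterprises. Drawing on the case of India's national ID system (India's counterpart to Japan's My Number) and on case studies from Japan, we give a concrete picture of real-world usage along with its technical underpinnings. These are the slides from a talk at db tech showcase Tokyo 2015, held June 10–12, 2015.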
Beginner must-see! A future that can be opened by learning Hadoop
DataWorks Summit
What is "Hadoop" today? It may feel too late to ask... This talk is for those who are interested, those thinking about their future as active data engineers, and those who are completely new to the field. It introduces Hadoop and its surrounding ecosystem, presents their merits and examples, and answers "What should I learn now?", showing the future that opens up through learning Hadoop and its ecosystem.
This document evaluates the performance of Cloudera Impala 1.1 on two clusters. For read-only workloads, it finds that RCFile with Snappy compression gives the fastest performance for both Hive and Impala on these clusters. Parquet with Snappy may be fastest for larger tables. Memory-limit issues during Parquet table creation were identified and later fixed. The evaluation shows Impala improving and becoming ready to support more data formats and SQL functions.
Performance Evaluation of Cloudera Impala GA
Yukinori Suda
This document evaluates the performance of Cloudera Impala 1.0 using a modified HiBench benchmark on an 11-node cluster. It finds that Parquet files compressed with Snappy provide the fastest query performance, completing in 16.2 seconds on average. The order of joined tables can also affect performance. Impala shows significant performance improvements over Hive, with the fastest query running more than 14 times faster. Additional functions such as UDFs and window functions are recommended for future Impala releases.
Performance evaluation of Cloudera Impala 0.6 beta with comparison to Hive
Yukinori Suda
Impala 0.6 beta was evaluated for performance and compared with Hive. Impala's query latency was more than 10 times lower than Hive's. The fastest configuration, RCFile compressed with Snappy, took 16.059 seconds versus 197.894 seconds for Hive. Impala 0.6 beta added support for more platforms and for the RCFile format. Further speedups are expected in the GA release through additional optimizations and support for more efficient formats such as Trevni.
Performance evaluation of Cloudera Impala (with Comparison to Hive)
Yukinori Suda
This document evaluates the performance of Cloudera Impala, an open-source SQL query engine for Apache Hadoop, and compares it to Apache Hive. It describes Impala's architecture and how the benchmark was conducted. The benchmark found Impala to be over 10 times faster than Hive for the modified TPC-H query, with the fastest Impala version taking 14.337 seconds compared to 164.161 seconds for Hive. The document concludes that future versions of Impala integrated with CDH5 may provide even better performance by supporting additional file formats.