The ever-increasing interest in running fast analytic scans on constantly updating data is stretching the capabilities of HDFS and NoSQL storage. Users want the fast online updates and serving of real-time data that NoSQL offers, as well as the fast scans, analytics, and processing of HDFS. Additionally, users are demanding that big data storage systems integrate natively with their existing BI and analytic technology investments, which typically use SQL as the standard query language of choice. This demand has led big data back to a familiar friend: relationally structured data storage systems.
Todd Lipcon explores the advantages of relational storage and reviews new developments, including Google Cloud Spanner and Apache Kudu, which provide a scalable relational solution for users who have too much data for a legacy high-performance analytic system. Todd explains how to address use cases that fall between HDFS and NoSQL with technologies like Apache Kudu or Google Cloud Spanner and how the combination of relational data models, SQL query support, and native API-based access enables the next generation of big data applications. Along the way, he also covers suggested architectures, the performance characteristics of Kudu and Spanner, and the deployment flexibility each option provides.