Most data-driven enterprises continue to struggle to generate the insights they need from their data. Growing data volumes from ever more data sources, combined with escalating user concurrency, have led to declining query throughput and skyrocketing data warehouse costs. Moreover, modern use cases such as customer-360 and hyper-personalization have blurred the boundaries between operational and analytics systems, making even greater demands on data warehouse solutions.
Attributes of a Modern Data Warehouse - Gartner Catalyst
1. Attributes of a Modern Data Warehouse
Jack Mardack, VP, Actian
Gartner Catalyst, August 12-15, 2019
2. Why does this matter?
Well, it’s a bit like a milkshake…
3. Feeding the data analytics demands of the enterprise
TRENDS:
- Exploding demand on traditional analytics use cases (reporting, ad hoc analysis, data mining).
- Democratization of access has spawned many more consumers.
- Rise of bold, new analytics use cases and the need for real-time.
- Enterprises get the power (like the big social players) to leverage massive amounts of customer data to drive large-scale CX personalization and marketing.
[Diagram: Traditional Analytics Use Cases x Many More Consumers x Demanding New Use Cases (Customer-360, Hyper-personalization, ML & AI, Contextual Communications) -> Cloud data warehouse]
4. Generational Evolution of the Data Warehouse
[Diagram: data sources (CSVs, Spark, OLTP/OLAP, S3) feeding three generations of data warehouse]
1. Appliances (Exadata, Teradata, Netezza)
- Integrated software and hardware gave great analytics performance vs. the world of before.
- Very expensive to buy more (CAPEX).
- Very expensive administration.
- Hard to get data in and out.
- Low tolerance for concurrent demand.
- Compute and storage sold together (so not elastic).
- Heavy technical setup and maintenance.
- Time to value > months.
2. The Cloud (Redshift, Snowflake) - 3rd-party integrations
- Storage separation makes cost more elastic, thus lower.
- Admin is much easier.
- Much easier to get data in (from S3).
- Time-to-value drops to < months.
3. Multi-Cloud Gen III (Avalanche) - built-in integrations
- Multiple-cloud support; hybrid and multi-cloud coverage.
- Built-in integrations make data flow much easier.
- Much more powerful relative compute power.
- Fast ingestion and extraction.
- Time to value drops to weeks.
5. The Essential Requirements
Put it where you need it
- "Play your data where it lives": bring high-performance analytics to all your data sources, wherever they are.
Higher compute performance ceiling
- Great absolute throughput performance at scale (petabytes of data + thousands of users + high query complexity).
Unparalleled unit economics
- Great cost-performance (at scale).
Your rules, over your data
- Your unique corporate governance, compliance, and security needs can be met headache-free.
A pleasure to use
- Easy to set up.
- Easy to scale up and down, so peaks don't destroy performance or break the bank.
6. Avalanche Gen III Cloud Data Warehouse Service
Multi-Cloud & Hybrid
- Run on-premise or in the cloud platform of your choice: AWS, Azure, or Google Cloud Platform (planned).
Intelligent Elastic Storage
- Storage is separate & smart: not just separate from compute costs, but smartly used.
- Horizontally partitioned data; native high-performance storage (AWS EBS, Azure ADLS, HDFS) plus Spark-enabled external tables (S3, Azure Blob Storage, data lakes).
All your Data Sources
- Connect easily to the data sources your business runs on: app-to-app and hybrid data connectors, plus 200+ more apps supported with pre-built connectors.
Advanced Cloud Compute
- Gen-III cloud architecture elevates absolute throughput performance to new levels: industry-standard SQL, vector processing, query resource optimization, federated query, real-time updates, advanced columnar.
Robust Access
- Give everyone on the team (business analyst, data scientist, data engineer) the access they need: ODBC, JDBC, .NET, Python, Spark, Kafka, and ecosystem tools.
7. Things to do now
Come say hello at booth #405
Visit actian.com/avalanche
Tweet something you liked from my talk (@2hp, @actiancorp)
1. Robust Access
At the highest level, Avalanche supports the diverse set of personas that drive analytics within your organization. A specific set of tools and access points lets each of these personas interact with Avalanche using the tools and languages of their choice, and leverage the ecosystem your organization has already invested in.
For the Business Analyst persona, we partner with the most popular BI tools, including Tableau, Qlik, Looker, and MicroStrategy, to deliver a seamless experience. All functionality is also exposed via SQL from their favorite authoring tool.
For the Data Scientist, Avalanche provides highly scalable atomic data analysis exposed via Python, Java, or C++ libraries that can be plugged into tools such as Jupyter notebooks. Importantly, we also provide an optimized native Spark integration and a KNIME plugin that helps data scientists define pipelines for advanced analytics work. (KNIME is a free, open-source data analytics, reporting, and integration platform: it combines components for machine learning and data mining through a modular data-pipelining concept, and its graphical user interface lets users assemble nodes that blend different data sources, including ETL preprocessing, for modeling, analysis, and visualization with little or no programming. As an advanced analytics tool, KNIME can to some extent be considered a SAS alternative.)
For the Data Engineer, we provide support for various languages, including Python, Scala, Java, and C, plus access via REST APIs for language-agnostic integration.
2. Advanced Compute
We designed vectorized compute to leverage CPU SIMD (single instruction, multiple data: processing elements perform the same operation on multiple data points simultaneously, accelerating compute throughput) and to process data in the L1/L2 CPU cache instead of RAM, which is much faster and thus delivers better performance.
A bit more on SIMD for those interested: it's a big deal, and a key reason we are faster.
An application that may take advantage of SIMD is one where the same value is being added to (or subtracted from) a large number of data points, a common operation in many multimedia applications. One example would be changing the brightness of an image. Each pixel of an image consists of three values for the brightness of the red (R), green (G) and blue (B) portions of the color. To change the brightness, the R, G and B values are read from memory, a value is added to (or subtracted from) them, and the resulting values are written back out to memory.
With a SIMD processor there are two improvements to this process. For one the data is understood to be in blocks, and a number of values can be loaded all at once. Instead of a series of instructions saying "retrieve this pixel, now retrieve the next pixel", a SIMD processor will have a single instruction that effectively says "retrieve n pixels" (where n is a number that varies from design to design). For a variety of reasons, this can take much less time than retrieving each pixel individually, as with traditional CPU design. Another advantage is that the instruction operates on all loaded data in a single operation. In other words, if the SIMD system works by loading up eight data points at once, the add operation being applied to the data will happen to all eight values at the same time. This parallelism is separate from the parallelism provided by a superscalar processor; the eight values are processed in parallel even on a non-superscalar processor, and a superscalar processor may be able to perform multiple SIMD operations in parallel.
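To make the brightness example concrete, here is a sketch in Python with NumPy (illustrative only, not Avalanche code): the explicit loop mirrors the one-value-per-instruction approach of a traditional CPU, while NumPy's whole-array expression is dispatched to SIMD instructions on most platforms.

```python
import numpy as np

# An "image" of 4 pixels, each with R, G, B brightness values (0-255).
pixels = np.array([[10, 20, 30],
                   [100, 110, 120],
                   [200, 210, 220],
                   [250, 251, 252]], dtype=np.uint8)

# Scalar approach: visit every value individually, one add per value,
# clamping at 255 -- what a traditional per-pixel loop does.
brightened_scalar = pixels.copy()
for i in range(pixels.shape[0]):
    for c in range(3):
        brightened_scalar[i, c] = min(int(pixels[i, c]) + 5, 255)

# Vectorized approach: one expression over the whole block of data.
# NumPy applies the add and the clamp across many values per instruction.
# (Widening to uint16 first avoids uint8 overflow before the clamp.)
brightened_simd = np.minimum(pixels.astype(np.uint16) + 5, 255).astype(np.uint8)

assert np.array_equal(brightened_scalar, brightened_simd)
```

The two paths compute identical results; the difference is purely how many values each instruction touches, which is exactly the throughput win SIMD provides.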
We have enabled real-time updates to data in our CDW (cloud data warehouse) offering without adding latency to process those updates (a penalty typically seen with our competitors); no other CDW in the industry has this, and we hold several U.S. patents in this space. Avalanche users can therefore always be assured of accessing the freshest data to power their analytics without paying a performance penalty. Our superior columnar implementation also minimizes the I/O performed when retrieving data from disk.
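A toy sketch of why a columnar layout minimizes I/O (illustrative only, not Avalanche internals; field sizes are simplified to 8-byte numerics):

```python
# A table of (id, name, amount) rows, as a stand-in for data on disk.
rows = [(i, "user%d" % i, i * 1.5) for i in range(1000)]

# Row-oriented scan: answering "SUM(amount)" drags every field of every
# row through the I/O path, because fields are interleaved on disk.
row_bytes_touched = sum(8 + len(name) + 8 for _, name, _ in rows)

# Column-oriented scan: amounts are stored contiguously, so the same
# query reads only that one column's bytes.
amounts = [amount for _, _, amount in rows]
col_bytes_touched = 8 * len(amounts)

print(col_bytes_touched, row_bytes_touched)
```

The wider the table and the narrower the query, the larger the gap grows, which is why columnar storage matters for analytic scans.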
Avalanche’s advanced Federated Query capability is powered by an innovative query execution algorithm that introspects schemas and data residing outside Avalanche when they are joined with local data, minimizing latency. Net-net: you can access and analyze data regardless of its source or location and still deliver blazing-fast throughput. If you have multiple Avalanche deployments, e.g. a combination of on-prem and cloud, you can treat all the deployments as virtually a single entity that can be seamlessly accessed. Very, very useful for hybrid data migration and off-load deployment use cases.
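A hypothetical sketch of the core idea behind federated query planning (the names here are invented for illustration, not Avalanche's algorithm): when local data is joined with a remote source, push the join keys down to the remote side so only matching rows cross the network.

```python
local_orders = [("cust1", 250), ("cust3", 990)]        # (customer_id, total)
remote_customers = {                                    # lives in another deployment
    "cust1": "Ada", "cust2": "Grace", "cust3": "Edsger", "cust4": "Alan",
}

def remote_scan(predicate_keys):
    """Simulates the remote side applying a pushed-down key filter."""
    return {k: v for k, v in remote_customers.items() if k in predicate_keys}

# Introspect the local side first, then fetch only the rows the join needs.
needed = {cust_id for cust_id, _ in local_orders}
fetched = remote_scan(needed)        # 2 rows cross the wire, not all 4

joined = [(fetched[c], total) for c, total in local_orders]
print(joined)  # [('Ada', 250), ('Edsger', 990)]
```

Shipping predicates instead of tables is what keeps latency low when the joined data does not reside in the same deployment.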
The intelligent Query Resource Optimization (QRO) feature examines query requests and the available compute and storage resources, and determines the ideal allocation of those resources to deliver optimal overall throughput. Unlike competitive systems that require virtual data warehouses to be allocated to certain workloads on a captive basis (often wasting unused resources), Avalanche always ensures that your compute and storage resources are utilized holistically to deliver the optimal analytical workload outcome.
Industry-standard SQL compliance: unlike many alternative systems, Avalanche fully supports the ANSI SQL:2016 standard, which ensures that any standards-compliant query will run unaltered. Delivering this level of compliance is especially important in supporting data migration and offload projects.
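As an illustration of what standards compliance buys you, here is an ANSI window-function query run against SQLite from Python's standard library (SQLite simply stands in for any compliant engine here; this is not Avalanche, and it assumes a SQLite build recent enough to support window functions):

```python
import sqlite3

# Standard SQL should run unaltered across compliant engines.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 100), ("east", 300), ("west", 200)])

# An ANSI window function: per-region totals without collapsing rows.
rows = conn.execute("""
    SELECT region,
           amount,
           SUM(amount) OVER (PARTITION BY region) AS region_total
    FROM sales
    ORDER BY region, amount
""").fetchall()

print(rows)  # [('east', 100, 400), ('east', 300, 400), ('west', 200, 200)]
```

The same statement, unmodified, is the kind of query a migration or offload project needs to carry over without rewrites.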
Avalanche compute is built for elastic scaling from the ground up and can be ramped up on demand for the most demanding concurrent workloads.
3. Smart Storage
Storage in the cloud is very different from on-prem. One of the key architectural asks from the customers we have talked to is the separation of compute from storage, so the two can be scaled independently of each other. Avalanche addresses that with intelligent, optimized storage built for multi-cloud environments.
Avalanche uses resilient, high-performance storage mechanisms: EBS (Elastic Block Storage) in AWS (orders of magnitude faster than the S3 storage Snowflake depends on, with higher IOPS and lower latency), ADLS Gen 2 in Azure, and HDFS/POSIX on-prem. Advanced compression, combined with choosing the most efficient algorithm based on the data stored in each block, makes for the most optimized, intelligent use of storage. This reduces TCO for our customers.
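A minimal sketch of per-block algorithm selection (illustrative only; these are standard-library codecs, not Actian's actual compression algorithms): compress each block with several candidates and keep whichever representation is smallest for that block's data.

```python
import lzma
import os
import zlib

def compress_block(block):
    """Pick the most space-efficient encoding for this particular block."""
    candidates = {
        "raw": block,                    # incompressible data: store as-is
        "zlib": zlib.compress(block, 9),
        "lzma": lzma.compress(block),
    }
    # Smallest output wins; ties resolve to "raw" (listed first).
    name = min(candidates, key=lambda k: len(candidates[k]))
    return name, candidates[name]

repetitive = b"AAAA" * 1000    # highly compressible
random_ish = os.urandom(256)   # compression would only add header overhead

assert compress_block(repetitive)[0] in ("zlib", "lzma")
assert compress_block(random_ish)[0] == "raw"
```

Deciding per block, rather than once per table, is what lets mixed data (text columns next to already-compact numerics) avoid paying overhead where compression does not help.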
External table support is a key feature that enables Avalanche to access data that resides outside the data warehouse, e.g. in an external data lake, without having to move it. The benefit: your analytics can leverage data regardless of its source or location, and you can be sure your system is always accessing the freshest, most up-to-date data. Not all data warehouses have this capability, so it is a clear differentiator for Avalanche.
We understand that customers also store data in other mechanisms across their current ecosystem. We enable easy access to externally stored data via Spark-enabled, pipeline-based parallel access to storage such as S3, Azure Blob Storage, or custom data lakes.
4. Multi-Platforms
The Avalanche cloud data warehouse is a platform built for multi-cloud. The same Avalanche hybrid multi-cloud platform is available on AWS, on Azure, and on-prem in POSIX environments and as VMware containers. This lets customers operate seamlessly in a multi-cloud environment alongside on-prem data, and gives everyone a path to the cloud at their own pace. All deployments, regardless of location, are 100% compatible, which means your queries will run unchanged.
5. Data Connectors
Finally, unlike any of our competitors, Avalanche provides over 200 pre-built enterprise connectors.
These pre-integrated and extensible connectors enable organizations to quickly access popular data sources, including ServiceNow, Salesforce, Oracle, SAP, NetSuite, and many others. Every other competitive CDW platform requires you to work with a partner to help source and move data to the platform, resulting in extra cost and hassle. The integrated Actian FlexPath architecture enables customers to source data within a few clicks in the UI, and, more importantly, the integration execution is fully managed by Avalanche. Faster, more reliable deployments, single-source support, and thousands saved!