Most data-driven enterprises continue to struggle to generate the insights they need from their data. Growing data volumes from ever more data sources, combined with escalating user concurrency, have led to declining query throughput and skyrocketing data warehouse costs. Moreover, modern use cases such as customer-360 and hyper-personalization have blurred the boundaries between operational and analytics systems, making even greater demands on data warehouse solutions.
Attributes of a Modern Data Warehouse - Gartner Catalyst
1. Attributes of a Modern Data Warehouse
Jack Mardack, VP, Actian
Gartner Catalyst, August 12-15, 2019
2. Why does this matter?
Well, it’s a bit like a milkshake…
3. Feeding the data analytics demands of the enterprise
TRENDS:
- Exploding demand on traditional analytics use cases (reporting, ad hoc analysis, data mining).
- Democratization of access has spawned many more consumers.
- Rise of bold, new analytics use cases and the need for real-time.
- Enterprises get the power (like the big social players) to leverage massive amounts of customer data to drive large-scale CX personalization and marketing.
[Diagram: Traditional Analytics Use Cases x Many More Consumers x Demanding New Use Cases (Customer-360, Hyper-personalization, ML & AI, Contextual Communications) -> Cloud data warehouse]
4. Generational Evolution of the Data Warehouse
[Diagram: data sources (CSVs, Spark, OLTP/OLAP, S3) feeding three generations of data warehouse]
1. Appliances (Exadata, Teradata, Netezza)
- Integrated software and hardware gave great analytics performance vs. the world of before.
- Very expensive to buy more (CAPEX).
- Very expensive administration.
- Hard to get data in and out.
- Low tolerance for concurrent demand.
- Compute and storage sold together (so not elastic).
- Heavy technical setup and maintenance.
- Time to value > months.
2. The Cloud (Redshift, Snowflake) - 3rd-party integrations
- Storage separation makes cost more elastic, thus lower.
- Admin is much easier.
- Much easier to get data in (from S3).
- Time-to-value drops to < months.
3. Multi-Cloud Gen III (Avalanche) - built-in integrations
- Multiple-cloud support; hybrid and multi-cloud coverage.
- Built-in integrations make data flow much easier.
- Much more powerful relative compute power.
- Fast ingestion and extraction.
- Time to value drops to weeks.
5. The Essential Requirements
Put it where you need it
- "Play your data where it lives": bring high-performance analytics to all your data sources, wherever they are.
Higher compute performance ceiling
- Great absolute throughput performance at scale (petabytes of data + thousands of users + high query complexity).
Unparalleled unit economics
- Great cost-performance (at scale).
Your rules, over your data
- Your unique corporate governance, compliance, and security needs can be met headache-free.
A pleasure to use
- Easy to set up.
- Easy to scale up and down, so peaks don't destroy performance or break the bank.
6. Avalanche Gen III Cloud Data Warehouse Service
Multi-Cloud & Hybrid
- Run on-premise or in the cloud platform of your choice: AWS, Azure, or Google Cloud Platform (planned).
Intelligent Elastic Storage
- Storage is separate & smart: not just separate from compute costs, but smartly used.
- Horizontally partitioned data; native high-performance storage (AWS EBS, Azure ADLS, HDFS) plus Spark-enabled external tables (S3, Azure Blob Storage, data lakes).
All your Data Sources
- Connect easily to the data sources your business runs on: app-to-app and hybrid data connectors, plus 200+ more apps supported with pre-built connectors.
Advanced Cloud Compute
- Gen-III cloud architecture elevates absolute throughput performance to new levels: industry-standard SQL, vector processing, query resource optimization, federated query, real-time updates, advanced columnar.
Robust Access
- Give everyone on the team (business analyst, data scientist, data engineer) the access they need: ODBC, JDBC, .NET, Python, Spark, Kafka, and ecosystem tools.
7. Things to do now
Come say hello at booth #405
Visit actian.com/avalanche
Tweet something you liked from my talk (@2hp, @actiancorp)
1. Robust Access
At the highest level, Avalanche supports the diverse set of personas that drive analytics within your organization. A specific set of tools and access points lets each of these personas interact with Avalanche using the tools and languages of their choice, and leverage the ecosystem your organization has already invested in.
For the Business Analyst persona, we partner with the most popular BI tools, including Tableau, Qlik, Looker, and MicroStrategy, to deliver a seamless experience. All functionality is also exposed via SQL from their favorite authoring tool.
For the Data Scientist, Avalanche provides highly scalable atomic data analysis exposed via Python, Java, or C++ libraries that can be plugged into tools such as Jupyter notebooks. Importantly, we also provide an optimized native Spark integration and a KNIME plugin that helps data scientists define pipelines for advanced analytics work. (KNIME is a free, open-source data analytics, reporting, and integration platform: it combines components for machine learning and data mining through a modular data-pipelining concept, and its graphical user interface lets users assemble nodes that blend different data sources, including ETL preprocessing, for modeling, analysis, and visualization with little or no programming. As an advanced analytics tool, KNIME can to some extent be considered a SAS alternative.)
For the Data Engineer, we provide support for various languages, including Python, Scala, Java, and C, plus access via REST APIs for language-agnostic integration.
2. Advanced Compute
We designed vectorized compute to leverage CPU SIMD (single instruction, multiple data: processing elements perform the same operation on multiple data points simultaneously, accelerating compute throughput) and to process data in the L1/L2 CPU cache instead of RAM, which is much faster and thus delivers better performance.
A bit more on SIMD for those interested: it's a big deal, and a key reason we are faster.
An application that may take advantage of SIMD is one where the same value is being added to (or subtracted from) a large number of data points, a common operation in many multimedia applications. One example would be changing the brightness of an image. Each pixel of an image consists of three values for the brightness of the red (R), green (G) and blue (B) portions of the color. To change the brightness, the R, G and B values are read from memory, a value is added to (or subtracted from) them, and the resulting values are written back out to memory.
With a SIMD processor there are two improvements to this process. For one the data is understood to be in blocks, and a number of values can be loaded all at once. Instead of a series of instructions saying "retrieve this pixel, now retrieve the next pixel", a SIMD processor will have a single instruction that effectively says "retrieve n pixels" (where n is a number that varies from design to design). For a variety of reasons, this can take much less time than retrieving each pixel individually, as with traditional CPU design. Another advantage is that the instruction operates on all loaded data in a single operation. In other words, if the SIMD system works by loading up eight data points at once, the add operation being applied to the data will happen to all eight values at the same time. This parallelism is separate from the parallelism provided by a superscalar processor; the eight values are processed in parallel even on a non-superscalar processor, and a superscalar processor may be able to perform multiple SIMD operations in parallel.
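To make the brightness example concrete, here is a sketch in Python with NumPy (illustrative only, not Avalanche code): the explicit loop mirrors the one-value-per-instruction approach of a traditional CPU, while NumPy's whole-array expression is dispatched to SIMD instructions on most platforms.

```python
import numpy as np

# An "image" of 4 pixels, each with R, G, B brightness values (0-255).
pixels = np.array([[10, 20, 30],
                   [100, 110, 120],
                   [200, 210, 220],
                   [250, 251, 252]], dtype=np.uint8)

# Scalar approach: visit every value individually, one add per value,
# clamping at 255 -- what a traditional per-pixel loop does.
brightened_scalar = pixels.copy()
for i in range(pixels.shape[0]):
    for c in range(3):
        brightened_scalar[i, c] = min(int(pixels[i, c]) + 5, 255)

# Vectorized approach: one expression over the whole block of data.
# NumPy applies the add and the clamp across many values per instruction.
# (Widening to uint16 first avoids uint8 overflow before the clamp.)
brightened_simd = np.minimum(pixels.astype(np.uint16) + 5, 255).astype(np.uint8)

assert np.array_equal(brightened_scalar, brightened_simd)
```

The two paths compute identical results; the difference is purely how many values each instruction touches, which is exactly the throughput win SIMD provides.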
We have enabled real-time updates to data in our CDW (cloud data warehouse) offering without adding latency to process those updates (a penalty typically seen with our competitors); no other CDW in the industry has this, and we hold several U.S. patents in this space. Avalanche users can therefore always be assured of accessing the freshest data to power their analytics without paying a performance penalty. Our superior columnar implementation also minimizes the I/O performed when retrieving data from disk.
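A toy sketch of why a columnar layout minimizes I/O (illustrative only, not Avalanche internals; field sizes are simplified to 8-byte numerics):

```python
# A table of (id, name, amount) rows, as a stand-in for data on disk.
rows = [(i, "user%d" % i, i * 1.5) for i in range(1000)]

# Row-oriented scan: answering "SUM(amount)" drags every field of every
# row through the I/O path, because fields are interleaved on disk.
row_bytes_touched = sum(8 + len(name) + 8 for _, name, _ in rows)

# Column-oriented scan: amounts are stored contiguously, so the same
# query reads only that one column's bytes.
amounts = [amount for _, _, amount in rows]
col_bytes_touched = 8 * len(amounts)

print(col_bytes_touched, row_bytes_touched)
```

The wider the table and the narrower the query, the larger the gap grows, which is why columnar storage matters for analytic scans.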
Avalanche’s advanced Federated Query capability is powered by an innovative query execution algorithm that introspects schemas and data residing outside Avalanche when they are joined with local data, minimizing latency. Net-net: you can access and analyze data regardless of its source or location and still deliver blazing-fast throughput. If you have multiple Avalanche deployments, e.g. a combination of on-prem and cloud, you can treat all the deployments as virtually a single entity that can be seamlessly accessed. Very, very useful for hybrid data migration and off-load deployment use cases.
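A hypothetical sketch of the core idea behind federated query planning (the names here are invented for illustration, not Avalanche's algorithm): when local data is joined with a remote source, push the join keys down to the remote side so only matching rows cross the network.

```python
local_orders = [("cust1", 250), ("cust3", 990)]        # (customer_id, total)
remote_customers = {                                    # lives in another deployment
    "cust1": "Ada", "cust2": "Grace", "cust3": "Edsger", "cust4": "Alan",
}

def remote_scan(predicate_keys):
    """Simulates the remote side applying a pushed-down key filter."""
    return {k: v for k, v in remote_customers.items() if k in predicate_keys}

# Introspect the local side first, then fetch only the rows the join needs.
needed = {cust_id for cust_id, _ in local_orders}
fetched = remote_scan(needed)        # 2 rows cross the wire, not all 4

joined = [(fetched[c], total) for c, total in local_orders]
print(joined)  # [('Ada', 250), ('Edsger', 990)]
```

Shipping predicates instead of tables is what keeps latency low when the joined data does not reside in the same deployment.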
The intelligent Query Resource Optimization (QRO) feature examines query requests and the available compute and storage resources, and determines the ideal allocation of those resources to deliver optimal overall throughput. Unlike competitive systems that require virtual data warehouses to be allocated to certain workloads on a captive basis (often wasting unused resources), Avalanche always ensures that your compute and storage resources are utilized holistically to deliver the optimal analytical workload outcome.
Industry-standard SQL compliance: unlike many alternative systems, Avalanche fully supports the ANSI SQL:2016 standard, which ensures that any standards-compliant query will run unaltered. Delivering this level of compliance is especially important in supporting data migration and offload projects.
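As an illustration of what standards compliance buys you, here is an ANSI window-function query run against SQLite from Python's standard library (SQLite simply stands in for any compliant engine here; this is not Avalanche, and it assumes a SQLite build recent enough to support window functions):

```python
import sqlite3

# Standard SQL should run unaltered across compliant engines.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 100), ("east", 300), ("west", 200)])

# An ANSI window function: per-region totals without collapsing rows.
rows = conn.execute("""
    SELECT region,
           amount,
           SUM(amount) OVER (PARTITION BY region) AS region_total
    FROM sales
    ORDER BY region, amount
""").fetchall()

print(rows)  # [('east', 100, 400), ('east', 300, 400), ('west', 200, 200)]
```

The same statement, unmodified, is the kind of query a migration or offload project needs to carry over without rewrites.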
Avalanche compute is built for elastic scaling from the ground up and can be ramped up on demand for the most demanding concurrent workloads.
3. Smart Storage
Storage in the cloud is very different from on-prem. One of the key architectural asks from the customers we have talked to is the separation of compute from storage, so the two can be scaled independently of each other. Avalanche addresses that with intelligent, optimized storage built for multi-cloud environments.
Avalanche uses resilient, high-performance storage mechanisms: EBS (Elastic Block Storage) in AWS (orders of magnitude faster than the S3 storage Snowflake depends on, with higher IOPS and lower latency), ADLS Gen 2 in Azure, and HDFS/POSIX on-prem. Advanced compression, combined with choosing the most efficient algorithm based on the data stored in each block, makes for the most optimized, intelligent use of storage. This reduces TCO for our customers.
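A minimal sketch of per-block algorithm selection (illustrative only; these are standard-library codecs, not Actian's actual compression algorithms): compress each block with several candidates and keep whichever representation is smallest for that block's data.

```python
import lzma
import os
import zlib

def compress_block(block):
    """Pick the most space-efficient encoding for this particular block."""
    candidates = {
        "raw": block,                    # incompressible data: store as-is
        "zlib": zlib.compress(block, 9),
        "lzma": lzma.compress(block),
    }
    # Smallest output wins; ties resolve to "raw" (listed first).
    name = min(candidates, key=lambda k: len(candidates[k]))
    return name, candidates[name]

repetitive = b"AAAA" * 1000    # highly compressible
random_ish = os.urandom(256)   # compression would only add header overhead

assert compress_block(repetitive)[0] in ("zlib", "lzma")
assert compress_block(random_ish)[0] == "raw"
```

Deciding per block, rather than once per table, is what lets mixed data (text columns next to already-compact numerics) avoid paying overhead where compression does not help.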
External table support is a key feature that enables Avalanche to access data that resides outside the data warehouse, e.g. in an external data lake, without having to move it. The benefit: your analytics can leverage data regardless of its source or location, and you can be sure your system is always accessing the freshest, most up-to-date data. Not all data warehouses have this capability, so it is a clear differentiator for Avalanche.
We understand that customers also store data in other mechanisms across their current ecosystem. We enable easy access to externally stored data via Spark-enabled, pipeline-based parallel access to storage such as S3, Azure Blob Storage, or custom data lakes.
4. Multi-Platforms
The Avalanche cloud data warehouse is a platform built for multi-cloud. The same Avalanche hybrid multi-cloud platform is available on AWS, on Azure, and on-prem in POSIX environments and as VMware containers. This lets customers operate seamlessly in a multi-cloud environment alongside on-prem data, and gives everyone a path to the cloud at their own pace. All deployments, regardless of location, are 100% compatible, which means your queries will run unchanged.
5. Data Connectors
Finally, unlike any of our competitors, Avalanche provides over 200 pre-built enterprise connectors.
These pre-integrated and extensible connectors enable organizations to quickly access popular data sources, including ServiceNow, Salesforce, Oracle, SAP, NetSuite, and many others. Every other competitive CDW platform requires you to work with a partner to help source and move data to the platform, resulting in extra cost and hassle. The integrated Actian FlexPath architecture enables customers to source data within a few clicks in the UI, and, more importantly, the integration execution is fully managed by Avalanche. Faster, more reliable deployments, single-source support, and thousands saved!