Watch: https://bit.ly/347ImDf
In the digital era, efficient data management is fundamental to a company's competitiveness. However, most companies struggle with data silos, which make data processing slow and costly. In addition, the velocity, variety, and volume of data can overwhelm traditional IT architectures.
How can data delivery be improved to extract its full value?
How can data be made available and usable in real time?
The experts from Vault IT and Denodo invite you to this webinar to discover how data virtualization makes it possible to modernize an IT architecture in a context of digital transformation.
How to modernize an IT architecture with data virtualization?
1. LIVE WEBINAR
How to modernize an IT architecture with data virtualization?
2. Who we are
✓ 30+ professionals
✓ 12+ years in the corporate IT market
✓ Regional coverage in Latin America
✓ 30+ clients with recurring billing
✓ Clients of more than 10 years
Infrastructure and Middleware Services Division
Development and Integrations Division
IoT and Digital Innovation Services Division
✓ Implementation of application integration projects
✓ Consulting on complex architectures (on-premises and cloud)
✓ Infrastructure and middleware assessment
✓ Sale and implementation of software licenses
✓ Performance and security tuning
✓ Server consolidation
✓ Platform support and maintenance
✓ Mobile and web app development
✓ Integration development (3-tier)
✓ Development and implementation of IoT projects
3. Why did we approach Denodo?
➢ We are living through the era of innovation (survival of the fastest).
➢ Lean, Agile, evolutionary approaches: experiment (MVPs), fail fast, scale, iterate.
➢ This requires an exponential increase in the capacity to innovate, but when we look inward we face various barriers that limit it.
➢ We propose "externalizing" innovation by creating and exposing digital assets (APIs) that encourage it and bring agility to the entire value chain.
➢ We see Denodo as an ideal partner and solution for deploying this EXTREME INNOVATION strategy, which combines drag-and-drop data virtualization, a self-service data catalog, and tools for rapid API generation.
➢ Necessity drives innovation, and in Latin America the conditions are right to go out and cultivate it.
5. Business Challenges and Needs
Need for faster, more accurate decision making
• Significant increase in business speed & complexity of
requirements → IT struggles to deliver in a timely fashion
Ensure business continuity amidst technology evolution
• Migration of legacy systems to cloud, modernization of data
and applications
Increased regulatory risk, data privacy and security
• Exponential increase in regulations affecting data across
geographies, departments and industries
6. Challenges for new Architectures
Empower business users
• Self-service and bimodal
• Real-time and enriched
internal/external data
IT manageability
• Data history, Big data
• Service-oriented interfaces,
• Cloud independence and
interoperability
Leverage AI/ML
• Discover new patterns or
trends using all sources
• Providing data faster to
Data Scientists
Leverage IoT
• Simplification to generate
more data
• Edge analytics and Data
Streaming
7. Are existing architectures sufficient?
• Real time?
• Scalability?
• Maintenance?
• Flexibility to Change?
“Enterprise architects are finding that traditional data architectures are failing to meet new business
requirements, especially around data integration for streaming analytics and real-time analytics.”
* The Forrester Wave: Enterprise Data Virtualization, Jan 12, 2018
9. Data Virtualization
The Solution – Data Abstraction Layer
1. Connect to disparate data sources
2. Combine related data into views
3. Consume in business applications
DATA CONSUMERS (Analytical, Operational): Enterprise Applications, Reporting, BI, Portals, ESB, Mobile, Web, Users
DISPARATE DATA SOURCES (More Structured to Less Structured): Databases & Warehouses, Cloud/SaaS Applications, Big Data, NoSQL, Web, XML, Excel, PDF, Word...
CONNECT: Normalized views of disparate data
COMBINE: Discover, Transform, Prepare, Improve Quality, Integrate
CONSUME (PUBLISH): Share, Deliver, Publish, Govern, Collaborate
[Diagram labels: Multiple Protocols, Formats; Query, Search, Browse; Request/Reply, Event Driven; Secure Delivery; SQL, MDX; Web Services; Big Data APIs; Web Automation and Indexing]
“Data virtualization integrates disparate data sources in real time or near-real time to meet demands for analytics and transactional data.”
– Create a Road Map For A Real-time, Agile, Self-Service Data Platform, Forrester Research, Dec 16, 2015
10. “Through 2022, 60% of all organizations will implement data virtualization as one key delivery style in their data integration architecture”
Source: Gartner 2018 Data Virtualization Market Guide
11. Evolution and Migrations
Decoupling enables gradual migration
[Diagram, repeated for each stage: DATA CONSUMERS query Mart and Unified Views, which are built on Base Views over the sources]
Business is always up and running
LOB users and data-consuming applications do not need to change
Enables gradual and seamless migration
Migration stages shown:
• New Location: Private, Public, Hybrid
• New Technology: SQL differences, Push down optimizations
• New Data Model: Semantic Layer
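The decoupling described on this slide can be sketched in a few lines. This is only an illustration under stated assumptions, not Denodo's implementation: SQLite stands in for both the legacy and the new source, and the names legacy_sales, cloud_sales, bv_sales and uv_sales are hypothetical. The consumer query targets only the unified view, so repointing the base view to the new location leaves the consumer untouched.

```python
import sqlite3

# Hypothetical names throughout; SQLite plays the role of every source.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE legacy_sales (id INTEGER, amount REAL);
    INSERT INTO legacy_sales VALUES (1, 10.0), (2, 25.0);

    -- Base view: thin wrapper over the physical source.
    CREATE VIEW bv_sales AS SELECT id, amount FROM legacy_sales;
    -- Unified view: the only object that consumers query.
    CREATE VIEW uv_sales AS SELECT id, amount FROM bv_sales;
""")

CONSUMER_QUERY = "SELECT SUM(amount) FROM uv_sales"
before = conn.execute(CONSUMER_QUERY).fetchone()[0]

# Migration: copy the data to the new location (with a different schema),
# then repoint only the base view and retire the legacy table.
conn.executescript("""
    CREATE TABLE cloud_sales (sale_id INTEGER, amount_usd REAL);
    INSERT INTO cloud_sales SELECT id, amount FROM legacy_sales;

    DROP VIEW bv_sales;
    CREATE VIEW bv_sales AS
        SELECT sale_id AS id, amount_usd AS amount FROM cloud_sales;
    DROP TABLE legacy_sales;
""")

# The consumer runs the exact same query after the migration.
after = conn.execute(CONSUMER_QUERY).fetchone()[0]
print(before, after)
```

The base view absorbs both the location change and the column renames, which is the role the slide assigns to the abstraction layer.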
12. Query Push Down: Enables New Business Queries
SELECT
c.c_country,
SUM(ss.ss_quantity),
AVG(ss.ss_sales_price)
FROM
(SELECT * FROM current_store_sales
UNION ALL
SELECT * FROM historic_store_sales) ss
JOIN sqls_customer c
ON ss.ss_customer_sk = c.c_customer_sk
GROUP BY c.c_country
System                             Execution Time   #Rows through network
Federation systems                 ~ 10 min         593M
Hadoop/MPP systems                 ~ 4 min          293M
Denodo (With MPP)                  13 sec           6M
Denodo (Smart Query Acceleration)  600 msec         6M
[Bar chart: Execution Time (seconds) for Denodo (With MPP), Denodo (No MPP), Hadoop/MPP systems and Federation systems]
Comparing execution times of the same queries with Denodo and other federation systems. Smaller is better.
LDW-ready
Performance
Sources:
• Current Sales: 290 M rows (Redshift)
• Hist. Sales: 300 M rows (Hadoop)
• Customer: 2 M rows (Oracle)
Customers can now run queries that were previously prohibitive
16. Centralized Specifications Improve Speed
DATA CONSUMERS
• Analytical: JDBC & ODBC
• Operational: API – WS/REST/OData
• Self-service: Catalog & search
View layers, top to bottom: Business views, Standardized views, Base/Raw views
[Diagram: Base Views feed Derived and Unified Views, which feed Mart and LoB Views]
DISPARATE DATA SOURCES (More Structured to Less Structured)
1. Connect: Introspect, Abstract Meta-data
2. Combine: Discover, Clean, Transform, Calculate, Prepare, Improve Quality, Integrate
3. Consume: Standard expositions
Agility and Ease of Use
Decoupling
17. Summary
DV abstraction features ensure business continuity amidst technology
evolution
• efficient data migration, including to the cloud, without disruption
Enables control and governance for data management
• to reduce compliance risk, enhance security and privacy
Faster & more accurate decision making
• Real-time data coupled with self-service for business users
Amplifies benefits of other technologies for modernization like:
• data lakes
• self-service BI
• cloud platforms
• data catalogs
Data virtualization can provide the foundation for a modern data architecture
Modernization must be seamless and gradual
19. Are Existing Data Architectures Sufficient?
Data consumers:
• BI / Reporting: JDBC, ODBC, ADO .NET
• Web / Mobile: WS – REST JSON, XML, HTML, RSS
• Portals: JSR168 / 286, MS WebParts
• SOA, Middleware, Enterprise Apps: WS – SOAP, Java API
Integration: ETL
Data sources:
• Inventory System (MS SQL Server)
• Product Catalog (Web Service - SOAP)
• Log files (.txt/.log files)
• CRM (MySQL)
• Billing System (Web Service - REST)
• Customer Voice (Internet, Unstructured)
• Mainframe (Batch Jobs)
• Big Data (Hadoop)
• Cloud Storage (JSON)
• Cloud Data (JSON)
20. Are Existing Data Architectures Sufficient?
Too Complex - Costly to Maintain
Rigid – Difficult to Adapt or Evolve
Can’t Scale – Doesn’t Match Speed of Business
21. Logical Data Warehouse Reference Architecture
Reporting
Analytics
Data Science
Data Market Place
Data Monetization
AI/ML
iPaaS
Kafka
ETL
CDC
Sqoop
Flume
Raw Data Zone
Staging Area
Curated Data Zone
Core DWH model
Data Warehouse
Data Lake
Data Virtualization Platform
Analytical Views
Data Science Views
λ Views
Real-Time Views
DWH Views
Hybrid Views
Cloud Views
Universal Catalog of Data Services
Centralized Access Control
Logical Data Warehouse
24. What’s the demo scenario?
We have a traditional Data Warehouse in Oracle
To offload the warehouse and expand our data sets with IoT data,
we have acquired a Hadoop cluster
We are big users of SaaS solutions
Need to easily build reports using data coming from these sources
25. Example
What’s the impact of a new
marketing campaign for each
country?
▪ Historical sales data offloaded to
Hadoop cluster for cheaper storage
▪ Marketing campaigns managed in an
external cloud app
▪ Country is part of the customer
details table, stored in the DW
[Diagram, bottom to top]
Sources: Sales, Campaign, Customer
Source Abstraction: one Base View per source
Combine, Transform & Integrate: two joins across Sales, Campaign and Customer, then group by country
Consume
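The scenario above can be mimicked with a small, self-contained sketch. It assumes three separate SQLite databases attached under the hypothetical aliases hadoop, saas and dw to stand in for the Hadoop cluster, the cloud marketing app and the data warehouse. Table names, column names and sample rows are invented for illustration, and a real virtualization layer would reach each source through its own connector rather than through ATTACH.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Each alias is an independent in-memory database, standing in for one source.
for alias in ("hadoop", "saas", "dw"):
    conn.execute(f"ATTACH DATABASE ':memory:' AS {alias}")

conn.executescript("""
    CREATE TABLE hadoop.sales (customer_id INTEGER, campaign_id INTEGER, amount REAL);
    CREATE TABLE saas.campaign (campaign_id INTEGER, name TEXT);
    CREATE TABLE dw.customer (customer_id INTEGER, country TEXT);

    INSERT INTO hadoop.sales VALUES
        (1, 10, 100.0), (2, 10, 50.0), (3, 10, 70.0), (1, 11, 20.0);
    INSERT INTO saas.campaign VALUES (10, 'spring_promo'), (11, 'other');
    INSERT INTO dw.customer VALUES (1, 'AR'), (2, 'AR'), (3, 'CL');
""")


def campaign_impact(campaign_name):
    """Total sales per country for one campaign, combining all three sources."""
    return conn.execute(
        """
        SELECT c.country, SUM(s.amount) AS total_sales
        FROM hadoop.sales s
        JOIN saas.campaign m ON s.campaign_id = m.campaign_id
        JOIN dw.customer c ON s.customer_id = c.customer_id
        WHERE m.name = ?
        GROUP BY c.country
        ORDER BY c.country
        """,
        (campaign_name,),
    ).fetchall()


impact = campaign_impact("spring_promo")
print(impact)
```

The report author writes a single query against three logical tables and does not need to know where each one physically lives.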
27. What is the scenario?
The DV system only stores Metadata
Data is external
• Needs to travel through the Network
• To address: minimize network traffic
Data is distributed in multiple systems
• Needs to be integrated in the virtual layer
• Some sources have processing capabilities
• To address: maximize processing at sources to reduce load in virtualization layer
28. What information do we have?
1. The incoming query (SQL)
2. Table metadata
▪ Source, PK, FK, indexes, “virtual” partitions, etc.
3. Data statistics
▪ Used by the Cost Based Optimizer to estimate data volumes
4. Source capabilities
▪ Can the source process data? (e.g. RDBMS vs. CSV file)
▪ “Read-Only” vs. “Can create temp tables”
▪ In an MPP, size of the cluster
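As a rough illustration of how these inputs could feed a cost-based decision, the sketch below estimates the rows moved across the network for two candidate plans. It is a deliberately simplified model, not Denodo's optimizer: the Source class, its fields and both functions are hypothetical, and row count is the only cost considered. The sample figures (300 M sales rows, 2 M customers) are borrowed from the Sales/Customer comparison elsewhere in this deck.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    name: str
    rows: int                # data statistic: total row count
    distinct_join_keys: int  # data statistic: distinct values of the join key
    can_aggregate: bool      # source capability: can it run GROUP BY itself?


def rows_transferred(fact: Source, dim: Source, push_down: bool) -> int:
    """Estimate rows crossing the network for: fact JOIN dim ... GROUP BY."""
    if push_down and fact.can_aggregate:
        # The fact source pre-aggregates by join key, so at most one row
        # per key has to leave it.
        fact_rows = min(fact.rows, fact.distinct_join_keys)
    else:
        fact_rows = fact.rows
    return fact_rows + dim.rows


def choose_plan(fact: Source, dim: Source):
    """Return (plan name, estimated rows) for the cheapest candidate plan."""
    candidates = {
        "full scan": rows_transferred(fact, dim, push_down=False),
        "aggregation push-down": rows_transferred(fact, dim, push_down=True),
    }
    best = min(candidates, key=candidates.get)
    return best, candidates[best]


sales = Source("sales", rows=300_000_000, distinct_join_keys=2_000_000, can_aggregate=True)
sales_csv = Source("sales_csv", rows=300_000_000, distinct_join_keys=2_000_000, can_aggregate=False)
customer = Source("customer", rows=2_000_000, distinct_join_keys=2_000_000, can_aggregate=True)

print(choose_plan(sales, customer))
print(choose_plan(sales_csv, customer))
```

When the fact source cannot aggregate (the CSV case), both candidates cost the same and the sketch falls back to the full scan.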
29. Why is this so important?
SELECT c.state, AVG(s.amount)
FROM customer c JOIN sales s
ON c.id = s.customer_id
GROUP BY c.state
How Denodo works compared with other federation engines
System   Execution Time   Data Transferred   Optimization Technique
Denodo   9 sec.           4 M                Aggregation push-down
Others   125 sec.         302 M              None: full scan
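[Diagram, other engines: Sales (300 M rows) and Customer (2 M rows) are transferred in full, then joined and grouped]
[Diagram, Denodo: Sales is grouped by customer ID at the source (2 M rows) and joined with Customer (2 M rows), then grouped by state]
To maximize push down to the EDW, the aggregation is split into 2 steps:
• 1st by customer ID
• 2nd by state
This significantly reduces network traffic and processing in Denodo.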
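The two-step aggregation can be checked on toy data. The sketch below is an assumption-laden illustration, not Denodo's engine: SQLite plays the sales source, plain Python plays the virtual layer, and the sample rows are invented. The first step must return SUM and COUNT per customer rather than AVG, because an average of per-customer averages would weight customers equally and give a different answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, state TEXT);
    CREATE TABLE sales (customer_id INTEGER, amount REAL);
    INSERT INTO customer VALUES (1, 'CA'), (2, 'CA'), (3, 'NY');
    INSERT INTO sales VALUES (1, 10), (1, 30), (2, 50), (3, 5), (3, 15), (3, 10);
""")

# Reference plan: join every sales row first, then aggregate by state.
direct = conn.execute("""
    SELECT c.state, AVG(s.amount)
    FROM customer c JOIN sales s ON c.id = s.customer_id
    GROUP BY c.state
    ORDER BY c.state
""").fetchall()

# Step 1, pushed down to the sales source: one row per customer.
partials = conn.execute("""
    SELECT customer_id, SUM(amount), COUNT(*)
    FROM sales
    GROUP BY customer_id
""").fetchall()

# Step 2, in the virtual layer: join the partials with customer and
# re-aggregate by state, combining sums and counts.
state_of = dict(conn.execute("SELECT id, state FROM customer"))
totals = {}
for customer_id, amount_sum, row_count in partials:
    running_sum, running_count = totals.get(state_of[customer_id], (0.0, 0))
    totals[state_of[customer_id]] = (running_sum + amount_sum, running_count + row_count)

two_step = sorted((state, s / n) for state, (s, n) in totals.items())

print(direct)
print(two_step)
print(len(partials), "rows leave the sales source instead of 6")
```

Both plans return the same averages, but the pushed-down plan ships one row per customer instead of one row per sale.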
31. How to access the Denodo data model?
SQL Based access
▪ JDBC, ODBC and ADO.NET
• Integration with reporting tools: Tableau, MicroStrategy, PowerBI, BO,
Cognos, Looker, OBIEE, etc.
• Custom built applications
Web Services
▪ Multiple formats
• RESTful
• OData 4.0
• SOAP
▪ Compliance with modern standards: OAuth, JWT, SAML, OpenAPI
Denodo’s Data Catalog
▪ Web-based tool for exploration and discovery by business users
33. The Role of Denodo’s Data Catalog
Catalog of views and web services
▪ Browse and search for existing views and services
▪ See descriptions, relationships and data lineage
Preview and find data
▪ Quick look at data
▪ Search based on content
Consume
▪ Customize existing views for particular needs
▪ “My queries” for personal use & share with other users
▪ Export to local file
▪ Propose new standard business / canonical views
36. Overview
Security in Denodo
Source Authentication
• Pass-through authentication
• Service accounts
Authentication
• User/password
• Kerberos and Windows SSO
• Web Service security: SAML, OAuth, SPNEGO
• LDAP / Active Directory
Role Based Authorization
• Roles: guest, employee, corporate
• Schema-wide Permissions
• Data Specific Permissions (Row, Column level, Masking)
• Policy Based Security
Data in motion
• TLS v1.2
Encrypted data at rest
• Cache
• Swap
37. Security in Denodo
Authentication
▪ Native and LDAP/Active Directory based
▪ Support for Kerberos and Windows SSO
▪ Web Services: Support for OAuth 2.0, SAML and SPNEGO
Authorization
▪ Support for Role Based Authorization
▪ Integration with LDAP user groups
▪ Different privileges (Metadata, Execute, Insert, Create Datasource, etc.)
▪ Multiple granularity levels: schema, view, column and row
▪ Support for conditional dynamic restrictions and masking
▪ Support for custom policies written in Java
Source Authentication
▪ Support for Service Accounts and Credentials Pass-Through
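Row-level restrictions and masking can be pictured with a short sketch. This is a generic illustration and not Denodo's policy API (the slide notes that its custom policies are written in Java): the Policy class, the mask helper and the sample rows are hypothetical, and the role names are taken from the overview slide.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List

Row = Dict[str, object]


@dataclass(frozen=True)
class Policy:
    row_filter: Callable[[Row], bool]            # row-level restriction
    masked_columns: FrozenSet[str] = frozenset() # column-level masking


def mask(value: object) -> str:
    """Hide everything except the last four characters."""
    text = str(value)
    return "*" * max(len(text) - 4, 0) + text[-4:]


# Hypothetical policies, one per role.
POLICIES = {
    "corporate": Policy(row_filter=lambda row: True),
    "employee": Policy(
        row_filter=lambda row: row["country"] == "AR",
        masked_columns=frozenset({"card_number"}),
    ),
    "guest": Policy(row_filter=lambda row: False),
}


def apply_policy(rows: List[Row], role: str) -> List[Row]:
    """Return only the rows the role may see, with restricted columns masked."""
    policy = POLICIES[role]
    return [
        {
            col: mask(val) if col in policy.masked_columns else val
            for col, val in row.items()
        }
        for row in rows
        if policy.row_filter(row)
    ]


ROWS = [
    {"customer": "Ana", "country": "AR", "card_number": "4111111111111111"},
    {"customer": "Luis", "country": "CL", "card_number": "5500000000000004"},
]

print(apply_policy(ROWS, "employee"))
```

Applying the restrictions in one place means every consumer of the view sees the same filtered and masked result.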
40. Denodo Scheduler
Cache Control
Data Exports
• Database
• CSV
• XML
• TDE
• Etc.
Crawling and Indexing
Email notifications
Metadata management
• Statistics
• Source changes
41. Key Takeaways
Conclusion
Source Abstraction
• Hides complexity for ease of data access by business.
Semantic Data Modeling
• Business Entities and pre-aggregated views and reports.
Flexible Publication Options
• Multiple options that adapt to the needs of the consumer.
Development and Operations
• Simplifies data security, privacy and audit
Enable self-service
• Simplifies data exploration and ability to handle metadata