Features or concepts like Change Tracking, Change Data Capture, Temporal Tables, and similar delta systems are complex and may carry a stigma or misapprehension in your organization around performance, security, or cost. Yet even if you never implement these features directly, most information systems rely on tracking changes, especially from legacy line-of-business applications. I'm here to show you robust techniques for implementing delta systems in SQL Server to increase the trustworthiness of your data warehouse, and to steer you away from common pitfalls.
3. Change Systems: Agenda
The goal of this series is to give you the tools you need to push analytics forward at your company.
• Understand the nature and importance of change systems in an overall data platform
• Compare and contrast traditional and modern data warehouse architectures
• Discuss a key technology that is core to change systems in the enterprise
• Compare the SQL Server features that enable robust change data capture
4. Overview: The Source of Change
[Diagram: the database engine managing MDF (data) and LDF (log) files.]
• A database engine manages files.
  • Data structures
  • Transaction logs
• Change systems accurately track modifications inside data structures.
• The source of record for change is the transaction log. Using this log directly is a characteristic of passive change systems.
• Active change systems watch the data structure and record observable change.
5. Overview: Modeling Change

Table
AccountID | CustomerID | AccountBalance | ModifyDate
4568456 | 2342 | 1234758.23 | 2017-03-11 04:11:05
4624572 | 9875 | 5768.01 | 2017-03-11 04:13:15
4745733 | 8735 | 478893.33 | 2017-03-11 04:13:01

Log
AccountID | CustomerID | Type | Amount | EventDate
4568456 | 2342 | Deposit | 1198575.32 | 2017-03-08 09:09:04
4624572 | 9875 | Deposit | 4438.70 | 2017-03-08 09:10:01
4745733 | 8735 | Deposit | 460436.02 | 2017-03-07 10:13:20
4568456 | 2342 | Deposit | 528.11 | 2017-03-08 06:13:45
4624572 | 9875 | Deposit | 1345.23 | 2017-03-09 10:22:25
4745733 | 8735 | Deposit | 635.20 | 2017-03-08 11:13:01
4568456 | 2342 | Withdrawal | 23.21 | 2017-03-09 12:12:02
4624572 | 9875 | Fee | 21.34 | 2017-03-09 06:13:45
4745733 | 8735 | Withdrawal | 42.66 | 2017-03-10 13:13:12
4568456 | 2342 | Transfer | 35678.01 | 2017-03-11 04:11:05
4624572 | 9875 | Deposit | 5.42 | 2017-03-11 04:13:15
4745733 | 8735 | Deposit | 17864.77 | 2017-03-11 04:13:01

Table =* Log
*Record the CRUD operations to the table and you get a changelog.

The duality is that a table supports data at rest and logs capture change. If you have a log you can create not only the original table but also a myriad of other derived tables. Logs therefore seem to be a more fundamental data structure.
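To make the duality concrete, here is a minimal T-SQL sketch that rebuilds the table above from the log alone. The table name dbo.AccountEventLog is hypothetical; treating withdrawals and fees as debits and everything else as credits, this query reproduces every row of the table exactly, including each ModifyDate.

-- Derive each account's current state from the immutable event log.
SELECT
    AccountID,
    CustomerID,
    SUM(CASE WHEN [Type] IN ('Withdrawal', 'Fee')
             THEN -Amount ELSE Amount END) AS AccountBalance,
    MAX(EventDate) AS ModifyDate   -- the last event becomes the row's ModifyDate
FROM dbo.AccountEventLog
GROUP BY AccountID, CustomerID;

Any other derived table (daily balances, fee totals, and so on) is just a different query over the same log.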
6. Overview: Modeling Change

Valid Time
John Doe, who lived in Flat Rock, NC, made his first visit to us on April 1st, 1985 and changed his permanent address during a sale on November 12th, 2005.

Name | Address | ValidFrom | ValidTo
John Doe | 81 Carl Sandberg Ln, Flat Rock, NC 28731 | 1985-04-01 10:00:00 | 2005-11-12 09:05:00
John Doe | 9433 Collingdale Way, Raleigh, NC 27617 | 2005-11-12 09:06:01 | 9999-12-31 23:59:59

Transaction Time
Our data warehouse went live on November 1st, 2005. The ETL runs daily at 4 AM.

Name | Address | CreateDate | ExpireDate
John Doe | 81 Carl Sandberg Ln, Flat Rock, NC 28731 | 2005-11-01 09:25:11 | 2005-11-13 04:54:11
John Doe | 9433 Collingdale Way, Raleigh, NC 27617 | 2005-11-13 04:54:12 | 9999-12-31 23:59:59
7. Overview: Modeling Change

Source
ID | Name | Address | ModifyDate
12345 | John Doe | 81 Carl Sandberg Ln, Flat Rock, NC 28731 | 1985-04-01 10:00:00
12345 | John Doe | 9433 Collingdale Way, Raleigh, NC 27617 | 2005-11-12 09:06:01
(The ModifyDate column creates risk.)

--> ETL (latency of one day at best) -->

Target: SCD 2 Dimension
Key | ID | Name | Address | ValidFrom | ValidTo | CreateDate | ExpireDate
1 | 12345 | John Doe | 81 Carl Sandberg Ln, Flat Rock, NC 28731 | 1985-04-01 10:00:00 | 2005-11-12 09:06:00 | 2005-11-01 09:25:11 | 2005-11-13 04:54:11
2 | 12345 | John Doe | 9433 Collingdale Way, Raleigh, NC 27617 | 2005-11-12 09:06:01 | 9999-12-31 23:59:59 | 2005-11-13 04:54:12 | 9999-12-31 23:59:59
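A minimal sketch of the SCD 2 maintenance this ETL performs, assuming a source extract called dbo.CustomerSource and the dimension dbo.DimCustomer pictured above (both names are illustrative):

DECLARE @Now datetime2 = SYSUTCDATETIME();

-- Step 1: close the current version when a tracked attribute changed.
UPDATE d
SET    d.ValidTo    = s.ModifyDate,   -- valid time: when the change really happened
       d.ExpireDate = @Now            -- transaction time: when the warehouse saw it
FROM   dbo.DimCustomer AS d
JOIN   dbo.CustomerSource AS s ON s.ID = d.ID
WHERE  d.ExpireDate = '9999-12-31 23:59:59'
  AND  s.Address <> d.Address;

-- Step 2: insert a new current version for changed and brand-new IDs.
INSERT dbo.DimCustomer (ID, Name, Address, ValidFrom, ValidTo, CreateDate, ExpireDate)
SELECT s.ID, s.Name, s.Address,
       s.ModifyDate, '9999-12-31 23:59:59',   -- valid time
       @Now,         '9999-12-31 23:59:59'    -- transaction time
FROM   dbo.CustomerSource AS s
WHERE  NOT EXISTS (SELECT 1
                   FROM dbo.DimCustomer AS d
                   WHERE d.ID = s.ID
                     AND d.Address = s.Address
                     AND d.ExpireDate = '9999-12-31 23:59:59');

As the annotation on the slide implies, this whole pattern hinges on ModifyDate being present and trustworthy.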
9. Traditional Architecture: Focus on the Source

[Diagram: applications with their own databases (SQL, DB2, ...) feeding batch ETL jobs into an Enterprise Data Warehouse and downstream marts; storage and query live together.]

Focus Area One: Friction & Frustration

Data Quality
• Timeliness
  • Latency of change
  • Latency of build
• Consistency
  • Redundant ETL
• Accuracy
  • Filters
  • Logic
  • Source

Lead Time
• Custom ETL
• Manual ETL
• Business case and ceremony
• Domain knowledge

Dependencies
• Business logic
• Redundancy
• Downstream effects
• Team
10. Modern Architecture: The Push Method (Lambda)

[Diagram: events are collected and routed into two paths: a speed layer producing real-time views and a batch layer producing batch views, merged by a serving layer that supports query, modeling, and automation.]
11. Modern Architecture: The Push Method (Kappa)

[Diagram: events are collected into a unified log (stream) with an archive; consumers query, model, and automate directly from the stream.]
13. Modern Architecture: Lessons Learned
• Ingest (don't extract) disparate silos of data
• Store data in its atomic form (no transform)
• Collect changes as if they were events (immutable)
• Run downstream ETL more often (process less data each cycle; sketched below)
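For the last point, a minimal watermark sketch in T-SQL, assuming a hypothetical etl.Watermark table that remembers where each cycle left off:

DECLARE @LastWatermark datetime2 =
    (SELECT WatermarkValue FROM etl.Watermark WHERE TableName = 'AccountEventLog');

-- Each cycle touches only the delta since the previous cycle.
SELECT AccountID, CustomerID, [Type], Amount, EventDate
FROM   dbo.AccountEventLog
WHERE  EventDate > @LastWatermark;

-- Advance the watermark so the next (frequent, small) cycle starts here.
UPDATE etl.Watermark
SET    WatermarkValue = (SELECT MAX(EventDate) FROM dbo.AccountEventLog)
WHERE  TableName = 'AccountEventLog';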
15. Overview: Mirror Layer

[Diagram: Source → near real-time transform → Temporary Staging / Mirror Layer → intensive transform → Analytical Model.]

Why Have a Mirror Layer?
1. Improve the data structure of a source system (add primary keys, indexes; see the sketch after this slide)
2. Hide complexity related to the type of source system (SQL, API, Mainframe)
3. Improve the quality and performance of change tracking
4. Enable data governance programs by homogenizing sources
5. Enable prototyping of new automation solutions without developer support

Risks/Assumptions
This layer must be real-time and simple, close to the metal. The more it looks like another ETL layer, the more the risks will outweigh the benefits.
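To illustrate principle 1, a minimal sketch of a mirror table that adds the key and index the source system never had (the schema and names are hypothetical):

CREATE TABLE mirror.Account
(
    AccountID      int           NOT NULL,
    CustomerID     int           NOT NULL,
    AccountBalance decimal(18,2) NOT NULL,
    ModifyDate     datetime2     NOT NULL,
    -- Principle 1: give the mirror the primary key the source lacks.
    CONSTRAINT PK_mirror_Account PRIMARY KEY CLUSTERED (AccountID)
);

-- Index for the analytical access path the transactional source never needed.
CREATE NONCLUSTERED INDEX IX_mirror_Account_ModifyDate
    ON mirror.Account (ModifyDate) INCLUDE (AccountBalance);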
16. But all I read is hate for replication on the internets!
17. Mirror Layer: Replication in Production

[Diagram: a source database server hosting Sale Transaction and Customer Profile databases; each transaction log (T-LOG) feeds a publication with an article per table, pushed through the distributor (Dist) as commands to the subscriber.]

• Set up everything in a lower environment and replay production activity to get an idea of load.
• The source database is placed into an Always On availability group so that the database and replication can fail over.
• Distributor and subscriber are moved to their own failover cluster.
• Subscribers connect to an availability group listener so they can find the right server after a failover.
• Database and log backups are still taken regularly to support disaster recovery, but additional preparations are made to enable a smooth restore of replication.
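The publication in the diagram can be scripted roughly as follows. This is an abbreviated sketch using the standard replication stored procedures (it omits the snapshot and log reader agents); database, publication, and server names are illustrative:

-- Enable the source database for transactional publishing.
EXEC sp_replicationdboption
     @dbname = N'SourceDB', @optname = N'publish', @value = N'true';

-- Create the publication that will carry the two articles.
EXEC sp_addpublication
     @publication = N'MirrorPub', @status = N'active';

-- Add each source table as an article.
EXEC sp_addarticle
     @publication = N'MirrorPub', @article = N'SaleTransaction',
     @source_owner = N'dbo', @source_object = N'SaleTransaction';
EXEC sp_addarticle
     @publication = N'MirrorPub', @article = N'CustomerProfile',
     @source_owner = N'dbo', @source_object = N'CustomerProfile';

-- Push subscription pointing at the availability group listener.
EXEC sp_addsubscription
     @publication = N'MirrorPub', @subscriber = N'AGListener',
     @destination_db = N'MirrorDB', @subscription_type = N'Push';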
18. Mirror Layer Demo: Features of SQL Server

Base Table
AccountID | CustomerID | AccountBalance | ModifyDate
4568456 | 2342 | 1234758.23 | 2017-03-11 04:11:05
4624572 | 9875 | 5768.01 | 2017-03-11 04:13:15

Change Table (Internal)
AccountID | Operation | Columns
4568456 | INSERT |
4624572 | UPDATE | AccountBalance
4745733 | DELETE |

History Table
AccountID | CustomerID | AccountBalance | ModifyDate | CreateDate | ExpireDate
4624572 | 9875 | 5001.01 | 2017-03-10 06:19:01 | 2017-03-10 06:20:35 | 2017-03-11 04:14:22
4745733 | 8735 | 478893.33 | 2017-03-11 04:13:01 | 2017-03-11 04:14:59 | 2017-03-12 09:01:12
Change Tracking
• Net changes only
• No data
• Internal tables
• Internal functions
• Retention period only
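A minimal sketch of Change Tracking in action against the base table above (the database name is illustrative):

-- Turn on change tracking at the database and table level.
ALTER DATABASE MirrorDB
    SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);

ALTER TABLE dbo.Account ENABLE CHANGE_TRACKING
    WITH (TRACK_COLUMNS_UPDATED = ON);

-- Net changes since the last sync: keys and operations only, no data.
DECLARE @last_sync bigint = 0;   -- persist this between ETL runs
SELECT ct.AccountID, ct.SYS_CHANGE_OPERATION, ct.SYS_CHANGE_COLUMNS
FROM   CHANGETABLE(CHANGES dbo.Account, @last_sync) AS ct;

-- Save this as the watermark for the next pull.
SELECT CHANGE_TRACKING_CURRENT_VERSION();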
Temporal Tables
• Net changes not automatic
• Data
• Normal tables
• T-SQL language integration
• Full support for archiving
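And the temporal table equivalent: one DDL statement and SQL Server maintains the history table itself. Note that the period columns record transaction time, despite their conventional names:

CREATE TABLE dbo.Account
(
    AccountID      int           NOT NULL PRIMARY KEY,
    CustomerID     int           NOT NULL,
    AccountBalance decimal(18,2) NOT NULL,
    ValidFrom datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo   datetime2 GENERATED ALWAYS AS ROW END   NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.AccountHistory));

-- Time travel is plain T-SQL; net changes still require your own query.
SELECT * FROM dbo.Account FOR SYSTEM_TIME AS OF '2017-03-10 06:20:00';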
#1 challenge: active = observation is difficult, passive = logs are esoteric and sheltered
Which is more valuable? In general, logs are more valuable; however, they can be expensive to implement. It all depends on controls, trust, and latency. If your source database has the right audit fields, they're trustworthy, and batch ETL is OK, then active change systems are the easy and cheap choice.
These two things are equivalent, and they had better be, because this is the foundation of reliability, availability, and resilience of SQL Server and all other RDBMS systems. It's why we mirror databases, ship logs, and build high availability into data center infrastructure.
A log is essentially a backup of all possible states of the table and any other possible derived table.
Which is more valuable?
As you approach real-time integration these two concepts are effectively interchangeable at the margin.
John spent $30,000 over 600 transactions with us while he lived in Flat Rock ($50/tr). He has spent $250 over 2 transactions since he "moved" to Raleigh ($125/tr). Raleigh is where our brick-and-mortar store resides.
What if you have to side-load new data from, say, an acquisition, merger, or archive situation? What if you have to truncate and reload this table? What about failure? You want valid time more than transaction time. Most data warehouses keep only transaction time.
If ModifyDate does not exist or you don’t trust it then you’ve got a rough road ahead.
LOTS OF CEREMONY.
This is a synchronous world. Applications have their own databases. We reach in and extract large amounts of data, bring it down to disk, and search for changes. We transform the data and load complex schemas with information.
How do you scale this system? You can't do it horizontally. You can only scale up: bigger SQL servers, SSD SANs.
Schemas must be designed and built before the Business can discover and analyze. Arbitrary questions are difficult to ask of the system and typically involve data points not yet modeled. In almost every experience, I have seen the Business's need for information outpace IT's capacity to build.
Of course, it’s never this simple….
This general architecture is called a lambda architecture. The traditional Extract portion of ETL is no longer relevant. We now “ingest” data in this architecture. Applications (and even devices) are “emitting” their events.
VALUE: robustness, fault-tolerance, low latency reads/writes, scalability, generalization, extensibility, minimal maintenance, ad hoc queries, debuggability.
I could fill this slide with the companies that implement this architecture including Microsoft, Walmart, Yahoo, LinkedIn, and Netflix.
Nathan Marz – creator of Storm
Jay Kreps – originally a software engineer at LinkedIn who was a key contributor to Kafka. LinkedIn is an interesting case because during Kreps' tenure the company went from a VLDB Oracle DW to Teradata to Hadoop, and along that path they discovered the value of immutable streams of data in stark contrast to batch analytics.
Clients have to keep track of their place in the log.
The unified log made data fungible.
Oil is fungible because it has equivalent value regardless of source. It is commoditized.
Data, like oil, derives its value from an even quality and its ability to flow freely between endpoints as a homogeneous commodity.
By simplifying the data to its most fundamental structure (a log), we can unify organizations around a data platform, and (click) we can unify our analytics process because we have standardized our data flow.
Most organizations already have a data warehouse. Frankly, the architectures previously presented make a lot of assumptions about the organization, like a reliance on software engineers and a particular approach to data integration and information management.
So how do we get started? Well, luckily SQL Server has been around for a very long time, and we can bootstrap a progressive architecture by understanding and applying the good parts of what we’ve learned from others in the field.
Cloud-born data should remain in the cloud.
Here we move from a pull to a push architecture. We are closer to applications emitting their own events. This is not another ETL layer; we are ingesting database transactions as they appear in real-time. This satisfies the principles of a mirror layer. Indeed, if you cannot satisfy these principles, it is best to move back to the traditional architecture.
With this architecture, we can support micro-batch and batch processes with a robust, fault-tolerant tool that is close to the metal and simple.
Downstream development becomes simpler and more confident: the focus shifts to steering the analytical model and away from tracking source system data changes. Data quality and governance metrics become trustworthy because the mirror layer is sentient.
Fundamentally, a mirror layer reflects source systems exactly table-by-table. It makes every source system look like a SQL Server database: DB2, flat files, APIs, etc. We are hiding the complexity of a heterogeneous environment. It becomes the source for analytical data models.
This is where you improve a system that you cannot control: you add primary keys, indexing, etc. Remember, a source system database is designed for transactional speed. We would rather it be designed for query speed; you can do this in the mirror layer.
This is also where you track changes. Especially when a source system tracks changes poorly, a mirror helps you iron out transactional history in a way that is robust and fault tolerant. It should be able to heal when a source system changes its schema during a release, for example.
By homogenizing sources, we are supporting data governance from a lineage perspective. Moreover, we are allowing governance personnel to access system data that they would not otherwise have access to in the production operation environment. Metrics can be compared more easily and new metrics can be created in less time.
Maybe most importantly, we are providing a layer for the Business to prototype the next valuable solution. The Business should not report from or run ad hoc queries on source system databases. They can run them on this layer and help us build high quality solutions faster through prototyping.
This is all possible with SQL Server Standard edition.
Always Encrypted makes this impossible.
Change tracking is essentially just metadata, but it sure is powerful and can transform your ETL process if all you need to know is what has changed.
Temporal tables are a fully supported time machine. Unfortunately, getting net changes is not automatic.
This is transaction time, but if replication is set up as continuous then we are very close to valid time at the margin. You will still need to seed valid time as part of an initial load.