Benchmark: Beyond Aurora. Scale-out SQL databases for AWS. (Clustrix)
The document discusses options for scaling relational database management systems (RDBMS). It describes scale-up vs scale-out approaches, and compares solutions like master-slave replication, sharding, and using scale-out databases. It provides details on ClustrixDB's scale-out architecture with shared-nothing storage and automatic data distribution. Benchmark results show ClustrixDB outperforming Aurora for throughput and latency on OLTP workloads as nodes are added.
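To make the comparison concrete, here is a minimal, hedged Python sketch of the manual hash-based sharding that the summary contrasts with automatic data distribution. The shard names and keys are hypothetical, not taken from ClustrixDB or Aurora.

```python
# Minimal sketch of manual hash-based sharding (the approach the summary
# contrasts with automatic data distribution). Shard names are hypothetical.
import hashlib

SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2"]

def shard_for(key: str) -> str:
    """Route a row to a shard by hashing its distribution key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

for user_id in ("alice", "bob", "carol"):
    print(user_id, "->", shard_for(user_id))
```

Note that adding a shard to this scheme remaps most keys, which is one reason the scale-out databases discussed here distribute and rebalance data automatically instead.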
Benchmark Showdown: Which Relational Database is the Fastest on AWS? (Clustrix)
Do you have a high-value, high throughput application running on AWS? Are you moving part or all of your infrastructure to AWS? Do you have a high-transaction workload that is only expected to grow as your company grows? Choosing the right database for your move to AWS can make you a hero or a goat. Be a hero!
Databases are the mission-critical lifeline of most businesses. For years MySQL has been the easy choice -- but the popularity of the cloud and new products like Aurora, RDS MySQL and ClustrixDB have given customers choices and options that can help them work smarter and more efficiently.
Enterprise Strategy Group (ESG) presents their findings from a recent performance benchmark test configured for high-transaction, low-latency workloads running on AWS.
In this webinar, you will learn:
How high-transaction, high-value database workloads perform when run on three popular database solutions running on AWS.
How key metrics like transactions per second (tps) and database response time (latency) can affect performance and customer satisfaction.
How the ability to scale both database reads and writes is the key to unlocking performance on AWS.
Scylla Summit 2022: Migrating SQL Schemas for ScyllaDB: Data Modeling Best Pr... (ScyllaDB)
To maximize the benefits of ScyllaDB, you must adapt the structure of your data. Data modeling for ScyllaDB should be query-driven based on your access patterns – a very different approach than normalization for SQL tables. In this session, you will learn how tools can help you migrate your existing SQL structures to accelerate your digital transformation and application modernization.
To watch all of the recordings hosted during Scylla Summit 2022 visit our website here: https://www.scylladb.com/summit.
Beyond Aurora. Scale-out SQL databases for AWS (Clustrix)
As enterprises move to AWS, they have great choices for MySQL compatible databases. Knowing the best database for the specific job can save you time and money. In this webinar, Lokesh Khosla will discuss high-performance databases for AWS and share his findings based on a benchmark test that simulates the workload of a high-transaction AWS-based solution.
If you work with high-transaction workloads and you need a relational database to keep track of economically valuable items like revenue, inventory and monetary transactions, you'll be interested in this discussion about the strengths and weaknesses of Aurora and other MySQL solutions for AWS.
Database Architecture & Scaling Strategies, in the Cloud & on the Rack (Clustrix)
Watch the recording here: https://www.youtube.com/watch?v=ZwERp38ynxQ&feature=youtu.be
In this webinar, Robbie Mihalyi, VP of Engineering at Clustrix, explores how to set up a SQL RDBMS architecture that scales out and is both elastic and consistent, while simultaneously delivering fault tolerance and ACID compliance.
He also covers how data gets distributed in this architecture, how the query processor works, how rebalancing happens and other architectural elements. Examples cited include cloud deployments and e-commerce use-cases.
In this webinar, you will learn:
1. Five RDBMS scaling strategies, along with their trade-offs
2. The importance of having no single point of failure for OLTP (fault tolerance)
3. The vagaries of the cloud and how they impact running an RDBMS there
Who should watch?
1. People interested in high performance, real-time database solutions
2. Companies who have MySQL in their infrastructure and are concerned that their growth will soon overwhelm MySQL’s single-box design
3. DBAs who implement ‘read slaves’, ‘multiple masters’ and ‘sharding’ for MySQL databases and want to learn about better ways to scale
Achieve new levels of performance for Magento e-commerce sites. (Clustrix)
If you run a Magento store that is impacted negatively by catalog updates or indexing/reindexing, listen in. Avoid catalog updates impacting your checkouts, site downtime and long page view load time with the ClustrixDB for Magento Bundle. Created exclusively for high-volume/complex-catalog retailers, this replacement backend is a proven upgrade for Magento sites.
Let us show you how this all works. Recently, at this year’s Magento Imagine, we ran a LIVE demo of the ClustrixDB for Magento Bundle. Stats from the demo:
System ran for 50 hours -- reindexing up to every 12 minutes.
2.6 million orders processed (14.6 per second)
147 million page views (816 per second)
Average response time of 267 ms
0% error rate, 100% checkout uptime
In this webinar, we’ll also cover:
How you can enable catalog updates to process in the background without affecting the normal operations of your site with the Clustrix Shadow (re)Indexer
A Magento-approved alternative database to MySQL that scales performance up/down as you add/subtract commodity nodes to the cluster. ClustrixDB has no read slaves, replication lag or sharding, and flexes up and down to deliver exactly the right amount of performance and cost every month of the year.
Customer Education Webcast: New Features in Data Integration and Streaming CDC (Precisely)
View our quarterly customer education webcast to learn about the new advancements in Syncsort DMX and DMX-h data integration software and DataFunnel - our new easy-to-use browser-based database onboarding application. Learn about DMX Change Data Capture and the advantages of true streaming over micro-batch.
View this webcast on-demand where you'll hear the latest news on:
• Improvements in Syncsort DMX and DMX-h
• What’s next in the new DataFunnel interface
• Streaming data in DMX Change Data Capture
• Hadoop 3 support in Syncsort Integrate products
Scalability truths and serverless architectures (Regunath B)
This document discusses scalability challenges with stateful, data-driven systems and event-driven architectures. It introduces concepts like scalability truths, serverless architectures, and maintaining state in event-driven systems. The document then discusses the Flux project, which is an open source, asynchronous, distributed, and reliable state machine-based orchestrator that can be used to build stateful event-driven applications and workflows in a serverless environment.
The document provides an introduction and overview of NoSQL databases. It discusses:
- How NoSQL databases are non-relational and differ from traditional relational databases by not requiring fixed schemas and supporting horizontal scaling.
- Examples of different types of NoSQL databases like document stores, key-value stores, and graph databases.
- The CAP theorem and eventual consistency of NoSQL databases, which allow high availability and partitioning at the cost of strong consistency.
- How NoSQL databases are used by large companies to store rapidly growing unstructured and unpredictable data more efficiently than relational databases.
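To illustrate the eventual-consistency trade-off mentioned above, here is a toy Python sketch in which two replicas accept writes independently and later converge by last-write-wins on a timestamp. This is a teaching sketch under simplified assumptions, not how any particular NoSQL store resolves conflicts.

```python
# Toy eventual consistency: replicas diverge, then converge via
# last-write-wins (LWW) during an anti-entropy exchange.
from dataclasses import dataclass, field

@dataclass
class Replica:
    store: dict = field(default_factory=dict)  # key -> (timestamp, value)

    def write(self, key, value, ts):
        self.store[key] = (ts, value)

    def merge(self, other: "Replica"):
        """Converge with another replica: keep the newer write per key."""
        for key, (ts, value) in other.store.items():
            if key not in self.store or ts > self.store[key][0]:
                self.store[key] = (ts, value)

a, b = Replica(), Replica()
a.write("cart:42", ["book"], ts=1)         # write accepted on replica A
b.write("cart:42", ["book", "pen"], ts=2)  # concurrent write on replica B
a.merge(b); b.merge(a)                     # anti-entropy exchange
assert a.store == b.store                  # replicas have converged
```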
This document provides an overview of Azure SQL Data Warehouse (SQL DWH), a cloud data warehouse service. It discusses SQL DWH's massively parallel processing (MPP) architecture that allows independent scaling of compute and storage. The document demonstrates how to create a SQL DWH, load data using PolyBase, and use common tools. It is intended to help users understand what SQL DWH is, how it works, and common scenarios it can be used for, such as processing large volumes of data without needing to purchase and manage hardware.
VoltDB is an in-memory database designed for high throughput transactional workloads. It partitions data across multiple servers and executes transactions in single threads to avoid locking and improve performance. VoltDB uses stored procedures and an asynchronous client model. It is optimized for high throughput over latency and supports SQL, full ACID compliance, and automatic recovery through snapshotting.
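The single-threaded-per-partition idea described above can be sketched in a few lines: each partition's transactions run serially on one worker, so no row locks are needed within a partition. The sketch below is purely illustrative and does not use VoltDB's actual API.

```python
# Per-partition serial execution: one worker thread per partition,
# transactions routed by hashing the account key. Illustrative only.
import queue, threading

NUM_PARTITIONS = 2
queues = [queue.Queue() for _ in range(NUM_PARTITIONS)]
balances = [{} for _ in range(NUM_PARTITIONS)]  # one state dict per partition

def worker(pid: int):
    while True:
        txn = queues[pid].get()
        if txn is None:          # sentinel: shut down
            break
        account, delta = txn
        # Serial execution within the partition: no locking required.
        balances[pid][account] = balances[pid].get(account, 0) + delta

threads = [threading.Thread(target=worker, args=(p,)) for p in range(NUM_PARTITIONS)]
for t in threads: t.start()

def submit(account: str, delta: int):
    queues[hash(account) % NUM_PARTITIONS].put((account, delta))

submit("acct-1", 100); submit("acct-1", -30); submit("acct-2", 50)
for q in queues: q.put(None)
for t in threads: t.join()
print(balances)
```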
Cassandra is a distributed database designed to handle large amounts of structured data across commodity servers. It provides linear scalability, fault tolerance, and high availability. Cassandra's architecture is masterless with all nodes equal, allowing it to scale out easily. Data is replicated across multiple nodes according to the replication strategy and factor for redundancy. Cassandra supports flexible and dynamic data modeling and tunable consistency levels. It is commonly used for applications requiring high throughput and availability, such as social media, IoT, and retail.
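The tunable consistency mentioned above reduces to one well-known inequality: with replication factor N, reads that touch R replicas and writes that touch W replicas are guaranteed to overlap (and thus see the latest write) whenever R + W > N. A minimal check, for illustration only:

```python
# Quorum overlap rule for tunable consistency: R + W > N.
def is_strongly_consistent(n: int, r: int, w: int) -> bool:
    return r + w > n

print(is_strongly_consistent(n=3, r=2, w=2))  # True: QUORUM reads + QUORUM writes
print(is_strongly_consistent(n=3, r=1, w=1))  # False: ONE/ONE is eventually consistent
```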
The way we store and manage data is changing. In the old days, there were only a handful of file formats and databases. Now there are countless databases and numerous file formats. The methods by which we access the data have also increased in number. As R users, we often access and analyze data in highly inefficient ways. Big Data tech has solved some of those problems.
This presentation will take attendees on a quick tour of the various relevant Big Data technologies. I’ll explain how these technologies fit together to form a stack for various data analysis use cases. We’ll talk about what these technologies mean for the future of analyzing data with R.
Even if you work with “small data” this presentation will still be of interest because some Big Data tech has a small data use case.
In the past few years, the term "data lake" has leaked into our lexicon. But what exactly IS a data lake? Some IT managers confuse data lakes with data warehouses. Some people think data lakes replace data warehouses. Both of these conclusions are false. There is room in your data architecture for both data lakes and data warehouses. They have different use cases, and those use cases can be complementary.
Todd Reichmuth, Solutions Engineer with Snowflake Computing, has spent the past 18 years in the world of Data Warehousing and Big Data. He spent that time at Netezza and then later at IBM Data, before making the jump to the cloud at Snowflake Computing earlier in 2018.
Mike Myer, Sales Director with Snowflake Computing, has spent the past 6 years in the world of Security and is looking to drive awareness of the better Data Warehousing and Big Data solutions available. He was previously at local tech companies FireMon and Lockpath and decided to join Snowflake due to the disruptive technology that's truly helping folks in the Big Data world on a day-to-day basis.
The document provides an overview of the Google Cloud Platform (GCP) Data Engineer certification exam, including the content breakdown and question format. It then details several big data technologies in the GCP ecosystem such as Apache Pig, Hive, Spark, and Beam. Finally, it covers various GCP storage options including Cloud Storage, Cloud SQL, Datastore, BigTable, and BigQuery, outlining their key features, performance characteristics, data models, and use cases.
This document provides an overview of Apache Cassandra, including:
- Cassandra is an open source distributed database designed to handle large amounts of data across commodity servers.
- It was originally created at Facebook and is influenced by Amazon Dynamo and Google Bigtable.
- Cassandra uses a peer-to-peer distributed architecture with no single point of failure and supports replication across multiple data centers.
- It uses a column-oriented data model with tunable consistency levels and supports the Cassandra Query Language (CQL) which is similar to SQL.
- Major companies that use Cassandra include Facebook, Netflix, Twitter, IBM and more for its scalability, availability and flexibility.
CCV: migrating our payment processing system to MariaDB (MariaDB plc)
CCV is a Dutch payment processor and loyalty provider. CCV's current payment processing platform is built on top of Microsoft SQL Server, but they are currently in the process of migrating it to MariaDB. This migration project is in progress and first production transactions are expected to run in 2020. In this session, Ernst Wernicke and Harry Dijkstra of CCV share how they are using MariaDB to meet critical high availability requirements, including geographic replication, zero data-loss, zero downtime (both planned and unplanned) and no single point of failure anywhere.
ClustrixDB: how distributed databases scale out (MariaDB plc)
ClustrixDB, now part of MariaDB, is a fully distributed and transactional RDBMS for applications with the highest scalability requirements. In this session Robbie Mihalyi, VP of Engineering for ClustrixDB, provides an introduction to ClustrixDB, followed by an in-depth technical overview of its architecture, with a focus on distributed storage, transactions and query processing – and its unique approach to index partitioning.
Snowflake concepts & hands-on expertise to help get you started on implementing data warehouses using Snowflake, plus the necessary information and skills that will help you master Snowflake essentials.
This document presents an introduction to NoSQL databases. It begins with an overview comparing SQL and NoSQL databases, describing the architecture of NoSQL databases. Examples of different types of NoSQL databases are provided, including key-value stores, column family stores, document databases and graph databases. MapReduce programming is also introduced. Popular NoSQL databases like Cassandra, MongoDB, HBase, and CouchDB are described. The document concludes that NoSQL is well-suited for large, highly distributed data problems.
What is Change Data Capture (CDC) and Why is it Important? (FlyData Inc.)
Check out what Change Data Capture (CDC) is and why it is becoming ever more important. Slides also include useful tips on how to design your CDC implementation.
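One common (if basic) CDC design is polling a change table kept by triggers and shipping rows past a checkpoint downstream. The sketch below assumes a hypothetical audit table `orders_changes(change_id, op, row_json)`; production CDC tools typically read the database's transaction log instead, which is the streaming approach discussed elsewhere on this page.

```python
# Minimal CDC polling sketch against a hypothetical change table.
import json, sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders_changes (change_id INTEGER PRIMARY KEY, op TEXT, row_json TEXT)")
conn.execute("INSERT INTO orders_changes (op, row_json) VALUES (?, ?)",
             ("INSERT", json.dumps({"id": 1})))

last_seen = 0  # checkpoint: last change_id already shipped downstream

def poll_changes():
    global last_seen
    rows = conn.execute(
        "SELECT change_id, op, row_json FROM orders_changes "
        "WHERE change_id > ? ORDER BY change_id",
        (last_seen,),
    ).fetchall()
    for change_id, op, row_json in rows:
        print(op, json.loads(row_json))  # apply/ship the change downstream
        last_seen = change_id            # advance the checkpoint

poll_changes()
```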
Data warehouses are time-variant in the sense that they maintain both historical and (nearly) current data. Operational databases, in contrast, contain only the most current, up-to-date data values, and they generally maintain this information for no more than a year. Data warehouses, by comparison, are generally loaded from the operational databases daily, weekly, or monthly, and the data is then typically maintained for a long period.
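A small sketch of what "time variant" means in practice: the warehouse accumulates a dated row per load, while the operational table holds only the current value. Table and column names here are invented for illustration.

```python
# Periodic warehouse load: operational state is overwritten, warehouse
# keeps dated history. Names are illustrative only.
import datetime as dt

operational = {"cust-1": {"tier": "gold"}}  # current state only
warehouse = []                              # full dated history

def nightly_load(as_of: dt.date):
    """Append a dated snapshot of each operational row to the warehouse."""
    for key, attrs in operational.items():
        warehouse.append({"key": key, "load_date": as_of, **attrs})

nightly_load(dt.date(2024, 1, 1))
operational["cust-1"]["tier"] = "platinum"  # OLTP update overwrites in place
nightly_load(dt.date(2024, 1, 2))
print(warehouse)  # both the old and the new tier survive in the warehouse
```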
Stretch Database allows migrating historical transactional data from an on-premises SQL Server database transparently to Microsoft Azure cloud storage. It enables seamless queries of data regardless of its location. Some limitations include the inability to enforce uniqueness on stretched tables and restrictions on allowed actions. Performance can degrade due to the additional overhead of query translation and data movement between on-premises and cloud locations. Remote data files provide an alternative method of archiving to cloud storage without changes to table structures; their only overhead is additional latency.
Delivering rapid-fire Analytics with Snowflake and Tableau (Harald Erb)
Until recently, advancements in data warehousing and analytics were largely incremental. Small innovations in database design would herald a new data warehouse every
2-3 years, which would quickly become overwhelmed with rapidly increasing data volumes. Knowledge workers struggled to access those databases with development intensive BI tools designed for reporting, rather than exploration and sharing. Both databases and BI tools were strained in locally hosted environments that were inflexible to growth or change.
Snowflake and Tableau represent a fundamentally different approach. Snowflake’s multi-cluster shared data architecture was designed for the cloud and to handle orders-of-magnitude larger data volumes at blazing speed. Tableau was made to foster an interactive approach to analytics, freeing knowledge workers to use the speed of Snowflake to their greatest advantage.
This document provides a curriculum vitae for Vincent Fiorilli, who has over 30 years of experience as a database administrator (DBA) specializing in Netezza and DB2. He has extensive experience performing tasks such as performance tuning, data modeling, database design, implementation, and maintenance. He has worked on projects involving migration between database platforms and large-scale data warehousing. His technical skills include Netezza, DB2, SQL, Linux, and several hardware and software platforms. He has held consulting roles providing DBA and architecture services to several government and private organizations in Canada.
This document summarizes the key points from a presentation on SQL Server 2016. It discusses in-memory and columnstore features, including performance gains from processing data in memory instead of on disk. New capabilities for real-time operational analytics are presented that allow analytics queries to run concurrently with OLTP workloads using the same data schema. Maintaining a columnstore index for analytics queries is suggested to improve performance.
How Alibaba Cloud scaled ApsaraDB with MariaDB MaxScale (MariaDB plc)
ApsaraDB is the leading cloud database in China, with millions of database instances running on it. However, the diversity and complexity of the mission-critical applications using it brought a huge challenge to ApsaraDB: scalability, a long-time pain point. To solve the problem, in mid-2018 and after a careful evaluation, an elegant solution was found in MariaDB MaxScale. So far, the deep synergy of MariaDB MaxScale and ApsaraDB has proved very successful, as thousands of high-demand ApsaraDB customers are benefiting from a much-improved experience. In this presentation, we are going to share the following topics:
- How ApsaraDB is using MariaDB MaxScale
- Best practices when leveraging MariaDB MaxScale with ApsaraDB
- Next steps and future plans for MariaDB MaxScale and ApsaraDB
Christian Coté is an ETL architect and developer with experience using tools like DTS/SSIS, Hummingbird Genio, Informatica, and Datastage. He has worked in domains including pharmaceuticals, finance, insurance, and manufacturing. He specializes in data warehousing and business intelligence and is a Microsoft MVP for SQL Server.
2 years ago if someone had claimed they could stand up a petabyte scale data warehouse in under an hour and then have a non-technical business user querying it live 30 minutes later without knowing any SQL or coding language, they would have been laughed out of the room. These days, that’s called taking advantage of disruptive technology. Amazon Web Services and Tableau Software have shifted the entire paradigm by which organizations not only store and access their data, but ultimately how they innovate with it. The fast, scalable, and inexpensive services that AWS provides for housing data combined with Tableau’s unbelievably flexible and user friendly visual analytic solution means that within hours an organization can securely put the power of their massive data assets into the hands of their domain experts without expensive overhead or lengthy ramp-up time. Attend this webinar to learn how Amazon Web Services and Tableau Software are leveraged together every day to:
• Empower visual ad-hoc data discovery against big data
• Revolutionize corporate reporting and dashboards
• Promote data driven decision making at every level
The presentation will include:
• A live demonstration of AWS and Tableau working together
• A real customer case study focused on fraud detection and online video metrics
• Live Q&A and an opportunity to trial both solutions
Pass chapter meeting dec 2013 - compression a hidden gem for io heavy databas... (Charley Hanania)
Compression: a hidden Gem for IO heavy Databases
The limiting factor in most database systems is the ability to read and write data to the IO subsystem.
We're still using storage layouts and methodologies in SQL Server that are a reflection of old spinning media in times gone by.
Until major changes are made to the internal storage layouts, we have "some" hope with options such as data compression, sparse columns and filtered indexes, which not only save space on disk, but also reflect a saving in memory.
In this session we will go over the IO savings technologies presented in SQL Server, and discuss how implementing some of these will assist in your operational performance goals.
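To make the IO argument concrete, here is a toy Python run-length encoding of a repetitive, low-cardinality column. This illustrates why compression helps an IO-bound database (fewer pages read from disk, fewer pages in the buffer pool); SQL Server's actual ROW/PAGE compression uses different, richer encodings.

```python
# Toy run-length encoding of a low-cardinality column, to show how
# repetitive data collapses. Not SQL Server's real compression format.
from itertools import groupby

column = ["DE"] * 500 + ["CH"] * 300 + ["AT"] * 200

def rle(values):
    return [(v, len(list(g))) for v, g in groupby(values)]

encoded = rle(column)
print("raw entries:", len(column))    # 1000
print("encoded runs:", len(encoded))  # 3 runs instead of 1000 entries
```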
Presenter: Charley Hanania, MVP
Charley is Principal Consultant at QS2 AG in Switzerland and has consulted to organisations of all sizes during his extensive career in Database and Platform Consulting.
He's been focussed on SQL Server since v4.2 on OS/2 and with over 15 years of experience in IT he's supported companies in the areas of DB training, development, architecture & administration throughout Europe, America & Australasia.
Communities are Charley's passion and he became active in database communities in the mid 90's, participating in heterogeneous database user groups in Australia. He continues to lead an active role through community events such as Database Days, the European PASS Conference, PASS & the Swiss PASS Chapter.
Webinar How to Achieve True Scalability in SaaS Applications (Techcello)
This document summarizes a webinar on achieving true scalability in SaaS applications. It discusses key factors demanding scalability like increased user concurrency. It covers best practices for scaling the web application and data tiers, such as using auto-scaling, queues, and databases like DynamoDB. It also discusses leveraging cloud services for scalability and provides examples of scaling on AWS. Speaker profiles are included for experts from AWS and Techcello discussing scalability strategies.
MySQL is an open-source relational database management system that works on many platforms. It provides multi-user access, supports many storage engines, and is backed by Oracle. SQL is the core of a relational database and is used for accessing and managing the data. The different subsets of SQL are DDL, DML, DCL, and TCL. MySQL has many features, including ease of management, robust transactional support, high performance, low total cost of ownership, and scalability.
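The four SQL subsets can be shown in one small, runnable example. Here sqlite3 stands in for MySQL; the DCL statement is shown only as a comment in MySQL syntax, since SQLite has no user accounts.

```python
# DDL, DML, TCL and (as a comment) DCL, using sqlite3 as a stand-in.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")  # DDL
conn.execute("INSERT INTO users (name) VALUES ('ada')")                 # DML
conn.commit()                                                           # TCL
# DCL example (MySQL syntax): GRANT SELECT ON app.users TO 'reader'@'%';
print(conn.execute("SELECT id, name FROM users").fetchall())            # DML
```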
The document discusses various techniques for managing performance and concurrency in SQL Server databases. It covers new features in SQL Server 2008/R2 such as read committed snapshot isolation, partition-level lock escalation, filtered indexes, and bulk loading. It also discusses tools for monitoring performance like the Utility Control Point and Performance Monitor. The document uses case studies to demonstrate how these techniques can be applied.
Maintenance plans provide a way to automate database maintenance tasks such as integrity checks, index maintenance, and backups. They can be created using the Maintenance Plan Wizard or Maintenance Plan Designer. Common tasks include checking database integrity with DBCC CHECKDB, reorganizing or rebuilding indexes, updating statistics, and performing full, differential or transaction log backups. Care must be taken to choose the right tasks and schedule to maintain performance and protect the database.
SQLSaturday is a training event for SQL Server professionals and those wanting to learn about SQL Server. This event will be held Jun 13 2015 at Hochschule Bonn-Rhein-Sieg, Grantham-Allee 20, St. Augustin, Rheinland, 53757, Germany. Admittance to this event is free, all costs are covered by donations and sponsorships. Please register soon as seating is limited, and let friends and colleagues know about the event.
Maintenance Plans for Beginners (but not only) | Every experienced administrator has used, to some extent, what are called Maintenance Plans. During this session, I'd like to discuss how they can be useful, what functionality they provide when we use them, and what to look out for. A level 200-300 session, with an open discussion.
Achieving Cost and Resource efficiency within OpenStack through Trove Database-As-A-Service (DBaaS)
Trove is an OpenStack DBaaS that allows organizations to leverage their OpenStack infrastructure in a cost-effective way to deploy solutions built upon traditional databases. Trove provides a unified solution for all database types and can provide cost and resource savings through reduced complexity. It allows rapid provisioning of database instances, standardized infrastructure, and self-service capabilities for database management. Trove is integrated with OpenStack and supports both relational and non-relational databases to provide a flexible database solution.
The document is a resume for Mostafa El-Masry, who is seeking a career as a senior database administrator or database analyst. It outlines his extensive experience over 15 years in database administration, including roles at the Ministry of Social Affairs and Ministry of Higher Education in Saudi Arabia. It also lists his technical skills and qualifications, such as being a Microsoft Certified IT Professional in SQL Server 2008 Database Administration.
Best Practices for Supercharging Cloud Analytics on Amazon Redshift (SnapLogic)
In this webinar, we discuss how the secret sauce of your business analytics strategy remains rooted in your approach, methodologies and the amount of data incorporated into this critical exercise. We also address best practices to supercharge your cloud analytics initiatives, and tips and tricks on designing the right information architecture, data models and other tactical optimizations.
To learn more, visit: http://www.snaplogic.com/redshift-trial
Data warehouse 2.0 and SQL Server architecture and vision (Klaudiia Jacome)
The document discusses the evolution of data warehousing architectures from DW 1.0 to DW 2.0. It summarizes how SQL Server has also evolved its architecture to support the needs of advanced data warehouses aligned with DW 2.0, including features like sequential data access for analytics, easy migration from data marts to enterprise data warehouses, and distributed processing to reduce costs for large volumes of data.
RightScale Webinar: So you want to move to the cloud... but you’re not sure what that means, or where you would even start. Or you want to get your feet wet with a proof-of-concept project before you bring out the big guns. We asked Brian Adler, our Professional Services Architect who works directly with customers on cloud projects every single day, to select five cloud projects that you can get started with (and complete!) quickly. In this webinar, Brian and Rafael Saavedra, our VP of Engineering, will walk you through those five projects and will help you demonstrate success in the cloud now.
The document discusses data warehousing concepts including:
1) A data warehouse is a subject-oriented, integrated, and non-volatile collection of data used for decision making. It stores historical and current data from multiple sources.
2) The architecture of a data warehouse is typically three-tiered, with an operational data tier, data warehouse/data mart tier for storage, and client access tier. OLAP servers allow analysis of stored data.
3) ROLAP and MOLAP refer to relational and multidimensional approaches for OLAP. ROLAP dynamically generates data cubes from relational databases, while MOLAP pre-calculates and stores aggregated data in multidimensional structures.
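What a ROLAP engine does under the hood can be sketched as aggregating relational rows on the fly along chosen dimensions, rather than reading a precomputed MOLAP cube. The fact rows and dimension names below are invented for illustration.

```python
# Computing "cube cells" dynamically from relational fact rows (ROLAP-style).
from collections import defaultdict

sales = [  # (region, product, amount) -- illustrative fact rows
    ("EU", "book", 10), ("EU", "pen", 4), ("US", "book", 7), ("US", "book", 5),
]

def rollup(rows, dims):
    """Aggregate the amount measure over the requested dimension columns."""
    cells = defaultdict(int)
    for region, product, amount in rows:
        key = tuple(v for v, d in ((region, "region"), (product, "product")) if d in dims)
        cells[key] += amount
    return dict(cells)

print(rollup(sales, {"region"}))             # {('EU',): 14, ('US',): 12}
print(rollup(sales, {"region", "product"}))  # per (region, product) cells
```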
AWS Redshift Introduction - Big Data Analytics (Keeyong Han)
Redshift is a scalable SQL database in AWS that can store up to 1.6PB of data across multiple servers. It uses a columnar data storage model that makes adding or removing columns fast. Data is uploaded from S3 using SQL COPY commands and queried using standard SQL. The document provides recommendations for getting started with Redshift, such as performing daily full refreshes initially and then implementing incremental update mechanisms to enable more frequent updates.
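A hedged sketch of the load pattern described above: a daily full refresh via COPY from S3. The `connect()` factory is hypothetical (it could be backed by psycopg2), and the bucket, table, and IAM role names are placeholders, not real resources.

```python
# Daily full refresh into Redshift: truncate, then bulk COPY from S3.
# connect() is a hypothetical factory returning a DB-API connection.
COPY_SQL = """
COPY analytics.events
FROM 's3://my-bucket/events/2024-06-01/'
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy'
FORMAT AS CSV;
"""

def full_refresh(connect):
    conn = connect()
    with conn, conn.cursor() as cur:
        cur.execute("TRUNCATE analytics.events")  # daily full refresh...
        cur.execute(COPY_SQL)                     # ...then bulk load from S3
```

An incremental variant, as the summary suggests, would COPY only new S3 prefixes into a staging table and merge them, enabling more frequent updates.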
Marketing Automation at Scale: How Marketo Solved Key Data Management Challen... (Continuent)
Marketo uses Continuent Tungsten to solve key data management challenges at scale. Tungsten provides high availability, online maintenance, and parallel replication to allow Marketo to process over 600 million MySQL transactions per day across more than 7TB of data without downtime. Tungsten's innovative caching and sharding techniques help replicas keep up with Marketo's high transaction volumes and uneven tenant sizes. The solution has enabled fast failover, rolling maintenance, and scaling to thousands of customers.
OLAP (online analytical processing) allows users to easily extract and analyze data from different perspectives. It stores data in multidimensional databases to allow for complex queries. There are three main types of OLAP - relational, multidimensional, and hybrid. OLAP is used with data warehouses to enable analytics like data mining and decision making. It provides benefits over transactional systems by facilitating flexible analysis of integrated data over time.
Embarking on building a modern data warehouse in the cloud can be an overwhelming experience due to the sheer number of products that can be used, especially when the use cases for many products overlap others. In this talk I will cover the use cases of many of the Microsoft products that you can use when building a modern data warehouse, broken down into four areas: ingest, store, prep, and model & serve. It’s a complicated story that I will try to simplify, giving blunt opinions of when to use what products and the pros/cons of each.
Choosing technologies for a big data solution in the cloud (James Serra)
Has your company been building data warehouses for years using SQL Server? And are you now tasked with creating or moving your data warehouse to the cloud and modernizing it to support “Big Data”? What technologies and tools should use? That is what this presentation will help you answer. First we will cover what questions to ask concerning data (type, size, frequency), reporting, performance needs, on-prem vs cloud, staff technology skills, OSS requirements, cost, and MDM needs. Then we will show you common big data architecture solutions and help you to answer questions such as: Where do I store the data? Should I use a data lake? Do I still need a cube? What about Hadoop/NoSQL? Do I need the power of MPP? Should I build a "logical data warehouse"? What is this lambda architecture? Can I use Hadoop for my DW? Finally, we’ll show some architectures of real-world customer big data solutions. Come to this session to get started down the path to making the proper technology choices in moving to the cloud.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf (Paige Cruz)
Monitoring and observability aren’t traditionally found in software curriculums, and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is part of our current company’s observability stack.
While the dev and ops silo continues to crumble, many organizations still relegate monitoring & observability to ops, infra and SRE teams. This is a mistake: achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and will share these foundational concepts to build on:
Full-RAG: A modern architecture for hyper-personalization (Zilliz)
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features
available on those devices, but many of the features provide convenience and capability but sacrifice security. This best practices guide outlines steps the users can take to better protect personal devices and information.
Enhancing adoption of Open Source Libraries. A case study on Albumentations.AI (Vladimir Iglovikov, Ph.D.)
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
This case study covers various aspects, including:
People: The contributors and community that have supported Albumentations.
Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster, ex-Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Building RAG with self-deployed Milvus vector database and Snowpark Container... (Zilliz)
This talk will give hands-on advice on building RAG applications with an open-source Milvus database deployed as a docker container. We will also introduce the integration of Milvus with Snowpark Container Services.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
What do a Lego brick and the XZ backdoor have in common?Speck&Tech
ABSTRACT: At first glance, a Lego brick and the XZ backdoor might seem to have in common only the fact that they are both building blocks, or dependencies, of creative and software projects. In reality, a Lego brick and the XZ backdoor case share much more than that.
Join the presentation to dive into a story of interoperability, standards, and open formats, and then discuss the important role contributors play in a sustainable open source community.
BIO: An advocate of free software and of standard, open formats. She has been an active member of the Fedora and openSUSE projects and co-founded the LibreItalia association, where she was involved in several LibreOffice-related events, migrations, and training sessions. Previously she worked on LibreOffice migrations and training courses for several public administrations and private organizations. Since January 2020 she has worked at SUSE as a Software Release Engineer for Uyuni and SUSE Manager; when not pursuing her passion for computers and for Geeko, she cultivates her curiosity about astronomy (the origin of her nickname, deneb_alpha).
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
The talk discusses the climate impact and sustainability of software testing. ICT and testing must carry their part of the global responsibility to help with climate warming. We can minimize our carbon footprint, but we can also have a carbon handprint: a positive impact on the climate. Sustainability can be added to the quality characteristics and then measured continuously. Test environments can be used less, at smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
A Comprehensive 20-Point Checklist for Designing and Developing a WebsitePixlogix Infotech
Dive into the world of Website Designing and Developing with Pixlogix! Looking to create a stunning online presence? Look no further! Our comprehensive checklist covers everything you need to know to craft a website that stands out. From user-friendly design to seamless functionality, we've got you covered. Don't miss out on this invaluable resource! Check out our checklist now at Pixlogix and start your journey towards a captivating online presence today.
3. Shehap El-Nagar
Shehap is an MVP, MCTS, and MCITP in SQL Server, and a database consultant and architect for many banking, telecom, ministry, and governmental organizations across the Gulf. He has deep knowledge of T-SQL performance, hardware performance issues, data warehousing solutions, SQL Server replication, clustering solutions, and database design for many kinds of systems.
He founded the biggest SQL Server community in the Middle East, http://sqlserver-performance-tuning.net/; you can watch its success stories at http://www.youtube.com/user/ShehapElNagar.
He is a moderator and author at http://www.sql-server-performance.com and the first SQL Server author at MSDN Arabia (http://msdn.microsoft.com/ar-sa/library/jj149119.aspx).
He speaks at SQL Saturday events worldwide, at local events in Saudi Arabia, and at many online events, and has produced more than 90 video tutorials plus many private sessions for .NET developers and database administrators.
He is also an active participant in the Microsoft SQL Server forums at http://social.technet.microsoft.com.
More about him can be found on the Microsoft MVP site: http://mvp.microsoft.com/en-us/mvp/Shehap%20El-Nagar5000188.
Contact: idgdirector@yahoo.com; mobile: 00966560700733.
4. Agenda and Overview:
First: Definitions and benefits of DWH
•Definition of data warehousing solutions
•Benefits of DWH solutions
•Why is RTDWH (Real-Time Data Warehousing) so necessary?
•Data warehouse vs. data mart
•Relational DB vs. dimensional DB
•Dimensional database vs. multidimensional database
•Star schema vs. snowflake schema
•Techniques of DWH solutions
Second: RTDWH for online reporting
•Technique and concepts
•Demo
Third: DWH for online archiving
•Technique and concepts
•Demo
Fourth: DWH for online ETL
•Technique and concepts
•Demo
6. Definition of Data Warehousing
[Diagram: four relational databases feed an optimized loader; data passes through data cleansing and de-normalization into the data warehouse engine, which is backed by a metadata repository.]
7. Benefits of Data Warehousing:
•Data consolidation and organization
•Data standardization for different attributes, such as collation
•Flexible support for numerous RDBMS sources, like SQL Server, Oracle, Teradata, Informix, SAP BI, Sybase, Access, CSV files, Excel, etc.
•Scaling up reports, either SSRS or SSAS (OLAP) reports
•Speeding up report performance
8. Why Real-Time Data Warehousing?
•Active decision support
•Business activity monitoring (BAM)
•Alerting
•Efficiently execute business strategy
9. Relational DB vs. Dimensional DB:
•A relational DB is a normalized DB for OLTP transaction purposes. More normalization >>> fewer columns per table >>> fewer indexes needed >>> less I/O cost on the clustered indexes used for OLTP inserts, updates, and deletes.
•A dimensional DB is a de-normalized DB for OLAP purposes. More interrelated columns in one table >>> fewer joins needed >>> more covering compound indexes (a small index sketch follows).
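To make the covering-index point concrete, here is a minimal T-SQL sketch; the dbo.SalesFact table and its columns are illustrative assumptions, not objects from the deck.

-- On a wide, de-normalized table, one compound covering index can
-- satisfy a typical report query without extra joins or lookups;
-- the same index would be costly to maintain under OLTP write rates.
CREATE NONCLUSTERED INDEX IX_SalesFact_Report
ON dbo.SalesFact (SaleDate, Region)
INCLUDE (CustNo, ProdNo, CityName, Amount);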
10. Data Warehouse vs. Data Mart:
•A data warehouse is a global repository for a wide scale of business.
•A data mart is a smaller repository for a specific business scope.
Therefore, we could say a data mart solution is a subset of a bigger data warehousing solution.
12. Dimensional Database vs. Multidimensional Database:
•Dimensional DBs can be used as staging DBs for SSAS reports, or directly for SSRS reports.
•A multidimensional DB is an SSAS DB composed of cubes, which are formed basically of:
 •Fact tables, which contain the business-process core, where the aggregative columns called measures are found.
 •Dimension tables, which contain the lookup details relevant to that aggregative data.
[Diagram: a dimensional DB (the DWH DB, under the DB service) feeds a multidimensional DB (the OLAP DB, under the OLAP service), which a decision support client consumes at the presentation layer.]
13. Star Schema vs. Snowflake Schema:
•A snowflake schema closely follows the star schema design, but breaks the design down further into smaller tables to avoid more redundancy of columns.
•A snowflake schema is not recommended for either OLAP or OLTP transactions.
15. Snowflake Schema
[Diagram: a fact table (date, custno, prodno, cityname, region, ...) linked to name, telephone, gender, and marital-status attributes, with gender and marital status broken out into separate lookup tables.]
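To make the layout concrete, here is a minimal T-SQL sketch of the snowflake design from the slide; any table or column name beyond what the slide shows is an illustrative assumption.

-- Small lookup tables broken out of the customer dimension
-- (the "snowflaking" that avoids repeating their columns).
CREATE TABLE dbo.DimGender (
    GenderKey     tinyint     NOT NULL PRIMARY KEY,
    GenderName    varchar(10) NOT NULL
);

CREATE TABLE dbo.DimMaritalStatus (
    MaritalKey    tinyint     NOT NULL PRIMARY KEY,
    MaritalStatus varchar(20) NOT NULL
);

-- The customer dimension references the lookup tables
-- instead of storing their columns redundantly.
CREATE TABLE dbo.DimCustomer (
    CustNo     int          NOT NULL PRIMARY KEY,
    Name       varchar(100) NOT NULL,
    Telephone  varchar(20)  NULL,
    GenderKey  tinyint      NOT NULL REFERENCES dbo.DimGender (GenderKey),
    MaritalKey tinyint      NOT NULL REFERENCES dbo.DimMaritalStatus (MaritalKey)
);

-- The fact table keeps the grain shown on the slide:
-- date, customer, product, city/region.
CREATE TABLE dbo.FactSales (
    SaleDate date        NOT NULL,
    CustNo   int         NOT NULL REFERENCES dbo.DimCustomer (CustNo),
    ProdNo   int         NOT NULL,
    CityName varchar(50) NOT NULL,
    Region   varchar(50) NOT NULL
);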
16. Data Warehousing Techniques
•Old SQL Server 2005-era commands (SELECT/INSERT/UPDATE/DELETE)
•The SQL Server 2008 MERGE command, which can replace all of the above commands more efficiently in one statement (see the sketch below)
•DTS (Data Transformation Services) and SSIS packages
•Enterprise platform solutions for LDWH (Large DWH)
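As a rough illustration of how one MERGE statement stands in for the separate INSERT/UPDATE/DELETE passes, here is a minimal T-SQL sketch; the dbo.DwhOrders and dbo.StagingOrders tables and their columns are illustrative assumptions, not objects from the deck.

-- One MERGE replaces the older separate INSERT, UPDATE, and DELETE
-- passes against the warehouse table.
MERGE dbo.DwhOrders AS target
USING dbo.StagingOrders AS source
    ON target.OrderID = source.OrderID
WHEN MATCHED AND (target.Amount <> source.Amount
               OR target.Status <> source.Status) THEN
    UPDATE SET target.Amount = source.Amount,
               target.Status = source.Status
WHEN NOT MATCHED BY TARGET THEN
    INSERT (OrderID, Amount, Status)
    VALUES (source.OrderID, source.Amount, source.Status)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;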
17. Enterprise Platform Solutions
Fast data-tracking solutions:
•Sybase IQ
•Red Brick Warehouse
•IBM DB2 MVS / Universal Server
•IBM Data Warehousing
•Teradata
•Informix Online Dynamic Server, XPS (Extended Parallel Server), and Universal Server for object-relational applications
19. Technique of DWH Solutions Used for Online Reporting
•Create two tables (one temp table; the second is the DWH table itself).
•Make all DML transactions on the temp table.
•Then compare the temp table's results with the DWH table.
•If any record or column does not match, bulk-merge from the temp table to the DWH table.
•You can now use this DWH table for your online reports (a sketch of the flow follows).
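A minimal sketch of the two-table flow described above; the dbo.TempSales and dbo.DwhSales names are illustrative assumptions, not objects from the deck.

-- All DML goes to a staging/temp table; mismatches are then
-- merged in bulk into the DWH table that serves the reports.
CREATE TABLE dbo.TempSales (SaleID int PRIMARY KEY, Amount money NOT NULL);
CREATE TABLE dbo.DwhSales  (SaleID int PRIMARY KEY, Amount money NOT NULL);

-- 1) DML transactions hit the temp table only.
INSERT INTO dbo.TempSales (SaleID, Amount) VALUES (1, 100.00), (2, 250.00);

-- 2) Compare, and bulk-merge only where a record or column differs.
MERGE dbo.DwhSales AS d
USING dbo.TempSales AS t
    ON d.SaleID = t.SaleID
WHEN MATCHED AND d.Amount <> t.Amount THEN
    UPDATE SET d.Amount = t.Amount
WHEN NOT MATCHED BY TARGET THEN
    INSERT (SaleID, Amount) VALUES (t.SaleID, t.Amount);

-- 3) dbo.DwhSales is now current and can serve the online reports.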
20. Concepts of DWH Solutions Used for Online Reporting
1- SET XACT_ABORT ON: ensures the strongest transactional behavior for the group of DML statements, committing all of them if all succeed and rolling all of them back if any one fails.
2- SET NOCOUNT ON: speeds up queries by not returning the count of affected records on each run.
3- SET DEADLOCK_PRIORITY LOW: avoids any impact on end-user transactions while this online data warehousing runs.
4- TRY/CATCH commands: capture any possible errors and report them by mail.
5- BULK_LOGGED recovery mode: saves storage capacity efficiently during the bulk merge.
6- READ COMMITTED SNAPSHOT isolation (row versioning) is recommended to avoid heavy locks and deadlocks.
These settings combine into the skeleton sketched below.
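A hedged sketch of how the six concepts fit together; the DwhAlerts mail profile, recipient address, and DwhDb database name are illustrative assumptions, and the merge body is elided.

SET XACT_ABORT ON;          -- commit all or roll back all of the DML group
SET NOCOUNT ON;             -- suppress per-statement rowcount messages
SET DEADLOCK_PRIORITY LOW;  -- yield to end-user transactions if deadlocked

BEGIN TRY
    BEGIN TRANSACTION;
    -- ... bulk MERGE from the temp table into the DWH table ...
    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
    DECLARE @err nvarchar(4000) = ERROR_MESSAGE();
    EXEC msdb.dbo.sp_send_dbmail        -- report the failure by mail
         @profile_name = N'DwhAlerts',
         @recipients   = N'dba@example.com',
         @subject      = N'Online DWH merge failed',
         @body         = @err;
END CATCH;

-- BULK_LOGGED recovery and READ COMMITTED SNAPSHOT are database-level
-- options, set once per database rather than per session:
-- ALTER DATABASE DwhDb SET RECOVERY BULK_LOGGED;
-- ALTER DATABASE DwhDb SET READ_COMMITTED_SNAPSHOT ON;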
23. Techniques of DWH Solutions for Archiving
•Bulk-insert the old data from the source table into an archive table.
•Bulk-delete from the source table after the first step succeeds.
•The bulk delete should be split into smaller batches with a small number of records (e.g., 1,000) and a delay of 5-30 seconds between one batch and the next, to avoid any tangible locks or deadlocks (see the sketch below).
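A sketch of the batched archive flow; the dbo.Sales and dbo.SalesArchive tables, the cutoff date, the 1,000-row batch size, and the 5-second pause are illustrative assumptions within the slide's 5-30 second guidance.

-- Step 1: bulk-copy the old rows into the archive table.
INSERT INTO dbo.SalesArchive (SaleID, SaleDate, Amount)
SELECT SaleID, SaleDate, Amount
FROM   dbo.Sales
WHERE  SaleDate < '2013-01-01';

-- Step 2: delete from the source in small batches, pausing between
-- batches so locks never escalate into tangible blocking.
DECLARE @rows int = 1;
WHILE @rows > 0
BEGIN
    DELETE TOP (1000) FROM dbo.Sales
    WHERE SaleDate < '2013-01-01';

    SET @rows = @@ROWCOUNT;      -- stop once no old rows remain

    WAITFOR DELAY '00:00:05';    -- pause between batches
END;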
24. Concepts of DWH Solutions Used for Archiving
1- BULK_LOGGED recovery mode: saves storage capacity efficiently during the bulk operations, as the next workshops will show.
2- WAITFOR DELAY '00:00:30' is not a risky wait here; it is just a normal wait command, like a Service Broker wait.
3- The bulk-insert and bulk-delete phases can be run in different transactions at different time intervals without any risk.
4- You can also validate the results using the OUTPUT command (sketched below).
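For the OUTPUT-based validation in point 4, a sketch along these lines (table and column names are illustrative assumptions):

-- Capture every deleted row so the delete can be reconciled
-- against what actually landed in the archive table.
DECLARE @deleted TABLE (SaleID int, SaleDate date, Amount money);

DELETE TOP (1000) FROM dbo.Sales
OUTPUT deleted.SaleID, deleted.SaleDate, deleted.Amount
INTO   @deleted
WHERE  SaleDate < '2013-01-01';

-- Compare this count (and the rows themselves) with the archive.
SELECT COUNT(*) AS DeletedRows FROM @deleted;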
27. Technique of DWH Solutions Used for ETL
•Run your ETL process in parallel with end-user activity, but against a different table rather than the online tables.
•Once it finishes, scan for all mismatches between the two tables using the three data warehousing statements.
28. Concepts of DWH Solutions Used for ETL
•Scan for any newly inserted data in the source tables, to be inserted into the target tables.
•Scan for any updated data by finding records that share PK values between the two tables but differ in any other column.
•Scan for any deleted data using EXCEPT commands.
•The three phases can be undertaken asynchronously without any risk at all (see the sketch below).
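A sketch of the three scans; the dbo.EtlOrders (ETL staging) and dbo.Orders (online target) tables, keyed on OrderID, are illustrative assumptions.

-- 1) New rows: keys in the ETL table that the target does not have.
SELECT s.OrderID
FROM   dbo.EtlOrders AS s
WHERE  NOT EXISTS (SELECT 1 FROM dbo.Orders AS t
                   WHERE t.OrderID = s.OrderID);

-- 2) Updated rows: same PK but some other column differs. EXCEPT
--    also returns brand-new keys, so join back to the target to
--    keep only keys that already exist there.
SELECT d.OrderID, d.Amount, d.Status
FROM (
    SELECT OrderID, Amount, Status FROM dbo.EtlOrders
    EXCEPT
    SELECT OrderID, Amount, Status FROM dbo.Orders
) AS d
WHERE EXISTS (SELECT 1 FROM dbo.Orders AS t
              WHERE t.OrderID = d.OrderID);

-- 3) Deleted rows: keys still in the target but gone from the source.
SELECT OrderID FROM dbo.Orders
EXCEPT
SELECT OrderID FROM dbo.EtlOrders;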