The document provides an overview of different types of database testing including front end database testing, structural backend testing, functional backend testing, database migration testing, data warehouse testing, and batch job execution testing. It describes the key aspects to test for each type, such as verifying database schemas, stored procedures, triggers, data integrity, security, performance, and more. Screenshots are also included to exemplify some of the testing processes.
Description:
ETL stands for Extract, Transform, Load, which simply means the process where you extract data from Source tables, transform it into the desired format based on certain rules, and finally load it into Target tables. There are numerous tools that help with the ETL process, Informatica and Control-M being two notable ones.
So ETL Testing means testing this entire process, either with a tool or at table level, with the help of test cases and a Rules Mapping document.
In ETL Testing, the following are validated -
1) Data file loads from the Source system onto the Source tables.
2) The ETL job that extracts data from the Source tables and moves it to the Staging tables (the Transform process).
3) Data validation within the Staging tables to check that all Mapping Rules / Transformation Rules are followed.
4) Data validation within the Target tables to ensure the data is in the required format and there is no data loss from Source to Target tables (a SQL sketch follows below).
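As a minimal illustration, assuming hypothetical SOURCE_ORDERS and TARGET_ORDERS tables, the count and minus-query checks behind steps 1 and 4 often look like this (Oracle-style MINUS; SQL Server and PostgreSQL use EXCEPT):

    -- Row counts: the two results should match after a clean load
    SELECT COUNT(*) AS source_rows FROM SOURCE_ORDERS;
    SELECT COUNT(*) AS target_rows FROM TARGET_ORDERS;

    -- Minus query: rows present in Source but missing or altered in Target;
    -- an empty result means no data loss
    SELECT order_id, customer_id, order_total FROM SOURCE_ORDERS
    MINUS
    SELECT order_id, customer_id, order_total FROM TARGET_ORDERS;

Running the same minus query in the other direction catches rows in the Target that no Source row explains.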
Job Scope: 100% job guarantee; this is a rare skill and many companies face a crunch for qualified candidates.
Duration: Normal Track - 4 weekends
Fast Track - 2 weekends / 2 days
Fee: 8K
New Batch: Every weekend
A data driven etl test framework sqlsat madison, Terry Bunio
This document provides an overview and summary of a SQL Saturday event on automated database testing. It discusses:
1. The presenter's background and their company Protegra which focuses on Agile and Lean practices.
2. The learning objectives of the presentation, which center on why and how to automate database testing using tools like tSQLt and SQLtest.
3. A comparison of Waterfall and Agile methodologies with a focus on how Agile lends itself better to test automation.
4. A demonstration of setting up and running simple tests using tSQLt to showcase how it can automate database testing and make it easier compared to traditional methods (a minimal sketch follows below).
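For context, a tSQLt unit test is just a stored procedure in a test-class schema. A minimal sketch, assuming a hypothetical dbo.LoadStagingOrders ETL step and hypothetical source/staging tables:

    -- Create a test class (a schema that groups related tests)
    EXEC tSQLt.NewTestClass 'EtlTests';
    GO
    -- Test procedures must be named test...
    CREATE PROCEDURE EtlTests.[test staging row count matches source]
    AS
    BEGIN
        -- FakeTable isolates the tables so the test fully controls their contents
        EXEC tSQLt.FakeTable 'dbo.SourceOrders';
        EXEC tSQLt.FakeTable 'dbo.StagingOrders';
        INSERT INTO dbo.SourceOrders (order_id) VALUES (1), (2);

        EXEC dbo.LoadStagingOrders;  -- hypothetical ETL step under test

        DECLARE @source INT = (SELECT COUNT(*) FROM dbo.SourceOrders);
        DECLARE @staged INT = (SELECT COUNT(*) FROM dbo.StagingOrders);
        EXEC tSQLt.AssertEquals @source, @staged;
    END;
    GO
    -- Run every test in the class
    EXEC tSQLt.Run 'EtlTests';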
The Query Service is the new platform solution for querying a variety of data sources. The goal of Query Service is that administrators can configure a metadata description of the data source that can then be used by end users without detailed knowledge of the underlying data source. This session explains how to configure Query Service data sources and use them with the RESTful API or component collection.
- The document provides an introduction to using SoapUI to test web services. It discusses creating a project, adding a WSDL, and exploring the different components such as requests, responses, and endpoints. It also demonstrates how to submit sample requests using a currency conversion web service WSDL and view the responses. The overall goal is to help attendees understand the basics of setting up and using SoapUI to test web services.
The document discusses the basic steps of query processing: parsing and translation, which check syntax and transform the query into a relational algebra expression; query optimization, which turns the initial query plan into the best plan for the dataset by choosing operations and algorithms; and query evaluation, which executes the optimized plan and returns the results.
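Most engines expose the optimizer's decisions directly; for example, a sketch using the widely supported EXPLAIN statement (table names are hypothetical, and the output format varies by engine):

    -- Show the optimized plan without executing the query
    EXPLAIN
    SELECT c.customer_name, SUM(o.order_total) AS total_spent
    FROM   customers c
    JOIN   orders o ON o.customer_id = c.customer_id
    GROUP  BY c.customer_name;
    -- The plan lists the chosen scan methods, the join algorithm
    -- (e.g., hash join), and the order in which operations will run.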
(ATS6-PLAT02) Accelrys Catalog and Protocol Validation, BIOVIA
Accelrys Catalog is a powerful new technology for creating an index of the protocols and components within your organization. You will learn about strategies for indexing and how search capabilities can be deployed to professional client and Web Port end users. You will also learn how to use this technology to find out about system usage to aid with system upgrades, server consolidations, and general system maintenance. The protocol validation capability in the admin portal allows administrators to create standard reports on server usage characteristics. You will learn how to report on violations of IT policies (e.g. around security), bad protocol authoring practices, or missing or incomplete protocol documentation. Developers will also learn how to extend and customize the rules used to create these reports.
MySQL optimisations of Docplanner services, Tomasz Wójcik
I would like to show how small maintenance oversights have negatively impacted the scalability and stability of our applications. I also want to focus on database indexes, which are often forgotten in application maintenance yet have a huge impact on query speeds.
The document discusses various agile tools including test-driven development, version control, continuous integration testing, web performance testing, and load testing. It provides an overview and instructions for using each tool. For test-driven development, it describes writing tests before code, running tests, and refactoring code. For version control, it discusses managing code revisions with Team Foundation Server. For continuous integration, it explains automatically running tests on each code check-in. For performance testing, it outlines recording and running web and load tests to analyze a system's behavior under different usage scenarios.
(ATS3-PLAT07) Pipeline Pilot Protocol Tips, Tricks, and Challenges, BIOVIA
This document provides tips and tricks for using Pipeline Pilot, including how to use protocol search, favorites bar, tool tips, component profiling, design mode, protocol recovery, recursion vs looping, merge/join operations, debugging tips, and RTC subprotocols. It emphasizes best practices like avoiding loops and using recursion instead. Design mode and checkpoints are highlighted as useful debugging aids. Resources like training, support, and the user community are recommended for additional help.
Exciting Features for SQL Devs in SQL 2012, Brij Mishra
SQL 2012 includes several new features for SQL developers, including contained databases, columnstore indexes, sequence objects, data paging improvements, and new analytic functions like LEAD() and LAG(). It also enhances Transact-SQL with new conversion, date/time, and logical functions, and improves metadata discovery and error handling. Tooling is also improved, with tighter SQL Server Management Studio and Visual Studio integration.
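To make the new analytic functions concrete, here is a small sketch (table and column names are hypothetical) of LEAD()/LAG() and the OFFSET-FETCH paging syntax added in SQL 2012:

    -- LAG/LEAD compare each row with its neighbours without a self-join
    SELECT order_date,
           daily_total,
           daily_total - LAG(daily_total, 1, 0) OVER (ORDER BY order_date) AS change_vs_prev_day,
           LEAD(daily_total) OVER (ORDER BY order_date) AS next_day_total
    FROM   dbo.DailySales;

    -- OFFSET-FETCH: the data-paging improvement (page 3 at 20 rows per page)
    SELECT order_id, order_date
    FROM   dbo.Orders
    ORDER  BY order_date
    OFFSET 40 ROWS FETCH NEXT 20 ROWS ONLY;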
Annotation Sniffer is an Eclipse plugin that detects bad smells in code annotations (which are themselves a form of code). It is distributed as a JAR file. The plugin has been improved by making metrics, thresholds, and bad smells extensible through annotations and external configuration instead of hardcoded values, so new metrics, thresholds, and bad smells can be added more easily.
Importing Queries using Mass Import Tool, Datagaps Inc
ETL Validator is a data testing automation platform that allows users to import source and target queries from an existing CSV file to quickly get started testing in ETLV. The CSV file must be in the proper format, with fields in a specific order, including parameters like select, connections, and file. Once imported, ETL Validator will automatically generate a "Query Test Case" for each row in the CSV to test the queries.
This document provides an overview of DbFit, which is a tool for database unit testing. It discusses what DbFit is, its key features for testing databases, how it works by connecting Fit tables to database fixtures, and examples of different DbFit fixtures like Query, Insert, Update, Execute, and ExecuteProcedure. Installation steps are also covered along with using a Wiki to write test scripts.
The document provides guidance on writing high-quality code and functions. It recommends that functions should have a single well-described purpose, use meaningful naming conventions, limit their size and number of parameters, and handle errors. It also emphasizes the importance of testing code through test cases to ensure it works as intended.
Data warehousing testing strategies (Cognos), Sandeep Mehta
The document describes a testing methodology for a data warehouse project. It will involve three phases: unit testing of ETL processes and validating data matches between source systems and the data warehouse; a conference room pilot where users can validate reports and test performance; and system integration testing where users test analytical reporting tools to answer business questions across multiple data sources.
The document discusses object-oriented analysis, design, and programming. It covers topics like use cases, conceptual models, classes, objects, encapsulation, inheritance, polymorphism, interfaces, and access modifiers. The analysis process involves modeling system objects and their interactions. Design refines the analysis models and introduces key concepts. Programming implements the design using languages like C# that support object-oriented principles.
IRE2014 Filtering Tweets Related to an entity, kartik179
This document describes a project to filter tweets related to entities. The team used supervised machine learning, with features extracted from tweets, entity homepages, and Wikipedia pages, to train an SVM model that classifies tweets as related or unrelated to an entity. Preprocessing removed user mentions, URLs, punctuation, and stop words before feature extraction. Testing the model on 61 entities achieved an overall classification accuracy of 80%, with per-entity accuracy ranging from 40% to 96%.
These slides demonstrate ETL Testing; anyone who wants to start learning ETL Testing can make use of this presentation. It covers content related to the overall ETL Testing schema.
Software design with Domain-driven design, Allan Mangune
The document discusses domain-driven software design and related concepts. It begins with an overview of monolithic architecture and modular monoliths. It then covers IdentityServer4 for centralized authentication. Next, it defines domain-driven design and the utility of domain models. It discusses ingredients for effective modeling like prototyping and collaboration. It also covers bounded contexts, entities, value objects, and repositories. Finally, it provides tips for optimizing entities and database operations.
Test strategy utilising mc useful tools, Mark Chappell
1) The document outlines a high level test strategy that involves layering the project under test and identifying components in each layer. It describes identifying test basis documentation, creating a dependency matrix, and formulating an overall test "big picture".
2) Test packs will be designed based on project layers, and key documentation will be stored in a repository to facilitate test coverage analysis. A dependency matrix and big picture diagram will guide regression test selection.
3) Tools like DocIndex, InternetMiner and VisioDecompositer are used to extract and store information from documents, web pages and diagrams to generate the test basis repository, and inform the dependency matrix and big picture diagram.
ETL Testing Services - Safeguard Your Data, BugRaptors
ETL testing is done to ensure that the data that has been loaded from a source to the destination after business transformation is accurate. It also involves the verification of data at various middle stages that are being used between source and destination. To know more, visit our portfolio at www.bugraptors.com
This document discusses using Open Refine to clean and enhance library data. It describes the problem of messy, unstructured data that libraries often have. Open Refine was selected as the best tool to allow both novice and advanced users to clean titles, identifiers, and dates. Extensions were developed for Open Refine that integrate with the GOKb knowledge base and allow "round trip" data journeys between the two systems. The use of Open Refine provides automation, collaboration, and leverages existing skills while helping to manage distributed data cleaning activities.
NPQS provides data engineering testing services including:
- Testing of data integration, ETL processes, data warehouses, OLAP cubes, and reports across various database, ETL and reporting tools.
- A team of 70+ ISTQB certified QA engineers with extensive experience testing complex data warehouse architectures and over 10 BI products.
- Well defined testing methodologies and the use of test automation tools to test all phases from data integration and ETL to data warehousing to OLAP and reporting.
This document discusses techniques for optimizing Power BI performance. It recommends tracing queries using DAX Studio to identify slow queries and refresh times. Tracing tools like SQL Profiler and log files can provide insights into issues occurring in the data sources, Power BI layer, and across the network. Focusing on optimization by addressing wait times through a scientific process can help resolve long-term performance problems.
Test Design and Automation for REST API, Ivan Katunou
This document discusses test design and automation for REST API applications. It covers topics such as the characteristics of RESTful APIs, test design and coverage, and automation. For test design and coverage, it recommends testing endpoints, request methods, headers, bodies, response codes, headers, and bodies. It also discusses testing search, pagination, versioning, and more. For automation, it recommends tools like Postman, RestAssured, and using object mapping, builders, and JSON schema validation for test data. The presenter's contact information is provided at the end.
July webinar | How to Handle the Holiday Retail Rush with Agile Performance T..., Apica
In this Q&A-style webinar, you'll learn:
1. How and why to load test at least three months prior to the holidays
2. How to integrate CI/CD into your holiday load testing
3. How to determine and evaluate load curves
This document provides an overview of API testing and web services protocols. It discusses XML, SOAP, REST, and introduces the tool SoapUI for testing web services. Key points include:
1. XML is used to transport and store data on the web. It has elements, attributes, and syntax rules. XML Namespaces avoid element name conflicts.
2. SOAP is a protocol for accessing web services. It uses XML, includes envelope, header and body elements. WSDL describes SOAP web services operations.
3. REST services use HTTP to manipulate resources via operations like GET, PUT, POST and DELETE. It can output JSON, XML and is language/platform independent.
4.
Finding ways to make ETL loads faster is not always obvious. Moreover, there is a difference in how to tune OLAP vs OLTP databases. Some of the techniques learned through years of tuning EBS seem to make no effect on tuning a BI ETL. This presentation will discuss why this is the case, present some techniques on how to find the bottlenecks in your BI ETL jobs and some techniques to tune these slow SQL statements, improving the speed of nightly ETL jobs. Attendees will learn the steps to monitor ETLs, capture Problem SQL and gain knowledge to improve the overall ETL Performance.
Ivan Katunov. Comaqa Spring 2018. Test Design and Automation for Rest API. COMAQA.BY
How is testing RESTful API services similar to, and how does it differ from, testing other types of applications? What test coverage is sufficient? What best practices exist for automated REST API testing? These and other topics will be covered in the talk.
This document discusses challenges and opportunities in automating testing for data warehouses and BI systems. It notes that while BI projects have adopted agile methodologies, testing has not; large and diverse data volumes create a nearly infinite space of test cases. It proposes a testing lifecycle and V-model for BI systems. Automating complex functional tests, SQL validation, reconciliation, and test data generation can address these challenges by shortening regression cycles and enabling continuous testing. Various automation tools are discussed, including how they validate ETL processes and reporting integrity. Automation helps complete testing and ensures data quality, compliance, and performance.
ETL is a process that extracts data from multiple sources, transforms it to fit operational needs, and loads it into a data warehouse or other destination system. It migrates, converts, and transforms data to make it accessible for business analysis. The ETL process extracts raw data, transforms it by cleaning, consolidating, and formatting the data, and loads the transformed data into the target data warehouse or data marts.
Creating a Data validation and Testing Strategy, RTTS
This document discusses strategies for creating an effective data validation and testing process. It provides examples of common data issues found during testing such as missing data, wrong translations, and duplicate records. Solutions discussed include identifying important test points, reviewing data mappings, developing automated and manual testing approaches, and assessing how much data needs validation. The presentation also includes a case study of a company that improved its process by centralizing documentation, improving communication, and automating more of its testing.
A web service is:
- A method of communicating between two devices
- A software function provided at a network address over the web, with the service always on
- An interface described in a machine-processable format
http://www.qualitestgroup.com/
This document provides an overview of Apex and Force.com development. It covers Apex language basics, data types, collections, exceptions, asynchronous execution, database integration, triggers, debugging, limits, and unit testing. Key topics include the similarities between Apex and Java, SOQL, DML statements, polymorphism in Apex, and the requirements to deploy code changes to production.
The document discusses testing for a data warehouse. It describes requirements testing to validate requirements, unit testing of ETL procedures and mappings, and integration testing of ETL job sequences and initial data loading. Integration testing also covers end-to-end scenarios like count validation, source isolation, and data quality checks. Report data is validated by verifying it against source data. User acceptance testing tests the full system functionality. Continuous testing is needed as data warehouse schema and data evolve over time.
- The document discusses understanding system performance and knowing when it's time for a system tune-up. It covers monitoring tools like DBQL and Viewpoint, establishing performance baselines, using real-time alerts, and examining growth patterns.
- It emphasizes the importance of regular benchmarks to compare performance over time, especially before and after upgrades. Successful benchmarks require consistency in data, queries, indexing, and concurrency levels.
- The document outlines various aspects of performance tuning like query tuning, load techniques, compression, and utilizing new database features. It stresses automating processes and educating developers on database technologies.
Load Testing Best Practices: Application complexity is increasing, yet requirements for web performance are growing exponentially. Learn about the three major types of load testing, determine which you need, and how to conduct them.
The document discusses testing processes for data warehouses, including requirements testing, unit testing, integration testing, and user acceptance testing. It describes validating that requirements are complete and testable. Unit testing checks ETL procedures and mappings. Integration testing verifies initial and incremental loads as well as error handling. Integration testing scenarios include count validation, source isolation, and data quality checks. User acceptance testing tests full functionality for production use.
Similar to Data Pipeline Installation Quality
Use Case of Migration from MicroStrategy to Power BI, GreenM
At Data Monsters, Ruslan Zolotukhin, Power BI Trainer & Consultant, described use cases from migrating reports out of MicroStrategy into Power BI, walked through the migration strategy, and covered the MicroStrategy features that scare every BI developer.
The document provides an overview of MicroStrategy and Tableau business intelligence software products. It discusses their positioning in Gartner's Magic Quadrant, new features for 2020, licensing and pricing models, and examples of how each product could be used based on capabilities like visualizations, drill options, security settings, scheduling, and deployment process. The document aims to compare the two platforms to help understand their differences.
At the Data Monsters Stayed Home webinar, Taras Yaroshchuk, Senior Data Engineer at Sigma Software, explained in plain, accessible language what probabilistic data structures are, what they are needed for, which kinds exist, and which questions they answer.
In his talk Snorkeling in Azure Data Lake, Max Mushkin uses the evolution of an IoT streaming-data project to examine solutions built on Azure services, describes their capabilities, limitations, and alternatives, and touches on current trends and Microsoft's vision.
Meet Ilya Savelyev, Software Architect at GreenM.
Ilya will talk about dynamics in big data: the point when traditional ETL is no longer much help, but you are still far from streaming. During the lecture we will look at working with analytical and operational data warehouses, try serverless treats, touch Elasticsearch, and much more.
Slides of the meetup "DAX as Power BI Visualization Weapon" taught by Ruslan Zolotukhin (Business Intelligence Engineer at Akvelon Inc., Kharkiv Power BI User Group leader, Advisor at community.powerbi.com, DAX coach) at Data Monsters.
Got a Data project in mind? Take our short partnership survey and, together, we can develop the perfect solution to maximize the return on your investment in data. Contact us here: https://greenm.io/
Be ready for big data challenges.
The material was composed based on the talk by Leonid Sokolov, Big Data Architect at GreenM.
Full article https://medium.com/greenm/scalable-data-pipeline-f5d3c8f7a6d9
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeWalaa Eldin Moustafa
Dynamic policy enforcement is becoming an increasingly important topic in today’s world where data privacy and compliance is a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) They are auto-generated from declarative data annotations. (2) They respect user-level consent and preferences (3) They are context-aware, encoding a different set of transformations for different use cases (4) They are portable; while the SQL logic is only implemented in one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataKiwi Creative
Harness the power of AI-backed reports, benchmarking and data analysis to predict trends and detect anomalies in your marketing efforts.
Peter Caputa, CEO at Databox, reveals how you can discover the strategies and tools to increase your growth rate (and margins!).
From metrics to track to data habits to pick up, enhance your reporting for powerful insights to improve your B2B tech company's marketing.
- - -
This is the webinar recording from the June 2024 HubSpot User Group (HUG) for B2B Technology USA.
Watch the video recording at https://youtu.be/5vjwGfPN9lw
Sign up for future HUG events at https://events.hubspot.com/b2b-technology-usa/
Learn SQL from basic queries to advanced queries, manishkhaire30
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
End-to-end pipeline agility - Berlin Buzzwords 2024Lars Albertsson
We describe how we achieve high change agility in data engineering by eliminating the fear of breaking downstream data pipelines through end-to-end pipeline testing, and by using schema metaprogramming to safely eliminate boilerplate involved in changes that affect whole pipelines.
A quick poll on agility in changing pipelines from end to end indicated a huge span in capabilities. For the question "How long time does it take for all downstream pipelines to be adapted to an upstream change," the median response was 6 months, but some respondents could do it in less than a day. When quantitative data engineering differences between the best and worst are measured, the span is often 100x-1000x, sometimes even more.
A long time ago, we suffered at Spotify from fear of changing pipelines due to not knowing what the impact might be downstream. We made plans for a technical solution to test pipelines end-to-end to mitigate that fear, but the effort failed for cultural reasons. We eventually solved this challenge, but in a different context. In this presentation we will describe how we test full pipelines effectively by manipulating workflow orchestration, which enables us to make changes in pipelines without fear of breaking downstream.
Making schema changes that affect many jobs also involves a lot of toil and boilerplate. Using schema-on-read mitigates some of it, but has drawbacks since it makes it more difficult to detect errors early. We will describe how we have rejected this tradeoff by applying schema metaprogramming, eliminating boilerplate but keeping the protection of static typing, thereby further improving agility to quickly modify data pipelines without fear.
Global Situational Awareness of A.I. and where it's headed, vikram sood
You can see the future first in San Francisco.
Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum.
The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be unleashed, and before long, The Project will be on. If we're lucky, we'll be in an all-out race with the CCP; if we're unlucky, an all-out war.
Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the wilful blindness of “it’s just predicting the next word”. They see only hype and business-as-usual; at most they entertain another internet-scale technological change.
Before long, the world will wake up. But right now, there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that have situational awareness. Through whatever peculiar forces of fate, I have found myself amongst them. A few years ago, these people were derided as crazy—but they trusted the trendlines, which allowed them to correctly predict the AI advances of the past few years. Whether these people are also right about the next few years remains to be seen. But these are very smart people—the smartest people I have ever met—and they are the ones building this technology. Perhaps they will be an odd footnote in history, or perhaps they will go down in history like Szilard and Oppenheimer and Teller. If they are seeing the future even close to correctly, we are in for a wild ride.
Let me tell you what we see.
6. New Pipeline Version
Regression Testing
Non-Functional Testing
• Same Sources & Targets
• Same Transformation Rules
• Previous fully tested version of ETL available
7. Regression via Reference Data Schema
• Exclude:
  • Tracking fields
  • New functionality data
• Clean up Test Schema
• Run Smoke suite first
[Diagram: the same SOURCE is loaded by the PROD ETL VERSION into a REFERENCE TARGET and by the NEW ETL VERSION into a TESTED TARGET, and the two targets are compared]
Regression Testing
8. FitNesse for ETL Regression
• Config files:
  • Connections
  • Tab parameters
• Fixtures (sketched in SQL below):
  • Non-empty tab
  • No duplicates
  • Counts match
  • Content match
Regression Testing
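The four fixture checks map naturally onto plain SQL; a sketch with hypothetical table names, reusing the reference/tested targets from the diagram above:

    -- Non-empty tab: the loaded table must contain rows
    SELECT COUNT(*) AS row_count FROM TESTED_TARGET;  -- expect > 0

    -- No duplicates: any row returned here is a duplicated business key
    SELECT order_id, COUNT(*) AS occurrences
    FROM   TESTED_TARGET
    GROUP  BY order_id
    HAVING COUNT(*) > 1;

    -- Counts match: compare the targets built by the two ETL versions
    SELECT (SELECT COUNT(*) FROM REFERENCE_TARGET) AS reference_rows,
           (SELECT COUNT(*) FROM TESTED_TARGET)    AS tested_rows;

Content match is the same idea taken to full rows, typically via the minus queries shown later in the deck.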
12. Set the Limits!
• “Partial” run & Extract re-using
• Limit compared data
• Set timeout in tests
• Model missing data
Extract Transform Load
Regression Testing: Challenges
13. Take Care of the Production Support Group, or Non-functional ETL Testing
17. Reliability Testing Challenges
• Hidden Risks
• Dependency on 3rd-party services
• Communication gaps
• Underestimation of severity
• Underestimation of probability
Non-Functional Testing: Challenges
18. Be Informed!
• Monitor Services Logs
• Organize Recovery Training
• Be specific with to-do’s
Non-Functional Testing: Challenges
22. Data Warehouse Testing
[Diagram: Extract, Transform, Load from SOURCE to TARGET, annotated with the test types applied along the pipeline: Test Underlying Data, Test Data Model, Balancing Tests, Data Quality Tests, Smoke Tests]
23. Test Underlying Data
1. Gather info – bridge gaps!
2. Break rules that can be broken
3. Draft a Troubleshooting doc
Source Area Testing
24. Test Target Data Model
1. Naming convention
2. Optimal base for Visualization
3. Testability checks
Data Mart Structure Testing
25. Functional ETL Testing
• Smoke Tests
• Target Data Quality tests:
  • Type
  • Constraint
  • Data Plausibility
  • Logical Constraints
! Create similar / relevant tests where applicable for Source to help with further debugging (a SQL sketch follows below)
Functional ETL Testing
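For illustration, the four quality-test categories above might translate into probes like these (the columns, ranges, and SQL Server-style TRY_CAST are assumptions):

    -- Type: values that do not cast to the expected type
    SELECT * FROM TARGET_ORDERS
    WHERE  order_total IS NOT NULL
           AND TRY_CAST(order_total AS DECIMAL(12,2)) IS NULL;

    -- Constraint: mandatory fields must be populated
    SELECT * FROM TARGET_ORDERS WHERE customer_id IS NULL;

    -- Data Plausibility: values outside a sane business range
    SELECT * FROM TARGET_ORDERS WHERE order_total < 0 OR order_total > 1000000;

    -- Logical Constraints: shipping before ordering is impossible
    SELECT * FROM TARGET_ORDERS WHERE ship_date < order_date;

Each query is expected to return zero rows; mirroring the same probes on the Source, as the note above suggests, shows whether a failure originated upstream or in the ETL.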
26. Functional ETL Testing
• Balancing Tests:
  • Study / create the Specification
  • Test minus-query assertions via mutated data
  • Do a both-sides comparison (see the sketch below)
Functional ETL Testing
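A minimal both-sides balancing assertion via minus queries (hypothetical names and mapping rule; Oracle-style MINUS, EXCEPT elsewhere):

    -- Side 1: source rows, with the mapping rule applied, missing from the target
    SELECT order_id, UPPER(status) AS status FROM SOURCE_ORDERS
    MINUS
    SELECT order_id, status FROM TARGET_ORDERS;

    -- Side 2: target rows that no source row explains
    SELECT order_id, status FROM TARGET_ORDERS
    MINUS
    SELECT order_id, UPPER(status) AS status FROM SOURCE_ORDERS;

Both queries returning zero rows passes the assertion; deliberately mutating a few rows first, as the slide suggests, verifies that the assertion really fails when it should.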
28. Most Common Bugs
• Count mismatch (incl. duplicates)
• Anomaly issues: NULL- or length-related (probes sketched below)
• Date-related calculations
Functional ETL Testing
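Hedged probes for these bug classes, written in Presto/Athena-style SQL since the deck mentions Athena (all names and the 255-character limit are hypothetical):

    -- NULL anomalies: how many rows silently lost a value
    SELECT COUNT(*) - COUNT(customer_id) AS null_customer_ids FROM TARGET_ORDERS;

    -- Length anomalies: values sitting at the column limit were likely truncated
    SELECT * FROM TARGET_ORDERS WHERE LENGTH(customer_name) >= 255;

    -- Date-related calculations: re-derive the field and compare
    SELECT * FROM TARGET_ORDERS
    WHERE  age_in_days <> DATE_DIFF('day', order_date, CURRENT_DATE);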
29. ETL Testing Challenges
• Test complexity
• Unpredictably slow AWS Athena queries
• Impossible to check every single record
Functional ETL Testing
30. Visualization in Data QA
• Source Data Analysis
• Target Quality Dashboard
• Dedicated resources & Test Results visualization
Functional ETL Testing
31. Ongoing Support
• Data Integrity Project
• Ongoing Logs Analysis
• Monitoring Rules & Alarms
Testing in Production
Data Pipeline
32. Key Takeaways
• ETL verification is not that bad
• Know your data
• Be ready to meet Monsters:
  • Long ETL duration
  • Big Data Volume
  • Difference of Test Data from Prod