Maintaining data historization is a very common but time-consuming task in a data warehouse environment. The usual techniques involve outer joins and some kind of change detection. This change detection must handle NULL values correctly and is possibly the trickiest part. On the other hand, SQL offers standard functionality with exactly the desired behaviour: GROUP BY, or partitioning with analytic functions. Can it be used for this task?
2. Our Company
Trivadis DOAG17: SCD2 mal anders, 29.11.2018
Trivadis is a leader in IT consulting, system integration, solution engineering and the delivery of IT services, focusing on - and -technologies, in Switzerland, Germany, Austria and Denmark. Trivadis delivers its services from these strategic business areas:
Trivadis Services takes over the corresponding operation of your IT systems.
OPERATIONS
3. [Map of Trivadis locations: Copenhagen, Munich, Lausanne, Bern, Zurich, Brugg, Geneva, Hamburg, Düsseldorf, Frankfurt, Stuttgart, Freiburg, Basel, Vienna]
With over 600 IT and subject-matter experts on site with you.
14 Trivadis branches with over 600 employees.
More than 200 Service Level Agreements.
More than 4,000 training participants.
Research and development budget: CHF 5.0 million / EUR 4.0 million.
Financially independent and sustainably profitable.
Experience from more than 1,900 projects per year for over 800 customers.
4. About Me
Senior Consultant at Trivadis GmbH, Düsseldorf
Focus on Oracle:
– Data Warehousing
– Application Development
– Application Performance
Course instructor for "Oracle 12c New Features for Developers" and "TechnoCircle Oracle 12c Release 2"
Blog: http://blog.sqlora.com
5. Agenda
1. Introduction and state of the art
2. The "new" approach
3. Use cases and performance
4. Conclusion
7. Introduction
Historization? As part of the loading process in a data warehouse.
We consider Slowly Changing Dimensions Type 2:
all changes are completely tracked, and a change in at least one of the tracked columns triggers the creation of a new version record.
The most challenging task is the change detection.
DWH_KEY | VALID_FROM | VALID_TO   | CUR_VERSION | ETL_OP | BUS_KEY | FIRST_NAME | SECOND_NAMES | LAST_NAME | HIRE_DATE  | FIRE_DATE  | SALARY
1       | 01.12.2016 | 02.12.2016 | N           | UPD    | 123     | Roger      |              | Federer   | 01.01.2010 |            | 900000
11      | 03.12.2016 |            | Y           | INS    | 123     | Roger      |              | Federer   | 01.01.2010 |            | 920000
6       | 02.12.2016 | 02.12.2016 | N           | UPD    | 345     | Venus      |              | Williams  | 01.11.2016 |            | 500000
10      | 03.12.2016 |            | Y           | INS    | 345     | Venus      |              | Williams  | 01.11.2016 | 01.12.2016 | 500000
2       | 01.12.2016 | 02.12.2016 | N           | UPD    | 456     | Rafael     |              | Nadal     | 01.05.2009 |            | 720000
3       | 01.12.2016 | 01.12.2016 | N           | UPD    | 789     | Serena     |              | Williams  | 01.06.2008 |            | 650000
5       | 02.12.2016 |            | Y           | INS    | 789     | Serena     | Jameka       | Williams  | 01.06.2008 |            | 650000
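A versioned dimension table of this shape could be declared as follows. This is a hedged sketch only: the column names follow the example above, but the table name and all data types are assumptions, not from the slides.

```sql
-- Hypothetical DDL for the versioned dimension shown above.
CREATE TABLE employees_hist (
  dwh_key      NUMBER       PRIMARY KEY,  -- surrogate key, one per version
  valid_from   DATE         NOT NULL,
  valid_to     DATE,                      -- NULL for the current version
  cur_version  CHAR(1)      NOT NULL,     -- 'Y' / 'N'
  etl_op       VARCHAR2(3),               -- 'INS' / 'UPD'
  bus_key      NUMBER       NOT NULL,     -- business key from the source
  first_name   VARCHAR2(50),
  second_names VARCHAR2(100),
  last_name    VARCHAR2(50),
  hire_date    DATE,
  fire_date    DATE,
  salary       NUMBER
);
```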
8. State of the Art
Typical OWB (Oracle Warehouse Builder) mapping
9. State of the Art

Source (BK_S, C1_S, C2_S):     Target (BK_T, C1_T, C2_T):
11  A  B                       11  A  BB
22  D  E                       22  D  E
44  K  L                       33  F  G
77  M  (null)                  77  M  N

Source and Target are combined with a Full Outer Join, followed by the Change Detection. The result is split into Old Versions and New Versions, recombined with UNION ALL and applied to the Target with a MERGE. The data to the left (the join input) has to be accessed twice!

The change detection must be NULL-aware, e.g. for column C2:
NVL(C2_S,'(NULL)') != NVL(C2_T,'(NULL)')
LNNVL(C2_S = C2_T) AND NVL(C2_S, C2_T) IS NOT NULL
or alternatives based on DECODE, STANDARD_HASH, SYS_OP_MAP_NONNULL, …

More on delta detection: https://danischnider.wordpress.com/2016/10/08/delta-detection-in-oracle-sql/
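Condensed into a single statement, this legacy pattern might look like the sketch below. It is an assumption of what such a mapping generates, not code from the slides: table, column and sequence names (source, target, dwh_key, seq_dim) are illustrative, a left join stands in for the full outer join (deletes are ignored for brevity), and only two tracked columns are compared.

```sql
-- Hypothetical legacy SCD2 load: outer join + NULL-aware column comparison,
-- split into "close old version" and "insert new version" via UNION ALL.
MERGE INTO target t
USING (
  WITH delta AS (
    SELECT s.bk, s.c1, s.c2, t.dwh_key AS cur_dwh_key
    FROM   source s
    LEFT JOIN target t
           ON t.bk = s.bk AND t.cur_version = 'Y'
    WHERE  t.bk IS NULL                                -- new business key
       OR  NVL(s.c1,'(NULL)') != NVL(t.c1,'(NULL)')    -- NULL-safe compare
       OR  NVL(s.c2,'(NULL)') != NVL(t.c2,'(NULL)')
  )
  SELECT cur_dwh_key AS join_key, bk, c1, c2           -- closes old version
  FROM   delta WHERE cur_dwh_key IS NOT NULL
  UNION ALL
  SELECT NULL AS join_key, bk, c1, c2                  -- inserts new version
  FROM   delta
) d
ON (t.dwh_key = d.join_key)
WHEN MATCHED THEN
  UPDATE SET t.valid_to = TRUNC(SYSDATE) - 1, t.cur_version = 'N'
WHEN NOT MATCHED THEN
  INSERT (t.dwh_key, t.bk, t.c1, t.c2, t.valid_from, t.cur_version)
  VALUES (seq_dim.NEXTVAL, d.bk, d.c1, d.c2, TRUNC(SYSDATE), 'Y');
```

Note how the delta has to be read twice (once per UNION ALL branch), which is exactly the weakness the slide points out.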
10. State of the Art
Change detection must be done with respect to NULL values:
either comparing each and every column in a complex way,
or maintaining and comparing hash diffs (common rules are needed, and re-hashing is sometimes needed after structural changes).
The full outer join may be expensive when not working with "deltas".
Splitting the join result into two data sets causes the join to be executed twice.
Another solution?
12. The "new" approach

The "new" approach is not really new; it is often used for ad hoc queries: are these two records different?

BK  C1  C2  C3  C4  …  C467  C468  C469
11  A   B   C   D   …  AA    BB    CC
11  A   B   C   D   …  AB    BB    CC

Using GROUP BY:

SELECT COUNT(*)
FROM t
GROUP BY BK, C1, C2, C3, C4, … C467, C468, C469;
13. The "new" approach

Or using an analytic function:

SELECT COUNT(*) OVER (PARTITION BY BK, C1, C2, C3, … C468, C469)
FROM t;

If the count equals 2, the two records are the same; if the count equals 1, they are different.
But what about NULLs? For GROUP BY and PARTITION BY, NULL = NULL and VALUE != NULL: NULL values are grouped together, which is exactly the desired comparison behaviour.
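This grouping behaviour can be seen in a minimal, self-contained example (the WITH data is made up for illustration):

```sql
-- Two versions of key 11 differ only in C2 (NULL vs 'B'); the two versions
-- of key 22 are identical. COUNT(*) per partition flags the difference.
WITH t (bk, c1, c2) AS (
  SELECT 11, 'A', NULL FROM dual UNION ALL
  SELECT 11, 'A', 'B'  FROM dual UNION ALL
  SELECT 22, 'D', 'E'  FROM dual UNION ALL
  SELECT 22, 'D', 'E'  FROM dual
)
SELECT bk, c1, c2,
       COUNT(*) OVER (PARTITION BY bk, c1, c2) AS cnt  -- 2 = same, 1 = different
FROM   t;
-- Key 11 yields cnt = 1 for both rows (the NULL is handled correctly),
-- key 22 yields cnt = 2.
```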
14. The "new" approach

Source (BK, C1, C2):    Target (BK, C1, C2):
11  A  B                11  A  BB
22  D  E                22  D  E
44  K  L                33  F  G
77  M  (null)           77  M  N

Source and Target (current versions) are combined with UNION ALL, adding a flag column S_T ('S' = source, 'T' = target):

S_T  BK  C1  C2
S    11  A   B
S    22  D   E
S    44  K   L
S    77  M   (null)
T    11  A   BB
T    22  D   E
T    33  F   G
T    77  M   N

A GROUP BY over BK, C1, C2 with COUNT(*) (CNT) and MIN(S_T) then performs the change detection:

BK  C1  C2      MIN(S_T)  CNT
11  A   B       S         1
22  D   E       S         2
44  K   L       S         1
77  M   (null)  S         1
11  A   BB      T         1
33  F   G       T         1
77  M   N       T         1

CNT = 2 means source and target agree, so nothing needs to be done. CNT = 1 means the record is new, changed or no longer present, and MIN(S_T) tells on which side it occurs. The result feeds the MERGE into the Target.

DEMO!
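Put together as one statement, the approach could look like the following hedged sketch. Table, column and sequence names (source, target, dwh_key, seq_dim) are illustrative assumptions; the S_T flag and CNT logic are taken from the slides.

```sql
-- Hypothetical single-statement SCD2 load using UNION ALL + GROUP BY
-- instead of a full outer join; the source is read only once.
MERGE INTO target t
USING (
  SELECT MAX(dwh_key) AS join_key,          -- set only for 'T' rows
         bk, c1, c2, MIN(s_t) AS s_t
  FROM (
    SELECT 'S' AS s_t, NULL AS dwh_key, bk, c1, c2
    FROM   source
    UNION ALL
    SELECT 'T' AS s_t, dwh_key, bk, c1, c2
    FROM   target
    WHERE  cur_version = 'Y'
  ) u
  GROUP BY bk, c1, c2                       -- NULL = NULL for GROUP BY
  HAVING COUNT(*) = 1                       -- keep only differences (CNT = 1)
) d
ON (t.dwh_key = d.join_key)
WHEN MATCHED THEN                           -- current version without an
  UPDATE SET t.valid_to    = TRUNC(SYSDATE) - 1,  -- identical source row:
             t.cur_version = 'N'                  -- close it
WHEN NOT MATCHED THEN                       -- source row without an identical
  INSERT (t.dwh_key, t.bk, t.c1, t.c2,     -- current version: insert it
          t.valid_from, t.cur_version)
  VALUES (seq_dim.NEXTVAL, d.bk, d.c1, d.c2, TRUNC(SYSDATE), 'Y');
```

Because each surviving group has exactly one row (HAVING COUNT(*) = 1), MIN(s_t) and MAX(dwh_key) simply pass the row's own values through.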
17. Use Cases and Performance

Full Data Load
Legacy: the full source data is joined with the Current Versions of the Target (Older Versions are carried along unchanged); the JOIN may be slow, the filter may be slow; partitioning?
New: Full Data UNION ALL Current Versions, then Group By; the Group By may be slow.
18. Use Cases and Performance

Delta Load
Legacy: the delta is joined with the Current Versions of the Target; the filter on current versions may be slow; partitioning?
New: Delta UNION ALL Current Versions, then Group By; the Group By may be slow.
19. Use Cases and Performance

Delta Load with pre-filter
Legacy: the delta is joined with the Current Versions, with a pre-filter Business_key IN … .
New: Delta UNION ALL Current Versions pre-filtered on the delta's business keys, then Group By, which is now fast.
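The pre-filter can be sketched like this (names are illustrative assumptions): only current versions whose business key actually occurs in the delta enter the UNION ALL, so the Group By works on a small data set.

```sql
-- Hypothetical pre-filtered variant for delta loads: the GROUP BY only sees
-- current versions of business keys that appear in the delta.
SELECT bk, c1, c2, MIN(s_t) AS s_t, COUNT(*) AS cnt
FROM (
  SELECT 'S' AS s_t, bk, c1, c2 FROM delta_source
  UNION ALL
  SELECT 'T' AS s_t, bk, c1, c2
  FROM   target
  WHERE  cur_version = 'Y'
  AND    bk IN (SELECT bk FROM delta_source)   -- the pre-filter
) u
GROUP BY bk, c1, c2;
```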
20. Use Cases and Performance
Data warehouse with Siebel CRM as a source
Order table S_ORDER: "only" 120 columns
Comparing the legacy approach vs. GROUP BY vs. analytic functions
Full staging table as a source vs. delta (with or without pre-filtering)
Approx. 6 million rows in the target table
Approx. 3 million rows in the full-load dataset
Approx. 3,000 rows in the delta-load dataset
21. Use Cases and Performance
Method                    | Delta Load, min | Full Load, min
Outer Join (legacy)       | 0:09            | 0:41
GROUP BY                  | 1:10            | 1:04
GROUP BY with pre-filter  | 0:04            | N/A
Analytic Function         | 2:12            | 4:52
Analytic with pre-filter  | 0:12            | N/A
23. Use Cases and Performance

Loading Dimensions from Core
Legacy: the Current Versions of the Core are joined with the Current Versions of the Dimension (Older Versions are carried along unchanged); the JOIN may be slow, the filter may be slow; partitioning?
New: Current Versions (Core) UNION ALL Current Versions (Dim), then Group By; the Group By may be slow.
24. Use Cases and Performance

Loading Dimensions from Core, where the source is a view
Legacy: the view's result is joined with the Current Versions of the Dimension; the JOIN may be slow, the filter may be slow; partitioning?
New: the view's Full Data UNION ALL Current Versions, then Group By; the Group By may be slow.
25. Use Cases and Performance
Loading of a dimension via a view
The view joins some "big" tables (50 GB, 40+ million rows)
and produces < 500 dimension records per day.
The loading time could be reduced by 45 percent (3 min 50 sec → 2 min).
26. Conclusion
It is simpler and faster in certain cases.
The source is queried only once, which can be significant if the source is a view.
The code can easily be generated.
It is simple to build even without generation (only a plain list of columns to copy and paste).
It is worth doing ad hoc testing with your own data.
Test it!
28. Trivadis @ DOAG 2017
#opencompany
Booth: 3rd floor, right by the escalator
We share our know-how!
Just drop by: live presentations and a document archive,
T-shirts, a prize draw and more.
We look forward to your visit.