1. Welcome!
This presentation is a collection of topics around the challenges of sharing and
moving data between different types of relational databases.
Often there is a need to move or share data between different repositories and
companies often find this difficult because they underestimate the effort required
as it seems a simple process on the surface, or they don’t fully understand the
capabilities and limitations of the platforms and tools they are using.
I want to talk a bit about the SQL language, in particular the SQL standards, which
have made interoperability between RDBMSs easier but are also a source of
confusion in the interpretation and implementation of those standards.
I will discuss some of the options available when moving or sharing data between
different types of databases: migration tools, replication tools, and access tools.
And I want to spend a bit of time talking about Oracle’s Heterogeneous Services
which is an area of Oracle functionality not often discussed and, I suspect,
relatively unfamiliar to a lot of you.
2. Here’s the “me” slide.
I started life in Oracle Pre-Sales before becoming an independent contractor doing
mostly operational DBA work.
Then I spent a few years designing and building RAC systems but for the last 3 or 4
years I’ve been specialising in logical replication within the Oracle area.
And as you can see, I like fishing from a kayak.
3. Cloudification is our way of cutting through the hype and confusion that businesses
commonly associate with moving their infrastructure and services into the Cloud.
We do this by focusing on the key issues that they face with their projects, and by
putting things in plain English for them.
Let’s have a quick look at the what, why, and who questions around heterogeneous
data.
4. So, the dictionary definition of the word heterogeneous which is something like
“not of the same kind or type”
doesn’t quite match with the word as it’s commonly used in computing.
In computing, it normally refers to a difference in architecture of the same type of
thing, be it hardware like processor, memory bus, etc,
or software like different types of relational database management systems.
5. A brief definition of Heterogeneous Data for the purposes of this presentation.
As I said, this topic brings together a number of different areas of interest but as a
one-liner, what I’ll be talking about is the accessing, moving, copying, or
synchronising of data between Oracle and non-Oracle SQL databases. I will
mention briefly some of the reasons for, and challenges of, re-platforming an
application’s data repository.
Accessing Data – Ability to remotely read, and possibly change, data in a non-
Oracle repository.
Moving Data – So Migrations, re-platforming. These are essentially “one-off” style
operations where, at the end of it, there is still only one copy of the data.
Copying Data – So this is essentially a static copy with batch updates for reasons of
accessibility, reliability, and availability.
Potentially a “one-off” style operation but more likely an on-going batch style
refresh process.
Synchronising Data – So this is essentially a dynamic copy. Keeping a copy of the
data in-sync with the original by applying any changes to the copy. This is typically a
continuous, or near continuous, process.
6. So basically, it’s “Oracle to non-Oracle RDBMS” or “non-Oracle RDBMS to Oracle”
What I won’t be talking about is moving data between other data stores like
NoSQL or Hadoop-style systems. That’s a whole other topic and too much to cover
in one presentation.
7. Why is Heterogeneous Data relevant?
Hands up who has more than one type of data repository in your organisation?
If you work for a medium to large business, there is a very good chance you will
have more than one, and probably more than one type of RDBMS.
• One of the use-cases I’m seeing is businesses wanting to replicate their
data from their operational data store, typically Oracle, to Microsoft SQL Server
to use its Reporting and/or Analysis Services for BI reporting.
• Another one is the re-platforming of an existing data repository by migrating it
to another database platform for various reasons. This may require Change
Data Capture (CDC) techniques to minimise the disruption during the exercise.
• And often, replicating data to a less expensive platform, like Oracle to MySQL
for example, and then directing the reporting and/or query requirements of the
application to the read-only MySQL copy is seen as a way to reduce the load on
the production system and extend its useful lifetime. However, as I’ll try to
point out, this is often not as easy as it seems.
8. I think everyone could benefit from at least a little knowledge of the capabilities
and limitations of heterogeneous data access.
• Business Owners need to know they have options. Having data trapped in a
proprietary or “one of a kind” repository could be hurting your business. “Yeah,
we have that information, but it’s part of the ABC software package we
purchased a few years ago and it runs on an XYZ database, all our other stuff is
Oracle.”
• Architects/Developers need to know the options. If you don’t know all the
alternatives, how can you recommend the best option? Is GoldenGate really the
best data replication solution for your requirements? Maybe Dbvisit Replicate
would have been a much more cost effective option, or maybe Active Data
Guard would be a better fit?
• DBA/Operators need to know capabilities and limitations of the chosen
solution. “The third party ODBC driver cost additional money but it was 3 times
faster than the included driver.”
9. If you’re anything like me, having spent most of your career working with only
one type of database and looking down your nose at anyone using anything else, it
can be a bit of an attitude adjustment to find companies making successful use of
something other than Oracle!
db-engines.com is a site that keeps track of DBMSs and publishes a “Popularity”
ranking.
This is not a scientific measurement; rather, it’s a ranking based on current activity
in social media and on the internet:
• Number of mentions of the systems on websites, using Google and Bing.
• General interest in the system, using Google Trends.
• Frequency of technical discussions about the system on Stack Overflow and DBA
Stack Exchange.
• Number of job offers in which the system is mentioned, on international job sites
like Indeed and Simply Hired.
• Number of profiles in professional networks, like LinkedIn, in which the system is
mentioned.
• Relevance in social networks like Twitter.
As you can see the top three clearly distinguish themselves.
Note that the vertical scale is logarithmic, so the top three players, namely Oracle,
MySQL, and SQL Server, are streets ahead of the others.
10. So it's not a count of the installed base of each RDBMS, but it's probably better
than that: it offers an early indicator as to the trending direction of these products.
So if you're looking for your next RDBMS, this is the info you need.
11. These rankings do include non-relational database management systems.
I think MongoDB is classified as a NoSQL database, although NoSQL doesn’t mean
“No SQL”, it means “Not Only SQL”.
Most of the NoSQL databases out there today are more accurately “No Relational”
than “No SQL”.
The split between commercial and open source stands at about 1/3 open source
and 2/3 commercial.
So, what’s the big deal? The Relational Database Management Systems all use the
SQL standard as their language, so how hard can it be to move data and/or applications
between them?
12. Quick show of hands. Who has written SQL? And who has written SQL with
consideration given to the SQL standard? Not many, if any.
SQL became an ANSI standard in 1986 and an ISO standard in 1987. Since then, the
standard has been enhanced several times with added features.
Despite these standards, code is not completely portable among different database
systems.
The different makers do not perfectly adhere to the standard, for instance by
adding extensions, and the standard itself is sometimes ambiguous.
There have been 7 revisions to the SQL standard since the SQL-86 ANSI standard.
The most significant was the second revision in 1992 (SQL-92), whose Entry level
was adopted as FIPS 127-2, one of the Federal Information Processing Standards
(FIPS).
This was significant because an independent body, the National Institute of
Standards and Technology (NIST), used to certify SQL DBMS compliance with the
current SQL standard, but they stopped doing this in 1996.
So the next release of SQL in 1999 (SQL-99) was the first release of the SQL
standard where the database vendors self-certified their compliance against the
standard, and for the last 18 years, through 4 more revisions to the SQL standard,
database vendors have been self-certifying the compliance of their products with
the latest SQL standard.
13. If you really want your own copy of the latest SQL standard, SQL-2011, it’s
available from standards.co.nz but it comes in something like 13 parts and is not
light reading. Each part costs about $250 so you’re looking at over $3000 for
your own copy.
So, how different are the SQL based RDBMSs? Let’s take a look at a few
comparisons and I think you’ll get the idea.
14. First off, each RDBMS typically has very different internals. Database vendors are
free to do whatever they like with things that aren’t covered by any standard, or
indeed, interpret and implement any part of a standard as they see fit.
So what we get are SQL based RDBMS that are very different under the hood.
Concurrency (locking) models are very different and will affect the application in
periods of high concurrency.
Some databases may not flag a certain condition as an error while others will. In
fact you may have noticed during upgrade testing of Oracle that certain conditions
that weren’t considered an error are now raised as one due to Oracle tightening up
on its error checking from version to version.
I should note that there’s not a single database that follows the SQL standard
100%. Oracle, SQL Server, MySQL, DB2 and others, each claim certain levels of
support for the standard, but as you have seen with multiple versions of the
standard and self-certification, even that statement is open to interpretation.
15. Not all databases implement all the standard SQL datatypes, and if they do, they
are often not the same.
By way of example, I’d like to look at a very simple datatype, that is, the CHAR
datatype.
The CHAR datatype, as you probably know, is a fixed length string datatype and is a
core SQL standard datatype.
I want to look at the two CHAR requirements, as specified by the SQL standard and
see how three of the leading RDBMS have implemented them.
16. So as you can see, even simple requirements for a basic datatype are not always
implemented consistently.
17. I don’t want to bore you with endless examples, so I’ve chosen one simple function
from the SQL standard to highlight the point.
String concatenation is a core function of the SQL standard and is done using the ||
operator. One of its rules is that if any argument string is NULL, then the
resulting concatenated string is NULL.
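If you want to see that rule in action without an Oracle or SQL Server install, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only my stand-in here (it is not one of the databases compared on the slides), and the function name is made up for illustration. As far as I know SQLite follows the standard on this point, whereas Oracle is documented to treat a NULL operand in a concatenation as an empty string, so don't expect the same answer everywhere.

```python
import sqlite3


def concat_results():
    """Run three || concatenations on an in-memory SQLite database.

    SQLite is an assumption of this sketch, used as a convenient engine
    that applies the standard NULL rule for the || operator.
    """
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(
            "SELECT 'abc' || 'def', 'abc' || NULL, 'abc' || ''"
        ).fetchone()
    finally:
        conn.close()


if __name__ == "__main__":
    both, with_null, with_empty = concat_results()
    print("two strings  :", repr(both))
    print("with NULL    :", repr(with_null))   # SQL NULL arrives as Python None
    print("with '' value:", repr(with_empty))
```

The interesting line is the middle one: one NULL argument makes the whole result NULL, while an empty string is just a value and leaves the other string intact.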
And speaking of NULLs…
18. So we all know what NULLs are, right?
NULLs support the representation of "missing or inapplicable information".
In SQL, NULL is a state (unknown) and not a value.
Misunderstanding of how NULLs work is the cause of a great number of errors in
SQL code.
These mistakes are usually the result of confusion between NULL and either 0
(zero) or an empty string, which is a string value with a length of zero.
NULL is defined by the ISO SQL standard as different from both an empty string and
the numerical value 0. While NULL indicates the absence of a value,
the empty string and numerical zero both represent actual values. And I think
that’s the source of most of the confusion.
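To make the "state, not a value" point concrete, here is a small sketch, again using Python's sqlite3 module as a stand-in engine (my assumption, not something from the slides). The table `t`, column `c`, and function name are hypothetical. It stores a NULL, an empty string, and a zero, then counts how many rows each kind of predicate matches.

```python
import sqlite3


def null_checks():
    """Compare NULL, '' and 0 using an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (c TEXT)")
        # Three rows: a NULL, an empty string, and the character '0'.
        conn.executemany("INSERT INTO t (c) VALUES (?)", [(None,), ("",), ("0",)])

        def count(where):
            return conn.execute(f"SELECT COUNT(*) FROM t WHERE {where}").fetchone()[0]

        return {
            "c = NULL": count("c = NULL"),    # comparison with NULL is never true
            "c IS NULL": count("c IS NULL"),  # the correct test for NULL
            "c = ''": count("c = ''"),        # the empty string is a real value
            "NULL = NULL": conn.execute("SELECT NULL = NULL").fetchone()[0],
        }
    finally:
        conn.close()


if __name__ == "__main__":
    for predicate, result in null_checks().items():
        print(f"{predicate:12} -> {result!r}")
```

The `c = NULL` predicate matches nothing, because comparing anything with NULL yields unknown rather than true. Be aware that this is standard behaviour as SQLite implements it; Oracle, for example, is documented to treat a zero-length string as NULL, which is exactly the kind of difference this talk is about.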
Let’s look at how some of the RDBMS handle just one aspect of NULL processing.
That is, where NULLs sit when you sort a column containing NULLs.
But first, let’s look at what the SQL standard has to say about this.
The core standard doesn’t explicitly define a default sort order for NULLs but in a
2003 optional extension, NULLs can be sorted using the NULLS FIRST or NULLS
LAST addition to the ORDER BY clause, but not all vendors have implemented this.
NULLs are ordered differently in Oracle compared with SQL Server or MySQL.
So, depending on how your SQL statements are written, they could produce a
different output if you executed the same (valid) SQL on Oracle or SQL Server.
19. PostgreSQL, by the way, orders NULLs higher than non-NULL values (the same
default as Oracle) and allows the standard NULLS FIRST or NULLS LAST clauses.
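If you can't rely on NULLS FIRST or NULLS LAST being available, one workaround is to sort on an explicit "is it NULL?" key first. The sketch below shows the idea with Python's sqlite3 module; SQLite, the table `t`, the column `n`, and the constant names are all assumptions of mine for illustration. The CASE expression is plain SQL, so I'd expect it to carry across vendors, but test it on your own platforms.

```python
import sqlite3

# ORDER BY fragments; the names are hypothetical helpers for this sketch.
DEFAULT_ORDER = "n"
NULLS_LAST = "CASE WHEN n IS NULL THEN 1 ELSE 0 END, n"
NULLS_FIRST = "CASE WHEN n IS NULL THEN 0 ELSE 1 END, n"


def sorted_values(order_by):
    """Return column n from a three-row table, sorted by the given clause."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (n INTEGER)")
        conn.executemany("INSERT INTO t (n) VALUES (?)", [(2,), (None,), (1,)])
        # order_by comes from the trusted constants above, never from user input.
        rows = conn.execute(f"SELECT n FROM t ORDER BY {order_by}").fetchall()
        return [row[0] for row in rows]
    finally:
        conn.close()


if __name__ == "__main__":
    print("engine default:", sorted_values(DEFAULT_ORDER))
    print("NULLs last    :", sorted_values(NULLS_LAST))
    print("NULLs first   :", sorted_values(NULLS_FIRST))
```

The first line of output depends on the engine's default, which is the whole problem. The other two are pinned by the query itself, so the same statement should give the same order wherever it runs.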
20. So, how did a supposed standard come to be so different across vendors
implementing and supporting the standard?
• The complexity and size of the SQL standard means that most implementers do
not support the entire standard.
• The standard doesn’t specify database behaviour in several important areas
(e.g. indexes, file storage...), leaving the database vendors to decide how it
should behave.
• The SQL standard precisely specifies the syntax that a conforming database
system must implement. However, the standard's specification of the
semantics of language constructs is less well-defined, leading to ambiguity.
• Many database vendors have large existing customer bases; where the newer
version of the SQL standard conflicts with the prior behaviour of the vendor's
database, the vendor may be unwilling to break this backward compatibility.
• There is little commercial incentive for vendors to make it easier for users to
change database suppliers.
• Users evaluating database software tend to place other factors such as
performance higher in their priorities than compliance with standards.
21. So, how can you guard against your application issuing “non-standard” SQL?
I want to introduce you to what I am confidently calling “The most useless piece of
functionality in Oracle”.
22. Trouble is, Oracle supports numerous features that extend beyond what they call
standard SQL.
According to the Oracle manual, and this is a quote, “If you are concerned with the
portability of your applications to other implementations of SQL, then use Oracle's
FIPS Flagger to help identify the use of Oracle extensions to SQL92.”
FIPS, by the way, stands for Federal Information Processing Standard. It’s an
American standard developed by the US Federal government and they are usually
the same or slightly modified versions of ANSI, IEEE, or ISO standards.
The FLAGGER parameter specifies FIPS flagging, which causes an error message to
be generated when a SQL statement issued is an extension of the Entry Level of
SQL-92, which is a standard that has been superseded by SQL2008 (but there is no
FIPS certification for SQL2008).
FLAGGER is a session level parameter only. You can’t set it at the database level,
and why would you want to anyway?
23. So, what happens when you set the fips FLAGGER?
Here’s a simple test with a very basic table.
So I create the table,
then set the session level FLAGGER
and try a very simple SQL statement.
Now I’ve tried to make sense of the error message but the answer must be buried
in the SQL standard and I’m not stumping up $3k to find out.
24. But it gets even weirder.
With the fips FLAGGER set, here’s a select using a NUMBER column and a numeric
digit, and it works!
But try an inequality match with != and it errors, telling you in the error message to
try <> instead.
But when you try that form of inequality, it still says that function is not part of the
ANSI standard!
25. Oh, and I have to show you this.
Here’s what happens when you set the fips FLAGGER before creating the table.
So it seems that the fips FLAGGER is either broken or so restrictive that it appears to
be broken.
In the end, FIPS 127-2 is 22 years old and is based on a version of SQL that is 5
versions old.
The last version of Oracle that complied with FIPS 127-2 was (probably) Oracle 7.
The standards body that certified compliance with the standard stopped 18 years
ago.
While SQL-92 has been superseded by other releases, there has been no
conformance testing authority for any version of SQL since SQL-92; hence, Entry
SQL-92 offers you the most assurance of portability. But the flagger that checks for
it appears to be broken and is practically useless.
To be fair, Oracle had to include the fips FLAGGER in the code as part of their
compliance with FIPS.
(You’ve paid good money for all those neat Oracle features. Use them!)
26. But an RDBMS is more than just datatypes and functions and its ability to execute
SQL.
There is a raft of other considerations if you are considering re-platforming your
application to another database.
In fact, depending on the application, often the data migration is one of the easier
tasks.
Much more difficult is the migration of things like stored code (PL/SQL) and security
and access (users, privileges).
There are tools available to help with re-platforming.
SSMA does a reasonable job if you’re moving from Oracle to SQL Server.
Oracle’s SQL Developer (apparently) does a reasonable job at migrating a selection
of common RDBMS to Oracle, and it’s free.
27. So, just to wrap up this whole SQL standard thing.
Don’t you love it when some consultant answers your question with the “It
depends” answer?
• No, because it frequently changes, is ambiguous in places, contains many
optional parts and no database vendor follows the standard 100%
• Yes, because it gives us, at the very least, a framework or common ground.
Standards promote a common skill set amongst IT professionals.
SQL’s a standard, but it’s a loose one at best. It’s useful for what it is, but don’t
assume that it’s a paved highway between different types of RDBMS that will let
you flip between vendors with ease.
It’s more like a gravel road that provides a path but you may get a bit dusty if you
travel it.
28. Ok, now that we have looked at some of the challenges with heterogeneous data,
let’s take a look at some of the technology solutions currently available to assist
with moving, replicating, or accessing data across different types of RDBMS.
Before I start, I’ll note again that this is not a complete list of solutions, even for the
top RDBMS’s mentioned at the start of the presentation.
These are the ones that are most obvious as a solution or those that I’ve had some
experience with so I feel I’m qualified to comment.
29. Most databases have much better tools and utilities for getting data into their
database compared with ways of transferring data to other types of databases.
Here are some of the common ones but there are also plenty of 3rd party utilities
available ranging from free to very expensive but in my experience, you definitely
get what you pay for in this area.
So if you’re looking to migrate your data from A to B, look at the tools and utilities
available from B, they will usually be better than those from A.
I guess this makes sense from a competitive point of view. Let’s make it easy for
customers to move data into our database but don’t give them any help moving
data out of our database.
30. I’m going to give a special mention to MySQL and its migration tool, mainly
because of its relationship to the Oracle RDBMS and what Oracle did to MySQL’s
migration tool.
A little bit of the interesting history behind MySQL:
MySQL was created in 1995.
In 2000 a company called Innobase developed the InnoDB storage engine for
MySQL. This is what made MySQL a “real” RDBMS as it included things like
transactions, row level locking, and foreign keys, etc.
In 2005, Oracle acquired Innobase saying it wanted to increase support for Open
Source software. (yeah, right). It was really a strategic move by Oracle to squeeze
the life out of MySQL.
Also in 2005, MySQL released a utility called the MySQL Migration Toolkit as part of
MySQL GUI Tools Bundle that offered Oracle to MySQL schema and data transfer.
In 2008, Sun acquired MySQL.
In 2010, Oracle purchased Sun and acquired MySQL in the process.
Now, I thought Oracle would kill MySQL but I'm happy to see they have continued
to support and enhance the platform. Oracle OpenWorld this year had over 70
sessions around MySQL content. Although, the cynic in me thinks Oracle is still
trying to keep MySQL from being a serious competitor with Oracle’s database.
In 2010 MySQL added migration functionality to their MySQL Workbench utility
which replaced the Migration Toolkit.
And when they did that, Oracle de-supported the Oracle database as a source for
migration. So you couldn’t do Oracle to MySQL anymore.
31. I can't find anything official except some forum comments to the effect that
"Migration from Oracle DB's is not supported."
So, if you want to migrate data from Oracle to MySQL, you can’t do it with the MySQL
Workbench as Oracle has removed that functionality.
There are other third party solutions for Oracle -> MySQL migrations, e.g.
http://www.ispirer.com/products/oracle-to-mysql-migration
Going the other way, as I’ve mentioned, Oracle’s SQL Developer can migrate
a selection of common RDBMS to Oracle, and it’s free.
32. Also, a very quick mention of some of the products that enable you to capture
changes to data in one type of database and apply those changes into another type
of database.
That is, replicating data between heterogeneous databases, and by this I mean,
synchronised copies in near real time.
In heterogeneous environments, this typically means the logical replication of the
data, where the SQL statements that change data on the source database (I’m
talking about the DML statements of the SQL language like insert, update, delete)
are extracted as they occur, again, typically from the database’s transaction logs, and
converted to the native SQL of the target database. This process is known as
Change Data Capture, or CDC.
Logical replication using Change Data Capture is often a viable solution in
heterogeneous environments because these products have the ability to translate the
changes into the native SQL of the target database, so once the bulk of the data
has been migrated to the target, a heterogeneous CDC product can keep the two
data sources in sync.
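To give a feel for the "apply" half of that process, here is a toy sketch in Python. Everything in it is an assumption for illustration: the change-record layout, the `emp` table, and the function names are made up, and SQLite (via the standard sqlite3 module) stands in for the target database. A real CDC product mines the source's transaction logs and handles datatypes, ordering, and conflicts; here the captured changes are simply hard-coded.

```python
import sqlite3

# Hypothetical change records, in the order they were committed on the source.
CHANGES = [
    {"op": "insert", "table": "emp", "key": {"id": 1}, "values": {"name": "Ann", "sal": 100}},
    {"op": "update", "table": "emp", "key": {"id": 1}, "values": {"sal": 120}},
    {"op": "insert", "table": "emp", "key": {"id": 2}, "values": {"name": "Bob", "sal": 90}},
    {"op": "delete", "table": "emp", "key": {"id": 2}, "values": {}},
]


def to_sql(change):
    """Render one change record as (sql, params) in the target's SQL."""
    table, key, values = change["table"], change["key"], change["values"]
    # Table and column names are trusted in this sketch; data goes in as bind values.
    where = " AND ".join(f"{col} = ?" for col in key)
    if change["op"] == "insert":
        row = {**key, **values}
        cols = ", ".join(row)
        marks = ", ".join("?" for _ in row)
        return f"INSERT INTO {table} ({cols}) VALUES ({marks})", list(row.values())
    if change["op"] == "update":
        sets = ", ".join(f"{col} = ?" for col in values)
        return (f"UPDATE {table} SET {sets} WHERE {where}",
                list(values.values()) + list(key.values()))
    if change["op"] == "delete":
        return f"DELETE FROM {table} WHERE {where}", list(key.values())
    raise ValueError(f"unknown operation: {change['op']}")


def apply_changes(conn, changes):
    """Apply the changes to the target in commit order, then commit."""
    for change in changes:
        sql, params = to_sql(change)
        conn.execute(sql, params)
    conn.commit()


def demo():
    """Build an empty target table, replay CHANGES, and return its rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, sal INTEGER)")
        apply_changes(conn, CHANGES)
        return conn.execute("SELECT id, name, sal FROM emp ORDER BY id").fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(demo())
```

The point of the `to_sql` step is that it is the only place that knows the target's dialect, which is why the same captured changes can, in principle, be replayed into a different type of database.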
There are many companies offering heterogeneous change data capture with
Oracle being at least one of the source and/or target databases.
• Oracle GoldenGate http://www.oracle.com/us/products/middleware/data-
34. In the final section of this presentation, I’d like to talk briefly about Oracle’s
solution to heterogeneous data access from other relational data sources.
Oracle’s had this functionality for many years but it’s gone through a number
of name changes. It started off being something called SQL*Connect, then
Transparent Gateways, but the latest name under 12c is Oracle Database
Gateways. But it’s essentially part of what Oracle called Heterogeneous Services
under 11g.
Oracle Gateways allow heterogeneous data access from other relational data
sources to an Oracle application.
Gateways are available for RDBMSs like DB2 and SQL Server but also non relational
data sources like Excel and transaction managers like IBM’s CICS and message
queuing systems like IBM’s MQ.
The Gateways are a separate purchased option but are available, with a couple of
exceptions, for both Standard and Enterprise database editions.
These gateways handle some of the issues I have been talking about like SQL
translations, dictionary translations, datatype mappings.
The gateways for specific databases aren’t cheap: about the same per processor
license cost as GoldenGate.
35. However, the Database Gateway for ODBC, which is a generic gateway for any
ODBC compliant non-Oracle system, is free with the database, although it is more
functionally restricted than the specific Database Gateways and you typically still
need to purchase an ODBC driver.
36. The way they work is that SQL statements are translated into the SQL of the non-
Oracle database.
With SQL statements, if the functionality is missing on the non-Oracle system, then
either a simpler query is issued, or the statement is broken up into multiple queries
and the results are obtained by post-processing in the Oracle database.
Remember, most of these features come with a list of restrictions and limitations
to capability so it’s not as simple as I’ve described it. For example, the
Heterogeneous Connectivity User’s Guide lists 10 rules restricting the use of SQL
statements in a heterogeneous distributed environment, so it’s not 100%
transparent.
But here’s a couple of examples of what I’m talking about.
37. All RDBMSs store metadata, that is, data about the data. Trouble is, they all store
this information in different ways.
One of the facilities that the Gateway provides is data dictionary translations.
So the example shows Oracle executing a select from the ALL_CATALOG data
dictionary but through a link to a SQL Server database.
The Gateway intercepts the query and translates it into the dictionary objects of
the SQL Server database.
The results of the new query are then returned to the user as if the information
came from the ALL_CATALOG view within Oracle.
38. There’s a package that’s part of Oracle’s heterogeneous services that deserves
special mention.
Using the DBMS_HS_PASSTHROUGH package allows you to execute SQL
statements directly on the non-Oracle system without them being interpreted by
the Oracle database.
What’s special about DBMS_HS_PASSTHROUGH is that it’s a virtual package. It
doesn’t exist in the Oracle or non-Oracle system, yet it still works! Conceptually it
resides on the non-Oracle system but in reality, calls to the package are intercepted
by the Heterogeneous Services component of Oracle and mapped to one of the
Gateway calls.
39. And so, just to wrap up before I take any questions, here’s a few key points from
the session.
• Know that there are options out there for accessing and moving data between
different types of data stores.
Depending on your position within your company, you may not need to be
aware of them all but at least know someone who is and can select the
one that’s right for you.
• SQL databases are not the same, but with the SQL language, and some careful
consideration, they can work together, often seamlessly.
• Don’t sacrifice performance and features for conformity. Use what you have
been given, and paid for, to the best of its ability.
40. I want to leave a bit of time for questions so I’ve skipped a few topics like database
abstraction layers and ODBC.
Also, the combination of lots of RDBMSs and business use-case requirements
results in dozens of different functional specifications, and it would be impossible to
cover all the options in this type of session, but knowing the capabilities of the
available options will help you select the best fit for your requirements.
41. Ok, I hope you found that interesting and learnt a few things along the way.
Well, thank you for your attendance, and please enjoy the rest of the conference.
Thanks!