I’m a code monkey in the mysql optimizer team who turned a teenager last year. Feels good to call myself that. Been working with mysql for 13 years now. The last few years I’ve been wallowing around in the boondocks of the optimizer with regular expressions, character sets and now time zones. All the fun stuff.
Right off the bat: here are the points I’d like you to take with you. (read)
If you follow all of these, I will be happy. And you will be happy, too. I hope that I can convince you to follow these.
These are the most important temporal types. We have the types DATE and TIME as well, but they are not as commonly used. These types can also have 1-6 fractional digits, and that works the same way in all types. It’s one, two or three extra bytes. But this is for comparison.
A timestamp is a Unix time_t underneath unless you have fractional digits. Note that on disk, it’s a 32 bit time_t.
A datetime is all the numbers packed together: 1 sign bit, 17 bits for year and month, 5 bits each for day and hour, 6 bits each for minute and second. 40 bits. 5 bytes.
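To make the 40 bits concrete, here is a rough Python sketch of such a packing. It is an illustration, not the server’s code: the function names are made up, and I’m assuming the 17-bit field holds year * 13 + month and that the sign bit is 1 for non-negative values.

```python
def pack_datetime(year, month, day, hour, minute, second):
    """Pack the fields into 40 bits (5 bytes), most significant field first."""
    year_month = year * 13 + month       # assumed encoding, fits in 17 bits
    value = 1                            # sign bit, 1 = non-negative (assumption)
    value = (value << 17) | year_month   # 17 bits: year and month
    value = (value << 5) | day           # 5 bits
    value = (value << 5) | hour          # 5 bits
    value = (value << 6) | minute        # 6 bits
    value = (value << 6) | second        # 6 bits
    return value.to_bytes(5, "big")


def unpack_datetime(raw):
    """Reverse of pack_datetime: 5 bytes back to a tuple of fields."""
    value = int.from_bytes(raw, "big")
    second = value & 0x3F
    minute = (value >> 6) & 0x3F
    hour = (value >> 12) & 0x1F
    day = (value >> 17) & 0x1F
    year_month = (value >> 22) & 0x1FFFF
    return (year_month // 13, year_month % 13, day, hour, minute, second)


packed = pack_datetime(2019, 10, 27, 2, 30, 0)
print(len(packed), unpack_datetime(packed))
```

A nice side effect of putting the biggest unit first is that comparing the raw bytes gives chronological order.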
TIMESTAMP has the same limitation as time_t and it has the infamous year 2038 problem (more precisely…) when it wraps around.
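If you want to see where a signed 32-bit counter of seconds runs out, a few lines of Python will do. This is plain arithmetic on the Unix epoch, nothing MySQL-specific; the variable names are mine.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1

# The last second a signed 32-bit time_t can represent.
last_second = EPOCH + timedelta(seconds=INT32_MAX)

# One second later, a signed 32-bit counter wraps around to -2**31.
wrapped = EPOCH + timedelta(seconds=-(2**31))

print(last_second.isoformat())  # 2038-01-19T03:14:07+00:00
print(wrapped.isoformat())
```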
DATETIME runs from 1000 to 9999. Let’s just say it’s slightly more future proof. Supports some invalid dates, zero day or month, leap day etc. Not that you should do that. Understand there’s a lot of legacy out there.
And then there’s the thing with time zone. A datetime is what it is. Whatever you store you get back, regardless. A timestamp is converted from the session’s time zone and stored as utc. When read it’s converted back to your time zone. If that time zone observes daylight saving, we have a problem.
5 min. This problem is so big it gets its own slide. Let me talk a bit about what I call the ds problem.
There’s a lot of things that can go wrong when you query data stored as TIMESTAMP and ds comes into play.
- Some rows may be equal under certain conditions, different in others.
- Ordering of rows may be unexpected.
- Replication slaves may go out of sync, creating problems for analytics.
- Queries over unique indexes may appear to produce duplicates.
All of these boil down to the daylight savings problem. What is that?
Many countries use daylight savings. It means that in summer time, clocks are set ahead one hour so that people have one extra hour of daylight after work. In fall, the clocks are set back one hour. So we have one 23 hour day in spring, and a 25 hour day in fall.
The idea of daylight saving was proposed by the astronomer George Hudson in 1895 and was in use for the first time in 1916 in the German empire, and it’s controversial to this day. The main proponents were golfers and entomologists who didn’t like to cut their activities short at dusk. For some time they experimented with having government meetings earlier in the day in summer, but eventually they settled for simply setting the clocks forward. It’s really unique for a unit of measurement to be tailored to a specific purpose.
I wish we’d do the same with other units. Maybe make a kilo heavier around Christmas time?
In the Nordic countries, where I live, we don’t care about daylight saving so much because in the summer you have as much daylight as you can take either way. The sun is up when I wake up, and it’s still up when I go to bed. And in the winter, who cares? It’s pitch black when I go to work and pitch black when I get home.
UTC doesn’t observe daylight saving. So in a time zone that has DST, the displacement from UTC will vary. This happened on Sunday March 31 last year. Instead of going from 1:59 to 2, the clock jumped from 1:59 to 3, moving the displacement from one hour ahead of UTC to two. This means that there is a whole hour that doesn’t exist in the CET calendar. A leap hour if you wish. But it doesn’t get much worse than that. The shift in the other direction is far worse.
On October 27 last year at 3 in the morning, this happened. After 2:59:59, the next second it was 2 am yet again, and the displacement from UTC became not two but just one hour.
So in the CET time zone we’re reliving one hour each fall, groundhog day-style.
This is a very popular hour to go partying where I live, because the authorities force the bars to close at 3. So at 2:59 you have another hour to party!
As you can see, the conversion from UTC to CET is not lossless.
<click> If you only see 2:30, it could mean two different times.
<click> MySQL will always choose the later interpretation in these cases.
Here’s a demonstration where mysql chooses the later time. The big number is the time_t value, or seconds since the unix epoch January first 1970.
The numbers are 3600 seconds apart, which is 60 x 60, one hour. Which is the groundhog hour. And when it gets an ambiguous time such as 2:30,
it chooses the later time.
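Here is a small Python sketch of the same thing, outside the server. It hardcodes the two 2019 transitions for central European time (an assumption made purely for illustration; real code would use a time zone database), and the function names are invented. It finds every UTC instant that displays as a given wall-clock time and, like MySQL, picks the later one when there are two.

```python
from datetime import datetime, timedelta, timezone

# Hardcoded 2019 transitions for central European time (illustration only).
DST_START_UTC = datetime(2019, 3, 31, 1, 0, tzinfo=timezone.utc)
DST_END_UTC = datetime(2019, 10, 27, 1, 0, tzinfo=timezone.utc)


def utc_offset(utc_time):
    """Displacement from UTC in effect at a given UTC instant in 2019."""
    in_summer_time = DST_START_UTC <= utc_time < DST_END_UTC
    return timedelta(hours=2 if in_summer_time else 1)


def to_local(utc_time):
    """UTC instant -> naive wall-clock time in CET/CEST."""
    return (utc_time + utc_offset(utc_time)).replace(tzinfo=None)


def to_utc_candidates(wall_clock):
    """All UTC instants that display as the given wall-clock time."""
    candidates = []
    for hours in (2, 1):
        guess = wall_clock.replace(tzinfo=timezone.utc) - timedelta(hours=hours)
        if to_local(guess) == wall_clock:
            candidates.append(guess)
    return candidates


groundhog = to_utc_candidates(datetime(2019, 10, 27, 2, 30))  # two instants
leap_hour = to_utc_candidates(datetime(2019, 3, 31, 2, 30))   # none at all
later = max(groundhog)                                        # the later one
print([int(t.timestamp()) for t in groundhog], len(leap_hour))
```

The two candidates for 2:30 are exactly 3600 seconds apart, and the wall-clock time in the spring gap has no candidate at all.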
The representation is different on different levels, which has some surprising effects!
Let me show you what I mean by that with an example.
Let me show you where ordering goes awry. We first set the time zone to utc and insert times right in the groundhog hour. <read>
10 min. All is well. Now let’s just change the time zone back to CET and let’s get the rows. And let’s get them in order, please.
As we change the time zone to CET, the times look funny. There’s 59 minutes from the second to the third. And ordering?! Is MySQL trolling us? Actually it isn’t, because these times are stored as 0:30, 0:59 and 1:00 am. We only interpret the output of filesort in our local time zone. We would get the same result even with an index.
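The effect is easy to reproduce outside the server. This Python sketch sorts the three stored UTC values and only then converts them for display, which is roughly the order of events described above. The fall 2019 transition is hardcoded and the helper name is made up, so treat it as a toy model rather than what the server does internally.

```python
from datetime import datetime, timedelta, timezone

# Hardcoded fall 2019 transition for central European time (illustration only).
DST_END_UTC = datetime(2019, 10, 27, 1, 0, tzinfo=timezone.utc)


def display_in_cet(utc_time):
    """Format a UTC instant as wall-clock time, two hours ahead until the switch."""
    offset = timedelta(hours=2) if utc_time < DST_END_UTC else timedelta(hours=1)
    return (utc_time + offset).strftime("%H:%M")


stored = [
    datetime(2019, 10, 27, 1, 0, tzinfo=timezone.utc),
    datetime(2019, 10, 27, 0, 30, tzinfo=timezone.utc),
    datetime(2019, 10, 27, 0, 59, tzinfo=timezone.utc),
]

ordered = sorted(stored)                           # sorting happens on UTC values
displayed = [display_in_cet(t) for t in ordered]   # conversion happens afterwards
print(displayed)  # ['02:30', '02:59', '02:00']
```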
We’ll need to take a look at mysql’s architecture to sort this out.
In the storage engine layer is where the tables and indexes live. Here timestamps are utc. In the sql layer we have the time in our local time zone, mostly.
As you saw in the previous slide, I had to play with the time_zone variable to deliver these examples. I don’t know about you but I find it annoying having to change the time zone around all the time. Besides, that’s not really an option for some of you. The application might not let you, or you reuse your sessions.
Since 8.0.19, you can add a time zone to a TIMESTAMP or DATETIME literal, expressed as a displacement from UTC. That looks like a plus or minus sign, with hours and minutes, two digits each. Note that we follow ISO 8601 to the letter here. MySQL is extremely liberal with the syntax of temporal literals, and this is a complete departure from that. The range is -14 to +14 as the SQL standard prescribes; it used to be 13, so we bumped it up for that very reason. You can use it ONLY with the ISO 8601 extended format, which looks exactly like these. Only valid dates: internally, dates are converted to the equivalent of epoch seconds, then the displacement is subtracted, then they are converted back. An invalid date has an undefined result when you convert to epoch seconds.
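The arithmetic is simple enough to sketch in Python. This is not the server’s parser: the function name is invented, it handles only the exact layout 'YYYY-MM-DD hh:mm:ss±hh:mm', and I’m assuming a plain ±14:00 range check. It does show why the displacement removes the ambiguity.

```python
from datetime import datetime, timedelta, timezone


def literal_to_utc(literal):
    """Convert 'YYYY-MM-DD hh:mm:ss+hh:mm' to a UTC datetime (toy parser)."""
    date_part, offset_part = literal[:19], literal[19:]
    if len(offset_part) != 6 or offset_part[0] not in "+-" or offset_part[3] != ":":
        raise ValueError("displacement must look like +hh:mm or -hh:mm")
    sign = 1 if offset_part[0] == "+" else -1
    displacement = sign * timedelta(hours=int(offset_part[1:3]),
                                    minutes=int(offset_part[4:6]))
    if abs(displacement) > timedelta(hours=14):
        raise ValueError("displacement out of range")
    # strptime rejects invalid dates such as February 30.
    wall_clock = datetime.strptime(date_part, "%Y-%m-%d %H:%M:%S")
    # Subtract the displacement to get from local time to UTC.
    return (wall_clock - displacement).replace(tzinfo=timezone.utc)


first_pass = literal_to_utc("2019-10-27 02:30:00+02:00")   # still summer time
second_pass = literal_to_utc("2019-10-27 02:30:00+01:00")  # after the switch
print(first_pass.isoformat(), second_pass.isoformat())
```

The same wall-clock time with two different displacements gives two distinct UTC instants, so nothing has to be guessed.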
15 min. You can of course use it in the client/server protocol as well. It is then an extra signed 2-byte integer for the time zone. Added to the MYSQL_TIME struct.
So now we have 4 bytes for date values, 7 for date+time, 11 for date+time w/fractional, and 13 byte value for date+time+fractional+tz. We deliberately didn’t do a 9 byte value for date+time+tz b/c didn’t want to paint ourselves into a corner. In case we want to extend this in the future.
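Those four sizes fall out naturally if you lay the fields end to end. The Python sketch below uses struct format strings to count the bytes. The field order, the little-endian byte order and the unit of the displacement (minutes) are my assumptions for the sake of illustration, not a statement of the wire format.

```python
import struct

# "<" means little-endian with no padding between fields.
DATE_ONLY = "<HBB"               # year (2 bytes), month, day
DATE_TIME = "<HBBBBB"            # + hour, minute, second
DATE_TIME_FRAC = "<HBBBBBI"      # + microseconds (4 bytes)
DATE_TIME_FRAC_TZ = "<HBBBBBIh"  # + signed 2-byte displacement

formats = [DATE_ONLY, DATE_TIME, DATE_TIME_FRAC, DATE_TIME_FRAC_TZ]
sizes = [struct.calcsize(fmt) for fmt in formats]
print(sizes)  # [4, 7, 11, 13]

# 2019-10-27 02:30:00.000000 at +01:00, displacement in minutes (assumption).
payload = struct.pack(DATE_TIME_FRAC_TZ, 2019, 10, 27, 2, 30, 0, 0, 60)
print(len(payload))
```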
Armed with this, what happens when you use a displacement of zero to insert into a TIMESTAMP column is you insert exactly what you see. What you see is what you get.
Now, as you can see, the client is still on CET, which does the lossy conversion from UTC to 2:30. Ambiguous. Say I want to see it in UTC. Let’s fix that.
Since 8.0.17, you can use the SET_VAR hint in comment hints. That way you can temporarily set a session variable to something else for the duration of a statement. Works for SELECT, INSERT, UPDATE and DELETE. It looks like this. No quotes. So if we set it to UTC, we’re going to get back what we stored, lossless.
So, as you can see, TIMESTAMP is challenging to work with. DATETIME has none of these drawbacks. Granted, it doesn’t carry time zone information, so the application has to convert it. Convention. Another reason we push for DATETIME is the standard.
The SQL standard defines three temporal types: DATE, TIME and TIMESTAMP. TIMESTAMP and TIME come with or without time zone. Temporal values without time zone don’t point to a specific point in time; they need to be interpreted through a time zone. TIME WITH TIME ZONE is a specific time but not a specific date. TIMESTAMP WITH TIME ZONE points to an exact time no matter the time zone of the client that inserted it. There is no loss of info here.
We have four types. DATE and TIME act like the standard ones. DATETIME is most similar to TIMESTAMP WITHOUT TIME ZONE. TIMESTAMP is an odd creature in this regard, somewhere in between the two. It’s a TIMESTAMP WITH TIME ZONE in the storage engine, but only UTC. And it’s displayed as a TIMESTAMP WITHOUT TIME ZONE to the user. That’s where all the problems stem from.
Hopefully it’s obvious by now why it’s a good idea to have the database server set to UTC. Either the entire box, or set globally in the MySQL server.
Why should you use NULL instead of zero dates? What is it even? <ask>
So without further ado, I present to you, the zero date. This is what it looks like.
Obviously, just a hack that was needed at some point. So why shouldn’t you use it?
I’ll turn the question around. My question is, why would you? These are some of the arguments in favor of “zero dates” – or myths, I should call them.
Let’s debunk them. We start with automatic init
This is automatic initialization of dates. You can have a column set automatically either when the row is inserted or when it’s updated. In the past there was a weird set of restrictions. The column had to be TIMESTAMP, it had to be NOT NULL, and it had to be the first such column. You couldn’t have one column with DEFAULT NOW() and another with ON UPDATE NOW(), for example.
This forced users to resort to the zero date hack when they needed a null value. There is also some support for it in the server. That’s why the lowest timestamp is jan first 1970 at midnight PLUS ONE SECOND because the zero value is reserved to mean “zero date”.
20 min. BUT the previous slide hasn’t been true since 5.6.5. Nowadays you can have auto init for datetime and as many as you like. Mix and match. Knock yourself out. and yes they can have null in them.
Is it easier? The relational model was made for this.
Easier? How can it possibly be any easier than that? It’s plain English. In SQL, NULL means “piece of information missing”. Some say NULL loses all comparisons. That’s not really true; the result is UNKNOWN. But the WHERE clause has an implicit IS TRUE predicate, which you can easily bypass by adding an IS UNKNOWN predicate on top.
If you work with the relational model, it all works out. If you don’t like the relational model, fine, you can use mysql as a document store. That’s fine, too. But this kind of hack will only make life harder.
If you use NULL, it will get filtered out from your comparisons by default, so you only get known times.
However, if you have the zero date, it is a valid value and a query like this will give you all zero dates unless you add a special case for it.
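You can watch the difference with any SQL engine. The sketch below uses Python’s built-in sqlite3 module as a stand-in, so it is not MySQL: the dates are stored as plain text, the table and column names are made up, and IS NULL on the comparison plays the role of IS UNKNOWN. The filtering behavior of NULL in a WHERE clause is the same idea, though.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (name TEXT, with_null TEXT, with_zero TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("shipped", "2019-06-01 12:00:00", "2019-06-01 12:00:00"),
        ("pending", None, "0000-00-00 00:00:00"),  # same missing fact, two encodings
    ],
)
cutoff = ("2020-01-01 00:00:00",)


def names(where):
    """Return the event names matching a WHERE clause with one parameter."""
    query = f"SELECT name FROM events WHERE {where} ORDER BY name"
    return [row[0] for row in conn.execute(query, cutoff)]


null_rows = names("with_null < ?")               # UNKNOWN rows are filtered out
zero_rows = names("with_zero < ?")               # the zero date sneaks in
unknown_rows = names("(with_null < ?) IS NULL")  # asking for the unknowns
print(null_rows, zero_rows, unknown_rows)
```

With NULL you only get the known time back; with the zero date you get the pending row as well, unless you remember to exclude it every time.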
What about less space though? In fact, the opposite is true. In the Barracuda format in InnoDB we have null bytes before each record, so typically a null value doesn’t take up any space at all. [todo: more details]
Efficiency? Nah, comparison with null takes exactly the same time as comparison with the zero date. There’s a statistically insignificant slant in *favor* of NULLs.
This holds for both myisam and innodb, btw.
A very valid reason to use zero dates is because of legacy data, which is why it’s still supported and will continue to be. But you need to be aware of the problems involved.
Invalid dates are a headache to work with, so you’re better off not letting them into your database. Arithmetic on invalid dates does not work, nor does the new time zone support.
to_days(), to_seconds() give null for zero date and unexpected values for dates with zeroes.
Quoting a bug report. When you do SET sql_mode = 'ALLOW_INVALID_DATES';
you cannot expect anything good after it.
Now for my last point. ISO formats.
We are definitely moving in the direction of following standards more and more. We have deprecated a lot of non-standard behavior in the past, like NO_ZERO_DATE, NO_ZERO_IN_DATE and ALLOW_INVALID_DATES.
As I touched upon before, MySQL is extremely liberal in how it interprets dates. And there are plenty of ways you can shoot yourself in the foot. To be on the safe side, I strongly recommend that you stick with the ISO formats. ISO 8601 is the most important. This standard is concerned only with the formats, not the interpretation.
A general principle is that the number of digits is always fixed. There *have* to be leading zeroes in all fields. And the year *has* to have four digits. The standard uses a 24-hour clock. Sorry, all Americans.
I already covered time zone so won’t go into it here. If no time zone is present, local time is assumed, which is consistent with how it works now in mysql.
It comes in basically two versions, the extended and basic formats.
RFC 3339 allows a space instead of the T between date and time.
There is also a truncated format. Think Y2K problem. It was removed from the standard in 2004.
With that removed, both formats may omit the last digits for smaller precision. Always greater to smaller however.
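For reference, this is what the two formats look like when generated from Python. The format strings are ordinary strftime patterns that I chose to match the extended and basic layouts; the date itself is arbitrary.

```python
from datetime import datetime

moment = datetime(2019, 3, 5, 9, 7, 0)

EXTENDED = "%Y-%m-%dT%H:%M:%S"   # separators between the fields
BASIC = "%Y%m%dT%H%M%S"          # same digits, no separators

extended = moment.strftime(EXTENDED)
basic = moment.strftime(BASIC)
reduced = moment.strftime("%Y-%m-%dT%H:%M")  # smaller precision: drop the seconds

print(extended)  # 2019-03-05T09:07:00
print(basic)     # 20190305T090700
print(reduced)   # 2019-03-05T09:07

# Fixed widths and leading zeroes are what make the basic format parseable.
round_trip = datetime.strptime(basic, BASIC)
```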
Everything you always wanted to know about datetime types but didn’t have time to ask
MySQL Optimizer team
How to avoid the most common pitfalls of date and time types in MySQL
The Main Players
32-bit seconds since Epoch (time_t)
1970-01-01 00:00:01 UTC to
2038-01-19 03:14:07 UTC
Session time zone in server, UTC in SE
Bit-encoded year, month, day, etc, 5 bytes.
1000-01-01 00:00:00 to