0
Today we will use the ggplot2 and lubridate
packages
install.packages("lubridate")
library(ggplot2) #load first!
library(lu...
Garrett Grolemund
Phd Student / Rice University
Department of Statistics
Dates and
Times
1. four more table rules
2. manipulating date-times
3. manipulating time spans (i.e, math
with date-times)
Table style
guidelines
1. When to use a table
Use a table instead of a graphic if:
a) there is only a small amount of data, or
b) precision is im...
2. Significant digits
Pick an appropriate amount of significant
digits (at most 4 or 5)
Use signif( ) to round data to that ...
3. Align decimals
Ensure that decimal points line up so that
differences in order of magnitude are easy
to spot.
3. Align decimals
Ensure that decimal points line up so that
differences in order of magnitude are easy
to spot.
4. Include captions
Always include a captions.
Less enthusiastic readers will only look at
your figures, so try to summariz...
date-times
in R
Time is a measurement system similar to
the number system (which measures
quantity). Just as numbers can be
arranged on a ...
0 CE
A date-time is a specific instant of time. It
refers to an exact point on the time line.
For example,
January 1st, 200...
0 CE
2000-01-01 12:34:00
Identifying instants
x seconds since 0 CE
Instants of time are commonly identified in two
ways:
1)...
R stores date-times as either POSIXct or a POSIXlt
objects.
POSIXct objects are stored as the number of seconds
since a re...
Parsing dates
Parse character strings into date-times
with the ymd() type functions in lubridate
e.g, ymd("2010-11-01")*
*...
Parsing dates
use the function whose name matches the
order of the elements in the date
ymd("2010-11-02")
dmy("02/11/2010"...
Your turn
Parse the column of dates in emails.csv
into POSIXct objects.
Access a specific element of a date-time with the
function that has the element’s name
now()
year(now())
hour(now())
day(now())
yday(now())
wday(now())
wday(now(), label = TRUE)
wday(now(), label = TRUE, abbr ...
Your turn
Use the email data to determine at which
hour of the day Professor Wickham is
most likely to respond to your ema...
em <- read.csv("emails.csv")
em$hour <- hour(em$time)
qplot(hour, data = em, binwidth = 1)
Accessor functions can also be used to
change the elements of a date-time
now()
year(now()) <- 1999
hour(now()) <- 23
day(...
Time zones
Different clock times. All refer to the same
instant of time
2010-11-01
03:53:06 PDT
2010-11-01
04:53:06 MDT
2010-11-01
05...
with_tz(now(), "UTC")
with_tz(now(), "America/New_York")
Same clock times. All refer to the different
instants of time
2010-11-01
03:53:06 PDT
2010-11-01
03:53:06 MDT
2010-11-01
0...
Recall that force_tz() returns a new instant
of time
force_tz(now(), "UTC")
force_tz(now(), "America/New_York")
What time ...
rounding instants
We can round an instant to a specified
unit using floor_date(), ceiling_date(),
round_date()
e.g.
round_da...
Your turn
Use ddply to calculate how many emails
Hadley received per day since September
1st. How will you subset? How wil...
# just the recent emails
recent <- subset(em, time > ymd("2010-09-01"))
# binning into days
recent$day <- lubridate::floor_...
qplot(day, emails, data = daily, geom = "line")
qplot(wday(day, label = T), emails, data = daily,
geom = "boxplot")
Time spans/
math with
date-times
0 CE
A time span is a period of time. It refers to
an interval on the time line. For example,
19 months, or
one century
on...
What does time
measure?
- one half of something called space-time?
- position of the sun?
- tilt of the Earth’s axis?
- nu...
The month suggests the tilt of the Earth’s
axis (but requires a leap day to get back
in sync)
The hour suggests where the ...
Consider
Suppose we wish to record the opening value of
the S&P 500 everyday for a month. Since the
stock market opens at ...
Why does this matter?
What do we mean by exactly one month
from now?
How long is an hour?
How long will an hour be at 2:00...
According to clock times, the time line
looks more like this
day light savings leap day
day light savings
3 types of time spans
We can still do math on the time line, as
long as we’re specific about how we want
to measure time.
W...
DST <- force_tz(ymd_hms("2010-11-7 01:06:39"), "")
durations
Durations measure the exact amount of
seconds that pass betwe...
Use new_duration() or a helper function to create a
durations object. Helper functions are named d + the
plural of the obj...
Durations are appropriate when you wish
to measure a time span exactly, or
compare tow time spans. For example,
- the radi...
Periods measure time spans in units
larger than seconds. Periods pay no
attention to how many sub-units occur
during the u...
{
No surprises.
2010-11-01 00:00:00 + months(1) will always
equal 2010-12-01 00:00:00 no matter how many
leap seconds, lea...
Why not use periods?
We cannot accurately compare two
periods unless we know when they occur.
1 month = 31 days
January = ...
Use new_period() or a helper function to create a
period object. Helper functions are simply the plural
of the object you ...
Periods are appropriate when you wish to
model events that depend on the clock
time. For example,
- the opening bell of a ...
parsing time spans
a time span that contains only hours,
minutes, and seconds information can be
parsed as a period with h...
Intervals measure a time span by
recording its endpoints. Since we know
when the time span occurs, we can
calculate the le...
Intervals retain all of the information
available about a time span, but cannot be
generalized to other spots on the time ...
Create an interval with new_interval() or by
subtracting two dates.
int <- ymd("2010-01-01") - ymd("2009-01-01")
Access an...
converting between
time spans
Periods can be converted to durations by
using the most common lengths (in
seconds) of each ...
arithmetic with
date-times
time spans are meant to be added and
subtracted to both each other and instants
e.g, now() + da...
Challenge
Write a function that takes a date and
returns the last day of the month the date
occurs in.
multiplication/division
Multiplication and division of time spans works as
expected.
Dividing periods with durations or ot...
modulo arithmetic and
integer division
integer division
128 %/% 5 = 25
modulo
128 %% 5 = 3
Your turn
Calculate your age in minutes. In hours. In
days. How old is someone who was born
on June 4th, 1958?
lifetime <- now() - ymd("1981-01-22")
lifetime / dminutes(1)
lifetime / dhours(1)
lifetime / ddays(1)
(now() - ymd("1981-0...
This work is licensed under the Creative
Commons Attribution-Noncommercial 3.0 United
States License. To view a copy of th...
20 date-times
Upcoming SlideShare
Loading in...5
×

20 date-times

1,206

Published on

Published in: Business, Technology
0 Comments
2 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
1,206
On Slideshare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
15
Comments
0
Likes
2
Embeds 0
No embeds

No notes for slide

Transcript of "20 date-times"

  1. 1. Today we will use the ggplot2 and lubridate packages install.packages("lubridate") library(ggplot2) #load first! library(lubridate) #load second!
  2. 2. Garrett Grolemund Phd Student / Rice University Department of Statistics Dates and Times
  3. 3. 1. four more table rules 2. manipulating date-times 3. manipulating time spans (i.e, math with date-times)
  4. 4. Table style guidelines
  5. 5. 1. When to use a table Use a table instead of a graphic if: a) there is only a small amount of data, or b) precision is important
  6. 6. 2. Significant digits Pick an appropriate amount of significant digits (at most 4 or 5) Use signif( ) to round data to that amount
  7. 7. 3. Align decimals Ensure that decimal points line up so that differences in order of magnitude are easy to spot.
  8. 8. 3. Align decimals Ensure that decimal points line up so that differences in order of magnitude are easy to spot.
  9. 9. 4. Include captions Always include a captions. Less enthusiastic readers will only look at your figures, so try to summarize the whole story there. Captions should explain what data the figure shows and highlight the important finding.
  10. 10. date-times in R
  11. 11. Time is a measurement system similar to the number system (which measures quantity). Just as numbers can be arranged on a number line, times can be arranged on a time line monotonically increasing 0 CEBC AD
  12. 12. 0 CE A date-time is a specific instant of time. It refers to an exact point on the time line. For example, January 1st, 2000 12:34:00, or right now January 1st, 2000 12:34:00 right now
  13. 13. 0 CE 2000-01-01 12:34:00 Identifying instants x seconds since 0 CE Instants of time are commonly identified in two ways: 1) as the number of seconds since a reference time 2) by a unique combination of year, month, day, hour, minute, second, and time zone values
  14. 14. R stores date-times as either POSIXct or a POSIXlt objects. POSIXct objects are stored as the number of seconds since a reference time (by default 1970-01-01 00:00:00 UTC) unclass(now()) POSIXlt objects are stored as a unique combination of year, month, day, hour, minute, second, and time zone values unclass(as.POSIXlt(now())
  15. 15. Parsing dates Parse character strings into date-times with the ymd() type functions in lubridate e.g, ymd("2010-11-01")* *note: this is also a quick way to create new dates
  16. 16. Parsing dates use the function whose name matches the order of the elements in the date ymd("2010-11-02") dmy("02/11/2010") mdy("11.02.10") ymd_hms("2010-11-02 04:22:58")
  17. 17. Your turn Parse the column of dates in emails.csv into POSIXct objects.
  18. 18. Access a specific element of a date-time with the function that has the element’s name
  19. 19. now() year(now()) hour(now()) day(now()) yday(now()) wday(now()) wday(now(), label = TRUE) wday(now(), label = TRUE, abbr = FALSE) month(now(), label = TRUE, abbr = FALSE) Determine which day of the week you were born on
  20. 20. Your turn Use the email data to determine at which hour of the day Professor Wickham is most likely to respond to your email.
  21. 21. em <- read.csv("emails.csv") em$hour <- hour(em$time) qplot(hour, data = em, binwidth = 1)
  22. 22. Accessor functions can also be used to change the elements of a date-time now() year(now()) <- 1999 hour(now()) <- 23 day(now()) <- 45 tz(now()) <- "UTC" Determine what day of the week your birthday will be on next year
  23. 23. Time zones
  24. 24. Different clock times. All refer to the same instant of time 2010-11-01 03:53:06 PDT 2010-11-01 04:53:06 MDT 2010-11-01 05:53:06 PDT 2010-11-01 06:53:06 EDT Switch between time zones with with_tz()
  25. 25. with_tz(now(), "UTC") with_tz(now(), "America/New_York")
  26. 26. Same clock times. All refer to the different instants of time 2010-11-01 03:53:06 PDT 2010-11-01 03:53:06 MDT 2010-11-01 03:53:06 PDT 2010-11-01 03:53:06 EDT Alter time zone with force_tz()
  27. 27. Recall that force_tz() returns a new instant of time force_tz(now(), "UTC") force_tz(now(), "America/New_York") What time is it now where Hadley is (London)? What time was it here when London clocks displayed our current time?
  28. 28. rounding instants We can round an instant to a specified unit using floor_date(), ceiling_date(), round_date() e.g. round_date(now(), "day") round_date(now(), "month")
  29. 29. Your turn Use ddply to calculate how many emails Hadley received per day since September 1st. How will you subset? How will you pair up emails sent on the same day? Plot the number of emails per day over time. Compare the number of emails sent by day of the week.
  30. 30. # just the recent emails recent <- subset(em, time > ymd("2010-09-01")) # binning into days recent$day <- lubridate::floor_date(recent$time, "day") # calculating daily totals > daily <- ddply(recent, "day", summarise, words = sum(words), emails = length(day))
  31. 31. qplot(day, emails, data = daily, geom = "line") qplot(wday(day, label = T), emails, data = daily, geom = "boxplot")
  32. 32. Time spans/ math with date-times
  33. 33. 0 CE A time span is a period of time. It refers to an interval on the time line. For example, 19 months, or one century one century 19 months
  34. 34. What does time measure? - one half of something called space-time? - position of the sun? - tilt of the Earth’s axis? - number of days left until the weekend? - All of the above (and poorly at that)?
  35. 35. The month suggests the tilt of the Earth’s axis (but requires a leap day to get back in sync) The hour suggests where the sun is in the sky (but requires time zones and daylight savings) The Earth’s movement is decelerating, but space-time is constant (which requires random leap seconds) as.period(diff(.leap.seconds))
  36. 36. Consider Suppose we wish to record the opening value of the S&P 500 everyday for a month. Since the stock market opens at 8:30 CST, we could calculate: force_tz(ymd_hms("2010-01-01_08:30:00"), "") + ddays(0:30) how about force_tz(ymd_hms("2010-03-01_08:30:00"), "") + ddays(0:30) What went wrong?
  37. 37. Why does this matter? What do we mean by exactly one month from now? How long is an hour? How long will an hour be at 2:00 this sunday morning?
  38. 38. According to clock times, the time line looks more like this day light savings leap day day light savings
  39. 39. 3 types of time spans We can still do math on the time line, as long as we’re specific about how we want to measure time. We can use 3 types of time spans, each measures time differently: durations, periods, and intervals
  40. 40. DST <- force_tz(ymd_hms("2010-11-7 01:06:39"), "") durations Durations measure the exact amount of seconds that pass between two time points. DST + ddays(1)
  41. 41. Use new_duration() or a helper function to create a durations object. Helper functions are named d + the plural of the object you are trying to create. new_duration(3601) new_duration(minute = 5) dminutes(5) dhours(278) dmonths(4) #no dmonths()
  42. 42. Durations are appropriate when you wish to measure a time span exactly, or compare tow time spans. For example, - the radioactive half life of an atom - the lifespan of two brands of lightbulb - the speed of a baseball pitch
  43. 43. Periods measure time spans in units larger than seconds. Periods pay no attention to how many sub-units occur during the unit of measurement. periods DST + days(1) {
  44. 44. { No surprises. 2010-11-01 00:00:00 + months(1) will always equal 2010-12-01 00:00:00 no matter how many leap seconds, leap days or changes in DST occur in between Why use periods? =
  45. 45. Why not use periods? We cannot accurately compare two periods unless we know when they occur. 1 month = 31 days January = 31 days February = 31 days ?
  46. 46. Use new_period() or a helper function to create a period object. Helper functions are simply the plural of the object you are trying to create. new_period(3601) new_period(minute = 5) minutes(5) hours(278) months(4) # months are not a problem
  47. 47. Periods are appropriate when you wish to model events that depend on the clock time. For example, - the opening bell of a stock market - quarterly earnings reports - reoccurring deadlines
  48. 48. parsing time spans a time span that contains only hours, minutes, and seconds information can be parsed as a period with hms(), hm(), ms(), hours(), minutes(), or seconds() e.g ms("11:45")
  49. 49. Intervals measure a time span by recording its endpoints. Since we know when the time span occurs, we can calculate the lengths of all the units involved. intervals {
  50. 50. Intervals retain all of the information available about a time span, but cannot be generalized to other spots on the time line. Intervals can be accurately converted to either periods or durations with as.period() and as.duration()
  51. 51. Create an interval with new_interval() or by subtracting two dates. int <- ymd("2010-01-01") - ymd("2009-01-01") Access and set the endpoints with start() and end(). Note that setting preserves length (in seconds). start(int) end(int) <- ymd("2010-03-14") Intervals are always positive
  52. 52. converting between time spans Periods can be converted to durations by using the most common lengths (in seconds) of each time unit. These are just estimates. For accuracy, convert a period to a interval first and then convert the interval to a duration.
  53. 53. arithmetic with date-times time spans are meant to be added and subtracted to both each other and instants e.g, now() + days(1) - minutes(25:30)
  54. 54. Challenge Write a function that takes a date and returns the last day of the month the date occurs in.
  55. 55. multiplication/division Multiplication and division of time spans works as expected. Dividing periods with durations or other periods can only provide an estimate. Convert to intervals for accuracy. Dividing intervals by durations creates an exact answer. Dividing intervals by periods performs integer division.
  56. 56. modulo arithmetic and integer division integer division 128 %/% 5 = 25 modulo 128 %% 5 = 3
  57. 57. Your turn Calculate your age in minutes. In hours. In days. How old is someone who was born on June 4th, 1958?
  58. 58. lifetime <- now() - ymd("1981-01-22") lifetime / dminutes(1) lifetime / dhours(1) lifetime / ddays(1) (now() - ymd("1981-01-22")) / years(1)
  59. 59. This work is licensed under the Creative Commons Attribution-Noncommercial 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/ 3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×