20 date-times


Published on

Published in: Business, Technology
  • Be the first to comment

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

20 date-times

  1. 1. Today we will use the ggplot2 and lubridate packages install.packages("lubridate") library(ggplot2) #load first! library(lubridate) #load second!
  2. 2. Garrett Grolemund Phd Student / Rice University Department of Statistics Dates and Times
  3. 3. 1. four more table rules 2. manipulating date-times 3. manipulating time spans (i.e, math with date-times)
  4. 4. Table style guidelines
  5. 5. 1. When to use a table Use a table instead of a graphic if: a) there is only a small amount of data, or b) precision is important
  6. 6. 2. Significant digits Pick an appropriate amount of significant digits (at most 4 or 5) Use signif( ) to round data to that amount
  7. 7. 3. Align decimals Ensure that decimal points line up so that differences in order of magnitude are easy to spot.
  8. 8. 3. Align decimals Ensure that decimal points line up so that differences in order of magnitude are easy to spot.
  9. 9. 4. Include captions Always include a captions. Less enthusiastic readers will only look at your figures, so try to summarize the whole story there. Captions should explain what data the figure shows and highlight the important finding.
  10. 10. date-times in R
  11. 11. Time is a measurement system similar to the number system (which measures quantity). Just as numbers can be arranged on a number line, times can be arranged on a time line monotonically increasing 0 CEBC AD
  12. 12. 0 CE A date-time is a specific instant of time. It refers to an exact point on the time line. For example, January 1st, 2000 12:34:00, or right now January 1st, 2000 12:34:00 right now
  13. 13. 0 CE 2000-01-01 12:34:00 Identifying instants x seconds since 0 CE Instants of time are commonly identified in two ways: 1) as the number of seconds since a reference time 2) by a unique combination of year, month, day, hour, minute, second, and time zone values
  14. 14. R stores date-times as either POSIXct or a POSIXlt objects. POSIXct objects are stored as the number of seconds since a reference time (by default 1970-01-01 00:00:00 UTC) unclass(now()) POSIXlt objects are stored as a unique combination of year, month, day, hour, minute, second, and time zone values unclass(as.POSIXlt(now())
  15. 15. Parsing dates Parse character strings into date-times with the ymd() type functions in lubridate e.g, ymd("2010-11-01")* *note: this is also a quick way to create new dates
  16. 16. Parsing dates use the function whose name matches the order of the elements in the date ymd("2010-11-02") dmy("02/11/2010") mdy("11.02.10") ymd_hms("2010-11-02 04:22:58")
  17. 17. Your turn Parse the column of dates in emails.csv into POSIXct objects.
  18. 18. Access a specific element of a date-time with the function that has the element’s name
  19. 19. now() year(now()) hour(now()) day(now()) yday(now()) wday(now()) wday(now(), label = TRUE) wday(now(), label = TRUE, abbr = FALSE) month(now(), label = TRUE, abbr = FALSE) Determine which day of the week you were born on
  20. 20. Your turn Use the email data to determine at which hour of the day Professor Wickham is most likely to respond to your email.
  21. 21. em <- read.csv("emails.csv") em$hour <- hour(em$time) qplot(hour, data = em, binwidth = 1)
  22. 22. Accessor functions can also be used to change the elements of a date-time now() year(now()) <- 1999 hour(now()) <- 23 day(now()) <- 45 tz(now()) <- "UTC" Determine what day of the week your birthday will be on next year
  23. 23. Time zones
  24. 24. Different clock times. All refer to the same instant of time 2010-11-01 03:53:06 PDT 2010-11-01 04:53:06 MDT 2010-11-01 05:53:06 PDT 2010-11-01 06:53:06 EDT Switch between time zones with with_tz()
  25. 25. with_tz(now(), "UTC") with_tz(now(), "America/New_York")
  26. 26. Same clock times. All refer to the different instants of time 2010-11-01 03:53:06 PDT 2010-11-01 03:53:06 MDT 2010-11-01 03:53:06 PDT 2010-11-01 03:53:06 EDT Alter time zone with force_tz()
  27. 27. Recall that force_tz() returns a new instant of time force_tz(now(), "UTC") force_tz(now(), "America/New_York") What time is it now where Hadley is (London)? What time was it here when London clocks displayed our current time?
  28. 28. rounding instants We can round an instant to a specified unit using floor_date(), ceiling_date(), round_date() e.g. round_date(now(), "day") round_date(now(), "month")
  29. 29. Your turn Use ddply to calculate how many emails Hadley received per day since September 1st. How will you subset? How will you pair up emails sent on the same day? Plot the number of emails per day over time. Compare the number of emails sent by day of the week.
  30. 30. # just the recent emails recent <- subset(em, time > ymd("2010-09-01")) # binning into days recent$day <- lubridate::floor_date(recent$time, "day") # calculating daily totals > daily <- ddply(recent, "day", summarise, words = sum(words), emails = length(day))
  31. 31. qplot(day, emails, data = daily, geom = "line") qplot(wday(day, label = T), emails, data = daily, geom = "boxplot")
  32. 32. Time spans/ math with date-times
  33. 33. 0 CE A time span is a period of time. It refers to an interval on the time line. For example, 19 months, or one century one century 19 months
  34. 34. What does time measure? - one half of something called space-time? - position of the sun? - tilt of the Earth’s axis? - number of days left until the weekend? - All of the above (and poorly at that)?
  35. 35. The month suggests the tilt of the Earth’s axis (but requires a leap day to get back in sync) The hour suggests where the sun is in the sky (but requires time zones and daylight savings) The Earth’s movement is decelerating, but space-time is constant (which requires random leap seconds) as.period(diff(.leap.seconds))
  36. 36. Consider Suppose we wish to record the opening value of the S&P 500 everyday for a month. Since the stock market opens at 8:30 CST, we could calculate: force_tz(ymd_hms("2010-01-01_08:30:00"), "") + ddays(0:30) how about force_tz(ymd_hms("2010-03-01_08:30:00"), "") + ddays(0:30) What went wrong?
  37. 37. Why does this matter? What do we mean by exactly one month from now? How long is an hour? How long will an hour be at 2:00 this sunday morning?
  38. 38. According to clock times, the time line looks more like this day light savings leap day day light savings
  39. 39. 3 types of time spans We can still do math on the time line, as long as we’re specific about how we want to measure time. We can use 3 types of time spans, each measures time differently: durations, periods, and intervals
  40. 40. DST <- force_tz(ymd_hms("2010-11-7 01:06:39"), "") durations Durations measure the exact amount of seconds that pass between two time points. DST + ddays(1)
  41. 41. Use new_duration() or a helper function to create a durations object. Helper functions are named d + the plural of the object you are trying to create. new_duration(3601) new_duration(minute = 5) dminutes(5) dhours(278) dmonths(4) #no dmonths()
  42. 42. Durations are appropriate when you wish to measure a time span exactly, or compare tow time spans. For example, - the radioactive half life of an atom - the lifespan of two brands of lightbulb - the speed of a baseball pitch
  43. 43. Periods measure time spans in units larger than seconds. Periods pay no attention to how many sub-units occur during the unit of measurement. periods DST + days(1) {
  44. 44. { No surprises. 2010-11-01 00:00:00 + months(1) will always equal 2010-12-01 00:00:00 no matter how many leap seconds, leap days or changes in DST occur in between Why use periods? =
  45. 45. Why not use periods? We cannot accurately compare two periods unless we know when they occur. 1 month = 31 days January = 31 days February = 31 days ?
  46. 46. Use new_period() or a helper function to create a period object. Helper functions are simply the plural of the object you are trying to create. new_period(3601) new_period(minute = 5) minutes(5) hours(278) months(4) # months are not a problem
  47. 47. Periods are appropriate when you wish to model events that depend on the clock time. For example, - the opening bell of a stock market - quarterly earnings reports - reoccurring deadlines
  48. 48. parsing time spans a time span that contains only hours, minutes, and seconds information can be parsed as a period with hms(), hm(), ms(), hours(), minutes(), or seconds() e.g ms("11:45")
  49. 49. Intervals measure a time span by recording its endpoints. Since we know when the time span occurs, we can calculate the lengths of all the units involved. intervals {
  50. 50. Intervals retain all of the information available about a time span, but cannot be generalized to other spots on the time line. Intervals can be accurately converted to either periods or durations with as.period() and as.duration()
  51. 51. Create an interval with new_interval() or by subtracting two dates. int <- ymd("2010-01-01") - ymd("2009-01-01") Access and set the endpoints with start() and end(). Note that setting preserves length (in seconds). start(int) end(int) <- ymd("2010-03-14") Intervals are always positive
  52. 52. converting between time spans Periods can be converted to durations by using the most common lengths (in seconds) of each time unit. These are just estimates. For accuracy, convert a period to a interval first and then convert the interval to a duration.
  53. 53. arithmetic with date-times time spans are meant to be added and subtracted to both each other and instants e.g, now() + days(1) - minutes(25:30)
  54. 54. Challenge Write a function that takes a date and returns the last day of the month the date occurs in.
  55. 55. multiplication/division Multiplication and division of time spans works as expected. Dividing periods with durations or other periods can only provide an estimate. Convert to intervals for accuracy. Dividing intervals by durations creates an exact answer. Dividing intervals by periods performs integer division.
  56. 56. modulo arithmetic and integer division integer division 128 %/% 5 = 25 modulo 128 %% 5 = 3
  57. 57. Your turn Calculate your age in minutes. In hours. In days. How old is someone who was born on June 4th, 1958?
  58. 58. lifetime <- now() - ymd("1981-01-22") lifetime / dminutes(1) lifetime / dhours(1) lifetime / ddays(1) (now() - ymd("1981-01-22")) / years(1)
  59. 59. This work is licensed under the Creative Commons Attribution-Noncommercial 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/ 3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.