5.
1. When to use a table
Use a table instead of a graphic if:
a) there is only a small amount of data, or
b) precision is important
6.
2. Signiﬁcant digits
Pick an appropriate amount of signiﬁcant
digits (at most 4 or 5)
Use signif( ) to round data to that amount
7.
3. Align decimals
Ensure that decimal points line up so that
differences in order of magnitude are easy
to spot.
8.
3. Align decimals
Ensure that decimal points line up so that
differences in order of magnitude are easy
to spot.
9.
4. Include captions
Always include a captions.
Less enthusiastic readers will only look at
your ﬁgures, so try to summarize the
whole story there.
Captions should explain what data the
ﬁgure shows and highlight the important
ﬁnding.
11.
Time is a measurement system similar to
the number system (which measures
quantity). Just as numbers can be
arranged on a number line, times can be
arranged on a time line
monotonically increasing
0 CEBC AD
12.
0 CE
A date-time is a speciﬁc instant of time. It
refers to an exact point on the time line.
For example,
January 1st, 2000 12:34:00, or
right now
January 1st,
2000 12:34:00
right now
13.
0 CE
2000-01-01 12:34:00
Identifying instants
x seconds since 0 CE
Instants of time are commonly identiﬁed in two
ways:
1) as the number of seconds since a reference time
2) by a unique combination of year, month, day,
hour, minute, second, and time zone values
14.
R stores date-times as either POSIXct or a POSIXlt
objects.
POSIXct objects are stored as the number of seconds
since a reference time (by default 1970-01-01 00:00:00
UTC)
unclass(now())
POSIXlt objects are stored as a unique combination of
year, month, day, hour, minute, second, and time zone
values
unclass(as.POSIXlt(now())
15.
Parsing dates
Parse character strings into date-times
with the ymd() type functions in lubridate
e.g, ymd("2010-11-01")*
*note: this is also a quick way to create new dates
16.
Parsing dates
use the function whose name matches the
order of the elements in the date
ymd("2010-11-02")
dmy("02/11/2010")
mdy("11.02.10")
ymd_hms("2010-11-02 04:22:58")
17.
Your turn
Parse the column of dates in emails.csv
into POSIXct objects.
18.
Access a speciﬁc element of a date-time with the
function that has the element’s name
19.
now()
year(now())
hour(now())
day(now())
yday(now())
wday(now())
wday(now(), label = TRUE)
wday(now(), label = TRUE, abbr = FALSE)
month(now(), label = TRUE, abbr = FALSE)
Determine which day of
the week you were born on
20.
Your turn
Use the email data to determine at which
hour of the day Professor Wickham is
most likely to respond to your email.
21.
em <- read.csv("emails.csv")
em$hour <- hour(em$time)
qplot(hour, data = em, binwidth = 1)
22.
Accessor functions can also be used to
change the elements of a date-time
now()
year(now()) <- 1999
hour(now()) <- 23
day(now()) <- 45
tz(now()) <- "UTC"
Determine what day of
the week your birthday
will be on next year
24.
Different clock times. All refer to the same
instant of time
2010-11-01
03:53:06 PDT
2010-11-01
04:53:06 MDT
2010-11-01
05:53:06 PDT
2010-11-01
06:53:06 EDT
Switch between time zones with
with_tz()
26.
Same clock times. All refer to the different
instants of time
2010-11-01
03:53:06 PDT
2010-11-01
03:53:06 MDT
2010-11-01
03:53:06 PDT
2010-11-01
03:53:06 EDT
Alter time zone with force_tz()
27.
Recall that force_tz() returns a new instant
of time
force_tz(now(), "UTC")
force_tz(now(), "America/New_York")
What time is it now where Hadley is
(London)? What time was it here when
London clocks displayed our current
time?
28.
rounding instants
We can round an instant to a speciﬁed
unit using ﬂoor_date(), ceiling_date(),
round_date()
e.g.
round_date(now(), "day")
round_date(now(), "month")
29.
Your turn
Use ddply to calculate how many emails
Hadley received per day since September
1st. How will you subset? How will you
pair up emails sent on the same day?
Plot the number of emails per day over
time. Compare the number of emails sent
by day of the week.
30.
# just the recent emails
recent <- subset(em, time > ymd("2010-09-01"))
# binning into days
recent$day <- lubridate::ﬂoor_date(recent$time,
"day")
# calculating daily totals
> daily <- ddply(recent, "day", summarise, words
= sum(words), emails = length(day))
31.
qplot(day, emails, data = daily, geom = "line")
qplot(wday(day, label = T), emails, data = daily,
geom = "boxplot")
33.
0 CE
A time span is a period of time. It refers to
an interval on the time line. For example,
19 months, or
one century
one century 19 months
34.
What does time
measure?
- one half of something called space-time?
- position of the sun?
- tilt of the Earth’s axis?
- number of days left until the weekend?
- All of the above (and poorly at that)?
35.
The month suggests the tilt of the Earth’s
axis (but requires a leap day to get back
in sync)
The hour suggests where the sun is in the
sky (but requires time zones and daylight
savings)
The Earth’s movement is decelerating, but
space-time is constant (which requires
random leap seconds)
as.period(diff(.leap.seconds))
36.
Consider
Suppose we wish to record the opening value of
the S&P 500 everyday for a month. Since the
stock market opens at 8:30 CST, we could
calculate:
force_tz(ymd_hms("2010-01-01_08:30:00"), "") +
ddays(0:30)
how about
force_tz(ymd_hms("2010-03-01_08:30:00"), "") +
ddays(0:30)
What went wrong?
37.
Why does this matter?
What do we mean by exactly one month
from now?
How long is an hour?
How long will an hour be at 2:00 this
sunday morning?
38.
According to clock times, the time line
looks more like this
day light savings leap day
day light savings
39.
3 types of time spans
We can still do math on the time line, as
long as we’re speciﬁc about how we want
to measure time.
We can use 3 types of time spans, each
measures time differently: durations,
periods, and intervals
40.
DST <- force_tz(ymd_hms("2010-11-7 01:06:39"), "")
durations
Durations measure the exact amount of
seconds that pass between two time
points.
DST + ddays(1)
41.
Use new_duration() or a helper function to create a
durations object. Helper functions are named d + the
plural of the object you are trying to create.
new_duration(3601)
new_duration(minute = 5)
dminutes(5)
dhours(278)
dmonths(4) #no dmonths()
42.
Durations are appropriate when you wish
to measure a time span exactly, or
compare tow time spans. For example,
- the radioactive half life of an atom
- the lifespan of two brands of lightbulb
- the speed of a baseball pitch
43.
Periods measure time spans in units
larger than seconds. Periods pay no
attention to how many sub-units occur
during the unit of measurement.
periods
DST + days(1)
{
44.
{
No surprises.
2010-11-01 00:00:00 + months(1) will always
equal 2010-12-01 00:00:00 no matter how many
leap seconds, leap days or changes in DST
occur in between
Why use periods?
=
45.
Why not use periods?
We cannot accurately compare two
periods unless we know when they occur.
1 month = 31 days
January = 31 days
February = 31 days
?
46.
Use new_period() or a helper function to create a
period object. Helper functions are simply the plural
of the object you are trying to create.
new_period(3601)
new_period(minute = 5)
minutes(5)
hours(278)
months(4) # months are not a problem
47.
Periods are appropriate when you wish to
model events that depend on the clock
time. For example,
- the opening bell of a stock market
- quarterly earnings reports
- reoccurring deadlines
48.
parsing time spans
a time span that contains only hours,
minutes, and seconds information can be
parsed as a period with hms(), hm(), ms(),
hours(), minutes(), or seconds()
e.g ms("11:45")
49.
Intervals measure a time span by
recording its endpoints. Since we know
when the time span occurs, we can
calculate the lengths of all the units
involved.
intervals
{
50.
Intervals retain all of the information
available about a time span, but cannot be
generalized to other spots on the time line.
Intervals can be accurately converted to
either periods or durations with as.period()
and as.duration()
51.
Create an interval with new_interval() or by
subtracting two dates.
int <- ymd("2010-01-01") - ymd("2009-01-01")
Access and set the endpoints with start() and end().
Note that setting preserves length (in seconds).
start(int)
end(int) <- ymd("2010-03-14")
Intervals are always positive
52.
converting between
time spans
Periods can be converted to durations by
using the most common lengths (in
seconds) of each time unit.
These are just estimates. For accuracy,
convert a period to a interval ﬁrst and
then convert the interval to a duration.
53.
arithmetic with
date-times
time spans are meant to be added and
subtracted to both each other and instants
e.g, now() + days(1) - minutes(25:30)
54.
Challenge
Write a function that takes a date and
returns the last day of the month the date
occurs in.
55.
multiplication/division
Multiplication and division of time spans works as
expected.
Dividing periods with durations or other periods
can only provide an estimate. Convert to intervals
for accuracy.
Dividing intervals by durations creates an exact
answer.
Dividing intervals by periods performs integer
division.
59.
This work is licensed under the Creative
Commons Attribution-Noncommercial 3.0 United
States License. To view a copy of this license,
visit http://creativecommons.org/licenses/by-nc/
3.0/us/ or send a letter to Creative Commons,
171 Second Street, Suite 300, San Francisco,
California, 94105, USA.
A particular slide catching your eye?
Clipping is a handy way to collect important slides you want to go back to later.
Be the first to comment