2. Instructor
Kan Nishida
co-founder/CEO
Exploratory
@KanAugust
Summary
At the beginning of 2016, Kan launched Exploratory, Inc. to make
data science available to everyone.
Prior to Exploratory, Kan was a director of development at
Oracle, leading development teams that built various
data science products in areas including machine
learning, BI, data visualization, mobile analytics, and big data.
While at Oracle, Kan also provided training and consulting
services to help organizations transform with data.
7. What you can do with Exploratory
Questions
Communication
Data Access
Data Wrangling
Visualization
Machine Learning / Statistics
Exploratory Data Analysis
10. Data
• User Activity Data
• Each row represents one user access to a fictional online service.
• There are 6 columns: timestamp, user id, event type, IP address,
OS, and OS version.
• Download EDF
12. Questions
1. What is the duration (date range) of this data?
2. What is the DAU (Daily Active Users), and how has it
changed over time?
3. Which days of the week (e.g. Monday) and hours
are more active?
18. Character vs. Date/Time
Date data is recognized as character.
Date durations are ignored.
Values are sorted as characters, e.g. 10 (Oct.) comes right after 1 (Jan.).
Data: Date-unicorn.csv
19. Character vs. Date/Time
Various transformations on date data are available.
Data is sorted as dates.
Durations honor date intervals.
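The contrast between the two slides above can be sketched in R. The sample values are hypothetical (they are not taken from Date-unicorn.csv), and `mdy()` from lubridate is assumed as the parser because the sample strings are written month first.

```r
library(lubridate)

# Hypothetical month/day/year strings, not taken from Date-unicorn.csv.
dates_chr <- c("1/15/2016", "10/3/2016", "2/20/2016")

# As character, values sort character by character, so month 10 lands
# right after month 1. method = "radix" forces C-locale ordering.
chr_sorted <- sort(dates_chr, method = "radix")
chr_sorted
# "1/15/2016" "10/3/2016" "2/20/2016"

# After parsing, values sort chronologically.
dates <- mdy(dates_chr)
date_sorted <- sort(dates)
date_sorted
# "2016-01-15" "2016-02-20" "2016-10-03"

# Differences between Date values are real intervals, measured in days.
gaps <- as.numeric(diff(date_sorted))
gaps
# 36 226
```

Once the column is a Date, sorting and subtraction both respect the calendar, which is what the rest of the common tasks rely on.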
22. By converting the data to the Date & Time data type, you
can do a lot of cool things.
23. Common Tasks
1. Convert Character to Date / Time
2. Extract Date / Time Attributes
3. Filter with Date / Time
4. Duration
5. Round Date / Time
6. Timezone
27. Only codes you need to know
• Year
• Month
• Day
• Hour
• Minute
• Second
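One way to read the list above: in lubridate, the parsing function names are built from the initials of these components (y, m, d, h, m, s), written in the order the components appear in the text. The sketch below assumes that reading; the input strings are hypothetical.

```r
library(lubridate)

# Hypothetical input strings, each with its components in a different order.
a <- ymd("2016-03-07")    # year, month, day
b <- mdy("03/07/2016")    # month, day, year
d <- dmy("07.03.2016")    # day, month, year

# All three parse to the same Date.
format(c(a, b, d))
# "2016-03-07" "2016-03-07" "2016-03-07"

# Append _hms (or _hm, _h) when the text also carries a time of day.
# The result is a POSIXct date/time in UTC unless tz is specified.
x <- ymd_hms("2016-03-07 14:05:30")
c(hour(x), minute(x), second(x))
# 14 5 30
```

Choosing the function is then a matter of looking at one sample value and spelling out the order of its parts.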
46. Ordinal - Ordered Factor
• Month and Day of Week should be sorted in their natural order.
• R’s factor data type supports ‘Order’ information.
• Functions like ‘wday’ and ‘month’ take care of it when called with label = TRUE.
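A minimal sketch of the ordered-factor behavior, using hypothetical dates. With `label = TRUE`, lubridate's `month()` and `wday()` return ordered factors, so sorting follows the calendar rather than the alphabet. The label text itself depends on the locale, so the comments assume an English locale.

```r
library(lubridate)

# Hypothetical dates in October, January, and February.
d <- ymd(c("2016-10-03", "2016-01-15", "2016-02-20"))

m <- month(d, label = TRUE)   # ordered factor with 12 levels
w <- wday(d, label = TRUE)    # ordered factor with 7 levels

is.ordered(m)
# TRUE

# Sorting follows calendar order (Jan, Feb, Oct in an English locale),
# not alphabetical order (Feb, Jan, Oct).
m_sorted <- sort(m)
as.integer(m_sorted)
# 1 2 10

nlevels(w)
# 7
```

Because the order lives in the factor levels, charts and tables that group by these columns keep the natural order as well.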
57. Duration
[Figure: three timelines, each running from First Date to Last Date, with durations of 3 weeks, 4 weeks, and 2 weeks]
64. 2. Convert the lifetime to numeric data type (in days)
From Column Header Menu
1. Select “Change Data Type”
2. Select “Convert to Number”
3. Select “Days”
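For readers working in R directly, the same conversion can be sketched with base R. This is an assumed equivalent of the menu steps, not necessarily the code Exploratory generates, and the first and last dates are hypothetical.

```r
# Hypothetical first and last dates for three users.
first_date <- as.Date(c("2016-01-04", "2016-02-01", "2016-03-07"))
last_date  <- as.Date(c("2016-01-25", "2016-02-29", "2016-03-21"))

# Subtracting two Date vectors gives a difftime (here in days).
lifetime <- last_date - first_date

# Convert the difftime to a plain number, stating the unit explicitly.
lifetime_days <- as.numeric(lifetime, units = "days")
lifetime_days
# 21 28 14

# The same durations expressed in weeks.
lifetime_weeks <- as.numeric(lifetime, units = "weeks")
lifetime_weeks
# 3 4 2
```

Passing `units` explicitly avoids surprises when a difftime was created in hours or seconds instead of days.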
82. Timezone - Data
• We have temperature data for London and Tokyo.
• Each row represents a temperature for a certain date/time in the year 2016.
There are 17,498 temperature records for London and 19,489 temperature
records for Tokyo.
• Each temperature record has date/time, longitude, latitude, temperature,
etc.
• Filenames: Date-London-temp.csv and Date-Tokyo-temp.csv
88. When you compare hourly temperature data
between London and Tokyo
For London, 2:00pm is the peak of average temperature.
→ It sounds reasonable.
For Tokyo, 5:00am is the peak of average temperature.
→ ???
Data: Date-London-temp.csv, Date-Tokyo-temp.csv
89. Problem
• From the hourly temperature data of Tokyo, I want to know which hour of
the day is the hottest, but the time shown in the date/time data
differs from the actual local time in Tokyo.
• We would like to compare the average hourly temperatures of two cities
in different time zones.
90. Timezone
2PM JST (Japan Standard Time)
2PM GMT (Greenwich Mean Time)
92. UTC (Coordinated Universal Time)
• It is the base point for all other time zones in the world.
• POSIXct stores a date/time as the number of seconds since 1970-01-01 00:00:00 UTC.
• UTC and GMT (Greenwich Mean Time) are almost identical. (→ That is
why the hourly temperature data for London is displayed correctly on
the previous chart.)
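The relationship between POSIXct and UTC can be checked with a small sketch. It assumes lubridate's `ymd_hms()`, which parses into UTC unless told otherwise; the timestamp is an arbitrary example.

```r
library(lubridate)

# ymd_hms() returns a POSIXct value in UTC by default.
t_utc <- ymd_hms("2016-01-01 00:00:00")
tz(t_utc)
# "UTC"

# Underneath, POSIXct is a count of seconds since 1970-01-01 00:00:00 UTC.
secs <- as.numeric(t_utc)
secs
# 1451606400

# Displaying the same instant in another timezone changes the clock
# reading, not the underlying number.
t_tokyo <- with_tz(t_utc, tzone = "Asia/Tokyo")
as.numeric(t_tokyo) == secs
# TRUE
```

This is why a timezone is best thought of as a display setting on top of a single, UTC-based instant.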
96. with_tz
# Convert a date/time to another timezone (the instant itself is unchanged)
with_tz(ymd_hms("2015-10-01 02:20:34"))
→ "2015-09-30 19:20:34 PDT"
The default timezone for with_tz is the local machine’s timezone.
In this example, it is PDT (Pacific Daylight Time).
97. with_tz
with_tz(ymd_hms("2015-10-01 02:20:34"))
→ "2015-09-30 19:20:34 PDT"
with_tz(ymd_hms("2015-10-01 02:20:34"), tzone = "Asia/Tokyo")
→ "2015-10-01 11:20:34 JST"
By specifying a timezone, you can convert a date/time to any
timezone.
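Applied to the Tokyo temperature problem, the idea can be sketched as follows. The data frame, its column names (`time`, `temp`), and the readings are all hypothetical stand-ins, not values from Date-Tokyo-temp.csv. The sketch assumes the timestamps were read in as UTC.

```r
library(lubridate)

# Hypothetical readings with timestamps stored in UTC.
tokyo <- data.frame(
  time = ymd_hms(c("2016-08-01 05:00:00", "2016-08-01 20:00:00")),
  temp = c(33.0, 26.5)
)

# Extracting the hour directly gives UTC hours, so the warm reading
# appears to happen at 5 in the morning.
utc_hours <- hour(tokyo$time)
utc_hours
# 5 20

# Convert to Tokyo local time first, then extract the hour.
tokyo$local_time <- with_tz(tokyo$time, tzone = "Asia/Tokyo")
local_hours <- hour(tokyo$local_time)
local_hours
# 14 5
```

With the local hour in hand, hourly averages for Tokyo and London can be compared on the same footing, since each city's hours then refer to its own clock.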
103. January 15th (Tuesday), 2019
• Data Wrangling: Working with Text Data
Planned
• Analytics 101 - When to use which algorithms?
• Data Wrangling: Introduction to Regular Expression
https://exploratory.io/online-seminar