Monthly time trend from dataframe of dates
I have a dataset that looks like this:
group id date1 date2 date3 date4
1 1 1 1991-10-14 1992-05-20 1992-12-09 1993-06-30
2 1 2 <NA> 1992-05-21 1992-12-10 1993-06-29
3 1 3 <NA> <NA> 1992-12-08 1993-06-29
4 1 4 1991-10-14 1992-05-19 <NA> <NA>
5 1 5 1991-10-15 1992-05-21 <NA> 1993-06-30
6 1 6 1991-10-15 <NA> <NA> 1993-06-30
Here the data is in R format:
structure(list(group = c(1L, 1L, 1L, 1L, 1L, 1L),
id = 1:6,
date1 = structure(c(7956, NA, NA, 7956, 7957, 7957), class = "Date"),
date2 = structure(c(8175, 8176, NA, 8174, 8176, NA), class = "Date"),
date3 = structure(c(8378, 8379, 8377, NA, NA, NA), class = "Date"),
date4 = structure(c(8581, 8580, 8580, NA, 8581, 8581), class = "Date")),
.Names = c("group", "id", "date1", "date2", "date3", "date4"),
row.names = c(NA, 6L), class = "data.frame")
That is, we have a grouping variable, several individuals and four possible dates of interest.
Now I want to construct a linear month time trend for each individual from this. In other words, I try to construct a trend with value 1 on the first non-NA
date. After that, the trend for the remaining non-NA
periods are the months passed since the first non-NA
date.
My goal is this structure (individual 1, group 1):
group id period trend
1 1 1 1 1
2 1 1 2 8
3 1 1 3 15
4 1 1 4 21
That is, a molten data.frame with the months passed since t = 1.
I've played around with the ideas from this thread: Number of months between two dates. However, I can't find a solution that does not involve a for
-loop and and excruciating number of if
-statements.
Any help appreciated!
r date dataframe trend
add a comment |
I have a dataset that looks like this:
group id date1 date2 date3 date4
1 1 1 1991-10-14 1992-05-20 1992-12-09 1993-06-30
2 1 2 <NA> 1992-05-21 1992-12-10 1993-06-29
3 1 3 <NA> <NA> 1992-12-08 1993-06-29
4 1 4 1991-10-14 1992-05-19 <NA> <NA>
5 1 5 1991-10-15 1992-05-21 <NA> 1993-06-30
6 1 6 1991-10-15 <NA> <NA> 1993-06-30
Here the data is in R format:
structure(list(group = c(1L, 1L, 1L, 1L, 1L, 1L),
id = 1:6,
date1 = structure(c(7956, NA, NA, 7956, 7957, 7957), class = "Date"),
date2 = structure(c(8175, 8176, NA, 8174, 8176, NA), class = "Date"),
date3 = structure(c(8378, 8379, 8377, NA, NA, NA), class = "Date"),
date4 = structure(c(8581, 8580, 8580, NA, 8581, 8581), class = "Date")),
.Names = c("group", "id", "date1", "date2", "date3", "date4"),
row.names = c(NA, 6L), class = "data.frame")
That is, we have a grouping variable, several individuals and four possible dates of interest.
Now I want to construct a linear month time trend for each individual from this. In other words, I try to construct a trend with value 1 on the first non-NA
date. After that, the trend for the remaining non-NA
periods are the months passed since the first non-NA
date.
My goal is this structure (individual 1, group 1):
group id period trend
1 1 1 1 1
2 1 1 2 8
3 1 1 3 15
4 1 1 4 21
That is, a molten data.frame with the months passed since t = 1.
I've played around with the ideas from this thread: Number of months between two dates. However, I can't find a solution that does not involve a for
-loop and and excruciating number of if
-statements.
Any help appreciated!
r date dataframe trend
Please include the definition of data frame into the question itself.
– Heikki
Nov 22 at 20:13
1
I included it at the bottom of the question.
– Mr. Zen
Nov 22 at 20:17
add a comment |
I have a dataset that looks like this:
group id date1 date2 date3 date4
1 1 1 1991-10-14 1992-05-20 1992-12-09 1993-06-30
2 1 2 <NA> 1992-05-21 1992-12-10 1993-06-29
3 1 3 <NA> <NA> 1992-12-08 1993-06-29
4 1 4 1991-10-14 1992-05-19 <NA> <NA>
5 1 5 1991-10-15 1992-05-21 <NA> 1993-06-30
6 1 6 1991-10-15 <NA> <NA> 1993-06-30
Here the data is in R format:
structure(list(group = c(1L, 1L, 1L, 1L, 1L, 1L),
id = 1:6,
date1 = structure(c(7956, NA, NA, 7956, 7957, 7957), class = "Date"),
date2 = structure(c(8175, 8176, NA, 8174, 8176, NA), class = "Date"),
date3 = structure(c(8378, 8379, 8377, NA, NA, NA), class = "Date"),
date4 = structure(c(8581, 8580, 8580, NA, 8581, 8581), class = "Date")),
.Names = c("group", "id", "date1", "date2", "date3", "date4"),
row.names = c(NA, 6L), class = "data.frame")
That is, we have a grouping variable, several individuals and four possible dates of interest.
Now I want to construct a linear month time trend for each individual from this. In other words, I try to construct a trend with value 1 on the first non-NA
date. After that, the trend for the remaining non-NA
periods are the months passed since the first non-NA
date.
My goal is this structure (individual 1, group 1):
group id period trend
1 1 1 1 1
2 1 1 2 8
3 1 1 3 15
4 1 1 4 21
That is, a molten data.frame with the months passed since t = 1.
I've played around with the ideas from this thread: Number of months between two dates. However, I can't find a solution that does not involve a for
-loop and and excruciating number of if
-statements.
Any help appreciated!
r date dataframe trend
I have a dataset that looks like this:
group id date1 date2 date3 date4
1 1 1 1991-10-14 1992-05-20 1992-12-09 1993-06-30
2 1 2 <NA> 1992-05-21 1992-12-10 1993-06-29
3 1 3 <NA> <NA> 1992-12-08 1993-06-29
4 1 4 1991-10-14 1992-05-19 <NA> <NA>
5 1 5 1991-10-15 1992-05-21 <NA> 1993-06-30
6 1 6 1991-10-15 <NA> <NA> 1993-06-30
Here the data is in R format:
structure(list(group = c(1L, 1L, 1L, 1L, 1L, 1L),
id = 1:6,
date1 = structure(c(7956, NA, NA, 7956, 7957, 7957), class = "Date"),
date2 = structure(c(8175, 8176, NA, 8174, 8176, NA), class = "Date"),
date3 = structure(c(8378, 8379, 8377, NA, NA, NA), class = "Date"),
date4 = structure(c(8581, 8580, 8580, NA, 8581, 8581), class = "Date")),
.Names = c("group", "id", "date1", "date2", "date3", "date4"),
row.names = c(NA, 6L), class = "data.frame")
That is, we have a grouping variable, several individuals and four possible dates of interest.
Now I want to construct a linear month time trend for each individual from this. In other words, I try to construct a trend with value 1 on the first non-NA
date. After that, the trend for the remaining non-NA
periods are the months passed since the first non-NA
date.
My goal is this structure (individual 1, group 1):
group id period trend
1 1 1 1 1
2 1 1 2 8
3 1 1 3 15
4 1 1 4 21
That is, a molten data.frame with the months passed since t = 1.
I've played around with the ideas from this thread: Number of months between two dates. However, I can't find a solution that does not involve a for
-loop and and excruciating number of if
-statements.
Any help appreciated!
r date dataframe trend
r date dataframe trend
edited Nov 22 at 20:29
Heikki
1,2471017
1,2471017
asked Nov 22 at 19:47
Mr. Zen
483215
483215
Please include the definition of data frame into the question itself.
– Heikki
Nov 22 at 20:13
1
I included it at the bottom of the question.
– Mr. Zen
Nov 22 at 20:17
add a comment |
Please include the definition of data frame into the question itself.
– Heikki
Nov 22 at 20:13
1
I included it at the bottom of the question.
– Mr. Zen
Nov 22 at 20:17
Please include the definition of data frame into the question itself.
– Heikki
Nov 22 at 20:13
Please include the definition of data frame into the question itself.
– Heikki
Nov 22 at 20:13
1
1
I included it at the bottom of the question.
– Mr. Zen
Nov 22 at 20:17
I included it at the bottom of the question.
– Mr. Zen
Nov 22 at 20:17
add a comment |
2 Answers
2
active
oldest
votes
Here is one potential solution using dplyr
and tidyr
:
library(dplyr)
library(tidyr)
library(stringr)
df %>%
gather(period, date, -group, -id) %>%
arrange(group, id, period) %>%
mutate(date = as.Date(date)) %>%
group_by(group, id) %>%
filter(!all(is.na(date))) %>%
mutate(
trend = as.integer(
floor(difftime(date, date[which.max(!is.na(date))], units = 'days') / 30)
) + 1,
period = str_replace(period, 'date', '')
) %>%
select(-date)
Output is as follows:
# A tibble: 24 x 4
# Groups: group, id [6]
group id period trend
<int> <int> <chr> <dbl>
1 1 1 1 1
2 1 1 2 8
3 1 1 3 15
4 1 1 4 21
5 1 2 1 NA
6 1 2 2 1
7 1 2 3 7
8 1 2 4 14
9 1 3 1 NA
10 1 3 2 NA
# ... with 14 more rows
NOTE: Edited to add a filter to filter out cases where ALL dates are NA for a given group / id. Otherwise, which,max
will fail.
1
Just a note for future reference: For this solution, at least I neededlibrary(stringr)
forstr_replace()
.
– Mr. Zen
Nov 22 at 20:28
Yes. Edited. Sorry, I missed that.
– Gopala
Nov 22 at 20:30
add a comment |
data.table approach
I leave the rounding and/or adding +1 to you.. this is always tricky with months. I personally try to avoid this, and calculate with days or weeks (or just about anything BUT months)...
library( data.table)
dt <- melt ( as.data.table( df ), id.vars = c("group", "id"), variable.name = "date_id", value.name = "date" )
setkey(dt, id, group, date_id)
dt[, diff := lubridate::interval( date[which.min( date ) ], date ) / months(1) , by = c("group", "id")]
head(dt)
# group id date_id date diff
# 1: 1 1 date1 1991-10-14 0.000000
# 2: 1 1 date2 1992-05-20 7.193548
# 3: 1 1 date3 1992-12-09 13.833333
# 4: 1 1 date4 1993-06-30 20.533333
# 5: 1 2 date1 <NA> NA
# 6: 1 2 date2 1992-05-21 0.000000
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53437276%2fmonthly-time-trend-from-dataframe-of-dates%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
Here is one potential solution using dplyr
and tidyr
:
library(dplyr)
library(tidyr)
library(stringr)
df %>%
gather(period, date, -group, -id) %>%
arrange(group, id, period) %>%
mutate(date = as.Date(date)) %>%
group_by(group, id) %>%
filter(!all(is.na(date))) %>%
mutate(
trend = as.integer(
floor(difftime(date, date[which.max(!is.na(date))], units = 'days') / 30)
) + 1,
period = str_replace(period, 'date', '')
) %>%
select(-date)
Output is as follows:
# A tibble: 24 x 4
# Groups: group, id [6]
group id period trend
<int> <int> <chr> <dbl>
1 1 1 1 1
2 1 1 2 8
3 1 1 3 15
4 1 1 4 21
5 1 2 1 NA
6 1 2 2 1
7 1 2 3 7
8 1 2 4 14
9 1 3 1 NA
10 1 3 2 NA
# ... with 14 more rows
NOTE: Edited to add a filter to filter out cases where ALL dates are NA for a given group / id. Otherwise, which,max
will fail.
1
Just a note for future reference: For this solution, at least I neededlibrary(stringr)
forstr_replace()
.
– Mr. Zen
Nov 22 at 20:28
Yes. Edited. Sorry, I missed that.
– Gopala
Nov 22 at 20:30
add a comment |
Here is one potential solution using dplyr
and tidyr
:
library(dplyr)
library(tidyr)
library(stringr)
df %>%
gather(period, date, -group, -id) %>%
arrange(group, id, period) %>%
mutate(date = as.Date(date)) %>%
group_by(group, id) %>%
filter(!all(is.na(date))) %>%
mutate(
trend = as.integer(
floor(difftime(date, date[which.max(!is.na(date))], units = 'days') / 30)
) + 1,
period = str_replace(period, 'date', '')
) %>%
select(-date)
Output is as follows:
# A tibble: 24 x 4
# Groups: group, id [6]
group id period trend
<int> <int> <chr> <dbl>
1 1 1 1 1
2 1 1 2 8
3 1 1 3 15
4 1 1 4 21
5 1 2 1 NA
6 1 2 2 1
7 1 2 3 7
8 1 2 4 14
9 1 3 1 NA
10 1 3 2 NA
# ... with 14 more rows
NOTE: Edited to add a filter to filter out cases where ALL dates are NA for a given group / id. Otherwise, which,max
will fail.
1
Just a note for future reference: For this solution, at least I neededlibrary(stringr)
forstr_replace()
.
– Mr. Zen
Nov 22 at 20:28
Yes. Edited. Sorry, I missed that.
– Gopala
Nov 22 at 20:30
add a comment |
Here is one potential solution using dplyr
and tidyr
:
library(dplyr)
library(tidyr)
library(stringr)
df %>%
gather(period, date, -group, -id) %>%
arrange(group, id, period) %>%
mutate(date = as.Date(date)) %>%
group_by(group, id) %>%
filter(!all(is.na(date))) %>%
mutate(
trend = as.integer(
floor(difftime(date, date[which.max(!is.na(date))], units = 'days') / 30)
) + 1,
period = str_replace(period, 'date', '')
) %>%
select(-date)
Output is as follows:
# A tibble: 24 x 4
# Groups: group, id [6]
group id period trend
<int> <int> <chr> <dbl>
1 1 1 1 1
2 1 1 2 8
3 1 1 3 15
4 1 1 4 21
5 1 2 1 NA
6 1 2 2 1
7 1 2 3 7
8 1 2 4 14
9 1 3 1 NA
10 1 3 2 NA
# ... with 14 more rows
NOTE: Edited to add a filter to filter out cases where ALL dates are NA for a given group / id. Otherwise, which,max
will fail.
Here is one potential solution using dplyr
and tidyr
:
library(dplyr)
library(tidyr)
library(stringr)
df %>%
gather(period, date, -group, -id) %>%
arrange(group, id, period) %>%
mutate(date = as.Date(date)) %>%
group_by(group, id) %>%
filter(!all(is.na(date))) %>%
mutate(
trend = as.integer(
floor(difftime(date, date[which.max(!is.na(date))], units = 'days') / 30)
) + 1,
period = str_replace(period, 'date', '')
) %>%
select(-date)
Output is as follows:
# A tibble: 24 x 4
# Groups: group, id [6]
group id period trend
<int> <int> <chr> <dbl>
1 1 1 1 1
2 1 1 2 8
3 1 1 3 15
4 1 1 4 21
5 1 2 1 NA
6 1 2 2 1
7 1 2 3 7
8 1 2 4 14
9 1 3 1 NA
10 1 3 2 NA
# ... with 14 more rows
NOTE: Edited to add a filter to filter out cases where ALL dates are NA for a given group / id. Otherwise, which,max
will fail.
edited Nov 22 at 20:30
answered Nov 22 at 20:22
Gopala
7,04621949
7,04621949
1
Just a note for future reference: For this solution, at least I neededlibrary(stringr)
forstr_replace()
.
– Mr. Zen
Nov 22 at 20:28
Yes. Edited. Sorry, I missed that.
– Gopala
Nov 22 at 20:30
add a comment |
1
Just a note for future reference: For this solution, at least I neededlibrary(stringr)
forstr_replace()
.
– Mr. Zen
Nov 22 at 20:28
Yes. Edited. Sorry, I missed that.
– Gopala
Nov 22 at 20:30
1
1
Just a note for future reference: For this solution, at least I needed
library(stringr)
for str_replace()
.– Mr. Zen
Nov 22 at 20:28
Just a note for future reference: For this solution, at least I needed
library(stringr)
for str_replace()
.– Mr. Zen
Nov 22 at 20:28
Yes. Edited. Sorry, I missed that.
– Gopala
Nov 22 at 20:30
Yes. Edited. Sorry, I missed that.
– Gopala
Nov 22 at 20:30
add a comment |
data.table approach
I leave the rounding and/or adding +1 to you.. this is always tricky with months. I personally try to avoid this, and calculate with days or weeks (or just about anything BUT months)...
library( data.table)
dt <- melt ( as.data.table( df ), id.vars = c("group", "id"), variable.name = "date_id", value.name = "date" )
setkey(dt, id, group, date_id)
dt[, diff := lubridate::interval( date[which.min( date ) ], date ) / months(1) , by = c("group", "id")]
head(dt)
# group id date_id date diff
# 1: 1 1 date1 1991-10-14 0.000000
# 2: 1 1 date2 1992-05-20 7.193548
# 3: 1 1 date3 1992-12-09 13.833333
# 4: 1 1 date4 1993-06-30 20.533333
# 5: 1 2 date1 <NA> NA
# 6: 1 2 date2 1992-05-21 0.000000
add a comment |
data.table approach
I leave the rounding and/or adding +1 to you.. this is always tricky with months. I personally try to avoid this, and calculate with days or weeks (or just about anything BUT months)...
library( data.table)
dt <- melt ( as.data.table( df ), id.vars = c("group", "id"), variable.name = "date_id", value.name = "date" )
setkey(dt, id, group, date_id)
dt[, diff := lubridate::interval( date[which.min( date ) ], date ) / months(1) , by = c("group", "id")]
head(dt)
# group id date_id date diff
# 1: 1 1 date1 1991-10-14 0.000000
# 2: 1 1 date2 1992-05-20 7.193548
# 3: 1 1 date3 1992-12-09 13.833333
# 4: 1 1 date4 1993-06-30 20.533333
# 5: 1 2 date1 <NA> NA
# 6: 1 2 date2 1992-05-21 0.000000
add a comment |
data.table approach
I leave the rounding and/or adding +1 to you.. this is always tricky with months. I personally try to avoid this, and calculate with days or weeks (or just about anything BUT months)...
library( data.table)
dt <- melt ( as.data.table( df ), id.vars = c("group", "id"), variable.name = "date_id", value.name = "date" )
setkey(dt, id, group, date_id)
dt[, diff := lubridate::interval( date[which.min( date ) ], date ) / months(1) , by = c("group", "id")]
head(dt)
# group id date_id date diff
# 1: 1 1 date1 1991-10-14 0.000000
# 2: 1 1 date2 1992-05-20 7.193548
# 3: 1 1 date3 1992-12-09 13.833333
# 4: 1 1 date4 1993-06-30 20.533333
# 5: 1 2 date1 <NA> NA
# 6: 1 2 date2 1992-05-21 0.000000
data.table approach
I leave the rounding and/or adding +1 to you.. this is always tricky with months. I personally try to avoid this, and calculate with days or weeks (or just about anything BUT months)...
library( data.table)
dt <- melt ( as.data.table( df ), id.vars = c("group", "id"), variable.name = "date_id", value.name = "date" )
setkey(dt, id, group, date_id)
dt[, diff := lubridate::interval( date[which.min( date ) ], date ) / months(1) , by = c("group", "id")]
head(dt)
# group id date_id date diff
# 1: 1 1 date1 1991-10-14 0.000000
# 2: 1 1 date2 1992-05-20 7.193548
# 3: 1 1 date3 1992-12-09 13.833333
# 4: 1 1 date4 1993-06-30 20.533333
# 5: 1 2 date1 <NA> NA
# 6: 1 2 date2 1992-05-21 0.000000
answered Nov 22 at 21:08
Wimpel
4,165321
4,165321
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53437276%2fmonthly-time-trend-from-dataframe-of-dates%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Please include the definition of data frame into the question itself.
– Heikki
Nov 22 at 20:13
1
I included it at the bottom of the question.
– Mr. Zen
Nov 22 at 20:17