Monthly time trend from dataframe of dates

I have a dataset that looks like this:

  group id      date1      date2      date3      date4

1     1  1 1991-10-14 1992-05-20 1992-12-09 1993-06-30

2     1  2       <NA> 1992-05-21 1992-12-10 1993-06-29

3     1  3       <NA>       <NA> 1992-12-08 1993-06-29

4     1  4 1991-10-14 1992-05-19       <NA>       <NA>

5     1  5 1991-10-15 1992-05-21       <NA> 1993-06-30

6     1  6 1991-10-15       <NA>       <NA> 1993-06-30

Here the data is in R format:

structure(list(group = c(1L, 1L, 1L, 1L, 1L, 1L), 

               id = 1:6, 

               date1 = structure(c(7956, NA, NA, 7956, 7957, 7957), class = "Date"), 

               date2 = structure(c(8175, 8176, NA, 8174, 8176, NA), class = "Date"), 

               date3 = structure(c(8378, 8379, 8377, NA, NA, NA), class = "Date"), 

               date4 = structure(c(8581, 8580, 8580, NA, 8581, 8581), class = "Date")), 

          .Names = c("group", "id", "date1", "date2", "date3", "date4"),

          row.names = c(NA, 6L), class = "data.frame")

That is, we have a grouping variable, several individuals and four possible dates of interest.

Now I want to construct a linear month time trend for each individual from this. In other words, I try to construct a trend with value 1 on the first non-NA date. After that, the trend for the remaining non-NA periods are the months passed since the first non-NA date.

My goal is this structure (individual 1, group 1):

  group id period trend

1     1  1      1     1

2     1  1      2     8

3     1  1      3    15

4     1  1      4    21

That is, a molten data.frame with the months passed since t = 1.

I've played around with the ideas from this thread: Number of months between two dates. However, I can't find a solution that does not involve a for-loop and and excruciating number of if-statements.

Any help appreciated!

edited Nov 22 at 20:29

Heikki

1,2471017

asked Nov 22 at 19:47

Mr. Zen

483215

Please include the definition of data frame into the question itself.
– Heikki
Nov 22 at 20:13

1

I included it at the bottom of the question.
– Mr. Zen
Nov 22 at 20:17

add a comment |

I have a dataset that looks like this:

  group id      date1      date2      date3      date4

1     1  1 1991-10-14 1992-05-20 1992-12-09 1993-06-30

2     1  2       <NA> 1992-05-21 1992-12-10 1993-06-29

3     1  3       <NA>       <NA> 1992-12-08 1993-06-29

4     1  4 1991-10-14 1992-05-19       <NA>       <NA>

5     1  5 1991-10-15 1992-05-21       <NA> 1993-06-30

6     1  6 1991-10-15       <NA>       <NA> 1993-06-30

Here the data is in R format:

structure(list(group = c(1L, 1L, 1L, 1L, 1L, 1L), 

               id = 1:6, 

               date1 = structure(c(7956, NA, NA, 7956, 7957, 7957), class = "Date"), 

               date2 = structure(c(8175, 8176, NA, 8174, 8176, NA), class = "Date"), 

               date3 = structure(c(8378, 8379, 8377, NA, NA, NA), class = "Date"), 

               date4 = structure(c(8581, 8580, 8580, NA, 8581, 8581), class = "Date")), 

          .Names = c("group", "id", "date1", "date2", "date3", "date4"),

          row.names = c(NA, 6L), class = "data.frame")

That is, we have a grouping variable, several individuals and four possible dates of interest.

My goal is this structure (individual 1, group 1):

  group id period trend

1     1  1      1     1

2     1  1      2     8

3     1  1      3    15

4     1  1      4    21

That is, a molten data.frame with the months passed since t = 1.

Any help appreciated!

edited Nov 22 at 20:29

Heikki

1,2471017

asked Nov 22 at 19:47

Mr. Zen

483215

Please include the definition of data frame into the question itself.
– Heikki
Nov 22 at 20:13

1

I included it at the bottom of the question.
– Mr. Zen
Nov 22 at 20:17

add a comment |

I have a dataset that looks like this:

  group id      date1      date2      date3      date4

1     1  1 1991-10-14 1992-05-20 1992-12-09 1993-06-30

2     1  2       <NA> 1992-05-21 1992-12-10 1993-06-29

3     1  3       <NA>       <NA> 1992-12-08 1993-06-29

4     1  4 1991-10-14 1992-05-19       <NA>       <NA>

5     1  5 1991-10-15 1992-05-21       <NA> 1993-06-30

6     1  6 1991-10-15       <NA>       <NA> 1993-06-30

Here the data is in R format:

structure(list(group = c(1L, 1L, 1L, 1L, 1L, 1L), 

               id = 1:6, 

               date1 = structure(c(7956, NA, NA, 7956, 7957, 7957), class = "Date"), 

               date2 = structure(c(8175, 8176, NA, 8174, 8176, NA), class = "Date"), 

               date3 = structure(c(8378, 8379, 8377, NA, NA, NA), class = "Date"), 

               date4 = structure(c(8581, 8580, 8580, NA, 8581, 8581), class = "Date")), 

          .Names = c("group", "id", "date1", "date2", "date3", "date4"),

          row.names = c(NA, 6L), class = "data.frame")

That is, we have a grouping variable, several individuals and four possible dates of interest.

My goal is this structure (individual 1, group 1):

  group id period trend

1     1  1      1     1

2     1  1      2     8

3     1  1      3    15

4     1  1      4    21

That is, a molten data.frame with the months passed since t = 1.

Any help appreciated!

edited Nov 22 at 20:29

Heikki

1,2471017

asked Nov 22 at 19:47

Mr. Zen

483215

I have a dataset that looks like this:

  group id      date1      date2      date3      date4

1     1  1 1991-10-14 1992-05-20 1992-12-09 1993-06-30

2     1  2       <NA> 1992-05-21 1992-12-10 1993-06-29

3     1  3       <NA>       <NA> 1992-12-08 1993-06-29

4     1  4 1991-10-14 1992-05-19       <NA>       <NA>

5     1  5 1991-10-15 1992-05-21       <NA> 1993-06-30

6     1  6 1991-10-15       <NA>       <NA> 1993-06-30

Here the data is in R format:

structure(list(group = c(1L, 1L, 1L, 1L, 1L, 1L), 

               id = 1:6, 

               date1 = structure(c(7956, NA, NA, 7956, 7957, 7957), class = "Date"), 

               date2 = structure(c(8175, 8176, NA, 8174, 8176, NA), class = "Date"), 

               date3 = structure(c(8378, 8379, 8377, NA, NA, NA), class = "Date"), 

               date4 = structure(c(8581, 8580, 8580, NA, 8581, 8581), class = "Date")), 

          .Names = c("group", "id", "date1", "date2", "date3", "date4"),

          row.names = c(NA, 6L), class = "data.frame")

That is, we have a grouping variable, several individuals and four possible dates of interest.

My goal is this structure (individual 1, group 1):

  group id period trend

1     1  1      1     1

2     1  1      2     8

3     1  1      3    15

4     1  1      4    21

That is, a molten data.frame with the months passed since t = 1.

Any help appreciated!

r date dataframe trend

edited Nov 22 at 20:29

Heikki

1,2471017

asked Nov 22 at 19:47

Mr. Zen

483215

edited Nov 22 at 20:29

Heikki

1,2471017

asked Nov 22 at 19:47

Mr. Zen

483215

edited Nov 22 at 20:29

Heikki

1,2471017

edited Nov 22 at 20:29

Heikki

1,2471017

edited Nov 22 at 20:29

Heikki

1,2471017

asked Nov 22 at 19:47

Mr. Zen

483215

asked Nov 22 at 19:47

Mr. Zen

483215

asked Nov 22 at 19:47

Mr. Zen

483215

Please include the definition of data frame into the question itself.
– Heikki
Nov 22 at 20:13

1

I included it at the bottom of the question.
– Mr. Zen
Nov 22 at 20:17

add a comment |

Please include the definition of data frame into the question itself.
– Heikki
Nov 22 at 20:13

1

I included it at the bottom of the question.
– Mr. Zen
Nov 22 at 20:17

Please include the definition of data frame into the question itself.
– Heikki
Nov 22 at 20:13

I included it at the bottom of the question.
– Mr. Zen
Nov 22 at 20:17

add a comment |

2 Answers
2

active

oldest

votes

Here is one potential solution using dplyr and tidyr:

library(dplyr)

library(tidyr)

library(stringr)



df %>%

  gather(period, date, -group, -id) %>%

  arrange(group, id, period) %>%

  mutate(date = as.Date(date)) %>%

  group_by(group, id) %>%

  filter(!all(is.na(date))) %>% 

  mutate(

    trend = as.integer(

      floor(difftime(date, date[which.max(!is.na(date))], units = 'days') / 30)

      ) + 1,

    period = str_replace(period, 'date', '')

    ) %>%

  select(-date)

Output is as follows:

# A tibble: 24 x 4

# Groups:   group, id [6]

   group    id period trend

   <int> <int>  <chr> <dbl>

 1     1     1      1     1

 2     1     1      2     8

 3     1     1      3    15

 4     1     1      4    21

 5     1     2      1    NA

 6     1     2      2     1

 7     1     2      3     7

 8     1     2      4    14

 9     1     3      1    NA

10     1     3      2    NA

# ... with 14 more rows

NOTE: Edited to add a filter to filter out cases where ALL dates are NA for a given group / id. Otherwise, which,max will fail.

edited Nov 22 at 20:30

answered Nov 22 at 20:22

Gopala

7,04621949

1

Just a note for future reference: For this solution, at least I needed library(stringr) for str_replace().
– Mr. Zen
Nov 22 at 20:28

Yes. Edited. Sorry, I missed that.
– Gopala
Nov 22 at 20:30

add a comment |

data.table approach

I leave the rounding and/or adding +1 to you.. this is always tricky with months. I personally try to avoid this, and calculate with days or weeks (or just about anything BUT months)...

library( data.table)

dt <- melt ( as.data.table( df ), id.vars = c("group", "id"), variable.name = "date_id", value.name = "date" )

setkey(dt, id, group, date_id)

dt[, diff := lubridate::interval(  date[which.min( date ) ], date ) / months(1) , by = c("group", "id")]



head(dt)

#    group id date_id       date      diff

# 1:     1  1   date1 1991-10-14  0.000000

# 2:     1  1   date2 1992-05-20  7.193548

# 3:     1  1   date3 1992-12-09 13.833333

# 4:     1  1   date4 1993-06-30 20.533333

# 5:     1  2   date1       <NA>        NA

# 6:     1  2   date2 1992-05-21  0.000000

answered Nov 22 at 21:08

Wimpel

4,165321

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53437276%2fmonthly-time-trend-from-dataframe-of-dates%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

Here is one potential solution using dplyr and tidyr:

library(dplyr)

library(tidyr)

library(stringr)



df %>%

  gather(period, date, -group, -id) %>%

  arrange(group, id, period) %>%

  mutate(date = as.Date(date)) %>%

  group_by(group, id) %>%

  filter(!all(is.na(date))) %>% 

  mutate(

    trend = as.integer(

      floor(difftime(date, date[which.max(!is.na(date))], units = 'days') / 30)

      ) + 1,

    period = str_replace(period, 'date', '')

    ) %>%

  select(-date)

Output is as follows:

# A tibble: 24 x 4

# Groups:   group, id [6]

   group    id period trend

   <int> <int>  <chr> <dbl>

 1     1     1      1     1

 2     1     1      2     8

 3     1     1      3    15

 4     1     1      4    21

 5     1     2      1    NA

 6     1     2      2     1

 7     1     2      3     7

 8     1     2      4    14

 9     1     3      1    NA

10     1     3      2    NA

# ... with 14 more rows

NOTE: Edited to add a filter to filter out cases where ALL dates are NA for a given group / id. Otherwise, which,max will fail.

edited Nov 22 at 20:30

answered Nov 22 at 20:22

Gopala

7,04621949

1

Just a note for future reference: For this solution, at least I needed library(stringr) for str_replace().
– Mr. Zen
Nov 22 at 20:28

Yes. Edited. Sorry, I missed that.
– Gopala
Nov 22 at 20:30

add a comment |

Here is one potential solution using dplyr and tidyr:

library(dplyr)

library(tidyr)

library(stringr)



df %>%

  gather(period, date, -group, -id) %>%

  arrange(group, id, period) %>%

  mutate(date = as.Date(date)) %>%

  group_by(group, id) %>%

  filter(!all(is.na(date))) %>% 

  mutate(

    trend = as.integer(

      floor(difftime(date, date[which.max(!is.na(date))], units = 'days') / 30)

      ) + 1,

    period = str_replace(period, 'date', '')

    ) %>%

  select(-date)

Output is as follows:

# A tibble: 24 x 4

# Groups:   group, id [6]

   group    id period trend

   <int> <int>  <chr> <dbl>

 1     1     1      1     1

 2     1     1      2     8

 3     1     1      3    15

 4     1     1      4    21

 5     1     2      1    NA

 6     1     2      2     1

 7     1     2      3     7

 8     1     2      4    14

 9     1     3      1    NA

10     1     3      2    NA

# ... with 14 more rows

NOTE: Edited to add a filter to filter out cases where ALL dates are NA for a given group / id. Otherwise, which,max will fail.

edited Nov 22 at 20:30

answered Nov 22 at 20:22

Gopala

7,04621949

1

Just a note for future reference: For this solution, at least I needed library(stringr) for str_replace().
– Mr. Zen
Nov 22 at 20:28

Yes. Edited. Sorry, I missed that.
– Gopala
Nov 22 at 20:30

add a comment |

Here is one potential solution using dplyr and tidyr:

library(dplyr)

library(tidyr)

library(stringr)



df %>%

  gather(period, date, -group, -id) %>%

  arrange(group, id, period) %>%

  mutate(date = as.Date(date)) %>%

  group_by(group, id) %>%

  filter(!all(is.na(date))) %>% 

  mutate(

    trend = as.integer(

      floor(difftime(date, date[which.max(!is.na(date))], units = 'days') / 30)

      ) + 1,

    period = str_replace(period, 'date', '')

    ) %>%

  select(-date)

Output is as follows:

# A tibble: 24 x 4

# Groups:   group, id [6]

   group    id period trend

   <int> <int>  <chr> <dbl>

 1     1     1      1     1

 2     1     1      2     8

 3     1     1      3    15

 4     1     1      4    21

 5     1     2      1    NA

 6     1     2      2     1

 7     1     2      3     7

 8     1     2      4    14

 9     1     3      1    NA

10     1     3      2    NA

# ... with 14 more rows

NOTE: Edited to add a filter to filter out cases where ALL dates are NA for a given group / id. Otherwise, which,max will fail.

edited Nov 22 at 20:30

answered Nov 22 at 20:22

Gopala

7,04621949

Here is one potential solution using dplyr and tidyr:

library(dplyr)

library(tidyr)

library(stringr)



df %>%

  gather(period, date, -group, -id) %>%

  arrange(group, id, period) %>%

  mutate(date = as.Date(date)) %>%

  group_by(group, id) %>%

  filter(!all(is.na(date))) %>% 

  mutate(

    trend = as.integer(

      floor(difftime(date, date[which.max(!is.na(date))], units = 'days') / 30)

      ) + 1,

    period = str_replace(period, 'date', '')

    ) %>%

  select(-date)

Output is as follows:

# A tibble: 24 x 4

# Groups:   group, id [6]

   group    id period trend

   <int> <int>  <chr> <dbl>

 1     1     1      1     1

 2     1     1      2     8

 3     1     1      3    15

 4     1     1      4    21

 5     1     2      1    NA

 6     1     2      2     1

 7     1     2      3     7

 8     1     2      4    14

 9     1     3      1    NA

10     1     3      2    NA

# ... with 14 more rows

NOTE: Edited to add a filter to filter out cases where ALL dates are NA for a given group / id. Otherwise, which,max will fail.

edited Nov 22 at 20:30

answered Nov 22 at 20:22

Gopala

7,04621949

edited Nov 22 at 20:30

answered Nov 22 at 20:22

Gopala

7,04621949

answered Nov 22 at 20:22

Gopala

7,04621949

answered Nov 22 at 20:22

Gopala

7,04621949

1

Just a note for future reference: For this solution, at least I needed library(stringr) for str_replace().
– Mr. Zen
Nov 22 at 20:28

Yes. Edited. Sorry, I missed that.
– Gopala
Nov 22 at 20:30

add a comment |

1

Just a note for future reference: For this solution, at least I needed library(stringr) for str_replace().
– Mr. Zen
Nov 22 at 20:28

Yes. Edited. Sorry, I missed that.
– Gopala
Nov 22 at 20:30

Just a note for future reference: For this solution, at least I needed library(stringr) for str_replace().
– Mr. Zen
Nov 22 at 20:28

Yes. Edited. Sorry, I missed that.
– Gopala
Nov 22 at 20:30

add a comment |

data.table approach

I leave the rounding and/or adding +1 to you.. this is always tricky with months. I personally try to avoid this, and calculate with days or weeks (or just about anything BUT months)...

library( data.table)

dt <- melt ( as.data.table( df ), id.vars = c("group", "id"), variable.name = "date_id", value.name = "date" )

setkey(dt, id, group, date_id)

dt[, diff := lubridate::interval(  date[which.min( date ) ], date ) / months(1) , by = c("group", "id")]



head(dt)

#    group id date_id       date      diff

# 1:     1  1   date1 1991-10-14  0.000000

# 2:     1  1   date2 1992-05-20  7.193548

# 3:     1  1   date3 1992-12-09 13.833333

# 4:     1  1   date4 1993-06-30 20.533333

# 5:     1  2   date1       <NA>        NA

# 6:     1  2   date2 1992-05-21  0.000000

answered Nov 22 at 21:08

Wimpel

4,165321

add a comment |

data.table approach

I leave the rounding and/or adding +1 to you.. this is always tricky with months. I personally try to avoid this, and calculate with days or weeks (or just about anything BUT months)...

library( data.table)

dt <- melt ( as.data.table( df ), id.vars = c("group", "id"), variable.name = "date_id", value.name = "date" )

setkey(dt, id, group, date_id)

dt[, diff := lubridate::interval(  date[which.min( date ) ], date ) / months(1) , by = c("group", "id")]



head(dt)

#    group id date_id       date      diff

# 1:     1  1   date1 1991-10-14  0.000000

# 2:     1  1   date2 1992-05-20  7.193548

# 3:     1  1   date3 1992-12-09 13.833333

# 4:     1  1   date4 1993-06-30 20.533333

# 5:     1  2   date1       <NA>        NA

# 6:     1  2   date2 1992-05-21  0.000000

answered Nov 22 at 21:08

Wimpel

4,165321

add a comment |

data.table approach

I leave the rounding and/or adding +1 to you.. this is always tricky with months. I personally try to avoid this, and calculate with days or weeks (or just about anything BUT months)...

library( data.table)

dt <- melt ( as.data.table( df ), id.vars = c("group", "id"), variable.name = "date_id", value.name = "date" )

setkey(dt, id, group, date_id)

dt[, diff := lubridate::interval(  date[which.min( date ) ], date ) / months(1) , by = c("group", "id")]



head(dt)

#    group id date_id       date      diff

# 1:     1  1   date1 1991-10-14  0.000000

# 2:     1  1   date2 1992-05-20  7.193548

# 3:     1  1   date3 1992-12-09 13.833333

# 4:     1  1   date4 1993-06-30 20.533333

# 5:     1  2   date1       <NA>        NA

# 6:     1  2   date2 1992-05-21  0.000000

answered Nov 22 at 21:08

Wimpel

4,165321

data.table approach

I leave the rounding and/or adding +1 to you.. this is always tricky with months. I personally try to avoid this, and calculate with days or weeks (or just about anything BUT months)...

library( data.table)

dt <- melt ( as.data.table( df ), id.vars = c("group", "id"), variable.name = "date_id", value.name = "date" )

setkey(dt, id, group, date_id)

dt[, diff := lubridate::interval(  date[which.min( date ) ], date ) / months(1) , by = c("group", "id")]



head(dt)

#    group id date_id       date      diff

# 1:     1  1   date1 1991-10-14  0.000000

# 2:     1  1   date2 1992-05-20  7.193548

# 3:     1  1   date3 1992-12-09 13.833333

# 4:     1  1   date4 1993-06-30 20.533333

# 5:     1  2   date1       <NA>        NA

# 6:     1  2   date2 1992-05-21  0.000000

answered Nov 22 at 21:08

Wimpel

4,165321

answered Nov 22 at 21:08

Wimpel

4,165321

answered Nov 22 at 21:08

Wimpel

4,165321

answered Nov 22 at 21:08

Wimpel

4,165321

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

Some of your past answers have not been well-received, and you're in danger of being blocked from answering.

Please pay close attention to the following guidance:

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Qfyilyi