Monthly time trend from dataframe of dates












0














I have a dataset that looks like this:



  group id      date1      date2      date3      date4
1 1 1 1991-10-14 1992-05-20 1992-12-09 1993-06-30
2 1 2 <NA> 1992-05-21 1992-12-10 1993-06-29
3 1 3 <NA> <NA> 1992-12-08 1993-06-29
4 1 4 1991-10-14 1992-05-19 <NA> <NA>
5 1 5 1991-10-15 1992-05-21 <NA> 1993-06-30
6 1 6 1991-10-15 <NA> <NA> 1993-06-30


Here the data is in R format:



structure(list(group = c(1L, 1L, 1L, 1L, 1L, 1L), 
id = 1:6,
date1 = structure(c(7956, NA, NA, 7956, 7957, 7957), class = "Date"),
date2 = structure(c(8175, 8176, NA, 8174, 8176, NA), class = "Date"),
date3 = structure(c(8378, 8379, 8377, NA, NA, NA), class = "Date"),
date4 = structure(c(8581, 8580, 8580, NA, 8581, 8581), class = "Date")),
.Names = c("group", "id", "date1", "date2", "date3", "date4"),
row.names = c(NA, 6L), class = "data.frame")


That is, we have a grouping variable, several individuals and four possible dates of interest.



Now I want to construct a linear month time trend for each individual from this. In other words, I try to construct a trend with value 1 on the first non-NA date. After that, the trend for the remaining non-NA periods are the months passed since the first non-NA date.



My goal is this structure (individual 1, group 1):



  group id period trend
1 1 1 1 1
2 1 1 2 8
3 1 1 3 15
4 1 1 4 21


That is, a molten data.frame with the months passed since t = 1.



I've played around with the ideas from this thread: Number of months between two dates. However, I can't find a solution that does not involve a for-loop and and excruciating number of if-statements.



Any help appreciated!










share|improve this question
























  • Please include the definition of data frame into the question itself.
    – Heikki
    Nov 22 at 20:13






  • 1




    I included it at the bottom of the question.
    – Mr. Zen
    Nov 22 at 20:17
















0














I have a dataset that looks like this:



  group id      date1      date2      date3      date4
1 1 1 1991-10-14 1992-05-20 1992-12-09 1993-06-30
2 1 2 <NA> 1992-05-21 1992-12-10 1993-06-29
3 1 3 <NA> <NA> 1992-12-08 1993-06-29
4 1 4 1991-10-14 1992-05-19 <NA> <NA>
5 1 5 1991-10-15 1992-05-21 <NA> 1993-06-30
6 1 6 1991-10-15 <NA> <NA> 1993-06-30


Here the data is in R format:



structure(list(group = c(1L, 1L, 1L, 1L, 1L, 1L), 
id = 1:6,
date1 = structure(c(7956, NA, NA, 7956, 7957, 7957), class = "Date"),
date2 = structure(c(8175, 8176, NA, 8174, 8176, NA), class = "Date"),
date3 = structure(c(8378, 8379, 8377, NA, NA, NA), class = "Date"),
date4 = structure(c(8581, 8580, 8580, NA, 8581, 8581), class = "Date")),
.Names = c("group", "id", "date1", "date2", "date3", "date4"),
row.names = c(NA, 6L), class = "data.frame")


That is, we have a grouping variable, several individuals and four possible dates of interest.



Now I want to construct a linear month time trend for each individual from this. In other words, I try to construct a trend with value 1 on the first non-NA date. After that, the trend for the remaining non-NA periods are the months passed since the first non-NA date.



My goal is this structure (individual 1, group 1):



  group id period trend
1 1 1 1 1
2 1 1 2 8
3 1 1 3 15
4 1 1 4 21


That is, a molten data.frame with the months passed since t = 1.



I've played around with the ideas from this thread: Number of months between two dates. However, I can't find a solution that does not involve a for-loop and and excruciating number of if-statements.



Any help appreciated!










share|improve this question
























  • Please include the definition of data frame into the question itself.
    – Heikki
    Nov 22 at 20:13






  • 1




    I included it at the bottom of the question.
    – Mr. Zen
    Nov 22 at 20:17














0












0








0







I have a dataset that looks like this:



  group id      date1      date2      date3      date4
1 1 1 1991-10-14 1992-05-20 1992-12-09 1993-06-30
2 1 2 <NA> 1992-05-21 1992-12-10 1993-06-29
3 1 3 <NA> <NA> 1992-12-08 1993-06-29
4 1 4 1991-10-14 1992-05-19 <NA> <NA>
5 1 5 1991-10-15 1992-05-21 <NA> 1993-06-30
6 1 6 1991-10-15 <NA> <NA> 1993-06-30


Here the data is in R format:



structure(list(group = c(1L, 1L, 1L, 1L, 1L, 1L), 
id = 1:6,
date1 = structure(c(7956, NA, NA, 7956, 7957, 7957), class = "Date"),
date2 = structure(c(8175, 8176, NA, 8174, 8176, NA), class = "Date"),
date3 = structure(c(8378, 8379, 8377, NA, NA, NA), class = "Date"),
date4 = structure(c(8581, 8580, 8580, NA, 8581, 8581), class = "Date")),
.Names = c("group", "id", "date1", "date2", "date3", "date4"),
row.names = c(NA, 6L), class = "data.frame")


That is, we have a grouping variable, several individuals and four possible dates of interest.



Now I want to construct a linear month time trend for each individual from this. In other words, I try to construct a trend with value 1 on the first non-NA date. After that, the trend for the remaining non-NA periods are the months passed since the first non-NA date.



My goal is this structure (individual 1, group 1):



  group id period trend
1 1 1 1 1
2 1 1 2 8
3 1 1 3 15
4 1 1 4 21


That is, a molten data.frame with the months passed since t = 1.



I've played around with the ideas from this thread: Number of months between two dates. However, I can't find a solution that does not involve a for-loop and and excruciating number of if-statements.



Any help appreciated!










share|improve this question















I have a dataset that looks like this:



  group id      date1      date2      date3      date4
1 1 1 1991-10-14 1992-05-20 1992-12-09 1993-06-30
2 1 2 <NA> 1992-05-21 1992-12-10 1993-06-29
3 1 3 <NA> <NA> 1992-12-08 1993-06-29
4 1 4 1991-10-14 1992-05-19 <NA> <NA>
5 1 5 1991-10-15 1992-05-21 <NA> 1993-06-30
6 1 6 1991-10-15 <NA> <NA> 1993-06-30


Here the data is in R format:



structure(list(group = c(1L, 1L, 1L, 1L, 1L, 1L), 
id = 1:6,
date1 = structure(c(7956, NA, NA, 7956, 7957, 7957), class = "Date"),
date2 = structure(c(8175, 8176, NA, 8174, 8176, NA), class = "Date"),
date3 = structure(c(8378, 8379, 8377, NA, NA, NA), class = "Date"),
date4 = structure(c(8581, 8580, 8580, NA, 8581, 8581), class = "Date")),
.Names = c("group", "id", "date1", "date2", "date3", "date4"),
row.names = c(NA, 6L), class = "data.frame")


That is, we have a grouping variable, several individuals and four possible dates of interest.



Now I want to construct a linear month time trend for each individual from this. In other words, I try to construct a trend with value 1 on the first non-NA date. After that, the trend for the remaining non-NA periods are the months passed since the first non-NA date.



My goal is this structure (individual 1, group 1):



  group id period trend
1 1 1 1 1
2 1 1 2 8
3 1 1 3 15
4 1 1 4 21


That is, a molten data.frame with the months passed since t = 1.



I've played around with the ideas from this thread: Number of months between two dates. However, I can't find a solution that does not involve a for-loop and and excruciating number of if-statements.



Any help appreciated!







r date dataframe trend






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 22 at 20:29









Heikki

1,2471017




1,2471017










asked Nov 22 at 19:47









Mr. Zen

483215




483215












  • Please include the definition of data frame into the question itself.
    – Heikki
    Nov 22 at 20:13






  • 1




    I included it at the bottom of the question.
    – Mr. Zen
    Nov 22 at 20:17


















  • Please include the definition of data frame into the question itself.
    – Heikki
    Nov 22 at 20:13






  • 1




    I included it at the bottom of the question.
    – Mr. Zen
    Nov 22 at 20:17
















Please include the definition of data frame into the question itself.
– Heikki
Nov 22 at 20:13




Please include the definition of data frame into the question itself.
– Heikki
Nov 22 at 20:13




1




1




I included it at the bottom of the question.
– Mr. Zen
Nov 22 at 20:17




I included it at the bottom of the question.
– Mr. Zen
Nov 22 at 20:17












2 Answers
2






active

oldest

votes


















1














Here is one potential solution using dplyr and tidyr:



library(dplyr)
library(tidyr)
library(stringr)

df %>%
gather(period, date, -group, -id) %>%
arrange(group, id, period) %>%
mutate(date = as.Date(date)) %>%
group_by(group, id) %>%
filter(!all(is.na(date))) %>%
mutate(
trend = as.integer(
floor(difftime(date, date[which.max(!is.na(date))], units = 'days') / 30)
) + 1,
period = str_replace(period, 'date', '')
) %>%
select(-date)


Output is as follows:



# A tibble: 24 x 4
# Groups: group, id [6]
group id period trend
<int> <int> <chr> <dbl>
1 1 1 1 1
2 1 1 2 8
3 1 1 3 15
4 1 1 4 21
5 1 2 1 NA
6 1 2 2 1
7 1 2 3 7
8 1 2 4 14
9 1 3 1 NA
10 1 3 2 NA
# ... with 14 more rows


NOTE: Edited to add a filter to filter out cases where ALL dates are NA for a given group / id. Otherwise, which,max will fail.






share|improve this answer



















  • 1




    Just a note for future reference: For this solution, at least I needed library(stringr) for str_replace().
    – Mr. Zen
    Nov 22 at 20:28










  • Yes. Edited. Sorry, I missed that.
    – Gopala
    Nov 22 at 20:30



















1














data.table approach



I leave the rounding and/or adding +1 to you.. this is always tricky with months. I personally try to avoid this, and calculate with days or weeks (or just about anything BUT months)...



library( data.table)
dt <- melt ( as.data.table( df ), id.vars = c("group", "id"), variable.name = "date_id", value.name = "date" )
setkey(dt, id, group, date_id)
dt[, diff := lubridate::interval( date[which.min( date ) ], date ) / months(1) , by = c("group", "id")]

head(dt)
# group id date_id date diff
# 1: 1 1 date1 1991-10-14 0.000000
# 2: 1 1 date2 1992-05-20 7.193548
# 3: 1 1 date3 1992-12-09 13.833333
# 4: 1 1 date4 1993-06-30 20.533333
# 5: 1 2 date1 <NA> NA
# 6: 1 2 date2 1992-05-21 0.000000





share|improve this answer





















    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53437276%2fmonthly-time-trend-from-dataframe-of-dates%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    1














    Here is one potential solution using dplyr and tidyr:



    library(dplyr)
    library(tidyr)
    library(stringr)

    df %>%
    gather(period, date, -group, -id) %>%
    arrange(group, id, period) %>%
    mutate(date = as.Date(date)) %>%
    group_by(group, id) %>%
    filter(!all(is.na(date))) %>%
    mutate(
    trend = as.integer(
    floor(difftime(date, date[which.max(!is.na(date))], units = 'days') / 30)
    ) + 1,
    period = str_replace(period, 'date', '')
    ) %>%
    select(-date)


    Output is as follows:



    # A tibble: 24 x 4
    # Groups: group, id [6]
    group id period trend
    <int> <int> <chr> <dbl>
    1 1 1 1 1
    2 1 1 2 8
    3 1 1 3 15
    4 1 1 4 21
    5 1 2 1 NA
    6 1 2 2 1
    7 1 2 3 7
    8 1 2 4 14
    9 1 3 1 NA
    10 1 3 2 NA
    # ... with 14 more rows


    NOTE: Edited to add a filter to filter out cases where ALL dates are NA for a given group / id. Otherwise, which,max will fail.






    share|improve this answer



















    • 1




      Just a note for future reference: For this solution, at least I needed library(stringr) for str_replace().
      – Mr. Zen
      Nov 22 at 20:28










    • Yes. Edited. Sorry, I missed that.
      – Gopala
      Nov 22 at 20:30
















    1














    Here is one potential solution using dplyr and tidyr:



    library(dplyr)
    library(tidyr)
    library(stringr)

    df %>%
    gather(period, date, -group, -id) %>%
    arrange(group, id, period) %>%
    mutate(date = as.Date(date)) %>%
    group_by(group, id) %>%
    filter(!all(is.na(date))) %>%
    mutate(
    trend = as.integer(
    floor(difftime(date, date[which.max(!is.na(date))], units = 'days') / 30)
    ) + 1,
    period = str_replace(period, 'date', '')
    ) %>%
    select(-date)


    Output is as follows:



    # A tibble: 24 x 4
    # Groups: group, id [6]
    group id period trend
    <int> <int> <chr> <dbl>
    1 1 1 1 1
    2 1 1 2 8
    3 1 1 3 15
    4 1 1 4 21
    5 1 2 1 NA
    6 1 2 2 1
    7 1 2 3 7
    8 1 2 4 14
    9 1 3 1 NA
    10 1 3 2 NA
    # ... with 14 more rows


    NOTE: Edited to add a filter to filter out cases where ALL dates are NA for a given group / id. Otherwise, which,max will fail.






    share|improve this answer



















    • 1




      Just a note for future reference: For this solution, at least I needed library(stringr) for str_replace().
      – Mr. Zen
      Nov 22 at 20:28










    • Yes. Edited. Sorry, I missed that.
      – Gopala
      Nov 22 at 20:30














    1












    1








    1






    Here is one potential solution using dplyr and tidyr:



    library(dplyr)
    library(tidyr)
    library(stringr)

    df %>%
    gather(period, date, -group, -id) %>%
    arrange(group, id, period) %>%
    mutate(date = as.Date(date)) %>%
    group_by(group, id) %>%
    filter(!all(is.na(date))) %>%
    mutate(
    trend = as.integer(
    floor(difftime(date, date[which.max(!is.na(date))], units = 'days') / 30)
    ) + 1,
    period = str_replace(period, 'date', '')
    ) %>%
    select(-date)


    Output is as follows:



    # A tibble: 24 x 4
    # Groups: group, id [6]
    group id period trend
    <int> <int> <chr> <dbl>
    1 1 1 1 1
    2 1 1 2 8
    3 1 1 3 15
    4 1 1 4 21
    5 1 2 1 NA
    6 1 2 2 1
    7 1 2 3 7
    8 1 2 4 14
    9 1 3 1 NA
    10 1 3 2 NA
    # ... with 14 more rows


    NOTE: Edited to add a filter to filter out cases where ALL dates are NA for a given group / id. Otherwise, which,max will fail.






    share|improve this answer














    Here is one potential solution using dplyr and tidyr:



    library(dplyr)
    library(tidyr)
    library(stringr)

    df %>%
    gather(period, date, -group, -id) %>%
    arrange(group, id, period) %>%
    mutate(date = as.Date(date)) %>%
    group_by(group, id) %>%
    filter(!all(is.na(date))) %>%
    mutate(
    trend = as.integer(
    floor(difftime(date, date[which.max(!is.na(date))], units = 'days') / 30)
    ) + 1,
    period = str_replace(period, 'date', '')
    ) %>%
    select(-date)


    Output is as follows:



    # A tibble: 24 x 4
    # Groups: group, id [6]
    group id period trend
    <int> <int> <chr> <dbl>
    1 1 1 1 1
    2 1 1 2 8
    3 1 1 3 15
    4 1 1 4 21
    5 1 2 1 NA
    6 1 2 2 1
    7 1 2 3 7
    8 1 2 4 14
    9 1 3 1 NA
    10 1 3 2 NA
    # ... with 14 more rows


    NOTE: Edited to add a filter to filter out cases where ALL dates are NA for a given group / id. Otherwise, which,max will fail.







    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited Nov 22 at 20:30

























    answered Nov 22 at 20:22









    Gopala

    7,04621949




    7,04621949








    • 1




      Just a note for future reference: For this solution, at least I needed library(stringr) for str_replace().
      – Mr. Zen
      Nov 22 at 20:28










    • Yes. Edited. Sorry, I missed that.
      – Gopala
      Nov 22 at 20:30














    • 1




      Just a note for future reference: For this solution, at least I needed library(stringr) for str_replace().
      – Mr. Zen
      Nov 22 at 20:28










    • Yes. Edited. Sorry, I missed that.
      – Gopala
      Nov 22 at 20:30








    1




    1




    Just a note for future reference: For this solution, at least I needed library(stringr) for str_replace().
    – Mr. Zen
    Nov 22 at 20:28




    Just a note for future reference: For this solution, at least I needed library(stringr) for str_replace().
    – Mr. Zen
    Nov 22 at 20:28












    Yes. Edited. Sorry, I missed that.
    – Gopala
    Nov 22 at 20:30




    Yes. Edited. Sorry, I missed that.
    – Gopala
    Nov 22 at 20:30













    1














    data.table approach



    I leave the rounding and/or adding +1 to you.. this is always tricky with months. I personally try to avoid this, and calculate with days or weeks (or just about anything BUT months)...



    library( data.table)
    dt <- melt ( as.data.table( df ), id.vars = c("group", "id"), variable.name = "date_id", value.name = "date" )
    setkey(dt, id, group, date_id)
    dt[, diff := lubridate::interval( date[which.min( date ) ], date ) / months(1) , by = c("group", "id")]

    head(dt)
    # group id date_id date diff
    # 1: 1 1 date1 1991-10-14 0.000000
    # 2: 1 1 date2 1992-05-20 7.193548
    # 3: 1 1 date3 1992-12-09 13.833333
    # 4: 1 1 date4 1993-06-30 20.533333
    # 5: 1 2 date1 <NA> NA
    # 6: 1 2 date2 1992-05-21 0.000000





    share|improve this answer


























      1














      data.table approach



      I leave the rounding and/or adding +1 to you.. this is always tricky with months. I personally try to avoid this, and calculate with days or weeks (or just about anything BUT months)...



      library( data.table)
      dt <- melt ( as.data.table( df ), id.vars = c("group", "id"), variable.name = "date_id", value.name = "date" )
      setkey(dt, id, group, date_id)
      dt[, diff := lubridate::interval( date[which.min( date ) ], date ) / months(1) , by = c("group", "id")]

      head(dt)
      # group id date_id date diff
      # 1: 1 1 date1 1991-10-14 0.000000
      # 2: 1 1 date2 1992-05-20 7.193548
      # 3: 1 1 date3 1992-12-09 13.833333
      # 4: 1 1 date4 1993-06-30 20.533333
      # 5: 1 2 date1 <NA> NA
      # 6: 1 2 date2 1992-05-21 0.000000





      share|improve this answer
























        1












        1








        1






        data.table approach



        I leave the rounding and/or adding +1 to you.. this is always tricky with months. I personally try to avoid this, and calculate with days or weeks (or just about anything BUT months)...



        library( data.table)
        dt <- melt ( as.data.table( df ), id.vars = c("group", "id"), variable.name = "date_id", value.name = "date" )
        setkey(dt, id, group, date_id)
        dt[, diff := lubridate::interval( date[which.min( date ) ], date ) / months(1) , by = c("group", "id")]

        head(dt)
        # group id date_id date diff
        # 1: 1 1 date1 1991-10-14 0.000000
        # 2: 1 1 date2 1992-05-20 7.193548
        # 3: 1 1 date3 1992-12-09 13.833333
        # 4: 1 1 date4 1993-06-30 20.533333
        # 5: 1 2 date1 <NA> NA
        # 6: 1 2 date2 1992-05-21 0.000000





        share|improve this answer












        data.table approach



        I leave the rounding and/or adding +1 to you.. this is always tricky with months. I personally try to avoid this, and calculate with days or weeks (or just about anything BUT months)...



        library( data.table)
        dt <- melt ( as.data.table( df ), id.vars = c("group", "id"), variable.name = "date_id", value.name = "date" )
        setkey(dt, id, group, date_id)
        dt[, diff := lubridate::interval( date[which.min( date ) ], date ) / months(1) , by = c("group", "id")]

        head(dt)
        # group id date_id date diff
        # 1: 1 1 date1 1991-10-14 0.000000
        # 2: 1 1 date2 1992-05-20 7.193548
        # 3: 1 1 date3 1992-12-09 13.833333
        # 4: 1 1 date4 1993-06-30 20.533333
        # 5: 1 2 date1 <NA> NA
        # 6: 1 2 date2 1992-05-21 0.000000






        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered Nov 22 at 21:08









        Wimpel

        4,165321




        4,165321






























            draft saved

            draft discarded




















































            Thanks for contributing an answer to Stack Overflow!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.





            Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


            Please pay close attention to the following guidance:


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53437276%2fmonthly-time-trend-from-dataframe-of-dates%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            How to ignore python UserWarning in pytest?

            What visual should I use to simply compare current year value vs last year in Power BI desktop

            Script to remove string up to first number