Efficient solution to (recursively) replace NAs with the mean of lags, by group

I need to replace NAs with the mean of the previous three values, by group.
Once an NA is replaced, the filled value serves as input for computing the mean for the next NA (if the next NA falls within the next three months).

Here is an example:

id  date        value
1   2017-04-01  40
1   2017-05-01  40
1   2017-06-01  10
1   2017-07-01  NA
1   2017-08-01  NA
2   2014-01-01  27
2   2014-02-01  13

Data:

dt <- structure(list(id = c(1L, 1L, 1L, 1L, 1L, 2L, 2L), date = structure(c(17257, 17287, 17318, 17348, 17379, 16071, 16102), class = "Date"), value = c(40, 40, 10, NA, NA, 27, 13)), row.names = c(1L, 2L, 3L, 4L, 5L, 8L, 9L), class = "data.frame")
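
The dput output above stores the dates as day counts since 1970-01-01, which is hard to read. As a minimal sketch, an equivalent data frame can be built with as.Date. The name dt_readable is introduced here only for illustration, and the original row names (1 to 5, 8, 9) are not reproduced.

```r
# Hypothetical, more readable construction of the same example data
dt_readable <- data.frame(
  id = c(1L, 1L, 1L, 1L, 1L, 2L, 2L),
  date = as.Date(c("2017-04-01", "2017-05-01", "2017-06-01", "2017-07-01",
                   "2017-08-01", "2014-01-01", "2014-02-01")),
  value = c(40, 40, 10, NA, NA, 27, 13)
)

# The dates correspond to the day counts used in the structure() call
day_counts <- as.numeric(dt_readable$date)
```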

The output should look like:

id  date        value
1   2017-04-01  40.00
1   2017-05-01  40.00
1   2017-06-01  10.00
1   2017-07-01  30.00
1   2017-08-01  26.66
2   2014-01-01  27.00
2   2014-02-01  13.00

where 26.66 = (30 + 10 + 40)/3
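
The arithmetic behind the two filled values can be checked step by step. This is only a sketch on the known values of group 1, and the variable names are illustrative. Note that 80/3 is 26.666..., so the 26.66 shown above is a truncated figure.

```r
# Known values of group 1 before the first NA
x <- c(40, 40, 10)

# First NA: mean of the previous three values, (40 + 40 + 10) / 3
first_fill <- mean(tail(x, 3))

# Second NA: the first filled value now enters the window, (40 + 10 + 30) / 3
second_fill <- mean(tail(c(x, first_fill), 3))
```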

What is an efficient way to do this (i.e. to avoid for loops)?

edited Nov 22 at 19:00

asked Nov 22 at 15:56

Luminita

See this question.
– Rui Barradas
Nov 22 at 16:05

Tags: r, dplyr, apply

2 Answers

accepted

Define a roll function that takes the previous three or fewer values (as a list) together with the current value. It returns a list holding the last two previous values followed by the current value if that is not NA, or by the mean of the previous values if it is NA. Use it with Reduce and pick off the last value of each list in the result. Then apply all of that to each group using ave.

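# prev: state carried along by Reduce (up to 3 previous values); cur: current value
roll <- function(prev, cur) {
  prev <- unlist(prev)
  # keep the last 2 previous values, then cur (or the mean of prev if cur is NA)
  list(tail(prev, 2), if (is.na(cur)) mean(prev) else cur)
}

# run roll over one group's values and keep the last element of each state
reduce_roll <- function(x) {
  sapply(Reduce(roll, x[-1], accumulate = TRUE, init = x[1]), tail, 1)
}

# apply reduce_roll within each id group
transform(dt, value = ave(value, id, FUN = reduce_roll))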

giving:

  id       date    value
1  1 2017-04-01       40
2  1 2017-05-01       40
3  1 2017-06-01       10
4  1 2017-07-01       30
5  1 2017-08-01 26.66667
8  2 2014-01-01       27
9  2 2014-02-01       13
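
The uneven number formatting in the printed value column suggests the result may be stored as a list column rather than a plain numeric vector. The following is a sketch of a variant of the same Reduce idea that carries a numeric window as state and returns a numeric vector. The names fill_na_roll, step and the parameter k are hypothetical and not part of the answer above.

```r
# Hypothetical variant: the Reduce state is a numeric vector of the last k values
fill_na_roll <- function(x, k = 3) {
  step <- function(window, cur) {
    # use the mean of the current window when the value is missing
    filled <- if (is.na(cur)) mean(window) else cur
    # slide the window forward, keeping at most k values
    tail(c(window, filled), k)
  }
  states <- Reduce(step, x[-1], accumulate = TRUE, init = x[1])
  # the last element of each state is the (possibly filled) value at that position
  vapply(states, function(w) w[length(w)], numeric(1))
}

# Example on the question's values, grouped by id
id <- c(1L, 1L, 1L, 1L, 1L, 2L, 2L)
value <- c(40, 40, 10, NA, NA, 27, 13)
filled <- ave(value, id, FUN = fill_na_roll)
```

It can be used in the same way as reduce_roll, for example transform(dt, value = ave(value, id, FUN = fill_na_roll)).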

edited Nov 22 at 22:09

answered Nov 22 at 16:48

G. Grothendieck

Neat answer, thanks. Also, I corrected the cut off data (it was a copy-paste mistake which I didn't notice).
– Luminita
Nov 22 at 19:04

The following uses base R only and does what you need.

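# split the data frame by group
sp <- split(dt, dt$id)

# within each group, fill every NA in order with the mean of the (up to) 3 preceding values
sp <- lapply(sp, function(DF){
  for(i in which(is.na(DF$value))){
    tmp <- DF[seq_len(i - 1), ]
    DF$value[i] <- mean(tail(tmp$value, 3))
  }
  DF
})

# recombine the groups and reset the row names
result <- do.call(rbind, sp)
row.names(result) <- NULL

result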

#  id       date    value
#1  1 2017-04-01 40.00000
#2  1 2017-05-01 40.00000
#3  1 2017-06-01 10.00000
#4  1 2017-07-01 30.00000
#5  1 2017-08-01 26.66667
#6  2 2014-01-01 27.00000
#7  2 2014-02-01 13.00000
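
The same loop can also be written against a plain vector and applied per group with ave, which avoids the split and rbind steps and keeps the original row order. This is a sketch under two assumptions that are not stated in the question: an NA with no usable earlier value is left as NA, and earlier values that are still missing are ignored when averaging. The name fill_na_loop and the parameter k are hypothetical.

```r
# Hypothetical vector-level helper: fill each NA with the mean of up to k earlier values
fill_na_loop <- function(x, k = 3) {
  for (i in which(is.na(x))) {
    prev <- tail(x[seq_len(i - 1)], k)  # up to k values before position i
    prev <- prev[!is.na(prev)]          # assumption: ignore values that are still missing
    if (length(prev) > 0) x[i] <- mean(prev)
  }
  x
}

# Example on the question's values, grouped by id
id <- c(1L, 1L, 1L, 1L, 1L, 2L, 2L)
value <- c(40, 40, 10, NA, NA, 27, 13)
filled_loop <- ave(value, id, FUN = fill_na_loop)
```

With the question's data this would be called as transform(dt, value = ave(value, id, FUN = fill_na_loop)).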

answered Nov 22 at 16:16

Rui Barradas
