How to calculate the average of the most recent three non-nan value using Python

I have a dataframe df looks like the following. I want to calculate the average of the last 3 non nan columns. If there are less than three non-missing columns then the average number is missing.

name day1 day2 day3 day4  day5 day6 day7

A    1     1   nan   2    3    0   3

B    nan   nan nan   nan  nan  nan 3

C    1     1   0     1    1    1   1

D    1     1   0     1    nan  1   4

The expect output should looks like the following

name day1 day2 day3 day4  day5 day6 day7    expected 

A    1     1   nan   2    3    0   3        2     <-  1/3*(day5 + day6 + day7)

B    nan   nan nan   nan  nan  nan 3        nan   <-  less than 3 non-missing

C    1     1   0     1    1    1   1        1     <-  1/3*(day5 + day6 + day7)

D    1     1   0     1    nan  1   4        2    <-  1/3 *(day4 + day6 + day7)

I know how to calculate the average of the last three column and count how many non-missing observation are there.
df.iloc[:, 5:7].count(axis=1) average of the last three column
df.iloc[:, 5:7].count(axis=1) number of non-nan in the last three column

If there are less than 3 non-missing observation, I know how to set the average value to missing using df.iloc[:, 1:7].count(axis=1) <= 3.

But I am struggling to find a way to calculate the average of the last three non-missing columns. Can anyone teach me how to solve this please?

edited 6 hours ago

asked 7 hours ago

fly36

1721112

add a comment |

I have a dataframe df looks like the following. I want to calculate the average of the last 3 non nan columns. If there are less than three non-missing columns then the average number is missing.

name day1 day2 day3 day4  day5 day6 day7

A    1     1   nan   2    3    0   3

B    nan   nan nan   nan  nan  nan 3

C    1     1   0     1    1    1   1

D    1     1   0     1    nan  1   4

The expect output should looks like the following

name day1 day2 day3 day4  day5 day6 day7    expected 

A    1     1   nan   2    3    0   3        2     <-  1/3*(day5 + day6 + day7)

B    nan   nan nan   nan  nan  nan 3        nan   <-  less than 3 non-missing

C    1     1   0     1    1    1   1        1     <-  1/3*(day5 + day6 + day7)

D    1     1   0     1    nan  1   4        2    <-  1/3 *(day4 + day6 + day7)

If there are less than 3 non-missing observation, I know how to set the average value to missing using df.iloc[:, 1:7].count(axis=1) <= 3.

But I am struggling to find a way to calculate the average of the last three non-missing columns. Can anyone teach me how to solve this please?

edited 6 hours ago

asked 7 hours ago

fly36

1721112

add a comment |

I have a dataframe df looks like the following. I want to calculate the average of the last 3 non nan columns. If there are less than three non-missing columns then the average number is missing.

name day1 day2 day3 day4  day5 day6 day7

A    1     1   nan   2    3    0   3

B    nan   nan nan   nan  nan  nan 3

C    1     1   0     1    1    1   1

D    1     1   0     1    nan  1   4

The expect output should looks like the following

name day1 day2 day3 day4  day5 day6 day7    expected 

A    1     1   nan   2    3    0   3        2     <-  1/3*(day5 + day6 + day7)

B    nan   nan nan   nan  nan  nan 3        nan   <-  less than 3 non-missing

C    1     1   0     1    1    1   1        1     <-  1/3*(day5 + day6 + day7)

D    1     1   0     1    nan  1   4        2    <-  1/3 *(day4 + day6 + day7)

If there are less than 3 non-missing observation, I know how to set the average value to missing using df.iloc[:, 1:7].count(axis=1) <= 3.

But I am struggling to find a way to calculate the average of the last three non-missing columns. Can anyone teach me how to solve this please?

edited 6 hours ago

asked 7 hours ago

fly36

1721112

I have a dataframe df looks like the following. I want to calculate the average of the last 3 non nan columns. If there are less than three non-missing columns then the average number is missing.

name day1 day2 day3 day4  day5 day6 day7

A    1     1   nan   2    3    0   3

B    nan   nan nan   nan  nan  nan 3

C    1     1   0     1    1    1   1

D    1     1   0     1    nan  1   4

The expect output should looks like the following

name day1 day2 day3 day4  day5 day6 day7    expected 

A    1     1   nan   2    3    0   3        2     <-  1/3*(day5 + day6 + day7)

B    nan   nan nan   nan  nan  nan 3        nan   <-  less than 3 non-missing

C    1     1   0     1    1    1   1        1     <-  1/3*(day5 + day6 + day7)

D    1     1   0     1    nan  1   4        2    <-  1/3 *(day4 + day6 + day7)

If there are less than 3 non-missing observation, I know how to set the average value to missing using df.iloc[:, 1:7].count(axis=1) <= 3.

But I am struggling to find a way to calculate the average of the last three non-missing columns. Can anyone teach me how to solve this please?

python pandas numpy

edited 6 hours ago

asked 7 hours ago

fly36

1721112

edited 6 hours ago

asked 7 hours ago

fly36

1721112

edited 6 hours ago

asked 7 hours ago

fly36

1721112

asked 7 hours ago

fly36

1721112

asked 7 hours ago

fly36

1721112

add a comment |

3 Answers
3

active

oldest

votes

Vectorized one using justify -

N = 3 # last N entries for averaging

avg = np.mean(justify(df.values,invalid_val=np.nan,axis=1, side='right')[:,-N:],1)

df['expected'] = avg

edited 6 hours ago

answered 6 hours ago

Divakar

154k1480169

2

I know I will see justify here :-)
– W-B
6 hours ago

add a comment |

You can use pd.DataFrame.apply with a custom function. This is only partially vectorised.

def mean_calculator(row):

    non_nulls = row.notnull()

    if non_nulls.sum() < 3:

        return np.nan

    return row[non_nulls].values[-3:].mean()



df['expected'] = df.iloc[:, 1:].apply(mean_calculator, axis=1)



print(df)



  name  day1  day2  day3  day4  day5  day6  day7  expected

0    A   1.0   1.0   NaN   2.0   3.0   0.0     3       2.0

1    B   NaN   NaN   NaN   NaN   NaN   NaN     3       NaN

2    C   1.0   1.0   0.0   1.0   1.0   1.0     1       1.0

3    D   1.0   1.0   0.0   1.0   NaN   1.0     4       2.0

answered 6 hours ago

jpp

90.5k2052101

add a comment |

You can start by calculating the expected column using applying the following function:

expected = df.apply(lambda x: x[~x.isnull()][-3:].mean(), axis = 1)

And insert these values in the columns that have at least 3 valid values:

m = df.isnull().sum(axis=1) > 3

df.loc[~m,'expected'] = expected.mask(m)



       day1  day2  day3  day4  day5  day6  day7  expected

name                                                    

A      1.0   1.0   NaN   2.0   3.0   0.0     3       2.0

B      NaN   NaN   NaN   NaN   NaN   NaN     3       NaN

C      1.0   1.0   0.0   1.0   1.0   1.0     1       1.0

D      1.0   1.0   0.0   1.0   NaN   1.0     4       2.0

edited 6 hours ago

answered 6 hours ago

nixon

3,6291222

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53936985%2fhow-to-calculate-the-average-of-the-most-recent-three-non-nan-value-using-python%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

3 Answers
3

active

oldest

votes

3 Answers
3

active

oldest

votes

Vectorized one using justify -

N = 3 # last N entries for averaging

avg = np.mean(justify(df.values,invalid_val=np.nan,axis=1, side='right')[:,-N:],1)

df['expected'] = avg

edited 6 hours ago

answered 6 hours ago

Divakar

154k1480169

2

I know I will see justify here :-)
– W-B
6 hours ago

add a comment |

Vectorized one using justify -

N = 3 # last N entries for averaging

avg = np.mean(justify(df.values,invalid_val=np.nan,axis=1, side='right')[:,-N:],1)

df['expected'] = avg

edited 6 hours ago

answered 6 hours ago

Divakar

154k1480169

2

I know I will see justify here :-)
– W-B
6 hours ago

add a comment |

Vectorized one using justify -

N = 3 # last N entries for averaging

avg = np.mean(justify(df.values,invalid_val=np.nan,axis=1, side='right')[:,-N:],1)

df['expected'] = avg

edited 6 hours ago

answered 6 hours ago

Divakar

154k1480169

Vectorized one using justify -

N = 3 # last N entries for averaging

avg = np.mean(justify(df.values,invalid_val=np.nan,axis=1, side='right')[:,-N:],1)

df['expected'] = avg

edited 6 hours ago

answered 6 hours ago

Divakar

154k1480169

edited 6 hours ago

answered 6 hours ago

Divakar

154k1480169

answered 6 hours ago

Divakar

154k1480169

answered 6 hours ago

Divakar

154k1480169

2

I know I will see justify here :-)
– W-B
6 hours ago

add a comment |

2

I know I will see justify here :-)
– W-B
6 hours ago

I know I will see justify here :-)
– W-B
6 hours ago

add a comment |

You can use pd.DataFrame.apply with a custom function. This is only partially vectorised.

def mean_calculator(row):

    non_nulls = row.notnull()

    if non_nulls.sum() < 3:

        return np.nan

    return row[non_nulls].values[-3:].mean()



df['expected'] = df.iloc[:, 1:].apply(mean_calculator, axis=1)



print(df)



  name  day1  day2  day3  day4  day5  day6  day7  expected

0    A   1.0   1.0   NaN   2.0   3.0   0.0     3       2.0

1    B   NaN   NaN   NaN   NaN   NaN   NaN     3       NaN

2    C   1.0   1.0   0.0   1.0   1.0   1.0     1       1.0

3    D   1.0   1.0   0.0   1.0   NaN   1.0     4       2.0

answered 6 hours ago

jpp

90.5k2052101

add a comment |

You can use pd.DataFrame.apply with a custom function. This is only partially vectorised.

def mean_calculator(row):

    non_nulls = row.notnull()

    if non_nulls.sum() < 3:

        return np.nan

    return row[non_nulls].values[-3:].mean()



df['expected'] = df.iloc[:, 1:].apply(mean_calculator, axis=1)



print(df)



  name  day1  day2  day3  day4  day5  day6  day7  expected

0    A   1.0   1.0   NaN   2.0   3.0   0.0     3       2.0

1    B   NaN   NaN   NaN   NaN   NaN   NaN     3       NaN

2    C   1.0   1.0   0.0   1.0   1.0   1.0     1       1.0

3    D   1.0   1.0   0.0   1.0   NaN   1.0     4       2.0

answered 6 hours ago

jpp

90.5k2052101

add a comment |

You can use pd.DataFrame.apply with a custom function. This is only partially vectorised.

def mean_calculator(row):

    non_nulls = row.notnull()

    if non_nulls.sum() < 3:

        return np.nan

    return row[non_nulls].values[-3:].mean()



df['expected'] = df.iloc[:, 1:].apply(mean_calculator, axis=1)



print(df)



  name  day1  day2  day3  day4  day5  day6  day7  expected

0    A   1.0   1.0   NaN   2.0   3.0   0.0     3       2.0

1    B   NaN   NaN   NaN   NaN   NaN   NaN     3       NaN

2    C   1.0   1.0   0.0   1.0   1.0   1.0     1       1.0

3    D   1.0   1.0   0.0   1.0   NaN   1.0     4       2.0

answered 6 hours ago

jpp

90.5k2052101

You can use pd.DataFrame.apply with a custom function. This is only partially vectorised.

def mean_calculator(row):

    non_nulls = row.notnull()

    if non_nulls.sum() < 3:

        return np.nan

    return row[non_nulls].values[-3:].mean()



df['expected'] = df.iloc[:, 1:].apply(mean_calculator, axis=1)



print(df)



  name  day1  day2  day3  day4  day5  day6  day7  expected

0    A   1.0   1.0   NaN   2.0   3.0   0.0     3       2.0

1    B   NaN   NaN   NaN   NaN   NaN   NaN     3       NaN

2    C   1.0   1.0   0.0   1.0   1.0   1.0     1       1.0

3    D   1.0   1.0   0.0   1.0   NaN   1.0     4       2.0

answered 6 hours ago

jpp

90.5k2052101

answered 6 hours ago

jpp

90.5k2052101

answered 6 hours ago

jpp

90.5k2052101

answered 6 hours ago

jpp

90.5k2052101

add a comment |

You can start by calculating the expected column using applying the following function:

expected = df.apply(lambda x: x[~x.isnull()][-3:].mean(), axis = 1)

And insert these values in the columns that have at least 3 valid values:

m = df.isnull().sum(axis=1) > 3

df.loc[~m,'expected'] = expected.mask(m)



       day1  day2  day3  day4  day5  day6  day7  expected

name                                                    

A      1.0   1.0   NaN   2.0   3.0   0.0     3       2.0

B      NaN   NaN   NaN   NaN   NaN   NaN     3       NaN

C      1.0   1.0   0.0   1.0   1.0   1.0     1       1.0

D      1.0   1.0   0.0   1.0   NaN   1.0     4       2.0

edited 6 hours ago

answered 6 hours ago

nixon

3,6291222

add a comment |

You can start by calculating the expected column using applying the following function:

expected = df.apply(lambda x: x[~x.isnull()][-3:].mean(), axis = 1)

And insert these values in the columns that have at least 3 valid values:

m = df.isnull().sum(axis=1) > 3

df.loc[~m,'expected'] = expected.mask(m)



       day1  day2  day3  day4  day5  day6  day7  expected

name                                                    

A      1.0   1.0   NaN   2.0   3.0   0.0     3       2.0

B      NaN   NaN   NaN   NaN   NaN   NaN     3       NaN

C      1.0   1.0   0.0   1.0   1.0   1.0     1       1.0

D      1.0   1.0   0.0   1.0   NaN   1.0     4       2.0

edited 6 hours ago

answered 6 hours ago

nixon

3,6291222

add a comment |

You can start by calculating the expected column using applying the following function:

expected = df.apply(lambda x: x[~x.isnull()][-3:].mean(), axis = 1)

And insert these values in the columns that have at least 3 valid values:

m = df.isnull().sum(axis=1) > 3

df.loc[~m,'expected'] = expected.mask(m)



       day1  day2  day3  day4  day5  day6  day7  expected

name                                                    

A      1.0   1.0   NaN   2.0   3.0   0.0     3       2.0

B      NaN   NaN   NaN   NaN   NaN   NaN     3       NaN

C      1.0   1.0   0.0   1.0   1.0   1.0     1       1.0

D      1.0   1.0   0.0   1.0   NaN   1.0     4       2.0

edited 6 hours ago

answered 6 hours ago

nixon

3,6291222

You can start by calculating the expected column using applying the following function:

expected = df.apply(lambda x: x[~x.isnull()][-3:].mean(), axis = 1)

And insert these values in the columns that have at least 3 valid values:

m = df.isnull().sum(axis=1) > 3

df.loc[~m,'expected'] = expected.mask(m)



       day1  day2  day3  day4  day5  day6  day7  expected

name                                                    

A      1.0   1.0   NaN   2.0   3.0   0.0     3       2.0

B      NaN   NaN   NaN   NaN   NaN   NaN     3       NaN

C      1.0   1.0   0.0   1.0   1.0   1.0     1       1.0

D      1.0   1.0   0.0   1.0   NaN   1.0     4       2.0

edited 6 hours ago

answered 6 hours ago

nixon

3,6291222

edited 6 hours ago

answered 6 hours ago

nixon

3,6291222

answered 6 hours ago

nixon

3,6291222

answered 6 hours ago

nixon

3,6291222

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

Some of your past answers have not been well-received, and you're in danger of being blocked from answering.

Please pay close attention to the following guidance:

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Qfyilyi