How to calculate the average of the most recent three non-nan value using Python
I have a dataframe df that looks like the following. I want to calculate the average of the last 3 non-NaN columns in each row. If a row has fewer than three non-missing values, the average should be missing.
name day1 day2 day3 day4 day5 day6 day7
A 1 1 nan 2 3 0 3
B nan nan nan nan nan nan 3
C 1 1 0 1 1 1 1
D 1 1 0 1 nan 1 4
The expected output should look like the following:
name day1 day2 day3 day4 day5 day6 day7 expected
A 1 1 nan 2 3 0 3 2 <- 1/3*(day5 + day6 + day7)
B nan nan nan nan nan nan 3 nan <- less than 3 non-missing
C 1 1 0 1 1 1 1 1 <- 1/3*(day5 + day6 + day7)
D 1 1 0 1 nan 1 4 2 <- 1/3 *(day4 + day6 + day7)
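To make the examples reproducible, the sample frame can be built as below. This is a minimal sketch: the dtypes are an assumption, since the question only shows the printed table, and name is kept here as an ordinary column rather than the index.

```python
import numpy as np
import pandas as pd

# Sample data from the question; 'name' is an ordinary column in this sketch
df = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D'],
    'day1': [1, np.nan, 1, 1],
    'day2': [1, np.nan, 1, 1],
    'day3': [np.nan, np.nan, 0, 0],
    'day4': [2, np.nan, 1, 1],
    'day5': [3, np.nan, 1, np.nan],
    'day6': [0, np.nan, 1, 1],
    'day7': [3, 3, 1, 4],
})

days = df.filter(like='day')          # only the day1..day7 columns
print(days.count(axis=1).tolist())    # non-NaN values per row: [6, 1, 7, 6]
```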
I know how to calculate the average of the last three columns and how to count the non-missing observations in them:
df.iloc[:, -3:].mean(axis=1)    # average of the last three columns
df.iloc[:, -3:].count(axis=1)   # number of non-NaN values in the last three columns
If there are fewer than 3 non-missing observations, I know how to set the average to missing using df.iloc[:, 1:].count(axis=1) < 3.
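For comparison, the approach described above averages the last three columns rather than the last three valid values. A minimal sketch on the sample data (rebuilt here, with name as an ordinary column) shows where it goes wrong: row D gets 2.5 instead of the expected 2, because its day5 is NaN and mean() simply skips it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D'],
    'day1': [1, np.nan, 1, 1], 'day2': [1, np.nan, 1, 1],
    'day3': [np.nan, np.nan, 0, 0], 'day4': [2, np.nan, 1, 1],
    'day5': [3, np.nan, 1, np.nan], 'day6': [0, np.nan, 1, 1],
    'day7': [3, 3, 1, 4],
})

days = df.filter(like='day')
naive = days.iloc[:, -3:].mean(axis=1)            # mean of the last three *columns*, NaN skipped
naive = naive.where(days.count(axis=1) >= 3)      # missing if fewer than 3 valid values overall
print(naive.tolist())  # [2.0, nan, 1.0, 2.5]
```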
But I am struggling to find a way to calculate the average of the last three non-missing values in each row. Can anyone show me how to solve this, please?
python pandas numpy
edited 6 hours ago
asked 7 hours ago
fly36
1721112
3 Answers
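A vectorized approach using justify (a custom helper function, not part of NumPy or pandas):
import numpy as np

N = 3  # last N entries for averaging
# justify shifts each row's non-NaN values to the right end; rows with fewer than N
# valid values keep a NaN in the last N slots, so their mean is NaN
avg = np.mean(justify(df.filter(like='day').values, invalid_val=np.nan, axis=1, side='right')[:, -N:], axis=1)
df['expected'] = avg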
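The answer does not include justify itself. As a rough stand-in, here is a hypothetical justify_right helper that handles only the case used here (axis=1, side='right'); it is a sketch, not the original helper. Because the plain mean is used rather than np.nanmean, any row with fewer than N valid values still has a NaN in its last N slots, so its average comes out missing, which is the behaviour the question asks for.

```python
import numpy as np
import pandas as pd

def justify_right(a, invalid_val=np.nan):
    """Hypothetical helper: move each row's valid entries to the right end,
    keeping their order, and fill the freed slots on the left with invalid_val."""
    a = np.asarray(a, dtype=float)
    mask = ~np.isnan(a) if np.isnan(invalid_val) else a != invalid_val
    # Sorting a boolean row puts False before True, so each sorted row marks
    # the right-most positions that should receive that row's valid values.
    justified_mask = np.sort(mask, axis=1)
    out = np.full(a.shape, invalid_val, dtype=float)
    out[justified_mask] = a[mask]  # row-major fill keeps values in their own rows
    return out

df = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D'],
    'day1': [1, np.nan, 1, 1], 'day2': [1, np.nan, 1, 1],
    'day3': [np.nan, np.nan, 0, 0], 'day4': [2, np.nan, 1, 1],
    'day5': [3, np.nan, 1, np.nan], 'day6': [0, np.nan, 1, 1],
    'day7': [3, 3, 1, 4],
})

N = 3
vals = df.filter(like='day').to_numpy(dtype=float)
last_n = justify_right(vals)[:, -N:]
df['expected'] = last_n.mean(axis=1)  # plain mean: a NaN in the window gives NaN
print(df['expected'].tolist())  # [2.0, nan, 1.0, 2.0]
```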
edited 6 hours ago
answered 6 hours ago
Divakar
154k1480169
I know I will see justify here :-) – W-B
6 hours ago
You can use pd.DataFrame.apply with a custom function. This is only partially vectorised, since apply calls the function once per row.
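import numpy as np

def mean_calculator(row):
    non_nulls = row.notnull()
    if non_nulls.sum() < 3:  # fewer than 3 valid values -> result is missing
        return np.nan
    return row[non_nulls].values[-3:].mean()  # mean of the last 3 non-NaN values

df['expected'] = df.iloc[:, 1:].apply(mean_calculator, axis=1)  # skip the 'name' column
print(df)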
name day1 day2 day3 day4 day5 day6 day7 expected
0 A 1.0 1.0 NaN 2.0 3.0 0.0 3 2.0
1 B NaN NaN NaN NaN NaN NaN 3 NaN
2 C 1.0 1.0 0.0 1.0 1.0 1.0 1 1.0
3 D 1.0 1.0 0.0 1.0 NaN 1.0 4 2.0
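The threshold 3 is hard-coded in two places in mean_calculator. A sketch of a generalised version, with the hypothetical name mean_last_n, passes n through DataFrame.apply, which forwards extra keyword arguments to the function:

```python
import numpy as np
import pandas as pd

def mean_last_n(row, n=3):
    """Mean of the last n non-NaN values in a row, or NaN if there are fewer than n."""
    valid = row.dropna()
    if len(valid) < n:
        return np.nan
    return valid.iloc[-n:].mean()

df = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D'],
    'day1': [1, np.nan, 1, 1], 'day2': [1, np.nan, 1, 1],
    'day3': [np.nan, np.nan, 0, 0], 'day4': [2, np.nan, 1, 1],
    'day5': [3, np.nan, 1, np.nan], 'day6': [0, np.nan, 1, 1],
    'day7': [3, 3, 1, 4],
})

days = df.filter(like='day')  # selected before adding result columns
df['expected'] = days.apply(mean_last_n, axis=1, n=3)
df['expected_2'] = days.apply(mean_last_n, axis=1, n=2)
print(df[['name', 'expected', 'expected_2']])
```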
answered 6 hours ago
jpp
90.5k2052101
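You can start by calculating the expected column by applying the following function to each row (this assumes name has been set as the index, e.g. with df = df.set_index('name'), as in the output below):
expected = df.apply(lambda x: x[x.notnull()].iloc[-3:].mean(), axis=1)
Then keep these values only in the rows that have at least 3 valid values:
m = df.notnull().sum(axis=1) < 3  # True for rows with fewer than 3 valid values
df['expected'] = expected.mask(m)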
day1 day2 day3 day4 day5 day6 day7 expected
name
A 1.0 1.0 NaN 2.0 3.0 0.0 3 2.0
B NaN NaN NaN NaN NaN NaN 3 NaN
C 1.0 1.0 0.0 1.0 1.0 1.0 1 1.0
D 1.0 1.0 0.0 1.0 NaN 1.0 4 2.0
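Counting valid values, rather than nulls, matters for rows with exactly three valid values. With seven day columns such a row has four NaNs, so a mask like df.isnull().sum(axis=1) > 3 would blank out an average that should be kept. A minimal check on a hypothetical row E:

```python
import numpy as np
import pandas as pd

# Hypothetical row 'E' with exactly three valid values (four NaNs)
df = pd.DataFrame(
    {'day1': [np.nan], 'day2': [np.nan], 'day3': [np.nan], 'day4': [np.nan],
     'day5': [2.0], 'day6': [4.0], 'day7': [6]},
    index=pd.Index(['E'], name='name'),
)

expected = df.apply(lambda x: x[x.notnull()].iloc[-3:].mean(), axis=1)

null_mask = df.isnull().sum(axis=1) > 3    # 4 NaNs -> True, so E would be blanked out
valid_mask = df.notnull().sum(axis=1) < 3  # 3 valid values -> False, so E is kept
print(expected.mask(null_mask).tolist(), expected.mask(valid_mask).tolist())  # [nan] [4.0]
```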
edited 6 hours ago
answered 6 hours ago
nixon
3,6291222