How to calculate the average of the most recent three non-nan value using Python












6














I have a dataframe df looks like the following. I want to calculate the average of the last 3 non nan columns. If there are less than three non-missing columns then the average number is missing.



name day1 day2 day3 day4  day5 day6 day7
A 1 1 nan 2 3 0 3
B nan nan nan nan nan nan 3
C 1 1 0 1 1 1 1
D 1 1 0 1 nan 1 4


The expect output should looks like the following



name day1 day2 day3 day4  day5 day6 day7    expected 
A 1 1 nan 2 3 0 3 2 <- 1/3*(day5 + day6 + day7)
B nan nan nan nan nan nan 3 nan <- less than 3 non-missing
C 1 1 0 1 1 1 1 1 <- 1/3*(day5 + day6 + day7)
D 1 1 0 1 nan 1 4 2 <- 1/3 *(day4 + day6 + day7)


I know how to calculate the average of the last three column and count how many non-missing observation are there.
df.iloc[:, 5:7].count(axis=1) average of the last three column
df.iloc[:, 5:7].count(axis=1) number of non-nan in the last three column



If there are less than 3 non-missing observation, I know how to set the average value to missing using df.iloc[:, 1:7].count(axis=1) <= 3.



But I am struggling to find a way to calculate the average of the last three non-missing columns. Can anyone teach me how to solve this please?










share|improve this question





























    6














    I have a dataframe df looks like the following. I want to calculate the average of the last 3 non nan columns. If there are less than three non-missing columns then the average number is missing.



    name day1 day2 day3 day4  day5 day6 day7
    A 1 1 nan 2 3 0 3
    B nan nan nan nan nan nan 3
    C 1 1 0 1 1 1 1
    D 1 1 0 1 nan 1 4


    The expect output should looks like the following



    name day1 day2 day3 day4  day5 day6 day7    expected 
    A 1 1 nan 2 3 0 3 2 <- 1/3*(day5 + day6 + day7)
    B nan nan nan nan nan nan 3 nan <- less than 3 non-missing
    C 1 1 0 1 1 1 1 1 <- 1/3*(day5 + day6 + day7)
    D 1 1 0 1 nan 1 4 2 <- 1/3 *(day4 + day6 + day7)


    I know how to calculate the average of the last three column and count how many non-missing observation are there.
    df.iloc[:, 5:7].count(axis=1) average of the last three column
    df.iloc[:, 5:7].count(axis=1) number of non-nan in the last three column



    If there are less than 3 non-missing observation, I know how to set the average value to missing using df.iloc[:, 1:7].count(axis=1) <= 3.



    But I am struggling to find a way to calculate the average of the last three non-missing columns. Can anyone teach me how to solve this please?










    share|improve this question



























      6












      6








      6


      1





      I have a dataframe df looks like the following. I want to calculate the average of the last 3 non nan columns. If there are less than three non-missing columns then the average number is missing.



      name day1 day2 day3 day4  day5 day6 day7
      A 1 1 nan 2 3 0 3
      B nan nan nan nan nan nan 3
      C 1 1 0 1 1 1 1
      D 1 1 0 1 nan 1 4


      The expect output should looks like the following



      name day1 day2 day3 day4  day5 day6 day7    expected 
      A 1 1 nan 2 3 0 3 2 <- 1/3*(day5 + day6 + day7)
      B nan nan nan nan nan nan 3 nan <- less than 3 non-missing
      C 1 1 0 1 1 1 1 1 <- 1/3*(day5 + day6 + day7)
      D 1 1 0 1 nan 1 4 2 <- 1/3 *(day4 + day6 + day7)


      I know how to calculate the average of the last three column and count how many non-missing observation are there.
      df.iloc[:, 5:7].count(axis=1) average of the last three column
      df.iloc[:, 5:7].count(axis=1) number of non-nan in the last three column



      If there are less than 3 non-missing observation, I know how to set the average value to missing using df.iloc[:, 1:7].count(axis=1) <= 3.



      But I am struggling to find a way to calculate the average of the last three non-missing columns. Can anyone teach me how to solve this please?










      share|improve this question















      I have a dataframe df looks like the following. I want to calculate the average of the last 3 non nan columns. If there are less than three non-missing columns then the average number is missing.



      name day1 day2 day3 day4  day5 day6 day7
      A 1 1 nan 2 3 0 3
      B nan nan nan nan nan nan 3
      C 1 1 0 1 1 1 1
      D 1 1 0 1 nan 1 4


      The expect output should looks like the following



      name day1 day2 day3 day4  day5 day6 day7    expected 
      A 1 1 nan 2 3 0 3 2 <- 1/3*(day5 + day6 + day7)
      B nan nan nan nan nan nan 3 nan <- less than 3 non-missing
      C 1 1 0 1 1 1 1 1 <- 1/3*(day5 + day6 + day7)
      D 1 1 0 1 nan 1 4 2 <- 1/3 *(day4 + day6 + day7)


      I know how to calculate the average of the last three column and count how many non-missing observation are there.
      df.iloc[:, 5:7].count(axis=1) average of the last three column
      df.iloc[:, 5:7].count(axis=1) number of non-nan in the last three column



      If there are less than 3 non-missing observation, I know how to set the average value to missing using df.iloc[:, 1:7].count(axis=1) <= 3.



      But I am struggling to find a way to calculate the average of the last three non-missing columns. Can anyone teach me how to solve this please?







      python pandas numpy






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited 6 hours ago

























      asked 7 hours ago









      fly36

      1721112




      1721112
























          3 Answers
          3






          active

          oldest

          votes


















          5














          Vectorized one using justify -



          N = 3 # last N entries for averaging
          avg = np.mean(justify(df.values,invalid_val=np.nan,axis=1, side='right')[:,-N:],1)
          df['expected'] = avg





          share|improve this answer



















          • 2




            I know I will see justify here :-)
            – W-B
            6 hours ago



















          1














          You can use pd.DataFrame.apply with a custom function. This is only partially vectorised.



          def mean_calculator(row):
          non_nulls = row.notnull()
          if non_nulls.sum() < 3:
          return np.nan
          return row[non_nulls].values[-3:].mean()

          df['expected'] = df.iloc[:, 1:].apply(mean_calculator, axis=1)

          print(df)

          name day1 day2 day3 day4 day5 day6 day7 expected
          0 A 1.0 1.0 NaN 2.0 3.0 0.0 3 2.0
          1 B NaN NaN NaN NaN NaN NaN 3 NaN
          2 C 1.0 1.0 0.0 1.0 1.0 1.0 1 1.0
          3 D 1.0 1.0 0.0 1.0 NaN 1.0 4 2.0





          share|improve this answer





























            0














            You can start by calculating the expected column using applying the following function:



            expected = df.apply(lambda x: x[~x.isnull()][-3:].mean(), axis = 1)


            And insert these values in the columns that have at least 3 valid values:



            m = df.isnull().sum(axis=1) > 3
            df.loc[~m,'expected'] = expected.mask(m)

            day1 day2 day3 day4 day5 day6 day7 expected
            name
            A 1.0 1.0 NaN 2.0 3.0 0.0 3 2.0
            B NaN NaN NaN NaN NaN NaN 3 NaN
            C 1.0 1.0 0.0 1.0 1.0 1.0 1 1.0
            D 1.0 1.0 0.0 1.0 NaN 1.0 4 2.0





            share|improve this answer























              Your Answer






              StackExchange.ifUsing("editor", function () {
              StackExchange.using("externalEditor", function () {
              StackExchange.using("snippets", function () {
              StackExchange.snippets.init();
              });
              });
              }, "code-snippets");

              StackExchange.ready(function() {
              var channelOptions = {
              tags: "".split(" "),
              id: "1"
              };
              initTagRenderer("".split(" "), "".split(" "), channelOptions);

              StackExchange.using("externalEditor", function() {
              // Have to fire editor after snippets, if snippets enabled
              if (StackExchange.settings.snippets.snippetsEnabled) {
              StackExchange.using("snippets", function() {
              createEditor();
              });
              }
              else {
              createEditor();
              }
              });

              function createEditor() {
              StackExchange.prepareEditor({
              heartbeatType: 'answer',
              autoActivateHeartbeat: false,
              convertImagesToLinks: true,
              noModals: true,
              showLowRepImageUploadWarning: true,
              reputationToPostImages: 10,
              bindNavPrevention: true,
              postfix: "",
              imageUploader: {
              brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
              contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
              allowUrls: true
              },
              onDemand: true,
              discardSelector: ".discard-answer"
              ,immediatelyShowMarkdownHelp:true
              });


              }
              });














              draft saved

              draft discarded


















              StackExchange.ready(
              function () {
              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53936985%2fhow-to-calculate-the-average-of-the-most-recent-three-non-nan-value-using-python%23new-answer', 'question_page');
              }
              );

              Post as a guest















              Required, but never shown

























              3 Answers
              3






              active

              oldest

              votes








              3 Answers
              3






              active

              oldest

              votes









              active

              oldest

              votes






              active

              oldest

              votes









              5














              Vectorized one using justify -



              N = 3 # last N entries for averaging
              avg = np.mean(justify(df.values,invalid_val=np.nan,axis=1, side='right')[:,-N:],1)
              df['expected'] = avg





              share|improve this answer



















              • 2




                I know I will see justify here :-)
                – W-B
                6 hours ago
















              5














              Vectorized one using justify -



              N = 3 # last N entries for averaging
              avg = np.mean(justify(df.values,invalid_val=np.nan,axis=1, side='right')[:,-N:],1)
              df['expected'] = avg





              share|improve this answer



















              • 2




                I know I will see justify here :-)
                – W-B
                6 hours ago














              5












              5








              5






              Vectorized one using justify -



              N = 3 # last N entries for averaging
              avg = np.mean(justify(df.values,invalid_val=np.nan,axis=1, side='right')[:,-N:],1)
              df['expected'] = avg





              share|improve this answer














              Vectorized one using justify -



              N = 3 # last N entries for averaging
              avg = np.mean(justify(df.values,invalid_val=np.nan,axis=1, side='right')[:,-N:],1)
              df['expected'] = avg






              share|improve this answer














              share|improve this answer



              share|improve this answer








              edited 6 hours ago

























              answered 6 hours ago









              Divakar

              154k1480169




              154k1480169








              • 2




                I know I will see justify here :-)
                – W-B
                6 hours ago














              • 2




                I know I will see justify here :-)
                – W-B
                6 hours ago








              2




              2




              I know I will see justify here :-)
              – W-B
              6 hours ago




              I know I will see justify here :-)
              – W-B
              6 hours ago













              1














              You can use pd.DataFrame.apply with a custom function. This is only partially vectorised.



              def mean_calculator(row):
              non_nulls = row.notnull()
              if non_nulls.sum() < 3:
              return np.nan
              return row[non_nulls].values[-3:].mean()

              df['expected'] = df.iloc[:, 1:].apply(mean_calculator, axis=1)

              print(df)

              name day1 day2 day3 day4 day5 day6 day7 expected
              0 A 1.0 1.0 NaN 2.0 3.0 0.0 3 2.0
              1 B NaN NaN NaN NaN NaN NaN 3 NaN
              2 C 1.0 1.0 0.0 1.0 1.0 1.0 1 1.0
              3 D 1.0 1.0 0.0 1.0 NaN 1.0 4 2.0





              share|improve this answer


























                1














                You can use pd.DataFrame.apply with a custom function. This is only partially vectorised.



                def mean_calculator(row):
                non_nulls = row.notnull()
                if non_nulls.sum() < 3:
                return np.nan
                return row[non_nulls].values[-3:].mean()

                df['expected'] = df.iloc[:, 1:].apply(mean_calculator, axis=1)

                print(df)

                name day1 day2 day3 day4 day5 day6 day7 expected
                0 A 1.0 1.0 NaN 2.0 3.0 0.0 3 2.0
                1 B NaN NaN NaN NaN NaN NaN 3 NaN
                2 C 1.0 1.0 0.0 1.0 1.0 1.0 1 1.0
                3 D 1.0 1.0 0.0 1.0 NaN 1.0 4 2.0





                share|improve this answer
























                  1












                  1








                  1






                  You can use pd.DataFrame.apply with a custom function. This is only partially vectorised.



                  def mean_calculator(row):
                  non_nulls = row.notnull()
                  if non_nulls.sum() < 3:
                  return np.nan
                  return row[non_nulls].values[-3:].mean()

                  df['expected'] = df.iloc[:, 1:].apply(mean_calculator, axis=1)

                  print(df)

                  name day1 day2 day3 day4 day5 day6 day7 expected
                  0 A 1.0 1.0 NaN 2.0 3.0 0.0 3 2.0
                  1 B NaN NaN NaN NaN NaN NaN 3 NaN
                  2 C 1.0 1.0 0.0 1.0 1.0 1.0 1 1.0
                  3 D 1.0 1.0 0.0 1.0 NaN 1.0 4 2.0





                  share|improve this answer












                  You can use pd.DataFrame.apply with a custom function. This is only partially vectorised.



                  def mean_calculator(row):
                  non_nulls = row.notnull()
                  if non_nulls.sum() < 3:
                  return np.nan
                  return row[non_nulls].values[-3:].mean()

                  df['expected'] = df.iloc[:, 1:].apply(mean_calculator, axis=1)

                  print(df)

                  name day1 day2 day3 day4 day5 day6 day7 expected
                  0 A 1.0 1.0 NaN 2.0 3.0 0.0 3 2.0
                  1 B NaN NaN NaN NaN NaN NaN 3 NaN
                  2 C 1.0 1.0 0.0 1.0 1.0 1.0 1 1.0
                  3 D 1.0 1.0 0.0 1.0 NaN 1.0 4 2.0






                  share|improve this answer












                  share|improve this answer



                  share|improve this answer










                  answered 6 hours ago









                  jpp

                  90.5k2052101




                  90.5k2052101























                      0














                      You can start by calculating the expected column using applying the following function:



                      expected = df.apply(lambda x: x[~x.isnull()][-3:].mean(), axis = 1)


                      And insert these values in the columns that have at least 3 valid values:



                      m = df.isnull().sum(axis=1) > 3
                      df.loc[~m,'expected'] = expected.mask(m)

                      day1 day2 day3 day4 day5 day6 day7 expected
                      name
                      A 1.0 1.0 NaN 2.0 3.0 0.0 3 2.0
                      B NaN NaN NaN NaN NaN NaN 3 NaN
                      C 1.0 1.0 0.0 1.0 1.0 1.0 1 1.0
                      D 1.0 1.0 0.0 1.0 NaN 1.0 4 2.0





                      share|improve this answer




























                        0














                        You can start by calculating the expected column using applying the following function:



                        expected = df.apply(lambda x: x[~x.isnull()][-3:].mean(), axis = 1)


                        And insert these values in the columns that have at least 3 valid values:



                        m = df.isnull().sum(axis=1) > 3
                        df.loc[~m,'expected'] = expected.mask(m)

                        day1 day2 day3 day4 day5 day6 day7 expected
                        name
                        A 1.0 1.0 NaN 2.0 3.0 0.0 3 2.0
                        B NaN NaN NaN NaN NaN NaN 3 NaN
                        C 1.0 1.0 0.0 1.0 1.0 1.0 1 1.0
                        D 1.0 1.0 0.0 1.0 NaN 1.0 4 2.0





                        share|improve this answer


























                          0












                          0








                          0






                          You can start by calculating the expected column using applying the following function:



                          expected = df.apply(lambda x: x[~x.isnull()][-3:].mean(), axis = 1)


                          And insert these values in the columns that have at least 3 valid values:



                          m = df.isnull().sum(axis=1) > 3
                          df.loc[~m,'expected'] = expected.mask(m)

                          day1 day2 day3 day4 day5 day6 day7 expected
                          name
                          A 1.0 1.0 NaN 2.0 3.0 0.0 3 2.0
                          B NaN NaN NaN NaN NaN NaN 3 NaN
                          C 1.0 1.0 0.0 1.0 1.0 1.0 1 1.0
                          D 1.0 1.0 0.0 1.0 NaN 1.0 4 2.0





                          share|improve this answer














                          You can start by calculating the expected column using applying the following function:



                          expected = df.apply(lambda x: x[~x.isnull()][-3:].mean(), axis = 1)


                          And insert these values in the columns that have at least 3 valid values:



                          m = df.isnull().sum(axis=1) > 3
                          df.loc[~m,'expected'] = expected.mask(m)

                          day1 day2 day3 day4 day5 day6 day7 expected
                          name
                          A 1.0 1.0 NaN 2.0 3.0 0.0 3 2.0
                          B NaN NaN NaN NaN NaN NaN 3 NaN
                          C 1.0 1.0 0.0 1.0 1.0 1.0 1 1.0
                          D 1.0 1.0 0.0 1.0 NaN 1.0 4 2.0






                          share|improve this answer














                          share|improve this answer



                          share|improve this answer








                          edited 6 hours ago

























                          answered 6 hours ago









                          nixon

                          3,6291222




                          3,6291222






























                              draft saved

                              draft discarded




















































                              Thanks for contributing an answer to Stack Overflow!


                              • Please be sure to answer the question. Provide details and share your research!

                              But avoid



                              • Asking for help, clarification, or responding to other answers.

                              • Making statements based on opinion; back them up with references or personal experience.


                              To learn more, see our tips on writing great answers.





                              Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


                              Please pay close attention to the following guidance:


                              • Please be sure to answer the question. Provide details and share your research!

                              But avoid



                              • Asking for help, clarification, or responding to other answers.

                              • Making statements based on opinion; back them up with references or personal experience.


                              To learn more, see our tips on writing great answers.




                              draft saved


                              draft discarded














                              StackExchange.ready(
                              function () {
                              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53936985%2fhow-to-calculate-the-average-of-the-most-recent-three-non-nan-value-using-python%23new-answer', 'question_page');
                              }
                              );

                              Post as a guest















                              Required, but never shown





















































                              Required, but never shown














                              Required, but never shown












                              Required, but never shown







                              Required, but never shown

































                              Required, but never shown














                              Required, but never shown












                              Required, but never shown







                              Required, but never shown







                              Popular posts from this blog

                              What visual should I use to simply compare current year value vs last year in Power BI desktop

                              Alexandru Averescu

                              Trompette piccolo