Use columns 1 and 2 to populate column 3











up vote
7
down vote

favorite












I’m a Python newbie and have the following pandas dataframe - I’m trying to write code that populates the ‘signal’ column as it is below:



Days    long_entry_flag long_exit_flag  signal
1 FALSE TRUE
2 FALSE FALSE
3 TRUE FALSE 1
4 TRUE FALSE 1
5 FALSE FALSE 1
6 TRUE FALSE 1
7 TRUE FALSE 1
8 FALSE TRUE
9 FALSE TRUE
10 TRUE FALSE 1
11 TRUE FALSE 1
12 TRUE FALSE 1
13 FALSE FALSE 1
14 FALSE TRUE
15 FALSE FALSE
16 FALSE TRUE
17 TRUE FALSE 1
18 TRUE FALSE 1
19 FALSE FALSE 1
20 FALSE FALSE 1
21 FALSE TRUE


My pseudo-code version would take the following steps




  1. Look down the [‘long_entry_flag’] column until entry condition is True (day 3 initially)

  2. Then we enter ‘1’ into [‘signal’] column every day until exit condition is True [‘long_exit_flag’]==True on day 8

  3. Then we look back to [‘long_entry_flag’] column to wait for the next entry condition (occurs on day 10)

  4. And again we enter ‘1’ into [‘signal’] column every day until exit condition is True (day 14)

  5. etc


Welcome ideas about ways to populate the ‘signal’ column rapidly if possible (using vectorisation?) - this is a subset of a large dataframe with tens of thousands of rows, and it is one of many dataframes being analysed in sequence.



Many thanks in advance!










share|improve this question
























  • I'm sorry I'm trying to delete this question to get the formatting right...
    – Baz
    6 hours ago






  • 1




    If u still think you want to edit your question you can delete and raise a new question.
    – Mohamed Thasin ah
    6 hours ago






  • 1




    Oh perfect editing thank you Mohamed Thasin ah!
    – Baz
    6 hours ago















up vote
7
down vote

favorite












I’m a Python newbie and have the following pandas dataframe - I’m trying to write code that populates the ‘signal’ column as it is below:



Days    long_entry_flag long_exit_flag  signal
1 FALSE TRUE
2 FALSE FALSE
3 TRUE FALSE 1
4 TRUE FALSE 1
5 FALSE FALSE 1
6 TRUE FALSE 1
7 TRUE FALSE 1
8 FALSE TRUE
9 FALSE TRUE
10 TRUE FALSE 1
11 TRUE FALSE 1
12 TRUE FALSE 1
13 FALSE FALSE 1
14 FALSE TRUE
15 FALSE FALSE
16 FALSE TRUE
17 TRUE FALSE 1
18 TRUE FALSE 1
19 FALSE FALSE 1
20 FALSE FALSE 1
21 FALSE TRUE


My pseudo-code version would take the following steps




  1. Look down the [‘long_entry_flag’] column until entry condition is True (day 3 initially)

  2. Then we enter ‘1’ into [‘signal’] column every day until exit condition is True [‘long_exit_flag’]==True on day 8

  3. Then we look back to [‘long_entry_flag’] column to wait for the next entry condition (occurs on day 10)

  4. And again we enter ‘1’ into [‘signal’] column every day until exit condition is True (day 14)

  5. etc


Welcome ideas about ways to populate the ‘signal’ column rapidly if possible (using vectorisation?) - this is a subset of a large dataframe with tens of thousands of rows, and it is one of many dataframes being analysed in sequence.



Many thanks in advance!










share|improve this question
























  • I'm sorry I'm trying to delete this question to get the formatting right...
    – Baz
    6 hours ago






  • 1




    If u still think you want to edit your question you can delete and raise a new question.
    – Mohamed Thasin ah
    6 hours ago






  • 1




    Oh perfect editing thank you Mohamed Thasin ah!
    – Baz
    6 hours ago













up vote
7
down vote

favorite









up vote
7
down vote

favorite











I’m a Python newbie and have the following pandas dataframe - I’m trying to write code that populates the ‘signal’ column as it is below:



Days    long_entry_flag long_exit_flag  signal
1 FALSE TRUE
2 FALSE FALSE
3 TRUE FALSE 1
4 TRUE FALSE 1
5 FALSE FALSE 1
6 TRUE FALSE 1
7 TRUE FALSE 1
8 FALSE TRUE
9 FALSE TRUE
10 TRUE FALSE 1
11 TRUE FALSE 1
12 TRUE FALSE 1
13 FALSE FALSE 1
14 FALSE TRUE
15 FALSE FALSE
16 FALSE TRUE
17 TRUE FALSE 1
18 TRUE FALSE 1
19 FALSE FALSE 1
20 FALSE FALSE 1
21 FALSE TRUE


My pseudo-code version would take the following steps




  1. Look down the [‘long_entry_flag’] column until entry condition is True (day 3 initially)

  2. Then we enter ‘1’ into [‘signal’] column every day until exit condition is True [‘long_exit_flag’]==True on day 8

  3. Then we look back to [‘long_entry_flag’] column to wait for the next entry condition (occurs on day 10)

  4. And again we enter ‘1’ into [‘signal’] column every day until exit condition is True (day 14)

  5. etc


Welcome ideas about ways to populate the ‘signal’ column rapidly if possible (using vectorisation?) - this is a subset of a large dataframe with tens of thousands of rows, and it is one of many dataframes being analysed in sequence.



Many thanks in advance!










share|improve this question















I’m a Python newbie and have the following pandas dataframe - I’m trying to write code that populates the ‘signal’ column as it is below:



Days    long_entry_flag long_exit_flag  signal
1 FALSE TRUE
2 FALSE FALSE
3 TRUE FALSE 1
4 TRUE FALSE 1
5 FALSE FALSE 1
6 TRUE FALSE 1
7 TRUE FALSE 1
8 FALSE TRUE
9 FALSE TRUE
10 TRUE FALSE 1
11 TRUE FALSE 1
12 TRUE FALSE 1
13 FALSE FALSE 1
14 FALSE TRUE
15 FALSE FALSE
16 FALSE TRUE
17 TRUE FALSE 1
18 TRUE FALSE 1
19 FALSE FALSE 1
20 FALSE FALSE 1
21 FALSE TRUE


My pseudo-code version would take the following steps




  1. Look down the [‘long_entry_flag’] column until entry condition is True (day 3 initially)

  2. Then we enter ‘1’ into [‘signal’] column every day until exit condition is True [‘long_exit_flag’]==True on day 8

  3. Then we look back to [‘long_entry_flag’] column to wait for the next entry condition (occurs on day 10)

  4. And again we enter ‘1’ into [‘signal’] column every day until exit condition is True (day 14)

  5. etc


Welcome ideas about ways to populate the ‘signal’ column rapidly if possible (using vectorisation?) - this is a subset of a large dataframe with tens of thousands of rows, and it is one of many dataframes being analysed in sequence.



Many thanks in advance!







python pandas






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited 5 hours ago

























asked 6 hours ago









Baz

636




636












  • I'm sorry I'm trying to delete this question to get the formatting right...
    – Baz
    6 hours ago






  • 1




    If u still think you want to edit your question you can delete and raise a new question.
    – Mohamed Thasin ah
    6 hours ago






  • 1




    Oh perfect editing thank you Mohamed Thasin ah!
    – Baz
    6 hours ago


















  • I'm sorry I'm trying to delete this question to get the formatting right...
    – Baz
    6 hours ago






  • 1




    If u still think you want to edit your question you can delete and raise a new question.
    – Mohamed Thasin ah
    6 hours ago






  • 1




    Oh perfect editing thank you Mohamed Thasin ah!
    – Baz
    6 hours ago
















I'm sorry I'm trying to delete this question to get the formatting right...
– Baz
6 hours ago




I'm sorry I'm trying to delete this question to get the formatting right...
– Baz
6 hours ago




1




1




If u still think you want to edit your question you can delete and raise a new question.
– Mohamed Thasin ah
6 hours ago




If u still think you want to edit your question you can delete and raise a new question.
– Mohamed Thasin ah
6 hours ago




1




1




Oh perfect editing thank you Mohamed Thasin ah!
– Baz
6 hours ago




Oh perfect editing thank you Mohamed Thasin ah!
– Baz
6 hours ago












4 Answers
4






active

oldest

votes

















up vote
5
down vote



accepted










You can do



# Assuming we're starting from the "outside"
inside = False
for ix, row in df.iterrows():
inside = not row['long_exit_flag'] if inside else row['long_entry_flag']
df.at[ix, 'signal'] = 1 if inside else np.nan


which is going to give you exactly the output you posted.






share|improve this answer





















  • No worries. Good question @Baz
    – ayorgo
    4 hours ago










  • I appreciate this @jezrael but another disadvantage of my solution is that it requires a state. If you have an idea on how to make it stateless, preserve the state using apply (currying?) or do it in a vectorized way I'll be the first to upvote.
    – ayorgo
    4 hours ago










  • @ayorgo - added solution.
    – jezrael
    3 hours ago


















up vote
3
down vote













For improve performance use Numba solution:



arr = df[['long_exit_flag','long_entry_flag']].values

@jit
def f(A):
inside = False
out = np.ones(len(A), dtype=float)
for i in range(len(arr)):
inside = not A[i, 0] if inside else A[i, 1]
if not inside:
out[i] = np.nan
return out

df['signal'] = f(arr)


Performance:



#[21000 rows x 5 columns]
df = pd.concat([df] * 1000, ignore_index=True)

In [189]: %%timeit
...: inside = False
...: for ix, row in df.iterrows():
...: inside = not row['long_exit_flag'] if inside else row['long_entry_flag']
...: df.at[ix, 'signal'] = 1 if inside else np.nan
...:
1.58 s ± 9.45 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [190]: %%timeit
...: arr = df[['long_exit_flag','long_entry_flag']].values
...:
...: @jit
...: def f(A):
...: inside = False
...: out = np.ones(len(A), dtype=float)
...: for i in range(len(arr)):
...: inside = not A[i, 0] if inside else A[i, 1]
...: if not inside:
...: out[i] = np.nan
...: return out
...:
...: df['signal'] = f(arr)
...:
171 ms ± 2.86 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [200]: %%timeit
...: df['d'] = np.where(~df['long_exit_flag'],df['long_entry_flag'] | df['long_exit_flag'],np.nan)
...:
...: df['new_select']= np.where(df['d']==0, np.select([df['d'].shift()==0, df['d'].shift()==1],[1,1], np.nan), df['d'])
...:
2.4 ms ± 561 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


You can also use numpy for shifting, also @Dark code is simplify:



In [222]: %%timeit
...: d = np.where(~df['long_exit_flag'].values, df['long_entry_flag'].values | df['long_exit_flag'].values, np.nan)
...: shifted = np.insert(d[:-1], 0, np.nan)
...: m = (shifted==0) | (shifted==1)
...: df['signal1']= np.select([d!=0, m], [d, 1], np.nan)
...:
590 µs ± 35.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


EDIT:



You can also check Does iterrows have performance issues? for general order of precedence for performance of various operations in pandas.






share|improve this answer



















  • 1




    Ah, alright. I forgot one can simply iterate over index. Always looking for the neatest thing. Thanks.
    – ayorgo
    3 hours ago










  • @jezrael do check the timings against my approach : )
    – Dark
    3 hours ago


















up vote
1
down vote













Here's an approach with complete boolean operations which is a vectorized approach and will be fast.



Step 1 :
If long_exit_flag is True return Np.nan else apply or between long_entry_flag and long_exit_flag



df['d'] = np.where(df['long_exit_flag'], np.nan, df['long_entry_flag'] | df['long_exit_flag'])


Step 2 : Now its the state where the both the columns are false. We need to ignore it and replace the values with the previous state. Which can be done using where and select



df['new_signal']= np.where(df['d']==0, 
np.select([df['d'].shift()==0, df['d'].shift()==1],[1,1], np.nan),
df['d'])

Days long_entry_flag long_exit_flag signal d new_signal
0 1 False True NaN NaN NaN
1 2 False False NaN 0.0 NaN
2 3 True False 1.0 1.0 1.0
3 4 True False 1.0 1.0 1.0
4 5 False False 1.0 0.0 1.0
5 6 True False 1.0 1.0 1.0
6 7 True False 1.0 1.0 1.0
7 8 False True NaN NaN NaN
8 9 False True NaN NaN NaN
9 10 True False 1.0 1.0 1.0
10 11 True False 1.0 1.0 1.0
11 12 True False 1.0 1.0 1.0
12 13 False False 1.0 0.0 1.0
13 14 False True NaN NaN NaN
14 15 False False NaN 0.0 NaN
15 16 False True NaN NaN NaN
16 17 True False 1.0 1.0 1.0
17 18 True False 1.0 1.0 1.0
18 19 False False 1.0 0.0 1.0
19 20 False False 1.0 0.0 1.0
20 21 False True NaN NaN NaN





share|improve this answer



















  • 1




    Nice solution, I try numpy fy it - check edited my answer with new timings.
    – jezrael
    2 hours ago












  • I already upvoted sir. There might be a case which this solution might still not cover. Still curious.
    – Dark
    2 hours ago












  • I know it and already upvote too. Good luck!
    – jezrael
    2 hours ago


















up vote
0
down vote













#let the long_exit_flag equal to 0 when the exit is TRUE
df['long_exit_flag_r']=~df.long_exit_flag_r
df.temp=''

for i in range(1,len(df.index)):
df.temp[i]=(df.signal[i-1]+df.long_entry_flag[i])*df.long_exit_flag_r


if the temp is positive then the signal should be 1, if the temp is negative then the signal should be empty. (I kinda get stuck here)






share|improve this answer























    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53721733%2fuse-columns-1-and-2-to-populate-column-3%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    4 Answers
    4






    active

    oldest

    votes








    4 Answers
    4






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes








    up vote
    5
    down vote



    accepted










    You can do



    # Assuming we're starting from the "outside"
    inside = False
    for ix, row in df.iterrows():
    inside = not row['long_exit_flag'] if inside else row['long_entry_flag']
    df.at[ix, 'signal'] = 1 if inside else np.nan


    which is going to give you exactly the output you posted.






    share|improve this answer





















    • No worries. Good question @Baz
      – ayorgo
      4 hours ago










    • I appreciate this @jezrael but another disadvantage of my solution is that it requires a state. If you have an idea on how to make it stateless, preserve the state using apply (currying?) or do it in a vectorized way I'll be the first to upvote.
      – ayorgo
      4 hours ago










    • @ayorgo - added solution.
      – jezrael
      3 hours ago















    up vote
    5
    down vote



    accepted










    You can do



    # Assuming we're starting from the "outside"
    inside = False
    for ix, row in df.iterrows():
    inside = not row['long_exit_flag'] if inside else row['long_entry_flag']
    df.at[ix, 'signal'] = 1 if inside else np.nan


    which is going to give you exactly the output you posted.






    share|improve this answer





















    • No worries. Good question @Baz
      – ayorgo
      4 hours ago










    • I appreciate this @jezrael but another disadvantage of my solution is that it requires a state. If you have an idea on how to make it stateless, preserve the state using apply (currying?) or do it in a vectorized way I'll be the first to upvote.
      – ayorgo
      4 hours ago










    • @ayorgo - added solution.
      – jezrael
      3 hours ago













    up vote
    5
    down vote



    accepted







    up vote
    5
    down vote



    accepted






    You can do



    # Assuming we're starting from the "outside"
    inside = False
    for ix, row in df.iterrows():
    inside = not row['long_exit_flag'] if inside else row['long_entry_flag']
    df.at[ix, 'signal'] = 1 if inside else np.nan


    which is going to give you exactly the output you posted.






    share|improve this answer












    You can do



    # Assuming we're starting from the "outside"
    inside = False
    for ix, row in df.iterrows():
    inside = not row['long_exit_flag'] if inside else row['long_entry_flag']
    df.at[ix, 'signal'] = 1 if inside else np.nan


    which is going to give you exactly the output you posted.







    share|improve this answer












    share|improve this answer



    share|improve this answer










    answered 5 hours ago









    ayorgo

    663311




    663311












    • No worries. Good question @Baz
      – ayorgo
      4 hours ago










    • I appreciate this @jezrael but another disadvantage of my solution is that it requires a state. If you have an idea on how to make it stateless, preserve the state using apply (currying?) or do it in a vectorized way I'll be the first to upvote.
      – ayorgo
      4 hours ago










    • @ayorgo - added solution.
      – jezrael
      3 hours ago


















    • No worries. Good question @Baz
      – ayorgo
      4 hours ago










    • I appreciate this @jezrael but another disadvantage of my solution is that it requires a state. If you have an idea on how to make it stateless, preserve the state using apply (currying?) or do it in a vectorized way I'll be the first to upvote.
      – ayorgo
      4 hours ago










    • @ayorgo - added solution.
      – jezrael
      3 hours ago
















    No worries. Good question @Baz
    – ayorgo
    4 hours ago




    No worries. Good question @Baz
    – ayorgo
    4 hours ago












    I appreciate this @jezrael but another disadvantage of my solution is that it requires a state. If you have an idea on how to make it stateless, preserve the state using apply (currying?) or do it in a vectorized way I'll be the first to upvote.
    – ayorgo
    4 hours ago




    I appreciate this @jezrael but another disadvantage of my solution is that it requires a state. If you have an idea on how to make it stateless, preserve the state using apply (currying?) or do it in a vectorized way I'll be the first to upvote.
    – ayorgo
    4 hours ago












    @ayorgo - added solution.
    – jezrael
    3 hours ago




    @ayorgo - added solution.
    – jezrael
    3 hours ago












    up vote
    3
    down vote













    For improve performance use Numba solution:



    arr = df[['long_exit_flag','long_entry_flag']].values

    @jit
    def f(A):
    inside = False
    out = np.ones(len(A), dtype=float)
    for i in range(len(arr)):
    inside = not A[i, 0] if inside else A[i, 1]
    if not inside:
    out[i] = np.nan
    return out

    df['signal'] = f(arr)


    Performance:



    #[21000 rows x 5 columns]
    df = pd.concat([df] * 1000, ignore_index=True)

    In [189]: %%timeit
    ...: inside = False
    ...: for ix, row in df.iterrows():
    ...: inside = not row['long_exit_flag'] if inside else row['long_entry_flag']
    ...: df.at[ix, 'signal'] = 1 if inside else np.nan
    ...:
    1.58 s ± 9.45 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

    In [190]: %%timeit
    ...: arr = df[['long_exit_flag','long_entry_flag']].values
    ...:
    ...: @jit
    ...: def f(A):
    ...: inside = False
    ...: out = np.ones(len(A), dtype=float)
    ...: for i in range(len(arr)):
    ...: inside = not A[i, 0] if inside else A[i, 1]
    ...: if not inside:
    ...: out[i] = np.nan
    ...: return out
    ...:
    ...: df['signal'] = f(arr)
    ...:
    171 ms ± 2.86 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

    In [200]: %%timeit
    ...: df['d'] = np.where(~df['long_exit_flag'],df['long_entry_flag'] | df['long_exit_flag'],np.nan)
    ...:
    ...: df['new_select']= np.where(df['d']==0, np.select([df['d'].shift()==0, df['d'].shift()==1],[1,1], np.nan), df['d'])
    ...:
    2.4 ms ± 561 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


    You can also use numpy for shifting, also @Dark code is simplify:



    In [222]: %%timeit
    ...: d = np.where(~df['long_exit_flag'].values, df['long_entry_flag'].values | df['long_exit_flag'].values, np.nan)
    ...: shifted = np.insert(d[:-1], 0, np.nan)
    ...: m = (shifted==0) | (shifted==1)
    ...: df['signal1']= np.select([d!=0, m], [d, 1], np.nan)
    ...:
    590 µs ± 35.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


    EDIT:



    You can also check Does iterrows have performance issues? for general order of precedence for performance of various operations in pandas.






    share|improve this answer



















    • 1




      Ah, alright. I forgot one can simply iterate over index. Always looking for the neatest thing. Thanks.
      – ayorgo
      3 hours ago










    • @jezrael do check the timings against my approach : )
      – Dark
      3 hours ago















    up vote
    3
    down vote













    For improve performance use Numba solution:



    arr = df[['long_exit_flag','long_entry_flag']].values

    @jit
    def f(A):
    inside = False
    out = np.ones(len(A), dtype=float)
    for i in range(len(arr)):
    inside = not A[i, 0] if inside else A[i, 1]
    if not inside:
    out[i] = np.nan
    return out

    df['signal'] = f(arr)


    Performance:



    #[21000 rows x 5 columns]
    df = pd.concat([df] * 1000, ignore_index=True)

    In [189]: %%timeit
    ...: inside = False
    ...: for ix, row in df.iterrows():
    ...: inside = not row['long_exit_flag'] if inside else row['long_entry_flag']
    ...: df.at[ix, 'signal'] = 1 if inside else np.nan
    ...:
    1.58 s ± 9.45 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

    In [190]: %%timeit
    ...: arr = df[['long_exit_flag','long_entry_flag']].values
    ...:
    ...: @jit
    ...: def f(A):
    ...: inside = False
    ...: out = np.ones(len(A), dtype=float)
    ...: for i in range(len(arr)):
    ...: inside = not A[i, 0] if inside else A[i, 1]
    ...: if not inside:
    ...: out[i] = np.nan
    ...: return out
    ...:
    ...: df['signal'] = f(arr)
    ...:
    171 ms ± 2.86 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

    In [200]: %%timeit
    ...: df['d'] = np.where(~df['long_exit_flag'],df['long_entry_flag'] | df['long_exit_flag'],np.nan)
    ...:
    ...: df['new_select']= np.where(df['d']==0, np.select([df['d'].shift()==0, df['d'].shift()==1],[1,1], np.nan), df['d'])
    ...:
    2.4 ms ± 561 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


    You can also use numpy for shifting, also @Dark code is simplify:



    In [222]: %%timeit
    ...: d = np.where(~df['long_exit_flag'].values, df['long_entry_flag'].values | df['long_exit_flag'].values, np.nan)
    ...: shifted = np.insert(d[:-1], 0, np.nan)
    ...: m = (shifted==0) | (shifted==1)
    ...: df['signal1']= np.select([d!=0, m], [d, 1], np.nan)
    ...:
    590 µs ± 35.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


    EDIT:



    You can also check Does iterrows have performance issues? for general order of precedence for performance of various operations in pandas.






    share|improve this answer



















    • 1




      Ah, alright. I forgot one can simply iterate over index. Always looking for the neatest thing. Thanks.
      – ayorgo
      3 hours ago










    • @jezrael do check the timings against my approach : )
      – Dark
      3 hours ago













    up vote
    3
    down vote










    up vote
    3
    down vote









    For improve performance use Numba solution:



    arr = df[['long_exit_flag','long_entry_flag']].values

    @jit
    def f(A):
    inside = False
    out = np.ones(len(A), dtype=float)
    for i in range(len(arr)):
    inside = not A[i, 0] if inside else A[i, 1]
    if not inside:
    out[i] = np.nan
    return out

    df['signal'] = f(arr)


    Performance:



    #[21000 rows x 5 columns]
    df = pd.concat([df] * 1000, ignore_index=True)

    In [189]: %%timeit
    ...: inside = False
    ...: for ix, row in df.iterrows():
    ...: inside = not row['long_exit_flag'] if inside else row['long_entry_flag']
    ...: df.at[ix, 'signal'] = 1 if inside else np.nan
    ...:
    1.58 s ± 9.45 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

    In [190]: %%timeit
    ...: arr = df[['long_exit_flag','long_entry_flag']].values
    ...:
    ...: @jit
    ...: def f(A):
    ...: inside = False
    ...: out = np.ones(len(A), dtype=float)
    ...: for i in range(len(arr)):
    ...: inside = not A[i, 0] if inside else A[i, 1]
    ...: if not inside:
    ...: out[i] = np.nan
    ...: return out
    ...:
    ...: df['signal'] = f(arr)
    ...:
    171 ms ± 2.86 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

    In [200]: %%timeit
    ...: df['d'] = np.where(~df['long_exit_flag'],df['long_entry_flag'] | df['long_exit_flag'],np.nan)
    ...:
    ...: df['new_select']= np.where(df['d']==0, np.select([df['d'].shift()==0, df['d'].shift()==1],[1,1], np.nan), df['d'])
    ...:
    2.4 ms ± 561 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


    You can also use numpy for shifting, also @Dark code is simplify:



    In [222]: %%timeit
    ...: d = np.where(~df['long_exit_flag'].values, df['long_entry_flag'].values | df['long_exit_flag'].values, np.nan)
    ...: shifted = np.insert(d[:-1], 0, np.nan)
    ...: m = (shifted==0) | (shifted==1)
    ...: df['signal1']= np.select([d!=0, m], [d, 1], np.nan)
    ...:
    590 µs ± 35.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


    EDIT:



    You can also check Does iterrows have performance issues? for general order of precedence for performance of various operations in pandas.






    share|improve this answer














    For improve performance use Numba solution:



    arr = df[['long_exit_flag','long_entry_flag']].values

    @jit
    def f(A):
    inside = False
    out = np.ones(len(A), dtype=float)
    for i in range(len(arr)):
    inside = not A[i, 0] if inside else A[i, 1]
    if not inside:
    out[i] = np.nan
    return out

    df['signal'] = f(arr)


    Performance:



    #[21000 rows x 5 columns]
    df = pd.concat([df] * 1000, ignore_index=True)

    In [189]: %%timeit
    ...: inside = False
    ...: for ix, row in df.iterrows():
    ...: inside = not row['long_exit_flag'] if inside else row['long_entry_flag']
    ...: df.at[ix, 'signal'] = 1 if inside else np.nan
    ...:
    1.58 s ± 9.45 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

    In [190]: %%timeit
    ...: arr = df[['long_exit_flag','long_entry_flag']].values
    ...:
    ...: @jit
    ...: def f(A):
    ...: inside = False
    ...: out = np.ones(len(A), dtype=float)
    ...: for i in range(len(arr)):
    ...: inside = not A[i, 0] if inside else A[i, 1]
    ...: if not inside:
    ...: out[i] = np.nan
    ...: return out
    ...:
    ...: df['signal'] = f(arr)
    ...:
    171 ms ± 2.86 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

    In [200]: %%timeit
    ...: df['d'] = np.where(~df['long_exit_flag'],df['long_entry_flag'] | df['long_exit_flag'],np.nan)
    ...:
    ...: df['new_select']= np.where(df['d']==0, np.select([df['d'].shift()==0, df['d'].shift()==1],[1,1], np.nan), df['d'])
    ...:
    2.4 ms ± 561 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


    You can also use numpy for shifting, also @Dark code is simplify:



    In [222]: %%timeit
    ...: d = np.where(~df['long_exit_flag'].values, df['long_entry_flag'].values | df['long_exit_flag'].values, np.nan)
    ...: shifted = np.insert(d[:-1], 0, np.nan)
    ...: m = (shifted==0) | (shifted==1)
    ...: df['signal1']= np.select([d!=0, m], [d, 1], np.nan)
    ...:
    590 µs ± 35.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


    EDIT:



    You can also check Does iterrows have performance issues? for general order of precedence for performance of various operations in pandas.







    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited 2 hours ago

























    answered 3 hours ago









    jezrael

    315k21253331




    315k21253331








    • 1




      Ah, alright. I forgot one can simply iterate over index. Always looking for the neatest thing. Thanks.
      – ayorgo
      3 hours ago










    • @jezrael do check the timings against my approach : )
      – Dark
      3 hours ago














    • 1




      Ah, alright. I forgot one can simply iterate over index. Always looking for the neatest thing. Thanks.
      – ayorgo
      3 hours ago










    • @jezrael do check the timings against my approach : )
      – Dark
      3 hours ago








    1




    1




    Ah, alright. I forgot one can simply iterate over index. Always looking for the neatest thing. Thanks.
    – ayorgo
    3 hours ago




    Ah, alright. I forgot one can simply iterate over index. Always looking for the neatest thing. Thanks.
    – ayorgo
    3 hours ago












    @jezrael do check the timings against my approach : )
    – Dark
    3 hours ago




    @jezrael do check the timings against my approach : )
    – Dark
    3 hours ago










    up vote
    1
    down vote













    Here's an approach with complete boolean operations which is a vectorized approach and will be fast.



    Step 1 :
    If long_exit_flag is True return Np.nan else apply or between long_entry_flag and long_exit_flag



    df['d'] = np.where(df['long_exit_flag'], np.nan, df['long_entry_flag'] | df['long_exit_flag'])


    Step 2 : Now its the state where the both the columns are false. We need to ignore it and replace the values with the previous state. Which can be done using where and select



    df['new_signal']= np.where(df['d']==0, 
    np.select([df['d'].shift()==0, df['d'].shift()==1],[1,1], np.nan),
    df['d'])

    Days long_entry_flag long_exit_flag signal d new_signal
    0 1 False True NaN NaN NaN
    1 2 False False NaN 0.0 NaN
    2 3 True False 1.0 1.0 1.0
    3 4 True False 1.0 1.0 1.0
    4 5 False False 1.0 0.0 1.0
    5 6 True False 1.0 1.0 1.0
    6 7 True False 1.0 1.0 1.0
    7 8 False True NaN NaN NaN
    8 9 False True NaN NaN NaN
    9 10 True False 1.0 1.0 1.0
    10 11 True False 1.0 1.0 1.0
    11 12 True False 1.0 1.0 1.0
    12 13 False False 1.0 0.0 1.0
    13 14 False True NaN NaN NaN
    14 15 False False NaN 0.0 NaN
    15 16 False True NaN NaN NaN
    16 17 True False 1.0 1.0 1.0
    17 18 True False 1.0 1.0 1.0
    18 19 False False 1.0 0.0 1.0
    19 20 False False 1.0 0.0 1.0
    20 21 False True NaN NaN NaN





    share|improve this answer



















    • 1




      Nice solution, I try numpy fy it - check edited my answer with new timings.
      – jezrael
      2 hours ago












    • I already upvoted sir. There might be a case which this solution might still not cover. Still curious.
      – Dark
      2 hours ago












    • I know it and already upvote too. Good luck!
      – jezrael
      2 hours ago















    up vote
    1
    down vote













    Here's an approach with complete boolean operations which is a vectorized approach and will be fast.



    Step 1 :
    If long_exit_flag is True return Np.nan else apply or between long_entry_flag and long_exit_flag



    df['d'] = np.where(df['long_exit_flag'], np.nan, df['long_entry_flag'] | df['long_exit_flag'])


    Step 2 : Now its the state where the both the columns are false. We need to ignore it and replace the values with the previous state. Which can be done using where and select



    df['new_signal']= np.where(df['d']==0, 
    np.select([df['d'].shift()==0, df['d'].shift()==1],[1,1], np.nan),
    df['d'])

    Days long_entry_flag long_exit_flag signal d new_signal
    0 1 False True NaN NaN NaN
    1 2 False False NaN 0.0 NaN
    2 3 True False 1.0 1.0 1.0
    3 4 True False 1.0 1.0 1.0
    4 5 False False 1.0 0.0 1.0
    5 6 True False 1.0 1.0 1.0
    6 7 True False 1.0 1.0 1.0
    7 8 False True NaN NaN NaN
    8 9 False True NaN NaN NaN
    9 10 True False 1.0 1.0 1.0
    10 11 True False 1.0 1.0 1.0
    11 12 True False 1.0 1.0 1.0
    12 13 False False 1.0 0.0 1.0
    13 14 False True NaN NaN NaN
    14 15 False False NaN 0.0 NaN
    15 16 False True NaN NaN NaN
    16 17 True False 1.0 1.0 1.0
    17 18 True False 1.0 1.0 1.0
    18 19 False False 1.0 0.0 1.0
    19 20 False False 1.0 0.0 1.0
    20 21 False True NaN NaN NaN





    share|improve this answer



















    • 1




      Nice solution, I try numpy fy it - check edited my answer with new timings.
      – jezrael
      2 hours ago












    • I already upvoted sir. There might be a case which this solution might still not cover. Still curious.
      – Dark
      2 hours ago












    • I know it and already upvote too. Good luck!
      – jezrael
      2 hours ago













    up vote
    1
    down vote










    up vote
    1
    down vote









    Here's an approach with complete boolean operations which is a vectorized approach and will be fast.



    Step 1 :
    If long_exit_flag is True return Np.nan else apply or between long_entry_flag and long_exit_flag



    df['d'] = np.where(df['long_exit_flag'], np.nan, df['long_entry_flag'] | df['long_exit_flag'])


    Step 2 : Now its the state where the both the columns are false. We need to ignore it and replace the values with the previous state. Which can be done using where and select



    df['new_signal']= np.where(df['d']==0, 
    np.select([df['d'].shift()==0, df['d'].shift()==1],[1,1], np.nan),
    df['d'])

    Days long_entry_flag long_exit_flag signal d new_signal
    0 1 False True NaN NaN NaN
    1 2 False False NaN 0.0 NaN
    2 3 True False 1.0 1.0 1.0
    3 4 True False 1.0 1.0 1.0
    4 5 False False 1.0 0.0 1.0
    5 6 True False 1.0 1.0 1.0
    6 7 True False 1.0 1.0 1.0
    7 8 False True NaN NaN NaN
    8 9 False True NaN NaN NaN
    9 10 True False 1.0 1.0 1.0
    10 11 True False 1.0 1.0 1.0
    11 12 True False 1.0 1.0 1.0
    12 13 False False 1.0 0.0 1.0
    13 14 False True NaN NaN NaN
    14 15 False False NaN 0.0 NaN
    15 16 False True NaN NaN NaN
    16 17 True False 1.0 1.0 1.0
    17 18 True False 1.0 1.0 1.0
    18 19 False False 1.0 0.0 1.0
    19 20 False False 1.0 0.0 1.0
    20 21 False True NaN NaN NaN





    share|improve this answer














    Here's an approach with complete boolean operations which is a vectorized approach and will be fast.



    Step 1 :
    If long_exit_flag is True return Np.nan else apply or between long_entry_flag and long_exit_flag



    df['d'] = np.where(df['long_exit_flag'], np.nan, df['long_entry_flag'] | df['long_exit_flag'])


    Step 2 : Now its the state where the both the columns are false. We need to ignore it and replace the values with the previous state. Which can be done using where and select



    df['new_signal']= np.where(df['d']==0, 
    np.select([df['d'].shift()==0, df['d'].shift()==1],[1,1], np.nan),
    df['d'])

    Days long_entry_flag long_exit_flag signal d new_signal
    0 1 False True NaN NaN NaN
    1 2 False False NaN 0.0 NaN
    2 3 True False 1.0 1.0 1.0
    3 4 True False 1.0 1.0 1.0
    4 5 False False 1.0 0.0 1.0
    5 6 True False 1.0 1.0 1.0
    6 7 True False 1.0 1.0 1.0
    7 8 False True NaN NaN NaN
    8 9 False True NaN NaN NaN
    9 10 True False 1.0 1.0 1.0
    10 11 True False 1.0 1.0 1.0
    11 12 True False 1.0 1.0 1.0
    12 13 False False 1.0 0.0 1.0
    13 14 False True NaN NaN NaN
    14 15 False False NaN 0.0 NaN
    15 16 False True NaN NaN NaN
    16 17 True False 1.0 1.0 1.0
    17 18 True False 1.0 1.0 1.0
    18 19 False False 1.0 0.0 1.0
    19 20 False False 1.0 0.0 1.0
    20 21 False True NaN NaN NaN






    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited 2 hours ago

























    answered 3 hours ago









    Dark

    20.9k31946




    20.9k31946








    • 1




      Nice solution, I try numpy fy it - check edited my answer with new timings.
      – jezrael
      2 hours ago












    • I already upvoted sir. There might be a case which this solution might still not cover. Still curious.
      – Dark
      2 hours ago












    • I know it and already upvote too. Good luck!
      – jezrael
      2 hours ago














    • 1




      Nice solution, I try numpy fy it - check edited my answer with new timings.
      – jezrael
      2 hours ago












    • I already upvoted sir. There might be a case which this solution might still not cover. Still curious.
      – Dark
      2 hours ago












    • I know it and already upvote too. Good luck!
      – jezrael
      2 hours ago








    1




    1




    Nice solution, I try numpy fy it - check edited my answer with new timings.
    – jezrael
    2 hours ago






    Nice solution, I try numpy fy it - check edited my answer with new timings.
    – jezrael
    2 hours ago














    I already upvoted sir. There might be a case which this solution might still not cover. Still curious.
    – Dark
    2 hours ago






    I already upvoted sir. There might be a case which this solution might still not cover. Still curious.
    – Dark
    2 hours ago














    I know it and already upvote too. Good luck!
    – jezrael
    2 hours ago




    I know it and already upvote too. Good luck!
    – jezrael
    2 hours ago










    up vote
    0
    down vote













    #let the long_exit_flag equal to 0 when the exit is TRUE
    df['long_exit_flag_r']=~df.long_exit_flag_r
    df.temp=''

    for i in range(1,len(df.index)):
    df.temp[i]=(df.signal[i-1]+df.long_entry_flag[i])*df.long_exit_flag_r


    if the temp is positive then the signal should be 1, if the temp is negative then the signal should be empty. (I kinda get stuck here)






    share|improve this answer



























      up vote
      0
      down vote













      #let the long_exit_flag equal to 0 when the exit is TRUE
      df['long_exit_flag_r']=~df.long_exit_flag_r
      df.temp=''

      for i in range(1,len(df.index)):
      df.temp[i]=(df.signal[i-1]+df.long_entry_flag[i])*df.long_exit_flag_r


      if the temp is positive then the signal should be 1, if the temp is negative then the signal should be empty. (I kinda get stuck here)






      share|improve this answer

























        up vote
        0
        down vote










        up vote
        0
        down vote









        #let the long_exit_flag equal to 0 when the exit is TRUE
        df['long_exit_flag_r']=~df.long_exit_flag_r
        df.temp=''

        for i in range(1,len(df.index)):
        df.temp[i]=(df.signal[i-1]+df.long_entry_flag[i])*df.long_exit_flag_r


        if the temp is positive then the signal should be 1, if the temp is negative then the signal should be empty. (I kinda get stuck here)






        share|improve this answer














        #let the long_exit_flag equal to 0 when the exit is TRUE
        df['long_exit_flag_r']=~df.long_exit_flag_r
        df.temp=''

        for i in range(1,len(df.index)):
        df.temp[i]=(df.signal[i-1]+df.long_entry_flag[i])*df.long_exit_flag_r


        if the temp is positive then the signal should be 1, if the temp is negative then the signal should be empty. (I kinda get stuck here)







        share|improve this answer














        share|improve this answer



        share|improve this answer








        edited 5 hours ago

























        answered 5 hours ago









        ZhouXing98

        445




        445






























            draft saved

            draft discarded




















































            Thanks for contributing an answer to Stack Overflow!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.





            Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


            Please pay close attention to the following guidance:


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53721733%2fuse-columns-1-and-2-to-populate-column-3%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            How to ignore python UserWarning in pytest?

            What visual should I use to simply compare current year value vs last year in Power BI desktop

            Script to remove string up to first number