Use columns 1 and 2 to populate column 3

I'm new to Python and have the following pandas DataFrame. I'm trying to write code that populates the 'signal' column as shown below:

Days    long_entry_flag long_exit_flag  signal
 1      FALSE           TRUE
 2      FALSE           FALSE
 3      TRUE            FALSE            1
 4      TRUE            FALSE            1
 5      FALSE           FALSE            1
 6      TRUE            FALSE            1
 7      TRUE            FALSE            1
 8      FALSE           TRUE
 9      FALSE           TRUE
 10     TRUE            FALSE            1
 11     TRUE            FALSE            1
 12     TRUE            FALSE            1
 13     FALSE           FALSE            1
 14     FALSE           TRUE
 15     FALSE           FALSE
 16     FALSE           TRUE
 17     TRUE            FALSE            1
 18     TRUE            FALSE            1
 19     FALSE           FALSE            1
 20     FALSE           FALSE            1
 21     FALSE           TRUE

My pseudo-code version would take the following steps:

Look down the ['long_entry_flag'] column until the entry condition is True (day 3 initially).

Then enter '1' into the ['signal'] column every day until the exit condition is True (['long_exit_flag'] == True on day 8).

Then go back to the ['long_entry_flag'] column and wait for the next entry condition (which occurs on day 10).

Again enter '1' into the ['signal'] column every day until the exit condition is True (day 14).
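
The steps above can be sketched in plain Python before bringing pandas into it. This is only a reference sketch: the flag lists are copied from the table, while the function name `build_signal`, the use of `None` for an empty signal, and the assumption that we start outside a position are illustrative choices, not part of the original question.

```python
# Flags for days 1..21, copied from the table in the question.
T, F = True, False
entry = [F, F, T, T, F, T, T, F, F, T, T, T, F, F, F, F, T, T, F, F, F]
exit_ = [T, F, F, F, F, F, F, T, T, F, F, F, F, T, F, T, F, F, F, F, T]


def build_signal(entry_flags, exit_flags):
    """Return 1 for days spent in a position and None otherwise (hypothetical helper)."""
    signal = []
    in_position = False  # assumption: we start outside a position
    for is_entry, is_exit in zip(entry_flags, exit_flags):
        if in_position:
            in_position = not is_exit   # stay in until an exit day
        else:
            in_position = is_entry      # wait for the next entry day
        signal.append(1 if in_position else None)
    return signal


signal = build_signal(entry, exit_)
days_in_position = [day for day, s in enumerate(signal, start=1) if s == 1]
print(days_in_position)
```

The printed days should match the rows marked 1 in the table, which gives a small ground truth for checking any faster version.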

I'd welcome ideas about ways to populate the 'signal' column quickly (using vectorisation, if possible). This is a subset of a large dataframe with tens of thousands of rows, and it is one of many dataframes being analysed in sequence.

Many thanks in advance!

edited 5 hours ago

asked 6 hours ago

Baz

636

I'm sorry I'm trying to delete this question to get the formatting right...
– Baz
6 hours ago

1

If you still think you want to edit your question, you can delete it and raise a new question.
– Mohamed Thasin ah
6 hours ago

1

Oh perfect editing thank you Mohamed Thasin ah!
– Baz
6 hours ago

python pandas


4 Answers

up vote
5
down vote

accepted

You can do:

import numpy as np

# Assuming we're starting from the "outside" (not in a position)
inside = False
for ix, row in df.iterrows():
    # While inside, stay in until an exit flag; while outside, wait for an entry flag
    inside = not row['long_exit_flag'] if inside else row['long_entry_flag']
    df.at[ix, 'signal'] = 1 if inside else np.nan

which is going to give you exactly the output you posted.
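
For comparison, here is one possible way to get the same result without carrying state through a loop, using a forward fill. It is a sketch under a stated assumption: the entry and exit flags are never both True on the same row (which holds for the posted data). If they can coincide, this version lets the exit win, whereas the loop above would enter when starting from outside. The column name `signal_ffill` is made up for illustration.

```python
import numpy as np
import pandas as pd

T, F = True, False
df = pd.DataFrame({
    "Days": range(1, 22),
    "long_entry_flag": [F, F, T, T, F, T, T, F, F, T, T, T, F, F, F, F, T, T, F, F, F],
    "long_exit_flag":  [T, F, F, F, F, F, F, T, T, F, F, F, F, T, F, T, F, F, F, F, T],
})

# 1.0 on entry rows, 0.0 on exit rows, NaN where neither flag is set.
event = pd.Series(np.nan, index=df.index)
event[df["long_entry_flag"]] = 1.0
event[df["long_exit_flag"]] = 0.0   # assumption: an exit wins if both flags are set

# Carry the last event forward; rows before the first event count as "outside".
state = event.ffill().fillna(0.0)

# Keep 1.0 while inside, NaN otherwise.
df["signal_ffill"] = state.where(state == 1.0)

days_in_position = df.loc[df["signal_ffill"].notna(), "Days"].tolist()
print(days_in_position)
```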

answered 5 hours ago

ayorgo

663311

No worries. Good question @Baz
– ayorgo
4 hours ago

I appreciate this @jezrael but another disadvantage of my solution is that it requires a state. If you have an idea on how to make it stateless, preserve the state using apply (currying?) or do it in a vectorized way I'll be the first to upvote.
– ayorgo
4 hours ago

@ayorgo - added solution.
– jezrael
3 hours ago


up vote
3
down vote

To improve performance, use a Numba solution:

import numpy as np
from numba import jit

arr = df[['long_exit_flag','long_entry_flag']].values

@jit
def f(A):
    # A[:, 0] holds the exit flags, A[:, 1] the entry flags
    inside = False
    out = np.ones(len(A), dtype=float)
    for i in range(len(A)):  # iterate over the argument, not the global arr
        inside = not A[i, 0] if inside else A[i, 1]
        if not inside:
            out[i] = np.nan
    return out

df['signal'] = f(arr)
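
A variant of the same idea, sketched so that it still runs when Numba is not installed. The function name `position_signal`, the two separate flag arrays, and the no-op fallback decorator are assumptions made for this illustration. Numba compiles a decorated function on its first call, so defining it once and reusing it avoids paying that cost repeatedly.

```python
import numpy as np

try:
    from numba import njit            # compiled path, if Numba is installed
except ImportError:                   # fallback so the sketch still runs without it
    def njit(func):
        return func


@njit
def position_signal(exit_flags, entry_flags):
    n = len(exit_flags)
    out = np.full(n, np.nan)
    inside = False                    # assumption: start outside a position
    for i in range(n):
        if inside:
            inside = not exit_flags[i]
        else:
            inside = bool(entry_flags[i])
        if inside:
            out[i] = 1.0
    return out


# Small made-up input, not the data from the question.
exit_flags = np.array([True, False, False, True, False])
entry_flags = np.array([False, True, False, False, True])
signal = position_signal(exit_flags, entry_flags)
print(signal)
```

With a DataFrame, the two arrays could be taken from `df['long_exit_flag'].to_numpy()` and `df['long_entry_flag'].to_numpy()`.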

Performance:

#[21000 rows x 5 columns]
df = pd.concat([df] * 1000, ignore_index=True)

In [189]: %%timeit
     ...: inside = False
     ...: for ix, row in df.iterrows():
     ...:     inside = not row['long_exit_flag'] if inside else row['long_entry_flag']
     ...:     df.at[ix, 'signal'] = 1 if inside else np.nan
     ...:
1.58 s ± 9.45 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [190]: %%timeit
     ...: arr = df[['long_exit_flag','long_entry_flag']].values
     ...:
     ...: @jit
     ...: def f(A):
     ...:     inside = False
     ...:     out = np.ones(len(A), dtype=float)
     ...:     for i in range(len(arr)):
     ...:         inside = not A[i, 0] if inside else A[i, 1]
     ...:         if not inside:
     ...:             out[i] = np.nan
     ...:     return out
     ...:
     ...: df['signal'] = f(arr)
     ...:
171 ms ± 2.86 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [200]: %%timeit
     ...: df['d'] = np.where(~df['long_exit_flag'],df['long_entry_flag'] | df['long_exit_flag'],np.nan)
     ...:
     ...: df['new_select']= np.where(df['d']==0, np.select([df['d'].shift()==0, df['d'].shift()==1],[1,1], np.nan), df['d'])
     ...:
2.4 ms ± 561 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

You can also use NumPy for the shifting, which also simplifies @Dark's code:

In [222]: %%timeit
     ...: d = np.where(~df['long_exit_flag'].values,  df['long_entry_flag'].values | df['long_exit_flag'].values, np.nan)
     ...: shifted = np.insert(d[:-1], 0, np.nan)
     ...: m = (shifted==0) | (shifted==1)
     ...: df['signal1']= np.select([d!=0, m], [d, 1], np.nan)
     ...:
590 µs ± 35.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

EDIT:

You can also check "Does iterrows have performance issues?" for the general order of precedence for performance of various operations in pandas.
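
As a small illustration of that point, the same stateful rule can be run over plain NumPy arrays instead of `iterrows`, which builds a Series object for every row. This is a sketch on a made-up five-row frame, not a benchmark; the loop is still Python-level, so it is not vectorized.

```python
import numpy as np
import pandas as pd

# Hypothetical five-row frame, not the data from the question.
df = pd.DataFrame({
    "long_entry_flag": [False, True, False, False, True],
    "long_exit_flag":  [True, False, False, True, False],
})

# Pull the columns out once as NumPy arrays so the loop never builds a Series per row.
entry = df["long_entry_flag"].to_numpy()
exit_ = df["long_exit_flag"].to_numpy()

signal = np.full(len(df), np.nan)
inside = False  # assumption: we start outside a position
for i, (is_entry, is_exit) in enumerate(zip(entry, exit_)):
    inside = (not is_exit) if inside else bool(is_entry)
    if inside:
        signal[i] = 1.0

# Assign the whole column in one step instead of writing cell by cell.
df["signal"] = signal
print(df)
```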

edited 2 hours ago

answered 3 hours ago

jezrael

315k21253331

1

Ah, alright. I forgot one can simply iterate over index. Always looking for the neatest thing. Thanks.
– ayorgo
3 hours ago

@jezrael do check the timings against my approach : )
– Dark
3 hours ago


up vote
1
down vote

Here's a vectorized approach that uses only boolean operations, so it should be fast.

Step 1: If long_exit_flag is True, return np.nan; otherwise apply a logical OR between long_entry_flag and long_exit_flag.

df['d'] = np.where(df['long_exit_flag'], np.nan, df['long_entry_flag'] | df['long_exit_flag'])

Step 2: What remains is the state where both columns are False. We need to ignore it and replace the value with the previous state, which can be done using where and select.

df['new_signal'] = np.where(df['d']==0,
                  np.select([df['d'].shift()==0, df['d'].shift()==1],[1,1], np.nan),
                  df['d'])



    Days  long_entry_flag  long_exit_flag  signal    d  new_signal
0      1            False            True     NaN  NaN         NaN
1      2            False           False     NaN  0.0         NaN
2      3             True           False     1.0  1.0         1.0
3      4             True           False     1.0  1.0         1.0
4      5            False           False     1.0  0.0         1.0
5      6             True           False     1.0  1.0         1.0
6      7             True           False     1.0  1.0         1.0
7      8            False            True     NaN  NaN         NaN
8      9            False            True     NaN  NaN         NaN
9     10             True           False     1.0  1.0         1.0
10    11             True           False     1.0  1.0         1.0
11    12             True           False     1.0  1.0         1.0
12    13            False           False     1.0  0.0         1.0
13    14            False            True     NaN  NaN         NaN
14    15            False           False     NaN  0.0         NaN
15    16            False            True     NaN  NaN         NaN
16    17             True           False     1.0  1.0         1.0
17    18             True           False     1.0  1.0         1.0
18    19            False           False     1.0  0.0         1.0
19    20            False           False     1.0  0.0         1.0
20    21            False            True     NaN  NaN         NaN
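
Because Step 2 only looks one row back, it may be worth checking inputs where an exit is followed by two or more rows with both flags False. The sketch below does that on a hypothetical four-row frame (not the posted data) and compares the result with a stateful loop that applies the same rule as the accepted answer. The column name `loop_signal` is made up for the comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: an exit on the first row, then two idle rows, then an entry.
df = pd.DataFrame({
    "long_entry_flag": [False, False, False, True],
    "long_exit_flag":  [True, False, False, False],
})

# Reference result from a stateful loop (same rule as the accepted answer).
expected = []
inside = False
for is_entry, is_exit in zip(df["long_entry_flag"], df["long_exit_flag"]):
    inside = (not is_exit) if inside else bool(is_entry)
    expected.append(1.0 if inside else np.nan)
df["loop_signal"] = expected

# The two vectorized steps described in this answer.
df["d"] = np.where(df["long_exit_flag"], np.nan,
                   df["long_entry_flag"] | df["long_exit_flag"])
prev = df["d"].shift()
df["new_signal"] = np.where(df["d"] == 0,
                            np.select([prev == 0, prev == 1], [1.0, 1.0], np.nan),
                            df["d"])

# Rows where one result is empty and the other is not.
mismatch = df["loop_signal"].isna() != df["new_signal"].isna()
print(df[mismatch])
```

On this input the second idle row is the one that differs: the loop leaves it empty, while the one-row look-back sees a 0 in the previous row and fills in 1. The posted data never has two idle rows in a row while outside a position, so the issue does not show up there.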

edited 2 hours ago

answered 3 hours ago

Dark

20.9k31946

1

Nice solution. I tried to numpy-fy it; check my edited answer for the new timings.
– jezrael
2 hours ago

I already upvoted sir. There might be a case which this solution might still not cover. Still curious.
– Dark
2 hours ago

I know it and already upvote too. Good luck!
– jezrael
2 hours ago


up vote
0
down vote

# Let long_exit_flag_r be 0 (False) when the exit flag is True
df['long_exit_flag_r'] = ~df['long_exit_flag']
df['temp'] = 0.0

# Assumes the default 0..n-1 index
for i in range(1, len(df.index)):
    # Yesterday's signal plus today's entry flag, zeroed out on exit days
    df.loc[i, 'temp'] = (df.signal[i-1] + df.long_entry_flag[i]) * df.long_exit_flag_r[i]

If temp is positive, the signal should be 1; otherwise the signal should be empty. (I kind of got stuck here.)
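
One way the idea above might be completed is to carry the previous state forward in a variable instead of reading it back from the signal column. This is only a sketch in plain Python with illustrative names (`prev`, `state`); the flag lists are copied from the table in the question. If a row has both flags True while outside a position, this recurrence stays out, which differs from the accepted answer's loop. That case does not occur in the posted data.

```python
# Flags for days 1..21, copied from the table in the question.
T, F = True, False
entry = [F, F, T, T, F, T, T, F, F, T, T, T, F, F, F, F, T, T, F, F, F]
exit_ = [T, F, F, F, F, F, F, T, T, F, F, F, F, T, F, T, F, F, F, F, T]

state = []
prev = 0  # assumption: not in a position before the first day
for is_entry, is_exit in zip(entry, exit_):
    # (previous state + today's entry) is positive if we were in or just entered;
    # multiplying by "not exit" zeroes it out on exit days.
    temp = (prev + int(is_entry)) * int(not is_exit)
    prev = 1 if temp > 0 else 0
    state.append(prev)

signal = [1 if s else None for s in state]
days_in_position = [day for day, s in enumerate(signal, start=1) if s == 1]
print(days_in_position)
```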

edited 5 hours ago

answered 5 hours ago

ZhouXing98

445

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53721733%2fuse-columns-1-and-2-to-populate-column-3%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

4 Answers
4

active

oldest

votes

4 Answers
4

active

oldest

votes

up vote
5
down vote

accepted

You can do

# Assuming we're starting from the "outside"

inside = False

for ix, row in df.iterrows():

    inside = not row['long_exit_flag'] if inside else row['long_entry_flag']

    df.at[ix, 'signal'] = 1 if inside else np.nan

which is going to give you exactly the output you posted.

answered 5 hours ago

ayorgo

663311

No worries. Good question @Baz
– ayorgo
4 hours ago

I appreciate this @jezrael but another disadvantage of my solution is that it requires a state. If you have an idea on how to make it stateless, preserve the state using apply (currying?) or do it in a vectorized way I'll be the first to upvote.
– ayorgo
4 hours ago

@ayorgo - added solution.
– jezrael
3 hours ago

add a comment |

up vote
5
down vote

accepted

You can do

# Assuming we're starting from the "outside"

inside = False

for ix, row in df.iterrows():

    inside = not row['long_exit_flag'] if inside else row['long_entry_flag']

    df.at[ix, 'signal'] = 1 if inside else np.nan

which is going to give you exactly the output you posted.

answered 5 hours ago

ayorgo

663311

No worries. Good question @Baz
– ayorgo
4 hours ago

I appreciate this @jezrael but another disadvantage of my solution is that it requires a state. If you have an idea on how to make it stateless, preserve the state using apply (currying?) or do it in a vectorized way I'll be the first to upvote.
– ayorgo
4 hours ago

@ayorgo - added solution.
– jezrael
3 hours ago

add a comment |

up vote
5
down vote

accepted

You can do

# Assuming we're starting from the "outside"

inside = False

for ix, row in df.iterrows():

    inside = not row['long_exit_flag'] if inside else row['long_entry_flag']

    df.at[ix, 'signal'] = 1 if inside else np.nan

which is going to give you exactly the output you posted.

answered 5 hours ago

ayorgo

663311

You can do

# Assuming we're starting from the "outside"

inside = False

for ix, row in df.iterrows():

    inside = not row['long_exit_flag'] if inside else row['long_entry_flag']

    df.at[ix, 'signal'] = 1 if inside else np.nan

which is going to give you exactly the output you posted.

answered 5 hours ago

ayorgo

663311

answered 5 hours ago

ayorgo

663311

answered 5 hours ago

ayorgo

663311

answered 5 hours ago

ayorgo

663311

No worries. Good question @Baz
– ayorgo
4 hours ago

I appreciate this @jezrael but another disadvantage of my solution is that it requires a state. If you have an idea on how to make it stateless, preserve the state using apply (currying?) or do it in a vectorized way I'll be the first to upvote.
– ayorgo
4 hours ago

@ayorgo - added solution.
– jezrael
3 hours ago

add a comment |

No worries. Good question @Baz
– ayorgo
4 hours ago

I appreciate this @jezrael but another disadvantage of my solution is that it requires a state. If you have an idea on how to make it stateless, preserve the state using apply (currying?) or do it in a vectorized way I'll be the first to upvote.
– ayorgo
4 hours ago

@ayorgo - added solution.
– jezrael
3 hours ago

No worries. Good question @Baz
– ayorgo
4 hours ago

I appreciate this @jezrael but another disadvantage of my solution is that it requires a state. If you have an idea on how to make it stateless, preserve the state using apply (currying?) or do it in a vectorized way I'll be the first to upvote.
– ayorgo
4 hours ago

@ayorgo - added solution.
– jezrael
3 hours ago

add a comment |

up vote
3
down vote

For improve performance use Numba solution:

arr = df[['long_exit_flag','long_entry_flag']].values



@jit

def f(A):

    inside = False

    out = np.ones(len(A), dtype=float)

    for i in range(len(arr)):

        inside = not A[i, 0] if inside else A[i, 1] 

        if not inside:

            out[i] = np.nan

    return out



df['signal'] = f(arr)

Performance:

#[21000 rows x 5 columns]

df = pd.concat([df] * 1000, ignore_index=True)



In [189]: %%timeit

     ...: inside = False

     ...: for ix, row in df.iterrows():

     ...:     inside = not row['long_exit_flag'] if inside else row['long_entry_flag']

     ...:     df.at[ix, 'signal'] = 1 if inside else np.nan

     ...: 

1.58 s ± 9.45 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)



In [190]: %%timeit

     ...: arr = df[['long_exit_flag','long_entry_flag']].values

     ...: 

     ...: @jit

     ...: def f(A):

     ...:     inside = False

     ...:     out = np.ones(len(A), dtype=float)

     ...:     for i in range(len(arr)):

     ...:         inside = not A[i, 0] if inside else A[i, 1] 

     ...:         if not inside:

     ...:             out[i] = np.nan

     ...:     return out

     ...: 

     ...: df['signal'] = f(arr)

     ...: 

171 ms ± 2.86 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)



In [200]: %%timeit

     ...: df['d'] = np.where(~df['long_exit_flag'],df['long_entry_flag'] | df['long_exit_flag'],np.nan)

     ...: 

     ...: df['new_select']= np.where(df['d']==0, np.select([df['d'].shift()==0, df['d'].shift()==1],[1,1], np.nan), df['d'])

     ...: 

2.4 ms ± 561 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

You can also use numpy for shifting, also @Dark code is simplify:

In [222]: %%timeit

     ...: d = np.where(~df['long_exit_flag'].values,  df['long_entry_flag'].values | df['long_exit_flag'].values, np.nan)

     ...: shifted = np.insert(d[:-1], 0, np.nan)

     ...: m = (shifted==0) | (shifted==1)

     ...: df['signal1']= np.select([d!=0, m], [d, 1], np.nan)

     ...: 

590 µs ± 35.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

EDIT:

You can also check Does iterrows have performance issues? for general order of precedence for performance of various operations in pandas.

edited 2 hours ago

answered 3 hours ago

jezrael

315k21253331

1

Ah, alright. I forgot one can simply iterate over index. Always looking for the neatest thing. Thanks.
– ayorgo
3 hours ago

@jezrael do check the timings against my approach : )
– Dark
3 hours ago

add a comment |

up vote
3
down vote

For improve performance use Numba solution:

arr = df[['long_exit_flag','long_entry_flag']].values



@jit

def f(A):

    inside = False

    out = np.ones(len(A), dtype=float)

    for i in range(len(arr)):

        inside = not A[i, 0] if inside else A[i, 1] 

        if not inside:

            out[i] = np.nan

    return out



df['signal'] = f(arr)

Performance:

#[21000 rows x 5 columns]

df = pd.concat([df] * 1000, ignore_index=True)



In [189]: %%timeit

     ...: inside = False

     ...: for ix, row in df.iterrows():

     ...:     inside = not row['long_exit_flag'] if inside else row['long_entry_flag']

     ...:     df.at[ix, 'signal'] = 1 if inside else np.nan

     ...: 

1.58 s ± 9.45 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)



In [190]: %%timeit

     ...: arr = df[['long_exit_flag','long_entry_flag']].values

     ...: 

     ...: @jit

     ...: def f(A):

     ...:     inside = False

     ...:     out = np.ones(len(A), dtype=float)

     ...:     for i in range(len(arr)):

     ...:         inside = not A[i, 0] if inside else A[i, 1] 

     ...:         if not inside:

     ...:             out[i] = np.nan

     ...:     return out

     ...: 

     ...: df['signal'] = f(arr)

     ...: 

171 ms ± 2.86 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)



In [200]: %%timeit

     ...: df['d'] = np.where(~df['long_exit_flag'],df['long_entry_flag'] | df['long_exit_flag'],np.nan)

     ...: 

     ...: df['new_select']= np.where(df['d']==0, np.select([df['d'].shift()==0, df['d'].shift()==1],[1,1], np.nan), df['d'])

     ...: 

2.4 ms ± 561 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

You can also use numpy for shifting, also @Dark code is simplify:

In [222]: %%timeit

     ...: d = np.where(~df['long_exit_flag'].values,  df['long_entry_flag'].values | df['long_exit_flag'].values, np.nan)

     ...: shifted = np.insert(d[:-1], 0, np.nan)

     ...: m = (shifted==0) | (shifted==1)

     ...: df['signal1']= np.select([d!=0, m], [d, 1], np.nan)

     ...: 

590 µs ± 35.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

EDIT:

You can also check Does iterrows have performance issues? for general order of precedence for performance of various operations in pandas.

edited 2 hours ago

answered 3 hours ago

jezrael

315k21253331

1

Ah, alright. I forgot one can simply iterate over index. Always looking for the neatest thing. Thanks.
– ayorgo
3 hours ago

@jezrael do check the timings against my approach : )
– Dark
3 hours ago

add a comment |

up vote
3
down vote

For improve performance use Numba solution:

arr = df[['long_exit_flag','long_entry_flag']].values



@jit

def f(A):

    inside = False

    out = np.ones(len(A), dtype=float)

    for i in range(len(arr)):

        inside = not A[i, 0] if inside else A[i, 1] 

        if not inside:

            out[i] = np.nan

    return out



df['signal'] = f(arr)

Performance:

#[21000 rows x 5 columns]

df = pd.concat([df] * 1000, ignore_index=True)



In [189]: %%timeit

     ...: inside = False

     ...: for ix, row in df.iterrows():

     ...:     inside = not row['long_exit_flag'] if inside else row['long_entry_flag']

     ...:     df.at[ix, 'signal'] = 1 if inside else np.nan

     ...: 

1.58 s ± 9.45 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)



In [190]: %%timeit

     ...: arr = df[['long_exit_flag','long_entry_flag']].values

     ...: 

     ...: @jit

     ...: def f(A):

     ...:     inside = False

     ...:     out = np.ones(len(A), dtype=float)

     ...:     for i in range(len(arr)):

     ...:         inside = not A[i, 0] if inside else A[i, 1] 

     ...:         if not inside:

     ...:             out[i] = np.nan

     ...:     return out

     ...: 

     ...: df['signal'] = f(arr)

     ...: 

171 ms ± 2.86 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)



In [200]: %%timeit

     ...: df['d'] = np.where(~df['long_exit_flag'],df['long_entry_flag'] | df['long_exit_flag'],np.nan)

     ...: 

     ...: df['new_select']= np.where(df['d']==0, np.select([df['d'].shift()==0, df['d'].shift()==1],[1,1], np.nan), df['d'])

     ...: 

2.4 ms ± 561 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

You can also use numpy for shifting, also @Dark code is simplify:

In [222]: %%timeit

     ...: d = np.where(~df['long_exit_flag'].values,  df['long_entry_flag'].values | df['long_exit_flag'].values, np.nan)

     ...: shifted = np.insert(d[:-1], 0, np.nan)

     ...: m = (shifted==0) | (shifted==1)

     ...: df['signal1']= np.select([d!=0, m], [d, 1], np.nan)

     ...: 

590 µs ± 35.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

EDIT:

You can also check Does iterrows have performance issues? for general order of precedence for performance of various operations in pandas.

edited 2 hours ago

answered 3 hours ago

jezrael

315k21253331

For improve performance use Numba solution:

arr = df[['long_exit_flag','long_entry_flag']].values



@jit

def f(A):

    inside = False

    out = np.ones(len(A), dtype=float)

    for i in range(len(arr)):

        inside = not A[i, 0] if inside else A[i, 1] 

        if not inside:

            out[i] = np.nan

    return out



df['signal'] = f(arr)

Performance:

#[21000 rows x 5 columns]

df = pd.concat([df] * 1000, ignore_index=True)



In [189]: %%timeit

     ...: inside = False

     ...: for ix, row in df.iterrows():

     ...:     inside = not row['long_exit_flag'] if inside else row['long_entry_flag']

     ...:     df.at[ix, 'signal'] = 1 if inside else np.nan

     ...: 

1.58 s ± 9.45 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)



In [190]: %%timeit

     ...: arr = df[['long_exit_flag','long_entry_flag']].values

     ...: 

     ...: @jit

     ...: def f(A):

     ...:     inside = False

     ...:     out = np.ones(len(A), dtype=float)

     ...:     for i in range(len(arr)):

     ...:         inside = not A[i, 0] if inside else A[i, 1] 

     ...:         if not inside:

     ...:             out[i] = np.nan

     ...:     return out

     ...: 

     ...: df['signal'] = f(arr)

     ...: 

171 ms ± 2.86 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)



In [200]: %%timeit

     ...: df['d'] = np.where(~df['long_exit_flag'],df['long_entry_flag'] | df['long_exit_flag'],np.nan)

     ...: 

     ...: df['new_select']= np.where(df['d']==0, np.select([df['d'].shift()==0, df['d'].shift()==1],[1,1], np.nan), df['d'])

     ...: 

2.4 ms ± 561 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

You can also use numpy for shifting, also @Dark code is simplify:

In [222]: %%timeit

     ...: d = np.where(~df['long_exit_flag'].values,  df['long_entry_flag'].values | df['long_exit_flag'].values, np.nan)

     ...: shifted = np.insert(d[:-1], 0, np.nan)

     ...: m = (shifted==0) | (shifted==1)

     ...: df['signal1']= np.select([d!=0, m], [d, 1], np.nan)

     ...: 

590 µs ± 35.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

EDIT:

You can also check Does iterrows have performance issues? for general order of precedence for performance of various operations in pandas.

edited 2 hours ago

answered 3 hours ago

jezrael

315k21253331

edited 2 hours ago

answered 3 hours ago

jezrael

315k21253331

answered 3 hours ago

jezrael

315k21253331

answered 3 hours ago

jezrael

315k21253331

1

Ah, alright. I forgot one can simply iterate over index. Always looking for the neatest thing. Thanks.
– ayorgo
3 hours ago

@jezrael do check the timings against my approach : )
– Dark
3 hours ago

add a comment |

1

Ah, alright. I forgot one can simply iterate over index. Always looking for the neatest thing. Thanks.
– ayorgo
3 hours ago

@jezrael do check the timings against my approach : )
– Dark
3 hours ago

Ah, alright. I forgot one can simply iterate over index. Always looking for the neatest thing. Thanks.
– ayorgo
3 hours ago

@jezrael do check the timings against my approach : )
– Dark
3 hours ago

add a comment |

up vote
1
down vote

Here's an approach with complete boolean operations which is a vectorized approach and will be fast.

Step 1 :
If long_exit_flag is True return Np.nan else apply or between long_entry_flag and long_exit_flag

df['d'] = np.where(df['long_exit_flag'], np.nan, df['long_entry_flag'] | df['long_exit_flag'])

Step 2 : Now its the state where the both the columns are false. We need to ignore it and replace the values with the previous state. Which can be done using where and select

df['new_signal']= np.where(df['d']==0, 

                  np.select([df['d'].shift()==0, df['d'].shift()==1],[1,1], np.nan),

                  df['d'])



    Days  long_entry_flag  long_exit_flag  signal    d  new_signal
0      1            False            True     NaN  NaN         NaN
1      2            False           False     NaN  0.0         NaN
2      3             True           False     1.0  1.0         1.0
3      4             True           False     1.0  1.0         1.0
4      5            False           False     1.0  0.0         1.0
5      6             True           False     1.0  1.0         1.0
6      7             True           False     1.0  1.0         1.0
7      8            False            True     NaN  NaN         NaN
8      9            False            True     NaN  NaN         NaN
9     10             True           False     1.0  1.0         1.0
10    11             True           False     1.0  1.0         1.0
11    12             True           False     1.0  1.0         1.0
12    13            False           False     1.0  0.0         1.0
13    14            False            True     NaN  NaN         NaN
14    15            False           False     NaN  0.0         NaN
15    16            False            True     NaN  NaN         NaN
16    17             True           False     1.0  1.0         1.0
17    18             True           False     1.0  1.0         1.0
18    19            False           False     1.0  0.0         1.0
19    20            False           False     1.0  0.0         1.0
20    21            False            True     NaN  NaN         NaN
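
Step 2 looks back only one row through shift(), so a run of two or more both-False rows directly after an exit would get a 1 from the second row onward. As a rough sketch of one way around that (not part of the answer above), the last explicit flag can be forward-filled instead. The function name forward_fill_signal is hypothetical, and the sketch assumes an exit wins when both flags are True on the same row, as in Step 1.

```python
import numpy as np
import pandas as pd


def forward_fill_signal(entry: pd.Series, exit_: pd.Series) -> pd.Series:
    """Return 1.0 while a position is open and NaN otherwise (hypothetical helper)."""
    # 0.0 on exit rows (checked first, so exit wins on ties),
    # 1.0 on entry rows, NaN where neither flag is set.
    state = np.select(
        [exit_.to_numpy(dtype=bool), entry.to_numpy(dtype=bool)],
        [0.0, 1.0],
        default=np.nan,
    )
    # Rows with no flag inherit the most recent explicit state.
    state = pd.Series(state, index=entry.index).ffill()
    # Keep 1.0 where a position is open; everything else becomes NaN.
    return state.where(state == 1.0)
```

With the question's column names this could be used as df['signal'] = forward_fill_signal(df['long_entry_flag'], df['long_exit_flag']).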

edited 2 hours ago

answered 3 hours ago

Dark

20.9k31946

1

Nice solution. I tried to numpy-fy it; check my edited answer with new timings.
– jezrael
2 hours ago

I already upvoted, sir. There might be a case that this solution still does not cover. Still curious.
– Dark
2 hours ago

I know, and I already upvoted too. Good luck!
– jezrael
2 hours ago


up vote
0
down vote

# Invert the exit flag so that it is False (0) on rows where the exit is True
df['long_exit_flag_r'] = ~df['long_exit_flag']
df['temp'] = 0

for i in range(len(df.index)):
    # Carry the previous row's state, add today's entry, and reset to 0 on an exit
    prev = df['temp'].iloc[i - 1] if i > 0 else 0
    df.loc[df.index[i], 'temp'] = (prev + df['long_entry_flag'].iloc[i]) * df['long_exit_flag_r'].iloc[i]

If temp is positive then the signal should be 1; if temp is 0 then the signal should be empty. (I kinda get stuck here)
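
The final mapping described above (positive temp gives 1, anything else stays empty) could be sketched roughly as follows. This is only one possible completion, assuming the question's column names; the function name signal_from_loop is hypothetical, and the running value is kept in a plain NumPy array rather than a DataFrame column.

```python
import numpy as np
import pandas as pd


def signal_from_loop(df: pd.DataFrame) -> pd.Series:
    """Row-by-row recurrence, then map temp > 0 to 1.0 (hypothetical helper)."""
    entry = df["long_entry_flag"].to_numpy(dtype=bool)
    # Inverted exit flag: False (0) on rows where an exit fires.
    keep = ~df["long_exit_flag"].to_numpy(dtype=bool)

    temp = np.zeros(len(df), dtype=np.int64)
    prev = 0
    for i in range(len(df)):
        # Carry the previous state, add today's entry, reset to 0 on an exit.
        prev = (prev + int(entry[i])) * int(keep[i])
        temp[i] = prev

    # Positive temp means a position is open; otherwise leave the cell empty (NaN).
    return pd.Series(np.where(temp > 0, 1.0, np.nan), index=df.index, name="signal")
```

The loop is still O(n) Python work, so on tens of thousands of rows a vectorized approach is likely to be faster; the sketch only shows how the stuck step might be finished.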

edited 5 hours ago

answered 5 hours ago

ZhouXing98

445

