Use columns 1 and 2 to populate column 3
7 votes
I’m a Python newbie and have the following pandas dataframe - I’m trying to write code that populates the ‘signal’ column as it is below:
    Days  long_entry_flag  long_exit_flag  signal
    1     FALSE            TRUE
    2     FALSE            FALSE
    3     TRUE             FALSE           1
    4     TRUE             FALSE           1
    5     FALSE            FALSE           1
    6     TRUE             FALSE           1
    7     TRUE             FALSE           1
    8     FALSE            TRUE
    9     FALSE            TRUE
    10    TRUE             FALSE           1
    11    TRUE             FALSE           1
    12    TRUE             FALSE           1
    13    FALSE            FALSE           1
    14    FALSE            TRUE
    15    FALSE            FALSE
    16    FALSE            TRUE
    17    TRUE             FALSE           1
    18    TRUE             FALSE           1
    19    FALSE            FALSE           1
    20    FALSE            FALSE           1
    21    FALSE            TRUE
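For anyone who wants to reproduce the table, here is a minimal sketch that rebuilds it as a DataFrame. The helper names (entry_days, exit_days, signal_days) are made up for illustration, and representing the blank signal cells as NaN is an assumption.

```python
import numpy as np
import pandas as pd

# Days on which each flag is TRUE in the table above.
entry_days = {3, 4, 6, 7, 10, 11, 12, 17, 18}
exit_days = {1, 8, 9, 14, 16, 21}
# Days on which the desired 'signal' column holds 1.
signal_days = {3, 4, 5, 6, 7, 10, 11, 12, 13, 17, 18, 19, 20}

days = range(1, 22)
df = pd.DataFrame({
    "Days": list(days),
    "long_entry_flag": [d in entry_days for d in days],
    "long_exit_flag": [d in exit_days for d in days],
    # Blank cells in the table are represented as NaN (an assumption).
    "signal": [1.0 if d in signal_days else np.nan for d in days],
})
print(df)
```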
My pseudo-code version would take the following steps:
- Look down the ['long_entry_flag'] column until the entry condition is True (day 3 initially).
- Then enter '1' into the ['signal'] column every day until the exit condition ['long_exit_flag'] == True is met, on day 8.
- Then look back to the ['long_entry_flag'] column and wait for the next entry condition (occurs on day 10).
- And again enter '1' into the ['signal'] column every day until the exit condition is True (day 14).
- etc.
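The steps above amount to a two-state machine (in a position or not). A minimal sketch in plain Python, without pandas, assuming the two flag columns are available as lists of booleans; the function name fill_signal is hypothetical.

```python
def fill_signal(entry_flags, exit_flags):
    """Return 1 for every day spent in a position and None otherwise."""
    signal = []
    in_position = False  # assume we start with no open position
    for entry, exit_ in zip(entry_flags, exit_flags):
        if in_position:
            # Stay in until the exit condition fires; the exit day itself is blank.
            in_position = not exit_
        else:
            # Wait for the next entry condition; the entry day itself gets a 1.
            in_position = bool(entry)
        signal.append(1 if in_position else None)
    return signal


# Days 1-8 of the table: entry on day 3, exit on day 8.
entry = [False, False, True, True, False, True, True, False]
exit_ = [True, False, False, False, False, False, False, True]
print(fill_signal(entry, exit_))
```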
I'd welcome ideas on how to populate the 'signal' column quickly (using vectorisation, if possible?). This is a subset of a large dataframe with tens of thousands of rows, and it is one of many dataframes being analysed in sequence.
Many thanks in advance!
Tags: python, pandas
asked 6 hours ago by Baz (edited 5 hours ago)
I'm sorry, I'm trying to delete this question to get the formatting right... – Baz, 6 hours ago
If you still think you want to edit your question, you can delete it and raise a new question. – Mohamed Thasin ah, 6 hours ago
Oh, perfect editing, thank you Mohamed Thasin ah! – Baz, 6 hours ago
4 Answers
Accepted answer (5 votes)
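You can do this in a single pass that tracks whether you are currently inside a position:

    import numpy as np

    # Assuming we're starting from the "outside"
    inside = False
    for ix, row in df.iterrows():
        # Inside: stay in until the exit flag fires. Outside: wait for an entry flag.
        inside = not row['long_exit_flag'] if inside else row['long_entry_flag']
        df.at[ix, 'signal'] = 1 if inside else np.nan

which is going to give you exactly the output you posted.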
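iterrows builds a Series for every row, which is where much of the time goes. As a sketch, the same state machine can run over plain NumPy arrays and write the column once at the end. The helper name signal_from_flags is hypothetical and the five-row frame is made-up demo data.

```python
import numpy as np
import pandas as pd


def signal_from_flags(df):
    """Same logic as the iterrows loop, but iterating over NumPy arrays."""
    entry = df["long_entry_flag"].to_numpy()
    exit_ = df["long_exit_flag"].to_numpy()
    out = np.full(len(df), np.nan)
    inside = False  # assume we start "outside"
    for i in range(len(df)):
        inside = (not exit_[i]) if inside else bool(entry[i])
        if inside:
            out[i] = 1.0
    return out


# Made-up demo data, not the question's table.
demo = pd.DataFrame({
    "long_entry_flag": [False, True, False, False, True],
    "long_exit_flag": [False, False, False, True, False],
})
demo["signal"] = signal_from_flags(demo)
print(demo)
```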
answered 5 hours ago by ayorgo
No worries. Good question @Baz – ayorgo, 4 hours ago
I appreciate this @jezrael, but another disadvantage of my solution is that it requires state. If you have an idea of how to make it stateless, preserve the state using apply (currying?), or do it in a vectorized way, I'll be the first to upvote. – ayorgo, 4 hours ago
@ayorgo - added solution. – jezrael, 3 hours ago
Answer (3 votes)
To improve performance, use a Numba solution:

    import numpy as np
    from numba import jit

    arr = df[['long_exit_flag', 'long_entry_flag']].values

    @jit
    def f(A):
        # A[:, 0] holds the exit flags, A[:, 1] the entry flags
        inside = False
        out = np.ones(len(A), dtype=float)
        for i in range(len(A)):  # loop over the argument, not the global arr
            inside = not A[i, 0] if inside else A[i, 1]
            if not inside:
                out[i] = np.nan
        return out

    df['signal'] = f(arr)
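A self-contained variant of the same idea, as a sketch: the function takes everything it needs from its argument, spells the state update out as explicit branches, and falls back to plain Python when Numba is not installed. The fallback decorator, the name fill_signal and the demo frame are assumptions added for illustration.

```python
import numpy as np
import pandas as pd

try:
    from numba import njit
except ImportError:  # assumption: run uncompiled if Numba is unavailable
    def njit(func):
        return func


@njit
def fill_signal(A):
    """A[:, 0] holds the exit flags, A[:, 1] the entry flags."""
    out = np.full(A.shape[0], np.nan)
    inside = False
    for i in range(A.shape[0]):
        if inside:
            if A[i, 0]:
                inside = False
        elif A[i, 1]:
            inside = True
        if inside:
            out[i] = 1.0
    return out


# Made-up demo data, not the question's table.
demo = pd.DataFrame({
    "long_entry_flag": [False, True, False, False, True],
    "long_exit_flag": [True, False, False, True, False],
})
arr = demo[["long_exit_flag", "long_entry_flag"]].to_numpy()
demo["signal"] = fill_signal(arr)
print(demo)
```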
Performance:

    # [21000 rows x 5 columns]
    df = pd.concat([df] * 1000, ignore_index=True)

    In [189]: %%timeit
         ...: inside = False
         ...: for ix, row in df.iterrows():
         ...:     inside = not row['long_exit_flag'] if inside else row['long_entry_flag']
         ...:     df.at[ix, 'signal'] = 1 if inside else np.nan
         ...:
    1.58 s ± 9.45 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

    In [190]: %%timeit
         ...: arr = df[['long_exit_flag','long_entry_flag']].values
         ...:
         ...: @jit
         ...: def f(A):
         ...:     inside = False
         ...:     out = np.ones(len(A), dtype=float)
         ...:     for i in range(len(arr)):
         ...:         inside = not A[i, 0] if inside else A[i, 1]
         ...:         if not inside:
         ...:             out[i] = np.nan
         ...:     return out
         ...:
         ...: df['signal'] = f(arr)
         ...:
    171 ms ± 2.86 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

    In [200]: %%timeit
         ...: df['d'] = np.where(~df['long_exit_flag'],df['long_entry_flag'] | df['long_exit_flag'],np.nan)
         ...:
         ...: df['new_select']= np.where(df['d']==0, np.select([df['d'].shift()==0, df['d'].shift()==1],[1,1], np.nan), df['d'])
         ...:
    2.4 ms ± 561 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
You can also use NumPy for the shift, which simplifies @Dark's code as well:

    In [222]: %%timeit
         ...: d = np.where(~df['long_exit_flag'].values, df['long_entry_flag'].values | df['long_exit_flag'].values, np.nan)
         ...: shifted = np.insert(d[:-1], 0, np.nan)
         ...: m = (shifted==0) | (shifted==1)
         ...: df['signal1']= np.select([d!=0, m], [d, 1], np.nan)
         ...:
    590 µs ± 35.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
EDIT:
You can also check "Does iterrows have performance issues?" for the general order of precedence for performance of various operations in pandas.
answered 3 hours ago by jezrael (edited 2 hours ago)
Ah, alright. I forgot one can simply iterate over index. Always looking for the neatest thing. Thanks. – ayorgo, 3 hours ago
@jezrael do check the timings against my approach : ) – Dark, 3 hours ago
Answer (1 vote)
Here's a fully vectorized approach that uses only boolean operations, so it should be fast.

Step 1: If long_exit_flag is True, return np.nan; otherwise apply "or" between long_entry_flag and long_exit_flag:

    df['d'] = np.where(df['long_exit_flag'], np.nan, df['long_entry_flag'] | df['long_exit_flag'])

Step 2: Now consider the state where both columns are False. We need to ignore it and replace the value with the previous state, which can be done using np.where and np.select:

    df['new_signal'] = np.where(df['d'] == 0,
                                np.select([df['d'].shift() == 0, df['d'].shift() == 1], [1, 1], np.nan),
                                df['d'])

        Days  long_entry_flag  long_exit_flag  signal    d  new_signal
     0     1            False            True     NaN  NaN         NaN
     1     2            False           False     NaN  0.0         NaN
     2     3             True           False     1.0  1.0         1.0
     3     4             True           False     1.0  1.0         1.0
     4     5            False           False     1.0  0.0         1.0
     5     6             True           False     1.0  1.0         1.0
     6     7             True           False     1.0  1.0         1.0
     7     8            False            True     NaN  NaN         NaN
     8     9            False            True     NaN  NaN         NaN
     9    10             True           False     1.0  1.0         1.0
    10    11             True           False     1.0  1.0         1.0
    11    12             True           False     1.0  1.0         1.0
    12    13            False           False     1.0  0.0         1.0
    13    14            False            True     NaN  NaN         NaN
    14    15            False           False     NaN  0.0         NaN
    15    16            False            True     NaN  NaN         NaN
    16    17             True           False     1.0  1.0         1.0
    17    18             True           False     1.0  1.0         1.0
    18    19            False           False     1.0  0.0         1.0
    19    20            False           False     1.0  0.0         1.0
    20    21            False            True     NaN  NaN         NaN
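Because shift() only looks one row back, a longer stretch of rows with both flags False can pick up the wrong state (two such rows at the very start would give NaN, then 1). As an alternative sketch, and not the method of this answer, the state can be carried forward with ffill. It assumes the two flags are never both True on the same row; the demo frame is made up.

```python
import numpy as np
import pandas as pd

# Made-up demo data: two "both False" rows before the first entry.
demo = pd.DataFrame({
    "long_entry_flag": [False, False, True, False, False, False, False],
    "long_exit_flag": [False, False, False, False, True, False, False],
})

# Mark the rows where the state is known: 1.0 on entry, 0.0 on exit, NaN elsewhere.
state = pd.Series(np.nan, index=demo.index)
state[demo["long_entry_flag"]] = 1.0
state[demo["long_exit_flag"]] = 0.0

# Carry the last known state forward; rows before the first event count as "outside".
state = state.ffill().fillna(0.0)

# 1 while in a position, NaN otherwise.
demo["signal"] = state.where(state == 1.0)
print(demo)
```

Everything here is a whole-column operation, so there is no Python-level loop over rows.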
answered 3 hours ago by Dark (edited 2 hours ago)
Nice solution. I tried to NumPy-fy it; check my edited answer for the new timings. – jezrael, 2 hours ago
I already upvoted, sir. There might be a case that this solution still does not cover. Still curious. – Dark, 2 hours ago
I know, and I already upvoted too. Good luck! – jezrael, 2 hours ago
Answer (0 votes)
    # Invert the exit flag so that it is 0 (False) on rows where the exit is TRUE
    df['long_exit_flag_r'] = ~df['long_exit_flag']
    # Running value; row 0 is assumed to start with no open position
    df['temp'] = 0
    for i in range(1, len(df.index)):
        df.loc[i, 'temp'] = (df.loc[i - 1, 'temp'] + df.loc[i, 'long_entry_flag']) * df.loc[i, 'long_exit_flag_r']

If temp is positive, the signal should be 1; otherwise the signal should be empty. (I kind of get stuck here.)
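One possible way to finish this idea, as a sketch: carry the running value in temp itself (rather than reading the not-yet-filled signal column), then turn positive values into 1. It assumes that the first row starts with no open position; the names keep and demo are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up demo data, not the question's table.
demo = pd.DataFrame({
    "long_entry_flag": [False, True, True, False, False, True],
    "long_exit_flag": [True, False, False, False, True, False],
})

entry = demo["long_entry_flag"].to_numpy().astype(int)
keep = (~demo["long_exit_flag"]).to_numpy().astype(int)  # 0 on exit rows, 1 otherwise

temp = np.zeros(len(demo), dtype=int)
for i in range(1, len(demo)):
    # An exit row multiplies by 0 and resets the running count.
    temp[i] = (temp[i - 1] + entry[i]) * keep[i]

# Positive temp means we are in a position; everything else stays empty.
demo["signal"] = np.where(temp > 0, 1.0, np.nan)
print(demo)
```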
answered 5 hours ago by ZhouXing98 (edited 5 hours ago)