Can't drop null values in Python
I have a dataset in which null values are represented by the symbol -.
At first I thought this wasn't a problem, so to drop these rows I did:
df_c = df[df != '-']
But it didn't actually drop the rows; instead, it put a NaN in place of each -.
Then I did:
df_c = df_c[df_c.notnull()]
But it doesn't work, and it gives me back - again.
What am I doing wrong?
python python-3.x nan missing-data
Can you add the language in your labels? Python?
– vahdet
Nov 23 '18 at 12:00
Sorry I totally forgot, yes it's Python
– Zhang_anlan
Nov 23 '18 at 12:25
try del df[df != '-']; alternatively, can you give an example of the df data structure?
– Tomos Williams
Nov 23 '18 at 12:30
In the 2nd operation you are using df instead of df_c. Use df_c.dropna()
– Sociopath
Nov 23 '18 at 12:44
@Sociopath I know but even doing df_c it doesn't work
– Zhang_anlan
Nov 23 '18 at 12:57
edited Nov 23 '18 at 12:56
asked Nov 23 '18 at 11:08 by Zhang_anlan
1 Answer
mask + dropna
You can mask the '-' values with a Boolean DataFrame, then use dropna:
import pandas as pd

df = pd.DataFrame({'A': [1, '-', '-', 4, '-'],
                   'B': ['A', 'B', '-', 'C', '-'],
                   'C': [0.5, '-', '-', 1.5, 2.5]})

# Replace every '-' with NaN, then drop rows that contain any NaN
df = df.mask(df == '-').dropna()
print(df)
   A  B    C
0  1  A  0.5
3  4  C  1.5
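If the data is loaded with pd.read_csv (an assumption; the question doesn't say where the dataset comes from), the '-' placeholders can instead be turned into NaN at load time through the na_values parameter, after which a plain dropna() is enough. A minimal sketch, using a hypothetical in-memory CSV that mirrors the example above:

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the real dataset; '-' marks a missing value
raw = io.StringIO(
    "A,B,C\n"
    "1,A,0.5\n"
    "-,B,-\n"
    "-,-,-\n"
    "4,C,1.5\n"
    "-,-,2.5\n"
)

# na_values='-' makes read_csv parse every '-' as NaN while loading
df = pd.read_csv(raw, na_values='-')

# Drop rows that contain any NaN (keeps rows 0 and 3)
df = df.dropna()
print(df)
```

A side effect of this approach is that columns A and C are parsed as numeric (float) columns rather than object columns holding a mix of numbers and strings.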
By default, dropna drops rows (axis=0) where any value is null (how='any'). You can amend these parameters as appropriate.
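To see how those parameters change the result, here is a small sketch on the same example frame after masking; the variable names are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, '-', '-', 4, '-'],
                   'B': ['A', 'B', '-', 'C', '-'],
                   'C': [0.5, '-', '-', 1.5, 2.5]})
masked = df.mask(df == '-')  # every '-' becomes NaN

# Default (axis=0, how='any'): drop rows containing any NaN
any_rows = masked.dropna()

# how='all': drop only rows in which every value is NaN (row 2 here)
all_rows = masked.dropna(how='all')

# subset: only look for NaN in column 'C' (rows 1 and 2 here)
c_rows = masked.dropna(subset=['C'])

# axis=1: drop columns containing any NaN (every column has one here)
cols = masked.dropna(axis=1)

print(list(any_rows.index), list(all_rows.index),
      list(c_rows.index), list(cols.columns))
# [0, 3] [0, 1, 3, 4] [0, 3, 4] []
```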
Note: df.mask(df == '-').dropna() is functionally identical to df = df[df != '-'].dropna(), though the intent of mask may read more clearly.
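The equivalence can be checked directly on the example data. Indexing a DataFrame with a Boolean DataFrame is handled like where(), and mask(cond) is the inverse of where(cond), so both expressions should produce the same frame:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, '-', '-', 4, '-'],
                   'B': ['A', 'B', '-', 'C', '-'],
                   'C': [0.5, '-', '-', 1.5, 2.5]})

# Boolean-DataFrame indexing: keep values where the condition is True, NaN elsewhere
via_indexing = df[df != '-'].dropna()

# mask: put NaN where the condition is True
via_mask = df.mask(df == '-').dropna()

print(via_indexing.equals(via_mask))  # True
```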
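The problem with your solution is that df_c.notnull() returns a Boolean DataFrame, and indexing a DataFrame with a Boolean DataFrame behaves like df_c.where(...): it keeps the original shape and only replaces False positions with NaN, so no rows are removed. To filter rows, you need to index with a 1-dimensional Boolean array / Series holding one value per row. You could use: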
df_c = df[df != '-']
df_c = df_c[df_c.notnull().all(axis=1)]
But this is verbose and likely inefficient.
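The difference between the two kinds of key is easy to see on a hypothetical three-row frame: the Boolean DataFrame key returns a frame of the same shape, while the per-row Series produced by .all(axis=1) actually filters rows.

```python
import pandas as pd

# Hypothetical small frame using '-' as the missing-value marker
df = pd.DataFrame({'A': [1, '-', 4], 'B': ['x', 'y', '-']})
df_c = df[df != '-']  # '-' -> NaN, shape unchanged

# Boolean DataFrame key: acts like where(), so nothing is dropped
same_shape = df_c[df_c.notnull()]
print(same_shape.shape)  # (3, 2)

# Boolean Series key (one entry per row): rows with any NaN are filtered out
row_ok = df_c.notnull().all(axis=1)
print(row_ok.tolist())  # [True, False, False]
print(df_c[row_ok].shape)  # (1, 2)
```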
Thank you, now it works. But what was I doing wrong? Because even df.dropna() didn't work.
– Zhang_anlan
Nov 23 '18 at 13:02
@Zhang_anlan, I've updated with what you were doing wrong with your initial attempt. I can't replicate "dropna didn't work".
– jpp
Nov 23 '18 at 13:41
edited Nov 23 '18 at 13:47
answered Nov 23 '18 at 12:44 by jpp