Can't drop null values in Python












I have a dataset in which null values are represented by the symbol -.

At first I thought this wouldn't be a problem, so to drop these rows I did:

df_c = df[df != '-']

But this didn't actually drop the rows; it put NaN in place of - instead.

Then I did:

df_c = df_c[df_c.notnull()]

But that doesn't work either, and it gives me back - again.
What am I doing wrong?










  • Can you add the language in your labels? Python?
    – vahdet
    Nov 23 '18 at 12:00












  • Sorry I totally forgot, yes it's Python
    – Zhang_anlan
    Nov 23 '18 at 12:25










  • Try del df[df != '-']. Alternatively, can you give an example of the df data structure?
    – Tomos Williams
    Nov 23 '18 at 12:30












  • In the 2nd operation you are using df instead of df_c. Use df_c.dropna().
    – Sociopath
    Nov 23 '18 at 12:44










  • @Sociopath I know, but even with df_c it doesn't work.
    – Zhang_anlan
    Nov 23 '18 at 12:57
















python python-3.x nan missing-data






edited Nov 23 '18 at 12:56







Zhang_anlan

















asked Nov 23 '18 at 11:08









Zhang_anlan














1 Answer
mask + dropna



You can mask with a Boolean dataframe, then use dropna:



import pandas as pd

df = pd.DataFrame({'A': [1, '-', '-', 4, '-'],
                   'B': ['A', 'B', '-', 'C', '-'],
                   'C': [0.5, '-', '-', 1.5, 2.5]})

# Replace every '-' with NaN, then drop rows containing any NaN
df = df.mask(df == '-').dropna()

print(df)

   A  B    C
0  1  A  0.5
3  4  C  1.5


By default, dropna drops rows (axis=0) where any value is null (how='any'). You can amend these parameters as appropriate.
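As a rough illustration of those parameters, the sketch below rebuilds the example frame from this answer and varies how, subset and axis. The variable names (masked, any_null, all_null, by_c, cols) are made up for the example and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, '-', '-', 4, '-'],
                   'B': ['A', 'B', '-', 'C', '-'],
                   'C': [0.5, '-', '-', 1.5, 2.5]})

masked = df.mask(df == '-')  # every '-' becomes NaN

# Default (how='any'): drop a row if ANY of its values is null
any_null = masked.dropna()

# how='all': drop a row only if ALL of its values are null
all_null = masked.dropna(how='all')

# subset: only look at column 'C' when deciding which rows to drop
by_c = masked.dropna(subset=['C'])

# axis=1: drop columns instead of rows (here every column contains a null)
cols = masked.dropna(axis=1)

print(any_null.index.tolist())  # [0, 3]
print(all_null.index.tolist())  # [0, 1, 3, 4]
print(by_c.index.tolist())      # [0, 3, 4]
print(cols.columns.tolist())    # []
```

Which variant is appropriate depends on whether a single - should disqualify a whole row or only particular columns matter.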



Note: This is functionally identical to df = df[df != '-'].dropna(), although the intent of mask may read more clearly.
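A quick way to convince yourself of that equivalence is to run both forms on the same frame and compare the results. This is only a sanity-check sketch on the example data; via_mask and via_index are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, '-', '-', 4, '-'],
                   'B': ['A', 'B', '-', 'C', '-'],
                   'C': [0.5, '-', '-', 1.5, 2.5]})

# mask: put NaN wherever the condition is True
via_mask = df.mask(df == '-').dropna()

# Boolean-dataframe indexing: put NaN wherever the condition is False
via_index = df[df != '-'].dropna()

# Both keep the same rows with the same values
print(via_mask.index.tolist())   # [0, 3]
print(via_index.index.tolist())  # [0, 3]
print(via_mask.values.tolist() == via_index.values.tolist())
```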





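The problem with your attempt is that df_c.notnull() returns a Boolean dataframe, but to drop rows you need to index with a 1-dimensional Boolean array or series. Indexing with a Boolean dataframe keeps the original shape and only puts NaN where the mask is False. You could use:



df_c = df[df != '-']
df_c = df_c[df_c.notnull().all(axis=1)]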


But this is verbose and likely inefficient.
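To make the difference between the two kinds of indexer concrete, here is a small sketch on the same example data. It assumes the frame from this answer; same_shape, row_mask and filtered are names chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, '-', '-', 4, '-'],
                   'B': ['A', 'B', '-', 'C', '-'],
                   'C': [0.5, '-', '-', 1.5, 2.5]})

df_c = df[df != '-']  # '-' replaced by NaN, shape unchanged

# Boolean DataFrame as indexer: the same shape comes back, no rows are dropped
same_shape = df_c[df_c.notnull()]

# Boolean Series as indexer: one True/False per row, so rows are filtered
row_mask = df_c.notnull().all(axis=1)
filtered = df_c[row_mask]

print(same_shape.shape)    # (5, 3)
print(row_mask.tolist())   # [True, False, False, True, False]
print(filtered.shape)      # (2, 3)
```

The per-row reduction with all(axis=1) is what turns the 2-dimensional mask into something that selects rows.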






  • Thank you, now it works. But what was I doing wrong? Even df.dropna() didn't work.
    – Zhang_anlan
    Nov 23 '18 at 13:02










  • @Zhang_anlan, I've updated with what you were doing wrong with your initial attempt. I can't replicate "dropna didn't work".
    – jpp
    Nov 23 '18 at 13:41













edited Nov 23 '18 at 13:47

























answered Nov 23 '18 at 12:44









jpp











