Dividing columns by particular values using dplyr
I have a dataframe like this:
Setting q02_id c_school c_home c_work c_transport c_leisure Country
Rural 11900006 0 5 3 1 1 Vietnam
Rural 11900031 10 5 0 0 0 China
Rural 11900033 0 3 0 0 3 Vietnam
Rural 11900053 0 7 2 0 0 Vietnam
Rural 11900114 3 6 0 0 0 Malaysia
Rural 11900446 0 6 0 0 0 Vietnam
and I would like to divide columns 2, 3, 4, 5, 6 by the total for that particular country.
Doing it in base R is a bit clumsy:
df[df$Country=="Vietnam",][c(3, 4, 5, 6)] = df[df$Country=="Vietnam",][c(3, 4, 5, 6)] / sum(df[df$Country=="Vietnam",][c(3, 4, 5, 6)])
(I think that works).
I'm trying to convert as much of my code as possible to use tidyverse functions. Is there a way of doing the same thing more efficiently using dplyr, for instance?
Thanks.
r dplyr
Divide by the total (all columns) for the country, or by the per-column total for the country?
– missuse
Mar 21 '18 at 11:11
edited Nov 23 '18 at 7:31
Marcus Campbell
asked Mar 21 '18 at 10:55
sahwahn
2 Answers
I trust this is what you are after:
Divide each column by the sum of that column - grouped by Country
library(tidyverse)
df1 %>%
  group_by(Country) %>%
  mutate_at(vars(c_school:c_leisure), funs(. / sum(.)))
#output
Setting q02_id c_school c_home c_work c_transport c_leisure Country
<fct> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <fct>
1 Rural 11900006 NaN 0.238 0.600 1.00 0.250 Vietnam
2 Rural 11900031 1.00 1.00 NaN NaN NaN China
3 Rural 11900033 NaN 0.143 0 0 0.750 Vietnam
4 Rural 11900053 NaN 0.333 0.400 0 0 Vietnam
5 Rural 11900114 1.00 1.00 NaN NaN NaN Malaysia
6 Rural 11900446 NaN 0.286 0 0 0 Vietnam
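In dplyr 1.0.0 and later, `mutate_at()` is superseded by `across()`, and `funs()` has been deprecated since dplyr 0.8.0, so the same per-column division might be sketched as below. This is only a sketch: it rebuilds the example data inline so that it runs on its own, and the name `per_column` is illustrative. The `NaN` entries appear wherever a country's column total is 0, because 0 / 0 is `NaN` in R.

```r
library(dplyr)

# Rebuild the example data (same values as df1 in this answer)
df1 <- data.frame(
  Setting     = "Rural",
  q02_id      = c(11900006, 11900031, 11900033, 11900053, 11900114, 11900446),
  c_school    = c(0, 10, 0, 0, 3, 0),
  c_home      = c(5, 5, 3, 7, 6, 6),
  c_work      = c(3, 0, 0, 2, 0, 0),
  c_transport = c(1, 0, 0, 0, 0, 0),
  c_leisure   = c(1, 0, 3, 0, 0, 0),
  Country     = c("Vietnam", "China", "Vietnam", "Vietnam", "Malaysia", "Vietnam")
)

per_column <- df1 %>%
  group_by(Country) %>%
  # divide each value by the total of its own column within the country
  mutate(across(c_school:c_leisure, ~ .x / sum(.x))) %>%
  ungroup()

per_column
```

The trailing `ungroup()` is optional; it just returns an ordinary tibble rather than a grouped one.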
Alternatively, divide each column by the total sum for each country, as in your example (the only difference is that I used columns 3:7, as I trust you intended):
df1 %>%
  mutate(sum = rowSums(.[, 3:7])) %>%
  group_by(Country) %>%
  mutate_at(vars(c_school:c_leisure), funs(. / sum(sum))) %>%
  select(-sum)
#output
Setting q02_id c_school c_home c_work c_transport c_leisure Country
<fct> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <fct>
1 Rural 11900006 0 0.161 0.0968 0.0323 0.0323 Vietnam
2 Rural 11900031 0.667 0.333 0 0 0 China
3 Rural 11900033 0 0.0968 0 0 0.0968 Vietnam
4 Rural 11900053 0 0.226 0.0645 0 0 Vietnam
5 Rural 11900114 0.333 0.667 0 0 0 Malaysia
6 Rural 11900446 0 0.194 0 0 0 Vietnam
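With `across()`, the division by each country's grand total could look roughly like the sketch below. The helper names `value_cols` and `row_total` are illustrative and not part of the original answer; using `row_total` rather than `sum` for the temporary column avoids relying on R to tell the column `sum` apart from the function `sum()`. The data is rebuilt inline so that the block is self-contained.

```r
library(dplyr)

# Rebuild the example data (same values as df1 in this answer)
df1 <- data.frame(
  Setting     = "Rural",
  q02_id      = c(11900006, 11900031, 11900033, 11900053, 11900114, 11900446),
  c_school    = c(0, 10, 0, 0, 3, 0),
  c_home      = c(5, 5, 3, 7, 6, 6),
  c_work      = c(3, 0, 0, 2, 0, 0),
  c_transport = c(1, 0, 0, 0, 0, 0),
  c_leisure   = c(1, 0, 3, 0, 0, 0),
  Country     = c("Vietnam", "China", "Vietnam", "Vietnam", "Malaysia", "Vietnam")
)

# Columns to rescale (illustrative helper name)
value_cols <- c("c_school", "c_home", "c_work", "c_transport", "c_leisure")

per_country <- df1 %>%
  # total of the value columns for each row
  mutate(row_total = rowSums(df1[value_cols])) %>%
  group_by(Country) %>%
  # sum(row_total) is the grand total for the current country
  mutate(across(all_of(value_cols), ~ .x / sum(row_total))) %>%
  ungroup() %>%
  select(-row_total)

per_country
```

After this step, the value columns of each country add up to 1 across all of that country's rows.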
data:
df1 = read.table(text ="Setting q02_id c_school c_home c_work c_transport c_leisure Country
Rural 11900006 0 5 3 1 1 Vietnam
Rural 11900031 10 5 0 0 0 China
Rural 11900033 0 3 0 0 3 Vietnam
Rural 11900053 0 7 2 0 0 Vietnam
Rural 11900114 3 6 0 0 0 Malaysia
Rural 11900446 0 6 0 0 0 Vietnam", header = TRUE)
I know you asked for tidyverse functions, but this is also a task where the data.table package shines:
library(data.table)
setDT(df)
df[, lapply(.SD, function(x) x / sum(x)), by = Country, .SDcols = 3:7]
Country c_school c_home c_work c_transport c_leisure
1: Vietnam NaN 0.2380952 0.6 1 0.25
2: Vietnam NaN 0.1428571 0.0 0 0.75
3: Vietnam NaN 0.3333333 0.4 0 0.00
4: Vietnam NaN 0.2857143 0.0 0 0.00
5: China 1 1.0000000 NaN NaN NaN
6: Malaysia 1 1.0000000 NaN NaN NaN
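The call above returns only `Country` and the divided columns, with rows ordered by group. If the remaining columns and the original row order should be kept, one option is to update the columns by reference with `:=`. The sketch below assumes the same example data, rebuilt inline so that it runs on its own; `dt` and `value_cols` are illustrative names. The conversion to double is a precaution, since the value columns are integer when read with `read.table()` and a grouped `:=` assigns into the existing column.

```r
library(data.table)

# Rebuild the example data (same values as in the question)
dt <- data.table(
  Setting     = "Rural",
  q02_id      = c(11900006, 11900031, 11900033, 11900053, 11900114, 11900446),
  c_school    = c(0, 10, 0, 0, 3, 0),
  c_home      = c(5, 5, 3, 7, 6, 6),
  c_work      = c(3, 0, 0, 2, 0, 0),
  c_transport = c(1, 0, 0, 0, 0, 0),
  c_leisure   = c(1, 0, 3, 0, 0, 0),
  Country     = c("Vietnam", "China", "Vietnam", "Vietnam", "Malaysia", "Vietnam")
)

# Columns to rescale (illustrative helper name)
value_cols <- c("c_school", "c_home", "c_work", "c_transport", "c_leisure")

# Make sure the columns are double before storing fractions in them
dt[, (value_cols) := lapply(.SD, as.numeric), .SDcols = value_cols]

# Divide each column by its per-country total, updating dt in place
dt[, (value_cols) := lapply(.SD, function(x) x / sum(x)),
   by = Country, .SDcols = value_cols]

dt
```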
First answer (dplyr): answered Mar 21 '18 at 11:06 by missuse, edited Mar 21 '18 at 11:16
Second answer (data.table): answered Nov 23 '18 at 8:29 by snoram