Dividing columns by particular values using dplyr












0














I have a dataframe like this:



 Setting   q02_id c_school c_home c_work c_transport c_leisure Country
Rural 11900006 0 5 3 1 1 Vietnam
Rural 11900031 10 5 0 0 0 China
Rural 11900033 0 3 0 0 3 Vietnam
Rural 11900053 0 7 2 0 0 Vietnam
Rural 11900114 3 6 0 0 0 Malaysia
Rural 11900446 0 6 0 0 0 Vietnam


and I would like to divide columns 2, 3, 4, 5, 6 by the total for that particular country.



Doing it in base R is a bit clumsy:



df[df$Country=="Vietnam",][c(3, 4, 5, 6)] = df[df$Country=="Vietnam",][c(3, 4, 5, 6)] / sum(df[df$Country=="Vietnam",][c(3, 4, 5, 6)])


(I think that works).



I'm trying to convert as much of my code as possible to use tidyverse functions. Is there a way of doing the same thing more efficiently using, dplyr, for instance?



Thanks.










share|improve this question
























  • divide with total (all columns) for the country, or total for the country per column?
    – missuse
    Mar 21 '18 at 11:11
















0














I have a dataframe like this:



 Setting   q02_id c_school c_home c_work c_transport c_leisure Country
Rural 11900006 0 5 3 1 1 Vietnam
Rural 11900031 10 5 0 0 0 China
Rural 11900033 0 3 0 0 3 Vietnam
Rural 11900053 0 7 2 0 0 Vietnam
Rural 11900114 3 6 0 0 0 Malaysia
Rural 11900446 0 6 0 0 0 Vietnam


and I would like to divide columns 2, 3, 4, 5, 6 by the total for that particular country.



Doing it in base R is a bit clumsy:



df[df$Country=="Vietnam",][c(3, 4, 5, 6)] = df[df$Country=="Vietnam",][c(3, 4, 5, 6)] / sum(df[df$Country=="Vietnam",][c(3, 4, 5, 6)])


(I think that works).



I'm trying to convert as much of my code as possible to use tidyverse functions. Is there a way of doing the same thing more efficiently using, dplyr, for instance?



Thanks.










share|improve this question
























  • divide with total (all columns) for the country, or total for the country per column?
    – missuse
    Mar 21 '18 at 11:11














0












0








0







I have a dataframe like this:



 Setting   q02_id c_school c_home c_work c_transport c_leisure Country
Rural 11900006 0 5 3 1 1 Vietnam
Rural 11900031 10 5 0 0 0 China
Rural 11900033 0 3 0 0 3 Vietnam
Rural 11900053 0 7 2 0 0 Vietnam
Rural 11900114 3 6 0 0 0 Malaysia
Rural 11900446 0 6 0 0 0 Vietnam


and I would like to divide columns 2, 3, 4, 5, 6 by the total for that particular country.



Doing it in base R is a bit clumsy:



df[df$Country=="Vietnam",][c(3, 4, 5, 6)] = df[df$Country=="Vietnam",][c(3, 4, 5, 6)] / sum(df[df$Country=="Vietnam",][c(3, 4, 5, 6)])


(I think that works).



I'm trying to convert as much of my code as possible to use tidyverse functions. Is there a way of doing the same thing more efficiently using, dplyr, for instance?



Thanks.










share|improve this question















I have a dataframe like this:



 Setting   q02_id c_school c_home c_work c_transport c_leisure Country
Rural 11900006 0 5 3 1 1 Vietnam
Rural 11900031 10 5 0 0 0 China
Rural 11900033 0 3 0 0 3 Vietnam
Rural 11900053 0 7 2 0 0 Vietnam
Rural 11900114 3 6 0 0 0 Malaysia
Rural 11900446 0 6 0 0 0 Vietnam


and I would like to divide columns 2, 3, 4, 5, 6 by the total for that particular country.



Doing it in base R is a bit clumsy:



df[df$Country=="Vietnam",][c(3, 4, 5, 6)] = df[df$Country=="Vietnam",][c(3, 4, 5, 6)] / sum(df[df$Country=="Vietnam",][c(3, 4, 5, 6)])


(I think that works).



I'm trying to convert as much of my code as possible to use tidyverse functions. Is there a way of doing the same thing more efficiently using, dplyr, for instance?



Thanks.







r dplyr






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 23 '18 at 7:31









Marcus Campbell

2,01921027




2,01921027










asked Mar 21 '18 at 10:55









sahwahn

669




669












  • divide with total (all columns) for the country, or total for the country per column?
    – missuse
    Mar 21 '18 at 11:11


















  • divide with total (all columns) for the country, or total for the country per column?
    – missuse
    Mar 21 '18 at 11:11
















divide with total (all columns) for the country, or total for the country per column?
– missuse
Mar 21 '18 at 11:11




divide with total (all columns) for the country, or total for the country per column?
– missuse
Mar 21 '18 at 11:11












2 Answers
2






active

oldest

votes


















0














I trust this is what you are after:



Divide each column by the sum of that column - grouped by Country



library(tidyverse)
df1 %>%
group_by(Country) %>%
mutate_at(vars(c_school: c_leisure), funs(./ sum(.)))
#output
Setting q02_id c_school c_home c_work c_transport c_leisure Country
<fct> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <fct>
1 Rural 11900006 NaN 0.238 0.600 1.00 0.250 Vietnam
2 Rural 11900031 1.00 1.00 NaN NaN NaN China
3 Rural 11900033 NaN 0.143 0 0 0.750 Vietnam
4 Rural 11900053 NaN 0.333 0.400 0 0 Vietnam
5 Rural 11900114 1.00 1.00 NaN NaN NaN Malaysia
6 Rural 11900446 NaN 0.286 0 0 0 Vietnam


or alternatively divide each column by the total sum for each country as in your example (only difference is I used columns 3:7 as I trust you intended.



df1 %>%
mutate(sum = rowSums(.[,3:7])) %>%
group_by(Country) %>%
mutate_at(vars(c_school: c_leisure), funs(./ sum(sum))) %>%
select(-sum)
#output
Setting q02_id c_school c_home c_work c_transport c_leisure Country
<fct> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <fct>
1 Rural 11900006 0 0.161 0.0968 0.0323 0.0323 Vietnam
2 Rural 11900031 0.667 0.333 0 0 0 China
3 Rural 11900033 0 0.0968 0 0 0.0968 Vietnam
4 Rural 11900053 0 0.226 0.0645 0 0 Vietnam
5 Rural 11900114 0.333 0.667 0 0 0 Malaysia
6 Rural 11900446 0 0.194 0 0 0 Vietnam


data:



df1 = read.table(text ="Setting   q02_id c_school c_home c_work c_transport c_leisure Country
Rural 11900006 0 5 3 1 1 Vietnam
Rural 11900031 10 5 0 0 0 China
Rural 11900033 0 3 0 0 3 Vietnam
Rural 11900053 0 7 2 0 0 Vietnam
Rural 11900114 3 6 0 0 0 Malaysia
Rural 11900446 0 6 0 0 0 Vietnam", header = T)





share|improve this answer































    0














    I know you ask for tidyverse functions, but this is also a task where the data.table package shines:



    library(data.table)
    setDT(df)
    df[, lapply(.SD, function(x) x / sum(x)), by = Country, .SDcols = 3:7]

    Country c_school c_home c_work c_transport c_leisure
    1: Vietnam NaN 0.2380952 0.6 1 0.25
    2: Vietnam NaN 0.1428571 0.0 0 0.75
    3: Vietnam NaN 0.3333333 0.4 0 0.00
    4: Vietnam NaN 0.2857143 0.0 0 0.00
    5: China 1 1.0000000 NaN NaN NaN
    6: Malaysia 1 1.0000000 NaN NaN NaN





    share|improve this answer





















      Your Answer






      StackExchange.ifUsing("editor", function () {
      StackExchange.using("externalEditor", function () {
      StackExchange.using("snippets", function () {
      StackExchange.snippets.init();
      });
      });
      }, "code-snippets");

      StackExchange.ready(function() {
      var channelOptions = {
      tags: "".split(" "),
      id: "1"
      };
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function() {
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled) {
      StackExchange.using("snippets", function() {
      createEditor();
      });
      }
      else {
      createEditor();
      }
      });

      function createEditor() {
      StackExchange.prepareEditor({
      heartbeatType: 'answer',
      autoActivateHeartbeat: false,
      convertImagesToLinks: true,
      noModals: true,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: 10,
      bindNavPrevention: true,
      postfix: "",
      imageUploader: {
      brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
      contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
      allowUrls: true
      },
      onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      });


      }
      });














      draft saved

      draft discarded


















      StackExchange.ready(
      function () {
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f49404461%2fdividing-columns-by-particular-values-using-dplyr%23new-answer', 'question_page');
      }
      );

      Post as a guest















      Required, but never shown

























      2 Answers
      2






      active

      oldest

      votes








      2 Answers
      2






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes









      0














      I trust this is what you are after:



      Divide each column by the sum of that column - grouped by Country



      library(tidyverse)
      df1 %>%
      group_by(Country) %>%
      mutate_at(vars(c_school: c_leisure), funs(./ sum(.)))
      #output
      Setting q02_id c_school c_home c_work c_transport c_leisure Country
      <fct> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <fct>
      1 Rural 11900006 NaN 0.238 0.600 1.00 0.250 Vietnam
      2 Rural 11900031 1.00 1.00 NaN NaN NaN China
      3 Rural 11900033 NaN 0.143 0 0 0.750 Vietnam
      4 Rural 11900053 NaN 0.333 0.400 0 0 Vietnam
      5 Rural 11900114 1.00 1.00 NaN NaN NaN Malaysia
      6 Rural 11900446 NaN 0.286 0 0 0 Vietnam


      or alternatively divide each column by the total sum for each country as in your example (only difference is I used columns 3:7 as I trust you intended.



      df1 %>%
      mutate(sum = rowSums(.[,3:7])) %>%
      group_by(Country) %>%
      mutate_at(vars(c_school: c_leisure), funs(./ sum(sum))) %>%
      select(-sum)
      #output
      Setting q02_id c_school c_home c_work c_transport c_leisure Country
      <fct> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <fct>
      1 Rural 11900006 0 0.161 0.0968 0.0323 0.0323 Vietnam
      2 Rural 11900031 0.667 0.333 0 0 0 China
      3 Rural 11900033 0 0.0968 0 0 0.0968 Vietnam
      4 Rural 11900053 0 0.226 0.0645 0 0 Vietnam
      5 Rural 11900114 0.333 0.667 0 0 0 Malaysia
      6 Rural 11900446 0 0.194 0 0 0 Vietnam


      data:



      df1 = read.table(text ="Setting   q02_id c_school c_home c_work c_transport c_leisure Country
      Rural 11900006 0 5 3 1 1 Vietnam
      Rural 11900031 10 5 0 0 0 China
      Rural 11900033 0 3 0 0 3 Vietnam
      Rural 11900053 0 7 2 0 0 Vietnam
      Rural 11900114 3 6 0 0 0 Malaysia
      Rural 11900446 0 6 0 0 0 Vietnam", header = T)





      share|improve this answer




























        0














        I trust this is what you are after:



        Divide each column by the sum of that column - grouped by Country



        library(tidyverse)
        df1 %>%
        group_by(Country) %>%
        mutate_at(vars(c_school: c_leisure), funs(./ sum(.)))
        #output
        Setting q02_id c_school c_home c_work c_transport c_leisure Country
        <fct> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <fct>
        1 Rural 11900006 NaN 0.238 0.600 1.00 0.250 Vietnam
        2 Rural 11900031 1.00 1.00 NaN NaN NaN China
        3 Rural 11900033 NaN 0.143 0 0 0.750 Vietnam
        4 Rural 11900053 NaN 0.333 0.400 0 0 Vietnam
        5 Rural 11900114 1.00 1.00 NaN NaN NaN Malaysia
        6 Rural 11900446 NaN 0.286 0 0 0 Vietnam


        or alternatively divide each column by the total sum for each country as in your example (only difference is I used columns 3:7 as I trust you intended.



        df1 %>%
        mutate(sum = rowSums(.[,3:7])) %>%
        group_by(Country) %>%
        mutate_at(vars(c_school: c_leisure), funs(./ sum(sum))) %>%
        select(-sum)
        #output
        Setting q02_id c_school c_home c_work c_transport c_leisure Country
        <fct> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <fct>
        1 Rural 11900006 0 0.161 0.0968 0.0323 0.0323 Vietnam
        2 Rural 11900031 0.667 0.333 0 0 0 China
        3 Rural 11900033 0 0.0968 0 0 0.0968 Vietnam
        4 Rural 11900053 0 0.226 0.0645 0 0 Vietnam
        5 Rural 11900114 0.333 0.667 0 0 0 Malaysia
        6 Rural 11900446 0 0.194 0 0 0 Vietnam


        data:



        df1 = read.table(text ="Setting   q02_id c_school c_home c_work c_transport c_leisure Country
        Rural 11900006 0 5 3 1 1 Vietnam
        Rural 11900031 10 5 0 0 0 China
        Rural 11900033 0 3 0 0 3 Vietnam
        Rural 11900053 0 7 2 0 0 Vietnam
        Rural 11900114 3 6 0 0 0 Malaysia
        Rural 11900446 0 6 0 0 0 Vietnam", header = T)





        share|improve this answer


























          0












          0








          0






          I trust this is what you are after:



          Divide each column by the sum of that column - grouped by Country



          library(tidyverse)
          df1 %>%
          group_by(Country) %>%
          mutate_at(vars(c_school: c_leisure), funs(./ sum(.)))
          #output
          Setting q02_id c_school c_home c_work c_transport c_leisure Country
          <fct> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <fct>
          1 Rural 11900006 NaN 0.238 0.600 1.00 0.250 Vietnam
          2 Rural 11900031 1.00 1.00 NaN NaN NaN China
          3 Rural 11900033 NaN 0.143 0 0 0.750 Vietnam
          4 Rural 11900053 NaN 0.333 0.400 0 0 Vietnam
          5 Rural 11900114 1.00 1.00 NaN NaN NaN Malaysia
          6 Rural 11900446 NaN 0.286 0 0 0 Vietnam


          or alternatively divide each column by the total sum for each country as in your example (only difference is I used columns 3:7 as I trust you intended.



          df1 %>%
          mutate(sum = rowSums(.[,3:7])) %>%
          group_by(Country) %>%
          mutate_at(vars(c_school: c_leisure), funs(./ sum(sum))) %>%
          select(-sum)
          #output
          Setting q02_id c_school c_home c_work c_transport c_leisure Country
          <fct> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <fct>
          1 Rural 11900006 0 0.161 0.0968 0.0323 0.0323 Vietnam
          2 Rural 11900031 0.667 0.333 0 0 0 China
          3 Rural 11900033 0 0.0968 0 0 0.0968 Vietnam
          4 Rural 11900053 0 0.226 0.0645 0 0 Vietnam
          5 Rural 11900114 0.333 0.667 0 0 0 Malaysia
          6 Rural 11900446 0 0.194 0 0 0 Vietnam


          data:



          df1 = read.table(text ="Setting   q02_id c_school c_home c_work c_transport c_leisure Country
          Rural 11900006 0 5 3 1 1 Vietnam
          Rural 11900031 10 5 0 0 0 China
          Rural 11900033 0 3 0 0 3 Vietnam
          Rural 11900053 0 7 2 0 0 Vietnam
          Rural 11900114 3 6 0 0 0 Malaysia
          Rural 11900446 0 6 0 0 0 Vietnam", header = T)





          share|improve this answer














          I trust this is what you are after:



          Divide each column by the sum of that column - grouped by Country



          library(tidyverse)
          df1 %>%
          group_by(Country) %>%
          mutate_at(vars(c_school: c_leisure), funs(./ sum(.)))
          #output
          Setting q02_id c_school c_home c_work c_transport c_leisure Country
          <fct> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <fct>
          1 Rural 11900006 NaN 0.238 0.600 1.00 0.250 Vietnam
          2 Rural 11900031 1.00 1.00 NaN NaN NaN China
          3 Rural 11900033 NaN 0.143 0 0 0.750 Vietnam
          4 Rural 11900053 NaN 0.333 0.400 0 0 Vietnam
          5 Rural 11900114 1.00 1.00 NaN NaN NaN Malaysia
          6 Rural 11900446 NaN 0.286 0 0 0 Vietnam


          or alternatively divide each column by the total sum for each country as in your example (only difference is I used columns 3:7 as I trust you intended.



          df1 %>%
          mutate(sum = rowSums(.[,3:7])) %>%
          group_by(Country) %>%
          mutate_at(vars(c_school: c_leisure), funs(./ sum(sum))) %>%
          select(-sum)
          #output
          Setting q02_id c_school c_home c_work c_transport c_leisure Country
          <fct> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <fct>
          1 Rural 11900006 0 0.161 0.0968 0.0323 0.0323 Vietnam
          2 Rural 11900031 0.667 0.333 0 0 0 China
          3 Rural 11900033 0 0.0968 0 0 0.0968 Vietnam
          4 Rural 11900053 0 0.226 0.0645 0 0 Vietnam
          5 Rural 11900114 0.333 0.667 0 0 0 Malaysia
          6 Rural 11900446 0 0.194 0 0 0 Vietnam


          data:



          df1 = read.table(text ="Setting   q02_id c_school c_home c_work c_transport c_leisure Country
          Rural 11900006 0 5 3 1 1 Vietnam
          Rural 11900031 10 5 0 0 0 China
          Rural 11900033 0 3 0 0 3 Vietnam
          Rural 11900053 0 7 2 0 0 Vietnam
          Rural 11900114 3 6 0 0 0 Malaysia
          Rural 11900446 0 6 0 0 0 Vietnam", header = T)






          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Mar 21 '18 at 11:16

























          answered Mar 21 '18 at 11:06









          missuse

          11.6k2622




          11.6k2622

























              0














              I know you ask for tidyverse functions, but this is also a task where the data.table package shines:



              library(data.table)
              setDT(df)
              df[, lapply(.SD, function(x) x / sum(x)), by = Country, .SDcols = 3:7]

              Country c_school c_home c_work c_transport c_leisure
              1: Vietnam NaN 0.2380952 0.6 1 0.25
              2: Vietnam NaN 0.1428571 0.0 0 0.75
              3: Vietnam NaN 0.3333333 0.4 0 0.00
              4: Vietnam NaN 0.2857143 0.0 0 0.00
              5: China 1 1.0000000 NaN NaN NaN
              6: Malaysia 1 1.0000000 NaN NaN NaN





              share|improve this answer


























                0














                I know you ask for tidyverse functions, but this is also a task where the data.table package shines:



                library(data.table)
                setDT(df)
                df[, lapply(.SD, function(x) x / sum(x)), by = Country, .SDcols = 3:7]

                Country c_school c_home c_work c_transport c_leisure
                1: Vietnam NaN 0.2380952 0.6 1 0.25
                2: Vietnam NaN 0.1428571 0.0 0 0.75
                3: Vietnam NaN 0.3333333 0.4 0 0.00
                4: Vietnam NaN 0.2857143 0.0 0 0.00
                5: China 1 1.0000000 NaN NaN NaN
                6: Malaysia 1 1.0000000 NaN NaN NaN





                share|improve this answer
























                  0












                  0








                  0






                  I know you ask for tidyverse functions, but this is also a task where the data.table package shines:



                  library(data.table)
                  setDT(df)
                  df[, lapply(.SD, function(x) x / sum(x)), by = Country, .SDcols = 3:7]

                  Country c_school c_home c_work c_transport c_leisure
                  1: Vietnam NaN 0.2380952 0.6 1 0.25
                  2: Vietnam NaN 0.1428571 0.0 0 0.75
                  3: Vietnam NaN 0.3333333 0.4 0 0.00
                  4: Vietnam NaN 0.2857143 0.0 0 0.00
                  5: China 1 1.0000000 NaN NaN NaN
                  6: Malaysia 1 1.0000000 NaN NaN NaN





                  share|improve this answer












                  I know you ask for tidyverse functions, but this is also a task where the data.table package shines:



                  library(data.table)
                  setDT(df)
                  df[, lapply(.SD, function(x) x / sum(x)), by = Country, .SDcols = 3:7]

                  Country c_school c_home c_work c_transport c_leisure
                  1: Vietnam NaN 0.2380952 0.6 1 0.25
                  2: Vietnam NaN 0.1428571 0.0 0 0.75
                  3: Vietnam NaN 0.3333333 0.4 0 0.00
                  4: Vietnam NaN 0.2857143 0.0 0 0.00
                  5: China 1 1.0000000 NaN NaN NaN
                  6: Malaysia 1 1.0000000 NaN NaN NaN






                  share|improve this answer












                  share|improve this answer



                  share|improve this answer










                  answered Nov 23 '18 at 8:29









                  snoram

                  6,402831




                  6,402831






























                      draft saved

                      draft discarded




















































                      Thanks for contributing an answer to Stack Overflow!


                      • Please be sure to answer the question. Provide details and share your research!

                      But avoid



                      • Asking for help, clarification, or responding to other answers.

                      • Making statements based on opinion; back them up with references or personal experience.


                      To learn more, see our tips on writing great answers.





                      Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


                      Please pay close attention to the following guidance:


                      • Please be sure to answer the question. Provide details and share your research!

                      But avoid



                      • Asking for help, clarification, or responding to other answers.

                      • Making statements based on opinion; back them up with references or personal experience.


                      To learn more, see our tips on writing great answers.




                      draft saved


                      draft discarded














                      StackExchange.ready(
                      function () {
                      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f49404461%2fdividing-columns-by-particular-values-using-dplyr%23new-answer', 'question_page');
                      }
                      );

                      Post as a guest















                      Required, but never shown





















































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown

































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown







                      Popular posts from this blog

                      How to ignore python UserWarning in pytest?

                      What visual should I use to simply compare current year value vs last year in Power BI desktop

                      Script to remove string up to first number