Aggregate dataframe rows into a dictionary












1














I have a pandas DataFrame object where each row represents one object in an image.



One example of a possible row would be:



{'img_filename': 'img1.txt', 'img_size':'20', 'obj_size':'5', 'obj_type':'car'}


I want to aggregate all the objects that belong to the same image, and get something whose rows would be like:



{'img_filename': 'img1.txt', 'img_size':'20', 'obj': [{'obj_size':'5', 'obj_type':'car'}, {{'obj_size':'6', 'obj_type':'bus'}}]}


That is, the third column is a list of columns containing the data of each group.



How can I do this?



EDIT:



Consider the following example.



import pandas as pd
df1 = pd.DataFrame([
{'img_filename': 'img1.txt', 'img_size':'20', 'obj_size':'5', 'obj_type':'car'},
{'img_filename': 'img1.txt', 'img_size':'20', 'obj_size':'6', 'obj_type':'bus'},
{'img_filename': 'img2.txt', 'img_size':'25', 'obj_size':'4', 'obj_type':'car'}
])

df2 = pd.DataFrame([
{'img_filename': 'img1.txt', 'img_size':'20', 'obj': [{'obj_size':'5', 'obj_type':'car'}, {'obj_size':'6', 'obj_type':'bus'}]},
{'img_filename': 'img2.txt', 'img_size':'25', 'obj': [{'obj_size':'4', 'obj_type':'car'}]}
])


I want to turn df1 into df2.










share|improve this question
























  • You have dictionaries inside pandas cells?
    – roganjosh
    Nov 22 at 17:51










  • @roganjosh I represented the rows as dictionaries, but they are not
    – Jsevillamol
    Nov 22 at 17:53










  • Could you add a small reproducible example to help understand the problem?
    – Sven Harris
    Nov 22 at 17:54






  • 1




    Then this is actually pretty confusing. Please make a small DF that actually illustrates your task. It need only be 3 or 4 rows of junk data, so long as it illustrates the point
    – roganjosh
    Nov 22 at 17:55






  • 1




    @roganjosh done
    – Jsevillamol
    Nov 22 at 17:59
















1














I have a pandas DataFrame object where each row represents one object in an image.



One example of a possible row would be:



{'img_filename': 'img1.txt', 'img_size':'20', 'obj_size':'5', 'obj_type':'car'}


I want to aggregate all the objects that belong to the same image, and get something whose rows would be like:



{'img_filename': 'img1.txt', 'img_size':'20', 'obj': [{'obj_size':'5', 'obj_type':'car'}, {{'obj_size':'6', 'obj_type':'bus'}}]}


That is, the third column is a list of columns containing the data of each group.



How can I do this?



EDIT:



Consider the following example.



import pandas as pd
df1 = pd.DataFrame([
{'img_filename': 'img1.txt', 'img_size':'20', 'obj_size':'5', 'obj_type':'car'},
{'img_filename': 'img1.txt', 'img_size':'20', 'obj_size':'6', 'obj_type':'bus'},
{'img_filename': 'img2.txt', 'img_size':'25', 'obj_size':'4', 'obj_type':'car'}
])

df2 = pd.DataFrame([
{'img_filename': 'img1.txt', 'img_size':'20', 'obj': [{'obj_size':'5', 'obj_type':'car'}, {'obj_size':'6', 'obj_type':'bus'}]},
{'img_filename': 'img2.txt', 'img_size':'25', 'obj': [{'obj_size':'4', 'obj_type':'car'}]}
])


I want to turn df1 into df2.










share|improve this question
























  • You have dictionaries inside pandas cells?
    – roganjosh
    Nov 22 at 17:51










  • @roganjosh I represented the rows as dictionaries, but they are not
    – Jsevillamol
    Nov 22 at 17:53










  • Could you add a small reproducible example to help understand the problem?
    – Sven Harris
    Nov 22 at 17:54






  • 1




    Then this is actually pretty confusing. Please make a small DF that actually illustrates your task. It need only be 3 or 4 rows of junk data, so long as it illustrates the point
    – roganjosh
    Nov 22 at 17:55






  • 1




    @roganjosh done
    – Jsevillamol
    Nov 22 at 17:59














1












1








1







I have a pandas DataFrame object where each row represents one object in an image.



One example of a possible row would be:



{'img_filename': 'img1.txt', 'img_size':'20', 'obj_size':'5', 'obj_type':'car'}


I want to aggregate all the objects that belong to the same image, and get something whose rows would be like:



{'img_filename': 'img1.txt', 'img_size':'20', 'obj': [{'obj_size':'5', 'obj_type':'car'}, {{'obj_size':'6', 'obj_type':'bus'}}]}


That is, the third column is a list of columns containing the data of each group.



How can I do this?



EDIT:



Consider the following example.



import pandas as pd
df1 = pd.DataFrame([
{'img_filename': 'img1.txt', 'img_size':'20', 'obj_size':'5', 'obj_type':'car'},
{'img_filename': 'img1.txt', 'img_size':'20', 'obj_size':'6', 'obj_type':'bus'},
{'img_filename': 'img2.txt', 'img_size':'25', 'obj_size':'4', 'obj_type':'car'}
])

df2 = pd.DataFrame([
{'img_filename': 'img1.txt', 'img_size':'20', 'obj': [{'obj_size':'5', 'obj_type':'car'}, {'obj_size':'6', 'obj_type':'bus'}]},
{'img_filename': 'img2.txt', 'img_size':'25', 'obj': [{'obj_size':'4', 'obj_type':'car'}]}
])


I want to turn df1 into df2.










share|improve this question















I have a pandas DataFrame object where each row represents one object in an image.



One example of a possible row would be:



{'img_filename': 'img1.txt', 'img_size':'20', 'obj_size':'5', 'obj_type':'car'}


I want to aggregate all the objects that belong to the same image, and get something whose rows would be like:



{'img_filename': 'img1.txt', 'img_size':'20', 'obj': [{'obj_size':'5', 'obj_type':'car'}, {{'obj_size':'6', 'obj_type':'bus'}}]}


That is, the third column is a list of columns containing the data of each group.



How can I do this?



EDIT:



Consider the following example.



import pandas as pd
df1 = pd.DataFrame([
{'img_filename': 'img1.txt', 'img_size':'20', 'obj_size':'5', 'obj_type':'car'},
{'img_filename': 'img1.txt', 'img_size':'20', 'obj_size':'6', 'obj_type':'bus'},
{'img_filename': 'img2.txt', 'img_size':'25', 'obj_size':'4', 'obj_type':'car'}
])

df2 = pd.DataFrame([
{'img_filename': 'img1.txt', 'img_size':'20', 'obj': [{'obj_size':'5', 'obj_type':'car'}, {'obj_size':'6', 'obj_type':'bus'}]},
{'img_filename': 'img2.txt', 'img_size':'25', 'obj': [{'obj_size':'4', 'obj_type':'car'}]}
])


I want to turn df1 into df2.







python pandas






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 22 at 17:59

























asked Nov 22 at 17:46









Jsevillamol

628617




628617












  • You have dictionaries inside pandas cells?
    – roganjosh
    Nov 22 at 17:51










  • @roganjosh I represented the rows as dictionaries, but they are not
    – Jsevillamol
    Nov 22 at 17:53










  • Could you add a small reproducible example to help understand the problem?
    – Sven Harris
    Nov 22 at 17:54






  • 1




    Then this is actually pretty confusing. Please make a small DF that actually illustrates your task. It need only be 3 or 4 rows of junk data, so long as it illustrates the point
    – roganjosh
    Nov 22 at 17:55






  • 1




    @roganjosh done
    – Jsevillamol
    Nov 22 at 17:59


















  • You have dictionaries inside pandas cells?
    – roganjosh
    Nov 22 at 17:51










  • @roganjosh I represented the rows as dictionaries, but they are not
    – Jsevillamol
    Nov 22 at 17:53










  • Could you add a small reproducible example to help understand the problem?
    – Sven Harris
    Nov 22 at 17:54






  • 1




    Then this is actually pretty confusing. Please make a small DF that actually illustrates your task. It need only be 3 or 4 rows of junk data, so long as it illustrates the point
    – roganjosh
    Nov 22 at 17:55






  • 1




    @roganjosh done
    – Jsevillamol
    Nov 22 at 17:59
















You have dictionaries inside pandas cells?
– roganjosh
Nov 22 at 17:51




You have dictionaries inside pandas cells?
– roganjosh
Nov 22 at 17:51












@roganjosh I represented the rows as dictionaries, but they are not
– Jsevillamol
Nov 22 at 17:53




@roganjosh I represented the rows as dictionaries, but they are not
– Jsevillamol
Nov 22 at 17:53












Could you add a small reproducible example to help understand the problem?
– Sven Harris
Nov 22 at 17:54




Could you add a small reproducible example to help understand the problem?
– Sven Harris
Nov 22 at 17:54




1




1




Then this is actually pretty confusing. Please make a small DF that actually illustrates your task. It need only be 3 or 4 rows of junk data, so long as it illustrates the point
– roganjosh
Nov 22 at 17:55




Then this is actually pretty confusing. Please make a small DF that actually illustrates your task. It need only be 3 or 4 rows of junk data, so long as it illustrates the point
– roganjosh
Nov 22 at 17:55




1




1




@roganjosh done
– Jsevillamol
Nov 22 at 17:59




@roganjosh done
– Jsevillamol
Nov 22 at 17:59












2 Answers
2






active

oldest

votes


















1














One way using to_dict



df2 = df1.groupby('img_filename')['obj_size','obj_type'].apply(lambda x: x.to_dict('records'))
df2 = df2.reset_index(name='obj')

# Assuming you have multiple same img files with different sizes then I'm choosing first.
# If this not the case then groupby directly and reset index.
#df1.groupby('img_filename, 'img_size')['obj_size','obj_type'].apply(lambda x: x.to_dict('records'))

df2['img_size'] = df1.groupby('img_filename')['img_size'].first().values

print (df2)

img_filename obj img_size
0 img1.txt [{'obj_size': '5', 'obj_type': 'car'}, {'obj_s... 20
1 img2.txt [{'obj_size': '4', 'obj_type': 'car'}] 25





share|improve this answer































    1














    One liner.



    Suppose you have same img_filename and different img_size and you want to join the value.
    For ex:



      img_filename img_size obj_size obj_type
    0 img1.txt 20 5 car
    1 img1.txt 22 6 bus
    2 img2.txt 25 4 car

    # if you want to join the img_size of img1.txt like 20, 22
    df2 = df1.groupby("img_filename")["img_size", "obj_size", "obj_type"].apply(lambda x: pd.Series({"obj": x[["obj_size", "obj_type"]].to_json(orient="records"), "img_size": ','.join(x["img_size"])})).reset_index()


    Output:



      img_filename                                                obj img_size
    0 img1.txt [{"obj_size":"5","obj_type":"car"},{"obj_size"... 20,22
    1 img2.txt [{"obj_size":"4","obj_type":"car"}] 25


    Considering first value



    #if you want to consider only first value i.e. 20
    df2 = df1.groupby("img_filename")["img_size", "obj_size", "obj_type"].apply(lambda x: pd.Series({"obj": x[["obj_size", "obj_type"]].to_json(orient="records"), "img_size": x["img_size"].iloc[0]})).reset_index()


    Output:



      img_filename                                                obj img_size
    0 img1.txt [{"obj_size":"5","obj_type":"car"},{"obj_size"... 20
    1 img2.txt [{"obj_size":"4","obj_type":"car"}] 25





    share|improve this answer























      Your Answer






      StackExchange.ifUsing("editor", function () {
      StackExchange.using("externalEditor", function () {
      StackExchange.using("snippets", function () {
      StackExchange.snippets.init();
      });
      });
      }, "code-snippets");

      StackExchange.ready(function() {
      var channelOptions = {
      tags: "".split(" "),
      id: "1"
      };
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function() {
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled) {
      StackExchange.using("snippets", function() {
      createEditor();
      });
      }
      else {
      createEditor();
      }
      });

      function createEditor() {
      StackExchange.prepareEditor({
      heartbeatType: 'answer',
      autoActivateHeartbeat: false,
      convertImagesToLinks: true,
      noModals: true,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: 10,
      bindNavPrevention: true,
      postfix: "",
      imageUploader: {
      brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
      contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
      allowUrls: true
      },
      onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      });


      }
      });














      draft saved

      draft discarded


















      StackExchange.ready(
      function () {
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53436055%2faggregate-dataframe-rows-into-a-dictionary%23new-answer', 'question_page');
      }
      );

      Post as a guest















      Required, but never shown

























      2 Answers
      2






      active

      oldest

      votes








      2 Answers
      2






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes









      1














      One way using to_dict



      df2 = df1.groupby('img_filename')['obj_size','obj_type'].apply(lambda x: x.to_dict('records'))
      df2 = df2.reset_index(name='obj')

      # Assuming you have multiple same img files with different sizes then I'm choosing first.
      # If this not the case then groupby directly and reset index.
      #df1.groupby('img_filename, 'img_size')['obj_size','obj_type'].apply(lambda x: x.to_dict('records'))

      df2['img_size'] = df1.groupby('img_filename')['img_size'].first().values

      print (df2)

      img_filename obj img_size
      0 img1.txt [{'obj_size': '5', 'obj_type': 'car'}, {'obj_s... 20
      1 img2.txt [{'obj_size': '4', 'obj_type': 'car'}] 25





      share|improve this answer




























        1














        One way using to_dict



        df2 = df1.groupby('img_filename')['obj_size','obj_type'].apply(lambda x: x.to_dict('records'))
        df2 = df2.reset_index(name='obj')

        # Assuming you have multiple same img files with different sizes then I'm choosing first.
        # If this not the case then groupby directly and reset index.
        #df1.groupby('img_filename, 'img_size')['obj_size','obj_type'].apply(lambda x: x.to_dict('records'))

        df2['img_size'] = df1.groupby('img_filename')['img_size'].first().values

        print (df2)

        img_filename obj img_size
        0 img1.txt [{'obj_size': '5', 'obj_type': 'car'}, {'obj_s... 20
        1 img2.txt [{'obj_size': '4', 'obj_type': 'car'}] 25





        share|improve this answer


























          1












          1








          1






          One way using to_dict



          df2 = df1.groupby('img_filename')['obj_size','obj_type'].apply(lambda x: x.to_dict('records'))
          df2 = df2.reset_index(name='obj')

          # Assuming you have multiple same img files with different sizes then I'm choosing first.
          # If this not the case then groupby directly and reset index.
          #df1.groupby('img_filename, 'img_size')['obj_size','obj_type'].apply(lambda x: x.to_dict('records'))

          df2['img_size'] = df1.groupby('img_filename')['img_size'].first().values

          print (df2)

          img_filename obj img_size
          0 img1.txt [{'obj_size': '5', 'obj_type': 'car'}, {'obj_s... 20
          1 img2.txt [{'obj_size': '4', 'obj_type': 'car'}] 25





          share|improve this answer














          One way using to_dict



          df2 = df1.groupby('img_filename')['obj_size','obj_type'].apply(lambda x: x.to_dict('records'))
          df2 = df2.reset_index(name='obj')

          # Assuming you have multiple same img files with different sizes then I'm choosing first.
          # If this not the case then groupby directly and reset index.
          #df1.groupby('img_filename, 'img_size')['obj_size','obj_type'].apply(lambda x: x.to_dict('records'))

          df2['img_size'] = df1.groupby('img_filename')['img_size'].first().values

          print (df2)

          img_filename obj img_size
          0 img1.txt [{'obj_size': '5', 'obj_type': 'car'}, {'obj_s... 20
          1 img2.txt [{'obj_size': '4', 'obj_type': 'car'}] 25






          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Nov 22 at 19:01

























          answered Nov 22 at 18:24









          Abhi

          2,479320




          2,479320

























              1














              One liner.



              Suppose you have same img_filename and different img_size and you want to join the value.
              For ex:



                img_filename img_size obj_size obj_type
              0 img1.txt 20 5 car
              1 img1.txt 22 6 bus
              2 img2.txt 25 4 car

              # if you want to join the img_size of img1.txt like 20, 22
              df2 = df1.groupby("img_filename")["img_size", "obj_size", "obj_type"].apply(lambda x: pd.Series({"obj": x[["obj_size", "obj_type"]].to_json(orient="records"), "img_size": ','.join(x["img_size"])})).reset_index()


              Output:



                img_filename                                                obj img_size
              0 img1.txt [{"obj_size":"5","obj_type":"car"},{"obj_size"... 20,22
              1 img2.txt [{"obj_size":"4","obj_type":"car"}] 25


              Considering first value



              #if you want to consider only first value i.e. 20
              df2 = df1.groupby("img_filename")["img_size", "obj_size", "obj_type"].apply(lambda x: pd.Series({"obj": x[["obj_size", "obj_type"]].to_json(orient="records"), "img_size": x["img_size"].iloc[0]})).reset_index()


              Output:



                img_filename                                                obj img_size
              0 img1.txt [{"obj_size":"5","obj_type":"car"},{"obj_size"... 20
              1 img2.txt [{"obj_size":"4","obj_type":"car"}] 25





              share|improve this answer




























                1














                One liner.



                Suppose you have same img_filename and different img_size and you want to join the value.
                For ex:



                  img_filename img_size obj_size obj_type
                0 img1.txt 20 5 car
                1 img1.txt 22 6 bus
                2 img2.txt 25 4 car

                # if you want to join the img_size of img1.txt like 20, 22
                df2 = df1.groupby("img_filename")["img_size", "obj_size", "obj_type"].apply(lambda x: pd.Series({"obj": x[["obj_size", "obj_type"]].to_json(orient="records"), "img_size": ','.join(x["img_size"])})).reset_index()


                Output:



                  img_filename                                                obj img_size
                0 img1.txt [{"obj_size":"5","obj_type":"car"},{"obj_size"... 20,22
                1 img2.txt [{"obj_size":"4","obj_type":"car"}] 25


                Considering first value



                #if you want to consider only first value i.e. 20
                df2 = df1.groupby("img_filename")["img_size", "obj_size", "obj_type"].apply(lambda x: pd.Series({"obj": x[["obj_size", "obj_type"]].to_json(orient="records"), "img_size": x["img_size"].iloc[0]})).reset_index()


                Output:



                  img_filename                                                obj img_size
                0 img1.txt [{"obj_size":"5","obj_type":"car"},{"obj_size"... 20
                1 img2.txt [{"obj_size":"4","obj_type":"car"}] 25





                share|improve this answer


























                  1












                  1








                  1






                  One liner.



                  Suppose you have same img_filename and different img_size and you want to join the value.
                  For ex:



                    img_filename img_size obj_size obj_type
                  0 img1.txt 20 5 car
                  1 img1.txt 22 6 bus
                  2 img2.txt 25 4 car

                  # if you want to join the img_size of img1.txt like 20, 22
                  df2 = df1.groupby("img_filename")["img_size", "obj_size", "obj_type"].apply(lambda x: pd.Series({"obj": x[["obj_size", "obj_type"]].to_json(orient="records"), "img_size": ','.join(x["img_size"])})).reset_index()


                  Output:



                    img_filename                                                obj img_size
                  0 img1.txt [{"obj_size":"5","obj_type":"car"},{"obj_size"... 20,22
                  1 img2.txt [{"obj_size":"4","obj_type":"car"}] 25


                  Considering first value



                  #if you want to consider only first value i.e. 20
                  df2 = df1.groupby("img_filename")["img_size", "obj_size", "obj_type"].apply(lambda x: pd.Series({"obj": x[["obj_size", "obj_type"]].to_json(orient="records"), "img_size": x["img_size"].iloc[0]})).reset_index()


                  Output:



                    img_filename                                                obj img_size
                  0 img1.txt [{"obj_size":"5","obj_type":"car"},{"obj_size"... 20
                  1 img2.txt [{"obj_size":"4","obj_type":"car"}] 25





                  share|improve this answer














                  One liner.



                  Suppose you have same img_filename and different img_size and you want to join the value.
                  For ex:



                    img_filename img_size obj_size obj_type
                  0 img1.txt 20 5 car
                  1 img1.txt 22 6 bus
                  2 img2.txt 25 4 car

                  # if you want to join the img_size of img1.txt like 20, 22
                  df2 = df1.groupby("img_filename")["img_size", "obj_size", "obj_type"].apply(lambda x: pd.Series({"obj": x[["obj_size", "obj_type"]].to_json(orient="records"), "img_size": ','.join(x["img_size"])})).reset_index()


                  Output:



                    img_filename                                                obj img_size
                  0 img1.txt [{"obj_size":"5","obj_type":"car"},{"obj_size"... 20,22
                  1 img2.txt [{"obj_size":"4","obj_type":"car"}] 25


                  Considering first value



                  #if you want to consider only first value i.e. 20
                  df2 = df1.groupby("img_filename")["img_size", "obj_size", "obj_type"].apply(lambda x: pd.Series({"obj": x[["obj_size", "obj_type"]].to_json(orient="records"), "img_size": x["img_size"].iloc[0]})).reset_index()


                  Output:



                    img_filename                                                obj img_size
                  0 img1.txt [{"obj_size":"5","obj_type":"car"},{"obj_size"... 20
                  1 img2.txt [{"obj_size":"4","obj_type":"car"}] 25






                  share|improve this answer














                  share|improve this answer



                  share|improve this answer








                  edited Nov 22 at 19:28

























                  answered Nov 22 at 18:42









                  Srce Cde

                  1,136411




                  1,136411






























                      draft saved

                      draft discarded




















































                      Thanks for contributing an answer to Stack Overflow!


                      • Please be sure to answer the question. Provide details and share your research!

                      But avoid



                      • Asking for help, clarification, or responding to other answers.

                      • Making statements based on opinion; back them up with references or personal experience.


                      To learn more, see our tips on writing great answers.





                      Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


                      Please pay close attention to the following guidance:


                      • Please be sure to answer the question. Provide details and share your research!

                      But avoid



                      • Asking for help, clarification, or responding to other answers.

                      • Making statements based on opinion; back them up with references or personal experience.


                      To learn more, see our tips on writing great answers.




                      draft saved


                      draft discarded














                      StackExchange.ready(
                      function () {
                      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53436055%2faggregate-dataframe-rows-into-a-dictionary%23new-answer', 'question_page');
                      }
                      );

                      Post as a guest















                      Required, but never shown





















































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown

































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown







                      Popular posts from this blog

                      What visual should I use to simply compare current year value vs last year in Power BI desktop

                      How to ignore python UserWarning in pytest?

                      Alexandru Averescu