Aggregate dataframe rows into a dictionary

I have a pandas DataFrame object where each row represents one object in an image.

One example of a possible row would be:

{'img_filename': 'img1.txt', 'img_size':'20', 'obj_size':'5', 'obj_type':'car'}

I want to aggregate all the objects that belong to the same image, and get something whose rows would be like:

{'img_filename': 'img1.txt', 'img_size':'20', 'obj': [{'obj_size':'5', 'obj_type':'car'}, {{'obj_size':'6', 'obj_type':'bus'}}]}

That is, the third column is a list of columns containing the data of each group.

How can I do this?

EDIT:

Consider the following example.

import pandas as pd

df1 = pd.DataFrame([

{'img_filename': 'img1.txt', 'img_size':'20', 'obj_size':'5', 'obj_type':'car'}, 

{'img_filename': 'img1.txt', 'img_size':'20', 'obj_size':'6', 'obj_type':'bus'}, 

{'img_filename': 'img2.txt', 'img_size':'25', 'obj_size':'4', 'obj_type':'car'}

])



df2 = pd.DataFrame([

{'img_filename': 'img1.txt', 'img_size':'20', 'obj': [{'obj_size':'5', 'obj_type':'car'}, {'obj_size':'6', 'obj_type':'bus'}]},

{'img_filename': 'img2.txt', 'img_size':'25', 'obj': [{'obj_size':'4', 'obj_type':'car'}]}

])

I want to turn df1 into df2.

edited Nov 22 at 17:59

asked Nov 22 at 17:46

Jsevillamol

628617

You have dictionaries inside pandas cells?
– roganjosh
Nov 22 at 17:51

@roganjosh I represented the rows as dictionaries, but they are not
– Jsevillamol
Nov 22 at 17:53

Could you add a small reproducible example to help understand the problem?
– Sven Harris
Nov 22 at 17:54

1

Then this is actually pretty confusing. Please make a small DF that actually illustrates your task. It need only be 3 or 4 rows of junk data, so long as it illustrates the point
– roganjosh
Nov 22 at 17:55

1

@roganjosh done
– Jsevillamol
Nov 22 at 17:59

add a comment |

I have a pandas DataFrame object where each row represents one object in an image.

One example of a possible row would be:

{'img_filename': 'img1.txt', 'img_size':'20', 'obj_size':'5', 'obj_type':'car'}

I want to aggregate all the objects that belong to the same image, and get something whose rows would be like:

{'img_filename': 'img1.txt', 'img_size':'20', 'obj': [{'obj_size':'5', 'obj_type':'car'}, {{'obj_size':'6', 'obj_type':'bus'}}]}

That is, the third column is a list of columns containing the data of each group.

How can I do this?

EDIT:

Consider the following example.

import pandas as pd

df1 = pd.DataFrame([

{'img_filename': 'img1.txt', 'img_size':'20', 'obj_size':'5', 'obj_type':'car'}, 

{'img_filename': 'img1.txt', 'img_size':'20', 'obj_size':'6', 'obj_type':'bus'}, 

{'img_filename': 'img2.txt', 'img_size':'25', 'obj_size':'4', 'obj_type':'car'}

])



df2 = pd.DataFrame([

{'img_filename': 'img1.txt', 'img_size':'20', 'obj': [{'obj_size':'5', 'obj_type':'car'}, {'obj_size':'6', 'obj_type':'bus'}]},

{'img_filename': 'img2.txt', 'img_size':'25', 'obj': [{'obj_size':'4', 'obj_type':'car'}]}

])

I want to turn df1 into df2.

edited Nov 22 at 17:59

asked Nov 22 at 17:46

Jsevillamol

628617

You have dictionaries inside pandas cells?
– roganjosh
Nov 22 at 17:51

@roganjosh I represented the rows as dictionaries, but they are not
– Jsevillamol
Nov 22 at 17:53

Could you add a small reproducible example to help understand the problem?
– Sven Harris
Nov 22 at 17:54

1

Then this is actually pretty confusing. Please make a small DF that actually illustrates your task. It need only be 3 or 4 rows of junk data, so long as it illustrates the point
– roganjosh
Nov 22 at 17:55

1

@roganjosh done
– Jsevillamol
Nov 22 at 17:59

add a comment |

I have a pandas DataFrame object where each row represents one object in an image.

One example of a possible row would be:

{'img_filename': 'img1.txt', 'img_size':'20', 'obj_size':'5', 'obj_type':'car'}

I want to aggregate all the objects that belong to the same image, and get something whose rows would be like:

{'img_filename': 'img1.txt', 'img_size':'20', 'obj': [{'obj_size':'5', 'obj_type':'car'}, {{'obj_size':'6', 'obj_type':'bus'}}]}

That is, the third column is a list of columns containing the data of each group.

How can I do this?

EDIT:

Consider the following example.

import pandas as pd

df1 = pd.DataFrame([

{'img_filename': 'img1.txt', 'img_size':'20', 'obj_size':'5', 'obj_type':'car'}, 

{'img_filename': 'img1.txt', 'img_size':'20', 'obj_size':'6', 'obj_type':'bus'}, 

{'img_filename': 'img2.txt', 'img_size':'25', 'obj_size':'4', 'obj_type':'car'}

])



df2 = pd.DataFrame([

{'img_filename': 'img1.txt', 'img_size':'20', 'obj': [{'obj_size':'5', 'obj_type':'car'}, {'obj_size':'6', 'obj_type':'bus'}]},

{'img_filename': 'img2.txt', 'img_size':'25', 'obj': [{'obj_size':'4', 'obj_type':'car'}]}

])

I want to turn df1 into df2.

edited Nov 22 at 17:59

asked Nov 22 at 17:46

Jsevillamol

628617

I have a pandas DataFrame object where each row represents one object in an image.

One example of a possible row would be:

{'img_filename': 'img1.txt', 'img_size':'20', 'obj_size':'5', 'obj_type':'car'}

I want to aggregate all the objects that belong to the same image, and get something whose rows would be like:

{'img_filename': 'img1.txt', 'img_size':'20', 'obj': [{'obj_size':'5', 'obj_type':'car'}, {{'obj_size':'6', 'obj_type':'bus'}}]}

That is, the third column is a list of columns containing the data of each group.

How can I do this?

EDIT:

Consider the following example.

import pandas as pd

df1 = pd.DataFrame([

{'img_filename': 'img1.txt', 'img_size':'20', 'obj_size':'5', 'obj_type':'car'}, 

{'img_filename': 'img1.txt', 'img_size':'20', 'obj_size':'6', 'obj_type':'bus'}, 

{'img_filename': 'img2.txt', 'img_size':'25', 'obj_size':'4', 'obj_type':'car'}

])



df2 = pd.DataFrame([

{'img_filename': 'img1.txt', 'img_size':'20', 'obj': [{'obj_size':'5', 'obj_type':'car'}, {'obj_size':'6', 'obj_type':'bus'}]},

{'img_filename': 'img2.txt', 'img_size':'25', 'obj': [{'obj_size':'4', 'obj_type':'car'}]}

])

I want to turn df1 into df2.

python pandas

edited Nov 22 at 17:59

asked Nov 22 at 17:46

Jsevillamol

628617

edited Nov 22 at 17:59

asked Nov 22 at 17:46

Jsevillamol

628617

edited Nov 22 at 17:59

asked Nov 22 at 17:46

Jsevillamol

628617

asked Nov 22 at 17:46

Jsevillamol

628617

asked Nov 22 at 17:46

Jsevillamol

628617

You have dictionaries inside pandas cells?
– roganjosh
Nov 22 at 17:51

@roganjosh I represented the rows as dictionaries, but they are not
– Jsevillamol
Nov 22 at 17:53

Could you add a small reproducible example to help understand the problem?
– Sven Harris
Nov 22 at 17:54

1

Then this is actually pretty confusing. Please make a small DF that actually illustrates your task. It need only be 3 or 4 rows of junk data, so long as it illustrates the point
– roganjosh
Nov 22 at 17:55

1

@roganjosh done
– Jsevillamol
Nov 22 at 17:59

add a comment |

You have dictionaries inside pandas cells?
– roganjosh
Nov 22 at 17:51

@roganjosh I represented the rows as dictionaries, but they are not
– Jsevillamol
Nov 22 at 17:53

Could you add a small reproducible example to help understand the problem?
– Sven Harris
Nov 22 at 17:54

1

Then this is actually pretty confusing. Please make a small DF that actually illustrates your task. It need only be 3 or 4 rows of junk data, so long as it illustrates the point
– roganjosh
Nov 22 at 17:55

1

@roganjosh done
– Jsevillamol
Nov 22 at 17:59

You have dictionaries inside pandas cells?
– roganjosh
Nov 22 at 17:51

@roganjosh I represented the rows as dictionaries, but they are not
– Jsevillamol
Nov 22 at 17:53

Could you add a small reproducible example to help understand the problem?
– Sven Harris
Nov 22 at 17:54

Then this is actually pretty confusing. Please make a small DF that actually illustrates your task. It need only be 3 or 4 rows of junk data, so long as it illustrates the point
– roganjosh
Nov 22 at 17:55

@roganjosh done
– Jsevillamol
Nov 22 at 17:59

add a comment |

2 Answers
2

active

oldest

votes

One way using to_dict

df2 = df1.groupby('img_filename')['obj_size','obj_type'].apply(lambda x: x.to_dict('records'))

df2 = df2.reset_index(name='obj')



# Assuming you have multiple same img files with different sizes then I'm choosing first.

# If this not the case then groupby directly and reset index.

#df1.groupby('img_filename, 'img_size')['obj_size','obj_type'].apply(lambda x: x.to_dict('records'))



df2['img_size'] = df1.groupby('img_filename')['img_size'].first().values



print (df2)



  img_filename                                                obj img_size

0     img1.txt  [{'obj_size': '5', 'obj_type': 'car'}, {'obj_s...       20

1     img2.txt             [{'obj_size': '4', 'obj_type': 'car'}]       25

edited Nov 22 at 19:01

answered Nov 22 at 18:24

Abhi

2,479320

add a comment |

One liner.

Suppose you have same img_filename and different img_size and you want to join the value.
For ex:

  img_filename img_size obj_size obj_type

0     img1.txt       20        5      car

1     img1.txt       22        6      bus

2     img2.txt       25        4      car



# if you want to join the img_size of img1.txt like 20, 22

df2 = df1.groupby("img_filename")["img_size", "obj_size", "obj_type"].apply(lambda x: pd.Series({"obj": x[["obj_size", "obj_type"]].to_json(orient="records"), "img_size": ','.join(x["img_size"])})).reset_index()

Output:

  img_filename                                                obj img_size

0     img1.txt  [{"obj_size":"5","obj_type":"car"},{"obj_size"...    20,22

1     img2.txt                [{"obj_size":"4","obj_type":"car"}]       25

Considering first value

#if you want to consider only first value i.e. 20

df2 = df1.groupby("img_filename")["img_size", "obj_size", "obj_type"].apply(lambda x: pd.Series({"obj": x[["obj_size", "obj_type"]].to_json(orient="records"), "img_size": x["img_size"].iloc[0]})).reset_index()

Output:

  img_filename                                                obj img_size

0     img1.txt  [{"obj_size":"5","obj_type":"car"},{"obj_size"...       20

1     img2.txt                [{"obj_size":"4","obj_type":"car"}]       25

edited Nov 22 at 19:28

answered Nov 22 at 18:42

Srce Cde

1,136411

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53436055%2faggregate-dataframe-rows-into-a-dictionary%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

One way using to_dict

df2 = df1.groupby('img_filename')['obj_size','obj_type'].apply(lambda x: x.to_dict('records'))

df2 = df2.reset_index(name='obj')



# Assuming you have multiple same img files with different sizes then I'm choosing first.

# If this not the case then groupby directly and reset index.

#df1.groupby('img_filename, 'img_size')['obj_size','obj_type'].apply(lambda x: x.to_dict('records'))



df2['img_size'] = df1.groupby('img_filename')['img_size'].first().values



print (df2)



  img_filename                                                obj img_size

0     img1.txt  [{'obj_size': '5', 'obj_type': 'car'}, {'obj_s...       20

1     img2.txt             [{'obj_size': '4', 'obj_type': 'car'}]       25

edited Nov 22 at 19:01

answered Nov 22 at 18:24

Abhi

2,479320

add a comment |

One way using to_dict

df2 = df1.groupby('img_filename')['obj_size','obj_type'].apply(lambda x: x.to_dict('records'))

df2 = df2.reset_index(name='obj')



# Assuming you have multiple same img files with different sizes then I'm choosing first.

# If this not the case then groupby directly and reset index.

#df1.groupby('img_filename, 'img_size')['obj_size','obj_type'].apply(lambda x: x.to_dict('records'))



df2['img_size'] = df1.groupby('img_filename')['img_size'].first().values



print (df2)



  img_filename                                                obj img_size

0     img1.txt  [{'obj_size': '5', 'obj_type': 'car'}, {'obj_s...       20

1     img2.txt             [{'obj_size': '4', 'obj_type': 'car'}]       25

edited Nov 22 at 19:01

answered Nov 22 at 18:24

Abhi

2,479320

add a comment |

One way using to_dict

df2 = df1.groupby('img_filename')['obj_size','obj_type'].apply(lambda x: x.to_dict('records'))

df2 = df2.reset_index(name='obj')



# Assuming you have multiple same img files with different sizes then I'm choosing first.

# If this not the case then groupby directly and reset index.

#df1.groupby('img_filename, 'img_size')['obj_size','obj_type'].apply(lambda x: x.to_dict('records'))



df2['img_size'] = df1.groupby('img_filename')['img_size'].first().values



print (df2)



  img_filename                                                obj img_size

0     img1.txt  [{'obj_size': '5', 'obj_type': 'car'}, {'obj_s...       20

1     img2.txt             [{'obj_size': '4', 'obj_type': 'car'}]       25

edited Nov 22 at 19:01

answered Nov 22 at 18:24

Abhi

2,479320

One way using to_dict

df2 = df1.groupby('img_filename')['obj_size','obj_type'].apply(lambda x: x.to_dict('records'))

df2 = df2.reset_index(name='obj')



# Assuming you have multiple same img files with different sizes then I'm choosing first.

# If this not the case then groupby directly and reset index.

#df1.groupby('img_filename, 'img_size')['obj_size','obj_type'].apply(lambda x: x.to_dict('records'))



df2['img_size'] = df1.groupby('img_filename')['img_size'].first().values



print (df2)



  img_filename                                                obj img_size

0     img1.txt  [{'obj_size': '5', 'obj_type': 'car'}, {'obj_s...       20

1     img2.txt             [{'obj_size': '4', 'obj_type': 'car'}]       25

edited Nov 22 at 19:01

answered Nov 22 at 18:24

Abhi

2,479320

edited Nov 22 at 19:01

answered Nov 22 at 18:24

Abhi

2,479320

answered Nov 22 at 18:24

Abhi

2,479320

answered Nov 22 at 18:24

Abhi

2,479320

add a comment |

One liner.

Suppose you have same img_filename and different img_size and you want to join the value.
For ex:

  img_filename img_size obj_size obj_type

0     img1.txt       20        5      car

1     img1.txt       22        6      bus

2     img2.txt       25        4      car



# if you want to join the img_size of img1.txt like 20, 22

df2 = df1.groupby("img_filename")["img_size", "obj_size", "obj_type"].apply(lambda x: pd.Series({"obj": x[["obj_size", "obj_type"]].to_json(orient="records"), "img_size": ','.join(x["img_size"])})).reset_index()

Output:

  img_filename                                                obj img_size

0     img1.txt  [{"obj_size":"5","obj_type":"car"},{"obj_size"...    20,22

1     img2.txt                [{"obj_size":"4","obj_type":"car"}]       25

Considering first value

#if you want to consider only first value i.e. 20

df2 = df1.groupby("img_filename")["img_size", "obj_size", "obj_type"].apply(lambda x: pd.Series({"obj": x[["obj_size", "obj_type"]].to_json(orient="records"), "img_size": x["img_size"].iloc[0]})).reset_index()

Output:

  img_filename                                                obj img_size

0     img1.txt  [{"obj_size":"5","obj_type":"car"},{"obj_size"...       20

1     img2.txt                [{"obj_size":"4","obj_type":"car"}]       25

edited Nov 22 at 19:28

answered Nov 22 at 18:42

Srce Cde

1,136411

add a comment |

One liner.

Suppose you have same img_filename and different img_size and you want to join the value.
For ex:

  img_filename img_size obj_size obj_type

0     img1.txt       20        5      car

1     img1.txt       22        6      bus

2     img2.txt       25        4      car



# if you want to join the img_size of img1.txt like 20, 22

df2 = df1.groupby("img_filename")["img_size", "obj_size", "obj_type"].apply(lambda x: pd.Series({"obj": x[["obj_size", "obj_type"]].to_json(orient="records"), "img_size": ','.join(x["img_size"])})).reset_index()

Output:

  img_filename                                                obj img_size

0     img1.txt  [{"obj_size":"5","obj_type":"car"},{"obj_size"...    20,22

1     img2.txt                [{"obj_size":"4","obj_type":"car"}]       25

Considering first value

#if you want to consider only first value i.e. 20

df2 = df1.groupby("img_filename")["img_size", "obj_size", "obj_type"].apply(lambda x: pd.Series({"obj": x[["obj_size", "obj_type"]].to_json(orient="records"), "img_size": x["img_size"].iloc[0]})).reset_index()

Output:

  img_filename                                                obj img_size

0     img1.txt  [{"obj_size":"5","obj_type":"car"},{"obj_size"...       20

1     img2.txt                [{"obj_size":"4","obj_type":"car"}]       25

edited Nov 22 at 19:28

answered Nov 22 at 18:42

Srce Cde

1,136411

add a comment |

One liner.

Suppose you have same img_filename and different img_size and you want to join the value.
For ex:

  img_filename img_size obj_size obj_type

0     img1.txt       20        5      car

1     img1.txt       22        6      bus

2     img2.txt       25        4      car



# if you want to join the img_size of img1.txt like 20, 22

df2 = df1.groupby("img_filename")["img_size", "obj_size", "obj_type"].apply(lambda x: pd.Series({"obj": x[["obj_size", "obj_type"]].to_json(orient="records"), "img_size": ','.join(x["img_size"])})).reset_index()

Output:

  img_filename                                                obj img_size

0     img1.txt  [{"obj_size":"5","obj_type":"car"},{"obj_size"...    20,22

1     img2.txt                [{"obj_size":"4","obj_type":"car"}]       25

Considering first value

#if you want to consider only first value i.e. 20

df2 = df1.groupby("img_filename")["img_size", "obj_size", "obj_type"].apply(lambda x: pd.Series({"obj": x[["obj_size", "obj_type"]].to_json(orient="records"), "img_size": x["img_size"].iloc[0]})).reset_index()

Output:

  img_filename                                                obj img_size

0     img1.txt  [{"obj_size":"5","obj_type":"car"},{"obj_size"...       20

1     img2.txt                [{"obj_size":"4","obj_type":"car"}]       25

edited Nov 22 at 19:28

answered Nov 22 at 18:42

Srce Cde

1,136411

One liner.

Suppose you have same img_filename and different img_size and you want to join the value.
For ex:

  img_filename img_size obj_size obj_type

0     img1.txt       20        5      car

1     img1.txt       22        6      bus

2     img2.txt       25        4      car



# if you want to join the img_size of img1.txt like 20, 22

df2 = df1.groupby("img_filename")["img_size", "obj_size", "obj_type"].apply(lambda x: pd.Series({"obj": x[["obj_size", "obj_type"]].to_json(orient="records"), "img_size": ','.join(x["img_size"])})).reset_index()

Output:

  img_filename                                                obj img_size

0     img1.txt  [{"obj_size":"5","obj_type":"car"},{"obj_size"...    20,22

1     img2.txt                [{"obj_size":"4","obj_type":"car"}]       25

Considering first value

#if you want to consider only first value i.e. 20

df2 = df1.groupby("img_filename")["img_size", "obj_size", "obj_type"].apply(lambda x: pd.Series({"obj": x[["obj_size", "obj_type"]].to_json(orient="records"), "img_size": x["img_size"].iloc[0]})).reset_index()

Output:

  img_filename                                                obj img_size

0     img1.txt  [{"obj_size":"5","obj_type":"car"},{"obj_size"...       20

1     img2.txt                [{"obj_size":"4","obj_type":"car"}]       25

edited Nov 22 at 19:28

answered Nov 22 at 18:42

Srce Cde

1,136411

edited Nov 22 at 19:28

answered Nov 22 at 18:42

Srce Cde

1,136411

answered Nov 22 at 18:42

Srce Cde

1,136411

answered Nov 22 at 18:42

Srce Cde

1,136411

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

Some of your past answers have not been well-received, and you're in danger of being blocked from answering.

Please pay close attention to the following guidance:

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Qfyilyi