Aggregate dataframe rows into a dictionary
I have a pandas DataFrame object where each row represents one object in an image.
One example of a possible row would be:
{'img_filename': 'img1.txt', 'img_size':'20', 'obj_size':'5', 'obj_type':'car'}
I want to aggregate all the objects that belong to the same image, and get something whose rows would be like:
{'img_filename': 'img1.txt', 'img_size':'20', 'obj': [{'obj_size':'5', 'obj_type':'car'}, {{'obj_size':'6', 'obj_type':'bus'}}]}
That is, the third column is a list of columns containing the data of each group.
How can I do this?
EDIT:
Consider the following example.
import pandas as pd
df1 = pd.DataFrame([
{'img_filename': 'img1.txt', 'img_size':'20', 'obj_size':'5', 'obj_type':'car'},
{'img_filename': 'img1.txt', 'img_size':'20', 'obj_size':'6', 'obj_type':'bus'},
{'img_filename': 'img2.txt', 'img_size':'25', 'obj_size':'4', 'obj_type':'car'}
])
df2 = pd.DataFrame([
{'img_filename': 'img1.txt', 'img_size':'20', 'obj': [{'obj_size':'5', 'obj_type':'car'}, {'obj_size':'6', 'obj_type':'bus'}]},
{'img_filename': 'img2.txt', 'img_size':'25', 'obj': [{'obj_size':'4', 'obj_type':'car'}]}
])
I want to turn df1
into df2
.
python pandas
add a comment |
I have a pandas DataFrame object where each row represents one object in an image.
One example of a possible row would be:
{'img_filename': 'img1.txt', 'img_size':'20', 'obj_size':'5', 'obj_type':'car'}
I want to aggregate all the objects that belong to the same image, and get something whose rows would be like:
{'img_filename': 'img1.txt', 'img_size':'20', 'obj': [{'obj_size':'5', 'obj_type':'car'}, {{'obj_size':'6', 'obj_type':'bus'}}]}
That is, the third column is a list of columns containing the data of each group.
How can I do this?
EDIT:
Consider the following example.
import pandas as pd
df1 = pd.DataFrame([
{'img_filename': 'img1.txt', 'img_size':'20', 'obj_size':'5', 'obj_type':'car'},
{'img_filename': 'img1.txt', 'img_size':'20', 'obj_size':'6', 'obj_type':'bus'},
{'img_filename': 'img2.txt', 'img_size':'25', 'obj_size':'4', 'obj_type':'car'}
])
df2 = pd.DataFrame([
{'img_filename': 'img1.txt', 'img_size':'20', 'obj': [{'obj_size':'5', 'obj_type':'car'}, {'obj_size':'6', 'obj_type':'bus'}]},
{'img_filename': 'img2.txt', 'img_size':'25', 'obj': [{'obj_size':'4', 'obj_type':'car'}]}
])
I want to turn df1
into df2
.
python pandas
You have dictionaries inside pandas cells?
– roganjosh
Nov 22 at 17:51
@roganjosh I represented the rows as dictionaries, but they are not
– Jsevillamol
Nov 22 at 17:53
Could you add a small reproducible example to help understand the problem?
– Sven Harris
Nov 22 at 17:54
1
Then this is actually pretty confusing. Please make a small DF that actually illustrates your task. It need only be 3 or 4 rows of junk data, so long as it illustrates the point
– roganjosh
Nov 22 at 17:55
1
@roganjosh done
– Jsevillamol
Nov 22 at 17:59
add a comment |
I have a pandas DataFrame object where each row represents one object in an image.
One example of a possible row would be:
{'img_filename': 'img1.txt', 'img_size':'20', 'obj_size':'5', 'obj_type':'car'}
I want to aggregate all the objects that belong to the same image, and get something whose rows would be like:
{'img_filename': 'img1.txt', 'img_size':'20', 'obj': [{'obj_size':'5', 'obj_type':'car'}, {{'obj_size':'6', 'obj_type':'bus'}}]}
That is, the third column is a list of columns containing the data of each group.
How can I do this?
EDIT:
Consider the following example.
import pandas as pd
df1 = pd.DataFrame([
{'img_filename': 'img1.txt', 'img_size':'20', 'obj_size':'5', 'obj_type':'car'},
{'img_filename': 'img1.txt', 'img_size':'20', 'obj_size':'6', 'obj_type':'bus'},
{'img_filename': 'img2.txt', 'img_size':'25', 'obj_size':'4', 'obj_type':'car'}
])
df2 = pd.DataFrame([
{'img_filename': 'img1.txt', 'img_size':'20', 'obj': [{'obj_size':'5', 'obj_type':'car'}, {'obj_size':'6', 'obj_type':'bus'}]},
{'img_filename': 'img2.txt', 'img_size':'25', 'obj': [{'obj_size':'4', 'obj_type':'car'}]}
])
I want to turn df1
into df2
.
python pandas
I have a pandas DataFrame object where each row represents one object in an image.
One example of a possible row would be:
{'img_filename': 'img1.txt', 'img_size':'20', 'obj_size':'5', 'obj_type':'car'}
I want to aggregate all the objects that belong to the same image, and get something whose rows would be like:
{'img_filename': 'img1.txt', 'img_size':'20', 'obj': [{'obj_size':'5', 'obj_type':'car'}, {{'obj_size':'6', 'obj_type':'bus'}}]}
That is, the third column is a list of columns containing the data of each group.
How can I do this?
EDIT:
Consider the following example.
import pandas as pd
df1 = pd.DataFrame([
{'img_filename': 'img1.txt', 'img_size':'20', 'obj_size':'5', 'obj_type':'car'},
{'img_filename': 'img1.txt', 'img_size':'20', 'obj_size':'6', 'obj_type':'bus'},
{'img_filename': 'img2.txt', 'img_size':'25', 'obj_size':'4', 'obj_type':'car'}
])
df2 = pd.DataFrame([
{'img_filename': 'img1.txt', 'img_size':'20', 'obj': [{'obj_size':'5', 'obj_type':'car'}, {'obj_size':'6', 'obj_type':'bus'}]},
{'img_filename': 'img2.txt', 'img_size':'25', 'obj': [{'obj_size':'4', 'obj_type':'car'}]}
])
I want to turn df1
into df2
.
python pandas
python pandas
edited Nov 22 at 17:59
asked Nov 22 at 17:46
Jsevillamol
628617
628617
You have dictionaries inside pandas cells?
– roganjosh
Nov 22 at 17:51
@roganjosh I represented the rows as dictionaries, but they are not
– Jsevillamol
Nov 22 at 17:53
Could you add a small reproducible example to help understand the problem?
– Sven Harris
Nov 22 at 17:54
1
Then this is actually pretty confusing. Please make a small DF that actually illustrates your task. It need only be 3 or 4 rows of junk data, so long as it illustrates the point
– roganjosh
Nov 22 at 17:55
1
@roganjosh done
– Jsevillamol
Nov 22 at 17:59
add a comment |
You have dictionaries inside pandas cells?
– roganjosh
Nov 22 at 17:51
@roganjosh I represented the rows as dictionaries, but they are not
– Jsevillamol
Nov 22 at 17:53
Could you add a small reproducible example to help understand the problem?
– Sven Harris
Nov 22 at 17:54
1
Then this is actually pretty confusing. Please make a small DF that actually illustrates your task. It need only be 3 or 4 rows of junk data, so long as it illustrates the point
– roganjosh
Nov 22 at 17:55
1
@roganjosh done
– Jsevillamol
Nov 22 at 17:59
You have dictionaries inside pandas cells?
– roganjosh
Nov 22 at 17:51
You have dictionaries inside pandas cells?
– roganjosh
Nov 22 at 17:51
@roganjosh I represented the rows as dictionaries, but they are not
– Jsevillamol
Nov 22 at 17:53
@roganjosh I represented the rows as dictionaries, but they are not
– Jsevillamol
Nov 22 at 17:53
Could you add a small reproducible example to help understand the problem?
– Sven Harris
Nov 22 at 17:54
Could you add a small reproducible example to help understand the problem?
– Sven Harris
Nov 22 at 17:54
1
1
Then this is actually pretty confusing. Please make a small DF that actually illustrates your task. It need only be 3 or 4 rows of junk data, so long as it illustrates the point
– roganjosh
Nov 22 at 17:55
Then this is actually pretty confusing. Please make a small DF that actually illustrates your task. It need only be 3 or 4 rows of junk data, so long as it illustrates the point
– roganjosh
Nov 22 at 17:55
1
1
@roganjosh done
– Jsevillamol
Nov 22 at 17:59
@roganjosh done
– Jsevillamol
Nov 22 at 17:59
add a comment |
2 Answers
2
active
oldest
votes
One way using to_dict
df2 = df1.groupby('img_filename')['obj_size','obj_type'].apply(lambda x: x.to_dict('records'))
df2 = df2.reset_index(name='obj')
# Assuming you have multiple same img files with different sizes then I'm choosing first.
# If this not the case then groupby directly and reset index.
#df1.groupby('img_filename, 'img_size')['obj_size','obj_type'].apply(lambda x: x.to_dict('records'))
df2['img_size'] = df1.groupby('img_filename')['img_size'].first().values
print (df2)
img_filename obj img_size
0 img1.txt [{'obj_size': '5', 'obj_type': 'car'}, {'obj_s... 20
1 img2.txt [{'obj_size': '4', 'obj_type': 'car'}] 25
add a comment |
One liner.
Suppose you have same img_filename
and different img_size
and you want to join
the value.
For ex:
img_filename img_size obj_size obj_type
0 img1.txt 20 5 car
1 img1.txt 22 6 bus
2 img2.txt 25 4 car
# if you want to join the img_size of img1.txt like 20, 22
df2 = df1.groupby("img_filename")["img_size", "obj_size", "obj_type"].apply(lambda x: pd.Series({"obj": x[["obj_size", "obj_type"]].to_json(orient="records"), "img_size": ','.join(x["img_size"])})).reset_index()
Output:
img_filename obj img_size
0 img1.txt [{"obj_size":"5","obj_type":"car"},{"obj_size"... 20,22
1 img2.txt [{"obj_size":"4","obj_type":"car"}] 25
Considering first value
#if you want to consider only first value i.e. 20
df2 = df1.groupby("img_filename")["img_size", "obj_size", "obj_type"].apply(lambda x: pd.Series({"obj": x[["obj_size", "obj_type"]].to_json(orient="records"), "img_size": x["img_size"].iloc[0]})).reset_index()
Output:
img_filename obj img_size
0 img1.txt [{"obj_size":"5","obj_type":"car"},{"obj_size"... 20
1 img2.txt [{"obj_size":"4","obj_type":"car"}] 25
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53436055%2faggregate-dataframe-rows-into-a-dictionary%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
One way using to_dict
df2 = df1.groupby('img_filename')['obj_size','obj_type'].apply(lambda x: x.to_dict('records'))
df2 = df2.reset_index(name='obj')
# Assuming you have multiple same img files with different sizes then I'm choosing first.
# If this not the case then groupby directly and reset index.
#df1.groupby('img_filename, 'img_size')['obj_size','obj_type'].apply(lambda x: x.to_dict('records'))
df2['img_size'] = df1.groupby('img_filename')['img_size'].first().values
print (df2)
img_filename obj img_size
0 img1.txt [{'obj_size': '5', 'obj_type': 'car'}, {'obj_s... 20
1 img2.txt [{'obj_size': '4', 'obj_type': 'car'}] 25
add a comment |
One way using to_dict
df2 = df1.groupby('img_filename')['obj_size','obj_type'].apply(lambda x: x.to_dict('records'))
df2 = df2.reset_index(name='obj')
# Assuming you have multiple same img files with different sizes then I'm choosing first.
# If this not the case then groupby directly and reset index.
#df1.groupby('img_filename, 'img_size')['obj_size','obj_type'].apply(lambda x: x.to_dict('records'))
df2['img_size'] = df1.groupby('img_filename')['img_size'].first().values
print (df2)
img_filename obj img_size
0 img1.txt [{'obj_size': '5', 'obj_type': 'car'}, {'obj_s... 20
1 img2.txt [{'obj_size': '4', 'obj_type': 'car'}] 25
add a comment |
One way using to_dict
df2 = df1.groupby('img_filename')['obj_size','obj_type'].apply(lambda x: x.to_dict('records'))
df2 = df2.reset_index(name='obj')
# Assuming you have multiple same img files with different sizes then I'm choosing first.
# If this not the case then groupby directly and reset index.
#df1.groupby('img_filename, 'img_size')['obj_size','obj_type'].apply(lambda x: x.to_dict('records'))
df2['img_size'] = df1.groupby('img_filename')['img_size'].first().values
print (df2)
img_filename obj img_size
0 img1.txt [{'obj_size': '5', 'obj_type': 'car'}, {'obj_s... 20
1 img2.txt [{'obj_size': '4', 'obj_type': 'car'}] 25
One way using to_dict
df2 = df1.groupby('img_filename')['obj_size','obj_type'].apply(lambda x: x.to_dict('records'))
df2 = df2.reset_index(name='obj')
# Assuming you have multiple same img files with different sizes then I'm choosing first.
# If this not the case then groupby directly and reset index.
#df1.groupby('img_filename, 'img_size')['obj_size','obj_type'].apply(lambda x: x.to_dict('records'))
df2['img_size'] = df1.groupby('img_filename')['img_size'].first().values
print (df2)
img_filename obj img_size
0 img1.txt [{'obj_size': '5', 'obj_type': 'car'}, {'obj_s... 20
1 img2.txt [{'obj_size': '4', 'obj_type': 'car'}] 25
edited Nov 22 at 19:01
answered Nov 22 at 18:24
Abhi
2,479320
2,479320
add a comment |
add a comment |
One liner.
Suppose you have same img_filename
and different img_size
and you want to join
the value.
For ex:
img_filename img_size obj_size obj_type
0 img1.txt 20 5 car
1 img1.txt 22 6 bus
2 img2.txt 25 4 car
# if you want to join the img_size of img1.txt like 20, 22
df2 = df1.groupby("img_filename")["img_size", "obj_size", "obj_type"].apply(lambda x: pd.Series({"obj": x[["obj_size", "obj_type"]].to_json(orient="records"), "img_size": ','.join(x["img_size"])})).reset_index()
Output:
img_filename obj img_size
0 img1.txt [{"obj_size":"5","obj_type":"car"},{"obj_size"... 20,22
1 img2.txt [{"obj_size":"4","obj_type":"car"}] 25
Considering first value
#if you want to consider only first value i.e. 20
df2 = df1.groupby("img_filename")["img_size", "obj_size", "obj_type"].apply(lambda x: pd.Series({"obj": x[["obj_size", "obj_type"]].to_json(orient="records"), "img_size": x["img_size"].iloc[0]})).reset_index()
Output:
img_filename obj img_size
0 img1.txt [{"obj_size":"5","obj_type":"car"},{"obj_size"... 20
1 img2.txt [{"obj_size":"4","obj_type":"car"}] 25
add a comment |
One liner.
Suppose you have same img_filename
and different img_size
and you want to join
the value.
For ex:
img_filename img_size obj_size obj_type
0 img1.txt 20 5 car
1 img1.txt 22 6 bus
2 img2.txt 25 4 car
# if you want to join the img_size of img1.txt like 20, 22
df2 = df1.groupby("img_filename")["img_size", "obj_size", "obj_type"].apply(lambda x: pd.Series({"obj": x[["obj_size", "obj_type"]].to_json(orient="records"), "img_size": ','.join(x["img_size"])})).reset_index()
Output:
img_filename obj img_size
0 img1.txt [{"obj_size":"5","obj_type":"car"},{"obj_size"... 20,22
1 img2.txt [{"obj_size":"4","obj_type":"car"}] 25
Considering first value
#if you want to consider only first value i.e. 20
df2 = df1.groupby("img_filename")["img_size", "obj_size", "obj_type"].apply(lambda x: pd.Series({"obj": x[["obj_size", "obj_type"]].to_json(orient="records"), "img_size": x["img_size"].iloc[0]})).reset_index()
Output:
img_filename obj img_size
0 img1.txt [{"obj_size":"5","obj_type":"car"},{"obj_size"... 20
1 img2.txt [{"obj_size":"4","obj_type":"car"}] 25
add a comment |
One liner.
Suppose you have same img_filename
and different img_size
and you want to join
the value.
For ex:
img_filename img_size obj_size obj_type
0 img1.txt 20 5 car
1 img1.txt 22 6 bus
2 img2.txt 25 4 car
# if you want to join the img_size of img1.txt like 20, 22
df2 = df1.groupby("img_filename")["img_size", "obj_size", "obj_type"].apply(lambda x: pd.Series({"obj": x[["obj_size", "obj_type"]].to_json(orient="records"), "img_size": ','.join(x["img_size"])})).reset_index()
Output:
img_filename obj img_size
0 img1.txt [{"obj_size":"5","obj_type":"car"},{"obj_size"... 20,22
1 img2.txt [{"obj_size":"4","obj_type":"car"}] 25
Considering first value
#if you want to consider only first value i.e. 20
df2 = df1.groupby("img_filename")["img_size", "obj_size", "obj_type"].apply(lambda x: pd.Series({"obj": x[["obj_size", "obj_type"]].to_json(orient="records"), "img_size": x["img_size"].iloc[0]})).reset_index()
Output:
img_filename obj img_size
0 img1.txt [{"obj_size":"5","obj_type":"car"},{"obj_size"... 20
1 img2.txt [{"obj_size":"4","obj_type":"car"}] 25
One liner.
Suppose you have same img_filename
and different img_size
and you want to join
the value.
For ex:
img_filename img_size obj_size obj_type
0 img1.txt 20 5 car
1 img1.txt 22 6 bus
2 img2.txt 25 4 car
# if you want to join the img_size of img1.txt like 20, 22
df2 = df1.groupby("img_filename")["img_size", "obj_size", "obj_type"].apply(lambda x: pd.Series({"obj": x[["obj_size", "obj_type"]].to_json(orient="records"), "img_size": ','.join(x["img_size"])})).reset_index()
Output:
img_filename obj img_size
0 img1.txt [{"obj_size":"5","obj_type":"car"},{"obj_size"... 20,22
1 img2.txt [{"obj_size":"4","obj_type":"car"}] 25
Considering first value
#if you want to consider only first value i.e. 20
df2 = df1.groupby("img_filename")["img_size", "obj_size", "obj_type"].apply(lambda x: pd.Series({"obj": x[["obj_size", "obj_type"]].to_json(orient="records"), "img_size": x["img_size"].iloc[0]})).reset_index()
Output:
img_filename obj img_size
0 img1.txt [{"obj_size":"5","obj_type":"car"},{"obj_size"... 20
1 img2.txt [{"obj_size":"4","obj_type":"car"}] 25
edited Nov 22 at 19:28
answered Nov 22 at 18:42
Srce Cde
1,136411
1,136411
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53436055%2faggregate-dataframe-rows-into-a-dictionary%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
You have dictionaries inside pandas cells?
– roganjosh
Nov 22 at 17:51
@roganjosh I represented the rows as dictionaries, but they are not
– Jsevillamol
Nov 22 at 17:53
Could you add a small reproducible example to help understand the problem?
– Sven Harris
Nov 22 at 17:54
1
Then this is actually pretty confusing. Please make a small DF that actually illustrates your task. It need only be 3 or 4 rows of junk data, so long as it illustrates the point
– roganjosh
Nov 22 at 17:55
1
@roganjosh done
– Jsevillamol
Nov 22 at 17:59