Sorting Strings in Python based on visual similarity
up vote
0
down vote
favorite
I am looking to Sort elements in a List or Array based on their visual similarity. For example for the Strings below:
MyStrings=["Hello","Reed","Hell","Olleh","Red","Hello2"]
I would expect the sorted List to be:
["Hello","Hell","Hello2","Red","Reed","Olleh"]
Notice that my wish is fulfilled as long as similar elements are next to each other. For example below could work totally fine as well:
["Red","Reed","Hell","Hello","Hello2,"Olleh"]
I know that Levenshtein Distance function in combination of Sort function could be helpful, but I am open to any other suggestions.
I would give extra context but you can safely ignore these if it unnecessarily complicates the matter. These Strings are attributes of an item that will sit at the X axis of a regression model. Since they are not of numeric nature, I need to have a logical way to sort and place them at the X axis. If they are listed randomly, the expected trend in the Y axis will not be realized for the model to work. I am as well considering feeding these elements into a Neural Network. But again the Mapping of the String to Numeric values becomes very important. In my opinion the mapping function would still need to apply the same sorting logic in a similar way. I would highly appreciate any insights about this problem
python sorting neural-network regression levenshtein-distance
add a comment |
up vote
0
down vote
favorite
I am looking to Sort elements in a List or Array based on their visual similarity. For example for the Strings below:
MyStrings=["Hello","Reed","Hell","Olleh","Red","Hello2"]
I would expect the sorted List to be:
["Hello","Hell","Hello2","Red","Reed","Olleh"]
Notice that my wish is fulfilled as long as similar elements are next to each other. For example below could work totally fine as well:
["Red","Reed","Hell","Hello","Hello2,"Olleh"]
I know that Levenshtein Distance function in combination of Sort function could be helpful, but I am open to any other suggestions.
I would give extra context but you can safely ignore these if it unnecessarily complicates the matter. These Strings are attributes of an item that will sit at the X axis of a regression model. Since they are not of numeric nature, I need to have a logical way to sort and place them at the X axis. If they are listed randomly, the expected trend in the Y axis will not be realized for the model to work. I am as well considering feeding these elements into a Neural Network. But again the Mapping of the String to Numeric values becomes very important. In my opinion the mapping function would still need to apply the same sorting logic in a similar way. I would highly appreciate any insights about this problem
python sorting neural-network regression levenshtein-distance
Compute the matrix of distances between your words (if that defines a similarity measure), do a 1D PCA, and use the ordering on the single coordinates (knowing that you can have warps, so if you don't want them, you may want to do 2D or more and find a 1D curve int he embedded space).
– Matthieu Brucher
Nov 22 at 17:01
add a comment |
up vote
0
down vote
favorite
up vote
0
down vote
favorite
I am looking to Sort elements in a List or Array based on their visual similarity. For example for the Strings below:
MyStrings=["Hello","Reed","Hell","Olleh","Red","Hello2"]
I would expect the sorted List to be:
["Hello","Hell","Hello2","Red","Reed","Olleh"]
Notice that my wish is fulfilled as long as similar elements are next to each other. For example below could work totally fine as well:
["Red","Reed","Hell","Hello","Hello2,"Olleh"]
I know that Levenshtein Distance function in combination of Sort function could be helpful, but I am open to any other suggestions.
I would give extra context but you can safely ignore these if it unnecessarily complicates the matter. These Strings are attributes of an item that will sit at the X axis of a regression model. Since they are not of numeric nature, I need to have a logical way to sort and place them at the X axis. If they are listed randomly, the expected trend in the Y axis will not be realized for the model to work. I am as well considering feeding these elements into a Neural Network. But again the Mapping of the String to Numeric values becomes very important. In my opinion the mapping function would still need to apply the same sorting logic in a similar way. I would highly appreciate any insights about this problem
python sorting neural-network regression levenshtein-distance
I am looking to Sort elements in a List or Array based on their visual similarity. For example for the Strings below:
MyStrings=["Hello","Reed","Hell","Olleh","Red","Hello2"]
I would expect the sorted List to be:
["Hello","Hell","Hello2","Red","Reed","Olleh"]
Notice that my wish is fulfilled as long as similar elements are next to each other. For example below could work totally fine as well:
["Red","Reed","Hell","Hello","Hello2,"Olleh"]
I know that Levenshtein Distance function in combination of Sort function could be helpful, but I am open to any other suggestions.
I would give extra context but you can safely ignore these if it unnecessarily complicates the matter. These Strings are attributes of an item that will sit at the X axis of a regression model. Since they are not of numeric nature, I need to have a logical way to sort and place them at the X axis. If they are listed randomly, the expected trend in the Y axis will not be realized for the model to work. I am as well considering feeding these elements into a Neural Network. But again the Mapping of the String to Numeric values becomes very important. In my opinion the mapping function would still need to apply the same sorting logic in a similar way. I would highly appreciate any insights about this problem
python sorting neural-network regression levenshtein-distance
python sorting neural-network regression levenshtein-distance
edited Nov 22 at 16:59
Matthieu Brucher
10.1k21935
10.1k21935
asked Nov 22 at 16:57
Saman
32
32
Compute the matrix of distances between your words (if that defines a similarity measure), do a 1D PCA, and use the ordering on the single coordinates (knowing that you can have warps, so if you don't want them, you may want to do 2D or more and find a 1D curve int he embedded space).
– Matthieu Brucher
Nov 22 at 17:01
add a comment |
Compute the matrix of distances between your words (if that defines a similarity measure), do a 1D PCA, and use the ordering on the single coordinates (knowing that you can have warps, so if you don't want them, you may want to do 2D or more and find a 1D curve int he embedded space).
– Matthieu Brucher
Nov 22 at 17:01
Compute the matrix of distances between your words (if that defines a similarity measure), do a 1D PCA, and use the ordering on the single coordinates (knowing that you can have warps, so if you don't want them, you may want to do 2D or more and find a 1D curve int he embedded space).
– Matthieu Brucher
Nov 22 at 17:01
Compute the matrix of distances between your words (if that defines a similarity measure), do a 1D PCA, and use the ordering on the single coordinates (knowing that you can have warps, so if you don't want them, you may want to do 2D or more and find a 1D curve int he embedded space).
– Matthieu Brucher
Nov 22 at 17:01
add a comment |
active
oldest
votes
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53435465%2fsorting-strings-in-python-based-on-visual-similarity%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
active
oldest
votes
active
oldest
votes
active
oldest
votes
active
oldest
votes
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53435465%2fsorting-strings-in-python-based-on-visual-similarity%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Compute the matrix of distances between your words (if that defines a similarity measure), do a 1D PCA, and use the ordering on the single coordinates (knowing that you can have warps, so if you don't want them, you may want to do 2D or more and find a 1D curve int he embedded space).
– Matthieu Brucher
Nov 22 at 17:01