Sorting Strings in Python based on visual similarity























I want to sort the elements of a list or array by their visual similarity. For example, given the strings below:



MyStrings=["Hello","Reed","Hell","Olleh","Red","Hello2"]


I would expect the sorted List to be:



["Hello","Hell","Hello2","Red","Reed","Olleh"]


Any order is acceptable as long as similar elements end up next to each other. For example, the following would work just as well:



["Red","Reed","Hell","Hello","Hello2","Olleh"]


I know that a Levenshtein distance function combined with a sort function could be helpful, but I am open to any other suggestions.
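One possible reading of "Levenshtein distance combined with a sort" is sketched below. Since edit distance is pairwise, it cannot be used directly as a sort key, so this sketch assumes a greedy nearest-neighbour chain instead: start from the first string and repeatedly append the closest remaining one. The function names `levenshtein` and `order_by_similarity` are hypothetical, chosen for illustration.

```python
def levenshtein(a, b):
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # drop ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def order_by_similarity(strings):
    """Greedy chain: always append the remaining string closest to the last one."""
    if not strings:
        return []
    remaining = list(strings)
    ordered = [remaining.pop(0)]  # assumption: the first input string is the seed
    while remaining:
        last = ordered[-1]
        nearest = min(remaining, key=lambda s: levenshtein(last, s))
        remaining.remove(nearest)
        ordered.append(nearest)
    return ordered


my_strings = ["Hello", "Reed", "Hell", "Olleh", "Red", "Hello2"]
print(order_by_similarity(my_strings))
```

On the example list this keeps "Hello", "Hell" and "Hello2" together and puts "Red" next to "Reed". The result depends on the seed element and on how ties are broken, and the greedy chain is not guaranteed to find the ordering with the smallest total distance between neighbours.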



Here is some extra context, which you can safely ignore if it complicates the matter unnecessarily. These strings are attributes of an item and will sit on the X axis of a regression model. Since they are not numeric, I need a logical way to order and place them on the X axis. If they are placed randomly, the trend expected on the Y axis will not emerge and the model will not work. I am also considering feeding these elements into a neural network, but there too the mapping from strings to numeric values becomes very important. In my opinion, the mapping function would still need to apply the same sorting logic. I would highly appreciate any insights about this problem.


































  • Compute the matrix of distances between your words (if that defines a similarity measure), do a 1D PCA, and order the words by their single coordinate. Be aware that the result can be warped; if you want to avoid that, you may want to embed in 2D or more and find a 1D curve in the embedded space.
    – Matthieu Brucher
    Nov 22 at 17:01
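A minimal sketch of the suggestion in this comment, under stated assumptions: the distance is Levenshtein edit distance, and "1D PCA" on a distance matrix is interpreted as classical multidimensional scaling (double-centre the squared distances, then keep the leading eigenvector). NumPy is assumed to be available, and `embed_1d` is a hypothetical name.

```python
import numpy as np


def levenshtein(a, b):
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + (ca != cb)))
        previous = current
    return previous[-1]


def embed_1d(strings):
    """Classical MDS: one coordinate per string from the pairwise distance matrix."""
    n = len(strings)
    dist = np.array([[levenshtein(a, b) for b in strings] for a in strings],
                    dtype=float)
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centred squared distances act as a Gram (inner-product) matrix.
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigenvalues, eigenvectors = np.linalg.eigh(gram)  # eigenvalues ascending
    # Leading axis, scaled so coordinate gaps are on the distance scale.
    return eigenvectors[:, -1] * np.sqrt(max(eigenvalues[-1], 0.0))


my_strings = ["Hello", "Reed", "Hell", "Olleh", "Red", "Hello2"]
coords = embed_1d(my_strings)
ordered = [s for _, s in sorted(zip(coords, my_strings))]
print(ordered)
print(dict(zip(my_strings, np.round(coords, 2))))
```

The coordinates double as a string-to-number mapping for a regression axis. The sign of an eigenvector is arbitrary, so the ordering may come out reversed, and a single axis can only approximate the full distance matrix, which is the warping the comment warns about.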

















python sorting neural-network regression levenshtein-distance














edited Nov 22 at 16:59 by Matthieu Brucher
asked Nov 22 at 16:57 by Saman





































