Split a paragraph into a list of strings, each not exceeding a given size and avoiding splitting words in...











2 votes












Question



How can the following be done in an idiomatic way:
Split a large String into a list of Strings, each not exceeding the given size, and avoiding splitting words in half.



Closest solution with String.chunked() (Splits words)



The closest built-in solution is the String class's chunked() method. However, it cuts at fixed character offsets,
so it splits words in the given String.



Code example of use of String.chunked()



val longString = "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod " +
        "tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, " +
        "quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo " +
        "consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse " +
        "cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non " +
        "proident, sunt in culpa qui officia deserunt mollit anim id est laborum. "

// Split [longString] into a list of chunks of at most 40 characters
val listOfStrings = longString.chunked(40)
listOfStrings.forEach {
    println(it)
}


Example output of closest example with String.chunked()



Below is the output of the example code. As can be seen, words are split at the ends of the lines (for example "ali" / "qua." and "nostr" / "ud").



Lorem ipsum dolor sit amet, consectetur
adipisicing elit, sed do eiusmod tempor
incididunt ut labore et dolore magna ali
qua. Ut enim ad minim veniam, quis nostr
ud exercitation ullamco laboris nisi ut
aliquip ex ea commodo consequat. Duis au
te irure dolor in reprehenderit in volup
tate velit esse cillum dolore eu fugiat
nulla pariatur. Excepteur sint occaecat
cupidatat non proident, sunt in culpa qu
i officia deserunt mollit anim id est la
borum.
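To make the behaviour concrete on a shorter input, here is a minimal sketch. The function name demoChunked is only illustrative; chunked() is the standard-library call used in the question.

```kotlin
// chunked(size) cuts at fixed character offsets and knows nothing about words.
// demoChunked is a hypothetical name used only for this illustration.
fun demoChunked(): List<String> = "hello world".chunked(4)

// demoChunked() returns ["hell", "o wo", "rld"]: both words are cut in half.
```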





























  • 1




    Dunno about kotlin, but I would first split into single words, then reassemble words into strings that do not exceed limit
    – ravenspoint
    Nov 22 at 15:36
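The approach suggested in this comment could be sketched in Kotlin roughly as follows. This is only one possible reading of the comment, not code from the thread: the name wrapWords and its parameters are assumptions, words are taken to be separated by any whitespace, and a single word longer than the limit is left on a line of its own rather than being split.

```kotlin
// Hypothetical helper: split into words, then reassemble them into lines
// whose length does not exceed `limit` (unless a single word is longer).
fun wrapWords(text: String, limit: Int): List<String> =
    text.split(Regex("\\s+"))
        .filter { it.isNotEmpty() }
        .fold(mutableListOf<String>()) { lines, word ->
            val last = lines.lastOrNull()
            if (last != null && last.length + 1 + word.length <= limit) {
                // The word still fits: extend the current line.
                lines[lines.lastIndex] = "$last $word"
            } else {
                // Otherwise start a new line with this word.
                lines += word
            }
            lines
        }
```

For example, wrapWords("Lorem ipsum dolor sit amet", 11) gives the three lines "Lorem ipsum", "dolor sit" and "amet".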















string kotlin














edited Nov 22 at 16:07 by Poul Bak

asked Nov 22 at 15:32 by AliAvci




















2 Answers

















3 votes, accepted










Probably not the most idiomatic way, but maybe it suffices for your needs:



fun String.chunkedWords(limitChars: Int,
                        delimiter: Char = ' ',
                        joinCharacter: Char = '\n') =
    splitToSequence(delimiter)
        .reduce { cumulatedString, word ->
            // characters since (and including) the last joinCharacter, plus the delimiter and the next word
            val exceedsSize = cumulatedString.length - cumulatedString.indexOfLast { it == joinCharacter } + "$delimiter$word".length > limitChars
            // start a new line if the word does not fit, otherwise stay on the current line
            cumulatedString + if (exceedsSize) {
                joinCharacter
            } else {
                delimiter
            } + word
        }


You can then use it as follows:



longText.chunkedWords(40).run(::println)


which for your given string would then print:



Lorem ipsum dolor sit amet, consectetur
adipisicing elit, sed do eiusmod tempor
incididunt ut labore et dolore magna
aliqua. Ut enim ad minim veniam, quis
nostrud exercitation ullamco laboris
nisi ut aliquip ex ea commodo consequat.
Duis aute irure dolor in reprehenderit
in voluptate velit esse cillum dolore eu
fugiat nulla pariatur. Excepteur sint
occaecat cupidatat non proident, sunt in
culpa qui officia deserunt mollit anim
id est laborum.


You could also split it into lines from there, e.g. longText.chunkedWords(40).splitToSequence("\n"). Note that it also splits nicely if there are already newline characters in the string: if you have a String like "Testing shorter lines.\nAnd now there comes a very long line", a call of .chunkedWords(17) will produce the following output:



Testing shorter
lines.
And now there // this tries to use the whole 17 characters again
comes a very
long line
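Since the question asks for a list of Strings, the wrapped text can also be turned into one with the standard-library lines() function. The sketch below is only illustrative: the sample value is copied from the output above (without the inline remark), and the helper names asChunks and longestLine are assumptions, not part of the answer.

```kotlin
// Sample value copied from the output above; in practice it would come from chunkedWords(17).
val wrapped = "Testing shorter\nlines.\nAnd now there\ncomes a very\nlong line"

// lines() splits on line breaks and returns a List<String>.
fun asChunks(text: String): List<String> = text.lines()

// Length of the longest line, handy for checking that the limit was respected.
fun longestLine(text: String): Int = text.lineSequence().maxOf { it.length }
```

Here asChunks(wrapped) yields five strings, and longestLine(wrapped) can be compared against the limit that was passed in.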




























  • This is really nice
    – AliAvci
    Nov 22 at 17:39


















1 vote













You could use this simple helper function:



fun splitIntoChunks(max: Int, string: String): List<String> = ArrayList<String>(string.length / max + 1).also {
    val builder = StringBuilder()

    // split string by whitespace
    for (word in string.split(Regex("( |\n|\r|\n\r)+"))) {
        // skip empty tokens caused by leading or trailing whitespace
        if (word.isEmpty()) continue
        // if appending a space and the next word would exceed the max size
        if (builder.isNotEmpty() && builder.length + 1 + word.length > max) {
            // then we add the current string to the list and clear the builder
            it.add(builder.toString())
            builder.setLength(0)
        }
        // append a space before each word, except the first one of a chunk
        if (builder.isNotEmpty()) builder.append(' ')
        // note: a single word longer than max still ends up in a chunk of its own
        builder.append(word)
    }

    // add the last collected part if there was any
    if (builder.isNotEmpty()) {
        it.add(builder.toString())
    }
}


It can then be called like this:



val chunks: List<String> = splitIntoChunks(20, longString)




























First answer (chunkedWords): answered Nov 22 at 16:27 by Roland, edited Nov 22 at 17:13












Second answer (splitIntoChunks): answered Nov 22 at 16:01 by Lino, edited Nov 23 at 8:51 by Roland





























