Parsing string into tokens

I have a program that takes incoming text, wraps it in a Reader, and returns the next token, which is either a word or a non-word (such as whitespace). It is not behaving as expected.

To be as specific as possible, here is my test, run in Eclipse with JUnit 4:

    @Test
    public void testGetNextTokenWord() throws IOException {
        Reader in = new StringReader("Aren't you \ntired");
        TokenScanner d = new TokenScanner(in);
        try {
            assertTrue("has next", d.hasNext());
            assertEquals("Aren't", d.next());
            assertTrue("has next", d.hasNext());
            assertEquals(" ", d.next());
            assertTrue("has next", d.hasNext());
            assertEquals("you", d.next());
            assertTrue("has next", d.hasNext());
            assertEquals(" \n", d.next());
            assertTrue("has next", d.hasNext());
            assertEquals("tired", d.next());

            assertFalse("reached end of stream", d.hasNext());
        } finally {
            in.close();
        }
    }

Here is the complete code, followed by the expected and observed behavior:

    // Reads one character ahead so that hasNext() and next() can inspect it.
    public TokenScanner(java.io.Reader in) throws IOException {
        // Reject a null reader
        if (in == null) {
            throw new IllegalArgumentException();
        }

        try {
            System.out.println("TokenScanner!");
            // Store the reader to scan
            this.tokenScanner = in;
            // Read the first character
            ch = tokenScanner.read();
        }
        // Treat a read error as end of stream
        catch (IOException e) {
            ch = -1;
        }
    }

    // Determines whether the given character is a valid word character.
    public static boolean isWordCharacter(int c) {
        // Cast the int to a char
        char character = (char) c;
        // A word character is a letter or an apostrophe
        return Character.isLetter(character) || character == '\'';
    }

    // Determines whether another token is available.
    public boolean hasNext() {
        // Invariant: ch is -1 once the end of the stream is reached
        return ch != -1;
    }

And here is the method that I suspect is causing most of my trouble:

    // Determine the next token
    public String next() {
        // End of stream reached
        if (!hasNext()) {
            throw new NoSuchElementException();
        }

        // Initialize variable to hold token
        String word = "";

        try {
            // Character is a word character
            while (isWordCharacter(ch)) {
                word = word + (char) ch;
                ch = tokenScanner.read();
            }

            // Character is a space
            while (!Character.isWhitespace(ch)) {
                word = word + (char) ch;
                ch = tokenScanner.read();
            }

            System.out.println("Word is: " + word);
            return word;
        }
        // Exception catching
        catch (Exception e) {
            throw new NoSuchElementException();
        }
    }

The expected output, given the test above, is:

    TokenScanner!
    Word is: Aren't
    Word is: you
    Word is: /*Not sure how to represent newline in output*/
    Word is: tired

The actual output is:

    TokenScanner!
    Word is: Aren't
    Word is:

Why is this happening?

My output shows that the first assertion to fail is:

    assertEquals(" ", d.next());

I suspect the fundamental issue is how I'm handling non-word tokens (whitespace). The last test fails as well. Any help is appreciated!
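A possible explanation, offered as a sketch and not a confirmed diagnosis: in `next()` the second loop runs while the character is *not* whitespace, so when `ch` is a space neither loop executes, an empty string is returned, and `ch` never advances. At end of stream `ch` is -1, which is also not whitespace, so that loop has no natural exit. The version below decides the token kind from the first character and then consumes only characters of that same kind. The class name `TokenScannerSketch` and the `tokenize` helper are assumptions made for illustration, and `next()` here declares `IOException` instead of converting it.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical corrected scanner; names are illustrative, not the asker's final code.
class TokenScannerSketch {
    private final Reader reader;
    private int ch; // one-character lookahead, or -1 at end of stream

    TokenScannerSketch(Reader in) throws IOException {
        if (in == null) {
            throw new IllegalArgumentException("reader must not be null");
        }
        this.reader = in;
        this.ch = reader.read();
    }

    // A word character is a letter or an apostrophe; end of stream is neither.
    static boolean isWordCharacter(int c) {
        return c != -1 && (Character.isLetter(c) || c == '\'');
    }

    boolean hasNext() {
        return ch != -1;
    }

    String next() throws IOException {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        // Decide once whether this token is a word or a non-word run.
        boolean wordToken = isWordCharacter(ch);
        StringBuilder token = new StringBuilder();
        // Consume characters of the same kind, stopping at end of stream.
        while (ch != -1 && isWordCharacter(ch) == wordToken) {
            token.append((char) ch);
            ch = reader.read();
        }
        return token.toString();
    }

    // Convenience helper (an assumption, not in the original): collect every token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        try (Reader in = new StringReader(text)) {
            TokenScannerSketch scanner = new TokenScannerSketch(in);
            while (scanner.hasNext()) {
                tokens.add(scanner.next());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }
}
```

With the input from the test above, `TokenScannerSketch.tokenize("Aren't you \ntired")` yields the five tokens the assertions expect: `Aren't`, a single space, `you`, a space followed by a newline, and `tired`.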

edited Nov 22 at 21:47

asked Nov 22 at 16:52

J_code

425

What's wrong with regular expressions?
– Perdi Estaquel
Nov 23 at 3:04
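For comparison, the regular-expression route the comment hints at might look like the sketch below. It assumes the whole input is already available as a `String` (unlike the `Reader`-based scanner in the question), and the class name `RegexTokenizerSketch` is hypothetical. The pattern treats a run of letters or apostrophes as a word and any other run as a non-word token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical regex-based tokenizer; assumes the text fits in memory as a String.
class RegexTokenizerSketch {
    // Either a run of letters/apostrophes (a word) or a run of anything else (a non-word).
    private static final Pattern TOKEN = Pattern.compile("[\\p{L}']+|[^\\p{L}']+");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(text);
        // Each successive match is the next token, in input order.
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }
}
```

The trade-off is that this reads the full text up front, whereas a `Reader`-based scanner can work through a stream one character at a time.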

java token junit4
