Parsing string into tokens

I have a program that takes incoming text, wraps it in a Reader, and returns the next token, which is either a word or a space (non-word). It is not behaving as expected.



To be as specific as possible, here is my test, run in Eclipse with JUnit 4:



@Test
public void testGetNextTokenWord() throws IOException {
    Reader in = new StringReader("Aren't you \ntired");
    TokenScanner d = new TokenScanner(in);
    try {
        assertTrue("has next", d.hasNext());
        assertEquals("Aren't", d.next());
        assertTrue("has next", d.hasNext());
        assertEquals(" ", d.next());
        assertTrue("has next", d.hasNext());
        assertEquals("you", d.next());
        assertTrue("has next", d.hasNext());
        assertEquals(" \n", d.next());
        assertTrue("has next", d.hasNext());
        assertEquals("tired", d.next());

        assertFalse("reached end of stream", d.hasNext());
    } finally {
        in.close();
    }
}


Here is the code, followed by the expected and observed behavior:



// Reads just enough input (one character) to support hasNext() and next()
public TokenScanner(java.io.Reader in) throws IOException {

    // Throw exception if null
    if (in == null) {
        throw new IllegalArgumentException();
    }

    // Read the first character
    try {

        System.out.println("TokenScanner!");
        // Store the given reader
        this.tokenScanner = in;

        // Read next character
        ch = tokenScanner.read();
    }

    // Treat a read error as end of stream
    catch (IOException e) {
        ch = -1;
    }
}

// Determines whether the given character is a valid word character.
public static boolean isWordCharacter(int c) {

    // Cast int character to a char
    char character = (char) c;

    // Return true if character is valid word character
    if (Character.isLetter(character) || character == '\'') {
        return true;
    }

    // Return false otherwise
    return false;
}

// Determine whether another token is available
public boolean hasNext() {

    // Invariant: ch holds the next unread character, or -1 at end of stream
    return ch != -1;
}
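The hasNext() invariant treats ch == -1 as end of stream, but neither loop in next() compares ch with -1. A quick standalone check of how the helpers classify that value may help; EndOfStreamCheck is a hypothetical class name, and isWordCharacter is copied from the code above:

```java
public class EndOfStreamCheck {
    // Copy of the question's isWordCharacter (apostrophe literal written as '\'').
    static boolean isWordCharacter(int c) {
        char character = (char) c;
        return Character.isLetter(character) || character == '\'';
    }

    public static void main(String[] args) {
        int eof = -1; // the value Reader.read() returns at end of stream
        System.out.println((int) (char) eof);            // 65535: -1 cast to char is '\uFFFF'
        System.out.println(isWordCharacter(eof));        // false: '\uFFFF' is not a letter
        System.out.println(Character.isWhitespace(eof)); // false: -1 is not a valid code point
    }
}
```

Because Character.isWhitespace(-1) is false, a condition such as !Character.isWhitespace(ch) stays true once the reader is exhausted, and Reader.read() keeps returning -1, so a loop guarded only by that condition would never terminate at the end of the input.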


And here is the method that I suspect is causing most of my headaches:



// Determine next token
public String next() {

    // End of stream reached
    if (!hasNext()) {
        throw new NoSuchElementException();
    }

    // Initialize variable to hold token
    String word = "";

    try {

        // Character is a word character
        while (isWordCharacter(ch)) {
            word = word + (char) ch;
            ch = tokenScanner.read();
        }

        // Character is a space
        while (!Character.isWhitespace(ch)) {
            word = word + (char) ch;
            ch = tokenScanner.read();
        }

        System.out.println("Word is: " + word);
        return word;
    }

    // Exception catching
    catch (Exception e) {
        throw new NoSuchElementException();
    }
}
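To see where the empty token comes from, here is a minimal standalone trace that copies the two while loops of next() into a hypothetical NextTrace class (exception handling and printing dropped) and feeds it the test input. After "Aren't" the lookahead character is ' ', which is neither a word character nor a non-whitespace character, so on the second call neither loop runs and nothing is consumed:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class NextTrace {
    private final Reader reader;
    private int ch; // one-character lookahead, as in the question

    NextTrace(Reader reader) throws IOException {
        this.reader = reader;
        this.ch = reader.read();
    }

    static boolean isWordCharacter(int c) {
        return Character.isLetter((char) c) || c == '\'';
    }

    // Same two loops as the question's next().
    String next() throws IOException {
        String word = "";
        while (isWordCharacter(ch)) {
            word += (char) ch;
            ch = reader.read();
        }
        while (!Character.isWhitespace(ch)) {
            word += (char) ch;
            ch = reader.read();
        }
        return word;
    }

    public static void main(String[] args) throws IOException {
        NextTrace t = new NextTrace(new StringReader("Aren't you \ntired"));
        System.out.println("[" + t.next() + "]"); // [Aren't]
        System.out.println("[" + t.next() + "]"); // [] because ch is ' ' and neither loop runs
        System.out.println("lookahead is still a space: " + (t.ch == ' ')); // true
    }
}
```

Every further call would return "" for the same reason, which matches the empty "Word is:" line in the observed output and the failure of assertEquals(" ", d.next()).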


Given the test above, the expected output is:



TokenScanner!
Word is: Aren't
Word is: you
Word is: /*Not sure how to represent newline in output*/
Word is: tired


The actual output is:



TokenScanner!
Word is: Aren't
Word is:


Why is this happening?



The output shows that the first assertion to fail is:



assertEquals(" ", d.next());


The fundamental issue is how I'm representing non-words (spaces). The last assertion fails as well. Any help is appreciated!
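One way to produce exactly the tokens the test expects ("Aren't", " ", "you", " \n", "tired") is to let the first character decide the token's type and then consume the longest run of characters of that same type. The following is a minimal sketch under that assumption, using a hypothetical RunScanner class that keeps the question's one-character lookahead. It wraps read errors in UncheckedIOException instead of NoSuchElementException; that is a design choice, not something the question specifies:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.NoSuchElementException;

public class RunScanner {
    private final Reader in;
    private int ch; // one-character lookahead; -1 means end of stream

    public RunScanner(Reader in) throws IOException {
        if (in == null) {
            throw new IllegalArgumentException();
        }
        this.in = in;
        this.ch = in.read();
    }

    // Checks -1 first so end of stream is never treated as a character.
    public static boolean isWordCharacter(int c) {
        return c != -1 && (Character.isLetter(c) || c == '\'');
    }

    public boolean hasNext() {
        return ch != -1;
    }

    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        // The first character decides whether this token is a word or a non-word run.
        boolean word = isWordCharacter(ch);
        StringBuilder token = new StringBuilder();
        try {
            // Keep consuming while the character has the same type as the first one.
            while (ch != -1 && isWordCharacter(ch) == word) {
                token.append((char) ch);
                ch = in.read();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return token.toString();
    }

    public static void main(String[] args) throws IOException {
        RunScanner s = new RunScanner(new StringReader("Aren't you \ntired"));
        while (s.hasNext()) {
            // Prints [Aren't], [ ], [you], [ \n], [tired] on separate lines.
            System.out.println("[" + s.next().replace("\n", "\\n") + "]");
        }
    }
}
```

A single loop that compares each character's type with the first character's type replaces the two separate loops, so every call consumes at least one character and cannot return an empty string, and the explicit ch != -1 check stops it cleanly at the end of the stream.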

  • What's wrong with regular expressions?
    – Perdi Estaquel
    Nov 23 at 3:04
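Following up on the regular-expression suggestion: if the whole input can be read into a String first, a single pattern can express the same "run of word characters or run of non-word characters" rule. A minimal sketch, with a hypothetical RegexTokens class and tokenize method; \p{L} matches Unicode letters, which lines up with what Character.isLetter accepts for ordinary (BMP) characters:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTokens {
    // A token is either a run of letters/apostrophes or a run of anything else.
    private static final Pattern TOKEN = Pattern.compile("[\\p{L}']+|[^\\p{L}']+");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Aren't you \ntired").size()); // 5
    }
}
```

The trade-off is that this needs the full text in memory, whereas the Reader-based scanner in the question processes its input one character at a time.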

java token junit4

edited Nov 22 at 21:47

asked Nov 22 at 16:52

J_code

425