Parsing string into tokens
I have a program that takes incoming text, wraps it in a Reader, and returns the next token, which is either a word or a run of non-word characters (such as spaces). It is not behaving as expected.
To be as specific as possible, here is my test, run in Eclipse with JUnit 4:
    @Test
    public void testGetNextTokenWord() throws IOException {
        Reader in = new StringReader("Aren't you \ntired");
        TokenScanner d = new TokenScanner(in);
        try {
            assertTrue("has next", d.hasNext());
            assertEquals("Aren't", d.next());
            assertTrue("has next", d.hasNext());
            assertEquals(" ", d.next());
            assertTrue("has next", d.hasNext());
            assertEquals("you", d.next());
            assertTrue("has next", d.hasNext());
            assertEquals(" \n", d.next());
            assertTrue("has next", d.hasNext());
            assertEquals("tired", d.next());
            assertFalse("reached end of stream", d.hasNext());
        } finally {
            in.close();
        }
    }
Here is the complete code, followed by the expected and observed behavior:
    // Underlying reader and the next unread character (-1 at end of stream)
    private java.io.Reader tokenScanner;
    private int ch;

    // Reads one character ahead so that hasNext() and next() can be answered
    public TokenScanner(java.io.Reader in) throws IOException {
        // Throw exception if null
        if (in == null) {
            throw new IllegalArgumentException();
        }
        try {
            System.out.println("TokenScanner!");
            // Keep a reference to the argued reader
            this.tokenScanner = in;
            // Read the first character
            ch = tokenScanner.read();
        }
        // Treat a read error as end of stream
        catch (IOException e) {
            ch = -1;
        }
    }

    // Determines whether the argued character is a valid word character.
    public static boolean isWordCharacter(int c) {
        // Cast int character to a char
        char character = (char) c;
        // Return true if character is a letter or an apostrophe
        if (Character.isLetter(character) || character == '\'') {
            return true;
        }
        // Return false otherwise
        return false;
    }

    // Determine whether another token is available
    public boolean hasNext() {
        // Leverage invariant: ch is -1 only at end of stream
        return ch != -1;
    }
And here is the method that is (potentially) causing most of my headache:
    //Determine next token
    public String next() {
        //End of stream reached
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        //Initialize variable to hold token
        String word = "";
        try {
            //Character is a word character
            while (isWordCharacter(ch)) {
                word = word + (char) ch;
                ch = tokenScanner.read();
            }
            //Character is a space
            while (!Character.isWhitespace(ch)) {
                word = word + (char) ch;
                ch = tokenScanner.read();
            }
            System.out.println("Word is: " + word);
            return word;
        }
        //Exception catching
        catch (Exception e) {
            throw new NoSuchElementException();
        }
    }
Given the test above, the expected output is:
TokenScanner!
Word is: Aren't
Word is: you
Word is: /*Not sure how to represent newline in output*/
Word is: tired
The actual output is:
TokenScanner!
Word is: Aren't
Word is:
Why is this happening?
According to my output, the first assertion to fail is:
    assertEquals(" ", d.next());
I believe the fundamental issue is how I handle non-word tokens (spaces). The last assertion fails as well. Any help is appreciated!
java token junit4
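A possible reading of the failure, offered as a sketch rather than a definitive fix: in next() as posted, the second loop runs while the character is NOT whitespace, so when ch is a space neither loop executes, an empty string is returned, and ch never advances. At end of stream the same loop would never terminate, because Character.isWhitespace(-1) is false. One way to get the tokens the test expects is to decide the token type from its first character and then consume only characters of that type. The class name TokenScannerSketch and the helper tokenize are hypothetical names introduced for illustration; wrapping read errors in UncheckedIOException is also an assumption, not something the original code does.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical rework of the scanner from the question (names are illustrative).
class TokenScannerSketch {
    private final Reader reader;
    private int ch; // next unread character, or -1 at end of stream

    TokenScannerSketch(Reader in) throws IOException {
        if (in == null) {
            throw new IllegalArgumentException("reader must not be null");
        }
        this.reader = in;
        this.ch = in.read();
    }

    // A word character is a letter or an apostrophe; -1 (end of stream) is neither.
    static boolean isWordCharacter(int c) {
        return c != -1 && (Character.isLetter(c) || c == '\'');
    }

    boolean hasNext() {
        return ch != -1;
    }

    // Returns the longest run of characters of the same kind as the first one.
    String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        boolean word = isWordCharacter(ch);
        StringBuilder token = new StringBuilder();
        try {
            while (ch != -1 && isWordCharacter(ch) == word) {
                token.append((char) ch);
                ch = reader.read();
            }
        } catch (IOException e) {
            // Assumption: surface read errors instead of hiding them.
            throw new UncheckedIOException(e);
        }
        return token.toString();
    }

    // Convenience helper (hypothetical): collects all tokens of a string.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        try {
            TokenScannerSketch scanner = new TokenScannerSketch(new StringReader(text));
            while (scanner.hasNext()) {
                tokens.add(scanner.next());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Print each token in brackets, showing newlines as \n.
        for (String token : tokenize("Aren't you \ntired")) {
            System.out.println("[" + token.replace("\n", "\\n") + "]");
        }
    }
}
```

With the test string from the question, this sketch yields the five tokens the assertions expect: "Aren't", " ", "you", " \n", "tired". Using a StringBuilder instead of repeated String concatenation is only a style choice here.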
What's wrong with regular expressions?
– Perdi Estaquel
Nov 23 at 3:04
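For readers wondering what the regular-expression route in this comment might look like, here is a minimal sketch. It assumes the whole input is available as a String (the question reads from a Reader, so this is not a drop-in replacement), and the class name RegexTokenizerSketch and the pattern are illustrative choices, not something from the original post. The pattern treats a run of letters or apostrophes as a word and any other run as a non-word token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical regex-based alternative (names are illustrative).
class RegexTokenizerSketch {
    // Either a run of letters/apostrophes, or a run of anything else.
    private static final Pattern TOKEN = Pattern.compile("[\\p{L}']+|[^\\p{L}']+");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(text);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Print each token in brackets, showing newlines as \n.
        for (String token : tokenize("Aren't you \ntired")) {
            System.out.println("[" + token.replace("\n", "\\n") + "]");
        }
    }
}
```

The trade-off is that this needs the full text in memory, whereas the character-by-character scanner can work on a stream.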
edited Nov 22 at 21:47
asked Nov 22 at 16:52
J_code