“Incorrect number of dimensions” error, help me understand why


























Organization of this question:



I.   Background
II. The Problem/Question
III. Steps Taken to Make this Question Good
IV. Update: the output of head(x.path) and dput(x.path)


I. Background



I am customizing/adapting the e-mail classification code from the O'Reilly book "Machine Learning for Hackers" (Chapter 3). That code and its accompanying data can be found here: https://github.com/johnmyleswhite/ML_for_Hackers/tree/master/03-Classification



II. The Problem/Question



One of the main functions in that code is called get.msg(). The original function is



    get.msg <- function(path)
    {
      con <- file(path, open = "rt", encoding = "latin1")
      text <- readLines(con)
      # The message always begins after the first full line break
      msg <- text[seq(which(text == "")[1] + 1, length(text), 1)]
      close(con)
      return(paste(msg, collapse = "\n"))
    }
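
To make the behaviour of the original function concrete, here is a minimal sketch that runs it on a throwaway file. The file contents and the temporary path are invented for illustration and are not part of the book's data set.

```r
# Original helper from the book, repeated so the example is self-contained.
get.msg <- function(path) {
  con <- file(path, open = "rt", encoding = "latin1")
  text <- readLines(con)
  # The message always begins after the first full line break
  msg <- text[seq(which(text == "")[1] + 1, length(text), 1)]
  close(con)
  return(paste(msg, collapse = "\n"))
}

# Hypothetical e-mail written to a temporary file: two header lines,
# one empty line, then two body lines.
tmp <- tempfile(fileext = ".txt")
writeLines(c("From: someone@example.com",
             "Subject: test",
             "",
             "First body line",
             "Second body line"), tmp)

body <- get.msg(tmp)   # everything after the first empty line, joined by "\n"
print(body)
unlink(tmp)
```

Note that the function expects a single file path as a string, which is why the book's code builds that string with paste() before calling it.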


My data differs in several ways, so I have to edit this quite a bit. It is read in earlier from a relational DB, so I don't have to read in and clean a text file. Instead, the email bodies are in the 18th column of a data frame, which we can call x. Here is my version of get.msg():



    get.msg <- function(path) {
      bodyvector <- path[!(is.na(path[,18]) | path[,18]==""), ]
      return(paste(bodyvector))
    }


Originally I referred to the column as x$email, and this worked through most of the code. However, in a later step get.msg() is called on x.path, where x.path points to x and is used inside another function in combination with paste(), as per the authors' example code:



    z.spam <- sapply(spam.docs, function(p) count.word(paste(x.path, p, sep = ""), "keyword"))


Here, count.word() is a function that calls get.msg(). The paste() call was causing problems because it apparently turned x.path into an atomic vector, which produced the error that $ cannot be used with atomic vectors. Following an older Stack Overflow Q&A, I changed the way I refer to the column to path[,18] (which is evaluated as x.path[,18] and is therefore the same as x[,18]).
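
The effect of paste() on a data frame can be reproduced on a small scale. The sketch below uses a made-up two-column data frame in place of x and an arbitrary string in place of p, so the names and values are assumptions for illustration only.

```r
# Toy data frame standing in for x (column names and rows are invented).
x <- data.frame(id = c("a1", "a2"),
                email = c("hello world", "buy now"),
                stringsAsFactors = FALSE)

x.path <- x
class(x.path)    # a data frame at this point
x.path$email     # `$` works on a data frame

# paste() converts its arguments to character. For a data frame this yields
# one deparsed string per column, so the result is a plain character vector
# with no dim attribute.
pasted <- paste(x.path, "some-file", sep = "")
class(pasted)
length(pasted)   # one element per column of x
dim(pasted)      # NULL

# `$` is not defined for atomic vectors, so this call raises an error,
# which is captured here instead of stopping the script.
err <- tryCatch(pasted$email, error = function(e) e)
print(conditionMessage(err))
```

This matches the head(x.path) output later in the question, where each element is a whole column rendered as a single "c(...)" string.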



Then I did some checking to ensure that x.path[,18] had the same information as x.path$email, which it did. However, when I try to run the code I get an error message on get.msg(x.path), which is:



Error in path[,18] : incorrect number of dimensions.


I tried path[,'email'], then path[18,] and then just path by itself and all three led to the same error. I tried path[[1]][[18]] and that gave me a subscript out of bounds error.
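
Both error messages can be reproduced with a short character vector, which is what path is once paste() has been applied. This is only a sketch: the vector below is a made-up stand-in with two elements, and msg_of() is a hypothetical helper that returns the error message instead of halting.

```r
# Character vector standing in for the pasted x.path (values are invented).
path <- c("c(\"a1\", \"a2\")", "c(\"hello world\", \"buy now\")")

# Hypothetical helper: evaluate an expression and return the error text.
msg_of <- function(expr) tryCatch(expr, error = function(e) conditionMessage(e))

m1 <- msg_of(path[, 2])        # two indices on an object without dimensions
m2 <- msg_of(path[, "email"])  # same problem: a vector has no columns
m3 <- msg_of(path[[1]][[18]])  # path[[1]] is a single string, so no 18th element
print(c(m1, m2, m3))

# The same two-index syntax is fine on an actual data frame.
x <- data.frame(id = c("a1", "a2"),
                email = c("hello world", "buy now"),
                stringsAsFactors = FALSE)
x[, 2]
x[, "email"]
```

So the subsetting syntax itself is not the issue; the object reaching get.msg() is no longer two-dimensional.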



Any thoughts?



III. Steps Taken to Make this Question Good



To avoid annoying anyone and getting any down votes, I confirmed that the topic was relevant to StackOverflow and I feel that it may be relevant to other people dealing with this or similar programming problems in the future. I also spent almost an hour researching this problem online and trying things in R to fix it.



There were plenty of references to this error message; however, the causes seemed to be very diverse and completely unrelated (such as networking trouble). Finally, I spent a significant amount of time editing this question to try to make it readable and properly formatted (I hope I did okay, I know it's a lot of information).



IV. The output of head() and dput()



Some of you extremely helpful folks have requested to see the output of head(x.path) or dput(x.path). I don't mind except that it's confidential company email data and I'll be out of a job and sued if I publish it. ;-)



I've pasted it here and replaced the real info with fake info. I hope this is okay. I tried to use dput() at first and I can do so if you like but it was truly an overwhelming amount of data. Here's head(x.path):



> head(x.path)
[1] "c("Z12e3317e4b1jZbbajZ9Zdd6", "Z12e3317e4b1jZbbajZ99124", "Z12e331Ze4b1jZbbajZ996dd", "Z12e3319e4b1jZbbajZ9acb6", "Z12e3319e4b1jZbbajZ9ad3b", "Z12e3319e4b1jZbbajZ9adjd", "Z12e3319e4b1jZbbajZ9aebZ", "Z12e3319e4b1jZbbajZ9aj23", "Z12e3319e4b1jZbbajZ9b22b", "Z12e3319e4b1jZbbajZ9b42a", "Z12e3319e4b1jZbbajZ9b49a", "Z12e331ae4b1jZbbajZ9bZ11", "Z12e331ae4b1jZbbajZ9bZZ4", "Z12e331ae4b1jZbbajZ9c237", "Z12e331ae4b1jZbbajZ9c2e4", "Z12e331ae4b1jZbbajZ9c3bZ", "Z12e331ae4b1jZbbajZ9c3cZ", "Z12e331ae4b1jZbbajZ9cZ31", n"Z12e331be4b1jZbbajZ9cddd", "Z12e331be4b1jZbbajZ9cja6", "Z12e331ce4b1jZbbajZ9da1j", "Z12e331de4b1jZbbajZ9e649", "Z12e331de4b1jZbbajZ9j669", "Z12e331de4b1jZbbajZ9jZZZ", "Z12e331ee4b1jZbbajZ9j944", "Z12e331ee4b1jZbbajZ9jcZa", "Z12e331ee4b1jZbbajZ9jd4c", "Z12e331ee4b1jZbbajZa11e2", "Z12e331ee4b1jZbbajZa1291", "Z12e331ee4b1jZbbajZa1344", "Z12e3311e4b1jZbbajZa1j73", "Z12e3311e4b1jZbbajZa1131", "Z12e3311e4b1jZbbajZa11Z6", "Z12e3311e4b1jZbbajZa124c", "Z12e3311e4b1jZbbajZa1Zbc", "Z12e3311e4b1jZbbajZa19a9", n"Z12e3311e4b1jZbbajZa1ac2", "Z12e3311e4b1jZbbajZa1b79", "Z12e3311e4b1jZbbajZa1db2", "Z12e3311e4b1jZbbajZa1ejb", "Z12e3312e4b1jZbbajZa2333", "Z12e3312e4b1jZbbajZa23aZ", "Z12e3312e4b1jZbbajZa24bb", "Z12e3312e4b1jZbbajZa2Z79", "Z12e3312e4b1jZbbajZa2Zea", "Z12e3312e4b1jZbbajZa2ba9", "Z12e3312e4b1jZbbajZa2cZa", "Z12e3313e4b1jZbbajZa3bc1", "Z12e3313e4b1jZbbajZa3ca9", "Z12e3313e4b1jZbbajZa3e71", "Z12e3ajbe4b1j66Zbcja4eZc", "Z12e3ajbe4b1j66Zbcja4ja4", "Z12e3c79e4b1j66ZbcjaZc36", "Z12e3e1ce4b1j66Zbcja64bd", n"Z12e4117e4b1j66Zbcja6Zj1", "Z12e41bae4b1j66Zbcja734Z", "Z12e4226e4b1j66Zbcja7b13", "Z12e4226e4b1j66Zbcja7cbZ", "Z12e4ajee4b1j66Zbcjaa916", "Z12e4e61e4b1j66Zbcjab1c2", "Z12e4e61e4b1j66Zbcjab2da", "Z12eZ226e4b1j66ZbcjacZea", "Z12e6141e4b1j66Zbcjb19Z9", "Z12e6141e4b1j66Zbcjb19jd", "Z12e61Z9e4b1j66Zbcjb1acb", "Z12e61Z9e4b1j66Zbcjb1acj", "Z12j9713e4b1j66Zbcjc34db", "Z12j9713e4b1j66Zbcjc3ZZa", "Z12j9713e4b1j66Zbcjc3Za7", "Z12j9713e4b1j66Zbcjc3Zd2", "Z12j9713e4b1j66Zbcjc36c2", 
"Z12j973ce4b1j66Zbcjc396b"n)"
[2] "c("Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", n"Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something")"
[3] "c(61Z7, 674Z, Z462, 692, Z26, 1121, 1213, 1317, 21ZZ, 2Z9Z, 2711, 3612, 3717, 4774, 4Z93, Z117, Z113, Z197, Z77Z, 61Z3, Z16Z, 11771, 12923, 13374, 13Z93, 14277, 1446Z, 1Z3ZZ, 1ZZ16, 1Z993, 164Z2, 16664, 1711Z, 171Z6, 1Z6ZZ, 1Z921, 19211, 193ZZ, 19931, 21117, 21164, 21177, 21371, 21Z61, 21673, 22ZZ7, 23137, 2ZZ44, 26166, 26Z1Z, 173Z6, 17661, 21Z74, 23119, 232ZZ, 249Z3, 2ZZ31, 261Z9, 31211, 33414, 336Z6, 37941, 1743, 1Z61, 216Z, 2171, 1ZZ3, 2119, 21Z4, 2129, 2334, 2ZZZ)"
[4] "c("Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", n"Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty")"
[5] "c(Z6, 93Z, 1314, 3, 4, Z, 6, 7, 9, 11, 11, 13, 14, 2Z, 26, 27, 2Z, 29, 33, 34, ZZ, Z3, 122, 12Z, 133, 139, 142, 147, 1Z2, 1Z3, 16Z, 169, 171, 171, 219, 221, 221, 222, 22Z, 226, 244, 246, 247, 24Z, 249, 2637, 264, 2Z9, 292, 296, 49, Z1, 76, 93, 9Z, 112, 111, 114, 1Z7, 211, 214, 263, 6, 7, 11, 11, 11, 11, 12, 13, 14, 1Z)"
[6] "c(3Z11, 3Z11, 3Z11, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, Z664, Z664, Z664, Z664, Z664, Z664, Z664, Z664, Z664, Z664, Z664, Z664, 66Z1, 66Z1, 66Z1, 66Z1, 4ZZ4, 4ZZ4, 4ZZ4, 4ZZ4, 4ZZ4, 4ZZ4)"


If this showed more elements, you'd see the message bodies in element [18].






























  • It will be much easier if you show us your object (head, str) and the offending line of code. A reproducible example may go even further.
    – Roman Luštrik
    Mar 29 '13 at 23:57










  • My thought is that path needs to be a two-dimensional object (e.g. a data.frame or matrix) so you can do path[,18]; your x.path is not. Just run class(x.path) and you should see that.
    – flodel
    Mar 30 '13 at 0:16










  • Update: I also tried replacing [,18] with [,'email'] but got the same message. I see two comments popped up while I was editing this, so let me save my comment and then I will follow up on yours (and thanks btw!). I would give you the output of head() but it's confidential email bodies : /
    – user2225772
    Mar 30 '13 at 0:16












  • flodel: You're right, class(x.path) shows that it's character due to the paste() command, but I used that because of the authors' example and because I can't figure out how to get away from it while still using the anonymous function like in the third snippet of code in my original post. Is there a way I could do that without paste, though? Sorry for the dumb question.
    – user2225772
    Mar 30 '13 at 0:21










  • Roman: I can however describe the output of head(x.path). It's a large data frame with different kinds of data stored relating to an email client. The only column I care about at the moment is the email body column, and that one is just the text of emails, with \r and \n and other such text representations of formatting.
    – user2225772
    Mar 30 '13 at 0:24
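
On the question of doing this without paste(): one possible direction, sketched below under several assumptions, is to iterate over the message bodies in the column directly, since there are no file paths to build. The toy data frame, get.bodies() and count.word.text() are all hypothetical. In particular, count.word.text() is a base-R stand-in that counts whole-word matches and is not the book's count.word().

```r
# Toy stand-in for x; the column name `email` follows the question,
# the rows are invented.
x <- data.frame(id = 1:4,
                email = c("free money now", NA, "", "money money please"),
                stringsAsFactors = FALSE)

# Hypothetical helper: return the usable bodies, dropping NA and empty strings.
get.bodies <- function(df, column = "email") {
  bodies <- df[[column]]
  bodies[!(is.na(bodies) | bodies == "")]
}

# Hypothetical stand-in for count.word(): count whole-word occurrences of
# `term` in one message body using base R only.
count.word.text <- function(body, term) {
  words <- strsplit(tolower(body), "[^a-z0-9']+")[[1]]
  sum(words == tolower(term))
}

bodies <- get.bodies(x)
counts <- sapply(bodies, count.word.text, term = "money", USE.NAMES = FALSE)
print(counts)
```

The anonymous-function pattern from the book still applies; the only change is that sapply() walks over the text itself rather than over file names that need to be pasted onto a directory.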
















"Z12j973ce4b1j66Zbcjc396b"n)"
[2] "c("Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", n"Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something", "Something")"
[3] "c(61Z7, 674Z, Z462, 692, Z26, 1121, 1213, 1317, 21ZZ, 2Z9Z, 2711, 3612, 3717, 4774, 4Z93, Z117, Z113, Z197, Z77Z, 61Z3, Z16Z, 11771, 12923, 13374, 13Z93, 14277, 1446Z, 1Z3ZZ, 1ZZ16, 1Z993, 164Z2, 16664, 1711Z, 171Z6, 1Z6ZZ, 1Z921, 19211, 193ZZ, 19931, 21117, 21164, 21177, 21371, 21Z61, 21673, 22ZZ7, 23137, 2ZZ44, 26166, 26Z1Z, 173Z6, 17661, 21Z74, 23119, 232ZZ, 249Z3, 2ZZ31, 261Z9, 31211, 33414, 336Z6, 37941, 1743, 1Z61, 216Z, 2171, 1ZZ3, 2119, 21Z4, 2129, 2334, 2ZZZ)"
[4] "c("Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", n"Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty", "Booty")"
[5] "c(Z6, 93Z, 1314, 3, 4, Z, 6, 7, 9, 11, 11, 13, 14, 2Z, 26, 27, 2Z, 29, 33, 34, ZZ, Z3, 122, 12Z, 133, 139, 142, 147, 1Z2, 1Z3, 16Z, 169, 171, 171, 219, 221, 221, 222, 22Z, 226, 244, 246, 247, 24Z, 249, 2637, 264, 2Z9, 292, 296, 49, Z1, 76, 93, 9Z, 112, 111, 114, 1Z7, 211, 214, 263, 6, 7, 11, 11, 11, 11, 12, 13, 14, 1Z)"
[6] "c(3Z11, 3Z11, 3Z11, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, Z664, Z664, Z664, Z664, Z664, Z664, Z664, Z664, Z664, Z664, Z664, Z664, 66Z1, 66Z1, 66Z1, 66Z1, 4ZZ4, 4ZZ4, 4ZZ4, 4ZZ4, 4ZZ4, 4ZZ4)"


If this were to show you more then you'd see message bodies for [18].







r dimensions














edited Nov 23 '18 at 9:12









zx8754











asked Mar 29 '13 at 23:52









user2225772









  • 3




    It will be much easier if you show us your object (head, str) and the offending line of code. A reproducible example may go even farther.
    – Roman Luštrik
    Mar 29 '13 at 23:57










  • Thought is path needs to be a two-dimension object (e.g. data.frame or matrix) so you can do path[,18]; your x.path is not. Just do class(x.path) and you should see that.
    – flodel
    Mar 30 '13 at 0:16










  • Update: I also tried replacing [,18] with [,'email'] but got the same message. I see two comments popped up while I'm editing this, so let me save my comment and then I will follow up on yours (and thanks btw!). I would give you the output of head() but it's confidential email bodies : /
    – user2225772
    Mar 30 '13 at 0:16












  • flodel: You're right, class(x.path) shows that it's character due to the paste() command, but I used that because of the authors' example and because I can't figure out how to get away from it while still using the anonymous function like in that third snippet of code in my original post. Is there a way I could do that without paste tho? Sorry for the dumb question.
    – user2225772
    Mar 30 '13 at 0:21










  • Roman: I can however describe the output of head(x.path). It's a large dataframe with different kinds of data stored relating to an email client. The only column I care about at the moment is the email body column and that one is just the text of emails, with \r and \n and other such text representations of formatting.
    – user2225772
    Mar 30 '13 at 0:24


























2 Answers


















        5






Your example is a little complex for me to run, but I have gotten this error a number of times, and the problem has always been due ultimately to the default behavior of the extract function (i.e. "[") in coercing to the lowest possible number of dimensions. As BondedDust observes, if you extract a single column from a data frame you can no longer select subsets of the frame with the same syntax, because you do not have a data frame any more.



Frequently these problems vanish if, in any operation in which you may be reducing the data frame to a single column, you set the parameter drop=FALSE in the extract operation. I suggest that you look carefully not only at the line where the error is generated but also at any preceding lines in which the "[" operation is used on the problem data frame. Look at the help for the data frame method for the extract function, "Extract.data.frame" (reachable with ?"[.data.frame"). I believe the problem is probably that when you subset the data frame to a single column, it is coerced to a single dimension and can no longer be indexed by column number or row number.
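A short sketch of the drop=FALSE behaviour described above, using a made-up data frame (the names df, one.col and kept.df are illustrative only):

```r
# Made-up data frame for illustration
df <- data.frame(id = 1:3, email = c("a", "", "c"), stringsAsFactors = FALSE)

one.col <- df[, "email"]                  # default drop = TRUE: a character vector
kept.df <- df[, "email", drop = FALSE]    # still a data frame with one column

class(one.col)    # character
dim(one.col)      # NULL
class(kept.df)    # data.frame
dim(kept.df)      # 3 rows, 1 column

# Two-subscript indexing works on the object that kept its dimensions ...
kept.df[kept.df[, 1] != "", , drop = FALSE]

# ... and fails on the one that was dropped to a vector
err <- tryCatch(one.col[, 1], error = function(e) conditionMessage(e))
print(err)
```

This only helps when the object was a data frame to begin with; it does not undo a coercion to character made earlier by something like paste().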






        answered Apr 16 '14 at 6:52









        andrewH


























            2






This might deserve to be a comment, but it wouldn't fit, and I'm prepared to delete it if warranted. You say



"So, the paste function was causing problems because it caused x.path to be considered an atomic array apparently, and gave the error that $ could not be used with an atomic array. As per an older StackOverflow Q&A, I changed the way I referred to the column to path[,18] (which is evaluated as x.path[,18] and therefore is the same as x[,18])."



If x.path is an atomic array then you cannot use x.path[ , 18] but rather need to use x.path[18].



You can inspect x.path with str(x.path), and your output suggests that it is indeed a character vector. In R only objects with two dimensions (matrices and data.frames) can be referenced with object[ , n] references.
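The difference between one-subscript and two-subscript indexing can be sketched with made-up values (v, filtered and m are illustrative names, not objects from the question):

```r
v <- c("alpha", "beta", "gamma")      # made-up character vector
str(v)                                # a chr vector of length 3
dim(v)                                # NULL: no dimensions

v[2]                                  # a single subscript works on a vector
filtered <- v[!(is.na(v) | v == "")]  # logical filter, again with one subscript
                                      # and no trailing comma

m <- matrix(v, nrow = 1)              # the same values with a dim attribute
dim(m)                                # 1 row, 3 columns
m[, 2]                                # two subscripts work once dimensions exist
```

Any comma inside the brackets, including a trailing one as in path[cond, ], asks for a second dimension and fails on a plain vector.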






            answered Mar 30 '13 at 1:55









            42-













            • I think you might be onto something but I got this error: Error in path[!(is.na(path[18]) | path[18] == ""), ] : incorrect number of dimensions
              – user2225772
              Mar 30 '13 at 1:58












            • By the way, it is character (thanks to paste) and str(x.path) gives me this: str(x.path) chr [1:145] "c(\"5... and then it goes on a long way
              – user2225772
              Mar 30 '13 at 2:01










            • dim(x.path) says NULL... so it has no dimensions, but I'm not even really sure what that means. If there are no dimensions shouldn't path by itself have worked? But that gave the same error...
              – user2225772
              Mar 30 '13 at 2:03










            • Vectors in R have no dimensions. It is just an ordinary character vector. I do not know what you mean by "shouldn't path work".
              – 42-
              Mar 30 '13 at 2:04












            • Thanks. It sounds like we are on the same page; that's what I thought it meant. When I said "shouldn't path work" I meant, since there is only one column/vector should just using "path" by itself instead of path[,18] in get.msg make it work? But it gives the same error. I also just tried changing the paste statement to specify the column of interest at that point but it didn't work.
              – user2225772
              Mar 30 '13 at 2:18
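One possible way to avoid paste() altogether, sketched under several assumptions: the data frame, the column name "email", and the helpers get.msg.df() and count.term() are all hypothetical, and count.term() is a simplified stand-in for the book's count.word(), not a reimplementation of it. The idea is to pull the body column out of the data frame once and then apply the counter to each body.

```r
# Hypothetical stand-in for x; the real data has the bodies in column 18
x <- data.frame(id = 1:4,
                email = c("buy keyword now", "", NA, "keyword and keyword"),
                stringsAsFactors = FALSE)

# Variant of get.msg() that takes the data frame and returns the non-empty bodies
get.msg.df <- function(df, col = "email") {
  bodies <- df[[col]]                        # [[ ]] returns the column itself
  bodies[!(is.na(bodies) | bodies == "")]    # one subscript: bodies is a vector
}

# Simplified counter: number of occurrences of a fixed term in one body
count.term <- function(body, term) {
  hits <- gregexpr(term, body, fixed = TRUE)[[1]]
  sum(hits > 0)                              # gregexpr() gives -1 when nothing matches
}

bodies <- get.msg.df(x)
counts <- vapply(bodies, count.term, integer(1), term = "keyword", USE.NAMES = FALSE)
print(counts)
```

Because the bodies are already in memory, there is no file path to build, which is the only job paste() had in the original code.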

























































