How to delete a row by reference in data.table?

My question is related to assignment by reference versus copying in data.table. I want to know if one can delete rows by reference, similar to



DT[ , someCol := NULL]


In other words, is there something like



DT[someRow := NULL, ]


I guess there's a good reason why this function doesn't exist, so maybe you could just point out a good alternative to the usual copying approach shown below. I'll use my favourite example from example(data.table):



DT = data.table(x = rep(c("a", "b", "c"), each = 3), y = c(1, 3, 6), v = 1:9)
# x y v
# [1,] a 1 1
# [2,] a 3 2
# [3,] a 6 3
# [4,] b 1 4
# [5,] b 3 5
# [6,] b 6 6
# [7,] c 1 7
# [8,] c 3 8
# [9,] c 6 9


Say I want to delete the first row from this data.table. I know I can do this:



DT <- DT[-1, ]


but often we want to avoid that, because it copies the object (which requires about 3*N memory, where N = object.size(DT), as pointed out here).
I then found set(DT, i, j, value). I know how to use it to set specific values; for example, this sets all values in rows 1 and 2 of columns 2 and 3 to zero:



set(DT, 1:2, 2:3, 0) 
DT
# x y v
# [1,] a 0 0
# [2,] a 0 0
# [3,] a 6 3
# [4,] b 1 4
# [5,] b 3 5
# [6,] b 6 6
# [7,] c 1 7
# [8,] c 3 8
# [9,] c 6 9


But how can I erase the first two rows, say? Doing



set(DT, 1:2, 1:3, NULL)


sets the entire DT to NULL.



My SQL knowledge is very limited, so you guys tell me: given that data.table uses SQL technology, is there an equivalent to the SQL command



DELETE FROM table_name
WHERE some_column=some_value


in data.table?
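For comparison, a minimal sketch of the closest everyday data.table idiom for DELETE FROM table_name WHERE some_column = some_value: keep the complement of the condition and rebind the name. This is still the copying approach the question wants to avoid; it only makes the SQL analogy concrete. The small table is made up for illustration. Writing the condition as !(x %in% value) rather than x != value keeps rows where the column is NA, which matches SQL, where a NULL never equals the value and such rows are not deleted.

```r
library(data.table)

# Hypothetical table with a missing value in the filter column
DT <- data.table(x = c("a", "b", NA, "b", "c"), v = 1:5)

# Roughly: DELETE FROM DT WHERE x = 'b'   (builds a new table, i.e. a copy)
kept <- DT[!(x %in% "b")]

# x != "b" is NA for the NA row, and data.table treats NA in a logical i
# as FALSE, so that row would be dropped as well
dropped_na_too <- DT[x != "b"]

print(kept$v)            # 1 3 5
print(dropped_na_too$v)  # 1 5
```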

r data.table

edited May 23 '17 at 12:10 by Community
asked May 28 '12 at 20:41 by Florian Oswald
  • I don't think it is that data.table() uses SQL technology so much as one can draw a parallel between the different operations in SQL and the various arguments to a data.table. To me, the reference to "technology" somewhat implies that data.table is sitting on top of a SQL database somewhere, which AFAIK is not the case.
    – Chase
    May 28 '12 at 21:15
  • thanks chase. yeah, i guess that sql analogy was a wild guess.
    – Florian Oswald
    May 29 '12 at 21:44
  • Often it should be sufficient to define a flag for keeping rows, like DT[ , keep := .I > 1], then subset for later operations: DT[(keep), ...], perhaps even using setindex(DT, keep) to speed up this subsetting. Not a panacea, but worthwhile to consider as a design choice in your workflow -- do you really want to delete all those rows from memory, or would you prefer to exclude them? The answer differs by use case.
    – MichaelChirico
    Dec 19 '17 at 5:54
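A minimal sketch of the flag-instead-of-delete idea from the last comment, on the question's DT. Adding the flag column with := is a by-reference operation, so address(DT) does not change, and all nine rows stay in memory. The later subset is written as keep == TRUE because, as far as I know, data.table's automatic indexing targets ==-style conditions; whether setindex() actually pays off depends on table size and how often you subset, so treat it as optional.

```r
library(data.table)

DT <- data.table(x = rep(c("a", "b", "c"), each = 3), y = c(1, 3, 6), v = 1:9)
addr <- address(DT)

DT[, keep := .I > 1]   # flag row 1 as excluded instead of deleting it
setindex(DT, keep)     # optional secondary index on the flag column

total <- DT[keep == TRUE, sum(v)]   # later work only sees the kept rows

print(total)                         # 44
print(identical(address(DT), addr))  # TRUE: same object, modified in place
print(nrow(DT))                      # 9: the excluded row is still stored
```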

6 Answers

Good question. data.table can't delete rows by reference yet.



data.table can add and delete columns by reference since it over-allocates the vector of column pointers, as you know. The plan is to do something similar for rows and allow fast insert and delete. A row delete would use memmove in C to budge up the items (in each and every column) after the deleted rows. Deleting a row in the middle of the table would still be quite inefficient compared to a row store database such as SQL, which is more suited for fast insert and delete of rows wherever those rows are in the table. But still, it would be a lot faster than copying a new large object without the deleted rows.
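To make the column/row asymmetry concrete, a small sketch using the question's DT and data.table's address() helper, which reports where an object lives in memory: deleting a column with := NULL leaves DT at the same address, while the usual row delete builds a new table and rebinds the name. This only illustrates the current behaviour; it is not the planned row-delete feature.

```r
library(data.table)

DT <- data.table(x = rep(c("a", "b", "c"), each = 3), y = c(1, 3, 6), v = 1:9)
addr_before <- address(DT)

# Column delete: the over-allocated vector of column pointers is updated in place
DT[, v := NULL]
same_after_col_delete <- identical(address(DT), addr_before)

# Row delete: DT[-1] allocates a new table, then `<-` points DT at it
DT <- DT[-1]
same_after_row_delete <- identical(address(DT), addr_before)

print(same_after_col_delete)  # TRUE
print(same_after_row_delete)  # FALSE
print(dim(DT))                # 8 2
```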



On the other hand, since column vectors would be over-allocated, rows could be inserted (and deleted) at the end, instantly; e.g., a growing time series.





It's filed as an issue: Delete rows by reference.

edited Nov 23 '18 at 12:47 by Henrik
answered May 29 '12 at 0:20 by Matt Dowle
  • Looking forward to this shipping...
    – Sim
    Dec 19 '12 at 6:06
  • @Matthew Dowle Is there any news on this?
    – statquant
    Apr 19 '13 at 16:08
  • @statquant I think I should fix the 37 bugs, and finish fread first. After that it's pretty high.
    – Matt Dowle
    Apr 19 '13 at 18:07
  • @MatthewDowle sure, thanks again for everything you are doing.
    – statquant
    Apr 19 '13 at 18:26
  • It's filed as FR#635
    – Matt Dowle
    Oct 29 '15 at 18:58

The approach that I have taken to make memory use similar to in-place deletion is to subset one column at a time and delete that column from the original. It is not as fast as a proper C memmove solution, but memory use is all I care about here. Something like this:



library(data.table)

DT = data.table(col1 = 1:1e6)
cols = paste0('col', 2:100)
for (col in cols) { DT[, (col) := 1:1e6] }
keep.idxs = sample(1e6, 9e5, FALSE) # keep 90% of entries
DT.subset = data.table(col1 = DT[['col1']][keep.idxs]) # this is the subsetted table
for (col in cols) {
  DT.subset[, (col) := DT[[col]][keep.idxs]] # copy the kept rows of this column
  DT[, (col) := NULL]                        # then delete it from DT
}
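A scaled-down sketch of the same column-at-a-time idea that can be checked against an ordinary subset. It uses set(DT, j = col, value = NULL) for the delete step, along the lines suggested in the comments; the table and keep.idxs are made up for illustration. Because each source column is dropped right after it is copied, the extra memory needed at any moment is roughly one column's worth rather than a whole second table, at the cost of emptying out the source table.

```r
library(data.table)

DT <- data.table(col1 = 1:10, col2 = 11:20, col3 = letters[1:10])
keep.idxs <- c(1L, 3L, 4L, 8L, 10L)
expected <- DT[keep.idxs]   # ordinary (copying) subset, for comparison only

DT.subset <- data.table(col1 = DT[["col1"]][keep.idxs])
for (col in setdiff(names(DT), "col1")) {
  DT.subset[, (col) := DT[[col]][keep.idxs]]  # copy the kept rows of one column
  set(DT, j = col, value = NULL)              # then release that column from DT
}

print(isTRUE(all.equal(DT.subset, expected)))  # TRUE
print(names(DT))                               # "col1": only the first column is left
```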

  • +1 Nice memory efficient approach. So ideally we need to delete a set of rows by reference actually, don't we? I hadn't thought of that. It'll have to be a series of memmoves to budge up the gaps, but that's ok.
    – Matt Dowle
    Jan 21 '14 at 20:50
  • Would this work as a function, or does the use in a function and return force it to make memory copies?
    – russellpierce
    Feb 21 '14 at 16:06
  • it would work in a function, since data.tables are always references.
    – vc273
    Feb 21 '14 at 19:26
  • thanks, nice one. To speed up a little bit (especially with many columns) you can change DT[, col:= NULL, with = F] to set(DT, NULL, col, NULL)
    – Michele
    Jul 7 '14 at 17:13
  • Updating in light of changing idiom and warning "with=FALSE together with := was deprecated in v1.9.4 released Oct 2014. Please wrap the LHS of := with parentheses; e.g., DT[,(myVar):=sum(b),by=a] to assign to column name(s) held in variable myVar. See ?':=' for other examples. As warned in 2014, this is now a warning."
    – Frank
    Nov 18 '16 at 17:39

Here is a working function based on @vc273's answer and @Frank's feedback.



delete <- function(DT, del.idxs) {            # note 'del.idxs' vs. 'keep.idxs'
  keep.idxs <- setdiff(DT[, .I], del.idxs)    # select row indexes to keep
  cols <- names(DT)
  DT.subset <- data.table(DT[[1]][keep.idxs]) # this is the subsetted table
  setnames(DT.subset, cols[1])
  for (col in cols[-1]) {                     # cols[-1] also copes with a one-column table
    DT.subset[, (col) := DT[[col]][keep.idxs]]
    DT[, (col) := NULL]                       # delete from the input by reference
  }
  return(DT.subset)
}


An example of its usage:



dat <- delete(dat,del.idxs)   ## Pls note 'del.idxs' instead of 'keep.idxs'


Where "dat" is a data.table. Removing 14k rows from 1.4M rows takes 0.25 sec on my laptop.



> dim(dat)
[1] 1419393 25
> system.time(dat <- delete(dat,del.idxs))
user system elapsed
0.23 0.02 0.25
> dim(dat)
[1] 1404715 25
>


PS. Since I am new to SO, I could not add a comment to @vc273's thread :-(
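A reproducible check of delete() on the question's DT, as suggested in the comments, using the same function with cols[-1] for the column loop. Note the side effect: because the function removes columns from its input by reference, any variable that still refers to the original table is left with only its first column afterwards.

```r
library(data.table)

delete <- function(DT, del.idxs) {
  keep.idxs <- setdiff(DT[, .I], del.idxs)
  cols <- names(DT)
  DT.subset <- data.table(DT[[1]][keep.idxs])
  setnames(DT.subset, cols[1])
  for (col in cols[-1]) {
    DT.subset[, (col) := DT[[col]][keep.idxs]]
    DT[, (col) := NULL]   # modifies the caller's table in place
  }
  DT.subset
}

DT <- data.table(x = rep(c("a", "b", "c"), each = 3), y = c(1, 3, 6), v = 1:9)
dat <- delete(DT, 1:2)   # drop the first two rows

print(dat$v)       # 3 4 5 6 7 8 9
print(names(DT))   # "x": the input lost its other columns by reference
```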

  • I commented under vc's answer explaining the changed syntax for (col) :=. Kind of odd to have a function named "delete" but an arg related to what to keep. Btw, generally it's preferred to use a reproducible example rather than to show dim for your own data. You could reuse DT from the question, for example.
    – Frank
    Nov 18 '16 at 17:42
  • I don't understand why you do it by reference but later use an assignment dat <-
    – skan
    Jan 10 '17 at 0:16
  • @skan , That assignment assigns "dat" to point to the modified data.table that itself has been created by subsetting the original data.table. The <- assignment does not copy the returned data; it just assigns a new name to it.
    – Jarno P.
    Jan 11 '17 at 2:54
  • @Frank , I have updated the function for the oddity you pointed out.
    – Jarno P.
    Jan 11 '17 at 2:57
  • Ok, thanks. I'm leaving the comment since I still think it's worth noting that showing console output instead of a reproducible example is not encouraged here. Also, a single benchmark isn't so informative. If you also measured the time taken for the subsetting, it'd be more informative (since most of us don't intuitively know how long that takes, much less how long it takes on your comp). Anyway, I don't mean to suggest this is a bad answer; I'm one of its upvoters.
    – Frank
    Jan 11 '17 at 3:20

Instead of trying to set to NULL, try setting to NA (matching the NA type of the first column):



set(DT,1:2, 1:3 ,NA_character_)
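A small sketch of where the NA approach leaves you, on the question's DT. It uses logical NA, which data.table accepts for a column of any type, so one set() call can blank columns of different types. The rows are blanked in place, but they are still stored; actually dropping them still needs a subset, which copies (the concern raised in the comment below).

```r
library(data.table)

DT <- data.table(x = rep(c("a", "b", "c"), each = 3), y = c(1, 3, 6), v = 1:9)
addr <- address(DT)

# Blank rows 1:2 in every column, in place
set(DT, i = 1:2, j = seq_along(DT), value = NA)
blanked_in_place <- identical(address(DT), addr)
rows_after_blanking <- nrow(DT)

# Dropping the blanked rows is an ordinary subset, i.e. a new table
DT <- DT[!is.na(x)]

print(blanked_in_place)     # TRUE
print(rows_after_blanking)  # 9
print(nrow(DT))             # 7
```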

  • yeah, that works I guess. My problem is that I have a lot of data and I want to get rid of exactly those rows with NA, possibly without having to copy DT to get rid of those rows. thanks for your comment anyway!
    – Florian Oswald
    May 29 '12 at 21:48

This topic still interests many people (me included).



How about this? I combined the code described previously with assign(), which replaces the variable in the global environment. It would be better to capture the caller's original environment, but at least in globalenv it is memory efficient and acts like a change by reference.



library(data.table)

delete <- function(DT, del.idxs) {
  varname = deparse(substitute(DT))   # name of the variable that was passed in

  keep.idxs <- setdiff(DT[, .I], del.idxs)
  cols = names(DT)
  DT.subset <- data.table(DT[[1]][keep.idxs])
  setnames(DT.subset, cols[1])

  for (col in cols[-1]) {
    DT.subset[, (col) := DT[[col]][keep.idxs]]
    DT[, (col) := NULL] # delete
  }

  assign(varname, DT.subset, envir = globalenv()) # rebind that name in the global environment
  return(invisible())
}

DT = data.table(x = rep(c("a", "b", "c"), each = 3), y = c(1, 3, 6), v = 1:9)
delete(DT, 3)
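The answer notes that capturing the caller's environment would be better than writing to globalenv(). A sketch of that variant, assuming the table is always passed as a bare variable name (deparse(substitute(DT)) relies on that): it assigns the result into parent.frame(), the environment the call came from, so it also works inside another function. delete_in_caller is a hypothetical name, and like the original it rebinds the name rather than deleting by reference, so the address still changes.

```r
library(data.table)

# Hypothetical helper: same column-at-a-time subset, but the result is
# assigned back into the calling environment instead of globalenv()
delete_in_caller <- function(DT, del.idxs) {
  varname <- deparse(substitute(DT))   # requires a bare variable name
  caller <- parent.frame()             # environment the call came from
  keep.idxs <- setdiff(seq_len(nrow(DT)), del.idxs)
  cols <- names(DT)
  DT.subset <- data.table(DT[[1]][keep.idxs])
  setnames(DT.subset, cols[1])
  for (col in cols[-1]) {
    DT.subset[, (col) := DT[[col]][keep.idxs]]
    set(DT, j = col, value = NULL)     # free the source column right away
  }
  assign(varname, DT.subset, envir = caller)
  invisible(DT.subset)
}

f <- function() {
  DT <- data.table(x = rep(c("a", "b", "c"), each = 3), y = c(1, 3, 6), v = 1:9)
  delete_in_caller(DT, 3)
  DT   # rebound inside f's own environment, not in the global one
}

res <- f()
print(res$v)   # 1 2 4 5 6 7 8 9
```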

  • Just to be clear, this does not delete by reference (based on address(DT); delete(DT, 3); address(DT)), though it may be efficient in some sense.
    – Frank
    Aug 28 '17 at 16:41
  • No it does not. It emulates the behavior and is memory efficient. That's why I said: it acts like. But strictly speaking you're right the address changed.
    – JRR
    Aug 28 '17 at 18:22

Here are some strategies I have used. I believe a .ROW function may be coming. None of the approaches below is fast; they go a little beyond plain subsetting or filtering. I tried to think like a DBA cleaning up data. As noted above, you can select or remove rows in data.table:



library(data.table)
data(iris)
iris <- data.table(iris)

iris[3] # Select row three

iris[-3] # Remove row three

You can also use .SD to select or remove rows:

iris[, .SD[3]] # Select row three

iris[, .SD[3:6], by = .(Species)] # Select rows 3 - 6 for each Species

iris[, .SD[-3]] # Remove row three

iris[, .SD[-3:-6], by = .(Species)] # Remove rows 3 - 6 for each Species


Note: .SD is the subset of the data for the current group, and it allows you to do quite a bit of work in j or in a subsequent chained data.table call. See https://stackoverflow.com/a/47406952/305675. Here I order the irises by decreasing Sepal.Length, keep only rows above a minimum Sepal.Length, select the top three rows (by Sepal.Length) for each Species and return all accompanying columns:



iris[order(-Sepal.Length)][Sepal.Length > 3, .SD[1:3], by = .(Species)]


The approaches above all renumber the remaining rows sequentially after rows are removed. Alternatively, you can transpose a data.table so that the old rows become columns, and then remove or replace those columns. When using ':=NULL' to remove a transposed row, its column name (e.g. V3) is removed as well:



m_iris <- data.table(t(iris))[,V3:=NULL] # V3 column removed

d_iris <- data.table(t(iris))[,V3:=V2] # V3 column replaced with V2


When you transpose back to a data.table, you may want to restore the column names from the original data.table. Note that t() converts every value to character, so all columns come back as character and their original classes have to be restored as well.



m_iris <- data.table(t(m_iris))
setnames(m_iris, names(iris))

d_iris <- data.table(t(d_iris))
setnames(d_iris, names(iris))


You may just want to remove duplicate rows, which you can do with or without a Key column:



d_iris[,Key:=paste0(Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species)]     

d_iris[!duplicated(Key),]

d_iris[!duplicated(paste0(Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species)),]
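data.table also has its own duplicated() and unique() methods with a by= argument, which avoids building a pasted Key column. That also sidesteps a pitfall of paste0() without a separator: different rows can collide, since paste0("1", "11") and paste0("11", "1") both give "111". A sketch on iris, treating every column as part of the duplicate definition:

```r
library(data.table)

DT <- as.data.table(iris)
cols <- names(DT)   # all columns define a duplicate, like the pasted Key

dup <- duplicated(DT, by = cols)   # TRUE for the second and later copies
deduped <- DT[!dup]                # equivalent to unique(DT, by = cols)

print(sum(dup))
print(isTRUE(all.equal(deduped, unique(DT, by = cols))))  # TRUE
```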


It is also possible to add an incremental counter with .I. You can then search for duplicated keys or fields and remove the matching records by their counter value. This is computationally expensive, but it has the advantage that you can print the lines to be removed first.



d_iris[,I:=.I,] # add a counter field

d_iris[,Key:=paste0(Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species)]

for(i in d_iris[duplicated(Key),I]) {print(i)} # See lines with duplicated Key or Field

for(i in d_iris[duplicated(Key),I]) {d_iris <- d_iris[!I == i,]} # Remove lines with duplicated Key or any particular field.
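As pointed out in the comments, which = TRUE returns the matching row numbers directly, so no counter column is needed, and all flagged rows can be dropped in one subset instead of one reassignment (and one copy) per row. A sketch, again on iris:

```r
library(data.table)

DT <- as.data.table(iris)
n_before <- nrow(DT)

# Row numbers of the second and later copies of each duplicated row
dup_rows <- DT[duplicated(DT, by = names(DT)), which = TRUE]
print(dup_rows)   # inspect them before deleting

# One subset removes them all (still a copy, but only one). The length check
# guards against the base-R pitfall where x[-integer(0)] selects nothing.
if (length(dup_rows) > 0L) DT <- DT[-dup_rows]
```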


You can also just fill a row with 0s or NAs and then use an i query to delete them:



 X 
x v foo
1: c 8 4
2: b 7 2

X[1] <- c(0)

X
x v foo
1: 0 0 0
2: b 7 2

X[2] <- c(NA)
X
x v foo
1: 0 0 0
2: NA NA NA

X <- X[x != 0,]
X <- X[!is.na(x),]

  • This doesn't really answer the question (about removal by reference) and using t on a data.frame is usually not a good idea; check str(m_iris) to see that all data has become string/character. Btw, you can also get row numbers by using d_iris[duplicated(Key), which = TRUE] without making a counter column.
    – Frank
    Feb 6 '18 at 20:39
  • Yes, you are right. I don't answer the question specifically. But removing a row by reference doesn't have official functionality or documentation yet and many people are going to come to this post looking for generic functionality to do exactly that. We could create a post to just answer the question on how to remove a row. Stack overflow is very useful and I really understand the necessity to keep answers exact to the question. Sometimes though, I think SO can be just a little fascist in this regard...but maybe there is a good reason for that.
    – rferrisx
    Feb 8 '18 at 18:28
  • Ok, thanks for explaining. I think for now our discussion here is enough of a signpost for anyone who gets confused in this case.
    – Frank
    Feb 8 '18 at 19:18









  • 20




    Looking forward to this shipping...
    – Sim
    Dec 19 '12 at 6:06






  • 1




    @Matthew Dowle Is there some news on this ?
    – statquant
    Apr 19 '13 at 16:08








  • 15




    @statquant I think I should fix the 37 bugs, and finish fread first. After that it's pretty high.
    – Matt Dowle
    Apr 19 '13 at 18:07






  • 15




    @MatthewDowle sure, thanks again for everything you are doing.
    – statquant
    Apr 19 '13 at 18:26






  • 3




    It's filed as FR#635
    – Matt Dowle
    Oct 29 '15 at 18:58



























28














The approach I have taken to keep memory use similar to an in-place deletion is to subset one column at a time, deleting each original column as I go. It is not as fast as a proper C memmove solution, but memory use is all I care about here. Something like this:



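library(data.table)

# Example table: 1e6 rows, 100 integer columns
DT = data.table(col1 = 1:1e6)
cols = paste0('col', 2:100)
for (col in cols) { DT[, (col) := 1:1e6] }

keep.idxs = sample(1e6, 9e5, FALSE)  # keep 90% of the rows (kept rows come out in sampled order)
DT.subset = data.table(col1 = DT[['col1']][keep.idxs])  # this is the subsetted table
for (col in cols) {
  DT.subset[, (col) := DT[[col]][keep.idxs]]  # copy the kept rows of one column
  DT[, (col) := NULL]                         # then delete that column from DT
}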
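The same idea can be wrapped in a function. This is a sketch, not a library routine: `subset_by_columns()` is a hypothetical name. It drops each source column with `set(DT, j = col, value = NULL)`, as suggested in the comments below. The extra memory needed therefore stays at roughly one column rather than a full copy of the table.

Note that the source table is left with only its first column.

```r
library(data.table)

# Hypothetical helper: build the kept-rows table one column at a time,
# freeing each source column (by reference) as soon as it has been copied.
subset_by_columns <- function(DT, keep.idxs) {
  cols <- names(DT)
  out <- data.table(DT[[1L]][keep.idxs])
  setnames(out, cols[1L])
  for (col in cols[-1L]) {
    out[, (col) := DT[[col]][keep.idxs]]  # add the subsetted column to out
    set(DT, j = col, value = NULL)         # delete it from DT by reference
  }
  out
}

DT <- data.table(x = rep(c("a", "b", "c"), each = 3), y = c(1, 3, 6), v = 1:9)
kept <- subset_by_columns(DT, 3:9)  # drop rows 1 and 2
print(kept)
```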













edited Nov 18 '16 at 17:39









Frank

53.7k653127














answered Jan 21 '14 at 18:39









vc273

49449












  • 5




    +1 Nice memory efficient approach. So ideally we need to delete a set of rows by reference actually don't we, I hadn't thought of that. It'll have to be a series of memmoves to budge up the gaps, but that's ok.
    – Matt Dowle
    Jan 21 '14 at 20:50










  • Would this work as a function, or does the use in a function and return force it to make memory copies?
    – russellpierce
    Feb 21 '14 at 16:06






  • 1




    it would work in a function, since data.tables are always references.
    – vc273
    Feb 21 '14 at 19:26






  • 1




    thanks, nice one. To speed up a little bit (especially with many columns) you change DT[, col:= NULL, with = F] in set(DT, NULL, col, NULL)
    – Michele
    Jul 7 '14 at 17:13








  • 2




    Updating in light of changing idiom and warning "with=FALSE together with := was deprecated in v1.9.4 released Oct 2014. Please wrap the LHS of := with parentheses; e.g., DT[,(myVar):=sum(b),by=a] to assign to column name(s) held in variable myVar. See ?':=' for other examples. As warned in 2014, this is now a warning."
    – Frank
    Nov 18 '16 at 17:39

























5














Here is a working function based on @vc273's answer and @Frank's feedback.



delete <- function(DT, del.idxs) {             # note: 'del.idxs' are rows to delete, not to keep
  keep.idxs <- setdiff(DT[, .I], del.idxs)     # row indexes to keep
  cols <- names(DT)
  DT.subset <- data.table(DT[[1]][keep.idxs])  # this is the subsetted table
  setnames(DT.subset, cols[1])
  for (col in cols[-1]) {                      # cols[-1] also handles a one-column table
    DT.subset[, (col) := DT[[col]][keep.idxs]]
    DT[, (col) := NULL]                        # delete the column from the original
  }
  return(DT.subset)
}


An example of its usage:



dat <- delete(dat, del.idxs)   # note: 'del.idxs' (rows to delete) rather than 'keep.idxs'


Here dat is a data.table. On my laptop, removing about 14.7k rows from 1.4M rows takes 0.25 seconds:



> dim(dat)
[1] 1419393 25
> system.time(dat <- delete(dat,del.idxs))
user system elapsed
0.23 0.02 0.25
> dim(dat)
[1] 1404715 25
>
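Below is a reproducible run on the DT from the question, as suggested in the comments. The function is repeated so the block runs on its own. The run shows why the result must be assigned with `<-`: the function empties the original object of every column except the first, so the name has to be rebound to the returned table. `old` is only an illustrative second name for the original object.

```r
library(data.table)

# Same approach as delete() above, repeated so this block runs on its own
delete <- function(DT, del.idxs) {
  keep.idxs <- setdiff(DT[, .I], del.idxs)
  cols <- names(DT)
  DT.subset <- data.table(DT[[1L]][keep.idxs])
  setnames(DT.subset, cols[1L])
  for (col in cols[-1L]) {
    DT.subset[, (col) := DT[[col]][keep.idxs]]
    DT[, (col) := NULL]
  }
  DT.subset
}

DT <- data.table(x = rep(c("a", "b", "c"), each = 3), y = c(1, 3, 6), v = 1:9)
old <- DT                # a second name for the same (original) object
DT <- delete(DT, 1:2)    # rebind DT to the new, smaller table
print(DT)
print(names(old))        # the original object kept only its first column
```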


PS. Since I am new to SO, I could not add a comment to @vc273's answer :-(














edited Nov 19 '16 at 7:16

























answered Nov 18 '16 at 8:29









Jarno P.

5113
















  • I commented under vc's answer explaining the changed syntax for (col) :=. Kind of odd to have a function named "delete" but an arg related to what to keep. Btw, generally it's preferred to use a reproducible example rather than to show dim for your own data. You could reuse DT from the question, for example.
    – Frank
    Nov 18 '16 at 17:42










  • I don't understand why you do it by reference but later use an assignment dat <-
    – skan
    Jan 10 '17 at 0:16






  • 1




    @skan, that assignment makes "dat" point to the modified data.table, which was itself created by subsetting the original data.table. The <- assignment does not copy the returned data; it just assigns a new name to it. link
    – Jarno P.
    Jan 11 '17 at 2:54












  • @Frank , I have updated the function for the oddity you pointed out.
    – Jarno P.
    Jan 11 '17 at 2:57










  • Ok, thanks. I'm leaving the comment since I still think it's worth noting that showing console output instead of a reproducible example is not encouraged here. Also, a single benchmark isn't so informative. If you also measured the time taken for the subsetting, it'd be more informative (since most of us don't intuitively know how long that takes, much less how long it takes on your comp). Anyway, I don't mean to suggest this is a bad answer; I'm one of its upvoters.
    – Frank
    Jan 11 '17 at 3:20

































4














Instead of trying to set to NULL, try setting to NA (matching the NA type of the first column):



set(DT, 1:2, 1:3, NA_character_)
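The columns of DT have different types (character, double, integer), so a single `NA_character_` only matches the first one. Below is a minimal sketch that writes a type-matched NA into each column by reference. It uses `DT[[j]][NA_integer_]` to get an NA of that column's type. The rows are still present afterwards, so removing them (here with `na.omit()`) still makes a copy, which is the concern raised in the comment.

```r
library(data.table)

DT <- data.table(x = rep(c("a", "b", "c"), each = 3), y = c(1, 3, 6), v = 1:9)

# Blank rows 1 and 2 in place, one column at a time, with an NA of the
# column's own type (indexing a vector with NA_integer_ returns such an NA).
for (j in seq_along(DT)) {
  set(DT, i = 1:2, j = j, value = DT[[j]][NA_integer_])
}
rows_after_blanking <- nrow(DT)  # unchanged: nothing has been deleted yet

# Dropping the NA rows is an ordinary subset, i.e. a new (copied) table
DT <- na.omit(DT)
print(DT)
```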















answered May 28 '12 at 22:33









42-

211k14250396












  • 3




    yeah, that works I guess. My problem is that I have a lot of data and I want to get rid of exactly those rows with NA, possibly without having to copy DT to get rid of those rows. thanks for your comment anyway!
    – Florian Oswald
    May 29 '12 at 21:48

























3














The topic still interests many people (me included).



What about this? I combined the code described previously with assign() so that the result replaces the variable in the global environment. It would be better to capture the caller's environment, but at least in the global environment it is memory-efficient and acts like a change by reference.



delete <- function(DT, del.idxs) {
  varname <- deparse(substitute(DT))          # name of the variable passed in

  keep.idxs <- setdiff(DT[, .I], del.idxs)    # row indexes to keep
  cols <- names(DT)
  DT.subset <- data.table(DT[[1]][keep.idxs])
  setnames(DT.subset, cols[1])

  for (col in cols[-1]) {                     # cols[-1] also handles a one-column table
    DT.subset[, (col) := DT[[col]][keep.idxs]]
    DT[, (col) := NULL]                       # delete
  }

  assign(varname, DT.subset, envir = globalenv())  # rebind the name in the global environment
  return(invisible())
}

DT = data.table(x = rep(c("a", "b", "c"), each = 3), y = c(1, 3, 6), v = 1:9)
delete(DT, 3)
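Capturing the caller's environment, as the answer suggests, can be sketched with `parent.frame()` instead of `globalenv()`. This is an assumption-labelled variant: `delete_in_caller()` and `run_demo()` are hypothetical names used only for illustration. As noted in the comments, the variable ends up bound to a new object (its address changes), so this emulates deletion by reference rather than performing it.

```r
library(data.table)

# Hypothetical variant of delete() above: rebinds the variable in the
# caller's environment (parent.frame()) instead of always in globalenv().
delete_in_caller <- function(DT, del.idxs) {
  varname <- deparse(substitute(DT))         # name used by the caller
  keep.idxs <- setdiff(seq_len(nrow(DT)), del.idxs)
  cols <- names(DT)
  DT.subset <- data.table(DT[[1L]][keep.idxs])
  setnames(DT.subset, cols[1L])
  for (col in cols[-1L]) {
    DT.subset[, (col) := DT[[col]][keep.idxs]]
    DT[, (col) := NULL]                      # free the source column in place
  }
  assign(varname, DT.subset, envir = parent.frame())
  invisible(NULL)
}

# Called from inside another function, it updates that function's DT
run_demo <- function() {
  DT <- data.table(x = rep(c("a", "b", "c"), each = 3), y = c(1, 3, 6), v = 1:9)
  addr_before <- address(DT)
  delete_in_caller(DT, 3)
  list(DT = DT, same_address = identical(addr_before, address(DT)))
}

res <- run_demo()
print(res$DT)
```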















answered Aug 27 '17 at 21:52









JRR

9321220
















  • Just to be clear, this does not delete by reference (based on address(DT); delete(DT, 3); address(DT)), though it may be efficient in some sense.
    – Frank
    Aug 28 '17 at 16:41






  • 1




    No it does not. It emulates the behavior and is memory efficient. That's why I said: it acts like. But strictly speaking you're right the address changed.
    – JRR
    Aug 28 '17 at 18:22





























1














Here are some strategies I have used. I believe a .ROW function may be coming. None of the approaches below are fast. They go a little beyond subsetting or filtering: I tried to think like a DBA who is just trying to clean up data. As noted above, you can select or remove rows in a data.table:



library(data.table)
data(iris)
iris <- data.table(iris)

iris[3]  # Select row three

iris[-3] # Remove row three
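The same negative-index and logical-i forms cover the question's SQL comparison (DELETE FROM table WHERE column = value). A minimal sketch, assuming the built-in iris data converted with as.data.table(); every line returns a new data.table, so the result has to be assigned back and nothing is deleted by reference.

```r
library(data.table)

iris_dt <- as.data.table(iris)

# Drop rows by position, as with a data.frame
no_row3 <- iris_dt[-3]

# Drop rows by condition: keep everything that does NOT match the WHERE clause
no_setosa <- iris_dt[Species != "setosa"]

# data.table's "!" prefix in i negates a logical condition in the same way
no_setosa2 <- iris_dt[!(Species == "setosa")]
```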

You can also use .SD to select or remove rows:

iris[, .SD[3]] # Select row three

iris[, .SD[3:6], by = .(Species)] # Select rows 3-6 for each Species

iris[, .SD[-3]] # Remove row three

iris[, .SD[-3:-6], by = .(Species)] # Remove rows 3-6 for each Species
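A commonly used alternative to .SD[...] with by= is to collect the matching row numbers per group with .I and subset the whole table once. A sketch, assuming iris converted with as.data.table() and that every group has at least six rows; unlike the by= version, the result keeps the original column order instead of moving Species to the front.

```r
library(data.table)

iris_dt <- as.data.table(iris)

# Row numbers (in the full table) of rows 3-6 within each Species;
# assumes each group has at least 6 rows, otherwise .I[3:6] contains NA
idx <- iris_dt[, .I[3:6], by = Species]$V1

keep_3_6 <- iris_dt[idx]   # select rows 3-6 of each Species
drop_3_6 <- iris_dt[-idx]  # remove rows 3-6 of each Species
```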


Note: .SD holds the Subset of Data for the current group (by default all columns except the by= columns), which lets you do quite a bit of work in j or in a chained data.table call. See https://stackoverflow.com/a/47406952/305675. Here I order the irises by Sepal.Length (descending), keep rows above a minimum Sepal.Length, select the top three rows (by Sepal.Length) within each Species, and return all accompanying data:



iris[order(-Sepal.Length)][Sepal.Length > 3, .SD[1:3], by = .(Species)]
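One caveat with .SD[1:3]: a group with fewer than three rows is padded with NA rows, because out-of-range row numbers return NA. head(.SD, 3) returns only the rows that exist. The sketch below assumes iris converted with as.data.table() and an arbitrary threshold of 6.9, chosen so that versicolor keeps a single row.

```r
library(data.table)

iris_dt <- as.data.table(iris)
by_len  <- iris_dt[order(-Sepal.Length)]

# Top three rows by Sepal.Length within each Species, among rows above the threshold
padded <- by_len[Sepal.Length > 6.9, .SD[1:3], by = Species]      # NA rows for short groups
top3   <- by_len[Sepal.Length > 6.9, head(.SD, 3), by = Species]  # no padding
```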


With all of the approaches above, the remaining rows are renumbered sequentially after a removal. Alternatively, you can transpose a data.table and remove or replace the old rows, which are now columns. When you use := NULL to remove a transposed row, its column name goes with it, so the remaining columns keep their original names (V1, V2, V4, ...):



m_iris <- data.table(t(iris))[,V3:=NULL] # V3 column removed

d_iris <- data.table(t(iris))[,V3:=V2] # V3 column replaced with V2


t() goes through a matrix, so after transposing every column is character (this comes from t(), not from := NULL). When you transpose back to a data.table, you will want to restore the names from the original data.table and, especially after a deletion, convert the column classes back:



m_iris <- data.table(t(m_iris)) # transpose back; all columns are still character
setnames(m_iris, names(iris))

d_iris <- data.table(t(d_iris))
setnames(d_iris, names(iris))
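A round trip through t() can be checked end to end. This sketch, assuming iris converted with as.data.table(), drops original row 3 while transposed and then restores the names and classes; the restore step (as.numeric() on the numeric columns and factor() with the original levels for Species) is one way of doing it, not the only one.

```r
library(data.table)

iris_dt <- as.data.table(iris)

# t() goes through a matrix, so every value becomes character
tt <- data.table(t(iris_dt))      # 5 rows x 150 columns
set(tt, j = 3L, value = NULL)     # drop original row 3 (now column 3) by reference

back <- data.table(t(tt))         # 149 rows x 5 columns, still all character
setnames(back, names(iris_dt))

# Restore the original classes: numeric columns first, then the Species factor
num_cols <- names(iris_dt)[vapply(iris_dt, is.numeric, logical(1))]
for (col in num_cols) set(back, j = col, value = as.numeric(back[[col]]))
back[, Species := factor(Species, levels = levels(iris_dt$Species))]
```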


You may just want to remove duplicate rows, which you can do with or without a helper Key column:



d_iris[, Key := paste(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Species, sep = "|")] # "|" keeps e.g. 1,12 and 11,2 from both giving "112"

d_iris[!duplicated(Key)]

d_iris[!duplicated(paste(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Species, sep = "|"))]
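data.table also has its own duplicated() and unique() methods, which compare whole rows (or only the columns named in by=) without building a helper key. A sketch, assuming iris converted with as.data.table(); iris contains exactly one fully duplicated row.

```r
library(data.table)

iris_dt <- as.data.table(iris)

dedup_all  <- unique(iris_dt)                                   # drop rows duplicated on all columns
same_rows  <- iris_dt[!duplicated(iris_dt)]                     # the same, spelled with duplicated()
dedup_cols <- unique(iris_dt, by = c("Sepal.Length", "Species")) # duplicates on a subset of columns
```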


You can also add an incremental counter with .I, search for duplicated keys or fields, and remove the matching records by their counter. This is computationally expensive, but it has the advantage that you can print the lines before removing them.



d_iris[, I := .I] # add a counter field

d_iris[, Key := paste(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Species, sep = "|")]

for (i in d_iris[duplicated(Key), I]) print(i) # See lines with a duplicated Key or field

for (i in d_iris[duplicated(Key), I]) d_iris <- d_iris[I != i] # Remove lines with a duplicated Key or any particular field
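Following Frank's comment, which = TRUE returns the row numbers directly, so no counter column is needed, and the rows can then be removed in one vectorized step instead of a loop. A sketch, assuming iris converted with as.data.table() and a Key built with a "|" separator (my choice, not part of the original).

```r
library(data.table)

d <- as.data.table(iris)
# Helper key; the separator keeps different rows from producing the same string
d[, Key := paste(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Species, sep = "|")]

# Row numbers of rows whose Key already appeared earlier
dup_rows <- d[duplicated(Key), which = TRUE]
print(dup_rows)  # inspect before deleting

# Remove them all at once. Guard the empty case: -integer(0) is just integer(0),
# which would select zero rows instead of all of them.
if (length(dup_rows) > 0L) d <- d[-dup_rows]
d[, Key := NULL]
```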


You can also fill a row with 0s or NAs and then use an i query to delete it:



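X <- data.table(x = c("c", "b"), v = 8:7, foo = c(4, 2)) # X from example(data.table)
X
#    x v foo
# 1: c 8 4
# 2: b 7 2

X[1] <- 0 # data.table's [<- treats 1 as the row: every column of row 1 becomes 0
X
#    x v foo
# 1: 0 0 0
# 2: b 7 2

X[2] <- NA
X
#    x  v  foo
# 1: 0  0  0
# 2: NA NA NA

X <- X[x != 0,]    # drops the 0 row; the NA row goes too, since data.table treats NA in i as FALSE
X <- X[!is.na(x),] # drops NA rows explicitly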
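A variant that avoids writing 0 or NA into every column (which forces each value into that column's type, e.g. 0 into the character column x) is to flag rows in an extra logical column by reference and filter on it. A minimal sketch, assuming the X from example(data.table) and a hypothetical helper column named to_delete; the final filter still copies, so this is not a deletion by reference either.

```r
library(data.table)

# X as in example(data.table)
X <- data.table(x = c("c", "b"), v = 8:7, foo = c(4, 2))

# Flag the rows to delete by reference; the existing columns keep their types
X[, to_delete := FALSE]
X[1, to_delete := TRUE]

# Filtering on the flag builds a new data.table; then drop the helper column
X <- X[to_delete == FALSE][, to_delete := NULL]
```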





























edited Feb 8 '18 at 18:25

























answered Jan 29 '18 at 1:47









rferrisx













  • This doesn't really answer the question (about removal by reference) and using t on a data.frame is usually not a good idea; check str(m_iris) to see that all data has become string/character. Btw, you can also get row numbers by using d_iris[duplicated(Key), which = TRUE] without making a counter column.
    – Frank
    Feb 6 '18 at 20:39










  • Yes, you are right. I don't answer the question specifically. But removing a row by reference doesn't have official functionality or documentation yet, and many people are going to come to this post looking for generic functionality to do exactly that. We could create a post that just answers the question of how to remove a row. Stack Overflow is very useful and I really understand the necessity to keep answers exact to the question. Sometimes, though, I think SO can be just a little fascist in this regard...but maybe there is a good reason for that.
    – rferrisx
    Feb 8 '18 at 18:28










  • Ok, thanks for explaining. I think for now our discussion here is enough of a signpost for anyone who gets confused in this case.
    – Frank
    Feb 8 '18 at 19:18

















