Subset a dataframe in two dataframes by values in two columns of another dataframe [duplicate]
up vote
1
down vote
favorite
This question already has an answer here:
Filter by ranges supplied by two vectors, without a join operation
2 answers
Efficient way to filter one data frame by ranges in another
3 answers
Subset by multiple ranges [duplicate]
3 answers
I have two dataframes.
df1 looks like (or the column of df1 i am interested in):
position
2
6
12
18
25
31
and df2 looks like:
start end
2 17
24 29
I want to keep the positions in df1 that only fall between (<= or >=) the start and end coordinates of df2, so that df1 looks like this after filtering:
position
2
6
12
25
Then I want to keep the filtered out "leftover" values of df1 in another dataframe, let's call it df4.
df4 would look like:
position
18
31
I can do this the perl way using a for loop (coming from perl and currently learning R) but I am pretty sure I can somehow use filter or some other dplyr or base R combination of functions to achieve this.
Any help would be appreciated!
EDIT: Added df4 calculation as my question was marked as duplicate and this is sth not found in the other similar threads. This is something I am interested in doing to make my code faster!
r dplyr
marked as duplicate by Henrik
StackExchange.ready(function() {
if (StackExchange.options.isMobile) return;
$('.dupe-hammer-message-hover:not(.hover-bound)').each(function() {
var $hover = $(this).addClass('hover-bound'),
$msg = $hover.siblings('.dupe-hammer-message');
$hover.hover(
function() {
$hover.showInfoMessage('', {
messageElement: $msg.clone().show(),
transient: false,
position: { my: 'bottom left', at: 'top center', offsetTop: -7 },
dismissable: false,
relativeToBody: true
});
},
function() {
StackExchange.helpers.removeMessages();
}
);
});
});
Nov 22 at 18:05
This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.
add a comment |
up vote
1
down vote
favorite
This question already has an answer here:
Filter by ranges supplied by two vectors, without a join operation
2 answers
Efficient way to filter one data frame by ranges in another
3 answers
Subset by multiple ranges [duplicate]
3 answers
I have two dataframes.
df1 looks like (or the column of df1 i am interested in):
position
2
6
12
18
25
31
and df2 looks like:
start end
2 17
24 29
I want to keep the positions in df1 that only fall between (<= or >=) the start and end coordinates of df2, so that df1 looks like this after filtering:
position
2
6
12
25
Then I want to keep the filtered out "leftover" values of df1 in another dataframe, let's call it df4.
df4 would look like:
position
18
31
I can do this the perl way using a for loop (coming from perl and currently learning R) but I am pretty sure I can somehow use filter or some other dplyr or base R combination of functions to achieve this.
Any help would be appreciated!
EDIT: Added df4 calculation as my question was marked as duplicate and this is sth not found in the other similar threads. This is something I am interested in doing to make my code faster!
r dplyr
marked as duplicate by Henrik
StackExchange.ready(function() {
if (StackExchange.options.isMobile) return;
$('.dupe-hammer-message-hover:not(.hover-bound)').each(function() {
var $hover = $(this).addClass('hover-bound'),
$msg = $hover.siblings('.dupe-hammer-message');
$hover.hover(
function() {
$hover.showInfoMessage('', {
messageElement: $msg.clone().show(),
transient: false,
position: { my: 'bottom left', at: 'top center', offsetTop: -7 },
dismissable: false,
relativeToBody: true
});
},
function() {
StackExchange.helpers.removeMessages();
}
);
});
});
Nov 22 at 18:05
This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.
you mean 18, 31 for df4
– Andre Elrico
Nov 23 at 12:56
Yes, corrected this!
– RacktheMan
Nov 23 at 13:36
add a comment |
up vote
1
down vote
favorite
up vote
1
down vote
favorite
This question already has an answer here:
Filter by ranges supplied by two vectors, without a join operation
2 answers
Efficient way to filter one data frame by ranges in another
3 answers
Subset by multiple ranges [duplicate]
3 answers
I have two dataframes.
df1 looks like (or the column of df1 i am interested in):
position
2
6
12
18
25
31
and df2 looks like:
start end
2 17
24 29
I want to keep the positions in df1 that only fall between (<= or >=) the start and end coordinates of df2, so that df1 looks like this after filtering:
position
2
6
12
25
Then I want to keep the filtered out "leftover" values of df1 in another dataframe, let's call it df4.
df4 would look like:
position
18
31
I can do this the perl way using a for loop (coming from perl and currently learning R) but I am pretty sure I can somehow use filter or some other dplyr or base R combination of functions to achieve this.
Any help would be appreciated!
EDIT: Added df4 calculation as my question was marked as duplicate and this is sth not found in the other similar threads. This is something I am interested in doing to make my code faster!
r dplyr
This question already has an answer here:
Filter by ranges supplied by two vectors, without a join operation
2 answers
Efficient way to filter one data frame by ranges in another
3 answers
Subset by multiple ranges [duplicate]
3 answers
I have two dataframes.
df1 looks like (or the column of df1 i am interested in):
position
2
6
12
18
25
31
and df2 looks like:
start end
2 17
24 29
I want to keep the positions in df1 that only fall between (<= or >=) the start and end coordinates of df2, so that df1 looks like this after filtering:
position
2
6
12
25
Then I want to keep the filtered out "leftover" values of df1 in another dataframe, let's call it df4.
df4 would look like:
position
18
31
I can do this the perl way using a for loop (coming from perl and currently learning R) but I am pretty sure I can somehow use filter or some other dplyr or base R combination of functions to achieve this.
Any help would be appreciated!
EDIT: Added df4 calculation as my question was marked as duplicate and this is sth not found in the other similar threads. This is something I am interested in doing to make my code faster!
This question already has an answer here:
Filter by ranges supplied by two vectors, without a join operation
2 answers
Efficient way to filter one data frame by ranges in another
3 answers
Subset by multiple ranges [duplicate]
3 answers
r dplyr
r dplyr
edited Nov 23 at 13:36
asked Nov 22 at 17:20
RacktheMan
84
84
marked as duplicate by Henrik
StackExchange.ready(function() {
if (StackExchange.options.isMobile) return;
$('.dupe-hammer-message-hover:not(.hover-bound)').each(function() {
var $hover = $(this).addClass('hover-bound'),
$msg = $hover.siblings('.dupe-hammer-message');
$hover.hover(
function() {
$hover.showInfoMessage('', {
messageElement: $msg.clone().show(),
transient: false,
position: { my: 'bottom left', at: 'top center', offsetTop: -7 },
dismissable: false,
relativeToBody: true
});
},
function() {
StackExchange.helpers.removeMessages();
}
);
});
});
Nov 22 at 18:05
This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.
marked as duplicate by Henrik
StackExchange.ready(function() {
if (StackExchange.options.isMobile) return;
$('.dupe-hammer-message-hover:not(.hover-bound)').each(function() {
var $hover = $(this).addClass('hover-bound'),
$msg = $hover.siblings('.dupe-hammer-message');
$hover.hover(
function() {
$hover.showInfoMessage('', {
messageElement: $msg.clone().show(),
transient: false,
position: { my: 'bottom left', at: 'top center', offsetTop: -7 },
dismissable: false,
relativeToBody: true
});
},
function() {
StackExchange.helpers.removeMessages();
}
);
});
});
Nov 22 at 18:05
This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.
you mean 18, 31 for df4
– Andre Elrico
Nov 23 at 12:56
Yes, corrected this!
– RacktheMan
Nov 23 at 13:36
add a comment |
you mean 18, 31 for df4
– Andre Elrico
Nov 23 at 12:56
Yes, corrected this!
– RacktheMan
Nov 23 at 13:36
you mean 18, 31 for df4
– Andre Elrico
Nov 23 at 12:56
you mean 18, 31 for df4
– Andre Elrico
Nov 23 at 12:56
Yes, corrected this!
– RacktheMan
Nov 23 at 13:36
Yes, corrected this!
– RacktheMan
Nov 23 at 13:36
add a comment |
5 Answers
5
active
oldest
votes
up vote
0
down vote
accepted
We can full_join
these two data frames and then filter for the rows within the start
and end
column. The Flag
column in the example is just for the join. Finally, we can use distinct
to remove duplicated rows.
library(dplyr)
df3 <- df1 %>%
mutate(Flag = 1) %>%
full_join(df2 %>% mutate(Flag = 1), by = "Flag") %>%
filter(position >= start, position <= end) %>%
distinct(position)
df3
# position
# 1 3
# 2 6
# 3 12
# 4 25
DATA
df1 <- read.table(text = "position
3
6
12
18
25
31", header = TRUE)
df2 <- read.table(text = "start end
2 17
24 29",
header = TRUE)
Hello www, this an easy for me to understand dplyr solution! I am more familiar with dplyr than base r, so thanx for the great response! Also, this answer provides back a dataframe as I asked for! Is there a way to keep what is filtered out in a df4 without having to repeat this code? (See question EDIT)
– RacktheMan
Nov 23 at 11:18
I found that out I think. I could just use:df4<-anti_join(df1, df3, by="position")!
– RacktheMan
Nov 23 at 13:04
add a comment |
up vote
3
down vote
Single line, simple base solution:
df1[df1$position %in% unlist(apply(df2,1,function(x) x["start"]:x["end"])),]
The apply simply generates a vector of all the cases that fall between starts and ends.
After a quick check, this seems to be the fastest solution.
– lith
Nov 26 at 8:47
add a comment |
up vote
2
down vote
Here is a base R
option
do.call(rbind, Map(function(i, j)
df1[df1$position > i & df1$position < j, , drop = FALSE],
df2$start, df2$end))
# position
#1 3
#2 6
#3 12
#5 25
Or using fuzzy_join
library(fuzzyjoin)
library(dplyr)
fuzzy_inner_join(df1, df2, by = c('position' = 'start', 'position' = 'end'),
match_fun = list(`>`, `<`)) %>%
select(position)
# position
#1 3
#2 6
#3 12
#4 25
Or use a non-equi join from data.table
setDT(df2)[df1, on = .(start < position, end > position), .(position), nomatch = 0]
# position
#1: 3
#2: 6
#3: 12
#4: 25
data
df1 <- structure(list(position = c(3L, 6L, 12L, 18L, 25L, 31L)), row.names = c(NA,
-6L), class = "data.frame")
df2 <- structure(list(start = c(2L, 24L), end = c(17L, 29L)),
class = "data.frame", row.names = c(NA, -2L))
add a comment |
up vote
1
down vote
base R
solution (no packages needed)
keepRows<-
sapply(df1$position, function(x_o) {
any(apply(df2, 1, function(x) {x_o => x[1] & x_o <= x[2]}))
})
df1[keepRows,, drop = FALSE]
Result:
# position
#1 3
#2 6
#3 12
#5 25
Please note:
This is basically a double loop, I don't know how else to solve this with
base::
.Should the border values be included? You are very vague with "between". Currently, I don't include them. You can change that using
<=, >=
.
To get the "leftover" use negation:
df1[!keepRows,, drop = FALSE]
Made my question more clear upon your suggestion.
– RacktheMan
Nov 23 at 11:19
add a comment |
up vote
0
down vote
Here is another take that starts with df2 (I don't say this is wiser than Andre's approach):
subset(df1, apply(apply(df2, 1, function (x) {dplyr::between(df1$position, x["start"], x["end"])}), 1, any))
You should probably run some benchmarks on the proposed approaches before making a decision.
@AndreElrico Something like DT sometimes has to convert the data. This doesn't make it the ideal solution for all kind of problems. It's okay if the data already is a data.table though.
– lith
Nov 26 at 8:49
@AndreElrico Since I don't usedata.table
the function I use actually is fromdplyr
. Thanks for the hint though. I forgot it isn't in base.
– lith
Nov 26 at 15:09
add a comment |
5 Answers
5
active
oldest
votes
5 Answers
5
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
0
down vote
accepted
We can full_join
these two data frames and then filter for the rows within the start
and end
column. The Flag
column in the example is just for the join. Finally, we can use distinct
to remove duplicated rows.
library(dplyr)
df3 <- df1 %>%
mutate(Flag = 1) %>%
full_join(df2 %>% mutate(Flag = 1), by = "Flag") %>%
filter(position >= start, position <= end) %>%
distinct(position)
df3
# position
# 1 3
# 2 6
# 3 12
# 4 25
DATA
df1 <- read.table(text = "position
3
6
12
18
25
31", header = TRUE)
df2 <- read.table(text = "start end
2 17
24 29",
header = TRUE)
Hello www, this an easy for me to understand dplyr solution! I am more familiar with dplyr than base r, so thanx for the great response! Also, this answer provides back a dataframe as I asked for! Is there a way to keep what is filtered out in a df4 without having to repeat this code? (See question EDIT)
– RacktheMan
Nov 23 at 11:18
I found that out I think. I could just use:df4<-anti_join(df1, df3, by="position")!
– RacktheMan
Nov 23 at 13:04
add a comment |
up vote
0
down vote
accepted
We can full_join
these two data frames and then filter for the rows within the start
and end
column. The Flag
column in the example is just for the join. Finally, we can use distinct
to remove duplicated rows.
library(dplyr)
df3 <- df1 %>%
mutate(Flag = 1) %>%
full_join(df2 %>% mutate(Flag = 1), by = "Flag") %>%
filter(position >= start, position <= end) %>%
distinct(position)
df3
# position
# 1 3
# 2 6
# 3 12
# 4 25
DATA
df1 <- read.table(text = "position
3
6
12
18
25
31", header = TRUE)
df2 <- read.table(text = "start end
2 17
24 29",
header = TRUE)
Hello www, this an easy for me to understand dplyr solution! I am more familiar with dplyr than base r, so thanx for the great response! Also, this answer provides back a dataframe as I asked for! Is there a way to keep what is filtered out in a df4 without having to repeat this code? (See question EDIT)
– RacktheMan
Nov 23 at 11:18
I found that out I think. I could just use:df4<-anti_join(df1, df3, by="position")!
– RacktheMan
Nov 23 at 13:04
add a comment |
up vote
0
down vote
accepted
up vote
0
down vote
accepted
We can full_join
these two data frames and then filter for the rows within the start
and end
column. The Flag
column in the example is just for the join. Finally, we can use distinct
to remove duplicated rows.
library(dplyr)
df3 <- df1 %>%
mutate(Flag = 1) %>%
full_join(df2 %>% mutate(Flag = 1), by = "Flag") %>%
filter(position >= start, position <= end) %>%
distinct(position)
df3
# position
# 1 3
# 2 6
# 3 12
# 4 25
DATA
df1 <- read.table(text = "position
3
6
12
18
25
31", header = TRUE)
df2 <- read.table(text = "start end
2 17
24 29",
header = TRUE)
We can full_join
these two data frames and then filter for the rows within the start
and end
column. The Flag
column in the example is just for the join. Finally, we can use distinct
to remove duplicated rows.
library(dplyr)
df3 <- df1 %>%
mutate(Flag = 1) %>%
full_join(df2 %>% mutate(Flag = 1), by = "Flag") %>%
filter(position >= start, position <= end) %>%
distinct(position)
df3
# position
# 1 3
# 2 6
# 3 12
# 4 25
DATA
df1 <- read.table(text = "position
3
6
12
18
25
31", header = TRUE)
df2 <- read.table(text = "start end
2 17
24 29",
header = TRUE)
answered Nov 22 at 17:31
www
25.7k102240
25.7k102240
Hello www, this an easy for me to understand dplyr solution! I am more familiar with dplyr than base r, so thanx for the great response! Also, this answer provides back a dataframe as I asked for! Is there a way to keep what is filtered out in a df4 without having to repeat this code? (See question EDIT)
– RacktheMan
Nov 23 at 11:18
I found that out I think. I could just use:df4<-anti_join(df1, df3, by="position")!
– RacktheMan
Nov 23 at 13:04
add a comment |
Hello www, this an easy for me to understand dplyr solution! I am more familiar with dplyr than base r, so thanx for the great response! Also, this answer provides back a dataframe as I asked for! Is there a way to keep what is filtered out in a df4 without having to repeat this code? (See question EDIT)
– RacktheMan
Nov 23 at 11:18
I found that out I think. I could just use:df4<-anti_join(df1, df3, by="position")!
– RacktheMan
Nov 23 at 13:04
Hello www, this an easy for me to understand dplyr solution! I am more familiar with dplyr than base r, so thanx for the great response! Also, this answer provides back a dataframe as I asked for! Is there a way to keep what is filtered out in a df4 without having to repeat this code? (See question EDIT)
– RacktheMan
Nov 23 at 11:18
Hello www, this an easy for me to understand dplyr solution! I am more familiar with dplyr than base r, so thanx for the great response! Also, this answer provides back a dataframe as I asked for! Is there a way to keep what is filtered out in a df4 without having to repeat this code? (See question EDIT)
– RacktheMan
Nov 23 at 11:18
I found that out I think. I could just use:df4<-anti_join(df1, df3, by="position")!
– RacktheMan
Nov 23 at 13:04
I found that out I think. I could just use:df4<-anti_join(df1, df3, by="position")!
– RacktheMan
Nov 23 at 13:04
add a comment |
up vote
3
down vote
Single line, simple base solution:
df1[df1$position %in% unlist(apply(df2,1,function(x) x["start"]:x["end"])),]
The apply simply generates a vector of all the cases that fall between starts and ends.
After a quick check, this seems to be the fastest solution.
– lith
Nov 26 at 8:47
add a comment |
up vote
3
down vote
Single line, simple base solution:
df1[df1$position %in% unlist(apply(df2,1,function(x) x["start"]:x["end"])),]
The apply simply generates a vector of all the cases that fall between starts and ends.
After a quick check, this seems to be the fastest solution.
– lith
Nov 26 at 8:47
add a comment |
up vote
3
down vote
up vote
3
down vote
Single line, simple base solution:
df1[df1$position %in% unlist(apply(df2,1,function(x) x["start"]:x["end"])),]
The apply simply generates a vector of all the cases that fall between starts and ends.
Single line, simple base solution:
df1[df1$position %in% unlist(apply(df2,1,function(x) x["start"]:x["end"])),]
The apply simply generates a vector of all the cases that fall between starts and ends.
edited Nov 22 at 17:51
answered Nov 22 at 17:45
iod
3,3741621
3,3741621
After a quick check, this seems to be the fastest solution.
– lith
Nov 26 at 8:47
add a comment |
After a quick check, this seems to be the fastest solution.
– lith
Nov 26 at 8:47
After a quick check, this seems to be the fastest solution.
– lith
Nov 26 at 8:47
After a quick check, this seems to be the fastest solution.
– lith
Nov 26 at 8:47
add a comment |
up vote
2
down vote
Here is a base R
option
do.call(rbind, Map(function(i, j)
df1[df1$position > i & df1$position < j, , drop = FALSE],
df2$start, df2$end))
# position
#1 3
#2 6
#3 12
#5 25
Or using fuzzy_join
library(fuzzyjoin)
library(dplyr)
fuzzy_inner_join(df1, df2, by = c('position' = 'start', 'position' = 'end'),
match_fun = list(`>`, `<`)) %>%
select(position)
# position
#1 3
#2 6
#3 12
#4 25
Or use a non-equi join from data.table
setDT(df2)[df1, on = .(start < position, end > position), .(position), nomatch = 0]
# position
#1: 3
#2: 6
#3: 12
#4: 25
data
df1 <- structure(list(position = c(3L, 6L, 12L, 18L, 25L, 31L)), row.names = c(NA,
-6L), class = "data.frame")
df2 <- structure(list(start = c(2L, 24L), end = c(17L, 29L)),
class = "data.frame", row.names = c(NA, -2L))
add a comment |
up vote
2
down vote
Here is a base R
option
do.call(rbind, Map(function(i, j)
df1[df1$position > i & df1$position < j, , drop = FALSE],
df2$start, df2$end))
# position
#1 3
#2 6
#3 12
#5 25
Or using fuzzy_join
library(fuzzyjoin)
library(dplyr)
fuzzy_inner_join(df1, df2, by = c('position' = 'start', 'position' = 'end'),
match_fun = list(`>`, `<`)) %>%
select(position)
# position
#1 3
#2 6
#3 12
#4 25
Or use a non-equi join from data.table
setDT(df2)[df1, on = .(start < position, end > position), .(position), nomatch = 0]
# position
#1: 3
#2: 6
#3: 12
#4: 25
data
df1 <- structure(list(position = c(3L, 6L, 12L, 18L, 25L, 31L)), row.names = c(NA,
-6L), class = "data.frame")
df2 <- structure(list(start = c(2L, 24L), end = c(17L, 29L)),
class = "data.frame", row.names = c(NA, -2L))
add a comment |
up vote
2
down vote
up vote
2
down vote
Here is a base R
option
do.call(rbind, Map(function(i, j)
df1[df1$position > i & df1$position < j, , drop = FALSE],
df2$start, df2$end))
# position
#1 3
#2 6
#3 12
#5 25
Or using fuzzy_join
library(fuzzyjoin)
library(dplyr)
fuzzy_inner_join(df1, df2, by = c('position' = 'start', 'position' = 'end'),
match_fun = list(`>`, `<`)) %>%
select(position)
# position
#1 3
#2 6
#3 12
#4 25
Or use a non-equi join from data.table
setDT(df2)[df1, on = .(start < position, end > position), .(position), nomatch = 0]
# position
#1: 3
#2: 6
#3: 12
#4: 25
data
df1 <- structure(list(position = c(3L, 6L, 12L, 18L, 25L, 31L)), row.names = c(NA,
-6L), class = "data.frame")
df2 <- structure(list(start = c(2L, 24L), end = c(17L, 29L)),
class = "data.frame", row.names = c(NA, -2L))
Here is a base R
option
do.call(rbind, Map(function(i, j)
df1[df1$position > i & df1$position < j, , drop = FALSE],
df2$start, df2$end))
# position
#1 3
#2 6
#3 12
#5 25
Or using fuzzy_join
library(fuzzyjoin)
library(dplyr)
fuzzy_inner_join(df1, df2, by = c('position' = 'start', 'position' = 'end'),
match_fun = list(`>`, `<`)) %>%
select(position)
# position
#1 3
#2 6
#3 12
#4 25
Or use a non-equi join from data.table
setDT(df2)[df1, on = .(start < position, end > position), .(position), nomatch = 0]
# position
#1: 3
#2: 6
#3: 12
#4: 25
data
df1 <- structure(list(position = c(3L, 6L, 12L, 18L, 25L, 31L)), row.names = c(NA,
-6L), class = "data.frame")
df2 <- structure(list(start = c(2L, 24L), end = c(17L, 29L)),
class = "data.frame", row.names = c(NA, -2L))
edited Nov 22 at 18:03
answered Nov 22 at 17:32
akrun
394k13186258
394k13186258
add a comment |
add a comment |
up vote
1
down vote
base R
solution (no packages needed)
keepRows<-
sapply(df1$position, function(x_o) {
any(apply(df2, 1, function(x) {x_o => x[1] & x_o <= x[2]}))
})
df1[keepRows,, drop = FALSE]
Result:
# position
#1 3
#2 6
#3 12
#5 25
Please note:
This is basically a double loop, I don't know how else to solve this with
base::
.Should the border values be included? You are very vague with "between". Currently, I don't include them. You can change that using
<=, >=
.
To get the "leftover" use negation:
df1[!keepRows,, drop = FALSE]
Made my question more clear upon your suggestion.
– RacktheMan
Nov 23 at 11:19
add a comment |
up vote
1
down vote
base R
solution (no packages needed)
keepRows<-
sapply(df1$position, function(x_o) {
any(apply(df2, 1, function(x) {x_o => x[1] & x_o <= x[2]}))
})
df1[keepRows,, drop = FALSE]
Result:
# position
#1 3
#2 6
#3 12
#5 25
Please note:
This is basically a double loop, I don't know how else to solve this with
base::
.Should the border values be included? You are very vague with "between". Currently, I don't include them. You can change that using
<=, >=
.
To get the "leftover" use negation:
df1[!keepRows,, drop = FALSE]
Made my question more clear upon your suggestion.
– RacktheMan
Nov 23 at 11:19
add a comment |
up vote
1
down vote
up vote
1
down vote
base R
solution (no packages needed)
keepRows<-
sapply(df1$position, function(x_o) {
any(apply(df2, 1, function(x) {x_o => x[1] & x_o <= x[2]}))
})
df1[keepRows,, drop = FALSE]
Result:
# position
#1 3
#2 6
#3 12
#5 25
Please note:
This is basically a double loop, I don't know how else to solve this with
base::
.Should the border values be included? You are very vague with "between". Currently, I don't include them. You can change that using
<=, >=
.
To get the "leftover" use negation:
df1[!keepRows,, drop = FALSE]
base R
solution (no packages needed)
keepRows<-
sapply(df1$position, function(x_o) {
any(apply(df2, 1, function(x) {x_o => x[1] & x_o <= x[2]}))
})
df1[keepRows,, drop = FALSE]
Result:
# position
#1 3
#2 6
#3 12
#5 25
Please note:
This is basically a double loop, I don't know how else to solve this with
base::
.Should the border values be included? You are very vague with "between". Currently, I don't include them. You can change that using
<=, >=
.
To get the "leftover" use negation:
df1[!keepRows,, drop = FALSE]
edited Nov 23 at 12:55
answered Nov 22 at 17:31
Andre Elrico
5,52811027
5,52811027
Made my question more clear upon your suggestion.
– RacktheMan
Nov 23 at 11:19
add a comment |
Made my question more clear upon your suggestion.
– RacktheMan
Nov 23 at 11:19
Made my question more clear upon your suggestion.
– RacktheMan
Nov 23 at 11:19
Made my question more clear upon your suggestion.
– RacktheMan
Nov 23 at 11:19
add a comment |
up vote
0
down vote
Here is another take that starts with df2 (I don't say this is wiser than Andre's approach):
subset(df1, apply(apply(df2, 1, function (x) {dplyr::between(df1$position, x["start"], x["end"])}), 1, any))
You should probably run some benchmarks on the proposed approaches before making a decision.
@AndreElrico Something like DT sometimes has to convert the data. This doesn't make it the ideal solution for all kind of problems. It's okay if the data already is a data.table though.
– lith
Nov 26 at 8:49
@AndreElrico Since I don't usedata.table
the function I use actually is fromdplyr
. Thanks for the hint though. I forgot it isn't in base.
– lith
Nov 26 at 15:09
add a comment |
up vote
0
down vote
Here is another take that starts with df2 (I don't say this is wiser than Andre's approach):
subset(df1, apply(apply(df2, 1, function (x) {dplyr::between(df1$position, x["start"], x["end"])}), 1, any))
You should probably run some benchmarks on the proposed approaches before making a decision.
@AndreElrico Something like DT sometimes has to convert the data. This doesn't make it the ideal solution for all kind of problems. It's okay if the data already is a data.table though.
– lith
Nov 26 at 8:49
@AndreElrico Since I don't usedata.table
the function I use actually is fromdplyr
. Thanks for the hint though. I forgot it isn't in base.
– lith
Nov 26 at 15:09
add a comment |
up vote
0
down vote
up vote
0
down vote
Here is another take that starts with df2 (I don't say this is wiser than Andre's approach):
subset(df1, apply(apply(df2, 1, function (x) {dplyr::between(df1$position, x["start"], x["end"])}), 1, any))
You should probably run some benchmarks on the proposed approaches before making a decision.
Here is another take that starts with df2 (I don't say this is wiser than Andre's approach):
subset(df1, apply(apply(df2, 1, function (x) {dplyr::between(df1$position, x["start"], x["end"])}), 1, any))
You should probably run some benchmarks on the proposed approaches before making a decision.
edited Nov 26 at 15:09
answered Nov 22 at 17:51
lith
567217
567217
@AndreElrico Something like DT sometimes has to convert the data. This doesn't make it the ideal solution for all kind of problems. It's okay if the data already is a data.table though.
– lith
Nov 26 at 8:49
@AndreElrico Since I don't usedata.table
the function I use actually is fromdplyr
. Thanks for the hint though. I forgot it isn't in base.
– lith
Nov 26 at 15:09
add a comment |
@AndreElrico Something like DT sometimes has to convert the data. This doesn't make it the ideal solution for all kind of problems. It's okay if the data already is a data.table though.
– lith
Nov 26 at 8:49
@AndreElrico Since I don't usedata.table
the function I use actually is fromdplyr
. Thanks for the hint though. I forgot it isn't in base.
– lith
Nov 26 at 15:09
@AndreElrico Something like DT sometimes has to convert the data. This doesn't make it the ideal solution for all kind of problems. It's okay if the data already is a data.table though.
– lith
Nov 26 at 8:49
@AndreElrico Something like DT sometimes has to convert the data. This doesn't make it the ideal solution for all kind of problems. It's okay if the data already is a data.table though.
– lith
Nov 26 at 8:49
@AndreElrico Since I don't use
data.table
the function I use actually is from dplyr
. Thanks for the hint though. I forgot it isn't in base.– lith
Nov 26 at 15:09
@AndreElrico Since I don't use
data.table
the function I use actually is from dplyr
. Thanks for the hint though. I forgot it isn't in base.– lith
Nov 26 at 15:09
add a comment |
you mean 18, 31 for df4
– Andre Elrico
Nov 23 at 12:56
Yes, corrected this!
– RacktheMan
Nov 23 at 13:36