Subset a dataframe in two dataframes by values in two columns of another dataframe [duplicate]











up vote
1
down vote

favorite













This question already has an answer here:




  • Filter by ranges supplied by two vectors, without a join operation

    2 answers



  • Efficient way to filter one data frame by ranges in another

    3 answers



  • Subset by multiple ranges [duplicate]

    3 answers




I have two dataframes.
df1 looks like (or the column of df1 i am interested in):



position
2
6
12
18
25
31


and df2 looks like:



start   end
2 17
24 29


I want to keep the positions in df1 that only fall between (<= or >=) the start and end coordinates of df2, so that df1 looks like this after filtering:



position
2
6
12
25


Then I want to keep the filtered out "leftover" values of df1 in another dataframe, let's call it df4.



df4 would look like:



position
18
31


I can do this the perl way using a for loop (coming from perl and currently learning R) but I am pretty sure I can somehow use filter or some other dplyr or base R combination of functions to achieve this.



Any help would be appreciated!



EDIT: Added df4 calculation as my question was marked as duplicate and this is sth not found in the other similar threads. This is something I am interested in doing to make my code faster!










share|improve this question















marked as duplicate by Henrik r
Users with the  r badge can single-handedly close r questions as duplicates and reopen them as needed.

StackExchange.ready(function() {
if (StackExchange.options.isMobile) return;

$('.dupe-hammer-message-hover:not(.hover-bound)').each(function() {
var $hover = $(this).addClass('hover-bound'),
$msg = $hover.siblings('.dupe-hammer-message');

$hover.hover(
function() {
$hover.showInfoMessage('', {
messageElement: $msg.clone().show(),
transient: false,
position: { my: 'bottom left', at: 'top center', offsetTop: -7 },
dismissable: false,
relativeToBody: true
});
},
function() {
StackExchange.helpers.removeMessages();
}
);
});
});
Nov 22 at 18:05


This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.















  • you mean 18, 31 for df4
    – Andre Elrico
    Nov 23 at 12:56










  • Yes, corrected this!
    – RacktheMan
    Nov 23 at 13:36















up vote
1
down vote

favorite













This question already has an answer here:




  • Filter by ranges supplied by two vectors, without a join operation

    2 answers



  • Efficient way to filter one data frame by ranges in another

    3 answers



  • Subset by multiple ranges [duplicate]

    3 answers




I have two dataframes.
df1 looks like (or the column of df1 i am interested in):



position
2
6
12
18
25
31


and df2 looks like:



start   end
2 17
24 29


I want to keep the positions in df1 that only fall between (<= or >=) the start and end coordinates of df2, so that df1 looks like this after filtering:



position
2
6
12
25


Then I want to keep the filtered out "leftover" values of df1 in another dataframe, let's call it df4.



df4 would look like:



position
18
31


I can do this the perl way using a for loop (coming from perl and currently learning R) but I am pretty sure I can somehow use filter or some other dplyr or base R combination of functions to achieve this.



Any help would be appreciated!



EDIT: Added df4 calculation as my question was marked as duplicate and this is sth not found in the other similar threads. This is something I am interested in doing to make my code faster!










share|improve this question















marked as duplicate by Henrik r
Users with the  r badge can single-handedly close r questions as duplicates and reopen them as needed.

StackExchange.ready(function() {
if (StackExchange.options.isMobile) return;

$('.dupe-hammer-message-hover:not(.hover-bound)').each(function() {
var $hover = $(this).addClass('hover-bound'),
$msg = $hover.siblings('.dupe-hammer-message');

$hover.hover(
function() {
$hover.showInfoMessage('', {
messageElement: $msg.clone().show(),
transient: false,
position: { my: 'bottom left', at: 'top center', offsetTop: -7 },
dismissable: false,
relativeToBody: true
});
},
function() {
StackExchange.helpers.removeMessages();
}
);
});
});
Nov 22 at 18:05


This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.















  • you mean 18, 31 for df4
    – Andre Elrico
    Nov 23 at 12:56










  • Yes, corrected this!
    – RacktheMan
    Nov 23 at 13:36













up vote
1
down vote

favorite









up vote
1
down vote

favorite












This question already has an answer here:




  • Filter by ranges supplied by two vectors, without a join operation

    2 answers



  • Efficient way to filter one data frame by ranges in another

    3 answers



  • Subset by multiple ranges [duplicate]

    3 answers




I have two dataframes.
df1 looks like (or the column of df1 i am interested in):



position
2
6
12
18
25
31


and df2 looks like:



start   end
2 17
24 29


I want to keep the positions in df1 that only fall between (<= or >=) the start and end coordinates of df2, so that df1 looks like this after filtering:



position
2
6
12
25


Then I want to keep the filtered out "leftover" values of df1 in another dataframe, let's call it df4.



df4 would look like:



position
18
31


I can do this the perl way using a for loop (coming from perl and currently learning R) but I am pretty sure I can somehow use filter or some other dplyr or base R combination of functions to achieve this.



Any help would be appreciated!



EDIT: Added df4 calculation as my question was marked as duplicate and this is sth not found in the other similar threads. This is something I am interested in doing to make my code faster!










share|improve this question
















This question already has an answer here:




  • Filter by ranges supplied by two vectors, without a join operation

    2 answers



  • Efficient way to filter one data frame by ranges in another

    3 answers



  • Subset by multiple ranges [duplicate]

    3 answers




I have two dataframes.
df1 looks like (or the column of df1 i am interested in):



position
2
6
12
18
25
31


and df2 looks like:



start   end
2 17
24 29


I want to keep the positions in df1 that only fall between (<= or >=) the start and end coordinates of df2, so that df1 looks like this after filtering:



position
2
6
12
25


Then I want to keep the filtered out "leftover" values of df1 in another dataframe, let's call it df4.



df4 would look like:



position
18
31


I can do this the perl way using a for loop (coming from perl and currently learning R) but I am pretty sure I can somehow use filter or some other dplyr or base R combination of functions to achieve this.



Any help would be appreciated!



EDIT: Added df4 calculation as my question was marked as duplicate and this is sth not found in the other similar threads. This is something I am interested in doing to make my code faster!





This question already has an answer here:




  • Filter by ranges supplied by two vectors, without a join operation

    2 answers



  • Efficient way to filter one data frame by ranges in another

    3 answers



  • Subset by multiple ranges [duplicate]

    3 answers








r dplyr






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 23 at 13:36

























asked Nov 22 at 17:20









RacktheMan

84




84




marked as duplicate by Henrik r
Users with the  r badge can single-handedly close r questions as duplicates and reopen them as needed.

StackExchange.ready(function() {
if (StackExchange.options.isMobile) return;

$('.dupe-hammer-message-hover:not(.hover-bound)').each(function() {
var $hover = $(this).addClass('hover-bound'),
$msg = $hover.siblings('.dupe-hammer-message');

$hover.hover(
function() {
$hover.showInfoMessage('', {
messageElement: $msg.clone().show(),
transient: false,
position: { my: 'bottom left', at: 'top center', offsetTop: -7 },
dismissable: false,
relativeToBody: true
});
},
function() {
StackExchange.helpers.removeMessages();
}
);
});
});
Nov 22 at 18:05


This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.






marked as duplicate by Henrik r
Users with the  r badge can single-handedly close r questions as duplicates and reopen them as needed.

StackExchange.ready(function() {
if (StackExchange.options.isMobile) return;

$('.dupe-hammer-message-hover:not(.hover-bound)').each(function() {
var $hover = $(this).addClass('hover-bound'),
$msg = $hover.siblings('.dupe-hammer-message');

$hover.hover(
function() {
$hover.showInfoMessage('', {
messageElement: $msg.clone().show(),
transient: false,
position: { my: 'bottom left', at: 'top center', offsetTop: -7 },
dismissable: false,
relativeToBody: true
});
},
function() {
StackExchange.helpers.removeMessages();
}
);
});
});
Nov 22 at 18:05


This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.














  • you mean 18, 31 for df4
    – Andre Elrico
    Nov 23 at 12:56










  • Yes, corrected this!
    – RacktheMan
    Nov 23 at 13:36


















  • you mean 18, 31 for df4
    – Andre Elrico
    Nov 23 at 12:56










  • Yes, corrected this!
    – RacktheMan
    Nov 23 at 13:36
















you mean 18, 31 for df4
– Andre Elrico
Nov 23 at 12:56




you mean 18, 31 for df4
– Andre Elrico
Nov 23 at 12:56












Yes, corrected this!
– RacktheMan
Nov 23 at 13:36




Yes, corrected this!
– RacktheMan
Nov 23 at 13:36












5 Answers
5






active

oldest

votes

















up vote
0
down vote



accepted










We can full_join these two data frames and then filter for the rows within the start and end column. The Flag column in the example is just for the join. Finally, we can use distinct to remove duplicated rows.



library(dplyr)

df3 <- df1 %>%
mutate(Flag = 1) %>%
full_join(df2 %>% mutate(Flag = 1), by = "Flag") %>%
filter(position >= start, position <= end) %>%
distinct(position)
df3
# position
# 1 3
# 2 6
# 3 12
# 4 25


DATA



df1 <- read.table(text = "position
3
6
12
18
25
31", header = TRUE)

df2 <- read.table(text = "start end
2 17
24 29",
header = TRUE)





share|improve this answer





















  • Hello www, this an easy for me to understand dplyr solution! I am more familiar with dplyr than base r, so thanx for the great response! Also, this answer provides back a dataframe as I asked for! Is there a way to keep what is filtered out in a df4 without having to repeat this code? (See question EDIT)
    – RacktheMan
    Nov 23 at 11:18












  • I found that out I think. I could just use:df4<-anti_join(df1, df3, by="position")!
    – RacktheMan
    Nov 23 at 13:04


















up vote
3
down vote













Single line, simple base solution:



df1[df1$position %in% unlist(apply(df2,1,function(x) x["start"]:x["end"])),]


The apply simply generates a vector of all the cases that fall between starts and ends.






share|improve this answer























  • After a quick check, this seems to be the fastest solution.
    – lith
    Nov 26 at 8:47


















up vote
2
down vote













Here is a base R option



do.call(rbind, Map(function(i, j) 
df1[df1$position > i & df1$position < j, , drop = FALSE],
df2$start, df2$end))
# position
#1 3
#2 6
#3 12
#5 25




Or using fuzzy_join



library(fuzzyjoin)
library(dplyr)
fuzzy_inner_join(df1, df2, by = c('position' = 'start', 'position' = 'end'),
match_fun = list(`>`, `<`)) %>%
select(position)
# position
#1 3
#2 6
#3 12
#4 25




Or use a non-equi join from data.table



setDT(df2)[df1, on = .(start < position, end > position), .(position), nomatch = 0]
# position
#1: 3
#2: 6
#3: 12
#4: 25


data



df1 <- structure(list(position = c(3L, 6L, 12L, 18L, 25L, 31L)), row.names = c(NA, 
-6L), class = "data.frame")

df2 <- structure(list(start = c(2L, 24L), end = c(17L, 29L)),
class = "data.frame", row.names = c(NA, -2L))





share|improve this answer






























    up vote
    1
    down vote













    base R solution (no packages needed)



    keepRows<-
    sapply(df1$position, function(x_o) {
    any(apply(df2, 1, function(x) {x_o => x[1] & x_o <= x[2]}))
    })

    df1[keepRows,, drop = FALSE]


    Result:



    #  position
    #1 3
    #2 6
    #3 12
    #5 25




    Please note:




    • This is basically a double loop, I don't know how else to solve this with base::.


    • Should the border values be included? You are very vague with "between". Currently, I don't include them. You can change that using <=, >=.





    To get the "leftover" use negation:



    df1[!keepRows,, drop = FALSE]





    share|improve this answer























    • Made my question more clear upon your suggestion.
      – RacktheMan
      Nov 23 at 11:19


















    up vote
    0
    down vote













    Here is another take that starts with df2 (I don't say this is wiser than Andre's approach):



    subset(df1, apply(apply(df2, 1, function (x) {dplyr::between(df1$position, x["start"], x["end"])}), 1, any))


    You should probably run some benchmarks on the proposed approaches before making a decision.






    share|improve this answer























    • @AndreElrico Something like DT sometimes has to convert the data. This doesn't make it the ideal solution for all kind of problems. It's okay if the data already is a data.table though.
      – lith
      Nov 26 at 8:49










    • @AndreElrico Since I don't use data.table the function I use actually is from dplyr. Thanks for the hint though. I forgot it isn't in base.
      – lith
      Nov 26 at 15:09


















    5 Answers
    5






    active

    oldest

    votes








    5 Answers
    5






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes








    up vote
    0
    down vote



    accepted










    We can full_join these two data frames and then filter for the rows within the start and end column. The Flag column in the example is just for the join. Finally, we can use distinct to remove duplicated rows.



    library(dplyr)

    df3 <- df1 %>%
    mutate(Flag = 1) %>%
    full_join(df2 %>% mutate(Flag = 1), by = "Flag") %>%
    filter(position >= start, position <= end) %>%
    distinct(position)
    df3
    # position
    # 1 3
    # 2 6
    # 3 12
    # 4 25


    DATA



    df1 <- read.table(text = "position
    3
    6
    12
    18
    25
    31", header = TRUE)

    df2 <- read.table(text = "start end
    2 17
    24 29",
    header = TRUE)





    share|improve this answer





















    • Hello www, this an easy for me to understand dplyr solution! I am more familiar with dplyr than base r, so thanx for the great response! Also, this answer provides back a dataframe as I asked for! Is there a way to keep what is filtered out in a df4 without having to repeat this code? (See question EDIT)
      – RacktheMan
      Nov 23 at 11:18












    • I found that out I think. I could just use:df4<-anti_join(df1, df3, by="position")!
      – RacktheMan
      Nov 23 at 13:04















    up vote
    0
    down vote



    accepted










    We can full_join these two data frames and then filter for the rows within the start and end column. The Flag column in the example is just for the join. Finally, we can use distinct to remove duplicated rows.



    library(dplyr)

    df3 <- df1 %>%
    mutate(Flag = 1) %>%
    full_join(df2 %>% mutate(Flag = 1), by = "Flag") %>%
    filter(position >= start, position <= end) %>%
    distinct(position)
    df3
    # position
    # 1 3
    # 2 6
    # 3 12
    # 4 25


    DATA



    df1 <- read.table(text = "position
    3
    6
    12
    18
    25
    31", header = TRUE)

    df2 <- read.table(text = "start end
    2 17
    24 29",
    header = TRUE)





    share|improve this answer





















    • Hello www, this an easy for me to understand dplyr solution! I am more familiar with dplyr than base r, so thanx for the great response! Also, this answer provides back a dataframe as I asked for! Is there a way to keep what is filtered out in a df4 without having to repeat this code? (See question EDIT)
      – RacktheMan
      Nov 23 at 11:18












    • I found that out I think. I could just use:df4<-anti_join(df1, df3, by="position")!
      – RacktheMan
      Nov 23 at 13:04













    up vote
    0
    down vote



    accepted







    up vote
    0
    down vote



    accepted






    We can full_join these two data frames and then filter for the rows within the start and end column. The Flag column in the example is just for the join. Finally, we can use distinct to remove duplicated rows.



    library(dplyr)

    df3 <- df1 %>%
    mutate(Flag = 1) %>%
    full_join(df2 %>% mutate(Flag = 1), by = "Flag") %>%
    filter(position >= start, position <= end) %>%
    distinct(position)
    df3
    # position
    # 1 3
    # 2 6
    # 3 12
    # 4 25


    DATA



    df1 <- read.table(text = "position
    3
    6
    12
    18
    25
    31", header = TRUE)

    df2 <- read.table(text = "start end
    2 17
    24 29",
    header = TRUE)





    share|improve this answer












    We can full_join these two data frames and then filter for the rows within the start and end column. The Flag column in the example is just for the join. Finally, we can use distinct to remove duplicated rows.



    library(dplyr)

    df3 <- df1 %>%
    mutate(Flag = 1) %>%
    full_join(df2 %>% mutate(Flag = 1), by = "Flag") %>%
    filter(position >= start, position <= end) %>%
    distinct(position)
    df3
    # position
    # 1 3
    # 2 6
    # 3 12
    # 4 25


    DATA



    df1 <- read.table(text = "position
    3
    6
    12
    18
    25
    31", header = TRUE)

    df2 <- read.table(text = "start end
    2 17
    24 29",
    header = TRUE)






    share|improve this answer












    share|improve this answer



    share|improve this answer










    answered Nov 22 at 17:31









    www

    25.7k102240




    25.7k102240












    • Hello www, this an easy for me to understand dplyr solution! I am more familiar with dplyr than base r, so thanx for the great response! Also, this answer provides back a dataframe as I asked for! Is there a way to keep what is filtered out in a df4 without having to repeat this code? (See question EDIT)
      – RacktheMan
      Nov 23 at 11:18












    • I found that out I think. I could just use:df4<-anti_join(df1, df3, by="position")!
      – RacktheMan
      Nov 23 at 13:04


















    • Hello www, this an easy for me to understand dplyr solution! I am more familiar with dplyr than base r, so thanx for the great response! Also, this answer provides back a dataframe as I asked for! Is there a way to keep what is filtered out in a df4 without having to repeat this code? (See question EDIT)
      – RacktheMan
      Nov 23 at 11:18












    • I found that out I think. I could just use:df4<-anti_join(df1, df3, by="position")!
      – RacktheMan
      Nov 23 at 13:04
















    Hello www, this an easy for me to understand dplyr solution! I am more familiar with dplyr than base r, so thanx for the great response! Also, this answer provides back a dataframe as I asked for! Is there a way to keep what is filtered out in a df4 without having to repeat this code? (See question EDIT)
    – RacktheMan
    Nov 23 at 11:18






    Hello www, this an easy for me to understand dplyr solution! I am more familiar with dplyr than base r, so thanx for the great response! Also, this answer provides back a dataframe as I asked for! Is there a way to keep what is filtered out in a df4 without having to repeat this code? (See question EDIT)
    – RacktheMan
    Nov 23 at 11:18














    I found that out I think. I could just use:df4<-anti_join(df1, df3, by="position")!
    – RacktheMan
    Nov 23 at 13:04




    I found that out I think. I could just use:df4<-anti_join(df1, df3, by="position")!
    – RacktheMan
    Nov 23 at 13:04












    up vote
    3
    down vote













    Single line, simple base solution:



    df1[df1$position %in% unlist(apply(df2,1,function(x) x["start"]:x["end"])),]


    The apply simply generates a vector of all the cases that fall between starts and ends.






    share|improve this answer























    • After a quick check, this seems to be the fastest solution.
      – lith
      Nov 26 at 8:47















    up vote
    3
    down vote













    Single line, simple base solution:



    df1[df1$position %in% unlist(apply(df2,1,function(x) x["start"]:x["end"])),]


    The apply simply generates a vector of all the cases that fall between starts and ends.






    share|improve this answer























    • After a quick check, this seems to be the fastest solution.
      – lith
      Nov 26 at 8:47













    up vote
    3
    down vote










    up vote
    3
    down vote









    Single line, simple base solution:



    df1[df1$position %in% unlist(apply(df2,1,function(x) x["start"]:x["end"])),]


    The apply simply generates a vector of all the cases that fall between starts and ends.






    share|improve this answer














    Single line, simple base solution:



    df1[df1$position %in% unlist(apply(df2,1,function(x) x["start"]:x["end"])),]


    The apply simply generates a vector of all the cases that fall between starts and ends.







    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited Nov 22 at 17:51

























    answered Nov 22 at 17:45









    iod

    3,3741621




    3,3741621












    • After a quick check, this seems to be the fastest solution.
      – lith
      Nov 26 at 8:47


















    • After a quick check, this seems to be the fastest solution.
      – lith
      Nov 26 at 8:47
















    After a quick check, this seems to be the fastest solution.
    – lith
    Nov 26 at 8:47




    After a quick check, this seems to be the fastest solution.
    – lith
    Nov 26 at 8:47










    up vote
    2
    down vote













    Here is a base R option



    do.call(rbind, Map(function(i, j) 
    df1[df1$position > i & df1$position < j, , drop = FALSE],
    df2$start, df2$end))
    # position
    #1 3
    #2 6
    #3 12
    #5 25




    Or using fuzzy_join



    library(fuzzyjoin)
    library(dplyr)
    fuzzy_inner_join(df1, df2, by = c('position' = 'start', 'position' = 'end'),
    match_fun = list(`>`, `<`)) %>%
    select(position)
    # position
    #1 3
    #2 6
    #3 12
    #4 25




    Or use a non-equi join from data.table



    setDT(df2)[df1, on = .(start < position, end > position), .(position), nomatch = 0]
    # position
    #1: 3
    #2: 6
    #3: 12
    #4: 25


    data



    df1 <- structure(list(position = c(3L, 6L, 12L, 18L, 25L, 31L)), row.names = c(NA, 
    -6L), class = "data.frame")

    df2 <- structure(list(start = c(2L, 24L), end = c(17L, 29L)),
    class = "data.frame", row.names = c(NA, -2L))





    share|improve this answer



























      up vote
      2
      down vote













      Here is a base R option



      do.call(rbind, Map(function(i, j) 
      df1[df1$position > i & df1$position < j, , drop = FALSE],
      df2$start, df2$end))
      # position
      #1 3
      #2 6
      #3 12
      #5 25




      Or using fuzzy_join



      library(fuzzyjoin)
      library(dplyr)
      fuzzy_inner_join(df1, df2, by = c('position' = 'start', 'position' = 'end'),
      match_fun = list(`>`, `<`)) %>%
      select(position)
      # position
      #1 3
      #2 6
      #3 12
      #4 25




      Or use a non-equi join from data.table



      setDT(df2)[df1, on = .(start < position, end > position), .(position), nomatch = 0]
      # position
      #1: 3
      #2: 6
      #3: 12
      #4: 25


      data



      df1 <- structure(list(position = c(3L, 6L, 12L, 18L, 25L, 31L)), row.names = c(NA, 
      -6L), class = "data.frame")

      df2 <- structure(list(start = c(2L, 24L), end = c(17L, 29L)),
      class = "data.frame", row.names = c(NA, -2L))





      share|improve this answer

























        up vote
        2
        down vote










        up vote
        2
        down vote









        Here is a base R option



        do.call(rbind, Map(function(i, j) 
        df1[df1$position > i & df1$position < j, , drop = FALSE],
        df2$start, df2$end))
        # position
        #1 3
        #2 6
        #3 12
        #5 25




        Or using fuzzy_join



        library(fuzzyjoin)
        library(dplyr)
        fuzzy_inner_join(df1, df2, by = c('position' = 'start', 'position' = 'end'),
        match_fun = list(`>`, `<`)) %>%
        select(position)
        # position
        #1 3
        #2 6
        #3 12
        #4 25




        Or use a non-equi join from data.table



        setDT(df2)[df1, on = .(start < position, end > position), .(position), nomatch = 0]
        # position
        #1: 3
        #2: 6
        #3: 12
        #4: 25


        data



        df1 <- structure(list(position = c(3L, 6L, 12L, 18L, 25L, 31L)), row.names = c(NA, 
        -6L), class = "data.frame")

        df2 <- structure(list(start = c(2L, 24L), end = c(17L, 29L)),
        class = "data.frame", row.names = c(NA, -2L))





        share|improve this answer














        Here is a base R option



        do.call(rbind, Map(function(i, j) 
        df1[df1$position > i & df1$position < j, , drop = FALSE],
        df2$start, df2$end))
        # position
        #1 3
        #2 6
        #3 12
        #5 25




        Or using fuzzy_join



        library(fuzzyjoin)
        library(dplyr)
        fuzzy_inner_join(df1, df2, by = c('position' = 'start', 'position' = 'end'),
        match_fun = list(`>`, `<`)) %>%
        select(position)
        # position
        #1 3
        #2 6
        #3 12
        #4 25




        Or use a non-equi join from data.table



        setDT(df2)[df1, on = .(start < position, end > position), .(position), nomatch = 0]
        # position
        #1: 3
        #2: 6
        #3: 12
        #4: 25


        data



        df1 <- structure(list(position = c(3L, 6L, 12L, 18L, 25L, 31L)), row.names = c(NA, 
        -6L), class = "data.frame")

        df2 <- structure(list(start = c(2L, 24L), end = c(17L, 29L)),
        class = "data.frame", row.names = c(NA, -2L))






        share|improve this answer














        share|improve this answer



        share|improve this answer








        edited Nov 22 at 18:03

























        answered Nov 22 at 17:32









        akrun

        394k13186258




        394k13186258






















            up vote
            1
            down vote













            base R solution (no packages needed)



            keepRows<-
            sapply(df1$position, function(x_o) {
            any(apply(df2, 1, function(x) {x_o => x[1] & x_o <= x[2]}))
            })

            df1[keepRows,, drop = FALSE]


            Result:



            #  position
            #1 3
            #2 6
            #3 12
            #5 25




            Please note:




            • This is basically a double loop, I don't know how else to solve this with base::.


            • Should the border values be included? You are very vague with "between". Currently, I don't include them. You can change that using <=, >=.





            To get the "leftover" use negation:



            df1[!keepRows,, drop = FALSE]





            share|improve this answer























            • Made my question more clear upon your suggestion.
              – RacktheMan
              Nov 23 at 11:19















            up vote
            1
            down vote













            base R solution (no packages needed)



            keepRows<-
            sapply(df1$position, function(x_o) {
            any(apply(df2, 1, function(x) {x_o => x[1] & x_o <= x[2]}))
            })

            df1[keepRows,, drop = FALSE]


            Result:



            #  position
            #1 3
            #2 6
            #3 12
            #5 25




            Please note:




            • This is basically a double loop, I don't know how else to solve this with base::.


            • Should the border values be included? You are very vague with "between". Currently, I don't include them. You can change that using <=, >=.





            To get the "leftover" use negation:



            df1[!keepRows,, drop = FALSE]





            share|improve this answer























            • Made my question more clear upon your suggestion.
              – RacktheMan
              Nov 23 at 11:19













            up vote
            1
            down vote










            up vote
            1
            down vote









            base R solution (no packages needed)



            keepRows<-
            sapply(df1$position, function(x_o) {
            any(apply(df2, 1, function(x) {x_o => x[1] & x_o <= x[2]}))
            })

            df1[keepRows,, drop = FALSE]


            Result:



            #  position
            #1 3
            #2 6
            #3 12
            #5 25




            Please note:




            • This is basically a double loop, I don't know how else to solve this with base::.


            • Should the border values be included? You are very vague with "between". Currently, I don't include them. You can change that using <=, >=.





            To get the "leftover" use negation:



            df1[!keepRows,, drop = FALSE]





            share|improve this answer














            base R solution (no packages needed)



            keepRows<-
            sapply(df1$position, function(x_o) {
            any(apply(df2, 1, function(x) {x_o => x[1] & x_o <= x[2]}))
            })

            df1[keepRows,, drop = FALSE]


            Result:



            #  position
            #1 3
            #2 6
            #3 12
            #5 25




            Please note:




            • This is basically a double loop, I don't know how else to solve this with base::.


            • Should the border values be included? You are very vague with "between". Currently, I don't include them. You can change that using <=, >=.





            To get the "leftover" use negation:



            df1[!keepRows,, drop = FALSE]






            share|improve this answer














            share|improve this answer



            share|improve this answer








            edited Nov 23 at 12:55

























            answered Nov 22 at 17:31









            Andre Elrico

            5,52811027




            5,52811027












            • Made my question more clear upon your suggestion.
              – RacktheMan
              Nov 23 at 11:19


















            • Made my question more clear upon your suggestion.
              – RacktheMan
              Nov 23 at 11:19
















            Made my question more clear upon your suggestion.
            – RacktheMan
            Nov 23 at 11:19




            Made my question more clear upon your suggestion.
            – RacktheMan
            Nov 23 at 11:19










            up vote
            0
            down vote













            Here is another take that starts with df2 (I don't say this is wiser than Andre's approach):



            subset(df1, apply(apply(df2, 1, function (x) {dplyr::between(df1$position, x["start"], x["end"])}), 1, any))


            You should probably run some benchmarks on the proposed approaches before making a decision.






            share|improve this answer























            • @AndreElrico Something like DT sometimes has to convert the data. This doesn't make it the ideal solution for all kind of problems. It's okay if the data already is a data.table though.
              – lith
              Nov 26 at 8:49










            • @AndreElrico Since I don't use data.table the function I use actually is from dplyr. Thanks for the hint though. I forgot it isn't in base.
              – lith
              Nov 26 at 15:09















            up vote
            0
            down vote













            Here is another take that starts with df2 (I don't say this is wiser than Andre's approach):



            subset(df1, apply(apply(df2, 1, function (x) {dplyr::between(df1$position, x["start"], x["end"])}), 1, any))


            You should probably run some benchmarks on the proposed approaches before making a decision.






            share|improve this answer























            • @AndreElrico Something like DT sometimes has to convert the data. This doesn't make it the ideal solution for all kind of problems. It's okay if the data already is a data.table though.
              – lith
              Nov 26 at 8:49










            • @AndreElrico Since I don't use data.table the function I use actually is from dplyr. Thanks for the hint though. I forgot it isn't in base.
              – lith
              Nov 26 at 15:09













            up vote
            0
            down vote










            up vote
            0
            down vote









            Here is another take that starts with df2 (I don't say this is wiser than Andre's approach):



            subset(df1, apply(apply(df2, 1, function (x) {dplyr::between(df1$position, x["start"], x["end"])}), 1, any))


            You should probably run some benchmarks on the proposed approaches before making a decision.






            share|improve this answer














            Here is another take that starts with df2 (I don't say this is wiser than Andre's approach):



            subset(df1, apply(apply(df2, 1, function (x) {dplyr::between(df1$position, x["start"], x["end"])}), 1, any))


            You should probably run some benchmarks on the proposed approaches before making a decision.







            share|improve this answer














            share|improve this answer



            share|improve this answer








            edited Nov 26 at 15:09

























            answered Nov 22 at 17:51









            lith

            567217




            567217












            • @AndreElrico Something like DT sometimes has to convert the data. This doesn't make it the ideal solution for all kind of problems. It's okay if the data already is a data.table though.
              – lith
              Nov 26 at 8:49










            • @AndreElrico Since I don't use data.table the function I use actually is from dplyr. Thanks for the hint though. I forgot it isn't in base.
              – lith
              Nov 26 at 15:09


















            • @AndreElrico Something like DT sometimes has to convert the data. This doesn't make it the ideal solution for all kind of problems. It's okay if the data already is a data.table though.
              – lith
              Nov 26 at 8:49










            • @AndreElrico Since I don't use data.table the function I use actually is from dplyr. Thanks for the hint though. I forgot it isn't in base.
              – lith
              Nov 26 at 15:09
















            @AndreElrico Something like DT sometimes has to convert the data. This doesn't make it the ideal solution for all kind of problems. It's okay if the data already is a data.table though.
            – lith
            Nov 26 at 8:49




            @AndreElrico Something like DT sometimes has to convert the data. This doesn't make it the ideal solution for all kind of problems. It's okay if the data already is a data.table though.
            – lith
            Nov 26 at 8:49












            @AndreElrico Since I don't use data.table the function I use actually is from dplyr. Thanks for the hint though. I forgot it isn't in base.
            – lith
            Nov 26 at 15:09




            @AndreElrico Since I don't use data.table the function I use actually is from dplyr. Thanks for the hint though. I forgot it isn't in base.
            – lith
            Nov 26 at 15:09



            Popular posts from this blog

            Trompette piccolo

            Slow SSRS Report in dynamic grouping and multiple parameters

            Simon Yates (cyclisme)