Is it correct that Deedle/Series is slow compared to a list?
I am working on a data-"intensive" app and I am not sure whether I should use Series/DataFrame. It looks very interesting, but it also seems far slower than the equivalent done with a List ... though I may not be using Series properly when I filter.
Please let me know what you think.
Thanks
open System
open Deedle

type TSPoint<'a> =
    {
        Date : System.DateTime
        Value : 'a
    }
type TimeSerie<'a> = TSPoint<'a> list
let sd = System.DateTime(1950, 2, 1)
let tsd = [1..100000] |> List.map (fun x -> sd.AddDays(float x))
// creating a List of TSPoint
let tsList = tsd |> List.map (fun x -> {Date = x ; Value = 1.})
// creating the same as a series
let tsSeries = Series(tsd, [1..100000] |> List.map (fun _ -> 1.))
// function to "randomise" the list of dates
let shuffleG xs = xs |> List.sortBy (fun _ -> Guid.NewGuid())
// new date list to search within our tsList and tsSeries
let d = tsd |> shuffleG |> List.take 1000
// Filter
d |> List.map (fun x -> (tsList |> List.filter (fun y -> y.Date = x)))
d |> List.map (fun x -> (tsSeries |> Series.filter (fun key _ -> key = x)))
Here is what I get:
List -> Real: 00:00:04.780, CPU: 00:00:04.508, GC gen0: 917, gen1: 2, gen2: 1
Series -> Real: 00:00:54.386, CPU: 00:00:49.311, GC gen0: 944, gen1: 7, gen2: 3
f# deedle
tbh, List itself will be very slow. Depending on your use case consider an array.
– s952163
Nov 22 at 8:07
but an array is mutable, which is something I am not a huge fan of
– John_hk
Nov 22 at 8:11
Just don't mutate it :D
– s952163
Nov 22 at 8:16
asked Nov 22 at 8:03
John_hk
1 Answer
accepted
In general, Deedle series and data frames do have some extra overhead compared with hand-crafted code that uses whatever data structure is most efficient for a given problem. The overhead is small for some operations and larger for others, so it depends on what you want to do and how you use Deedle.
If you use Deedle in the way it was intended to be used, you'll get good performance, but if you run a large number of operations that are not particularly efficient, you may get bad performance.
In your particular case, you are running Series.filter 1000 times, and creating a new series on each call (which is what happens behind the scenes here) does have some overhead.
However, what your code really does is use Series.filter to find a value with a specific key. Deedle provides a key-based lookup operation for this (and it's one of the things it has been optimized for).
If you rewrite the code as follows, you'll get much better performance with Deedle than with a list:
d |> List.map (fun x -> tsSeries.[x])
// 0.001 seconds
d |> List.map (fun x -> (tsSeries |> Series.filter (fun key _ -> key = x)))
// 3.46 seconds
d |> List.map (fun x -> (tsList |> List.filter (fun y -> y.Date = x)))
// 40.5 seconds
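The same point applies if you stay with plain F# collections: each List.filter call scans the whole list, so 1000 lookups over 100000 points means 100 million predicate calls, whereas an index built once answers each lookup directly. A minimal sketch of that idea, assuming an immutable Map keyed by date is acceptable for your use case (the names `byDate` and `wanted` and the smaller data size are illustrative, not from the question, and this says nothing about how Deedle is implemented internally):

```fsharp
open System

// Hypothetical small data set with the same shape as the question: one value per day.
let startDate = DateTime(1950, 2, 1)
let points = [ for i in 1 .. 1000 -> startDate.AddDays(float i), 1.0 ]

// Build the index once. Each Map.tryFind is then a tree lookup
// rather than a full scan of the list, as List.filter would be.
let byDate = Map.ofList points

// Two dates inside the range and one outside it.
let wanted =
    [ startDate.AddDays(10.0)
      startDate.AddDays(500.0)
      startDate.AddDays(5000.0) ]

// Map.tryFind returns an option, so a missing date is None instead of an exception.
let found = wanted |> List.map (fun date -> Map.tryFind date byDate)
```

A Map stays immutable, which may matter if you would rather avoid arrays for that reason.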
Thanks, I didn't think to proceed this way. I will use TryGet so that I get an OptionalValue back and can do some pattern matching instead of getting an exception (in case the value is not found).
– John_hk
Nov 23 at 2:07
@John_hk Yes, TryGet is the way to go if the key or value might be missing!
– Tomas Petricek
Nov 23 at 22:07
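For reference, a safe lookup along the lines discussed in these comments might look like the sketch below. It assumes Deedle is pulled in as a NuGet package from an F# script, and it uses a tiny five-point series so it is quick to try. The helper name `describe` is hypothetical. `Series.tryGet` is the module-level function that returns an F# option, while the `TryGet` method returns an `OptionalValue`.

```fsharp
#r "nuget: Deedle"
open System
open Deedle

// A small series keyed by date, built the same way as tsSeries in the question.
let sd = DateTime(1950, 2, 1)
let keys = [ for i in 1 .. 5 -> sd.AddDays(float i) ]
let ts = Series(keys, [ for _ in 1 .. 5 -> 1.0 ])

// Series.tryGet gives None when the key is absent or the value is missing,
// so the caller can pattern match instead of catching an exception.
let describe (date: DateTime) =
    match Series.tryGet date ts with
    | Some value -> sprintf "%s -> %g" (date.ToString("yyyy-MM-dd")) value
    | None -> sprintf "%s -> not found" (date.ToString("yyyy-MM-dd"))

// The method form returns an OptionalValue with HasValue and Value members.
let viaMethod = ts.TryGet(sd.AddDays(2.0))

printfn "%s" (describe (sd.AddDays(2.0)))
printfn "%s" (describe (sd.AddDays(99.0)))
```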
answered Nov 22 at 11:43
Tomas Petricek