Which is more efficient for the task: XQuery or Cypher?
I would like to describe two scenarios where a system has a large XML file (containing tens of thousands of rows of data). My question is: which scenario performs better, A or B?
The first step is the same for both scenarios: a function goes through the XML and puts the nodes and attributes into a Neo4j database:
.xml --> custom function --> neo4j
The performance of this custom function doesn't matter, because it happens just once.
Then we would like to find, for example, the Nth uncle of a node. So we query Neo4j for that and return the requested node.
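To pin down what that query computes, here is a minimal Python sketch that answers it directly on the raw XML with the standard-library `xml.etree.ElementTree`. It assumes a hypothetical schema in which every element has a unique `id` attribute, and reads "Nth uncle" as the siblings of the node's Nth ancestor (N = 1 gives ordinary uncles); adjust if your definition differs. Without an index, it has to walk the whole document both to find the start node and to build parent links.

```python
import xml.etree.ElementTree as ET
from typing import List

def nth_uncles(root: ET.Element, node_id: str, n: int) -> List[str]:
    """Return the ids of the siblings of node_id's n-th ancestor (n=1: ordinary uncles).

    Assumes a hypothetical schema in which every element has a unique 'id' attribute.
    """
    # ElementTree elements have no parent pointers, so build child -> parent in one pass.
    parent = {child: elem for elem in root.iter() for child in elem}
    # Without an index, finding the start node is a linear scan of the document.
    start = next((e for e in root.iter() if e.get("id") == node_id), None)
    if start is None:
        return []
    ancestor = start
    for _ in range(n):
        ancestor = parent.get(ancestor)
        if ancestor is None:
            return []  # the node has fewer than n ancestors
    ancestor_parent = parent.get(ancestor)
    if ancestor_parent is None:
        return []  # the n-th ancestor is the root, which has no siblings
    return [sib.get("id") for sib in ancestor_parent if sib is not ancestor]
```

For example, with `<node id="r"><node id="a"><node id="a1"><node id="x"/></node><node id="a2"/></node><node id="b"/></node>`, `nth_uncles(root, "x", 1)` returns `['a2']` and `nth_uncles(root, "x", 2)` returns `['b']`.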
A)
In the first scenario we query Neo4j directly:
neo4j <-- query: Cypher <-- GUI
B)
In the second scenario we query the XML directly with XQuery and get an ID back:
xml <-- query: Xquery <-- GUI
And then we query that ID from Neo4j:
GUI --> query: Cypher --> neo4j
So in the first scenario we query Neo4j and also do the basic read/write/update/delete operations there.
In the second scenario we query the XML and only do the basic read/write/update/delete operations in Neo4j.
It would be nice to know which version performs better for the system, and why!
xml performance neo4j cypher xquery
edited Nov 22 at 14:29
asked Nov 22 at 13:54
kalzso
You will have to measure both and see; it's very unlikely that anyone else's measurements extrapolate to your particular environment. And note that XQuery is a language, not a piece of software. There are many implementations of XQuery, some working on databases, some in-memory, and their performance is likely to vary widely. In general though, if you have gone to the effort of loading your data into a database, then the database should be able to deliver better performance than anything working on the raw XML. The only caveat is that your dataset seems to be quite small.
– Michael Kay
Nov 22 at 15:37
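Measuring both, as suggested, can start with a very small Python harness like the sketch below. `scenario_a` and `scenario_b` are hypothetical zero-argument callables you would write yourself around the real Cypher call (A) and the XQuery-then-Cypher calls (B); the harness only times them and reports the median, which is less sensitive to outliers than the mean.

```python
import statistics
import time
from typing import Callable, Dict, Union

def median_latency_ms(query: Callable[[], object], runs: int = 50, warmup: int = 5) -> float:
    """Call query() `runs` times and return the median wall-clock latency in ms."""
    for _ in range(warmup):
        query()  # untimed calls so caches and connections are warm before measuring
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        query()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)

def compare(scenario_a: Callable[[], object],
            scenario_b: Callable[[], object],
            runs: int = 50) -> Dict[str, Union[float, str]]:
    """Time both scenarios with the same settings and report which median is lower."""
    a_ms = median_latency_ms(scenario_a, runs)
    b_ms = median_latency_ms(scenario_b, runs)
    return {"A_ms": a_ms, "B_ms": b_ms, "faster": "A" if a_ms <= b_ms else "B"}
```

Run it in the environment the system will actually use; as the comment says, numbers from another setup are unlikely to carry over.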
1 Answer
The best method is to implement both, stress test both, and evaluate for yourself whether the performance difference is large enough to justify not going with the simpler, easier-to-maintain solution. There are a lot of factors not in your post that can affect the results: Which XQuery implementation are you using? Are the GUI, the XML, and Neo4j all on the same server? What are the network hardware, usage load, and server specs? What is the data size? (The size you mention sounds like it could qualify as a "toy" project, i.e. fewer than 1 million nodes or relationships, so performance is probably a moot point.)
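For the stress-test part, here is a sketch that fires the same callable from several concurrent clients and reports throughput and a nearest-rank 95th-percentile latency. It uses threads on the assumption that the calls are I/O-bound (waiting on the network or the database); `query`, `clients`, and `requests_per_client` are illustrative names and defaults, not tuned values.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

def stress_test(query: Callable[[], object], clients: int = 8,
                requests_per_client: int = 25) -> Dict[str, float]:
    """Run query() from `clients` concurrent threads and summarize the load."""

    def one_client() -> List[float]:
        latencies = []
        for _ in range(requests_per_client):
            start = time.perf_counter()
            query()
            latencies.append(time.perf_counter() - start)
        return latencies

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=clients) as pool:
        futures = [pool.submit(one_client) for _ in range(clients)]
        # f.result() re-raises any exception raised inside a client thread.
        latencies = sorted(lat for f in futures for lat in f.result())
    wall_s = time.perf_counter() - wall_start

    # Nearest-rank 95th percentile.
    p95 = latencies[max(0, math.ceil(0.95 * len(latencies)) - 1)]
    return {
        "requests": len(latencies),
        "throughput_per_s": len(latencies) / wall_s,
        "p95_ms": p95 * 1000.0,
    }
```

Running it once with a callable for scenario A and once for scenario B, under the same `clients` setting, gives a like-for-like view of how each holds up under load.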
That said, I'd put my money on just using Neo4j with Cypher. Network communication (and, to a lesser extent, cross-application communication) is slow in computer time, and since you go to Neo4j in both scenarios, you pay that cost anyway (twice with the XQuery solution, since the GUI initiates both calls).
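The round-trip argument can be written down as a back-of-the-envelope cost model. This is a simplification under stated assumptions: every GUI call costs one network round trip of the same length, and the fetch-by-ID Cypher query in B may be cheaper than the full traversal in A. All field names are hypothetical; fill them in with your own measurements.

```python
from dataclasses import dataclass

@dataclass
class Costs:
    """Per-request costs in milliseconds (hypothetical inputs, not measurements)."""
    rtt_ms: float               # one GUI <-> server network round trip
    cypher_traversal_ms: float  # A: Neo4j runs the Nth-uncle traversal itself
    xquery_ms: float            # B: the XQuery engine answers the question over the XML
    cypher_by_id_ms: float      # B: Neo4j fetches the node by the returned ID

def scenario_a_ms(c: Costs) -> float:
    # GUI -> Neo4j -> GUI: a single round trip.
    return c.rtt_ms + c.cypher_traversal_ms

def scenario_b_ms(c: Costs) -> float:
    # GUI -> XQuery -> GUI, then GUI -> Neo4j -> GUI: two round trips.
    return 2 * c.rtt_ms + c.xquery_ms + c.cypher_by_id_ms

def b_beats_a(c: Costs) -> bool:
    return scenario_b_ms(c) < scenario_a_ms(c)
```

Under this model B only wins when `rtt_ms + xquery_ms` is smaller than the time the ID lookup saves over the full traversal (`cypher_traversal_ms - cypher_by_id_ms`).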
XQuery will most likely have to do a scan of the XML file (I don't see how else it could work without an index), while Neo4j is designed for relationship traversals, each of which is essentially a direct lookup rather than a search (possibly not exactly; the Cypher planner does whatever is most efficient per query). Both require disk I/O, but Cypher has the advantage that Neo4j caches some data in RAM for quick retrieval, and Neo4j will usually need fewer disk accesses to find what it needs.
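The scan-versus-traversal point can be illustrated with a toy Python model. `scan_for_id` does what an unindexed query over raw XML has to do, visiting elements in document order, while the hypothetical `Graph` class stores parent links once at load time (standing in for the one-off import function), so each later hop is a single lookup. This is not how Neo4j stores data internally; it only shows why traversal cost depends on the number of hops rather than the size of the document. In Neo4j itself, finding the start node by a property still needs an index (or a label scan); the hops after that follow stored relationships.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Tuple

def scan_for_id(root: ET.Element, node_id: str) -> Tuple[Optional[ET.Element], int]:
    """Unindexed lookup over raw XML: walk elements in document order.

    Returns the matching element (or None) and how many elements were visited.
    """
    visited = 0
    for elem in root.iter():
        visited += 1
        if elem.get("id") == node_id:
            return elem, visited
    return None, visited

class Graph:
    """Toy graph store: parent links are built once at load time, so every
    later hop is a single dictionary lookup instead of a scan."""

    def __init__(self, root: ET.Element) -> None:
        self.parent: Dict[str, str] = {}
        for elem in root.iter():
            for child in elem:
                self.parent[child.get("id")] = elem.get("id")

    def nth_ancestor(self, node_id: str, n: int) -> Optional[str]:
        current: Optional[str] = node_id
        for _ in range(n):
            current = self.parent.get(current)  # one lookup per hop
            if current is None:
                return None
        return current
```

With a root holding 1,000 children, `scan_for_id(root, "c999")` visits 1,001 elements, while `Graph(root).nth_ancestor("c999", 1)` takes one lookup: the scan grows with the document, the traversal with the number of hops.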
edited Nov 23 at 21:04
answered Nov 23 at 20:57
Tezra
Thank you for the tip; however, I cannot accept your answer as the solution, because we still don't know the answer for sure.
– kalzso
Nov 26 at 12:58
@Patry0t As I said in the first half, I need more information to be any more specific: namely, which XQuery implementation you are using, and where Neo4j, XQuery, and the client sit in the network (same machine vs. same LAN vs. over an ISP). Without more details, you will need to implement and stress test both to know for sure.
– Tezra
Nov 27 at 15:32