Which is more efficient for the task: XQuery or Cypher?























I would like to describe two scenarios in which a system has a large XML file (containing several tens of thousands of rows of data). My question is: which scenario performs better, A or B?



The first step is the same for both scenarios: a function goes through the XML and puts the nodes and attributes into a Neo4j database:



.xml --> custom function --> neo4j


The performance of this custom function doesn't matter, because it happens just once.
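For illustration only, the one-time load step might look roughly like the sketch below. It rests on assumptions: the function name `flatten_xml`, the record layout, and the sample document are made up here, and the actual write to Neo4j (for instance through a driver) is left out. It only shows how elements and attributes could be turned into node records and parent-child pairs.

```python
import xml.etree.ElementTree as ET


def flatten_xml(xml_text):
    """Turn an XML document into node records and parent->child edges.

    Hypothetical helper: ids are assigned in document order and are not
    Neo4j ids; a real loader would send these records to the database.
    """
    root = ET.fromstring(xml_text)
    nodes = []  # one dict per element: id, tag, attributes
    edges = []  # (parent_id, child_id) pairs
    stack = [(root, None)]
    while stack:
        element, parent_id = stack.pop()
        node_id = len(nodes)
        nodes.append({"id": node_id, "tag": element.tag, "attrs": dict(element.attrib)})
        if parent_id is not None:
            edges.append((parent_id, node_id))
        # Push children in reverse so they are popped in document order.
        for child in reversed(list(element)):
            stack.append((child, node_id))
    return nodes, edges


# Made-up sample document, only to exercise the sketch.
SAMPLE = '<root><a name="x"><b/></a><c/></root>'
nodes, edges = flatten_xml(SAMPLE)
```

Each entry in `edges` would correspond to one parent-child relationship in the graph, and each entry in `nodes` to one graph node carrying the XML attributes as properties.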



Then we would like to know, for example, the Nth uncle of a node. So we query Neo4j for that and return the requested node.
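To make the kind of query concrete, here is a small in-memory sketch of an "Nth uncle" lookup on the parsed XML tree. The definition used is an assumption (the siblings of the ancestor N levels above the node, so N = 1 gives the ordinary uncles), and the function name `nth_uncles` and the sample document are hypothetical. Neither Cypher nor XQuery is involved; it only shows the traversal that either query would have to express.

```python
import xml.etree.ElementTree as ET


def nth_uncles(root, target, n):
    """Return the siblings of the ancestor n levels above `target`.

    Assumed definition: n=1 gives the ordinary uncles (siblings of the parent).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # ElementTree has no parent pointers, so build a child -> parent map once.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    ancestor = target
    for _ in range(n):
        ancestor = parent_of.get(ancestor)
        if ancestor is None:
            return []  # target is not deep enough in the tree
    grand = parent_of.get(ancestor)
    if grand is None:
        return []  # the ancestor is the root and has no siblings
    return [sibling for sibling in grand if sibling is not ancestor]


# Made-up sample: <t> has parent <p>, whose siblings are <u1> and <u2>.
doc = ET.fromstring("<g><u1/><p><t/></p><u2/></g>")
target = doc.find("p/t")
uncle_tags = [u.tag for u in nth_uncles(doc, target, 1)]
```

The traversal walks up N parent links and then sideways once, which is the shape of work both scenarios need to perform.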



A)
In the first scenario we query Neo4j directly:



neo4j <-- query: Cypher <-- GUI


B)
In the second scenario we query the XML directly with XQuery and get an ID back:



xml <-- query: Xquery <-- GUI


And then we look up that ID in Neo4j:



GUI --> query: Cypher --> neo4j




So in the first scenario we query Neo4j and also do the basic read/write/update/delete operations there.



In the second scenario we query the XML and do only the basic read/write/update/delete operations in Neo4j.



It would be nice to know which version performs better for the system, and why!


































  • You will have to measure both and see; it's very unlikely that anyone else's measurements extrapolate to your particular environment. And note that XQuery is a language, not a piece of software. There are many implementations of XQuery, some working on databases, some in-memory, and their performance is likely to vary widely. In general though, if you have gone to the effort of loading your data into a database, then the database should be able to deliver better performance than anything working on the raw XML. The only caveat is that your dataset seems to be quite small.
    – Michael Kay
    Nov 22 at 15:37
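A minimal sketch of what "measure both" could look like is given below. The two scenario functions are placeholders invented for this example (they only do some arithmetic); in practice they would be replaced with the real scenario A and scenario B calls, and the repeat count is an arbitrary assumption.

```python
import statistics
import time


def measure(fn, repeats=20):
    """Call fn() `repeats` times and return the per-call timings in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Reduce raw timings to a few numbers that are easy to compare."""
    return {
        "median": statistics.median(timings),
        "max": max(timings),
        "runs": len(timings),
    }


# Placeholders: replace with the real scenario A and scenario B calls.
def scenario_a():
    sum(range(1000))


def scenario_b():
    sum(range(1000))
    sum(range(1000))


results = {
    name: summarize(measure(fn))
    for name, fn in [("A", scenario_a), ("B", scenario_b)]
}
```

Comparing medians rather than single runs reduces the influence of warm-up and caching effects, though numbers from such a harness still only apply to the environment they were taken in.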







































xml performance neo4j cypher xquery














edited Nov 22 at 14:29

























asked Nov 22 at 13:54









kalzso



























1 Answer






























The best method is to implement both, stress test both, and evaluate for yourself whether the performance difference is large enough to justify not going with the simpler, easier-to-maintain solution. A lot of factors that are not in your post can affect the results: Which XQuery implementation are you using? Are the GUI, the XML, and Neo4j all on the same server? What are the network hardware, usage load, and server specs? How large is the data? (Your suggested size, fewer than 1 million nodes or relationships, sounds like it could qualify as a "toy" project, so performance is probably a moot point.)





That said, I'd put my money on just using Cypher against Neo4j. Network (or, to a lesser extent, cross-application) communication is slow in computer time, and since you go to Neo4j in both scenarios, you pay that cost either way (twice with the XQuery solution, since the GUI initiates both calls).



XQuery will most likely have to do a scan of the XML file (I don't know how else it could possibly work without an index), while Neo4j is designed for relationship traversals. Each traversal step is essentially a binary search against an internal index (possibly not exactly; the Cypher planner does whatever is most efficient per query). Both require disk I/O, but Cypher has the advantage that Neo4j caches some data in RAM for quick retrieval, so Neo4j will more often need fewer disk accesses to find what it needs.
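The difference between scanning and using an index can be illustrated with a toy sketch. It does not model Neo4j or any XQuery engine: the record layout, the names `scan_lookup` and `build_index`, and the 50,000-record size are assumptions picked to echo the "tens of thousands of rows" in the question.

```python
def scan_lookup(records, wanted_id):
    """Linear scan: inspect records one by one until the id matches."""
    inspected = 0
    for record in records:
        inspected += 1
        if record["id"] == wanted_id:
            return record, inspected
    return None, inspected


def build_index(records):
    """One-time cost: map each id to its record for direct lookups."""
    return {record["id"]: record for record in records}


# Made-up data set of 50,000 records.
records = [{"id": i, "name": f"node{i}"} for i in range(50_000)]
index = build_index(records)

# Worst case for the scan: the wanted record is the last one.
found_by_scan, inspected = scan_lookup(records, 49_999)
found_by_index = index[49_999]
```

Here the scan has to look at every record to reach the last one, while the indexed lookup goes straight to it after the one-time cost of building the index. Whether a given XQuery processor really scans depends on the implementation, as noted above.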





























  • Thank you for the tip; however, I cannot accept your answer as the solution, because we still don't know the answer for sure.
    – kalzso
    Nov 26 at 12:58










  • @Patry0t Like I said in the first half, I need more info to get any more specific: namely, which XQuery implementation you are using, and where Neo4j, XQuery, and the client sit in the network (same machine vs. same LAN vs. over an ISP). Without more details, you will need to implement and stress test both to know for sure.
    – Tezra
    Nov 27 at 15:32












































edited Nov 23 at 21:04

























answered Nov 23 at 20:57









Tezra






























