Which is more efficient for the task: XQuery or Cypher?

I would like to describe two scenarios in which a system has a large XML file (containing tens of thousands of rows of data). My question is: which scenario performs better, A or B?

The first step is the same in both scenarios: a function walks the XML and loads its nodes and attributes into a Neo4j database:

.xml --> custom function --> neo4j

The performance of this custom function doesn't matter, because it happens just once.

Then we would like to find, for example, the Nth uncle of a node. So we query Neo4j for that and return the requested node.

A)
In the first scenario we query Neo4j directly:

neo4j <-- query: Cypher <-- GUI

B)
In the second scenario we query the XML directly with XQuery and get an ID back:

xml <-- query: Xquery <-- GUI

And then we look up that ID in Neo4j:

GUI --> query: Cypher --> neo4j

So in the first scenario we query Neo4j and also perform the basic read/write/update/delete operations there.

In the second scenario we query the XML and perform only the basic read/write/update/delete operations in Neo4j.

It would be nice to know which version performs better for the system, and why!
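
To make the example query concrete, here is a minimal Python sketch of one possible reading of "Nth uncle": the siblings of a node's N-th ancestor (N = 1 gives the parent's siblings). That definition, the `id` attribute, and the sample document are assumptions for illustration only; the post does not define them. The sketch uses Python's standard `xml.etree.ElementTree` rather than XQuery or Cypher, so it shows the shape of the lookup, not the performance of either scenario.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the real file's element names are not given.
SAMPLE_XML = """
<root id="r">
  <a id="a1">
    <b id="b1"><c id="c1"/></b>
    <b id="b2"/>
  </a>
  <a id="a2"/>
</root>
"""


def build_parent_map(root):
    """Map each element to its parent (ElementTree keeps no parent pointers)."""
    return {child: parent for parent in root.iter() for child in parent}


def nth_uncles(root, node_id, n):
    """Return the ids of the siblings of the n-th ancestor of the given node."""
    parents = build_parent_map(root)
    # Linear search for the start node by its (assumed) id attribute.
    node = next((e for e in root.iter() if e.get("id") == node_id), None)
    if node is None:
        return []
    ancestor = node
    for _ in range(n):
        ancestor = parents.get(ancestor)
        if ancestor is None:  # ran past the document root
            return []
    grandparent = parents.get(ancestor)
    if grandparent is None:  # the n-th ancestor is the root, so it has no siblings
        return []
    return [e.get("id") for e in grandparent if e is not ancestor]
```

Calling `nth_uncles(ET.fromstring(SAMPLE_XML.strip()), "c1", 1)` walks one level up from `c1` to `b1` and returns the ids of the other children of `a1`.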

edited Nov 22 at 14:29

asked Nov 22 at 13:54

kalzso

238

You will have to measure both and see; it's very unlikely that anyone else's measurements extrapolate to your particular environment. And note that XQuery is a language, not a piece of software. There are many implementations of XQuery, some working on databases, some in-memory, and their performance is likely to vary widely. In general though, if you have gone to the effort of loading your data into a database, then the database should be able to deliver better performance than anything working on the raw XML. The only caveat is that your dataset seems to be quite small.
– Michael Kay
Nov 22 at 15:37

xml performance neo4j cypher xquery

1 Answer

The best method is to implement both, stress-test both, and evaluate for yourself whether the performance difference is large enough to justify not going with the simpler, easier-to-maintain solution. A lot of factors that are not in your post can affect the results, such as: Which XQuery implementation are you using? Are the GUI, XML, and Neo4j all on the same server? What are the network hardware, usage load, and server specs? How large is the data? (Your suggested size, fewer than 1 million nodes or relationships, sounds like it could qualify as a "toy" project, so performance is probably a moot point.)
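
A minimal sketch of such a measurement in Python, using the standard `timeit` and `statistics` modules. The two query functions are hypothetical placeholders (they only burn a little CPU); in a real test they would issue the actual Cypher call for scenario A and the XQuery call followed by the Cypher lookup for scenario B.

```python
import statistics
import timeit


def benchmark(label, query_fn, repeats=5, calls=100):
    """Time query_fn and return (label, median seconds per call)."""
    # Each entry in totals is the time for `calls` consecutive invocations.
    totals = timeit.repeat(query_fn, repeat=repeats, number=calls)
    return label, statistics.median(totals) / calls


def query_via_cypher():
    """Placeholder for scenario A; replace with the real Cypher call."""
    sum(range(1000))


def query_via_xquery_then_cypher():
    """Placeholder for scenario B; replace with the XQuery call plus the ID lookup."""
    sum(range(1000))
    sum(range(1000))


if __name__ == "__main__":
    scenarios = [
        ("A: Cypher only", query_via_cypher),
        ("B: XQuery + Cypher", query_via_xquery_then_cypher),
    ]
    for label, fn in scenarios:
        name, seconds = benchmark(label, fn)
        print(f"{name}: {seconds * 1e6:.1f} us per call")
```

Taking the median over several repeats reduces the influence of one-off stalls, but the numbers are only meaningful when measured on the target hardware and network with realistic data.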

That said, I'd put my money on just using Neo4j with Cypher. Network (or, to a lesser degree, cross-application) communication is slow in computer time, and since you go to Neo4j in both scenarios, you pay that cost either way (twice with the XQuery solution, since the GUI initiates both calls).
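
The round-trip argument can be sketched as simple arithmetic. The function below is a rough model, not a measurement, and every number passed to it is a made-up placeholder: it assumes each call from the GUI costs one network round trip plus the time the query itself takes.

```python
def total_latency_us(round_trips, network_rtt_us, query_times_us):
    """Rough model: total time = round trips * RTT + sum of query execution times."""
    return round_trips * network_rtt_us + sum(query_times_us)


# Placeholder numbers in microseconds, chosen only to illustrate the model.
NETWORK_RTT_US = 500

# Scenario A: one call (the Cypher traversal).
scenario_a = total_latency_us(1, NETWORK_RTT_US, [200])

# Scenario B: two calls (the XQuery search, then the Cypher lookup by ID).
scenario_b = total_latency_us(2, NETWORK_RTT_US, [300, 50])
```

Under this model scenario B always pays one extra round trip, so it can only win if its two queries together are faster than the single traversal by more than that round-trip time.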

XQuery will most likely have to do a scan of the XML file (I don't know how else it could possibly work without an index), while Neo4j is designed for relationship traversals, each of which will essentially be a binary search against an internal index (possibly not exactly; the Cypher planner does whatever is most efficient per query). Both require disk I/O, but Cypher has the advantage that Neo4j caches some data in RAM for quick retrieval, so Neo4j will usually need fewer disk accesses to find what it needs.
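
The difference between scanning and using an index can be illustrated with a small Python sketch. It is only an analogy: the records are invented, and the index here is a plain hash map (`dict`), which is neither how Neo4j stores relationships nor how any particular XQuery engine works.

```python
def find_by_scan(records, wanted_id):
    """Linear scan: inspect records one by one, as an unindexed reader must.

    Returns the matching record (or None) and the number of records inspected.
    """
    for steps, record in enumerate(records, start=1):
        if record["id"] == wanted_id:
            return record, steps
    return None, len(records)


def build_index(records):
    """Build a lookup table from id to record; paid once, reused by every query."""
    return {record["id"]: record for record in records}


# Invented data: 50,000 rows, each pointing at a parent id (None for the first).
records = [{"id": i, "parent": (i - 1) // 2 if i else None} for i in range(50_000)]
index = build_index(records)

# Worst case for the scan: the wanted row is the last one.
scanned, steps = find_by_scan(records, 49_999)
looked_up = index[49_999]
```

The scan inspects every row before it finds the last one, while the indexed lookup goes straight to it; the cost of building the index is paid once, much like the one-time load into the database.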

edited Nov 23 at 21:04

answered Nov 23 at 20:57

Tezra

4,91821042

Thank you for the tip; however, I cannot accept your answer as the solution, because we still don't know the answer for sure.
– kalzso
Nov 26 at 12:58

@Patry0t Like I said in the first half, I need more info to get any more specific: namely, which XQuery implementation you are using, and where Neo4j, XQuery, and the client sit in the network (same machine vs. same LAN vs. over an ISP). Without more details, you will need to implement and stress-test both to know for sure.
– Tezra
Nov 27 at 15:32
