How to design a table in Cassandra with an event date and four columns to filter on























Here is my case: I have a flow of CSV files representing 1-2 million events per day, which have to be filtered on two variables: a date range and one of four different columns. My only constraint is that the data has to be stored on a single server.



I'm an advanced database user and I had some interesting results on Postgres and MySQL with one partition per day and indexes on the four columns. I made some attempts with Cassandra, but I was quite disappointed with the performance. Can Cassandra be competitive on a single server for this kind of filtering, compared to a relational database? I tried different table structures without any significant performance improvement. Do you have any recommendations?




























  • If you have a requirement to store the data on a single machine, then why would you choose Cassandra over the alternatives you mentioned? And what was the design of the table that you benchmarked Cassandra with?
    – ernest_k
    Nov 22 at 17:26










  • I will choose the most performant solution. My conclusion so far is that Cassandra on a single server is an interesting solution when the filtering is done on one or two dimensions. The performance is bad if the filtering can be done on several columns. However, I'm not an advanced Cassandra user, and before concluding anything I prefer to check here whether I missed something.
    – Ignatius J. Reilly
    Nov 22 at 20:26










  • You might already know that a partition key is a must when querying Cassandra tables, and that the predicates in your WHERE clause should follow the order of the clustering columns in your table definition (an earlier clustering column has to be restricted before a later one). You can also create secondary indexes, but consider using them along with your partition key in your WHERE clause.
    – Praneeth Gudumasu
    Nov 26 at 7:30
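
One way to read the comment above is a table per filter column, partitioned by day and filter value and clustered by event time. The Python sketch below only builds the CQL text and the per-day queries for a date range; it does not talk to a cluster. The names col_a..col_d, events_by_*, event_time, event_id and payload are assumptions, since the question does not show its schema, and nothing here has been benchmarked.

```python
from datetime import date, timedelta

# Hypothetical names: the question does not show its schema, so the four
# filter columns are called col_a..col_d and the payload is one text column.
FILTER_COLUMNS = ("col_a", "col_b", "col_c", "col_d")


def create_table_cql(filter_column):
    """Return CQL for one table per filter column.

    Partition key: (day, <filter column>), so each query names one partition.
    Clustering columns: event_time, then event_id to keep rows unique.
    """
    return (
        f"CREATE TABLE IF NOT EXISTS events_by_{filter_column} (\n"
        f"    day date,\n"
        f"    {filter_column} text,\n"
        f"    event_time timestamp,\n"
        f"    event_id uuid,\n"
        f"    payload text,\n"
        f"    PRIMARY KEY ((day, {filter_column}), event_time, event_id)\n"
        f");"
    )


def days_in_range(start, end):
    """Every day bucket from start to end, both inclusive."""
    return [start + timedelta(days=n) for n in range((end - start).days + 1)]


def select_statements(filter_column, value, start, end):
    """Return one (cql, parameters) pair per day bucket in the date range."""
    if filter_column not in FILTER_COLUMNS:
        raise ValueError(f"unknown filter column: {filter_column}")
    # '?' are CQL bind markers, as used with a prepared statement.
    cql = (
        f"SELECT * FROM events_by_{filter_column} "
        f"WHERE day = ? AND {filter_column} = ?"
    )
    return [(cql, (day, value)) for day in days_in_range(start, end)]


if __name__ == "__main__":
    for column in FILTER_COLUMNS:
        print(create_table_cql(column))
    for cql, params in select_statements(
        "col_b", "abc", date(2018, 11, 20), date(2018, 11, 22)
    ):
        print(cql, params)
```

With this layout each event would be written four times, once per table, and a date-range query becomes one single-partition read per day. Whether that beats a day-partitioned, indexed relational table on a single server is something only a benchmark on the real data can tell.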















cassandra
















asked Nov 22 at 17:20









Ignatius J. Reilly
