The formula of ScoreDoc.score in Lucene











I want to build a search engine using Lucene. From the Lucene documentation, I noticed that ScoreDoc.score gives the similarity score between a document and the query.

How is this similarity score calculated?










      java apache search solr lucene






      edited Nov 22 at 20:37

























      asked Nov 22 at 16:45









      Noran

          1 Answer
          The similarity score is calculated by the similarity model configured for the field being queried. There are two that I am aware of: TF-IDF and BM25.

          Both use characteristics of the document and the index, such as document length, term frequency, and inverse document frequency (IDF). You could go through this link if it helps.
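To make the two models mentioned above more concrete, here is a minimal sketch of the textbook per-term formulas in plain Java. It is an illustration only, not Lucene's code: the class and method names are hypothetical, and `K1 = 1.2` and `B = 0.75` are the commonly used BM25 parameter values, assumed here. Lucene's own similarity classes also store field lengths in a compressed form and have changed constant factors between versions, so real `ScoreDoc.score` values will not match these numbers exactly. For a multi-term query, the document score is roughly the sum of the per-term scores.

```java
public final class ScoringSketch {
    // Assumed BM25 parameters (commonly used values; hypothetical constants for this sketch).
    static final double K1 = 1.2;
    static final double B = 0.75;

    /** BM25 inverse document frequency: terms that occur in fewer documents weigh more. */
    static double bm25Idf(long docCount, long docFreq) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    /** Textbook BM25 score of one query term for one document. */
    static double bm25TermScore(double termFreq, double docLength, double avgDocLength,
                                long docCount, long docFreq) {
        // Documents longer than average are penalized, shorter ones are boosted.
        double lengthNorm = 1.0 - B + B * (docLength / avgDocLength);
        // Term frequency saturates: each extra occurrence adds less than the previous one.
        double tfPart = termFreq * (K1 + 1.0) / (termFreq + K1 * lengthNorm);
        return bm25Idf(docCount, docFreq) * tfPart;
    }

    /** Classic TF-IDF style score of one query term (sqrt tf, squared idf, length norm). */
    static double tfIdfTermScore(double termFreq, double docLength,
                                 long docCount, long docFreq) {
        double tf = Math.sqrt(termFreq);
        double idf = 1.0 + Math.log((docCount + 1.0) / (docFreq + 1.0));
        double lengthNorm = 1.0 / Math.sqrt(docLength);
        return tf * idf * idf * lengthNorm;
    }

    public static void main(String[] args) {
        // Made-up statistics: 1000 documents, the term occurs in 10 of them,
        // twice in this 50-token document; the average document has 100 tokens.
        System.out.println("BM25:   " + bm25TermScore(2, 50, 100, 1000, 10));
        System.out.println("TF-IDF: " + tfIdfTermScore(2, 50, 1000, 10));
    }
}
```

Both functions use the same three ingredients (term frequency, document length, and IDF); they differ mainly in how strongly repeated terms and long documents are damped.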






          • That link doesn't really explain much about how BM25 works - a much better explanation can be found at BM25 - The Next Generation of Lucene Relevation. BM25 is the default similarity in Solr these days.
            – MatsLindh
            Nov 23 at 19:30












          • @MatsLindh The page was not found.
            – Noran
            Nov 23 at 20:30












          • @AmanTandon I would like to normalize the scores in Lucene. Do you know how to do this?
            – Noran
            Nov 23 at 20:36










          • @Noran Please refer to github.com/apache/lucene-solr/blob/releases/lucene-solr/6.4.0/… To normalize the scores, you can divide each score given by the scoring algorithm (BM25) by the maximum score returned by getMaxScore. However, could you explain why you want to normalize those values?
            – Aman Tandon
            Nov 24 at 7:37
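The divide-by-maximum idea from the comment above can be sketched as follows. This is a minimal illustration in plain Java, not Lucene code: `ScoreNormalizer` and `normalizeByMax` are hypothetical names, and the `float[]` stands in for the values you would read from each hit's `ScoreDoc.score`. The maximum is computed from the array itself rather than taken from `getMaxScore`.

```java
import java.util.Arrays;

public final class ScoreNormalizer {
    /** Divides every score by the largest one, so the best hit becomes 1.0. */
    static float[] normalizeByMax(float[] scores) {
        float max = 0f;
        for (float s : scores) {
            max = Math.max(max, s);
        }
        float[] normalized = new float[scores.length];
        if (max <= 0f) {
            // No positive score: return zeros instead of dividing by zero.
            return normalized;
        }
        for (int i = 0; i < scores.length; i++) {
            normalized[i] = scores[i] / max;
        }
        return normalized;
    }

    public static void main(String[] args) {
        // Made-up raw scores standing in for the hits of one query.
        float[] raw = {4.0f, 2.0f, 1.0f};
        System.out.println(Arrays.toString(normalizeByMax(raw)));
    }
}
```

Note that values normalized this way are relative to one result list only: the top hit always gets 1.0, however well or badly it matches, so they are not comparable across different queries.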












          • @Noran Here is the corrected link to the page Mats mentioned, which gives a good explanation of BM25: opensourceconnections.com/blog/2015/10/16/…
            – Aman Tandon
            Nov 24 at 7:39













          answered Nov 23 at 17:22









          Aman Tandon
