Re-ranking of search results in SOLR

Engati Team
Jul 8


Most e-commerce search engines rely on parameters such as product popularity, product rating, recency, click-through rate and other factors to influence the result set for a user's search query.

Data suggests that search pages have a significantly higher probability of customer engagement when additional factors are used to re-arrange the result set than when results are served using only the pure SOLR relevance score.

This blog assumes that readers have prior knowledge of the following:
> Querying SOLR for content
> Query time boosting supported by SOLR
> Function queries in SOLR
> ValueSource parsers in SOLR

There are multiple ways in which re-ranking can be achieved. Some of the most commonly used approaches are:
> Using the LTR model exposed by SOLR
> Supplying a boost function / query to the search query

More information on leveraging LTR can be found in the SOLR documentation (reference: https://lucene.apache.org/solr/guide/8_6/learning-to-rank.html).

Benefits / drawbacks of LTR approach:

> Easy to use and deploy
> SOLR provides options for feature engineering if there is no custom model available.
> Top N documents from the search query are considered for re-ranking.
> Most of the LTR models require the parameters to be normalised which may result in an additional SOLR query.
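
To make the "top N documents" point concrete, here is a minimal sketch of how an LTR re-rank request could be assembled. It assumes the rq parameter with the {!ltr} parser described in the SOLR LTR documentation; the model name myModel and the reRankDocs value of 100 are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class LtrQuerySketch {
    // Builds the query string for an LTR re-rank request.
    // modelName is hypothetical: it must match a model uploaded to the model store.
    // reRankDocs is the "top N": only that many documents are re-scored by the model.
    static String buildQuery(String userQuery, String modelName, int reRankDocs) {
        String rq = "{!ltr model=" + modelName + " reRankDocs=" + reRankDocs + "}";
        return "q=" + encode(userQuery) + "&rq=" + encode(rq) + "&fl=" + encode("id,score");
    }

    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("iphone", "myModel", 100));
    }
}
```

Documents outside the top reRankDocs keep their original relevance order, which is why the choice of N matters for this approach.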

In this blog, I focus on boost functions and custom functions to achieve result re-ranking.

Given an input search term, there are multiple factors which can be considered. They can mainly be classified as:
> Search term specific metrics
> Search term independent metrics
> Product metadata

Boosts in SOLR

SOLR supports the following kinds of boost:
> Boost by query with an additive boost (bq)
> Boost by function with an additive boost (bf)
> Boost by function with a multiplicative boost (boost)
Multiplicative boosts are generally preferred over additive ones because their effect is more predictable: they scale the relevance score instead of adding a value whose impact depends on the magnitude of that score. The right choice still depends on the use case at hand.

If a set of product IDs to boost for a particular search term is known ahead of time, they can be supplied as a boost to the SOLR query by passing them with bq:
q=iphone&bq=productId:(101^10 102^9 103^8 104^7)
This provides an additive boost to the four IDs.
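
As a rough sketch, the bq value for a known set of IDs could be assembled from an ordered map of ID to boost. The class and method names are illustrative, and the value still needs URL encoding before it is sent over HTTP.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

final class BoostQueryBuilder {
    // Builds a bq value such as productId:(101^10 102^9) from id -> boost pairs.
    // A LinkedHashMap keeps the clauses in insertion order.
    static String buildBq(String field, Map<String, Integer> boostsById) {
        String clauses = boostsById.entrySet().stream()
            .map(e -> e.getKey() + "^" + e.getValue())
            .collect(Collectors.joining(" "));
        return field + ":(" + clauses + ")";
    }

    public static void main(String[] args) {
        Map<String, Integer> boosts = new LinkedHashMap<>();
        boosts.put("101", 10);
        boosts.put("102", 9);
        boosts.put("103", 8);
        boosts.put("104", 7);
        System.out.println("q=iphone&bq=" + buildBq("productId", boosts));
    }
}
```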

Applying boost using BQ

Let’s consider the search term q=iphone.
When I apply a boost to a product ID with bq=id:PRC-60001-00424-00002^1.0,
I get the following debug information:

7.7611732 = sum of:
2.1149852 = weight(name:iphone in 257987) [SchemaSimilarity], result of:
  2.1149852 = score(freq=1.0), computed as boost * idf * tf from:
    2.956811 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
      10512 = n, number of documents containing term
      202223 = N, total number of documents with field
    0.7152927 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
      1.0 = freq, occurrences of term within document
      1.2 = k1, term saturation parameter
      0.75 = b, length normalization parameter
      1.0 = dl, length of field
      9.1809435 = avgdl, average length of field
5.646188 = weight(id:PRC-60001-00424-00002 in 257987) [SchemaSimilarity], result of:
  5.646188 = score(freq=1.0), computed as boost * idf * tf from:
    12.421614 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
      1 = n, number of documents containing term
      372159 = N, total number of documents with field
    0.45454544 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
      1.0 = freq, occurrences of term within document
      1.2 = k1, term saturation parameter
      0.75 = b, length normalization parameter
      1.0 = dl, length of field
      1.0 = avgdl, average length of field

This shows that the final score is the sum of 2.1149852 (the relevance score from the name match) and 5.646188 (boost * idf * tf, where the supplied boost was 1).
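
The two components can be checked by hand. The sketch below recomputes them from the idf and tf formulas printed in the debug output, using the N, n, dl and avgdl values shown there. SOLR works in single-precision floats, so the double results should agree only to within rounding.

```java
final class Bm25DebugCheck {
    // idf = log(1 + (N - n + 0.5) / (n + 0.5))
    static double idf(long docCount, long docFreq) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // tf = freq / (freq + k1 * (1 - b + b * dl / avgdl))
    static double tf(double freq, double k1, double b, double dl, double avgdl) {
        return freq / (freq + k1 * (1.0 - b + b * dl / avgdl));
    }

    // score = boost * idf * tf
    static double score(double boost, double idf, double tf) {
        return boost * idf * tf;
    }

    public static void main(String[] args) {
        // name:iphone clause, values taken from the debug output
        double nameScore = score(1.0, idf(202223, 10512), tf(1.0, 1.2, 0.75, 1.0, 9.1809435));
        // id:PRC-60001-00424-00002 clause with the supplied boost of 1.0
        double idScore = score(1.0, idf(372159, 1), tf(1.0, 1.2, 0.75, 1.0, 1.0));
        System.out.printf("name=%.5f id=%.5f total=%.5f%n",
            nameScore, idScore, nameScore + idScore);
    }
}
```

Because the id field matches exactly one document, its idf is large (about 12.42), which is why a boost of 1 still contributes more than the name match itself.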
If the boost should not be multiplied by tf and idf, bq can be passed with the constant-score operator ^= instead:

bq=id:PRC-60001-00424-00002^=1.0
This generates the following debug output:

3.1617308 = sum of:
2.1617308 = weight(name:iphone in 257987) [SchemaSimilarity], result of:
  2.1617308 = score(freq=1.0), computed as boost * idf * tf from:
    3.0451374 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
      12654 = n, number of documents containing term
      265907 = N, total number of documents with field
    0.70989597 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
      1.0 = freq, occurrences of term within document
      1.2 = k1, term saturation parameter
      0.75 = b, length normalization parameter
      1.0 = dl, length of field
      8.282933 = avgdl, average length of field
1.0 = ConstantScore(id:PRC-60001-00424-00002)

This indicates that the final score of 3.1617308 is the sum of 2.1617308 (the relevance score from the name match) and 1.0 (a constant score).

Applying boost using BF

bf is used when the boost should come from a function that SOLR evaluates for each document. For example:
bf=field(popularity)^2.0

20.063128 = sum of:
2.0631282 = weight(name:iphone in 1341) [SchemaSimilarity], result of:
  2.0631282 = score(freq=1.0), computed as boost * idf * tf from:
    2.891674 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
      12278 = n, number of documents containing term
      221300 = N, total number of documents with field
    0.7134719 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
      1.0 = freq, occurrences of term within document
      1.2 = k1, term saturation parameter
      0.75 = b, length normalization parameter
      1.0 = dl, length of field
      8.858545 = avgdl, average length of field
18.0 = FunctionQuery(double(popularity)), product of:
  9.0 = double(popularity)=9.0
  2.0 = boost

In the debug output above, the final score of 20.063128 is the sum of 2.0631282 (the relevance score from the name match) and 18.0 (the popularity of 9.0 multiplied by the boost weight of 2.0).

Applying boost using boost

In contrast to bf, which is an additive boost, the boost parameter applies a multiplicative boost:
boost=field(popularity)

18.550365686416626 = weight(FunctionScoreQuery(nameSearch:iphone, scored by boost(double(popularity)))), result of:
18.550365686416626 = product of:
  2.0611517 = weight(name:iphone in 1341) [SchemaSimilarity], result of:
    2.0611517 = score(freq=1.0), computed as boost * idf * tf from:
      2.8886645 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
        12278 = n, number of documents containing term
        220635 = N, total number of documents with field
      0.713531 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
        1.0 = freq, occurrences of term within document
        1.2 = k1, term saturation parameter
        0.75 = b, length normalization parameter
        1.0 = dl, length of field
        8.868674 = avgdl, average length of field
  9.0 = double(popularity)=9.0

In the debug output above, the final score of 18.55 is the product of 2.0611517 (the relevance score from the name match) and 9.0 (the boost from popularity).
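
The arithmetic behind bf and boost can be sketched as two small functions. The first two calls in main use the numbers from the debug outputs; the two documents A and B are hypothetical and only show how the choice of boost can change the ordering.

```java
final class BoostArithmetic {
    // bf: the weighted function value is added to the relevance score.
    static double additive(double relevance, double functionValue, double weight) {
        return relevance + weight * functionValue;
    }

    // boost: the function value multiplies the relevance score.
    static double multiplicative(double relevance, double functionValue) {
        return relevance * functionValue;
    }

    public static void main(String[] args) {
        // Values from the debug outputs (popularity = 9.0, bf weight = 2.0)
        System.out.println(additive(2.0631282, 9.0, 2.0));
        System.out.println(multiplicative(2.0611517, 9.0));

        // Hypothetical documents: A is highly relevant, B is highly popular.
        double aRelevance = 5.0, aPopularity = 1.0;
        double bRelevance = 1.0, bPopularity = 4.0;
        boolean bWinsAdditive =
            additive(bRelevance, bPopularity, 2.0) > additive(aRelevance, aPopularity, 2.0);
        boolean aWinsMultiplicative =
            multiplicative(aRelevance, aPopularity) > multiplicative(bRelevance, bPopularity);
        System.out.println("B first with bf: " + bWinsAdditive
            + ", A first with boost: " + aWinsMultiplicative);
    }
}
```

With the additive form, the weight has to be tuned against the typical size of the relevance score, whereas the multiplicative form scales whatever relevance score the query produces.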

Leveraging SOLR’s Payloads and DocValues

SOLR payloads are a per-document map of terms to values, which can be used to store a searchTerm -> metadata mapping for a given document.
SOLR provides field types that support payloads. An example:

<fieldType name="delimited_payloads_float" stored="false" indexed="true" class="solr.TextField">
  <analyzer>
    <tokenizer name="whitespace"/>
    <filter name="delimitedPayload" encoder="float"/>
  </analyzer>
</fieldType>
<dynamicField name="*_dpf" type="delimited_payloads_float" indexed="true" stored="true"/>
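
At index time, a field of this type receives whitespace-separated term|value tokens. A minimal sketch of producing such a value from a search term to metric map follows; it assumes the filter's default "|" delimiter, and the terms and click-through rates are made-up sample data.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

final class PayloadFieldValue {
    // Joins term -> value pairs into "term|value" tokens separated by spaces.
    // Terms must not contain whitespace or '|', because the whitespace tokenizer
    // and the payload delimiter would split them.
    static String build(Map<String, Float> valueByTerm) {
        return valueByTerm.entrySet().stream()
            .map(e -> e.getKey() + "|" + e.getValue())
            .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        Map<String, Float> ctrByTerm = new LinkedHashMap<>();
        ctrByTerm.put("iphone", 0.12f); // sample data
        ctrByTerm.put("phone", 0.05f);  // sample data
        // Value to index into a *_dpf field such as ctr_dpf
        System.out.println(build(ctrByTerm));
    }
}
```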

Finally, to supply the required equation to SOLR as a function that re-ranks the result set, a custom ValueSourceParser can be created. It can use payloads for search term specific metrics and docValues for the document's metadata.

Overriding the getValues method of the ValueSource lets you initialise the payload fields and the docValues fields to be used:

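// Terms of the payload field that holds the search term specific metric
final Terms terms = readerContext.reader().terms("ctr_dpf");
// docValues-backed access to the "rating" field
FunctionValues ratingFunctionValues =
  new LongFieldSource("rating").getValues(context, readerContext);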

getValues is expected to return a FunctionValues instance, which contains the logic to fetch the appropriate information from the payload and from the field.

To fetch the documents corresponding to the given search term from the payload field:

if (terms != null) {
  final TermsEnum termsEnum = terms.iterator();
  if (termsEnum.seekExact(indexedBytes)) {
    docs = termsEnum.postings(null, PostingsEnum.ALL);
  } else {
    docs = null;
  }
}

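To fetch the payload value from the current document:

// Assumes docs is positioned on the current document and
// docs.nextPosition() has been called, so a payload is available.
BytesRef payload = docs.getPayload();
if (payload != null) {
  // The float encoder stores four bytes, not text, so decode them as a float.
  float value = PayloadHelper.decodeFloat(payload.bytes, payload.offset);
}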
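
With encoder="float", each payload is stored as four bytes rather than as text. The sketch below mirrors that layout without any Lucene dependency, assuming the big-endian IEEE 754 encoding that Lucene's PayloadHelper uses; the class and method names are illustrative.

```java
import java.nio.ByteBuffer;

final class FloatPayloadLayout {
    // Encodes a float as four big-endian bytes (ByteBuffer's default byte order).
    static byte[] encode(float value) {
        return ByteBuffer.allocate(Float.BYTES).putFloat(value).array();
    }

    // Decodes four bytes starting at offset. The offset matters because a
    // payload may sit inside a larger shared array, as with BytesRef.offset.
    static float decode(byte[] bytes, int offset) {
        return ByteBuffer.wrap(bytes, offset, Float.BYTES).getFloat();
    }

    public static void main(String[] args) {
        byte[] raw = encode(0.12f);
        System.out.println(raw.length + " bytes -> " + decode(raw, 0));
    }
}
```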

To fetch the value from the docValues field:

// Read the numeric value of the docValues-backed field for the given document.
double getDataFromFunctionValues(FunctionValues functionValues,
    int doc) throws IOException {
  return functionValues.doubleVal(doc);
}

Because the ValueSource above is a function, it can be passed to a SOLR query as part of bf or boost once its ValueSourceParser is registered in solrconfig.xml. It can then use all the parameters required to re-rank the result set, for example:
q=iphone&boost=myFunction(iphone, weights_of_individual_params)
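
What myFunction computes is left to the implementer. As one hypothetical possibility, the per-document value could be a weighted combination of the click-through rate read from the payload and the rating read from docValues. The formula, the weights and the rating scale below are assumptions for illustration, not something SOLR prescribes.

```java
final class RerankScoreSketch {
    // Hypothetical multiplier: 1 + ctrWeight * ctr + ratingWeight * (rating / maxRating).
    // A missing payload is represented here as NaN and treated as a CTR of 0,
    // so documents without data for the search term keep a neutral multiplier.
    static double multiplier(double ctr, double rating, double maxRating,
                             double ctrWeight, double ratingWeight) {
        double safeCtr = Double.isNaN(ctr) ? 0.0 : ctr;
        double normalisedRating = maxRating > 0 ? rating / maxRating : 0.0;
        return 1.0 + ctrWeight * safeCtr + ratingWeight * normalisedRating;
    }

    public static void main(String[] args) {
        // Document with CTR data for the search term
        System.out.println(multiplier(0.12, 4.0, 5.0, 2.0, 0.5));
        // Document with no payload for the search term
        System.out.println(multiplier(Double.NaN, 4.0, 5.0, 2.0, 0.5));
    }
}
```

Starting the multiplier at 1 means that, when used with boost, a document with no signals keeps its original relevance score instead of being zeroed out.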

References :
http://www.textsearch.io/?p=5
https://nolanlawson.com/2012/06/02/comparing-boost-methods-in-solr/
https://lucene.apache.org/solr/guide/8_6/the-dismax-query-parser.html#bq-boost-query-parameter
https://lucene.apache.org/solr/guide/8_6/learning-to-rank.html
https://github.com/apache/lucene-solr/blob/master/solr/server/solr/configsets/_default/conf/managed-schema
