D365 eCommerce and Relevance search – When life gives you lemons, make lemonade

Dynamics 365 Commerce utilizes cloud-powered search to enhance product discoverability. This feature is crucial for customer interaction across various channels like e-commerce and point of sale (POS), ensuring customers can quickly find products. This search experience includes advanced capabilities like faceted navigation, immersive autosuggest, and sorting options for better product discovery and scalability required for e-commerce traffic.

BM25 ranking

It is not always easy to understand exactly how the current search works, so I will try to explain. The core of the search is the ranking function BM25, where “BM” means “Best Match”. BM25 is a popular ranking function used by search engines to estimate the relevance of documents to a given search query. In simpler terms, it’s a formula that helps determine how well a document (like a webpage, article, or product description) matches a search query. Here’s a basic explanation of how it works, using a simple example:

  1. Term Frequency (TF): This refers to how many times a search term appears in a document. For instance, if you’re searching for “chocolate cake” and a recipe mentions “chocolate” 10 times and “cake” 5 times, these numbers contribute to the term frequency part of the BM25 calculation.
  2. Inverse Document Frequency (IDF): This measures how common or rare a term is across all documents. If every recipe on a site mentions “cake,” then “cake” is a common term and has a lower IDF. However, if only a few recipes mention “chocolate,” then “chocolate” is rarer and has a higher IDF.
  3. Document Length: This part of the formula adjusts for the length of the document. A longer document might naturally use a search term more often, so BM25 compensates for this. For instance, a long article that mentions “chocolate” 10 times might not be as relevant as a short article that mentions “chocolate” 5 times.
  4. Query Terms: BM25 scores each term in the search query separately. The per-term scores are then added together to determine the overall relevance of the document to the query, so a document that matches more of the query terms generally scores higher.

Example:

Suppose you have two recipes on a cooking website:

  • Recipe A: A short recipe for “Chocolate Lava Cake” that mentions “chocolate” 3 times.
  • Recipe B: A long article about the history of cakes that mentions “chocolate” 10 times.

If you search for “chocolate cake,” BM25 would calculate the relevance of both documents based on how often “chocolate” and “cake” appear in each (TF), how common these words are across all recipes (IDF), and the length of each document. Despite having fewer mentions of “chocolate,” Recipe A might be rated more relevant than Recipe B because it’s more focused and shorter, making its use of the term “chocolate” more significant.

In essence, BM25 helps search engines prioritize documents that are more likely to be what the user is looking for, based on how terms are used within them and how those terms are distributed across all documents.
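To make the recipe example concrete, here is a minimal sketch of the textbook BM25 formula in Python. It is not the exact implementation inside Azure Search. The parameter values k1 = 1.2 and b = 0.75 are commonly used defaults, and everything about the documents other than the “chocolate” counts (3 and 10) is an assumption made up for illustration: the “cake” counts, the filler words, and a third unrelated recipe.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with textbook BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        # Longer-than-average documents get a larger denominator (lower score).
        length_norm = k1 * (1 - b + b * len(d) / avg_len)
        score = 0.0
        for term in query_terms:
            f = tf[term]
            if f == 0:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            score += idf * f * (k1 + 1) / (f + length_norm)
        scores.append(score)
    return scores

# Hypothetical documents: only the "chocolate" counts come from the example above.
recipe_a = ["chocolate"] * 3 + ["cake"] * 2 + ["filler"] * 15     # short recipe
recipe_b = ["chocolate"] * 10 + ["cake"] * 15 + ["filler"] * 775  # long article
recipe_c = ["bread"] * 50                                         # unrelated recipe

scores = bm25_scores(["chocolate", "cake"], [recipe_a, recipe_b, recipe_c])
for name, s in zip(["Recipe A", "Recipe B", "Recipe C"], scores):
    print(f"{name}: {s:.3f}")
```

With these made-up documents, Recipe A scores higher than Recipe B even though it mentions “chocolate” fewer times, because the length normalization penalizes the long article and the term-frequency part saturates quickly.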

Azure Search

In D365 eCommerce, the Azure Search ranking is determined by the underlying search engine algorithm, BM25. You can refer to this document to understand how it works: https://learn.microsoft.com/en-us/azure/search/index-similarity-and-scoring . It is basically a variant of the TF/IDF algorithm. The algorithm also uses the language analyzer that corresponds to the locale (for example, the nb-no locale uses “nb.microsoft”, the Norwegian language analyzer) for all language fields such as Name, Description, Keywords, and Attributes.

I asked Microsoft to explain one specific scenario we have been struggling with: when searching for lemons (“sitron” in Norwegian), we could not get the lemon product to rank highest on the list. Instead, it came in at number 2. It is important to mention that the product name in this case is “SITRON KG”. If I rename the product to “SITRON”, it is ranked number 1.

The reason the product sitrongele is returned as the highest-ranked product is that, with the nb.microsoft analyzer, the word sitrongele is split into the following tokens in the search engine’s inverted index:

 "tokens": [
       {
           "token": "sitrongelé",
           "startOffset": 0,
           "endOffset": 10,
           "position": 0
       },
       {
           "token": "sitrongele",
           "startOffset": 0,
           "endOffset": 10,
           "position": 0
       },
       {
           "token": "sitron",
           "startOffset": 0,
           "endOffset": 6,
           "position": 0
       },
       {
           "token": "gelé",
           "startOffset": 6,
           "endOffset": 10,
           "position": 0
       },
       {
           "token": "gele",
           "startOffset": 6,
           "endOffset": 10,
           "position": 0
       }
   ]
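The token list above has the same shape as the response of the Analyze Text operation in the Azure AI Search REST API. If you have your own Azure AI Search service and index to experiment with, a request along the following lines should produce comparable output. This is a sketch under assumptions: the service name, index name, and key are placeholders, the API version is one I assume to be available on your service, and the helper function is hypothetical. The code only builds the request; it does not send it.

```python
import json
import urllib.request

def build_analyze_request(service, index_name, api_key, text,
                          analyzer="nb.microsoft", api_version="2023-11-01"):
    """Build (but do not send) an Analyze Text request for one piece of text."""
    url = (f"https://{service}.search.windows.net/indexes/{index_name}"
           f"/analyze?api-version={api_version}")
    body = json.dumps({"text": text, "analyzer": analyzer}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "api-key": api_key},
    )

# Placeholder values: replace with your own service, index, and admin key.
req = build_analyze_request("my-service", "products", "<admin-key>", "sitrongele")
print(req.get_method(), req.full_url)

# To actually send it (requires a real service and key):
# tokens = json.load(urllib.request.urlopen(req))["tokens"]
```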

SITRONGELE is a compound word in Norwegian, made up of sitron and gele, so the analyzer splits it into tokens that include both sitron and gele. That is why sitrongele is also returned when searching for “SITRON”.
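A small sketch of an inverted index shows why the decompounded tokens matter. The tokens for sitrongele are the ones from the analyzer output above; the tokens for “SITRON KG” are my assumption of what the analyzer would produce, and the whole index is a toy illustration rather than how the search service stores its data.

```python
from collections import defaultdict

# Tokens per product name. "SITRONGELE" uses the analyzer output shown above;
# the tokens for "SITRON KG" are an assumption for illustration.
analyzed_names = {
    "SITRONGELE": ["sitrongelé", "sitrongele", "sitron", "gelé", "gele"],
    "SITRON KG": ["sitron", "kg"],
}

def build_inverted_index(docs):
    """Map every token to the set of documents that contain it."""
    index = defaultdict(set)
    for doc_id, tokens in docs.items():
        for token in tokens:
            index[token].add(doc_id)
    return index

index = build_inverted_index(analyzed_names)

# A search for "sitron" looks up the token and finds both products.
matches = sorted(index["sitron"])
print(matches)  # ['SITRON KG', 'SITRONGELE']
```

Without the decompounding step, the only tokens for the first product would be the full word, and a search for “sitron” would not match it at all.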

Although both “SITRONGELE” and “SITRON KG” contain the token SITRON, the document for the SITRONGELE product contains more matched tokens than the document for “SITRON KG”:

From the above table, we can see that there are 6 matched tokens in the “SITRONGELE” document, but only 4 matched tokens in the “SITRON KG” document. The more matched tokens a document has, the higher it ranks. That is why the SITRONGELE product appears ahead of the SITRON KG product in the search results.

The BM25 algorithm adjusts its rankings based on how terms are distributed within the available data. It tends to perform well with longer queries because of how it handles term saturation and document length. Despite its effectiveness, BM25 has some limitations. It does not understand the semantic meaning of query terms or documents, which means it might not fully capture the search context. Additionally, BM25 treats all user queries equally, lacking a personalized approach to search results. Moreover, BM25 is only as good as the terms and data it is applied to, so its effectiveness depends on the nature of the available information and the queries.

Based on this understanding, we can see the importance of having the right searchable product name, search name, description, and attributes. What I would further like to have is a way to better control the ranking: the ability to boost certain products based on campaigns, pricing, and availability, and easier control per site/legal entity. In the future, I also hope to see search features that are self-improving, that learn what customers are searching for and improve the search results accordingly.

There are extensibility options that developers can use to adjust the search algorithms, and it is advisable to involve Microsoft if there are specific needs. There are also ISV solutions available in the marketplace, and one that has been recommended is unbxd.com. I have not yet talked to them or understood their offering, but it is a path to investigate if even more advanced search capabilities are required.

I would also like to thank the community for 2023. It has been a very productive year with lots of learning and new features. 2024 will be the year we see more F&O/eCommerce deliverables in the AI/Copilot area, and that will be super interesting.

Happy Dax’ing

 
