Query Parsing

Search > Configurations > Relevancy Settings > Query Parsing

Overview

Query Parsing is a section under the Platform's Relevancy Settings that interprets and processes the user’s input into a query the search system can understand and act upon. It breaks the query into recognizable elements, identifies the intent, and optimizes the query to return the most relevant search results.

It allows administrators to fine-tune the settings of the deployed AI models so that search queries are processed and matched with the most relevant results. These machine learning models categorize search queries, identify the intent behind each query, and match it with the right set of results.

This documentation will guide you through each section and its functionalities.

Query Classification

Assigns user queries to defined categories using machine learning, enhancing search accuracy and relevance.

For instance, a customer searches for "energy-efficient portable air conditioner." The Query Classifier assigns this query to both "Home Appliances" and "Energy-Efficient Electronics," ensuring that only the most relevant products, such as portable air conditioners with energy-saving features, are shown.

Enable Query Classifier

Toggle this switch to activate the Query Classifier. When enabled, incoming queries will be evaluated against the trained model to predict the best matching categories.

Train the Model

This section appears if the Query Classifier is enabled but no model has been trained yet.

Select the Model

Choose a pre-trained model from the list to categorize search queries for enhanced accuracy and relevance.

Model Settings

Here, you will find options to select and configure the model that classifies your queries.

Confidence Threshold

Adjust the confidence thresholds to control when to boost or filter search results for improved relevancy.

The section for Model Settings allows you to configure how the search algorithm should prioritize (boost) or downplay (filter) search results based on the categories that have been identified in the query classification process.

Filtering

Filtering settings are used to exclude certain categories from affecting the search results if they do not meet the confidence threshold. This is crucial for maintaining the relevance of the search results by ensuring that only categories identified with sufficient confidence (as per the set threshold) are considered.

Low: Categories identified with low confidence will still be considered in filtering the results.

High: Only categories identified with high confidence will be used to filter the results, which can lead to more precise but potentially narrower search outcomes.

Boosting

Boosting settings determine how strongly a category influences the ranking of search results. By setting a confidence level (such as Low, Medium, High), you instruct the system to give more weight to search results that belong to the categories identified with at least the chosen confidence level.

Low: Even categories identified with low confidence can boost relevant results. This can increase the diversity of the results but may also surface less relevant ones.

For the search query "wireless headphones for running," when the boosting setting is adjusted to a low confidence level, the search system broadens the scope of product recommendations. The results include not only headphones specifically marketed for running but also a wider range of wireless headphones and earbuds, which might not be sweat-resistant or designed for a secure fit.

High: Only categories identified with high confidence will boost results, which can increase the accuracy of the results but may reduce their diversity.

For the search query "wireless headphones for running," the model identifies "Sports Headphones" as a category with high confidence, particularly products tagged or described as "wireless" and suitable for "running." High-confidence boosting ensures that products fitting all these criteria (wireless and designed for sports or running) are prominently displayed in the search results.

Tip: Set the Boosting confidence level one level higher than the Filtering confidence level. For example, if Filtering is set to Medium, set Boosting to High.
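The interaction between the two thresholds can be sketched in a few lines of Python. This is only an illustration: the numeric cutoffs, the `categories_at_level` helper, and the classifier scores are assumptions made for the example, not values or functions exposed by the platform.

```python
# Hypothetical numeric cutoffs for each confidence level (not product values).
LEVEL_CUTOFF = {"low": 0.3, "medium": 0.6, "high": 0.8}


def categories_at_level(predictions, level):
    """Return the categories whose confidence meets the cutoff for `level`."""
    cutoff = LEVEL_CUTOFF[level]
    return [cat for cat, score in predictions.items() if score >= cutoff]


# Assumed classifier output for "wireless headphones for running".
predictions = {
    "Sports Headphones": 0.91,
    "Wireless Headphones": 0.65,
    "Earbuds": 0.35,
}

# Filtering at Medium, Boosting one level higher at High (as in the tip).
filter_categories = categories_at_level(predictions, "medium")
boost_categories = categories_at_level(predictions, "high")

print(filter_categories)  # ['Sports Headphones', 'Wireless Headphones']
print(boost_categories)   # ['Sports Headphones']
```

With Boosting stricter than Filtering, the boosted categories are always a subset of the filtering categories, so a boost never promotes a product that the filter would have excluded.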

Let us look at an e-Commerce Search Scenario: "Double Door Refrigerator"

Filtering with Low Confidence

When the filter is set at a low confidence level for the query, the search algorithm is instructed to include a broad range of refrigerators in the search results. This inclusivity means that not only double door models but also single door, French door, side-by-side, and other types of refrigerators are likely to appear in the initial search results, under the assumption that users might be interested in exploring various options, not strictly limited to double door models.

Boosting with High Confidence

Setting the boost to a high confidence level specifically targets the "double door refrigerator" category. In this setup, among the broad range of refrigerators shown due to the low filtering confidence, those that are categorically double door refrigerators—or have been tagged, reviewed, or otherwise identified as closely matching the "double door" criteria—are given a significant visibility boost. This means that while users will see a variety of refrigerator types due to the broad filter, double door models, which are the most relevant to the search query, are ranked higher. They are more likely to catch the user's attention early on in the search results.


This nuanced approach serves multiple purposes:

Broad Discovery: By including a wider array of refrigerators, the platform caters to users who might be in the exploration phase, unsure of the exact type of refrigerator they want, thereby increasing the chances of discovery and potential cross-sell opportunities.

Targeted Visibility: By boosting double door models specifically, the platform ensures that users who have a clear intent of finding a double door refrigerator can easily find these models without wading through less relevant options.

The boosting and filtering settings directly influence the user experience by controlling the relevancy of search results. They need to be configured carefully to balance between the precision and breadth of search outcomes.
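The "Double Door Refrigerator" scenario can be approximated with a small ranking sketch. The `Product` class, the `rank` function, the multiplicative boost factor, and the base scores are all hypothetical; the platform's actual scoring is not documented here.

```python
from dataclasses import dataclass


@dataclass
class Product:
    name: str
    category: str
    text_score: float  # assumed base relevance from keyword matching


def rank(products, filter_categories, boost_categories, boost=2.0):
    """Keep products in the filter set, then multiply the score of boosted ones."""
    kept = [p for p in products if p.category in filter_categories]
    return sorted(
        kept,
        key=lambda p: p.text_score * (boost if p.category in boost_categories else 1.0),
        reverse=True,
    )


catalog = [
    Product("Single door 190 L", "Single Door Refrigerator", 0.70),
    Product("Double door 260 L", "Double Door Refrigerator", 0.60),
    Product("Side-by-side 580 L", "Side-by-Side Refrigerator", 0.65),
    Product("Microwave oven 20 L", "Microwave Oven", 0.20),
]

# Low-confidence filtering admits every refrigerator category;
# high-confidence boosting targets only the double door category.
filter_categories = {
    "Single Door Refrigerator",
    "Double Door Refrigerator",
    "Side-by-Side Refrigerator",
}
boost_categories = {"Double Door Refrigerator"}

ranked = rank(catalog, filter_categories, boost_categories)
for product in ranked:
    print(product.name)
```

In this toy catalog the double door model moves to the top even though its base score is the lowest of the three refrigerators, while the other refrigerator types remain visible below it and the microwave oven is filtered out.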

Category Level Filtering

This feature allows you to enable or disable filtering at different hierarchical levels of category classification. If a query is classified into multiple categories across different levels, you can decide which levels should actively influence the filtering of search results.

First level: Usually the most general category level. Enabling filtering here means that only top-tier categories will influence the search results.

Second level and beyond: These are more specific category levels. Enabling or disabling these levels will fine-tune the granularity of the search results.

For instance, a user's search for "noise-cancelling headphones" might interact with various levels of filtering as follows:

First-level filtering on "Electronics": Confirms that all search results are within the electronics domain.

Second-level filtering on "Audio Equipment": Refines the search to include only audio-related products.

Third-level filtering on "Headphones": Filters the results further to show only headphones, and with boosting applied to "noise-cancelling" features, those products are given prominence over other types of headphones.


Impact on Recall & Precision

Enabling or disabling filtering at various category levels allows you to manage the specificity of your search results. If you enable filtering for all levels, only search results that match the query's classification across all levels will be shown, which can be very specific and might reduce the number of results. Conversely, disabling filtering at lower levels could increase the number of results but also include less relevant ones.

Returning to the example above, if a user searches for "noise-cancelling headphones":

  1. With first-level filtering only, the search will yield a broad range of electronics, from smartphones to headphones, including noise-cancelling headphones. Here, precision is lower (more varied results), but recall is higher (all noise-cancelling headphones in electronics are likely to be retrieved).
  2. As you enable second-level filtering, the search narrows down to audio equipment. Precision increases (more relevant results), but recall may decrease (other relevant electronic devices with noise-cancelling features are excluded).
  3. With third-level filtering enabled, only headphones are shown, which maximizes precision (highly relevant results). However, recall is at its lowest as this excludes other products that might also suit the user’s needs, like earbuds or headsets that are not categorized strictly as "Headphones" but still have noise-cancelling features.

So, the more specific the enabled category level, the higher the precision and the lower the recall. The key is finding the right balance to ensure users find what they need without missing out on potentially relevant products.
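The trade-off can be made concrete with a toy calculation. The catalog below, its category paths, and the relevance labels are invented for illustration; precision is the share of returned items that are relevant, and recall is the share of relevant items that are returned.

```python
# Each item: (name, category path, relevant to "noise-cancelling headphones"?)
# The catalog and the relevance labels are assumptions made for this example.
catalog = [
    ("NC over-ear headphones", ("Electronics", "Audio Equipment", "Headphones"), True),
    ("NC on-ear headphones", ("Electronics", "Audio Equipment", "Headphones"), True),
    ("Wired studio headphones", ("Electronics", "Audio Equipment", "Headphones"), False),
    ("NC earbuds", ("Electronics", "Audio Equipment", "Earbuds"), True),
    ("Bluetooth speaker", ("Electronics", "Audio Equipment", "Speakers"), False),
    ("Soundbar", ("Electronics", "Audio Equipment", "Speakers"), False),
    ("Smartphone", ("Electronics", "Mobile Phones", "Smartphones"), False),
    ("Laptop", ("Electronics", "Computers", "Laptops"), False),
]

query_path = ("Electronics", "Audio Equipment", "Headphones")


def precision_recall(depth):
    """Filter on the first `depth` category levels and score the result set."""
    results = [item for item in catalog if item[1][:depth] == query_path[:depth]]
    relevant_total = sum(1 for item in catalog if item[2])
    relevant_found = sum(1 for item in results if item[2])
    return relevant_found / len(results), relevant_found / relevant_total


for depth in (1, 2, 3):
    precision, recall = precision_recall(depth)
    print(f"levels={depth} precision={precision:.2f} recall={recall:.2f}")
```

In this catalog, precision rises as each additional level is enabled, and recall drops once third-level filtering excludes the noise-cancelling earbuds.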

Category Hierarchy

This visual representation helps users understand the current structure of the category hierarchy at a glance. It's an interactive tree where each node represents a category level, allowing users to navigate through different categories and subcategories to see their relationship and organization.

If the hierarchy is not set up, you will be directed to the Category Hierarchy Configurations page.

Attribute Detection

Attribute Identification allows the model to recognize and act upon specific attributes within user queries. By identifying these attributes, the model can better understand the context and content of each query, improving the accuracy and relevance of search results.

Description: Detects relevant attributes in user queries to refine search results for more precise and relevant outcomes.

Enable/Disable Option: Toggle this to enable the system to identify and utilize attributes in the query. When enabled, the system will look for predefined attributes within the user queries to help in the classification and relevancy of search results.

Train the model link: If the model is not yet trained, this option appears as a link. Clicking it takes you to the Models section to initiate training. Once the model is trained, additional settings are displayed to fine-tune the model's behavior.

Model Settings:

Fallback Priority:

This setting allows you to assign a priority level to attributes, influencing their impact on search results.

You can set the priority (High, Medium, Low) for each attribute. High priority attributes have a significant impact on search results, while Low priority ones are considered optional. For example, if the "Color" attribute has a High fallback priority, queries mentioning specific colors will heavily influence the search results.

Mandatory: Mark an attribute as mandatory if it must be present in the query for a result to be returned.

Filter: Choose to filter search results based on the presence or absence of this attribute.

Boost: Opt to boost results that contain this attribute, making them rank higher in the search results.

If "Size" is marked for both filtering and boosting, queries specifying a size will not only prioritize products of that size but will also exclude products of non-specified sizes.
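A minimal sketch of how the Filter and Boost options could combine for detected attributes is shown below. The settings dictionary, the `apply_attributes` function, and the multiplicative boost are hypothetical and only mirror the behaviour described above; the Mandatory option is left out of the sketch.

```python
# Hypothetical per-attribute settings mirroring the Filter and Boost options.
attribute_settings = {
    "size": {"filter": True, "boost": True},
    "color": {"filter": False, "boost": True},
}


def apply_attributes(products, detected, settings, boost=1.5):
    """Exclude products failing a filtered attribute; boost matching ones."""
    results = []
    for product in products:
        score = 1.0
        excluded = False
        for attr, value in detected.items():
            matches = product.get(attr) == value
            options = settings.get(attr, {})
            if options.get("filter") and not matches:
                excluded = True  # filtered attribute does not match: drop product
                break
            if options.get("boost") and matches:
                score *= boost  # boosted attribute matches: rank higher
        if not excluded:
            results.append((product["name"], score))
    return sorted(results, key=lambda r: r[1], reverse=True)


products = [
    {"name": "Red shirt M", "size": "M", "color": "red"},
    {"name": "Blue shirt M", "size": "M", "color": "blue"},
    {"name": "Red shirt L", "size": "L", "color": "red"},
]

# Assumed output of attribute detection for the query "red shirt size M".
detected = {"size": "M", "color": "red"}

ranked = apply_attributes(products, detected, attribute_settings)
print(ranked)  # [('Red shirt M', 2.25), ('Blue shirt M', 1.5)]
```

Because "size" is both filtered and boosted, the size L shirt is excluded, while "color" only boosts, so the blue shirt stays in the results at a lower rank.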

High Recall:

Enhances search inclusivity by expanding result sets, ensuring comprehensive retrieval of products matching user queries.

Minimum Product Count: Define the minimum number of products that should be returned in the search results. If the actual number is below this threshold, the system may take alternative actions like expanding the search query.

Note: By default, the minimum product count is set to 0, which means no minimum threshold is applied unless changed.
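One way a minimum product count could drive a fallback is sketched below. This is an assumption about how "expanding the search query" might work, not a description of the platform's behaviour: the sketch relaxes attribute filters one at a time, starting with the lowest fallback priority, until enough products match. The function and variable names are hypothetical.

```python
PRIORITY_ORDER = {"low": 0, "medium": 1, "high": 2}


def search_with_min_count(products, filters, priorities, min_count=0):
    """Drop the lowest-priority filter until at least `min_count` products match."""
    active = dict(filters)
    while True:
        results = [
            p for p in products
            if all(p.get(attr) == value for attr, value in active.items())
        ]
        if len(results) >= min_count or not active:
            return results, active
        # Relax the filter whose attribute has the lowest fallback priority.
        weakest = min(active, key=lambda attr: PRIORITY_ORDER[priorities[attr]])
        del active[weakest]


products = [
    {"name": "Red shirt M", "size": "M", "color": "red"},
    {"name": "Blue shirt M", "size": "M", "color": "blue"},
    {"name": "Red shirt L", "size": "L", "color": "red"},
]
filters = {"size": "M", "color": "red"}
priorities = {"size": "high", "color": "low"}

strict, _ = search_with_min_count(products, filters, priorities, min_count=0)
relaxed, remaining = search_with_min_count(products, filters, priorities, min_count=2)

print([p["name"] for p in strict])   # ['Red shirt M']
print([p["name"] for p in relaxed])  # ['Red shirt M', 'Blue shirt M']
print(remaining)                     # {'size': 'M'}
```

With the default of 0, no relaxation happens and the strict result set is returned as is.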

The Attribute Identification settings are a powerful tool for fine-tuning search relevancy. By understanding and applying these settings, administrators can significantly enhance the effectiveness of their platform, ensuring users find exactly what they need.

Pattern Extraction

This section is dedicated to managing Regular Expression (Regex) configurations that enhance the search functionality of the platform. By setting up regex patterns, you can create more dynamic and precise search criteria, allowing users to find exactly what they're looking for based on specific patterns or terms.

This section configures how the engine interprets and processes complex queries by utilizing Regular Expressions (Regex). Regex is a powerful tool for pattern matching, allowing you to define specific search patterns that can match various string sequences within your data. This is especially useful in e-commerce platforms or databases where users might search for products or information using a wide range of terms and formats.

How it Works:

The model interprets user queries by comparing them against predefined Regex patterns. If a user's search terms match a particular pattern, the search engine understands how to process that query in the context of the platform's data.

These configurations can be tailored to identify specific patterns, ranges, or sorting commands within text data.

Patterns: This screen lets you create and define new search patterns that the system will recognize when processing queries.

Adding a New Configuration:

Name: Assign a unique and descriptive name to your new regex configuration for easy identification.

Type: Select the type of pattern you wish to create:

  • Term: For simple patterns or keywords.
  • Range: For numerical ranges or sequences.
  • Sort: For ordering sequences based on predefined criteria.

Pattern: Choose from a list of pre-defined patterns or create a custom pattern that the system will use to recognize and act upon. Each pattern type serves a different function:

  • Term Patterns: Ideal for matching exact words, phrases, or character combinations.
  • Range Patterns: Useful for defining search criteria within a certain numerical scope, like prices or measurements.
  • Sort Patterns: Employed to order search results according to a particular attribute, such as price low to high.

Detailed Pattern Selection:

For each type, you will be presented with a dropdown menu to select a specific pattern that aligns with your needs. For example:

  • Capacity Pattern: Choose a pattern that recognizes numerical values followed by units of measurement, which can be helpful for product specifications like weight or volume.
  • Price Range: Select a pattern that interprets price queries such as "price below $100" or "price between $50 to $150".
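To give a feel for what a Price Range pattern does, here is a small Python sketch using the standard `re` module. The two expressions and the `parse_price_range` helper are illustrative assumptions; the platform's pre-defined patterns may look different.

```python
import re

# Illustrative patterns only; not the platform's built-in definitions.
NUMBER = r"\$?(\d+(?:\.\d+)?)"
PRICE_BETWEEN = re.compile(
    rf"price\s+between\s+{NUMBER}\s+(?:to|and)\s+{NUMBER}", re.IGNORECASE
)
PRICE_BELOW = re.compile(rf"price\s+(?:below|under)\s+{NUMBER}", re.IGNORECASE)


def parse_price_range(query):
    """Return (min_price, max_price), with None for an open bound.

    Returns None when the query contains no recognizable price expression.
    """
    match = PRICE_BETWEEN.search(query)
    if match:
        return float(match.group(1)), float(match.group(2))
    match = PRICE_BELOW.search(query)
    if match:
        return None, float(match.group(1))
    return None


print(parse_price_range("running shoes price below $100"))    # (None, 100.0)
print(parse_price_range("tv price between $50 to $150"))      # (50.0, 150.0)
print(parse_price_range("double door refrigerator"))          # None
```

The extracted bounds can then be applied as a numeric filter on the price field instead of being matched as plain text.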

Finalizing Configurations:

After selecting the pattern and its type, review your configurations to ensure they match the desired search criteria. Once confirmed, save the configuration for it to take effect within the system.

Note: It is essential to have an understanding of regular expressions when creating or modifying these configurations. They are powerful tools for pattern matching and can significantly enhance the functionality of search and data processing within the application.

Configuration List:

The main table lists all the active regex configurations. Each entry defines a regex pattern that the engine uses to interpret complex queries.

Edit Regex Config:

Clicking on a configuration opens the editing pane where you can modify the details of the regex pattern.

Components of Edit Regex Config:

  • Name: The identifier for the regex pattern being used.
  • Type: The category of the pattern (Term, Range, or Sort).
  • Pattern: The actual regex pattern that will be matched against search queries.
  • Examples: Provides an example or examples of the pattern to illustrate its use.

Fields:

  • Literals with Variations: Set different literals that the pattern may include, ensuring variations of a term are recognized.
  • Literal Variations Group: This section is where you add the units of measurement. You can specify different variations of the units to ensure the system recognizes them regardless of how they are inputted. For instance:
    • "square centimeter" could also be "sq.cm" or "cm^2".
    • "square millimeter" could also be "sq.mm" or "mm^2".
    • You can add more variations by typing in the field and clicking the plus icon.
  • Separator between 3D size: This defines the character or characters that separate the dimensions in a series. For example, for 3D object dimensions, you might use 'x' to separate height, width, and depth (e.g., 5 ft x 4 ft x 3 ft).

Test Configuration: In this area, you can test the regex pattern by entering queries to see if the configuration correctly identifies the dimensions. Inputting a test query, such as "3.5 ft x 2 ft x 10 cm table," allows you to verify that the pattern is effectively recognizing dimension expressions.

After setting up the variations and separators, use the "Test" button to run your inputted queries through the regex pattern to confirm that it's working as expected. If the test is successful, you can save your configuration, which will then be applied to the system’s text processing.
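The idea behind unit variations and separators can be sketched as follows. The variation lists, the separator list, and the function names are hypothetical; the sketch only shows how such fields could be compiled into a single expression that recognizes a three-part dimension and normalizes its units.

```python
import re

# Hypothetical unit variations, grouped under a canonical unit name.
UNIT_VARIATIONS = {
    "ft": ["ft", "feet", "foot"],
    "cm": ["cm", "centimeter", "centimeters"],
}
SEPARATORS = ["x", "by"]

# Map every variation back to its canonical unit.
CANONICAL = {v: canon for canon, vs in UNIT_VARIATIONS.items() for v in vs}


def build_dimension_pattern(unit_variations, separators):
    """Compile a regex matching '<number> <unit> <sep> ...' repeated three times."""
    # Longest alternatives first so "centimeters" is tried before "cm".
    units = sorted(
        (v for vs in unit_variations.values() for v in vs), key=len, reverse=True
    )
    unit = "(?:" + "|".join(re.escape(u) for u in units) + ")"
    sep = "(?:" + "|".join(re.escape(s) for s in separators) + ")"
    dim = rf"(\d+(?:\.\d+)?)\s*({unit})"
    return re.compile(rf"{dim}\s*{sep}\s*{dim}\s*{sep}\s*{dim}", re.IGNORECASE)


PATTERN = build_dimension_pattern(UNIT_VARIATIONS, SEPARATORS)


def extract_dimensions(query):
    """Return three (value, canonical_unit) pairs, or None if nothing matches."""
    match = PATTERN.search(query)
    if not match:
        return None
    groups = match.groups()
    return [(float(groups[i]), CANONICAL[groups[i + 1].lower()]) for i in (0, 2, 4)]


print(extract_dimensions("3.5 ft x 2 ft x 10 cm table"))
# [(3.5, 'ft'), (2.0, 'ft'), (10.0, 'cm')]
print(extract_dimensions("5 feet by 4 feet by 3 feet shelf"))
# [(5.0, 'ft'), (4.0, 'ft'), (3.0, 'ft')]
```

Running a few such queries by hand is the same kind of check the "Test" button performs before a configuration is saved.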

Save Config: Once you've confirmed that the regex pattern works correctly with your test queries, click this button to save the configuration.

Settings: This screen is used to configure the previously defined patterns. You can also set Fallback Priority levels and determine whether the fields should act as filters, be mandatory, or receive a boost in the search algorithm.