Question: We create vectors using embeddings to store products in Qdrant.
We get the products from the webshop's API, create the vectors, and then store the vectors with their payload on the Qdrant server.
Which JSON keys from the webshop API are converted to vectors? Only the description?
@fszontagh I'm starting work on this question now! I'll search through the codebase to find where product vectors are created and stored in Qdrant, and identify which JSON fields from the webshop API are being converted to vectors.
@fszontagh Great question! Here's the detailed answer about which JSON keys are converted to vectors:
## What gets converted to vectors?
The vector embeddings are **NOT** created from just the description. The system creates a **comprehensive text representation** that includes multiple fields from the product data. This text is then converted into a vector using OpenAI's `text-embedding-3-large` model (3072 dimensions).
## Product Fields Included in Vector Embeddings
Based on the `createProductText()` function in `supabase/functions/_shared/qdrant-client.ts` (lines 511-621), here are **ALL** the JSON keys that get converted into the text representation before vectorization:
### Core Product Information
1. **Title/Name** - `product.title` or `product.name`
2. **Description** - `product.description` (HTML cleaned, entities decoded)
3. **Short Description** - `product.short_description` (WooCommerce, only if different from main description)
4. **SKU** - `product.sku` (formatted as "SKU: {value}")
### Categorization & Classification
5. **Vendor/Brand** - `product.vendor` (formatted as "Vendor: {value}")
6. **Product Type** - `product.product_type` (formatted as "Type: {value}")
7. **Categories** - `product.categories` array (formatted as "Categories: cat1, cat2, ...")
8. **Tags** - `product.tags` array (formatted as "Tags: tag1, tag2, ...")
### Variants & Attributes
9. **Variants** - `product.variants` array including:
- `variant.title` (if not "Default Title")
- `variant.option1`, `variant.option2`, `variant.option3`
- Formatted as "Variants: option1, option2, ..."
10. **Attributes** - `product.attributes` array (WooCommerce):
- `attribute.name` and `attribute.options`
- Formatted as "Name: options | Name2: options2"
### SEO & Metadata
11. **Meta Description** - `product.meta_description` (ShopRenter/SEO, only if different from main description)
### Context Information
12. **Price** - `product.price` (formatted as "Price: {value}")
13. **Stock Status** - `product.stock_status` or `product.status` (formatted as "Stock: {status}" or "Status: {status}")
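To make the field list concrete, here is a minimal sketch of how such a text could be assembled. It is not the actual `createProductText()` code. The following are all assumptions for illustration:

- the `Product` interface,
- the simplified `stripHtml()` helper (a stand-in for `cleanHtmlContent()`),
- treating `categories` and `tags` as plain string arrays,
- de-duplicating variant values.

```typescript
// Hypothetical sketch of a createProductText()-style builder; not the project's code.
interface Variant { title?: string; option1?: string; option2?: string; option3?: string }
interface Attribute { name: string; options: string[] }
interface Product {
  title?: string;
  name?: string;
  description?: string;
  short_description?: string;
  sku?: string;
  vendor?: string;
  product_type?: string;
  categories?: string[]; // assumption: already flattened to category names
  tags?: string[];       // assumption: already flattened to tag names
  variants?: Variant[];
  attributes?: Attribute[];
  meta_description?: string;
  price?: string | number;
  stock_status?: string;
  status?: string;
}

// Simplified stand-in for cleanHtmlContent(): strips tags, decodes two common
// entities, and collapses spaces/tabs while keeping newlines.
function stripHtml(html: string): string {
  return html
    .replace(/<[^>]*>/g, ' ')
    .replace(/&nbsp;/g, ' ')
    .replace(/&amp;/g, '&')
    .replace(/[ \t]+/g, ' ')
    .trim();
}

function buildProductText(p: Product): string {
  const parts: string[] = [];
  const title = p.title ?? p.name;
  if (title) parts.push(title);

  const description = p.description ? stripHtml(p.description) : '';
  if (description) parts.push(description);
  // Short and meta descriptions are only added when they differ from the main description.
  if (p.short_description) {
    const short = stripHtml(p.short_description);
    if (short && short !== description) parts.push(short);
  }

  if (p.sku) parts.push(`SKU: ${p.sku}`);
  if (p.vendor) parts.push(`Vendor: ${p.vendor}`);
  if (p.product_type) parts.push(`Type: ${p.product_type}`);
  if (p.categories?.length) parts.push(`Categories: ${p.categories.join(', ')}`);
  if (p.tags?.length) parts.push(`Tags: ${p.tags.join(', ')}`);

  // Variant titles (except "Default Title") and option1..3, de-duplicated.
  const variantValues = (p.variants ?? [])
    .flatMap((v) => [v.title !== 'Default Title' ? v.title : undefined, v.option1, v.option2, v.option3])
    .filter((x): x is string => Boolean(x));
  const uniqueVariants = [...new Set(variantValues)];
  if (uniqueVariants.length) parts.push(`Variants: ${uniqueVariants.join(', ')}`);

  // WooCommerce-style attributes: "Name: options | Name2: options2".
  const attrs = (p.attributes ?? []).map((a) => `${a.name}: ${a.options.join(', ')}`);
  if (attrs.length) parts.push(attrs.join(' | '));

  if (p.meta_description) {
    const meta = stripHtml(p.meta_description);
    if (meta && meta !== description) parts.push(meta);
  }
  if (p.price !== undefined && p.price !== '') parts.push(`Price: ${p.price}`);
  if (p.stock_status) parts.push(`Stock: ${p.stock_status}`);
  else if (p.status) parts.push(`Status: ${p.status}`);

  return parts.join(' | ');
}

console.log(buildProductText({
  name: 'Soft clean 4x55 db popsitörlő',
  description: '<p>Lorem ipsum</p>',
  price: 1290,
}));
// → "Soft clean 4x55 db popsitörlő | Lorem ipsum | Price: 1290"
```

Even with a lorem-ipsum description, the product name sits at the front of the embedded text. So, at least in this sketch, the name always contributes to the vector.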
## How It Works
The system:
1. **Collects all these fields** from the webshop API response
2. **Cleans HTML content** using `cleanHtmlContent()` (handles HTML entities, removes tags, preserves newlines)
3. **Joins all parts** with ` | ` separator to create a comprehensive text representation
4. **Generates the embedding** via the OpenRouter API with the `openai/text-embedding-3-large` model
5. **Stores the vector** (3072-dimensional) in Qdrant along with the original payload data
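Step 5 ends with a Qdrant upsert. Below is a hedged sketch of building and checking the request body for embedded products. It assumes the standard Qdrant REST endpoint `PUT /collections/{collection}/points` and an `api-key` header. The helper names (`buildUpsertBody`, `upsertPoints`) are hypothetical, and the embedding call itself is left out.

```typescript
// Hypothetical sketch of storing embedded products via Qdrant's REST API.
const EMBEDDING_DIM = 3072; // output size of text-embedding-3-large

interface QdrantPoint {
  id: string | number; // Qdrant accepts unsigned integers or UUID strings as point IDs
  vector: number[];
  payload: Record<string, unknown>; // e.g. store_id, product_id, name, price, ...
}

// Rejects vectors of the wrong size before anything is sent to Qdrant.
function buildUpsertBody(points: QdrantPoint[]): { points: QdrantPoint[] } {
  for (const p of points) {
    if (p.vector.length !== EMBEDDING_DIM) {
      throw new Error(`Point ${p.id}: expected ${EMBEDDING_DIM} dims, got ${p.vector.length}`);
    }
  }
  return { points };
}

async function upsertPoints(
  qdrantUrl: string,
  apiKey: string,
  collection: string,
  points: QdrantPoint[],
): Promise<void> {
  // wait=true makes Qdrant respond only after the points are applied.
  const res = await fetch(
    `${qdrantUrl}/collections/${encodeURIComponent(collection)}/points?wait=true`,
    {
      method: 'PUT',
      headers: { 'Content-Type': 'application/json', 'api-key': apiKey },
      body: JSON.stringify(buildUpsertBody(points)),
    },
  );
  if (!res.ok) throw new Error(`Qdrant upsert failed: ${res.status} ${await res.text()}`);
}
```

The dimension check is cheap and catches a mismatched embedding model before Qdrant rejects the write.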
## Example Text Representation
For a typical product, the text might look like:
```
Blue Running Shoes | Experience ultimate comfort with our premium running shoes... | SKU: SHOE-BLUE-001 | Vendor: Nike | Type: Footwear | Categories: Shoes, Sports, Running | Tags: athletic, comfortable, breathable | Variants: Size 8, Size 9, Size 10, Color Blue, Color Red | Price: 89.99 | Stock: in stock
```
This means semantic search can match products on **any** of these attributes, not just the description!
## Code Reference
You can see the full implementation in:
- **Text creation**: `supabase/functions/_shared/qdrant-client.ts` - `createProductText()` function (lines 511-621)
- **Usage example**: `supabase/functions/woocommerce-sync/index.ts` - where it creates product texts before generating embeddings
The same pattern applies to orders (`createOrderText()`) and customers (`createCustomerText()`), which also include comprehensive information beyond just basic fields.
Take the ShopRenter MCP server, for example. When an LLM searches for "popsítörlő", no results are found.
"Popsitörlő" exists in Qdrant.
This is the json representation of the tool call parameters:
```
{
"shop_id": "d164a10f-580c-4500-9850-fed3079dd6af",
"name": "popsitörlő",
"limit": 5
}
```
This is the response:
```
[
{
"type": "text",
"text": "{\"count\":0,\"limit\":5,\"source\":\"qdrant\",\"products\":[]}"
}
]
```
This product exists in Qdrant (`name: Soft clean 4x55 db popsitörlő`), but for testing purposes its description is just lorem ipsum.
So, in theory, the MCP server should find the product by its name.
@fszontagh I'm investigating the search issue now! Let me check the MCP server implementation to understand why searching for "popsítörlő" isn't finding the product "Soft clean 4x55 db popsitörlő" even though it exists in Qdrant. I'll look at:
1. The ShopRenter MCP server search implementation
2. How the search query is being processed
3. The Qdrant search/query mechanism
4. Any encoding or normalization issues with Hungarian characters
@fszontagh I've found the issue! Here's what's happening:
## Root Cause
When searching for products by **name** (like "popsítörlő"), the MCP server is using a **hybrid approach** that has a significant limitation:
### How Product Search Currently Works
Looking at `supabase/functions/_shared/mcp-qdrant-helpers.ts` (lines 80-143):
1. **Step 1**: Call `scrollPoints()` to retrieve products from Qdrant
- Only filters by indexed fields: `store_id`, `sku`, `status`, `price`
- **The `name` field is NOT indexed** in Qdrant (see the collection-creation code in `qdrant-client.ts`)
2. **Step 2**: Apply client-side filtering for `name`:
```typescript
if (filters?.name) {
const nameLower = filters.name.toLowerCase();
products = products.filter((p: any) =>
(p.name || p.title || '').toLowerCase().includes(nameLower)
);
}
```
3. **Step 3**: Limit results to the requested limit (default: 10, max: 20)
### The Problem
On line 123, `scrollPoints()` is called with the **same limit as the final result** (in your case, 5 products):
```typescript
const result = await scrollPoints(collectionName, qdrantFilter, limit);
```
So when you search for "popsítörlő" with `limit: 5`, the code:
1. Retrieves only **5 products** from Qdrant (matching store_id filter)
2. Then tries to find "popsítörlő" within those 5 products
3. If none of those 5 products has "popsítörlő" in its name, **it returns 0 results**
This is why you're getting:
```json
{"count":0,"limit":5,"source":"qdrant","products":[]}
```
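This failure mode can be reproduced without Qdrant. In the hypothetical in-memory sketch below, `scrollFirst()` stands in for `scrollPoints()` and returns the first `limit` points. The catalog is made up, with 20 unrelated products ahead of the target.

```typescript
// Hypothetical reproduction of "limit before filter"; not the project's code.
interface ProductPayload { name: string }

// Stand-in for scrollPoints(): returns only the first `limit` points.
function scrollFirst(all: ProductPayload[], limit: number): ProductPayload[] {
  return all.slice(0, limit);
}

// Mirrors the current flow: scroll, then filter by name client-side, then cut to `limit`.
function searchByName(
  all: ProductPayload[],
  query: string,
  limit: number,
  scrollLimit: number,
): ProductPayload[] {
  const q = query.toLowerCase();
  return scrollFirst(all, scrollLimit)
    .filter((p) => p.name.toLowerCase().includes(q))
    .slice(0, limit);
}

const catalog: ProductPayload[] = [
  ...Array.from({ length: 20 }, (_, i) => ({ name: `Product ${i}` })),
  { name: 'Soft clean 4x55 db popsitörlő' },
];

console.log(searchByName(catalog, 'popsitörlő', 5, 5).length);    // → 0 (current behaviour)
console.log(searchByName(catalog, 'popsitörlő', 5, 1000).length); // → 1
```

Separately, `includes()` is accent-sensitive. A query spelled "popsítörlő" (with `í`, as typed in the original report) would not match the stored "popsitörlő" (with `i`), even if that product were among the scrolled points.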
## Why This Happens
The product name is stored in the **payload** of each Qdrant point, but it's NOT an **indexed field**. Only these fields are indexed:
- `store_id` (keyword)
- `product_id` (keyword)
- `platform` (keyword)
- `status` (keyword)
- `price` (float)
- `sku` (keyword)
The product name is **embedded in the vector** (used for semantic search), but not searchable as a keyword filter.
## Solutions
There are several ways to fix this:
### Option 1: Use Semantic Vector Search (Recommended)
Instead of using `scrollPoints` + client-side filtering, use **semantic vector search** with `searchPoints()`:
1. Generate an embedding for the search query "popsítörlő"
2. Use `searchPoints()` to find semantically similar products
3. This should find "Soft clean 4x55 db popsitörlő" because the name is in the embedded text
**Pros**: More accurate, finds variations and related products, uses Qdrant's full power
**Cons**: Requires embedding generation (API call), slightly slower
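Here is a minimal sketch of Option 1. It assumes an injected `embed()` function (a hypothetical stand-in for the project's embedding call) and Qdrant's `POST /collections/{collection}/points/search` endpoint. The `store_id` filter is kept so results stay scoped to one shop. The names `semanticProductSearch` and `qdrantSearch` are illustrative only.

```typescript
// Hypothetical sketch of semantic product search; not the project's implementation.
interface SearchBody {
  vector: number[];
  limit: number;
  with_payload: boolean;
  filter: { must: Array<{ key: string; match: { value: string } }> };
}
interface ScoredPoint { id: string | number; score: number; payload?: Record<string, unknown> }

type Embed = (text: string) => Promise<number[]>;
type Search = (collection: string, body: SearchBody) => Promise<ScoredPoint[]>;

async function semanticProductSearch(
  query: string,
  storeId: string,
  limit: number,
  collection: string,
  embed: Embed,
  search: Search,
): Promise<ScoredPoint[]> {
  // Embed the query with the same model used for the product texts.
  const vector = await embed(query);
  return search(collection, {
    vector,
    limit: Math.min(Math.max(limit, 1), 20), // same max (20) as the current tool
    with_payload: true,
    filter: { must: [{ key: 'store_id', match: { value: storeId } }] }, // one shop only
  });
}

// Concrete Search backed by Qdrant's REST search endpoint.
function qdrantSearch(qdrantUrl: string, apiKey: string): Search {
  return async (collection, body) => {
    const res = await fetch(
      `${qdrantUrl}/collections/${encodeURIComponent(collection)}/points/search`,
      {
        method: 'POST',
        headers: { 'Content-Type': 'application/json', 'api-key': apiKey },
        body: JSON.stringify(body),
      },
    );
    if (!res.ok) throw new Error(`Qdrant search failed: ${res.status}`);
    const json = await res.json();
    return json.result as ScoredPoint[];
  };
}
```

Injecting `embed` and `search` keeps the function testable without network access. The scores could later be thresholded if weak matches turn out to be noisy.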
### Option 2: Increase Scroll Limit
When filtering by name, retrieve **many more products** before filtering:
```typescript
// Instead of:
const result = await scrollPoints(collectionName, qdrantFilter, limit);
// Do:
const scrollLimit = filters?.name ? 1000 : limit; // Fetch more candidates when filtering by name
const result = await scrollPoints(collectionName, qdrantFilter, scrollLimit);
// ...then apply the client-side name filter and slice the matches to `limit` as before
```
**Pros**: Simple fix, no API calls needed
**Cons**: Inefficient, loads many products into memory, won't scale well
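If Option 2 is used as a stopgap, a fixed cap of 1000 can still miss products in larger catalogs. A hedged alternative is to page through the scroll results until enough matches are found. `ScrollPage` mirrors the shape of Qdrant's scroll response (`points`, `next_page_offset`). It is unknown whether the project's `scrollPoints()` exposes an offset, so `fetchPage` is a hypothetical wrapper.

```typescript
// Hypothetical paginated variant of Option 2.
interface ScrollPage<P> { points: P[]; next_page_offset?: string | number | null }
type FetchPage<P> = (offset: string | number | null | undefined, pageSize: number) => Promise<ScrollPage<P>>;

async function scrollUntilMatches<P>(
  fetchPage: FetchPage<P>,
  matches: (p: P) => boolean,
  limit: number,
  pageSize = 256,
  maxPoints = 10_000, // safety cap on how many points one request may scan
): Promise<P[]> {
  const found: P[] = [];
  let offset: string | number | null | undefined = null;
  let scanned = 0;
  do {
    const page = await fetchPage(offset, pageSize);
    scanned += page.points.length;
    for (const p of page.points) {
      if (matches(p)) {
        found.push(p);
        if (found.length >= limit) return found; // stop as soon as we have enough
      }
    }
    offset = page.next_page_offset; // null/undefined means no more pages
  } while (offset != null && scanned < maxPoints);
  return found;
}
```

This still scans payloads client-side, so it shares Option 2's scaling limits. It only removes the arbitrary cutoff.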
### Option 3: Index the Name Field
Add `name` as an indexed field in Qdrant:
```typescript
await createCollection(productsCollection, [
{ field: 'store_id', type: 'keyword' },
{ field: 'product_id', type: 'keyword' },
{ field: 'platform', type: 'keyword' },
{ field: 'status', type: 'keyword' },
{ field: 'price', type: 'float' },
{ field: 'sku', type: 'keyword' },
{ field: 'name', type: 'text' }, // <-- Add this
]);
```
Then use Qdrant's text matching:
```typescript
if (filters?.name) {
qdrantFilter.must.push({
key: 'name',
match: { text: filters.name } // Full-text search
});
}
```
**Pros**: Efficient, scalable, uses Qdrant's indexing
**Cons**: Requires re-creating collections (data migration), may not handle Hungarian characters well
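For reference, Qdrant's REST API can add a payload index to an existing collection (`PUT /collections/{collection}/index`). A full re-creation may therefore not be required, depending on how the project's `createCollection()` helper is used. The sketch below builds a full-text index request and the matching filter. The tokenizer choice and helper names are assumptions. The settings shown only lowercase tokens, so an accent variant such as `í` vs `i` would still count as a different word.

```typescript
// Hypothetical sketch for Option 3 using Qdrant's REST API directly.
interface TextIndexRequest {
  field_name: string;
  field_schema: {
    type: 'text';
    tokenizer: 'word' | 'whitespace' | 'prefix' | 'multilingual';
    lowercase: boolean;
  };
}

function nameIndexRequest(): TextIndexRequest {
  return {
    field_name: 'name',
    field_schema: { type: 'text', tokenizer: 'word', lowercase: true },
  };
}

// Adds the full-text index to an already existing collection.
async function createNameIndex(qdrantUrl: string, apiKey: string, collection: string): Promise<void> {
  const res = await fetch(
    `${qdrantUrl}/collections/${encodeURIComponent(collection)}/index?wait=true`,
    {
      method: 'PUT',
      headers: { 'Content-Type': 'application/json', 'api-key': apiKey },
      body: JSON.stringify(nameIndexRequest()),
    },
  );
  if (!res.ok) throw new Error(`Index creation failed: ${res.status}`);
}

// Filter that keeps the per-shop scope and adds a full-text condition on the name.
function nameFilter(storeId: string, name: string) {
  return {
    must: [
      { key: 'store_id', match: { value: storeId } },
      { key: 'name', match: { text: name } },
    ],
  };
}
```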
## My Recommendation
For the **best user experience**, I recommend **Option 1: Semantic Vector Search**.
When a user searches by name, the system should:
1. Generate an embedding for the search query
2. Use vector similarity search to find matching products
3. This should find "popsitörlő" even with typos, inflected forms, or related terms, since matching is similarity-based rather than exact
This is actually the **intended use case** for Qdrant - semantic search is more powerful than keyword matching!
Would you like me to implement Option 1 (semantic search) or Option 2 (quick fix with increased limit)?
Option 1, but create a new issue for it and assign it to the claude user.
@fszontagh Done! I've created issue #94 with the following details:
**Title**: feat: Implement semantic vector search for product name queries in MCP server
**Assignment**: Assigned to claude user ✅
**Label**: enhancement ✅
**Summary**: The new issue contains:
- Complete problem description (why "popsítörlő" search fails)
- Detailed solution approach (semantic vector search)
- Step-by-step implementation guide
- List of benefits
- Files to modify
- Acceptance criteria for testing
The issue is ready for implementation and will solve the search problem you encountered!
You can view it here: Issue #94