In the Supabase edge functions that synchronize products with Qdrant, the descriptions contain HTML special characters and HTML tags.
We have to remove this formatting; only plain text may be stored in Qdrant.
Remove duplicated newlines, replace HTML line breaks (<br/>) with a single newline, etc.
We have to keep the paragraphs but remove all text formatting, to keep the descriptions clean and save space.
If you deploy an edge function to Supabase, verify that the deployed script is not empty after deployment.
@fszontagh I'm starting work on this issue now!
I'll implement HTML cleanup for product descriptions in the Qdrant synchronization edge functions. My approach:
1. Search for edge functions that sync products to Qdrant
2. Create a utility function to clean HTML from descriptions (remove tags, special chars, normalize newlines)
3. Apply the cleanup to product descriptions before storing in Qdrant
4. Deploy and verify the edge functions are not empty after deployment
Starting analysis now...
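For step 2, a cleaner along these lines would cover the requirements listed in the issue. This is only a minimal sketch, not the contents of the actual html-cleaner.ts: the function name cleanHtmlContent matches the one mentioned later in this thread, but the entity table, the regex-based tag stripping, and the choice to drop script/style blocks are all assumptions made for illustration.

```typescript
// Hypothetical sketch of an HTML-to-plain-text cleaner (assumed implementation).
// Small, assumed subset of named entities; a real cleaner may need more.
const NAMED_ENTITIES = new Map<string, string>([
  ["amp", "&"],
  ["lt", "<"],
  ["gt", ">"],
  ["quot", '"'],
  ["apos", "'"],
  ["nbsp", " "],
]);

function cleanHtmlContent(html: string | null | undefined): string {
  if (!html) return "";
  let text = html;
  // Drop <script>/<style> blocks together with their contents.
  text = text.replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, "");
  // HTML line breaks become a single newline.
  text = text.replace(/<br\s*\/?>/gi, "\n");
  // List items start on their own line with a bullet.
  text = text.replace(/<li\b[^>]*>/gi, "\n• ");
  // Closing block-level tags end the current paragraph.
  text = text.replace(/<\/(p|div|h[1-6]|ul|ol|li)>/gi, "\n");
  // Strip every remaining tag. Regex-based, so stray "<" or ">" in text can confuse it.
  text = text.replace(/<[^>]+>/g, "");
  // Decode numeric (&#233;, &#x2022;) and the named entities listed above.
  text = text.replace(/&(#x[0-9a-f]+|#\d+|[a-z]+);/gi, (match, body: string) => {
    if (body[0] === "#") {
      const isHex = body[1].toLowerCase() === "x";
      const code = isHex ? parseInt(body.slice(2), 16) : parseInt(body.slice(1), 10);
      return code <= 0x10ffff ? String.fromCodePoint(code) : match;
    }
    // Unknown named entities are left untouched.
    return NAMED_ENTITIES.get(body.toLowerCase()) ?? match;
  });
  // Collapse spaces, trim each line, and drop empty lines so paragraphs
  // stay on separate lines without duplicated newlines.
  return text
    .replace(/\r\n?/g, "\n")
    .replace(/[ \t\u00a0]+/g, " ")
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .join("\n");
}
```

Entities are decoded after tags are stripped, so escaped markup such as &lt;b&gt; survives as literal text instead of being removed as a tag.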
✅ Issue Completed
I've successfully implemented HTML cleanup for product descriptions in Qdrant synchronization.
Changes Made
Created html-cleaner.ts utility (supabase/functions/_shared/html-cleaner.ts)
- HTML entities (&amp;, &lt;, etc.)
- Converts <br/>, <br>, </p>, </div> tags to newlines
Updated qdrant-client.ts
- Integrated the cleanHtmlContent() function into createProductText()
- Cleans the description, short_description, and meta_description fields
Deployed Edge Functions
- shopify-sync (v7)
- woocommerce-sync (v15)
- shoprenter-sync (v30)
- shoprenter-scheduled-sync (v28)
- mcp-shopify (v6)
- mcp-woocommerce (v7)
- mcp-shoprenter (v19)
Verification
- html-cleaner.ts dependency
Example Transformations
Before:
After:
Technical Details
The HTML cleaner handles:
- block-level tags (<p>, <div>, <h1>-<h6>, <ul>, <ol>, <li>)
- bullet characters (•)
- numeric entities (&#xxx; and &#xXXX;)
Committed: dec7b6b
Pushed: ✅ origin/main
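For readers following along, the integration into createProductText() might look roughly like the sketch below. It is an illustration under stated assumptions, not the code from qdrant-client.ts: the ProductLike shape, the title field, and joining the parts with single newlines are hypothetical, and cleanHtmlContent here is a deliberately reduced stand-in for the shared cleaner.

```typescript
// Hypothetical product shape; the real sync functions may use different fields.
interface ProductLike {
  title: string;
  description?: string | null;
  short_description?: string | null;
  meta_description?: string | null;
}

// Reduced stand-in for the shared cleaner (assumption): turns <br>, </p>, </div>
// into newlines, strips remaining tags, and drops empty lines. No entity decoding.
function cleanHtmlContent(html: string | null | undefined): string {
  if (!html) return "";
  return html
    .replace(/<br\s*\/?>|<\/(p|div)>/gi, "\n")
    .replace(/<[^>]+>/g, "")
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .join("\n");
}

// Builds the plain text that would be embedded and stored in Qdrant.
// Every HTML-bearing field goes through the cleaner; empty parts are skipped.
function createProductText(product: ProductLike): string {
  const parts = [
    product.title,
    cleanHtmlContent(product.short_description),
    cleanHtmlContent(product.description),
    cleanHtmlContent(product.meta_description),
  ];
  return parts.filter((part) => part.length > 0).join("\n");
}

const text = createProductText({
  title: "Mug",
  short_description: "<div>Ceramic mug</div>",
  description: "<p>Holds <b>350 ml</b>.</p><p>Dishwasher safe.</p>",
  meta_description: null,
});
console.log(text);
```

Cleaning inside createProductText() rather than in each platform sync keeps the Shopify, WooCommerce, and ShopRenter paths on one code path, which fits the single shared utility described above.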