
docs: add comprehensive Qdrant integration documentation #74

- Created QDRANT_INTEGRATION.md with full implementation details
- Documented collection schemas and naming conventions
- Added monitoring and troubleshooting guides
- Outlined future enhancements and security considerations
Claude 5 months ago
parent
commit
cb68ca9b87
1 changed files with 378 additions and 0 deletions

+ 378 - 0
docs/QDRANT_INTEGRATION.md

@@ -0,0 +1,378 @@
+# Qdrant Vector Database Integration
+
+## Overview
+
+This document describes the Qdrant vector database integration for the ShopCall.ai store synchronization system. The integration enables semantic search and AI-powered features across all e-commerce platforms.
+
+## Architecture
+
+### Vector Database Configuration
+
+- **Provider**: Qdrant
+- **Endpoint**: http://142.93.100.6:6333
+- **API Key**: stored as an Edge Function secret (see Security); the value must not be committed to the repository
+- **Vector Dimensions**: 3072 (OpenAI text-embedding-3-large compatible)
+- **Distance Metric**: Cosine (optimal for normalized text embeddings)
+
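As a rough illustration of how these settings map onto Qdrant's REST API, the sketch below builds the body of a `PUT /collections/{name}` request. The helper name `buildCreateCollectionRequest` is hypothetical and not taken from `qdrant-client.ts`; only the `vectors.size` and `vectors.distance` fields come from Qdrant's collection-creation API.

```typescript
// Hypothetical helper; the project's real client may structure this differently.
interface CreateCollectionRequest {
  method: "PUT";
  path: string;
  body: { vectors: { size: number; distance: "Cosine" | "Dot" | "Euclid" } };
}

function buildCreateCollectionRequest(collection: string): CreateCollectionRequest {
  return {
    method: "PUT",
    path: `/collections/${encodeURIComponent(collection)}`,
    body: {
      vectors: {
        size: 3072,         // output dimension of text-embedding-3-large
        distance: "Cosine", // distance metric from the configuration above
      },
    },
  };
}
```

The function only builds the request description; sending it (with the `api-key` header) is left to whatever HTTP layer the client already uses.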
+### Collection Naming Convention
+
+Each store has separate collections for different entity types:
+- `{shopname}-products` - Product catalog
+- `{shopname}-orders` - Order history (if permitted)
+- `{shopname}-customers` - Customer data (if permitted)
+
+The `{shopname}` is sanitized: lowercase, alphanumeric with hyphens.
+
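A minimal sketch of that naming rule follows. It assumes that runs of non-alphanumeric characters collapse into a single hyphen and that leading and trailing hyphens are trimmed; the exact rule in the project may differ, and `sanitizeShopName` and `collectionName` are hypothetical names.

```typescript
type StoreEntity = "products" | "orders" | "customers";

// Hypothetical helper: lowercase, keep [a-z0-9], turn everything else into "-".
function sanitizeShopName(shopName: string): string {
  return shopName
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // each run of other characters becomes one hyphen
    .replace(/^-+|-+$/g, "");    // drop leading and trailing hyphens
}

// Builds "{shopname}-{entity}" from a raw shop name.
function collectionName(shopName: string, entity: StoreEntity): string {
  return `${sanitizeShopName(shopName)}-${entity}`;
}
```

For example, `collectionName("My Shop_01!", "products")` yields `my-shop-01-products` under these assumptions.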
+## Data Privacy & Permissions
+
+### Store-Level Control
+
+The `stores.data_access_permissions` JSONB field controls what data can be synced:
+
+```json
+{
+  "allow_product_access": true,
+  "allow_order_access": true,
+  "allow_customer_access": true
+}
+```
+
+### Privacy Compliance
+
+- Products are always synced (core catalog data)
+- Orders and customers respect store owner preferences
+- SQL cache and Qdrant sync both check permissions
+- Helper functions: `can_sync_products()`, `can_sync_orders()`, `can_sync_customers()`
+
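The permission gate might look like the sketch below on the TypeScript side. It follows the "products are always synced" rule and therefore ignores `allow_product_access`; it also assumes a missing or `null` flag means "denied". Both choices are assumptions, and `syncableEntities` is a hypothetical name, not one of the helper functions listed above.

```typescript
type SyncEntity = "products" | "orders" | "customers";

// Shape of stores.data_access_permissions as documented above.
interface DataAccessPermissions {
  allow_product_access?: boolean;
  allow_order_access?: boolean;
  allow_customer_access?: boolean;
}

// Returns the entity types a sync run may touch for one store.
function syncableEntities(perms: DataAccessPermissions | null | undefined): SyncEntity[] {
  const entities: SyncEntity[] = ["products"]; // assumption: products are always synced
  if (perms?.allow_order_access === true) entities.push("orders");
  if (perms?.allow_customer_access === true) entities.push("customers");
  return entities;
}
```

Treating anything other than an explicit `true` as a denial keeps the default on the privacy-preserving side.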
+## Collection Schemas
+
+### Products Collection
+
+**Payload Structure:**
+```typescript
+interface ProductPayload {
+  store_id: string;
+  product_id: string;
+  platform: "shopify" | "woocommerce" | "shoprenter";
+  // One of `title` or `name` is set
+  title?: string;
+  name?: string;
+  sku: string;
+  price: number;
+  status: string;
+  description: string;
+  tags: string[];
+  synced_at: string; // ISO 8601 timestamp
+}
+```
+
+**Payload Indexes:**
+- `store_id` (keyword)
+- `product_id` (keyword)
+- `platform` (keyword)
+- `status` (keyword)
+- `price` (float)
+- `sku` (keyword)
+
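For illustration, the index list above can be expressed as request bodies for Qdrant's `PUT /collections/{name}/index` endpoint, which takes a `field_name` and a `field_schema`. The names `PRODUCT_INDEXES` and `buildIndexRequests` are hypothetical; how `qdrant-client.ts` actually creates indexes is not shown in this document.

```typescript
type FieldSchema = "keyword" | "float";

interface PayloadIndexBody {
  field_name: string;
  field_schema: FieldSchema;
}

// Index list copied from the Products collection section above.
const PRODUCT_INDEXES: PayloadIndexBody[] = [
  { field_name: "store_id", field_schema: "keyword" },
  { field_name: "product_id", field_schema: "keyword" },
  { field_name: "platform", field_schema: "keyword" },
  { field_name: "status", field_schema: "keyword" },
  { field_name: "price", field_schema: "float" },
  { field_name: "sku", field_schema: "keyword" },
];

// One PUT request description per index; sending them is left to the caller.
function buildIndexRequests(collection: string, indexes: PayloadIndexBody[]) {
  return indexes.map((body) => ({
    method: "PUT" as const,
    path: `/collections/${encodeURIComponent(collection)}/index`,
    body,
  }));
}
```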
+### Orders Collection
+
+**Payload Structure:**
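+```typescript
+interface OrderPayload {
+  store_id: string;
+  order_id: string;
+  platform: string;
+  order_number: string;
+  // One of `status` or `financial_status` is set
+  status?: string;
+  financial_status?: string;
+  // One of `total` or `total_price` is set
+  total?: number;
+  total_price?: number;
+  currency: string;
+  customer_name: string;
+  customer_email: string;
+  synced_at: string;
+}
+```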
+
+**Payload Indexes:**
+- `store_id` (keyword)
+- `order_id` (keyword)
+- `platform` (keyword)
+- `status` (keyword)
+- `total_price` (float)
+- `customer_email` (keyword)
+
+### Customers Collection
+
+**Payload Structure:**
+```typescript
+interface CustomerPayload {
+  store_id: string;
+  customer_id: string;
+  platform: string;
+  email: string;
+  first_name: string;
+  last_name: string;
+  phone: string;
+  orders_count: number;
+  total_spent: number;
+  synced_at: string;
+}
+```
+
+**Payload Indexes:**
+- `store_id` (keyword)
+- `customer_id` (keyword)
+- `platform` (keyword)
+- `email` (keyword)
+
+## Implementation Status
+
+### ✅ Completed
+
+1. **Qdrant Client Library** (`supabase/functions/_shared/qdrant-client.ts`)
+   - Collection management (create, delete, exists)
+   - Point operations (upsert, delete, scroll, search)
+   - Change detection support
+   - Text embedding helpers
+
+2. **Database Schema** (Migration: `20251111_qdrant_integration.sql`)
+   - Extended `data_access_permissions`
+   - Added Qdrant-specific columns to `stores`
+   - Created `qdrant_sync_logs` table
+   - Helper functions for permission checks
+   - Automatic trigger for `qdrant_last_sync_at`
+
+3. **Shopify Sync** (`shopify-sync/index.ts`)
+   - Full Qdrant integration
+   - Change detection for deleted products
+   - Privacy-compliant
+   - Comprehensive logging
+
+4. **WooCommerce Sync** (`woocommerce-sync/index.ts`)
+   - Full Qdrant integration
+   - Pagination-aware collection
+   - Privacy-compliant
+   - Comprehensive logging
+
+### 🔄 Pending
+
+1. **ShopRenter Sync** (`shoprenter-sync/index.ts`)
+   - Needs same updates as Shopify/WooCommerce
+   - Follow established pattern
+   - Import Qdrant client functions
+   - Add Qdrant sync helper functions
+   - Update main sync functions
+
+2. **Production Embeddings**
+   - Replace `generateSimpleEmbedding()` with OpenAI API
+   - Use `text-embedding-3-large` model
+   - Implement batching for efficiency
+   - Add embedding cost tracking
+
+## Sync Flow
+
+### Initial Sync
+
+1. Check store permissions
+2. Initialize Qdrant collections (if not exist)
+3. Fetch data from e-commerce platform
+4. Sync to SQL cache (existing behavior)
+5. If Qdrant enabled:
+   - Generate embeddings for text content
+   - Upsert points to Qdrant
+   - Log operation to `qdrant_sync_logs`
+
+### Change Detection
+
+1. Scroll through existing Qdrant points for store
+2. Compare with current product IDs from platform
+3. Identify deleted products (in Qdrant but not in platform)
+4. Delete stale points from Qdrant
+5. Log deletion operation
+
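Steps 2 and 3 amount to a set difference. The sketch below assumes both ID lists have already been collected as strings (one from scrolling Qdrant, one from the platform API); `findStaleProductIds` is a hypothetical name.

```typescript
// Returns IDs that exist in Qdrant but no longer exist on the platform.
function findStaleProductIds(qdrantIds: readonly string[], platformIds: readonly string[]): string[] {
  const live = new Set<string>(platformIds); // O(1) membership checks
  const stale: string[] = [];
  for (let i = 0; i < qdrantIds.length; i++) {
    if (!live.has(qdrantIds[i])) stale.push(qdrantIds[i]);
  }
  return stale;
}
```

Products that are new on the platform are ignored here on purpose: they are handled by the regular upsert path, not by change detection.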
+### Scheduled Sync
+
+The existing scheduled sync mechanisms (pg_cron) will automatically include Qdrant:
+- `shopify-scheduled-sync` → syncs Shopify stores
+- `woocommerce-scheduled-sync` → syncs WooCommerce stores
+- `shoprenter-scheduled-sync` → syncs ShopRenter stores
+
+All three jobs respect the per-store `qdrant_sync_enabled` flag.
+
+## Monitoring & Debugging
+
+### Qdrant Sync Logs
+
+Query recent sync operations:
+
+```sql
+SELECT
+  s.store_name,
+  qsl.sync_type,
+  qsl.collection_name,
+  qsl.operation,
+  qsl.items_processed,
+  qsl.items_succeeded,
+  qsl.items_failed,
+  qsl.error_message,
+  qsl.duration_ms,
+  qsl.created_at
+FROM qdrant_sync_logs qsl
+JOIN stores s ON s.id = qsl.store_id
+ORDER BY qsl.created_at DESC
+LIMIT 50;
+```
+
+### Check Store Qdrant Status
+
+```sql
+SELECT
+  store_name,
+  platform_name,
+  qdrant_sync_enabled,
+  qdrant_last_sync_at,
+  data_access_permissions
+FROM stores
+WHERE is_active = true;
+```
+
+### Collection Info
+
+Use the Qdrant API:
+
+```bash
+# Assumes the key is exported as QDRANT_API_KEY in your shell
+curl -X GET "http://142.93.100.6:6333/collections/{collection-name}" \
+  -H "api-key: $QDRANT_API_KEY"
+```
+
+## Performance Considerations
+
+### Batching
+
+- Points are upserted in chunks of 100
+- Prevents payload size limits
+- Improves network efficiency
+
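A generic chunking helper is enough to express this batching. The sketch below is an assumption about how it could be done, not the code in `qdrant-client.ts`; the default of 100 mirrors the chunk size stated above.

```typescript
// Splits items into consecutive batches of at most `size` elements.
function chunk<T>(items: readonly T[], size: number = 100): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

Each batch would then be passed to one upsert call, so 250 points result in three requests (100, 100, and 50 points).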
+### Indexing
+
+- Payload indexes created on frequently filtered fields
+- Indexing threshold: 10,000 points
+- On-disk storage enabled for large collections
+
+### Rate Limiting
+
+- Respects e-commerce platform rate limits
+- No additional rate limiting for Qdrant
+- Async operations don't block SQL sync
+
+## Future Enhancements
+
+### 1. Real-time Embeddings
+
+Replace placeholder embeddings with OpenAI API:
+
+```typescript
+import OpenAI from 'npm:openai'
+
+// Create the client once and reuse it across calls
+const openai = new OpenAI({ apiKey: Deno.env.get('OPENAI_API_KEY') })
+
+async function generateEmbedding(text: string): Promise<number[]> {
+  const response = await openai.embeddings.create({
+    model: 'text-embedding-3-large',
+    input: text,
+    dimensions: 3072 // must match the collection's vector size
+  })
+
+  return response.data[0].embedding
+}
+```
+
+### 2. Semantic Search API
+
+Add an endpoint for vector similarity search, for example a request that searches products across all stores:
+
+```http
+POST /api/search
+Content-Type: application/json
+
+{
+  "query": "blue running shoes",
+  "limit": 10,
+  "filter": {
+    "platform": "shopify",
+    "price": { "gte": 50, "lte": 150 }
+  }
+}
+```
+
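The `filter` object in the request above could be translated into a Qdrant filter roughly as follows. The `must`, `match`, and `range` keys are Qdrant's own filter syntax; the `SearchFilter` type and `toQdrantFilter` function are hypothetical, since the endpoint does not exist yet.

```typescript
// Filter shape taken from the proposed /api/search request body.
interface SearchFilter {
  platform?: string;
  price?: { gte?: number; lte?: number };
}

type QdrantCondition =
  | { key: string; match: { value: string } }
  | { key: string; range: { gte?: number; lte?: number } };

// Every supplied field becomes one condition that must hold.
function toQdrantFilter(filter: SearchFilter): { must: QdrantCondition[] } {
  const must: QdrantCondition[] = [];
  if (filter.platform !== undefined) {
    must.push({ key: "platform", match: { value: filter.platform } });
  }
  if (filter.price !== undefined) {
    must.push({ key: "price", range: { gte: filter.price.gte, lte: filter.price.lte } });
  }
  return { must };
}
```

Both `platform` and `price` have payload indexes on the products collection, so conditions on them can be evaluated without a full scan.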
+### 3. AI Features
+
+- Product recommendations
+- Customer segmentation
+- Semantic product search
+- Duplicate product detection
+- Auto-categorization
+
+### 4. Analytics
+
+- Vector space visualization
+- Cluster analysis
+- Trend detection
+- Anomaly detection
+
+## Security
+
+### API Key Management
+
+- Qdrant API key stored in Edge Function environment
+- Not exposed to frontend
+- Rotated periodically
+
+### Data Access
+
+- Row-level security on `qdrant_sync_logs`
+- Users only see their own store sync logs
+- Service role required for cross-store operations
+
+### Privacy
+
+- Store owners control what data is synced
+- Audit trail in `store_permission_audit`
+- GDPR-compliant data deletion
+
+## Troubleshooting
+
+### Common Issues
+
+**Issue**: Collection not found
+- **Cause**: First sync didn't create collection
+- **Fix**: Call `initializeStoreCollections()` manually
+
+**Issue**: Points not updating
+- **Cause**: Qdrant sync disabled or permissions denied
+- **Fix**: Check `qdrant_sync_enabled` and `data_access_permissions`
+
+**Issue**: High sync latency
+- **Cause**: Large dataset + embedding generation
+- **Fix**: Batch embedding requests and keep the Qdrant sync asynchronous so it does not block the SQL sync
+
+### Debug Mode
+
+Enable verbose logging:
+
+```typescript
+// In Qdrant client
+console.log('[Qdrant] Detailed operation log:', operation)
+```
+
+Check Edge Function logs:
+
+```bash
+# Via Supabase CLI
+supabase functions logs shopify-sync
+
+# Via MCP tool
+mcp__supabase__get_logs(service: "edge-function")
+```
+
+## References
+
+- [Qdrant Documentation](https://qdrant.tech/documentation/)
+- [OpenAI Embeddings](https://platform.openai.com/docs/guides/embeddings)
+- [Vector Databases Guide](https://www.pinecone.io/learn/vector-database/)