Data Pipeline

The pipeline fetches data from a Shopify store, generates OpenAI embeddings for it, and stores them in MongoDB. It then triggers a webhook that loads the embeddings into memory (FAISS) so they can be retrieved and searched.

1. Fetching Data from Shopify

Metaobjects:
- Endpoint: https://{shop_name}.myshopify.com/admin/api/2023-04/graphql.json
- Fetches articles and stores them in articles_content.txt.
Product Details:
- Endpoint: https://{shop_name}.myshopify.com/admin/api/2023-04/custom_collections.json
- Fetches product details (title, description, price, SKU, etc.) and stores them in product_details.txt.
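
The requests described above could be prepared roughly as follows. This is a minimal sketch using only the standard library, not the pipeline's actual code: the metaobject type "article", the page size, and the function names are assumptions made for illustration. The X-Shopify-Access-Token header is the standard way to authenticate against the Admin API.

```python
import json
import urllib.request

API_VERSION = "2023-04"  # version used in the endpoints above

# Hypothetical query: the metaobject type "article" and the page size are assumptions.
METAOBJECTS_QUERY = """
{
  metaobjects(type: "article", first: 50) {
    edges { node { id handle fields { key value } } }
  }
}
"""


def admin_url(shop_name: str, resource: str) -> str:
    """Build an Admin API URL for the given shop and resource path."""
    return f"https://{shop_name}.myshopify.com/admin/api/{API_VERSION}/{resource}"


def build_graphql_request(shop_name: str, access_token: str) -> urllib.request.Request:
    """Prepare (but do not send) the metaobjects GraphQL request."""
    body = json.dumps({"query": METAOBJECTS_QUERY}).encode("utf-8")
    return urllib.request.Request(
        admin_url(shop_name, "graphql.json"),
        data=body,
        headers={
            "Content-Type": "application/json",
            "X-Shopify-Access-Token": access_token,
        },
        method="POST",
    )


def fetch_json(request: urllib.request.Request) -> dict:
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes the URL and payload easy to inspect without touching a live store.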

2. Generating Embeddings

Tools: OpenAI's text-embedding-ada-002 embedding model and LangChain.
Process:
- Load data from text files.
- Split documents into chunks.
- Generate embeddings for each chunk.
- Store embeddings and metadata for further processing.
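
The split-and-embed steps above can be illustrated with a plain-Python stand-in. This sketch does not use LangChain's splitters or call OpenAI: the chunk size, the overlap, the record field names, and the injected embed_fn are all assumptions. In the pipeline, embed_fn would wrap a call to text-embedding-ada-002.

```python
from typing import Callable, Dict, List

Vector = List[float]


def split_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> List[str]:
    """Split text into fixed-size character chunks that overlap slightly."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks


def embed_chunks(
    chunks: List[str],
    embed_fn: Callable[[List[str]], List[Vector]],
    source: str,
) -> List[Dict]:
    """Pair each chunk with its embedding and some metadata (hypothetical schema)."""
    vectors = embed_fn(chunks)
    if len(vectors) != len(chunks):
        raise ValueError("embed_fn must return one vector per chunk")
    return [
        {
            "text": chunk,
            "embedding": vector,
            "metadata": {"source": source, "chunk_index": index},
        }
        for index, (chunk, vector) in enumerate(zip(chunks, vectors))
    ]
```

Passing the embedding function in as a parameter lets the chunking and record-building logic be exercised with a fake embedder, without an API key.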

3. Storing Embeddings in MongoDB

Setup: The MongoDB connection is configured through environment variables.
Process:
- Clear existing data.
- Insert embeddings in batches of 10 into the embeddings collection.
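
A clear-then-insert step with batches of 10 might look like the sketch below. It assumes the collection argument behaves like a PyMongo Collection (delete_many and insert_many are real PyMongo methods); the function names and the record shape are hypothetical.

```python
from typing import Any, Dict, Iterator, List, Sequence


def batched(items: Sequence[Any], size: int) -> Iterator[List[Any]]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def replace_embeddings(collection: Any, records: Sequence[Dict], batch_size: int = 10) -> int:
    """Clear the collection, then insert records in batches; return the count inserted."""
    collection.delete_many({})  # an empty filter matches every existing document
    inserted = 0
    for batch in batched(records, batch_size):
        collection.insert_many(batch)  # never called with an empty list
        inserted += len(batch)
    return inserted
```

With PyMongo, the collection would typically come from something like MongoClient(uri)[db_name]["embeddings"], where uri and db_name are read from the environment.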

4. Triggering Webhook Notification

Webhook:
- Sent after embedding insertion to trigger loading embeddings into memory.
- Payload: { "namespace": "Products" }
- Response: Confirms webhook success.
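
As a rough illustration of the notification above, the webhook call can be built with the standard library. The function names are hypothetical, and the webhook URL is expected to come from configuration. Only the payload shape is taken from this document.

```python
import json
import urllib.request


def build_webhook_request(webhook_url: str, namespace: str = "Products") -> urllib.request.Request:
    """Prepare the POST request that carries the namespace payload."""
    payload = json.dumps({"namespace": namespace}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def notify(webhook_url: str, namespace: str = "Products") -> bool:
    """Send the webhook; True means the server answered with a 2xx status.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses, so callers
    that want to retry should catch that exception.
    """
    request = build_webhook_request(webhook_url, namespace)
    with urllib.request.urlopen(request, timeout=30) as response:
        return 200 <= response.status < 300
```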

5. Loading Embeddings into Memory

FastAPI Endpoint:
- POST /load_embeddings/: Loads embeddings from MongoDB into FAISS for fast retrieval.
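
To show the shape of the loading step without requiring FastAPI, MongoDB, or FAISS, the sketch below keeps vectors in plain Python lists grouped by namespace. It is a stand-in, not the service's implementation: the class name and the "embedding" field are assumptions. In the real endpoint the records would be read from MongoDB and the vectors added to a FAISS index.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


class InMemoryStore:
    """Holds embedding vectors per namespace (plain-Python stand-in for FAISS indexes)."""

    def __init__(self) -> None:
        self.vectors: Dict[str, List[List[float]]] = defaultdict(list)
        self.documents: Dict[str, List[dict]] = defaultdict(list)

    def load(self, namespace: str, records: Iterable[dict]) -> int:
        """Replace the namespace's contents with the given records; return the count."""
        self.vectors[namespace] = []
        self.documents[namespace] = []
        for record in records:
            # "embedding" is a hypothetical field name for the stored vector.
            self.vectors[namespace].append(list(record["embedding"]))
            self.documents[namespace].append(record)
        return len(self.vectors[namespace])
```

Replacing a namespace's contents on each load keeps repeated webhook calls idempotent: triggering the load twice leaves the same data in memory.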

6. Searching Embeddings

Search API:
- POST /search/: Accepts a query vector and a namespace, and returns the top-k closest results using FAISS.
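
What "top-k closest" means can be shown with a brute-force search in plain Python. This is a stand-in for FAISS, not a use of it: the document does not state which FAISS index type or distance metric the service uses, so squared L2 distance and the function names here are assumptions.

```python
import heapq
from typing import List, Sequence, Tuple


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search_top_k(
    query: Sequence[float],
    vectors: Sequence[Sequence[float]],
    k: int = 5,
) -> List[Tuple[int, float]]:
    """Return (index, distance) pairs for the k vectors closest to query, nearest first."""
    scored = ((squared_l2(query, vector), index) for index, vector in enumerate(vectors))
    return [(index, distance) for distance, index in heapq.nsmallest(k, scored)]
```

The returned indices would be mapped back to the stored documents for the requested namespace before building the API response.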

Workflow Summary

- Fetch data from Shopify (metaobjects, product details).
- Generate OpenAI embeddings.
- Store embeddings in MongoDB.
- Trigger webhook to load embeddings into memory.
- Search embeddings using query vectors via FAISS.

Notes

API: FastAPI handles webhook notifications and embedding search.
Environment Variables: MongoDB credentials, the OpenAI API key, and the webhook URL are stored in a .env file.
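
One way to read these settings is sketched below. The variable names MONGODB_URI, OPENAI_API_KEY, and WEBHOOK_URL are hypothetical, since the actual names are not listed here. The sketch also assumes the .env file has already been loaded into the process environment, for example by a tool such as python-dotenv.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    mongodb_uri: str
    openai_api_key: str
    webhook_url: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read required settings, failing early with the names of any missing variables."""
    names = ("MONGODB_URI", "OPENAI_API_KEY", "WEBHOOK_URL")  # hypothetical names
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(
        mongodb_uri=env["MONGODB_URI"],
        openai_api_key=env["OPENAI_API_KEY"],
        webhook_url=env["WEBHOOK_URL"],
    )
```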