
Data Pipeline

The pipeline fetches data from a Shopify store, generates OpenAI embeddings, stores them in MongoDB, and triggers a webhook that loads the embeddings into memory for retrieval and search.

1. Fetching Data from Shopify

  • Metaobjects:
    • Endpoint: https://{shop_name}.myshopify.com/admin/api/2023-04/graphql.json
    • Fetches articles and stores them in articles_content.txt.
  • Product Details:
    • Endpoint: https://{shop_name}.myshopify.com/admin/api/2023-04/custom_collections.json
    • Fetches product details, including title, description, price, SKU, etc., stored in product_details.txt.
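
As a minimal sketch of the metaobjects fetch, the request against the GraphQL endpoint above could be built as follows. The function names, the metaobject type "article", the page size, and the selected fields are assumptions for illustration, not taken from the pipeline's code; the request is built separately from sending it so it can be inspected without network access.

```python
import json
import urllib.request

API_VERSION = "2023-04"  # API version used in the endpoints above

# Hypothetical query: the metaobject type and the selected fields are assumptions.
METAOBJECTS_QUERY = """
query ($type: String!, $first: Int!) {
  metaobjects(type: $type, first: $first) {
    edges { node { id handle fields { key value } } }
  }
}
"""


def build_metaobjects_request(shop_name, access_token,
                              metaobject_type="article", first=50):
    """Build (but do not send) the GraphQL POST request for metaobjects."""
    url = f"https://{shop_name}.myshopify.com/admin/api/{API_VERSION}/graphql.json"
    body = json.dumps({
        "query": METAOBJECTS_QUERY,
        "variables": {"type": metaobject_type, "first": first},
    }).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Shopify Admin API requests authenticate with this header.
            "X-Shopify-Access-Token": access_token,
        },
    )


def fetch_json(request, timeout=30):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling fetch_json(build_metaobjects_request(shop_name, token)) would return the decoded response, which the pipeline could then write to articles_content.txt.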

2. Generating Embeddings

  • Model and library: OpenAI's text-embedding-ada-002 embedding model, used together with LangChain.
  • Process:
    • Load data from text files.
    • Split documents into chunks.
    • Generate embeddings for each chunk.
    • Store embeddings and metadata for further processing.
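
The process above can be sketched in plain Python. This is a simplified stand-in, not the pipeline's LangChain code: split_into_chunks is a basic character-based splitter, the chunk size and overlap are assumed values, and embed_fn is a hypothetical callable that maps a list of texts to a list of vectors (LangChain's OpenAIEmbeddings.embed_documents has that shape).

```python
def split_into_chunks(text, chunk_size=1000, overlap=200):
    """Split text into fixed-size character chunks that overlap by `overlap`."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


def embed_chunks(chunks, embed_fn, source):
    """Pair each chunk with its embedding and minimal metadata.

    `embed_fn` takes a list of strings and returns one vector per string.
    The record layout (text / embedding / metadata) is an assumption.
    """
    vectors = embed_fn(chunks)
    if len(vectors) != len(chunks):
        raise ValueError("embed_fn must return one vector per chunk")
    return [
        {
            "text": chunk,
            "embedding": list(vector),
            "metadata": {"source": source, "chunk_index": index},
        }
        for index, (chunk, vector) in enumerate(zip(chunks, vectors))
    ]
```

Passing the embedding function in as an argument keeps the chunking and record-building logic testable without an OpenAI API key.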

3. Storing Embeddings in MongoDB

  • Setup: MongoDB connection established via environment variables.
  • Process:
    • Clear existing data.
    • Insert embeddings in batches of 10 into the embeddings collection.
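
A minimal sketch of the clear-then-insert step, assuming `collection` is a pymongo collection (or any object exposing delete_many and insert_many). The helper names are hypothetical; only the batch size of 10 and the collection name come from the description above.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def store_embeddings(collection, records, batch_size=10):
    """Clear the collection, then insert `records` in batches.

    Returns the number of records inserted.
    """
    collection.delete_many({})  # an empty filter removes every existing document
    inserted = 0
    for batch in batched(records, batch_size):
        collection.insert_many(batch)
        inserted += len(batch)
    return inserted
```

With pymongo, the collection would typically be obtained as MongoClient(uri)[db_name]["embeddings"], where uri and db_name come from the environment; both names are placeholders here.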

4. Triggering Webhook Notification

  • Webhook:
    • Sent after embedding insertion to trigger loading embeddings into memory.
    • Payload: { "namespace": "Products" }
    • Response: Confirms webhook success.
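
The notification could be sent along these lines. This is a sketch using only the standard library; the function names are hypothetical, and the webhook URL is whatever the deployment configures.

```python
import json
import urllib.request


def build_webhook_request(webhook_url, namespace="Products"):
    """Build the POST request carrying the { "namespace": ... } payload."""
    body = json.dumps({"namespace": namespace}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def send_webhook(request, timeout=30):
    """Send the notification and report whether the status was 2xx.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses, so callers
    should be prepared to handle that exception as the failure case.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return 200 <= response.status < 300
```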

5. Loading Embeddings into Memory

  • FastAPI Endpoint:
    • POST /load_embeddings/: Loads embeddings from MongoDB into FAISS for fast retrieval.
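
The core of the loading step is turning stored documents into per-namespace arrays of vectors. The sketch below shows that grouping with plain Python lists standing in for the FAISS index, and without the FastAPI route around it. The document fields (namespace, embedding, text, metadata) are assumed names, not confirmed by the pipeline.

```python
def build_memory_store(documents):
    """Group embedding documents by namespace for in-memory search.

    `documents` is any iterable of dicts, for example the cursor returned by
    a pymongo collection.find({}). Returns
    {namespace: {"vectors": [...], "payloads": [...]}} with the two lists
    aligned by position.
    """
    store = {}
    for doc in documents:
        namespace = doc.get("namespace", "default")
        vector = [float(x) for x in doc["embedding"]]
        bucket = store.setdefault(namespace, {"vectors": [], "payloads": []})
        # A vector index needs every vector in it to share one dimension.
        if bucket["vectors"] and len(bucket["vectors"][0]) != len(vector):
            raise ValueError(f"mixed embedding dimensions in namespace {namespace!r}")
        bucket["vectors"].append(vector)
        bucket["payloads"].append(
            {"text": doc.get("text", ""), "metadata": doc.get("metadata", {})}
        )
    return store
```

In the actual service, each namespace's vectors would be added to a FAISS index rather than kept as lists; the aligned payload list is what lets a search result's position be mapped back to its text.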

6. Searching Embeddings

  • Search API:
    • POST /search/: Accepts a query vector and a namespace, and returns the top-k closest results using FAISS.
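
To illustrate what a top-k lookup returns, here is a brute-force nearest-neighbour search in plain Python. It is a stand-in for the FAISS call, not the service's implementation: it ranks by squared Euclidean distance, which is what an exact flat L2 index computes, but the index type the service actually uses is not stated. The function names and the result layout are assumptions.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search(vectors, payloads, query, k=5):
    """Return the k stored items closest to `query`, nearest first.

    `vectors` and `payloads` are aligned lists for one namespace.
    """
    if any(len(vector) != len(query) for vector in vectors):
        raise ValueError("query and stored vectors must share one dimension")
    # Pair each distance with its position so ties break by insertion order.
    scored = ((squared_l2(vector, query), index)
              for index, vector in enumerate(vectors))
    nearest = heapq.nsmallest(k, scored)
    return [
        {"distance": distance, "payload": payloads[index]}
        for distance, index in nearest
    ]
```

Brute force scans every vector on each query, which is fine for small catalogues; a vector index such as FAISS serves the same purpose at larger scale.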

Workflow Summary:

  1. Fetch data from Shopify (metaobjects, product details).
  2. Generate OpenAI embeddings.
  3. Store embeddings in MongoDB.
  4. Trigger webhook to load embeddings into memory.
  5. Search embeddings using query vectors via FAISS.

Notes:

  • API: FastAPI handles webhook notifications and embedding search.
  • Environment Variables: MongoDB credentials, the OpenAI API key, and the webhook URL are stored in a .env file.
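
A small sketch of reading these settings at startup and failing early when one is missing. The variable names MONGODB_URI, OPENAI_API_KEY, and WEBHOOK_URL are hypothetical; substitute whatever keys the .env file actually defines.

```python
import os

# Hypothetical variable names; the real keys in the .env file may differ.
REQUIRED_VARS = ("MONGODB_URI", "OPENAI_API_KEY", "WEBHOOK_URL")


def load_settings(env=None):
    """Return the required settings, raising if any is missing or empty.

    `env` defaults to os.environ; pass a dict to test without touching
    the real environment.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

If the values live in a .env file, a loader such as python-dotenv's load_dotenv() would need to run before load_settings() so that the variables are present in os.environ.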