Data Pipeline¶
The pipeline fetches data from a Shopify store, generates OpenAI embeddings, stores them in MongoDB, and triggers a webhook that loads the embeddings into memory for retrieval and search.
1. Fetching Data from Shopify¶
- Metaobjects:
    - Endpoint: `https://{shop_name}.myshopify.com/admin/api/2023-04/graphql.json`
    - Fetches articles and stores them in `articles_content.txt`.
- Product Details:
    - Endpoint: `https://{shop_name}.myshopify.com/admin/api/2023-04/custom_collections.json`
    - Fetches product details, including title, description, price, SKU, etc., and stores them in `product_details.txt`.
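The fetch step can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the helper names (`shopify_url`, `build_request`, `save_records`) are hypothetical, token-based authentication through the `X-Shopify-Access-Token` header is assumed, and the one-JSON-record-per-line file layout is an assumption about how the text files are written.

```python
import json
import urllib.request
from typing import List, Optional

API_VERSION = "2023-04"  # version that appears in the endpoints above


def shopify_url(shop_name: str, resource: str) -> str:
    """Build an Admin API URL such as .../admin/api/2023-04/graphql.json."""
    return f"https://{shop_name}.myshopify.com/admin/api/{API_VERSION}/{resource}"


def build_request(shop_name: str, resource: str, token: str,
                  body: Optional[dict] = None) -> urllib.request.Request:
    """Prepare (but do not send) a request; passing a body makes it a POST."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {
        "X-Shopify-Access-Token": token,  # assumption: token-based auth
        "Content-Type": "application/json",
    }
    return urllib.request.Request(shopify_url(shop_name, resource),
                                  data=data, headers=headers)


def save_records(records: List[dict], path: str) -> None:
    """Write one JSON record per line (assumed layout) to a text file."""
    with open(path, "w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record) + "\n")
```

Sending the prepared request with `urllib.request.urlopen` and passing the parsed response to `save_records` would complete the step. A GraphQL query goes in the body of a request to `graphql.json`, while the REST endpoint is a plain GET.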
2. Generating Embeddings¶
- Libraries: OpenAI's `text-embedding-ada-002` model and LangChain.
- Process:
    - Load data from text files.
    - Split documents into chunks.
    - Generate embeddings for each chunk.
    - Store embeddings and metadata for further processing.
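The chunk-and-embed process might look roughly like the sketch below. It uses a plain-Python character splitter as a stand-in for a LangChain text splitter, and it takes the embedding function as a parameter so that no API call is hard-coded. The chunk size, the overlap, and the record fields (`text`, `embedding`, `metadata`) are assumptions, not values taken from the pipeline.

```python
from typing import Callable, Dict, List


def split_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> List[str]:
    """Split text into fixed-size character chunks that overlap slightly."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def embed_chunks(chunks: List[str],
                 embed: Callable[[str], List[float]],
                 source: str) -> List[Dict]:
    """Pair every chunk with its embedding and some basic metadata."""
    records = []
    for position, chunk in enumerate(chunks):
        records.append({
            "text": chunk,
            "embedding": embed(chunk),  # e.g. a call to the embedding model
            "metadata": {"source": source, "chunk": position},
        })
    return records
```

In the real pipeline, `embed` would wrap a call to the `text-embedding-ada-002` model. Injecting it as a callable keeps the splitting and bookkeeping testable without network access.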
3. Storing Embeddings in MongoDB¶
- Setup: The MongoDB connection is established via environment variables.
- Process:
    - Clear existing data.
    - Insert embeddings in batches of 10 into the `embeddings` collection.
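A minimal sketch of the clear-then-insert step is shown here. The function names are hypothetical, and `collection` is assumed to be any object exposing `delete_many` and `insert_many` the way a PyMongo collection does. Only the batch size of 10 comes from the description above.

```python
from typing import Iterator, List

BATCH_SIZE = 10  # batch size stated in the pipeline description


def batched(items: List[dict], size: int = BATCH_SIZE) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def replace_embeddings(collection, records: List[dict]) -> int:
    """Clear the collection, then insert the records batch by batch."""
    collection.delete_many({})  # an empty filter matches every document
    inserted = 0
    for batch in batched(records):
        collection.insert_many(batch)
        inserted += len(batch)
    return inserted
```

Small batches keep each insert request modest in size, at the cost of more round trips. Because existing data is cleared first, a failure midway leaves the collection partially filled, so rerunning the whole step is the simplest recovery.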
4. Triggering Webhook Notification¶
- Webhook:
    - Sent after embedding insertion to trigger loading embeddings into memory.
    - Payload: `{ "namespace": "Products" }`
    - Response: Confirms webhook success.
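The notification could be sent along these lines. This is a sketch using only the standard library: the function names are hypothetical, the webhook URL is whatever the deployment configures, and treating any 2xx status as success is an assumption about what "confirms webhook success" means.

```python
import json
import urllib.request


def build_webhook_request(webhook_url: str,
                          namespace: str = "Products") -> urllib.request.Request:
    """Prepare a JSON POST carrying the namespace payload."""
    payload = json.dumps({"namespace": namespace}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def notify(webhook_url: str, namespace: str = "Products") -> bool:
    """Send the webhook and report whether the status was 2xx.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses, so a
    failed delivery surfaces as an exception rather than False.
    """
    request = build_webhook_request(webhook_url, namespace)
    with urllib.request.urlopen(request, timeout=10) as response:
        return 200 <= response.status < 300
```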
5. Loading Embeddings into Memory¶
- FastAPI Endpoint:
    - POST `/load_embeddings/`: Loads embeddings from MongoDB into FAISS for fast retrieval.
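The loading logic behind such an endpoint might resemble the sketch below. To stay dependency-free it keeps vectors in plain Python lists per namespace rather than in a FAISS index. The class name and the document fields (`embedding`, `metadata`) are assumptions about how the MongoDB documents are shaped.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


class EmbeddingStore:
    """Per-namespace in-memory vectors; a stand-in for FAISS indexes."""

    def __init__(self) -> None:
        self.vectors: Dict[str, List[List[float]]] = defaultdict(list)
        self.metadata: Dict[str, List[dict]] = defaultdict(list)

    def load(self, documents: Iterable[dict], namespace: str) -> int:
        """Replace the namespace's contents with the given documents."""
        self.vectors[namespace] = []
        self.metadata[namespace] = []
        for doc in documents:
            self.vectors[namespace].append(list(doc["embedding"]))
            self.metadata[namespace].append(doc.get("metadata", {}))
        return len(self.vectors[namespace])
```

A FastAPI handler for `/load_embeddings/` would read the namespace from the request body, query MongoDB for the matching documents, and call `load`. With FAISS, the collected vectors would be added to an index instead of kept in a list. Keeping metadata in a parallel list lets a search result's row number be mapped back to its source document.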
6. Searching Embeddings¶
- Search API:
    - POST `/search/`: Accepts a query vector and a namespace, and returns the top-k closest results using FAISS.
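To make the top-k lookup concrete, here is a brute-force sketch in plain Python. It performs the same kind of exhaustive nearest-neighbour comparison that a flat (non-approximate) FAISS index does, but it is not FAISS itself. Squared Euclidean distance is an assumption, since the description does not say which metric or index type the service uses.

```python
import heapq
from typing import List, Sequence, Tuple


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search(vectors: Sequence[Sequence[float]],
           query: Sequence[float],
           k: int = 5) -> List[Tuple[int, float]]:
    """Return (row index, squared distance) for the k nearest vectors."""
    scored = ((squared_l2(vec, query), idx) for idx, vec in enumerate(vectors))
    # nsmallest sorts by distance first, then by row index on ties
    return [(idx, dist) for dist, idx in heapq.nsmallest(k, scored)]
```

For example, `search([[0, 0], [1, 0], [3, 4]], [0, 0], k=2)` returns the rows at indices 0 and 1. The returned row indices can then be used to look up the stored metadata for each hit.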
Workflow Summary:¶
- Fetch data from Shopify (metaobjects, product details).
- Generate OpenAI embeddings.
- Store embeddings in MongoDB.
- Trigger webhook to load embeddings into memory.
- Search embeddings using query vectors via FAISS.
Notes:¶
- API: FastAPI handles webhook notifications and embedding search.
- Environment Variables: MongoDB credentials, the OpenAI API key, and the webhook URL are stored in a `.env` file.
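Reading these settings could be sketched as follows. The variable names `MONGODB_URI` and `WEBHOOK_URL` are hypothetical, and `OPENAI_API_KEY` is only the conventional name for the key, so the actual names in the `.env` file may differ. The sketch assumes the file's contents have already been loaded into the process environment, for instance by a dotenv loader.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Hypothetical variable names; adjust to match the real .env file.
REQUIRED = ("MONGODB_URI", "OPENAI_API_KEY", "WEBHOOK_URL")


@dataclass(frozen=True)
class Settings:
    mongodb_uri: str
    openai_api_key: str
    webhook_url: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the required settings, failing early when one is missing."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return Settings(env["MONGODB_URI"], env["OPENAI_API_KEY"], env["WEBHOOK_URL"])
```

Checking all variables up front means a misconfigured deployment fails at startup rather than partway through a pipeline run.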