
Large Language Models and Generative AI are becoming increasingly prevalent, and they have profoundly changed our lives. For us as software engineers, this means our jobs are evolving rapidly. We now have many tools at our disposal, from automated code reviews to smart coding agents, that assist us in our daily tasks.
Considering that the ecosystem is huge and changing rapidly, it's easy to see how AI and LLMs will be part of our lives for many years to come. Therefore, it's important to have a clear understanding of the capabilities of these tools to get the most out of them.
For these reasons, over the last couple of months I took a course on LLM Engineering on Udemy to start understanding LLMs: how they work, how to interact with them, and how to create a production-ready, cost-effective application.
When I'm learning something new, my typical approach is to build something to gain a better comprehension of the topic. During the course, I immediately realized what my first project could be: an LLM capable of returning food nutritional information. I've always been passionate about health and food, so this was an opportunity to build a capable assistant that could help me count my daily macronutrients and calories.
The main challenge is related to the cost of the application. While it would only take a few minutes to write the right prompt and get a proper answer from GPT, the OpenAI API has a cost. Moreover, there's no guarantee that LLMs have been pre-trained on accurate nutritional facts. This led to two main strategies to consider:
- Retrieval-Augmented Generation (RAG): enrich the prompt at query time with nutritional data retrieved from an external source.
- Fine-tuning: further train an existing model on a dataset of nutritional facts.
Both solutions come with advantages and disadvantages. A RAG system is easier to implement and its data source is simpler to update, but its answers strictly depend on the quality of the retrieved data. On the other hand, a fine-tuned model could be more accurate, but training requires a lot of time and resources, and updating the model with new data is much harder.
Considering that the solution should have the least possible impact on costs and resources, I decided to implement a RAG.
Before starting, it's important to find the right dataset for the project. After some research, I decided to adopt a Kaggle dataset that contained all the information I needed. For each food, considering a 100g serving, it provides information about:
- calories
- proteins
- carbohydrates
- fats
- sugars
- fibers
The dataset didn't need any preprocessing other than selecting and renaming the relevant columns, as sketched below. At this point, the dataset was ready. If you want to see how this was done, you can refer to the relevant Jupyter Notebook.
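As a rough sketch of that preprocessing step (the file name and the original column names are illustrative, not the actual ones from the Kaggle dataset):

import pandas as pd

# Load the Kaggle CSV; the filename is a placeholder
df = pd.read_csv("nutrition.csv")

# Keep only the columns the RAG needs, per 100g serving
df = df[["name", "calories", "total_fat", "protein", "carbohydrate", "sugars", "fiber"]]

# Rename them to the field names used in the rest of the project
df = df.rename(columns={
    "total_fat": "fats",
    "protein": "proteins",
    "carbohydrate": "carbohydrates",
    "fiber": "fibers",
})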
In a RAG system, a prompt is enriched with context from an external data source. This data is converted into numerical vectors, a process called vectorization. These vectors are then stored and retrieved, allowing the system to understand and work with the data based on its semantic meaning rather than just keywords.
There are many options available when choosing the right vector store: Chroma, FAISS, Pinecone, and MongoDB Atlas Vector Search, among others.
I decided to go with MongoDB. Its free tier storage (512MB) is more than enough for my use case (in the end, I only stored 36MB of data), the documentation is excellent, and it's really easy to integrate. At this point, all that's needed is:
- a free Atlas cluster with a database and a collection to store the documents;
- an embedding for each document;
- a Vector Search index to query them.
For some spoilers, you can refer to the relevant Jupyter Notebook, where I explain how everything works.
First of all, the documents we have need to be transformed into vectors, so each row of the dataset is encoded using a SentenceTransformer.
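As a minimal sketch, assuming the all-MiniLM-L6-v2 model (which outputs 384-dimensional vectors, matching the numDimensions of the index defined later) and illustrative field names:

from sentence_transformers import SentenceTransformer

# all-MiniLM-L6-v2 produces 384-dimensional embeddings
model = SentenceTransformer("all-MiniLM-L6-v2")

def to_document(row: dict) -> dict:
    # Describe each food as a short sentence, then embed it
    text = (
        f"{row['name']}: {row['calories']} kcal, {row['proteins']}g proteins, "
        f"{row['carbohydrates']}g carbohydrates, {row['fats']}g fats per 100g"
    )
    return {**row, "embedding": model.encode(text).tolist()}

# rows is the list of records from the preprocessed dataset
docs_to_insert = [to_document(row) for row in rows]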
With bigger documents, an important step is to split them into chunks with an overlapping window, so that context spanning a chunk boundary isn't lost at retrieval time. However, since these documents are small, that step isn't needed here; for larger documents it could look like the sliding window sketched below.
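For reference, a plain-Python sliding window (the sizes here are arbitrary):

def chunk(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    # Each chunk shares `overlap` characters with the previous one,
    # so content near a boundary appears intact in at least one chunk
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]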
To populate the collection, the following code is needed:
from pymongo import MongoClient

# Connect to the Atlas cluster (uri is the Atlas connection string)
client = MongoClient(uri)
collection = client[DB_NAME][COLLECTION_NAME]

# Bulk-insert the embedded documents
result = collection.insert_many(docs_to_insert)
To query our RAG database, a Vector Search Index is created inside MongoDB. As the Atlas documentation explains, this is how it's created:
from pymongo.operations import SearchIndexModel

index_name = "vector_index"

search_index_model = SearchIndexModel(
    definition={
        "fields": [
            {
                "type": "vector",
                "numDimensions": 384,
                "path": "embedding",
                "similarity": "cosine",
            }
        ]
    },
    name=index_name,
    type="vectorSearch",
)

collection.create_search_index(model=search_index_model)
Note that the index needs to know the vector dimension (384 in this example) and the similarity metric (cosine similarity in this case) to query the data effectively.
The implementation of the concepts discussed earlier can be found in this notebook.
The RAG process queries MongoDB with a given food item and passes the results to the gpt-4o-mini model. As requested in the prompt, the model returns a JSON object, which can easily be consumed by a RESTful service. For example, an input of "salmon fish" produces the following output:
{
  "protein": 22.56,
  "carbohydrates": 0,
  "fats": 5.57,
  "calories": 140,
  "sugars": 0,
  "fibers": 0
}
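As a sketch of how retrieval and generation could be wired together (the pipeline shape is the standard Atlas $vectorSearch stage; the prompts and helper names are my own, and model and collection are the objects from the earlier snippets):

from openai import OpenAI

def retrieve_context(query: str, limit: int = 5) -> list[dict]:
    # Embed the query with the same model used at ingestion time,
    # then run a $vectorSearch against the index created above
    pipeline = [
        {
            "$vectorSearch": {
                "index": "vector_index",
                "path": "embedding",
                "queryVector": model.encode(query).tolist(),
                "numCandidates": 100,
                "limit": limit,
            }
        },
        {"$project": {"embedding": 0, "_id": 0}},
    ]
    return list(collection.aggregate(pipeline))

def nutritional_facts(food: str, api_key: str | None = None) -> str:
    # Build the prompt from the retrieved documents and ask for JSON back
    context = retrieve_context(food)
    openai_client = OpenAI(api_key=api_key)  # falls back to OPENAI_API_KEY
    response = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},
        messages=[
            {
                "role": "system",
                "content": "Using only the provided context, return the "
                           "nutritional facts per 100g as a JSON object.",
            },
            {"role": "user", "content": f"Context: {context}\n\nFood: {food}"},
        ],
    )
    return response.choices[0].message.content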
You may have thought this was the end of the article, but the main point of all this was to understand how a production-ready application could actually be released and used in the real world.
To achieve this goal, I used modal.com, a serverless compute platform designed for AI and Machine Learning applications. Their free tier (between $5 and $30 of free credits per month) was enough for my experiments, so it was good to go.
Here is the final outcome: a service easily deployed on the internet and ready to be used by a nutrition application. All I did here was take the code blocks from the notebooks and put everything together using Modal's APIs and FastAPI conventions, as sketched below.
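A condensed sketch of that glue code, assuming a recent Modal version where @modal.fastapi_endpoint is available (the app name and image contents are illustrative, and nutritional_facts is the helper from the previous sketch):

import modal
from fastapi import Header

app = modal.App("nutritional-rag-service")

# Container image with the dependencies used in the notebooks
image = modal.Image.debian_slim().pip_install(
    "fastapi[standard]", "pymongo", "sentence-transformers", "openai"
)

@app.function(image=image)
@modal.fastapi_endpoint(method="POST")
def nutritional_rag_service(body: dict, x_open_ai_api_key: str = Header(...)):
    # FastAPI maps this parameter to the X-Open-AI-Api-Key request header
    # (header matching is case-insensitive, underscores become hyphens)
    return nutritional_facts(body["description"], api_key=x_open_ai_api_key)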
In this case, to avoid exposing my OpenAI API Key, I added a header called X-Open-AI-Api-Key that anyone can fill with their own API key. This is an example request that returns the JSON shown above:
curl --location 'https://ibbus93--nutritional-rag-service-openai-nutritionalragse-f9d7ea.modal.run' \
--header 'Content-Type: application/json' \
--header 'X-Open-AI-Api-Key: sk-proj-your-api-key' \
--data '{
"description": "Salmon fish"
}'
In any case, for more details about the implementation and the example, please refer to my project repository.
I've spent weeks studying the world of LLMs on Udemy and applying the concepts to this fun RAG project. With the power LLMs provide, the possible applications are endless, limited only by human imagination.
In addition to the OpenAI-based RAG described here, I've also tested the application with open-source models like Llama and Qwen. This approach could potentially reduce costs compared to using OpenAI's API, but it would require introducing several new concepts (like transformers and tokenizers). I also believe the solution's accuracy can be slightly improved, so I'll save that discussion for a future article.
If you've made it this far, thank you for reading. I appreciate your attention! :)