[{"data":1,"prerenderedAt":2694},["ShallowReactive",2],{"articles-list-en":3},[4,1357,2151],{"id":5,"title":6,"alt":7,"body":8,"date":1342,"description":1343,"extension":1218,"image":1344,"locale":1345,"meta":1346,"navigation":373,"path":1347,"seo":1348,"stem":1349,"tags":1350,"__hash__":1356},"articles/en/articles/3.open-source-rag-quantization.md","Adopting open-source models for cost-effective LLM solutions","Nutritional RAG LLM using open source models",{"type":9,"value":10,"toc":1330},"minimark",[11,16,26,29,45,49,52,55,58,165,175,179,225,245,249,258,262,269,273,295,299,307,328,538,542,550,553,648,651,718,721,787,800,864,867,978,982,985,1035,1038,1138,1142,1145,1205,1208,1211,1214,1278,1285,1289,1298,1301,1305,1326],[12,13,15],"h3",{"id":14},"context","Context",[17,18,19,20,25],"p",{},"In my ",[21,22,24],"a",{"href":23},"./a-nutritional-llm-assistant-using-rag","previous article",", I introduced the purpose of my studies: to understand how to leverage LLMs for creating cost-effective, production-ready solutions. Although the initial solution was functional, it was not cost-effective, since it relied on expensive REST requests to OpenAI APIs. To address this, I started to explore open-source models that are small enough to be easily deployed and adopted in a real-world application.",[17,27,28],{},"In the following paragraphs, I will describe my journey, including:",[30,31,32,36,39,42],"ul",{},[33,34,35],"li",{},"choosing the right model that fits my needs",[33,37,38],{},"applying quantization to optimize the model",[33,40,41],{},"forcing the output to be in a specific JSON format",[33,43,44],{},"a comparison between a quantized 7B-parameter model and a non-quantized 1B-parameter model",[12,46,48],{"id":47},"model-choice","Model choice",[17,50,51],{},"When choosing a model, one of the most important factors to consider is the number of parameters it has. 
Models trained with a higher number of parameters, often in the billions, are generally more powerful and can perform a wider range of tasks, but they also require more computational resources, which can lead to higher costs and slower execution times.",[17,53,54],{},"For this project, the main goal was to create a cost-effective solution, so I decided to focus on smaller models with a lower number of parameters. These models are easier to deploy and can be run on less expensive hardware, making them a more practical choice for real-world applications where budget is a concern.",[17,56,57],{},"The following table shows that, as the number of parameters grows, the execution time and the required RAM and disk space grow as well.",[59,60,61,94],"table",{},[62,63,64],"thead",{},[65,66,67,74,79,84,89],"tr",{},[68,69,70],"th",{},[71,72,73],"strong",{},"LLM Model",[68,75,76],{},[71,77,78],{},"Parameters (Billions)",[68,80,81],{},[71,82,83],{},"VRAM Space (Inference, Est.)",[68,85,86],{},[71,87,88],{},"Disk Space (Est.)",[68,90,91],{},[71,92,93],{},"Execution Time (Latency)",[95,96,97,114,131,148],"tbody",{},[65,98,99,103,106,109,111],{},[100,101,102],"td",{},"Llama 3.1 Instruct 1B",[100,104,105],{},"1 B",[100,107,108],{},"~ 2.3 GB",[100,110,108],{},[100,112,113],{},"Very Low. Ideal for lightweight inference on basic GPUs or CPUs.",[65,115,116,119,122,125,128],{},[100,117,118],{},"Qwen 2.5 Instruct 7B",[100,120,121],{},"7 B",[100,123,124],{},"~ 5 GB (4-bit) / ~ 15.2 GB (BF16)",[100,126,127],{},"~ 7-13 GB",[100,129,130],{},"Low/Medium. Similar efficiency to Llama 8B, runnable on a single >= 16 GB GPU.",[65,132,133,136,139,142,145],{},[100,134,135],{},"Llama 3.1 Instruct 8B",[100,137,138],{},"8 B",[100,140,141],{},"~ 4.9 GB (Q4_K_M) / ~ 16 GB (FP16)",[100,143,144],{},"~ 5-16 GB",[100,146,147],{},"Low/Medium. 
Highly efficient, runnable on a single mid-range GPU.",[65,149,150,153,156,159,162],{},[100,151,152],{},"Llama 3.1 Instruct 70B",[100,154,155],{},"70 B",[100,157,158],{},"~ 40 GB (Q4_K_M) / 141 GB (FP16)",[100,160,161],{},"~ 40-141 GB",[100,163,164],{},"Medium/High. Requires high-end GPUs or multiple GPUs.",[17,166,167,168,171,172,174],{},"For research purposes, and to push the application to its limits, I've decided to use ",[169,170,102],"code",{}," and ",[169,173,118],{},".",[12,176,178],{"id":177},"model-quantization","Model quantization",[17,180,181,182,185,186,185,189,185,192,185,195,198,199,202,203,208,209,212,213,216,217,220,221,224],{},"The table above and model repositories often mention terms like ",[169,183,184],{},"Q3_K_L",", ",[169,187,188],{},"Q4_K_M",[169,190,191],{},"BF16",[169,193,194],{},"FP16",[169,196,197],{},"4 bits",", and ",[169,200,201],{},"8 bits",", which all refer to quantization techniques. ",[21,204,207],{"href":205,"target":206},"https://huggingface.co/docs/transformers/main/quantization/overview","_blank","Quantization"," is a method for shrinking the memory and processing power required by LLMs. It works by converting the model's parameters from their standard ",[169,210,211],{},"32-bit"," floating-point format into lower-precision types, like ",[169,214,215],{},"8-bit"," or even ",[169,218,219],{},"4-bit"," integers. These parameters, also known as ",[71,222,223],{},"weights",", are the numerical values that the model learns during training and that define its behavior.",[17,226,227,228,230,231,234,235,238,239,241,242,244],{},"A standard, unquantized model delivers maximum accuracy but demands substantial memory and computational power, as shown in the table. By applying a ",[169,229,219],{}," quantization to a model like ",[169,232,233],{},"Qwen 7B"," or ",[169,236,237],{},"Llama 8B",", high-precision values are mapped to a more compact, lower-precision range. 
Although this can slightly reduce accuracy, the trade-off is a significantly smaller memory footprint and faster performance, which is great when either deploying models on resource-constrained devices or trying to contain costs. In line with the research purpose, I applied ",[169,240,219],{}," quantization to the ",[169,243,233],{}," model, which reduced the memory footprint from the original 15 GB to around 5 GB.",[12,246,248],{"id":247},"transformers-and-tokenization","Transformers and tokenization",[17,250,251,252,171,255,174],{},"Before getting our hands dirty, it's important to have full context on what you will read in the next lines of code. For this reason, I need to introduce two core concepts, two pillars behind LLMs: ",[71,253,254],{},"transformers",[71,256,257],{},"tokenizers",[259,260,261],"h4",{"id":254},"Transformers",[17,263,264,265,268],{},"At the heart of LLMs are ",[21,266,254],{"href":267,"target":206},"https://developers.google.com/machine-learning/crash-course/llm/transformers",", a type of neural network architecture that excels at handling sequential data, such as text. Transformers are composed of two main parts: an encoder and a decoder. The encoder processes the input text and creates a numerical representation of it, while the decoder uses this representation to generate the output text.",[259,270,272],{"id":271},"tokenizer","Tokenizer",[17,274,275,276,279,280,283,284,287,288,290,291,174],{},"Before a transformer can process text, the text must be converted into a format that the model can understand: this is where ",[71,277,278],{},"tokenization"," comes in. Tokenization is the process of breaking down a piece of text into smaller units, called ",[71,281,282],{},"tokens",", that can be words, subwords, or even individual characters.",[285,286],"br",{},"\nA ",[71,289,271],{}," is the tool responsible for performing this tokenization. 
The tokenizer has a vocabulary of all the tokens that the model knows, and it maps each token to a unique numerical ID. This sequence of IDs is then what is fed into the transformer model. You can find more information about tokenization ",[21,292,294],{"href":293,"target":206},"https://huggingface.co/learn/llm-course/chapter2/4","on HuggingFace",[12,296,298],{"id":297},"applying-quantization-to-qwen","Applying quantization to Qwen",[17,300,301,302,306],{},"Now that we have all the pieces, we can finally compose our puzzle. I've created a proper ",[21,303,305],{"href":304,"target":206},"https://colab.research.google.com/drive/1SPlXn66dBq8jSx3RdqEPwQQw9NttRzN3?usp=sharing","Google Colab Notebook","; you can easily run the following steps from there and better understand how everything works.",[17,308,309,310,313,314,317,318,320,321,324,325,327],{},"First, we need to create a quantization configuration. To do this, I've used a configuration class called ",[71,311,312],{},"BitsAndBytesConfig",", but there are ",[21,315,316],{"href":205,"target":206},"many alternatives"," available for the same purpose, depending on the quantization method adopted and the supported processing unit (CPU, GPU).",[285,319],{},"\nWith this configuration, we can then initialize ",[71,322,323],{},"a quantized model"," and the ",[71,326,271],{}," obtained from the base model:",[329,330,335],"pre",{"className":331,"code":332,"language":333,"meta":334,"style":334},"language-python shiki shiki-themes material-theme-lighter material-theme material-theme-palenight","from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig\n\nBASE_MODEL = \"qwen/Qwen2.5-7B-Instruct\"\n\n# Create the configuration\nquantization_config = BitsAndBytesConfig(load_in_4bit=True)\n\n# Load the model and apply quantization \nquantized_model = AutoModelForCausalLM.from_pretrained(\n  BASE_MODEL, \n  device_map=\"auto\", # This allows using CUDA if available\n  quantization_config=quantization_config\n)\n\n# Load the model 
tokenizer\ntokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)\n","python","",[169,336,337,368,375,394,399,406,428,433,439,457,468,489,500,506,511,517],{"__ignoreMap":334},[338,339,342,346,350,353,356,360,363,365],"span",{"class":340,"line":341},"line",1,[338,343,345],{"class":344},"s7zQu","from",[338,347,349],{"class":348},"sTEyZ"," transformers ",[338,351,352],{"class":344},"import",[338,354,355],{"class":348}," AutoTokenizer",[338,357,359],{"class":358},"sMK4o",",",[338,361,362],{"class":348}," AutoModelForCausalLM",[338,364,359],{"class":358},[338,366,367],{"class":348}," BitsAndBytesConfig\n",[338,369,371],{"class":340,"line":370},2,[338,372,374],{"emptyLinePlaceholder":373},true,"\n",[338,376,378,381,384,387,391],{"class":340,"line":377},3,[338,379,380],{"class":348},"BASE_MODEL ",[338,382,383],{"class":358},"=",[338,385,386],{"class":358}," \"",[338,388,390],{"class":389},"sfazB","qwen/Qwen2.5-7B-Instruct",[338,392,393],{"class":358},"\"\n",[338,395,397],{"class":340,"line":396},4,[338,398,374],{"emptyLinePlaceholder":373},[338,400,402],{"class":340,"line":401},5,[338,403,405],{"class":404},"sHwdD","# Create the configuration\n",[338,407,409,412,414,418,421,425],{"class":340,"line":408},6,[338,410,411],{"class":348},"quantization_config ",[338,413,383],{"class":358},[338,415,417],{"class":416},"s2Zo4"," BitsAndBytesConfig",[338,419,420],{"class":358},"(",[338,422,424],{"class":423},"sHdIc","load_in_4bit",[338,426,427],{"class":358},"=True)\n",[338,429,431],{"class":340,"line":430},7,[338,432,374],{"emptyLinePlaceholder":373},[338,434,436],{"class":340,"line":435},8,[338,437,438],{"class":404},"# Load the model and apply quantization \n",[338,440,442,445,447,449,451,454],{"class":340,"line":441},9,[338,443,444],{"class":348},"quantized_model 
",[338,446,383],{"class":358},[338,448,362],{"class":348},[338,450,174],{"class":358},[338,452,453],{"class":416},"from_pretrained",[338,455,456],{"class":358},"(\n",[338,458,460,463,465],{"class":340,"line":459},10,[338,461,462],{"class":416},"  BASE_MODEL",[338,464,359],{"class":358},[338,466,467],{"class":416}," \n",[338,469,471,474,476,479,482,484,486],{"class":340,"line":470},11,[338,472,473],{"class":423},"  device_map",[338,475,383],{"class":358},[338,477,478],{"class":358},"\"",[338,480,481],{"class":389},"auto",[338,483,478],{"class":358},[338,485,359],{"class":358},[338,487,488],{"class":404}," # This allows using CUDA if available\n",[338,490,492,495,497],{"class":340,"line":491},12,[338,493,494],{"class":423},"  quantization_config",[338,496,383],{"class":358},[338,498,499],{"class":416},"quantization_config\n",[338,501,503],{"class":340,"line":502},13,[338,504,505],{"class":358},")\n",[338,507,509],{"class":340,"line":508},14,[338,510,374],{"emptyLinePlaceholder":373},[338,512,514],{"class":340,"line":513},15,[338,515,516],{"class":404},"# Load the model tokenizer\n",[338,518,520,523,525,527,529,531,533,536],{"class":340,"line":519},16,[338,521,522],{"class":348},"tokenizer ",[338,524,383],{"class":358},[338,526,355],{"class":348},[338,528,174],{"class":358},[338,530,453],{"class":416},[338,532,420],{"class":358},[338,534,535],{"class":416},"BASE_MODEL",[338,537,505],{"class":358},[12,539,541],{"id":540},"outlines-for-structured-outputs","Outlines for structured outputs",[17,543,544,545,549],{},"To force the LLM to return a structured output, I used ",[21,546,548],{"href":547,"target":206},"https://github.com/dottxt-ai/outlines","Outlines",", a library that guides the model's output to conform to a specific structure, such as a JSON schema or a Pydantic model.",[17,551,552],{},"To achieve this, first define a Pydantic model:",[329,554,556],{"className":331,"code":555,"language":333,"meta":334,"style":334},"from pydantic import BaseModel\n\nclass 
Food(BaseModel):\n  protein: float\n  carbohydrates: float\n  fats: float\n  calories: float\n  sugar: float\n  fiber: float\n",[169,557,558,570,574,592,603,612,621,630,639],{"__ignoreMap":334},[338,559,560,562,565,567],{"class":340,"line":341},[338,561,345],{"class":344},[338,563,564],{"class":348}," pydantic ",[338,566,352],{"class":344},[338,568,569],{"class":348}," BaseModel\n",[338,571,572],{"class":340,"line":370},[338,573,374],{"emptyLinePlaceholder":373},[338,575,576,580,584,586,589],{"class":340,"line":377},[338,577,579],{"class":578},"spNyl","class",[338,581,583],{"class":582},"sBMFI"," Food",[338,585,420],{"class":358},[338,587,588],{"class":582},"BaseModel",[338,590,591],{"class":358},"):\n",[338,593,594,597,600],{"class":340,"line":396},[338,595,596],{"class":348},"  protein",[338,598,599],{"class":358},":",[338,601,602],{"class":582}," float\n",[338,604,605,608,610],{"class":340,"line":401},[338,606,607],{"class":348},"  carbohydrates",[338,609,599],{"class":358},[338,611,602],{"class":582},[338,613,614,617,619],{"class":340,"line":408},[338,615,616],{"class":348},"  fats",[338,618,599],{"class":358},[338,620,602],{"class":582},[338,622,623,626,628],{"class":340,"line":430},[338,624,625],{"class":348},"  calories",[338,627,599],{"class":358},[338,629,602],{"class":582},[338,631,632,635,637],{"class":340,"line":435},[338,633,634],{"class":348},"  sugar",[338,636,599],{"class":358},[338,638,602],{"class":582},[338,640,641,644,646],{"class":340,"line":441},[338,642,643],{"class":348},"  fiber",[338,645,599],{"class":358},[338,647,602],{"class":582},[17,649,650],{},"Then, create a generator that will return a structured output once the request is fulfilled by the LLM:",[329,652,654],{"className":331,"code":653,"language":333,"meta":334,"style":334},"from outlines import from_transformers, Generator\n\ngenerator = Generator(\n  from_transformers(quantized_model, tokenizer), \n  
Food\n)\n",[169,655,656,673,677,689,709,714],{"__ignoreMap":334},[338,657,658,660,663,665,668,670],{"class":340,"line":341},[338,659,345],{"class":344},[338,661,662],{"class":348}," outlines ",[338,664,352],{"class":344},[338,666,667],{"class":348}," from_transformers",[338,669,359],{"class":358},[338,671,672],{"class":348}," Generator\n",[338,674,675],{"class":340,"line":370},[338,676,374],{"emptyLinePlaceholder":373},[338,678,679,682,684,687],{"class":340,"line":377},[338,680,681],{"class":348},"generator ",[338,683,383],{"class":358},[338,685,686],{"class":416}," Generator",[338,688,456],{"class":358},[338,690,691,694,696,699,701,704,707],{"class":340,"line":396},[338,692,693],{"class":416},"  from_transformers",[338,695,420],{"class":358},[338,697,698],{"class":416},"quantized_model",[338,700,359],{"class":358},[338,702,703],{"class":416}," tokenizer",[338,705,706],{"class":358},"),",[338,708,467],{"class":416},[338,710,711],{"class":340,"line":401},[338,712,713],{"class":416},"  Food\n",[338,715,716],{"class":340,"line":408},[338,717,505],{"class":358},[17,719,720],{},"Finally, we can use the generator to get a structured output from a prompt:",[329,722,724],{"className":331,"code":723,"language":333,"meta":334,"style":334},"prompt = \"\"\"\n  Get the nutritional data of the following food ingredient: **salmon fish**.\n  Use the following context: ...\n\"\"\"\nresult = generator(\n  prompt, \n  max_new_tokens=200\n)\n",[169,725,726,736,741,746,751,763,772,783],{"__ignoreMap":334},[338,727,728,731,733],{"class":340,"line":341},[338,729,730],{"class":348},"prompt ",[338,732,383],{"class":358},[338,734,735],{"class":358}," \"\"\"\n",[338,737,738],{"class":340,"line":370},[338,739,740],{"class":389},"  Get the nutritional data of the following food ingredient: **salmon fish**.\n",[338,742,743],{"class":340,"line":377},[338,744,745],{"class":389},"  Use the following context: 
...\n",[338,747,748],{"class":340,"line":396},[338,749,750],{"class":358},"\"\"\"\n",[338,752,753,756,758,761],{"class":340,"line":401},[338,754,755],{"class":348},"result ",[338,757,383],{"class":358},[338,759,760],{"class":416}," generator",[338,762,456],{"class":358},[338,764,765,768,770],{"class":340,"line":408},[338,766,767],{"class":416},"  prompt",[338,769,359],{"class":358},[338,771,467],{"class":416},[338,773,774,777,779],{"class":340,"line":430},[338,775,776],{"class":423},"  max_new_tokens",[338,778,383],{"class":358},[338,780,782],{"class":781},"sbssI","200\n",[338,784,785],{"class":340,"line":435},[338,786,505],{"class":358},[17,788,789,790,794,795,799],{},"I've deployed a ",[21,791,793],{"href":792,"target":206},"https://github.com/federicoibba/nutritional-information-rag/blob/main/services/qwen.py","Qwen service"," on ",[21,796,798],{"href":797,"target":206},"https://modal.com","modal"," for testing, and a proper response can be obtained with the following cURL:",[329,801,805],{"className":802,"code":803,"language":804,"meta":334,"style":334},"curl --location 'https://ibbus93--nutritional-rag-service-qwen-nutritionalragserv-7ae00e.modal.run/' \\\n--header 'Content-Type: application/json' \\\n--data '{\n    \"description\": \"Salmon fish\"\n}'\n","bash",[169,806,807,827,841,851,856],{"__ignoreMap":334},[338,808,809,812,815,818,821,824],{"class":340,"line":341},[338,810,811],{"class":582},"curl",[338,813,814],{"class":389}," --location",[338,816,817],{"class":358}," '",[338,819,820],{"class":389},"https://ibbus93--nutritional-rag-service-qwen-nutritionalragserv-7ae00e.modal.run/",[338,822,823],{"class":358},"'",[338,825,826],{"class":348}," \\\n",[338,828,829,832,834,837,839],{"class":340,"line":370},[338,830,831],{"class":348},"--header ",[338,833,823],{"class":358},[338,835,836],{"class":389},"Content-Type: 
application/json",[338,838,823],{"class":358},[338,840,826],{"class":348},[338,842,843,846,848],{"class":340,"line":377},[338,844,845],{"class":348},"--data ",[338,847,823],{"class":358},[338,849,850],{"class":389},"{\n",[338,852,853],{"class":340,"line":396},[338,854,855],{"class":389},"    \"description\": \"Salmon fish\"\n",[338,857,858,861],{"class":340,"line":401},[338,859,860],{"class":389},"}",[338,862,863],{"class":358},"'\n",[17,865,866],{},"Which will lead to this result:",[329,868,872],{"className":869,"code":870,"language":871,"meta":334,"style":334},"language-json shiki shiki-themes material-theme-lighter material-theme material-theme-palenight","{\n  \"protein\": 22.56,\n  \"carbohydrates\": 0.0,\n  \"fats\": 5.57,\n  \"calories\": 140,\n  \"sugar\": 0.0,\n  \"fiber\": 0.0\n}\n","json",[169,873,874,878,896,912,928,944,959,973],{"__ignoreMap":334},[338,875,876],{"class":340,"line":341},[338,877,850],{"class":358},[338,879,880,883,886,888,890,893],{"class":340,"line":370},[338,881,882],{"class":358},"  \"",[338,884,885],{"class":578},"protein",[338,887,478],{"class":358},[338,889,599],{"class":358},[338,891,892],{"class":781}," 22.56",[338,894,895],{"class":358},",\n",[338,897,898,900,903,905,907,910],{"class":340,"line":377},[338,899,882],{"class":358},[338,901,902],{"class":578},"carbohydrates",[338,904,478],{"class":358},[338,906,599],{"class":358},[338,908,909],{"class":781}," 0.0",[338,911,895],{"class":358},[338,913,914,916,919,921,923,926],{"class":340,"line":396},[338,915,882],{"class":358},[338,917,918],{"class":578},"fats",[338,920,478],{"class":358},[338,922,599],{"class":358},[338,924,925],{"class":781}," 5.57",[338,927,895],{"class":358},[338,929,930,932,935,937,939,942],{"class":340,"line":401},[338,931,882],{"class":358},[338,933,934],{"class":578},"calories",[338,936,478],{"class":358},[338,938,599],{"class":358},[338,940,941],{"class":781}," 
140",[338,943,895],{"class":358},[338,945,946,948,951,953,955,957],{"class":340,"line":408},[338,947,882],{"class":358},[338,949,950],{"class":578},"sugar",[338,952,478],{"class":358},[338,954,599],{"class":358},[338,956,909],{"class":781},[338,958,895],{"class":358},[338,960,961,963,966,968,970],{"class":340,"line":430},[338,962,882],{"class":358},[338,964,965],{"class":578},"fiber",[338,967,478],{"class":358},[338,969,599],{"class":358},[338,971,972],{"class":781}," 0.0\n",[338,974,975],{"class":340,"line":435},[338,976,977],{"class":358},"}\n",[12,979,981],{"id":980},"llama-service","Llama service",[17,983,984],{},"I've deployed a Llama service as well, and it can be tested as shown below:",[329,986,988],{"className":802,"code":987,"language":804,"meta":334,"style":334},"curl --location 'https://ibbus93--nutritional-rag-service-llama-nutritionalragser-fc918b.modal.run/' \\\n--header 'Content-Type: application/json' \\\n--data '{\n    \"description\": \"Salmon fish\"\n}'\n",[169,989,990,1005,1017,1025,1029],{"__ignoreMap":334},[338,991,992,994,996,998,1001,1003],{"class":340,"line":341},[338,993,811],{"class":582},[338,995,814],{"class":389},[338,997,817],{"class":358},[338,999,1000],{"class":389},"https://ibbus93--nutritional-rag-service-llama-nutritionalragser-fc918b.modal.run/",[338,1002,823],{"class":358},[338,1004,826],{"class":348},[338,1006,1007,1009,1011,1013,1015],{"class":340,"line":370},[338,1008,831],{"class":348},[338,1010,823],{"class":358},[338,1012,836],{"class":389},[338,1014,823],{"class":358},[338,1016,826],{"class":348},[338,1018,1019,1021,1023],{"class":340,"line":377},[338,1020,845],{"class":348},[338,1022,823],{"class":358},[338,1024,850],{"class":389},[338,1026,1027],{"class":340,"line":396},[338,1028,855],{"class":389},[338,1030,1031,1033],{"class":340,"line":401},[338,1032,860],{"class":389},[338,1034,863],{"class":358},[17,1036,1037],{},"Even though the two services received the same input and used the same database, the Llama service 
returned a different response:",[329,1039,1041],{"className":869,"code":1040,"language":871,"meta":334,"style":334},"{\n  \"protein\": 23.19,\n  \"carbohydrates\": 0.0,\n  \"fats\": 12.95,\n  \"calories\": 209,\n  \"sugars\": 0.0,\n  \"fibre\": 0.0\n}\n",[169,1042,1043,1047,1062,1076,1091,1106,1121,1134],{"__ignoreMap":334},[338,1044,1045],{"class":340,"line":341},[338,1046,850],{"class":358},[338,1048,1049,1051,1053,1055,1057,1060],{"class":340,"line":370},[338,1050,882],{"class":358},[338,1052,885],{"class":578},[338,1054,478],{"class":358},[338,1056,599],{"class":358},[338,1058,1059],{"class":781}," 23.19",[338,1061,895],{"class":358},[338,1063,1064,1066,1068,1070,1072,1074],{"class":340,"line":377},[338,1065,882],{"class":358},[338,1067,902],{"class":578},[338,1069,478],{"class":358},[338,1071,599],{"class":358},[338,1073,909],{"class":781},[338,1075,895],{"class":358},[338,1077,1078,1080,1082,1084,1086,1089],{"class":340,"line":396},[338,1079,882],{"class":358},[338,1081,918],{"class":578},[338,1083,478],{"class":358},[338,1085,599],{"class":358},[338,1087,1088],{"class":781}," 12.95",[338,1090,895],{"class":358},[338,1092,1093,1095,1097,1099,1101,1104],{"class":340,"line":401},[338,1094,882],{"class":358},[338,1096,934],{"class":578},[338,1098,478],{"class":358},[338,1100,599],{"class":358},[338,1102,1103],{"class":781}," 209",[338,1105,895],{"class":358},[338,1107,1108,1110,1113,1115,1117,1119],{"class":340,"line":408},[338,1109,882],{"class":358},[338,1111,1112],{"class":578},"sugars",[338,1114,478],{"class":358},[338,1116,599],{"class":358},[338,1118,909],{"class":781},[338,1120,895],{"class":358},[338,1122,1123,1125,1128,1130,1132],{"class":340,"line":430},[338,1124,882],{"class":358},[338,1126,1127],{"class":578},"fibre",[338,1129,478],{"class":358},[338,1131,599],{"class":358},[338,1133,972],{"class":781},[338,1135,1136],{"class":340,"line":435},[338,1137,977],{"class":358},[12,1139,1141],{"id":1140},"model-comparison","Model 
comparison",[17,1143,1144],{},"Let's now compare the two models using the following table.",[59,1146,1147,1171],{},[62,1148,1149],{},[65,1150,1151,1156,1161,1166],{},[68,1152,1153],{},[71,1154,1155],{},"Model",[68,1157,1158],{},[71,1159,1160],{},"Execution time (5 runs)",[68,1162,1163],{},[71,1164,1165],{},"VRAM Memory footprint",[68,1167,1168],{},[71,1169,1170],{},"Accuracy",[95,1172,1173,1188],{},[65,1174,1175,1177,1180,1185],{},[100,1176,102],{},[100,1178,1179],{},"~ 2.85 seconds",[100,1181,1182],{},[71,1183,1184],{},"~ 2.4 GB",[100,1186,1187],{},"Questionable",[65,1189,1190,1192,1197,1200],{},[100,1191,118],{},[100,1193,1194],{},[71,1195,1196],{},"~ 1.72 seconds",[100,1198,1199],{},"~ 5 GB",[100,1201,1202],{},[71,1203,1204],{},"Pretty much accurate",[17,1206,1207],{},"As expected, Qwen has a larger memory footprint, but it also has a faster response time, which is a key factor for production applications.",[17,1209,1210],{},"Regarding accuracy, there is a noticeable discrepancy between the two models, both in the JSON schema and in the data they return.\nAs for the schema, models with fewer parameters are generally less reliable. 
In this case, the Llama model sometimes returned a different format between runs.",[17,1212,1213],{},"Regarding the data returned, the prompt used by both models is the following:",[329,1215,1219],{"className":1216,"code":1217,"language":1218,"meta":334,"style":334},"language-md shiki shiki-themes material-theme-lighter material-theme material-theme-palenight","Please use only the following context to answer the question.\n**Precedence Rule: Always choose the nutritional data for RAW foods if available.**\n\nGet the nutritional data of the following food ingredient: **Salmon fish**.\nCONTEXT OPTIONS:\nproduct name: FISH,SALMON,COHO (SILVER),RAW (ALASKA NATIVE), fat: 5.57, carbohydrates: 0.0, proteins: 22.56, calories: 140, sugars: 0.0, fiber: 0.0 \nproduct name: FISH,SALMON,RED,(SOCKEYE),KIPPERED (ALASKA NATIVE), fat: 4.75, carbohydrates: 0.0, proteins: 24.5, calories: 141, sugars: 0.0, fiber: 0.0 \nproduct name: FISH,SALMON,KING,W/ SKN,KIPPERED,(ALASKA NATIVE), fat: 12.95, carbohydrates: 0.0, proteins: 23.19, calories: 209, sugars: 0.0, fiber: 0.0\n","md",[169,1220,1221,1226,1239,1243,1258,1263,1268,1273],{"__ignoreMap":334},[338,1222,1223],{"class":340,"line":341},[338,1224,1225],{"class":348},"Please use only the following context to answer the question.\n",[338,1227,1228,1232,1236],{"class":340,"line":370},[338,1229,1231],{"class":1230},"sHepR","**",[338,1233,1235],{"class":1234},"so75L","Precedence Rule: Always choose the nutritional data for RAW foods if available.",[338,1237,1238],{"class":1230},"**\n",[338,1240,1241],{"class":340,"line":377},[338,1242,374],{"emptyLinePlaceholder":373},[338,1244,1245,1248,1250,1253,1255],{"class":340,"line":396},[338,1246,1247],{"class":348},"Get the nutritional data of the following food ingredient: ",[338,1249,1231],{"class":1230},[338,1251,1252],{"class":1234},"Salmon fish",[338,1254,1231],{"class":1230},[338,1256,1257],{"class":348},".\n",[338,1259,1260],{"class":340,"line":401},[338,1261,1262],{"class":348},"CONTEXT 
OPTIONS:\n",[338,1264,1265],{"class":340,"line":408},[338,1266,1267],{"class":348},"product name: FISH,SALMON,COHO (SILVER),RAW (ALASKA NATIVE), fat: 5.57, carbohydrates: 0.0, proteins: 22.56, calories: 140, sugars: 0.0, fiber: 0.0 \n",[338,1269,1270],{"class":340,"line":430},[338,1271,1272],{"class":348},"product name: FISH,SALMON,RED,(SOCKEYE),KIPPERED (ALASKA NATIVE), fat: 4.75, carbohydrates: 0.0, proteins: 24.5, calories: 141, sugars: 0.0, fiber: 0.0 \n",[338,1274,1275],{"class":340,"line":435},[338,1276,1277],{"class":348},"product name: FISH,SALMON,KING,W/ SKN,KIPPERED,(ALASKA NATIVE), fat: 12.95, carbohydrates: 0.0, proteins: 23.19, calories: 209, sugars: 0.0, fiber: 0.0\n",[17,1279,1280,1281,1284],{},"The data extracted from the database (hence from the adopted dataset) provide three different samples for the salmon, but you may notice that the first one is ",[71,1282,1283],{},"RAW"," and, while Qwen uses it, Llama usually ignores it.",[12,1286,1288],{"id":1287},"conclusions-and-future-challenges","Conclusions and future challenges",[17,1290,1291,1292,1294,1295,1297],{},"In conclusion, this research has demonstrated that it is possible to build cost-effective, production-ready LLM solutions by using open-source models and quantization.",[285,1293],{},"\nWhile smaller models might not always match the accuracy of their larger counterparts, they offer a significant advantage in terms of resource consumption and deployment flexibility.",[285,1296],{},"\nThe choice of the right model will always depend on the specific needs of the application, but with the right approach, it is possible to find a balance between performance and cost.",[17,1299,1300],{},"As a future challenge, it would be interesting to explore other quantization techniques and to fine-tune a smaller model on a specific domain to see if it is possible to improve its accuracy while keeping the resource consumption 
low.",[12,1302,1304],{"id":1303},"bibliography","Bibliography",[30,1306,1307,1315],{},[33,1308,1309],{},[21,1310,1314],{"href":1311,"rel":1312},"https://github.com/federicoibba/nutritional-information-rag/",[1313],"nofollow","Repository project",[33,1316,1317,1318,794,1322],{},"Article photo by ",[21,1319,1321],{"href":1320,"target":206},"https://unsplash.com/@kat_katerina?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText","Katerina",[21,1323,1325],{"href":1324,"target":206},"https://unsplash.com/photos/opened-brown-wooden-window-FQYCJSqER_0?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText","Unsplash",[1327,1328,1329],"style",{},"html pre.shiki code .s7zQu, html code.shiki .s7zQu{--shiki-light:#39ADB5;--shiki-light-font-style:italic;--shiki-default:#89DDFF;--shiki-default-font-style:italic;--shiki-dark:#89DDFF;--shiki-dark-font-style:italic}html pre.shiki code .sTEyZ, html code.shiki .sTEyZ{--shiki-light:#90A4AE;--shiki-default:#EEFFFF;--shiki-dark:#BABED8}html pre.shiki code .sMK4o, html code.shiki .sMK4o{--shiki-light:#39ADB5;--shiki-default:#89DDFF;--shiki-dark:#89DDFF}html pre.shiki code .sfazB, html code.shiki .sfazB{--shiki-light:#91B859;--shiki-default:#C3E88D;--shiki-dark:#C3E88D}html pre.shiki code .sHwdD, html code.shiki .sHwdD{--shiki-light:#90A4AE;--shiki-light-font-style:italic;--shiki-default:#546E7A;--shiki-default-font-style:italic;--shiki-dark:#676E95;--shiki-dark-font-style:italic}html pre.shiki code .s2Zo4, html code.shiki .s2Zo4{--shiki-light:#6182B8;--shiki-default:#82AAFF;--shiki-dark:#82AAFF}html pre.shiki code .sHdIc, html code.shiki .sHdIc{--shiki-light:#90A4AE;--shiki-light-font-style:italic;--shiki-default:#EEFFFF;--shiki-default-font-style:italic;--shiki-dark:#BABED8;--shiki-dark-font-style:italic}html .light .shiki span {color: var(--shiki-light);background: var(--shiki-light-bg);font-style: var(--shiki-light-font-style);font-weight: var(--shiki-light-font-weight);text-decoration: 
var(--shiki-light-text-decoration);}html.light .shiki span {color: var(--shiki-light);background: var(--shiki-light-bg);font-style: var(--shiki-light-font-style);font-weight: var(--shiki-light-font-weight);text-decoration: var(--shiki-light-text-decoration);}html .default .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: var(--shiki-dark-text-decoration);}html.dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: var(--shiki-dark-text-decoration);}html pre.shiki code .spNyl, html code.shiki .spNyl{--shiki-light:#9C3EDA;--shiki-default:#C792EA;--shiki-dark:#C792EA}html pre.shiki code .sBMFI, html code.shiki .sBMFI{--shiki-light:#E2931D;--shiki-default:#FFCB6B;--shiki-dark:#FFCB6B}html pre.shiki code .sbssI, html code.shiki .sbssI{--shiki-light:#F76D47;--shiki-default:#F78C6C;--shiki-dark:#F78C6C}html pre.shiki code .sHepR, html code.shiki .sHepR{--shiki-light:#39ADB5;--shiki-light-font-weight:bold;--shiki-default:#89DDFF;--shiki-default-font-weight:bold;--shiki-dark:#89DDFF;--shiki-dark-font-weight:bold}html pre.shiki code .so75L, html code.shiki 
.so75L{--shiki-light:#E53935;--shiki-light-font-weight:bold;--shiki-default:#F07178;--shiki-default-font-weight:bold;--shiki-dark:#F07178;--shiki-dark-font-weight:bold}",{"title":334,"searchDepth":370,"depth":370,"links":1331},[1332,1333,1334,1335,1336,1337,1338,1339,1340,1341],{"id":14,"depth":377,"text":15},{"id":47,"depth":377,"text":48},{"id":177,"depth":377,"text":178},{"id":247,"depth":377,"text":248},{"id":297,"depth":377,"text":298},{"id":540,"depth":377,"text":541},{"id":980,"depth":377,"text":981},{"id":1140,"depth":377,"text":1141},{"id":1287,"depth":377,"text":1288},{"id":1303,"depth":377,"text":1304},"2025-11-02T00:00:00.000Z","A Nutritional RAG LLM using open-source models and quantization","/images/articles/open-source-rag.jpg","en",{},"/en/articles/open-source-rag-quantization",{"title":6,"description":1343},"en/articles/3.open-source-rag-quantization",[1351,1352,1353,1354,1355],"LLM","RAG","Llama","Qwen","quantization","Lo2MXPzVpj-GzrH9kyYVbOxrI4Bf1UkEV8CmHuNpZ6M",{"id":1358,"title":1359,"alt":1360,"body":1361,"date":2140,"description":2141,"extension":1218,"image":2142,"locale":1345,"meta":2143,"navigation":373,"path":2144,"seo":2145,"stem":2146,"tags":2147,"__hash__":2150},"articles/en/articles/2.a-nutritional-llm-assistant-using-rag.md","A nutritional LLM assistant using RAG and OpenAI","Nutritional RAG LLM using OpenAI API",{"type":9,"value":1362,"toc":2130},[1363,1365,1372,1375,1383,1387,1390,1393,1408,1411,1414,1418,1426,1442,1449,1453,1460,1463,1493,1496,1504,1511,1515,1524,1527,1595,1599,1607,1842,1845,1849,1867,1964,1968,1993,2056,2062,2066,2075,2077,2127],[12,1364,15],{"id":14},[17,1366,1367,1371],{},[1368,1369,1370],"b",{},"Large Language Models"," and Generative AI are becoming increasingly prevalent, and they have profoundly changed our lives. For us as software engineers, this means our jobs are evolving rapidly. 
We now have many tools at our disposal, from automated code reviews to smart coding agents, that assist us in our daily tasks.",[17,1373,1374],{},"Considering that the ecosystem is huge and changing rapidly, it's easy to see how AI and LLMs will be part of our lives for many years to come. Therefore, it's important to have a clear understanding of the capabilities of these tools to get the most out of them.",[17,1376,1377,1378,1382],{},"For these reasons, in the last couple of months, I took a course about LLM Engineering on ",[21,1379,1381],{"href":1380,"target":206},"https://www.udemy.com/course/llm-engineering-master-ai-and-large-language-models/","Udemy"," to begin to understand LLMs: how they work, how to interact with them, and how to create a production-ready and cost-effective application.",[12,1384,1386],{"id":1385},"the-project-and-the-challenges","The project and the challenges",[17,1388,1389],{},"When I'm learning something new, my typical approach is to build something to gain a better comprehension of the topic. During the course, I immediately realized what my first project could be: an LLM capable of returning food nutritional information. I've always been passionate about health and food, so this was an opportunity to build a capable assistant that could help me count my daily macronutrients and calories.",[17,1391,1392],{},"The main challenge is related to the cost of the application. While it would only take a few minutes to write the right prompt and get a proper answer from GPT, the OpenAI API has a cost. Moreover, it's not guaranteed that LLMs have been pre-trained on nutritional facts. This led to two main choices when considering the right strategy:",[1394,1395,1396,1402],"ol",{},[33,1397,1398,1401],{},[1368,1399,1400],{},"Retrieval-Augmented Generation (RAG)",": this is a technique in which a prompt is enhanced with more context before it is sent to the LLM. 
In this way, the LLM can extract the right information needed by the user from a larger and enriched context;",[33,1403,1404,1407],{},[1368,1405,1406],{},"Fine-tuning",": this is the process of taking a pretrained model and specializing it for a specific domain.",[17,1409,1410],{},"Both solutions come with advantages and disadvantages. A RAG is easier to implement and its data source is simpler to update, but it strictly depends on the quality of the data retrieved. On the other hand, a fine-tuned model could be more accurate, but the training requires a lot of time and resources, and updating the model with new data is very difficult.",[17,1412,1413],{},"Considering that the solution should have the least possible impact on costs and resources, I decided to implement a RAG.",[12,1415,1417],{"id":1416},"dataset-preprocessing","Dataset preprocessing",[17,1419,1420,1421,1425],{},"Before starting, it's important to find the right dataset for the project. After some research, I decided to adopt a ",[21,1422,1424],{"href":1423,"target":206},"https://www.kaggle.com/datasets/shrutisaxena/food-nutrition-dataset","Kaggle dataset"," that contained all the information I needed. For each food, considering a 100g serving, it provides information about:",[30,1427,1428,1430,1433,1435,1437,1440],{},[33,1429,902],{},[33,1431,1432],{},"proteins",[33,1434,918],{},[33,1436,950],{},[33,1438,1439],{},"fibers",[33,1441,934],{},[17,1443,1444,1445,174],{},"The dataset didn't need any other preprocessing besides selecting and renaming the relevant columns. At this point, the dataset was ready. If you want to see how this was done, you can refer to the relevant ",[21,1446,1448],{"href":1447,"target":206},"https://github.com/federicoibba/nutritional-information-rag/blob/main/notebooks/0_dataset-food.ipynb","Jupyter Notebook",[12,1450,1452],{"id":1451},"vector-store","Vector store",[17,1454,1455,1456,1459],{},"In a RAG system, a prompt is enriched with context from an external data source. 
This data is converted into numerical vectors, a process called ",[1368,1457,1458],{},"vectorization",". These vectors are then stored and retrieved, allowing the system to understand and work with the data based on its semantic meaning rather than just keywords.",[17,1461,1462],{},"There are many options available when choosing the right vector store, some of which are:",[30,1464,1465,1481,1487],{},[33,1466,1467,1471,1472,234,1476,1480],{},[1468,1469,1470],"em",{},"self-hosted databases",": solutions like ",[21,1473,1475],{"href":1474,"target":206},"https://www.trychroma.com/","Chroma",[21,1477,1479],{"href":1478,"target":206},"https://milvus.io/","Milvus"," allow the creation of in-memory stores that facilitate prototyping;",[33,1482,1483,1486],{},[1468,1484,1485],{},"managed cloud services",": there are hosted platforms like MongoDB that offer a native vector search feature inside the cloud service;",[33,1488,1489,1492],{},[1468,1490,1491],{},"extensions for traditional databases",": databases like PostgreSQL and Redis offer integration to store and retrieve vectors.",[17,1494,1495],{},"I decided to go with MongoDB. Its free tier storage (512MB) is more than enough for my use case (in the end, I only stored 36MB of data), the documentation is excellent, and it's really easy to integrate. 
At this point, all that is needed is:",[30,1497,1498,1501],{},[33,1499,1500],{},"to populate the collection with the vectors;",[33,1502,1503],{},"to create the vector search index to retrieve the vectors.",[17,1505,1506,1507,1510],{},"For some spoilers, you can refer to the relevant ",[21,1508,1448],{"href":1509,"target":206},"https://github.com/federicoibba/nutritional-information-rag/blob/main/notebooks/1_create_vectorstore.ipynb",", where I explain how everything works.",[259,1512,1514],{"id":1513},"populate-the-collection","Populate the collection",[17,1516,1517,1518,174,1521,1523],{},"First of all, the documents we have need to be transformed into vectors, so each row of the dataset is encoded using a ",[71,1519,1520],{},"SentenceTransformer",[285,1522],{},"\nWith bigger documents, an important step is to split them into chunks, deciding on an overlapping window context to help with index retrieval. However, since the documents are small, this step is not needed here.",[17,1525,1526],{},"To populate the collection, the following code is needed:",[329,1528,1530],{"className":331,"code":1529,"language":333,"meta":334,"style":334},"client = MongoClient(uri)\ncollection = client[DB_NAME][COLLECTION_NAME]\nresult = collection.insert_many(docs_to_insert)\n",[169,1531,1532,1549,1574],{"__ignoreMap":334},[338,1533,1534,1537,1539,1542,1544,1547],{"class":340,"line":341},[338,1535,1536],{"class":348},"client ",[338,1538,383],{"class":358},[338,1540,1541],{"class":416}," MongoClient",[338,1543,420],{"class":358},[338,1545,1546],{"class":416},"uri",[338,1548,505],{"class":358},[338,1550,1551,1554,1556,1559,1562,1565,1568,1571],{"class":340,"line":370},[338,1552,1553],{"class":348},"collection ",[338,1555,383],{"class":358},[338,1557,1558],{"class":348}," 
client",[338,1560,1561],{"class":358},"[",[338,1563,1564],{"class":348},"DB_NAME",[338,1566,1567],{"class":358},"][",[338,1569,1570],{"class":348},"COLLECTION_NAME",[338,1572,1573],{"class":358},"]\n",[338,1575,1576,1578,1580,1583,1585,1588,1590,1593],{"class":340,"line":377},[338,1577,755],{"class":348},[338,1579,383],{"class":358},[338,1581,1582],{"class":348}," collection",[338,1584,174],{"class":358},[338,1586,1587],{"class":416},"insert_many",[338,1589,420],{"class":358},[338,1591,1592],{"class":416},"docs_to_insert",[338,1594,505],{"class":358},[259,1596,1598],{"id":1597},"create-the-database-index","Create the database index",[17,1600,1601,1602,1606],{},"To query our RAG database, a Vector Search Index is created inside MongoDB. As the ",[21,1603,1605],{"href":1604,"target":206},"https://www.mongodb.com/docs/atlas/atlas-vector-search/rag/#use-mongodb-vector-search-to-retrieve-documents.-4","Atlas documentation explains",", this is how it's created:",[329,1608,1610],{"className":331,"code":1609,"language":333,"meta":334,"style":334},"from pymongo.operations import SearchIndexModel\nimport time\n\nindex_name=\"vector_index\"\nsearch_index_model = SearchIndexModel(\n  definition = {\n    \"fields\": [\n      {\n        \"type\": \"vector\",\n        \"numDimensions\": 384,\n        \"path\": \"embedding\",\n        \"similarity\": \"cosine\"\n      }\n    ]\n  },\n  name = index_name,\n  type = \"vectorSearch\"\n)\ncollection.create_search_index(model=search_index_model)\n",[169,1611,1612,1629,1636,1640,1654,1666,1677,1692,1697,1718,1734,1754,1772,1777,1782,1787,1799,1814,1819],{"__ignoreMap":334},[338,1613,1614,1616,1619,1621,1624,1626],{"class":340,"line":341},[338,1615,345],{"class":344},[338,1617,1618],{"class":348}," pymongo",[338,1620,174],{"class":358},[338,1622,1623],{"class":348},"operations ",[338,1625,352],{"class":344},[338,1627,1628],{"class":348}," 
SearchIndexModel\n",[338,1630,1631,1633],{"class":340,"line":370},[338,1632,352],{"class":344},[338,1634,1635],{"class":348}," time\n",[338,1637,1638],{"class":340,"line":377},[338,1639,374],{"emptyLinePlaceholder":373},[338,1641,1642,1645,1647,1649,1652],{"class":340,"line":396},[338,1643,1644],{"class":348},"index_name",[338,1646,383],{"class":358},[338,1648,478],{"class":358},[338,1650,1651],{"class":389},"vector_index",[338,1653,393],{"class":358},[338,1655,1656,1659,1661,1664],{"class":340,"line":401},[338,1657,1658],{"class":348},"search_index_model ",[338,1660,383],{"class":358},[338,1662,1663],{"class":416}," SearchIndexModel",[338,1665,456],{"class":358},[338,1667,1668,1671,1674],{"class":340,"line":408},[338,1669,1670],{"class":423},"  definition",[338,1672,1673],{"class":358}," =",[338,1675,1676],{"class":358}," {\n",[338,1678,1679,1682,1685,1687,1689],{"class":340,"line":430},[338,1680,1681],{"class":358},"    \"",[338,1683,1684],{"class":389},"fields",[338,1686,478],{"class":358},[338,1688,599],{"class":358},[338,1690,1691],{"class":358}," [\n",[338,1693,1694],{"class":340,"line":435},[338,1695,1696],{"class":358},"      {\n",[338,1698,1699,1702,1705,1707,1709,1711,1714,1716],{"class":340,"line":441},[338,1700,1701],{"class":358},"        \"",[338,1703,1704],{"class":389},"type",[338,1706,478],{"class":358},[338,1708,599],{"class":358},[338,1710,386],{"class":358},[338,1712,1713],{"class":389},"vector",[338,1715,478],{"class":358},[338,1717,895],{"class":358},[338,1719,1720,1722,1725,1727,1729,1732],{"class":340,"line":459},[338,1721,1701],{"class":358},[338,1723,1724],{"class":389},"numDimensions",[338,1726,478],{"class":358},[338,1728,599],{"class":358},[338,1730,1731],{"class":781}," 
384",[338,1733,895],{"class":358},[338,1735,1736,1738,1741,1743,1745,1747,1750,1752],{"class":340,"line":470},[338,1737,1701],{"class":358},[338,1739,1740],{"class":389},"path",[338,1742,478],{"class":358},[338,1744,599],{"class":358},[338,1746,386],{"class":358},[338,1748,1749],{"class":389},"embedding",[338,1751,478],{"class":358},[338,1753,895],{"class":358},[338,1755,1756,1758,1761,1763,1765,1767,1770],{"class":340,"line":491},[338,1757,1701],{"class":358},[338,1759,1760],{"class":389},"similarity",[338,1762,478],{"class":358},[338,1764,599],{"class":358},[338,1766,386],{"class":358},[338,1768,1769],{"class":389},"cosine",[338,1771,393],{"class":358},[338,1773,1774],{"class":340,"line":502},[338,1775,1776],{"class":358},"      }\n",[338,1778,1779],{"class":340,"line":508},[338,1780,1781],{"class":358},"    ]\n",[338,1783,1784],{"class":340,"line":513},[338,1785,1786],{"class":358},"  },\n",[338,1788,1789,1792,1794,1797],{"class":340,"line":519},[338,1790,1791],{"class":423},"  name",[338,1793,1673],{"class":358},[338,1795,1796],{"class":416}," index_name",[338,1798,895],{"class":358},[338,1800,1802,1805,1807,1809,1812],{"class":340,"line":1801},17,[338,1803,1804],{"class":423},"  type",[338,1806,1673],{"class":358},[338,1808,386],{"class":358},[338,1810,1811],{"class":389},"vectorSearch",[338,1813,393],{"class":358},[338,1815,1817],{"class":340,"line":1816},18,[338,1818,505],{"class":358},[338,1820,1822,1825,1827,1830,1832,1835,1837,1840],{"class":340,"line":1821},19,[338,1823,1824],{"class":348},"collection",[338,1826,174],{"class":358},[338,1828,1829],{"class":416},"create_search_index",[338,1831,420],{"class":358},[338,1833,1834],{"class":423},"model",[338,1836,383],{"class":358},[338,1838,1839],{"class":416},"search_index_model",[338,1841,505],{"class":358},[17,1843,1844],{},"Note that the index needs to know the vector dimension (384 in this example) and the similarity metric (cosine similarity in this case) to query the data 
effectively.",[12,1846,1848],{"id":1847},"open-ai-notebook","Open AI Notebook",[17,1850,1851,1852,174,1856,1858,1859,1862,1863,1866],{},"The implementation of the concepts discussed earlier can be found in this ",[21,1853,1855],{"href":1854,"target":206},"https://github.com/federicoibba/nutritional-information-rag/blob/main/notebooks/2.0_open-ai.ipynb","notebook",[285,1857],{},"\nThe RAG process involves querying MongoDB with a given food item, and the results are then passed to the ",[71,1860,1861],{},"gpt-4o-mini"," model. As requested in the prompt, the model returns a JSON object, which can be easily consumed by a RESTful service. For example, an input of ",[71,1864,1865],{},"salmon fish"," will produce the following output:",[329,1868,1870],{"className":869,"code":1869,"language":871,"meta":334,"style":334},"{\n  \"protein\": 22.56,\n  \"carbohydrates\": 0,\n  \"fats\": 5.57,\n  \"calories\": 140,\n  \"sugars\": 0,\n  \"fibers\": 0\n}\n",[169,1871,1872,1876,1890,1905,1919,1933,1947,1960],{"__ignoreMap":334},[338,1873,1874],{"class":340,"line":341},[338,1875,850],{"class":358},[338,1877,1878,1880,1882,1884,1886,1888],{"class":340,"line":370},[338,1879,882],{"class":358},[338,1881,885],{"class":578},[338,1883,478],{"class":358},[338,1885,599],{"class":358},[338,1887,892],{"class":781},[338,1889,895],{"class":358},[338,1891,1892,1894,1896,1898,1900,1903],{"class":340,"line":377},[338,1893,882],{"class":358},[338,1895,902],{"class":578},[338,1897,478],{"class":358},[338,1899,599],{"class":358},[338,1901,1902],{"class":781}," 
0",[338,1904,895],{"class":358},[338,1906,1907,1909,1911,1913,1915,1917],{"class":340,"line":396},[338,1908,882],{"class":358},[338,1910,918],{"class":578},[338,1912,478],{"class":358},[338,1914,599],{"class":358},[338,1916,925],{"class":781},[338,1918,895],{"class":358},[338,1920,1921,1923,1925,1927,1929,1931],{"class":340,"line":401},[338,1922,882],{"class":358},[338,1924,934],{"class":578},[338,1926,478],{"class":358},[338,1928,599],{"class":358},[338,1930,941],{"class":781},[338,1932,895],{"class":358},[338,1934,1935,1937,1939,1941,1943,1945],{"class":340,"line":408},[338,1936,882],{"class":358},[338,1938,1112],{"class":578},[338,1940,478],{"class":358},[338,1942,599],{"class":358},[338,1944,1902],{"class":781},[338,1946,895],{"class":358},[338,1948,1949,1951,1953,1955,1957],{"class":340,"line":430},[338,1950,882],{"class":358},[338,1952,1439],{"class":578},[338,1954,478],{"class":358},[338,1956,599],{"class":358},[338,1958,1959],{"class":781}," 0\n",[338,1961,1962],{"class":340,"line":435},[338,1963,977],{"class":358},[12,1965,1967],{"id":1966},"release-the-rag-as-a-serverless-restful-api","Release the RAG as a Serverless Restful API",[17,1969,1970,1971,1973,1974,1977,1978,1980,1981,1985,1986,1988,1989,1992],{},"You may have thought this was the end of the article, but the main point of all this was to understand how a production-ready application could be effectively released and used in the market.",[285,1972],{},"\nTo achieve this goal, I've used ",[21,1975,1976],{"href":797,"target":206},"modal.com",", a serverless compute platform designed for AI and Machine Learning applications. 
Their free tier was enough for my experiments (they give from 5 to 30 dollars of free credits monthly), so it was good to go.",[285,1979],{},"\nHere is the ",[21,1982,1984],{"href":1983,"target":206},"https://github.com/federicoibba/nutritional-information-rag/blob/main/services/openai-api.py","final outcome",", a service easily deployed on the internet and ready to be used for a nutritional application. What I've done here is just taking the code blocks from the notebooks and putting everything together using Modal APIs and FastAPI nomenclature.",[285,1987],{},"\nIn this case, to avoid exposing my OpenAI API Key, I've added a header called ",[71,1990,1991],{},"X-Open-AI-Api-Key"," that anyone can fill with their own API key. This is an example request that will return the JSON stated above:",[329,1994,1996],{"className":802,"code":1995,"language":804,"meta":334,"style":334},"curl --location 'https://ibbus93--nutritional-rag-service-openai-nutritionalragse-f9d7ea.modal.run' \\\n--header 'Content-Type: application/json' \\\n--header 'X-Open-AI-Api-Key: sk-proj-your-api-key' \\\n--data '{\n    \"description\": \"Salmon fish\"\n}'\n",[169,1997,1998,2013,2025,2038,2046,2050],{"__ignoreMap":334},[338,1999,2000,2002,2004,2006,2009,2011],{"class":340,"line":341},[338,2001,811],{"class":582},[338,2003,814],{"class":389},[338,2005,817],{"class":358},[338,2007,2008],{"class":389},"https://ibbus93--nutritional-rag-service-openai-nutritionalragse-f9d7ea.modal.run",[338,2010,823],{"class":358},[338,2012,826],{"class":348},[338,2014,2015,2017,2019,2021,2023],{"class":340,"line":370},[338,2016,831],{"class":348},[338,2018,823],{"class":358},[338,2020,836],{"class":389},[338,2022,823],{"class":358},[338,2024,826],{"class":348},[338,2026,2027,2029,2031,2034,2036],{"class":340,"line":377},[338,2028,831],{"class":348},[338,2030,823],{"class":358},[338,2032,2033],{"class":389},"X-Open-AI-Api-Key: 
sk-proj-your-api-key",[338,2035,823],{"class":358},[338,2037,826],{"class":348},[338,2039,2040,2042,2044],{"class":340,"line":396},[338,2041,845],{"class":348},[338,2043,823],{"class":358},[338,2045,850],{"class":389},[338,2047,2048],{"class":340,"line":401},[338,2049,855],{"class":389},[338,2051,2052,2054],{"class":340,"line":408},[338,2053,860],{"class":389},[338,2055,863],{"class":358},[17,2057,2058,2059,174],{},"In any case, for more details about the implementation and the example, please refer to my ",[21,2060,2061],{"href":1311,"target":206},"repository project",[12,2063,2065],{"id":2064},"final-consideration","Final consideration",[17,2067,2068,2069,2071,2072,2074],{},"I've spent weeks studying the world of LLMs on Udemy and trying to apply the concepts to this fun RAG project. Using the power provided by LLMs, the applications are endless, limited only by human imagination.",[285,2070],{},"\nIn addition to the OpenAI-based RAG described here, I've also tested the application with open-source models like Llama and Qwen. This approach could potentially reduce costs compared to using OpenAI's API, but it would require introducing several new concepts (like transformers and tokenizers). I also believe the solution's accuracy can be slightly improved, so I'll save that discussion for a future article.",[285,2073],{},"\nIf you've made it this far, thank you for reading. I appreciate your attention! 
:)",[12,2076,1304],{"id":1303},[30,2078,2079,2084,2090,2097,2104,2111,2118],{},[33,2080,2081],{},[21,2082,1314],{"href":1311,"rel":2083},[1313],[33,2085,2086],{},[21,2087,2089],{"href":1423,"rel":2088},[1313],"Dataset source",[33,2091,2092],{},[21,2093,2096],{"href":2094,"rel":2095},"https://www.mongodb.com/docs/atlas/atlas-vector-search/rag/#std-label-avs-rag",[1313],"RAG with Atlas",[33,2098,2099],{},[21,2100,2103],{"href":2101,"rel":2102},"https://www.databricks.com/glossary/retrieval-augmented-generation-rag",[1313],"What is Retrieval Augmented Generation",[33,2105,2106],{},[21,2107,2110],{"href":2108,"rel":2109},"https://huggingface.co/learn/cookbook/advanced_rag",[1313],"RAG HuggingFace explanation",[33,2112,2113],{},[21,2114,2117],{"href":2115,"rel":2116},"https://modal.com/docs",[1313],"Modal.com documentation",[33,2119,1317,2120,794,2124],{},[21,2121,2123],{"href":2122,"target":206},"https://unsplash.com/@bamin?utm_content=creditCopyText&utm_medium=referral&utm_source=unsplash","Pierre Bamin",[21,2125,1325],{"href":2126,"target":206},"https://unsplash.com/photos/used-paint-brushes-RwccoChIGB8?utm_content=creditCopyText&utm_medium=referral&utm_source=unsplash",[1327,2128,2129],{},"html pre.shiki code .sMK4o, html code.shiki .sMK4o{--shiki-light:#39ADB5;--shiki-default:#89DDFF;--shiki-dark:#89DDFF}html pre.shiki code .spNyl, html code.shiki .spNyl{--shiki-light:#9C3EDA;--shiki-default:#C792EA;--shiki-dark:#C792EA}html pre.shiki code .sbssI, html code.shiki .sbssI{--shiki-light:#F76D47;--shiki-default:#F78C6C;--shiki-dark:#F78C6C}html .light .shiki span {color: var(--shiki-light);background: var(--shiki-light-bg);font-style: var(--shiki-light-font-style);font-weight: var(--shiki-light-font-weight);text-decoration: var(--shiki-light-text-decoration);}html.light .shiki span {color: var(--shiki-light);background: var(--shiki-light-bg);font-style: var(--shiki-light-font-style);font-weight: var(--shiki-light-font-weight);text-decoration: 
var(--shiki-light-text-decoration);}html .default .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: var(--shiki-dark-text-decoration);}html.dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: var(--shiki-dark-text-decoration);}html pre.shiki code .sBMFI, html code.shiki .sBMFI{--shiki-light:#E2931D;--shiki-default:#FFCB6B;--shiki-dark:#FFCB6B}html pre.shiki code .sfazB, html code.shiki .sfazB{--shiki-light:#91B859;--shiki-default:#C3E88D;--shiki-dark:#C3E88D}html pre.shiki code .sTEyZ, html code.shiki .sTEyZ{--shiki-light:#90A4AE;--shiki-default:#EEFFFF;--shiki-dark:#BABED8}html pre.shiki code .s2Zo4, html code.shiki .s2Zo4{--shiki-light:#6182B8;--shiki-default:#82AAFF;--shiki-dark:#82AAFF}html pre.shiki code .s7zQu, html code.shiki .s7zQu{--shiki-light:#39ADB5;--shiki-light-font-style:italic;--shiki-default:#89DDFF;--shiki-default-font-style:italic;--shiki-dark:#89DDFF;--shiki-dark-font-style:italic}html pre.shiki code .sHdIc, html code.shiki 
.sHdIc{--shiki-light:#90A4AE;--shiki-light-font-style:italic;--shiki-default:#EEFFFF;--shiki-default-font-style:italic;--shiki-dark:#BABED8;--shiki-dark-font-style:italic}",{"title":334,"searchDepth":370,"depth":370,"links":2131},[2132,2133,2134,2135,2136,2137,2138,2139],{"id":14,"depth":377,"text":15},{"id":1385,"depth":377,"text":1386},{"id":1416,"depth":377,"text":1417},{"id":1451,"depth":377,"text":1452},{"id":1847,"depth":377,"text":1848},{"id":1966,"depth":377,"text":1967},{"id":2064,"depth":377,"text":2065},{"id":1303,"depth":377,"text":1304},"2025-09-15T00:00:00.000Z","How to implement a production-ready LLM that returns food nutritional information using RAG","/images/articles/nutritional-rag-llm.jpg",{},"/en/articles/a-nutritional-llm-assistant-using-rag",{"title":1359,"description":2141},"en/articles/2.a-nutritional-llm-assistant-using-rag",[1351,1352,2148,2149],"OpenAI","MongoDB","pjUasjEuuuyH9XLXtE8j3JMAsLO9wriqvPlaoysx7y4",{"id":2152,"title":2153,"alt":2154,"body":2155,"date":2680,"description":2681,"extension":1218,"image":2682,"locale":1345,"meta":2683,"navigation":373,"path":2684,"seo":2685,"stem":2686,"tags":2687,"__hash__":2693},"articles/en/articles/1.monorepo-versioning-with-javascript-and-lerna.md","Versioning multiple libraries using a monorepo approach in JavaScript","Versioning libraries in monorepo",{"type":9,"value":2156,"toc":2672},[2157,2159,2165,2168,2188,2192,2207,2211,2221,2246,2250,2253,2261,2269,2273,2278,2294,2299,2312,2316,2319,2335,2338,2352,2358,2461,2464,2498,2512,2541,2545,2548,2568,2571,2579,2588,2592,2604,2607,2623,2626,2632,2636,2639,2650,2653,2655,2669],[12,2158,15],{"id":14},[17,2160,2161,2162,174],{},"When it comes to creating a new project, many decisions need to be taken even before typing any code. 
Regarding the project organization, more and more projects today choose to rely on a ",[71,2163,2164],{},"monorepo approach",[17,2166,2167],{},"A monorepo is a software development strategy in which multiple projects are stored under the same repository. This approach delivers multiple benefits, some of which are:",[30,2169,2170,2176,2182],{},[33,2171,2172,2175],{},[71,2173,2174],{},"code sharing and reusability",": it becomes easier to share code under the same repository, especially when using shared libraries;",[33,2177,2178,2181],{},[71,2179,2180],{},"simplified dependency management",": if multiple projects rely on the same library, this can reduce compatibility issues and the library can be downloaded just once instead of multiple times;",[33,2183,2184,2187],{},[71,2185,2186],{},"enhanced collaboration",": all team members share the same codebase, improving understanding and knowledge sharing.",[12,2189,2191],{"id":2190},"challenges","Challenges",[17,2193,2194,2195,2197,2198,234,2202,2206],{},"With great power come great challenges, and one of them is library versioning.",[285,2196],{},"\nWith a single library it's easy to choose: you can rely on semver rules and just pick one of the many release libraries available, like ",[21,2199,2201],{"href":2200,"target":206},"https://github.com/semantic-release/semantic-release","Semantic Release",[21,2203,2205],{"href":2204,"target":206},"https://github.com/absolute-version/commit-and-tag-version","Commit and Tag Version",".\nWith monorepos it's not as easy as it seems, as you have to decide between keeping multiple independent versions or a single version shared by all libraries.",[12,2208,2210],{"id":2209},"available-options","Available options",[17,2212,2213,2214,171,2217,2220],{},"Let's pretend we have a monorepo with two libraries, ",[71,2215,2216],{},"@monorepo/core",[71,2218,2219],{},"@monorepo/client",". 
The two main options available when deciding how to version multiple libraries are:",[1394,2222,2223,2235],{},[33,2224,2225,2228,2229,2231,2232,2234],{},[71,2226,2227],{},"version multiple libraries with individual versions",": each library version is independent from the others; this can lead to version discrepancies and difficulties\nin choosing the right version for the user. In this scenario our libraries could have versions like: - ",[1468,2230,2216],{},": 1.0.3 - ",[1468,2233,2219],{},": 3.8.18",[33,2236,2237,2240,2241,2243,2244,2234],{},[71,2238,2239],{},"version multiple libraries using a common version",": adopting this approach, each library in the monorepo holds the same version, solving the\nchallenges encountered in the previous point. At the same time, however, every library update always triggers an update for all the libraries. With this option, our libraries' versions would always be like: - ",[1468,2242,2216],{},": 3.8.18 - ",[1468,2245,2219],{},[12,2247,2249],{"id":2248},"make-it-work-with-lerna","Make it work with Lerna",[17,2251,2252],{},"During my daily work, with my team we initially adopted the first strategy, but it was causing us a lot of trouble because it was really hard to understand, from time to time, which version of which library should be installed in a given project.\nAfter some research, we shifted to the second strategy.",[17,2254,2255,2256,2260],{},"We also needed a smart tool that could help us realize this new approach, especially in an already existing project, so we ended up adopting ",[21,2257,2259],{"href":2258,"target":206},"https://lerna.js.org/","Lerna",",\nwhich is one of the best-known and most widely adopted tools in the monorepo world.",[17,2262,2263,2264,2268],{},"The following hands-on will help you understand how to implement this approach by leveraging a library like Lerna, without actually publishing or releasing anything. 
A demo project is available ",[21,2265,2267],{"href":2266,"target":206},"here"," and provides an existing repository for getting started with Lerna.",[259,2270,2272],{"id":2271},"before-to-start","Before starting",[1394,2274,2275],{},[33,2276,2277],{},"Clone the project",[329,2279,2281],{"className":802,"code":2280,"language":804,"meta":334,"style":334},"git clone https://github.com/federicoibba/lerna-demo-project.git\n",[169,2282,2283],{"__ignoreMap":334},[338,2284,2285,2288,2291],{"class":340,"line":341},[338,2286,2287],{"class":582},"git",[338,2289,2290],{"class":389}," clone",[338,2292,2293],{"class":389}," https://github.com/federicoibba/lerna-demo-project.git\n",[1394,2295,2296],{"start":370},[33,2297,2298],{},"Install the dependencies",[329,2300,2302],{"className":802,"code":2301,"language":804,"meta":334,"style":334},"npm i\n",[169,2303,2304],{"__ignoreMap":334},[338,2305,2306,2309],{"class":340,"line":341},[338,2307,2308],{"class":582},"npm",[338,2310,2311],{"class":389}," i\n",[259,2313,2315],{"id":2314},"setup-lerna","Setup Lerna",[17,2317,2318],{},"To install Lerna in the project, you just need to run:",[329,2320,2322],{"className":802,"code":2321,"language":804,"meta":334,"style":334},"npx lerna init\n",[169,2323,2324],{"__ignoreMap":334},[338,2325,2326,2329,2332],{"class":340,"line":341},[338,2327,2328],{"class":582},"npx",[338,2330,2331],{"class":389}," lerna",[338,2333,2334],{"class":389}," init\n",[17,2336,2337],{},"This command will:",[30,2339,2340,2343],{},[33,2341,2342],{},"install the needed dependencies",[33,2344,2345,2346,2351],{},"create a ",[21,2347,2350],{"href":2348,"rel":2349},"https://lerna.js.org/docs/api-reference/configuration",[1313],"lerna.json"," file used as the configuration file.",[17,2353,2354,2355,2357],{},"Edit the ",[169,2356,2350],{}," in order to have the following 
structure:",[329,2359,2361],{"className":869,"code":2360,"filename":2350,"language":871,"meta":334,"style":334},"{\n  \"$schema\": \"node_modules/lerna/schemas/lerna-schema.json\",\n  \"version\": \"0.0.0\",\n  \"command\": {\n    \"version\": {\n      \"conventionalCommits\": true\n    }\n  }\n}\n",[169,2362,2363,2367,2387,2407,2420,2432,2447,2452,2457],{"__ignoreMap":334},[338,2364,2365],{"class":340,"line":341},[338,2366,850],{"class":358},[338,2368,2369,2371,2374,2376,2378,2380,2383,2385],{"class":340,"line":370},[338,2370,882],{"class":358},[338,2372,2373],{"class":578},"$schema",[338,2375,478],{"class":358},[338,2377,599],{"class":358},[338,2379,386],{"class":358},[338,2381,2382],{"class":389},"node_modules/lerna/schemas/lerna-schema.json",[338,2384,478],{"class":358},[338,2386,895],{"class":358},[338,2388,2389,2391,2394,2396,2398,2400,2403,2405],{"class":340,"line":377},[338,2390,882],{"class":358},[338,2392,2393],{"class":578},"version",[338,2395,478],{"class":358},[338,2397,599],{"class":358},[338,2399,386],{"class":358},[338,2401,2402],{"class":389},"0.0.0",[338,2404,478],{"class":358},[338,2406,895],{"class":358},[338,2408,2409,2411,2414,2416,2418],{"class":340,"line":396},[338,2410,882],{"class":358},[338,2412,2413],{"class":578},"command",[338,2415,478],{"class":358},[338,2417,599],{"class":358},[338,2419,1676],{"class":358},[338,2421,2422,2424,2426,2428,2430],{"class":340,"line":401},[338,2423,1681],{"class":358},[338,2425,2393],{"class":582},[338,2427,478],{"class":358},[338,2429,599],{"class":358},[338,2431,1676],{"class":358},[338,2433,2434,2437,2440,2442,2444],{"class":340,"line":408},[338,2435,2436],{"class":358},"      \"",[338,2438,2439],{"class":781},"conventionalCommits",[338,2441,478],{"class":358},[338,2443,599],{"class":358},[338,2445,2446],{"class":358}," true\n",[338,2448,2449],{"class":340,"line":430},[338,2450,2451],{"class":358},"    }\n",[338,2453,2454],{"class":340,"line":435},[338,2455,2456],{"class":358},"  
}\n",[338,2458,2459],{"class":340,"line":441},[338,2460,977],{"class":358},[17,2462,2463],{},"Then, commit the changes with, for instance:",[329,2465,2467],{"className":802,"code":2466,"language":804,"meta":334,"style":334},"git add . && git commit -m \"build: setup lerna\"\n",[169,2468,2469],{"__ignoreMap":334},[338,2470,2471,2473,2476,2479,2482,2485,2488,2491,2493,2496],{"class":340,"line":341},[338,2472,2287],{"class":582},[338,2474,2475],{"class":389}," add",[338,2477,2478],{"class":389}," .",[338,2480,2481],{"class":358}," &&",[338,2483,2484],{"class":582}," git",[338,2486,2487],{"class":389}," commit",[338,2489,2490],{"class":389}," -m",[338,2492,386],{"class":358},[338,2494,2495],{"class":389},"build: setup lerna",[338,2497,393],{"class":358},[17,2499,2500,2501,2503,2504,2507,2508,2511],{},"Let's pretend that we need to do an urgent fix in ",[169,2502,2216],{},", because the epoch was needed, not a string date.\nGo to ",[169,2505,2506],{},"/core/index.js"," and change line 20 to return ",[169,2509,2510],{},"new Date().getTime()",", then commit using a fix commit message:",[329,2513,2515],{"className":802,"code":2514,"language":804,"meta":334,"style":334},"git add core/index.js && git commit -m \"fix: current date\"\n",[169,2516,2517],{"__ignoreMap":334},[338,2518,2519,2521,2523,2526,2528,2530,2532,2534,2536,2539],{"class":340,"line":341},[338,2520,2287],{"class":582},[338,2522,2475],{"class":389},[338,2524,2525],{"class":389}," core/index.js",[338,2527,2481],{"class":358},[338,2529,2484],{"class":582},[338,2531,2487],{"class":389},[338,2533,2490],{"class":389},[338,2535,386],{"class":358},[338,2537,2538],{"class":389},"fix: current date",[338,2540,393],{"class":358},[259,2542,2544],{"id":2543},"use-case-1-common-version","Use case 1: common version",[17,2546,2547],{},"At this point, you may want to release the new library version.\nWe will use two options that, for this demo project, prevent some errors and help us focus on the versioning. 
Then run:",[329,2549,2551],{"className":802,"code":2550,"language":804,"meta":334,"style":334},"npx lerna version --no-git-tag-version --no-push\n",[169,2552,2553],{"__ignoreMap":334},[338,2554,2555,2557,2559,2562,2565],{"class":340,"line":341},[338,2556,2328],{"class":582},[338,2558,2331],{"class":389},[338,2560,2561],{"class":389}," version",[338,2563,2564],{"class":389}," --no-git-tag-version",[338,2566,2567],{"class":389}," --no-push\n",[17,2569,2570],{},"Lerna will eventually ask whether you are sure that the changes are correct, printing the future versions:",[329,2572,2577],{"className":2573,"code":2575,"language":2576},[2574],"language-text","Changes:\n - @monorepo/client: 3.8.18 => 3.8.19\n - @monorepo/core: 1.0.3 => 3.8.19\n","text",[169,2578,2575],{"__ignoreMap":334},[17,2580,2581,2582,2587],{},"In this case, which uses the default behavior called ",[21,2583,2586],{"href":2584,"rel":2585},"https://lerna.js.org/docs/features/version-and-publish#fixedlocked-mode-default",[1313],"Fixed/Locked",", Lerna will see that the versions differ and will reconcile them to a single common version.",[259,2589,2591],{"id":2590},"use-case-2-independent-version","Use case 2: independent version",[17,2593,2594,2595,2597,2598,2601,2602,174],{},"Let's say that the requirement is to keep the versions independent: then it's enough to edit the ",[169,2596,2350],{}," file and change the version value to ",[169,2599,2600],{},"independent"," instead of ",[169,2603,2402],{},[17,2605,2606],{},"Once again, run:",[329,2608,2609],{"className":802,"code":2550,"language":804,"meta":334,"style":334},[169,2610,2611],{"__ignoreMap":334},[338,2612,2613,2615,2617,2619,2621],{"class":340,"line":341},[338,2614,2328],{"class":582},[338,2616,2331],{"class":389},[338,2618,2561],{"class":389},[338,2620,2564],{"class":389},[338,2622,2567],{"class":389},[17,2624,2625],{},"Lerna will propose a different bump strategy based on the independent key set earlier, showing the following 
prompt:",[329,2627,2630],{"className":2628,"code":2629,"language":2576},[2574],"Changes:\n - @monorepo/client: 3.8.18 => 3.8.19\n - @monorepo/core: 1.0.3 => 1.0.4\n",[169,2631,2629],{"__ignoreMap":334},[12,2633,2635],{"id":2634},"conclusions","Conclusions",[17,2637,2638],{},"In this post you have briefly learned:",[30,2640,2641,2644,2647],{},[33,2642,2643],{},"what monorepos are;",[33,2645,2646],{},"which versioning strategies are available;",[33,2648,2649],{},"how Lerna can help you solve the versioning challenges you may encounter.",[17,2651,2652],{},"If you are interested in the topic, I recommend reading the other resources and articles that you can find in the bibliography.",[12,2654,1304],{"id":1303},[30,2656,2657,2663],{},[33,2658,2659],{},[21,2660,2662],{"href":2661,"target":206},"https://medium.com/@alessandro.traversi/monorepos-advantages-and-disadvantages-233c1b7146c2","Monorepos: advantages and disadvantages",[33,2664,2665],{},[21,2666,2668],{"href":2667,"target":206},"https://amarchenko.dev/blog/2023-09-26-versioning/","Mastering Monorepo Versioning Best Practices",[1327,2670,2671],{},"html pre.shiki code .sBMFI, html code.shiki .sBMFI{--shiki-light:#E2931D;--shiki-default:#FFCB6B;--shiki-dark:#FFCB6B}html pre.shiki code .sfazB, html code.shiki .sfazB{--shiki-light:#91B859;--shiki-default:#C3E88D;--shiki-dark:#C3E88D}html .light .shiki span {color: var(--shiki-light);background: var(--shiki-light-bg);font-style: var(--shiki-light-font-style);font-weight: var(--shiki-light-font-weight);text-decoration: var(--shiki-light-text-decoration);}html.light .shiki span {color: var(--shiki-light);background: var(--shiki-light-bg);font-style: var(--shiki-light-font-style);font-weight: var(--shiki-light-font-weight);text-decoration: var(--shiki-light-text-decoration);}html .default .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: 
var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: var(--shiki-dark-text-decoration);}html.dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: var(--shiki-dark-text-decoration);}html pre.shiki code .sMK4o, html code.shiki .sMK4o{--shiki-light:#39ADB5;--shiki-default:#89DDFF;--shiki-dark:#89DDFF}html pre.shiki code .spNyl, html code.shiki .spNyl{--shiki-light:#9C3EDA;--shiki-default:#C792EA;--shiki-dark:#C792EA}html pre.shiki code .sbssI, html code.shiki .sbssI{--shiki-light:#F76D47;--shiki-default:#F78C6C;--shiki-dark:#F78C6C}",{"title":334,"searchDepth":370,"depth":370,"links":2673},[2674,2675,2676,2677,2678,2679],{"id":14,"depth":377,"text":15},{"id":2190,"depth":377,"text":2191},{"id":2209,"depth":377,"text":2210},{"id":2248,"depth":377,"text":2249},{"id":2634,"depth":377,"text":2635},{"id":1303,"depth":377,"text":1304},"2025-03-14T00:00:00.000Z","Versioning multiple libraries under the monorepo approach can be challenging, let's see how to face this problems.","/images/articles/monorepo-versioning.jpg",{},"/en/articles/monorepo-versioning-with-javascript-and-lerna",{"title":2153,"description":2681},"en/articles/1.monorepo-versioning-with-javascript-and-lerna",[2688,2689,2690,2691,2692],"javascript","monorepo","lerna","versioning","semver","CcMq5UlbBd8CbN0K3cQvk6zMiR-ff3M8L5CWCni6b9A",1763894162800]