When should I go for Fine-tuning vs In-Context Learning (ICL)?

As the adoption of Large Language Models (LLMs) in Generative AI applications grows, it is important to understand how you want to use them. With good prompt engineering, LLMs can respond well to a user prompt via zero-shot, one-shot or few-shot prompting. Alternatively, LLMs can be fine-tuned to specialize them for the domain of your business. In this blog, let's look in detail at what fine-tuning and In-Context Learning are. By the end of this blog, you will also know when to choose fine-tuning vs In-Context Learning and the trade-offs between the two.

In-Context Learning

In-Context Learning refers to obtaining responses from an LLM with one-shot or few-shot prompting: you supply one or a few examples as context along with the prompt you feed to the LLM. The model picks up the pattern from that context and responds in line with it, without any change to its weights. A zero-shot prompt, by contrast, supplies no examples at all. Let's look at some examples of zero-shot, one-shot, and few-shot prompts.

Example of a Zero-shot prompt

Write a poem about a cat.

This prompt is zero-shot because it does not provide any examples of poems about cats. The model must rely on its knowledge of the world and its ability to generate text to write a poem about a cat.

Example of a One-shot prompt

Write a poem about a cat, like this:
The cat sat on the mat,
And looked at the rat.
The rat looked at the cat,
And wondered what was that.

This prompt is one-shot because it provides one example of a poem about a cat. The model can use this example to generate a similar poem.

Example of a Few-shot prompt

Instruction: Provide response to the following human prompt
Human: Apples are two in the basket, I am happy happy
Mangoes are three in the basket, I am happy happy happy
Bananas are five in the basket,
Agent:

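This is an example of a few-shot prompt: you provide an instruction on how to respond along with a couple of examples as context, and the LLM picks up the pattern and completes the Agent turn accordingly. Here the pattern suggests a response that repeats "happy" five times, once for each banana.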
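The three prompt styles above differ only in how many worked examples accompany the request, which can be sketched in a few lines of Python. This is a minimal illustration, not a standard API: the function names `build_prompt` and `shot_label`, and the layout with alternating Human/Agent turns, are assumptions made for this example.

```python
def build_prompt(instruction, examples, query):
    """Assemble a zero-, one-, or few-shot prompt.

    `examples` is a list of (input, output) pairs. An empty list gives a
    zero-shot prompt, one pair a one-shot prompt, several a few-shot prompt.
    The Human/Agent layout is an assumed convention for this sketch.
    """
    lines = [f"Instruction: {instruction}"]
    for example_input, example_output in examples:
        lines.append(f"Human: {example_input}")
        lines.append(f"Agent: {example_output}")
    # The final Agent turn is left open so the model completes it.
    lines.append(f"Human: {query}")
    lines.append("Agent:")
    return "\n".join(lines)


def shot_label(examples):
    """Name the prompting style implied by the number of examples."""
    return {0: "zero-shot", 1: "one-shot"}.get(len(examples), "few-shot")


examples = [
    ("Apples are two in the basket,", "I am happy happy"),
    ("Mangoes are three in the basket,", "I am happy happy happy"),
]
prompt = build_prompt(
    "Provide response to the following human prompt",
    examples,
    "Bananas are five in the basket,",
)
print(shot_label(examples))
print(prompt)
```

Passing an empty list of examples yields the zero-shot case, so the same helper covers all three styles.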

Fine-tuning

The context is volatile: it is effective only for the duration of the session and is not permanent. If you want the LLM to consistently limit its responses to your domain, you can go for fine-tuning. Fine-tuning is the process of transferring the base knowledge and capability of an LLM to the specialized domain whose data you provide during fine-tuning. Let's say an LLM already has the ability to summarize large amounts of text without losing the key concepts; fine-tuning lets you carry that ability over to the text of your own domain.

Without fine-tuning, there is a chance that the response to your ICL prompt deviates from the topic you intend. If you want the LLM to confine its responses so that the business users of your Gen AI application see only what is relevant to the question they ask, fine-tuning comes to your rescue.

Suppose we want to fine-tune an LLM to perform a specific task, such as writing medical articles. To do this, we would create a dataset of medical articles and train the LLM on this dataset. The LLM would learn the patterns of writing in medical articles, such as the use of technical language and the structure of the articles.

Once the LLM is fine-tuned, we can use it to generate medical articles. The LLM would be better able to generate articles that are accurate and informative, and that are written in a style consistent with medical articles.
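The dataset-creation step described above can be sketched as follows. This is a minimal sketch under stated assumptions: the `prompt`/`completion` field names, the JSON Lines layout, and the helper names are illustrative choices, not the format required by any particular fine-tuning service, and the sample articles are placeholders rather than real data.

```python
import json
import random


def to_records(articles):
    """Turn (title, body) pairs into prompt/completion training records.

    The field names are an assumption; check what your training tool expects.
    """
    return [
        {"prompt": f"Write a medical article titled: {title}", "completion": body}
        for title, body in articles
    ]


def train_validation_split(records, validation_fraction=0.2, seed=0):
    """Shuffle deterministically and hold out a fraction for validation."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_validation = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[n_validation:], shuffled[:n_validation]


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(record) for record in records)


# Placeholder articles; real training data would hold full article text.
articles = [(f"Sample title {i}", f"<article body {i}>") for i in range(1, 6)]
records = to_records(articles)
train, validation = train_validation_split(records)
print(len(train), len(validation))
print(to_jsonl(validation))
```

Holding out a validation split is a common way to check that the fine-tuned model has learned the style of the articles rather than memorized them.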

Deciding Factors for Fine-tuning vs In-context Learning

AI engineers often need to decide when to go for LLM fine-tuning and when to settle for In-Context Learning (ICL). So, when should you go for fine-tuning Large Language Models (LLMs)?

When the model does not perform well with one-shot or few-shot prompts

Models pre-trained with a smaller number of parameters might not respond well even when you provide one-shot or few-shot prompts. Even if you try varying the inference parameters, such LLMs tend to respond poorly. In such situations you can either go for the chat variant of the LLM, if available, or fine-tune the pre-trained model with the data you have.

If one-shot or few-shot prompting satisfies your requirement and the LLM already serves your use case, you are better off avoiding fine-tuning.

When you want to scope the response to your knowledge base or domain

Certain use cases require that your LLM does not deviate from your domain or respond with context learnt during its pre-training. In such cases you go for fine-tuning and constrain the scope of responses to the data you provide to the LLM during fine-tuning.

Fine-tuning may also be necessary if your legal team approves only certain LLMs and those approved LLMs do not perform well with in-context learning, even though many other LLMs are available that perform significantly better.

Another reason you might want, or be forced, to fine-tune an LLM is the length of the context window. If your context exceeds the maximum token capacity of the LLM, you can either summarize the context so that it fits along with the prompt, or go for fine-tuning.
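Whether your context fits the model's token capacity can be checked before sending a prompt. The sketch below is illustrative only: `estimate_tokens` counts whitespace-separated words, which is a deliberate simplification (real tokenizers split text differently, so use the model's own tokenizer for actual budgeting), and the function names and parameters are assumptions made for this example.

```python
def estimate_tokens(text):
    """Crude token estimate: number of whitespace-separated words.

    Assumption for illustration only; real tokenizers count differently.
    """
    return len(text.split())


def fit_examples(examples, query, max_tokens, reserved_for_answer):
    """Keep as many examples (in order) as fit alongside the query.

    Returns the list of examples that fit, or None when even the bare
    query exceeds the budget.
    """
    budget = max_tokens - reserved_for_answer - estimate_tokens(query)
    if budget < 0:
        return None
    kept = []
    for example in examples:
        cost = estimate_tokens(example)
        if cost > budget:
            break  # stop at the first example that no longer fits
        kept.append(example)
        budget -= cost
    return kept


examples = ["one two three", "four five", "six seven eight nine"]
print(fit_examples(examples, "q1 q2", max_tokens=10, reserved_for_answer=2))
```

If too few examples survive the budget for the prompt to work well, that is a signal to summarize the context or to consider fine-tuning.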

Summary

LLMs have evolved a lot and will continue to evolve over time. It is crucial to identify whether you really need fine-tuning; otherwise you incur unnecessary cost for data preparation, training, and managing AI Ops to keep the pipeline streamlined. Fine-tuning also consumes resources such as engineers, infrastructure, time and effort.

It is important to choose wisely between fine-tuning and one-shot or few-shot prompting based on the aforementioned needs and use cases.
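The deciding factors discussed in this blog can be condensed into a small checklist. The sketch below is a deliberate simplification: the `UseCase` fields and the `recommend` function are hypothetical names introduced for illustration, and a real decision would also weigh cost, data availability, and team capacity.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    icl_quality_ok: bool       # one-shot/few-shot prompts already give good answers
    must_stay_in_domain: bool  # responses must be confined to your own data
    context_fits_window: bool  # context plus prompt fits the model's token limit


def recommend(use_case):
    """Return the suggested approach and the reasons behind it."""
    reasons = []
    if not use_case.icl_quality_ok:
        reasons.append("one-shot/few-shot prompts perform poorly")
    if use_case.must_stay_in_domain:
        reasons.append("responses must stay within your domain data")
    if not use_case.context_fits_window:
        # Summarizing the context is the alternative mentioned earlier.
        reasons.append("context exceeds the model's token limit")
    approach = "fine-tuning" if reasons else "in-context learning"
    return approach, reasons


print(recommend(UseCase(icl_quality_ok=True, must_stay_in_domain=False, context_fits_window=True)))
print(recommend(UseCase(icl_quality_ok=False, must_stay_in_domain=True, context_fits_window=True)))
```

With no reason pointing to fine-tuning, the helper defaults to in-context learning, mirroring the advice to avoid fine-tuning when prompting already works.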