How to generate embeddings using Amazon Bedrock and LangChain

Learn how to generate embeddings using LangChain and Amazon Bedrock's foundation model Titan Embeddings G1 - Text, whose base model ID is `amazon.titan-embed-text-v1`.


In this tutorial, we will learn how to generate embeddings using LangChain and Amazon Bedrock's foundation model Titan Embeddings G1 - Text, whose base model ID is `amazon.titan-embed-text-v1`. We will build the implementation step by step. Let's start by setting up the development environment.

Setup Environment

We need to configure an AWS profile, which requires the AWS CLI. Install or update to the latest version of the AWS CLI from the official documentation, then refer to Configure the AWS CLI.

aws configure

AWS Access Key ID [None]: <insert_access_key>
AWS Secret Access Key [None]: <insert_secret_key>
Default region name [None]: <insert_aws_region>
Default output format [json]: json

Your credentials and configuration will be stored in ~/.aws/ on *nix-based OS and %UserProfile%\.aws\ on Windows.

Run aws sts get-caller-identity to verify your profile:

{
    "UserId": "AIDA9EUKQEJHEWE27S1RA",
    "Account": "012345678901",
    "Arn": "arn:aws:iam::012345678901:user/dev-admin"
}

Now create a project directory for our development.

mkdir bedrockembed-langchain
cd bedrockembed-langchain

Let's create a virtual environment and install the necessary packages.

python3 -m venv env
source env/bin/activate

Windows users:

python -m venv env
env\Scripts\activate

Install the packages for Bedrock and LangChain. Amazon Bedrock is an AWS service accessed over the AWS API, so we need the latest version of boto3, which LangChain uses under the hood to call Bedrock. We will also install langchain so we can use BedrockEmbeddings from the langchain.embeddings module.

python -m pip install -U pip
pip install boto3
pip install langchain==0.0.310

Finally, we store the package info in requirements.txt.

pip freeze > requirements.txt
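
With boto3 installed, you can optionally confirm that the Titan embeddings model is available in your account and region before writing any LangChain code. The sketch below is only a convenience check; it assumes the us-west-2 region and uses the bedrock control-plane client (note: bedrock, not bedrock-runtime), so adjust the region to match your setup.

import boto3

# Optional check (assumes us-west-2): list embedding-capable foundation models
bedrock = boto3.client("bedrock", region_name="us-west-2")
models = bedrock.list_foundation_models(byOutputModality="EMBEDDING")
for summary in models["modelSummaries"]:
    print(summary["modelId"], "-", summary["modelName"])

# amazon.titan-embed-text-v1 should appear in this list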

Let's create a main.py file to start implementing the logic for interacting with Amazon Bedrock's embeddings FM Titan Embeddings G1 - Text. Its base model ID is amazon.titan-embed-text-v1.

touch main.py

Before we add code to main.py, let's understand AWS_PROFILE.

Note about AWS_PROFILE

The AWS_PROFILE environment variable can be set in the terminal where you are about to execute python main.py. Its value is typically default unless you have customized your setup. To check, open ~/.aws/config on *nix OS or %UserProfile%\.aws\config on Windows and note the correct name of your profile. For most of you it will look like this:

[default]
region = us-east-1
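
If you are not sure which profile names exist on your machine, a quick way to check from Python (now that boto3 is installed) is to ask boto3 for the profiles it can see. This is just a convenience check, not a required step.

import boto3

# List the profile names boto3 finds in ~/.aws/config and ~/.aws/credentials
print(boto3.Session().available_profiles)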

Show me the code!

Now open the file main.py and copy the following code.

import os

from langchain.embeddings import BedrockEmbeddings

# Resolve the profile from AWS_PROFILE, falling back to "default"
profile = os.environ.get("AWS_PROFILE", "default")

# BedrockEmbeddings creates a boto3 "bedrock-runtime" client internally
# from the given profile, region, and model ID
embeddings = BedrockEmbeddings(
    credentials_profile_name=profile,
    region_name="us-west-2",
    model_id="amazon.titan-embed-text-v1",
)

# Embed a single query string and print the result's shape
response = embeddings.embed_query("Hello from WowData.Science")
print(len(response), type(response))

Save the file and run python main.py in the terminal. You will see the output 1536 <class 'list'>, which shows the length and type of the response. To get a better feel for it, you can also print the response directly to the console, which will output a list of 1,536 floats: the vector representation of the input query.
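
If you would like to inspect the vector itself rather than just its length, you can append a couple of lines to the end of main.py; the slice below is only illustrative.

# Peek at the first few components of the 1,536-dimensional vector
print(response[:5])

# Basic sanity check on the value range of the embedding
print(min(response), max(response))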

Why is the length 1536?

We are using Amazon Bedrock's FM Titan Embeddings G1 - Text with base model ID amazon.titan-embed-text-v1. Its output vector length is 1,536. Also note that its maximum input text length is 8K tokens.
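
Besides embed_query for a single string, BedrockEmbeddings also provides embed_documents for a list of texts, where each returned vector has the same 1,536 dimensions. Here is a minimal sketch, reusing the same profile and region assumptions as main.py; the sample texts are only placeholders.

import os

from langchain.embeddings import BedrockEmbeddings

profile = os.environ.get("AWS_PROFILE", "default")

embeddings = BedrockEmbeddings(
    credentials_profile_name=profile,
    region_name="us-west-2",
    model_id="amazon.titan-embed-text-v1",
)

# Embed several texts at once; each returned vector has 1,536 dimensions
texts = ["Amazon Bedrock", "LangChain", "Titan Embeddings G1 - Text"]
vectors = embeddings.embed_documents(texts)
print(len(vectors), len(vectors[0]))  # 3 1536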

Summary

In this tutorial, we learned how to set up an environment and generate embeddings via Amazon Bedrock's FM amazon.titan-embed-text-v1 using LangChain.