How to generate embeddings using Amazon Bedrock and LangChain
Learn how to generate embeddings using Amazon Bedrock's foundation model Titan Embeddings G1 - Text, whose base model ID is `amazon.titan-embed-text-v1`, and LangChain.
In this tutorial, we will generate embeddings using Amazon Bedrock's foundation model Titan Embeddings G1 - Text, whose base model ID is amazon.titan-embed-text-v1, together with LangChain, building the implementation step by step. First, let's set up the development environment.
Setup Environment
We need to configure an AWS profile, and for that you need the AWS CLI. Install or update to the latest version of the AWS CLI from the official documentation here. Then you can refer to Configure the AWS CLI.
aws configure
AWS Access Key ID [None]: <insert_access_key>
AWS Secret Access Key [None]: <insert_secret_key>
Default region name [None]: <insert_aws_region>
Default output format [json]: json
Your credentials and configuration will be stored in ~/.aws/ on *nix-based OSes and %UserProfile%\.aws\ on Windows.
To verify your profile, run aws sts get-caller-identity:
{
"UserId": "AIDA9EUKQEJHEWE27S1RA",
"Account": "012345678901",
"Arn": "arn:aws:iam::012345678901:user/dev-admin"
}
Now create a repository for our development.
mkdir bedrockembed-langchain
cd bedrockembed-langchain
Let's create a virtual environment and install the necessary packages.
python3 -m venv env
source env/bin/activate
Windows users:
python -m venv env
env\Scripts\activate
Install the packages for Bedrock and LangChain. Amazon Bedrock is an AWS service exposed through an API, so we will use the latest version of boto3 to access it. We will also install langchain to use the BedrockEmbeddings class available in the langchain.embeddings module.
python -m pip install -U pip
pip install boto3
pip install langchain==0.0.310
Finally, we store the package info in requirements.txt.
pip freeze > requirements.txt
Let's create a main.py file to start implementing the logic for interacting with Amazon Bedrock's embeddings FM Titan Embeddings G1 - Text. Its base model ID is amazon.titan-embed-text-v1.
touch main.py
Before we add code to main.py, let's understand AWS_PROFILE.
Note about AWS_PROFILE
The AWS_PROFILE environment variable can be set in the terminal where you are about to run the python main.py command. Its value will typically be default unless you have customized your configuration. To check, open the ~/.aws/config file on *nix OSes or %UserProfile%\.aws\config on Windows to verify and capture the correct name of your profile. For most of you it will look like this:
[default]
region = us-east-1
Show me the code!
Now open the file main.py and copy the following code.
import os

import boto3
from langchain.embeddings import BedrockEmbeddings

# Pick up the AWS profile from the environment, falling back to "default"
profile = os.environ.get("AWS_PROFILE", "default")

# Create a Bedrock runtime client from the chosen profile
session = boto3.Session(profile_name=profile)
bedrock_runtime = session.client(
    service_name="bedrock-runtime",
    region_name="us-west-2",
)

# Reuse the client so BedrockEmbeddings does not build its own
embeddings = BedrockEmbeddings(
    client=bedrock_runtime,
    model_id="amazon.titan-embed-text-v1",
)

response = embeddings.embed_query("Hello from WowData.Science")
print(len(response), type(response))
Save the file and run python main.py in the terminal. You will see the output 1536 <class 'list'>, which is the length and type of the response. For the sake of understanding, you can also print the response itself to the console: it is a list of 1,536 floats, the vector representation of the input query.
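If your profile is not named default, you can point the script at the right one by exporting AWS_PROFILE before running python main.py. A quick sketch (myprofile is a placeholder; substitute the profile name from your ~/.aws/config):

```shell
# Select a non-default AWS profile for this shell session
# ("myprofile" is a placeholder; use your own profile name)
export AWS_PROFILE=myprofile
echo "$AWS_PROFILE"
```

Then run python main.py as usual in the same terminal.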
Why the length is 1536?
We are using Amazon Bedrock's FM Titan Embeddings G1 - Text with base model ID amazon.titan-embed-text-v1. Its output vector length is 1536. Note also that its maximum input text is 8K tokens.
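Embedding vectors like these are typically compared with cosine similarity: the closer the score is to 1, the more similar the texts. The sketch below uses tiny made-up vectors in place of real 1536-dimensional Titan outputs, just to show the arithmetic:

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector norms
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Tiny illustrative vectors; real Titan embeddings have 1536 dimensions
v1 = [0.1, 0.3, -0.2]
v2 = [0.1, 0.25, -0.15]

print(round(cosine_similarity(v1, v2), 4))
```

In practice you would apply the same function to two vectors returned by embeddings.embed_query.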
Summary
In this tutorial, we learned how to set up an environment and generate embeddings via Amazon Bedrock's FM amazon.titan-embed-text-v1 using LangChain.