VectorStore-backed memory¶
The support for the Cassandra vector store, available in LangChain, enables another interesting use case: a chat memory buffer that injects the most relevant past exchanges into the prompt, rather than the most recent ones (as most other memories do). This makes it possible to retrieve related context from arbitrarily far back in the chat history.
All you need is to instantiate a Cassandra vector store and wrap it in a VectorStoreRetrieverMemory type of memory, provided by LangChain.
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.memory import VectorStoreRetrieverMemory
from langchain.chains import ConversationChain
from langchain.prompts import PromptTemplate
from langchain.vectorstores.cassandra import Cassandra
As usual, a database connection is needed to access Cassandra. The following assumes that a vector-search-capable Astra DB instance is available. Adjust as needed.
from cqlsession import getCQLSession, getCQLKeyspace
cqlMode = 'astra_db' # 'astra_db'/'local'
session = getCQLSession(mode=cqlMode)
keyspace = getCQLKeyspace(mode=cqlMode)
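For reference, here is a minimal sketch of what the cqlsession helper module might look like, assuming Astra DB credentials (secure connect bundle, application token, keyspace) are supplied through environment variables; the actual module shipped alongside these notebooks may differ:
# hypothetical cqlsession.py -- a sketch, not necessarily the notebooks' actual helper
import os

from cassandra.auth import PlainTextAuthProvider
from cassandra.cluster import Cluster


def getCQLSession(mode='astra_db'):
    if mode == 'astra_db':
        # Connect to Astra DB with a secure-connect bundle and a token
        # (the ASTRA_DB_* variable names are assumptions for this sketch)
        cluster = Cluster(
            cloud={'secure_connect_bundle': os.environ['ASTRA_DB_SECURE_BUNDLE_PATH']},
            auth_provider=PlainTextAuthProvider(
                'token',
                os.environ['ASTRA_DB_APPLICATION_TOKEN'],
            ),
        )
        return cluster.connect()
    elif mode == 'local':
        # a locally running, vector-search-capable Cassandra node
        return Cluster(['127.0.0.1']).connect()
    else:
        raise ValueError('Unsupported CQL session mode.')


def getCQLKeyspace(mode='astra_db'):
    if mode == 'astra_db':
        return os.environ['ASTRA_DB_KEYSPACE']
    else:
        return os.environ.get('LOCAL_KEYSPACE', 'demo_keyspace')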
Both an LLM and an embedding function are required.
Below is the logic to instantiate the LLM and embeddings of choice. We choose to leave it in the notebooks for clarity.
from llm_choice import suggestLLMProvider
llmProvider = suggestLLMProvider()
# (Alternatively set llmProvider to 'GCP_VertexAI', 'OpenAI' ... manually if you have credentials)
if llmProvider == 'GCP_VertexAI':
    from langchain.llms import VertexAI
    from langchain.embeddings import VertexAIEmbeddings
    llm = VertexAI()
    myEmbedding = VertexAIEmbeddings()
    print('LLM+embeddings from VertexAI')
elif llmProvider == 'OpenAI':
    from langchain.llms import OpenAI
    from langchain.embeddings import OpenAIEmbeddings
    llm = OpenAI(temperature=0)
    myEmbedding = OpenAIEmbeddings()
    print('LLM+embeddings from OpenAI')
else:
    raise ValueError('Unknown LLM provider.')
LLM+embeddings from VertexAI
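For completeness, here is a plausible sketch of the suggestLLMProvider helper, under the assumption that it simply inspects the environment for available credentials; the real llm_choice module may do more:
# hypothetical llm_choice.py -- a sketch, not necessarily the notebooks' actual helper
import os


def suggestLLMProvider():
    # prefer Vertex AI if Google credentials are set, otherwise fall back to OpenAI
    if os.environ.get('GOOGLE_APPLICATION_CREDENTIALS'):
        return 'GCP_VertexAI'
    elif os.environ.get('OPENAI_API_KEY'):
        return 'OpenAI'
    else:
        raise ValueError('No LLM credentials found in the environment.')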
Create the store¶
table_name = 'vstore_memory_' + llmProvider
cassVStore = Cassandra(
    session=session,
    keyspace=keyspace,
    table_name=table_name,
    embedding=myEmbedding,
)
# just in case this demo runs multiple times
cassVStore.clear()
Create the retriever and the memory¶
From the vector store, a "retriever" is spawned. The number of items to fetch is kept intentionally very small for demonstration purposes.
Next, the retriever is wrapped in a VectorStoreRetrieverMemory:
retriever = cassVStore.as_retriever(search_kwargs={'k': 3})
semanticMemory = VectorStoreRetrieverMemory(retriever=retriever)
Create a fake "past conversation". Note how the topic of the discussion wanders to fixing one's PC in the last few exchanges:
pastExchanges = [
    (
        {"input": "Hello, what is the biggest mammal?"},
        {"output": "The blue whale."},
    ),
    (
        {"input": "... I cannot swim. Actually I hate swimming!"},
        {"output": "I see."},
    ),
    (
        {"input": "I like mountains and beech forests."},
        {"output": "That's good to know."},
    ),
    (
        {"input": "Yes, too much water makes me uneasy."},
        {"output": "Ah, how come?."},
    ),
    (
        {"input": "I guess I am just not a seaside person"},
        {"output": "I see. How may I help you?"},
    ),
    (
        {"input": "I need help installing this driver"},
        {"output": "First download the right version for your operating system."},
    ),
    (
        {"input": "Good grief ... my keyboard does not work anymore!"},
        {"output": "Try plugging it in your PC first."},
    ),
]
Insert these exchanges into the memory:
for exI, exO in pastExchanges:
    semanticMemory.save_context(exI, exO)
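As an optional sanity check (not part of the original flow), you can query the underlying vector store directly and see that each saved exchange is stored as a single text fragment in "input:"/"output:" form:
# peek at how the memory persisted the exchanges (the query string here is arbitrary)
for doc in cassVStore.similarity_search("swimming", k=2):
    print(doc.page_content)
    print('---')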
Given a conversation input, the load_memory_variables method performs a semantic search and comes up with relevant items from the memory, regardless of their order:
QUESTION = "Can you suggest me a sport to try?"
print(semanticMemory.load_memory_variables({"prompt": QUESTION})["history"])
input: I guess I am just not a seaside person
output: I see. How may I help you?
input: ... I cannot swim. Actually I hate swimming!
output: I see.
input: I like mountains and beech forests.
output: That's good to know.
Usage in a conversation chain¶
This semantic memory element can be used within a full conversation chain.
In the following you'll create a custom prompt and a ConversationChain out of it, attaching the latter to the vector-store-powered memory seen above:
semanticMemoryTemplateString = """The following is a conversation between a human and a helpful AI.
The AI is talkative and provides lots of specific details from its context.
If the AI does not know the answer to a question, it truthfully says it does not know.
The AI can use information from parts of the previous conversation (only if they are relevant):
{history}
Current conversation:
Human: {input}
AI:"""
memoryPrompt = PromptTemplate(
    input_variables=["history", "input"],
    template=semanticMemoryTemplateString,
)
conversationWithVectorRetrieval = ConversationChain(
    llm=llm,
    prompt=memoryPrompt,
    memory=semanticMemory,
    verbose=True,
)
Run the chain with the sports question:
conversationWithVectorRetrieval.predict(input=QUESTION)
> Entering new ConversationChain chain...
Prompt after formatting:
The following is a conversation between a human and a helpful AI.
The AI is talkative and provides lots of specific details from its context.
If the AI does not know the answer to a question, it truthfully says it does not know.
The AI can use information from parts of the previous conversation (only if they are relevant):

input: I guess I am just not a seaside person
output: I see. How may I help you?
input: ... I cannot swim. Actually I hate swimming!
output: I see.
input: I like mountains and beech forests.
output: That's good to know.

Current conversation:
Human: Can you suggest me a sport to try?
AI:

> Finished chain.
'I see. How about hiking? It is a great way to explore the mountains and beech forests.'
Notice how new exchanges are automatically added to the memory:
conversationWithVectorRetrieval.predict(input="Would I like a swim in a mountain lake?")
> Entering new ConversationChain chain...
Prompt after formatting:
The following is a conversation between a human and a helpful AI.
The AI is talkative and provides lots of specific details from its context.
If the AI does not know the answer to a question, it truthfully says it does not know.
The AI can use information from parts of the previous conversation (only if they are relevant):

input: Can you suggest me a sport to try?
response: I see. How about hiking? It is a great way to explore the mountains and beech forests.
input: I guess I am just not a seaside person
output: I see. How may I help you?
input: ... I cannot swim. Actually I hate swimming!
output: I see.

Current conversation:
Human: Would I like a swim in a mountain lake?
AI:

> Finished chain.
'I see. I am not sure if you would like a swim in a mountain lake. I know that some people enjoy swimming in mountain lakes, but I also know that some people do not enjoy swimming in mountain lakes. I think it would be best for you to decide for yourself if you would like to swim in a mountain lake.'
... so that the most relevant items for the same question have now changed:
semanticMemory.retriever.get_relevant_documents(QUESTION)
[Document(page_content='input: Can you suggest me a sport to try?\nresponse: I see. How about hiking? It is a great way to explore the mountains and beech forests.', metadata={}),
 Document(page_content='input: Would I like a swim in a mountain lake?\nresponse: I see. I am not sure if you would like a swim in a mountain lake. I know that some people enjoy swimming in mountain lakes, but I also know that some people do not enjoy swimming in mountain lakes. I think it would be best for you to decide for yourself if you would like to swim in a mountain lake.', metadata={}),
 Document(page_content='input: I guess I am just not a seaside person\noutput: I see. How may I help you?', metadata={})]
A counterexample¶
What would happen with a simpler memory element, one that just retrieves a fixed number of the most recent interactions?
Create and populate an instance of LangChain's ConversationTokenBufferMemory, limiting it to a maximum token length of 80 (roughly equivalent to the 3 fragments set for the semanticMemory object):
from langchain.memory import ConversationTokenBufferMemory
from langchain.memory import ChatMessageHistory
baseHistory = ChatMessageHistory()
recencyBufferMemory = ConversationTokenBufferMemory(
    chat_memory=baseHistory,
    max_token_limit=80,
    llm=llm,
)
for exI, exO in pastExchanges:
    recencyBufferMemory.save_context(exI, exO)
Time to ask the same sports question. This is what will get injected into the prompt this time:
print(recencyBufferMemory.load_memory_variables({"prompt": QUESTION})["history"])
AI: Ah, how come?.
Human: I guess I am just not a seaside person
AI: I see. How may I help you?
Human: I need help installing this driver
AI: First download the right version for your operating system.
Human: Good grief ... my keyboard does not work anymore!
AI: Try plugging it in your PC first.
... and this is the (rather generic) answer you'd get:
conversationWithRecencyRetrieval = ConversationChain(
    llm=llm,
    prompt=memoryPrompt,
    memory=recencyBufferMemory,
    verbose=True,
)
conversationWithRecencyRetrieval.predict(input=QUESTION)
> Entering new ConversationChain chain...
Prompt after formatting:
The following is a conversation between a human and a helpful AI.
The AI is talkative and provides lots of specific details from its context.
If the AI does not know the answer to a question, it truthfully says it does not know.
The AI can use information from parts of the previous conversation (only if they are relevant):

AI: Ah, how come?.
Human: I guess I am just not a seaside person
AI: I see. How may I help you?
Human: I need help installing this driver
AI: First download the right version for your operating system.
Human: Good grief ... my keyboard does not work anymore!
AI: Try plugging it in your PC first.

Current conversation:
Human: Can you suggest me a sport to try?
AI:

> Finished chain.
'Sure, there are many different sports to choose from. What are you interested in?'