LLM Frameworks
LangChain
- Powerful but heavyweight LLM framework
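| from langchain_openai import ChatOpenAI
# Example prompt, defined here so the snippet runs on its own
tell_a_joke = "Tell me a joke."
llm = ChatOpenAI(model="gpt-5-mini")
# invoke() returns a message object; the text is in .content
response = llm.invoke(tell_a_joke)
print(response.content)
|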
LiteLLM
- Lightweight LLM framework
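| from litellm import completion
# MODEL and messages are example values; substitute your own
MODEL = "openai/gpt-5-mini"
messages = [{"role": "user", "content": "Tell me a joke."}]
response = completion(model=MODEL, messages=messages)
response_content = response.choices[0].message.content
print(response_content)
print(f"Input Tokens: {response.usage.prompt_tokens}")
print(f"Output Tokens: {response.usage.completion_tokens}")
print(f"Total Tokens: {response.usage.total_tokens}")
# response_cost is reported in USD; multiply by 100 to show cents
print(f"Total Cost: {response._hidden_params['response_cost']*100:.4f} cents")
|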
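The usage fields printed above can be gathered into one helper. The following is a minimal sketch, not part of LiteLLM: `summarize_usage` is a hypothetical name, and the code assumes the cost entry may be absent or `None` for some models (for example, ones without known pricing), in which case it reports no cost instead of failing. A stand-in object is used so the sketch runs without an API call.

```python
from types import SimpleNamespace


def summarize_usage(response):
    """Collect token counts and cost (in cents) from a completion response.

    Hypothetical helper: assumes the response exposes .usage and,
    optionally, a _hidden_params dict with a "response_cost" in USD.
    """
    usage = response.usage
    hidden = getattr(response, "_hidden_params", None) or {}
    cost_usd = hidden.get("response_cost")
    return {
        "input_tokens": usage.prompt_tokens,
        "output_tokens": usage.completion_tokens,
        "total_tokens": usage.total_tokens,
        # None means no cost was reported for this model
        "cost_cents": None if cost_usd is None else cost_usd * 100,
    }


# Stand-in for a real response, with made-up numbers
fake_response = SimpleNamespace(
    usage=SimpleNamespace(prompt_tokens=12, completion_tokens=30, total_tokens=42),
    _hidden_params={"response_cost": 0.00021},
)
summary = summarize_usage(fake_response)
print(summary)
```

With a real call, `summarize_usage(completion(model=MODEL, messages=messages))` would be used the same way.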
Prompt Caching
- Prompt caching can reduce the cost of repeated input tokens by up to 10x, though enabling it carries a small additional charge.
| from litellm import completion
# Load the full text and locate the passage the question refers to
with open("files/shakespeare.txt", "r", encoding="utf-8") as f:
    shakespeare = f.read()
loc = shakespeare.find("what answer made the belly?")
print(shakespeare[loc:loc+100])
question = [{"role": "user", "content": "In the Shakespeare text, when a citizen asks 'what answer made the belly?', what is the reply from MENENIUS?"}]
response = completion(model="ollama/gpt-oss", messages=question)
print(response.choices[0].message.content)
print(f"Input Tokens: {response.usage.prompt_tokens}")
print(f"Output Tokens: {response.usage.completion_tokens}")
print(f"Total Tokens: {response.usage.total_tokens}")
# response_cost is reported in USD; multiply by 100 to show cents
print(f"Total Cost: {response._hidden_params['response_cost']*100:.4f} cents")
|
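The trade-off between the extra charge and the savings can be worked through with simple arithmetic. This is a sketch under stated assumptions: the price per token, the 1.25x cache-write multiplier, and the 0.10x cache-read multiplier are illustrative values, not any provider's actual pricing, and `prefix_cost` is a hypothetical helper.

```python
def prefix_cost(prefix_tokens, n_requests, price_per_token,
                cached, write_multiplier=1.25, read_multiplier=0.10):
    """Cost in USD of sending the same prompt prefix n_requests times.

    All multipliers are assumed, illustrative values.
    """
    base = prefix_tokens * price_per_token
    if not cached:
        # Every request pays full price for the prefix
        return base * n_requests
    # First request writes the cache (slightly more expensive),
    # later requests read from it (much cheaper)
    return base * write_multiplier + base * read_multiplier * (n_requests - 1)


PRICE = 1e-6  # assumed: 1 USD per million input tokens
uncached = prefix_cost(100_000, 20, PRICE, cached=False)
cached = prefix_cost(100_000, 20, PRICE, cached=True)
single = prefix_cost(100_000, 1, PRICE, cached=True)

print(f"20 requests, no cache:   ${uncached:.3f}")
print(f"20 requests, with cache: ${cached:.3f}")
print(f"1 request, with cache:   ${single:.3f}")
```

Under these assumed numbers, caching pays off once the prefix is reused; a single request costs slightly more with caching enabled than without, and the saving approaches the read discount only as the number of requests grows.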
Note
Always place the question or other variable text at the end of the prompt. Caching matches on the prompt prefix, so the leading text must be identical across requests for the cache to be hit.
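The ordering described in the note can be sketched as follows. `build_messages` is a hypothetical helper, and the instruction wording and sample text are placeholders; the point is only that the large, unchanging document comes first and the per-request question comes last.

```python
def build_messages(document, question):
    """Put the static document first and the variable question last."""
    return [
        # Identical in every request, so it can serve as the cached prefix
        {"role": "system",
         "content": "Answer using only the text below.\n\n" + document},
        # Changes per request, so it goes at the end
        {"role": "user", "content": question},
    ]


document = "...full text of the document..."  # placeholder for the real file contents
first = build_messages(document, "What answer made the belly?")
second = build_messages(document, "Who is MENENIUS speaking to?")

# The leading message is byte-for-byte identical across both requests
print(first[0] == second[0])
print(first[-1]["content"])
print(second[-1]["content"])
```

Either list could then be passed as `messages` to `completion`. Whether a cache is actually used depends on the provider and model.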