Callbacks
The lowest-level way to stream outputs from LLMs in LangChain is via the callbacks system. You can pass a callback handler that handles the `on_llm_new_token` event into LangChain components. When that component is invoked, any LLM or chat model it contains calls the callback with each newly generated token. Within the callback, you can pipe the tokens to some other destination, e.g. an HTTP response. You can also handle the `on_llm_end` event to perform any necessary cleanup.
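The flow above can be sketched in plain Python. This is a minimal, dependency-free stand-in: the handler class, the fake model, and the token sequence are all illustrative; a real handler would subclass `BaseCallbackHandler` from `langchain_core.callbacks` and be passed to a component via its `callbacks` argument.

```python
class StreamingHandler:
    """Collects tokens as they arrive and notes when generation ends."""

    def __init__(self):
        self.tokens = []
        self.finished = False

    def on_llm_new_token(self, token, **kwargs):
        # Called once per generated token; pipe it wherever you need,
        # e.g. write it into an HTTP response.
        self.tokens.append(token)

    def on_llm_end(self, response, **kwargs):
        # Called when generation completes; do any cleanup here.
        self.finished = True


def fake_llm_invoke(prompt, callbacks):
    # Hypothetical stand-in for a chat model invocation: it emits tokens
    # one at a time, firing the handler hooks the way a real model would.
    tokens = ["Hello", ",", " world", "!"]
    for token in tokens:
        for cb in callbacks:
            cb.on_llm_new_token(token)
    result = "".join(tokens)
    for cb in callbacks:
        cb.on_llm_end(result)
    return result


handler = StreamingHandler()
output = fake_llm_invoke("Say hello", callbacks=[handler])
```

After the call, `handler.tokens` holds every token in order and `handler.finished` is set, while `output` holds the same text the callbacks already delivered piece by piece.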
See the how-to section on callbacks for more specifics.
Callbacks were the first technique for streaming introduced in LangChain. While powerful and generalizable, they can be unwieldy for developers. For example:
- You need to explicitly initialize and manage some aggregator or other stream to collect results.
- The execution order isn't explicitly guaranteed, and a callback could theoretically run after the `.invoke()` method finishes.
- Providers would often make you pass an additional parameter to stream outputs instead of returning them all at once.
- You would often ignore the result of the actual model call in favor of callback results.
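The first and last drawbacks above can be made concrete with a sketch (all names here are hypothetical stand-ins, not LangChain APIs): the caller must create and manage its own aggregator, and the return value of the invocation is typically discarded in favor of what the callback collected.

```python
import queue


class QueueHandler:
    """Pushes each token into a queue that some other consumer drains."""

    def __init__(self, q):
        self.q = q

    def on_llm_new_token(self, token, **kwargs):
        self.q.put(token)

    def on_llm_end(self, response, **kwargs):
        self.q.put(None)  # sentinel: no more tokens coming


def pretend_invoke(callbacks):
    # Hypothetical stand-in for component.invoke(...): fires the callbacks,
    # then returns the full result, which a streaming caller usually ignores.
    for tok in ["a", "b", "c"]:
        for cb in callbacks:
            cb.on_llm_new_token(tok)
    for cb in callbacks:
        cb.on_llm_end("abc")
    return "abc"


token_queue = queue.Queue()  # the aggregator you initialize and manage yourself
_ = pretend_invoke([QueueHandler(token_queue)])  # return value ignored

streamed = []
while (tok := token_queue.get()) is not None:
    streamed.append(tok)
```

Note the shape of the problem: the useful output arrives through `token_queue`, a side channel you had to set up, while the direct result of the call is thrown away.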