Vision Atomic Flow¶
Prerequisite: Chat Atomic Flow
Definition¶
The VisionAtomicFlow is a flow that seamlessly interfaces with an LLM through an API. Given a textual input together with a set of images and/or videos, it generates a textual output. Powered by the LiteLLM library in the backend, VisionAtomicFlow supports various API providers; explore the full list here. For an extensive description of its parameters, refer to its FlowCard.
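For intuition, here is a hedged sketch of the kind of input data a VisionAtomicFlow might receive. The "data", "images", and "video" keys are the ones read by get_user_message (shown further down this page); the "question" key and the exact shape of each image entry (a plain URL or local-path string below) are assumptions that depend on your flow configuration and prompt templates.
# Illustrative input payload for a VisionAtomicFlow (hedged sketch).
# "question" is a hypothetical prompt-template field; "data"/"images"/"video"
# are the keys consumed by get_user_message.
input_data = {
    "question": "What is shown in these images?",
    "data": {
        "images": [
            "https://example.com/cat.png",   # a remote image referenced by URL
            "local_images/dog.png",          # or the path to a local image
        ],
        # A single local video may also be supplied under the "video" key
        # (handled by get_video).
    },
}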
Methods¶
In this section, we'll delve into some of the methods within the VisionAtomicFlow class, specifically those invoked when it is called. If you examine the VisionAtomicFlow class, you'll observe the following:
- It's a class that inherits from ChatAtomicFlow.
- There is no run method explicitly defined, so it shares the same run method as ChatAtomicFlow, which is the method always called when a flow is invoked.
Here is the run method of VisionAtomicFlow:
def run(self, input_message: FlowMessage):
    """ This method runs the flow. It processes the input, calls the backend and updates the state of the flow.

    :param input_message: The input data of the flow.
    :type input_message: aiflows.messages.FlowMessage
    """
    input_data = input_message.data

    response = self.query_llm(input_data=input_data)

    reply_message = self.package_output_message(
        input_message,
        response={"api_output": response}
    )

    self.send_message(reply_message)
In the provided code snippet, observe that the run method handles the flow's input data through the _process_input method (invoked as part of query_llm). Let's take a closer look at its functionality:
def _process_input(self, input_data: Dict[str, Any]):
    """ This method processes the input data (prepares the messages to send to the API).

    :param input_data: The input data.
    :type input_data: Dict[str, Any]
    :return: The processed input data.
    :rtype: Dict[str, Any]
    """
    if self._is_conversation_initialized():
        # Construct the message using the human message prompt template
        user_message_content = self.get_user_message(self.human_message_prompt_template, input_data)

    else:
        # Initialize the conversation (add the system message, and potentially the demonstrations)
        self._initialize_conversation(input_data)
        if getattr(self, "init_human_message_prompt_template", None) is not None:
            # Construct the message using the query message prompt template
            user_message_content = self.get_user_message(self.init_human_message_prompt_template, input_data)
        else:
            user_message_content = self.get_user_message(self.human_message_prompt_template, input_data)

    self._state_update_add_chat_message(role=self.flow_config["user_name"],
                                        content=user_message_content)
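To build intuition about what _process_input leaves behind, here is a hedged sketch of the chat state after the first call, assuming a system prompt and a single user turn with one image. The exact internal representation of the chat history is an aiFlows implementation detail; the content layout below simply follows the OpenAI-style multimodal message convention used by LiteLLM.
# Illustrative only: a possible chat state after the first _process_input call.
# The role of the user turn comes from flow_config["user_name"].
chat_history = [
    {"role": "system", "content": "You are a helpful visual assistant."},
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What is shown in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
        ],
    },
]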
When calling _process_input(input_data) in VisionAtomicFlow, the flow constructs its user message prompt similarly to ChatAtomicFlow (refer to ChatAtomicFlow's detailed example). However, due to a slight modification of the get_user_message method compared to ChatAtomicFlow, the message also includes one or more images and/or videos from the input.
@staticmethod
def get_user_message(prompt_template, input_data: Dict[str, Any]):
    """ This method constructs the user message to be passed to the API.

    :param prompt_template: The prompt template to use.
    :type prompt_template: PromptTemplate
    :param input_data: The input data.
    :type input_data: Dict[str, Any]
    :return: The constructed user message (images, videos and text).
    :rtype: Dict[str, Any]
    """
    content = VisionAtomicFlow._get_message(prompt_template=prompt_template, input_data=input_data)
    media_data = input_data["data"]

    if "video" in media_data:
        content = [content[0], *VisionAtomicFlow.get_video(media_data["video"])]

    if "images" in media_data:
        images = [VisionAtomicFlow.get_image(image) for image in media_data["images"]]
        content.extend(images)

    return content
Note that images can be passed either via a URL (an image on the internet) or via the path to a local image. Videos, however, must be local files.
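For local images, vision endpoints reached through LiteLLM typically expect the file to be inlined as a base64 data URL rather than a filesystem path; VisionAtomicFlow's get_image presumably performs an equivalent conversion internally. The snippet below is a generic, hypothetical helper (not aiFlows code) showing what that encoding step looks like.
import base64
import mimetypes

def image_to_data_url(path: str) -> str:
    """Hypothetical helper: encode a local image file as a base64 data URL."""
    mime_type, _ = mimetypes.guess_type(path)
    mime_type = mime_type or "image/jpeg"
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"

# The resulting string can then stand in wherever an image URL is expected, e.g.
# {"type": "image_url", "image_url": {"url": image_to_data_url("local_images/dog.png")}}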
Finally, the run method calls the LLM via the LiteLLM library, saves the message in its flow state, and sends the textual output to the next flow.
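Since run packages the LLM's answer under the "api_output" key (see response={"api_output": response} above), a downstream flow would typically read the reply along these lines; process_reply is a hypothetical handler, not part of the aiFlows API.
# Hypothetical downstream handler: the next flow receives the reply message
# and reads the textual LLM output from the "api_output" field.
def process_reply(reply_message: FlowMessage):
    api_output = reply_message.data["api_output"]
    print(api_output)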
Additional Documentation:
- To delve into the extensive documentation for VisionAtomicFlow, refer to its FlowCard on the FlowVerse
- Find ChatAtomicFlow's default configuration here