Automating Workflows with Multi-modal AI and PyAQ

If you were impressed by the first generation of large language models, wait till you interact with a multi-modal one, such as Gemini, GPT4V, or LlaVA. In addition to text, these models can process and understand images, videos, and speech, which presents new opportunities for workflow automation.

If you were impressed by the first generation of large language models, wait till you interact with a multi-modal one, such as Gemini, GPT4V, or LlaVA.

In addition to text, these models can process and understand images, videos, and speech, which presents new opportunities for workflow automation.

Take, for instance, the vehicle damage assessment report that must be completed by an insurance adjuster at the place of an automobile accident. The report is used to estimate repair costs and determine whether a vehicle is repairable or considered a total loss.

The current process consists of the adjuster taking several pictures at the place of the accident and typing up the report containing the description of the damage, cause of the damage, assessment of repairability, cost estimate, an estimated timeline for completing the repairs, and other information.

With a multi-modal large language model, many of these steps can be automated. To illustrate this point, let’s build a simple app in PyAQ, a low-code platform for cognitive apps by AnyQuest.

As the first step, we must give the app an insurance adjuster persona. We accomplish this by entering the corresponding instructions in the header of the app:

info:
  id: aq.apps.accident_report
  version: 1.0.0
  title: Accident Report Application
  profile: >
    You are an auto insurance adjuster. 
    Your job is to assess the vehicle damage at the site of an accident.

Next, we must instruct the app to use a multi-modal model:

models:
  gpt4v:
    model: gpt-4-vision-preview
    provider: openai

At the level of abstraction provided by PyAQ, replacing this model with a different one is a matter of changing a couple of lines.

We strongly advise experimenting with models from different providers. Their relative price/performance varies broadly from use case to use case.

Finally, we need to provide a flow of tasks used to create the report:

  1. Read an image file
  2. Prompt the model to generate a damage assessment
  3. Save the report as an HTML document

In PyAQ, this looks as follows:

activities:
  read_image:
    type: read

  report_damage:
    type: generate
    inputs:
      - activity: read_image
    models:
      - gpt4v
    parameters:
      temperature: 0.5
      prompt: >
        Provide a detailed description of the damage 
        to the vehicle in the attached image.

  write_answer:
    type: write
    inputs:
      - activity: read_image
      - activity: report_damage
    parameters:
      format: html
      template: |
        ## DAMAGE REPORT
        ![Vehicle]({{read_image}})
        
        {{report_damage}}

In this example, we create a description of the damage based on one image.

The app can be easily extended to consume multiple images and launch several instances of the “generate” activity to fill out the various sections of the damage assessment report. In addition, we could place the insurance company policies and guidelines into the app's memory to guide the generation process.

If we run this app on our car accident image, we get a rather detailed description of the damage that would make any insurance adjuster proud.

To summarize, with just a few lines of code in PyAQ, we created an app that generates a detailed damage assessment report by inspecting an image of a car accident. This app can now be deployed on AnyQuest and invoked from an insurance workflow process automated on ServiceNow or any other platform.

You can find the source code for this and many other examples at https://github.com/anyquest/pyaq.git.

Gen AI in 2025: The Year of Quantity

What can we expect from Gen AI next year? For most of 2024, the narrative has been about quality. Models are measured and compared on a range of qualitative and quantitative benchmarks. In almost every dimension, the difference in performance among frontier models is now within a tenth of a

Optimizing Business Processes with Gen AI and Use Case Crowdsourcing

Gen AI is set to revolutionize business by automating numerous steps in enterprise processes. But where should one begin? We suggest establishing a robust cycle of Ideation, Prioritization, Testing, and Analysis. Here are some typical errors and best practices for the Ideation stage.

Can GPT-4 Outsmart Wall Street Stock Pickers?

I provided a large language model with last year's market outlook reports and asked it to generate aggressive growth portfolios. Next, I backtested the portfolios against historical market data. Here are the results.