Market Segmentation and Lead Enrichment with LLMs

Market segmentation is an essential part of any go-to-market strategy. From a marketer's perspective, potential customers belong to the same segment if they:

Will use the same product
Can be reached by the same sales process
Will look at each other as references

Let’s say we have a list of companies, where each company is described by two attributes: Name and Location. We could obtain this list by hosting a webinar series and asking each attendee for the name of their company.

How do we segment this list?

For a small list, we could Google each company, visit their website, understand their business, and assign the company to one or more segments.

But what if our list has hundreds of companies?

Generative AI to the rescue!

Large language models can use tools such as Python and web search.

Ideally, we would give the list to an AI model and ask it to enrich the list with additional attributes by researching the companies online. Next, we would ask the model to segment the list using these attributes.

In reality, it’s easier said than done.

To begin with, the list is too big to fit in an LLM's working memory (its context window). We must break it down into chunks, have the LLM process each chunk, and collate the results.
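As a rough illustration of the chunking step, here is a minimal Python sketch. The function name chunk_records, the record fields, and the chunk size of 10 are assumptions made for this example; they are not taken from PyAQ.

```python
from typing import Iterator


def chunk_records(records: list[dict], chunk_size: int) -> Iterator[list[dict]]:
    """Yield consecutive slices of at most chunk_size records."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(records), chunk_size):
        yield records[start:start + chunk_size]


# Hypothetical input: 25 companies described by name and location.
companies = [{"name": f"Company {i}", "location": "Unknown"} for i in range(25)]

# Each chunk would be sent to the model in its own request.
chunks = list(chunk_records(companies, chunk_size=10))
```

In practice the right chunk size depends on the model's context window and on how much web research each record pulls in, so it is something to tune rather than hard-code.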

Second, capable models hosted by OpenAI and Microsoft are protected by rate limits. If we throw all our data at a model at once, we will exceed a rate limit and get a bunch of errors.
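One common way to stay under a request rate limit is a sliding-window throttle on the client side. The sketch below is an assumption about how this could be done in plain Python, not a description of how PyAQ does it. The class name RateLimiter and its parameters are hypothetical, and it only counts requests, not tokens.

```python
import time
from collections import deque


class RateLimiter:
    """Allow at most max_calls within any rolling window of `period` seconds."""

    def __init__(self, max_calls, period, clock=time.monotonic, sleep=time.sleep):
        if max_calls <= 0 or period <= 0:
            raise ValueError("max_calls and period must be positive")
        self.max_calls = max_calls
        self.period = period
        self.clock = clock      # injectable so the limiter can be tested
        self.sleep = sleep
        self.calls = deque()    # timestamps of recent calls, oldest first

    def acquire(self):
        """Block until one more call is allowed, then record it."""
        while True:
            now = self.clock()
            # Forget calls that have left the rolling window.
            while self.calls and now - self.calls[0] >= self.period:
                self.calls.popleft()
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return
            # Wait until the oldest recorded call leaves the window.
            self.sleep(self.period - (now - self.calls[0]))


# Hypothetical usage: at most 3 requests per 0.1 seconds.
limiter = RateLimiter(max_calls=3, period=0.1)
for _ in range(5):
    limiter.acquire()
    # The model request would be sent here.
```

Calling acquire() before every model request spreads the requests out instead of letting them fail with rate-limit errors.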

Third, models are getting better at producing results in a standard format, such as JSON or CSV. Still, with many records, we are almost guaranteed to get garbage from a model occasionally. When this happens, the best approach is to resubmit the request while pointing out the error to the model.
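To make the "resubmit while pointing out the error" idea concrete, here is a minimal sketch. The model is passed in as a plain callable, so no particular API is assumed. The names parse_records, ask_with_retries, and REQUIRED_KEYS, as well as the "industry" field, are hypothetical.

```python
import json
from typing import Callable

REQUIRED_KEYS = {"name", "industry"}  # hypothetical output schema


def parse_records(raw: str, expected: int) -> list[dict]:
    """Parse a model reply and raise ValueError if it is not usable."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, list):
        raise ValueError("expected a JSON array")
    if len(data) != expected:
        raise ValueError(f"expected {expected} records, got {len(data)}")
    for i, item in enumerate(data):
        if not isinstance(item, dict):
            raise ValueError(f"record {i} is not a JSON object")
        missing = REQUIRED_KEYS - item.keys()
        if missing:
            raise ValueError(f"record {i} is missing keys: {sorted(missing)}")
    return data


def ask_with_retries(ask_model: Callable[[str], str], prompt: str,
                     expected: int, max_attempts: int = 3) -> list[dict]:
    """Call the model, and on a bad reply resubmit with the error attached."""
    current_prompt = prompt
    for _ in range(max_attempts):
        raw = ask_model(current_prompt)
        try:
            return parse_records(raw, expected)
        except ValueError as err:
            current_prompt = (
                f"{prompt}\n\nYour previous reply was rejected: {err}\n"
                "Reply with a valid JSON array only."
            )
    raise RuntimeError(f"no valid reply after {max_attempts} attempts")


# A stand-in model that fails once, then returns valid JSON.
replies = iter(["Sure! Here you go: [..",
                '[{"name": "Acme", "industry": "Manufacturing"}]'])
prompts_seen = []


def fake_model(prompt: str) -> str:
    prompts_seen.append(prompt)
    return next(replies)


result = ask_with_retries(fake_model, "Describe Acme.", expected=1)
```

Capping the number of attempts matters: a record the model cannot handle should surface as an error rather than loop forever and burn through the rate limit.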

Slowly but surely, this is turning into a week-long programming project.

But don’t despair.

PyAQ, an open-source Generative AI platform from AnyQuest, takes care of this for you automatically:

Breaks down a long list of records into chunks digestible by a model
Maps each chunk to a separate worker for parallel processing
Equips each worker with access to LLMs, web search, and Python
Throttles requests to honor request and token rate limits
Verifies results produced by the model and resubmits the ones that failed
Collates and saves the results produced by multiple workers
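For readers curious what the map-and-collate part of such a pipeline might look like if written by hand, here is a small sketch using Python's standard thread pool. It is not PyAQ's implementation. The functions enrich_chunk and process_in_parallel are hypothetical, and enrich_chunk is a placeholder where the model and web search calls would go.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def enrich_chunk(chunk: list[dict]) -> list[dict]:
    """Placeholder worker: a real one would call an LLM and web search."""
    return [{**record, "segment": "unknown"} for record in chunk]


def process_in_parallel(chunks: list[list[dict]],
                        worker: Callable[[list[dict]], list[dict]],
                        max_workers: int = 4) -> list[dict]:
    """Run the worker on every chunk concurrently and flatten the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the same order as the input chunks.
        results = pool.map(worker, chunks)
        return [record for chunk_result in results for record in chunk_result]


chunks = [[{"name": "A"}, {"name": "B"}], [{"name": "C"}]]
enriched = process_in_parallel(chunks, enrich_chunk)
```

Threads are a reasonable fit here because the workers spend most of their time waiting on network calls rather than computing.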

Thanks to these built-in services, you only need to do the fun part: craft a prompt instructing the model to search the web and gather information about the companies on the list.

That is exactly what I did. Check out my super low-code example here:

https://github.com/anyquest/pyaq/blob/main/examples/apps/companies.yml

Reach out if you have a list or two that needs to be “enriched.” We are here to help.