OpenAI vs DeepSeek Showdown. Judged by Claude.
Both OpenAI O1 and DeepSeek R1 are now available on AnyQuest.
We compare their performance by asking the models to run an identical agentic workflow. The workflow uses popular business strategy frameworks to research a company, in this case, NVIDIA.
OpenAI was represented by the o1-preview model (O1 below). DeepSeek was represented by the deepseek-r1-distill-llama-70b model (R1 below), deployed on Groq Cloud.
O1 is about three times the size of the version of R1 I am testing, but the comparison is instructive nevertheless. How does a less expensive, smaller open-source model compare to a much more costly and larger closed-source model?
The AnyQuest agent used in the experiment was defined as follows:
The Start activity accepts the name and website URL of the company.
The Gen_1 and Gen_7 activities are meta-prompting activities. They generate prompts for downstream activities. For comparison, here are the political analysis prompts produced by O1 and R1, respectively.
Aspect | O1 | R1 |
---|---|---|
General Instructions | Analyze the Political Factors affecting NVIDIA Corporation. | Analyze the political environment affecting NVIDIA, focusing on government regulations, trade policies, and geopolitical tensions. |
Role and Persona | Experienced business analyst specializing in political risk assessment for technology companies. | Political Analyst |
In-Context Knowledge | Consider recent political developments, government policies, international relations, trade agreements, and any political events that may impact NVIDIA's operations and strategy over the next five years. | Consider global operations and potential impacts on supply chain and market access. |
Style Instructions | Provide detailed examples and specific impacts on the company's operations and strategy. Incorporate specific market data, trends, or regulatory changes relevant to NVIDIA and the semiconductor industry. | Provide detailed examples and specific impacts. |
Formatting Instructions | Use Markdown to format your answer. Provide the third-level title for your section, starting with '### Political Factors'. Use standard markdown syntax for headings, lists, bold text, etc. Ensure each list item starts without additional spaces unless it is nested. | Use third-level headings for sections. |
Directive | Produce a detailed analysis of the political factors affecting NVIDIA, including challenges and opportunities, and provide recommendations on how the company can adapt its strategy. | Identify challenges and opportunities, include recent news references. |
The models were asked to create 10 prompts, six for PESTEL and four for Porter.
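To make the structure of these generated prompts concrete, here is a minimal Python sketch that assembles one prompt from the six aspects in the table. It is not AnyQuest's implementation: the `render_prompt` function and the output layout are assumptions made for illustration. The example values are R1's political analysis prompt, copied from the table above.

```python
# The six aspects, in the order used by the comparison table.
ASPECTS = [
    "General Instructions",
    "Role and Persona",
    "In-Context Knowledge",
    "Style Instructions",
    "Formatting Instructions",
    "Directive",
]


def render_prompt(aspects: dict) -> str:
    """Join the six aspects into one prompt string (hypothetical layout)."""
    missing = [name for name in ASPECTS if name not in aspects]
    if missing:
        raise ValueError(f"missing aspects: {missing}")
    return "\n\n".join(f"{name}:\n{aspects[name]}" for name in ASPECTS)


# R1's political analysis prompt, taken from the table.
r1_political = {
    "General Instructions": (
        "Analyze the political environment affecting NVIDIA, focusing on "
        "government regulations, trade policies, and geopolitical tensions."
    ),
    "Role and Persona": "Political Analyst",
    "In-Context Knowledge": (
        "Consider global operations and potential impacts on supply chain "
        "and market access."
    ),
    "Style Instructions": "Provide detailed examples and specific impacts.",
    "Formatting Instructions": "Use third-level headings for sections.",
    "Directive": (
        "Identify challenges and opportunities, include recent news references."
    ),
}

print(render_prompt(r1_political))
```

Keeping the aspects as separate fields makes it easy to compare what each model filled in, which is how the table above was built.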
R1 and O1 do not support function calling. They are pure reasoning models. Therefore, the actual research was done by the GPT-4o mini model equipped with tools for web search and scraping.
Finally, O1 and R1 were asked to summarize the results.
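The division of labor described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `run_workflow`, the `Model` callable type, and the stub models are hypothetical and are not the AnyQuest API. The four Porter entries are placeholders, because this post does not list which four topics were used.

```python
from typing import Callable, Dict, List

# Hypothetical model interface: a prompt goes in, text comes out.
Model = Callable[[str], str]


def run_workflow(company: str, url: str, topics: Dict[str, List[str]],
                 reasoner: Model, researcher: Model) -> str:
    """Meta-prompt with the reasoner, research with the tool-using model,
    then summarize with the reasoner again."""
    findings = []
    for framework, items in topics.items():
        for item in items:
            meta = (f"Write a research prompt covering '{item}' in a "
                    f"{framework} analysis of {company} ({url}).")
            prompt = reasoner(meta)               # meta-prompting step
            findings.append(researcher(prompt))   # web search and scraping step
    return reasoner("Summarize these findings:\n\n" + "\n\n".join(findings))


TOPICS = {
    "PESTEL": ["Political", "Economic", "Social",
               "Technological", "Environmental", "Legal"],
    # Placeholders: the post does not name the four Porter topics.
    "Porter": [f"topic {i}" for i in range(1, 5)],
}


def stub_reasoner(prompt: str) -> str:
    """Stand-in for O1 or R1; a real run would call the model here."""
    return f"[reasoner] {prompt[:40]}"


def stub_researcher(prompt: str) -> str:
    """Stand-in for the tool-equipped GPT-4o mini step."""
    return f"[research] {prompt[:40]}"


report = run_workflow("NVIDIA", "https://www.nvidia.com", TOPICS,
                      stub_reasoner, stub_researcher)
print(report)
```

With ten topics, the reasoning model is called eleven times (ten prompts plus one summary) and the research model ten times, so swapping O1 for R1 changes only the `reasoner` argument.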
Here they are, in their full glory:
O1: https://demo.anyquest.ai/c/o1-company-research-ybtgfzva
R1: https://demo.anyquest.ai/c/deepseek-company-research-kshxafff
You can see the reasoning steps performed by R1 between the <think></think> tags. O1 does not show its thinking process.
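If you want to read R1's answer without the reasoning, the two can be separated with a regular expression. The sketch below assumes the output wraps its reasoning in literal <think> and </think> tags, as described above. The function name and the sample text are made up for illustration.

```python
import re

# Non-greedy match so that several <think> blocks are handled separately.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str):
    """Return (list of reasoning blocks, answer with those blocks removed)."""
    reasoning = [block.strip() for block in THINK_RE.findall(text)]
    answer = THINK_RE.sub("", text).strip()
    return reasoning, answer


# Made-up sample output, not taken from the actual R1 report.
sample = (
    "<think>\nStart with export controls.\n</think>\n"
    "### Political Factors\nTrade policy matters."
)
reasoning, answer = split_reasoning(sample)
print(reasoning)
print(answer)
```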
Funnily enough, R1 slipped into Chinese a few times in its output.
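A quick way to spot such passages is to scan the output for Chinese characters. The sketch below checks only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), which is a simplifying assumption. The function name and the sample sentence are hypothetical.

```python
import re

# Basic CJK Unified Ideographs block only; extension blocks are not covered.
CJK_RE = re.compile(r"[\u4e00-\u9fff]+")


def find_cjk_runs(text: str) -> list:
    """Return every run of consecutive CJK ideographs found in the text."""
    return CJK_RE.findall(text)


# Made-up sample, not taken from the actual R1 report.
print(find_cjk_runs("NVIDIA faces 出口管制 risks in several markets."))
```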
In my non-expert human opinion, the 70B R1 model, which can be deployed on a small cluster of consumer-grade GPUs, comes very close to the much larger and much more expensive O1.
I asked Claude 3.5 Sonnet to compare the outputs. It gave the win to O1.
Here is Claude's argument and conclusion: https://demo.anyquest.ai/c/comparison-analysis-nvidia-research-reports-from-two-ai-agents-1wwxpncau2