From Prompted Diagrams to Agentic TikZ Generation

Pooja, CorrectBrain

Abstract

This article describes how TikZ generation can move from human-designed prompts to LLM-designed diagrams, and then from a hybrid human-in-the-loop workflow to an agentic correction system. The main distinction is the role of the LLM: first it is only a code generator for a human design, then it becomes the designer, and finally its design process is distributed across specialized agents.

1. When the LLM Is Not the Designer

The common approach for generating TikZ graphics with large language models usually starts in one of two ways. In the first method, the user writes a complete design prompt that contains all visual, mathematical, structural, and formatting requirements. In the second method, the user provides an image of a hand-drawn or manually prepared design, and the model is asked to convert that visual reference into TikZ or LaTeX code.

In both cases, the LLM receives the design specification from the user and returns code, usually TikZ code, which is then compiled to produce the final graphic. In this workflow, the human is still the actual designer. The LLM is mainly acting as a code generator that translates an existing design idea into TikZ syntax.

Figure 1. Human-designed text prompt. The LLM receives the design and converts it into TikZ code.

Figure 2. Human-provided design image. The LLM converts the visual reference into TikZ or LaTeX.

Note: Figures 1 and 2 are reference examples taken from this YouTube video: https://www.youtube.com/watch?v=crZV-yweZNg.

Why This Workflow Fails at Scale

The generated code often contains several classes of errors, including compilation errors, package errors, invalid TikZ commands, missing libraries, incorrect coordinate systems, poor object placement, overlapping elements, and weak spatial reasoning. The most difficult issue is usually not simple syntax correction, but spatial analysis: ensuring that the generated diagram is visually coherent, geometrically correct, well aligned, and faithful to the intended design.
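The code-level error classes listed above can be separated mechanically before any model is asked for a fix. The following is a minimal sketch, not the tooling used in this work. It assumes the default TeX log format, in which error lines begin with `! `; the function name `classify_log_errors` and its four class labels are illustrative. Spatial problems such as overlapping elements generally produce no error line at all, which is part of why they are harder to catch.

```python
def classify_log_errors(log_text):
    """Group TeX error lines (those starting with '! ') into coarse classes.

    The class names are illustrative, not a standard taxonomy.
    """
    classes = {
        "missing_file": [],
        "package": [],
        "undefined_command": [],
        "other": [],
    }
    for line in log_text.splitlines():
        if not line.startswith("! "):
            continue  # context lines such as 'l.5 \drawx' are skipped
        message = line[2:].strip()
        if "File" in message and "not found" in message:
            classes["missing_file"].append(message)
        elif message.startswith("Package "):
            classes["package"].append(message)
        elif message.startswith("Undefined control sequence"):
            classes["undefined_command"].append(message)
        else:
            classes["other"].append(message)
    return classes
```

A grouping like this lets missing packages or libraries be handled by a fixed rule, while only the less predictable messages are forwarded to a model.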

These errors are typically fixed through an iterative LLM-in-the-loop process. The code is compiled, the errors are identified, the LLM is asked to revise the code, and the process repeats until the diagram becomes acceptable. This approach can work for individual diagrams, but it does not scale well for mass production.
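The compile, identify, revise, repeat cycle described above can be sketched as a small loop. This is an illustration under stated assumptions rather than the actual implementation: `compile_fn` and `revise_fn` are hypothetical callables supplied by the caller, standing in for a LaTeX compiler run and an LLM revision request respectively.

```python
def repair_loop(tex_source, compile_fn, revise_fn, max_revisions=5):
    """Compile, collect errors, request a revision, and repeat.

    compile_fn(tex) -> list of error strings (an empty list means success).
    revise_fn(tex, errors) -> revised source (stands in for the LLM call).
    Returns (final_tex, succeeded, revisions_used).
    """
    tex = tex_source
    revisions = 0
    while True:
        errors = compile_fn(tex)
        if not errors:
            return tex, True, revisions
        if revisions == max_revisions:
            # Give up and hand the file to a human reviewer.
            return tex, False, revisions
        tex = revise_fn(tex, errors)
        revisions += 1
```

In practice `compile_fn` might wrap a call such as `pdflatex -interaction=nonstopmode` and read the resulting log. Note that a loop of this kind only converges on code that compiles; it says nothing about whether the rendered diagram is spatially or conceptually correct.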

The bottleneck is the human designer. The human must prepare detailed design prompts or reference sketches, inspect every generated diagram, detect spatial and conceptual errors, and repeatedly guide the LLM toward a corrected result. Because of this dependency on human design capability and repeated correction, large-scale production of high-quality TikZ graphics becomes difficult.

2. When the LLM Becomes the Designer

Our approach changes the role of the LLM. Instead of using the LLM only as a TikZ code generator for a human-provided design, we use the LLM as the designer itself. The model is responsible for creating the visual concept, deciding the layout, generating the structure of the diagram, and then producing TikZ code from its own design reasoning.

This removes the need for a fully written design prompt or a handmade reference image for every graphic. However, this shift introduces a different class of problems. When the LLM becomes the designer, the main challenge is no longer only compilation or basic spatial correction.

The generated output must also be evaluated for design quality, logical correctness, conceptual accuracy, mathematical validity, domain consistency, and visual precision. The system must verify not only whether the code compiles, but whether the image produced by the code is actually correct.

2.1 First Stage: Hybrid Human-in-the-Loop Engineering System

We first solved this problem using a hybrid human-in-the-loop system. In this method, the LLM generates the initial design and TikZ code. Compilation errors and basic code-level issues are corrected automatically with the help of LLM-based revision loops. After that, the remaining errors are handled through human review.

To reduce the burden on the human reviewer, we built supporting scripts, Python programs, validation tools, and system-level correction mechanisms. This is important because many TikZ generation problems are not purely language-model problems. Spatial consistency, coordinate normalization, object alignment, collision detection, bounding-box control, style enforcement, and reusable layout constraints are engineering problems.
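Collision detection is a good example of a check that is simpler to do deterministically than by prompting. The sketch below assumes that a bounding box for each diagram element is already available in a common unit; how those boxes are extracted from the TikZ source is not covered here, and the `Box` type and function names are illustrative.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of one diagram element."""
    name: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def overlaps(a, b, margin=0.0):
    """True when two boxes intersect or are closer than `margin` apart."""
    return (
        a.x_min < b.x_max + margin
        and b.x_min < a.x_max + margin
        and a.y_min < b.y_max + margin
        and b.y_min < a.y_max + margin
    )


def find_collisions(boxes, margin=0.0):
    """Return the names of every pair of boxes that collide."""
    return [
        (a.name, b.name)
        for a, b in combinations(boxes, 2)
        if overlaps(a, b, margin)
    ]
```

With `margin=0.0`, boxes that merely touch along an edge are not reported; a positive margin also flags elements that are too close together, which can serve as a simple spacing rule.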

These engineering problems can often be handled more reliably through deterministic scripts, custom tooling, and modifications to TikZ libraries than through prompting alone. In this stage, the combination of fine-tuned open-source LLMs, engineering scripts, system-level programs, and TikZ-library modifications solved approximately 60% of the overall problem. The remaining 40% was handled by human reviewers using our interactive correction system.

Interactive Correction and Cost Control

To make the human-in-the-loop stage less time-consuming, we developed an interface that makes interaction with `.tex` files easier. The reviewer can inspect, modify, and correct the generated TikZ code more efficiently without manually searching through large and complex LaTeX files. This system allowed us to produce good-quality graphics while keeping the review process manageable.
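One small building block of such an interface is locating each diagram inside a long file so the reviewer can jump straight to it. The sketch below is a hypothetical helper, not a description of the actual interface: it returns the line span of every `tikzpicture` environment and uses a deliberately crude rule for ignoring comments.

```python
import re

# Matches \begin{tikzpicture} or \end{tikzpicture}
ENV_TOKEN = re.compile(r"\\(?:begin|end)\{tikzpicture\}")
# Strips everything from an unescaped % to the end of the line (crude)
COMMENT = re.compile(r"(?<!\\)%.*")


def find_tikzpictures(tex_source):
    """Return sorted (start_line, end_line) pairs, 1-indexed."""
    spans = []
    open_starts = []  # stack, so nested environments pair up correctly
    for number, line in enumerate(tex_source.splitlines(), start=1):
        code = COMMENT.sub("", line)
        for token in ENV_TOKEN.findall(code):
            if token.startswith("\\begin"):
                open_starts.append(number)
            elif open_starts:
                spans.append((open_starts.pop(), number))
    return sorted(spans)
```

An unmatched `\begin{tikzpicture}` is silently left out of the result; a real tool would probably want to report it, since it usually indicates a broken file.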

For cost efficiency, we primarily used open-source LLMs. We also fine-tuned several open-source models for TikZ generation, design planning, and correction tasks. We tested both commercial and open-source models, but large-scale production with paid models quickly becomes expensive. Since cost is a major constraint in mass generation, open-source models became the practical foundation of the system.

We are not disclosing the names of the specific open-source models because the purpose here is not to benchmark or evaluate individual models.

2.2 Second Stage: Agentic Generation and Correction Pipeline

In the second stage, we reduced the amount of human work further by building an agentic generation and correction pipeline. The goal was to move as much of the remaining 40% human effort as possible into automated LLM agents and engineering tools. Instead of using one model to perform the entire task, we created a multi-agent system in which each agent is responsible for a specific type of reasoning or correction.

The pipeline initially used around five specialized agents. The first agent acts as the designer. It creates the visual concept, layout plan, diagram structure, and high-level design outline, but it does not directly write the final TikZ code. The next agent converts this design plan into LaTeX and TikZ code. Another agent focuses on spatial analysis and layout correction.

Figure 3. Agentic system, output of the first and second agents. The first agent draws the layout outline and high-level visual structure.

Additional agents check mathematical accuracy, logical consistency, conceptual correctness, formatting quality, and domain-specific constraints. The `.tex` file passes from one agent to the next, and each agent improves one category of the output.

Figure 4. Agentic system, last agent output. The final agent produces the corrected render after the file passes through the agent chain.
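The hand-off of the `.tex` file from one agent to the next can be sketched as a simple chain. This is a minimal illustration, assuming each correction agent can be modeled as a function from source text to source text; the agent names and the `run_chain` helper are hypothetical, and the real agents would call a model rather than a string replacement.

```python
from typing import Callable, List, Tuple

# An agent is a name plus a function that takes .tex source and returns .tex source.
Agent = Tuple[str, Callable[[str], str]]


def run_chain(tex: str, agents: List[Agent]):
    """Pass the source through each agent in order.

    Returns the final source and a history of (agent_name, changed) pairs,
    so a reviewer can see which stage touched the file.
    """
    history = []
    for name, agent in agents:
        revised = agent(tex)
        history.append((name, revised != tex))
        tex = revised
    return tex, history
```

Keeping a per-stage history is a design choice made for this sketch: it makes it easier to trace a bad output back to the agent that introduced it.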

Through this agentic workflow, the system can generate graphics that are approximately 95% to 100% accurate in many cases. A small amount of human work, usually around 0% to 5%, may still be required for final inspection or correction. This makes mass production possible because the human is no longer responsible for designing, coding, and correcting every diagram manually. Instead, the human supervises the pipeline and handles only the small percentage of cases that the system cannot fully resolve.

3. Image-Generation and SVG Agents

We also extended the system to support realistic image generation inside TikZ-based graphics. For this capability, we added a dedicated image-generation agent. This agent analyzes the `.tex` code, identifies the locations where realistic images are required, generates the appropriate image prompts, sends those prompts to a paid image-generation model such as Nano Banana Pro, and then inserts the generated images into the TikZ output with the correct position, shape, scale, and visual context.

Figure 5. Image generated by the Nano Banana Pro image agent after insertion into the TikZ-based graphic.

Note: In Figure 5, the image content inside the red circle was generated by Nano Banana Pro.
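The analyze, prompt, insert sequence of the image-generation agent can be sketched as follows. This is an illustration only: the `\imageslot{id}{prompt}` marker is a hypothetical macro invented for this example, the call to the image model is left out entirely, and the source does not describe how image locations are actually marked in the `.tex` code.

```python
import re

# Hypothetical marker: \imageslot{slot-id}{text prompt for the image model}
PLACEHOLDER = re.compile(r"\\imageslot\{(?P<slot>[^{}]+)\}\{(?P<prompt>[^{}]+)\}")


def collect_prompts(tex):
    """Map each slot id to the prompt that should be sent to the image model."""
    return {m.group("slot"): m.group("prompt") for m in PLACEHOLDER.finditer(tex)}


def insert_images(tex, image_paths, width="3cm"):
    """Replace each resolved slot with an \\includegraphics command.

    image_paths maps slot ids to generated image files. Slots without
    an image are left untouched so they can be retried later.
    """
    def replace(match):
        slot = match.group("slot")
        if slot not in image_paths:
            return match.group(0)
        return "\\includegraphics[width=%s]{%s}" % (width, image_paths[slot])

    return PLACEHOLDER.sub(replace, tex)
```

A typical flow under these assumptions would call `collect_prompts`, send each prompt to the image model, save the results to files, and then call `insert_images` with the resulting paths. Clipping to a shape such as the circle in Figure 5 would need additional TikZ code that this sketch does not attempt.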

Similarly, when vector assets are required, we use a separate SVG-generation agent. This agent can generate or select SVG elements and integrate them into the TikZ workflow. For this component, we use open-source LLMs because SVG generation and structural vector reasoning can often be handled effectively without relying on expensive commercial models.

4. Large-Scale Testing and Conclusion

Large Language Models are powerful text calculators, not reasoning minds.

To build this full system, we tested a wide range of open-source and commercial LLMs and generated approximately 200,000 TikZ examples across many visual themes and diagram categories. This large-scale experimentation helped us understand where LLMs usually fail, which errors can be solved through prompting or fine-tuning, which errors require deterministic engineering, and which cases still require human judgment.

The key conclusion is that LLMs should not be used to solve every part of the problem. Some parts of TikZ generation are language problems rather than reasoning problems, and LLMs are useful there. Other parts are engineering problems, where scripts, validation systems, layout algorithms, rendering checks, and library-level modifications are more reliable.

The best system combines both approaches: use LLMs where they are strong, use engineering methods where deterministic control is required, and reserve human review only for the small percentage of cases that cannot be solved automatically.

We are currently working on creating an agentic system for producing educational animations.

For more samples of our TikZ graphics, please visit the X timeline and Reddit timeline.

We have also created two graphics books, which you can purchase by visiting the Buy page.

We also work with publishers, coaching institutes, schools, colleges, edtech companies, educators, authors, and training teams. For professional work or collaborations, please send us a DM on X/Twitter or email us.