Automating Figma-to-Code at Coinbase
TL;DR: Turning Figma designs into production-ready code is a repetitive part of building software. To address this, we built an AI agent that reads Figma screenshots and writes the corresponding production code. Our first pilot took four days; the same feature normally takes two to four weeks. The setup has three parts: a reference implementation the agent studies, a rules document that pins down exactly what to generate, and prompt templates that engineers fill in and run. We've published all of it to our internal skills platform so any team can pick it up.

At Coinbase, speed matters. Faster shipping means we deliver to customers sooner and react faster if the market shifts.
Not every part of building software is the same kind of work, though. Some of it is real engineering: system design, thinking through edge cases, coordinating with other teams. A lot of it isn't. Translating a Figma design into production code is mechanical: you map components, match layouts, wire up data fetching, write tests, and check everything against the team's bar before it can ship. It takes time and attention, but it doesn't really need an engineer's judgment.
So we tried something obvious. What if that translation step didn't need a human?
The Translation Problem
Every feature starts with a design handoff. A designer drops a Figma file. An engineer picks it up and translates it into code: mapping components, matching layouts, wiring data fetching, writing tests, and checking everything against team standards before merging.

In our internal tooling platform, this step was eating 80% of the delivery time. Every new product domain needs a set of backend endpoints (service definitions, handlers, permissions, audit logging) plus a frontend with tabbed pages, tables, filters, and detail views. A typical backend domain with eight to twelve endpoints is one to two weeks of work. The frontend, with its tabs and tables, is another one to two weeks.
The blocker wasn't engineering capability. The patterns repeated across features and teams, and the work required enough care that you couldn't rush it, but not enough creative thinking to be interesting.
Engineers regularly contribute to codebases outside their own team, working across team boundaries to ship features faster. Getting up to speed in an unfamiliar codebase well enough to write production code from scratch has historically been a barrier to that kind of cross-team contribution. It can add days or weeks of ramp-up before an engineer is productive, and in the worst cases it blocks the contribution entirely.
The Approach: Rules, Prompts, and a Reference Implementation
We didn't want a generic code generator. We wanted something that knows how Coinbase ships software: our component patterns, our code conventions, what review expects. A generic tool gives you code. Ours gives you our code, which is what makes the output usable instead of a starting point you still have to rework.
The system has three parts:
Reference implementation. A complete, real feature that the agent reads as ground truth before generating anything new. This is by far the most important piece. Output quality tracks reference quality, much like few-shot learning depends on the quality of the examples you give it. Get this right first; everything else compounds on it.
Rules document. This tells the agent exactly what to generate, in what order, with what constraints. The frontend version is a 13-step checklist covering page structure, component patterns, data fetching, naming conventions, and the validation gates the code has to pass. The backend version covers 12 areas: service definition discovery, handler implementation, error handling, permissions, audit logging, and so on. The rules are explicit and opinionated, with no real room to improvise.
Prompt templates. Ready-to-use prompts that an engineer copies, fills in a few variables (product name, route, endpoint list), and pastes into an agent chat. We have five prompts on each side (frontend and backend), covering everything from a single-component page to a multi-component orchestration.
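To make the fill-in step concrete, here is a minimal sketch of a prompt template with the variables listed above (product name, route, endpoint list). The template wording and the placeholder names (`product_name`, `route`, `endpoint_list`) are hypothetical; our actual templates are longer and live on the internal skills platform.

```python
from string import Template

# Hypothetical frontend prompt template; the real templates are longer.
# The $-placeholders are the variables an engineer fills in before running it.
FRONTEND_PAGE_PROMPT = Template(
    "Read the reference implementation and the frontend rules document first.\n"
    "Build the product page for $product_name at route $route.\n"
    "Use the attached Figma screenshots, one per tab.\n"
    "Wire data fetching for these endpoints:\n"
    "$endpoint_list\n"
    "Run every validation gate before opening a PR."
)


def render_prompt(product_name: str, route: str, endpoints: list[str]) -> str:
    """Fill in the template with one bullet line per endpoint.

    Template.substitute (unlike safe_substitute) raises KeyError if the
    template contains a placeholder that was not supplied, so an edited
    template fails loudly instead of producing a prompt with a literal
    $placeholder left in it.
    """
    endpoint_list = "\n".join(f"- {name}" for name in endpoints)
    return FRONTEND_PAGE_PROMPT.substitute(
        product_name=product_name, route=route, endpoint_list=endpoint_list
    )


if __name__ == "__main__":
    print(render_prompt("Derivatives", "/derivatives", ["ListPositions", "ListOrders"]))
```

Treating templates as code like this is what lets them be versioned and reviewed alongside the rules document.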

How the frontend workflow works
The engineer takes a screenshot of each tab in the Figma design and hands them to the agent along with the prompt template. We picked screenshots over API-based design integrations on purpose: a screenshot gives the agent the full visual picture of the page in one shot, without burning tokens on structured design data, and it preserves layout and component hierarchy in ways that structured exports tend to flatten.
The agent reads the screenshots, the rules, and the reference implementation, and writes the whole product page: page structure, tab sub-pages, data models, route registration, permission constants, data fetching queries, table and filter components, unit tests. Before it opens a PR, it runs every validation gate locally: linting, type checking, formatting, query compilation, tests with coverage thresholds.
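The gate sequence can be pictured as an ordered list of checks that must all pass before a PR is opened. This is a minimal sketch, assuming each gate is a command run in the repository; the gate names mirror the list above, but the `yarn` scripts are hypothetical placeholders, and the sketch assumes the test script enforces the coverage threshold itself.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Gate:
    name: str
    command: tuple[str, ...]  # hypothetical command; swap in your repo's scripts


# Placeholder package scripts; the order mirrors the gates described above.
DEFAULT_GATES = [
    Gate("lint", ("yarn", "lint")),
    Gate("typecheck", ("yarn", "typecheck")),
    Gate("format", ("yarn", "format:check")),
    Gate("query-compile", ("yarn", "compile-queries")),
    Gate("test+coverage", ("yarn", "test", "--coverage")),
]


def run_command(command: Sequence[str]) -> int:
    """Run one command in the current directory and return its exit code."""
    return subprocess.run(list(command), check=False).returncode


def run_gates(
    gates: Sequence[Gate],
    runner: Callable[[Sequence[str]], int] = run_command,
) -> Optional[str]:
    """Run gates in order and stop at the first failure.

    Returns the name of the failing gate, or None if every gate passed
    and a PR may be opened.
    """
    for gate in gates:
        if runner(gate.command) != 0:
            return gate.name
    return None
```

Injecting `runner` keeps the gate order testable without a real repository. Stopping at the first failure is a design choice in this sketch: it keeps feedback short, so one class of error can be fixed before the whole sequence runs again.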
How the backend workflow works
The backend works the same way, just with text input instead of screenshots. The engineer either lists the upstream service details explicitly, or lets the agent find them on its own using code search.
From there the agent writes service definitions with source citations on every field, handler implementations with validation and error forwarding, unit tests, service registration, permission configs, audit logging, and the test commands you'd actually run to hit each endpoint.
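A requirement like "source citations on every field" is easiest to hold if it is checked mechanically. Here is a minimal sketch, assuming a generated service definition can be represented as a list of fields, each carrying an optional pointer to the upstream code it was derived from. The `Field` shape and the citation format are hypothetical; the point is that a missing citation fails a check instead of waiting for a reviewer to notice.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Field:
    name: str
    type_name: str
    # Hypothetical citation format: "<upstream file>:<line>", pointing at the
    # upstream definition this field was derived from.
    source: Optional[str] = None


def uncited_fields(fields: list[Field]) -> list[str]:
    """Return the names of fields with a missing or blank source citation."""
    return [f.name for f in fields if not (f.source and f.source.strip())]


if __name__ == "__main__":
    # Hypothetical upstream paths, for illustration only.
    definition = [
        Field("account_id", "string", "upstream/positions.proto:12"),
        Field("quantity", "decimal", "upstream/positions.proto:15"),
        Field("mark_price", "decimal"),  # generated without a citation
    ]
    missing = uncited_fields(definition)
    if missing:
        print("Uncited fields:", ", ".join(missing))  # prints "Uncited fields: mark_price"
```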
Solving the Large PR Problem
Our first runs produced working code, but they surfaced a problem that had nothing to do with the code itself.
When we generated a full frontend product page in one shot, we got a pull request with 85 files and over 6,300 additions. The backend version was 19 files and over 14,000 additions. Both were correct and both merged. But no reviewer can do a careful pass on a PR that size. The code was fine; the review process wasn't.
Any team using AI for code generation will run into this. Agents can produce a lot of correct code quickly. The human review process hasn't caught up. To fix it, we built what we call the orchestrator pattern.
The Orchestrator Pattern
Instead of dumping everything into one PR, the agent runs in three phases.

Phase 1: Plan. The agent reads all the design screenshots and writes out a component-by-component plan, then stops and waits for the engineer to sign off before it writes any code.
Phase 2: Scaffold. It creates the base folder structure, permissions, routing, and empty stubs. It runs every validation gate and opens a draft PR. This is the foundation that the rest of the work targets.
Phase 3: Parallel workers. The agent spawns one independent worker per component (one per tab on the frontend, one per upstream service group on the backend). Each worker implements its slice, runs every validation gate, and opens its own focused draft PR targeting the scaffold branch.
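The three phases map onto a small amount of control flow. The sketch below is not our orchestrator: it assumes hypothetical callables for planning, approval, scaffolding, and per-component work, and uses a thread pool to run workers concurrently. What it shows is the shape: nothing is built until the plan is approved, and every worker's PR targets the scaffold branch rather than the main branch.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class DraftPR:
    branch: str
    base: str  # the branch this PR targets
    component: str


def orchestrate(
    plan: Callable[[], list[str]],                   # Phase 1: component names
    approve: Callable[[list[str]], bool],            # engineer sign-off on the plan
    scaffold: Callable[[list[str]], DraftPR],        # Phase 2: base structure + draft PR
    build_component: Callable[[str, str], DraftPR],  # Phase 3: one worker per component
    max_workers: int = 8,
) -> list[DraftPR]:
    components = plan()
    # Stop before any code is written if the engineer rejects the plan.
    if not approve(components):
        return []
    base_pr = scaffold(components)
    # Independent workers, one per component; each opens a PR against the
    # scaffold branch, so every diff stays scoped to a single component.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(build_component, c, base_pr.branch) for c in components]
        worker_prs = [f.result() for f in futures]
    return [base_pr, *worker_prs]


if __name__ == "__main__":
    # Hypothetical tab and branch names, for illustration only.
    tabs = ["positions", "orders", "statements"]
    for pr in orchestrate(
        plan=lambda: tabs,
        approve=lambda comps: True,
        scaffold=lambda comps: DraftPR("fcm/scaffold", "main", "scaffold"),
        build_component=lambda c, base: DraftPR(f"fcm/{c}", base, c),
    ):
        print(f"{pr.branch} -> {pr.base}")
```

Stacking worker PRs on the scaffold branch is what keeps each review small: a reviewer sees one component's diff against an already-approved foundation.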
When we re-ran the frontend generation with the orchestrator, the same product page that was previously one 85-file PR became seven smaller PRs, each scoped to a single component. All seven were generated in a single afternoon. Review cycles got faster, the git history got cleaner, and the workflow now scales as feature complexity grows.
The Battle Test
All of this was theory until we tried it on a real feature with production stakes.
The opportunity was the FCM (Futures Commission Merchant) Derivatives pages for our internal tooling platform. A cross-functional team needed to build the whole domain: portfolio overviews, positions, orders, statements, liquidations, and transactions, with backend endpoints and the corresponding frontend pages.
We used the AI agent workflow to generate the entire feature:

The numbers aren't the whole story.
The feature went to the partner team for refinement and launch. Partway through, the lead engineer had to step away, and a teammate picked it up cold. The handoff was clean: the generated foundation was structured well enough that the new engineer kept moving without losing time. The launch had no production issues, in the new code or in anything around it. The new engineer also added features that creatively solved problems we hadn’t discussed initially, in part because the smooth foundational implementation opened up time and space to apply judgment and creativity to operational UX.
FCM was a successful implementation. We set out to generate a solid, working foundation that lets engineers deploy faster and test earlier, leaving time for refinement. The agent does the structural bulk; engineers do the parts that need context and judgment.
Humans Are Still Involved
The agent isn't a replacement for engineering judgment.
What worked well out of the box:
File structure and naming conventions, when given a good reference implementation
Data fetching schema and query generation
Table, filter, and export component patterns
Service definitions matching existing conventions
Handler wiring, route registration, and permissions
Boilerplate consistency: mocks, manifest entries, audit logging
What needed human adjustment:
Edge cases in business logic that required domain knowledge
Fine-tuning UI details such as spacing, column order, and labels to match the original design
Test coverage beyond the happy path; edge case tests sometimes needed additions
Integration nuances where upstream services had specific quirks
The pattern is straightforward. The agent does well on structural, convention-following code and less well on judgment calls that need context the rules don't capture. That's the division of labor we were aiming for.
Lessons Learned
Reference implementation matters. A clean, well-structured example pays back across every generation cycle. You're teaching the agent your patterns by showing it.
Large PRs break review. Design around that from the start. AI didn't invent this problem, but it makes it impossible to ignore. Once you can produce thousands of lines in minutes, review is the bottleneck. We treat the orchestrator pattern as essential.
Rules prevent the mistakes you'd otherwise catch in review. Explicit constraints around validation, permissions, error handling, and audit logging head off common mistakes before they happen. Every rule you add saves review time on every future generation.
Prompts are a team asset, not a one-off. The prompt templates and rules documents get versioned, reviewed, and iterated on like production code. They encode institutional knowledge in a way that scales to every engineer who runs the tool.
Start with your most repetitive work. The highest ROI is where the patterns are most predictable: CRUD endpoints, product pages, data tables. Complex, novel features with unique business logic are not where this approach shines.
The Shift
What used to be weeks of careful translation now takes a few days. The faster turnaround is the easiest thing to quantify, but the change that matters most is what engineers get to spend their time on.
Instead of two weeks writing structural code that follows patterns we already use, an engineer spends a few days reviewing and refining code an agent generated in hours. Construction work becomes review work. The output is the same production code, deployed faster, with more time left for the problems that actually need human judgment.
We built an agent that knows how Coinbase ships software, and gave engineers their time back for the work that needs them to think.



