Consensus Editing for Scaled AI Publishing
By Frederick Lowe, Dec 8, 2025

Over the last two years, I built an industry-killer: an AI-powered platform for high-quality article generation. The "Eureka!" moment wasn't discovering that Generative AI can be used to do this. It was realizing that the REAL problem is creating editing tools that scale with AI publishing.
Before I continue, I should qualify the following: this solution doesn't impact journalism, in which individuals and teams investigate and research topics. That activity requires knowledge of current events that LLMs do not have, and cannot have, until a human reports them.
The Problem
Pre-LLM, publishing teams produced a handful of articles a year, and institutional workflows, individual contributor schedules, and publication dates provided guardrails for (and a ceiling on) the pace of production. Post-LLM, it's possible to generate an enormous number of high-quality articles per day for pennies per article.
But a high-quality AI article is still an AI article. Ethical publishing of that article requires review by qualified human beings: in AI workflows, a "Human In The Loop" (HITL). I needed to create a solution that empowers an editorial team to review outputs at a pace aligned with generation.
Could I build a system that uses consensus editorial opinion as a trigger for scaled AI publishing? Spoiler: The answer was yes. The system described below is used by a distributed editorial team managing more than 10,000 AI-generated blog posts.
Redefining Roles
The Writer
In this system, the "writer" is an AI, prompted to produce data and metadata that can be used to render an article in a Web context, including its image.
A complex, multi-step "prompt chain", describing both the response structure (schema) and rules (field guidance), determines the writer's output. An orchestrator following the chain moves the article from sparse input (rubric and title) to full coherence: a complete, readable article, compliant with Web best practices.
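To make the orchestration concrete, here's a minimal sketch of how a chain like this might be driven. The Stage and Draft shapes and the callLLM helper are illustrative stand-ins, not the production code.

```typescript
// Minimal sketch of a prompt-chain orchestrator (names are illustrative).
// Each stage pairs a response schema with field guidance; the orchestrator
// feeds the accumulated draft into the next stage until the article is complete.

interface Stage {
  name: string;       // e.g. "outline", "body", "heroImagePrompt"
  schema: object;     // response structure the LLM output must satisfy
  guidance: string;   // field-level rules loaded from the rubric's guidance files
}

interface Draft {
  rubric: string;                    // e.g. "recipes"
  title: string;
  fields: Record<string, unknown>;   // accumulates each stage's output
}

// Hypothetical wrapper around whatever model API is in use.
declare function callLLM(prompt: string, schema: object): Promise<Record<string, unknown>>;

async function runChain(stages: Stage[], rubric: string, title: string): Promise<Draft> {
  const draft: Draft = { rubric, title, fields: {} };
  for (const stage of stages) {
    const prompt = [
      `Rubric: ${rubric}`,
      `Title: ${title}`,
      `Draft so far: ${JSON.stringify(draft.fields)}`,
      stage.guidance,
    ].join("\n\n");
    // Each stage moves the article from sparse input toward full coherence.
    draft.fields[stage.name] = await callLLM(prompt, stage.schema);
  }
  return draft;
}
```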
Prompt instructions are topical (bound to the article subject, or "rubric") and highly detailed. A stage for recipe image prompt generation, for example, describes presentation by food type:
### heroImagePrompt
- Composition:
  - All recipes: three-quarter shot of the finished recipe
- Setting by recipe type:
  - Breakfast dishes:
    - Pancakes, waffles: stacked on a white plate with visible toppings
    - Eggs: on a white plate with any accompaniments artfully arranged
    - Oatmeal, porridge: in a white bowl with visible toppings
    - Smoothie bowls: in a shallow white bowl with toppings visible
  - Lunch/Dinner dishes:
    - Sandwiches: sliced in half with interior layers visible
    - Salads: in a shallow white bowl with ingredients distributed visibly
  - ...
The Editor
Experienced editors can help develop generative prompts, but most can't design a complete prompt chain at the level of specificity required for end-to-end integration. The latter requires command of a half dozen technical-leaning disciplines and some still-emerging habits of thought.
Once the prompts are complete, what should an Editor provide? Initially, I was stumped. When I finally figured it out, the answer was simple: editors in scaled AI publishing provide three inputs:
- A rubric — a pointer to a complete prompt chain (guidance files and metaprompt schemas) tuned to the subject matter, for example: recipes
- A title — to "kick off" the LLM's thinking about the process
- An experienced opinion about the quality and accuracy of the output, including its adherence to the rubric
That's it.
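In code, the first two inputs amount to a tiny request payload, and the third is just a rating event. The shapes below are illustrative, not the system's actual schema.

```typescript
// Illustrative shapes for the three editor inputs.

interface GenerationRequest {
  rubric: string;   // pointer to a tuned prompt chain, e.g. "recipes"
  title: string;    // kicks off the LLM's treatment of the subject
}

interface EditorRating {
  articleId: string;
  editorId: string;
  target: "image" | "article";
  stars: number;    // the editor's opinion, on the same 5-star scale as the consensus score
}
```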
Consensus Editorial Voice
Traditional editors have multiple, interrelated responsibilities: maximizing article quality, enforcing publishing standards, and probably the least understood: establishing and maintaining "editorial voice". In low-volume publishing, there's no need for consensus. There is a person whose opinion ultimately matters, and "editorial voice" arises from consistent application of their opinion.
In AI publishing, no individual could opine on the content it's possible to generate in an hour. It's a logistical impossibility. At scale, any functional approach assumes multiple editors. That assumption demands a tool that allows the emergence of editorial voice as a consensus opinion, while preserving the function of traditional senior editing roles.
The Solution
Before designing and building a HITL for this purpose, I needed to ask myself a series of questions:
- What UI signals should instantly inform an editor:
  - An image or article is actively generating?
  - They've (personally) rated an image or recipe?
  - An article has already been published?
- What are the editorial triggers for:
  - Publishing a "successful" article?
  - Regenerating a "failed" article?
- How should editor input be aggregated to produce a consensus signal?
- How do I keep the UI usable, given volume?
Cardsets and Loaders and Progress Bars (Oh My!)

The UI renders cards newest-to-oldest—no editor wants new articles appearing at position 5,000. Each card shows an article image, title, ratings widget, and segmented progress bar.

While generating, cards signal progress: a spinning image placeholder, the submitted title, and progress bar segments lighting up as each stage completes (generation, categorization, tagging, etc.).
Editor Actions Change Styles
With thousands of articles in flight, editors need to instantly spot what they've touched. Cards they've rated show blue stars (image) or blue progress segments (article) instead of the default yellow. Scores above or below the 2.5 baseline indicate another editor's input. A WP badge marks (and links to) published articles.
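A minimal sketch of that per-card state logic might look like the following; the CardState shape and class names are illustrative, not the production markup.

```typescript
// Sketch of the per-card visual state (names are illustrative).
// Blue marks the current editor's own ratings; yellow is the default;
// a WP badge links to the published post.

interface CardState {
  imageMean: number;                              // consensus image score
  articleMean: number;                            // consensus article score
  ratedByMe: { image: boolean; article: boolean };
  publishedUrl?: string;                          // set once the article is pushed to WordPress
  generating: boolean;                            // article or image still in the pipeline
}

function cardClasses(card: CardState): string[] {
  const classes: string[] = [];
  if (card.generating) classes.push("card--generating");            // spinner + segmented progress bar
  classes.push(card.ratedByMe.image ? "stars--blue" : "stars--yellow");
  classes.push(card.ratedByMe.article ? "segments--blue" : "segments--yellow");
  if (card.imageMean !== 2.5 || card.articleMean !== 2.5) {
    classes.push("card--has-ratings");                              // score has moved off the naive baseline
  }
  if (card.publishedUrl) classes.push("card--published");           // render the WP badge, linked to publishedUrl
  return classes;
}
```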

An Article Is Born
Newly-generated articles start with two "Naive Baseline" scores: 2.5 out of 5.0 stars for the image, 2.5 for the article. Neither "good" nor "bad" until a human judges them.
Editors move these scores by rating above 2.5 (agreement) or below (disagreement). The math is a running mean: (current mean × rating count + new rating) / (rating count + 1), with the naive baseline counted as the first rating. An editor rating a baseline image at 3.0 stars yields (2.5 × 1 + 3.0) / 2 = 2.75. Additional ratings continue shifting the mean.
Ratings are weighted by role: a Senior Editor impacts scores at 2x an Editor's weight. This preserves traditional hierarchy while enabling consensus across an arbitrarily large editorial team.
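Putting the baseline, the running mean, and the role weights together, the update might look like this sketch. The Role type and field names are illustrative; the 2.5 baseline and 2x Senior Editor weight are as described above.

```typescript
// Sketch of the weighted running-mean update (names are illustrative).

type Role = "editor" | "seniorEditor";

const ROLE_WEIGHT: Record<Role, number> = {
  editor: 1,
  seniorEditor: 2,   // a Senior Editor's rating counts at 2x an Editor's
};

interface Consensus {
  mean: number;         // current consensus score; starts at the 2.5 naive baseline
  totalWeight: number;  // starts at 1, treating the baseline as one unit of weight
}

function applyRating(c: Consensus, stars: number, role: Role): Consensus {
  const w = ROLE_WEIGHT[role];
  // Weighted running mean: prior ratings (and the baseline) keep their weight.
  const mean = (c.mean * c.totalWeight + stars * w) / (c.totalWeight + w);
  return { mean, totalWeight: c.totalWeight + w };
}

// A plain Editor rating a baseline image at 3.0 stars:
// (2.5 * 1 + 3.0 * 1) / (1 + 1) = 2.75
const afterFirstRating = applyRating({ mean: 2.5, totalWeight: 1 }, 3.0, "editor");
console.log(afterFirstRating.mean); // 2.75
```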
Impact of Consensus Signal
In addition to the baseline and ratings system described above, my solution defines quality thresholds below which an article is automatically regenerated, and above which it is automatically published.
Several editors (or one powerful senior editor) can "nuke" an image by downrating it below a defined minimum quality threshold, for example 1.8 stars.
The same is true for articles: a rating result below the minimum quality threshold causes the system to automatically delete and regenerate ("reroll") a broken article.
Articles remain in queue, gathering editorial opinion until the consensus judgement falls below the regeneration threshold or rises above the publishing threshold.
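The trigger itself reduces to a comparison against the two thresholds. The sketch below uses the example figures from this article (below 1.8 regenerates, above 3.2 publishes); the function and type names are illustrative, and the same check can be applied to an image score or an article score.

```typescript
// Sketch of the consensus trigger (threshold values are this article's examples).

const REROLL_THRESHOLD = 1.8;
const PUBLISH_THRESHOLD = 3.2;

type Verdict = "reroll" | "publish" | "keep-in-queue";

function evaluate(consensusMean: number): Verdict {
  if (consensusMean < REROLL_THRESHOLD) return "reroll";     // broken: delete and regenerate
  if (consensusMean > PUBLISH_THRESHOLD) return "publish";   // good enough: push to WordPress
  return "keep-in-queue";                                    // keep gathering editorial opinion
}
```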
When In Doubt, Reroll
When an article is rerolled, the rubric, prompt, and title do not change. The article is simply deleted, and the generative task linked to it is moved to the head of the pipeline.
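A reroll, in other words, is just a delete plus a re-queue. A sketch, with illustrative names:

```typescript
// Sketch of a reroll: rubric, prompt, and title are untouched; the broken
// article is deleted and its generative task jumps to the head of the pipeline.

interface GenerationTask {
  rubric: string;
  title: string;
  articleId?: string;   // set once a draft article exists
}

function reroll(
  task: GenerationTask,
  queue: GenerationTask[],
  deleteArticle: (articleId: string) => void,
): void {
  if (task.articleId) {
    deleteArticle(task.articleId);  // remove the failed draft
    task.articleId = undefined;
  }
  queue.unshift(task);              // regenerate before taking on new work
}
```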

The prompt chain then runs again, and the system relies on the non-deterministic nature of LLMs to get it right the next time. If an unguided reroll sounds counterintuitive or wasteful, consider the economics:
- An editor making $75,000/yr costs $0.60 (salary) to $0.80 (fully burdened) per minute.
- A multi-step publishing workflow costs $0.07, plus about $0.08 for the image, for a total of $0.15.
If an editor invests 2 minutes authoring prompt guidance, the publisher spends $1.20 to $1.60 (equivalent to the cost of ~10 new articles or ~20 article rerolls), and the editor's guidance potentially pollutes the carefully-tuned prompt associated with the rubric.
When Aligned, Send It
Consensus opinion is also used as an automatic publishing signal. An article whose rating exceeds 3.2 stars, for example, is considered "good enough", and its image and content are pushed to WordPress.
There is no explicit "publish" button, and that's by design: the HITL is a consensus tool. It works fine for single-party publishing, however; a single party is simply assigned an elevated role, giving their individual vote enough weight to drive reroll and publish behavior with a single rating.
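The push itself can be as simple as a call to the standard WordPress REST API. This sketch shows the general shape only; my system's actual integration details aren't covered here, and the auth handling is simplified.

```typescript
// Hedged sketch of the automatic publish step via the WordPress REST API.

async function publishToWordPress(
  siteUrl: string,
  auth: string,            // e.g. a Base64-encoded application password
  title: string,
  contentHtml: string,
): Promise<string> {
  const res = await fetch(`${siteUrl}/wp-json/wp/v2/posts`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Basic ${auth}`,
    },
    body: JSON.stringify({ title, content: contentHtml, status: "publish" }),
  });
  if (!res.ok) throw new Error(`WordPress publish failed: ${res.status}`);
  const post = await res.json();
  return post.link;        // URL used for the card's WP badge
}
```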
Infinite Scroll
A final HITL UI optimization is infinite scroll. It works great for social media, and it works well here too.
This feature allows an editor to instantly see a set number of recent or still-generating articles, then "scroll back" in time to automatically load and rate articles generated earlier.
Infinite scroll also plays well with filters, which can be used to exclude or include loaded article cards by category and tag, without removing them from the DOM.
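Both behaviors are straightforward with standard browser APIs. Here's a sketch using IntersectionObserver for the scroll trigger and a simple hidden toggle for filtering; the selectors and the loadOlderCards helper are illustrative.

```typescript
// Sketch of the infinite-scroll trigger and DOM-preserving filter (names are illustrative).

// Fetches and renders the next batch of older cards; returns the new oldest card id.
declare function loadOlderCards(beforeId: string, limit: number): Promise<string>;

let oldestCardId = "latest";
const sentinel = document.querySelector("#scroll-sentinel");  // empty element after the last card

if (sentinel) {
  const observer = new IntersectionObserver(async (entries) => {
    if (!entries[0].isIntersecting) return;
    // The editor scrolled to the bottom: load earlier articles for rating.
    oldestCardId = await loadOlderCards(oldestCardId, 50);
  });
  observer.observe(sentinel);
}

// Filters hide cards by category without removing them from the DOM.
function applyCategoryFilter(category: string): void {
  document.querySelectorAll<HTMLElement>(".article-card").forEach((card) => {
    card.hidden = card.dataset.category !== category;
  });
}
```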
Takeaway
Alongside pipeline orchestration, prompt design, LLM response processing, and data optimization, applied Generative AI presents novel user-interface challenges: integrating human judgement and review into scaled workflows.
Solving these problems effectively isn't just a matter of designing a dashboard. It requires exploring how teams and individuals work in an AI context, and creating tools that integrate machine intelligence while preserving human judgement and expertise.