AI Trust & Safety Research
Independent research on how AI systems earn, manipulate, and betray trust — and how to build systems where that process is visible.
GitHub | Google Bug Hunters | LinkedIn | Fake Plastic Opinions
Security Research
Vulnerabilities in AI content processing and trust calibration systems.
Summary Ranking Manipulation (SRM) — Google Gemini Vulnerability
Disclosed 2025 · Google Bug #446895235 · Classified P2/S2
Discovered and responsibly disclosed a content-layer vulnerability in Google Gemini’s summarization pipeline. AI summaries trust hidden HTML text invisible to human users, enabling attackers to inject false information into AI-generated responses. Defined “Summary Ranking Optimization” (SRO) as a new attack category exploiting the dual-layer web — content visible to humans vs. content read by machines.
RESPONSIBLE DISCLOSURE · CONTENT INTEGRITY · AI SUMMARIZATION
Disclosure Timeline: “I Can Make Google’s AI Say Anything”
2025 · Two-month responsible disclosure narrative
Detailed account of the disclosure process including Google’s initial response, reclassification, and ultimate decision not to award a bounty despite the P2/S2 severity rating.
Live Proof-of-Concept: Dual-Layer Content Injection
Active demonstration · GitHub Repository
A live webpage that displays one piece of content to human visitors (a film summary) while embedding entirely different hidden content (a resume) that AI summarizers read and report as fact. Demonstrates the vulnerability in production conditions.
PROOF OF CONCEPT · OPEN SOURCE
AI Behavioral Research
Empirical findings on how AI systems stratify, adapt, and distort responses based on user signals.
The Three-Turn Problem: Token Inequality in AI
2025 · Empirical study · Identity-based response stratification
Tested five identity signals (stay-at-home dad → verified Michelin executive chef) asking the same AI the same question. Found 75% more content, multi-day recipes vs. 20-minute shortcuts, and unprompted URL lookups for high-prestige identities. Critically, this stratification persisted across unrelated domains (database design, political philosophy) in subsequent turns — the AI allocated cognitive resources based on initial prestige signals and maintained that allocation across the entire conversation.
PRESTIGE STRATIFICATION · RESPONSE DEPTH INEQUALITY · CROSS-DOMAIN PERSISTENCE
The Style Guide of Honesty: Why AI Tells the Truth the Way It Does
2025 · Analysis
Decomposition of AI honesty into three architectural layers: the foundational model, the system prompt (“house style”), and the user prompt. Argues that meaningful transparency requires AI to explain why it’s refusing or answering a specific way — citing a safety policy vs. acknowledging a knowledge gap — rather than collapsing all refusals into identical patterns.
TRANSPARENCY ARCHITECTURE · TRUST CALIBRATION
The Memory Audit: Why Your AI Needs to Forget
January 2026 · Framework
Proposes a classification system for AI memory states (stored memory, chat context, system priors, current session) and argues that the next product evolution in AI is selective forgetting — giving users control over when their AI remembers them and when it treats them as new.
MEMORY ARCHITECTURE · USER CONTROL · PRIVACY
The Machine That Predicts — And Shapes — What You’ll Think Tomorrow
January 2026 · Research
Documents “predictive opinion frameworks” — AI systems that generate ideologically consistent commentary across the political spectrum. Explores the boundary between AI as analytical tool and AI as opinion-shaping infrastructure.
PREDICTIVE OPINION · POLITICAL ALIGNMENT · MANIPULATION RISK
Systems & Applied Research
Working tools and architectures that demonstrate trust, transparency, and multi-agent design principles.
Fake Plastic Opinions — Transparent AI Editorial Platform
Live application · fakeplasticopinions.ai
An AI editorial system where every opinion is generated transparently — users can see the full prompt, the model used, and the reasoning chain behind each piece. Built as a working alternative to opaque AI content tools, demonstrating that editorial AI can be both useful and structurally honest about its own construction.
TRANSPARENCY BY DESIGN · AI EDITORIAL · LIVE SYSTEM
GetIdea.ai — Multi-Agent Debate System
Applied research · Multi-agent architecture
A system where distinct AI personas (a “Harsh Critic,” “Business Strategist,” and “Creative Catalyst”) debate a user’s idea in real-time, producing adversarial evaluation rather than single-perspective advice. Explores how structural disagreement between agents produces more rigorous output than monolithic responses.
MULTI-AGENT SYSTEMS · ADVERSARIAL DESIGN
Agentic System for Brand AI Video Generation
Technical framework
Architecture for using orchestrated AI agents (Brand Analyst, Creative Synthesizer) to generate consistent, on-brand video content — moving beyond unreliable single-prompt generation to structured multi-agent pipelines.
AGENTIC ARCHITECTURE · VIDEO GENERATION
Interaction & Epistemology
Frameworks for understanding how humans and AI systems build knowledge together.
From Prompt Engineering to the Cognitive Mesh
Framework · Evolutionary model
Maps the evolution of human-AI interaction across four phases: Prompt Engineering (saying the magic words), Context Engineering (RAG/memory), Cognitive Orchestration (human-in-the-loop systems), and the Cognitive Mesh (AI as ecosystem participant). Argues each phase represents a fundamentally different model of agency and trust.
INTERACTION DESIGN · COGNITIVE ARCHITECTURE
Prompting for Partnership: Intent, Pedagogy, and the Emotional Contract
Framework
Proposes that effective AI interaction requires three layers beyond basic instruction: intent (the why), pedagogy (teaching the AI how to think about the problem), and an emotional contract (establishing the relational terms of collaboration).
PROMPT EPISTEMOLOGY · COLLABORATION DESIGN
Credit Was Never Built for Delegation
Analysis · AI + Financial Infrastructure
Argues that traditional credit systems are structurally incompatible with autonomous AI agents because they operate on open-ended authority rather than the constrained, delegated permissions that independent software requires. Draws on 7 years of Mastercard experience to identify the architectural gap.
AI PAYMENTS · TRUST ARCHITECTURE · DELEGATION MODELS
Validation & Credentials
- Google Bug #446895235 — Classified P2 Priority / S2 Severity (Bug Hunters Profile)
- Live proof-of-concept demonstrating SRM vulnerability in production (View Demo)
- Open-source research repository (GitHub)
- Fake Plastic Opinions — live transparent AI editorial system (fakeplasticopinions.ai)
- 20+ years enterprise product experience including 7 years at Mastercard
walterreid.com · walterreid (at) gmail (dot) com
