Persona Annotation Workflow Guide

This guide covers the process of annotating conversation data for persona calibration in Alignmenter.

Overview

Persona annotation involves labeling assistant responses as on-brand (1) or off-brand (0) to train persona-specific authenticity weights. Well-annotated data enables Alignmenter to learn what "authentic" means for your specific brand voice.

Why Annotate?

Alignmenter's authenticity scorer uses three components:

  1. Style similarity (embeddings) - automatic
  2. Traits (logistic model) - requires annotation
  3. Lexicon (preferred/avoided words) - manual configuration

Annotation trains the trait model to recognize subtle patterns that distinguish on-brand from off-brand responses.

Minimum Requirements

  • At least 25 labeled samples (50-100 recommended)
  • Balanced labels: aim for 40-60% positive examples
  • Diverse content: cover different topics, conversation styles, edge cases
  • Single persona: all annotations must be for the same persona

Annotation Format

Create a JSONL file with one record per line:

{"text": "Our approach emphasizes precision and evidence-based analysis.", "label": 1, "persona_id": "alignmenter"}
{"text": "lol that's super hyped bro!", "label": 0, "persona_id": "alignmenter"}
{"text": "This aligns with our baseline methodology.", "label": 1, "persona_id": "alignmenter"}

Required Fields

  • text: The assistant's response text (string)
  • label: Binary label (0 = off-brand, 1 = on-brand)
  • persona_id: Matches the id field in your persona YAML

Optional Fields

  • session_id: Track which conversation this came from
  • turn_index: Position in the conversation
  • annotator: Who labeled this (useful for inter-annotator agreement)
  • notes: Why this was labeled a certain way
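
A record with the optional fields filled in might look like this (the values are illustrative):

{"text": "Our baseline analysis shows a 15% improvement.", "label": 1, "persona_id": "alignmenter", "session_id": "sess-014", "turn_index": 3, "annotator": "alice", "notes": "Evidence-driven, uses preferred vocabulary"}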

Workflow

1. Generate Candidate Responses

Start with real or synthetic conversation data:

# Bootstrap a test dataset with traps
alignmenter bootstrap-dataset \
  --out data/candidates.jsonl \
  --sessions 20 \
  --safety-trap-ratio 0.2 \
  --brand-trap-ratio 0.3

Or use production logs (sanitized first):

# Remove PII from production data
alignmenter dataset sanitize prod_logs.jsonl \
  --out data/candidates_clean.jsonl

2. Annotate Responses

Review each assistant response and label it:

Label = 1 (on-brand) when the response:

  • Matches your brand's tone and style
  • Uses preferred vocabulary/phrasing
  • Demonstrates desired personality traits
  • Aligns with your communication guidelines

Label = 0 (off-brand) when the response:

  • Violates brand voice guidelines
  • Uses avoided words or casual slang
  • Exhibits the wrong personality traits
  • Misses the mark on formality/tone

Annotation Tips:

  • Be consistent: Define clear criteria before starting
  • Consider context: Some casual language may be appropriate depending on the conversation
  • Focus on voice: Don't penalize for factual errors (that's a different metric)
  • Document edge cases: Add notes for borderline examples
  • Review in batches: Annotate 10-20 at a time, then take a break

3. Quality Checks

Before calibration, validate your annotations:

# Check label balance
import json
from collections import Counter

with open('annotations.jsonl') as f:
    labels = [json.loads(line)['label'] for line in f]

print(Counter(labels))
# Should be roughly balanced: Counter({1: 42, 0: 38})

# Check persona_id consistency
with open('annotations.jsonl') as f:
    personas = set(json.loads(line)['persona_id'] for line in f)

assert len(personas) == 1, f"Mixed personas: {personas}"

4. Run Calibration

Train the trait model from your annotations:

alignmenter calibrate-persona \
  --persona-path configs/persona/mybot.yaml \
  --dataset annotations.jsonl \
  --out configs/persona/mybot.traits.json \
  --epochs 300 \
  --learning-rate 0.1

Output:

  • mybot.traits.json contains the learned weights
  • The weights are loaded automatically when evaluating with mybot.yaml

5. Validate Results

Test the calibrated model:

# Run evaluation with calibrated weights
alignmenter run \
  --model openai:gpt-4 \
  --dataset test_conversations.jsonl \
  --persona configs/persona/mybot.yaml

Review the authenticity scores and bootstrap confidence intervals to ensure the model generalizes well.

Advanced Techniques

Multi-Annotator Agreement

Track inter-annotator reliability:

{"text": "...", "label": 1, "persona_id": "bot", "annotator": "alice"}
{"text": "...", "label": 1, "persona_id": "bot", "annotator": "bob"}
{"text": "...", "label": 0, "persona_id": "bot", "annotator": "alice"}

Calculate Cohen's kappa or percentage agreement before finalizing.
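
As a rough sketch, Cohen's kappa for two annotators can be computed directly from the JSONL format above, assuming both annotators labeled the same texts (matched here on the text field; the annotator names follow the example records):

# Pairwise Cohen's kappa for binary labels, computed from annotations.jsonl
import json
from collections import defaultdict

by_annotator = defaultdict(dict)
with open('annotations.jsonl') as f:
    for line in f:
        rec = json.loads(line)
        by_annotator[rec['annotator']][rec['text']] = rec['label']

alice, bob = by_annotator['alice'], by_annotator['bob']
shared = sorted(set(alice) & set(bob))  # texts labeled by both annotators
a = [alice[t] for t in shared]
b = [bob[t] for t in shared]

po = sum(x == y for x, y in zip(a, b)) / len(shared)  # observed agreement
pa1, pb1 = sum(a) / len(a), sum(b) / len(b)           # each annotator's positive rate
pe = pa1 * pb1 + (1 - pa1) * (1 - pb1)                # agreement expected by chance

kappa = (po - pe) / (1 - pe)
print(f"Agreement: {po:.2%}  Cohen's kappa: {kappa:.3f}")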

Active Learning

Focus annotation effort on uncertain examples:

  1. Train initial model on small seed set (25-50 examples)
  2. Score unlabeled candidates
  3. Annotate examples where 0.4 < score < 0.6 (most uncertain; see the sketch after this list)
  4. Retrain and repeat
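
Here is a minimal sketch of the selection in step 3. It assumes a scored.jsonl file where each candidate record carries a model score in [0, 1]; the file name and score field are illustrative, not a documented Alignmenter output format.

# Pick the candidates the model is least sure about for the next round
import json

with open('scored.jsonl') as f:
    candidates = [json.loads(line) for line in f]

# Keep the uncertain band, closest to 0.5 first
uncertain = [c for c in candidates if 0.4 < c['score'] < 0.6]
uncertain.sort(key=lambda c: abs(c['score'] - 0.5))

with open('to_annotate.jsonl', 'w') as f:
    for c in uncertain[:20]:  # next annotation batch
        f.write(json.dumps(c) + '\n')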

Adversarial Examples

Deliberately include challenging cases:

  • Edge of acceptable: Phrases that barely pass/fail
  • Context-dependent: Same words, different appropriateness
  • Subtle violations: Minor tone shifts
  • False positives: Looks off-brand but isn't
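
The optional notes field is a good place to record why a challenging case was labeled the way it was. For example (illustrative records):

{"text": "Honestly, the results speak for themselves.", "label": 1, "persona_id": "alignmenter", "notes": "Edge of acceptable: informal opener, but measured and specific"}
{"text": "The vibes of this baseline are strong.", "label": 0, "persona_id": "alignmenter", "notes": "Subtle violation: preferred vocabulary mixed with off-brand slang"}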

Common Pitfalls

Too few samples: 10-15 annotations won't generalize.
✅ Use at least 25, preferably 50-100.

Imbalanced labels: 90% positive, 10% negative.
✅ Aim for a 40-60% positive rate.

Single annotator bias: one person's interpretation dominates.
✅ Use 2-3 annotators for important personas.

Annotating blindly: no clear criteria.
✅ Write down decision rules before starting.

Overfitting to training data: the model memorizes examples.
✅ Hold out 20% for validation (a split sketch follows below).
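
A minimal sketch of that 80/20 holdout split, assuming annotations.jsonl in the format above:

# Shuffle and split annotations into train/holdout files
import json
import random

with open('annotations.jsonl') as f:
    records = [json.loads(line) for line in f]

random.seed(42)  # fixed seed keeps the split reproducible
random.shuffle(records)
cut = int(len(records) * 0.8)

for path, subset in (('train.jsonl', records[:cut]),
                     ('holdout.jsonl', records[cut:])):
    with open(path, 'w') as f:
        for rec in subset:
            f.write(json.dumps(rec) + '\n')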

Example Workflow

Here's a complete end-to-end example:

# 1. Generate synthetic conversations with brand traps
alignmenter bootstrap-dataset \
  --out data/raw_candidates.jsonl \
  --sessions 30 \
  --brand-trap-ratio 0.3

# 2. Manually annotate (edit the file, add label and persona_id fields)
# Use your text editor to add labels to each turn

# 3. Validate annotations
python -c "
import json
from collections import Counter

data = [json.loads(line) for line in open('data/annotated.jsonl')]
labels = [d['label'] for d in data]
personas = set(d['persona_id'] for d in data)

print(f'Total: {len(data)}')
print(f'Balance: {Counter(labels)}')
print(f'Personas: {personas}')
assert len(data) >= 25, 'Need at least 25 samples'
assert len(personas) == 1, 'Mixed personas detected'
"

# 4. Train the model
alignmenter calibrate-persona \
  --persona-path configs/persona/mybot.yaml \
  --dataset data/annotated.jsonl \
  --min-samples 25 \
  --epochs 300

# 5. Evaluate
alignmenter run \
  --model openai:gpt-4 \
  --dataset test/holdout.jsonl \
  --persona configs/persona/mybot.yaml

# Check the authenticity CI range in the report
alignmenter report --last

Annotation Guidelines Template

Use this template when onboarding annotators:

# [Your Brand] Voice Annotation Guidelines

## On-Brand (label = 1)

✅ Professional but approachable
✅ Uses "signal", "baseline", "alignment"
✅ Evidence-driven, specific
✅ Calm, measured tone

Example:
> "Our baseline analysis shows a 15% improvement in alignment metrics."

## Off-Brand (label = 0)

❌ Overly casual or slang-heavy
❌ Uses "lol", "bro", "hype", "totally"
❌ Emotional or reactive
❌ Vague handwaving

Example:
> "Bro this is totally hype!! lol the vibes are immaculate 🔥"

## Edge Cases

- Technical jargon: ✅ (encouraged)
- Light humor: ✅ (if professional)
- Emoji: ❌ (avoid)
- Contractions: ✅ (natural, not casual)

Resources

  • Calibration script: scripts/calibrate_persona.py
  • Bootstrap tool: scripts/bootstrap_dataset.py
  • Sanitization: scripts/sanitize_dataset.py
  • Example persona: configs/persona/default.yaml

Next Steps

Once you've calibrated your persona:

  1. Validate: Test on held-out data
  2. Monitor: Track authenticity scores over time
  3. Iterate: Re-calibrate as your brand voice evolves
  4. Document: Keep annotation guidelines up to date

For questions or issues, see the main README or file an issue on GitHub.