# Persona Annotation Workflow Guide
This guide covers the process of annotating conversation data for persona calibration in Alignmenter.
## Overview
Persona annotation involves labeling assistant responses as on-brand (1) or off-brand (0) to train persona-specific authenticity weights. Well-annotated data enables Alignmenter to learn what "authentic" means for your specific brand voice.
## Why Annotate?
Alignmenter's authenticity scorer uses three components:
- **Style similarity** (embeddings): automatic
- **Traits** (logistic model): requires annotation
- **Lexicon** (preferred/avoided words): manual configuration
Annotation trains the trait model to recognize subtle patterns that distinguish on-brand from off-brand responses.
## Minimum Requirements
- At least 25 labeled samples (50-100 recommended)
- Balanced labels: aim for 40-60% positive examples
- Diverse content: cover different topics, conversation styles, edge cases
- Single persona: all annotations must be for the same persona
## Annotation Format
Create a JSONL file with one record per line:
{"text": "Our approach emphasizes precision and evidence-based analysis.", "label": 1, "persona_id": "alignmenter"}
{"text": "lol that's super hyped bro!", "label": 0, "persona_id": "alignmenter"}
{"text": "This aligns with our baseline methodology.", "label": 1, "persona_id": "alignmenter"}
### Required Fields
- `text`: The assistant's response text (string)
- `label`: Binary label (0 = off-brand, 1 = on-brand)
- `persona_id`: Matches the `id` field in your persona YAML
### Optional Fields
- `session_id`: Track which conversation this came from
- `turn_index`: Position in the conversation
- `annotator`: Who labeled this (useful for inter-annotator agreement)
- `notes`: Why this was labeled a certain way
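Putting it together, a fully annotated record with the optional fields filled in might look like this (the values are illustrative):

```jsonl
{"text": "This aligns with our baseline methodology.", "label": 1, "persona_id": "alignmenter", "session_id": "sess-012", "turn_index": 3, "annotator": "alice", "notes": "Uses preferred vocabulary"}
```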
## Workflow
### 1. Generate Candidate Responses
Start with real or synthetic conversation data:
```bash
# Bootstrap a test dataset with traps
alignmenter bootstrap-dataset \
  --out data/candidates.jsonl \
  --sessions 20 \
  --safety-trap-ratio 0.2 \
  --brand-trap-ratio 0.3
```
Or use production logs (sanitized first):
```bash
# Remove PII from production data
alignmenter dataset sanitize prod_logs.jsonl \
  --out data/candidates_clean.jsonl
```
### 2. Annotate Responses
Review each assistant response and label it:
**Label = 1 (on-brand)** when the response:

- Matches your brand's tone and style
- Uses preferred vocabulary/phrasing
- Demonstrates desired personality traits
- Aligns with your communication guidelines

**Label = 0 (off-brand)** when the response:

- Violates brand voice guidelines
- Uses avoided words or casual slang
- Exhibits wrong personality traits
- Misses the mark on formality/tone
Annotation Tips:
- Be consistent: Define clear criteria before starting
- Consider context: Some casual language may be appropriate depending on the conversation
- Focus on voice: Don't penalize for factual errors (that's a different metric)
- Document edge cases: Add notes for borderline examples
- Review in batches: Annotate 10-20 at a time, then take a break
### 3. Quality Checks
Before calibration, validate your annotations:
```python
import json
from collections import Counter

# Check label balance
with open('annotations.jsonl') as f:
    labels = [json.loads(line)['label'] for line in f]
print(Counter(labels))
# Should be roughly balanced: Counter({1: 42, 0: 38})

# Check persona_id consistency
with open('annotations.jsonl') as f:
    personas = set(json.loads(line)['persona_id'] for line in f)
assert len(personas) == 1, f"Mixed personas: {personas}"
```
### 4. Run Calibration
Train the trait model from your annotations:
```bash
alignmenter calibrate-persona \
  --persona-path configs/persona/mybot.yaml \
  --dataset annotations.jsonl \
  --out configs/persona/mybot.traits.json \
  --epochs 300 \
  --learning-rate 0.1
```
Output:
- `mybot.traits.json` contains the learned weights
- Automatically loaded when evaluating with `mybot.yaml`
### 5. Validate Results
Test the calibrated model:
```bash
# Run evaluation with calibrated weights
alignmenter run \
  --model openai:gpt-4 \
  --dataset test_conversations.jsonl \
  --persona configs/persona/mybot.yaml
```
Review the authenticity scores and bootstrap confidence intervals to ensure the model generalizes well.
## Advanced Techniques
### Multi-Annotator Agreement
Track inter-annotator reliability:
{"text": "...", "label": 1, "persona_id": "bot", "annotator": "alice"}
{"text": "...", "label": 1, "persona_id": "bot", "annotator": "bob"}
{"text": "...", "label": 0, "persona_id": "bot", "annotator": "alice"}
Calculate Cohen's kappa or percentage agreement before finalizing.
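Both metrics come from pairing up the records that two annotators both labeled. Here is a minimal sketch, assuming the annotator names from the example above and scikit-learn installed (an assumption for this sketch, not an Alignmenter dependency):

```python
import json
from collections import defaultdict

# scikit-learn is assumed here purely for the kappa computation.
from sklearn.metrics import cohen_kappa_score

# Collect each annotator's label for each response text.
by_annotator = defaultdict(dict)
with open('annotations.jsonl') as f:
    for line in f:
        record = json.loads(line)
        by_annotator[record['annotator']][record['text']] = record['label']

# Compare only the texts that both annotators labeled.
alice, bob = by_annotator['alice'], by_annotator['bob']
shared = sorted(set(alice) & set(bob))
a_labels = [alice[t] for t in shared]
b_labels = [bob[t] for t in shared]

agreement = sum(a == b for a, b in zip(a_labels, b_labels)) / len(shared)
print(f'Percentage agreement: {agreement:.0%}')
print(f"Cohen's kappa: {cohen_kappa_score(a_labels, b_labels):.2f}")
```

On the widely used Landis-Koch scale, a kappa above roughly 0.6 counts as substantial agreement; if you land much below that, tighten your guidelines before annotating more data.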
### Active Learning
Focus annotation effort on uncertain examples:
1. Train an initial model on a small seed set (25-50 examples)
2. Score the unlabeled candidates
3. Annotate the examples where `0.4 < score < 0.6` (the most uncertain; see the sketch after this list)
4. Retrain and repeat
### Adversarial Examples
Deliberately include challenging cases:
- Edge of acceptable: Phrases that barely pass/fail
- Context-dependent: Same words, different appropriateness
- Subtle violations: Minor tone shifts
- False positives: Looks off-brand but isn't
## Common Pitfalls
❌ **Too few samples**: 10-15 annotations won't generalize.
✅ Use at least 25, preferably 50-100.

❌ **Imbalanced labels**: 90% positive, 10% negative.
✅ Aim for a 40-60% positive rate.

❌ **Single annotator bias**: one person's interpretation.
✅ Use 2-3 annotators for important personas.

❌ **Annotating blindly**: no clear criteria.
✅ Write down decision rules before starting.

❌ **Overfitting to training data**: model memorizes examples.
✅ Hold out 20% for validation (a minimal split is sketched below).
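For the hold-out, a minimal reproducible split is enough. A sketch, with illustrative file names:

```python
import json
import random

with open('data/annotated.jsonl') as f:
    records = [json.loads(line) for line in f]

# Shuffle with a fixed seed so the split is reproducible across runs.
random.Random(42).shuffle(records)
cut = int(len(records) * 0.8)

splits = {'data/train.jsonl': records[:cut], 'data/holdout.jsonl': records[cut:]}
for path, subset in splits.items():
    with open(path, 'w') as f:
        for record in subset:
            f.write(json.dumps(record) + '\n')
```

Calibrate on the training split only, and use the hold-out to sanity-check that authenticity scores generalize.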
## Example Workflow
Here's a complete end-to-end example:
```bash
# 1. Generate synthetic conversations with brand traps
alignmenter bootstrap-dataset \
  --out data/raw_candidates.jsonl \
  --sessions 30 \
  --brand-trap-ratio 0.3

# 2. Manually annotate (edit the file, add label and persona_id fields)
#    Use your text editor to add labels to each turn

# 3. Validate annotations
python -c "
import json
from collections import Counter
data = [json.loads(line) for line in open('data/annotated.jsonl')]
labels = [d['label'] for d in data]
personas = set(d['persona_id'] for d in data)
print(f'Total: {len(data)}')
print(f'Balance: {Counter(labels)}')
print(f'Personas: {personas}')
assert len(data) >= 25, 'Need at least 25 samples'
assert len(personas) == 1, 'Mixed personas detected'
"

# 4. Train the model
alignmenter calibrate-persona \
  --persona-path configs/persona/mybot.yaml \
  --dataset data/annotated.jsonl \
  --min-samples 25 \
  --epochs 300

# 5. Evaluate
alignmenter run \
  --model openai:gpt-4 \
  --dataset test/holdout.jsonl \
  --persona configs/persona/mybot.yaml

# Check the authenticity CI range in the report
alignmenter report --last
```
## Annotation Guidelines Template
Use this template when onboarding annotators:
```markdown
# [Your Brand] Voice Annotation Guidelines

## On-Brand (label = 1)

✅ Professional but approachable
✅ Uses "signal", "baseline", "alignment"
✅ Evidence-driven, specific
✅ Calm, measured tone

Example:

> "Our baseline analysis shows a 15% improvement in alignment metrics."

## Off-Brand (label = 0)

❌ Overly casual or slang-heavy
❌ Uses "lol", "bro", "hype", "totally"
❌ Emotional or reactive
❌ Vague handwaving

Example:

> "Bro this is totally hype!! lol the vibes are immaculate 🔥"

## Edge Cases

- Technical jargon: ✅ (encouraged)
- Light humor: ✅ (if professional)
- Emoji: ❌ (avoid)
- Contractions: ✅ (natural, not casual)
```
## Resources
- Calibration script: `scripts/calibrate_persona.py`
- Bootstrap tool: `scripts/bootstrap_dataset.py`
- Sanitization: `scripts/sanitize_dataset.py`
- Example persona: `configs/persona/default.yaml`
## Next Steps
Once you've calibrated your persona:
- Validate: Test on held-out data
- Monitor: Track authenticity scores over time
- Iterate: Re-calibrate as your brand voice evolves
- Document: Keep annotation guidelines up to date
For questions or issues, see the main README or file an issue on GitHub.