2024
an archive of posts from this year
| Date | Post |
|---|---|
| Nov 19, 2024 | The Importance of Adversarial Evaluations for AI Safety |
| Oct 24, 2024 | Do not write that jailbreak paper |
| Mar 12, 2024 | The Worst (But Only) Claude 3 Tokenizer |
| Mar 8, 2024 | Universal Jailbreak Backdoors from Poisoned Human Feedback |