2024
an archive of posts from this year
Date | Post |
---|---|
Nov 19, 2024 | The Importance of Adversarial Evaluations for AI Safety |
Oct 24, 2024 | Do not write that jailbreak paper |
Mar 12, 2024 | The Worst (But Only) Claude 3 Tokenizer |
Mar 8, 2024 | Universal Jailbreak Backdoors from Poisoned Human Feedback |