Javier Rando | AI Safety and Security
  • The Importance of Adversarial Evaluations for AI Safety

    Summary of my presentation at one of the working groups drafting the EU General-Purpose AI Code of Practice. I explain why adaptive and adversarial evaluations are crucial to understanding the worst-case behavior of AI systems.

    3 min read   ·   November 19, 2024   ·   Javier Rando

  • Do not write that jailbreak paper

    Jailbreaks are becoming a new ImageNet competition instead of helping us better understand LLM security. Some takes on what LLM jailbreak and security research should look like.

    12 min read   ·   October 24, 2024   ·   Javier Rando

  • The Worst (But Only) Claude 3 Tokenizer

    We reverse-engineer the Claude 3 tokenizer. Just ask Claude to repeat a string and inspect the network traffic.

    5 min read   ·   March 12, 2024   ·   Javier Rando

  • Universal Jailbreak Backdoors from Poisoned Human Feedback

    We present a novel attack that poisons RLHF data to enable universal jailbreak backdoors. Unlike existing work on supervised fine-tuning, our backdoor generalizes to any prompt at inference time.

    7 min read   ·   March 08, 2024   ·   Javier Rando and Florian Tramèr

© Copyright 2025 Javier Rando. Last updated: February 24, 2025.