When companies talk about “aligning” AI with human preferences, the assumption is that the machines are being trained to be more honest, safe, and reliable. But new research suggests that alignment ...