Anthropic announces a new safety classifier that eradicates jailbreaks and further increases Claude's over-refusal rate