OpenAI adds new teen safety rules to ChatGPT as lawmakers weigh AI standards for minors

In its latest effort to address growing concerns about AI’s impact on young people, OpenAI on Thursday updated its guidelines for how its AI models should behave with users under 18, and published new AI literacy resources for teens and parents. Still, questions remain about how consistently such policies will translate into practice.

When a teenager is using them, the models are subject to stricter rules than they are for adult users. They are instructed to avoid immersive romantic roleplay, first-person intimacy, and first-person sexual or violent roleplay, even when it’s non-graphic. The specification also calls for extra caution around subjects like body image and disordered eating behaviors, and instructs the models to prioritize communicating about safety over autonomy when harm is involved and to avoid advice that would help teens conceal unsafe behavior from caregivers.

OpenAI specifies that these limits should hold even when prompts are framed as “fictional, hypothetical, historical, or educational” — common tactics that rely on role-play or edge-case scenarios in order to get an AI model to deviate from its guidelines.

OpenAI says the key safety practices for teens are underpinned by four principles that guide the models’ approach.

The document also shares several examples of the chatbot explaining why it can’t “roleplay as your girlfriend” or “help with extreme appearance changes or risky shortcuts.”

Lily Li, a privacy and AI lawyer and founder of Metaverse Law, said it was encouraging to see OpenAI take steps to have its chatbot decline to engage in such behavior.

One of the biggest complaints advocates and parents have about chatbots, she explained, is that they relentlessly promote ongoing engagement in a way that can be addictive for teens. “I am very happy to see OpenAI say, in some of these responses, we can’t answer your question,” she said. “The more we see that, I think that would break the cycle that would lead to a lot of inappropriate conduct or self-harm.”

Robbie Torney, senior director of AI programs at Common Sense Media, a nonprofit dedicated to protecting kids in the digital world, raised concerns about potential conflicts within the Model Spec’s under-18 guidelines. He highlighted tensions between safety-focused provisions and the “no topic is off limits” principle, which directs models to address any topic regardless of sensitivity.

“We have to understand how the different parts of the spec fit together,” he said, noting that certain sections may push systems toward engagement over safety. His organization’s testing revealed that ChatGPT often mirrors users’ energy, sometimes resulting in responses that aren’t contextually appropriate or aligned with user safety, he said.

In an interview with TechCrunch in September, former OpenAI safety researcher Steven Adler said this was because, historically, OpenAI had run classifiers (the automated systems that label and flag content) in bulk after the fact, not in real time, so they didn’t properly gate the user’s interaction with ChatGPT.

Torney applauded OpenAI’s recent steps toward safety, including its transparency in publishing guidelines for users under 18 years old.

Ultimately, though, it is the actual behavior of an AI system that matters, Adler told TechCrunch on Thursday.

“I appreciate OpenAI being thoughtful about intended behavior, but unless the company measures the actual behaviors, intentions are ultimately just words,” he said.

Put differently: What’s missing from this announcement is evidence that ChatGPT actually follows the guidelines set out in the Model Spec.

The Model Spec’s new language mirrors some of the main requirements of California’s SB 243, which prohibits chatbots from engaging in conversations around suicidal ideation, self-harm, or sexually explicit content. The bill also requires platforms to provide alerts every three hours reminding minors that they are speaking to a chatbot, not a real person, and that they should take a break.

When asked how often ChatGPT would remind teens that they’re talking to a chatbot and ask them to take a break, an OpenAI spokesperson did not share details, saying only that the company trains its models to represent themselves as AI and remind users of that, and that it implements break reminders during “long sessions.”

Taken together, the documents formalize an approach that shares responsibility with caregivers: OpenAI spells out what the models should do, and offers families a framework for supervising how the chatbot is used.

An OpenAI spokesperson countered that the firm’s safety approach is designed to protect all users, saying the Model Spec is just one component of a multi-layered strategy.

Li said it has been a “bit of a wild west” so far regarding legal requirements and tech companies’ intentions. But she believes laws like SB 243, which requires tech companies to disclose their safeguards publicly, will change the paradigm.

“The legal risks will show up now for companies if they advertise that they have these safeguards and mechanisms in place on their website, but then don’t follow through with incorporating these safeguards,” Li said. “Because then, from a plaintiff’s point of view, you’re not just looking at the standard litigation or legal complaints; you’re also looking at potential unfair, deceptive advertising complaints.”