How a little semantics can change an estimate
Oct. 23, 2025 | Categories: Education, Probability | Last Modified: Dec. 19, 2025, 5:34 a.m.
NPR Headline
I recently came across this NPR article that links to a study from the Center for Democracy and Technology (CDT) claiming that 1 in 5 high school students (really closer to 19% in the survey) either have had, or know someone who has had, a romantic relationship with an AI.
When I first saw this headline, I was extremely surprised. 1 in 5? That is a lot! Are AI relationships truly this prevalent among high schoolers?
Then I realized that perhaps the statement "or know someone" is carrying most of the mathematical weight here. Let's assume that the surveyed group of students consisted of a representative sample of high schoolers.
Let \( p \) denote the actual proportion of students who have had a romantic relationship with an AI. If someone is in such a relationship, there is also some probability that they tell other people; call that \( t \). For this problem, let's assume \( t = 1 \): if someone is in a relationship with an AI, everyone who knows them is aware of it. Let's also assume that whether a student is in a relationship with an AI is independent of whether the people they know are.
Let's say a student knows roughly 100 students in their school (this is a very conservative estimate, as "knows" simply means "knows of" rather than "knows as a friend"). Let \( A \) be the event that a student is in, or knows someone in, a relationship with an AI.
$$P(A) = 1 - P(A^C)$$
$$= 1 - (1-p)^{N + 1}$$
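where \( N \) is the number of people that a student knows. The complement \( A^C \) requires that neither the student nor any of the \( N \) people they know is in such a relationship, and by independence each of those \( N + 1 \) people contributes a factor of \( 1 - p \). Averaging over all students, setting this equal to the survey's 19%, and taking \( N = 100 \), we should expect an upper-bound estimate of: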
$$1 - (1-p)^{N + 1} = 0.19$$
$$\implies 1 - (1-p)^{101} = 0.19$$
$$\implies (1-p)^{101} = 0.81$$
$$\implies p = 1 - (0.81)^{1/101}$$
$$\boxed{\implies p \approx 0.0021 = 0.21\%}$$
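As a quick sanity check on the algebra above, here is a minimal Python sketch that solves for \( p \) and plugs it back into the formula. The function names `solve_p` and `prob_is_or_knows` are my own, and the values of \( N \) other than 100 are purely illustrative, included only to show how sensitive the estimate is to how many people a student "knows".

```python
def prob_is_or_knows(p, n_known):
    """P(A) = 1 - (1 - p)^(N + 1): the student or at least one of N acquaintances."""
    return 1.0 - (1.0 - p) ** (n_known + 1)


def solve_p(target, n_known):
    """Invert P(A) = target for p, assuming independence and t = 1."""
    return 1.0 - (1.0 - target) ** (1.0 / (n_known + 1))


if __name__ == "__main__":
    # N = 100 is the value used in the post; 50 and 200 are illustrative only.
    for n_known in (50, 100, 200):
        p = solve_p(0.19, n_known)
        check = prob_is_or_knows(p, n_known)
        print(f"N = {n_known:3d}: p = {p:.4%}, check P(A) = {check:.4f}")
```

With \( N = 100 \) this reproduces the roughly 0.21% figure, and a larger \( N \) pushes the implied \( p \) even lower.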
So, in reality, at the very most, the study should really focus on the fact that about 0.2% of students have been in a relationship with an AI, roughly two orders of magnitude lower than the article's headline. This is of course most likely not the true number, as it is based on survey responses where people may lie, play into rumors, or keep things secret.
You can go even further with this, perhaps assuming some Poisson distribution for the number of people known (\( 1 + Pois(\lambda) \) perhaps?) and trying to get an estimate for \( t \). This model would require many more assumptions and a lot more complexity than the problem laid out here (I originally wanted to do a Bayesian hierarchical model with \( \lambda \) estimated based on the sample size in the study, but I figured it would be way too intensive).
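A much lighter version of that extension can still be sketched without the full hierarchical model. Suppose the number of acquaintances is \( Pois(\lambda) \) and, as my own simplifying assumption rather than anything from the study, each acquaintance's AI relationship becomes known independently with probability \( t \). Then the number of acquaintances known to be in such a relationship is Poisson with mean \( \lambda p t \), which gives \( P(A) = 1 - (1-p)e^{-\lambda p t} \). That has no tidy closed-form inverse, so the sketch below solves for \( p \) by bisection. The values \( \lambda = 100 \) and the choices of \( t \) are illustrative placeholders, not estimates.

```python
import math


def prob_is_or_knows(p, lam, t=1.0):
    """P(A) = 1 - (1 - p) * exp(-lam * p * t) under the assumed Poisson model."""
    return 1.0 - (1.0 - p) * math.exp(-lam * p * t)


def solve_p_poisson(target, lam, t=1.0, iters=200):
    """Find p with P(A) = target by bisection; P(A) is increasing in p on [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if prob_is_or_knows(mid, lam, t) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    # lam = 100 mirrors the N = 100 guess; the t values are hypothetical.
    for t in (1.0, 0.5, 0.25):
        p = solve_p_poisson(0.19, lam=100, t=t)
        print(f"t = {t:.2f}: p = {p:.4%}")
```

Under these assumptions, a smaller \( t \) means a larger \( p \) is needed to reach the same 19%, so the answer depends heavily on how openly students talk about these relationships.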
This does not mean that the study lied to anyone: the 1 in 5 estimate does follow from their data. Still, the estimate can be very misleading. It's important to think about where numbers come from, even if they are true.