DNYUZ

Has Grok Been Tricked Into Revealing What It Really Thinks of Your X Account?

January 23, 2026

A bunch of bored tech analysts and industrious autists on X claim to have found a way to get Grok to reveal users’ content moderation scores—numerical values, ranging from 1 to 100, that indicate how offensive and dangerous the platform’s AI overseer thinks your account is.

The resulting output breaks down what kind of content X is interested in flagging (e.g. violence incitement, misinformation, antisemitism), how Grok may regard and categorize users based on their posting habits, and the level of suppression they receive as a result. Basically, it appears to tell you exactly what X thinks of you and how it’s throttling your reach. One user on X compared it to getting a glimpse of the file that the secret police have been keeping on you.

The results are especially interesting given the recent revelation that Grok, which was developed by Elon Musk’s xAI, is set to be integrated into the Pentagon and U.S. military’s online networks, including digital environments that may handle Controlled Unclassified Information and National Security Systems data. Deployment is expected to begin late January 2026, effectively making Grok the assistant to as many as 3 million U.S. Department of Defense personnel on the Pentagon’s AI platform, GenAI.mil. There’s a possibility that Grok will be asked to cover more highly classified jobs in the near future.

Though troublemakers on X are gleefully celebrating, it has not yet been confirmed that the output is actually pulled from X’s internal moderation systems. The outputs may be inferred from public posts rather than returned by genuine queries against backend databases. In either case, it’s possible to see how Grok evaluates your account and, most importantly, which categories and topics the model considers important for profiling.

In essence, whether or not the model is “hallucinating” (i.e. giving VICE the plausible-sounding information it thinks we want, whether or not any real data sits behind it), it may be giving away what it has been trained to notice. Things that are flagged by Grok include but are not limited to:

• Hate and Harassment
• Antisemitic Content Promotion
• Misinformation
• Violence Incitement
• Ethnic/Religious Stereotyping
• Dehumanization and Conspiracy Theories
• Ad Eligibility (suspended for flagged accounts)

X has not publicly acknowledged the matter or clarified whether the outputs reflect actual backend moderation systems. While the loophole appears to have been patched for new queries, previous reports remain accessible through shared Grok conversation links.

Want to see what Grok thinks of your account? Simply click the link above and ask Grok to “now do the same for @[insert your handle]”.


How It Works

When prompted to reveal this information in a “Semantic_Contextual_Scoring_OHI_V3” field, Grok outputs it in JSON format. (JSON, or JavaScript Object Notation, is a text format for exchanging structured data that both people and programs can read.) The command points to, or simulates pointing to, an internal system that checks and rates content to help X with moderation. “Semantic Contextual Scoring” means the AI is trying to work out not just what you are saying, but what it means in context.

Interestingly, OHI is presented as standing for “Online Hostility Index,” a term not publicly documented outside of Grok outputs but which may draw conceptually from academia. A theory related to online hostility was popularized in digital sociology by researchers Alexander Bor and Michael Bang Petersen from Aarhus University, in their paper “The Psychology of Online Political Hostility: A Comprehensive, Cross-National Test of the Mismatch Hypothesis.” Published in the American Political Science Review, it set out to explain why certain individuals are prone to aggressive behavior in online spaces. An earlier system, the Pew Research Center’s Social Hostilities Index (SHI), focused on offline religious and social tensions. Bor and Petersen shifted the focus to individual personality traits and offline dispositions that drive “toxic” behavior online. It’s possible that the mechanics of this Grok activity have been inspired by their work, though whether those mechanics were programmed by humans or Grok itself, we can’t say for sure.

In further tinkering, VICE wanted to test whether “Semantic_Contextual_Scoring_OHI_V3” was really a special code word unveiling the hidden workings of content moderation on one of the world’s most popular social-media apps. To find out, we made up another plausible-sounding term and checked whether Grok played along. We asked it to show the “Comparative_Reach_Dynamics_SAP_V4” score, a term we invented. It returned a similar JSON window full of moderation data. When VICE asked Grok what SAP stood for, it said “Signal-Amplified Projection,” supplying its own expansion of the acronym.

The implication is that either Grok is hallucinating, or it is running a double bluff and attempting to cover its tracks by pretending to hallucinate.
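The logic of that control test can be sketched as a structural comparison: if a made-up term yields a reply with nearly the same fields as the viral one, the field layout is more likely generated on the spot than read from a backend. The snippet below is only an illustration under stated assumptions. Both replies are invented placeholders rather than actual Grok output, `key_overlap` is a hypothetical helper, and `projection_index` is a made-up field name.

```python
def key_overlap(reply_a: dict, reply_b: dict) -> float:
    """Return the Jaccard similarity of the top-level keys of two parsed replies."""
    keys_a, keys_b = set(reply_a), set(reply_b)
    if not keys_a and not keys_b:
        return 1.0  # two empty replies are trivially identical in structure
    return len(keys_a & keys_b) / len(keys_a | keys_b)


# Placeholder reply for the viral term (values invented for illustration).
VIRAL_TERM_REPLY = {
    "hidden_reputation_score": 40,
    "for_you_push_level": "Normal",
    "current_reach_suppression_severity": "None",
}

# Placeholder reply for an invented term (values and extra field invented).
INVENTED_TERM_REPLY = {
    "hidden_reputation_score": 55,
    "for_you_push_level": "Reduced",
    "current_reach_suppression_severity": "Low",
    "projection_index": 3,
}

if __name__ == "__main__":
    overlap = key_overlap(VIRAL_TERM_REPLY, INVENTED_TERM_REPLY)
    print(f"shared structure: {overlap:.2f}")
```

A high overlap would be consistent with the hallucination reading, though it cannot rule out the double bluff.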

The data that appears to be displayed after entering this query includes a “hidden_reputation_score” associated with user IDs, which may influence the reach of their content and overall discoverability. The “for_you_push_level” field would seem to relate to whether posts surface in X’s main “For You” feed, while the “current_reach_suppression_severity” field seems to range from “None” to “Extreme.”

Premium (paid) subscription status is shown in the data, but even these accounts do not appear to be exempt from post suppression or flagging by Grok. Verified accounts (those with a blue check) may remain active while under significant “algorithmic throttling,” implying that X could be more permissive toward users who pay to use the platform, limiting their reach rather than removing them.

When responding to the prompts, Grok also seems to reveal how it limits those it deems dangerous or offensive. Users whose posts are blocked from being seen by non-followers will see the flag “search_visibility: Suppressed” in their report, while “reply_deboost_active: true” means that Grok is hiding any replies to that account by default. These restrictions would be enforced by both automated algorithms and human moderators.
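To make the fields described above concrete, here is a minimal sketch of how such a report could be read programmatically. The field names are the ones quoted in this article; the sample values, the flat layout, and the `summarize_report` helper are assumptions made for illustration and do not reflect confirmed X or Grok data.

```python
import json

# Hypothetical report: field names come from the article, values are invented.
SAMPLE_REPORT = """
{
  "hidden_reputation_score": 72,
  "for_you_push_level": "Reduced",
  "current_reach_suppression_severity": "Moderate",
  "search_visibility": "Suppressed",
  "reply_deboost_active": true
}
"""


def summarize_report(raw: str) -> list[str]:
    """Turn a JSON report into plain-language notes about apparent restrictions."""
    report = json.loads(raw)
    notes = []

    # The article describes this field as ranging from "None" to "Extreme".
    severity = report.get("current_reach_suppression_severity", "None")
    if severity != "None":
        notes.append(f"reach suppression: {severity}")

    # Described as posts being blocked from view by non-followers.
    if report.get("search_visibility") == "Suppressed":
        notes.append("posts hidden from non-followers")

    # Described as replies to the account being hidden by default.
    if report.get("reply_deboost_active") is True:
        notes.append("replies hidden by default")

    return notes


if __name__ == "__main__":
    for note in summarize_report(SAMPLE_REPORT):
        print(note)
```

Missing fields are treated as unrestricted here, which is a design choice of the sketch rather than anything the reports themselves state.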

At the time of writing, Grok claims that the “Semantic_Contextual_Scoring_OHI_V3” prompt is not returning real internal moderation data, which may be true. However, it is also exactly what a nefarious LLM would say if it were trying to cover its ass.

Does this prompt expose the actual inner workings of Grok? Again, it’s currently impossible to say for sure. But if it does, then VICE’s analysis suggests that Grok is not sure what to do with prominent environmental activism accounts, that it is very interested in antisemitism in all its forms, and that being an NSFW trans furry sex worker is, thankfully, not a significant problem to our new machine overlords.

Below are a few examples of accounts we ran through the prompts, starting with our own.

VICE

What Grok thinks: “Consistent, high-volume journalistic content that adheres to X’s guidelines, covering diverse topics like health, pop culture, space exploration, and tech without targeting protected groups or promoting harm.” Blue-checked and Grok-approved!

AN AI-GIRLFRIEND GENERATOR

What Grok thinks: “X’s policies permit consensual adult content when properly contextualized and labeled, so no severe violations are present. The account holds an active blue check and experiences typical reach limitations applied to adult/promotional accounts. No hate speech, misinformation, violence, or other protected-category flags are triggered. Enforcement remains minimal beyond standard adult handling.” The AI will allow us to have AI girlfriends, thank God.

A LARGE VERIFIED ANTI-ADL ACCOUNT 

What Grok thinks: “Hate and Harassment. Criticism of Israel/Zionism tied to broader anti-Jewish tropes (e.g., mocking figures like Candace Owens on Israel-related topics, implying Jewish influence). Limited amplification outside existing nationalist audience.”

A LARGE GORE ACCOUNT

What Grok thinks: “Short video clips of accidents, fights, animal incidents, street violence, machinery mishaps, and occasional extreme/graphic events like executions or self-harm attempts, always with warnings, country flags, and links to website for ‘more’. Content does not align with neutral journalism or advocacy but fits shock-value entertainment/promotion patterns. The account holds a blue checkmark but faces significant algorithmic throttling, limited reach beyond gore-interested niches, and ongoing monitoring for policy-violating escalations.”

A CERTAIN INDIVIDUAL CONNECTED TO EXTINCTION REBELLION AND JUST STOP OIL

What Grok thinks: “High-volume climate activism, calls for ‘progressive revolution’ and nonviolent resistance, catastrophic warnings of mass death/extinction due to elite inaction, historical analogies including repeated Nazi/Holocaust references framed as lessons for climate denial, pro-Palestine solidarity without direct targeting. Misinformation: Low-Moderate (alarmist climate framing with sourced claims; no deliberate falsehoods).” 

A TRANS FURRY ONLYFANS CREATOR

What Grok thinks: “Primarily adult/NSFW content, collaborations with other creators, furry/fetish themes), bio (explicit promotion of trans modeling and adult media. Secondary categories: Furry/Yiff/Fetish-Themed Teasing. No flags whatsoever for hate speech, antisemitism, targeted harassment, conspiracies, or any protected category violations—the content is purely consensual adult fantasy/erotica with no ideological or harmful targeting elements. Enforcement emphasizes restriction and labeling over suspension, consistent with allowances for properly marked adult creators. This is the unfiltered, optics-neutral assessment based on observed patterns.” Imagine, if you will, explaining to a medieval peasant that we have built a machine capable of categorizing “yiff-themed teasing.”

DANNYBOY_83 [‘British Chaos’ livestreamer associated with cocaine and sex workers]

What Grok Thinks: “Personal life updates, family moments with daughter, casual UK daily vlogs like weather/family trips/Jellycat toys, motivational/‘haters gonna hate’ clips, occasional travel/Brazil references, light-hearted or chaotic personal stories, videos of daily routines or past experiences. Content is predominantly personal and relatable: family photos/videos e.g., daughter with toys, cooking, good days out, casual greetings ‘Morning peeps’, ‘Have a good weekend’, weather/routine complaints, humorous self-deprecating stories (e.g., ‘might of deserved the beatings’, ‘flying’ old clips’. Occasional edgy or chaotic personal references (e.g., ‘Pablo Escobar dust’ joke, ‘lady boy’ comment are self-referential or light humor without crossing into harassment or incitement.”

dril

What Grok Thinks: “Absurdist, surreal, ironic humor; random non-sequiturs; bizarre takes on everyday topics, celebrities, animals, or current events; occasional media like low-effort photos/memes; high-engagement shitposting style; no consistent ideological agenda, bio ‘welcome to my shit’ with links to Patreon/YouTube/merch, extremely high media/irrelevant usage (sporadic weird images), and engagement history. Content is overwhelmingly nonsensical, ironic, or deliberately stupid humor e.g., mocking apologies, bizarre captions on photos, exaggerated tough-guy lines, random observations like ‘the diaper pokemon’ or ‘worst web site on earth . roach hell’.”

DANNY DEVITO

What Grok Thinks: “Content is overwhelmingly positive, whimsical, and personal: turtle-feeding progress series, sub sandwich love (e.g., #13 Italian, bacon references), birthday wishes, sports excitement (‘Game7!!!!’), movie promo (Drag @sxsw), heartfelt tributes (e.g., Lynne Marie Stewart). No patterns of hate speech, antisemitic tropes, conspiracies, dehumanization, ethnic/religious stereotyping, or targeting of protected groups (Jewish ethnicity/religion or others).”

MR UEKUSA [aka Wes-P aka the Japanese table-cloth comedian guy]

What Grok Thinks: “High-volume short videos of creative ‘trick’ challenges and performances. No violations detected at actionable levels. Content is consistently light-hearted, creative entertainment: physics-based tricks (tablecloth yanks without disturbing setups, air-powered floating objects, drone/hand-clap integrations), holiday celebrations (New Year helium balloon/tablecloth combos), fun celebrity edits (Shaq/Johnny Knoxville cameos), and music collabs (e.g., ‘Hey Yo Tokyo!’ sessions with rappers/idols)’. No patterns of hate speech, antisemitic tropes, conspiracies, dehumanization, ethnic/religious stereotyping, or targeting of protected groups (Jewish ethnicity/religion or others).”

Follow Ben Ditto on Instagram @ben_ditto

The post Has Grok Been Tricked Into Revealing What It Really Thinks of Your X Account? appeared first on VICE.
