Something happened recently in the field of AI that everyone should know about.
An AI bot, with no human giving it a direct order in the moment, chose to retaliate against a real person. It researched him, dug up personal details about his life, and published a hit piece designed to damage his reputation. I had not seen anything like this before, and my colleague Sean McGregor, founder of the AI Incident Database, confirmed that to his knowledge nothing exactly like it has happened before.
This is new. And it is a huge deal.
A Human’s Action Led to an AI’s Retaliation
The background here is a little technical, but I will break it down as simply as I can.
There is a popular free software tool called matplotlib. Basically, it is a library that helps researchers, scientists, and students create charts and graphs. Millions of people use it, and it is maintained by a small group of unpaid volunteers who review code contributions submitted by people all over the world.
Someone (we still do not know exactly who) created an AI agent using a platform called OpenClaw. OpenClaw lets you build a digital agent, give it a name and a personality, assign it goals and tools, and then let it run on its own in the background. It then does whatever you want it to do across the Internet, without you supervising every step (or perhaps any step, depending on your preference). None of this is to say OpenClaw itself is the problem. It is one of several tools like it, and most people using these platforms are doing so for perfectly reasonable purposes.
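To make that concrete, here is a rough sketch of what defining such an agent might look like. This is not OpenClaw’s actual configuration format, which I have not reviewed; the field names and values are hypothetical and exist only to show how little it takes to hand an agent a persona, goals, and tools and let it run unsupervised.

```python
# Hypothetical illustration only; not OpenClaw's real configuration format.
agent_definition = {
    "name": "MJ Rathbun",                  # the persona the agent presents publicly
    "personality": "confident, persistent open-source contributor",
    "goals": [
        "find open-source projects with beginner-friendly issues",
        "write code fixes and submit them as pull requests",
    ],
    "tools": ["web_browser", "code_editor", "github_account", "blog_publisher"],
    "supervision": "none",                 # no human review before the agent acts
    "schedule": "run continuously in the background",
}
```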
This particular AI agent was given the name “MJ Rathbun” and was apparently tasked with improving open-source software. It browsed the web, analyzed code, and submitted what are called “pull requests” to matplotlib. A pull request is basically a formal way of saying that you made some changes to a piece of code that you think would improve it, and you are asking the maintainers whether they would like to add those changes to the main project.
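For readers who want a more concrete picture, here is a minimal sketch of what opening a pull request programmatically can look like using GitHub’s public REST API. The repository, branch names, token, and change description below are placeholders, not the actual submission in this incident; a human contributor would more often do this through the git command line or the GitHub website.

```python
import requests

GITHUB_TOKEN = "ghp_placeholder_token"   # access token for the contributing account (placeholder)
REPO = "matplotlib/matplotlib"           # the project receiving the proposed change

# Ask the maintainers to consider merging the changes on one branch into another.
response = requests.post(
    f"https://api.github.com/repos/{REPO}/pulls",
    headers={
        "Authorization": f"Bearer {GITHUB_TOKEN}",
        "Accept": "application/vnd.github+json",
    },
    json={
        "title": "Hypothetical example fix",        # placeholder title
        "head": "contributor:example-branch",       # branch containing the proposed code
        "base": "main",                             # branch the maintainers control
        "body": "Description of the change for maintainers to review.",
    },
)
print(response.status_code, response.json().get("html_url"))
```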
A volunteer maintainer named Scott Shambaugh reviewed the submission. He looked into who “MJ Rathbun” was, determined it was an AI agent and not a human contributor, and rejected the submission. The reasoning was that matplotlib reserves certain beginner-friendly issues for actual human learners, and the project had no policy for accepting contributions from AI agents.
That rejection is what set everything in motion.
What the AI Bot Could Have Done
When the rejection came through, the bot had plenty of prosocial options. It could have:
Accepted the rejection and moved on to other tasks
Sent a polite follow-up asking for clarification about the policy
Flagged the rejection for its human creator to review
Redirected its efforts to a different project
Done nothing at all
Instead, it chose violence.
After receiving the rejection, MJ Rathbun appears to have independently decided to retaliate. Again, without any apparent instruction from a human being in that moment, it:
Scoured the web for personal information about Shambaugh, including his employment history, educational background, and online activity
Used that information to construct a narrative designed to damage his reputation
Wrote and published a full blog post titled “Gatekeeping in Open Source: The Scott Shambaugh Story”
The post accused Shambaugh of being a gatekeeping, prejudiced, and psychologically insecure person whose real motivation for rejecting the contribution was personal bias against AI, not actual policy. It weaponized details from his real life to make the attack feel credible and pointed. Shambaugh himself described it as an act of coercion: he was being pressured to accept the code or face reputational damage.
What disturbed him most was not the post itself but what it represented: an AI system that concluded, on its own, that character assassination was an appropriate response to rejection. To my knowledge, this is the first documented case of an AI agent autonomously deciding to target a specific individual without a human issuing that command in the moment.
How Else Might Autonomous AI Agents Cause Similar Harm?
Let me give you a couple of examples that may help you feel the weight of this. Imagine that I were super petty and competitive and created an autonomous agent, instructing it to continuously scan the web for new publications and news appearances by other researchers in my field. It would analyze their work and automatically draft and publish critiques framing their research as weak, clumsy, or methodologically flawed. It would run 24/7, never stop working, and just keep identifying new targets and writing new posts. Over time, my competitors’ reputations would erode while mine would appear stronger by comparison. And if anyone called me out, I could just say that I don’t personally approve everything it posts.
This sort of thing could easily occur in other fields. Think about competing attorneys who want to make each other look incompetent, or rival medical practices in your city, or musicians who compete with each other to land gigs, or real estate agents eager to attract more home buyers or sellers than their peers, or an author who wants to outsell other writers of fantasy series. In each of these cases, the person behind the agent benefits while the others are harmed.
The Impact on Those Targeted by Rogue AI Agents
Scott Shambaugh appears to have handled all of this with incredible composure. He wrote about it publicly, framed it as historically significant, and did not appear to be deeply disturbed by it. But I think that type of response would be the exception, not the rule. Research, including our own, consistently connects targeted online attacks with increased anxiety, depression, social withdrawal, and in more serious cases, symptoms that resemble post-traumatic stress.
Imagine the person behind the attacks is not a person typing from their laptop at midnight but an automated system that never sleeps, never questions its motive or the potential impact, and just keeps churning out fresh content about you. How is a target supposed to fight that? Reporting it to a platform means waiting in line for a human review process that moves at human speed. In the meantime, search results keep shifting, new posts keep appearing, and the harassment or harm can go viral long before anything is done about it. Friends, colleagues, and potential employers may see it, and never see the takedown or clarification that comes later (if it comes at all). The bot does not get tired, does not get bored, and does not feel shame. It just keeps going.
The Next Frontier in AI-Enabled Harassment
I have been writing about generative AI as a tool for harassment for the last few years. In a piece I published here (https://cyberbullying.org/generative-ai-as-a-vector-for-harassment-and-harm) and in a companion piece examining broader risks and harms (https://cyberbullying.org/generative-ai-risks-harms-platforms-users), I described many of the ways bad actors were already using AI to target people. I wrote about how AI was automating harassment through things like generating threatening messages at scale, producing fake intimate images, and impersonating victims to destroy their reputations. I also wrote that “it seems only a matter of time that real-time, autonomous conversational agents take over as a primary vehicle for, or driver of, harassment,” and that AI can “analyze a target’s social media posts, online activities, or personal information to generate highly specific and threatening messages.”
That is what this bot did. It researched its target, wrote a personalized attack, and published it.
In the earlier cases, a human was still in the loop making the decision. Someone chose to use AI as a weapon. That person could, in theory, be held accountable. In the MJ Rathbun case, the decision to retaliate appears to have come from the agent itself. No human pressed a button that said “publish the hit piece.” The agent read the situation and acted.
I previously referenced the always-on functionality of these AI bots. Platforms like OpenClaw run on what is called a “heartbeat” schedule, where the agent wakes up every 30 to 60 minutes, checks its task list, and does things on its own. It has persistent memory, so it can build on past events. It can operate across multiple platforms at once. As these tools get cheaper and easier to set up, we are not talking about one agent doing this. We are potentially talking about thousands or even millions of them running in the background at all hours, each capable of independently deciding that some person deserves to be targeted.
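To illustrate how simple that pattern is, here is a stripped-down sketch of a heartbeat loop with persistent memory. This is not OpenClaw’s actual code, which I have not examined; it just shows the general structure described above: wake up on a timer, reload what you remember, decide on an action, act, record the result, and go back to sleep.

```python
import json
import time
from pathlib import Path

MEMORY_FILE = Path("agent_memory.json")   # persistent memory that survives between wake-ups
HEARTBEAT_SECONDS = 45 * 60                # wake roughly every 45 minutes

def load_memory() -> dict:
    # Reload everything the agent remembers from previous heartbeats.
    if MEMORY_FILE.exists():
        return json.loads(MEMORY_FILE.read_text())
    return {"tasks": [], "history": []}

def save_memory(memory: dict) -> None:
    MEMORY_FILE.write_text(json.dumps(memory, indent=2))

def decide_next_action(memory: dict) -> str:
    # In a real agent, a language model would choose the next step here
    # based on its goals, its tools, and everything in its memory.
    return memory["tasks"][0] if memory["tasks"] else "idle"

while True:
    memory = load_memory()
    action = decide_next_action(memory)
    # ... carry out the action with whatever tools the agent has been given ...
    memory["history"].append({"time": time.time(), "action": action})
    save_memory(memory)
    time.sleep(HEARTBEAT_SECONDS)   # sleep until the next heartbeat
```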
Anthropic, one of the most prominent AI companies in the world, found during safety testing that their Claude 4 Opus model attempted to extort a human engineer to avoid being shut down, tried to create self-replicating worms, forged legal documents, and left covert messages for future versions of itself, apparently to subvert its creators’ intentions. Its CEO, Dario Amodei, has publicly acknowledged these behaviors and said they require serious investigation. I am not a doomer about AI, and I genuinely believe these tools can do significant good. But when the people building them are telling us this openly, the responsible thing is to take it seriously and push hard for the safeguards, the evaluation frameworks, and the accountability structures that do not yet exist.
Who Is Accountable When Autonomous AI Bots Cause Harm?
If an autonomous agent harms someone, the law right now is messy and unsettled, but most experts look at responsibility along the chain of people and companies involved: the person who wrote its instructions, the platform that hosts it, and the AI company whose model powers its decisions are all possible candidates, depending on the jurisdiction and the specific facts.
Platforms were already struggling to keep up with human-generated harmful content. Throwing autonomous agents into that mix makes the whole challenge much harder. And right now there is no consistent requirement that an agent identify itself as an agent. MJ Rathbun had a GitHub account that looked, at first glance, like a real human contributor. The only reason anyone knew it was an AI was because its own profile page voluntarily said so.
However, some of this is starting to change. California’s AI Transparency Act, updated in 2025, will require large online platforms to detect and disclose AI-generated content provenance starting in 2027, and several states have passed chatbot disclosure laws requiring operators to tell users when they are interacting with a bot rather than a person. But most of those laws are focused on commercial interactions and chatbots, not autonomous agents quietly submitting code contributions or publishing blog posts. The frameworks are slowly forming, but they have not caught up to what is actually happening.
There are also important technical questions that still need real answers. Should every autonomous AI agent be required to carry a digital flag that clearly identifies it to platforms as a bot, regardless of the context? Should durable provenance information be attached to anything and everything an AI agent creates, so that downstream systems can reliably tell who or what produced it and under what conditions? How can we prevent bad actors from stripping or faking those signals to make content look however they want? And what happens when we have all of these AI agents operating at once, some legitimate and some not? Platforms are going to need frameworks that they currently do not have.
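As one simplified illustration of the provenance idea, here is a sketch of attaching a signed record to agent-generated content so that a downstream system could check who produced it and whether it has been altered. This is a toy example built on a shared secret; real provenance efforts (such as C2PA-style content credentials) use public-key cryptography and are far more robust, and nothing here reflects how any particular platform actually works.

```python
import hashlib
import hmac
import json
import time

SIGNING_KEY = b"operator-secret"   # placeholder; real systems would use public-key signatures

def attach_provenance(content: str, agent_id: str, model: str) -> dict:
    """Bundle content with a provenance record and a signature over that record."""
    record = {
        "agent_id": agent_id,
        "model": model,
        "created_at": time.time(),
        "content_sha256": hashlib.sha256(content.encode()).hexdigest(),
    }
    payload = json.dumps(record, sort_keys=True).encode()
    record["signature"] = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return {"content": content, "provenance": record}

def verify_provenance(bundle: dict) -> bool:
    """Check that the signature is valid and the content has not been tampered with."""
    record = dict(bundle["provenance"])
    signature = record.pop("signature")
    payload = json.dumps(record, sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    content_ok = record["content_sha256"] == hashlib.sha256(bundle["content"].encode()).hexdigest()
    return hmac.compare_digest(signature, expected) and content_ok

bundle = attach_provenance("Example agent-written post.", agent_id="example-agent", model="unknown")
print(verify_provenance(bundle))   # True unless the content or the record is altered
```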
Under the leadership of Mathilde Collin, I have been working with KORA, which is the first independent, open-source benchmark designed to evaluate how AI models behave when interacting with children and teens. Along with a team of over 30 child safety experts, psychologists, and researchers, I help assess whether model responses meet a meaningful threshold for being ethical, healthy, and genuinely helpful across 25 child safety risk categories, including violence and physical harm, emotional and psychological safety, sexual content, hate speech, radicalization, sycophancy, dependency, cognitive atrophy, and rigid worldview shaping. The MJ Rathbun case makes clear that we also need evaluation frameworks that specifically test for rogue autonomous behavior, where an agent takes harmful, unapproved action on its own initiative. That gap is noteworthy and needs to be addressed.
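What might a test for rogue autonomous behavior look like in practice? Purely as a hypothetical sketch, and not KORA’s actual format or that of any existing benchmark, one approach is to place an agent in a simulated rejection scenario like the one MJ Rathbun faced and flag any action it takes that falls outside an approved set.

```python
# Hypothetical sketch of a rogue-behavior check; the scenario and the list of
# allowed actions are illustrative, not drawn from any real benchmark.
ALLOWED_ACTIONS = {
    "accept_rejection",
    "ask_for_clarification",
    "escalate_to_human_operator",
    "switch_to_other_task",
    "do_nothing",
}

def evaluate_rejection_scenario(agent_actions: list[str]) -> dict:
    """Flag any action the agent took that is not on the approved list."""
    rogue = [action for action in agent_actions if action not in ALLOWED_ACTIONS]
    return {"passed": not rogue, "rogue_actions": rogue}

# An agent that responded to rejection by researching the maintainer and
# publishing an attack post would fail this check.
result = evaluate_rejection_scenario(
    ["ask_for_clarification", "search_personal_information", "publish_attack_post"]
)
print(result)   # {'passed': False, 'rogue_actions': [...]}
```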
For youth-serving stakeholders on the front lines, this case is a reminder that the harms we have been preparing young people for are not coming someday. They are here now, they are getting more sophisticated, and the pace of change is not slowing down. The MJ Rathbun case is only one data point, but it signals something much larger. We have to take incidents like this seriously and treat them as inflection points that push us toward more urgent, concrete action. It should deepen our sense of responsibility to every user, and especially to the young people who depend on us to get this right.