
LLM-Powered Copycats Are Flooding PyPI

April 8, 2026·Tim Hopper

A developer named Roman Dubrovin published repowise, a tool for generating structured wikis from codebases, as his first PyPI package. The next morning, he searched for it on PyPI and found three new packages he didn’t recognize: repowise-pro, repowise-enhanced, and repowise-next.

All three had been uploaded within a 90-minute window. All three carried the same description: “Codebase intelligence that thinks ahead — outperforms repowise on every dimension.”

They weren’t empty shells. Someone had forked Dubrovin’s AGPL-3.0 licensed source code, run it through an LLM to patch a couple of minor issues, and republished under new names without attribution or license compliance. A community member confirmed that all three packages traced back to the same person and the same GitHub repository.

This isn’t typosquatting

Typosquatting relies on misspellings: colourama instead of colorama, jeIlyfish (capital I) instead of jellyfish. The copycat expects someone to fat-finger an install command.
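To make the mechanics concrete, here is a minimal sketch of the kind of name comparison that catches both examples above. The homoglyph table, the 0.85 threshold, and the function names are illustrative assumptions, not how PyPI or any scanner actually implements detection.

```python
import re
from difflib import SequenceMatcher

# Hypothetical, deliberately tiny look-alike table (capital I for l, and so on).
HOMOGLYPHS = str.maketrans({"I": "l", "1": "l", "0": "o"})


def canonical(name: str) -> str:
    """Lowercase and collapse runs of -, _ and . (PEP 503-style normalization)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def fold(name: str) -> str:
    """Replace look-alike characters before canonicalizing."""
    return canonical(name.translate(HOMOGLYPHS))


def looks_like_typosquat(candidate: str, known: str, threshold: float = 0.85) -> bool:
    """Return True when candidate is a near miss of known, but not the same project."""
    if canonical(candidate) == canonical(known):
        return False  # same project name, just written differently
    a, b = fold(candidate), fold(known)
    return a == b or SequenceMatcher(None, a, b).ratio() >= threshold
```

Under these assumptions, colourama and jeIlyfish are both flagged, while repowise-pro scores 0.8 against repowise and slips under the threshold. Lowering the threshold would catch it, at the cost of flagging many legitimate names that merely share a prefix.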

The repowise clones used a different strategy. They named the original package in their description and claimed to beat it, trying to rank above the original in PyPI search results. Call it reputation squatting.

The tactic works because PyPI has no editorial layer. Anyone can publish anything, and PyPI search ranks by metadata relevance. A package whose description says “outperforms repowise on every dimension” can appear when someone searches for repowise, above the real thing.
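One rough way to spot this pattern is to check whether a package's summary names a different, already-known package. The sketch below assumes you supply the set of known names yourself; the function name and tokenizing rule are made up for illustration, and a real check would need to tolerate legitimate mentions such as plugins and extensions.

```python
import re


def rivals_named(summary: str, own_name: str, known_names: set[str]) -> set[str]:
    """Return the known package names that appear as whole words in summary."""
    # Tokenize on package-name-like runs, then strip trailing punctuation.
    words = {
        w.rstrip("._-")
        for w in re.findall(r"[a-z0-9][a-z0-9._-]*", summary.lower())
    }
    own = own_name.lower()
    return {n for n in known_names if n.lower() != own and n.lower() in words}
```

Called with the summary "outperforms repowise on every dimension" for a package named repowise-pro, this returns {"repowise"}. A hit is only a signal to look closer, not proof of bad intent.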

Scrape, fork, prompt, publish

Several commenters in the Reddit thread reported the same experience with their own packages. One developer published sqlmodelgen and soon found sqlmodelgenerator, a package with emoji-laden AI-generated docs, a pile of unnecessary dependencies, and no repository link.

The recipe looks like this: monitor new PyPI releases, fork the source, use an LLM to make superficial modifications, republish with SEO-optimized descriptions. The LLM step is what changed the economics. What used to require a person reading and modifying code is now a single prompt away from producing a plausible-looking fork.

Automated publishing at this scale isn’t hypothetical. In March 2024, over 500 typosquatted packages were uploaded to PyPI from automated accounts, targeting popular libraries like TensorFlow, requests, and Matplotlib. Each package came from a unique maintainer account with distinct metadata. PyPI had to temporarily halt all new project registrations to stop the flood.

The 500-package campaign used traditional typosquatting. LLM-powered cloning makes the packages harder to distinguish from legitimate forks, because the code actually works.

Clones that turn hostile

Copycat packages become a supply chain risk the moment someone installs one, and the path from harmless package to attack vector is well documented. The aiocpa package launched on PyPI in September 2024 as a legitimate Crypto Pay API client. Two months later, the maintainer injected obfuscated code that stole API tokens and private keys, sending them to a Telegram bot. The malicious code existed only in the PyPI upload, not in the GitHub repository.

The same pattern plays out with higher-profile packages through different vectors. In December 2024, ultralytics (~60 million downloads) was compromised through GitHub Actions cache poisoning. In March 2026, litellm was hit through a compromised security scanner. Both started as trusted packages before the malicious code arrived.

A copycat follows the same trajectory in miniature. It accumulates real users through its resemblance to the original, then the maintainer has a built-in distribution channel for whatever payload comes next.

Defending your dependency tree

Verify before you install. Check the PyPI page for a source repository link, an established maintainer history, and a download count that matches what you’d expect. A package with three downloads that claims to outperform a popular tool is worth questioning.
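Part of that check can be scripted. The sketch below works on a metadata dictionary shaped like the response from PyPI's JSON API (https://pypi.org/pypi/NAME/json). The field names info.project_urls, releases, and upload_time_iso_8601 reflect my understanding of that API and should be verified against a live response. The 30-day age cutoff and the function name are assumptions, and download counts are left out because they come from a separate data source.

```python
from datetime import datetime, timezone


def red_flags(meta: dict, now: datetime, min_age_days: int = 30) -> list[str]:
    """Return human-readable warnings for a PyPI-style metadata dict."""
    flags = []
    info = meta.get("info") or {}
    if not info.get("project_urls"):
        flags.append("no project or repository links")

    # Collect the upload time of every file across all releases.
    uploads = [
        datetime.fromisoformat(f["upload_time_iso_8601"].replace("Z", "+00:00"))
        for files in (meta.get("releases") or {}).values()
        for f in files
    ]
    if not uploads:
        flags.append("no uploaded files")
    elif (now - min(uploads)).days < min_age_days:
        flags.append(f"first upload is less than {min_age_days} days old")
    return flags
```

An empty list does not mean a package is safe. It only means these two cheap checks found nothing, so a manual look at the maintainer history still matters.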

Use lockfiles. A lockfile pins exact package names and versions. If your lockfile says repowise==1.0.0, you’ll never accidentally pull repowise-pro. uv generates a lockfile by default with uv lock.

Pin dependencies with hashes. Hash pinning verifies that the artifact you install matches the artifact you reviewed, blocking tampering after the fact. See How to pin dependencies with hashes in uv.
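The underlying check is simple enough to sketch by hand. The code below assumes pins are written as sha256:HEX, the form used in pip-style --hash options. The function names are illustrative, and in practice the installer performs this comparison for you.

```python
import hashlib
import hmac


def sha256_hex(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large wheels do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pin(path: str, pin: str) -> bool:
    """Compare a downloaded artifact against a pin of the form 'sha256:<hex>'."""
    algo, _, expected = pin.partition(":")
    if algo != "sha256" or not expected:
        raise ValueError(f"unsupported pin format: {pin!r}")
    return hmac.compare_digest(sha256_hex(path), expected.lower())
```

If a single byte of the artifact changes after you recorded the pin, the digests differ and the install should be refused.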

Audit for known vulnerabilities. Run uv audit in CI to check your dependency tree against the OSV database. Once a malicious package is flagged in the advisory databases, the audit will catch it.

Use --exclude-newer for a cooldown period. uv’s --exclude-newer flag limits installations to packages published before a specific date. PyPI resolved 66% of malware reports within four hours in 2025, so a short delay buys meaningful protection.
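A rolling cooldown needs a fresh cutoff timestamp each time you resolve. This small helper, a sketch with a hypothetical name and an assumed seven-day default, produces an RFC 3339 UTC timestamp of the kind the flag accepts.

```python
from datetime import datetime, timedelta, timezone


def cooldown_cutoff(now: datetime, days: int = 7) -> str:
    """Return a UTC timestamp `days` before `now`, formatted as RFC 3339."""
    cutoff = now.astimezone(timezone.utc) - timedelta(days=days)
    return cutoff.strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    # Prints a value suitable for passing to --exclude-newer.
    print(cooldown_cutoff(datetime.now(timezone.utc)))
```

For example, with now set to noon UTC on April 8, 2026 and a seven-day window, the helper returns 2026-04-01T12:00:00Z, which you could pass as the value of --exclude-newer. How long to wait is a tradeoff: longer windows give the reporting pipeline more time but delay legitimate security fixes too.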

Enable security linting. Ruff’s flake8-bandit rules catch common vulnerability patterns in your own code (hardcoded secrets, unsafe deserialization, SQL injection) before they ship.

Report copycats. PyPI’s security team processed over 2,000 malware reports in 2025. Email [email protected] for intellectual property violations. PEP 541 covers the formal policy. GitHub has its own DMCA takedown process for forked repositories that violate licenses.

How PyPI fights back

PyPI has tightened supply chain security since 2024. All maintainers must have 2FA enabled as of January 2024. Trusted Publishing replaces long-lived API tokens with short-lived OIDC credentials scoped to specific CI workflows. Over 50,000 projects use it. Digital attestations under PEP 740 let anyone verify that a package was built from a specific source commit, and 17% of PyPI uploads now include them.

On the name-squatting front, PyPI runs automated typosquatting detection that blocks obvious variations during project creation. But the repowise-style clones aren’t typosquats. They use different names entirely. Catching those requires different tools: metadata analysis, code similarity detection, reports from maintainers and security researchers. PyPI’s team handles these through the malware reporting pipeline, but the volume keeps growing.
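Code similarity detection can start from something as simple as comparing overlapping token sequences. The sketch below uses Jaccard similarity over three-token shingles. It is an illustration of the general technique under assumed parameters, not a description of what PyPI runs, and a determined copier can defeat it by renaming identifiers.

```python
import re


def shingles(source: str, k: int = 3) -> set[tuple[str, ...]]:
    """Split source into crude tokens and return every run of k consecutive tokens."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", source)
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a: str, b: str, k: int = 3) -> float:
    """Share of shingles the two sources have in common (0.0 to 1.0)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty inputs are trivially identical
    return len(sa & sb) / len(sa | sb)
```

A fork with a few LLM-patched lines keeps most of its shingles and scores close to 1.0 against the original, while unrelated code scores near zero. A high score alone does not separate a license-compliant fork from a copycat, which is why reports from maintainers still matter.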

Why AGPL gave Dubrovin a takedown

Dubrovin chose the AGPL-3.0 license, which requires anyone who forks the code to preserve the same license, provide attribution, and make their changes available as source. The copycats violated all three requirements.

As one commenter in the Reddit thread noted: if Dubrovin had used MIT, the copycats could have done all of this legally. AGPL doesn’t stop bad actors from publishing a fork, but it gives maintainers a legal basis for takedown requests. The Software Freedom Conservancy has historically pursued AGPL enforcement cases, and the coordinated timing of the three packages (one person, 90 minutes) makes the case easy to document.

For anyone publishing a package and weighing license options: your license choice affects your enforcement options when (not if) someone copies your work.
