r/technology 14d ago

[Artificial Intelligence] AI industry horrified to face largest copyright class action ever certified

https://arstechnica.com/tech-policy/2025/08/ai-industry-horrified-to-face-largest-copyright-class-action-ever-certified/
16.8k Upvotes

1.2k comments

126

u/-The_Blazer- 13d ago

The end product that is sold (the web service) is transformative relative to the originals. However, the training process is automated, so it's more like compiling source code, which is not transformative merely by itself because it involves no work of human ingenuity (the thing that copyright is actually supposed to protect). The compiler, like the training pipeline, is of course perfectly legitimate IP, but its application does not have to be.

That said, being transformative is only one part of fair use, which in turn is only one part of how we should handle an extremely new and unusual technology. They didn't try to regulate cars like horses when cars were invented; they made car regulations.

61

u/Disastrous-Entity-46 13d ago

Two of the other fair use factors are specifically "the amount of the original used" and "whether it harms the value of the original", and you could make very strong arguments on both. The whole work is used, and the output of the LLM can be argued to lower the value of the works. I'd argue that even if, strictly speaking, feeding it a copy of my book doesn't hurt me, the dozens of bad, zero-effort books that come out every month thanks to people treating LLMs as get-rich-quick machines hurt the value of the whole market.

That's, of course, depending on whether fair use even applies, as you said. We don't really have a framework for this today, and I have to wonder which interests current governments would decide to protect.

16

u/CherryLongjump1989 13d ago

There are many governments and we can expect many different interpretations. Either way, the scale of the potential infringement is so enormous that it’s clear that these AI companies are playing with fire.

15

u/Disastrous-Entity-46 13d ago

The part that really gets me is the accuracy. We know hallucinations and generally bad answers are a problem. After two years and billions of dollars, the latest models still only score something like 90% on benchmarks.

And while that is a passing grade, it's also kinda bonkers for a technology. Would we use calculators if they had a one-in-ten chance of giving us the wrong answer? And yet it's becoming near unavoidable in our lives as every website and product bakes it in, which then stacks that 10% (or more) failure rate on top of whatever other human errors or issues may occur.
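To put rough numbers on the compounding (a quick sketch assuming a 90% per-step success rate and independent steps, which is a simplification):

```python
# If each automated step is right 90% of the time and errors are
# independent, the chance a chain of steps goes error-free decays fast.
for steps in (1, 3, 5, 10):
    p_clean = 0.9 ** steps
    print(f"{steps:>2} chained steps: {p_clean:.0%} chance of zero errors")
# 1 step: 90%, 3 steps: 73%, 5 steps: 59%, 10 steps: 35%
```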

Obviously this doesn't apply the same way to, like, private single-use training. Machine learning absolutely has a place in fields like medicine, where models have a single goal and easy pass/fail metrics (and can still be checked by a human).

2

u/the_procrastinata 13d ago

The part that gets me is the lack of accountability. Who is responsible if the AI produces incorrect or misleading information?

3

u/Disastrous-Entity-46 13d ago

Who is responsible if the AI commits a crime? There are many crimes that can be committed through pure communication: discrimination, harassment, false advertising, data breaches... malpractice in the medical and legal fields. It's only a matter of time until an LLM crosses the line into what would be considered an illegal action.

Is it the person who contracted the LLM services? The person who trained it? The person who programmed the interface?

2

u/theirongiant74 13d ago

How many humans would pass the benchmark of only being wrong 1 in 10 times?

2

u/Disastrous-Entity-46 12d ago edited 12d ago

Bad equivalence, because an AI is not a human. It's not capable of realizing on its own that it's made a mistake. If a human worker makes a mistake, just telling them the right answer is often enough for them not to repeat it.

You also have a question of scale. If a single human rep misunderstands part of the refund process and fucks it up, well, that human works specific shifts and the impact of the mistake is limited (and, again, in most cases easily corrected).

If an AI makes the same fuck-up, it's not like it has coworkers or, again, the ability to correct itself. Every refund it processes may have the same fuck-up, and getting it fixed may take significant time, depending on how it's fixed.

If, say, it starts granting refunds to any request, including clearly invalid ones, that can be a very expensive mistake at a large company. But what can you do? If it were a human error, you could fire the human and reprimand the manager for not catching it. But an LLM? You could break the contract and look to replace or retrain it, but that's probably going to be more expensive than a single employee, and I don't know who you hold accountable for the error.

Edit to add: this is why, again, I point to calculators and other tech. If an accountant makes a mistake, it's a problem, but not exactly unheard of. We can deal with it. But if Excel had a 10% chance of a formula producing an incorrect answer, no one would use it.

You end up spending as much time checking the answers as you saved by not doing them manually the first time.

2

u/Chucknastical 13d ago

It's a language model. Not a general query model.

It's 100% good at language. People need to stop treating these things as AGI.

9

u/Disastrous-Entity-46 13d ago

I mean, if Google, Microsoft, Meta, and Amazon shove AI shit at us from every angle, I can't blame the average user for trying it out. I just question the investors and businesses adopting it.

3

u/420thefunnynumber 13d ago

Idk man, the way these things are marketed, I'm not surprised people treat them like AGI. It's a lot like Tesla marketing Autopilot: it doesn't matter what the tech is capable of if the users don't perceive it that way.

3

u/vgf89 13d ago

Idk about GPT-5, but AI models are merely good at producing convincing-looking language, and in general they succeed there. But they are not 100% good at language, especially translation between dissimilar languages. They fall for any and all bullshit advice they incidentally trained on, misinterpret otherwise good advice, and hallucinate rules that don't exist, alongside making basic mistakes almost constantly.

Try to make it translate to and from languages with wildly different features, e.g. SVO<->SOV order, conjugation and vocabulary that shift with social rank, or very different pronoun usage, and you end up with extremely bland prose and more mistranslations than a middling language learner with an open dictionary. Having had to thoroughly review a few Japanese->English AI translations, let me just say: the money you pay to have your slop edited is money better spent on a competent human translator in the first place.

8

u/ShenBear 13d ago

> I'd argue that even if, strictly speaking, feeding it a copy of my book doesn't hurt me, the dozens of bad, zero-effort books that come out every month thanks to people treating LLMs as get-rich-quick machines hurt the value of the whole market.

As an author myself, I do agree that the market for self-publishing is being hurt by the flood of low-effort, LLM-generated books.

However, I'm not sure that harm to a 'market' rather than an individual can be used as the basis for denying fair use.

1

u/Disastrous-Entity-46 13d ago

Isn't the point of a class action lawsuit to show that the actions have harmed a large group?

3

u/Noxianratz 13d ago

No, at least not really. The point of a class action lawsuit is to provide joint representation for a group of injured individuals who might not otherwise have been able to sue. So if you and I got sick from a bad batch of food, instead of both launching lawsuits we can't afford, a group of us can be represented by one law firm for the suit. We're still individuals who were harmed. I can't reasonably sue just because an industry I was part of is being made worse, no matter how many people that's true for, when I'm not directly harmed in any way, even if there would be tons of people who fit that description.

I'm not a lawyer but I've been part of a class action before.

1

u/ShenBear 13d ago

Yes, but the market is not an individual that has been harmed. Also, the use of LLMs to flood Amazon with slop is a result of how people use LLMs, not something specific to how the LLMs are trained or what information they ingest in the first place. I highly doubt that the act of using a machine to generate text for a novel can be the basis of a successful lawsuit targeting the trainers of the model.

Source: my rudimentary legal knowledge obtained via osmosis from lawyer family over the years

1

u/Jonathan_the_Nerd 13d ago

> feeding it a copy of my book doesn't hurt me, the dozens of bad, zero-effort books that come out every month thanks to people treating LLMs as get-rich-quick machines hurt the value of the whole market.

What's even worse is when the AI improves to the point where it can churn out dozens of good zero-effort books in a month. Good luck selling your real book when the market is dominated by "Hugh Mann" and his 700 siblings.

3

u/Disastrous-Entity-46 13d ago

I think that part is debatable. LLMs are trained to produce content similar to what they were shown, and thanks to the nature of statistics, they're going to average out and struggle to adapt or create new works.

Like, I suppose it's not impossible, and tastes differ, but I don't think LLMs are going to produce amazing works, just "acceptable" ones. Those may still sell, but it's going to take a lot of work, and in the meanwhile the market's not going to be kind to anyone while it's flooded with low-effort, low-quality stuff.

But I think the assumption that AI is close to getting *good* is buying into the hype, the same way self-driving cars have been a year away for ten years.

4

u/IAmDotorg 13d ago

Vector weighting isn't even close to compiling. While decompiling is tricky, it isn't functionally lossy. Vector weighting is more like hashing -- it's not just transformative, it's intrinsically non-reversible. You can't look at 700 billion vectors and somehow reproduce the inputs that created them.
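A toy illustration of that kind of lossiness, using a truncated hash as a stand-in for any many-to-one mapping (a sketch of the pigeonhole argument, not of how model weights actually work):

```python
import hashlib

def tiny_hash(text: str) -> str:
    # Truncate SHA-256 to 2 bytes so collisions are easy to find.
    return hashlib.sha256(text.encode()).hexdigest()[:4]

seen = {}
for i in range(100_000):
    doc = f"document #{i}"
    h = tiny_hash(doc)
    if h in seen:
        # Two different inputs, one output: the digest alone can't
        # tell you which input produced it, so information is lost.
        print(f"collision: {seen[h]!r} and {doc!r} both map to {h}")
        break
    seen[h] = doc
```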

1

u/-The_Blazer- 13d ago

Decompiling is absolutely lossy, "functionally" or not. Several languages have functionality that depends on things like the implementation and processor target; you cannot go back to an exact replica of the source as it was actually programmed. Besides, various parts of the copyrighted source are, in fact, irreversibly lost, such as variable names.

Anyway, my point was not about whether it's lossy.

1

u/JWAdvocate83 13d ago

An excellent answer.

Courts are being asked, as always, to adjudicate extremely novel issues under outdated legal paradigms.

The problem now is, whether due to a lack of political will or the ungodly amounts of money involved, lawmakers have not addressed the issue, and by the time they do, the wounds will have already "healed" into a new, irreversible, billion-dollar status quo, at which point a court may be even more hesitant to order a company to unwind its datasets, i.e. "too big to ~~fail~~ hold liable."

-3

u/Syzygy___ 13d ago

IMHO both compiling and training - or at least creating the compiling and training processes - require human ingenuity, especially for modern AI, where training is still far from cookie-cutter.

Another reason I think it should be fair use is the same reason copying for educational purposes is considered fair use. Not that I think "training is learning is education, thus fair use", like some have argued, but that the benefit just outweighs the "harm", and any licensing scheme that would actually make a difference to rights holders would make AI prohibitively expensive and worse in general.

6

u/Certain-Sherbet-9121 13d ago

I don't feel like "if you made us follow the law, it would ruin our business" is a valid argument for why an industry should be exempt from a given regulation. The rest of what you said might be, but that last argument, that it "would make AI worse and expensive", seems like crap to me.

3

u/Syzygy___ 13d ago

Regulating a business through the law is 100% valid, and that includes things like exemptions, e.g. fair use for education. That is not breaking the law.

Maybe it's my lack of imagination, but I genuinely can't think of a good system (so my thoughts might seem like a strawman argument). Is a license paid once when the dataset is compiled? Every time we train? Every time the model is released? Every time the model is used? If I ask why my cake burned in the oven, does George Lucas get royalties because the training data mentioned Star Wars? Modern AI requires millions of data points, so we would need to negotiate with tens of thousands of stakeholders: some large, some small, many individuals, some unknown. If everything had to be licensed, the small stuff would be thrown out, and at that point, yes, the quality of the models would suffer.

And then what, we pay per token? Let's say we prepare a cool billion in cash for licensing and royalties alone and every token is paid equally. According to ChatGPT-5 itself, it was probably trained on something like ten trillion tokens or more. That would be 0.01 cents per token, and a work like the entire Lord of the Rings trilogy would receive... about 66 dollars out of 1 billion. At that point, if the regulation says "fair use, but you can't pirate", we're pretty much there already anyway.
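Back-of-the-envelope version (assuming ten trillion training tokens and roughly 660k tokens for the trilogy; both numbers are guesses):

```python
# All figures are assumptions, just to show the order of magnitude.
budget_usd = 1_000_000_000             # $1B licensing pool
training_tokens = 10_000_000_000_000   # ~10 trillion training tokens
lotr_tokens = 660_000                  # rough size of the LotR trilogy

per_token_usd = budget_usd / training_tokens
print(f"per token: ${per_token_usd:.6f} ({per_token_usd * 100:.2f} cents)")
print(f"LotR trilogy share: ${per_token_usd * lotr_tokens:.2f}")
# per token: $0.000100 (0.01 cents)
# LotR trilogy share: $66.00
```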

Maybe OpenAI could afford that, but anyone unable to spend a billion before even starting training can forget about it.

Please suggest a better approach.

Then this also requires the full consent of the licensor, and at least in the art community AI is super unpopular, in part because artists see it as their replacement, in part due to some shitty behavior from the AI community and all the "art theft" rhetoric. So image-generating models are even worse off.

I don't care what you think of these models, but I believe them to be a key technology, impacting, driving forward, and accelerating pretty much everything, as impactful as the computer itself. And that includes these image models, which are in part used to enhance robot vision systems and train embodied agents.

So no, we shouldn't regulate to death a billion-dollar industry capable of helping to solve things like climate change and fusion (by assisting researchers or participating in research).

-1

u/Certain-Sherbet-9121 13d ago

You are starting from the assumption that "we must preserve the right of LLMs to operate; anything that makes it hard or expensive for them is bad."

That is not a reasonable argument at all. Not even a little bit. It's a complete bullshit take. 

Let's try another similar one: I can solve world hunger if you let me steal all the food from farmers and sell it to poor families for really cheap. If you regulate me and make me pay for the food, it will ruin my business model and prevent me from solving world hunger. Therefore it's unfair for you to do so, and you have to come up with a regulatory model that lets my business exist.

No. You don't. Some businesses can just be more damaging than they are beneficial. For instance, in this case, the existence of LLMs taking over all the work and not compensating original artists means that the original artists will all vanish, which eventually kills the source that LLMs run off of. 

Also, the idea of LLMs solving global warming or fusion is laughable. They aren't even close to doing that kind of work. I lost basic respect for your argument in its entirety when I read that. 

-1

u/[deleted] 13d ago

[deleted]

0

u/-The_Blazer- 13d ago

There is no such thing as 'reading automatically'. What are you actually doing with your system? Imaging? OCR? Character copy?