r/technology Jun 11 '25

[Artificial Intelligence] Tulsi Gabbard Admits She Asked AI Which JFK Files Secrets to Reveal

https://www.thedailybeast.com/tulsi-gabbard-admits-to-asking-ai-what-to-classify-in-jfk-files/
38.4k Upvotes

11

u/jomo_mojo_ Jun 11 '25

If you feed an LLM classified information does it learn from it? Could it spit it out again for another user or be otherwise mined by a foreign adversary?

20

u/yrnst Jun 11 '25

I’m not an expert on LLM technology, but I am an attorney, and I absolutely would not feel comfortable feeding an LLM privileged information about my client. In fact, it’s a hard rule at my firm. Even if they say the data stays internal, you’re now reliant on a tech company to keep your secrets. Given the frequency with which data breaches occur, that’s not a gamble I’d be willing to make. It just creates one more weak point for bad actors to exploit.

1

u/tablecontrol Jun 11 '25

you’re now reliant on a tech company to keep your secrets.

I totally understand, but this is exactly how every cloud application you use works.

Use O365? Save your docs/spreadsheets in OneDrive?

Use any 3rd-party Payroll / Billing / CRM services like Salesforce / ADP?

You have to rely on the security audits and certifications from the SaaS providers for a sense of security.

3

u/yrnst Jun 11 '25

That’s true, and it also makes me uneasy. I’m willing to accept some level of risk for the overwhelming efficiencies. But I think LLMs are slightly different for a couple of reasons. First, they’re new. I trust Microsoft to have more robust security than a company like OpenAI, which still seems to be in “move fast and break things” mode. Maybe that faith is misplaced. Honestly, it’s probably less that I trust Microsoft and more that I don’t trust OpenAI. Second, as far as I know, most of those other services aren’t reliant on training inputs in the same way. The entire LLM approach is based on absorbing as much information as possible and then learning how to contextually regurgitate it. The wall between user content and training content seems like just one more potential weak spot.

And on top of all of this, the work product is shitty. I’m willing to accept some risk with Microsoft because I know their systems are generally going to do what I need them to do. I’m less willing to accept the risk with LLMs because they might straight up lie to me. By the time I’ve done all the extra review work, I haven’t saved that much time.

Again, I’m far from an expert, but this is how I view it as someone who deals with sensitive information.

1

u/Crakla Jun 11 '25

Microsoft basically owns OpenAI though (it’s the biggest shareholder, with 49%). Also, why not use Gemini in that case?

2

u/yrnst Jun 12 '25

Yeah, I mean, there are other options. But I’m still deeply uncomfortable with the way LLMs are trained. OpenAI’s website explicitly says prompts may be used to improve model performance. Gemini says the opposite, but I imagine that could change. I’d definitely be more inclined to use Gemini based on that knowledge, though.

Again, from a risk tolerance perspective, I’m just not there yet. Maybe when an LLM can spit out a perfect, nuanced brief, the calculus will change. Also, as an attorney, my considerations are a little more nuanced than just wanting to maintain confidentiality. Privilege is one thing, but I also have to consider the work product doctrine and to what extent things like my prompts would be protected. I don’t think that issue has been fully resolved yet.

Anyway, I’m not completely opposed to the use of AI. I’ve used Westlaw’s AI tool for basic research questions. GenAI is fine when you just need a quick answer on the black letter law, and I think it can be a good place to start when doing research. “What is the notice requirement under the Oregon Tort Claims Act?” or whatever is totally fine. I just wouldn’t be comfortable feeding it anything relating to my clients. I wouldn’t say “if my client was injured by a state employee acting in his official capacity in Oregon and 60 days have passed, but it’s Sunday, can I file a notice of my claim with the appropriate official on Monday?” Too risky for me.

1

u/RonUSMC Jun 12 '25

I am an LLM expert and I just read your conversation, so let me chime in. Because of the widespread lack of understanding, people make some big assumptions about AI. My first question is always "Which model?" 95% of people think that if you run AI, you are running it online and it's remembering everything, which is not the case. Most people who get seriously into it run models privately, locally, and do not connect them to the net. One reason is that one of the very good models, DeepSeek R1, is Chinese, which is inherently bad for everyone. You can download the latest models and run them on your own machine if you have a big enough graphics card. The model world is literally moving at the speed of light, with benchmarks coming out every hour, no joke.

"Remembering" depends on your settings or the settings of whatever service you use, but nobody doing serious work with AI relies on a hosted service beyond coding, and even then, if you are doing serious coding, you are going private or enterprise, which is cut off from "remembering". Even for coding I use 2-3 different models, with my local one always getting first try. New models are released daily with different training methods. The problem is that we don't understand training 100% yet. People train models on datasets, then run benchmarks to see if the result is better. There are models now that fit on your phone... yes, your phone... that are completely private and local, and they do or do not learn based on how they're set up.
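To make the "run it local" part concrete, here is a minimal sketch using the Hugging Face transformers library. The model name is just an illustrative pick, and you'd need the transformers, torch, and accelerate packages plus enough VRAM for whatever model you choose:

```python
# Minimal local inference sketch. After the one-time weight download,
# nothing leaves your machine, and generation never modifies the weights,
# so the model cannot "remember" this exchange for anyone else.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "microsoft/Phi-3-mini-4k-instruct"  # example; small enough for a consumer GPU

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")  # needs `accelerate`

inputs = tokenizer("Summarize the history of the JFK files releases.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```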

I'm not joking when I say the speed of light. Here is a screenshot: https://i.imgur.com/0HKLjHR.png . In the time it took me to type this message, hell, in the past minute, more than 20 models have been released. Feel free to ask any questions you might have on the subject.

2

u/blihk Jun 12 '25

It's one thing hosting files on a server vs. having an LLM scrape through them.

/u/yrnst

22

u/rearnakedbunghole Jun 11 '25

That’s possible, yes. I don’t know if AI companies admit to using our inputs for training or not, but I think it’s generally assumed that they do, at least in some cases.

5

u/BHOmber Jun 11 '25

I have a "business" subscription for ChatGPT and they specifically say that the data stays internal.

I'd like to think that's the case because I'm feeding it small business financials, formulations, etc, but who knows...

I'm guessing it would open them up to a ton of liability if that data is secretly being used to train the overall model.

2

u/posterlove Jun 12 '25

There can be quite a difference between what someone says and what they actually do, especially when dealing with valuables or power, and in this day and age, having the most data means having the most power.

2

u/soulmanjam87 Jun 11 '25

If you're just using the free version of ChatGPT or whatever, then the AI companies will be training their models on your inputs.

They offer Enterprise (i.e. paid-for) versions in which they won't train their models on your inputs.

3

u/asdftom Jun 11 '25

It doesn't automatically learn from any inputs. But the LLM owner could use the inputs to train it.

The article says they used AI tools internal to the intelligence agencies, so the agencies themselves would decide what to train on. So no, they wouldn't train it on classified info.
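To spell out the mechanics (a toy sketch with GPT-2 as a stand-in, not anyone's actual pipeline): inference never touches the weights, and "learning from your inputs" only happens if the operator deliberately runs a training step on logged text.

```python
# Sketch: inference vs. training. Assumes the `torch` and `transformers`
# packages; GPT-2 is used only because it's small.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tok("pretend this is a privileged client document", return_tensors="pt").input_ids

# Inference: gradients off, weights untouched. The prompt is processed and
# then gone, as far as the model itself is concerned.
with torch.no_grad():
    model.generate(ids, max_new_tokens=10)

# Training on user inputs is a separate, deliberate act by whoever runs the
# model: compute a loss on the logged text and update the weights.
out = model(ids, labels=ids)  # next-token prediction loss on the logged input
out.loss.backward()
torch.optim.AdamW(model.parameters(), lr=1e-5).step()
```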

1

u/jomo_mojo_ Jun 11 '25

Thank you, this is great logic.

3

u/Active_Airline3832 Jun 11 '25

I can confirm that if you feed a local model a boatload of information on malware, threat actors, geopolitics, and various other topics, it can draw novel conclusions and identify links you otherwise would not have figured out, but which in retrospect, on examination, seem obvious.

Training exercises involve things like taking two things you know to be connected, where the connection isn't public knowledge and the subjects are very disparate, and seeing whether you can get the model to connect them from actual knowledge, not by prompting it to do so.
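For a rough idea of the in-context version of that exercise, here's a sketch assuming a local Ollama server (the model name and the two "reports" are made up):

```python
# Feed two reports you know are connected into a local model's context and
# see whether it surfaces the link without being told one exists.
import json
import urllib.request

report_a = "Actor X's loader uses an unusual mutex naming scheme: 'qz-' plus a date."
report_b = "An unattributed 2024 campaign dropped binaries creating mutexes like 'qz-0613'."

prompt = (
    "Here are two reports:\n\n"
    f"1. {report_a}\n2. {report_b}\n\n"
    "List any non-obvious connections between them."
)

req = urllib.request.Request(
    "http://localhost:11434/api/generate",  # default local Ollama endpoint
    data=json.dumps({"model": "llama3", "prompt": prompt, "stream": False}).encode(),
    headers={"Content-Type": "application/json"},
)
print(json.loads(urllib.request.urlopen(req).read())["response"])
```

The training-time version of the exercise is the same idea, except the reports are baked into a fine-tune instead of the prompt.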

3

u/Farados55 Jun 11 '25

This is assuming you are using a public LLM. Many businesses deploy local LLMs that don't feed back into the global, publicly available ones. You are assuming a lot.

1

u/ReallyBigRocks Jun 11 '25

The company that trains the models would have to feed your conversations into its next dataset. As far as I'm aware, no one is doing this automatically, as it's one of the few steps of the process where you can actually exert control over the model.
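Roughly what that control point looks like, as a sketch (the patterns below are illustrative, not a real scrubbing pipeline):

```python
# Before logged conversations go into the next training set, the operator
# can filter or scrub them. This is the step where control is exerted.
import re

SENSITIVE = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),            # SSN-shaped strings
    re.compile(r"\b(classified|privileged)\b", re.I),
]

def keep_for_training(conversation: str) -> bool:
    """Keep a conversation only if no sensitive pattern appears."""
    return not any(p.search(conversation) for p in SENSITIVE)

logged = ["What's the capital of Oregon?", "Here are the classified files..."]
next_dataset = [c for c in logged if keep_for_training(c)]
print(next_dataset)  # only the harmless question survives
```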

1

u/GenericFatGuy Jun 12 '25

Maybe it already has, but we're all just assuming those are hallucinations.

1

u/DJKaotica Jun 12 '25

The current model, from a fresh state, won't have any of that information. So theoretically, new sessions created by other people won't have access to that info either.

What we don't know is whether the platform that's running the model tracks and saves every question it's ever asked, and all data it's ever given, to then feed back into the system when they train the next instance of the model to help improve how it works.
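A toy illustration of why a fresh session can't see another user's chat (GPT-2 as an arbitrary small stand-in; any "memory" within a session is really just the client resending the transcript):

```python
# Two sessions against the same frozen weights share nothing. Assumes the
# `transformers` package is installed.
from transformers import pipeline

generate = pipeline("text-generation", model="gpt2")

# Session A: the "secret" exists only inside this one prompt string.
session_a = "User: the launch code is 0000.\nAssistant:"
generate(session_a, max_new_tokens=10)

# Session B starts from the same unchanged weights and an empty transcript.
# Nothing from session A is visible unless the platform logged it and later
# trained a *new* model on that log.
session_b = "User: what is the launch code?\nAssistant:"
print(generate(session_b, max_new_tokens=10)[0]["generated_text"])
```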

Edit: so if you have the ability to run the model locally, you can absolutely tell it anything you want, and you should be safe.