You can’t talk to a business customer today without AI coming up. While most people seem to have embraced the power of public generative AI tools like OpenAI’s ChatGPT or Google’s Gemini, there’s a lot of hesitation when it comes to using generative AI on enterprise content. The one concern that comes up again and again?
Security.
Rightfully so. Public AI tools don’t have to worry about security. They’re gobbling up all the data on the public internet with the motto: “Train first, worry about intellectual property rights later.” Technically, nothing is stopping them from doing that, and their models are fed by scrapers and crawlers that grab everything they can find.
In an enterprise, however, that doesn’t work. Enterprise data is privileged, confidential, and subject to privacy laws. The data cannot be shared with everyone, and AI models must respect that. It means two users must receive different answers to the same question, depending on the data they’re authorized to access.
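The core idea can be sketched in a few lines. This is a minimal, hypothetical illustration (all document names, users, and ACLs are invented for the example): retrieval checks each user's permissions first, so the AI layer can only ground its answer in documents that user is allowed to read.

```python
# Hypothetical sketch: permission-aware retrieval. The same question
# produces different grounding material per user, because each document
# carries an access-control list (ACL) that is checked before anything
# reaches the AI model.

documents = [
    {"text": "Q3 revenue grew 12%.",            "acl": {"cfo", "ceo"}},
    {"text": "Q3 headcount rose by 40 people.", "acl": {"cfo", "ceo", "hr_lead"}},
]

def retrieve(question: str, user: str) -> list[str]:
    """Return only the documents this user is authorized to see."""
    return [d["text"] for d in documents if user in d["acl"]]

# The model can only answer from what retrieval returns:
print(retrieve("How did Q3 go?", "cfo"))      # sees both documents
print(retrieve("How did Q3 go?", "hr_lead"))  # sees headcount only
```

The point is that authorization happens in the retrieval layer, not in the model: the model never sees content the user couldn't open directly.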
The problem is, if your content isn’t well secured and governed in the first place, AI will expose those holes quickly. You may have been able to hide some data behind cryptic file names, but that won’t stop the AI models. Having solid data governance with granular, clean permissions is imperative. Otherwise, it’s “bad security in, bad security out,” to paraphrase Fuechsel’s Law ("garbage in, garbage out").
It also means you need to bring AI tools to your content rather than trying to bring your content to the AI tools. It’s hard enough to secure your content in the first place, and the idea of copying a snapshot into a separate container for AI would obliterate any of that security.
Don’t expect public AI vendors like OpenAI, Google, Anthropic, Meta, or DeepSeek to solve this problem. Enterprise content is a different animal—one they neither understand nor care to understand. None of these vendors has any enterprise DNA. Security is not their concern, and their models aren’t built with the assumption that data access should vary by user.
To illustrate this point, let me remind you of what happened with enterprise search. Web search, which we all use many times a day, is based on an index created by crawlers that scour the internet to deliver the best content match for your keywords—the same results for everyone. That’s what Google does in simplest terms. But in the enterprise, that approach doesn’t work. Enter enterprise search.
About 20 years ago, Google—the heavyweight search champion—entered the enterprise search space with a bright yellow, rack-mounted Google Search Appliance, drawing a lot of attention with its promise that managing content wasn’t necessary: “Wherever it is, you can find it with Google.” Or something like that.
It sounded great—except it didn’t work. Google eventually discontinued the product after a decade of trying. Interestingly, other major players in enterprise search met similar fates. There was FAST, which Microsoft acquired in 2008—only to discover a year later that FAST had been cooking the books. And then there was Autonomy, which HP acquired in 2011, only to eventually sue the CEO and CFO for—you guessed it—cooking the books. The Hollywood-worthy Autonomy saga ended with the CFO in jail and the CEO dying in a freak boating accident. (I described that story in more detail last year in “Mike Lynch, Autonomy, and Incredible Coincidences”.)
Today, search is provided by the companies that own the data. Enterprise search is hard, and usually, only the company that built the repository has a chance of doing it well. On the web, Google finds content that wants to be found—literally. Millions of companies spend billions of dollars each year on SEO to make their content easily discoverable. And there’s no security to worry about.
Enterprise content is different. It’s not optimized for search engines, and security is not optional. This is hard to get right. Eventually, the open-source Apache Lucene solved the problem well enough, and that’s what many enterprise applications use today. Still, you rarely hear anyone say, “Wow, this search is amazing”—because it doesn’t match Google’s web search, which sets the expectations bar.
Now, let’s come back to AI. The vector databases at the heart of enterprise AI models must respect data security, just like search indexes do. That’s incredibly difficult for anyone other than the companies that hold the data. Only they understand the data structures, the users, and their permissions. For an external application, it’s possible in principle, but extremely hard to get right in practice. If you don’t believe me, think back to enterprise search.
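What “respecting security” means for a vector index can be sketched without any particular database. In this hypothetical example (the index contents, users, and two-dimensional vectors are all invented), permissions are enforced *before* similarity ranking, so an unauthorized document can never surface, no matter how well it matches the query:

```python
import math

# Hypothetical sketch of ACL-aware vector search: filter by permissions
# first (security), then rank by similarity (relevance). Real vector
# databases implement this as metadata pre-filtering.

index = [
    {"text": "salary bands 2025", "vec": [0.9, 0.1], "acl": {"hr_lead"}},
    {"text": "office move plan",  "vec": [0.2, 0.8], "acl": {"hr_lead", "intern"}},
]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def search(query_vec, user, k=3):
    # Step 1: drop everything the user may not see.
    allowed = [d for d in index if user in d["acl"]]
    # Step 2: rank only the permitted documents.
    allowed.sort(key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return [d["text"] for d in allowed[:k]]

print(search([1.0, 0.0], "intern"))  # salary data is invisible to the intern
```

Filtering after ranking would be subtly wrong: a top-k result set computed over all documents and then trimmed can leak, through gaps in the results, that something sensitive matched.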
AI in the context of enterprise data will be extremely valuable, with the potential to dramatically boost productivity—whether it’s through assistants, agents, or whatever comes next. But in an enterprise, the first rule will always be: respect the data’s security.
And that makes it hard.