Open-Source Vetting for National Security: AI-Driven OSINT and Insider Threat Detection

In an era of escalating insider threats and state-sponsored tech espionage, U.S. national security agencies are turning to open-source intelligence (OSINT) and unclassified data to vet individuals and companies for hidden risks. Open-source vetting for national security leverages AI, machine learning, graph databases, and data fusion to connect the dots across public information, enhancing risk assessments and spotting insider threats before they wreak havoc. From cutting-edge AI tools that flag anomalies in a contractor’s background to graph-powered analytics exposing covert networks, these innovations are reshaping how the Department of Defense (DoD) and Intelligence Community (IC) safeguard America’s secrets. This in-depth analysis explores how open-source vetting works in practice—and why recent insider espionage cases in 2024 and 2025 underscore its urgency.

Introduction: Insider Threats to National Security Hiding in Plain Sight

Open-source vetting for national security has become a mission-critical practice for the DoD and IC as they confront a surge in insider threats and technology theft. Not all threats wear enemy uniforms; some carry employee badges or university IDs. A defense engineer slipping classified designs onto a thumb drive, a researcher quietly emailing lab data overseas, a supplier with hidden ties to an adversary nation—these scenarios have moved from spy fiction to everyday reality. U.S. security officials warn that foreign governments are actively targeting businesses, academic institutions, and researchers to steal cutting-edge technology. Traditional clearance checks and network firewalls are no longer enough. Today’s adversaries exploit our open society, so vetting individuals and companies via open-source intelligence has emerged as a vital front-line defense.

Leveraging unclassified, publicly available information for vetting might once have seemed optional—now it’s essential. The good news: mountains of data that were previously untapped can be harnessed to expose red flags. The challenge: making sense of that data deluge. This is where AI, machine learning (ML), graph databases, and OSINT fusion come into play. By intelligently sifting and linking data from disparate open sources, these technologies can uncover hidden patterns that humans might miss—from undisclosed foreign affiliations to suspicious financial dealings—all without accessing a single classified database. The result is faster, more proactive risk assessment and insider threat detection, augmenting traditional security measures.

The Rise of Open-Source Vetting in National Security

Vetting in national security typically brings to mind exhaustive background investigations, security clearance interviews, and classified record checks. However, a paradigm shift is underway: agencies are increasingly looking outward to open-source and unclassified information to flag risks earlier and cast a wider net. Open-source vetting involves scouring everything from news articles and academic publications to social media, corporate records, and even the dark web for insights into a person or organization’s trustworthiness. This approach has several advantages. First, it’s unclassified and shareable, enabling collaboration across agencies and with allies. Second, it can reveal context that a classified check might overlook: patterns of behavior visible in public but not in government files.

Open-source vetting is especially pertinent to supply chain security and national security vendor due diligence. Modern defense projects rely on global networks of contractors and suppliers, any of which could be compromised. A 2025 supply chain risk analysis highlighted the need for stricter supplier vetting and transparency, noting new guidance on screening vendors and sharing threat reports to flag hidden dangers. By mining open sources like corporate registries, trade records, and litigation filings, vetting teams can discover if a “U.S.” subcontractor is secretly owned by a foreign rival, or if a parts supplier has past links to sanctioned entities. Such OSINT-driven vetting has proven its worth: when North Korean IT workers managed to get hired remotely at U.S. companies using fake identities, it was an FBI investigation built on open-source leads that ultimately exposed a scheme of over 300 illicit hires funneling earnings to Pyongyang. The lesson is clear—had those companies performed deeper open-source due diligence on their remote freelancers, they might have spotted anomalies before onboarding an adversary’s agents.

Importantly, open-source vetting isn’t about replacing classified intelligence for national security customers; it’s about augmenting it. Consider the background investigation process for security clearances. Traditionally, a clearance is granted and revisited only every five to ten years, with limited checks in between. Now, under new reforms often termed Continuous Evaluation or Continuous Vetting, agencies are moving toward ongoing monitoring of employees’ backgrounds via automated data checks. This includes pulling data from criminal records, credit reports, and open sources like news or social media. The idea is to get near-real-time alerts if a trusted insider shows signs of increased risk—for example, an arrest, an extreme debt load, or even online postings advocating violence. However, sorting through such volumes of data is impractical manually. That is why experts argue for AI-driven analytics to be integrated directly into clearance vetting. As former CIA analyst Brian Drake notes, the government should weave open sources into its continuous vetting programs and employ AI tools for data analysis to catch problems early. In Drake’s words, adopting AI for background checks and insider threat detection would “improve security and reduce the risk of future leaks.”

AI, Machine Learning, and Data Fusion: New Open-Source Vetting Tools to Expose Hidden National Security Risks

Artificial intelligence and machine learning have become game-changers in the effort to mine OSINT for vetting. Modern AI algorithms can rapidly sift through vast datasets—news archives, forum posts, leaked data breaches, academic literature—to identify patterns or anomalies that signal risk. For example, machine learning models can be trained to flag language in social media posts that suggests extremist ideology or disgruntlement, or to spot inconsistencies between an individual’s stated resume and their digital footprint. Natural language processing (NLP) can scan thousands of text documents about a company to detect mentions of fraud, sanctions, or legal trouble. Rather than relying on an analyst to manually read everything, AI surfaces the “unknown unknowns.” This helps security professionals focus their investigations on the most relevant risk indicators.
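
To make the idea concrete, here is a deliberately minimal sketch of that kind of document triage in Python: scanning a batch of open-source texts for risk-related terms and returning each hit with surrounding context for an analyst to review. A real pipeline would use trained NLP models, entity linking, and multilingual support; the term list and sample documents below are invented for illustration.

```python
import re

# Illustrative risk lexicon; a production system would rely on trained NLP
# models and curated watchlists rather than a fixed keyword list.
RISK_TERMS = ["fraud", "sanction", "indictment", "money laundering",
              "export violation", "debarment"]

def flag_documents(documents):
    """Return (doc_id, term, context) hits for documents mentioning risk terms.

    `documents` is an iterable of (doc_id, text) pairs from any open source:
    news archives, court records, forum posts, and so on.
    """
    hits = []
    pattern = re.compile("|".join(re.escape(t) for t in RISK_TERMS), re.IGNORECASE)
    for doc_id, text in documents:
        for match in pattern.finditer(text):
            # Capture a small context window so an analyst can review the hit quickly.
            start, end = max(0, match.start() - 60), match.end() + 60
            hits.append((doc_id, match.group(0).lower(), text[start:end]))
    return hits

if __name__ == "__main__":
    docs = [("news-001", "The supplier settled an export violation case in 2019."),
            ("news-002", "Quarterly earnings rose on strong satellite sales.")]
    for hit in flag_documents(docs):
        print(hit)
```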

One powerful application is using ML to perform entity resolution and risk scoring. Suppose an individual is applying for access to a DoD research program. An AI-enabled vetting platform can automatically pull in unclassified data about that person—news articles, court records, social media profiles, even GitHub contributions—and fuse it into a single profile. If that name appears in a leaked list of members of a foreign military’s talent recruitment program, or is listed as an officer of a shell company abroad, those would be enormous red flags. AI can match those data points even if name spellings differ or aliases are used, tasks that would stump simple keyword searches. The result is a composite risk score informing whether to grant access or investigate further.
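
A minimal sketch of that workflow might look like the following. It normalizes name variants so that transliterations and reversed token order still match, then fuses matching records into a weighted composite score. The indicator names and weights are purely illustrative; an operational system would use far richer entity resolution (dates of birth, addresses, phonetic matching) and validated scoring models.

```python
from difflib import SequenceMatcher

def normalize(name):
    # Lowercase, strip periods, and sort tokens so "Zhang Jianwei"
    # and "Jianwei Zhang" compare as the same string.
    return " ".join(sorted(name.lower().replace(".", "").split()))

def same_entity(name_a, name_b, threshold=0.85):
    """Fuzzy match on normalized names; catches spelling variants,
    though initials and true aliases need much richer handling."""
    return SequenceMatcher(None, normalize(name_a), normalize(name_b)).ratio() >= threshold

# Illustrative indicator weights -- invented for this example.
INDICATOR_WEIGHTS = {
    "talent_program_list": 0.5,    # named on a foreign talent-recruitment roster
    "shell_company_officer": 0.3,  # officer of an offshore shell company
    "sanctioned_coauthor": 0.2,    # co-publication with a sanctioned entity
}

def risk_score(subject_name, source_records):
    """Fuse multi-source records about one subject into a composite score."""
    score, evidence = 0.0, []
    for record in source_records:
        if same_entity(subject_name, record["name"]):
            score += INDICATOR_WEIGHTS.get(record["indicator"], 0.0)
            evidence.append(record)
    return min(score, 1.0), evidence

score, why = risk_score("Jianwei Zhang", [
    {"name": "Zhang Jianwei", "indicator": "shell_company_officer"},
    {"name": "Jianwei Zhang", "indicator": "talent_program_list"},
])
print(score, why)  # 0.8, plus the two matching records as evidence
```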

Data fusion is a term often used to describe this integration of multi-source data into a coherent intelligence picture. Advanced vetting involves finding unique data from unindexed information and forging custom datasets from the deep web, while staying in unclassified channels. It’s not just about scraping the surface web; it’s about pulling together every thread of information—public records, social media, academic collaborations, patent filings, import/export logs—into one analysis. The DoD’s innovation units have recognized that fusing open-source data can fill intelligence gaps and even rival insights from classified reports. When done right, open-source data fusion can illuminate relationships and activities that adversaries assumed would go unnoticed in the noise of the internet.

Graph Databases and Network Analytics: Connecting the Dots

Perhaps the most transformative technology in uncovering hidden connections is the graph database. Unlike traditional relational databases that struggle to correlate complex relationships, graph databases excel at mapping networks of associations, a critical need in vetting and insider threat detection. In a graph data model, people, organizations, addresses, IP addresses, financial transactions, and more can all be nodes connected by relationship “edges.” This approach lets analysts and AI algorithms traverse an entire network of links to find indirect or non-obvious connections. Why does this matter for security? Because malicious insiders and covert collaborators often leave a trail of connections in public data. You just need to connect the dots.

Imagine an employee at a defense contractor who is secretly working with a foreign agent. There may be no obvious record in his HR file, but perhaps an open-source search finds that the employee’s email appears as a registered contact for a shell company in another country, or a family member is Facebook friends with a known foreign intelligence operative. A graph-driven OSINT platform can take disparate pieces of data and map such links visually and mathematically. This is precisely how financial crime investigators catch money launderers: by charting out company ownerships and communication links in a graph to see hidden relationships. The same principle helps insider threat analysts spot collusion. Graph analytics are key to identifying undisclosed relationships and assigning risk ratings in real time. Traditional SQL databases simply cannot handle these multi-hop link analyses efficiently.
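
The sketch below shows the core operation in miniature, using the open-source networkx library as a stand-in for a dedicated graph database. Every node and edge is hypothetical; the point is the multi-hop traversal, which surfaces the indirect employee-to-operative path that a flat keyword search or a join-heavy SQL query would likely miss.

```python
import networkx as nx  # third-party: pip install networkx

# Build a small graph from illustrative open-source records; all node and
# edge names here are hypothetical, not drawn from any real case.
G = nx.Graph()
G.add_edge("employee:j_doe", "email:jdoe@example.com", rel="uses")
G.add_edge("email:jdoe@example.com", "company:offshore_holdco", rel="registered_contact")
G.add_edge("company:offshore_holdco", "person:known_operative", rel="director")
G.add_edge("employee:j_doe", "person:spouse", rel="family")

# Multi-hop link analysis: find any path (up to 4 hops) between a vetted
# insider and a watchlisted entity.
target = "person:known_operative"
for path in nx.all_simple_paths(G, "employee:j_doe", target, cutoff=4):
    print(" -> ".join(path))
```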

In the national security realm, graph-based vetting might reveal that a prospective contractor has multiple ties to entities in a foreign country’s defense supply chain, or that an applicant for a sensitive research role co-authored papers with scientists who later became subjects of an FBI inquiry. Open-source vetting with graph analytics allows these clues to surface quickly, before damage to national security is done. Open data is extremely effective for identifying connections between people and organizations, which is often the key to exposing insider risks. In fact, those connections usually involve an insider linked to an external actor, which OSINT can often pinpoint during screening. By using software to automate link analysis, what used to be a needle-in-haystack problem becomes a solvable puzzle.

Case Studies: Espionage and Insider Betrayals—Then and Now

Open-source vetting techniques have gained prominence because of hard lessons learned from past espionage and insider threat cases. A look at both recent incidents and historical examples illustrates the scope of the challenge, and how many red flags were hiding in public view.

Historical Vulnerabilities

State-sponsored tech theft is not new. During the Cold War, the Soviet Union ran an aggressive campaign to obtain Western dual-use technologies to fuel its military. Moscow’s spies worked assiduously to steal American high-tech components—from microchips to aerospace designs—knowing their weapons programs depended on it. In the 1980s, U.S. counterintelligence launched a famous sting codenamed “Operation Kvadrat” to feed the Kremlin sabotaged technology, precisely because Soviet agents were so relentlessly trying to acquire U.S. tech.

Fast forward to the late 2000s and 2010s, and we see a wave of cases involving China’s intelligence apparatus targeting U.S. industries and research institutions. In 2010, for instance, a Chinese-born engineer named Sixing “Steve” Liu was caught stealing thousands of files from L-3 Communications, a New Jersey defense contractor. The files contained detailed designs for missile guidance systems, rockets, and UAVs—which Liu had smuggled out in hopes of securing a job in China. He even gave presentations about the stolen technology at Chinese universities and defense conferences, all of which could have been discovered by a diligent open-source inquiry into his outside activities. Liu’s conviction under the Economic Espionage Act in 2012 was a wake-up call that defense contractors were prime targets for foreign espionage.

Around the same time, the case of Yanjun Xu underscored how far adversaries would go. Xu was a deputy division director in China’s Ministry of State Security (MSS), essentially a professional spy, who spent years attempting to steal advanced aerospace know-how from U.S. companies. He specifically focused on GE Aviation, recruiting insiders and posing as a headhunter to lure engineers to China. In an unprecedented counterintelligence operation, the FBI lured Xu to Belgium in 2018 for a fake tech conference, arrested him, and extradited him to the U.S. He was convicted of economic espionage and attempting to steal trade secrets—namely, GE’s composite aircraft fan blade designs—and in 2022 was sentenced to 20 years in prison.

Figure 1: Yanjun Xu was convicted of economic espionage and attempting to steal trade secrets for China; his case demonstrates the pervasive threats to U.S. technology and the need for law enforcement to embrace all tools—including OSINT and AI—at its disposal.

This case, one of the first where a Chinese intelligence officer was brought to U.S. soil for trial, highlighted both the persistence of foreign intelligence in targeting U.S. tech and the difficulty of catching such spies only after damage is done. It raised the question: could earlier open-source vetting or monitoring of industry partnerships have noticed, for example, unusual approaches to GE engineers that posed risks to national security? Every meeting Xu set up, every false identity he created, potentially left breadcrumbs in open sources—email addresses, travel records, online postings—that an AI-driven analysis might have flagged as suspicious network activity.

Recent Insider Threat Incidents (2024–2025)

Unfortunately, the past two years have offered plenty of fresh examples that these vulnerabilities remain. In February 2024, the FBI arrested Chenguang Gong, a 57-year-old Chinese-born U.S. citizen working in California, for stealing sensitive technology from his employer. Gong had been a circuit design manager at an R&D company developing infrared sensors used in space-based missile launch detection—critical defense technology. In just four months of employment, he siphoned roughly 3,600 files onto personal devices. These files included blueprints for infrared satellite sensors to spot nuclear missile launches, as well as designs enabling U.S. aircraft to detect and evade heat-seeking missiles. According to prosecutors, Gong had earlier attempted to provide information to the PRC military.

Here we see a textbook insider threat: an individual with legitimate access to highly sensitive DoD-related technology, who turned around and tried to hand it to a foreign adversary. What warning signs might OSINT have caught? Perhaps Gong had past business dealings in China or had made online postings in Chinese tech forums indicating an interest in PRC government work? These are the types of leads an open-source vetting team could look for when someone joins a classified project—especially someone who, like Gong, had immigrated from a country known for targeting diaspora in espionage efforts.

Just a month later, in March 2024, another bombshell case emerged, this time involving the theft of AI secrets from one of America’s tech giants. A software engineer at Google’s California headquarters, Linwei “Leon” Ding, was indicted for an audacious scheme to steal Google’s proprietary AI research and source code.

Ding worked on Google’s supercomputing infrastructure for machine learning. Over the course of 2022–2023, he reportedly uploaded over 1,000 files of confidential Google data to his personal cloud account, including technical details of Google’s TPU and GPU AI chips, the software that runs its AI supercomputer, and designs for specialized network interface hardware.

Why? The indictment says that while employed at Google, Ding had secretly become involved with two Chinese companies and ultimately founded his own AI startup in China, effectively acting as an inside mole who fed Google’s crown-jewel AI innovations to his new venture and, by extension, the Chinese government. He now faces charges of economic espionage.

This case is striking because it did not occur at a defense contractor or government facility, yet the stolen technology, advanced AI hardware and software, has clear national security implications. It underscores that insider threats can appear anywhere cutting-edge tech is developed, including universities and private firms that may later collaborate with the DoD. It also showcases how difficult it can be to detect a rogue insider before it’s too late.

One can’t help but wonder: if Google and other companies handling sensitive R&D employed rigorous open-source vetting and monitoring of their employees, might they catch warning signs of divided loyalties that put national security at risk? In Ding’s case, open records in China showed him registering a business and pursuing a CTO role there while still at Google, exactly the kind of intelligence a thorough OSINT background check could surface.

Beyond high-tech theft, recent years have seen insider breaches even within the U.S. military ranks. In August 2023, two U.S. Navy sailors based in California were arrested for transmitting military information to Chinese intelligence, a stark reminder that uniformed personnel can be insider threats too. One of them, Petty Officer Wenheng Zhao, had been bribed by a Chinese agent to pass photos, manuals, and blueprints of U.S. Navy systems; the other, Navy sailor Jinchao Wei, was accused of collecting details on the weapons and vulnerabilities of the ship he served on. Zhao pleaded guilty and was sentenced in early 2024 to 27 months in prison.

These cases revealed that the sailors had been approached via social media and convinced to smuggle out information, a breach of trust that might have been prevented if closer continuous monitoring of their finances and contacts had raised red flags. It’s noteworthy that Zhao’s espionage was detected relatively quickly, in part because the Naval Criminal Investigative Service (NCIS) caught wind of unusual communications. This success is encouraging, and it shows how insider threat programs are improving. Yet, it also highlights the need to proactively scan open-source venues like social media or messaging platforms for signs of personnel being targeted by foreign agents. Modern insider threat detection systems increasingly incorporate such capabilities, using AI to watch for keywords or patterns that suggest someone is recruiting or grooming an “insider.”

Academia has not been spared either. American universities conducting defense-related research have long been targets for intellectual property theft. A recent example came to light in October 2024: federal charges were filed against a group of five former University of Michigan students, all Chinese nationals, who were caught spying on a National Guard base in Michigan. They had surreptitiously taken photos of military equipment and exercises – information likely meant for the PRC. When confronted, they tried to pose as clueless tourists.

This incident echoed other cases of students or researchers exploiting academic cover to engage in espionage. It’s a delicate issue; universities thrive on openness and exchange of ideas. Yet, it’s now clear that adversaries see academia as a soft underbelly of national security. Open-source vetting can help here too: for instance, by screening foreign students or visiting scholars against databases of known intelligence fronts or by monitoring collaborations that involve export-controlled research.

Building an AI-Enhanced Shield Against Insider Threats

Given the sobering parade of incidents, how are the DoD and IC responding? The trend is toward a more unified, tech-enabled approach to insider risk. Insider Threat Programs have matured across agencies, often integrating counterintelligence, cybersecurity, and HR to detect anomalies. Best practices now call for a fusion of data from internal sources such as badge access logs, network activity, and personnel records with external open sources like public social media, criminal records, and financial records.

For example, the Department of Defense is expanding its Continuous Vetting system that automatically checks cleared individuals’ backgrounds on a frequent basis. This system queries various databases and can incorporate public-record monitoring to catch things like arrests or large purchases that might indicate elevated risk. As one official described, “we put your name in the system… it goes through data sources on a recurring schedule, and if an anomaly comes up, we decide what to do with it.” Here again, AI plays a role: machine-learning models can learn what combinations of factors constitute a true risk versus a benign anomaly, improving the accuracy of alerts.
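
How might such a triage model look in code? The toy example below, using scikit-learn’s logistic regression, trains on a handful of invented, pre-adjudicated alerts and then scores a new one. The features (recent arrest flag, debt-to-income ratio, count of unreported foreign contacts) are illustrative stand-ins; a real continuous-vetting model would draw on far more data and careful validation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy training data: each row is an alert described by three illustrative
# features (recent arrest flag, debt-to-income ratio, foreign-contact count),
# labeled 1 if adjudicators judged it a true risk, 0 if benign.
X = np.array([[1, 0.9, 3], [0, 0.2, 0], [1, 0.1, 0],
              [0, 0.8, 2], [1, 0.7, 4], [0, 0.3, 1]])
y = np.array([1, 0, 0, 1, 1, 0])

model = LogisticRegression().fit(X, y)

# Score a new continuous-vetting alert: a high probability routes it to a
# human analyst first, a low probability deprioritizes it.
new_alert = np.array([[0, 0.85, 3]])
print("risk probability:", model.predict_proba(new_alert)[0, 1])
```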

Another development is the push for insider threat detection tools that use behavioral analytics. Some modern security systems create a baseline profile of each user’s typical activities, both on computer networks and in terms of personal-life indicators. If someone starts downloading far more data than usual, logs in at odd hours, suddenly incurs gambling debts, or begins contacting competitors, these systems trigger alerts. By fusing classified and unclassified inputs, agencies aim to get a 360-degree view of their trusted personnel.
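
A stripped-down version of the baselining idea, assuming only per-user daily download counts, might look like this; real systems track many more signals and use more robust statistics than a simple z-score.

```python
import statistics

def flag_anomalies(history, today, z_threshold=3.0):
    """Flag users whose activity today deviates sharply from their own baseline.

    `history` maps user -> list of daily download counts (e.g., last 90 days);
    `today` maps user -> today's count. Thresholds and features are illustrative.
    """
    alerts = []
    for user, counts in history.items():
        mean = statistics.mean(counts)
        stdev = statistics.stdev(counts) or 1.0  # avoid divide-by-zero on flat baselines
        z = (today.get(user, 0) - mean) / stdev
        if z >= z_threshold:
            alerts.append((user, round(z, 1)))
    return alerts

history = {"analyst_a": [12, 9, 14, 11, 10, 13], "engineer_b": [40, 35, 42, 38, 41, 39]}
today = {"analyst_a": 11, "engineer_b": 420}  # engineer_b suddenly bulk-downloading
print(flag_anomalies(history, today))  # only engineer_b is flagged
```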

It’s a challenging balance: privacy concerns are important, and no one wants “big brother” watching every aspect of employees’ lives. But the frequency and impact of recent breaches have driven home the point that early intervention is critical. A malicious insider typically exhibits warning behaviors like financial stress, disgruntlement, unexplained affluence, and contacts with adversaries before they actually betray secrets. The key is catching those signals in time. AI, with its ability to correlate subtle patterns, is invaluable here, and unlike humans, it doesn’t get tired or biased in scanning data.

On the industry side, defense contractors and tech companies are also stepping up. Many have instituted their own OSINT-based screening of new hires and partners. It’s now routine for a major contractor to run extensive open-source background checks on a startup before investing or collaborating, looking for any ties to foreign intelligence or past fraud.

Similarly, companies are increasingly doing social media sweeps of prospective employees, within legal and ethical bounds, to ensure nothing glaring is amiss. In one notable case, a major aerospace firm discovered through open-source checks that a job applicant had previously authored research sponsored by a foreign military, leading to additional scrutiny that ultimately disqualified the candidate.

Graph databases have become popular in the corporate security world to maintain “risk graphs” of supplier and employee networks. For example, if one employee is flagged, a company can quickly see who within their network, professionally or personally, might also pose a concern, mapping out a web of risk that can be monitored.
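
In graph terms, that query is just a bounded neighborhood expansion around the flagged node, as in this small sketch (again with invented nodes, and networkx standing in for a production graph database):

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("flagged_employee", "supplier:acme_parts"),
    ("flagged_employee", "colleague:r_smith"),
    ("supplier:acme_parts", "subcontractor:globex"),
    ("colleague:r_smith", "vendor:initech"),
])

# Pull everyone within two hops of a flagged employee -- the "risk graph"
# neighborhood a security team would put under closer monitoring.
watch_zone = nx.ego_graph(G, "flagged_employee", radius=2)
print(sorted(n for n in watch_zone.nodes if n != "flagged_employee"))
```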

Finally, there is a growing recognition of the value of public-private intelligence partnerships. The FBI, DHS, and DoD frequently share unclassified threat intelligence with cleared industry partners, but now we see a two-way street: companies are using their OSINT capabilities to feed insights back to the government. If a defense contractor’s open-source team uncovers an online forum where anonymous users are soliciting insiders to sell classified information, they can relay that to federal authorities.

In essence, everyone is becoming a sensor in the broader counterintelligence ecosystem. The more data points and analytic power brought to bear, the harder it is for an insider to hide.

Conclusion: Toward a Proactive and Fusion-Driven Security Culture

The case studies from 2024 and 2025 make one thing abundantly clear: insider threats and foreign espionage remain a persistent, evolving danger to U.S. national security. But they also highlight opportunities—moments where better use of open-source intelligence and analytical technology could have raised the alarm sooner.

Open-source vetting, enhanced by AI, ML, and graph databases, offers a way to shift from reactive investigations to proactive prevention. It enables what some call a “360-degree threat surface” view, examining not just what an individual does inside the secure walls, but what connections and activities surround them in the open world.

To be sure, no system is foolproof. There will always be clever adversaries and the occasional false alarm. The goal for DoD and IC leaders is to integrate these tools in a balanced way: fuse OSINT data with classified intel, automate where possible but keep human analysts in the loop, and constantly refine the models with lessons learned from new incidents.

The best defense combines exquisite data and the talent to use it effectively. In practice, that means pairing cutting-edge AI platforms with seasoned analysts who understand context and can make judgment calls.

Culturally, agencies are encouraging a mindset of vigilance. “Trust but verify” has become “trust but continuously verify.” By normalizing the use of open-source checks and AI analysis in vetting, organizations send a message that attempts at insider malfeasance are much more likely to be caught.

The deterrent effect can be significant. A potential leaker or spy who knows that anomalous behavior will trigger an immediate review might think twice before acting.

In the end, embracing open-source vetting and advanced analytics is about harnessing the same forces that empower our adversaries—global connectivity, big data, AI—and turning them to our advantage. The U.S. has a rich tradition of open information and innovation; those strengths can also be our security strengths.

We can scour the same open seas where spies try to hide and fish out the threats among them. By uniting the efforts of government, industry, and even academia in this information fusion approach, we build a layered defense where the insider, the fraudster, or the spy has fewer and fewer places to conceal their tracks.

National security has always been a cat-and-mouse game of finding the mole in the shadows. Today, the shadows are often digital and in plain sight. AI-powered open-source vetting shines a light into those spaces, illuminating the clues that betray a bad actor.

The takeaway for policymakers and security professionals is that investing in open-source intelligence fusion is no longer just a forward-leaning idea—it’s now a best practice and an operational necessity. With our adversaries probing every vulnerability, from supply chains to research labs, the organizations that succeed will be those that leave no stone or dataset unturned in protecting their assets.

Open-source vetting, powered by the latest technology and guided by skilled analysts, is rapidly becoming an indispensable pillar of modern national defense. By learning from the past and staying ahead of the curve, the DoD and IC can mitigate insider threats and ensure that America’s most sensitive knowledge doesn’t fall into the wrong hands. The tools are here; it’s up to us to use them wisely and stay one step ahead in the never-ending contest of intelligence and counterintelligence.
