Let's say you're building a threat detection system. You have security logs, some telemetry, and a few threat intelligence reports. Your data is patchy. You're not seeing full attack chains. What you have is more like a collection of fragments — a compromised credential here, a suspicious IP there, maybe a technique ID from the ATT&CK framework if you're lucky. Your job is to connect those fragments and reconstruct plausible attack paths, even when most of the intermediate steps are missing.
This is exactly the problem that knowledge graph embedding (KGE) was supposed to solve. You model your security incidents as a graph, use link prediction to fill the gaps, and you have a framework for anticipatory defense. In theory, at least. In practice, it breaks almost immediately when you point it at Advanced Persistent Threats.
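To make that concrete, here is a sketch of what "model your security incidents as a graph" means in practice: incidents become (head, relation, tail) triples, and gap-filling becomes a link-prediction query. All entity and relation names below are invented for illustration, not taken from the paper.

```python
# Hypothetical security incidents encoded as knowledge-graph triples.
# Entity and relation names are illustrative only.
triples = [
    ("phishing_email_42", "delivered_to", "host_A"),
    ("host_A", "authenticated_with", "cred_jsmith"),
    ("cred_jsmith", "used_from", "ip_203.0.113.7"),
    ("ip_203.0.113.7", "associated_with", "T1078"),  # ATT&CK: Valid Accounts
]

# Link prediction asks: given a head and a relation, which tail entity is
# most plausible? A trained KGE model is scored against queries like this
# one, where the true tail (the missing attack step) is unknown.
query = ("host_A", "next_compromised", None)
```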
APTs are long-running, coordinated cyberattacks. They move slowly, pivot across systems, and stay hidden for weeks or months. The data around them is scarce because most of it is never shared publicly. But the problem goes deeper than just having few examples.
Standard KGE models assume the graph has structural density. They need enough connected paths to learn from. APT graphs have neither. One step in an attack might be a phishing email. Then nothing for two days. Then a privilege escalation event from a random IP. In graph terms, these dependencies are not just underrepresented — they are systematically absent. The model never trains on them, so it never learns to detect them.
The core issue is not just that the graphs are small. It is that they are structurally deficient in ways that destroy the assumptions baked into most embedding methods. Even advanced models like ComplEx or DistMult cannot recover from this because their learning signal is tied to graph structure, not domain semantics. The gaps are concentrated exactly where you need the model to work hardest — privilege escalation, lateral movement, exfiltration.
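To see why the learning signal is "tied to graph structure," it helps to look at a scoring function directly. Here is DistMult's score in a few lines of NumPy, using toy untrained embeddings rather than the paper's model:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8

# Toy, untrained embeddings for one (head, relation, tail) triple.
e_head = rng.normal(size=dim)
w_rel = rng.normal(size=dim)
e_tail = rng.normal(size=dim)

def distmult_score(h, r, t):
    """DistMult plausibility score: <h, r, t> = sum_i h_i * r_i * t_i."""
    return float(np.sum(h * r * t))

score = distmult_score(e_head, w_rel, e_tail)

# The only training signal these vectors ever receive comes from triples
# that exist in the graph. A relation pattern that never appears (a missing
# lateral-movement edge, say) contributes nothing to any gradient, so the
# model has no way to learn it, no matter how expressive the scoring is.
```

Note that DistMult scores symmetrically in head and tail, which is one reason richer models like ComplEx exist; but both share the same dependence on observed structure.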
A team from the University of Electronic Science and Technology of China took a different approach. Instead of changing the model, they changed the graph. Their system, APT-ST-AN, addresses two distinct failure modes: not enough positive examples, and not enough useful negative ones.
APT attacks have structure. They play out over time, across specific systems, often with repeatable sequences. If one technique tends to follow another, that relationship has predictive value. APT-ST-AN captures this by introducing a new relation type called "next". It encodes temporal ordering directly into the graph. So if technique A typically precedes technique B, a "next" edge connects them.
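A minimal sketch of how "next" edges could be derived from observed technique sequences. The campaign sequences below are invented for illustration, not the paper's data:

```python
# Hypothetical campaign traces as ordered lists of ATT&CK technique IDs.
campaigns = [
    ["T1566", "T1078", "T1068", "T1048"],  # phishing -> valid accounts -> priv esc -> exfil
    ["T1566", "T1078", "T1021"],           # phishing -> valid accounts -> lateral movement
]

# Each consecutive pair becomes a "next" edge; duplicates collapse in the set.
next_edges = set()
for seq in campaigns:
    for a, b in zip(seq, seq[1:]):
        next_edges.add((a, "next", b))

# Every edge, e.g. ("T1566", "next", "T1078"), is a new positive triple.
```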
These relations are extracted from the MITRE ATT&CK framework, the Attack Flow project, and a lightweight rule-mining system called AnyBURL. AnyBURL discovers logical patterns from existing triples — if technique A often precedes B across multiple campaigns, this pattern becomes a rule. Inferred edges from these rules then become new positive triples in the graph.
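To illustrate the rule-to-triple step, here is a toy rule application in the spirit of AnyBURL's mined rules. The specific rule (transitivity of "next") is a made-up example for illustration, not necessarily one the system actually learned:

```python
# Existing positive triples (illustrative).
facts = {
    ("T1566", "next", "T1078"),
    ("T1078", "next", "T1068"),
}

def apply_rule(triples):
    """Toy rule: next(A, B) and next(B, C) => infer next(A, C)."""
    inferred = set()
    for (a, r1, b) in triples:
        for (b2, r2, c) in triples:
            if r1 == r2 == "next" and b == b2 and a != c:
                inferred.add((a, "next", c))
    return inferred - triples  # only genuinely new triples

new_positives = apply_rule(facts)
# new_positives == {("T1566", "next", "T1068")}
```

The inferred triples are then added back to the graph as positives, which is how a three-thousand-triple graph can roughly double without any new raw observations.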
The effect is significant. The base knowledge graph in this study had around three thousand triples. After applying spatio-temporal reasoning, it grew to almost seven thousand. That is not noise or fabricated data. It is inferred structure derived from observable patterns in real threat intelligence. The model now has far more semantic coverage to generalize from during inference.
Every KGE model needs negative examples during training — deliberately false triples that contrast with the real ones. Without them, the model cannot learn to distinguish plausible connections from implausible ones. Most systems use random negative sampling. But in sparse graphs, randomly-selected false triples are obvious and trivial. They do not provide a useful learning signal.
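Here is what standard random tail corruption looks like, and why it is weak in a sparse graph. Entity names are invented for illustration:

```python
import random

random.seed(0)

entities = ["host_A", "host_B", "cred_jsmith", "ip_203.0.113.7", "printer_3"]
positive = ("host_A", "authenticated_with", "cred_jsmith")

def random_negative(triple, entity_pool):
    """Corrupt the tail with a uniformly random other entity (standard practice)."""
    h, r, t = triple
    t_corrupt = random.choice([e for e in entity_pool if e != t])
    return (h, r, t_corrupt)

neg = random_negative(positive, entities)
# In a sparse graph most draws are absurd, e.g.
# ("host_A", "authenticated_with", "printer_3") -- so trivially false
# that the gradient it produces teaches the model almost nothing.
```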
APT-ST-AN replaces random negatives with adversarial and synthetic hard negatives: deliberately false triples constructed so that the current model still finds them plausible.
The result is a set of negative examples that forces the model to sharpen its decision boundaries. Combined with the enriched positive side from the ST module, the model now has both more to learn from and a harder optimization target.
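A minimal sketch of the hard-negative idea: score candidate corruptions with the current model and keep the ones it finds most believable. The scores and names below are stand-ins, not the paper's actual procedure:

```python
# Candidate corruptions paired with the current model's plausibility score.
# Scores here are hard-coded stand-ins for a real KGE model's output.
candidates = [
    (("host_A", "escalated_on", "host_B"), 0.91),    # plausible-looking: hard
    (("host_A", "escalated_on", "printer_3"), 0.12), # absurd: easy, low value
    (("host_A", "escalated_on", "host_C"), 0.77),
]

def hardest_negatives(scored, k=2):
    """Keep the k corruptions the model currently finds most believable."""
    return [t for t, _ in sorted(scored, key=lambda x: -x[1])[:k]]

hard = hardest_negatives(candidates)
# hard == [("host_A", "escalated_on", "host_B"),
#          ("host_A", "escalated_on", "host_C")]
```

Training against negatives like these forces the decision boundary to tighten exactly where the model is currently wrong, rather than where it is already confident.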
The authors evaluated APT-ST-AN on multiple small-scale knowledge graphs, with a focus on real APT scenarios. The results were consistent. APT-ST-AN outperformed both classical models like TransE and RotatE, and domain-specific baselines designed for security graphs. Each module contributed measurably — the ST component improved results on its own, and adding the AN component pushed performance higher still.
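For reference, link-prediction comparisons like these are conventionally scored with mean reciprocal rank and Hits@k. A minimal sketch of both metrics; the ranks below are illustrative, not numbers from the paper:

```python
def mrr_and_hits(ranks, k=10):
    """Mean reciprocal rank and Hits@k over the rank of each true entity."""
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    hits = sum(1 for r in ranks if r <= k) / len(ranks)
    return mrr, hits

# Illustrative ranks assigned to the correct entity across five test queries.
ranks = [1, 3, 12, 2, 50]
mrr, hits10 = mrr_and_hits(ranks)
# mrr = (1 + 1/3 + 1/12 + 1/2 + 1/50) / 5; hits10 = 3/5
```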
The improvements held not only on the security-domain datasets but also on public benchmarks from film and literature. This matters because it suggests the approach generalizes. It is not just a tuned solution for one particular graph structure.
If you are building infrastructure that relies on predictive modeling over sparse knowledge graphs — not just in security, but in any domain where full data visibility is never available — this paper offers several transferable ideas.
APT-ST-AN is a targeted response to a specific structural problem: what do you do when your knowledge graph is too sparse to train on reliably? The answer is to enrich both sides of the training signal — more meaningful positives through sequential reasoning, harder negatives through adversarial construction. The result is an embedding model that can make useful predictions even when the data gives it very little to work with.
If you want to dig into the full ablation results or the details of the spatio-temporal rule extraction process, the paper is published in the journal Array (DOI: 10.1016/j.array.2025.100404).