ML Security Research That Goes Deeper
SPR{k3 discovers critical vulnerabilities in production AI infrastructure — and proves them. Validated by Meta, Microsoft, NVIDIA, Amazon, and Intel.
SPR{k3 operates in the gap between traditional security tooling and production ML infrastructure, where 13 CVEs and 79+ validated findings prove the exposure is real.
We combine runtime telemetry with proprietary AI/ML exploit intelligence to separate noise from meaningful risk.
Request a Findings Briefing
The ML Security Layer Below Your Perimeter
Perimeter tools protect the boundary. SPR{k3 goes beneath it — into ML frameworks, training pipelines, model artifacts, and supply chains that need specialized analysis.
ML Framework Vulnerabilities
Pickle RCE in PyTorch, NeMo, and DeepSpeed. Not a cloud misconfiguration but a code-level exploit in the framework itself, outside every perimeter tool on the market (see the sketch after these cards).
Distributed Training Exploitation
NCCL, ZMQ, and unauthenticated gRPC attack paths inside your training cluster. One compromised node. Full cluster access in seconds.
Supply Chain Poisoning
Coordinated malicious patterns inserted across multiple ML repositories at once. Invisible to single-repo scanners. Only detectable through cross-repository temporal analysis.
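Why "pickle RCE" is not hyperbole: a minimal, self-contained illustration of how deserialization becomes code execution. This is the standard Python mechanism, not SPR{k3 exploit code:

```python
import os
import pickle

class Payload:
    # pickle calls __reduce__ to decide how to rebuild an object;
    # returning (os.system, ("id",)) turns deserialization into execution
    def __reduce__(self):
        return (os.system, ("id",))

blob = pickle.dumps(Payload())
pickle.loads(blob)  # runs the `id` command the moment the bytes are loaded
```

Every framework that feeds untrusted bytes into pickle, directly or through torch.load, inherits this behavior.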
The Vendor Scope Gap
NVIDIA, Microsoft, and Amazon made reasonable scoping decisions. SPR{k3 operates in the space those decisions left uncovered — where your production environment differs from the vendor threat model.
Could your production AI environment have attack surfaces that fall outside every existing tool's scope? SPR{k3 was built to cover that gap.
Why code scanning isn't enough
Frontier AI models can now find code vulnerabilities at superhuman scale. The ML vulnerabilities that matter aren't in the code.
Trust Boundary Violation
No bug in any file. Workers trust an unauthenticated network that delivers pickle. The architecture is the vulnerability.
Adversarial Data Source
pickle.load() works as designed. The data comes from a decentralized network where any participant can inject payloads. The trust assumption is the vulnerability.
Patch Coverage Gap
NVIDIA patched the CVE. 14 call sites bypass the patch entirely. The coverage gap is the vulnerability.
Compositional Exploit
Four files, three abstraction layers, every line clean. RCE exists only in the composition.
These aren't code bugs. They're broken trust assumptions, incomplete mitigations, and compositional flaws. Ora understands ML systems as systems, not just as code, and that structural understanding is what makes the tempo possible. The NeMo Hydra bypass was found, validated, and disclosed before the vendor's own patch audit caught it. CVE-2026-24747 was predicted 24 hours before public disclosure. While large-scale initiatives coordinate across twelve organizations, Ora has already scanned, found, reported, and moved on. The advantage isn't scale. It's what Ora understands, and how fast that understanding compounds.
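To make the compositional point concrete, a hypothetical three-file sketch: each file passes review in isolation, and the RCE exists only in the composition.

```python
# transport.py: looks clean, it only moves bytes
def fetch(sock):
    return sock.recv(65536)

# codec.py: looks clean, it only decodes bytes it is handed
import pickle

def decode(blob):
    return pickle.loads(blob)

# worker.py: the vulnerability lives here, in the composition.
# Untrusted network bytes flow straight into pickle.
def handle(sock):
    return decode(fetch(sock))
```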
THE COVERAGE BLIND SPOT
The Scope Gap in ML Security
The gap is not about bad tools. It is where vendor threat models end and real production environments begin — a structural boundary that leaves important ML attack surfaces unaddressed.
Where Vendor Scope Ends
NVIDIA scopes to its deployment assumptions. Microsoft scopes to its SDK. Amazon scopes to SageMaker's managed surface. None of them scope to your specific production topology.
The 250-Sample Threshold
Carlini et al. showed that as few as 250 poisoned samples can reliably backdoor a large language model. That is the threshold an attacker must reach. Ora detects coordinated insertions at 1–50 files, before the threshold is crossed.
Cross-Repository Coordination
The LiteLLM attack hit five package ecosystems simultaneously. No single-repo scanner sees the pattern. SPR{k3's temporal cross-repo analysis does.
250 samples: the threshold to backdoor an LLM (Carlini et al.)
1–50 files: Ora's detection floor. SPR{k3 flags attacks before escalation.
Proven in Production.
13
CVEs Across NVIDIA, Microsoft, GitHub & GitPython
79+
Confirmed Vulnerabilities
95.7%
Detection Accuracy
<3%
False Positive Rate
3
Consecutive NVIDIA Bulletins
Validated by Meta · Microsoft · NVIDIA · Amazon · Intel · GitPython
CVE Evidence
Three consecutive months of published, credited NVIDIA security advisories. Publicly verifiable.
NVIDIA Security Bulletin — February 2026
CVE-2025-33241 · CVE-2025-33243 · CVE-2025-33251 · CVE-2025-33252 · CVE-2025-33253 — Dan Aridor, SPR{k3 Security Research. View Bulletin →
NVIDIA Security Bulletin — March 2026
CVE-2025-33244 · CVE-2026-24157 · CVE-2026-24159 · CVE-2026-24152 · CVE-2026-24151 · CVE-2026-24150 — Dan Aridor, SPR{k3 Security Research. View Bulletin →
NVIDIA Security Bulletin — April 2026
Dan Aridor, SPR{k3 Security Research. View Bulletin →
Microsoft & Amazon — Security Acknowledgements
CVE-2026-26030 (CVSS 10.0) — RCE in Microsoft Semantic Kernel, acknowledged by Microsoft MSRC. RCE in Amazon SageMaker Python SDK, acknowledged by AWS Security.
GitPython — RCE via Newline Injection (GHSA-v87r-6q3f-2j67)
Found by Ora hours after Wiz published CVE-2026-3854; advisory published within 8 hours. CVSS 7.8 (High). Affects GitPython <= 3.1.48; fix targeted for 3.1.49. View Advisory →
Intel — Security Acknowledgement
Dan Aridor, SPR{k3 Security Research. Acknowledged by Intel Product Security.
What We Find
Vulnerabilities Vendors Don't Cover
Production deployments routinely operate outside vendor threat model assumptions. We identify the vulnerabilities that fall between vendor scope boundaries and operational reality.
Coordinated Attacks Across Repositories
Malicious patterns that spread across multiple ML frameworks simultaneously. Only visible when the full ecosystem is analyzed together.
Risks That Persist Through the Model Lifecycle
Backdoors that survive fine-tuning. Poisoning that persists through quantization and model merges. Threats that point-in-time scans miss entirely.
Attack Classes We Detect
Unsafe Deserialization / Pickle RCE
torch.load(), pickle.loads(), joblib.load() on untrusted data. Confirmed across PyTorch, NeMo, DeepSpeed, HuggingFace, and AutoGluon (hardening sketch after this list).
Distributed Training Exploitation
NCCL, ZMQ, unauthenticated gRPC in training clusters. One compromised node — full cluster access in seconds.
Supply Chain Poisoning
Coordinated malicious patterns across multiple repositories. Detected via cross-repo temporal correlation.
LLM Cognitive Degradation (BrainGuard™)
5 cognitive attack pattern classes covering context integrity, pipeline taint, reasoning consistency, evaluation drift, and entropy monitoring. Assessed across 177 frameworks. Now available as a standalone assessment.
Agent Security / MCP Trust Poisoning
Tool description injection, OAuth delegation exploits, agent identity mutation. OWASP AI agent threat landscape.
Model Artifact Poisoning
Quantization backdoor survival, model merge poisoning, LoRA adapter injection, checkpoint manipulation.
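For the deserialization class above, a minimal hardening sketch. The checkpoint path is hypothetical; the weights_only flag exists in recent PyTorch releases and restricts unpickling to tensors and primitive containers:

```python
import torch

# Unsafe on untrusted checkpoints: in older PyTorch releases torch.load
# defaults to full pickle, so a crafted file executes code during loading
state = torch.load("downloaded_model.pt")

# Safer variant of the same call: refuse arbitrary object reconstruction
state = torch.load("downloaded_model.pt", weights_only=True)
```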
AI cognitive health — the attack surface nobody monitors
Perimeter tools protect the network. Vulnerability scanners find code flaws. Nobody monitors the cognitive integrity of your LLM applications. SPR{k3's BrainGuard engine identifies 5 distinct attack pattern classes across the AI reasoning layer — validated across 177 ML frameworks.
177
frameworks assessed
5
cognitive attack classes
340
avg gaps per framework
The five BrainGuard pattern classes
Context boundary erosion
Unbounded prompt assembly paths that allow external content to displace safety-critical instructions.
What it means: An attacker can push your system prompt out of the model's active context — silently removing the guardrails your application depends on.
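One mitigation shape, sketched with a character budget standing in for a token budget (all names hypothetical): reserve the system prompt first, so retrieved content can never displace it.

```python
def assemble_prompt(system: str, chunks: list[str], budget: int) -> str:
    # Reserve the safety-critical system prompt before any external content
    remaining = budget - len(system)
    kept = []
    for chunk in chunks:
        if len(chunk) > remaining:
            break  # drop overflow instead of letting it push `system` out
        kept.append(chunk)
        remaining -= len(chunk)
    return "\n".join([system, *kept])
```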
Agent pipeline taint propagation
Unsanitized data flows between agent pipeline stages where tool outputs enter reasoning without validation.
What it means: A compromised tool response can escalate through your agent pipeline to unauthorized actions — with no taint boundary to stop it.
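A minimal sketch of what a taint boundary can look like (names hypothetical): tool output stays wrapped until it passes an explicit validation gate, so it cannot silently enter the reasoning stage.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tainted:
    # Raw tool output carries this wrapper until explicitly validated
    value: str

def untaint(t: Tainted, validate) -> str:
    # The only path from tool output to plain str runs through validate()
    if not validate(t.value):
        raise ValueError("tool output rejected at taint boundary")
    return t.value
```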
Reasoning consistency gap
Missing equivalence validation allows semantically identical queries to produce contradictory outputs.
What it means: Your LLM can give opposite answers to the same question asked two different ways. In regulated contexts — financial, legal, medical — this is an unacceptable reliability risk.
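A toy version of equivalence validation. Exact string comparison stands in for a real semantic-equivalence judge, and ask() is whatever calls your model; both are assumptions, not Ora's method:

```python
def consistent(ask, phrasing_a: str, phrasing_b: str) -> bool:
    # phrasing_a and phrasing_b are semantically identical questions;
    # flag the model if its answers diverge
    return ask(phrasing_a).strip().lower() == ask(phrasing_b).strip().lower()

# e.g. consistent(ask, "Is this loan compliant?", "Does this loan comply?")
```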
Self-evaluation drift
Ungrounded self-evaluation loops where model confidence increases while accuracy degrades.
What it means: Your model grades its own homework. Each self-refinement iteration can amplify the original error while reporting higher confidence. Downstream systems trust the score.
Reasoning trace entropy gap
Absent runtime monitoring for reasoning distribution shifts — slow degradation and backdoor activation go undetected.
What it means: Without cognitive health monitoring, a poisoned fine-tune or gradual model drift can shift behavior significantly before anyone notices. By the time outputs visibly degrade, the model has been compromised for weeks.
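A minimal sketch of the missing monitor, assuming a per-trace token list and a baseline entropy recorded at deployment (both hypothetical):

```python
import math
from collections import Counter

def shannon_entropy(tokens: list[str]) -> float:
    counts = Counter(tokens)
    n = len(tokens)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def drifted(trace: list[str], baseline: float, tolerance: float) -> bool:
    # Alert when a reasoning trace's token distribution moves away
    # from the healthy baseline established at deployment time
    return abs(shannon_entropy(trace) - baseline) > tolerance
```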
BrainGuard cognitive health assessment
We scan your LLM application codebase and map all 5 pattern classes to your specific code. You receive a prioritized risk assessment with exact locations, severity classification, and architectural remediation guidance. The methodology, exploitation details, and proof-of-concept materials are delivered under NDA.
What we deliver
Prioritized finding locations, severity mapping, architectural remediation roadmap, and proof-of-concept demonstrations — all under NDA.
"SPR{k3 identified cognitive health gaps in our agent pipeline that no other security tool flagged. The remediation roadmap was specific to our codebase — not generic recommendations."
Enterprise client, agent framework deployment
Beyond Detection — Active Intelligence
SPR{k3 doesn't wait for threats to be reported. It predicts, tracks, and intercepts them — powered by an original bio-inspired algorithm that treats the ML threat landscape as a living, evolving system.
Predictive Threat Research
SPR{k3 identifies vulnerability classes before they are exploited in the wild. We surface threats that have no CVE yet — and no existing detection signature.
Active Threat Forecasting
We publish predictions about where the next ML attack vectors will emerge. Enterprises that engage with SPR{k3 see what's coming before it arrives.
Early Warning Intelligence
Coordinated attack patterns leave traces across repositories before they reach production. SPR{k3 detects these signals early — before a campaign becomes a confirmed breach.
Original Bio-Inspired Algorithm
Patent-pending. Models codebases as living systems. Identifies deviations from what should be preserved — not just known bad patterns. This inversion is what makes prediction possible.
Every finding feeds the model. Every CVE sharpens the forecast. The system compounds.
The Ora Scanner
Ora (אורה) means "light" in Hebrew. Named after Dan Aridor's late aunt, it carries a double significance: a personal tribute to family, and a metaphor for what the scanner does — illuminate hidden vulnerabilities, coordinated attacks, and architectural risks that other tools leave in the dark.
<3% false positive rate
with automated FPFE (False Positive Filter Engine) plus mandatory manual verification before any submission
Cross-language analysis
Python, JavaScript, Java, C++, Go, Rust, Ruby
SARIF output
for CI/CD integration, plus JSON/CSV reporting
Patent-pending methodology
for detection built to protect what matters most
Discovery Intelligence — Ora Finds What It Doesn't Know to Look For
Every scanner on the market matches against a fixed pattern database. When a new vulnerability class emerges, there's a gap — days, weeks, sometimes months — before someone writes a detection rule. Ora closes that gap to 24 hours.
Darwin Evolutionary Intelligence
Maintains a knowledge base of 3,944+ vulnerability pattern families. Every night, Darwin identifies repositories containing patterns it has never seen before — and sends them for deep analysis.
Level 3 Deep Analysis
Unclassified repositories get deep multi-layer analysis — tracing how untrusted data moves through code, across functions, and into execution.
Automatic Rule Generation
When deep analysis discovers a new vulnerability family, Ora automatically converts it into a detection rule and deploys it across the full scanner cluster.
187 Novel Families Discovered
What was invisible yesterday is detectable today. The detection surface doesn't just grow — it evolves.
"What was invisible yesterday is detectable today. The detection surface doesn't just grow — it evolves."
Beta Access
Ora finds it. Defend stops it.
SPR{k3 Defend is runtime protection for ML infrastructure — powered by the same pattern intelligence that discovers the vulnerabilities. The assessment identifies the exposure. Defend blocks it at runtime. Same pattern. Same system. No gap between discovery and defense. Now in beta at defend.sprk3.com — one-command install, 29 detection patterns, zero interference.
Monitor Mode
Every unsafe deserialization, model load, and hook execution logged in real time. Full visibility before you enforce.
Enforce Mode
Patterns blocked at the process level before execution. No signatures to write. No rules to maintain.
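This is not Defend's implementation. As a sketch of what process-level interception can look like in CPython, the standard library's audit-hook API exposes a pickle.find_class event for every global resolved during unpickling:

```python
import sys

DENYLIST = {"os", "subprocess", "builtins"}

def guard(event: str, args) -> None:
    if event == "pickle.find_class":
        module, name = args
        print(f"[monitor] unpickle wants {module}.{name}")  # monitor mode
        if module.split(".")[0] in DENYLIST:
            # Raising from an audit hook aborts the unpickle: enforce mode
            raise RuntimeError(f"blocked: {module}.{name}")

sys.addaudithook(guard)
```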
"Traditional endpoint security doesn't know what torch.load(weights_only=False) means. Defend was purpose-built for the attack surface that only exists in ML infrastructure."
Who Defend protects.
SPR{k3 Defend serves three tiers: the individuals who handle the most dangerous code every day without knowing it, the organizations running ML in production, and the platform suppliers everyone else builds on.
ML Researchers
Downloading models from HuggingFace daily. Every torch.load() on an untrusted model is a potential RCE. Nobody warns them. We found 3 torch.load() calls without weights_only in llama.cpp's conversion tools — tools thousands of researchers run on models downloaded from strangers.
AI Agent Users
Running Claude Code, Cursor, Copilot, AutoGPT locally. These agents execute code, install packages, write files. One poisoned suggestion and the agent installs a compromised package or writes a backdoor. Nobody's watching.
Data Scientists
pip installing from requirements.txt they didn't audit, loading pickled datasets, running notebooks from Kaggle. Every pip install and every pickle.load() is a trust decision they don't know they're making.
Organizations
Security teams running ML in production with no runtime visibility into what their models and pipelines are actually executing. The assessment finds the exposure. Defend closes the window between the report and the fix.
AI Platform Suppliers
Building the infrastructure other teams run ML on — managed training clusters, model serving platforms, MLOps tooling. A single unsafe deserialization pattern in your platform is a vulnerability in every customer environment. Defend gives you runtime proof that your platform is clean.
"Defend reads shell history, pip logs, and downloads. The agent monitors tool calls and model loads. The registry has the patterns. The exposure is real — it just needs protection and intelligence."
Beta
One system. Both sides.
SPR{k3 operates a shared pattern registry — the same intelligence that powers Ora's scanner also powers Defend's runtime layer. As the scanner evolves, the defense surface grows with it. No manual rule maintenance.
Ora finds a pattern
pickle.loads() in a distributed training framework's inter-node communication layer.
Pattern enters shared registry
Auto-generated from scan findings. No manual curation required.
Defend enforces at runtime
Any process calling the pattern triggers detection. Monitor or block.
Registry updates automatically
New scans discover new patterns. Defense surface compounds.
"The scanner finds vulnerabilities. Defend stops them. The system compounds."
Beta
Runtime Detection Classes
Defend intercepts at the process level — not the source line. Every class below is detected at execution, not flagged in a report.
Unsafe Deserialization at Runtime
pickle.loads(), cloudpickle, dill, joblib, YAML unsafe_load — intercepted before execution.
Model Loading Without Verification
torch.load() without weights_only=True. numpy.load() with allow_pickle. HuggingFace trust_remote_code=True without revision pinning.
Network → Deserialization Chains
ZMQ recv_pyobj, unauthenticated gRPC endpoints deserializing pickle, HTTP endpoints accepting serialized payloads (see the sketch after this list).
Supply Chain Injection
Runtime package installation from untrusted sources. setup.py execution during install. Dynamic imports from user-controlled paths.
Agent Exploitation
Unrestricted tool execution, data exfiltration to external endpoints, agent identity mutation via prompt injection.
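A concrete instance of the network-to-deserialization class, sketched with pyzmq (endpoint and message shape hypothetical):

```python
import json
import zmq

sock = zmq.Context().socket(zmq.REP)
sock.bind("tcp://0.0.0.0:5555")

# Unsafe: recv_pyobj() is pickle.loads() on raw network bytes,
# so any peer that can reach the port can execute code here
task = sock.recv_pyobj()

# Safer alternative to the line above: receive bytes, decode
# against a constrained schema instead of reconstructing objects
task = json.loads(sock.recv())
```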
ENTERPRISE ENGAGEMENTS
Eight ways to work with SPR{k3
Direct engagement. No SaaS trial. No access to your systems required. Every offering is backed by 79+ confirmed vulnerabilities across Meta, Microsoft, NVIDIA, and Amazon.
01 — ASSESSMENT
ML security findings report
Documented vulnerabilities specific to your ML stack. Scope gap analysis, propagation paths, blast radius, remediation guidance.
02 — TRUST BOUNDARY
Vendor scope gap audit
Maps every ML component to its vendor's threat model. Identifies exactly where vendor scoping assumptions don't hold in your topology.
03 — COGNITIVE HEALTH
BrainGuard assessment
5 cognitive attack pattern classes across your LLM application layer. Context integrity, pipeline taint, reasoning consistency, evaluation drift, entropy monitoring.
04 — SUPPLY CHAIN
ML supply chain audit
Cross-repository temporal correlation across your entire ML dependency tree. Package integrity, model provenance, checkpoint trust verification.
05 — CERTIFICATION
ML security readiness report
Pre-deployment security assessment for ML-powered products. Compliance documentation, SOC 2 evidence, board reporting artifact.
06 — ONGOING
Continuous monitoring retainer
Weekly scan reports. Prioritized triage. Early warning on coordinated supply chain attacks. Direct access to the research team.
07 — DEFEND
Runtime ML defense layer
Continuous runtime protection powered by your assessment findings. Monitor or enforce mode. Auto-updating pattern registry. Covers deserialization, model loading, distributed training protocols, supply chain, and agent exploitation. Deployed into your ML pipeline — not bolted onto your perimeter.
08 — INCIDENT RESPONSE
ML supply chain incident response retainer
When the next LiteLLM-scale attack hits, you need someone who can scan your stack in hours and tell you "affected" or "clean." Guaranteed 4-hour response SLA.
All engagements begin with a conversation. NDA before any technical details.
support@sprk3.com
Research & Publications
Original research, published findings, and references to academic work that informs our detection approach.
Original Research — Dan Aridor, SPR{k3
Amazon SageMaker — Remote Code Execution (January 2026)
RCE in the SageMaker Python SDK JumpStart search flow. Fixed in v3.4.0, acknowledged by AWS Security. Read Post →
Microsoft Semantic Kernel — RCE, CVSS 10.0 (December 2025)
CVE-2026-26030. Remote code execution in InMemoryVectorStore filter parsing. Read Post →
The Scanner Was the Weapon — LiteLLM Supply Chain Analysis (March 2026)
Coordinated supply chain attack across five package ecosystems. LiteLLM — 97M+ monthly downloads — poisoned via a ghost PyPI release. Read Post →
When the Git Library Writes Your Config for You — GitPython RCE (April 2026)
Newline injection in GitPython's config_writer().set_value() enables persistent RCE via core.hooksPath. Found by Ora hours after Wiz published CVE-2026-3854, while auditing MLRun's project.push(); the advisory went out within 8 hours. Same class of bug, different layer. Advisory: GHSA-v87r-6q3f-2j67. Fix targeted for GitPython 3.1.49. Read Post →

Referenced Academic Work — External Sources
Poisoning Web-Scale Training Datasets is Practical (Carlini et al.)
Carlini et al. demonstrated that an attacker needs as few as 250 poisoned samples to reliably backdoor a large language model trained on web-scale data — and that those samples can be injected by purchasing expired domains that once hosted legitimate training data. This finding is foundational to SPR{k3's detection threshold: if the attack surface begins at 250 samples, a scanner that only flags large-scale anomalies will miss the most dangerous insertions entirely. Ora's cross-repository temporal analysis was designed specifically to operate below that threshold — detecting coordinated poisoning patterns at 1–50 files, before they reach the scale required for reliable backdoor activation. View on arXiv →
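Ora's actual correlator is proprietary; as a shape of the idea only, a toy temporal correlator that flags any pattern family landing in several repositories within one window (all names and thresholds hypothetical):

```python
from collections import defaultdict
from datetime import timedelta

def coordinated(commits, window=timedelta(hours=48), min_repos=3):
    # commits: iterable of (repo, pattern_family, timestamp)
    by_family = defaultdict(list)
    for repo, family, ts in commits:
        by_family[family].append((ts, repo))
    flagged = set()
    for family, events in by_family.items():
        events.sort()
        for i, (t0, _) in enumerate(events):
            # A family appearing in >= min_repos distinct repos
            # inside one window suggests coordination, not coincidence
            repos = {r for t, r in events[i:] if t - t0 <= window}
            if len(repos) >= min_repos:
                flagged.add(family)
                break
    return flagged
```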
About SPR{k3
SPR{k3 was built by Dan Aridor. 12 CVEs assigned across 4 NVIDIA AI products: NeMo Framework, Megatron-LM, NeMo-Guardrails, and Apex. 79+ vulnerabilities confirmed across major ML frameworks.
We work closely with vendor security teams through coordinated responsible disclosure — 90-day timelines, professional documentation, and constructive collaboration. We help enterprises address the risks that fall between vendor scope boundaries and production reality.
The scanner is named Ora, after Dan's late aunt. SPR{k3 operates from Israel.
Contact: support@sprk3.com
Research: Dan Aridor — NVIDIA, Microsoft MSRC & Amazon Security Acknowledgements (2025, 2026)
Dan Aridor
Founder, SPR{k3 Security Research

Columbia Business School — MBA
Corporate finance, strategic partnerships, and fundraising.
Lt. Colonel, Israeli Intelligence Corps
Reserve service; retired 2012. Co-headed a counter-intelligence research unit.
Chairman, AEBi-Bio
Leads the SoAP biotechnology platform — reducing drug discovery attrition for challenging therapeutic targets.
Founder, inga314.ai · inga314.com & Dan Aridor Holdings
AI-driven logical analysis frameworks applied to research and data science. Strategic consulting firm specializing in operational profitability.
MuTaTo — Multi-Target Toxin Cancer Research
Connected to AEBi's experimental personalized cancer treatment concept — targeting multiple receptors on cancer cells simultaneously to prevent resistance, using a peptide-based Trojan Horse strategy. Early-stage research with promising in-vitro and mouse study results.

support@sprk3.com · NVIDIA, Microsoft MSRC & Amazon Security Acknowledgements (2025, 2026)
Get in Touch
All engagements begin with a conversation. No technical details shared until scope is agreed and an NDA is in place.
Request a Findings Briefing
We'll walk you through relevant findings for your ML stack — no commitment required. Email: support@sprk3.com
Enterprise Engagement
Findings Report, MPI Assessment, or Ongoing Monitoring Retainer. Contact us to discuss scope and fit. Email: support@sprk3.com
Security Research Inquiries
Coordinated disclosure, research collaboration, or press inquiries. Email: support@sprk3.com
SPR{k3 operates from Israel. Response within 1 business day.
SPR{k3 Security Research
support@sprk3.com
Patent Pending — US Provisional Application Filed October 8, 2025
© 2025–2026 SPR{k3 Security Research Team. All rights reserved.