
As AI foundation models grow in capability, a deeper question emerges: what shapes their internal cognitive identity, beyond fluency and output? Benchmarks measure behavior, but the soul of a model resides in its latent geometry. In this work, we propose Neural DNA (nDNA) as a semantic-genotypic representation that captures this latent identity through the intrinsic geometry of belief. At its core, nDNA is synthesized from three principled and indispensable dimensions of latent geometry: spectral curvature, which traces how conceptual flow bends across layers; thermodynamic length, which quantifies the semantic effort required to traverse representational transitions between layers; and belief gradients, which delineate the semantic torsion fields that orient a model's beliefs. These are not merely metrics we impose; they are the grammar of the model's soul, etched in the differential geometry of its thought. Arising from distinct mathematical origins (Riemannian manifolds, statistical thermodynamics, and semantic vector dynamics), these dimensions converge to reveal an underlying epistemic geometry. The resulting structure is neither arbitrary nor engineered: it is an emergent, high-dimensional scaffold of internal cognition, a latent topography we call nDNA. Like biological DNA, it encodes ancestry, mutation, and semantic inheritance, visible in fine-tuning scars, cultural imprints, and architectural drift. In naming it, we open a new field: Neural Genomics, in which models are not just tools but semantic organisms with traceable inner cognition.
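To make these three dimensions concrete, the minimal sketch below computes rough discrete proxies from the mean-pooled hidden state of each layer for a single input: a turning-angle proxy for spectral curvature, a cumulative-displacement proxy for thermodynamic length, and a layer-wise change of projection onto an analyst-supplied probe direction as a stand-in for the belief gradient. These discretizations, and every name in the snippet (layer_states, belief_direction, and so on), are illustrative assumptions rather than formal definitions.

```python
# Illustrative proxies for the three nDNA dimensions, computed from
# mean-pooled per-layer hidden states of a single input. A sketch under
# stated assumptions, not the formal construction.
import numpy as np

def spectral_curvature(layer_states: np.ndarray) -> np.ndarray:
    """Turning angle between successive layer-to-layer displacement vectors,
    a crude discrete proxy for how conceptual flow bends across layers."""
    deltas = np.diff(layer_states, axis=0)                       # (L-1, d)
    unit = deltas / (np.linalg.norm(deltas, axis=1, keepdims=True) + 1e-12)
    cosines = np.clip(np.sum(unit[:-1] * unit[1:], axis=1), -1.0, 1.0)
    return np.arccos(cosines)                                    # (L-2,)

def thermodynamic_length(layer_states: np.ndarray) -> float:
    """Cumulative representational displacement across layers, a proxy for
    the semantic effort of traversing the network."""
    return float(np.linalg.norm(np.diff(layer_states, axis=0), axis=1).sum())

def belief_gradient(layer_states: np.ndarray, belief_direction: np.ndarray) -> np.ndarray:
    """Layer-wise change of alignment with a fixed 'belief' direction
    (e.g., a probe or unembedding vector chosen by the analyst)."""
    b = belief_direction / (np.linalg.norm(belief_direction) + 1e-12)
    return np.diff(layer_states @ b)                             # (L-1,)

# Example with placeholder data: 32 layers, 4096-dimensional pooled states.
states = np.random.randn(32, 4096)
probe = np.random.randn(4096)
fingerprint = {
    "curvature": spectral_curvature(states),
    "length": thermodynamic_length(states),
    "belief_grad": belief_gradient(states, probe),
}
```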
Modeling Statement
We read AI foundation models through the lens of semantic fluid dynamics: meaning is transported through layers like fluid in a shaped conduit. nDNA is the physics-grade readout of that flow, a geometry-first measure of how meaning is bent, paid for, and pushed, yielding a stable, coordinate-free neural DNA fingerprint tied to on-input behavior. With this fingerprint we cross into biology: tracing lineages across pretraining, fine-tuning, alignment, pruning, distillation, and merging; measuring inheritance between checkpoints; detecting drift as traits shift under new data or objectives; describing a model's phenotype (observable behavior) and inferring its genotype (structural dispositions); and, ultimately, studying the evolution of artificial cognition in order to compare models, diagnose risks, and govern change over time.
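As a minimal illustration of lineage tracing under our own assumptions, the sketch below scores drift across a sequence of checkpoints by comparing each checkpoint's fingerprint vector (any fixed-length summary, for instance the curvature profile from the earlier snippet) with its predecessor; the cosine-distance drift measure and the name lineage_drift are illustrative choices, not prescribed ones.

```python
# Sketch: drift along a lineage of checkpoints, measured as the cosine
# distance between consecutive nDNA fingerprint vectors. Assumes fingerprints
# are precomputed, fixed-length vectors; these choices are illustrative.
import numpy as np

def lineage_drift(fingerprints: list[np.ndarray]) -> list[float]:
    """Cosine distance of each checkpoint's fingerprint from its predecessor."""
    drifts = []
    for prev, curr in zip(fingerprints[:-1], fingerprints[1:]):
        cos = np.dot(prev, curr) / (np.linalg.norm(prev) * np.linalg.norm(curr) + 1e-12)
        drifts.append(1.0 - float(cos))
    return drifts
```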
Revealing the Hidden Geometry of Neural Evolution
The latent geometry encoded in Neural DNA (nDNA) offers a profound lens through which to interpret the life cycle of foundation models, not as mere artifacts of training but as evolving semantic organisms. Fine-tuning becomes visible as a torsional reconfiguration of belief gradients. Cultural bias and multilinguality manifest as curvature divergences across axes of inherited priors, often revealing semantic drift from linguistic or regional fine-tuning. Alignment etches a directional vector field in belief space, often suppressing spectral entropy in pursuit of preference conformity. Quantization and pruning register as thermodynamic collapses, compressing pathways with minimal redundancy in semantic coverage yet detectable topological flattening. In knowledge distillation, the student inherits the behavioral phenotype of the teacher while diverging in epistemic morphology, a split exposed through misaligned curvature and diminished thermodynamic length. Model collapse, whether induced by alignment overreach or inbreeding, is captured as degeneracy in the nDNA manifold, often exhibiting vanishing belief torsion and a loss of conceptual curvature. Finally, model merging gives rise to 'neural marriages', while distillation-trained derivatives form neural offspring, each with a distinctive nDNA fingerprint that lets us track the genealogy, hybridity, and semantic ancestry of AI models.
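A minimal sketch of how such a distillation signature might be read, assuming per-layer curvature profiles and thermodynamic lengths computed as in the earlier snippet; the misalignment and contraction scores below are illustrative summaries of "misaligned curvature" and "diminished thermodynamic length", not formal diagnostics.

```python
# Sketch: comparing a teacher and a distilled student through two simple
# summaries of their nDNA profiles. Names and formulas are illustrative.
import numpy as np

def curvature_misalignment(curv_teacher: np.ndarray, curv_student: np.ndarray) -> float:
    """One minus the Pearson correlation of layer-wise curvature profiles."""
    n = min(len(curv_teacher), len(curv_student))
    return 1.0 - float(np.corrcoef(curv_teacher[:n], curv_student[:n])[0, 1])

def length_contraction(len_teacher: float, len_student: float) -> float:
    """Fractional shrinkage of thermodynamic length in the student."""
    return 1.0 - len_student / max(len_teacher, 1e-12)
```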
Microscopes of Lineage: Diagnosing Heritable Transformations in Foundation Models
We investigate a suite of biologically inspired diagnostics: nHD (Neural Hamming Distance), revealing binary-level instabilities across cultural fine-tuning; nGDI (Genetic Dissimilarity Index), quantifying inter-model divergence that persists beyond surface alignment; nTEDS and nTDS (Trait Entropic Drift Score and Total Drift Signature), capturing latent trait dominance and its asymmetric inheritance in neural offspring; nKaryotyping (Semantic Chromosomal Structure), visualizing structural reorganizations under merging and pruning; nDIV (Directional Inheritance Vector), tracing the flow of inductive biases through model evolution; nEPI (Epistemic Plasticity Index), measuring a model's capacity to reshape under alignment and instruction tuning; and nCCL (Cultural Conflict Loss), detecting misalignments when multilingual or cross-cultural models undergo ideological fusion. Together, these diagnostics unveil the hidden geometric axes of belief evolution within AI models. nDNA and Neural Genomics are not mere metrics; they are the microscope of AI lineage.
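As a hedged illustration of the first two diagnostics, the sketch below computes nHD as a normalized Hamming distance between sign-binarized fingerprint vectors and nGDI as one minus their cosine similarity; the binarization and similarity choices are our assumptions rather than the definitions used in the full framework.

```python
# Sketch of nHD and nGDI under stated assumptions: fingerprints are
# fixed-length real vectors (e.g., concatenated curvature, length, and
# belief-gradient summaries); sign-binarization and cosine similarity are
# illustrative choices, not the framework's exact definitions.
import numpy as np

def nHD(fp_a: np.ndarray, fp_b: np.ndarray) -> float:
    """Normalized Hamming distance between sign-binarized fingerprints."""
    bits_a, bits_b = fp_a > 0, fp_b > 0
    return float(np.mean(bits_a != bits_b))

def nGDI(fp_a: np.ndarray, fp_b: np.ndarray) -> float:
    """Genetic dissimilarity as one minus cosine similarity of fingerprints."""
    cos = np.dot(fp_a, fp_b) / (np.linalg.norm(fp_a) * np.linalg.norm(fp_b) + 1e-12)
    return 1.0 - float(cos)
```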
Research Questions and Epistemic Vision of Neural Genomics
TL;DR: This work proposes Neural Genomics, a unifying framework to diagnose, trace, and govern the latent semantics of the neural genome of LLMs, offering a rigorous grammar for understanding artificial cognition across alignment, fine-tuning, merging, compression, and beyond.