AI-Ready by Design: How Adimab's 15+ Years of Data Integration Makes Every Antibody Program Smarter than the Last

The effective application of machine learning to antibody discovery depends more on well-structured, interconnected experimental data than on the sophistication of the algorithms used. Many organizations have experimental data scattered across separate LIMS, ELN, and Excel files, which limits cross-program analysis and the development of ML models. Adimab’s Atlas platform is a database designed specifically to address this challenge. It has been capturing antibody discovery data since 2009 and has accumulated information from 1,431 programs and 2.85 million clones.

Approach and outcomes

Atlas is organized into four integrated modules: a PostgreSQL-based Data Integration Layer that enforces naming conventions and entity relationships at creation time; an Assay Data Warehouse spanning 34+ assay types with standardized units and QC flags; a Natural Language Interface via the Model Context Protocol (MCP); and an ML Prediction Engine integrated with AWS SageMaker for sequence-to-property prediction.
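As a rough illustration of what enforcing naming conventions and entity relationships at creation time can look like, the sketch below uses database constraints so that a malformed identifier or an orphaned clone is rejected on insert. It is a minimal sketch under stated assumptions, not the Atlas schema: it uses Python's built-in sqlite3 as a stand-in for PostgreSQL, and the table names, the ID patterns (PRJ-0001, CLN-000001), and the helper functions are all hypothetical.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with hypothetical project and clone tables."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE project (
            -- Hypothetical naming convention: 'PRJ-' followed by four digits.
            project_id TEXT PRIMARY KEY
                CHECK (project_id GLOB 'PRJ-[0-9][0-9][0-9][0-9]')
        );
        CREATE TABLE clone (
            -- Hypothetical naming convention: 'CLN-' followed by six digits.
            clone_id TEXT PRIMARY KEY
                CHECK (clone_id GLOB 'CLN-[0-9][0-9][0-9][0-9][0-9][0-9]'),
            -- Every clone must belong to an existing project.
            project_id TEXT NOT NULL REFERENCES project(project_id)
        );
        """
    )
    return conn


def try_insert(conn: sqlite3.Connection, sql: str, params: tuple) -> bool:
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    db = make_db()
    print(try_insert(db, "INSERT INTO project VALUES (?)", ("PRJ-0001",)))
    print(try_insert(db, "INSERT INTO project VALUES (?)", ("project 1",)))
    print(try_insert(db, "INSERT INTO clone VALUES (?, ?)", ("CLN-000001", "PRJ-0001")))
    print(try_insert(db, "INSERT INTO clone VALUES (?, ?)", ("CLN-000002", "PRJ-9999")))
```

Putting the rules in the schema rather than in application code means every writer to the database is held to the same conventions, which is one plausible way to keep records linkable across programs.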
End-to-end clone lineage is bidirectional and tracked across six stages: Library Design, Selection Campaigns, Clone Identification, Sample Production, Characterization Assays, and Delivery. Users can trace forward from a library to all derived clones, or backward from a delivered clone to its source library. Clone origins include selection-acquired (73%), library-acquired (25%), and other sources (2%).
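The forward and backward tracing described above can be sketched with a simple parent/child index. This is an illustrative simplification rather than Atlas's data model: it assumes each entity has exactly one upstream parent, and the class name, method names, and identifiers (LIB-A, SEL-1, CLN-1, and so on) are hypothetical.

```python
from collections import defaultdict


class Lineage:
    """Minimal bidirectional lineage index (hypothetical, single-parent model)."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}
        self.children: defaultdict[str, list[str]] = defaultdict(list)

    def link(self, parent: str, child: str) -> None:
        """Record that `child` was derived from `parent`."""
        self.parent[child] = parent
        self.children[parent].append(child)

    def trace_back(self, entity: str) -> list[str]:
        """Walk upstream from an entity to its root, e.g. delivery to library."""
        path = [entity]
        while path[-1] in self.parent:
            path.append(self.parent[path[-1]])
        return path

    def trace_forward(self, entity: str) -> list[str]:
        """Collect everything derived from an entity, at any depth."""
        found: list[str] = []
        stack = [entity]
        while stack:
            node = stack.pop()
            for child in self.children.get(node, []):
                found.append(child)
                stack.append(child)
        return found


def build_example() -> Lineage:
    """One library, one selection campaign, two clones, one delivered clone."""
    lineage = Lineage()
    lineage.link("LIB-A", "SEL-1")   # Library Design -> Selection Campaign
    lineage.link("SEL-1", "CLN-1")   # Selection Campaign -> Clone Identification
    lineage.link("SEL-1", "CLN-2")
    lineage.link("CLN-1", "SMP-1")   # Clone -> Sample Production
    lineage.link("SMP-1", "ASY-1")   # Sample -> Characterization Assay
    lineage.link("ASY-1", "DEL-1")   # Assay -> Delivery
    return lineage


if __name__ == "__main__":
    example = build_example()
    print(example.trace_back("DEL-1"))
    print(sorted(example.trace_forward("LIB-A")))
```

In a relational database the same traversals would more likely be written as recursive queries over foreign keys; the in-memory version here only shows the two directions of travel.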
The Characterization Data Warehouse contains 4.1 million measurements across 16 assay platforms, all linked to their source clone and project. This includes 11,000+ site-specific accelerated stress measurements across 700+ antibodies, enabling ML models for deamidation and isomerization liability prediction that improve prediction scores 2–2.5x over baseline.
Over 11 years, the Atlas platform accumulated 103,000 hydrophobic interaction chromatography (HIC) measurements, which provided the training data for a model that predicts HIC retention time, a key developability indicator. Predicted values are stored alongside experimental results, creating a closed feedback loop in which each validated prediction supports future model refinement. The trained model now generates approximately 104,000 predictions per year, substantially expanding developability assessment throughput.
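One way to picture the feedback loop is a record that holds both the predicted and the measured retention time for a clone, so that any clone with both values becomes a validation point. The sketch below is a minimal illustration under assumptions: the record layout, field names, and the choice of mean absolute error as the comparison metric are hypothetical and are not taken from the Atlas platform.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HicRecord:
    """Hypothetical record storing a prediction next to its measurement."""
    clone_id: str
    predicted_rt: Optional[float] = None  # model output, minutes
    measured_rt: Optional[float] = None   # experimental result, minutes


def validated_pairs(records: list[HicRecord]) -> list[tuple[float, float]]:
    """Keep only clones that have both a prediction and a measurement."""
    return [
        (r.predicted_rt, r.measured_rt)
        for r in records
        if r.predicted_rt is not None and r.measured_rt is not None
    ]


def mean_absolute_error(pairs: list[tuple[float, float]]) -> float:
    """Average absolute gap between predicted and measured retention times."""
    if not pairs:
        raise ValueError("no validated predictions to score")
    return sum(abs(pred - meas) for pred, meas in pairs) / len(pairs)


if __name__ == "__main__":
    records = [
        HicRecord("CLN-000001", predicted_rt=10.0, measured_rt=10.5),
        HicRecord("CLN-000002", predicted_rt=9.0),  # not yet measured
        HicRecord("CLN-000003", predicted_rt=11.0, measured_rt=10.0),
    ]
    print(mean_absolute_error(validated_pairs(records)))
```

Because predictions and measurements share one record, newly measured clones automatically extend both the validation set and the pool of training data for the next model refresh.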
The Atlas platform is accessible via MCP, which translates natural language queries into validated SQL and returns structured results. Researchers without database expertise can query 17 years of linked experimental data directly. For example, users can ask "What are the HIC retention times for clones from project X?" and receive tabulated results without writing SQL. All queries are read-only and audit-logged.
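The read-only and audit-logging guarantees can be illustrated independently of the natural language step. The sketch below assumes the translation to SQL has already happened and shows only a conservative guard around query execution; it is not the Atlas or MCP implementation. It uses Python's built-in sqlite3 as a stand-in database, and the function names, the hic table, and the in-memory audit list are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical in-memory audit trail; a real system would persist this.
AUDIT_LOG: list[dict] = []


def demo_db() -> sqlite3.Connection:
    """Build a tiny example table of HIC retention times (made-up values)."""
    conn = sqlite3.connect(":memory:")
    with conn:  # commit the setup before any read-only queries run
        conn.execute("CREATE TABLE hic (clone_id TEXT, project_id TEXT, hic_rt REAL)")
        conn.executemany(
            "INSERT INTO hic VALUES (?, ?, ?)",
            [("CLN-000001", "PRJ-0001", 9.8), ("CLN-000002", "PRJ-0001", 10.4)],
        )
    return conn


def run_read_only(conn: sqlite3.Connection, user: str, sql: str, params: tuple = ()) -> list:
    """Log every request, then execute it only if it is a single SELECT."""
    statement = sql.strip().rstrip(";")
    # Conservative check: must start with SELECT and contain no further ';'.
    # This also rejects some valid queries (e.g. a ';' inside a string literal).
    allowed = statement.lower().startswith("select") and ";" not in statement
    AUDIT_LOG.append(
        {
            "user": user,
            "sql": sql,
            "allowed": allowed,
            "at": datetime.now(timezone.utc).isoformat(),
        }
    )
    if not allowed:
        raise PermissionError("only single SELECT statements are permitted")
    # Defense in depth: SQLite refuses writes on this connection from here on.
    conn.execute("PRAGMA query_only = ON")
    return conn.execute(statement, params).fetchall()


if __name__ == "__main__":
    db = demo_db()
    query = "SELECT clone_id, hic_rt FROM hic WHERE project_id = ? ORDER BY clone_id"
    print(run_read_only(db, "alice", query, ("PRJ-0001",)))
```

Logging before the permission check means rejected requests are recorded as well, which is usually what an audit trail is for. In production, a database role with read-only privileges would be a stronger control than string inspection alone.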
Atlas captures data across 1,431 projects spanning Type I (52%), Type II (37%), Type III (10%), and other antibody formats, providing the cross-program dataset breadth needed to train models that generalize across target classes and therapeutic formats.

Why it matters

Schema and workflow design choices made in 2010 enabled machine learning applications that were not deployed until more than a decade later, illustrating that data architecture decisions precede and constrain ML capability. The Atlas platform demonstrates that a domain-specific relational model, used consistently as an operational system over many years, produces a foundation for ML models that depend on large-scale, linked experimental data. Natural language querying via MCP lowers the barrier to institutional data access, making 17 years of linked experimental data available to all team members regardless of database expertise. Each new Adimab program can be immediately queried in the context of all prior data, meaning every antibody program benefits from the accumulated knowledge of all programs that came before it.
