Tushar Jain, Todd Boland, Asparouh Lilov, Irina Burnina, Michael Brown, Yingda Xu, Maximiliano Vásquez
Bioinformatics, 33(23), 3758–3766, DOI: 10.1093/bioinformatics/btx519
August 16, 2017
Adimab scientists present a computational method that predicts antibody hydrophobicity directly from the amino acid sequence, an advance for the early assessment of therapeutic antibody developability. Hydrophobicity is a critical biophysical property that can influence the manufacturability, long-term storage, and potentially the formulation of an antibody. Highly hydrophobic antibodies, identified by delayed retention times in hydrophobic interaction chromatography (HIC), are typically more prone to aggregation, which can reduce potency, induce immunogenicity, and accelerate serum clearance, all of which are undesirable for therapeutic drug development.
The central hypothesis of this study is that an antibody's hydrophobic character can be predicted directly from sequence data without constructing a full 3D structural model. Rather than mapping solvent-exposed residues through structural analysis, the authors propose that machine learning can be used to extract sufficient information from the amino acid sequence to accurately infer this key biophysical property.
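To validate this hypothesis, the study pursued two main objectives: predicting the solvent exposure of each variable-region residue from sequence, and using those predicted exposures together with experimental HIC data to identify antibodies with delayed retention.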
Methodology and technical approach
The authors developed a two-step computational strategy to evaluate antibody hydrophobicity directly from sequence data in the variable region. The initial model was trained on a curated set of 902 antibody crystal structures from the Protein Data Bank, selected for kappa light chains and distinct complementarity-determining region (CDR) sequences.
First, a random forest model was trained to predict how exposed each amino acid in an antibody's variable region is to solvent, a quantity known as the solvent-accessible surface area (SASA). Exposure indicates how likely a residue is to participate in hydrophobic interactions. The model learned from known structural data and then predicted surface exposure from patterns in the amino acid sequence alone.
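As a rough illustration of how local sequence context can be turned into model inputs, the sketch below one-hot encodes each residue together with its neighbors. The function name `window_features`, the padding symbol, and the window size are assumptions made for this example; the summary does not describe the paper's actual feature set. Rows like these could then be passed to any random forest regressor (for instance scikit-learn's `RandomForestRegressor`) trained against SASA values computed from crystal structures.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "-"  # hypothetical padding symbol for positions beyond the sequence ends


def window_features(sequence, half_width=2):
    """One-hot encode every residue together with its sequence neighbors.

    Returns one row per residue. Each row concatenates a 21-element one-hot
    vector (20 amino acids plus padding) for each position in the window.
    This is an illustrative encoding, not the feature set used in the paper.
    """
    alphabet = AMINO_ACIDS + PAD
    padded = PAD * half_width + sequence + PAD * half_width
    rows = []
    for i in range(len(sequence)):
        # Window centered on residue i of the original sequence.
        window = padded[i : i + 2 * half_width + 1]
        row = []
        for residue in window:
            if residue not in alphabet:
                raise ValueError(f"unexpected residue symbol: {residue!r}")
            row.extend(1 if residue == symbol else 0 for symbol in alphabet)
        rows.append(row)
    return rows


# Example: encode a short, made-up fragment with one neighbor on each side.
features = window_features("EVQLV", half_width=1)
```

Each row has `(2 * half_width + 1) * 21` entries, so widening the window adds context at the cost of more input dimensions.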
Next, experimental HIC retention time data from a large panel of antibodies were combined with the predicted SASA values in a logistic regression model, which yields a propensity score for each amino acid type and identifies antibodies at risk of delayed retention.
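The second step can be sketched in the same spirit: sum the predicted exposure per amino acid type, then pass a weighted sum of those totals through a logistic function. Everything named here (`exposed_composition`, `delayed_retention_probability`, `EXAMPLE_WEIGHTS`) is hypothetical, and the weights are invented for illustration. In practice the weights would be fitted to HIC retention data, and the paper's real features and coefficients are not given in this summary.

```python
import math


def exposed_composition(sequence, relative_sasa):
    """Sum predicted relative exposure (0 to 1) per amino acid type."""
    if len(sequence) != len(relative_sasa):
        raise ValueError("sequence and exposure values must have equal length")
    totals = {}
    for residue, exposure in zip(sequence, relative_sasa):
        totals[residue] = totals.get(residue, 0.0) + exposure
    return totals


def delayed_retention_probability(composition, weights, intercept=0.0):
    """Logistic model: sigmoid of a weighted sum of exposed composition."""
    z = intercept + sum(
        weights.get(residue, 0.0) * amount
        for residue, amount in composition.items()
    )
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights for illustration only; NOT values from the paper.
# Positive weights raise the predicted risk, negative weights lower it.
EXAMPLE_WEIGHTS = {"W": 1.0, "F": 0.8, "L": 0.5, "D": -0.7, "K": -0.6}

composition = exposed_composition("WKW", [0.5, 1.0, 0.25])
probability = delayed_retention_probability(composition, EXAMPLE_WEIGHTS)
```

With weights of this shape, exposed hydrophobic residues push the probability up and exposed charged residues pull it down, matching the qualitative trend the study reports for its fitted amino acid scores.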
Together, these models form a sequence-based framework for predicting hydrophobicity-related developability risks in therapeutic antibodies.
Key findings and impact
This study demonstrated that antibody hydrophobicity can be predicted with high accuracy from sequence information alone, without requiring a structural model of atomic-level accuracy. The surface exposure model achieved strong predictive performance, particularly when local sequence context was included, and the follow-on model reliably identified antibodies with delayed HIC retention, a key indicator of aggregation risk. Importantly, the amino acid scores derived from the model aligned well with biophysical expectations: hydrophobic residues increased risk, while charged residues reduced it. The approach outperformed simpler prediction methods and worked well on clinical-stage antibodies.
Significant implications for therapeutic antibody development
By enabling early, rapid, and scalable assessment of hydrophobicity risk, this method supports better candidate selection, guides targeted sequence optimization, and can improve the design of antibody libraries, all without relying on structural models. This represents a powerful advance for streamlining therapeutic antibody development.
For more details, read the full article in Bioinformatics.