MolHuiTu — Molecular HyperGraph

Intelligent Drug–Target Interaction (DTI) Prediction Platform

A next-generation, GPU-accelerated DTI system that fuses hypergraph molecular encoders, protein language models (ProtBert), rigorous explainability, and a clean web UI.

Overview

MolHuiTu (Molecular Intelligence Graph) predicts drug–target interactions from a SMILES (ligand) and a FASTA (protein). It returns a calibrated score and explains the prediction by highlighting key atoms and residues. The web UI includes interactive 3D visualization, batch job management, and downloadable reports.

Feature Highlights

Hypergraph Molecular Encoder — Captures multi-body patterns (rings, functional groups, H-bonds) beyond pairwise bonds via hyperedges and a masked-autoencoder pretrain; improves modeling of complex chemistry.
Protein Language Model (ProtBert) — Transformer embeddings of amino-acid sequences (mean/CLS pooling), fused with ligand embeddings for robust DTI scoring.
End-to-End Inference — Single query and high-throughput batch CSV screening; optional probability calibration for deployment realism.
Integrated Explainability — Atom-level SHAP and residue-level occlusion with Top-K contributors and a consistency check.
One-Stop Context — Hooks for PubChem / UniProt / AlphaFold / RCSB PDB to enrich reports and drive 3Dmol.js visualization.
Practical UX — Clean web UI, job history, CSV export, and report pages; GPU-optimized backend validated on NVIDIA RTX 4090.

Traditional graph vs hypergraph: a simple graph restricts bonds to pairs; MolHuiTu uses hyperedges to connect any number of atoms so functional motifs are represented natively.

Guided Tour

Home → Entry Points

A minimal home screen routes to single prediction, batch submission, and history.

Single Prediction → Fill & Submit

Provide SMILES and a FASTA (or UniProt ID). Toggle explainability if needed.

Reports → Scores & Rationale

Per-sample reports summarize inputs, prediction scores, and visuals.

Architecture

Drug Encoder — HyperGraph-MAE: Represent molecules as hypergraphs (nodes = atoms; hyperedges = rings/groups/relations). Pretrain with degree-aware masking and reconstruction; aggregate via multi-head attention → fixed-size ligand embedding.
Protein Encoder — ProtBert: Transformer embeddings from ProtBert (HuggingFace); mean/CLS pooling configurable → protein embedding.
Fusion & Prediction — XGBoost Head: Concatenate (or bilinear fuse) ligand/protein embeddings → XGBoost for classification (probability) or regression (affinity). Optional Platt / Isotonic calibration improves reliability.

Backend stack: PyTorch (+ CUDA), PyTorch Geometric, RDKit, XGBoost, SHAP, 3Dmol.js (frontend).