MolHuiTu — Molecular HyperGraph

Intelligent Drug–Target Interaction (DTI) Prediction Platform

A GPU-accelerated DTI system that combines a hypergraph molecular encoder, a protein language model (ProtBert), atom- and residue-level explanations, and a clean web UI.

Platform Homepage


Overview

MolHuiTu (Molecular Intelligence Graph) predicts drug–target interactions from a ligand SMILES string and a protein FASTA sequence. It returns a prediction score (optionally calibrated to a probability) and explains the prediction by highlighting key atoms and residues. The web UI includes interactive 3D visualization, batch job management, and downloadable reports.


Feature Highlights

  • Hypergraph Molecular Encoder — Captures multi-body patterns (rings, functional groups, H-bonds) beyond pairwise bonds via hyperedges, with masked-autoencoder pretraining, so that motifs spanning several atoms are modeled as single units.
  • Protein Language Model (ProtBert) — Transformer embeddings of amino-acid sequences (mean/CLS pooling), fused with ligand embeddings for robust DTI scoring.
  • End-to-End Inference — Single query and high-throughput batch CSV screening; optional probability calibration so that scores can be read as probabilities in deployment.
  • Integrated Explainability — Atom-level SHAP and residue-level occlusion with Top-K contributors and a consistency check.
  • One-Stop Context — Hooks for PubChem / UniProt / AlphaFold / RCSB PDB to enrich reports and drive 3Dmol.js visualization.
  • Practical UX — Clean web UI, job history, CSV export, and report pages; GPU-optimized backend validated on NVIDIA RTX 4090.
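
The residue-level occlusion and Top-K ranking mentioned in the list above can be illustrated with a minimal sketch. It is not MolHuiTu's implementation: `score_fn` stands for any function that maps a sequence to a score, `toy_score` is a hypothetical stand-in model, and masking with "X" is an assumption about how a residue is occluded.

```python
from typing import Callable, List, Tuple


def occlusion_top_k(
    sequence: str,
    score_fn: Callable[[str], float],
    k: int = 3,
    mask_token: str = "X",  # assumption: "X" (unknown residue) is used as the mask
) -> List[Tuple[int, str, float]]:
    """Rank residues by how much the score drops when each one is masked."""
    baseline = score_fn(sequence)
    drops = []
    for i, residue in enumerate(sequence):
        occluded = sequence[:i] + mask_token + sequence[i + 1:]
        drops.append((i, residue, baseline - score_fn(occluded)))
    # Largest drop first; the sort is stable, so ties keep sequence order.
    drops.sort(key=lambda item: item[2], reverse=True)
    return drops[:k]


def toy_score(seq: str) -> float:
    """Hypothetical stand-in for a DTI model: fraction of aromatic residues."""
    return sum(seq.count(aa) for aa in "FWY") / len(seq)


if __name__ == "__main__":
    for index, residue, drop in occlusion_top_k("AFGWA", toy_score, k=2):
        print(index, residue, round(drop, 3))
```

With a real model, `score_fn` would wrap the full encoder and prediction head, so each occluded sequence costs one forward pass.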

Traditional graph vs hypergraph: in a simple graph every edge joins exactly two atoms (a bond); MolHuiTu uses hyperedges, each of which can connect any number of atoms, so functional motifs such as rings are represented natively.
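
To make the contrast concrete, here is a small sketch that encodes toluene both ways. The atom indexing, the hyperedge names, and the incidence-matrix representation are illustrative assumptions; the page does not specify how MolHuiTu stores its hypergraphs.

```python
from typing import Dict, FrozenSet, List

# Toluene heavy atoms, hand-indexed for illustration:
# 0 = methyl carbon, 1-6 = ring carbons (1 carries the methyl group).
atoms: List[str] = ["C", "C", "C", "C", "C", "C", "C"]

# Simple graph: every edge joins exactly two atoms.
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1)]

# Hypergraph: a hyperedge may join any number of atoms.
# The names below are hypothetical labels, not MolHuiTu identifiers.
hyperedges: Dict[str, FrozenSet[int]] = {
    "aromatic_ring": frozenset({1, 2, 3, 4, 5, 6}),
    "methyl_attachment": frozenset({0, 1}),
}


def incidence_matrix(n_atoms: int, edges: Dict[str, FrozenSet[int]]) -> List[List[int]]:
    """Rows are atoms, columns are hyperedges; 1 marks membership."""
    return [
        [1 if atom in members else 0 for members in edges.values()]
        for atom in range(n_atoms)
    ]


H = incidence_matrix(len(atoms), hyperedges)
node_degree = [sum(row) for row in H]        # hyperedges per atom
edge_size = [sum(col) for col in zip(*H)]    # atoms per hyperedge

if __name__ == "__main__":
    print(node_degree)
    print(edge_size)
```

The ring needs six separate edges in the simple graph but only one hyperedge here, which is the property the encoder relies on.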


Guided Tour

Home → Entry Points

A minimal home screen routes to single prediction, batch submission, and history.

Prediction Interface

Single Prediction → Fill & Submit

Provide SMILES and a FASTA (or UniProt ID). Toggle explainability if needed.
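
As a rough sketch of how the protein field could distinguish a UniProt ID from a FASTA entry before submission: the function name, the returned dictionary shape, and the accepted residue alphabet are assumptions for illustration, not the platform's actual validation code. The accession pattern is the one published in the UniProt help pages.

```python
import re
from typing import Dict, Optional

# UniProt accession format as documented by UniProt.
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)

# Assumption: accept the 20 standard amino acids plus X for unknown residues.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWYX")


def parse_protein_input(text: str) -> Dict[str, Optional[str]]:
    """Classify the input as a UniProt accession or a (FASTA or raw) sequence."""
    text = text.strip()
    if UNIPROT_ACCESSION.fullmatch(text):
        return {"kind": "uniprot_id", "header": None, "value": text}

    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    # A FASTA record starts with a ">" header line; a raw sequence has none.
    header = lines[0][1:] if lines and lines[0].startswith(">") else None
    body = lines[1:] if header is not None else lines
    sequence = "".join(body).upper()

    unexpected = sorted(set(sequence) - VALID_RESIDUES)
    if not sequence or unexpected:
        raise ValueError(f"invalid protein input; unexpected characters: {unexpected}")
    return {"kind": "sequence", "header": header, "value": sequence}


if __name__ == "__main__":
    print(parse_protein_input("P05067"))
    print(parse_protein_input(">example protein\nMKTAYIAK\nQRQISFVK"))
```

A protein sequence contains no digits, so it cannot be mistaken for an accession, which always has a digit in second position. SMILES validation is left out here because it would normally be delegated to a cheminformatics toolkit such as RDKit.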

Batch Screening

Reports → Scores & Rationale

Per-sample reports summarize inputs, prediction scores, and visuals.

Report Details


Architecture

Architecture Diagram

  • Drug Encoder — HyperGraph-MAE: Represent molecules as hypergraphs (nodes = atoms; hyperedges = rings/groups/relations). Pretrain with degree-aware masking and reconstruction; aggregate via multi-head attention → fixed-size ligand embedding.
  • Protein Encoder — ProtBert: Transformer embeddings from ProtBert (HuggingFace); mean/CLS pooling configurable → protein embedding.
  • Fusion & Prediction — XGBoost Head: Concatenate (or bilinearly fuse) ligand/protein embeddings → XGBoost for classification (interaction probability) or regression (binding affinity). Optional Platt or isotonic calibration brings predicted probabilities closer to observed frequencies.
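
The pooling, concatenation, and Platt-style calibration steps named above can be sketched in plain Python. This is a simplified illustration under stated assumptions: embeddings are small lists rather than tensors, the first token is taken to be the [CLS] position, mean pooling averages every token it is given, and the calibration parameters `a` and `b` are hypothetical values that would normally be fitted on held-out data. The XGBoost head itself is omitted.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_pool(token_embeddings: Sequence[Vector]) -> Vector:
    """Average per-token vectors into one fixed-size vector."""
    n = len(token_embeddings)
    return [sum(column) / n for column in zip(*token_embeddings)]


def cls_pool(token_embeddings: Sequence[Vector]) -> Vector:
    """Use the first token's vector (assumed to be the [CLS] position)."""
    return list(token_embeddings[0])


def concat_fuse(ligand: Vector, protein: Vector) -> Vector:
    """Concatenate ligand and protein embeddings into one feature vector."""
    return list(ligand) + list(protein)


def platt_calibrate(raw_score: float, a: float, b: float) -> float:
    """Platt-style scaling: logistic function of an affine transform of the score."""
    return 1.0 / (1.0 + math.exp(-(a * raw_score + b)))


if __name__ == "__main__":
    protein_tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # toy per-token embeddings
    ligand_embedding = [0.5]                               # toy ligand embedding
    features = concat_fuse(ligand_embedding, mean_pool(protein_tokens))
    print(features)
    print(platt_calibrate(0.0, a=1.0, b=0.0))
```

Concatenation keeps the two embeddings independent and leaves interaction modeling to the tree ensemble; a bilinear fusion would instead build cross terms between the two vectors before the head.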

Backend stack: PyTorch (+ CUDA), PyTorch Geometric, RDKit, XGBoost, SHAP, 3Dmol.js (frontend).