Abstract Title
An AI for an AI: can the next zoonotic avian influenza spillover be identified straight from its genome?
Abstract
Avian influenza (AI) remains a serious threat to human health. Although zoonotic events in the current outbreak have been limited, there are concerns that new strains could emerge that transmit more readily between humans.
Studies have shown that machine learning and artificial intelligence models, if trained appropriately on viral genome sequences, can detect a virus's suitability for given hosts. We build on this work by tailoring methods to influenza genomes and their segmented structure, and we develop models to estimate the zoonotic potential of AI.
We sourced >18,000 whole-genome sequences of AI from 122 subtypes, including 14 subtypes with known zoonotic events. To prevent models over-fitting to well-sampled lineages, we downsampled by shared sequence identity to ~4,000 representative non-zoonotic sequences and ~100 zoonotic sequences before model training.
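As a minimal illustration of identity-based downsampling, the sketch below greedily keeps a sequence only if its identity to every representative already kept falls below a chosen threshold. The abstract does not state the clustering tool, alignment step, or threshold used. The pre-aligned input, the position-wise identity measure, the 0.99 default, and the function names are all hypothetical. Presumably such a step would be run separately on the zoonotic and non-zoonotic sets.

```python
from typing import List


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def greedy_representatives(seqs: List[str], threshold: float = 0.99) -> List[int]:
    """Return indices of representative sequences (hypothetical helper).

    A sequence is kept only if its identity to every representative kept
    so far is below `threshold`; otherwise it is treated as a near-duplicate
    of an already-sampled lineage and dropped.
    """
    reps: List[int] = []
    for i, s in enumerate(seqs):
        if all(identity(s, seqs[j]) < threshold for j in reps):
            reps.append(i)
    return reps


if __name__ == "__main__":
    toy = ["ACGTACGTAC", "ACGTACGTAA", "TTTTACGTAC"]
    # The second sequence is 90% identical to the first, so it is dropped at 0.85.
    print(greedy_representatives(toy, threshold=0.85))
```

Because the selection is greedy, the representatives depend on input order. Sorting sequences first (e.g. by completeness) is one way to make the choice deliberate.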
We applied multiple machine learning algorithms (e.g., random forests, gradient boosted machines) to predict zoonotic status based on genomic and proteomic traits (e.g., k-mer composition, codon biases, protein physicochemistry).
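One of the genomic traits named above, k-mer composition, can be sketched as a fixed-length vector of relative k-mer frequencies that a classifier such as a random forest could take as input. The abstract does not specify k, how ambiguous bases were handled, or whether features were computed per segment. The choice of k=2, skipping windows with non-ACGT characters, and the function name are illustrative assumptions.

```python
from itertools import product
from typing import Dict


def kmer_composition(seq: str, k: int = 2) -> Dict[str, float]:
    """Relative frequency of every possible DNA k-mer in `seq` (hypothetical helper).

    Windows containing non-ACGT characters (e.g. ambiguous 'N' calls) are
    skipped, so frequencies are computed over valid windows only. All 4**k
    k-mers are always present as keys, giving a fixed-length feature vector.
    """
    seq = seq.upper()
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    if total == 0:
        return {kmer: 0.0 for kmer in counts}
    return {kmer: n / total for kmer, n in counts.items()}


if __name__ == "__main__":
    comp = kmer_composition("ACGTA", k=2)
    print({kmer: f for kmer, f in comp.items() if f > 0})
```

For a segmented genome, one option is to compute this vector for each of the eight influenza segments and concatenate the results, so that a model can attribute signal to individual segments.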
We found that zoonotic sequences can be distinguished with strong performance that generalises even to rare subtypes, e.g., H10N8. We also identified influential genetic patterns associated with human infection and the hotspots across the AI genome where they are concentrated. We combined the best-performing models into a single interface that can generate zoonotic risk predictions for new input sequences.
Our findings suggest key genomic sites at which to monitor the evolution of AI circulating in birds, and our models can provide early zoonotic risk assessment for new lineages as soon as a genome is sequenced.
Co-Author(s)
Liam Brierley (University of Glasgow),
Joaquin Mould-Quevedo (CSL Seqirus USA),
Matthew Baylis (University of Liverpool)
Abstract Category
Avian influenza in mammals, pandemic preparedness, and one health