Chromosome-level assembly of the Cascade hop genome

Sequenced with PacBio long reads, assembled with FALCON, phased with FALCON-Unzip, and scaffolded with Hi-C.

         Masked assembly     FASTA (1,533 scaffolds; 439 Mb)

         Masked assembly     FASTA (10 scaffolds; 390 Mb)

         Genetic map     TSV

         Linkage disequilibrium map     TSV

Repeat sequences identified with LTRharvest, LTR_FINDER_parallel, LTR_retriever, and RepeatMasker

         All repeat coordinates     GFF

         LTR coordinates     GFF

         Protein sequences (full assembly)     76,595 sequences (FASTA)

         CDS (full assembly)     76,595 sequences (FASTA)

         Full transcript sequences (full assembly)     76,595 sequences (FASTA)

         Protein sequences (10 scaffolds)     54,888 sequences (FASTA)

         CDS (10 scaffolds)     54,888 sequences (FASTA)

         Gene mapping file     (TSV)


         Similarity to UniProt genes and Pfam domains     26,654 genes

         Similarity to repeat-associated genes and Pfam domains     34,840 genes

         Genes without similarity to known genes     15,101 genes

         Biological Processes GO terms     19,147 genes

         Cellular Component GO terms     20,674 genes

         Molecular Function GO terms     19,385 genes

         Defense response genes with GO terms     

         Terpene-associated genes with GO terms     

         Hop vs hemp (CBDRx) MCScanX collinearity file     

         OrthoFinder Orthogroups.tsv     

         Protein sequences (full assembly)     21,698 sequences (FASTA)

         CDS (full assembly)     21,698 sequences (FASTA)

         Protein sequences (10 scaffolds)     20,581 sequences (FASTA)

         CDS (10 scaffolds)     20,581 sequences (FASTA)

Original TransDecoder output

         Protein sequences     57,684 sequences (FASTA)

         CDS     57,684 sequences (FASTA)

         Transcript sequences     57,684 sequences (FASTA)

         BED file     (BED)

         Gene-centric GFF3     (GFF3)

         Genome-centric GFF3     (GFF3)

         Protein sequences     71,233 sequences (FASTA)

         CDS     71,233 sequences (FASTA)

         Transcript sequences     71,233 sequences (FASTA)

         Gene GFF     (GFF)
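
The sequence counts listed above can be checked against a downloaded FASTA file. The following is a minimal sketch, assuming the file is plain or gzip-compressed FASTA in which each record begins with a ">" header line; the function name count_fasta_records and the command-line usage are illustrative and not part of this site.

```python
import gzip
import sys
from pathlib import Path


def count_fasta_records(path):
    """Count FASTA records by counting lines that start with '>'."""
    path = Path(path)
    # Assumption: a ".gz" suffix means the file is gzip-compressed.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        return sum(1 for line in handle if line.startswith(">"))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_fasta.py <downloaded FASTA file>
    print(count_fasta_records(sys.argv[1]))
```

For example, running this on the full-assembly protein FASTA should report 76,595 if the download is complete and matches the count listed above.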