Potato Genome Sequencing Consortium Public Data Release

Sequence files and related information from the Potato Genome Sequencing Consortium (PGSC). The PGSC has sequenced two potato genotypes: the heterozygous diploid S. tuberosum Group Tuberosum cultivar RH89-039-16 (RH) and the doubled monoploid S. tuberosum Group Phureja clone DM1-3 (DM).

  • Updates:
  • July 9, 2012 - The PGSC v2.1.10 pseudomolecules (based on version 3 of the DM genome assembly) were updated to v2.1.11. The new version is identical to the S. tuberosum Group Phureja DM1-3 Version 2.1.10 AGP Pseudomolecule Sequences (available below), except that gaps greater than 50 kbp have been shortened to 50 kbp.
  • Dec 15, 2011 - The transcript and representative transcript files have been updated because the original files contained some corrupted sequences.

Genome Assemblies (FASTA Format)

S. tuberosum Group Phureja DM1-3 Genome Annotation v3.4 mapped to the pseudomolecule sequences

S. tuberosum Group Phureja DM1-3 Genome Annotation v3.4 (based on v3 superscaffolds)

RNA-Seq Gene Expression Data

Information about the RNA-Seq Gene Expression Data

The format of the files:
1st column: gene ID
2nd column: library 1
3rd column: library 2
...
last column: functional annotation of the gene
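
The column layout above can be read with a few lines of Python. This is a minimal sketch, not a PGSC tool: it assumes the files are tab-delimited and that the library columns hold numeric FPKM values, neither of which is stated explicitly here. The function name and the example gene IDs are hypothetical.

```python
import csv
import io


def parse_expression_table(handle):
    """Yield (gene_id, fpkm_values, annotation) for each data row.

    Assumes tab-delimited text: first column gene ID, last column
    functional annotation, everything in between one value per library.
    """
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 3:
            continue  # blank or malformed line
        try:
            values = [float(v) for v in row[1:-1]]
        except ValueError:
            continue  # non-numeric library columns, e.g. a header row
        yield row[0], values, row[-1]


# Hypothetical two-library example, not real PGSC data.
example = io.StringIO(
    "GENE_A\t1.5\t0.0\tProtein kinase\n"
    "GENE_B\t12.25\t3.0\tConserved gene of unknown function\n"
)
rows = list(parse_expression_table(example))
```

Rows whose library columns are not numeric are skipped, so a header line (if the files have one) does not need special handling.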

The reads were mapped to the S. tuberosum Group Phureja DM1-3 superscaffolds using Tophat (v1.1.4), which made use of Bowtie (v0.12.7). The FPKM values were calculated by Cufflinks (v0.9.2) using only the v3.4 representative gene model set.

Tophat was run with "-i 10 -I 15000" parameters, which set a minimum intron size of 10bp (-i 10), and a maximum intron size of 15,000bp (-I 15000). These values are the minimum and maximum intron feature lengths present in the v3.4 GFF. For the paired-end DM libraries, the mate inner distance was set based on the fragment size of each library.
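
As an illustration of the Tophat settings described above, the sketch below assembles an argument list in Python. The intron flags (-i 10 -I 15000) come from the text; the use of -r for the mate inner distance is Tophat's standard option, but the formula used here (fragment size minus twice the read length) is an assumption, since the text only says the value was based on each library's fragment size. The index name and read file names are hypothetical.

```python
def mate_inner_distance(fragment_size, read_length):
    """Assumed convention: inner distance = fragment size - 2 * read length."""
    return fragment_size - 2 * read_length


def tophat_command(index, reads, fragment_size=None, read_length=None):
    """Build a Tophat argument list with the intron limits quoted above.

    `reads` holds one file for single-end data or two for paired-end data.
    """
    cmd = ["tophat", "-i", "10", "-I", "15000"]
    if len(reads) == 2:
        if fragment_size is None or read_length is None:
            raise ValueError("paired-end runs need fragment_size and read_length")
        cmd += ["-r", str(mate_inner_distance(fragment_size, read_length))]
    return cmd + [index] + list(reads)


# Hypothetical paired-end library: 300 bp fragments, 75 bp reads.
paired = tophat_command("dm_index", ["lib_1.fq", "lib_2.fq"], 300, 75)
single = tophat_command("dm_index", ["lib.fq"])
```

The list can be passed to subprocess.run as-is; building a list rather than a shell string avoids quoting problems with file names.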

Cufflinks was run with the same maximum intron size of 15,000bp (-I 15000); for the paired-end DM libraries the same mate inner distance settings were used.

Functional annotation was based on the best BLASTX hits of the CDS sequences against UniRef100. The text was assigned using a first informative best-hit strategy, which considers the best BLASTX hits where E <= 1e-5 but excludes hits with non-informative functional text (e.g., "Whole genome shotgun sequence of line..."). The text was also programmatically cleaned to remove some misleading and low-information strings. For gene-level annotation, the transcript-level functional text was concatenated, so there will be some redundancy where different isoforms were assigned different annotation strings.
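
The first informative best-hit strategy can be sketched as follows. This is an illustration under stated assumptions, not the PGSC pipeline: hits are assumed to arrive already ranked best-first as (E-value, description) pairs, only the "whole genome shotgun sequence" pattern comes from the text above (the other pattern is a hypothetical example), and the "; " separator used for gene-level concatenation is also an assumption.

```python
import re

# Only the first pattern is taken from the text; the second is a
# hypothetical example of a low-information description.
NON_INFORMATIVE = [
    re.compile(r"whole genome shotgun sequence", re.IGNORECASE),
    re.compile(r"uncharacterized protein", re.IGNORECASE),
]


def first_informative_hit(hits, max_evalue=1e-5):
    """Return the description of the first acceptable hit, or None.

    `hits` is an iterable of (evalue, description) pairs, assumed to be
    ordered best-first as in BLAST output.
    """
    for evalue, description in hits:
        if evalue > max_evalue:
            continue  # fails the E <= 1e-5 threshold
        if any(p.search(description) for p in NON_INFORMATIVE):
            continue  # non-informative text, try the next hit
        return description
    return None


def gene_level_annotation(transcript_texts, separator="; "):
    """Concatenate transcript-level texts without de-duplication."""
    return separator.join(t for t in transcript_texts if t)


hits = [
    (1e-50, "Whole genome shotgun sequence of line PN40024"),
    (1e-40, "Serine/threonine protein kinase"),
    (1e-3, "Zinc finger protein"),
]
best = first_informative_hit(hits)
gene_text = gene_level_annotation([best, "Protein kinase", best])
```

Because the gene-level text is a plain concatenation, identical or near-identical strings from different isoforms appear more than once, which matches the redundancy noted above.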

Putative Orthologous Groups (OrthoMCL)

  • 12_plants_all_orthomcl_parsed.txt.zip -
    The predicted proteomes (representative peptides only) of 12 plant species were used to identify putative orthologous groups using OrthoMCL with default parameters (Li et al., 2003). The plant species included are: Arabidopsis thaliana, Brachypodium distachyon, Carica papaya, Chlamydomonas reinhardtii, Glycine max, Oryza sativa, Physcomitrella patens, Populus trichocarpa, Solanum tuberosum, Sorghum bicolor, Vitis vinifera and Zea mays.

  • This tab-delimited file has the following columns:
    Cluster_ID
    Number_of_peptides_in_this_cluster
    Number_of_species_in_this_cluster
    Species (separated by space)
    Peptides (separated by space)
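
    A minimal Python sketch for reading the five columns listed above is given here. It assumes the unzipped file is plain tab-delimited text with one cluster per line; the Cluster type, the function name, and the example identifiers are hypothetical.

```python
import io
from typing import Iterator, List, NamedTuple


class Cluster(NamedTuple):
    cluster_id: str
    n_peptides: int
    n_species: int
    species: List[str]
    peptides: List[str]


def parse_clusters(handle) -> Iterator[Cluster]:
    """Yield one Cluster per well-formed line of the tab-delimited file."""
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        # Skip blank lines, malformed lines, and a header row if present.
        if len(fields) != 5 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        yield Cluster(
            cluster_id=fields[0],
            n_peptides=int(fields[1]),
            n_species=int(fields[2]),
            species=fields[3].split(),   # space-separated
            peptides=fields[4].split(),  # space-separated
        )


# Hypothetical example line, not taken from the real file.
example = io.StringIO(
    "Cluster_ID\tNumber_of_peptides\tNumber_of_species\tSpecies\tPeptides\n"
    "ORTHO_1\t3\t2\tspeciesA speciesB\tpepA1 pepA2 pepB1\n"
)
clusters = list(parse_clusters(example))
```

    Comparing n_peptides with len(peptides) for each cluster is a quick consistency check after download.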

BAC, BAC End, and Fosmid End Sequences