PSI Protein Classifier is a computer program that summarizes the results of both sequential and independent iterations of the PSI-BLAST program. It assigns the proteins found by PSI-BLAST to previously known families and splits the remaining proteins into groups. This makes it possible to quantify, by the number of iterations, the degree of kinship between different families of protein homologs.
Format of parsed files
PSI Protein Classifier uses files generated on the NCBI website by the PSI-BLAST program. When you start screening the amino acid sequence database with PSI-BLAST, enter the query protein sequence in FASTA format and give it a two-part name whose parts are separated by a hyphen. The first part of the name should be the designation of the family to which the query protein belongs. After each iteration, save the web page with the PSI-BLAST results as a text (.txt) file in the PSI-Blast folder, using the “Use old BLAST report format” mode. These files are referred to as "blast files".
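The two-part naming convention for the query can be illustrated with a short Python sketch. It is not part of PSI Protein Classifier. The function name family_from_query and the sample name GH13-AmyA are hypothetical, and the sketch assumes that the name is the first whitespace-delimited word after ">" on the FASTA header line and that the family designation ends at the first hyphen.

```python
def family_from_query(fasta_text):
    """Return (family, rest_of_name) taken from the first FASTA header line.

    Assumption: the query name is the first word after '>' and has the
    form <family>-<rest>, split at the first hyphen.
    """
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            words = line[1:].split()
            if not words:
                raise ValueError("FASTA header has no name")
            family, sep, rest = words[0].partition("-")
            if not sep or not family or not rest:
                raise ValueError("query name must have two hyphen-separated parts")
            return family, rest
    raise ValueError("no FASTA header line found")


# Hypothetical query: family designation 'GH13', protein label 'AmyA'.
example = ">GH13-AmyA\nMKTLLAVS\n"
print(family_from_query(example))
```

A name without a hyphen is rejected by this sketch, because the family designation could not be recovered from it.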
PSI Protein Classifier can also use family list files as auxiliary files. These files are optional. They are text (.txt) files placed in the FamilyName folder and are referred to as "family-files". The first line of each family-file gives the name of the family: the name must be preceded by any two words and followed by at least one more word. Each of the following lines gives the GenPept accession number of one protein belonging to this family. Every accession number must include the protein version, that is, its last digit must be preceded by a dot. The program can also use ready-made family lists from the CAZy database, saved as text (.txt) files, as family-files. Note that the CAZy database covers the glycosyl hydrolase families and a number of other enzymes acting on carbohydrates and their derivatives.
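As an illustration of the family-file layout described above, the following Python sketch reads the plain (non-CAZy) format. It is not the program's own parser. The function name parse_family_file and the sample family and accession numbers are hypothetical. The sketch assumes that the family name is exactly the third word of the first line and that a version suffix of one or more digits after the dot is acceptable.

```python
import re

# Assumption: an accession with a version looks like 'AAA12345.1',
# i.e. non-space characters, a dot, then one or more digits.
ACCESSION_WITH_VERSION = re.compile(r"^\S+\.\d+$")


def parse_family_file(text):
    """Return (family_name, accessions) from the text of a family-file."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty family-file")
    words = lines[0].split()
    # Two leading words, the family name, and at least one trailing word.
    if len(words) < 4:
        raise ValueError("first line must contain at least four words")
    family_name = words[2]
    accessions = []
    for line in lines[1:]:
        if not ACCESSION_WITH_VERSION.match(line):
            raise ValueError(f"accession lacks a version suffix: {line!r}")
        accessions.append(line)
    return family_name, accessions


# Hypothetical family-file content.
sample = "Glycoside Hydrolase GH13 family\nAAA12345.1\nBAB67890.2\n"
print(parse_family_file(sample))
```

Rejecting accession numbers without a version early makes a malformed family-file visible before it is compared against the blast files.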