Package org.snpeff.geneOntology
Class GoTerms
java.lang.Object
org.snpeff.geneOntology.GoTerms
- All Implemented Interfaces:
- Serializable, Iterable<GoTerm>
A collection of GO terms
- Author:
- Pablo Cingolani
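Field Summary
static boolean debug
static boolean verbose
Constructor Summary
GoTerms(): Default constructor
GoTerms(String oboFile, String nameSpace, String interestingGenesFile, String geneAssocFile, boolean removeObsolete, boolean useGeneId): Constructor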
Method Summary
add: Add a GOTerm (if not already in this GOTerms). WARNING: Creates 'fake' symbolNames based on symbolIds.
void addInterestingSymbol(String symbolId, int rank, HashSet<String> noGoTermFound): Add a symbol as an 'interesting' symbol (to every corresponding GOTerm in this set)
boolean addSymbolId(GoTerm goTerm, String symbolId): Add a symbolId (as well as all needed mappings)
void addSymbolsFromChilds(): Use symbols from children in the DAG. For every GOTerm, each child's symbols are added to the term, so that the root term contains every symbol and every interestingSymbol
allSymbols: Create a set with all the symbols
void checkInterestingSymbolIds(Set<String> interestingSymbolIds): Checks that every symbolId is in the set (as 'interesting' symbols)
disjointSet(List<GoTerm> goTermList, int activeSets): Produce a GOTerm based on a list of GOTerms and a 'mask'
getGoTerm
getGoTermsByGoTermAcc
getGoTermsBySymbolId
getGoTermsBySymbolId(String symbolId)
getInterestingSymbolIdsSet
int getInterestingSymbolIdsSize()
getLabel()
int getMaxRank()
getNameSpace
int getRank: Get symbol's rank
getRankSymbolId
iterator(): Iterate through each GOTerm in this GOTerms
keySet()
int levels(): Calculate each node's level (in DAG)
listTopTerms(int numberToSelect): Select a number of GOTerms
int numberOfInterestingSymbols(): Calculate how many interesting symbol-IDs there are in all these GOTerms
int numberOfNodes(): Number of nodes in this DAG
int numberOfNodesWithOneInterestingSymbol(): Calculate the number of nodes that have at least one interesting symbol
int numberOfNodesWithOneSymbol(): Calculate the number of nodes that have at least one annotated symbol
int numberOfSymbols(): Calculate how many symbol-IDs there are in all these GOTerms
void readGeneAssocFile(String goGenesFile, boolean useGeneId): Reads a file containing every gene (names and IDs) and its associated GO terms
void readInterestingSymbolIdsFile(String fileName): Reads a file with a list of 'interesting' genes (one per line)
void readOboFile(String oboFile, boolean removeObsolete): Read an OBO file
void removeGOTerm(String goTermAcc): Remove a GOTerm
void resetInterestingSymbolIds(): Reset every 'interesting' symbolId (on every single GOTerm in this GOTerms)
rootNodes
void saveGseaGeneSets(String fileName): Save gene sets file for GSEA analysis. Format specification: http://www.broad.mit.edu/cancer/software/gsea/wiki/index.php/Data_formats#GMT:_Gene_Matrix_Transposed_file_format_.28.2A.gmt.29
void setLabel
toString()
values()
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.lang.Iterable
forEach, spliterator
-
Field Details
-
debug
public static boolean debug
-
verbose
public static boolean verbose
-
-
Constructor Details
-
GoTerms
public GoTerms()
Default constructor
-
GoTerms
public GoTerms(String oboFile, String nameSpace, String interestingGenesFile, String geneAssocFile, boolean removeObsolete, boolean useGeneId)
Constructor
- Parameters:
oboFile - Path to OBO description file
nameSpace - Can be 'null' for "all namespaces"
interestingGenesFile - Path to a file containing a list of 'interesting' genes (one geneName per line)
geneAssocFile - A file containing lines like: "GOterm \t gene_product_id \t gene_name \n"
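The line layout given for geneAssocFile (GO term, gene product ID, gene name, separated by tabs) can be illustrated with a small parser. This is only a sketch under assumptions: the class and method names are hypothetical and not part of org.snpeff, and trimming whitespace around each field is a guess about how spaces around the tabs are treated.

```java
// Hypothetical helper, not part of org.snpeff: parses one association line.
final class GeneAssocLineSketch {

    // Splits a line into { goTermAcc, geneProductId, geneName }.
    // Assumption: fields are tab-separated and surrounding whitespace is not significant.
    static String[] parseLine(String line) {
        String[] fields = line.trim().split("\t");
        if (fields.length < 3) {
            throw new IllegalArgumentException("Expected 3 tab-separated fields: " + line);
        }
        return new String[] { fields[0].trim(), fields[1].trim(), fields[2].trim() };
    }

    public static void main(String[] args) {
        // Made-up sample line, for illustration only
        String[] fields = parseLine("GO:0000001 \t ID1 \t GENE1 \n");
        System.out.println(fields[0] + " | " + fields[1] + " | " + fields[2]);
    }
}
```

Lines with fewer than three fields are rejected here; how the real reader treats malformed lines is not documented above.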
-
-
Method Details
-
add
Add a GOTerm (if not already in this GOTerms). WARNING: Creates 'fake' symbolNames based on symbolIds. This method is used mostly for testing / debugging.
-
addInterestingSymbol
public void addInterestingSymbol(String symbolId, int rank, HashSet<String> noGoTermFound)
Add a symbol as an 'interesting' symbol (to every corresponding GOTerm in this set)
- Parameters:
symbolName - Symbol's name
rank - Symbol's rank
noGoTermFound - Add symbol here if there are no GOTerms associated with this symbol
-
addSymbolId
public boolean addSymbolId(GoTerm goTerm, String symbolId)
Add a symbolId (as well as all needed mappings)
- Parameters:
goTermAcc
symbolId
symbolName
goTermType
description
- Returns:
- true if OK, false on error (GOTerm 'goTermAcc' not found)
-
addSymbolsFromChilds
public void addSymbolsFromChilds()
Use symbols from children in the DAG. For every GOTerm, each child's symbols are added to the term, so that the root term contains every symbol and every interestingSymbol.
-
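The upward propagation described for addSymbolsFromChilds can be sketched with a post-order walk over a small DAG. The classes below are hypothetical stand-ins, not the real GoTerm or GoTerms, and the sketch assumes the graph is acyclic; it tracks only one symbol set, whereas the description mentions both symbols and interesting symbols.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a GO term node; not the real GoTerm class.
final class TermNode {
    final String acc;
    final List<TermNode> children = new ArrayList<>();
    final Set<String> symbolIds = new HashSet<>();

    TermNode(String acc) {
        this.acc = acc;
    }
}

final class PropagateSymbolsSketch {

    // Adds every descendant's symbols to 'node' (children are completed first).
    // 'done' prevents re-processing nodes that have more than one parent in the DAG.
    static void addSymbolsFromChildren(TermNode node, Set<TermNode> done) {
        if (!done.add(node)) return; // already processed via another parent
        for (TermNode child : node.children) {
            addSymbolsFromChildren(child, done);
            node.symbolIds.addAll(child.symbolIds);
        }
    }

    public static void main(String[] args) {
        TermNode root = new TermNode("root");
        TermNode child = new TermNode("child");
        root.children.add(child);
        child.symbolIds.add("GENE1");
        addSymbolsFromChildren(root, new HashSet<TermNode>());
        System.out.println(root.symbolIds); // the root now holds the child's symbol
    }
}
```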
allSymbols
Create a set with all the symbols
-
checkInterestingSymbolIds
public void checkInterestingSymbolIds(Set<String> interestingSymbolIds)
Checks that every symbolId is in the set (as 'interesting' symbols). Throws an exception on error.
- Parameters:
interestingSymbolIds - A set of interesting symbols
-
disjointSet
Produce a GOTerm based on a list of GOTerms and a 'mask'
- Parameters:
goTermList - A list of GOTerms
activeSets - An integer (binary mask) that specifies whether a set in the list should be taken into account or not. The operation performed is: Intersection{ GOTerms where mask_bit == 1 } - Union{ GOTerms where mask_bit == 0 }, where the minus sign '-' is actually a 'set minus' operation. This operation is done for both sets in GOTerm (i.e. symbolIds and interestingSymbolIds)
- Returns:
- A GOTerm
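The mask operation described for activeSets can be illustrated on plain string sets. This is a sketch under assumptions, not the GoTerms implementation: the class and method names are hypothetical, bit i of the mask is assumed to refer to the i-th set in the list, and a mask with no active set is assumed to yield an empty result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of org.snpeff: illustrates the mask operation only.
final class DisjointSetSketch {

    // Returns Intersection{ sets with mask bit 1 } minus Union{ sets with mask bit 0 }.
    // Assumption: bit i of activeSets refers to sets.get(i).
    static Set<String> disjointSet(List<Set<String>> sets, int activeSets) {
        Set<String> intersection = null; // stays null until the first active set is seen
        Set<String> union = new HashSet<>();
        for (int i = 0; i < sets.size(); i++) {
            boolean active = (activeSets & (1 << i)) != 0;
            if (active) {
                if (intersection == null) intersection = new HashSet<>(sets.get(i));
                else intersection.retainAll(sets.get(i));
            } else {
                union.addAll(sets.get(i));
            }
        }
        if (intersection == null) return new HashSet<>(); // assumption: no active set gives an empty result
        intersection.removeAll(union); // the 'set minus' step
        return intersection;
    }

    public static void main(String[] args) {
        List<Set<String>> sets = new ArrayList<>();
        sets.add(new HashSet<>(Arrays.asList("g1", "g2", "g3")));
        sets.add(new HashSet<>(Arrays.asList("g2", "g3", "g4")));
        sets.add(new HashSet<>(Arrays.asList("g3", "g5")));
        // Mask 3 (binary 011): keep what is in sets 0 and 1 but not in set 2
        System.out.println(disjointSet(sets, 3)); // prints [g2]
    }
}
```

With mask 3 the intersection of the first two sets is {g2, g3}, and removing the members of the third set leaves only g2.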
-
getGoTerm
-
getGoTermsByGoTermAcc
-
getGoTermsBySymbolId
-
getGoTermsBySymbolId
-
getInterestingSymbolIdsSet
-
getInterestingSymbolIdsSize
public int getInterestingSymbolIdsSize()
-
getLabel
-
getMaxRank
public int getMaxRank()
-
getNameSpace
-
getRank
Get symbol's rank
- Parameters:
symbolId
-
getRankSymbolId
-
iterator
Iterate through each GOTerm in this GOTerms
-
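Because the class implements Iterable<GoTerm>, it can be used directly in a for-each loop. The sketch below shows that pattern with a hypothetical map-backed collection of term names; it is not the real GoTerms class, and the map-backed storage is only an assumption suggested by the keySet() and values() methods.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical collection, not the real GoTerms: stores term names keyed by accession.
final class TermCollectionSketch implements Iterable<String> {
    private final Map<String, String> termNameByAcc = new LinkedHashMap<>();

    // Adds a term only if its accession is not already present.
    void add(String acc, String name) {
        termNameByAcc.putIfAbsent(acc, name);
    }

    // Iterates over the stored terms in insertion order.
    @Override
    public Iterator<String> iterator() {
        return termNameByAcc.values().iterator();
    }

    public static void main(String[] args) {
        TermCollectionSketch terms = new TermCollectionSketch();
        terms.add("GO:0008150", "biological_process");
        terms.add("GO:0003674", "molecular_function");
        for (String name : terms) {
            System.out.println(name);
        }
    }
}
```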
keySet
-
levels
public int levels()
Calculate each node's level (in DAG)
- Returns:
- maximum level
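One way to assign levels in a DAG and report the maximum is sketched below. The definition of "level" is an assumption here (roots at level 0, each node at the length of its longest path from a root), the classes are hypothetical stand-ins, and the graph is assumed to be acyclic; the actual levels() method may number levels differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a DAG node; not the real GoTerm class.
final class LevelNode {
    final String acc;
    final List<LevelNode> children = new ArrayList<>();
    int level = -1; // -1 means "not assigned yet"

    LevelNode(String acc) {
        this.acc = acc;
    }
}

final class LevelsSketch {

    // Assigns 'level' to the node (keeping the larger value if it was already reached
    // through a longer path) and returns the largest level seen while walking from it.
    static int assignLevels(LevelNode node, int level) {
        if (node.level >= level) return node.level; // a path at least as long already reached this node
        node.level = level;
        int max = level;
        for (LevelNode child : node.children) {
            max = Math.max(max, assignLevels(child, level + 1));
        }
        return max;
    }

    public static void main(String[] args) {
        LevelNode root = new LevelNode("root");
        LevelNode a = new LevelNode("a");
        LevelNode c = new LevelNode("c");
        root.children.add(a);
        root.children.add(c); // c is reachable directly and through a
        a.children.add(c);
        System.out.println(assignLevels(root, 0)); // prints 2
    }
}
```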
-
listTopTerms
Select a number of GOTerms
- Parameters:
numberToSelect
-
numberOfInterestingSymbols
public int numberOfInterestingSymbols()
Calculate how many interesting symbol-IDs there are in all these GOTerms
- Returns:
- Number of interesting symbols
-
numberOfNodes
public int numberOfNodes()
Number of nodes in this DAG
-
numberOfNodesWithOneInterestingSymbol
public int numberOfNodesWithOneInterestingSymbol()
Calculate the number of nodes that have at least one interesting symbol
-
numberOfNodesWithOneSymbol
public int numberOfNodesWithOneSymbol()
Calculate the number of nodes that have at least one annotated symbol
-
numberOfSymbols
public int numberOfSymbols()
Calculate how many symbol-IDs there are in all these GOTerms
- Returns:
- Number of symbols
-
readGeneAssocFile
public void readGeneAssocFile(String goGenesFile, boolean useGeneId)
Reads a file containing every gene (names and IDs) and its associated GO terms
- Parameters:
goGenesFile - A file containing gene associations to GO terms
-
readInterestingSymbolIdsFile
public void readInterestingSymbolIdsFile(String fileName)
Reads a file with a list of 'interesting' genes (one per line)
- Parameters:
fileName - Can be "-" for no-file
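The one-identifier-per-line layout and the "-" convention can be sketched as follows. The class and method names are hypothetical, and skipping blank lines and trimming whitespace are assumptions rather than documented behavior of readInterestingSymbolIdsFile.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of org.snpeff.
final class InterestingIdsSketch {

    // Keeps one trimmed identifier per non-blank line (assumption: blank lines are ignored).
    static List<String> parseIds(List<String> lines) {
        List<String> ids = new ArrayList<>();
        for (String line : lines) {
            String id = line.trim();
            if (!id.isEmpty()) ids.add(id);
        }
        return ids;
    }

    // "-" means "no file": return an empty list without touching the file system.
    static List<String> readIds(String fileName) {
        if ("-".equals(fileName)) return Collections.emptyList();
        try {
            return parseIds(Files.readAllLines(Paths.get(fileName)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readIds("-")); // prints []
    }
}
```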
-
readOboFile
public void readOboFile(String oboFile, boolean removeObsolete)
Read an OBO file
- Parameters:
oboFile
nameSpace
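For orientation, an OBO file is organized into stanzas such as [Term], each with tag lines like "id:" and "is_obsolete: true". The sketch below collects term IDs and optionally drops obsolete terms, mirroring the removeObsolete flag. It is a simplified, hypothetical reader, not the parser used by readOboFile, and it ignores every other tag.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified OBO reader; not the parser used by GoTerms.
final class OboSketch {

    // Returns the ids of all [Term] stanzas, skipping obsolete ones when removeObsolete is true.
    static List<String> termIds(List<String> lines, boolean removeObsolete) {
        List<String> ids = new ArrayList<>();
        String id = null;
        boolean inTerm = false;
        boolean obsolete = false;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.startsWith("[")) { // a new stanza begins: flush the previous term
                if (inTerm && id != null && !(removeObsolete && obsolete)) ids.add(id);
                inTerm = line.equals("[Term]");
                id = null;
                obsolete = false;
            } else if (inTerm && line.startsWith("id:")) {
                id = line.substring(3).trim();
            } else if (inTerm && line.equals("is_obsolete: true")) {
                obsolete = true;
            }
        }
        if (inTerm && id != null && !(removeObsolete && obsolete)) ids.add(id); // flush the last stanza
        return ids;
    }

    public static void main(String[] args) {
        // Made-up sample input, for illustration only
        List<String> lines = Arrays.asList(
                "[Term]", "id: GO:0000001", "name: example term", "",
                "[Term]", "id: GO:0000002", "is_obsolete: true");
        System.out.println(termIds(lines, true)); // prints [GO:0000001]
    }
}
```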
-
removeGOTerm
public void removeGOTerm(String goTermAcc)
Remove a GOTerm
-
resetInterestingSymbolIds
public void resetInterestingSymbolIds()
Reset every 'interesting' symbolId (on every single GOTerm in this GOTerms)
-
rootNodes
-
saveGseaGeneSets
public void saveGseaGeneSets(String fileName)
Save gene sets file for GSEA analysis. Format specification: http://www.broad.mit.edu/cancer/software/gsea/wiki/index.php/Data_formats#GMT:_Gene_Matrix_Transposed_file_format_.28.2A.gmt.29
- Parameters:
fileName
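The GMT format referenced above is tab-delimited: each line holds a gene set name, a description, and then the member genes. A minimal sketch of building one such line follows. The class and method names are hypothetical, and which GoTerm fields saveGseaGeneSets actually writes as name and description is not stated in this page.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of org.snpeff: formats a single GMT line.
final class GmtSketch {

    // Builds "name<TAB>description<TAB>gene1<TAB>gene2..." for one gene set.
    static String gmtLine(String name, String description, List<String> genes) {
        StringBuilder sb = new StringBuilder();
        sb.append(name).append('\t').append(description);
        for (String gene : genes) {
            sb.append('\t').append(gene);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up gene names, for illustration only
        System.out.println(gmtLine("GO:0008150", "biological_process", Arrays.asList("GENE1", "GENE2")));
    }
}
```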
-
setLabel
-
toString
-
values
-