Canonicalize gene names to UniProt
Post date: Feb 25, 2015 2:31:18 PM
- Download the latest release of the UniProt ID mappings, for example HUMAN_9606_idmapping.dat.gz, and decompress it.
- Import the data into a suitable SQL database (the file is just 3 tab-separated text columns: UniProt accession, ID type, and the ID itself). Make indices for all columns. For example, in sqlite3:
CREATE TABLE HUMAN_9606_idmapping (id text, type text, id2 text);
CREATE INDEX id on HUMAN_9606_idmapping(id);
CREATE INDEX type on HUMAN_9606_idmapping(type);
CREATE INDEX id2 on HUMAN_9606_idmapping(id2);
.separator "\t"
.import HUMAN_9606_idmapping.dat HUMAN_9606_idmapping
- Put the IDs you want to canonicalize into a separate table, for example a table Noncanonical with a single column NCID.
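- Query with the following. Here m1 matches each of your IDs against any identifier type, and m2 looks up the UniProtKB-ID for the same accession; the third result column reports which ID type matched: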
SELECT NCID, m2.ID2, m1.Type
FROM Noncanonical, HUMAN_9606_idmapping AS m1, HUMAN_9606_idmapping AS m2
WHERE m1.ID2 = NCID
  AND m1.ID = m2.ID
  AND m2.Type LIKE 'UniProtKB-ID'
The closest alternative on the UniProt web site seems to be the Retrieve/ID mapping tool, mapping IDs from ChiTaRS identifiers.
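The same steps can also be sketched in Python with the standard sqlite3 module, which avoids the manual decompress and .import step. This is only a minimal sketch: the function names, the shortened table name idmapping, and the index names are assumptions made for illustration, and the few sample rows are hand-typed in the 3-column mapping format rather than read from a real download. It uses = instead of LIKE for the type comparison, which returns the same rows when the type is spelled exactly 'UniProtKB-ID'.

```python
import gzip
import sqlite3


def load_idmapping(conn, lines):
    """Load tab-separated (accession, ID type, ID) lines into an indexed table."""
    conn.execute("CREATE TABLE idmapping (id TEXT, type TEXT, id2 TEXT)")
    rows = (line.rstrip("\n").split("\t") for line in lines if line.strip())
    conn.executemany("INSERT INTO idmapping VALUES (?, ?, ?)", rows)
    conn.execute("CREATE INDEX idx_id ON idmapping(id)")
    conn.execute("CREATE INDEX idx_type ON idmapping(type)")
    conn.execute("CREATE INDEX idx_id2 ON idmapping(id2)")


def load_idmapping_gz(conn, path):
    """Hypothetical helper: read the downloaded .dat.gz file directly."""
    with gzip.open(path, "rt") as handle:
        load_idmapping(conn, handle)


def canonicalize(conn, names):
    """Return (input ID, UniProtKB-ID, matched ID type) for each input that maps."""
    conn.execute("CREATE TEMP TABLE Noncanonical (NCID TEXT)")
    conn.executemany("INSERT INTO Noncanonical VALUES (?)", [(n,) for n in names])
    result = conn.execute(
        """
        SELECT NCID, m2.id2, m1.type
        FROM Noncanonical
        JOIN idmapping AS m1 ON m1.id2 = NCID
        JOIN idmapping AS m2 ON m2.id = m1.id
        WHERE m2.type = 'UniProtKB-ID'
        ORDER BY NCID, m2.id2
        """
    ).fetchall()
    conn.execute("DROP TABLE Noncanonical")
    return result


# Hand-typed sample rows in the mapping file's 3-column format (illustration only).
SAMPLE = [
    "P04637\tUniProtKB-ID\tP53_HUMAN\n",
    "P04637\tGene_Name\tTP53\n",
    "P04637\tGeneID\t7157\n",
    "P38398\tUniProtKB-ID\tBRCA1_HUMAN\n",
    "P38398\tGene_Name\tBRCA1\n",
]

conn = sqlite3.connect(":memory:")
load_idmapping(conn, SAMPLE)
mapped = canonicalize(conn, ["TP53", "7157", "BRCA1", "NOT_A_GENE"])
for row in mapped:
    print(row)
```

IDs with no match in the mapping table (NOT_A_GENE here) simply drop out of the result, exactly as in the SQL query above. An ID that matches several accessions produces one row per accession. For a real run, call load_idmapping_gz(conn, "HUMAN_9606_idmapping.dat.gz") on a file-backed database instead of loading SAMPLE.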