Canonicalize gene names to UniProt

Post date: Feb 25, 2015 2:31:18 PM

Download the latest release of the UniProt ID mappings, for example HUMAN_9606_idmapping.dat.gz, and decompress it.
Import the data into a suitable SQL database (the file is just three tab-separated text columns) and create an index on every column. For example, in sqlite3:

CREATE TABLE HUMAN_9606_idmapping (id text, type text, id2 text);
CREATE INDEX id on HUMAN_9606_idmapping(id);
CREATE INDEX type on HUMAN_9606_idmapping(type);
CREATE INDEX id2 on HUMAN_9606_idmapping(id2);
.separator "\t"
.import HUMAN_9606_idmapping.dat HUMAN_9606_idmapping
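If you would rather not decompress the file by hand, the same import can be scripted. The following is only a minimal sketch using Python's standard library: the function name load_idmapping and the index names are my own choices, not anything UniProt provides, and it assumes that any line without exactly three tab-separated fields can be skipped. It builds the indices after the bulk insert rather than before, which is usually faster in SQLite.

```python
import csv
import gzip
import sqlite3


def load_idmapping(conn, gz_path, table="HUMAN_9606_idmapping"):
    """Load a gzipped 3-column idmapping file into SQLite and index every column.

    Hypothetical helper for illustration. The table name is interpolated
    into the SQL text, so only pass trusted values.
    """
    conn.execute(f'CREATE TABLE "{table}" (id TEXT, type TEXT, id2 TEXT)')
    with gzip.open(gz_path, "rt", encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        # Assumption: lines without exactly three fields are skipped.
        rows = (row for row in reader if len(row) == 3)
        conn.executemany(f'INSERT INTO "{table}" VALUES (?, ?, ?)', rows)
    # Index after loading; index names here differ from the ones used above.
    for column in ("id", "type", "id2"):
        conn.execute(f'CREATE INDEX "{table}_{column}" ON "{table}"({column})')
    conn.commit()
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
```

Calling load_idmapping(sqlite3.connect("uniprot.db"), "HUMAN_9606_idmapping.dat.gz") should return the number of rows loaded; the database file name uniprot.db is just an example.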

Put the IDs you want to canonicalize into a separate table; let’s call it Noncanonical, with a single column NCID.
Query with the following:

SELECT NCID, m2.ID2, m1.Type
FROM Noncanonical, HUMAN_9606_idmapping AS m1, HUMAN_9606_idmapping AS m2
WHERE m1.ID2 = NCID
  AND m1.ID = m2.ID
  AND m2.Type LIKE 'UniProtKB-ID'
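To see what the self-join returns, here is a small self-contained sketch that runs it against an in-memory SQLite database from Python. The helper names demo_connection and canonicalize are hypothetical, the handful of rows only mimics the layout of the real mapping file, and the query is written with explicit JOINs and an exact match (=) in place of LIKE. Each result row is the name you supplied, the UniProtKB-ID it resolves to, and the type of identifier it matched.

```python
import sqlite3

# Same self-join as above, written with explicit JOINs and '=' instead of LIKE.
QUERY = """
SELECT NCID, m2.id2, m1.type
FROM Noncanonical
JOIN HUMAN_9606_idmapping AS m1 ON m1.id2 = NCID
JOIN HUMAN_9606_idmapping AS m2 ON m2.id = m1.id
WHERE m2.type = 'UniProtKB-ID'
"""


def demo_connection():
    """Build an in-memory database holding a few sample rows (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE HUMAN_9606_idmapping (id TEXT, type TEXT, id2 TEXT)")
    conn.executemany(
        "INSERT INTO HUMAN_9606_idmapping VALUES (?, ?, ?)",
        [
            ("P04637", "UniProtKB-ID", "P53_HUMAN"),
            ("P04637", "Gene_Name", "TP53"),
            ("P04637", "GeneID", "7157"),
            ("P38398", "UniProtKB-ID", "BRCA1_HUMAN"),
            ("P38398", "Gene_Name", "BRCA1"),
            ("P38398", "GeneID", "672"),
        ],
    )
    return conn


def canonicalize(conn, names):
    """Return (input name, UniProtKB-ID, matched identifier type) tuples."""
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS Noncanonical (NCID TEXT)")
    conn.execute("DELETE FROM Noncanonical")
    conn.executemany("INSERT INTO Noncanonical VALUES (?)", [(n,) for n in names])
    return conn.execute(QUERY).fetchall()
```

Names with no match in the mapping table simply produce no row, so it is worth comparing the result against the input list to find IDs that could not be canonicalized.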

The closest alternative on the UniProt web site seems to be the Retrieve/ID mapping tool, mapping IDs from ChiTaRS identifiers.
