Canonicalize gene names to UniProt

Post date: Feb 25, 2015 2:31:18 PM

    1. Download the latest release of the UniProt ID mappings, for example HUMAN_9606_idmapping.dat.gz, and decompress it (the import step expects the plain .dat file).
    2. Import the data into a suitable SQL database (it’s just 3 tab-separated text columns). Create an index on each column. For example, in sqlite3:
        1. CREATE TABLE HUMAN_9606_idmapping (id text, type text, id2 text);
           CREATE INDEX id ON HUMAN_9606_idmapping(id);
           CREATE INDEX type ON HUMAN_9606_idmapping(type);
           CREATE INDEX id2 ON HUMAN_9606_idmapping(id2);
           .separator "\t"
           .import HUMAN_9606_idmapping.dat HUMAN_9606_idmapping
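    3. Put the IDs you want to canonicalize into a separate table, for example a table Noncanonical with a single column NCID.
    4. Query with the following. Here m1 matches each input ID against any identifier type, m2 looks up the UniProtKB-ID for the same accession, and m1.Type reports which identifier type the input matched:
        1. SELECT NCID, m2.ID2, m1.Type
           FROM Noncanonical, HUMAN_9606_idmapping AS m1, HUMAN_9606_idmapping AS m2
           WHERE m1.ID2 = NCID AND m1.ID = m2.ID AND m2.Type = 'UniProtKB-ID';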
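
The same steps can also be scripted. Below is a minimal sketch using Python's built-in sqlite3 module, not a definitive implementation. The function names load_idmapping and canonicalize are made up for illustration. It assumes the mapping file has exactly 3 tab-separated columns, keeps the Noncanonical table as a temporary table, creates the indices after the bulk insert rather than before, and compares the type with = instead of LIKE.

```python
import gzip
import sqlite3


def _rows(fh):
    """Yield (id, type, id2) tuples, skipping lines that lack 3 columns."""
    for line in fh:
        parts = line.rstrip("\n").split("\t")
        if len(parts) == 3:
            yield tuple(parts)


def load_idmapping(conn, path, table="HUMAN_9606_idmapping"):
    """Load a UniProt idmapping file (.dat or .dat.gz) into SQLite.

    Hypothetical helper: the table name is interpolated into the SQL,
    so pass only trusted names.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    conn.execute(f'CREATE TABLE "{table}" (id TEXT, type TEXT, id2 TEXT)')
    with opener(path, "rt", encoding="utf-8") as fh:
        conn.executemany(f'INSERT INTO "{table}" VALUES (?, ?, ?)', _rows(fh))
    # One index per column, built after the data is in place.
    for col in ("id", "type", "id2"):
        conn.execute(f'CREATE INDEX "{table}_{col}" ON "{table}"({col})')
    conn.commit()


def canonicalize(conn, ids, table="HUMAN_9606_idmapping"):
    """Return (input ID, UniProtKB-ID, matched identifier type) rows."""
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS Noncanonical (NCID TEXT)")
    conn.execute("DELETE FROM Noncanonical")
    conn.executemany(
        "INSERT INTO Noncanonical VALUES (?)", ((i,) for i in ids)
    )
    # Same self-join as the query above: m1 finds the accession for the
    # input ID, m2 finds that accession's UniProtKB-ID.
    query = f"""
        SELECT NCID, m2.id2, m1.type
        FROM Noncanonical, "{table}" AS m1, "{table}" AS m2
        WHERE m1.id2 = NCID AND m1.id = m2.id AND m2.type = 'UniProtKB-ID'
    """
    return conn.execute(query).fetchall()
```

To use it, open a connection with sqlite3.connect("idmapping.db") (any file name works), call load_idmapping once on the downloaded file, then pass a list of IDs to canonicalize. Inputs that match nothing are absent from the result, and an input that matches several accessions produces several rows.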

The closest alternative on the UniProt web site seems to be the Retrieve/ID mapping tool, mapping from ChiTaRS identifiers.