Monday, October 17, 2011

CLC_Genomics_WB: Human Genome from Ensembl

Bob updated CLC to 4.8 today and upgraded three plugins as well.

1. Download the human genome FASTA files from the Ensembl FTP site: chromosomes 1-22 plus X and Y. Save them to a local drive (file extension .gz), unzip them (file extension .fa), and keep all file names as they are.

ftp://ftp.ensembl.org/pub/release-64/fasta/homo_sapiens/dna/
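Step 1 can also be scripted. Below is a minimal Python sketch, not what I actually did (I downloaded and unzipped by hand). The file-name pattern `Homo_sapiens.GRCh37.64.dna.chromosome.N.fa.gz` is my assumption about how release 64 names its per-chromosome files, so check it against the FTP directory listing first. The function names are made up for illustration.

```python
import gzip
import shutil
import urllib.request
from pathlib import Path

BASE_URL = "ftp://ftp.ensembl.org/pub/release-64/fasta/homo_sapiens/dna/"
# ASSUMPTION: per-chromosome file-name pattern; verify against the FTP listing.
NAME_PATTERN = "Homo_sapiens.GRCh37.64.dna.chromosome.{chrom}.fa.gz"


def chromosome_names():
    """Return chromosomes 1-22 plus X and Y as strings."""
    return [str(n) for n in range(1, 23)] + ["X", "Y"]


def chromosome_urls():
    """Build one download URL per chromosome from the assumed pattern."""
    return [BASE_URL + NAME_PATTERN.format(chrom=c) for c in chromosome_names()]


def download(url, target_dir):
    """Download one file into target_dir, keeping its original name."""
    target = Path(target_dir) / url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, str(target))
    return target


def gunzip_keep_name(gz_path):
    """Decompress x.fa.gz to x.fa next to it; the rest of the name is unchanged."""
    gz_path = Path(gz_path)
    if gz_path.suffix != ".gz":
        raise ValueError("expected a .gz file: %s" % gz_path)
    fa_path = gz_path.with_suffix("")  # strips only the trailing .gz
    with gzip.open(gz_path, "rb") as src, open(fa_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return fa_path
```

To use it, loop over `chromosome_urls()`, call `download(url, some_folder)` on each, then `gunzip_keep_name` on the result. The .fa files keep their original names, as step 1 requires.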

2. Upload all .fa files to the CLC server, into a folder other than CLC_data: "Human Genome Ensembl (Sept 2011)".

This step takes a while; it is better to upload one file at a time.

3. IMPORT these .fa files into a new folder "Human Genome Ensembl (Sept 2011) Imported" under the CLC_data folder. The files are converted to .clc files and their names are changed to simple numbers.

This step is fast.

4. Define the reference genome: simply follow the manual. Note: at Figure 2.1, "Annotation tracks" is not available because the Ensembl FASTA files carry no annotation, so this option does not apply.

Click "Create sequence track" and "Copy data to new tracks".
Save results to a new folder "Human Genome Ensembl (Sept 2011) Imported Track".

5. Download annotations from Ensembl: simply follow the manual.

Unchecked: COSMIC;
Checked: dbSNP: 1000genomes, HapMap; Clinical/LSDB.

Save results to the same folder "Human Genome Ensembl (Sept 2011) Imported Track".

This step took about 90 minutes.
