Monday, October 17, 2011

CLC_Genomics_WB: Human Genome from GenBank

1. Download the human genome reference from NCBI: search for NC_0000* with CLC's Search function and download the top 24 sequences to a new folder under the CLC_data folder. Rename each file chr1, chr2, and so on (NC_000023 and NC_000024 are chromosomes X and Y, so those become chrX and chrY). They are all .clc files in CLC now.
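Here is a small sketch of the renaming rule. It assumes the 24 search hits are the RefSeq accessions NC_000001 to NC_000024, with or without a version suffix such as .10. The function name `refseq_to_chr_name` is made up for illustration and is not part of CLC.

```python
"""Hypothetical helper: map human RefSeq chromosome accessions to chrN names."""


def refseq_to_chr_name(accession: str) -> str:
    """Return 'chr1'..'chr22', 'chrX' or 'chrY' for NC_000001..NC_000024.

    A version suffix such as '.10' is ignored.
    """
    base = accession.split(".")[0]
    if not base.startswith("NC_0000"):
        raise ValueError(f"not a human chromosome accession: {accession}")
    number = int(base[len("NC_"):])  # "000023" -> 23
    if 1 <= number <= 22:
        return f"chr{number}"
    if number == 23:
        return "chrX"
    if number == 24:
        return "chrY"
    raise ValueError(f"outside NC_000001..NC_000024: {accession}")


if __name__ == "__main__":
    # Print the expected name for each of the 24 accessions.
    for acc in (f"NC_{n:06d}" for n in range(1, 25)):
        print(acc, "->", refseq_to_chr_name(acc))
```

Raising an error on anything else, such as the mitochondrial accession NC_012920, helps catch a stray hit among the "top 24".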

2. Annotate_with_GTF: download the GTF file of the human genome from the UCSC Genome Browser (follow the manual and choose ALL_SNP_132). Save it under the folder "humansnps(ALL_SNP_132)_1010", unzip it, and give it a .gtf file extension. During annotation, this GTF file is matched to the 24 chromosomes above.

Simply follow the manual. The annotation is added to the original .clc files of the 24 chromosomes, which are then ready for mapping reads to the reference.
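The match in step 2 probably works because step 1 renamed the chromosomes chr1 through chrY, which is how UCSC names its sequences. The following is a hypothetical check in plain Python, not a CLC feature. It lists any GTF sequence names (column 1) that have no counterpart among the 24 references.

```python
"""Hypothetical sanity check: do the GTF sequence names match the reference?"""

import sys
from collections import Counter
from typing import Iterable


def gtf_seqname_counts(lines: Iterable[str]) -> Counter:
    """Count features per sequence name (column 1) in GTF lines."""
    counts: Counter = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
        counts[fields[0]] += 1
    return counts


def unmatched_seqnames(counts: Counter, reference_names: set) -> set:
    """Return GTF sequence names that have no matching reference sequence."""
    return set(counts) - reference_names


if __name__ == "__main__" and len(sys.argv) > 1:
    reference = {f"chr{i}" for i in range(1, 23)} | {"chrX", "chrY"}
    with open(sys.argv[1]) as handle:  # path to the unzipped .gtf file
        counts = gtf_seqname_counts(handle)
    print("unmatched:", sorted(unmatched_seqnames(counts, reference)))
```

Run it as `python check_gtf.py snp132.gtf`; both file names are placeholders. If the export contains extra contigs, such as alternate haplotypes, they will appear in the unmatched list.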

3. Map reads to reference: follow the manual "Genomics_Gateway_User_Manual"; select "Homo sapiens tracks"; uncheck "Add tracks to existing track set".

This step took about 24 hours.

4. A new track, "Homo sapiens reads track", is saved under the same folder.

5. SNP detection: use the default settings, except set the minimum coverage to 100.
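If the SNP table is exported (for example, to CSV), the same threshold can be re-applied outside CLC. This is only a sketch. The column name `Coverage` and the inclusive comparison (coverage >= 100) are assumptions, not confirmed CLC behaviour.

```python
"""Hypothetical post-filter mirroring a minimum-coverage threshold of 100."""

import csv
import io
from typing import Dict, Iterable, List

MIN_COVERAGE = 100  # the threshold used in step 5


def filter_by_coverage(
    rows: Iterable[Dict[str, str]],
    column: str = "Coverage",  # assumed column name in the exported table
    min_coverage: int = MIN_COVERAGE,
) -> List[Dict[str, str]]:
    """Keep rows whose coverage is at least min_coverage (inclusive)."""
    return [row for row in rows if int(row[column]) >= min_coverage]


if __name__ == "__main__":
    # Made-up three-row table, for illustration only.
    demo = "Chromosome,Position,Coverage\nchr1,100,99\nchr1,200,100\nchr2,300,250\n"
    for row in filter_by_coverage(csv.DictReader(io.StringIO(demo))):
        print(row["Chromosome"], row["Position"], row["Coverage"])
```

With the demo table, only the rows at positions 200 and 300 are kept.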

CLC_Genomics_WB: Human Genome from Ensembl

Bob updated CLC to version 4.8 today and upgraded three plugins as well.

1. Download the human genome FASTA files from the Ensembl FTP site: chromosomes 1-22 plus X and Y. Save them to a local drive (the file extension is .gz), unzip them (the extension becomes .fa), and keep all file names as they are.

ftp://ftp.ensembl.org/pub/release-64/fasta/homo_sapiens/dna/
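The download-and-unzip step can also be scripted. Here is a minimal sketch that uses only the Python standard library. The file-name pattern `Homo_sapiens.GRCh37.64.dna.chromosome.N.fa.gz` is an assumption about the release-64 directory, so check it against the FTP listing before use.

```python
"""Hypothetical download helper for the Ensembl release-64 chromosome FASTA files."""

import gzip
import shutil
import urllib.request
from pathlib import Path

BASE_URL = "ftp://ftp.ensembl.org/pub/release-64/fasta/homo_sapiens/dna/"
# Assumed file-name pattern; verify it against the directory listing first.
NAME_PATTERN = "Homo_sapiens.GRCh37.64.dna.chromosome.{}.fa.gz"
CHROMOSOMES = [str(n) for n in range(1, 23)] + ["X", "Y"]


def chromosome_urls() -> list:
    """Return the 24 expected URLs: chromosomes 1-22, X and Y."""
    return [BASE_URL + NAME_PATTERN.format(c) for c in CHROMOSOMES]


def gunzip(gz_path: Path) -> Path:
    """Decompress name.fa.gz to name.fa next to it, keeping the original name."""
    out_path = gz_path.with_suffix("")  # drops only the trailing .gz
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path


def download_all(dest: Path) -> None:
    """Download and unzip each file, one at a time."""
    dest.mkdir(parents=True, exist_ok=True)
    for url in chromosome_urls():
        gz_path = dest / url.rsplit("/", 1)[1]
        urllib.request.urlretrieve(url, gz_path)  # urllib handles ftp:// URLs
        gunzip(gz_path)
```

Calling `download_all(Path("ensembl_release64"))` would fetch and unzip the files one by one; the destination folder name is a placeholder.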

2. Upload all .fa files to the CLC server, into a folder other than CLC_data: "Human Genome Ensembl (Sept 2011)".

This step takes a while; it is better to upload the files one at a time.

3. IMPORT these .fa files into a new folder, "Human Genome Ensembl (Sept 2011) Imported", under the CLC_data folder. The files are converted to .clc files, and their names are changed to simple numbers.
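Ensembl FASTA headers begin with the bare chromosome name (for example, `>1 dna:chromosome chromosome:GRCh37:1:1:249250621:1`). That is probably where the simple numeric names after import come from. Below is a hypothetical check, not part of CLC, that prints each record's name and length so all 24 files can be confirmed before uploading and importing them.

```python
"""Hypothetical pre-import check: list the ID and length of each FASTA record."""

import sys
from typing import Iterable, Iterator, Tuple


def fasta_summary(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (first header token, sequence length) for each record."""
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length  # finish the previous record
            name, length = (line[1:].split() or [""])[0], 0
        elif line:
            length += len(line)  # count bases, ignoring line breaks
    if name is not None:
        yield name, length


if __name__ == "__main__" and len(sys.argv) > 1:
    for path in sys.argv[1:]:  # e.g. the 24 unzipped .fa files
        with open(path) as handle:
            for name, length in fasta_summary(handle):
                print(f"{path}\t{name}\t{length}")
```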

This step is fast.

4. Define the reference genome: simply follow the manual. Note: in Figure 2.1, "Annotation tracks" is not available because Ensembl FASTA files carry no annotation, so that option does not apply.

Click "Create sequence track" and "Copy data to new tracks".
Save results to a new folder "Human Genome Ensembl (Sept 2011) Imported Track".

5. Download annotations from Ensembl: simply follow the manual.

Unchecked COSMIC;
Checked dbSNP: 1000genomes, HapMap; Clinical/LSDB.

Save results to the same folder "Human Genome Ensembl (Sept 2011) Imported Track".

This step took about 90 minutes.