Statistics for the metadata in GEO/ENCODE database
The main purpose of TFmapper is to search all experimental ChIP-seq datasets and identify the trans-acting factors or histone modifications that show peaks at a gene of interest or a specified genomic region in a defined biological sample.
Please watch the video in HD (1080p) in full screen for a better viewing experience.
The workflow diagram of TFmapper
1. Peak files in the BED (Browser Extensible Data) format are downloaded from GEO/Cistrome and ENCODE.
2. Peaks are annotated to genomic features (Promoter, TTS, 5’UTR, 3’UTR, Intron, Exon, Intergenic) using the software HOMER with GRCh38/hg38 for human and GRCm38/mm10 for mouse as the reference genomes.
3. The annotation results are stored in a MySQL database. To speed up query processing, the peaks are split by species, source, factor, and chromosome, so that the average number of rows per table is a few million.
4. Users can query the database by 1) gene symbol or 2) genomic coordinates, using GRCh38/hg38 for human and GRCm38/mm10 for mouse.
5. Results can be downloaded in CSV or BED format, and peaks can be visualized directly in the WashU Epigenome Browser or the UCSC Genome Browser.
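The two query modes and the partitioning described in step 3 can be illustrated with a minimal Python sketch. The function names, the `chr:start-end` coordinate syntax, and the table-naming scheme are assumptions made for illustration; they are not taken from TFmapper's implementation.

```python
import re

# Hypothetical coordinate syntax, e.g. "chr8:127,735,434-127,742,951".
# TFmapper's actual input format may differ.
_COORD = re.compile(r"^(chr[0-9A-Za-z_]+):([\d,]+)-([\d,]+)$")


def parse_query(text):
    """Classify a query string as a genomic region or a gene symbol."""
    text = text.strip()
    match = _COORD.match(text)
    if match:
        start = int(match.group(2).replace(",", ""))
        end = int(match.group(3).replace(",", ""))
        if start > end:
            raise ValueError("start must not exceed end")
        return {"type": "region", "chrom": match.group(1),
                "start": start, "end": end}
    # Anything that is not a coordinate string is treated as a gene symbol.
    return {"type": "gene", "symbol": text}


def partition_table(species, source, factor, chrom):
    """Build a hypothetical table name for one partition of the peak data."""
    parts = [species, source, factor, chrom]
    # Strip non-word characters so the name is a plain identifier.
    return "_".join(re.sub(r"\W+", "", part).lower() for part in parts)
```

With a scheme like this, a region query only needs to touch the partitions for the requested chromosome, which is one plausible reason for splitting the peaks this way.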
SampleID: Curated from GEO, CistromeDB, or ENCODE
Factors: The name of the trans-acting factor or histone modification
Visualization: Links to the UCSC Genome Browser and the WashU Epigenome Browser
Sequence: Lets the user retrieve the nucleotide sequence of the selected peak
Distance: From the summit of the selected peak to the promoter of the gene of interest
Score: Fold enrichment score calculated by MACS2
-log10(p value): -log10(p value) calculated by MACS2
-log10(q value): -log10(q value) calculated by MACS2
Attribute: One of seven genomic features, based on RefSeq annotations, is assigned to each peak:
- 1. Promoter: -1 kb to +100 bp of the transcription start site (TSS)
- 2. TTS region: -100 bp to +1 kb of the transcription termination site (TTS)
- 3. Exon
- 4. 5' UTR (untranslated region) Exon
- 5. 3' UTR (untranslated region) Exon
- 6. Intron
- 7. Intergenic regions
Title: The title of the selected sample from GEO
Source Name: The detailed description of the selected sample from GEO
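Two of the fields above can be made concrete with a short Python sketch: the Promoter and TTS windows, and the conversion of a -log10(p value) back to a p value. HOMER performs the actual annotation with its own priority rules, so this only illustrates the stated windows; all function and parameter names are hypothetical.

```python
def signed_offset(pos, anchor, strand):
    """Offset of pos from anchor, positive in the direction of transcription."""
    return pos - anchor if strand == "+" else anchor - pos


def in_promoter(pos, tss, strand):
    """True if pos lies within -1 kb to +100 bp of the TSS."""
    return -1000 <= signed_offset(pos, tss, strand) <= 100


def in_tts_region(pos, tts, strand):
    """True if pos lies within -100 bp to +1 kb of the TTS."""
    return -100 <= signed_offset(pos, tts, strand) <= 1000


def p_value_from_neglog10(value):
    """Convert a -log10(p value) score back to the p value."""
    return 10.0 ** (-value)
```

For a gene on the "-" strand the windows are mirrored, so a position 500 bp to the right of the TSS in genomic coordinates is upstream of it and falls inside the promoter window.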
Direct download links
Users can download the results table as a CSV or BED file.
The CSV file contains all of the fields in the search results.
The BED file contains six fields:
- chrom: The name of the chromosome.
- chromStart: The starting position of the feature in the chromosome.
- chromEnd: The ending position of the feature in the chromosome.
- Factors: The name of the trans-acting factor or histone mark.
- Score: The fold enrichment score calculated by MACS2.
- Strand: "+" strand only
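As a rough illustration of the six-field layout listed above, the following Python sketch formats one peak as a tab-separated BED line. The dictionary keys are hypothetical and do not reflect TFmapper's internal field names.

```python
def to_bed_line(peak):
    """Format one peak (a dict with hypothetical keys) as a six-column BED line."""
    fields = [
        peak["chrom"],
        str(peak["start"]),
        str(peak["end"]),
        peak["factor"],
        str(peak["score"]),  # fold enrichment score from MACS2
        "+",                 # the strand field is always "+"
    ]
    return "\t".join(fields)


example = {"chrom": "chr1", "start": 100, "end": 200,
           "factor": "CTCF", "score": 12.5}
print(to_bed_line(example))
```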
To visualize multiple peaks:
When multiple peaks are selected, a link for visualization in the WashU Epigenome Browser will appear (red arrow 1).
The results can be further filtered by typing in the boxes (figure 1, red arrow 2);
a slider will appear when the box under
which can be moved to narrow down the region.
Back to home page
Go back to the search results by clicking the button at the top left corner of the page.