Skip to content

General advices : msCENTIPEDE on ATAC-seq #11

@PFRoux

Description

@PFRoux

Hi !

I am trying tu use msCENTIPEDE to generate TF footprints for several ATAC-seq data sets i've produced.

I wanted to have general advices on how to achieve the most accurate workflow.

So far, what I am doing is the following :

  1. Call ATAC-peaks with MACS2
  2. Scan the entire genome for a given PWM with STEME with a probability cut-off at 1/1000
  3. Select only motifs falling inside ATAC peaks
  4. Train msCENTIPEDE based on the top 10000 motifs falling inside ATAC peaks
  5. Infer on the whole motif falling inside ATAC peaks

Could you tell me please if I am on the right track ?
Or is my approach somehow biased ?

I was then wondering how to define a cut-off using the statistics reported by msCENTIPEDE to focus only on the most likely bound sites ?
Could you please provide in the README a short description of the output of msCENTIPEDE and how to use LogPosOdds, LogPriorOdds, MultLikeRatio and NegBinLikeRatio ?

Thanks a lot,

Cheers,

Pef

Activity

rajanil

rajanil commented on Dec 10, 2015

@rajanil
Owner

Hi @PFRoux,
For step (4), I would recommend training the model using the top 10000 motifs from the entire genome (with top motifs selected based on a PWM score). In some previous analyses, I had noticed that training the model using only motifs within ATAC-seq or DNase-seq peaks can lead to odd shapes for the cleavage profiles, and poor accuracy. The inference step (5), however, can be restricted to motifs within ATAC peaks, as you do already.

One suggested set of conditions to select the most likely bound sites is
LogPosOdds > 2 AND MultLikeRatio > 1 AND NegBinLikeRatio > 1 .

LogPosOdds is the log of the posterior odds that the TF is bound. (LogPosOdds > 2 is equivalent to the binding posterior > 0.99)
MultLikeRatio is the log likelihood ratio for the cleavage profile model.
NegBinLikeRatio is the log likelihood ratio for the total chromatin accessibility model.
(all logs are base 10)

Hope this helps!

I will add to the README a suggested workflow and descriptions of these quantities. Thanks for the suggestion.

cheers
Anil

PFRoux

PFRoux commented on Dec 11, 2015

@PFRoux
Author

Hi @rajanil,

Thx a lot for your suggestions and for explaining the output of msCENTIPEDE.
I'am now trying to train the model using the top 10000 motifs from the entire genome but it seems that the algorithm failed to define initial parameters. Using exactly the same inputs (except the motif set) I get the following output :

call_binding.py --task learn --protocol ATAC_seq ./pwmscan_hg19_22208_31666.bed.gz ../1-NOBLACKLIST_Bam/D0ATAC_Rep1_DEDUP_NOBLACKLIST_SORT.bam ../1-NOBLACKLIST_Bam/D0ATAC_Rep2_DEDUP_NOBLACKLIST_SORT.bam

loading motifs ...
num of motif sites = 10000
loading read counts ...
transforming data into multiscale representation ...
starting model estimation (restart 1)
Inf in LogLike
initial log likelihood = -inf
Nan in Omega
Traceback (most recent call last):
File "/mount/gensoft2/exe/msCentipede/1.0/bin/call_binding.py", line 345, in
main()
File "/mount/gensoft2/exe/msCentipede/1.0/bin/call_binding.py", line 338, in main
learn_model(options)
File "/mount/gensoft2/exe/msCentipede/1.0/bin/call_binding.py", line 104, in learn_model
background_counts, options.model, options.log_file, options.restarts, options.mintol)
File "mscentipede.pyx", line 1337, in mscentipede.estimate_optimal_model (mscentipede.c:31882)
square_EM(data, scores, zeta, pi, tau,
File "mscentipede.pyx", line 1134, in mscentipede.square_EM (mscentipede.c:28300)
EM(data, scores, zeta, pi, tau, alpha, beta, omega, pi_null, tau_null, model)
File "mscentipede.pyx", line 1062, in mscentipede.EM (mscentipede.c:27829)
omega.update(zeta, alpha)
File "mscentipede.pyx", line 682, in mscentipede.Omega.update (mscentipede.c:20451)
raise ValueError
ValueError

Could you help me please to solve this problem ?

Thanks a lot,

Cheers,

Pef

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Development

      No branches or pull requests

        Participants

        @rajanil@PFRoux

        Issue actions

          General advices : msCENTIPEDE on ATAC-seq · Issue #11 · rajanil/msCentipede