learning, regularization, and optimization for computational biology

Transcript

alessandro verri
DIBRIS, Università di Genova
sestri levante, september 11
summary
motivation
the problem
the proposed framework
results
conclusions
motivation
learning through regularization: a fruitful approach
the case of computational biology
need for non-standard penalties for better modeling
optimization comes to the rescue
the problem
metastatic neuroblastoma (NB) occurs in pediatric patients at stage
4S or stage 4
tumors of stage 4 NB contain several structural Copy Number
Aberrations (less frequent than those in stage 4S)
CNAs of stage 4 tumors are associated with highly aggressive disease
genomic instability appears to play a critical role in the genesis of
these tumors
goal: use high-resolution array comparative genomic hybridization
of 190 metastatic NBs to build a tumorigenesis model
array-based Comparative Genomic Hybridization (aCGH)
Data example
aCGH data analysis
Data Analysis: example
The proposed statistical model A dictionary based approach
General Setting
We are given S ∈ N samples y_1, y_2, …, y_S with y_s ∈ R^L.
We seek J simple atoms (β_j)_{1≤j≤J} which possibly give a complete representation of all the samples, in the sense that

    y_s ≈ Σ_{j=1}^{J} θ_{js} β_j,   ∀ s = 1, …, S,

for some coefficients θ_{js} ∈ R.
ŷ_s = Σ_{j=1}^{J} θ_{js} β_j provides a smoothed version of y_s which makes alterations more easily detectable.
The atoms (β_j)_{1≤j≤J} together with the coefficients (θ_s)_{1≤s≤S} possibly indicate principal and shared alterations in the data.
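The representation above can be written compactly as a matrix product Y ≈ BΘ, with the atoms β_j as columns of B and the coefficients θ_{js} as entries of Θ. A minimal numpy sketch (the sizes and random data are illustrative, not the real aCGH data):

```python
import numpy as np

# Hypothetical sizes: S samples of length L, represented by J atoms.
rng = np.random.default_rng(0)
S, L, J = 20, 1000, 5

B = rng.standard_normal((L, J))   # atoms beta_j as columns of B
Theta = rng.random((J, S))        # coefficients theta_js

# Smoothed reconstruction: y_hat_s = sum_j theta_js * beta_j,
# i.e. the matrix product B @ Theta, one column per sample.
Y_hat = B @ Theta
print(Y_hat.shape)  # (1000, 20)
```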
The proposed statistical model A dictionary based approach
The starting point
    min_{θ_s, β_j}  (1/2) Σ_{s=1}^{S} ‖ y_s − Σ_{j=1}^{J} θ_{js} β_j ‖² + λ Σ_{j=1}^{J} ‖β_j‖₁ + µ Σ_{j=1}^{J} TV(β_j)

    s.t.  Σ_{s=1}^{S} θ_{js}² ≤ 1,  for all j = 1, …, J, and fixed λ > 0 and µ > 0

This model has been proposed in [?].
The penalty λ‖β_j‖₁ + µ TV(β_j) is called fused lasso and forces the solution atoms to be sparse and piecewise constant.
The hard constraints on the coefficients θ_{·j} are imposed for consistency and identifiability of the model.
Chromosomes are analyzed one by one. As a consequence, a different family of atoms is obtained for each chromosome.
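The fused lasso penalty on a single atom can be evaluated directly: an ℓ1 term plus the total variation, i.e. the sum of absolute differences between consecutive entries. A small sketch (the function name and values are illustrative):

```python
import numpy as np

def fused_lasso_penalty(beta, lam, mu):
    """lam * ||beta||_1 + mu * TV(beta), with TV the sum of
    absolute differences between consecutive entries."""
    l1 = np.sum(np.abs(beta))
    tv = np.sum(np.abs(np.diff(beta)))
    return lam * l1 + mu * tv

# A piecewise-constant atom pays for its two jumps only:
flat = np.array([0., 0., 1., 1., 1., 0.])
print(fused_lasso_penalty(flat, lam=0.1, mu=1.0))  # 0.1*3 + 1.0*2 = 2.3
```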
The proposed statistical model A dictionary based approach
The proposed model I
    min_{θ_s, β_j}  (1/2) Σ_{s=1}^{S} ‖ y_s − Σ_{j=1}^{J} θ_{js} β_j ‖² + λ Σ_{j=1}^{J} ‖β_j‖₁² + µ Σ_{j=1}^{J} TV_w(β_j) + τ Σ_{s=1}^{S} ‖θ_s‖₁²

    s.t.  0 ≤ θ_{js} ≤ 1,  ∀ j = 1, …, J

weighted total variation TV_w(β_j) = Σ_{l=1}^{L−1} w_l |β_{l+1,j} − β_{l,j}|;
coefficients are constrained to be positive: θ_{js} ≥ 0;
structured sparsity along the columns of the matrix of atoms [β_1, …, β_J] and of the coefficients [θ_1, …, θ_S] too.
The proposed statistical model A dictionary based approach
The proposed model II
    min_{θ_s, β_j}  (1/2) Σ_{s=1}^{S} ‖ y_s − Σ_{j=1}^{J} θ_{js} β_j ‖² + λ Σ_{j=1}^{J} ‖β_j‖₁² + µ Σ_{j=1}^{J} TV_w(β_j) + τ Σ_{s=1}^{S} ‖θ_s‖₁²

    s.t.  0 ≤ θ_{js} ≤ 1,  ∀ j = 1, …, J

Improvements:
TV_w allows relaxing, at some points, the constraint of “small variations” on the atoms.
The signal of the entire genome can now be analyzed at once, still guaranteeing independence for each chromosome (at the end points of chromosomes the weights are set to zero).
Concomitant alterations occurring on different chromosomes can be identified.
The coefficients are constrained to be positive. This reduces the complexity of the matrix [θ_{js}] and forces the atoms β_j to be more informative.
The proposed statistical model A dictionary based approach
General Setting
Problem
    min_{B,Θ}  (1/2) ‖Y − BΘ‖²_F + h(Θ) + g(B)

where

    g(B) = λ Σ_{j=1}^{J} ‖B(:, j)‖₁² + µ Σ_{j=1}^{J} TV_w(B(:, j)),

    h(Θ) = δ_{∆^{S×J}}(Θ) + τ Σ_{s=1}^{S} ‖Θ(:, s)‖₁²,

and ∆ = [0, 1].
The proposed statistical model A dictionary based approach
Convergence Theorem
Problem

    min_{B,Θ}  (1/2) ‖Y − BΘ‖²_F + h(Θ) + g(B)    (1)

Defining the partial functions

    φ_B(Θ) = (1/2) ‖Y − BΘ‖²_F + h(Θ),    ψ_Θ(B) = (1/2) ‖Y − BΘ‖²_F + g(B)

Theorem (Attouch et al. 2010)
Let η_k, ζ_k ∈ [ρ₁, ρ₂] with 0 < ρ₁ ≤ ρ₂ < +∞, and iterate

    Θ_{k+1} ≈_{ε_n} prox_{η_k φ_{B_k}}(Θ_k) = argmin_Θ φ_{B_k}(Θ) + (1/(2η_k)) ‖Θ − Θ_k‖²_F,
    B_{k+1} ≈_{ε_n} prox_{ζ_k ψ_{Θ_{k+1}}}(B_k) = argmin_B ψ_{Θ_{k+1}}(B) + (1/(2ζ_k)) ‖B − B_k‖²_F.

If ε_n = 0, then (B_k, Θ_k) → (B̂, Θ̂), a stationary point of (1).
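The alternating scheme can be sketched numerically. This is a simplified illustration, not the authors' exact algorithm: the nonsmooth penalties are dropped (λ = µ = τ = 0), so the Θ-step reduces to a gradient step on the quadratic fit term followed by projection onto [0, 1], and the B-step to a plain gradient step; all sizes and data are synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)
S, L, J = 10, 200, 3

# Synthetic ground truth: piecewise-constant atoms, coefficients in [0, 1].
B_true = np.zeros((L, J))
B_true[20:60, 0], B_true[90:140, 1], B_true[150:190, 2] = 1.0, -1.0, 2.0
Theta_true = rng.random((J, S))
Y = B_true @ Theta_true + 0.05 * rng.standard_normal((L, S))

def objective(B, Theta):
    return 0.5 * np.linalg.norm(Y - B @ Theta, "fro") ** 2

B = rng.standard_normal((L, J))
Theta = rng.random((J, S))
obj0 = objective(B, Theta)
for _ in range(200):
    # Theta-step: gradient step with step 1/Lipschitz, then box projection.
    step = 1.0 / (np.linalg.norm(B, 2) ** 2 + 1e-12)
    Theta = np.clip(Theta - step * B.T @ (B @ Theta - Y), 0.0, 1.0)
    # B-step: gradient step on the fit term (penalties omitted in this sketch).
    step = 1.0 / (np.linalg.norm(Theta, 2) ** 2 + 1e-12)
    B = B - step * (B @ Theta - Y) @ Theta.T

print(objective(B, Theta) < obj0)  # True: alternating descent reduces the fit
```

Each block update uses a step size of one over the Lipschitz constant of the partial gradient, which guarantees the objective never increases.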
Numerical Experiments Data generation
Signal model
    y_{ls} = µ_{ls} + ε_{ls},    µ_{ls} = Σ_{m=1}^{M_s} c_{ms} 1_{[l_{ms}, l_{ms}+k_{ms}]}[l],    ε_{ls} ∼ N(0, σ²),

µ_{·s} is the mean signal, σ is the standard deviation of the noise ε_{ls};
M_s is the number of segments (regions of CNVs) for sample s;
c_{ms}, l_{ms} and k_{ms} are the height, starting position and length, respectively, of each segment.
We choose:
L = 1000, S = 20, σ ∈ {1, 2},
M_s ∈ {1, 2, 3, 4, 5}, c_{ms} ∈ {±1, ±2, ±3, ±4, ±5},
l_{ms} ∈ {1, …, L − 100} and k_{ms} ∈ {5, 10, 20, 50, 100}.
The signal model and numbers follow [?].
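The signal model above translates directly into a generator. A sketch using the stated parameter choices, fixing σ = 1 for concreteness:

```python
import numpy as np

rng = np.random.default_rng(0)
L, S, sigma = 1000, 20, 1.0

def sample_signal():
    """One synthetic profile: piecewise-constant mean plus Gaussian noise."""
    mu = np.zeros(L)
    M = rng.integers(1, 6)                                     # M_s in {1,...,5}
    for _ in range(M):
        c = rng.choice([-5, -4, -3, -2, -1, 1, 2, 3, 4, 5])    # height c_ms
        start = rng.integers(0, L - 100)                       # position l_ms
        k = rng.choice([5, 10, 20, 50, 100])                   # length k_ms
        mu[start:start + k] += c
    return mu + sigma * rng.standard_normal(L)

Y = np.column_stack([sample_signal() for _ in range(S)])
print(Y.shape)  # (1000, 20)
```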
Numerical Experiments Data generation
Synthetic Dataset
This dataset is explicitly designed to mimic a real signal composed of different chromosomes. We build two classes of samples, A and B, and a third randomly built class.
[Figure: class A and class B sample profiles over chr1–chr4]
Numerical Experiments Parameters selection
The Bayesian information criterion
The choice of the parameters (J, λ, µ, τ) is done according to the Bayesian information criterion (BIC) [?]. The BIC mitigates the problem of overfitting by introducing a penalty term for the complexity of the model. In our case the BIC is written as:

    (SL) · log( ‖Y − BΘ‖²_F / (SL) ) + k(B) log(SL),

where k(B) is computed as the number of jumps in B and ultimately depends on the parameters (J, λ, µ, τ).
After some standard normalizations, we choose:
J ∈ {5, 10, 15, 20},
λ ∈ {10⁻⁴, 10⁻³, 10⁻², 10⁻¹}, µ ∈ {1, 5, 10},
τ ∈ {10⁻³, 10⁻², 10⁻¹, 1}.
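The BIC above can be sketched as a function of the estimated factors. Counting k(B) as the number of nonzero first differences along each atom is an assumption consistent with "the number of jumps in B":

```python
import numpy as np

def bic(Y, B, Theta):
    """SL * log(||Y - B Theta||_F^2 / SL) + k(B) * log(SL),
    with k(B) the number of jumps (nonzero first differences) in B."""
    L, S = Y.shape
    n = S * L
    rss = np.linalg.norm(Y - B @ Theta, "fro") ** 2
    k = int(np.count_nonzero(np.diff(B, axis=0)))
    return n * np.log(rss / n) + k * np.log(n)

# Toy usage: one piecewise-constant atom, random data.
rng = np.random.default_rng(0)
Y = rng.standard_normal((100, 5))
B = np.zeros((100, 2))
B[40:60, 0] = 1.0
Theta = rng.random((2, 5))
print(np.isfinite(bic(Y, B, Theta)))  # True
```

In a grid search over (J, λ, µ, τ), one would fit the model for each combination and keep the one with the smallest BIC.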
Numerical Experiments Parameters selection
ROC curves
True Positive rate vs False Positive rate
Numerical Experiments Parameters selection
Estimated Y and B
[Figures: Solution 1 and Solution 2, as estimated by FFlat [?] and by our proposal [?]]
Numerical Experiments Parameters selection
original and reconstructed aCGH data
Numerical Experiments Parameters selection
reconstructed coefficient matrices
Numerical Experiments Parameters selection
phylogenetic trees
References
S. Salzo and S. Villa. Inexact and accelerated proximal point algorithms. Journal of Convex Analysis 19, No. 4, 1167–1192, 2012.
S. Villa, S. Salzo, L. Baldassarre and A. Verri. Accelerated and inexact forward-backward algorithms. SIOPT, 2013.
S. Salzo, S. Masecchia, A. Verri and A. Barla. Alternating Proximal Regularized Dictionary Learning. NECO, 2014.
S. Masecchia, S. Coco, A. Barla, A. Verri and G. P. Tonini. Genome instability model of metastatic neuroblastoma tumorigenesis by a dictionary learning algorithm. Submitted.
Thank you