learning, regularization, and optimization for computational biology

Transcript

alessandro verri
DIBRIS, Università di Genova
sestri levante, september 11
summary
motivation
the problem
the proposed framework
results
conclusions
motivation
learning through regularization: a fruitful approach
the case of computational biology
need for non-standard penalties for better modeling
optimization comes to the rescue
the problem
metastatic neuroblastoma (NB) occurs in pediatric patients at stage
4S or stage 4
tumors of stage 4 NB contain several structural Copy Number
Aberrations (less frequent than those in stage 4S)
CNAs of stage 4 tumors are associated with highly aggressive disease
genomic instability appears to play a critical role in the genesis of
these tumors
goal: use high-resolution array comparative genomic hybridization
of 190 metastatic NBs to build a tumorigenesis model
array-based Comparative Genomic Hybridization (aCGH)
Data example
aCGH data analysis
Data Analysis: example
The proposed statistical model A dictionary based approach
General Setting
We are given S ∈ N samples y_1, y_2, …, y_S with y_s ∈ R^L.
We seek J simple atoms (β_j)_{1≤j≤J} which possibly give a complete representation of all the samples, in the sense that

    y_s ≈ Σ_{j=1}^{J} θ_{js} β_j,   ∀ s = 1, …, S,

for some coefficients θ_{js} ∈ R.
ŷ_s = Σ_{j=1}^{J} θ_{js} β_j provides a smoothed version of y_s which makes alterations more easily detectable.
The atoms (β_j)_{1≤j≤J} together with the coefficients (θ_s)_{1≤s≤S} possibly indicate principal and shared alterations in the data.
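The representation above can be written compactly as a matrix product Y ≈ BΘ, with the atoms β_j as columns of B and the coefficients θ_{js} as entries of Θ. A minimal numpy sketch (the sizes and random data are illustrative, not the real aCGH data):

```python
import numpy as np

# Hypothetical sizes: S samples of length L, represented by J atoms.
rng = np.random.default_rng(0)
S, L, J = 20, 1000, 5

B = rng.standard_normal((L, J))   # atoms beta_j as columns of B
Theta = rng.random((J, S))        # coefficients theta_js

# Smoothed reconstruction: y_hat_s = sum_j theta_js * beta_j,
# i.e. the matrix product B @ Theta, one column per sample.
Y_hat = B @ Theta
print(Y_hat.shape)  # (1000, 20)
```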
The proposed statistical model A dictionary based approach
The starting point
    min_{θ_s, β_j}  (1/2) Σ_{s=1}^{S} ‖ y_s − Σ_{j=1}^{J} θ_{js} β_j ‖² + λ Σ_{j=1}^{J} ‖β_j‖₁ + µ Σ_{j=1}^{J} TV(β_j)

    s.t.  Σ_{s=1}^{S} θ_{js}² ≤ 1,  for all j = 1, …, J, and fixed λ > 0 and µ > 0

This model has been proposed in [?].
The penalty λ‖β_j‖₁ + µ TV(β_j) is called fused lasso and forces the solution atoms to be sparse and piecewise constant.
The hard constraints on the coefficients θ_{·j} are imposed for consistency and identifiability of the model.
Chromosomes are analyzed one by one. As a consequence, a different family of atoms is obtained for each chromosome.
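The fused lasso penalty on a single atom can be evaluated directly: an ℓ1 term plus the total variation, i.e. the sum of absolute differences between consecutive entries. A small sketch (the function name and values are illustrative):

```python
import numpy as np

def fused_lasso_penalty(beta, lam, mu):
    """lam * ||beta||_1 + mu * TV(beta), with TV the sum of
    absolute differences between consecutive entries."""
    l1 = np.sum(np.abs(beta))
    tv = np.sum(np.abs(np.diff(beta)))
    return lam * l1 + mu * tv

# A piecewise-constant atom pays for its two jumps only:
flat = np.array([0., 0., 1., 1., 1., 0.])
print(fused_lasso_penalty(flat, lam=0.1, mu=1.0))  # 0.1*3 + 1.0*2 = 2.3
```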
The proposed statistical model A dictionary based approach
The proposed model I
    min_{θ_s, β_j}  (1/2) Σ_{s=1}^{S} ‖ y_s − Σ_{j=1}^{J} θ_{js} β_j ‖² + λ Σ_{j=1}^{J} ‖β_j‖₁² + µ Σ_{j=1}^{J} TV_w(β_j) + τ Σ_{s=1}^{S} ‖θ_s‖₁²

    s.t.  0 ≤ θ_{js} ≤ 1,  ∀ j = 1, …, J

weighted total variation TV_w(β_j) = Σ_{l=1}^{L−1} w_l |β_{l+1,j} − β_{l,j}|;
coefficients are constrained to be positive: θ_{js} ≥ 0;
structured sparsity along the columns of the matrix of atoms [β_1, …, β_J] and of the coefficients [θ_1, …, θ_S] too.
The proposed statistical model A dictionary based approach
The proposed model II
    min_{θ_s, β_j}  (1/2) Σ_{s=1}^{S} ‖ y_s − Σ_{j=1}^{J} θ_{js} β_j ‖² + λ Σ_{j=1}^{J} ‖β_j‖₁² + µ Σ_{j=1}^{J} TV_w(β_j) + τ Σ_{s=1}^{S} ‖θ_s‖₁²

    s.t.  0 ≤ θ_{js} ≤ 1,  ∀ j = 1, …, J

Improvements:
TV_w allows relaxing, at some points, the constraint of “small variations” on the atoms.
The signal of the entire genome can now be analyzed at once, still guaranteeing independence for each chromosome (at the end points of chromosomes the weights are set to zero).
Concomitant alterations occurring on different chromosomes can be identified.
The coefficients are constrained to be positive. This reduces the complexity of the matrix [θ_{js}] and forces the atoms β_j to be more informative.
The proposed statistical model A dictionary based approach
General Setting
Problem
    min_{B,Θ}  (1/2) ‖Y − BΘ‖²_F + h(Θ) + g(B)

where

    g(B) = λ Σ_{j=1}^{J} ‖B(:, j)‖₁² + µ Σ_{j=1}^{J} TV_w(B(:, j)),

    h(Θ) = δ_{∆^{S×J}}(Θ) + τ Σ_{s=1}^{S} ‖Θ(:, s)‖₁²,

and ∆ = [0, 1].
The proposed statistical model A dictionary based approach
Convergence Theorem
Problem

    min_{B,Θ}  (1/2) ‖Y − BΘ‖²_F + h(Θ) + g(B)    (1)

Defining the partial functions

    φ_B(Θ) = (1/2) ‖Y − BΘ‖²_F + h(Θ),    ψ_Θ(B) = (1/2) ‖Y − BΘ‖²_F + g(B)

Theorem (Attouch et al. 2010)
Let η_k, ζ_k ∈ [ρ₁, ρ₂] with 0 < ρ₁ ≤ ρ₂ < +∞, and iterate

    Θ_{k+1} ≈_{ε_n} prox_{η_k φ_{B_k}}(Θ_k) = argmin_Θ φ_{B_k}(Θ) + (1/(2η_k)) ‖Θ − Θ_k‖²_F,
    B_{k+1} ≈_{ε_n} prox_{ζ_k ψ_{Θ_{k+1}}}(B_k) = argmin_B ψ_{Θ_{k+1}}(B) + (1/(2ζ_k)) ‖B − B_k‖²_F.

If ε_n = 0, then (B_k, Θ_k) → (B̂, Θ̂), a stationary point of (1).
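The alternating scheme can be sketched numerically. This is a simplified illustration, not the authors' exact algorithm: the nonsmooth penalties are dropped (λ = µ = τ = 0), so the Θ-step reduces to a gradient step on the quadratic fit term followed by projection onto [0, 1], and the B-step to a plain gradient step; all sizes and data are synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)
S, L, J = 10, 200, 3

# Synthetic ground truth: piecewise-constant atoms, coefficients in [0, 1].
B_true = np.zeros((L, J))
B_true[20:60, 0], B_true[90:140, 1], B_true[150:190, 2] = 1.0, -1.0, 2.0
Theta_true = rng.random((J, S))
Y = B_true @ Theta_true + 0.05 * rng.standard_normal((L, S))

def objective(B, Theta):
    return 0.5 * np.linalg.norm(Y - B @ Theta, "fro") ** 2

B = rng.standard_normal((L, J))
Theta = rng.random((J, S))
obj0 = objective(B, Theta)
for _ in range(200):
    # Theta-step: gradient step with step 1/Lipschitz, then box projection.
    step = 1.0 / (np.linalg.norm(B, 2) ** 2 + 1e-12)
    Theta = np.clip(Theta - step * B.T @ (B @ Theta - Y), 0.0, 1.0)
    # B-step: gradient step on the fit term (penalties omitted in this sketch).
    step = 1.0 / (np.linalg.norm(Theta, 2) ** 2 + 1e-12)
    B = B - step * (B @ Theta - Y) @ Theta.T

print(objective(B, Theta) < obj0)  # True: alternating descent reduces the fit
```

Each block update uses a step size of one over the Lipschitz constant of the partial gradient, which guarantees the objective never increases.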
Numerical Experiments Data generation
Signal model
    y_{ls} = µ_{ls} + ε_{ls},    µ_{ls} = Σ_{m=1}^{M_s} c_{ms} 1_{[l_{ms}, l_{ms}+k_{ms}]}[l],    ε_{ls} ∼ N(0, σ²),

µ_{·s} is the mean signal, σ is the standard deviation of the noise ε_{ls};
M_s is the number of segments (regions of CNVs) for sample s;
c_{ms}, l_{ms} and k_{ms} are the height, starting position and length, respectively, of each segment.
We choose:
L = 1000, S = 20, σ ∈ {1, 2},
M_s ∈ {1, 2, 3, 4, 5}, c_{ms} ∈ {±1, ±2, ±3, ±4, ±5},
l_{ms} ∈ {1, …, L − 100} and k_{ms} ∈ {5, 10, 20, 50, 100}.
The signal model and numbers follow [?].
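The signal model above translates directly into a generator. A sketch using the stated parameter choices, fixing σ = 1 for concreteness:

```python
import numpy as np

rng = np.random.default_rng(0)
L, S, sigma = 1000, 20, 1.0

def sample_signal():
    """One synthetic profile: piecewise-constant mean plus Gaussian noise."""
    mu = np.zeros(L)
    M = rng.integers(1, 6)                                     # M_s in {1,...,5}
    for _ in range(M):
        c = rng.choice([-5, -4, -3, -2, -1, 1, 2, 3, 4, 5])    # height c_ms
        start = rng.integers(0, L - 100)                       # position l_ms
        k = rng.choice([5, 10, 20, 50, 100])                   # length k_ms
        mu[start:start + k] += c
    return mu + sigma * rng.standard_normal(L)

Y = np.column_stack([sample_signal() for _ in range(S)])
print(Y.shape)  # (1000, 20)
```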
Numerical Experiments Data generation
Synthetic Dataset
This dataset is explicitly designed to mimic a real signal composed of different chromosomes. We build two classes of samples, A and B, and a third randomly built class.
[Figure: class A and class B sample profiles over chr1–chr4]
Numerical Experiments Parameters selection
The Bayesian information criterion
The choice of the parameters (J, λ, µ, τ) is done according to the Bayesian information criterion (BIC) [?]. The BIC mitigates the problem of overfitting by introducing a penalty term for the complexity of the model. In our case the BIC is written as:

    (SL) · log( ‖Y − BΘ‖²_F / (SL) ) + k(B) log(SL),

where k(B) is computed as the number of jumps in B and ultimately depends on the parameters (J, λ, µ, τ).
After some standard normalizations, we choose:
J ∈ {5, 10, 15, 20},
λ ∈ {10⁻⁴, 10⁻³, 10⁻², 10⁻¹}, µ ∈ {1, 5, 10},
τ ∈ {10⁻³, 10⁻², 10⁻¹, 1}.
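The BIC above can be sketched as a function of the estimated factors. Counting k(B) as the number of nonzero first differences along each atom is an assumption consistent with "the number of jumps in B":

```python
import numpy as np

def bic(Y, B, Theta):
    """SL * log(||Y - B Theta||_F^2 / SL) + k(B) * log(SL),
    with k(B) the number of jumps (nonzero first differences) in B."""
    L, S = Y.shape
    n = S * L
    rss = np.linalg.norm(Y - B @ Theta, "fro") ** 2
    k = int(np.count_nonzero(np.diff(B, axis=0)))
    return n * np.log(rss / n) + k * np.log(n)

# Toy usage: one piecewise-constant atom, random data.
rng = np.random.default_rng(0)
Y = rng.standard_normal((100, 5))
B = np.zeros((100, 2))
B[40:60, 0] = 1.0
Theta = rng.random((2, 5))
print(np.isfinite(bic(Y, B, Theta)))  # True
```

In a grid search over (J, λ, µ, τ), one would fit the model for each combination and keep the one with the smallest BIC.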
Numerical Experiments Parameters selection
ROC curves
True Positive rate vs False Positive rate
Numerical Experiments Parameters selection
Estimated Y and B
[Figures: Solution 1 and Solution 2, as estimated by FFlat [?] and by our proposal [?]]
Numerical Experiments Parameters selection
original and reconstructed aCGH data
Numerical Experiments Parameters selection
reconstructed coefficient matrices
Numerical Experiments Parameters selection
phylogenetic trees
References
S. Salzo and S. Villa. Inexact and accelerated proximal point algorithms. Journal of Convex Analysis 19, No. 4, 1167–1192, 2012.
S. Villa, S. Salzo, L. Baldassarre and A. Verri. Accelerated and inexact forward-backward algorithms. SIOPT, 2013.
S. Salzo, S. Masecchia, A. Verri and A. Barla. Alternating Proximal Regularized Dictionary Learning. NECO, 2014.
S. Masecchia, S. Coco, A. Barla, A. Verri and G. P. Tonini. Genome instability model of metastatic neuroblastoma tumorigenesis by a dictionary learning algorithm. Submitted.
Thank you