# Scalability of delays in input queued switches

## Transcript

```Scalability of delays in input queued switches
Paolo Giaccone
Notes for the class on “Router and Switch Architectures”
Politecnico di Torino
December 2013
Scalability of delays
N × N switch
Key question
How does the average delay W scale with N, when N → ∞?
Assumptions
- Bernoulli i.i.d. arrivals with rate $\lambda_{ij} \in [0, 1]$ cell/slot at input $i$ for output $j$
- Admissible traffic: $\rho \in (0, 1)$ and
  $$\sum_k \lambda_{kj} \le \rho \quad \forall j, \qquad \sum_k \lambda_{ik} \le \rho \quad \forall i$$
P. Giaccone (Politecnico di Torino)
Delay and frame scheduling
Dec. 2013
2 / 23
Output Queued (OQ) switch
Assume uniform traffic: $\lambda_{ij} = \rho/N$
Delay of an OQ switch
$$W^{OQ} = 1 + \left(1 - \frac{1}{N}\right) \frac{\rho}{2(1-\rho)} \approx 1 + \frac{\rho}{2(1-\rho)} = O(1) \qquad (1)$$
Proof: each output queue is a slotted M/D/1 queue with binomial
(N, ρ/N) arrivals per slot and service time equal to one slot
Scheduling in Input Queued (IQ) switches
Queue-independent policies
- Arrival rates are known
- Frame scheduling
  - Random Frame scheduler
  - Periodic Frame scheduler
Queue-aware policies
- Arrival rates are unknown
- Slot-by-slot schedulers
  - e.g.: MWM, iSLIP
- Queue-aware frame scheduler
Random Frame scheduling
Assume non-uniform traffic such that $\lambda_{ij} = \rho \mu_{ij}$ for some $\rho < 1$, where
$[\mu_{ij}]$ is a doubly stochastic matrix: $\sum_k \mu_{ik} = \sum_k \mu_{kj} = 1$
Thanks to the BvN theorem, $\mu_{ij} = \sum_k p_k M^k_{ij}$ with $p_k \ge 0$, $\sum_k p_k = 1$,
where each $M^k = [M^k_{ij}]$ is a matching matrix ($1 \le k \le N!$).
At each timeslot, the scheduler selects $M^k$ at random with probability $p_k$
Delay of Random Frame (RF) scheduler
$$W^{RF} = \frac{N-1}{1-\rho} = O(N)$$
Proof - I
For a slotted Geom/Geom/1 queue with arrival probability $\lambda$ and service
probability $\mu$, the average delay is
$$W^{Geom/Geom/1} = \frac{\eta}{\lambda(1-\eta)} \quad \text{with} \quad \eta = \frac{\lambda(1-\mu)}{\mu(1-\lambda)} \qquad (2)$$
In the random frame scheduler, VOQ$_{ij}$ is a Geom/Geom/1 queue with
service probability
$$1 - \Pr(\text{VOQ}_{ij} \text{ is not served}) = 1 - \prod_k \left(1 - p_k M^k_{ij}\right) \approx 1 - \left(1 - \sum_k p_k M^k_{ij}\right) = \sum_k p_k M^k_{ij} = \mu_{ij}$$
and arrival probability $\lambda_{ij} = \rho \mu_{ij}$.
Proof - II
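Now
$$\eta = \rho \frac{1-\mu_{ij}}{1-\rho\mu_{ij}} \qquad 1 - \eta = \frac{1-\rho}{1-\rho\mu_{ij}}$$
Recalling (2)
$$W_{ij} = \rho \frac{1-\mu_{ij}}{1-\rho\mu_{ij}} \cdot \frac{1}{\rho\mu_{ij}} \cdot \frac{1-\rho\mu_{ij}}{1-\rho} = \frac{1-\mu_{ij}}{\mu_{ij}(1-\rho)}$$
The total arrival traffic is
$$\lambda_{tot} = \sum_{i,j} \lambda_{ij} = \rho N$$
and the overall average delay is
$$\begin{aligned}
W^{RF} &= \frac{1}{\lambda_{tot}} \sum_{i,j} \lambda_{ij} W_{ij} = \frac{1}{N\rho} \sum_{i,j} \frac{\lambda_{ij}(1-\mu_{ij})}{\mu_{ij}(1-\rho)} \\
&= \frac{1}{N\rho} \sum_{i,j} \frac{\rho(1-\mu_{ij})}{1-\rho} = \frac{1}{N\rho} \sum_{i,j} \frac{\rho-\lambda_{ij}}{1-\rho} = \frac{N^2\rho - N\rho}{N\rho(1-\rho)} = \frac{N-1}{1-\rho}
\end{aligned}$$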
Periodic Frame scheduling
Assume uniform traffic: $\lambda_{ij} = \rho/N$
The scheduler serves each VOQ exactly every $N$ timeslots
- fixed frame of $N$ timeslots
- during timeslot $t$, input $i$ is connected to output $(i + t) \bmod N$
- e.g. for $N = 3$: frame is $(M_1, M_2, M_3)$
$$M_1 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \qquad M_2 = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix} \qquad M_3 = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$$
Delay of Periodic Frame (PF) scheduling
$$W^{PF} = 1 + \frac{N}{2(1-\rho)} = O(N)$$
Proof - I
Each VOQ is a slotted, single-server queue with arrival probability $\rho/N$
and actual/tentative services every $N$ slots.
Now sample the state of the queue every $N$ slots, at each service
opportunity. The sampling period is $N$ slots.
The VOQ appears as a slotted M/D/1 queue with binomial $(N, \rho/N)$
arrivals and service equal to one sampling period. For such a queue, we
know (see (1)):
$$W^{M/D/1} \approx 1 + \frac{\rho}{2(1-\rho)}$$
Proof - II
The average delay in the original VOQ will be $N$ times the M/D/1 delay.
In addition, the delay must include the average waiting time before being
served ($N/2$ slots) and must be reduced by $N - 1$ slots, since the service time
is just 1 slot and not $N$ slots as in the sub-sampled system considered above.
$$\begin{aligned}
W^{PF} &= \frac{N}{2} + N W^{M/D/1} - (N-1) = \frac{N}{2} + N + \frac{N\rho}{2(1-\rho)} - N + 1 \\
&= \frac{N}{2} + 1 + \frac{N\rho}{2(1-\rho)} = 1 + \frac{N}{2}\left(1 + \frac{\rho}{1-\rho}\right) = 1 + \frac{N}{2} \cdot \frac{1}{1-\rho}
\end{aligned}$$
Queue-independent Frame Scheduling
For the previous queue-independent frame schedulers
$W^{RF} = O(N)$
$W^{PF} = O(N)$
General property:
Delay of a Queue-Independent frame scheduler
For any scheduling algorithm that operates independently of the queue size
$$W^{\text{queue-independent}} = O(N) \qquad (3)$$
Proved in M.J. Neely, E. Modiano, Y.S. Cheng, “Logarithmic
Delay for N × N Packet Switches Under the Crossbar Constraint”,
IEEE/ACM Transactions on Networking, Vol. 15, No. 3, June 2007.
Comparing (3) with (1), queue-independent frame scheduling
appears inefficient in terms of delay.
Generic Frame Scheduling
Occupancy matrix $L = [L_{ij}]$ where $L_{ij}$ is the size of VOQ$_{ij}$
BvN theorem
$$T^\star = \max_{i=1,\ldots,N} \left\{ \sum_{k=1}^{N} L_{ik}, \; \sum_{k=1}^{N} L_{ki} \right\}$$
is the minimum clearance time for $L$
Minimum clearance time and maximal size matchings
Any arbitrary sequence of maximal size matchings will be able to serve all
packets of $L$ in $\le 2T^\star - 1$ timeslots.
Proof: A given packet can be delayed by at most $T^\star - 1$ packets on the
same input and by at most $T^\star - 1$ packets on the same output. In total,
each packet can be delayed by at most $2T^\star - 2$ other packets.
Queue-aware frame scheduling
Fix the frame duration $T$
- Let $I_k = [Tk, \ldots, T(k+1) - 1]$ be the slots corresponding to the $k$th
  frame
- At the beginning of the $k$th frame, i.e. at slot $Tk$, the scheduler computes
  all the matchings for the future slots in $I_k$ based on just the arrivals in
  $I_{k-1}$
- Overflow packets are packets that arrived in $I_{k-1}$ and were not served
  in $I_k$
- We assume (for now) that overflow packets are dropped
Key Idea
1. Choose $T$ large enough to (almost) avoid overflow packets
2. Delays are $O(T)$
Queue-aware frame scheduling
Let $A(T) = [a_{ij}(T)]$ be the cumulative number of arrived packets during
the $k$th frame $I_k$
By the BvN theorem, it is possible to serve all the packets and avoid overflow
packets iff $\sum_k a_{ik} \le T$ and $\sum_k a_{kj} \le T$
We will show that if $T = \Theta(\log N)$
- $\Pr(\text{frame overflow})$ can become negligible
- delays become $O(\log N)$
Chernoff Bound
Theorem
Let $X_1, X_2, \ldots, X_n$ be independent binary random variables, with
$p_i = \Pr(X_i = 1)$. Let $X = \sum_{i=1}^{n} X_i$ and $\mu = E[X]$. For any $\delta > 0$:
$$\Pr(X > (1+\delta)\mu) < \left( \frac{e^{\delta}}{(1+\delta)^{(1+\delta)}} \right)^{\mu} \qquad (4)$$
[Figure: $P(X > (1+\delta)\mu)$ (log scale, 1 down to 1e-10) versus $\delta \in [0, 1]$, for $\mu$ = 1, 10, 100, 1000, 10000]
Proof from Wikipedia
From http://en.wikipedia.org/wiki/Chernoff_bound
$$\begin{aligned}
\Pr[X > (1+\delta)\mu] &\le \inf_{t>0} \frac{E\left[\prod_{i=1}^{n} \exp(tX_i)\right]}{\exp(t(1+\delta)\mu)} \\
&= \inf_{t>0} \frac{\prod_{i=1}^{n} E[\exp(tX_i)]}{\exp(t(1+\delta)\mu)} \\
&= \inf_{t>0} \frac{\prod_{i=1}^{n} \left[p_i \exp(t) + (1-p_i)\right]}{\exp(t(1+\delta)\mu)}
\end{aligned}$$
The third line above follows because $e^{tX_i}$ takes the value $e^t$ with
probability $p_i$ and the value 1 with probability $1 - p_i$.
Rewriting $p_i e^t + (1 - p_i)$ as $p_i(e^t - 1) + 1$ and recalling that $1 + x \le e^x$
(with strict inequality if $x > 0$), we set $x = p_i(e^t - 1)$.
Proof from Wikipedia
Thus,
$$\Pr[X > (1+\delta)\mu] < \frac{\prod_{i=1}^{n} \exp(p_i(e^t - 1))}{\exp(t(1+\delta)\mu)} = \frac{\exp\left((e^t - 1) \sum_{i=1}^{n} p_i\right)}{\exp(t(1+\delta)\mu)} = \frac{\exp((e^t - 1)\mu)}{\exp(t(1+\delta)\mu)}.$$
If we simply set $t = \log(1+\delta)$ so that $t > 0$ for $\delta > 0$, we can substitute
and find
$$\frac{\exp((e^t - 1)\mu)}{\exp(t(1+\delta)\mu)} = \frac{\exp((1+\delta-1)\mu)}{(1+\delta)^{(1+\delta)\mu}} = \left[ \frac{\exp(\delta)}{(1+\delta)^{(1+\delta)}} \right]^{\mu}$$
Minimum frame size to avoid overflow
Frame size and overflow
Let $\gamma = \rho e^{1-\rho}$. If
$$T \ge \frac{\log(N/\epsilon)}{\log(1/\gamma)}$$
then $\Pr(\text{frame overflow}) \le \epsilon$.
[Figure: $\gamma$ versus $\rho$, both axes ranging over $[0, 1]$]
Minimum frame size to avoid overflow
[Figure: minimum frame size $T$ (log scale, 1 to 100000) versus $\rho \in [0, 1]$, for $\epsilon$ = 0.1, 0.01, 0.001, 0.0001; two panels: $N = 16$ and $N = 1024$]
Proof - I
Consider a generic output $j$. Let $C(T)$ be the number of packets arrived
during the frame and destined to $j$:
$$C(T) = \sum_{i=1}^{N} a_{ij}(T)$$
$$\Pr(\text{overflow for output } j) = \Pr(C(T) > T)$$
$C(T) = \sum_{i=1}^{N} \sum_{t=1}^{T} X_{it}$ where $X_{it} = 1$ with probability $\lambda_{ij}$ and all $X_{it}$ are
independent random variables. We can use the Chernoff bound:
$$\mu = E[C(T)] = \sum_{i=1}^{N} \sum_{t=1}^{T} E[X_{it}] = T \sum_{i=1}^{N} \lambda_{ij} = T\rho'$$
having defined $\rho' = \sum_{i=1}^{N} \lambda_{ij} \le \rho$.
Proof - II
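Using (4)
$$\Pr(C(T) > T) = \Pr\left(C(T) > \frac{\mu}{\rho'}\right) < \left( \frac{e^{\delta}}{(1+\delta)^{(1+\delta)}} \right)^{\mu}$$
being $1 + \delta = 1/\rho'$ and $\delta = 1/\rho' - 1$
$$\Pr(C(T) > T) < \left( \frac{e^{1/\rho' - 1}}{(1/\rho')^{1/\rho'}} \right)^{\rho' T} = \left( \frac{e^{1-\rho'}}{1/\rho'} \right)^{T} = \left(\rho' e^{1-\rho'}\right)^{T} \le \gamma^{T}$$
since $\gamma = \rho e^{1-\rho}$ is increasing in $\rho$ on $(0, 1)$ and $\rho' \le \rho$.
By the union bound:
$$\Pr(\text{overflow for any output}) \le \sum_j \Pr(\text{overflow for output } j) \le N\gamma^{T}$$
Now we can set $N\gamma^{T} \le \epsilon$ and obtain
$$T \log(\gamma) \le \log(\epsilon/N) \quad \Rightarrow \quad T \ge \frac{\log(N/\epsilon)}{\log(1/\gamma)}$$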
Queue-Aware Frame scheduling
Assume $\epsilon$ small enough to experience negligible frame overflows. Then all
packets are served with a delay $\le 2T$.
Delay for Queue-Aware Frame scheduling
$$W^{QAF} \le 2 \frac{\log(N/\epsilon)}{\log(1/\gamma)} = O(\log N)$$
Note that Neely et al. formally prove this property for the average delay $W^{QAF}$
by also accounting for the delays of the overflow packets, which, unlike what
is assumed here, are not dropped.
Conclusions
Take home messages
- Output-Queued switch $W^{OQ} = O(1)$
- Queue-Independent Frame scheduler $W^{QIF} = O(N)$
  - Random Frame scheduler $W^{RF} = O(N)$
  - Periodic Frame scheduler $W^{PF} = O(N)$
- Queue-Aware Frame scheduler $W^{QAF} = O(\log N)$
```
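
The closed-form delays in the transcript can be compared numerically. The following is a minimal Python sketch and is not part of the original notes: the function names are hypothetical, and rounding the frame size $T$ up to a whole number of slots is an assumption added here so that the overflow bound $N\gamma^T \le \epsilon$ still holds.

```python
import math


def delay_oq(n, rho):
    """Eq. (1): average delay of an output-queued switch, uniform traffic."""
    return 1 + (1 - 1 / n) * rho / (2 * (1 - rho))


def delay_rf(n, rho):
    """Random Frame scheduler: W_RF = (N - 1) / (1 - rho)."""
    return (n - 1) / (1 - rho)


def delay_pf(n, rho):
    """Periodic Frame scheduler: W_PF = 1 + N / (2 (1 - rho))."""
    return 1 + n / (2 * (1 - rho))


def gamma(rho):
    """gamma = rho * e^(1 - rho), as defined on the frame-size slide."""
    return rho * math.exp(1 - rho)


def min_frame_size(n, rho, eps):
    """Smallest integer T with T >= log(N / eps) / log(1 / gamma).

    Assumption: T is rounded up to an integer number of slots.
    """
    return math.ceil(math.log(n / eps) / math.log(1 / gamma(rho)))


def delay_qaf_bound(n, rho, eps):
    """Upper bound 2T on the delay of the queue-aware frame scheduler."""
    return 2 * min_frame_size(n, rho, eps)


def compare(n, rho, eps):
    """Collect all four figures for one (N, rho, eps) operating point."""
    return {
        "OQ": delay_oq(n, rho),
        "RF": delay_rf(n, rho),
        "PF": delay_pf(n, rho),
        "QAF bound": delay_qaf_bound(n, rho, eps),
    }
```

Calling `compare(16, 0.5, 0.01)` and `compare(1024, 0.5, 0.01)` side by side shows that the logarithmic bound is an asymptotic advantage: for small $N$ the constant in $2T$ can exceed the $O(N)$ delays, while for large $N$ it is far smaller.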

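The Chernoff bound (4) and the per-output overflow bound $\gamma^T$ can be checked against an exact binomial tail. The sketch below is an illustration only, with hypothetical function names; it assumes uniform traffic ($\lambda_{ij} = \rho/N$), so that the number of packets destined to one output during a frame is Binomial($NT$, $\rho/N$).

```python
import math


def chernoff_bound(mu, delta):
    """Right-hand side of (4), evaluated in the log domain for stability."""
    return math.exp(mu * (delta - (1 + delta) * math.log(1 + delta)))


def binomial_tail(n, p, threshold):
    """Exact Pr(X > threshold) for X ~ Binomial(n, p)."""
    k_min = math.floor(threshold) + 1
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(k_min, n + 1)
    )


def overflow_check(n_ports, frame, rho):
    """Return (exact, bound) for Pr(C(T) > T) at one output.

    Assumption: uniform traffic, so C(T) ~ Binomial(N * T, rho / N).
    The bound uses mu = T * rho and 1 + delta = 1 / rho, which gives
    (rho * e^(1 - rho))^T = gamma^T.
    """
    exact = binomial_tail(n_ports * frame, rho / n_ports, frame)
    bound = chernoff_bound(frame * rho, 1 / rho - 1)
    return exact, bound
```

For instance, `overflow_check(8, 20, 0.5)` returns the exact overflow probability of one output together with $\gamma^{20}$; the exact value should never exceed the bound, and the gap gives a feel for how conservative the resulting frame size is.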