06 tcp part1

Transcript

06 tcp part1
Lecture 6.
Internet Transport Layer:
introduction to the
Transport Control Protocol
(TCP)
RFC 793
(estensioni RFC 1122,1323,2018,2581,working group tsvwg)
G.Bianchi, G.Neglia, V.Mancuso
A MUCH more complex transport
for three main reasons
ÎConnection oriented
Öimplements mechanisms to setup and tear down a
full duplex connection between end points
ÎReliable
Öimplements mechanisms to guarantee error free
and ordered delivery of information
ÎFlow & Congestion controlled
Öimplements mechanisms to control traffic
G.Bianchi, G.Neglia, V.Mancuso
TCP services
Îconnection oriented
ÖTCP connections
Îreliable transfer
service
Öall bytes sent are received
ÎTCP functions
Æ application addressing (ports)
Æ error recovery (acks and
retransmission)
Æ reordering (sequence numbers)
Æ flow control
Æ congestion control
Appl.
Appl.
TCP
TCP
IP
IP
IP
IP
IP
G.Bianchi, G.Neglia, V.Mancuso
Byte stream service
ÎTCP exchange data between applications as
a stream of bytes
ÎIt does not introduce any data delimiter (an
application duty)
Ö source application may enter 10 bytes followed by 1 and 40 (grouped
with some semantics)
Ö data is buffered at source, and transmitted
Ö at receiver, may be read in the sequence 25 bytes, 22 bytes and 4
bytes...
Application view
TCP view
G.Bianchi, G.Neglia, V.Mancuso
TCP segments
ÎApplication data broken into segments for transmission
Îsegmentation totally up to TCP, according to what TCP
considers being the best strategy
Îeach segment placed into an IP packet
Îvery different from UDP!!
Header TCP
Header IP
TCP data
IP data
G.Bianchi, G.Neglia, V.Mancuso
Header TCP
Header IP
TCP data
IP data
TCP segment format
20 bytes header (minimum)
0
3
7
15
Source port
31
Destination port
32 bit Sequence number
32 bit acknowledgement number
Header
length
U
6 bit
R
Reserved G
checksum
A P R S F
C S S Y I
K H T N N
Urgent pointer
Options (if any)
Data (if any)
G.Bianchi, G.Neglia, V.Mancuso
Window size
padding
Source port
Destination port
32 bit Sequence number
32 bit acknowledgement number
Header
length
U
6 bit
R
Reserved G
checksum
A P R S F
C S S Y I
K H T N N
Window size
Urgent pointer
ÎSource & destination port + source and
destination IP addresses
Öunivocally determine TCP connection
Îchecksum as in UDP
Ösame calculation including same pseudoheader
Îno explicit segment length specification
G.Bianchi, G.Neglia, V.Mancuso
Source port
Destination port
32 bit Sequence number
32 bit acknowledgement number
Header
length
U
6 bit
R
Reserved G
checksum
A P R S F
C S S Y I
K H T N N
Window size
Urgent pointer
Options (if any)
00000000
ÎHeader length: 4 bits
Öspecifies the header size (n*4byte words) for options
Ömaximum header size: 60 (15*4)
Öoption field size must be multiple of 32bits: zero padding
when not.
ÎReserved: 000000 (still today!)
G.Bianchi, G.Neglia, V.Mancuso
Reliable data transfer: issues
packet
packet
INTERNET
PROBLEMS:
1)
2)
Packet received with errors
Packet not received at all
Same problem considered at DATA LINK LAYER
(although it is less likely that a whole packet is lost at data link)
Îmechanisms to guarantee correct reception:
ÖForward Error Correction (FEC) coding schemes
ÆPowerful to correct bits affected by error, less effective in
case of packet loss
ÆMostly used at link layer
ÖRetransmission – issues:
ÆACK
ÆNACK
ÆTIMEOUT
G.Bianchi, G.Neglia, V.Mancuso
Retransmission scenarios
referred to as ARQ schemes (Automatic Repeat reQuest)
COMPONENTS: a) error checking at receiver; b) feedback to sender; c) retx
SRC
DATA
ACK
SRC
DST
Error
Check:
OK
SRC
Retx
Timeout
(RTO)
DATA
NACK
Automatic
retransmit
Basic ACK idea
DATA
DST
Error
Check:
corrupted
DATA
Basic NACK idea
DST
SRC
DST
DATA
SRC
DATA
ACK
DATA
DATA
Basic ACK/Timeout idea
G.Bianchi, G.Neglia, V.Mancuso
DATA
DST
Error
Check:
corrupted
Why sequence numbers?
(on data)
Sender side:
Receiver side:
DATA
DATA
RTO
ACK
NETWORK
(ACK lost)
rtx
DATA
DATA
Need to univocally “label” all packets circulating
in the network between two end points.
1 bit (0-1) enough for Stop-and-wait
G.Bianchi, G.Neglia, V.Mancuso
New data?
Old data?
Why sequence numbers?
(on ack)
Sender side:
Receiver side:
DATA 1
ACK
DATA 2
Duplicated
ACK
Queueing
Delay
ACK
DATA 3
Data 2 lost !!
With pathologically critical network (as the Internet!)
also need to univocally “label” all acks circulating
in the network between two end points.
1 bit (0-1) enough for Stop-and-wait
G.Bianchi, G.Neglia, V.Mancuso
Source port
Destination port
32 bit Sequence number
32 bit acknowledgement number
Header
length
U
6 bit
R
Reserved G
checksum
A P R S F
C S S Y I
K H T N N
Window size
Urgent pointer
ÎSequence number:
ÖSequence number of the first byte in the segment.
ÖWhen reaches 232-1, next wraps back to 0
ÎAcknowledgement number:
Övalid only when ACK flag on
ÖContains the next byte sequence number that the host expects to
receive (= last successfully received byte of data + 1)
Ögrants successful reception for all bytes up to ack# - 1 (cumulative)
ÎWhen seq/ack reach 232-1, next wrap back to 0
G.Bianchi, G.Neglia, V.Mancuso
TCP data transfer management
ÎFull duplex connection
Ödata flows in both directions, independently
ÖTo the application program these appear as two unrelated
data streams
ÖImpossible to build multicast connection
Îeach end point maintains a sequence
number
ÖIndependent sequence numbers at both ends
ÖMeasured in bytes
Îacks often carried on top of reverse
flow data segments (piggybacking)
ÖBut ack packets alone are possible
G.Bianchi, G.Neglia, V.Mancuso
Byte-oriented
Example: 1 kbyte message – 1024 bytes
0
1
… … 100 … … 535 … … … 1023
Example: segment size = 536 bytes Æ 2 segments: 0-535; 536-1023
sender
seq=0
receiver
ÎNo explicit segment size
indication
36
Ack=5
seq=536
Ö Seq = first byte number
Ö Returning Ack = last byte
number + 1
Ö Segment size = Ack-seq#
0 24
Ack=1
time
G.Bianchi, G.Neglia, V.Mancuso
time
Pipelining
Example: 1024 bytes msg; seg_size = 536 bytes Æ 2 segments: 0-535; 536-1023
0
sender
1
… … 100 … … 535 … … … 1023
seq=0
receiver
seq=536
Î Other name
36
Ack=5
0 24
Ack=1
Î Basic approaches for error recovery
time
Ö sliding window mechanisms
Ö Go-Back-N and Selective Repeat
time
Why pipelining? Dramatic improvement in efficiency!
G.Bianchi, G.Neglia, V.Mancuso
Cumulative ack
Example: 1024 bytes msg; seg_size = 536 bytes Æ 2 segments: 0-535; 536-1023
0
sender
1
… … 100 … … 535 … … … 1023
seq=0
receiver
ÎCumulative ack
seq=536
Ö ACK = all previous bytes
correctly received!
Ö E.g. ACK=1024: all bytes 0-1023
received
0 24
Ack=1
ÎOther names of pipelining:
time
time
Ö Go-Back-N ARQ mechanisms
Ö Sliding window mechanisms
Why pipelining? Dramatic improvement in efficiency!
G.Bianchi, G.Neglia, V.Mancuso
Multiple acks; Piggybacking
Bytes 100
-1
Immediate ack,
no payload
99, seq=1
00,
= 200
k
c
A
,
EMPTY
= 200
k
c
a
,
45 0
=
q
e
s
525,
0
5
4
Bytes
Bytes 200
-2
CLIENT
G.Bianchi, G.Neglia, V.Mancuso
Data in reverse
direction,carries
previous ack
49, seq=2
00, ack=5
26
Next segment,
piggybacked ack
SERVER
16
15
14
13
12
11
10
9
8
7
6
5
4
3
2
1
TCP data transfer
bidirectional example
Segment size = 6
Segment size = 4
Time 0:
Seq=1, NO ack
Seq=112, NO ack
Time 1:
Seq=7, ack=116
Seq=116, ack=7
Time 2:
Seq=13, ack=119
Seq=119, ack=13
Time 3:
G.Bianchi, G.Neglia, V.Mancuso
Seq=119, ack=17
118
117
116
115
114
113
112
Performance issues
with/without pipelining
G.Bianchi, G.Neglia, V.Mancuso
Link delay computation
sender
Tx delay
B/C
ÎTransmission delay:
ÎC [bit/s] = link rate
Router
ÎB [bit] = packet size
Îtransmission delay = B/C [sec]
ÎExample:
Î512 bytes packet
receiver
Î64 kbps link
Îtransmission delay = 512*8/64000 = 64ms
Prop
delay
time
G.Bianchi, G.Neglia, V.Mancuso
ÎPropagation delay – constant depending on
ÎLink length
Tx delay
B/C
ÎElectromagnetig waves propagation speed in
considered media
Î200 km/s for copper links
Î300 km/s in air
time
ÎHypothesis: other delays neglected
Îqueueing
Îprocessing
Stop-and-wait performance
Router 1
28.8 Kbps
1 ms
Start tx
Router 2
1024 Kbps
30 ms
28.8 Kbps
1 ms
prop1
tx1
prop2
tx2
prop3
tx3
Completion time =
Tx1 + Prop1 + Tx2 + Prop2 + Tx3 + Prop3 +
Ack_Tx1 + Prop1 + Ack_Tx2 + Prop2 + Ack_Tx3 +
Prop3 + [same computation for other segments]=
……………
time
G.Bianchi, G.Neglia, V.Mancuso
time
= RTT + Tx1 + Tx2 + Tx3 + Ack_Tx1 +
Ack_Tx2 + Ack_Tx3 +...
Stop-and-wait performance
Numerical example
Router 1
28.8 Kbps
1 ms
Î Message:
Ö 1024 bytes;
Ö 2 segments:
536+488 bytes
Ö Overhead: 20 bytes
TCP + 20 bytes IP
Ö ACK = 40 bytes
(header only)
G.Bianchi, G.Neglia, V.Mancuso
Router 2
28.8 Kbps
1 ms
1024 Kbps
30 ms
Î Segment 1:
Ö Tx1 = 576*8/28,8 =
160ms
Ö Tx3 = Tx1
Ö Tx2 = 576*8/1024 =
4,5 ms
Î Acks:
Ö Tx1 = Tx3 =
40*8/28,8 = 11,1ms
Ö Tx2 = 40*8/1024 =
0,3 ms
RESULT:
Î Segment 2:
Ö Tx1 = 528*8/28,8 =
146,7ms
Ö Tx3 = Tx1
Ö Tx2 = 528*8/1024 =
4,1 ms
D = 667 (tx total) + 2*RTT =
= 795 ms
THR = 1024*8/795 =
= 10,3 kbps
Stop-and-wait performance
Numerical example
Router 1
With
ISDN?
128 Kbps
1 ms
Î Segment 1:
Ö Tx1 = Tx3 =
576*8/128 = 36ms
Ö Tx2 = 576*8/1024 =
4,5 ms
Î Segment 2:
Router 2
1024 Kbps
30 ms
Î Acks:
Ö Tx1 = Tx3 =
40*8/128 = 2,5 ms
Ö Tx2 = 40*8/1024 =
0,3 ms
RESULT:
Ö Tx1 = Tx3 =
528*8/128 = 33 ms D = 157,2 (tx total) + 2*RTT =
Ö Tx2 = 528*8/1024 =
= 279,9 ms
4,1 ms
THR = 1024*8/279,9 =
= 29,3 kbps
G.Bianchi, G.Neglia, V.Mancuso
128 Kbps
1 ms
on Gbps fiber optics?
D = negligible + 2*RTT =
= 128 ms
THR = 1024*8/128 =
= 64 kbps
Pipelining performance
Router 1
256 Kbps
1 ms
Start tx
Router 2
256 Kbps
1 ms
1024 Kbps
30 ms
prop1
tx1
prop2
tx2
prop3
tx3
Completion time (neglecting processing & queueing) =
Tx1 + Prop1 + Tx2 + Prop2 + Tx3 + Prop3 + Tx_bottleneck
Ack_Tx1 + Prop1 + Ack_Tx2 + Prop2 +
Ack_Tx3 + Prop3 (that’s it!)
……………
full tx time
time
G.Bianchi, G.Neglia, V.Mancuso
time
time
time
Pipelining performance
numerical example
On 28,8 kbps links
On 128 kbps ISDN links
D = 347 (tx segm1+ack) + RTT +
+ 160 (segm2 bottleneck) =
= 571 ms
D = 81,8 (tx segm1+ack) + RTT +
+ 33 (segm2 bottleneck) =
= 178,8 ms
THR = 1024*8/571 =
= 14,3 kbps
THR = 1024*8/178,8 =
= 45,8 kbps
on Gbps fiber optics?
D = negligible + RTT = 64 ms
THR = 1024*8/64 = 128 kbps
G.Bianchi, G.Neglia, V.Mancuso
Simplified performance model
C bits/sec
Approximate analysis, much simpler than multi-hop
Typically, C = bottleneck link rate
MSS = segment size
MSIZE = message size
Ignore overhead
Ignore ACK transmission time
No loss of segments
W = number of outstanding segments
W=1: stop-and-wait
W>1: sliding window
This is a highly dynamic parameter in TCP!!
For now, consider W fixed
G.Bianchi, G.Neglia, V.Mancuso
W=1 case (stop-and-wait)
sender
receiver
One way delay
MSS/C
RTT
MSS
throughput =
RTT + MSS / C
REMARK: throughput always lower than
Available link rate!
time
G.Bianchi, G.Neglia, V.Mancuso
time
client
server
Start TCP
connection
Latency in TCP
retrieval model
RTT
Latency: time elapsing between TCP connection
Request, and last bit received at client
request
object
RTT
MSIZE ⎡ MSIZE ⎤
latency = 2 RTT +
+⎢
− 1⎥ RTT
C
⎢ MSS
⎥
Number of segments
In which message
is split
time
G.Bianchi, G.Neglia, V.Mancuso
time
W=1 case (stop-and-wait)
MSS = 1500 bytes
throughput
Throughput (Kbps)
10000
C=28,8 kbps
C=128 kbps
C=640 kbps
C=10 Mbps
1000
100
10
1
10
100
RTT (ms)
Under-utilization with: 1) high capacity links, 2) large RTT links
G.Bianchi, G.Neglia, V.Mancuso
1000
W=1 case (stop-and-wait)
Utilization
MSS = 1500 bytes
C=28,8 kbps
C=128 kbps
C=640 kbps
C=10 Mbps
100%
90%
80%
70%
60%
50%
40%
30%
20%
10%
0%
1
10
100
RTT (ms)
Under-utilization with: 1) high capacity links, 2) large RTT links
G.Bianchi, G.Neglia, V.Mancuso
1000
Pipelining (W>1) analysis
two cases
RTT
+1tx
W=4
?
W=10
time
time
WINDOW SIZE that allows
CONTINUOUS TRANSMISSION
UNDER-SIZED WINDOW:
THROUGHPUT INEFFICIENCY
G.Bianchi, G.Neglia, V.Mancuso
Continuous transmission
Condition in which link rate is fully utilized
MSS
W⋅
C
Time to transmit
W segments
MSS
> RTT +
C
Time to receive
Ack of first segment
We may elaborate:
W ⋅ MSS > RTT ⋅ C + MSS ≈ RTT ⋅ C
This means that full link utilization is possible when window size (in bits) is
Greater than the bandwidth (C bit/s) delay (RTT s) product!
G.Bianchi, G.Neglia, V.Mancuso
Bandwidth-delay product
D
C
ÎNetwork: like a pipe
ÎC [bit/s] x D [s]
Ö number of bits “flying” in the
network
Ö number of bits injected in the
network by the tx, before that the
first bit is received at distance D s
64Kbps
A 15360 (64000x0.240) bits
“worm” in the air!!
bandwidth-delay product = bytes # that saturates network pipe
G.Bianchi, G.Neglia, V.Mancuso
Bandwidth-delay product
(usually considered)
D
RTT=2*D
C
D
After D/4 s
ÎC [bit/s] x RTT [s]
Ö number of bits “flying” in the
network (partly received)
Ö number of bits injected in the
network by the tx, before that the
ack of the first bit is received
G.Bianchi, G.Neglia, V.Mancuso
Bandwidth-delay product
(usually considered)
D
C
D
After D s
ÎC [bit/s] x RTT [s]
Ö number of bits “flying” in the
network (partly received)
Ö number of bits injected in the
network by the tx, before that the
ack of the first bit is received
G.Bianchi, G.Neglia, V.Mancuso
Bandwidth-delay product
(usually considered)
D
C
C*D/4
bits at the receiver
D
Ack
After 5/4*D s
ÎC [bit/s] x RTT [s]
Ö number of bits “flying” in the
network (partly received)
Ö number of bits injected in the
network by the tx, before that the
ack of the first bit is received
G.Bianchi, G.Neglia, V.Mancuso
Bandwidth-delay product
(usually considered)
D
C
C*D
bits at the receiver
D
Ack
After 2D s
ÎC [bit/s] x RTT [s]
Ö number of bits “flying” in the
network (partly received)
Ö number of bits injected in the
network by the tx, before that the
ack of the first bit is received
G.Bianchi, G.Neglia, V.Mancuso
Bandwidth-delay product
(usually considered)
D1
C
C*D2
bits at the receiver
D2
Ack
ÎC [bit/s] x RTT [s]
Ö number of bits “flying” in the
network (partly received)
Ö number of bits injected in the
network by the tx, before that the
ack of the first bit is received
G.Bianchi, G.Neglia, V.Mancuso
After RTT=D1+D2 s
Long Fat Networks
LFNs (el-ef-an(t)s): large bandwidth-delay product
NETWORK
RTT (ms)
rate (kbps)
BxD (bytes)
Ethernet
3
10.000
3.750
T1, transUS
60
1.544
11.580
T1 satellite
480
1.544
92.640
T3 transUS
60
45.000
337.500
Gigabit transUS
60
1.000.000
7.500.000
The 65535 (16 bit field in TCP header) maximum window size W
may be a limiting factor!
G.Bianchi, G.Neglia, V.Mancuso
Pipelining (W>1) analysis
W ⋅ MSS
⎛
⎞
thr = min⎜ C ,
⎟
⎝ RTT + MSS / C ⎠
2 RTT
Delay analysis (for TCP object retrieval) –
Non continuous transmission
MSIZE
latency = 2 RTT +
+
C
(W − 1) MSS ⎞
⎡ MSIZE
⎤⎛
+⎢
− 1⎥ ⎜ RTT −
⎟
W
⋅
MSS
C
⎢
⎥⎝
⎠
RTT
(+1tx)
Continuous transmission case:
W
G.Bianchi, G.Neglia, V.Mancuso
MSIZE
latency = 2 RTT +
C
Throughput for pipelining
MSS = 1500 bytes
1 Mbps link speed
T h ro u g h p u t (K b p s )
1200
1000
W=1
800
W=4
600
W=16
W=2
400
200
0
0
100
G.Bianchi, G.Neglia, V.Mancuso
200
300
RTT (ms)
400
500
600
Maximum achievable throughput
Throughput (Mbps)
(assuming infinite speed line…)
1000
W = 65535 bytes
100
10
1
0
100
200
300
RTT (ms)
G.Bianchi, G.Neglia, V.Mancuso
400
500