06 tcp part1
Lecture 6. Internet Transport Layer: introduction to the Transmission Control Protocol (TCP)
RFC 793 (extensions: RFC 1122, 1323, 2018, 2581; working group tsvwg)
G.Bianchi, G.Neglia, V.Mancuso

A MUCH more complex transport, for three main reasons
- Connection oriented
  - implements mechanisms to set up and tear down a full-duplex connection between end points
- Reliable
  - implements mechanisms to guarantee error-free and ordered delivery of information
- Flow and congestion controlled
  - implements mechanisms to control traffic

TCP services
- Connection oriented
  - TCP connections
- Reliable transfer service
  - all bytes sent are received
- TCP functions
  - application addressing (ports)
  - error recovery (acks and retransmission)
  - reordering (sequence numbers)
  - flow control
  - congestion control
[Figure: protocol stacks. Appl./TCP/IP at the two end hosts, IP only in the intermediate routers.]

Byte stream service
- TCP exchanges data between applications as a stream of bytes
- It does not introduce any data delimiter (delimiting is the application's duty)
  - the source application may write 10 bytes, followed by 1 and 40 (grouped with some semantics)
  - data is buffered at the source, and transmitted
  - at the receiver, it may be read in the sequence 25 bytes, 22 bytes and 4 bytes...
[Figure: application view vs. TCP view of the same data]

TCP segments
- Application data is broken into segments for transmission
- Segmentation is entirely up to TCP, according to what TCP considers the best strategy
- Each segment is placed into an IP packet
- Very different from UDP!!
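Since the byte stream carries no delimiters, the application has to mark message boundaries itself. The sketch below shows one common way to do that, a 4-byte length prefix; the helper names `frame` and `deframe` are hypothetical, and the receiver chunk sizes (25, 22, 16) are arbitrary values picked only to show that reads need not line up with writes.

```python
import struct

def frame(message):
    """Prefix a message with its length as a 4-byte big-endian integer."""
    return struct.pack("!I", len(message)) + message

def deframe(buffer):
    """Pop every complete message from the front of a bytearray.

    Incomplete trailing data is left in the buffer for the next read.
    """
    messages = []
    while len(buffer) >= 4:
        (length,) = struct.unpack("!I", buffer[:4])
        if len(buffer) < 4 + length:
            break                      # message not fully received yet
        messages.append(bytes(buffer[4:4 + length]))
        del buffer[:4 + length]
    return messages

# The sender writes three messages of 10, 1 and 40 bytes.
stream = frame(b"a" * 10) + frame(b"b") + frame(b"c" * 40)

# The receiver reads the same bytes in unrelated chunk sizes.
received = bytearray()
recovered = []
for chunk_size in (25, 22, 16):
    chunk, stream = stream[:chunk_size], stream[chunk_size:]
    received.extend(chunk)
    recovered.extend(deframe(received))

print([len(m) for m in recovered])
```

The original 10/1/40 grouping is recovered even though no read matched a write, which is the point of doing the framing at application level.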
[Figure: an IP packet (IP header + IP data); the IP data carries the TCP segment (TCP header + TCP data).]

TCP segment format
20 bytes header (minimum); rows are 32 bits wide (bit positions 0, 3, 7, 15, 31):
- Source port (16 bit) | Destination port (16 bit)
- 32 bit sequence number
- 32 bit acknowledgement number
- Header length (4 bit) | Reserved (6 bit) | URG ACK PSH RST SYN FIN | Window size (16 bit)
- Checksum (16 bit) | Urgent pointer (16 bit)
- Options (if any) | padding
- Data (if any)

- Source and destination port + source and destination IP addresses
  - uniquely identify the TCP connection
- Checksum as in UDP
  - same calculation, including the same pseudo-header
- No explicit segment length specification

- Header length: 4 bits
  - specifies the header size (in 4-byte words), needed because options make it variable
  - maximum header size: 60 bytes (15*4)
  - the option field size must be a multiple of 32 bits: zero padding (00000000) when it is not
- Reserved: 000000 (still today!)
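To make the layout concrete, here is a minimal sketch that decodes the fixed 20-byte header with Python's `struct` module and uses the 4-bit header length to find where options end and data begins. The function name `parse_tcp_header` and the sample segment are illustrative assumptions, not part of the lecture.

```python
import struct

def parse_tcp_header(segment):
    """Decode the fixed 20-byte TCP header and locate options and payload."""
    (src_port, dst_port, seq, ack,
     offset_flags, window, checksum, urgent) = struct.unpack("!HHIIHHHH", segment[:20])
    header_len = (offset_flags >> 12) * 4   # 4-bit header length, counted in 4-byte words
    flags = offset_flags & 0x3F             # URG ACK PSH RST SYN FIN (low 6 bits)
    return {
        "src_port": src_port, "dst_port": dst_port,
        "seq": seq, "ack": ack,
        "header_len": header_len, "flags": flags,
        "window": window, "checksum": checksum, "urgent": urgent,
        "options": segment[20:header_len],  # empty when header_len == 20
        "payload": segment[header_len:],    # length is implied, never stated
    }

# Hypothetical segment: header length 6 words (24 bytes), ACK flag set,
# 4 bytes of options (three NOPs and an end-of-options byte), 5 bytes of data.
sample = struct.pack("!HHIIHHHH", 1234, 80, 1000, 2000,
                     (6 << 12) | 0x10, 65535, 0, 0)
sample += bytes([1, 1, 1, 0]) + b"hello"

fields = parse_tcp_header(sample)
print(fields["header_len"], fields["payload"])
```

Note that the payload length comes from the total segment size minus the header length, consistent with the header having no explicit segment length field.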
Reliable data transfer: issues
[Figure: packets crossing the INTERNET]
PROBLEMS:
1) Packet received with errors
2) Packet not received at all
Same problem considered at DATA LINK LAYER (although it is less likely that a whole packet is lost at data link)
- Mechanisms to guarantee correct reception:
  - Forward Error Correction (FEC) coding schemes
    - powerful at correcting bits affected by error, less effective in case of packet loss
    - mostly used at link layer
  - Retransmission; issues:
    - ACK
    - NACK
    - TIMEOUT

Retransmission scenarios
Referred to as ARQ schemes (Automatic Repeat reQuest)
COMPONENTS: a) error checking at receiver; b) feedback to sender; c) retransmission
[Figure: three SRC/DST timing diagrams.
- Basic ACK idea: SRC sends DATA; DST error check: OK; DST returns ACK.
- Basic NACK idea: SRC sends DATA; DST error check: corrupted; DST returns NACK; SRC automatically retransmits DATA.
- Basic ACK/Timeout idea: SRC sends DATA; DST error check: corrupted; no ACK comes back; SRC retransmits DATA when the retransmission timeout (RTO) expires.]

Why sequence numbers? (on data)
[Figure: the sender transmits DATA; the receiver's ACK is lost in the NETWORK; at RTO the sender retransmits DATA; the receiver cannot tell: new data? old data?]
Need to uniquely "label" all packets circulating in the network between two end points.
1 bit (0-1) is enough for Stop-and-wait.

Why sequence numbers? (on ack)
[Figure: sender/receiver timing diagram. Labels: DATA 1, ACK, queueing delay, duplicated ACK, DATA 2, DATA 3, "Data 2 lost!!"]
With a pathologically critical network (such as the Internet!), all acks circulating in the network between two end points also need to be uniquely "labelled".
1 bit (0-1) is enough for Stop-and-wait.

[Figure: TCP header layout]
- Sequence number:
  - sequence number of the first byte in the segment.
  - when it reaches $2^{32}-1$, the next value wraps back to 0
- Acknowledgement number:
  - valid only when the ACK flag is on
  - contains the sequence number of the next byte that the host expects to receive (= last successfully received byte of data + 1)
  - confirms successful reception of all bytes up to ack# - 1 (cumulative)
- When seq/ack reach $2^{32}-1$, they next wrap back to 0

TCP data transfer management
- Full-duplex connection
  - data flows in both directions, independently
  - to the application program these appear as two unrelated data streams
  - impossible to build a multicast connection
- Each end point maintains a sequence number
  - independent sequence numbers at both ends
  - measured in bytes
- Acks are often carried on top of reverse-flow data segments (piggybacking)
  - but ack-only packets are possible

Byte-oriented
Example: 1 kbyte message = 1024 bytes, numbered 0 ... 1023
Example: segment size = 536 bytes -> 2 segments: 0-535; 536-1023
[Figure: sender sends seq=0, receiver returns Ack=536; sender then sends seq=536, receiver returns Ack=1024]
- No explicit segment size indication
  - Seq = first byte number
  - Returning Ack = last byte number + 1
  - Segment size = Ack - seq#

Pipelining
Example: 1024 bytes msg; seg_size = 536 bytes -> 2 segments: 0-535; 536-1023
[Figure: sender sends seq=0 and seq=536 back to back; receiver returns Ack=536 and Ack=1024]
- Other name
  - sliding window mechanisms
- Basic approaches for error recovery
  - Go-Back-N and Selective Repeat
Why pipelining? Dramatic improvement in efficiency!

Cumulative ack
Example: 1024 bytes msg; seg_size = 536 bytes -> 2 segments: 0-535; 536-1023
[Figure: sender sends seq=0 and seq=536; receiver returns a single Ack=1024]
- Cumulative ack
  - ACK = all previous bytes correctly received!
  - E.g. ACK=1024: all bytes 0-1023 received
- Other names of pipelining:
  - Go-Back-N ARQ mechanisms
  - Sliding window mechanisms
Why pipelining? Dramatic improvement in efficiency!
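The byte-oriented numbering above can be sketched in a few lines: each segment's seq is its first byte number, the matching ack is last byte + 1, and both wrap modulo $2^{32}$. The function names and the `isn` (initial sequence number) parameter are assumptions made for this illustration; the slide example simply starts at 0.

```python
SEQ_MOD = 2 ** 32   # sequence and ack numbers are 32-bit counters

def segment_stream(data_len, mss, isn=0):
    """Split data_len bytes into segments; return (seq, length, expected_ack) tuples."""
    segments = []
    offset = 0
    while offset < data_len:
        length = min(mss, data_len - offset)
        seq = (isn + offset) % SEQ_MOD
        expected_ack = (seq + length) % SEQ_MOD   # last byte number + 1
        segments.append((seq, length, expected_ack))
        offset += length
    return segments

def cumulative_ack(expected, arrived):
    """Advance the next expected byte over every contiguous arrived segment.

    arrived is an iterable of (seq, length) pairs, in any order.
    """
    pending = dict(arrived)
    while expected in pending:
        expected = (expected + pending.pop(expected)) % SEQ_MOD
    return expected

# The 1024-byte message with 536-byte segments.
print(segment_stream(1024, 536))

# Only the second segment arrived: the ack still asks for byte 0.
print(cumulative_ack(0, [(536, 488)]))

# Both arrived (out of order): a single cumulative ack covers bytes 0-1023.
print(cumulative_ack(0, [(536, 488), (0, 536)]))

# Same message starting 100 bytes before the counter wraps.
print(segment_stream(1024, 536, isn=SEQ_MOD - 100))
```

The segment size is never transmitted; it falls out as ack minus seq (modulo $2^{32}$), as the byte-oriented slide states.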
Multiple acks; Piggybacking
[Figure: CLIENT/SERVER timing diagram.
- CLIENT -> SERVER: bytes 100-199, seq=100
- SERVER -> CLIENT: EMPTY, ack=200 (immediate ack, no payload)
- SERVER -> CLIENT: bytes 450-525, seq=450, ack=200 (data in reverse direction, carries previous ack)
- CLIENT -> SERVER: bytes 200-249, seq=200, ack=526 (next segment, piggybacked ack)]

TCP data transfer: bidirectional example
One end sends bytes 1-16 with segment size = 6; the other sends bytes 112-118 with segment size = 4.
- Time 0: Seq=1, NO ack | Seq=112, NO ack
- Time 1: Seq=7, ack=116 | Seq=116, ack=7
- Time 2: Seq=13, ack=119 | Seq=119, ack=13
- Time 3: (nothing left to send) | Seq=119, ack=17

Performance issues with/without pipelining

Link delay computation
[Figure: sender, router, receiver; tx delay B/C followed by prop delay on each link]
- Transmission delay:
  - C [bit/s] = link rate
  - B [bit] = packet size
  - transmission delay = B/C [sec]
- Example:
  - 512 bytes packet
  - 64 kbps link
  - transmission delay = 512*8/64000 s = 64 ms
- Propagation delay: constant, depending on
  - link length
  - propagation speed of electromagnetic waves in the medium considered
    - 200,000 km/s for copper links
    - 300,000 km/s in air
- Hypothesis: other delays neglected
  - queueing
  - processing

Stop-and-wait performance
[Figure: sender -> Router 1 -> Router 2 -> receiver. Links: 28.8 kbps, 1 ms; 1024 kbps, 30 ms; 28.8 kbps, 1 ms. Timeline from start of tx: tx1, prop1, tx2, prop2, tx3, prop3.]
Completion time = Tx1 + Prop1 + Tx2 + Prop2 + Tx3 + Prop3 + Ack_Tx1 + Prop1 + Ack_Tx2 + Prop2 + Ack_Tx3 + Prop3 + [same computation for other segments] = ...
= RTT + Tx1 + Tx2 + Tx3 + Ack_Tx1 + Ack_Tx2 + Ack_Tx3 + ...
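The completion-time sum above is easy to evaluate mechanically. The sketch below does so for the three-link path in the figure (28.8 kbps/1 ms, 1024 kbps/30 ms, 28.8 kbps/1 ms), neglecting queueing and processing as stated. The packet sizes are illustrative assumptions: a 1024-byte message split into 536- and 488-byte segments, each with 40 bytes of TCP/IP headers, and 40-byte header-only acks. All function names are hypothetical.

```python
def tx_ms(size_bytes, rate_kbps):
    """Transmission delay B/C in milliseconds (bits divided by kbit/s gives ms)."""
    return size_bytes * 8 / rate_kbps

# (link rate in kbps, propagation delay in ms) for each hop, sender to receiver.
LINKS = [(28.8, 1.0), (1024.0, 30.0), (28.8, 1.0)]

def one_way_ms(size_bytes, links):
    """Store-and-forward delay of one packet: tx + prop on every hop."""
    return sum(tx_ms(size_bytes, rate) + prop for rate, prop in links)

def stop_and_wait_ms(packet_sizes, ack_size, links):
    """Each packet is sent only after the ack of the previous one has returned."""
    return sum(one_way_ms(size, links) + one_way_ms(ack_size, links)
               for size in packet_sizes)

# Assumed sizes: 536 + 40 and 488 + 40 bytes of data plus headers, 40-byte acks.
completion = stop_and_wait_ms([576, 528], 40, LINKS)
throughput_kbps = 1024 * 8 / completion   # useful bits per ms = kbps

print(tx_ms(512, 64))                     # the 512-byte / 64 kbps example
print(round(completion), round(throughput_kbps, 1))
```

Under these assumptions the propagation terms add up to two round trips (one per segment), and the transmission terms dominate on the slow access links.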
Stop-and-wait performance: numerical example
[Figure: sender -> Router 1 -> Router 2 -> receiver. Links: 28.8 kbps, 1 ms; 1024 kbps, 30 ms; 28.8 kbps, 1 ms.]
- Message:
  - 1024 bytes
  - 2 segments: 536 + 488 bytes
  - Overhead: 20 bytes TCP + 20 bytes IP
  - ACK = 40 bytes (header only)
- Segment 1:
  - Tx1 = 576*8/28.8 = 160 ms
  - Tx3 = Tx1
  - Tx2 = 576*8/1024 = 4.5 ms
- Segment 2:
  - Tx1 = 528*8/28.8 = 146.7 ms
  - Tx3 = Tx1
  - Tx2 = 528*8/1024 = 4.1 ms
- Acks:
  - Tx1 = Tx3 = 40*8/28.8 = 11.1 ms
  - Tx2 = 40*8/1024 = 0.3 ms
RESULT:
D = 667 (tx total) + 2*RTT = 795 ms
THR = 1024*8/795 = 10.3 kbps

Stop-and-wait performance: numerical example. With ISDN?
[Figure: same path with links: 128 kbps, 1 ms; 1024 kbps, 30 ms; 128 kbps, 1 ms.]
- Segment 1:
  - Tx1 = Tx3 = 576*8/128 = 36 ms
  - Tx2 = 576*8/1024 = 4.5 ms
- Segment 2:
  - Tx1 = Tx3 = 528*8/128 = 33 ms
  - Tx2 = 528*8/1024 = 4.1 ms
- Acks:
  - Tx1 = Tx3 = 40*8/128 = 2.5 ms
  - Tx2 = 40*8/1024 = 0.3 ms
RESULT:
D = 157.2 (tx total) + 2*RTT = 285.2 ms
THR = 1024*8/285.2 = 28.7 kbps
On Gbps fiber optics?
D = negligible + 2*RTT = 128 ms
THR = 1024*8/128 = 64 kbps

Pipelining performance
[Figure: sender -> Router 1 -> Router 2 -> receiver. Links: 256 kbps, 1 ms; 1024 kbps, 30 ms; 256 kbps, 1 ms. Timeline from start of tx: tx1, prop1, tx2, prop2, tx3, prop3; full tx time.]
Completion time (neglecting processing and queueing) = Tx1 + Prop1 + Tx2 + Prop2 + Tx3 + Prop3 + Tx_bottleneck + Ack_Tx1 + Prop1 + Ack_Tx2 + Prop2 + Ack_Tx3 + Prop3 (that's it!)

Pipelining performance: numerical example
- On 28.8 kbps links
  - D = 347 (tx segm1+ack) + RTT + 160 (segm2 bottleneck) = 571 ms
  - THR = 1024*8/571 = 14.3 kbps
- On 128 kbps ISDN links
  - D = 81.8 (tx segm1+ack) + RTT + 33 (segm2 bottleneck) = 178.8 ms
  - THR = 1024*8/178.8 = 45.8 kbps
- On Gbps fiber optics?
  - D = negligible + RTT = 64 ms
  - THR = 1024*8/64 = 128 kbps

Simplified performance model
[Figure: sender and receiver connected by a single link of rate C bits/sec]
- Approximate analysis, much simpler than multi-hop
  - typically, C = bottleneck link rate
- MSS = segment size
- MSIZE = message size
- Ignore overhead
- Ignore ACK transmission time
- No loss of segments
- W = number of outstanding segments
  - W=1: stop-and-wait
  - W>1: sliding window
  - this is a highly dynamic parameter in TCP!! For now, consider W fixed

W=1 case (stop-and-wait)
[Figure: sender/receiver timing diagram showing the one-way delay, MSS/C and RTT]
$$\text{throughput} = \frac{MSS}{RTT + MSS/C}$$
REMARK: throughput is always lower than the available link rate!

Latency in TCP retrieval model
[Figure: client/server timing diagram: start of TCP connection (one RTT), object request (one RTT), then data transfer]
Latency: time elapsing between the TCP connection request and the last bit received at the client
$$\text{latency} = 2\,RTT + \frac{MSIZE}{C} + \left(\left\lceil \frac{MSIZE}{MSS} \right\rceil - 1\right) RTT$$
where $\lceil MSIZE/MSS \rceil$ is the number of segments into which the message is split.

W=1 case (stop-and-wait): throughput
[Figure: throughput (kbps, log scale 1 to 10000) vs. RTT (ms, log scale 1 to 1000); MSS = 1500 bytes; curves for C = 28.8 kbps, 128 kbps, 640 kbps, 10 Mbps]
Under-utilization with: 1) high capacity links, 2) large RTT links

W=1 case (stop-and-wait): utilization
[Figure: utilization (0% to 100%) vs. RTT (ms, log scale 1 to 1000); MSS = 1500 bytes; curves for C = 28.8 kbps, 128 kbps, 640 kbps, 10 Mbps]
Under-utilization with: 1) high capacity links, 2) large RTT links

Pipelining (W>1) analysis: two cases
[Figure: timing diagram; the first ack returns after RTT + 1 tx; window W=4?]
[Figure: timing diagram with window W=10]
- WINDOW SIZE that allows CONTINUOUS TRANSMISSION
- UNDER-SIZED WINDOW: THROUGHPUT INEFFICIENCY

Continuous transmission
Condition in which the link rate is fully utilized: the time to transmit W segments must exceed the time to receive the ack of the first segment.
$$W \cdot \frac{MSS}{C} > RTT + \frac{MSS}{C}$$
We may elaborate:
$$W \cdot MSS > RTT \cdot C + MSS \approx RTT \cdot C$$
This means that full link utilization is possible when the window size (in bits) is greater than the bandwidth (C bit/s) delay (RTT s) product!

Bandwidth-delay product
[Figure: a pipe of rate C and delay D]
- Network: like a pipe
- C [bit/s] x D [s]
  - number of bits "flying" in the network
  - number of bits injected into the network by the transmitter before the first bit is received at distance D s
[Figure: a 64 kbps link with 0.240 s delay: a 15360-bit (64000 x 0.240) "worm" in the air!!]
bandwidth-delay product = number of bytes that saturates the network pipe

Bandwidth-delay product (usually considered)
RTT = 2*D
- C [bit/s] x RTT [s]
  - number of bits "flying" in the network (partly received)
  - number of bits injected into the network by the transmitter before the ack of the first bit is received
[Figure: animation frames of a pipe of rate C with one-way delay D in each direction.
- After D/4 s
- After D s
- After 5/4*D s: C*D/4 bits at the receiver; Ack on its way back
- After 2D s: C*D bits at the receiver; Ack back at the sender
- General case with forward delay D1 and return delay D2, after RTT = D1+D2 s: C*D2 bits at the receiver]

Long Fat Networks
LFNs (el-ef-an(t)s): large bandwidth-delay product

NETWORK            RTT (ms)   rate (kbps)   BxD (bytes)
Ethernet           3          10,000        3,750
T1, transUS        60         1,544         11,580
T1 satellite       480        1,544         92,640
T3 transUS         60         45,000        337,500
Gigabit transUS    60         1,000,000     7,500,000

The maximum window size W of 65535 bytes (16-bit field in the TCP header) may be a limiting factor!

Pipelining (W>1) analysis
$$thr = \min\left(C,\ \frac{W \cdot MSS}{RTT + MSS/C}\right)$$
Delay analysis (for TCP object retrieval)
[Figure: timing diagram: 2 RTT for connection setup and request, then windows of W segments, each separated by RTT (+1 tx)]
Non-continuous transmission:
$$\text{latency} = 2\,RTT + \frac{MSIZE}{C} + \left(\left\lceil \frac{MSIZE}{W \cdot MSS} \right\rceil - 1\right)\left(RTT - \frac{(W-1)\,MSS}{C}\right)$$
Continuous transmission case:
$$\text{latency} = 2\,RTT + \frac{MSIZE}{C}$$

Throughput for pipelining
[Figure: throughput (kbps, 0 to 1200) vs. RTT (ms, 0 to 600); MSS = 1500 bytes, 1 Mbps link speed; curves for W=1, W=2, W=4, W=16]

Maximum achievable throughput (assuming infinite speed line...)
[Figure: throughput (Mbps, log scale 1 to 1000) vs. RTT (ms, 0 to 500); W = 65535 bytes]
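The fixed-window model can be put into a short script to explore how W, RTT and C interact. This is a sketch of the simplified model only (no overhead, no ack transmission time, no loss); the function names are hypothetical, and the two latency cases are merged by clamping the per-window idle time at zero, which is an implementation choice made here, not something stated on the slides.

```python
import math

def window_throughput(w, mss_bytes, rtt_s, c_bps):
    """thr = min(C, W*MSS / (RTT + MSS/C)), in bit/s."""
    mss_bits = mss_bytes * 8
    return min(c_bps, w * mss_bits / (rtt_s + mss_bits / c_bps))

def retrieval_latency(msize_bytes, w, mss_bytes, rtt_s, c_bps):
    """Object-retrieval latency in seconds for a fixed window of W segments."""
    mss_bits = mss_bytes * 8
    # Time the sender sits idle after each window, waiting for the first ack.
    # Zero or negative idle time means transmission is continuous.
    idle = rtt_s - (w - 1) * mss_bits / c_bps
    windows = math.ceil(msize_bytes / (w * mss_bytes))
    return 2 * rtt_s + msize_bytes * 8 / c_bps + (windows - 1) * max(idle, 0.0)

def bandwidth_delay_bytes(c_bps, rtt_s):
    """Bytes needed in flight to keep a pipe of rate C and round trip RTT full."""
    return c_bps * rtt_s / 8

# MSS = 1500 bytes on a 1 Mbps link with RTT = 100 ms.
for w in (1, 2, 4, 16):
    print(w, round(window_throughput(w, 1500, 0.100, 1_000_000)))

# T1 transUS row of the LFN table: 1544 kbps, 60 ms.
print(bandwidth_delay_bytes(1_544_000, 0.060))

# 1024-byte message, MSS = 536 bytes, RTT = 64 ms, C = 28.8 kbps, stop-and-wait.
print(round(retrieval_latency(1024, 1, 536, 0.064, 28_800), 4))
```

With W=16 the window exceeds the bandwidth-delay product of this link, so the throughput saturates at C; with smaller windows it is limited by W*MSS per round trip.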