tema 2: aspectos de encaminamiento
DESCRIPTION
Tema 2: Aspectos de encaminamiento. Algoritmos básicos de encaminamiento Link state Distance Vector Encaminamiento en Internet RIP OSPF BGP Multi-Protocol Label Switching (MPLS). IP multicast. routing algorithm. local forwarding table. header value. output link. 0100 0101 - PowerPoint PPT PresentationTRANSCRIPT
Transmisión de Datos Multimedia – http://www.grc.upv.es/docencia/tdm – Master IC 2007/2008
Tema 2: Aspectos de encaminamientoTema 2: Aspectos de encaminamiento
Algoritmos básicos de encaminamiento Link state Distance Vector
Encaminamiento en Internet RIP OSPF BGP
Multi-Protocol Label Switching (MPLS).IP multicast
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
2
1
23
0111
value in arrivingpacket’s header
routing algorithm
local forwarding tableheader value output link
0100010101111001
3221
Interplay between routing, forwardingComputer Networking: A Top
Down Approach Featuring the Internet,
3rd edition. Jim Kurose, Keith Ross
Addison-Wesley, July 2004.
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
3
u
yx
wv
z2
2
13
1
1
2
53
5
Graph: G = (N,E)
N = set of routers = { u, v, w, x, y, z }
E = set of links ={ (u,v), (u,x), (v,x), (v,w), (x,w), (x,y), (w,y), (w,z), (y,z) }
Graph abstraction
Remark: Graph abstraction is useful in other network contexts
Example: P2P, where N is set of peers and E is set of TCP connections
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
4
Graph abstraction: costs
u
yx
wv
z2
2
13
1
1
2
53
5 • c(x,x’) = cost of link (x,x’)
- e.g., c(w,z) = 5
• cost could always be 1, or inversely related to bandwidth,or inversely related to congestion
Cost of path (x1, x2, x3,…, xp) = c(x1,x2) + c(x2,x3) + … + c(xp-1,xp)
Question: What’s the least-cost path between u and z ?
Routing algorithm: algorithm that finds least-cost path
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
5
Routing Algorithm classification
Global or decentralized information?Global: all routers have complete topology,
link cost info “link state” algorithmsDecentralized: router knows physically-connected
neighbors, link costs to neighbors iterative process of computation,
exchange of info with neighbors “distance vector” algorithms
Static or dynamic?Static: routes change slowly over timeDynamic: routes change more quickly
periodic update in response to link cost
changes
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
6
A Link-State Routing Algorithm: Dijsktra’s Algorithm
net topology, link costs known to all nodes
accomplished via “link state broadcast”
all nodes have same info computes least cost paths from
one node (‘source”) to all other nodes
gives forwarding table for that node
iterative: after k iterations, know least cost path to k destinations
Notation: c(x,y): link cost from node x to y; = ∞ if not
direct neighbors D(v): current value of cost of path from
source to destination v p(v): predecessor node along path from
source to v N': set of nodes whose least cost path
definitively known
1 Initialization: 2 N' = {u} 3 for all nodes v 4 if v adjacent to u 5 then D(v) = c(u,v) 6 else D(v) = ∞ 7 8 Loop 9 find w not in N' such that D(w) is a minimum 10 add w to N' 11 update D(v) for all v adjacent to w and not in N' : 12 D(v) = min( D(v), D(w) + c(w,v) ) 13 /* new cost to v is either old cost to v or known 14 shortest path cost to w plus cost from w to v */ 15 until all nodes in N'
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
7
Dijkstra’s algorithm example
A F
B
D E
C2
2
2
3
1
1
1
3
5
step SPT D(b), P(b) D(c), P(c) D(d), P(d) D(e), P(e) D(f), P(f)0 A 2, A 5, A 1, A ~ ~
5
B C D E F
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
8
Dijkstra’s algorithm example
A F
B
D E
C2
2
2
3
1
1
1
3
5
step SPT D(b), P(b) D(c), P(c) D(d), P(d) D(e), P(e) D(f), P(f)0 A 2, A 5, A 1, A ~ ~1 AD 2, A 4, D 2, D ~
5
B C D E F
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
9
Dijkstra’s algorithm example
A F
B
D E
C2
2
2
3
1
1
1
3
5
step SPT D(b), P(b) D(c), P(c) D(d), P(d) D(e), P(e) D(f), P(f)0 A 2, A 5, A 1, A ~ ~1 AD 2, A 4, D 2, D ~2 ADE 2, A 3, E 4, E
5
B C D E F
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
10
Dijkstra’s algorithm example
A F
B
D E
C2
2
2
3
1
1
1
3
5
step SPT D(b), P(b) D(c), P(c) D(d), P(d) D(e), P(e) D(f), P(f)0 A 2, A 5, A 1, A ~ ~1 AD 2, A 4, D 2, D ~2 ADE 2, A 3, E 4, E3 ADEB 3, E 4, E
5
B C D E F
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
11
Dijkstra’s algorithm example
A F
B
D E
C2
2
2
3
1
1
1
3
5
step SPT D(b), P(b) D(c), P(c) D(d), P(d) D(e), P(e) D(f), P(f)0 A 2, A 5, A 1, A ~ ~1 AD 2, A 4, D 2, D ~2 ADE 2, A 3, E 4, E3 ADEB 3, E 4, E4 ADEBC 4, E
5
B C D E F
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
12
Dijkstra’s algorithm example
A F
B
D E
C2
2
2
3
1
1
1
3
5
step SPT D(b), P(b) D(c), P(c) D(d), P(d) D(e), P(e) D(f), P(f)0 A 2, A 5, A 1, A ~ ~1 AD 2, A 4, D 2, D ~2 ADE 2, A 3, E 4, E3 ADEB 3, E 4, E4 ADEBC 4, E
5
B C D E F
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
13
Dijkstra’s algorithm example
A
ED
CB
F
Resulting shortest-path tree from A:
BD
E
C
F
(A,B)(A,D)
(A,D)
(A,D)
(A,D)
destination link
Resulting forwarding table in A:
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
14
Dijkstra’s algorithm, discussion
Algorithm complexity: n nodes each iteration: need to check all nodes, w, not in N n(n+1)/2 comparisons: O(n2) more efficient implementations possible: O(nlogn)
Oscillations possible: e.g., link cost = amount of carried traffic
A
D
C
B1 1+e
e0
e
1 1
0 0
A
D
C
B2+e 0
001+e1
A
D
C
B0 2+e
1+e10 0
A
D
C
B2+e 0
e01+e1
initially… recompute
routing… recompute … recompute
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
15
Distance Vector Algorithm
Bellman-Ford Equation (dynamic programming)
Define: dx(y) := cost of least-cost path from x to y
Then
where min is taken over all neighbors v of x
vdx(y) = min {c(x,v) + dv(y) }
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
16
Bellman-Ford example
u
yx
wv
z2
2
13
1
1
2
53
5
Clearly, dv(z) = 5, dx(z) = 3, dw(z) = 3
du(z) = min { c(u,v) + dv(z), c(u,x) + dx(z), c(u,w) + dw(z) } = min {2 + 5, 1 + 3, 5 + 3} = 4
Node that achieves minimum is nexthop in shortest path ➜ forwarding table
B-F equation says:
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
17
Distance Vector Algorithm
Node x maintains the Distance vector: Dx = [Dx(y): y є N ]
Where Dx(y) = estimate of least cost from x to y
Node x also maintains its neighbors’ distance vectors For each neighbor v, x maintains
Dv = [Dv(y): y є N ]
Node x knows the cost to each neighbor v: c(x,v)
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
18
Distance vector algorithm
Basic idea: Each node periodically sends its own distance vector
estimate to neighbors When a node x receives new DV estimate from neighbor, it
updates its own DV using B-F equation:
Under minor, natural conditions, the estimate Dx(y) converge to the actual least cost dx(y)
Dx(y) ← minv{c(x,v) + Dv(y)} for each node y ∊ N
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
19
Distance Vector Algorithm
Iterative, asynchronous: each local iteration caused by: local link cost change DV update message from
neighbor Distributed:
each node notifies neighbors only when its DV changes
neighbors then notify their neighbors if necessary
wait for (change in local link cost of msg from neighbor)
recompute estimates
if DV to any dest has
changed, notify neighbors
Each node:
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
20
x y z
xyz
0 2 7
∞ ∞ ∞∞ ∞ ∞
from
cost tofr
om
from
x y z
xyz
0 2 3
from
cost tox y z
xyz
0 2 3
from
cost to
x y z
xyz
∞ ∞
∞ ∞ ∞
cost tox y z
xyz
0 2 7
from
cost to
x y z
xyz
0 2 3
from
cost to
x y z
xyz
0 2 3fr
om
cost tox y z
xyz
0 2 7
from
cost to
x y z
xyz
∞ ∞ ∞7 1 0
cost to
∞2 0 1
∞ ∞ ∞
2 0 17 1 0
2 0 17 1 0
2 0 13 1 0
2 0 13 1 0
2 0 1
3 1 0
2 0 1
3 1 0
time
x z12
7
y
node x table
node y table
node z table
Dx(y) = min{c(x,y) + Dy(y), c(x,z) + Dz(y)} = min{2+0 , 7+1} = 2
Dx(z) = min{c(x,y) + Dy(z), c(x,z) + Dz(z)} = min{2+1 , 7+0} = 3
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
21
Comparison of LS and DV algorithms
Message complexity LS: with n nodes, E links,
O(nE) msgs sent DV: exchange between
neighbors only convergence time varies
Speed of Convergence LS: O(n2) algorithm requires
O(nE) msgs may have oscillations
DV: convergence time varies may be routing loops count-to-infinity problem
Robustness: what happens if router malfunctions? LS:
node can advertise incorrect link cost
each node computes only its own table
DV: DV node can advertise
incorrect path cost each node’s table used by
others – error propagate thru network
Transmisión de Datos Multimedia – http://www.grc.upv.es/docencia/tdm – Master IC 2007/2008
Tema 2: Aspectos de encaminamientoTema 2: Aspectos de encaminamiento
Algoritmos básicos de encaminamiento Link state Distance Vector
Encaminamiento en Internet RIP OSPF BGP
Multi-Protocol Label Switching (MPLS).IP multicast
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
23
Hierarchical Routing
Our routing study thus far - idealization all routers identical network “flat” … not true in practice
scale: with 200 million destinations: can’t store all dest’s in routing tables! routing table exchange would swamp links!
administrative autonomy internet = network of networks each network admin may want to control routing in its own
network
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
24
Hierarchical Routing
Gateway router Direct link to router in another AS
aggregate routers into regions, “autonomous systems” (AS) routers in same AS run same routing protocol
“intra-AS” routing protocol routers in different AS can run different intra-AS routing
protocol
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
25
3b
1d
3a
1c2aAS3
AS1
AS21a
2c2b
1b
Intra-ASRouting algorithm
Inter-ASRouting algorithm
Forwardingtable
3c
Interconnected ASes
Forwarding table is configured by both intra- and inter-AS routing algorithm Intra-AS sets entries for
internal dests Inter-AS & Intra-As sets
entries for external dests
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
26
3b
1d
3a
1c2aAS3
AS1
AS21a
2c2b
1b
3c
Inter-AS tasks
Suppose router in AS1 receives datagram for which dest is outside of AS1 Router should forward
packet towards one of the gateway routers, but which one?
AS1 needs: to learn which dests are
reachable through AS2 and which through AS3
to propagate this reachability info to all routers in AS1
Job of inter-AS routing!
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
27
Learn from inter-AS protocol that subnet x is reachable via multiple gateways
Use routing infofrom intra-AS
protocol to determine
costs of least-cost paths to each
of the gateways
Hot potato routing:Choose the
gatewaythat has the
smallest least cost
Determine fromforwarding table the interface I that leads
to least-cost gateway. Enter (x,I) in
forwarding table
Example: Choosing among multiple ASes
Now suppose AS1 learns from the inter-AS protocol that subnet x is reachable from AS3 and from AS2.
To configure forwarding table, router 1d must determine towards which gateway it should forward packets for dest x.
This is also the job on inter-AS routing protocol! Hot potato routing: send packet towards closest of two
routers.
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
28
Intra-AS Routing
Also known as Interior Gateway Protocols (IGP) Most common Intra-AS routing protocols:
RIP: Routing Information Protocol
OSPF: Open Shortest Path First
IGRP: Interior Gateway Routing Protocol (Cisco proprietary)
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
29
RIP ( Routing Information Protocol)
Distance vector algorithm Included in BSD-UNIX Distribution in 1982 Distance metric: # of hops (max = 15 hops)
DC
BA
u v
w
x
yz
destination hops u 1 v 2 w 2 x 3 y 3 z 2
From router A to subsets:
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
30
RIP advertisements
Distance vectors: exchanged among neighbors every 30 sec via Response Message (also called advertisement)
Each advertisement: list of up to 25 destination nets within AS
Table processing: RIP routing tables managed by application-level process called
route-d (daemon) advertisements sent in UDP packets, periodically repeated
physical
link
network forwarding (IP) table
Transprt (UDP)
routed
physical
link
network (IP)
Transprt (UDP)
routed
forwardingtable
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
31
RIP: Link Failure and Recovery
If no advertisement heard after 180 sec --> neighbor/link declared dead routes via neighbor invalidated new advertisements sent to neighbors neighbors in turn send out new advertisements (if tables
changed) link failure info quickly propagates to entire net poison reverse used to prevent ping-pong loops (infinite
distance = 16 hops)
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
32
OSPF (Open Shortest Path First)
“open”: publicly available Uses Link State algorithm
LS packet dissemination Topology map at each node Route computation using Dijkstra’s algorithm
OSPF advertisement carries one entry per neighbor router Advertisements disseminated to entire AS (via flooding)
Carried in OSPF messages directly over IP (rather than TCP or UDP
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
33
OSPF “advanced” features (not in RIP)
Security: all OSPF messages authenticated (to prevent malicious intrusion)
Multiple same-cost paths allowed (only one path in RIP) For each link, multiple cost metrics for different TOS (e.g.,
satellite link cost set “low” for best effort; high for real time)
Integrated uni- and multicast support: Multicast OSPF (MOSPF) uses same topology data base as
OSPF Hierarchical OSPF in large domains.
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
34
Hierarchical OSPF
Two-level hierarchy: local area, backbone. Link-state advertisements
only in area each nodes has detailed
area topology; only know direction (shortest path) to nets in other areas.
Area border routers: “summarize” distances to nets in own area, advertise to other Area Border routers.
Backbone routers: run OSPF routing limited to backbone.
Boundary routers: connect to other AS’s.
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
35
Internet inter-AS routing: BGP
BGP (Border Gateway Protocol): the de facto standard BGP provides each AS a means to:
1. Obtain subnet reachability information from neighboring ASs.2. Propagate the reachability information to all routers internal
to the AS.3. Determine “good” routes to subnets based on reachability
information and policy. Allows a subnet to advertise its existence to rest of the
Internet: “I am here”
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
36
BGP basics
Pairs of routers (BGP peers) exchange routing info over semi-permanent TCP connections: BGP sessions
Note that BGP sessions do not correspond to physical links. When AS2 advertises a prefix to AS1, AS2 is promising it
will forward any datagrams destined to that prefix towards the prefix. AS2 can aggregate prefixes in its advertisement
3b
1d
3a
1c2aAS3
AS1
AS21a
2c
2b
1b
3c
eBGP session
iBGP session
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
37
BGP route selection
Router may learn about more than 1 route to some prefix. Router must select route.
Elimination rules: Local preference value attribute: policy decision Shortest AS-PATH Closest NEXT-HOP router: hot potato routing Additional criteria
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
38
Why different Intra- and Inter-AS routing ?
Policy: Inter-AS: admin wants control over how its traffic routed, who
routes through its net. Intra-AS: single admin, so no policy decisions needed
Scale: hierarchical routing saves table size, reduced update traffic
Performance: Intra-AS: can focus on performance Inter-AS: policy may dominate over performance
Transmisión de Datos Multimedia – http://www.grc.upv.es/docencia/tdm – Master IC 2007/2008
Tema 2: Aspectos de encaminamientoTema 2: Aspectos de encaminamiento
Algoritmos básicos de encaminamiento Link state Distance Vector
Encaminamiento en Internet RIP OSPF BGP
Multi-Protocol Label Switching (MPLS).IP multicast
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
40
MPLS - The Motivation
IP Protocol Suite - the most predominant networking technology.
Voice & Data convergence on a single network infrastructure.
Continual increase in number of users. Demand for higher connection speeds. Increase in traffic volumes. Ever-increasing number of ISP networks.
MPLS Working Groups and Standards Standardized by the IETF - currently in Draft stage. MPLS recommendations are done by IP players for IP services MPLS core components are generic MPLS doesn’t use specific technology process
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
41
MPLS and ISO model
PPP
Physical (Optical - Electrical) 1
2
IP 3
4
Applications7to5
FrameRelay
ATM (*)
TCP UDP
PPP Frame Relay ATM (*)
MPLS
(*) ATM overlay model (without addressing and P-NNI) is considered as an ISO layer 2 protocol.
IETF main goal is that when a layer is added, no modification is needed on the existing layers.All new protocol must be backward compatible
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
42
Some MPLS Terms...
LER - Label Edge Router LSR - Label Switch Router FEC - Forward Equivalence Class Label - Associates a packet to a FEC Label Stack - Multiple labels containing information on how
a packet is forwarded. Shim - Header containing a Label Stack Label Switch Path - path that a packet follows for a specific
FEC LDP - Label Distribution Protocol, used to distribute Label
information between MPLS-aware network devices Label Swapping - manipulation of labels to forward packets
towards the destination.
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
43
Routing protocol OSPF OSPF OSPF
Attributes Precedence
Local tableLabel table Local table Local table
LSP (Label-Switched Paths) Label swapping Label removal
ClassificationLabel assignment
IngressNode
CoreNode
EgressNode
Label SwitchLayer 2
Layer 1
Layer 2
Layer 1
Layer 2
Layer 1
Layer 2
Layer 1
Layer 2
Layer 1
MPLS Architecture
FEC table Local table Local table Local table
Forward Equivalence Class
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
44
Label swapping
Label removal
ClassificationLabel assignment
Label swapping
Label removal
ClassificationLabel assignment
OSPF / RIP / IS-IS
Label Switch Path
Label table
IngressNode
CoreNode
EgressNode
Layer 2
Layer 1
Layer 2
Layer 1
Layer 2
Layer 1
Precedence
Label table Label table
Layer 2
Layer 1
Layer 2
Layer 1
FEC FEC FEC
MPLS process
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
45
LSR
LER
LSR
LER
IP PacketIP Packet w/ Label
L3 RoutingL3 Routing
Label SwappingLabel Swapping
LER: Label Edge Router
LERLER
L3 RoutingL3 Routing
L3 Routing
MPLS Cloud
LSR: Label Switch(ing) Router
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
46
Ingress Label FEC Egress Label
6 138.120.6/24 - xxxx 9
Ingress Label AttributeFEC Egress LabelIngress Label FEC Egress Label
6 138.120.6/24 - xxxx 9
Attribute
A
6 138.120.6/24 - xxxx 12B
•FECs are manually initiated by the operator
•A FEC is associated to at least one Label
FEC Classification
A packet can be mapped to a particular FEC based on the following criteria: destination IP address, source IP address, TCP/UDP port, in case of inter AS-MPLS, Source-AS and Dest-AS, class of service, application used, … any combination of the previous criteria.
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
47
L2 Type L2 TypePort PortIngress Label Egress LabelFEC
ATM 1-1 12 (i.e. 4/65) F1 22 (i.e. 5/65)3-4ATM
ATM 1-1 15 (i.e. 0/25) F4 9 (i.e. 101) 5-1FR
Gig Eth 5-1 7 F1 22 (i.e. 4/65)3-4ATM
What is a Label?
A label is a short, fixed length, locally significant identifier used to identify a FEC.
The label can be identified by the L2 technology identifier (e.g. VPI/VCI for ATM, DLCI for FR or MPLS label for PPP/Ethernet).
MPLS Label Assignment Schemes Topology Driven
Label assignment in response to routing protocols (OSPF and BGP) updates
Control Driven Label assignment in response to RSVP, CR-LDP requests
Traffic Driven Label assignment in response to flow detection & triggering
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
48
Label(20 bits)
Exp(3 bits)
S(1 bit)
TTL(8bits)
Label : Label value (0 to 15 are reserved for special use)
Exp : Experimental UseS : Bottom of Stack (set to 1 for the last entry in the label)
TTL : Time To Live
The MPLS Shim Header
The Label (Shim Header) is represented as a sequence of Label Stack Entry
Each Label Stack Entry is coded by 4 bytes (32 bits) as described 20 Bits is reserved for the Label Identifier (also named Label)
Based on the contents of the label a swap, push (impose) or pop (dispose) operation can be performed on the packet's label stack
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
49
Label Switched Path
5 12
IngressInterface
IngressLabel
FEC EgressInterface
EgressLabel
1 138.120 312
IngressInterface
IngressLabel
FEC EgressInterface
EgressLabel
1 138.120 x4
53
IngressInterface
IngressLabel
FEC EgressInterface
EgressLabel
1 x 138.120
MPLS switch
MPLS switch
MPLS switch
MPLS switch1
2
3
1 2
3
1
2
3
41
2
3
138.120
192.168127.20
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
50
MPLS switch
MPLS switch
MPLS switch
MPLS switch1
2
3
1 2
3
1
2
3
41
2
3
138.120
192.168127.20138.120.6.12
138.120.6.12138.120.6.12
138.120.6.12 138.120.6.12
138.120.6.12
??138.120.6.12
Default3
IngressInterface
IngressLabel
FEC EgressInterface
EgressLabel
1 x None
??
138.120.6.12
Default Default
IngressInterface
IngressLabel
FEC EgressInterface
EgressLabel
1 None 3
??
138.120.6.12 ??138.120.6.12
Default
IngressInterface
IngressLabel
FEC EgressInterface
EgressLabel
1 None x4
??138.120.6.12
Hop by Hop IP forwarding
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
51
5 12
IngressInterface
IngressLabel
FEC EgressInterface
EgressLabel
1 138.120 312
IngressInterface
IngressLabel
FEC EgressInterface
EgressLabel
1 138.120 x4
53
IngressInterface
IngressLabel
FEC EgressInterface
EgressLabel
1 x 138.120
MPLS switch
MPLS switch
MPLS switch
MPLS switch1
2
3
1 2
3
1
2
3
41
2
3
138.120
192.168127.20
138.120.6.12
138.120.6.12
138.120.6.12
138.120.6.12 138.120.6.12
IP forwarding using Label Switch Path
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
52
MPLS Label Distribution Protocol
LDP - a set of procedures by which one LSR informs the other of the FEC-to-Label binding it has made.
Currently, several protocols used as Label Distribution Protocol (LDP) are available: RSVP-TE (MPLS extension) LDP and CR-LDP BGP-4 MPLS extensions
Label Distribution schemes
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
53
Downstream stream on demand
Mapping 12Mapping 5
5 12
IngressInterface
IngressLabel
FEC EgressInterface
EgressLabel
1 138.120 312
IngressInterface
IngressLabel
FEC EgressInterface
EgressLabel
1 138.120 x4
53
IngressInterface
IngressLabel
FEC EgressInterface
EgressLabel
1 x 138.120
Request 138.120Request 138.120
MPLS switch
MPLS switch
MPLS switch
MPLS switch1
2
3
1 2
3
1
2
3
41
2
3
138.120
192.168127.20
The label is requested by the upstream node and the downstream node defines the label used.
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
54
Unsolicited Downstream
MPLS switch
MPLS switch
MPLS switch
MPLS switch1
2
3
1 2
3
1
2
3
41
2
3
138.120
192.168127.20
Mapping 12Mapping 5
5 1212
5
IngressInterface
IngressLabel
FEC EgressInterface
EgressLabel
1 138.120 3
IngressInterface
IngressLabel
FEC EgressInterface
EgressLabel
1 138.120 x4
3
IngressInterface
IngressLabel
FEC EgressInterface
EgressLabel
1 x 138.120
The downstream node defines the label and advertises it to the upstream node.
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
55
Edge LSR Features
Routing protocols FEC Classification Initiates LSP setup for Downstream On Demand method Adaptation of non-MPLS data to MPLS data Layer 2 translation for MPLS data Terminated MPLS-VPN At least one LDP protocol Edge LSR is counted into the TTL count as a regular router
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
56
Core LSR Features
Routing protocols Propagates Downstream On Demand method (request and
mapping) Layer 2 translation High speed label forwarding/switching At least one LDP protocol
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
57
MPLS Advantages
Simplified Forwarding Efficient Explicit Routing Traffic Engineering QoS Routing Mappings from IP Packet to Forwarding Equivalence Class
(FEC) Partitioning of Functionality Common Operation over Packet and Cell media
Transmisión de Datos Multimedia – http://www.grc.upv.es/docencia/tdm – Master IC 2007/2008
Tema 2: Aspectos de encaminamientoTema 2: Aspectos de encaminamiento
Algoritmos básicos de encaminamiento Link state Distance Vector
Encaminamiento en Internet RIP OSPF BGP
Multi-Protocol Label Switching (MPLS).IP multicast
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
59
Multicast = Efficient Data Distribution
Src Src
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
60
Why Multicast ?
Need for efficient one-to-many delivery of same data Applications:
News/sports/stock/weather updates Distance learning Configuration, routing updates, service location Pointcast-type “push” apps Teleconferencing (audio, video, shared whiteboard, text editor) Distributed interactive gaming or simulations Email distribution lists Content distribution; Software distribution Web-cache updates Database replication
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
61
Why Not Broadcast or Unicast?
Broadcast: Send a copy to every machine on the net Simple, but inefficient All nodes must process packet even if they don’t care Wastes more CPU cycles of slower machines (“broadcast
radiation”) Network loops lead to “broadcast storms”
Replicated Unicast: Sender sends a copy to each receiver in turn Receivers need to register or sender must be pre-configured Sender is focal point of all control traffic Reliability => per-receiver state, separate sessions/processes
at sender
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
62
Multicast Apps Characteristics
Number of (simultaneous) senders to the group The size of the groups
Number of members (receivers) Geographic extent or scope Diameter of the group measured in router hops
The longevity of the group Number of aggregate packets/second The peak/average used by source Level of human interactivity
Lecture mode vs interactive Data-only (eg database replication) vs multimedia
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
63
Reliable Multicast vs. Unreliable Multicast
When a multicast message is sent by a process, the runtime support of the multicast mechanism is responsible for delivering the message to each process currently in the multicast group.
As each participating process may be on a separate host, due to factors such as failures of network links and/or network hosts, routing delays, and differences in software and hardware, the time between when a message is sent and when it is received may vary among the recipient processes.
Moreover, a message may not be received by one or more of the processes at all.
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
64
Classification of multicasting mechanisms in terms of message delivery
Unreliable multicast: The arrival of the correct message at each process is not
guaranteed. Reliable multicast:
Guarantees that each message is eventually delivered in a non-corrupted form to each process in the group.
The definition of reliable multicast requires that each participating process receives exactly one copy of each message sent. It does not put any restriction of the order the messages delivered.
Reliable multicast can be further classified based on the order of the delivery of the messages: unordered, FIFO, causal order, atomic order.
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
65
Classification of reliable multicast -- unordered
An unordered reliable multicast system guarantees the safe delivery of each message, but it provides no guarantee on the delivery order of the messages.
Example: Processes P1, P2, and P3 have formed a multicast group. Three messages, m1, m2, m3 have been sent to the group. An unordered reliable multicast system may deliver the messages to each of the three processes in any of these:
m1-m2-m3,m1-m3-m2, m2-m1-m3, m2-m3-m1, m3-m1-m2, m3-m2-m1
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
66
Classification of reliable multicast - FIFO
If process P sent messages mi and mj, in that order, then each process in the multicast group will be delivered the messages mi and mj, in that order.
Note that FIFO multicast places no restriction on the delivery order among messages sent by different processes. For example, P1 sends messages m11 then m12, and P2 sends messages m21 then m22. It is possible for different processes to receive any of the following orders:
m11-m12-m21-m22,m11-m21-m12-m22, m11-m21-m22-m12, m21-m11-m12-m22 m21-m11-m22-m12 m21-m22-m11-m12.
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
67
Classification of reliable multicast – Causal order
If message mi causes (results in) the occurrence of message mj, then mi will be delivered to each process prior to mj. Messages mi and mj are said to have a causal or happen-before relationship.
For example, P1 sends a message m1, to which P2 replies with a multicast message m2. Since m2 is triggered by m1, the two messages share a causal relationship of m1-> m2. A causal-order multicast message system ensures that these two messages will be delivered to each of the processes in the order of m1- m2.
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
68
Classification of reliable multicast – Atomic order
In an atomic-order multicast system, all messages are guaranteed to be delivered to each participant in the exact same order. Note that the delivery order does not have to be FIFO or causal, but must be identical for each process.
Example: P1 sends m1, P2 sends m2, and P3 sends m3.
An atomic system will guarantee that the messages will be delivered to each process in only one of the six orders:
m1-m2- m3, m1- m3- m2, m2- m1-m3, m2-m3-m1, m3-m1- m2, m3-m2-m1.
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
69
IP Multicast Architecture
Hosts
Routers
Service modelService model
Host-to-router protocol(IGMP)
Multicast routing protocols(various)
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
70
IP Multicast model: RFC 1112
Message sent to multicast “group” (of receivers) Senders need not be group members A group identified by a single “group address”
Use “group address” instead of destination address in IP packet sent to group
Groups can have any size; Group members can be located anywhere on the Internet Group membership is not explicitly known Receivers can join/leave at will
Packets are not duplicated or delivered to destinations outside the group Distribution tree constructed for delivery of packets No more than one copy of packet appears on any subnet Packets delivered only to “interested” receivers => multicast
delivery tree changes dynamically Network has to actively discover paths between senders and
receivers
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
71
IP Multicast Addresses
Class D IP addresses 224.0.0.0 – 239.255.255.255
Address allocation: Well-known (reserved) multicast addresses, assigned by IANA:
224.0.0.x and 224.0.1.x Transient multicast addresses, assigned and reclaimed dynamically, e.g., by “sdr” program
Each multicast address represents a group of arbitrary size, called a “host group”
There is no structure within class D address space like subnetting => flat address space
1 1 1 0 Group ID
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
72
IP Multicast Service
Sending Uses normal IP-Send operation, with an IP multicast address
specified as the destination Must provide sending application a way to:
Specify outgoing network interface, if >1 available Specify IP time-to-live (TTL) on outgoing packet Enable/disable loop-back if the sending host is/isn't a member of
the destination group on the outgoing interface
Receiving Two new operations
Join-IP-Multicast-Group(group-address, interface) Leave-IP-Multicast-Group(group-address, interface)
Receive multicast packets for joined groups via normal IP-Receive operation
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
73
Link-Layer Transmission/Reception
Transmission IP multicast packet is transmitted as a link-layer multicast, on
those links that support multicast Link-layer destination address is determined by an algorithm
specific to the type of link Reception
Necessary steps are taken to receive desired multicasts on a particular link, such as modifying address reception filters on LAN interfaces
Multicast routers must be able to receive all IP multicasts on a link, without knowing in advance which groups will be used
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
74
Using Link-Layer Multicast Addresses
Ethernet and other LANs using 802 addresses: Direct mapping! Simpler than unicast! No ARP etc.
32 class D addresses may map to one MAC address Special OUI for IETF: 0x01-00-5E. No mapping needed for point-to-point links
LAN multicast address
0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 1 1 1 1 0 0
1 1 1 0 28 bits
23 bits
IP multicast address
Group bit
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
75
Multicast over LANs & Scoping
Multicasts are flooded across MAC-layer bridges along a spanning tree But flooding may steal sending opportunity for non-member
stations which want to transmit Almost like broadcast!
Scope: How far do transmissions propagate? Implicit scoping: Reserved Mcast addresses => don’t leave
subnet. Also called “link-local” addresses
TTL-based scoping: Multicast routers have a configured TTL threshold Multicast datagram dropped if TTL <= TTL threshold Useful as a blanket parameter.
Administrative scoping: Use a portion of class D address space (239.0.0.0 thru
239.255.255.255) Truly local to admin domain; address reuse possible. In IPv6, scoping is an internal attribute of an IPv6 multicast address
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
76
Multicast Scope Control – Small TTLs
TTL expanding-ring search to reach or find a nearby subset of a group
Rings can be nested, but not overlapping
s
1
2
3
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
77
IP Multicast Architecture
Hosts
Routers
Service model
Host-to-router protocolHost-to-router protocol(IGMP)(IGMP)
Multicast routing protocols(various)
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
78
Internet Group Management Protocol
IGMP: “signaling” protocol to establish, maintain, remove groups on a subnet.
Objective: keep router up-to-date with group membership of entire LAN Routers need not know who all the members are, only that
members exist Each host keeps track of which mcast groups are
subscribed to Socket API informs IGMP process of all joins
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
79
How IGMP Works
On each link, one router is elected the “querier” Querier periodically sends a Membership Query message to
the all-systems group (224.0.0.1), with TTL = 1 On receipt, hosts start random timers (between 0 and 10
seconds) for each multicast group to which they belong
QRouters:
Hosts:
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
80
How IGMP Works (cont.)
When a host’s timer for group G expires, it sends a Membership Report to group G, with TTL = 1
Other members of G hear the report and stop (suppress) their timers
Routers hear all reports, and time out non-responding groups
Q
G G G G
Routers:
Hosts:
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
81
How IGMP Works (cont.)
Normal case: only one report message per group present is sent in response to a query Query interval is typically 60-90 seconds
When a host first joins a group, it sends immediate reports, instead of waiting for a query
IGMPv2: Hosts may send a “Leave group” message to “all routers” (224.0.0.2) address Querier responds with a Group-specific Query message: see if
any group members are present Lower leave latency
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
82
The Java Basic Multicast API
At the transport layer, the basic multicast supported by Java is an extension of UDP (the User Datagram Protocol)
For the basic multicast, Java provides a set of classes which are closely related to the datagram socket API classes
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
83
Datagram - recap
a by te a rra y
a D a ta g ra m Pa ck e t o bje ct
r e c e i ve r 'sa d d r e s s
a D a ta g ra m S o ck e t o bje ct
s e nde r pro c e s s
a by te a rra y
a D a ta g ra m Pa ck e t o bje ct
a D a ta g ra m S o ck e t o bje ct
re c e ive r pro c e s s
s e n d
re ce iv e
o bje ct re f e re n ce
da ta f lo w
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
84
The Java Basic Multicast API - 2
There are four major classes in the API, the first three of which we have already seen in the context of datagram sockets.
InetAddress: In the datagram socket API, this class represents the IP address of the sender or receiver. In multicasting, this class can be used to identify a multicast group.
DatagramPacket: As with datagram sockets, an object of this class represents an actual datagram; in multicast, a DatagramPacket object represents a packet of data sent to all participants or received by each participant in a multicast group.
DatagramSocket: In the datagram socket API, this class represents a socket through which a process may send or receive data.
MulticastSocket : A MulticastSocket is a DatagramSocket, with additional capabilities for joining and leaving a multicast group. An object of the multicast datagram socket class can be used for sending and receiving multicast packets. In the Java API, a MulticastSocket object is bound to a port address, e.g. 3456, and methods of the object allows for the joining and leaving of a multicast address, e.g. 239.1.2.3
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
85
Joining a multicast group
To join a multicast group at IP address m and UDP port p, a MulticastSocket object must be instantiated with p, then the object’s joinGroup method can be invoked specifying the address m:
// join a Multicast group at IP address // 239.1.2.3 and port 3456
InetAddress group = InetAddress.getByName("239.1.2.3"); MulticastSocket s = new MulticastSocket(3456); s.joinGroup(group);
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
86
Sending to a multicast group
A multicast message can be sent using syntax similar with the datagram socket API.
String msg = "a multicast message."; InetAddress group = InetAddress.getByName("239.1.2.3"); MulticastSocket s = new MulticastSocket(3456); s.joinGroup(group); // optional DatagramPacket hi = new DatagramPacket(msg.getBytes( ),
msg.length( ),group, 3456); s.send(hi);
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
87
Receiving messages sent to a multicast group
A process that has joined a multicast group may receive messages sent to the group using syntax similar to receiving data using a datagram socket API.
byte[] buf = new byte[1000]; InetAddress group =
InetAddress.getByName("239.1.2.3");
MulticastSocket s =
new MulticastSocket(3456);
s.joinGroup(group);
DatagramPacket recv =
new DatagramPacket(buf,buf.length);
s.receive(recv);
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
88
Leaving a multicast group
A process may leave a multicast group by invoking the leaveGroup method of a MulticastSocket object, specifying the multicast address of the group.
s.leaveGroup(group);
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
89
Setting the “time-to-live”
The runtime support needs to propagate a multicast message from a host to a neighboring host in an algorithm which, when executed properly, will eventually deliver the message to all the participants.
Under some anomalous circumstances, however, it is possible that the algorithm which controls the propagation does not terminate properly, resulting in a packet circulating in the network indefinitely.
Indefinite message propagation causes unnecessary overhead on the systems and the network.
To avoid this occurrence, it is recommended that a “time to live” parameter be set with each multicast datagram.
The time-to-live (ttl) parameter, when set, limits the count of network links or hops that the packet will be forwarded on the network.
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
90
Setting the “time-to-live”
The recommended ttl settings are: 0 if the multicast is restricted to processes on the same
host 1 if the multicast is restricted to processes on the same
subnet 32 if the multicast is restricted to processes on the same
site 64 if the multicast is restricted to is processes on the same
region 128 is if the multicast is restricted to processes on the
same continent 255 is the multicast is unrestricted
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
91
Setting the “time-to-live”
String msg = "Hello everyone!"; InetAddress group =
InetAddress.getByName("239.1.2.3"); MulticastSocket s = new MulticastSocket(3456); s.setTimeToLive(1);
// set time-to-live to 1 hop DatagramPacket hi =
new DatagramPacket(msg.getBytes( ), msg.length( ),group, 3456);
s.send(hi);
The value specified for the ttl must be in the range 0 <= ttl <= 255; an IllegalArgumentException will be thrown otherwise.
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
92
The C version: Joining Multicast Groups
To join a group, you use the setsockopt() kernel service call with a new parameter. The new parameter is the ip_mreq structure:
The imr_multiaddr field specifies the multicast group you want to join. It is the same format as the sin_addr field in the sockaddr_in structure. The imr_interface field lets you choose a particular host interface. This is similar to a bind(), which lets you specify the host interface (or leave the host option wide open with an INADDR_ANY value).
/************************************************************//*** The ip_mreq structure for selecting a multicast addr ***//************************************************************/struct ip_mreq{ struct in_addr imr_multiaddr; /* known multicast group */ struct in_addr imr_interface; /* network interface */} ;
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
93
The C version: Joining Multicast Groups
The following code snippet shows you how to join a group using the ip_mreq structure. It sets the imr_interface field to INADDR_ANY merely for demonstration. Do not use it unless you have only one interface on your host; the results can be unpredictable .
/************************************************************//*** Join a multicast group ***//************************************************************/const char *GroupID = "224.0.0.10";struct ip_mreq mreq;if ( inet_aton(GroupID, &mreq.imr_multiaddr) == 0 ) panic("address (%s) bad", GroupID);mreq.imr_interface.s_addr = INADDR_ANY;if ( setsockopt(sd, SOL_IP, IP_ADD_MEMBERSHIP,&mreq,sizeof(mreq))!= 0) panic("Join multicast failed");
/************************************************************//*** Drop a multicast group ***//************************************************************/if ( setsockopt(sd, SOL_IP, IP_DROP_MEMBERSHIP, &mreq, sizeof(mreq)) != 0 ) panic("Drop multicast failed");
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
94
IP Multicast Architecture
Hosts
Routers
Service model
Host-to-router protocol(IGMP)
Multicast routing protocols
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
95
Multicast Routing
Basic objective – build distribution tree for multicast packets The “leaves” of the distribution tree are the subnets containing
at least one group member (detected by IGMP)
Multicast service model makes it hard Anonymity Dynamic join/leave
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
96
Simple Multicast Routing Techniques
Flood and prune Begin by flooding traffic to entire network Prune branches with no receivers Examples: DVMRP, PIM-DM Unwanted state where there are no receivers
Link-state multicast protocols Routers advertise groups for which they have receivers to
entire network Compute trees on demand Example: MOSPF Unwanted state where there are no senders
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
97
How to Flood Efficiently ?
A router forwards a packet from source (S) iff it arrives via the shortest path from the router back to S Reverse path check!
Packet is replicated out all but the incoming interface Reverse shortest paths easy to compute just use info in
DV routing tables DV gives shortest reverse paths Efficient if costs are symmetric
xxyy
tt
SS
a
zz
Forward packets that arriveon shortest path from “t” to “S” (assume symmetric routes)
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
98
Problem
Flooding can cause a given packet to be sent multiple times over the same link: can filter better than this!
Solution: Reverse Path Broadcasting
xx yy
zz
SS
a
b
duplicate packet
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
99
Reverse Path Broadcasting (RPB)
Basic idea: forward a packet from S only on child links for S Child link of router x for source S: link that has x as parent
on the shortest path from the link to S
xx yy
zz
SS
a
b
5 6
child link of xfor S
forward onlyto child link
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
100
How to Find Child Links?
Routing updates ! If z tells x that it can reach S through y, and if the cost of this path is >= the cost of the path from z to
S through x, then x knows that the link to z is a child link
In case of tie, lower address wins
xx yy
zz
SS
a
b
5 6
child link of xfor S
forward onlyto child link
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
101
Truncated RPB
This is still a broadcast algorithm – the traffic goes everywhere – lousy filtering!
First order solution: Truncated RPB Don't forward traffic onto networks with no receivers Identify leaves
Leaf links are the child links that no other router uses to reach source S
Use periodic updates of form: – “this is my next-link to source S”
If child is not the “next-link” for anyone, it is a leaf
Detect group membership in leaf (IGMP)
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
102
Reverse Path Multicast (RPM)
Prune back transmission so that only absolutely necessary links carry traffic
Use on-demand pruning so that router group state scales with number of active groups
xx yy
tt
SS
vv bbaa
aa bb
data messageprune message
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
103
Basic RPM Idea
Prune (Source,Group) at leaf if no members Send Non-Membership Report (NMR) up the tree
If all children of router R prune (S,G) Propagate prune for (S,G) to parent R
On timeout: Prune dropped Flow is reinstated Down stream routers re-prune Note: this is a soft-state approach
Grafting: Explicitly reinstate sub-tree when IGMP detects new members at leaf, or when a child asks for a
graft.
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
104
Putting it together: Topology
G G
S
G
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
105
Flood with Truncated Broadcast
G G
S
G
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
106
Pruning
G G
S
Prune (s,g)
Prune (s,g)
G
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
107
Graft (s,g)
Graft (s,g)
Grafting
G G
S
G
G
Report (g)
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
108
After Grafting Complete
G G
S
G
G
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
109
Reliable Multicast: The Goal
Implement reliability on top of IP multicast Why is this hard ?
Sender cannot keep state for unknown number of dynamic receivers
Remember open & dynamic group semantic? Algorithms like TCP that estimate path properties such as RTT
and congestion window don’t generalize to trees. Remember: TCP is only for a unicast session!
Has to address (N)ACK implosions
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
110
R1
Implosion
S
R3 R4
R2
21
R1
S
R3 R4
R2
Packet 1 is lost All 4 receivers request a resend
Resend request
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
111
Retransmission
Re-transmitter Options: sender, other receivers
How to retransmit Unicast, multicast, scoped multicast, retransmission group, …
Problem: retransmissions (aka repairs) may reach destinations that don’t require a retransmission A.k.a “exposure” problem Solution: subcast the re-transmission only to receivers that
need it.
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
112
R1
Why Subcast? Exposure problem…
S
R3 R4
R2
21
R1
S
R3 R4
R2
Packet 1 does not reach R1;Receiver 1 requests a resend
Packet 1 resent to all 4 receivers
1
1
Resend request Resent packet
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
113
Ideal Recovery Model
S
R3 R4
R2
2
1
S
R3 R4
R2
Packet 1 reaches R1 but is lost before reaching other Receivers
Only one receiver sends NACK to the nearest S or R with packet
Resend request
1 1Resent packet
Repair sent only to those that need packet
R1 R1
Tra
nsm
isió
n d
e D
ato
s M
ult
imedia
- M
ast
er
IC 2
00
7/2
00
8
114
Reliable Multicast Transport: Issues
Retransmission can make reliable multicast as inefficient as replicated unicast (N)ACK-implosion if all destinations ack at once “Crying baby”: a bad link affects entire group
Heterogeneity: receivers, links, group sizes Anonymous/Open/Dynamic Group Model:
Source does not know # of destinations, and destinations may vanish
Multicast applications do not need strong reliability of the type provided by TCP. Can tolerate some reordering, delay, etc