All rights reserved. Do not copy or redistribute in any form. Copyright © 2002-2005 Board of Trustees of the University of Illinois. Copyright © 2003 - 2005 Ohio Supercomputer Center. Copyright © 2002- 2005 University of Chicago. All Rights Reserved. Copyright © 2003 - 2005 Trustees of Boston University.
This tutorial is designed to assist an Access Grid (AG) Node Operator (Node Op) to evaluate multicast network problems. The following lessons will introduce tools and methods to determine if there is an AG network problem, to pinpoint the location of any network failure, and to give tips for solving any identified problems. This tutorial focuses only on the discovery and resolution of problems related to multicast networking and not on unicast networking problems, or local AG hardware or software failure. After this tutorial, you should have sufficient knowledge to detect Access Grid multicast problems and to communicate effectively with local networking staff on the resolution of these problems.
This tutorial is for technical individuals who plan to operate an AG Node or for persons responsible for monitoring the network. These lessons assume that users have at least a minimal knowledge of network operations.
1. How the AG uses multicast
2. Determining if there is a multicast problem
3. Diagnosing network problems using local source tools
4. Locating the network failure using network source tools
5. Gathering more data through network sources
6. Working around network problems
7. The future of multicast and the AG
Most of the material in this tutorial is relevant for users of any
version of the Toolkit. Whenever a section is applicable only for users
of a certain version, it will be indicated in the text and/or marked
with a version icon.
bold text indicates you should type the command
bold, italicized text indicates text that needs to
be replaced with your information
For information on WebCT, please see the "Using WebCT" link in the navigation bar to the left. This document contains navigational information, tips about style sheets and pointers for further assistance. For questions or comments about the tutorial content, please use the Discussion Space. This is available to all users and is continually monitored by the content providers. For additional information concerning the Access Grid, please see the "AG Support" link in the navigation bar to the left. Here you will find technical mailing lists and information on the AG community MOO.
To locate network failures on the Access Grid, it is important to understand how information is transmitted among AG Node sites. AG Node components use multicasting to send data streams across the network. Multicasting can be viewed as a one-to-many data stream. Multicasting allows a single output stream to branch off to multiple AG Node locations and to be sent only to the AG node locations requesting the service. Multicasting greatly reduces bandwidth needs and ensures that all AG parties receive identical information at almost the exact same time.
You can often detect network failures by monitoring the status and traffic of the multicast network. Failures can occur at any location along the path of the data stream. However, we find that most broadcast problems occurring during an AG session are not network related. The following lessons will help you determine if a problem that occurs during an AG session is related to the network.
Very infrequently,
network problems are unicast related. The bulk of non-AG network traffic is
unicast. This document does not cover unicast issues, but these clues may indicate
unicast network problems:
If a local machine cannot send or receive Internet signals, a unicast problem is likely and should be reported to the appropriate network personnel.
More information on troubleshooting unicast network problems can be found at: http://www.redhat.com
Routers are
not normally configured for multicast traffic. To send and receive multicast,
every router along the data stream must be multicast enabled.
You can have many different types of problems during an Access Grid session. Some things to look for that may indicate network problems are:
An audio echo is usually caused by a configuration problem with the Gentner and
does not normally indicate a network failure.
While most network problems will be obvious, sometimes you will not immediately realize there is a problem, such as with one-way connectivity. This is why it is important to continuously monitor the AG session using these local tools:


Check the status of the multicast beacon
(http://beaconserver.accessgrid.org:9999/beacon-php/beacon.php)
before any Access Grid event. This is a useful tool which may alert you to any
potential multicast problems.
If you notice a problem that appears to be multicast related, try to rule out other causes before contacting network personnel. This may assist you in solving the problem on your own or help network personnel find the problem. There are many different tools available to assist in diagnosing a network failure.
This lesson provides an overview of these tools.
Local Source Tools
| Tool | Description |
| --- | --- |
| Internet2 Detective | An application which allows you to test several properties of your network. |
| Moo or text chat | A text-based backchannel used for communication among persons staffing an AG event |
| rat | Software that lets multiple AG users have an audio conference over the Internet in multicast mode |
| vic | Software tool for videoconferencing |
| ping | A classic tool for measuring hop-to-hop latency and packet loss |
Internet2 Detective
Internet2 Detective is an application which allows you to check several properties of your network connection, including whether support for multicast is enabled. It is very easy to download and install for both Windows and Mac; the source code is also available, creating the possibility of installing it on other systems as well.

Detailed information about Internet2 Detective is available at http://detective.internet2.edu/.
Using text chat
Text chat can be a useful tool for debugging network problems. If you are in an AG session and are having difficulties, using text chat can help you determine whether the problems are exclusive to your site or are occurring at other sites as well.
Ask the other Node Ops if they are experiencing any problems. Questions to ask include:
Text chat programs are unicast based. An inability to connect to the text chat server
therefore points to a unicast network problem.
Talking to the other Node Ops will help you to determine if the problem is confined only to your site or widespread. Many Node Ops have an advanced knowledge of the Access Grid. They may also be able to assist you in locating any local AG problems.
For instructions on using text chat, please see the AGDP document, How to Use Text Chat for Access Grid.
rat (Robust Audio Tool)
Another important tool for diagnosing network problems is rat, which allows
multiple users to engage in an audio conference over the Internet in multicast
mode. This tool shows network information in a way similar to the multicast
beacon, except that it shows audio packet loss only between the local site and
the sites in the same Virtual Venue.
The rat software
is available for Linux, FreeBSD, Solaris, Irix and Windows 95/98/Me/NT/2000.
First, check that other sites participating in the AG event are visible in the rat window. If not, make sure the Virtual Venue is correct and that the IP address, port number and TTL (see below) are correct in the Audio Resource Manager (ARM).
TTL is an
acronym for Time-to-Live. TTL is a value carried in an Internet Protocol (IP)
packet that tells a network router whether or not the packet has been
in the network too long and should be discarded. For several reasons, packets
may not be delivered to their destination in a reasonable length of time. For
example, a combination of incorrect routing tables could cause a packet to loop
endlessly. A solution is to discard the packet after a certain time and send
a message to the packet's originator, who can decide whether to resend the packet.
Each router that receives a packet subtracts one from the count in the TTL field.
When the count reaches zero, the router detecting it discards the packet and
sends an Internet Control Message Protocol (ICMP) message back to the originating
host.
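The TTL countdown described above can be sketched in a few lines of Python. This is only an illustrative model: the router names (`r1`, `r2`, `r3`) and the starting TTL of 16 are made-up values, and no real packets are sent.

```python
from itertools import cycle, islice

def forward(ttl, routers):
    """Pass one packet through `routers` in order (hypothetical names).

    Returns (delivered, hops_taken, router_that_discarded_or_None).
    """
    hops = 0
    for router in routers:
        ttl -= 1       # each router subtracts one from the TTL field
        hops += 1
        if ttl <= 0:   # count reached zero: discard, ICMP message goes back
            return False, hops, router
    return True, hops, None

# A healthy three-router path: the packet arrives with TTL to spare.
healthy = forward(ttl=16, routers=["r1", "r2", "r3"])

# Bad routing tables make the packet bounce between two routers.
# Without a TTL it would loop endlessly; with one it dies after 16 hops.
looping = forward(ttl=16, routers=islice(cycle(["r1", "r2"]), 1000))

print(healthy)
print(looping)
```

The second call shows why the TTL matters for multicast sessions as well: a TTL that is set too low in the resource manager would cause packets to be discarded before they reach distant sites.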
The picture below shows the rat window with eleven participants in the Access Grid Lobby. Your own site will be shown at the top. Names are highlighted whenever anyone transmits audio. You can tell if you are losing packets from another participant by looking at the diamond to the left of the participant’s name. The left side of the diamond is the receiver and the right side is the sender. If the diamond is green (loss less than 5%) or orange (loss less than 10%), then the packet loss is small. A gray diamond indicates the site is not currently transmitting audio and a red diamond indicates a high packet loss percentage.
A red diamond
may indicate a network slowdown or failure. Even so, there may
be no noticeable audio degradation.

Packet loss information is also available in a chart format. Clicking on the icon shown above will open the rat matrix which displays the reception quality reported by all participants.
vic (Video Interface Control)
Another useful software tool is vic. It links multiple sites with multiple simultaneous
video streams over a multicast infrastructure. It is helpful to start up vic
to see if video input is being received from all the other AG sites. Rectangular
disturbances in the video transmission, the appearance of moving objects at
multiple locations, and/or a loss of video are all indications of packet loss.
Extreme packet loss can also lead to a completely distorted or frozen picture.
You can also use vic to chart packet loss and frame rate. This information is
available next to each small video window in the main display panel.

Another feature of vic is the ability to show site information. Clicking on the info button brings up a menu which allows you to view RTP error statistics, decoder information and the version of vic the site is running. You may also run an mtrace to and from this site if it is supported by your system.

Real-time Transport Protocol (RTP) information is useful because it allows you to provide feedback to other Node Ops about reception quality. RTP is the standard Internet protocol used to transport real-time data. When AG multicast audio and video are sent out over the Internet, RTP includes sequence numbers and timestamps with the data packets. This information allows the receiver to reconstruct the sender's packet sequence, even if the packets are not received in the proper order. RTCP, the RTP control protocol, monitors the delivery of the data and conveys information about the participants in an ongoing session. To view the RTP error statistics in vic, click on info and then on RTP Stats.
If the RTP Statistics window reveals significant packet loss and/or latency during an AG session, you should take note of the statistics. Then you will be able to relay this information to the sender and/or your local system support personnel.
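To make the role of sequence numbers concrete, here is a minimal Python sketch of how a receiver could restore packet order and estimate loss from the sequence numbers it has seen. It is not how vic or rat are implemented; the function name and the sample numbers are hypothetical, and the 16-bit wraparound of real RTP sequence numbers is deliberately ignored.

```python
def summarize(received_seq):
    """Reorder received sequence numbers and estimate packet loss.

    Simplification (assumption): sequence numbers never wrap around.
    Returns (ordered_numbers, packets_lost, loss_percent).
    """
    ordered = sorted(set(received_seq))           # drop duplicates, restore order
    expected = ordered[-1] - ordered[0] + 1       # how many should have arrived
    lost = expected - len(ordered)                # gaps in the sequence
    return ordered, lost, 100.0 * lost / expected

# Packets 102/103 and 105/106 arrived swapped; packet 104 never arrived.
ordered, lost, percent = summarize([100, 101, 103, 102, 106, 105])
print(ordered, lost, round(percent, 1))
```

A loss figure of this kind is what you would note down from the RTP Statistics window and pass on to the sender or to your support personnel.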
If video
from the other AG sites is jerky or erratic, it may not be a network problem.
Dealing with too many video streams or opening too many windows can also cause
video problems. Try muting some of the video streams. If the loss rate drops,
you are probably experiencing a performance bottleneck on your video capture machine.
A blue video
window is usually the result of an unplugged or non-functioning camera.
The vic software
is available for Linux, FreeBSD, Solaris, Irix and Windows 95/98/Me/NT/2000.
ping
A "classic unicast tool for measuring hop-to-hop latency and packet loss" is ping (http://dast.nlanr.net/NPMT/).
The ping program sends an echo request to a remote host and waits for that host
to send a reply back to the original source. Node Ops can use the ping tool to determine
if their local router is responding to unicast requests. Bring up a terminal
window in Linux or a command prompt in Windows and type:
ping <router IP address>
Results should look similar to the following:
Pinging 206.21.19.7 with 32 bytes of data:
Reply from 206.21.19.7: bytes=32 time=2ms TTL=60
Reply from 206.21.19.7: bytes=32 time=2ms TTL=60
Reply from 206.21.19.7: bytes=32 time=2ms TTL=60
Reply from 206.21.19.7: bytes=32 time=2ms TTL=60
Ping statistics for 206.21.19.7:
Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 2ms, Maximum = 2ms, Average = 2ms
On a Linux machine
it may be necessary to type Ctrl-C to stop the ping session.
If a reply is received without any packet loss, this tells you the local router is alive and responding to unicast requests. If no reply is received or there is significant packet loss, contact your local network support person because you may have a router problem. Keep in mind that ping cannot rule out multicast problems on a router. If you still suspect a multicast network problem, use network source tools in the following lesson to help you narrow down the cause.
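If you want to log ping results over the course of a session, the statistics line can be pulled out of the output with a short script. The following is a rough sketch that assumes the Windows-style summary format shown above; Linux ping prints a differently worded summary, so the pattern would need adjusting there. The function name is hypothetical.

```python
import re

# Sample text in the Windows-style format shown in this lesson.
SAMPLE = """\
Ping statistics for 206.21.19.7:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
"""

def ping_loss_percent(output):
    """Return packet loss in percent, or None if no statistics are found."""
    match = re.search(r"Sent = (\d+), Received = (\d+)", output)
    if match is None:
        return None
    sent, received = int(match.group(1)), int(match.group(2))
    if sent == 0:
        return None
    return 100.0 * (sent - received) / sent

print(ping_loss_percent(SAMPLE))
```

As noted above, a clean result here only tells you about unicast reachability of the router; it says nothing about multicast.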
In order to use the Mbone effectively, you need good diagnostic and debugging tools. The following tools can assist you in narrowing down the point of network failure.
Network Source Tools
| Tool | Description |
| --- | --- |
| Multicast Beacon | Monitoring system software to help diagnose network problems during broadcast conferences |
| rtpmon | Software that monitors control information exchanged between applications that implement Real-time Transport Protocol (RTP) |
| mroute | Software tool used to query multicast routers |
Arguably, the most important tool for diagnosing network problems is the Multicast Beacon (http://beaconserver.accessgrid.org:9999). The Multicast Beacon provides measurement data for the current multicast traffic on the network. Every AG site should have a beacon client running at all times. These beacon clients continuously send packets to each other through a multicast session and measure the performance of the transmission. These clients also send reports to a central beacon server and provide information on which other beacons they can see. The reporting interval of the beacon clients is about ten seconds and the server updates occur about every 120 seconds. Because of this, the beacon is good for a long-term view of the multicast network, but not for short-term anomalies. However, the beacon can provide information on packet loss between all the sites in the current Access Grid session.
![]()
To check the status of the beacon, use your browser to go to http://beaconserver.accessgrid.org:9999/beacon-php/beacon.php. Choose the beacon sites you would like to see and click Submit. Don't forget to include your own site's beacon. The grid above is a sample of the information obtained by the beacon. The S labels in the top row show Sender information and the R labels in the first column show Receiver information. To decipher the results, find your local site in the sender row and then follow it down the column to see if any other sites are experiencing packet loss coming from your site. Next, find your local site in the receiver column and follow it across the row to see if you are experiencing packet loss coming from any other sites. Green blocks indicate packet loss of 10% or less and that the network is functioning properly between the two sites. Yellow blocks indicate packet loss of 30% or less and gray blocks indicate that no data is available. Red blocks show packet loss of 100% between the two sites. If there are many red blocks down the column, this is indicative of a multicast problem. When contacting network personnel, it may be useful to point them to the beacon site.
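The way the grid is read can be illustrated with a small Python sketch. The site names and loss figures below are invented, and the color thresholds are only the ones stated above; the text does not say how loss between 30% and 100% is displayed, so the sketch labels that range "unspecified" rather than guessing.

```python
def color(loss):
    """Map a loss percentage to the block color described in the text."""
    if loss is None:
        return "gray"         # no data available
    if loss <= 10:
        return "green"
    if loss <= 30:
        return "yellow"
    if loss >= 100:
        return "red"
    return "unspecified"      # range not described in this tutorial

SITES = ["local", "siteA", "siteB"]   # hypothetical beacon names

# LOSS[receiver][sender] = percent of packets lost (made-up sample data)
LOSS = {
    "local": {"siteA": 2, "siteB": 100},
    "siteA": {"local": 0, "siteB": 15},
    "siteB": {"local": None, "siteA": 5},
}

def column(sender):
    """Follow the sender's column down: how well do others receive it?"""
    return {r: color(LOSS[r].get(sender)) for r in SITES if r != sender}

def row(receiver):
    """Follow the receiver's row across: how well does it receive others?"""
    return {s: color(LOSS[receiver].get(s)) for s in SITES if s != receiver}

print("others receiving local:", column("local"))
print("local receiving others:", row("local"))
```

In this sample, the local site is received well by siteA, has no data from siteB, and receives nothing at all from siteB, which is the kind of one-way pattern worth reporting.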
Instructions
for setting up a beacon client are available at: http://dast.nlanr.net/Projects/Beacon/guide_beacon.html
Another useful tool is rtpmon. This free, third-party software for Linux monitors control information exchanged between applications that implement Real-time Transport Protocol (RTP), such as the Access Grid. The primary purpose of this control information is to provide feedback to other Node Ops about reception quality. Feedback from receivers, including loss rate and jitter, is presented in a graphical user interface that can be sorted in various ways to help isolate and diagnose multicast distribution problems. Using this program, you can monitor the session and recognize and diagnose problems with the multicast distribution encountered by individual receivers.
An RTP session consists of two channels, one for data packets and one for RTP control packets (RTCP). In an AG session there may be several participants transmitting data and there may be many more participants who simply listen. However, all participants periodically transmit RTCP packets on the control channel. Each RTCP packet contains statistics regarding a receiver's packet loss and delay jitter. RTCP packets also contain session description items which carry information about a site, such as its real name, e-mail address and the application it is using.
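This tutorial does not describe how the delay jitter figure is calculated. For orientation, the sketch below uses the interarrival jitter estimator defined in the RTP specification (RFC 3550), on the assumption that the AG tools follow it. The timestamps are invented and expressed in arbitrary but identical units.

```python
def interarrival_jitter(packets):
    """Running jitter estimate as defined in RFC 3550.

    `packets` is a list of (send_timestamp, arrival_time) pairs.
    For consecutive packets, D is the change in transit time, and
    the estimate moves 1/16 of the way toward |D| on each packet.
    """
    jitter = 0.0
    previous = None
    for sent, arrived in packets:
        if previous is not None:
            d = (arrived - previous[1]) - (sent - previous[0])
            jitter += (abs(d) - jitter) / 16.0
        previous = (sent, arrived)
    return jitter

# Constant transit time: no jitter.
steady = interarrival_jitter([(0, 10), (20, 30), (40, 50)])

# The second packet is delayed by 16 units, the third is on time again.
uneven = interarrival_jitter([(0, 10), (20, 46), (40, 50)])

print(steady, uneven)
```

The 1/16 smoothing factor means that a single late packet moves the reported jitter only slightly, which is why sustained jitter in rtpmon is more meaningful than a brief spike.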
To use rtpmon, open a terminal window and type:
% rtpmon <IP address/port>
Obtain
the IP address and port number from the audio or video resource manager window.
The main rtpmon window will open with the IP address and port number at the top. Click on Menu to open the display parameter window. From here, you can change the settings to display packet loss or delay jitter. You can sort results by maximum loss, average loss, IP address, or sender.

Using rtpmon you can also display a brief history of the statistics from a sender-receiver pair. Clicking on a data element in the main table brings up a window (shown in the lower left of the image above) with data stripcharts for each of the statistics values that rtpmon tracks. The stripchart display shows a barchart of the recent history for each parameter. A separate stripchart window must be created for each sender-receiver pair, but within each stripchart window, both jitter and loss statistics are displayed.
A
free version of rtpmon can be downloaded from: ftp://mm-ftp.cs.berkeley.edu/pub/rtpmon
Multicast routers maintain information about the state of incoming and outgoing interfaces for each source-group (S,G) pair. This information is used by the router to decide which packets are to be discarded and which are to be forwarded. The table that the router maintains for holding this state information is called a multicast routing table. Each entry in this table corresponds to a unique (S,G) pair and is referred to as mroute. Each mroute contains four types of entries:
For example: Consider a router that has three interfaces: Interface-A, Interface-B and Interface-C. Now, consider a scenario where this router has an mroute entry for (S1, G1) (i.e. source, S1 and group, G1). Interface-A is listed as an incoming interface and Interface-C is listed as an outgoing interface. With this entry in its multicast routing table, if the router receives a packet for (S1, G1) from Interface-A it will forward it to Interface-C. However, if a packet is received from any other interface, it will be discarded.
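The forwarding decision in this example can be written out as a short Python sketch. The table layout and function name are hypothetical and greatly simplified; in particular, a packet with no matching entry is simply dropped here, which the text above does not cover.

```python
# One mroute entry per (source, group) pair, mirroring the example above.
MROUTES = {
    ("S1", "G1"): {"incoming": "Interface-A", "outgoing": ["Interface-C"]},
}

def interfaces_to_forward(source, group, arrived_on):
    """Return the interfaces a packet is forwarded to (empty = discarded)."""
    entry = MROUTES.get((source, group))
    if entry is None:
        return []                      # simplification: no state, drop
    if arrived_on != entry["incoming"]:
        return []                      # wrong incoming interface: discard
    return list(entry["outgoing"])

print(interfaces_to_forward("S1", "G1", "Interface-A"))
print(interfaces_to_forward("S1", "G1", "Interface-B"))
```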
Results include the version number of the queried router along with a list of the neighboring multicast routers. Metrics, thresholds, and flags settings are also available. If mroute returns with no neighboring multicast routers, this indicates a router configuration problem and these results should be forwarded to network support.
Troubleshooting
router problems can be difficult because you may need access to routers at multiple
sites as well as network engineers who are not immediately available.
A free version of mroute
for Unix-based operating systems can be downloaded from: ftp://ftp.parc.xerox.com/pub/net-research/ipmulti.
The following tools may also be useful in tracking down network problems:
A public looking glass is a way to run mtrace (multicast traceroute) on another machine in order to get a two-way routing picture. Mtrace is a multicast version of traceroute, which is a software program (available natively on Windows and Unix-based machines) that determines the route through which data passes on its way to a remote host. The TRACERT command sends a series of ICMP Echo requests, beginning with the TTL (Time To Live) set to 1 hop and repeated three times. The TTL is then incremented by one for each further set of requests until the destination is reached.
In the image below you see the results of a traceroute from OSC to Purdue University (128.210.182.12). Shown below are the response times for each of the three ICMP Echo requests for each of the TTL settings (in this case TTL 1 through TTL 11). The IP address and DNS name resolution, if available, are also provided.

TRACERT
can be used to determine the route to any remote IP address and not just to multicast
addresses.
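The way a route is discovered by raising the TTL one step at a time can be simulated without touching the network. In this sketch the router and host names are invented, and only one probe is sent per TTL value rather than three.

```python
def probe(ttl, routers, destination):
    """Return the name of whoever answers a probe sent with this TTL."""
    if ttl <= len(routers):
        return routers[ttl - 1]    # TTL runs out here; this router replies
    return destination             # TTL was large enough to arrive

def trace(routers, destination, max_ttl=30):
    """Raise the TTL by one per probe until the destination answers."""
    hops = []
    for ttl in range(1, max_ttl + 1):
        responder = probe(ttl, routers, destination)
        hops.append((ttl, responder))
        if responder == destination:
            break
    return hops

# Hypothetical path of three routers between the local site and the target.
route = trace(["gw.local", "core.example", "border.example"], "host.remote")
for ttl, name in route:
    print(ttl, name)
```

Each line of real TRACERT output corresponds to one TTL value in this loop, which is why a trace that stops partway identifies the last router that was still responding.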
Run from a looking glass site, mtrace can be used in combination with tracert from the local site as an effective way to debug Mbone routing problems. It compares the two different data routes and looks for breakdowns in the path. The image below shows the route from the Mae-East looking glass site to a multicast IP at Purdue University.

An mtrace query works in a similar way to the tracert command, but mtrace provides some additional information, such as loss rates along the links and the number of multicast packets flowing across each hop per second for that particular address. When troubleshooting, use the mtrace and tracert commands to find where multicast traffic flow stops, to verify the path of multicast traffic, and to identify sub-optimal paths. In short, public looking glasses are useful in pinpointing a malfunctioning router because they allow you to run a traceroute on a machine other than your own.
Public Looking Glass Sites:
A free version
of mtrace for installation on a local machine can be downloaded from:
ftp://ftp.parc.xerox.com/pub/net-research/ipmulti/.
Windows 2000 includes the mrinfo command that displays the configuration of a multicast router. This tool is similar to mroute for Linux. You can use the configuration information to help troubleshoot multicast forwarding and routing problems. It is useful for verifying multicast neighbors, confirming that bi-directional neighbor adjacency exists, and verifying that tunnels are up in both directions.
The mrinfo command queries a specified multicast router with an Internet Group Management Protocol (IGMP) message. The response to the query contains:
The
Internet Group Management Protocol (IGMP) is an Internet protocol that provides
a way for an Internet computer to report its multicast group membership to adjacent
routers.
To use mrinfo, open a command prompt window and type:
C:\> mrinfo <router IP address>
The syntax of the mrinfo command is:
mrinfo [-n] [ -i address ] [ -r retry_count ] [ -t timeout_count ] multicast_router

In
the above example, mrinfo is run against the multicast router at 192.88.194.129.
The first line shows the multicast router configuration including version number
and flags (prune, mtrace and snmp supported).
Each additional line displays the interfaces on the multicast router and the neighbors on each interface. Interfaces 192.88.194.129, 192.153.40.1, 192.148.240.1 and 192.138.88.1 have no neighbors. Interfaces 192.88.194.6 and 192.148.249.1 each have one neighbor. For each line, mrinfo displays the interface, neighbor, the domain name for the neighbor, the multicast routing metric, the TTL threshold, and flags indicating its role on the network.
mrinfo
will also run on Linux.
Once you determine there is a multicast problem and you have collected the information from our previous lessons, the next step is to contact your local network support. Tell the support person your exact problem, such as inability to receive multicast. Make sure to give them any router information you obtained from using the tools in this tutorial. Depending on the skill level of the support person, they may ask for additional information such as a multicast IP address to use for testing and the IP addresses of your local AG machines. It may also be helpful to give network support the web address of the multicast beacon.

A multicast
IP can be obtained from the vrm or arm window.
It is important to remember that the support person who answers the phone may not be able to troubleshoot multicast problems. It is also possible that your network support person will ask you to leave the AG session running so they can track the problem. This causes yet another issue because this type of intrusive diagnostic may cause interruptions at your site or even prohibit your site from joining the session by some other means. It may be necessary to put the diagnosis of the multicast problem on hold until the session is complete. If the problem cannot be immediately fixed, there are some temporary solutions for rejoining the session. We recommend the AGDP document Guide to Network Bridging on the Access Grid, which provides information about both providing and using bridges, which allow you to participate in AG sessions using unicast.
Give your network
support personnel a tour of your AG node. Some of them may have never seen an
AG setup. A quick tour will give you a chance to get to know them and give them
a chance to see how all your equipment is configured. Good communication with
network support will help you solve multicast problems more efficiently.