Developing the design of the Etherchannel switch for the enhancement of the Quality of Service (QoS) performance

Quality of Service (QoS) mechanisms provide the necessary level of service (bandwidth and delay) to an application in order to maintain an expected quality level. This paper studies the effect of adopting QoS on the performance of real time systems such as video conferencing. A simulation model of the real time network is built using the OPNET package. The various parameters affecting the system performance are determined and different solutions to enhance the system performance are suggested. A modified switch architecture is proposed to enhance the real time performance of the system and to improve its quality of service capability. The modification includes adding an EtherChannel unit which can classify data into real time or non real time data and direct each data packet to the appropriate channel. The architecture of the EtherChannel unit is described in VHDL and built on an FPGA chip. The modified switch is found to need only seven extra clock pulses to classify each data packet.


Quality of Service (QoS) -An Introduction
Quality of Service (QoS) means the ability to classify different traffic streams and to qualify the performance of each traffic stream across a network. The role of QoS becomes obvious when a heavily loaded network carries various types of traffic for different users. In such cases, QoS can offer better service when necessary [1]. Although many network protocols have been providing QoS solutions for a number of years, the relatively low cost of the Ethernet infrastructure has made QoS solutions a more popular edge technology. Consequently, over 85% of Ethernet LANs worldwide have been driven to expand toward QoS-capable networks. Ethernet performance can be improved by increasing its bandwidth, and this was previously enough for most applications. Now many applications, such as video and voice, increase their demand for bandwidth faster than the supply of bandwidth increases, so such applications become more or less QoS dependent. Therefore, QoS becomes a better solution and is introduced instead of altering the infrastructure of many networks worldwide [2]. Instead of paying for the cost of extra bandwidth, the network administrator must take care of the quality of service (QoS), so that traffic management becomes more robust and becomes the key to the network's quality of service.

Traffic Characteristics of Network Protocols
The most commonly used network protocols today are TCP (Transmission Control Protocol) and UDP (User Datagram Protocol). According to the characteristics of these two protocols, the network traffic flows (packets that are associated with the same application and session) can be analyzed so that their performance can be improved according to their influence on the QoS requirements [3].
TCP has proven to be a rugged, durable protocol, in continuous use since 1974. Almost all widely implemented Internet applications use TCP as the network protocol. However, TCP has a lot of protocol and data overhead. In contrast, UDP is a much simpler protocol. However, several key features of TCP network traffic management are lost. The benefits of TCP are obvious if the application needs a reliable, connection-oriented way of communicating. Well-known examples of protocols using TCP are HTTP (the WWW protocol), FTP (File Transfer Protocol), and Telnet (Terminal Protocol) [3][4][5].
Since a TCP flow natively adapts to the available bandwidth, several TCP flows may be combined on the same network connection without special precautions. The TCP back-off mechanism effectively enables the different flows to share the available bandwidth over time. However, although bandwidth distribution will be fair over long periods of time, it is likely that there would be large variations between the currently most aggressive and least aggressive flow in any given short (milliseconds) time frame. This unfairness is due to variations in round-trip time (RTT) between different flows; a flow with a shorter RTT can recover faster from slow-start and so will get a larger share of the available bandwidth than flows with longer RTTs. Therefore, combined TCP flows may still benefit from QoS handling.
The benefit of UDP is that data loss does not trigger time-outs or retransmission. This makes UDP the preferred protocol for delay and jitter sensitive applications, such as voice and video transmission, in which it is better to have a short period of silence or a momentarily blank picture than to have big fluctuations in samples or frames per second. Techniques that prevent and lessen these effects exist for TCP, but UDP has proven to be more reliable for real-time applications [6].
This paper deals with a situation in which TCP and UDP traffic compete for the network resources. Different video conferencing schemes are assumed to be the real time traffic, which is affected by various TCP traffic intensities. Quality of Service is implemented using different techniques and, accordingly, a new switch architecture is suggested.

QoS in Switches and Routers
Usually quality of service is executed in the switches and routers directing traffic through the network infrastructure. In a switch or a router there are four distinct metrics that determine QoS: Latency (the delay a flow experiences when passing through a device), jitter (latency variations), bandwidth distribution and availability (throughput or goodput) and loss probability [7].
Control over latency and jitter is important for services like voice and video that are increasingly being offered over different types of networks and the Internet. Low latency is important for interactive services, such as a video conference or a simple telephone call (for example, the human ear will pick up latencies greater than 300 ms). Latency is not as important for non-interactive services such as streaming video and listening to a radio broadcast transmitted over the Internet. Here, jitter (latency variation) is more important since the receiver needs to buffer information to compensate for latency variations.
Latency is the most important metric for videoconferencing applications. There are basically four origins of latency in a network [8]:
1. Packet transmission time = (Packet length / Data rate).
2. Propagation delay = (Cable length / Wave speed on that cable). Propagation delay in a LAN is relatively small and could be neglected.
3. Switch delay = (electronic circuits delay + queuing delay).
4. Delay in the TCP/IP stack inside the transmitting and receiving nodes.
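The first two latency components listed above can be evaluated with a minimal Python sketch. The packet size (1500 bytes), the cable length (100 m) and the wave speed (2e8 m/s) are illustrative assumptions and are not values taken from the simulation model; the function names are hypothetical.

```python
def transmission_time(packet_bytes, rate_bps):
    """Packet transmission time = packet length / data rate (seconds)."""
    return packet_bytes * 8 / rate_bps


def propagation_delay(cable_m, wave_speed_mps):
    """Propagation delay = cable length / wave speed on that cable (seconds)."""
    return cable_m / wave_speed_mps


def total_latency(packet_bytes, rate_bps, cable_m, wave_speed_mps,
                  switch_delay_s=0.0, stack_delay_s=0.0):
    """Sum of the four latency origins; the last two are supplied by the caller."""
    return (transmission_time(packet_bytes, rate_bps)
            + propagation_delay(cable_m, wave_speed_mps)
            + switch_delay_s
            + stack_delay_s)


# Assumed example values: 1500-byte packet on 100 Mbps Fast Ethernet,
# 100 m of cable, signal speed of 2e8 m/s.
tx = transmission_time(1500, 100e6)
prop = propagation_delay(100, 2e8)
print(f"transmission: {tx * 1e6:.1f} us, propagation: {prop * 1e6:.2f} us")
```

Under these assumed values the propagation delay is more than two orders of magnitude smaller than the transmission time, which is in line with neglecting it in a LAN.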

Enhancing QoS Capability by Modifying Etherchannel Technology
EtherChannel technology builds upon standards-based 802.3 full-duplex Fast Ethernet to provide network managers a reliable, high-speed solution for the network backbone. EtherChannel technology can offer bandwidth scalability with full-duplex increments of 200 Mbps to 8 Gbps [9].

The switch (which supports EtherChannel) distributes frames across the ports in an EtherChannel according to the source and destination Media Access Control (MAC) addresses. The operation that determines which link in an EtherChannel is used is very simple. A connection across an EtherChannel is determined by the source-destination address pair. The switch performs an XOR operation on the last two bits of the source MAC address and the destination MAC address. This operation yields one of four possible results: (0 0), (0 1), (1 0), or (1 1). Each of these values points to a link in the EtherChannel bundle. Also, various load balancing techniques are used to guarantee fair distribution of traffic between the channels. When the load on a channel exceeds (1%) of its capacity, it is directed to other less loaded channels [9,10].
EtherChannel technology provides many benefits such as high bandwidth, load sharing and redundancy. This technology provides load balancing and management of each link by distributing traffic across the multiple links in the channel. Unicast, multicast, and broadcast traffic is distributed across the links in the channel. In addition, EtherChannel technology provides redundancy in the event of link failure. If a link is cut in an EtherChannel, traffic is rerouted to one of the other links in less than a few milliseconds, and the convergence is transparent to the user [9,10].
In this paper, EtherChannel is used to enhance the QoS performance of switched Ethernet and to solve the network delay problem caused by network congestion. EtherChannel was assumed to be installed between the ports that connect the network switches together (refer to figure 1).
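The link selection rule described above can be sketched in a few lines of Python. The function name `select_link` and the representation of MAC addresses as 12-digit hexadecimal strings are assumptions made for illustration only.

```python
def select_link(src_mac: str, dst_mac: str) -> int:
    """Return the EtherChannel link index (0..3) for an address pair.

    The index is the XOR of the last two bits of the source and
    destination MAC addresses, given here as hexadecimal strings.
    """
    src = int(src_mac, 16)
    dst = int(dst_mac, 16)
    return (src ^ dst) & 0b11


# Both addresses end in 0x12 (low bits 1 0), so the XOR gives (0 0): link 0.
print(select_link("1A2867D01212", "121212121212"))
# Low bits (0 1) XOR (1 0) give (1 1): link 3.
print(select_link("000000000001", "000000000002"))
```

Because the result depends only on the address pair, all frames of one connection follow the same link, which keeps them in order.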
Nowadays, switches that can handle QoS do so in software. In this paper, a hardware modification to the switch architecture is developed in order to handle QoS in hardware. The main idea behind this design is to separate real time data from non real time data. Figure (1) shows the architecture of the proposed switch. As shown, the modification is performed on the EtherChannel controller unit. The operation of the new switch can be described as follows:
1. A new unit called "packets classification and forwarding unit" is added to the EtherChannel unit. Its duty is to classify and forward the packets to one of the switch ports, as shown in figure (1). Packet classification is crucial for QoS because it enables the switch or router to differentiate the traffic streams and treat them differently depending on their individual requirements. Classification can be divided into two parts: data extraction, where the relevant fields are extracted from the packet header, and data comparison, where the extracted fields are compared to predefined data.
2. When a packet arrives at the switch, its "protocol type" field (in the layer 3 header, as shown in figure (2)) is checked by the packets classification and forwarding unit, which then forwards the packet to one of the ports. This arrangement guarantees full isolation between the two traffic types.
3. Each EtherChannel is reserved for one of the traffic types, i.e., the outgoing packets are directly forwarded (by the EtherChannel controller) to one of these channels.

The simulation model shown in figure (3) consists of a Fast Ethernet LAN connecting different videoconferencing clients arranged as groups. The video format is (160x125 pixels, 8 bit colour intensity, 30 frames/sec.). Also, the model has two TCP traffic generators exchanging full duplex TCP traffic at different rates. On the other side, there are four servers for the video conferencing groups. Switches sw1 and sw2 connect the client groups and the server groups along with the TCP generators.
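To give a feel for the load that this video format places on the network, the following Python sketch computes the raw bit rate of one stream. It assumes uncompressed frames, which the text does not state, so the figure should be read as an upper bound per stream; the function name is hypothetical.

```python
def raw_video_bps(width, height, bits_per_pixel, frames_per_sec):
    """Raw (uncompressed) video bit rate in bits per second."""
    return width * height * bits_per_pixel * frames_per_sec


# Video format quoted in the text: 160x125 pixels, 8 bit, 30 frames/sec.
per_stream = raw_video_bps(160, 125, 8, 30)
print(f"{per_stream / 1e6:.1f} Mbps per stream")

# Assumption: one such stream per client, all crossing the same link.
for clients in (50, 100, 150):
    print(clients, "clients:", clients * per_stream / 1e6, "Mbps")
```

Under this assumption a single stream needs 4.8 Mbps, so a few tens of streams already approach the capacity of a 100 Mbps Fast Ethernet link.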

Figure(3): The Simulation Model
The first investigation of the system performance (videoconferencing latency) varies the number of clients in each group and, hence, the total number of users (with a certain TCP traffic contribution). As a result of the simulation, figure (4) shows that an unmanaged increase in the number of users could heavily affect the latency because of the bandwidth congestion problem.

Figure(4): Latency Variation vs. Number of Users
In order to investigate the effect of the TCP traffic on the system performance, the number of clients is fixed at (50, 100, 150). In the beginning, the TCP traffic generators were connected to switch (SW1). The two nodes exchange TCP data with a total rate of 100 Mbps on the line. It is noticed that the variation in latency, when measured for different numbers of clients in the network, is no more than (1%). This small increase confirms the effectiveness of the non blocking property of the switch. This property allows any two ports to exchange data without disturbing or blocking other ports on the switch. Returning to figure (3), the load between the TCP traffic generators is varied from (0 to 100 Mbps) and the maximum latency is measured. Figure (5) indicates that the network delay becomes significant when the offered load exceeds (25 Mbps), and the network becomes more congested when the load exceeds (40 Mbps). The main reason behind the increase in the network delay, and hence the latency, is the competition between different traffic types. When packets move from one switch to another, they pass through the ports that connect the two switches. When the load increases, more packets try to pass through these ports and generate more queuing delay inside the switch (the bottleneck point).
When the offered load becomes very high, the queued packets inside the switches occupy most of their memory, which forces the switch's controller to take one (or both) of the following two actions [12]: 1. Removing the packets that have spent a specific time inside the switch's memory (aged packets), according to the FIFO principle.

Solving Network Delay Problem:
The network delay problem can be solved either by limiting the non real-time traffic (TCP traffic) offered to the network, or by increasing the channel bandwidth between the switches. The first solution was achieved by limiting the network connection of the TCP traffic generator nodes to (10 Mbps). The other solution uses either EtherChannel technology or 10 Gigabit Ethernet to increase the bandwidth between the switches. These solutions were tested in the OPNET environment, assuming the presence of (1000 Mbps) TCP traffic and 100 clients (worst possible case).

Limiting Non Real-Time Traffic:
This solution is the easiest and cheapest option. The ports to which the TCP traffic nodes are connected could be reconfigured to work at a speed of 10 Mbps. Accordingly, a higher bit rate can be assigned to the real time traffic contribution, and the network delay is greatly reduced. The simulation results (refer to figure (5)) show very stable network behaviour in the presence of the (10 Mbps) traffic, and the latencies keep their values without any change. However, this limitation could negatively affect the applications that rely on the TCP/IP protocol.

Using 10 Gigabit Ethernet Technology to Increase Channel Bandwidth:
10 Gigabit Ethernet is a new technology used to enhance channel throughput in many fields. In this paper, it is used to connect the network switches together, which increases the bandwidth between them to 10 Gbps. This arrangement allows both real time data and non real time data to coexist on the same network without disturbing each other. The simulation of the model in figure (3) gives a latency value of (0.0057 Sec.). However, 10 Gigabit Ethernet represents the highest cost option [9].

Using the Modified EtherChannel Technology to Increase Channel Bandwidth
In the OPNET environment, the number of EtherChannels and their speed, i.e., the channel bandwidth, were varied. The goal is to find the amount of channel bandwidth necessary to handle the load offered to the network without affecting its latency values. Table (1) lists the various possibilities of channels and their corresponding latency values. It is obvious that the (4 Channels, 1 Gbps) option gives the best latency results. However, the network delay problem could not be completely eliminated, because the system fails to meet the real time deadline of (300 mSec.).

Table (1): Maximum latency for various possibilities of channels

In spite of the relative advantage of using the EtherChannel technique, simulation results show that the average utilization of each EtherChannel in the case of (4 Channels, 1 Gbps) does not exceed (26%). It is obvious that the offered load consumes only a small portion of the channel bandwidth, which can be considered a waste of bandwidth and an unnecessary additional cost.
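The amount of bandwidth left idle in the (4 Channels, 1 Gbps) case can be estimated with a short Python sketch. It treats the quoted 26% as the utilization of every channel, which is an upper-bound assumption since the text only states that utilization does not exceed this value; the function name is hypothetical.

```python
def carried_load_mbps(channels, speed_mbps, utilization_percent):
    """Load carried by a bundle in which every channel runs at the given utilization."""
    return channels * speed_mbps * utilization_percent / 100


aggregate = 4 * 1000                        # 4 channels of 1 Gbps each, in Mbps
carried = carried_load_mbps(4, 1000, 26)    # at most 26% utilization per channel
idle = aggregate - carried

print(f"carried: {carried} Mbps, idle: {idle} Mbps of {aggregate} Mbps")
```

Even under this upper-bound assumption, close to three quarters of the bundle capacity remains unused.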
The characteristics of the proposed architecture were described and added to the OPNET environment. The speed of the two EtherChannels (channel 1 and channel 2 of figure 1) was chosen to be (1000 Mbps). The results obtained from running the simulation show that the proposed switch architecture is able to fully isolate the two traffic types (inside and outside the switch). As shown in figure (7), the latency curve for the modified EtherChannel is similar to that in figure (4) for 0% TCP load when using the traditional EtherChannel. This confirms the full isolation of the two traffic types. This arrangement removes the negative effect of Ethernet delay on latency. In addition, the use of two (1000 Mbps) EtherChannels keeps the cost at the lowest possible value. Finally, it is possible to say that using the modified EtherChannel technology (separated traffic types + separated transmission channels) greatly enhances the Quality of Service capability of the system.
In order to investigate the effects of the proposed hardware changes on the switch's architecture, a VHDL program is built to describe the function of the modified EtherChannel unit mentioned above. Then, the design was implemented in an FPGA using a Spartan 3 evaluation kit [13]. In this paper, a 32 bit bus architecture was chosen as the architecture of the proposed switch, due to its widespread use [14]; see figure (1). Figure (8) shows the logical and pin diagram of the proposed EtherChannel unit, while table (2) lists the functional description of these pins. A packet classification unit is added to the traditional architecture of the switch. This unit classifies the incoming streams of data into TCP and UDP packets according to their headers in the second and third layers of the TCP/IP network model. The header formats of these layers were shown earlier in figure (2).
The packet classification unit mainly consists of two parts: a data extraction part, where the relevant fields are extracted from the packet header, and a data comparison part, where the extracted data fields are compared to predefined data.

Figure (8): The logical and pin diagram of the proposed Etherchannel unit
A static part of the packet may be picked and sent to the comparator. Although this method of data extraction is simple, it has some drawbacks. For example, it cannot detect the presence of a VLAN tag.
Dynamic packet decoding is a more flexible method of extracting the header information. The dynamic packet decoder is based on the type of protocol; it is therefore protocol-aware and able to pick out the relevant fields of a packet wherever they may be within the packet. This arrangement was achieved using a programmable offset technique, in which an offset value is added to cope with the different situations (see figure 8).
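The difference between static extraction and extraction with a programmable offset can be illustrated with a minimal Python sketch. It assumes an IPv4 packet in an Ethernet II frame, with an optional 4-byte IEEE 802.1Q VLAN tag; the helper names and the way the offset is chosen (`vlan_extra_offset`) are hypothetical and do not describe the VHDL design.

```python
ETH_HEADER_LEN = 14        # destination MAC + source MAC + EtherType
IPV4_PROTOCOL_OFFSET = 9   # position of the protocol field in the IPv4 header
VLAN_TPID = 0x8100         # EtherType value that marks an 802.1Q tag
VLAN_TAG_LEN = 4


def extract_protocol(frame: bytes, extra_offset: int = 0) -> int:
    """Read the protocol type byte, shifted by a programmable offset."""
    return frame[ETH_HEADER_LEN + IPV4_PROTOCOL_OFFSET + extra_offset]


def vlan_extra_offset(frame: bytes) -> int:
    """Hypothetical helper: offset to program when a VLAN tag is present."""
    ethertype = int.from_bytes(frame[12:14], "big")
    return VLAN_TAG_LEN if ethertype == VLAN_TPID else 0


# Build two sample frames that carry the same IPv4 header (protocol 17, UDP).
dst = bytes.fromhex("1A2867D01212")
src = bytes.fromhex("121212121212")
ip_header = bytearray(20)
ip_header[0] = 0x45                      # version 4, header length 5
ip_header[IPV4_PROTOCOL_OFFSET] = 17     # UDP
untagged = dst + src + b"\x08\x00" + bytes(ip_header)
tagged = dst + src + b"\x81\x00" + b"\x00\x01" + b"\x08\x00" + bytes(ip_header)

print(extract_protocol(untagged))                             # static offset works
print(extract_protocol(tagged))                               # static offset misses the field
print(extract_protocol(tagged, vlan_extra_offset(tagged)))    # programmed offset finds it
```

With the static offset, the tagged frame yields a byte from the wrong field, while adding the 4-byte offset recovers the protocol type.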
Data comparison can also be executed in different ways. The simplest method is to compare all extracted data fields with a ternary bitmask and to report whether there is a packet match or not. The modified EtherChannel unit may work in one of two modes: programming mode or operation mode.

I. Programming mode: When the signal on pin PRG is '1', the modified EtherChannel unit works in the programming mode, in which the offset value is entered into the device. The sequence of events in this mode is as follows:
1. PRG signal is '1'.
2. HOLD signal is '0'.
3. The offset value is fed to the offset buffer via the 32 bit data bus and fed to the up counter (as shown in figure 8) as a threshold value.
The above three steps are executed in only one clock cycle.

II. Operation mode: This is the mode in which the modified EtherChannel unit performs its packet forwarding functions depending on the transport layer protocol of each packet. It is summarized as follows:
1. Whenever a packet is transferred through the switch fabric, its layer two header (14 bytes) and the beginning of its layer three header (10 bytes) are stored inside a 24 byte temporary buffer, see figure (2). This operation requires 6 clock periods since the bus is 32 bits wide.
2. The packet classification unit generates a 'HOLD' signal to pause the packet transfer procedure until its destination port is found. The HOLD signal is generated as a result of comparing the output of an 8 bit counter with a predefined value (representing the number of clocks) inside the latched HOLD signal unit.
3. The protocol type field (8 bit) of the packet is compared with the predefined value. When the protocol type value is 17 (for UDP), the packet is directed to Eth_ch1. Other packet types are forwarded to the second port (Eth_ch2). This step requires one clock cycle.
4. After completing the comparison process, the packet classification unit transfers the packet's header fields to the proper port by setting the signal Actv_PRT low (figure 8). Again, it needs 6 clocks to finish this operation.
5. The 'HOLD' signal is returned to '0' and the 'Reset' signal becomes '1' to deactivate the unit and to allow the completion of the packet transfer operation to its proper port.

In order to simulate the behavior of the packet classification unit, the above modes were described as a VHDL program. Both the packet classification unit and the switch bus are assumed to work at a clock frequency of (50 MHz). Also, the packet is assumed to have the following field values: destination and source MAC addresses are (1A2867D01212, 121212121212) respectively, Version is 4 (i.e. IPv4), Header length is 5 (i.e. 20 bytes), Service Type is 00 (i.e. Normal service), Overall Length (0060)

It is clear that the packet classification unit adds a delay of 7 clocks to the packet transfer operation to the port buffers. For a (50 MHz) clock, this period is equal to (0.14 µSec.). This delay could be reduced further by using higher clock frequencies. In order to explore the synthesis possibility of the VHDL program, an FPGA Spartan 3 starter kit is used as the implementation target of the design. Spartan 3 has the following features [13]: Maximum working frequency of (50 MHz
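The clock budget of the operation mode and the forwarding decision can be checked with a small Python sketch. It reads the 7 extra clocks as the 6 clocks needed to buffer the 24 header bytes over the 32 bit bus plus 1 clock for the comparison; this breakdown is an interpretation of the steps above, not a statement from the VHDL design, and the function name `select_port` is hypothetical.

```python
BUS_WIDTH_BITS = 32
HEADER_BYTES = 24          # 14 bytes of layer two + first 10 bytes of layer three
CLOCK_HZ = 50_000_000      # 50 MHz clock assumed for the unit and the switch bus
UDP_PROTOCOL = 17


def select_port(protocol_type: int) -> str:
    """UDP (real time) packets go to Eth_ch1, everything else to Eth_ch2."""
    return "Eth_ch1" if protocol_type == UDP_PROTOCOL else "Eth_ch2"


# Ceiling division: number of bus transfers needed to buffer the header bytes.
store_clocks = -(-HEADER_BYTES * 8 // BUS_WIDTH_BITS)
compare_clocks = 1
extra_clocks = store_clocks + compare_clocks
delay_s = extra_clocks / CLOCK_HZ

print(f"{extra_clocks} clocks = {delay_s * 1e9:.0f} ns at {CLOCK_HZ / 1e6:.0f} MHz")
print(select_port(17), select_port(6))
```

At 50 MHz each clock lasts 20 ns, so 7 clocks correspond to 140 ns, i.e. 0.14 microseconds per classified packet.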

Conclusions
The Quality of Service offered on a network rests on the ability to separate traffic into classes and to treat these classes differently in order to distribute the available bandwidth and to provide traffic prioritization. Latency and jitter control become more important as real-time media applications, such as streaming video and video conferencing, take hold in the network world. This paper investigates the effect of mixing two traffic types, real time traffic and non real time traffic. An OPNET simulation model was built for this purpose. It was found that unmanaged traffic could seriously degrade the system performance and that different traffic types could negatively affect each other. In order to implement QoS concepts on the network, a new switch architecture is proposed. The suggested solution adopts two techniques to enhance the operation of the network. The different traffic types were isolated using separate buffers inside the switches (one for each traffic type) and routed to different channels between the switches. The adoption of this technique would enhance the performance of the network and introduce the concept of the multi purpose network. This technique needs only 7 extra clock cycles, while it fully isolates the two different types of traffic.