CSIM Performance Models of Core Library Components
Upon startup, this model reads an application program into its memory from a file called "pe_xx.prog", where xx is the logical processor number returned by the MY_ID subroutine. The application program consists of a sequence of Compute, Send and Receive instructions that have been generated by the CSIM Scheduler based on the application data flow graph and system architecture definition.
The Computation Agent interprets and executes the instructions in sequence. A Compute instruction causes the process to delay by the time specified for the task. A Send instruction causes a message to be queued in the processor's output queue and sent out the external io_port to its destination. A Receive instruction causes the processor to dequeue the number of data bytes of type MID (Message ID) from its input buffer. If not enough data of that type has been received, the Computation Agent will wait until that data has been received at its input port. When all instructions have been completed, the Computation agent stops processing.
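The sequential interpretation described above can be sketched in C roughly as follows. This is a minimal illustration, not the model's source: the names Instruction, PeState and pe_run, the field layout, and the fixed number of message IDs are all assumptions made for the example. A Send is reduced to counting bytes, and a blocked Receive is reported by a return value instead of suspending a simulation process.

```c
#include <stddef.h>

/* Hypothetical instruction encoding; the real .prog format is not shown here. */
typedef enum { OP_COMPUTE, OP_SEND, OP_RECEIVE } OpCode;

typedef struct {
    OpCode op;
    double delay;   /* Compute: time the task takes */
    int    mid;     /* Send/Receive: message ID */
    long   nbytes;  /* Send/Receive: number of data bytes */
} Instruction;

#define N_MIDS 8    /* assumed number of distinct message IDs */

typedef struct {
    double clock;          /* simulated time spent computing */
    long   inbox[N_MIDS];  /* bytes available per MID */
    long   sent_bytes;     /* bytes handed to the output queue */
    size_t pc;             /* index of the next instruction */
} PeState;

/* Executes instructions in order. Returns 1 when the program has finished,
 * or 0 when a Receive must wait because too little data has arrived. The
 * caller can add data to inbox[] and call pe_run again to resume. */
int pe_run(PeState *pe, const Instruction *prog, size_t n) {
    while (pe->pc < n) {
        const Instruction *in = &prog[pe->pc];
        switch (in->op) {
        case OP_COMPUTE:
            pe->clock += in->delay;          /* delay by the task time */
            break;
        case OP_SEND:
            pe->sent_bytes += in->nbytes;    /* stand-in for queuing a message */
            break;
        case OP_RECEIVE:
            if (in->mid < 0 || in->mid >= N_MIDS ||
                pe->inbox[in->mid] < in->nbytes) {
                return 0;                    /* not enough data yet: wait */
            }
            pe->inbox[in->mid] -= in->nbytes;
            break;
        }
        pe->pc++;
    }
    return 1;
}
```

Resuming by calling pe_run again after more data arrives mirrors the way the Computation Agent waits on its input port and then continues with the same Receive instruction.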
The Communication Agent consists of an Output Agent and an Input Agent. The Output Agent runs continuously and checks for any messages placed in the Output Queue by the Computation Agent. If there is a message in the Output Queue, the Output Agent sends it out the io_port to the external link. If the external link is full, the Output Agent will wait until the link becomes available for another message. If the link is blocked for an extended period of time, it may cause the Output Queue to be filled to capacity by the Computation Agent. When the Computation Agent tries to send a message out to a full Output Queue, it will also block and wait until there is room in the Output Queue.
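The back-pressure between the Computation Agent and the Output Agent can be pictured with a small bounded queue. The sketch below is illustrative only: the names OutputQueue, oq_put and oq_send_next are hypothetical, and it assumes the queue capacity is counted in messages, which the text above does not specify. A return value of 0 stands for the point at which the real agent would block and wait.

```c
#include <stddef.h>

#define OQ_CAPACITY 4   /* assumed capacity, counted in messages */

typedef struct {
    int  dest;     /* destination processor */
    long nbytes;   /* message length */
} Message;

typedef struct {
    Message slots[OQ_CAPACITY];
    size_t  head;   /* index of the oldest queued message */
    size_t  count;  /* number of queued messages */
} OutputQueue;

void oq_init(OutputQueue *q) {
    q->head = 0;
    q->count = 0;
}

/* Computation Agent side: returns 1 if the message was queued,
 * 0 if the queue is full and the sender would have to wait. */
int oq_put(OutputQueue *q, Message m) {
    if (q->count == OQ_CAPACITY) {
        return 0;
    }
    q->slots[(q->head + q->count) % OQ_CAPACITY] = m;
    q->count++;
    return 1;
}

/* Output Agent side: removes the oldest message only when the external
 * link can take it. Returns 1 and fills *out on success, 0 otherwise. */
int oq_send_next(OutputQueue *q, int link_free, Message *out) {
    if (!link_free || q->count == 0) {
        return 0;
    }
    *out = q->slots[q->head];
    q->head = (q->head + 1) % OQ_CAPACITY;
    q->count--;
    return 1;
}
```

With the link held busy, repeated calls to oq_put eventually return 0, which corresponds to the Computation Agent blocking on a full Output Queue.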
The Input Agent also runs continuously, waiting for messages coming in at the io_port. When messages come in, they are placed in a mailbox identified by their message ID (MID). The number of bytes received for each MID is recorded, as is the total number of bytes received for all messages. When the Computation Agent executes a Receive instruction, it checks for the number of bytes it needs for the specified MID. If it finds at least that number of bytes, it will dequeue that number of bytes from the input buffer. Otherwise, it will wait until sufficient data has been received. If the total number of bytes received by the Input Agent and not dequeued by the Computation Agent reaches the maximum size of the input buffer, the Input Agent will hold up any new messages coming in at the io_port, thus causing messages to be backed up at the sender side. This protocol allows the processor model to simulate real system behavior where limitations in the processor's memory, input and output buffer sizes set constraints on the system performance.
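The per-MID byte accounting can be sketched as below. The names InputBuffer, input_accept and receive_try are hypothetical, as is the fixed MAX_MIDS table. The sketch also assumes the buffer limit is a hard cap, so a message is held when accepting it would push the total past the limit; the model itself may apply the limit at a slightly different point.

```c
#include <stddef.h>

#define MAX_MIDS 16   /* assumed number of distinct message IDs */

typedef struct {
    long bytes_for_mid[MAX_MIDS]; /* received and not yet dequeued, per MID */
    long total_bytes;             /* sum over all MIDs */
    long max_bytes;               /* input buffer capacity */
} InputBuffer;

void input_buffer_init(InputBuffer *b, long max_bytes) {
    for (size_t i = 0; i < MAX_MIDS; i++) {
        b->bytes_for_mid[i] = 0;
    }
    b->total_bytes = 0;
    b->max_bytes = max_bytes;
}

/* Input Agent side: returns 1 if the message was placed in its mailbox,
 * 0 if it must be held at the port because the buffer would overflow. */
int input_accept(InputBuffer *b, int mid, long nbytes) {
    if (mid < 0 || mid >= MAX_MIDS || nbytes < 0) {
        return 0;
    }
    if (b->total_bytes + nbytes > b->max_bytes) {
        return 0;   /* hold the message; the sender backs up */
    }
    b->bytes_for_mid[mid] += nbytes;
    b->total_bytes += nbytes;
    return 1;
}

/* Computation Agent side (Receive): returns 1 if nbytes of the given MID
 * were dequeued, 0 if the agent has to wait for more data. */
int receive_try(InputBuffer *b, int mid, long nbytes) {
    if (mid < 0 || mid >= MAX_MIDS || nbytes < 0) {
        return 0;
    }
    if (b->bytes_for_mid[mid] < nbytes) {
        return 0;
    }
    b->bytes_for_mid[mid] -= nbytes;
    b->total_bytes -= nbytes;
    return 1;
}
```

Because a Receive frees space in the shared total, a message that was held earlier can be accepted once the Computation Agent has consumed enough data.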
In dynamic scheduler mode (#define DYNAMIC), the processor gets its instructions from its command queue, which is dynamically loaded by the scheduler during simulation. In dynamic mode, two instruction streams are processed concurrently: the usual command queue and a Send command queue. This allows Send commands that were held up earlier by the scheduler due to an unresolved destination to be processed immediately, even while a Compute instruction is being processed. Only old Send commands are processed immediately. A Send command is always associated with a completed Compute task. An id is associated with every Compute and Send command to allow comparison of Send commands in the queue with the most recent Compute task processed.
The generic_pe model is a behavioral model of a processor with a single I/O port. The generic model may be used to model almost any processor with a single I/O communication port. It is useful in modeling the performance of a processor at the task level. It uses a list of instructions that are generated automatically by the CSIM scheduler from a data flow graph Graphical User Interface.
The model supports the following set of instructions:
The instructions may be generated by a static scheduler, in which case the list of instructions is generated and stored in a program (.prog) file, or they may be generated dynamically during simulation by a dynamic scheduler, which parses the instructions and delivers them to the processor in an input queue.
The model provides a number of debugging tools and gathers simulation statistics for display.
Major events may be displayed during simulation by setting the Verbosity
mode. Setting Verbose mode causes messages to be displayed in the terminal
window. The Verbosity varies between 0 and 10. The higher the Verbosity,
the more messages will be displayed. When initiating the simulation by
command line, type:
sim.exe -V n
where n is a number from 0 to 10.
The model supports animation during simulation. When a task is executed in the processor, the model highlights its box in the simulation GUI. Either the hardware architecture graph or the software DFG graph may be displayed during simulation. This may be switched by the environment variable SIM_GRAPH, setting it to the file of either the hardware graph or the software graph. The default setting is the hardware architecture file.
During simulation, a number of files are generated which may be used
by a post-processor to show processor activity and communication timelines.
The file ProcTline.dat will show the task activity timeline using XGRAPH.
The file Spider.dat will show the interprocessor communication timeline.
It should be used in conjunction with ProcTline.dat to show both the task
and communication relationships, as in the command:
XGRAPH ProcTline.dat Spider.dat
A live display of the XGRAPH ProcTline and Spider plot can be generated
during simulation by invoking:
sim.exe -S socket_number
where socket_number is optional and can be a value between 1000 and
16383. See the Live XGRAPH Display feature document for a more detailed
description of its use.
An event history file called EventHist.dat is generated for use with the TLPP display tool which allows greater flexibility for displaying the task and communication timelines.
The on-chip processor memory utilization timeline is captured in IQtrace.dat, OQtrace.dat and Mtrace.dat files to be used with XGRAPH, to show the processor's data input queue, data output queue and total memory utilization, respectively. See Processor Memory Tracing and Management feature document for a more detailed description of its use.
The model includes an ability to specify buffer limits. When no memory/buffer
attributes are used, the generic_pe
model uses the global parameters set in parameters.sim:
The timeline displays and statistics may be generated within a specified simulation time window or by events in the data flow graph which specify the beginning and end of the window. See Utilization Time Window Setting document for a more detailed description of its use.
A summary.dat file is generated to collect processor utilization statistics.
Upon startup, this model reads an application program into its memory from a file called "pe_xx.prog", where xx is the logical processor number returned by the MY_ID subroutine. The application program consists of a sequence of Compute, Send and Receive instructions that have been generated by the CSIM Scheduler based on the application data flow graph and system architecture definition.
The Computation Agent interprets and executes the instructions in sequence. A Compute instruction causes the process to delay by the time specified for the task. A Send instruction causes a message to be queued in the processor's output queue and sent out the external io_port to its destination. A Receive instruction causes the processor to dequeue the number of data bytes of type MID (Message ID) from its input buffer. If not enough data of that type has been received, the Computation Agent will wait until that data has been received at its input port. When all instructions have been completed, the Computation agent stops processing.
The Communication Agent consists of an Output Agent and an Input Agent. The Output Agent runs continuously and checks for any messages placed in the Output Queue by the Computation Agent. If there is a message in the Output Queue, the Output Agent sends it out the io_port to the external link. If the external link is full, the Output Agent will wait until the link becomes available for another message. If the link is blocked for an extended period of time, it may cause the Output Queue to be filled to capacity by the Computation Agent. When the Computation Agent tries to send a message out to a full Output Queue, it will also block and wait until there is room in the Output Queue.
The Input Agent also runs continuously, waiting for messages coming in at the io_port. When messages come in, they are placed in a mailbox identified by their message ID (MID). The number of bytes received for each MID is recorded, as is the total number of bytes received for all messages. When the Computation Agent executes a Receive instruction, it checks for the number of bytes it needs for the specified MID. If it finds at least that number of bytes, it will dequeue that number of bytes from the input buffer. Otherwise, it will wait until sufficient data has been received. If the total number of bytes received by the Input Agent and not dequeued by the Computation Agent reaches the maximum size of the input buffer, the Input Agent will hold up any new messages coming in at the io_port, thus causing messages to be backed up at the sender side. This protocol allows the processor model to simulate real system behavior where limitations in the processor's memory, input and output buffer sizes set constraints on the system performance.
This processor does not support the dynamic scheduling mode and some of the newer features. It is a scaled down version of the generic_pe.sim model and may be used when desiring a faster simulation. For a full comparison of the features check the revision table in the model source text.
The static_pe model is a behavioral model of a processor with a single I/O port. The model may be used to model almost any processor with a single I/O communication port. It is useful in modeling the performance of a processor at the task level. It uses a list of instructions that are generated automatically by the CSIM static scheduler from a data flow graph Graphical User Interface and stored in the program (.prog) files.
The model supports the following set of instructions:
The static_pe ignores attributes that may be appended to the instructions.
The model provides a number of debugging tools and gathers simulation statistics for display.
Major events may be displayed during simulation by setting the Verbosity
mode. Setting Verbose mode causes messages to be displayed in the terminal
window. The Verbosity varies between 0 and 10. The higher the Verbosity,
the more messages will be displayed. When initiating the simulation by
command line, type:
sim.exe -V n
where n is a number from 0 to 10.
The model supports animation during simulation. When a task is executed in the processor, the model highlights its box in the simulation GUI. Either the hardware architecture graph or the software DFG graph may be displayed during simulation. This may be switched by the environment variable SIM_GRAPH, setting it to the file of either the hardware graph or the software graph. The default setting is the hardware architecture file.
During simulation, a number of files are generated which may be used
by a post-processor to show processor activity and communication timelines.
The file ProcTline.dat will show the task activity timeline using XGRAPH.
The file Spider.dat will show the interprocessor communication timeline.
It should be used in conjunction with ProcTline.dat to show both the task
and communication relationships, as in the command:
XGRAPH ProcTline.dat Spider.dat
An event history file called EventHist.dat is generated for use with the TLPP display tool which allows greater flexibility for displaying the task and communication timelines.
The on-chip processor memory utilization timeline is captured in IQtrace.dat, OQtrace.dat and Mtrace.dat files to be used with XGRAPH, to show the processor's data input queue, data output queue and total memory utilization, respectively. The model includes an ability to specify buffer limits. The model uses the global parameters set in parameters.sim:
A summary.dat file is generated to collect processor utilization statistics.
1.3 ADSP Sharc Processing Element (PE) - sharc.sim
This is a task-level model of the performance of the Analog Devices Sharc processor and of its communication with other processors through its external ports. Tasks are modeled by their computation delay, and communication is modeled by Send and Receive instructions.
The Sharc has seven external I/O ports, called "p0-p6", through which data flows: one is a parallel port and six are serial link ports. The Sharc receives data messages on its I/O ports and simulates, by a time delay, the computation of application tasks that would operate on the data. Result data from the computations are then modeled by sending data out the I/O ports.
The serial ports also serve as a routing mechanism for messages to traverse a network of multiple Sharc processors. A message that is received on a serial port and is not for this processor is forwarded out another port until it reaches its final destination. The routing path and the destination processor ID are carried along with each message.
Upon startup, this model reads an application program into its memory from a file called "pe_xx.prog", where xx is the logical processor number returned by the MY_ID subroutine. The application program consists of a sequence of Compute, Send and Receive instructions that have been generated by the CSIM Scheduler based on the application data flow graph and system architecture definition.
The Sharc processor is modeled by concurrent processes: a Computation Agent and Communication Agents for each of the I/O ports.
The Computation Agent interprets and executes the instructions in sequence. A Compute instruction causes the process to delay by the time specified for the task. A Send instruction causes a message to be queued in the processor's Output Queue and sent out one of the external I/O ports to its destination. A Receive instruction causes the processor to dequeue the number of data bytes of type MID (Message ID) from its input buffer. If not enough data of that type has been received, the Computation Agent will wait until that data has been received at its input ports. When all instructions have been completed, the Computation agent stops processing.
The Communication Agent consists of an Input Agent for each of its ports and a common Output Agent for all its ports. The Output Agent runs continuously and checks for any messages placed in the Output Queue by the Computation Agent. If there is a message in the Output Queue, the Output Agent sends it out the selected I/O port to the external link. If the external link is full, the Output Agent will wait until the link becomes available for another message. If the link is blocked for an extended period of time, it may cause the Output Queue to be filled to capacity by the Computation Agent. When the Computation Agent tries to send a message out to a full Output Queue, it will also block and wait until there is room in the Output Queue.
The Input Agents for each of the ports also run continuously, waiting for messages coming in at the I/O port. When a message comes in at its port, the Input Agent checks its destination. If it is not for this processor, the message is forwarded out the I/O port specified by the next entry in the message routing path. If it is for this processor, the amount of data in the message is placed in a mailbox identified by the message ID (MID). The number of bytes received for each MID is recorded, as is the total number of bytes received for all messages. When the Computation Agent executes a Receive instruction, it checks for the number of bytes it needs for the specified MID. If it finds at least that number of bytes, it will dequeue that number of bytes from the input buffer. Otherwise, it will wait until sufficient data has been received. If the total number of bytes received by the Input Agents and not dequeued by the Computation Agent reaches the maximum size of the input buffer, the Input Agent will hold up any new messages coming in at the I/O port, thus causing messages to be backed up at the sender side. This protocol allows the processor model to simulate real system behavior where limitations in the processor's memory, input and output buffer sizes set constraints on the system performance.
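The forwarding decision made by each Input Agent can be illustrated with the sketch below. The structure RoutedMessage and the function route_next are hypothetical names, and the choice to store the routing path as an array of port numbers terminated by -1 is an assumption borrowed from the TO_LOCAL_MEMORY and TO_LOCAL_PE arrays shown later in this document.

```c
#define PATH_END    (-1)  /* assumed terminator for a routing path */
#define ROUTE_LOCAL (-1)  /* result: message is for this processor */
#define ROUTE_ERROR (-2)  /* result: path exhausted before the destination */

typedef struct {
    int        dest_id;  /* destination processor ID carried by the message */
    const int *path;     /* output port for each hop, ended by PATH_END */
    int        hop;      /* index of the next unused path entry */
    int        mid;      /* message ID */
    long       nbytes;   /* amount of data carried */
} RoutedMessage;

/* Decides what a processor with ID my_id does with an incoming message.
 * Returns ROUTE_LOCAL when the message should go to the local mailbox,
 * otherwise the I/O port to forward it on, consuming one path entry. */
int route_next(RoutedMessage *m, int my_id) {
    if (m->dest_id == my_id) {
        return ROUTE_LOCAL;
    }
    if (m->path[m->hop] == PATH_END) {
        return ROUTE_ERROR;   /* nothing left to follow */
    }
    return m->path[m->hop++];
}
```

Each intermediate processor consumes one path entry, so a message with path {3, 5, -1} leaves the first hop on port 3 and the second on port 5 before reaching its destination.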
In dynamic scheduler mode (#define DYNAMIC), the processor gets its instructions from its command queue, which is dynamically loaded by the scheduler during simulation. In dynamic mode, two instruction streams are processed concurrently: the usual command queue and a Send command queue. This allows Send commands that were held up earlier by the scheduler due to an unresolved destination to be processed immediately, even while a Compute instruction is being processed. Only old Send commands are processed immediately. A Send command is always associated with a completed Compute task. An id is associated with every Compute and Send command to allow comparison of Send commands in the queue with the most recent Compute task processed.
This model has been generalized to handle any number of I/O ports.
1.4 ADSP Sharc Processing Element (PE) - static_sharc.sim
This is a task-level model of the performance of the Analog Devices Sharc processor and of its communication with other processors through its external ports. Tasks are modeled by their computation delay, and communication is modeled by Send and Receive instructions.
The Sharc has seven external I/O ports, called "p0-p6", through which data flows: one is a parallel port and six are serial link ports. The Sharc receives data messages on its I/O ports and simulates, by a time delay, the computation of application tasks that would operate on the data. Result data from the computations are then modeled by sending data out the I/O ports.
The serial ports also serve as a routing mechanism for messages to traverse a network of multiple Sharc processors. A message that is received on a serial port and is not for this processor is forwarded out another port until it reaches its final destination. The routing path and the destination processor ID are carried along with each message.
Upon startup, this model reads an application program into its memory from a file called "pe_xx.prog", where xx is the logical processor number returned by the MY_ID subroutine. The application program consists of a sequence of Compute, Send and Receive instructions that have been generated by the CSIM Scheduler based on the application data flow graph and system architecture definition.
The Sharc processor is modeled by concurrent processes: a Computation Agent and Communication Agents for each of the I/O ports.
The Computation Agent interprets and executes the instructions in sequence. A Compute instruction causes the process to delay by the time specified for the task. A Send instruction causes a message to be queued in the processor's Output Queue and sent out one of the external I/O ports to its destination. A Receive instruction causes the processor to dequeue the number of data bytes of type MID (Message ID) from its input buffer. If not enough data of that type has been received, the Computation Agent will wait until that data has been received at its input ports. When all instructions have been completed, the Computation agent stops processing.
The Communication Agent consists of an Input Agent for each of its ports and a common Output Agent for all its ports. The Output Agent runs continuously and checks for any messages placed in the Output Queue by the Computation Agent. If there is a message in the Output Queue, the Output Agent sends it out the selected I/O port to the external link. If the external link is full, the Output Agent will wait until the link becomes available for another message. If the link is blocked for an extended period of time, it may cause the Output Queue to be filled to capacity by the Computation Agent. When the Computation Agent tries to send a message out to a full Output Queue, it will also block and wait until there is room in the Output Queue.
The Input Agents for each of the ports also run continuously, waiting for messages coming in at the I/O port. When a message comes in at its port, the Input Agent checks its destination. If it is not for this processor, the message is forwarded out the I/O port specified by the next entry in the message routing path. If it is for this processor, the amount of data in the message is placed in a mailbox identified by the message ID (MID). The number of bytes received for each MID is recorded, as is the total number of bytes received for all messages. When the Computation Agent executes a Receive instruction, it checks for the number of bytes it needs for the specified MID. If it finds at least that number of bytes, it will dequeue that number of bytes from the input buffer. Otherwise, it will wait until sufficient data has been received. If the total number of bytes received by the Input Agents and not dequeued by the Computation Agent reaches the maximum size of the input buffer, the Input Agent will hold up any new messages coming in at the I/O port, thus causing messages to be backed up at the sender side. This protocol allows the processor model to simulate real system behavior where limitations in the processor's memory, input and output buffer sizes set constraints on the system performance.
This processor does not support the dynamic scheduling mode. It is a scaled down version of the sharc.sim model and may be used when desiring a faster simulation. For a full comparison of the features check the revision table in the model source text.
The c40 has seven external I/O ports, called "p0-p6", through which data flows: one is a parallel port and six are serial link ports. The c40 receives data messages on its I/O ports and simulates, by a time delay, the computation of application tasks that would operate on the data. Result data from the computations are then modeled by sending data out the I/O ports.
The serial ports also serve as a routing mechanism for messages to traverse a network of multiple c40 processors. A message that is received on a serial port and is not for this processor is forwarded out another port until it reaches its final destination. The routing path and the destination processor ID are carried along with each message.
Upon startup, this model reads an application program into its memory from a file called "pe_xx.prog", where xx is the logical processor number returned by the MY_ID subroutine. The application program consists of a sequence of Compute, Send and Receive instructions that have been generated by the CSIM Scheduler based on the application data flow graph and system architecture definition.
The c40 processor is modeled by concurrent processes: a Computation Agent and Communication Agents for each of the I/O ports.
The Computation Agent interprets and executes the instructions in sequence. A Compute instruction causes the process to delay by the time specified for the task. A Send instruction causes a message to be queued in the processor's Output Queue and sent out one of the external I/O ports to its destination. A Receive instruction causes the processor to dequeue the number of data bytes of type MID (Message ID) from its input buffer. If not enough data of that type has been received, the Computation Agent will wait until that data has been received at its input ports. When all instructions have been completed, the Computation agent stops processing.
The Communication Agent consists of an Input Agent for each of its ports and a common Output Agent for all its ports. The Output Agent runs continuously and checks for any messages placed in the Output Queue by the Computation Agent. If there is a message in the Output Queue, the Output Agent sends it out the selected I/O port to the external link. If the external link is full, the Output Agent will wait until the link becomes available for another message. If the link is blocked for an extended period of time, it may cause the Output Queue to be filled to capacity by the Computation Agent. When the Computation Agent tries to send a message out to a full Output Queue, it will also block and wait until there is room in the Output Queue.
The Input Agents for each of the ports also run continuously, waiting for messages coming in at the I/O port. When a message comes in at its port, the Input Agent checks its destination. If it is not for this processor, the message is forwarded out the I/O port specified by the next entry in the message routing path. If it is for this processor, the amount of data in the message is placed in a mailbox identified by the message ID (MID). The number of bytes received for each MID is recorded, as is the total number of bytes received for all messages. When the Computation Agent executes a Receive instruction, it checks for the number of bytes it needs for the specified MID. If it finds at least that number of bytes, it will dequeue that number of bytes from the input buffer. Otherwise, it will wait until sufficient data has been received. If the total number of bytes received by the Input Agents and not dequeued by the Computation Agent reaches the maximum size of the input buffer, the Input Agent will hold up any new messages coming in at the I/O port, thus causing messages to be backed up at the sender side. This protocol allows the processor model to simulate real system behavior where limitations in the processor's memory, input and output buffer sizes set constraints on the system performance.
In dynamic scheduler mode (#define DYNAMIC), the processor gets its instructions from its command queue, which is dynamically loaded by the scheduler during simulation. In dynamic mode, two instruction streams are processed concurrently: the usual command queue and a Send command queue. This allows Send commands that were held up earlier by the scheduler due to an unresolved destination to be processed immediately, even while a Compute instruction is being processed. Only old Send commands are processed immediately. A Send command is always associated with a completed Compute task. An id is associated with every Compute and Send command to allow comparison of Send commands in the queue with the most recent Compute task processed.
This model has been generalized to handle any number of I/O ports.
The c40 has seven external I/O ports, called "p0-p6", through which data flows: one is a parallel port and six are serial link ports. The c40 receives data messages on its I/O ports and simulates, by a time delay, the computation of application tasks that would operate on the data. Result data from the computations are then modeled by sending data out the I/O ports.
The serial ports also serve as a routing mechanism for messages to traverse a network of multiple c40 processors. A message that is received on a serial port and is not for this processor is forwarded out another port until it reaches its final destination. The routing path and the destination processor ID are carried along with each message.
Upon startup, this model reads an application program into its memory from a file called "pe_xx.prog", where xx is the logical processor number returned by the MY_ID subroutine. The application program consists of a sequence of Compute, Send and Receive instructions that have been generated by the CSIM Scheduler based on the application data flow graph and system architecture definition.
The c40 processor is modeled by concurrent processes: a Computation Agent and Communication Agents for each of the I/O ports.
The Computation Agent interprets and executes the instructions in sequence. A Compute instruction causes the process to delay by the time specified for the task. A Send instruction causes a message to be queued in the processor's Output Queue and sent out one of the external I/O ports to its destination. A Receive instruction causes the processor to dequeue the number of data bytes of type MID (Message ID) from its input buffer. If not enough data of that type has been received, the Computation Agent will wait until that data has been received at its input ports. When all instructions have been completed, the Computation agent stops processing.
The Communication Agent consists of an Input Agent for each of its ports and a common Output Agent for all its ports. The Output Agent runs continuously and checks for any messages placed in the Output Queue by the Computation Agent. If there is a message in the Output Queue, the Output Agent sends it out the selected I/O port to the external link. If the external link is full, the Output Agent will wait until the link becomes available for another message. If the link is blocked for an extended period of time, it may cause the Output Queue to be filled to capacity by the Computation Agent. When the Computation Agent tries to send a message out to a full Output Queue, it will also block and wait until there is room in the Output Queue.
The Input Agents for each of the ports also run continuously, waiting for messages coming in at the I/O port. When a message comes in at its port, the Input Agent checks its destination. If it is not for this processor, the message is forwarded out the I/O port specified by the next entry in the message routing path. If it is for this processor, the amount of data in the message is placed in a mailbox identified by the message ID (MID). The number of bytes received for each MID is recorded, as is the total number of bytes received for all messages. When the Computation Agent executes a Receive instruction, it checks for the number of bytes it needs for the specified MID. If it finds at least that number of bytes, it will dequeue that number of bytes from the input buffer. Otherwise, it will wait until sufficient data has been received. If the total number of bytes received by the Input Agents and not dequeued by the Computation Agent reaches the maximum size of the input buffer, the Input Agent will hold up any new messages coming in at the I/O port, thus causing messages to be backed up at the sender side. This protocol allows the processor model to simulate real system behavior where limitations in the processor's memory, input and output buffer sizes set constraints on the system performance.
This processor does not support the dynamic scheduling mode. It is a scaled down version of the c40.sim model and may be used when desiring a faster simulation. For a full comparison of the features check the revision table in the model source text.
This reflexive-PE model automatically generates data traffic to a local memory every time a message is sent or received by the PE. The transfer is generated "reflexively", thus the name for this model. The reflexive transfer is made to have the same size as the actual message which triggers it.
This file, reflexive_pe.sim, contains two models:
This model is based on assumptions about the "actual-PE" that we are modeling, that:
- To send data, the actual PE must first write data into local memory, then the data flows from local memory to the local NIC (Network Interface).
- To receive data, the data first flows from the NIC into local memory, then the PE reads it out of local memory.
Thus the data must be transferred on the local bus twice each time. Because down-stream events must wait for both transfers, the modeled PE/Memory pair is made to hand-shake with an acknowledge to assure the transfer to/from local memory completed before continuing.
This pair of models makes a very important assumption! It assumes that the memory will always be on port "2" of the local bus, and the PE on port "1". If you use other ports, this is easily changed. There is a DEFINE_GLOBAL block at the bottom of this file which initializes two arrays:
/* Set default path to local memory. */
/* Assumes local memory is on port "2" of local bus. */
int TO_LOCAL_MEMORY[] = {2,-1};
/* Set default path to local PE. */
/* Assumes local PE is on port "1" of local bus. */
int TO_LOCAL_PE[] = {1,-1};
Change these as needed.
This model consists of a major process thread that is instantiated multiple times, once for each of the used ports on the bus. The port_handler implements the messaging protocol as described in the Bus_NIC model.
It waits for a control signal to arrive on its port. If the arriving control signal is a "REQuest", then it checks to see if this bus is busy. If not, then it sets the status to "busy" and forwards the request message out that requested output port. Otherwise, if the bus is busy, it depends on whether this bus segment is the first in a cascade of buses encountered by the REQuest. If it's the first bus, then the REQuest is queued to be serviced next in line. If it's not the first bus, then it changes the message to a NACK and reflects it back for retry at a later time.
If the arriving control signal is a "done" (ACK) message, then it forwards the "done" message back out the port to which it was assigned, and resets the bus-status flag.
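The decision logic of the port handler can be sketched as follows. This is a minimal Python illustration under stated assumptions: the class BusSegment, its method names, and the returned action strings are hypothetical, and the message forwarding itself is reduced to a return value.

```python
from collections import deque


class BusSegment:
    """Busy flag plus a queue of waiting REQuests (hypothetical sketch)."""

    def __init__(self):
        self.busy = False
        self.waiting = deque()

    def on_request(self, req, is_first_bus):
        if not self.busy:
            self.busy = True          # claim the bus and pass the REQuest on
            return "FORWARD"
        if is_first_bus:
            self.waiting.append(req)  # serviced next in line
            return "QUEUED"
        return "NACK"                 # reflected back for a later retry

    def on_done(self):
        self.busy = False             # reset the bus-status flag
        return "FORWARD_DONE"
```

Only the first bus in a cascade queues a blocked REQuest; later buses reflect a NACK so that the sender retries.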
Different bus types may be cascaded together. This device uses two Cascade
Bus constants: CB_transfer_rate, and CB_latency (transfer rate and
latency). They are defined in the file "parameters.sim". These
variables will be overwritten by local instance variables, if they exist.
The corresponding instance variables are optionally defined as CSIM attributes
or macros and have names, respectively: cb_transfer_rate and cb_latency.
Each Cascade Bus element can have a different transfer rate and latency.
Each Cascade Bus element sets the lowest transfer rate value by comparing
its own rate to that previously set. The latencies get added up by each
Cascade Bus element. The receiving Bus NIC gets the resultant transfer
rate and latency and uses it to determine the total transfer delay.
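The accumulation along a cascade can be sketched as below. The function name accumulate_path and the tuple layout are assumptions made for illustration; the model itself carries these values along with the message.

```python
def accumulate_path(elements):
    """Combine per-element (transfer_rate, latency) pairs along a path.

    The resulting rate is the lowest rate seen; the latencies are summed.
    """
    rate = float("inf")
    latency = 0.0
    for elem_rate, elem_latency in elements:
        rate = min(rate, elem_rate)   # the slowest element limits the transfer
        latency += elem_latency       # each element adds its own latency
    return rate, latency
```

For three elements with rates 80, 40 and 160 and latencies 0.5, 0.25 and 0.25, the receiving Bus NIC would see a rate of 40 and a total latency of 1.0.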
Different bus types may be cascaded together. Each Cascade Bus element can have a different transfer rate and latency. Each Cascade Bus element sets the lowest transfer rate value by comparing its own rate to that previously set. The latencies get added up by each Cascade Bus element. The receiving Bus NIC gets the resultant transfer rate and latency and uses it to determine the total transfer delay. The constants for the Cascade Bus transfer rate, latency, and overhead are defined as macros in the "parameters.sim" file. The macro names are: CB_transfer_rate, CB_latency, and CB_transfer_ovrhd respectively. The constants PACKET_HEADER_SIZE and CB_PACKET_SIZE are also specified in the "parameters.sim" file. All five of these variables will be overwritten by local instance variables, if they exist. The corresponding instance variables are optionally defined as CSIM variables or macros and have names: cb_packet_header_size, cb_packet_size, cb_transfer_rate, cb_latency, and cb_transfer_ovrhd.
The local side process handles outgoing messages one at a time. It waits for a message to come from the local side. When it does, it checks to see if the NIC is in the idle state. If it is, then it sets its state to sending, forwards the message out the remote side, and goes to sleep until it is reawakened when the returning "ACK" signal is received by the remote side process. The remote side process watches for either an incoming "REQuest" or a returning "ACK" message. If a "request" is received, then the remote side process spawns a sub-thread (called EOM), that delays for the appropriate transfer time, awakens and sends the "ACK" signal back and also forwards the packet to the local_side port.
1.11 Cascadable Bus Buffer- bus_buffer.sim
This is a CSIM description of the inter-bus buffer for cascade bus models. The function served by this device is to isolate two cascade bus networks and to buffer data transfers between the buses. This also allows each bus to run at its highest transfer rate and not be slowed down by the other slower bus.
Two of these devices need to be used and configured back-to-back in a buffer module with a common local-side link. The function of the buffer module is to forward a message/packet received on one Cascade Bus to the other Cascade Bus. It buffers it and forwards it in the same format and length as it receives it. No packetization is done by the module. It also handles the handshaking signals on each Cascade Bus.
The bus_buffer model consists of two main processes; one handles the local bus side and the other handles the CascadeBus or "remote" side. The remote side behaves like a bus_nic model, but without packetization. Due to the nature of the blocking in the cascade bus models, the inherent transfer-delay mechanism of CSIM is not sufficient by itself to account for the link communication delays. (This is because we cannot know in general when a packet transfer actually begins moving data due to possible blocking.) Therefore, we have implemented a protocol that accurately accounts for data movement that uses two control-signal types. The first, called "REQuest", opens a pathway through the bus network, and the second, called "ACK", returns backward through the pathway after the appropriate transfer delay. As it does, the (ACK) signal closes the pathway.
Because the data transfer delay for a packet is entirely accounted for with a time-delay statement within the NIC model before returning the "ACK" message, there must be no delays on the network links. Therefore, the data-rate of the links as specified in the CSIM topology table should be set to infinity (a very high number). In the network, data begins flowing into the destination NIC once the wormhole has been opened to it. Data then continues to flow for packet_length/transfer_rate seconds. Then the path is freed. Therefore, the time-delay in the NIC model for reflecting the "ACK" signal should be packet_length (in bytes) divided by the transfer rate (CB_transfer_rate) plus a latency delay (CB_latency) for each bus element through which the signal has passed. An additional fixed transfer overhead factor (CB_transfer_ovrhd) is added to the delay time. This gives the time delay in uSec. Each Cascade Bus element can have a different transfer rate and latency. Each Cascade Bus element sets the lowest transfer rate value by comparing its own rate to that previously set. The latencies get added up by each Cascade Bus element. The receiving Bus NIC gets the resultant transfer rate and latency and uses it to determine the total transfer delay.
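As a worked sketch of the delay described above, the function below computes the time the NIC waits before reflecting the "ACK". The function and parameter names are hypothetical, and the rate is assumed to be expressed in bytes per uSec so that the result is in uSec.

```python
def ack_delay_usec(packet_bytes, transfer_rate, total_latency, transfer_ovrhd):
    """Delay before the ACK is reflected, in uSec.

    transfer_rate  -- lowest rate along the path, bytes per uSec (assumed unit)
    total_latency  -- sum of the latencies of the bus elements traversed
    transfer_ovrhd -- fixed transfer overhead added once per transfer
    """
    return packet_bytes / transfer_rate + total_latency + transfer_ovrhd
```

For example, a 1000-byte packet at 40 bytes/uSec with 1.0 uSec of accumulated latency and 0.5 uSec of overhead gives 25.0 + 1.0 + 0.5 = 26.5 uSec.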
The constants for the Cascade Bus transfer rate, latency, and overhead are defined as macros in the "parameters.sim" file. The macro names are: CB_transfer_rate, CB_latency, and CB_transfer_ovrhd respectively. The constants PACKET_HEADER_SIZE and CB_PACKET_SIZE are also specified in the "parameters.sim" file. These variables will be overwritten by local instance variables, if they exist. The corresponding instance variables are optionally defined as CSIM variables or macros and have names: cb_transfer_rate, cb_latency, cb_transfer_ovrhd, cb_packet_header_size and cb_packet_size.
The local side process handles outgoing messages one at a time. It waits for a message to come from the local side. When it does, it checks to see if the NIC is in the idle state. If it is, then it sets its state to sending, forwards the message out the remote side, and goes to sleep until it is reawakened when the returning "ACK" signal is received by the remote side process.
The remote side process watches for either an incoming "REQuest" or a returning "ACK" message. If a "request" is received, then the remote side process spawns a sub-thread (called EOM), that delays for the appropriate transfer time, awakens and sends the "ACK" signal back and also forwards the packet to the local_side port.
1.12 Cascadable Bus Buffer Module- cbuf_module.sim
This is a CSIM structural model of the Cascade Bus buffer module. This module is made up of two Bus Buffer models connected back-to-back at the local side. The function served by this device is to isolate two cascade bus networks and to buffer data transfers between the buses.
The function of the buffer module is to forward a message/packet received
on one Cascade Bus to the other Cascade Bus. It buffers it and forwards
it in the same format and length as it receives it. No packetization is
done by this module. It also handles the handshaking signals on each Cascade
Bus.
yyy#xxxxx
where:
yyy is an optional instantiation string or number. The entire name prior to the pound sign is arbitrary. The instantiation string or number is used to make each box-name unique.
xxxxx is some int or real delay in simulation time units.
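The naming convention above can be illustrated with a small parser. This is a sketch only: the function name parse_delay is hypothetical, and splitting at the last pound sign is an assumption about how a name containing more than one "#" would be treated.

```python
def parse_delay(instance_name):
    """Return the delay encoded after '#' in a name of the form 'yyy#xxxxx'."""
    prefix, sep, delay_text = instance_name.rpartition("#")
    if not sep:
        raise ValueError("no '#' in instance name: %r" % instance_name)
    # The delay may be written as an int or a real; float() accepts both.
    return float(delay_text)
```

With this sketch, the name "buf1#250" yields a delay of 250.0 simulation time units, and "#0.5" (no instantiation string) yields 0.5.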
Devices of type "HW_delay_block" will block input until the specified delay has elapsed. Devices of type "HW_delay_nonblock" act like a FIFO pipe and will merely delay messages the specified amount of time: an arbitrary number of messages (specified by the parameter MAXDELAYB4BLOCK) can be held simultaneously by the device, after which the device blocks.
Both devices act in a full duplex manner. Both will delay messages coming from either port--either while blocking or passing-- before releasing the message on the other port INDEPENDENTLY of whatever is happening in the other direction. THIS DEVICE ACTS LIKE TWO SEPARATE DEVICES operating each on data going in opposite directions. To make these devices act in a half duplex or simplex manner, manually set an arc attached to one of the sides to be half duplex or simplex.
NOTE: The method used here of pulling the delay-parameter out of the
device's instantiation name, will become obsolete when eventually ATL enhances
CSIM allowing parameters to HW boxes.
The XBAR_SWITCH_RATE constant is defined in "parameters.sim". An instance attribute called generic_xbar_rate, if set, overwrites the global XBAR_SWITCH_RATE value. An attribute called infinite_xbar_rate may be set to 1 to allow the generic_xbar to run at an infinite transfer rate or zero delay.
This switch should be used with all its ports connected to links with zero delay or set to very high transfer rate. This device should normally be used as a local switch. It should generally not be used to form a network of multiple switches. If multiple switches are concatenated, be aware that the transfers will behave as a store and forward network where the entire message is held up at each switch for the duration of the transfer before being forwarded to the next switch.
This model is implemented as a major process thread, the port handler, that is instantiated multiple times, once for each of the ports on the crossbar switch. Each port handler waits for a message to arrive at its port. When it detects that a message has arrived, it checks the message's destination output port to see if it's available. If the output port is found to be in use by another message transfer, the message is queued at the output port's queue and waits until all messages ahead of it are transferred out. If it's available, it sets it to be in-use and delays for a time that is the message length divided by the switch rate. At the end of that time interval, it sends the message out the output port and releases the output port. The port handler does not process any new messages until its previous message has been sent out. Each output port has its own queue.
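The timing at a single output port can be sketched as below. The function name and argument layout are hypothetical. As a simplification, the sketch looks at one output port in isolation and ignores the rule that an input port handler accepts no new message until its previous one has been sent.

```python
def output_port_schedule(messages, switch_rate):
    """Departure times for messages queued at ONE output port.

    messages    -- list of (arrival_time, length_bytes) in arrival order
    switch_rate -- bytes per time unit (stands in for XBAR_SWITCH_RATE)
    """
    port_free_at = 0.0
    departures = []
    for arrival, length in messages:
        start = max(arrival, port_free_at)   # wait for messages ahead of it
        port_free_at = start + length / switch_rate
        departures.append(port_free_at)
    return departures
```

With a switch rate of 100, messages of 100, 200 and 100 bytes arriving at times 0.0, 0.5 and 10.0 leave at 1.0, 3.0 and 11.0: the second message waits for the first, while the third finds the port free.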
It waits for a control signal to arrive on its port. If the arriving control signal is a "REQuest", then it checks to see if the requested output port is available. If so, then it assigns the requested output port and this process's input port as being "a pair in-use", and forwards the request message out that requested output port. Otherwise, if the requested port is already in-use, then it changes the message to a NACK and reflects it back.
If a new request has a higher priority than an existing connection, then a "preempt" message is generated and sent out the forward port of the conflicting pair. Soon, a "done" (ACK) message will come back to release the conflicted ports, and when it does, then the preempting request is serviced.
If the arriving control signal is a "done" message, then it forwards the "done" message back out the port to which it was assigned, and de-assigns the port pair.
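The request handling of this crossbar can be sketched as follows. The class and method names are hypothetical, only the output port of each pair is tracked, and a larger number is assumed to mean a higher priority. The later servicing of the preempting request is omitted.

```python
class CrossbarSketch:
    """Tracks which output ports are held, and by whom (hypothetical sketch)."""

    def __init__(self):
        self.in_use = {}   # output port -> (input port, priority)

    def on_request(self, in_port, out_port, priority):
        holder = self.in_use.get(out_port)
        if holder is None:
            self.in_use[out_port] = (in_port, priority)  # pair is now in use
            return "FORWARD"
        if priority > holder[1]:
            return "PREEMPT"   # sent out the forward port of the conflicting pair
        return "NACK"          # reflected back to the requester

    def on_done(self, out_port):
        self.in_use.pop(out_port, None)   # de-assign the port pair
        return "FORWARD_DONE"
```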
This model consists of two main processes; one handles the local bus side and the other handles the Raceway or "remote" side.
Due to the nature of the blocking protocol of the Raceway network, the default transfer-delay mechanism of CSIM-links is not sufficient by itself to account for the link communication delays. (This is because we cannot know in general when a packet transfer actually begins moving data due to possible blocking.) Therefore, we have implemented a protocol that accurately accounts for RACEway data movement that uses two control-signal types. The first, called "REQuest", opens a pathway through the switch network, and the second, called "ACK", returns backward through the pathway after the appropriate transfer delay. As it does, the (ACK) signal closes the pathway.
Because the data transfer delay for a packet is entirely accounted for with a time-delay statement within the NIC model before returning the "ACK" message, there must be no delays on the network links. Therefore, the data-rate of the links as specified in the CSIM topology table should be set to infinity (a very high number). In the RACEway network, data begins flowing into the destination NIC once the wormhole has been opened to it. Data then continues to flow for packet_length/transfer_rate seconds. Then the path is freed. Therefore, the time-delay in the NIC model for reflecting the "ACK" signal should be packet_length (in bytes) divided by transfer_rate, plus the per xbar latency. This gives the time delay in uSec, since the RACEway transfer rate is 160-Bytes/uSec (=160MBytes/sec), and the per xbar delay is 3-clock ticks, or 75-nS (=0.075-uS). An additional fixed transfer overhead factor is added to the delay time. The constants for the RACEway transfer rate and overhead are defined as macros in the "parameters.sim" file. The macro names are: RACE_transfer_rate and RACE_transfer_ovrhd respectively.
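The arithmetic above can be checked with a short sketch. The constant and function names are hypothetical stand-ins for the macros in "parameters.sim". The sketch assumes the per-xbar latency is charged once for each crossbar traversed, and it defaults the overhead to zero because its value is not given here.

```python
RACE_RATE_BYTES_PER_USEC = 160.0   # 160 MBytes/sec
XBAR_LATENCY_USEC = 0.075          # 3 clock ticks, 75 nS


def race_ack_delay(packet_bytes, n_xbars, overhead_usec=0.0):
    """Delay in uSec before the destination NIC reflects the ACK."""
    transfer = packet_bytes / RACE_RATE_BYTES_PER_USEC
    return transfer + n_xbars * XBAR_LATENCY_USEC + overhead_usec
```

For a 2048-byte packet crossing two crossbars, the transfer takes 12.8 uSec and the crossbars add 0.15 uSec, for a total of 12.95 uSec before any overhead.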
The local side process handles outgoing messages one at a time. It waits for a message to come from the local side. When it does, it checks to see if the NIC is in the idle state. If it is, then it sets its state to sending, forwards the message out the remote side, and goes to sleep until it is reawakened when the returning "ACK" signal is received by the remote side process.
The remote side process watches for either an incoming "REQuest" or a returning "ACK" message. If a "request" is received, then the remote side process spawns a sub-thread (called EOM), that delays for the appropriate transfer time, awakens and sends the "ACK" signal back and also forwards the packet to the local_side port. This must be done by a separate thread, because the main remote_side process must always be ready to receive "PREEMPT" messages.
It waits for a control signal to arrive on its port. If the arriving control signal is a "REQuest", then it checks to see if the requested output port is available. If so, then it assigns the requested output port and this process's input port as being "a pair in-use", and forwards the request message out that requested output port. Otherwise, if the requested port is already in-use, then it changes the message to a NACK and reflects it back.
If a new request has a higher priority than an existing connection, then a "preempt" message is generated and sent out the forward port of the conflicting pair. Soon, a "done" (ACK) message will come back to release the conflicted ports, and when it does, then the preempting request is serviced.
If the arriving control signal is a "done" message, then it forwards the "done" message back out the port to which it was assigned, and de-assigns the port pair.
Because the data transfer delay for a packet is entirely accounted for with a time-delay statement within the NIC model before returning the "ACK" message, there must be no delays on the Raceway++ network links. Therefore, the data-rate of the links as specified in the CSIM topology table should be set to infinity (a very high number). To further understand how this Raceway transfer model operates, read the description of the RacePP_NIC model.
The Race++ crossbar model allows for the use of alternate redundant output ports
to be used in successive packet transfers. If any of the alternate ports is
busy, it attempts to route the data to an alternate port. The first output port
that it uses is the port specified by the transfer routing list. The use of
alternate output ports is specified by device attributes as follows:
pX_alternate = Y
where X is the output port number as used by the route_list and Y is the
alternate output port number to be attempted if X is busy. Any number of
available output ports may be used. For example, to use ports 1, 2 and 3 as
alternate output ports, set the following device attributes:
p1_alternate = 2
p2_alternate = 3
p3_alternate = 1
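The alternate-port lookup can be sketched as below. The function name and the dictionary form of the pX_alternate attributes are assumptions for illustration. What the model does when every port in the chain is busy is not stated here, so the sketch simply returns None in that case.

```python
def pick_output_port(first_port, alternates, busy):
    """Follow pX_alternate links from first_port until a free port is found.

    alternates -- dict such as {1: 2, 2: 3, 3: 1} built from pX_alternate = Y
    busy       -- set of output ports currently in use
    """
    port = first_port
    tried = set()
    while port is not None and port not in tried:
        if port not in busy:
            return port
        tried.add(port)
        port = alternates.get(port)   # next alternate, if one is defined
    return None                       # every port in the chain was busy
```

With the three attributes shown above, a transfer routed to port 1 while ports 1 and 2 are busy would go out port 3.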
This model consists of two main processes; one handles the local bus side and the other handles the Raceway or "remote" side.
Due to the nature of the blocking protocol of the Raceway++ network, the default transfer-delay mechanism of CSIM-links is not sufficient by itself to account for the link communication delays. (This is because we cannot know in general when a packet transfer actually begins moving data due to possible blocking.) Therefore, we have implemented a protocol that accurately accounts for RACEway++ data movement that uses two control-signal types. The first, called "REQuest", opens a pathway through the switch network, and the second, called "ACK", returns backward through the pathway after the appropriate transfer delay. As it does, the (ACK) signal closes the pathway.
Because the data transfer delay for a packet is entirely accounted for with a time-delay statement within the NIC model before returning the "ACK" message, there must be no delays on the network links. Therefore, the data-rate of the links as specified in the CSIM topology table should be set to infinity (a very high number). In the RACEway++ network, data begins flowing into the destination NIC once the wormhole has been opened to it. Data then continues to flow for packet_length/transfer_rate seconds. Then the path is freed. Therefore, the time-delay in the NIC model for reflecting the "ACK" signal should be packet_length (in bytes) divided by transfer_rate, plus the per xbar latency. This gives the time delay in uSec, since the RACEway transfer rate is 320-Bytes/uSec (=320MBytes/sec), and the per xbar delay is 3-clock ticks, or 37.5-nS (=0.0375-uS). An additional fixed transfer overhead factor is added to the delay time.
The local side process handles outgoing messages one at a time. It waits for a message to come from the local side. When it does, it checks to see if the NIC is in the idle state. If it is, then it sets its state to sending, forwards the message out the remote side, and goes to sleep until it is reawakened when the returning "ACK" signal is received by the remote side process.
The remote side process watches for either an incoming "REQuest" or a returning "ACK" message. If a "request" is received, then the remote side process spawns a sub-thread (called EOM), that delays for the appropriate transfer time, awakens and sends the "ACK" signal back and also forwards the packet to the local_side port. This must be done by a separate thread, because the main remote_side process must always be ready to receive "PREEMPT" messages.
The constants for the RACEway++ transfer rate, overhead and XBAR latency
are defined as macros in the "parameters.sim" file, as are the packetization
parameters for packet size and packet header size.
The macro names are: RACEPP_transfer_rate, RACEPP_transfer_ovrhd,
RACEPP_xbar_latency, PACKET_SIZE and PACKET_HEADER_SIZE, respectively.
They may be overwritten by instance attributes called: racepp_transfer_rate,
racepp_transfer_ovrhd, racepp_xbar_latency, racepp_packet_size and racepp_packet_header_size,
respectively.
It waits for a control signal to arrive on its port. If the arriving control signal is a "REQ", then it checks to see if the requested output port is available. If so, then it assigns the requested output port and this process's input port as being "a pair in-use", and forwards the request-message out that requested output port. Otherwise, if the requested port is already in-use, then it queues the request on the port's waiting-queue.
If the arriving control signal is an ACK message, then it forwards the ACK message back out the port to which it was assigned, de-assigns the port pair, and checks to see if there are any waiting REQ messages in either of the port's waiting-queues. If there are, then it activates them as described above (as if they just arrived).
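The queueing behaviour of this switch can be sketched as follows. The class and method names are hypothetical, and the sketch assumes each port takes part in at most one pair at a time. Unlike the crossbars that reflect a NACK, a blocked REQ here waits in the requested port's queue.

```python
from collections import defaultdict, deque


class SwitchSketch:
    """Port pairs in use plus a waiting-queue per port (hypothetical sketch)."""

    def __init__(self):
        self.partner = {}                  # port -> paired port while in use
        self.waiting = defaultdict(deque)  # output port -> queued (in, out) REQs

    def on_req(self, in_port, out_port):
        if out_port in self.partner:
            self.waiting[out_port].append((in_port, out_port))
            return "QUEUED"
        self.partner[in_port] = out_port   # mark the pair in-use
        self.partner[out_port] = in_port
        return "FORWARD"

    def on_ack(self, out_port):
        in_port = self.partner.pop(out_port)   # de-assign the port pair
        self.partner.pop(in_port, None)
        restarted = []
        for port in (out_port, in_port):
            # Re-activate a waiting REQ as if it had just arrived.
            if self.waiting[port] and port not in self.partner:
                req_in, req_out = self.waiting[port].popleft()
                restarted.append((req_in, req_out, self.on_req(req_in, req_out)))
        return restarted
```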
This model consists of two main processes; one handles the local processor bus side and the other handles the Myrinet or "remote" side. The remote side contains two states and uses a second thread (b) to represent the second state.
Due to the nature of the blocking protocol of the Myrinet network, the inherent transfer-delay mechanism of CSIM is not sufficient by itself to account for the link communication delays. (This is because we cannot know in general when a packet transfer actually begins moving data due to possible blocking.) Therefore, we have implemented a protocol that accurately accounts for Myrinet data movement that uses two control-signal types. The first, called "REQ", opens a pathway through the switch network, and the second, called "ACK", returns backward through the pathway after the appropriate transfer delay. As it does, the ACK signal closes the pathway.
Because the data transfer delay for a packet is entirely accounted for with a time-delay statement within the LANai model before returning the ACK message, there must be no delays on the network links. Therefore, the data-rate of the links as specified in the CSIM topology table should be set to infinity (a very high number). In the Myrinet network, data begins flowing into the destination LANai once the wormhole has been opened to it. Data then continues to flow for packet_length/transfer_rate seconds. Then the path is freed. An additional fixed transfer overhead factor is added to the delay time. The constants for the Myrinet transfer rate and overhead are defined as macros in the "parameters.sim" file. The macro names are: Mnet_transfer_rate and Mnet_transfer_ovrhd respectively.
The LANai supports full duplex communication on the Myrinet side. The local side process handles outgoing messages one at a time. It waits for a message to come from the local side. When it does, it simply forwards it out the remote side, sets its state to pending_send, and goes to sleep until it is reawakened when the returning ACK signal is received by the remote side process.
The remote side process watches for either an incoming REQ or a returning
ACK message. If a REQ is received, then the remote side process spawns
a sub-thread (called remote_side_b), that delays for the appropriate transfer
time, awakens and sends the ACK signal back and also forwards the packet
to the local_side port. This must be done by a separate thread, because
the main remote_side process must always be ready to receive ACK messages
in response to outgoing packets from the local side. When an ACK signal
is encountered, the remote_side process simply retriggers the local_side
process which was pending.
(Questions, Comments, & Suggestions: cstrasbe@atl.lmco.com)