Introduction

Abstractions

The MRNet distribution has two main components: libmrnet , a library that is linked into a tool's front-end and back-end components, and mrnet_commnode , a program that runs on intermediate nodes interposed between the application front-end and back-ends. libmrnet exports an API (see See C++ API Reference) that enables I/O interaction between the front-end and groups of back-ends via MRNet. The primary purpose of mrnet_commnode is to distribute data processing functionality across multiple computer hosts and to implement efficient and scalable group communications. In addition, there is another component, libmrnet_lightweight , which exports an API (see See C API Reference) that enables I/O interaction between the front-end and groups of "lightweight" back-ends via MRNet. The following sub-sections describe the lower-level components of the MRNet API in more detail.

End-Points

Communicators

Streams

Filters

See Adding New Filters describes facilities for adding new user-defined transformation and synchronization filters.

A Simple Example

The MRNet Interface

MRNet Instantiation

The MRNet API

Standard MRNet relies on the back-end nodes supporting C++ libraries. However, we have also created a lightweight backend library with a pure C interface. The instantiation process is the same and both methods of process instantation are supported, although the API interface is slightly different.

C++ API Reference

Class Network

The corresponding lightweight backend API class is See Class Network.

Network * Network::CreateNetworkFE(

const char * topology,

const char * backend_exe,

const char ** backend_argv,

const std::map< std::string, std::string>* attrs=NULL,

bool rank_backends = true,

bool using_memory_buffer = false );

The front-end constructor method that is used to instantiate the MRNet process tree. topology is the path to a configuration file that describes the desired process tree topology.

backend_exe is the path to the executable to be used for the application's back-end processes. backend_argv is a null terminated list of arguments to pass to the back-end application upon creation. If backend_exe is NULL, no back-end processes will be started, and the leaves of the topology specified by topology will be instances of mrnet_commnode .

attrs is a pointer to a map of attribute-value string pairs. attrs is currently used only on Cray XT platforms; for other platforms, it takes the default value of NULL. On Cray XT, when communication or back-end processes of the MRNet tree are to be co-located with application processes, attrs must contain a string pair that maps the string "apid" to a valid ALPS application id string, which is a unique identifier for an application process started using ALPS aprun.

rank_backends indicates whether the back-end process ranks should begin at 0, similar to MPI rank numbering, and defaults to true.

If using_memory_buffer is set to true (default is false), the topology parameter is actually a pointer to a memory buffer containing the specification, rather than the name of a file.

When this function completes without error, all MRNet processes specified in the topology will have been instantiated. You may use Network::has_error to check for successful completion. The explicit use of the Network constructor is now deprecated.

Network * Network::CreateNetworkBE( int argc, char ** argv );

The back-end constructor method that is used when the process is started due to a front-end network instantiation. MRNet automatically passes the necessary information to the back-end process using the program argument vector ( argc/argv ) by inserting it after the user-specified arguments. The explicit use of the Network constructor is now deprecated.

In the "back-end attach" mode of network instantiation, where the back-end is not launched directly by MRNet, the back-end program must construct a suitable argument vector. Typically, the front-end program will obtain information about the leaf mrnet_commnode processes using the NetworkTopology class, and pass this information to back-ends using external communication channels (e.g., a shared file system). The back-ends choose a leaf process as a parent, and use that parent's host, port, and rank information to attach. Each back-end must choose a unique value for its local rank; this value must be larger than any of the ranks of the processes in the existing network. The following code shows how to construct a valid argument vector:

char parHostname[64], myHostname[64], parPort[6], parRank[6], myRank[6];

// fill parent data here using info from front-end

gethostname( myHostname, 64 );

sprintf( myRank, "%d", <unique rank> );

be_argc = 6;

char* be_argv[be_argc];

be_argv[0] = argv[0];

be_argv[1] = parHostname;

be_argv[2] = parPort;

be_argv[3] = parRank;

be_argv[4] = myHostname;

be_argv[5] = myRank;

void Network::~Network();

Network::~Network tears down the MRNet process tree when the Network object is deleted. Note that Network::shutdown_Network is deprecated.

void Network::waitfor_ShutDown();

Network::waitfor_ShutDown can be used by back-ends to block until the network has been shut down by the front-end.

bool Network::is_ShutDown();

Back-ends use this method to query if the network has been shut down; returns true if it has been shut down, false otherwise.

bool Network::set_FailureRecovery( bool enable );

Network::set_FailureRecovery is used by a front-end to control whether internal communication processes and back-ends will automatically re-connect to a new parent when their parent terminates unexpectedly. By default, failure recovery is enabled and processes will re-connect. Call this method with enable set to false to turn off automatic failure recovery. This method returns true if the setting has been applied successfully, false otherwise.

bool Network::has_Error( );

Network::has_error returns true if an error has occured during the last call to a Network method. Network::print_error can be used to print a message describing the exact error.

ErrorCode Network::get_Error( );

Network::get_Error returns an ErrorCode for an error that occured during the last call to a Network method. Network::get_ErrorStr can be used to retrieve a message string describing the error.

const char * Network::get_ErrorStr( ErrorCode code );

Network::get_ErrorStr returns a character string describing the error indicated by code.

void Network::print_error( const char * error_msg );

Network::print_error prints a message to stderr describing the last error encountered during a Network method. It first prints the null-terminated string error_msg followed by a colon, then the actual error message followed by a newline.

std::string Network::get_LocalHostName();

Network::get_LocalHostName returns the name of the host on which the local MRNet process is running.

Port Network::get_LocalPort();

Network::get_LocalPort returns the listening port of the local MRNet process.

Rank Network::get_LocalRank();

Network::get_LocalRank returns the rank of the local MRNet process.

int Network::load_FilterFunc( const char * so_file, const char* func );

This method, used for loading new filter operations into the Network is conveniently similar to the conventional dlopen facilities for opening a shared object and dynamically loading symbols defined within.

so_file is the path to a shared object file that contains the filter function to be loaded and func_name is the name of the function to be loaded.

On success, Network::load_FilterFunc returns the id of the newly loaded filter which may be used in subsequent calls to Network::new_Stream . A value of -1 is returned on failure.

int Network::recv(

int * tag,

PacketPtr & packet,

Stream ** stream,

bool blocking = true );

Network::recv is used to invoke a stream-anonymous receive operation. Any packet available (i.e., addressed to any stream) will be returned (in roughly FIFO order).

otag will be filled in with the integer tag value that was passed by the corresponding Stream::send operation. packet is the packet that was received. A pointer to the stream to which the packet was addressed will be returned in stream .

blocking is used to signal whether this call should block or return if data is not immediately available; it defaults to a blocking call.

A return value of -1 indicates an error, 0 indicates no packets were available, and 1 indicates success.

bool Network::enable_PerformanceData(

perfdata_metric_t metric,

perfdata_context_t context );

Network::enable_PerformanceData uses Stream::enable_PerformanceData to start the recording of performance data of the specified metric type for the given context on all streams. Returns true on success, false otherwise. See MRNet Stream Performance Data describes the supported metric and context types. See Stream::enable_PerformanceData for additional details.

bool Network::disable_PerformanceData(

perfdata_metric_t metric,

perfdata_context_t context );

Network::disable_PerformanceData stops the recording of performance data of the specified metric type for the given context on all streams. Returns true on success, false otherwise. See Stream::disable_PerformanceData for additional details.

bool Network::collect_PerformanceData(

std::map< int, rank_perfdata_map > & results,

perfdata_metric_t metric,

perfdata_context_t context,

int aggr_filter_id = TFILTER_ARRAY_CONCAT );

Network::collect_PerformanceData collects the performance data of the specified metric type for the given context on all streams. The performance data of each stream is passed through the transformation filter identified by aggr_filter_id . The data for all streams is stored within the map results , keyed by stream identifier. Returns true on success, false otherwise. See Stream::collect_PerformanceData for additional details.

void Network::print_PerformanceData(

perfdata_metric_t metric,

perfdata_context_t context );

Network::enable_PerformanceData uses Stream::print_PerformanceData to print recorded performance data of the specified metric type for the given context on all streams. Data is printed to the MRNet log files. See Stream::print_PerformanceData for additional details.

unsigned int Network::num_EventsPending( );

Network::num_EventsPending returns the number of pending events available for retrieval using Network::next_Event .

Event * Network::next_Event( );

This method returns a pointer the next pending Event, or NULL if no events are available. Each event has an associated EventClass, one of Event::DATA_EVENT , Event::TOPOLOGY_EVENT , or Event::ERROR_EVENT , that can be queried using Event::get_Class. Similarly, each event has an associated EventType that can be queried using Event::get_Type.

void Network::clear_Events( );

This method clears all pending events.

bool Network::register_EventCallback(

EventClass eclass,

EventType etyp,

evt_cb_func cb_func,

void * cb_func_data );

Network::register_EventCallback allows users to register a callback function to be called when events are generated.

eclass should be set to one of Event::DATA_EVENT , Event::TOPOLOGY_EVENT , or Event::ERROR_EVENT .

etyp should be set to either Event::EVENT_TYPE_ALL , to have the function called when any event within the specified EventClass occurs, or one of the valid class-specific EventType values (see the classes DataEvent, TopologyEvent, and ErrorEvent in "mrnet/Event.h" for the class-specific types).

The type evt_cb_func is defined as `void (*evt_cb_fn)( Event* e, void* cb_data )'. All user-defined callback functions must use the same function prototype. When an event occurs, all callbacks registered for that type of event will be called. Each function is passed a pointer to the Event , and the value of the auxiliary data pointer cb_func_data given at registration, which may be NULL.

void Network::remove_EventCallback(

evt_cb_func cb_func,

EventClass eclass,

EventType etyp );

This method removes cb_func from the list of functions to be called for the specified EventClass and EventType. If eclass is given as Event::EVENT_CLASS_ALL , the function will be removed for all events. etyp can be given as Event::EVENT_TYPE_ALL to remove the function for all types of events in the given eclass .

void Network::remove_EventCallbacks(

EventClass eclass,

EventType etyp );

This method removes all functions to be called for the specified EventClass and EventType. If eclass is given as Event::EVENT_CLASS_ALL , all callback functions will be removed for all events. etyp can be given as Event::EVENT_TYPE_ALL to remove all functions registered for all types of events in the given eclass .

int Network::get_EventNotificationFd( EventClass eclass );

Network::get_EventNotificationFd returns a file descriptor that can be used with select or poll to receive notification of interesting DATA, TOPOLOGY, or ERROR events.

eclass should be set to one of Event::DATA_EVENT , Event::TOPOLOGY_EVENT , or Event::ERROR_EVENT . Event::DATA_EVENT can be used by both front-end and back-end processes to provide notification that one or more data packets have been received. Event::TOPOLOGY_EVENT and Event::ERROR_EVENT can only be used by front-end processes, and provide notification when the front-end observes a change in network topology or an error, respectively.

When the file descriptor has data available (for reading), you should call Network::clear_EventNotificationFd before taking action on the notification. When notifications are no longer needed, use Network::close_EventNotificationFd .

NOTE: this functionality is not available on Windows platforms.

void Network::clear_EventNotificationFd( EventClass eclass );

This method resets the event notification file descriptor returned from Network::get_EventNotificationFd . eclass should be set to one of Event::DATA_EVENT , Event::TOPOLOGY_EVENT , or Event::ERROR_EVENT .

NOTE: this functionality is not available on Windows platforms.

void Network::close_EventNotificationFd( EventClass eclass );

This method closes the event notification file descriptor returned from Network::get_EventNotificationFd . eclass should be set to one of Event::DATA_EVENT , Event::TOPOLOGY_EVENT , or Event::ERROR_EVENT .

NOTE: this functionality is not available on Windows platforms.

bool is_LocalNodeChild( ) const;

bool is_LocalNodeParent( ) const;

bool is_LocalNodeInternal( ) const;

bool is_LocalNodeFrontEnd( ) const;

bool is_LocalNodeBackEnd( ) const;

These methods return true if the local process is of the specified type, false otherwise.

Class NetworkTopology
  • Instances of NetworkTopology are network specific, so they are created when a Network is instantiated. MRNet API users should not need to create their own NetworkTopology instances.
  • The corresponding lightweight backend API class is See Class NetworkTopology.

NetworkTopology * Network::get_NetworkTopology( );

Network::get_NetworkTopology is used to retrieve a pointer to the underlying NetworkTopology instance of a Network.

unsigned int NetworkTopology::get_NumNodes( );

This method returns the total number of nodes in the tree topology, including front-end, internal, and back-end processes.

NetworkTopology::Node * NetworkTopology::find_Node( Rank node_rank );

This method returns a pointer to the tree node with rank equal to node_rank , or NULL if not found.

NetworkTopology::Node * NetworkTopology::get_Root( );

This method returns a pointer to the root node of the tree, or NULL if not found.

void NetworkTopology::get_Leaves(

std::vector<NetworkTopology::Node * > & leaves );

This method fills the leaves vector with pointers to the leaf nodes in the topology. In the case where back-end processes are not started when the network is instantiated, a front-end process can use this function to retrieve information about the leaf internal processes to which the back-ends should attach.

void NetworkTopology::get_BackEndNodes(

std::set< NetworkTopology::Node * > & nodes );

This method fills a set with pointers to all back-end process tree nodes. Note that this method is unsafe to use while the network topology is in flux, as is the case during the "back-end attach" mode of MRNet tree instantiation.

void NetworkTopology::get_ParentNodes(

std::set<NetworkTopology::Node * > & nodes );

This method fills a set with pointers to all tree nodes that are parents (i.e., those nodes having at least one child).

void NetworkTopology::get_OrphanNodes(

std::set< NetworkTopology::Node * > & nodes );

This method fills a set with pointers to all tree nodes that have no parent due to a failure.

void NetworkTopology::get_TreeStatistics(

unsigned int & num_nodes,

unsigned int & depth,

unsigned int & min_fanout,

unsigned int & max_fanout,

double & avg_fanout,

double & stddev_fanout );

This method provides users statistics about the tree topology by setting the passed parameters.

num_nodes is the total number of tree nodes (same as the value returned by NetworkTopology::get_NumNodes ), depth is the depth of the tree (i.e., the maximum path length from root to any leaf), min_fanout is the minimum number of children of any parent node, max_fanout is the maximum number of children of any parent node, avg_fanout is the average number of children across all parent nodes, and stddev_fanout is the standard deviation in number of children across all parent nodes.

void NetworkTopology::print_TopologyFile( const char * filename );

This method will create (or overwrite) the specified topology file filename using the current state of this NetworkTopology object.

void NetworkTopology::print_DOTGraph( const char * filename );

This method will create (or overwrite) the specified dot graph file filename using the current state of this NetworkTopology object.

std::string NetworkTopology::Node::get_HostName( );

This method returns a string identifying the hostname of the tree node.

Port NetworkTopology::Node::get_Port();

This method returns the listening port of the tree node.

Rank NetworkTopology::Node::get_Rank();

This method returns the unique rank of the tree node.

Rank NetworkTopology::Node::get_Parent();

This method returns the rank of the tree node's parent.

const std::set< NetworkTopology::Node * > & NetworkTopology::Node::get_Children( );

This method returns a set containing pointers to the children of the tree node, and is useful for navigating through the tree.

unsigned int NetworkTopology::Node::get_NumChildren( );

This method returns the number of children of the tree node.

unsigned int NetworkTopology::Node::find_SubTreeHeight( );

This method returns the height of the subtree rooted at this NetworkTopology node.

Class Communicator
  • Instances of Communicator are network specific, so their creation methods are functions of an instantiated Network object. There is no corresponding lightweight backend class.

Communicator * Network::new_Communicator();

This method returns a pointer to a new Communicator object. The object contains no end-points. Use Communicator::add_EndPoint to populate the communicator.

Communicator * Network::new_Communicator( Communicator & comm );

This method returns a pointer to a new Communicator object that contains the same set of end-points contained in comm .

Communicator * Network::new_Communicator(

std::set< CommunicationNode * > & endpoints );

This method returns a pointer to a new Communicator object that contains the provided set of end-points.

Communicator * Network::new_Communicator( std::set< Rank > & endpoints );

This method returns a pointer to a new Communicator object that contains the set of end-points corresponding to processes whose ranks are provided in the passed set.

Communicator * Network::get_BroadcastCommunicator( );

This method returns a pointer to a broadcast Communicator containing all the end-points available in the system at the time the function is called.

Multiple calls to this method return the same pointer to the Communicator object created at network instantiation. If the network topology changes, as can occur when starting back-ends separately, the object will be updated to reflect the additions or deletions. This object should not be deleted.

bool Communicator::add_EndPoint( Rank ep_rank );

This method is used to add an existing end-point with rank ep_rank to the set contained by this Communicator.

If the set of end-points in the communicator already contains the new end-point, the function returns success. This method fails if there exists no end-point defined by ep_rank . This method returns true on success, false on failure.

bool Communicator::add_EndPoint( CommunicationNode * endpoint );

This method is similar to add_EndPoint above except that it takes a pointer to a CommunicationNode object instead of a rank. Success and failure conditions are exactly as stated above. This method returns true on success and false on failure.

const std::set< CommunicationNode * > & Communicator::get_EndPoints( );

Returns a reference to the set of CommunicationNode pointers comprising the end-points in the communicator.

std::string CommunicationNode::get_HostName( );

Returns a character string identifying the hostname of the end-point represented by this CommunicationNode.

Port CommunicationNode::get_Port( );

Returns the listening port of the end-point represented by this CommunicationNode.

Rank CommunicationNode::get_Rank( );

Returns the rank of the end-point represented by this CommunicationNode.

Class Stream
  • Instances of Stream are network specific, so their creation methods are functions of an instantiated Network object. The corresponding lightweight backend API class is See Class Stream.
  • MRNet provides two types of streams, homegenous and heterogeneous. Homogenous streams use the same filters at every process participating in the stream, while heterogeneous streams allow for different filters to be used at different processes.

Stream * Network::new_Stream(

Communicator * comm,

int up_transfilter_id = TFILTER_NULL,

int up_syncfilter_id = SFILTER_WAITFORALL,

int down_transfilter_id = TFILTER_NULL );

This version of Network::new_Stream is used to create a homogenous Stream object attached to the end-points specified by a Communicator object comm .

up_transfilter_id specifies the transformation filter to apply to data flowing upstream from the application back-ends toward the front-end; the default value is TFILTER_NULL.

up_syncfilter_id specifies the synchronization filter to apply to upstream packets; the default value is SFILTER_WAITFORALL.

down_transfilter_id allows the user to specify a filter to apply to downstream data flows; the default value is TFILTER_NULL.

Stream * Network::new_Stream(

Communicator * comm,

std::string us_filters,

std::string sync_filters,

std::string ds_filters );

This version of Network::new_Stream is used to creae a heterogeneous Stream object. Users specify where packet filters are placed within the tree. Like the homogenous version of Network::new_Stream , the end-points are specified by the comm argument.

Strings are used to specify the filter placements, with the following syntax: "filter_id => rank; [filter_id => rank; ...]". If "*" is specified as the rank for an assignment, the filter will be assigned to all ranks that have not already been assigned. If a rank within comm is not assigned a filter, it will use the default filter. See $MRNET_ROOT/Examples/HeterogeneousFilters for an example of using Network::new_Stream to specify different filter types to be used within the same stream.

us_filters specifies the transformation filters to apply to data flowing upstream from the application back-ends toward the front-end.

sync_filters specifies the synchronization filters to apply to upstream packets.

ds_filters allows the user to specify filters to apply to downstream data flows.

Note that more than one filter should not be assigned to a single rank in any of these strings.

Stream * Network::get_Stream( unsigned int id );

Returns a pointer to the Stream identified by id , or NULL on failure.

unsigned int Stream::get_Id( );

Returns the integer identifier for this Stream.

const std::set< Rank > & Stream::get_EndPoints( );

Returns the set of end-point ranks for this Stream.

unsigned int Stream::size( );

Returns an integer indicating the number of end-points for this Stream.

bool Stream::is_ShutDown( );

For use by back-ends only, this method returns true if the front-end has deleted this Stream, false otherwise.

int Stream::send( int tag, const char * format_string, ... );

Invokes a data send operation on the calling Stream. tag is an integer that identifies the data in the packet. format_string is a format string describing the data in the packet (See See Format Strings for a full description.) On success, Stream::send returns 0; on failure -1.

NOTE: tag must have a value greather than or equal to the constant FirstApplicationTag defined by MRNet ( #include "mrnet/Types.h" ). Tag values less than FirstApplicationTag are reserved for internal MRNet use.

int Stream::flush();

Commits a flush of all packets currently buffered by this Stream. A successful return value of 1 indicates that all packets on the calling stream have been passed to the operating system for network transmission.

int Stream::recv( int * tag, PacketPtr & packet, bool blocking = true );

Invokes a stream receive operation. Packets received by the calling Stream will be returned by this method, one-at-a-time, in FIFO order.

tag will be filled in with the integer tag value that was passed by the corresponding Stream::send operation. packet is set to point to the received packet.

blocking determines whether the receive should block or return if data is not immediately available; it defaults to a blocking call.

A return value of -1 indicates an error, 0 indicates no packets were available, and 1 indicates success.

int Stream::get_DataNotificationFd( );

Stream::get_DataNotificationFd returns a file descriptor that can be used with select or poll to receive notification that data has arrived for a stream.

When the file descriptor has data available (for reading), you should call Stream::clear_DataNotificationFd before taking action on the notification. When notifications are no longer needed, use Stream::close_DataNotificationFd .

NOTE: this functionality is not available on Windows platforms.

void Stream::clear_DataNotificationFd( );

This method resets the data notification file descriptor returned from Stream::get_DataNotificationFd .

NOTE: this functionality is not available on Windows platforms.

void Stream::close_DataNotificationFd( );

This method closes the data notification file descriptor returned from Stream::get_DataNotificationFd .

NOTE: this functionality is not available on Windows platforms.

int Stream::set_FilterParameters(

FilterType ftype,

const char *format_str, ... ) const;

Stream::set_FilterParameters allows users to dynamically configure the operation of a stream transformation filter by passing arbitrary data in a similar fashion to Stream::send . When the filter executes, the passed data is available as a PacketPtr parameter to the filter, and the filter can extract the configuration settings.

ftype should be given as FILTER_UPSTREAM_SYNC to configure the synchronization filter, FILTER_UPSTREAM_TRANS for upstream transformation filter and FILTER_DOWNSTREAM_TRANS for downstream transformation filter.

int Stream::set_FilterParameters(

const char *format_str,

va_list params,

FilterType ftype ) const;

This method is the same as the previous method except for the filter configuration parameters are given in the va_list form.

bool Stream::enable_PerformanceData(

perfdata_metric_t metric,

perfdata_context_t context );

Stream::enable_PerformanceData starts recording performance data for the specified metric type for the given context . Returns true on success, false otherwise. See MRNet Stream Performance Data describes the metric and context types.

bool Stream::disable_PerformanceData(

perfdata_metric_t metric,

perfdata_context_t context );

Stream::disable_PerformanceData stops recording performance data for the specified metric type for the given context . Previously recorded data is not discarded, so that it can be retrieved with Stream::collect_PerformanceData . Users can enable/disable recording for a particular metric and context any number of times before collecting the results. Returns true on success, false otherwise.

bool Stream::collect_PerformanceData(

rank_perfdata_map & results,

perfdata_metric_t metric,

perfdata_context_t context,

int aggr_filter_id = TFILTER_ARRAY_CONCAT );

Stream::collect_PerformanceData collects the recorded performance data for the specified metric type for the given context . The collected data is returned in a rank_perfdata_map , which associates individual node ranks to a std::vector< perf_data_t > containing the recorded data instances. After collection, the recorded data at each nodeis discarded. Returns true on success, false otherwise.

Users can aggregate the recorded data across nodes by specifying a transformation filter with aggr_filter_id . Note that only the built-in filter types of TFILTER_SUM, TFILTER_MIN, TFILTER_MAX, TFILTER_AVG, and TFILTER_ARRAY_CONCAT are supported. By default, performance data from each node is concatenated, and results contains every recorded data instance for each node. If the summary aggregation filters are used, results will contain a single associated pair. The rank for this pair is equal to -1∞(number of aggregated ranks), and the vector contains one or more aggregated instances.

void Stream::print_PerformanceData(

perfdata_context_t metric,

perfdata_context_t context );

Stream::print_PerformanceData prints recorded performance data of the specified metric type for the given context . At each rank, the data is printed to the MRNet log files and then discarded.

Class Packet
  • A Packet encapsulates a set of formatted data elements sent on a stream. Packets are created using a format string (e.g., " %s %d " describes a null-terminated string followed by a 32-bit integer, and the packet is said to contain two data elements). MRNet front-end and back-end processes do not create instances of Packet; instead they are automatically produced from the formatted data passed to Stream::send . See Format Strings contains the full listing of data types that can be sent in a Packet.
  • When receiving a packet via Stream::recv or Network::recv , the Packet instance is stored within a PacketPtr object. PacketPtr is a class based on the Boost library shared_ptr class, and helps with memory management of packets. A PacketPtr can be assumed to be equivalent to "Packet *", and all operations on packets require use of PacketPtr.
  • The corresponding lightweight backend API class is See Class Packet.

int Packet::get_Tag( );

Returns the integer tag associated with this Packet.

unsigned short Packet::get_StreamId( );

Returns the stream id associated with this Packet.

const char * Packet::get_FormatString( );

Returns the character string specifying the data format of this Packet.

void Packet::unpack( const char * format_string, ... );

Extracts data contained within this Packet according to the format_string , which must match that of the packet. The function arguments following format_string should be pointers to the appropriate types of each data item. For string and array data types, new memory buffers to hold the data will be allocated using malloc , and it is the user's responsibility to free these strings and arrays. Note that for array data elements, an extra argument must be passed to hold each array's length.

void Packet::set_DestroyData( bool destroy );

This method can be used to tell MRNet whether or not to deallocate the string and array data members of a Packet. If destroy is true, string and array data members will be deallocated using free when the Packet destructor is executed - this assumes they were allocated using malloc . The default behavior for user-generated packets is not to deallocate (false). Turning on deallocation is useful in filter code that must allocate strings or arrays for output packets, which cannot be freed before the filter function returns.

C API Reference

return_type class::function_name( param1_type param1, ...);

translates to:

return_type class_function_name(

class class_object,

param1_type param1, ...);

Class Network

Network_t * Network_CreateNetworkBE( int argc, char ** argv );

The back-end constructor method. MRNet automatically passes the necessary information to the back-end process using the program argument vector ( argc/argv ) by inserting it after the user specified arguments. See See Network * Network::CreateNetworkBE( int argc, char ** argv ); for more information on the required arguments.

void delete_Network_t( Network_t * network );

delete_Network_t acts as a destructor for the Network_t object and cleans up internal structures before freeing the Network_t pointer.

void Network_waitfor_ShutDown( Network_t * network );

Network_waitfor_ShutDown blocks until the network has been shut down.

char Network_is_ShutDown( Network_t * network );

Returns true if the network has been shut down.

char* Network_get_LocalHostName( Network_t * network );

Network_get_LocalHostName returns the name of the host where the process is running.

Port Network_get_LocalPort( Network_t * network );

Network_get_LocalPort returns the listening port of the local process.

Rank Network_get_LocalRank( Network_t * network );

Network_get_LocalRank returns the rank of the local process.

int Network_recv(

Network_t * network,

int otag,

Packet_t * packet,

Stream_t * stream );

Network_recv is used to invoke a stream-anonymous receive operation. Any packet available (i.e., addressed to any stream) will be returned in roughly FIFO order.

otag will be filled in with the integer tag value that was passed by the corresponding Stream_send operation. packet is the packet that was received. A pointer to the Stream_t to which the packet was addressed will be returned in stream .

In standard MRNet, Network::recv had an additional parameter, blocking , to indicate whether this call should block or return if data is not immediately available. However, because the lightweight back-ends are single-threaded, there is only the blocking option; therefore this parameter has been omitted.

A return value of -1 indicates an error and 1 indictes a success.

Class NetworkTopology

NetworkTopology_t * Network_get_NetworkTopology( Network_t * network );

Network_get_NetworkTopology is used to retrieve a pointer to the underlying NetworkTopology_t instance within network .

Node_t * NetworkTopology_find_Node(

NetworkTopology_t * net_top,

Rank node_rank );

This method returns a pointer to the topology node with rank equal to node_rank , or NULL if no match is found.

Node_t * NetworkTopology_get_Root( NetworkTopology_t * net_top );

This method returns a pointer to the root node of the tree, or NULL if not found.

char * NetworkTopology_Node_get_HostName( Node_t * node );

This method returns a string identifying the hostname of the node .

Port NetworkTopology_Node_get_Port( Node_t * node );

This method returns the listening port of the node .

Rank NetworkTopology_Node_get_Rank( Node_t * node );

This method returns the rank of the node .

Rank NetworkTopology_Node_get_Parent( Node_t * node );

This method returns the rank of the node's parent.

unsigned int NetworkTopology_Node_find_SubTreeHeight( Node_t * node );

This method returns the height of the sub-tree rooted at the node .

Class Stream

void delete_Stream_t( Stream_t * stream );

delete_Stream_t acts as a destructor for the Stream_t object and cleans up internal structures before freeing the Stream_t pointer.

Stream_t * Network_get_Stream( Network_t * network, unsigned int id );

Network_get_Stream returns a pointer to a Stream_t identified by id , or NULL on failure.

unsigned int Stream_get_Id( Stream_t * stream );

This method returns the integer identifier for this Stream_t.

int Stream_send(

Stream_t * stream,

int tag,

const char * format_string, ... );

This method sends data on stream . tag is an integer that identifies the data to be sent by the stream. format_string is a format string describing the types of the data elements (see See Format Strings for a full description.) On success, Stream_send returns 0; on failure, -1.

NOTE: tag must have a value greater than or equal to the constant FirstApplicationTag defined by MRNet ( #include "mrnet_lightweight/Types.h" ). Tag values less than FirstApplicationTag are reserved for internal MRNet use.

int Stream_flush( Stream_t * stream );

This operation is currently not required in lightweight MRNet, as Stream_send will deliver the data for network transmission. This method will always return the value 1 for success.

int Stream_recv(

Stream_t * stream,

int * tag,

Packet_t * packet );

Stream_recv invokes a stream-specific receive operation. Packets addressed to the passed stream will be returned, one-at-a-time, in FIFO order.

tag will be filled in with the integer tag value that was passed by the corresponding Stream::send operation. packet is the received Packet_t.

Unlike the standard C++ Stream::recv , Stream_recv will always block if data is not immediately available.

A return value of -1 indicates an error and 1 indicates success.

Class Packet
  • When receiving a packet, it is stored within a Packet_t object. Note that standard MRNet makes use of the PacketPtr object, which is based on the Boost library shared_ptr class. However, in the lightweight back-end library, pointers to Packet_t objects are used instead.

int Packet_get_Tag( Packet_t * packet );

This method returns the integer tag associated with packet .

unsigned short Packet_get_StreamId( Packet_t * packet );

This method returns the stream id associated with packet .

char* Packet_get_FormatString( Packet_t * packet );

This method returns the character string specifying the data format of packet .

void Packet_unpack(

Packet_t * packet,

const char * format_string, ... );

This method extracts data elements contained within packet according to the format_string , which must match that of packet . The function arguments following format_string should be pointers to the appropriate types of each data element. For string and array data types, new memory bufffers to hold the data will be allocated using malloc , and it is the user's responsibility to free these strings and arrays. Note that for array data elements, an extra argument must be passed to hold each array's length.

Building and Testing MRNet

Supported Platforms and Compilers

MRNet has been developed to be highly portable; we expect it to run properly on all common Unix-based as well as Windows platforms. This being said, we have successfully built and tested MRNet on the following systems:

System Requirements

Build Configuration

UNIX> ./configure --help

shows all possible options of the command. Below, we display the MRNet-specific ones:

 

--enable-shared

Build shared library versions of MRNet and XPlat

--enable-debug

Build MRNet and XPlat with debug information

--enable-verbosebuild

Show build actions (useful for debugging build problems)

--with-startup=METHOD

Choose tree instantiation method: "ssh" (default), or "cray_xt" (Cray XT systems)

 

 

 

--with-alpstoolhelp-lib=DIR

 

 

--with-alpstoolhelp-inc=DIR

For Cray XT only, when co-locating MRNet processes with an already running application launched using ALPS.

 

Specify DIR as the absolute path to the directory containing the libalps library.

 

Specify DIR as the absolute path to the directory containing the libalps.h header file.

./configure without any options should give reasonable results, but the user may specify certain options. For example,

UNIX> ./configure CXX=g++

instructs the configure script to use g++ for the C++ compiler.

Compilation and Installation

To build MRNet:

UNIX> make

After a successful build, the following files will be present:

To build the MRNet tests and examples:

UNIX> make tests

UNIX> make examples

The tests and examples consist of front-end and back-end programs, and custom filter libraries:

To install the MRNet components (i.e., the executables, libraries, and headers) to the directories specified during configure. If --prefix is not provided to configure, the default install locations are /usr/local/{bin,lib,include}/ :

UNIX> make install

To install the MRNet tests or examples:

UNIX> make install-tests

UNIX> make install-examples

If your system does not provide the C++ Boost headers (normally installed in /usr/include/boost ), we provide the subset of Boost header files necessary for building MRNet. To install these headers:

UNIX> make install-boost

Testing the Code

The shell script, mrnet_tests.sh is placed in the binary directory with the other executables during the building of the MRNet tests as described above. This script can be used to run the MRNet test programs and check their output for errors. The script is used as follows:

UNIX> mrnet_tests.sh [ -l | -r <hostfile> | -a <hostfile> ]

[ -f <sharedobject> ] [ -lightweight ]

The -l flag is used to run all tests using only topologies that create processes on the local machine (note: running all the tests locally can take quite a while - anywhere from 30 minutes to an hour depending on the machine capabilities). The -r flag runs tests using remote machines specified in the file whose name immediately follows this flag. To run tests both locally and remotely, use the -a flag and specify a hostfile to use. To run the programs that test MRNet's ability to dynamically load filters, you must specify the absolute location of the shared object test_DynamicFilters.so produced when the tests were built. The -lightweight flag is used to run tests with both the standard back-ends and the lightweight back-ends.

Bugs, Questions, and Comments

MRNet is maintained by the Paradyn Project, University of Wisconsin-Madison. Comments and feedback whether positive or negative are encouraged.

Please report bugs to paradyn@cs.wisc.edu. Bug fixes as patches are also welcome.

The MRNet webpage is http://www.paradyn.org/mrnet/

A Complete Example: Integer Addition

A Complete MRNet Front-End

#include "mrnet/MRNet.h"

#include "IntegerAddition.h"

using namespace MRN;

 

int main(int argc, char **argv)

{

int send_val=32, recv_val=0;

int tag, retval;

PacketPtr p;

if( argc != 4 ){

printf("Usage: %s topology be_exe so_file\n", argv[0]);

exit(-1);

}

const char * topology_file = argv[1];

const char * be_exe = argv[2];

const char * so_file = argv[3];

const char * argv=NULL;

 

// Instantiates the MRNet internal nodes, using the organization

// in "topology_file," and the specified back-end application

Network * network = Network::CreateNetworkFE( topology_file,

be_exe, &argv );

 

// Make sure path to "so_file" is in LD_LIBRARY_PATH

int filter_id = network->load_FilterFunc( so_file, "IntegerAdd" );

if( filter_id == -1 ){

printf( "Network::load_FilterFunc() failure\n");

delete network;

return -1;

}

 

// A Broadcast communicator contains all the back-ends

Communicator * comm_BC = network->get_BroadcastCommunicator( );

 

// Create a stream that uses Integer_Add filter for aggregation

Stream * stream = network->new_Stream( comm_BC, filter_id,

SFILTER_WAITFORALL);

int num_backends = comm_BC->get_EndPoints().size();

 

// Broadcast a control message to back-ends to send us "num_iters"

// waves of integers

tag = PROT_SUM;

unsigned int num_iters=5;

if( stream->send( tag, "%d %d", send_val, num_iters ) == -1 ){

printf("stream::send() failure\n");

return -1;

}

if( stream->flush( ) == -1 ){

printf("stream::flush() failure\n");

return -1;

}

 

// We expect "num_iters" aggregated responses from all back-ends

for( unsigned int i=0; i<num_iters; i++ ){

retval = stream->recv(&tag, p);

if( retval == -1){

//recv error

return -1;

}

if( p->unpack( "%d", &recv_val ) == -1 ){

printf("stream::unpack() failure\n");

return -1;

}

if( recv_val != num_backends * i * send_val ){

printf("Iteration %d: Failure!\n", i);

}

else{

printf("Iteration %d: Success! recv_val(%d) == %d\n",

i, recv_val, send_val*i*num_backends );

}

}

 

if(stream->send(PROT_EXIT, "") == -1){

printf("stream::send(exit) failure\n");

return -1;

}

if(stream->flush() == -1){

printf("stream::flush() failure\n");

return -1;

}

 

// Network destruction will exit all processes

delete network;

return 0;

}

A Complete MRNet Back-End

#include "mrnet/MRNet.h"

#include "IntegerAddition.h"

 

using namespace MRN;

 

int main(int argc, char **argv)

{

Stream * stream=NULL;

PacketPtr p;

int tag=0, recv_val=0, num_iters=0;

Network * network = Network::CreateNetworkBE( argc, argv );

do {

if ( network->recv(&tag, p, &stream) != 1){

fprintf(stderr, "stream::recv() failure\n");

return -1;

}

switch(tag){

case PROT_SUM:

p->unpack( "%d %d", &recv_val, &num_iters );

 

// Send num_iters waves of integers

for( unsigned int i=0; i<num_iters; i++ ){

if( stream->send(tag, "%d", recv_val*i) == -1 ){

printf("stream::send(%%d) failure\n");

return -1;

}

if( stream->flush( ) == -1 ){

printf("stream::flush() failure\n");

return -1;

}

}

break;

case PROT_EXIT:

printf("Processing PROT_EXIT ...\n");

break;

default:

printf("Unknown Protocol: %d\n", tag);

break;

}

} while ( tag != PROT_EXIT );

 

network->waitfor_ShutDown();

delete network;

return 0;

}

A Complete MRNet Lightweight Back-End

#include "mrnet_lightweight/MRNet.h"

#include "IntegerAddition_lightweight.h"

 

int main(int argc, char **argv)

{

Stream_t * stream;

Packet_t* p = (Packet_t*)malloc(sizeof(Packet_t));

int tag=0, recv_val=0, num_iters=0;

Network_t * net = Network_CreateNetworkBE( argc, argv );

do {

if( Network_recv(net, &tag, p, &stream) != 1 ) {

printf("BE: stream::recv() failure\n");

break;

}

switch(tag) {

case PROT_SUM:

Packet_unpack(p, "%d %d", &recv_val, &num_iters );

// Send num_iters waves of integers

unsigned int i;

for( i=0; i<num_iters; i++ ) {

printf("BE: Sending wave %u ...\n", i);

if( Stream_send(stream,tag, "%d",

recv_val*i) == -1 ){

printf("BE: stream::send(%%d) failure\n");

tag = PROT_EXIT;

break;

}

if( Stream_flush(stream ) == -1 ){

printf("BE: stream::flush() failure\n");

tag = PROT_EXIT;

break;

}

sleep(2); // stagger sends

}

break;

case PROT_EXIT:

if( Stream_send(stream,tag, "%d", 0) == -1 ) {

printf("BE: stream::send(%%s) failure\n");

break;

}

if( Stream_flush(stream ) == -1 ) {

printf("BE: stream::flush() failure\n");

}

break;

 

default:

fprintf(stderr, "BE: Unknown Protocol: %d\n", tag);

tag = PROT_EXIT;

break;

}

} while ( tag != PROT_EXIT );

 

if ( p != NULL )

free (p);

Network_waitfor_ShutDown(net);

delete_Network_t(net);

return 0;

}

A MRNet Filter: Integer Addition

extern "C" {

 

//Must declare the format of data expected by the filter

const char * IntegerAdd_format_string = "%d";

void IntegerAdd( std::vector< PacketPtr > & packets_in,

std::vector< PacketPtr > & packets_out,

std::vector< PacketPtr > & /* packets_out_reverse */,

void ** /* filter state */,

PacketPtr & /* configuration parameters */,

TopologyLocalInfo & /* local topology information */)

{

int sum = 0;

for( unsigned int i = 0; i < packets_in.size( ); i++ ) {

PacketPtr cur_packet = packets_in[i];

int val;

cur_packet->unpack("%d", &val);

sum += val;

}

PacketPtr new_packet ( new Packet(packets_in[0]->get_StreamId(),

packets_in[0]->get_Tag(), "%d", sum ) );

packets_out.push_back( new_packet );

}

 

} /* extern "C" */

Process-Tree Topologies

MRNet allows a tool to specify a node allocation and process connectivity tailored to its computation and communication requirements and to the system where the tool will run. Choosing an appropriate MRNet configuration can be difficult due to the complexity of the tool's own activity and its interaction with the system. This section describes how users define their own process topologies, and the mrnet_topgen utility provided by MRNet to facilitate generation of topology specification files.

Topology File Format

hostname1:0 => hostname1:1 hostname1:2 ;

meaning a process on hostname1 with instance id 0 has two children, with instance ids 1 and 2, running on the same host. MRNet will parse the topology file without error if the file properly defines a tree in the mathematical sense (i.e. a tree must have a single root, no cycles, full connection, and no node can be its own descendant). Please note that the hostname associated with the root of the topology must match the host where the front-end process is run, or a run-time error will occur.

NOTE: A single topology specification line may span multiple physical lines to improve readability. For example:

hostname1:0 =>

hostname1:1

hostname1:2

;

An Example Topology File

nutmeg:0 => c01:0 c02:0 c03:0 c04:0 ;

c03:0 => c05:0 ;

c04:0 => c06:0 c07:0 c08:0 c09:0 ;

# nutmeg

# |

# -------

# / | | \

# c01 c02 c03 c04

# | |

# c05 |

# -------

# / | | \

# c06 c07 c08 c09

Topology File Generator

host1:4

host2

host3:2

host2

The above host list file results in three hosts being available for topology process placement, with host1 having four available slots, and host2 and host3 each having two available slots. The generator program also allows users to specify different host lists for the placement of internal communication processes and back-end processes (see the mrnet_topgen usage for more information).

Adding New Filters

Defining an MRNet Filter

A filter function has the following signature:

void filter_name(

std::vector< PacketPtr > & packets_in,

std::vector< PacketPtr > & packets_out,

std::vector< PacketPtr > & packets_out_reverse,

void ** filter_state,

PacketPtr & config_params,

const TopologyLocalInfo & topol_info );

Fault-Tolerant Filters

A filter state extraction function has the following signature:

PacketPtr filter_name_get_state( void ** filter_state, int stream_id );

Creating and Using MRNet Filter Shared Object Files

extern "C" {

and

}

Format Strings

After the % character that introduces a conversion, there may be a number of flag characters. u , h , l, and a are special modifiers meaning unsigned, short, long and array, respectivley. The full set of conversions are:

Format String Conversions

c

signed 8-bit character

uc

unsigned 8-bit character

ac

array of signed 8-bit characters

auc

array of unsigned 8-bit characters

hd

signed 16-bit decimal integer

uhd

unsigned 16-bit decimal integer

ahd

array of signed 16-bit decimal integers

auhd

array of unsigned 16-bit decimal integers

d

signed 32-bit decimal integer

ud

unsigned 32-bit decimal integer

ad

array of signed 32-bit decimal integers

aud

array of unsigned 32-bit decimal integers

ld

signed 64-bit decimal integer

uld

unsigned 64-bit decimal integer

ald

array of signed 64-bit decimal integers

auld

array of unsigned 64-bit decimal integers

f

32-bit floating-point number

af

array of 32-bit floating-point numbers

lf

64-bit floating-point number

alf

array of 64-bit floating-point numbers

s

null-terminated character string

as

array of null-terminated character strings

MRNet Stream Performance Data

typedef union { int64_t i; uint64_t u; double d; } perfdata_t;

Metrics define the type of performance data to record. The supported metric types are:

* PERFDATA_MET_NUM_BYTES : count of data bytes (uint64_t)

* PERFDATA_MET_NUM_PKTS : count of data packets (uint64_t)

* PERFDATA_MET_ELAPSED_SEC : elapsed seconds (double)

* PERFDATA_MET_CPU_USR_PCT : percent CPU utilization by user (double)

* PERFDATA_MET_CPU_USR_PCT : percent CPU utilization by system (double)

* PERFDATA_MET_MEM_VIRT_KB : virtual memory footprint in KB (double)

* PERFDATA_MET_MEM_PHYS_KB : physical memory footprint in KB (double)

Contexts specify when to record data. The supported contexts are:

* PERFDATA_CTX_SEND : when data is sent

* PERFDATA_CTX_RECV : when data is received

* PERFDATA_CTX_FILT_IN : before executing transformation filter

* PERFDATA_CTX_FILT_OUT : after executing transformation filter

* PERFDATA_CTX_NONE : when data is collected

See Metric-Context Compatibility Matrix shows which metrics are valid for a given context. When a metric is valid for only CTX_FILT_OUT , the metric is actually recorded through a combination of measurements at CTX_FILT_IN and CTX_FILT_OUT . When a metric is valid for only CTX_NONE , the data is only recorded at the time it is collected. An example MRNet application that makes use of the Stream performance data collection facilities is provided in $MRNET_ROOT/Examples/PerformanceData .

Metric-Context Compatibility Matrix

 

CTX_SEND

CTX_RECV

CTX_FILT_IN

CTX_FILT_OUT

CTX_NONE

MET_NUM_BYTES

yes

yes

yes

yes

no

MET_NUM_PKTS

yes

yes

yes

yes

no

MET_ELAPSED_SEC

no

no

no

yes

no

MET_CPU_USR_PCT

no

no

no

yes

no

MET_CPU_SYS_PCT

no

no

no

yes

no

MET_MEM_VIRT_KB

no

no

no

no

yes

MET_MEM_PHYS_KB

no

no

no

no

yes

Environment Variables

Environment Variables

XPLAT_RSH

XPLAT_RSH_ARGS

 

 

 

XPLAT_REMCMD

Set XPLAT_RSH to the name of the remote shell program to use for remote process execution. Default is 'ssh'. XPLAT_RSH_ARGS can be used to pass shell-specific options to the remote shell.

 

If it is necessary to run the remote shell program with a utility such as runauth to non-interactively authenticate the unattended remote process, that command may be specified using XPLAT_REMCMD.

 

NOTE: Each MRNet process that needs to start remote processes checks its environment for the remote shell to use. Thus, XPLAT_RSH and related variables must be set in the environment for all MRNet front-end and communication processes. The easiest method of ensuring the environment is set correctly is to define XPLAT_RSH within the login scripts for the user's default shell.

XPLAT_RESOLVE_HOSTS

 

 

 

XPLAT_RESOLVE_CANONICAL

Tell XPlat to perform DNS resolution of hostnames and IP addresses by setting the variable to '1'. Default is '1'.

 

When XPLAT_RESOLVE_HOSTS is '1', setting XPLAT_RESOLVE_CANONICAL to '1' will tell XPlat to try to resolve all hostnames to their canonical DNS format. Default is '0'.

MRNET_OUTPUT_LEVEL

 

 

 

 

 

MRNET_DEBUG_LOG_DIRECTORY

Set the debug output level (valid values are 1-5, default is 1). Level 1 will only log warning/error messages, level 3 provides fairly detailed function execution logging, and level 5 will produce every log message that MRNet generates.

 

Specify the absolute path to the directory to store MRNet log files. If not set, the first existing directory from the following list is used:

  • $HOME/mrnet-log
  • /tmp

MRN_COMM_PATH

If mrnet_commnode is not in your path by default, you can specify the full path using this variable.