12/08/2022
CommCompare
Comparing C++ and Rust Message-passing communicators
Comparing C++ and Rust
The C++ and Rust designs should be as similar as practical within the constraints of the two languages.
Both use:
- Synchronous, full-duplex sockets for communication between clients and server client handlers.
- Queues for message input in both client and server.
- Server threadpools to avoid context thrashing for large numbers of concurrent clients.
- Simple variable-size messages which may have binary bodies.
- Server processing simply echoes client messages back (client handlers are designed to support significantly more complex server operations).
- These projects will use each language's standard libraries for programming resources, but will not use other third-party libraries like Boost or those from Crates.io. The intent is to compare our work with C++ and Rust, not some other developer's work.
- The C++ code should use modern C++17 constructs supporting safe data handling. Safe Rust enforces memory and data race safety at compile time. No unsafe blocks will be used in any of the Rust code (excluding unsafe blocks used in, and thoroughly vetted for, the standard library).
- The code should provide build processes for Windows, Linux, and (eventually) macOS.
- Implementations will be written with professional care, but no extraordinary effort will be made to squeeze out additional performance or other measured attributes. This is intended to reflect good "business as usual" practice.
Communicator Comparisons
Performance:
Performance is a very important issue for most C++ and Rust developers. In this section we compare the performance of the two message-passing systems described above, one implemented using C++17 and the other using Rust 1.48. The designs are as close as practical, given the syntax and semantics of the two languages. We measured performance as message-passing throughput in megabytes per second for 16 concurrent clients, each passing 1000 messages to a server that uses a threadpool running client handlers that simply echo back each message. Clients send on one thread, receive on another, and don't wait for replies before sending the next message. Throughput is measured as the total message content bytes for all clients divided by the time between sending the first message and receiving the last. Here are the results for five different environments: three desktops and two operating systems, running both on the native platform and in a VMware virtual machine.

RustComm 5.20% faster than MPL on Windows - Core i7-7700
  C++ MPL: Intel Core i7-7700 CPU, 16 GB RAM, 932 GB SSD, 64b Windows 10 ver 2004, cl ver 19.28
    Thruput MB/s with 8 thrds, 16 clients: 202, 218, 229, 230, 195, 217
    Avg Thruput: 215.2 MB/s
  RustComm: Intel Core i7-7700 CPU, 16 GB RAM, 932 GB SSD, 64b Windows 10 ver 2004, rustc 1.47.0
    Thruput MB/s with 8 thrds, 16 clients: 227, 248, 229, 214, 235, 209
    Avg Thruput: 227.0 MB/s

C++ MPL 9.32% faster than RustComm on Ubuntu - Core i7-2600
  C++ MPL: Intel Core i7-2600 CPU, 8 GB RAM, 1.5 TB ATA, 64b Ubuntu 20.04, gcc 9.3.0
    Thruput MB/s with 8 thrds, 16 clients: 610, 640, 626, 622, 610, 785
    Avg Thruput: 648.8 MB/s
  RustComm: Intel Core i7-2600 CPU, 8 GB RAM, 1.5 TB ATA, 64b Ubuntu 20.04, rustc 1.47.0
    Thruput MB/s with 8 thrds, 16 clients: 671, 608, 606, 618, 456, 571
    Avg Thruput: 588.3 MB/s

RustComm 17.2% faster than MPL on Windows - Xeon Workstation
  C++ MPL: Xeon E3-1535M v6, 32 GB RAM, 64b Windows 10 Workstation, cl ver 19.28.29334
    Thruput MB/s with 8 thrds, 16 clients: 395, 438, 385, 398, 422, 391
    Avg Thruput: 404.8 MB/s
  RustComm: Xeon E3-1535M v6, 32 GB RAM, 64b Windows 10 Workstation, rustc 1.48.0
    Thruput MB/s with 8 thrds, 16 clients: 499, 513, 434, 495, 486, 507
    Avg Thruput: 489.0 MB/s

C++ MPL 4.21% faster than RustComm on Windows - Xeon Workstation, VMware VM
  C++ MPL: Xeon E3-1535M v6, 32 GB RAM, 64b Windows 10 on VMware Workstation 16.1.0, cl ver 19.28.29334
    Thruput MB/s with 8 thrds, 16 clients: 355, 424, 435, 411, 418, 436
    Avg Thruput: 413.2 MB/s
  RustComm: Xeon E3-1535M v6, 32 GB RAM, 64b Windows 10 on VMware Workstation 16.1.0, rustc 1.48.0
    Thruput MB/s with 8 thrds, 16 clients: 383, 421, 411, 373, 373, 414
    Avg Thruput: 395.8 MB/s

C++ MPL 14.8% faster than RustComm on Linux - Xeon Workstation, VMware VM
  C++ MPL: Xeon E3-1535M v6, 32 GB RAM, 64b Linux Mint 20 on VMware Workstation 16.1.0, g++ 9.3
    Thruput MB/s with 8 thrds, 16 clients: 630, 640, 639, 600, 607, 614
    Avg Thruput: 621.7 MB/s
  RustComm: Xeon E3-1535M v6, 32 GB RAM, 64b Linux Mint 20 on VMware Workstation 16.1.0, rustc 1.48.0
    Thruput MB/s with 8 thrds, 16 clients: 522, 517, 633, 439, 492, 574
    Avg Thruput: 529.5 MB/s

If you want to reproduce these results, you will find the code here, with build instructions in Readme.md files in the repository roots.
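A minimal sketch of the throughput calculation described above (not code from either repository; the message body size and the sleep standing in for the actual client runs are illustrative assumptions):

    use std::thread::sleep;
    use std::time::{Duration, Instant};

    fn main() {
        let num_clients = 16u64;
        let msgs_per_client = 1000u64;
        let body_len = 4096u64; // assumed message body size

        let start = Instant::now();        // time of first send
        sleep(Duration::from_millis(300)); // stand-in for the real client runs
        let elapsed = start.elapsed().as_secs_f64(); // time of last receive

        // total message content bytes for all clients / elapsed seconds
        let total_bytes = (num_clients * msgs_per_client * body_len) as f64;
        println!("Thruput: {:.1} MB/s", total_bytes / 1.0e6 / elapsed);
    }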
Summary and Conclusions:
The performance results are similar across the several environments used. There aren't many surprises here: faster processors yield higher throughput, and throughput on Linux is higher than throughput on Windows. The conclusion is that C++ and Rust have very similar performance, both very good.
Size and Complexity:
In this section we compare the two communicators in terms of source code size and complexity, measured by the number of scopes in the source code used to build the communicator programs and by the number of functions in that code. Here are the results:

  C++ MPL:
    Lines of Source Code: 5153
    Number of Functions: 266
    Number of Scopes: 544
  RustComm:
    Lines of Source Code: 2313
    Number of Functions: 146
    Number of Scopes: 402

This analysis includes, along with the implementing code, the code we used to test MPL and RustComm. Test code is an integral part of development and maintenance and, as such, should be counted as part of a system's size and complexity. The analysis also includes not only declarative and executable statements, but the white space and comments that make the code readable and maintainable. That is, the lines-of-code count includes everything. The data above are affected by the fact that Rust has a fairly high-level TcpStream library, while C++, as of C++20, has no networking library. The MPL code used platform socket APIs to develop a solid library at the same level of abstraction as the Rust TcpStream library. The abstraction's implementation had to use the slightly different platform socket APIs on Windows and Linux. We chose to include the C++ socket library since it is code that had to be developed to complete the C++ MPL project. The result is that we judge Rust to yield somewhat smaller and simpler code, mostly because of the scope of its standard libraries used in this application.
Ease of Construction:
This comparison is based on years of personal experience developing C++ programs and more than one year of experience with Rust. Treat these comments as opinion, not evidence-based fact.

C++
For experienced developers, getting a C++ project to compile is usually relatively easy, with the possible exception of template metaprogramming code. Developers tend to spend most of their time debugging operations, especially for multi-threaded code. Modern C++ constructs have improved memory safety significantly, but in large systems there are likely to be a few places where modern idioms are not followed, resulting in errors that are hard to find and fix.

Rust
The Rust compiler uses static analysis to provide memory and data-race safety by construction. That means developers spend significantly more time getting complex programs to compile than they would with C++. Compiler error messages are very helpful, so this isn't as difficult as it would otherwise be. Rust developers tend to spend much less time debugging code than C++ developers, because the compiler's analysis has eliminated all memory access and data-race errors, leaving only logical errors in program operations.
Safety:
Modern C++ is memory safe by convention. It has facilities and common idioms that eliminate most memory use errors, as long as the conventions are followed everywhere. Rust is memory safe by construction and is also data-race safe by construction. The compiler ensures that there are no opportunities for memory or data-race errors.

C++ MPL
C++17 has facilities to ensure data safety by convention:
- range-based for loops prevent indexing outside collections
- iterators prevent dangling pointers when containers reallocate memory
- std::unique_ptr and std::shared_ptr provide deallocation when they go out of scope
C++ has locks to avoid data races by protecting blocks of code, but it provides no guarantees that they are used correctly, e.g., do all threads share the same lock, are locks released in the presence of errors, ...
During construction of MPL, a significant portion of the development time was spent finding data races.

Rust Comm
Rust guarantees memory safety by construction:
- Indexing out of bounds in an array or container causes an ordered shutdown, called a panic, that prevents accessing unowned memory.
- Rust enforces data ownership at compile time using a borrow checker that ensures references to data:
  - are initialized before use.
  - do not outlive their referents.
  - do not view data mutated by the owner or other references.
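A small self-contained sketch (not RustComm code) of the first point, along with the exclusive-mutation rule:

    fn main() {
        let v = vec![10, 20, 30];

        // In-bounds indexing is fine; v[3] would panic (an ordered
        // shutdown) rather than read unowned memory.
        println!("{}", v[2]);

        // The non-panicking alternative returns an Option instead.
        match v.get(3) {
            Some(x) => println!("{}", x),
            None => println!("index 3 is out of bounds; no panic"),
        }

        // A mutable borrow excludes all other references, so data cannot
        // be viewed while the owner or another reference mutates it.
        let mut s = String::from("hello");
        let r = &mut s;
        r.push_str(" world");  // exclusive access through r
        println!("{}", s);     // fine: the mutable borrow has ended
    }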
Rust locks protect data, not sections of code, which guarantees that all threads accessing the same data use the same lock. That, combined with Rust's data ownership rules, avoids data races.
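Here is a minimal sketch (again, not RustComm code) of a lock that protects data rather than a code section; the Mutex owns the counter, so no thread can reach it without holding the lock:

    use std::sync::{Arc, Mutex};
    use std::thread;

    fn main() {
        // The Mutex owns the data it protects.
        let counter = Arc::new(Mutex::new(0u64));

        let handles: Vec<_> = (0..8)
            .map(|_| {
                let counter = Arc::clone(&counter);
                thread::spawn(move || {
                    for _ in 0..1000 {
                        *counter.lock().unwrap() += 1; // guard drops at end of statement
                    }
                })
            })
            .collect();

        for h in handles {
            h.join().unwrap();
        }
        println!("count = {}", counter.lock().unwrap()); // always 8000
    }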
When implementing RustComm, it took real care to create multi-threaded code that compiled successfully. But once it built, there were no data races to find. So multi-threaded code is harder to build, but much easier to debug, in Rust.
Ease of Maintenance:
Both MPL and RustComm are well commented, factored into single-focus components, and have effective build environments. Those factors are all part of a good maintenance strategy. In this section we focus on the maintenance attributes that are largely determined by the C++ and Rust languages and their environments.

C++ MPL
The existence and longevity of many large software systems written in C++ demonstrate that, if well designed, C++ code can be effectively maintained. Unlike Rust, C++ does not deliver many test facilities as part of its default environment. That means there is a tendency to test in periodic batches rather than continuously during construction. One other issue with C++ is its use of header file #includes. Builds using these includes work well, but if a project gets deployed to a different directory structure, there is a painful process of correcting include paths before the project builds again.

Rust Comm
The Rust ecosystem has a tool called cargo. It is a package and build manager and an executor, and it provides access to a linter, clippy, and a documenter, rustdoc. This tool chain makes maintenance of Rust programs very productive. One of its best features is the support it provides for testing. Cargo builds libraries with a preconfigured test hook that makes it easy to build unit tests as library construction proceeds. In addition, cargo will build any console app it finds in an /examples directory and link it to the package library, so it is easy to provide test and demonstration programs that illustrate library facilities.
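For example, this is the shape of the test hook cargo preconfigures in a new library crate (a generic sketch, not RustComm's actual tests):

    // in src/lib.rs
    pub fn add(a: i32, b: i32) -> i32 {
        a + b
    }

    // Compiled only by `cargo test`.
    #[cfg(test)]
    mod tests {
        use super::*;

        #[test]
        fn adds_small_numbers() {
            assert_eq!(add(2, 3), 5);
        }
    }

A file placed at examples/test1.rs is built the same way and run with cargo run --example test1, which is how the RustComm demonstrations below are invoked.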
Implementations
C++ Communicator
- Uses queued full-duplex buffered message transfers.
- Each message has a fixed-size header and a body consisting of an array of bytes.
- For each incoming connection the TCPResponder requests a threadpool thread and processes messages with an instance of ClientHandler.
- For this demonstration, ClientHandler instances echo back the incoming message, marked as a reply.
Current Design:
Messages consist of a header with mtype and content_len attributes and an array of bytes for the body.
- Message(usize sz, const u8 content_buf[], usize content_len, u8 mtype) : Create new Message from array of bytes
- Message(usize sz, const std::string& str, u8 mtype) : Create new Message from string
- Message(MessageType mtype = MessageType::DEFAULT) : Create new Message with no contents
- Message(const Message& msg) : Copy constructor
- Message& operator=(const Message& msg) : Copy assignment
- Message(Message&& msg) : Move constructor
- Message& operator=(Message&& msg) : Move assignment
- ~Message() : Destructor
- u8 operator[](int index) const : Const index operator
- void set_type(u8 mt) : Set MessageType to one of TEXT, BYTES, STRING
- unsigned get_type() const : Return instance of MessageType
- usize get_content_len() const : Return length of message body in bytes
- void set_content_bytes(const u8 buff[], size_t len) : Copy byte array into message body
- void set_content_str(const std::string& str) : Copy string into message body
- std::string get_content_str() : Return message body as string
TCPConnector supports direct and queued messaging with a connected TCPResponder.
- TCPConnector(TCPSocketOptions* sc = nullptr) : Create new TCPConnector
- bool Close() : Shut down TCPConnector and signal server-side ClientHandler to shut down
- bool IsConnected() const : Has valid connection?
- bool IsSending() const : Is send thread running?
- bool IsReceiving() const : Is receive thread running?
- void UseSendReceiveQueues(bool use_qs) : Start dedicated send and receive threads
- void UseSendQueue(bool use_q) : Start send thread when connection is established
- void UseReceiveQueue(bool use_q) : Start receive thread when connection is established
- void PostMessage(const Message& m) : Enqueue message for sending to associated TCPResponder
- void SendMessage(const Message& m) : Send message directly instead of posting to send queue
- Message GetMessage() : Dequeue received message if available, else block
- Message ReceiveMessage() : Read message from socket if available, else block
TCPResponder uses threadpool threads to support concurrent communication sessions.
- TCPResponder(const EndPoint& ep, TCPSocketOptions* sc = nullptr) : Create new TCPResponder
- void RegisterClientHandler(ClientHandler* ch) : Register ClientHandler prototype that defines server-side message processing
- void Start(int backlog = 20) : Start dedicated server listening thread
- void Stop() : Stop listening service
- void UseClientSendReceiveQueues(bool use_qs) : Start dedicated send and receive threads when establishing a connection
- void UseClientSendQueue(bool use_q) : Start dedicated send thread when establishing a connection
- void UseClientReceiveQueue(bool use_q) : Start dedicated receive thread when establishing a connection
ClientHandlers communicate directly with associated TCPConnectors. Derived ClientHandler classes define application-specific message handling operations.
- Message GetMessage() : Dequeue message from receive queue if available, else block
- Message ReceiveMessage() : Read message directly from socket if available, else block
- void PostMessage(const Message& m) : Enqueue message for connected client
- void SendMessage(const Message& m) : Write message directly to connected socket
- virtual void AppProc() = 0 : Application message processing supplied by derived ClientHandler class
- virtual ClientHandler* Clone() = 0 : Create application-defined ClientHandler instances
Rust Communicator
- Uses queued full-duplex buffered message sending and receiving.
- Each message has a fixed-size header and a Vec<u8> body.
- For each Connector<P, M, L> connection, Listener<P, L> processes messages until receiving a message with MessageType::END.
- Listener<P, L> requests a thread from ThreadPool<P> for each client connection and processes messages in P::process_message.
- In this version, P::process_message echoes back each message with "reply" appended. You can observe that behavior by running test1, e.g., cargo run --example test1.
Current Design:
Message:
- new() -> Message : Create new Message with empty body and MessageType::TEXT
- set_type(&mut self, mt: u8) : Set MessageType member to one of TEXT, BYTES, END
- get_type(&self) -> MessageType : Return MessageType member value
- set_body_bytes(&mut self, b: Vec<u8>) : Set body_buffer member to bytes from b: Vec<u8>
- set_body_str(&mut self, s: &str) : Set body_buffer member to bytes from s: &str
- get_body_size(&self) -> usize : Return size in bytes of body member
- get_body(&self) -> &Vec<u8> : Return body_buffer member
- get_body_str(&self) -> String : Return body contents as lossy String
- clear(&self) : Clear body contents
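A short self-contained sketch of how this Message API is used; the stand-in type below implements just the listed methods this example needs, since the post doesn't show Message's internals:

    struct Message {
        body_buffer: Vec<u8>,
    }

    impl Message {
        fn new() -> Message {
            Message { body_buffer: Vec::new() }
        }
        fn set_body_str(&mut self, s: &str) {
            self.body_buffer = s.as_bytes().to_vec();
        }
        fn get_body_size(&self) -> usize {
            self.body_buffer.len()
        }
        fn get_body_str(&self) -> String {
            String::from_utf8_lossy(&self.body_buffer).to_string()
        }
    }

    fn main() {
        let mut msg = Message::new();        // empty body
        msg.set_body_str("hello RustComm");  // copy string bytes into body_buffer
        assert_eq!(msg.get_body_size(), 14);
        println!("{}", msg.get_body_str());  // lossy UTF-8 view of the body
    }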
Both Connector<P, M, L> and Listener<P, L> are parameterized with L, a type satisfying a Logger trait. The package defines two types that implement the trait, VerboseLog and MuteLog, that allow users to easily turn event display outputs on and off. Fig 2. uses MuteLog in both Connector<P, M, L> and Listener<P, L>.
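The post doesn't show the Logger trait itself; a plausible minimal shape, with VerboseLog printing events and MuteLog discarding them, might look like this (an assumption, not the package's actual definition):

    trait Logger {
        fn write(msg: &str);
    }

    struct VerboseLog;
    impl Logger for VerboseLog {
        fn write(msg: &str) { println!("{}", msg); } // show events
    }

    struct MuteLog;
    impl Logger for MuteLog {
        fn write(_msg: &str) {} // discard events
    }

    fn main() {
        VerboseLog::write("connection accepted"); // printed
        MuteLog::write("connection accepted");    // silent
    }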
Connector<P, M, L>:
- new(addr: &'static str) -> std::io::Result<Connector<P, M, L>> : Create new Connector<P, M, L> with running send and receive threads
- is_connected(&self) -> bool : Is connected to addr?
- post_message(&self, msg: M) : Enqueue msg to send to connected Receiver
- get_message(&mut self) -> M : Read reply message if available, else block
- has_message(&self) -> bool : Return true if reply message is available
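In sketch form, a client session using the Connector API above (this won't compile outside the RustComm package, and the turbofish type arguments and address are illustrative assumptions):

    let mut conn = Connector::<CommProcessing<MuteLog>, Message, MuteLog>::new("127.0.0.1:8080")?;
    assert!(conn.is_connected());

    let mut msg = Message::new();
    msg.set_body_str("message #1");
    conn.post_message(msg);          // enqueued; the send thread does the socket I/O
    let reply = conn.get_message();  // blocks until the echoed reply arrives
    println!("{}", reply.get_body_str());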
Listener<P, L>:
- new(nt: u8) -> Listener<P, L> : Create new Listener<P, L> with nt threads running
- start(&mut self, addr: &'static str) -> std::io::Result<JoinHandle<()>> : Bind Listener<P, L> to addr and start listening on dedicated thread
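The matching server-side sketch, with the same caveats as the client sketch above:

    let mut lstn = Listener::<CommProcessing<MuteLog>, MuteLog>::new(8); // 8 threadpool threads
    let handle = lstn.start("127.0.0.1:8080")?; // bind and listen on a dedicated thread
    handle.join().unwrap();                     // block until the listener shuts down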
CommProcessing<L>:
- new() -> CommProcessing<L> : Create instance of CommProcessing<L>
- send_message(msg: &M, stream: &mut TcpStream) -> std::io::Result<()> : Write message to TcpStream connected to Connector<P, M, L>
- buf_send_message(msg: &M, stream: &mut BufWriter<TcpStream>) -> std::io::Result<()> : Write message to buffered TcpStream connected to Connector<P, M, L>
- recv_message(stream: &mut TcpStream) -> std::io::Result<M> : Read message from TcpStream connected to Connector<P, M, L>
- buf_recv_message(stream: &mut BufReader<TcpStream>) -> std::io::Result<M> : Read message from buffered TcpStream connected to Connector<P, M, L>
- process_message(msg: &mut M) : Process message using mutable reference
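The post doesn't spell out the wire format these functions use, but a fixed-size header followed by a Vec<u8> body implies framing like this self-contained sketch (the exact header layout is an assumption); send_message and recv_message would do something equivalent over a TcpStream:

    use std::io::{self, Cursor, Read, Write};

    // Hypothetical frame: 1-byte message type, 4-byte little-endian
    // body length, then the body bytes.
    fn send_frame<W: Write>(w: &mut W, mtype: u8, body: &[u8]) -> io::Result<()> {
        w.write_all(&[mtype])?;
        w.write_all(&(body.len() as u32).to_le_bytes())?;
        w.write_all(body)
    }

    fn recv_frame<R: Read>(r: &mut R) -> io::Result<(u8, Vec<u8>)> {
        let mut hdr = [0u8; 5];
        r.read_exact(&mut hdr)?;
        let len = u32::from_le_bytes([hdr[1], hdr[2], hdr[3], hdr[4]]) as usize;
        let mut body = vec![0u8; len];
        r.read_exact(&mut body)?;
        Ok((hdr[0], body))
    }

    fn main() -> io::Result<()> {
        // Round-trip through an in-memory buffer; a TcpStream is used the same way.
        let mut buf = Cursor::new(Vec::new());
        send_frame(&mut buf, 1, b"hello")?;
        buf.set_position(0);
        let (mtype, body) = recv_frame(&mut buf)?;
        println!("type {}, body {:?}", mtype, String::from_utf8_lossy(&body));
        Ok(())
    }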