I’m currently writing a cross-platform server application in Rust which should also be able to run as a service on Windows.
While this sounds like a relatively straightforward thing to do, it turned out to be a bit more involved than I originally thought (especially compared to running Unix services).
Below you’ll find my notes and a well-documented code sample for running a small echo server for demonstration purposes as a Windows service. I’m going to cover service initialization, responding to control commands (like STOP) and also handling shutdowns in a tokio runtime setting where the individual tasks need to be cancelled. The sample application can also be run in console mode and the Windows service parts are conditionally compiled – so you can run/compile it on Unix systems for console mode use as well.
I decided to use the windows_service crate which abstracts the Windows API calls away and gives you an easy interface to work with.
The service can be installed with: sc.exe create sample-service binPath= "C:\path\to\target\app.exe --service" (sample-service can also be any other name, but it’s important to refer to the same name in the code as well. --service is just an argument for this sample code.)
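For completeness, the full service lifecycle with sc.exe looks roughly like this (run from an elevated prompt; note that sc.exe requires the space after binPath=):

```
sc.exe create sample-service binPath= "C:\path\to\target\app.exe --service"
sc.exe start sample-service
sc.exe query sample-service
sc.exe stop sample-service
sc.exe delete sample-service
```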
You can find the complete code here on GitHub.
main(): Service vs. console mode
The application we are going to build should be able to be started as a Windows service but also retain the option to run in console mode. Since the service mode is only available on Windows, certain blocks have to be marked for conditional compilation with the cfg attribute.
We want to use the tokio asynchronous runtime but have the two mentioned startup cases – service and console mode. This means we can’t rely on tokio’s main macro attribute to set things up for us. In the service case, our runtime needs to be set up in a separate thread as the main one will block after calling the Windows service control dispatcher. So we will build the runtime in our main function only in the non-service branch.
Finally, to be able to stop our application and its multiple connection-handling tasks – in both service and console mode – we create a CancellationToken and pass it to the app’s individual tasks so they can receive a cancellation request (such as a STOP request from the service control manager, SCM, or Ctrl-C in console mode) – more on that later in the post.
run_app will be our actual application start.
Here’s our main:
fn main() {
    if std::env::args().any(|a| a == "--service") {
        #[cfg(windows)]
        {
            // service mode (Windows)
            if let Err(e) = win_service::start() {
                // note that this error is only visible if you accidentally
                // run --service in non-service mode. It might make sense
                // to log to the Windows Event Log here.
                eprintln!("Error: {e}");
            }
        }
        #[cfg(not(windows))]
        {
            eprintln!("--service flag is only supported on Windows");
            std::process::exit(1);
        }
    } else {
        // normal console mode (manually started on Windows/Linux/macOS)
        let rt = match tokio::runtime::Runtime::new() {
            Ok(rt) => rt,
            Err(e) => {
                eprintln!("Error setting up tokio Runtime: {e}");
                return;
            }
        };
        let token = CancellationToken::new();
        let token_clone = token.clone();
        rt.block_on(async {
            tokio::spawn(async move {
                let _ = tokio::signal::ctrl_c().await;
                println!("Got Ctrl-C, shutting down...");
                token.cancel();
            });
            if let Err(e) = run_app(token_clone).await {
                eprintln!("Got error: {e}");
            }
        });
    }
}
Defining entry point and starting service control dispatcher
As can be seen in main, the service mode will call start() from our win_service module.
In it, we call service_dispatcher::start from the windows_service crate, which is a wrapper around the Win32 StartServiceCtrlDispatcher function.
When running an application as a service, the executable is supposed to call this function right at the beginning, passing the service name and a pointer to the application’s entry point function. In the code below, this function is called ffi_service_main and is generated by the define_windows_service! macro (the Windows API requires an entry point with the signature extern "system" fn(u32, *mut *mut u16), “system” being the standard calling convention on Windows).
service_dispatcher::start will block until we’re done or the application stops for another reason, and ffi_service_main will be called in a new thread to be able to start the initialization process.
mod win_service {
    const SERVICE_NAME: &str = "sample-service"; // important to match the service name!
    // ...
    define_windows_service!(ffi_service_main, app_service_main);

    pub fn start() -> Result<(), windows_service::Error> {
        service_dispatcher::start(SERVICE_NAME, ffi_service_main)
    }
    // ...
}
The windows_service crate abstracts the Windows API from us, and we can build our actual service initialization logic in the function named app_service_main in the code below. This just needs to be a function taking a Vec<OsString>, and it will be called by ffi_service_main.
Depending on your service, you can ignore the arguments, as these are the service start parameters (i.e. sc.exe start <service name> arg1 arg2), not the command-line arguments of your app.
Our app_service_main can then wrap a run_service function and potentially log errors to the windows event log or a file. Once app_service_main returns, the dispatcher (service_dispatcher::start) eventually unblocks.
pub fn app_service_main(_arguments: Vec<OsString>) {
    if let Err(_e) = run_service() {
        // log error (e.g. to Windows Event Log)
    }
}
Service initialization
In our run_service(), we are going to do the following:
- Create a oneshot channel to be able to signal a shutdown and initiate a token cancellation in our application when we receive a STOP from the service control manager (SCM)
- Register our service control handler code to define what should happen for which control event
- Set the service state from start pending to running
- Create the tokio runtime
- Create a cancellation token to be able to cancel the application’s tasks on shutdown
- Spawn a background task to receive and handle our shutdown signal
- Eventually run our app with run_app(), just like in the console mode case mentioned in the beginning
- Finally, once our app has finished, set the service state to stopped again
Since the code is easier to follow with inline comments, I’ve annotated run_service below rather than describing it separately. I also mentioned a few things that could be improved when doing this for a production application.
// Main service logic for registration of SCM handler code (for
// handling stop events from SCM), to signal to SCM that we're
// starting/running/stopping, and eventually running the app within the
// tokio runtime.
fn run_service() -> Result<(), windows_service::Error> {
    // oneshot channel to be able to receive stop requests, should the SCM
    // control handler send one.
    // The transmitter part goes into the control handler closure, the
    // receiver goes into our own background task, which handles the token
    // cancellation and setting of the stop pending state.
    let (shutdown_tx, shutdown_rx) = oneshot::channel();
    let shutdown_tx = Mutex::new(Some(shutdown_tx));

    // Register service control handler
    // This wraps RegisterServiceCtrlHandlerExW – so we tell SCM to call
    // the following closure whenever it needs to signal a control command
    // to us.
    // The closure runs in a separate thread.
    // Since register() gives us a status handle, this must be called
    // before we can set a status (start pending, etc.)
    // SERVICE_NAME is required here again, as multiple services could
    // potentially share the same binary (not in our case, but part of the API).
    let status_handle = service_control_handler::register(
        SERVICE_NAME,
        move |control_event| match control_event {
            ServiceControl::Interrogate => {
                // Mandatory so SCM can check if the service is still alive
                // and responding
                ServiceControlHandlerResult::NoError
            }
            ServiceControl::Stop => {
                // SCM (or a user through SCM) has requested a stop of the
                // service. Use the oneshot transmitter to signal a shutdown to
                // our app (in case it hasn't exited yet, i.e. the receiver is
                // still there).
                // We use the channel here and not the cancellation token
                // directly, as we don't have the status handle yet, and it
                // makes things a little easier.
                if let Some(tx) = shutdown_tx.lock().unwrap().take() {
                    let _ = tx.send(());
                }
                ServiceControlHandlerResult::NoError
            }
            // For any other control event we just tell SCM that we can't
            // handle it.
            _ => ServiceControlHandlerResult::NotImplemented,
        },
    )?;

    // Tell SCM that we've received the start request.
    // This is technically not necessary since we don't have any
    // involved/long-lasting setup logic, so we could go straight into
    // Running and skip StartPending with a wait_hint. I added it for
    // completeness.
    status_handle.set_service_status(ServiceStatus {
        // SERVICE_WIN32_OWN_PROCESS, i.e. we run our own non-shared
        // process
        service_type: ServiceType::OWN_PROCESS,
        // Starting up... (the new state of the service)
        current_state: ServiceState::StartPending,
        // Accept no controls during startup
        controls_accepted: ServiceControlAccept::empty(),
        // No error
        exit_code: ServiceExitCode::Win32(0),
        // Progress checkpoint for SCM (only relevant if we had a
        // multi-step initialization and wanted to report progress)
        checkpoint: 1,
        // How long SCM should wait before considering the service hung
        wait_hint: Duration::from_secs(10),
        // Only used for shared processes
        process_id: None,
    })?;

    // Now tell SCM that we're ready
    // (ideally, you would have validated your initialization at this point;
    // however, for a cross-platform app, you would likely need to
    // restructure a bit if everything is normally done in run_app. I kept
    // it simple here).
    status_handle.set_service_status(ServiceStatus {
        service_type: ServiceType::OWN_PROCESS,
        current_state: ServiceState::Running,
        // Accept STOP commands while running
        controls_accepted: ServiceControlAccept::STOP,
        exit_code: ServiceExitCode::Win32(0),
        // Must be 0 when the service is running
        checkpoint: 0,
        // No wait hint when running, so just default/zero
        wait_hint: Duration::default(),
        process_id: None,
    })?;

    // Create the tokio runtime _here_, as our regular main entry point in
    // the service case just calls the dispatcher, and the dispatcher calls
    // us on a different thread.
    let runtime = match tokio::runtime::Runtime::new() {
        Ok(rt) => rt,
        Err(e) => {
            // Failed to create runtime
            status_handle.set_service_status(ServiceStatus {
                service_type: ServiceType::OWN_PROCESS,
                current_state: ServiceState::Stopped,
                controls_accepted: ServiceControlAccept::empty(),
                exit_code: ServiceExitCode::Win32(575), // ERROR_APP_INIT_FAILURE
                checkpoint: 0,
                wait_hint: Duration::default(),
                process_id: None,
            })?;
            // hack: note that a custom error type might be more accurate
            // here, as this is not a Windows API error...
            return Err(windows_service::Error::Winapi(e));
        }
    };

    let token = CancellationToken::new();
    runtime.block_on(async {
        let app_token = token.clone();
        // Spawn a background task that waits for the stop signal sent
        // through the control handler, and then cancels the token to
        // notify our app tasks
        tokio::spawn(async move {
            let _ = shutdown_rx.await;
            token.cancel();
            // Report StopPending so SCM knows we're winding down and
            // doesn't force-kill us.
            // ServiceStatusHandle implements Copy, so we can use it here
            let _ = status_handle.set_service_status(ServiceStatus {
                service_type: ServiceType::OWN_PROCESS,
                current_state: ServiceState::StopPending,
                controls_accepted: ServiceControlAccept::empty(),
                exit_code: ServiceExitCode::Win32(0),
                checkpoint: 1,
                wait_hint: Duration::from_secs(10),
                process_id: None,
            });
        });

        // Run the application until it is cancelled via the token
        if let Err(e) = crate::run_app(app_token).await {
            // again: better to log to the Windows Event Log or a file, as
            // this error message goes nowhere :-)
            eprintln!("Got error: {e}");
        }
    });

    // Tell SCM that we're done (we've passed the blocking runtime code
    // above, so we either crashed or received a shutdown)
    status_handle.set_service_status(ServiceStatus {
        service_type: ServiceType::OWN_PROCESS,
        current_state: ServiceState::Stopped,
        controls_accepted: ServiceControlAccept::empty(),
        // you may want to return an error code here depending on
        // run_app's success
        exit_code: ServiceExitCode::Win32(0),
        checkpoint: 0,
        wait_hint: Duration::default(),
        process_id: None,
    })?;
    Ok(())
}
Sample app
As stated in the beginning, our sample application is a simple echo server. We bind to TCP port 5000, listen for connections, and handle each one by spawning a new task.
In our main loop that accepts new connections, we also check whether the cancellation token has been, well, cancelled. If so, we break out of the accept loop and go into the connection draining phase to close the remaining connections gracefully.
We check if there are any remaining active connections, and if so, start a 10-second timer while joining the connection tasks one by one. Once all connections have closed or the timer runs out, we leave the drain phase.
To make this possible, we also pass the cancellation token to our connection handling code, which continuously waits on both the token and the TCP stream read until either resolves (shown in the next section).
Here’s how this looks in code:
pub async fn run_app(token: CancellationToken) -> Result<(), Box<dyn Error>> {
    let listener = TcpListener::bind("127.0.0.1:5000").await?;
    println!("Listening on 127.0.0.1:5000...");

    // Track all spawned connection tasks so we can wait for them to drain
    // on shutdown.
    let mut connections = tokio::task::JoinSet::new();

    // Accept connections until the token gets cancelled
    loop {
        tokio::select! {
            _ = token.cancelled() => {
                println!("Accept loop: cancellation received, stopping...");
                break;
            }
            result = listener.accept() => {
                match result {
                    Ok((stream, addr)) => {
                        println!("New connection from {addr}");
                        let conn_token = token.child_token();
                        connections.spawn(
                            handle_connection(stream, addr, conn_token));
                    }
                    Err(e) => {
                        eprintln!("Error accepting new connection: {e}");
                    }
                }
            }
        }
    }

    // The token got cancelled and we stopped accepting new connections.
    // Now give existing ones some time to finish. After the timeout,
    // connections will just be dropped/tasks cancelled.
    let active = connections.len();
    if active > 0 {
        println!("Waiting for {active} connections to close");
        // you may want to capture/log panics here
        let drain = async { while connections.join_next().await.is_some() {} };
        if tokio::time::timeout(std::time::Duration::from_secs(10), drain)
            .await
            .is_err()
        {
            println!("Timed out waiting for connections to close, \
                forcing shutdown");
        }
    }
    println!("Server shut down.");
    Ok(())
}
Handling stream data while checking token
Finally, this is how we handle the token checking in handle_connection:
Either branch of the select! can lead to breaking out of the loop. select! polls both branch futures simultaneously, and when one resolves, the other is dropped (dropping the read future in its pending state is fine when we want to shut down our server).
async fn handle_connection(
    mut stream: tokio::net::TcpStream,
    addr: std::net::SocketAddr,
    token: CancellationToken,
) {
    let mut buf = [0u8; 1024];
    // Keep reading until the token is cancelled (or the client disconnects
    // or we have read/write errors). Note that if you're buffering some
    // data, you might want to flush before returning.
    loop {
        tokio::select! {
            _ = token.cancelled() => {
                println!("{addr}: shutdown signal received, closing \
                    connection.");
                let msg = format!("Bye {addr}...");
                if let Err(e) = stream.write_all(msg.as_bytes()).await {
                    eprintln!("{addr}: write error: {e}");
                }
                break;
            }
            result = stream.read(&mut buf) => {
                match result {
                    // Client disconnected
                    Ok(0) => {
                        println!("{addr}: client disconnected");
                        break;
                    }
                    Ok(n) => {
                        // Echo back what we received
                        let mut msg = Vec::with_capacity(6 + n);
                        msg.extend_from_slice(b"Echo: ");
                        msg.extend_from_slice(&buf[..n]);
                        if let Err(e) = stream.write_all(&msg).await {
                            eprintln!("{addr}: write error: {e}");
                            break;
                        }
                    }
                    Err(e) => {
                        eprintln!("{addr}: read error: {e}");
                        break;
                    }
                }
            }
        }
    }
}
And this completes the sample project and our little exploration into Windows services. I learned a bunch of new things and hope the notes and sample code prove helpful to you as well.
As mentioned in the beginning, you can find the full code here on GitHub.
✉️ Have a comment? Please send me an email.