Long-Running Tasks with Next.js: A Story of Reinventing the Wheel

Jan 19, 2025 - 13:11

A simple goal

I’m building a back-office application using Next.js, which needs to handle various automation tasks, such as generating gigantic XML files, running regular tests on core features, and managing a few Puppeteer crawlers. Some tasks take less than a minute, others around five minutes, while the crawlers can have undefined or even infinite runtimes. I needed a solution to run these tasks in parallel, control them, and stream their updates to the back office (BO).

Don’t reinvent the wheel: just pick the most popular, widely used open-source solution and rely on the collective effort of people who’ve spent hundreds of hours perfecting it. Sounds reasonable, right? I agree, and that is usually how everything starts. Yet, most of the time, I end up with some primitive, scrappy approach that I carry from project to project. I’m not sure if this is what the word “experience” means or if I’m just accumulating bad habits, but either way, it’s my favorite way to learn.

Trigger.dev: The Open-Source Background Jobs Platform

Wonderful, I thought. Within a few days, I managed to spin it up on my VPS and had a separate dashboard displaying a Gantt chart with all running and completed jobs. However, while prototyping the solution, I encountered a problem with streaming data to the back office (BO). Their Discord community was supportive, and someone mentioned that this issue had been reported before and should now be fixed. Unfortunately, after updating to the latest version, I still couldn’t subscribe to log streaming.

After days of troubleshooting, I started thinking that I might have chosen the wrong tool. Maybe Trigger.dev works flawlessly in its cloud environment, but I wanted to keep my entire infrastructure limited to a few VPSs. Ultimately, I decided that although Trigger.dev is a powerful and sophisticated solution, I didn’t need all its features. A simpler, custom solution built on top of Next.js would be more suitable.

BullMQ: A Task Queue Library

I thought of the solution as a task queue, so I explored BullMQ, a Node.js library for managing message queues. I deployed a Redis instance and started prototyping. Soon I discovered that jobs couldn’t be aborted once locked by a worker. After a few days of experiments with BullMQ event queues, workers, and jobs, it became clear that BullMQ wasn’t designed for aborting tasks or for streaming messages in real time.

Now I needed to invent a specific wheel, so no more third-party tools.

Web Workers

I decided to rethink my needs entirely and opted for a simpler, 100% custom solution. Initially, I considered Web Workers, which are great for offloading tasks like generating large XML files in the browser. However, the idea of running Puppeteer (a headless browser) inside a Web Worker felt awkward. Web Workers are tied to the lifecycle of the page that created them. If the page is refreshed or navigated away, the worker is terminated, and its state or communication is lost. This limitation made Web Workers unsuitable for long-running or indefinite-duration tasks.

Node.js Child Processes

Spawning child processes or forks in Node.js allowed me to run tasks in parallel while retaining control to pause or abort them at any time. Additionally, child processes can use process.send() to exchange message objects with the parent process, which felt like the right approach. With this setup, the goal narrowed down to establishing a reliable bi-directional message exchange between the child processes and the browser.
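As a rough sketch of this idea (not the actual implementation), a small manager class could keep one child process per task name, re-emit whatever the child sends over IPC, and expose pause and abort controls. The names `TaskManager`, `TaskCommand`, and `Launcher` are hypothetical, and so is the `{ type: "pause" }` message shape. Pausing is cooperative here: it only works if the task script listens for the command.

```typescript
import { fork, type ChildProcess } from "node:child_process";
import { EventEmitter } from "node:events";

// Hypothetical control messages; the child script must handle them itself.
type TaskCommand = { type: "pause" } | { type: "resume" };

// Assumption: each task lives in its own script and reports progress with
// process.send(...). The launcher is injectable so it can be swapped out.
type Launcher = (scriptPath: string) => ChildProcess;

class TaskManager extends EventEmitter {
  private readonly children = new Map<string, ChildProcess>();
  private readonly launch: Launcher;

  constructor(launch: Launcher = (scriptPath) => fork(scriptPath)) {
    super();
    this.launch = launch;
  }

  // Starts a task unless one with the same name is already running.
  start(name: string, scriptPath: string): boolean {
    if (this.children.has(name)) return false;
    const child = this.launch(scriptPath);
    this.children.set(name, child);
    // Re-emit every IPC message from the child under the task's name.
    child.on("message", (message) => this.emit(name, message));
    // Forget the child once it exits so the task can be started again.
    child.on("exit", () => this.children.delete(name));
    return true;
  }

  // Cooperative pause/resume: sends a command object over the IPC channel.
  command(name: string, cmd: TaskCommand): boolean {
    return this.children.get(name)?.send(cmd) ?? false;
  }

  // Abort terminates the process (kill() sends SIGTERM by default).
  abort(name: string): boolean {
    return this.children.get(name)?.kill() ?? false;
  }
}
```

On the child side, a task script would report progress with something like `process.send?.({ line: "page crawled" })` and react to commands in a `process.on("message", ...)` handler.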

WebSockets

The first communication method that came to mind was WebSockets. I didn’t want to maintain another server besides the one provided by Next.js, so I attempted to run wss:// over Next.js's https://. However, integrating WebSocket servers into a Next.js application proved to be more challenging than I anticipated. Here are the main issues I faced:

Instrumentation Hook Initialization:

I tried initializing the WebSocket server during server startup using the instrumentation hook (register()). While this is intended to run when the server starts, there are nuances depending on the hosting environment (e.g., Vercel, Docker, or a standalone Node.js server). The problem was that the HTTP server instance (global.server) wasn’t immediately available in these environments. This made it impossible to attach the WebSocket server to the existing HTTP server at the right time.

Accessing Next.js's HTTP Server:

Next.js doesn’t expose its internal HTTP server directly, which prevents you from easily attaching a WebSocket server to it. While it’s technically possible to spin up a separate Node.js server for WebSockets, this felt like a clunky and inelegant solution that added unnecessary complexity to the project.

Cloud Provider Compatibility:

Running a custom WebSocket server reduces compatibility with cloud platforms like Vercel, which don’t support persistent, long-lived WebSocket connections in their serverless functions. For example, Vercel recommends using external services like Ably or Pusher for real-time messaging, making this approach less desirable if I wanted flexibility in deployment options.

Singleton WebSocket Server in Next.js API Routes

I then tried to initialize a singleton WebSocket server directly in a Next.js API route, ensuring it was tied to the request-response lifecycle. While this approach allowed me to bypass some of the earlier problems, it still didn’t feel ideal: Next.js’s serverless-first nature means it wasn’t designed to maintain long-lived connections like WebSockets natively. This approach decreased compatibility with cloud providers (even if I wasn’t targeting them now, limiting options felt like a bad strategy). Creating a separate Node.js server to run WebSockets alongside Next.js could work, but it was an inelegant and overly complex solution for my needs.

Why Not Long Polling?

I briefly experimented with long polling as the simplest alternative. However, triggering child processes from the BO and subscribing to logs from multiple processes simultaneously led to a noticeable drop in FPS and minor freezes. This was exacerbated by the fact that my BO interface was also rendering a heavily virtualized grid with over 300,000 rows and infinite loading, which relied on fetching and filtering data from the backend. Adding long polling risked making the problem worse, so I considered it impractical for real-time communication in this case.

Server-Sent Events: The Final Solution (Finally)

Finally, I discovered Server-Sent Events (SSE; see the MDN reference), which turned out to be exactly what I needed. SSE allows the server to stream real-time updates to the client over a single HTTP connection, making it perfect for my use case.
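On the wire, an SSE stream is plain text: each event is a few `field: value` lines followed by a blank line. A minimal sketch of serializing one task update into that format might look like the following. The `TaskUpdate` shape and the `formatSseEvent` name are assumptions made for illustration.

```typescript
// Hypothetical shape of one update pushed to the back office.
interface TaskUpdate {
  event: string; // e.g. "log", "progress", "done"
  data: unknown; // any JSON-serializable payload
  id?: string; // optional event id
}

// Serializes one update as a single SSE event.
function formatSseEvent(update: TaskUpdate): string {
  const lines: string[] = [];
  if (update.id !== undefined) lines.push(`id: ${update.id}`);
  lines.push(`event: ${update.event}`);
  // JSON.stringify escapes newlines, so the payload fits on one data line.
  lines.push(`data: ${JSON.stringify(update.data)}`);
  // A blank line terminates the event.
  return lines.join("\n") + "\n\n";
}
```

In the browser, events with a custom `event` name are delivered to listeners registered through `addEventListener` on the `EventSource`, while unnamed events arrive via `onmessage`.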

The solution was built with three simple components:

Process Manager: A class that spawns and controls child processes. It handles task execution, pausing, aborting, and message exchange.
API Route: A Next.js route, /api/[task-name], with three methods:
– GET: Streams real-time updates from the task.
– POST: Starts or pauses the task.
– DELETE: Aborts the task.
React hook: Uses the EventSource API to consume real-time updates from the server and calls the REST endpoints above to control the task’s lifecycle (start, pause, abort).
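The GET side of such a route can be sketched with the standard `ReadableStream` and `Response` APIs. This is an illustration under assumptions rather than the code behind this project: `streamTaskUpdates` is a hypothetical helper, and `source` stands in for whatever object relays messages from the child processes.

```typescript
import { EventEmitter } from "node:events";

// Sketch: turn updates emitted on `channel` into an SSE response.
// `source` is an assumed relay for child-process messages.
function streamTaskUpdates(
  source: EventEmitter,
  channel: string,
  signal?: AbortSignal,
): Response {
  const encoder = new TextEncoder();
  let stop: () => void = () => {};

  const body = new ReadableStream<Uint8Array>({
    start(controller) {
      let open = true;
      const onUpdate = (update: unknown) => {
        // One SSE event per update: a data line plus a blank line.
        controller.enqueue(encoder.encode(`data: ${JSON.stringify(update)}\n\n`));
      };
      stop = () => {
        if (!open) return;
        open = false;
        source.off(channel, onUpdate);
      };
      source.on(channel, onUpdate);
      // Stop listening and end the stream when the request is aborted.
      signal?.addEventListener(
        "abort",
        () => {
          if (!open) return;
          stop();
          controller.close();
        },
        { once: true },
      );
    },
    // Called when the consumer cancels the stream.
    cancel() {
      stop();
    },
  });

  return new Response(body, {
    headers: {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache, no-transform",
    },
  });
}
```

In a route file, a response like this could be returned from the exported GET handler with `request.signal` passed in, while the POST and DELETE handlers would call the process manager to start, pause, or abort the task.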

Rakes collected

Web Workers: They’re great for running parallel computations in the browser, such as XML generation, data parsing, or moving third-party scripts off the main thread to speed up page load. However, they’re tightly tied to the lifecycle of the page that creates them. For long-running or server-side tasks, they’re simply not the right fit.
Trigger.dev: This is a fantastic solution for managing background jobs in distributed systems. If I wanted a cloud-native, robust task orchestration platform with retries, monitoring, and a dashboard, Trigger.dev would be ideal. However, although the bug might be fixed soon, its inability to stream real-time logs to my back office made it unsuitable for my requirements. It’s a great choice if you’re running workflows that don’t demand real-time feedback to your own UI.
BullMQ: Excellent at managing queues for tasks like processing large batches, email marketing and other marketing sequences, and running delayed or scheduled jobs with cron. It’s not designed for aborting active jobs or streaming real-time updates to the frontend. Maybe I’ll put it to better use for marketing features like sales funnels.
WebSockets: Maintaining WebSocket servers alongside a Next.js application, especially in cloud environments, feels cumbersome and is often overkill for simpler streaming needs.

After trying all of these, I discovered Server-Sent Events (SSE), a simpler, more elegant solution for my problem. It allowed me to "reinvent the wheel," creating a task management system with real-time streaming updates from child processes. While it’s not the fanciest tool in the box, it’s tailored to my specific needs, and I’m satisfied with the results.