HTTP Input Application replicas

Updated on 08 May 2025
1 Minute to read

Print
Dark
Light

Article summary

Did you find this summary helpful?

Thank you for your feedback!

Summary

The HTTP Input application can now scale horizontally through a user-configurable replica count setting. Previously, the replica count was determined automatically based on GPU allocation logic, which was hidden from users. This enhancement introduces an explicit configuration option, improving transparency, scalability, and control.

Change Overview

New Capability:
Users can override the internal scaling logic and explicitly set the number of replicas for HTTP Input applications.
Default Behavior:
The default value for this setting is "Auto", preserving the existing scaling logic.
UI Location:
HTTP → App Settings → Application Configuration
New UI Control:
A dropdown labeled "Number of Replicas" with selectable values:
- Auto
- 1 to 8
Technical Behavior:
- Each replica can handle up to 16 concurrent requests and is based on the number of input subjects.
- As the number of replicas increases, increased CPU memory usage results in contention with other EdgeFlow/CloudFlow subsystems.
- This configuration helps handle high-throughput workloads by allowing horizontal scaling:
  Example: For a system with 4 GPUs and 8 replicas of a model and a single Input subject into the HTTP input app, the number of HTTP input apps should be set to 4 to handle 32 concurrent transactions
- Leaving the value set to “Auto” will allow provisioning the HTTP Input app based on the number of input subjects and the model of the EdgeFlow/CloudFlow.

Was this article helpful?

What's Next

Deployments Quickstart Guide