Description
Describe the bug
While running the BentoML service with 4 workers (each with 1 thread), it appears that the incoming HTTP requests are not evenly balanced across the worker processes.
I'd like to know if there's any configuration I'm missing.
2025-08-06T06:00:53+0000 [INFO] [entry_service:Predictor:3] 172.16.140.42:43376 (scheme=http,method=POST,path=/classify,type=application/json,length=81) (status=200,type=application/json,length=10) 201.820ms (trace=2cf1120afa437d5ebe8f9792eb3519b0,span=2507eaf9014aba6e,sampled=0,service.name=Predictor)
2025-08-06T06:00:53+0000 [INFO] [entry_service:Predictor:3] 172.16.140.42:43608 (scheme=http,method=POST,path=/classify,type=application/json,length=81) (status=200,type=application/json,length=10) 202.028ms (trace=5565f4c5a624f3cda4c4835b2853f727,span=0e51701b5e614c96,sampled=0,service.name=Predictor)
2025-08-06T06:00:53+0000 [INFO] [entry_service:Predictor:3] 172.16.140.42:43816 (scheme=http,method=POST,path=/classify,type=application/json,length=81) (status=200,type=application/json,length=10) 201.912ms (trace=d475a2d347df94ca4edd3da858d16905,span=7b44259dafc4f72b,sampled=0,service.name=Predictor)
2025-08-06T06:00:53+0000 [INFO] [entry_service:Predictor:3] 172.16.140.42:44090 (scheme=http,method=POST,path=/classify,type=application/json,length=79) (status=200,type=application/json,length=10) 201.699ms (trace=a539c617542d29cb80aa70dc8b2ee42d,span=3d80958ead45d70c,sampled=0,service.name=Predictor)
2025-08-06T06:00:54+0000 [INFO] [entry_service:Predictor:3] 172.16.140.42:44270 (scheme=http,method=POST,path=/classify,type=application/json,length=80) (status=200,type=application/json,length=10) 201.748ms (trace=fdb5dc2c2773f0d87a2ff20acdd45eb3,span=d83e432d948c33dc,sampled=0,service.name=Predictor)
2025-08-06T06:00:54+0000 [INFO] [entry_service:Predictor:3] 172.16.140.42:44538 (scheme=http,method=POST,path=/classify,type=application/json,length=80) (status=200,type=application/json,length=10) 201.562ms (trace=dcdaccdab55fec02b2aff73d8227d733,span=9f3ab4fe948da5df,sampled=0,service.name=Predictor)
2025-08-06T06:00:54+0000 [INFO] [entry_service:Predictor:3] 172.16.140.42:44716 (scheme=http,method=POST,path=/classify,type=application/json,length=79) (status=200,type=application/json,length=10) 201.492ms (trace=84fc319d34bfa9337a2eb08314015323,span=d0b1e9ead2bb78eb,sampled=0,service.name=Predictor)
2025-08-06T06:00:54+0000 [INFO] [entry_service:Predictor:3] 172.16.140.42:44962 (scheme=http,method=POST,path=/classify,type=application/json,length=78) (status=200,type=application/json,length=10) 201.729ms (trace=ae1647b92484b84a80ba7cee49d9667c,span=537587c62b086b50,sampled=0,service.name=Predictor)
2025-08-06T06:00:54+0000 [INFO] [entry_service:Predictor:3] 172.16.140.42:45190 (scheme=http,method=POST,path=/classify,type=application/json,length=80) (status=200,type=application/json,length=10) 201.974ms (trace=ac90d9510dc60c5162309c900911d846,span=a1842bae213272ce,sampled=0,service.name=Predictor)
2025-08-06T06:00:55+0000 [INFO] [entry_service:Predictor:3] 172.16.140.42:45382 (scheme=http,method=POST,path=/classify,type=application/json,length=79) (status=200,type=application/
...
Here are the simple statistics: as the [entry_service:Predictor:3] tag in every log line above shows, all of the requests were handled by a single worker.
Full logs are attached here.
bentoml_test_server.log
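For anyone who wants to reproduce the tally from the attached log, here is a minimal sketch (not part of the original report) that counts requests per worker by parsing the [entry_service:Predictor:N] tag shown in the lines above; the log file name and tag format are assumed from this issue:

# tally_workers.py - count requests per worker from the attached access log.
# Assumes the "[entry_service:Predictor:N]" tag format shown above.
import re
from collections import Counter

WORKER_TAG = re.compile(r"\[entry_service:Predictor:(\d+)\]")

def tally(log_path: str) -> Counter:
    counts = Counter()
    with open(log_path) as f:
        for line in f:
            match = WORKER_TAG.search(line)
            if match:
                counts[int(match.group(1))] += 1
    return counts

if __name__ == "__main__":
    for worker, n in sorted(tally("bentoml_test_server.log").items()):
        print(f"worker {worker}: {n} requests")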
Expected behavior
All requests should be evenly distributed among the workers.
To reproduce
- Prepare a simple BentoML service class.

# service3.py
import bentoml
import logging
import time

bentoml_logger = logging.getLogger("bentoml")

@bentoml.service(workers=4, threads=1)
class Predictor:
    def __init__(self):
        pass

    @bentoml.api
    def classify(self, input_ids: list[list[int]]) -> list[float]:
        """
        input_ids example:
        [[82, 13, 59, 45, 97, 36, 74, 6, 91, 12, 33, 19, 77, 68, 40, 50]]
        """
        time.sleep(0.2)
        return [0.1, 0.2]

- Run the BentoML service.

$ bentoml serve service3:Predictor

- Check that all the workers are running via htop.
- Prepare client code to generate HTTP requests.

import numpy as np
import requests
import time

def classify_input_ids():
    input_ids = np.random.randint(0, 100, (1, 16)).tolist()
    response = requests.post(
        "http://bentoml-test-server:3000/classify",
        json={"input_ids": input_ids},
        headers={
            "accept": "text/plain",
            "Content-Type": "application/json",
            "Connection": "close"
        }
    )
    print("Status Code:", response.status_code)
    print("Response:", response.text)

def run_for_duration(seconds: int):
    end_time = time.time() + seconds
    count = 0
    while time.time() < end_time:
        classify_input_ids()
        count += 1
    print(f"Sent {count} requests in total.")

if __name__ == "__main__":
    duration = int(input("Enter the duration to send requests (in seconds): "))
    run_for_duration(duration)

- Run the client code (a concurrent variant is sketched after the logs below).
$ python3 bento_request_en.py
Enter the duration to send requests (in seconds): 180
...

- Check the logs of the BentoML service.
2025-08-06T06:00:54+0000 [INFO] [entry_service:Predictor:3] 172.16.140.42:44538 (scheme=http,method=POST,path=/classify,type=application/json,length=80) (status=200,type=application/json,length=10) 201.562ms (trace=dcdaccdab55fec02b2aff73d8227d733,span=9f3ab4fe948da5df,sampled=0,service.name=Predictor)
2025-08-06T06:00:54+0000 [INFO] [entry_service:Predictor:3] 172.16.140.42:44716 (scheme=http,method=POST,path=/classify,type=application/json,length=79) (status=200,type=application/json,length=10) 201.492ms
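As referenced in the repro steps above, here is a minimal concurrent variant of the client (a sketch, not part of the original report) that sends requests from several threads at once. This can help distinguish "sequential requests on fresh connections stick to one worker" from "concurrent load is also not balanced". The endpoint URL and payload shape are taken from the client code above:

# concurrent_client.py - fire requests in parallel, then inspect the server
# logs to see which workers handled them. Assumes the same endpoint and
# payload as the client code above.
from concurrent.futures import ThreadPoolExecutor
import numpy as np
import requests

URL = "http://bentoml-test-server:3000/classify"

def send_one(_: int) -> int:
    input_ids = np.random.randint(0, 100, (1, 16)).tolist()
    response = requests.post(
        URL,
        json={"input_ids": input_ids},
        headers={"Content-Type": "application/json", "Connection": "close"},
    )
    return response.status_code

if __name__ == "__main__":
    # 16 in-flight requests against 4 workers; with even balancing each
    # worker should log roughly a quarter of the total.
    with ThreadPoolExecutor(max_workers=16) as pool:
        statuses = list(pool.map(send_one, range(64)))
    print("completed:", len(statuses), "ok:", statuses.count(200))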
Environment
bentoml: 1.4.19
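If it helps triage, the full environment can be dumped with the CLI (assuming the bentoml env subcommand is available in this version):

$ bentoml env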