🐳 Docker, Deploying LiteLLM Proxy

You can find the Dockerfile to build litellm proxy here

Quick Start

Basic
With CLI Args
use litellm as a base image

See the latest available ghcr docker image here: https://github.com/berriai/litellm/pkgs/container/litellm

docker pull ghcr.io/berriai/litellm:main-latest

docker run ghcr.io/berriai/litellm:main-latest

Run with LiteLLM CLI args

See all supported CLI args here:

Here's how you can run the docker image and pass your config to litellm

docker run ghcr.io/berriai/litellm:main-latest --config your_config.yaml

Here's how you can run the docker image and start litellm on port 8002 with num_workers=8

docker run ghcr.io/berriai/litellm:main-latest --port 8002 --num_workers 8

# Use the provided base image
FROM ghcr.io/berriai/litellm:main-latest

# Set the working directory to /app
WORKDIR /app

# Copy the configuration file into the container at /app
COPY config.yaml .

# Make sure your entrypoint.sh is executable
RUN chmod +x entrypoint.sh

# Expose the necessary port
EXPOSE 4000/tcp

# Override the CMD instruction with your desired command and arguments
CMD ["--port", "4000", "--config", "config.yaml", "--detailed_debug", "--run_gunicorn"]

Deploy with Database

We maintain a seperate Dockerfile for reducing build time when running LiteLLM proxy with a connected Postgres Database

Dockerfile
Kubernetes
Helm

docker pull docker pull ghcr.io/berriai/litellm-database:main-latest

docker run --name litellm-proxy \
-e DATABASE_URL=postgresql://<user>:<password>@<host>:<port>/<dbname> \
-p 4000:4000 \
ghcr.io/berriai/litellm-database:main-latest

Your OpenAI proxy server is now running on http://0.0.0.0:4000.

Step 1. Create deployment.yaml

   apiVersion: apps/v1
   kind: Deployment
   metadata:
     name: litellm-deployment
   spec:
     replicas: 1
     selector:
       matchLabels:
         app: litellm
     template:
       metadata:
         labels:
           app: litellm
       spec:
         containers:
           - name: litellm-container
             image: ghcr.io/berriai/litellm-database:main-latest
             env:
              - name: DATABASE_URL
                value: postgresql://<user>:<password>@<host>:<port>/<dbname>

kubectl apply -f /path/to/deployment.yaml

Step 2. Create service.yaml

apiVersion: v1
kind: Service
metadata:
  name: litellm-service
spec:
  selector:
    app: litellm
  ports:
    - protocol: TCP
      port: 4000
      targetPort: 4000
  type: NodePort

kubectl apply -f /path/to/service.yaml

Step 3. Start server

kubectl port-forward service/litellm-service 4000:4000

Your OpenAI proxy server is now running on http://0.0.0.0:4000.

Step 1. Clone the repository

git clone https://github.com/BerriAI/litellm.git

Step 2. Deploy with Helm

helm install \
  --set masterkey=SuPeRsEcReT \
  mydeploy \
  deploy/charts/litellm

Step 3. Expose the service to localhost

kubectl \
  port-forward \
  service/mydeploy-litellm \
  8000:8000

Your OpenAI proxy server is now running on http://127.0.0.1:8000.

Advanced Deployment Settings

Customization of the server root path

info

In a Kubernetes deployment, it's possible to utilize a shared DNS to host multiple applications by modifying the virtual service

Customize the root path to eliminate the need for employing multiple DNS configurations during deployment.

👉 Set SERVER_ROOT_PATH in your .env and this will be set as your server root path

Setting SSL Certification

Use this, If you need to set ssl certificates for your on prem litellm proxy

Pass ssl_keyfile_path (Path to the SSL keyfile) and ssl_certfile_path (Path to the SSL certfile) when starting litellm proxy

docker run ghcr.io/berriai/litellm:main-latest \
    --ssl_keyfile_path ssl_test/keyfile.key \
    --ssl_certfile_path ssl_test/certfile.crt

Provide an ssl certificate when starting litellm proxy server

Platform-specific Guide

Google Cloud Run
Render deploy
Railway

Deploy on Google Cloud Run

Click the button to deploy to Google Cloud Run

Testing your deployed proxy

Assuming the required keys are set as Environment Variables

https://litellm-7yjrj3ha2q-uc.a.run.app is our example proxy, substitute it with your deployed cloud run app

curl https://litellm-7yjrj3ha2q-uc.a.run.app/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
     "model": "gpt-3.5-turbo",
     "messages": [{"role": "user", "content": "Say this is a test!"}],
     "temperature": 0.7
   }'

Deploy on Railway https://railway.app

Step 1: Click the button to deploy to Railway

Step 2: Set PORT = 4000 on Railway Environment Variables

Extras

Run with docker compose

Step 1

(Recommended) Use the example file docker-compose.example.yml given in the project root. e.g. https://github.com/BerriAI/litellm/blob/main/docker-compose.example.yml
Rename the file docker-compose.example.yml to docker-compose.yml.

Here's an example docker-compose.yml file

version: "3.9"
services:
  litellm:
    build:
      context: .
        args:
          target: runtime
    image: ghcr.io/berriai/litellm:main-latest
    ports:
      - "8000:8000" # Map the container port to the host, change the host port if necessary
    volumes:
      - ./litellm-config.yaml:/app/config.yaml # Mount the local configuration file
    # You can change the port or number of workers as per your requirements or pass any new supported CLI augument. Make sure the port passed here matches with the container port defined above in `ports` value
    command: [ "--config", "/app/config.yaml", "--port", "8000", "--num_workers", "8" ]

# ...rest of your docker-compose config if any

Step 2

Create a litellm-config.yaml file with your LiteLLM config relative to your docker-compose.yml file.

Check the config doc here

Step 3

Run the command docker-compose up or docker compose up as per your docker installation.

Use -d flag to run the container in detached mode (background) e.g. docker compose up -d

Your LiteLLM container should be running now on the defined port e.g. 8000.

LiteLLM Proxy Performance

LiteLLM proxy has been load tested to handle 1500 req/s.

Throughput - 30% Increase

LiteLLM proxy + Load Balancer gives 30% increase in throughput compared to Raw OpenAI API

Latency Added - 0.00325 seconds

LiteLLM proxy adds 0.00325 seconds latency as compared to using the Raw OpenAI API

🐳 Docker, Deploying LiteLLM Proxy

Quick Start​

Run with LiteLLM CLI args​

Deploy with Database​

Step 1. Create deployment.yaml​

Step 2. Create service.yaml​

Step 3. Start server​

Step 1. Clone the repository​

Step 2. Deploy with Helm​

Step 3. Expose the service to localhost​

Advanced Deployment Settings​

Customization of the server root path​

Setting SSL Certification​

Platform-specific Guide​

Deploy on Google Cloud Run​

Testing your deployed proxy​

Deploy on Render https://render.com/​

Deploy on Railway https://railway.app​

Extras​

Run with docker compose​

LiteLLM Proxy Performance​

Throughput - 30% Increase​

Latency Added - 0.00325 seconds​