Serve Qwen 3 with vLLM
Run Qwen3-8B as an OpenAI-compatible API. Query it from your local machine or plug it into any app that speaks the OpenAI format.cassian.yaml
Text-to-Speech with Kokoro
Run Kokoro TTS locally and generate speech from text. Great for building voice apps or generating training data.cassian.yaml
serve_tts.py:
Transcribe audio with Whisper
Run Whisper large-v3-turbo for fast audio transcription.cassian.yaml
serve_whisper.py:
Fine-tune Qwen with LoRA
Train a LoRA adapter on your own dataset. Model weights cache in cloud storage so you don’t re-download on restart.cassian.yaml
checkpoints/persists across sessions but doesn’t sync locally- Model weights in
/workspace/storagesurvive restarts without eating disk wandb/is excluded since W&B syncs to their own cloud
Image generation with FLUX
Serve FLUX.1-schnell for fast image generation.cassian.yaml
serve_flux.py:
Multi-GPU distributed training
Scale to multiple GPUs withtorchrun.
cassian.yaml
Jupyter on a GPU
Run notebooks with full CUDA access.cassian.yaml