Train a multimodal chat model that can see and discuss images across multi-round conversations, using DeepSpeed for distributed training. This workflow trains a vision-language model that combines a ...