Async API
The Asynchronous Inference API provides a unified interface for submitting and managing long-running AI tasks, such as chat completions, image generations, and other model inferences, without blocking your application. Tasks are queued and processed in the background, and you fetch the results later using a unique task ID. This architecture suits inference workloads with variable latency and fits naturally into event-driven or serverless systems. Whether you're running compute-intensive models or absorbing bursty traffic, the async API gives you the same submit-and-poll workflow across all supported task types.
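The submit-and-poll flow might look like the following minimal sketch. The endpoint paths, field names, and status values here are illustrative assumptions, not the documented API surface:

```python
import time
import requests

# Assumed base URL and auth scheme for illustration only.
BASE_URL = "https://api.example.com/v1/async"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

# Submit a long-running chat-completion task. The server responds
# immediately with a task ID instead of blocking on the result.
resp = requests.post(
    f"{BASE_URL}/chat/completions",
    headers=HEADERS,
    json={
        "model": "example-model",  # hypothetical model name
        "messages": [{"role": "user", "content": "Hello"}],
    },
)
resp.raise_for_status()
task_id = resp.json()["task_id"]  # assumed response field

# Fetch the result later by polling with the task ID until the
# task leaves the queue. Status values are assumed.
while True:
    task = requests.get(f"{BASE_URL}/tasks/{task_id}", headers=HEADERS).json()
    if task["status"] in ("succeeded", "failed"):
        break
    time.sleep(2)  # fixed-interval polling; use backoff in production

print(task)
```

Because the client holds only a task ID between submission and completion, the same pattern works from a short-lived serverless function: one invocation submits the task, and a later invocation (or a webhook, where supported) retrieves the result.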