This work is based on a large patch by @vaiju1981 that proposed OpenAI-compatible chat completions and JSON-in/JSON-out endpoints for the java-llama.cpp project. The patch was reimplemented from scratch against the current codebase (llama.cpp b8611) with significant improvements.
**CI status:** all 16/16 jobs green — macOS 14 (Metal), macOS 15 (Metal and no-Metal), Ubuntu, Windows (x86 and x86_64), Android, Linux aarch64, manylinux, CUDA.
## What was implemented (14 commits)

### Phase 1-2: Chat Completions (core feature)
| Method | Description |
| --- | --- |
| `chatComplete(InferenceParameters)` | Blocking OpenAI-compatible chat completion with automatic template application |
| `generateChat(InferenceParameters)` | Streaming chat completion via `LlamaIterator` |
| `handleChatCompletions(String)` | Native JSON-in/JSON-out chat endpoint |
| `requestChatCompletion(String)` | Native streaming chat (returns a task ID) |
### Phase 3: JNI Simplification

| Change | Description |
| --- | --- |
| `receiveCompletionJson` | Returns a JSON string instead of constructing a `LlamaOutput` via JNI |
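The practical effect of `receiveCompletionJson` returning a plain JSON string is that the result can be parsed on the Java side instead of being assembled field-by-field through JNI object construction. The sketch below shows that Java-side step under an assumed response shape; the field names and the helper `extractField` are illustrative, not the library's API.

```java
// Illustrates consuming a JSON string returned across the JNI boundary, as a
// receiveCompletionJson-style method would provide. The response shape here
// ({"content": ..., "stop": ...}) is an assumption for illustration.
public class CompletionJsonSketch {

    // Extracts a top-level string field by plain scanning. This ignores
    // escaped quotes inside values; real code would use a JSON library
    // such as Jackson or Gson.
    public static String extractField(String json, String field) {
        String key = "\"" + field + "\":\"";
        int start = json.indexOf(key);
        if (start < 0) return null;
        start += key.length();
        int end = json.indexOf('"', start);
        return end < 0 ? null : json.substring(start, end);
    }

    public static void main(String[] args) {
        String json = "{\"content\":\"Hello!\",\"stop\":true}";
        System.out.println(extractField(json, "content")); // prints Hello!
    }
}
```

Parsing in Java is both simpler and safer than building `LlamaOutput` natively: JNI calls like `NewObject` and `GetMethodID` are verbose, easy to get wrong, and costly per field, while a single string crossing the boundary is one call.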
The original patch by @vaiju1981 is now fully superseded: all of its functionality has been reimplemented with improvements, comprehensive tests, and proper thread safety.