Build a Playground with AI SDK
A tutorial on core SDK and AI SDK by Vercel
This tutorial will show you how to build a playground using the lybic core SDK with AI SDK by Vercel.
Lybic core SDK is a library that allows you to interact with the Lybic API from TypeScript. The AI SDK is the TypeScript toolkit designed to help developers build AI-powered applications and agents with React, Next.js, Vue, Svelte, Node.js, and more.
By following this tutorial, you will learn how to:
- Use the lybic core SDK to interact with the Lybic API
- Benefit from Lybic's easy-to-use Sandbox environment
- Use the AI SDK to build a ready-to-use playground of Computer Use Agent
Setup
As with a typical playground, you should split the app into a server side and a client side. You can choose any server-side or full-stack framework you like; we will not cover framework details here.
In this tutorial, we combine the server side and the client side into a single frontend-only React SPA for convenience. This exposes your credentials to the client side and is heavily discouraged in production; however, it is easy to transform the app into a full-stack app later.
You can install the lybic core SDK, AI SDK and some supporting libraries in your project:
npm install @lybic/core ai @ai-sdk/openai-compatible @ai-sdk/react
yarn add @lybic/core ai @ai-sdk/openai-compatible @ai-sdk/react
pnpm install @lybic/core ai @ai-sdk/openai-compatible @ai-sdk/react
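The model provider configured later in this tutorial reads LLM_BASE_URL and LLM_API_KEY from the environment. A sample .env (both values are placeholders; verify the base URL and key against your own VolcEngine console and region):

```shell
LLM_BASE_URL=https://ark.cn-beijing.volces.com/api/v3
LLM_API_KEY=your-api-key
```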
Break down the Computer Use Agent execution loop
In a very trimmed-down version of the Computer Use Agent execution loop, the agent will:
- Get the user prompt
- Take a screenshot
- Combine the user prompt and screenshot into one user-role message and add it to the history
- Call the LLM with the history constructed so far
- Parse the LLM response
- Check whether there are any actions / tool calls
- If there are, execute them
- On success or failure, update the corresponding UI
- If a user take-over is required, ask the user to take over
- If there are none, break and report an error
- Continue back to step 2 (take a screenshot)
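The loop above can be sketched in TypeScript as follows. This is a minimal sketch with stubbed dependencies: takeScreenshot, callLlm, parseActions, and executeAction are hypothetical placeholders, not Lybic or AI SDK APIs.

```typescript
type Action = { kind: string }

// Run the trimmed-down Computer Use Agent loop until the task finishes,
// the user is asked to take over, or the step budget runs out.
async function runAgentLoop(prompt: string, maxSteps = 5): Promise<string> {
  const history: string[] = []
  for (let step = 0; step < maxSteps; step++) {
    // step 2: take a screenshot
    const screenshot = await takeScreenshot()
    // step 3: combine prompt and screenshot into one user-role message
    history.push(`${prompt}\n[screenshot:${screenshot}]`)
    // step 4: call the LLM with the history so far
    const reply = await callLlm(history)
    // steps 5-6: parse the response and check for actions
    const actions = parseActions(reply)
    if (actions.length === 0) return 'no actions parsed'
    for (const action of actions) {
      if (action.kind === 'finished') return 'task finished'
      if (action.kind === 'call_user') return 'user take-over requested'
      await executeAction(action)
    }
  }
  return 'max steps reached'
}

// stubs so the sketch runs standalone
async function takeScreenshot(): Promise<string> { return 'img-1' }
async function callLlm(_history: string[]): Promise<string> { return 'Action: finished()' }
function parseActions(reply: string): Action[] {
  return reply.includes('finished') ? [{ kind: 'finished' }] : []
}
async function executeAction(_action: Action): Promise<void> {}
```

The rest of the tutorial fills these placeholders in with real Lybic and AI SDK calls.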
Create a new model provider
We will use Ark from VolcEngine as our model provider. doubao-1.5-ui-tars
is the SOTA model for GUI grounding tasks and was developed by them.
Using the following snippet, you can create a new model provider for AI SDK and enable your transport to use it.
import { createOpenAICompatible } from '@ai-sdk/openai-compatible'
import { customProvider, wrapLanguageModel } from 'ai'
// create an OpenAI-compatible model provider
const ark = createOpenAICompatible({
baseURL: process.env.LLM_BASE_URL!,
apiKey: process.env.LLM_API_KEY!,
name: 'ark',
includeUsage: true,
})
// wrap the ark models in a custom provider
const arkProvider = customProvider({
languageModels: {
'doubao-1-5-ui-tars-250428': wrapLanguageModel({
model: ark('doubao-1-5-ui-tars-250428'),
middleware: [],
}),
'doubao-1-5-thinking-vision-pro-250428': wrapLanguageModel({
model: ark('doubao-1-5-thinking-vision-pro-250428'),
middleware: [],
}),
},
})
// we only use the languageModel accessor
export const arkModel = arkProvider.languageModel
You can add as many models as you want; for now, we are focusing on these two models.
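If your UI later offers a model selector, one way to guard the chosen id is a small validation helper. This is a sketch; pickModelId is a hypothetical name, not part of the AI SDK.

```typescript
// The model ids registered with the custom provider above.
const registeredModels = [
  'doubao-1-5-ui-tars-250428',
  'doubao-1-5-thinking-vision-pro-250428',
]

// Map a UI-selected model id to a registered one,
// falling back to the grounding model.
function pickModelId(requested?: string): string {
  return requested && registeredModels.includes(requested)
    ? requested
    : 'doubao-1-5-ui-tars-250428'
}
```

You would then call arkModel(pickModelId(selection)) instead of hard-coding the model id.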
Building the server side transport
The useChat transport system provides fine-grained control over how messages are sent to your API endpoints and how responses are processed. This is particularly useful for specialized backend integrations like ours.
First, we need to create our own transport. Check out the Building Custom Transports documentation of the AI SDK for more details.
This is a tutorial-only example; you should not use it in production.
In real production code, you would extract the code from the sendMessages function into your response handler and pass the response stream to the client side; the client side would then not need any custom transport.
Here, for a quick frontend-only prototype, we make all requests on the client side using the transport we created.
import { LybicClient } from '@lybic/core'
import {
ChatRequestOptions,
ChatTransport,
convertToModelMessages,
createUIMessageStream,
streamText,
UIMessage,
UIMessageChunk,
} from 'ai'
import { arkModel } from './ark-provider'
export class LybicChatTransport implements ChatTransport<UIMessage> {
public constructor(private readonly coreClient: LybicClient) {}
public async sendMessages(
options: {
trigger: 'submit-message' | 'regenerate-message'
chatId: string
messageId: string | undefined
messages: UIMessage[]
} & ChatRequestOptions,
): Promise<ReadableStream<UIMessageChunk>> {
// create a UI message stream
const stream = createUIMessageStream<UIMessage>({
execute: async ({ writer }) => {
// convert the client side UI messages to server side messages
const modelMessages = convertToModelMessages(options.messages)
const systemPrompt = guiAgentUiTarsPromptEn
const coreClient = this.coreClient
// call the LLM
const result = streamText({
model: arkModel('doubao-1-5-ui-tars-250428'),
system: systemPrompt,
messages: modelMessages,
})
// stream the LLM response to the client side
writer.merge(result.toUIMessageStream())
},
})
return stream
}
public async reconnectToStream(
options: { chatId: string } & ChatRequestOptions,
): Promise<ReadableStream<UIMessageChunk> | null> {
return null
}
}
// the system prompt for ui-tars model
const guiAgentUiTarsPromptEn = `## Role
You are a GUI Agent, proficient in the operation of various commonly used software on Windows, Linux, and other operating systems.
Please complete the user's task based on user input, history Action, and screen shots.
You need to complete the entire task step by step, and output only one Action at a time, please strictly follow the format below.
## Output Format
Action_Summary: ... // Please make sure use English in this part.
Action: ...
Please strictly use the prefix "Action_Summary:" and "Action:".
Please use English in Action_Summary and use function calls in Action.
## Action Format
### click(start_box='<bbox>left_x top_y right_x bottom_y</bbox>')
### left_double(start_box='<bbox>left_x top_y right_x bottom_y</bbox>')
### right_click(start_box='<bbox>left_x top_y right_x bottom_y</bbox>')
### drag(start_box='<bbox>left_x top_y right_x bottom_y</bbox>', end_box='<bbox>left_x top_y right_x bottom_y</bbox>')
### type(content='content') // If you want to submit your input, next action use hotkey(key='enter')
### hotkey(key='key')
### scroll(direction:Enum[up,down,left,right]='direction',start_box='<bbox>left_x top_y right_x bottom_y</bbox>')
### wait()
### finished()
### call_user() // Submit the task and call the user when the task is unsolvable, or when you need the user's help.
### output(content='content') // It is only used when the user specifies to use output, and after output is executed, it cannot be executed again.
`
In the example above, we created a new transport that uses the ark model provider and exposes many points where we can inject our own logic. After the LLM has responded, we will parse the output and execute the actions.
Now, we can use the useChat hook to build a simple frontend.
Building the chatbot UI
You can follow the Chatbot documentation of the AI SDK to build a simple chatbot UI; we will not cover the details here.
For a minimal example, you can just create a new transport and pass it to the useChat hook.
import { useChat } from '@ai-sdk/react'
const chat = useChat({
  // the transport's constructor takes the LybicClient instance directly
  transport: new LybicChatTransport(client),
})
Integrate with Lybic Sandbox
Now you have a simple chatbot, but it cannot interact with any computer yet. We need to integrate with the Lybic Sandbox to make it work.
Create a new sandbox
You should create a new sandbox on your server side and get the sandbox id.
import { LybicClient } from '@lybic/core'
// create a new client
const client = new LybicClient({
baseUrl: 'your-base-url',
orgId: 'your-org-id',
apiKey: 'your-api-key',
})
// create a new sandbox
const { data: sandbox } = await client.createSandbox({
name: 'My Sandbox',
maxLifeSeconds: 60 * 60 * 24 * 30,
})
Live Stream to the sandbox
Check out the Lybic UI SDK documentation to learn how to connect a live stream to the sandbox.
Take screenshots of the sandbox
While sending messages to the LLM, you should always take a screenshot and combine it with the user prompt or history.
Add the following code to your transport:
const coreClient = this.coreClient
// take a screenshot of the sandbox
const response = await coreClient.previewSandbox(sandboxId)
const preview = response.data!
// get the last message
const lastMessageId = options.messages[options.messages.length - 1]?.id
const lastMessage = modelMessages[modelMessages.length - 1]
// append the screenshot to the last message
lastMessage.content = [
{
type: 'text',
text: lastMessage.content,
},
// adds the screenshot to the last message
{
type: 'file',
mediaType: 'image/webp',
data: new URL(preview.screenShot!),
}
]
You may also want to download the screenshot and convert it into a base64 data URL, as some providers may not support https URLs.
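That conversion could look like the following sketch. bytesToDataUrl and toDataUrl are illustrative helpers, not Lybic APIs, and Buffer assumes a Node.js runtime.

```typescript
// Re-encode raw image bytes as a base64 data URL.
function bytesToDataUrl(bytes: Buffer, mediaType: string): string {
  return `data:${mediaType};base64,${bytes.toString('base64')}`
}

// Download the screenshot and convert it, for providers that reject
// remote https URLs in image file parts.
async function toDataUrl(url: string, mediaType = 'image/webp'): Promise<string> {
  const res = await fetch(url)
  if (!res.ok) throw new Error(`failed to download screenshot: ${res.status}`)
  return bytesToDataUrl(Buffer.from(await res.arrayBuffer()), mediaType)
}
```

You would then pass the resulting string as the data of the file part instead of new URL(preview.screenShot!).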
Parse LLM response and execute actions
Add an onFinish handler to the streamText call, and use parseLlmOutput to parse the LLM response.
Once the actions are parsed, we stream them to the client side and execute them on the sandbox.
onFinish: async (message) => {
// parse the LLM response
const { data: parsedAction } = await coreClient.parseLlmOutput({
model: 'ui-tars',
textContent: message.text,
})
if (parsedAction?.actions && parsedAction.actions.length > 0) {
// stream the parsed actions to the client side
writer.write({
type: 'data-parsed',
data: {
actions: parsedAction.actions,
text: [parsedAction.thoughts, parsedAction.unknown].filter(Boolean).join('\n'),
},
})
// execute the actions on the sandbox
for (const action of parsedAction.actions) {
await coreClient.executeComputerUseAction(sandboxId, {
action,
includeScreenShot: false,
includeCursorPosition: false,
})
}
}
},
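For illustration only, the Action_Summary / Action format defined in the system prompt could be split locally with a rough sketch like the one below; in this tutorial we rely on coreClient.parseLlmOutput instead, which also converts actions into an executable form.

```typescript
// Split a ui-tars style response into its summary and action parts.
function splitUiTarsOutput(text: string): { summary: string; action: string } {
  // capture everything between "Action_Summary:" and the "Action:" line
  const summary =
    text.match(/Action_Summary:\s*([\s\S]*?)(?=\nAction:|$)/)?.[1]?.trim() ?? ''
  // capture the single-line action call after "Action:"
  const action = text.match(/Action:\s*(.*)/)?.[1]?.trim() ?? ''
  return { summary, action }
}
```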
Handling updates on the client side
Once you have written some data from the server side with writer.write, you can use the onData callback of the useChat hook to handle the updates.
onData: (data) => {
if (data.type === 'data-screenShot') {
chat.setMessages(
// `produce` comes from the immer library
produce((messages) => {
const messageIndex = messages.findIndex((m) => m.id === data.data.messageId)
if (messageIndex === -1) {
return messages
}
const message = messages[messageIndex]
message?.parts.push({
type: 'file',
mediaType: 'image/webp',
url: data.data.url,
})
return messages
}),
)
}
},
See Streaming Custom Data in AI SDK documentation for more details.
Conclusion
You have learned how to build a playground with the AI SDK and the Lybic core SDK. You can now use this playground to build your own AI-powered applications.
We have open-sourced our playground on GitHub in the lybic/typescript repository; you can find the complete code there.