Lybic Docs

Build a Playground with AI SDK

A tutorial on the Lybic core SDK and the AI SDK by Vercel

This tutorial will show you how to build a playground using the Lybic core SDK together with the AI SDK by Vercel.

The Lybic core SDK is a TypeScript library for interacting with the Lybic API. The AI SDK is the TypeScript toolkit designed to help developers build AI-powered applications and agents with React, Next.js, Vue, Svelte, Node.js, and more.

By following this tutorial, you will learn how to:

  • Use the Lybic core SDK to interact with the Lybic API
  • Benefit from Lybic's easy-to-use Sandbox environment
  • Use the AI SDK to build a ready-to-use Computer Use Agent playground

Setup

For a production playground, you should split the app into a server side and a client side. You can choose any server-side or full-stack framework you like, as we will not cover framework details.

In this tutorial, we will combine the server side and the client side into a single frontend-only React SPA for convenience. This exposes your credentials to the client and is heavily discouraged in production; however, it is easy to transform the app into a full-stack app later.

You can install the lybic core SDK, AI SDK and some supporting libraries in your project:

npm install @lybic/core ai @ai-sdk/openai-compatible @ai-sdk/react
yarn add @lybic/core ai @ai-sdk/openai-compatible @ai-sdk/react
pnpm add @lybic/core ai @ai-sdk/openai-compatible @ai-sdk/react

Break down the Computer Use Agent execution loop

In a very trimmed-down version of the Computer Use Agent execution loop, the agent will (see the sketch after this list):

  1. Get the user prompt
  2. Take a screenshot
  3. Combine the user prompt and the screenshot into one user-role message and add it to the history
  4. Call the LLM with the history constructed so far
  5. Parse the LLM response
  6. Check whether there are any actions / tool calls
    1. If there are, execute them
    2. On success or failure, update the corresponding UI
    3. If a user takeover is required, ask the user to take over
    4. If there are none, break the loop and report an error
  7. Go back to step 2
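
A minimal sketch of this loop in TypeScript might look like the following. The helpers (takeScreenshot, callLlm, parseResponse, executeAction) are illustrative placeholders, not real Lybic or AI SDK calls:

// illustrative placeholders, not real SDK functions
type AgentAction = { type: string; [key: string]: unknown }
declare function takeScreenshot(): Promise<string>
declare function callLlm(history: unknown[]): Promise<string>
declare function parseResponse(response: string): AgentAction[]
declare function executeAction(action: AgentAction): Promise<void>

async function runAgentLoop(task: string) {
  const history: unknown[] = []
  while (true) {
    const screenshot = await takeScreenshot() // step 2
    history.push({ role: 'user', content: [task, screenshot] }) // step 3
    const response = await callLlm(history) // step 4
    const actions = parseResponse(response) // step 5
    if (actions.length === 0) {
      throw new Error('LLM returned no action') // step 6.4
    }
    for (const action of actions) {
      await executeAction(action) // step 6.1
    }
    if (actions.some((action) => action.type === 'finished')) {
      break // the task is done
    }
    // otherwise continue back to step 2
  }
}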

Create a new model provider

We will use Ark from VolcEngine as our model provider. doubao-1.5-ui-tars, developed by VolcEngine, is a SOTA model for GUI grounding tasks.

Using the following snippet, you can create a new model provider for the AI SDK and let your transport use it.

ark-provider.ts
import { createOpenAICompatible } from '@ai-sdk/openai-compatible'
import { customProvider, wrapLanguageModel } from 'ai'

// create an OpenAI-compatible model provider
const ark = createOpenAICompatible({
  baseURL: process.env.LLM_BASE_URL!,
  apiKey: process.env.LLM_API_KEY!,
  name: 'ark',
  includeUsage: true,
})

// use the ark model provider to create a new custom provider
const arkProvider = customProvider({
  languageModels: {
    'doubao-1-5-ui-tars-250428': wrapLanguageModel({
      model: ark('doubao-1-5-ui-tars-250428'),
      middleware: [],
    }),
    'doubao-1-5-thinking-vision-pro-250428': wrapLanguageModel({
      model: ark('doubao-1-5-thinking-vision-pro-250428'),
      middleware: [],
    }),
  },
})

// we only use languageModel
export const arkModel = arkProvider.languageModel

You can add as many models as you want; for now, we will focus on these two models.

Building the server-side transport

The useChat transport system provides fine-grained control over how messages are sent to your API endpoints and how responses are processed. This is particularly useful for our specialized backend integrations.

First, we need to create our own transport. Check out the Building Custom Transports documentation of the AI SDK for more details.

This is a tutorial-only example; do not use it in production.

In production, you would extract the code from the sendMessages function into your server-side response handler and pass the response stream to the client. The client side then does not need any custom transport.
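
For reference, a production setup might look roughly like the sketch below, assuming a Next.js route handler (the route path and request shape are up to your framework):

// app/api/chat/route.ts (sketch only)
import { convertToModelMessages, createUIMessageStream, createUIMessageStreamResponse, streamText } from 'ai'
import { arkModel } from './ark-provider'

export async function POST(request: Request) {
  const { messages } = await request.json()

  const stream = createUIMessageStream({
    execute: async ({ writer }) => {
      const result = streamText({
        model: arkModel('doubao-1-5-ui-tars-250428'),
        messages: convertToModelMessages(messages),
      })
      // stream the LLM response into the UI message stream
      writer.merge(result.toUIMessageStream())
    },
  })

  // the default useChat transport on the client can consume this response
  return createUIMessageStreamResponse({ stream })
}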

Here, for a quick frontend-only prototype, we will make all requests from the client side using the transport we created.

lybic-chat-transport.ts
import { LybicClient } from '@lybic/core'
import {
  ChatRequestOptions,
  ChatTransport,
  convertToModelMessages,
  createUIMessageStream,
  streamText,
  UIMessage,
  UIMessageChunk,
} from 'ai'
import { arkModel } from './ark-provider'

export class LybicChatTransport implements ChatTransport<UIMessage> {
  public constructor(private readonly coreClient: LybicClient) {}

  public async sendMessages(
    options: {
      trigger: 'submit-message' | 'regenerate-message'
      chatId: string
      messageId: string | undefined
      messages: UIMessage[]
    } & ChatRequestOptions,
  ): Promise<ReadableStream<UIMessageChunk>> {
    // create a UI message stream
    const stream = createUIMessageStream<UIMessage>({
      execute: async ({ writer }) => {
        // convert the client side UI messages to server side messages
        const modelMessages = convertToModelMessages(options.messages)
        const systemPrompt = guiAgentUiTarsPromptEn
        const coreClient = this.coreClient

        // call the LLM
        const result = streamText({
          model: arkModel('doubao-1-5-ui-tars-250428'),
          system: systemPrompt,
          messages: modelMessages,
        })

        // stream the LLM response to the client side
        writer.merge(result.toUIMessageStream())
      },
    })

    return stream
  }

  public async reconnectToStream(
    options: { chatId: string } & ChatRequestOptions,
  ): Promise<ReadableStream<UIMessageChunk> | null> {
    return null
  }
}

// the system prompt for the ui-tars model
const guiAgentUiTarsPromptEn = `## Role
You are a GUI Agent, proficient in the operation of various commonly used software on Windows, Linux, and other operating systems.
Please complete the user's task based on user input, history Action, and screen shots.
You need to complete the entire task step by step, and output only one Action at a time, please strictly follow the format below.

## Output Format
Action_Summary: ... // Please make sure use English in this part.
Action: ...

Please strictly use the prefix "Action_Summary:" and "Action:".
Please use English in Action_Summary and use function calls in Action.

## Action Format
### click(start_box='<bbox>left_x top_y right_x bottom_y</bbox>')
### left_double(start_box='<bbox>left_x top_y right_x bottom_y</bbox>')
### right_click(start_box='<bbox>left_x top_y right_x bottom_y</bbox>')
### drag(start_box='<bbox>left_x top_y right_x bottom_y</bbox>', end_box='<bbox>left_x top_y right_x bottom_y</bbox>')
### type(content='content') // If you want to submit your input, next action use hotkey(key='enter')
### hotkey(key='key')
### scroll(direction:Enum[up,down,left,right]='direction',start_box='<bbox>left_x top_y right_x bottom_y</bbox>')
### wait()
### finished()
### call_user() // Submit the task and call the user when the task is unsolvable, or when you need the user's help.
### output(content='content') // It is only used when the user specifies to use output, and after output is executed, it cannot be executed again.
`

In the example above, we created a new transport that uses the ark model provider and exposes several points where we can inject our own logic. After the LLM has responded, we will parse the output and execute the actions.

Now, we can use the useChat hook to build a simple frontend.

Building the chatbot UI

You can follow the Chatbot documentation of the AI SDK to build a simple chatbot UI; we will not cover the details here.

For a minimal example, you can just create a new transport and pass it to the useChat hook.

import { useChat } from '@ai-sdk/react'

const chat = useChat({
  transport: new LybicChatTransport(client),
})
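
For instance, a bare-bones conversation component could look like this sketch; it assumes AI SDK v5's useChat return shape and a LybicClient instance named client in scope:

import { useState } from 'react'
import { useChat } from '@ai-sdk/react'

function Conversation() {
  const [input, setInput] = useState('')
  const { messages, sendMessage } = useChat({
    transport: new LybicChatTransport(client),
  })

  return (
    <div>
      {/* render the text parts of each message */}
      {messages.map((message) => (
        <div key={message.id}>
          {message.parts.map((part, index) =>
            part.type === 'text' ? <span key={index}>{part.text}</span> : null,
          )}
        </div>
      ))}
      {/* send the user prompt as a new message */}
      <form
        onSubmit={(event) => {
          event.preventDefault()
          sendMessage({ text: input })
          setInput('')
        }}
      >
        <input value={input} onChange={(event) => setInput(event.target.value)} />
      </form>
    </div>
  )
}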

Integrate with Lybic Sandbox

Now you'll have a simple chatbot, but it cannot interact with any computer. We need to integrate with Lybic Sandbox to make it work.

Create a new sandbox

You should create a new sandbox on your server side and get the sandbox id.

server.ts
import { LybicClient } from '@lybic/core'

// create a new client
const client = new LybicClient({
  baseUrl: 'your-base-url',
  orgId: 'your-org-id',
  apiKey: 'your-api-key',
})

// create a new sandbox
const { data: sandbox } = await client.createSandbox({
  name: 'My Sandbox',
  maxLifeSeconds: 60 * 60 * 24 * 30,
})
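
Keep the sandbox id around, as the transport will need it later to take screenshots and execute actions. The id field name below is an assumption; check the createSandbox response shape in the @lybic/core reference:

// assumed field name on the createSandbox response
const sandboxId = sandbox.id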

Live Stream to the sandbox

Get started with Lybic UI SDK

Check out the Lybic UI SDK to learn how to connect a live stream to the sandbox.

Take screenshots of the sandbox

Whenever you send messages to the LLM, you should take a fresh screenshot and combine it with the user prompt or the history.

Add the following code to your transport:

lybic-chat-transport.ts
const coreClient = this.coreClient

// take a screenshot of the sandbox
// NOTE: the method and field names (previewSandboxScreenShot, webpUrl) are assumptions;
// check the @lybic/core reference for the exact screenshot API
const result = await coreClient.previewSandboxScreenShot(sandboxId)
const screenShot = result.data!

// get the parts of the last UI message and the last model message
const lastUiParts = options.messages[options.messages.length - 1]?.parts
const lastMessage = modelMessages[modelMessages.length - 1]

// rebuild the last message content: keep the user text, then append the screenshot
lastMessage.content = [
  {
    type: 'text',
    text: lastUiParts?.flatMap((part) => (part.type === 'text' ? [part.text] : [])).join('\n') ?? '',
  },
  // add the screenshot to the last message
  {
    type: 'file',
    mediaType: 'image/webp',
    data: new URL(screenShot.webpUrl!),
  },
]

You may also want to download the screenshot and convert it into a base64 data URL, as some providers may not support HTTPS URLs.
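
For example, a browser-side conversion could look like this sketch, using the standard fetch and FileReader APIs (the helper name is ours):

// download an image and convert it into a base64 data URL
async function toDataUrl(url: string): Promise<string> {
  const blob = await (await fetch(url)).blob()
  return await new Promise<string>((resolve, reject) => {
    const reader = new FileReader()
    reader.onload = () => resolve(reader.result as string)
    reader.onerror = () => reject(reader.error)
    reader.readAsDataURL(blob)
  })
}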

Parse LLM response and execute actions

Add an onFinish handler to the streamText call, and use parseLlmOutput to parse the LLM response.

Once the actions are parsed, we stream them to the client side and execute them on the sandbox.

lybic-chat-transport.ts
  onFinish: async (event) => {
    // parse the LLM response
    // NOTE: the input field names below are assumptions; check the
    // parseLlmOutput signature in the @lybic/core reference
    const { data: parsed } = await coreClient.parseLlmOutput({
      model: 'ui-tars',
      textContent: event.text,
    })

    if (parsed?.actions && parsed.actions.length > 0) {
      // stream the parsed actions to the client side
      writer.write({
        type: 'data-parsed',
        data: {
          actions: parsed.actions,
          thoughts: [parsed.thoughts, parsed.unknown].filter(Boolean).join('\n'),
        },
      })

      // execute the actions on the sandbox
      // NOTE: the option names are assumptions; check executeComputerUseAction
      for (const action of parsed.actions) {
        await coreClient.executeComputerUseAction(sandboxId, {
          action,
          includeScreenShot: false,
          includeCursorPosition: false,
        })
      }
    }
  },

Handling updates on the client side

Once you have written some data from the server side with writer.write, you can use the onData callback of the useChat hook to handle the updates.

conversation.tsx
  onData: (data) => {
    if (data.type === 'data-screenShot') {
      chat.setMessages(
        // produce comes from the immer library
        produce((messages) => {
          const index = messages.findIndex((message) => message.id === data.data.messageId)
          if (index === -1) {
            return
          }
          const message = messages[index]
          message?.parts.push({
            type: 'file',
            mediaType: 'image/webp',
            url: data.data.url,
          })
          return messages
        }),
      )
    }
  },

See Streaming Custom Data in the AI SDK documentation for more details.
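
If you want type safety for these custom parts, the AI SDK lets you declare them on your UIMessage type. A sketch, with part shapes mirroring what we wrote above (the type names are ours):

import type { UIMessage } from 'ai'

// 'parsed' and 'screenShot' map to the 'data-parsed' and 'data-screenShot' part types on the wire
type PlaygroundDataParts = {
  parsed: { actions: unknown[]; thoughts: string }
  screenShot: { messageId: string; url: string }
}

export type PlaygroundUIMessage = UIMessage<never, PlaygroundDataParts>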

Conclusion

You have learned how to build a playground with AI SDK and Lybic core SDK. You can now use this playground to build your own AI-powered applications.

We have open-sourced our playground on GitHub in the lybic/typescript repository, where you can find the complete code.