Display your AI response stream on a web page

Now that LLMs are widely accessible to the public, integrating chat models into websites has become hugely popular. We are on the verge of a revolution in the way users consume the web, and it may become crucial in the near future to harness the power of artificial intelligence in new and existing products.

Credits: DALL-E

This is a follow-up article to Stream Claude answers in real-time with PHP.

Getting an AI model to process a query is one thing; displaying the result on screen is another. With streaming, a product can create a deeper user experience by delivering chunks of an answer as soon as they are processed. This way of doing things has become a relatively new standard in recent years, and it can be tricky to achieve on a web page. Fortunately, JavaScript provides all the necessary tools to make it happen.

The goal of this article is to give a general understanding of how to create a real-time chatting experience on a simple web page. If you need something deeper, with more features, feel free to customize the following example based on your needs or use chatting tools made by the community.

Let's start by creating the file structure we will need to make this work. There will be four key components in this project: a PHP file to manage communication with the LLM, an HTML file for our markup, a CSS file to style the page, and a JS file to add interactivity.

|— stream.php
|— index.html
|— 818765fa7bad09edd3b973538f638292.css
|— 1623b0929e8951017763c30925357269.js

For the PHP script, let's reuse the work done in our previous article and adjust the code to fit our current needs before saving it to the stream.php file.

<?php

$request = json_decode(
    file_get_contents('php://input'),
    true
);
if (!empty($request['message'])) {
    $message = $request['message'];
} else {
    echo 'Please ask me anything you want.';
    exit;
}

$url = "https://api.anthropic.com/v1/messages";

$headers = array(
    "x-api-key: " . getenv('YOUR_API_KEY'),
    "content-type: application/json",
    "anthropic-version: 2023-06-01"
);

$body = array(
    "model" => 'claude-3-sonnet-20240229',
    "max_tokens" => 1024,
    "stream" => true,
    "messages" => array(
        array(
            "role" => "user",
            "content" => $message
        )
    )
);

$context  = stream_context_create(
    [
        'http' => [
            'method'  => 'POST',
            'header'  => $headers,
            'content' => json_encode($body),
        ],
    ]
);
$response = fopen($url, 'r', false, $context);

if ($response === false) {
    echo "Couldn't connect to the API\n";
    exit;
}

$dataExtractionRegex = '/data: (.*)/';

$file = __DIR__ . '/output.txt';
file_put_contents($file, '');

$answer = '';
while (!feof($response)) {
    $chunk = fgets($response);
    if ($chunk !== false) {
        //Since Claude3 are sending data chunks with extra information, 
        //we need to extract the data from the chunk. To do this, we use a regex.
        preg_match_all($dataExtractionRegex, $chunk, $matches);

        if (!empty($matches[1])) {
            //The data is formatted as a JSON object, so we can decode it to an associative array.
            $data = json_decode(trim($matches[1][0]), true);

            //json_decode() returns null on failure, not false
            if ($data !== null && isset($data['type']) && $data['type'] === 'content_block_delta') {
                //The content block delta contains the text generated by the model.
                $answer .= $data['delta']['text'];
                file_put_contents($file, $data['delta']['text'], FILE_APPEND);
            }
        }
    }
}

echo $answer;
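For reference, the same `data:` extraction can be sketched on the client side as well. The snippet below mirrors the PHP regex logic in JavaScript; the sample payload is illustrative of the `content_block_delta` events described above, not a captured API response.

```javascript
// Extract the text deltas from a raw server-sent-events payload.
// Mirrors the PHP regex approach above.
function extractDeltas(sseText) {
  const deltas = [];
  const regex = /data: (.*)/g;
  let match;
  while ((match = regex.exec(sseText)) !== null) {
    let data;
    try {
      data = JSON.parse(match[1].trim());
    } catch (e) {
      continue; //skip lines that are not valid JSON
    }
    if (data && data.type === "content_block_delta") {
      deltas.push(data.delta.text);
    }
  }
  return deltas;
}

//Illustrative payload shaped like the Claude 3 streaming events
const sample =
  'data: {"type":"content_block_start"}\n' +
  'data: {"type":"content_block_delta","delta":{"type":"text_delta","text":"Hel"}}\n' +
  'data: {"type":"content_block_delta","delta":{"type":"text_delta","text":"lo"}}\n';

console.log(extractDeltas(sample).join("")); // "Hello"
```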

For the sake of simplicity, we will put the chunks generated by Claude in a text file at the root of the project. The web page will then poll that file at fixed intervals to get new updates on the stream. This approach is the simplest, as no special configuration is required to make it work. Otherwise, we would have to rely on either an HTTP stream or a WebSocket, both of which require additional effort and knowledge that is out of the scope of this article.
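For the curious, here is a minimal sketch of what the HTTP-stream alternative would look like in the browser: reading the response body incrementally instead of polling a file. The `handleChunk` callback is a hypothetical name, and this assumes a server that actually streams its response.

```javascript
// Minimal sketch of consuming a streamed HTTP response chunk by chunk.
// "handleChunk" is a hypothetical callback you would wire to the display logic.
async function readStream(stream, handleChunk) {
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    //Decode the binary chunk to text and hand it to the caller
    handleChunk(decoder.decode(value, { stream: true }));
  }
}

//In a real page this would be wired to a fetch call, roughly:
//fetch("/stream.php", { ... }).then((res) => readStream(res.body, appendToMessage));
```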

The next step is to create a basic chat interface where we will be able to input questions and receive answers.


    
<!DOCTYPE html>
<html>
    <head>
        <title>AMA with Claude 3</title>
        <link rel="stylesheet" href="818765fa7bad09edd3b973538f638292.css" />
    </head>
    <body>
        <h1>Ask me anything, featuring Claude 3!</h1>

        <div class="conversation">
            <div id="messages"></div>
        </div>

        <div class="controls">
            <textarea id="txtAsk"></textarea>
            <button id="btnSend" onclick="sendMessage()">Send</button>
        </div>

        <script src="1623b0929e8951017763c30925357269.js"></script>
    </body>
</html>

A simple HTML structure: new messages get appended to the messages div, while a textarea and a send button provide the controls for sending input.

For the interface to be visually understandable, a layer of style needs to be applied on top of the markup. Let's save the following CSS rules to the 818765fa7bad09edd3b973538f638292.css file.

.conversation {
    height: 20rem;
    width: 36rem;
    padding: 0.5rem;
    border: 1px solid black;
    overflow-y: auto;
    display: flex;
    flex-direction: column-reverse;
}

#messages {
    display: flex;
    align-items: end;
    justify-items: center;
    flex-direction: column;
}

.message {
    padding: 0.5rem;
    margin: 0.5rem;
    border-radius: 0.5rem;
    max-width: 80%;
    position: relative;
}

.anchor {
    position: absolute;
    top: 0;
    left: 0;
}

.message:last-child {
    overflow-anchor: auto;
}

.message.user {
    background-color: #666;
    color: #fff;
    align-self: flex-end;
}

.message.ai {
    background-color: #f0f0f0;
    align-self: flex-start;
}

.controls {
    display: flex;
    width: 36rem;
    padding: 0.5rem;
    border: 1px solid black;
    margin-top: 0.5rem;
}

.controls textarea {
    width: 30rem;
}

.controls button {
    width: 5rem;
    margin-left: 1rem;
}

#btnSend.disabled {
    opacity: 0.6;
    pointer-events: none;
}

One thing to note here is the use of the conversation and messages containers to pin the scroll to the bottom. Reversing the flex column direction in the conversation class forces the browser to scroll to the bottom as new items arrive, since the vertical direction is reversed. Resetting the direction in the messages class keeps items displayed from top to bottom while still being visually pushed down.

Finally, it's time to tackle JavaScript. This is where the fun begins. First, let's set up a utility function in charge of dynamically creating the message markup to add to our HTML structure. A message is split into three logical components:

  1. Content: This is where the text representation of the message is inserted
  2. Anchor: Mainly used for scrolling purposes. Although invisible, this element is forced into view when a scroll is needed.
  3. Container: The box where the content and anchor are placed.

With this in mind, let's create the markup and return each component in a definition object for further usage.

function createMessage(type) {
    //Create the message container
    var container = document.createElement("div");
    container.classList.add("message");
    container.classList.add(type);

    //Create the anchor for scrolling
    var anchor = document.createElement("div");
    anchor.classList.add("anchor");
    container.appendChild(anchor);

    //Create the content of the message
    var content = document.createElement("div");
    content.classList.add("content");
    container.appendChild(content);

    return { container, anchor, content };
}

Now that we can create a message, let's have the user send one through the inputs on screen. To simplify things, let's put the sending logic into a function that we will attach to the click of the send button.

var messageContainer = document.getElementById("messages");
var txtAsk = document.getElementById("txtAsk");
var btnSend = document.getElementById("btnSend");

var streamedMsg, streaming;

function sendMessage() {
    var toAsk = txtAsk.value;

    //If the user has entered a message, we ask the AI
    if (toAsk !== "") {
        //First we display the input message on screen
        var userMessage = createMessage("user");
        messageContainer.append(userMessage.container);
        userMessage.content.textContent = toAsk; //textContent avoids injecting user-typed HTML

        txtAsk.value = "";
        userMessage.anchor.scrollIntoView();

        //Then we prevent further message during the stream and ask the AI
        btnSend.classList.add("disabled");
        askAI(toAsk);
    }
}

If the user typed something into the textarea, it gets added as a message in the conversation. Since streaming an AI response takes time, the send button also gets disabled, as we don't want the user to keep sending messages while the answer is coming in (let's not be rude and interrupt Claude while he's talking!).

As you can see from the code above, all the communication logic has been bundled into the askAI function. This is where the magic happens. It takes the string of text input by the user and initiates a stream by using the JavaScript Fetch API to communicate with the PHP script on the server side. For the duration of the script, the connection stays open while we periodically poll the server for new chunks generated by Claude. As soon as the main Fetch call ends, it's our cue to wrap everything up and finish displaying the current message. We do so by setting the streaming flag to false, which tells the display method that we have reached the end and no more updates are needed.

function askAI(message) {
    //Since it can take some time to get the answer started, we let the user know
    //that his or her request has been sent to the AI
    var aiMessage = createMessage("ai");
    messageContainer.append(aiMessage.container);
    aiMessage.content.innerHTML = "Thinking ...";

    aiMessage.anchor.scrollIntoView();

    streamedMsg = "";
    streaming = true;

    //We start the stream by calling it. This will start the streaming process on the server side
    //and will run until it gets the full answer
    fetch("/stream.php", {
        method: "POST",
        headers: {
            Accept: "application/json",
            "Content-Type": "application/json",
        },
        body: JSON.stringify({ message }),
    }).then(async (response) => {
        await response.text().then((response) => {
            //As soon as we get the full answer, we stop the stream and display the answer
            if (response !== "") {
                streamedMsg = response;
            }
            streaming = false;
        });
    });

    //While the stream is running, we begin the display loop
    displayMessage(aiMessage);
}

The last part of the script is in charge of displaying the message on screen in a way that mimics a human, as if Claude were speaking directly to the user. There are multiple ways to achieve this; the one picked for this example relies on logic executed on every animation frame run by the browser.

function displayMessage(aiMessage) {
    var now = new Date().getTime();
    var lastTime = { display: 0, update: now };
    var index = 0;
    //We create a function to be run every frame. It will display the message letter by letter
    //and also update the stream every second
    const display = () => {
        now = new Date().getTime();
        //If there are still letters to display, we display them
        if (index < streamedMsg.length) {
            //On the first frame, we clear the "Thinking ..." text
            if (index === 0) {
                aiMessage.content.innerHTML = "";
            }

            //We display the message letter by letter at a rate of one letter per 30ms
            if (now - lastTime.display > 30) {
                lastTime.display = now;

                //This is a simple way to handle new lines. This is where we would handle markdown parsing
                aiMessage.content.innerHTML +=
                    streamedMsg[index] == "\n" ? "<br/>" : streamedMsg[index];
                index++;
            }
        }

        //If the stream is still running, we update the stream every second
        if (streaming && now - lastTime.update > 1000) {
            lastTime.update = now;
            fetch("/output.txt").then(async (response) => {
                await response.text().then((text) => {
                    if (text !== "") {
                        streamedMsg = text;
                    }
                });
            });
        }

        //If the stream is still running or there are still letters to display, we request a new frame
        if (streaming || index < streamedMsg.length) {
            window.requestAnimationFrame(display);
        } else {
            //At the end of the stream we re-enable the input field
            btnSend.classList.remove("disabled");
        }
    };
    display();
}

The function takes the current state of the answer and outputs the next letter based on an index variable kept outside the loop. It computes the delta between now and the last time a letter was displayed to know when to process the next one, and goes on until the end of the stream. The same function also piggybacks on the animation frame it is already requesting to decide when to fetch a stream update from the server.
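That timing logic can be isolated into small pure helpers, which makes it easy to test without a DOM. The function names below are my own, introduced for illustration; they mirror the checks inside displayMessage rather than replace them.

```javascript
//Pure helpers mirroring the display loop, easy to test without a DOM.

//Should a new letter be revealed on this frame? (one letter per 30ms by default)
function shouldReveal(now, lastDisplay, rateMs = 30) {
  return now - lastDisplay > rateMs;
}

//Convert one streamed character into its HTML representation,
//turning new lines into line breaks as the display loop does.
function toHtml(ch) {
  return ch === "\n" ? "<br/>" : ch;
}

console.log(shouldReveal(131, 100)); // true  (31ms elapsed)
console.log(shouldReveal(116, 100)); // false (only 16ms elapsed)
```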

Let's put everything together, host our example in a web hosting environment (DDEV does a remarkable job for testing local projects such as this) and hit refresh in the browser.

AMA with Claude 3

It works! We did it!

Closing thoughts

With this basic example, we've seen how a simple chat interface can be built behind the scenes. There are many ways to achieve this, either manually or by using tools from the community, but what matters most is picking the right solution for your needs.

Response streaming can drastically improve the user experience by mitigating the lag between a user request and an AI response. It also gives the impression that the computer is speaking to you, which can increase user engagement with your product. We are in an age of technological marvels, and the way we consume products is on the verge of being revolutionized by the boom in popularity of LLMs.

Possibilities are endless but remember, creativity is the key.

Disclaimer: No AI models were used in the writing of this article. The text content was purely written by hand by its author. Human generated content still has its place in the world and must continue to live on. Only the image was generated using an AI model.
