Building Client-Side AI Applications with ONNX Runtime

In this article, we'll explore how to use ONNX Runtime to run machine learning models directly in the browser, creating powerful AI-enabled applications without relying on expensive API calls.

Why Client-Side AI?

Running AI models in the browser offers several significant advantages:

Privacy: User data never leaves their device, addressing privacy concerns around sending sensitive information to third-party servers.
Cost: No expensive API calls to services like OpenAI or Anthropic, which can quickly become costly at scale.
Offline support: Applications continue to work without an internet connection, improving reliability.
Reduced latency: No network delays means faster response times for a better user experience.

However, these benefits come with tradeoffs in model size and complexity compared to server-side solutions.

What is ONNX Runtime?

ONNX (Open Neural Network Exchange) is an open format for representing machine learning models. ONNX Runtime is a performance-focused engine for running these models across different platforms.

The ONNX format allows models trained in frameworks like PyTorch or TensorFlow to be exported to a standard format that can run in various environments - including web browsers via WebAssembly.

Setting Up ONNX Runtime in a React Project

Let's walk through the process of adding ONNX Runtime to a React application:

First, install the required dependencies:

npm install onnxruntime-web

Next, we need to configure our build system. If you're using Create React App, you'll need to either eject or use CRACO to modify the webpack configuration. For Vite, create a vite.config.js file:

import { defineConfig } from 'vite';
import react from '@vitejs/plugin-react';

export default defineConfig({
  plugins: [react()],
  server: {
    headers: {
      'Cross-Origin-Embedder-Policy': 'require-corp',
      'Cross-Origin-Opener-Policy': 'same-origin',
    },
  },
  build: {
    rollupOptions: {
      output: {
        manualChunks: {
          onnx: ['onnxruntime-web'],
        },
      },
    },
  },
  optimizeDeps: {
    exclude: ['onnxruntime-web'],
  },
});

The headers are important for allowing SharedArrayBuffer which ONNX Runtime uses for better performance.

Creating an AI Component

Now, let's create a simple text classification component:

import React, { useState, useEffect, useRef } from 'react';
import * as ort from 'onnxruntime-web';

const TextClassifier = () => {
  const [model, setModel] = useState(null);
  const [input, setInput] = useState('');
  const [prediction, setPrediction] = useState(null);
  const [loading, setLoading] = useState(true);
  const [error, setError] = useState(null);

  // Initialize model on component mount
  useEffect(() => {
    const initModel = async () => {
      try {
        // Path to your ONNX model
        const modelPath = '/models/text_classifier.onnx';
        const session = await ort.InferenceSession.create(modelPath);
        setModel(session);
        setLoading(false);
      } catch (e) {
        console.error('Failed to load ONNX model:', e);
        setError('Failed to load the AI model. Please try again later.');
        setLoading(false);
      }
    };

    initModel();
  }, []);

  // Tokenize text (simplified example)
  const tokenize = (text) => {
    // In a real app, you'd use a proper tokenizer that matches how the model was trained
    const tokens = text.toLowerCase().split(' ');
    return tokens;
  };

  const handleSubmit = async (e) => {
    e.preventDefault();
    
    if (!model) return;
    
    try {
      // Tokenize and prepare input in the format your model expects
      const tokens = tokenize(input);
      
      // Convert tokens to model input format (example - adjust for your model)
      const inputTensor = new ort.Tensor('string', tokens, [1, tokens.length]);
      
      // Run inference
      const results = await model.run({ input: inputTensor });
      
      // Process results based on your model's output format
      const output = results.output.data;
      
      setPrediction({
        category: output[0] > 0.5 ? 'Positive' : 'Negative',
        confidence: Math.max(output[0], 1 - output[0]) * 100
      });
    } catch (e) {
      console.error('Inference failed:', e);
      setError('Failed to process the text. Please try again.');
    }
  };

  if (loading) return <div>Loading model...</div>;
  if (error) return <div>Error: {error}</div>;

  return (
    <div className="p-4 border rounded-lg">
      <h2 className="text-xl font-bold mb-4">Text Sentiment Classifier</h2>
      <form onSubmit={handleSubmit}>
        <textarea
          value={input}
          onChange={(e) => setInput(e.target.value)}
          className="w-full p-2 border rounded mb-4"
          rows="4"
          placeholder="Enter text to classify..."
        />
        <button 
          type="submit" 
          className="px-4 py-2 bg-blue-500 text-white rounded"
          disabled={!input.trim()}
        >
          Analyze
        </button>
      </form>
      
      {prediction && (
        <div className="mt-4 p-3 bg-gray-100 rounded">
          <p><strong>Result:</strong> {prediction.category}</p>
          <p><strong>Confidence:</strong> {prediction.confidence.toFixed(2)}%</p>
        </div>
      )}
    </div>
  );
};

export default TextClassifier;

This is a simplified example. Real-world implementation would require a proper tokenizer that matches how your model was trained, handling of token IDs, and possibly additional preprocessing.

Finding Suitable Models

Not all models can effectively run in the browser. Here are some guidelines:

Size Matters: Smaller models (< 100MB) work best in browser environments
Quantized Models: Look for quantized models (INT8 or UINT8) which are much smaller
Specialized Models: Models designed for specific tasks generally perform better than general-purpose ones
Model Hubs: Check ONNX Model Zoo, Hugging Face, or convert custom models from PyTorch/TensorFlow

Popular models suitable for browser deployment include:

MobileNet for image classification
BERT-tiny for NLP tasks
BlazeFace for face detection
PoseNet for pose estimation

Performance Optimization Tips

When running AI models in the browser, performance optimization is crucial:

Quantize models from FP32 to INT8 when possible (trading minimal accuracy for significant size reduction)
Use Web Workers to run inference on a separate thread
Load models asynchronously to avoid blocking the main thread
Implement progressive loading for larger models
Leverage caching to avoid reloading the model on repeat visits

Conclusion

Client-side AI with ONNX Runtime opens up exciting possibilities for creating intelligent applications that respect user privacy and reduce operational costs. While it's not suited for every AI use case, particularly those requiring large-scale models, it's perfect for many common scenarios like content classification, image recognition, and simple natural language processing tasks.

By thoughtfully considering the tradeoffs and optimizing for the browser environment, you can create compelling AI-enhanced experiences that run entirely on your users' devices.