Giving Odin Intelligence

Yevhen Krasnokutsky

My adventure with Odin keeps going. I decided to try something unexpected with Odin - deep learning and AI! "But how?" you might ask. Fortunately, there is a well-known framework for running deep-learning models. I'm talking about ONNX.

Prerequisites

  • Familiarity with ONNX
  • Familiarity with C
  • The Odin Overview page has been read

ONNX has a well-documented C API, which makes it easy to port to Odin. At least, I hoped it would be easy. But this is another story.

The plan:

  • choose an ONNX sample to translate to Odin
  • make bindings to ONNX
  • ???
  • run the model on the GPU with Odin

ONNX Sample

I've found an ONNX example suitable for my idea. I'm going to use this example as a strong foundation for the project. But to make things more interesting, I'll add a few enhancements:

  • listing all available providers
  • checking if CUDA is available
  • using CUDA (if available) for running a model

This example code shows the basic usage of ONNX:

  1. Make OrtApi instance
  2. Initialize OrtEnv variable
  3. Configure OrtSessionOptions and OrtSession
  4. Run session (all computations happen here)
  5. Process results and release resources

ONNX Bindings

Long story short: I've already made Odin bindings for ONNX, and I'll use those bindings in the project. I have to say that this post is not about bindings generation. Still, we'll compare the C and Odin versions of the code to get an idea of how C APIs are used from Odin.
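
For context, here is a minimal sketch of what such a binding can look like in Odin, assuming the usual pattern of a foreign import plus declarations mirroring onnxruntime_c_api.h. The real onnxbinding.odin in the repository is generated and far larger; treat the names and the ORT_API_VERSION value below as placeholders:

package onnx_bindings

import "core:c"

when ODIN_OS == .Linux do foreign import onnx "/thirdparty/onnxruntime/lib/libonnxruntime.so"

// The real OrtApi is a huge struct of "c" calling-convention function pointers
// (CreateEnv, CreateSessionOptions, Run, ...); it is elided in this sketch.
OrtApi :: struct { /* ... */ }

OrtApiBase :: struct {
    GetApi:           proc "c" (version: c.uint32_t) -> ^OrtApi,
    GetVersionString: proc "c" () -> cstring,
}

ORT_API_VERSION :: 17 // assumption: use the value from your onnxruntime_c_api.h

@(default_calling_convention = "c")
foreign onnx {
    OrtGetApiBase :: proc() -> ^OrtApiBase ---
}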

???

In this section, I'll describe only the key parts of the code. A link to the full code is available in the references.

Make OrtApi instance

In C, the OrtApi instance is initialized as follows:

#include <onnxruntime_c_api.h>

const OrtApi* g_ort = OrtGetApiBase()->GetApi(ORT_API_VERSION);

On the Odin side, the OrtApi structure is already defined in the bindings file, so we can simply initialize the OrtApi instance and even check if it was initialized successfully:

g_ort: ^OrtApi
if g_ort = OrtGetApiBase().GetApi(ORT_API_VERSION); cast(rawptr)g_ort == nil {
    fmt.eprintln(">>> OrtApi is nil")
    os.exit(1)
}

Listing all available providers

To get a list of all available providers, in C we use:

int providers_count;
char **providers;

CheckStatus(g_ort->GetAvailableProviders(&providers, &providers_count));

printf(">>> Num Providers: %d\n", providers_count);
printf(">>> Providers:\n");

for (int i = 0; i < providers_count; ++i) {
    printf(">>> %d) %s\n", i, providers[i]);
}
CheckStatus(g_ort->ReleaseAvailableProviders(providers, providers_count));

Here, you can see a regular pattern used by the ONNX C API:

  1. Declare variables.
  2. Initialize variables by reference in a function.
  3. Check the status of the operation.
  4. Release allocated resources (if any).

CheckStatus is just a helper function to check the status:

void CheckStatus(OrtStatus *status) {
  if (status != NULL) {
    const char *msg = g_ort->GetErrorMessage(status);
    fprintf(stderr, "%s\n", msg);
    g_ort->ReleaseStatus(status);
    exit(1);
  }
}

The equivalent code in Odin is as follows:

//// Get available providers:
providers_count: c.int
providers: [^]cstring

g_ort.GetAvailableProviders(cast(^^^c.char)(&providers), &providers_count)

defer g_ort.ReleaseAvailableProviders(providers, providers_count)

fmt.println(">>> Available providers:")
for i: c.int = 0; i < providers_count; i += 1 {
    fmt.printfln("\t%d) %s", i, providers[i])
}
/*
    0) TensorrtExecutionProvider
    1) CUDAExecutionProvider
    2) CPUExecutionProvider
*/

The most interesting part of the code snippets above (as well as in the whole project) is how Odin types are mapped to C types.

Mapping between C's int and Odin's c.int is quite straightforward. But what about the following?

char **providers;
OrtStatus* GetAvailableProviders(char*** out_ptr, int* provider_length);

This is quite tricky. On the Odin side, we could use providers: ^^c.char, but we shouldn't. That's because we use providers as a 1D array of C strings (null-terminated char arrays), not just as a double pointer to char. To express the array nature of providers, we use multi-pointers. Multi-pointers support indexing in Odin. So, instead of providers: ^^c.char, we use providers: [^]cstring.
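
As a tiny standalone illustration (not part of the project code) of why a multi-pointer fits here, the snippet below indexes a [^]cstring the same way C code would index char **:

package main

import "core:fmt"

main :: proc() {
    // [^]T is a multi-pointer: a pointer that supports indexing, which is how
    // "pointer to an array of T" from C is usually expressed in Odin.
    values := [3]cstring{"a", "b", "c"}
    p: [^]cstring = raw_data(values[:])
    for i in 0 ..< 3 {
        fmt.println(p[i]) // indexing works; a plain ^cstring would not allow p[i]
    }
}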

But how do I pass providers: [^]cstring to a function with the following signature:

GetAvailableProviders : proc(out_ptr: ^^^c.char, provider_length: ^c.int) -> OrtStatusPtr

To pass providers into GetAvailableProviders() we have to cast providers to ^^^c.char:

cast(^^^c.char)(&providers)

Initialize OrtEnv variable

According to the C API, to initialize the OrtEnv we use the following code:

OrtEnv *env;
CheckStatus(g_ort->CreateEnv(ORT_LOGGING_LEVEL_WARNING, "test", &env));

On the Odin side, the equivalent code is quite similar:

env: ^OrtEnv
status: OrtStatusPtr = g_ort.CreateEnv(OrtLoggingLevel.ORT_LOGGING_LEVEL_WARNING, "test", &env)
CheckStatus(g_ort, status)
defer g_ort.ReleaseEnv(env)

CheckStatus() is as follows:

CheckStatus :: proc(ort: ^OrtApi, status: OrtStatusPtr) {
    if status != nil {
        msg: cstring = ort.GetErrorMessage(status)
        fmt.eprintln(msg)
        ort.ReleaseStatus(status)
        os.exit(1)
    }
}

Configure OrtSessionOptions and OrtSession

OrtSessionOptions

To initialize OrtSessionOptions we use the OrtApi instance:

OrtSessionOptions *session_options;
CheckStatus(g_ort->CreateSessionOptions(&session_options));
g_ort->SetIntraOpNumThreads(session_options, 1);
g_ort->SetSessionGraphOptimizationLevel(session_options, ORT_ENABLE_BASIC);

Compare it with Odin's code:

session_options: ^OrtSessionOptions
status = g_ort.CreateSessionOptions(&session_options)
CheckStatus(g_ort, status)
defer g_ort.ReleaseSessionOptions(session_options)

status = g_ort.SetIntraOpNumThreads(session_options, 1)
CheckStatus(g_ort, status)

status = g_ort.SetSessionGraphOptimizationLevel(
    session_options,
    GraphOptimizationLevel.ORT_ENABLE_BASIC,
)
CheckStatus(g_ort, status)

Enable CUDA if available

After OrtSessionOptions is initialized, we can check if CUDA is available and configure ONNX to use CUDA as the provider. Here I show only the Odin code because, as you have already seen, the difference between Odin and C is minimal.

Let's find out if CUDA is available:

is_cuda_available: bool
for i: c.int = 0; i < providers_count; i += 1 {
    if providers[i] == "CUDAExecutionProvider" {
        is_cuda_available = true
        break
    }
}
fmt.printfln(">>> CUDA is available: %t", is_cuda_available)

And use it as the acceleration provider:

if is_cuda_available {
    fmt.println(">>> Setting up CUDA...")
    status = OrtSessionOptionsAppendExecutionProvider_CUDA(session_options, 0)
    CheckStatus(g_ort, status)
}

OrtSession

OrtSession is initialized with OrtEnv, model path, and OrtSessionOptions:

session: ^OrtSession
model_path :: "squeezenet1.0-8.onnx"

status = g_ort.CreateSession(env, model_path, session_options, &session)
CheckStatus(g_ort, status)
defer g_ort.ReleaseSession(session)

Run session

So, we've made all the necessary preparations to configure OrtSession. We're almost ready to run the model on the GPU.

To run the model, we have to:

  • specify the model's input and output node names
  • allocate input tensor
  • run session
  • get computation results

Specify model's input and output node names

An ONNX model is a computational graph where nodes represent operations and edges represent the data flow.

We have to specify which node we'd like to put data in and which node we'd like to get the results from. The specification is done by name.

Without diving deep into details, I'll postulate that the input node name is data_0, the output node name is softmaxout_1, and the input node dimensions are 1x3x224x224. There are a few ways to get this information:

  • read model specification or source code
  • use handy visualizers, for example, https://netron.app/
  • use the ONNX Runtime API to query the model at runtime (see the sketch after the screenshot below)

Here is a screenshot of the SqueezeNet model properties taken from netron.app:

SqueezeNet on netron.app
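
For the third option, here is a hedged sketch of how the same information could be queried at runtime. It assumes the generated bindings expose the C API functions GetAllocatorWithDefaultOptions, SessionGetInputCount, and SessionGetInputName under those names, and it reuses g_ort, session, status, and CheckStatus from the surrounding code; it is illustrative and not part of the demo:

//// Query model input info at runtime (sketch):
allocator: ^OrtAllocator
status = g_ort.GetAllocatorWithDefaultOptions(&allocator)
CheckStatus(g_ort, status)

input_count: c.size_t
status = g_ort.SessionGetInputCount(session, &input_count)
CheckStatus(g_ort, status)

input_name: cstring
status = g_ort.SessionGetInputName(session, 0, allocator, cast(^^c.char)&input_name)
CheckStatus(g_ort, status)
// in real code, release input_name through the allocator when done

fmt.printfln(">>> inputs: %d, first input node: %s", input_count, input_name)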

The syntax of node name declaration is straightforward, so I'll show only output_node_names in C (to be more precise, in C++ as it was implemented in the ONNX example):

std::vector<const char *> output_node_names = {"softmaxout_1"};

On the Odin side, input node name, output node name, and dimensions are initialized as follows:

input_node_dims := make([dynamic]c.int64_t)
defer delete(input_node_dims)
append(&input_node_dims, 1, 3, 224, 224)

input_node_names := make([dynamic]cstring)
defer delete(input_node_names)
append(&input_node_names, "data_0")

output_node_names := make([dynamic]cstring)
defer delete(output_node_names)
append(&output_node_names, "softmaxout_1")

Allocate input tensor

In a real-world scenario, we'd pass images to the model and get predictions. But for the sake of the demo, we'll pass dummy data to the model.

SqueezeNet is a model for image classification, trained on the ImageNet dataset. The model expects inputs of size 224x224x3 (224 pixels high, 224 pixels wide, and 3 channels). So, we have to allocate and populate a vector of 224*224*3 elements.

size_t input_tensor_size = 224 * 224 * 3;
std::vector<float> input_tensor_values(input_tensor_size);
// initialize input data with values in [0.0, 1.0] (dummy data)
for (size_t i = 0; i < input_tensor_size; i++) {
    input_tensor_values[i] = (float)i / (input_tensor_size + 1);
}

// create input tensor object from data values
OrtMemoryInfo *memory_info;
CheckStatus(g_ort->CreateCpuMemoryInfo(OrtArenaAllocator, OrtMemTypeDefault, &memory_info));

OrtValue *input_tensor = NULL;
CheckStatus(g_ort->CreateTensorWithDataAsOrtValue(
    memory_info, input_tensor_values.data(),
    input_tensor_size * sizeof(float), input_node_dims.data(), 4,
    ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT, &input_tensor));

int is_tensor;
CheckStatus(g_ort->IsTensor(input_tensor, &is_tensor));
assert(is_tensor);
g_ort->ReleaseMemoryInfo(memory_info);

In the C code above (again, it's C++, but who cares), we allocate the input_tensor_values vector of floats with 224*224*3 elements and populate it with dummy data. Then, we "transfer" the data into the ONNX world by calling CreateTensorWithDataAsOrtValue() and allocating input_tensor. As the final step, we check that input_tensor actually references a tensor, and we're good to go.

Here is the same thing in Odin:

input_tensor_size: c.size_t = 224 * 224 * 3

input_tensor_values := make([dynamic]c.float, input_tensor_size)
defer delete(input_tensor_values)

// initialize input data with values in [0.0, 1.0] (dummy data)
for i: c.size_t = 0; i < input_tensor_size; i += 1 {
    input_tensor_values[i] = cast(c.float)i / (cast(c.float)input_tensor_size + 1)
}

// create input tensor object from data values
memory_info: ^OrtMemoryInfo
status = g_ort.CreateCpuMemoryInfo(
    OrtAllocatorType.OrtArenaAllocator,
    OrtMemType.OrtMemTypeDefault,
    &memory_info,
)
CheckStatus(g_ort, status)
defer g_ort.ReleaseMemoryInfo(memory_info)

input_tensor: ^OrtValue
status = g_ort.CreateTensorWithDataAsOrtValue(
    memory_info,
    cast(rawptr)raw_data(input_tensor_values),
    input_tensor_size * size_of(c.float),
    cast(^c.int64_t)raw_data(input_node_dims),
    len(input_node_dims),
    ONNXTensorElementDataType.ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT,
    &input_tensor,
)
CheckStatus(g_ort, status)
defer g_ort.ReleaseValue(input_tensor)

is_tensor: c.int
status = g_ort.IsTensor(input_tensor, &is_tensor)
CheckStatus(g_ort, status)
assert(is_tensor == 1, "input_tensor not a tensor")

Pay attention to how we initialize dynamic arrays in Odin and how we get a pointer to their backing data. To get a pointer to the inner data of a dynamic array we use Odin's raw_data() built-in. Then we cast it to a raw pointer with cast(rawptr), which is equivalent to casting to void* in C.
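
As a tiny standalone illustration of that pattern (outside of the project code):

package main

import "core:c"
import "core:fmt"

main :: proc() {
    // make() allocates the dynamic array; raw_data() returns a multi-pointer
    // to its backing storage, and cast(rawptr) turns it into C's void*.
    xs := make([dynamic]c.float, 4)
    defer delete(xs)

    data_ptr: [^]c.float = raw_data(xs)
    void_ptr: rawptr = cast(rawptr)data_ptr

    fmt.println(data_ptr[0], void_ptr != nil)
}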

Run session

At this point, we've prepared the session and allocated the input tensor, as well as the names and dimensions of the input and output nodes. It was a long and tedious, but necessary, preparation.

To store the inference results, we use an OrtValue pointer. After the computations are done, we check that this pointer is a valid tensor. Here is a C sample:

OrtValue *output_tensor = NULL;
CheckStatus(g_ort->Run(session, NULL, input_node_names.data(),
                        (const OrtValue *const *)&input_tensor, 1,
                        output_node_names.data(), 1, &output_tensor));
CheckStatus(g_ort->IsTensor(output_tensor, &is_tensor));
assert(is_tensor);

At this point, the Odin code should be clear and readable, and I hope no extra explanation is required:

output_tensor: ^OrtValue
run_options: ^OrtRunOptions
status = g_ort.Run(
    session,
    run_options,
    raw_data(input_node_names),
    &input_tensor,
    len(input_node_names),
    raw_data(output_node_names),
    len(output_node_names),
    &output_tensor,
)
defer g_ort.ReleaseValue(output_tensor)
CheckStatus(g_ort, status)

status = g_ort.IsTensor(output_tensor, &is_tensor)
CheckStatus(g_ort, status)
assert(is_tensor == 1, "output_tensor not a tensor")

Get computation results

The time has come to get the computation results back from the ONNX world into ours. For this purpose, the GetTensorMutableData() function is used.

The result of the computations is a float array of length 1000. "Why 1000?" you may ask. This is because there are 1000 image classes in the ImageNet dataset. The model's output is a vector of probabilities that the image (dummy data in our case) depicts each specific class.

Here is a C sample:

float* floatarr;
CheckStatus(g_ort->GetTensorMutableData(output_tensor, (void**)&floatarr));
assert(std::abs(floatarr[0] - 0.000045) < 1e-6);

// score the model, and print scores for first 5 classes
for (int i = 0; i < 5; i++)
    printf("Score for class [%d] =  %f\n", i, floatarr[i]);

You already know that for array pointers we use multi-pointers in Odin:

floatarr: [^]c.float
status = g_ort.GetTensorMutableData(output_tensor, cast(^rawptr)&floatarr)
CheckStatus(g_ort, status)
assert(abs(floatarr[0] - 0.000045) < 1e-6, "computation failed")

for i := 0; i < 5; i += 1 {
    fmt.printfln(">>> Score for class [%d] =  %.6f", i, floatarr[i])
}
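
As a small follow-up that is not part of the original sample, the predicted class is simply the index with the highest probability. A sketch, reusing floatarr from above:

best_class: int
best_score: c.float
for i := 0; i < 1000; i += 1 { // 1000 ImageNet classes, as described above
    if floatarr[i] > best_score {
        best_score = floatarr[i]
        best_class = i
    }
}
fmt.printfln(">>> Most likely class: [%d] with score %.6f", best_class, best_score)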

Run the model on the GPU with Odin

The full code described in the post is available on GitHub: https://github.com/yevhen-k/onnx-odin-squeezenet-inference-demo

To run the code, you have to:

  1. Use Linux
  2. Have ONNX Runtime on your machine (in the /thirdparty/onnxruntime folder in the current example)
  3. Have a GPU with CUDA (optional)

Prepare, compile, and run the code:

  1. Clone the repo

    git clone https://github.com/yevhen-k/onnx-odin-squeezenet-inference-demo.git
    cd onnx-odin-squeezenet-inference-demo
    
  2. Get the SqueezeNet model (squeezenet1.0-8.onnx)

    curl https://github.com/onnx/models/raw/main/validated/vision/classification/squeezenet/model/squeezenet1.0-8.onnx -Lso squeezenet1.0-8.onnx
    
  3. Edit onnxbinding.odin if necessary to adjust the package name or the foreign import of libonnxruntime.so

    // ...
    package onnx_bindings
    // ...
    when ODIN_OS == .Linux do foreign import onnx "/thirdparty/onnxruntime/lib/libonnxruntime.so"
    // ...
    
  4. Build

    cd ..
    odin build onnx-odin-squeezenet-inference-demo -extra-linker-flags:"-Wl,-rpath=/thirdparty/onnxruntime/lib/" -out:onnx-odin-squeezenet-inference-demo/odin_onnx_example
    
  5. Run

    cd onnx-odin-squeezenet-inference-demo && ./odin_onnx_example
    

Special Thanks

Special thanks to the Odin community for answering my questions and helping me understand the mechanisms of passing data between C and Odin.

References
