Giving Odin Intelligence

Yevhen Krasnokutsky

My adventure with Odin keeps going. I decided to try something unexpected with Odin - deep learning and AI! "But how?" you might ask. Fortunately, there is a well-known framework for running deep-learning models. I'm talking about ONNX.

Prerequisites

  • Familiarity with ONNX
  • Familiarity with C
  • The Odin Overview page has been read

ONNX has a well-documented C API, which makes it easy to port to Odin. At least, I hoped it would be easy. But this is another story.

The plan:

  • choose an ONNX sample to translate to Odin
  • make bindings to ONNX
  • ???
  • run the model on the GPU with Odin

ONNX Sample

I've found an ONNX example suitable for my idea. I'm going to use this example as a strong foundation for the project. But to make things more interesting, I'll add a few enhancements:

  • listing all available providers
  • checking if CUDA is available
  • using CUDA (if available) for running a model

This example code shows the basic usage of ONNX:

  1. Make OrtApi instance
  2. Initialize OrtEnv variable
  3. Configure OrtSessionOptions and OrtSession
  4. Run session (all computations happen here)
  5. Process results and release resources

ONNX Bindings

Long story short: I've already made Odin bindings for ONNX, and I'll use those bindings in the project. I have to say that this post is not about bindings generation. Still, we'll compare the C and Odin versions of the code to get an idea of how C APIs are used from Odin.
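
For context, here is a minimal sketch of what such a binding can look like in Odin, assuming the usual pattern of a foreign import plus declarations mirroring onnxruntime_c_api.h. The real onnxbinding.odin in the repository is generated and far larger; treat the names and the ORT_API_VERSION value below as placeholders:

package onnx_bindings

import "core:c"

when ODIN_OS == .Linux do foreign import onnx "/thirdparty/onnxruntime/lib/libonnxruntime.so"

// The real OrtApi is a huge struct of "c" calling-convention function pointers
// (CreateEnv, CreateSessionOptions, Run, ...); it is elided in this sketch.
OrtApi :: struct { /* ... */ }

OrtApiBase :: struct {
    GetApi:           proc "c" (version: c.uint32_t) -> ^OrtApi,
    GetVersionString: proc "c" () -> cstring,
}

ORT_API_VERSION :: 17 // assumption: use the value from your onnxruntime_c_api.h

@(default_calling_convention = "c")
foreign onnx {
    OrtGetApiBase :: proc() -> ^OrtApiBase ---
}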

???

In this section, I'll describe only the key parts of the code. A link to the full code is available in the references.

Make OrtApi instance

In C, the OrtApi instance is initialized as follows:

#include <onnxruntime_c_api.h>

const OrtApi* g_ort = OrtGetApiBase()->GetApi(ORT_API_VERSION);

On the Odin side, the OrtApi structure is already defined in the bindings file, so we can simply initialize the OrtApi instance and even check if it was initialized successfully:

g_ort: ^OrtApi
if g_ort = OrtGetApiBase().GetApi(ORT_API_VERSION); cast(rawptr)g_ort == nil {
    fmt.eprintln(">>> OrtApi is nil")
    os.exit(1)
}

Listing all available providers

To get a list of all available providers, in C we use:

int providers_count;
char **providers;

CheckStatus(g_ort->GetAvailableProviders(&providers, &providers_count));

printf(">>> Num Providers: %d\n", providers_count);
printf(">>> Providers:\n");

for (int i = 0; i < providers_count; ++i) {
    printf(">>> %d) %s\n", i, providers[i]);
}
CheckStatus(g_ort->ReleaseAvailableProviders(providers, providers_count));

Here, you can see a regular pattern used by the ONNX C API:

  1. Declare variables.
  2. Initialize variables by reference in a function.
  3. Check the status of the operation.
  4. Release allocated resources (if any).

CheckStatus is just a helper function to check the status:

void CheckStatus(OrtStatus *status) {
  if (status != NULL) {
    const char *msg = g_ort->GetErrorMessage(status);
    fprintf(stderr, "%s\n", msg);
    g_ort->ReleaseStatus(status);
    exit(1);
  }
}

The equivalent code in Odin is as follows:

//// Get available providers:
providers_count: c.int
providers: [^]cstring

g_ort.GetAvailableProviders(cast(^^^c.char)(&providers), &providers_count)

defer g_ort.ReleaseAvailableProviders(providers, providers_count)

fmt.println(">>> Available providers:")
for i: c.int = 0; i < providers_count; i += 1 {
    fmt.printfln("\t%d) %s", i, providers[i])
}
/*
    0) TensorrtExecutionProvider
    1) CUDAExecutionProvider
    2) CPUExecutionProvider
*/

The most interesting part of the code snippets above (as well as in the whole project) is how Odin types are mapped to C types.

Mapping between C's int and Odin's c.int is quite straightforward. But what about the following?

char **providers;
OrtStatus* GetAvailableProviders(char*** out_ptr, int* provider_length);

This is quite tricky. On the Odin side, we could use providers: ^^c.char, but we shouldn't. That's because we use providers as a 1D array of C strings (null-terminated char arrays), not just as a double pointer to char. To express the array nature of providers, we use multi-pointers. Multi-pointers support indexing in Odin. So, instead of providers: ^^c.char, we use providers: [^]cstring.
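
As a tiny standalone illustration (not part of the project code) of why a multi-pointer fits here, the snippet below indexes a [^]cstring the same way C code would index char **:

package main

import "core:fmt"

main :: proc() {
    // [^]T is a multi-pointer: a pointer that supports indexing, which is how
    // "pointer to an array of T" from C is usually expressed in Odin.
    values := [3]cstring{"a", "b", "c"}
    p: [^]cstring = raw_data(values[:])
    for i in 0 ..< 3 {
        fmt.println(p[i]) // indexing works; a plain ^cstring would not allow p[i]
    }
}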

But how do I pass providers: [^]cstring to a function with the following signature:

GetAvailableProviders : proc(out_ptr: ^^^c.char, provider_length: ^c.int) -> OrtStatusPtr

To pass providers into GetAvailableProviders() we have to cast providers to ^^^c.char:

cast(^^^c.char)(&providers)

Initialize OrtEnv variable

According to the C API, to initialize the OrtEnv we use the following code:

OrtEnv *env;
CheckStatus(g_ort->CreateEnv(ORT_LOGGING_LEVEL_WARNING, "test", &env));

On the Odin side, the equivalent code is quite similar:

env: ^OrtEnv
status: OrtStatusPtr = g_ort.CreateEnv(OrtLoggingLevel.ORT_LOGGING_LEVEL_WARNING, "test", &env)
CheckStatus(g_ort, status)
defer g_ort.ReleaseEnv(env)

CheckStatus() is as follows:

CheckStatus :: proc(ort: ^OrtApi, status: OrtStatusPtr) {
    if status != nil {
        msg: cstring = ort.GetErrorMessage(status)
        fmt.eprintln(msg)
        ort.ReleaseStatus(status)
        os.exit(1)
    }
}

Configure OrtSessionOptions and OrtSession

OrtSessionOptions

To initialize OrtSessionOptions we use the OrtApi instance:

OrtSessionOptions *session_options;
CheckStatus(g_ort->CreateSessionOptions(&session_options));
g_ort->SetIntraOpNumThreads(session_options, 1);
g_ort->SetSessionGraphOptimizationLevel(session_options, ORT_ENABLE_BASIC);

Compare it with Odin's code:

session_options: ^OrtSessionOptions
status = g_ort.CreateSessionOptions(&session_options)
CheckStatus(g_ort, status)
defer g_ort.ReleaseSessionOptions(session_options)

status = g_ort.SetIntraOpNumThreads(session_options, 1)
CheckStatus(g_ort, status)

status = g_ort.SetSessionGraphOptimizationLevel(
    session_options,
    GraphOptimizationLevel.ORT_ENABLE_BASIC,
)
CheckStatus(g_ort, status)

Enable CUDA if available

After OrtSessionOptions is initialized, we can check if CUDA is available and configure ONNX to use CUDA as the provider. Here I show only the Odin code because, as you have already seen, the difference between Odin and C is minimal.

Let's find out if CUDA is available:

is_cuda_available: bool
for i: c.int = 0; i < providers_count; i += 1 {
    if providers[i] == "CUDAExecutionProvider" {
        is_cuda_available = true
        break
    }
}
fmt.printfln(">>> CUDA is available: %t", is_cuda_available)

And use it as the acceleration provider:

if is_cuda_available {
    fmt.println(">>> Setting up CUDA...")
    status = OrtSessionOptionsAppendExecutionProvider_CUDA(session_options, 0)
    CheckStatus(g_ort, status)
}

OrtSession

OrtSession is initialized with OrtEnv, model path, and OrtSessionOptions:

session: ^OrtSession
model_path :: "squeezenet1.0-8.onnx"

status = g_ort.CreateSession(env, model_path, session_options, &session)
CheckStatus(g_ort, status)
defer g_ort.ReleaseSession(session)

Run session

So, we've made all the necessary preparations to configure OrtSession. We're almost ready to run the model on the GPU.

To run the model, we have to:

  • specify the model's input and output node names
  • allocate input tensor
  • run session
  • get computation results

Specify model's input and output node names

An ONNX model is a computational graph where nodes represent operations and edges represent the data flow.

We have to specify which node we'd like to put data in and which node we'd like to get the results from. The specification is done by name.

Without diving deep into details, I'll postulate that the input node name is data_0, the output node name is softmaxout_1, and the input node dimensions are 1x3x224x224. There are a few ways to get this information:

  • read model specification or source code
  • use handy visualizers, for example, https://netron.app/
  • use the ONNX Runtime API to query the model at runtime (see the sketch after the screenshot below)

Here is a screenshot of the SqueezeNet model properties taken from netron.app:

SqueezeNet on netron.app
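
For the third option, here is a hedged sketch of how the same information could be queried at runtime. It assumes the generated bindings expose the C API functions GetAllocatorWithDefaultOptions, SessionGetInputCount, and SessionGetInputName under those names, and it reuses g_ort, session, status, and CheckStatus from the surrounding code; it is illustrative and not part of the demo:

//// Query model input info at runtime (sketch):
allocator: ^OrtAllocator
status = g_ort.GetAllocatorWithDefaultOptions(&allocator)
CheckStatus(g_ort, status)

input_count: c.size_t
status = g_ort.SessionGetInputCount(session, &input_count)
CheckStatus(g_ort, status)

input_name: cstring
status = g_ort.SessionGetInputName(session, 0, allocator, cast(^^c.char)&input_name)
CheckStatus(g_ort, status)
// in real code, release input_name through the allocator when done

fmt.printfln(">>> inputs: %d, first input node: %s", input_count, input_name)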

The syntax of node name declaration is straightforward, so I'll show only output_node_names in C (to be more precise, in C++ as it was implemented in the ONNX example):

std::vector<const char *> output_node_names = {"softmaxout_1"};

On the Odin side, input node name, output node name, and dimensions are initialized as follows:

input_node_dims := make([dynamic]c.int64_t)
defer delete(input_node_dims)
append(&input_node_dims, 1, 3, 224, 224)

input_node_names := make([dynamic]cstring)
defer delete(input_node_names)
append(&input_node_names, "data_0")

output_node_names := make([dynamic]cstring)
defer delete(output_node_names)
append(&output_node_names, "softmaxout_1")

Allocate input tensor

In a real-world scenario, we'd pass images to the model and get predictions. But for the sake of the demo, we'll pass dummy data to the model.

SqueezeNet is a model for image classification, trained on the ImageNet dataset. The model expects inputs of size 224x224x3 (224 pixels high, 224 pixels wide, and 3 channels). So, we have to allocate and populate a vector of 224*224*3 elements.

size_t input_tensor_size = 224 * 224 * 3;
std::vector<float> input_tensor_values(input_tensor_size);
// initialize input data with values in [0.0, 1.0] (dummy data)
for (size_t i = 0; i < input_tensor_size; i++) {
    input_tensor_values[i] = (float)i / (input_tensor_size + 1);
}

// create input tensor object from data values
OrtMemoryInfo *memory_info;
CheckStatus(g_ort->CreateCpuMemoryInfo(OrtArenaAllocator, OrtMemTypeDefault, &memory_info));

OrtValue *input_tensor = NULL;
CheckStatus(g_ort->CreateTensorWithDataAsOrtValue(
    memory_info, input_tensor_values.data(),
    input_tensor_size * sizeof(float), input_node_dims.data(), 4,
    ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT, &input_tensor));

int is_tensor;
CheckStatus(g_ort->IsTensor(input_tensor, &is_tensor));
assert(is_tensor);
g_ort->ReleaseMemoryInfo(memory_info);

In the C code above (again, it's C++, but who cares), we allocate the input_tensor_values vector of floats with 224*224*3 elements and populate it with dummy data. Then, we "transfer" the data into the ONNX world by calling CreateTensorWithDataAsOrtValue() and allocating input_tensor. As the final step, we check that input_tensor actually references a tensor, and we're good to go.

Here is the same thing in Odin:

input_tensor_size: c.size_t = 224 * 224 * 3

input_tensor_values := make([dynamic]c.float, input_tensor_size)
defer delete(input_tensor_values)

// initialize input data with values in [0.0, 1.0] (dummy data)
for i: c.size_t = 0; i < input_tensor_size; i += 1 {
    input_tensor_values[i] = cast(c.float)i / (cast(c.float)input_tensor_size + 1)
}

// create input tensor object from data values
memory_info: ^OrtMemoryInfo
status = g_ort.CreateCpuMemoryInfo(
    OrtAllocatorType.OrtArenaAllocator,
    OrtMemType.OrtMemTypeDefault,
    &memory_info,
)
CheckStatus(g_ort, status)
defer g_ort.ReleaseMemoryInfo(memory_info)

input_tensor: ^OrtValue
status = g_ort.CreateTensorWithDataAsOrtValue(
    memory_info,
    cast(rawptr)raw_data(input_tensor_values),
    input_tensor_size * size_of(c.float),
    cast(^c.int64_t)raw_data(input_node_dims),
    len(input_node_dims),
    ONNXTensorElementDataType.ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT,
    &input_tensor,
)
CheckStatus(g_ort, status)
defer g_ort.ReleaseValue(input_tensor)

is_tensor: c.int
status = g_ort.IsTensor(input_tensor, &is_tensor)
CheckStatus(g_ort, status)
assert(is_tensor == 1, "input_tensor not a tensor")

Pay attention to how we initialize dynamic arrays in Odin and how we get a pointer to their backing data. To get a pointer to the inner data of a dynamic array we use Odin's raw_data() built-in. Then we cast it to a raw pointer with cast(rawptr), which is equivalent to casting to void* in C.
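
As a tiny standalone illustration of that pattern (outside of the project code):

package main

import "core:c"
import "core:fmt"

main :: proc() {
    // make() allocates the dynamic array; raw_data() returns a multi-pointer
    // to its backing storage, and cast(rawptr) turns it into C's void*.
    xs := make([dynamic]c.float, 4)
    defer delete(xs)

    data_ptr: [^]c.float = raw_data(xs)
    void_ptr: rawptr = cast(rawptr)data_ptr

    fmt.println(data_ptr[0], void_ptr != nil)
}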

Run session

At this point, we've prepared the session and allocated the input tensor, as well as the names and dimensions of the input and output nodes. It was a long and tedious, but necessary, preparation.

To store the inference results, we use an OrtValue pointer. After the computations are done, we check that this pointer is a valid tensor. Here is a C sample:

OrtValue *output_tensor = NULL;
CheckStatus(g_ort->Run(session, NULL, input_node_names.data(),
                        (const OrtValue *const *)&input_tensor, 1,
                        output_node_names.data(), 1, &output_tensor));
CheckStatus(g_ort->IsTensor(output_tensor, &is_tensor));
assert(is_tensor);

At this point, the Odin code should be clear and readable, and I hope no extra explanation is required:

output_tensor: ^OrtValue
run_options: ^OrtRunOptions
status = g_ort.Run(
    session,
    run_options,
    raw_data(input_node_names),
    &input_tensor,
    len(input_node_names),
    raw_data(output_node_names),
    len(output_node_names),
    &output_tensor,
)
defer g_ort.ReleaseValue(output_tensor)
CheckStatus(g_ort, status)

status = g_ort.IsTensor(output_tensor, &is_tensor)
CheckStatus(g_ort, status)
assert(is_tensor == 1, "output_tensor not a tensor")

Get computation results

The time has come to get the computation results back from the ONNX world into ours. For this purpose, the GetTensorMutableData() function is used.

The result of the computations is a float array of length 1000. "Why 1000?" you may ask. This is because there are 1000 image classes in the ImageNet dataset. The model's output is a vector of probabilities that the image (dummy data in our case) depicts each specific class.

Here is a C sample:

float* floatarr;
CheckStatus(g_ort->GetTensorMutableData(output_tensor, (void**)&floatarr));
assert(std::abs(floatarr[0] - 0.000045) < 1e-6);

// score the model, and print scores for first 5 classes
for (int i = 0; i < 5; i++)
    printf("Score for class [%d] =  %f\n", i, floatarr[i]);

You already know that for array pointers we use multi-pointers in Odin:

floatarr: [^]c.float
status = g_ort.GetTensorMutableData(output_tensor, cast(^rawptr)&floatarr)
CheckStatus(g_ort, status)
assert(abs(floatarr[0] - 0.000045) < 1e-6, "computation failed")

for i := 0; i < 5; i += 1 {
    fmt.printfln(">>> Score for class [%d] =  %.6f", i, floatarr[i])
}
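
As a small follow-up that is not part of the original sample, the predicted class is simply the index with the highest probability. A sketch, reusing floatarr from above:

best_class: int
best_score: c.float
for i := 0; i < 1000; i += 1 { // 1000 ImageNet classes, as described above
    if floatarr[i] > best_score {
        best_score = floatarr[i]
        best_class = i
    }
}
fmt.printfln(">>> Most likely class: [%d] with score %.6f", best_class, best_score)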

Run the model on the GPU with Odin

The full code described in the post is available on GitHub: https://github.com/yevhen-k/onnx-odin-squeezenet-inference-demo

To run the code, you have to:

  1. Use Linux
  2. Have ONNX Runtime on your machine (in the /thirdparty/onnxruntime folder in the current example)
  3. Have a GPU with CUDA (optional)

Prepare, compile, and run the code:

  1. Clone the repo

    git clone https://github.com/yevhen-k/onnx-odin-squeezenet-inference-demo.git
    cd onnx-odin-squeezenet-inference-demo
    
  2. Get the SqueezeNet model (squeezenet1.0-8.onnx)

    curl https://github.com/onnx/models/raw/main/validated/vision/classification/squeezenet/model/squeezenet1.0-8.onnx -Lso squeezenet1.0-8.onnx
    
  3. Edit onnxbinding.odin if necessary to adjust the package name or the foreign import of libonnxruntime.so

    // ...
    package onnx_bindings
    // ...
    when ODIN_OS == .Linux do foreign import onnx "/thirdparty/onnxruntime/lib/libonnxruntime.so"
    // ...
    
  4. Build

    cd ..
    odin build onnx-odin-squeezenet-inference-demo -extra-linker-flags:"-Wl,-rpath=/thirdparty/onnxruntime/lib/" -out:onnx-odin-squeezenet-inference-demo/odin_onnx_example
    
  5. Run

    cd onnx-odin-squeezenet-inference-demo && ./odin_onnx_example
    

Special Thanks

Special thanks to the Odin community for answering my questions and helping me understand the mechanisms of passing data between C and Odin.

References
