The Coding Notebook
Memorable coding moments of a software engineer
Cross Platform Object Detection with TensorFlow Lite - Part II
In this post we will write the native Object Detector class in C++

# Overview
In the [previous post](https://www.thecodingnotebook.com/2019/11/cross-platform-object-detection-with.html) we collected all the development files we need in order to use TFLite with C++; in this post we will actually write our native detector. The important thing to keep in mind while writing the detector is to keep it "clean" of any Android/iOS specifics, so it is truly cross-platform.

# ObjectDetector class
We will not go over the entire code here, only highlight the interesting parts; the full code is available on [GitHub](https://github.com/ValYouW/crossplatform-tflite-object-detecion/tree/master/native-detector)

**Credit**: The code is influenced by the work in [YijinLiu/tf-cpu](https://github.com/YijinLiu/tf-cpu).

# Header file
In the header file we will define the `ObjectDetector` class and a `DetectResult` struct, which is the...detection result.
```cpp
// Includes needed by this header
#include <memory>
#include <opencv2/core.hpp>
#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/model.h"

using namespace cv;

struct DetectResult {
    int label = -1;
    float score = 0;
    float ymin = 0.0;
    float xmin = 0.0;
    float ymax = 0.0;
    float xmax = 0.0;
};

class ObjectDetector {
public:
    ObjectDetector(const char *tfliteModel, long modelSize, bool quantized = false);
    ~ObjectDetector();
    DetectResult *detect(Mat src);
    const int DETECT_NUM = 3; // The number of detections (boxes) to return
private:
    // Members
    char *m_modelBytes = nullptr; // the raw tflite model bytes
    std::unique_ptr<tflite::FlatBufferModel> m_model; // The loaded tflite model
    std::unique_ptr<tflite::Interpreter> m_interpreter; // The model interpreter
    bool m_hasDetectionModel = false;
    bool m_modelQuantized = false;
    const int DETECTION_MODEL_SIZE = 300; // The image input size, in our case width=height=300
    const int DETECTION_MODEL_CNLS = 3; // input image channels

    // normalization values for non-quantized models
    const float IMAGE_MEAN = 128.0;
    const float IMAGE_STD = 128.0;

    // reference to input/output tensors
    TfLiteTensor *m_input_tensor = nullptr;
    TfLiteTensor *m_output_locations = nullptr;
    TfLiteTensor *m_output_classes = nullptr;
    TfLiteTensor *m_output_scores = nullptr;
    TfLiteTensor *m_num_detections = nullptr;

    // Methods
    void initDetectionModel(const char *tfliteModel, long modelSize);
};
```

# Loading the model
Our detector class needs the following for initialization:
1. `const char *tfliteModel` - Byte array of a `.tflite` file; it is the responsibility of each platform to load the model file and provide us with its bytes (a plain C++ sketch follows this list)
1. `long modelSize` - The size, in bytes, of `tfliteModel`
1. `bool quantized` - Whether `tfliteModel` represents a quantized model
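
For illustration, here is a sketch of what this could look like from plain C++ (the `loadBytes` helper and the file path are hypothetical, not part of the detector's code):
```cpp
#include <fstream>
#include <vector>

// Hypothetical helper: read the whole .tflite file into a byte buffer
std::vector<char> loadBytes(const char *path) {
    std::ifstream file(path, std::ios::binary | std::ios::ate);
    std::streamsize size = file.tellg();
    std::vector<char> buffer(size);
    file.seekg(0, std::ios::beg);
    file.read(buffer.data(), size);
    return buffer;
}

// "detect.tflite" is a placeholder path, `true` marks the model as quantized
std::vector<char> model = loadBytes("detect.tflite");
ObjectDetector detector(model.data(), (long) model.size(), true);
```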

Given these we can load and initialize the model (`initDetectionModel`). First we "take responsibility" for the model bytes by copying them:
```cpp
// Copy to model bytes as the caller might release this memory while we need it (EXC_BAD_ACCESS error on ios)
m_modelBytes = (char *) malloc(sizeof(char) * modelSize);
memcpy(m_modelBytes, tfliteModel, sizeof(char) * modelSize);
```
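
Since we `malloc` this copy, we must also release it; a minimal sketch of what the destructor could look like:
```cpp
ObjectDetector::~ObjectDetector() {
    // Destroy the interpreter and model first, as they were built on top of m_modelBytes
    m_interpreter.reset();
    m_model.reset();
    if (m_modelBytes != nullptr) {
        free(m_modelBytes);
        m_modelBytes = nullptr;
    }
}
```
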
And load the model:
```cpp
m_model = tflite::FlatBufferModel::BuildFromBuffer(m_modelBytes, modelSize);
if (m_model == nullptr) {
    printf("Failed to load model");
    return;
}
```

Next we build an interpreter and get a reference to the input tensor:
```cpp
// Build the interpreter
tflite::ops::builtin::BuiltinOpResolver resolver;
tflite::InterpreterBuilder builder(*m_model, resolver);
builder(&m_interpreter);
if (m_interpreter == nullptr) {
    printf("Failed to create interpreter");
    return;
}

// Allocate tensor buffers.
if (m_interpreter->AllocateTensors() != kTfLiteOk) {
    printf("Failed to allocate tensors!");
    return;
}

m_interpreter->SetNumThreads(1);

// Find input tensors.
if (m_interpreter->inputs().size() != 1) {
    printf("Detection model graph needs to have 1 and only 1 input!");
    return;
}

m_input_tensor = m_interpreter->tensor(m_interpreter->inputs()[0]);
```

The following step is optional, but it is good practice to run some validations on the model's input/output types and dimensions:
```cpp
if (m_modelQuantized && m_input_tensor->type != kTfLiteUInt8) {
    printf("Detection model input should be kTfLiteUInt8!");
    return;
}

if (!m_modelQuantized && m_input_tensor->type != kTfLiteFloat32) {
    printf("Detection model input should be kTfLiteFloat32!");
    return;
}

// Input dims should be 1 x height x width x channels
if (m_input_tensor->dims->data[0] != 1 ||
    m_input_tensor->dims->data[1] != DETECTION_MODEL_SIZE ||
    m_input_tensor->dims->data[2] != DETECTION_MODEL_SIZE ||
    m_input_tensor->dims->data[3] != DETECTION_MODEL_CNLS) {
    printf("Detection model must have input dims of 1x%ix%ix%i", DETECTION_MODEL_SIZE, DETECTION_MODEL_SIZE, DETECTION_MODEL_CNLS);
    return;
}

// We expect 4 outputs
if (m_interpreter->outputs().size() != 4) {
    printf("Detection model graph needs to have 4 and only 4 outputs!");
    return;
}
```

And finally we get references to the 4 output tensors:
```cpp
m_output_locations = m_interpreter->tensor(m_interpreter->outputs()[0]);
m_output_classes = m_interpreter->tensor(m_interpreter->outputs()[1]);
m_output_scores = m_interpreter->tensor(m_interpreter->outputs()[2]);
m_num_detections = m_interpreter->tensor(m_interpreter->outputs()[3]);
```
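
Optionally, the same kind of validation could be applied to the outputs as well; a minimal sketch (SSD-style detection models emit float32 outputs even when the weights are quantized, which is why the detection code later reads `data.f` unconditionally):
```cpp
// Optional extra check (not in the original code): the four output tensors
// of an SSD detection model are expected to be float32
if (m_output_locations->type != kTfLiteFloat32 ||
    m_output_classes->type != kTfLiteFloat32 ||
    m_output_scores->type != kTfLiteFloat32 ||
    m_num_detections->type != kTfLiteFloat32) {
    printf("Detection model outputs should be kTfLiteFloat32!");
    return;
}
```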

# Running the detection
This is the implementation of our class's `detect` method; the input is an OpenCV image (`Mat`) and the result is an array of `DetectResult`:
```cpp
DetectResult* ObjectDetector::detect(Mat src) {...}
```

## Prepare the input image
The input image should be 300x300 pixels and in RGB format:
```cpp
Mat image;
resize(src, image, Size(DETECTION_MODEL_SIZE, DETECTION_MODEL_SIZE), 0, 0, INTER_AREA);

// Convert to RGB (a 3-channel input is assumed to already be RGB)
int cnls = image.type();
if (cnls == CV_8UC1) {
    cvtColor(image, image, COLOR_GRAY2RGB);
} else if (cnls == CV_8UC4) {
    cvtColor(image, image, COLOR_BGRA2RGB);
}
```

## Copy image to input tensor
When copying `image` to the input tensor the logic differs depending on whether the model is quantized. For quantized models the expected data type is `uint8`, which is exactly the data type of our OpenCV `image` (8 bits per channel), so we can just copy the bytes. For float models we first need to convert `image` to 32-bit float and normalize the pixel values according to `IMAGE_STD` and `IMAGE_MEAN` (`p' = (p - mean) / std`); with mean = std = 128 this maps pixel values from [0, 255] to roughly [-1, 1].

```cpp
if (m_modelQuantized) {
    // just copy the image
    uchar *dst = m_input_tensor->data.uint8;
    memcpy(dst, image.data,
            sizeof(uchar) * DETECTION_MODEL_SIZE * DETECTION_MODEL_SIZE * DETECTION_MODEL_CNLS);
} else {
    // Convert image to floats and normalize based on std and mean (p' = (p-mean)/std)
    Mat fimage;
    image.convertTo(fimage, CV_32FC3, 1 / IMAGE_STD, -IMAGE_MEAN / IMAGE_STD);

    // Copy float image into input tensor
    float *dst = m_input_tensor->data.f;
    memcpy(dst, fimage.data,
            sizeof(float) * DETECTION_MODEL_SIZE * DETECTION_MODEL_SIZE * DETECTION_MODEL_CNLS);
}
```

## Invoking the interpreter
Now we can run the detection. Note that `res`, the output array, is allocated at the top of `detect` (it is shown in the next section), so on error we return it with its default, empty, values:
```cpp
if (m_interpreter->Invoke() != kTfLiteOk) {
    printf("Error invoking detection model");
    return res;
}
```

## Collecting the results
Our model has 4 outputs:
1. locations - Float array, the detection "boxes" in the order `ymin, xmin, ymax, xmax`; note these values are normalized to 0-1
1. classes - Float array, the class id for each detection (person, bicycle, car, etc.)
1. scores - Float array, the matching "score" (between 0-1) for each detection
1. number of detections - A single float holding the number of detections in the output

We then loop over the detection results (up to `DETECT_NUM`, the maximum number of results we limit our detector to) and convert each one to our `DetectResult` struct.

**Note:** Since the detection box coordinates are normalized to 0-1 we multiply them by the size of our original input image, so our results are in pixel coordinates of the input image.

```cpp
// The output array, allocated on the heap since we return it to the caller
// (the caller is responsible for delete[]-ing it). Entries default to
// label = -1 / score = 0 until filled.
DetectResult *res = new DetectResult[DETECT_NUM];

const float *detection_locations = m_output_locations->data.f;
const float *detection_classes = m_output_classes->data.f;
const float *detection_scores = m_output_scores->data.f;
const int num_detections = (int) *m_num_detections->data.f;

for (int i = 0; i < num_detections && i < DETECT_NUM; ++i) {
    res[i].score = detection_scores[i];
    res[i].label = (int) detection_classes[i];

    // Get the bbox, make sure its not out of the image bounds, and scale up to src image size
    res[i].ymin = std::fmax(0.0f, detection_locations[4 * i] * src.rows);
    res[i].xmin = std::fmax(0.0f, detection_locations[4 * i + 1] * src.cols);
    res[i].ymax = std::fmin(float(src.rows - 1), detection_locations[4 * i + 2] * src.rows);
    res[i].xmax = std::fmin(float(src.cols - 1), detection_locations[4 * i + 3] * src.cols);
}
```
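
For completeness, here is a hedged caller-side sketch of how the results might be consumed, e.g. drawing the boxes on the original image with OpenCV (`drawDetections` and the 0.5 score threshold are illustrative choices, not part of the post's code):
```cpp
#include <opencv2/imgproc.hpp>

// Hypothetical usage sketch: draw all detections above an arbitrary score
// threshold. Unfilled entries have score 0 and are skipped by the check.
void drawDetections(cv::Mat &frame, ObjectDetector &detector) {
    DetectResult *results = detector.detect(frame);
    for (int i = 0; i < detector.DETECT_NUM; ++i) {
        if (results[i].score < 0.5f) continue;
        cv::rectangle(frame,
                      cv::Point((int) results[i].xmin, (int) results[i].ymin),
                      cv::Point((int) results[i].xmax, (int) results[i].ymax),
                      cv::Scalar(0, 255, 0), 2);
    }
    delete[] results; // detect() heap-allocates the result array
}
```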

# The End
That's it! In the [next](https://www.thecodingnotebook.com/2020/04/cross-platform-object-detection-with_26.html) post we will see how to use this code from both Android and iOS.