The Coding Notebook
Memorable coding moments of a software engineer
Cross Platform Object Detection with TensorFlow Lite - Part III
In this post we will develop the Android and iOS apps that will use our native Object Detector.

# Overview
In the [previous post]( we developed an Object Detector in C++; in this post we will see how to use it on Android and iOS.

# The App
Obviously we won't cover all the code for both apps in this post; the code is available on [GitHub]( We will focus mainly on the interface between the managed code (Kotlin/Swift) and the native code (C++).

## Project structure
The project contains the following folders:
* `native-detector` - The C++ code of the Object Detector, this will be used by both apps.
* `tflite-models` - The object detection model (a quantized model) and a `labelmap.txt` file with the label of each class id (Notes on [converting TF object detection model to tflite model](
* `android` - The Android app
* `ios` - The iOS app

## App Flow
Both apps work essentially the same way: each is a single-view app with a camera feed.

When the screen is loaded we read the tflite model (which is embedded in the app), create an instance of `ObjectDetector` and register for the new-camera-frame event.

Frame processing is done on a background thread, and we handle only one frame at a time (new frames are dropped while an old frame is still being processed). When there is a new frame to process we: convert the frame to an OpenCV `Mat` --> run detection using `ObjectDetector` (on the native "side") --> encode the results (an array of `DetectionResult`) into a float array --> display the results on screen.
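As a concrete illustration of the encoding step, here is a minimal plain-C++ sketch (not the app code; `DetectionResult` here is a simplified stand-in for the struct from the previous post) that flattens detections into a float array, 6 values per detection:

```cpp
#include <cassert>
#include <vector>

// Simplified stand-in for the DetectionResult struct from the previous post.
struct DetectionResult {
    float label, score, xmin, xmax, ymin, ymax;
};

// Flatten detections into one float array: 6 values per detection, in the
// order classId, score, xmin, xmax, ymin, ymax.
std::vector<float> encodeDetections(const std::vector<DetectionResult>& dets) {
    std::vector<float> out;
    out.reserve(dets.size() * 6);
    for (const auto& d : dets) {
        out.push_back(d.label);
        out.push_back(d.score);
        out.push_back(d.xmin);
        out.push_back(d.xmax);
        out.push_back(d.ymin);
        out.push_back(d.ymax);
    }
    return out;
}
```

With 3 detections this yields the 18-value array that both apps decode on the managed side.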

# Android
It is important to create the app with "Native C++" support; for a full explanation you can check this [blog post](

## MainActivity
We'll define some class members that we'll use throughout the activity; the interesting ones are:
// A pointer to an instance of the C++ ObjectDetector
private var detectorAddr = 0L

// An array of class labels (a class id is its index in the array)
val labelsMap = arrayListOf<String>()

We'll also declare the JNI methods here (which we will implement later):
// Create instance of ObjectDetector which returns a pointer to it
private external fun initDetector(assetManager: AssetManager): Long

// The detection method
private external fun detect(
    detectorAddr: Long,
    srcAddr: ByteArray,
    width: Int,
    height: Int,
    rotation: Int
): FloatArray

### onCreate
We will use [camera view]( to handle all the camera stuff. First we register for the new-frame event; note that CameraView automatically dispatches the event handler on a background thread for us:
// The view id here is assumed; match it to the id in your layout xml
val cameraView = this.findViewById<CameraView>(R.id.camera_view)
cameraView.addFrameProcessor { frame -> detectObjectNative(frame) }

Next, load the `labelmap.txt` file from the embedded assets into the `labelsMap` array (of strings):
val labelsInput = this.assets.open("labelmap.txt")
val br = BufferedReader(InputStreamReader(labelsInput))
var line = br.readLine()
while (line != null) {
    labelsMap.add(line)
    line = br.readLine()
}

### detectObjectNative
This is our "frame processor" (which runs on a background thread). First we make sure we have an instance of the detector:
if (this.detectorAddr == 0L) {
    this.detectorAddr = initDetector(this.assets)
    // Save the frame dimensions to be used later when drawing on screen
    this.frameWidth = frame.size.width
    this.frameHeight = frame.size.height
}

Then we can call the JNI `detect` method. Since the frame data is just a byte array, we also provide the frame dimensions so the receiver knows how to decode the byte array into an image, as well as the rotation required to make the image upright (this info is provided by CameraView):
// The Frame accessors below follow the CameraView API; check the exact
// property names against the CameraView version you use
val res = detect(
    this.detectorAddr,
    frame.getData(),
    frame.size.width,
    frame.size.height,
    frame.rotationToUser
)

The result is just a float array with all detections. In our sample the detector is configured to return only 3 detections (see `DETECT_NUM` in `ObjectDetector.h`), and each detection has 6 values: `classId,score,xmin,xmax,ymin,ymax` - so we expect `res` to be an array with 18 values. Then we just make calls to draw the 3 detections.

### drawDetection
This method gets the entire detections array and the index of the detection to draw, and draws it on a `Canvas`. We won't go over the entire method here; the important parts are getting the detection values based on the detection index and converting the detection box coordinates to screen coordinates.

First we get the required scale/offset from detection box to screen box. `camera.width` is the camera view width, which is basically the screen width. `this.frameWidth` is the width of the image we ran detection on:
val scaleX = camera.width.toFloat() / this.frameWidth
val scaleY = camera.height.toFloat() / this.frameHeight

// The camera view offset on screen
val xoff = camera.left.toFloat()
val yoff = camera.top.toFloat()

And now extract the detection from the detections array, since each detection has 6 values, `detectionIdx * 6` is the starting index of each detection:
val classId = detectionsArr[detectionIdx * 6 + 0]
val score = detectionsArr[detectionIdx * 6 + 1]
val xmin = xoff + detectionsArr[detectionIdx * 6 + 2] * scaleX
val xmax = xoff + detectionsArr[detectionIdx * 6 + 3] * scaleX
val ymin = yoff + detectionsArr[detectionIdx * 6 + 4] * scaleY
val ymax = yoff + detectionsArr[detectionIdx * 6 + 5] * scaleY

Finally we just draw that on Canvas.
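The frame-to-screen mapping above boils down to one scale plus one offset per axis. Here is a tiny standalone sketch (illustrative names, not the app code):

```cpp
#include <cassert>

// Map a coordinate from frame space to view space, as drawDetection does:
// scale by viewSize/frameSize, then shift by the view's on-screen offset.
float frameToScreen(float coord, float frameSize, float viewSize, float offset) {
    float scale = viewSize / frameSize;
    return offset + coord * scale;
}
```

For example, with a 320-pixel-wide frame shown in a 640-pixel-wide view at x-offset 10, frame x=100 lands at screen x=210.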

## The native part (Android)
### CMakeLists.txt
We need to update the CMakeLists file to include the ObjectDetector files so they get compiled into `native-lib`. We modify the call to `add_library` and add `ObjectDetector.cpp`:
add_library(
        # Sets the name of the library.
        native-lib

        # Sets the library as a shared library.
        SHARED

        # Provides a relative path to your source file(s).
        native-lib.cpp
        # Adjust the path if ObjectDetector.cpp lives outside the cpp folder
        ObjectDetector.cpp)

### native-lib.cpp
Here we need to implement 2 methods: `initDetector` and `detect`.

The `initDetector` will read the tflite model from the embedded assets and create an instance of `ObjectDetector`:
extern "C" JNIEXPORT jlong JNICALL
Java_com_vyw_nativeobjectdetection_MainActivity_initDetector(JNIEnv *env, jobject p_this, jobject assetManager) {
    char *buffer = nullptr;
    long size = 0;

    // Make sure assetManager is not null
    if (!(env->IsSameObject(assetManager, NULL))) {
        // open the model file
        AAssetManager *mgr = AAssetManager_fromJava(env, assetManager);
        AAsset *asset = AAssetManager_open(mgr, "detect.tflite", AASSET_MODE_UNKNOWN);
        assert(asset != nullptr);

        // read the model file into buffer
        size = AAsset_getLength(asset);
        buffer = (char *) malloc(sizeof(char) * size);
        AAsset_read(asset, buffer, size);
        AAsset_close(asset);
    }

    // create instance of ObjectDetector
    jlong res = (jlong) new ObjectDetector(buffer, size, true);
    free(buffer); // ObjectDetector has its own copy so we can delete
    return res;
}
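The `free(buffer)` right after construction is safe only because `ObjectDetector` copies the model bytes in its constructor (as noted in the comment). A minimal sketch of that ownership pattern, with a hypothetical `ModelHolder` standing in for the real class:

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for ObjectDetector's model ownership: the
// constructor copies the caller's buffer, so the caller is free to
// release its own copy immediately after construction.
class ModelHolder {
public:
    ModelHolder(const char* data, long size) : model_(data, data + size) {}
    const char* data() const { return model_.data(); }
    long size() const { return static_cast<long>(model_.size()); }
private:
    std::vector<char> model_;  // owned copy of the model bytes
};
```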

This is the declaration of the `detect` method; we get the pointer to the object detector, an image byte array (`src`), and the image dimensions/rotation:
extern "C" JNIEXPORT jfloatArray JNICALL
Java_com_vyw_nativeobjectdetection_MainActivity_detect(JNIEnv *env, jobject p_this, jlong detectorAddr, jbyteArray src, int width, int height, int rotation)

First we convert the image byte array to an OpenCV `Mat`:
// Read image bytes
jbyte *_yuv = env->GetByteArrayElements(src, 0);

// Create yuv mat from bytes above
Mat myyuv(height + height / 2, width, CV_8UC1, _yuv);

// convert yuv to BGRA
Mat frame(height, width, CV_8UC4);
cvtColor(myyuv, frame, COLOR_YUV2BGRA_NV21);

// rotate the mat
if (rotation == 90) {
    transpose(frame, frame);
    flip(frame, frame, 1); //transpose+flip(1)=CW
} else if (rotation == 270) {
    transpose(frame, frame);
    flip(frame, frame, 0); //transpose+flip(0)=CCW
} else if (rotation == 180) {
    flip(frame, frame, -1);    //flip(-1)=180
}

// release memory
env->ReleaseByteArrayElements(src, _yuv, 0);
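A quick way to see why the YUV `Mat` above has `height + height/2` rows: NV21 stores a full-resolution Y plane followed by an interleaved VU plane at half resolution in each dimension, so the buffer holds `width * height * 3/2` bytes, i.e. `height + height/2` rows of `width` single-channel pixels:

```cpp
#include <cassert>
#include <cstddef>

// Total byte size of an NV21 buffer: a width*height Y plane plus an
// interleaved VU plane holding width*height/2 bytes.
std::size_t nv21BufferSize(std::size_t width, std::size_t height) {
    return width * height + width * height / 2;
}
```

For a 640x480 frame that is 460800 bytes, matching a `CV_8UC1` Mat of 720 rows by 640 columns.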

Next we cast the detector pointer to `ObjectDetector` and run detection:
ObjectDetector *detector = (ObjectDetector *) detectorAddr;
DetectResult *res = detector->detect(frame);

And the last step is to encode the `DetectResult` array to float array:
// Encode each detection as 6 numbers (label,score,xmin,xmax,ymin,ymax)
int resArrLen = detector->DETECT_NUM * 6;
jfloat jres[resArrLen];
for (int i = 0; i < detector->DETECT_NUM; ++i) {
    jres[i * 6] = res[i].label;
    jres[i * 6 + 1] = res[i].score;
    jres[i * 6 + 2] = res[i].xmin;
    jres[i * 6 + 3] = res[i].xmax;
    jres[i * 6 + 4] = res[i].ymin;
    jres[i * 6 + 5] = res[i].ymax;
}

jfloatArray detections = env->NewFloatArray(resArrLen);
env->SetFloatArrayRegion(detections, 0, resArrLen, jres);

return detections;

# iOS
## ViewController
First we'll define some class members:
// will be used to make sure we don't run detection on several frames in parallel
var processing = false

// Background thread for processing camera frames
let sampleBufferQueue = DispatchQueue.global(qos: .background)

// The native module that will use the ObjectDetector
let cv = OpenCVWrapper()

// A custom UIView to draw the detections
let detectionsCanvas = DetectionsCanvas()

### viewDidLoad
This is where we set up the camera stream. There is a bunch of code setting up the camera, so we will not describe it here; the important thing is to make sure we do the frame processing on a background thread:
output.setSampleBufferDelegate(self, queue: sampleBufferQueue)

Next, load the `labelmap.txt` file from the embedded assets into the `DetectionsCanvas` view, which will use it:
detectionsCanvas.labelmap = loadLabels()

func loadLabels() -> [String] {
    var res = [String]()
    if let filepath = Bundle.main.path(forResource: "labelmap", ofType: "txt") {
        do {
            let contents = try String(contentsOfFile: filepath)
            res = contents.split { $0.isNewline }.map(String.init)
        } catch {
            print("Error loading labelmap.txt file")
        }
    }
    return res
}

### captureOutput
This is the frame-processing handler; it runs on a background thread. Explanations are in the comments:
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    // IMPORTANT to process only 1 frame at a time
    if (processing) { return }
    processing = true

    // On first frame save the frame width/height
    if (detectionsCanvas.capFrameWidth == 0) {
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else {
            processing = false
            return
        }

        CVPixelBufferLockBaseAddress( pixelBuffer, .readOnly )
        detectionsCanvas.capFrameWidth = CVPixelBufferGetWidth(pixelBuffer)
        detectionsCanvas.capFrameHeight = CVPixelBufferGetHeight(pixelBuffer)
        CVPixelBufferUnlockBaseAddress( pixelBuffer, .readOnly )
        processing = false
        return
    }

    // Run the detection
    let res = cv.detect(sampleBuffer)

    // Convert results from Objective-C float to swift Float
    detectionsCanvas.detections = res.compactMap {($0 as! Float)}

    // Signal the canvas to refresh; important to run this on the UI thread
    DispatchQueue.main.async { [weak self] in
        self?.detectionsCanvas.setNeedsDisplay()
        self?.processing = false
    }
}

### DetectionsCanvas
This is a custom UIView that draws the detections on the screen. We won't go over all of it; the important parts are how to interpret the detections result and how to convert the detection box coordinates to screen coordinates. This is all done in `override func draw(_ rect: CGRect)`.

First we get the required scale/offset from detection box to screen box; `capFrameWidth` and `capFrameHeight` are the dimensions of the frame we ran detection on:
let scaleX = self.frame.size.width / CGFloat(capFrameWidth)
let scaleY = self.frame.size.height / CGFloat(capFrameHeight)

// The view offset on screen
let xoff = self.frame.minX
let yoff = self.frame.minY

Next we loop over the detections, get the label/score/bbox and draw. Note that `detections` is just a float array with ALL detection values; each detection has 6 values: `classId,score,xmin,xmax,ymin,ymax`:
let count = detections.count / 6
for i in 0..<count {
    let idx = i * 6
    let classId = Int(detections[idx])
    let score = detections[idx + 1]
    let xmin = xoff + CGFloat(detections[idx + 2]) * scaleX
    let xmax = xoff + CGFloat(detections[idx + 3]) * scaleX
    let ymin = yoff + CGFloat(detections[idx + 4]) * scaleY
    let ymax = yoff + CGFloat(detections[idx + 5]) * scaleY

    // ... drawing stuff
}

## The native part (iOS)
Integrating C++ inside an iOS app is quite easy. First we create a new group and call it `Wrappers`, then add a new C++ file under `Wrappers` and name it `OpenCVWrapper` (choose to create the header file as well). Xcode should offer to create an Objective-C bridging header file for us (if it didn't, it is possible one already exists, or at least one is configured under "Objective-C Bridging Header" in the Build Settings). Next, rename the `.cpp` file to `.mm` (so we can use Obj-C there) and the `.hpp` to `.h`. Finally, open the bridging header file and add:
#import "OpenCVWrapper.h"

### Adding the ObjectDetector files
We need to add references to the ObjectDetector files (`ObjectDetector.cpp` / `ObjectDetector.h`, created in the previous post) by dragging them into the `Wrappers` group (and choosing "create folder reference").

### OpenCVWrapper.mm
`OpenCVWrapper.mm` is basically the bridging code between our Swift app and the C++ ObjectDetector. We are going to have only 2 methods in it: `initDetector` and `detect`.

The `initDetector` function is where we load the tflite model from the bundled resources and create an instance of `ObjectDetector`; note our detector instance is static:
static ObjectDetector* detector = nil;

- (void) initDetector {
    if(detector != nil) { return; }

    // Load the tflite model
    long size = 0;
    char* model = nullptr;
    NSError* configLoadError = nil;
    NSString *modelPath = [[NSBundle mainBundle] pathForResource:@"detect" ofType:@"tflite"];
    NSData* data = [NSData dataWithContentsOfFile:modelPath options:0 error:&configLoadError];
    if (!data) {
      NSLog(@"Failed to load model: %@", configLoadError);
    } else {
        size = data.length;
        model = (char*)data.bytes;
    }

    detector = new ObjectDetector((const char*)model, size, true);
}

And the `detect` method is where we run the actual detection. First we convert the camera buffer to OpenCV Mat:
-(NSArray*) detect: (CMSampleBufferRef)buffer {
    CVImageBufferRef pixelBuffer = CMSampleBufferGetImageBuffer(buffer);
    CVPixelBufferLockBaseAddress( pixelBuffer, 0 );

    //Processing here
    int bufferWidth = (int)CVPixelBufferGetWidth(pixelBuffer);
    int bufferHeight = (int)CVPixelBufferGetHeight(pixelBuffer);
    unsigned char *pixel = (unsigned char *)CVPixelBufferGetBaseAddress(pixelBuffer);

    //put buffer in open cv, no memory copied
    Mat dst = Mat(bufferHeight, bufferWidth, CV_8UC4, pixel, CVPixelBufferGetBytesPerRow(pixelBuffer));

    //End processing
    CVPixelBufferUnlockBaseAddress( pixelBuffer, 0 );

Next we make sure the detector is initialized and run the detection:
    [self initDetector];

    // Run detections
    DetectResult* detections = detector->detect(dst);

Finally we decode the detections into a float array (remember each detection is 6 float numbers):
    // decode detections into float array
    NSMutableArray *array = [[NSMutableArray alloc] initWithCapacity: (detector->DETECT_NUM * 6)];

    for (int i = 0; i < detector->DETECT_NUM; ++i) {
        [array addObject:[NSNumber numberWithFloat:detections[i].label]];
        [array addObject:[NSNumber numberWithFloat:detections[i].score]];
        [array addObject:[NSNumber numberWithFloat:detections[i].xmin]];
        [array addObject:[NSNumber numberWithFloat:detections[i].xmax]];
        [array addObject:[NSNumber numberWithFloat:detections[i].ymin]];
        [array addObject:[NSNumber numberWithFloat:detections[i].ymax]];
    }

    return array;
}

# The End

Phew, that was long :) It also concludes our series about Cross Platform Object Detection with TensorFlow Lite!
Again, the full code is available on [GitHub](