Freedom of Choice - Building 3D Vision Solutions

Boaz Ein-Gil
Oct 17, 2021
4 min read

After developing 3D vision solutions for a while, the team at Agent Factory came to see that the freedom to choose which 3D camera, programming language and compute device(s) our code runs on is critical for delivering customer value within budget.

Ideally, you – the developer – should be able to make these choices as you approach a new project. Furthermore, if you are building reusable 3D vision components, you should be able to do so in a way that ensures they can be used in a variety of hardware configurations.

The Choice of Hardware and its Associated Cost

3D vision solutions typically start with 3D image acquisition from a 3D camera. Today, multiple vendors offer a variety of 3D cameras to choose from. This variety spans a range of price points, performance and endurance, letting solution builders choose the right 3D camera for the task and balance features against cost. While this variety is great, leveraging it comes at a high cost.

Variety of APIs

Unfortunately, once you start working with different 3D cameras, you quickly realize that there is no single, common API implemented across them. This means that the code that interacts with the 3D camera (i.e. the code that acquires 3D images) has to be implemented differently for each type of 3D camera you want to support. The differences are not limited to using different interfaces or calling different methods. They go as deep as different ways of representing the 3D data (e.g. which measurement unit is used for distance? Is the distance reported from the camera lens or from the camera plane?) and can even require a completely different programming language, development environment and runtime environment. Clearly, this renders the concept of ‘write once’ moot.
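To make the data representation differences concrete, here is a minimal Python sketch of normalizing a single depth sample. It assumes a simple pinhole camera model, and every name in it (`UNIT_TO_METERS`, `radial_to_planar`, `normalize_depth`, the intrinsics tuple) is hypothetical and used for illustration only; it is not taken from any vendor SDK or from our own API.

```python
import math

# Hypothetical table of scale factors from a camera's native unit to meters.
UNIT_TO_METERS = {"mm": 0.001, "cm": 0.01, "m": 1.0}


def radial_to_planar(distance, u, v, fx, fy, cx, cy):
    """Convert a distance measured along the ray from the optical center
    (the 'from the lens' convention) into depth along the optical axis
    (the 'from the camera plane' convention), assuming a pinhole model.

    (u, v) is the pixel, (fx, fy) the focal lengths in pixels and
    (cx, cy) the principal point.
    """
    x = (u - cx) / fx  # normalized image coordinates of the pixel
    y = (v - cy) / fy
    return distance / math.sqrt(1.0 + x * x + y * y)


def normalize_depth(value, unit, u, v, intrinsics, is_radial):
    """Return planar depth in meters for one sample, whatever the source convention."""
    meters = value * UNIT_TO_METERS[unit]
    if is_radial:
        fx, fy, cx, cy = intrinsics
        return radial_to_planar(meters, u, v, fx, fy, cx, cy)
    return meters
```

At the principal point the two conventions agree, while towards the image edges the radial distance is noticeably larger than the planar depth, so mixing them up silently distorts the scene.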

Programming Languages and Platforms

Another desired capability is the option to choose the programming language in which to implement your 3D processing logic. While computer vision algorithms were traditionally implemented in C/C++, a growing number of languages can be used for such tasks today. However, once you choose a 3D camera, you are usually locked into using a specific programming language too. For us, at Agent Factory, the freedom to use our programming language of choice for 3D vision processing is critical, as it allows us to leverage our existing codebase across projects.

Compute Device

Your choice of the on-prem compute device that runs your code is also directly affected by your choice of 3D camera. Some cameras must be directly connected to the compute device in order to operate, while others require code to be deployed directly to an on-board computer. Constraints like these limit the freedom to choose the best hardware for a given environment and solution.

Common 3D Capture API

In order to address these constraints and maintain the desired level of freedom when building 3D vision solutions, we have implemented a Common 3D Capture API. This API allows our developers to focus on building their 3D vision processing algorithms and solutions, while isolating them from the limitations mentioned above.

The Agent Factory’s Common 3D Capture API – C3C – is composed of a Cross-Platform Service, a Common 3D Capture Protocol and a Client API.

The Cross-Platform Service can be deployed on a range of compute devices and operating systems as well as inside Docker containers. It is responsible for abstracting the native camera API exposed by each vendor and for implementing the server side of the Common 3D Capture Protocol. Currently, this service supports Microsoft’s Azure Kinect DK as well as Sick’s Visionary T-Mini and Visionary S line of 3D cameras. Support for additional cameras is on its way.
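The idea of abstracting each vendor's native API behind one interface can be sketched with a simple adapter pattern in Python. This is an illustrative sketch only: `DepthFrame`, `CameraAdapter` and the two fake cameras are hypothetical stand-ins, not the actual service code and not any vendor SDK.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass
class DepthFrame:
    """Hypothetical common frame type: row-major depth values in meters."""
    width: int
    height: int
    depth_m: List[float]


class CameraAdapter(ABC):
    """Hypothetical common interface that each vendor-specific adapter implements."""

    @abstractmethod
    def capture(self) -> DepthFrame:
        ...


class FakeMillimeterCamera(CameraAdapter):
    """Stand-in for a vendor SDK that reports integer millimeters."""

    def capture(self) -> DepthFrame:
        raw_mm = [1000, 1500, 2000, 2500]  # canned data instead of a real device
        return DepthFrame(2, 2, [v / 1000.0 for v in raw_mm])


class FakeMeterCamera(CameraAdapter):
    """Stand-in for a vendor SDK that already reports meters."""

    def capture(self) -> DepthFrame:
        return DepthFrame(2, 2, [1.0, 1.5, 2.0, 2.5])


def nearest_point(camera: CameraAdapter) -> float:
    """Processing logic written once against the common interface."""
    return min(camera.capture().depth_m)
```

Here `nearest_point` never needs to know which camera it is talking to; supporting a new device means writing one more adapter rather than touching the processing code.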

The Common 3D Capture Protocol is a TCP-based network protocol that exposes the common capturing and configuration capabilities of all supported 3D cameras to any programming language or tool that can compose, send and receive TCP network requests. This essentially removes any programming language barrier the vendor-specific API may impose.
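To illustrate why a TCP-based protocol is language-neutral, here is a minimal Python sketch of message framing over a socket. The wire format of the Common 3D Capture Protocol is not described in this post, so the format below (a 4-byte big-endian length prefix followed by a JSON body) is purely an assumption for illustration, as are the function names and the `command` and `camera` fields used in the usage note.

```python
import json
import socket
import struct


def send_message(sock: socket.socket, payload: dict) -> None:
    """Send one message: 4-byte big-endian length, then a UTF-8 JSON body.
    (Assumed framing for illustration, not the real protocol.)"""
    body = json.dumps(payload).encode("utf-8")
    sock.sendall(struct.pack(">I", len(body)) + body)


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; TCP is a byte stream, so recv may return less."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        buf.extend(chunk)
    return bytes(buf)


def recv_message(sock: socket.socket) -> dict:
    """Receive one length-prefixed JSON message."""
    (length,) = struct.unpack(">I", _recv_exact(sock, 4))
    return json.loads(_recv_exact(sock, length).decode("utf-8"))
```

With framing like this, a client in any language could send something such as `{"command": "capture", "camera": "cam-0"}` (hypothetical fields) and read the reply using nothing more than its standard socket library.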

The Client API is a library that implements the client-side of the common 3D capture protocol in a range of common programming languages. It greatly simplifies the process of acquiring 3D images from different 3D cameras by natively integrating with your programming language of choice and eliminating the need to dive into network programming. Currently, we have implemented the client API for Python, C# and Java.

Freedom for All (or what’s in it for you?)

Realizing the value that we gain from having the Common 3D Capture API when we implement 3D solutions and products for our customers, we’ve decided to make it available for third-party developers to use. We believe that they too will benefit from the value and freedom it brings to 3D vision solution design and implementation. Namely:

Your choice of 3D Cameras
Your choice of programming language and platform
Your choice of compute devices

This leaves you free to focus on delivering your unique 3D vision value rather than worrying about adjusting your development and deployment to any specific 3D capturing hardware.

Early access to Agent Factory’s Common 3D Capture API is currently available to select partners. If you are interested in trying it out and learning how it can benefit your next solution, please drop us a note below.

AGENT FACTORY