Image & Voice Recognition

Background on object recognition features

Detecting human faces in real-time video is now common across many industries and serves a wide range of business needs. The DB Best development team uses a variety of image and voice recognition algorithms to build iOS and Android applications.

Current image & voice recognition technology can detect human faces, road signs, songs, voice commands, or even cats!

In order to get the best possible results, you need to consider leveraging a neural network, or even multiple neural networks.

Essentially, object recognition tasks come down to pattern recognition, which is a branch of machine learning. This means that you have to work with large data sets, which in turn require substantial processing power to solve the recognition tasks.

Image & Voice Recognition technologies

Usually, image & voice recognition tasks call for a neural network. Neural networks learn non-linear mappings from input data (pixels or audio samples) to labels, and on some narrowly defined recognition tasks they can match or even exceed human accuracy. We can train a neural network from scratch or build on an already trained one. Hosted services based on convolutional neural networks, such as Google Cloud Vision, IBM Watson Visual Recognition, or Clarifai, can deliver strong object recognition results without training a model yourself.

In code, our development team follows a 2-step approach: first detecting the area of the picture that contains an object, then performing the actual recognition. Once the object is detected, you crop the image to that area and pass only the cropped region to the neural network for recognition. This reduces the neural network's workload and improves its performance.
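The 2-step approach described above can be sketched in a few lines of Python. This is only an illustration, not the code used in our applications: the image is modeled as a list of pixel rows, and the `detect` and `recognize` callables are hypothetical stand-ins for a face detector and a neural network.

```python
def crop(image, box):
    """Return the sub-image covered by box = (x, y, width, height)."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]


def recognize_in_two_steps(image, detect, recognize):
    """Step 1: find the object's bounding box. Step 2: recognize the crop.

    `detect` and `recognize` are hypothetical callables supplied by the
    caller; `detect` returns a box or None when nothing is found.
    """
    box = detect(image)
    if box is None:
        return None  # nothing detected, so the network is never invoked
    return recognize(crop(image, box))
```

Because the recognizer only sees the cropped region, it processes far fewer pixels than it would on the full frame.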

The first step may be carried out by the neural network itself or with the help of built-in smartphone features, such as the CIDetector class in the iOS Core Image framework or the FaceDetector class in the android.media package. In the second step, the neural network produces a multidimensional vector that describes the detected object. You then compare this vector with stored sample vectors to complete the recognition process.
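One common way to compare a vector with sample vectors is cosine similarity with a cutoff. The sketch below assumes that metric and an arbitrary threshold of 0.8; the page does not say which metric or threshold our applications use, and the function names are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(vector, samples, threshold=0.8):
    """Return the label of the most similar sample, or None if no sample
    reaches the (assumed) similarity threshold."""
    best_label, best_score = None, threshold
    for label, sample in samples.items():
        score = cosine_similarity(vector, sample)
        if score >= best_score:
            best_label, best_score = label, score
    return best_label
```

For example, with samples `{"alice": [1, 0, 0], "bob": [0, 1, 0]}`, the vector `[0.9, 0.1, 0]` matches "alice", while `[1, 1, 1]` is not close enough to either sample and yields no match.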

Take a look at the basic application structure (below) that leverages image & voice recognition via a neural network.

The DB Best team developed a solution that keeps the video output at a high frame rate while decreasing the neural network workload. In this application, the neural network processes one out of every 3 frames (about 10 images per second), while the live camera picture runs at 30 frames per second.
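The frame-skipping idea can be illustrated with a short Python generator. It is a simplified sketch rather than our production code: `run_network` is a hypothetical callable standing in for the neural network, and frames that are skipped simply reuse the most recent result.

```python
def process_stream(frames, run_network, every_n=3):
    """Yield (frame, result) for every frame, but call the network only
    on every `every_n`-th frame and reuse its last result in between."""
    last_result = None
    for index, frame in enumerate(frames):
        if index % every_n == 0:
            last_result = run_network(frame)
        yield frame, last_result
```

With 30 input frames and `every_n=3`, all 30 frames are still displayed, but the network runs only 10 times.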

Using a Neural Network for Face Tracking

Leveraging a neural network for face tracking and recognition delivers incredible results.

In our in-house R&D lab, we built a number of iOS and Android applications that use neural networks for image and voice recognition.

Implementing neural networks in mobile applications proved to be quite challenging and required careful performance optimization.

Check the following video to learn more about using neural networks for image recognition in mobile applications.

Object recognition scope of use

Our team can apply image & voice recognition technologies in a variety of applications, from custom-tailored camera apps to immersive eye-controlled games. Features that image recognition can bring to life include adding Facebook likes through smile detection, sending emojis based on your facial expressions (like Animoji on the Apple iPhone X), and handwriting recognition. With face-controlled apps and games becoming more and more popular, you can trust us to deliver native iOS applications that use facial motions and gestures as input. Self-driving cars also rely on object recognition, for everything from road signs to obstacles.

With regard to voice recognition, the DB Best team can add voice control features to your mobile application, just as we added voice command recognition in our research project. More generally, you can use sound recognition for instant music identification (like Shazam) or even voice translation (consider the built-in Translator in Skype, which recognizes at least 8 different languages in real time).

Most object detection applications use machine learning algorithms, so their accuracy can improve as they are trained on more data. Start building your image or sound recognition application today. Contact DB Best to learn how you can take advantage of our experience!

Learn more

Blog posts

Check out some of our blog posts that highlight our hands-on experience in creating image recognition mobile applications.

Using a Neural Network for Face Tracking on Android

2 March 2017 Bill Ramos

In one of our previous blog posts, we talked about face tracking basics. We have demonstrated a simple mobile Face Tracking application, based on native iOS frameworks. Nonetheless, it...

Discover How Face Tracking Mobile Apps Can Improve Your Business

10 February 2017 Bill Ramos

Detection of human faces in real-time imaging is a common task, which may occur in different areas and fit diverse business demands. Various algorithms were developed to resolve this t...
