Blog: Mobile Video Processing on the Fly

Implementing a mobile app's interaction with a handheld device's camera, e.g. video recording, is quite a trivial task as long as the standard recording parameters are used. In that case a few lines of code will do the trick: the app connects to the built-in camera, which makes the recording and sends a link to the resulting file back to the app. This initial simplicity, however, turns into a major problem once you need algorithms that change anything about the standard video recording.
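For the trivial case, a minimal sketch might look like the following. UIImagePickerController is the standard iOS way to get the system camera UI; the view controller around it is an illustrative assumption:

```swift
import UIKit
import MobileCoreServices

// A minimal sketch of the trivial case: recording with the standard system camera UI.
class RecorderViewController: UIViewController,
                              UIImagePickerControllerDelegate,
                              UINavigationControllerDelegate {

    func startStandardRecording() {
        guard UIImagePickerController.isSourceTypeAvailable(.camera) else { return }
        let picker = UIImagePickerController()
        picker.sourceType = .camera
        picker.mediaTypes = [kUTTypeMovie as String] // capture video, not photos
        picker.delegate = self
        present(picker, animated: true)
    }

    // The system makes the recording and hands back a link (URL) to the file.
    func imagePickerController(_ picker: UIImagePickerController,
                               didFinishPickingMediaWithInfo info: [UIImagePickerController.InfoKey: Any]) {
        if let url = info[.mediaURL] as? URL {
            print("Recorded video saved at \(url.path)")
        }
        picker.dismiss(animated: true)
    }
}
```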


When working with the native video-recording scenario, it quickly becomes clear that you cannot change the standard UI of the camera screen, let alone make more substantial changes to the recording logic or to the post-processing of the recordings.


Creating an app for video recording and editing on the fly

In one of our recent projects we had to build an app that could record video with the built-in camera of an iOS or Android device while simultaneously applying various filters and watermarks. There are quite a few applications on the market built around a similar concept, e.g. Vine or Instagram, but so far they only allow short 15-second recordings, and the video is not of the best quality. In this project our task was to enable customized video recording of up to 15 minutes in Full HD.

Technological limitations of mobile devices

As soon as we started designing the app architecture, our developers ran into the hardware limitations of mobile devices. Today's mobile devices have enough horsepower to fully replace a computer for standard operations (typing and text editing, working with spreadsheets, music and video playback). But for more complex tasks, such as video post-processing or 3D graphics rendering, their technical capacities are often not enough.

Here are some numbers comparing modern computers with smartphones:

[Table: hardware comparison of modern computers vs. smartphones]

When building an app, a developer cannot count on exploiting the device's full horsepower, since quite a high share of it is reserved by the OS; on iOS devices this can be as much as 40% of RAM. This means you should be very careful about the amount of resources the app uses: if the OS detects resource over-usage, it may terminate the app without saving any data.
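On iOS, for instance, the app is notified about memory pressure, and a careful video app frees its heavy intermediate data right away. A minimal sketch, assuming a hypothetical cache of intermediate frames:

```swift
import UIKit
import CoreVideo

// React to the OS memory-pressure notification and free heavy buffers
// before the system decides to terminate the app.
final class ProcessingViewController: UIViewController {
    var frameCache: [CVPixelBuffer] = [] // hypothetical intermediate-frame cache

    override func viewDidLoad() {
        super.viewDidLoad()
        NotificationCenter.default.addObserver(
            forName: UIApplication.didReceiveMemoryWarningNotification,
            object: nil,
            queue: .main
        ) { [weak self] _ in
            // Drop everything that can be recomputed; staying under the OS
            // limits is what keeps the app alive.
            self?.frameCache.removeAll()
        }
    }
}
```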

That is exactly what this article is about: how to build a video-processing app that stretches the device's horsepower to the limit while still giving the user a great experience.

Rolling out the video post-processing

The initial version of our custom video-recording architecture was quite simple.

The technical specifications of the iPhone were not sufficient to start editing the video before the recording was saved to a file, while our priority was smooth, delay-free playback of the edited video. As a result, the edited video stream with the applied filters would be projected on the screen, and the original, unfiltered recording would be saved to the file. Since the video itself was recorded without editing, we designed a post-processing stage for the video file, which required additional time to save the end result.
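A sketch of what such a post-processing stage can look like, assuming Core Image filtering of the saved file (the sepia filter and the function signature are illustrative choices, not the project's actual filters):

```swift
import AVFoundation
import CoreImage

// Read the raw recording from a file, apply a filter to every frame,
// and export the result; this extra export pass is the stage that costs time.
func postProcess(input: URL, output: URL, completion: @escaping (Bool) -> Void) {
    let asset = AVAsset(url: input)
    let filter = CIFilter(name: "CISepiaTone")!

    // Core Image is applied frame by frame while re-encoding the file.
    let composition = AVVideoComposition(asset: asset) { request in
        filter.setValue(request.sourceImage, forKey: kCIInputImageKey)
        request.finish(with: filter.outputImage ?? request.sourceImage, context: nil)
    }

    guard let export = AVAssetExportSession(asset: asset,
                                            presetName: AVAssetExportPresetHighestQuality) else {
        completion(false)
        return
    }
    export.videoComposition = composition
    export.outputURL = output
    export.outputFileType = .mov
    export.exportAsynchronously {
        completion(export.status == .completed)
    }
}
```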

Despite its simplicity, this architecture, pictured below, was quite problematic in terms of performance and resource consumption:

[Diagram: the initial app architecture]

On top of that, when complex video filters were applied, the processing rate would drop drastically, so a 15-minute recording could take up to half an hour to process, which was completely unacceptable. All the pitfalls described above made our team keep looking for a more efficient and elegant solution.

Working with Capture Session

While digging deeper into the problem of the missing video "pre-processing", the team noticed that iOS lets you work with media sessions directly: the capture session is accessible through the AVFoundation framework.

[Diagram: Capture Session work process scheme]

The AVFoundation framework contains the AVCaptureSession class, which allows working with the media streams of the camera and microphone on iOS devices. To display the camera image to the user, we used AVCaptureVideoPreviewLayer.
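A minimal sketch of such a session, assuming Full HD capture as in our project (permission checks and error handling omitted for brevity):

```swift
import AVFoundation
import UIKit

// Configure an AVCaptureSession with camera and microphone inputs and show
// the live image through AVCaptureVideoPreviewLayer.
final class CaptureViewController: UIViewController {
    let session = AVCaptureSession()

    override func viewDidLoad() {
        super.viewDidLoad()
        session.sessionPreset = .hd1920x1080 // Full HD, as the project required

        if let camera = AVCaptureDevice.default(for: .video),
           let videoInput = try? AVCaptureDeviceInput(device: camera),
           session.canAddInput(videoInput) {
            session.addInput(videoInput)
        }
        if let mic = AVCaptureDevice.default(for: .audio),
           let audioInput = try? AVCaptureDeviceInput(device: mic),
           session.canAddInput(audioInput) {
            session.addInput(audioInput)
        }

        // The preview layer is what the user sees while recording.
        let preview = AVCaptureVideoPreviewLayer(session: session)
        preview.frame = view.bounds
        preview.videoGravity = .resizeAspectFill
        view.layer.addSublayer(preview)

        session.startRunning()
    }
}
```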

Using these alternative video editing and processing techniques, we managed to improve the app architecture: we applied the filters at the capture session level, after which the already-edited video was both displayed on the screen and saved to a file.


[Diagram: the new app architecture]
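One way to apply a filter at the capture session level is to tap the session's video frames through AVCaptureVideoDataOutput. The sketch below uses a Core Image filter as a stand-in for our actual filters (we later moved the filtering itself to GPUImage, as described below):

```swift
import AVFoundation
import CoreImage

// Tap the session's video frames so a filter runs on each one before the
// frame reaches the screen and the file writer.
final class FilteredFrameTap: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    private let ciContext = CIContext()                 // GPU-backed by default
    private let filter = CIFilter(name: "CISepiaTone")! // illustrative filter

    func attach(to session: AVCaptureSession) {
        let output = AVCaptureVideoDataOutput()
        output.setSampleBufferDelegate(self, queue: DispatchQueue(label: "video.frames"))
        if session.canAddOutput(output) {
            session.addOutput(output)
        }
    }

    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
        filter.setValue(CIImage(cvPixelBuffer: pixelBuffer), forKey: kCIInputImageKey)
        guard let filtered = filter.outputImage else { return }
        // Render the filtered image back into the same buffer, so one frame
        // feeds both the on-screen preview and the recording.
        ciContext.render(filtered, to: pixelBuffer)
    }
}
```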

GPU video processing

Another important question was how to process the video. A simple but inefficient solution was to process it on the device's CPU: when applying video filters to the iPhone 5 camera stream this way, we obtained 8 FPS (frames per second) or less. Such results were definitely not acceptable, so we moved on to processing the video on the GPU (Graphics Processing Unit), whose primary job is rendering graphics. For graphics and video processing, this approach promised to be far more efficient than the CPU-based one.
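To see why the CPU route is so slow, here is a sketch of a naive per-pixel sepia pass over one frame, assuming a 32BGRA pixel buffer (the filter choice is illustrative). Doing this for every pixel of a Full HD frame, dozens of times per second, is the kind of workload that dragged the frame rate down to 8 FPS and below:

```swift
import CoreVideo

// Naive CPU filtering: visit all ~2 million pixels of a 1920x1080 frame.
func sepiaOnCPU(_ pixelBuffer: CVPixelBuffer) {
    CVPixelBufferLockBaseAddress(pixelBuffer, [])
    defer { CVPixelBufferUnlockBaseAddress(pixelBuffer, []) }

    guard let base = CVPixelBufferGetBaseAddress(pixelBuffer) else { return }
    let width = CVPixelBufferGetWidth(pixelBuffer)
    let height = CVPixelBufferGetHeight(pixelBuffer)
    let bytesPerRow = CVPixelBufferGetBytesPerRow(pixelBuffer)
    let pixels = base.assumingMemoryBound(to: UInt8.self)

    for y in 0..<height {
        for x in 0..<width {
            let p = y * bytesPerRow + x * 4 // assuming 32BGRA layout
            let b = Float(pixels[p]), g = Float(pixels[p + 1]), r = Float(pixels[p + 2])
            // Standard sepia weights, applied one pixel at a time on the CPU.
            pixels[p]     = UInt8(min(255, r * 0.272 + g * 0.534 + b * 0.131)) // B
            pixels[p + 1] = UInt8(min(255, r * 0.349 + g * 0.686 + b * 0.168)) // G
            pixels[p + 2] = UInt8(min(255, r * 0.393 + g * 0.769 + b * 0.189)) // R
        }
    }
}
```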

[Diagram: video processing with GPU]

To implement the video processing we used the GPUImage library, which supported OpenGL ES 2.0, enabling direct communication with the GPU as well as applying video filters and various photo effects to live video. There is no such thing as completely smooth development, so of course we ran into some problems while implementing the new architecture, but the new approach allowed us to increase performance by more than 100 times compared to the initial CPU-based architecture.
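A minimal sketch of the resulting GPU pipeline with GPUImage, shown here as bridged to Swift (exact initializer and method names can differ between library versions, and the sepia filter stands in for our actual filters): camera frames flow through a filter straight to both the screen and a movie writer, so they never round-trip through the CPU.

```swift
import AVFoundation
import GPUImage
import UIKit

// camera -> filter -> (on-screen view, movie writer); the filter runs as an
// OpenGL ES 2.0 shader on the GPU.
func makeFilteredPipeline(in containerView: UIView) -> (GPUImageVideoCamera?, GPUImageMovieWriter?) {
    let camera = GPUImageVideoCamera(sessionPreset: AVCaptureSession.Preset.hd1920x1080.rawValue,
                                     cameraPosition: .back)
    camera?.outputImageOrientation = .portrait

    let filter = GPUImageSepiaFilter() // illustrative filter choice

    let previewView = GPUImageView(frame: containerView.bounds)
    containerView.addSubview(previewView)
    camera?.addTarget(filter)
    filter.addTarget(previewView) // filtered frames go straight to the screen

    let movieURL = URL(fileURLWithPath: NSTemporaryDirectory() + "filtered.m4v")
    let movieWriter = GPUImageMovieWriter(movieURL: movieURL,
                                          size: CGSize(width: 1080, height: 1920))
    filter.addTarget(movieWriter) // ...and, in parallel, into the output file

    camera?.audioEncodingTarget = movieWriter
    camera?.startCameraCapture()
    movieWriter?.startRecording()
    return (camera, movieWriter)
}
```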

Conclusion

In most cases, CPU resources are not sufficient for high-quality processing and editing of a video stream. The solution is to hand this task over to the mobile device's Graphics Processing Unit, which both drastically increases app performance and keeps the app within the rigid resource limits imposed by both iOS and Android.