Employing AI to Streamline Large-Scale Data Management

Render Farms

Author: Joe D’Amato, Senior Solutions Architect, CDW Canada

In recent years, there have been many exciting developments in the areas of automation and artificial intelligence (AI), which have factored significantly in transforming operations in many industries, including media and entertainment (M&E). In this blog, we will look at how AI is being introduced into render farms, and the role that AI will play in shaping the future of M&E workflows.

How managing resources in render farms can become complex

Despite an array of technological advancements in M&E rendering practices, resource prediction has only become more challenging across the industry. For years, resource managers tracked activity by manually feeding data into Microsoft Excel. However, the rapid rate of change in other variables, such as changes in production, has made resource prediction much more difficult. This is partly due to the large size of render farms, which nowadays run to 10-15K cores and up. The complexity and scale of the datasets you’re working with, as metrics come back at you, can be too much to digest and result in data overload.

In addition, the human factor must be considered. For example, a heavy render from a single FX artist can generate dozens of terabytes overnight – or not, depending on the artist’s discretion. This means resource managers need to forecast the choices people make in order to accommodate the data that’s submitted to the render farm.

Further, resource managers must forecast the overall consumption of render and disk resources. They must predict how big the show will be, how big the shot will be, how long the render will take, and how long the frame will take. Trying to monitor all these moving pieces is like watching the synapses of a brain fire in real time. You have a petabyte-scale NAS storage system holding nearly a billion files, with tens of thousands of farm tasks writing 20-50TB of data. This all happens over a period of hours in a day, and you need to manage what is going on.

The rise of AI and predictive analytics in rendering workflows

As a next step after using Excel, resource managers are feeding their data into an ELK stack, which comprises the open-source projects Elasticsearch, Logstash and Kibana. These black-and-colour graphs are in almost every studio and provide insight into how the render farm is being used, along with storage monitoring. However, there is still an overflow of data to correlate. To combat this, AI-style models are being leveraged to take all this information and begin correlating it. The end goal is to correlate all render farm activity and NAS activity, which are like two separate brains with their own synapses firing, and then work out how they affect each other.
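
As a rough illustration of what "correlating" farm and NAS activity means at its simplest, the sketch below computes a Pearson correlation between two hourly series. It is a minimal example only, not how any studio or product does it: the function name, the sample numbers and the choice of Pearson correlation are all assumptions made for illustration, and a real model would work with far more signals than two.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)

# Hypothetical hourly samples (made up for this example):
# active farm tasks and terabytes written to the NAS in the same hour.
farm_tasks = [1200, 3400, 8000, 15000, 22000, 18000]
nas_tb_written = [0.4, 1.1, 2.9, 5.2, 7.8, 6.1]

r = pearson(farm_tasks, nas_tb_written)
print(f"farm/NAS correlation: {r:.2f}")
```

A value close to 1 would suggest that NAS writes rise and fall with farm load in that window; it says nothing yet about which artist, show or shot is driving the load.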

We’ve evolved beyond the point where any human can track this activity in real time – there’s no watching the charts anymore. Advanced tools are needed that can constantly gather all the information and correlate it in a machine-learning model.

Applying machine learning to simplify workflow management

The most solid approach on the storage side is to leverage Dell EMC DataIQ, a real-time data monitoring system. Few studios that have a petabyte of storage are aware of what actually makes up that petabyte. DataIQ enables everyone using Dell EMC PowerScale NAS storage to see in real time where every last gigabyte is used. You can track artist data and shot data – it provides the framework on the data management side to correlate all these rapid changes. Resource managers need to know whether you’re running at 70-80 percent of file system capacity, which can sometimes mean 100TB or less of available headroom. If a handful of artists generate 12TB of data a night, then in one night you can significantly slow down the NAS if you don’t have alerts in place to stop it. Using DataIQ, as well as quotas and other PowerScale tools, will allow you to create triggers and limits so you won’t fill the disk to 100 percent, which can be destructive to your PowerScale NAS.
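
To make the headroom arithmetic concrete, here is a small back-of-the-envelope sketch. It does not use DataIQ or any PowerScale tooling: the function name, the 90 percent alert threshold and the 1PB capacity are assumptions chosen for illustration, combined with the 12TB-a-night figure from above.

```python
def nights_until_limit(capacity_tb, used_tb, nightly_growth_tb, limit_fraction=0.9):
    """Return how many whole nights of writes fit before usage reaches the limit.

    limit_fraction is a hypothetical alert threshold (share of total capacity),
    not a value recommended by any vendor.
    """
    if nightly_growth_tb <= 0:
        raise ValueError("nightly_growth_tb must be positive")
    headroom_tb = capacity_tb * limit_fraction - used_tb
    if headroom_tb <= 0:
        return 0  # already at or past the threshold
    return int(headroom_tb // nightly_growth_tb)

# Assumed example: a 1,000TB file system that is 80 percent full,
# growing by 12TB a night, with an alert threshold at 90 percent.
nights = nights_until_limit(capacity_tb=1000, used_tb=800, nightly_growth_tb=12)
print(f"nights of headroom before the alert threshold: {nights}")
```

With these assumed numbers there are 100TB of headroom before the threshold, which is only eight nights at 12TB a night, and one unusually heavy night shortens that considerably.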

Additionally, each VFX and animation studio will use home-brewed, bespoke software solutions, and an entire studio will often operate like a custom software package unto itself. Because of this, the software industry has been hard-pressed to come up with a viable off-the-shelf AI product that can handle each studio’s unique rendering and storage challenges. We have already seen companies come and go in this arena.

This leaves studios having to create their own AI/ML analytics in-house to bring together render and disk prediction. At recent SIGGRAPH conferences, studios have publicly detailed their pursuit of AI solutions for predictive analysis using large historical datasets fed into machine-learning models. However, there has only been modest success in some areas with isolated use cases. Many large render and storage AI projects have been tried and then scrapped or restarted after proving unsuccessful in recent years. The goal is not to model machine and render behaviour, but instead to model the human behaviour behind submitted render jobs and the resulting storage needs. That is no small feat.

Getting started with AI

Whether leveraging an out-of-the-box solution or one that is bespoke, it’s imperative to start collecting and storing all the data points that you can, even if you don’t know which way you will ultimately go. Save as much render data as possible, even if it’s in a rudimentary form such as a traditional SQL database or data dumps from the render farm into flat text files.
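
As one example of how rudimentary that starting point can be, the sketch below logs per-frame render statistics into a SQLite database using Python’s built-in sqlite3 module. The table and column names are hypothetical, chosen only for illustration; a real studio would record whatever fields its own render manager exposes.

```python
import sqlite3

# Hypothetical schema: the column names are illustrative, not a standard.
SCHEMA = """
CREATE TABLE IF NOT EXISTS render_stats (
    job_id        TEXT NOT NULL,
    show          TEXT,
    shot          TEXT,
    artist        TEXT,
    frame         INTEGER,
    render_secs   REAL,
    bytes_written INTEGER,
    finished_at   TEXT
)
"""

def open_store(path=":memory:"):
    """Open (or create) the stats database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn

def record_frame(conn, row):
    """Insert one finished frame; `row` is a dict keyed by column name."""
    conn.execute(
        "INSERT INTO render_stats VALUES "
        "(:job_id, :show, :shot, :artist, :frame, "
        ":render_secs, :bytes_written, :finished_at)",
        row,
    )
    conn.commit()

def bytes_per_artist(conn):
    """Total bytes written per artist, as a dict."""
    rows = conn.execute(
        "SELECT artist, SUM(bytes_written) FROM render_stats GROUP BY artist"
    ).fetchall()
    return dict(rows)

# Usage with made-up values; pass a file path instead of ":memory:" to persist.
conn = open_store()
record_frame(conn, {
    "job_id": "job-0001", "show": "demo_show", "shot": "sh010",
    "artist": "artist_a", "frame": 1001, "render_secs": 5400.0,
    "bytes_written": 250_000_000, "finished_at": "2021-01-01T03:15:00",
})
print(bytes_per_artist(conn))
```

Even a table this simple keeps the history queryable later, by artist, show or shot, once you decide which model to feed it into.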

To train a machine-learning model, the more data you have, the better. Having the last year or two of render statistics offers much more to work with than starting from day-one renders. For any studio, regardless of whether you’ve begun to analyze and track your history, start saving now. Even if you miss a field that should’ve been tracked, you’ll be much further along with the rest of the data you have. You can parse the information later and determine how best to use it. The more you save, the more you can apply to the models. It will matter less and less how rudimentary your data is, because there are very sophisticated models on the horizon – scary sophisticated models.

Learn more about CDW’s solutions for the Media and Entertainment industry and Dell EMC PowerScale for Media and Entertainment workflow

About the author: Joe D’Amato is senior solutions architect for media and entertainment at CDW Canada, where he analyzes and reports on industry trends, production needs, new service offerings and market opportunities, while also building relationships and opportunities in North America. Prior to joining CDW, Joe spent more than 20 years providing technical expertise for visual effects and animation companies, and the last 12 years working exclusively in the areas of render farms and storage for producing cinematic special effects. His expertise includes rendering, storage and other infrastructure issues.