Accelerating AI by 3000%: A Run:AI Case Study

I am Omri Geller, co-founder and CEO of Run.AI, which builds an orchestration and acceleration software platform for artificial intelligence. Today, I will be discussing the challenges that financial institutions face with their AI infrastructure.

In the past decade, there has been an explosion of data and an increase in readily accessible compute power, such as Nvidia's GPUs. However, building AI solutions requires a massive amount of computation, and as the amount of data and the size of AI models continue to grow, the need for computing power in data centers increases exponentially. Finance has been a significant growth area for AI, with use cases such as risk prediction, fraud detection, and algorithmic trading.

However, getting machine learning initiatives to market is not without challenges. IT leaders, MLOps teams, and data science teams find themselves with limited ability to manage expensive compute resources for optimal speed and utilization. On average, customers coming to Run.AI are achieving only 25% utilization of these expensive resources. The reason is that data science revolves around running many experiments to build models, a workflow called experimentation, which differs fundamentally from software development.

Data scientists run a varying number of experiments at different times as they build solutions. To understand experimentation better, I will show you real data gathered from one of our customers over a five-week period. It shows that every single user has multiple ways of interacting with the GPUs and that every user needs a varying number of GPUs at different times.

There are two different workload types: build and train. Build is when users are building their models, meaning when they code, tweak, and debug the AI model to solve a problem. Training models, on the other hand, typically happens in long sessions and is highly compute-intensive, requiring high GPU utilization.
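
To make the distinction concrete, here is a minimal sketch in Python of how the two workload types might be described to a scheduler. The field names and values are illustrative assumptions, not Run.AI's actual API:

```python
from dataclasses import dataclass

@dataclass
class JobSpec:
    """Hypothetical job description; fields are assumptions for illustration."""
    name: str
    workload_type: str   # "build" (interactive coding/tweaking) or "train" (batch)
    gpus: int            # number of GPUs requested
    preemptible: bool    # whether the scheduler may reclaim these GPUs

# A build session: an interactive environment the data scientist keeps open while coding.
build_job = JobSpec(name="notebook-dev", workload_type="build", gpus=1, preemptible=False)

# A training run: long, compute-intensive, and able to resume from a checkpoint
# if its GPUs are reclaimed, so it can safely be marked preemptible.
train_job = JobSpec(name="fraud-model-training", workload_type="train", gpus=4, preemptible=True)

print(build_job)
print(train_job)
```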

In a project life cycle, there are long periods during which many concurrent training workloads are running, for example when optimizing hyperparameters: you run multiple combinations of your model in parallel to see which one performs best. There are also idle times in which only a small number of experiments are using GPUs.
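
As a rough illustration of why demand spikes during these phases, here is a minimal Python sketch of a hyperparameter grid search. The training function is a stand-in; in a real cluster each combination would be submitted as its own GPU-backed training job:

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import product

def train_model(params):
    """Stand-in for a real training run; each call would normally occupy its own GPU."""
    lr, batch_size = params
    # ... build the model, train it, evaluate on a validation set ...
    val_loss = lr * batch_size  # dummy score so the sketch runs end to end
    return {"lr": lr, "batch_size": batch_size, "val_loss": val_loss}

if __name__ == "__main__":
    # Every (learning rate, batch size) combination becomes one training workload.
    grid = list(product([1e-2, 1e-3, 1e-4], [32, 64, 128]))

    # Running all nine combinations concurrently is the burst of parallel
    # training jobs that temporarily needs many GPUs at once.
    with ProcessPoolExecutor(max_workers=len(grid)) as pool:
        results = list(pool.map(train_model, grid))

    best = min(results, key=lambda r: r["val_loss"])
    print("Best configuration:", best)
```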

The result is that resources are underutilized in organizations, with an average of 25% GPU utilization. Static GPU allocations are a hassle for both data science and IT: they lead to low utilization and slow down the experimentation part of R&D, which harms data scientists' productivity.

The question is, how do we reach nirvana? At Run.AI, we envision a world in which dynamic, automated processes replace static resource allocation for artificial intelligence. Our software manages GPU clusters by providing centralized, high-performance cluster orchestration. With our software, every user can get access to any number of GPUs when they need them, without being limited by static resource allocations.

Our scheduler allows users to consume more GPUs than their quota when spare capacity is available. If there are no free GPUs in the cluster and a user who is under her quota asks for a GPU, the scheduler reclaims GPUs from one of the users who is over her quota, taking organizational policies and priorities into account.
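
A toy Python sketch may help make the reclamation logic easier to follow. This is not Run.AI's actual scheduler; the quotas, user names, and victim-selection policy below are simplified assumptions:

```python
# Toy quota-based GPU scheduling with reclamation (illustrative assumptions only).
TOTAL_GPUS = 8
quotas = {"alice": 4, "bob": 4}       # guaranteed share per user
allocated = {"alice": 0, "bob": 0}    # GPUs currently held per user

def free_gpus():
    return TOTAL_GPUS - sum(allocated.values())

def request_gpu(user):
    # 1. Spare capacity: grant the request even if it pushes the user over quota.
    if free_gpus() > 0:
        allocated[user] += 1
        return f"{user}: granted (holding {allocated[user]}, quota {quotas[user]})"

    # 2. Cluster full: an under-quota user may reclaim from an over-quota user.
    if allocated[user] < quotas[user]:
        over_quota = [u for u in allocated if allocated[u] > quotas[u]]
        if over_quota:
            victim = over_quota[0]    # a real scheduler would weigh policy and priority here
            allocated[victim] -= 1    # preempt one of the victim's jobs
            allocated[user] += 1
            return f"{user}: granted a GPU reclaimed from {victim}"

    return f"{user}: must wait (no free GPUs and nothing to reclaim)"

# bob borrows all idle GPUs, going over his quota of 4...
for _ in range(8):
    print(request_gpu("bob"))
# ...then alice, who is under quota, triggers a reclaim.
print(request_gpu("alice"))
```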

With Run.AI, data scientists can be much more productive with automated access to GPUs at any moment, and organizations can ensure that the utilization of expensive GPU resources is maximized. They also gain full visibility into and control over those resources.

In summary, static resource allocation significantly harms data science productivity. However, this challenge can be solved with orchestration software, leading to major improvements in data science speed and productivity. I would like to thank you for attending and invite you to learn more about Run.AI at our booth or online.