Heading off Problems with Application Delivery in an Era of AI
Efficient traffic steering practices will address application delivery performance problems and meet users' expectations for their AI-enhanced experiences.
December 16, 2024
Today, 93% of organizations deploy and rely on various application delivery services to ensure the scale, performance, and availability of applications and APIs. Yet the services available today were not built for AI applications or the AI factories that fuel them.
Worse yet, there's been little effort over the years to establish best practices for application delivery, despite its critical role in scaling and ensuring the availability of everything from applications to APIs to security infrastructure. Oh, there's been a spate of "don't use round robin!" admonitions (usually from me), but application delivery has never been treated like the formal discipline it needs to be in a digital-by-default world.
When we look at AI applications and the factories full of inferencing servers that fuel them, we discover that efficient traffic steering (load balancing, request distribution, whatever you want to call it) and low latency are essential characteristics of successful AI applications.
So why are so many organizations struggling to deliver them?
Common AI-related Application Delivery Performance Issues
Well, today we’re going to highlight (expose?) some of the most common reasons—and ways to address them. Ready?
Common sources of inefficient traffic steering and poor performance:
Static Routing Policies: Relying on unchanging routing rules fails to account for real-time network conditions, leading to inefficient resource utilization and potential bottlenecks. And we all know that bottlenecks introduce latency, which makes users unhappy. So use more modern routing approaches that incorporate an understanding of congestion and can change routes when necessary. I mean, if gamers know how to use tools to get rid of lag, then professionals can certainly do so as well.
Lack of Dynamic Decision-Making: Without adaptive mechanisms to respond to current server health and network congestion, traffic may be directed to overburdened or failing resources, causing delays and reduced availability. Again, angry users! This one goes with the first one because no single switch or router has visibility into the bigger picture. There's got to be something—SDN, whatever—that's overseeing traffic and can identify when there's a problem—and do something about it. (A sketch of this kind of adaptive steering, covering this item and the previous one, follows this list.)
Insufficient Load-Balancing Algorithms: Poorly designed load-balancing strategies can result in uneven distribution of traffic, with some servers overwhelmed while others remain underutilized, impacting both performance and scalability. This one is typically the hardest to get right because it requires matching the algorithm to the application to which traffic is distributed. At a minimum, a little testing here wouldn't hurt. (There's a least-connections sketch after this list, too.)
Inadequate Health Checks: Failing to implement robust health monitoring means traffic might be sent to unresponsive or degraded servers, leading to increased latency and potential downtime. Oh, observability save us! Network responsiveness is not an indicator of application health. Repeat after me: network responsiveness…You get the picture. As with load-balancing algorithm choices, a little testing and ensuring you're measuring the metrics that matter will go a long way toward keeping users happy. (A health-check sketch follows the list as well.)
Absence of Programmable Infrastructure: Without programmable application delivery controllers (ADCs), it's challenging to customize traffic steering to align with specific application requirements, hindering responsiveness to dynamic conditions. This one is really important, more so than you might think, because there are so many legacy, traditional, and bespoke applications out there that just choosing an algorithm may not be enough to address performance and delivery problems. Scaling patterns exist: X-axis scaling uses techniques like clustering and cloning, Y-axis scaling routes based on identifiable variables, and Z-axis scaling uses sharding. These patterns aren't about algorithms; they pair with application architectures to ensure performance and scale. But Y- and Z-axis scaling require programmable infrastructure, so yeah, it's really important to have. (A sketch of Y- and Z-axis routing rounds out the examples below.)
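To make the first two items concrete, here's a minimal sketch of congestion-aware steering. It's illustrative only; the names (Backend, AdaptiveSteering) are my assumptions, not any vendor's API. The picker weights healthy backends by recently observed latency, so the traffic distribution shifts as conditions change instead of following a rule written six months ago.

```python
import random

class Backend:
    def __init__(self, name):
        self.name = name
        self.healthy = True          # flipped by a health checker
        self.avg_latency_ms = 1.0    # refreshed from live measurements

class AdaptiveSteering:
    """Weight backends by observed latency so congested servers get less traffic."""

    def __init__(self, backends):
        self.backends = backends

    def pick(self):
        candidates = [b for b in self.backends if b.healthy]
        if not candidates:
            raise RuntimeError("no healthy backends")
        # Lower latency means higher weight; unlike a static routing table,
        # the distribution adapts as avg_latency_ms is updated.
        weights = [1.0 / b.avg_latency_ms for b in candidates]
        return random.choices(candidates, weights=weights, k=1)[0]
```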
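On matching algorithms to applications: least connections is one alternative to round robin that tends to fit workloads with highly variable request costs, which inference traffic (where responses vary wildly in size) tends to be. Again, a minimal sketch with hypothetical types:

```python
from dataclasses import dataclass

@dataclass
class Backend:
    name: str
    healthy: bool = True
    in_flight: int = 0   # incremented when a request starts, decremented when it ends

def pick_least_connections(backends):
    """Route to the healthy backend with the fewest in-flight requests.
    Round robin assumes requests cost roughly the same; this doesn't."""
    candidates = [b for b in backends if b.healthy]
    if not candidates:
        raise RuntimeError("no healthy backends")
    return min(candidates, key=lambda b: b.in_flight)
```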
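And because network responsiveness is not application health, a health check should exercise the application, not just the network path. A minimal sketch, assuming a hypothetical /healthz endpoint exposed by the application itself:

```python
import urllib.request

def is_healthy(base_url, timeout=2.0):
    """Application-level health check: require an HTTP 200 from an endpoint
    the application serves, not just a successful TCP handshake."""
    try:
        with urllib.request.urlopen(f"{base_url}/healthz", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False
```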
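Finally, here's what Y- and Z-axis scaling look like when you can actually program the data path. The pool names and the /inference prefix are made up for illustration; the point is that these decisions are expressed in code, not picked from a dropdown of algorithms.

```python
import hashlib

def route_y_axis(path, pools):
    """Y-axis: steer on an identifiable variable (here, the URL path)."""
    if path.startswith("/inference"):
        return pools["gpu"]       # latency-sensitive AI calls to the GPU pool
    return pools["general"]       # everything else to the general pool

def route_z_axis(tenant_id, shards):
    """Z-axis: shard by key so a given tenant always lands on the same backend."""
    digest = hashlib.sha256(tenant_id.encode()).hexdigest()
    return shards[int(digest, 16) % len(shards)]
```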
A Final Word on Meeting the Application Delivery Performance Needs of AI Users
There are a lot of reasons organizations struggle to implement efficient traffic steering policies, not the least of which is the lack of a shared understanding of the most common causes of delivery and performance problems—and a set of best practices to solve them.
The problem with that, of course, is that AI isn’t going to wait for those best practices to be established. Organizations are going to struggle even more with performance and routing when building out their network and application delivery architecture to support AI.
AI applications often involve large-scale data processing and require low-latency access to compute resources; their workloads can be unpredictable, with demand spikes that require adaptive resource allocation; and AI services often depend on multiple microservices and APIs.
More efficient traffic steering practices will address these potential pitfalls and ensure your users are happy with the performance of their AI-enhanced experiences.