- Estimating latency accurately and efficiently is crucial in machine learning model deployment
- Piecewise linear regression is a simple and general method for approximating non-linear functions
- Inference time behaves as an "elbow" function, so a 2-segment piecewise linear regression model is used to describe it
- The break point and slope of each segment are obtained from data analysis and reported in Table I.
Everyday language explanation: Imagine you’re trying to estimate the time it takes for your favorite recipe to cook. You know that it can take anywhere from 30 minutes to an hour, but you want to make sure you give yourself enough time to avoid burning the dinner. A piecewise linear regression is like a simple cooking timer that breaks down the total time into smaller segments based on how far along you are in the recipe. By understanding when each segment takes longer or shorter, you can give yourself a more accurate estimate of the overall cooking time without overcomplicating things.
Computer Science, Networking and Internet Architecture