Linux cgroups-based solutions (e.g., Docker, CoreOS) are increasingly being used to host multiple applications on the same host. We have been using cgroups at LinkedIn to build our own containerization product called LPS (LinkedIn Platform as a Service) and to investigate the impact of resource-limiting policies on application performance. This post presents our findings on how CPU scheduling affects the performance of Java applications in cgroups. We found that Java applications can have more and longer application pauses when using CFS (Completely Fair Scheduler) in conjunction with CFS Bandwidth Control quotas. During these pauses, the application is not responding to user requests, making this a severe performance problem that we need to understand and address.
These increased pauses are caused by the interaction between the JVM's GC (Garbage Collection) mechanisms and CFS scheduling. In CFS, a cgroup is assigned a certain CPU quota (i.e., cfs_quota), which can be quickly drained by the JVM GC's multi-threaded activity, causing the application to be throttled. For example, the following may occur:
If an application aggressively uses its CPU quota in a scheduling period, then the application is throttled (no more CPU is given) and stops responding for the remaining duration of the scheduling period.
The multi-threaded JVM GC makes the problem much worse, because the cfs_quota is counted across all threads of the application; as a result, the CPU quota may be used up even more quickly. JVM GC has many concurrent phases that are non-STW (stop the world), but running them also consumes the cfs_quota faster, which in practice can make the entire application STW. A rough back-of-the-envelope calculation follows this list.
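To make this concrete, here is a rough, illustrative calculation using the numbers from the setup described later in this post (a 100ms CFS period, a three-core quota, two application threads, and 18 parallel GC threads): the cgroup's budget is 300ms of CPU time per 100ms period, so if roughly 20 runnable threads all execute at once, that budget can be exhausted after only about 300/20 = 15ms of wall-clock time, leaving the application throttled for the remaining ~85ms of the period.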
In this post, we’ll share our findings after investigating this issue and our recommendations about CFS/JVM tuning to mitigate the negative impact. Specifically:
Sufficient CPU quota should be assigned to the cgroup that hosts the Java application; and,
JVM GC threads should be appropriately tuned down to mitigate the pauses.
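In practice, the second recommendation usually means capping the -XX:ParallelGCThreads and -XX:ConcGCThreads JVM flags. For reference, the sketch below reproduces the commonly documented HotSpot default heuristics; the exact formulas are our assumption, inferred to match the thread counts reported later in this post for a 24-core machine (18 parallel and 5 concurrent GC threads).

```java
public class GcThreadDefaults {
    /**
     * Approximation of HotSpot's default ParallelGCThreads heuristic:
     * all cores up to 8, then 5/8 of the cores beyond 8 (our assumption).
     */
    static int parallelGcThreads(int cpus) {
        return cpus <= 8 ? cpus : 8 + (cpus - 8) * 5 / 8;
    }

    /** Approximation of the default ConcGCThreads heuristic: (ParallelGCThreads + 3) / 4. */
    static int concGcThreads(int cpus) {
        return (parallelGcThreads(cpus) + 3) / 4;
    }

    public static void main(String[] args) {
        int cpus = 24; // the test machine's core count
        System.out.println("parallel GC threads: " + parallelGcThreads(cpus));   // prints 18
        System.out.println("concurrent GC threads: " + concGcThreads(cpus));     // prints 5
        // To tune these down explicitly, pass -XX:ParallelGCThreads=<n> and
        // -XX:ConcGCThreads=<n> on the java command line.
    }
}
```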
Linux cgroups background
Linux cgroups (Control Groups) are used to limit various types of resource usage by applications. For the CPU resource, the CPU subsystem schedules CPU access to cgroups, and CFS is one of the two supported schedulers. CFS is a proportional-share scheduler that allocates CPU access to cgroups based on the cgroups' weights.
For the RHEL7 (Red Hat Enterprise Linux 7) machines we use, there are multiple tunables. Two CFS tunables are used for ceiling enforcement, i.e., limiting the CPU resources used by cgroups: cpu.cfs_period_us and cpu.cfs_quota_us, both in microseconds. cpu.cfs_period_us specifies the CFS period, the enforcement interval for which the quota is applied, and cpu.cfs_quota_us specifies the quota the cgroup can use in each CFS period. cpu.cfs_quota_us essentially sets the hard limit (i.e., ceiling) on the CPU resource: a cgroup (along with its processes) is only allowed to occupy the CPU cores for the time duration specified in cpu.cfs_quota_us in each period. Hence, to give a cgroup N cores, cpu.cfs_quota_us is set to N times cpu.cfs_period_us.
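As an illustration, the following minimal sketch sets such a ceiling by writing the two tunables directly. It assumes a cgroup v1 hierarchy mounted at /sys/fs/cgroup/cpu and an existing cgroup named "app" (both are assumptions; adjust for your environment), and it must run with sufficient privileges.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class CfsQuotaSetter {
    // Assumed cgroup v1 mount point and an existing cgroup named "app".
    private static final Path CGROUP = Path.of("/sys/fs/cgroup/cpu/app");

    /** Gives the cgroup a ceiling of `cores` CPUs by setting quota = cores * period. */
    static void setCeiling(int cores, long periodUs) throws IOException {
        // cpu.cfs_period_us: length of each CFS enforcement interval (microseconds).
        Files.writeString(CGROUP.resolve("cpu.cfs_period_us"), Long.toString(periodUs));
        // cpu.cfs_quota_us: CPU time the cgroup may use per period (microseconds).
        Files.writeString(CGROUP.resolve("cpu.cfs_quota_us"), Long.toString(cores * periodUs));
    }

    public static void main(String[] args) throws IOException {
        // Example: 3 cores with the default 100ms period -> quota of 300,000us.
        setCeiling(3, 100_000);
    }
}
```

With the default 100ms period, a three-core ceiling therefore corresponds to a quota of 300,000 microseconds per period, which is the configuration used in our experiments below.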
Workload and setup
For our analysis, we created a synthetic Java application for testing CFS behavior. This Java application simply keeps allocating objects on the Java heap. After the number of allocated objects reaches a certain threshold, a portion of them are released. There are two application threads, each independently performing object allocation and object release. The time taken by each object allocation is recorded as the allocation latency. The source code for this synthetic Java application is on GitHub.
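The actual implementation is in the GitHub repository linked above; as a rough approximation only (the names and constants below are ours, not the published code), one allocation thread might look like the following sketch:

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified sketch of one allocation thread in the synthetic workload. */
public class AllocationWorker implements Runnable {
    // Hypothetical tuning constants; the real values live in the GitHub project.
    private static final int RELEASE_THRESHOLD = 1_000_000; // objects held before releasing
    private static final double RELEASE_FRACTION = 0.5;     // portion released at the threshold

    private final List<byte[]> liveObjects = new ArrayList<>();

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            long start = System.nanoTime();
            liveObjects.add(new byte[128]);              // allocate one object on the heap
            long latencyNs = System.nanoTime() - start;  // allocation latency (logged in the real workload)

            if (liveObjects.size() >= RELEASE_THRESHOLD) {
                // Release a portion of the objects so they become garbage for the GC.
                liveObjects.subList(0, (int) (RELEASE_THRESHOLD * RELEASE_FRACTION)).clear();
            }
        }
    }

    public static void main(String[] args) {
        // Two independent application threads, as in the experiments.
        new Thread(new AllocationWorker(), "alloc-1").start();
        new Thread(new AllocationWorker(), "alloc-2").start();
    }
}
```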
The performance metrics we considered include: (1) application throughput (the object allocation rate); (2) object allocation latencies; (3) the cgroup's statistics, including CPU usage, nr_throttled (the number of CFS periods in which the cgroup was throttled), and throttled_time (the total throttled time).
The machine we used is RHEL7 with the 3.10.0-327 kernel, and it has 24 HT (hyper-threading) cores. The CPU resources were limited using CFS ceiling enforcement. By default, the cgroup hosting the Java application was assigned a quota of three cores, considering the fact that there were two application threads plus GC activity. In later tests we also varied the number of cores assigned in order to gain additional insights. The cfs_period by default was 100ms. Each run of the workload took 20 minutes (1,200 seconds), so with a cfs_period of 100ms, there were 12,000 CFS periods in each run.
Investigation of a large application pause
We'll start with a detailed analysis of a particular application pause in order to shed light on the reasons behind the pause.
Application stop
At the time of 22:57:34, both application threads stop for about three seconds (i.e., 2,917ms and 2,916ms).
JVM GC STW
To understand what caused the three-second application freeze, we first examined the JVM GC log. We found that at 22:57:37.771, a STW (stop-the-world) GC pause occurred, lasting about 0.12 seconds. Compared with the three-second application stop, this 0.12-second GC pause leaves about 2.88 seconds (i.e., 3 − 0.12) unexplained; therefore, there must be some other reason for the pause.
CFS throttling
We suspected that the extra application pause was caused by CFS throttling of the cgroup. We examined the cgroup's statistics by gathering the various reported cgroup statistics for every second that the application was running, and found the metric "throttled_time" to be of particular interest: it reports the accumulated total time (in nanoseconds) for which entities of the cgroup have been throttled. We noticed that, while the application was frozen, "throttled_time" began to accumulate at 22:57:33, and over the frozen period the increase (i.e., the difference) in "throttled_time" was about 5.28 seconds. We believe that this throttling contributed to the application freeze.
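These statistics come from the cgroup's cpu.stat file, which exposes nr_periods, nr_throttled, and throttled_time. As a minimal sketch of how such per-second samples can be gathered (again assuming a cgroup v1 hierarchy mounted at /sys/fs/cgroup/cpu and a cgroup named "app"), one could poll the file like this:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ThrottleMonitor {
    // Assumed cgroup v1 path; cpu.stat contains nr_periods, nr_throttled, throttled_time.
    private static final Path CPU_STAT = Path.of("/sys/fs/cgroup/cpu/app/cpu.stat");

    public static void main(String[] args) throws IOException, InterruptedException {
        long lastThrottledNs = readField("throttled_time");
        while (true) {
            Thread.sleep(1000);
            long nowThrottledNs = readField("throttled_time");
            // Per-second increase in throttled time, converted to milliseconds.
            System.out.printf("throttled in last second: %d ms%n",
                    (nowThrottledNs - lastThrottledNs) / 1_000_000);
            lastThrottledNs = nowThrottledNs;
        }
    }

    /** Reads one numeric field (e.g., "throttled_time" or "nr_throttled") from cpu.stat. */
    private static long readField(String name) throws IOException {
        for (String line : Files.readAllLines(CPU_STAT)) {
            if (line.startsWith(name + " ")) {
                return Long.parseLong(line.substring(name.length() + 1).trim());
            }
        }
        throw new IllegalStateException(name + " not found in " + CPU_STAT);
    }
}
```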
JVM GC threads
We have found that some CFS scheduling periods show substantial "throttled_time," which we believe is caused by the (multiple) JVM GC threads. Briefly, when GC starts, the JVM invokes multiple GC threads to do the work, using internal formulas to decide on the number of GC threads. Specifically, for a machine with 24 cores, the number of parallel GC threads is 18, and the number of concurrent GC threads is 5. Because of these large numbers of threads, the cgroup's CPU quota is quickly used up, causing all application threads (including GC threads) to be paused.
How does the CFS scheduler cause the application pause?
The CFS scheduler can lead to long application pauses.