- Enable group scheduling in CFQ
CONFIG_CFQ_GROUP_IOSCHED=y
-- Compile and boot into kernel and mount IO controller (blkio).
+- Compile and boot into kernel and mount IO controller (blkio); see
+  cgroups.txt, "Why are cgroups needed?".
- mount -t cgroup -o blkio none /cgroup
+ mount -t tmpfs cgroup_root /sys/fs/cgroup
+ mkdir /sys/fs/cgroup/blkio
+ mount -t cgroup -o blkio none /sys/fs/cgroup/blkio
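If the mount succeeds, the controller shows up in /proc/mounts and the blkio
control files appear under the mount point. A quick sanity check (just one
way to look, assuming the paths above):

  grep blkio /proc/mounts
  ls /sys/fs/cgroup/blkio/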
- Create two cgroups
- mkdir -p /cgroup/test1/ /cgroup/test2
+ mkdir -p /sys/fs/cgroup/blkio/test1/ /sys/fs/cgroup/blkio/test2
- Set weights of groups test1 and test2
- echo 1000 > /cgroup/test1/blkio.weight
- echo 500 > /cgroup/test2/blkio.weight
+ echo 1000 > /sys/fs/cgroup/blkio/test1/blkio.weight
+ echo 500 > /sys/fs/cgroup/blkio/test2/blkio.weight
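Reading the files back verifies the configuration; with weights of 1000 and
500, test1 should get roughly twice the disk time of test2 when both groups
are contending for the disk:

  cat /sys/fs/cgroup/blkio/test1/blkio.weight
  cat /sys/fs/cgroup/blkio/test2/blkio.weight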
- Create two files of the same size (say 512MB each) on the same disk (file1,
  file2) and launch two dd threads in different cgroups to read those files.
echo 3 > /proc/sys/vm/drop_caches
dd if=/mnt/sdb/zerofile1 of=/dev/null &
- echo $! > /cgroup/test1/tasks
- cat /cgroup/test1/tasks
+ echo $! > /sys/fs/cgroup/blkio/test1/tasks
+ cat /sys/fs/cgroup/blkio/test1/tasks
dd if=/mnt/sdb/zerofile2 of=/dev/null &
- echo $! > /cgroup/test2/tasks
- cat /cgroup/test2/tasks
+ echo $! > /sys/fs/cgroup/blkio/test2/tasks
+ cat /sys/fs/cgroup/blkio/test2/tasks
- At the macro level, the first dd should finish first. To get more precise
  data, keep looking (with the help of a script, such as the sketch below) at
  the blkio.disk_time and blkio.sectors files of both test1 and test2 groups.
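A minimal monitoring sketch, assuming the cgroup paths used above
(blkio.disk_time reports disk time allocated to the group per device in
milliseconds, blkio.sectors the number of sectors dispatched):

  while true; do
          cat /sys/fs/cgroup/blkio/test1/blkio.disk_time
          cat /sys/fs/cgroup/blkio/test2/blkio.disk_time
          sleep 1
  done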
- Enable throttling in block layer
CONFIG_BLK_DEV_THROTTLING=y
-- Mount blkio controller
- mount -t cgroup -o blkio none /cgroup/blkio
+- Mount blkio controller (see cgroups.txt, "Why are cgroups needed?")
+ mount -t cgroup -o blkio none /sys/fs/cgroup/blkio
- Specify a bandwidth rate on a particular device for the root group. The
  format for the policy is "<major>:<minor> <bytes_per_second>".
- echo "8:16 1048576" > /cgroup/blkio/blkio.read_bps_device
+ echo "8:16 1048576" > /sys/fs/cgroup/blkio/blkio.read_bps_device
The above will put a limit of 1MB/second on reads happening for the root
group on the device having major/minor number 8:16.
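One way to observe the limit from the root group (a sketch; assumes 8:16 is
/dev/sdb and uses direct IO so the page cache does not mask the throttling):

  dd if=/dev/sdb of=/dev/null bs=1M count=10 iflag=direct

dd's transfer-rate summary should come out close to 1.0 MB/s.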
CFQ and throttling will practically treat all groups at the same level.
pivot
- / | \ \
+ / / \ \
root test1 test2 test3
Down the line we can implement hierarchical accounting/control support
- Specifies per cgroup weight. This is the default weight of the group
  on all devices until and unless overridden by a per-device rule.
  (See blkio.weight_device).
- Currently allowed range of weights is from 100 to 1000.
+ Currently allowed range of weights is from 10 to 1000.
- blkio.weight_device
- One can specify per cgroup per device rules using this interface.
  The following is the format.
- #echo dev_maj:dev_minor weight > /path/to/cgroup/blkio.weight_device
+ # echo dev_maj:dev_minor weight > blkio.weight_device
Configure weight=300 on /dev/sdb (8:16) in this cgroup
# echo 8:16 300 > blkio.weight_device
# cat blkio.weight_device
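A per-device rule can be removed again by writing weight 0 for that device,
after which the group falls back to the default blkio.weight:

  # echo 8:16 0 > blkio.weight_device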
CFQ sysfs tunable
=================
-/sys/block/<disk>/queue/iosched/group_isolation
------------------------------------------------
-
-If group_isolation=1, it provides stronger isolation between groups at the
-expense of throughput. By default group_isolation is 0. In general that
-means that if group_isolation=0, expect fairness for sequential workload
-only. Set group_isolation=1 to see fairness for random IO workload also.
-
-Generally CFQ will put random seeky workload in sync-noidle category. CFQ
-will disable idling on these queues and it does a collective idling on group
-of such queues. Generally these are slow moving queues and if there is a
-sync-noidle service tree in each group, that group gets exclusive access to
-disk for certain period. That means it will bring the throughput down if
-group does not have enough IO to drive deeper queue depths and utilize disk
-capacity to the fullest in the slice allocated to it. But the flip side is
-that even a random reader should get better latencies and overall throughput
-if there are lots of sequential readers/sync-idle workload running in the
-system.
-
-If group_isolation=0, then CFQ automatically moves all the random seeky queues
-in the root group. That means there will be no service differentiation for
-that kind of workload. This leads to better throughput as we do collective
-idling on root sync-noidle tree.
-
-By default one should run with group_isolation=0. If that is not sufficient
-and one wants stronger isolation between groups, then set group_isolation=1
-but this will come at cost of reduced throughput.
-
/sys/block/<disk>/queue/iosched/slice_idle
------------------------------------------
On faster hardware CFQ can be slow, especially with sequential workloads.
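On such hardware (SSDs, fast arrays) idling often costs more than it saves,
and a common tuning is to disable it entirely (a sketch, assuming the disk
is sdb and is using CFQ):

  echo 0 > /sys/block/sdb/queue/iosched/slice_idle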