Traffic shaping: Linux TC and the interface level

ToS

A field of the IP packet header: 4 bits (plus 3 precedence bits and 1 reserved bit). The table below is taken from the LARTC HOWTO:

TOS    Bits  Means                   Linux Priority   Band
0x0    0     Normal Service          0 Best Effort    1
0x2    1     Minimize Monetary Cost  1 Filler         2
0x4    2     Maximize Reliability    0 Best Effort    1
0x6    3     mmc+mr                  0 Best Effort    1
0x8    4     Maximize Throughput     2 Bulk           2
0xa    5     mmc+mt                  2 Bulk           2
0xc    6     mr+mt                   2 Bulk           2
0xe    7     mmc+mr+mt               2 Bulk           2
0x10   8     Minimize Delay          6 Interactive    0
0x12   9     mmc+md                  6 Interactive    0
0x14   10    mr+md                   6 Interactive    0
0x16   11    mmc+mr+md               6 Interactive    0
0x18   12    mt+md                   4 Int. Bulk      1
0x1a   13    mmc+mt+md               4 Int. Bulk      1
0x1c   14    mr+mt+md                4 Int. Bulk      1
0x1e   15    mmc+mr+mt+md            4 Int. Bulk      1

TC

Basic articles:

  • Linux Advanced Routing & Traffic Control HOWTO / Chapter 9. Queueing Disciplines for Bandwidth Management — http://lartc.org/howto/lartc.qdisc.html
  • Russian translation — http://www.opennet.ru/docs/RUS/LARTC
  • a gentler introduction on Habrahabr — http://habrahabr.ru/post/133076/index.html

Userspace tools + kernel support + modules.

A separate subsystem (≠ netfilter).

Operates on packet queues ("queueing disciplines"):

  • prioritization
  • bandwidth limiting

Terms:

  • discipline — a queue (or another dynamic structure) of packets together with the rules for processing it

  • class — a stream of packets; at the end of the stream there is either a queue or a subclass (in which case the classes form a hierarchy)

  • filter — examines packets in order to assign them to classes

pfifo_fast

Pure prioritization. Three priority queues, chosen by the packet's ToS.

Token Bucket Filter (TBF)

Pure traffic shaping: the packet queue fills irregularly (feast or famine), while the token queue is replenished at a steady rate; a packet may leave the system only by taking a token.

  • ⇒ the average packet transmission rate cannot exceed the token arrival rate
  • ⇒ the so-called burst: if the packet queue is empty while the queue of N tokens is full, and suddenly™ a lot of packets arrive, the first N packets go out faster
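The two consequences above can be sketched as a toy Python model (illustrative only; the class and its parameters are invented and are not the kernel's TBF implementation):

```python
class TokenBucket:
    """Minimal token-bucket model: tokens arrive at `rate` per tick,
    the bucket holds at most `capacity` tokens, and a packet may leave
    only by consuming one token."""
    def __init__(self, rate, capacity):
        self.rate = rate          # tokens added per tick
        self.capacity = capacity  # bucket size N (controls the burst)
        self.tokens = capacity    # start full, as TBF does on creation

    def tick(self):
        # tokens arrive at a steady rate until the bucket is full
        self.tokens = min(self.capacity, self.tokens + self.rate)

    def try_send(self):
        # a packet leaves only if a token is available
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# burst: with a full bucket of N=5 tokens, the first 5 of a sudden
# batch of packets go out immediately...
tb = TokenBucket(rate=1, capacity=5)
sent_at_once = sum(tb.try_send() for _ in range(10))
# ...after that, throughput is capped by the token arrival rate
tb.tick()
sent_next_tick = sum(tb.try_send() for _ in range(10))
```

Here `sent_at_once` is the burst of N packets, while `sent_next_tick` shows the steady state: one token per tick, hence one packet per tick on average.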


More on TBF. tc: tc filter

NAME
       pfifo - Packet limited First In, First Out queue

       bfifo - Byte limited First In, First Out queue


SYNOPSIS
       tc qdisc ... add pfifo [ limit packets ]

       tc qdisc ... add bfifo [ limit bytes ]


DESCRIPTION
       The  pfifo  and  bfifo qdiscs are unadorned First In, First Out queues.
       They are the simplest queues possible and therefore have  no  overhead.
       pfifo  constrains the queue size as measured in packets.  bfifo does so
       as measured in bytes.

       Like all non-default qdiscs, they maintain statistics. This might be  a
       reason to prefer pfifo or bfifo over the default.


ALGORITHM
       A  list  of  packets  is  maintained, when a packet is enqueued it gets
       inserted at the tail of a list. When a packet needs to be sent  out  to
       the network, it is taken from the head of the list.

       If  the  list  is  too long, no further packets are allowed on. This is
       called 'tail drop'.
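The tail-drop behaviour can be sketched in a few lines of Python (a toy model; `pfifo_enqueue` and the packet values are invented for illustration):

```python
from collections import deque

def pfifo_enqueue(queue, packet, limit):
    """Tail-drop FIFO: accept at the tail while the queue holds fewer
    than `limit` packets, otherwise drop the newcomer."""
    if len(queue) < limit:
        queue.append(packet)
        return True
    return False  # 'tail drop'

q = deque()
accepted = [pfifo_enqueue(q, n, limit=3) for n in range(5)]
first_out = q.popleft()   # sending dequeues from the head
```

With `limit=3`, the first three packets are accepted, the last two are tail-dropped, and dequeueing returns packet 0 first.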

NAME
       CBQ - Class Based Queueing

SYNOPSIS
       tc  qdisc  ... dev dev ( parent classid | root) [ handle major: ] cbq [
       allot bytes ] avpkt bytes bandwidth rate [ cell bytes ] [ ewma log ]  [
       mpu bytes ]

       tc  class  ... dev dev parent major:[minor] [ classid major:minor ] cbq
       allot bytes [ bandwidth rate ] [ rate rate ]  prio  priority  [  weight
       weight  ] [ minburst packets ] [ maxburst packets ] [ ewma log ] [ cell
       bytes ] avpkt bytes [ mpu bytes ] [ bounded isolated ] [ split handle &
       defmap defmap ] [ estimator interval timeconstant ]


DESCRIPTION
       Class  Based  Queueing  is  a  classful  qdisc  that  implements a rich
       linksharing hierarchy of classes.  It contains shaping elements as well
       as  prioritizing  capabilities.   Shaping  is performed using link idle
       time calculations based on the timing of dequeue events and  underlying
       link bandwidth.

NAME
       HTB - Hierarchy Token Bucket

SYNOPSIS
       tc  qdisc  ... dev dev ( parent classid | root) [ handle major: ] htb [
       default minor-id ]

       tc class ... dev dev parent major:[minor] [ classid major:minor  ]  htb
       rate rate [ ceil rate ] burst bytes [ cburst bytes ] [ prio priority ]


DESCRIPTION
       HTB is meant as a more understandable and intuitive replacement for the
       CBQ qdisc in Linux. Both CBQ and HTB help you to control the use of the
       outbound  bandwidth on a given link. Both allow you to use one physical
       link to simulate several slower links and to send  different  kinds  of
       traffic  on different simulated links. In both cases, you have to spec-
       ify how to divide the physical link into simulated  links  and  how  to
       decide which simulated link to use for a given packet to be sent.

       Unlike  CBQ,  HTB shapes traffic based on the Token Bucket Filter algo-
       rithm which does not depend on interface characteristics  and  so  does
       not need to know the underlying bandwidth of the outgoing interface.


SHAPING ALGORITHM
       Shaping works as documented in tc-tbf (8).
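The rate/ceil link-sharing idea can be illustrated with a deliberately simplified Python sketch (real HTB lends bandwidth through the class tree, honouring prio and quantum; the function and the figures below are invented):

```python
def htb_share(link, classes):
    """Very simplified HTB-style split of `link` bandwidth.
    Each class dict carries a guaranteed 'rate', a cap 'ceil' and its
    current 'demand'.  First every class gets min(rate, demand); the
    leftover link capacity is then lent out, each class taking at most
    up to its ceil."""
    got = {name: min(cl['rate'], cl['demand']) for name, cl in classes.items()}
    spare = link - sum(got.values())
    for name, cl in classes.items():
        extra = min(spare, cl['ceil'] - got[name], cl['demand'] - got[name])
        got[name] += extra
        spare -= extra
    return got

share = htb_share(10, {
    'web':  {'rate': 6, 'ceil': 10, 'demand': 10},  # wants everything
    'mail': {'rate': 2, 'ceil': 4,  'demand': 0},   # currently idle
})
```

The idle class's guaranteed bandwidth is not wasted: `web` borrows it and reaches its ceil of the full 10-unit link.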

NAME
       tbf - Token Bucket Filter

SYNOPSIS
       tc  qdisc ... tbf rate rate burst bytes/cell ( latency ms | limit bytes
       ) [ mpu bytes [ peakrate rate mtu bytes/cell ] ]

       burst is also known as buffer and maxburst. mtu is also known  as  min-
       burst.

DESCRIPTION
       The  Token  Bucket  Filter is a classless queueing discipline available
       for traffic control with the tc(8) command.

       TBF is a pure shaper and never schedules traffic. It  is  non-work-con-
       serving  and  may  throttle  itself, although packets are available, to
       ensure that the configured rate is  not  exceeded.   On  all  platforms
       except  for  Alpha, it is able to shape up to 1mbit/s of normal traffic
       with ideal minimal burstiness, sending out  data exactly at the config-
       ured rates.

       Much  higher  rates  are possible but at the cost of losing the minimal
       burstiness. In that case, data is on average dequeued at the configured
       rate  but may be sent much faster at millisecond timescales. Because of
       further queues living in network adaptors, this is often not a problem.

       Kernels  with  a  higher  'HZ'  can  achieve  higher rates with perfect
       burstiness. On Alpha, HZ is ten times higher,  leading  to  a  10mbit/s
       limit  to perfection. These calculations hold for packets of on average
       1000 bytes.


ALGORITHM
       As the name implies, traffic is filtered based on  the  expenditure  of
       tokens.   Tokens  roughly correspond to bytes, with the additional con-
       straint that each packet consumes some tokens, no matter how  small  it
       is.  This  reflects the fact that even a zero-sized packet occupies the
       link for some time.

       On creation, the TBF is stocked with tokens  which  correspond  to  the
       amount  of  traffic  that  can  be  burst in one go. Tokens arrive at a
       steady rate, until the bucket is full.

       If no tokens are available, packets are  queued,  up  to  a  configured
       limit.  The  TBF  now calculates the token deficit, and throttles until
       the first packet in the queue can be sent.

       If it is not acceptable to  burst  out  packets  at  maximum  speed,  a
       peakrate  can be configured to limit the speed at which the bucket emp-
       ties. This peakrate is implemented as a second TBF with  a  very  small
       bucket, so that it doesn't burst.

       To  achieve  perfection,  the  second  bucket may contain only a single
       packet, which leads to the earlier mentioned 1mbit/s limit.

       This limit is caused by the fact that the kernel can only throttle  for
       at minimum 1 'jiffy', which depends on HZ as 1/HZ. For perfect shaping,
       only a single packet can get sent per jiffy - for  HZ=100,  this  means
       100 packets of on average 1000 bytes each, which roughly corresponds to
       1mbit/s.

NAME
       PRIO - Priority qdisc

SYNOPSIS
       tc  qdisc ... dev dev ( parent classid | root) [ handle major: ] prio [
       bands bands ] [ priomap band,band,band...  ] [ estimator interval time-
       constant ]


DESCRIPTION
       The  PRIO  qdisc is a simple classful queueing discipline that contains
       an arbitrary number of classes of differing priority. The  classes  are
       dequeued in numerical descending order of priority. PRIO is a scheduler
       and never delays packets - it is a work-conserving  qdisc,  though  the
       qdiscs contained in the classes may not be.

       Very useful for lowering latency when there is no need for slowing down
       traffic.


ALGORITHM
       On creation with 'tc qdisc add', a fixed number of bands is created.
       Each band is a class, although it is not possible to add classes with
       'tc qdisc add'; the number of bands to be created must instead be
       specified on the command line attaching PRIO to its root.

       When dequeueing, band 0 is tried first and only if it did not deliver a
       packet does PRIO try band 1, and so onwards. Maximum reliability  pack-
       ets should therefore go to band 0, minimum delay to band 1 and the rest
       to band 2.

       As the PRIO qdisc itself will have minor number 0, band 0  is  actually
       major:1, band 1 is major:2, etc. For major, substitute the major number
       assigned to the qdisc on 'tc qdisc add' with the handle parameter.
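The band-by-band dequeue order can be sketched as a toy Python model (names invented for illustration):

```python
def prio_dequeue(bands):
    """PRIO dequeue: band 0 is tried first; a higher-numbered band is
    only served when every lower-numbered band is empty."""
    for band in bands:
        if band:
            return band.pop(0)
    return None   # nothing queued at all

# band 0 empty, one packet in band 1, one in band 2
bands = [[], ['interactive'], ['bulk']]
order = [prio_dequeue(bands) for _ in range(3)]
```

The band-1 packet always leaves before the band-2 packet, which is exactly why bulk traffic in a low band can be starved by a busy higher-priority band.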

NAME
       sfq - Stochastic Fairness Queueing

SYNOPSIS
       tc qdisc ... perturb seconds quantum bytes


DESCRIPTION
       Stochastic  Fairness Queueing is a classless queueing discipline avail-
       able for traffic control with the tc(8) command.

       SFQ does not shape traffic but only schedules the transmission of pack-
       ets,  based  on  'flows'.   The goal is to ensure fairness so that each
       flow is able to send data in turn, thus preventing any single flow from
       drowning out the rest.

       This  may  in  fact  have some effect in mitigating a Denial of Service
       attempt.

       SFQ is work-conserving and therefore always delivers a packet if it has
       one available.

ALGORITHM
       On enqueueing, each packet is assigned to a hash bucket, based on

       (i)    Source address

       (ii)   Destination address

       (iii)  Source port

       If these are available. SFQ knows about ipv4 and ipv6 and also UDP, TCP
       and ESP.  Packets with other protocols are hashed based on  the  32bits
       representation  of  their  destination and the socket they belong to. A
       flow corresponds mostly to a TCP/IP connection.

       Each of these buckets should represent a unique flow. Because  multiple
       flows  may get hashed to the same bucket, the hashing algorithm is per-
       turbed at configurable intervals so that the unfairness lasts only  for
       a  short  while. Perturbation may however cause some inadvertent packet
       reordering to occur.

       When dequeuing, each hashbucket with data is queried in a  round  robin
       fashion.

       The compile time maximum length of the SFQ is 128 packets, which can be
       spread over at most 128 buckets of 1024 available. In case of overflow,
       tail-drop  is  performed  on the fullest bucket, thus maintaining fair-
       ness.
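The hash-then-round-robin idea can be sketched in Python (a toy additive hash stands in for SFQ's real, periodically perturbed hash; all names are invented):

```python
def sfq_dequeue_order(packets, n_buckets=4, perturb=0):
    """Sketch of SFQ: hash each packet's flow into a bucket, then
    serve the non-empty buckets round-robin."""
    buckets = [[] for _ in range(n_buckets)]
    for flow, seq in packets:
        b = (sum(map(ord, flow)) + perturb) % n_buckets   # toy hash
        buckets[b].append((flow, seq))
    out = []
    while any(buckets):
        for b in buckets:               # round robin over hash buckets
            if b:
                out.append(b.pop(0))
    return out

# a greedy flow A cannot drown out flow B: their packets interleave
order = sfq_dequeue_order([('A', 0), ('A', 1), ('A', 2), ('B', 0)])
```

Even though flow A queued three packets before B's single packet arrived, B's packet is served second, not last.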

NAME
       CoDel - Controlled-Delay Active Queue Management algorithm

SYNOPSIS
       tc qdisc ... codel [ limit PACKETS ] [ target TIME ] [ interval TIME ]
       [ ecn | noecn ]

ALGORITHM
       Instead of using queue size or queue average, it uses the local
       minimum queue as a measure of the standing/persistent queue.
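The "local minimum as a measure of the standing queue" idea can be sketched as a toy model over per-packet sojourn times (names and figures are invented):

```python
def codel_should_drop(sojourn_ms, target, interval):
    """CoDel's core test, sketched: only when even the *minimum*
    queueing delay over the last `interval` samples exceeds `target`
    is there a standing queue worth shrinking."""
    return min(sojourn_ms[-interval:]) > target

burst    = [1, 12, 14, 2, 1]     # a momentary burst that drains again
standing = [11, 12, 14, 13, 12]  # delay never falls back below target
drop_burst = codel_should_drop(burst, target=5, interval=5)
drop_standing = codel_should_drop(standing, target=5, interval=5)
```

A short burst leaves the minimum delay low, so it is tolerated; a persistent queue keeps even the minimum above target and triggers dropping.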

NAME
       choke - choose and keep scheduler

SYNOPSIS
       tc  qdisc  ...  choke  limit packets min packets max packets avpkt bytes
       burst packets [ ecn ] [ bandwidth rate ] probability chance

DESCRIPTION
       CHOKe (CHOose and Keep for responsive flows, CHOose and Kill  for  unre‐
       sponsive  flows)  is  a  classless  qdisc  designed to both identify and
       penalize flows that monopolize the queue.  CHOKe is a variation of  RED,
       and the configuration is similar to RED.

ALGORITHM
       Once  the  queue hits a certain average length, a random packet is drawn
       from the queue.  If both the to-be-queued and the drawn packet belong to
       the same flow, both packets are dropped.  Otherwise, if the queue length
       is still below the maximum length, the new  packet  has  a  configurable
       chance  of  being  marked (which may mean dropped).  If the queue length
       exceeds max, the new packet will always be marked (or dropped).  If  the
       queue length exceeds limit, the new packet is always dropped.
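The matching-flow test can be sketched in Python (a toy model of a single enqueue decision; the RED-style probabilistic marking for the non-matching case is omitted, and all names are invented):

```python
import random

def choke_enqueue(queue, flow, avg_threshold, rng):
    """One CHOKe enqueue decision: past the average-length threshold,
    draw a random queued packet; if it belongs to the same flow as the
    arriving packet, drop both."""
    if len(queue) >= avg_threshold:
        i = rng.randrange(len(queue))
        if queue[i] == flow:          # same flow: penalize it
            del queue[i]
            return 'both dropped'
    queue.append(flow)
    return 'queued'

rng = random.Random(42)
q = ['bulk'] * 8                      # one flow monopolizes the queue
result = choke_enqueue(q, 'bulk', avg_threshold=4, rng=rng)
```

Because the monopolizing flow owns every queued packet, the random draw is guaranteed to match, so its new packet and a queued one are both dropped: the fuller a flow keeps the queue, the more often it is penalized.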

NAME
       drr - deficit round robin scheduler

SYNOPSIS
       tc qdisc ... add drr [ quantum bytes ]

DESCRIPTION
       The  Deficit Round Robin Scheduler is a classful queuing discipline as a
       more flexible replacement for Stochastic Fairness Queuing.

       Unlike SFQ, there are no built-in queues -- you need to add classes  and
       then set up filters to classify packets accordingly.  This can be useful
       e.g. for using RED qdiscs with different settings for  particular  traf‐
       fic.  There  is no default class -- if a packet cannot be classified, it
       is dropped.

ALGORITHM
       Each class is assigned a deficit counter, initialized to quantum.

       DRR maintains an (internal) ''active'' list of classes whose qdiscs  are
       non-empty.   This list is used for dequeuing.  A packet is dequeued from
       the class at the head of the list if the packet size is smaller or equal
       to the deficit counter.  If the counter is too small, it is increased by
       quantum and the scheduler moves on to the next class in the active list.
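One dequeue pass can be sketched as a toy Python model (the active-list bookkeeping is simplified to a plain list; names and packet sizes are invented):

```python
def drr_round(classes, quantum):
    """One pass of Deficit Round Robin: each class's deficit counter
    grows by `quantum`, and head packets are sent while they fit."""
    sent = []
    for cls in classes:
        cls['deficit'] += quantum
        while cls['queue'] and cls['queue'][0] <= cls['deficit']:
            size = cls['queue'].pop(0)
            cls['deficit'] -= size
            sent.append(size)
        if not cls['queue']:
            cls['deficit'] = 0   # an emptied class keeps no credit
    return sent

classes = [
    {'queue': [300, 300], 'deficit': 0},   # small packets
    {'queue': [900],      'deficit': 0},   # one large packet
]
first = drr_round(classes, quantum=500)
second = drr_round(classes, quantum=500)
```

In the first pass the large packet does not fit into a 500-byte deficit and waits; its class carries the credit over and sends in the second pass, so byte-fairness holds across rounds despite unequal packet sizes.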

NAME
       ematch - extended matches for use with "basic" or "flow" filters

NAME
       HFSC - Hierarchical Fair Service Curve's control under linux

SYNOPSIS
       tc qdisc add ... hfsc [ default CLASSID ]

       tc class add ... hfsc [ [ rt SC ] [ ls SC ] | [ sc SC ] ] [ ul SC ]

       rt : realtime service curve
       ls : linkshare service curve
       sc : rt+ls service curve
       ul : upperlimit service curve

NAME
       tc-stab - Generic size table manipulations

SYNOPSIS
       tc qdisc add ... stab \
           [ mtu BYTES ] [ tsize SLOTS ] \
           [ mpu BYTES ] [ overhead BYTES ] [ linklayer TYPE ] ...
DESCRIPTION
       Size  tables  allow manipulation of packet size, as seen by whole sched‐
       uler framework (of course, the actual packet  size  remains  the  same).
       Adjusted packet size is calculated only once - when a qdisc enqueues the
       packet. Initial root enqueue initializes it to the real packet's size.

tc-netem:

root@host-15 ~ #  ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bridge state UP qlen 1000
    link/ether 08:00:27:3b:c6:bd brd ff:ff:ff:ff:ff:ff
    inet6 fe80::a00:27ff:fe3b:c6bd/64 scope link 
       valid_lft forever preferred_lft forever
3: enp0s9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bridge state UP qlen 1000
    link/ether 08:00:27:a2:97:28 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::a00:27ff:fea2:9728/64 scope link 
       valid_lft forever preferred_lft forever
4: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 1000
    link/ether 08:00:27:e9:45:4b brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.15/24 brd 10.0.2.255 scope global enp0s3
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fee9:454b/64 scope link 
       valid_lft forever preferred_lft forever
5: bridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP 
    link/ether 08:00:27:3b:c6:bd brd ff:ff:ff:ff:ff:ff
    inet6 fe80::a00:27ff:fe3b:c6bd/64 scope link 
       valid_lft forever preferred_lft forever

root@host-15 ~ # cat /etc/net/ifaces/bridge/options 
TYPE=bri
HOST='enp0s8 enp0s9'

root@host-15 ~ # grep -r "." /etc/net/ifaces/enp0s*
/etc/net/ifaces/enp0s3/options:BOOTPROTO=dhcp
/etc/net/ifaces/enp0s3/options:TYPE=eth
/etc/net/ifaces/enp0s3/options:CONFIG_WIRELESS=no
/etc/net/ifaces/enp0s3/options:CONFIG_IPV4=yes
/etc/net/ifaces/enp0s8/options:TYPE=eth
/etc/net/ifaces/enp0s8/qos/1/qdisc#delay:netem delay 0.5ms loss 0.05% 25% corrupt 0.05%
/etc/net/ifaces/enp0s8/qos/1/qdisc#loss:netem loss 0.33% 25% corrupt 0.33%
/etc/net/ifaces/enp0s8/qos/1/qdisc#rate:netem loss 0.05% 25% corrupt 0.05% rate 10mbit
/etc/net/ifaces/enp0s8/qos/1/qdisc#LOSS:netem loss 5% 10% corrupt 5%
/etc/net/ifaces/enp0s8/qos/1/qdisc:pfifo_fast
/etc/net/ifaces/enp0s9/options:TYPE=eth

root@host-15 ~ # service network restart      # with the LOSS qdisc variant selected

root@host-15 ~ # tc -s qdisc show                
qdisc netem 1: dev enp0s8 root refcnt 2 limit 1000 loss 5% 10% corrupt 5%
 Sent 316 bytes 4 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0 
qdisc netem 1: dev enp0s9 root refcnt 2 limit 1000 loss 5% 10% corrupt 5%
 Sent 246 bytes 3 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0 
qdisc pfifo_fast 0: dev enp0s3 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 86135 bytes 524 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0 

dummynet

Example

[root@fwe-bsd ~]# cat /etc/rc.conf     
hostname="fwe-bsd.fw.cs.msu.su"
ifconfig_em0="dhcp"
#defaultrouter="10.0.2.1"
dumpdev="NO"

sshd_enable="YES"
cloned_interfaces="bridge0"
ifconfig_bridge0="addm le0 addm le1 up"
ifconfig_le0="up"
ifconfig_le1="up"

firewall_enable="YES"
firewall_type="/etc/rc.ebridge"
dummynet_enable="YES"
[root@fwe-bsd ~]# cat /etc/rc.ebridge 
#export $PATH="/bin:/sbin"
pipe 1 config bw 10Mbit/s
add pipe 1 ip from any to any via le* layer2
add allow ip from any to any via em0 layer2
add allow ip from any to any via lo0
add deny ip from any to 127.0.0.0/8
add deny ip from 127.0.0.0/8 to any
add deny ip from any to ::1
add deny ip from ::1 to any
add allow ip from any to any


[root@fwe-bsd ~]# ipfw pipe show
00001:  10.000 Mbit/s    0 ms burst 0 
q131073  50 sl. 0 flows (1 buckets) sched 65537 weight 0 lmax 0 pri 0 droptail sched 65537 type FIFO flags 0x0 0 buckets 0 active

LecturesCMC/UnixFirewalls2014/09_TrafficShapingLinux (last edited 2014-04-25 10:21:13 by FrBrGeorge)