Kim Nordlund [Sat, 2 Dec 2006 04:21:44 +0000 (20:21 -0800)]
[PKT_SCHED] act_gact: division by zero
Not returning -EINVAL, because someone might want to use the value
zero in some future gact_prob algorithm?
Signed-off-by: Kim Nordlund <kim.nordlund@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
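A minimal sketch of the guard described in the act_gact entry above, assuming the probability value is used as a modulus. The struct and function names are hypothetical and are not the kernel's own code.

```c
/* Hypothetical stand-in for the gact probability parameters. */
struct gact_prob {
    unsigned int pval;    /* take prob_action roughly once per pval packets */
    int default_action;
    int prob_action;
};

/*
 * Pick an action for one packet. A pval of zero is accepted rather than
 * rejected with -EINVAL; it simply means "never take prob_action", so the
 * modulo is never evaluated with a zero divisor.
 */
static int gact_pick_action(const struct gact_prob *p, unsigned int rnd)
{
    if (p->pval == 0 || rnd % p->pval != 0)
        return p->default_action;
    return p->prob_action;
}
```

The short-circuit in the condition is what keeps the division from ever seeing zero.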
Patrick McHardy [Sat, 2 Dec 2006 04:10:13 +0000 (20:10 -0800)]
[NETFILTER]: Kill ip_queue from feature removal schedule.
We really can't remove ip_queue. Many users use this, there is no binary
compatible interface and even the compat replacement for the originally
statically linked library doesn't work. There is also no real necessity
to remove the code, so the feature-removal-schedule entry should be
removed instead.
Signed-off-by: David S. Miller <davem@davemloft.net>
Jamal Hadi Salim [Sat, 2 Dec 2006 04:07:42 +0000 (20:07 -0800)]
[GENETLINK]: Add cmd dump completion.
Remove assumption that generic netlink commands cannot have dump
completion callbacks.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 1 Dec 2006 05:05:23 +0000 (21:05 -0800)]
[ATM]: Kill ipcommon.[ch]
All that remained was skb_migrate() and that was overkill
for what the two call sites were trying to do.
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Fri, 1 Dec 2006 03:54:05 +0000 (19:54 -0800)]
[NET_SCHED]: policer: restore compatibility with old iproute binaries
The tc actions increased the size of struct tc_police, which broke
compatibility with old iproute binaries since both the act_police
and the old NET_CLS_POLICE code check for an exact size match.
Since the new members are not even used, the simple fix is to also
accept the size of the old structure. Dumping is not affected since
old userspace will receive a bigger structure, which is handled fine.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
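The size check described in the policer entry above might look roughly like this. Both struct layouts are invented for illustration; the only assumption carried over from the commit message is that the new structure is larger than the old one and that the extra members are unused.

```c
#include <stddef.h>

/* Hypothetical layouts: the new struct only appends unused members. */
struct police_old { unsigned int index; unsigned int rate; };
struct police_new { unsigned int index; unsigned int rate; unsigned int extra[2]; };

/* Return 1 if a userspace payload of len bytes should be accepted. */
static int police_len_ok(size_t len)
{
    /* Accept the current size and, for old binaries, the old size too. */
    return len == sizeof(struct police_new) ||
           len == sizeof(struct police_old);
}
```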
Adrian Bunk [Fri, 1 Dec 2006 03:50:36 +0000 (19:50 -0800)]
[PKT_SCHED]: Remove unused exports.
This patch removes the following unused EXPORT_SYMBOL's:
- sch_api.c: qdisc_lookup
- sch_generic.c: __netdev_watchdog_up
- sch_generic.c: noop_qdisc_ops
- sch_generic.c: qdisc_alloc
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Fri, 1 Dec 2006 03:28:48 +0000 (19:28 -0800)]
[EBTABLES]: Split ebt_replace into user and kernel variants, annotate.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Fri, 1 Dec 2006 03:28:25 +0000 (19:28 -0800)]
[EBTABLES]: Clean ebt_register_table() up.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Fri, 1 Dec 2006 03:28:08 +0000 (19:28 -0800)]
[EBTABLES]: Move calls of ebt_verify_pointers() upstream.
... and pass just repl->name to translate_table()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Fri, 1 Dec 2006 03:27:48 +0000 (19:27 -0800)]
[EBTABLES]: ebt_check_entry() doesn't need valid_hooks
We can check newinfo->hook_entry[...] instead.
Kill unused argument.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Fri, 1 Dec 2006 03:27:32 +0000 (19:27 -0800)]
[EBTABLES]: Clean ebt_get_udc_positions() up.
Check for valid_hooks is redundant (newinfo->hook_entry[i] will
be NULL if bit i is not set). Kill it, kill unused arguments.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Fri, 1 Dec 2006 03:27:13 +0000 (19:27 -0800)]
[EBTABLES]: Switch ebt_check_entry_size_and_hooks() to use of newinfo->hook_entry[]
kill unused arguments
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Fri, 1 Dec 2006 03:26:53 +0000 (19:26 -0800)]
[EBTABLES]: translate_table(): switch direct uses of repl->hook_info to newinfo
Since newinfo->hook_table[] already has been set up, we can switch to using
it instead of repl->{hook_info,valid_hooks}.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Fri, 1 Dec 2006 03:26:35 +0000 (19:26 -0800)]
[EBTABLES]: Move more stuff into ebt_verify_pointers().
Take initialization of ->hook_entry[...], ->entries_size and ->nentries
over there, pull the check for empty chains into the end of that sucker.
Now it's self-contained, so we can move it up in the very beginning of
translate_table() *and* we can rely on ->hook_entry[] being properly
transliterated after it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Fri, 1 Dec 2006 03:26:14 +0000 (19:26 -0800)]
[EBTABLES]: Pull the loop doing __ebt_verify_pointers() into a separate function.
It's easier to expand the iterator here *and* we'll be able to move all
uses of ebt_replace from translate_table() into this one.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Fri, 1 Dec 2006 03:25:51 +0000 (19:25 -0800)]
[EBTABLES]: Split ebt_check_entry_size_and_hooks
Split ebt_check_entry_size_and_hooks() in two parts - one that does
sanity checks on pointers (basically, checks that we can safely
use iterator from now on) and the rest of it (looking into details
of entry).
The loop applying ebt_check_entry_size_and_hooks() is split in two.
Populating newinfo->hook_entry[] is done in the first part.
Unused arguments killed.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Fri, 1 Dec 2006 03:25:21 +0000 (19:25 -0800)]
[EBTABLES]: Prevent wraparounds in checks for entry components' sizes.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Fri, 1 Dec 2006 03:24:49 +0000 (19:24 -0800)]
[EBTABLES]: Deal with the worst-case behaviour in loop checks.
No need to revisit a chain we'd already finished with during
the check for current hook. It's either instant loop (which
we'd just detected) or a duplicate work.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Fri, 1 Dec 2006 03:24:12 +0000 (19:24 -0800)]
[EBTABLES]: Verify that ebt_entries have zero ->distinguisher.
We need that for iterator to work; existing check had been too weak.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Fri, 1 Dec 2006 03:22:42 +0000 (19:22 -0800)]
[EBTABLES]: Fix wraparounds in ebt_entries verification.
We need to verify that
a) we are not too close to the end of buffer to dereference
b) next entry we'll be checking won't be _before_ our
While we are at it, don't subtract unrelated pointers...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
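A sketch of the two conditions listed in the entry above, written on plain offsets so that no sum can wrap around. The helper names are hypothetical; the ebtables code itself works on pointers into the user-supplied blob.

```c
#include <stddef.h>

/*
 * Return 1 if an object of `need` bytes starting at `offset` lies inside a
 * buffer of `limit` bytes. Using a subtraction on the right-hand side avoids
 * computing offset + need, which could wrap for a huge offset.
 */
static int fits(size_t offset, size_t need, size_t limit)
{
    if (need > limit)
        return 0;
    return offset <= limit - need;
}

/* The next entry must start after the current one and still fit. */
static int next_offset_ok(size_t cur, size_t next, size_t need, size_t limit)
{
    return next > cur && fits(next, need, limit);
}
```

A naive `offset + need <= limit` test would accept an offset close to the maximum of `size_t`, because the sum wraps to a small number.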
Andrew Morton [Fri, 1 Dec 2006 03:16:28 +0000 (19:16 -0800)]
[TCP]: Fix warnings with TCP_MD5SIG disabled.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adrian Bunk [Fri, 1 Dec 2006 01:22:29 +0000 (17:22 -0800)]
[NET]: Possible cleanups.
This patch contains the following possible cleanups:
- make the following needlessly global functions static:
- ipv4/tcp.c: __tcp_alloc_md5sig_pool()
- ipv4/tcp_ipv4.c: tcp_v4_reqsk_md5_lookup()
- ipv4/udplite.c: udplite_rcv()
- ipv4/udplite.c: udplite_err()
- make the following needlessly global structs static:
- ipv4/tcp_ipv4.c: tcp_request_sock_ipv4_ops
- ipv4/tcp_ipv4.c: tcp_sock_ipv4_specific
- ipv6/tcp_ipv6.c: tcp_request_sock_ipv6_ops
- net/ipv{4,6}/udplite.c: remove inline's from static functions
(gcc should know best when to inline them)
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Miika Komu [Fri, 1 Dec 2006 00:41:50 +0000 (16:41 -0800)]
[IPSEC]: Add AF_KEY interface for encapsulation family.
Signed-off-by: Miika Komu <miika@iki.fi>
Signed-off-by: Diego Beltrami <Diego.Beltrami@hiit.fi>
Signed-off-by: Kazunori Miyazawa <miyazawa@linux-ipv6.org>
Miika Komu [Fri, 1 Dec 2006 00:40:51 +0000 (16:40 -0800)]
[IPSEC]: Add netlink interface for the encapsulation family.
Signed-off-by: Miika Komu <miika@iki.fi>
Signed-off-by: Diego Beltrami <Diego.Beltrami@hiit.fi>
Signed-off-by: Kazunori Miyazawa <miyazawa@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Miika Komu [Fri, 1 Dec 2006 00:40:43 +0000 (16:40 -0800)]
[IPSEC]: Add encapsulation family.
Signed-off-by: Miika Komu <miika@iki.fi>
Signed-off-by: Diego Beltrami <Diego.Beltrami@hiit.fi>
Signed-off-by: Kazunori Miyazawa <miyazawa@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 1 Dec 2006 00:35:01 +0000 (16:35 -0800)]
[TCP] MD5SIG: Kill CONFIG_TCP_MD5SIG_DEBUG.
It just obfuscates the code and adds limited value. And as Adrian
Bunk noticed, it lacked Kconfig help text too, so just kill it.
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Thu, 30 Nov 2006 01:37:42 +0000 (17:37 -0800)]
[NET_SCHED]: Fix endless loops (part 5): netem/tbf/hfsc ->requeue failures
When peeking at the next packet in a child qdisc by calling dequeue/requeue,
the upper qdisc qlen counter may get out of sync in case the requeue fails.
The qdisc and the child qdisc both have their counter decremented, but since
no packet is given to the upper qdisc it won't decrement its counter itself.
requeue should not fail, so this is mostly for "correctness".
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Thu, 30 Nov 2006 01:37:05 +0000 (17:37 -0800)]
[NET_SCHED]: Fix endless loops (part 4): HTB
Convert HTB to use qdisc_tree_decrease_qlen() and add a callback
for deactivating a class when its child queue becomes empty.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Thu, 30 Nov 2006 01:36:43 +0000 (17:36 -0800)]
[NET_SCHED]: Fix endless loops (part 3): HFSC
Convert HFSC to use qdisc_tree_decrease_qlen() and add a callback
for deactivating a class when its child queue becomes empty.
All queue purging goes through hfsc_purge_queue(), which is used in
three cases: grafting, class creation (when a leaf class is turned
into an intermediate class by attaching a new class) and class
deletion. In all cases qdisc_tree_decrease_qlen() is needed.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Thu, 30 Nov 2006 01:36:20 +0000 (17:36 -0800)]
[NET_SCHED]: Fix endless loops (part 2): "simple" qdiscs
Convert the "simple" qdiscs to use qdisc_tree_decrease_qlen() where
necessary:
- all graft operations
- destruction of old child qdiscs in prio, red and tbf change operation
- purging of queue in sfq change operation
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Thu, 30 Nov 2006 01:35:48 +0000 (17:35 -0800)]
[NET_SCHED]: Fix endless loops caused by inaccurate qlen counters (part 1)
There are multiple problems related to qlen adjustment that can lead
to an upper qdisc getting out of sync with the real number of packets
queued, leading to endless dequeueing attempts by the upper layer code.
All qdiscs must maintain an accurate q.qlen counter. There are basically
two groups of operations affecting the qlen: operations that propagate
down the tree (enqueue, dequeue, requeue, drop, reset) beginning at the
root qdisc and operations only affecting a subtree or single qdisc
(change, graft, delete class). Since qlen changes during operations from
the second group don't propagate to ancestor qdiscs, their qlen values
become desynchronized.
This patch adds a function to propagate qlen changes up the qdisc tree,
optionally calling a callback function to perform qdisc-internal
maintenance when the child qdisc becomes empty. The follow-up patches
will convert all qdiscs to use this function where necessary.
Noticed by Timo Steinbach <tsteinbach@astaro.com>.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
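The upward propagation described in part 1 above can be pictured with a toy tree. This is a simplified sketch: `struct toy_qdisc`, its parent pointer and the `child_empty` hook are assumptions made for illustration, whereas the kernel looks parents up by classid.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified qdisc: a parent link and a counter. */
struct toy_qdisc {
    struct toy_qdisc *parent;
    unsigned int qlen;
    /* Optional hook, called on a parent whose child has become empty. */
    void (*child_empty)(struct toy_qdisc *self);
};

/*
 * Called after q itself has already dropped n packets (for example in a
 * change, graft or class delete operation): subtract n from every ancestor
 * so their counters match the real number of queued packets again.
 */
static void toy_tree_decrease_qlen(struct toy_qdisc *q, unsigned int n)
{
    struct toy_qdisc *child = q;
    struct toy_qdisc *p;

    if (n == 0)
        return;
    for (p = q->parent; p != NULL; child = p, p = p->parent) {
        if (child->qlen == 0 && p->child_empty != NULL)
            p->child_empty(p);
        p->qlen -= n;
    }
}
```

Without this walk, an ancestor would keep a non-zero qlen for packets that no longer exist and would keep trying to dequeue them.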
Patrick McHardy [Thu, 30 Nov 2006 01:35:18 +0000 (17:35 -0800)]
[NET_SCHED]: Set parent classid in default qdiscs
Set parent classids in default qdiscs to allow walking up the tree
from outside the qdiscs. This is needed by the next patch.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Thu, 30 Nov 2006 01:34:50 +0000 (17:34 -0800)]
[NET_SCHED]: sch_htb: perform qlen adjustment immediately in ->delete
qlen adjustment should happen immediately in ->delete and not in the
class destroy function because the reference count will not hit zero in
->delete (sch_api holds a reference) but in ->put. Since the qdisc
lock is released between deletion of the class and final destruction
this creates an externally visible error in the qlen counter.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
James Morris [Wed, 29 Nov 2006 21:50:27 +0000 (16:50 -0500)]
Rename class_destroy to avoid namespace conflicts.
We're seeing increasing namespace conflicts between the global
class_destroy() function declared in linux/device.h, and the private
function in the SELinux core code. This patch renames the SELinux
function to cls_destroy() to avoid this conflict.
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
Paul Moore [Wed, 29 Nov 2006 18:18:20 +0000 (13:18 -0500)]
NetLabel: add the ranged tag to the CIPSOv4 protocol
Add support for the ranged tag (tag type #5) to the CIPSOv4 protocol.
The ranged tag allows for seven, or eight if zero is the lowest category,
category ranges to be specified in a CIPSO option. Each range is specified by
two unsigned 16 bit fields, each with a maximum value of 65534. The two values
specify the start and end of the category range; if the start of the category
range is zero then it is omitted.
See Documentation/netlabel/draft-ietf-cipso-ipsecurity-01.txt for more details.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
Paul Moore [Wed, 29 Nov 2006 18:18:19 +0000 (13:18 -0500)]
NetLabel: add the enumerated tag to the CIPSOv4 protocol
Add support for the enumerated tag (tag type #2) to the CIPSOv4 protocol.
The enumerated tag allows for 15 categories to be specified in a CIPSO option,
where each category is an unsigned 16 bit field with a maximum value of 65534.
See Documentation/netlabel/draft-ietf-cipso-ipsecurity-01.txt for more details.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
Paul Moore [Wed, 29 Nov 2006 18:18:18 +0000 (13:18 -0500)]
NetLabel: convert to an extensible/sparse category bitmap
The original NetLabel category bitmap was a straight char bitmap which worked
fine for the initial release as it only supported 240 bits due to limitations
in the CIPSO restricted bitmap tag (tag type 0x01). This patch converts that
straight char bitmap into an extensible/sparse bitmap in order to lay the
foundation for other CIPSO tag types and protocols.
This patch also has a nice side effect in that all of the security attributes
passed by NetLabel into the LSM are now in a format which is in the host's
native byte/bit ordering which makes the LSM specific code much simpler; look
at the changes in security/selinux/ss/ebitmap.c as an example.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
Pablo Neira Ayuso [Wed, 29 Nov 2006 01:35:43 +0000 (02:35 +0100)]
[NETFILTER]: remove the reference to ipchains from Kconfig
It is time to move on :-)
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Patrick McHardy [Wed, 29 Nov 2006 01:35:42 +0000 (02:35 +0100)]
[NETFILTER]: Fix PROC_FS=n warnings
Fix some unused function/variable warnings.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Patrick McHardy [Wed, 29 Nov 2006 01:35:41 +0000 (02:35 +0100)]
[NETFILTER]: remove remaining ASSERT_{READ,WRITE}_LOCK
Signed-off-by: Patrick McHardy <kaber@trash.net>
Bart De Schuymer [Wed, 29 Nov 2006 01:35:40 +0000 (02:35 +0100)]
[NETFILTER]: ebtables: add --snat-arp option
The attached patch adds --snat-arp support, which makes it possible to
change the source mac address in both the mac header and the arp header
with one rule.
Signed-off-by: Bart De Schuymer <bdschuym@pandora.be>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Patrick McHardy [Wed, 29 Nov 2006 01:35:38 +0000 (02:35 +0100)]
[NETFILTER]: x_tables: add NFLOG target
Add new NFLOG target to allow use of nfnetlink_log for both IPv4 and IPv6.
Currently we have two (unsupported by userspace) hacks in the LOG and ULOG
targets to optionally call to the nflog API. They lack a few features,
namely the IPv4 and IPv6 LOG targets can not specify a number of arguments
related to nfnetlink_log, while the ULOG target is only available for IPv4.
Remove those hacks and add a clean way to use nfnetlink_log.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Patrick McHardy [Wed, 29 Nov 2006 01:35:36 +0000 (02:35 +0100)]
[NETFILTER]: x_tables: add port of hashlimit match for IPv4 and IPv6
Signed-off-by: Patrick McHardy <kaber@trash.net>
Patrick McHardy [Wed, 29 Nov 2006 01:35:34 +0000 (02:35 +0100)]
[NETFILTER]: nfnetlink_log: remove useless prefix length limitation
There is no reason for limiting netlink attributes in size.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Eric Leblond [Wed, 29 Nov 2006 01:35:33 +0000 (02:35 +0100)]
[NETFILTER]: nfnetlink_queue: allow changing queue length through netlink
Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Pablo Neira Ayuso [Wed, 29 Nov 2006 01:35:32 +0000 (02:35 +0100)]
[NETFILTER]: ctnetlink: rework conntrack fields dumping logic on events
          | NEW | UPDATE | DESTROY |
----------|-----|--------|---------|
tuples    |  Y  |   Y    |    Y    |
status    |  Y  |   Y    |    N    |
timeout   |  Y  |   Y    |    N    |
protoinfo |  S  |   S    |    N    |
helper    |  S  |   S    |    N    |
mark      |  S  |   S    |    N    |
counters  |  F  |   F    |    Y    |
Legend:
Y: yes
N: no
S: only if the field is set
F: only on overflow
This patch also replaces IPCT_HELPINFO with IPCT_HELPER since we want to
track the helper assignment process, not the changes in the private
information held by the helper.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Pablo Neira Ayuso [Wed, 29 Nov 2006 01:35:31 +0000 (02:35 +0100)]
[NETFILTER]: ctnetlink: check for status attribute existence on conntrack creation
Check that status flags are available in the netlink message received
to create a new conntrack.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Patrick McHardy [Wed, 29 Nov 2006 01:35:30 +0000 (02:35 +0100)]
[NETFILTER]: sip conntrack: better NAT handling
The NAT handling of the SIP helper has a few problems:
- Request headers are only mangled in the reply direction, From/To headers
not at all, which can lead to authentication failures with DNAT in case
the authentication domain is the IP address
- Contact headers in responses are only mangled for REGISTER responses
- Headers may be mangled even though they contain addresses not
participating in the connection, like alternative addresses
- Packets are dropped when domain names are used where the helper expects
IP addresses
This patch takes a different approach: instead of fixed rules for which field
to mangle to what content, it adds symmetric mapping of From/To/Via/Contact
headers, which makes it possible to deal properly with echoed addresses in
responses and foreign addresses not belonging to the connection.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Patrick McHardy [Wed, 29 Nov 2006 01:35:28 +0000 (02:35 +0100)]
[NETFILTER]: sip conntrack: make header shortcuts optional
Not every header has a shortcut, so make them optional instead
of searching for the same string twice.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Patrick McHardy [Wed, 29 Nov 2006 01:35:27 +0000 (02:35 +0100)]
[NETFILTER]: sip conntrack: do case insensitive SIP header search
SIP headers are generally case-insensitive, only SDP headers are
case sensitive.
Signed-off-by: Patrick McHardy <kaber@trash.net>
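A small sketch of a case-insensitive header match as described in the entry above, using the POSIX `strncasecmp()`. The function name and its length parameters are hypothetical and do not mirror the helper's real interface.

```c
#include <stddef.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/*
 * Return 1 if the first hdrlen bytes of line equal the header name hdr,
 * ignoring case. SDP headers, which are case sensitive, would need a plain
 * strncmp() instead.
 */
static int sip_hdr_match(const char *line, size_t linelen,
                         const char *hdr, size_t hdrlen)
{
    return linelen >= hdrlen && strncasecmp(line, hdr, hdrlen) == 0;
}
```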
Patrick McHardy [Wed, 29 Nov 2006 01:35:26 +0000 (02:35 +0100)]
[NETFILTER]: sip conntrack: minor cleanup
- Use enum for header field enumeration
- Use numerical value instead of pointer to header info structure to
identify headers, unexport ct_sip_hdrs
- group SIP and SDP entries in header info structure
- remove double forward declaration of ct_sip_get_info
Signed-off-by: Patrick McHardy <kaber@trash.net>
Patrick McHardy [Wed, 29 Nov 2006 01:35:25 +0000 (02:35 +0100)]
[NETFILTER]: ip_conntrack: fix NAT helper unload races
The NAT helper hooks are protected by RCU, but all of the
conntrack helpers test and use the global pointers instead
of copying them first using rcu_dereference().
Also replace synchronize_net() by synchronize_rcu() for clarity
since synchronizing only with packet receive processing is
insufficient to prevent races.
Signed-off-by: Patrick McHardy <kaber@trash.net>
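The "copy the pointer first" rule from the entry above can be imitated in userspace. This sketch uses C11 atomics as a stand-in for rcu_dereference() and does not model RCU grace periods; `nat_hook`, `toy_mangle` and `call_nat_hook` are hypothetical names.

```c
#include <stdatomic.h>
#include <stddef.h>

typedef int (*nat_hook_fn)(int pkt);

/* Hypothetical global hook that an unloading module may reset to NULL. */
static _Atomic(nat_hook_fn) nat_hook;

/* Toy hook used to exercise the sketch. */
static int toy_mangle(int pkt)
{
    return pkt + 1;
}

static int call_nat_hook(int pkt)
{
    /* Read the shared pointer exactly once, then test and call the copy. */
    nat_hook_fn fn = atomic_load_explicit(&nat_hook, memory_order_acquire);

    if (fn == NULL)
        return pkt;    /* no helper loaded: pass the packet through */
    return fn(pkt);
}
```

Testing the global and then calling through the global again are two separate reads, and the pointer can become NULL between them; the local copy removes that window.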
Yasuyuki Kozakai [Wed, 29 Nov 2006 01:35:23 +0000 (02:35 +0100)]
[NETFILTER]: conntrack: add '_get' to {ip, nf}_conntrack_expect_find
We usually use 'xxx_find_get' for functions which increment the
reference count.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Patrick McHardy [Wed, 29 Nov 2006 01:35:22 +0000 (02:35 +0100)]
[NETFILTER]: nf_conntrack: /proc compatibility with old connection tracking
This patch adds /proc/net/ip_conntrack, /proc/net/ip_conntrack_expect and
/proc/net/stat/ip_conntrack files to keep old programs using them working.
The /proc/net/ip_conntrack and /proc/net/ip_conntrack_expect files show only
IPv4 entries, the /proc/net/stat/ip_conntrack shows global statistics.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Patrick McHardy [Wed, 29 Nov 2006 01:35:20 +0000 (02:35 +0100)]
[NETFILTER]: nf_conntrack: sysctl compatibility with old connection tracking
This patch adds an option to keep the connection tracking sysctls visible
under their old names.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Patrick McHardy [Wed, 29 Nov 2006 01:35:18 +0000 (02:35 +0100)]
[NETFILTER]: nf_conntrack: move conntrack protocol sysctls to individual modules
Signed-off-by: Patrick McHardy <kaber@trash.net>
Patrick McHardy [Wed, 29 Nov 2006 01:35:17 +0000 (02:35 +0100)]
[NETFILTER]: nf_conntrack: automatic sysctl registration for conntrack protocols
Add helper functions for sysctl registration with optional instantiating
of common path elements (like net/netfilter) and use it for support for
automatic registration of conntrack protocol sysctls.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Patrick McHardy [Wed, 29 Nov 2006 01:35:15 +0000 (02:35 +0100)]
[NETFILTER]: nf_conntrack: move extern declaration to header files
Using extern in a C file is a bad idea because the compiler can't
catch type errors.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Patrick McHardy [Wed, 29 Nov 2006 01:35:14 +0000 (02:35 +0100)]
[NETFILTER]: nf_conntrack_ftp: fix missing helper mask initialization
Signed-off-by: Patrick McHardy <kaber@trash.net>
Martin Josefsson [Wed, 29 Nov 2006 01:35:12 +0000 (02:35 +0100)]
[NETFILTER]: nf_conntrack: reduce timer updates in __nf_ct_refresh_acct()
Only update the conntrack timer if there's been at least HZ jiffies since
the last update. Reduces the number of del_timer/add_timer cycles from one
per packet to one per connection per second (plus once for each state change
of a connection)
Should handle timer wraparounds and connection timeout changes.
Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
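A sketch of the throttling rule from the entry above, assuming a free-running tick counter. `TOY_HZ`, `struct toy_conn` and `toy_refresh()` are illustrative names, and the real code also has to delete and re-add a kernel timer.

```c
#define TOY_HZ 1000UL   /* assumed tick rate, in ticks per second */

struct toy_conn {
    unsigned long expires;   /* tick at which the timer currently fires */
};

/*
 * Re-arm only if the new expiry is at least one second past the current
 * one. The unsigned subtraction is well defined across counter wraparound,
 * and a shortened timeout yields a huge difference, so it also re-arms.
 * Returns 1 if the timer was re-armed.
 */
static int toy_refresh(struct toy_conn *ct, unsigned long now,
                       unsigned long timeout)
{
    unsigned long newtime = now + timeout;

    if (newtime - ct->expires >= TOY_HZ) {
        ct->expires = newtime;
        return 1;
    }
    return 0;
}
```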
Martin Josefsson [Wed, 29 Nov 2006 01:35:11 +0000 (02:35 +0100)]
[NETFILTER]: nf_conntrack: remove unused struct list_head from protocols
Remove unused struct list_head from struct nf_conntrack_l3proto and
nf_conntrack_l4proto as all protocols are kept in arrays, not linked
lists.
Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Martin Josefsson [Wed, 29 Nov 2006 01:35:10 +0000 (02:35 +0100)]
[NETFILTER]: nf_conntrack: minor __nf_ct_refresh_acct() whitespace cleanup
Minor whitespace cleanup.
Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Martin Josefsson [Wed, 29 Nov 2006 01:35:09 +0000 (02:35 +0100)]
[NETFILTER]: nf_conntrack: remove ASSERT_{READ,WRITE}_LOCK
Remove the usage of ASSERT_READ_LOCK/ASSERT_WRITE_LOCK in nf_conntrack,
it didn't do anything, it was just an empty define and it uglified the code.
Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Martin Josefsson [Wed, 29 Nov 2006 01:35:08 +0000 (02:35 +0100)]
[NETFILTER]: nf_conntrack: more sanity checks in protocol registration/unregistration
Add some more sanity checks when registering/unregistering l3/l4 protocols.
Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Martin Josefsson [Wed, 29 Nov 2006 01:35:06 +0000 (02:35 +0100)]
[NETFILTER]: nf_conntrack: rename struct nf_conntrack_protocol
Rename 'struct nf_conntrack_protocol' to 'struct nf_conntrack_l4proto' in
order to help distinguish it from 'struct nf_conntrack_l3proto'. It gets
rather confusing with 'nf_conntrack_protocol'.
Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Martin Josefsson [Wed, 29 Nov 2006 01:35:04 +0000 (02:35 +0100)]
[NETFILTER]: More __read_mostly annotations
Place rarely written variables in the read-mostly section by using
__read_mostly
Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Martin Josefsson [Wed, 29 Nov 2006 01:35:03 +0000 (02:35 +0100)]
[NETFILTER]: nf_conntrack: split out protocol handling
This patch splits out L3/L4 protocol handling into its own file
nf_conntrack_proto.c
Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Martin Josefsson [Wed, 29 Nov 2006 01:35:01 +0000 (02:35 +0100)]
[NETFILTER]: nf_conntrack: split out the event cache
This patch splits out the event cache into its own file
nf_conntrack_ecache.c
Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Martin Josefsson [Wed, 29 Nov 2006 01:34:59 +0000 (02:34 +0100)]
[NETFILTER]: nf_conntrack: split out helper handling
This patch splits out handling of helpers into its own file
nf_conntrack_helper.c
Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Martin Josefsson [Wed, 29 Nov 2006 01:34:58 +0000 (02:34 +0100)]
[NETFILTER]: nf_conntrack: split out expectation handling
This patch splits out expectation handling into its own file
nf_conntrack_expect.c
Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
David S. Miller [Tue, 28 Nov 2006 22:37:38 +0000 (14:37 -0800)]
[TCP] Vegas: Increase default alpha to 2 and beta to 4.
This helps Vegas cope better with delayed ACKs, see
analysis at:
http://www.cs.caltech.edu/%7Eweixl/technical/ns2linux/known_linux/index.html#vegas
Signed-off-by: David S. Miller <davem@davemloft.net>
Gerrit Renker [Tue, 28 Nov 2006 21:55:06 +0000 (19:55 -0200)]
[DCCP]: Use `unsigned' for packet lengths
This patch implements a suggestion by Ian McDonald and
1) Avoids tests against negative packet lengths by using unsigned int
for packet payload lengths in the CCID send_packet()/packet_sent() routines
2) As a consequence, it removes a now unnecessary test with regard to `len > 0'
in ccid3_hc_tx_packet_sent: that condition is always true, since
* negative packet lengths are avoided
* ccid3_hc_tx_send_packet flags an error whenever the payload length is 0.
As a consequence, ccid3_hc_tx_packet_sent is never called as all errors
returned by ccid_hc_tx_send_packet are caught in dccp_write_xmit
3) Removes the third argument of ccid_hc_tx_send_packet (the `len' parameter),
since it is currently always set to skb->len. The code is updated with regard
to this parameter change.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Gerrit Renker [Tue, 28 Nov 2006 21:51:42 +0000 (19:51 -0200)]
[DCCP] ccid3: Larger initial windows
This implements the larger-initial-windows feature for CCID 3, as described in
section 5 of RFC 4342. When the first feedback packet arrives, the sender can
send up to 2..4 packets per RTT, instead of just one.
The patch further
* reduces the number of timestamping calls by passing the timestamp value
(which is computed in one of the calling functions anyway) as argument
* renames one constant with a very long name into one which is shorter and
resembles the one in RFC 3448 (t_mbi)
* simplifies some of the min_t/max_t cases where both `x', `y' have the same
type
Committer note: renamed TFRC_t_mbi to TFRC_T_MBI, to follow Linux coding style.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
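The "2..4 packets per RTT" figure in the entry above matches the initial-window rule min(4*s, max(2*s, 4380)) bytes, which is my reading of RFC 4342, section 5. The function below is a sketch under that assumption, with a hypothetical name.

```c
/*
 * Bytes the sender may send per RTT once the first feedback packet has
 * arrived; s is the packet size in bytes.
 */
static unsigned int initial_window_bytes(unsigned int s)
{
    unsigned int lo = 2 * s > 4380 ? 2 * s : 4380;   /* max(2*s, 4380) */

    return 4 * s < lo ? 4 * s : lo;                  /* min(4*s, lo)   */
}
```

Small packets hit the 4*s limit (four packets), very large ones hit 2*s (two packets), and sizes in between are capped at 4380 bytes.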
Arnaldo Carvalho de Melo [Tue, 28 Nov 2006 21:42:03 +0000 (19:42 -0200)]
[DCCP]: Make {set,get}sockopt(DCCP_SOCKOPT_PACKET_SIZE) return 0
This reflects the fact that the option now has no effect: applications do not
stop working, they are only warned about it in the system log.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Gerrit Renker [Tue, 28 Nov 2006 21:33:36 +0000 (19:33 -0200)]
[DCCP]: Tidy up unused structures
This removes and cleans up unused variables and structures which have become
unnecessary following the introduction of the EWMA patch to automatically track
the CCID 3 receiver/sender packet sizes `s'.
It deprecates the PACKET_SIZE socket option by returning an error code and
printing a deprecation warning if an application tries to read or write this
socket option.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Gerrit Renker [Tue, 28 Nov 2006 21:22:33 +0000 (19:22 -0200)]
[DCCP] ccid3: Track RX/TX packet size `s' using moving-average
Problem:
Gerrit Renker [Tue, 28 Nov 2006 20:34:34 +0000 (18:34 -0200)]
[DCCP] ccid3: Set NoFeedback Timeout according to RFC 3448
This corrects the setting of the nofeedback timer with regard to RFC
3448 - previously it was not set to max(4*R, 2*s/X) as specified. Using
the maximum of 1 second as upper bound (as it was done before) can have
detrimental effects, especially if R is small.
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
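The nofeedback timer value max(4*R, 2*s/X) from the entry above can be worked out in integer arithmetic. The units chosen here (microseconds, bytes, bytes per second) and the function name are assumptions made for this sketch.

```c
#include <stdint.h>

/*
 * rtt_us: round-trip time R in microseconds
 * s:      packet size in bytes
 * x:      sending rate X in bytes per second (must be non-zero)
 * Returns max(4*R, 2*s/X) in microseconds.
 */
static uint64_t nofeedback_timeout_us(uint64_t rtt_us, uint64_t s, uint64_t x)
{
    uint64_t by_rtt = 4 * rtt_us;
    uint64_t by_rate = (2 * s * 1000000ULL) / x;   /* seconds to microseconds */

    return by_rtt > by_rate ? by_rtt : by_rate;
}
```

With a small R the rate term dominates, which is the case a fixed 1 second bound handled poorly.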
Gerrit Renker [Tue, 28 Nov 2006 20:14:10 +0000 (18:14 -0200)]
[DCCP]: Remove allocation of sysctl numbers
This is in response to a request sent earlier by Eric W. Biederman
and replaces all sysctl numbers for net.dccp.default with CTL_UNNUMBERED.
It has been tested to compile and to work.
Committer note: I've removed the use of CTL_UNNUMBERED; not setting .ctl_name
leaves it at 0, which is what CTL_UNNUMBERED is. The reason is
to avoid unneeded source code clutter.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Arnaldo Carvalho de Melo [Tue, 28 Nov 2006 05:11:33 +0000 (03:11 -0200)]
[INET]: Change protocol field in struct inet_protosw to u16
[acme@newtoy net-2.6.20]$ pahole /tmp/tcp_ipv6.o inet_protosw
/* /pub/scm/linux/kernel/git/acme/net-2.6.20/include/net/protocol.h:69 */
struct inet_protosw {
struct list_head list; /* 0 8 */
short unsigned int type; /* 8 2 */
/* XXX 2 bytes hole, try to pack */
int protocol; /* 12 4 */
struct proto * prot; /* 16 4 */
const struct proto_ops * ops; /* 20 4 */
int capability; /* 24 4 */
char no_check; /* 28 1 */
unsigned char flags; /* 29 1 */
}; /* size: 32, sum members: 28, holes: 1, sum holes: 2, padding: 2 */
Changing protocol to u16 kills that hole; this is safe because protocol values
only go up to 255 (RAW).
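The effect on the layout can be reproduced with a small model. As an
assumption for illustration, pointers and list_head are replaced by uint32_t
members so that the offsets match the 32-bit pahole output above on any host;
the struct names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of the layout before the change (32-bit pointers as uint32_t). */
struct demo_protosw_before {
	uint32_t list[2];
	uint16_t type;
	int32_t  protocol;	/* 4-byte alignment leaves a 2-byte hole after type */
	uint32_t prot;
	uint32_t ops;
	int32_t  capability;
	char     no_check;
	unsigned char flags;
};

/* Same members with protocol narrowed to 16 bits. */
struct demo_protosw_after {
	uint32_t list[2];
	uint16_t type;
	uint16_t protocol;	/* now occupies the former hole */
	uint32_t prot;
	uint32_t ops;
	int32_t  capability;
	char     no_check;
	unsigned char flags;
};
```

Under this model, and assuming the usual 4-byte alignment of uint32_t, the
struct shrinks from 32 to 28 bytes and protocol moves from offset 12 to 10.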
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Arnaldo Carvalho de Melo [Tue, 28 Nov 2006 03:12:38 +0000 (01:12 -0200)]
[TCP]: Remove the __ prefix on the struct tcp_sock members
As this struct is not userland visible at all.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Arnaldo Carvalho de Melo [Tue, 28 Nov 2006 02:48:32 +0000 (00:48 -0200)]
[TCP]: Change tcp_header_len member in tcp_sock to u16
With this we eliminate the last hole in struct tcp_sock.
End result:
[acme@newtoy net-2.6.20]$ codiff -sV /tmp/tcp.o.before net/ipv4/tcp.o
/pub/scm/linux/kernel/git/acme/net-2.6.20/net/ipv4/tcp.c:
struct tcp_sock | -4
tcp_header_len;
from: int /* 1000(0) 4(0) */
to: u16 /* 1000(0) 2(0) */
1 struct changed
[acme@newtoy net-2.6.20]$
Now sizeof(tcp_sock) is just...
[acme@newtoy net-2.6.20]$ pahole --sizes ../OUTPUT/qemu/net-2.6.20/net/ipv4/tcp.o | grep -w tcp_sock
struct tcp_sock: 1500 0
1500 bytes ;-)
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Gerrit Renker [Mon, 27 Nov 2006 22:32:37 +0000 (20:32 -0200)]
[DCCP] ccid3: Consolidate handling of t_RTO
This patch
* removes setting t_RTO in ccid3_hc_tx_init (per [RFC 3448, 4.2], t_RTO is
undefined until feedback has been received);
* makes some trivial changes (updates of comments);
* performs a small optimisation by exploiting that the feedback timeout
uses the value of t_ipi. The way it is done is safe, because the timeouts
appear after the changes to t_ipi, ensuring that up-to-date values are used;
* in ccid3_hc_tx_packet_recv, moves the t_rto statement closer to the calculation
of the next_tmout. This makes the code clearer to read and is also safe, since
t_rto is not updated until the next call of ccid3_hc_tx_packet_recv, and is not
read by the functions called via ccid_wait_for_ccid();
* removes a `max' statement in sk_reset_timer; this is not needed since the
timeout value is always greater than 1E6 microseconds;
* adds `XXX'es to highlight that currently the nofeedback timer is set
in a non-standard way.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Gerrit Renker [Mon, 27 Nov 2006 22:31:33 +0000 (20:31 -0200)]
[DCCP] ccid3: Consistently update t_nom, t_ipi, t_delta
This patch:
* consolidates updating of parameters (t_nom, t_ipi, t_delta) which
need to be updated at the same time, since they are inter-dependent
* removes two inline functions which are no longer needed as a result of
the above consolidation
* resolves a FIXME regarding the re-calculation of t_ipi within the nofeedback
timer, in the state where no feedback has previously been received
* ties updating these parameters to updating the sending rate X, exploiting
that all three parameters in turn depend on X; and using a small optimisation
which can reduce the number of required instructions: only update the three
parameters when X really changes
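The "only update when X really changes" idea can be sketched as follows. The
struct, the field names and the granularity constant are hypothetical; the
formulas t_ipi = s/X and t_delta = min(t_ipi/2, t_gran/2) follow RFC 3448,
section 4.6. Updating t_nom is left out to keep the sketch short.

```c
#include <stdint.h>

/* Hypothetical sender state; names are illustrative, not the kernel's. */
struct demo_tx {
	uint32_t s;		/* packet size in bytes */
	uint32_t x;		/* sending rate X in bytes/second */
	uint32_t t_ipi;		/* inter-packet interval in microseconds */
	uint32_t t_delta;	/* scheduling slack in microseconds */
};

#define DEMO_HALF_T_GRAN_US 5000U	/* assumed t_gran/2 (10 ms granularity) */

/*
 * Set a new rate (new_x must be > 0) and recompute the dependent
 * parameters. Returns 1 if anything was updated, 0 if X was unchanged.
 */
int demo_set_rate(struct demo_tx *tx, uint32_t new_x)
{
	uint32_t half_ipi;

	if (new_x == tx->x)
		return 0;	/* X unchanged: skip the recalculation */

	tx->x = new_x;
	tx->t_ipi = (uint32_t)(((uint64_t)tx->s * 1000000U) / new_x);

	half_ipi = tx->t_ipi / 2;
	tx->t_delta = half_ipi < DEMO_HALF_T_GRAN_US ? half_ipi
						     : DEMO_HALF_T_GRAN_US;
	return 1;
}
```

Keeping the recalculation in one place also guarantees that the
inter-dependent values are never updated separately.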
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Gerrit Renker [Mon, 27 Nov 2006 22:29:27 +0000 (20:29 -0200)]
[DCCP] ccid3: Consolidate timer resets
This patch concerns updating the value of the nofeedback timer when no feedback
has been received so far.
Since in this case the value of R is still undefined according to [RFC 3448,
4.2], we can not perform step (3) of [RFC 3448, 4.3]. A clarification is
provided in [RFC 4342, sec. 5], which states that in these cases the nofeedback
timer (still) expires "after two seconds".
Many thanks to Ian McDonald for pointing this out and providing the
clarification.
The patch
* implements [RFC 4342, sec. 5] with regard to the above case
* consolidates handling timer restart by
- adding an appropriate jump label and
- initialising the timeout value
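A rough sketch of the resulting control flow, with hypothetical names: the
timeout is initialised to the two-second value, and the no-feedback case jumps
straight to the common restart label. The value used once feedback exists is
simply passed in by the caller here.

```c
#include <stdint.h>

/* "after two seconds", as quoted from RFC 4342, sec. 5, above */
#define DEMO_INITIAL_TIMEOUT_US 2000000ULL

/* Returns the value the nofeedback timer should be restarted with. */
uint64_t demo_next_timeout_us(int have_feedback, uint64_t computed_us)
{
	uint64_t timeout = DEMO_INITIAL_TIMEOUT_US;	/* initialise the value */

	if (!have_feedback)
		goto restart;	/* R is undefined: keep the two-second value */

	timeout = computed_us;

restart:
	return timeout;		/* single place where the timer is restarted */
}
```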
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Jamal Hadi Salim [Mon, 27 Nov 2006 20:59:30 +0000 (12:59 -0800)]
[XFRM]: Convert a few __u8 to proper u8
Caught by the EyeBalls(tm) of Thomas Graf
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jamal Hadi Salim [Mon, 27 Nov 2006 20:58:20 +0000 (12:58 -0800)]
[XFRM]: Make flush notifier prettier when subpolicy used
Might as well make flush notifier prettier when subpolicy used
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnaldo Carvalho de Melo [Mon, 27 Nov 2006 19:58:59 +0000 (17:58 -0200)]
[XFRM]: Pack struct xfrm_policy
[acme@newtoy net-2.6.20]$ pahole net/ipv4/tcp.o xfrm_policy
/* /pub/scm/linux/kernel/git/acme/net-2.6.20/include/linux/security.h:67 */
struct xfrm_policy {
struct xfrm_policy * next; /* 0 4 */
struct hlist_node bydst; /* 4 8 */
struct hlist_node byidx; /* 12 8 */
rwlock_t lock; /* 20 36 */
atomic_t refcnt; /* 56 4 */
struct timer_list timer; /* 60 24 */
u8 type; /* 84 1 */
/* XXX 3 bytes hole, try to pack */
u32 priority; /* 88 4 */
u32 index; /* 92 4 */
struct xfrm_selector selector; /* 96 56 */
struct xfrm_lifetime_cfg lft; /* 152 64 */
struct xfrm_lifetime_cur curlft; /* 216 32 */
struct dst_entry * bundles; /* 248 4 */
__u16 family; /* 252 2 */
__u8 action; /* 254 1 */
__u8 flags; /* 255 1 */
__u8 dead; /* 256 1 */
__u8 xfrm_nr; /* 257 1 */
/* XXX 2 bytes hole, try to pack */
struct xfrm_sec_ctx * security; /* 260 4 */
struct xfrm_tmpl xfrm_vec[6]; /* 264 360 */
}; /* size: 624, sum members: 619, holes: 2, sum holes: 5 */
So let's have just one hole instead of two, by moving 'type' to just before 'action',
end result:
[acme@newtoy net-2.6.20]$ codiff -s /tmp/tcp.o.before net/ipv4/tcp.o
/pub/scm/linux/kernel/git/acme/net-2.6.20/net/ipv4/tcp.c:
struct xfrm_policy | -4
1 struct changed
[acme@newtoy net-2.6.20]$
[acme@newtoy net-2.6.20]$ pahole -c 64 net/ipv4/tcp.o xfrm_policy
/* /pub/scm/linux/kernel/git/acme/net-2.6.20/include/linux/security.h:67 */
struct xfrm_policy {
struct xfrm_policy * next; /* 0 4 */
struct hlist_node bydst; /* 4 8 */
struct hlist_node byidx; /* 12 8 */
rwlock_t lock; /* 20 36 */
atomic_t refcnt; /* 56 4 */
struct timer_list timer; /* 60 24 */
u32 priority; /* 84 4 */
u32 index; /* 88 4 */
struct xfrm_selector selector; /* 92 56 */
struct xfrm_lifetime_cfg lft; /* 148 64 */
struct xfrm_lifetime_cur curlft; /* 212 32 */
struct dst_entry * bundles; /* 244 4 */
u16 family; /* 248 2 */
u8 type; /* 250 1 */
u8 action; /* 251 1 */
u8 flags; /* 252 1 */
u8 dead; /* 253 1 */
u8 xfrm_nr; /* 254 1 */
/* XXX 1 byte hole, try to pack */
struct xfrm_sec_ctx * security; /* 256 4 */
struct xfrm_tmpl xfrm_vec[6]; /* 260 360 */
}; /* size: 620, sum members: 619, holes: 1, sum holes: 1 */
Are there any fugly data dependencies here? None that I know of.
In the process I also removed the __ prefixed types, which are meant only for
userspace-visible headers.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Arnaldo Carvalho de Melo [Mon, 27 Nov 2006 19:58:02 +0000 (17:58 -0200)]
[NET]: Pack struct hh_cache
[acme@newtoy net-2.6.20]$ pahole net/ipv4/tcp.o hh_cache
/* /pub/scm/linux/kernel/git/acme/net-2.6.20/include/linux/netdevice.h:190 */
struct hh_cache {
struct hh_cache * hh_next; /* 0 4 */
atomic_t hh_refcnt; /* 4 4 */
__be16 hh_type; /* 8 2 */
/* XXX 2 bytes hole, try to pack */
int hh_len; /* 12 4 */
int (*hh_output)(); /* 16 4 */
rwlock_t hh_lock; /* 20 36 */
long unsigned int hh_data[24]; /* 56 96 */
}; /* size: 152, sum members: 150, holes: 1, sum holes: 2 */
[acme@newtoy net-2.6.20]$ find net -name "*.[ch]" | xargs grep 'hh_len.\+=' | sort -u
net/atm/br2684.c: hh->hh_len = PADLEN + ETH_HLEN;
net/ethernet/eth.c: hh->hh_len = ETH_HLEN;
net/ipv4/ipconfig.c: int hh_len = LL_RESERVED_SPACE(dev);
net/ipv4/ip_output.c: hh_len = LL_RESERVED_SPACE(rt->u.dst.dev);
net/ipv4/ip_output.c: int hh_len = LL_RESERVED_SPACE(dev);
net/ipv4/netfilter.c: hh_len = (*pskb)->dst->dev->hard_header_len;
net/ipv4/raw.c: hh_len = LL_RESERVED_SPACE(rt->u.dst.dev);
net/ipv6/ip6_output.c: hh_len = LL_RESERVED_SPACE(rt->u.dst.dev);
net/ipv6/netfilter/ip6t_REJECT.c: hh_len = (dst->dev->hard_header_len + 15)&~15;
net/ipv6/raw.c: hh_len = LL_RESERVED_SPACE(rt->u.dst.dev);
[acme@newtoy net-2.6.20]$
[acme@newtoy net-2.6.20]$ find include -name "*.h" | xargs grep 'define ETH_HLEN'
include/linux/if_ether.h:#define ETH_HLEN 14 /* Total octets in header. */
(((dev)->hard_header_len&~(HH_DATA_MOD - 1)) + HH_DATA_MOD)
[acme@newtoy net-2.6.20]$ pahole net/ipv4/tcp.o net_device | grep hard_header_len
short unsigned int hard_header_len; /* 106 2 */
[acme@newtoy net-2.6.20]$
So I think we're safe in turning hh_len into a u16, end result:
[acme@newtoy net-2.6.20]$ codiff -sV /tmp/tcp.o.before net/ipv4/tcp.o
/pub/scm/linux/kernel/git/acme/net-2.6.20/net/ipv4/tcp.c:
struct hh_cache | -4
hh_len;
from: int /* 12(0) 4(0) */
to: u16 /* 10(0) 2(0) */
1 struct changed
[acme@newtoy net-2.6.20]$
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Arnaldo Carvalho de Melo [Mon, 27 Nov 2006 19:56:43 +0000 (17:56 -0200)]
[INET_CONNECTION_SOCK]: Pack struct inet_connection_sock_af_ops
We have a hole in:
[acme@newtoy net-2.6.20]$ pahole net/ipv6/tcp_ipv6.o inet_connection_sock_af_ops
/* /pub/scm/linux/kernel/git/acme/net-2.6.20/include/net/inet_connection_sock.h:38 */
struct inet_connection_sock_af_ops {
int (*queue_xmit)(); /* 0 4 */
void (*send_check)(); /* 4 4 */
int (*rebuild_header)(); /* 8 4 */
int (*conn_request)(); /* 12 4 */
struct sock * (*syn_recv_sock)(); /* 16 4 */
int (*remember_stamp)(); /* 20 4 */
__u16 net_header_len; /* 24 2 */
/* XXX 2 bytes hole, try to pack */
int (*setsockopt)(); /* 28 4 */
int (*getsockopt)(); /* 32 4 */
int (*compat_setsockopt)(); /* 36 4 */
int (*compat_getsockopt)(); /* 40 4 */
void (*addr2sockaddr)(); /* 44 4 */
int sockaddr_len; /* 48 4 */
}; /* size: 52, sum members: 50, holes: 1, sum holes: 2 */
But we don't need sockaddr_len to be an int:
[acme@newtoy net-2.6.20]$ find net -name "*.[ch]" | xargs grep '\.sockaddr_len.\+=' | sort -u
net/dccp/ipv4.c: .sockaddr_len = sizeof(struct sockaddr_in),
net/dccp/ipv6.c: .sockaddr_len = sizeof(struct sockaddr_in6),
net/ipv4/tcp_ipv4.c: .sockaddr_len = sizeof(struct sockaddr_in),
net/ipv6/tcp_ipv6.c: .sockaddr_len = sizeof(struct sockaddr_in6),
net/sctp/ipv6.c: .sockaddr_len = sizeof(struct sockaddr_in6),
net/sctp/protocol.c: .sockaddr_len = sizeof(struct sockaddr_in),
[acme@newtoy net-2.6.20]$ pahole --sizes net/ipv6/tcp_ipv6.o | grep sockaddr_in
struct sockaddr_in: 16 0
struct sockaddr_in6: 28 0
[acme@newtoy net-2.6.20]$
So I turned sockaddr_len into a 'u16', and now:
[acme@newtoy net-2.6.20]$ pahole net/ipv6/tcp_ipv6.o inet_connection_sock_af_ops
/* /pub/scm/linux/kernel/git/acme/net-2.6.20/include/net/inet_connection_sock.h:38 */
struct inet_connection_sock_af_ops {
int (*queue_xmit)(); /* 0 4 */
void (*send_check)(); /* 4 4 */
int (*rebuild_header)(); /* 8 4 */
int (*conn_request)(); /* 12 4 */
struct sock * (*syn_recv_sock)(); /* 16 4 */
int (*remember_stamp)(); /* 20 4 */
u16 net_header_len; /* 24 2 */
u16 sockaddr_len; /* 26 2 */
int (*setsockopt)(); /* 28 4 */
int (*getsockopt)(); /* 32 4 */
int (*compat_setsockopt)(); /* 36 4 */
int (*compat_getsockopt)(); /* 40 4 */
void (*addr2sockaddr)(); /* 44 4 */
}; /* size: 48 */
So we've saved 4 bytes:
[acme@newtoy net-2.6.20]$ codiff -sV /tmp/tcp_ipv6.o.before net/ipv6/tcp_ipv6.o
/pub/scm/linux/kernel/git/acme/net-2.6.20/net/ipv6/tcp_ipv6.c:
struct inet_connection_sock_af_ops | -4
net_header_len;
from: __u16 /* 24(0) 2(0) */
to: u16 /* 24(0) 2(0) */
sockaddr_len;
from: int /* 48(0) 4(0) */
to: u16 /* 26(0) 2(0) */
1 struct changed
[acme@newtoy net-2.6.20]$
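The argument that sockaddr_len fits into 16 bits can also be checked at
compile time. The sketch below is a userspace illustration with a
hypothetical struct that holds only the two packed fields; it assumes a C11
compiler and the POSIX header <netinet/in.h>.

```c
#include <assert.h>
#include <stdint.h>
#include <netinet/in.h>

/* Hypothetical stand-in containing only the two adjacent 16-bit fields. */
struct demo_af_ops {
	uint16_t net_header_len;
	uint16_t sockaddr_len;
};

/* Fail the build if either address structure could not be represented. */
static_assert(sizeof(struct sockaddr_in) <= UINT16_MAX,
	      "sockaddr_in must fit in 16 bits");
static_assert(sizeof(struct sockaddr_in6) <= UINT16_MAX,
	      "sockaddr_in6 must fit in 16 bits");

const struct demo_af_ops demo_ipv6_ops = {
	.sockaddr_len = sizeof(struct sockaddr_in6),
};
```

A check like this documents the assumption behind narrowing the field and
breaks the build if a much larger address structure were ever assigned.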
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Gerrit Renker [Mon, 27 Nov 2006 17:29:59 +0000 (09:29 -0800)]
[UDP(-Lite)]: consolidate v4 and v6 get|setsockopt code
This patch consolidates set/getsockopt code between UDP(-Lite) v4 and 6. The
justification is that UDP(-Lite) is a transport-layer protocol and therefore
the socket option code (at least in theory) should be AF-independent.
Furthermore, there is the following code duplication:
* do_udp{,v6}_getsockopt is 100% identical between v4 and v6
* do_udp{,v6}_setsockopt is identical up to the following difference
--v4, in contrast to v6, additionally allows the experimental encapsulation
types UDP_ENCAP_ESPINUDP and UDP_ENCAP_ESPINUDP_NON_IKE
--the remainder is identical between v4 and v6
I believe that this difference is of little relevance.
The advantage lies in not duplicating almost completely identical code.
The patch further simplifies the interface of udp{,v6}_push_pending_frames,
since for the second argument (struct udp_sock *up) it always holds that
up = udp_sk(sk); where sk is the first function argument.
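The interface simplification can be illustrated with heavily simplified
stand-ins. All types and functions below are hypothetical models; only the
relation up = udp_sk(sk) is taken from the description above.

```c
/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct demo_sock {
	int id;
};

struct demo_udp_sock {
	struct demo_sock sk;	/* must be first so the cast below is valid */
	int pending;
};

struct demo_udp_sock *demo_udp_sk(struct demo_sock *sk)
{
	return (struct demo_udp_sock *)sk;
}

/* Old shape: the caller passes sk and up, although up == demo_udp_sk(sk). */
int demo_push_pending_old(struct demo_sock *sk, struct demo_udp_sock *up)
{
	(void)sk;
	return up->pending;
}

/* New shape: the second argument is derived inside the function. */
int demo_push_pending_new(struct demo_sock *sk)
{
	struct demo_udp_sock *up = demo_udp_sk(sk);

	return up->pending;
}
```

Dropping the redundant argument removes the possibility of a caller passing a
mismatched pair.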
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Graf [Mon, 27 Nov 2006 17:27:07 +0000 (09:27 -0800)]
[RTNETLINK]: Add rtnl_put_cacheinfo() to unify some code
IPv4, IPv6, and DECNet all use struct rta_cacheinfo in a similar
way, therefore rtnl_put_cacheinfo() is added to reuse code.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Graf [Mon, 27 Nov 2006 17:25:58 +0000 (09:25 -0800)]
[NETLINK]: Remove unused dst_pid field in netlink_skb_parms
The destination PID is passed directly to netlink_unicast() or
netlink_multicast(), respectively.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gerrit Renker [Mon, 27 Nov 2006 14:31:45 +0000 (12:31 -0200)]
[NET]: Add documentation for TFRC structures
This adds documentation for the TFRC structure fields.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Gerrit Renker [Mon, 27 Nov 2006 14:28:48 +0000 (12:28 -0200)]
[DCCP] ccid3: Resolve small FIXME
This considers the case - ACK received while no packet has been sent
so far. Resolved by printing a (rate-limited) warning message.
Further removes an unnecessary BUG_ON in ccid3_hc_tx_packet_recv;
received feedback on a terminating connection is simply ignored.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Gerrit Renker [Mon, 27 Nov 2006 14:27:55 +0000 (12:27 -0200)]
[DCCP] ccid3: Remove redundant statements in ccid3_hc_tx_packet_sent
This patch removes a switch statement which is redundant since,
* nothing is done in states TFRC_SSTATE_NO_SENT/TFRC_SSTATE_NO_FBACK
* it is impossible that the function is called in the state TFRC_SSTATE_TERM, since
--the function is called, in dccp_write_xmit, after ccid3_hc_tx_send_packet
--if ccid3_hc_tx_send_packet is called in state TFRC_SSTATE_TERM, it returns
-EINVAL, which means that ccid3_hc_tx_packet_sent will not be called
(compare dccp_write_xmit)
--> therefore, this case is logically impossible
* the remaining state is TFRC_SSTATE_FBACK which conditionally updates t_ipi, t_nom,
and t_delta. This is a no-op, since
--t_ipi only changes when feedback is received
--however, when feedback arrives via ccid3_hc_tx_packet_recv, there is an identical
code block which performs the same set of operations
--performing the same set of operations again in ccid3_hc_tx_packet_sent therefore
does not change anything, since between the time of receiving the last feedback
(and therefore update of t_ipi, t_nom, and t_delta), the value of t_ipi has not
changed
--since t_ipi has not changed, the values of t_delta and t_nom also do not change,
they depend fully on t_ipi
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Gerrit Renker [Mon, 27 Nov 2006 14:26:57 +0000 (12:26 -0200)]
[DCCP] ccid3: Avoid congestion control on zero-sized data packets
This resolves an `XXX' in ccid3_hc_tx_send_packet().
The function is only called on Data and DataAck packets and returns a negative
result on zero-sized messages. This is a reasonable policy since CCID 3 is a
congestion-control module and congestion control on zero-sized Data(Ack)
packets is in a way pathological.
The patch uses a more suitable error code for this case: it returns the Posix.1
code `EBADMSG' ("Not a data message") instead of `ENOTCONN'.
As a result of ignoring zero-sized packets, the condition for a warning
"First packet is data" in ccid3_hc_tx_packet_sent is always satisfied; this
message has been removed since it will always be printed.
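The policy described above amounts to a check of the following shape. The
function is a hypothetical stand-in, not ccid3_hc_tx_send_packet() itself.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical gate in front of the rate-control logic. */
int demo_tx_send_packet(size_t len)
{
	if (len == 0)
		return -EBADMSG;	/* zero-sized Data(Ack): nothing to control */

	return 0;			/* packet may be handed to rate control */
}
```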
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Gerrit Renker [Mon, 27 Nov 2006 14:26:03 +0000 (12:26 -0200)]
[DCCP] ccid3: Simplify control flow of ccid3_hc_tx_send_packet
This makes some logically equivalent simplifications, by replacing
rc - values plus goto's with direct return statements.
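The kind of transformation meant here looks roughly like this. The two
functions and their conditions are invented for illustration and do not
reproduce the actual code.

```c
#include <errno.h>

/* Before: the result is carried in rc and each path jumps to a common exit. */
int demo_check_goto(int state_ok, int len)
{
	int rc = 0;

	if (!state_ok) {
		rc = -EINVAL;
		goto out;
	}
	if (len < 0) {
		rc = -EINVAL;
		goto out;
	}
	rc = len;
out:
	return rc;
}

/* After: each condition returns directly; the behaviour is unchanged. */
int demo_check_return(int state_ok, int len)
{
	if (!state_ok)
		return -EINVAL;
	if (len < 0)
		return -EINVAL;
	return len;
}
```

The direct-return form is only equivalent when the common exit does no
cleanup, as is the case in this sketch.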
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Gerrit Renker [Mon, 27 Nov 2006 14:25:10 +0000 (12:25 -0200)]
[DCCP] ccid3: Fix calculation of t_ipi time of scheduled transmission
Problem:
Gerrit Renker [Mon, 27 Nov 2006 14:22:48 +0000 (12:22 -0200)]
[DCCP] ccid3: Simplify control flow in the calculation of t_ipi
This patch performs a simplifying (performance) optimisation:
In each call of the inline function ccid3_calc_new_t_ipi(), the state is
tested against TFRC_SSTATE_NO_FBACK. This is expensive when the function
is called very often. A simpler solution, implemented by this patch, is
to adapt the control flow.
Background:
Gerrit Renker [Mon, 27 Nov 2006 14:13:38 +0000 (12:13 -0200)]
[DCCP] ccid3: Fix bug in calculation of first t_nom and first t_ipi
Problem: