Arnaldo Carvalho de Melo [Wed, 10 Aug 2005 03:07:35 +0000 (20:07 -0700)]
[INET]: Move tcp_port_rover to inet_hashinfo
Also expose all of the tcp_hashinfo members, i.e. kill the tcp_ehash,
etc. macros. This will more clearly expose already generic functions
and some that need just a bit of work to become generic, as we'll see
in the upcoming changesets.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnaldo Carvalho de Melo [Wed, 10 Aug 2005 03:07:13 +0000 (20:07 -0700)]
[INET]: Generalise tcp_bind_hash & tcp_inherit_port
This required moving tcp_bucket_cachep to inet_hashinfo.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso [Wed, 10 Aug 2005 03:06:42 +0000 (20:06 -0700)]
[NETFILTER]: fix list traversal order in ctnetlink
Currently conntracks are inserted after the head. That means that
conntracks are sorted from the biggest to the smallest id. This happens
because we use list_prepend (list_add) instead of list_add_tail. This can
result in problems during the list iteration.
    list_for_each(i, &ip_conntrack_hash[cb->args[0]]) {
        h = (struct ip_conntrack_tuple_hash *) i;
        if (DIRECTION(h) != IP_CT_DIR_ORIGINAL)
            continue;
        ct = tuplehash_to_ctrack(h);
        if (ct->id <= *id)
            continue;
In that case just the first conntrack in the bucket will be dumped. To
fix this, we iterate the list from the tail to the head via
list_for_each_prev. Same thing for the list of expectations.
Signed-off-by: Pablo Neira Ayuso <pablo@eurodev.net>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
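The ctnetlink traversal problem described above can be reproduced in a
small userspace sketch. It assumes that the dump loop records the id of
each entry it emits and skips anything with an id at or below that
value. The list type and the names list_add_head, dump_bucket and
demo_dump are illustrative stand-ins, not the kernel's list.h or
ctnetlink code.

```c
/* Minimal circular doubly linked list, loosely modelled on the kernel's
 * struct list_head. All names here are hypothetical. */
struct node {
    struct node *next, *prev;
    unsigned int id;
};

static void list_init(struct node *head)
{
    head->next = head->prev = head;
}

/* Insert right after the head, as list_add() does: newest entry first. */
static void list_add_head(struct node *n, struct node *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/* Emit every entry whose id is greater than *last_id, updating *last_id
 * after each one. 'reverse' selects tail-to-head traversal.
 * Returns the number of entries emitted. */
static int dump_bucket(struct node *head, unsigned int *last_id, int reverse)
{
    int dumped = 0;
    struct node *i;

    for (i = reverse ? head->prev : head->next; i != head;
         i = reverse ? i->prev : i->next) {
        if (i->id <= *last_id)
            continue;   /* filter assumes this entry was already dumped */
        *last_id = i->id;
        dumped++;
    }
    return dumped;
}

/* Build one bucket holding ids 1..3, each inserted at the head, then
 * dump it in the requested direction. */
static int demo_dump(int reverse)
{
    struct node head, n[3];
    unsigned int last_id = 0;
    int k;

    list_init(&head);
    for (k = 0; k < 3; k++) {
        n[k].id = (unsigned int)k + 1;
        list_add_head(&n[k], &head);
    }
    return dump_bucket(&head, &last_id, reverse);
}
```

Walking from the head meets the largest id first, so the filter rejects
every later entry. Walking from the tail meets the ids in ascending
order and emits all of them.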
Pablo Neira Ayuso [Wed, 10 Aug 2005 03:06:27 +0000 (20:06 -0700)]
[NETFILTER]: Fix typo in ctnl_exp_cb array (no bug, just memory waste)
This fixes the size of the ctnl_exp_cb array, which should be
IPCTNL_MSG_EXP_MAX instead of IPCTNL_MSG_MAX. Simple typo.
Signed-off-by: Pablo Neira Ayuso <pablo@eurodev.net>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso [Wed, 10 Aug 2005 03:06:11 +0000 (20:06 -0700)]
[NETFILTER]: fix conntrack refcount leak in unlink_expect()
In unlink_expect(), the expectation is removed from the list so the
refcount must be dropped as well.
Signed-off-by: Pablo Neira Ayuso <pablo@eurodev.net>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso [Wed, 10 Aug 2005 03:05:52 +0000 (20:05 -0700)]
[NETFILTER]: ctnetlink: make sure event order is correct
The following sequence is displayed during events dumping of an ICMP
connection: [NEW] [DESTROY] [UPDATE]
This happens because the event IPCT_DESTROY is delivered in
death_by_timeout(), that is called from the icmp protocol helper
(ct->timeout.function) once we see the reply.
To fix this, we move this event to destroy_conntrack().
Signed-off-by: Pablo Neira Ayuso <pablo@eurodev.net>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Harald Welte [Wed, 10 Aug 2005 03:04:07 +0000 (20:04 -0700)]
[NETFILTER]: don't use nested attributes for conntrack_expect
We used to use nested nfattr structures for ip_conntrack_expect. This is
bogus, since ip_conntrack and ip_conntrack_expect are communicated in
different netlink message types. Both should be encoded as top-level
attributes, no extra nesting required. This patch addresses the issue.
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Harald Welte [Wed, 10 Aug 2005 03:03:54 +0000 (20:03 -0700)]
[NETFILTER]: cleanup nfnetlink_check_attributes()
1) memset return parameter 'cda' (nfattr pointer array) only on success
2) a message without attributes and just a 'struct nfgenmsg' is valid,
don't return -EINVAL
3) use likely() and unlikely() where appropriate
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Harald Welte [Wed, 10 Aug 2005 03:03:40 +0000 (20:03 -0700)]
[NETFILTER]: attribute count is an attribute of message type, not subsystem
Prior to this patch, every nfnetlink subsystem had to specify its
attribute count. However, in reality the attribute count depends on
the message type within the subsystem, not the subsystem itself. This
patch moves 'attr_count' from 'struct nfnetlink_subsys' into
nfnl_callback to fix this.
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
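As a rough illustration of the attr_count move described above, the
following sketch keeps the attribute count in a per-message callback
entry rather than in the subsystem. The struct layouts, the message
type names and the counts 12 and 6 are all made up for the example and
are not taken from the nfnetlink headers.

```c
#include <stddef.h>

/* Hypothetical per-message-type callback entry. */
struct msg_callback {
    int (*call)(const void *msg);
    unsigned short attr_count;   /* attribute count for this message type */
};

/* Hypothetical subsystem descriptor: it no longer carries an attribute
 * count of its own, only the table of callbacks. */
struct subsys {
    const char *name;
    unsigned char cb_count;
    const struct msg_callback *cb;
};

static int handle_any(const void *msg)
{
    (void)msg;
    return 0;
}

enum { MSG_CT_NEW, MSG_EXP_NEW, MSG_MAX };

/* Two message types in one subsystem, each with its own count. */
static const struct msg_callback demo_cb[MSG_MAX] = {
    [MSG_CT_NEW]  = { .call = handle_any, .attr_count = 12 },
    [MSG_EXP_NEW] = { .call = handle_any, .attr_count = 6 },
};

static const struct subsys demo_subsys = { "demo", MSG_MAX, demo_cb };

/* Look up the attribute count for a message type, or -1 if the type is
 * out of range or has no handler. */
static int attr_count_for(const struct subsys *ss, unsigned int msg_type)
{
    if (msg_type >= ss->cb_count || ss->cb[msg_type].call == NULL)
        return -1;
    return ss->cb[msg_type].attr_count;
}
```

With the count attached to the callback, a parser can size its
attribute table per message type instead of using one value for the
whole subsystem.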
Harald Welte [Wed, 10 Aug 2005 03:03:22 +0000 (20:03 -0700)]
[NETFILTER]: fix ctnetlink 'create_expect' parsing
There was a stupid copy+paste mistake where we parse the MASK nfattr into
the "tuple" variable instead of the "mask" variable. This patch fixes it.
Thanks to Pablo Neira.
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira [Wed, 10 Aug 2005 03:02:55 +0000 (20:02 -0700)]
[NETFILTER]: conntrack_netlink: Fix locking during conntrack_create
The current codepath allowed for ip_conntrack_lock to be unlocked twice.
Signed-off-by: Pablo Neira <pablo@eurodev.net>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira [Wed, 10 Aug 2005 03:02:36 +0000 (20:02 -0700)]
[NETFILTER]: remove bogus memset() calls from ip_conntrack_netlink.c
nfattr_parse_nested() calls nfattr_parse() which in turn does a memset
on the 'tb' array. All callers therefore don't need to memset before
calling it.
Signed-off-by: Pablo Neira <pablo@eurodev.net>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
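The reasoning above (the parser clears the table itself, so callers
need not) can be shown with a small sketch. attr_parse and struct attr
are hypothetical simplifications, not the nfattr_parse() signature.

```c
#include <string.h>

#define ATTR_MAX 4

/* Hypothetical, simplified attribute: a type in 1..ATTR_MAX and a value. */
struct attr {
    unsigned short type;
    unsigned short value;
};

/* Index the attributes by type. The parser zeroes the table first, so
 * callers do not have to memset it themselves. */
static void attr_parse(const struct attr *tb[], size_t maxattr,
                       const struct attr *attrs, size_t n)
{
    size_t i;

    memset(tb, 0, sizeof(tb[0]) * maxattr);
    for (i = 0; i < n; i++)
        if (attrs[i].type >= 1 && attrs[i].type <= maxattr)
            tb[attrs[i].type - 1] = &attrs[i];
}

/* Poison the table on purpose, parse one attribute of type 2, and check
 * that only slot 1 is set afterwards. Returns 1 on success. */
static int demo_parse_clears_table(void)
{
    static const struct attr input[] = { { 2, 42 } };
    const struct attr *tb[ATTR_MAX];
    int i, set = 0;

    for (i = 0; i < ATTR_MAX; i++)
        tb[i] = &input[0];          /* stale garbage, no memset by caller */
    attr_parse(tb, ATTR_MAX, input, 1);
    for (i = 0; i < ATTR_MAX; i++)
        if (tb[i] != NULL)
            set++;
    return set == 1 && tb[1] == &input[0];
}
```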
Patrick McHardy [Wed, 10 Aug 2005 03:02:13 +0000 (20:02 -0700)]
[NETFILTER]: Fix multiple problems with the conntrack event cache
refcnt underflow: the reference count is decremented when a conntrack
entry is removed from the hash but it is not incremented when entering
new entries.
missing protection of process context against softirq context: all
cache operations need to locally disable softirqs to avoid races.
Additionally the event cache can't be initialized when a packet
enters the conntrack code but needs to be initialized whenever we
cache an event and the stored conntrack entry doesn't match the
current one.
incorrect flushing of the event cache in ip_ct_iterate_cleanup:
without real locking we can't flush the cache for different CPUs
without incurring races. The cache for different CPUs can only be
flushed when no packets are going through the
code. ip_ct_iterate_cleanup doesn't need to drop all references, so
flushing is moved to the cleanup path.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnaldo Carvalho de Melo [Wed, 10 Aug 2005 03:01:14 +0000 (20:01 -0700)]
[INET]: Move bind_hash from tcp_sk to inet_sk
This should really be in an inet_connection_sock, but I'm leaving it
for a later optimization, when some more fields common to INET
transport protocols now in tcp_sk or inet_sk will be chunked out into
inet_connection_sock. For now it's better to concentrate on getting the
changes in the core merged to leave the DCCP tree with only DCCP
specific code.
Next changesets will take advantage of this move to generalise things
like tcp_bind_hash, tcp_put_port, tcp_inherit_port, making the latter
receive an inet_hashinfo parameter, and even __tcp_tw_hashdance, etc in
the future, when tcp_tw_bucket gets transformed into the struct
timewait_sock hierarchy.
tcp_destroy_sock also is eligible as soon as tcp_orphan_count gets
moved to sk_prot.
A cascade of incremental changes will ultimately make the tcp_lookup
functions be fully generic.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnaldo Carvalho de Melo [Wed, 10 Aug 2005 03:00:51 +0000 (20:00 -0700)]
[INET]: Move the TCP hashtable functions/structs to inet_hashtables.[ch]
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnaldo Carvalho de Melo [Wed, 10 Aug 2005 02:59:44 +0000 (19:59 -0700)]
[INET]: Just rename the TCP hashtable functions/structs to inet_
This is to break down the complexity of the series of patches,
making it very clear that this one just does:
1. renames tcp_ prefixed hashtable functions and data structures that
were already mostly generic to inet_ to share it with DCCP and
other INET transport protocols.
2. Removes not used functions (__tb_head & tb_head)
3. Removes some leftover prototypes in the headers (tcp_bucket_unlock &
tcp_v4_build_header)
Next changesets will move tcp_sk(sk)->bind_hash to inet_sock so that we can
make functions such as tcp_inherit_port, __tcp_inherit_port, tcp_v4_get_port,
__tcp_put_port, generic and get others like tcp_destroy_sock closer to generic
(tcp_orphan_count will go to sk->sk_prot to allow this).
Eventually most of these functions will be used passing the transport protocol
inet_hashinfo structure.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnaldo Carvalho de Melo [Wed, 10 Aug 2005 02:59:20 +0000 (19:59 -0700)]
[INET]: Move the TCP ehash functions to include/net/inet_hashtables.h
To be shared with DCCP (and others), this is the start of a series of patches
that will expose the already generic TCP hash table routines.
The few changes noticed when calling gcc -S before/after on a pentium4 were of
this type:
movl 40(%esp), %edx
cmpl %esi, 472(%edx)
je .L168
- pushl $291
+ pushl $272
pushl $.LC0
pushl $.LC1
pushl $.LC2
[acme@toy net-2.6.14]$ size net/ipv4/tcp_ipv4.before.o net/ipv4/tcp_ipv4.after.o
   text    data     bss     dec     hex filename
  17804     516     140   18460    481c net/ipv4/tcp_ipv4.before.o
  17804     516     140   18460    481c net/ipv4/tcp_ipv4.after.o
Holler if some weird architecture has issues with things like this 8)
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Harald Welte [Wed, 10 Aug 2005 02:58:39 +0000 (19:58 -0700)]
[NETFILTER]: Add new "nfnetlink_log" userspace packet logging facility
This is a generic (layer3 independent) version of what ipt_ULOG is already
doing for IPv4 today. ipt_ULOG, ebt_ulog and finally also ip[6]t_LOG will
be deprecated by this mechanism in the long term.
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Harald Welte [Wed, 10 Aug 2005 02:58:27 +0000 (19:58 -0700)]
[NETFILTER]: Extend netfilter logging API
This patch is in preparation for nfnetlink_log:
- loggers now have to register struct nf_logger instead of nf_logfn
- nf_log_unregister() replaced by nf_log_unregister_pf() and
nf_log_unregister_logger()
- add comment to ip[6]t_LOG.h to assure nobody redefines flags
- add /proc/net/netfilter/nf_log to tell user which logger is currently
registered for which address family
- if user has configured logging, but no logging backend (logger) is
available, always spit a message to syslog, not just the first time.
- split ip[6]t_LOG.c into two parts:
Backend: Always try to register as logger for the respective address family
Frontend: Always log via nf_log_packet() API
- modify all users of nf_log_packet() to accommodate additional argument
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Harald Welte [Wed, 10 Aug 2005 02:50:45 +0000 (19:50 -0700)]
[NETFILTER]: Add refcounting and /proc/net/netfilter interface to nfnetlink_queue
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnaldo Carvalho de Melo [Wed, 10 Aug 2005 02:50:02 +0000 (19:50 -0700)]
[INET]: Introduce inet_sk_rebuild_header
From tcp_v4_rebuild_header, that already was pretty generic, I only
needed to use sk->sk_protocol instead of the hardcoded IPPROTO_TCP and
establish the requirement that INET transport layer protocols that
want to use this function map TCP_SYN_SENT to its equivalent state.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnaldo Carvalho de Melo [Wed, 10 Aug 2005 02:49:02 +0000 (19:49 -0700)]
[SOCK]: Introduce sk_setup_caps
From tcp_v4_setup_caps, that always is preceded by a call to
__sk_dst_set, so coalesce this sequence into sk_setup_caps, removing
one call to a TCP function in the IP layer.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnaldo Carvalho de Melo [Wed, 10 Aug 2005 02:47:37 +0000 (19:47 -0700)]
[SOCK]: Rename __tcp_v4_rehash to __sk_prot_rehash
This operation was already generic and DCCP will use it.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnaldo Carvalho de Melo [Wed, 10 Aug 2005 02:45:38 +0000 (19:45 -0700)]
[NET]: Cleanup INET_REFCNT_DEBUG code
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Wed, 10 Aug 2005 02:45:02 +0000 (19:45 -0700)]
[IPV4/6]: Check if packet was actually delivered to a raw socket to decide whether to send an ICMP unreachable
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew McDonald [Wed, 10 Aug 2005 02:44:42 +0000 (19:44 -0700)]
[IPV6]: Check interface bindings on IPv6 raw socket reception
Take account of whether a socket is bound to a particular device when
selecting an IPv6 raw socket to receive a packet. Also perform this
check when receiving IPv6 packets with router alert options.
Signed-off-by: Andrew McDonald <andrew@mcdonald.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Harald Welte [Wed, 10 Aug 2005 02:44:15 +0000 (19:44 -0700)]
[NETFILTER]: Add "nfnetlink_queue" netfilter queue handler over nfnetlink
- Add new nfnetlink_queue module
- Add new ipt_NFQUEUE and ip6t_NFQUEUE modules to access queue numbers 1-65535
- Mark ip_queue and ip6_queue Kconfig options as OBSOLETE
- Update feature-removal-schedule to remove ip[6]_queue in December
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Harald Welte [Wed, 10 Aug 2005 02:43:44 +0000 (19:43 -0700)]
[NETFILTER]: Core changes required by upcoming nfnetlink_queue code
- split netfilter verdict in 16bit verdict and 16bit queue number
- add 'queuenum' argument to nf_queue_outfn_t and its users ip[6]_queue
- move NFNL_SUBSYS_ definitions from enum to #define
- introduce autoloading for nfnetlink subsystem modules
- add MODULE_ALIAS_NFNL_SUBSYS macro
- add nf_unregister_queue_handlers() to unregister all handlers for a given
nf_queue_outfn_t
- add more verbose DEBUGP macro definition to nfnetlink.c
- make nfnetlink_subsys_register fail if subsys already exists
- add some more comments and debug statements to nfnetlink.c
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
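The 16bit verdict / 16bit queue number split listed above can be
sketched as plain bit packing. This example assumes that the verdict
code sits in the low half of a 32-bit word and the queue number in the
high half. The macro and function names are hypothetical and are not
the kernel's definitions.

```c
#include <stdint.h>

#define DEMO_VERDICT_BITS 16
#define DEMO_VERDICT_MASK 0x0000ffffu

/* Combine a 16-bit verdict code and a 16-bit queue number into one word. */
static uint32_t verdict_pack(uint16_t verdict, uint16_t queuenum)
{
    return ((uint32_t)queuenum << DEMO_VERDICT_BITS) | verdict;
}

/* Extract the verdict code from the low 16 bits. */
static uint16_t verdict_code(uint32_t v)
{
    return (uint16_t)(v & DEMO_VERDICT_MASK);
}

/* Extract the queue number from the high 16 bits. */
static uint16_t verdict_queue(uint32_t v)
{
    return (uint16_t)(v >> DEMO_VERDICT_BITS);
}
```

A hook could then return verdict_pack(code, 1000) to name queue 1000,
and the core would recover both halves with verdict_code() and
verdict_queue().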
Harald Welte [Wed, 10 Aug 2005 02:42:34 +0000 (19:42 -0700)]
[NETFILTER]: Move reroute-after-queue code up to the nf_queue layer.
The rerouting functionality is required by the core, therefore it has
to be implemented by the core and not in individual queue handlers.
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Harald Welte [Wed, 10 Aug 2005 02:40:55 +0000 (19:40 -0700)]
[NETLINK]: Add proper module refcounting for kernel netlink sockets.
- Remove bogus code for compiling netlink as module
- Add module refcounting support for modules implementing a netlink
protocol
- Add support for autoloading modules that implement a netlink protocol
as soon as someone opens a socket for that protocol
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Harald Welte [Wed, 10 Aug 2005 02:39:00 +0000 (19:39 -0700)]
[NETFILTER]: Move ipv4 specific code from net/core/netfilter.c to net/ipv4/netfilter.c
Netfilter cleanup
- Move ipv4 code from net/core/netfilter.c to net/ipv4/netfilter.c
- Move ipv6 netfilter code from net/ipv6/ip6_output.c to net/ipv6/netfilter.c
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Harald Welte [Wed, 10 Aug 2005 02:37:23 +0000 (19:37 -0700)]
[NETFILTER]: Rename skb_ip_make_writable() to skb_make_writable()
There is nothing IPv4-specific in it. In fact, it was already used by
IPv6, too... Upcoming nfnetlink_queue code will use it for any kind
of packet.
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Wed, 10 Aug 2005 02:36:53 +0000 (19:36 -0700)]
[NETFILTER]: C99 initializers for NAT protocols
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 10 Aug 2005 02:36:29 +0000 (19:36 -0700)]
[NET]: Remove explicit initializations of skb->input_dev
Instead, set it in one place, namely the beginning of
netif_receive_skb().
Based upon suggestions from Jamal Hadi Salim.
Signed-off-by: David S. Miller <davem@davemloft.net>
Adrian Bunk [Wed, 10 Aug 2005 02:35:47 +0000 (19:35 -0700)]
[IPV4]: possible cleanups
This patch contains the following possible cleanups:
- make needlessly global code static
- #if 0 the following unused global function:
- xfrm4_state.c: xfrm4_state_fini
- remove the following unneeded EXPORT_SYMBOL's:
- ip_output.c: ip_finish_output
- ip_output.c: sysctl_ip_default_ttl
- fib_frontend.c: ip_dev_find
- inetpeer.c: inet_peer_idlock
- ip_options.c: ip_options_compile
- ip_options.c: ip_options_undo
- net/core/request_sock.c: sysctl_max_syn_backlog
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 10 Aug 2005 02:34:12 +0000 (19:34 -0700)]
[NET]: Kill skb->real_dev
Bonding just wants the device before the skb_bond()
decapsulation occurs, so simply pass that original
device into packet_type->func() as an argument.
It remains to be seen whether we can use this same
exact thing to get rid of skb->input_dev as well.
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Wed, 10 Aug 2005 02:33:51 +0000 (19:33 -0700)]
[NET]: Reduce tc_index/tc_verd to u16
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnaldo Carvalho de Melo [Wed, 10 Aug 2005 02:33:31 +0000 (19:33 -0700)]
[REQSK]: Move the syn_table destroy from tcp_listen_stop to reqsk_queue_destroy
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Harald Welte [Wed, 10 Aug 2005 02:32:58 +0000 (19:32 -0700)]
[NETFILTER]: Add ctnetlink subsystem
Add ctnetlink subsystem for userspace-access to ip_conntrack table.
This allows reading and updating of existing entries, as well as
creating new ones (and new expect's) via nfnetlink.
Please note the 'strange' byte order: nfattr (tag+length) are in host
byte order, while the payload is always guaranteed to be in network
byte order. This allows a simple userspace process to encapsulate netlink
messages into arch-independent udp packets by just processing/swapping the
headers and not knowing anything about the actual payload.
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
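The byte order convention described above (attribute header in host
order, payload in network order) might look like this in userspace. The
header layout and the helper names are assumptions made for the sketch
and do not reproduce the real struct nfattr.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical attribute header: both fields stay in host byte order. */
struct demo_attr_hdr {
    uint16_t len;
    uint16_t type;
};

/* Write one attribute carrying a 32-bit value into buf. Only the payload
 * is converted to network byte order. Returns the bytes written. */
static size_t put_u32_attr(unsigned char *buf, uint16_t type, uint32_t value)
{
    struct demo_attr_hdr hdr;
    uint32_t be = htonl(value);

    hdr.len = (uint16_t)(sizeof(hdr) + sizeof(be));
    hdr.type = type;
    memcpy(buf, &hdr, sizeof(hdr));
    memcpy(buf + sizeof(hdr), &be, sizeof(be));
    return hdr.len;
}

/* Read the 32-bit payload back, converting it to host byte order. */
static uint32_t get_u32_attr(const unsigned char *buf)
{
    uint32_t be;

    memcpy(&be, buf + sizeof(struct demo_attr_hdr), sizeof(be));
    return ntohl(be);
}
```

A relay that forwards such messages to another architecture would only
need to swap the two header fields and could copy the payload bytes
untouched.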
Stephen Hemminger [Wed, 10 Aug 2005 02:31:17 +0000 (19:31 -0700)]
[NET]: Remove HIPPI private from skbuff.h
This removes the private element from skbuff, that is only used by
HIPPI. Instead it uses skb->cb[] to hold the additional data that is
needed in the output path from hard_header to device driver.
PS: The only qdisc that might potentially corrupt this cb[] is netem,
if it is used over HIPPI. I will take care of that by fixing netem
to use skb->stamp. I don't expect many users of netem over HIPPI.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Wed, 10 Aug 2005 02:30:51 +0000 (19:30 -0700)]
[NET]: Introduce SO_{SND,RCV}BUFFORCE socket options
Allows overriding of sysctl_{wmem,rmem}_max
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
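A minimal userspace sketch of how the new option might be used follows.
It assumes a Linux system whose headers define SO_SNDBUFFORCE and that
the caller may lack the privilege (CAP_NET_ADMIN) the forced variant
requires, in which case it falls back to the ordinary SO_SNDBUF. The
helper names are hypothetical.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Try the forced variant first so the request is not capped by the
 * wmem_max sysctl. If that is unavailable or not permitted, fall back
 * to the regular option. Returns 0 on success, -1 on failure. */
static int set_sndbuf(int fd, int bytes)
{
#ifdef SO_SNDBUFFORCE
    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUFFORCE,
                   &bytes, sizeof(bytes)) == 0)
        return 0;
#endif
    return setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes));
}

/* Open a throwaway datagram socket and request a 1 MiB send buffer.
 * The option lives at SOL_SOCKET level, so the address family used
 * here (AF_UNIX) is only a convenience for the demo. */
static int demo_sndbuf(void)
{
    int fd = socket(AF_UNIX, SOCK_DGRAM, 0);
    int rc;

    if (fd < 0)
        return -1;
    rc = set_sndbuf(fd, 1 << 20);
    close(fd);
    return rc;
}
```

SO_RCVBUFFORCE would follow the same pattern with SO_RCVBUF as the
fallback.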
Harald Welte [Wed, 10 Aug 2005 02:30:24 +0000 (19:30 -0700)]
[NETFILTER]: Add nfnetlink layer.
Introduce "nfnetlink" (netfilter netlink) layer. This layer is used as
transport layer for all userspace communication of the new upcoming
netfilter subsystems, such as ctnetlink, nfnetlink_queue and some day even
the mythical pkttables ;)
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Harald Welte [Wed, 10 Aug 2005 02:28:03 +0000 (19:28 -0700)]
[NETFILTER]: connection tracking event notifiers
This adds a notifier chain based event mechanism for ip_conntrack state
changes. As opposed to the previous implementations in patch-o-matic, we
no longer need a field in the skb to achieve this.
Thanks to the valuable input from Patrick McHardy and Rusty on the idea
of a per_cpu implementation.
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Wed, 10 Aug 2005 02:25:56 +0000 (19:25 -0700)]
[NET]: Kill skb->tc_classid
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 10 Aug 2005 02:25:21 +0000 (19:25 -0700)]
[NET]: Kill skb->list
Remove the "list" member of struct sk_buff, as it is entirely
redundant. All SKB list removal callers know which list the
SKB is on, so storing this in sk_buff does nothing other than
taking up some space.
Two tricky bits were SCTP, which I took care of, and two ATM
drivers which Francois Romieu <romieu@fr.zoreil.com> fixed
up.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
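The idea behind dropping the "list" member can be sketched with a toy
queue: the element carries no back-pointer to its queue, and the caller
passes the queue to the unlink helper. struct pkt, struct pkt_queue and
the helpers below are hypothetical and much simpler than sk_buff and
sk_buff_head.

```c
#include <stddef.h>

/* Queue element without a back-pointer to the queue that owns it. */
struct pkt {
    struct pkt *next, *prev;
};

/* Queue with a sentinel node and a length counter. */
struct pkt_queue {
    struct pkt head;
    unsigned int qlen;
};

static void queue_init(struct pkt_queue *q)
{
    q->head.next = q->head.prev = &q->head;
    q->qlen = 0;
}

/* Append p at the tail of q. */
static void queue_tail(struct pkt_queue *q, struct pkt *p)
{
    p->prev = q->head.prev;
    p->next = &q->head;
    q->head.prev->next = p;
    q->head.prev = p;
    q->qlen++;
}

/* Remove p from q. The caller already knows which queue p is on, so the
 * queue is an argument instead of a field stored in every element. */
static void queue_unlink(struct pkt *p, struct pkt_queue *q)
{
    p->prev->next = p->next;
    p->next->prev = p->prev;
    p->next = p->prev = NULL;
    q->qlen--;
}

/* Queue two elements, unlink the first, and report the remaining length. */
static unsigned int demo_unlink(void)
{
    struct pkt_queue q;
    struct pkt a, b;

    queue_init(&q);
    queue_tail(&q, &a);
    queue_tail(&q, &b);
    queue_unlink(&a, &q);
    return q.qlen;
}
```

The saving is one pointer per element, at the cost of every unlink call
site having to name its queue.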
Harald Welte [Wed, 10 Aug 2005 02:24:19 +0000 (19:24 -0700)]
[NETFILTER]: reduce netfilter sk_buff enlargement
As discussed at netconf'05, we're trying to save every bit in sk_buff.
The patch below makes sk_buff 8 bytes smaller. I did some basic
testing on my notebook and it seems to work.
The only real in-tree user of nfcache was IPVS, which only needs a
single bit. Unfortunately I couldn't find some other free bit in
sk_buff to stuff that bit into, so I introduced a separate field for
them. Maybe the IPVS guys can resolve that to further save space.
Initially I wanted to shrink pkt_type to three bits (PACKET_HOST and
alike are only 6 values defined), but unfortunately the bluetooth code
overloads pkt_type :(
The conntrack-event-api (out-of-tree) uses nfcache, but Rusty just
came up with a way to do it without any skb fields, so it's safe
to remove it.
- remove all never-implemented 'nfcache' code
- don't have ipvs code abuse 'nfcache' field. It currently gets its own
compile-conditional skb->ipvs_property field. IPVS maintainers can
decide to move this bit elsewhere, but nfcache needs to die.
- remove skb->nfcache field to save 4 bytes
- move skb->nfctinfo into three unused bits to save further 4 bytes
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Harald Welte [Wed, 10 Aug 2005 02:22:01 +0000 (19:22 -0700)]
[NETFILTER]: convert nfmark and conntrack mark to 32bit
As discussed at netconf'05, we convert nfmark and conntrack-mark to be
32bits even on 64bit architectures.
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Mon, 29 Aug 2005 20:54:35 +0000 (13:54 -0700)]
Merge refs/heads/upstream from /linux/kernel/git/jgarzik/libata-dev
Jeff Garzik [Mon, 29 Aug 2005 19:59:42 +0000 (15:59 -0400)]
Merge /spare/repo/linux-2.6/
David S. Miller [Mon, 29 Aug 2005 19:46:22 +0000 (12:46 -0700)]
[SPARC64]: More fully work around Spitfire Errata 51.
It appears that a memory barrier soon after a mispredicted
branch, not just in the delay slot, can cause the hang
condition of this cpu errata.
So move them out-of-line, and explicitly put them into
a "branch always, predict taken" delay slot which should
fully kill this problem.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 29 Aug 2005 19:46:07 +0000 (12:46 -0700)]
[SPARC64]: Make debugging spinlocks usable again.
When the spinlock routines were moved out of line into
kernel/spinlock.c this made it so that the debugging
spinlocks record lock acquisition program counters in the
kernel/spinlock.c functions not in their callers.
This makes the debugging info kind of useless.
So record the correct caller's program counter and
now this feature is useful once more.
Signed-off-by: David S. Miller <davem@davemloft.net>
Kumar Gala [Mon, 29 Aug 2005 19:45:44 +0000 (12:45 -0700)]
[SPARC]: remove use of asm/segment.h
Removed sparc architecture specific users of asm/segment.h and
asm-sparc/segment.h itself
Signed-off-by: Kumar Gala <kumar.gala@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kumar Gala [Mon, 29 Aug 2005 19:45:30 +0000 (12:45 -0700)]
[SPARC64]: remove use of asm/segment.h
Removed sparc64 architecture specific users of asm/segment.h and
asm-sparc64/segment.h itself
Signed-off-by: Kumar Gala <kumar.gala@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 29 Aug 2005 19:45:11 +0000 (12:45 -0700)]
[SPARC64]: Revamp Spitfire error trap handling.
Current uncorrectable error handling was poor enough
that the processor could just loop taking the same
trap over and over again. Fix things up so that we
at least get a log message and perhaps even some register
state.
In the process, much consolidation became possible,
particularly with the correctable error handler.
Prefix assembler and C function names with "spitfire"
to indicate that these are for Ultra-I/II/IIi/IIe only.
More work is needed to make these routines robust and
featureful to the level of the Ultra-III error handlers.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 29 Aug 2005 19:44:57 +0000 (12:44 -0700)]
[SPARC64]: Do not call winfix_dax blindly
Verify we really are taking a data access exception trap, at TL1, from
one of the window spill/fill handlers.
Else call a new function, data_access_exception_tl1, to log the error.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 29 Aug 2005 19:44:40 +0000 (12:44 -0700)]
[SPARC64]: Fix trap state reading for instruction_access_exception.
1) Read ASI_IMMU SFSR not ASI_DMMU.
2) IMMU has no SFAR, read TPC instead
3) Delete old and incorrect comment about the DTLB protection
trap having a dependency on the SFSR contents in order to
function correctly
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Garzik [Mon, 29 Aug 2005 19:12:56 +0000 (15:12 -0400)]
[libata sata_nv] NVIDIA ok'd license change from OSL+GPL to GPL
Al Viro [Sun, 28 Aug 2005 02:52:22 +0000 (03:52 +0100)]
[PATCH] missing include in smc-ultra
Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Al Viro [Sun, 28 Aug 2005 02:47:50 +0000 (03:47 +0100)]
[PATCH] missing include in tda80xx
Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Al Viro [Sun, 28 Aug 2005 02:19:14 +0000 (03:19 +0100)]
[PATCH] mod_devicetable.h fixes
* ieee1394_device_id has kernel_ulong_t field after an odd number of
__u32 ones. Since mod_devicetable.h is included both from kernel and
from host build helper, we may be in trouble if we are building on
32bit host for 64bit target - userland sees unsigned long long,
kernel sees unsigned long and while their sizes match, alignments
might not. Fixed by forcing alignment. Fortunately, almost nobody
else needs that - the rest of such fields is naturally aligned as it
is.
* of_device_id has void * in it. Host userland helpers need
kernel_ulong_t instead, since their void * might have nothing to do
with the kernel one. Fixed in the same way it's done for similar
problems in pcmcia_device_id (ifdef __KERNEL__).
* pcmcia_device_id has the same problem as ieee1394_device_id. Fixed
the same way.
Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
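The alignment problem described in the first bullet above can be
illustrated with two struct layouts. This sketch assumes a GCC or Clang
compiler (for __attribute__((aligned))) and models the 64-bit target's
kernel_ulong_t with uint64_t. The struct and field names are
illustrative, not the real ieee1394_device_id.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for what a 32-bit host sees as the 64-bit target's
 * kernel_ulong_t (an unsigned long long). */
typedef uint64_t demo_kernel_ulong_t;

/* An odd number of 32-bit fields followed by a 64-bit one. Where
 * 64-bit integers are only 4-byte aligned (e.g. the i386 ABI),
 * driver_data lands at offset 12. Where they are 8-byte aligned, it
 * lands at offset 16. */
struct demo_id_unaligned {
    uint32_t match_flags;
    uint32_t vendor_id;
    uint32_t model_id;
    demo_kernel_ulong_t driver_data;
};

/* Forcing the alignment makes the layout the same on both kinds of
 * host: driver_data always starts at offset 16. */
struct demo_id_aligned {
    uint32_t match_flags;
    uint32_t vendor_id;
    uint32_t model_id;
    demo_kernel_ulong_t driver_data
        __attribute__((aligned(sizeof(demo_kernel_ulong_t))));
};
```

Comparing offsetof(struct demo_id_unaligned, driver_data) on a 32-bit
and a 64-bit build shows the mismatch, while the aligned variant gives
the same offset on both.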
Benjamin LaHaise [Sun, 28 Aug 2005 22:05:17 +0000 (18:05 -0400)]
[PATCH] new name for 2.6.14
We've had Woozy Numbat for a while now. Here's an updated name care of
Jeff Garzik and myself.
Signed-off-by: Benjamin LaHaise <bcrl@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Linus Torvalds [Mon, 29 Aug 2005 17:36:48 +0000 (10:36 -0700)]
Merge HEAD from /linux/kernel/git/roland/infiniband.git
Linus Torvalds [Mon, 29 Aug 2005 17:35:43 +0000 (10:35 -0700)]
Merge HEAD from /home/rmk/linux-2.6-arm.git
Linus Torvalds [Mon, 29 Aug 2005 17:35:21 +0000 (10:35 -0700)]
Merge HEAD from /home/rmk/linux-2.6-mmc.git
Linus Torvalds [Mon, 29 Aug 2005 17:34:59 +0000 (10:34 -0700)]
Merge HEAD from /home/rmk/linux-2.6-serial.git
Linus Torvalds [Mon, 29 Aug 2005 17:34:31 +0000 (10:34 -0700)]
Merge HEAD from /home/rmk/linux-2.6-ucb.git
Linus Torvalds [Mon, 29 Aug 2005 17:04:37 +0000 (10:04 -0700)]
Merge refs/heads/upstream from /linux/kernel/git/jgarzik/netdev-2.6
Linus Torvalds [Mon, 29 Aug 2005 17:03:46 +0000 (10:03 -0700)]
Merge refs/heads/upstream from /linux/kernel/git/jgarzik/libata-dev
Steven Rostedt [Mon, 29 Aug 2005 15:44:09 +0000 (11:44 -0400)]
[PATCH] convert signal handling of NODEFER to act like other Unix boxes.
It has been reported that the way Linux handles NODEFER for signals is
not consistent with the way other Unix boxes handle it. I've written a
program to test the behavior of how this flag affects signals and had
several reports from people who ran this on various Unix boxes,
confirming that Linux seems to be unique on the way this is handled.
The way NODEFER affects signals on other Unix boxes is as follows:
1) If NODEFER is set, other signals in sa_mask are still blocked.
2) If NODEFER is set and the signal is in sa_mask, then the signal is
still blocked. (Note: this is the behavior of all tested but Linux _and_
NetBSD 2.0 *).
The way NODEFER affects signals on Linux:
1) If NODEFER is set, other signals are _not_ blocked regardless of
sa_mask (Even NetBSD doesn't do this).
2) If NODEFER is set and the signal is in sa_mask, then the signal being
handled is not blocked.
The patch converts signal handling in all current Linux architectures to
the way most Unix boxes work.
Unix boxes that were tested: DU4, AIX 5.2, Irix 6.5, NetBSD 2.0, SFU
3.5 on WinXP, AIX 5.3, Mac OSX, and of course Linux 2.6.13-rcX.
* NetBSD was the only other Unix to behave like Linux on point #2. The
main concern was brought up by point #1, where even NetBSD doesn't behave
like Linux. So with this patch, we leave NetBSD as the lonely one that
behaves differently here with #2.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
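The behavior described above can be probed with a short program in the
spirit of the test mentioned in the commit message. This is not that
program, only a sketch. It installs a SIGUSR1 handler with SA_NODEFER
and SIGUSR2 in sa_mask, then records which of the two signals are
blocked while the handler runs. The variable and function names are
made up for the example.

```c
#define _POSIX_C_SOURCE 200809L   /* for sigaction() under strict modes */
#include <signal.h>
#include <string.h>

/* Filled in by the handler: 1 = blocked, 0 = not blocked, -1 = not run. */
static volatile sig_atomic_t self_blocked = -1;
static volatile sig_atomic_t other_blocked = -1;

static void on_usr1(int signo)
{
    sigset_t cur;

    (void)signo;
    /* A NULL 'set' argument just queries the current signal mask. */
    if (sigprocmask(SIG_SETMASK, NULL, &cur) == 0) {
        self_blocked = sigismember(&cur, SIGUSR1);
        other_blocked = sigismember(&cur, SIGUSR2);
    }
}

/* Deliver SIGUSR1 once to a handler installed with SA_NODEFER and with
 * SIGUSR2 in sa_mask. Returns 0 if the probe ran, -1 on setup failure. */
static int probe_nodefer(void)
{
    struct sigaction sa, old_sa;
    sigset_t probe, old_mask;

    /* Start with both signals unblocked so the result reflects only
     * what signal delivery itself adds to the mask. */
    sigemptyset(&probe);
    sigaddset(&probe, SIGUSR1);
    sigaddset(&probe, SIGUSR2);
    if (sigprocmask(SIG_UNBLOCK, &probe, &old_mask) != 0)
        return -1;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_usr1;
    sa.sa_flags = SA_NODEFER;
    sigemptyset(&sa.sa_mask);
    sigaddset(&sa.sa_mask, SIGUSR2);
    if (sigaction(SIGUSR1, &sa, &old_sa) != 0) {
        sigprocmask(SIG_SETMASK, &old_mask, NULL);
        return -1;
    }

    raise(SIGUSR1);   /* handler runs before raise() returns */

    sigaction(SIGUSR1, &old_sa, NULL);
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    return 0;
}
```

On a kernel with the converted behavior, other_blocked should end up 1
(sa_mask is honoured despite SA_NODEFER) and self_blocked 0. Under the
old Linux behavior from point 1, other_blocked would be 0.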
Andy Fleming [Wed, 24 Aug 2005 23:46:21 +0000 (18:46 -0500)]
[PATCH] PHY Layer fixup
This patch adds back the code that was taken out, thus re-enabling:
* The PHY Layer to initialize without crashing
* Drivers to actually connect to PHYs
* The entire PHY Control Layer
This patch is used by the gianfar driver, and other drivers which are in
development.
Signed-off-by: Andy Fleming <afleming@freescale.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Jeff Garzik [Mon, 29 Aug 2005 00:18:39 +0000 (20:18 -0400)]
[libata] license change, other bits
- changes license of all code from OSL+GPL to plain ole GPL
- except for NVIDIA, who hasn't yet responded about sata_nv
- copyright holders were already contacted privately
- adds info in each driver about where hardware/protocol docs may be
obtained
- where I have made major contributions, updated copyright dates
Linus Torvalds [Sun, 28 Aug 2005 23:41:01 +0000 (16:41 -0700)]
Linux v2.6.13
Pavel Machek [Sun, 28 Aug 2005 21:39:08 +0000 (22:39 +0100)]
[ARM] drop i386-isms from arm Kconfig
This kills i386-specific stuff from arm Kconfig. Please apply.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Heiko Carstens [Sun, 28 Aug 2005 20:22:37 +0000 (13:22 -0700)]
[PATCH] zfcp: bugfix and compile fixes
Bugfix (usage of uninitialized pointer in zfcp_port_dequeue) and compile
fixes for the zfcp device driver.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: James Bottomley <James.Bottomley@steeleye.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Alexey Dobriyan [Sun, 28 Aug 2005 11:33:53 +0000 (15:33 +0400)]
[PATCH] zfcp: fix compilation due to rports changes
struct zfcp_port::scsi_id was removed by commit
3859f6a248cbdfbe7b41663f3a2b51f48e30b281
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Linus Torvalds [Sun, 28 Aug 2005 01:05:14 +0000 (18:05 -0700)]
Merge refs/heads/upstream-fixes from /linux/kernel/git/jgarzik/netdev-2.6
Paul Mackerras [Sat, 27 Aug 2005 23:40:01 +0000 (09:40 +1000)]
[PATCH] Remove race between con_open and con_close
[ Same race and same patch also by Steven Rostedt <rostedt@goodmis.org> ]
I have a laptop (G3 powerbook) which will pretty reliably hit a race
between con_open and con_close late in the boot process and oops in
vt_ioctl due to tty->driver_data being NULL.
What happens is this: process A opens /dev/tty6; it comes into
con_open() (drivers/char/vt.c) and assigns a non-NULL value to
tty->driver_data. Then process A closes that and concurrently process
B opens /dev/tty6. Process A gets through con_close() and clears
tty->driver_data, since tty->count == 1. However, before process A
can decrement tty->count, we switch to process B (e.g. at the
down(&tty_sem) call at drivers/char/tty_io.c line 1626).
So process B gets to run and comes into con_open with tty->count == 2,
as tty->count is incremented (in init_dev) before con_open is called.
Because tty->count != 1, we don't set tty->driver_data. Then when the
process tries to do anything with that fd, it oopses.
The simple and effective fix for this is to test tty->driver_data
rather than tty->count in con_open. The testing and setting of
tty->driver_data is serialized with respect to the clearing of
tty->driver_data in con_close by the console_sem. We can't get a
situation where con_open sees tty->driver_data != NULL and then
con_close on a different fd clears tty->driver_data, because
tty->count is incremented before con_open is called. Thus this patch
eliminates the race, and in fact with this patch my laptop doesn't
oops.
Signed-off-by: Paul Mackerras <paulus@samba.org>
[ Same patch
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
in http://marc.theaimsgroup.com/?l=linux-kernel&m=112450820432121&w=2 ]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
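The interleaving described above can be replayed in a small sequential
model. This is only a sketch under stated assumptions: struct model_tty
and every function below are hypothetical stand-ins, the console_sem
locking is not modelled, and the "context switch" is simply the order of
the calls in replay_race().

```c
#include <stddef.h>

struct model_tty {
    int count;          /* open count, bumped before con_open runs */
    void *driver_data;  /* NULL means "no console attached" */
};

static int console_data;  /* dummy object for driver_data to point at */

/* Stand-in for the init_dev() step that increments tty->count. */
void model_init_dev(struct model_tty *tty) { tty->count++; }

/* Old test: only the "first" opener sets driver_data. */
void con_open_old(struct model_tty *tty)
{
    if (tty->count == 1)
        tty->driver_data = &console_data;
}

/* New test: set driver_data whenever it is currently NULL. */
void con_open_new(struct model_tty *tty)
{
    if (tty->driver_data == NULL)
        tty->driver_data = &console_data;
}

/* con_close step: the last closer clears driver_data. */
void model_con_close(struct model_tty *tty)
{
    if (tty->count == 1)
        tty->driver_data = NULL;
}

/* Later step of the release path that drops the count. */
void model_release_count(struct model_tty *tty) { tty->count--; }

/* Replays the race; returns nonzero if process B is left with a NULL
 * driver_data, which is the state that oopses in vt_ioctl. */
int replay_race(void (*con_open)(struct model_tty *))
{
    struct model_tty tty = { 0, NULL };

    model_init_dev(&tty);       /* A opens: count 0 -> 1 */
    con_open(&tty);             /* A: driver_data set */
    model_con_close(&tty);      /* A closes: count == 1, data cleared */
    /* switch to B before A has decremented count */
    model_init_dev(&tty);       /* B opens: count 1 -> 2 */
    con_open(&tty);             /* B: old test sees count == 2 */
    model_release_count(&tty);  /* A finishes: count 2 -> 1 */

    return tty.driver_data == NULL;
}
```

In this model replay_race(con_open_old) reports the NULL pointer and
replay_race(con_open_new) does not.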
Andreas Herrmann [Sat, 27 Aug 2005 18:07:54 +0000 (11:07 -0700)]
[PATCH] zfcp: add rports to enable scsi_add_device to work again
This patch fixes a severe problem with 2.6.13-rc7.
Due to recent SCSI changes it is not possible to add any LUNs to the zfcp
device driver anymore. With registration of remote ports this is fixed.
Signed-off-by: Andreas Herrmann <aherrman@de.ibm.com>
Acked-by: James Bottomley <jejb@steeleye.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jan Blunck [Sat, 27 Aug 2005 18:07:52 +0000 (11:07 -0700)]
[PATCH] sg.c: fix a memory leak in devices seq_file implementation
I know that scsi procfs is legacy code but this is a fix for a memory leak.
While reading through sg.c I realized that the implementation of
/proc/scsi/sg/devices with seq_file is leaking memory due to freeing the
pointer returned by the next() iterator method. Since next() might return
NULL or an error this is wrong. This patch fixes it by using the
seq_file's private field for holding the reference to the iterator object.
Here is a small shell script to trigger the leak. Use slabtop to watch
the size-32 usage grow and grow.
#!/bin/sh
while true; do
	cat /proc/scsi/sg/devices > /dev/null
done
Signed-off-by: Jan Blunck <j.blunck@tu-harburg.de>
Acked-by: James Bottomley <James.Bottomley@steeleye.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
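The leak pattern described above can be illustrated with a userspace model
of a start/next/stop iterator. This is a sketch, not the sg.c code: struct
model_seq, struct model_iter and all functions are hypothetical, malloc()
and free() stand in for the kernel allocator, and live_allocs is only a
counter added to make the leak visible.

```c
#include <stdlib.h>

struct model_seq  { void *private; };      /* mimics a seq_file private field */
struct model_iter { int index; int max; };

static int live_allocs;  /* iterator objects currently allocated */

/* start(): allocate the iterator and remember it in s->private. */
void *model_start(struct model_seq *s, int max)
{
    struct model_iter *it = malloc(sizeof(*it));

    if (!it)
        return NULL;
    live_allocs++;
    it->index = 0;
    it->max = max;
    s->private = it;
    return it->index < it->max ? it : NULL;
}

/* next(): advance; returns NULL once the end is reached. */
void *model_next(void *v)
{
    struct model_iter *it = v;

    return ++it->index < it->max ? it : NULL;
}

/* Leaky variant: frees whatever next() returned last, which is NULL at
 * the end of the sequence, so the iterator is never released. */
void model_stop_leaky(struct model_seq *s, void *v)
{
    (void)s;
    if (v) {
        free(v);
        live_allocs--;
    }
}

/* Fixed variant: frees the reference kept in the private field. */
void model_stop_fixed(struct model_seq *s, void *v)
{
    (void)v;
    if (s->private) {
        free(s->private);
        s->private = NULL;
        live_allocs--;
    }
}

typedef void (*model_stop_fn)(struct model_seq *, void *);

/* One full read of the sequence; returns the number of leaked iterators. */
int model_read_all(model_stop_fn stop)
{
    struct model_seq s = { NULL };
    int before = live_allocs;
    void *v = model_start(&s, 3);

    while (v)
        v = model_next(v);
    stop(&s, v);
    return live_allocs - before;
}
```

Each call to model_read_all(model_stop_leaky) leaves one object behind,
which corresponds to one small allocation leaked per read of the file.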
Patrick Boettcher [Sat, 27 Aug 2005 17:30:30 +0000 (19:30 +0200)]
[PATCH] fix for race problem in DVB USB drivers (dibusb)
Fixed race between submitting streaming URBs in the driver and starting
the actual transfer in hardware (demodulator and USB controller) which
sometimes led to garbled data transfers. URBs are now submitted first,
then the transfer is enabled. Dibusb devices and clones are now fully
functional again.
Signed-off-by: Patrick Boettcher <pb@linuxtv.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
James Morris [Sat, 27 Aug 2005 11:47:06 +0000 (13:47 +0200)]
[PATCH] Fix capifs bug in initialization error path.
This fixes a bug in the capifs initialization code, where the
filesystem is not unregistered if kern_mount() fails.
Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: Karsten Keil <kkeil@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Eric W. Biederman [Sat, 27 Aug 2005 06:56:18 +0000 (00:56 -0600)]
[PATCH] acpi_shutdown: Only prepare for power off on power_off
When acpi_sleep_prepare was moved into a shutdown method we
started calling it for all shutdowns.
It appears this triggers some systems to power off on reboot.
Avoid this by only calling acpi_sleep_prepare if we are going to power
off the system.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Al Viro [Sat, 27 Aug 2005 05:48:15 +0000 (06:48 +0100)]
[PATCH] mmapper_kern.c fixes [buffer overruns]
- copy_from_user() can fail; ->write() must check its return value.
- severe buffer overruns both in ->read() and ->write() - lseek to the
end (i.e. to mmapper_size) and
if (count + *ppos > mmapper_size)
count = count + *ppos - mmapper_size;
will do absolutely nothing. Then it will call
copy_to_user(buf,&v_buf[*ppos],count);
with obvious results (similar for ->write()).
Fixed by turning read to simple_read_from_buffer() and by doing
normal limiting of count in ->write().
- gratuitous lock_kernel() in ->mmap() - it's useless there.
- lots of gratuitous includes.
Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
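For reference, the quoted check replaces count with the overshoot
(count + *ppos - mmapper_size) instead of the bytes remaining, which is
why it changes nothing when *ppos == mmapper_size. A minimal sketch of the
"normal limiting" is shown below. It is a userspace stand-in written in
the spirit of simple_read_from_buffer(), not that function itself:
model_read_from_buffer() is a hypothetical name and memcpy() replaces
copy_to_user().

```c
#include <stddef.h>
#include <string.h>

/* Copies at most count bytes from from[*ppos..available) into to.
 * Returns the number of bytes copied, 0 at or past the end of the
 * buffer, or -1 for a negative position. */
long model_read_from_buffer(void *to, size_t count, long *ppos,
                            const void *from, size_t available)
{
    long pos = *ppos;

    if (pos < 0)
        return -1;
    if ((size_t)pos >= available)
        return 0;                           /* nothing left to read */
    if (count > available - (size_t)pos)
        count = available - (size_t)pos;    /* clamp to bytes remaining */

    memcpy(to, (const char *)from + pos, count);
    *ppos = pos + (long)count;
    return (long)count;
}
```

A read that starts at the end of the buffer copies nothing here,
whereas the quoted check would have let the full count through.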
Francois Romieu [Tue, 23 Aug 2005 23:14:23 +0000 (01:14 +0200)]
[PATCH] r8169: avoid conflict between revisions 2 and 3 of the Linksys EG1032
Both revisions share the same PCI device ID and vendor ID but revision 2
of the device uses SysKonnect's chipset whereas revision 3 of the device
uses Realtek's 8169 chipset.
Credit goes to Christiaan Lutzer <mythtv.lutzer@gmail.com> for reporting
the issue and giving the actual value for the different revisions.
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Ralf Baechle [Wed, 24 Aug 2005 17:06:36 +0000 (18:06 +0100)]
[PATCH] SMP rewrite of mkiss
Rewrite the mkiss driver to make it SMP-proof following the example of
6pack.c.
Signed-off-by: Ralf Baechle DL5RB <ralf@linux-mips.org>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Ralf Baechle [Wed, 24 Aug 2005 17:01:33 +0000 (18:01 +0100)]
[PATCH] Fix 6pack setting of MAC address
Don't check type of sax25_family; dev_set_mac_address has already done
that before and anyway, the type to check against would have been
ARPHRD_AX25. We only got away because AF_AX25 and ARPHRD_AX25 both happen
to be defined to the same value.
Don't check sax25_ndigis either; its value is insignificant for the
purpose of setting the MAC address and the check has been shown to break
some application software for no good reason.
Signed-off-by: Ralf Baechle DL5RB <ralf@linux-mips.org>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Ralf Baechle [Thu, 25 Aug 2005 18:38:30 +0000 (19:38 +0100)]
[PATCH] 6pack Timer initialization
I dropped the timer initialization bits by accident when sending the
p-persistence fix. This patch gets the driver to work again on half-duplex
links.
Signed-off-by: Ralf Baechle DL5RB <ralf@linux-mips.org>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Jeff Garzik [Sat, 27 Aug 2005 08:20:12 +0000 (04:20 -0400)]
[libata scsi] fix read/write translation edge cases
Fix bugs for unlikely edge cases noticed by Douglas Gilbert:
- When READ(6)/WRITE(6) sector count == 0, treat it as 256 sectors
- For other READ(x)/WRITE(x), when sector count == 0, error.
We don't support successfully completing zero-length transfers at
this time.
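The two rules above can be sketched as a single helper. This is an
illustration only, not the libata translation code: the enum and
model_xlat_sector_count() are hypothetical names, and the caller is
assumed to have already extracted the raw count from the CDB.

```c
#include <stdint.h>

enum model_cdb_kind {
    MODEL_RW6,      /* READ(6) / WRITE(6) */
    MODEL_RW_OTHER  /* any other READ(x) / WRITE(x) */
};

/* Translates the raw sector count taken from the CDB.
 * Returns 0 and stores the effective count in *sectors, or -1 when the
 * command must be failed (zero-length transfer on a non-6-byte CDB). */
int model_xlat_sector_count(enum model_cdb_kind kind, uint32_t raw,
                            uint32_t *sectors)
{
    if (raw == 0) {
        if (kind != MODEL_RW6)
            return -1;  /* zero-length transfers are not supported */
        raw = 256;      /* READ(6)/WRITE(6): a count of 0 means 256 */
    }
    *sectors = raw;
    return 0;
}
```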
Jeff Garzik [Sat, 27 Aug 2005 08:13:52 +0000 (04:13 -0400)]
libata: fix a few alan-isms
Roland Dreier [Thu, 25 Aug 2005 20:40:04 +0000 (13:40 -0700)]
[PATCH] IB: move include files to include/rdma
Move the InfiniBand headers from drivers/infiniband/include to include/rdma.
This allows InfiniBand-using code to live elsewhere, and lets us remove the
ugly EXTRA_CFLAGS include path from the InfiniBand Makefiles.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Michael S. Tsirkin [Wed, 24 Aug 2005 21:41:51 +0000 (14:41 -0700)]
[PATCH] IPoIB: Fix device removal race
Currently we may have work scheduled in default kernel workqueue when
the device is going down. The device could get freed before this
workqueue gets serviced. I am actually seeing this causing system
hangs.
The following patch fixes this by using ipoib_workqueue which gets
flushed when the device is going down.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Sean Hefty [Fri, 19 Aug 2005 20:50:33 +0000 (13:50 -0700)]
[PATCH] IB: Add handling for ABORT and STOP RMPP MADs.
Add handling for ABORT / STOP RMPP MADs.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Sean Hefty [Fri, 19 Aug 2005 20:46:34 +0000 (13:46 -0700)]
[PATCH] IB: fix userspace CM deadlock
Fix deadlock condition resulting from trying to destroy a cm_id
from the context of a CM thread. The synchronization around the
ucm context structure is simplified as a result, and some simple
code cleanup is included.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Roland Dreier [Fri, 19 Aug 2005 19:03:17 +0000 (12:03 -0700)]
[PATCH] IPoIB: Set full membership bit in P_Keys
Always make sure that the full membership bit is set in the P_Keys
that IPoIB uses. This makes sure that all hosts join the correct
multicast groups so that hosts that are partial partition members
can talk to the rest of the network.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
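As a sketch of what setting the bit amounts to: in InfiniBand the top bit
of the 16-bit P_Key marks full membership and the low 15 bits identify the
partition. The helpers below are hypothetical and only illustrate that
layout; they are not taken from the IPoIB code.

```c
#include <stdint.h>

#define MODEL_PKEY_FULL_MEMBER 0x8000u  /* membership bit (bit 15) */
#define MODEL_PKEY_BASE_MASK   0x7fffu  /* partition number bits */

/* Returns the P_Key with the full membership bit forced on. */
uint16_t model_pkey_full(uint16_t pkey)
{
    return (uint16_t)(pkey | MODEL_PKEY_FULL_MEMBER);
}

/* Nonzero if both P_Keys name the same partition, ignoring membership. */
int model_pkey_same_partition(uint16_t a, uint16_t b)
{
    return (a & MODEL_PKEY_BASE_MASK) == (b & MODEL_PKEY_BASE_MASK);
}
```

A partial member holding 0x7fff and a full member holding 0xffff are in
the same partition, and both map to 0xffff after model_pkey_full().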
Roland Dreier [Fri, 19 Aug 2005 17:59:31 +0000 (10:59 -0700)]
[PATCH] IB/mthca: Add SRQ implementation
Add mthca support for shared receive queues (SRQs),
including userspace SRQs.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Roland Dreier [Fri, 19 Aug 2005 17:36:11 +0000 (10:36 -0700)]
[PATCH] IB/mthca: Handle context tables smaller than our chunk size
When creating a table in context memory where the table is smaller
than our chunk size, we don't want to allocate and map a full chunk.
Instead, allocate just enough memory to cover the table.
This can be pretty simple because all tables are a power-of-2 size, so
either the table is a multiple of the chunk size, or it's smaller than
one chunk.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
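The sizing argument above can be written down as follows. This is a
sketch, not the mthca code: the function names are hypothetical, and it
additionally assumes that the chunk size is itself a power of two, which
is what makes "multiple of the chunk size, or smaller than one chunk" the
only two cases.

```c
#include <stddef.h>

/* Nonzero if n is a power of two. */
int model_is_pow2(size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Bytes to allocate per chunk: a full chunk, or just the table when the
 * table is smaller than one chunk. */
size_t model_chunk_alloc_size(size_t table_size, size_t chunk_size)
{
    return table_size < chunk_size ? table_size : chunk_size;
}

/* Number of chunks needed to cover the table. With power-of-2 sizes the
 * division is exact whenever table_size >= chunk_size. */
size_t model_chunk_count(size_t table_size, size_t chunk_size)
{
    return table_size < chunk_size ? 1 : table_size / chunk_size;
}
```

In both cases model_chunk_count() times model_chunk_alloc_size() equals
the table size, so no memory beyond the table is allocated.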
Roland Dreier [Fri, 19 Aug 2005 17:33:35 +0000 (10:33 -0700)]
[PATCH] IB/mthca: Move WQE structures into their own header
Move the definitions of the WQE structures from mthca_qp.c into
mthca_wqe.h, so that we'll be able to share them when we add the
SRQ code in mthca_srq.c.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Roland Dreier [Fri, 19 Aug 2005 16:19:05 +0000 (09:19 -0700)]
[PATCH] IB/mthca: Simplify handling of completions with error
Mem-free HCAs never generate error CQEs that complete multiple WQEs,
so just skip the call to mthca_free_err_wqe() for them rather than
having logic to handle the mem-free case in mthca_free_err_wqe().
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Roland Dreier [Thu, 18 Aug 2005 20:39:31 +0000 (13:39 -0700)]
[PATCH] IB/mthca: Factor out common queue alloc code
Clean up the allocation of memory for queues by factoring out the
common code into mthca_buf_alloc() and mthca_buf_free(). Now CQs and
QPs share the same queue allocation code, which we'll also use for SRQs.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Roland Dreier [Thu, 18 Aug 2005 19:24:13 +0000 (12:24 -0700)]
[PATCH] IB: userspace SRQ support
Add SRQ support to userspace verbs module. This adds several commands
and associated structures, but it's OK to do this without bumping the
ABI version because the commands are added at the end of the list so
they don't change the existing numbering. There are two cases to
worry about:
1. New kernel, old userspace. This is OK because old userspace simply
won't try to use the new SRQ commands. None of the old commands are
changed.
2. Old kernel, new userspace. This works perfectly as long as
userspace doesn't try to use SRQ commands. If userspace tries to
use SRQ commands, it will get EINVAL, which is perfectly
reasonable: the kernel doesn't support SRQs, so we couldn't do any
better.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
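The compatibility argument above can be sketched with a command table
indexed by command number. Everything here is a hypothetical model, not
the uverbs code: the command names, the handler tables and
model_dispatch() only illustrate why appending commands keeps the old
numbering intact and why an old kernel answers new commands with EINVAL.

```c
#include <errno.h>
#include <stddef.h>

enum model_cmd {
    MODEL_CMD_QUERY,        /* existing commands keep their numbers */
    MODEL_CMD_CREATE_QP,
    MODEL_CMD_CREATE_SRQ,   /* new commands are appended at the end */
    MODEL_CMD_DESTROY_SRQ
};

typedef int (*model_handler)(void);

static int model_ok(void) { return 0; }

/* Table of an "old" kernel: it knows nothing about SRQ commands. */
static const model_handler old_table[] = {
    [MODEL_CMD_QUERY]     = model_ok,
    [MODEL_CMD_CREATE_QP] = model_ok,
};

/* Table of a "new" kernel: same entries plus the appended ones. */
static const model_handler new_table[] = {
    [MODEL_CMD_QUERY]       = model_ok,
    [MODEL_CMD_CREATE_QP]   = model_ok,
    [MODEL_CMD_CREATE_SRQ]  = model_ok,
    [MODEL_CMD_DESTROY_SRQ] = model_ok,
};

/* Unknown or unimplemented command numbers are rejected with -EINVAL. */
static int model_dispatch(const model_handler *table, size_t n, unsigned cmd)
{
    if (cmd >= n || table[cmd] == NULL)
        return -EINVAL;
    return table[cmd]();
}

int model_old_kernel(unsigned cmd)
{
    return model_dispatch(old_table,
                          sizeof(old_table) / sizeof(old_table[0]), cmd);
}

int model_new_kernel(unsigned cmd)
{
    return model_dispatch(new_table,
                          sizeof(new_table) / sizeof(new_table[0]), cmd);
}
```

Old userspace only ever issues the first two commands, which both tables
handle identically; new userspace on the old table gets -EINVAL for the
SRQ commands and success for everything else.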