firefly-linux-kernel-4.4.55.git
17 years ago[SCTP]: Set assoc_id correctly during INIT collision.
Vlad Yasevich [Fri, 4 May 2007 20:55:27 +0000 (13:55 -0700)]
[SCTP]: Set assoc_id correctly during INIT collision.

During the INIT/COOKIE-ACK collision cases, it's possible to get
into a situation where the association id is not yet set at the time
of the user event generation.  As a result, user events have an
association id set to 0 which will confuse applications.

This happens if we hit case B of duplicate cookie processing.
In the particular example found and provided by Oscar Isaula
<Oscar.Isaula@motorola.com>, flow looks like this:
A B
---- INIT------->  (lost)
    <---------INIT------
---- INIT-ACK--->
    <------ Cookie ECHO

When the Cookie Echo is received, we end up trying to update the
association that was created on A as a result of the (lost) INIT,
but that association doesn't have the ID set yet.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[SCTP]: Re-order SCTP initializations to avoid race with sctp_rcv()
Sridhar Samudrala [Fri, 4 May 2007 20:36:30 +0000 (13:36 -0700)]
[SCTP]: Re-order SCTP initializations to avoid race with sctp_rcv()

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[SCTP]: Fix the SO_REUSEADDR handling to be similar to TCP.
Vlad Yasevich [Fri, 4 May 2007 20:34:49 +0000 (13:34 -0700)]
[SCTP]: Fix the SO_REUSEADDR handling to be similar to TCP.

Update the SO_REUSEADDR handling to also check for listen state.  This
was muliple listening server sockets can't be created and they will
not steal packets from each other.

Reported by Paolo Galtieri <pgaltieri@mvista.com>

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[SCTP]: Verify all destination ports in sctp_connectx.
Vlad Yasevich [Fri, 4 May 2007 20:34:09 +0000 (13:34 -0700)]
[SCTP]: Verify all destination ports in sctp_connectx.

We need to make sure that all destination ports are the same, since
the association really must not connect to multiple different ports
at once.  This was reported on the sctp-impl list.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[XFRM] SPD info TLV aggregation
Jamal Hadi Salim [Fri, 4 May 2007 19:55:39 +0000 (12:55 -0700)]
[XFRM] SPD info TLV aggregation

Aggregate the SPD info TLVs.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[XFRM] SAD info TLV aggregationx
Jamal Hadi Salim [Fri, 4 May 2007 19:55:13 +0000 (12:55 -0700)]
[XFRM] SAD info TLV aggregationx

Aggregate the SAD info TLVs.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[AF_RXRPC]: Sort out MTU handling.
David Howells [Fri, 4 May 2007 19:41:11 +0000 (12:41 -0700)]
[AF_RXRPC]: Sort out MTU handling.

Sort out the MTU determination and handling in AF_RXRPC:

 (1) If it's present, parse the additional information supplied by the peer at
     the end of the ACK packet (struct ackinfo) to determine the MTU sizes
     that peer is willing to support.

 (2) Initialise the MTU size to that peer from the kernel's routing records.

 (3) Send ACKs rather than ACKALLs as the former carry the additional info,
     and the latter do not.

 (4) Declare the interface MTU size in outgoing ACKs as a maximum amount of
     data that can be stuffed into an RxRPC packet without it having to be
     fragmented to come in this computer's NIC.

 (5) If sendmsg() is given MSG_MORE then it should allocate an skb of the
     maximum size rather than one just big enough for the data it's got left
     to process on the theory that there is more data to come that it can
     append to that packet.

     This means, for example, that if AFS does a large StoreData op, all the
     packets barring the last will be filled to the maximum unfragmented size.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[AF_IUCV/IUCV] : Add missing section annotations
Heiko Carstens [Fri, 4 May 2007 19:23:27 +0000 (12:23 -0700)]
[AF_IUCV/IUCV] : Add missing section annotations

Add missing section annotations and found and fixed some
Coding Style issues.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Frank Pavlic <fpavlic@de.ibm.com>
17 years ago[AF_IUCV]: Implementation of a skb backlog queue
Jennifer Hunt [Fri, 4 May 2007 19:22:07 +0000 (12:22 -0700)]
[AF_IUCV]: Implementation of a skb backlog queue

With the inital implementation we missed to implement a skb backlog
queue . The result is that socket receive processing tossed packets.
Since AF_IUCV connections are working synchronously it leads to
connection hangs. Problems with read, close and select also occured.

Using a skb backlog queue is fixing all of these problems .

Signed-off-by: Jennifer Hunt <jenhunt@us.ibm.com>
Signed-off-by: Frank Pavlic <fpavlic@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETLINK]: Remove bogus BUG_ON
Patrick McHardy [Fri, 4 May 2007 19:15:11 +0000 (12:15 -0700)]
[NETLINK]: Remove bogus BUG_ON

Remove bogus BUG_ON(mutex_is_locked(nlk_sk(sk)->cb_mutex)), when the
netlink_kernel_create caller specifies an external mutex it might
validly be locked.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[IPV6]: Some cleanups in include/net/ipv6.h
Eric Dumazet [Fri, 4 May 2007 00:39:04 +0000 (17:39 -0700)]
[IPV6]: Some cleanups in include/net/ipv6.h

1) struct ip6_flowlabel : moves 'users' field to avoid two 32bits
   holes for 64bit arches. Shrinks by 8 bytes sizeof(struct
   ip6_flowlabel)

2) ipv6_addr_cmp() and ipv6_addr_copy() dont need (void *) casts :
   Compiler might take into account natural alignement of in6_addr
   structs to emit better code for memcpy()/memcmp() Casts to (void *)
   force byte accesses.

3) ipv6_addr_prefix() optimization :

Better to clear whole struct, as compiler can emit better code for
memset(addr, 0, 16) (2 stores on x86_64), and avoid some conditional
branches.

# size vmlinux.after vmlinux.before
   text    data     bss     dec     hex filename
5262262  647612  557432 6467306  62aeea vmlinux.after
5262550  647612  557432 6467594  62b00a vmlinux.before

thats 288 bytes saved.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[TCP]: zero out rx_opt in tcp_disconnect()
Srinivas Aji [Fri, 4 May 2007 00:32:28 +0000 (17:32 -0700)]
[TCP]: zero out rx_opt in tcp_disconnect()

When the server drops its connection, NFS client reconnects using the
same socket after disconnecting. If the new connection's SYN,ACK
doesn't contain the TCP timestamp option and the old connection's did,
tp->tcp_header_len is recomputed assuming no timestamp header but
tp->rx_opt.tstamp_ok remains set. Then tcp_build_and_update_options()
adds in a timestamp option past the end of the allocated TCP header,
overwriting TCP data, or when the data is in skb_shinfo(skb)->frags[],
overwriting skb_shinfo(skb) causing a crash soon after. (The issue was
debugged from such a crash.)

Similarly, wscale_ok and sack_ok also get set based on the SYN,ACK
packet but not reset on disconnect, since they are zeroed out at
initialization. The patch zeroes out the entire tp->rx_opt struct in
tcp_disconnect() to avoid this sort of problem.

Signed-off-by: Srinivas Aji <Aji_Srinivas@emc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[BNX2]: Fix TSO problem with small MSS.
Michael Chan [Fri, 4 May 2007 00:23:35 +0000 (17:23 -0700)]
[BNX2]: Fix TSO problem with small MSS.

Remove the check for skb->len greater than MTU when doing TSO.  When
the destination has a smaller MSS than the source, a TSO packet may
be smaller than the MTU at the source and we still need to process it
as a TSO packet.

Thanks to Brian Ristuccia <bristuccia@starentnetworks.com> for
reporting the problem.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NET]: Rework dev_base via list_head (v3)
Pavel Emelianov [Thu, 3 May 2007 22:13:45 +0000 (15:13 -0700)]
[NET]: Rework dev_base via list_head (v3)

Cleanup of dev_base list use, with the aim to simplify making device
list per-namespace. In almost every occasion, use of dev_base variable
and dev->next pointer could be easily replaced by for_each_netdev
loop. A few most complicated places were converted to using
first_netdev()/next_netdev().

Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Acked-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[TCP] Highspeed: Limited slow-start is nowadays in tcp_slow_start
Ilpo Järvinen [Thu, 3 May 2007 20:28:35 +0000 (13:28 -0700)]
[TCP] Highspeed: Limited slow-start is nowadays in tcp_slow_start

Reuse limited slow-start (RFC3742) included into tcp_cong instead
of having another implementation in High Speed TCP.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[BNX2]: Update version and reldate.
Michael Chan [Thu, 3 May 2007 20:25:32 +0000 (13:25 -0700)]
[BNX2]: Update version and reldate.

Update version to 1.5.10.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[BNX2]: Print bus information for PCIE devices.
Michael Chan [Thu, 3 May 2007 20:25:11 +0000 (13:25 -0700)]
[BNX2]: Print bus information for PCIE devices.

Fix the code to print PCI or PCIE bus information for all devices.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[BNX2]: Add 1-shot MSI handler for 5709.
Michael Chan [Thu, 3 May 2007 20:24:48 +0000 (13:24 -0700)]
[BNX2]: Add 1-shot MSI handler for 5709.

The 5709 supports the one-shot MSI handler similar to some of the tg3
chips.  In this mode, the MSI disables itself automatically until it
is re-enabled at the end of NAPI poll.

Put the request_irq/free_irq logic in common procedures.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[BNX2]: Restructure PHY event handling.
Michael Chan [Thu, 3 May 2007 20:24:23 +0000 (13:24 -0700)]
[BNX2]: Restructure PHY event handling.

Restructure by adding bnx2_phy_event_is_set() to make code cleaner
and easier to understand.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[BNX2]: Add indirect spinlock.
Michael Chan [Thu, 3 May 2007 20:24:05 +0000 (13:24 -0700)]
[BNX2]: Add indirect spinlock.

The indirect register access method will be used by more than one
caller in BH context (NAPI poll and timer), so a spinlock is required.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[BNX2]: Add support for 5709 Serdes.
Michael Chan [Thu, 3 May 2007 20:23:41 +0000 (13:23 -0700)]
[BNX2]: Add support for 5709 Serdes.

Add PCI ID and code to support the 5709 Serdes PHY.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[BNX2]: Re-structure the 2.5G Serdes code.
Michael Chan [Thu, 3 May 2007 20:23:13 +0000 (13:23 -0700)]
[BNX2]: Re-structure the 2.5G Serdes code.

Add some common procedures to handle enabling and disabling 2.5G.
Add some missing code to resolve flow control.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[BNX2]: Put MII register offsets in the bnx2 struct.
Michael Chan [Thu, 3 May 2007 20:22:52 +0000 (13:22 -0700)]
[BNX2]: Put MII register offsets in the bnx2 struct.

The 5709 Serdes device uses non-standard MII register offsets.  This
re-structuring will make it easier to support 5709 Serdes.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[BNX2]: Add ipv6 TSO and checksum for 5709.
Michael Chan [Thu, 3 May 2007 20:22:28 +0000 (13:22 -0700)]
[BNX2]: Add ipv6 TSO and checksum for 5709.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[BNX2]: Update 5709 firmware.
Michael Chan [Thu, 3 May 2007 20:21:48 +0000 (13:21 -0700)]
[BNX2]: Update 5709 firmware.

Add ipv6 TSO support in firmware.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[BNX2]: Update 5708 firmware.
Michael Chan [Thu, 3 May 2007 20:21:13 +0000 (13:21 -0700)]
[BNX2]: Update 5708 firmware.

This fixes the problem of not counting all dropped multicast packets.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[BNX2]: Save PCI state during suspend.
Michael Chan [Thu, 3 May 2007 20:20:40 +0000 (13:20 -0700)]
[BNX2]: Save PCI state during suspend.

This is needed to save the MSI state which will be lost during
suspend.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[BNX2]: Fix race conditions when calling register_netdev().
Michael Chan [Thu, 3 May 2007 20:20:19 +0000 (13:20 -0700)]
[BNX2]: Fix race conditions when calling register_netdev().

Hot-plug scripts can call bnx2_open() as soon as register_netdev() is
called in bnx2_init_one().  We need to call pci_set_drvdata() and
setup everything before calling register_netdev(). netif_carrier_off()
also needs to be moved to bnx2_open() to avoid race conditions with
the irq.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[BNX2]: Add 40-bit DMA workaround for 5708.
Michael Chan [Thu, 3 May 2007 20:19:18 +0000 (13:19 -0700)]
[BNX2]: Add 40-bit DMA workaround for 5708.

The internal PCIE-to-PCIX bridge of the 5708 has the same 40-bit DMA
limitation as some of the tg3 chips.  Set dma_mask and persistent DMA
mask to 40-bit to workaround.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[BNX2]: Fix register and memory test on 5709.
Michael Chan [Thu, 3 May 2007 20:18:46 +0000 (13:18 -0700)]
[BNX2]: Fix register and memory test on 5709.

Tweak registers and memory test range for 5709.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[BNX2]: Block MII access when ifdown.
Michael Chan [Thu, 3 May 2007 20:18:03 +0000 (13:18 -0700)]
[BNX2]: Block MII access when ifdown.

The device may be in D3hot state and should not allow MII register
access.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[ETHTOOL]: Add 2.5G bit definitions.
Michael Chan [Thu, 3 May 2007 20:17:25 +0000 (13:17 -0700)]
[ETHTOOL]: Add 2.5G bit definitions.

Add 2.5G supported and advertising bit definitions.  2.5G is supported
by the bnx2 driver.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETFILTER]: bridge netfilter: consolidate header pushing/pulling code
Patrick McHardy [Thu, 3 May 2007 10:36:16 +0000 (03:36 -0700)]
[NETFILTER]: bridge netfilter: consolidate header pushing/pulling code

Consolidate the common push/pull sequences into a few helper functions.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETFILTER]: sip: Fix RTP address NAT
Herbert Xu [Thu, 3 May 2007 10:35:31 +0000 (03:35 -0700)]
[NETFILTER]: sip: Fix RTP address NAT

I needed to use this recently to talk to a Cisco server.  In my case
I only did SNAT while the Cisco server used a different address for
RTP traffic than the one for SIP.  I discovered that nf_nat_sip NATed
the RTP address to the SIP one which was unnecessary but OK.  However,
in doing so it did not DNAT the destination address on the RTP traffic
to the Cisco back to the original RTP address.

This patch corrects this by noting down the RTP address and using it
when the expectation fires.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETFILTER]: nf_nat_proto_gre: do not modify/corrupt GREv0 packets through NAT
Jorge Boncompte [Thu, 3 May 2007 10:34:42 +0000 (03:34 -0700)]
[NETFILTER]: nf_nat_proto_gre: do not modify/corrupt GREv0 packets through NAT

While porting some changes of the 2.6.21-rc7 pptp/proto_gre conntrack
and nat modules to a 2.4.32 kernel I noticed that the gre_key function
returns a wrong pointer to the GRE key of a version 0 packet thus
corrupting the packet payload.

The intended behaviour for GREv0 packets is to act like
nf_conntrack_proto_generic/nf_nat_proto_unknown so I have ripped the
offending functions (not used anymore) and modified the
nf_nat_proto_gre modules to not touch version 0 (non PPTP) packets.

Signed-off-by: Jorge Boncompte <jorge@dti2.net>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETFILTER]: ipt_DNAT: accept port randomization option
Patrick McHardy [Thu, 3 May 2007 10:34:03 +0000 (03:34 -0700)]
[NETFILTER]: ipt_DNAT: accept port randomization option

Also accept the --random option for DNAT to allow randomly selecting a
destination port from the given range.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[TCP]: Use S+L catcher only with SACK for now
Ilpo Järvinen [Thu, 3 May 2007 10:30:34 +0000 (03:30 -0700)]
[TCP]: Use S+L catcher only with SACK for now

TCP has a transitional state when SACK is not in use during
which this invariant is temporarily broken. Without SACK,
tcp_clean_rtx_queue does not decrement sacked_out. Therefore
calls to tcp_sync_left_out before sacked_out is again
corrected by tcp_fastretrans_alert can trigger this trap as
sacked_out still has couple of segments that are already out
of window.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[AFS]: Adjust the new netdevice scanning code
David Howells [Thu, 3 May 2007 10:29:41 +0000 (03:29 -0700)]
[AFS]: Adjust the new netdevice scanning code

Adjust the new netdevice scanning code provided by Patrick McHardy:

 (1) Restore the function banner comments that were dropped.

 (2) Rather than using an array size of 6 in some places and an array size of
     ETH_ALEN in others, pass a pointer instead and pass the array size
     through so that we can actually check it.

 (3) Do the buffer fill count check before checking the for_primary_ifa
     condition again.  This permits us to skip that check should maxbufs be
     reached before we run out of interfaces.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[AFS]: Replace rtnetlink client by direct dev_base walking
Patrick McHardy [Thu, 3 May 2007 10:28:49 +0000 (03:28 -0700)]
[AFS]: Replace rtnetlink client by direct dev_base walking

Replace the large and complicated rtnetlink client by two simple
functions for getting the MAC address for the first ethernet device
and building a list of IPv4 addresses.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NET]: Add __dev_getfirstbyhwtype
Patrick McHardy [Thu, 3 May 2007 10:28:13 +0000 (03:28 -0700)]
[NET]: Add __dev_getfirstbyhwtype

Add __dev_getfirstbyhwtype for callers that don't want a reference but
some data from the device and thus need to take the rtnl anyway.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[AFS]: Fix memory leak in SRXAFSCB_GetCapabilities
Patrick McHardy [Thu, 3 May 2007 10:27:39 +0000 (03:27 -0700)]
[AFS]: Fix memory leak in SRXAFSCB_GetCapabilities

The interface array is not freed on exit.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETLINK]: Fix use after free in netlink_recvmsg
Patrick McHardy [Thu, 3 May 2007 10:27:01 +0000 (03:27 -0700)]
[NETLINK]: Fix use after free in netlink_recvmsg

When the user passes in MSG_TRUNC the skb is used after getting freed.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETLINK]: Kill CB only when socket is unused
Herbert Xu [Thu, 3 May 2007 10:17:14 +0000 (03:17 -0700)]
[NETLINK]: Kill CB only when socket is unused

Since we can still receive packets until all references to the
socket are gone, we don't need to kill the CB until that happens.
This also aligns ourselves with the receive queue purging which
happens at that point.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NET] skbuff: fix kernel-doc
Randy Dunlap [Thu, 3 May 2007 10:16:20 +0000 (03:16 -0700)]
[NET] skbuff: fix kernel-doc

Fix skbuff.h kernel-doc:
linux-2.6.21-git4//include/linux/skbuff.h:316): No description found for parameter 'transport_header'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[TCP]: Delete unused header file net/ipv4/tcp_yeah.h.
Robert P. J. Day [Thu, 3 May 2007 10:13:35 +0000 (03:13 -0700)]
[TCP]: Delete unused header file net/ipv4/tcp_yeah.h.

Delete the apparently unused header file net/ipv4/tcp_yeah.h.

Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[AFS]: Fix use of __exit functions from __init path
David Howells [Thu, 3 May 2007 10:12:46 +0000 (03:12 -0700)]
[AFS]: Fix use of __exit functions from __init path

Fix use of __exit functions from __init path.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[AFS/AF_RXRPC]: Miscellaneous fixes.
David Howells [Thu, 3 May 2007 10:11:29 +0000 (03:11 -0700)]
[AFS/AF_RXRPC]: Miscellaneous fixes.

Make miscellaneous fixes to AFS and AF_RXRPC:

 (*) Make AF_RXRPC select KEYS rather than RXKAD or AFS_FS in Kconfig.

 (*) Don't use FS_BINARY_MOUNTDATA.

 (*) Remove a done 'TODO' item in a comemnt on afs_get_sb().

 (*) Don't pass a void * as the page pointer argument of kmap_atomic() as this
     breaks on m68k.  Patch from Geert Uytterhoeven <geert@linux-m68k.org>.

 (*) Use match_*() functions rather than doing my own parsing.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[AFS]: Make the match_*() functions take const options.
David Howells [Thu, 3 May 2007 10:10:39 +0000 (03:10 -0700)]
[AFS]: Make the match_*() functions take const options.

Make the match_*() functions take a const pointer to the options table
and make strings pointers in the options table const too.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[IPV6]: Get rid of __HAVE_ARCH_ADDR_SET.
Eric Dumazet [Thu, 3 May 2007 10:08:43 +0000 (03:08 -0700)]
[IPV6]: Get rid of __HAVE_ARCH_ADDR_SET.

__HAVE_ARCH_ADDR_SET seems unused these days, just get rid of it.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years agolibata: honour host controllers that want just one host
Linus Torvalds [Tue, 1 May 2007 00:43:48 +0000 (17:43 -0700)]
libata: honour host controllers that want just one host

The Marvell IDE interface on my machine would hit a BUG_ON() in
lib/iomem.c because it was calling ata_pci_init_one() specifying just a
single port on the host, but that would actually end up trying to
initialize two ports, the second one with bogus information.

This fixes "ata_pci_init_one()" so that it actually passes down the
n_ports variable that it got from the low-level driver to the host
allocation routine ("ata_host_alloc_pinfo()"), which results in the ATA
layer actually having the correct port number information.

And in order to make it all work, I also needed to fix a few places that
had incorrectly hard-coded the fact that a host always had exactly two
ports (both ata_pci_init_bmdma() and ata_request_legacy_irqs() would
just always iterate over both ports).

Acked-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agopm: include EIO from errno-base.h
David Rientjes [Mon, 30 Apr 2007 22:09:56 +0000 (15:09 -0700)]
pm: include EIO from errno-base.h

For backwards compatibility, call_platform_enable_wakeup() can return 0
instead of -EIO since we aren't guaranteed to have errno defined.

Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agoAdd kvasprintf()
Jeremy Fitzhardinge [Mon, 30 Apr 2007 22:09:56 +0000 (15:09 -0700)]
Add kvasprintf()

Add a kvasprintf() function to complement kasprintf().

No in-tree users yet, but I have some coming up.

[akpm@linux-foundation.org: EXPORT it]
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Keir Fraser <keir@xensource.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agopower management: force pm_ops.valid callback to be assigned
Johannes Berg [Mon, 30 Apr 2007 22:09:55 +0000 (15:09 -0700)]
power management: force pm_ops.valid callback to be assigned

This patch changes the docs and behaviour from "all states valid" to "no
states valid" if no .valid callback is assigned.  Users of pm_ops that only
need mem sleep can assign pm_valid_only_mem without any overhead, others
will require more elaborate callbacks.

Now that all users of pm_ops have a .valid callback this is a safe thing to
do and prevents things from getting messy again as they were before.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Pavel Machek <pavel@ucw.cz>
Looks-okay-to: Rafael J. Wysocki <rjw@sisk.pl>
Cc: <linux-pm@lists.linux-foundation.org>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agopower management: implement pm_ops.valid for everybody
Johannes Berg [Mon, 30 Apr 2007 22:09:54 +0000 (15:09 -0700)]
power management: implement pm_ops.valid for everybody

Almost all users of pm_ops only support mem sleep, don't check in .valid and
don't reject any others in .prepare so users can be confused if they check
/sys/power/state, especially when new states are added (these would then
result in s-t-r although they're supposed to be something different).

This patch implements a generic pm_valid_only_mem function that is then
exported for users and puts it to use in almost all existing pm_ops.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: David Brownell <david-b@pacbell.net>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: linux-pm@lists.linux-foundation.org
Cc: Len Brown <lenb@kernel.org>
Acked-by: Russell King <rmk@arm.linux.org.uk>
Cc: Greg KH <greg@kroah.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agopower management: remove firmware disk mode
Johannes Berg [Mon, 30 Apr 2007 22:09:53 +0000 (15:09 -0700)]
power management: remove firmware disk mode

This patch removes the firmware disk suspend mode which is the wrong approach,
it is supposed to be used for implementing firmware-based disk suspend but
cannot actually be used for that.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: <linux-pm@lists.linux-foundation.org>
Cc: David Brownell <david-b@pacbell.net>
Cc: Len Brown <lenb@kernel.org>
Acked-by: Russell King <rmk@arm.linux.org.uk>
Cc: Greg KH <greg@kroah.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agorework pm_ops pm_disk_mode, kill misuse
Johannes Berg [Mon, 30 Apr 2007 22:09:51 +0000 (15:09 -0700)]
rework pm_ops pm_disk_mode, kill misuse

This patch series cleans up some misconceptions about pm_ops.  Some users of
the pm_ops structure attempt to use it to stop the user from entering suspend
to disk, this, however, is not possible since the user can always use
"shutdown" in /sys/power/disk and then the pm_ops are never invoked.  Also,
platforms that don't support suspend to disk simply should not allow
configuring SOFTWARE_SUSPEND (read the help text on it, it only selects
suspend to disk and nothing else, all the other stuff depends on PM).

The pm_ops structure is actually intended to provide a way to enter
platform-defined sleep states (currently supported states are "standby" and
"mem" (suspend to ram)) and additionally (if SOFTWARE_SUSPEND is configured)
allows a platform to support a platform specific way to enter low-power mode
once everything has been saved to disk.  This is currently only used by ACPI
(S4).

This patch:

The pm_ops.pm_disk_mode is used in totally bogus ways since nobody really
seems to understand what it actually does.

This patch clarifies the pm_disk_mode description.

It also removes all the arm and sh users that think they can veto suspend to
disk via pm_ops; not so since the user can always do echo shutdown >
/sys/power/disk, they need to find a better way involving Kconfig or such.

ACPI is the only user left with a non-zero pm_disk_mode.

The patch also sets the default mode to shutdown again, but when a new pm_ops
is registered its pm_disk_mode is selected as default, that way the default
stays for ACPI where it is apparently required.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: David Brownell <david-b@pacbell.net>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: <linux-pm@lists.linux-foundation.org>
Cc: Len Brown <lenb@kernel.org>
Acked-by: Russell King <rmk@arm.linux.org.uk>
Cc: Greg KH <greg@kroah.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agoreiserfs: suppress lockdep warning
Jeff Mahoney [Mon, 30 Apr 2007 22:09:50 +0000 (15:09 -0700)]
reiserfs: suppress lockdep warning

We're getting lockdep warnings due to a post-2.6.21-rc7 bugfix.

The xattr_sem can never be taken in the manner described. Internal inodes
are protected by I_PRIVATE.  Add the appropriate annotation.

Cc: <stable@kernel.org>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Cc: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agoExtend print_symbol capability
Robert Peterson [Mon, 30 Apr 2007 22:09:48 +0000 (15:09 -0700)]
Extend print_symbol capability

Today's print_symbol function dumps a kernel symbol with printk.  This
patch extends the functionality of kallsyms.c so that the symbol lookup
function may be used without the printk.  This is useful for modules that
want to dump symbols elsewhere, for example, to debugfs.  I intend to use
the new function call in the GFS2 file system (which will be a separate
patch).

[akpm@linux-foundation.org: build fix]
[clameter@sgi.com: sprint_symbol should return length of string like sprintf]
Signed-off-by: Robert Peterson <rpeterso@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Cc: Sam Ravnborg <sam@ravnborg.org>
Acked-by: Paulo Marques <pmarques@grupopie.com>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years ago[UDP]: Do not allow specific bind when wildcard bind exists.
David S. Miller [Mon, 30 Apr 2007 21:51:58 +0000 (14:51 -0700)]
[UDP]: Do not allow specific bind when wildcard bind exists.

When allocating local ports, do not allow a bind to a port
with a specific local address when a bind to that port with
a wildcard local address already exists.

Noticed by Linus.

Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[IPV4] UDP: Fix endianness bugs in hashing changes.
David S. Miller [Mon, 30 Apr 2007 20:35:29 +0000 (13:35 -0700)]
[IPV4] UDP: Fix endianness bugs in hashing changes.

I accidently applied an earlier version of Eric Dumazet's patch, from
March 21st.  His version from March 30th didn't have these bugs, so
this just interdiffs to the correct patch.

Signed-off-by: David S. Miller <davem@davemloft.net>
17 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394...
Linus Torvalds [Mon, 30 Apr 2007 15:59:57 +0000 (08:59 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/ieee1394/linux1394-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6: (56 commits)
  ieee1394: remove garbage from Kconfig
  ieee1394: more help in Kconfig
  ieee1394: ohci1394: Fix mistake in printk message.
  ieee1394: ohci1394: remove unnecessary rcvPhyPkt bit flipping in LinkControl register
  ieee1394: ohci1394: fix cosmetic problem in error logging
  ieee1394: eth1394: send async streams at S100 on 1394b buses
  ieee1394: eth1394: fix error path in module_init
  ieee1394: eth1394: correct return codes in hard_start_xmit
  ieee1394: eth1394: hard_start_xmit is called in atomic context
  ieee1394: eth1394: some conditions are unlikely
  ieee1394: eth1394: clean up fragment_overlap
  ieee1394: eth1394: don't use alloc_etherdev
  ieee1394: eth1394: omit useless set_mac_address callback
  ieee1394: eth1394: CONFIG_INET is always defined
  ieee1394: eth1394: allow MTU bigger than 1500
  ieee1394: unexport highlevel_host_reset
  ieee1394: eth1394: contain host reset
  ieee1394: eth1394: shorter error messages
  ieee1394: eth1394: correct a memset argument
  ieee1394: eth1394: refactor .probe and .update
  ...

17 years agoMerge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jikos/hid
Linus Torvalds [Mon, 30 Apr 2007 15:58:21 +0000 (08:58 -0700)]
Merge branch 'for-linus' of /linux/kernel/git/jikos/hid

* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jikos/hid: (21 commits)
  USB HID: don't warn on idVendor == 0
  USB HID: add 'quirks' module parameter
  USB HID: add support for dynamically-created quirks
  USB HID: clarify static quirk handling as squirks
  USB HID: encapsulate quirk handling into hid-quirks.c
  USB HID: EMS USBII device needs HID_QUIRK_MULTI_INPUT
  HID: update copyright and authorship macro
  HID: introduce proper zeroing of unused bits in output reports
  USB HID: add support for WiseGroup MP-8800 Quad Joypad
  USB HID: add FF support for Logitech Force 3D Pro Joystick
  USB HID: numlock quirk for dell W7658 keyboard
  USB HID: Logitech MX3000 keyboard needs report descriptor quirk
  USB HID: extend quirk for Logitech S510 keyboard
  USB HID: usbkbd/usbmouse - handle errors when registering devices
  USB HID: add QUIRK_HIDDEV for Belkin Flip KVM
  HID: enable dead keys on a belkin wireless keyboard
  USB HID: Thustmaster firestorm dual power v1 support
  USB HID: specify explicit size for hid_blacklist.quirks
  USB HID: fix retry & reset logic
  USB HID: consolidate vendor/product ids
  ...

17 years agoMerge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Linus Torvalds [Mon, 30 Apr 2007 15:14:42 +0000 (08:14 -0700)]
Merge /pub/scm/linux/kernel/git/davem/net-2.6

* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (21 commits)
  [IPV4] SNMP: Support OutMcastPkts and OutBcastPkts
  [IPV4] SNMP: Support InMcastPkts and InBcastPkts
  [IPV4] SNMP: Support InTruncatedPkts
  [IPV4] SNMP: Support InNoRoutes
  [SNMP]: Add definitions for {In,Out}BcastPkts
  [TCP] FRTO: RFC4138 allows Nagle override when new data must be sent
  [TCP] FRTO: Delay skb available check until it's mandatory
  [XFRM]: Restrict upper layer information by bundle.
  [TCP]: Catch skb with S+L bugs earlier
  [PATCH] INET : IPV4 UDP lookups converted to a 2 pass algo
  [L2TP]: Add the ability to autoload a pppox protocol module.
  [SKB]: Introduce skb_queue_walk_safe()
  [AF_IUCV/IUCV]: smp_call_function deadlock
  [IPV6]: Fix slab corruption running ip6sic
  [TCP]: Update references in two old comments
  [XFRM]: Export SPD info
  [IPV6]: Track device renames in snmp6.
  [SCTP]: Fix sctp_getsockopt_local_addrs_old() to use local storage.
  [NET]: Remove NETIF_F_INTERNAL_STATS, default to internal stats.
  [NETPOLL]: Remove CONFIG_NETPOLL_RX
  ...

17 years agoMerge branch 'for-linus' of git://git.kernel.dk/data/git/linux-2.6-block
Linus Torvalds [Mon, 30 Apr 2007 15:12:39 +0000 (08:12 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/data/git/linux-2.6-block

* 'for-linus' of git://git.kernel.dk/data/git/linux-2.6-block:
  [PATCH] elevator: elv_list_lock does not need irq disabling
  [BLOCK] Don't pin lots of memory in mempools
  cfq-iosched: speedup cic rb lookup
  ll_rw_blk: add io_context private pointer
  cfq-iosched: get rid of cfqq hash
  cfq-iosched: tighten queue request overlap condition
  cfq-iosched: improve sync vs async workloads
  cfq-iosched: never allow an async queue idling
  cfq-iosched: get rid of ->dispatch_slice
  cfq-iosched: don't pass unused preemption variable around
  cfq-iosched: get rid of ->cur_rr and ->cfq_list
  cfq-iosched: slice offset should take ioprio into account
  [PATCH] cfq-iosched: style cleanups and comments
  cfq-iosched: sort IDLE queues into the rbtree
  cfq-iosched: sort RT queues into the rbtree
  [PATCH] cfq-iosched: speed up rbtree handling
  cfq-iosched: rework the whole round-robin list concept
  cfq-iosched: minor updates
  cfq-iosched: development update
  cfq-iosched: improve preemption for cooperating tasks

17 years agoMerge branch 'for-2.6.22' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus...
Linus Torvalds [Mon, 30 Apr 2007 15:10:12 +0000 (08:10 -0700)]
Merge branch 'for-2.6.22' of git://git./linux/kernel/git/paulus/powerpc

* 'for-2.6.22' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (255 commits)
  [POWERPC] Remove dev_dbg redefinition in drivers/ps3/vuart.c
  [POWERPC] remove kernel module option for booke wdt
  [POWERPC] Avoid putting cpu node twice
  [POWERPC] Spinlock initializer cleanup
  [POWERPC] ppc4xx_sgdma needs dma-mapping.h
  [POWERPC] arch/powerpc/sysdev/timer.c build fix
  [POWERPC] get_property cleanups
  [POWERPC] Remove the unused HTDMSOUND driver
  [POWERPC] cell: cbe_cpufreq cleanup and crash fix
  [POWERPC] Declare enable_kernel_spe in a header
  [POWERPC] Add dt_xlate_addr() to bootwrapper
  [POWERPC] bootwrapper: CONFIG_ -> CONFIG_DEVICE_TREE
  [POWERPC] Don't define a custom bd_t for Xilixn Virtex based boards.
  [POWERPC] Add sane defaults for Xilinx EDK generated xparameters files
  [POWERPC] Add uartlite boot console driver for the zImage wrapper
  [POWERPC] Stop using ppc_sys for Xilinx Virtex boards
  [POWERPC] New registration for common Xilinx Virtex ppc405 platform devices
  [POWERPC] Merge common virtex header files
  [POWERPC] Rework Kconfig dependancies for Xilinx Virtex ppc405 platform
  [POWERPC] Clean up cpufreq Kconfig dependencies
  ...

17 years ago[IPV4] SNMP: Support OutMcastPkts and OutBcastPkts
Mitsuru Chinen [Mon, 30 Apr 2007 07:48:20 +0000 (00:48 -0700)]
[IPV4] SNMP: Support OutMcastPkts and OutBcastPkts

A transmitted IP multicast datagram should be counted as OutMcastPkts.
By the same token, a transmitted IP broadcast datagram should be
counted as OutBcastPkts.

Signed-off-by: Mitsuru Chinen <mitch@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[IPV4] SNMP: Support InMcastPkts and InBcastPkts
Mitsuru Chinen [Mon, 30 Apr 2007 07:48:10 +0000 (00:48 -0700)]
[IPV4] SNMP: Support InMcastPkts and InBcastPkts

A received IP multicast datagram should be counted as InMcastPkts.
By the same token, a received IP broadcast datagram should be
counted as InBcastPkts.

Signed-off-by: Mitsuru Chinen <mitch@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[IPV4] SNMP: Support InTruncatedPkts
Mitsuru Chinen [Mon, 30 Apr 2007 07:46:30 +0000 (00:46 -0700)]
[IPV4] SNMP: Support InTruncatedPkts

An IP datagram which is being discarded because the datagram frame
didn't carry enough data should be counted as InTruncatedPkts.

Signed-off-by: Mitsuru Chinen <mitch@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[IPV4] SNMP: Support InNoRoutes
Mitsuru Chinen [Mon, 30 Apr 2007 07:45:49 +0000 (00:45 -0700)]
[IPV4] SNMP: Support InNoRoutes

An IP datagram which is being discarded because of no routes in the
forwarding path should be counted as InNoRoutes.

Signed-off-by: Mitsuru Chinen <mitch@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[SNMP]: Add definitions for {In,Out}BcastPkts
Mitsuru Chinen [Mon, 30 Apr 2007 07:45:02 +0000 (00:45 -0700)]
[SNMP]: Add definitions for {In,Out}BcastPkts

The updated IP-MIB RFC (RFC4293) specifys new objects, InBcastPkts
and OutBcastPkts. This adds definitions for them.

Signed-off-by: Mitsuru Chinen <mitch@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[TCP] FRTO: RFC4138 allows Nagle override when new data must be sent
Ilpo Järvinen [Mon, 30 Apr 2007 07:42:20 +0000 (00:42 -0700)]
[TCP] FRTO: RFC4138 allows Nagle override when new data must be sent

This is a corner case where less than MSS sized new data thingie
is awaiting in the send queue. For F-RTO to work correctly, a
new data segment must be sent at certain point or F-RTO cannot
be used at all. RFC4138 allows overriding of Nagle at that
point.

Implementation uses frto_counter states 2 and 3 to distinguish
when Nagle override is needed.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[TCP] FRTO: Delay skb available check until it's mandatory
Ilpo Järvinen [Mon, 30 Apr 2007 07:39:55 +0000 (00:39 -0700)]
[TCP] FRTO: Delay skb available check until it's mandatory

No new data is needed until the first ACK comes, so no need to check
for application limitedness until then.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[XFRM]: Restrict upper layer information by bundle.
Masahide NAKAMURA [Mon, 30 Apr 2007 07:33:35 +0000 (00:33 -0700)]
[XFRM]: Restrict upper layer information by bundle.

On MIPv6 usage, XFRM sub policy is enabled.
When main (IPsec) and sub (MIPv6) policy selectors have the same
address set but different upper layer information (i.e. protocol
number and its ports or type/code), multiple bundle should be created.
However, currently we have issue to use the same bundle created for
the first time with all flows covered by the case.

It is useful for the bundle to have the upper layer information
to be restructured correctly if it does not match with the flow.

1. Bundle was created by two policies
Selector from another policy is added to xfrm_dst.
If the flow does not match the selector, it goes to slow path to
restructure new bundle by single policy.

2. Bundle was created by one policy
Flow cache is added to xfrm_dst as originated one. If the flow does
not match the cache, it goes to slow path to try searching another
policy.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[TCP]: Catch skb with S+L bugs earlier
Ilpo Järvinen [Mon, 30 Apr 2007 07:57:33 +0000 (00:57 -0700)]
[TCP]: Catch skb with S+L bugs earlier

SACKED_ACKED and LOST are mutually exclusive with SACK, thus
having their sum larger than packets_out is bug with SACK.
Eventually these bugs trigger traps in the tcp_clean_rtx_queue
with SACK but it's much more informative to do this here.

Non-SACK TCP, however, could get more than packets_out duplicate
ACKs which each increment sacked_out, so it makes sense to do
this kind of limitting for non-SACK TCP but not for SACK enabled
one. Perhaps the author had the opposite in mind but did the
logic accidently wrong way around? Anyway, the sacked_out
incrementer code for non-SACK already deals this issue before
calling sync_left_out so this trapping can be done
unconditionally.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[PATCH] INET : IPV4 UDP lookups converted to a 2 pass algo
Eric Dumazet [Mon, 30 Apr 2007 07:26:00 +0000 (00:26 -0700)]
[PATCH] INET : IPV4 UDP lookups converted to a 2 pass algo

Some people want to have many UDP sockets, binded to a single port but
many different addresses. We currently hash all those sockets into a
single chain.  Processing of incoming packets is very expensive,
because the whole chain must be examined to find the best match.

I chose in this patch to hash UDP sockets with a hash function that
take into account both their port number and address : This has a
drawback because we need two lookups : one with a given address, one
with a wildcard (null) address.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[L2TP]: Add the ability to autoload a pppox protocol module.
James Chapman [Mon, 30 Apr 2007 07:21:02 +0000 (00:21 -0700)]
[L2TP]: Add the ability to autoload a pppox protocol module.

This patch allows a name "pppox-proto-nnn" to be used in modprobe.conf
to autoload a PPPoX protocol nnn.

Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years agoMerge branch 'cfq' into for-linus
Jens Axboe [Mon, 30 Apr 2007 07:09:27 +0000 (09:09 +0200)]
Merge branch 'cfq' into for-linus

17 years ago[PATCH] elevator: elv_list_lock does not need irq disabling
Jens Axboe [Thu, 26 Apr 2007 12:41:53 +0000 (14:41 +0200)]
[PATCH] elevator: elv_list_lock does not need irq disabling

It's never grabbed from irq context, so just make it plain spin_lock().

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years ago[BLOCK] Don't pin lots of memory in mempools
Jens Axboe [Mon, 2 Apr 2007 08:06:42 +0000 (10:06 +0200)]
[BLOCK] Don't pin lots of memory in mempools

Currently we scale the mempool sizes depending on memory installed
in the machine, except for the bio pool itself which sits at a fixed
256 entry pre-allocation.

There's really no point in "optimizing" this OOM path, we just need
enough preallocated to make progress. A single unit is enough, lets
scale it down to 2 just to be on the safe side.

This patch saves ~150kb of pinned kernel memory on a 32-bit box.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years ago[SKB]: Introduce skb_queue_walk_safe()
James Chapman [Mon, 30 Apr 2007 07:07:31 +0000 (00:07 -0700)]
[SKB]: Introduce skb_queue_walk_safe()

This patch provides a method for walking skb lists while inserting or
removing skbs from the list.

Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years agocfq-iosched: speedup cic rb lookup
Jens Axboe [Tue, 24 Apr 2007 19:23:53 +0000 (21:23 +0200)]
cfq-iosched: speedup cic rb lookup

We often lookup the same queue many times in succession, so cache
the last looked up queue to avoid browsing the rbtree.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years agoll_rw_blk: add io_context private pointer
Jens Axboe [Tue, 24 Apr 2007 19:17:33 +0000 (21:17 +0200)]
ll_rw_blk: add io_context private pointer

To be used by as/cfq as they see fit.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years agocfq-iosched: get rid of cfqq hash
Vasily Tarasov [Wed, 25 Apr 2007 10:29:51 +0000 (12:29 +0200)]
cfq-iosched: get rid of cfqq hash

cfq hash is no more necessary.  We always can get cfqq from io context.
cfq_get_io_context_noalloc() function is introduced, because we don't
want to allocate cic on merging and checking may_queue.  In order to
identify sync queue we've used hash key = CFQ_KEY_ASYNC. Since hash is
eliminated we need to use other criterion: sync flag for queue is added.
In all places where we dig in rb_tree we're in current context, so no
additional locking is required.

Advantages of this patch: no additional memory for hash, no seeking in
hash, code is cleaner. But it is necessary now to seek cic in per-ioc
rbtree, but it is faster:
- most processes work only with few devices
- most systems have only few block devices
- it is a rb-tree

Signed-off-by: Vasily Tarasov <vtaras@openvz.org>
Changes by me:

- Merge into CFQ devel branch
- Get rid of cfq_get_io_context_noalloc()
- Fix various bugs with dereferencing cic->cfqq[] with offset other
  than 0 or 1.
- Fix bug in cfqq setup, is_sync condition was reversed.
- Fix bug where only bio_sync() is used, we need to check for a READ too

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years agocfq-iosched: tighten queue request overlap condition
Jens Axboe [Fri, 20 Apr 2007 18:45:39 +0000 (20:45 +0200)]
cfq-iosched: tighten queue request overlap condition

For tagged devices, allow overlap of requests if the idle window
isn't enabled on the current active queue.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years agocfq-iosched: improve sync vs async workloads
Jens Axboe [Mon, 23 Apr 2007 06:33:33 +0000 (08:33 +0200)]
cfq-iosched: improve sync vs async workloads

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years agocfq-iosched: never allow an async queue idling
Jens Axboe [Thu, 19 Apr 2007 12:32:26 +0000 (14:32 +0200)]
cfq-iosched: never allow an async queue idling

We don't enable it by default, don't let it get enabled during
runtime.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years agocfq-iosched: get rid of ->dispatch_slice
Jens Axboe [Mon, 23 Apr 2007 06:26:36 +0000 (08:26 +0200)]
cfq-iosched: get rid of ->dispatch_slice

We can track it fairly accurately locally, let the slice handling
take care of the rest.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years agocfq-iosched: don't pass unused preemption variable around
Jens Axboe [Mon, 23 Apr 2007 06:25:00 +0000 (08:25 +0200)]
cfq-iosched: don't pass unused preemption variable around

We don't use it anymore in the slice expiry handling.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years agocfq-iosched: get rid of ->cur_rr and ->cfq_list
Jens Axboe [Thu, 19 Apr 2007 10:03:34 +0000 (12:03 +0200)]
cfq-iosched: get rid of ->cur_rr and ->cfq_list

It's only used for preemption now that the IDLE and RT queues also
use the rbtree. If we pass an 'add_front' variable to
cfq_service_tree_add(), we can set ->rb_key to 0 to force insertion
at the front of the tree.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years agocfq-iosched: slice offset should take ioprio into account
Jens Axboe [Fri, 20 Apr 2007 12:18:00 +0000 (14:18 +0200)]
cfq-iosched: slice offset should take ioprio into account

Use the max_slice-cur_slice as the multipler for the insertion offset.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years ago[PATCH] cfq-iosched: style cleanups and comments
Jens Axboe [Thu, 26 Apr 2007 10:54:48 +0000 (12:54 +0200)]
[PATCH] cfq-iosched: style cleanups and comments

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years agocfq-iosched: sort IDLE queues into the rbtree
Jens Axboe [Wed, 18 Apr 2007 18:13:32 +0000 (20:13 +0200)]
cfq-iosched: sort IDLE queues into the rbtree

Same treatment as the RT conversion, just put the sorted idle
branch at the end of the tree.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years agocfq-iosched: sort RT queues into the rbtree
Jens Axboe [Wed, 18 Apr 2007 18:01:57 +0000 (20:01 +0200)]
cfq-iosched: sort RT queues into the rbtree

Currently CFQ does a linked insert into the current list for RT
queues. We can just factor the class into the rb insertion,
and then we don't have to treat RT queues in a special way. It's
faster, too.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years ago[PATCH] cfq-iosched: speed up rbtree handling
Jens Axboe [Thu, 26 Apr 2007 10:53:50 +0000 (12:53 +0200)]
[PATCH] cfq-iosched: speed up rbtree handling

For cases where the rbtree is mainly used for sorting and min retrieval,
a nice speedup of the rbtree code is to maintain a cache of the leftmost
node in the tree.

Also spotted in the CFS CPU scheduler code.

Improved by Alan D. Brunelle <Alan.Brunelle@hp.com> by updating the
leftmost hint in cfq_rb_first() if it isn't set, instead of only
updating it on insert.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years agocfq-iosched: rework the whole round-robin list concept
Jens Axboe [Fri, 20 Apr 2007 12:27:50 +0000 (14:27 +0200)]
cfq-iosched: rework the whole round-robin list concept

Drawing on some inspiration from the CFS CPU scheduler design, overhaul
the pending cfq_queue concept list management. Currently CFQ uses a
doubly linked list per priority level for sorting and service uses.
Kill those lists and maintain an rbtree of cfq_queue's, sorted by when
to service them.

This unfortunately means that the ionice levels aren't as strong
anymore, will work on improving those later. We only scale the slice
time now, not the number of times we service. This means that latency
is better (for all priority levels), but that the distinction between
the highest and lower levels aren't as big.

The diffstat speaks for itself.

 cfq-iosched.c |  363 +++++++++++++++++---------------------------------
 1 file changed, 125 insertions(+), 238 deletions(-)

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years agocfq-iosched: minor updates
Jens Axboe [Tue, 17 Apr 2007 10:47:55 +0000 (12:47 +0200)]
cfq-iosched: minor updates

- Move the queue_new flag clear to when the queue is selected
- Only select the non-first queue in cfq_get_best_queue(), if there's
  a substantial difference between the best and first.
- Get rid of ->busy_rr
- Only select a close cooperator, if the current queue is known to take
  a while to "think".

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years agocfq-iosched: development update
Jens Axboe [Wed, 25 Apr 2007 10:44:27 +0000 (12:44 +0200)]
cfq-iosched: development update

- Implement logic for detecting cooperating processes, so we
  choose the best available queue whenever possible.

- Improve residual slice time accounting.

- Remove dead code: we no longer see async requests coming in on
  sync queues. That part was removed a long time ago. That means
  that we can also remove the difference between cfq_cfqq_sync()
  and cfq_cfqq_class_sync(), they are now indentical. And we can
  kill the on_dispatch array, just make it a counter.

- Allow a process to go into the current list, if it hasn't been
  serviced in this scheduler tick yet.

Possible future improvements including caching the cfqq lookup
in cfq_close_cooperator(), so we don't have to look it up twice.
cfq_get_best_queue() should just use that last decision instead
of doing it again.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years agocfq-iosched: improve preemption for cooperating tasks
Jens Axboe [Wed, 14 Feb 2007 18:59:49 +0000 (19:59 +0100)]
cfq-iosched: improve preemption for cooperating tasks

When testing the syslet async io approach, I discovered that CFQ
sometimes didn't perform as well as expected. cfq_should_preempt()
needs to better check for cooperating tasks, so fix that by allowing
preemption of an equal priority queue if the recently queued request
is as good a candidate for IO as the one we are currently waiting for.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years ago[POWERPC] Remove dev_dbg redefinition in drivers/ps3/vuart.c
Paul Mackerras [Mon, 30 Apr 2007 03:03:39 +0000 (13:03 +1000)]
[POWERPC] Remove dev_dbg redefinition in drivers/ps3/vuart.c

Commit 404d5b185b4eb56d6fa2f7bd27833f8df1c38ce4 changed the definition
of dev_dbg in the !DEBUG case from being a #define to being a static
inline.  There was code in drivers/ps3/vuart.c to do exactly that,
which fails to compile now.  This fixes it by removing the redefinition,
as the redefinition is now superfluous.

Signed-off-by: Paul Mackerras <paulus@samba.org>
17 years agoMerge branch 'linux-2.6' into for-2.6.22
Paul Mackerras [Mon, 30 Apr 2007 02:38:01 +0000 (12:38 +1000)]
Merge branch 'linux-2.6' into for-2.6.22