Programming Languages Research Group: Git - firefly-linux-kernel-4.4.55.git/log

dccp: Initialisation framework for feature negotiation

This initialises feature negotiation from two tables, which are initialised
from sysctls.

As a novel feature, specifics of the implementation (e.g. currently short
seqnos and ECN are not supported) are advertised for robustness.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>

dccp ccid-2: Phase out the use of boolean Ack Vector sysctl

This removes the use of the sysctl and the minisock variable for the Send Ack
Vector feature, which is now handled fully dynamically via feature negotiation;
i.e. when CCID2 is enabled, Ack Vectors are automatically enabled (as per
RFC 4341, 4.).

Using a sysctl in parallel to this implementation would open the door to
crashes, since much of the code relies on tests of the boolean minisock /
sysctl variable. Thus, this patch replaces all tests of type

if (dccp_msk(sk)->dccpms_send_ack_vector)
/* ... */
with
if (dp->dccps_hc_rx_ackvec != NULL)
/* ... */

The dccps_hc_rx_ackvec is allocated by the dccp_hdlr_ackvec() when feature
negotiation concluded that Ack Vectors are to be used on the half-connection.
Otherwise, it is NULL (due to dccp_init_sock/dccp_create_openreq_child),
so that the test is a valid one.

The activation handler for Ack Vectors is called as soon as the feature
negotiation has concluded at the
* server when the Ack marking the transition RESPOND => OPEN arrives;
* client after it has sent its ACK, marking the transition REQUEST => PARTOPEN.

Adding the sequence number of the Response packet to the Ack Vector has been
removed, since
(a) connection establishment implies that the Response has been received;
(b) the CCIDs only look at packets received in the (PART)OPEN state, i.e.
this entry will always be ignored;
(c) it can not be used for anything useful - to detect loss for instance, only
packets received after the loss can serve as pseudo-dupacks.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>

dccp: Remove manual influence on NDP Count feature

Updating the NDP count feature is handled automatically now:
* for CCID-2 it is disabled, since the code does not use NDP counts;
* for CCID-3 it is enabled, as NDP counts are used to determine loss lengths.

Allowing the user to change NDP values leads to unpredictable and failing
behaviour, since it is then possible to disable NDP counts even when they
are needed (e.g. in CCID-3).

This means that only those user settings are sensible that agree with the
values for Send NDP Count implied by the choice of CCID. But those settings
are already activated by the feature negotiation (CCID dependency tracking),
hence this form of support is redundant.

At startup the initialisation of the NDP count feature is with the default
value of 0, which is done implicitly by the zeroing-out of the socket when
it is allocated. If the choice of CCID or feature negotiation enables NDP
count, this will then be updated via the NDP activation handler.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>

dccp: Remove obsolete parts of the old CCID interface

The TX/RX CCIDs of the minisock are now redundant: similar to the Ack Vector
case, their value equals initially that of the sysctl, but at the end of
feature negotiation may be something different.

The old interface removed by this patch thus has been replaced by the newer
interface to dynamically query the currently loaded CCIDs earlier in this
patch set.

Also removed the constructors for the TX CCID and the RX CCID, since the
switch rx/non-rx is done by the handler in minisocks.c (and the handler is
the only place in the code where CCIDs are loaded).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>

dccp: Clean up old feature-negotiation infrastructure

The code removed by this patch is no longer referenced or used, the added
lines update documentation and copyrights.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>

dccp: Integration of dynamic feature activation - part 3 (client side)

This integrates feature-activation in the client, with these details:

1. When dccp_parse_options() fails, the reset code is already set, request_sent
    _state_process() currently overrides this with `Packet Error', which is not
    intended - so changed to use the reset code set in dccp_parse_options();

2. There was a FIXME to change the error code when dccp_ackvec_add() fails.
    I have looked this up and found that:
    * the check whether ackno < ISN is already made earlier,
    * this Response is likely the 1st packet with an Ackno that the client gets,
    * so when dccp_ackvec_add() fails, the reason is likely not a packet error.

3. When feature negotiation fails, the socket should be marked as not usable,
    so that the application is notified that an error occurs. This is achieved
    by a new label, which uses an error code of `Aborted' and which sets the
    socket state to CLOSED, as well as sk_err.

4. Avoids parsing the Ack twice in Respond state by not doing option processing
    again in dccp_rcv_respond_partopen_state_process (as option processing has
    already been done on the request_sock in dccp_check_req).

Since this addresses congestion-control initialisation, a corresponding
FIXME has been removed.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>

dccp: Integration of dynamic feature activation - part 2 (server side)

This patch integrates the activation of features at the end of negotiation
into the server-side code.

Note:
  In dccp_create_openreq_child the request_sock argument is no longer constant,
  since dccp_activate_values() uses the feature-negotiation list on dreq to sort
  out the initialisation values for the different features of the child socket;
  and purges this queue after use (but the `req' argument to openreq_child
  can and does still remain constant).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>

dccp: Integration of dynamic feature activation - part 1 (socket setup)

This first patch out of three replaces the hardcoded default settings with
initialisation code for the dynamic feature negotiation.

Note on retransmitting Confirm options:
---------------------------------------
This patch also defers flushing the client feature-negotiation queue,
due to the following considerations.

As long as the client is in PARTOPEN, it needs to retransmit the Confirm
options for the Change options received on the DCCP-Response from the server.

Otherwise, if the packet containing the Confirm options gets dropped in the
network, the connection aborts due to undefined feature negotiation state.

Thanks to Leandro Melo de Sales who reported a bug in an earlier revision
of the patch set, resulting from not retransmitting the Confirm options.

The patch now ensures that the client feature-negotiation queue is flushed only
when entering the OPEN state. Since confirmed Change options are removed as
soon as they are confirmed (in the DCCP-Response), this ensures that Confirm
options are retransmitted.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>

dccp: Feature activation handlers

This patch provides the post-processing of feature negotiation state, after
the negotiation has completed.

To this purpose, handlers are used and added to the dccp_feat_table. Each
handler is passed a boolean flag whether the RX or TX side of the feature
is meant.

Several handlers are provided already, new handlers can easily be added.

The initialisation is now fully dynamic, i.e. CCIDs are activated only
after the feature negotiation. The integration of this dynamic activation
is done in the subsequent patches.

Thanks to Wei Yongjun for pointing out the necessity of skipping over empty
Confirm options while copying the negotiated feature values.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>

dccp: Processing Confirm options

Analogous to the previous patch, this adds code to interpret incoming Confirm
feature-negotiation options. Both functions operate on the feature-negotiation
list of either the request_sock (server) or the dccp_sock (client).

Thanks to Wei Yongjun for pointing out that it is overly restrictive to check
the entire list of confirmed SP values.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>

dccp: Process incoming Change feature-negotiation options

This adds/replaces code for processing incoming ChangeL/R options.
The main difference is that:
* mandatory FN options are now interpreted inside the function
(there are too many individual cases to do this externally);
* the function returns an appropriate Reset code or 0,
which is then used to fill in the data for the Reset packet.

Old code, which is no longer used or referenced, has been removed.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp: Preference list reconciliation

This provides two functions to
* reconcile preference lists (with appropriate return codes) and
* reorder the preference list if successful reconciliation changed the
preferred value.

The patch also removes the old code for processing SP/NN Change options, since
new code to process these is mostly there already; related references have been
commented out.

The code for processing Change options follows in the next patch.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>

dccp: Integrate feature-negotiation insertion code

The patch implements insertion of feature negotiation at the server (listening
and request socket) and the client (connecting socket).

In dccp_insert_options(), several statements have been grouped together now
to achieve (I hope) better efficiency by reducing the number of tests each
packet has to go through:
- Ack Vectors are sent if the packet is neither a Data or a Request packet;
- a previous issue is corrected - feature negotiation options are allowed
on DataAck packets (5.8).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>

dccp: Insert feature-negotiation options into skb

This patch replaces the earlier insertion routine from options.c, so that
code specific to feature negotiation can remain in feat.c. This is possible
by calling a function already existing in options.c.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>

dccp: Header option insertion routine for feature-negotiation

The patch extends existing code:
* Confirm options divide into the confirmed value plus an optional preference
list for SP values. Previously only the preference list was echoed for SP
values, now the confirmed value is added as per RFC 4340, 6.1;
* length and sanity checks are added to avoid illegal memory (or NULL) access.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>

dccp: Support for Mandatory options

Support for Mandatory options is provided by this patch, which will
be used by subsequent feature-negotiation patches.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>

dccp: Increase the scope of variable-length htonl/ntohl functions

This extends the scope of two available functions, encode|decode_value_var,
to work up to 6 (8) bytes, to match maximum requirements in the RFC.

These functions are going to be used both by general option processing and
feature negotiation code, hence declarations have been put into feat.h.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>

dccp: API to query the current TX/RX CCID

This provides function to query the current TX/RX CCID dynamically, without
reliance on the minisock value, using dynamic information available in the
currently loaded CCID module.

This query function is then used to
(a) provide the getsockopt part for getting/setting CCIDs via sockopts;
(b) replace the current test for "which CCID is in use" in probe.c.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>

dccp: Set per-connection CCIDs via socket options

With this patch, TX/RX CCIDs can now be changed on a per-connection basis, which
overrides the defaults set by the global sysctl variables for TX/RX CCIDs.

To make full use of this facility, the remaining patches of this patch set are
needed, which track dependencies and activate negotiated feature values.

Note on the maximum number of CCIDs that can be registered:
-----------------------------------------------------------
The maximum number of CCIDs that can be registered on the socket is constrained
by the space in a Confirm/Change feature negotiation option.

The space in these in turn depends on the size of header options as defined
in RFC 4340, 5.8. Since this is a recurring constant, it has been moved from
ackvec.h into linux/dccp.h, clarifying its purpose.

Relative to this size, the maximum number of CCID identifiers that can be
present in a Confirm option (which always consumes 1 byte more than a Change
option, cf. 6.1) is 2 bytes less than the maximum TLV size: one for the
CCID-feature-type and one for the selected value.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp: Tidy up setsockopt calls

This splits the setsockopt calls into two groups, depending on whether an
integer argument (val) is required and whether routines being called do
their own locking.

Some options (such as setting the CCID) use u8 rather than int, so that for
these the test with regard to integer-sizeof can not be used.

The second switch-case statement now only has those statements which need
locking and which make use of `val'.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Reviewed-by: Eugene Teo <eugeneteo@kernel.sg>

dccp: Deprecate Ack Ratio sysctl

This patch deprecates the Ack Ratio sysctl, since
* Ack Ratio is entirely ignored by CCID-3 and CCID-4,
* Ack Ratio currently doesn't work in CCID-2 (i.e. is always set to 1);
* even if it would work in CCID-2, there is no point for a user to change it:
   - Ack Ratio is constrained by cwnd (RFC 4341, 6.1.2),
   - if Ack Ratio > cwnd, the system resorts to spurious RTO timeouts
     (since waiting for Acks which will never arrive in this window),
   - cwnd is not a user-configurable value.

The only reasonable place for Ack Ratio is to print it for debugging. It is
planned to do this later on, as part of e.g. dccp_probe.

With this patch Ack Ratio is now under full control of feature negotiation:
* Ack Ratio is resolved as a dependency of the selected CCID;
* if the chosen CCID supports it (i.e. CCID == CCID-2), Ack Ratio is set to
   the default of 2, following RFC 4340, 11.3 - "New connections start with Ack
   Ratio 2 for both endpoints";
* what happens then is part of another patch set, since it concerns the
   dynamic update of Ack Ratio while the connection is in full flight.

Thanks to Tomasz Grobelny for discussion leading up to this patch.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>

dccp: Feature negotiation for minimum-checksum-coverage

This provides feature negotiation for server minimum checksum coverage
which so far has been missing.

Since sender/receiver coverage values range only from 0...15, their
type has also been reduced in size from u16 to u4.

Feature-negotiation options are now generated for both sender and receiver
coverage, i.e. when the peer has `forgotten' to enable partial coverage
then feature negotiation will automatically enable (negotiate) the partial
coverage value for this connection.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>

dccp: Deprecate old setsockopt framework

The previous setsockopt interface, which passed socket options via struct
dccp_so_feat, is complicated/difficult to use. Continuing to support it leads to
ugly code since the old approach did not distinguish between NN and SP values.

This patch removes the old setsockopt interface and replaces it with two new
functions to register NN/SP values for feature negotiation. These are
essentially wrappers around the internal __feat_register functions, with
checking added to avoid
* wrong usage (type);
* changing values while the connection is in progress.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp: Mechanism to resolve CCID dependencies

This adds a hook to resolve features whose value depends on the choice of
CCID. It is done at the server since it can only be done after the CCID
values have been negotiated; i.e. the client will add its CCID preference
list on the Change options sent in the Request, which will be reconciled
with the local preference list of the server.

The concept is documented on
http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/feature_negotiation/\
implementation_notes.html#ccid_dependencies

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>

dccp: Resolve dependencies of features on choice of CCID

This provides a missing link in the code chain, as several features implicitly
depend and/or rely on the choice of CCID. Most notably, this is the Send Ack Vector
feature, but also Ack Ratio and Send Loss Event Rate (also taken care of).

For Send Ack Vector, the situation is as follows:
* since CCID2 mandates the use of Ack Vectors, there is no point in allowing
   endpoints which use CCID2 to disable Ack Vector features such a connection;

* a peer with a TX CCID of CCID2 will always expect Ack Vectors, and a peer
   with a RX CCID of CCID2 must always send Ack Vectors (RFC 4341, sec. 4);

* for all other CCIDs, the use of (Send) Ack Vector is optional and thus
   negotiable. However, this implies that the code negotiating the use of Ack
   Vectors also supports it (i.e. is able to supply and to either parse or
   ignore received Ack Vectors). Since this is not the case (CCID-3 has no Ack
   Vector support), the use of Ack Vectors is here disabled, with a comment
   in the source code.

An analogous consideration arises for the Send Loss Event Rate feature,
since the CCID-3 implementation does not support the loss interval options
of RFC 4342. To make such use explicit, corresponding feature-negotiation
options are inserted which signal the use of the loss event rate option,
as it is used by the CCID3 code.

Lastly, the values of the Ack Ratio feature are matched to the choice of CCID.

The patch implements this as a function which is called after the user has
made all other registrations for changing default values of features.

The table is variable-length, the reserved (and hence for feature-negotiation
invalid, confirmed by considering section 19.4 of RFC 4340) feature number `0'
is used to mark the end of the table.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>

dccp: Query supported CCIDs

This provides a data structure to record which CCIDs are locally supported
and three accessor functions:
- a test function for internal use which is used to validate CCID requests
made by the user;
- a copy function so that the list can be used for feature-negotiation;
- documented getsockopt() support so that the user can query capabilities.

The data structure is a table which is filled in at compile-time with the
list of available CCIDs (which in turn depends on the Kconfig choices).

Using the copy function for cloning the list of supported CCIDs is useful for
feature negotiation, since the negotiation is now with the full list of available
CCIDs (e.g. {2, 3}) instead of the default value {2}. This means negotiation
will not fail if the peer requests to use CCID3 instead of CCID2.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>

dccp: Registration routines for changing feature values

Two registration routines, for SP and NN features, are provided by this patch,
replacing a previous routine which was used for both feature types.

These are internal-only routines and therefore start with `__feat_register'.

It further exports the known limits of Sequence Window and Ack Ratio as symbolic
constants.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>

dccp: Limit feature negotiation to connection setup phase

This patch starts the new implementation of feature negotiation:
1. Although it is theoretically possible to perform feature negotiation at any
    time (and RFC 4340 supports this), in practice this is prohibitively complex,
    as it requires to put traffic on hold for each new negotiation.
2. As a byproduct of restricting feature negotiation to connection setup, the
    feature-negotiation retransmit timer is no longer required. This part is now
    mapped onto the protocol-level retransmission.
    Details indicating why timers are no longer needed can be found on
    http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/feature_negotiation/\
                                      implementation_notes.html

This patch disables anytime negotiation, subsequent patches work out full
feature negotiation support for connection setup.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp: Cleanup routines for feature negotiation

This inserts the required de-allocation routines for memory allocated by
feature negotiation in the socket destructors, replacing dccp_feat_clean()
in one instance.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>

dccp: Per-socket initialisation of feature negotiation

This provides feature-negotiation initialisation for both DCCP sockets and
DCCP request_sockets, to support feature negotiation during connection setup.

It also resolves a FIXME regarding the congestion control initialisation.

Thanks to Wei Yongjun for help with the IPv6 side of this patch.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>

dccp: List management for new feature negotiation

This adds list fields and list management functions for the new feature
negotiation implementation. The new code is kept in parallel to the old
code, until removed at the end of the patch set.

Thanks to Arnaldo for suggestions to improve the code.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>

dccp: Implement lookup table for feature-negotiation information

A lookup table for feature-negotiation information, extracted from RFC 4340/42,
is provided by this patch. All currently known features can be found in this
table, along with their feature location, their default value, and type.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>

dccp: Basic data structure for feature negotiation

This patch prepares for the new and extended feature-negotiation routines.

The following feature-negotiation data structures are provided:
* a container for the various (SP or NN) values,
* symbolic state names to track feature states,
* an entry struct which holds all current information together,
* elementary functions to fill in and process these structures.

Entry structs are arranged as FIFO for the following reason: RFC 4340 specifies
that if multiple options of the same type are present, they are processed in the
order of their appearance in the packet; which means that this order needs to be
preserved in the local data structure (the later insertion code also respects
this order).

The struct list_head has been chosen for the following reasons: the most
frequent operations are
* add new entry at tail (when receiving Change or setting socket options);
* delete entry (when Confirm has been received);
* deep copy of entire list (cloning from listening socket onto request socket).

The NN value has been set to 64 bit, which is a currently sufficient upper limit
(Sequence Window feature has 48 bit).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>

dccp ccid-3: Replace lazy BUG_ON with condition

The BUG_ON(w_tot == 0) only holds if there is no more than 1 loss interval in
the loss history. If there is only a single loss interval, the calc_i_mean()
routine need in fact not be called (RFC 3448, 6.3.1).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp: Toggle debug output without module unloading

This sets the sysfs permissions so that root can toggle the `debug'
parameter available for nearly every DCCP module. This is useful
since there are various module inter-dependencies. The debug flag
can now be toggled at runtime using

  echo 1 > /sys/module/dccp/parameters/dccp_debug
  echo 1 > /sys/module/dccp_ccid2/parameters/ccid2_debug
  echo 1 > /sys/module/dccp_ccid3/parameters/ccid3_debug
  echo 1 > /sys/module/dccp_tfrc_lib/parameters/tfrc_debug

The last is not very useful yet, since no code at the moment calls
the tfrc_debug() macro.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp: Empty the write queue when disconnecting

dccp_disconnect() can be called due to several reasons:

1. when the connection setup failed (inet_stream_connect());
2. when shutting down (inet_shutdown(), inet_csk_listen_stop());
3. when aborting the connection (dccp_close() with 0 linger time).

In case (1) the write queue is empty. This patch empties the write queue,
if in case (2) or (3) it was not yet empty.

This avoids triggering the write-queue BUG_TRAP in sk_stream_kill_queues()
later on.

It also seems natural to do: when breaking an association, to delete all
packets that were originally intended for the soon-disconnected end (compare
with call to tcp_write_queue_purge in tcp_disconnect()).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp: Fill in the Data fields for "Option Error" Resets

This updates the use of the `out_invalid_option' label, which produces a
Reset (code 5, "Option Error"), to fill in the Data1...Data3 fields as
specified in RFC 4340, 5.6.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp: Silently ignore options with nonsensical lengths

This updates the option-parsing code with regard to RFC 4340, 5.8:
"[..] options with nonsensical lengths (length byte less than two or more
  than the remaining space in the options portion of the header) MUST be
  ignored, and any option space following an option with nonsensical length
  MUST likewise be ignored."

Hence in the following cases erratic options will be ignored:
1. The type byte of a multi-byte option is the last byte of the header
    options (i.e. effective option length of 1).
2. The value of the length byte is less than the minimum 2. This has been
    changed from previously 3: although no multi-byte option with a length
    less than 3 yet exists (cf. table 3 in 5.8), a length of 2 is valid.
    (The switch-statement in dccp_parse has further per-option length checks.)
3. The option length exceeds the length of the remaining option space.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp: Always generate a Reset in response to option errors

RFC4340 states that if a packet is received with an option error (such as a
Mandatory Option as the last byte of the option list), the endpoint should
repond with a Reset.

In the LISTEN and RESPOND states, the endpoint correctly reponds with Reset,
while in the REQUEST/OPEN states, packets with option errors are just ignored.

The packet sequence is as follows:

Case 1:

  Endpoint A                           Endpoint B
  (CLOSED)                             (CLOSED)

               <----------------       REQUEST

  RESPONSE     ----------------->      (*1)
  (with invalid option)
               <----------------       RESET
                                       (with Reset Code 5, "Option Error")

  (*1) currently just ignored, no Reset is sent

Case 2:

  Endpoint A                           Endpoint B
  (OPEN)                               (OPEN)

  DATA-ACK     ----------------->      (*2)
  (with invalid option)
               <----------------       RESET
                                       (with Reset Code 5, "Option Error")

  (*2) currently just ignored, no Reset is sent

This patch fixes the problem, by generating a Reset instead of silently
ignoring option errors.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>

Merge branch 'davem-fixes' of /linux/kernel/git/jgarzik/netdev-2.6

bnx2x: Accessing un-mapped page

The allocated RX buffer size was 64 bytes bigger than the PCI mapped
size with no good reason. If the packet was actually using the buffer up
to its limit and if the last 64 bytes of the buffer crossed 4KB boundary
then an unmapped PCI page was accessed. The fix is to use only one
parameter for the buffer size - there is no need to differentiate
between the buffer size and the PCI mapping size since the extra 64
bytes can actually be used by the FW to align the Ethernet payload to
64 bytes.

Also updating the driver version and date

Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git./linux/kernel/git/linville/wireless-2.6

ath9k: Fix TX control flag use for no ACK and RTS/CTS

Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

ath9k: Fix TX status reporting

Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

iwlwifi: fix STATUS_EXIT_PENDING is not set on pci_remove

This patch sets STATUS_EXIT_PENDING on pci_remove. Otherwise
iwl4965_down may fail to uninitialize the driver.

Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Signed-off-by: Mohamed Abbas <mohamed.abbas@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

iwlwifi: call apm stop on exit

This patch calls apm stop on exit and suspend. Without this patch
hardware consumes power even after driver is removed or suspended.

Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Signed-off-by: Mohamed Abbas <mohamed.abbas@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

iwlwifi: fix Tx cmd memory allocation failure handling

This patch "iwlwifi: do not use GFP_DMA in iwl_tx_queue_init" removes
GFP_DMA from allocation tx command buffers. GFP_DMA allows allocation
only for memory under 16M which causes allocation problems
suspend/resume flows.

Using kmalloc is temporal solution and some consistent/coherent
allocation schema will be more correct. Since iwlwifi hardware
supports 64bit address this solution should work on x86 (32 and
64bit) for now.

This patch fixes memory freeing problem in the previous patch.

Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Ian Schram <ischram@telenet.be>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

iwlwifi: fix rx_chain computation

This patch fixes rx_chain computation. The code that adjusts number of
rx chains to number supported by HW was missing. Miss configuration
causes firmware error. Note: iwlwifi supports HW with up to 3 RX
chains (2x2, 2x3, 1x2, and 3x3 MIMO). This patch also simplifies the
whole RX chain computation.

Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Mohamed Abbas <mohamed.abbas@intel.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

iwlwifi: fix station mimo power save values

This patch fixes the wrong use MIMO power save values. Our TX was
configured with our MIMO power save values instead of peer's MIMO power
save values, this may affect connectivity. The peer STA/AP may not sense
our traffic at all as it doesn't have all RX chains opened.

Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

iwlwifi: remove false rxon if rx chain changes

Rx chain might change during power save transitions but it doesn't
require sending Full-ROXN command to the firmware. Full-RXON requires
reconnection to an AP and thus affects user experience. The patch
avoids the Full-RXON by removing the rx_chain modification check in
iwl_full_rxon_required function.

Signed-off-by: Mohamed Abbas <mohamed.abbas@intel.com
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

iwlwifi: fix hidden ssid discovery in passive channels

This enables sending of direct probes on passive channels, as long as
traffic was detected on that channel. This enables connectivity to
hidden/non broadcasting SSIDs APs on passive channels. Note 5000 HW
declares all 5.2 spectrum as passive.

Signed-off-by: Cahill Ben <ben.m.cahill@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

iwlwifi: W/A for the TSF correction in IBSS

This patch is a W/A for the TSF sync issue in IBSS merging. HW is not
capable to sync TSF (it's constantly little behind). This creates
constant IBSS merging upon reception of each beacon, adding and removing
station which in turn creates above 50% packet loss and thus dramatically
degrade the throughput. The W/A simply stops the driver from declaring it
has a reliable TSF value and thus eliminates IBSS merging.

Signed-off-by: Assaf Krauss <assaf.krauss@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

netxen: Remove workaround for chipset quirk

Remove chipset-specific quirk workaround; the workaround caused
unrecoverable DMA lockups when the driver was loaded following a
PXE boot.

Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com>
Signed-off-by: Michael Brown <mbrown@fensystems.co.uk>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

pcnet-cs, axnet_cs: add new IDs, remove dup ID with less info

pcnet_cs:
    add new ID: "corega Ether PCC-TD".
    remove duplicate ID: "IC-CARD".

axnet_cs:
    add new ID: "IO DATA ETXPCM".

Signed-off-by: Komuro <komurojun-mbn@nifty.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

ixgbe: initialize interrupt throttle rate

This commit dropped the setting of the default interrupt throttle rate.

commit 021230d40ae0e6508d6c717b6e0d6d81cd77ac25
Author: Ayyappan Veeraiyan <ayyappan.veeraiyan@intel.com>
Date:   Mon Mar 3 15:03:45 2008 -0800

    ixgbe: Introduce MSI-X queue vector code

The following patch adds it back.  Without this the default value of 0
causes the performance of this card to be awful.  Restoring these to the
default values yields much better performance.

This regression has been around since 2.6.25.

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
CC: stable@kernel.org [2.6.25 and later]
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

net/usb/pegasus: avoid hundreds of diagnostics

Make the "pegasus" driver scream less loudly in the face of
problems as it initializes, avoiding hundreds of messages:

- ratelimit some key error messages
- avoid some spurious diagnostics caused by strange codeflow

And fix one instance of goofy indentation.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

tipc: Don't use structure names which easily globally conflict.

Andrew Morton reported a build failure on sparc32, because TIPC
uses names like "struct node" and there is a like named data
structure defined in linux/node.h

This just regexp replaces "struct node*" to "struct tipc_node*"
to avoid this and any future similar problems.

Signed-off-by: David S. Miller <davem@davemloft.net>

ipsec: Fix deadlock in xfrm_state management.

Ever since commit 4c563f7669c10a12354b72b518c2287ffc6ebfb3
("[XFRM]: Speed up xfrm_policy and xfrm_state walking") it is
illegal to call __xfrm_state_destroy (and thus xfrm_state_put())
with xfrm_state_lock held. If we do, we'll deadlock since we
have the lock already and __xfrm_state_destroy() tries to take
it again.

Fix this by pushing the xfrm_state_put() calls after the lock
is dropped.

Signed-off-by: David S. Miller <davem@davemloft.net>

ipv: Re-enable IP when MTU > 68

Re-enable IP when the MTU gets back to a valid size.

This patch just checks if the in_dev is NULL on a NETDEV_CHANGEMTU event
and if MTU is valid (bigger than 68), then re-enable in_dev.

Also a function that checks valid MTU size was created.

Signed-off-by: Breno Leitao <leitao@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/xfrm: Use an IS_ERR test rather than a NULL test

In case of error, the function xfrm_bundle_create returns an ERR
pointer, but never returns a NULL pointer. So a NULL test that comes
after an IS_ERR test should be deleted.

The semantic match that finds this problem is as follows:
(http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@match_bad_null_test@
expression x, E;
statement S1,S2;
@@
x = xfrm_bundle_create(...)
... when != x = E
* if (x != NULL)
S1 else S2
// </smpl>

Signed-off-by: Julien Brunel <brunel@diku.dk>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>

ath9: Fix ath_rx_flush_tid() for IRQs disabled kernel warning message.

This patch addresses an issue with the locking order. ath_rx_flush_tid()
uses spin_lock/unlock_bh when IRQs are disabled in sta_notify by mac80211.

As node clean up is still pending with ath9k and this problematic portion
of the code is expected to change anyway, thinking of a proper fix may not
be worthwhile. So having this interim fix helps the users to get rid of the
kernel warning message.

Pasted the kernel warning message for reference.

kernel: ath0: No ProbeResp from current AP 00:1b:11:60:7a:3d - assume out of range
kernel: ------------[ cut here ]------------
kernel: WARNING: at kernel/softirq.c:136 local_bh_enable+0x3c/0xab()
kernel: Pid: 1029, comm: ath9k Not tainted 2.6.27-rc4-wt-w1fi-wl
kernel:
kernel: Call Trace:
kernel:  [<ffffffff802278d8>] warn_on_slowpath+0x51/0x77
kernel:  [<ffffffff80224c51>] check_preempt_wakeup+0xf3/0x123
kernel:  [<ffffffff80239658>] autoremove_wake_function+0x9/0x2e
kernel:  [<ffffffff8022c281>] local_bh_enable+0x3c/0xab
kernel:  [<ffffffffa01ab75a>] ath_rx_node_cleanup+0x38/0x6e [ath9k]
kernel:  [<ffffffffa01b2280>] ath_node_detach+0x3b/0xb6 [ath9k]
kernel:  [<ffffffffa01ab09f>] ath9k_sta_notify+0x12b/0x165 [ath9k]
kernel:  [<ffffffff802366cf>] queue_work+0x1d/0x49
kernel:  [<ffffffffa018c3fc>] add_todo+0x70/0x99 [mac80211]
kernel:  [<ffffffffa017de76>] __sta_info_unlink+0x16b/0x19e [mac80211]
kernel:  [<ffffffffa017e6ed>] sta_info_unlink+0x18/0x43 [mac80211]
kernel:  [<ffffffffa0182732>] ieee80211_associated+0xaa/0x16d [mac80211]
kernel:  [<ffffffffa0184a1a>] ieee80211_sta_work+0x4fb/0x6b4 [mac80211]
kernel:  [<ffffffff80469c58>] thread_return+0x30/0xa9
kernel:  [<ffffffffa018451f>] ieee80211_sta_work+0x0/0x6b4 [mac80211]
kernel:  [<ffffffff802362c2>] run_workqueue+0xb1/0x17a
kernel:  [<ffffffff80236be9>] worker_thread+0xd0/0xdb
kernel:  [<ffffffff8023964f>] autoremove_wake_function+0x0/0x2e
kernel:  [<ffffffff80236b19>] worker_thread+0x0/0xdb
kernel:  [<ffffffff8023954a>] kthread+0x47/0x75
kernel:  [<ffffffff80223121>] schedule_tail+0x18/0x50
kernel:  [<ffffffff8020bc49>] child_rip+0xa/0x11
kernel:  [<ffffffff80239503>] kthread+0x0/0x75
kernel:  [<ffffffff8020bc3f>] child_rip+0x0/0x11
kernel:
kernel: ---[ end trace e9bb5da661055827 ]---

Signed-off-by: Senthil Balasubramanian <senthilkumar@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

ath9k: Incorrect key used when group and pairwise ciphers are different.

Updating sc_keytype multiple times when groupwise and pairwise
ciphers are different results in incorrect pairwise key type
assumed for TX control and normal ping fails. This works fine
for cases where both groupwise and pairwise ciphers are same.

Also use mac80211 provided enums for key length calculation.

Signed-off-by: Senthil Balasubramanian <senthilkumar@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

rt2x00: Compiler warning unmasked by fix of BUILD_BUG_ON

A "Set" to a sign-bit in an "&" operation causes a compiler warning.
Make calculations unsigned.

[ The warning was masked by the old definition of BUILD_BUG_ON() ]

Also remove __builtin_constant_p from FIELD_CHECK since BUILD_BUG_ON
no longer permits non-const values.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

mac80211: Fix debugfs union misuse and pointer corruption

debugfs union in struct ieee80211_sub_if_data is misused by including a
common default_key dentry as a union member. This ends occupying the same
memory area with the first dentry in other union members (structures;
usually drop_unencrypted). Consequently, debugfs operations on
default_key symlinks and drop_unencrypted entry are using the same
dentry pointer even though they are supposed to be separate ones. This
can lead to removing entries incorrectly or potentially leaving
something behind since one of the dentry pointers gets lost.

Fix this by moving the default_key dentry to a new struct
(common_debugfs) that contains dentries (more to be added in future)
that are shared by all vif types. The debugfs union must only be used
for vif type-specific entries to avoid this type of pointer corruption.

Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

wireless/libertas/if_cs.c: fix memory leaks

The leak in if_cs_prog_helper() is obvious.

It looks a bit as if not freeing "fw" in if_cs_prog_real() was done
intentionally, but I'm not seeing why it shouldn't be freed.

Reported-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Holger Schurig <hs4233@mail.mn-solutions.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

orinoco: Multicast to the specified addresses

When multicasting the driver sets the number of group addresses using
the count from the previous set multicast command. In general this means
you have to set the multicast addresses twice to get the behaviour you
want.

If we were multicasting, and reduce the number of addresses we are
multicasting to, then the driver would write uninitialised data from the
stack into the group addresses to multicast to.

Only write the multicast addresses we have specifically set.

Signed-off-by: David Kilroy <kilroyd@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

iwlwifi: fix 64bit platform firmware loading

This patch fixes loading firmware from memory above 32bit.

Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Acked-by: Marcel Holtmann <holtmann@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

iwlwifi: fix apm_stop (wrong bit polarity for FLAG_INIT_DONE)

The patch fixes CSR_GP_CNTRL_REG_FLAG_INIT_DONE was set instead of
cleared which disabled moving device to D0U state.

Signed-off-by: Mohamed Abbas <mohamed.abbas@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

iwlwifi: workaround interrupt handling no some platforms

This patch adds workaround for an interrupt related hardware bug on
some platforms. (Apparently these platforms boot-up w/ INTX_DISABLED
set. -- JWL)

Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

iwlwifi: do not use GFP_DMA in iwl_tx_queue_init

GFP_DMA is not necessary for the iwlwifi hardware and it can cause
allocation failures and/or invoke the OOM killer on lots of systems.

For reference:

https://bugzilla.redhat.com/show_bug.cgi?id=459709

Signed-off-by: John W. Linville <linville@tuxdriver.com>

net/wireless/Kconfig: clarify the description for CONFIG_WIRELESS_EXT_SYSFS

Current setup with hal and NetworkManager will fail to work
without newest hal version with this config option disabled.

Although this will solve itself by time, at the moment it is
dishonest to say that we don't know any software that uses it,
if there are many many people relying on old hal versions.

Signed-off-by: Florian Mickler <florian@mickler.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

net: Unbreak userspace usage of linux/mroute.h

Nothing in linux/pim.h should be exported to userspace.

This should fix the XORP build failure reported by
Jose Calhariz, the debain package maintainer.

Nothing originally in linux/mroute.h was exported to userspace
ever, but some of this stuff started to be when it was moved into
this new linux/pim.h, and that was wrong. If we didn't provide these
definitions for 10 years we can reasonably expect that applications
defined this stuff locally or used GLIBC headers providing the
protocol definitions. And as such the only result of this can
be conflict and userland build breakage.

Signed-off-by: David S. Miller <davem@davemloft.net>

pkt_sched: Fix locking of qdisc_root with qdisc_root_sleeping_lock()

Use qdisc_root_sleeping_lock() instead of qdisc_root_lock() where
appropriate. The only difference is while dev is deactivated, when
currently we can use a sleeping qdisc with the lock of noop_qdisc.
This shouldn't be dangerous since after deactivation root lock could
be used only by gen_estimator code, but looks wrong anyway.

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: When we droped a packet, we should return NET_RX_DROP instead of 0

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: fix random memory dereference with SCTP_HMAC_IDENT option.

The number of identifiers needs to be checked against the option
length. Also, the identifier index provided needs to be verified
to make sure that it doesn't exceed the bounds of the array.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: correct bounds check in sctp_setsockopt_auth_key

The bonds check to prevent buffer overlflow was not exactly
right. It still allowed overflow of up to 8 bytes which is
sizeof(struct sctp_authkey).

Since optlen is already checked against the size of that struct,
we are guaranteed not to cause interger overflow either.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

wan: Missing capability checks in sbni_ioctl()

There are missing capability checks in the following code:

1300 static int
1301 sbni_ioctl( struct net_device  *dev,  struct ifreq  *ifr,  int  cmd)
1302 {
[...]
1319     case  SIOCDEVRESINSTATS :
1320         if( current->euid != 0 )    /* root only */
1321             return  -EPERM;
[...]
1336     case  SIOCDEVSHWSTATE :
1337         if( current->euid != 0 )    /* root only */
1338             return  -EPERM;
[...]
1357     case  SIOCDEVENSLAVE :
1358         if( current->euid != 0 )    /* root only */
1359             return  -EPERM;
[...]
1372     case  SIOCDEVEMANSIPATE :
1373         if( current->euid != 0 )    /* root only */
1374             return  -EPERM;

Here's my proposed fix:

Missing capability checks.

Signed-off-by: Eugene Teo <eugeneteo@kernel.sg>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'no-iwlwifi' of git://git./linux/kernel/git/linville/wireless-2.6

Merge branch 'davem-fixes' of /linux/kernel/git/jgarzik/netdev-2.6

e100, fix iomap read

There were 2 omitted readb's used on an iomap space. eliminate them
by using ioread8 instead.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Bruce Allan <bruce.w.allan@intel.com>
Cc: PJ Waskiewicz <peter.p.waskiewicz.jr@intel.com>
Cc: John Ronciak <john.ronciak@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

qeth: preallocated header account offset

When a preallocated header qdio buffer is filled we have to account
the offset for the data length.

Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

qeth: l2 write unicast list to hardware

In case the netdev unicast list contains additional entries we have
to register/deregister them.

Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

qeth: use -EOPNOTSUPP instead of -ENOTSUPP.

return value -ENOTSUPP is not valid in userspace context, use
-EOPNOTSUPP instead.

Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

ibm_newemac: Don't call dev_mc_add() before device is registered

We must not call dev_mc_add() from within our HW configure which happens
before we initialize and register the netdev. Do it in open() instead.

Thanks to Sebastian Siewior for tracking it down.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

net: don't grab a mutex within a timer context in gianfar

I got the following backtrace while network was unavailble:

|NETDEV WATCHDOG: eth0: transmit timed out
|BUG: sleeping function called from invalid context at /home/bigeasy/git/linux-2.6-powerpc/kernel/mutex.c:87
|in_atomic():1, irqs_disabled():0
|Call Trace:
|[c0383d90] [c0006dd8] show_stack+0x48/0x184 (unreliable)
|[c0383db0] [c001e938] __might_sleep+0xe0/0xf4
|[c0383dc0] [c025a43c] mutex_lock+0x24/0x3c
|[c0383de0] [c019005c] phy_stop+0x20/0x70
|[c0383df0] [c018d4ec] stop_gfar+0x28/0xf4
|[c0383e10] [c018e8c4] gfar_timeout+0x30/0x60
|[c0383e20] [c01fe7c0] dev_watchdog+0xa8/0x144
|[c0383e30] [c002f93c] run_timer_softirq+0x148/0x1c8
|[c0383e60] [c002b084] __do_softirq+0x5c/0xc4
|[c0383e80] [c00046fc] do_softirq+0x3c/0x54
|[c0383e90] [c002ac60] irq_exit+0x3c/0x5c
|[c0383ea0] [c000b378] timer_interrupt+0xe0/0xf8
|[c0383ec0] [c000e5ac] ret_from_except+0x0/0x18
|[c0383f80] [c000804c] cpu_idle+0xcc/0xdc
|[c0383fa0] [c025c07c] etext+0x7c/0x90
|[c0383fc0] [c0338960] start_kernel+0x294/0x2a8
|[c0383ff0] [c00003dc] skpinv+0x304/0x340
|------------[ cut here ]------------

The phylock was once a spinlock but got changed into a mutex via
commit 35b5f6b1a aka [PHYLIB: Locking fixes for PHY I/O potentially sleeping]

Signed-off-by: Sebastian Siewior <bigeasy@linutronix.de>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

forcedeth: fix checksum flag

Fix the checksum feature advertised in device flags. The hardware support
TCP/UDP over IPv4 and TCP/UDP over IPv6 (without IPv6 extension headers).
However, the kernel feature flags do not distinguish IPv6 with/without
extension headers.

Therefore, the driver needs to use NETIF_F_IP_CSUM instead of
NETIF_F_HW_CSUM since the latter includes all IPv6 packets.

A future patch can be created to check for extension headers and perform
software checksum calculation.

Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com>
Cc: Jeff Garzik <jgarzik@pobox.com>
Cc: Manfred Spraul <manfred@colorfullife.com
Cc: <stable@kernel.org> [2.6.25.x, 2.6.26.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

net/usb/mcs7830: add set_mac_address

Implement set_mac_address for mcs7830. This enables me to use it with my
cable modem.

Signed-off-by: Oliver Martin <oliver.martin@student.tuwien.ac.at>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

net/usb/mcs7830: new device IDs

This adds USB device IDs for MosChip 7730 and Sitecom LN030
to the mcs7830 driver. The IDs have been reported to work without
further modifications.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: David Brownell <david-b@pacbell.net>
Cc: Viktor Horvath <ViktorHorvath@gmx.net>
Cc: Robbert Wethmar <robbert@wethmar.nl>
Cc: Bart van der Klip <bklip@xs4all.nl>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

[netdrvr] smc91x: fix resource removal (null ptr deref)

Properly handle resource cleanup on unplug/exit.

Spotted by Jonathan Cameron

Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

ibmveth: fix bad UDP checksums

This patch fixes a ibmveth bug where bad UDP checksums are being transmitted
when checksum offloading is enabled.
The hypervisor does checksum offloading only on TCP packets, so ibmveth calls
skb_checksum_help() for any other protocol. The bug happens because
the packet is being modified after the DMA map, so we would need a memory
barrier before making the hypervisor call. Reordering the code so that the
DMA map happens after skb_checksum_help() has the additional advantage of
fixing a DMA map leak if skb_checksum_help() where to fail.

Signed-off-by: Santiago Leon <santil@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

[netdrvr] hso: dev_kfree_skb crash fix

Fixes dev_kfree_skb happening too many times when hso_start_net_device
is called from hso_resume.

Signed-off-by: Denis Joseph Barrow <D.Barow@option.com>
Cc: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

[netdrvr] hso: icon 322 detection fix

Fixes Icon-322 detection.

Signed-off-by: Denis Joseph Barrow <D.Barow@option.com>
Cc: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

atl1: disable TSO by default

The atl1 driver is causing stalled connections and file corruption
whenever TSO is enabled. Two examples are here:

http://lkml.org/lkml/2008/7/15/325
http://lkml.org/lkml/2008/8/18/543

Disable TSO by default until we can determine the source of the
problem.

Signed-off-by: Jay Cliburn <jacliburn@bellsouth.net>
cc: stable@kernel.org
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

atl1e: multistatement if missing braces

Doesn't cause problems (yet) because err gets zeroed earlier.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

igb: remove 82576 quad adapter

Disable support for device 8086:10E8. Currently the result of loading the
driver with the device present causes system instability.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

drivers/net/skfp/ess.c: fix compile warnings

CC [M] drivers/net/skfp/ess.o
drivers/net/skfp/ess.c: In function 'ess_send_response':
drivers/net/skfp/ess.c:513: warning: cast from pointer to integer of different size
drivers/net/skfp/ess.c: In function 'ess_send_alc_req':
drivers/net/skfp/ess.c:609: warning: cast from pointer to integer of different size
drivers/net/skfp/ess.c:639: warning: cast from pointer to integer of different size

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

ipv4: mode 0555 in ipv4_skeleton

vpnc on today's kernel says Cannot open "/proc/sys/net/ipv4/route/flush":
d--------- 0 root root 0 2008-08-26 11:32 /proc/sys/net/ipv4/route
d--------- 0 root root 0 2008-08-26 19:16 /proc/sys/net/ipv4/neigh

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: fix tcp header size miscalculation when window scale is unused

The size of the TCP header is miscalculated when the window scale ends
up being 0. Additionally, this can be induced by sending a SYN to a
passive open port with a window scale option with value 0.

Signed-off-by: Philip Love <love_phil@emc.com>
Signed-off-by: Adam Langley <agl@imperialviolet.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

pkt_sched: Fix sch_tree_lock()

Use new qdisc_root_sleeping_lock() instead of qdisc_root_lock() as
sch_tree_lock() because this lock could be used while dev is
deactivated, but we never need to use this with noop_qdisc as a root.

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

pkt_sched: Fix gen_estimator locks

While passing a qdisc root lock to gen_new_estimator() and
gen_replace_estimator() dev could be deactivated or even before
grafting proper root qdisc as qdisc_sleeping (e.g. qdisc_create), so
using qdisc_root_lock() is not enough. This patch adds
qdisc_root_sleeping_lock() for this, plus additional checks, where
necessary.

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>