/* @(#)WhirlpoolMD_64.java	3.0	2005-03-01
 * This file was freely contributed to the LimeWire project and is covered
 * by its existing GPL licence, but it may be used individually as a public
 * domain implementation of a published engine (see below for references).
 * It was also freely contributed to the Bitzi public domain sources.
 * @author Philippe Verdy
 */

/* Sun may wish to change the following package name, if integrating this
 * class in the Sun JCE Security Provider for Java.
 *
 * You can include it in your own Security Provider by inserting this property
 * in your Provider-derived class:
 * put("MessageDigest.Whirlpool", "org.rodage.pub.java.security.WhirlpoolMD_64");
 */
package org.rodage.pub.java.security;

import java.security.DigestException;
import java.security.MessageDigestSpi;
//--+---+1--+---+--2+---+---+3--+---+--4+---+---+5--+---+--6+---+---+7--+---+--
//34567890123456789012345678901234567890123456789012345678901234567890123456789

/**
 * The Whirlpool 3.0 hashing function (64-bit implementation for JCE SPI).
 *
 * <p>This class is a highly optimized Java implementation of the WhirlpoolMD_64
 * 512-bit message digest algorithm, based on the sample Java and C code, and
 * documentation, published by its authors at the European NESSIE cryptographic
 * workshop as a recommanded algorithm (along with SHA256 and SHA512).</p>
 *
 * <p>The algorithm allows digesting data files with sizes up to
 * 2<sup>256</sup>-1 bits, i.e. more than 4.5*10<sup>74</sup> bytes, that can
 * be feeded by data chunks up to 2 GB in this implementation.</p>
 *
 * <p>Note that this class uses 64-bit arithmetic and large lookup tables,
 * which should work best on the newest desktop processors with large caches.
 * A 32-bit implementation is also available which uses tables that are twice
 * smaller, and that will run with less than 5% additional cycles per digested
 * byte on these modern 64-bit (AMD64, Pentium 4-6) or 32-bit processors
 * (Pentium 4 or M, Athlon), but that may run faster on cheaper processors with
 * smaller data caches and slow memory bus (see <code>WhirlpoolMD_32</code>).</p>
 *
 * <p><i>May be in the future, this class will be part of the standard JCE
 * (Java Cryptography Environment) bundled with Java Runtime 1.5 or later.
 * For now Java 1.4 (code named Merlin), and 1.5-beta do not have this digest
 * algorithm, and only includes Adler32 or CRC32 (in java.util.zip),
 * and MD4, MD5 or SHA1 (in javax.crypto.Mac, included in the SUN JCE),
 * and SHA256 or SHA512 (included in the SUN JCE, since Java 1.5-beta).</i></p>
 *
 * <p><b>Advantages:</b></p>
 *
 * <p>WhirlpoolMD_64 is much more scalable than most modern hashing functions. Even
 * though is not specifically oriented toward any platform, it is rather
 * efficient on many of them, its structure favouring extensively parallel
 * execution of the component mappings. At the same time, it does not require
 * excessive storage space (either for code or for tables), and can therefore
 * be efficiently implemented in quite constrained environments like smart
 * cards, although it can benefit from larger cache memory available on modern
 * processors to achieve higher performance.</p>
 *
 * <p>It does not use expensive or unusual instructions that must be built in
 * the processor. The mathematical simplicity of the primitive resulting from
 * the design strategy tends to make analysis easier.</p>
 *
 * <p>And finally, it has a very long hash length; this not only provides
 * increased protection against birthday attacks, but also offers a larger
 * internal state for entropy containment, as is needed for certain classes of
 * pseudo-random number generators (See: J. Kelsey, B. Schneier, D. Wagner,
 * and C. Hall, "Cryptanalytic Attacks on Pseudorandom Number Generators",
 * Fast Software Encryption – FSE’98, Lecture Notes in Computer Science,
 * vol. 1372, Springer-Verlag, 1998, pp. 168–188.)</p>
 *
 * <p><b>Expected Strength:</b></p>
 *
 * <p>Assume we take as digest result the value of any n-bit substring of the
 * full 512-bit WhirlpoolMD_64 output. Then:</p>
 * <ul>
 * <li>The expected workload of generating a collision pair is of the order of
 *     2<sup>n</sup>/2 executions of WhirlpoolMD_64 (i.e. the maximum expected order
 *     for an attack based on the "Birthday paradox" theorem applied on any
 *     cryptographically strong n-bit message digest algorithm).</li>
 * <li>Given an n-bit value, the expected workload of finding a message that
 *     digests to that value is of the order of 2<sup>n</sup> executions of
 *     WhirlpoolMD_64.</li>
 * <li>Given a message and its n-bit digest result, the expected workload of
 *     finding a second message that digests to the same value is of the order
 *     of 2<sup>n</sup> executions of WhirlpoolMD_64.</li>
 * </ul>
 *
 * <p>Moreover, it is infeasible to detect systematic correlations between any
 * linear combination of input bits and any linear combination of bits of the
 * hash result. It is also infeasible to predict what bits of the hash result
 * will change value when certain input bits are flipped, i.e. WhirlpoolMD_64 is
 * resistant against differential attacks.</p>
 *
 * <p>These claims result from the considerable safety margin taken with
 * respect to all known attacks (in 2004). We do however realise that it is
 * impossible to make nonspeculative statements on things unknown.</p>
 *
 * <p>All these mean that it is possible to use truncated digest results of
 * this 512-bit algorithm in applications that need a shorter message digest,
 * whilst also keeping the cryptographic strength. For example, this algorithm
 * can be used as well to generate 256-bit message digests (instead of SHA256),
 * or 192-bit message digests (instead of Tiger, which runs slower and requires
 * 64-bit arithmetic support) or 160-bit message digests (instead of using SHA1,
 * whose cryptographic strength is now less secure than WhirlpoolMD_64).</p>
 *
 * <p><b>Summary Description of the Algorithm:</b></p>
 *
 * <p>The WhirlpoolMD_64 input is a suite of 64-bytes blocks built from the input
 * data (using bit-big-endian bytes). The last block of a digested message is
 * padded with a required bit which must be set to 1, the remaining padding
 * bits are set to 0, except for the last 256 bits of the last block which
 * encodes in big-endian order the total bitlength of the hashed source data.</p>
 *
 * <p>Each 64-byte block is internally transformed in 10 successive rounds
 * based on a dedicated block-cipher, which operates on a 512-bit hash state
 * using a chained key state, both derived from the input data. The
 * block-cipher function uses a proven strong non-linear substitution box on
 * each byte of the 8x8 data matrix, a cyclical permutation that disperse the
 * bytes of each row among all rows, a linear diffusion layer to mix the bytes
 * in each state row, and a exor key addition using the chained key state and
 * distinct round constants.</p>
 *
 * <p>It then computes a 512-bit digest based on the proven strong compression
 * Miyaguchi-Preneel hashing scheme for successive blocks. The final digest is
 * considered at least as strong, but simpler to compute than 512-bit SHA-512.
 * The algorithm itself is not sensitive to endianness of integers within its
 * cipher and compression functions, anthough the reference implementation
 * uses big-endian ordering of bytes for integers.</p>
 *
 * <p>The exact specification of the algorithm implemented here integrates the
 * following changes, that have been made part of the algorithm approved in the
 * final European NESSIE workshop recommandation, published in 2004:</p>
 * <ul>
 * <li>version 3.0 (2003.03.12): suboptimal diffusion matrix replaced by
 *     cir(1, 1, 4, 1, 8, 5, 2, 9), eliminating the minor flaw discovered
 *     by Shirai and Shibutani, during the final evaluation of the algorithm
 *     for the NESSIE workshop in 2003.</li>
 * <li>version 2.1 (bug fix): nonzero carry was ignored when tallying the data
 *     length (this bug apparently only manifested itself when feeding data
 *     in pieces rather than in a single chunk at once).</li>
 * <li>version 2.0: original substitution box replaced by the tweaked,
 *     hardware-efficient version.</li>
 * <li>version 1.0: initial design and implementation for NESSIE.</li>
 * </ul>
 *
 * <p><b>References:</b></p>
 *
 * <p>The WhirlpoolMD_64 message digest algorithm was developed by:</p>
 * <ul>
 * <li><a href="mailto:&quot;Paulo S. L. M. Barreto&quot;
 *     &lt;pbarreto AT scopus DOT com DOT br&gt;">Paulo S. L. M.
 *     <b>Barreto</b></a>, from <b>Scopus Tecnologia S.A.</b> (Av. Mutinga,
 *     4105 - Pirituba, BR-05110-000 Sao Paulo (SP), Brasil), and</li>
 * <li><a href="mailto:&quot;Vincent Rijmen&quot;
 *     &lt;vincent DOT rijmen AT cryptomathic DOT com&gt;">Vincent
 *     <b>Rijmen</b></a>, from <b>Cryptomathic NV</b> (Lei 8A, B-3000 Leuven,
 *     Belgium).</li>
 * </ul>
 *
 * <p>See: P.S.L.M. Barreto, V. Rijmen, "The WhirlpoolMD_64 hashing function", first
 * NESSIE workshop, 2000 (final tweaked version 3.0, 2003.03.12), <a
 * href="https://www.cosic.esat.kuleuven.ac.be/nessie/workshop/submissions/whirlpool.zip">
 * https://www.cosic.esat.kuleuven.ac.be/nessie/workshop/submissions/whirlpool.zip</a></p>
 *
 * <p>NESSIE, New European Schemes for Signatures, Integrity, and Encryption,
 * <a href="http://cryptonessie.org/">http://cryptonessie.org/</a>, 2000.
 *
 * <p>WhirlpoolMD_64 is named after the WhirlpoolMD_64 galaxy in Canes Venatici (M51, or
 * NGC 5194), the first one recognised to have spiral structure by William Parsons,
 * third Earl of Rosse, in April 1845. See: M. Hoskin, "The Cambridge Illustrated
 * History of Astronomy", Cambridge University Press, London, 1997.</p>
 *
 * <p><b>Important Disclaimer by the Authors of the Reference Algorithm:</b></p>
 *
 * <p>THIS SOFTWARE IS PROVIDED BY THE AUTHORS &quot;AS IS&quot; AND ANY
 * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
 * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 * DISCLAIMED. IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS BE LIABLE FOR ANY
 * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
 * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
 * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
 * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.</p>
 *
 * @version 3.0 2005-03-01
 *
 * @author Philippe Verdy (optimizations and <code>MessageDigestSpiTestable</code>-based
 *         implementation conforming with the standard JCE), Paulo S.L.M.
 *         Barreto and Vincent Rijmen (initial design and documentation, and
 *         sample reference C and Java implementations).
 *
 * see WhirlpoolMD_32  the same digest algorithm but with a 32-bit implementation.
 */
public class WhirlpoolMD_64 extends MessageDigestSpi implements Cloneable {

    /**
     * Creates a Whirlpool digest with default initial state.
     */
    public WhirlpoolMD_64() {
        // The initial content of the pad buffer doesn't matter.
        pad = new byte[64];
        // The initial counters are already set to 0 by the default constructor.
        // Initialize the digest state with the init vector values.
        init();
    }

    /**
     * Clones this object.
     */
    public Object clone() throws CloneNotSupportedException {
        WhirlpoolMD_64 that = (WhirlpoolMD_64)super.clone();
        that.pad = (byte[])this.pad.clone();
        return that;
    }

    /**
     * Reset then initialize the digest context.
     *
     * Overrides the protected abstract method of
     * <code>java.security.MessageDigestSpiTestable</code>.
     */
    protected void engineReset() {
        // Erase the evidence from previously computed digest (only for security).
        int i = 60;
        do {
           pad[i    ] = (byte)0x00;
           pad[i + 1] = (byte)0x00;
           pad[i + 2] = (byte)0x00;
           pad[i + 3] = (byte)0x00;
        } while ((i -= 4) >= 0);
        // Reset the initial counters.
        padding = 0;
        n0 = n1 = n2 = n3 = n4 = n5 = n6 = n7 = 0;
        // Reset the digest state with the init vector values.
        init();
    }

    /**
     * Returns the digest length in bytes.
     *
     * Can be used to allocate your own output buffer when
     * computing multiple digests.
     *
     * Overrides the protected abstract method of
     * <code>java.security.MessageDigestSpiTestable</code>.
     * @return the digest length in bytes.
     */
    protected int engineGetDigestLength() {
        return HASH_LENGTH;
    }

    /**
     * Initialize the digest context.
     */
    protected void init() {
        h0 = h1 = h2 = h3 = h4 = h5 = h6 = h7 = 0L;
    }

    /**
     * This implementation returns a fixed-size digest.
     */
    private static final int HASH_LENGTH = 64; /* bytes (512 bits) */

    /**
     * The substitution box (256 distinct bytes packed into a 128-chars String).
     * (Note that this is not a valid Unicode text, as it contains:
     * - one unpaired high surrogate: \uDBA1,
     * - tree unpaired low surrogates: \uDC0B, \uDD17, \uDE1C.
     * However, a String in Java can contain any sequence of UCS2
     * codepoints, and is not restricted to valid Unicode text).
     */
    private static final String SBOX =
        "\u1823\uC6E8\u87B8\u014F\u36A6\uD2F5\u796F\u9152" +
        "\u60BC\u9B8E\uA30C\u7B35\u1DE0\uD7C2\u2E4B\uFE57" +
        "\u1577\u37E5\u9FF0\u4ADA\u58C9\u290A\uB1A0\u6B85" +
        "\uBD5D\u10F4\uCB3E\u0567\uE427\u418B\uA77D\u95D8" +
        "\uFBEE\u7C66\uDD17\u479E\uCA2D\uBF07\uAD5A\u8333" +
        "\u6302\uAA71\uC819\u49D9\uF2E3\u5B88\u9A26\u32B0" +
        "\uE90F\uD580\uBECD\u3448\uFF7A\u905F\u2068\u1AAE" +
        "\uB454\u9322\u64F1\u7312\u4008\uC3EC\uDBA1\u8D3D" +
        "\u9700\uCF2B\u7682\uD61B\uB5AF\u6A50\u45F3\u30EF" +
        "\u3F55\uA2EA\u65BA\u2FC0\uDE1C\uFD4D\u9275\u068A" +
        "\uB2E6\u0E1F\u62D4\uA896\uF9C5\u2559\u8472\u394C" +
        "\u5E78\u388C\uD1A5\uE261\uB321\u9C1E\u43C7\uFC04" +
        "\u5199\u6D0D\uFADF\u7E24\u3BAB\uCE11\u8F4E\uB7EB" +
        "\u3C81\u94F7\uB913\u2CD3\uE76E\uC403\u5644\u7FA9" +
        "\u2ABB\uC153\uDC0B\u9D6C\u3174\uF646\uAC89\u14E1" +
        "\u163A\u6909\u70B6\uD0ED\uCC42\u98A4\u285C\uF886";

    /**
     * Lookup tables for all values of x over the finite field GF(2^8),
     * each returning row i of the associated 8x8 diffusion matrix:
     * C{i}[x] = SBOX[x].cir{i}(1,1,4,1,8,5,2,9).
     */
    private static final long[]
        C0 = new long[256], C1 = new long[256],
        C2 = new long[256], C3 = new long[256],
        C4 = new long[256], C5 = new long[256],
        C6 = new long[256], C7 = new long[256];
    static {
        int x = 255; do {
            long v;
            int v1, v2, v4, v8;
            final char c;
            /*
             * Unpack the low byte v1 of char c.
             */
            v1 = (c = SBOX.charAt(x >> 1)) & 0xFF;
            /*
             * Compute 2*v1,4*v1,5*v1,8*v1,9*v1 over the finite field GF(2^8),
             * i.e. over GF(2)[x]/p_8(x) where p_8(x)=x^8+x^4+x^3+x^2+1=11D_x.
             */
            v2 = (v1 << 1) ^ (v1 <= 0x7F ? 0 : 0x11D);
            v4 = (v2 << 1) ^ (v2 <= 0x7F ? 0 : 0x11D); /* v5 = v4 ^ v1 */
            v8 = (v4 << 1) ^ (v4 <= 0x7F ? 0 : 0x11D); /* v9 = v8 ^ v1 */
            C0[x] = v = ((long)(
                    (((((v1 << 8) | v1) << 8) | v4) << 8) | v1
                ) << 32) | ((long)(
                    (((((v8 << 8) | (v4 ^ v1)) << 8) | v2) << 8) | (v8 ^ v1)
                ) & 0xFFFFFFFFL);           /* S[x].[1,1,4,1,8,5,2,9]. */
            C1[x] = (v >>>  8) | (v << 56); /* S[x].[9,1,1,4,1,8,5,2]. */
            C2[x] = (v >>> 16) | (v << 48); /* S[x].[2,9,1,1,4,1,8,5]. */
            C3[x] = (v >>> 24) | (v << 40); /* S[x].[5,2,9,1,1,4,1,8]. */
            C4[x] = (v >>> 32) | (v << 32); /* S[x].[8,5,2,9,1,1,4,1]. */
            C5[x] = (v >>> 40) | (v << 24); /* S[x].[1,8,5,2,9,1,1,4]. */
            C6[x] = (v >>> 48) | (v << 16); /* S[x].[4,1,8,5,2,9,1,1]. */
            C7[x] = (v >>> 56) | (v <<  8); /* S[x].[1,4,1,8,5,2,9,1]. */
            --x;
            /*
             * Unpack the high byte v1 of char c.
             */
            v1 = c >>> 8;
            /*
             * Compute 2*v1,4*v1,5*v1,8*v1,9*v1 over the finite field GF(2^8),
             * i.e. over GF(2)[x]/p_8(x) where p_8(x)=x^8+x^4+x^3+x^2+1=11D_x.
             */
            v2 = (v1 << 1) ^ (v1 <= 0x7F ? 0 : 0x11D);
            v4 = (v2 << 1) ^ (v2 <= 0x7F ? 0 : 0x11D); /* v5 = v4 ^ v1 */
            v8 = (v4 << 1) ^ (v4 <= 0x7F ? 0 : 0x11D); /* v9 = v8 ^ v1 */
            C0[x] = v = ((long)(
                    (((((v1 << 8) | v1) << 8) | v4) << 8) | v1
                ) << 32) | ((long)(
                    (((((v8 << 8) | (v4 ^ v1)) << 8) | v2) << 8) | (v8 ^ v1)
                ) & 0xFFFFFFFFL);           /* S[x].[1,1,4,1,8,5,2,9]. */
            C1[x] = (v >>>  8) | (v << 56); /* S[x].[9,1,1,4,1,8,5,2]. */
            C2[x] = (v >>> 16) | (v << 48); /* S[x].[2,9,1,1,4,1,8,5]. */
            C3[x] = (v >>> 24) | (v << 40); /* S[x].[5,2,9,1,1,4,1,8]. */
            C4[x] = (v >>> 32) | (v << 32); /* S[x].[8,5,2,9,1,1,4,1]. */
            C5[x] = (v >>> 40) | (v << 24); /* S[x].[1,8,5,2,9,1,1,4]. */
            C6[x] = (v >>> 48) | (v << 16); /* S[x].[4,1,8,5,2,9,1,1]. */
            C7[x] = (v >>> 56) | (v <<  8); /* S[x].[1,4,1,8,5,2,9,1]. */
        } while (--x > 0);
    }

    /**
     * The number of rounds of the internal dedicated block cipher.
     */
    private static final int R = 10;

    /**
     * The round constants (stored in reverse order of rounds).
     */
    private static final long[] RC = new long[R];
    static {
        int i = 0, r = R - 1; do {
            RC[r] =
                (C0[i    ] & 0xFF00000000000000L) |
                (C1[i + 1] & 0x00FF000000000000L) |
                (C2[i + 2] & 0x0000FF0000000000L) |
                (C3[i + 3] & 0x000000FF00000000L) |
                (C4[i + 4] & 0x00000000FF000000L) |
                (C5[i + 5] & 0x0000000000FF0000L) |
                (C6[i + 6] & 0x000000000000FF00L) |
                (C7[i + 7] & 0x00000000000000FFL);
            i += 8;
        } while (--r >= 0);
    }


    /**
     * Internal context buffer for incomplete blocks and padding bytes.
     */
    private byte[] pad;

    /**
     * Internal context counter for incomplete blocks and padding bytes.
     * INVARIANT: padding must be in 0..63.
     * When the padding reaches 64, a new block is computed, and
     * the 32 last bytes are kept in the padding history.
     */
    private int padding;

    /**
     * Internal global number of hashed bits (256-bit counter).
     * Stored in big-endian order as eight 32-bit ints, where
     * <code>n0</code> is the most significant 32-bit int, and
     * <code>n7</code> is the least significant 32-bit int.
     */
    private int n0, n1, n2, n3, n4, n5, n6, n7;

    /**
     * Internal hashing state (512-bit).
     * Stored in big-endian order as eight 64-bit longs, where
     * <code>h0</code> packs the 0th row of the M8x8[GF(2^3)] state matrix, and
     * <code>h7</code> packs the 7th row of the M8x8[GF(2^3)] state matrix.
     * @see #init()
     */
    protected long h0, h1, h2, h3, h4, h5, h6, h7;

    /**
     * Updates the digest using the specified array of bytes,
     * starting at the specified offset, but an implied length
     * of exactly 64 bytes.
     *
     * Requires no internal buffering, but assumes a fixed input size,
     * in which the required padding bytes may have been added.
     *
     * @param input  the array of bytes to use for the update.
     * @param offset  the offset to start from in the array of bytes.
     */
    private void computeBlock(final byte[] input, int offset) {
        long t0, t1, t2, t3, t4, t5, t6, t7; // Intermediate temporary.
        long k0, k1, k2, k3, k4, k5, k6, k7; // The round key K{r}[i].
        long w0, w1, w2, w3, w4, w5, w6, w7; // The cipher state W[K{r}](H[i]).
        long e0, e1, e2, e3, e4, e5, e6, e7; // The ciphered block eta[i].
        /*
         * Map the (big-endian) input buffer m to a block: eta[i] = mu(m[i]),
         * set the initial round key: K{0}[i] = H[i], and apply it to the
         * cipher state: W{0}(H[i]) = K{0}[i] ^ eta[i].
         */
        w0 = (k0 = h0) ^ (e0 =
         ((long)( ((((( input[ offset       ]          << 8) |
                       (input[ offset     +1] & 0xFF)) << 8) |
                       (input[ offset     +2] & 0xFF)) << 8) |
                       (input[ offset     +3] & 0xFF)        ) << 32) |
         ((long)( ((((( input[ offset     +4]          << 8) |
                       (input[ offset     +5] & 0xFF)) << 8) |
                       (input[(offset+=11)-5] & 0xFF)) << 8) |
                       (input[ offset     -4] & 0xFF)        ) & 0xFFFFFFFFL));
        w1 = (k1 = h1) ^ (e1 =
         ((long)( ((((( input[ offset     -3]          << 8) |
                       (input[ offset     -2] & 0xFF)) << 8) |
                       (input[ offset     -1] & 0xFF)) << 8) |
                       (input[ offset       ] & 0xFF)        ) << 32) |
         ((long)( ((((( input[ offset     +1]          << 8) |
                       (input[ offset     +2] & 0xFF)) << 8) |
                       (input[ offset     +3] & 0xFF)) << 8) |
                       (input[ offset     +4] & 0xFF)        ) & 0xFFFFFFFFL));
        w2 = (k2 = h2) ^ (e2 =
         ((long)( ((((( input[ offset     +5]          << 8) |
                       (input[(offset+=11)-5] & 0xFF)) << 8) |
                       (input[ offset     -4] & 0xFF)) << 8) |
                       (input[ offset     -3] & 0xFF)        ) << 32) |
         ((long)( ((((( input[ offset     -2]          << 8) |
                       (input[ offset     -1] & 0xFF)) << 8) |
                       (input[ offset       ] & 0xFF)) << 8) |
                       (input[ offset     +1] & 0xFF)        ) & 0xFFFFFFFFL));
        w3 = (k3 = h3) ^ (e3 =
         ((long)( ((((( input[ offset     +2]          << 8) |
                       (input[ offset     +3] & 0xFF)) << 8) |
                       (input[ offset     +4] & 0xFF)) << 8) |
                       (input[ offset     +5] & 0xFF)        ) << 32) |
         ((long)( ((((( input[(offset+=11)-5]          << 8) |
                       (input[ offset     -4] & 0xFF)) << 8) |
                       (input[ offset     -3] & 0xFF)) << 8) |
                       (input[ offset     -2] & 0xFF)        ) & 0xFFFFFFFFL));
        w4 = (k4 = h4) ^ (e4 =
         ((long)( ((((( input[ offset     -1]          << 8) |
                       (input[ offset       ] & 0xFF)) << 8) |
                       (input[ offset     +1] & 0xFF)) << 8) |
                       (input[ offset     +2] & 0xFF)        ) << 32) |
         ((long)( ((((( input[ offset     +3]          << 8) |
                       (input[ offset     +4] & 0xFF)) << 8) |
                       (input[ offset     +5] & 0xFF)) << 8) |
                       (input[(offset+=11)-5] & 0xFF)        ) & 0xFFFFFFFFL));
        w5 = (k5 = h5) ^ (e5 =
         ((long)( ((((( input[ offset     -4]          << 8) |
                       (input[ offset     -3] & 0xFF)) << 8) |
                       (input[ offset     -2] & 0xFF)) << 8) |
                       (input[ offset     -1] & 0xFF)        ) << 32) |
         ((long)( ((((( input[ offset       ]          << 8) |
                       (input[ offset     +1] & 0xFF)) << 8) |
                       (input[ offset     +2] & 0xFF)) << 8) |
                       (input[ offset     +3] & 0xFF)        ) & 0xFFFFFFFFL));
        w6 = (k6 = h6) ^ (e6 =
         ((long)( ((((( input[ offset     +4]          << 8) |
                       (input[ offset     +5] & 0xFF)) << 8) |
                       (input[(offset+=11)-5] & 0xFF)) << 8) |
                       (input[ offset     -4] & 0xFF)        ) << 32) |
         ((long)( ((((( input[ offset     -3]          << 8) |
                       (input[ offset     -2] & 0xFF)) << 8) |
                       (input[ offset     -1] & 0xFF)) << 8) |
                       (input[ offset       ] & 0xFF)        ) & 0xFFFFFFFFL));
        w7 = (k7 = h7) ^ (e7 =
         ((long)( ((((( input[ offset     +1]          << 8) |
                       (input[ offset     +2] & 0xFF)) << 8) |
                       (input[ offset     +3] & 0xFF)) << 8) |
                       (input[ offset     +4] & 0xFF)        ) << 32) |
         ((long)( ((((( input[ offset+= 5   ]          << 8) |
                       (input[ offset    + 1] & 0xFF)) << 8) |
                       (input[ offset    + 2] & 0xFF)) << 8) |
                       (input[ offset    + 3] & 0xFF)        ) & 0xFFFFFFFFL));
        /*
         * Iterate over all R rounds.
         */
        offset = R - 1; do {
            /*
             * Compute the round key K{r+1}[i] from K{r}[i], here with
             * r = (R - 1) - offset, and apply the (r+1)th round transformation
             * of the cipher state W{r}:
             * W{r+1}(H[i]) = ( K{r+1}[i] = C[K{r}[i]] ^ RC{r} ) ^ C[W{r}(H[i])].
             */
            t7 = C0[(int)(k7 >>> 56)       ] ^ C1[(int)(k6 >>> 48) & 0xFF] ^
	             C2[(int)(k5 >>> 40) & 0xFF] ^ C3[(int)(k4 >>> 32) & 0xFF] ^
	             C4[ (int)k3 >>> 24        ] ^ C5[((int)k2 >>> 16) & 0xFF] ^
	             C6[((int)k1 >>>  8) & 0xFF] ^ C7[ (int)k0         & 0xFF];
	        t6 = C6[((int)k0 >>>  8) & 0xFF] ^ C7[ (int)k7         & 0xFF] ^
	             C0[(int)(k6 >>> 56)       ] ^ C1[(int)(k5 >>> 48) & 0xFF] ^
	             C2[(int)(k4 >>> 40) & 0xFF] ^ C3[(int)(k3 >>> 32) & 0xFF] ^
	             C4[ (int)k2 >>> 24        ] ^ C5[((int)k1 >>> 16) & 0xFF];
	        t5 = C4[ (int)k1 >>> 24        ] ^ C5[((int)k0 >>> 16) & 0xFF] ^
	             C6[((int)k7 >>>  8) & 0xFF] ^ C7[ (int)k6         & 0xFF] ^
	             C0[(int)(k5 >>> 56)       ] ^ C1[(int)(k4 >>> 48) & 0xFF] ^
	             C2[(int)(k3 >>> 40) & 0xFF] ^ C3[(int)(k2 >>> 32) & 0xFF];
	        t4 = C2[(int)(k2 >>> 40) & 0xFF] ^ C3[(int)(k1 >>> 32) & 0xFF] ^
	             C4[ (int)k0 >>> 24        ] ^ C5[((int)k7 >>> 16) & 0xFF] ^
	             C6[((int)k6 >>>  8) & 0xFF] ^ C7[ (int)k5         & 0xFF] ^
	             C0[(int)(k4 >>> 56)       ] ^ C1[(int)(k3 >>> 48) & 0xFF];
	        t3 = C0[(int)(k3 >>> 56)       ] ^ C1[(int)(k2 >>> 48) & 0xFF] ^
	             C2[(int)(k1 >>> 40) & 0xFF] ^ C3[(int)(k0 >>> 32) & 0xFF] ^
	             C4[ (int)k7 >>> 24        ] ^ C5[((int)k6 >>> 16) & 0xFF] ^
	             C6[((int)k5 >>>  8) & 0xFF] ^ C7[ (int)k4         & 0xFF];
	        t2 = C6[((int)k4 >>>  8) & 0xFF] ^ C7[ (int)k3         & 0xFF] ^
	             C0[(int)(k2 >>> 56)       ] ^ C1[(int)(k1 >>> 48) & 0xFF] ^
	             C2[(int)(k0 >>> 40) & 0xFF] ^ C3[(int)(k7 >>> 32) & 0xFF] ^
	             C4[ (int)k6 >>> 24        ] ^ C5[((int)k5 >>> 16) & 0xFF];
	        t1 = C4[ (int)k5 >>> 24        ] ^ C5[((int)k4 >>> 16) & 0xFF] ^
	             C6[((int)k3 >>>  8) & 0xFF] ^ C7[ (int)k2         & 0xFF] ^
	             C0[(int)(k1 >>> 56)       ] ^ C1[(int)(k0 >>> 48) & 0xFF] ^
	             C2[(int)(k7 >>> 40) & 0xFF] ^ C3[(int)(k6 >>> 32) & 0xFF];
	        t0 = (k0 =
	             C2[(int)(k6 >>> 40) & 0xFF] ^ C3[(int)(k5 >>> 32) & 0xFF] ^
	             C4[ (int)k4 >>> 24        ] ^ C5[((int)k3 >>> 16) & 0xFF] ^
	             C6[((int)k2 >>>  8) & 0xFF] ^ C7[ (int)k1         & 0xFF] ^
	             C0[(int)(k0 >>> 56)       ] ^ C1[(int)(k7 >>> 48) & 0xFF] ^
	             RC[offset]) ^
	             C0[(int)(w0 >>> 56)       ] ^ C1[(int)(w7 >>> 48) & 0xFF] ^
	             C2[(int)(w6 >>> 40) & 0xFF] ^ C3[(int)(w5 >>> 32) & 0xFF] ^
	             C4[ (int)w4 >>> 24        ] ^ C5[((int)w3 >>> 16) & 0xFF] ^
	             C6[((int)w2 >>>  8) & 0xFF] ^ C7[ (int)w1         & 0xFF];
	        t1 = (k1 = t1) ^
	             C0[(int)(w1 >>> 56)       ] ^ C1[(int)(w0 >>> 48) & 0xFF] ^
	             C2[(int)(w7 >>> 40) & 0xFF] ^ C3[(int)(w6 >>> 32) & 0xFF] ^
	             C4[ (int)w5 >>> 24        ] ^ C5[((int)w4 >>> 16) & 0xFF] ^
	             C6[((int)w3 >>>  8) & 0xFF] ^ C7[ (int)w2         & 0xFF];
	        t2 = (k2 = t2) ^
	             C0[(int)(w2 >>> 56)       ] ^ C1[(int)(w1 >>> 48) & 0xFF] ^
	             C2[(int)(w0 >>> 40) & 0xFF] ^ C3[(int)(w7 >>> 32) & 0xFF] ^
	             C4[ (int)w6 >>> 24        ] ^ C5[((int)w5 >>> 16) & 0xFF] ^
	             C6[((int)w4 >>>  8) & 0xFF] ^ C7[ (int)w3         & 0xFF];
	        t3 = (k3 = t3) ^
	             C0[(int)(w3 >>> 56)       ] ^ C1[(int)(w2 >>> 48) & 0xFF] ^
	             C2[(int)(w1 >>> 40) & 0xFF] ^ C3[(int)(w0 >>> 32) & 0xFF] ^
	             C4[ (int)w7 >>> 24        ] ^ C5[((int)w6 >>> 16) & 0xFF] ^
	             C6[((int)w5 >>>  8) & 0xFF] ^ C7[ (int)w4         & 0xFF];
	        t4 = (k4 = t4) ^
	             C0[(int)(w4 >>> 56)       ] ^ C1[(int)(w3 >>> 48) & 0xFF] ^
	             C2[(int)(w2 >>> 40) & 0xFF] ^ C3[(int)(w1 >>> 32) & 0xFF] ^
	             C4[ (int)w0 >>> 24        ] ^ C5[((int)w7 >>> 16) & 0xFF] ^
	             C6[((int)w6 >>>  8) & 0xFF] ^ C7[ (int)w5         & 0xFF];
	        t5 = (k5 = t5) ^
	             C0[(int)(w5 >>> 56)       ] ^ C1[(int)(w4 >>> 48) & 0xFF] ^
	             C2[(int)(w3 >>> 40) & 0xFF] ^ C3[(int)(w2 >>> 32) & 0xFF] ^
	             C4[ (int)w1 >>> 24        ] ^ C5[((int)w0 >>> 16) & 0xFF] ^
	             C6[((int)w7 >>>  8) & 0xFF] ^ C7[ (int)w6         & 0xFF];
	        t6 = (k6 = t6) ^
	             C0[(int)(w6 >>> 56)       ] ^ C1[(int)(w5 >>> 48) & 0xFF] ^
	             C2[(int)(w4 >>> 40) & 0xFF] ^ C3[(int)(w3 >>> 32) & 0xFF] ^
	             C4[ (int)w2 >>> 24        ] ^ C5[((int)w1 >>> 16) & 0xFF] ^
	             C6[((int)w0 >>>  8) & 0xFF] ^ C7[ (int)w7         & 0xFF];
	        w7 = (k7 = t7) ^
	             C0[(int)(w7 >>> 56)       ] ^ C1[(int)(w6 >>> 48) & 0xFF] ^
	             C2[(int)(w5 >>> 40) & 0xFF] ^ C3[(int)(w4 >>> 32) & 0xFF] ^
	             C4[ (int)w3 >>> 24        ] ^ C5[((int)w2 >>> 16) & 0xFF] ^
	             C6[((int)w1 >>>  8) & 0xFF] ^ C7[ (int)w0         & 0xFF];
	        w6 = t6; w5 = t5; w4 = t4; w3 = t3; w2 = t2; w1 = t1; w0 = t0;
        } while (--offset >= 0);
        /*
         * Apply the Miyaguchi-Preneel compression function:
         * H[i+1] = H[i] ^ W{R}(H[i]) ^ eta[i].
         */
        h0 ^= w0 ^ e0; h1 ^= w1 ^ e1; h2 ^= w2 ^ e2; h3 ^= w3 ^ e3;
        h4 ^= w4 ^ e4; h5 ^= w5 ^ e5; h6 ^= w6 ^ e6; h7 ^= w7 ^ e7;
    }

    /**
     * Updates the digest using the specified byte.
     * Requires internal buffering, and may be slow.
     *
     * Overrides the protected abstract method of
     * java.security.MessageDigestSpiTestable.
     * @param input  the byte to use for the update.
     */
    protected void engineUpdate(byte input) {
        /* Increment the total bit length by 8. */
        if ((n7 += 8) == 0) // Test if updated n7 has a non-0 carry.
            // Note that the maximum value of n7 ,if it was large enough to
            // contain all bits of the results, would be 0xFFFFFFFFL+8, i.e.
            // 0x1000000007L, so the possible carry can only be 0 or 1. We
            // know that we have a carry when the unsigned result of the
            // 32-bit int n7, after incrementing it by 8, is lower than 8. We
            // should have to test this case by "if ((n7 += 8) & 7 == l7)", if
            // the engine state allowed hashing data chunks size not multiple
            // 8 bits (not supported by the standard JCE MessageDigest SPI).
            if (++n6 == 0) // Increment and test successive carries.
                if (++n5 == 0)
                    if (++n4 == 0)
                        if (++n3 == 0)
                            if (++n2 == 0)
                                if (++n1 == 0)
                                    ++n0;
        /* Process input byte, and terminated 64-byte block as necessary. */
        if (padding < 63) {
            pad[padding++] = input;
            return;
        }
        pad[63] = input;
        computeBlock(pad, 0);
        padding = 0;
    }

    /**
     * Updates the digest using the specified array of bytes,
     * starting at the specified offset.
     *
     * Input length can be any size. May require internal buffering,
     * if input blocks are not multiple of 64 bytes.
     *
     * Overrides the protected abstract method of
     * java.security.MessageDigestSpiTestable.
     * @param input  the array of bytes to use for the update.
     * @param offset the offset to start from in the array of bytes.
     * @param len    the number of bytes to use, starting at offset.
     */
    protected void engineUpdate(byte[] input, int offset, int len) {
        if (offset >= 0 && len >= 0 && offset + len <= input.length) {
            /*
             * Tally the 256-bit (eight 32-bit ints) bit-length (len * 8) of
             * the added data.
             */
            /* Add (len * 8) to the total bitlength. */
            {
                long n;
                n7 = (int)(n = (n7         & 0xFFFFFFFFL) +
                               ((len << 3) & 0xFFFFFFFFL));
                n6 = (int)(n = (n >>> 32                ) +
                               (n6         & 0xFFFFFFFFL) +
                               (len >>> 29              ));
                // Note that the possible carry in (n >>> 32) can only be 0
                // or 1, because n is the sum of two unsigned 32-bit int and
                // a 0-or-1 carry, so the maximum value of n here will be
                // 1+2*0xFFFFFFFFL, i.e. here n is in [0, 0x1FFFFFFFFL].
                if (n >>> 32 != 0) // Test if n has a non-0 carry.
                    if (++n5 == 0) // Increment and test successive carries.
                        if (++n4 == 0)
                            if (++n3 == 0)
                                if (++n2 == 0)
                                    if (++n1 == 0)
                                        ++n0;
            }
            /* Terminate a previous partial 64-byte block. */
            if (padding > 0) {
                final int padlen;
                if ((padlen = 64 - padding) <= len) {
                    System.arraycopy(input, offset, pad, padding, padlen);
                    computeBlock(pad, padding = 0);
                    offset += padlen;
                    len -= padlen;
                }
            }
            /* Loop on large sets of complete 64-byte blocks. */
            while (len >= 512) {
                computeBlock(input, offset);
                computeBlock(input, offset + 64);
                computeBlock(input, offset + 128);
                computeBlock(input, offset + 192);
                computeBlock(input, offset + 256);
                computeBlock(input, offset + 320);
                computeBlock(input, offset + 384);
                computeBlock(input, offset + 448);
                offset += 512;
                len -= 512;
            }
            /* Loop on remaining complete 64-byte blocks. */
            while (len >= 64) {
                computeBlock(input, offset);
                offset += 64;
                len -= 64;
            }
            /* Remaining bytes kept for next 64-byte block. */
            if (len > 0) {
                System.arraycopy(input, offset, pad, padding, len);
                padding += len;
            }
            return;
        }
        throw new ArrayIndexOutOfBoundsException(offset);
    }

    /**
     * Completes the hash computation by performing final operations
     * such as padding. Once engineDigest has been called, the engine
     * will be automatically reset (see engineReset).
     *
     * Overrides the protected abstract method of
     * java.security.MessageDigestSpiTestable.
     * @param hashvalue  the output buffer in which to store the digest.
     * @param offset  offset to start from in the output buffer
     * @param len  number of bytes within buf allotted for the digest.
     *             Both this default implementation and the SUN provider
     *             do not return partial digests.  The presence of this
     *             parameter is solely for consistency in our API's.
     *             If the value of this parameter is less than the
     *             actual digest length, the method will throw a
     *             DigestException.  This parameter is ignored if its
     *             value is greater than or equal to the actual digest
     *             length.
     * @return  the length of the digest stored in the output buffer.
     */
    protected int engineDigest(byte[] hashvalue, int offset, int len)
            throws DigestException {
        if (len >= HASH_LENGTH) {
            if (hashvalue.length - offset >= HASH_LENGTH) {
                /* Flush the trailing bytes, adding padding bytes into last
                 * blocks. */
                int i, j;
                /* Add padding null bytes but replace the last 32 padding bytes
                 * by the big-endian 512-bit digested message bit-length. */
                pad[i = padding] = (byte)0x80; /* required 1st padding byte */
                /* Check if 32 bytes available in pad to store the 512-bit
                 * total message bit-length. */
                if (++i > 32) { /* INVARIANT: i must be in [0..63] */
                    /* clear all the remaining bytes of pad[] */
                    switch (i & 3) {
                    case 0: break;
                    case 1: pad[i++] = (byte)0x00; /* no break; fall thru */
                    case 2: pad[i++] = (byte)0x00; /* no break; fall thru */
                    case 3: pad[i++] = (byte)0x00; /* no break; fall thru */
                    }
                    while (i < 64) {
                        pad[i    ] = (byte)0x00;
                        pad[i + 1] = (byte)0x00;
                        pad[i + 2] = (byte)0x00;
                        pad[i + 3] = (byte)0x00;
                        i += 4;
                    }
                    computeBlock(pad, i = 0);
                }
                /* Clear the remaining bytes in the first 32 bytes of pad[] */
                switch (i & 3) {
                case 0: break;
                case 1: pad[i++] = (byte)0x00; /* no break; fall thru */
                case 2: pad[i++] = (byte)0x00; /* no break; fall thru */
                case 3: pad[i++] = (byte)0x00; /* no break; fall thru */
                }
                while (i < 32) {
                   pad[i    ] = (byte)0x00;
                   pad[i + 1] = (byte)0x00;
                   pad[i + 2] = (byte)0x00;
                   pad[i + 3] = (byte)0x00;
                   i += 4;
                }
                /* Convert the message bitlength to big-endian byte order. */
                pad[(i = 58) + 5] = (byte)(j = n7);
                pad[ i       + 4] = (byte)(j >>>  8);
                pad[ i       + 3] = (byte)(j >>> 16);
                pad[ i       + 2] = (byte)(j >>> 24);
                pad[ i       + 1] = (byte)(j = n6);
                pad[ i          ] = (byte)(j >>>  8);
                pad[ i       - 1] = (byte)(j >>> 16);
                pad[ i       - 2] = (byte)(j >>> 24);
                pad[ i       - 3] = (byte)(j = n5);
                pad[ i       - 4] = (byte)(j >>>  8);
                pad[ i       - 5] = (byte)(j >>> 16);
                pad[(i = 47) + 5] = (byte)(j >>> 24);
                pad[ i       + 4] = (byte)(j = n4);
                pad[ i       + 3] = (byte)(j >>>  8);
                pad[ i       + 2] = (byte)(j >>> 16);
                pad[ i       + 1] = (byte)(j >>> 24);
                pad[ i          ] = (byte)(j = n3);
                pad[ i       - 1] = (byte)(j >>>  8);
                pad[ i       - 2] = (byte)(j >>> 16);
                pad[ i       - 3] = (byte)(j >>> 24);
                pad[ i       - 4] = (byte)(j = n2);
                pad[ i       - 5] = (byte)(j >>>  8);
                pad[(i = 36) + 5] = (byte)(j >>> 16);
                pad[ i       + 4] = (byte)(j >>> 24);
                pad[ i       + 3] = (byte)(j = n1);
                pad[ i       + 2] = (byte)(j >>>  8);
                pad[ i       + 1] = (byte)(j >>> 16);
                pad[ i          ] = (byte)(j >>> 24);
                pad[ i       - 1] = (byte)(j = n0);
                pad[ i       - 2] = (byte)(j >>>  8);
                pad[ i       - 3] = (byte)(j >>> 16);
                pad[ i       - 4] = (byte)(j >>> 24);
                computeBlock(pad, 0);
                /* Return the computed digest in big-endian byte order. */
                hashvalue[ offset           ] = (byte)((i = (int)(h0 >>> 32)) >>> 24);
                hashvalue[ offset        + 1] = (byte)( i >>> 16);
                hashvalue[ offset        + 2] = (byte)( i >>> 8);
                hashvalue[ offset        + 3] = (byte)  i;
                hashvalue[ offset        + 4] = (byte)((i = (int) h0) >>> 24);
                hashvalue[ offset        + 5] = (byte)( i >>> 16);
                hashvalue[(offset += 11) - 5] = (byte)( i >>> 8);
                hashvalue[ offset        - 4] = (byte)  i;
                hashvalue[ offset        - 3] = (byte)((i = (int)(h1 >>> 32)) >>> 24);
                hashvalue[ offset        - 2] = (byte)( i >>> 16);
                hashvalue[ offset        - 1] = (byte)( i >>> 8);
                hashvalue[ offset           ] = (byte)  i;
                hashvalue[ offset        + 1] = (byte)((i = (int) h1) >>> 24);
                hashvalue[ offset        + 2] = (byte)( i >>> 16);
                hashvalue[ offset        + 3] = (byte)( i >>> 8);
                hashvalue[ offset        + 4] = (byte)  i;
                hashvalue[ offset        + 5] = (byte)((i = (int)(h2 >>> 32)) >>> 24);
                hashvalue[(offset += 11) - 5] = (byte)( i >>> 16);
                hashvalue[ offset        - 4] = (byte)( i >>> 8);
                hashvalue[ offset        - 3] = (byte)  i;
                hashvalue[ offset        - 2] = (byte)((i = (int) h2) >>> 24);
                hashvalue[ offset        - 1] = (byte)( i >>> 16);
                hashvalue[ offset           ] = (byte)( i >>> 8);
                hashvalue[ offset        + 1] = (byte)  i;
                hashvalue[ offset        + 2] = (byte)((i = (int)(h3 >>> 32)) >>> 24);
                hashvalue[ offset        + 3] = (byte)( i >>> 16);
                hashvalue[ offset        + 4] = (byte)( i >>> 8);
                hashvalue[ offset        + 5] = (byte)  i;
                hashvalue[(offset += 11) - 5] = (byte)((i = (int) h3) >>> 24);
                hashvalue[ offset        - 4] = (byte)( i >>> 16);
                hashvalue[ offset        - 3] = (byte)( i >>> 8);
                hashvalue[ offset        - 2] = (byte)  i;
                hashvalue[ offset        - 1] = (byte)((i = (int)(h4 >>> 32)) >>> 24);
                hashvalue[ offset           ] = (byte)( i >>> 16);
                hashvalue[ offset        + 1] = (byte)( i >>> 8);
                hashvalue[ offset        + 2] = (byte)  i;
                hashvalue[ offset        + 3] = (byte)((i = (int) h4) >>> 24);
                hashvalue[ offset        + 4] = (byte)( i >>> 16);
                hashvalue[ offset        + 5] = (byte)( i >>> 8);
                hashvalue[(offset += 11) - 5] = (byte)  i;
                hashvalue[ offset        - 4] = (byte)((i = (int)(h5 >>> 32)) >>> 24);
                hashvalue[ offset        - 3] = (byte)( i >>> 16);
                hashvalue[ offset        - 2] = (byte)( i >>> 8);
                hashvalue[ offset        - 1] = (byte)  i;
                hashvalue[ offset           ] = (byte)((i = (int) h5) >>> 24);
                hashvalue[ offset        + 1] = (byte)( i >>> 16);
                hashvalue[ offset        + 2] = (byte)( i >>> 8);
                hashvalue[ offset        + 3] = (byte)  i;
                hashvalue[ offset        + 4] = (byte)((i = (int)(h6 >>> 32)) >>> 24);
                hashvalue[ offset        + 5] = (byte)( i >>> 16);
                hashvalue[(offset += 11) - 5] = (byte)( i >>> 8);
                hashvalue[ offset        - 4] = (byte)  i;
                hashvalue[ offset        - 3] = (byte)((i = (int) h6) >>> 24);
                hashvalue[ offset        - 2] = (byte)( i >>> 16);
                hashvalue[ offset        - 1] = (byte)( i >>> 8);
                hashvalue[ offset           ] = (byte)  i;
                hashvalue[ offset        + 1] = (byte)((i = (int)(h7 >>> 32)) >>> 24);
                hashvalue[ offset        + 2] = (byte)( i >>> 16);
                hashvalue[ offset        + 3] = (byte)( i >>> 8);
                hashvalue[ offset        + 4] = (byte)  i;
                hashvalue[ offset +=  5     ] = (byte)((i = (int) h7) >>> 24);
                hashvalue[ offset        + 1] = (byte)( i >>> 16);
                hashvalue[ offset        + 2] = (byte)( i >>> 8);
                hashvalue[ offset        + 3] = (byte)  i;
                engineReset(); /* clear the evidence */
                return HASH_LENGTH;
            }
            throw new DigestException(
                "insufficient space in output buffer to store the digest");
        }
        throw new DigestException("partial digests not returned");
    }

    /**
     * Completes the hash computation by performing final operations
     * such as padding. Computes the final hash and returns the final
     * value as a byte[24] array. Once engineDigest has been called,
     * the engine will be automatically reset as specified in the
     * JavaSecurity MessageDigest specification.
     *
     * For faster operations with multiple digests, allocate your own
     * array and use engineDigest(byte[], int offset, int len).
     *     * Overrides the protected abstract method of
     * java.security.MessageDigestSpiTestable.
     * @return the length of the digest stored in the output buffer.
     */
    protected byte[] engineDigest() {
        try {
            final byte hashvalue[] = new byte[HASH_LENGTH];
            engineDigest(hashvalue, 0, HASH_LENGTH);
            return hashvalue;
        } catch (DigestException e) {
            return null;
        }
    }

}

