S1(POOL|S) = S1(POOLS) is the set of words W' < POOLS that begin POOL and have the same letter multiset as POOLS. Then the last letter of W' must be S, but then it is not the case that W' < POOLS, so S1 is empty.
S2(POO|LS) is the set of words POOxy where xy is a word on letter multiset LS so that x < L. Again there are none.
S3(PO|OLS) is the set of words POxyz where xyz is a word on letter multiset LOS so that x < O. (For this purpose we can ignore the prefix PO and focus on the suffix OLS.) Then xyz is one of LOS, LSO. In fact, the size #S3 of set S3 is the multinomial coefficient
#S3 = m(2; 1, 1) = m(1, 1) = (1 + 1)! / (1! · 1!) = 2,for we know x = L, and once this is determined, there are m(1, 1) ways to choose y and z from letter multiset OS (there is one O and one S).
S4(P|OOLS) is the set of words Pxyzw where xyzw is a word on letter multiset LOOS and x < O. Then x can be only L, and yzw is a word on letter multiset OOS. Then
#S4 = m(2, 1) = (2 + 1)! / (2! · 1!) = 3,as there are 2 instances of O and 1 of S; yzw is one of OOS, OSO, SOO.
S5(|POOLS) is the set of words xyzwv on letter multiset LOOPS with x < P. Then x is one of OL, and
#S5 = m(1, 1, 1, 1) + m(2, 1, 1).
In all,
N(POOLS) - 1 = m(1, 1) + m(2, 1) + m(1, 1, 1, 1) + m(2, 1, 1) = 2 + 3 + 24 + 12 = 41.That is, N(POOLS) = 42.
W0 empty, W1 = S, W2 = LS, W3 = OLS, W4 = OOLS, W5 = POOLS.
Let Mn denote the set of arrangements of the letters of Wn. Let wn(c) be the number of occurrences of character c in Wn. Then |Mn| is a multinomial coefficient,
|Mn| = mn = [wn(A) + ... + wn(Z)]! / [wn(A)! · ... · wn(Z)!].
In the code below, *t refers to the character at the head of Wn. Let wn(c < *t) denote the number of characters c in Wn before *t in the alphabet.
6! = 1 · 2 · 3 · 4 · 5 · 6 = 2 · 3 · 22 · 5 · (2 · 3) = 45 · 24.We store the number of factors 2 rather than a power of 2. For example, the odd part of 6! is 45, and its even part is 4. The odd part of 24! fits in 58 bits (and conveniently in a 64-bit unsigned long long int of C99). Each multinomial coefficient m(k; ...) that we consider has no more of any prime factor than does k!, and k! has no more of any prime factor than does 24!.
#include <stdio.h> int odd[] = {0,1,1,3,1,5,3,7,1,9,5,11,3,13,7,15,1,17,9,19,5,21,11,23,3,25}; int even[] = {0,0,1,0,2,0,1,0,3,0,1, 0,2, 0,1, 0,4, 0,1, 0,2, 0, 1, 0,3, 0};We need odd and even parts of only a few small numbers. Precompute them for speed.
void rank(char *s) { char L[32]; // L[k]: kth distinct character seen int C[32]; // C[k]: no. chars. L[k] in W_n int D = 0; // D: no. distinct characters seen int next[32]; // Let i(0) = 0; i(k) = next[i(k-1)] if k > 0; and next[i(D)] next[0] = 0; // = 0. The alphabetically kth distinct char. seen is L[i(k)]. unsigned long long rank = 1, m_odd = 1; int m_even = 0; // m = m_odd * (2 ^ m_even) char *t; // tail string W_n for(t = s; *t; t++); // advance t to end of string for(int n = 1; t-- != s; n++) { // read string right to left int k = 0, w = 0; // w: w_n(c < *t) for(; next[k] && L[next[k]] < *t; k = next[k]) w += C[next[k]]; // now L[k] is *t's alphabetical predecessor if(!next[k] || L[next[k]] > *t) { // *t is new; update L, C, D, and next D++; L[D] = *t; C[D] = 0; next[D] = next[k]; next[k] = D; } int ct = ++C[next[k]]; // ct: w_n(*t) // rank += m_{n-1} * w_n(c < *t) / w_n(*t); see comments @1, @2 if(w) rank += m_odd * odd[w] / odd[ct] << m_even + even[w] - even[ct]; // m = m_n = m_{n-1} * n / w_n(*t) m_odd = m_odd * odd[n] / odd[ct]; m_even = m_even + even[n] - even[ct]; } printf("%llu\n", rank); }
Sn × {1, 2, ..., wn(*t)}to
Mn-1 × {1, 2, ..., wn(c < *t)}.
Here's one. Think of an element (U, k) of Sn × {1, ..., wn(*t)} as a word U < Wn (on the letters of Wn) with its kth character *t underscored. (By definition, there are wn(*t) occurrences of the letter *t in Wn. Put an underscore under the kth one, but don't tie it to the character, as the character will be removed shortly.)
E.g., if Wn = BAABA, then U could be AAABB, and k could be 1, giving AAABB, where bold type indicates the underscored character.
Consider the following operation: take such an element (U, k), pop off its head letter, and replace the underscored character by it. The result is a word on the letters of Wn-1 with a character c < *t underscored.
E.g., AAABB -> AAAB. That is, A1A2A3BiBii -> A2A3A1Bii.
The inverse operation is to remove the underscored character to the front and fill its vacated place with *t.
It would be easier to avoid overflow by using a big-integer library such as GMP, but slower. If longer words are to be considered, we might try to separate more prime factors. Because factorials lack large prime factors, it should be possible to pack their prime-factor representations into a single 64-bit integer for strings 33 characters long. Combining this with the idea of separating 2 from the next 12 primes, it should be possible to pack prime-factor representations (say, with a 5-bit count of each prime) into a single 64-bit integer for strings of 43 characters or so.
int main(int argc, char **argv) { if(argc < 2) { printf("Usage: %s WORD\n", argv[0]); return 1; } if(argv[1][0] != '-') rank(argv[1]); else { // take words from stdin instead of argv (for performance testing) char line[30]; while(0 < scanf("%s", line)) rank(line); } return 0; }
Here is the complete program source.