Hello,
I just analysed ifrench-gut_1.0alpha8.orig.tar.gz from the Slink
distribution (Debian 2.1). It contains many more words, spread over
several files, but I found a subset of the dictionaries that adds up
to exactly 58233 words. I wonder whether this set is a sensible one to
select. Since I don't speak French, I have absolutely no clue.
The subset I consider most likely is:
> wc -w math.dico noms_propres.dico nonverbes.dico reflex-gp3.dico
typo.dico verbes-gp12.dico verbes-impers.dico 2<>/dev/null
1174 math.dico
1177 noms_propres.dico
21866 nonverbes.dico
455 reflex-gp3.dico
42 typo.dico
33357 verbes-gp12.dico
162 verbes-impers.dico
58233 total
What do you think!?
PS: Here is my Haskell session log:
-- power xs: all sub-lists (the power set) of xs
power [] = [[]]
power (x:xs) = let ys = power xs in
               ys ++ [ (x:ss) | ss <- ys ]
-- sort: simple quicksort
sort [] = []
sort (x:xs) = sort [y | y <- xs, y <= x] ++ [x] ++ sort [y | y <- xs, y > x]
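The search in the session below works on bare counts, so each result has to be mapped back to file names by hand. A minimal sketch of the same brute-force search that carries the file names along; `wordCounts`, `subsets` and `matching` are names made up for this illustration, and the numbers are the word counts from the `wc` listing:

```haskell
-- Word counts per dictionary file, copied from the `wc` listing.
wordCounts :: [(String, Int)]
wordCounts =
  [ ("abrev.dico", 17), ("auxil.dico", 153), ("helvetismes.dico", 30)
  , ("informatique.dico", 30), ("math.dico", 1174), ("math.dico-old", 2273)
  , ("noms_propres.dico", 1177), ("nonverbes.dico", 21866)
  , ("pronominaux.dico", 3627), ("reflex-gp12.dico", 6511)
  , ("reflex-gp3.dico", 455), ("series.dico", 239), ("typo.dico", 42)
  , ("verbes-defect.dico", 68), ("verbes-gp12.dico", 33357)
  , ("verbes-gp3.dico", 1730), ("verbes-impers.dico", 162) ]

-- All sub-lists of a list (same construction as `power`).
subsets :: [a] -> [[a]]
subsets []     = [[]]
subsets (x:xs) = let ys = subsets xs in ys ++ map (x:) ys

-- Names of every selection of files whose counts add up to the target.
matching :: Int -> [(String, Int)] -> [[String]]
matching target files =
  [ map fst sel | sel <- subsets files, sum (map snd sel) == target ]
```

In an interpreter, `matching 58233 wordCounts` then returns file names directly instead of a list of numbers. With 17 files there are only 2^17 selections, so the brute-force approach stays cheap.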
{- Session : analysing ifrench-gut_1.0alpha8.orig.tar.gz
Francais-GUTenberg-v1.0alpha8/dicos > wc *.dico* 2<>/dev/null
17 17 71 abrev.dico
153 153 1773 auxil.dico
30 30 404 helvetismes.dico
30 30 252 informatique.dico
1174 1174 14686 math.dico
2273 2273 26988 math.dico-old
1177 1177 10330 noms_propres.dico
21868 21866 270876 nonverbes.dico
3627 3627 48939 pronominaux.dico
6511 6511 85728 reflex-gp12.dico
455 455 5788 reflex-gp3.dico
239 239 2296 series.dico
42 42 511 typo.dico
68 68 595 verbes-defect.dico
33357 33357 422611 verbes-gp12.dico
1730 1730 20766 verbes-gp3.dico
162 162 1534 verbes-impers.dico
72913 72911 914148 total
nonverbes.dico contains 2 empty lines (21868 lines but only 21866 words).
I first assume that the authors noticed this and counted words:
> [ xs | xs <- power
[17,153,30,30,1174,2273,1177,21866,3627,6511,455,239,42,68,33357,1730,162],
sum xs == 58233 ]
[[1174,1177,21866,455,42,33357,162]]
which corresponds to
> wc -w math.dico noms_propres.dico nonverbes.dico reflex-gp3.dico
typo.dico verbes-gp12.dico verbes-impers.dico 2<>/dev/null
1174 math.dico
1177 noms_propres.dico
21866 nonverbes.dico
455 reflex-gp3.dico
42 typo.dico
33357 verbes-gp12.dico
162 verbes-impers.dico
58233 total
> [ sum xs | xs <- power
[17,153,30,30,1174,2273,1177,21866,3627,6511,455,239,42,68,33357,1730,162],
typo <- [222,223,232,233,322,323,332,333], sum xs == 58000 + typo]
[58223,58232,58233,58322,58322,58232,58322,58322,58322,58232,58232,58223]
> length $$
12
--------------------------------------------------------------------------------
In case the authors counted lines rather than words:
> [ xs | xs <- power
[17,153,30,30,1174,2273,1177,21868,3627,6511,455,239,42,68,33357,1730,162],
sum xs == 58233]
[[17,153,2273,21868,455,42,68,33357]]
> wc abrev.dico auxil.dico math.dico-old nonverbes.dico reflex-gp3.dico
typo.dico verbes-defect.dico verbes-gp12.dico 2<>/dev/null
17 17 71 abrev.dico
153 153 1773 auxil.dico
2273 2273 26988 math.dico-old
21868 21866 270876 nonverbes.dico
455 455 5788 reflex-gp3.dico
42 42 511 typo.dico
68 68 595 verbes-defect.dico
33357 33357 422611 verbes-gp12.dico
58233 58231 729213 total
> [ sum xs | xs <- power
[17,153,30,30,1174,2273,1177,21868,3627,6511,455,239,42,68,33357,1730,162],
typo <- [222,223,232,233,322,323,332,333], sum xs == 58000 + typo]
[58333,58222,58332,58223,58333,58222,58332,58223,58333,58222,58232,58233,58332,58332]
> length $$
14
> [ xs | xs <- power
[17,153,30,30,1174,2273,1177,21868,3627,6511,455,239,42,68,33357,1730,162],
sum xs == 58223 ]
[[30,1174,1177,21868,455,33357,162],[30,1174,1177,21868,455,33357,162]]
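which corresponds to (the result is duplicated because helvetismes.dico and
informatique.dico both have 30 entries; informatique.dico is used here)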
> wc informatique.dico math.dico noms_propres.dico nonverbes.dico
reflex-gp3.dico verbes-gp12.dico verbes-impers.dico 2<>/dev/null
30 30 252 informatique.dico
1174 1174 14686 math.dico
1177 1177 10330 noms_propres.dico
21868 21866 270876 nonverbes.dico
455 455 5788 reflex-gp3.dico
33357 33357 422611 verbes-gp12.dico
162 162 1534 verbes-impers.dico
58223 58221 726077 total
> [ xs | xs <- power
[17,153,30,30,1174,2273,1177,21868,3627,6511,455,239,42,68,33357,1730,162],
sum xs == 58332 ]
[[30,2273,21868,455,239,42,68,33357],[30,2273,21868,455,239,42,68,33357],[17,153,30,1177,21868,33357,1730],[17,153,30,1177,21868,33357,1730]]
which correspond to
> wc abrev.dico auxil.dico informatique.dico noms_propres.dico
nonverbes.dico verbes-gp12.dico verbes-gp3.dico 2<>/dev/null
17 17 71 abrev.dico
153 153 1773 auxil.dico
30 30 252 informatique.dico
1177 1177 10330 noms_propres.dico
21868 21866 270876 nonverbes.dico
33357 33357 422611 verbes-gp12.dico
1730 1730 20766 verbes-gp3.dico
58332 58330 726679 total
or
> wc informatique.dico math.dico-old nonverbes.dico reflex-gp3.dico
series.dico typo.dico verbes-defect.dico verbes-gp12.dico 2<>/dev/null
30 30 252 informatique.dico
2273 2273 26988 math.dico-old
21868 21866 270876 nonverbes.dico
455 455 5788 reflex-gp3.dico
239 239 2296 series.dico
42 42 511 typo.dico
68 68 595 verbes-defect.dico
33357 33357 422611 verbes-gp12.dico
58332 58330 729917 total
end session -}
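The duplicated results in the session (for example for 58223) appear because helvetismes.dico and informatique.dico both have 30 entries, which bare numbers cannot tell apart. As a rough sketch, the typo-tolerant search over the line counts can be repeated with file names attached. `lineCounts`, `candidates` and `matchesByTotal` are hypothetical names, and `subsequences` from Data.List is used here in place of `power`:

```haskell
import Data.List (subsequences)

-- Line counts per file from the `wc` listing; only nonverbes.dico differs
-- from its word count (21868 lines vs. 21866 words).
lineCounts :: [(String, Int)]
lineCounts =
  [ ("abrev.dico", 17), ("auxil.dico", 153), ("helvetismes.dico", 30)
  , ("informatique.dico", 30), ("math.dico", 1174), ("math.dico-old", 2273)
  , ("noms_propres.dico", 1177), ("nonverbes.dico", 21868)
  , ("pronominaux.dico", 3627), ("reflex-gp12.dico", 6511)
  , ("reflex-gp3.dico", 455), ("series.dico", 239), ("typo.dico", 42)
  , ("verbes-defect.dico", 68), ("verbes-gp12.dico", 33357)
  , ("verbes-gp3.dico", 1730), ("verbes-impers.dico", 162) ]

-- 58233 plus the digit mix-ups tried in the session.
candidates :: [Int]
candidates = [ 58000 + t | t <- [222, 223, 232, 233, 322, 323, 332, 333] ]

-- For each candidate total that can be reached, the matching file selections.
matchesByTotal :: [(String, Int)] -> [(Int, [[String]])]
matchesByTotal files =
  [ (total, hits)
  | total <- candidates
  , let hits = [ names | (s, names) <- sums, s == total ]
  , not (null hits) ]
  where
    -- Sum and file names of every possible selection, computed once.
    sums = [ (sum (map snd sel), map fst sel) | sel <- subsequences files ]
```

In an interpreter, `lookup 58223 (matchesByTotal lineCounts)` should then show two selections that differ only in helvetismes.dico versus informatique.dico, which makes the apparent duplicates easier to read.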

