UDC 004.93
Abstract. The problem of automating sample formation for building diagnostic and recognition models from precedents is solved. A method for extracting training samples is proposed. It preserves the important topological properties of the original sample in the formed subsample, and it requires neither loading the original sample into computer memory nor multiple passes over it. This reduces the sample size and the computer resource requirements.
Keywords: sample, example selection, data reduction, data mining, data dimensionality reduction.
1. Introduction
- [1-4], [3], . , . .
[1-5], , [5, 6], . , , , - .
.., 2013
ISSN 1028-9763. , 2013, 1
, , . , [7-9], , .
[10-13] , ( ) , , . , .
.
2. Problem statement
$X = \langle x, y \rangle$ is a sample of $S$ instances, $x = \{x^s\}$, $y = \{y^s\}$, $s = 1, 2, \dots, S$, described by $N$ features $\{x_j\}$, $j = 1, 2, \dots, N$.
I I * .
< , >, = {.}, - - , -
$y^s \in \{1, 2, \dots, K\}$ is the class of the $s$-th instance, and $K > 1$ is the number of classes.
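The sample formation task: for a given sample $X = \langle x, y \rangle$, form a subsample $X^* \subset X$ of $S^* < S$ instances that preserves the important topological properties of $X$.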
, , , .
3. Sample formation method
, , -, . , , ( - ), ( ), .
, . , , N , , N - .
, , ,
N - . , , .
.
. X =< , > .
. , , ] - , j = 1,2,...,N, :
$$x_j^{\max} = \max_{s=1,2,\dots,S} \{x_j^s\}, \quad x_j^{\min} = \min_{s=1,2,\dots,S} \{x_j^s\},$$
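Because both bounds are running maxima and minima, they can be accumulated in a single sequential pass, in line with the stated aim of not loading the whole sample into memory. A minimal sketch, assuming each instance arrives as a sequence of $N$ numeric feature values; the function name `feature_ranges` is illustrative and not taken from the source:

```python
from typing import Iterable, List, Sequence, Tuple


def feature_ranges(rows: Iterable[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Return (x_min, x_max), one entry per feature, after a single pass.

    Only the current row and two lists of length N are kept in memory,
    so `rows` may be a generator that reads the sample from disk.
    """
    x_min: List[float] = []
    x_max: List[float] = []
    for row in rows:
        if not x_min:
            # The first instance initialises both bounds.
            x_min, x_max = list(row), list(row)
            continue
        for j, value in enumerate(row):
            if value < x_min[j]:
                x_min[j] = value
            elif value > x_max[j]:
                x_max[j] = value
    return x_min, x_max


print(feature_ranges([[1.0, 5.0], [3.0, 2.0], [2.0, 4.0]]))  # ([1.0, 2.0], [3.0, 5.0])
```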
:
$$c_j^q = \frac{1}{S_q} \sum_{s=1}^{S} \{x_j^s \mid y^s = q\}, \quad q = 1, 2, \dots, K,$$
4 - , q - . :
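Assuming the expression above defines the centre $c_j^q$ of class $q$ along feature $x_j$ as the mean of $x_j^s$ over the $S_q$ instances with $y^s = q$, the class sizes and centres can also be collected in one pass while holding only $K \times N$ running sums. A minimal sketch; the name `class_centers` is hypothetical:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple


def class_centers(
    sample: Iterable[Tuple[Sequence[float], int]],
) -> Tuple[Dict[int, int], Dict[int, List[float]]]:
    """One pass over (x^s, y^s) pairs returning S_q and the centres c_j^q."""
    counts: Dict[int, int] = defaultdict(int)
    sums: Dict[int, List[float]] = {}
    for x, q in sample:
        counts[q] += 1
        acc = sums.setdefault(q, [0.0] * len(x))
        for j, value in enumerate(x):
            acc[j] += value
    # Mean of feature j over the S_q instances of class q.
    centers = {q: [total / counts[q] for total in acc] for q, acc in sums.items()}
    return dict(counts), centers


sample = [([1.0, 2.0], 1), ([3.0, 4.0], 1), ([10.0, 0.0], 2)]
print(class_centers(sample))
```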
- - :
$$\delta_j = x_j^{\max} - x_j^{\min}, \quad j = 1, 2, \dots, N;$$
- :
, = 1,2,...,N, q, = 1,2,...,;
rq - rp
rj rj
) = (^ q)
- - , 5 > :
(
n
round
ln
5 j S
0,5S,
min (d j (q, p )}
q=1,2,..., K;
^ p=q+1, q+2,..., K
min (dj (q, p)} = 0;
min (dj(q,p)} > 0;
q=1,2,..., K; p=q+1,q+2,...,K
q=1,2,...,; =q+1,q+2,...,
8
- j - : 0 =^~.
. 5- *, * = 1,2,..., -
:
:
, - -
rj (Xs)
round
1,ej=
1+
j > 0; j = 1,2,...,N;
e
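The formula above appears to assign each feature value an interval number $r_j(x^s)$ through rounding of an expression of the form $1 + (\dots)/\varepsilon_j$ when $\varepsilon_j > 0$, with a separate constant case otherwise. The sketch below assumes $r_j = \mathrm{round}(1 + (x_j^s - x_j^{\min})/\varepsilon_j)$ for $\varepsilon_j > 0$ and $r_j = 1$ otherwise; the numerator is our assumption, the interval widths $\varepsilon_j$ are taken as given, and the name `cell_indices` is hypothetical:

```python
import math
from typing import List, Sequence


def cell_indices(
    x: Sequence[float], x_min: Sequence[float], eps: Sequence[float]
) -> List[int]:
    """Interval numbers r_j(x) of one instance on an equal-width grid.

    Assumed form: r_j = round(1 + (x_j - x_min_j) / eps_j) if eps_j > 0,
    and r_j = 1 if eps_j == 0 (a feature that takes a single value).
    """
    r: List[int] = []
    for value, low, width in zip(x, x_min, eps):
        if width > 0:
            # Round half up; Python's built-in round() would round halves to even.
            r.append(math.floor(1 + (value - low) / width + 0.5))
        else:
            r.append(1)
    return r


print(cell_indices([0.0, 2.5, 3.0], [0.0, 0.0, 3.0], [1.0, 1.0, 0.0]))  # [1, 4, 1]
```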
- :
( 5 )
N
(5)2 (5) = {(5)};
=1
=1,2,..., N
- , :
^
( 5) = 8
N
(5)
=1

( (5 ))2
=1
- : ^ 5 ) + (5 ).
I. , .
X ={5},
5 =< , 5,5 > 15.
. I ( ) , .
:
- , , . : = 0 5 = 1. , 5 < , : : = +1, - : 1 = 15, - : = 5, , 5 < 5+1 = ,
5 :5 = 5 +1, : = 15. 1;
- X , :
X' = X'\{< ,", 5 >| ^ = 1,2,..., ! : 1 = V = }.
. X X* X,
X':
* *
" = 1,2,..., : X = X {< 5,5 >| 5 X'}.
X, X*, .
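The final steps above build $X^*$ from a set of retained entries rather than from instances held in memory. A simplified sketch of that idea, assuming each grid cell is identified by a hashable key (for example the tuple of interval numbers $r_j(x^s)$) and that the first instance of each class met in each cell is kept; both the keying and the choice of representative are assumptions and not necessarily the author's exact rule. The names `select_representatives` and `grid` are hypothetical:

```python
from typing import Callable, Dict, Hashable, Iterable, List, Sequence, Set, Tuple

# One instance: (feature vector x^s, class label y^s).
Instance = Tuple[Sequence[float], int]


def select_representatives(
    stream: Callable[[], Iterable[Instance]],
    cell_key: Callable[[Sequence[float]], Hashable],
) -> List[Instance]:
    """Keep the first instance of each class in each occupied grid cell.

    `stream` must return a fresh iterator over the sample on every call,
    e.g. by re-opening a file, so the sample is never held in memory.
    """
    # Pass 1: store only (cell key, class) -> index s of its first instance.
    first_index: Dict[Tuple[Hashable, int], int] = {}
    for s, (x, y) in enumerate(stream()):
        first_index.setdefault((cell_key(x), y), s)
    keep: Set[int] = set(first_index.values())
    # Pass 2: re-read the sample and copy the retained instances into X*.
    return [inst for s, inst in enumerate(stream()) if s in keep]


def grid(x: Sequence[float]) -> Tuple[int, ...]:
    """Hypothetical cell key: cells of width 0.5 in every feature."""
    return tuple(int(v * 2) for v in x)


data = [([0.1, 0.1], 1), ([0.2, 0.15], 1), ([0.9, 0.9], 2), ([0.12, 0.1], 2)]
print(select_representatives(lambda: iter(data), grid))  # keeps instances 0, 2 and 3
```

Memory grows with the number of occupied (cell, class) pairs rather than with $S$, and the sample is read twice.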
4. Complexity analysis
:
X, X (, , ).
( - , - - ) 2 + 81 8 ( - , - , - ).
(+ 58 + 6N + 2 - ) -
(58 + 7N + 2 - ) -
, , , .

(14N8 + 68 + 818 + N + (N +1)(2 - )) , .
= 2, N << 8 (, N ї 0,0018 ) = N8 ї 0,00182, , 81 8ї88, : -
(0,01482 +14,0038)ї (14 + 442,8/), - (5,0078)ї (158,34>/).
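The leading terms of these estimates can be checked by substitution, assuming the damaged operation counts above read $14NS + \dots + (N + 1)(2 - K)$ and $5S + 7N + 2 - K$, with $K = 2$ and $N = 0.001S$:

```latex
\begin{aligned}
14NS &= 14\,(0.001S)\,S = 0.014\,S^2, \\
(N + 1)(2 - K) &= (N + 1)(2 - 2) = 0, \\
5S + 7N + 2 - K &= 5S + 7\,(0.001S) + 2 - 2 = 5.007\,S.
\end{aligned}
```

These reproduce the coefficients 0.014 and 5.007 that appear in the estimates above.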
5. Experiments
, [14-16], . 1.
Table 1.
Dataset | K | N | S | N·S | S* | S*/S
Cardiotocography [14] | 3 | 23 | 2126 | 48898 | 236 | 0.11
Covertype [15] | 7 | 54 | 581012 | 31374648 | 51926 | 0.09
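The derived columns of Table 1 can be re-checked from the others. In this sketch the fifth column is read as the product $N \cdot S$, which both rows satisfy exactly, and the last column as the reduction ratio $S^*/S$; the names `TABLE_1` and `recompute_ratios` are hypothetical:

```python
from typing import List, Sequence, Tuple

Row = Tuple[str, int, int, int, int, int, float]

# Table 1 as printed: dataset, K, N, S, N*S, S*, S*/S.
TABLE_1: List[Row] = [
    ("Cardiotocography [14]", 3, 23, 2126, 48898, 236, 0.11),
    ("Covertype [15]", 7, 54, 581012, 31374648, 51926, 0.09),
]


def recompute_ratios(rows: Sequence[Row]) -> List[float]:
    """Check the N*S column and return the exact S*/S for every row."""
    ratios: List[float] = []
    for name, _k, n, s, ns, s_star, _printed in rows:
        if n * s != ns:
            raise ValueError(f"{name}: N*S column does not equal N times S")
        ratios.append(s_star / s)
    return ratios


for (name, *_), ratio in zip(TABLE_1, recompute_ratios(TABLE_1)):
    print(f"{name}: S*/S = {ratio:.3f}")
```

The exact ratios, about 0.111 and 0.089, round to the printed 0.11 and 0.09.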
, , . . 1, ( 9-11 ), , , , .
.
.
, , , , , , .
, , , , .
, .
- ", " ( . 0111U000059) " , , ".

1. .. / .., .. . - : , 200. - 404 .
2. . , / . , . , . ; . . .. . - .: - , 2004. - 452 .
3. / [. . , .. , .. .; . .. ]. - : ї, 2012. - 317 .
4. , / [.. , .. , .. .; . .. , .. ]. - : " ", 2009. - 4 .
5. .. - / .. . - Saarbrucken: LAP Lambert Academic Publishing, 2012. - 232 .
6. Jensen R. Computational intelligence and feature selection: rough and fuzzy approaches / R. Jensen, Q. Shen. - Hoboken: John Wiley & Sons, 2008. - 339 p.
7. Chaudhuri A. Survey sampling theory and methods / A. Chaudhuri, H. Stenger. - New York: Chapman & Hall, 2005. - 41 p.
8. Encyclopedia of survey research methods / ed. P.J. Lavrakas. - Thousand Oaks: Sage Publications, 2008. - Vol. 1-2. - 9 p.
9. . / . ; . . .. ; . .. , .. . - .: , 197. - 440 .
10. Subbotin S.A. The training set quality measures for neural network learning / S.A. Subbotin // Optical Memory and Neural Networks (Information Optics). - 2010. - Vol. 19, N 2. - P. 12 - 139.
11. .. / .. // . - 2010. - 1. - . 25 - 39.
12. .. / .. // . - 2010. - 1. - . 38 - 42.
13. .. / .. // " ": . . . - : "", 2011. - № 17. - P. 149 - 15.
14. Cardiotocography Data Set [Electronic resource]. - Access mode: http://archive.ics.uci.edu/ml/datasets/Cardiotocography.
15. Covertype Data Set [Electronic resource]. - Access mode: http://archive.ics.uci.edu/ml/datasets/Covertype.
03.10.2012