{"title": "Neural Computing with Small Weights", "book": "Advances in Neural Information Processing Systems", "page_first": 944, "page_last": 949, "abstract": null, "full_text": "Neural Computing with Small Weights \n\nDept. of Electrical & Computer Engineering \n\nUniversity of California, Irvine \n\nKai-Yeung Siu \n\nIrvine, CA 92717 \n\nJ ehoshua Bruck \n\nIBM Research Division \n\nAlmaden Research Center \nSan Jose, CA 95120-6099 \n\nAbstract \n\nAn important issue in neural computation is the dynamic range of weights \nin the neural networks. Many experimental results on learning indicate \nthat the weights in the networks can grow prohibitively large with the \nsize of the inputs. Here we address this issue by studying the tradeoffs \nbetween the depth and the size of weights in polynomial-size networks \nof linear threshold elements (LTEs). We show that there is an efficient \nway of simulating a network of LTEs with large weights by a network \nof LTEs with small weights. In particular, we prove that every depth-d, \npolynomial-size network of LTEs with exponentially large integer weights \ncan be simulated by a depth-(2d + 1), polynomial-size network of LTEs \nwith polynomially bounded integer weights. To prove these results, we \nuse tools from harmonic analysis of Boolean functions. Our technique is \nquite general, it provides insights to some other problems. For example, \nwe are able to improve the best known results on the depth of a network \nof linear threshold elements that computes the COM PARI SO N, SUM \nand PRO DU CT of two n-bits numbers, and the MAX 1M U M and the \nSORTING of n n-bit numbers. \n\n1 \n\nIntroduction \n\nThe motivation for this work comes from the area of neural networks, where a \nlinear threshold element is the basic processing element. Many experimental results \non learning have indicated that the magnitudes of the coefficients in the threshold \nelements grow very fast with the size of the inputs and therefore limit the practical \nuse of the network. One natural question to ask is the following: How limited \n944 \n\n\fNeural Computing with Small Weights \n\n945 \n\nis the computational power of the network if we restrict ourselves to threshold \nelements with only \"small\" growth in the coefficients? We answer this question by \nshowing that we can trade-off an exponential growth with a polynomial growth in \nthe magnitudes of coefficients by increasing the depth of the network by a factor of \nalmost two and a polynomial growth in the size. \n\nLinear Threshold Functions: A linear threshold function f(X) is a Boolean \nfunction such that \n\nwhere \n\nf(X) = sgn(F(X\u00bb = {_II if F(X) > 0 \nif F(X) < 0 \n\nn \n\nF(X) = 2:= Wi . Xi + Wo \n\ni=l \n\n---\n\n---\n\n-\n\nThroughout this paper, a Boolean function will be defined as f : {I, _I}n --+ \n{I, -I}; namely, 0 and 1 are represented by 1 and -1, respectively. Without loss \nof generality, we can assume F(X):/; 0 for all X E {I,-I}n. The coefficients Wi \nare commonly referred to as the weights of the threshold function. We denote the \nclass of all linear threshold functions by LT1 \u2022 \n\n---LT1 functions: In this paper, we shall study a subclass of LT1 which we denote \nby IT1 . Each function f(X) = sgn(L:~=l Wi' Xi + wo) in IT1 is characterized by \nthe property that the weights Wi are integers and bounded by a polynomial in n, \ni.e. IWil ~ n C for some constant c > O. \nThreshold Circuits: A threshold circuit [5, 10] is a Boolean network in which \nevery gate computes an LT1 function. The size of a threshold circuit is the number \nof LT1 elements in the circuit. Let LTk denote the class of threshold circuits of \ndepth k with the size bounded by a polynomial in the number of inputs. We define \nLTk similarly except that we allow each gate in LTk to compute an LTI function. \nAlthough the definition of (LTd linear threshold function allows the weights to be \nreal numbers, it is known [12] that we can replace each of the real weights by integers \nof O( n log n) bits, where n is the number of input Boolean variables. So in the rest \nof the paper, we shall assume without loss of generality that all weights are integers. \nHowever, this still allows the magnitudes of the weights to increase exponentially \nfast with the size of the inputs. It is natural to ask if this is necessary. In other \nwords, is there a linear threshold function that must require exponentially large \nweights? Since there are 2n(n~) linear threshold functions in n variables [8, 14, 15], \nthere exists at least one which requires O(n 2 ) bits to specify the weights. By the \npigeonhole principle, at least one weight of such a function must need O(n) bits, \nand thus is exponentially large in magnitude. i.e. \n\n-LTI ~ LT1 \n\nThe above result was proved in [9] using a different method by explicitly constructing \nan LT1 function and proving that it is not in LT1 . In the following section, we \nshall show that the COMPARISON function (to be defined later) also requires \nexponentially large weights. We will refer to this function later on in the proof of \n\n-\n\n\f946 \n\nSiu and Bruck \n\nour main results. Main Results: The fact that we can simulate a linear threshold \nfunction with exponentially large weights in a 'constant' number oflayers of elements \nwith 'small' weights follows from the results in [3] and [11]. Their results showed \nthat the sum of n n-bit numbers is computable in a constant number of layers of \n'counting' gates, which in turn can be simulated by a constant number of layers of \nthreshold elements with 'small' weights. However, it was not explicitly stated how \nmany layers are needed in each step of their construction and direct application of \ntheir results would yield a constant such as 13. In this paper, we shall reduce the \nconstant to 3 by giving a more 'depth'-efficient algorithm and by using harmonic \nanalysis of Boolean functions [1,2,6]. We then generalize this result to higher depth \ncircuits and show how to simulate a threshold circuit of depth-d and exponentially \nlarge weights in a depth-(2d + 1) threshold circuit of 'small' weights, i.e. LTd ~ \nfr2d+l. \nAs another application of harmonic analysis, we also show \nthe \nCOM P ARISON and ADDITION of two n-bit numbers is computable with only \ntwo layers of elements with 'small' weights, while it was only known to be com(cid:173)\nputable in 3 layers [5]. We also indicate how our 'depth'-efficient algorithm can be \napplied to show that the product of two n-bit numbers can be computed in LT4 . \nIn addition, we show that the MAXIMUM and SORTING ofn n-bit numbers \ncan be computed in fr3 and LT4 , respectively. \n\n--\n\nthat \n\n2 Main Results \n\nDefinition: Let X = (Xl, ... , Xn), Y = (YI, ... , Yn) E {I, _l}n. We consider X \nand Y as two n-bit numbers representing E?=l Xi\u00b7 2' and E?=l Yi . 2i , respectively. \nThe COMPARISON function is defined as \n\nC(X, Y) = 1 iff X ~ Y \n\nIn other words, \n\nLemma 1 \n\nC(X, Y) = sgn{L:: 2i(Xi - yd + I} \n\nn \n\ni=l \n\n-\nCOMPARISON tt LTI \n\nOn the other hand, using harmonic analysis [2], we can show the following: \n\nLemma 2 \n\nCOMPARISON E m \n\nSpectral representation of Boolean functions: Recently, harmonic analysis \nhas been found to be a powerful tool in studying the computational complexity of \nBoolean functions [1, 2, 7]. The idea is that every Boolean function f : {I, _1}n -+ \n{I, -I} can be represented as a polynomial over the field of rational numbers as \nfollows: \n\nf(X) = L aa xa \n\naE{O,l}n \n\n\fNeural Computing with Small Weights \n\n947 \n\nh \nwere \n\nX a \n\nal al \n= x 1 x 2 \n\nan \n.\u2022. Xn \n\n\u2022 \n\nSuch representation is unique and the coefficients of the polynomial, {aal Q E \n{a, l}n}, are called the spectral coefficients of f. \nWe shall define the Ll spectral norm of f to be \n\nIIfll = ~ laal\u00b7 \n\nae{O,I}n \n\nThe proof of Lemma 2 is based on the spectral techniques developed in [2]. Using \nprobabilistic arguments, it was proved in [2] that if a Boolean function has Ll spec-\ntral norm which is polynomially bounded, then the function is computable in LT2 \u2022 \nWe observe (together with Noga Alon) that the techniques in [2] can be generalized \nto show that any Boolean function with polynomially bounded Ll spectral norm \ncan even be closely approximated by a sparse polynomial. This observation is crucial \nwhen we extend our result from a single element to networks of elements with large \nweights. \n\n. -\n\nLemma 3 Let f(X) : {I, _l}n --+ {I, -I} such that IIfll ~ n C for some c. Then \nfor any k > 0, there exists a sparse polynomial \n\nF(X) = N 2:'.:: wa Xa such that \n\n1 \n\naes \n\nwhere Wa and N are integers, S c {O, l}n, the size of S, Wa and N are all bounded \nby a polynomial in n. Hence, f(X) E LT2 \u2022 \n\nIF(X) -\n\nf(X)1 ~ n- k , \n\nAs a consequence of this result, Lemma 2 follows since it can be shown that \nCOM PARISON has a polynQmially bounded Ll spectral norm. \n\nNow we are ready to state our main results. Although most linear threshold func(cid:173)\ntions require exponentially large weights, we can always simulate them by 3 layers \nof in elements. \nTheorem 1 \n\n-\n\nLTI ~ LT3 \n\nThe result stated in Theorem 1 implies that a depth-d threshold circuit with ex(cid:173)\nponentially large weights can be simulated by a depth-3d threshold circuit with \npolynomially large weights. Using the result of Lemma 3, we can actually obtain a \nmore depth-efficient simulation. \n\nTheorem 2 \n\nAs another consequence of Lemma 3, we have the following : \n\n\f948 \n\nSiu and Bruck \n\nCorollary 1 Let /1 (X), ... , fm(X) be functions with polynomially bounded Ll spec-\ntral norms, and g(/1 (X), ... , fm(X\u00bb be an fi\\ function with fi(X) 's as inputs, \nI.e. \n\ng(/1(X), ... , fm(X\u00bb = sgn(2: Wdi(X) + wo) \n\nm \n\nThen 9 can be expressed as a sign of a sparse polynomial in X with polynomially \nmany number of monomial terms xcr 's and polynomially bounded integer coeffi-\ncients. Hence 9 E LT2. \n\n---\n\ni=l \n\nIf all LTI functions have polynomially bounded Ll spectral norms, then it would \nfollow that LTI C iT 2 \u2022 However, even the simple MAJORITY function does not \nhave a polynomially bounded Ll spectral norm. We shall prove this fact via the \nfollowing theorem. (As in Lemma 3, by a sparse polynomial we mean a polynomial \nwith only polynomially many monomial terms xcr's). \nTheorem 3 The iTl function MAJORITY: \n\nn \n\nsgn(2: X i) \n\ni=l \n\ncannot be approximated by a sparse polynomial with an error o( n -1). \n\nOther applications of the harmonic analysis techniques and the results of Lemma 3 \nyields the following theorems: \n\nTheorem 4 \n\nLet x, y be two n-bit numbers. Then \n\nADDITION(x, y) E m \n\nTheorem 5 The product of two n-bit integers can be computed in LT4 \u2022 \n\n---\nTheorem 6 The MAX I MU M of n n-bit numbers can be computed in LT3. \nTheorem 7 The SORTING ofn n-bit numbers can be computed in IT4 . \n\n---\n\n3 Concluding Remarks \n\nOur main result indicates that for networks of linear threshold elements, we can \ntrade-off arbitrary real weights with polynomially bounded integer weights, at the \nexpense of a polynomial increase in the size and a factor of almost two in the depth of \nthe network. The proofs of the results in this paper can be found in [13]. We would \nlike to mention that our results have recently been improved by Goldmann, Hastad \nand Razborov [4]. They showed that any polynomial-size depth-d network oflinear \nthreshold elements with arbitrary weights can be simulated by a polynomial-size \ndepth-( d + 1) network with \"small\" (polynomially bounded integer) weights. While \nour construction can be made explicit, only the existence of the simulation result is \nproved in [4]; it is left as an open problem in [4] if there is an explicit construction \nof their results. \n\n\fNeural Computing with Small Weights \n\n949 \n\nAcknowledgements \n\nThis work was done while Kai-Yeung Siu was a research student associate at IBM \nAlmaden Research Center and was supported in part by the Joint Services Program \nat Stanford University (US Army, US Navy, US Air Force) under Contract DAAL03-\n88-C-0011, and the Department of the Navy (NAVELEX), NASA Headquarters, \nCenter for Aeronautics and Space Information Sciences under Grant NAGW-419-\nS6. \n\nReferences \n\n[1] J. Bruck. Harmonic Analysis of Polynomial Threshold Functions. SIAM Jour(cid:173)\n\nnal on Discrete Mathematics, May 1990. \n\n[2] J. Bruck and R. Smolensky. Polynomial Threshold Functions, ACo Functions \nand Spectral Norms. Technical Report RJ 7140, IBM Research, November \n1989. Appeared in IEEE Symp. on Found. of Compo Sci. October, 1990. \n\n[3] A. K. Chandra, L. Stockmeyer, and U. Vishkin. Constant depth reducibility. \n\nSiam J. Comput., 13:423-439, 1984. \n\n[4] M. Goldmann, J. Hastad, and A. Razborov Majority Gates vS. General \n\nWeighted Threshold Gates. Unpublished Manuscript. \n\n[5] A. HajnaI, W . Maass, P. PudIak, M. Szegedy, and G. Turan. Threshold circuits \n\nof bounded depth . IEEE Symp. Found. Compo Sci., 28:99-110, 1987. \n\n[6] R. J. Lechner. Harmonic analysis of switching functions. In A. Mukhopadhyay, \n\neditor, Recent Development in Switching Theory. Academic Press, 1971. \n\n[7] N. LiniaI, Y. Mansour, and N. Nisan. Constant Depth Circuits, Fourier Trans(cid:173)\n\nforms, and Learnability. Proc. 30th IEEE Symp. Found. Compo Sci., 1989. \n\n[8] S. Muroga and 1. Toda. Lower Bound of the Number of Threshold Functions. \n\nIEEE Trans. on Electronic Computers, EC 15, 1966. \n\n[9] J. Myhill and W. H. Kautz. On the Size of Weights Required for Linear-Input \n\nSwitching Functions. IRE Trans. on Electronic Computers, EC 10, 1961. \n\n[10] I. Parberry and G. Schnitger. Parallel Computation with Threshold Functions \n\n. Journal of Computer and System Sciences, 36(3):278-302, 1988. \n\n[11] N. Pippenger. The complexity of computations by networks. IBM J. Res. \n\nDevelop. , 31(2), March 1987. \n\n[12] P. Raghavan. Learning in Threshold Networks: A Computation Model and \n\nApplications. Technical Report RC 13859, IBM Research, July 1988. \n\n[13] K.-Y. Siu and J. Bruck. On the Power of Threshold Circuits with Small \n\nWeights. SIAM J. Discrete Math., 4(3):423-435, August 1991. \n\n[14] D. R. Smith. Bounds on the Number of Threshold Functions. IEEE Trans. on \n\nElectronic Computers, EC 15, 1966. \n\n[15] S. Yajima and T. Ibaraki. A Lower Bound on the Number of Threshold Func(cid:173)\n\ntions. IEEE Trans. on Electronic Computers, EC 14, 1965. \n\n\f", "award": [], "sourceid": 529, "authors": [{"given_name": "Kai-Yeung", "family_name": "Siu", "institution": null}, {"given_name": "Jehoshua", "family_name": "Bruck", "institution": null}]}