Proof of Independence of Initial State and Corresponding Symbol Sequence

Dalia Terhesiu and Whitney Tabor

University of Connecticut

Last update:  October 19, 2004

 

 

Lyapunov exponents for SLDA

 

I. Notations and assumptions

The FLNN receives or selects one symbol at a time from the alphabet $S = \{1, \ldots, K\}$ and forms sequences in $S^\infty$, the set of infinite strings formed from the alphabet. The symbols in $S$ and their indexing are described in the following.

The FLNN encodes symbols from $S$ in the following way. If at time $t = n$ the net is to encode the symbol $k$, the input vector has 1 in the $k$-th position and 0 elsewhere. The input vectors thus form the standard basis of $\mathbb{R}^K$. The hidden state is given by a family of functions, one per input symbol, with

                                       (1)                                          

where  is the ’th element of the hidden vector,

Assuming

is the second order weight between the input unit and the hidden unit

 is the first order weight between the input unit and the hidden unit .

Let . The hidden vector is given by

 where                                                                             (2)

is the diagonal matrix that has on its diagonal and is the vector whose elements are .
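The update described above can be sketched in a few lines. This is a minimal illustration, assuming the second-order update has the form h_i <- sum_k (W[i][k] * h_i + b[i][k]) * x_k; the names `one_hot`, `hidden_update`, `affine_update`, `W`, `b` and all numeric weights are hypothetical and are not taken from the original equations.

```python
# Illustrative sketch only: names and weights are assumptions, not the paper's values.

def one_hot(symbol, K):
    """Input vector for `symbol` in {1, ..., K}: 1 in the symbol's position, 0 elsewhere."""
    return [1.0 if j == symbol else 0.0 for j in range(1, K + 1)]

def hidden_update(h, x, W, b):
    """One step written with the input vector x: h_i <- sum_k (W[i][k]*h_i + b[i][k]) * x_k."""
    return [
        sum((W[i][k] * h[i] + b[i][k]) * x[k] for k in range(len(x)))
        for i in range(len(h))
    ]

def affine_update(h, symbol, W, b):
    """The same step written as diag(W[:, k]) h + b[:, k] for the selected symbol."""
    k = symbol - 1
    return [W[i][k] * h[i] + b[i][k] for i in range(len(h))]

K = 2
W = [[0.5, 0.5], [0.5, 0.5]]   # hypothetical second-order weights W[i][k]
b = [[0.0, 0.5], [0.0, 0.5]]   # hypothetical first-order weights b[i][k]
h0 = [0.25, 0.75]
h1 = hidden_update(h0, one_hot(2, K), W, b)
```

Because the input vector is a standard basis vector, the sum over input units collapses to a single term, which is why the two functions agree and the update is affine in the hidden state.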

Now let be symbol sequences in $S^\infty$ indexed by, where is the sequence whose $n$-th element is. corresponds to a sequence

 are partitions of , the boundaries of whose compartments are the domains of . When is in, then a possible next symbol is and the corresponding next state is

 The output vectors are probability distributions given by the probability function       where                                   (3)

 gives the probability that in is selected. That is it gives the probability that is selected given

 ,                                                                    (4)

where  is the conditional probability of the next state given .

 

II. The metric dynamical system

The space $S^\infty$ can be equipped with a metric

  where             (6)

and .

Let be symbols in $S$ and be the corresponding symbol sequences in $S^\infty$ introduced in the previous section.

 The collection of sets

   (7)                            

is the Borel $\sigma$-algebra of $S^\infty$. From now on, the sets will be referred to simply as .

The cylinder sets                (8)

with base are sets of the Borel $\sigma$-algebra[1], and is a measurable space.

By Kolmogorov's theorem on the extension of measures (Shiryaev, 1996), there exists a unique probability measure on this measurable space with

                                                                   (9),

is a sequence of probability measures on

Thus, is a probability space.
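To make the cylinder sets and the extended measure concrete, here is a small sketch. It assumes that a cylinder set collects all sequences whose first n symbols form a word in the base, and that symbols are drawn i.i.d. with fixed probabilities; both the i.i.d. assumption and the names `in_cylinder`, `cylinder_measure`, `p`, `base` are illustrative and not taken from the text.

```python
# Illustrative sketch: the i.i.d. product measure is an assumption made here.

def in_cylinder(omega, base):
    """True if the first n symbols of omega form a word in `base`.

    `omega` maps an index t = 0, 1, ... to a symbol; `base` is a non-empty
    set of equal-length tuples of symbols.
    """
    n = len(next(iter(base)))
    return tuple(omega(t) for t in range(n)) in base

def cylinder_measure(base, p):
    """Measure of the cylinder with the given base under an i.i.d. product measure."""
    total = 0.0
    for word in base:
        weight = 1.0
        for symbol in word:
            weight *= p[symbol]
        total += weight
    return total

p = {1: 0.5, 2: 0.25, 3: 0.25}        # hypothetical symbol probabilities
base = {(1, 2), (1, 3)}               # words of length 2 that start with symbol 1
omega = lambda t: [1, 3, 2][t % 3]    # hypothetical periodic sequence 1, 3, 2, 1, 3, 2, ...
```

The measure of a cylinder depends only on its finite base, which is the consistency that the extension theorem requires of the finite-dimensional measures.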

Let                                                   (10)[2]

be the shift map on $S^\infty$.

The shift map defined in (10) is trivially measurable and has the cocycle property

.                                                  (11)
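As a quick check of this property, the sketch below represents an infinite sequence as a function from the index t = 0, 1, ... to a symbol and implements the n-fold left shift. It assumes that the property in (11) reads theta^(m+n) = theta^m o theta^n; the names `shift`, `prefix` and the example sequence are hypothetical.

```python
# Illustrative sketch: sequences are modelled as functions t -> symbol.

def shift(omega, n=1):
    """n-fold left shift: (theta^n omega)(t) = omega(t + n)."""
    return lambda t: omega(t + n)

def prefix(omega, length):
    """First `length` symbols of the sequence, for inspection."""
    return [omega(t) for t in range(length)]

omega = lambda t: (t * t) % 3 + 1    # hypothetical sequence over S = {1, 2, 3}

# Shifting by 2 and then by 3 gives the same sequence as shifting by 5.
twice = prefix(shift(shift(omega, 2), 3), 10)
once = prefix(shift(omega, 5), 10)
```

Only finite prefixes can be compared in code, so this illustrates the property rather than proving it.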

In the next section we define our random dynamical system over the metric dynamical system .

             

 III. The affine random dynamical system over the metric dynamical system

 

Let  be two functions 

                (12)                                                                    

with and  introduced in (2).

Since is defined on the probability space with values in the measurable space , where is the Borel $\sigma$-algebra of , is measurable. In fact, since , the function defined on with values in is measurable (a so-called random element of ).

The same holds for . The function defined on with values in is measurable (a so-called random element of ),

and thus is measurable.

Let be the semigroup of affine transformations defined as

                                                                                      (13)                                                

with  and measurable as introduced in (12).

Now, the family of functions introduced in (2) can be rewritten as follows[3].

In the following, let be the time-one mapping

                                                                        (14)                                               

with and , introduced in (12) and

 with the initial state.                                 

The hidden state is given by the random difference equation

                                         (15)[4]                                         

Now, the solution of (15) is the cocycle over the metric dynamical system :

                                                   (16)[5]
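The cocycle property of the solution can be checked numerically. The sketch below assumes that each symbol selects a diagonal affine map and that the property takes the standard form phi(n + m, omega) = phi(m, theta^n omega) o phi(n, omega); the function names, the 0-based symbol indices and the weights are hypothetical.

```python
# Illustrative sketch: symbols are 0-based indices here, weights are made up.

def step(h, k, W, b):
    """Time-one map for symbol k: h -> diag(W[:, k]) h + b[:, k]."""
    return [W[i][k] * h[i] + b[i][k] for i in range(len(h))]

def shift(omega, n):
    """n-fold left shift of a sequence given as a function t -> symbol."""
    return lambda t: omega(t + n)

def cocycle(n, omega, h, W, b):
    """phi(n, omega) h: apply the maps selected by omega(0), ..., omega(n - 1) in order."""
    for t in range(n):
        h = step(h, omega(t), W, b)
    return h

W = [[0.5, 0.25], [0.5, 0.5]]     # hypothetical second-order weights W[i][k]
b = [[0.0, 0.75], [0.0, 0.5]]     # hypothetical first-order weights b[i][k]
omega = lambda t: (t * t) % 2     # hypothetical symbol sequence over {0, 1}
h0 = [0.25, 0.75]

direct = cocycle(7, omega, h0, W, b)
composed = cocycle(4, shift(omega, 3), cocycle(3, omega, h0, W, b), W, b)
```

Both computations apply the same seven maps in the same order, so `direct` and `composed` agree exactly.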

To see that is measurable, we introduce the Borel $\sigma$-algebra of . Consider again the compartments introduced in Section I and the decomposition . The Borel $\sigma$-algebra of is the smallest $\sigma$-algebra that contains the decomposition. This $\sigma$-algebra will contain all the sets . The random dynamical system is measurable[6] if and only if the mapping is measurable. That is, denoting the mapping by , is measurable if and only if , where , which is a direct consequence of the fact that and are measurable and of the fact that all sets are in .

Further, we consider the linearization or derivative of at , for each fixed , i.e. the Jacobian matrix . is the linear cocycle on over the metric dynamical system generated by the difference equation

                                                (17)
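For an affine time-one map the linearization can be written out explicitly. The following is a sketch in assumed notation: the symbols $A$, $b$, $\theta$, $\omega$, $\varphi$ and $\Phi$ follow the conventions of Arnold (1998) and are not recovered from the original equations.

```latex
\begin{align*}
x_{n+1} &= A(\theta^{n}\omega)\,x_n + b(\theta^{n}\omega), \\
\varphi(n,\omega)x &= A(\theta^{n-1}\omega)\cdots A(\omega)\,x
  + \sum_{j=0}^{n-1} A(\theta^{n-1}\omega)\cdots A(\theta^{j+1}\omega)\,b(\theta^{j}\omega), \\
\Phi(n,\omega) &= D_x\,\varphi(n,\omega)x = A(\theta^{n-1}\omega)\cdots A(\omega), \\
\Phi(n+1,\omega) &= A(\theta^{n}\omega)\,\Phi(n,\omega), \qquad \Phi(0,\omega) = I.
\end{align*}
```

In the sum, the matrix product is empty (the identity) for $j = n-1$. Under these assumptions the Jacobian does not depend on the point $x$ at which it is taken, because each time-one map is affine in $x$.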

 

IV. The invariant measure for

Since is measurable and continuous[7] and because is compact, the Markov-Kakutani fixed point theorem (Arnold, 1998) guarantees that there is at least one probability measure on which is invariant. More precisely, given that is the skew product corresponding to and is the projection onto , the probability measure is said to be -invariant if

1. for all . That is, , where

and

2.. That is, , where  is the probability on the space .

            Another way to look at the problem is to consider the homogeneous Markov chain on the state space . By Kifer's theorem[8], since is a measurable cocycle with time , which is a product of i.i.d. random mappings, we have that for any fixed independent of , the orbit of is a homogeneous Markov chain with state space and transition probability

                                                                                     (18)[9].

Now, given that is continuous and considering Ohno's one-to-one correspondence[10] between invariant product measures on and the measures corresponding to Markov chains on for one-sided time random dynamical systems, we have that there exists a measure which is invariant.

            Here we note that the passage from the product of random mappings to the Markov chain is unique, and thus the measure corresponding to the Markov chain on is unique. The importance of this fact is discussed in the last section, after the Multiplicative Ergodic Theorem for our RDS is stated.

 

 

 

V. The Multiplicative Ergodic Theorem for our RDS[11]

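Consider the linear cocycle over the metric dynamical system introduced in (17), written here as $\Phi(n, \omega)$ for a symbol sequence $\omega \in S^\infty$, with $\theta$ the shift map and $d$ the dimension of the hidden state space. The fact that there is an invariant measure guarantees that there exists an invariant set of full measure such that for each $\omega$ in this set:

1)      The limit

$\Psi(\omega) = \lim_{n \to \infty} \left( \Phi(n, \omega)^{*} \, \Phi(n, \omega) \right)^{1/(2n)}$

exists, with $\Phi$ introduced in (17).

2)      Let $e^{\lambda_{p(\omega)}(\omega)} < \cdots < e^{\lambda_1(\omega)}$ be the different eigenvalues of $\Psi(\omega)$ and let $U_{p(\omega)}(\omega), \ldots, U_1(\omega)$ be the corresponding eigenspaces with multiplicities $d_i(\omega) = \dim U_i(\omega)$. Then

$p(\theta\omega) = p(\omega)$,

$\lambda_i(\theta\omega) = \lambda_i(\omega)$ for all $i \in \{1, \ldots, p(\omega)\}$,

$d_i(\theta\omega) = d_i(\omega)$ for all $i \in \{1, \ldots, p(\omega)\}$.

3)      Put $V_{p(\omega)+1}(\omega) = \{0\}$ and, for $i = 1, \ldots, p(\omega)$,

$V_i(\omega) = U_{p(\omega)}(\omega) \oplus \cdots \oplus U_i(\omega)$, so that $V_{p(\omega)}(\omega) \subset \cdots \subset V_1(\omega) = \mathbb{R}^d$ defines a filtration of $\mathbb{R}^d$. Then for every $x \in \mathbb{R}^d \setminus \{0\}$, the Lyapunov exponent $\lambda(\omega, x)$, i.e. the limit

$\lambda(\omega, x) = \lim_{n \to \infty} \frac{1}{n} \log \| \Phi(n, \omega) x \|$

exists and

$\lambda(\omega, x) = \lambda_i(\omega) \iff x \in V_i(\omega) \setminus V_{i+1}(\omega)$,

or equivalently

$V_i(\omega) = \{ x \in \mathbb{R}^d : \lambda(\omega, x) \le \lambda_i(\omega) \}$.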
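For diagonal Jacobians the Lyapunov exponents can be estimated directly, which gives a simple numerical illustration of the limit in the theorem. The sketch below assumes that each symbol k selects the diagonal matrix with entries W[i][k] and that symbols are drawn i.i.d. with fixed probabilities `p`; both assumptions, and all names and weights, are illustrative. In that case the exponent along coordinate i is the time average of log|W[i][k]| and should approach sum_k p[k] * log|W[i][k]|.

```python
import math
import random

# Illustrative sketch: i.i.d. symbols and the weights below are assumptions.

def lyapunov_exponents(symbols, W):
    """Time-averaged log growth rate of each diagonal entry along the symbol sequence."""
    n = len(symbols)
    return [sum(math.log(abs(row[k])) for k in symbols) / n for row in W]

random.seed(0)
W = [[0.5, 0.25], [0.5, 0.5]]   # hypothetical diagonal entries W[i][k]
p = [0.5, 0.5]                  # hypothetical symbol probabilities
symbols = random.choices(range(2), weights=p, k=20000)

estimate = lyapunov_exponents(symbols, W)
expected = [sum(p[k] * math.log(abs(row[k])) for k in range(2)) for row in W]
```

The second coordinate uses the same contraction factor for both symbols, so its exponent is log(0.5) for every sequence; the first coordinate depends on the symbol frequencies and only converges in the long run.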

VI. The independence of the initial state and the corresponding sequence

The invariant set introduced in the formulation of the MET for our RDS is independent of the choice of the initial sequence if the invariant measure for the RDS is unique.

As noted in Section IV, the measure corresponding to the Markov chain is unique. Given this, the uniqueness of the invariant measure for the RDS follows directly.

However, it should be noted that while the passage from a measurable, continuous RDS to its Markov chain is unique, the reverse is not. That is, we cannot be sure of the uniqueness of a measurable, continuous RDS constructed with a prescribed transition probability.
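One concrete way to see how an initial state can wash out is to drive two different initial states with the same symbol sequence. The sketch below makes the additional assumption, not stated in the text, that every diagonal weight has modulus below 1, so each time-one map is a contraction; the names and weights are hypothetical.

```python
# Illustrative sketch: contraction (all |W[i][k]| < 1) is an added assumption.

def step(h, k, W, b):
    """Time-one map for symbol k: h -> diag(W[:, k]) h + b[:, k]."""
    return [W[i][k] * h[i] + b[i][k] for i in range(len(h))]

def run(h, symbols, W, b):
    """Apply the maps selected by `symbols` in order, starting from h."""
    for k in symbols:
        h = step(h, k, W, b)
    return h

W = [[0.5, 0.5], [0.5, 0.5]]          # hypothetical second-order weights W[i][k]
b = [[0.0, 0.5], [0.0, 0.5]]          # hypothetical first-order weights b[i][k]
symbols = [0, 1, 1, 0, 1, 0, 0, 1] * 5    # one shared sequence of 40 symbols

end_a = run([0.0, 0.0], symbols, W, b)
end_b = run([1.0, 1.0], symbols, W, b)
gap = max(abs(x - y) for x, y in zip(end_a, end_b))
```

With these weights the distance between the two trajectories halves at every step, so after 40 symbols `gap` is on the order of 0.5 ** 40. This illustrates the contractive special case only and does not replace the measure-theoretic argument above.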

 

 



[1] The smallest $\sigma$-algebra containing all the sets introduced in (8).

[2] This is just another way of writing the shift map. In the notation used above.

[3] Here we follow (Arnold, 1998, Section 5.6).

[4] In terms of the previous notation .

[5] In terms of the previous notation

[6] Note that because time is discrete, measurability of  is equivalent to the measurability of for each fixed .

[7] Again, because time is discrete, continuity of is equivalent to continuity of for each fixed . In terms of the previous notation, continuity amounts to the continuity of each .

[8] As provided in (Arnold, 1998, Section 2.1.4).

 

[9] Note that in terms of functions the measure is equivalent to the following. First consider the probabilistic iterated function system (IFS) with and the vector of probability distribution.

[10] As provided in (Arnold, 1998, Section 2.1.6).

 

[11] As provided in (Arnold, 1998, Section 3.4.2).