el_16_1_05

Upload: mauricio-ramos

Post on 03-Jun-2018

215 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/12/2019 EL_16_1_05

    1/9

    Abstract A population based real-time optimization methodfor tuning dynamic position control parameters of robotmanipulators has been proposed. Conventionally, the positionalcontrol is achieved by inverse dynamics feedforward and PIDfeedback controllers. The proposed method allows to tune the PIDcontroller parameters using population based optimizationmethods to minimize the error while tracking a repeated desired

    trajectory on real-time. The stability of the system is granted byswitching the inappropriate settings to a stable default using areal-time cost evaluation function.

    The proposed tuning method is tested on a two-joint planarmanipulator with Cross-Entropy optimization, and on a planarinverted pendulum both with Cross Entropy, and DifferentialEvolutionary search methods. The test results indicated that theproposed method improves the settling time and reduces theposition error over the repeated paths for both population basedevolutionary optimization.

    I ndex Terms adaptive control, PID tuning, real-time CEoptimization, real-time evolutionary optimization, real-time DEoptimization.

    I. I NTRODUCTION Robot manipulators are commonly employed in repetitivetasks in the industry for the reduction of production costs,enhancement of precision, quality and productivity whilehaving greater flexibility than other specialized machines aswell as in hazardous environmental conditions such as inradioactive, toxic zones or where a risk of explosion exists, orspatial and submarine applications. The short-term projectionsshow that assembly tasks will continue to be the mainapplications of robot manipulators [1].

    Robot manipulators are mainly positioning devices withmultiple degrees-of-freedom (DoF). Dynamic position control

    of robot manipulators is a well known problem in controlengineering due to the multi-input-multi-output nonlinear behaviour of their equation of motion. There are a large numberof excellent survey books and articles on the control of themanipulators [1]-[5]. In a typical position control problem, the

    plant is characterized by the dynamic equation of motion of themanipulator, which describes the motion of the jointdisplacements corresponding to the applied joint torques. Thecontroller is designed to generate appropriate joint torques totrack a desired task space trajectory, or equivalently to track thecorresponding desired joint space trajectory. The conversion of

    Dr. Mehmet Bodur is with the Computer Engineering Department at EasternMediterranean University, G.Magusa, TRNC, Mersin-10 Turkey. (e-mail:mehmet @ ieee . org ).

    trajectories from the task space to the joint space requireskinematic modeling of the robot manipulator. The equation ofmotion is obtained by one of the methods such as

    Newton-Euler, Lagrangian, and their backward and forwardrecursive applications.

    The decentralized PD and PID control are the conventionalmethods for position control of the manipulator joint

    displacement. PD control can achieve global asymptoticstability in positioning the end-point in the absence of gravity.The integral action of the PID control further compansates any

    positional offset due to gravitational forces [5]. Severalmethods were proposed in literature to obtain suitable

    propotional, integral and derivative controller gain settings: K p , K i and K d . In absence of friction, the gains may be selectedusing a Lyapunov function candidate and LaSalles InvariancePrinciple [5]. Adaptive PID settings were also proposed forobtaining proper local gain settings based on plantidentification and pole placement techniques [6].

    Population-based metaheuristics is one of the important

    branches of the metaheuristic methods to search solutions ofdifficult optimization problems. Even though metaheuristicmethods have drawbacks such as sensitivity to parameters andlack of strong proof of convergence, they are still preferred forseveral reasons: They do not require special conditions for the

    proprieties of the objective function and constraints, they aresuitable for both continuous and combinatorial problems, andthey are suitable as well to multiobjective optimizationapplications. Population-based methods exhibit furtheradvantages compared to the neighborhood metaheuristics.They are less sensitive to misbehaving paths of certainindividual solutions, and they increase the probability ofattaining the global optimum by employing information on thesurface of an objective function [7].

    Population-based algorithms are difficult to apply toreal-time optimization of robotic control problems, because ofthe difficulty to keep the trajectory in tolerance for all tested

    population members. For a reinforcement learning applicationLin restricted the population only to stable parameters using aLyapunov function [8]. This study proposes a simpler solution

    by real-time monitoring the tracking error. The stability problem is tackled by switching the unstable control parametersto a stable parameter set if the continuously monitored errorscore exceeds a tolerable threshold. The proposed method isdemonstrated on two population based search algorithms: the

    Differential Evolution and the Cross-Entropy methods.

    Real-Time Population Based Optimization for

    Adaptive Motion Control of Robot ManipulatorsMehmet Bodur, Member, IEEE

    Engineering Letters, 16:1, EL_16_1_05 ______________________________________________________________________________________

    (Advance online publication: 19 February 2008)

  • 8/12/2019 EL_16_1_05

    2/9

    Cross-Entropy (CE) Method is a statistical population basedinformation theoretic method of inference about an unknown

    probability density from a prior estimate of the density and newexpected values [9], [10]. CE Method was further developed asa combinatorial optimization tool to obtain the optimalsolutions for rare-event problems and optimization of the scalarfunctions [10], [11]. It is one of the optimization methods withthe proven convergence to the optimal parameters similar toMonte-Carlo and the Simulated Annealing methods.

    Evolutionary Algorithms (EAs) are nature inspired population based optimization methods for both continuousvariable, and combinatorial problems. The DifferentialEvolution is a population based search method developed fromGenetic Algorithms (GAs), which provide evolution of the bestsolution by survival of the fittest. GAs are search algorithmsinspired of natural selection and genetics to search bettersolutions in a solution space. A simple genetic algorithm is

    typically composed of three operators: reproduction, crossoverand mutation [12]. The beginning of the Differential Evolution(DE) algorithm was Genetic Annealing developed by K. Price[13], [14]. Unlike to the GAs, DE employs only a simple andfast linear operator, differential mutation, which uses thedifferences between individuals instead of crossover andmutation operators [7], [13]. In continuous domain multimodal

    problems, DE finds the global best faster than GAs [7], [15].This success resides in the manner of the trial individualcreation.

    This article is organized as follows: Section II presents thekinematics, dynamic modeling and control of a robotmanipulator. Section III and IV introduces the Cross-Entropyand the Differential Evolution methods for optimization of thecontroller gains. Section V describes the proposed real-time

    population based optimization algorithm with its goal. SectionVI and VII contains the experimental details and the results ofthe proposed parameter switching on a two-link planarmanipulator, and on an inverted pendulum system. Section VIIIconcludes the article.

    II. KINEMATICS DYNAMIC AND CONTROL

    The forward kinematics relation maps the joint space position q to the end-effector pose

    p = f 0(q) (1)

    using homogenous link-frame coordinate transformationmatrices. Denavit-Hartenberg (D-H) convention provides theD-H transformation matrix Ak that maps a coordinate at k th frame to the ( k 1) st frame [15]. In D-H convention, the frameaxis k 1 z is assigned along the axis of rotation of the k

    th link. Ifthe joint is translational, k 1 z is assigned along the movementaxis of the k th link. k is the joint rotation from k 1 x to k x. Thecommon normal from k 1 z to k z is called a k . The distance fromak 1 to a k is denoted by d k . The twist angle from k 1 z to k z iscalled k . The transformation function from k th to ( k 1)st frame is obtained by a chain of rotations and translations

    k 1

    Ak =Rot( z k 1, k )Trans n 1(ak , 0, d k ) Rot( xk 1, k )

    k 1 Ak =cos sin cos sin sin a cos

    sin cos cos cos sin a sin

    0 sin cos

    0 0 0 1

    k k k k k k k

    k k k k k k k

    k k k d

    (2)

    The Lagrangian provides a systematic method to obtain theequation of motion of a multi-degree-of-freedom open chainmechanism. The Lagrangian function L is defined by

    L = K P , where K is the kinetic energy, and P is the potentialenergy of the analyzed system. The partial derivatives of L

    provide a simple means of calculation for the joint force andtorques. The derivatives of the D-H transformation matrices Ai can be expressed by

    i

    i

    Aq

    = Q i Ai , (3)

    where Qi =

    0000

    0000

    0001

    0010

    for revolute, and Q i =

    0 0 0 0

    0 0 0 0

    0 0 0 1

    0 0 0 0

    for

    translational joints. This property provides a convenient formfor the derivatives of the link transformation matrices 0T i withrespect to the j th generalized-joint displacement variables q j andqk [15].

    U ij =0

    i

    j

    T q

    = 1 2( ... ... ) j i

    j

    A A A Aq

    = A1 A2 ... Q j A j ... Ai ; j i . (4)and

    U ijk =0

    i

    j k

    T q q

    = 1 2

    ( ... ... ) j i j k

    A A A Aq q

    = A1 A2 ... Q j A j ... Q k Ak ... Ai ; j i, k i . (5)Linear or revolute, the kinetic energy of a mass dm i located

    at the position ir dm and moving with the i-th coordinate frameis

    d k i = 12 d m i

    0viT 0vi , (6)

    where 0vi denotes the velocity of mass mi. The total kineticenergy of the link requires the integration of d k i over thecomplete mass of the i-th link.

    k i = mi dk i = 12Trace [(i p= 1 i r= 1U ip J pi U ir T qr q p )], (7)

    where J pi=mi ir dm ir dmTdm i is the pseudo-inertia matrix of the

    thi link. The potential energy of ith link due to thegravitational field is expressed using the mass mi, the center ofmass icm and the gravitational acceleration vector g a .

    p i = mi g a T 0T i icm . (8)

    Finally, the Lagrangian of the complete system is obtained L = K P

    = 12 ni= 1 ik= 1 ir= 1 Trace (U ik J pi U ir T ) qr q p

    n i= 1 mi g a T 0T i icm (9) and its derivatives give the generalized joint-force at the joint- j.

    j = dt d

    jq L

    jq

    L , (10)

    j = n i=j n k= 1 Trace (U ik J pi U ijT ) q k

    Engineering Letters, 16:1, EL_16_1_05 ______________________________________________________________________________________

    (Advance online publication: 19 February 2008)

  • 8/12/2019 EL_16_1_05

    3/9

  • 8/12/2019 EL_16_1_05

    4/9

    since determination of g *( x) requires l to be known. Instead,the optimum tilting parameter v * of a pmf f (x, v ) reduces the

    problem to scalar case.The tilting parameter v can be estimated by minimizing

    Kullback-Leibler distance (also called cross entropy) between the two densities f ( x) and g ( x)

    CE ( f , g )=( )

    ( ) ln( )

    f x f x dx

    g x . (21)

    After reducing the problem to tilt parameter v , the cross-entropy between f ( x, v ) and the optimal distribution f ( x, v ) isdescribed by

    CE (v ,v *) := { } { }*

    ( ) ( ) ( , )ln

    ( , )

    S x S x I I f x E

    c c f x

    , (18)

    where c = { }( ) ( )S x I f x dx . The optimal solution* is

    obtained by solving * from minv CE N ( ) , where

    CE N (v ) = E v 1= 1 N Ni=1 I {S ( x(i) ) > } W ( x(i), , * ) . (22)

    For example, if all is equal to *, in that case l v ( )= f ( x*; ),which typically would be a very small number. l v ( * ) isestimated by Importance Sampling with v k =

    * = argmaxv

    N i= 1 E v H ( x ; ) ln f ( x ; ) , (23)

    and using x(i), which are generated from pmf f ( x ; ) byargmax

    v N i= 1 H ( x(i) ; ) ln f ( x(i) ; ) . (24)

    However, the estimator of * is only valid when H ( x(i) ; ) =1for enough samples.

    The CE-algorithm consist of the two phases: 1) Generaterandom samples using f (-;vk ), and calculate the estimate of theobjective function; 2) Update f (-;v k ) based on the datacollected in the first phase via the CE method. The main CEAlgorithm is

    1) Initialize k =0, and choose initial parameter vector v 0=v .2) Generate a sample of states x(1) , ..., x( M ) according to the

    pmf f (-;v k ) .3) Calculate the scores S ( x(i) ) for all i, and order them from

    the biggest to the smallest: s1, ... s N . Define k = s [ N ] ,where [ N ] is the integer part of N , so that k > , set k = ,this yields a reliable estimate by ensuring the target event is

    temporarily made less rare for the next step if the target is notreached by at least a fraction of the samples.

    4) For j=1, ..., 5, let

    v k+ 1 ,j=( )

    ( )

    ( )( )1 { ( ) }

    ( )1 { ( ) }

    ( ; , )

    ( ; , )

    ik

    ik

    M iik i S x

    M ik i S x

    I W x x

    I W x

    = >

    = >

    . (25)

    5) Increment k and repeat steps 2, to 5, until the parametervector has converged. Let * denote the final parameter vector.

    6) Generate a sample x(1) , ..., x( N ) according to the pmf f (-; v k ) and estimate l via the IS estimate

    l ^ = 1 N Ni=1 H ( x(i)) W ( x(i) ; v , * ) . (26)

    Step (6) is not necessary in optimization of the scalar cost

    functions, since the goal is to minimize the cost rather thancalculation of the probability of a rare event.

    IV. DIFFERENTIAL EVOLUTIONARY OPTIMIZATION

    For a given problem described by a parameter space X and anobjective function S ( x) defined on this space, an initial

    population of randomly selected individuals { x(i) | i=1,..., M } isconstructed and each individual is evaluated according to theobjective function. The objective function S ( x(i)) value for anindividual x(i) is referred to its fitness and it is a relativemeasure of functional performance specifying how well theindividual satisfies the optimization criteria. In GeneticAlgorithms, this population is evolved over a number ofgenerations, by a process based on three genetic operators:reproduction , crossover , and mutation . Reproduction is a

    process in which individuals are selected to a mating pooldepending on their fitness. A higher probability of generatingoffspring in the next generation is assured by assigning a highermating probability to that individual with a higher fitnessvalue. This imitates the natural selection, in which the fittesttend to survive and reproduce, thus propagate their geneticmaterial to future generations.

    Differential evolution uses a special kind of mutation as themain search mechanism where a greedy selection approach isemployed to direct the search toward the promising regions ofthe search space. The fundamental difference between GAs andDE is that GAs rely on crossover, while DE uses mutations ofthe differences of the parameter vectors as the primary searchmechanism.

    Fig. 3. The Differential Evolution template

    DE search strategies are classified in four groups [7]. Thisapplication used the rand3 search strategy, where the newindividuals are generated from the combination of mutuallydistinct three random individuals within the population.Differential mutation expands the search space by adding theweighted difference of two randomly chosen vectors to a thirdone. That is, in k th generation of an M -vector population, for agiven solution x(i, k ) ( i =1, , M , and k =1, , Gmax ), threevectors x(r 1, k ), x(r 2, k ) , x(r 3, k ) are uniformly randomly selectedfrom the population such that the indices i, r 1, r 2 and r 3 aremutually distinct. A donor vector is formed adding theweighted difference F ( x(r 2, k ) x(r 3, k ) ) to x(r 1, k )

    ( ) ( ) ( ) ( ), 1 1, 2, 3, ( )i k r k r k r k v x F x x+ = + (27)where F (0,2] is called the mutation constant. The number ofdonor vectors generated this way is equal to the population size[13]. Other variations of the mutation are also proposed inliterature [21]. Mutation is followed by a non-uniformrecombination procedure in which a trial vector z (i,k +1) isgenerated as

    1. Generate initial population2. Evaluate population3. Repeat

    Select Parents for ReproductionDifferential MutationEvaluate new populationSubstitute old population

    Until Termination Conditions are satisfied .

    Engineering Letters, 16:1, EL_16_1_05 ______________________________________________________________________________________

    (Advance online publication: 19 February 2008)

  • 8/12/2019 EL_16_1_05

    5/9

    ( , 1)( , 1)

    ( , )

    if rand or irand(1, )

    otherwise

    i k ji k

    j i k j

    x CR j N z

    x

    ++

    < ==

    (28)

    where CR is the crossover rate; rand generates a randomnumber in [0,1); irand( p,q) generates an integer number in theinterval,[ p, q]; and z j ( j{1, ... , N }) denotes the jth entry of the trial

    parameter vector z . Finally, the greedy selection proceduredetermines the ith offspring as

    ( , 1) ( , 1) ( , )( , 1)

    ( , )

    if ( ) ( )

    otherwise

    i k i k i k i k

    i k

    z S x S x x

    x

    + ++

  • 8/12/2019 EL_16_1_05

    6/9

    are based on plant identification or state estimation techniques.For example, in the following simple demonstration, thecartesian absolute path deviation ( e(t)= | p(t) pd (.)| ) is usedinstead of using the conventional trajectory tracking error(e(t)= q(t) qd (t) ) in joint space.

    VI. SIMPLE 2-LINK (2R) DEMONSTRATION

    Consider the two link manipulator with two revolute jointsshown in Fig. 1. The length of the links L1 and L2 arerespectively b1, and b2. The masses m1 and m2 arehomogenously distributed along the links L1 and L2.Coordinate frames 0T , 1T and 2T are assigned for the base, L1,and L2 by applying D-H convention as shown in Fig-1. Thesymbolic equation of motion for this planar manipulator isderived in symbolic toolbox of MATLAB using Lagrangianformulation

    1 = (13 m1b12 + 13m2b12 +m2 b1 b2 C2 +m2b12) q..

    1

    +( 13 m1b12m1b1b2 12 m1b12 C2+m1b22 +m1b1b2 C2) q..2

    m2b1b2 S2 q.1q

    .2 12 m2b1b2 S2 q

    .22

    + 12 g a (b1m1C1 + b2m2C12 +2 b 1m2C1) , (24) 2 = + (13m1b12m1b1b2 12 m1b12C2+m1b22+m1b1b2C2)q

    ..1

    + 13 m2b22 q

    ..2 + 12 m2 b1b2 S2 q

    .12 + 12 g a m2 b2C12 , (25)

    where C i, and S i, denote cos( qi) and sin( q i); C ij, and S ij, denotecos( q i + q j) and sin( qi + q j). In the simulation, the equation ofmotion is integrated for m1 = 5 kg, m2 = 3 kg, b1 = 0.5 m andb2 = 0.4 m, reducing the simulation model to

    s,1 = (199 150 +35 C2) q

    ..1 + (1360 +

    38 C2) q

    ..2

    35 S2 q.1q.2 310 S2 q

    .22 + 12 g a (112 C1 +

    65 C12)

    s,2 = (1360 +38 C2) q

    ..1 + _425 q

    ..2 + 310 S2 q

    .22 + 35 g a C12. (26)

    Fig. 2. Parameters of the simulated manipulator with 2-revolute joints.

    All joint and actuator friction forces are assumed to be zeroto prevent their stabilizing effect on the closed loop stability ofthe system. The feedforward control force c is calculated onlyfrom the inertial and gravitational terms of a similar model(c= D c(qd ) q d + g c(qd ) ) but with a higher load massm2=3.5kg.

    c,1 (q) = (887 600

    + 710

    C2) q..1+(1360

    + 38

    C2) q..2d +12 g a (6C 1+

    7

    5C12)

    c,2 (q) = (1360 +38 C2) q

    ..1 + 710 g a C12 . (27)

    The test path is selected on a line segment in cartesianworkspace starting from the point ( 0 x, 0 y) = (0.2, 0) to the point(0.7, 0.5) through the via-point (0.4, 0.2). The test trajectory is

    calculated for t = 1 ms intervals using parabolic blend withlinear segments trajectory generation method with via points[11] under the constraints: maximum linear velocity= 1m/s, andlinear acceleration 1 m/s 2. The points of the desired trajectoryare shown in Fig. 2 for every 50 ms periods.

    0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.80

    0.1

    0.2

    0.3

    0.4

    0.5

    0. 8 0 .9 1 1. 1 1 .2 1. 3 1. 4 1. 5-3

    -2.5

    -2

    -1.5

    -1

    -0.5

    (a) (b) (c)

    Fig. 3 . Plots of the desired trajectory(a) x vs. y (in meters), (b) q1 vs. q2 (in radians),(c) the movement of the links in x- y plane.

    The real-time cost function was chosen to be J (t , p(t ), pd (t )) = || p(t ), pd (t )||

    where p(t ) and pd (t ) are the actual and desired end-pointtrajectory in the cartesian space, and || a , b|| denotes thecartesian distance between the points a and b.

    The goal of the algorithm was to search smallest scoring PID parameter set K PID = ( K 1 P , K 1 I , K 1 D K 2 P , K 2 I , K 2 D ) starting witha default stable parameter set

    x s = (5000, 200, 80, 5000, 200, 80) ,without an intolerable tracking error cost.

    The cost threshold was set to J *

    = 0.002 m, and the cost scorefunction S ( w, pd (.)) was evaluated by the maximum of J (t )over one trajectory period. The population size was taken N=60; elite population ratio was taken =0.05; smooth updatecoefficients were = 0.9, = 0.9, and q s=5. In generating

    populations, the initial standard deviation of the gaussian pdfwas set to 0 = 0.5 x s. The convergence plot of cartesiantracking displacement error for Gmax=50 generations is shownin Fig. 4.

    Fig.5 demonstrates typical tracking cases observed duringthe optimization of K PID . The tracking error of the manipulatorfor with K PID = x s is seen in Fig. 5.a. The effect of switching

    K PID from an unstable setting x(12,7) = ( 3132, 268, 50, 2121,221, 98) to x s=(5000, 200, 80, 5000, 200, 80) is seen in Fig. 5.b.The trajectory plots in Fig.5.c was obtained with the best scored

    parameter vector of 50 generations, x(4,37) = (13032, 452, 136,5391, 46, 161).

    Fig. 4 . Convergence plot of the best-score for two-link manipulator.

    0 0.1 0.2 0.3 0.4 0.5 0.60

    0.1

    0.2

    0.3

    0.4

    0.5

    0.7

    y

    x

    b1

    b2

    0x

    0y

    1x

    1y

    q1

    2x2y

    End-point location, p q2 m2

    m1

    0 10 20 30 40 500.0012

    0.0014

    0.0016

    0.0018

    0.0020es score

    CE-iteration

    q1

    q2y

    x

    Engineering Letters, 16:1, EL_16_1_05 ______________________________________________________________________________________

    (Advance online publication: 19 February 2008)

  • 8/12/2019 EL_16_1_05

    7/9

    Fig. 5 a . Tracking performance of manipulator withinitial stable control parameter set w s.

    `Fig. 5 b . An unstable parameter set causes excessive cartesian error

    at t =0.42, and control is switched back to x s.

    Fig. 5 c . The best parameter set reduced the cartesian error to 50%, but bounded oscillatory action of torques were observed

    because of the high proportional gain.The CE-optimization provided almost 50% reduction ofcartesian displacement error. At the wide open configuration ofthe manipulator the optimized controller settings produce anoscillatory action of actuator torques. But the oscillatorytorques has a very minor effect on the tracking error. Thisoscillatory behavior may be prevented by assigning anadditional cost on the derivative of the controller output.

    VII. PID SETTINGS OF AN I NVERTED PENDULUM

    The inverted pendulum is a well analyzed classical control problem [1], [18]. The objective is to track a desired carttrajectory with a small deviation e x by applying an externalforce f on the cart along x-axis while keeping the pendulumstable in vertical position. The mechanical parameters of thesystem are introduced in Fig.6.

    Fig. 6 . Inverted Pendulum System.

    The equation of motion of the cart and pole inverted pendulum system is expressed by

    2

    c p p q

    p q p

    m m m bC x

    qm bC J m b

    + +

    +0

    c p q

    p

    C m b q S x

    C q

    +

    0

    0 p a q

    f m b g S

    =

    , (31)

    where C q and S q are cos( q) and sin( q), g a is the accelerationdue to gravity, mc is the mass of cart, m p is the mass of the pole,b is the half length of the pole, C c is the friction coefficient ofcart, C p is the friction of the pole, J is the mass moment inertialof the pole, and f is the force applied to the cart [18]. The polemass distributed homogeneously along the pole is J = 1/3 m pb

    2.

    Fig. 7 . Block diagram of inverted pendulum systemfor parameter optimization

    Two PID controllers were cascaded to reduce the trackingerrors e x = x xd , and eq = q qd , asymptotically to zero asshown in Fig 7. The goal of the exercise is to search the lowestcosting PID parameters K PID = [ K xP , K xI , K xD, K qP , K qI , K qD].For this cascaded PID control scheme, an oscillatory behaviorwas expected since no predictive or inverse dynamicfeedforward action employed.

    In the tests, a cart and pole system is simulated with thehalf-length of the pole b was taken 0.5m, the cart mass mc=1kg, and pole mass m p =1 kg. The pole inertia J p =16/3corresponds to the homogenously distributed mass along thelength of the pole. The desired cart trajectory was specified by asquare function switching between 0.05m and 0.05m at every

    25s, completing its period in 50s. The friction coefficients on

    mc f

    x0

    massm p

    q2b

    0 0.5 1 1.5 2 2.5-20 0

    20 40 60 error along x and joint-1 KP=13032 5391 KI=452 46 KD=136 161

    time (s)

    : (pd-ps)x10000: (qd-qs)x10000

    : Torque (N.m)

    0 0.5 1 1.5 2 2.5-40 -20

    0 20 40 error along y and joint-2

    : (pd-ps)x10000 : (qd-qs)x10000 : Torque (N.m)

    time (s)

    0 0.5 1 1.5 2 2.50

    5 10

    15 maximum deviation from the path : cart-dev x10000 (m) : switchback 0:off 1:on

    time

    0 0.5 1 1.5 2 2.5-40 -20

    0 20 40 60 error along x and joint-1 KP=5000 5000 KI=200 200 KD=80 80

    time (s)

    : (pd-ps)x10000: (qd-qs)x10000

    : Torque (N.m)

    0 0.5 1 1.5 2 2.5-40 -20

    0 20 40 error along y and joint-2

    : (pd-ps)x10000 : (qd-qs)x10000 : Torque (N.m)

    time (s)

    0 0.5 1 1.5 2 2.50 10 20 30 maximum deviation from the path

    : cart-dev x10000 (m) : switchback 0:off 1:on

    time

    100 : Torque (N.m)

    0 0.5 1 1.5 2 2.5-50

    0

    50

    error along x and joint-1 KP=3132 2121 KI=268 221 KD=50 98

    time (s)

    : (pd-ps)x10000: (qd-qs)x10000

    0 0.5 1 1.5 2 2.5-50

    0

    50 error along y and joint-2

    : (pd-ps)x10000 : (qd-qs)x10000 : Torque (N.m)

    time (s)

    0 0.5 1 1.5 2 2.50

    20

    40

    60 maximum deviation from the path : cart-dev x10000 (m): switchback 0:off 1:on

    time

    Equationof Motion

    ShaftEncoder

    x d cascaded controllers

    PID e x x, q f q d

    q f

    +

    +

    Population Based Search AlgorithmPerformanceRequirements

    e q

    K q P , K q I , K qD

    PID

    K x P , K x I , K xD

    x f

    x q

    inverted pendulum system

    .dt .dt

    Engineering Letters, 16:1, EL_16_1_05 ______________________________________________________________________________________

    (Advance online publication: 19 February 2008)

  • 8/12/2019 EL_16_1_05

    8/9

    the cart C c, and on the pole C p were taken 0.01. With thesecoefficients the free motion amplitude of the pole diminishalmost 50 percent in 50s period. The gravitational accelerationwas taken g a= 9.8 m/s 2. Equation of motion was integrated overt =5ms time steps using trapezoidal method. The initial

    parameter set x s = [ K xP , K xI , K xD, K qP , K qI , K qD]

    = [0.05 0.001 0.01 300 0.001 50 ]managed stabilization of the system marginally with largeoscillatory movements. The default parameter set was updatedafter every generation assigning the best scoring x (i,k ) to thedefault stable parameter set xs .

    The cost function was specified using the quadraticdisplacement errors e x2 = ( x xd )2, eq2=(q-q d )2, and theirintegrals:

    J (t , p(t , x(i)), pd )

    =10 e x2 + 8 eq2 + e x2dt +0.6 eq2dt . (32)In (32), the proportional gains on quadratic errors providereal-time cost action, while the integrals of quadratic errorsaccumulating the error for an end-score at the end of the cycle.At every generation k , for each member x(i, k ) of the k th

    population, the PID controller settings were set to the control parameters x(i, k ) at the beginning of the desired trajectorycycle, for 50s, and corresponding score s (i) was evaluated usingS ( x(i)) = J (t f , p(t f , x(i)), pd ) (32) at the end of the trajectorycycle. Control parameters were switched to the stable

    parameter set x s at the threshold of J (t , p(t , x), pd ) > J * = 0.1 .

    a) Adaptation by Real-time Cross Entropy Search Method

    The initial standard deviation of the gaussian pdf of the CEsearch was selected k = 0.5 x s; maximum generations and population size were selected Gmax=50 and M = 30; elite population ratio was = 0.1, smooth update coefficients wereselected to be = 0.9, = 0.9 , and q s = 5.

    Fig.8 shows the convergence-rate of real-time CE search forthree identical cases, with exactly the same initial parametersets. The best scored parameter set of all runs reduced thetracking error score more than 50 percent.

    The plot of x and q for the initial stable parameter set isshown in Fig. 9.a. A typical control parameter switching due tounstable character of the tested parameter set x(i) is given inFig. 9.b, and the initial and final performance of the cart and

    pole system is compared in Fig. 9.c.

    Fig. 8 Convergence-rate plots for three CE search runswith exactly the same initial parameters.

    Fig. 9 .a The initial stable parameter set x s is oscillatory, but it satisfies the tracking error tolerance J *.

    Fig. 9.b During the test of an unstable controller parameter setPID parameters were switched to xs at t =2.8s.

    Fig 9 .c Trajectory with the best scoring PID setting after 50 generations.On the first graph the trajectory (a) belongs to the initial PID settings x s,

    inserted to the graph to compare initial and best-scored cases.

    0 0.005 0.01

    0.015 0.02

    0.025 0.03

    0.035 0.04

    0.045 0.05

    0 10 20 30 40 50CE-iteration

    best-score

    0 5 10 15 20 25 30 35 40 45 50-0.2

    0

    0.2

    Cycle =0 PID-gain= 0.0500 0.0010 0.0100 300.0000 0.0010 50.0000

    s o l i d : x

    d o t t e d : x - r e

    f

    time (s)

    0 5 10 15 20 25 30 35 40 45 50-0.01

    0

    0.01

    s o l i d : q

    d o t t e d :

    f / 1 0 0 0

    time (s)

    0 5 10 15 20 25 30 35 40 45 50-0.1

    0

    0.1 End-score= 0.039811 Cycle= 0

    time (s)

    s o l i d : x - r e

    f

    d a s h - d

    o t : s c o r e

    d o t t e d : s w

    i t c h / 1 0

    0 5 10 15 20 25 30 35 40 45 50-0.1

    0

    0.1

    Cycle =1471 PID-gain= 0.0357 0.0003 0.0312 45.7955 0.0013 4.9168

    s o l i d : x

    d o t t e d : x - r e

    f

    time (s)

    0 5 10 15 20 25 30 35 40 45 50-0.01

    0

    0.01

    s o l i d : q

    d o t t e d :

    f / 1 0 0 0

    time (s)

    0 5 10 15 20 25 30 35 40 45 50-0.1

    0

    0.1 End-score= 0.01957 Cycle= 1471

    time (s)

    s o l i d : x - r e

    f

    d a s h - d

    o t : s c o r e

    d o t t e d : s w

    i t c h / 1 0

    a

    0 5 10 15 20 25 30 35 40 45 50-0.2

    0

    0.2

    Cycle =191 PID-gain= 0.0542 0.0012 0.0242 218.4103 0.0008 3.1737

    s o

    l i d : x

    d o

    t t e

    d : x - r e

    f

    time (s)

    0 5 10 15 20 25 30 35 40 45 50-0.1

    0

    0.1

    s o

    l i d : q

    d o

    t t e

    d :

    f / 1 0 0 0

    time (s)

    0 5 10 15 20 25 30 35 40 45 50-0.2

    0

    0.2 End-score= 0.029355 Cycle= 191 / 151

    time (s)

    s o

    l i d : x - r e

    f

    d a s

    h - d o

    t : s c o r e

    d o

    t t e

    d : s w

    i t c h / 1 0

    Engineering Letters, 16:1, EL_16_1_05 ______________________________________________________________________________________

    (Advance online publication: 19 February 2008)

  • 8/12/2019 EL_16_1_05

    9/9

    b) Adaptation by Real-Time Differential Evolution Method Similar to the CE search cases, the population size and the

    maximum number of generations for the DE search are set to M =30 and Gmax=50. The differentiation coefficient, thecrossover rate, and the mutation probability are set to F = 0.9,

    CR=0.3 and pm=0.25. The initial stable PID parameter vector x s of real-time DE search, the real-time cost function, the scorefunction, and the real time cost threshold value are all chosenthe same as used in CE case.

    Fig.10 shows the convergence-rate of real-time DE searchfor five identical cases, with exactly the same initial parametersets. Each best scored parameter produced by these runsreduced the tracking error score almost 65 percent. Asexpected, the initial parameter set K PID = x s gives a very similartrajectory with the CE case shown in Fig. 9.a. The initial andfinal performance of the cart and pole system is compared inFig. 11.

    Fig. 10 Convergence-rate plots for five DE search runswith exactly the same initial parameters.

    Fig. 11 Trajectory plots comparing the tracking error of (a) initialand the best scored PID parameters obtained by the real-time DE search.

    VIII. CONCLUSION

    This study proposes a real time evolutionary search method that provides stable operation within a specified cost threshold byswitching the unstable control parameters with a prespecifiedstable parameter set whenever a tracking error based costfunction exceeds a threshold. The proposed method has beentested successfully on an open chain manipulator dynamiccontrol, and on the control of an inverted pendulum, where

    without a feedforward or predictive control the system is stableonly in a very narrow band of PID settings.

    During the tests none of these two systems has been falleninto non controllable states. In CE method, even though someof the oscillatory controller settings collected better scores thansimilar but less oscillatory settings, the mean of the elitesalways remained preferably stable. In both 2R and inverted

    pendulum cases, over 50% reduction of the specified costfunction is achieved by population-based real-time searchmethod during the progress of its repetitive operation.

    ACKNOWLEDGMENT Special thanks go to my colleague Dr. Adnan Acan for discussing

    several issues on implementation of the proposed method. Thanks tomy graduate robotic course students: Maneli Badakshan, VassilyaAbdulova, Dilek Beyaz, Duygu elik for their efforts in reproductionof the trajectory planning algorithms; M. Reza Najiminaini, and A.A.M.Abed for their efforst in deriving the symbolic equation of motion.

    R EFERENCES [1] Kelly, R. Control of robot manipulators in joint space. (Advanced textbooks

    in control and signal processing)Control of Robot Manipulators in JointSpace), Springer-Verlag London Limited 2005

    [2] Paul, R.C., Modeling, Trajectory Calculation, and Servoing of aComputer Controlled Arm Stanford A.I. Lab, A.I. Memo 177, Stanford,CA, Nov.1972.

    [3] Paul, R.P., Robot Manipulators: Mathematics Programming, and Control,MIT Press. Cambridge 1982.

    [4] Fu, K.S., Gonzales, R.C., Lee C.S.G., Robotics Control, Sensing, Vision,and Intelligence, Mc.Graw-Hill International Ed. Singapore, 1988.

    [5] Mark W. Spong", "Motion Control of Robot Manipulators", onlinearticle: url:citeseer.ist.psu.edu/165889.html .

    [6] Bodur M., Sezer M. E., Adaptive control of flexible multilink manipulators,Int.J.Control, vol.58, no.3, pp 519-536, 1993.

    [7] V. Feoktistov, Differential Evolution In Search of Solutions Optimization and Its Applications Vol 5, Springer, USA, 2006

    [8] Chuan-Kai Lin, Reinforcement learning adaptive fuzzy controller forrobots, Fuzzy Sets and Systems, Elsevier Vol.137, pp.339-352, 2002.

    [9] Kullback S. Information theory and statistics. NY: Wiley, 1959.[10] J. Shore and R. Johnson. Properties of cross-entropy minimization. IEEE

    Trans. Information Theory, 27(4):472-482, 1981[11] P.T. de Boer, D.P. Kroese, S. Mannor, and R.Y. Rubinstein. A tutorial on

    the cross-entropy method. Annals of Operations Research, 2004[12] D.E. Goldberg, Genetic Algorithms in search, Optimization, and Machine

    Learning, Addison Wesley, Massachusetts, USA, 1989.[13] R. Storn and K. Price, Differential evolution: a simple and efficient

    adaptive scheme for global optimization over continuous spaces, Journalof Global Optimization Vol. 11 , pp. 341-359, 1997.

    [14] Kenneth V. Price. Genetic annealing. Dr. Dobbs Journal, pages 127132,October 1994.

    [15] Bodur, M.; Acan, A.; Akyol, T. Fuzzy System Modeling with the Geneticand Differential Evolutionary Optimization, International Conference on

    Computational Intelligence for Modelling, Control and Automation,Volume 1, pp. 432 - 438, 28-30 Nov. 2005

    [16] Uicker, J. J., Denavit, J. , Hartenberg, R. S., An iterative method for thedisplacement analysis of spatial mechanisms. Journal of AppliedMechanics. 26. 309-314, 1964.

    [17] S. B. Niku, Introduction to Robotics, Analysis, Systems, Applications.Prentice Hall Inc. pp 153-165, 2001.

    [18] Shozo Mori, et al., Control of unstable mechanical system, control of pendulum, Internat. J. Control 23 (5) pp. 673692, 1976

    [19] Rubinstein, R. Y. and Kroese, D.P. (2004). The Cross-Entropy Method:AUnified Approach to Combinatorial Optimization, Monte-CarloSimulation and Machine Learning. Springer-Verlag, New York.

    [20] D.P. Kroese and K.-P. Hui: Applications of the Cross-Entropy Method inReliability, Computational Intelligence in Reliability Engineering (SCI)40, 3782 Springer-Verlag Berlin Heidelberg (2007).

    [21] H.Y. Fan, J. Lampien, A trigonometric mutation operation to differential

    evolution, J. of Global Opt. , Vol. 27, pp. 105-129, 2003.

    00.0050.01

    0.0150.02

    0.0250.03

    0.0350.04

    0.0450.05

    0 10 20 30 40 50Generations

    best-score

    0 5 10 15 20 25 30 35 40 45 50-0.1

    0

    0.1

    Cycle =1436 PID-gain= 0.0562 0.0011 0.0583 103.2069 0.0009 6.2266

    s o

    l i d

    : x

    d o

    t t e d : x - r e

    f

    time (s)

    0 5 10 15 20 25 30 35 40 45 50-0.02

    0

    0.02

    s o

    l i d : q

    d o

    t t e

    d :

    f / 1 0 0 0

    time (s)

    0 5 10 15 20 25 30 35 40 45 50-0.1

    0

    0.1 End-score= 0.01316 Cycle= 1436

    time (s)

    s o

    l i d : x - r e

    f

    d a s

    h - d o

    t : s c o r e

    d

    o t t e

    d : s w

    i t c

    h / 1 0

    a

    Engineering Letters, 16:1, EL_16_1_05 ______________________________________________________________________________________

    (Advance online publication: 19 February 2008)