WBLC: Whole Body Locomotion Controller and Reinforcement Learning-Based Locomotion Planning

Donghyun Kim, Jaemin Lee, and Luis Sentis
University of Texas at Austin, Austin, USA
dk6587@utexas.edu, jmlee87@utexas.edu, lsentis@austin.utexas.edu
1 Introduction

Agile robot locomotion is challenging because of multi-body dynamics, constraints on reaction forces and kinematics, foot placement selection, and other issues. Although these constraints complicate the control framework, machine learning techniques have the potential to improve the agility of robot walking. As a first step toward a model-based controller integrated with an intelligent locomotion planner, we propose the Whole Body Locomotion Controller (WBLC) and the Reinforcement Learning based Phase Space Planner (RL-PSP).

WBLC is a locomotion-oriented version of the Whole Body Operational Space Control (WBOSC) in [1]. Hierarchy-based Whole Body Control (WBC) has been widely used to enable humanoid robots to execute various complex motion tasks. For instance, WBCs have covered sequentially scheduled motions, joint limit avoidance [2], obstacle avoidance, and switching among multiple tasks [3]. In this work, we concentrate on building an efficient strategy for agile locomotion tasks using WBCs. We define proper locomotion task specifications with hierarchy and treat the inequality constraints of bipedal robots.

Second, we propose RL-PSP, a foot placement planner that addresses the kinematic limits of the legs, the swing speed of the foot, and uncertainty in the center of mass (CoM). To enhance learning performance, we exploit the Phase Space Planning framework [4] in the RL formulation. We verify the proposed approach in simulation using the human-size full humanoid robot Valkyrie.

2 Whole Body Locomotion Controller (WBLC)

WBLC consists of two parts: defining acceleration-based task specifications with hierarchy, and then obtaining the torque command by solving a reaction force optimization problem and the dynamics equation of the robot.

2.1 Acceleration-based Formula with Hierarchy

The joint acceleration for the desired task acceleration $\ddot{x}_1^d$ can be defined as

$$\ddot{q}_1 = \bar{J}_1 \left( \ddot{x}_1^d - \dot{J}_1 \dot{q} \right) = \bar{J}_1 \ddot{e}_1^d \tag{1}$$

where $x$ and $q$ denote the task and joint variables and $J$ is the corresponding Jacobian. $\bar{J}_1 = A^{-1} J_1^\top \left( J_1 A^{-1} J_1^\top \right)^{-1}$ denotes the dynamically consistent inverse of $J_1$, and $A$ is the mass/inertia matrix. The joint acceleration for $n$ tasks can be computed as follows:

$$\ddot{q}^{[task]} = \bar{J}_1 \ddot{e}_1^d + \sum_{k=2}^{n} \ddot{q}_k, \quad (n \geq 2) \tag{2}$$

and

$$\ddot{q}_k = \bar{J}_{k|k-1} \Big( \ddot{e}_k^d - J_k \sum_{n=1}^{k-1} \ddot{q}_n \Big), \qquad J_{k|k-1} = J_k N_{k-1}, \qquad N_k = \prod_{s=1}^{k} N_{s|s-1} \quad (k \geq 2,\; N_1 = N_{1|0}). \tag{3}$$

Using this generalized formulation, we can define task specifications at the acceleration level without the Coriolis/centrifugal and gravity terms of the dynamics equations.

2.2 Reaction Force Optimization

Based on the task specification defined in the previous section, the operational space framework for whole-body control can be designed with respect to the desired joint acceleration from equation (2) as follows:

$$A \left( \ddot{q}^{[task]} + N_n \ddot{q}_{res} \right) + b + g + J_r^\top F_r = U^\top \tau \tag{4}$$

where $b$ and $g$ are the Coriolis/centrifugal and gravity terms, respectively. $F_r$ and $J_r$ denote the reaction force and the corresponding Jacobian. $\tau$ and $U$ represent the torque and the selection matrix for the underactuated floating base, and $\ddot{q}_{res}$ is the residual joint acceleration for reaction force control.

To find the reaction forces ($F_r$), we need to specify the command for the Centroidal Momentum (CM) task. Although its linear part is easily defined by the desired CoM motion, which ordinarily comes from locomotion planners, the Centroidal Angular Momentum (CAM) command is not obvious to define. Existing WBCs set zero centroidal angular velocity as the command. However, minimizing the angular momentum is not always desirable, since the desired CAM is not zero when a robot rotates its links with the help of the reaction forces. Since different levels of hierarchy can be allocated to the primary tasks and the centroidal angular momentum task, the command torque can minimize centroidal angular momentum as long as it does not interfere with other tasks of higher priority.
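The prioritized recursion of Eqs. (1)-(3) can be illustrated with a minimal NumPy sketch, which may help show why a lower-priority task cannot disturb a higher-priority one. This is a sketch under stated assumptions, not the authors' implementation: the function names, the random stand-in matrices, the projector definition $N_{k|k-1} = I - \bar{J}_{k|k-1} J_{k|k-1}$, and the product order $N_k = N_{k-1} N_{k|k-1}$ are all illustrative choices, and every projected Jacobian is assumed to have full row rank.

```python
import numpy as np


def dyn_consistent_inverse(J, A_inv):
    """J_bar = A^{-1} J^T (J A^{-1} J^T)^{-1}; assumes J has full row rank."""
    return A_inv @ J.T @ np.linalg.inv(J @ A_inv @ J.T)


def prioritized_accel(A, jacobians, e_ddots):
    """Joint acceleration for tasks ordered from highest to lowest priority.

    e_ddots[k] plays the role of the acceleration command of task k,
    i.e. the desired task acceleration minus Jdot * qdot.
    """
    n = A.shape[0]
    A_inv = np.linalg.inv(A)
    N = np.eye(n)       # null-space projector of the tasks processed so far
    qdd = np.zeros(n)   # running sum of the per-task contributions
    for J, e_dd in zip(jacobians, e_ddots):
        J_proj = J @ N                                 # J_{k|k-1} = J_k N_{k-1}
        J_bar = dyn_consistent_inverse(J_proj, A_inv)
        # For the first task qdd is zero and N is identity, so this is Eq. (1);
        # for later tasks it adds the correction term of Eq. (3).
        qdd = qdd + J_bar @ (e_dd - J @ qdd)
        # Assumed projector update: N_k = N_{k-1} (I - J_bar J_{k|k-1}).
        N = N @ (np.eye(n) - J_bar @ J_proj)
    return qdd, N


# Hypothetical 6-DoF example with random stand-in matrices.
rng = np.random.default_rng(0)
n_dof = 6
M = rng.standard_normal((n_dof, n_dof))
A = M @ M.T + n_dof * np.eye(n_dof)       # symmetric positive definite "inertia"
J1 = rng.standard_normal((2, n_dof))      # higher-priority task Jacobian
J2 = rng.standard_normal((3, n_dof))      # lower-priority task Jacobian
e1 = np.array([1.0, -0.5])
e2 = np.array([0.2, 0.0, -1.0])

qdd, N = prioritized_accel(A, [J1, J2], [e1, e2])

# Any residual acceleration projected through N leaves both tasks untouched.
residual = N @ rng.standard_normal(n_dof)
print("task 1 met:", np.allclose(J1 @ qdd, e1))
print("task 2 met:", np.allclose(J2 @ qdd, e2))
print("residual invisible to tasks:",
      np.allclose(J1 @ residual, 0.0) and np.allclose(J2 @ residual, 0.0))
```

The last check corresponds to the role of the projected residual acceleration in Eq. (4): whatever acceleration is added through the final null-space projector is invisible to every task in the hierarchy.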
Using the equation of motion (4), we can formulate the optimization problem for regulating the linear and angular parts of the CM as follows:

$$\begin{aligned} \min_{F_r} \quad & F_r^\top Q F_r + \left\| F_{cm,ang}^d - W_{ang} F_r \right\|^2 \\ \text{subject to} \quad & \mu |F_{r,z}| \geq |F_{r,x}|, \quad \mu |F_{r,z}| \geq |F_{r,y}|, \\ & F_{cm,lin}^d - W_{lin} F_r = 0 \end{aligned} \tag{5}$$

where $F_{cm,lin}^d$ and $F_{cm,ang}^d$ are the desired linear and angular parts of the centroidal momentum. After obtaining the command for the centroidal angular momentum task, the torque command and the residual joint acceleration are computed as

$$\begin{bmatrix} \tau \\ \ddot{q}_{res} \end{bmatrix} = \begin{bmatrix} U^\top & -A N_n \end{bmatrix}^{+} \left( A \ddot{q}^{[task]} + b + g + J_r^\top F_r \right) \tag{6}$$

where $(\cdot)^{+}$ denotes the pseudoinverse of $(\cdot)$. Finally, the resulting torque command can control locomotion-related tasks, such as the body orientation, with hierarchy, as shown in Figure 1.

Figure 1: Body Orientation Test. (a) CAM task is above the joint position task. (b) CAM task is below the joint position task.

3 Reinforcement Learning based Phase Space Planning

The main challenge in applying Reinforcement Learning (RL) to locomotion is that the high number of degrees of freedom of legged systems significantly increases the complexity of the RL problem. However, a method that replicates a simplified model's dynamics in a high-dimensional system allows us to use a simplified model such as the Linear Inverted Pendulum (LIP) model. The actions that the agent needs to provide are then just a foot placement $(x_p, y_p)$ and the time to commit to the location, which is equivalent to the time to switch the stance leg ($t_{switch}$).

Even if we tackle the dimensionality problem through the simplified model and WBLC, addressing the various constraints that arise in locomotion, such as forward walking and cyclic lateral motion, is challenging. The common and naive way is to set terminal conditions based on whether the motion satisfies the constraints. For instance, whenever the CoM moves backward, the learner terminates the update and restarts the learning process from an initial state. However, adding many terminal conditions reduces the efficiency of learning.

Instead of adding constraints to the terminal condition, we utilize Phase Space Planning (PSP) in the RL formulation. In this formulation, the action is $[x_p, v_{apex}]^\top$, and $y_p$ and $t_{switch}$ are computed by the planner. Moreover, the analytic process to find $[y_p, t_{switch}]^\top$ speeds up the learning process [4]. The robustness to disturbances in the CoM allows a robot to turn with a small compensation of the forward speed (Fig. 2).

Figure 2: Turning Walking. Valkyrie makes a constant turning motion with the learned policy $(x_p, v_{apex})$.

4 Future Work

We are preparing a journal paper that explains the details of WBLC and RL-PSP with extensive verification and proofs. The paper also includes a method to obtain the time derivative of the Jacobian times the joint velocity, a term often ignored in implementations of WBC.

References

[1] D. Kim, Y. Zhao, G. Thomas, B. Fernandez, and L. Sentis, "Stabilizing series-elastic point-foot bipeds using whole-body operational space control," IEEE Transactions on Robotics, 32(6):1362-1379, 2016.

[2] N. Mansard, O. Khatib, and A. Kheddar, "A unified approach to integrate unilateral constraints in the stack of tasks," IEEE Transactions on Robotics, 25(3):670-685, 2009.

[3] J. Lee, N. Mansard, and J. Park, "Intermediate desired value approach for task transition of robots in kinematic control," IEEE Transactions on Robotics, 28(6):1260-1277, 2012.

[4] J. Ahn, O. Campbell, D. Kim, and L. Sentis, "Continuous cyclic stepping on 3D point-foot biped robots via constant time to velocity reversal," submitted to Robotics: Science and Systems, 2017.