||Several critical issues for reinforcement learning (RL) have been identified from our SDC work. First, the state-action space is often huge. Second, how to achieve distributed RL, with fragmented SDC as a special form, is an open issue. Third, SDC is a system of subsystems, each controlled by an RL agent. The overall system is controlled by a global agent, which interacts with the subsystem agents. Efficient hierarchical RL design remains an open problem. First, this paper tackles the space explosion issue by observing that the state-transition probability matrix for the underlying Markov Decision Process (MDP) is often sparse, which can thus be decomposed into submatrices. Each submatrix corresponds to an independent RL problem (subsystem), for which a neural network (NN) is used to approximate the Q-value function. These NNs for the subsystems are merged as inputs to another NN at the next stage to capture interactions among the subsystems. Second, using the same architecture, Q-values can be merged from different sources to achieve distributed RL. Similarly, the RL agents control different subsystems and the second NN at the next stage can be used to serve the global RL functions. Stratified training can be performed first for the individual subsystems and then for the global RL agent.