Production systems in Industry 4.0 are characterized by a high degree of networking and adaptability. They often include jointed-arm robots, which are themselves highly adaptable. Networking and adaptability increase the flexibility of a system but also the complexity of its control, which calls for new development methods. In this context, the Simulation-Based Control approach, a model-based design method, and the concept of Reinforcement Learning (RL) are introduced, and it is shown how a task-based robot control can be learned and executed. The time complexity of the Q-learning method is then examined using a robot-based assembly cell with two system configurations of differing flexibility as the application example. It is shown that, depending on the system configuration, the time complexity of learning can be reduced significantly by using several agents. In the case studied, the complexity decreases from exponential to linear. The modified RL structure is discussed in detail.
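
The move from exponential to linear growth can be illustrated with a rough count of Q-table entries. The following is a minimal sketch under stated assumptions, not the configuration studied here: it assumes a hypothetical cell with independent subtasks that are each either open or done, and it uses table size only as a proxy for learning effort. The function names `joint_q_table_entries` and `per_agent_q_table_entries` are illustrative.

```python
def joint_q_table_entries(n_subtasks: int) -> int:
    """Q-table entries for a single agent that tracks all subtasks jointly.

    Assumption: each subtask is either open or done, giving
    2**n_subtasks states, and the agent may choose any subtask as its action.
    """
    n_states = 2 ** n_subtasks
    n_actions = n_subtasks
    return n_states * n_actions


def per_agent_q_table_entries(n_subtasks: int) -> int:
    """Total Q-table entries when every subtask is handled by its own agent.

    Assumption: the subtasks are independent, so each agent sees only
    2 states (open, done) and 2 actions (wait, execute).
    """
    entries_per_agent = 2 * 2
    return n_subtasks * entries_per_agent


if __name__ == "__main__":
    # Compare how both counts grow with the number of subtasks.
    for n in (2, 4, 8, 16):
        print(n, joint_q_table_entries(n), per_agent_q_table_entries(n))
```

Under these assumptions the single-agent table grows with 2^n, while the combined size of the per-agent tables grows with n. Whether such a decomposition is possible depends on how independent the subtasks are in a given system configuration.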