We developed a reinforcement learning-based algorithm that maximizes project value (or benefit) subject to due date and budget constraints in multi-mode projects. In addition to cost and duration data, each activity mode may be associated with one or more value attributes, thus integrating project scope and product scope. Activity durations are stochastic, and we seek solutions that satisfy given on-time and on-budget probabilities. Selecting a mode for each activity determines the value of the project, and stability is achieved by inserting time buffers according to the selected modes. In our experiments, our algorithm outperformed a previously published genetic algorithm.
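
To make the chance-constrained formulation concrete, the following minimal sketch scores one candidate mode selection by Monte Carlo simulation. It is not the reinforcement learning algorithm itself, and several simplifications are assumptions made only for illustration: activities form a serial chain, project value is the sum of the selected modes' values, cost is a fixed amount plus a rate times the realized duration, and durations follow a triangular distribution around the mean. The names `Mode`, `sample_duration`, `evaluate_selection`, and `is_feasible` are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    mean_duration: float  # expected duration of the activity in this mode
    fixed_cost: float     # cost incurred regardless of duration (assumed cost model)
    cost_rate: float      # cost per unit of realized duration (assumed cost model)
    value: float          # value contributed when this mode is chosen


def sample_duration(mode, rng):
    # Assumed distribution: triangular on [0.8, 1.5] x mean, peaked at the mean.
    m = mode.mean_duration
    return rng.triangular(0.8 * m, 1.5 * m, m)


def evaluate_selection(selection, due_date, budget, n_samples=5000, seed=0):
    """Return (value, P(on time), P(on budget)) for one chosen mode per activity."""
    rng = random.Random(seed)
    on_time = 0
    on_budget = 0
    for _ in range(n_samples):
        durations = [sample_duration(mode, rng) for mode in selection]
        # Assumption: serial chain, so the finish time is the sum of durations.
        finish = sum(durations)
        cost = sum(m.fixed_cost + m.cost_rate * d
                   for m, d in zip(selection, durations))
        on_time += finish <= due_date
        on_budget += cost <= budget
    # Assumption: project value is additive over the selected modes.
    value = sum(m.value for m in selection)
    return value, on_time / n_samples, on_budget / n_samples


def is_feasible(selection, due_date, budget, p_time, p_budget):
    """Check the estimated on-time and on-budget probabilities against targets."""
    _, pt, pb = evaluate_selection(selection, due_date, budget)
    return pt >= p_time and pb >= p_budget
```

A search procedure, whether reinforcement learning or a genetic algorithm, could call `evaluate_selection` on each candidate and keep the highest-value selection for which `is_feasible` holds. In this simplified setting, a time buffer would amount to the gap between the expected finish time and the due date that is needed to reach the required on-time probability.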