Classes
- MdpFactory ⇐ EnvironmentFactory
  Class for constructing an Environment implemented as a ReduxMDP.
- ReduxMDP ⇐ Environment
  Class representing an Environment as an MDP using Redux.
Typedefs
- State : *
  The underlying state representation of the environment. Should be a serializable object, i.e. state => JSON.parse(JSON.stringify(state)) should be the identity function.
- MdpAction : *
  An object representing an action in an MDP. The type is specific to the MDP.
- Observation : *
  An object representing the observation of an agent in the current state. The type is specific to the MDP.
- ReduxAction : Object
  A Redux action, e.g. a Flux Standard Action: https://github.com/redux-utilities/flux-standard-action. Your MdpAction will be converted into a ReduxAction by resolveAction.
- reducer ⇒ State
  A Redux reducer. Computes the next state without mutating the previous state object.
- getObservation ⇒ Observation
  A function to get the observation of the agent given the current state.
- computeReward ⇒ number
  A function to compute the reward given a state transition, i.e. (s, a, s'). This function should be completely deterministic; any non-determinism should be handled by resolveAction.
- isTerminated ⇒ boolean
  A function to compute whether the environment is terminated, i.e. the current episode is over.
- resolveAction ⇒ ReduxAction
  A function to resolve an MdpAction into a ReduxAction. Any non-determinism in your environment should go here, as your Redux reducer should be completely deterministic.
MdpFactory ⇐ EnvironmentFactory
Class for constructing an Environment implemented as a ReduxMDP.
Kind: global class
Extends: EnvironmentFactory
new MdpFactory(params)
Create a factory for a particular MDP
Param | Type | Default | Description |
---|---|---|---|
params | object | | Parameters for constructing the MDP |
params.reducer | Reducer | | Redux reducer representing the state of the MDP |
params.getObservation | getObservation | | Compute the current observation |
params.computeReward | computeReward | | Compute the current reward |
params.isTerminated | isTerminated | | Compute whether the environment is terminated |
[params.resolveAction] | resolveAction | | Resolve the MdpAction into a ReduxAction |
[params.gamma] | number | 1 | Reward discounting factor for the MDP |
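As a rough illustration of the parameters in the table above, the following sketch assembles a params object for a hypothetical one-dimensional corridor MDP. The state shape, the "MOVE" action type, the "left"/"right" MdpAction values, and the reward numbers are all assumptions made for this example. The constructor call is left as a comment because the import path for MdpFactory is not stated here.

```javascript
// Hypothetical corridor MDP: the agent walks along a line toward a goal cell.
const initialState = { position: 0, goal: 3 };

const params = {
  // Pure reducer: returns a new state object instead of mutating the old one.
  reducer: (state = initialState, action) =>
    action.type === "MOVE"
      ? { ...state, position: state.position + action.payload.delta }
      : state,
  // The agent only observes its distance from the goal.
  getObservation: (state) => ({
    distanceToGoal: Math.abs(state.goal - state.position),
  }),
  // Step cost of -1, reward of +10 on reaching the goal.
  computeReward: (state, action, nextState) =>
    nextState.position === nextState.goal ? 10 : -1,
  // The episode ends at the goal or after 20 timesteps (finite horizon).
  isTerminated: (state, action, nextState, time) =>
    nextState.position === nextState.goal || time >= 20,
  // Optional: map the MdpAction ("left" or "right") to a Redux action.
  resolveAction: (state, action) => ({
    type: "MOVE",
    payload: { delta: action === "right" ? 1 : -1 },
  }),
  // Optional: discount factor; the documented default of 1 means no discounting.
  gamma: 0.95,
};

// Assumed usage, with MdpFactory imported from this library:
// const mdpFactory = new MdpFactory(params);
// const environment = mdpFactory.createEnvironment();
```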
mdpFactory.createEnvironment() ⇒ ReduxMDP
Create an instance of the environment.
Kind: instance method of MdpFactory
mdpFactory.setMdpMiddleware(middleware)
Configure any MdpMiddleware that should be part of the next invocation of createEnvironment()
Kind: instance method of MdpFactory
Param | Type |
---|---|
middleware | function |
mdpFactory.setReduxMiddleware(middleware)
Configure any ReduxMiddleware that should be part of the next invocation of createEnvironment()
Kind: instance method of MdpFactory
Param | Type |
---|---|
middleware | function |
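For reference, a Redux middleware is a curried function of the form store => next => action. The sketch below shows a small middleware that records the type of every dispatched action. The names createRecorder and actionLog are hypothetical, and passing a single middleware function to setReduxMiddleware is an assumption based only on the parameter type listed above.

```javascript
// Standard Redux middleware shape: store => next => action => result.
// `createRecorder` and `actionLog` are hypothetical names for this sketch.
function createRecorder(actionLog) {
  return (store) => (next) => (action) => {
    actionLog.push(action.type); // record the action before the reducer runs
    return next(action); // pass the action down the chain unchanged
  };
}

const actionLog = [];
const recorder = createRecorder(actionLog);

// Assumed usage: takes effect on the next createEnvironment() call.
// mdpFactory.setReduxMiddleware(recorder);
```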
ReduxMDP ⇐ Environment
Class representing an Environment as an MDP using Redux.
Kind: global class
Extends: Environment
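The user-supplied functions described in this reference suggest how a single environment step might fit together. The sketch below is one plausible composition, not the actual ReduxMDP implementation: the step and rollout helpers, the order of the calls, the timestep starting at 0, and the return shapes are all assumptions.

```javascript
// Hypothetical single step built from the five user-supplied functions.
function step(fns, state, mdpAction, time) {
  const reduxAction = fns.resolveAction(state, mdpAction); // non-determinism lives here
  const nextState = fns.reducer(state, reduxAction); // deterministic transition
  return {
    nextState,
    observation: fns.getObservation(nextState),
    reward: fns.computeReward(state, reduxAction, nextState),
    terminated: fns.isTerminated(state, reduxAction, nextState, time),
  };
}

// Toy counter MDP used only to exercise the sketch.
const counterFns = {
  resolveAction: (state, action) => ({ type: "ADD", payload: action }),
  reducer: (state, action) => ({ count: state.count + action.payload }),
  getObservation: (state) => state.count,
  computeReward: (state, action, nextState) => nextState.count - state.count,
  isTerminated: (state, action, nextState) => nextState.count >= 3,
};

// Roll out one episode and accumulate the return discounted by gamma.
function rollout(fns, initialState, policy, gamma, maxSteps = 1000) {
  let state = initialState;
  let discountedReturn = 0;
  let discount = 1;
  for (let time = 0; time < maxSteps; time += 1) {
    const result = step(fns, state, policy(state), time);
    discountedReturn += discount * result.reward;
    discount *= gamma;
    state = result.nextState;
    if (result.terminated) break;
  }
  return { finalState: state, discountedReturn };
}

// Always add 1; with gamma = 0.5 the rewards 1, 1, 1 are weighted 1, 0.5, 0.25.
const episode = rollout(counterFns, { count: 0 }, () => 1, 0.5);
```

Keeping the random choice inside resolveAction means the reducer, computeReward, and isTerminated calls can be replayed from a recorded sequence of Redux actions.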
State : *
The underlying state representation of the environment. Should be a serializable object, i.e. state => JSON.parse(JSON.stringify(state)) should be the identity function.
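The serializability requirement can be checked with a JSON round trip. The following is a minimal sketch for Node.js; isSerializableState is a hypothetical helper and the example states are made up for illustration.

```javascript
const { isDeepStrictEqual } = require("util");

// Hypothetical helper: true when the state survives a JSON round trip unchanged.
function isSerializableState(state) {
  return isDeepStrictEqual(state, JSON.parse(JSON.stringify(state)));
}

// Plain objects, arrays, numbers, strings, and booleans round-trip cleanly.
const plainState = { position: 0, visited: [0], done: false };

// A Date becomes an ISO string after the round trip, so this state fails the check.
const stateWithDate = { startedAt: new Date(0) };
```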
MdpAction : *
An object representing an action in an MDP. The type is specific to the MDP.
Observation : *
An object representing the observation of an agent in the current state. The type is specific to the MDP.
ReduxAction : Object
A Redux action, e.g. a Flux Standard Action: https://github.com/redux-utilities/flux-standard-action. Your MdpAction will be converted into a ReduxAction by resolveAction.
Kind: global typedef
Properties
Name | Type | Description |
---|---|---|
type | string | Each action must have a type associated with it. |
[payload] | * | Any data associated with the action goes here. |
[error] | boolean | Should be true if and only if the action represents an error. |
[meta] | * | Any data that is not explicitly part of the payload. |
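To make the property table concrete, here is a small sketch of actions that fit this shape, plus a hand-written check. The looksLikeReduxAction helper and the "MOVE" action type are hypothetical and exist only for this example.

```javascript
const ALLOWED_KEYS = ["type", "payload", "error", "meta"];

// Hypothetical check: a string `type` and no keys beyond the four listed above.
function looksLikeReduxAction(action) {
  return (
    typeof action === "object" &&
    action !== null &&
    typeof action.type === "string" &&
    Object.keys(action).every((key) => ALLOWED_KEYS.includes(key))
  );
}

// A normal action: the data lives in `payload`.
const moveAction = { type: "MOVE", payload: { delta: 1 } };

// An error action: `error` is true and `payload` carries the Error object.
const failedMove = { type: "MOVE", payload: new Error("blocked"), error: true };
```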
reducer ⇒ State
A Redux reducer. Computes the next state without mutating the previous state object.
Kind: global typedef
Returns: State - The new state object after the action is applied
Param | Type | Description |
---|---|---|
state | State | The current state of the MDP |
action | ReduxAction | The resolved action for the MDP |
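A minimal reducer sketch for a hypothetical corridor MDP follows. The state shape ({ position, goal }) and the "MOVE" action type are assumptions for illustration. The point is that the reducer returns a fresh object and leaves the previous state untouched.

```javascript
// Hypothetical initial state: an agent at cell 0 with the goal at cell 3.
const initialState = { position: 0, goal: 3 };

function reducer(state = initialState, action) {
  switch (action.type) {
    case "MOVE":
      // Copy the old state and overwrite only the field that changed.
      return { ...state, position: state.position + action.payload.delta };
    default:
      // Unknown actions leave the state as-is (same object reference).
      return state;
  }
}
```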
getObservation ⇒ Observation
A function to get the observation of the agent given the current state.
Kind: global typedef
Returns: Observation - The observation for the current state
Param | Type | Description |
---|---|---|
state | State | The current state of the MDP |
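As an illustration, an observation function might expose only part of the state to the agent. The sketch below assumes a hypothetical corridor state of the form { position, goal } and reveals only the distance to the goal.

```javascript
// Hypothetical partial observation: the agent sees its distance to the goal,
// not the absolute positions stored in the state.
function getObservation(state) {
  return { distanceToGoal: Math.abs(state.goal - state.position) };
}
```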
computeReward ⇒ number
A function to compute the reward given a state transition, i.e. (s, a, s'). This function should be completely deterministic; any non-determinism should be handled by resolveAction.
Kind: global typedef
Returns: number - The reward for the given state transition.
Param | Type | Description |
---|---|---|
state | State | The current state of the MDP |
action | ReduxAction | The action taken in the current state |
nextState | State | The next state of the MDP |
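A deterministic reward function over the transition (s, a, s') could look like the sketch below. The state shape and the reward constants are hypothetical; the same inputs always produce the same output, as required.

```javascript
// Hypothetical reward values for a corridor MDP.
const GOAL_REWARD = 10;
const STEP_COST = -1;

// Depends only on its arguments: no randomness, no clock, no external state.
function computeReward(state, action, nextState) {
  return nextState.position === nextState.goal ? GOAL_REWARD : STEP_COST;
}
```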
isTerminated ⇒ boolean
A function to compute whether the environment is terminated, i.e. the current episode is over.
Kind: global typedef
Returns: boolean - True if the environment is terminated, false otherwise.
Param | Type | Description |
---|---|---|
state | State | The current state of the MDP |
action | ReduxAction | The action taken in the current state |
nextState | State | The next state of the MDP |
time | number | The current timestep of the MDP, useful for finite-horizon MDPs |
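The time parameter makes finite-horizon termination straightforward. The following sketch ends the episode either when a hypothetical goal is reached or when a step limit is hit; MAX_STEPS and the state shape are assumptions, and whether timesteps start at 0 or 1 is not specified here.

```javascript
// Hypothetical horizon for a finite-horizon MDP.
const MAX_STEPS = 20;

function isTerminated(state, action, nextState, time) {
  const reachedGoal = nextState.position === nextState.goal;
  const outOfTime = time >= MAX_STEPS;
  return reachedGoal || outOfTime;
}
```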
resolveAction ⇒ ReduxAction
A function to resolve an MdpAction into a ReduxAction. Any non-determinism in your environment should go here, as your Redux reducer should be completely deterministic.
Kind: global typedef
Returns: ReduxAction - The resolved Redux action to pass to the reducer
Param | Type | Description |
---|---|---|
state | State | The current state of the MDP |
action | MdpAction | The MDP action to resolve |
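To show where randomness can live, the sketch below resolves a hypothetical "left"/"right" MdpAction into a "MOVE" Redux action and occasionally slips in the opposite direction. The makeResolveAction factory, the slip probability, and the injectable random source are all assumptions; injecting the random source simply makes the behaviour easy to test.

```javascript
// Hypothetical factory: returns a resolveAction with the (state, action) signature.
function makeResolveAction(random = Math.random, slipProbability = 0.1) {
  return function resolveAction(state, action) {
    const intended = action === "right" ? 1 : -1;
    // With probability slipProbability the agent moves the opposite way.
    const delta = random() < slipProbability ? -intended : intended;
    // The outcome is baked into the Redux action, so the reducer stays deterministic.
    return { type: "MOVE", payload: { delta } };
  };
}

const resolveAction = makeResolveAction();
```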