EPSRC Reference: |
GR/J82386/01 |
Title: |
STOCHASTIC REINFORCEMENT ALGORITHMS FOR LEARNING CONTINUOUS FUNCTIONS USING PROBABILISTIC RAM NETS |
Principal Investigator: |
Gorse, Dr D |
Other Investigators: |
|
Researcher Co-Investigators: |
|
Project Partners: |
|
Department: |
Computer Science |
Organisation: |
UCL |
Scheme: |
Standard Research (Pre-FEC) |
Starts: |
18 April 1994 |
Ends: |
17 April 1996 |
Value (£): |
98,312
|
EPSRC Research Topic Classifications: |
|
EPSRC Industrial Sector Classifications: |
|
Related Grants: |
|
Panel History: |
|
Summary on Grant Application Form |
1. To develop a form of pulse-driven stochastic reinforcement training which is able to learn real-valued (continuous) functions and which is suitable for hardware implementation using probabilistic RAM (pRAM) technology.
2. To demonstrate the effectiveness of the algorithm in a variety of application areas, including pattern recognition, time series prediction and real-time control.
Progress:
1.(a) We have developed a modified form of error-dependent reward and penalty function which gives faster convergence and lower final error levels.
1.(b) We have extended the output transform described in the project proposal to include an adaptable threshold as well as an adaptable gain, giving the learning system greater flexibility.
1.(c) We have explored the use of an alternative output transform module (also with adaptable gain and threshold) which has the advantage of a simpler hardware realisation.
2.(a) We have used the system to classify high-dimensional real-world pattern data. This has involved exploring the use of pyramidal architectures and multiple sampling of data points in order to ensure adequate generalisation.
2.(b) We have used the system in a benchmark time series prediction problem, the well-known sunspot numbers prediction task. The pRAM system predicted more accurately than a conventional neural network system of comparable complexity.
2.(c) We have extended the learning techniques to situations of delayed reinforcement, using traces to record past experiences and actions, and have applied the new techniques successfully to the classic pole-balancing (inverted pendulum) control problem.
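As an illustration of the kind of stochastic reward-penalty training referred to above, the following is a minimal sketch of a single pRAM memory update. It assumes a commonly cited form of the pRAM reinforcement rule, in which only the addressed memory location is adjusted according to the emitted output bit and binary reward and penalty signals. The function names, the learning rate rho and the penalty weighting lam are hypothetical choices for illustration only. The modified error-dependent reward and penalty functions developed in the project are not specified in this record and are not reproduced here.

```python
import random


def pram_output(alpha, address, rng):
    """Emit one stochastic output bit: 1 with probability alpha[address]."""
    return 1 if rng.random() < alpha[address] else 0


def reward_penalty_update(alpha, address, a, r, p, rho=0.1, lam=0.05):
    """Adjust only the addressed memory location (assumed rule form).

    delta = rho * ((a - alpha_u) * r + lam * ((1 - a) - alpha_u) * p)

    a is the bit just emitted, r and p are the reward and penalty signals,
    rho is a learning rate and lam scales the penalty term (both hypothetical).
    """
    old = alpha[address]
    delta = rho * ((a - old) * r + lam * ((1 - a) - old) * p)
    # Keep the stored value a valid firing probability.
    alpha[address] = min(1.0, max(0.0, old + delta))
    return alpha[address]


if __name__ == "__main__":
    rng = random.Random(0)
    alpha = [0.5, 0.5, 0.5, 0.5]  # one firing probability per input address
    address = 2
    a = pram_output(alpha, address, rng)
    # Reward the emitted bit: the stored probability moves towards a.
    print(reward_penalty_update(alpha, address, a, r=1, p=0))
```

Under this assumed rule a reward moves the stored firing probability towards the bit that was just emitted, while a penalty moves it, more weakly, towards the opposite bit. All other memory locations are left unchanged.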
|
Key Findings |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Project URL: |
|
Further Information: |
|
Organisation Website: |
|