Imitation learning for modelling air combat behaviour — an exploratory study

Gorton, Patrick; Asprusten, Martin; Bråthen, Karsten

dc.contributor.author	Gorton, Patrick	en_GB
dc.contributor.author	Asprusten, Martin	en_GB
dc.contributor.author	Bråthen, Karsten	en_GB
dc.date.accessioned	2023-01-18T18:58:34Z
dc.date.available	2023-01-18T18:58:34Z
dc.date.issued	2023-01-13
dc.identifier	1554
dc.identifier.isbn	978-82-464-3454-4	en_GB
dc.identifier.uri	http://hdl.handle.net/20.500.12242/3136
dc.description.abstract	Fighter pilots commonly use simulators to practice their required tactics, techniques and procedures. The training may involve computer-generated forces controlled by predefined behaviour models. Such behaviour models are typically crafted manually by eliciting knowledge from experienced pilots, and they take a long time to develop. Even so, they generally fall short because they are predictable and lack adaptivity, so instructors must spend time manually monitoring and controlling aspects of these forces. Recent advances in artificial intelligence (AI) research have, however, produced methods capable of creating intelligent agents that beat expert human players in complex games such as Go and StarCraft II. Similarly, one may use methods from AI to compose advanced behaviour models for air combat, allowing the instructors to focus more on the pilots’ training progression rather than on manually controlling their opponents and teammates. Such intelligent behaviour must be realistic and follow the correct military doctrines to prove useful for pilot training. One possible way of achieving this is through imitation learning, a type of machine learning (ML) in which agents learn to imitate examples given by expert pilots. This report summarizes work on optimizing air combat behaviour models using an imitation learning technique. These behaviour models are expressed as behaviour transition networks (BTNs) controlling the computer-generated forces, which are simulated by the Next Generation Threat System (NGTS), a military simulation application aimed mainly at the air domain. An adapted version of the genetic algorithm Neuroevolution of Augmenting Topologies (NEAT) optimizes the BTNs to behave similarly to demonstrations of pilot behaviour. As with most ML methods, NEAT requires many consecutive behaviour simulations to yield satisfactory solutions. NGTS is not designed for ML purposes, so a system was developed around NGTS that automatically handles simulation and data management and controls the optimization process. A set of experiments was performed in which the developed ML system optimized BTNs to imitate example behaviours across three simple air combat scenarios. The experiments show that the adapted version of NEAT (BTN-NEAT) produces BTNs that successfully imitate simple demonstrations. However, the optimization process took considerable time: up to 44 hours of computation, or 92 days of simulated flight time. The slow optimization was mainly due to NGTS’s inability to run fast while remaining reliable. This reliability issue is caused by NGTS’s lack of time management, which would have associated the agents’ states with simulation time stamps. To achieve successful behaviour optimization with more complex scenarios and demonstrations, one should simulate the behaviours much faster than real time and with high reliability. Therefore, we consider NGTS not to be well suited for future ML work. Instead, a lightweight air combat simulation that is designed for ML purposes and capable of running fast and reliably is needed.	en_GB
dc.description.abstract	Fighter pilots train on tactics, techniques and procedures using simulators. These simulators include computer-generated forces with predefined behaviours. Developing such behaviours is demanding. Traditionally they are made by hand, in close collaboration with experienced pilots. Unfortunately, the resulting models often have weaknesses, for example appearing predictable or lacking adaptability. Such problems mean that simulator instructors have to monitor and control aspects of the computer-generated forces manually. The simulator instructors should be relieved of this task so that they can contribute as effectively as possible to the pilots acquiring the necessary skills. The need for this kind of support has led to an increase in research activities related to the modelling of air combat behaviour. Recent advances in artificial intelligence have contributed methods that enable agents to play complex games such as Go and StarCraft II. Such methods can also be used to learn advanced fighter aircraft behaviour for use in simulation-based training. Nevertheless, artificial intelligent pilots must be useful. To that end, they must be able to act realistically and in line with military doctrine. This requires that they are able to imitate expert behaviour. This report summarizes work on imitation learning for air combat behaviour using the Next Generation Threat System (NGTS), a simulation system with particular emphasis on the air domain. The behaviour models are expressed as behaviour transition networks (BTNs). To optimize them, an adapted version of the genetic algorithm Neuroevolution of Augmenting Topologies (NEAT) was used. It was named BTN-NEAT. Like other machine learning methods, NEAT requires many behaviours to be simulated and evaluated in order to find good solutions. NGTS is not aimed specifically at machine learning purposes. Therefore, a dedicated machine learning system was developed that connects simulation, behaviour models and the optimization method. The machine learning system was used in experiments based on three simple air combat scenarios. In these experiments, agents learned to act in accordance with demonstrated behaviour by using imitation learning. BTN-NEAT succeeded in creating BTNs that imitate the demonstrations, although the learning process required more time than expected. The reason is that NGTS cannot run fast and provide reliable data at the same time. It took up to 44 hours of computation to learn suitable behaviours. This corresponds to 92 days of simulated flight time. The experiments revealed that NGTS has weaknesses related to time management. This makes the simulation system unattractive for further work on modelling air combat behaviour with artificial intelligence. Instead, there is a need to develop a simple and computationally fast air combat simulation for use in machine learning.	en_GB
dc.language.iso	en	en_GB
dc.subject	Artificial intelligence	en_GB
dc.subject	Machine learning	en_GB
dc.subject	Computer-generated forces	en_GB
dc.subject	Fighter aircraft	en_GB
dc.subject	Modelling and simulation	en_GB
dc.title	Imitation learning for modelling air combat behaviour — an exploratory study	en_GB
dc.source.issue	22/02423	en_GB
dc.source.pagenumber	71	en_GB

Files in this item

Name: 22-02423.pdf
Size: 2.636Mb
Format: PDF

This item appears in the following Collection(s)

Rapporter
