
keeping an astronomical log with a microphone and AI


Here I will describe a method for using voice recordings in the field and LLMs running entirely on your own computer to generate astronomical observing logs.

background

Hobbyist astronomers usually keep a log of their observations. Since there are so many things to see up in the sky, a log helps us keep track of what we've seen before, or see how our ability to observe a certain target might change over time.

I, however, have been historically bad at keeping a log. Writing by hand in the dark and cold is difficult and unpleasant. Typing is impossible because you'll blind yourself with whatever screen your device has. So up to now my astronomy logging has been somewhat lackluster.

However, with a handheld recording device, OpenAI's Whisper model, and a general-purpose LLM, it's a breeze to keep a detailed observing log just by speaking into a microphone.

recording device

I started doing this originally with my phone, but a dedicated recording device makes things much easier. Having a single physical button to press to start and stop recording is much less hassle than trying to use an app, especially because I use my phone for star charts as well.

A phone would be fine if I made one long, continuous recording, but Whisper doesn't work nearly as well on long recordings with long stretches of silence. A single continuous recording takes longer to convert to text, and Whisper will sometimes transcribe words that were never spoken during the silent portions.

Therefore, I found it works much better to keep a handheld recorder around, with a dedicated record button, so that I can record short segments of audio such as "M31 found by star hopping from Deneb at 22:04."

whisper speech-to-text

I bring all the WAV files over from the recorder and run whisper on them to turn the speech into text.

whisper --model turbo --output_format txt --language en *.WAV

This gives me text files that look something like this:

Seeing is not the best right now because of all of the wind, so Venus is very blurry and shaky in the eyepiece.
It's pretty full. I'd say it's a gibbous shape, but it's very difficult to tell because it's swimming all around with the 7mm eyepiece.
Alberi observed at 1921.
It's 2143 and I've finally found the star that I'm going to start star hopping to M5 with.

The transcript inevitably has some errors (here, it should be Albireo instead of Alberi), but most of these should get cleaned up in the next step.
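The transcription step could be scripted along these lines. This is only a minimal sketch, not the exact process described above: it assumes the `whisper` command is on the PATH, that it writes one `.txt` file per recording into the folder given by `--output_dir`, and that the recorder names its files so that filename order matches recording order. The function names are made up for illustration.

```python
import subprocess
from pathlib import Path


def whisper_command(wav_files, output_dir):
    """Build the whisper invocation shown above for an explicit list of files."""
    return [
        "whisper",
        "--model", "turbo",
        "--output_format", "txt",
        "--language", "en",
        "--output_dir", str(output_dir),
        *[str(f) for f in wav_files],
    ]


def join_transcripts(transcript_dir):
    """Concatenate the per-clip .txt files in filename order, skipping empty ones."""
    parts = []
    for txt in sorted(Path(transcript_dir).glob("*.txt")):
        text = txt.read_text(encoding="utf-8").strip()
        if text:
            parts.append(text)
    return "\n".join(parts)


def transcribe_session(recording_dir, transcript_dir):
    """Run whisper on every WAV file in recording_dir and return one combined transcript."""
    wav_files = sorted(Path(recording_dir).glob("*.WAV"))
    if not wav_files:
        return ""
    Path(transcript_dir).mkdir(parents=True, exist_ok=True)
    # check=True raises an error if whisper exits with a failure code.
    subprocess.run(whisper_command(wav_files, transcript_dir), check=True)
    return join_transcripts(transcript_dir)
```

Calling `transcribe_session("recordings", "transcripts")` would then return the whole night's notes as a single string, ready to hand to an LLM.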

llm transcription to observing notes

To turn the fairly unstructured ramblings I make out in the field into structured notes, I use Ollama to run my LLMs.

When it comes to summarizing the raw text into a log, I find that parameter count seems to make a big difference in the quality of the summary, which contradicts what I've read online. For the specific use case of astronomy it makes sense, though: the models need more context to determine what's important to summarize, and astronomical terms are likely rarer in training data than everyday speech. Therefore I've settled on the larger models available in Ollama: Gemma 2 27B and Llama 3.1 70B.

Gemma 2 and Llama 3.1 are good at different things, though! In general, I like Gemma 2's formatting a lot better than anything I can get Llama 3.1 to output, and Gemma 2 seems a bit better at extracting the most relevant details from the transcript. Llama 3.1, on the other hand, fares much better at fixing transcription errors and adding context in some places (the 70B model specifically; the 3B model is worse than Gemma).

From the above transcript, Gemma 2 left Alberi uncorrected, while Llama 3.1 changed it to Albirio, which is much closer to the correct Albireo.

The weaknesses of each model can be overcome by the other, so the workflow I've settled on is to send the raw text into each of the models, then take both outputs and send them back into Gemma 2 for the final rendering.

The initial prompt into both models looks something like this:

Here is a speech to text transcript of an observation session. Create a detailed log of the session, using only facts found in the text. Do not add your own interpretation to the text. For each observation, give the title of the target, the time observed if available, and a bulleted list of notes. Correct possible mistranscriptions.

I'm still experimenting with prompts. Shorter prompts seem to work better, and I have no idea if this is expected or not.
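Sending the same prompt to both models could be scripted roughly like this. It is a sketch under a few assumptions: Ollama is serving its HTTP API at the default local address, and the two models have been pulled under the tags `gemma2:27b` and `llama3.1:70b`. The function names are hypothetical, and placing the transcript after the instructions is just one reasonable layout.

```python
import json
import urllib.request

# Assumed: Ollama's default local endpoint and the tags the models were pulled under.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODELS = ["gemma2:27b", "llama3.1:70b"]

LOG_PROMPT = (
    "Here is a speech to text transcript of an observation session. "
    "Create a detailed log of the session, using only facts found in the text. "
    "Do not add your own interpretation to the text. "
    "For each observation, give the title of the target, the time observed "
    "if available, and a bulleted list of notes. "
    "Correct possible mistranscriptions."
)


def build_prompt(transcript):
    """Put the instructions first, then the raw transcript."""
    return f"{LOG_PROMPT}\n\n{transcript.strip()}"


def ollama_generate(model, prompt):
    """Send one non-streaming request to the local Ollama server and return the reply text."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


def draft_logs(transcript, generate=ollama_generate):
    """Return a dict mapping each model tag to the draft log it produced."""
    prompt = build_prompt(transcript)
    return {model: generate(model, prompt) for model in MODELS}
```

The `generate` argument is there so the function can be tried out with a stand-in before pointing it at a 70B model that takes a while to answer.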

Here is an excerpt from the log each model generated:

gemma 2

* **Venus:**
    * Time Observed: ~19:05
    * Notes:
        * Appearance affected by wind, blurry and shaky.
        * Described as gibbous shape, but difficult to discern due to conditions.
        
* **Alberi:**
    * Time Observed: 19:21
    
* **M81 & M82 (Galaxies):**
    * Time Observed: 19:55 - 20:01
    * Notes:
        * Found by starhopping from the nose of Ursa Major, not Dubhee as initially planned.
        * Both visible in the same eyepiece field of view.
        * Faint smudges, slightly elliptical in shape.
        * Distinctly galaxies.
        * 18.2mm eyepiece: Background darkened, objects more prominent.
        * 7mm eyepiece: Objects no longer fit field of view, appear as dim wisps, visible mainly with averted vision.

llama 3.1

**Venus**
* Time observed: 1905
* Notes:
	+ Very blurry and shaky in the eyepiece due to wind
	+ Gibbous shape, but difficult to tell due to movement
	+ Pretty full

**Albirio**
* Time observed: 1921
* No notes available

**M82 and M81**
* Time observed: 1955 (starhopping started earlier)
* Notes:
	+ Faint smudges, elliptical in shape
	+ Distinctly galaxies, very close to each other
	+ Visible in the same eyepiece with the 18.2mm eyepiece
	+ Appear as wisps with the 7mm eyepiece and averted vision


combined

Combining these two logs and running them again through Gemma 2 yields the insight of Gemma 2 with the correction capabilities of Llama 3.1:

* **Venus:**
    * Time Observed: ~19:05
    * Notes:
        + Very blurry and shaky in the eyepiece due to wind.
        + Gibbous shape, but difficult to tell due to movement.
        + Described as "pretty full".
* **Albirio:**
    * Time Observed: 19:21
    * No notes available. 
* **M81 & M82 (Galaxies):**
    * Time Observed: 19:55 - 20:01
    * Notes:
        + Found by starhopping from the nose of Ursa Major, not Dubhee as initially planned.
        + Both visible in the same eyepiece field of view.
        + Faint smudges, slightly elliptical in shape.
        + Distinctly galaxies.
        + 18.2mm eyepiece: Background darkened, objects more prominent.
        + 7mm eyepiece: Objects no longer fit field of view, appear as dim wisps, visible mainly with averted vision.
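The combining step might look something like the sketch below. The wording of the merge prompt is not given here, so `MERGE_PROMPT` is purely a hypothetical stand-in, as are the function names. `generate` is assumed to be any function that takes a model name and a prompt and returns the model's reply as text, and `gemma2:27b` is an assumed Ollama tag.

```python
# Hypothetical merge instructions; the prompt actually used for this step is not shown.
MERGE_PROMPT = (
    "Here are two logs of the same observation session, written from the same "
    "transcript. Combine them into a single log. Keep every detail that appears "
    "in either log, and prefer the more plausible spelling of astronomical names."
)


def build_merge_prompt(drafts):
    """Label each draft with the model that wrote it and append it to the instructions."""
    sections = [f"Log from {model}:\n{log.strip()}" for model, log in drafts.items()]
    return MERGE_PROMPT + "\n\n" + "\n\n".join(sections)


def merge_logs(drafts, generate, model="gemma2:27b"):
    """Ask a single model for the final log, given the drafts keyed by model name."""
    return generate(model, build_merge_prompt(drafts))
```

Here `drafts` would be a dictionary such as `{"gemma2:27b": gemma_log, "llama3.1:70b": llama_log}`, holding the two logs excerpted above.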

conclusion

I consider this project a resounding success. I'm able to keep notes without taking time out from looking through the eyepiece, and I don't have to listen back to my own voice later to transcribe the notes into a more archival-friendly format.

Even though I save a lot of work using these models, I'm still doing a lot of work by hand: organizing the files, running the models, and copying and pasting prompts. Once I have the text of the log, I manually add the observations and notes into AstroPlanner, which I keep as my source of truth for past observations.

I hope very soon to write some software to handle the whole process, from plugging in my recorder to getting the final text log out.

Automating entry into AstroPlanner would be a neat next step as well, but doing it by hand allows me to verify the log generated, which is important to me because verifying AI output is still a necessary step for any work done by an LLM.

You can see an example of the full output of this process here.