Can you please annotate your graph: when does drop_caches run? Is it every time the yellow line moves back up? When would KStars crash if drop_caches didn't run? At the bottom of the first downward yellow line?
Also, how much swap space do you have? Is that your issue?
PS Off topic, but instead of µSD, you should definitely consider using an SSD with your RPi4.
So the plot shows the stats reported by the Linux free command, as can be seen in the script (assuming you've looked at it, as you mention "drop_caches").
I also attach here the output from the text file, which shows all the data that is plotted:
Total number of lines in the file (approx 1 line per second or so) is around 30k.
So the first drop_caches occurs at line ~1930. I record each one by outputting the string "Iteration %d; Cache memory greater than 75% threshold ( 76.0%). Cleaning memory in system." Search for "iteration" in the text file and you can see 15 occurrences of this. Each time we get to a drop_caches, there is a drop in the CACHE memory and a rise in the FREE memory, so the command in the script does what I needed.
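For anyone wanting to reproduce the threshold check without the original script, here is a minimal sketch in Python. It assumes the cache percentage is simply the "cache" column of `free -w` divided by "total"; the actual script may compute it differently (for example by including buffers or shared memory). The function names and the sample numbers are made up for illustration.

```python
# Illustrative `free -w` output; the numbers are invented so that cache is 76% of total.
SAMPLE = """\
              total        used        free      shared     buffers       cache   available
Mem:        4000000      500000      400000       50000       60000     3040000     3300000
Swap:        102396           0      102396
"""


def parse_free_wide(output: str) -> dict:
    """Return the 'Mem:' row of `free -w` output as {column name: value in KiB}."""
    lines = [ln for ln in output.strip().splitlines() if ln.strip()]
    columns = lines[0].split()
    for line in lines[1:]:
        label, *values = line.split()
        if label == "Mem:":
            return dict(zip(columns, map(int, values)))
    raise ValueError("no 'Mem:' row found")


def cache_percent(mem: dict) -> float:
    """Cache as a percentage of total RAM (assumed definition, see above)."""
    return 100.0 * mem["cache"] / mem["total"]


def should_drop_caches(mem: dict, threshold: float = 75.0) -> bool:
    """True when the cache share exceeds the threshold (75% as in the log message)."""
    return cache_percent(mem) > threshold


if __name__ == "__main__":
    mem = parse_free_wide(SAMPLE)
    print(f"cache = {cache_percent(mem):.1f}%, drop = {should_drop_caches(mem)}")
```

On a live system the text would come from running `free -w` rather than from the sample string.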
Prior to this addition, Swap would eventually get used until no resources were left and KStars/EKOS would crash, stopping imaging session - the original topic of this discussion.
With the plot, sure enough, we see 15 'jumps' where the cache used is cleared up (cache shown by purple line) - it rises continuously, until cleared.
The yellow FREE plot (again, these terms are in the legend; the values are the % returned from the "free -w" command in bash) drops until a "drop_caches", when the FREE memory jumps up again.
Available swap (unchanged from the OS install) noted at top of the attached file as 99MB. Again, throughout previous tests, swap was used (which from my understanding and eg
isn't 'ideal', but not disastrous). This latest test shows 0% swap used throughout.
Unfortunately, I've not done any real editing of the graph colours - "SWAP usage" and "FREE memory" are both yellow, but the SWAP stays at 0% usage throughout (look along the Y=0% line).
The HDD is the available space on the µSD which (no surprise) drops constantly from about 82% free to 50% free as the simulation files are written at regular intervals ("exposures" of 2.5 sec)
Not sure what else to comment here - the astroberry OS is a clean install, add on Conky and Anydesk, then running KStars for many imaging sessions etc. When I became adventurous and left the rig overnight and went to bed, I'd get up and things had crashed so I've gone down this debug route.
Prior to the addition of the "drop_caches" syntax, the cache would get used up (see previous posts), I would expect the yellow line would continue to decrease, swap would rise, and things would appear to 'stabilise' here for a period, before yes, KStars would crash.
I'm going to run another test to prove this as I'd updated several times throughout this testing. Essentially it'll be the same output. I'll just comment the "drop_caches" line in the script... Watch this space then!
RE SSD - certainly not against the idea! I did look at SSD, although just at the moment, I've done 3D prints which hold all the RPi, cables, etc, on the mount, so just at this point would rather keep things tidier. But yes, I would still consider SSD! I've also considered moving from astroberry OS to stellarmate OS (stellarmate running on the RPi), and thought that the latter would be the best way to get around this issue, although having discovered the memory issue, feel I can run at least for now with my 'hack'.
Does this help?
I guess I don't want to get bogged down with terms and techy stuff, just that I think I've found an issue that causes my system to crash if not kept in check - are others seeing this? Is this a hardware issue (RPi defective? µSD causing this issue? Older DSLR with latest software? Power instability in the cold? etc!) or software (KStars clashing with Conky/Anydesk? ETC ETC?) What's the root cause is where I'm aiming!
On a last note about the plot - the USED memory (bottom red/pink plot) starts low, then jumps up as KStars/EKOS/Indi is started and the imaging session is started, then stays fairly constant throughout the process, before jumping up a bit more at the end as I start looking at the files and thinking about getting data off for analysis.
Colours on graph changed to follow the order of the rainbow. New test still running (crash point never certain, but shows same effect - cache increases as free decreases, and eventually swap gets used).
At this point, I've no idea how long the simulation will continue, other than at some point a crash and stop of acquisition WILL occur.
Also redone the previous text file to show the plots on the same colour scheme.
The swap in this case appears to be 100MB, not a lot admittedly, but then some people recommend using no swap, and some like me use zram (in-memory cache with compression) to avoid slow access to a micro SD card and any deterioration that causes to the card. Afaik swap shouldn’t really be needed in this case, as at the start everything fits quite happily in RAM. Upping to 4GB would probably just mean waiting longer for the crash; maybe that’s enough, after all how many pictures can one take in a night…
However, I suspect Pete is trying to get to the bottom of why he has crashes rather than to take quite so many pictures. Pete, did you get anywhere with using gdb? Have you tried using one of the memory check widgets in Qt Creator? It might be worth a trawl of some Linux forums to see if others are running into similar problems with other programs, since it seems strange to use swap to hold stuff forced out by buffer cache. Perhaps the swap aggression settings need changing.
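On the swap aggression point, my assumption is that this means the kernel's `vm.swappiness` setting, which Linux exposes at `/proc/sys/vm/swappiness`. A small sketch for checking the current value (the helper names are made up; the usual kernel default is 60, and lower values make the kernel less inclined to swap out program memory in favour of keeping file cache):

```python
from pathlib import Path

# Standard location of the setting on Linux.
SWAPPINESS_PATH = Path("/proc/sys/vm/swappiness")


def parse_swappiness(text: str) -> int:
    """Convert the file contents (e.g. '60\\n') to an integer."""
    return int(text.strip())


def describe(value: int) -> str:
    """Rough description relative to the usual default of 60."""
    if value < 60:
        return "less inclined to swap than the usual default"
    if value == 60:
        return "usual default"
    return "more inclined to swap than the usual default"


if __name__ == "__main__":
    if SWAPPINESS_PATH.exists():
        value = parse_swappiness(SWAPPINESS_PATH.read_text())
        print(f"vm.swappiness = {value} ({describe(value)})")
    else:
        print("swappiness file not found (not a Linux system?)")
```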
Thanks again guys, Coming back to original question, why are things crashing - Nigel catches this in the above post correctly! My own 'memory monitor' script fix/hack should now allow me to avoid the crash issue, but ultimately, I still haven't got to the bottom of the crashing EKOS.
If more swap should be required, astroberry should probably do this at time of install with etcher or however. The basic install using etcher has (I believe?) done all the memory allocation for me so the small swap I'd guess is the default. If it is astroberry doing this, maybe it would look at the size of the µSD and 'smartly' allocate memory? Is it this reason astroberry minimum is 16GB with 32GB recommended? If it allocates 4GB swap on a 16GB card, 7 with OS, then that's not really going to handle lots of imaging time! Again, not a computer guru but happy to dig around in my own time, and from reading around, the use of swap does appear to provoke 'try best to avoid' reactions.
Nigel, sods law, the last test with no drop_cache is still running, which if I was imaging, wouldn't be an issue! (36hrs+ without crash). So much for trying to show the crash on the memory graph! I'm pretty sure the crash is down to this memory thing, so happy that I've at least raised the issue and at the very least, your 1st post also told me you were seeing the same effect on cache and free!
gdb got skipped till now, sorry Nigel. I skipped this as I'd just discovered the drop_cache idea, so gdb will be next. I'll also browse around to see about swap aggression.
Anyway, I'll do some more tests and just keep popping results here, but I'd hope (come September/October) when the nights are longer that I can do more imaging and confirm that at least the drop_cache will work, giving me what will be my first full night of imaging (previous sessions I've packed up before bed in the wee hours, or crashes each night when leaving things out overnight)!
Thanks again for comments
If KStars is crashing, you might have a memory leak, but it is not obvious that KStars is crashing because you are out of memory.
My system is quite different, not running Astroberry, 8GB memory and I am allowing the system to page to a swap file. But last week I ran a simple test using simulators taking 10K 1 second images and storing the .fits files.
KStars is a substantial program, on my system just loading it without using Ekos or drivers.
Getting through initial setup and imaging sessions will obviously require more resources, none of which can be considered a memory leak. So I started gathering memory info after 200 images to allow the software to establish itself.
KStars Process going from 200 to 8,000 images.
# images: 200 4K 8K Growth%
VM GB: 2.07 2.08 2.11 2%
RES MB: 351 364 393 12%
MEM%: 4.3% 4.5% 4.9% 0.6%
Used GB: 220.127.116.11
Avail GB: 18.104.22.168
Swap MB: 457792
KStars/INDI had around 8.5K files/sockets open, and this remained fairly consistent during the test. So it did continue to grow, but not alarmingly so. After 8K image files, there was nothing that would stress the system's ability to manage the memory. It did use some swap space, but just at a housekeeping level.
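For reference, the growth column can be checked with a couple of lines of Python. This is only a sketch of the arithmetic, reading the VM row as 2.07 to 2.11 GB and the RES row as 351 to 393 MB between 200 and 8K images; the MEM% row looks like a plain difference in percentage points rather than relative growth.

```python
def growth_percent(start: float, end: float) -> float:
    """Relative growth from start to end, in percent."""
    return 100.0 * (end - start) / start


if __name__ == "__main__":
    print(f"VM:  {growth_percent(2.07, 2.11):.1f}%")   # about 2%
    print(f"RES: {growth_percent(351, 393):.1f}%")     # about 12%
    print(f"MEM: {4.9 - 4.3:.1f} percentage points")   # 0.6
```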
I want to suggest something about the scripting you are doing and hoping I don't have too many wrong assumptions.
We see the operating system using a good portion of its memory as a disk cache for performance reasons. I believe that if it needs more memory for resident programs, it will trade off that cache memory to satisfy the needs of the requesting program. That is the memory identified as available. When you force the clean pages stored there out of the cache, presumably you are moving the memory from being available to being free. I do not think you are making new memory available to KStars. It would seem that the operating system will then get busy establishing its cache again. I think it would be better to leave management of the cache up to the OS. If there are cached pages and some process needs more memory, the OS should be able to free up those cached pages itself.
I am thinking the see-sawing shown in the graph might be the OS setting up the cache and the script undoing that.
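To put a number on the difference between "free" and "available", here is a rough sketch that reads the `MemFree` and `MemAvailable` fields from `/proc/meminfo`. Treating their difference as "memory the kernel thinks it could reclaim on demand" is my simplification, not an exact kernel definition, and the sample values are invented for illustration.

```python
# Invented sample in /proc/meminfo format (values in kB).
SAMPLE_MEMINFO = """\
MemTotal:        3884360 kB
MemFree:          106496 kB
MemAvailable:    2486000 kB
Buffers:           60000 kB
Cached:          2500000 kB
SwapTotal:        102396 kB
SwapFree:              0 kB
"""


def parse_meminfo(text: str) -> dict:
    """Return {field name: integer value} for each line of /proc/meminfo text."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            info[key.strip()] = int(parts[0])
    return info


def reclaimable_estimate_kb(info: dict) -> int:
    """Approximate memory the kernel could hand out beyond what is already free."""
    return info["MemAvailable"] - info["MemFree"]


if __name__ == "__main__":
    info = parse_meminfo(SAMPLE_MEMINFO)
    print(f"free: {info['MemFree']} kB, available: {info['MemAvailable']} kB")
    print(f"reclaimable on demand (estimate): {reclaimable_estimate_kb(info)} kB")
```

If the argument above is right, dropping the caches by hand mostly moves memory from the second figure into the first, without changing what a program such as KStars could actually obtain.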
Below from your initial post where you show the transition from pre to post crash, I think this is the area referred to:
2022-05-16 18:28:13; Temp=43.3ºC. RAM 'Used': 1.2GB (31.4%); RAM 'Free': 104MB ( 2.8%); RAM 'Available': 2.4GB (64.0%); RAM used in Shared & Buffer: 66.7%. Swap used:100.0%. µSD @/dev/root uses 39,604,448kB.
2022-05-16 18:28:15; Temp=42.8ºC. RAM 'Used': 1.0GB (24.4%); RAM 'Free': 489MB (17.1%); RAM 'Available': 3.1GB (83.1%); RAM used in Shared & Buffer: 66.8%. Swap used:98.8%. µSD @/dev/root uses 39,607,008kB.
2022-05-16 18:28:16; Temp=42.3ºC. RAM 'Used': 459MB (12.1%); RAM 'Free': 847MB (22.4%); RAM 'Available': 3.1GB (83.8%); RAM used in Shared & Buffer: 65.7%. Swap used:98.8%. µSD @/dev/root uses 39,607,008kB.
If the idea holds that the OS will make "available" memory available as needed, then before the crash 2.4GB might still have been accessible. This strongly suggests that the crash is not due to out-of-memory errors. Perhaps you could post a log file somewhere that would show otherwise.
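To check this across the whole 30k-line file rather than by eye, the percentages can be pulled out of each log line with a regular expression. This is a sketch that assumes every line follows the format of the three lines quoted above; the function names and the 10% cut-off are arbitrary choices for illustration.

```python
import re

# First line quoted above (just before the crash).
PRE_CRASH = (
    "2022-05-16 18:28:13; Temp=43.3ºC. RAM 'Used': 1.2GB (31.4%); "
    "RAM 'Free': 104MB ( 2.8%); RAM 'Available': 2.4GB (64.0%); "
    "RAM used in Shared & Buffer: 66.7%. Swap used:100.0%. "
    "µSD @/dev/root uses 39,604,448kB."
)

FIELD_RE = re.compile(r"RAM '(\w+)':\s*[\d.]+[MG]B\s*\(\s*([\d.]+)%\)")
SWAP_RE = re.compile(r"Swap used:\s*([\d.]+)%")


def parse_log_line(line: str) -> dict:
    """Extract the used/free/available/swap percentages from one log line."""
    stats = {name.lower(): float(pct) for name, pct in FIELD_RE.findall(line)}
    swap = SWAP_RE.search(line)
    if swap:
        stats["swap"] = float(swap.group(1))
    return stats


def looks_out_of_memory(stats: dict, min_available: float = 10.0) -> bool:
    """True if 'available' RAM is below an arbitrary cut-off (10% by default)."""
    return stats["available"] < min_available


if __name__ == "__main__":
    stats = parse_log_line(PRE_CRASH)
    print(stats)
    print("out of memory?", looks_out_of_memory(stats))
```

For the pre-crash line, swap is at 100% but available RAM is 64%, so by this crude test the system was not out of memory.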
About the swap file, which is the only thing I can see that is maxed out.
What you are running is a kind of stress test: a large software application creating thousands of images on what today is considered constrained hardware. At the same time, because of the very small swap space, the test has to run without being able to trade off unused or less used pages in memory, because it is not being allowed to page them onto disk. At one time I ran on a Raspberry Pi and also an Intel PC. In both cases I added a USB drive or better to run the software from and to use for the swap. I certainly would not want to run the swap on an SD card.
We know from its use of the swap file that the system is trying to page files, perhaps to get the older, less used pages out of its cache. My larger system, with memory to spare, still wants to use the page file. I am not trying to make a case that the lack of swap space is causing a crash. I am also not trying to say KStars is being closed because the OS has started killing processes; I thought it only did that to save itself when unable to satisfy memory requests.
Unless it has got itself into a state where not being able to page makes it unable to turn available memory back into free.
I do wonder whether giving it a 2GB or even 1GB page file would make a difference. It is tough if you only have a small SD card on the system.
That is the last time, promise.
Mach1, TS86SDQ, ASI071, ASI174, OAG, focusPro
Using memory as file cache is fine and to be expected/wanted. Paging out things that have not been used for a ‘long’ time is fine (I would think this is data rather than executable, but could be wrong). I think I’ve even changed a setting on my astroberry to make it keep things in memory more rather than swap them out.
The problem is more why is kstars dying, is it a problem with the code, some linux setting or a bug in some part of linux. It could just be coincidence and it’s not actually the memory/swap that’s run out in some way. Really there’s a need to look at some core dump or trace to see what was happening when it failed. Pete seems the only one with the patience to take thousands of photos and it sounds like it takes a while to do. Unfortunately I don’t know enough to be able to advise how to get that dump, ages since I’ve looked at such a thing..
Yep that’s the setting I rather randomly changed
I also use zram/zswap for swap and even used tmpfs for /tmp (though ran into problems due to temporary files building up when doing plate solving). This all on a 4gb PI. I don’t think I’ve run out of ram and only ever had minimal swap usage so kstars/ekos isn’t exactly eating all the memory. That said I’ve only gotten things all working recently so have only taken around a hundred images in a session.
I used to have more problems when I took larger resolution images (with the same camera), but I stopped using the fits viewer all the time, switched off the notification sounds, changed to aps-c resolution and applied lots of updates, which seems to have ‘cured’ that. No idea what it was though, as it was quite random.