KStars/EKOS crashing when imaging; Astroberry, RPi 4-4GB

  • Posts: 19
  • Thank you received: 1
Right!
I've solved (or it appears I have solved) the issue for my setup.
All the previously mentioned memory issues still appear to occur, but I have a workaround that "fixes" the problem for me for now.
Whilst I know the fix doesn't solve the underlying issue, I'm hoping it at least makes it known that there is an issue and that I've found a way to get around it.
As before, I see certain parts of the RAM usage (mainly cache) going up when imaging with KStars/EKOS/INDI on an RPi4 (4GB). It might be that those with 8GB models don't see the issue because there's more headroom to 'lose', but later on the software crashes with no obvious error at the front end.
From reading about memory allocation etc. on Linux, and with my limited scripting abilities, I've written a script that outputs the memory usage every second or so for reference. I updated this script several times and eventually finalised a version that forces the system to purge memory should the available resources get too low.
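Pete's attached shell script isn't reproduced in the thread, so the following is only a rough Python sketch of the same idea, not his code. It assumes the monitor reads /proc/meminfo, treats Buffers + Cached as "cache", and uses the 75% cache threshold quoted later in the thread; the function names (parse_meminfo, cache_percent, should_drop, monitor) are hypothetical. Dropping caches means writing to /proc/sys/vm/drop_caches, which needs root.

```python
import os
import time
from datetime import datetime

MEMINFO = "/proc/meminfo"
DROP_CACHES = "/proc/sys/vm/drop_caches"  # writing here requires root


def parse_meminfo(text):
    """Return the fields of /proc/meminfo as a dict of integers (mostly kB)."""
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])
    return fields


def cache_percent(info):
    """Buffers + page cache as a percentage of total RAM.

    Assumption: "cache" means Buffers + Cached here; the buff/cache column
    of `free` may also include reclaimable slab (SReclaimable).
    """
    return 100.0 * (info["Buffers"] + info["Cached"]) / info["MemTotal"]


def should_drop(info, threshold=75.0):
    """True when the cache share of RAM exceeds the threshold (percent)."""
    return cache_percent(info) > threshold


def monitor(threshold=75.0, interval=1.0, iterations=None):
    """Log memory once per interval and drop caches above the threshold."""
    i = 0
    while iterations is None or i < iterations:
        i += 1
        with open(MEMINFO) as f:
            info = parse_meminfo(f.read())
        pct = cache_percent(info)
        print(f"{datetime.now():%Y-%m-%d %H:%M:%S}; "
              f"Free {info['MemFree'] // 1024}MB; "
              f"Available {info['MemAvailable'] // 1024}MB; Cache {pct:.1f}%")
        if should_drop(info, threshold):
            print(f"Iteration {i}; cache {pct:.1f}% above {threshold}% threshold, dropping caches")
            os.sync()  # write dirty pages back first; only clean pages can be dropped
            with open(DROP_CACHES, "w") as f:
                f.write("3\n")  # 1 = page cache, 2 = reclaimable slab (dentries, inodes), 3 = both
        time.sleep(interval)
```

Saved as, say, memwatch.py (a hypothetical name), it could be started from a root shell with: sudo python3 -c "import memwatch; memwatch.monitor()". The os.sync() call mirrors the usual advice to run sync before writing to drop_caches, since dirty pages are not freed by it.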
Before adding the memory purge, I could run the EKOS simulators and make the software crash, just like the real-life imaging I was doing earlier in the year (before the clocks changed and the imaging nights became a bit too short!).
I'm more familiar with Matlab, so I wrote a script that takes the text file, converts the data to numbers and plots them, giving the following after a 12-hour simulation session (15k images). Sure enough, one can see that the memory is purged several times during the simulation, but I don't suffer any crash. I ran this earlier and ended up filling the 128GB µSD card with simulation images in one go, so I'm very happy that at least I have something that works.


Can an admin look into this for me and see whether there is a memory leak or something similar?
I'm prepared to try a new µSD card or whatever, but there appear to be at least a few of us with similar issues.
Script file attached; I'm hoping it's commented enough for other/better coders to follow my thought process.
Regards to all

File Attachment: SampleShel...5-24.txt (5 KB)
4 months 1 week ago #83085

  • Posts: 274
  • Thank you received: 32
I thought a swap size at least the size of physical memory was needed.
root@:~# free
               total        used        free      shared  buff/cache   available
Mem:         8024804     1759520      230596      690028     6034688     5265496
Swap:        8191996      227072     7964924
root@:~#
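To put numbers on this output: free reports KiB by default, so the Mem total of 8024804 is about 7.7 GiB and the Swap total of 8191996 about 7.8 GiB. A minimal Python sketch (parse_free and summary are hypothetical helper names) that parses such output and checks the swap-at-least-RAM rule of thumb:

```python
def parse_free(text):
    """Parse the Mem/Swap rows of `free` output (KiB) into dicts keyed by column."""
    lines = text.strip().splitlines()
    # The header row is the one containing the "total" column.
    header = next(line.split() for line in lines if "total" in line.split())
    table = {}
    for line in lines:
        name, sep, values = line.partition(":")
        if sep and name.strip() in ("Mem", "Swap"):  # skips shell prompts etc.
            table[name.strip()] = dict(zip(header, map(int, values.split())))
    return table


def summary(table):
    """Available RAM and used swap as percentages, plus swap >= RAM check."""
    mem, swap = table["Mem"], table["Swap"]
    return {
        "available_pct": 100.0 * mem["available"] / mem["total"],
        "swap_used_pct": 100.0 * swap["used"] / swap["total"],
        "swap_ge_ram": swap["total"] >= mem["total"],
    }
```

On Tom's figures this gives roughly 65.6% of RAM available and about 2.8% of swap used, with swap slightly larger than RAM.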
/Tom
Mach1, TS86SDQ, ASI071, ASI174, OAG, focusPro
4 months 1 week ago #83087

  • Posts: 886
  • Thank you received: 421
Pete,

Can you please annotate your graph: when does drop_caches run? Is it every time the yellow line moves back up? When would KStars crash if drop_caches didn't run? At the bottom of the first downward yellow line?

Also, how much swap space do you have? Is that your issue?

Thanks,
Hy

PS Off topic, but instead of µSD, you should definitely consider using an SSD with your RPi4.
AP1100 & Orion Atlas Pro, WO/ZS105 w/Moonlight V2 focus, GSO RC10 w/RSF focus
ZWO ASI1600, Astronomik Filters, ST80, QHY 5L-IIm.
KStars/Ekos/Indi on NUC10 & RPi4 w/SSD -- Ubuntu
Projects: Greedy Scheduler, Terrain, Polar Align, Analyze, Linear Focuser, SEP MultiStar & GPG Guide, FITS autostretch.
4 months 1 week ago #83089

  • Posts: 19
  • Thank you received: 1
Thanks Hy,
The plot shows the stats reported by the Linux "free" command, as can be seen in the script (assuming you've looked at it, since you mention "drop_caches").
I also attach here the output from the text file, which shows all the data that is plotted:

File Attachment: Z_PTS_Log.txt (5,433 KB)

The file has around 30k lines (roughly one line per second).
The first drop_caches occurs at line ~1930. I record each one by outputting the string "Iteration %d; Cache memory greater than 75% threshold ( 76.0%). Cleaning memory in system." (search for "iteration" in the text file and you'll find 15 occurrences). Each time we reach a drop_caches, the CACHE memory drops and the FREE memory rises, so the command in the script does what I needed.
Before this addition, swap would eventually get used until no resources were left, and KStars/EKOS would crash, stopping the imaging session: the original topic of this discussion.
On the plot, sure enough, we see 15 'jumps' where the used cache (purple line) is cleared; it rises continuously until cleared.
The yellow FREE plot (again, these terms are in the legend; the values are the percentages returned by the "free -w" command in bash) drops until a "drop_caches", when the FREE memory jumps up again.
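Counting the drop_caches events can be scripted rather than searched for by hand. A small Python sketch, assuming every event line contains the message format quoted above (the regular expression and the function name drop_events are hypothetical):

```python
import re

# Matches the message written each time the caches are dropped, e.g.
# "Iteration 12; Cache memory greater than 75% threshold ( 76.0%). ..."
EVENT = re.compile(
    r"Iteration (\d+); Cache memory greater than (\d+)% threshold \(\s*([\d.]+)%\)"
)


def drop_events(lines):
    """Return (line_number, iteration, cache_pct) for every drop_caches message."""
    events = []
    for number, line in enumerate(lines, start=1):
        match = EVENT.search(line)
        if match:
            events.append((number, int(match.group(1)), float(match.group(3))))
    return events
```

The log contains characters such as º and µ, so opening Z_PTS_Log.txt with encoding="utf-8", errors="replace" and passing the file object to drop_events should be tolerant of its encoding; the number of events can then be compared with the 15 occurrences Pete reports.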

Available swap (unchanged from the OS install) is noted at the top of the attached file as 99MB. In previous tests swap was used (which, from my understanding and searching, isn't 'ideal' but isn't disastrous either). In this latest test, 0% swap was used throughout.
Unfortunately I haven't done any real editing of the graph colours: "SWAP usage" and "FREE memory" are both yellow, but SWAP stays at 0% throughout (look along the Y=0% line).

HDD is the free space on the µSD card, which (no surprise) drops steadily from about 82% free to 50% free as the simulation files are written at regular intervals ("exposures" of 2.5 sec).
Not sure what else to comment on here. The Astroberry OS is a clean install with Conky and AnyDesk added, then KStars run for many imaging sessions etc. When I became adventurous, left the rig running overnight and went to bed, I'd get up to find things had crashed, so I've gone down this debugging route.

Before the "drop_caches" line was added, the cache would get used up (see previous posts). I would expect the yellow line to keep decreasing and swap to rise, and things would appear to 'stabilise' for a period before, yes, KStars would crash.
I'm going to run another test to prove this, as I'd updated the script several times during this testing. The output will essentially be the same; I'll just comment out the "drop_caches" line in the script... Watch this space!

RE SSD: certainly not against the idea! I did look at an SSD, but at the moment I've made 3D prints that hold the RPi, cables etc. on the mount, so for now I'd rather keep things tidy. But yes, I would still consider an SSD! I've also considered moving from Astroberry OS to StellarMate OS (StellarMate running on the RPi), and thought the latter might be the best way around this issue, although having discovered the memory issue, I feel I can run with my 'hack', at least for now.

Does this help?
I guess I don't want to get bogged down in terms and techy stuff; I just think I've found an issue that causes my system to crash if not kept in check. Are others seeing this? Is it a hardware issue (defective RPi? the µSD card? an older DSLR with the latest software? power instability in the cold? etc.) or software (KStars clashing with Conky/AnyDesk? etc.)? Finding the root cause is what I'm aiming for!

One last note about the plot: the USED memory (bottom red/pink plot) starts low, jumps up when KStars/EKOS/INDI is started and the imaging session begins, then stays fairly constant throughout, before rising a bit more at the end as I start looking at the files and thinking about getting the data off for analysis.
4 months 1 week ago #83095

  • Posts: 19
  • Thank you received: 1
Graph colours changed to follow rainbow order. The new test is still running (the crash point is never certain, but it shows the same effect: cache increases as free decreases, and eventually swap gets used).
At this point I've no idea how long the simulation will continue, other than that at some point a crash WILL occur and acquisition will stop.
I've also re-plotted the previous text file with the same colour scheme.
Without drop_cache:


With drop_cache:
4 months 1 week ago #83113

  • Posts: 274
  • Thank you received: 32
Pete,
These are small systems and there is just a 100k swap file which is next to nothing. Just try creating a 4GB swap partition for the OS to use and see if it helps.
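Assuming Astroberry, being based on Raspberry Pi OS, manages swap with dphys-swapfile (a default of CONF_SWAPSIZE=100 in /etc/dphys-swapfile would match the ~100MB Pete reports), the simplest route may be to enlarge the swap file rather than create a partition. A hypothetical Python helper that rewrites that setting; it also raises CONF_MAXSWAP, on the assumption that dphys-swapfile caps the file at 2048 MB unless told otherwise:

```python
import re


def set_conf(text, key, value):
    """Set KEY=value in a shell-style config, uncommenting or appending if needed."""
    line = f"{key}={value}"
    active = re.compile(rf"^{key}=.*$", re.MULTILINE)
    commented = re.compile(rf"^#[ \t]*{key}=.*$", re.MULTILINE)
    for pattern in (active, commented):
        if pattern.search(text):
            return pattern.sub(lambda _: line, text, count=1)
    prefix = text.rstrip("\n")
    return (prefix + "\n" if prefix else "") + line + "\n"


def set_swapsize(text, size_mb, default_max_mb=2048):
    """Set CONF_SWAPSIZE; also set CONF_MAXSWAP when the size exceeds the cap."""
    text = set_conf(text, "CONF_SWAPSIZE", size_mb)
    if size_mb > default_max_mb:
        # Assumption: dphys-swapfile limits the file to CONF_MAXSWAP (2048 MB
        # by default). Note this overwrites any existing CONF_MAXSWAP value.
        text = set_conf(text, "CONF_MAXSWAP", size_mb)
    return text
```

After writing the result back to /etc/dphys-swapfile as root, the usual sequence is "sudo dphys-swapfile swapoff", "sudo dphys-swapfile setup", then "sudo dphys-swapfile swapon".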
/Tom
Mach1, TS86SDQ, ASI071, ASI174, OAG, focusPro
4 months 1 week ago #83124

  • Posts: 76
  • Thank you received: 5
The swap in this case appears to be 100MB: not a lot, admittedly, but then some people recommend using no swap, and some, like me, use zram (compressed swap held in RAM) to avoid slow access to a microSD card and any wear that causes to the card. AFAIK swap shouldn't really be needed in this case, as at the start everything fits quite happily in RAM. Upping it to 4GB would probably just mean waiting longer for the crash; maybe that's enough, after all how many pictures can one take in a night…

However, I suspect Pete is trying to get to the bottom of why he has crashes rather than take quite so many pictures. Pete, did you get anywhere with gdb? Have you tried one of the memory-check tools in Qt Creator? It might be worth trawling some Linux forums to see if others run into similar problems with other programs, since it seems strange to use swap to hold stuff forced out by the buffer cache; perhaps the swap aggression settings need changing.
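The "swap aggression" setting is most likely the vm.swappiness sysctl (lower values make the kernel less eager to swap); vm.vfs_cache_pressure similarly controls how readily the dentry and inode caches are reclaimed. A minimal Python sketch for inspecting them through /proc/sys (the helper names are hypothetical; "sysctl vm.swappiness" does the same from the shell):

```python
from pathlib import Path


def sysctl_path(name):
    """Map a dotted sysctl name such as 'vm.swappiness' to its /proc/sys file."""
    return Path("/proc/sys", *name.split("."))


def read_sysctl(name):
    """Current value of a sysctl, as a string."""
    return sysctl_path(name).read_text().strip()


def write_sysctl(name, value):
    """Runtime-only change (lost on reboot); needs root, like `sysctl -w`."""
    sysctl_path(name).write_text(f"{value}\n")
```

For example, write_sysctl("vm.swappiness", 10) run as root would make swapping less aggressive until the next reboot; 10 is purely an illustrative value, not a tested recommendation for this setup.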
4 months 1 week ago #83139

  • Posts: 19
  • Thank you received: 1
Thanks again guys. Coming back to the original question of why things are crashing: Nigel catches this correctly in the post above! My own 'memory monitor' script fix/hack should now let me avoid the crash, but ultimately I still haven't got to the bottom of EKOS crashing.
If more swap is required, Astroberry should probably set it up at install time (with Etcher or however). The basic install using Etcher has (I believe?) done all the memory allocation for me, so I'd guess the small swap is the default. If Astroberry is doing this, maybe it could look at the size of the µSD card and 'smartly' allocate swap? Is this the reason the Astroberry minimum is 16GB with 32GB recommended? If it allocates 4GB of swap on a 16GB card, with 7GB for the OS, that's not really going to handle lots of imaging time! :lol: :lol: :lol: Again, I'm not a computer guru but happy to dig around in my own time, and from reading around, the use of swap does seem to provoke 'try your best to avoid' reactions.
Nigel, sod's law: the last test without drop_cache is still running (36hrs+ without a crash), which wouldn't be an issue if I were imaging! So much for trying to show the crash on the memory graph! I'm pretty sure the crash is down to this memory thing, so I'm happy that I've at least raised the issue, and your first post also told me you were seeing the same effect on cache and free!
gdb got skipped until now, sorry Nigel. I skipped it because I'd just discovered the drop_cache idea, so gdb will be next. I'll also browse around about swap aggression.
Anyway, I'll do some more tests and keep posting results here, but I'm hoping (come September/October) that when the nights are longer I can do more imaging and confirm that at least the drop_cache approach works, giving me my first full night of imaging (in previous sessions I've packed up before bed in the wee hours, or things have crashed each night when left out overnight)!
Thanks again for comments
4 months 1 week ago #83141

  • Posts: 19
  • Thank you received: 1
3 months 3 weeks ago #83471

  • Posts: 274
  • Thank you received: 32
Do you see similar Python messages in the log file?
Dec 02 21:31:31 astroberry python3[1047]: swig/python detected a memory leak of type 'INDI::BaseDevice::Properties *', no destructor found.
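Checking for these warnings can be automated. A Python sketch (the function name leak_types is hypothetical) that counts swig/python leak warnings per type in log lines, such as the output of "journalctl -b" or /var/log/syslog; which process is logging them is not stated in the thread:

```python
import re
from collections import Counter

# Captures the C++ type named in the swig/python leak warning.
LEAK = re.compile(r"swig/python detected a memory leak of type '([^']+)'")


def leak_types(lines):
    """Count swig/python memory-leak warnings per reported type."""
    counts = Counter()
    for line in lines:
        match = LEAK.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

A steadily growing count over an imaging session would be worth reporting alongside the memory logs.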
Mach1, TS86SDQ, ASI071, ASI174, OAG, focusPro
3 months 3 weeks ago #83541

  • Posts: 274
  • Thank you received: 32
KStars is crashing, so you might have a memory leak. It is not obvious that KStars is crashing because you are out of memory.
My system is quite different: not running Astroberry, 8GB of memory, and I allow the system to page to a swap file. But last week I ran a simple test using the simulators, taking 10K 1-second images and storing the .fits files.

KStars is a substantial program. On my system, just loading it without using Ekos or drivers:
Virtual: 1614128
RES: 212288

Getting through the initial setup and imaging sessions will obviously require more resources, none of which can be considered a memory leak. So I started gathering memory info after 200 images, to allow the software to establish itself.

KStars process, going from 200 to 8,000 images:
# images:        200      4K      8K     Growth
VM (GB):        2.07    2.08    2.11     2%
RES (MB):        351     364     393     12%
MEM%:           4.3%    4.5%    4.9%     0.6%

System:
Used (GB):       1.6     1.7     1.7
Avail (GB):      5.2     5.1     5.1
Swap (MB):        45      77      92
KStars/INDI had around 8.5K files/sockets open, and this remained fairly consistent during the test. So memory did continue to grow, but not alarmingly so. After 8K image files, there was nothing that would stress the system's ability to manage the memory. It did use some swap space, but only at a housekeeping level.
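These VM/RES figures can be sampled directly from /proc. A minimal Python sketch, assuming VmSize and VmRSS in /proc/<pid>/status correspond to the virtual and resident sizes in the table (proc_memory, open_fds and growth_pct are hypothetical names). The Growth column is relative growth, e.g. RES (393 − 351) / 351 ≈ 12% and VM (2.11 − 2.07) / 2.07 ≈ 2%, while the MEM% row is a difference of percentage points (4.9 − 4.3 = 0.6).

```python
import os


def proc_memory(pid):
    """VmSize (virtual) and VmRSS (resident) in kB from /proc/<pid>/status."""
    fields = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            key, _, rest = line.partition(":")
            if key in ("VmSize", "VmRSS"):
                fields[key] = int(rest.split()[0])
    return fields


def open_fds(pid):
    """Number of open files/sockets (entries in /proc/<pid>/fd)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def growth_pct(start, end):
    """Relative growth between two samples, in percent."""
    return 100.0 * (end - start) / start
```

The PID can come from "pidof kstars"; logging proc_memory(pid) and open_fds(pid) every few hundred frames would give a table like the one above.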

I want to suggest something about the scripting you are doing, hoping I don't have too many wrong assumptions.
We see the operating system using a good portion of its memory as a disk cache for performance reasons. I believe that if it needs more memory for resident programs, it will trade off that cache memory to satisfy the requesting program; that is the memory identified as "available". When you force the clean pages out of the cache, presumably you are just moving memory from "available" to "free". I do not think you are making new memory available to KStars, and the operating system will simply get busy re-establishing its cache. I think it would be better to leave management of the cache to the OS: if there are cached pages and some process needs more memory, the OS should be able to free those cached pages itself.
I think the see-sawing shown in the graph might be the OS building up its cache and the script undoing that.

Below is the part of your initial post where you show the transition from pre- to post-crash; I think this is the area referred to:
2022-05-16 18:28:13; Temp=43.3ºC. RAM 'Used': 1.2GB (31.4%); RAM 'Free': 104MB ( 2.8%); RAM 'Available': 2.4GB (64.0%); RAM used in Shared & Buffer: 66.7%. Swap used:100.0%. µSD @/dev/root uses 39,604,448kB.
2022-05-16 18:28:15; Temp=42.8ºC. RAM 'Used': 1.0GB (24.4%); RAM 'Free': 489MB (17.1%); RAM 'Available': 3.1GB (83.1%); RAM used in Shared & Buffer: 66.8%. Swap used:98.8%. µSD @/dev/root uses 39,607,008kB.
2022-05-16 18:28:16; Temp=42.3ºC. RAM 'Used': 459MB (12.1%); RAM 'Free': 847MB (22.4%); RAM 'Available': 3.1GB (83.8%); RAM used in Shared & Buffer: 65.7%. Swap used:98.8%. µSD @/dev/root uses 39,607,008kB.
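Pete converts his log to numbers in Matlab; as an alternative, here is a Python sketch that parses lines in the format shown above. The regular expressions are written against these three sample lines only, so their fit to the rest of the file is an assumption, and the function names are hypothetical.

```python
import re
from datetime import datetime

RAM = re.compile(r"RAM '(\w+)': [\d.]+[KMG]B \(\s*([\d.]+)%\)")
SWAP = re.compile(r"Swap used:\s*([\d.]+)%")
TEMP = re.compile(r"Temp=([\d.]+)")


def parse_line(line):
    """Turn one log line into a dict of numbers; raises ValueError if no timestamp."""
    stamp, _, rest = line.partition(";")
    row = {"time": datetime.strptime(stamp.strip(), "%Y-%m-%d %H:%M:%S")}
    for name, pct in RAM.findall(rest):  # Used, Free, Available
        row[name.lower() + "_pct"] = float(pct)
    for key, pattern in (("swap_pct", SWAP), ("temp_c", TEMP)):
        match = pattern.search(rest)
        if match:
            row[key] = float(match.group(1))
    return row


def parse_log(lines):
    """Parse every data line, skipping messages such as the drop_caches notices."""
    rows = []
    for line in lines:
        try:
            rows.append(parse_line(line))
        except ValueError:
            continue
    return rows
```

Applied to the three lines above, available_pct goes from 64.0 to 83.8 while swap_pct stays between 98.8 and 100.0.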

If the OS does make "available" memory available as needed, then before the crash 2.4GB might still have been accessible. This strongly suggests that the crash is not due to out-of-memory errors, unless a log file you could post somewhere shows otherwise.

About the swap file, which is the only thing I can see that is maxed out:
What you are running is a kind of stress test: a large software application creating thousands of images on what today is considered constrained hardware. At the same time, because of the very small swap space, the test has to run without being able to trade off unused or less-used pages in memory, because the system is not allowed to page them out to disk. At one time I ran on a Raspberry Pi and also on an Intel PC. In both cases I added a USB drive (or better) to run the software from and to use for swap. I certainly would not want to run swap on an SD card.

We know from its use of the swap file that the system is trying to page, perhaps to get the older, less-used pages out of its cache. On my larger system, with memory to spare, it still wants to use the page file. I am not trying to make the case that the lack of swap space is causing a crash. Nor am I saying that KStars is being closed because the OS has started killing processes; I thought it only did that to save itself when unable to satisfy memory requests.
Unless it has got itself into a state where not being able to page makes it unable to turn available memory back into free.
I do wonder whether giving it a 2GB or even a 1GB page file would make a difference. It is tough if you only have a small SD card in the system.
That is the last time, promise.
/Tom
Mach1, TS86SDQ, ASI071, ASI174, OAG, focusPro
Last edit: 3 months 3 weeks ago by wotalota. Reason: Format
3 months 3 weeks ago #83601

  • Posts: 76
  • Thank you received: 5
Using memory as file cache is fine and to be expected/wanted. Paging out things that have not been used for a 'long' time is fine (I would think this is data rather than executables, but could be wrong). I think I've even changed a setting on my Astroberry to make it keep things in memory more rather than swap them out.
The problem is more why KStars is dying: is it a problem with the code, some Linux setting or a bug in some part of Linux? It could just be coincidence, and it's not actually the memory/swap running out in some way. Really there's a need to look at a core dump or trace to see what was happening when it failed. Pete seems to be the only one with the patience to take thousands of photos, and it sounds like that takes a while. Unfortunately I don't know enough to advise how to get that dump; it's ages since I've looked at such a thing…
3 months 3 weeks ago #83602
