Petr Mazůrek

Members · 11 posts
Everything posted by Petr Mazůrek

  1. Minor update: We tested the whole thing on a virtualized Windows server. We can see a slowdown of the datastore loop, but no other loops are affected. The behaviour on Linux RT therefore really is different, and it causes the problems to appear sooner.
  2. I tried to remember what led to this design; I think it was the idea of having the whole thing wrapped in something better defined than just a feedback node (it used to be an FGV). I suppose it doesn't achieve much.
  3. Scaling of data is important for us, so we would like to build upon the existing structures within the code.
  4. Are there any more details available on how occurrences are used in the case of notifiers and queues?
  5. The DataStore is shared across the machine, which runs multiple IOCs, so the queue can be accessed by the name _DATASTORE from other pieces of code.
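To make the pattern above concrete for readers unfamiliar with LabVIEW, here is a minimal Python sketch of the described design: one shared store registered under a well-known name ("_DATASTORE"), with keys playing the role of variant attributes for fast lookup. All class and function names are illustrative, not NI API.

```python
import threading

# Hypothetical sketch: a process-wide registry of named stores, so that
# any piece of code can "obtain by name" the same shared instance --
# roughly what a LabVIEW single-element queue obtained by name provides.
_registry = {}
_registry_lock = threading.Lock()

class DataStore:
    """One key/value store; keys stand in for variant attributes."""
    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def write(self, key, value):
        with self._lock:
            self._data[key] = value

    def read(self, key):
        with self._lock:
            return self._data.get(key)

def obtain(name="_DATASTORE"):
    """Like obtaining a queue by name: every caller gets the same store."""
    with _registry_lock:
        if name not in _registry:
            _registry[name] = DataStore()
        return _registry[name]
```

Two independent modules calling `obtain("_DATASTORE")` see the same data, which mirrors the cross-IOC sharing described in the post.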
  6. So we made some progress with diagnostics of this thing. We tried to condense it into the attached report. In short, we managed to isolate this to access to notifier references, which is clearly implemented differently on PharLap and Linux. We tried swapping notifiers for queues, and it improves reads, but writes are still slower. I can post the code after some revision, to make sure you don't need extra packages if you want to try it yourselves. LinuxRMC CPU Load_New_revised.pptx
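For readers outside LabVIEW, the semantic difference being swapped here is roughly this: a notifier is a latest-value-wins broadcast (readers can miss intermediate values), while a queue buffers every element in order. The sketch below illustrates that contrast in Python; the `Notifier` class is an illustrative stand-in, not the NI API.

```python
import queue
import threading

class Notifier:
    """Latest value only: a new send overwrites the previous one,
    so slow readers skip intermediate values (notifier-like semantics)."""
    def __init__(self):
        self._cond = threading.Condition()
        self._value = None
        self._seq = 0  # counts sends so waiters can detect fresh data

    def send(self, value):
        with self._cond:
            self._value = value
            self._seq += 1
            self._cond.notify_all()

    def wait(self, last_seq=0, timeout=None):
        """Block until a send newer than last_seq; return (value, seq)."""
        with self._cond:
            self._cond.wait_for(lambda: self._seq > last_seq, timeout)
            return self._value, self._seq

# A queue, by contrast, preserves every element in order:
q = queue.Queue()
for i in range(3):
    q.put(i)
```

With the notifier, two rapid sends leave only the last value visible; the queue hands back all three elements. Which behaviour you want depends on whether intermediate values matter.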
  7. I tried forcing FIFO scheduling on all threads, after making sure the policies were all set to other beforehand.

     lvrt.conf:
     thpolicy_tcrit=other
     thpri_tcrit=0
     thpolicy_vhigh=other
     thpri_vhigh=0
     thpolicy_high=other
     thpri_high=0

     Command line:
     chrt -p -a -f 1 <pidnumber>

     I am at a better rate now (it would be acceptable), but it is not comparable to the Win10 build in any way. It is also stable, which was not the case when I tested FIFO scheduling in the past. The last thing I tried was killing the mDNS thread, but it did not change anything. Here is the trace with FIFO enabled. I will keep testing it.
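For reference, the `chrt -p -a -f 1` step above can also be done programmatically. The sketch below uses Python's `os.sched_*` wrappers (Linux only) to read a process's scheduling policy and attempt the SCHED_FIFO switch; it is an assumption-laden illustration, and note that unlike `chrt -a` it touches only the one task id, not every thread of the process.

```python
import os

def describe_policy(pid=0):
    """Return the scheduling policy name of a process (0 = calling process)."""
    if not hasattr(os, "sched_getscheduler"):  # non-Linux fallback
        return "unsupported"
    policy = os.sched_getscheduler(pid)
    names = {os.SCHED_OTHER: "SCHED_OTHER",
             os.SCHED_FIFO: "SCHED_FIFO",
             os.SCHED_RR: "SCHED_RR"}
    return names.get(policy, str(policy))

def try_set_fifo(pid=0, priority=1):
    """Attempt the equivalent of `chrt -p -f 1 <pid>` on a single task.
    Requires CAP_SYS_NICE/root, so failure is reported, not raised.
    To emulate `chrt -a`, you would iterate the ids in /proc/<pid>/task."""
    if not hasattr(os, "sched_setscheduler"):
        return False
    try:
        os.sched_setscheduler(pid, os.SCHED_FIFO, os.sched_param(priority))
        return True
    except PermissionError:
        return False
```

Run unprivileged, `try_set_fifo` simply returns False, which makes it safe to probe whether the environment would allow the experiment at all.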
  8. It's so great to have a fresh pair of eyes on this. mDNS struck me too. I suppose that the EPICS engine takes quite a load, but it opens virtual channels that don't need searches happening all the time. We've had other related problems with UDP before: the buffer was too big; we optimized it and now it works like a charm. Any tips on what to look for? Is this TCP? Regarding network variables, we are not using any. The codebase is large, though, and our partners from LLNL used variables in the past. Is it maybe spawned automatically by LVRT? It seems to me that some of these services are. I noticed some DAQmx stuff in there. Thanks for the feedback.
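Since the post mentions an oversized UDP buffer as a past culprit, here is a small sketch of how to query and request a specific `SO_RCVBUF` size on a UDP socket. The 64 KiB figure is an illustrative choice, not a value taken from the post.

```python
import socket

def set_udp_rcvbuf(size=64 * 1024):
    """Request a UDP receive-buffer size and return the effective value."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
    # The kernel may round or clamp the request (Linux doubles it for
    # bookkeeping overhead), so read back the effective size rather
    # than trusting the requested one.
    effective = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    s.close()
    return effective
```

Reading the value back is the important part when diagnosing buffer-related latency: the size the kernel actually grants is what governs how much data can pile up between reads.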
  9. My thoughts exactly. The PreemptRT architect I reached was interested, but he was transferred to a different project because they consider the Linux RT transition solved for their key products. I guess that scaling another application to a reasonable size could reveal the same behaviour. PharLap has been used in other places where Big Physics happens, but it is still too early to migrate any large-scale project to Linux (even though you pay in gold for a single RMC). Maybe answers will start coming when key customers are affected.
  10. Hi everyone, this is a lengthy one, but I think it deserves some more information.

      INTRODUCTION

      I'd like to state first and foremost: I understand that VMs are not officially supported, and that Idea Exchange posts on how to set up your own Linux virtual machine are not an official source of information. That being said, we experimented with just that and discovered that we hit a certain performance cap, which evidently exists on cRIOs and PXIs as well.

      SYSTEM DESCRIPTION

      Our applications are large and can be described as follows:
      1) The code is not object oriented but consists of individual processes with wrappers for commands. These processes are governed by what we call an IOC (input-output controller), and we use QMH for communication.
      2) Several IOCs can run on a single machine, and they publish data over network streams to 'Developer GUIs' (lower-level access).
      3) All variables are stored in a local datastore (basically a queue holding one variant, with keys as variant attributes to accommodate fast search).
      4) Meaningful data are then served by an EPICS 3.14 server. Such a server is hosted in one of the IOCs, which we describe as a ROOT IOC. PV variables are then used by 'EPICS GUIs' (higher-level control and supervision).
      5) We used to develop this for PharLap on the RMC-8354, so such machines are called RMCs. RMC-8354 on NI page
      6) As an example in our labels and terminology: RMC101 hosts two IOCs, a ROOT IOC and a Camera Manager IOC.

      ISSUES

      The PharLap machines frequently crashed (about once a week), and the RMC-8354 became a mature product with no viable rack-mount replacement. We therefore started experimenting with VMs on either physical or cloud-based platforms (vxRail). It soon became obvious that:
      1) Linux can be easily managed and is very stable.
      2) Deployment is in fact easier than for a PharLap machine.
      3) Our software cannot run as many IOCs as one would expect.
      We confirmed this on genuine NI hardware as well (PXI and cRIO with varying numbers of IOCs active). Let me describe the behaviour: everything scales nicely until we exceed a certain CPU load. We know the CPU load data are reliable, so it made no sense that the load stopped increasing when we kept adding software to the machine. Over several months of experiments and various hacks we isolated the problem to scheduling. We managed to make a few traces, and it soon became obvious that PharLap and Linux are very different beasts. The PharLap trace is very organized, and it manages to run 4-5 IOCs with no problems (CPU load is very high though, up to 95%).

      A) PHARLAP TRACE

      Linux, on the other hand, stops working when we add more than 2 IOCs, and even 2 demanding IOCs can underperform.

      B) LINUX TRACE WITH FUNCTIONAL CODE
      C) LINUX TRACE WITH DYSFUNCTIONAL CODE

      IN-DETAIL DESCRIPTION

      Any increase in CPU load on a Linux machine causes more threads to be spawned, and a very high number of context switches is detected. Observing CPU load therefore does not help you, because it is thread switching that becomes the issue. Why this happens probably comes down to how the compiler for Linux works and how it sets scheduling in the NI Linux kernel. We tested multiple things, including changes in scheduling strategies, but these hacks are never stable and help only in specific arrangements of our IOCs. One such example:

      pidof lvrt
      2074
      chrt -p -a -f 1 2074

      This forces all threads to use the FIFO scheduler with the lowest priority. Magically, this can improve performance; and then it crashes. CPU load rises from 50 to 70% (since the machine can actually do more work instead of just swapping threads around). I investigated threadconfig.vi and started experimenting with the number of threads being spawned. My latest discovery is that limiting the number of available threads per core, combined with increasing the number of available cores, improves performance.
      I added these into lvrt.conf:

      ESys.Normal=10
      ESys.StdNParallel=-1
      ESys.Bgrnd=0
      ESys.High=0
      ESys.VHigh=10
      ESys.TCritical=6
      ESys.instrument.Bgrnd=0
      ESys.instrument.Normal=0
      ESys.instrument.High=0
      ESys.instrument.VHigh=0
      ESys.instrument.TCritical=0
      ESys.DAQ.Bgrnd=0
      ESys.DAQ.Normal=0
      ESys.DAQ.High=0
      ESys.DAQ.VHigh=0
      ESys.DAQ.TCritical=0
      ESys.other1.Bgrnd=0
      ESys.other1.Normal=0
      ESys.other1.High=0
      ESys.other1.VHigh=0
      ESys.other1.TCritical=0
      ESys.other2.Bgrnd=0
      ESys.other2.Normal=0
      ESys.other2.High=0
      ESys.other2.VHigh=0
      ESys.other2.TCritical=0

      I believe the default number is 20 (threadconfig.vi somehow reported 26?), and we don't use the other execution priorities, so I limited standard threads to 10 per core. Also, completely omitting time-critical threads doesn't help in any way; adding ESys.TCritical=6 improved performance to the best it has ever been. I don't know why: we are not using the time-critical priority anyway, and I searched the code with a script just to make sure. One sure way to improve things is raising the clock frequency, but that is not the latest trend, especially not in the VMware world.

      COMPARISON WITH WIN10 BUILD

      What makes me believe that this is not our fault? I admit that we are doing something very complex, after all. But the whole thing is not really using RT in any way, and the Windows 10 build of the same application runs like a charm on much less powerful PCs.

      SUMMARY

      I am open to any suggestions on what to try next. Maybe you share the same experience? The scariest thing for us is that NI does not offer a follow-up to the RMC that could perform as well as an 8-year-old PharLap machine. We have contacted the American NI team, but the scale of the project does not help in explaining what is wrong. Their general response is "we will get back to you". While I understand that, we need to somehow move forward.

      Thank you for any tips,
      Petr
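The arithmetic behind that lvrt.conf snippet can be made explicit. Assuming, as the post implies, that the ESys.* counts are threads per core (StdNParallel=-1 being an "auto" flag rather than a pool size), the total pool grows linearly with core count, which is why capping Normal at 10 instead of the believed default of 20 roughly halves the standard pool. The helper below is an illustrative sketch of that reasoning, not NI-documented behaviour.

```python
# Per-core pool sizes mirroring the post's top-level ESys.* keys.
esys = {
    "Normal": 10,
    "StdNParallel": -1,  # flag value (auto), not a pool size
    "Bgrnd": 0,
    "High": 0,
    "VHigh": 10,
    "TCritical": 6,
}

def pool_threads_per_core(cfg):
    """Sum the per-core pool sizes, excluding the StdNParallel flag."""
    return sum(v for k, v in cfg.items() if k != "StdNParallel")

def total_pool_threads(cfg, cores):
    """Total worker threads if each pool is allocated per core."""
    return pool_threads_per_core(cfg) * cores
```

On a 4-core VM this configuration would yield 26 threads per core and 104 in total, versus well over 80 per the believed defaults; fewer runnable threads per core is one plausible reason the context-switch storm subsides.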
  11. Hi Lukasz, have you made any progress on this? I am investigating RT Linux vs Windows and I observe similar symptoms. I might start a topic on the subject... Thanks, Petr
