AlexA

Q: What causes unresponsive FPGA elements after restart?


Hey guys,

 

I'm at my wits' end here trying to interface with an FPGA. I've currently got it configured as follows:

 

7813R FPGA card in a LV-RT machine, running LV2012 on host and Real-Time machines.

 

Compile: Run when loaded = True

Obtain Reference:

  • Linked to bitfile 
  • FPGA ref bound to typedef
  • Run the FPGA VI = False

FIFOs (2 host-to-target DMAs) depth initialized after reference obtained.

 

Code then utilises reference to read indicators, send values to FIFOs and update controls.

 

On Shutdown: FIFOs stopped, but reference not closed.

 

My problem: All subsequent runs after shutting down the program (via the shutdown method) successfully obtain a reference to the FPGA and initialise the FIFOs (no errors are generated), but attempts to read any of the indicators or update any of the controls do nothing. Indicators show default values, and controls are unresponsive. In the past, I've fixed this by recompiling the FPGA, and sometimes by just re-downloading it. Since I made some changes to the way data is loaded into the FIFOs (absolutely no changes to anything that ever touched the FPGA reference, purely to the way the arrays are formatted), nothing I do seems to recover the responsiveness.

 

I've tried: restarting the RT machine and re-downloading the FPGA (in different orders). I've also added a Close FPGA Reference and set "Start on Run" on the Obtain Reference, then restarted and re-downloaded. Absolutely nothing has worked. I'm currently in the process of removing "Run on load" (performing a recompile), after which I will add "Start on Run" back into the Obtain Reference VI.

 

I'm at my wits' end. Has anyone encountered this sort of problem before? Does anyone know what causes it?


Ok, to follow this up.

 

Currently it looks like it is now behaving itself.


What I did: Removed "Run on load" from the compile flags. Set Obtain Reference to "Start on Run" and made ABSOLUTELY sure that when I closed the reference there was no possibility of that reference still being actively used by one of the worker loops that interacts with the FPGA.

 

I haven't conclusively proved it, but I think the key is to make sure the reference is closed after all loops that could possibly be using it have exited. Hope this helps someone else in the future!
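LabVIEW is graphical, so the shutdown ordering described here can only be sketched by analogy. The Python below is a minimal illustration, assuming a shared reference used by several worker loops; `FakeRef`, `worker` and `run_and_shutdown` are hypothetical names and nothing here is an NI API.

```python
import threading


class FakeRef:
    """Hypothetical stand-in for a shared FPGA reference (not an NI API)."""

    def __init__(self):
        self.open = True
        self.reads = 0
        self.reads_after_close = 0
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            if not self.open:
                # A loop touched the reference after it was closed.
                self.reads_after_close += 1
                return None
            self.reads += 1
            return 0

    def close(self):
        with self._lock:
            self.open = False


def worker(ref, stop):
    # Each loop uses the shared reference until it is asked to stop.
    while True:
        ref.read()
        if stop.wait(0.001):
            break


def run_and_shutdown(n_loops=3, run_time=0.05):
    ref = FakeRef()
    stop = threading.Event()
    loops = [threading.Thread(target=worker, args=(ref, stop))
             for _ in range(n_loops)]
    for t in loops:
        t.start()
    stop.wait(run_time)   # let the loops run for a while
    stop.set()            # 1. ask every loop to exit
    for t in loops:
        t.join()          # 2. wait until every loop really has exited
    ref.close()           # 3. only now close the reference
    return ref
```

Because the close happens only after every `join()` returns, no loop can use the reference once it is closed. Swapping steps 2 and 3 would reintroduce the race.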


Another note for anyone that happens to stumble across this for a similar problem. In the previous post I made a note about the things I have changed to get it to work. This solves the problem of the unresponsive FPGA, but it seems to introduce another problem.

 

"Deployment Completed With Errors". I didn't receive this error when "Run on load" was set to true and "Start on Run" to false (noting that the FPGA did actually work sometimes when configured like this). Now I'm getting it every time I modify the RT code that deals with the 7813R reference and then hit the run arrow again. Restarting the RT machine fixes the problem.

 

I'm guessing it has something to do with dangling references again, but I've made every effort to ensure it's closed before exiting, so I don't know why this is happening.


Your use of the references sounds complicated (although that doesn't explain the latest error).

If you have multiple items needing to access the FPGA, each should open its own reference. If you attempt to run the FPGA when it is already running, you will just get a warning. Then each element should close its own reference. By default, closing the last open reference resets the FPGA; you can change this with a right-click.

Your original issue sounds like either an FPGA VI whose connection has failed (but this should throw an error) or an FPGA that has been reset. Reset is not the same as load, so in your setup a reset would leave the FPGA not running. It is perfectly valid to interact with an FPGA that is not running, hence no error. I hope that explains it.
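As a rough illustration of that explanation, here is a toy Python model of "the last close resets the FPGA, and a reset FPGA still accepts reads". `ToyFpga` and `demo` are hypothetical names; this is a simplified sketch of the described behaviour, not the NI FPGA interface.

```python
class ToyFpga:
    """Toy model of the behaviour described above; not the NI FPGA API."""

    def __init__(self):
        self.running = False
        self.open_refs = 0
        self.counter = 0  # an indicator driven by the FPGA logic

    def open(self, run=False):
        self.open_refs += 1
        if run:
            self.running = True

    def close(self, reset_if_last=True):
        self.open_refs -= 1
        if self.open_refs == 0 and reset_if_last:
            # Reset stops the logic and restores defaults.
            # It does not reload the bitfile or restart the VI.
            self.running = False
            self.counter = 0

    def tick(self):
        # The logic only advances while the FPGA VI is running.
        if self.running:
            self.counter += 1

    def read_counter(self):
        # Reading a non-running FPGA is valid and raises no error.
        return self.counter


def demo():
    fpga = ToyFpga()
    fpga.running = True          # started at download ("run when loaded")

    fpga.open(run=False)         # first program run
    for _ in range(3):
        fpga.tick()
    first = fpga.read_counter()
    fpga.close()                 # last reference closed: FPGA is reset

    fpga.open(run=False)         # second run: opens fine, nothing running
    for _ in range(3):
        fpga.tick()
    second = fpga.read_counter()
    fpga.close()

    fpga.open(run=True)          # third run explicitly starts the VI
    for _ in range(3):
        fpga.tick()
    third = fpga.read_counter()
    return first, second, third
```

In this model `demo()` returns `(3, 0, 3)`: the second run opens without error but only ever reads the default value, which matches the symptom in the first post.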

Open references shouldn't cause deployment issues, only run time issues. There should be an error message earlier in the window somewhere indicating the issue. Sounds like some sort of conflict is occurring.

Cheers,

James


Hi James,

 

Thanks for the insight. The use case for the FPGA is pretty complicated: it's responsible for a number of hardware interactions (motor control, electrical stimulator control) as well as monitoring a laser interferometer. These processes are all relatively independent on the FPGA itself (separate loops), and I want to be able to do things with them independently. So I have things like a timed loop in RT which is responsible for computing drive commands and updating the AO connected to the FPGA, and another timed loop which polls the interferometer indicators on the FPGA and sends the info out to a UI.

 

This is why I've split the FPGA reference. It appears that I wouldn't be able to interact with the FPGA like this if each loop opened its own FPGA reference (the others would return an error?).

 

Could you clarify that last point?

 

Cheers,

Alex


Hi Alex,

I've never used it this way but I believe each loop should be able to open their own reference to the FPGA as long as they are all using the same bitfile/VI.

Cheers,

James


Hey, just an update: I switched to the approach of obtaining a separate reference for every loop.

 

It appears to have solved my problems, both with responsiveness and with deployment after making changes to the RT code.


From what it sounds like, I bet that James' notion was spot-on: the last closing of the session was resetting the FPGA, which is fine but will not result in the FPGA restarting (and, as he noted, opening a session to a non-running FPGA VI is perfectly valid, it's just not running).

 

Ensure (at least) one of the following:

  1. Ensure that no closing session is configured to reset the FPGA (and that you do not abort the RT VI(s) that open the session(s)).
  2. Configure the Open Sessions to force a download (if you do not need to maintain state in the FPGA from one opening of the sessions to the next).
  3. On opening a session, force the VI to run (and possibly filter the warning that occurs if the VI is already running).
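The third option, "always request run on open and filter the expected warning", might look roughly like this in Python. `ToyDevice`, `AlreadyRunningWarning` and `open_and_ensure_running` are hypothetical names invented for the sketch; in LabVIEW the equivalent would be clearing one specific warning code on the error wire.

```python
import warnings


class AlreadyRunningWarning(UserWarning):
    """Hypothetical warning type for this sketch."""


class ToyDevice:
    """Toy stand-in for the FPGA target; not an NI API."""

    def __init__(self):
        self.running = False

    def run(self):
        if self.running:
            # Harmless: another session already started the VI.
            warnings.warn("FPGA VI already running", AlreadyRunningWarning)
        self.running = True


def open_and_ensure_running(device):
    with warnings.catch_warnings():
        # Filter only the expected warning; anything else still surfaces.
        warnings.simplefilter("ignore", AlreadyRunningWarning)
        device.run()
    return device
```

Each loop can call `open_and_ensure_running` on the same device without caring whether it is the first to do so. Filtering by warning type keeps unrelated warnings visible.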


God, I hate LabVIEW FPGA so much. Still having to deal with this crap after all this time.

 

The FPGA spontaneously goes unresponsive. As in, it will work for an hour or two and then suddenly just refuse to operate. Note, I didn't say refuse to connect, it connects just fine, but the controls are just completely unresponsive. No error, just unresponsive. The issue persists across system reboots. Across forced redownloads. I can't figure out any way to stop the issue.

 

I'm at a loss. The only thing holding back a long stream of invective is my colleagues sitting around me.


AlexA,

 

I am not sure what is going on with your system, but we use FPGAs with cRIO all the time, and the FPGA part of the code is the most reliable.

 

We typically open a reference to the FPGA bitfile and share that in a global across all the parallel loops that need to access the FPGA (so only one reference is opened). We never close the reference unless the real-time code is stopped, which in our case is never.

 

Are you getting any errors with your system? Are you opening multiple references (FPGA or otherwise) in a loop continuously?

 

"Unresponsive after a while" points to some sort of memory leak. What if you disable all other parallel FPGA loops and run only one FPGA loop?

 

Are the FPGA inputs updating? Do you have an LED pulsing with your FPGA as a heartbeat to see if it is indeed functioning at all?
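A software version of the heartbeat idea is to have the FPGA increment a counter indicator and have the host check that it changes between reads. The Python below is a minimal sketch of that host-side check; `heartbeat_alive`, `read_counter` and `wait` are hypothetical names, and how the counter is actually read from the FPGA is left to the caller.

```python
def heartbeat_alive(read_counter, wait, samples=3):
    """Return True if the counter changes between two consecutive reads.

    read_counter: callable returning the current heartbeat value.
    wait: callable that pauses for longer than one heartbeat period.
    """
    last = read_counter()
    for _ in range(samples):
        wait()
        now = read_counter()
        if now != last:
            return True   # the FPGA logic is advancing
        last = now
    return False          # value never changed: logic is probably not running
```

A reference that opens without error but fails this check would point to an FPGA VI that is loaded but not running, rather than to a connection problem.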

 

Can you monitor the FPGA outputs with a scope or meter of some sort?

 

What are the memory and CPU usage on your RT target? Use NI Distributed System Manager to check. Maybe it is your RT target that is not running.

 

Neville.


Hi Neville,

 

Thanks for the thoughts.

 

Addressing things:

1) The globally shared reference seems interesting. Do you encapsulate it in a functional global or just a raw global?

 

2) I don't open multiple references continuously. I open one for one loop, one for the other. In the past I've tried just opening a single reference and branching the wire, but that seems to be even more unreliable.

 

3) Whether or not FPGA inputs update is dependent on whether at least one Open Ref manages to complete (because of my architecture/separation of responsibility). In some cases, one ref will acquire successfully while the other will fail.

 

4) FPGA outputs are internalised in this case.

 

5) Memory & CPU are stable while running. This is a restart-to-restart problem (or appears to be). When I say they become unresponsive "after a while", I mean that, after a random number of system restarts, one or more Acquire Reference operations can fail.

 

Thanks again for your thoughts on this.


Have you considered that you may just have a dud FPGA card or a problem with the PCIe/PXI bus? I did some work on a system that had an industrial PC with a PCI bus extender, and had lots of very strange hardware problems that just went away when we used a different PC vendor.
