CUDA Launch: Out of Resources Error and Strongly Typed Methods

You may see this cryptic error now and then when developing with CUDAfy.NET.  Typically the reason comes down to passing the wrong parameters to the device function.  This example is going to end in tears:

        [Cudafy]
        public static void Scale(GThread thread, ComplexFloat[] c, float scale)
        {
            int id = thread.get_global_id(0);
            c[id].R = c[id].R * scale;
            c[id].I = c[id].I * scale;
        }
        ...
        int N = 1024;
        gpu.Launch(gridSize, BLOCK_LEN, "Scale", devBuffer, 1.0/N);
        ...

It can be corrected by changing 1.0 to 1.0F to ensure we stay with single precision floating point. As it stands above, the result of the division is double precision floating point, which is 8 bytes. The device function expects single precision, which is 4 bytes. Hence the out of resources message. Kind of makes sense.
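For reference, the corrected string-based launch then looks like this (the same host snippet as above, with only the literal changed):

        int N = 1024;
        // 1.0F/N is evaluated in single precision, matching the kernel's float parameter.
        gpu.Launch(gridSize, BLOCK_LEN, "Scale", devBuffer, 1.0F/N);
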
A better policy can be to use strong typing in the Launch.

        [Cudafy]
        public static void Scale(GThread thread, ComplexFloat[] c, float scale)
        {
            int id = thread.get_global_id(0);
            c[id].R = c[id].R * scale;
            c[id].I = c[id].I * scale;
        }
        ...
        int N = 1024;
        gpu.Launch(gridSize, BLOCK_LEN, Scale, devBuffer, 1.0F/N);
        ...

The only downside is a slight performance hit due to reflection. The degree of this is nowhere near as high as with the more elegant-looking dynamic launching that CUDAfy supports.

        gpu.Launch(gridSize, BLOCK_LEN).Scale(devBuffer, 1.0F/N);

The first time this is called you can get hit with about 20ms of dynamic runtime goodness. Subsequent calls appear efficient. You have, however, also lost your strongly typed safety, which does not appear to worry millions of programmers around the world.
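If you want to see this for yourself, a rough measurement is easy enough. The sketch below slots into the host code above and times two consecutive dynamic launches with a Stopwatch (the timings include the kernel itself, which is negligible here):

        var sw = System.Diagnostics.Stopwatch.StartNew();
        gpu.Launch(gridSize, BLOCK_LEN).Scale(devBuffer, 1.0F/N);  // first dynamic launch pays the binding cost
        gpu.Synchronize();
        Console.WriteLine("First dynamic launch:  {0} ms", sw.ElapsedMilliseconds);
        sw.Restart();
        gpu.Launch(gridSize, BLOCK_LEN).Scale(devBuffer, 1.0F/N);  // subsequent launches are cheap
        gpu.Synchronize();
        Console.WriteLine("Second dynamic launch: {0} ms", sw.ElapsedMilliseconds);
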


Programmers, Engineers and Politics

Should engineers and other technical people be more involved in politics?  Yes, we probably should.  With our analytic, problem-solving minds we could or should be able to provide a stabilizing hand to the sound-bite-laden hubris.  However, that means dealing with the kind of people that we’d normally prefer to avoid.  Almost universally there has been a rise of a political class – career politicians with no experience of the world beyond that of politics.  There is the amusing anecdote of the former British minister for health under Tony Blair, Alan Milburn, who by default was in charge of one of the world’s largest employers – the NHS.  He actually had business experience, having run a second-hand bookshop in the 60s called Daze of Hope, which more usually went under the name Haze of Dope.  This is still more than could be said of his fellow ministers, with the exception of John Prescott, who had been a ship’s steward in the 1950s.

John Prescott, British Labour politician, during his last day as Deputy Prime Minister. (Photo credit: Wikipedia)

The political class are the game players that have come to increasingly rule the landscape of governance, intelligence and even that of business.  And game is the apt word, even if they are literally dealing with the welfare and lives of millions of people.  The financial crisis of 2008 and its continuing fallout remains a complex scenario.  The causes are well discussed and apparently were clear to see for many in the know, plus the all-knowing man in the pub who at heart knew that we can’t all get rich off selling each other houses with fantasy-world values.  What remains a greater mystery is the inability to decide on a means of exiting the recessions and declining fortunes hitting many economies.  Is it possible to look at this as an engineering problem?  Here is my take on it.

First there was the bailout.  Now that was essentially done to stop global meltdown and general bad news by ensuring that banks could still lend to each other and keep the flow of money going.  Many too-big-to-fails with men in the right places ensured that their institutions not only survived but were rapidly back to declaring record levels of profit (of which next to none is paid out in tax).  Meanwhile thousands of real-world product and service businesses, small and large, went to the wall as they saw orders placed on hold or cancelled and their lines of credit drying up.

Second we saw the complete lack of prosecutions, or Too Big to Jail.  Unlike the Savings and Loan Crisis of the late 1980s, we saw comparatively few prosecutions of guilty bankers – outliers like Madoff going to jail until the end of time and a wet-behind-the-ears French minion (Jérôme Kerviel) getting a fine of €5 billion were meant to assuage our anger and thirst for blood.

Bernard Madoff’s mugshot (Photo credit: Wikipedia)

Thirdly, governments discussed how we might best reanimate the wavering economies.  Interestingly, the weakest economies were dealt an additional blow when their credit ratings were downgraded, which in turn increased their cost of borrowing.  Even more interesting is that the credit rating agencies that now chose to downgrade whole countries were the same agencies that saw fit to put AAA ratings on the toxic subprime mortgage packages.  And in parallel, the same politicians who allowed the whole mess to occur were the ones to decide how we’d crawl our way out.  Primary solutions involved slashing interest rates, quantitative easing and austerity.  Now the former two essentially give the banks access to more money in the laughable hope they will lend it to businesses that can resurrect the economy.  Austerity is even more controversial.

From what I can figure, almost no country has ever managed to successfully use budget cuts to get out of a financial hole.  Yes, there is no point in wasting money, but not every country is as extreme as Greece, and therefore vast overspending on internal costs is rarely the key issue (Greece also received massive loans before the proverbial hit the fan in order to continue consuming foreign goods that it could ill afford).  Basically countries are forced by world or continent level banks (World, European, etc.) to conform to certain economic indicators.  These include trade deficits, interest rates, spending, etc.  At stake are the ratings doled out as discussed previously, which in turn affect borrowing rates.  Bizarrely enough, making gross cuts in things like welfare and education scores big points with the lords that be, even if it is likely to result in a more unstable society and a long-term skills shortage.  When the public protests we hear “We’re all in it together, we must all suffer.”  Unfortunately, if you are earning €10K per year and you lose 10% of that, it is far more painful than losing, say, 10% of €100K.  Yes, really.

Cutting back on care is also an idiotic move that is rooted in fiction and exposes a cruel element of human nature.  A government says it needs to cut €1 billion from the home help budget.  Now has this got anything to do with needing that €1 billion somewhere else?  Let’s forget any level of human compassion here and act like a hard, cold-blooded money man.  That €1 billion is paid to carers who help old and disabled people stay at home longer.  We cut it and watch the real consequences.  First of all, those carers pay tax, so we get a decent whack of that back anyway.  Secondly, we end up paying unemployment money and housing benefit to have some capable person sit at home, or where welfare is not present we risk increasing crime rates.  Thirdly, we need more places for old people in nursing homes, which is far more expensive.  Fourthly, we lose the flow of money that these now out-of-work people would bring to their communities in spending, and the sales tax that goes with it.  Such cuts have nothing to do with economic reality.

Despite being told by financiers that we engineers do not understand the complex world of national and global economics, that does not mean we cannot apply some hard pragmatism.  As far as I know, finance does not break down as we go to the world level the way Newtonian physics does when moving to the subatomic.  No, it is quite simple at heart, and all the layers of shenanigans above are just that: a means of disguising and delaying hard truths.  Goldman Sachs fudged the books for Greece, and in the 1990s and 2000s the Irish gave the impression that they were the Celtic Tiger and were richer per capita than Germany.  This was, as was borne out, complete bullshit.

A country within its own borders need only decide what standard of living it wishes to maintain.  Is there a minimum level that we refuse to see people live below?  We then as a society distribute the work to ensure we meet that level.  Infrastructure such as roads, railways, airports, power generation and distribution, water and sewage needs building and maintaining; we need healthcare, food, transport, education, and we want entertainment and some luxuries.  If we can meet all these needs within our own borders the state bank need only create enough money to grease the system.  The money is invented from thin air, as it always could be and still is.

Hoover Dam Bypass (Photo credit: dherrera_96)

After the Great Depression of the early 1930s the USA embarked on a series of massive public projects.  They were broke, but they could still invent the money to get the work done.  How successful this was is still debated, with many believing that real growth in the USA only began after World War II.  The rise of Nazi Germany was in response to the devastated German economy of the 1920s.  But if the economy was so poor, how could they afford to build the greatest war machine in history?  And amazingly, even a broke country busy cutting social services can pull plans out of the hat to build, say, a high speed railway or fight yet another war in a far-flung land.

Ultimately the country draws parallels with the family – if we decide to outsource our housework then we better have the means to pay for it in an external valid currency (at home we can choose to use monopoly money or barter, outside our home we typically use the national currency but there are also local community currencies and the likes of Bitcoin).  If we choose to close our own coal mines and steelworks because it is cheaper abroad then we better have enough foreign currency to pay for it.  That means we need to export enough other things.  When we don’t then we need to borrow, so we need a good credit rating.  The good rating is dependent on factors like our trade deficit.  This can be hidden by making arbitrary cuts in internal budgets and playing in the casinos that our financial centres resemble, but ultimately a country cannot go on forever importing more than it exports.

Challenger (Photo credit: jesvwilliams)

Richard Feynman famously said at the conclusion of the Challenger enquiry, “For a successful technology, reality must take precedence over public relations, for nature cannot be fooled.”  I’d say this applies equally to economics and society.  As engineers, scientists and programmers we can keep this in mind next time we’re downwind of a mob of game-playing bankers and politicians.

Portable CUDA on NVIDIA, AMD, Intel GPUs and CPUs

I’ve just uploaded a new version of the open source CUDAfy.NET SDK that targets Linux.  For those that do not know, CUDAfy.NET is an easy to use .NET wrapper that brings the NVIDIA CUDA programming model and the power of GPGPU to the world of C#, VB, F# and other .NET users.  Anyone with a brief understanding of the CUDA runtime model will be able to pick up and run with CUDAfy within a few minutes.

One criticism of CUDA is that it only targets NVIDIA GPUs.  OpenCL is often regarded as a better alternative, being supported by both NVIDIA and AMD on their GPUs, but also on CPUs by Intel and AMD.  That should have been the end of the story for CUDA were it not for the fact that OpenCL remains, even in its 1.2 version, a bit of a dog’s breakfast in comparison to CUDA.  I guess that is what it gets for trying to be all things to all men.  Although in theory platform agnostic, to get the best out of it you need to take care of a number of often vendor-specific issues, and if you need higher level maths libraries such as FFTs and BLAS then it gets tougher.  Ultimately, if you’ve been reared on CUDA, moving to OpenCL is not a nice proposition.

CUDAfy .NET brings together a rare mix of bedfellows

Because CUDAfy is a wrapper we were able to overcome this limitation.  The same code, with some small restrictions, can now target either OpenCL or CUDA.  The restrictions include no FFT, BLAS, SPARSE and RAND libraries, and no functions in structures.  Other than this you can use either OpenCL or CUDA notation for accessing thread, block and grid ids and dimensions, the same launch parameters, the same handling of shared, constant and global memory, and the same host interface (the methods you use for copying data etc.).  Now with the release of the CUDAfy Linux library you can run the same application on both Windows and Linux, 32-bit or 64-bit, and use AMD GPUs, NVIDIA GPUs, Intel CPUs/GPUs, AMD APUs/CPUs/GPUs and also some Altera FPGA cards such as those from BittWare and Nallatech.  To run such an app you need only have the relevant device drivers and OpenCL libraries installed, plus Mono on Linux.  If you have an NVIDIA or AMD GPU in the system and up-to-date drivers then it should just work.  For developing you’ll need the relevant CUDA or OpenCL SDK; NVIDIA, Intel, AMD and Altera all have theirs.  An advantage of OpenCL is that the compiler is built in, so you do not need Visual C++ or gcc on the machine to build new GPU functions.

The aim we have in mind is to cater for the increasing number of embedded or mobile devices with GPU capabilities.  NVIDIA is yet to support CUDA on their Tegra, but there are already ARM and other devices that have OpenCL libraries.  Examples include the Google Nexus 4 phone and Nexus 10 tablet.  The embedded market may often be less visible (excuse the pun) than its more obvious phone and tablet counterparts, but since these devices are hidden in so many machines their numbers are huge.  Power consumption is always an issue, which is why being able to use the GPU for general purpose processing instead of just graphics is so important.  Being able to take advantage of that in a straightforward way is therefore vital, and CUDAfy .NET can be a solution for such devices running Windows or Linux.

Google Nexus 10 has OpenCL libraries onboard

CUDAfy .NET is an open source project hosted at CodePlex and is licensed under LGPL.  You can freely link to the library from commercial applications.  Any changes you make to the source code must be re-submitted.  Alternatively there is a commercial license available that allows you to do the things you cannot under LGPL, like embedding (modified) versions into your application, and it includes support.  All donations received for CUDAfy .NET go to Harmony Through Education, a charity helping handicapped children in the third world.

Below is an example application written in C# that can be targeted for either CUDA or OpenCL by simply changing an enumerator.

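A minimal sketch of such a program, following the usual CUDAfy pattern, is shown below.  The two enumerators at the top (the host device type and the translator language, which are switched together here) are the only lines that change between CUDA and OpenCL targets; exact names should be checked against the CUDAfy version you have.

        using System;
        using Cudafy;
        using Cudafy.Host;
        using Cudafy.Translator;

        public class CudaOrOpenCLExample
        {
            // Change these two lines to switch target platform.
            private const eGPUType TargetType = eGPUType.OpenCL;       // or eGPUType.Cuda
            private const eLanguage TargetLanguage = eLanguage.OpenCL; // or eLanguage.Cuda

            [Cudafy]
            public static void AddVectors(GThread thread, int[] a, int[] b, int[] c)
            {
                int id = thread.get_global_id(0);
                c[id] = a[id] + b[id];
            }

            public static void Main()
            {
                const int N = 1024;
                CudafyTranslator.Language = TargetLanguage;
                CudafyModule km = CudafyTranslator.Cudafy();
                GPGPU gpu = CudafyHost.GetDevice(TargetType, 0);
                gpu.LoadModule(km);

                int[] a = new int[N], b = new int[N], c = new int[N];
                for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2 * i; }

                // Same host interface regardless of target: allocate, copy, launch, copy back.
                int[] devA = gpu.CopyToDevice(a);
                int[] devB = gpu.CopyToDevice(b);
                int[] devC = gpu.Allocate<int>(c);

                gpu.Launch(N / 256, 256).AddVectors(devA, devB, devC);
                gpu.CopyFromDevice(devC, c);
                gpu.FreeAll();

                Console.WriteLine("c[42] = {0}", c[42]);  // expect 126
            }
        }
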

Low Cost, High Performance FPGA and GPGPU Based Data Acquisition System

The Xilinx evaluation boards such as the ML605 (Virtex-6), KC705 (Kintex-7) and VC707 (Virtex-7) give access to high end FPGAs on a relatively low budget.  Actually the cost of these boards is about the same as you’d pay if you wanted to buy just the FPGA on the board itself, so essentially you are getting all the other features such as USB, memory, Ethernet, PCI-Express interface, etc. for free.  A very interesting use for these boards is in combination with analogue-to-digital or digital-to-analogue converters (ADCs and DACs).  The FMC form factor has grown rapidly in popularity, giving a huge range of high performance modules.  The Xilinx evaluation boards listed above can all take up to two such FMC modules.  This makes for an extremely powerful processing platform.  Below you can see a Xilinx KC705 Kintex-7 Evaluation Board and a 4DSP FMC150 ADC/DAC Daughter Module.  We have a lot more FMC modules available.


Xilinx Evaluation Board Problems

The main disadvantage is that the PCBs are on the large side, and when FMCs are mounted and the board is placed in a PCI-Express slot of a standard PC or workstation, it is no longer possible to close the case.  The analogue cables also protrude awkwardly from the open side.  This makes such systems very fragile and liable to damage should a cable be accidentally pulled or an item fall into the open computer.


1U Rack Solution

To provide a solution we have come up with a 1U 19” rack platform that can house a Mini-ITX motherboard, a Xilinx evaluation board in a PCIe 16x slot via a riser, hard drives and FMC modules.  Short analogue cables connect the FMC analogue connectors to the front panel.  This has the secondary advantage of allowing the use of larger, more robust and convenient SMA connectors in place of the customary SSMC or MMCX connectors of the FMC modules.  The power supply is an external laptop-style unit rated at 120W, which is more than sufficient.


400Mbytes/second Storage to a Single Drive

In the photographs you can see a system with a 4DSP FMC108 8 channel 250MHz ADC.  Using a single SSD we can acquire and store data at more than 400Mbytes/second sustained.  Furthermore we can elect to do some of the processing on the CPU.

OpenCL and General Purpose GPU on Intel CPU

Using the latest Intel Core CPUs we get good OpenCL performance.  This allows us to make use of the rapid development possibilities of a CPU or GPU, putting, say, complex image processing there and leaving simpler pre-processing tasks to the FPGA.  This flexibility can be a huge benefit during proof of concept stages when algorithms are changing frequently.  In my next post I’ll look at some code and show how we bring the FPGA board, Intel CPU and storage together.
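As a small taster ahead of that post, the sketch below shows one way to enumerate the OpenCL devices CUDAfy can see and pick out the Intel CPU by name.  This is a sketch only: device naming varies by driver, and the assumption that the enumeration index matches the device id should be verified on your system.

        int index = 0, chosen = 0;
        foreach (GPGPUProperties prop in CudafyHost.GetDeviceProperties(eGPUType.OpenCL, false))
        {
            Console.WriteLine("{0}: {1}", index, prop.Name);
            if (prop.Name.Contains("Intel"))   // e.g. the Core CPU's OpenCL device
                chosen = index;
            index++;
        }
        GPGPU gpu = CudafyHost.GetDevice(eGPUType.OpenCL, chosen);
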

OpenCL logo (Photo credit: Wikipedia)

Availability

Each 1U chassis is made from aluminium and is customized with the required drillings.  It’s a relatively small amount of work for a very professional result that will protect your evaluation system and even allow you to take it out of the lab.  If you are interested in this platform or something similar for your Xilinx FPGA evaluation board based projects then please contact us.

NVIDIA – Clear as Mud – Hyper-Q and Dynamic Parallelism on Laptops

Until now NVIDIA CUDA’s powerful new Hyper-Q and Dynamic Parallelism features were only available on the Tesla Kepler K20 and some Quadro K-series cards.  Geforce cards do not support them.  The main reason for this, apparently, is that these functions require features only available on some Kepler architectures.  However that is not the full story, since the Geforce Titan supposedly does support Hyper-Q and Dynamic Parallelism but not dual copy engines or GPUDirect RDMA, as we’ll see below.

Now that is rather disappointing news if you want to experiment with these features on a lower cost platform, but there is an alternative.  Hyper-Q and Dynamic Parallelism will be available on certain laptops with the GK208 chip.  And it is not only expensive laptops; there will likely be an Acer model coming in around $700.  Why is this?  Many programmers develop their CUDA algorithms on a laptop and then move to the large, expensive (and very likely shared) Tesla workstations later.  I’m not clear on whether dual copy engines will also feature.  Probably not.  Titan also misses this powerful ability to transfer data to and from the GPU concurrently.  For the right algorithm it can make a big difference.  I’ve got a GTX680 here which wipes the floor with the Quadro 4000 in terms of single precision floating point processing; however, when there is a lot of data transfer the advantage can switch around.  Dual copy engines are actually present on almost all NVIDIA cards, only they are not enabled.  In some earlier series cards they could be enabled by hacking the firmware.  In some Kepler cards they and other Tesla features can be enabled by unleashing a soldering iron on your board.

With NVIDIA, accurate information is about as hard to get as water in the desert.  There is more misinformation flying around than during the Iraq war.  Supposedly the GT730M will have the GK208.  Except when it has a GK107.  Then there is GPUDirect.  This marketing word, in its (Linux only) GPUDirect RDMA flavour (as opposed to the GPUDirect for Video variant), refers to the ability to make use of a little-used PCI-Express feature for peer-to-peer data transfers.  This is a fantastically important feature, allowing data to travel between GPUs, network cards and other PCIe devices without burdening the CPU and system memory.  At GTC2012 this was said to be a Kepler feature and all that was needed was CUDA 5.0 and the latest drivers.  You could then adapt your own device drivers to support this.  In our case that was for an FPGA card, so we could DMA directly to the GPU and thus reduce latency and load on the CPU.  Then without warning this support was removed from all but the Tesla K20.  I think.

One piece of info I got from a source at NVIDIA is that they do not want the support headache of having massive numbers of (Geforce) people using advanced features like these.  So they restrict them to their own high end cards.  For GPUDirect RDMA I can see some logic in this, in that you are also reliant on correctly modifying the driver of another card, and the motherboard architecture is also important.  Making use of a PCIe bridge, such as a PLX device, is best.  If you only have the PCIe switch of the CPU then it may not work at all.  So complaints to NVIDIA could actually be the fault of the motherboard, motherboard drivers, chipset, 3rd party PCIe device, 3rd party made NVIDIA GPU etc. – too many variables.  Unfortunately the Tesla K20 and Titan cards are too power hungry for many non-high performance computing applications where a low cost Geforce would work fine (think embedded systems).

Will the GK208 make it to desktop GPUs?  And more to the point, will it have these K20 features enabled?  Depends who you read.  Here is a GT630 with the GK208?!?!  The moral of the story is that NVIDIA needs to get clear on which cards support what.  We need clear information and guarantees.  Prancing around smugly announcing the latest supercomputer populated with their CUDA cards is not enough.  The rest of us mortals are generally left speculating and making fools of ourselves in front of our customers after trying to convince them to leap into GPGPU.  End users are right to be concerned about product road maps, product lifetimes, driver and feature support.
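Until that clarity arrives, at least a runtime sanity check is cheap.  The sketch below queries the compute capability each CUDA device reports through CUDAfy (NVIDIA documents 3.5 as the minimum for Hyper-Q and Dynamic Parallelism).  Note this is a necessary condition only, since as discussed above some features are still fenced off by product line, and the property names here follow my reading of the CUDAfy API and should be checked against your version.

        foreach (GPGPUProperties prop in CudafyHost.GetDeviceProperties(eGPUType.Cuda, false))
        {
            // Compute capability 3.5 or better is required, but not sufficient, for these features.
            bool capable = prop.Capability >= new Version(3, 5);
            Console.WriteLine("{0}: compute capability {1} -> Hyper-Q/Dynamic Parallelism {2}",
                prop.Name, prop.Capability, capable ? "possible" : "not supported");
        }
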

Ethics and your Graphics Processing Unit (GPU)

“With great power comes great responsibility,” goes the phrase.  In computing circles I would say that we’d be liable to think more along the lines of “with great power comes great electricity bills” or “with great power comes great cooling problems”.  But should we also be more often considering the original intention?  Should we, as the engineers wielding the computing power, be concerned with how this technology could be abused?  A quick trawl of the internet shows precious little concern for such issues – we are almost all completely entranced by the rush of technical possibilities coming at us.  If we give the matter any concern at all we tend to think in altruistic terms, of the great potential for a safer, more organized, more open, more equal, more efficient and faster world.

In 2011 there was a flood of news reports about how GPUs (Graphics Processing Units) could be used for more than pretty graphics and be used to target such tasks as decryption and password cracking.  This of course was a good way to raise publicity for GPGPU (General Purpose GPU) and create a new market for the likes of NVIDIA and AMD.  Programming tools such as CUDA and OpenCL made leveraging the massively parallel architectures of GPUs for non-graphics tasks much easier.  Decrypting secure data and cracking passwords were apparently well suited to such devices. [1][2][3]

From an engineering point of view it is exciting to understand the challenges of cracking modern encryption methods and to see the effect of password length on complexity.  Up to eight random characters is apparently quite straightforward and can be done within hours.  Going to ten characters can suddenly take decades.  Going beyond this can quickly take millennia.  The mathematical theory behind all this is quite fascinating. [4]
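To put rough numbers on that (purely as an illustration, assuming a 95-character printable set and something like the 10^11 guesses per second quoted for the GPU rig in [1]): eight characters give 95^8 ≈ 6.6 × 10^15 combinations, which is under a day of exhaustive search, while ten characters give 95^10 ≈ 6.0 × 10^19, pushing the worst case out to roughly two decades at the same rate.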

Another area often mentioned in the same breath as GPUs is computational finance.  This is the world of so-called quants.  These chaps are attracted from the fields of science into the world of financial engineering.  High performance computing is used to predict stock market movements, calculate pricing, quantify risk, etc.  The more horsepower we have the more chance we have of outsmarting the competition.  We learn of such things as high frequency trading where automated buying and selling are made so rapidly that sometimes the investment is held only for milliseconds. We learn that latency is critical and by locating a computer centre closer to the exchange and improving data throughput, we can get further advantage. [5]

If you attend a high performance computing conference you will be able to attend any number of talks given by academics or software engineers detailing the astonishing breakthroughs they have made in areas such as these.  But while listening to such information should we not also use our well trained and agile minds to question the greater ramifications of what we are actually doing?

Is the cracking of passwords and accessing confidential information always good?  Is being able to see finance as mathematical models devoid of a bricks-and-mortar, flesh-and-blood reality really of sound use?  If we touch on such questions at all then we will hear about system administrators who lose access to vital company information.  We hear about people losing access to personal, irreplaceable documents and photographs.  We hear about security agencies needing access to the communications of criminals and terrorists.  In finance, we are told that we will get better liquidity, better market stability and that the competition will bring value.

But do we really believe all this?  Does this kind of computing power really give us a more stable, more efficient financial sector?  We can learn that high frequency trading skims off money from the transactions between investors and businesses, a form of unauthorized taxation as they pre-empt genuine trades.  We witness phenomena such as the flash crash of 2010 as algorithms compound on errors to create chaotic spikes. [6]  We can hide behind the maths of risk to the exclusion of real facts such as the unsustainable house of cards that was sub-prime.  While the banking sector has pretty much recovered, there are legions of ordinary people with reduced pensions, bankrupt businesses, lost savings, without work and facing austerity measures cutting benefits and services. [7]

The impact of being able to break passwords, decrypt secure communication and monitor all internet traffic is altogether more sinister and raises questions about the kind of world we wish to live in.  We naively assume that we have nothing to hide and that such technology is used for our collective safety.  We can intercept terrorist plots and illegal business activities for example.  It is now technically possible to monitor all internet traffic in a small to medium size country [8][9] and within a year or two it will be possible to affordably do so for any country.  The cost of such processing power would apparently come in at less than one modern fighter jet.  Scaling up from the systems already available this is quite believable. [10]

When I discussed such issues with an engineer friend, he claimed he was not worried because it would likely not be possible to monitor the data in any useful way.  This underestimates the ingenuity and rate of progress of the computing world.  Google serves a significant proportion of the world’s internet users and already tracks a massive range of statistics about these people.  Gmail scans emails and displays adverts based on their content.  Search history, links clicked and more are stored.  Google have already shown that the theory works fine.

As people, as engineers, we like to assume that the technology we create will be used for good.  We tout the so-called Twitter and Facebook revolutions as examples of how technology is opening up the world and allowing repressed peoples to overthrow corrupt despots [11], but we fail to see how the same technologies allow those same despots to monitor their own people.  We do not often hear how the Iranian government used Facebook to identify protestors and their families. [12]  Nor do we often hear about how easy it is to track the internet activities of our heroic freedom campaigners.  And what happens if we become disillusioned with our own governments?  Would we be allowed to democratically protest and oust them, or would the technological might and subsequent rule of law be used against us under a flimsy patriotic pretext?

And where will it all stop?  If our on-line and mobile communications can be monitored what about our off-line personal discussions with our fellow freedom fighters / terrorists (delete depending on view point)?  As Google Glass makes its entrance, backed with on-the-fly translation[13] and facial recognition [14] then we see that technically speaking we could also monitor everything we do, see, hear and say.  If we are not careful we will find that we are engaged in a technology enabled race to the bottom of morality, a desperate fight to protect a way of life that we’d already lost.

References

1. https://securityledger.com/new-25-gpu-monster-devours-passwords-in-seconds/
2. http://erratasec.blogspot.nl/2011/06/password-cracking-mining-and-gpus.html#.UZ90kLVmh8E
3. http://www.cyint.in/products_decryptiontools.htm
4. http://en.wikipedia.org/wiki/Password_cracking
5. http://en.wikipedia.org/wiki/High_frequency_trading
6. http://en.wikipedia.org/wiki/2010_Flash_Crash
7. http://www.motherjones.com/mojo/2013/05/bank-record-profits-fdic-unemployment-housing
8. http://surveillance.rsf.org/en/amesys/
9. http://www.defenceweb.co.za/index.php?option=com_content&view=article&id=18932&catid=74&Itemid=30
10. “Freedom and the Future of the Internet”, Julian Assange, 2012. http://emilkirkegaard.dk/en/?p=3429
11. http://en.wikipedia.org/wiki/Twitter_Revolution
12. “The Net Delusion”, Evgeny Morozov, 2012, http://www.publicaffairsbooks.com/morozovch1.pdf, p. 10
13. http://www.huffingtonpost.com/2012/07/23/google-glass-inspired-specs-auto-translate_n_1695008.html
14. http://www.guardian.co.uk/technology/2013/jun/03/google-glass-facial-recognition-ban

Why use FPGAs for data acquisition systems?

Why would you want to purchase an ADC (analogue-to-digital converter) or DAC (digital-to-analogue converter) for use with an FPGA module?  This is an important question.  The choice of an FPGA (Field Programmable Gate Array) over, say, a conventional CPU or DSP, or even a GPU (Graphics Processing Unit), means for many people a leap into the unknown.  We move away from the familiar world of programming in languages such as C/C++ or .NET (C#, VB etc.) and have to deal with more esoteric means such as VHDL or Verilog.  There are of course new tools to simplify the programming of FPGAs, but they have their limitations and are often prohibitively expensive.

Two FPGA cards and two FMC I/O cards

So, I return to my question: why move to FPGAs?  No one decides out of the blue one day that they want an FPGA based system.  What we want is a means to process our data, and in the context of this article we’re assuming real world data being received from sensors and/or being generated and transmitted (ADCs and DACs respectively).  Say we want to acquire a number of channels of data, for example radio signals from a variety of positions.  We can connect our antennas to ADCs.  Then what?  We need to do something with the digitized data.  We could store it to disk for off-line analysis.  We could monitor the data for specific signatures and alert a user.  In either case, once we go above a certain sampling frequency and number of channels the sheer quantity of data is likely to prove a handful.  We’re going to have issues with:

- Transfer speed
- Storage speed
- Storage space
- Processing

Transfer speed would be the bandwidth from the ADC over the data link to a disk, CPU or network card.  Storage speed would be the speed at which we can write data to a hard disk or solid state disk.  Storage space refers to the fact that disk space is finite and that the quantity of data kept for off-line processing must remain manageable.  Processing here refers to real-time processing – this should keep up with the rate of data acquisition.

A single modern ADC module can easily produce in excess of 5GBytes/second.  The fastest links typically found in a computer can handle around 6GBytes/second sustained.  But that is only the beginning of our problems.  Once the data has arrived at the CPU what do we do with it?  The fact is that such a data stream will easily saturate the capabilities of even a powerful CPU especially if we must re-order the data and do complex digital signal processing.
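To put a figure on it (an illustration, assuming the samples are packed into 16-bit words): the 8-channel 250MHz FMC108 mentioned earlier produces 8 × 250 MS/s × 2 bytes = 4 GBytes/second, already close to the limit of those links before we have done anything with the data.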

Tools such as NVIDIA’s CUDA have made it simpler to program GPUs for non-graphics purposes (read: general purpose processing), and these add-in cards are often quite suited to signal processing.  But even here we are taxing our system considerably, relying on predictable data transfers, introducing latency with every step, and on top of it all we should expect relatively high power consumption.  A GPU can easily require in excess of 200W and a high end CPU can require more than 125W.  So our reasonably low cost, easy to program system is going to be relatively large, need considerable cooling and consume a fair bit of power.  And what happens if we have many more sensors and therefore more ADCs?

It is when we hit these issues that we start looking at alternatives and learn about FPGAs.  We learn that these little marvels can outperform any CPU and even a GPU for the typical signal processing tasks, and at a fraction of the power consumption.  We then figure that we do not want a discrete ADC card with its own application programming interface (API) that would still need to transfer data to the FPGA via the host CPU.  We realize that the ADC should interface directly with the FPGA module.  We then read about FMC – the FPGA Mezzanine Card – and the many very high performance ADCs available in this form factor (faster sampling and higher resolution can mean more insight into your data), and the many FPGA modules supporting these daughter cards.  Just mount the ADC directly on the FPGA module; what could be easier?  Then we remember that we’ve left the world of our favourite programming languages and need to learn a whole new approach.  It’s a step into the unknown for many, a steep learning curve, complex and expensive tools, but the performance benefits are clear as day and the allure of greater depth of analysis is irresistible and game changing.

Now the blurb – Hybrid DSP has the experience to advise on and supply some of the highest performance ADCs, DACs, video and other input/output cards on the market.  All these I/O products are available exclusively as FPGA daughter cards.  You may choose to configure the FPGA card as a simple glue-logic interface to a host computer, you may elect to have the FPGA card perform the complete processing chain, or you may choose any number of steps in between.  A hybrid or heterogeneous system may use the FPGA card for re-organizing the data and some simple pre-processing that will not tax your FPGA skills too highly nor require too large and expensive an FPGA, and then do the complex signal processing (e.g. FFTs) on a GPU card.

As I said, no one wakes up one day and decides to go with an FPGA and FPGA I/O daughter cards.  We just end up here.  And that’s where Hybrid DSP comes in, with our fairly unique approach to advising on and putting together systems that meet requirements while balancing cost, development time, skills, power consumption and size.