Think Before You Parallelize

The loop abstractions examined here are just one type of parallel or concurrent programming abstraction available. There is a whole universe out there, Actor Models, Async/Await, Tasks, Thread Pools, and so on. Be sure to understand what you are using, and measure whether it will really be useful, or whether you should focus on fast single threaded algorithms or look for third party tools with better performance.

Source: Think Before You Parallelize

World’s First 1,000-Processor Chip

Each processor core can run its own small program independently of the others, which is a fundamentally more flexible approach than so-called Single-Instruction-Multiple-Data approaches utilized by processors such as GPUs; the idea is to break an application up into many small pieces, each of which can run in parallel on different processors, enabling high throughput with lower energy use, Baas said.

Because each processor is independently clocked, it can shut itself down to further save energy when not needed, said graduate student Brent Bohnenstiehl, who developed the principal architecture.

Source: World’s First 1,000-Processor Chip | UC Davis

Two-hundred-terabyte maths proof is largest ever

The puzzle that required the 200-terabyte proof, called the Boolean Pythagorean triples problem, has eluded mathematicians for decades. In the 1980s, Graham offered a prize of US$100 for anyone who could solve it. (He duly presented the cheque to one of the three computer scientists, Marijn Heule of the University of Texas at Austin, earlier this month.) The problem asks whether it is possible to colour each positive integer either red or blue, so that no trio of integers a, b and c that satisfy Pythagoras’ famous equation a2 + b2 = c2 are all the same colour. For example, for the Pythagorean triple 3, 4 and 5, if 3 and 5 were coloured blue, 4 would have to be red.

Source: Two-hundred-terabyte maths proof is largest ever

There are more than 102,300 ways to colour the integers up to 7,825, but the researchers took advantage of symmetries and several techniques from number theory to reduce the total number of possibilities that the computer had to check to just under 1 trillion. It took the team about 2 days running 800 processors in parallel on the University of Texas’s Stampede supercomputer to zip through all the possibilities. The researchers then verified the proof using another computer program.

AMD lawsuit over false Bulldozer chip marketing is bogus

AMD is facing a lawsuit over claims that it misrepresented the core counts of its eight-core Bulldozer products, but the lawsuit’s technical merit seems extremely weak.

Source: AMD lawsuit over false Bulldozer chip marketing is bogus | ExtremeTech

This lawsuit essentially asks a court to define what a core is and how companies should count them. As annoying as it is to see vendors occasionally abuse core counts in the name of dubious marketing strategies, asking a courtroom to make declarations about relative performance between companies is a cure far worse than the disease. From big iron enterprise markets to mobile devices, companies deploy vastly different architectures to solve different types of problems.

New Device Could Greatly Improve Speech and Image Recognition

Holography is a technique based on the wave nature of light which allows the use of wave interference between the object beam and the coherent background. It is commonly associated with images being made from light, such as on driver’s licenses or paper currency. However, this is only a narrow field of holography.

Holography has been also recognized as a future data storing technology with unprecedented data storage capacity and ability to write and read a large number of data in a highly parallel manner.

Source: UCR Today: New Device Could Greatly Improve Speech and Image Recognition

Speeding Up Grep Log Queries with GNU Parallel

Enter GNU Parallel, a shell tool designed for executing tasks in parallel using one or more computers. For my purposes I just ran in on a single system, but wanted to take advantage of multiple cores.

Having enough memory on my system, I loaded the entire massive file into memory and pipe it to GNU Parallel along with another file consisting of thousands of different strings I want to search for in the “PATTERNFILE”:

cat BIGFILE | parallel –pipe grep -f PATTERNFILE

via Speeding Up Grep Log Queries with GNU Parallel – The State of Security.

Knights Landing Details

knl2-1Table 1 shows estimates of the critical characteristics of the 14nm Knights Landing, compared to known details of the 22nm Knights Corner, Haswell, and Ivy Bridge-EP. The estimate of Knights Landing differ from the rumored specifications primarily in the capacity of the shared L2 cache, which is estimated to be 512KB, rather than 1MB. It is possible, although extremely unlikely that the shared L2 cache is 256KB. The analysis also incorporate several other critical factors which were not mentioned in any rumors, specifically cache read bandwidth and the large shared L3 cache. The L3 cache is estimated as eight times the size of the L2 caches or 144MB in the unlikely scenario that the L2 cache is 256KB, then the L3 cache is likely to be proportionately smaller.

via Knights Landing Details.

GPUs would make terrific network monitors

The task of monitoring networks requires reading all the data packets as they cross the network, which “requires a lot of data parallelism,” Wenji said.

Wenji has built a prototype at Fermilab to demonstrate the feasibility of a GPU-based network monitor, using a Nvidia M2070 GPU and an off-the-shelf NIC (network interface card) to capture network traffic. The system could easily be expanded with additional GPUs, he said.

via Super Computing 13: GPUs would make terrific network monitors – Network World.

Realtime GPU Audio

While these techniques are widely used and understood, they work primarily with a model of the abstract sound produced by an instrument or object, not a model of the instrument or object itself. A more recent approach is physical modeling- based audio synthesis, where the audio waveforms are generated using detailed numerical simulation of physical objects or instruments.

via Realtime GPU Audio – ACM Queue.

There are various approaches to physical modeling sound synthesis. One such approach, studied extensively by Stefan Bilbao,1 uses the finite difference approximation to simulate the vibrations of plates and membranes. The finite difference simulation produces realistic and dynamic sounds (examples can be found at http://unixlab.sfsu.edu/~whsu/FDGPU). Realtime finite difference-based simulations of large models are typically too computationally-intensive to run on CPUs. In our work, we have implemented finite difference simulations in realtime on GPUs.