Analysis of Erasure Code Patents for everyone

The most prominent prior art invalidating this patent is the RAID6 (one of the most commonly used erasure codes) implementation in the Linux kernel. An article dated 2004 (i.e. ten years before the patent was granted to StreamScale) describes it as being optimized as follows: “For additional speed improvements, it is desirable to use any integer vector instruction set that happens to be available on the machine, such as MMX or SSE-2 on x86, AltiVec on PowerPC, etc.” SSE2 stands for Streaming SIMD Extensions 2. The patent cites Anvin’s article, but only to state the problem; it does not acknowledge that the article also contains the solution.

via Erasure Code Patents | Analysis of Erasure Code Patents for everyone.
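To make the quoted optimization concrete, here is a minimal sketch of the underlying trick: handling several bytes of parity math in one wide integer operation instead of one byte at a time. It is not the kernel's code (which is C using real vector registers). It assumes the RAID-6 field GF(2^8) with polynomial 0x11d, and a plain Python integer stands in for a 64-bit vector register. The function names are made up for illustration.

```python
# Sketch: multiply eight GF(2^8) elements by 2 at once, packed in one 64-bit word.
# Assumption: field polynomial 0x11d. Names are illustrative only.

HIGH_BITS = 0x8080808080808080   # top bit of every byte
LOW_SEVEN = 0xFEFEFEFEFEFEFEFE   # clears bits that shifted in from the neighbouring byte
POLY_LOW = 0x1D1D1D1D1D1D1D1D    # low byte of 0x11d, replicated into every byte


def gf_mul2_byte(b: int) -> int:
    """Reference version: multiply a single byte by 2 in GF(2^8)."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11D
    return b


def gf_mul2_packed(word: int) -> int:
    """Multiply each of the 8 bytes in a 64-bit word by 2, with no per-byte loop."""
    high = word & HIGH_BITS
    # Turn each set high bit (0x80) into a full-byte mask (0xFF).
    mask = (high << 1) - (high >> 7)
    shifted = (word << 1) & LOW_SEVEN
    # Bytes that overflowed get reduced by the field polynomial.
    return shifted ^ (mask & POLY_LOW)


data = bytes([0x00, 0x01, 0x7F, 0x80, 0x8E, 0xFF, 0x53, 0xCA])
packed = gf_mul2_packed(int.from_bytes(data, "little"))
print(packed.to_bytes(8, "little").hex())
```

A wider register (128 bits with SSE2, for instance) would apply the same steps to more bytes per instruction, which is the speedup the 2004 article is pointing at.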

Why the Z-80’s data pins are scrambled

I have been reverse-engineering the Z-80 processor using images and data from the Visual 6502 team. The image below is a photograph of the Z-80 die. Around the outside of the chip are the pads that connect to the external pins. (The die photo is rotated 180° compared to the datasheet pinout, if you try to match up the pins.) At the right are the 8 data pins for the Z-80’s 8-bit data bus in a strange order.

via Ken Shirriff’s blog: Why the Z-80’s data pins are scrambled.

The motivation behind splitting the data bus is to allow the chip to perform activities in parallel. For instance an instruction can be read from the data pins into the instruction logic at the same time that data is being copied between the ALU and registers. The partitioned data bus is described briefly in the Z-80 oral history[3], but doesn’t appear in architecture diagrams.

The complex structure of the data buses is closely connected to the ordering of the data pins.

Virtual AGC Home Page

The Apollo spacecraft used for lunar missions in the late 1960s and early 1970s was really two different spacecraft, the Command Module (CM) and the Lunar Module (LM). The CM was used to get the three astronauts to the moon and back again. The LM was used to land two of the astronauts on the moon while the third astronaut remained in the CM, in orbit around the moon.

via Virtual AGC Home Page.

The Virtual AGC project provides a virtual machine which simulates the AGC, the DSKY, and some other portions of the guidance system. In other words, if the virtual machine (which we call yaAGC) is given the same software which was originally run by the real AGCs, and is fed the same input signals encountered by the real AGCs during Apollo missions, then it will respond in the same way as the real AGCs did. The Virtual AGC software is free of charge and can be obtained for Windows, Mac OS X, or Linux, or as open source code so that it can be studied or modified.

The Mining Algorithm And CPU Mining – All About Bitcoin Mining: Road To Riches Or Fool’s Gold?

One of the most difficult problems in computer science is reversing a secure hash (finding an input text for a given output, the digital signature). Let me explain this problem in simple terms. Let’s assume the wealthy but terminally ill Alice wrote her will and stored it on her computer. Knowing that a computer can be hacked and the will can be altered, Alice digitally signed her will with the secure hash algorithm SHA-256. She then emailed the digital signature to all her friends, allowing them to check the validity of the document. Bob wants to hack into the computer and change Alice’s will so that he becomes the sole beneficiary, but he faces a problem: he needs to change the will in such a way that the widely distributed SHA-256 signature stays the same. Otherwise, everybody realizes that the will has been forged. This is the computationally difficult problem of reversing or brute-forcing SHA-256, or finding an input that matches a predefined output. Satoshi famously decided that in order to find a new block, people all over the world need to compete in reversing SHA-256, turning block creation into a global lottery.

via The Mining Algorithm And CPU Mining – All About Bitcoin Mining: Road To Riches Or Fool’s Gold?.
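The lottery described above can be sketched in a few lines. This is a toy, not Bitcoin's actual procedure (real mining hashes a block header twice with SHA-256 and compares the result against a numeric target). Here the "difficulty" is simply a number of leading zero hex digits, and the payload and function names are invented for the example.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def find_nonce(payload: bytes, zero_hex_digits: int) -> int:
    """Brute force: try nonces until the hash starts with enough zero hex digits.

    Toy difficulty rule (an assumption for this sketch), not Bitcoin's target check.
    """
    prefix = "0" * zero_hex_digits
    nonce = 0
    while True:
        if sha256_hex(payload + str(nonce).encode()).startswith(prefix):
            return nonce
        nonce += 1


# Any change to the document changes the hash, which is what stops Bob.
print(sha256_hex(b"Alice's will, version 1"))
print(sha256_hex(b"Alice's will, version 2"))

# Finding an input that satisfies a condition on the output takes guessing.
nonce = find_nonce(b"toy block", 3)
print(nonce, sha256_hex(b"toy block" + str(nonce).encode()))
```

Each extra zero hex digit makes the search about sixteen times longer on average, which is how a difficulty knob turns hashing into a lottery.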

Once very popular among Bitcoin miners, but now somewhat dated, the Radeon HD 5830 card boasts 1120 stream processing units. But that doesn’t mean it literally has 1120 separate cores. Rather, the GPU employs 224 SIMD cores, each of which sports five ALUs operating in parallel (VLIW5).

Are We Shooting Ourselves in the Foot with Stack Overflow?

Unless you’ve been living under a rock for the past couple of years, you must have heard of the Toyota unintended acceleration (UA) cases, where Camry and other Toyota vehicles accelerated unexpectedly. Some of them managed to kill people, and all of them scared the hell out of their drivers.

The recent testimony delivered at the Oklahoma trial by embedded guru Michael Barr offers, for the first time in the history of these trials, a glimpse into the Toyota throttle control software. In his deposition, Michael explains how a stack overflow could corrupt the critical variables of the operating system (OSEK in this case), because they were located in memory adjacent to the top of the stack. The following two slides from Michael’s testimony explain the memory layout around the stack and why stack overflow was likely in the Toyota code (see the complete set of Michael’s slides).

via Are We Shooting Ourselves in the Foot with Stack Overflow? « State Space.
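The failure mode described in the testimony can be illustrated with a toy memory model. Everything here is hypothetical: the 64-byte memory, the addresses, the "TASKS_OK" marker and the frame size are invented for the sketch and say nothing about Toyota's actual layout. The only points carried over from the excerpt are that OS data sits right next to the stack and that nothing stops the stack from growing into it (the sketch assumes a stack that grows downward).

```python
MEMORY_SIZE = 64
OS_DATA_START, OS_DATA_END = 0, 8   # hypothetical critical OS variables
STACK_BASE = 64                     # stack starts here and grows toward address 0


class ToyMemory:
    def __init__(self) -> None:
        self.mem = bytearray(MEMORY_SIZE)
        self.sp = STACK_BASE
        self.mem[OS_DATA_START:OS_DATA_END] = b"TASKS_OK"

    def push(self, value: int) -> None:
        """Push one byte. As on bare metal, no stack limit is checked."""
        if self.sp == 0:
            raise MemoryError("ran off the end of the toy memory")
        self.sp -= 1
        self.mem[self.sp] = value

    def os_data(self) -> bytes:
        return bytes(self.mem[OS_DATA_START:OS_DATA_END])


def recurse(memory: ToyMemory, depth: int, frame_size: int = 8) -> None:
    """Each nested call pushes one frame of frame_size bytes."""
    for _ in range(frame_size):
        memory.push(0xAA)
    if depth > 1:
        recurse(memory, depth - 1, frame_size)


ok = ToyMemory()
recurse(ok, depth=7)          # 56 bytes: exactly fills the stack area
print(ok.os_data())

overflowed = ToyMemory()
recurse(overflowed, depth=8)  # one frame too many: lands on the OS data
print(overflowed.os_data())
```

Note that the overflowing run raises no error at all. The OS variables are silently overwritten, which is what makes this class of bug so hard to catch in testing.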

Knights Landing Details

Table 1 shows estimates of the critical characteristics of the 14nm Knights Landing, compared to known details of the 22nm Knights Corner, Haswell, and Ivy Bridge-EP. The estimates for Knights Landing differ from the rumored specifications primarily in the capacity of the shared L2 cache, which is estimated to be 512KB, rather than 1MB. It is possible, although extremely unlikely, that the shared L2 cache is 256KB. The analysis also incorporates several other critical factors which were not mentioned in any rumors, specifically cache read bandwidth and the large shared L3 cache. The L3 cache is estimated as eight times the size of the L2 caches, or 144MB. In the unlikely scenario that the L2 cache is 256KB, the L3 cache is likely to be proportionately smaller.

via Knights Landing Details.
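The cache figures above can be checked with a little arithmetic. This sketch reads "eight times the size of the L2 caches" as eight times the combined L2 capacity, and assumes the number of L2 caches stays fixed when the per-cache size changes. Both readings are assumptions made here, not statements from the article, and the variable names are invented.

```python
KB = 1024
MB = 1024 * KB

L3_ESTIMATE = 144 * MB
L3_TO_L2_RATIO = 8      # "eight times the size of the L2 caches"
L2_SIZE = 512 * KB      # the article's main estimate per shared L2

total_l2 = L3_ESTIMATE // L3_TO_L2_RATIO   # combined L2 capacity implied by the ratio
l2_count = total_l2 // L2_SIZE             # number of L2 caches implied by that total


def l3_for(l2_size: int) -> int:
    """L3 size if the ratio and the L2 count are held fixed (an assumption)."""
    return l2_count * l2_size * L3_TO_L2_RATIO


print(total_l2 // MB, "MB of L2 in total across", l2_count, "caches")
print(l3_for(512 * KB) // MB, "MB of L3 with 512KB L2 caches")
print(l3_for(256 * KB) // MB, "MB of L3 with 256KB L2 caches")
```

Under those assumptions, halving the L2 size halves the L3 estimate, which is one way to read "proportionately smaller".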

GPUs would make terrific network monitors

The task of monitoring networks requires reading all the data packets as they cross the network, which “requires a lot of data parallelism,” Wenji said.

Wenji has built a prototype at Fermilab to demonstrate the feasibility of a GPU-based network monitor, using an Nvidia M2070 GPU and an off-the-shelf NIC (network interface card) to capture network traffic. The system could easily be expanded with additional GPUs, he said.

via Super Computing 13: GPUs would make terrific network monitors – Network World.

See What’s Inside the PlayStation 4 With These Exclusive Photos

What we see is a hardware architecture that’s both simple and powerful. With longtime game designer Mark Cerny leading the way, lending his software-minded expertise to Ootori and the rest of the hardware engineering team, Sony abandoned the overly complex Cell microprocessor that drove the PlayStation 3, building the PS4 around an “x86” chip similar to the processors that have driven most of our personal computers for the last three decades. The idea was to make it that much easier for developers to build games for the new console, to create the things that will ultimately capture our attention.

via See What’s Inside the PlayStation 4 With These Exclusive Photos | Game|Life | Wired.com.

Barbarians at the Gateways

The goal of this article is to introduce the problems on both sides of the wire. Today a big Wall Street trader is more likely to have a Ph.D. from Caltech or MIT than an MBA from Harvard or Yale. The reality is that automated trading is the new marketplace, accounting for an estimated 77 percent of the volume of transactions in the U.K. market and 73 percent in the U.S. market. As a community, it’s starting to push the limits of physics. Today it is possible to buy a custom ASIC (application-specific integrated circuit) to parse market data and send executions in 740 nanoseconds, or 0.00074 milliseconds.[4] Human reaction time to a visual stimulus is around 190 million nanoseconds.

via Barbarians at the Gateways – ACM Queue.
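To put the two latency figures from the excerpt on the same scale, here is a quick unit conversion. The numbers are the article's; the variable names are just for this sketch.

```python
NS_PER_MS = 1_000_000

asic_ns = 740            # ASIC parse-and-execute time quoted in the article
human_ns = 190_000_000   # human visual reaction time quoted in the article

asic_ms = asic_ns / NS_PER_MS
human_ms = human_ns / NS_PER_MS
ratio = human_ns / asic_ns

print(f"ASIC:  {asic_ms} ms")
print(f"Human: {human_ms} ms")
print(f"The ASIC finishes roughly {ratio:,.0f} times within one human reaction time")
```

In other words, the hardware can complete on the order of a quarter of a million market-data round trips before a person has even registered that the screen changed.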

By 2005, most shops were also modifying kernels and/or running realtime kernels. I left HFT in late 2005 and returned in 2009, only to discover that the world was approaching absurdity: by 2009 we were required to operate well below the one-millisecond barrier, and were looking at tick-to-trade requirements of 250 microseconds. Tick to trade is the time it takes to:

1. Receive a packet at the network interface.

2. Process the packet and run through the business logic of trading.

3. Send a trade packet back out on the network interface.

To do this, we used realtime kernels with bypass drivers (either InfiniBand or via Solarflare’s

Linux-capable Arduino TRE debuts at Maker Faire Rome

As Zoe Romano puts it in an Arduino blog post, “the Arduino TRE is two Arduinos in one.” Basically, the new ARM Cortex-A8-based Sitara AM335x’s job is to run Linux applications and manage the SBC’s PC-style interfaces (video, audio, Ethernet, USB, optional WiFi, etc.), while an Atmel ATmega microcontroller takes care of the SBC’s real-world I/O (analog inputs, digital I/O, PWM outputs, etc.) and handles the interface to shields (Arduino expansion modules) in a fully AVR-compatible manner. Best of all, Romano points out, the 1GHz TI ARM processor offers up to “100 times more performance” than Arduino’s earlier Leonardo and Uno boards.

via Linux-capable Arduino TRE debuts at Maker Faire Rome ·  LinuxGizmos.com.

What’s unique about the TRE, however, is that its Linux OS runs on an ARM processor that’s truly integrated into the SBC’s basic architecture, as opposed to being a collateral benefit of a WiFi add-on module. As a result, the TRE will support a “full Linux” OS in contrast to the Yun’s Linino OS, a custom version of the lightweight OpenWRT embedded Linux distribution.