An Open-Source exFAT Implementation Reaches v1.0

Linus Torvalds and others in the past have characterized FUSE file-systems as being for toys and misguided people, but FUSE has been used before for bringing Sun/Oracle’s ZFS to Linux, various other creative file-system implementations, and now exFAT. ExFAT support for Linux has been talked about going back to early 2009 but the support has been crap on Linux.

via [Phoronix] An Open-Source exFAT Implementation Reaches v1.0.

I always find filesystem debates fascinating.

Hands On With Kim Dotcom’s New Mega

So what’s to stop Mega from going down just the way Megaupload did? Mega’s privacy, which is a no-foolin’ stroke of genius. See, all of your files are encrypted locally before they’re uploaded, so Mega has no idea what anything is. It could be family photos or work documents, or an entire discography of your favorite band. Poof: online and easy to share. And importantly, Mega doesn’t have the decryption key necessary to get in. See? It’s a masterstroke of copyright subversion.

via Hands On With Kim Dotcom’s New Mega: This Service Could Dismantle Copyright Forever.

Technically you should be able to do this with any cloud storage service. The key here is that the encryption is done locally. There are many ways to encrypt your files locally, so why would it matter which cloud storage provider you use? Maybe I'm missing something, but this doesn't seem like all that novel an idea, except that the new Mega provides the software and user interface to make the whole process easier. Mega is supposed to launch tomorrow, so more information will surface.
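To illustrate the idea of encrypting before uploading, here's a minimal Python sketch using only the standard library. The cipher below (a SHA-256-based keystream XOR) is a toy construction for illustration only, not vetted cryptography; a real client would use an authenticated cipher like AES-GCM. The function names are mine, not Mega's.

```python
import hashlib
import secrets

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive a keystream by hashing key+nonce+counter (toy construction, NOT real crypto)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def encrypt(key: bytes, plaintext: bytes) -> bytes:
    """XOR the plaintext with a fresh keystream; prepend the nonce."""
    nonce = secrets.token_bytes(16)
    ks = keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ k for p, k in zip(plaintext, ks))

def decrypt(key: bytes, blob: bytes) -> bytes:
    """Recreate the keystream from the stored nonce and XOR it back out."""
    nonce, ciphertext = blob[:16], blob[16:]
    ks = keystream(key, nonce, len(ciphertext))
    return bytes(c ^ k for c, k in zip(ciphertext, ks))

# The key never leaves your machine; the provider only ever sees the blob.
key = secrets.token_bytes(32)
blob = encrypt(key, b"family photos, work documents, ...")
```

The important property is that only `blob` is uploaded; without `key`, the storage provider cannot tell photos from spreadsheets.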

Google Accidentally Transmits Self-Destruct Code to Army of Chrome Browsers

This may be a first. Bad webpage coding can often cause a browser to crash, but yesterday’s crash looks like something different: widespread crashing kicked off by a web service designed to help drive your browser.

via Google Accidentally Transmits Self-Destruct Code to Army of Chrome Browsers | Wired Enterprise | Wired.com.

Twitter, PayPal reveal database performance

Cole revealed that Twitter’s MySQL database handles some huge numbers — three million new rows per day, the storage of 400 million tweets per day replicated four times over — but it is managed by a team of only six full-time administrators and a sole MySQL developer.

via Twitter, PayPal reveal database performance – Software – Technology – News – iTnews.com.au.

Daniel Austin, a technology architect at Paypal, has built a globally-distributed database with 100 terabytes of user-related data, also based on a MySQL cluster.

Austin said he was charged with building a system with 99.999 percent availability, without any loss of data, an ability to support transactions (and roll them back), and an ability to write data to the database and read it anywhere else in the world in under one second.
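To put "99.999 percent availability" in perspective, the arithmetic is worth doing once: five nines leaves only a few minutes of allowed downtime per year.

```python
# Allowed downtime per year at "five nines" (99.999%) availability.
availability = 0.99999
minutes_per_year = 365 * 24 * 60          # 525,600 minutes
downtime_minutes = minutes_per_year * (1 - availability)
print(round(downtime_minutes, 2))         # prints 5.26
```

Roughly 5.26 minutes of downtime per year, total, across a globally distributed system.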

State of the NAS: private clouds and an app platform

Just as significantly, the firmware that many companies are offering is now extensible. Most NAS boxes are Linux systems, and it’s often been possible to ssh in and install software on them. But several companies are currently offering something that looks suspiciously like an app store, where NAS users can do one-click installs of additional features.

via State of the NAS: private clouds and an app platform | Ars Technica.

The main challenge is that all these options add a degree of complexity to managing things, some on the NAS itself, and some in terms of integrating it with your router, software, etc. Finding the software and firmware with the right balance for you is probably more important than picking your hardware. Of the ones we’ve tried, we’re partial to Synology’s firmware (some of us exceedingly fond) because of its huge range of capabilities and frequent updates that add even more. But if you can, try a few.

iSNS: Technical overview of discovery in IP SANs

The three main protocols for IP SANs are Fibre Channel over IP (FCIP), Internet Fibre Channel Protocol (iFCP), and Internet SCSI (iSCSI). As shown in Figure 1, the iSCSI, iFCP, and FCIP protocols support a serial SCSI-3 interface to the standard SCSI command set expected by the operating system and upper-layer applications. This allows conventional storage I/O to be performed over a high-performance gigabit transport. Serial SCSI-3 transactions are carried over TCP/IP, although only iFCP and iSCSI leverage native TCP/IP for each storage end device. Each IP storage protocol has unique requirements for discovery.

via iSNS: Technical overview of discovery in IP SANs.

A database that knows what time it is

Google has made public the details of its Spanner database technology, which allows a database to store data across multiple data centers, millions of machines and trillions of rows. But it’s not just larger than the average database, Spanner also allows applications that use the database to dictate where specific data is stored so as to reduce latency when retrieving it.

via Google’s Spanner: A database that knows what time it is — Data | GigaOM.

Spanner is cool as a database tool for the current era of real-time data, but it also indicates how Google is thinking about building a compute infrastructure that is designed to run amid a dynamic environment where the hardware, the software and the data itself being processed is constantly changing.

OpenAFS

AFS is a distributed filesystem product, pioneered at Carnegie Mellon University and supported and developed as a product by Transarc Corporation (now IBM Pittsburgh Labs). It offers a client-server architecture for federated file sharing and replicated read-only content distribution, providing location independence, scalability, security, and transparent migration capabilities. AFS is available for a broad range of heterogeneous systems including UNIX, Linux, MacOS X, and Microsoft Windows.

IBM branched the source of the AFS product, and made a copy of the source available for community development and maintenance. They called the release OpenAFS.

via OpenAFS.

Disks from the Perspective of a File System

Most applications do not deal with disks directly, instead storing their data in files in a file system, which protects us from those scoundrel disks. After all, a key task of the file system is to ensure that the file system can always be recovered to a consistent state after an unplanned system crash (for example, a power failure). While a good file system will be able to beat the disks into submission, the required effort can be great and the reduced performance annoying. This article examines the shortcuts that disks take and the hoops that file systems must jump through to get the desired reliability.

via Disks from the Perspective of a File System – ACM Queue.

Luckily, SATA (serial ATA) has a new definition called NCQ (Native Command Queueing) that has a bit in the write command that tells the drive if it should report completion when media has been written or when cache has been hit. If the driver correctly sets this bit, then the disk will display the correct behavior.

In the real world, many of the drives targeted to the desktop market do not implement the NCQ specification. To ensure reliability, the system must either disable the write cache on the disk or issue a cache-flush request after every metadata update, log update (for journaling file systems), or fsync system call. Both of these techniques lead to noticeable performance degradation, so they are often disabled, putting file systems at risk if the power fails. Systems for which both speed and reliability are important should not use ATA disks. Rather, they should use drives that implement Fibre Channel, SCSI, or SATA with support for NCQ.
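On the application side, the cache-flush request described above is usually triggered by an fsync call after the write. Here's a minimal Python sketch of a durable write; note that, per the article, fsync only guarantees the OS pushed the data toward the disk, and whether it actually reaches the media depends on the drive's write-cache behavior.

```python
import os

def durable_write(path: str, data: bytes) -> None:
    """Write data and ask the OS to persist it before returning."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)
        # Flush OS buffers to the device. If the drive has a volatile write
        # cache and ignores flush requests (as the article warns), the data
        # may still be lost on power failure.
        os.fsync(fd)
    finally:
        os.close(fd)
```

This is the per-write cost the article is talking about: every fsync (or journal/metadata update) turns into a cache flush, which is why disabling it is tempting and dangerous.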

On Linux, here’s how you can check whether your drive is using NCQ:

$ cat /sys/block/sd?/device/queue_depth

A queue depth of 1 indicates NCQ is not in use. You can also check the queue type:

$ cat /sys/block/sd?/device/queue_type

My green drives came back with a queue_type of none.
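The same sysfs check can be scripted across all drives at once. A small Python sketch (it assumes the `/sys/block/sd?/device/queue_depth` layout shown above; the `sysfs_root` parameter exists only to make it testable):

```python
import glob
import os

def ncq_report(sysfs_root: str = "/sys/block") -> dict:
    """Map each sdX drive to its queue depth; a depth > 1 suggests NCQ is active."""
    report = {}
    pattern = os.path.join(sysfs_root, "sd?", "device", "queue_depth")
    for path in sorted(glob.glob(pattern)):
        drive = path.split(os.sep)[-3]          # .../sdX/device/queue_depth -> sdX
        with open(path) as f:
            depth = int(f.read().strip())
        report[drive] = {"queue_depth": depth, "ncq": depth > 1}
    return report
```

Running `ncq_report()` on a box with green drives like mine would show a queue depth of 1 (no NCQ) for them.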