Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving). Despite these varied demands, Bigtable has successfully provided a flexible, high-performance solution for all of these Google products. In this paper we describe the simple data model provided by Bigtable, which gives clients dynamic control over data layout and format, and we describe the design and implementation of Bigtable.
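The paper defines that data model as a sparse, distributed, persistent multi-dimensional sorted map, indexed by a row key, a column key, and a timestamp, with each cell holding an uninterpreted array of bytes. As a rough illustration only — this is not Google's implementation, and the `TinyTable` name and methods are invented here — a minimal in-memory Python sketch of such a map might look like this:

```python
# Minimal in-memory sketch of Bigtable's data model: a map from
# (row key, column key, timestamp) to an uninterpreted value.
# TinyTable is an invented name; distribution, persistence, and
# column families are all ignored in this toy.
class TinyTable:
    def __init__(self):
        # (row, column) -> list of (timestamp, value), kept newest-first
        self._cells = {}

    def put(self, row, column, timestamp, value):
        versions = self._cells.setdefault((row, column), [])
        versions.append((timestamp, value))
        versions.sort(reverse=True)  # newest version first

    def get(self, row, column, timestamp=None):
        """Return the newest value at or before `timestamp` (newest overall if None)."""
        for ts, value in self._cells.get((row, column), []):
            if timestamp is None or ts <= timestamp:
                return value
        return None


table = TinyTable()
# The paper's running example: row keys are URLs in reversed-host order,
# so pages from the same site sort near one another.
table.put("com.cnn.www", "contents:", 3, "<html>...v3")
table.put("com.cnn.www", "contents:", 2, "<html>...v2")
print(table.get("com.cnn.www", "contents:"))     # -> "<html>...v3"
print(table.get("com.cnn.www", "contents:", 2))  # -> "<html>...v2"
```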
Tag Archives: big data
Troll sues Facebook, Amazon and others for using Hadoop
Big data has become the latest front in the patent troll epidemic, as a shell company is suing firms for using a common open-source storage framework known as the Hadoop Distributed File System (HDFS).
via Troll sues Facebook, Amazon and others for using Hadoop — Tech News and Analysis.
Hadoop has been built by a large network of contributors, including individual developers and large companies like Yahoo, and is an Apache Software Foundation project. HDFS, its storage component, was based on Google’s Google File System. Parallel Iron’s patent complaints, however, say the whole system was made possible by four men:
Erlang Solutions teams with Infoblox to release the first Openflow-1.2 compliant switch
OpenFlow-based SDN allows companies to cost-effectively improve the performance of clustered applications such as Big Data analysis in high-density data centres, and to reduce their operating costs.
via Erlang Solutions teams with Infoblox to release the first Openflow-1.2 compliant switch.
Everything You Wanted to Know About Data Mining but Were Afraid to Ask
With data mining it is possible to let the data itself determine the groups. Clustering is one of the black-box types of algorithm that can be hard to interpret. But in a simple example — again with purchasing behavior — we can imagine that the purchasing habits of different hobbyists would look quite different from each other: gardeners, fishermen, and model airplane enthusiasts would all be quite distinct. Machine learning algorithms can detect all of the subgroups within a dataset that differ significantly from each other.
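To make that concrete, here is a minimal sketch — not from the original article — of letting the data determine the groups with k-means clustering. The product categories and purchase counts are invented for illustration:

```python
# Letting the data determine the groups: k-means clustering on made-up
# purchase counts per product category.
import numpy as np
from sklearn.cluster import KMeans

# Rows: customers. Columns: purchases of [seeds, fishing tackle, balsa kits].
X = np.array([
    [9, 0, 1],   # looks like a gardener
    [8, 1, 0],
    [0, 7, 1],   # looks like a fisherman
    [1, 9, 0],
    [0, 1, 8],   # looks like a model-airplane enthusiast
    [1, 0, 9],
])

# We ask for 3 clusters here; in practice the number of groups can itself
# be chosen from the data (e.g., by comparing silhouette scores).
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(labels)  # e.g., [0 0 2 2 1 1] -- the hobbyists fall into distinct groups
```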
How Web giants store big—and we mean big—data
The Great Disk Drive in the Sky: How Web giants store big—and we mean big—data.
The need for this kind of perpetually scalable, durable storage has driven the giants of the Web—Google, Amazon, Facebook, Microsoft, and others—to adopt a different sort of storage solution: distributed file systems based on object-based storage. These systems were at least in part inspired by other distributed and clustered filesystems such as Red Hat’s Global File System and IBM’s General Parallel Filesystem.
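For readers unfamiliar with the term, object-based storage replaces the filesystem’s hierarchy of blocks and inodes with a flat namespace of whole objects addressed by key. A minimal sketch of that interface follows; the `ObjectStore` class and its methods are illustrative inventions, not any particular vendor’s API:

```python
# Sketch of the flat put/get interface that object-based storage exposes,
# as opposed to a hierarchical filesystem of blocks and inodes.
# Names are illustrative, not a real vendor API.
class ObjectStore:
    def __init__(self):
        self._objects = {}  # key -> (bytes, metadata)

    def put(self, key, data, metadata=None):
        # Whole objects are written and addressed by key; there is no
        # seek-and-rewrite-in-place as with block storage.
        self._objects[key] = (data, metadata or {})

    def get(self, key):
        return self._objects[key][0]


store = ObjectStore()
store.put("images/cat.jpg", b"\xff\xd8...", {"content-type": "image/jpeg"})
print(len(store.get("images/cat.jpg")))
```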
And one more blurb…
Google wanted to turn large numbers of cheap servers and hard drives into a reliable data store for hundreds of terabytes of data that could manage itself around failures and errors. And it needed to be designed for Google’s way of gathering and reading data, allowing multiple applications to append data to the system simultaneously in large volumes and to access it at high speeds.
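GFS supports that workload with a record-append operation, so many clients can add data concurrently without coordinating file offsets. As a rough single-machine analogue only — the `RecordLog` class here is invented, and real GFS append is a distributed, replicated operation — consider length-prefixed records written to a shared log:

```python
# Rough single-machine analogue of an append-oriented store: many writers
# add length-prefixed records to one log, and readers scan it sequentially.
import struct
import threading

class RecordLog:
    def __init__(self, path):
        self._file = open(path, "ab+")
        self._lock = threading.Lock()  # stands in for server-side serialization

    def append_record(self, payload):
        # Each record is a 4-byte big-endian length followed by the payload.
        with self._lock:
            self._file.write(struct.pack(">I", len(payload)) + payload)
            self._file.flush()

    def records(self):
        self._file.seek(0)
        while header := self._file.read(4):
            (length,) = struct.unpack(">I", header)
            yield self._file.read(length)


log = RecordLog("crawl.log")
log.append_record(b"http://example.com -> 200 OK")
log.append_record(b"http://example.org -> 404")
print(list(log.records()))
```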
MarkLogic: The Operational Database for Big Data
MarkLogic offers next-generation database technology capable of handling data of any structure, at any volume. Organizations around the world rely on MarkLogic’s enterprise-grade technology to make better decisions faster.