The hackers who broke into Equifax exploited a flaw in open-source server software

That vulnerability, according to a report on the data breach by William Baird & Co., was in a popular open-source software package called Apache Struts, which is a programming framework for building web applications in Java. Two vulnerabilities in Struts have been discovered so far in 2017. One was announced in March, and another was announced earlier this week on Sept. 4. At the moment, it’s unclear which vulnerability the Baird report was referring to.

Source: The hackers who broke into Equifax exploited a flaw in open-source server software — Quartz

The bug specifically affects a popular plugin called REST, which developers use to handle web requests, like data sent to a server from a form a user has filled out. The vulnerability relates to how Struts parses that kind of data and converts it into information that can be interpreted by the Java programming language. When the vulnerability is successfully exploited, malicious code can be hidden inside of such data, and executed when Struts attempts to convert it.
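The general class of bug described here, code running as a side effect of converting untrusted data into objects, can be shown with a harmless analogy. The sketch below uses Python's pickle module rather than Struts or Java, so it is not the Equifax exploit; the Payload class and the sample form data are made up for illustration.

```python
import json
import pickle


class Payload:
    """Stand-in for attacker-controlled data (hypothetical and harmless)."""

    def __reduce__(self):
        # pickle calls the returned callable with these arguments while loading.
        return (eval, ("6 * 7",))


untrusted_bytes = pickle.dumps(Payload())

# Unsafe: loading runs eval("6 * 7") as a side effect of the conversion.
result = pickle.loads(untrusted_bytes)

# Safer: a data-only format yields plain dicts, lists, strings, and numbers.
form_data = json.loads('{"name": "alice", "age": "30"}')
```

The point of the contrast is that the first parser lets the data choose which code runs, while the second can only ever produce inert values.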

Eight Ways to Blacklist with Apache’s mod_rewrite

With the imminent release of the next series of (4G) blacklist articles here at Perishable Press, now is the perfect time to examine eight of the most commonly employed blacklisting methods achieved with Apache’s incredible rewrite module, mod_rewrite. In addition to facilitating site security, the techniques presented in this article will improve your understanding of the different rewrite methods available with mod_rewrite.

via Eight Ways to Blacklist with Apache’s mod_rewrite | Perishable Press.
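In mod_rewrite, a RewriteCond can test server variables such as %{HTTP_USER_AGENT} or %{QUERY_STRING} against a pattern, and a RewriteRule with the [F] flag answers with 403 Forbidden. The matching logic can be sketched in Python as below. This is only an emulation of the idea, not Apache configuration, and the rules and patterns are hypothetical examples, not the ones from the Perishable Press article.

```python
import re

# Hypothetical rules: (request attribute, pattern). Not taken from the article.
BLACKLIST = [
    ("user_agent", re.compile(r"badbot|evilscraper", re.IGNORECASE)),
    ("query_string", re.compile(r"(\.\./|<script)", re.IGNORECASE)),
    ("referrer", re.compile(r"spam-domain\.example", re.IGNORECASE)),
    ("remote_addr", re.compile(r"^203\.0\.113\.")),
]


def status_for(request):
    """Return 403 if any blacklist rule matches the request, else 200."""
    for attribute, pattern in BLACKLIST:
        if pattern.search(request.get(attribute, "")):
            return 403
    return 200
```

For example, status_for({"user_agent": "Mozilla/5.0 BadBot/1.0"}) is denied, while a request with an ordinary user agent and query string passes through.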

Solr: The Most Important Open Source Project You’ve Never Heard Of

Lucene is used by many companies and groups as the foundation for their search engines. These organizations include AOL, Disney, and Eclipse. Lucene’s chief selling point is that the indexing engine, with a footprint of a mere megabyte of RAM, can index up to 150GB of text per hour on commercial off-the-shelf hardware. That’s darn good!

Solr comes into the picture as the search platform front-end for Lucene. It provides full-text search, including the ability to handle such formats as Microsoft Word and PDF with Apache Tika; hit highlighting; and faceted search, which incorporates free text searching with topic taxonomy indexing.

via Solr: The Most Important Open Source Project You’ve Never Heard Of.

Under the hood, Solr is written in Java and relies on Lucene for its core functionality. It usually runs within a servlet container such as Jetty, which implements the javax.servlet API.
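To make "faceted search" a little more concrete, here is a toy sketch of the idea: a full-text lookup over an inverted index, with the hits then counted per topic. It does not use Solr or Lucene, and the documents, topics, and function names are all invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical documents: id -> (text, topic).
DOCS = {
    1: ("apache solr search server", "search"),
    2: ("apache hadoop batch processing", "big-data"),
    3: ("lucene search library", "search"),
}

# Inverted index: term -> set of ids of the documents containing it.
index = defaultdict(set)
for doc_id, (text, _topic) in DOCS.items():
    for term in text.split():
        index[term].add(doc_id)


def search(query):
    """Return sorted ids of documents containing every query term, plus facet counts by topic."""
    terms = query.lower().split()
    if not terms:
        return [], Counter()
    hits = set.intersection(*(index.get(term, set()) for term in terms))
    facets = Counter(DOCS[doc_id][1] for doc_id in hits)
    return sorted(hits), facets
```

Searching for "apache" matches documents 1 and 2 and reports one hit under each topic, which is the kind of breakdown a faceted interface shows next to the results.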

Apache Mesos: Dynamic Resource Sharing for Clusters

Apache Mesos is a cluster manager that provides efficient resource isolation and sharing across distributed applications, or frameworks. It can run Hadoop, MPI, Hypertable, Spark (a new framework for low-latency interactive and iterative jobs), and other applications. Mesos is open source in the Apache Incubator.

via Apache Mesos: Dynamic Resource Sharing for Clusters.

Mesos is being used to manage clusters at Twitter, Airbnb, Conviva, UC Berkeley, and UC San Francisco.

Apache plugin turns legit sites into bank-attack platforms

Bureau didn’t say how the site running the plugin was hacked. Many legitimate websites used in malware attacks are commandeered after administrator credentials are compromised. He said the malicious Apache plugin is separate from a Linux rootkit discovered last month that also injects malicious content into otherwise legitimate webpages.

via Apache plugin turns legit sites into bank-attack platforms | Ars Technica.

New Apache project will Drill big data in near real time

Because Hadoop uses MapReduce to perform data queries, searches have to be done in batches. So, while you can perform highly detailed analysis of historical data, for instance, one area you would not want to use Hadoop for is transactional data. Transactional data, by its very nature, is highly complex and fluid, as a transaction on an ecommerce site can generate many steps that all have to be implemented quickly.

Nor would it be efficient for Hadoop to be used to process structured data sets that require very minimal latency, such as a Web site served up by a MySQL database in a typical LAMP stack. That’s a speed requirement that Hadoop would poorly serve.

via New Apache project will Drill big data in near real time | ITworld.

Expanding supported query languages will be one area of focus for the Drill project. Another will be adding support for additional formats, such as JSON, since right now Dremel, the Google system on which Drill is modeled, only supports the Google Protocol Buffer Format.
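The batch behavior the ITworld excerpt attributes to MapReduce is easier to see in a toy version. The sketch below is plain Python, not Hadoop, and the function names are made up; it only shows that the reduce step cannot start until every map output has been grouped, which is why this style suits historical analysis better than low-latency lookups.

```python
from collections import defaultdict


def map_phase(records):
    """Emit a (word, 1) pair for every word in every input line."""
    for line in records:
        for word in line.split():
            yield word.lower(), 1


def shuffle(pairs):
    """Group all values by key; this consumes the entire map output first."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the grouped counts for each word."""
    return {key: sum(values) for key, values in groups.items()}


def run_job(records):
    """Run the whole batch: map, then shuffle, then reduce."""
    return reduce_phase(shuffle(map_phase(records)))
```

Even for a two-line input such as run_job(["drill hadoop", "Hadoop batch"]), no count is available until the whole input has passed through all three phases.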

Apache Accumulo

The Apache Accumulo™ sorted, distributed key/value store is a robust, scalable, high performance data storage and retrieval system. Apache Accumulo is based on Google’s BigTable design and is built on top of Apache Hadoop, Zookeeper, and Thrift. Apache Accumulo features a few novel improvements on the BigTable design in the form of cell-based access control and a server-side programming mechanism that can modify key/value pairs at various points in the data management process. Other notable improvements and features are outlined here.

via Apache Accumulo.
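Cell-based access control means each key/value pair carries its own visibility label, and a scan returns only the cells whose label is satisfied by the reader's authorizations. The sketch below illustrates that idea in plain Python. It does not use Accumulo's Java API, the cells and labels are invented, and the tiny parser evaluates & and | left to right with no validation, which is a simplifying assumption.

```python
import re

# Labels are made of names, the operators & and |, and parentheses.
TOKEN = re.compile(r"[A-Za-z0-9_]+|[&|()]")


def is_visible(expression, authorizations):
    """Return True if the set of authorizations satisfies the label expression."""
    tokens = TOKEN.findall(expression)
    if not tokens:
        return True  # assumption: an empty label is visible to everyone
    pos = 0

    def parse_term():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = parse_expr()
            pos += 1  # skip the closing ")"
            return value
        return token in authorizations

    def parse_expr():
        nonlocal pos
        value = parse_term()
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            operator = tokens[pos]
            pos += 1
            right = parse_term()
            value = (value and right) if operator == "&" else (value or right)
        return value

    return parse_expr()


def scan(cells, authorizations):
    """Return the (key, value) pairs whose label the reader may see."""
    return [(key, value) for key, value, label in cells
            if is_visible(label, authorizations)]


# Hypothetical cells: (key, value, visibility label).
CELLS = [
    ("patient1:name", "Alice", "public"),
    ("patient1:diagnosis", "flu", "doctor|nurse"),
    ("patient1:billing", "$120", "admin&(billing|audit)"),
]
```

A reader holding the authorizations public and nurse sees the name and diagnosis cells but not the billing cell, even though all three live in the same row.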

Troll sues Facebook, Amazon and others for using Hadoop

Big data has become the latest front for the patent troll epidemic as a shell company is suing firms for using a common open-source storage framework known as the Hadoop Distributed File System (HDFS).

via Troll sues Facebook, Amazon and others for using Hadoop — Tech News and Analysis.

Hadoop has been built by a large network of contributors, including individual developers and large companies like Yahoo, and is an Apache Software Foundation project. HDFS, its storage component, was based on the Google File System. Parallel Iron’s patent complaints, however, say the whole system was made possible by four men:

Project Serengeti

Serengeti is an open source project initiated by VMware to enable the rapid deployment of an Apache Hadoop cluster (HDFS, MapReduce, Pig, Hive, ...) on a virtual platform.

Serengeti 0.5 currently supports vSphere, with the ability to support other platforms. The project is at an early stage, and is endorsed by all major Hadoop distributions including Cloudera, Greenplum, Hortonworks and MapR.

via Project Serengeti.