Information Extraction and Synthesis Laboratory

Cross-document coreference resolution is the task of grouping the entity mentions in a collection of documents into sets that each represent a distinct entity. It is central to knowledge base construction and also useful for joint inference with other NLP components. Obtaining large, organic labeled datasets for training and testing cross-document coreference has previously been difficult. We use a method for automatically gathering massive amounts of naturally-occurring cross-document reference data to create the Wikilinks dataset, comprising 40 million mentions over 3 million entities. Our method is based on finding hyperlinks to Wikipedia from a web crawl and using anchor text as mentions. In addition to providing large-scale labeled data without human effort, we are able to include many styles of text beyond newswire and many entity types beyond people.

via Wikilinks – Information Extraction and Synthesis Laboratory.
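The core of the method described in the excerpt — treating hyperlink anchor text as an entity mention labeled by its Wikipedia target — can be sketched in a few lines. This is not the Wikilinks pipeline itself, just a minimal illustration using Python's standard `html.parser`; the class name and sample HTML are invented for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, unquote

class WikiAnchorParser(HTMLParser):
    """Collect (anchor text, Wikipedia article title) pairs from raw HTML."""

    def __init__(self):
        super().__init__()
        self.in_wiki_link = False
        self.current_title = None
        self.text_parts = []
        self.mentions = []  # list of (mention, entity) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href", "")
            parsed = urlparse(href)
            # A link into Wikipedia's article namespace labels its anchor text
            # with the entity named by the article title.
            if parsed.netloc.endswith("wikipedia.org") and parsed.path.startswith("/wiki/"):
                self.in_wiki_link = True
                self.current_title = unquote(parsed.path[len("/wiki/"):])
                self.text_parts = []

    def handle_data(self, data):
        if self.in_wiki_link:
            self.text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.in_wiki_link:
            mention = "".join(self.text_parts).strip()
            if mention:
                self.mentions.append((mention, self.current_title))
            self.in_wiki_link = False

# Toy crawl document (invented for illustration):
html = '<p>Born in <a href="https://en.wikipedia.org/wiki/Ulm">Ulm</a>, he later moved.</p>'
parser = WikiAnchorParser()
parser.feed(html)
print(parser.mentions)  # [('Ulm', 'Ulm')]
```

Run over a web crawl, each extracted pair is one labeled mention: two mentions sharing a target title belong to the same coreference cluster, which is how the dataset acquires labels without human annotation.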

Finding the Source of the Pioneer Anomaly

These spacecraft also underscore the value of data preservation. In the early days of the Pioneer missions, scientists and engineers often viewed the medium as more valuable than the data it contained. Many considered raw data to be worthless once “useful” scientific and technical information had been extracted. Nowadays data storage may be cheap, but we’re still in danger of suffering from shortsightedness when it comes to data custodianship. Every experiment needs a clear plan in place to ensure that a record of the original observations is still available and readable, even decades into the future. It may very well be the only way we’ll resolve the next confounding mystery.

via Finding the Source of the Pioneer Anomaly – IEEE Spectrum.

The elusive capacity of networks

What makes that question particularly hard to answer is that no one knows how to calculate the data capacity of a network as a whole — or even whether it can be calculated. Nonetheless, in the first half of a two-part paper, which was published recently in IEEE Transactions on Information Theory, MIT’s Muriel Médard, the California Institute of Technology’s Michelle Effros and the late Ralf Koetter of the Technical University of Munich show that in a wired network, network coding and error-correcting coding can be handled separately, without reduction in the network’s capacity. In the forthcoming second half of the paper, the same researchers demonstrate some bounds on the capacities of wireless networks, which could help guide future research in both industry and academia.

via The elusive capacity of networks – MIT News Office.

Network coding

Network coding – Wikipedia, the free encyclopedia.

Network coding is a technique in which, instead of simply relaying the packets of information they receive, the nodes of a network take several packets and combine them for transmission. This can be used to attain the maximum possible information flow in a network. Network coding is a field of information theory and coding theory.
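The classic illustration of combining packets at a node is the butterfly network, where a bottleneck link carries the XOR of two source packets instead of either one alone, letting both sinks recover both packets. A minimal sketch (the packet contents are invented for the example):

```python
def xor_bytes(a, b):
    """Combine two equal-length packets by bitwise XOR (the simplest linear network code)."""
    return bytes(x ^ y for x, y in zip(a, b))

# Two source packets that both sinks of the butterfly network want.
p1 = b"HELLO"
p2 = b"WORLD"

# The bottleneck node transmits the combination rather than forwarding
# p1 or p2 alone, so one transmission serves both sinks.
coded = xor_bytes(p1, p2)

# Sink A already received p1 on a side link; XOR recovers p2.
assert xor_bytes(coded, p1) == p2
# Sink B already received p2 on a side link; XOR recovers p1.
assert xor_bytes(coded, p2) == p1
```

With plain routing, the bottleneck link could carry only one of the two packets per use; the XOR combination is what achieves the max-flow bound in this topology.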