Breakthrough Listen Releases Largest-Ever Dataset of Radio Observations

Two petabytes of raw SETI data now available to researchers worldwide through open science portal.

Date on File

June 2, 2022

Berkeley, California - The Breakthrough Listen initiative, humanity's most comprehensive search for extraterrestrial radio signals, has published its entire observational dataset, 2 petabytes of raw radio data, to a publicly accessible online portal. The move represents an unprecedented democratization of SETI science, placing years of observations into the hands of researchers worldwide at no cost.

Breakthrough Listen is funded by Yuri Milner's Breakthrough Initiatives and operates through partnerships with major radio telescopes including the Green Bank Telescope in West Virginia, the Parkes Observatory in Australia, and the Murchison Widefield Array in Western Australia. Since 2015, the project has conducted the most sensitive and comprehensive search for narrowband radio signals from beyond Earth ever attempted.

"This is science at scale," said Andrew Siemion, Director of the UC Berkeley SETI Research Center, which hosts and maintains the Breakthrough Listen database. "We're not just releasing observations—we're releasing the raw, uncalibrated data. Any researcher can download it, process it with their own algorithms, and search for signals we might have missed."

What's in the Archive

The 2-petabyte collection comprises observations of more than 1 million nearby stars, nearby galaxies, and the galactic plane. The data spans multiple frequency ranges, from roughly 400 megahertz to 100 gigahertz, covering the most promising portions of the electromagnetic spectrum for SETI work. Included are observations from dedicated SETI campaigns as well as "piggyback" observations, in which Breakthrough Listen records data while a telescope carries out its routine astronomical programs.

Each observation includes not just the raw intensity measurements but also information about signal polarization (the orientation of electromagnetic waves), spectral characteristics, and the position and timing of observations. The dataset is organized by source, frequency range, and date, allowing researchers to reconstruct the sky as Breakthrough Listen saw it on any given night.
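As a rough illustration of that organization, the sketch below models an archive index keyed by source, frequency range, and date. Every name in it (the `Observation` fields, `find_observations`, the source labels, and the filenames) is hypothetical and chosen for illustration; the portal's actual schema is not described here.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Observation:
    """One archive entry. All field names here are hypothetical."""
    source: str
    freq_low_mhz: float
    freq_high_mhz: float
    observed_on: date
    filename: str


def find_observations(
    catalog: Iterable[Observation],
    source: str,
    freq_mhz: Optional[float] = None,
    observed_on: Optional[date] = None,
) -> List[Observation]:
    """Return entries for `source`, optionally narrowed by frequency and date."""
    matches = []
    for obs in catalog:
        if obs.source != source:
            continue
        if freq_mhz is not None and not (obs.freq_low_mhz <= freq_mhz <= obs.freq_high_mhz):
            continue
        if observed_on is not None and obs.observed_on != observed_on:
            continue
        matches.append(obs)
    return matches


# Made-up entries for illustration only; not real archive contents.
catalog = [
    Observation("EXAMPLE_STAR_A", 1100.0, 1900.0, date(2020, 1, 15), "example_a_lband.dat"),
    Observation("EXAMPLE_STAR_A", 4000.0, 8000.0, date(2020, 1, 15), "example_a_cband.dat"),
    Observation("EXAMPLE_STAR_B", 1100.0, 1900.0, date(2020, 2, 3), "example_b_lband.dat"),
]

hits = find_observations(catalog, "EXAMPLE_STAR_A", freq_mhz=1420.0)
print([obs.filename for obs in hits])  # ['example_a_lband.dat']
```

Filtering on all three keys at once is what would let a researcher pull every recording of one target in one band on one night.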

Much of the data has already been processed through Breakthrough Listen's primary signal-detection pipeline, which searches for narrowband signals, the kind that would most clearly indicate artificial origin. A narrowband signal is precisely the opposite of natural radio emission. Pulsars, magnetars, and active galactic nuclei produce broadband "static" across wide frequency ranges. An artificial radio transmitter, by contrast, would concentrate its power into a narrow frequency band to maximize efficiency. Against that broadband background, such a signal would stand out as a sharp spike confined to a handful of frequency channels.
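A back-of-the-envelope sketch shows why concentrating power matters. It assumes a simplified receiver that splits the band into equal channels with the same noise level in each; the channel count, power, and noise values are arbitrary illustrative numbers, not figures from Breakthrough Listen.

```python
def power_per_occupied_channel(total_power: float, occupied_channels: int) -> float:
    """Power landing in each channel when total_power is spread evenly."""
    if occupied_channels < 1:
        raise ValueError("occupied_channels must be at least 1")
    return total_power / occupied_channels


# Illustrative assumptions only.
N_CHANNELS = 1_000_000       # channels across the observed band
TOTAL_POWER = 100.0          # same transmitted power in both cases (arbitrary units)
NOISE_PER_CHANNEL = 1.0      # receiver noise in each channel (same units)

# Narrowband: all power falls into a single channel.
narrow_snr = power_per_occupied_channel(TOTAL_POWER, 1) / NOISE_PER_CHANNEL

# Broadband: the same power is smeared across every channel.
broad_snr = power_per_occupied_channel(TOTAL_POWER, N_CHANNELS) / NOISE_PER_CHANNEL

print(narrow_snr)  # 100.0
print(broad_snr)   # 0.0001
```

Under these assumptions the same transmitter is a hundred times the noise in one channel when narrowband, and far below the noise in every channel when broadband.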

"Our detection pipeline looks for candidates with signal-to-noise ratios above threshold, then applies a series of filters to rule out terrestrial and instrumental artifacts," explained Siemion. "But the pipeline is only as good as the algorithms that define it. Release the raw data, and you invite the global research community to bring fresh eyes—and fresh algorithms—to the problem."

Why Open Science Matters

The decision to release raw data reflects a philosophy of radical transparency in SETI. The old model—where a small team of researchers controlled data and publication timelines—has given way to open science. If Earth receives a signal from an extraterrestrial source, its discovery should not hinge on the expertise or biases of any single research group.

Open data also accelerates the development of new analysis techniques. Machine learning researchers can train neural networks on the Breakthrough Listen dataset to recognize signal patterns humans might miss. Graduate students at universities worldwide can use the data for their dissertations. Amateur astronomers and citizen scientists can download portions of the archive and contribute their own analyses.

The archive includes metadata on Earth-originating radio interference—known as radio frequency interference, or RFI. Satellites, broadcast stations, cell towers, and radar installations all produce signals that mimic narrowband transmissions. Learning to distinguish genuine extraterrestrial candidates from RFI is half the battle in SETI work. The released dataset includes examples of both, providing a training ground for developing better filters.
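The article does not describe how such filters work. One widely used idea in radio SETI, assumed here rather than taken from the text, is to compare pointings on the target with pointings away from it: a signal that appears in both is probably local interference. The function name, the 10 Hz matching tolerance, and the hit frequencies below are hypothetical.

```python
from typing import List, Sequence


def reject_rfi(
    on_hits_hz: Sequence[float],
    off_hits_hz: Sequence[float],
    tolerance_hz: float = 10.0,
) -> List[float]:
    """Keep ON-target hits with no OFF-target hit within tolerance_hz."""
    return [
        f_on
        for f_on in on_hits_hz
        if all(abs(f_on - f_off) > tolerance_hz for f_off in off_hits_hz)
    ]


# Illustrative hit lists (frequencies in Hz); not real detections.
on_hits = [1_420_000_000.0, 1_575_420_000.0]
off_hits = [1_575_420_003.0]  # also seen when pointed away from the target

survivors = reject_rfi(on_hits, off_hits)
print(survivors)  # [1420000000.0]
```

A hit that survives this check is not a detection, only a candidate that has passed one test; a labeled set of known RFI is what would let researchers measure how often such a filter is fooled.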

Access and Future Work

The Breakthrough Listen open data portal is hosted by the UC Berkeley SETI Research Center and is accessible via standard HTTP and FTP protocols. Data is organized by observation date and source, with comprehensive metadata for each file. Researchers interested in specific stars or galaxies can use the portal's search tools to locate the relevant observations.

Breakthrough Listen continues to conduct new observations and will add incoming data to the archive on a rolling basis. The project is also working with the upcoming Square Kilometre Array (SKA), a next-generation radio telescope under construction in South Africa and Australia, which will offer unprecedented sensitivity.

"The release of this dataset is not the end of Breakthrough Listen's work—it's an inflection point," Siemion said. "We've moved from a era where SETI was done by a few facilities with limited computing power, to one where the entire field can participate. The next breakthrough could come from someone we haven't even met yet, working in a country we didn't expect, using algorithms we haven't invented. That's the promise of open data."

For researchers interested in accessing the dataset, documentation and download instructions are available through the Breakthrough Listen website.
