DIRAC Postdoctoral Positions

The DIRAC Institute in the Department of Astronomy at the University of Washington is seeking applicants for two postdoctoral positions. Candidates should have a strong research record in the development of statistical techniques or algorithms for analyzing large astrophysical data sets.

AstroML: The first position is to help develop the second edition of astroML (http://astroml.org), a popular Python-based machine learning package for astrophysics. New components we are incorporating within astroML include methodologies from deep learning and hierarchical Bayesian statistics. Special emphasis will be placed on building a broader community and making astroML a sustainable open-source project. The successful candidate will lead these activities, including the application of the new codes to data sets available to UW researchers.

Time Series Data: The second position is to develop new approaches for analyzing astronomical time series data using modern computational frameworks, with the goal of enabling science with the ZTF and LSST data sets. Promising applicants should have an interest in time domain science and experience with, or interest in, databases and large-scale compute platforms such as Spark, Dask, or similar. Good Python skills and experience with machine learning libraries, processing of astronomical images, or astronomical databases are desirable.

The DIRAC Institute is a newly formed center for data intensive astrophysics at the University of Washington. The Institute consists of six faculty and senior fellows, and over 20 postdoctoral researchers and research scientists. It has active research programs in Cosmology, Solar System science, Milky Way structure, the Variable and Transient Universe, and Astronomical Software.

The University of Washington is a partner in the Zwicky Transient Facility (ZTF) project, a new time-domain survey that will begin operations in early 2018. The UW is a founding partner of the LSST project, and leads the construction of its time domain and solar system processing pipelines. Other research activities at UW/DIRAC include topics in extragalactic science, as well as understanding the structure, formation, and evolution of the Milky Way using large surveys (SDSS, WISE, Pan-STARRS PS1, and others).

A Ph.D. degree in astronomy, physics, computer science, or a related subject is required. The initial appointment is for two years, renewable up to three years, and offers a competitive salary and benefits. The appointments are available immediately and are expected to start no later than September 2018.

Applicants should submit a curriculum vitae and a description of research interests (with links to GitHub if relevant), and arrange for three letters of reference to be submitted to Nikolina Horvat at horvat@uw.edu with the subject line “DIRAC postdoc application (your name)”. Applications will be accepted until the positions are filled; to ensure full consideration, please send your application by December 31st, 2017.

For detailed information about the benefits available through the University of Washington, including dental, medical and disability insurance, retirement, and childcare centers, see the University of Washington benefits page: https://www.washington.edu/admin/hr/benefits/.

The DIRAC Institute is a community of people with diverse interests and areas of expertise, engaged in the understanding of our universe through the analysis of large and complex data sets. We are an open, ethical, highly engaged and collaborative community based on trust, transparency and mutual respect. We believe in providing a welcoming and inclusive environment, in the importance of quality of life, in embracing diversity, in making a difference and having fun.

2020-2-10 Seminar: Stephanie Juneau

When:

Monday, February 10th, 2020 @ 12:30pm  

Where:

PAA, Room A214

The Influence of Giant Black Holes on the Fate of Galaxies, Stephanie Juneau (NSF’s OIR Lab)

Supermassive black holes – with masses of millions to billions of times that of the Sun – reside in the nuclei of galaxies. While black holes are not directly visible, surrounding material becomes extremely luminous before being accreted, creating telltale signatures of black hole activity. In turn, the amount of activity tells us about black hole growth, and about energy injection back into the host galaxies. This so-called black hole feedback is thought to play a role in regulating the rate at which galaxies form new stars, thereby affecting directly their evolution across cosmic time. After a brief overview, I will highlight new findings from a multi-scale analysis of gas ionization and dynamics thanks to 3D spectroscopy with the VLT/MUSE instrument. I will then present observational constraints on the fueling of black holes, and on the extent to which they can change the fate of galaxies from statistical analyses of large datasets derived from SDSS. The latter are paving the way to yet larger experiments such as the Dark Energy Spectroscopic Instrument (DESI), which will yield over 35 million spectra of galaxies and quasars. I will conclude by briefly showcasing how the Astro Data Lab (datalab.noao.edu) and other science platforms play a role in the analysis of large datasets to further our knowledge on supermassive black holes, galaxies, and beyond.

About Stephanie Juneau

Stephanie joined the NSF’s OIR Lab Astro Data Lab team as a staff scientist, coming from a staff scientist position at CEA Saclay in France. She received her PhD in astronomy from the University of Arizona in 2011 under the supervision of NSF’s OIR Lab’s Mark Dickinson. Her research interests focus on the evolution of galaxies and supermassive black holes across cosmic time. She brings to the Data Lab team a wealth of experience and ideas in developing and applying new methods for turning large survey data sets into scientific knowledge.

List of publications

This ADS link features recently published papers by DIRAC researchers.

Astro Machine Learning

The astroML project was started in 2012 to accompany the book Statistics, Data Mining, and Machine Learning in Astronomy, by Željko Ivezić, Andrew Connolly, Jacob Vanderplas, and Alex Gray. 

The astroML Python package is publicly available and designed as a repository of statistical routines and machine learning tools for astrophysics. It builds on the scientific Python ecosystem, on well-known libraries such as NumPy, SciPy, scikit-learn, and Astropy, extending the functionality available in these general-purpose libraries.
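
To give a flavor of the package, the sketch below uses astroML's enhanced `hist` function, which extends `matplotlib.pyplot.hist` with adaptive binning rules such as Bayesian blocks. The simulated data here are a stand-in for any one-dimensional set of measurements; this is a minimal illustration, not an excerpt from the astroML documentation.

```python
import numpy as np
import matplotlib.pyplot as plt
from astroML.plotting import hist  # drop-in replacement for plt.hist

# Simulated stand-in data: a Gaussian "feature" on a uniform background
rng = np.random.default_rng(42)
data = np.concatenate([rng.normal(0.0, 0.5, 500),
                       rng.uniform(-4, 4, 500)])

fig, ax = plt.subplots()
# bins='blocks' selects adaptive-width bins via the Bayesian blocks
# algorithm; bins='knuth' (Knuth's rule) is another astroML option.
hist(data, bins='blocks', ax=ax, histtype='step')
ax.set_xlabel('x')
ax.set_ylabel('counts')
plt.show()
```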

astroML is designed to be a resource for both researchers and students of astronomy and Python. It is envisioned as a community resource, with the development and submission of new algorithms, data sets, and examples facilitated by GitHub’s collaborative coding interface. In addition to being used for astronomical research, several university courses build on astroML, for example at the University of Washington, the University of Cambridge, and Drexel University, to name a few.

astroML strives to bring the astronomical community closer to the ideals of Reproducible Research, in which research papers are accompanied by well-written code to reproduce, check, and extend the results. With this in mind, we share the source code used to generate the figures in both editions of the textbook in a separate GitHub repository.

Updates and news about the astroML project can be found here.

KBMOD: Kernel Based Moving Object Detection

Searching for faint Solar System objects

Kuiper Belt Objects (KBOs) are a population of Solar System objects that exist beyond the orbit of Neptune. Finding these objects is important because understanding their true distribution teaches us about the formation history of the Solar System, and especially about the evolution of the orbits of the gas giants. However, due to their large distances from the Earth and Sun, KBOs are very faint and hard to find. Some existing techniques for finding moving objects rely on the objects being bright enough for observation in a single image, but here at DIRAC we are working on a type of technique based upon “shift-and-stack” algorithms.

Find more information in the following papers [Gladman & Kavelaars 1997, Kuiper Belt searches from the Palomar 5-m telescope]; [Allen et al. 2001, The Edge of the Solar System]; [Bernstein et al. 2004, The Size Distribution of Trans-Neptunian Bodies].

“Shift-and-stack” techniques are able to find objects fainter than those that can be found in a single image, by shifting multiple images of the same part of the sky along the path of a potential orbit and adding up the light from any moving objects following that orbit. Static objects blur out, while potential moving objects pop out as point sources, as in the figure below from our paper [Whidden et al. 2019, Fast Algorithms for Slow Moving Asteroids: Constraints on the Distribution of Kuiper Belt Objects].

Shift and Stack Example
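
In code, the core of shift-and-stack is only a few lines: shift each sky-aligned image by a candidate velocity multiplied by the time since the first exposure, then co-add. The NumPy toy below illustrates the idea with whole-pixel shifts; the real KBMOD implementation additionally handles PSF matching, masking, and sub-pixel trajectories.

```python
import numpy as np

def shift_and_stack(images, times, vx, vy):
    """Co-add exposures along one candidate trajectory (toy version).

    images : (n, H, W) array of sky-aligned exposures
    times  : (n,) observation times relative to the first exposure
    vx, vy : candidate velocity in pixels per unit time
    """
    stack = np.zeros_like(images[0], dtype=float)
    for img, t in zip(images, times):
        # Shift each exposure back along the trajectory so a source
        # moving at (vx, vy) lands on the same pixel in every image.
        dx, dy = int(round(-vx * t)), int(round(-vy * t))
        stack += np.roll(np.roll(img, dy, axis=0), dx, axis=1)
    return stack / len(images)
```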

Digital Tracking

This “shift-and-stack” method originally worked by shifting along a limited number of trajectories, on the order of a few dozen, and then looking by eye for point sources in the resulting stacks. More recently, astronomers developed “digital tracking,” in which computers search many more possible trajectories and find point sources in large stacks of data. As we move to larger stacks of images and longer baselines, we are able to find fainter and slower-moving objects, but the search parameter space also becomes much larger as we go further in time: we want to find the slowest objects without missing faster ones, and so must search a much larger set of possible orbits. To help solve this problem we developed our technique, Kernel Based Moving Object Detection (KBMOD).

KBMOD

To tackle the problem of searching on the order of 10^12 possible orbits, we have turned to Graphics Processing Units (GPUs). GPUs are much better suited to highly parallel applications than traditional CPUs, and since we are repeatedly performing the same operation, adding up the flux values along trillions of trajectories, the GPU is perfect for our algorithm. In fact, KBMOD is capable of searching on the order of 10^10 trajectories in a stack of 10-15 4k-by-4k images in a minute using a consumer-grade GPU. The software is up and running: in our first application of KBMOD, to the 2015 HITS dataset [Förster et al. 2016, The High Cadence Transient Survey (HITS). I. Survey Design and Supernova Shock Breakout Constraints], we discovered 39 new KBOs that were reported to the Minor Planet Center, as well as recovering 6 previously reported objects. Further development of KBMOD is ongoing and we are applying it to new and different datasets.

To follow along with our progress, stay tuned to our GitHub repository.

ADAM: Asteroid Decision Analysis and Mapping

The number of asteroid discoveries has grown over the past few decades, and the number of known asteroids is expected to increase five-fold once the Large Synoptic Survey Telescope (LSST; https://www.lsst.org/) comes online, expected in 2022. With these discoveries comes an increased number of potentially hazardous asteroid (PHA) discoveries. PHAs are near-Earth objects – either asteroids or comets – for which the closest points between their orbits and the Earth’s orbit are less than 0.05 astronomical units (19.5 times the distance between the Earth and Moon) apart, and whose diameters are approximately 460 ft (140 m) or greater. An object this large is big enough to cause devastation to a populated region in the case of a land impact, or a major tsunami for an impact into the ocean.
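
The PHA definition reduces to two simple tests, sketched below. Here `moid_au` stands for the minimum orbit intersection distance between the object's orbit and the Earth's, and the thresholds are the ones quoted above; the function name and signature are illustrative, not taken from any particular catalog's code.

```python
def is_pha(moid_au: float, diameter_m: float) -> bool:
    """Classify an object as a potentially hazardous asteroid (PHA).

    moid_au    : minimum orbit intersection distance with Earth's
                 orbit, in astronomical units
    diameter_m : estimated diameter in meters

    Follows the definition in the text: Earth MOID below 0.05 au
    and diameter of roughly 140 m (460 ft) or greater.
    """
    return moid_au < 0.05 and diameter_m >= 140.0

# Example: a 200 m object whose orbit comes within 0.03 au of Earth's
print(is_pha(0.03, 200.0))  # True
```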

Not all PHAs are likely to impact the Earth in the foreseeable future, but their orbits put them in close enough proximity to the Earth that they need to be monitored in case their impact probability changes. One way the chances of an impact can change is if a PHA comes sufficiently close to a planet for it to gravitationally tug on the asteroid, changing its orbit enough that its future trajectory puts it on an impact course with the Earth. Fortunately, however, the threat due to asteroid impacts is a natural disaster we have the ability to avoid. 

One of the projects supported by the DiRAC Institute is the Asteroid Decision Analysis and Mapping (ADAM) platform being developed by the Asteroid Institute, a program of B612. B612 is a non-profit organization dedicated to protecting the Earth from asteroid impacts as well as advising and advancing decision-making on planetary defense issues on a world-wide scale (https://b612foundation.org).

The ADAM platform is being developed to answer questions such as “How long after discovery does it take for typical Earth-impacting asteroids to be labeled as impact threats?” and “How far in advance do we need to deflect such asteroids to avoid a collision?”. To answer these and other questions, ADAM is being built on Google Cloud, which allows the required computations to be run on a large-scale platform that provides ample data for analysis. One of the goals of ADAM is to make these computations accessible to the greater scientific community, not only in scale and accuracy, but in ease of use. ADAM is being developed as open-source software that, upon completion of initial development and testing, will be available to the scientific community both to use and to contribute to its computational abilities.

One capability of ADAM is to compute large-scale asteroid orbit propagations, which predict the orbital characteristics and locations of a large set of asteroids at future times given their current orbital characteristics. The animation shown here, of an orbit propagation for a synthetic Earth-impacting asteroid, shows the orbital motion of the four terrestrial planets (Mercury, Venus, Earth, and Mars) as well as the asteroid (labeled 129_2011_04_DeltaV) over a period of roughly 8 months. The animation ends when the asteroid impacts the Earth. This animation was produced using the tools upon which ADAM’s orbit propagation is based; visualization of orbit propagations will eventually be offered as part of computing propagations with ADAM.

Along with computing large-scale orbit propagations, ADAM can calculate the deflection impulse, or nudge, needed to avoid an asteroid impact. Such a nudge could be imparted to an asteroid by a spacecraft called a kinetic impactor, which would rendezvous with an Earth-impacting asteroid before its collision with the Earth and gently push the asteroid by an amount large enough to avoid the collision. The goal of such a maneuver is to keep the asteroid and the Earth from arriving at the point where their orbits intersect at the same time, thus avoiding a collision. This is similar to stepping on either the brakes or the gas in your car to avoid a traffic collision.

In a study¹ recently submitted to the journal Icarus and expected to be published in early 2020, members of the Asteroid Institute and the DiRAC Institute, led by Dr. Sarah Greenstreet, used ADAM to determine the distribution of nudges needed by a large sample of synthetic Earth-impacting asteroids to avoid collision with Earth. They found that the required nudges range from a few hundredths of an inch per second to a few inches per second, depending on the time before impact available to impart the nudge. In terms of the energy this would impart to the asteroid: for a 450-ft-diameter asteroid made of typical rocky material, a nudge of roughly half an inch per second is equivalent to the energy required to power a 60 W light bulb for one hour.

The researchers found that the required deflection impulse, or nudge, typically changes roughly as the inverse of the time before impact that the deflection impulse can be applied. This means that a nudge applied to an asteroid 20 years before impact needs to be approximately half the size of a nudge applied 10 years before impact to miss the Earth by the same distance.  
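
The scaling is easy to check numerically. The sketch below encodes the Δv ∝ 1/t rule of thumb with a made-up normalization constant, purely to illustrate the halving described above; the study derives the actual values from orbit propagations of synthetic impactors.

```python
# Illustrative only: the constant k is arbitrary, chosen to show the
# inverse-time scaling rather than any physically calibrated value.
def required_nudge(years_before_impact, k=10.0):
    """Deflection impulse (arbitrary velocity units) ~ k / t."""
    return k / years_before_impact

for t in (5, 10, 20):
    print(f"{t:2d} years out: nudge = {required_nudge(t):.2f}")
# The 20-year nudge is half the 10-year nudge, as in the text.
```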

Another finding of the study described above is that a small fraction of the synthetic Earth-impacting asteroid population studied requires either ten times more or ten times less velocity change (nudge) than the median value. This means some asteroids are either much harder or much easier to deflect than the typical Earth-impacting asteroid. These types of impact scenarios are important to study in addition to the typical cases, to best understand the full breadth of the threat due to asteroid impacts.

An additional capability of ADAM currently being developed is determining the evolution of the impact probability for a large sample of synthetic Earth-impacting asteroids. For a given asteroid, as further observations are made after discovery, the computed orbit of the asteroid evolves. Each new observation adds data that can be used to compute an orbit, with each new calculation producing a different orbit until the solution stabilizes and further observations do very little to change it. As the determined orbit evolves, so does the probability of a future Earth impact for the computed orbit. Like the evolution of the asteroid’s orbit, the impact probability changes with additional observations. Studying the impact probability evolution of a large sample of synthetic Earth-impacting asteroids can provide a better understanding of how long it can take to say with confidence that an impact is expected, across a wide range of Earth-impact scenarios.
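
A standard way to turn orbital uncertainty into an impact probability is Monte Carlo sampling: draw many orbit realizations consistent with the observations, propagate each to the encounter, and count the fraction that hit. The sketch below compresses that idea into a toy; the `propagate` argument is a placeholder for a real orbit propagator, and ADAM's actual machinery is far more sophisticated.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0

def impact_probability(mean_state, cov, propagate,
                       n_samples=100_000, seed=0):
    """Toy Monte Carlo impact probability.

    mean_state, cov : fitted orbit solution (state vector, covariance)
    propagate       : callable mapping one sampled state vector to the
                      miss distance from Earth's center (km) at the
                      encounter; stands in for a real propagator.
    """
    rng = np.random.default_rng(seed)
    samples = rng.multivariate_normal(mean_state, cov, size=n_samples)
    miss = np.apply_along_axis(propagate, 1, samples)
    return np.mean(miss < EARTH_RADIUS_KM)
```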

Altogether, the capabilities of ADAM are helping us to better understand the threat due to asteroid impacts and what we can do to avoid them. This information can further future discussions and decisions regarding impact hazard mitigation on a global scale, as is the mission of the Asteroid Institute.

¹Greenstreet, S., Lu, E., Loucks, M., Carrico, J., Kichkaylo, T., & Jurić, M., “Required deflection impulses as a function of time before impact for Earth-impacting asteroids”, 2019, Icarus, in review.

Dr. Sarah Greenstreet is a joint postdoctoral fellow with the Asteroid Institute, a program of B612, and the DiRAC Institute at the University of Washington. Her research interests include the study of orbital dynamics and impacts of small bodies in the Solar System.

THOR: Tracklet-less Heliocentric Orbit Recovery

Key Points:

  • Discovering Solar System small bodies (asteroids and comets) is not easy
  • Currently deployed algorithms impose a strict constraint on how telescopes operate and what datasets can be used to discover these objects
  • Tracklet-less Heliocentric Orbit Recovery (THOR) is a state-of-the-art algorithm designed to remove this restriction and open up the possibility of more discoveries
  • Built on top of a well understood physical framework, and by leveraging the latest technologies in high performance cloud computing and machine learning, THOR is capable of discovering moving objects in datasets that would otherwise be unsuitable or unoptimized for Solar System small body discovery

Our Solar System is the current frontier of human and robotic exploration. Part of planetary science and Solar System astronomy is tasked with informing decisions about where and what to explore next. The DIRAC Institute has been helping answer these questions by developing state-of-the-art algorithms that enable the discovery of asteroids and comets — small bodies in our Solar System that may well be the next points of interest on the ever-growing map of our Solar System.

Discovering small bodies is not an easy task. Unlike most astrophysical objects, small bodies move on appreciable time scales; asteroids and comets move at a wide variety of speeds, and our own motion, the motion of the observer, further complicates the problem. As bigger telescopes are built, the number of observations that need to be processed also increases dramatically. The problem is akin to throwing a bucket full of sand on a table and taking a picture of it, then having a friend shake the table and taking a new picture: your task is to figure out which grain moved where using just those two images. With unlimited computing resources, you would approach this problem by letting a telescope observe the sky and, every time a new detection occurs, testing all other unidentified detections for a possible linkage with it. These linkages are known as orbits and define how an object moves in space. To discover a moving object is to know, with a high degree of certainty, its orbit. But when a telescope generates millions of new detections in a single evening, it is simply not possible to test every combination of un-linked detections for an orbit.

Astronomers figured out a way to solve this problem: the “tracklet”. A tracklet is a combination of two or more detections, typically separated by no more than 30 minutes. A tracklet, which is essentially just a motion vector, constrains the position and speed of a potential moving object. In a 30-minute span, an asteroid or comet can only have moved a certain distance, and so, by limiting the time between two exposures of the sky, a survey telescope limits the number of combinations of detections it will need to test for an orbit down the road. Typically, to discover a moving object a telescope needs to observe three tracklets (three pairs of at least two detections) over a two-week window.
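
A tracklet builder is conceptually simple: pair up detections that are close in both time and sky position, and record the implied motion vector. The toy sketch below makes the combinatorial point directly: the pairing loop is quadratic in the number of detections, which is exactly why surveys cap the time window.

```python
import numpy as np

def build_tracklets(times, ra, dec, max_dt=30 / 1440, max_sep_deg=0.5):
    """Pair detections into tracklets (toy version).

    times : detection times in days (max_dt = 30 minutes in days)
    ra, dec : sky positions in degrees
    Returns (i, j, ra_rate, dec_rate) tuples with rates in deg/day.
    """
    tracklets = []
    order = np.argsort(times)
    for a in range(len(order)):
        for b in range(a + 1, len(order)):
            i, j = order[a], order[b]
            dt = times[j] - times[i]
            if dt > max_dt:
                break  # time-sorted: no later detection can pair with i
            sep = np.hypot((ra[j] - ra[i]) * np.cos(np.radians(dec[i])),
                           dec[j] - dec[i])
            if dt > 0 and sep < max_sep_deg:
                tracklets.append(
                    (i, j, (ra[j] - ra[i]) / dt, (dec[j] - dec[i]) / dt))
    return tracklets
```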

Tracklets are problematic, however. Requiring tracklets for moving object discovery forces telescopes to operate in a very specific way: they must come back to the same area of the sky for at least a second exposure within 30 minutes. Effectively, discovering moving objects by building tracklets limits a telescope to observing, at best, only half the amount of sky it could otherwise cover in a single night. Datasets from past missions or surveys that did not have this specific cadence are also unsuitable for retrospective searches for moving objects, since they do not allow tracklets to be built.

Tracklet-less Heliocentric Orbit Recovery. Aside from the awesome acronym, THOR aims to solve these problems by removing the need for tracklets. The algorithm exploits certain aspects of the motion of small bodies in the Solar System. THOR assumes a series of test orbits; once a test orbit is assumed, you know exactly where in space that potential object will be at any point in time in the past or future. This means you can look for that potential object in datasets from any survey, regardless of the time between detections (i.e., no tracklets needed!). Naively, if there are 800,000 objects you would need to test 800,000 orbits to discover them all. However, small bodies in the Solar System tend to have similar orbits. THOR exploits this fact: as opposed to needing one orbit to discover one object, a single test orbit can be used to discover hundreds or even thousands of objects. The power of the THOR framework is that all you need to discover more moving objects is another well-selected test orbit.
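
The computational trick can be sketched in a few lines: subtract the test orbit's predicted sky position from every detection, so that any real object on a similar orbit collapses into a tight clump, then find clumps with an off-the-shelf clustering algorithm. The sketch below uses scikit-learn's DBSCAN for that step; it is a conceptual illustration of the idea, not the actual THOR code, and `predict` is a placeholder for a real orbit propagator.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def find_candidates(det_ra, det_dec, det_times, predict, eps_deg=0.01):
    """Cluster detections in a frame co-moving with one test orbit.

    predict : callable returning the test orbit's (ra, dec) in degrees
              at a given time; stands in for an orbit propagator.
    Detections of an object on a similar orbit stay nearly fixed after
    the prediction is subtracted, so they form a dense cluster.
    """
    pred = np.array([predict(t) for t in det_times])
    residuals = np.column_stack([det_ra, det_dec]) - pred
    labels = DBSCAN(eps=eps_deg, min_samples=3).fit_predict(residuals)
    return labels  # non-negative labels are candidate objects; -1 is noise
```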

What has THOR achieved thus far? We ran THOR on two weeks’ worth of detections from the Zwicky Transient Facility (a survey operating from the Palomar Observatory in California). ZTF’s internal tracklet-based moving-object algorithm recovered about 14,000 previously known moving objects in that two-week period. THOR, running on the exact same dataset, recovered a little over 21,000 objects (97% of the objects with at least five detections), a factor of 1.5 improvement. We are now working to deploy a newer, better, and faster version of THOR on all of the detections coming from ZTF.

THOR is a completely open-source project. Find it on GitHub: https://github.com/moeyensj/thor

Letter From the Director

Welcome to the DiRAC Institute newsletter. As we head towards the winter solstice, if you were lucky with the clouds and the near-full moon, you may have seen the Geminid meteor shower, which peaked early in the morning of December 14th. The meteors you saw streaking across the sky are the result of debris burning up in our atmosphere, shed from the asteroid (3200) Phaethon.

We have been getting quite a few questions about another set of streaks and objects that cross the sky: the Starlink satellites. These satellites are the first of a new generation of communication satellites that will be launched over the next few years. If you get the chance to see them just after launch, they appear as a train of bright dots moving rapidly across the sky. They are a spectacular sight but, given the sensitivities of professional astronomical telescopes and instruments, if the number of satellites reaches the tens of thousands that have been proposed, they could pose a significant challenge for future astronomical observations. Some of our Large Synoptic Survey Telescope (LSST) scientists are working to understand the impact these satellite constellations might have on LSST science, and ways we might mitigate those effects.

In the context of the LSST, we recently had the opportunity to present some of our ideas and work on new ways of thinking about astronomy in the era of big data. At the Petabytes-to-Science conference in October, Mario Jurić and Dino Bektešević demonstrated their work on frameworks that can use the cloud to process the mass of images that will come from LSST and then stream and analyze the millions of “events” (e.g. supernovae, variable stars, and asteroids) that will be discovered in these images.

Looking forward, many of the researchers in DiRAC are busily preparing for the next American Astronomical Society (AAS) meeting in January. At the AAS, DiRAC will be hosting a workshop to introduce the 2nd edition of our machine learning for astronomy textbook (the “astroML book”) which was released in December. Our newest DiRAC Fellows, Kyle Boone and Keaton Bell (who is also an NSF postdoctoral fellow), will be presenting their research on supernova cosmology and the seismology of stars. Perhaps the most fun project that we will be presenting resulted from an idea to see if, by working as a group and using the big-data analysis tools developed at DiRAC and the data from the Zwicky Transient Facility, we could find new examples of the mysterious Tabby’s star (one of the most unusual variable stars in our galaxy). Watching a group of researchers collaboratively explore massive data sets in real time (collaboratively means lots of yelling of ideas and laughing) is a wonderful thing. You can read more about this fascinating work from Meredith Rawls a little later in this newsletter.

I hope you will enjoy reading about the research that is ongoing at DiRAC (38 publications this year) and will join us for some of the public events we have planned for the coming year.

Sincerely,

Andrew Connolly

Professor, Department of Astronomy

Director, DiRAC Institute

Read all December Newsletter articles here

Meet DiRAC’s Research Team: Dr. Kyle Boone

Understanding why the expansion of the universe is getting faster with time is one of the biggest mysteries in cosmology today. Kyle Boone, a fellow at the DiRAC Institute, focuses his research on developing novel statistical methods for astronomy and cosmology. He is particularly interested in using Type Ia supernovae to probe the accelerated expansion of the universe that researchers believe is due to some form of “dark energy”.

Dr. Boone is a DiRAC Postdoctoral Fellow. He joined the DiRAC Institute in September 2019 after finishing his PhD in Physics at the University of California, Berkeley. He previously received a Bachelor’s degree in Engineering Physics from the University of British Columbia in Vancouver, Canada, in 2013.

Type Ia supernovae are interesting for cosmology because they can be used as “standard candles” to make a map of the universe. In the late 1990s, with a sample of only 50 Type Ia supernovae, researchers were able to discover that the expansion of the universe was accelerating. Upcoming astronomical surveys such as the Large Synoptic Survey Telescope (LSST) will discover hundreds of thousands of Type Ia supernovae, and will enable researchers to probe the fundamental nature of dark energy and the accelerated expansion of the universe.
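
The “standard candle” idea rests on a single formula: if Type Ia supernovae all peak near the same absolute magnitude M, an observed peak magnitude m gives a distance through the distance modulus m − M = 5 log10(d / 10 pc). A minimal sketch, using M ≈ −19.3 as a commonly quoted fiducial value (real analyses standardize each light curve and apply cosmological corrections):

```python
def luminosity_distance_pc(m_obs, M_abs=-19.3):
    """Distance in parsecs from the distance modulus
    m - M = 5 log10(d / 10 pc)."""
    return 10.0 ** ((m_obs - M_abs + 5.0) / 5.0)

# A Type Ia supernova observed to peak at m = 24 sits at roughly
# 4.6 Gpc (before the cosmological corrections a real analysis needs).
print(f"{luminosity_distance_pc(24.0) / 1e9:.1f} Gpc")
```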

Dr. Boone is working on developing statistical techniques to handle this deluge of data. One challenge for LSST is that there will not be enough resources to follow up every transient that is discovered. Dr. Boone developed the software package “avocado”, which uses machine learning to identify what these transients are. With the algorithms in this package, Dr. Boone won a Kaggle competition, against 1,094 competing teams, that simulated the challenges LSST will face. The results of this work will be published in a paper accepted by the Astronomical Journal.

In a series of papers in preparation, Dr. Boone is working on developing better methods to estimate the distances to Type Ia supernovae. Dr. Boone has shown that manifold learning can be used to capture the diversity of Type Ia supernovae that we see in the universe. His results show that our current methods of estimating distances to Type Ia supernovae are highly biased, and could lead surveys such as LSST to make incorrect conclusions about the properties of dark energy.

Dr. Boone is very grateful to the DiRAC Institute for warmly welcoming him, and is excited to collaborate with researchers there. The DiRAC Institute is heavily involved with several large-scale transient surveys, such as ZTF and LSST, and has a wide range of researchers working on applying machine learning to astronomical time series data. Dr. Boone is also an eScience Postdoctoral Fellow, and is interested in collaborating with researchers outside of astronomy who are working on data science problems with similar time series data.

DiRAC Researchers Present at the AAS Meeting 2020

Searching for Boyajian’s Star Analogs with the Zwicky Transient Facility

The Zwicky Transient Facility (ZTF) survey is a treasure trove for time domain science, and members of DiRAC’s Time Domain & Inference group are collaborating to comb through some 200 million light curves to search for candidate Boyajian’s Star analogs.

Boyajian’s Star is a unique F star observed by the Kepler Space Telescope to show repeated, infrequent short-term variability (“dips”) and a long-term gradual decrease in brightness. The physical cause of the variability is not understood, and one of the big questions remains open: is it unique, or is there a whole family of stars with similar patterns of variability?

We have developed a tool to find similar light curve dips and applied it to stars observed by ZTF in a 2000-square-degree region of sky that has been observed in a high-cadence mode. Of about 180 million total objects with high-cadence light curves, we found 40,000 potential dippers.
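
A dip search of this kind can be sketched as a detrend-and-threshold pass over each light curve; the group's actual tool is more involved, but the NumPy toy below conveys the idea (the window size and threshold are illustrative choices).

```python
import numpy as np

def find_dips(times, flux, window=25, n_sigma=3.0):
    """Flag candidate dips in one light curve (toy version).

    Detrend with a running median, then flag points more than
    n_sigma below the local baseline. A real search must also
    handle gaps, outliers, and instrumental artifacts.
    """
    times, flux = np.asarray(times), np.asarray(flux)
    baseline = np.array([np.median(flux[max(0, i - window):i + window + 1])
                         for i in range(len(flux))])
    resid = flux - baseline
    sigma = 1.4826 * np.median(np.abs(resid - np.median(resid)))  # robust MAD
    dips = resid < -n_sigma * sigma
    return times[dips], flux[dips]
```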

The bulk of the work on this project has been accomplished during weekly group meetings, where one person shares their screen and we all collaboratively write code and make plots. This innovative group project was proposed by James Davenport, and Kyle Boone has taken the lead on much of the computational work. Other contributors include Meredith Rawls, Colin Slater, Keaton Bell, Eric Bellm, Brigitta Sipocz, Bryce Kalmbach, and Daniela Huppenkothen.

We are now working to cross-match these sources with Gaia, WISE, and other survey catalogs to eliminate known variables such as young stellar objects and eclipsing systems. We will present the results of the project in a poster at the 235th AAS meeting in January 2020.

LSST Data in the Clouds

The Large Synoptic Survey Telescope (LSST) is an upcoming sky survey that aims to conduct a 10-year survey from which we hope to answer questions about dark matter, dark energy, hazardous asteroids, and the formation and structure of the Milky Way. To find these answers, LSST will image the entire night sky every three nights. It is estimated that over its 10 years of operations LSST will deliver 500 petabytes (PB) of data – the largest astronomical dataset released to date.

Science catalogs, on which most of the science will be performed, are produced by image reduction pipelines that are part of LSST’s code base, called the Science Pipelines. While the LSST Science Pipelines adopt a set of image-processing algorithms and metrics that cover as many science goals as possible, and while the LSST will set aside 10% of its compute power to be shared by collaboration members, enabling scientists to process the underlying pixel data remains a very challenging problem. The largest obstacle to widespread data processing is the sheer data volume that LSST will produce, which requires large compute infrastructure. If pixel data re/processing were accessible to more astronomers, it would undoubtedly improve repeatability and reproducibility, and would, in general, increase the type and quantity of science that can be done with the data.

The tech industry, which in many cases has significantly surpassed LSST’s data volumes, has adopted cloud-based solutions because of their ability to scale up and down depending on the size and complexity of the data. LSST Data Management (DM) commissioned an Amazon Web Services (AWS) Proof of Concept (PoC) group to determine whether a cloud deployment of the LSST codebase is feasible, to measure its performance, and to determine the cost of cloud-native options.

The first results of this work were presented at the Petabytes to Science conference in Boston, where Dino Bektešević and colleagues from LSST and Amazon demonstrated how the LSST Science Pipelines can be run on Amazon’s cloud, scaling up to thousands of compute cores. The preliminary tests indicate that the cloud has real potential for significant scaling while remaining affordable.
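
The scaling pattern itself, farming an embarrassingly parallel per-exposure workload out to a fleet of cloud workers, can be sketched with a generic task framework such as Dask. This is only an illustration of the pattern, not the actual PoC setup; the scheduler address and processing function below are hypothetical.

```python
from dask.distributed import Client

def process_exposure(exposure_id):
    """Placeholder for a per-exposure reduction step (e.g. one
    visit/CCD run through an image-processing pipeline)."""
    ...  # calibrate, detect, measure
    return exposure_id

# Point the client at a cluster of cloud workers; with an adaptive
# cluster this can grow to thousands of cores for the duration of
# the run and shrink back afterwards.
client = Client("tcp://scheduler-address:8786")  # hypothetical address
futures = client.map(process_exposure, range(100_000))
results = client.gather(futures)
```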

Read more here.