ASKAP: Data-Intensive Science

This page showcases some of my recent technical work with the WALLABY team, with an overview of the projects I've contributed to and the solutions I've developed. Throughout my time with WALLABY, I've worn many hats: working in the data processing team, leading a data quality assurance project, and contributing to the project management team.


1.0 Intro - ASKAP

ASKAP (Australian Square Kilometre Array Pathfinder) is a state-of-the-art radio telescope located at Inyarrimanha Ilgari Bundara, the CSIRO Murchison Radio-astronomy Observatory in Western Australia. It is one of the precursors to the global Square Kilometre Array (SKA), one of the world's largest and most ambitious radio astronomy projects. ASKAP is designed to explore the Universe in unprecedented detail, enabling scientists to conduct cutting-edge research in astronomy.

The ASKAP telescope consists of 36 dish antennas that work together as a single telescope. Each antenna is equipped with an advanced phased array feed (PAF), a specialized "camera" for the sky that dramatically increases the speed at which ASKAP can survey the sky. This innovative technology, developed in Australia, makes ASKAP a key player in the future of radio astronomy, providing valuable insights into the origins of the Universe and much more.


1.1 Intro - WALLABY

WALLABY (Widefield ASKAP L-band Legacy All-sky Blind surveY) is one of the key scientific initiatives using the ASKAP telescope. It is a top-ranked survey that focuses on mapping the neutral hydrogen gas (HI) in galaxies across a large portion of the southern sky. By studying the hydrogen emission at a wavelength of 21 cm, WALLABY aims to deepen our understanding of galaxy formation and evolution, providing valuable insights into the structure of the Universe.


2.0 Data-Intensive Science - Data Processing

ASKAP is a data-driven facility that generates extremely high data rates. The incoming data rate at the Pawsey Supercomputing Research Centre in Perth, WA is approximately 2.5 GB per second, which translates to roughly 75 petabytes (PB) per year. This volume is beyond what current systems can store, so ASKAP processes the data almost immediately using automated systems.
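As a quick sanity check on the figures above, here is a back-of-the-envelope calculation. It is only a sketch: the function name `yearly_volume_pb` and the `duty_cycle` parameter are my own illustration, and it assumes decimal units (1 GB = 10^9 bytes, 1 PB = 10^15 bytes) and a 365-day year.

```python
# Back-of-the-envelope check of the yearly data volume.
# Assumptions (for illustration only): decimal units and a 365-day year.

BYTES_PER_GB = 1e9
BYTES_PER_PB = 1e15
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def yearly_volume_pb(rate_gb_per_s: float, duty_cycle: float = 1.0) -> float:
    """Return the data volume in PB accumulated over one year.

    duty_cycle is the fraction of the year spent ingesting data
    (1.0 means the telescope streams data without interruption).
    """
    bytes_per_year = rate_gb_per_s * BYTES_PER_GB * SECONDS_PER_YEAR * duty_cycle
    return bytes_per_year / BYTES_PER_PB


if __name__ == "__main__":
    print(f"{yearly_volume_pb(2.5):.1f} PB per year")
```

With uninterrupted ingest at 2.5 GB per second this gives just under 80 PB per year, the same order as the yearly figure quoted above; any downtime (a `duty_cycle` below 1.0) lowers the total proportionally.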

During the initial testing phase of ASKAP, I worked closely with the ASKAP computing team to optimize the parameters required to produce the final science-ready product for the WALLABY team (end user). The flowchart below gives a schematic view of the data processing.

Data Processing Flowchart

2.1 Data Quality Assurance

Aside from tuning the data processing parameters, delivering a high-quality data product to the WALLABY team (end user) is of the utmost importance. I led the QA development by deciding which statistics were best suited to evaluating the data quality. The QA task is written in Python, and the script generates an HTML-style report. Below is a schematic diagram of the data quality assurance process.

QA Chart

2.1.1 Data Quality Assurance - Script

Below is a schematic diagram of what the Python script does. It follows the standard data engineering ETL (extract, transform, load) process, but without commercial ETL tools; the concept remains the same. The HTML-style data validation report and the approved data are loaded into the database. The script is available on GitHub, along with documentation explaining each metric.

Script Chart
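To make the ETL idea concrete, here is a minimal sketch of the three stages using only the Python standard library. It is not the actual QA script: the metric (per-beam RMS noise), the threshold, the table layout, and the observation ID `obs-0001` are all hypothetical and chosen purely for illustration.

```python
import sqlite3
import statistics
from html import escape


def extract(raw_rows):
    """Extract: pull per-beam noise values, skipping incomplete records."""
    return [
        (row["beam"], float(row["rms_mjy"]))
        for row in raw_rows
        if row.get("rms_mjy") is not None
    ]


def transform(beam_rms, max_median_rms_mjy=2.0):
    """Transform: reduce per-beam values to summary statistics and a verdict.

    The 2.0 mJy limit is a made-up threshold, not a WALLABY requirement.
    """
    values = [rms for _, rms in beam_rms]
    median_rms = statistics.median(values)
    return {
        "n_beams": len(values),
        "median_rms_mjy": median_rms,
        "max_rms_mjy": max(values),
        "approved": median_rms <= max_median_rms_mjy,
    }


def render_report(obs_id, summary):
    """Build a small HTML validation report from the summary."""
    rows = "".join(
        f"<tr><td>{escape(str(key))}</td><td>{escape(str(value))}</td></tr>"
        for key, value in summary.items()
    )
    return (
        f"<html><body><h1>QA report: {escape(obs_id)}</h1>"
        f"<table>{rows}</table></body></html>"
    )


def load(conn, obs_id, summary, report_html):
    """Load: store the verdict and the report in the database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS qa_results "
        "(obs_id TEXT PRIMARY KEY, approved INTEGER, report_html TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO qa_results VALUES (?, ?, ?)",
        (obs_id, int(summary["approved"]), report_html),
    )
    conn.commit()


# Hypothetical input records standing in for real measurements.
raw = [
    {"beam": 0, "rms_mjy": 1.6},
    {"beam": 1, "rms_mjy": 1.7},
    {"beam": 2, "rms_mjy": None},  # incomplete record, dropped at extract
    {"beam": 3, "rms_mjy": 2.4},
]
summary = transform(extract(raw))
conn = sqlite3.connect(":memory:")  # in-memory database for the example
load(conn, "obs-0001", summary, render_report("obs-0001", summary))
```

Keeping each stage as a plain function means the statistics can be tested on their own, and the database or report format can be swapped without touching the rest.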

2.1.2 Data Quality Assurance - HTML Style Report

This HTML-style report has two goals: 1) to give the WALLABY QA team an easy-to-understand visualization so they can quickly decide whether to accept or reject the data; 2) to let users find basic information about the data, such as the observation ID and targeted sky position, as well as its quality. I adopted a "traffic light" system (red, yellow, green) and a color gradient for the data QA metrics. The images below show what the HTML-style report looks like.

Script Chart
Script Chart
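The traffic light idea can be sketched in a few lines of Python. This is an illustration rather than the code behind the report: the thresholds, the color values, and the function names `traffic_light` and `metric_cell` are assumptions, and it covers only "lower is better" metrics, not the color gradient.

```python
from html import escape

# Hypothetical hex colors for the three states.
COLORS = {"green": "#2e7d32", "yellow": "#f9a825", "red": "#c62828"}


def traffic_light(value, good_below, bad_above):
    """Classify a 'lower is better' metric as green, yellow, or red."""
    if value <= good_below:
        return "green"
    if value >= bad_above:
        return "red"
    return "yellow"


def metric_cell(name, value, good_below, bad_above):
    """Render one metric as an HTML table cell colored by its status."""
    status = traffic_light(value, good_below, bad_above)
    return (
        f'<td style="background-color: {COLORS[status]}" title="{status}">'
        f"{escape(name)}: {value:.2f}</td>"
    )


if __name__ == "__main__":
    # Made-up metric and thresholds, for illustration only.
    print(metric_cell("median RMS (mJy)", 2.5, good_below=2.0, bad_above=3.0))
```

Putting the status in the `title` attribute as well as the background color keeps the verdict readable for anyone who cannot rely on color alone.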

3.0 End User - Astronomer / Data Scientist

Astronomers possess strong data analysis skills, which are highly transferable to roles in data science and analysis. We use a range of tools and techniques, from collecting data with advanced telescopes to developing sophisticated computer models and performing in-depth data analysis. The following schematic diagram illustrates these skill sets, though it is simplified for clarity. In reality, the work of an astronomer is much more complex, requiring exceptional problem-solving abilities and a keen ability to think critically. These skills are crucial in tackling the complex challenges we encounter in both research and practical applications.

Script Chart