API

Note

This documentation is automatically generated from docstrings in the code. Please open an issue on GitHub if you find missing documentation or have suggestions for improvements.

ALminer: ALMA archive mining and visualization toolkit

A package for mining the Atacama Large Millimeter/submillimeter Array (ALMA) data archive and visualizing the queried observations.

alminer.catalog(target_df, search_radius=1.0, tap_service='ESO', point=False, public=True, published=None, print_query=False, print_targets=True)

Query the ALMA archive for a list of coordinates or a catalog of sources based on their coordinates.

Parameters:
  • target_df (pandas.DataFrame) –

    Source names and coordinates.

    Index:

    RangeIndex

    Columns:

    Name (str): target name (can be numbers or dummy names)
    RAJ2000 (float64): right ascension in degrees (ICRS)
    DEJ2000 (float64): declination in degrees (ICRS)

  • search_radius (float, optional) – (Default value = 1. arcmin) Search radius (in arcmin) around the source coordinates.

  • tap_service (str, optional) – (Default value = ‘ESO’) The TAP service to use. Options are: ‘ESO’ for Europe (https://almascience.eso.org/tap), ‘NRAO’ for North America (https://almascience.nrao.edu/tap), or ‘NAOJ’ for East Asia (https://almascience.nao.ac.jp/tap)

  • point (bool, optional) – (Default value = False) Query only the ALMA observations whose footprints contain the specified position (ra, dec) (point=True), or all ALMA observations that overlap a cone of radius search_radius centred on that position (point=False). When point=True, the search_radius parameter is ignored.

  • public (bool, optional) – (Default value = True) Search for public data (public=True), proprietary data (public=False), or both public and proprietary data (public=None).

  • published (bool, optional) – (Default value = None) Search for published data only (published=True), unpublished data only (published=False), or both published and unpublished data (published=None).

  • print_query (bool, optional) – (Default value = False) Print the ADQL TAP query to the terminal.

  • print_targets (bool, optional) – (Default value = True) Print a list of targets with ALMA data (ALMA source names) to the terminal.

Return type:

pandas.DataFrame containing the query results.
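target_df expects coordinates in decimal degrees, while many catalogs list them in sexagesimal form. The sketch below converts such strings using only the standard library; the helper names hms_to_deg and dms_to_deg and the example sources are hypothetical, and the resulting rows still need to be wrapped in a pandas.DataFrame before being passed to alminer.catalog.

```python
def hms_to_deg(hms: str) -> float:
    """Convert a right ascension string 'hh:mm:ss.s' to decimal degrees (1 h = 15 deg)."""
    h, m, s = (float(part) for part in hms.split(":"))
    return 15.0 * (h + m / 60.0 + s / 3600.0)


def dms_to_deg(dms: str) -> float:
    """Convert a declination string '+dd:mm:ss.s' to decimal degrees.

    The sign is taken from the string itself so that values such as
    '-00:30:00', whose degree field is zero, stay negative.
    """
    text = dms.strip()
    sign = -1.0 if text.startswith("-") else 1.0
    d, m, s = (float(part) for part in text.lstrip("+-").split(":"))
    return sign * (d + m / 60.0 + s / 3600.0)


# Hypothetical catalog entries; names can be arbitrary labels.
catalog = [
    ("src1", "12:00:00", "-30:30:00"),
    ("src2", "00:00:36", "-00:30:00"),
]

# One dict per source, keyed by the column names target_df expects.
rows = [
    {"Name": name, "RAJ2000": hms_to_deg(ra), "DEJ2000": dms_to_deg(dec)}
    for name, ra, dec in catalog
]
```

With pandas installed, pandas.DataFrame(rows) produces a DataFrame with a RangeIndex and the Name, RAJ2000, and DEJ2000 columns, which can then be passed as alminer.catalog(target_df).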

alminer.CO_lines(observations, z=0.0, print_summary=True, print_targets=True)

Determine how many CO, 13CO, and C18O lines were observed in the provided query DataFrame.

Parameters:
  • observations (pandas.DataFrame) – Query results, typically the output of the ‘conesearch’, ‘target’, ‘catalog’, or ‘keysearch’ functions.

  • z (float64, optional) – (Default value = 0.) Redshift by which the frequencies should be shifted.

  • print_summary (bool, optional) – (Default value = True) Print a summary of the observations for each (redshifted) CO, 13CO, and C18O line to the terminal.

  • print_targets (bool, optional) – (Default value = True) Print the target names (ALMA source names) with ALMA data for each (redshifted) CO, 13CO, and C18O line to the terminal.

Return type:

pandas.DataFrame containing all observations of (redshifted) CO, 13CO, and C18O lines.
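The z parameter shifts rest frequencies into the observed frame as nu_obs = nu_rest / (1 + z). The minimal sketch below applies that conversion. The rest frequencies are rounded laboratory values for the 1-0 and 2-1 transitions, and the dictionary is an illustrative subset; the exact set of transitions CO_lines checks is not listed here.

```python
# Approximate rest frequencies in GHz (rounded laboratory values).
# Illustrative subset only, not necessarily the list used inside alminer.
REST_FREQS_GHZ = {
    "CO(1-0)": 115.2712,
    "CO(2-1)": 230.5380,
    "13CO(1-0)": 110.2014,
    "13CO(2-1)": 220.3987,
    "C18O(1-0)": 109.7822,
    "C18O(2-1)": 219.5604,
}


def observed_freq(rest_freq_ghz: float, z: float = 0.0) -> float:
    """Shift a rest frequency to the observed frame: nu_obs = nu_rest / (1 + z)."""
    if z <= -1.0:
        raise ValueError("redshift must be greater than -1")
    return rest_freq_ghz / (1.0 + z)


def shifted_lines(z: float = 0.0) -> dict:
    """Return every line in REST_FREQS_GHZ shifted to redshift z."""
    return {name: observed_freq(freq, z) for name, freq in REST_FREQS_GHZ.items()}
```

For example, at z = 1 the CO(2-1) line is observed near 115.27 GHz, at the upper edge of ALMA Band 3.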

alminer.conesearch(ra, dec, search_radius=1.0, tap_service='ESO', point=False, public=True, published=None, print_targets=True, print_query=False)

Query the ALMA archive for a given position and radius around it.

Parameters:
  • ra (float) – Right ascension in degrees (ICRS).

  • dec (float) – Declination in degrees (ICRS).

  • search_radius (float, optional) – (Default value = 1. arcmin) Search radius (in arcmin) around the source coordinates.

  • tap_service (str, optional) – (Default value = ‘ESO’) The TAP service to use. Options are: ‘ESO’ for Europe (https://almascience.eso.org/tap), ‘NRAO’ for North America (https://almascience.nrao.edu/tap), or ‘NAOJ’ for East Asia (https://almascience.nao.ac.jp/tap)

  • point (bool, optional) – (Default value = False) Query only the ALMA observations whose footprints contain the specified position (ra, dec) (point=True), or all ALMA observations that overlap a cone of radius search_radius centred on that position (point=False). When point=True, the search_radius parameter is ignored.

  • public (bool, optional) – (Default value = True) Search for public data (public=True), proprietary data (public=False), or both public and proprietary data (public=None).

  • published (bool, optional) – (Default value = None) Search for published data only (published=True), unpublished data only (published=False), or both published and unpublished data (published=None).

  • print_query (bool, optional) – (Default value = False) Print the ADQL TAP query to the terminal.

  • print_targets (bool, optional) – (Default value = True) Print a list of targets with ALMA data (ALMA source names) to the terminal.

Return type:

pandas.DataFrame containing the query results.
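One way to picture the difference between point=True and point=False is as two ADQL geometry predicates applied to the ObsCore s_region column. The builder below is a hypothetical sketch that uses the standard ADQL functions POINT, CIRCLE, CONTAINS, and INTERSECTS and assumes the ivoa.obscore table name. It is not the exact query string alminer generates; print_query=True shows that. ADQL circles take their radius in degrees, so search_radius in arcmin is divided by 60.

```python
def cone_query(ra: float, dec: float, search_radius: float = 1.0, point: bool = False) -> str:
    """Hypothetical sketch of an ADQL position query (not alminer's exact query).

    ra and dec are in degrees (ICRS); search_radius is in arcmin, as in conesearch.
    """
    if point:
        # Is the position itself inside an observation's footprint?
        # No radius is used, mirroring point=True ignoring search_radius.
        predicate = f"CONTAINS(POINT('ICRS', {ra}, {dec}), s_region) = 1"
    else:
        # Does a footprint overlap a circle of search_radius around the position?
        radius_deg = search_radius / 60.0  # ADQL CIRCLE radius is in degrees
        predicate = f"INTERSECTS(CIRCLE('ICRS', {ra}, {dec}, {radius_deg}), s_region) = 1"
    return f"SELECT * FROM ivoa.obscore WHERE {predicate}"
```

A string of this form could in principle be passed to alminer.run_query(query_str, tap_service='ESO'), which returns a pandas.DataFrame.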

alminer.download_data(observations, fitsonly=False, dryrun=False, print_urls=False, filename_must_include='', location='./data', archive_mirror='ESO')

Download ALMA data from the archive to a location on the local machine.

Parameters:
  • observations (pandas.DataFrame) – Query results, typically the output of the ‘conesearch’, ‘target’, ‘catalog’, or ‘keysearch’ functions.

  • fitsonly (bool, optional) – (Default value = False) Download individual FITS files only (fitsonly=True). This option will not download the raw data (e.g. ‘asdm’ files), weblogs, or README files.

  • dryrun (bool, optional) – (Default value = False) Perform a test run that reports the size and number of files to be downloaded without actually downloading them (dryrun=True). To download the data, set dryrun=False.

  • print_urls (bool, optional) – (Default value = False) Print the list of URLs to be downloaded from the archive to the terminal.

  • filename_must_include (list of str, optional) – (Default value = ‘’) A list of strings that must appear in the filenames (in the URLs) of the files to be downloaded. This is useful for restricting the download further, for example to data that have been primary beam corrected (‘.pbcor’) or to the science target or calibrators (by including their names). Which strings work best depends largely on the cycle, the type of reduction that was performed, and the data products that exist in the archive as a result. In most recent cycles, the science target can be selected with the flag ‘_sci’ or its ALMA target name.

  • location (str, optional) – (Default value = ‘./data’) Directory where the downloaded data should be placed.

  • archive_mirror (str, optional) – (Default value = ‘ESO’) The archive service to use. Options are: ‘ESO’ for Europe (https://almascience.eso.org), ‘NRAO’ for North America (https://almascience.nrao.edu), or ‘NAOJ’ for East Asia (https://almascience.nao.ac.jp)
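filename_must_include narrows the download to files whose names contain given substrings. The sketch below shows one plausible reading in which a file is kept only if its filename contains every listed string. Whether alminer requires all strings or any of them is an assumption here, and the example URLs are hypothetical rather than real archive paths. Running download_data with dryrun=True and print_urls=True is the reliable way to check what a given filter actually selects.

```python
import posixpath
from urllib.parse import urlparse


def select_files(urls, must_include=None):
    """Keep URLs whose filename contains every string in must_include.

    Assumption: all strings must match (AND logic). None or an empty
    list/string keeps everything; a single string is treated as a one-item list.
    """
    if not must_include:
        return list(urls)
    if isinstance(must_include, str):
        must_include = [must_include]
    selected = []
    for url in urls:
        # Match against the filename only, not the whole URL path.
        filename = posixpath.basename(urlparse(url).path)
        if all(part in filename for part in must_include):
            selected.append(url)
    return selected


# Hypothetical product URLs, for illustration only.
urls = [
    "https://example.org/data/member.uid___A001_X1.TW_Hya_sci.spw25.cube.I.pbcor.fits",
    "https://example.org/data/member.uid___A001_X1.TW_Hya_sci.spw25.cube.I.pb.fits",
    "https://example.org/data/member.uid___A001_X1.J1058+0133_bp.spw25.mfs.I.pbcor.fits",
]
```

For instance, select_files(urls, [".pbcor", "_sci"]) keeps only the primary-beam-corrected science cube.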

alminer.explore(observations, allcols=False, allrows=False)

Control how much of the pandas.DataFrame with the query results is presented in the displayed table.

Parameters:
  • observations (pandas.DataFrame) – Query results, typically the output of the ‘conesearch’, ‘target’, ‘catalog’, or ‘keysearch’ functions.

  • allcols (bool, optional) – (Default value = False) Show all 81 columns (allcols=True), or the first 18 columns (allcols=False).

  • allrows (bool, optional) – (Default value = False) Show all rows in the DataFrame (allrows=True), or just a summary (allrows=False).

Return type:

pandas.DataFrame containing the query results displayed to the user interface as specified by the user.

alminer.filter_results(TAP_df, print_targets=True)

Add a few useful columns to the pandas.DataFrame of query results returned by the PyVO TAP service, return the full query DataFrame, and optionally print a summary of the results.

Parameters:
  • TAP_df (pandas.DataFrame) – Typically the output of the ‘run_query’ function.

  • print_targets (bool, optional) – (Default value = True) Print a list of targets with ALMA data (ALMA source names) to the terminal.

Return type:

pandas.DataFrame containing the query results.

alminer.get_description(column)

Print the description of a given column in the query results DataFrame.

alminer.get_units(column)

Print the units for a given column in the query results DataFrame.

alminer.get_info(column)

Print the description and units of a given column in the query results DataFrame.

Parameters:

column (str) – A column in the pandas.DataFrame query table.

alminer.keysearch(search_dict, tap_service='ESO', public=True, published=None, print_query=False, print_targets=True)

Query the ALMA archive for any (string-type) keywords defined in the ALMA TAP system.

Parameters:
  • search_dict (dict[str, list of str]) – Dictionary of ALMA archive keywords and their values. Values must be given as a list. The list of valid keywords is stored in the VALID_KEYWORDS_STR variable.

  • tap_service (str, optional) – (Default value = ‘ESO’) The TAP service to use. Options are: ‘ESO’ for Europe (https://almascience.eso.org/tap), ‘NRAO’ for North America (https://almascience.nrao.edu/tap), or ‘NAOJ’ for East Asia (https://almascience.nao.ac.jp/tap)

  • public (bool, optional) – (Default value = True) Search for public data (public=True), proprietary data (public=False), or both public and proprietary data (public=None).

  • published (bool, optional) – (Default value = None) Search for published data only (published=True), unpublished data only (published=False), or both published and unpublished data (published=None).

  • print_query (bool, optional) – (Default value = False) Print the ADQL TAP query to the terminal.

  • print_targets (bool, optional) – (Default value = True) Print a list of targets with ALMA data (ALMA source names) to the terminal.

Return type:

pandas.DataFrame containing the query results.

Notes

The power of this function is in combining keywords. When multiple keywords are provided, they are queried using ‘AND’ logic, but when multiple values are provided for a given keyword, they are queried using ‘OR’ logic. If a given value contains spaces, its constituents are queried using ‘AND’ logic. Words enclosed in quotation marks (either ' or ") are queried as phrases. Values for the ‘target_name’ keyword are queried with ‘OR’ logic.

Examples

keysearch({"proposal_abstract": ["high-mass star formation outflow disk"]})

will query the archive for projects with the words “high-mass” AND “star” AND “formation” AND “outflow” AND “disk” in their proposal abstracts.

keysearch({"proposal_abstract": ["high-mass", "star", "formation", "outflow", "disk"]})

will query the archive for projects with the words “high-mass” OR “star” OR “formation” OR “outflow” OR “disk” in their proposal abstracts.

keysearch({"proposal_abstract": ["'high-mass star formation' outflow disk"]})

will query the archive for projects with the phrase “high-mass star formation” AND the words “outflow” AND “disk” in their proposal abstracts.

keysearch({"proposal_abstract": ["'star formation'"], "scientific_category": ["Galaxies"]})

will query the archive for projects with the phrase “star formation” in their proposal abstracts AND ‘Galaxies’ as their scientific_category.
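The combination rules above can be sketched as a small ADQL WHERE-clause builder. This is a hypothetical illustration of the AND/OR semantics only: the function name build_where is invented, it uses plain LIKE with % wildcards, it ignores case handling and the special treatment of ‘target_name’, and it does not reproduce the exact query alminer sends (print_query=True shows that).

```python
import shlex


def _escape(term: str) -> str:
    """Double single quotes so the ADQL string literal stays valid."""
    return term.replace("'", "''")


def build_where(search_dict):
    """Hypothetical sketch: turn a keysearch-style dict into an ADQL WHERE clause.

    - different keywords are combined with AND
    - different values for one keyword are combined with OR
    - words inside one value are combined with AND
    - text wrapped in '...' or "..." is kept together as a phrase
    """
    keyword_clauses = []
    for keyword, values in search_dict.items():
        value_clauses = []
        for value in values:
            # shlex.split keeps quoted phrases intact
            # (and raises ValueError on unbalanced quotes).
            terms = shlex.split(value)
            likes = [f"{keyword} LIKE '%{_escape(t)}%'" for t in terms]
            value_clauses.append("(" + " AND ".join(likes) + ")")
        keyword_clauses.append("(" + " OR ".join(value_clauses) + ")")
    return " AND ".join(keyword_clauses)
```

For example, build_where({"proposal_abstract": ["'star formation'"], "scientific_category": ["Galaxies"]}) yields one parenthesized clause per keyword joined by AND, with the quoted phrase matched as a single term.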

alminer.line_coverage(observations, line_freq, z=0.0, line_name='', print_summary=True, print_targets=True)

Determine how many observations cover a given (optionally redshifted) frequency.

Parameters:
  • observations (pandas.DataFrame) – Query results, typically the output of the ‘conesearch’, ‘target’, ‘catalog’, or ‘keysearch’ functions.

  • line_freq (float64) – Frequency of the line of interest in GHz.

  • z (float64, optional) – (Default value = 0.) Redshift by which the frequency given in ‘line_freq’ parameter should be shifted.

  • line_name (str, optional) – (Default value = ‘’) Name of the line specified in ‘line_freq’.

  • print_summary (bool, optional) – (Default value = True) Print a summary of the observations to the terminal.

  • print_targets (bool, optional) – (Default value = True) Print a list of targets with ALMA data (ALMA source names) to the terminal.

Return type:

pandas.DataFrame containing all observations of line of interest.
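Conceptually, an observation covers a line if the redshifted line frequency falls inside one of its spectral windows. The sketch below checks this against hypothetical (min, max) frequency ranges in GHz; the helper covers_line and the spw_table data are invented for illustration, and how alminer stores and tests spectral-window coverage in the query DataFrame is not shown here.

```python
def covers_line(spw_ranges_ghz, line_freq_ghz, z=0.0):
    """Return True if the redshifted line falls inside any (low, high) window in GHz."""
    observed = line_freq_ghz / (1.0 + z)
    return any(low <= observed <= high for low, high in spw_ranges_ghz)


# Hypothetical spectral windows of two observations, in GHz.
spw_table = {
    "obs1": [(229.5, 231.4), (216.0, 218.0)],
    "obs2": [(330.0, 332.0), (342.0, 344.0)],
}

co21 = 230.538  # CO(2-1) rest frequency in GHz
hits = [name for name, spws in spw_table.items() if covers_line(spws, co21)]
```

Here only obs1 covers CO(2-1) at z = 0; at z = 0.1 the same line moves to about 209.58 GHz and falls outside both observations.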

alminer.plot_line_overview(observations, line_freq, z=0.0, line_name='', showfig=True, savefig=None)

Create overview plots of observed frequencies, angular resolution, largest angular scale (LAS), and frequency and velocity resolutions, highlighting the observations of a given (redshifted) frequency with hatches on the bar plots.

Parameters:
  • observations (pandas.DataFrame) – Query results, typically the output of the ‘conesearch’, ‘target’, ‘catalog’, or ‘keysearch’ functions.

  • line_freq (float64) – Frequency of the line of interest in GHz.

  • z (float64, optional) – (Default value = 0.) Redshift by which the frequency given in ‘line_freq’ parameter should be shifted.

  • line_name (str, optional) – (Default value = ‘’) Name of the line specified in ‘line_freq’.

  • showfig (bool, optional) – (Default value = True) Display the plot (showfig=True) or not (showfig=False).

  • savefig (str, optional) – (Default value = None) Filename (without an extension) for the plot to be saved as. Default file extension is PDF. Figure is saved in a subdirectory called ‘reports’ within the current working directory. If the directory doesn’t exist, it will be created. Default quality is dpi=300.

alminer.plot_overview(observations, mark_freq='', z=0.0, mark_CO=False, showfig=True, savefig=None)

Create overview plots of observed frequencies, angular resolution, LAS, frequency and velocity resolutions.

Parameters:
  • observations (pandas.DataFrame) – Query results, typically the output of the ‘conesearch’, ‘target’, ‘catalog’, or ‘keysearch’ functions.

  • mark_freq (list of float64, optional) – (Default value = ‘’) A list of frequencies to mark on the plot with dashed lines.

  • z (float64, optional) – (Default value = 0.) Redshift by which the frequencies given in ‘mark_freq’ and ‘mark_CO’ parameters should be shifted. Currently only one redshift can be given for all targets.

  • mark_CO (bool, optional) – (Default value = False) Mark CO, 13CO, and C18O frequencies on the plot with dashed lines.

  • showfig (bool, optional) – (Default value = True) Display the plot (showfig=True) or not (showfig=False).

  • savefig (str, optional) – (Default value = None) Filename (without an extension) for the plot to be saved as. Default file extension is PDF. Figure is saved in a subdirectory called ‘reports’ within the current working directory. If the directory doesn’t exist, it will be created. Default quality is dpi=300.

alminer.plot_observations(observations, mark_freq='', z=0.0, mark_CO=False, showfig=True, savefig=None)

Create detailed plots of the observations in each band. The x-axis shows the observation number, i.e. the ‘Obs’ column of the input DataFrame.

Parameters:
  • observations (pandas.DataFrame) – Query results, typically the output of the ‘conesearch’, ‘target’, ‘catalog’, or ‘keysearch’ functions.

  • mark_freq (list of float64, optional) – (Default value = ‘’) A list of frequencies to mark on the plot with dashed lines.

  • z (float64, optional) – (Default value = 0.) Redshift by which the frequencies given in ‘mark_freq’ and ‘mark_CO’ parameters should be shifted. Currently only one redshift can be given for all targets.

  • mark_CO (bool, optional) – (Default value = False) Mark CO, 13CO, and C18O frequencies on the plot with dashed lines.

  • showfig (bool, optional) – (Default value = True) Display the plot (showfig=True) or not (showfig=False).

  • savefig (str, optional) – (Default value = None) Filename (without an extension) for the plot to be saved as. Default file extension is PDF. Figure is saved in a subdirectory called ‘reports’ within the current working directory. If the directory doesn’t exist, it will be created. Default quality is dpi=300.

alminer.plot_bands(observations, mark_freq='', z=0.0, mark_CO=False, showfig=True, savefig=None)

Create overview and detailed plots of observed frequencies in each band.

Parameters:
  • observations (pandas.DataFrame) – Query results, typically the output of the ‘conesearch’, ‘target’, ‘catalog’, or ‘keysearch’ functions.

  • mark_freq (list of float64, optional) – (Default value = ‘’) A list of frequencies to mark on the plot with dashed lines.

  • z (float64, optional) – (Default value = 0.) Redshift by which the frequencies given in ‘mark_freq’ and ‘mark_CO’ parameters should be shifted. Currently only one redshift can be given for all targets.

  • mark_CO (bool, optional) – (Default value = False) Mark CO, 13CO, and C18O frequencies on the plot with dashed lines.

  • showfig (bool, optional) – (Default value = True) Display the plot (showfig=True) or not (showfig=False).

  • savefig (str, optional) – (Default value = None) Filename (without an extension) for the plot to be saved as. Default file extension is PDF. Figure is saved in a subdirectory called ‘reports’ within the current working directory. If the directory doesn’t exist, it will be created. Default quality is dpi=300.

alminer.plot_sky(observations, showfig=True, savefig=None)

Plot the distribution of the targets on the sky.

Parameters:
  • observations (pandas.DataFrame) – Query results, typically the output of the ‘conesearch’, ‘target’, ‘catalog’, or ‘keysearch’ functions.

  • showfig (bool, optional) – (Default value = True) Display the plot (showfig=True) or not (showfig=False).

  • savefig (str, optional) – (Default value = None) Filename (without an extension) for the plot to be saved as. Default file extension is PDF. Figure is saved in a subdirectory called ‘reports’ within the current working directory. If the directory doesn’t exist, it will be created. Default quality is dpi=300.

alminer.run_query(query_str, tap_service='ESO')

Run the TAP query through PyVO service.

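Parameters:
  • query_str (str) – The ADQL TAP query to run.

  • tap_service (str, optional) – (Default value = ‘ESO’) The TAP service to use. Options are: ‘ESO’ for Europe (https://almascience.eso.org/tap), ‘NRAO’ for North America (https://almascience.nrao.edu/tap), or ‘NAOJ’ for East Asia (https://almascience.nao.ac.jp/tap)
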
Return type:

pandas.DataFrame containing the query results.

alminer.summary(observations, print_targets=True)

Print a summary of the observations.

Parameters:
  • observations (pandas.DataFrame) – Query results, typically the output of the ‘conesearch’, ‘target’, ‘catalog’, or ‘keysearch’ functions.

  • print_targets (bool, optional) – (Default value = True) Print a list of targets with ALMA data (ALMA source names) to the terminal.

alminer.target(sources, search_radius=1.0, tap_service='ESO', point=False, public=True, published=None, print_query=False, print_targets=True)

Query targets by name.

This function uses the astropy SESAME resolver to obtain each target's coordinates and then queries the ALMA archive for those coordinates and a search_radius around them. The SESAME resolver searches multiple databases (Simbad, NED, VizieR) to resolve names commonly used in the literature and returns their coordinates. If a target cannot be resolved by any of these databases, consider using the ‘keysearch’ function to query the archive with the ‘target_name’ keyword (e.g. keysearch({'target_name': sources})).

Parameters:
  • sources (str or list of str) – Source name or list of source names. IMPORTANT: each name must be resolvable by at least one of Simbad, NED, or VizieR.

  • search_radius (float, optional) – (Default value = 1. arcmin) Search radius (in arcmin) around the source coordinates.

  • tap_service (str, optional) – (Default value = ‘ESO’) The TAP service to use. Options are: ‘ESO’ for Europe (https://almascience.eso.org/tap), ‘NRAO’ for North America (https://almascience.nrao.edu/tap), or ‘NAOJ’ for East Asia (https://almascience.nao.ac.jp/tap)

  • point (bool, optional) – (Default value = False) Query only the ALMA observations whose footprints contain the specified position (ra, dec) (point=True), or all ALMA observations that overlap a cone of radius search_radius centred on that position (point=False). When point=True, the search_radius parameter is ignored.

  • public (bool, optional) – (Default value = True) Search for public data (public=True), proprietary data (public=False), or both public and proprietary data (public=None).

  • published (bool, optional) – (Default value = None) Search for published data only (published=True), unpublished data only (published=False), or both published and unpublished data (published=None).

  • print_query (bool, optional) – (Default value = False) Print the ADQL TAP query to the terminal.

  • print_targets (bool, optional) – (Default value = True) Print a list of targets with ALMA data (ALMA source names) to the terminal.

Return type:

pandas.DataFrame containing the query results.

See also

keysearch

Query the ALMA archive for any (string-type) keywords defined in the ALMA TAP system.

alminer.save_source_reports(observations, mark_freq='', z=0.0, mark_CO=False)

Create overview plots of observed frequencies, angular resolution, LAS, frequency and velocity resolutions for each source in the provided DataFrame and save them in PDF format in the ‘reports’ subdirectory. If the directory doesn’t exist, it will be created.

Parameters:
  • observations (pandas.DataFrame) – Query results, typically the output of the ‘conesearch’, ‘target’, ‘catalog’, or ‘keysearch’ functions.

  • mark_freq (list of float64, optional) – (Default value = ‘’) A list of frequencies to mark on the plot with dashed lines.

  • z (float64, optional) – (Default value = 0.) Redshift by which the frequencies given in ‘mark_freq’ and ‘mark_CO’ parameters should be shifted. Currently only one redshift can be given for all targets.

  • mark_CO (bool, optional) – (Default value = False) Mark CO, 13CO, and C18O frequencies on the plot with dashed lines.

Notes

Reports are grouped by ALMA target name, so a single source observed under several different ALMA names (e.g. TW_Hya, TW Hya, twhya) is treated as several unique targets.
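Because reports are grouped by the exact ALMA target name, spelling variants produce separate reports. One way to spot such duplicates beforehand is to normalize the names, as in the hypothetical helpers below, which lower-case each name and drop spaces, underscores, and hyphens. This is a heuristic and not part of alminer; it can also merge names that genuinely refer to different sources, so review the groups before relying on them.

```python
import re
from collections import defaultdict


def normalize_name(name: str) -> str:
    """Heuristic key: lower-case and strip spaces, underscores, and hyphens."""
    return re.sub(r"[\s_\-]+", "", name).lower()


def group_variants(names):
    """Map each normalized key to the distinct original names that share it."""
    groups = defaultdict(list)
    for name in names:
        key = normalize_name(name)
        if name not in groups[key]:
            groups[key].append(name)
    return dict(groups)


variants = group_variants(["TW_Hya", "TW Hya", "twhya", "HD_163296"])
```

Any key that maps to more than one original name marks a source whose observations would be split across several reports.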

alminer.save_table(observations, filename='mytable')

Write the DataFrame with the query results to a table in CSV format.

The table will be saved in the ‘tables’ subdirectory within the current working directory. If the directory doesn’t exist, it will be created.

Parameters:
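  • observations (pandas.DataFrame) – Query results to save, typically the output of the ‘conesearch’, ‘target’, ‘catalog’, or ‘keysearch’ functions.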

  • filename (str) – (Default value = “mytable”) Name of the table to be saved in the ‘tables’ subdirectory.