Hdf5 For Mac

The h5py package is a Pythonic interface to the HDF5 binary data format. It lets you store huge amounts of numerical data and easily manipulate that data from NumPy. For example, you can slice into multi-terabyte datasets stored on disk as if they were real NumPy arrays.

HDF Compass has a plug-in architecture that can support other file formats or remote resources. In addition to HDF5, Compass comes with plugins for ASCII Grid, BAG, and OPeNDAP resources. A free and open source software product, HDF Compass is cross-platform and runs on Windows (7 or later), Linux (x86, 64-bit), and Mac OS X (Mountain Lion or later).

When building h5py, two environment variables control how the HDF5 library is located. HDF5_DIR is a shortcut for common installations: a directory with lib and include subdirectories containing compiled libraries and C headers. HDF5_PKGCONFIG_NAME is a name to query pkg-config for. If neither option is specified, h5py will query pkg-config by default for hdf5, or hdf5-openmpi if building with MPI support.
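The slicing workflow described above can be sketched as follows (a minimal example, assuming the h5py and NumPy packages are installed; the file and dataset names are made up):

```python
import numpy as np
import h5py

# Write a small dataset to an HDF5 file.
with h5py.File("example.h5", "w") as f:
    f.create_dataset("dset", data=np.arange(100, dtype="int64"))

# Read back only a slice; h5py fetches just this range from disk
# rather than loading the whole dataset into memory.
with h5py.File("example.h5", "r") as f:
    chunk = f["dset"][10:20]

print(chunk.tolist())  # -> [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
```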

Introduced in release: 1.18.

Hierarchical Data Format (HDF) is a set of file formats designed to store and organize large amounts of data.[1] Originally developed at the National Center for Supercomputing Applications, it is supported by The HDF Group, a non-profit corporation whose mission is to ensure continued development of HDF5 technologies and the continued accessibility of data stored in HDF.[2]

This plugin enables Apache Drill to query HDF5 files.

Configuring the HDF5 Format Plugin

This plugin has three configuration options, which are described in the table below.

| Option | Default | Description |
| --- | --- | --- |
| type | (none) | Set to "hdf5" to make use of this plugin. |
| extensions | ".h5" | A list of the file extensions used to identify HDF5 files. Typically HDF5 uses .h5 or .hdf5 as file extensions. |
| defaultPath | null | The default path defines which path Drill will query for data. Typically this should be left as null in the configuration file; its usage is explained below. |

Example Configuration

For most uses, the configuration below will suffice to enable Drill to query HDF5 files.
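A format-plugin entry of roughly this shape (a sketch; it goes inside the `formats` section of a storage plugin configuration such as `dfs`, and workspace details are your own) enables the plugin with its defaults:

```json
"formats": {
  "hdf5": {
    "type": "hdf5",
    "extensions": ["h5"],
    "defaultPath": null
  }
}
```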

Usage

Since HDF5 can be viewed as a file system within a file, a single file can contain many datasets. For instance, a star query against a simple HDF5 file returns the file and dataset metadata, with the actual dataset contents nested in array columns.
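Such a star query can be sketched as follows (the workspace and file names are hypothetical):

```sql
-- Returns one row per object in the file, with metadata columns
-- and the dataset contents nested in an array column
SELECT * FROM dfs.test.`dset.h5`;
```

In this example the file contains a single integer dataset, whose contents surface in the int_data column discussed next.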


The actual data in this file is mapped to a column called int_data. In order to effectively access the data, you should use Drill’s FLATTEN() function on the int_data column, which produces the following result.

apache drill> select flatten(int_data) as int_data from dfs.test.dset.h5;

Once the data is in this form, you can access it similarly to how you might access nested data in JSON or other files.

However, a better way to query the actual data in an HDF5 file is to use the defaultPath field. If defaultPath is defined, either in the query or in the plugin configuration, Drill returns only the data, rather than the file metadata.

Note

Once you have determined which data set you are querying, it is advisable to use this method to query HDF5 data.


Note

Datasets larger than 16MB will be truncated in the metadata view.

You can set the defaultPath variable in either the plugin configuration, or at query time using the table() function as shown in the example below:
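A query of this form sets defaultPath at query time (a sketch; the file name and dataset path are hypothetical):

```sql
-- Supplying format options, including defaultPath, via the table() function
SELECT *
FROM table(dfs.test.`dset.h5` (type => 'hdf5', defaultPath => '/dset'));
```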

This query returns only the contents of the dataset at the given path, rather than the file metadata.

If the data in defaultPath is a column, the column name will be the last part of the path. If the data is multidimensional, the columns are named <data_type>_col_n; a column of integers, for example, will be called int_col_1.

Attributes

Occasionally, HDF5 paths will contain attributes. Drill will map these to a map data structure called attributes, as shown in the query below.
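A sketch of such a query (the workspace and file names are hypothetical):

```sql
-- Attributes of each HDF5 object surface as a map column named attributes
SELECT path, attributes
FROM dfs.test.`attributes.h5`;
```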


You can access the individual fields within the attributes map using the table.map.key notation. Note that you will have to give the table an alias for this to work properly.
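For example (a sketch; the file name and the attribute key `important` are hypothetical):

```sql
-- The table alias t1 is required to drill into the attributes map
SELECT t1.attributes.`important` AS important
FROM dfs.test.`attributes.h5` AS t1;
```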

Known Limitations

There are several limitations of the HDF5 format plugin in Drill.

  • Drill cannot read unsigned 64-bit integers. When the plugin encounters this data type, it writes an INFO message to the log.
  • While Drill can read compressed HDF5 files, Drill cannot read individual compressed fields within an HDF5 file.
  • HDF5 files can contain nested datasets of up to n dimensions. Since Drill works best with two-dimensional data, datasets with more than two dimensions are reduced to two dimensions.
  • HDF5 has a COMPOUND data type. At present, Drill supports reading COMPOUND data types that contain multiple datasets, but it does not support COMPOUND fields with multidimensional columns; Drill ignores multidimensional columns within COMPOUND fields.
[1] https://en.wikipedia.org/wiki/Hierarchical_Data_Format

[2] https://www.hdfgroup.org

HDF (Hierarchical Data Format; see https://www.hdfgroup.org) is another format for storing large data files and associated metadata. It is not unrelated to netCDF: in fact, netCDF-4 uses the HDF5 "data model" to store data. There are currently two versions of HDF in use, HDF4 and HDF5, plus two special cases designed for handling satellite remote-sensing data (HDF-EOS and HDF-EOS5). Unlike netCDF-4, which is backward compatible with netCDF-3, the two versions are not mutually readable, but there are tools for converting between them (as well as for converting HDF files to netCDF files).


HDF5 files can be read and written in R using the rhdf5 package, which is part of the Bioconductor collection of packages. HDF4 files can also be handled via the rgdal package, but the process is more cumbersome. Consequently, the current standard approach for analyzing HDF4 data in R is to first convert it to HDF5 or netCDF, and proceed from there using rhdf5.
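The rhdf5 workflow can be sketched as follows (assuming Bioconductor's rhdf5 package is installed; the file and dataset names are hypothetical):

```r
library(rhdf5)

h5ls("example.h5")                    # list the groups and datasets in the file
x <- h5read("example.h5", "/dset")    # read one dataset into an R array
H5close()                             # release any open HDF5 handles
```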


Compared to netCDF files, HDF files can be messy: they can contain images as well as data, and HDF5 files in particular may contain groups of datasets, which emulate in some ways the folder/directory structure on a local machine. This will become evident below.