A distributed in-situ analysis method for large-scale scientific data

Massive amounts of data are now generated across a wide range of scientific applications, such as NASA's satellites, the Large Hadron Collider, and the Large Synoptic Survey Telescope. Most scientific data follows the array model, and there are various standard array formats such as HDF, NetCDF, MDSplus, and ROOT. SciDB is the most well-known DBMS that stores array-based scientific data and processes queries on it. Because SciDB is a distributed DBMS, it is scalable in terms of query performance. However, it has a severe drawback: loading a massive amount of scientific data into the DBMS takes a huge amount of time. That is, it is not scalable in terms of data loading. To overcome this problem, we propose a distributed in-situ analysis method that allows queries to be processed on raw scientific data in a distributed manner without explicit data loading. Specifically, we propose an in-situ scan operator that scans the necessary data in the array format and passes it to the upper operators in the pipeline of a query plan. The operator also performs repartitioning during in-situ scanning, which is required for correct query results. Through experiments using real datasets, we show that the SciDB system using our method outperforms the original SciDB system by orders of magnitude in terms of the performance of the first query. © 2017 IEEE.
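As a rough illustration of the idea described in the abstract, the following Python sketch scans only the requested region of a raw 2-D array and routes each cell to a worker while scanning. It is not the authors' SciDB operator: the in-memory nested list standing in for a raw array file, the names `in_situ_scan`, `chunk_owner`, and `scan_and_repartition`, and the chunk-to-worker placement rule are all assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

Cell = Tuple[int, int, float]  # (row, col, value)


def in_situ_scan(raw: List[List[float]],
                 row_range: Tuple[int, int],
                 col_range: Tuple[int, int]) -> Iterator[Cell]:
    """Yield only the cells inside the requested region.

    `raw` stands in for a raw array file; nothing is loaded into a
    separate store before the query runs.
    """
    r0, r1 = row_range
    c0, c1 = col_range
    for r in range(r0, r1):
        row = raw[r]
        for c in range(c0, c1):
            yield (r, c, row[c])


def chunk_owner(r: int, c: int, chunk_size: int, num_workers: int) -> int:
    """Map a cell to a worker via its chunk coordinates.

    Hypothetical placement rule, chosen only for illustration.
    """
    chunk_row, chunk_col = r // chunk_size, c // chunk_size
    return (chunk_row * 31 + chunk_col) % num_workers


def scan_and_repartition(raw: List[List[float]],
                         row_range: Tuple[int, int],
                         col_range: Tuple[int, int],
                         chunk_size: int,
                         num_workers: int) -> Dict[int, List[Cell]]:
    """Scan the region and group cells by owning worker in one pass."""
    partitions: Dict[int, List[Cell]] = defaultdict(list)
    for r, c, v in in_situ_scan(raw, row_range, col_range):
        partitions[chunk_owner(r, c, chunk_size, num_workers)].append((r, c, v))
    return dict(partitions)


if __name__ == "__main__":
    raw = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
    parts = scan_and_repartition(raw, (0, 2), (0, 4), chunk_size=2, num_workers=2)
    for worker, cells in sorted(parts.items()):
        print(worker, cells)
```

In this sketch, the generator plays the role of the scan operator feeding upper operators one cell at a time, and the grouping step plays the role of repartitioning, so that all cells of a chunk end up on the same worker.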
Institute of Electrical and Electronics Engineers Inc.
Issue Date

2017 IEEE International Conference on Big Data and Smart Computing, BigComp 2017, pp.69 - 75

Appears in Collection
RIMS Conference Papers

