arctern

Open source, Apache-2.0, C++
Stars: 104
Forks: 49
Issues: 10
Watchers: 13
Last commit: 4 years ago

About arctern

Arctern is an open-source spatial-temporal analytics framework designed for fast and scalable data processing. It provides a unified interface for data analytics and processing across platforms ranging from laptops to clusters and the cloud. Built on a GeoDataFrame and GeoSeries abstraction that follows GeoPandas conventions, Arctern enables scaling both up and out across different execution environments. The framework delivers rich and consistent spatial-temporal algorithms suitable for various stages of a data science pipeline, including trajectory processing, spatial clustering, regression, spatial-relation operations, and rendering. It currently includes an efficient multi-threaded GeoSeries implementation, with a distributed version based on Spark under development. Performance benchmarks show Arctern achieving up to 24x speedup over GeoPandas in multi-threaded execution and an average 7x improvement in single-thread execution. It also offers GPU-accelerated rendering and spatial-relation operations with

Platforms

Web Self-hosted

Languages

C++

Links

Arctern Docs

Arctern Documentation (Chinese)

Overview

Arctern is a fast, scalable spatial-temporal analytics framework.

Scalability is key to building productive data science pipelines. To address the scalability challenge, we launched Arctern, an open-source spatial-temporal analytics framework for boosting end-to-end data science performance. Arctern aims to improve scalability in two ways:

  • A unified data analytics and processing interface across different platforms, from laptops to clusters and the cloud.
  • Rich and consistent algorithms and models, such as trajectory processing, spatial clustering, and regression, across different data science pipeline stages.

Arctern's approach and current progress

We adopt GeoPandas's interface and plan to build GeoDataFrame/GeoSeries implementations that scale both up and out. On top of GeoDataFrame/GeoSeries, we will develop a spatial-temporal algorithm set that is consistent across execution environments.
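To make the idea of one interface over several execution strategies concrete, here is a minimal, standard-library-only sketch. It is not Arctern's or GeoPandas's code: `ToyGeoSeries`, its `workers` parameter, and `shoelace_area` are hypothetical names invented for illustration, and polygons are reduced to plain lists of (x, y) vertices. The point is only that the caller writes the same `.area` access whether the work runs serially or on a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

# A polygon exterior as a sequence of (x, y) vertices (hypothetical toy type).
Ring = Sequence[Tuple[float, float]]


def shoelace_area(ring: Ring) -> float:
    """Unsigned area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


class ToyGeoSeries:
    """Hypothetical stand-in for a GeoSeries-like container (illustration only)."""

    def __init__(self, rings: Sequence[Ring], workers: int = 1):
        self._rings = list(rings)
        self._workers = workers

    def _map(self, fn: Callable[[Ring], float]) -> List[float]:
        # The execution strategy is hidden here; the public interface is unchanged.
        if self._workers <= 1:
            return [fn(r) for r in self._rings]
        with ThreadPoolExecutor(max_workers=self._workers) as pool:
            return list(pool.map(fn, self._rings))  # map preserves input order

    @property
    def area(self) -> List[float]:
        return self._map(shoelace_area)


unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]
triangle = [(0, 0), (4, 0), (0, 3)]

serial = ToyGeoSeries([unit_square, triangle], workers=1)
threaded = ToyGeoSeries([unit_square, triangle], workers=4)
print(serial.area)    # [1.0, 6.0]
print(threaded.area)  # same values, computed on a thread pool
```

In pure Python the thread pool gives no real speedup for this CPU-bound loop because of the interpreter lock; a native implementation would do the per-geometry work outside the interpreter. The sketch only shows the shape of the interface, not its performance.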

We have developed an efficient multi-threaded GeoSeries implementation, and the distributed version is in progress. In the latest version, 0.2.0, Arctern achieves up to a 24x speedup over GeoPandas. Even under single-thread execution, Arctern outperforms GeoPandas by 7x on average. The detailed evaluation results are illustrated in the figure below.

We are also experimenting with GPU acceleration for spatial-temporal data analysis and rendering. Arctern currently provides six GPU-accelerated rendering methods and eight GPU-accelerated spatial-relation operations, which outperform their CPU-based counterparts by up to 36x.

In the next few releases, our team will focus on:

  • Developing a distributed version of GeoSeries. Our first distributed implementation of GeoDataFrame/GeoSeries will be based on Spark. It has been developed in step with Spark 3.0 since its preview release. Spark's support for GPU scheduling and column-based processing is closely aligned with our idea of high-performance spatial-temporal data processing. In addition, the newly introduced Koalas interface offers a promising option for implementing consistent GeoDataFrame/GeoSeries interfaces on Spark.
  • Enriching our spatial-temporal algorithm sets. We will concentrate on KNN search and trajectory analysis in the project's early stages.
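For readers unfamiliar with KNN search, the following is a brute-force sketch of the operation on 2D points. It is not Arctern's implementation or API: the `knn` function and the sample `stops` data are hypothetical, and it assumes planar Euclidean distance, whereas real geospatial data would usually need a projected coordinate system or geodesic distance, plus a spatial index to scale.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def knn(query: Point, points: Sequence[Point], k: int) -> List[Point]:
    """Return the k points closest to `query`, nearest first (brute force)."""
    return heapq.nsmallest(
        k,
        points,
        key=lambda p: math.hypot(p[0] - query[0], p[1] - query[1]),
    )


# Hypothetical sample data: five point locations on a plane.
stops = [(0.0, 0.0), (5.0, 5.0), (1.0, 1.0), (2.0, 0.0), (10.0, 0.0)]
nearest = knn((0.4, 0.4), stops, k=2)
print(nearest)  # [(0.0, 0.0), (1.0, 1.0)]
```

This scans every point for each query, so the cost grows linearly with the dataset size; it is meant only to pin down what the operation returns.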