orcli

Open source MIT Shell

About orcli

OpenRefine command-line interface written in Bash (💎+🤖). Supports batch processing (import, transform, export).

Platforms

Web Self-hosted

Languages

Shell

orcli (💎+🤖)

Bash script to control OpenRefine via its HTTP API.

Demo

Features

  • works with latest OpenRefine version (currently 3.9)
  • run batch processes (import, transform, export)
    • orcli takes care of starting and stopping OpenRefine with temporary workspaces
    • allows execution of arbitrary bash scripts
    • interactive mode for playing around and debugging
    • your existing OpenRefine data will not be touched
  • supports stdin, multiple files and URLs
  • import CSV, TSV, JSON, JSONL, line-based TXT, fixed-width TXT or XML
  • transform data by providing an undo/redo JSON file
    • orcli calls specific endpoints for each operation to provide improved error handling and logging
  • export to CSV, TSV, JSONL, HTML, XLS, XLSX, ODS
  • templating export to additional formats like JSON or XML

Requirements

Install

  1. Navigate to the OpenRefine program directory

  2. Download the Bash script there and make it executable

  wget https://github.com/opencultureconsulting/orcli/raw/main/orcli
  chmod +x orcli
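Before downloading, it can help to confirm that the current directory really is the OpenRefine program directory. The following is a minimal sketch, assuming the Linux distribution layout in which the launcher is a file named `refine`; the helper name `is_openrefine_dir` is hypothetical and not part of orcli.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the given directory contains a file named
# `refine` (assumed here to be the OpenRefine launcher script).
is_openrefine_dir() {
  [ -f "$1/refine" ]
}

if is_openrefine_dir "$PWD"; then
  echo "OpenRefine launcher found in $PWD"
else
  echo "No refine launcher in $PWD; change to the OpenRefine program directory first" >&2
fi
```

If the check fails, change into the directory where OpenRefine was unpacked and run the download commands there.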

Optional:

  • Create a symlink in a directory on your $PATH (e.g. ~/.local/bin)

    ln -s "${PWD}/orcli" ~/.local/bin/
  • Install Bash tab completion

    • temporarily (current shell session only)

      source <(orcli completions)
    • permanently

      mkdir -p ~/.bashrc.d
      orcli completions > ~/.bashrc.d/orcli
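The permanent variant only takes effect if your `~/.bashrc` sources the files in `~/.bashrc.d`, which some distributions set up by default and others do not. A minimal sketch of such a loop, which could be added to `~/.bashrc`, is shown here; the function name `source_rc_dir` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: source every readable regular file in the given directory.
source_rc_dir() {
  local dir="$1" rc
  # Nothing to do if the directory does not exist
  [ -d "$dir" ] || return 0
  for rc in "$dir"/*; do
    if [ -f "$rc" ] && [ -r "$rc" ]; then
      . "$rc"
    fi
  done
}

# Load all snippets, including the orcli completions written above
source_rc_dir "$HOME/.bashrc.d"
```

Open a new shell afterwards so the completions are loaded.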

Getting Started

  1. Launch an interactive playground
  ./orcli run --interactive
  2. Create an OpenRefine project named "duplicates" from a comma-separated values (CSV) file
  orcli import csv "https://git.io/fj5hF" --projectName "duplicates"
  3. Remove duplicates by applying an undo/redo JSON file
  orcli transform "duplicates" "https://git.io/fj5ju"
  4. Export data from the OpenRefine project to the tab-separated values (TSV) file duplicates.tsv
  orcli export tsv "duplicates" --output "duplicates.tsv"
  5. Write your session history to the file example.sh (and delete the last line to remove the history command itself)
  history -a "example.sh"
  sed -i '$ d' example.sh
  6. Exit the playground
  exit
  7. Run the whole process again
  ./orcli run example.sh
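The history clean-up step above may be easier to follow with a dry run. This sketch simulates what `history -a` might have written (the three orcli commands from this walkthrough plus the history call itself) and then applies the same `sed` expression. It works on a temporary file, so a real example.sh is left alone, and it assumes GNU sed for the `-i` syntax.

```shell
#!/usr/bin/env bash
# Work on a temporary file so a real example.sh is not overwritten
demo="$(mktemp)"

# Simulated session history: three orcli commands plus the history call itself
cat > "$demo" <<'EOF'
orcli import csv "https://git.io/fj5hF" --projectName "duplicates"
orcli transform "duplicates" "https://git.io/fj5ju"
orcli export tsv "duplicates" --output "duplicates.tsv"
history -a "example.sh"
EOF

# '$' addresses the last line and 'd' deletes it; -i edits the file in place (GNU sed)
sed -i '$ d' "$demo"

# Only the three orcli commands remain
cat "$demo"
```

A file with this content is what the final step of the walkthrough replays.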

Usage

  • Use the 📖 HTML form docs or the integrated help screens to see the available options and examples for each command.

    orcli --help
  • If your OpenRefine is running on a different port or host, then use the environment variable OPENREFINE_URL.

    OPENREFINE_URL="http://localhost:3333" orcli list
  • If OpenRefine does not have enough memory to process the data, it becomes slow and may even crash. Check the message after the run command finishes to see how much memory was used and adjust the memory allocated to OpenRefine accordingly with the --memory flag (default: 2048M).
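The `OPENREFINE_URL` prefix shown above applies to one command only. To keep a setting for the whole shell session, export the variable instead. The sketch below illustrates both forms with a hypothetical stand-in function, `which_openrefine`, that merely prints the URL it would contact; the ports and the fallback value are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for an orcli command: prints the URL it would talk to.
# The fallback value is an assumption made for this illustration.
which_openrefine() {
  echo "${OPENREFINE_URL:-http://localhost:3333}"
}

# Per-command form: the assignment is passed to this one invocation
per_call="$(OPENREFINE_URL="http://localhost:3334" which_openrefine)"
echo "single command: $per_call"

# Session-wide form: export once, and every later command inherits it
export OPENREFINE_URL="http://localhost:3335"
session="$(which_openrefine)"
echo "whole session:  $session"
```

With the real tool, the exported form means that a plain `orcli list` would use the exported URL until the variable is unset or the shell exits.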

Development

orcli uses bashly for generating the one-file script from files in the src directory.

  1. Install bashly (requires Ruby)
  gem install bashly
  2. Edit code in the src directory
  3. Generate the script
  bashly generate --upgrade
  4. Run tests
  ./orcli test
  5. Generate docs
  bashly render templates/html-form docs
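The generate, test, and render steps above can be chained so that a failure stops the sequence early. This is a minimal sketch, not part of the project; the function name `rebuild_orcli` is hypothetical, and it assumes you call it from the repository root with bashly installed.

```shell
#!/usr/bin/env bash
# Hypothetical convenience wrapper: regenerate the script, run the tests,
# then render the docs. Each step runs only if the previous one succeeded.
rebuild_orcli() {
  bashly generate --upgrade &&
    ./orcli test &&
    bashly render templates/html-form docs
}
```

After editing files in the src directory, calling `rebuild_orcli` runs all three commands in order and returns the exit status of the first one that fails.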