
castroom

Open source JavaScript project: 13 stars, 4 forks, 18 issues, 1 watcher, last commit 3 years ago

About castroom

Podcast Search Engine

Platforms

Web, Self-hosted, Kubernetes

Languages

JavaScript


Overview

Castroom is a podcast search engine, built primarily as an exercise in writing a distributed web crawler on Kubernetes. It can gather hundreds of thousands of podcasts within a few hours, and scaling it up further takes a single command.

Project Structure

Discovery

Master

  • coordinates all the crawler jobs
  • maintains a local cache (using LevelDB) to prevent the same URL from being crawled multiple times
  • receives the data that crawler nodes send after crawling a website and pushes it to the queue
  • sends the data to Elasticsearch on completion
  • managed by Google Kubernetes Engine
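
The master's bookkeeping described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Castroom's actual code: `MemoryStore` is an in-memory stand-in for the LevelDB cache, `enqueue` is a hypothetical callback standing in for the push to the queue, and the record shape (an object with a `url` field) is assumed.

```javascript
// In-memory stand-in for the on-disk LevelDB cache (assumption: the cache
// only needs "have we seen this key?" and "remember this key").
class MemoryStore {
  constructor() {
    this.keys = new Set();
  }
  async has(key) {
    return this.keys.has(key);
  }
  async add(key) {
    this.keys.add(key);
  }
}

// Normalize a URL so trivially different spellings map to one cache key.
function cacheKey(rawUrl) {
  const url = new URL(rawUrl);
  url.hash = ''; // the fragment never changes which page is fetched
  return url.toString();
}

// Hypothetical master-side handler: keep only records whose URL has not
// been seen before, remember them, and hand the new ones to the queue.
async function handleCrawlerBatch(store, records, enqueue) {
  const fresh = [];
  for (const record of records) {
    const key = cacheKey(record.url);
    if (await store.has(key)) continue; // already crawled, skip it
    await store.add(key);
    fresh.push(record);
  }
  if (fresh.length > 0) await enqueue(fresh);
  return fresh.length; // number of previously unseen records
}
```

Because each URL is recorded before the next one is checked, duplicates inside a single batch are dropped as well as duplicates across batches.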

Crawler

  • crawls iTunes podcast pages and sends batched data to the master node for caching
  • goes through a proxy to bypass certain restrictions
  • managed by Google Kubernetes Engine
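
One way a crawler node might batch its results before sending them to the master is sketched below. The `Batcher` class, the batch size, and the `send` callback are all hypothetical names introduced for illustration; the project's real transport and batch size are not described here.

```javascript
// Collects crawled items and flushes them in fixed-size batches.
// `send` is a hypothetical callback, for example an HTTP POST to the master.
class Batcher {
  constructor(batchSize, send) {
    if (!Number.isInteger(batchSize) || batchSize < 1) {
      throw new RangeError('batchSize must be a positive integer');
    }
    this.batchSize = batchSize;
    this.send = send;
    this.pending = [];
  }

  // Queue one item and flush automatically once the batch is full.
  add(item) {
    this.pending.push(item);
    if (this.pending.length >= this.batchSize) this.flush();
  }

  // Send whatever is pending; call once more when the crawl finishes
  // so that a final, partially filled batch is not lost.
  flush() {
    if (this.pending.length === 0) return;
    const batch = this.pending;
    this.pending = [];
    this.send(batch);
  }
}
```

Batching trades a little latency for far fewer requests to the master, which matters when many crawler nodes report at once.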

API

  • provides endpoints for querying Elasticsearch and retrieving podcast feed information
  • hosted on Heroku
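
As a rough sketch of what a search endpoint could pass to Elasticsearch, the function below builds a request body using the `multi_match` query from the Elasticsearch query DSL. The function name, the paging parameters, and the field names (`title`, `author`, `description`) are assumptions about the index, not details taken from the project.

```javascript
// Build an Elasticsearch search request body for a free-text podcast query.
// The field names below are assumed; adjust them to the real index mapping.
function buildSearchBody(term, page = 0, pageSize = 20) {
  const text = String(term).trim();
  if (text === '') throw new Error('search term must not be empty');
  return {
    from: page * pageSize, // offset of the first hit to return
    size: pageSize,        // number of hits per page
    query: {
      multi_match: {
        query: text,
        // "^2" boosts matches in the title over the other fields
        fields: ['title^2', 'author', 'description'],
      },
    },
  };
}
```

An API handler would send this object as the body of a search request against the podcast index and return the hits to the frontend.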

Web

  • frontend for the search engine
  • managed by Firebase Hosting

Figure: project structure diagram

Technologies Used

  • Docker
  • Google Kubernetes Engine
  • Amazon Simple Queue Service
  • Amazon Elasticsearch Service
  • Heroku
  • Firebase Hosting
  • React
  • Node.js
  • LevelDB
  • Datadog

Screenshots

Figure: Search and Search Results screenshots

Figure: GIF of Castroom