This article documents the steps to install an Open Semantic Search server on Ubuntu to crawl, index and OCR (optical character recognition) content. The out-of-the-box configuration lets you index websites and a range of file formats such as images, Word and Excel documents, PDFs, etc.
Install Ubuntu Server 18.04.3 LTS
Download the latest Open Semantic Search version for your system. In our case we will be using Ubuntu 18.04.3 LTS. At the time of writing, the download package for Ubuntu is at the bottom right of the download page.
Ubuntu Linux (LTS): open-semantic-search_19.07.19.deb (~300MB)
Copy the link to the package and download it to your system.
cd ~
wget https://opensemanticsearch.org/download/open-semantic-search_19.07.19.deb
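The download step can be wrapped in a small helper so it is safe to run more than once. This is only a minimal sketch: the function names package_filename and download_package are hypothetical, and it assumes wget is installed.

```shell
# URL of the package as given in this article.
PACKAGE_URL="https://opensemanticsearch.org/download/open-semantic-search_19.07.19.deb"

# Derive the local file name from the URL (everything after the last slash).
package_filename() {
    basename "$1"
}

# Download the package into the home directory.
# -c resumes a partial download and does nothing if the file is complete,
# -P sets the directory the file is saved in.
download_package() {
    wget -c -P "$HOME" "$1"
}

# Usage:
#   download_package "$PACKAGE_URL"
#   ls -lh "$HOME/$(package_filename "$PACKAGE_URL")"
```

Keeping the URL in one variable makes it easier to switch to a newer release later.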
Install Open Semantic Search Server
sudo apt install ./open-semantic-search*.deb
NOTE: If the installation pauses at the Solr installation script, type “q” to continue.
To check that everything is running, browse to
http://localhost/search or http://<ip>/search
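On a headless server the same check can be done from the command line. The following is a sketch that assumes curl is installed; the function names search_url and check_search are hypothetical, and the IP address in the usage comment is only a placeholder.

```shell
# Build the search UI URL for a given host (defaults to localhost).
search_url() {
    host="${1:-localhost}"
    printf 'http://%s/search\n' "$host"
}

# Return success if the search UI answers without an HTTP error.
# -f fails on HTTP error codes, -sS hides progress but shows errors,
# -L follows redirects, and the response body is discarded.
check_search() {
    curl -fsSL -o /dev/null --max-time 10 "$(search_url "$1")"
}

# Usage:
#   check_search && echo "Search UI is up"
#   check_search 192.168.1.50    # placeholder IP of the server
```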
Install Samba FileShare
sudo apt install samba
Create directory to share
sudo mkdir /var/shares/oss -p
Set folder permissions
NOTE: For ease of use we set the permissions on the folder and its subfolders to “full control”.
sudo chmod 777 /var/shares/oss -R
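To confirm that the permissions were applied, the mode bits can be read back with stat. This is a sketch with hypothetical function names; the path is passed as a parameter so it can be tried on a scratch directory before touching /var/shares/oss, which needs sudo.

```shell
# Create a directory tree and open it to all users, mirroring the
# mkdir and chmod commands above.
make_open_share_dir() {
    dir="$1"
    mkdir -p "$dir" && chmod -R 777 "$dir"
}

# Print the octal permission bits of a path (GNU stat, as shipped with Ubuntu).
dir_mode() {
    stat -c '%a' "$1"
}

# Usage:
#   dir_mode /var/shares/oss
```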
Configure Samba FileShare
sudo nano /etc/samba/smb.conf
Add the following lines to the end of the file to set up a new file share
[ossshare]
comment = Open Semantic Search Share
path = /var/shares/oss
read only = no
browsable = yes
Restart Samba to apply the changes
sudo service smbd restart
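If you prefer not to edit smb.conf by hand, the share definition can be appended from a script. The sketch below is an illustration rather than a required step; share_stanza and add_share are hypothetical helper names, and the config file is a parameter so the function can be tried on a copy first.

```shell
# Print the share definition used in this article.
share_stanza() {
    cat <<'EOF'
[ossshare]
comment = Open Semantic Search Share
path = /var/shares/oss
read only = no
browsable = yes
EOF
}

# Append the stanza to a Samba config file unless the share is already defined.
# A blank line is written first in case the file does not end with a newline.
add_share() {
    conf="$1"
    if grep -q '^\[ossshare\]' "$conf" 2>/dev/null; then
        echo "Share already present in $conf"
    else
        { printf '\n'; share_stanza; } >> "$conf"
    fi
}

# Usage (in a root shell):
#   add_share /etc/samba/smb.conf
#   testparm -s              # Samba's own syntax check of smb.conf
#   service smbd restart
```

Running testparm before the restart catches typos in the configuration before they affect the running service.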
Update firewall rules to allow Samba traffic
sudo ufw allow samba
Set up a Samba user account (the user must already exist as a system user)
sudo smbpasswd -a <username>
Connect to Share
On Linux, use smb://<ip>/ossshare
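The share can also be reached from a terminal with smbclient. This is a sketch that assumes the smbclient package is installed on the client; share_unc and connect_share are hypothetical helper names.

```shell
# Build the //host/share form that smbclient expects
# (the share name defaults to ossshare, as configured above).
share_unc() {
    printf '//%s/%s\n' "$1" "${2:-ossshare}"
}

# Open an interactive smbclient session on the share.
# smbclient prompts for the password that was set with smbpasswd.
connect_share() {
    smbclient "$(share_unc "$1")" -U "$2"
}

# Usage:
#   connect_share <ip> <username>
```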
Indexing from fileshare
- Copy data to file share
- Open browser http://<ip>/search
- Click on “Datasources”, “Files & directories (filesystem)” and then on “Add new file or directory”
- Enter the local path on the Open Semantic Search server to index (in this setup, /var/shares/oss)
- Click on Save
- On the next screen click on “Index file or directory”
- Go back to http://<ip>/search and refresh page
NOTE: Indexing can take a while depending on the type, amount and size of the content.
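Indexing can also be started from a shell instead of the web interface. The sketch below rests on an assumption: that the installation provides a command-line tool named opensemanticsearch-index-dir. Verify this on your system before relying on it. The helper names count_files and index_share are hypothetical.

```shell
# Count the regular files under a directory, to get a feel for how much
# content is about to be indexed.
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Index a directory from the command line.
# Assumption: the tool is installed as opensemanticsearch-index-dir;
# if it is not found, the function reports that and returns an error.
index_share() {
    dir="${1:-/var/shares/oss}"
    if command -v opensemanticsearch-index-dir >/dev/null 2>&1; then
        echo "Indexing $(count_files "$dir") file(s) in $dir"
        opensemanticsearch-index-dir "$dir"
    else
        echo "opensemanticsearch-index-dir not found; use the web UI instead" >&2
        return 1
    fi
}

# Usage:
#   index_share /var/shares/oss
```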
In this article we showed a straightforward installation of Open Semantic Search Server and Samba on a fresh Ubuntu system. There is much more you can configure to optimize the Open Semantic Search Server for your needs.