Self-hosting

[[File:Server rack interconnects work in progress - IMG 3473.jpg|thumb|Server rack interconnects work in progress]]
Hosting web-based software on your own (self) is called '''self-hosting'''. The software could be a simple HTML website, a dynamic PHP blog, or a distributed social network.


Typically, "self" refers only to the configuration of the software (operating system, packages, database, and so on), while the physical server is operated by a cloud provider like AWS, DigitalOcean, or Hetzner. But some people self-host on computers they physically control, in their home or office.


<small>Note: The examples used throughout this article are '''not endorsements''' or suggestions. They are for illustrative purposes, in the hope that a novice reader has heard of the service provider or software mentioned and can understand the topic better as a result.</small>


== Elements of self-hosting ==


When you register a domain (say, fsci.in), you are automatically in charge of all its subdomains (wiki.fsci.in, videos.fsci.in, meet.fsci.in, etc.), and these do not require further registration with the registry. How do you configure them, then? That's where '''DNS''' comes in. DNS allows you to specify which server each of your domains and subdomains should be connected to.
<small>Note: You might already have seen the term DNS in your router or network settings. It's the same concept. The public/free DNS service configured on your computer is what your browser and other programs use to find the IP addresses of the websites you want to visit. But even that DNS service needs to know how to answer a request for your domain, so you need an "authoritative" DNS server that is configured with the right answer. This is often provided by the registrar, but you can also use a service provider other than your registrar as your authoritative DNS server. DNS servers talk to each other, so even if your router or network uses a different DNS server, it will eventually ask your authoritative DNS server for the answers.</small>


You can, for example, point fsci.in to 35.185.44.232 with a DNS 'A' record and point wiki.fsci.in to 135.181.250.25 with another DNS 'A' record. Then, when someone types fsci.in into their browser, the browser connects to the '''server''' on the internet that has the IP address 35.185.44.232. Similarly, when someone types wiki.fsci.in, the browser connects to a different server, the one with the IP address 135.181.250.25.
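
The mapping from names to addresses described above can be pictured as a small lookup table. The following is only a toy sketch in Python, not how a real DNS server is implemented: <code>A_RECORDS</code>, <code>lookup_a_record</code>, and <code>resolve_with_system_dns</code> are hypothetical names chosen for illustration, and the addresses are the example values from this article. The last function uses Python's standard <code>socket.gethostbyname</code> to ask the DNS service configured on your computer, which needs network access.

```python
import socket

# Toy stand-in for the 'A' records held by an authoritative DNS server.
# The names and addresses are the illustrative ones used in this article.
A_RECORDS = {
    "fsci.in": "35.185.44.232",
    "wiki.fsci.in": "135.181.250.25",
}


def lookup_a_record(hostname, records=A_RECORDS):
    """Return the IPv4 address recorded for hostname, or None if there is no record."""
    # DNS names are case-insensitive and may be written with a trailing dot.
    return records.get(hostname.lower().rstrip("."))


def resolve_with_system_dns(hostname):
    """Ask the DNS service configured on this computer (requires network access)."""
    return socket.gethostbyname(hostname)


if __name__ == "__main__":
    for name in ("fsci.in", "wiki.fsci.in", "meet.fsci.in"):
        print(name, "->", lookup_a_record(name))
```

A subdomain without a record (meet.fsci.in in this toy table) has no answer, which is why each subdomain you want to use needs its own DNS record.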
=== Database ===
Most software stores its important data in separate software, the database (e.g. MySQL/MariaDB, PostgreSQL, SQLite). Some software supports using any database software, while some requires a specific one.
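
To make the idea concrete, here is a minimal sketch of an application keeping its data in a database, using the <code>sqlite3</code> module from Python's standard library. The <code>posts</code> table and the function names are hypothetical, chosen only for this example. Real applications usually have a far larger schema.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the database and make sure the example table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts ("
        "id INTEGER PRIMARY KEY, title TEXT NOT NULL)"
    )
    return conn


def add_post(conn, title):
    """Insert one row and return its id."""
    with conn:  # commits on success, rolls back if an error is raised
        cursor = conn.execute("INSERT INTO posts (title) VALUES (?)", (title,))
    return cursor.lastrowid


def list_titles(conn):
    """Return all stored titles in insertion order."""
    return [row[0] for row in conn.execute("SELECT title FROM posts ORDER BY id")]


if __name__ == "__main__":
    store = open_store()  # ":memory:" keeps the data only while the program runs
    add_post(store, "Hello, self-hosting")
    print(list_titles(store))
```

Passing a file path instead of <code>":memory:"</code> makes the data persist on disk. That file is then part of what you need to back up.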
== Difficulties in self-hosting ==
Although self-hosting is conceptually straightforward, some real-life scenarios might make it a difficult choice for some people.
=== Reliability and uptime ===
Your self-hosted setup can stop working for various reasons. Your site could become popular and receive too many requests, overloading your server. You could run out of storage space on your disk. You might exceed the bandwidth provided by your internet service provider. Your cloud provider might restart your server without warning, or shut you out because you missed a payment.
You might have to account for these when deploying a super-important service.
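
As one example of accounting for these, a small script can warn you before the disk fills up. This is only a sketch using <code>shutil.disk_usage</code> from Python's standard library. The function names and the 90% threshold are assumptions made for illustration, not recommendations.

```python
import shutil


def disk_usage_percent(path="/"):
    """Return how full the filesystem containing path is, as a percentage."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def is_disk_nearly_full(path="/", threshold_percent=90.0):
    """Return True when usage has reached the (arbitrarily chosen) threshold."""
    return disk_usage_percent(path) >= threshold_percent


if __name__ == "__main__":
    percent = disk_usage_percent("/")
    print(f"Disk usage: {percent:.1f}%")
    if is_disk_nearly_full("/"):
        print("Warning: the disk is nearly full")
```

A script like this could be run on a schedule (for example with cron) and extended to send a notification instead of printing.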
=== Data loss and backups ===
There are several ways in which you might lose some or all of the data stored by your services. The server could crash. You might accidentally delete important files or a database. You might lose access to the server backend because you lost your SSH credentials and password. The provider might kick you out without warning. Issues during a software upgrade could cause data corruption.
Backing up important data frequently is advised, but it is difficult to practice. Some cloud providers offer backup services that can do this at an extra cost. There is also no guarantee that you will be able to restore data from a backup.
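
A minimal sketch of a backup routine, assuming the data to protect lives in a single folder: it writes a timestamped <code>.tar.gz</code> archive with Python's standard <code>tarfile</code> module and then lists the archive's contents as a basic check that it can be read back. The function names and the archive naming scheme are hypothetical. A real setup would also copy the archive to a different machine and test a full restore.

```python
import tarfile
import time
from pathlib import Path


def create_backup(source_dir, backup_dir):
    """Archive source_dir into backup_dir and return the archive's path."""
    source = Path(source_dir)
    target_dir = Path(backup_dir)  # should not be inside source_dir
    target_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = target_dir / f"{source.name}-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the folder name.
        tar.add(source, arcname=source.name)
    return archive


def list_backup_files(archive_path):
    """Read the archive back and return the files it contains."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return sorted(m.name for m in tar.getmembers() if m.isfile())
```

For example, <code>create_backup("/var/www/site", "/mnt/backups")</code> (both paths are placeholders) would produce an archive whose contents can be inspected with <code>list_backup_files</code>. Being able to list an archive does not prove that a service can be fully restored from it, which is why restores are worth rehearsing.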
=== Security ===
There are several bad people on the internet who will try to gain access to your server for various reasons. They might hack into your accounts at your service providers (by phishing, stealing passwords, etc.), into your operating system (by brute forcing, etc.), or into your software (by exploiting vulnerabilities, etc.). Once they gain access to your system, they might steal your data and misuse it, hold it for ransom, or destroy it altogether. They might run harmful software on your server. They might even silently collect information without revealing their presence.
When a software maker becomes aware of a security bug (vulnerability) in their software, they fix the bug (patch it) and release a new version of the software without the vulnerability. Therefore, keeping your software (operating system and packages) up to date with new versions is crucial. (But newer versions of software could introduce new bugs and other incompatibilities.)
Even if the core packages of your software are regularly updated, plugins and extensions can also make the software vulnerable. For example, WordPress has several thousand plugins (developed by numerous people), and some of these might have vulnerabilities that are not fixed.
=== Cost ===
The more powerful and complicated your self-hosting setup is, the more costly it becomes to run. You would need to purchase servers with more RAM, storage, and CPU, which are more expensive. You would need larger backups, which also have to be paid for. And so on.
While it is possible to find relatively cheap hosting providers, some of them are also riskier. For example, there are complaints about Hetzner kicking users out and deleting their data for missing payments. Some providers (like AWS) offer a free tier to begin with, but once the free tier ends you would have to pay relatively more (and by then you might have done too much work that is difficult to move to a different provider).
As costs depend on the exact services you're running, it is difficult to suggest a reasonable cost. But if you are running a simple website or service, it should cost you about ₹500-1500 per month (and not more).
== Tips ==
=== Prefer static websites when possible ===
A static website is a set of HTML pages (and the images, styles, and JavaScript they require) that can be served directly by any web server software. These pages can be written by hand (by typing out HTML), generated by static site generators (like Hugo or Jekyll), or produced with the static export feature of a dynamic website (e.g. the Simply Static plugin for WordPress, or the static export feature of Next.js).
Once generated, these HTML files can be served to any number of users without any change, and without running any other software. Therefore they're simple to host and serve. You can simply use your own web server and point it to the folder containing HTML files to serve the website.
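
As an illustration of pointing a web server at a folder, the sketch below uses the <code>http.server</code> module from Python's standard library. The function names are hypothetical, and Python's own documentation warns that this module is not suitable for production use, so treat it as a way to see the idea working locally rather than as a hosting recommendation.

```python
import functools
import http.client
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_static_server(folder, host="127.0.0.1", port=0):
    """Create a server that serves the files in folder (port 0 picks a free port)."""
    handler = functools.partial(SimpleHTTPRequestHandler, directory=str(folder))
    return ThreadingHTTPServer((host, port), handler)


def fetch_once(folder, url_path="/"):
    """Serve folder in the background, fetch one page from it, then stop the server."""
    server = make_static_server(folder)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        connection = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        connection.request("GET", url_path)
        body = connection.getresponse().read()
        connection.close()
        return body
    finally:
        server.shutdown()
        server.server_close()
```

To keep serving instead of fetching a single page, call <code>make_static_server("public", port=8000).serve_forever()</code>, where <code>public</code> is a placeholder for the folder containing your HTML files. Running <code>python3 -m http.server --directory public</code> from a terminal does roughly the same thing.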
There are also several free hosting providers for simple static HTML websites! These include [https://docs.codeberg.org/codeberg-pages/ Codeberg Pages], [https://grebedoc.dev/ Grebedoc], [https://pages.cloudflare.com/ Cloudflare Pages], [https://www.netlify.com/ Netlify], [https://docs.gitlab.com/user/project/pages/ GitLab Pages], [https://docs.github.com/en/pages GitHub Pages], etc.
Although static websites are limited in personalization and dynamic features, several websites (including FSCI's website) run as static websites. They are easier to secure, can be hosted easily, can be optimized to load very fast, and have several other advantages.
=== Read documentation ===
Some people don't read. And then they make mistakes that they could have avoided if they had read the documentation. One could say that this is a design issue and that the person who wrote the documentation is to blame. But if you want to host services on your own, a mature approach is to hold yourself accountable for reading documentation. Read documentation. Even if it is long. Even if it will take a long time. Even if you don't understand most of it. (Try to learn more about the topics you don't understand.) It will help.
=== If you can read the code, you will be able to solve more problems ===
While good documentation is enough to deploy software, documentation is not always complete or correct. If you are able to read the source code, you will be able to get exact answers to various problems you come across. You might be able to spot bugs, report them, and even fix them! Since free software is a collective enterprise, your contribution in this way is welcome and encouraged! (Be mindful of the personal preferences of maintainers, though.)
=== LLMs are Stochastic Parrots ===
While LLMs are good at giving you guidance on things about which there is plenty of literature on the internet, many sysadmin tasks are very contextual to your setup (the state of your server and the peculiarities of the software you're dealing with). If you do not know what you're doing, using an LLM can be dangerous, as it can give generic answers that might not be the best for you. If you do know what you're doing, you will probably be using the LLM like a search engine.
== See Also ==
* [[Sysadmin roadmap]]
== External Links ==
* [https://www.keithcirkel.co.uk/a-playbook-for-hosting-simple-services/ How I host websites] — blog post by Keith Cirkel, engineer at Mozilla