Hello World the Polyglot Way

Intro

You will learn a lot in this booklet.

Typically a Hello World tutorial teaches the bare minimum to get a program running, then explains a little bit about the code used. Many tutorials (depending on the language) even tell you: “don’t worry” about certain basic components until later, leaving you with a frustrating mystery.

There are no mysteries here. I will cover everything I can think of for you to fully understand exactly what is going on with the Hello World programs.

How This Will Go

Things to learn:

You will learn each concept (the what), using three different programming languages (the how) at the same time. You will of course also need to learn some tools to accomplish these things. I will cover all the gory details.

This technique is a double-edged sword. On one side, the language (the how) learning curve is steeper than with traditional teaching, as you’re learning three at once! On the other side, the important fundamental concepts (the what) are emphasised, which I believe is the more difficult hurdle to cross for new programmers. Too often students get hung up on the syntax and can’t progress to really understanding the semantics, let alone the actual meaningful problem solving! With this approach the question of what a particular syntax means becomes a question the learner can answer themselves and, more often than not, the answer will be: “It’s just how this particular language writes the concept that I now understand!”

Lesson 0 - What Programming Is

Programming is writing instructions for a computer to follow. The instructions are encoded in a language you can understand, and that can be transformed into a form a computer can execute.

Programming is also solving problems. At least, that’s the purpose of programming! Even if the problem is just a personal itch you want to scratch like: “How can I back up my files automatically?”. Many problems have already been solved with existing programs. Backing up files is an example of this. We will have to solve some simple problems however, in order to learn some basics. I will keep the focus on problem solving so we don’t forget the bigger picture ;).

What is called a program can be a few things. It usually starts as a plain text file containing instructions written in a programming language. It usually then becomes the version a computer can execute. Let’s talk about that.

Lesson 1 - How the Computer Does Stuff

The code we write cannot be executed directly by the computer. It only understands one language, known as machine code, which we have no hope of reading or writing while maintaining our sanity. Machine code is just numbers, written in binary. Therefore what we write is human readable (just!), and then translated into what the computer can understand. There are two ways this is done: compilation, and interpretation.

Compiler

A compiler is a computer program that takes the code we write and turns it into code that the computer can execute. If you like analogies, it’s like how a cook takes a recipe and turns it into food. Not literally of course. The cook uses the recipe as instructions for how to use other resources (ingredients) to make the food; and does so. After the cook is done the recipe still exists, and can be used again as many times as you like. This is just like using source code and compiling it into software. We call the code we write source code as it’s the original source of the eventual program that is created.

Once a program’s source code has been compiled, we say the output is an ‘executable’ as the machine can now execute it. Executables are also called binaries (as they are usually binary numbers), programs, or applications as they are usually applied to some problem to solve.

Sometimes executing a program is called running a program. Run and execute are synonymous in this context.

Interpreter

An interpreter does a similar job to a compiler. It takes source code, and turns it into a form the computer can execute. The main difference is that it does this while the program is being executed. There is no need for us to do the compiling before we want to run the program; it is done automatically, on the fly. The downsides are that programs that need an interpreter to run, need an interpreter to run, and that they typically run slower than compiled programs. An interpreter is another program that must run on the computer, and the program that it runs has to rely on this extra ‘layer’ between it and the machine, which is where the lost speed goes.

Lesson 2 - Writing Programs

Now that you know all that theory, or can refer back to it at least, we can move on to some more practical information!

Text Editor

Step one is to choose a text editor. I advise you use whatever you have on your computer, unless it really sucks. I will assume you use free software and ensure the rest of this booklet works on the following operating systems:

On GNU/Linux you typically have a text editor included such as Gedit, Pluma, Kate, Leafpad, or Mousepad. If your system is command line only, you will have at least one of vi, vim, emacs, or nano. I will assume you know how to use these terminal based text editors.

Whichever text editor you have will be fine for writing programs, provided it is not a word processor. Don’t use an office suite’s word processor, or any web based word processors. They won’t work.

Environment

Each programming language is used to write source code. Source code is just text saved in a file. For this source code to be useful it must be executed somehow. Each language is executed in a different way, some very similar, but some very differently. For a program in a particular language to be executed, it requires its own execution environment. I will cover setting up each language’s environment with their introductions next.

Lesson 3 - The Three Languages

The three programming languages we will be learning throughout this series of booklets are POSIX Shell, C, and Python. I have chosen these as they are available on all platforms and represent very different uses of programming languages. This will give you a good understanding of how to apply the basic concepts in different ways. We will also cover how these three languages are related, and a few ways to use them together.

C

C is a compiled language. This means once you have written a program in C, you must use a compiler to turn it into an executable.

Environment for C

The C compiler we will use is called cc. On Debian you should use GCC. GCC is an acronym for GNU Compiler Collection. It was originally just a C compiler, and the name an acronym for GNU C Compiler. On your system you must ensure the program cc is installed. Some operating systems include it by default, others require you to install it.

On Debian, it can be installed using the following command if you are logged in as the root user:

# apt install gcc

On FreeBSD cc is provided by default by the clang C compiler.

POSIX Shell

A shell is a type of interpreter as described above, specifically one for an operating system. POSIX is an operating system standard that defines a Shell Command Language. This language standard is adhered to by many shells. The name shell means it is the ‘outer layer’ of an operating system. Modern operating systems typically have a graphical shell by default called a graphical user interface (GUI). They also usually have a text user interface (TUI), although some try to hide it. Text user interfaces are more commonly called command line interfaces. This is because the user types in commands, line by line, for the computer to execute.

Although graphical user interfaces are technically a type of shell, for simplicity, and because this is how most people refer to it, I will use the term “shell” from now on where I mean a POSIX compatible shell.

Reference to the POSIX standard

Environment for POSIX Shell

There are many shells that are POSIX compliant. Most shells offer their own extensions to the POSIX features. Here, we stick to POSIX, so any POSIX compliant shell will do.

Bash is an acronym for Bourne Again SHell. It is the name of a shell that adheres to the POSIX Shell Command Language standard. Bash is designed for Unix-like operating systems, but it can also run on Windows using Cygwin. As this tutorial focuses on using GNU/Linux, you should have the bash package pre-installed, and I will use bash for the examples. Other operating systems come with POSIX compliant shells such as macOS with zsh, and FreeBSD with sh.

Python

Python is another interpreted language. Unlike POSIX shell, it is not OS specific. The Python interpreter is designed to run on any OS. There are two major versions of Python: 2 and 3. They are very similar but not compatible. We will use 3, as the developers stopped supporting version 2 on 1st January 2020, leaving 3 as the only supported version.

Python Environment

Install Python 3 if it is not already installed. You can get it from your package manager, or from the Python website. On Debian the command is:

# apt install python3

On FreeBSD the command is:

# pkg install python3

Lesson 4 - Let’s Get Started Already!

We are going to start with the traditional first program which is boring but essential. It’s called Hello World. The reason we use this program is to cover the basics of getting the compiler or interpreter up and running, and making sure we can write and execute one of the most basic possible programs. If we can’t do this, we can’t really progress!

What this program will do is display the text “Hello, World!”. This is a form of data. There are many others that we will cover soon. For now, we will focus on text. Text like “Hello, World!” is called a string in computer terms. Computer programs are all just text so we need some way to distinguish the regular text of the program, and the text we want to keep as text when the program runs. To remember the name string, think of it as a string of beads on a necklace. It’s just a string of characters one after the other.

Which brings me on nicely to characters! Characters are any single letter, number, punctuation or other symbol. For example ‘a’ is a character, ‘!’ is also a character. We typically write characters and strings in single or double quotes. This depends on the language however.

Starting With POSIX Shell

Let’s start with shell as we will use it to run C and Python. As I said, shell is an interpreted language. Like many interpreted languages, shell has a REPL, which stands for Read–Eval–Print Loop. This means we can type the program into the interpreter which will interpret it on the fly.

Open your terminal application. Most Linux distributions use bash by default. If you are not sure which shell you have, enter the following command to start sh:

$ sh

Now that we are sure to be using sh in your terminal, enter the following command:

$ echo 'Hello, World!'

As you should see, the string ‘Hello, World!’ has been displayed on the screen directly after the line where you entered the statement.

You may have noticed we used single quotes for this string. This is one way to denote a string in shell. Another way is using double quotes. Try this:

$ echo "Hello, World!"

It should work the same way for this string, however depending on the content of the string, it might not. When using single quotes in shell, exactly the characters that appear in the string are shown. This is called a literal string. However, if you use double quotes, some characters have special meaning and can change the string. This can be useful, but we will get to that shortly.
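As a small preview of that difference, here is a sketch using $HOME, a variable that the shell normally sets to the path of your home directory. Variables have not been covered yet, so treat this purely as an illustration: in single quotes the text is printed exactly as typed, while in double quotes the shell replaces $HOME with its value before echo sees it.

```shell
# Single quotes: a literal string, printed exactly as typed.
echo 'My home directory is $HOME'

# Double quotes: the shell replaces $HOME with its value first.
echo "My home directory is $HOME"
```

The first line prints the characters $HOME unchanged. The second prints whatever your home directory happens to be.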

The third way to use strings in shell is this:

$ echo Hello, World!

If you enter that, you will see you get the same result. This is because the type of data the shell deals with by default is strings. Most other languages deal with numbers first and foremost, so need to distinguish strings using quotes. Shell however, assumes it will be working on strings, so we don’t always have to quote them. It is better to use quotes as standard though. This avoids the shell interpreting the content of strings when you didn’t intend that behaviour.
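One concrete case where quoting matters, as a minimal sketch: without quotes the shell splits the text into separate words wherever there are spaces, and echo joins those words back together with a single space between them. Extra spaces are lost unless the string is quoted.

```shell
# Unquoted: the shell sees two words, so the run of spaces collapses to one.
echo Hello,     World!

# Quoted: one string, so every space is kept.
echo 'Hello,     World!'
```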

As you will see, there are many ways of doing the same thing in most programming languages. That is why it is important to focus on the what rather than the how. If you know what you want to do, the how is flexible, and changes depending on the language. The what is almost always the same however.

echo is a special command utility in shell. echo will do one thing as you have seen: echo some text back to the user. When I say ‘back to the user’ what I mean is, it is shown on the screen so that you (the user) can read it. Showing text to a user is an integral part of any shell. It is called standard output. Historically, standard output was printed on paper, as was everything a user typed in. The remnants of this can be seen in the more usual term for displaying text: ‘print’. Shell also has a print command called printf. The ‘f’ is short for format. This means it expects a format string, and potentially some other data to be formatted with it. We will go into more detail on this when we get to C. For now try:

$ printf "Hello, World!"

You might notice something different here. The result of using echo was that the string was printed on one line, by itself. The output of printf however is followed directly by the shell prompt, with not even a space. This is because echo automatically appends an invisible newline character that, you guessed it, puts any following text on a new line. We can write a newline character by, strangely, using two characters: a backslash followed by an ‘n’: \n. So if we append this to our string, there should be a new line before any following text. The double quoted string reveals its difference here though.

$ printf "Hello, World!\n"

If you enter this in Bash you will probably get an error message like this: bash: !\n: event not found. This is because Bash gives the exclamation character ! a special meaning when you type commands at the prompt: it is used to recall commands from your history, which is what the above error message is about. Bash still interprets ! in this way inside a string using double quotes, but not inside single quotes. A shell without this history feature will simply print the string. Try again with single quotes:

$ printf 'Hello, World!\n'

That should work as before. One caveat is that a string that uses either type of quotes cannot contain the same type of quotes. For example if we want to print the string ‘This isn’t going to work.’ it won’t work. You can try it and see:

$ echo 'This isn't going to work.'

You will kind of get stuck at a different prompt that usually looks like >. To get out of it, either enter another single quote character, or press Ctrl+C to cancel the command.

The reason that didn’t work is the shell can’t tell where the string begins and ends. Is the string ‘This isn’ or is it ‘t going to work.’? It can’t be both because a single quote delimits the beginning and end of a string. We either have a string that doesn’t begin, or one that doesn’t end. The shell will just recognise the first complete string and treat the rest as more text with the beginning of another string at the end. > is the shell telling you it’s waiting for more input, in this case the end of the string that was started with '.
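One way around this, sketched below, is to wrap the string in the other kind of quotes, so that the quote character inside it is just an ordinary character. The example sentences are only illustrations.

```shell
# Double quotes on the outside, so the single quote inside is plain text.
echo "This isn't going to fail."

# The reverse also works: single quotes outside, double quotes inside.
echo 'She said "Hello, World!" and left.'
```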

printf fails in a slightly different way to echo. You can experiment with this but for now we will move on.
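One last printf sketch before leaving the shell, to show what a format string does with other data. It assumes the %s placeholder, which stands for a string argument; the full details are left for the C lessons.

```shell
# Each %s in the format string is replaced by the next argument, in order.
printf '%s, %s!\n' 'Hello' 'World'
```

This prints Hello, World! on its own line, with the comma, the exclamation mark and the newline all coming from the format string.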

Python

Python is interpreted, just like the POSIX shell language, but we must use the Python interpreter instead of a POSIX shell. One of the main uses for a shell is to run other programs. We will now use Bash to run the Python interpreter. At the time of writing, the Python 3 interpreter is usually called python3 and Python 2 is just called python. A few systems call Python 3 python and call Python 2 python2. Helpfully, Python will print its version number when it starts so we can be sure we’re in the right version. Most systems use python3 now, but if you enter python3 and get an error, try python.

In your terminal, enter the python command:

$ python3

You should see a few lines of text printed, including the version number which should begin with 3. The final line should be >>>. This is the Python prompt.

You enter commands into the Python interpreter just as you did for the shell. Try this:

>>> print('Hello, World!')

You should see the usual message printed to the terminal, then be returned to a new Python prompt.

As you will have noticed, the way we write the command in Python is slightly different than in shell. Some definition of terminology is required here.

In Python, the way we print to standard output is to use a function. In shell, the same concept is usually referred to as a command, or utility. In fact common use is to call any single instruction in shell a command, whether it is a function or not.

Simply put: a function is a way of grouping instructions for the computer to execute.

In this example, the function print takes some data in the form of a string, and prints it to standard output. The way functions are used in Python is to enclose the data we give them in brackets. Brackets are also known as parentheses. Each piece of data we give to a function is called an argument. This is a term used in all programming languages so we will use it from here on.

We will now jump back to shell to explore other ways to run programs in interpreted languages. To exit the Python interpreter enter another function:

>>> exit()

As you can see, this function takes no arguments as the brackets are empty. It is still a function however, which in Python’s syntax requires brackets. The exit function doesn’t need any arguments, as it will just exit the Python interpreter.

Shell

Now we’re back in the shell, we need to go over files and environments.

Bash, as a POSIX shell, is a full text user interface to your operating system. I’m assuming this is GNU/Linux, so what we will do is use the shell to explore our system and create some files.

To have Bash print exactly where we are in the file system, enter this command:

$ pwd

This is an acronym for Print Working Directory. You may be used to calling directories folders. They are one and the same. The original name is directory and that is the name most languages use. Your working directory is the directory you are currently working in. You should have seen output like this:

/home/name

Where name is your user name. You might be somewhere else, which is fine. pwd will show you where you are no matter what! In fact if you are ever unsure where you are in the file system, use pwd.

You can change your working directory with the cd command, which is short for Change Directory. Try this:

$ cd ~

That squiggle is the tilde character. You probably don’t use it often, and will probably need shift to access it on your keyboard. You should now be in your home directory. That is, the directory with the name of your user name. You can of course check with pwd.

cd usually takes one argument, as above. It should be a directory name, either one relative to the current directory, or its full path from the root directory.
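A quick sketch of the two kinds of argument, assuming your system has the directories /usr and /usr/bin (nearly every GNU/Linux and BSD system does). A path that starts with / is a full path; anything else is taken relative to wherever you currently are.

```shell
cd /usr       # full path: starts from the root directory
pwd           # prints /usr

cd bin        # relative path: the directory bin inside the current directory
pwd           # prints /usr/bin

cd /usr/bin   # the same place again, this time reached by its full path
pwd           # prints /usr/bin
```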

~ is a short way to refer to your home directory. On GNU/Linux it will be in another directory, called home. A bit confusing but gets the point across. Now that you’re in your home directory, pwd will show you where that is. / is used in two ways here. It represents the root of the file system, and is the separator between directories and files. The root of the file system is just the name of the directory that contains all other directories. We can change to the root directory with this:

$ cd /

Now you’re in the root directory which you can see with pwd. The next shell utility is ls, which is short for list somehow.

$ ls

You should now see the names of all the files and directories in your root directory. Change to the one called home with:

$ cd home

Now if you enter pwd you should see that you are in the home directory, which is in the root directory (denoted with /). From here, enter ls again. You should see at least a directory for each user account on your system, including yours. These are called the home directories. Your home directory is the one named with your user name. You can check your username with the command:

$ who am I

Yes that’s right, you’re asking your computer who you are. Don’t have an existential crisis though, your user name will be printed to standard output. Use cd to change to your home directory. Now we can create a file with the command touch.

$ touch hello.sh

This will create a new, empty text file called ‘hello.sh’. Touch is an old command with a weird name like most shell utilities. It is designed to ‘touch’ a file, leaving a trace that the file had been accessed or modified. POSIX compliant Operating Systems keep track of when files were last accessed or modified. touch by default just updates that time to the present. If however, the file named doesn’t exist, touch creates it. A roundabout way to create an empty file I know, but it’s the standard way to do it on a POSIX system! The catch here is that if you already have a file named ‘hello.sh’, you won’t create a new one. This is unlikely, but just in case, you can use a different file name.
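A minimal sketch of that behaviour, using a hypothetical file name example.txt that is assumed not to exist yet in your working directory:

```shell
touch example.txt   # the file does not exist, so touch creates it, empty
ls example.txt      # prints example.txt, showing the file is now there

touch example.txt   # the file exists now, so only its timestamp is updated
ls example.txt      # still the same single file
```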

We use the .sh file extension so that we know this is a shell script. This is common, but not required. It is also considered bad practice for finished scripts that you might distribute to others to use.

Now we should edit this file with your chosen editor. Usually you can just type the name of the editor in all lower case, and give it the file name as argument. For example if the editor is Gedit, enter:

$ gedit hello.sh

The editor should pop up, with the empty file open. Enter the following text:

#!/bin/sh

echo 'Hello, World!'

Now save, then close the editor. You should be back at your shell prompt. There might have been some output from your editor, ignore it if there is.

What we have created is called a shell script. A script is a file, containing instructions for an interpreter to execute. In this case, the interpreter is sh. GNU/Linux has a very good security feature called permissions. We currently don’t have permission to execute the script because this is the default for any new file. To give ourselves permission enter this command:

$ chmod u+x hello.sh

chmod is short for CHange MODe. Change mode means to change the modes of the permissions of a file. As you can see, chmod takes two arguments: u+x and the file name. u+x is the cryptic way to tell chmod to change the mode of the user’s permissions, to add the execute permission. user, add, execute: u, +, x.

Now we can execute the script by entering the following command:

$ ./hello.sh

You should see the familiar Hello, World! message printed to standard output.
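To see what chmod changed, the sketch below lists a file’s permissions before and after. It uses a hypothetical file name, perm_demo.sh, and the -l option of ls, which prints a long listing that begins with the permissions. The exact letters you see depend on your system’s defaults.

```shell
touch perm_demo.sh
ls -l perm_demo.sh      # the listing typically starts with something like -rw-r--r--

chmod u+x perm_demo.sh  # user, add, execute
ls -l perm_demo.sh      # an x now appears in the user part, for example -rwxr--r--
```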

Some explanation: The first line of the script #!/bin/sh is a special message to the operating system. It says which interpreter to use to interpret the rest of the script.

The first two characters in this context are called the ‘shebang’. This name may have come from a contraction of the characters’ names: “haSH”, and “bang”, an old name for “exclamation mark”.

The next part /bin/sh is a file path. It says where to look for the right interpreter. Here we specify that it is sh we want to use to interpret the script. More on environments later. sh is the POSIX standard name of a compliant shell.
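The shebang only matters when a script is executed directly. As a sketch, the example below builds a tiny script under a hypothetical name, shebang_demo.sh, and runs it both ways. It uses > to send the output of printf into a file, which has not been covered yet.

```shell
# Create a two-line script: a shebang line and an echo command.
printf '%s\n' '#!/bin/sh' "echo 'Hello, World!'" > shebang_demo.sh

# Way 1: name the interpreter yourself. The shebang is ignored here
# (to sh, a line starting with # is a comment), and no execute
# permission is needed.
sh shebang_demo.sh

# Way 2: execute the file directly. The shebang line now decides
# which interpreter runs the script.
chmod u+x shebang_demo.sh
./shebang_demo.sh
```

Both ways print the same Hello, World! message.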

Python

Now we’re going to write a script in Python. As we did with the shell script, let’s first create a file:

$ touch hello.py

As you can see we are using the .py file extension for Python. Similar to shell scripts, this is not required in general, but is quite common for personal scripts. We will use it so that we don’t get our different language files mixed up.

Again open the file with your text editor:

$ gedit hello.py

Now type in the following program:

#!/usr/bin/env python3

print('Hello, World!')

Similar to our shell script we start with the shebang. However the next part is different here. /usr/bin/env is the file path in this case. env is a POSIX standard utility that runs another program in the current environment of the system. Part of that environment is the list of places where programs such as python3 are installed, so env can find python3 regardless of the OS it runs on. We pass python3 as an argument to env so that env knows we want it to run python3. More on environments later. As with the shell script, save then close your editor. Exactly as with the shell script, we have to make this one executable:

$ chmod u+x hello.py

Now we can run it. This is done the same way as the shell script:

$ ./hello.py

You should see the usual message printed to the standard output.
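To see the kind of lookup that env performs, here is a small sketch. It relies on command -v, a standard way to ask the shell where it would find a program, and on the -c option of sh, which runs the text that follows as a program. Neither has been introduced yet, so take them as assumptions for this example. sh is used instead of python3 so that the example runs even where Python is not installed.

```shell
# Print the location where a program named sh is found.
command -v sh

# Ask env to find sh the same way and run it with a one-line program.
env sh -c 'echo "Hello from a shell started by env"'
```

The path printed by the first command can differ from system to system, which is exactly why the Python shebang asks env to do the finding.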

C

Now that we’ve done the Hello World program in Shell and Python, we will do the same in C.

As with the previous two, let’s first create an empty file:

$ touch hello.c

As you can see, a C program uses the file extension .c. This is standard in C and required for the source code files for C programs.

Open it with your text editor:

$ gedit hello.c

Now enter the following program:

#include <stdio.h>

int main()
{
    printf("Hello, World!\n");
}

We have some extra syntax here even beyond what was required for Python! C is a lower level language than Python or shell. This means it is closer to the machine’s language, and requires more detail. And remember, C is a language that must be compiled, rather than interpreted. Let me explain:

The first line #include <stdio.h> is an instruction to the compiler to include another source code file called stdio.h. stdio.h is a C header file (denoted by the .h file extension). Header files describe functions that many programs need and that have already been written for us, so we don’t have to write them repeatedly for each of our own programs.

The third line begins with a function definition int main(). There are a few things going on here. First, int is a type of data. Short for integer, int specifies the type of data that should be returned by the function.

More on Functions

Functions are named subsections of programs that can be run repeatedly. Typically they take in some data, return some data, or have some side effect. Either the input data or the output data can be nothing.

In some languages the type of data must be specified. C is one of those languages. Python and shell language have interpreters that can figure out the data types for you.

C

Did you save your C program? Make sure you do. Now that you know more about functions in general, let’s look more at the main function in C.

main is the name of the function. It is a special name in C as it is the one function that is required for every C program. As with the print function in Python, functions in C use brackets to surround the data that goes into a function. In this main function, the brackets are empty. This shows that this function will not take any arguments in this case.

Next we have a line with one character, an opening curly bracket {. This is used in C to denote the beginning of a block of code. Likewise on the last line is the matching closing curly bracket }. This denotes the end of the block of code. Blocks of code like this are just a sequence of instructions that are defined as a group. In our case, it groups just one statement. Grouping one thing seems kind of redundant, but in C functions of any length require curly brackets.

Functions again

When writing programs in any language, functions are written in two ways: definition, and usage.

When a function is defined, we write out exactly how it should work. This includes what data it expects as argument (if any), what data it will return (if any), and what it is supposed to do.

When a function is used, we ‘call’ the function to come and do what we expect it to do. Think of a well trained dog. You call its name and you expect it to follow your instructions. Functions are like dogs that are trained to do one routine of tricks, and only one. All you have to do is call its name, and give it what it needs to perform its ‘tricks’, and off it goes. Each function will always do the same thing, given the same initial state and data. We get a function to do its thing by calling its name and passing it the required data as arguments.

C

In this Hello World program in C, we define a function called main and use a function called printf. printf is provided through the header file stdio.h so we don’t have to define it ourselves. stdio is short for STandard Input and Output. Because we include the header file stdio.h in the first line, we will be able to use the function printf in our program.

The definition of any function in C has two main parts: the function header, and the function body.

The header is one line that consists of: the type of the data that will be returned, the function’s name, and in brackets the list of parameters it needs. Our main function will return data of type int, its name is main, and it lists no parameters because it needs none. Parameter is another name for the definition of an argument. Strictly speaking a parameter is the name used when defining a function, and argument is the name used when calling a function.

The body is a block of code. Blocks of code are surrounded by curly brackets as we know, and contain any other code we want.

Each individual instruction in C is called a statement. Think of it like a sentence in English. In any programming language however, it is common (and sometimes required) to put each statement on its own line. Like in English though, there are certain grammatical rules. In the same way a sentence in English ends with a full stop, a statement in C ends with a semicolon. We can see this with the statement printf("Hello, World!\n"); in our main function’s body.

White Space

You will notice that the single statement we have defined in our main function is indented from the left margin. I have used 4 spaces here but other people use different numbers of space characters. Some people like to use tab instead of space. Whichever way you do it, when used in this way, these invisible characters are referred to as white space. This is because when code is printed on white paper (as it always used to be) the empty space left by these characters is white.

C

In C it is standard for people to indent code in a block. Later we will see other blocks of code indented in this way. It is not required for C code to be indented. In fact much of the white space is not required for the program to work, but it makes it easier for humans to read and understand. Just as in English, the same text can be written in one giant block, but it is easier for humans to read and understand if it is split into paragraphs and chapters.

You will recognise the string as we used a similar one in shell when we used the printf utility. The difference here is that strings in C always use double quotes. We don’t have to worry about the ! having special meaning in a string as it doesn’t in C. There are other parts of a string that can have special meaning in C which we will come to later.

When we wrote shell and Python code into files, we made them executable, then executed them from the command line. With C we have an extra step as it is a compiled language. You guessed it, we need to compile it. For this we use the compiler cc. Once you have saved and closed your editor, enter this:

$ cc hello.c

You will not see anything special displayed if everything went okay. If you see any errors, check you have typed the program exactly as shown above.

You should have a new file called a.out in your working directory. This is the output of the compiler and is our compiled Hello World program! The compiler will have made it executable by default, so we can run it with this:

$ ./a.out

As with the shell and Python versions of this, you should see the string “Hello, World!” printed on its own line to standard output.
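The name a.out is only the compiler’s default. As a sketch, the -o option of cc lets you choose the output name yourself. The file names hello_demo.c and hello_demo are hypothetical, and the cat command with << 'EOF' (a way to write several lines into a file from the shell) has not been covered yet, so treat that part as scaffolding for the example.

```shell
# Write the same Hello World program into a separate file.
cat > hello_demo.c << 'EOF'
#include <stdio.h>

int main()
{
    printf("Hello, World!\n");
}
EOF

# Compile it, naming the executable hello_demo instead of a.out.
cc -o hello_demo hello_demo.c

# Run it like any other executable.
./hello_demo
```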

Congratulations! You have now completed the traditional Hello World program in three languages! You should have learned a lot compared to many Hello World tutorials. We have covered compilers, interpreters, shells, function definitions and function calling, the string data type, creating and editing files, navigating the file system, and execute permissions, to name a few.