You will learn a lot in this booklet.
Typically a Hello World tutorial teaches the bare minimum to get a program running, then explains a little bit about the code used. Many tutorials (depending on the language) even tell you: “don’t worry” about certain basic components until later, leaving you with a frustrating mystery.
There are no mysteries here. I will cover everything I can think of for you to fully understand exactly what is going on with the Hello World programs.
Things to learn:
You will learn each concept (the what), using three different programming languages (the how) at the same time. You will of course also need to learn some tools to accomplish these things. I will cover all the gory details.
This technique is a double-edged sword. On one side, the language (the how) learning curve is steeper than with traditional teaching, as you’re learning three at once! On the other side, the important fundamental concepts (the what) are emphasised, which I believe is the more difficult hurdle to cross for new programmers. Too often students get hung up on the syntax and can’t progress onto really understanding the semantics, let alone the actual meaningful problem solving! With this approach the question of what a particular syntax means becomes a question the learner can answer themselves and, more often than not, the answer will be: “It’s just how this particular language writes the concept that I now understand!”.
Programming is writing instructions for a computer to follow. The instructions are encoded in a language you can understand, and that can be transformed into a form a computer can execute.
Programming is also solving problems. At least, that’s the purpose of programming! Even if the problem is just a personal itch you want to scratch like: “How can I back up my files automatically?”. Many problems have already been solved by existing programs. Backing up files is an example of this. We will still have to solve some simple problems, however, in order to learn some basics. I will keep the focus on problem solving so we don’t forget the bigger picture ;).
What is called a program can be a few things. It usually starts as a plain text file containing instructions written in a programming language. It usually then becomes the version a computer can execute. Let’s talk about that.
The code we write cannot be executed directly by the computer. It only understands one language, known as machine code, which we have no hope of reading or writing while maintaining our sanity. Machine code is just numbers, written in binary. Therefore what we write is human readable (just!), and then translated into what the computer can understand. There are two ways this is done: compilation, and interpretation.
A compiler is a computer program that takes the code we write and turns it into code that the computer can execute. If you like analogies, it’s like how a cook takes a recipe and turns it into food. Not literally of course. The cook uses the recipe as instructions for how to use other resources (ingredients) to make the food, and does so. After the cook is done the recipe still exists, and can be used again as many times as you like. This is just like using source code and compiling it into software. We call the code we write source code as it’s the original source of the eventual program that is created.
Once a program’s source code has been compiled, we say the output is an ‘executable’ as the machine can now execute it. Executables are also called binaries (as they are usually binary numbers), programs, or applications as they are usually applied to some problem to solve.
Sometimes executing a program is called running a program. Run and execute are synonymous in this context.
An interpreter does a similar job to a compiler. It takes source code, and turns it into a form the computer can execute. The only difference is that it does this while the program is being executed. There is no need for us to do the compiling before we want to run the program; it is done automatically, on the fly. The downsides to this are: programs that need an interpreter to run, need an interpreter to run; and they typically run slower than compiled programs. An interpreter is another program that must run on the computer, and the program that it runs has to rely on this extra ‘layer’ between it and the machine. That extra layer is why interpreted programs typically execute more slowly.
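To make the two steps (translate, then execute) a little more concrete, here is a small sketch in Python, one of the three languages we will use. It is only an illustration: Python's built-in compile and exec functions translate source text into Python's own internal form (bytecode), not into machine code, and the variable names are ones I have made up for this example.

```python
# A sketch only: compile() and exec() stand in for the "translate" and
# "execute" steps. Python translates to bytecode, not machine code.
source_code = "greeting = 'Hello, ' + 'World!'"

# Step 1: translate the source text into a form the interpreter can run.
code_object = compile(source_code, "<example>", "exec")

# Step 2: execute the translated form, collecting the names it defines.
namespace = {}
exec(code_object, namespace)

print(namespace["greeting"])
```

Normally the Python interpreter does both steps for us without being asked, which is exactly the "on the fly" behaviour described above.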
Now that you know all that theory, or can refer back to it at least, we can move on to some more practical information!
Step one is to choose a text editor. I advise you use whatever you have on your computer, unless it really sucks. I will assume you use free software and ensure the rest of this booklet works on the following operating systems:
On GNU/Linux you typically have a text editor included such as Gedit, Pluma, Kate, Leafpad, or Mousepad. If your system is command line only, you will have at least one of vi, vim, emacs, or nano. I will assume you know how to use these terminal based text editors.
Whichever text editor you have will be fine for writing programs, provided it is not a word processor. Don’t use an office suite’s word processor, or any web based word processors. They won’t work.
Each programming language is used to write source code. Source code is just text saved in a file. For this source code to be useful it must be executed somehow. Each language is executed in a different way, some very similar, but some very differently. For a program in a particular language to be executed, it requires its own execution environment. I will cover setting up each language’s environment with their introductions next.
The three programming languages we will be learning throughout this series of booklets are POSIX Shell, C, and Python. I have chosen these as they are available on all platforms and represent very different uses of programming languages. This will give you a good understanding of how to apply the basic concepts in different ways. We will also cover how these three languages are related, and a few ways to use them together.
C is a compiled language. This means once you have written a program in C, you must use a compiler to turn it into an executable.
The C compiler we will use is called cc. On Debian you should use GCC. GCC is an acronym for GNU Compiler Collection. It was originally just a C compiler, and the name an acronym for GNU C Compiler. On your system you must ensure the program cc is installed. Some operating systems include it by default, others require you to install it.
On Debian, it can be installed using the following command if you are logged in as the root user:
# apt install gcc
On FreeBSD cc is provided by default by the clang C compiler.
A shell is a type of interpreter as described above, in this case an interpreter specifically for an operating system. POSIX is an operating system standard that defines a Shell Command Language. This language standard is adhered to by many shells. The name shell means it is the ‘outer layer’ of an operating system. Modern operating systems typically have a graphical shell by default called a graphical user interface (GUI). They also usually have a text user interface (TUI), although some try to hide it. Text user interfaces are more commonly called command line interfaces. This is because the user types in commands, line by line, for the computer to execute.
Although graphical user interfaces are technically a type of shell, for simplicity, and because this is how most people refer to it, I will use the term “shell” from now on where I mean a POSIX compatible shell.
Reference to the POSIX standard
There are many shells that are POSIX compliant. Most shells offer their own extensions to the POSIX features. Here, we stick to POSIX, so any POSIX compliant shell will do.
Bash is an acronym for Bourne Again SHell. It is the name of a shell that adheres to the POSIX Shell Command Language standard. Bash is designed for Unix like operating systems, but it can also run on Windows using Cygwin. As this tutorial focuses on using GNU/Linux, you should have the bash package pre-installed, and I will use bash for the examples. Other operating systems come with POSIX compliant shells such as macOS with zsh, and FreeBSD with sh.
Python is another interpreted language. Unlike POSIX shell, it is not OS specific. The Python interpreter is designed to run on any OS. There are two major versions of Python: 2 and 3. They are very similar but not compatible. We will use 3, as it has been the only version supported by the developers since 1st January 2020.
Install Python 3 if it is not already installed. You can get it from your package manager, or from the Python website. On Debian the command is:
# apt install python3
On FreeBSD the command is:
# pkg install python3
We are going to start with the traditional first program which is boring but essential. It’s called Hello World. The reason we use this program is to cover the basics of getting the compiler or interpreter up and running, and making sure we can write and execute one of the most basic possible programs. If we can’t do this, we can’t really progress!
What this program will do is display the text “Hello, World!”. This is a form of data. There are many others that we will cover soon. For now, we will focus on text. Text like “Hello, World!” is called a string in computer terms. Computer programs are all just text so we need some way to distinguish the regular text of the program, and the text we want to keep as text when the program runs. To remember the name string, think of it as a string of beads on a necklace. It’s just a string of characters one after the other.
Which brings me on nicely to characters! Characters are any single letter, number, punctuation or other symbol. For example ‘a’ is a character, ‘!’ is also a character. We typically write characters and strings in single or double quotes. This depends on the language, however.
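If you would like to see the "string of beads" idea in action, here is a short Python sketch. We have not met Python properly yet, so treat it as a preview rather than something you need to understand fully. The variable names are my own choice for illustration.

```python
greeting = "Hello, World!"      # a string: characters one after the other
first_character = greeting[0]   # pick out the first single character
characters = list(greeting)     # split the string into its individual characters

print(len(greeting))            # how many characters are in the string
print(first_character)
print(characters[-1])           # the last character
```

Notice that the comma, the space and the exclamation mark each count as a character, just like the letters do.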
Let’s start with shell as we will use it to run C and Python. As I said, shell is an interpreted language. Like many interpreted languages, shell has a REPL, which stands for Read–Eval–Print Loop. This means we can type the program into the interpreter which will interpret it on the fly.
Open your terminal application. Most Linux distributions use bash by default. If you are not sure which shell you are using, start a POSIX shell by entering the following command:
$ sh
Now that we are sure to be using sh in your terminal, enter the following command:
$ echo 'Hello, World!'
As you should see, the string ‘Hello, World!’ has been displayed on the screen directly after the line where you entered the statement.
You may have noticed we used single quotes for this string. This is one way to denote a string in shell. Another way is using double quotes. Try this:
$ echo "Hello, World!"
It should work the same way for this string, however depending on the content of the string, it might not. When using single quotes in shell, exactly the characters that appear in the string are shown. This is called a literal string. However, if you use double quotes, some characters have special meaning and can change the string. This can be useful, but we will get to that shortly.
The third way to use strings in shell is this:
$ echo Hello, World!
If you enter that, you will see you get the same result. This is because the type of data the shell deals with by default is strings. Most other languages deal with numbers first and foremost, so need to distinguish strings using quotes. Shell however, assumes it will be working on strings, so we don’t always have to quote them. It is better to use quotes as standard though. This avoids the shell interpreting the content of strings when you didn’t intend that behaviour.
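To see why the unquoted version still works, it helps to look at how a command line is broken into words. The sketch below uses Python's standard shlex module, which follows POSIX-style quoting rules. It is an approximation of what a shell does, not the shell itself, and the variable names are only for illustration.

```python
import shlex

# shlex.split breaks a command line into words using POSIX-style quoting,
# similar to what a shell does before it runs the command.
unquoted = shlex.split("echo Hello, World!")
single_quoted = shlex.split("echo 'Hello, World!'")
double_quoted = shlex.split('echo "Hello, World!"')

print(unquoted)        # the command plus two separate words
print(single_quoted)   # the command plus one string
print(double_quoted)
```

Without quotes, echo receives two separate strings and prints them with a single space between them, so the result only looks the same. Put several spaces between the two words and the unquoted version will no longer match the quoted one.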
As you will see, there are many ways of doing the same thing in most programming languages. That is why it is important to focus on the what rather than the how. If you know what you want to do, the how is flexible, and changes depending on the language. The what is almost always the same however.
echo is a standard utility in shell. echo will do one thing as you have seen: echo some text back to the user. When I say ‘back to the user’ what I mean is, it is shown on the screen so that you (the user) can read it. Showing text to a user is an integral part of any shell. It is called standard output. Historically, standard output was printed on paper, as was everything a user typed in. The remnants of this can be seen in the more usual term for displaying text: ‘print’. Shell also has a print command called printf. The ‘f’ is short for format. This means it expects a format string, and potentially some other data to be formatted with it. We will go into more detail on this when we get to C. For now try:
$ printf "Hello, World!"
You might notice something different here. The result of using echo was that the string was printed on one line, by itself. The output of printf however is followed directly by the shell prompt, with not even a space. This is because echo automatically appends an invisible newline character that, you guessed it, puts any following text on a new line. We can write a newline character by, strangely, using two characters: a backslash followed by an ‘n’: \n. So if we append this to our string, there should be a new line before any following text. The double quoted string reveals its difference here though.
$ printf "Hello, World!\n"
If you enter this in Bash you will get an error message like this: bash: !\n: event not found. This is because the exclamation mark ! has special meaning. It is one of the shell’s reserved words, and Bash also uses it for history expansion, which is what the above error message is about. Bash still interprets ! when it appears in a string using double quotes. Try again with single quotes:
$ printf 'Hello, World!\n'
That should work as before. One caveat is that a string that uses either type of quotes cannot contain the same type of quotes. For example if we want to print the string ‘This isn’t going to work.’ it won’t work. You can try it and see:
$ echo 'This isn't going to work.'
You will kind of get stuck at a different prompt that usually looks like >. To get out of it, either enter another single quote character, or press Ctrl+C to cancel the command.
The reason that didn’t work is the shell can’t tell where the string begins and ends. Is the string ‘This isn’ or is it ‘t going to work.’? It can’t be both because a single quote delimits the beginning and end of a string. We either have a string that doesn’t begin, or one that doesn’t end. The shell will just recognise the first complete string and treat the rest as more text with the beginning of another string at the end. > is the shell telling you it’s waiting for more input, in this case the end of the string that was started with '.
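The same problem can be reproduced safely outside the shell. This sketch again uses Python's shlex module as a stand-in for the shell's word splitting, so it shows the idea rather than the shell's exact behaviour. The variable names are mine.

```python
import shlex

broken_command = "echo 'This isn't going to work.'"

try:
    shlex.split(broken_command)
    quotes_balanced = True
except ValueError as error:
    # Raised because the final quote opens a string that never ends.
    quotes_balanced = False
    print("Cannot split the command:", error)

# Wrapping the text in double quotes lets it contain a single quote.
fixed_words = shlex.split('echo "This isn\'t going to work."')
print(fixed_words)
```

Where the shell waits at the > prompt for the missing quote, shlex simply gives up with an error. Either way, the fix is the same: use the other kind of quote around a string that contains a quote character.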
printf fails in a slightly different way to echo. You can experiment with this but for now we will move on.
Python is interpreted, just like the POSIX shell language, but we must use the Python interpreter instead of a POSIX shell. One of the main uses for a shell is to run other programs. We will now use Bash to run the Python interpreter. At the time of writing, the Python 3 interpreter is usually called python3 and Python 2 is just called python. A few systems call Python 3 python and call Python 2 python2. Helpfully, Python will print its version number when it starts so we can be sure we’re in the right version. Most systems use python3 now, so try that first. If you enter python3 and get an error, try python.
In your terminal, enter the python command:
$ python3
You should see a few lines of text printed, including the version number which should begin with 3. The final line should be >>>. This is the Python prompt.
You enter commands into the Python interpreter just as you did for the shell. Try this:
>>> print('Hello, World!')
You should see the usual message printed to the terminal, then be returned to a new Python prompt.
As you will have noticed, the way we write the command in Python is slightly different from shell. Some definition of terminology here is required.
In Python, the way we print to standard output is to use a function. In shell, the same concept is usually referred to as a command, or utility. In fact common use is to call any single instruction in shell a command, whether it is a function or not.
Simply put: a function is a way of grouping instructions for the computer to execute.
In this example, the function print takes some data in the form of a string, and prints it to standard output. The way functions are used in Python is to enclose the data we give it in brackets. Brackets are also known as parentheses. Each piece of data we give to a function is called an argument. This is a term used in all programming languages so we will use it from here on.
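Here is a small sketch of print being given different arguments. To make the result easy to inspect, I capture standard output with io.StringIO and redirect_stdout from Python's standard library. That capturing is just scaffolding for the example, not something you need for normal use of print.

```python
import io
from contextlib import redirect_stdout

# Capture what print sends to standard output so we can inspect it.
captured = io.StringIO()
with redirect_stdout(captured):
    print('Hello, World!')            # one argument
    print('Hello,', 'World!')         # two arguments, joined by a space
    print('Hello, World!', end='')    # end='' leaves off the trailing newline

output = captured.getvalue()
print(repr(output))
```

The first two calls behave like echo in shell and finish with a newline. The third behaves more like printf did: nothing is added after the string.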
We will now jump back to shell to explore other ways to run programs in interpreted languages. To exit the Python interpreter enter another function:
>>> exit()
As you can see, this function takes no arguments as the brackets are empty. It is still a function however, which in Python’s syntax requires brackets. The exit function doesn’t need any arguments, as it will just exit the Python interpreter.
Now we’re back in the shell, we need to go over files and environments.
Bash, as a POSIX shell, is a full text user interface to your operating system. I’m assuming this is GNU/Linux, so what we will do is use the shell to explore our system and create some files.
To have Bash print exactly where we are in the file system, enter this command:
$ pwd
This is an acronym for Print Working Directory. You may be used to calling directories folders. They are one and the same. The original name is directory and that is the name most languages use. Your working directory is the directory you are currently working in. You should have seen output like this:
/home/name
Where name is your user name. You might be somewhere else, which is fine. pwd will show you where you are no matter what! In fact if you are ever unsure where you are in the file system, use pwd.
You can change your working directory with the cd command, an acronym for Change Directory. Try this:
$ cd ~
That squiggle is the tilde character. You probably don’t use it often, and will probably need shift to access it on your keyboard. You should now be in your home directory. That is, the directory with the name of your user name. You can of course check with pwd.
cd usually takes one argument, as above. It should be a directory name, either one relative to the current directory, or its full path from the root directory.
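The working directory is not just a shell idea. Every running program has one. As a sketch, the same steps look like this in Python, using functions from its standard os module. The variable names are my own.

```python
import os

start_directory = os.getcwd()              # like pwd
home_directory = os.path.expanduser("~")   # what ~ stands for

# Only change directory if the home directory really exists on this system.
if os.path.isdir(home_directory):
    os.chdir(home_directory)               # like cd ~
    print(os.getcwd())

os.chdir(os.sep)                           # like cd / on Unix-like systems
print(os.getcwd())

os.chdir(start_directory)                  # go back to where we started
```

The what is the same as in shell (ask where we are, move somewhere else). Only the how has changed.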
~ is a short way to refer to your home directory. On GNU/Linux it will be in another directory, called home. A bit confusing but gets the point across. Now that you’re in your home directory, pwd will show you where that is. / is used in two ways here. It represents the root of the file system, and is the separator between directories and files. The root of the file system is just the name of the directory that contains all other directories. We can change to the root directory with this:
$ cd /
Now you’re in the root directory, which you can see with pwd. The next shell utility is ls, which is short for list, somehow.
$ ls
You should now see the names of all the files and directories in your root directory. Change to the one called home with:
$ cd home
Now if you enter pwd you should see that you are in the home directory, which is in the root directory (denoted with /). From here, enter ls again. You should see at least a directory for each user account on your system, including yours. These are called the home directories. Your home directory is the one named with your user name. You can check your username with the command:
$ who am I
Yes that’s right, you’re asking your computer who you are. Don’t have an existential crisis though; your user name will be printed to standard output. Use cd to change to your home directory. Now we can create a file with the command touch.
$ touch hello.sh
This will create a new, empty text file called ‘hello.sh’. Touch is an old command with a weird name like most shell utilities. It is designed to ‘touch’ a file, leaving a trace that the file had been accessed or modified. POSIX compliant operating systems keep track of when files were last accessed or modified. touch by default just updates that time to the present. If however, the file named doesn’t exist, touch creates it. A roundabout way to create an empty file I know, but it’s the standard way to do it on a POSIX system! The catch here is that if you already have a file named ‘hello.sh’, you won’t create a new one. This is unlikely, but just in case, you can use a different file name.
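Both behaviours of touch (create the file if it is missing, otherwise leave its contents alone) can be sketched in Python with the touch method from the standard pathlib module. I work in a temporary directory here so that no real files are affected. The names scratch and script are just ones I picked for the example.

```python
import tempfile
from pathlib import Path

# Work in a throwaway directory so no real files are affected.
with tempfile.TemporaryDirectory() as scratch:
    script = Path(scratch) / "hello.sh"

    existed_before = script.exists()
    script.touch()                        # file is missing, so it is created
    size_after_create = script.stat().st_size

    script.write_text("echo 'Hello, World!'\n")
    script.touch()                        # file exists: contents are untouched
    size_after_second_touch = script.stat().st_size

print(existed_before, size_after_create, size_after_second_touch)
```

The first touch produces an empty file. The second leaves the text we wrote in place, which is why touching a file that already exists is harmless.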
We use the .sh file extension so that we know this is a shell script. This is common, but not required. It is also considered bad practice for finished scripts that you might distribute to others to use.
Now we should edit this file with your chosen editor. Usually you can just type the name of the editor in all lower case, and give it the file name as argument. For example if the editor is Gedit, enter:
$ gedit hello.sh
The editor should pop up, with the empty file open. Enter the following text:
#!/bin/sh
echo 'Hello, World!'
Now save, then close the editor. You should be back at your shell prompt. There might have been some output from your editor, ignore it if there is.
What we have created is called a shell script. A script is a file, containing instructions for an interpreter to execute. In this case, the interpreter is sh. GNU/Linux has a very good security feature called permissions. We currently don’t have permission to execute the script because this is the default for any new file. To give ourselves permission enter this command:
$ chmod u+x hello.sh
chmod is short for CHange MODe. Change mode means to change the modes of the permissions of a file. As you can see, chmod takes two arguments: u+x and the file name. u+x is the cryptic way to tell chmod to change the mode of the user’s permissions, to add the execute permission. user, add, execute: u, +, x.
Now we can execute the script by entering the following command:
$ ./hello.sh
You should see the familiar Hello, World! message printed to standard output.
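To show what u+x actually changes, here is a sketch that does the same thing from Python using os.chmod and the constants in the standard stat module. It assumes a Unix-like system, and it works on a temporary copy of the script rather than your real file.

```python
import os
import stat
import tempfile

with tempfile.TemporaryDirectory() as scratch:
    script_path = os.path.join(scratch, "hello.sh")
    with open(script_path, "w") as script:
        script.write("#!/bin/sh\necho 'Hello, World!'\n")

    mode_before = os.stat(script_path).st_mode
    # u+x: keep the existing permission bits and add "user may execute".
    os.chmod(script_path, stat.S_IMODE(mode_before) | stat.S_IXUSR)
    mode_after = os.stat(script_path).st_mode

# filemode shows the permissions in the same style as ls -l.
print(stat.filemode(mode_before), "->", stat.filemode(mode_after))
```

The only difference between the two printed modes should be an x appearing in the user's group of three characters. Everything else is left as it was, which is what the + in u+x means.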
Some explanation: The first line of the script #!/bin/sh is a special message to the shell. It’s telling the shell which interpreter to use to interpret the rest of the script.
The first two characters in this context are called the ‘shebang’. This naming may have come from a contraction of the characters’ names: “haSH”, and “bang”, an old name for “exclamation mark”.
The next part /bin/sh is a file path. It is telling the shell to look there for the right interpreter. Here we specify that it is sh we want to use to interpret the script. More on environments later. sh is the POSIX standard name of a compliant shell.
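As a rough sketch of how that first line is read, the Python function below picks the interpreter out of a script's text. The function name read_shebang is hypothetical, and splitting on spaces is a simplification of what a real system does. It is only meant to show the idea.

```python
def read_shebang(script_text):
    """Return the words after '#!' on the first line, or an empty list."""
    lines = script_text.splitlines()
    if lines and lines[0].startswith("#!"):
        # Everything after the two-character shebang names the interpreter.
        return lines[0][2:].split()
    return []

shell_script = "#!/bin/sh\necho 'Hello, World!'\n"
plain_text = "echo 'Hello, World!'\n"

print(read_shebang(shell_script))   # the interpreter's file path
print(read_shebang(plain_text))     # no shebang, so nothing is found
```

A file without a shebang gives an empty result, which is one reason the first line of a script matters so much.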
Now we’re going to write a script in Python. As we did with the shell script, let’s first create a file:
$ touch hello.py
As you can see we are using the .py file extension for Python. Similar to shell scripts, this is not required in general, but is quite common for personal scripts. We will use it so that we don’t get our different language files mixed up.
Again open the file with your text editor:
$ gedit hello.py
Now type in the following program:
#!/usr/bin/env python3
print('Hello, World!')
Similar to our shell script we start with the shebang. However the next part is different here. /usr/bin/env is the file path in this case. env is a POSIX standard utility. It runs a program in the current environment, searching the directories listed in that environment to find it. So env can find python3 wherever it is installed, regardless of the OS it runs on. We pass python3 as an argument to env so that env knows we want it to run python3. More on environments later. As with the shell script, save then close your editor. Exactly as with the shell script, we have to make this one executable:
$ chmod u+x hello.py
Now we can run it. This is done the same way as the shell script:
$ ./hello.py
You should see the usual message printed to the standard output.
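The search that env performs for python3 can be sketched with shutil.which from Python's standard library, which looks through the directories listed in the PATH environment variable. The wrapper name find_program is hypothetical, and the second program name is deliberately made up so that it will not be found.

```python
import shutil

def find_program(name):
    """Return the full path of a program found on PATH, or None."""
    return shutil.which(name)

# The location of python3 varies from system to system.
print(find_program("python3"))

# A made-up name that should not exist anywhere on PATH.
print(find_program("no-such-program-for-this-example"))
```

Because the location differs between systems, writing #!/usr/bin/env python3 lets the script find the interpreter wherever it happens to be installed, instead of fixing one path in the script.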
Now that we’ve done the Hello World program in Shell and Python, we will do the same in C.
As with the previous two, let’s first create an empty file:
$ touch hello.c
As you can see, a C program uses the file extension .c. This is standard in C and required for the source code files for C programs.
Open it with your text editor:
$ gedit hello.c
Now enter the following program:
#include <stdio.h>

int main()
{
    printf("Hello, World!\n");
}
We have some extra syntax here even beyond what was required for Python! C is a lower level language than Python or shell. This means it is closer to the machine’s language, and requires more detail. And remember, C is a language that must be compiled, rather than interpreted. Let me explain:
The first line #include <stdio.h> is an instruction to the compiler to include another source code file called stdio.h. stdio.h is a C header file (denoted by the .h file extension). Header files describe other functions that many programs need, so we don’t have to write them repeatedly for each of our own programs.
The third line begins with a function definition int main(). There are a few things going on here. First, int is a type of data. Short for integer, int specifies the type of data that should be returned by the function.
Functions are subsections of programs and so can be run repeatedly. Typically they take in some data, return some data, or have some side effect. Either of the input data, or output data can be nothing.
In some languages the type of data must be specified. C is one of those languages. Python and shell language have interpreters that can figure out the data types for you.
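As a quick sketch of the interpreter "figuring out the data types for you", here is what that looks like in Python. The variable names are only for illustration. Note that we never write int or str ourselves, yet Python still knows the difference.

```python
exit_status = 0                 # a whole number
greeting = 'Hello, World!'      # a string

print(type(exit_status))        # Python worked out that this is an int
print(type(greeting))           # and that this is a str

# The types still matter: adding a string to a number is an error,
# even though we never named either type.
try:
    greeting + exit_status
    mixed_ok = True
except TypeError:
    mixed_ok = False

print(mixed_ok)
```

So the types are there in every language. The difference is whether you have to write them down, as in C, or the interpreter works them out while the program runs.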
Did you save your C program? Make sure you do. Now that you know more about functions in general, let’s look more at the main function in C.
main is the name of the function. It is a special name in C as it is the one function that is required for every C program. As with the print function in Python, functions in C use brackets to surround the data that goes into a function. In this main function, the brackets are empty. This shows that this function will not take any arguments in this case.
Next we have a line with one character, an opening curly bracket {. This is used in C to denote the beginning of a block of code. Likewise on the last line is the matching closing curly bracket }. This denotes the end of the block of code. Blocks of code like this are just a sequence of instructions that are defined as a group. In our case, it groups just one statement. Grouping one thing seems kind of redundant, but in C functions of any length require curly brackets.
When writing programs in any language, functions are written in two ways: definition, and usage.
When a function is defined, we write out exactly how it should work. This includes what data it expects as argument (if any), what data it will return (if any), and what it is supposed to do.
When a function is used, we ‘call’ the function to come and do what we expect it to do. Think of a well trained dog. You call its name and you expect it to follow your instructions. Functions are like dogs that are trained to do one routine of tricks, and only one. All you have to do is call its name, and give it what it needs to perform its ‘tricks’, and off it goes. Each function will always do the same thing, given the same initial state and data. We get a function to do its thing by calling its name and passing it the required data as arguments.
In this Hello World program in C, we define a function called main and use a function called printf. printf is declared in the header file stdio.h (its definition lives in the C standard library), so we don’t have to define it ourselves. stdio is short for STandard Input and Output. Because we include the header file stdio.h in the first line, we will be able to use the function printf in our program.
The definition of any function in C has two main parts: the function header, and the function body.
The header is one line that consists of: the type of the data that will be returned, the function’s name, and in brackets the list of parameters it needs. Our main function will return data of type int, its name is main, and it lists no parameters because it needs none. Parameter is another name for the definition of an argument. Strictly speaking a parameter is the name used when defining a function, and argument is the name used when calling a function.
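The difference between defining and calling, and between a parameter and an argument, is the same in every language. Here is a sketch in Python, where it takes the fewest lines. The function make_greeting is one I have invented for this example.

```python
def make_greeting(name):            # definition: 'name' is a parameter
    """Return a greeting for the given name."""
    return 'Hello, ' + name + '!'   # the data the function returns

message = make_greeting('World')    # call: 'World' is the argument
print(message)

# The same function, given the same argument, gives the same result.
print(make_greeting('World') == message)
```

The first line plays the role of the function header in C, except that Python does not ask us to state the type of the returned data or of the parameter.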
The body is a block of code. Blocks of code are surrounded by curly brackets as we know, and contain any other code we want.
Each individual instruction in C is called a statement. Think of it like a sentence in English. In any programming language however, it is common (and sometimes required) to put each statement on its own line. Like in English though, there are certain grammatical rules. In the same way a sentence in English ends with a full stop, a statement in C ends with a semicolon. We can see this with the statement printf("Hello, World!\n"); in our main function’s body.
You will notice that the single statement we have defined in our main function is indented from the left margin. I have used 4 spaces here but other people use different numbers of space characters. Some people like to use tab instead of space. Whichever way you do it, when used in this way, these invisible characters are referred to as white space. This is because when code is printed on white paper (as it always used to be) the empty space left by these characters is white.
In C it is standard for people to indent code in a block. Later we will see other blocks of code indented in this way. It is not required for C code to be indented. In fact much of the white space is not required for the program to work, but it makes it easier for humans to read and understand. Just as in English, the same text can be written in one giant block, but it is easier for humans to read and understand if it is split into paragraphs and chapters.
You will recognise the string as we used a similar one in shell when we used the printf utility. The difference here is that strings in C always use double quotes. We don’t have to worry about the ! having special meaning in a string as it doesn’t in C. There are other parts of a string that can have special meaning in C which we will come to later.
When we wrote shell and Python code into files, we made them executable, then executed them from the command line. With C we have an extra step as it is a compiled language. You guessed it, we need to compile it. For this we use the compiler cc. Once you have saved and closed your editor, enter this:
$ cc hello.c
You will not see anything special displayed if everything went okay. If you see any errors, check you have typed the program exactly as shown above.
You should have a new file called a.out in your working directory. This is the output of the compiler and is our compiled Hello World program! The compiler will have made it executable by default, so we can run it with this:
$ ./a.out
As with the shell and Python versions of this, you should see the string “Hello, World!” printed on its own line to standard output.
Congratulations! You have now completed the traditional Hello World program in three languages! You should have learned a lot compared to many Hello World tutorials. We have covered compilers, interpreters, shells, function definitions and function calling, the string data type, creating and editing files, navigating the file system, and execute permissions, to name a few.