Posts Tagged programming basics

What In The Hell Are Conditionals

GOTO All What In The Hell Articles

A conditional is a mechanism for altering the control flow of a program based on the result of a test. That is a very technical and complicated description of something that is very simple in practice. Let’s look at a small example before continuing.

if temperature > 80:
  run_air_conditioning()

All humans are used to dealing with conditional statements, in fact we probably compute thousands of conditionals every day without even thinking about them. They’re so in-grained in our thinking that I didn’t even have to explain the example above, you just got it.

The most basic form of conditional, which is present in almost every language, is if/else. Each language has slightly different syntax which we’ll get to in a moment, first let’s look at some psudo-code that describes the basic workings of if/else.

if test:
  do_if_test_is_true()
else:
  do_if_test_is_false()

When our programming language comes across this block of code it will first evaluate test. If the evaluation of test equals true, the do_if_test_is_true() function will be called. If the evaluation of test equals false, the do_if_test_is_false() function is called.

Note that if test evaluates to true the do_if_test_is_false() function is never called. This behavior is called branching and a popular way of specifying control flow in a program.

if predicate:
  consequent
else:
  alternate

These are some common terms that refer to the parts of an if/else. Predicate refers to the expression or statement that is to be tested. Consequent refers to the action to be taken if the predicate is equal to true. Alternate refers to the action to be taken if the predicate is equal to false.

You’ve already seen examples from Python (even our psudo-code examples were valid Python), here’s some examples from other languages of if/else.

Javascript

if (2 > 1) 1
//===> 1

if (1 > 2) {
  1;}
else {
  2;}
//===> 2

Scheme

(if (> 2 1) 1)
;===> 1

(if (> 1 2)
    1
    2)
;===> 2

 

Else If

Another common pattern is if/else if/else. This pattern is equivalent to chaining if/else expressions together. Here’s some psudo-code of an if/else if/else in Python and then a diagram of how it is executed.

if temp > 80:
  run_ac()
elif temp < 60:
  run_heat()
else:
  do_nothing()

As you can see, Python first checks if the temperature is greater than 80. If the temperature is greater than 80, Python will call run_ac() and ignore the rest of the code. If the temperature is less than 80 it will call the elif branch and check if the temperature is less than 60. If the temperature is less than 60, Python will call run_heat() and if the temperature is between 60 and 80 Python will call do_nothing().

Instead of using the keyword elif, we could have written our Python code a different way that emphasizes the nesting.

if temp > 80:
  run_ac()
else:
  if temp < 60:
    run_heat()
  else:
    do_nothing()

Writing our program this way makes it clear that each ‘else if’ branch is just another if/else. However keywords like elif exist for a reason and if you are thinking about nesting your if/else expressions, don’t.

You’ve seen elif in Python, now I’ll show you how some other languages handle it.

 

Functions in Other Languages

Javascript

if (temp > 80) {
  1;}
else if (temp < 60) {
  2;}
else {
  3;}

Scheme

(cond ((> temp 80) (run-ac))
      ((< temp 60) (run-heat))
      (else (do-nothing)))

As you can see the syntax can vary quite considerably between languages. However the fundamental concept of the if/else is the same in most programming languages.

 

Switch

Switch is another form of conditional. Using the if/else if/else model above is sufficient for a small number of possible conditions. However as the number of conditions grows the limitations of if/else if/else start to show.

The main problem is that each condition must be checked every time the conditional is evaluated. If there were 100 conditions each one would have to be checked until one of them was true. If the condition that returns true is at the end of that list, it could take a lot of time to find it.

Switch statements are usually found in lower-level languages, however I’ll show you how to implement them in higher-level languages as well (should you need them). A switch in a lower-level language is often implemented as a branch table (or jump table) and can only be done on integers.

Switch in C:

switch(expression_returning_int) {
  case 1:
    printf("Expression returned 1.\n");
    break;
  case 2:
    printf("Expression returned 2.\n");
    /* since there is no break; here it will 'fall through' to default. */
  default:
    printf("Expression returned greater than 1.\n");
    break;
}

In the example above the expression expression_returning_int will be evaluated (it must return an integer). The value of the expression will then be used as an index into the branch table. Control is then transfered to the branch referred to by the expression or the default branch (if supplied).

In the example above a break statement was not supplied for the case 2. If expression_returning_int equals 2 it will print:

Expression returned 2.
Expression returned greater than 1.

The break statement normally stops the execution of that branch. When it is not present the control flow “falls through” to the next branch, and the next, and the next until it hits a break statement or the end of the switch. This is a common characteristic of switch statements, however it is not universal. A switch would still be a switch even if it didn’t allow execution to fall through branches.

 

Switch in Higher-Level Languages

Many high level languages do not provide a switch statement. However this is rarely a problem as they often do supply data structures which let us represent one easily.

A switch on integers can be done as an array lookup, if you want to learn more about arrays first go to What In The Hell Are Arrays.

If we think about the switch statement above it maps out a little bit like this…

You’ll notice a problem with the above representation. Where is the ‘default’ branch? The default branch would be executed by array_lookup if it did not find a value in the array. The implementation of this will be left as an exercise to the reader as the specifics vary from language to language.

Similarly a switch on strings, or other hash-able values, can be done as a hash lookup. If you want to learn more about hash tables go to What In The Hell Are Hash Tables.

 

Links

GOTO Table of Contents

, , , , , , , , , ,

No Comments

What In The Hell Are Hash Tables

GOTO All What In The Hell Articles

A hash table is a combination of two parts; an array, to store the data, and a hash function that computes the index of the array from a key. The best analogy to a hash table is a dictionary. When you use a dictionary you lookup a word (the key) and find the definition (the value).

Hash tables are an improvement over arrays and linked lists for large data sets. With a hash table you don’t have to search through every element in the collection (O(n) access time). Instead you pay a one time cost (the hash function) to retrieve data (O(1) access time).

 

The Hash Function

The purpose of the hash function is to compute an index from a key. That index is then used to store and retrieve data from the array. Many higher-level languages include some type of hash table and hash function.

As you can see from the diagram above the hash(“string”) function transforms a string into an integer which is then used as the index into the array. This index is used to store or retrieve data in the array.

The amount of time it takes to set or retrieve a value from a hash table is the time it takes to run the hash function plus the time it takes to access an element of the array. For small data sets this could take longer than simply iterating over each element of the array. However, since it is a fixed cost, for larger data sets using a hash table could be much faster.

Assuming we have a function set that takes a key to be hashed as its first argument and a value to be stored as its second argument, we can create the following hash table.

As you can see from the above diagram we store both the key and the value in the table. The reason for this is collisions, which will be explained below.

 

Collisions

A collision occurs when two different keys hash to the same value. Collisions will occur in even the best designed hash tables, however there are a few reasons why they can occur more frequently. The first reason is a poor hash function that does not distribute the indices evenly throughout the table, this will be explained in more detail below. The second reason is that the array is too small. If you have a 10 element array and you are trying to store 100 key/values in the hash table, there will obviously be a lot of collisions.

In the diagram above “Puff” hashed to the same value as “Peter” causing a collision. Both “Peter” and “Puff” are now stored in the same index of the array. A certain amount of collisions are unavoidable in a hash table, so we’ll need a way to deal with them.

The simplest way to deal with collisions is to make each element of the array a linked list. Then, when a collision occurs, we simply search through the linked list to find the data we want. This way the hash function gives us a ‘starting point’ to begin our search.

With a good hash function and an appropriately sized array collisions are a rare occurrence. However it is possible, though very unlikely, that all the key/values will be stored in one index and we will end up doing a search on a linked list.

 

What is a good hash function?

The sole job of the hash function is to return a unique array index for each key. This is not always possible and collisions can be expected to occur when the number of elements in the hash table approaches the square root of the size of the hash table’s array. A hash function that does not provide a uniform distribution of keys will cause a lot of collisions, possibly breaking our hash table’s performance down from O(1) to O(n). The only workaround to an ineffective hash function is to make the hash table’s array much larger than the number of elements that it holds, which obviously wastes space.

 

Hash Table Implementations

Python

Python’s equivalent of a hash table is called a dictionary or dict.

# Creating a dictionary
d = {'a':5, 'b':10}

# Adding key/value pairs
d['c'] = 15

# Get value by key
d['c']
#===> 15

Chicken Scheme

;; Creating a hash-table from an alist
(define d (alist->hash-table '((a . 5) (b . 10))))

;; Adding key/value pairs
(hash-table-set d 'c 15)

;; Get value by key
(hash-table-ref d 'c)
;===> 15

 

Performance

Indexing: O(1)
Insertion/Removal: O(1)
Search: O(1)

The performance listed above is what is normally found in specific hash table implementations. There are actually a number of different implementations, each with their own advantages, disadvantages and performance guarantees. Specific implementations will have to be presented in future WITH articles.

 

Links

GOTO Table of Contents

, , , , , , , , , , , , , , , , ,

No Comments

What In The Hell Are Strings

GOTO All What In The Hell Articles

Strings are a data type that consist of a sequence of characters. The simplest way to think about strings is that they are text. Strings are usually formed by “putting quotes around some text” or by using a string function.

Strings come in two varieties. Terminated strings use a special character to signal the end of the string. This character is not allowed to appear in the string, as it would cause the string to end, leaving out the rest. Strings can also be implemented using a length field. These strings do not need a terminating character as the length of the string is stored with the string. In many languages you will not be able to tell which type of string you are using and it is usually irrelevant. However some languages, like C, require you to place the termination character manually.

Strings can use a variety of encodings for the characters contained within. The two most common are ASCII, which can represent 128 different characters, and Unicode, which has support for over 100,000 different characters and allows strings to hold non-English characters. Both forms of encoding are popular today, however most applications are moving toward Unicode characters as the need for international software grows.

One common problem with strings is representing characters that would otherwise be interpreted by the language to mean something else. For instance the string “Have you read the book “1984″ by George Orwell?” would be interpreted as 3 expressions; (1) The string: “Have you read the book ” (2) The number: 1984 (3) The string: ” by George Orwell?”. This is obviously not what we want. In this case we would use an escape character to stop the quotes around “1984″ from terminating the string. Our new string would look something like this (this may vary from language to language): “Have you read the book “1984″ by George Orwell?”. In this string the ” character tells the language not to end the string when it sees the following “.

 

Terminology for Strings

Concatenation:
Joining two strings together to form one string is called concatenation.

Substring:
A substring is a part of a larger string. For example, in the string “Hello World” one possible substring would be “Hello”, and another could be “llo Wo”.

Literal:
A literal string is a string that appears directly in the source code. Strings formed by “putting quotes around them” are usually literal strings in a language. To build a regular string you must usually use a string function. In languages with mutable strings it is usually not possible to mutate a literal string, even where it is possible it is almost never a good idea. More information on string literals can be found here.

 

Code

C
Strings in C are represented as a null-terminated array of characters. To create a string you create an array, fill it with characters and place a null character directly after the last character in the string. If you are using string literals to create a string you do not have to supply the size or the null character, the compiler will do that for you.

/* Create a string */
char s[6] = {'H', 'e', 'l', 'l', 'o', '\0'}
char sl[] = "Hello World"

/* Escape Character */
char e[50] = "Have you read the book \"1984\" by George Orwell?"

Python
Strings in Python are enclosed in quotes. You can use one of three different quoting styles to create a string. Single quotes are useful when you have double quote characters in the string. Triple quotes allow you to place newlines in the string.

# Create a string
a = "Hello World"
b = 'Have you read "1984" by George Orwell?'
c = """Materials:
Pen
Paper"""
# Escape Character
d = "Have you read the book \"1984\" by George Orwell?"

Scheme
Strings in Scheme are enclosed in quotes or created with the ‘string’ function. Newlines may be embedded in any string.

;; Create a string
(define a "Hello World")
(define b (string #H #e #l #l #o))

;; Escape Character:
(define c "Have you read the book \"1984\" by George Orwell?")

 

Performance

The performance characteristics of a string will depend on the data type that is used to represent them. Most often arrays are used, however linked lists are somewhat common as well. Strings may also be either mutable or immutable which can have an impact on their algorithmic performance.

Because of these factors an accurate overview of the performance of strings cannot be given. However in the vast majority of cases strings will share the same characteristics as arrays.

In the case of terminated strings implemented as an array the indexing time can still be O(1) however supplying an index greater than the length will access memory outside the string and, hopefully, cause an error. You may need to determine the length first to prevent this, which would be an O(n) operation.

 

What the heck is: a String
A great article on strings, much more in-depth than this one. Incidentally where I got the inspiration for this series.

 

GOTO Table of Contents

, , , , , , , , , , , , , , , , , ,

No Comments