Lesson 9: Text and String Manipulation
Not all games involve pressing buttons to shoot enemies on the screen. Many all-time favorites, such as Hangman, Scrabble, crossword puzzles, Boggle, and Text Twist, to name just a few, are played by entering text. Luckily, QB64 has a rich set of commands that allow the programmer to manipulate text in many different ways.
The UCASE$ and LCASE$ Statements
When a user answers a program's question, for instance, "What is the capital of Ohio?", how will the user answer? Will the user type in the correct answer? Will the answer be in UPPER or lower case or a cOmbInaTion of both? How do you as the programmer check for all of these variations? The answer lies in turning the user's response into something that can easily be checked. Type the following program in and save it as UCASE.BAS before executing it.
As long as the user knows how to spell Columbus, the case of the response does not matter. The UCASE$ statement in line 14 returns the string parameter passed to it in all UPPER CASE. The LCASE$ statement does just the opposite, returning the string parameter passed to it in all lower case. Line 14 could have just as easily been:
IF LCASE$(Answer$) = "columbus" THEN
and the program would behave just as it did using UCASE$.
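Here is a minimal sketch of the kind of check described above. It is not the original UCASE.BAS listing, so its line numbers will not match the ones cited in the text, and the wording of the prompts is made up for illustration.

```basic
' Sketch only: a case-insensitive answer check (not the original UCASE.BAS).
PRINT "What is the capital of Ohio?"
LINE INPUT "> ", Answer$ '             read the whole response as typed
IF UCASE$(Answer$) = "COLUMBUS" THEN ' compare the upper case form only
    PRINT "Correct!"
ELSE
    PRINT "Sorry, that is not correct."
END IF
```

Typing columbus, COLUMBUS, or cOlUmBuS should all be accepted because the comparison is always made against the upper case form.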
The LTRIM$, RTRIM$, and _TRIM$ Statements
Users do funny things, like not entering responses the way we programmers expect. For your code to be effective you always have to take into account the different ways a user can enter a response the WRONG way. The example code above will fail if the user enters spaces (presses the space bar) either before or after the word Columbus. The spaces will be seen as legitimate characters typed in by the user.
There are three statements that will remove leading, trailing, and both leading and trailing spaces from a string. LTRIM$ removes leading, or left hand, spaces from a string. RTRIM$ removes trailing, or right hand, spaces from a string, and _TRIM$ removes both leading and trailing spaces from a string. If you change line 14 in the example code to:
IF _TRIM$(UCASE$(Answer$)) = "COLUMBUS" THEN
Answer$ will first be converted to upper case and then have both leading and trailing spaces removed from it. Yes, this new line of code looks a bit complicated because one statement is embedded in another. This is a very common thing to do in programming; it allows several actions to be applied to a value in a single line. Simply follow the order of operations to see how the line of code operates. UCASE$ falls inside _TRIM$'s parentheses, so the UCASE$ statement is acted upon first. Once UCASE$ has returned the upper case form of Answer$, it's _TRIM$'s turn to take that upper case string and remove the leading and trailing spaces. The string returned is now in upper case and has had its leading and trailing spaces removed.
You could even change the line of code to read:
Answer$ = _TRIM$(UCASE$(Answer$))
IF Answer$ = "COLUMBUS" THEN
to permanently modify Answer$ by making the changes and then placing those changes back into Answer$ itself. This is handy if you need to reference Answer$ again later in the code as it will save you from having to test for upper case and leading or trailing spaces again.
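The short sketch below shows the three trim statements side by side. The sample strings are made up, and square brackets are printed around each result only so the remaining spaces are visible.

```basic
' Sketch only: compare LTRIM$, RTRIM$, and _TRIM$ on a sample response.
Answer$ = "   coLumBus   " '               stray spaces and mixed case
PRINT "["; Answer$; "]" '                  the response as entered
PRINT "["; LTRIM$(Answer$); "]" '          leading spaces removed
PRINT "["; RTRIM$(Answer$); "]" '          trailing spaces removed
PRINT "["; _TRIM$(Answer$); "]" '          both removed
Answer$ = _TRIM$(UCASE$(Answer$)) '        clean up once and keep the result
IF Answer$ = "COLUMBUS" THEN PRINT "Correct!"
```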
The INSTR Statement
The previous examples work great for single word answers, but those darn users will always test the limits of your code. What if the user were to answer "I think it's Columbus"? Technically the answer has the correct response embedded in the string the user entered, but how do we see that in code?
The INSTR statement has the ability to search a base string with a defined search string and return a numeric value of where the search string was found within the base string. Once again let's modify the previous example code to see this in action. When you execute the program type your answer in as "I believe Columbus would be the answer" and see if you are correct.
Figure 1: This is how Skynet got its start!
INSTR requires a string to search, called the base string. In this case the base string is the upper case result of UCASE$, so the entire response entered is converted to upper case. The next parameter that INSTR requires is a string to search for, called the search string. In line 14 we supplied INSTR with a search string of "COLUMBUS". If the string "COLUMBUS" is found anywhere within the user's response, INSTR will return a numeric value of where it was found. If INSTR returns anything other than zero then "Correct!" will get printed, since the IF...THEN statement considers any numeric value other than zero as true.
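A minimal sketch of that search follows. It is not the original listing, and the prompt text is an assumption.

```basic
' Sketch only: look for the answer anywhere inside the response.
LINE INPUT "What is the capital of Ohio? ", Answer$
Found% = INSTR(UCASE$(Answer$), "COLUMBUS") ' 0 when not found
IF Found% THEN '                              any non-zero value is true
    PRINT "Correct! Found at position"; Found%
ELSE
    PRINT "Sorry, that is not correct."
END IF
```

Entering "I believe Columbus would be the answer" should report a match, because the search string appears somewhere inside the upper case response.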
There is one strange behavior of INSTR that needs to be pointed out. The following line of code will result in a value of 1 being returned.
Location% = INSTR("ABCDE", "") ' null string returns a value of 1
A search for a null string ( "" ) will always result in a positive value being returned (unless you search for a null string within a null string). This may sound like a bug but in actuality every string does contain a null string. You just need to be aware of this behavior if you start receiving a result you did not expect.
INSTR also has another trick up its sleeve. It can find multiple occurrences of the search string in the base string and report back where it finds all of them. Type in the following example to see how this works. Save the code as InstrDemo.BAS when finished.
Figure 2: The weather in Spain seems nice
The INSTR statement can accept an optional position parameter as seen in line 20 of the code.
Position% = INSTR(Position% + 1, Phrase$, Search$)
Each time an instance of "ain" is found, that position is recorded so that the next time around Position% + 1 can be used to resume the search through the remainder of the base string. This continues until INSTR returns zero, meaning no more matches were found, which ends the loop.
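The loop described above can be sketched as follows. This is not the original InstrDemo.BAS listing; the variable names are borrowed from the line of code shown in the text.

```basic
' Sketch only: report every position where Search$ occurs in Phrase$.
Phrase$ = "The rain in Spain falls mainly on the plain."
Search$ = "ain"
Position% = 0 '                                          start before the first character
DO
    Position% = INSTR(Position% + 1, Phrase$, Search$) ' resume just past the last match
    IF Position% THEN PRINT "Found at position"; Position%
LOOP UNTIL Position% = 0 '                               zero means no more matches
```

For this phrase the matches fall at positions 6, 15, 26, and 41.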
The STR$ and VAL Statements
There will be times when you need to convert a string to a numeric value and a numeric value to a string. The STR$ statement is used to convert a numeric value to a string. The VAL statement is used to convert a string to a numeric value.
LINE INPUT "Enter a number between 1 and 10 > ", Number$
Value! = VAL(Number$) ' convert string to a numeric value
The above example asks for a number but saves the response in a string variable. The VAL statement is used to convert Number$ into an actual numeric value, which is then saved into the single type variable Value!. If the characters in Number$ are non-numeric, such as "Hello World", VAL will simply return a numeric value of zero.
INPUT "Enter a number between 1 and 10 > ",Value%
Number$ = STR$(Value%) ' convert numeric value to a string
In this example the opposite is being performed. The numeric value contained within Value% is being converted to a string and then saved into Number$. Positive numeric values converted to strings will always contain a leading space. The space is there for the possibility of a negative value that includes a minus sign. For example:
PRINT "*"; STR$(10); "*" ' * 10* printed to the screen
PRINT "*"; STR$(-10); "*" ' *-10* printed to the screen
You'll need to use either LTRIM$ or _TRIM$ to remove the leading space if you do not wish it to be present.
Number$ = LTRIM$(STR$(Value%)) ' no leading space
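A few quick conversions, sketched with made-up values, show how the two statements behave:

```basic
' Sketch only: round trips between strings and numbers.
PRINT VAL("42") + 1 '                 the string is now a number that can be added to
PRINT VAL("Hello World") '            non-numeric text converts to 0
Score% = 250
PRINT "Score:" + STR$(Score%) '       STR$ supplies the leading space
PRINT "Score:" + LTRIM$(STR$(Score%)) ' leading space removed
```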
The LEN Statement
The LEN statement returns the number of characters contained in a string effectively reporting its length.
Phrase$ = "The rain in Spain falls mainly on the plain."
PRINT LEN(Phrase$) ' 44 is printed to the screen
The RIGHT$, LEFT$, and MID$ Statements
These statements are used to parse, or break apart, a string into smaller pieces. An example of this is when you use the TIME$ and DATE$ statements to retrieve the time and date from QB64. TIME$ delivers the time in the string form "HH:MM:SS" and DATE$ delivers the date in the string form "MM-DD-YYYY". In order to get the individual hours, minutes, and seconds from TIME$, and the individual month, day, and year from DATE$, you'll need to use the LEFT$, RIGHT$, and MID$ statements. Type the following example program into your IDE to show this. Save the code as ParseDemo.BAS when finished.
Figure 3: Time to make the donuts!
The LEFT$ statement is used to grab a predetermined number of characters starting at the left-hand side of the string. In line 30 of the program LEFT$ is used to parse just the month portion of DATE$.
Month% = VAL(LEFT$(DATE$, 2)) ' parse MM from MM-DD-YYYY
A statement within a statement within a statement! WooHoo, as I stated before this is a common occurrence in programming. Let's break it down.
DATE$ returns a string in the form of "MM-DD-YYYY". Then we plug that string into LEFT$:
LEFT$("MM-DD-YYYY", 2) ' get the first two left hand characters from MM-DD-YYYY
LEFT$ grabbed the first two characters of the string which equals the month. In our example this would be "04" or April. The VAL statement is now used to convert that string into an actual numeric value:
VAL("04") ' convert to a true numeric value
LEFT$ was also used in the same manner on line 33 to get the current hour from TIME$.
Hours% = VAL(LEFT$(TIME$, 2)) ' parse HH from HH:MM:SS
In line 31 the MID$ statement is used to parse out a string from within, or the middle, of DATE$:
Day% = VAL(MID$(DATE$, 4, 2)) ' parse DD from MM-DD-YYYY
MID$ starts at position 4 of the string and then parses 2 characters starting at that position.
And finally the RIGHT$ statement is used in line 32 to parse out the year from the right-hand side of DATE$:
Year% = VAL(RIGHT$(DATE$, 4)) ' parse YYYY from MM-DD-YYYY
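Putting the pieces together, a sketch along these lines (not the original ParseDemo.BAS listing) pulls all six values out of DATE$ and TIME$. Minutes% and Seconds% are names assumed here, following the pattern of the lines shown above.

```basic
' Sketch only: parse the date and time strings into numeric values.
Month% = VAL(LEFT$(DATE$, 2)) '      MM from MM-DD-YYYY
Day% = VAL(MID$(DATE$, 4, 2)) '      DD from MM-DD-YYYY
Year% = VAL(RIGHT$(DATE$, 4)) '      YYYY from MM-DD-YYYY
Hours% = VAL(LEFT$(TIME$, 2)) '      HH from HH:MM:SS
Minutes% = VAL(MID$(TIME$, 4, 2)) '  MM from HH:MM:SS
Seconds% = VAL(RIGHT$(TIME$, 2)) '   SS from HH:MM:SS
PRINT "Month:"; Month%; " Day:"; Day%; " Year:"; Year%
PRINT "Hours:"; Hours%; " Minutes:"; Minutes%; " Seconds:"; Seconds%
```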
The ASC and CHR$ Statements
The CHR$ statement returns the character that corresponds to a value from 0 to 255 within the ASCII table. Typically CHR$ is used to print characters that are not accessible through the keyboard. One example would be the playing card suits that are in the ASCII table.
PRINT CHR$(3) ' ♥ heart symbol
PRINT CHR$(4) ' ♦ diamond symbol
PRINT CHR$(5) ' ♣ club symbol
PRINT CHR$(6) ' ♠ spade symbol
The ASC statement does just the opposite and returns the ASCII numeric value of a character passed to it.
PRINT ASC("A") ' 65 printed to the screen
PRINT ASC(" ") ' 32 printed to the screen
This example program shows how the ASC and CHR$ statements can be used together to identify keystrokes. Save the code as CHR_ASC.BAS when finished typing it in.
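The sketch below is one way such a program might look. It is not the original CHR_ASC.BAS listing, and it assumes the INKEY$ and _LIMIT statements, which are not covered in this lesson.

```basic
' Sketch only: show the ASCII value of each key pressed until ESC is pressed.
PRINT "Press keys to see their ASCII values (ESC to quit)."
DO
    _LIMIT 30 '                       don't hog the CPU
    KeyPress$ = INKEY$ '              empty string when no key is waiting
    IF LEN(KeyPress$) = 1 THEN '      ignore "no key" and two-byte extended keys
        PRINT "Key: "; KeyPress$; "  ASCII value:"; ASC(KeyPress$)
    END IF
LOOP UNTIL KeyPress$ = CHR$(27) '     CHR$(27) is the ESC key
```

The LEN check matters because ASC cannot be given an empty string.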
The STRING$ and SPACE$ Statements
When you need a lot of the same character in a row the STRING$ statement has you covered. Simply supply STRING$ with a numeric value and a character like so:
PRINT STRING$(80, "*") ' 80 asterisks will be printed to the screen
You can also supply STRING$ with the ASCII numeric value of a character.
PRINT STRING$(80, 42) ' 80 asterisks will be printed to the screen
The STRING$ statement comes in handy when building text screen boxes by using the extended ASCII characters provided to do so. Type in the following example and save it as ASCIIBoxes.BAS when complete.
Figure 4: Old school ASCII boxes
The SPACE$ statement is used to generate spaces of a requested length.
PRINT SPACE$(80) ' 80 spaces printed to the screen
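As a small sketch of the idea (not the original ASCIIBoxes.BAS listing), STRING$ and SPACE$ can be combined with the double line characters of the extended ASCII table to draw a box. BoxWidth% is a name made up for this example.

```basic
' Sketch only: draw a double line box three rows tall.
BoxWidth% = 30 '                                             inside width of the box
PRINT CHR$(201); STRING$(BoxWidth%, 205); CHR$(187) '        top left, top edge, top right
PRINT CHR$(186); SPACE$(BoxWidth%); CHR$(186) '              left side, blank middle, right side
PRINT CHR$(200); STRING$(BoxWidth%, 205); CHR$(188) '        bottom left, bottom edge, bottom right
```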
The SWAP Statement
The SWAP statement is used to switch the values of two numeric or string variables. The variables to be swapped must be of the same type. The following example shows the SWAP statement in action. Save the code as SwapDemo.BAS when completed.
Figure 5: Variable swapping
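A minimal sketch of swapping (not the original SwapDemo.BAS listing, with made-up variable names and values):

```basic
' Sketch only: swap two strings, then two integers.
First$ = "apple"
Second$ = "orange"
PRINT "Before: "; First$; " "; Second$
SWAP First$, Second$ '            both variables are strings
PRINT "After : "; First$; " "; Second$
Low% = 10
High% = 5
IF Low% > High% THEN SWAP Low%, High% ' put the pair in order
PRINT "Low:"; Low%; " High:"; High%
```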
The LOCATE, CSRLIN, and POS Statements
The LOCATE, CSRLIN, and POS statements are used to set and retrieve the position of the text cursor on the screen. The following example code shows how to use the LOCATE command. Save the code as LocateDemo.BAS when finished typing it in.
Figure 6: Using LOCATE to position the text cursor
As you move the mouse cursor around the screen its position is printed at the bottom of the screen. When you click the left mouse button a smiley character appears where the mouse cursor was. The LOCATE statement that would be used to position the text cursor where the smiley is printed is shown at the bottom of the screen. Did you notice something about the numbers that were displayed at the bottom? If you look closely they are reversed. That's because the LOCATE statement requires the text row first (the y coordinate) and then the column (the x coordinate). This is backwards from other coordinate oriented commands:
LOCATE row%, column%
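A tiny sketch makes the row-first ordering easy to see. The positions are arbitrary and assume the default 25 row by 80 column text screen.

```basic
' Sketch only: LOCATE takes the row first, then the column.
CLS
LOCATE 10, 30 '                   row 10 (y), column 30 (x)
PRINT "Row 10, column 30"
LOCATE 12, 5 '                    row 12 (y), column 5 (x)
PRINT "Row 12, column 5"
```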
The LOCATE statement can also be used to manipulate the text cursor in other ways. You can use LOCATE to hide or show the flashing cursor as well as control its size and shape with optional parameters:
LOCATE row%, column%, cursor%, cursorstart%, cursorstop%
cursor% can be set to 0 to turn the text cursor off or 1 to turn the text cursor on.
The text cursor is made up of 31 scan lines and cursorstart% and cursorstop% can be used to control which of those scan lines are used to change the shape of the text cursor. Both of these optional parameters can accept a value between 0 and 31.
The CSRLIN statement is used to retrieve the current text row the cursor resides in (the y coordinate) and POS is used to retrieve the current text column the cursor resides at (the x coordinate). Here is another demonstration of these two commands being used together. Save the code as CsrlinPosDemo.BAS when finished.
Figure 7: Using CSRLIN and POS to save the text cursor location
The above code is fairly straightforward, however there is one thing to point out. After the string of asterisks has been printed in line 22 there is a semicolon ( ; ) at the end of the PRINT command. This is to keep the text cursor from moving to the beginning of the next line. Remember that a semicolon used within a PRINT statement tells the text cursor to stay where it is and print whatever follows right after. It's also a handy way to keep the text cursor where you want it, as is the case with this code.
The POS statement curiously requires a value within parentheses after it:
Column% = POS(0) ' get current column position of text cursor
However, the value of this number means nothing and 0 is usually placed there. It can be any integer value you wish, but again it means nothing.
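The idea of saving and restoring the text cursor can be sketched like this. It is not the original CsrlinPosDemo.BAS listing, and the row used for the message is arbitrary.

```basic
' Sketch only: remember where the cursor is, print elsewhere, then return.
CLS
PRINT "**********"; '                 the semicolon keeps the cursor on this line
Row% = CSRLIN '                       save the current row
Column% = POS(0) '                    save the current column
LOCATE 20, 1 '                        jump somewhere else on the screen
PRINT "The cursor was at row"; Row%; "and column"; Column%
LOCATE Row%, Column% '                go back to the saved position
PRINT " <- printing resumes here"
```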
The Semicolon (;), Comma (,) and TAB Statement
The LOCATE, CSRLIN, and POS statements are great for precise control of the text cursor anywhere on the screen. If all you need is a little text cursor control on a single line, however, then the semicolon ( ; ), comma ( , ), and TAB statement are what you need.
The semicolon tells the PRINT command to leave the text cursor at the end of the PRINT statement instead of the default behavior of moving the cursor to the next line (also known as a CR/LF or Carriage Return/Line Feed). The following lines of code print two literal strings on separate lines:
PRINT "Hello there."
PRINT "My name is Bob."
No mystery there as this is the expected behavior. However, add a semicolon ( ; ) after the first PRINT statement and things change:
PRINT "Hello there.";
PRINT "My name is Bob."
Even though two PRINT statements were used they both printed to the same line. The semicolon ( ; ) told the first PRINT statement to leave the text cursor alone and let it remain where it is. The second PRINT statement uses that text cursor position to print its string of information. This can be used to create complex lines of information with a single PRINT statement:
FName$ = "Bob"
Age% = 28
Job$ = "programmer"
PRINT "Hello there. My name is ";FName$;". I'm";Age%;"years old and I'm a ";Job$;"."
The comma ( , ) is used in the same manner as a semicolon ( ; ) but instead of leaving the text cursor at the end of the line it moves it to the next tab position on the current line. All screens have hidden tab points on them spaced 15 characters apart. This is a throwback from the function of typewriters that would move the platen over to the next tab position by pressing the TAB key. This was a quick way for a typist to either move the platen quickly to the left or line up fields of information easily. The next lines of code show this behavior:
The first two FOR ... NEXT loops lined up the output in nice neat columns spaced 15 characters apart. However, you can only have up to 13 characters in a column. If you place more than 13 characters in a column the next column is skipped in favor of the one after that as seen with the last FOR ... NEXT behavior.
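A short sketch of comma-separated output follows. It is not the original listing, and the values printed are made up.

```basic
' Sketch only: commas line values up at the built-in tab positions.
FOR Count% = 1 TO 5
    PRINT Count%, Count% * 10, Count% * 100 ' three neat columns
NEXT Count%
PRINT "Short", "lines up"
PRINT "This one is too long", "lands one column further over"
```

The last line shows a value that is too wide for its column pushing the next value over to a later tab position.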
If you need precise control of tab positions on a line then the TAB statement is needed. Instead of the default 15 characters you can control where the next tab position occurs.
The first FOR ... NEXT loop lines the text up in 8 columns tabbed 10 spaces apart as expected. The TAB(count% * 10); at the end of each line within the loop moved the cursor to that calculated tab position.
However, if the next TAB position is not available because it has already been used the TAB statement will use that position on the following line. This can lead to unexpected results as seen in the second FOR ... NEXT loop.
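Here is a minimal sketch of TAB in use (not the original listing). It assumes an 80 column text screen.

```basic
' Sketch only: place a marker every 10 columns on one line.
FOR Count% = 1 TO 7
    PRINT TAB(Count% * 10); "*"; ' move to column 10, 20, 30 ... then print
NEXT Count%
PRINT '                            finish the line
PRINT "Name"; TAB(20); "Score"; TAB(40); "Level"
```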
The COLOR Statement
The COLOR statement is used to change the background and foreground color of text to be printed to the screen. The COLOR statement takes two parameters:
COLOR foreground&, background&
The foreground& is the color of the text and the background& is the color strip contained behind the text. Now that you can manipulate strings of data it's time to give those strings a dash of color. Type in the example code below and save it as ColorDemo.BAS when finished.
When the above code is executed the column of the text on the right is blinking. Foreground color values can be in the ranges of:
0 through 15 for standard non-blinking colors
16 through 31 for standard blinking colors (simply add 16 to the color value to make text blink)
Background color values can range from 0 to 7 for a total of eight. The standard SCREEN 0 text and background colors are:
 0 - BLACK          8 - DARK GRAY
 1 - BLUE           9 - LIGHT BLUE
 2 - GREEN         10 - LIGHT GREEN
 3 - CYAN          11 - LIGHT CYAN
 4 - RED           12 - LIGHT RED
 5 - MAGENTA       13 - LIGHT MAGENTA
 6 - BROWN         14 - YELLOW
 7 - LIGHT GRAY    15 - WHITE
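The standard SCREEN 0 colors can be listed with a short loop. This is only a sketch, not the original ColorDemo.BAS listing.

```basic
' Sketch only: print each foreground color, steady and blinking.
FOR Fg% = 0 TO 15
    COLOR Fg%, 0 '                        steady text on a black background
    PRINT "Color"; Fg%;
    COLOR Fg% + 16, 0 '                   add 16 to make the same color blink
    PRINT TAB(20); "Blinking"; Fg% + 16
NEXT Fg%
COLOR 7, 0 '                              back to light gray on black
```

The first row is black text on a black background, so it will not be visible.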
Legacy SCREENs for the most part have these color limitations, however 32-bit screens do not. Type in the following code and save it as 32bitColorDemo.BAS when finished.
Figure 8: Groovy
As Figure 8 shows, you can pass any _RGB32 value to COLOR to get the full spectrum of colors in a 32-bit screen, as seen in line 30 of the code above.
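A minimal sketch of 32-bit text colors follows. It is not the original 32bitColorDemo.BAS listing; the screen size and the color values are arbitrary choices.

```basic
' Sketch only: use _RGB32 values for the text and background colors.
SCREEN _NEWIMAGE(640, 480, 32) '                      a 32-bit graphics screen
COLOR _RGB32(255, 255, 0), _RGB32(0, 0, 128) '        yellow text on a dark blue strip
PRINT "Yellow on dark blue"
COLOR _RGB32(255, 128, 0), _RGB32(0, 0, 0) '          orange text on black
PRINT "Orange on black"
FOR Shade% = 0 TO 255 STEP 51 '                       six shades of green
    COLOR _RGB32(0, Shade%, 0), _RGB32(64, 64, 64)
    PRINT "Green level"; Shade%
NEXT Shade%
```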
The PRINT USING Statement
The PRINT USING statement is used to print text in a predetermined format by supplying a template to use. This is a very powerful statement and one that too often gets overlooked. To understand how it works type in the example code below and save it as UsingDemo.BAS when finished.
Figure 9: Super hero roster
Where are all of the string manipulation commands to create the chart seen in Figure 9 above? There are not even any uses of semicolons, commas, or the TAB statement to get this output. That's the power of the PRINT USING command.
The reason this command is so often overlooked is that it was created to aid in formatting text on wide carriage printers that contained 132 columns and on text-based monochrome terminals back in the very early days of computing. Most of the old-school languages, such as COBOL, had statements like this too. When those went out of style programmers simply left this command behind. But as you can see this is an extremely useful tool at your disposal.
The secret lies in line 20 of the code. A string variable named Format$ contains a pre-made template on how to format data given to it. The "\ \" fields within the template mean to keep a string value within this area. The "####" field at the end means to keep a numeric value within this area.
Format$ = " \ \ \ \ \ \ \ \ \ \ #### "
Format$ has been set up to receive 5 strings and a numeric value. When the first string is passed in it's placed in the first "\ \" field, the second string in the second "\ \" field, and so on. The last field expects a numeric value to format within its "####" area. This is done in line 27 of the code.
PRINT USING Format$; fn$; ln$; afn$; aln$; ch$; dj% ' print formatted data fields
The five strings passed in are placed in the "\ \" fields and the integer is placed in the "####" field automatically. This would be mind-numbing if you had to do it with string manipulation commands such as LEFT$, RIGHT$, and STR$ and then concatenate the formatted strings together. Using semicolons, commas, and TABs would be a pain too because you would always need to account for the length of the individual values so as not to mess the column alignment up.
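A scaled down sketch of the same idea uses two string fields and one numeric field. The template, names, and ages here are made up for illustration and are not the ones from the original UsingDemo.BAS listing.

```basic
' Sketch only: a template with two 10 character string fields and a 4 digit number.
' A string field is as wide as the two backslashes plus the spaces between them.
Template$ = "\        \ \        \ ####"
PRINT USING Template$; "Bob"; "Smith"; 28
PRINT USING Template$; "Alexandria"; "Jones"; 104
PRINT USING Template$; "Li"; "Wu"; 7
```

Each row lines up in the same columns no matter how long the individual values are, as long as they fit within their fields.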
There is a table of all the available formatting characters to be used with PRINT USING in the QB64 Wiki. The best way to get familiar with the PRINT USING statement is to play around with the various formatting characters. For instance, adding a dollar sign to a value to make it a monetary value and then printing it could be done like this:
Payout! = 100 ' pay day!
po$ = "$" + _TRIM$(STR$(Payout!)) ' insert $ at beginning
decimal% = INSTR(po$, ".") ' get position of the decimal point
IF decimal% = 0 THEN ' was there a decimal point?
po$ = po$ + ".00" ' no, add .00 to end
ELSE ' yes, a decimal point exists
po$ = po$ +"00" ' append 00 to end to assure at least 2 places
po$ = LEFT$(po$, decimal%) + MID$(po$, decimal% + 1, 2) ' build monetary string
END IF
PRINT po$ ' phew! that was a lot of work
The above code ensures that no matter what the value in Payout! is (100, 100.1, 100.99, etc.) a decimal point is taken into account and the number of places in the decimal area is always 2. Or you could simply do this:
PRINT USING "$###.##"; Payout! ' easy peasy!
No string manipulation needed and no worrying about the decimal point. Again, very powerful!
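Running the same template against the three values mentioned above is a quick way to see this:

```basic
' Sketch only: the same template formats every value to two decimal places.
PRINT USING "$###.##"; 100 '      $100.00
PRINT USING "$###.##"; 100.1 '    $100.10
PRINT USING "$###.##"; 100.99 '   $100.99
```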
A String Manipulation Demo
Here is a sample program that uses string manipulation to create a hidden password function. Save the code as HiddenPassword.BAS when finished typing it in.
Figure 10: Don't peek!
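One possible shape for such a function is sketched below. It is not the original HiddenPassword.BAS listing, it assumes the INKEY$, _LIMIT, and SELECT CASE statements, which are not covered in this lesson, and it assumes the password is short enough to stay on one screen line.

```basic
' Sketch only: read a password while printing an asterisk for each character.
Password$ = ""
PRINT "Enter password: ";
DO
    _LIMIT 30 '                                   don't hog the CPU
    KeyPress$ = INKEY$
    IF LEN(KeyPress$) = 1 THEN '                  ignore "no key" and extended keys
        SELECT CASE ASC(KeyPress$)
            CASE 8 '                              backspace
                IF LEN(Password$) > 0 THEN
                    Password$ = LEFT$(Password$, LEN(Password$) - 1) ' drop last character
                    LOCATE CSRLIN, POS(0) - 1 '   step back over the last asterisk
                    PRINT " "; '                  erase it
                    LOCATE CSRLIN, POS(0) - 1 '   step back again
                END IF
            CASE 32 TO 126 '                      printable characters
                Password$ = Password$ + KeyPress$
                PRINT "*";
        END SELECT
    END IF
LOOP UNTIL KeyPress$ = CHR$(13) '                 ENTER ends the entry
PRINT
PRINT "You typed: "; Password$
```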
Your Turn
Create a program that scans a user supplied sentence and counts the individual letters found within the sentence. Figure 11 below shows how the program should execute.
Figure 11: Counting letters
- Each letter in the sentence should be counted regardless of being upper or lower case.
- No graphics screen is needed. The program should run in a standard text screen.
- Save the program as LetterCount.BAS when finished.