General Resources

Getting Started with Unicode

This guide will introduce the concepts behind Unicode and how it can be used for writing Punjabi in Gurmukhi. It is designed for people who are familiar with Gurmukhi and have used Gurmukhi on a computer before.

Getting started

To get started you need to check that your computer supports Unicode and install the appropriate software to handle Gurmukhi. This is a simple process if your operating system has this ability.

Windows XP and newer versions of Windows fully support Unicode Gurmukhi. Older versions of Windows (95, 98, ME) do not. If you have Windows 98 or ME, you can download Internet Explorer 6.0, which will allow you to view Unicode Gurmukhi web pages, but you will not be able to type in Unicode Gurmukhi.

Details on how to enable Unicode Gurmukhi on your computer, depending on your operating system, are given in the Enabling Punjabi Support section.

After you have enabled Gurmukhi support on your computer, you should install some Unicode Gurmukhi fonts if you have not got any already. A selection of free fonts is available. We recommend you install both Saab and AnmolUni.

Finally, you should get to grips with your keyboard layout. If you are unhappy with it, you can download a different one or use the Microsoft Keyboard Layout Creator to make your own. The same keyboard layout can be used for all Unicode Gurmukhi fonts, and changing the layout does not affect the fonts you are using.

Unicode

Unicode is designed as a character set that allows virtually all written scripts in the world to be used on a computer. It is a monumental step forward and allows non-English languages to be used universally on computer systems.

“Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language.”

How Unicode differs from fonts

Historically, Gurmukhi has been represented using vast numbers of proprietary fonts – each with their own encoding method and keyboard layout. They worked by changing the appearance of Latin text characters so that they formed Gurmukhi. For example, with AnmolLipi the Latin capital letter ‘A’ represents Era.

AnmolLipi Era

Using a different font would corrupt the Gurmukhi text and make it unreadable. For example, DrChatrikWeb shows the same letter as Ura.

DrChatrikWeb Ura

Unicode does away with this ambiguity. Instead, it represents Gurmukhi with its own characters and does not use Latin text. This is standardized on all computers so no matter what font you use, the text will be the same!

Unicode also separates encoding from input. This means that you can have any type of keyboard layout that you want and the underlying text will always be the same.
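
The difference between a font-based encoding and Unicode can be seen by looking at what is actually stored. The following is a minimal sketch in Python; the helper name describe is ours, and the only facts assumed beyond the text above are the standard code points U+0041 (Latin capital A) and U+0A05 (Gurmukhi Era).

```python
import unicodedata

# In a legacy font such as AnmolLipi, Era is stored as the Latin letter 'A';
# only the font makes it look like Gurmukhi.
legacy_era = "A"

# In Unicode, Era has its own code point, U+0A05.
unicode_era = "\u0A05"


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


print(describe(legacy_era))   # the stored character is still Latin
print(describe(unicode_era))  # the stored character is Gurmukhi
```

Whatever font is applied, the first value remains a Latin letter and the second remains a Gurmukhi letter.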

What you should know

There are peculiarities involved with using Unicode Gurmukhi that one should be aware of.

The concept of independent vowels is unusual in Gurmukhi. Unlike other Indian scripts, Gurmukhi constructs independent vowels by using a combination of dependent vowel signs with Ura, Era and Iri. In keeping with other Indian scripts such as Devanagari, Unicode encodes these independent vowels separately. Thus when you wish to type Iri and Bihari you must use the pre-composed Iri Bihari character.

Conjuncts such as Paireen Haha and Paireen Rara are created using a Devanagari-like Halant. Thus, if you wish to type ‘ਪ੍ਰ’ you would enter Puppa (ਪ), Halant (੍) and finally Rara (ਰ).
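
As a rough illustration of the Halant sequence just described, the sketch below builds a conjunct from its parts. The function name subjoin is hypothetical; the code points are the standard Unicode values for Halant (U+0A4D), Puppa (U+0A2A), Rara (U+0A30), Mumma (U+0A2E) and Haha (U+0A39).

```python
HALANT = "\u0A4D"  # named GURMUKHI SIGN VIRAMA in the Unicode standard

PUPPA, RARA, MUMMA, HAHA = "\u0A2A", "\u0A30", "\u0A2E", "\u0A39"


def subjoin(base: str, foot: str) -> str:
    """Return base consonant + Halant + the consonant to show in Paireen form."""
    return base + HALANT + foot


pra = subjoin(PUPPA, RARA)
mha = subjoin(MUMMA, HAHA)

# Each conjunct is stored as three code points, even though a font
# that supports Gurmukhi draws it as a single cluster.
print(pra, len(pra))
print(mha, len(mha))
```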

The above two issues can be addressed using different keyboard layouts. For example, you could have a keyboard layout that converted the individual key combinations of Iri and Bihari to the pre-composed Iri Bihari character. You could also have a key that contained both a Halant and a Rara/Haha so that when it is pressed it would automatically show the Paireen form. Some keyboards may contain these features and you are free to use whichever you find more comfortable.
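
The kind of conversion such a keyboard layout might perform can be sketched as a simple lookup. This is only an illustration, not how any particular layout is implemented: the function name and table name are ours, and the pairings follow the Unicode code chart for Gurmukhi (Iri is U+0A72, Ura is U+0A73, Era is U+0A05).

```python
IRI, URA, ERA = "\u0A72", "\u0A73", "\u0A05"

# (vowel bearer + dependent vowel sign) -> pre-composed independent vowel
PRECOMPOSED = {
    ERA + "\u0A3E": "\u0A06",  # Era + Kanna     -> aa
    IRI + "\u0A3F": "\u0A07",  # Iri + Sihari    -> i
    IRI + "\u0A40": "\u0A08",  # Iri + Bihari    -> ii
    URA + "\u0A41": "\u0A09",  # Ura + Onkar     -> u
    URA + "\u0A42": "\u0A0A",  # Ura + Dulankar  -> uu
    IRI + "\u0A47": "\u0A0F",  # Iri + Lavan     -> ee
    ERA + "\u0A48": "\u0A10",  # Era + Dulavan   -> ai
    URA + "\u0A4B": "\u0A13",  # Ura + Hora      -> oo
    ERA + "\u0A4C": "\u0A14",  # Era + Kanaura   -> au
}


def compose_independent_vowels(text: str) -> str:
    """Replace bearer + vowel sign pairs with the pre-composed character."""
    for sequence, vowel in PRECOMPOSED.items():
        text = text.replace(sequence, vowel)
    return text


typed = IRI + "\u0A40"  # Iri followed by Bihari, as typed key by key
print(compose_independent_vowels(typed) == "\u0A08")
```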

Characters in Unicode are entered in logical order – that is the order that they are pronounced and not how they are written. Therefore if you wish to type a syllable with Sihari in it, you must type the Sihari after the character it applies to. The computer will then reposition this to the left. For example, to type ‘ਵਿਚ’ you would enter Vava (ਵ), Sihari (ਿ) and then Chucha (ਚ).
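
Logical order can be checked by listing the stored characters of the example word. A minimal sketch follows; code_points is a name chosen here for illustration.

```python
import unicodedata


def code_points(text: str) -> list[tuple[str, str]]:
    """List each stored character as (code point, Unicode name), in order."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


vich = "\u0A35\u0A3F\u0A1A"  # Vava, Sihari, Chucha

for code, name in code_points(vich):
    print(code, name)
```

The vowel sign is stored second, after Vava, even though it is drawn to the left of it.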

The software that displays Unicode text strongly enforces the basic rules of Gurmukhi. Character combinations that are invalid in modern Gurmukhi will therefore not display as joined text. For example, you cannot attach more than one vowel sign to a consonant. If you attempt to do this, a dotted circle will appear next to the second vowel sign, which prevents it from joining onto the consonant.
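
A program that accepts Gurmukhi input could look for this particular invalid combination itself. The sketch below is an assumption about how one might do so, not part of any standard API: it simply reports positions where one dependent vowel sign directly follows another.

```python
# Dependent vowel signs: Kanna, Sihari, Bihari, Onkar, Dulankar,
# Lavan, Dulavan, Hora and Kanaura.
VOWEL_SIGNS = set("\u0A3E\u0A3F\u0A40\u0A41\u0A42\u0A47\u0A48\u0A4B\u0A4C")


def find_stacked_vowel_signs(text: str) -> list[int]:
    """Return the index of every vowel sign that directly follows another."""
    return [
        i
        for i in range(1, len(text))
        if text[i] in VOWEL_SIGNS and text[i - 1] in VOWEL_SIGNS
    ]


valid = "\u0A35\u0A3F\u0A1A"    # Vava, Sihari, Chucha
invalid = "\u0A15\u0A3F\u0A40"  # Kukka with both Sihari and Bihari

print(find_stacked_vowel_signs(valid))
print(find_stacked_vowel_signs(invalid))
```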

An Introduction to Gurmukhi

Gurmukhi, a derivative of Landa, is a type of script called an abugida. It was standardized by Guru Angad Dev in the sixteenth century and is designed to write the Punjabi language.

This guide introduces the main concepts of the Gurmukhi script in relation to the Punjabi language. Gurmukhi has been adapted to write other languages (such as Sanskrit) but these adaptations will generally not be covered.

The Alphabet

The Gurmukhi (or Punjabi) alphabet contains thirty-five distinct letters. These are:

Ura Era Iri

The first three letters are unique because they form the basis for vowels. Apart from Era, these characters are never used on their own. See the section on vowels for further details.

Sussa – Sa
Haha – Ha
Kukka – Ka
Khukha – Kha
Gugga – Ga
Ghugga – Gha
Ungga – Nga
Chucha – Ca
Chhuchha – Cha
Jujja – Ja
Jhujja – Jha
Yanza – Nya
Tainka – Tta
Thutha – Ttha
Dudda – Dda
Dhudda – Ddha
Nahnha – Nna
Tutta – Ta
Thutha – Tha
Duda – Da
Dhuda – Dha
Nunna – Na
Puppa – Pa
Phupha – Pha
Bubba – Ba
Bhubba – Bha
Mumma – Ma
Yaiyya – Ya
Rara – Ra
Lulla – La
Vava – Va
Rahrha – Rra

In addition to these, there are six consonants created by placing a dot (bindi) at the foot (pair) of the consonant:

Shusha pair bindi – Sha
Khukha pair bindi – Khha
Gugga pair bindi – Ghha
Zuzza pair bindi – Za
Fuffa pair bindi – Fa
Lulla pair bindi – Lla

Vowels

Gurmukhi follows similar concepts to other Brahmic scripts and as such, all consonants are followed by an inherent ‘a’ sound (unless at the end of a word, when the ‘a’ is usually dropped). This inherent vowel sound can be changed by using dependent vowel signs, which attach to a bearer consonant. In some cases dependent vowel signs cannot be used, at the beginning of a word or syllable for instance, and so an independent vowel character is used instead.

Dependent Vowels

Mukta – a
Kanna – aa
Sihari (ਿ) – i
Bihari – ii
Lavan – ee
Dulavan – ai
Onkar – u
Dulankar – uu
Hora – oo
Kanaura – au

Dotted circles represent the bearer consonant. Vowels are always pronounced after the consonant they are attached to. Thus, Sihari is always written to the left of its consonant but pronounced after it.

Independent Vowels

a aa i ii ee ai
u uu oo au

Vowel Examples

ਆਲੂ – aaluu – potato

ਦਿਲ – dil – heart

Halant

A visible Halant is not used when writing Punjabi in Gurmukhi, although Unicode uses the Halant character to form conjuncts. It may occasionally appear in Sanskritised text, where it represents the suppression of the inherent vowel.

Halant

The effect of this is shown below:

ਕ – Ka

ਕ੍ – K

Numbers

Gurmukhi has its own set of numerals that behave exactly as Western (Arabic) numerals do. These are used extensively in older texts. In modern contexts they are being replaced by Western numerals, although they are still in widespread use.

Sifar – 0
Ek – 1
Dhau – 2
Tinn – 3
Char – 4
Panj – 5
Chaay – 6
Sat – 7
Aht – 8
Noh – 9
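
Because the Gurmukhi numerals map one-to-one onto 0 to 9, converting between the two sets is a simple substitution. The sketch below assumes the standard Unicode code points for the Gurmukhi digits (U+0A66 to U+0A6F); the function names are ours. Python's built-in int() already accepts Unicode decimal digits, so no extra work is needed to read them.

```python
# Gurmukhi digits zero to nine, U+0A66 to U+0A6F.
GURMUKHI_DIGITS = "".join(chr(0x0A66 + n) for n in range(10))

TO_GURMUKHI = str.maketrans("0123456789", GURMUKHI_DIGITS)


def to_gurmukhi_digits(number: int) -> str:
    """Write a non-negative integer using Gurmukhi numerals."""
    return str(number).translate(TO_GURMUKHI)


def from_gurmukhi_digits(text: str) -> int:
    """Read an integer written in Gurmukhi numerals."""
    return int(text)  # int() understands any Unicode decimal digit


written = to_gurmukhi_digits(1469)
print(written)
print(from_gurmukhi_digits(written))
```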

Other Signs

Bindi Tippi Addak

Bindi and Tippi are used for nasalisation (similar to the ‘n’ sound in words ending in ‘ing’). In general, Onkar (u) and Dulankar (uu) take Bindi in their initial forms and Tippi when used after a consonant. All other short vowels take Tippi and all other long vowels take Bindi. Older texts may not follow these conventions.

The use of Addak indicates that the following consonant is geminate. This means that the subsequent consonant is doubled or reinforced.

Conjuncts

A conjoined consonant combines two (or more) consonants. Modern Gurmukhi employs three main conjoined characters that sit at the bottom of a bearer consonant. A half form of Yaiyya (ya) is also occasionally used.

Your browser may have problems displaying these conjuncts on their own.

‍੍ਹ ‍੍ਰ ‍੍ਵ ‍੍ਯ
Ha Ra Va Ya

The effect of this is shown below:

Mha - ਮ + ਹ = ਮ੍ਹ

Pra - ਪ + ਰ = ਪ੍ਰ

Dva - ਦ + ਵ = ਦ੍ਵ

Dya - ਦ + ਯ = ਦ੍ਯ

Ek Onkar

Ek Onkar is a Gurmukhi symbol that is often used in Sikh literature. It literally means ‘one God’.

Ek Onkar

Visarg

The Visarg symbol is very occasionally used in Gurmukhi. It can either mark an abbreviation (as a period does in English) or act like a Sanskrit Visarg, where a voiceless ‘h’ sound is pronounced after the vowel.

Visarg

Enabling Punjabi Support

This article deals with enabling Punjabi support on various computer systems. At the moment it only covers Windows XP. Windows XP is the first operating system to fully support Gurmukhi text entry and display. There are methods to enable this support on Windows 2000, but they are overly complicated. Details on Windows 2000 and Linux/MacOS support will be added at a later date.

Linux (Gnome, KDE)

Mac OS X

Windows 95, 98 and ME

It is not possible to enable system-wide support for Unicode Gurmukhi on Windows 95, 98 and ME. You may still view web pages in Unicode Gurmukhi on Windows 98 and ME by downloading and installing Microsoft Internet Explorer 6.

Windows Vista

Windows XP

Enabling Punjabi support on Windows XP is easy! Carry on reading to find out how.

Control Panel

Load the control panel by clicking on the "Start" menu and pressing the "Control Panel" icon. Make sure you are in category view. If you are not, there should be an option on the left-hand side to "Switch to Category View". Select the icon that says "Date, Time, Language and Regional Options" and then select "Regional and Language Options".

Regional and Language Options

Regional and Language Options Dialog
Select the "Languages" tab and make sure you select the option saying "Install files for complex script and right-to-left languages (including Thai)". A confirmation message should now appear - press "OK" on this confirmation message.
Now select the button that says "Details...".

Text Services and Input Languages

Text Services and Input Languages Dialog
Press the "Add..." button. This should load a dialog box asking you which input language to add. Select "Punjabi" from the drop-down list and make sure the check box labelled "Keyboard layout/IME" is selected as shown:
Add Input Language Dialog
Now select "OK". This should now enable you to both read and write Gurmukhi on your Windows XP computer. You can use the combination ALT + SHIFT to switch between different keyboard layouts (e.g. from a UK Keyboard to Gurmukhi and vice-versa). If you want a language bar, you can select it by pressing the "Language Bar..." button on the "Text Services and Input Languages" dialog and then selecting "Show the language bar on my desktop". The language bar enables you to visually select the keyboard layout you are using.
Once this is done, press "OK" on all remaining dialog boxes. Congratulations - you should now be able to use Unicode Gurmukhi!

Developing Punjabi Web Sites

Creating a Punjabi-language web site is easy! This guide is designed for users
already familiar with creating web sites.

You can only create Punjabi web sites using Unicode on Windows XP and newer, Mac
OS X 10.2 and newer or a compatible Linux distribution running a Unicode-aware
desktop environment such as Gnome.

Unicode Awareness

The first step towards creating a Punjabi web site is to ensure that the web
browser knows how to read the pages. There are a couple of methods that can be
employed to ensure this happens. First though, you have to decide whether you
are going to use UTF-8 or UTF-16. UTF-8 encodes Unicode in 8-bit units (one to
four bytes per character) and UTF-16 encodes it in 16-bit units (two or four
bytes per character). Each Gurmukhi character takes three bytes in UTF-8 but
only two in UTF-16, whereas English text and HTML markup take one byte per
character in UTF-8 and two in UTF-16. If your web site is going to have lots of
Punjabi text, UTF-16 can therefore save space. However, if your web site only
has a little Punjabi text or is mainly in English, select UTF-8.

Note: Programs such as Macromedia Dreamweaver do not support Gurmukhi and
therefore you should use a program like Notepad if you are familiar with HTML.
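
The space trade-off between the two encodings can be measured directly. The following is a small sketch in Python; the sample strings and the function name encoded_sizes are chosen here for illustration. It uses the "utf-16-le" codec so that the two-byte byte order mark is not counted.

```python
def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (bytes in UTF-8, bytes in UTF-16 without a byte order mark)."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


punjabi = "\u0A2A\u0A70\u0A1C\u0A3E\u0A2C\u0A40"  # the word "Punjabi" in Gurmukhi
english = "Punjabi"
marked_up = "<p>" + punjabi + "</p>"  # markup counts towards the total too

print(encoded_sizes(punjabi))    # → (18, 12)
print(encoded_sizes(english))    # → (7, 14)
print(encoded_sizes(marked_up))  # → (25, 26)
```

The last line shows why the choice depends on the mix of content: once the Latin-script markup is included, the advantage of UTF-16 for this short paragraph disappears.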

To save a page in Notepad in your preferred format, click 'File' > 'Save
As...' and then select the drop-down list labelled 'Encoding'. To save in
UTF-16 format select 'Unicode' and for UTF-8 select 'UTF-8'.

If you are creating an XHTML page and you have included an XML declaration,
ensure that it has the appropriate encoding listed:

<?xml version="1.0" encoding="utf-8"?>

or

<?xml version="1.0" encoding="utf-16"?>

You only need to do this if you already have a declaration and you are using
XHTML. The XML declaration is known to cause problems with Internet Explorer
when used with XHTML 1.1. It's not required for plain HTML.

For all sites, whether they have been created in XHTML or plain HTML*, you should
include the following line inside the <head> tags:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

or

<meta http-equiv="Content-Type" content="text/html; charset=utf-16" />

*If you are using HTML, ensure that the tag ends with > only and not />.

Punjabi Awareness

Now that you have made your browser aware of the fact that you have created a
Unicode web page, the next step is to make it aware that it is in
Punjabi! This is quite simple although differs slightly between plain
HTML, XHTML 1.0 and XHTML 1.1. The method involves adding an attribute to
the <html> tag so that it looks something like this:

HTML

<html lang="pa-IN">

XHTML 1.0

<html xmlns="http://www.w3.org/1999/xhtml" 
     lang="pa-IN" xml:lang="pa-IN">

XHTML 1.1 and above

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="pa-IN">
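
Putting the encoding and language declarations together, a page could be generated and saved in the chosen encoding along the following lines. This is only a sketch for the plain HTML case: the function name build_punjabi_page is hypothetical, and the HTML 4.01 Transitional doctype is our choice for the example.

```python
from html import escape


def build_punjabi_page(title: str, paragraph: str, charset: str = "utf-8") -> bytes:
    """Return a minimal plain HTML page, encoded in the requested charset."""
    if charset not in ("utf-8", "utf-16"):
        raise ValueError("charset must be 'utf-8' or 'utf-16'")
    lines = [
        '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"',
        '    "http://www.w3.org/TR/html4/loose.dtd">',
        '<html lang="pa-IN">',
        "<head>",
        # Plain HTML, so the meta tag ends with > and not />.
        f'<meta http-equiv="Content-Type" content="text/html; charset={charset}">',
        f"<title>{escape(title)}</title>",
        "</head>",
        "<body>",
        f"<p>{escape(paragraph)}</p>",
        "</body>",
        "</html>",
        "",
    ]
    # The bytes written to disk must match the charset declared in the page.
    return "\n".join(lines).encode(charset)


page = build_punjabi_page("\u0A2A\u0A70\u0A1C\u0A3E\u0A2C\u0A40", "\u0A26\u0A3F\u0A32")
print(len(page), "bytes")
```

The returned bytes can be written to a file opened in binary mode. Python's "utf-16" codec adds a byte order mark at the start of the output.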

Browser Compatibility

Unfortunately, not all browsers support Gurmukhi. Users of Internet Explorer 6
and above with appropriate fonts should be able to view the web page correctly
on all versions of Windows. This covers the vast majority of internet users.
The newest versions of both Opera and Safari support Gurmukhi given correct
operating system support. Mozilla (including Netscape) and Mozilla Firefox
support Gurmukhi if the underlying operating system supports it (Windows XP
and above). Issues with Gurmukhi rendering in Mozilla products on Linux still
remain.

Templates

In addition to this introduction, we also provide some very basic templates
to get you started. These templates validate correctly using the W3C Validator.

HTML 4.0 Transitional

UTF-8

UTF-16

XHTML 1.0 Transitional

UTF-8

UTF-8 (with XML declaration)

UTF-16

UTF-16 (with XML declaration)

XHTML 1.1

UTF-8

UTF-8 (with XML declaration)

UTF-16

UTF-16 (with XML declaration)

Developing Punjabi Applications

This article is aimed at programmers wishing to develop Unicode enabled applications - specifically in Punjabi.

Operating Systems

If you wish to develop Unicode Gurmukhi applications on anything older than Windows XP - forget it!
Only Windows XP and newer support Unicode Gurmukhi natively in applications. Only consider creating Unicode
Gurmukhi applications if virtually all of your user base uses Windows XP (or anything newer). Although Windows
2000 has support for Unicode, it does not contain the required version of the Uniscribe DLL that supports Gurmukhi.
You may consider upgrading the Uniscribe DLL to the one that accompanies Windows XP but this is in no way
supported by Microsoft.

Under Linux, GTK+ 2 uses Pango to render text which means any programs created using GTK+ 2 should render
Unicode Gurmukhi correctly. Gnome 2.4 and above (possibly earlier versions too) have been tested and support
Unicode Gurmukhi when appropriate fonts have been installed.

Mac OS X 10.2 (Jaguar) and above support Unicode Gurmukhi.

C++ Issues

If you are familiar with programming with Unicode in C++ you probably don't need to read this.

Define _UNICODE

Make sure you #define both UNICODE and _UNICODE. UNICODE ensures that any Win32 APIs that have both ANSI and
Unicode versions default to the Unicode version, and _UNICODE does the same for the C runtime's generic-text
routines in tchar.h. Alternatively, under Windows, you can explicitly call the wide (Unicode) versions
of the Win32 APIs by adding a W to the end of the function name.

wchar_t and std::wstring

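If you are using the char data type, this must be changed to wchar_t. Do not assume that wchar_t is 16 bits or
32 bits wide, because its size depends on the compiler and platform (16 bits on Windows, commonly 32 bits on
Linux). If you are using strings widely, we recommend you use std::wstring instead.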
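
The warning about the size of wchar_t can be checked on any machine without compiling anything. The sketch below uses Python's ctypes module purely as a convenient probe; the function names are ours.

```python
import ctypes


def wchar_bits() -> int:
    """Width in bits of the C wchar_t type on the current platform."""
    return ctypes.sizeof(ctypes.c_wchar) * 8


def utf16_code_units(text: str) -> int:
    """Number of 16-bit code units needed to store text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


pra = "\u0A2A\u0A4D\u0A30"  # Puppa, Halant, Rara

print(wchar_bits())           # platform dependent
print(utf16_code_units(pra))  # one code unit per Gurmukhi character
```

Gurmukhi characters all fit in a single 16-bit unit, so the length of a Gurmukhi std::wstring should be the same on both kinds of platform even though the storage per character differs.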

String Literals - L or _T()

Use L in front of string literals to ensure that they are wide (Unicode) strings. For example: L"Hello World!\n".
Alternatively, if you have defined _UNICODE, you can use _T("Hello World!\n") from tchar.h.

.NET & Mono Issues

Both .NET and Mono use Unicode natively and as such require no special adjustment to use Unicode Gurmukhi. However
be aware that the rendering of Gurmukhi is dependent on the underlying operating system.

Visual Basic Issues

Visual Basic 6 does not support Unicode very well at all. You should upgrade to VB .NET if you wish to use
Unicode Gurmukhi.