What character set for Arabic should we use?

As mentioned in the first post, we will discuss several areas, the first of which is Arabic as a Programming Language. For this to happen, code written in Arabic letters must be encoded in a standard, agreed-upon character set to ensure data integrity, transmission, consistency and security.

There have been many attempts to encode Arabic in different character sets since the early mainframes. These encodings changed over the years, and sometimes they even differed between Arab countries! We have now reached some maturity by adhering to the standardized character set provided by the Unicode Consortium. Still, there is a but…

There are issues with the encoding design used in Unicode. For example, if you search for “أحمد” in a database that uses Unicode, a plain code-point comparison returns only the names that begin with “أ” and misses the name if it begins with “ا”, which is a problem, since they are the same. The same applies to “ي” and “ى”, to “أ ؤ ئ” and “ء”, and so on. Each of these has a different Unicode code point, and the layout was not designed with these situations in mind.
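To make the search problem concrete, here is a minimal Python sketch. The name `fold_for_search` and the table `SEARCH_FOLD` are made up for this illustration, and the choice of which letters to merge is an assumption: it covers only the alef and yeh variants, and a real application would have to decide separately how to treat the hamza forms.

```python
# Each alef variant has its own Unicode code point, so a plain
# string comparison treats the two spellings as different words.
with_hamza = "\u0623\u062D\u0645\u062F"  # أحمد (alef with hamza above)
bare_alef = "\u0627\u062D\u0645\u062F"   # احمد (bare alef)

# Hypothetical folding table, used for searching only. Which letters
# to merge is an assumption that depends on the application.
SEARCH_FOLD = str.maketrans({
    "\u0623": "\u0627",  # أ -> ا
    "\u0625": "\u0627",  # إ -> ا
    "\u0622": "\u0627",  # آ -> ا
    "\u0649": "\u064A",  # ى -> ي
})


def fold_for_search(text: str) -> str:
    """Map spelling variants to one representative letter before comparing."""
    return text.translate(SEARCH_FOLD)


print(with_hamza == bare_alef)                                    # False
print(fold_for_search(with_hamza) == fold_for_search(bare_alef))  # True
```

Note that the folding needs a lookup table for every character, which is exactly the extra step the rest of this post is about.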

Now this is what I actually want to discuss: the encoding was not designed with search algorithms or regular expressions in mind. Let us look at the ASCII table. You can compare two words regardless of case by ANDing each letter with 0x5F, because the table was designed with this in mind: the upper-case and lower-case forms of a letter differ in a single bit. This makes comparison, regular expressions and search operations very fast, since they are performed with low-level logic operations.
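The ASCII trick can be sketched in a few lines of Python. The function names are made up for this example, and the mask is only meaningful for letters: applied to digits or punctuation it produces unrelated bytes.

```python
def ascii_fold(text: str) -> bytes:
    """AND every byte with 0x5F.

    This clears bit 5 (0x20), the only bit that differs between the two
    cases of an ASCII letter, so 'a'-'z' land on 'A'-'Z'. Only use it
    on letters: other characters are changed into unrelated bytes.
    """
    return bytes(b & 0x5F for b in text.encode("ascii"))


def same_word_ignoring_case(a: str, b: str) -> bool:
    """Compare two ASCII words letter by letter, ignoring case."""
    return ascii_fold(a) == ascii_fold(b)


print(ascii_fold("Ahmad"))                        # b'AHMAD'
print(same_word_ignoring_case("ahmad", "AHMAD"))  # True
```

No lookup table is involved here: the relationship between the two forms of a letter is built into the code values themselves.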

What I am suggesting here is a project to create a new encoding for Arabic letters that takes into consideration all the scenarios that arise when dealing with Arabic text. That means we have to open a discussion about the origin of the Arabic letters, which should lead to an encoding table that eases the text operations expected in data systems and other digital contexts.
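As a starting point for that discussion, here is a purely hypothetical sketch of what such a table could allow. The code values, the 5-bit/3-bit split and the grouping of letters below are invented for illustration only. They are not a proposal for the final layout and do not correspond to any existing standard.

```python
# Purely hypothetical 8-bit layout, invented for illustration only:
# the high 5 bits identify the base letter, the low 3 bits the variant.
VARIANT_BITS = 3
BASE_MASK = 0xFF & ~((1 << VARIANT_BITS) - 1)  # 0xF8

HYPOTHETICAL_CODES = {
    "\u0627": 0x08,  # ا  alef, variant 0
    "\u0623": 0x09,  # أ  alef, variant 1
    "\u0625": 0x0A,  # إ  alef, variant 2
    "\u0622": 0x0B,  # آ  alef, variant 3
    "\u064A": 0x10,  # ي  yeh, variant 0
    "\u0649": 0x11,  # ى  yeh, variant 1
    "\u062D": 0x18,  # ح
    "\u0645": 0x20,  # م
    "\u062F": 0x28,  # د
}


def encode(text: str) -> bytes:
    """Encode text with the made-up table above."""
    return bytes(HYPOTHETICAL_CODES[ch] for ch in text)


def fold(data: bytes) -> bytes:
    """Drop the variant bits with a single AND per letter, as in ASCII."""
    return bytes(b & BASE_MASK for b in data)


a = encode("\u0623\u062D\u0645\u062F")  # أحمد
b = encode("\u0627\u062D\u0645\u062F")  # احمد
print(a == b)              # False
print(fold(a) == fold(b))  # True
```

With a layout like this, a search that ignores the variants becomes one mask operation per letter, while an exact comparison still tells the spellings apart.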

What are your comments on this matter?

Welcome to Shawkani Blog Posts

This blog concentrates on the following topics:
  • Arabic as a Programming Language.
  • FreeBSD as a base for a fully Arabic system from the core.
  • The Java programming language, discussing the JCP and the latest JSRs.
  • Keeping a close eye on KHRONOS and Wayland.

The aim is to promote Arabic as a programming language that can build any solution, from core OS modules and drivers to business applications.

OK, but why the Java JCP and the latest JSRs? And what about KHRONOS and Wayland? What do these have to do with promoting Arabic as a Programming Language?

I will discuss these in later posts. For the time being, I would like to see your feedback and speculations in the comments.

