Database Answers City Temple.

Home Ask a Question Careers Contact us Data Models First Timers Search Site Map
   World Language Classification
Back to the Data Model
Here's a discussion of the way Oracle handles Hierarchies.
Requirements :-
A Data Model for a World Language Classification System, (an example of hierarchies in general).

Definitions have been taken from Table 9-13 in the definitive Paper on 'Demographics, ministries, 
and scriptures via 13,500 language profiles'.

This summarises the schema worked out by linguist David Dalby in his 2000 publication :-
"The Linguasphere:register of the World's languages"

B. Things of Interest :-
B.1 Geographic Areas and Zones
B.2 Languages and Zones

C. How are they related ?
C.1 Geo-Linguistic Zones
Each dialect is in only one language, each language in only one cluster, 
each cluster in only one net, chain, set, zone and so on up to macrozone. 

The LangID identifies which cluster, net, chain, set, zone and macrozone 
the language is in. 

So, each language is in only one 'geographic area' where the term 
'geographic area' is used to cover zones and so on.  

C.2 Languages 
Languages are defined within Language Groups, and different Languages can be 
used within different Geographic Areas.
Therefore a hierarchy of Languages in Groups is established.
There are ten Language Macrozones :-
0 African
1 Afro-Asian
2 Australasian
3 Austronesian
4 Eurasian
5 Indo-European
6 North American
7 Sino_Tibetan
8 South American
9 TransAfrican

Each MacroZone is divided up into 10 GlossoZones.
For example, Indo-European (number 5) consists of these GlossoZones :-
50 Celtic
51 Romanic
52 Germanic
53 Slavonic
54 Baltic
55 Albanic
56 Hellenic
57 Armenic
58 Iranic
69 Indic

C.3 Geographic Areas
A hierarchy of Geographic Areas is established, and every language appears at 
the appropriate level for every Area where it is used around the world.

Start of the User's Verbatim Statement of Requirements.
The Background
I'm searching for advice on Database design for components of the World Christian Database. 

The second printed edition of the publication, 'The World Christian Encyclopedia', appeared in 2001 and
it is now appropriate to review the design in the light of accepted Best Practice for database design.
Sample Data
The language component of the work (not online yet) is the primary area of interest. 
There is a single table called WLC (World Language Classification) with about 33,000 records describing the 
world's languages, and more importantly classifying them into different levels, the last two of which are 
language (9) and dialect (10). 

The LangID is the primary key and is linked to other tables including a listing of 13,000 people groups. The LangID is a unique linguistic feature developed over 20 years which indicates linguistic relationships and the proximity between languages. World Language Classification LangID LenLangID RefName Population (2000) 0 1 AFRICA macrozone 68,968,350 00 2 MANDIC zone 23,007,660 00-A 4 WEST MANDE set 17,549,981 00-AA 5 NORTHWEST MANDE chain 14,771,168 00-AAA 6 MANDING net 13,474,770 00-AAAA 7 WEST MANDING cluster 5,497,738 00-AAAA-a 9 mandinka-kango 1,364,391 00-AAAA-aa 10 mandinka-'gambia' 00-AAAA-ab 10 mandinka-'casamance' 00-AAAA-ac 10 woyinko 00-AAAA-ad 10 bilaso etc. The SQL is shown below to create, populate and query a tailored table for the data just above. The question :- Should this type of hierarchy be modeled in a separate table for each level ? While the splitting into tables is easy, putting it all back together is much more complex than the single table method. The hope is that for multi-user read-only access the single table may still be appropriate. e.g having the single Ref_Languages table in your 'World Health Organization STD' to link to Publications is easier than linking to the result of several joins or union queries to see the context of the langage. End of User's Verbatim Statement of Requirements. B. Things of Interest :- B.1 Geographic Zones for all part os the World. B.2 Languages spoken in every Geographic Zone. C. How are they related ? C.1 Many levels of Language can be defined. C.2 Each Level has one Parent and zero, one or many Chidlren,(lower levels of Language). C.3 The lowest Levels are called Language,(level 9) and Dialect,(level 10). The example uses a Language Table that looks like this ... Oracle Scripts :- CREATE TABLE languages (LangID char(10) primary key ,parent_langID char(10) ,lang_sequence integer ,LenLangID integer ,RefName varchar(25) ,population_2000 integer) / INSERT INTO languages (LangID,parent_langID,lang_sequence,LenLangID, RefName ,population_2000) VALUES ( '0' , null , 1 , 1 ,'AFRICA macrozone', 68968350 ) / INSERT INTO languages (LangID,parent_langID,lang_sequence,LenLangID,RefName ,population_2000) VALUES ( '00' , '0' , 1 , 2 ,'MANDIC zone' , 23007660 ) / INSERT INTO languages (LangID,parent_langID,lang_sequence,LenLangID,RefName ,population_2000) VALUES ('00-A', '00' , 1 , 4 ,'WEST MANDE set' , 17549981 ) / INSERT INTO languages (LangID,parent_langID,lang_sequence,LenLangID,RefName ,population_2000) VALUES ('00-AA', '00-A' , 1 , 5 ,'NORTHWEST MANDE chain',14771168 ) / INSERT INTO languages (LangID,parent_langID,lang_sequence,LenLangID,RefName ,population_2000) VALUES ('00-AAA', '00-AA' , 1 , 6 ,'MANDING net',13474770 ) / INSERT INTO languages (LangID,parent_langID,lang_sequence,LenLangID,RefName ,population_2000) VALUES ('00-AAAA', '00-AAA' , 1 , 7 ,'WEST MANDING cluster',5497738 ) / INSERT INTO languages (LangID,parent_langID,lang_sequence,LenLangID,RefName ,population_2000) VALUES ('00-AAAA-a','00-AAAA' , 1 , 9 ,'mandinka-kango',1364391 ) / INSERT INTO languages (LangID,parent_langID,lang_sequence,LenLangID,RefName ,population_2000) VALUES ('00-AAAA-aa','00-AAAA-a', 1 , 10 ,'mandinka-gambia',null ) / INSERT INTO languages (LangID,parent_langID,lang_sequence,LenLangID,RefName ,population_2000) VALUES ('00-AAAA-ab','00-AAAA-aa', 2 , 10 ,'mandinka-casamance',null ) / INSERT INTO languages (LangID,parent_langID,lang_sequence,LenLangID,RefName ,population_2000) VALUES ('00-AAAA-ac','00-AAAA-a', 3 , 10 ,'woyinko',null ) / INSERT INTO languages (LangID,parent_langID,lang_sequence,LenLangID,RefName ,population_2000) VALUES ('00-AAAA-ad','00-AAAA-a', 4 , 10 ,'bilaso',null ) / SELECT LangID, refname, population_2000 FROM languages START WITH parent_langID is null CONNECT BY PRIOR LangID = parent_langID / -- Produces this output ... LANGID REFNAME POPULATION_2000 ---------- ---------- ----------------- --------------- 0 AFRICA macrozone 68968350 00 MANDIC zone 23007660 00-A WEST MANDE set 23007660 00-AA NORTHWEST MANDE chain 14771168 00-AAA MANDING net 13474770 00-AAAA WEST MANDING cluster 5497738 00-AAAA-a mandinka-kango 1364391 00-AAAA-aa mandinka-gambia 00-AAAA-ab mandinka-casamance 00-AAAA-ac woyinko 00-AAAA-ad bilaso 11 rows selected.

Barry Williams
14th. February 2004
Principal Consultant
Database Answers


Home Ask a Question Careers Contact us Data Models First Timers Search Site Map