Back to the Data Model
Here's a discussion of the way Oracle handles Hierarchies.
Requirements :-
A Data Model for a World Language Classification System, (an example of hierarchies in general).
Background
Definitions have been taken from Table 9-13 in the definitive Paper on 'Demographics, ministries,
and scriptures via 13,500 language profiles'.
This summarises the schema worked out by linguist David Dalby in his 2000 publication :-
"The Linguasphere:register of the World's languages"
B. Things of Interest :-
B.1 Geographic Areas and Zones
B.2 Languages and Zones
C. How are they related ?
C.1 Geo-Linguistic Zones
Each dialect is in only one language, each language in only one cluster,
each cluster in only one net, chain, set, zone and so on up to macrozone.
The LangID identifies which cluster, net, chain, set, zone and macrozone
the language is in.
So, each language is in only one 'geographic area' where the term
'geographic area' is used to cover zones and so on.
C.2 Languages
Languages are defined within Language Groups, and different Languages can be
used within different Geographic Areas.
Therefore a hierarchy of Languages in Groups is established.
There are ten Language Macrozones :-
0 African
1 Afro-Asian
2 Australasian
3 Austronesian
4 Eurasian
5 Indo-European
6 North American
7 Sino_Tibetan
8 South American
9 TransAfrican
Each MacroZone is divided up into 10 GlossoZones.
For example, Indo-European (number 5) consists of these GlossoZones :-
50 Celtic
51 Romanic
52 Germanic
53 Slavonic
54 Baltic
55 Albanic
56 Hellenic
57 Armenic
58 Iranic
69 Indic
C.3 Geographic Areas
A hierarchy of Geographic Areas is established, and every language appears at
the appropriate level for every Area where it is used around the world.
Start of the User's Verbatim Statement of Requirements.
The Background
I'm searching for advice on Database design for components of the World Christian Database.
The second printed edition of the publication, 'The World Christian Encyclopedia', appeared in 2001 and
it is now appropriate to review the design in the light of accepted Best Practice for database design.
Sample Data
The language component of the work (not online yet) is the primary area of interest.
There is a single table called WLC (World Language Classification) with about 33,000 records describing the
world's languages, and more importantly classifying them into different levels, the last two of which are
language (9) and dialect (10).
The LangID is the primary key and is linked to other tables including a listing of 13,000 people groups.
The LangID is a unique linguistic feature developed over 20 years which indicates linguistic relationships and
the proximity between languages.
World Language Classification
LangID LenLangID RefName Population (2000)
0 1 AFRICA macrozone 68,968,350
00 2 MANDIC zone 23,007,660
00-A 4 WEST MANDE set 17,549,981
00-AA 5 NORTHWEST MANDE chain 14,771,168
00-AAA 6 MANDING net 13,474,770
00-AAAA 7 WEST MANDING cluster 5,497,738
00-AAAA-a 9 mandinka-kango 1,364,391
00-AAAA-aa 10 mandinka-'gambia'
00-AAAA-ab 10 mandinka-'casamance'
00-AAAA-ac 10 woyinko
00-AAAA-ad 10 bilaso
etc.
The SQL is shown below to create, populate and query a tailored table for the data just above.
The question :-
Should this type of hierarchy be modeled in a separate table for each level ?
While the splitting into tables is easy, putting it all back together is much more complex than the
single table method.
The hope is that for multi-user read-only access the single table may still be appropriate.
e.g having the single Ref_Languages table in your 'World Health Organization STD' to link to Publications is
easier than linking to the result of several joins or union queries to see the context of the langage.
End of User's Verbatim Statement of Requirements.
B. Things of Interest :-
B.1 Geographic Zones for all part os the World.
B.2 Languages spoken in every Geographic Zone.
C. How are they related ?
C.1 Many levels of Language can be defined.
C.2 Each Level has one Parent and zero, one or many Chidlren,(lower levels of Language).
C.3 The lowest Levels are called Language,(level 9) and Dialect,(level 10).
The example uses a Language Table that looks like this ...
Oracle Scripts :-
CREATE TABLE languages
(LangID char(10) primary key
,parent_langID char(10)
,lang_sequence integer
,LenLangID integer
,RefName varchar(25)
,population_2000 integer)
/
INSERT INTO languages
(LangID,parent_langID,lang_sequence,LenLangID, RefName ,population_2000)
VALUES
( '0' , null , 1 , 1 ,'AFRICA macrozone', 68968350 )
/
INSERT INTO languages
(LangID,parent_langID,lang_sequence,LenLangID,RefName ,population_2000)
VALUES
( '00' , '0' , 1 , 2 ,'MANDIC zone' , 23007660 )
/
INSERT INTO languages
(LangID,parent_langID,lang_sequence,LenLangID,RefName ,population_2000)
VALUES
('00-A', '00' , 1 , 4 ,'WEST MANDE set' , 17549981 )
/
INSERT INTO languages
(LangID,parent_langID,lang_sequence,LenLangID,RefName ,population_2000)
VALUES
('00-AA', '00-A' , 1 , 5 ,'NORTHWEST MANDE chain',14771168 )
/
INSERT INTO languages
(LangID,parent_langID,lang_sequence,LenLangID,RefName ,population_2000)
VALUES
('00-AAA', '00-AA' , 1 , 6 ,'MANDING net',13474770 )
/
INSERT INTO languages
(LangID,parent_langID,lang_sequence,LenLangID,RefName ,population_2000)
VALUES
('00-AAAA', '00-AAA' , 1 , 7 ,'WEST MANDING cluster',5497738 )
/
INSERT INTO languages
(LangID,parent_langID,lang_sequence,LenLangID,RefName ,population_2000)
VALUES
('00-AAAA-a','00-AAAA' , 1 , 9 ,'mandinka-kango',1364391 )
/
INSERT INTO languages
(LangID,parent_langID,lang_sequence,LenLangID,RefName ,population_2000)
VALUES
('00-AAAA-aa','00-AAAA-a', 1 , 10 ,'mandinka-gambia',null )
/
INSERT INTO languages
(LangID,parent_langID,lang_sequence,LenLangID,RefName ,population_2000)
VALUES
('00-AAAA-ab','00-AAAA-aa', 2 , 10 ,'mandinka-casamance',null )
/
INSERT INTO languages
(LangID,parent_langID,lang_sequence,LenLangID,RefName ,population_2000)
VALUES
('00-AAAA-ac','00-AAAA-a', 3 , 10 ,'woyinko',null )
/
INSERT INTO languages
(LangID,parent_langID,lang_sequence,LenLangID,RefName ,population_2000)
VALUES
('00-AAAA-ad','00-AAAA-a', 4 , 10 ,'bilaso',null )
/
SELECT LangID, refname, population_2000
FROM languages
START WITH parent_langID is null
CONNECT BY PRIOR LangID = parent_langID
/
-- Produces this output ...
LANGID REFNAME POPULATION_2000
---------- ---------- ----------------- ---------------
0 AFRICA macrozone 68968350
00 MANDIC zone 23007660
00-A WEST MANDE set 23007660
00-AA NORTHWEST MANDE chain 14771168
00-AAA MANDING net 13474770
00-AAAA WEST MANDING cluster 5497738
00-AAAA-a mandinka-kango 1364391
00-AAAA-aa mandinka-gambia
00-AAAA-ab mandinka-casamance
00-AAAA-ac woyinko
00-AAAA-ad bilaso
11 rows selected.
Barry Williams
14th. February 2004
Principal Consultant
Database Answers