2. Natural Language Processing
good arrangement of words
http://www.wjh.harvard.edu/~inquirer/kellystone.htm Specific Links on Natural Language Processing and "Common Sense"
Common Sense is one of the hardest things to put inside a computer. It draws on the largest database imaginable, the human mind, and on how that mind correlates all the data it knows.

My Theories:
A computer can be programmed to do many complex tasks, and it has been shown to excel in certain specific areas such as medical diagnosis and mathematical formulas, but the one area that it cannot handle well is human Natural Language, or Common Sense.
I feel that Common Sense is the most interesting and useful ability for a Computer AI to achieve.
There are a number of ideas on how to advance in this area, and for now I will just discuss mine and my programs that attempt to achieve this.

Ideal Goals:
Here we state our goals. I want an AI that can communicate in basic English, "understanding" and responding to general ideas and asking appropriate questions. We need a general World Database, with the ability to add new data to it. After reading through an encyclopedia, the computer should be able to respond to basic questions such as:
Who is Archimedes?
He is a Greek mathematician.

Taken together, all of this is called Ontology, or Knowledge Representation. An ontology is a specification of a conceptualization; it is our formalism for representing data.
Ontology
In its general meaning, ontology (pronounced ahn-TAH-luh-djee) is the study or concern about what kinds of things exist - what entities there are in the universe. It derives from the Greek onto (being) and logia (written or spoken discourse). It is a branch of metaphysics, the study of first principles or the essence of things. In information technology, an ontology is the working model of entities and interactions in some particular domain of knowledge or practices, such as electronic commerce or "the activity of planning." In artificial intelligence (AI), an ontology is, according to Tom Gruber, an AI specialist at Stanford University, "the specification of conceptualizations, used to help programs and humans share knowledge." In this usage, an ontology is a set of concepts - such as things, events, and relations - that are specified in some way (such as specific natural language) in order to create an agreed-upon vocabulary for exchanging information.

Statement: My car is red.
This should be interpreted as follows: "My" refers to the user who is typing, who possesses a car that has the color quality red. This would be stored in the database under cars, i.e. "There exists a Red Car, owned by User", and cross-referenced under the user as "Owns - Car red".
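A minimal sketch of how such a statement might be stored, assuming a plain in-memory dictionary stands in for the World Database. The names `facts`, `add_fact`, and `record_possession` are made up for illustration, and the sentence is assumed to be already parsed into owner, thing, and color.

```python
from collections import defaultdict

# Hypothetical in-memory "World Database": entry name -> list of (relation, value) facts.
facts = defaultdict(list)

def add_fact(entry, relation, value):
    """Store one fact under an entry, skipping exact duplicates."""
    if (relation, value) not in facts[entry]:
        facts[entry].append((relation, value))

def record_possession(owner, thing, color):
    """Store the fact under the thing and cross-reference it under the owner."""
    add_fact(thing, "color", color)
    add_fact(thing, "owned-by", owner)
    add_fact(owner, "owns", f"{thing} {color}")

# "My car is red." typed by the current user (parsing is assumed to have happened already)
record_possession("user", "car", "red")
print(facts["car"])
print(facts["user"])
```

Storing the fact twice, once under the car and once under the user, is what makes the cross-reference cheap to look up from either side.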

The statement: "I am going to the park" should issue a "park" query that shows that the AI knows parks exist, and then a question reply: What park are you going to?

This should extend all the way down to more complicated concepts.
Question from AI: Where is the park?
Response from user: Down Elm street near the corner.
This would save a basic connection from the park to Elm Street.
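The exchange above could be sketched roughly like this. Everything here is an assumption made for illustration: the `known_concepts` set, the fixed question template, and the very naive rule that the word before "street" is the street name.

```python
# Hypothetical knowledge: concepts the AI already knows exist, and links it has learned.
known_concepts = {"park"}
links = []  # (subject, relation, object) triples

def follow_up(statement):
    """Return a clarifying question if the statement mentions a known concept, else None."""
    for word in statement.lower().strip(".!?").split():
        if word in known_concepts:
            return f"What {word} are you going to?"
    return None

def learn_location(thing, answer):
    """Naive rule (an assumption): the word right before 'street' names the street."""
    words = answer.strip(".!?").split()
    for i, word in enumerate(words[1:], start=1):
        if word.lower() == "street":
            place = f"{words[i - 1]} Street"
            links.append((thing, "located-near", place))
            return place
    return None

print(follow_up("I am going to the park"))
print(learn_location("park", "Down Elm street near the corner."))
print(links)
```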

With this basic ability the computer could hold a simple conversation, and gather additional information about the world!!!
Now comes the hard part: making this all work, and work smoothly.


Language - The only language I will be using is English, as it is my native language. As far as I can tell, all languages work similarly for communication, so if a theory works for English it should be possible to port it over to another language. Further, I will only handle the written language, as I am not a hardware guru and am not worried about responding to voice commands or speech, since these are basically translated into the written word anyway.
The English Grammar - The English language and its grammar are HUGE! First statement over with. But as with all grammars, it of course has structure and usage rules. Structures and rules are ideas that computers can handle better than humans, while the tonal inflections, meanings, and subtle nuances that humans use in everyday speech are not. This is fine; first a computer must understand Basic Language.

Basic Language - Basic Language is best compared with a child learning; the analogy is a close one. Our computer must first understand a few things before it understands everything. I am using English grammar as a template for learning, so we start with grade-school grammar: sentences built from the basic word forms, nouns and verbs.
(We are going to use the AI Concepts of a basic Frame System, with Word-Association Network)

Parts of Speech (PoS)
Our teachers taught us that nouns are people, places, things, and ideas, and can be common or proper, singular or plural, and possessive.
These terms by themselves are not that difficult to deal with. Simple Frame-Network systems for people are in use by almost every business: they are customer databases. Things are easily categorized as well by simple properties such as animal, plant, or other, the Scientific ORDER??? giving us the template for that.
One major problem to address here is the sheer quantity of data to be absorbed. There are billions of people in the world, and millions of words. To think about sitting down and making a list, and sorting them out for a computer AI, is fairly preposterous. Therefore we will need some way of teaching the AI, giving it larger sections of info via the internet, in the form of online dictionaries, encyclopedias, and full-text books.
Another thing is that even teaching basic structures to the computer can be bothersome and time consuming, so we want the computer to automatically pick up as many things as possible.
Example: We teach the computer that people are noun things, and give them basic properties such as is-animal and is-alive.
Now it pulls in a list of people with information, everything about them. It sorts through the information and realizes that there are two sexes.
Data Representation: Now, based on this information, the frame for people should automatically split into two frames: one for Male and one for Female.
It should continue doing this for every major partition in the frames, thereby automatically sorting people into races, living areas, ages and such.
This is a general rule, that must be watched, but does provide us with some basic form.
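One way the automatic split might be sketched: treat an attribute as a "major partition" when it takes only a handful of distinct values across all the records in a frame. The function name `split_frame`, the `max_values` threshold, and the sample people are all assumptions for illustration.

```python
from collections import defaultdict

def split_frame(records, max_values=3):
    """Propose sub-frames for every attribute that splits the records into a few groups.

    records: list of dicts describing members of one frame (here, people).
    max_values: an attribute counts as a major partition only if it takes
                between 2 and max_values distinct values (an arbitrary threshold).
    Returns {attribute: {value: [names...]}}.
    """
    groups_by_attr = defaultdict(lambda: defaultdict(list))
    for record in records:
        for attr, value in record.items():
            if attr != "name":
                groups_by_attr[attr][value].append(record["name"])
    return {
        attr: dict(groups)
        for attr, groups in groups_by_attr.items()
        if 2 <= len(groups) <= max_values
    }

# Made-up sample data
people = [
    {"name": "John", "sex": "male", "age": 34},
    {"name": "Mary", "sex": "female", "age": 29},
    {"name": "Peter", "sex": "male", "age": 51},
    {"name": "Anna", "sex": "female", "age": 8},
]
subframes = split_frame(people)
print(subframes)
```

With this threshold, raw ages would not produce a split because every person has a different value; they would first have to be bucketed into ranges, which is one reason the rule must be watched.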

English Sentences - Parsing the sentence is essential to comprehending its meaning. A perfect knowledge database with a perfect parser would give us a strong basis for an AI entity.
Unfortunately, no parser is 100% accurate, and the English language is continually changing. Therefore our program should have multiple redundant checks to ensure data is correct, and the ability to adapt and add new words to its mind.
I use a multi-part English parser that I wrote entirely myself. First it checks our dictionary to see what Part of Speech (PoS) the word is, then checks the words around it for a match, i.e. most words following the articles a, an, and the are nouns or adjectives, and most words in front of nouns are adjectives. These and a number of other rules together give us a good feel for what PoS each word is. (The current accuracy rate is still < 90%.)
If in doubt we choose the most likely PoS, the first one listed in our dictionary.
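The following is not the parser itself, only a small sketch of the three ideas just described: dictionary lookup, two context rules, and falling back to the first listed PoS. The tiny `DICTIONARY` and the function name `tag` are invented for the example.

```python
# Hypothetical mini-dictionary: word -> possible parts of speech, most likely first.
DICTIONARY = {
    "the": ["article"], "a": ["article"], "an": ["article"],
    "red": ["adjective", "noun"],
    "car": ["noun"],
    "park": ["verb", "noun"],
    "runs": ["verb", "noun"],
    "fast": ["adverb", "adjective"],
}
ARTICLES = {"a", "an", "the"}

def tag(sentence):
    """Return (word, PoS) pairs using dictionary lookup plus two context rules."""
    words = sentence.lower().strip(".!?").split()
    candidates = [DICTIONARY.get(word, []) for word in words]
    tagged = []
    for i, options in enumerate(candidates):
        if not options:
            tagged.append((words[i], "unknown"))
            continue
        choice = options[0]  # in doubt: the first PoS listed in the dictionary
        # Rule 1: words following an article are most likely nouns or adjectives.
        if i > 0 and words[i - 1] in ARTICLES:
            allowed = [pos for pos in options if pos in ("noun", "adjective")]
            if allowed:
                choice = allowed[0]
        # Rule 2: a word that can be an adjective and sits in front of a noun is one.
        next_options = candidates[i + 1] if i + 1 < len(candidates) else []
        if "adjective" in options and next_options[:1] == ["noun"]:
            choice = "adjective"
        tagged.append((words[i], choice))
    return tagged

print(tag("The red car runs fast."))
print(tag("the park"))
```

In "the park" the article rule overrides the dictionary default of verb, while in a sentence with no helpful context the first listed PoS wins.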

So at this point we have a basic Sentence Parser and a basic Information Gatherer.
*more on these two, on another page... What do we do now?
Reference Network - Now I want to work with a number of different networking schemes. One of the simplest and most useful is our IS-A connection.

Is-A Relationship - this is the basic relationship connection for all words, mainly for nouns.
Ex: tulip IS-A flower, flower IS-A plant, plant IS-A thing, thing IS-A noun
Ex: John IS-A human, human IS-A person, person IS-A noun
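The two example chains can be written down as a simple parent lookup. This is only a sketch, assuming each word has a single IS-A parent; the names `IS_A`, `is_a_chain`, and `is_a` are chosen for illustration.

```python
# The IS-A links from the two examples, stored as child -> parent.
IS_A = {
    "tulip": "flower", "flower": "plant", "plant": "thing", "thing": "noun",
    "John": "human", "human": "person", "person": "noun",
}

def is_a_chain(word):
    """Follow IS-A links upward and return the whole chain, starting with the word."""
    chain = [word]
    seen = {word}
    while chain[-1] in IS_A:
        parent = IS_A[chain[-1]]
        if parent in seen:  # guard against accidental cycles in the data
            break
        chain.append(parent)
        seen.add(parent)
    return chain

def is_a(word, ancestor):
    """True if `ancestor` appears anywhere above `word` in its chain."""
    return ancestor in is_a_chain(word)[1:]

print(is_a_chain("tulip"))
print(is_a("John", "person"))
```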
Now in theory, this could be done with 100% accuracy for every noun, and the entire English vocabulary would be sorted into a nice graphical tree. Unfortunately there is vagueness in our usage of words that must be worked around.
The easiest way to extract an IS-A relationship is through a simple dictionary or encyclopedia.
More Examples: *Graphicals*
(percentage found: ?? about 80%?)
(percentage correct: about 80%?)

General Word Association - There are many different types of relationships we can have between words other than the IS-A one, such as human HAS-PART leg and cheesecake MADE-FROM cream cheese. The number of these is only slightly less than the number of words, so one way I would like to explore all relationships between words is through Word Association. This is based on the popular psychology game where one person says a word, and the other responds with the first thing they think of, such as:
Dr.: TV
John: Radio
Dr.: cat
John: dog
Dr.: rain
John: hail
Dr.: nurse
John: doctor
Dr.: baby
John: mother

Now each pair of words listed here is related in a different way:
TV is like Radio, both forms of communication
Cat is an enemy of Dog, they are both pets...
Hail and Rain are types of Precipitation
a Nurse works with a Doctor
a baby has a mother
Now, our computer given two words like this, should be able to tell us how they are related...
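A rough sketch of such a lookup, assuming the relationships have already been stored somewhere. The two small tables and the function name `relate` are hypothetical; the hard part, filling those tables automatically, is not shown.

```python
# Hypothetical stored knowledge, taken from the word pairs above.
IS_A = {
    "tv": "communication", "radio": "communication",
    "cat": "pet", "dog": "pet",
    "rain": "precipitation", "hail": "precipitation",
}
RELATIONS = {
    ("nurse", "doctor"): "works with",
    ("baby", "mother"): "has a",
}

def relate(a, b):
    """Describe how two words are related, or return None if nothing is known."""
    a, b = a.lower(), b.lower()
    # A direct typed relation wins, whichever order the words were given in.
    if (a, b) in RELATIONS:
        return f"{a} {RELATIONS[(a, b)]} {b}"
    if (b, a) in RELATIONS:
        return f"{b} {RELATIONS[(b, a)]} {a}"
    # Otherwise look for a shared IS-A parent.
    parent = IS_A.get(a)
    if parent is not None and parent == IS_A.get(b):
        return f"{a} and {b} are both a kind of {parent}"
    return None

print(relate("rain", "hail"))
print(relate("doctor", "nurse"))
print(relate("TV", "stapler"))
```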

Now a more complicated version of this is word-sentence association: every word in a sentence has a particular relationship with the other words in the sentence, and taken over a large number of sentences, some words have a higher chance of being seen together. The words computer and keyboard occur together far more often than computer and stapler do; therefore, we want our computer to find out the relationship between words that have a high co-occurrence.
The much harder part of this exercise is determining the relationship type, which will require a more in-depth look at the context.
Rethought - This word association really gives us a Topical Dictionary - A large set of words that all have a similar root topic, which will be useful later in cutting down the size of our dictionaries.
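Counting which words show up in the same sentence can be sketched in a few lines. The sample sentences are made up, and the function name `cooccurrence_counts` is just for illustration; a real run would also want to filter out common words such as "the" and "is".

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(sentences):
    """Count, for every pair of distinct words, how many sentences contain both."""
    counts = Counter()
    for sentence in sentences:
        words = sorted(set(sentence.lower().strip(".!?").split()))
        for pair in combinations(words, 2):  # pairs come out in alphabetical order
            counts[pair] += 1
    return counts

# Made-up sample sentences
sentences = [
    "The computer needs a keyboard.",
    "My keyboard is plugged into the computer.",
    "The stapler is on the desk.",
    "The computer is on the desk.",
]
counts = cooccurrence_counts(sentences)
print(counts[("computer", "keyboard")])
print(counts[("computer", "stapler")])
```

Words whose pair counts are high relative to the rest would then be candidates for the same topical dictionary.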
(* Link to Full Listings - Examples...*)


Further Breaking Down Nouns - We need to programmatically break down the nouns into their parts: Person, Place, Thing, Idea. I add one more, Animal, to simplify things:
1. Person
2. Thing
3. Animal
4. Place
5. Idea
Now the first three fall under the category Object as well, as in something that has mass, takes up space and can be seen. The 'Place' is really an Object as well but will have different properties.
The last is the trickiest as they are vague puffs of air, Ideas and Thoughts, more on those later.
We can now determine whether a word is a noun, but based on the keywords surrounding the word in several examples we must determine whether it is a person, place, animal, thing, or idea. Rules: Capitalization - if a word is capitalized, it is most likely a proper noun naming a person or place.
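The capitalization rule could be sketched as below. Skipping the first word of the sentence and the pronoun "I" are my own added assumptions, since those are capitalized for other reasons; the function name is hypothetical.

```python
def likely_proper_nouns(sentence):
    """Return capitalized words that are probably proper persons or places.

    Assumptions: the first word is skipped because every sentence starts with
    a capital, and the pronoun "I" is ignored.
    """
    words = [w.strip(",;:") for w in sentence.strip(".!?").split()]
    return [w for w in words[1:] if w[:1].isupper() and w != "I"]

print(likely_proper_nouns("Yesterday John drove to Boston."))
```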
Examples:
What is a 'table'? We know this is a thing, of course, but how does the computer know? Let us go find example sentences on the net with table in them. (It is also an idea, ech.)
Main Definition:n. An article of furniture supported by one or more vertical legs and having a flat horizontal surface. 
is-a should be Furniture
furniture - The movable articles in a room or an establishment that make it fit for living or working.
is-a article quality movable
Rule Movable Articles are Things.
:. Table is a Movable Furniture Article
:. Tables are things.
Hmmm hard logic to program, we still need a very very good is-a table!
mattress - A usually rectangular pad of heavy cloth filled with soft material or an arrangement of coiled springs, used as or on a bed.
is-a pad
pad - n. A thin, cushionlike mass of soft material used to fill, to give shape, or to protect against jarring, scraping, or other injury.
is-a Material Mass
A material mass is a Thing.
:. A Mattress is a thing.
Rule: Things have properties, weight, dimensions, volume, placement, ownership.
Rule: Standard measurement in feet, other as marked, and lbs
Table Frame
Avg width=4
Avg length=4
Avg height=3
Avg Weight=50
IS-A=Furniture article
Mattress Frame
Avg width=4
Avg length=4
Avg height=3
Avg Weight=50
Property=rectangular,usually or most
Property=soft
computer - n a machine that can be programmed to manipulate symbols.
is-a machine
machine - A device consisting of fixed and moving parts that modifies mechanical energy and transmits it in a more useful form.
is-a device
A device is a thing.
Rule: Most things inherit properties from the frame one IS-A step away!!!!
Mattress is soft because pads are soft. BUT NO FURTHER THAN ONE IS GUARANTEED!!
Restate computer: a device consisting of fixed and moving parts that modifies mechanical energy and transmits it in a more useful form that can be programmed to manipulate symbols.
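The one-step inheritance rule might look like this as a frame class. This is a sketch only: the `Frame` class, its `get` method, and the placeholder property on the "material mass" frame are assumptions for illustration, not an existing frame system.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Frame:
    name: str
    parent: Optional["Frame"] = None          # the frame this one IS-A
    properties: dict = field(default_factory=dict)

    def get(self, key):
        """Look up a property on this frame, then on the direct IS-A parent only."""
        if key in self.properties:
            return self.properties[key]
        if self.parent is not None and key in self.parent.properties:
            return self.parent.properties[key]
        return None  # anything further up the chain is not guaranteed

# "example" is a placeholder property, used only to show the one-step limit.
material_mass = Frame("material mass", properties={"example": "two steps up"})
pad = Frame("pad", parent=material_mass, properties={"texture": "soft"})
mattress = Frame("mattress", parent=pad, properties={"shape": "usually rectangular"})

print(mattress.get("texture"))   # found on pad, one step up
print(mattress.get("example"))   # two steps up, so not inherited
```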

Computer Frame
Avg width=2
Avg length=1
Avg height=2
Avg Weight=50
"can be programmed" ech!
IS-A machine
Machine Frame
"consisting of fixed and moving parts"
is-a device inherits "consisting of fixed and moving parts"
RULE!: We need to run through the encyclopedia so the AI understands that if an is-a parent is an object (or an animal, or a thing, or a person), then the child is one as well, all the way down the line.
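Unlike ordinary properties, the category is meant to carry all the way down the IS-A line. A minimal sketch, assuming the IS-A links and the three "is a thing" rules from the examples above have been stored in two lookup tables (`IS_A`, `CATEGORY`, and `category_of` are hypothetical names):

```python
# IS-A links from the table, mattress, and computer examples (child -> parent).
IS_A = {
    "table": "furniture", "furniture": "article",
    "mattress": "pad", "pad": "material mass",
    "computer": "machine", "machine": "device",
}
# Frames whose category is stated directly by a rule.
CATEGORY = {"article": "thing", "material mass": "thing", "device": "thing"}

def category_of(word):
    """Walk up the IS-A links until a frame with a known category is found."""
    seen = set()
    while word is not None and word not in seen:
        if word in CATEGORY:
            return CATEGORY[word]
        seen.add(word)           # guard against cycles in the data
        word = IS_A.get(word)
    return None

print(category_of("table"))
print(category_of("computer"))
```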

DO THESE NOW!!! Statistics There are ****** words in the english language, a breakdown of them is.....










Find Book: The Emotion Machine, Marvin Minsky




# add eliza online somewhere
Joseph Weizenbaum's ELIZA
About Eliza
# also add anthony's program as well.